
Implement pipeline mode in libpq

Pipeline mode in libpq lets an application avoid the Sync messages in
the FE/BE protocol that are implicit in the old libpq API after each
query.  The application can then insert Sync at its leisure with a new
libpq function PQpipelineSync.  This can lead to substantial reductions
in query latency.

Co-authored-by: Craig Ringer <craig.ringer@enterprisedb.com>
Co-authored-by: Matthieu Garrigues <matthieu.garrigues@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aya Iwata <iwata.aya@jp.fujitsu.com>
Reviewed-by: Daniel Vérité <daniel@manitou-mail.org>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kirk Jamison <k.jamison@fujitsu.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Nikhil Sontakke <nikhils@2ndquadrant.com>
Reviewed-by: Vaishnavi Prabakaran <VaishnaviP@fast.au.fujitsu.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>

Discussion: https://postgr.es/m/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=B4dMMZw@mail.gmail.com
Discussion: https://postgr.es/m/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com
This commit is contained in:
Alvaro Herrera
2021-03-15 18:13:42 -03:00
parent 146cb3889c
commit acb7e4eb6b
18 changed files with 2706 additions and 113 deletions


@@ -3180,6 +3180,33 @@ ExecStatusType PQresultStatus(const PGresult *res);
</para>
</listitem>
</varlistentry>
<varlistentry id="libpq-pgres-pipeline-sync">
<term><literal>PGRES_PIPELINE_SYNC</literal></term>
<listitem>
<para>
The <structname>PGresult</structname> represents a
synchronization point in pipeline mode, requested by
<xref linkend="libpq-PQpipelineSync"/>.
This status occurs only when pipeline mode has been selected.
</para>
</listitem>
</varlistentry>
<varlistentry id="libpq-pgres-pipeline-aborted">
<term><literal>PGRES_PIPELINE_ABORTED</literal></term>
<listitem>
<para>
The <structname>PGresult</structname> represents a pipeline that has
received an error from the server. <function>PQgetResult</function>
must be called repeatedly, and each time it will return this status code
until the end of the current pipeline, at which point it will return
<literal>PGRES_PIPELINE_SYNC</literal> and normal processing can
resume.
</para>
</listitem>
</varlistentry>
</variablelist>
If the result status is <literal>PGRES_TUPLES_OK</literal> or
@@ -4677,8 +4704,9 @@ int PQsendDescribePortal(PGconn *conn, const char *portalName);
<xref linkend="libpq-PQsendQueryParams"/>,
<xref linkend="libpq-PQsendPrepare"/>,
<xref linkend="libpq-PQsendQueryPrepared"/>,
<xref linkend="libpq-PQsendDescribePrepared"/>,
<xref linkend="libpq-PQsendDescribePortal"/>, or
<xref linkend="libpq-PQpipelineSync"/>
call, and returns it.
A null pointer is returned when the command is complete and there
will be no more results.
@@ -4702,6 +4730,19 @@ PGresult *PQgetResult(PGconn *conn);
<xref linkend="libpq-PQconsumeInput"/>.
</para>
<para>
In pipeline mode, <function>PQgetResult</function> will return normally
unless an error occurs; for any subsequent query sent after the one
that caused the error until (and excluding) the next synchronization point,
a special result of type <literal>PGRES_PIPELINE_ABORTED</literal> will
be returned, and a null pointer will be returned after it.
When the pipeline synchronization point is reached, a result of type
<literal>PGRES_PIPELINE_SYNC</literal> will be returned.
The result of the next query after the synchronization point follows
immediately (that is, no null pointer is returned after
the synchronization point).
</para>
<note>
<para>
Even when <xref linkend="libpq-PQresultStatus"/> indicates a fatal
@@ -4926,6 +4967,476 @@ int PQflush(PGconn *conn);
</sect1>
<sect1 id="libpq-pipeline-mode">
<title>Pipeline Mode</title>
<indexterm zone="libpq-pipeline-mode">
<primary>libpq</primary>
<secondary>pipeline mode</secondary>
</indexterm>
<indexterm zone="libpq-pipeline-mode">
<primary>pipelining</primary>
<secondary>in libpq</secondary>
</indexterm>
<indexterm zone="libpq-pipeline-mode">
<primary>batch mode</primary>
<secondary>in libpq</secondary>
</indexterm>
<para>
<application>libpq</application> pipeline mode allows applications to
send a query without having to read the result of the previously
sent query. By taking advantage of pipeline mode, a client waits
less for the server, since multiple queries and results can be
sent and received in a single network transaction.
</para>
<para>
While pipeline mode provides a significant performance boost, writing
clients that use pipeline mode is more complex, because the client must
manage a queue of pending queries and determine which result
corresponds to which query in the queue.
</para>
<para>
Pipeline mode also generally consumes more memory on both the client and server,
though careful and aggressive management of the send/receive queue can mitigate
this. This applies whether the connection is in blocking or non-blocking
mode.
</para>
<para>
While the pipeline API was introduced in
<productname>PostgreSQL</productname> 14, it is a client-side feature
which doesn't require special server support, and works on any server
that supports the v3 extended query protocol.
</para>
<sect2 id="libpq-pipeline-using">
<title>Using Pipeline Mode</title>
<para>
To issue pipelines, the application must switch the connection
into pipeline mode,
which is done with <xref linkend="libpq-PQenterPipelineMode"/>.
<xref linkend="libpq-PQpipelineStatus"/> can be used
to test whether pipeline mode is active.
In pipeline mode, only <link linkend="libpq-async">asynchronous operations</link>
are permitted, and <literal>COPY</literal> is disallowed.
Using synchronous command execution functions
such as <function>PQfn</function>,
<function>PQexec</function>,
<function>PQexecParams</function>,
<function>PQprepare</function>,
<function>PQexecPrepared</function>,
<function>PQdescribePrepared</function>, and
<function>PQdescribePortal</function>
is an error condition.
Once all dispatched commands have had their results processed, and
the <literal>PGRES_PIPELINE_SYNC</literal> result that ends the pipeline
has been consumed, the application may return
to non-pipelined mode with <xref linkend="libpq-PQexitPipelineMode"/>.
</para>
<note>
<para>
It is best to use pipeline mode with <application>libpq</application> in
<link linkend="libpq-PQsetnonblocking">non-blocking mode</link>. If it is used
in blocking mode, a client/server deadlock can occur.
<footnote>
<para>
The client will block trying to send queries to the server, but the
server will block trying to send results to the client from queries
it has already processed. This only occurs when the client sends
enough queries to fill both its output buffer and the server's receive
buffer before it switches to processing input from the server,
but it's hard to predict exactly when that will happen.
</para>
</footnote>
</para>
</note>
<sect3 id="libpq-pipeline-sending">
<title>Issuing Queries</title>
<para>
After entering pipeline mode, the application dispatches requests using
<xref linkend="libpq-PQsendQuery"/>,
<xref linkend="libpq-PQsendQueryParams"/>,
or its prepared-query sibling
<xref linkend="libpq-PQsendQueryPrepared"/>.
These requests are queued on the client side until flushed to the server;
this occurs when <xref linkend="libpq-PQpipelineSync"/> is used to
establish a synchronization point in the pipeline,
or when <xref linkend="libpq-PQflush"/> is called.
The functions <xref linkend="libpq-PQsendPrepare"/>,
<xref linkend="libpq-PQsendDescribePrepared"/>, and
<xref linkend="libpq-PQsendDescribePortal"/> also work in pipeline mode.
Result processing is described below.
</para>
<para>
The server executes statements, and returns results, in the order the
client sends them. The server will begin executing the commands in the
pipeline immediately, not waiting for the end of the pipeline.
If any statement encounters an error, the server aborts the current
transaction and does not execute any subsequent command in the queue
until the next synchronization point established by
<function>PQpipelineSync</function>;
a <literal>PGRES_PIPELINE_ABORTED</literal> result is produced for
each such command.
(This remains true even if the commands in the pipeline would roll back
the transaction.)
Query processing resumes after the synchronization point.
</para>
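<para>
The abort rule can be modeled in a few lines. This toy model is plain C and
makes no libpq calls. The <type>ToyCommand</type> and <type>ToyResult</type>
enums are hypothetical stand-ins for queued commands and the statuses the
client would observe. Once a command fails, every later command up to the
next sync point is reported as aborted, and normal processing resumes after
the sync point.
</para>

```c
#include <stddef.h>

/* Hypothetical stand-ins for the commands a client queued. */
typedef enum { CMD_OK, CMD_FAILS, CMD_SYNC } ToyCommand;

/* Hypothetical stand-ins for the statuses the client would observe. */
typedef enum { SAW_SUCCESS, SAW_FATAL_ERROR, SAW_ABORTED, SAW_SYNC } ToyResult;

/* Toy model (not libpq code) of error propagation within a pipeline. */
static void model_pipeline(const ToyCommand *cmds, ToyResult *out, size_t n)
{
    int aborted = 0;

    for (size_t i = 0; i < n; i++) {
        if (cmds[i] == CMD_SYNC) {
            out[i] = SAW_SYNC;          /* like PGRES_PIPELINE_SYNC */
            aborted = 0;                /* processing resumes after this */
        } else if (aborted) {
            out[i] = SAW_ABORTED;       /* like PGRES_PIPELINE_ABORTED */
        } else if (cmds[i] == CMD_FAILS) {
            out[i] = SAW_FATAL_ERROR;   /* like PGRES_FATAL_ERROR */
            aborted = 1;
        } else {
            out[i] = SAW_SUCCESS;
        }
    }
}
```

<para>
For example, the sequence OK, FAILS, OK, FAILS, SYNC, OK yields success,
fatal error, aborted, aborted, sync, success. The second failing command is
never executed, so it is reported as aborted.
</para>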
<para>
It's fine for one operation to depend on the results of a
prior one; for example, one query may define a table that the next
query in the same pipeline uses. Similarly, an application may
create a named prepared statement and execute it with later
statements in the same pipeline.
</para>
</sect3>
<sect3 id="libpq-pipeline-results">
<title>Processing Results</title>
<para>
To process the result of one query in a pipeline, the application calls
<function>PQgetResult</function> repeatedly and handles each result
until <function>PQgetResult</function> returns null.
The result from the next query in the pipeline may then be retrieved using
<function>PQgetResult</function> again and the cycle repeated.
The application handles individual statement results as normal.
When the results of all the queries in the pipeline have been
returned, <function>PQgetResult</function> returns a result
containing the status value <literal>PGRES_PIPELINE_SYNC</literal>.
</para>
<para>
The client may choose to defer result processing until the complete
pipeline has been sent, or interleave that with sending further
queries in the pipeline; see <xref linkend="libpq-pipeline-interleave"/>.
</para>
<para>
To enter single-row mode, call <function>PQsetSingleRowMode</function>
before retrieving results with <function>PQgetResult</function>.
This mode selection is effective only for the query currently
being processed. For more information on the use of
<function>PQsetSingleRowMode</function>,
refer to <xref linkend="libpq-single-row-mode"/>.
</para>
<para>
<function>PQgetResult</function> behaves the same as for normal
asynchronous processing except that it may return results with the new
status values <literal>PGRES_PIPELINE_SYNC</literal>
and <literal>PGRES_PIPELINE_ABORTED</literal>.
<literal>PGRES_PIPELINE_SYNC</literal> is reported exactly once for each
<function>PQpipelineSync</function> at the corresponding point
in the pipeline.
<literal>PGRES_PIPELINE_ABORTED</literal> is emitted in place of a normal
query result for every query after the first error (which itself is
reported as <literal>PGRES_FATAL_ERROR</literal>)
until the next <literal>PGRES_PIPELINE_SYNC</literal>;
see <xref linkend="libpq-pipeline-errors"/>.
</para>
<para>
<function>PQisBusy</function>, <function>PQconsumeInput</function>, etc.
operate as normal when processing pipeline results.
</para>
<para>
<application>libpq</application> does not provide any information to the
application about the query currently being processed (except that
<function>PQgetResult</function> returns null to indicate that it is about
to start returning the results of the next query). The application must keep track
of the order in which it sent queries, to associate them with their
corresponding results.
Applications will typically use a state machine or a FIFO queue for this.
</para>
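<para>
A FIFO of this kind can be as simple as a fixed-size ring buffer. The
following sketch is plain C with no libpq calls. It records a tag for each
command in send order. Each time <function>PQgetResult</function> signals the
end of one command's results, the oldest tag is popped to identify which
command they belonged to. The type and function names and the capacity of
64 are hypothetical choices.
</para>

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 64  /* hypothetical limit on commands in flight */

/* Ring buffer of tags for dispatched commands, oldest first. */
typedef struct {
    const char *tags[FIFO_CAP];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of entries in use */
} PendingQueue;

/* Record a newly dispatched command; false if the queue is full. */
static bool pending_push(PendingQueue *q, const char *tag)
{
    if (q->count == FIFO_CAP)
        return false;  /* read some results before sending more */
    q->tags[(q->head + q->count) % FIFO_CAP] = tag;
    q->count++;
    return true;
}

/* Remove and return the oldest tag, or NULL if nothing is pending. */
static const char *pending_pop(PendingQueue *q)
{
    const char *tag;

    if (q->count == 0)
        return NULL;
    tag = q->tags[q->head];
    q->head = (q->head + 1) % FIFO_CAP;
    q->count--;
    return tag;
}
```

<para>
A fixed capacity also acts as natural back-pressure. When the queue is full,
the client reads results before dispatching more work, which bounds memory
use on both sides.
</para>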
</sect3>
<sect3 id="libpq-pipeline-errors">
<title>Error Handling</title>
<para>
From the client's perspective, after <function>PQresultStatus</function>
returns <literal>PGRES_FATAL_ERROR</literal>,
the pipeline is flagged as aborted.
<function>PQresultStatus</function> will report a
<literal>PGRES_PIPELINE_ABORTED</literal> result for each remaining queued
operation in an aborted pipeline. The result for
<function>PQpipelineSync</function> is reported as
<literal>PGRES_PIPELINE_SYNC</literal> to signal the end of the aborted pipeline
and resumption of normal result processing.
</para>
<para>
The client <emphasis>must</emphasis> process results with
<function>PQgetResult</function> during error recovery.
</para>
<para>
If the pipeline used an implicit transaction, then operations that have
already executed are rolled back and operations that were queued to follow
the failed operation are skipped entirely. The same behavior holds if the
pipeline starts and commits a single explicit transaction (i.e., the first
statement is <literal>BEGIN</literal> and the last is
<literal>COMMIT</literal>) except that the session remains in an aborted
transaction state at the end of the pipeline. If a pipeline contains
<emphasis>multiple explicit transactions</emphasis>, all transactions that
committed prior to the error remain committed, the currently in-progress
transaction is aborted, and all subsequent operations are skipped completely,
including subsequent transactions. If a pipeline synchronization point
occurs with an explicit transaction block in aborted state, the next pipeline
will become aborted immediately unless the next command puts the transaction
in normal mode with <command>ROLLBACK</command>.
</para>
<note>
<para>
The client must not assume that work is committed when it
<emphasis>sends</emphasis> a <literal>COMMIT</literal> &mdash; only when the
corresponding result is received to confirm the commit is complete.
Because errors arrive asynchronously, the application needs to be able to
restart from the last <emphasis>received</emphasis> committed change and
resend work done after that point if something goes wrong.
</para>
</note>
</sect3>
<sect3 id="libpq-pipeline-interleave">
<title>Interleaving Result Processing and Query Dispatch</title>
<para>
To avoid deadlocks on large pipelines, the client should be structured
around a non-blocking event loop using operating system facilities
such as <function>select</function>, <function>poll</function>,
<function>WaitForMultipleObjectsEx</function>, etc.
</para>
<para>
The client application should generally maintain a queue of work
remaining to be dispatched and a queue of work that has been dispatched
but not yet had its results processed. When the socket is writable
it should dispatch more work. When the socket is readable it should
read results and process them, matching them up to the next entry in
its corresponding results queue. Depending on available memory, results
should be read from the socket frequently: there's no need to wait until the
end of the pipeline to read them. Pipelines should be scoped to logical
units of work, usually (but not necessarily) one transaction per pipeline.
There's no need to exit pipeline mode and re-enter it between pipelines,
or to wait for one pipeline to finish before sending the next.
</para>
<para>
An example using <function>select()</function> and a simple state
machine to track sent and received work is in
<filename>src/test/modules/libpq_pipeline/libpq_pipeline.c</filename>
in the PostgreSQL source distribution.
</para>
</sect3>
</sect2>
<sect2 id="libpq-pipeline-functions">
<title>Functions Associated with Pipeline Mode</title>
<variablelist>
<varlistentry id="libpq-PQpipelineStatus">
<term><function>PQpipelineStatus</function><indexterm><primary>PQpipelineStatus</primary></indexterm></term>
<listitem>
<para>
Returns the current pipeline mode status of the
<application>libpq</application> connection.
<synopsis>
PGpipelineStatus PQpipelineStatus(const PGconn *conn);
</synopsis>
</para>
<para>
<function>PQpipelineStatus</function> can return one of the following values:
<variablelist>
<varlistentry>
<term>
<literal>PQ_PIPELINE_ON</literal>
</term>
<listitem>
<para>
The <application>libpq</application> connection is in
pipeline mode.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<literal>PQ_PIPELINE_OFF</literal>
</term>
<listitem>
<para>
The <application>libpq</application> connection is
<emphasis>not</emphasis> in pipeline mode.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<literal>PQ_PIPELINE_ABORTED</literal>
</term>
<listitem>
<para>
The <application>libpq</application> connection is in pipeline
mode and an error occurred while processing the current pipeline.
The aborted flag is cleared when <function>PQgetResult</function>
returns a result of type <literal>PGRES_PIPELINE_SYNC</literal>.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</listitem>
</varlistentry>
<varlistentry id="libpq-PQenterPipelineMode">
<term><function>PQenterPipelineMode</function><indexterm><primary>PQenterPipelineMode</primary></indexterm></term>
<listitem>
<para>
Causes a connection to enter pipeline mode if it is currently idle or
already in pipeline mode.
<synopsis>
int PQenterPipelineMode(PGconn *conn);
</synopsis>
</para>
<para>
Returns 1 for success.
Returns 0 and has no effect if the connection is not currently
idle, i.e., it has a result ready, or it is waiting for more
input from the server, etc.
This function does not actually send anything to the server;
it just changes the <application>libpq</application> connection
state.
</para>
</listitem>
</varlistentry>
<varlistentry id="libpq-PQexitPipelineMode">
<term><function>PQexitPipelineMode</function><indexterm><primary>PQexitPipelineMode</primary></indexterm></term>
<listitem>
<para>
Causes a connection to exit pipeline mode if it is currently in pipeline mode
with an empty queue and no pending results.
<synopsis>
int PQexitPipelineMode(PGconn *conn);
</synopsis>
</para>
<para>
Returns 1 for success. Returns 1 and takes no action if not in
pipeline mode. If the current statement isn't finished processing,
or <function>PQgetResult</function> has not been called to collect
results from all previously sent queries, returns 0 (in which case,
use <xref linkend="libpq-PQerrorMessage"/> to get more information
about the failure).
</para>
</listitem>
</varlistentry>
<varlistentry id="libpq-PQpipelineSync">
<term><function>PQpipelineSync</function><indexterm><primary>PQpipelineSync</primary></indexterm></term>
<listitem>
<para>
Marks a synchronization point in a pipeline by sending a
<link linkend="protocol-flow-ext-query">sync message</link>
and flushing the send buffer. This serves as
the delimiter of an implicit transaction and an error recovery
point; see <xref linkend="libpq-pipeline-errors"/>.
<synopsis>
int PQpipelineSync(PGconn *conn);
</synopsis>
</para>
<para>
Returns 1 for success. Returns 0 if the connection is not in
pipeline mode or sending a
<link linkend="protocol-flow-ext-query">sync message</link>
failed.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="libpq-pipeline-tips">
<title>When to Use Pipeline Mode</title>
<para>
Much like asynchronous query mode, there is no meaningful performance
overhead when using pipeline mode. It increases client application complexity,
and extra caution is required to prevent client/server deadlocks, but
pipeline mode can offer considerable performance improvements, in exchange for
increased memory usage from leaving state around longer.
</para>
<para>
Pipeline mode is most useful when the server is distant, i.e., network latency
(<quote>ping time</quote>) is high, and also when many small operations
are being performed in rapid succession. There is usually less benefit
in using pipelined commands when each query takes many multiples of the client/server
round-trip time to execute. A 100-statement operation run on a server
with a 300 ms round-trip time would spend 30 seconds on network latency alone
without pipelining; with pipelining it may spend as little as 0.3 s waiting for
results from the server.
</para>
<para>
Use pipelined commands when your application does lots of small
<literal>INSERT</literal>, <literal>UPDATE</literal> and
<literal>DELETE</literal> operations that can't easily be transformed
into operations on sets, or into a <literal>COPY</literal> operation.
</para>
<para>
Pipeline mode is not useful when information from one operation is required by
the client to produce the next operation. In such cases, the client
would have to introduce a synchronization point and wait for a full client/server
round-trip to get the results it needs. However, it's often possible to
adjust the client design to exchange the required information server-side.
Read-modify-write cycles are especially good candidates; for example:
<programlisting>
BEGIN;
SELECT x FROM mytable WHERE id = 42 FOR UPDATE;
-- result: x=2
-- client adds 1 to x:
UPDATE mytable SET x = 3 WHERE id = 42;
COMMIT;
</programlisting>
could be done much more efficiently with:
<programlisting>
UPDATE mytable SET x = x + 1 WHERE id = 42;
</programlisting>
</para>
<para>
Pipelining is less useful, and more complex, when a single pipeline contains
multiple transactions (see <xref linkend="libpq-pipeline-errors"/>).
</para>
</sect2>
</sect1>
<sect1 id="libpq-single-row-mode">
<title>Retrieving Query Results Row-by-Row</title>
@@ -4966,6 +5477,13 @@ int PQflush(PGconn *conn);
Each object should be freed with <xref linkend="libpq-PQclear"/> as usual.
</para>
<para>
When using pipeline mode, single-row mode needs to be activated for each
query in the pipeline before retrieving results for that query
with <function>PQgetResult</function>.
See <xref linkend="libpq-pipeline-mode"/> for more information.
</para>
<para>
<variablelist>
<varlistentry id="libpq-PQsetSingleRowMode">


@@ -130,6 +130,10 @@
<application>libpq</application> library.
</para>
<para>
Client applications cannot use these functions while a libpq connection is in pipeline mode.
</para>
<sect2 id="lo-create">
<title>Creating a Large Object</title>