1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Support writable foreign tables.

This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers.  There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.

KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
This commit is contained in:
Tom Lane
2013-03-10 14:14:53 -04:00
parent 7f49a67f95
commit 21734d2fb8
29 changed files with 3672 additions and 347 deletions

View File

@ -3040,36 +3040,41 @@ ANALYZE measurement;
Foreign data is accessed with help from a
<firstterm>foreign data wrapper</firstterm>. A foreign data wrapper is a
library that can communicate with an external data source, hiding the
details of connecting to the data source and fetching data from it. There
is a foreign data wrapper available as a <filename>contrib</> module,
which can read plain data files residing on the server. Other kind of
foreign data wrappers might be found as third party products. If none of
the existing foreign data wrappers suit your needs, you can write your
own; see <xref linkend="fdwhandler">.
details of connecting to the data source and obtaining data from it.
There are some foreign data wrappers available as <filename>contrib</>
modules; see <xref linkend="contrib">. Other kinds of foreign data
wrappers might be found as third party products. If none of the existing
foreign data wrappers suit your needs, you can write your own; see <xref
linkend="fdwhandler">.
</para>
<para>
To access foreign data, you need to create a <firstterm>foreign server</>
object, which defines how to connect to a particular external data source,
according to the set of options used by a particular foreign data
object, which defines how to connect to a particular external data source
according to the set of options used by its supporting foreign data
wrapper. Then you need to create one or more <firstterm>foreign
tables</firstterm>, which define the structure of the remote data. A
foreign table can be used in queries just like a normal table, but a
foreign table has no storage in the PostgreSQL server. Whenever it is
used, <productname>PostgreSQL</productname> asks the foreign data wrapper
to fetch the data from the external source.
to fetch data from the external source, or transmit data to the external
source in the case of update commands.
</para>
<para>
Accessing remote data may require authentication at the external
Accessing remote data may require authenticating to the external
data source. This information can be provided by a
<firstterm>user mapping</>, which can provide additional options based
<firstterm>user mapping</>, which can provide additional data
such as user names and passwords based
on the current <productname>PostgreSQL</productname> role.
</para>
<para>
Currently, foreign tables are read-only. This limitation may be fixed
in a future release.
For additional information, see
<xref linkend="sql-createforeigndatawrapper">,
<xref linkend="sql-createserver">,
<xref linkend="sql-createusermapping">, and
<xref linkend="sql-createforeigntable">.
</para>
</sect1>

View File

@ -13,14 +13,15 @@
wrapper, which consists of a set of functions that the core server
calls. The foreign data wrapper is responsible for fetching
data from the remote data source and returning it to the
<productname>PostgreSQL</productname> executor. This chapter outlines how
to write a new foreign data wrapper.
<productname>PostgreSQL</productname> executor. If updating foreign
tables is to be supported, the wrapper must handle that, too.
This chapter outlines how to write a new foreign data wrapper.
</para>
<para>
The foreign data wrappers included in the standard distribution are good
references when trying to write your own. Look into the
<filename>contrib/file_fdw</> subdirectory of the source tree.
<filename>contrib</> subdirectory of the source tree.
The <xref linkend="sql-createforeigndatawrapper"> reference page also has
some useful details.
</para>
@ -84,9 +85,19 @@
<para>
The FDW handler function returns a palloc'd <structname>FdwRoutine</>
struct containing pointers to the following callback functions:
struct containing pointers to the callback functions described below.
The scan-related functions are required, the rest are optional.
</para>
<para>
The <structname>FdwRoutine</> struct type is declared in
<filename>src/include/foreign/fdwapi.h</>, which see for additional
details.
</para>
<sect2 id="fdw-callbacks-scan">
<title>FDW Routines For Scanning Foreign Tables</title>
<para>
<programlisting>
void
@ -96,7 +107,7 @@ GetForeignRelSize (PlannerInfo *root,
</programlisting>
Obtain relation size estimates for a foreign table. This is called
at the beginning of planning for a query involving a foreign table.
at the beginning of planning for a query that scans a foreign table.
<literal>root</> is the planner's global information about the query;
<literal>baserel</> is the planner's information about this table; and
<literal>foreigntableid</> is the <structname>pg_class</> OID of the
@ -181,23 +192,6 @@ GetForeignPlan (PlannerInfo *root,
<para>
<programlisting>
void
ExplainForeignScan (ForeignScanState *node,
ExplainState *es);
</programlisting>
Print additional <command>EXPLAIN</> output for a foreign table scan.
This can just return if there is no need to print anything.
Otherwise, it should call <function>ExplainPropertyText</> and
related functions to add fields to the <command>EXPLAIN</> output.
The flag fields in <literal>es</> can be used to determine what to
print, and the state of the <structname>ForeignScanState</> node
can be inspected to provide run-time statistics in the <command>EXPLAIN
ANALYZE</> case.
</para>
<para>
<programlisting>
void
BeginForeignScan (ForeignScanState *node,
int eflags);
</programlisting>
@ -212,6 +206,8 @@ BeginForeignScan (ForeignScanState *node,
<structname>ForeignScanState</> node (in particular, from the underlying
<structname>ForeignScan</> plan node, which contains any FDW-private
information provided by <function>GetForeignPlan</>).
<literal>eflags</> contains flag bits describing the executor's
operating mode for this plan node.
</para>
<para>
@ -246,9 +242,9 @@ IterateForeignScan (ForeignScanState *node);
<para>
Note that <productname>PostgreSQL</productname>'s executor doesn't care
whether the rows returned violate the <literal>NOT NULL</literal>
constraints which were defined on the foreign table columns - but the
planner does care, and may optimize queries incorrectly if
whether the rows returned violate any <literal>NOT NULL</literal>
constraints that were defined on the foreign table columns &mdash; but
the planner does care, and may optimize queries incorrectly if
<literal>NULL</> values are present in a column declared not to contain
them. If a <literal>NULL</> value is encountered when the user has
declared that none should be present, it may be appropriate to raise an
@ -277,6 +273,356 @@ EndForeignScan (ForeignScanState *node);
to remote servers should be cleaned up.
</para>
</sect2>
<sect2 id="fdw-callbacks-update">
<title>FDW Routines For Updating Foreign Tables</title>
<para>
If an FDW supports writable foreign tables, it should provide
some or all of the following callback functions depending on
the needs and capabilities of the FDW:
</para>
<para>
<programlisting>
void
AddForeignUpdateTargets (Query *parsetree,
RangeTblEntry *target_rte,
Relation target_relation);
</programlisting>
<command>UPDATE</> and <command>DELETE</> operations are performed
against rows previously fetched by the table-scanning functions. The
FDW may need extra information, such as a row ID or the values of
primary-key columns, to ensure that it can identify the exact row to
update or delete. To support that, this function can add extra hidden,
or <quote>junk</>, target columns to the list of columns that are to be
retrieved from the foreign table during an <command>UPDATE</> or
<command>DELETE</>.
</para>
<para>
To do that, add <structname>TargetEntry</> items to
<literal>parsetree-&gt;targetList</>, containing expressions for the
extra values to be fetched. Each such entry must be marked
<structfield>resjunk</> = <literal>true</>, and must have a distinct
<structfield>resname</> that will identify it at execution time.
Avoid using names matching <literal>ctid<replaceable>N</></literal> or
<literal>wholerow<replaceable>N</></literal>, as the core system can
generate junk columns of these names.
</para>
<para>
This function is called in the rewriter, not the planner, so the
information available is a bit different from that available to the
planning routines.
<literal>parsetree</> is the parse tree for the <command>UPDATE</> or
<command>DELETE</> command, while <literal>target_rte</> and
<literal>target_relation</> describe the target foreign table.
</para>
<para>
If the <function>AddForeignUpdateTargets</> pointer is set to
<literal>NULL</>, no extra target expressions are added.
(This will make it impossible to implement <command>DELETE</>
operations, though <command>UPDATE</> may still be feasible if the FDW
relies on an unchanging primary key to identify rows.)
</para>
<para>
<programlisting>
List *
PlanForeignModify (PlannerInfo *root,
ModifyTable *plan,
Index resultRelation,
int subplan_index);
</programlisting>
Perform any additional planning actions needed for an insert, update, or
delete on a foreign table. This function generates the FDW-private
information that will be attached to the <structname>ModifyTable</> plan
node that performs the update action. This private information must
have the form of a <literal>List</>, and will be delivered to
<function>BeginForeignModify</> during the execution stage.
</para>
<para>
<literal>root</> is the planner's global information about the query.
<literal>plan</> is the <structname>ModifyTable</> plan node, which is
complete except for the <structfield>fdwPrivLists</> field.
<literal>resultRelation</> identifies the target foreign table by its
rangetable index. <literal>subplan_index</> identifies which target of
the <structname>ModifyTable</> plan node this is, counting from zero;
use this if you want to index into <literal>node-&gt;plans</> or other
substructure of the <literal>plan</> node.
</para>
<para>
See <xref linkend="fdw-planning"> for additional information.
</para>
<para>
If the <function>PlanForeignModify</> pointer is set to
<literal>NULL</>, no additional plan-time actions are taken, and the
<literal>fdw_private</> list delivered to
<function>BeginForeignModify</> will be NIL.
</para>
<para>
<programlisting>
void
BeginForeignModify (ModifyTableState *mtstate,
ResultRelInfo *rinfo,
List *fdw_private,
int subplan_index,
int eflags);
</programlisting>
Begin executing a foreign table modification operation. This routine is
called during executor startup. It should perform any initialization
needed prior to the actual table modifications. Subsequently,
<function>ExecForeignInsert</>, <function>ExecForeignUpdate</> or
<function>ExecForeignDelete</> will be called for each tuple to be
inserted, updated, or deleted.
</para>
<para>
<literal>mtstate</> is the overall state of the
<structname>ModifyTable</> plan node being executed; global data about
the plan and execution state is available via this structure.
<literal>rinfo</> is the <structname>ResultRelInfo</> struct describing
the target foreign table. (The <structfield>ri_FdwState</> field of
<structname>ResultRelInfo</> is available for the FDW to store any
private state it needs for this operation.)
<literal>fdw_private</> contains the private data generated by
<function>PlanForeignModify</>, if any.
<literal>subplan_index</> identifies which target of
the <structname>ModifyTable</> plan node this is.
<literal>eflags</> contains flag bits describing the executor's
operating mode for this plan node.
</para>
<para>
Note that when <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</> is
true, this function should not perform any externally-visible actions;
it should only do the minimum required to make the node state valid
for <function>ExplainForeignModify</> and <function>EndForeignModify</>.
</para>
<para>
If the <function>BeginForeignModify</> pointer is set to
<literal>NULL</>, no action is taken during executor startup.
</para>
<para>
<programlisting>
TupleTableSlot *
ExecForeignInsert (EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot);
</programlisting>
Insert one tuple into the foreign table.
<literal>estate</> is global execution state for the query.
<literal>rinfo</> is the <structname>ResultRelInfo</> struct describing
the target foreign table.
<literal>slot</> contains the tuple to be inserted; it will match the
rowtype definition of the foreign table.
<literal>planSlot</> contains the tuple that was generated by the
<structname>ModifyTable</> plan node's subplan; it differs from
<literal>slot</> in possibly containing additional <quote>junk</>
columns. (The <literal>planSlot</> is typically of little interest
for <command>INSERT</> cases, but is provided for completeness.)
</para>
<para>
The return value is either a slot containing the data that was actually
inserted (this might differ from the data supplied, for example as a
result of trigger actions), or NULL if no row was actually inserted
(again, typically as a result of triggers). The passed-in
<literal>slot</> can be re-used for this purpose.
</para>
<para>
The data in the returned slot is used only if the <command>INSERT</>
query has a <literal>RETURNING</> clause. Hence, the FDW could choose
to optimize away returning some or all columns depending on the contents
of the <literal>RETURNING</> clause. However, some slot must be
returned to indicate success, or the query's reported rowcount will be
wrong.
</para>
<para>
If the <function>ExecForeignInsert</> pointer is set to
<literal>NULL</>, attempts to insert into the foreign table will fail
with an error message.
</para>
<para>
<programlisting>
TupleTableSlot *
ExecForeignUpdate (EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot);
</programlisting>
Update one tuple in the foreign table.
<literal>estate</> is global execution state for the query.
<literal>rinfo</> is the <structname>ResultRelInfo</> struct describing
the target foreign table.
<literal>slot</> contains the new data for the tuple; it will match the
rowtype definition of the foreign table.
<literal>planSlot</> contains the tuple that was generated by the
<structname>ModifyTable</> plan node's subplan; it differs from
<literal>slot</> in possibly containing additional <quote>junk</>
columns. In particular, any junk columns that were requested by
<function>AddForeignUpdateTargets</> will be available from this slot.
</para>
<para>
The return value is either a slot containing the row as it was actually
updated (this might differ from the data supplied, for example as a
result of trigger actions), or NULL if no row was actually updated
(again, typically as a result of triggers). The passed-in
<literal>slot</> can be re-used for this purpose.
</para>
<para>
The data in the returned slot is used only if the <command>UPDATE</>
query has a <literal>RETURNING</> clause. Hence, the FDW could choose
to optimize away returning some or all columns depending on the contents
of the <literal>RETURNING</> clause. However, some slot must be
returned to indicate success, or the query's reported rowcount will be
wrong.
</para>
<para>
If the <function>ExecForeignUpdate</> pointer is set to
<literal>NULL</>, attempts to update the foreign table will fail
with an error message.
</para>
<para>
<programlisting>
TupleTableSlot *
ExecForeignDelete (EState *estate,
ResultRelInfo *rinfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot);
</programlisting>
Delete one tuple from the foreign table.
<literal>estate</> is global execution state for the query.
<literal>rinfo</> is the <structname>ResultRelInfo</> struct describing
the target foreign table.
<literal>slot</> contains nothing useful upon call, but can be used to
hold the returned tuple.
<literal>planSlot</> contains the tuple that was generated by the
<structname>ModifyTable</> plan node's subplan; in particular, it will
carry any junk columns that were requested by
<function>AddForeignUpdateTargets</>. The junk column(s) must be used
to identify the tuple to be deleted.
</para>
<para>
The return value is either a slot containing the row that was deleted,
or NULL if no row was deleted (typically as a result of triggers). The
passed-in <literal>slot</> can be used to hold the tuple to be returned.
</para>
<para>
The data in the returned slot is used only if the <command>DELETE</>
query has a <literal>RETURNING</> clause. Hence, the FDW could choose
to optimize away returning some or all columns depending on the contents
of the <literal>RETURNING</> clause. However, some slot must be
returned to indicate success, or the query's reported rowcount will be
wrong.
</para>
<para>
If the <function>ExecForeignDelete</> pointer is set to
<literal>NULL</>, attempts to delete from the foreign table will fail
with an error message.
</para>
<para>
<programlisting>
void
EndForeignModify (EState *estate,
ResultRelInfo *rinfo);
</programlisting>
End the table update and release resources. It is normally not important
to release palloc'd memory, but for example open files and connections
to remote servers should be cleaned up.
</para>
<para>
If the <function>EndForeignModify</> pointer is set to
<literal>NULL</>, no action is taken during executor shutdown.
</para>
</sect2>
<sect2 id="fdw-callbacks-explain">
<title>FDW Routines for <command>EXPLAIN</></title>
<para>
<programlisting>
void
ExplainForeignScan (ForeignScanState *node,
ExplainState *es);
</programlisting>
Print additional <command>EXPLAIN</> output for a foreign table scan.
This function can call <function>ExplainPropertyText</> and
related functions to add fields to the <command>EXPLAIN</> output.
The flag fields in <literal>es</> can be used to determine what to
print, and the state of the <structname>ForeignScanState</> node
can be inspected to provide run-time statistics in the <command>EXPLAIN
ANALYZE</> case.
</para>
<para>
If the <function>ExplainForeignScan</> pointer is set to
<literal>NULL</>, no additional information is printed during
<command>EXPLAIN</>.
</para>
<para>
<programlisting>
void
ExplainForeignModify (ModifyTableState *mtstate,
ResultRelInfo *rinfo,
List *fdw_private,
int subplan_index,
struct ExplainState *es);
</programlisting>
Print additional <command>EXPLAIN</> output for a foreign table update.
This function can call <function>ExplainPropertyText</> and
related functions to add fields to the <command>EXPLAIN</> output.
The flag fields in <literal>es</> can be used to determine what to
print, and the state of the <structname>ModifyTableState</> node
can be inspected to provide run-time statistics in the <command>EXPLAIN
ANALYZE</> case. The first four arguments are the same as for
<function>BeginForeignModify</>.
</para>
<para>
If the <function>ExplainForeignModify</> pointer is set to
<literal>NULL</>, no additional information is printed during
<command>EXPLAIN</>.
</para>
</sect2>
<sect2 id="fdw-callbacks-analyze">
<title>FDW Routines for <command>ANALYZE</></title>
<para>
<programlisting>
bool
@ -291,6 +637,9 @@ AnalyzeForeignTable (Relation relation,
to a function that will collect sample rows from the table in
<parameter>func</>, plus the estimated size of the table in pages in
<parameter>totalpages</>. Otherwise, return <literal>false</>.
</para>
<para>
If the FDW does not support collecting statistics for any tables, the
<function>AnalyzeForeignTable</> pointer can be set to <literal>NULL</>.
</para>
@ -314,11 +663,7 @@ AcquireSampleRowsFunc (Relation relation, int elevel,
if the FDW does not have any concept of dead rows.)
</para>
<para>
The <structname>FdwRoutine</> struct type is declared in
<filename>src/include/foreign/fdwapi.h</>, which see for additional
details.
</para>
</sect2>
</sect1>
@ -432,9 +777,10 @@ GetForeignServerByName(const char *name, bool missing_ok);
<para>
The FDW callback functions <function>GetForeignRelSize</>,
<function>GetForeignPaths</>, and <function>GetForeignPlan</> must fit
into the workings of the <productname>PostgreSQL</> planner. Here are
some notes about what they must do.
<function>GetForeignPaths</>, <function>GetForeignPlan</>, and
<function>PlanForeignModify</> must fit into the workings of the
<productname>PostgreSQL</> planner. Here are some notes about what
they must do.
</para>
<para>
@ -546,6 +892,33 @@ GetForeignServerByName(const char *name, bool missing_ok);
same as for an ordinary restriction clause.
</para>
<para>
When planning an <command>UPDATE</> or <command>DELETE</>,
<function>PlanForeignModify</> can look up the <structname>RelOptInfo</>
struct for the foreign table and make use of the
<literal>baserel-&gt;fdw_private</> data previously created by the
scan-planning functions. However, in <command>INSERT</> the target
table is not scanned so there is no <structname>RelOptInfo</> for it.
</para>
<para>
For an <command>UPDATE</> or <command>DELETE</> against an external data
source that supports concurrent updates, it is recommended that the
<literal>ForeignScan</> operation lock the rows that it fetches, perhaps
via the equivalent of <command>SELECT FOR UPDATE</>. The FDW may also
choose to lock rows at fetch time when the foreign table is referenced
in a <command>SELECT FOR UPDATE/SHARE</>; if it does not, the
<literal>FOR UPDATE</> or <literal>FOR SHARE</> option is essentially a
no-op so far as the foreign table is concerned. This behavior may yield
semantics slightly different from operations on local tables, where row
locking is customarily delayed as long as possible: remote rows may get
locked even though they subsequently fail locally-applied restriction or
join conditions. However, matching the local semantics exactly would
require an additional remote access for every row, and might be
impossible anyway depending on what locking semantics the external data
source provides.
</para>
</sect1>
</chapter>

View File

@ -13,6 +13,7 @@
files in the server's file system. Data files must be in a format
that can be read by <command>COPY FROM</command>;
see <xref linkend="sql-copy"> for details.
Access to such data files is currently read-only.
</para>
<para>
@ -160,7 +161,7 @@
<example>
<title id="csvlog-fdw">Create a Foreign Table for PostgreSQL CSV Logs</title>
<para>
One of the obvious uses for the <literal>file_fdw</> is to make
the PostgreSQL activity log available as a table for querying. To
@ -217,8 +218,8 @@ OPTIONS ( filename '/home/josh/9.1/data/pg_log/pglog.csv', format 'csv' );
</para>
<para>
That's it &mdash; now you can query your log directly. In production, of course,
you would need to define some way to adjust to log rotation.
That's it &mdash; now you can query your log directly. In production, of
course, you would need to define some way to deal with log rotation.
</para>
</example>

View File

@ -61,7 +61,10 @@
<para>
Now you need only <command>SELECT</> from a foreign table to access
the data stored in its underlying remote table.
the data stored in its underlying remote table. You can also modify
the remote table using <command>INSERT</>, <command>UPDATE</>, or
<command>DELETE</>. (Of course, the remote user you have specified
in your user mapping must have privileges to do these things.)
</para>
<para>

View File

@ -117,9 +117,10 @@ CREATE FOREIGN DATA WRAPPER <replaceable class="parameter">name</replaceable>
<title>Notes</title>
<para>
At the moment, the foreign-data wrapper functionality is rudimentary.
There is no support for updating a foreign table, and optimization of
queries is primitive (and mostly left to the wrapper, too).
<productname>PostgreSQL</>'s foreign-data functionality is still under
active development. Optimization of queries is primitive (and mostly left
to the wrapper, too). Thus, there is considerable room for future
performance improvements.
</para>
</refsect1>
@ -158,7 +159,7 @@ CREATE FOREIGN DATA WRAPPER mywrapper
9075-9 (SQL/MED), with the exception that the <literal>HANDLER</literal>
and <literal>VALIDATOR</literal> clauses are extensions and the standard
clauses <literal>LIBRARY</literal> and <literal>LANGUAGE</literal>
are not implemented in PostgreSQL.
are not implemented in <productname>PostgreSQL</>.
</para>
<para>
@ -175,6 +176,7 @@ CREATE FOREIGN DATA WRAPPER mywrapper
<member><xref linkend="sql-dropforeigndatawrapper"></member>
<member><xref linkend="sql-createserver"></member>
<member><xref linkend="sql-createusermapping"></member>
<member><xref linkend="sql-createforeigntable"></member>
</simplelist>
</refsect1>