1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Allow specifying row filters for logical replication of tables.

This feature adds row filtering for publication tables. When a publication
is defined or modified, an optional WHERE clause can be specified. Rows
that don't satisfy this WHERE clause will be filtered out. This allows a
set of tables to be partially replicated. The row filter is per table. A
new row filter can be added simply by specifying a WHERE clause after the
table name. The WHERE clause must be enclosed by parentheses.

The row filter WHERE clause for a table added to a publication that
publishes UPDATE and/or DELETE operations must contain only columns that
are covered by REPLICA IDENTITY. The row filter WHERE clause for a table
added to a publication that publishes INSERT can use any column. If the
row filter evaluates to NULL, it is regarded as "false". The WHERE clause
only allows simple expressions that don't have user-defined functions,
user-defined operators, user-defined types, user-defined collations,
non-immutable built-in functions, or references to system columns. These
restrictions could be addressed in the future.

If you choose to do the initial table synchronization, only data that
satisfies the row filters is copied to the subscriber. If the subscription
has several publications in which a table has been published with
different WHERE clauses, rows that satisfy ANY of the expressions will be
copied. If a subscriber is a pre-15 version, the initial table
synchronization won't use row filters even if they are defined in the
publisher.

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters (for the same publish operation), those
expressions get OR'ed together so that rows satisfying any of the
expressions will be replicated.

This means all the other filters become redundant if (a) one of the
publications have no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

If your publication contains a partitioned table, the publication
parameter publish_via_partition_root determines if it uses the partition's
row filter (if the parameter is false, the default) or the root
partitioned table's row filter.

Psql commands \dRp+ and \d <table-name> will display any row filters.

Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian
Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang
Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
This commit is contained in:
Amit Kapila
2022-02-22 07:54:12 +05:30
parent ebf6c5249b
commit 52e4f0cd47
33 changed files with 3120 additions and 243 deletions

View File

@ -6325,6 +6325,15 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
Reference to relation
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>prqual</structfield> <type>pg_node_tree</type>
</para>
<para>Expression tree (in <function>nodeToString()</function>
representation) for the relation's publication qualifying condition. Null
if there is no publication qualifying condition.</para></entry>
</row>
</tbody>
</tgroup>
</table>

View File

@ -30,7 +30,7 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
<phrase>where <replaceable class="parameter">publication_object</replaceable> is one of:</phrase>
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ]
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ... ]
ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA } [, ... ]
</synopsis>
</refsynopsisdiv>
@ -52,7 +52,9 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
remove one or more tables/schemas from the publication. Note that adding
tables/schemas to a publication that is already subscribed to will require an
<literal>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</literal> action on the
subscribing side in order to become effective.
subscribing side in order to become effective. Note also that the combination
of <literal>DROP</literal> with a <literal>WHERE</literal> clause is not
allowed.
</para>
<para>
@ -110,6 +112,12 @@ ALTER PUBLICATION <replaceable class="parameter">name</replaceable> RENAME TO <r
specified, the table and all its descendant tables (if any) are
affected. Optionally, <literal>*</literal> can be specified after the table
name to explicitly indicate that descendant tables are included.
If the optional <literal>WHERE</literal> clause is specified, rows for
which the <replaceable class="parameter">expression</replaceable>
evaluates to false or null will not be published. Note that parentheses
are required around the expression. The
<replaceable class="parameter">expression</replaceable> is evaluated with
the role used for the replication connection.
</para>
</listitem>
</varlistentry>

View File

@ -163,8 +163,11 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
<para>
Specifies whether to copy pre-existing data in the publications
that are being subscribed to when the replication starts.
The default is <literal>true</literal>. (Previously-subscribed
tables are not copied.)
The default is <literal>true</literal>.
</para>
<para>
Previously subscribed tables are not copied, even if a table's row
filter <literal>WHERE</literal> clause has since been modified.
</para>
</listitem>
</varlistentry>

View File

@ -28,7 +28,7 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
<phrase>where <replaceable class="parameter">publication_object</replaceable> is one of:</phrase>
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [, ... ]
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ WHERE ( <replaceable class="parameter">expression</replaceable> ) ] [, ... ]
ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA } [, ... ]
</synopsis>
</refsynopsisdiv>
@ -78,6 +78,14 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
publication, so they are never explicitly added to the publication.
</para>
<para>
If the optional <literal>WHERE</literal> clause is specified, rows for
which the <replaceable class="parameter">expression</replaceable>
evaluates to false or null will not be published. Note that parentheses
are required around the expression. It has no effect on
<literal>TRUNCATE</literal> commands.
</para>
<para>
Only persistent base tables and partitioned tables can be part of a
publication. Temporary tables, unlogged tables, foreign tables,
@ -225,6 +233,22 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
disallowed on those tables.
</para>
<para>
A <literal>WHERE</literal> (i.e. row filter) expression must contain only
columns that are covered by the <literal>REPLICA IDENTITY</literal>, in
order for <command>UPDATE</command> and <command>DELETE</command> operations
to be published. For publication of <command>INSERT</command> operations,
any column may be used in the <literal>WHERE</literal> expression. The
<literal>WHERE</literal> clause allows simple expressions that don't have
user-defined functions, user-defined operators, user-defined types,
user-defined collations, non-immutable built-in functions, or references to
system columns.
If your publication contains a partitioned table, the publication parameter
<literal>publish_via_partition_root</literal> determines if it uses the
partition's row filter (if the parameter is false, the default) or the root
partitioned table's row filter.
</para>
<para>
For an <command>INSERT ... ON CONFLICT</command> command, the publication will
publish the operation that actually results from the command. So depending
@ -247,6 +271,11 @@ CREATE PUBLICATION <replaceable class="parameter">name</replaceable>
<para>
<acronym>DDL</acronym> operations are not published.
</para>
<para>
The <literal>WHERE</literal> clause expression is executed with the role used
for the replication connection.
</para>
</refsect1>
<refsect1>
@ -259,6 +288,13 @@ CREATE PUBLICATION mypublication FOR TABLE users, departments;
</programlisting>
</para>
<para>
Create a publication that publishes all changes from active departments:
<programlisting>
CREATE PUBLICATION active_departments FOR TABLE departments WHERE (active IS TRUE);
</programlisting>
</para>
<para>
Create a publication that publishes all changes in all tables:
<programlisting>

View File

@ -208,6 +208,11 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
that are being subscribed to when the replication starts.
The default is <literal>true</literal>.
</para>
<para>
If the publications contain <literal>WHERE</literal> clauses, it
will affect what data is copied. Refer to the
<xref linkend="sql-createsubscription-notes" /> for details.
</para>
</listitem>
</varlistentry>
@ -293,7 +298,7 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
</variablelist>
</refsect1>
<refsect1>
<refsect1 id="sql-createsubscription-notes" xreflabel="Notes">
<title>Notes</title>
<para>
@ -319,6 +324,26 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
the parameter <literal>create_slot = false</literal>. This is an
implementation restriction that might be lifted in a future release.
</para>
<para>
If any table in the publication has a <literal>WHERE</literal> clause, rows
for which the <replaceable class="parameter">expression</replaceable>
evaluates to false or null will not be published. If the subscription has
several publications in which the same table has been published with
different <literal>WHERE</literal> clauses, a row will be published if any
of the expressions (referring to that publish operation) are satisfied. In
the case of different <literal>WHERE</literal> clauses, if one of the
publications has no <literal>WHERE</literal> clause (referring to that
publish operation) or the publication is declared as
<literal>FOR ALL TABLES</literal> or
<literal>FOR ALL TABLES IN SCHEMA</literal>, rows are always published
regardless of the definition of the other expressions.
If the subscriber is a <productname>PostgreSQL</productname> version before
15 then any row filtering is ignored during the initial data synchronization
phase. For this case, the user might want to consider deleting any initially
copied data that would be incompatible with subsequent filtering.
</para>
</refsect1>
<refsect1>