1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-24 01:29:19 +03:00
Commit Graph

140 Commits

Author SHA1 Message Date
Masahiko Sawada
072ee847ad Skip logical decoding of already-aborted transactions.
Previously, transaction aborts were detected concurrently only during
system catalog scans while replaying a transaction in streaming mode.

This commit adds an additional CLOG lookup to check the transaction
status, allowing the logical decoding to skip changes also when it
doesn't touch system catalogs, if the transaction is already
aborted. This optimization enhances logical decoding performance,
especially for large transactions that have already been rolled back,
as it avoids unnecessary disk or network I/O.

To avoid potential slowdowns caused by frequent CLOG lookups for small
transactions (most of which commit), the CLOG lookup is performed only
for large transactions before eviction. The performance benchmark
results showed there is not noticeable performance regression due to
CLOG lookups.

Reviewed-by: Amit Kapila, Peter Smith, Vignesh C, Ajin Cherian
Reviewed-by: Dilip Kumar, Andres Freund
Discussion: https://postgr.es/m/CAD21AoDht9Pz_DFv_R2LqBTBbO4eGrpa9Vojmt5z5sEx3XwD7A@mail.gmail.com
2025-02-12 16:31:34 -08:00
Álvaro Herrera
14e87ffa5c Add pg_constraint rows for not-null constraints
We now create contype='n' pg_constraint rows for not-null constraints on
user tables.  Only one such constraint is allowed for a column.

We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables.  These related constraints mostly
follow the well-known rules of conislocal and coninhcount that we have
for CHECK constraints, with some adaptations: for example, as opposed to
CHECK constraints, we don't match not-null ones by name when descending
a hierarchy to alter or remove it, instead matching by the name of the
column that they apply to.  This means we don't require the constraint
names to be identical across a hierarchy.

The inheritance status of these constraints can be controlled: now we
can be sure that if a parent table has one, then all children will have
it as well.  They can optionally be marked NO INHERIT, and then children
are free not to have one.  (There's currently no support for altering a
NO INHERIT constraint into inheriting down the hierarchy, but that's a
desirable future feature.)

This also opens the door for having these constraints be marked NOT
VALID, as well as allowing UNIQUE+NOT NULL to be used for functional
dependency determination, as envisioned by commit e49ae8d3bc.  It's
likely possible to allow DEFERRABLE constraints as followup work, as
well.

psql shows these constraints in \d+, though we may want to reconsider if
this turns out to be too noisy.  Earlier versions of this patch hid
constraints that were on the same columns of the primary key, but I'm
not sure that that's very useful.  If clutter is a problem, we might be
better off inventing a new \d++ command and not showing the constraints
in \d+.

For now, we omit these constraints on system catalog columns, because
they're unlikely to achieve anything.

The main difference to the previous attempt at this (b0e96f3119) is
that we now require that such a constraint always exists when a primary
key is in the column; we didn't require this previously which had a
number of unpalatable consequences.  With this requirement, the code is
easier to reason about.  For example:

- We no longer have "throwaway constraints" during pg_dump.  We needed
  those for the case where a table had a PK without a not-null
  underneath, to prevent a slow scan of the data during restore of the
  PK creation, which was particularly problematic for pg_upgrade.

- We no longer have to cope with attnotnull being set spuriously in
  case a primary key is dropped indirectly (e.g., via DROP COLUMN).

Some bits of code in this patch were authored by Jian He.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: 何建 (jian he) <jian.universality@gmail.com>
Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql
2024-11-08 13:28:48 +01:00
Amit Kapila
d759c1a0b8 Stabilize the test added by commit 022564f60c.
The test was unstable in branches 14 and 15 as we were relying on the
number of changes in the table having a toast column to start streaming.
On branches >= 16, we have a GUC debug_logical_replication_streaming which
can stream each change, so the test was stable in those branches.

Change the test to use PREPARE TRANSACTION as that should make the result
consistent and test the code changed in 022564f60c.

Reported-by: Daniel Gustafsson as per buildfarm
Author: Hou Zhijie, Amit Kapila
Backpatch-through: 14
Discussion: https://postgr.es/m/8C2F86AA-981E-4803-B14D-E264C0255330@yesql.se
2024-10-08 12:25:52 +05:30
Amit Kapila
022564f60c Fix fetching default toast value during decoding of in-progress transactions.
During logical decoding of in-progress transactions, we perform the toast
table scan while fetching the default toast value for an attribute. We
forgot to initialize the flag during this scan to indicate that the system
table scan is in progress. We need this flag to ensure that during logical
decoding we never directly access the tableam or heap APIs because we check
for concurrent aborts only in systable_* APIs.

Reported-by: Alexander Lakhin
Author: Takeshi Ideriha, Hou Zhijie
Reviewed-by: Amit Kapila, Hou Zhijie
Backpatch-through: 14
Discussion: https://postgr.es/m/18641-6687273b7f15269d@postgresql.org
2024-10-07 15:38:45 +05:30
Masahiko Sawada
52f1d6730b Fix memory counter update in ReorderBuffer.
Commit 5bec1d6bc5 changed the memory usage updates of the
ReorderBufferTXN to zero all at once by subtracting txn->size, rather
than updating it for each change. However, if TOAST reconstruction
data remained in the transaction when freeing it, there were cases
where it further subtracted the memory counter from zero, resulting in
an assertion failure.

This change calculates the memory size for each change and updates the
memory usage to precisely the amount that has been freed.

Backpatch to v17, where this was introducd.

Reviewed-by: Amit Kapila, Shlok Kyal
Discussion: https://postgr.es/m/CAD21AoAqkNUvicgKPT_dXzNoOwpPkVTg0QPPxEcWmzT0moCJ1g%40mail.gmail.com
Backpatch-through: 17
2024-08-26 11:00:07 -07:00
Masahiko Sawada
bb19b70081 Fix possibility of logical decoding partial transaction changes.
When creating and initializing a logical slot, the restart_lsn is set
to the latest WAL insertion point (or the latest replay point on
standbys). Subsequently, WAL records are decoded from that point to
find the start point for extracting changes in the
DecodingContextFindStartpoint() function. Since the initial
restart_lsn could be in the middle of a transaction, the start point
must be a consistent point where we won't see the data for partial
transactions.

Previously, when not building a full snapshot, serialized snapshots
were restored, and the SnapBuild jumps to the consistent state even
while finding the start point. Consequently, the slot's restart_lsn
and confirmed_flush could be set to the middle of a transaction. This
could lead to various unexpected consequences. Specifically, there
were reports of logical decoding decoding partial transactions, and
assertion failures occurred because only subtransactions were decoded
without decoding their top-level transaction until decoding the commit
record.

To resolve this issue, the changes prevent restoring the serialized
snapshot and jumping to the consistent state while finding the start
point.

On v17 and HEAD, a flag indicating whether snapshot restores should be
skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION
has been bumpded.

On backbranches, the flag is stored in the LogicalDecodingContext
instead, preserving on-disk compatibility.

Backpatch to all supported versions.

Reported-by: Drew Callahan
Reviewed-by: Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com
Backpatch-through: 12
2024-07-11 22:48:23 +09:00
Heikki Linnakangas
e6cd857726 Make RelationFlushRelation() work without ResourceOwner during abort
ReorderBufferImmediateInvalidation() executes invalidation messages in
an aborted transaction. However, RelationFlushRelation sometimes
required a valid resource owner, to temporarily increment the refcount
of the relache entry. Commit b8bff07daa worked around that in the main
subtransaction abort function, AbortSubTransaction(), but missed this
similar case in ReorderBufferImmediateInvalidation().

To fix, introduce a separate function to invalidate a relcache
entry. It does the same thing as RelationClearRelation(rebuild==true)
does when outside a transaction, but can be called without
incrementing the refcount.

Add regression test. Before this fix, it failed with:

ERROR: ResourceOwnerEnlarge called after release started

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/e56be7d9-14b1-664d-0bfc-00ce9772721c@gmail.com
2024-06-06 18:56:28 +03:00
Peter Eisentraut
17974ec259 Revise GUC names quoting in messages again
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.

This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages.  It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
2024-05-17 11:44:26 +02:00
Alvaro Herrera
6f8bb7c1e9 Revert structural changes to not-null constraints
There are some problems with the new way to handle these constraints
that were detected at the last minute, and require fixes that appear too
invasive to be doing this late in the cycle.  Revert this (again) for
now, we'll try again with these problems fixed.

The following commits are reverted:

    b0e96f3119  Catalog not-null constraints
    9b581c5341  Disallow changing NO INHERIT status of a not-null constraint
    d0ec2ddbe0  Fix not-null constraint test
    ac22a9545c  Move privilege check to the right place
    b0f7dd915b  Check stack depth in new recursive functions
    3af7217942  Update information_schema definition for not-null constraints
    c3709100be  Fix propagating attnotnull in multiple inheritance
    d9f686a72e  Fix restore of not-null constraints with inheritance
    d72d32f52d  Don't try to assign smart names to constraints
    0cd711271d  Better handle indirect constraint drops
    13daa33fa5  Disallow NO INHERIT not-null constraints on partitioned tables
    d45597f72f  Disallow direct change of NO INHERIT of not-null constraints
    21ac38f498  Fix inconsistencies in error messages

Discussion: https://postgr.es/m/202405110940.joxlqcx4dogd@alvherre.pgsql
2024-05-13 11:31:09 +02:00
Amit Kapila
ddd5f4f54a Add a slot synchronization function.
This commit introduces a new SQL function pg_sync_replication_slots()
which is used to synchronize the logical replication slots from the
primary server to the physical standby so that logical replication can be
resumed after a failover or planned switchover.

A new 'synced' flag is introduced in pg_replication_slots view, indicating
whether the slot has been synchronized from the primary server. On a
standby, synced slots cannot be dropped or consumed, and any attempt to
perform logical decoding on them will result in an error.

The logical replication slots on the primary can be synchronized to the
hot standby by using the 'failover' parameter of
pg-create-logical-replication-slot(), or by using the 'failover' option of
CREATE SUBSCRIPTION during slot creation, and then calling
pg_sync_replication_slots() on standby. For the synchronization to work,
it is mandatory to have a physical replication slot between the primary
and the standby aka 'primary_slot_name' should be configured on the
standby, and 'hot_standby_feedback' must be enabled on the standby. It is
also necessary to specify a valid 'dbname' in the 'primary_conninfo'.

If a logical slot is invalidated on the primary, then that slot on the
standby is also invalidated.

If a logical slot on the primary is valid but is invalidated on the
standby, then that slot is dropped but will be recreated on the standby in
the next pg_sync_replication_slots() call provided the slot still exists
on the primary server. It is okay to recreate such slots as long as these
are not consumable on standby (which is the case currently). This
situation may occur due to the following reasons:
- The 'max_slot_wal_keep_size' on the standby is insufficient to retain
WAL records from the restart_lsn of the slot.
- 'primary_slot_name' is temporarily reset to null and the physical slot
is removed.

The slot synchronization status on the standby can be monitored using the
'synced' column of pg_replication_slots view.

A functionality to automatically synchronize slots by a background worker
and allow logical walsenders to wait for the physical will be done in
subsequent commits.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-14 09:45:36 +05:30
Amit Kapila
c393308b69 Allow to enable failover property for replication slots via SQL API.
This commit adds the failover property to the replication slot. The
failover property indicates whether the slot will be synced to the standby
servers, enabling the resumption of corresponding logical replication
after failover. But note that this commit does not yet include the
capability to sync the replication slot; the subsequent commits will add
that capability.

A new optional parameter 'failover' is added to the
pg_create_logical_replication_slot() function. We will also enable to set
'failover' option for slots via the subscription commands in the
subsequent commits.

The value of the 'failover' flag is displayed as part of
pg_replication_slots view.

Author: Hou Zhijie, Shveta Malik, Ajin Cherian
Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-01-25 12:15:46 +05:30
Amit Kapila
57891c256c Add STREAM_START/STREAM_STOP for transactional messages during decoding.
In test_decoding module, when skip_empty_xacts option was specified, add
stream_start/stop for streaming transactional messages. This makes the
handling of transactional messages stream consistent irrespective of
whether skip_empty_xacts option was specified.

Commit 26dd0284b9 made a similar change for non-streaming messages but
forgot to update the streaming cases.

Author: Peter Smith
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/OS0PR01MB5716AEBD2988F8F5E9D5985794DFA@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-10-30 14:36:21 +05:30
Michael Paquier
173b56f1ef Add flush option to pg_logical_emit_message()
Since its introduction, LogLogicalMessage() (via the SQL interface
pg_logical_emit_message()) has never included a call to XLogFlush(),
causing it to potentially lose messages on a crash when used in
non-transactional mode.  This has come up to me as a problem while
playing with ideas to design a test suite for what has become
039_end_of_wal.pl introduced in bae868caf2 by Thomas Munro, because
there are no direct ways to force a WAL flush via SQL.

The default is false, to not flush messages and influence existing
use-cases where this function could be used.  If set to true, the
message emitted is flushed before returning back to the caller, making
the message durable on crash.  This new option has no effect when using
pg_logical_emit_message() in transactional mode, as the record's flush
is guaranteed by the WAL record generated by the transaction committed.

Two queries of test_decoding are tweaked to cover the new code path for
the flush.

Bump catalog version.

Author: Michael Paquier
Reviewed-by: Andres Freund, Amit Kapila, Fujii Masao, Tung Nguyen, Tomas
Vondra
Discussion: https://postgr.es/m/ZNsdThSe2qgsfs7R@paquier.xyz
2023-10-18 11:24:59 +09:00
Michael Paquier
98e89740e5 test_decoding: Remove unnecessary table in twophase test
The end of this test is dropping all the relations created but forgot
about this one.  This is not critical, but let's be clean, and the test
expects a cleanup, as documented.

Author: Nishant Sharma
Discussion: https://postgr.es/m/CADrsxdb0ueGV9nrC6s8zvXLkGUhnEjx7Ou_p5wo38TvmSvF83A@mail.gmail.com
2023-10-10 17:49:22 +09:00
Amit Kapila
54ccfd6586 Fix the misuse of origin filter across multiple pg_logical_slot_get_changes() calls.
The pgoutput module uses a global variable (publish_no_origin) to cache
the action for the origin filter, but we didn't reset the flag when
shutting down the output plugin, so subsequent retries may access the
previous publish_no_origin value.

We fix this by storing the flag in the output plugin's private data.
Additionally, the patch removes the currently unused origin string from the
structure.

For the back branch, to avoid changing the exposed structure, we eliminated the
global variable and instead directly used the origin string for change
filtering.

Author: Hou Zhijie
Reviewed-by: Amit Kapila, Michael Paquier
Backpatch-through: 16
Discussion: http://postgr.es/m/OS0PR01MB571690EF24F51F51EFFCBB0E94FAA@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-09-27 14:32:51 +05:30
Alvaro Herrera
b0e96f3119 Catalog not-null constraints
We now create contype='n' pg_constraint rows for not-null constraints.

We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables.  We also spawn not-null constraints
for inheritance child tables when their parents have primary keys.
These related constraints mostly follow the well-known rules of
conislocal and coninhcount that we have for CHECK constraints, with some
adaptations: for example, as opposed to CHECK constraints, we don't
match not-null ones by name when descending a hierarchy to alter it,
instead matching by column name that they apply to.  This means we don't
require the constraint names to be identical across a hierarchy.

For now, we omit them for system catalogs.  Maybe this is worth
reconsidering.  We don't support NOT VALID nor DEFERRABLE clauses
either; these can be added as separate features later (this patch is
already large and complicated enough.)

psql shows these constraints in \d+.

pg_dump requires some ad-hoc hacks, particularly when dumping a primary
key.  We now create one "throwaway" not-null constraint for each column
in the PK together with the CREATE TABLE command, and once the PK is
created, all those throwaway constraints are removed.  This avoids
having to check each tuple for nullness when the dump restores the
primary key creation.

pg_upgrading from an older release requires a somewhat brittle procedure
to create a constraint state that matches what would be created if the
database were being created fresh in Postgres 17.  I have tested all the
scenarios I could think of, and it works correctly as far as I can tell,
but I could have neglected weird cases.

This patch has been very long in the making.  The first patch was
written by Bernd Helmle in 2010 to add a new pg_constraint.contype value
('n'), which I (Álvaro) then hijacked in 2011 and 2012, until that one
was killed by the realization that we ought to use contype='c' instead:
manufactured CHECK constraints.  However, later SQL standard
development, as well as nonobvious emergent properties of that design
(mostly, failure to distinguish them from "normal" CHECK constraints as
well as the performance implication of having to test the CHECK
expression) led us to reconsider this choice, so now the current
implementation uses contype='n' again.  During Postgres 16 this had
already been introduced by commit e056c557ae, but there were some
problems mainly with the pg_upgrade procedure that couldn't be fixed in
reasonable time, so it was reverted.

In 2016 Vitaly Burovoy also worked on this feature[1] but found no
consensus for his proposed approach, which was claimed to be closer to
the letter of the standard, requiring an additional pg_attribute column
to track the OID of the not-null constraint for that column.
[1] https://postgr.es/m/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA@mail.gmail.com

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
2023-08-25 13:31:24 +02:00
Alvaro Herrera
d6677b93c7 Make test_decoding ddl.out shorter
Some of the test_decoding test output was extremely wide, because it
deals with massive toasted values, and the aligned mode causes psql to
produce 200kB of whitespace and dashes. Change to unaligned mode
temporarily to avoid that behavior.

Backpatch to 14, where it applies cleanly.

Discussion: https://postgr.es/m/20230405103953.sxleixp3uz5lazst@alvherre.pgsql
2023-07-24 17:48:06 +02:00
Amit Kapila
26dd0284b9 Add BEGIN/COMMIT for transactional messages during decoding.
In test_decoding module, when skip_empty_xacts option was specified, add
BEGIN/COMMIT for transactional messages. This makes the handling of
transactional messages consistent irrespective of whether skip_empty_xacts
option was specified.

We decided not to backpatch this change because skip_empty_xacts is
primarily used to have consistent test results across different runs and
this change won't help with that.

Author: Vignesh C
Reviewed-by: Ashutosh Bapat, Hou Zhijie
Discussion: https://postgr.es/m/CAExHW5ujRhbOz6_aTq_jQA8NjeFqq9d_8G9viShWvXx8gdSXiQ@mail.gmail.com
2023-07-11 08:31:11 +05:30
Peter Eisentraut
657f5f223e Remove incidental md5() function uses from several tests
This removes md5() function calls from these test suites:

- bloom
- test_decoding
- isolation
- recovery
- subscription

This covers all remaining test suites where md5() calls were just used
to generate some random data and can be replaced by appropriately
adapted sha256() calls.  This will eventually allow these tests to
pass in OpenSSL FIPS mode (which does not allow MD5 use).  See also
208bf364a9.  Unlike for the main regression tests, I didn't write a
fipshash() wrapper here, because that would have been too repetitive
and wouldn't really save much here.  In some cases it was easier to
remove one layer of indirection by changing column types from text to
bytea.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/flat/f9b480b5-e473-d2d1-223a-4b9db30a229a@eisentraut.org
2023-07-04 14:31:57 +02:00
David Rowley
eef231e816 Fix some typos and some incorrectly duplicated words
Author: Justin Pryzby
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/ZD3D1QxoccnN8A1V@telsasoft.com
2023-04-18 14:03:49 +12:00
Peter Eisentraut
de4d456b40 Improve several permission-related error messages.
Mainly move some detail from errmsg to errdetail, remove explicit
mention of superuser where appropriate, since that is implied in most
permission checks, and make messages more uniform.

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/20230316234701.GA903298@nathanxps13
2023-03-17 10:33:09 +01:00
Amit Kapila
16b1fe0037 Fix assertion failures while processing NEW_CID record in logical decoding.
When the logical decoding restarts from NEW_CID, since there is no
association between the top transaction and its subtransaction, both are
created as top transactions and have the same LSN. This caused the
assertion failure in AssertTXNLsnOrder().

This patch skips the assertion check until we reach the LSN at which we
start decoding the contents of the transaction, specifically
start_decoding_at LSN in SnapBuild. This is okay because we don't
guarantee to make the association between top transaction and
subtransaction until we try to decode the actual contents of transaction.
The ordering of the records prior to the start_decoding_at LSN should have
been checked before the restart.

The other assertion failure is due to the reason that we forgot to track
that we have considered top-level transaction id in the list of catalog
changing transactions that were committed when one of its subtransactions
is marked as containing catalog change.

Reported-by: Tomas Vondra, Osumi Takamichi
Author: Masahiko Sawada, Kuroda Hayato
Reviewed-by: Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi, Masahiko Sawada
Backpatch-through: 10
Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com
Discussion: https://postgr.es/m/TYCPR01MB83733C6CEAE47D0280814D5AED7A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com
2022-10-20 08:49:48 +05:30
Amit Kapila
d2169c9985 Fix the incorrect assertion introduced in commit 7f13ac8123.
It has been incorrectly assumed in commit 7f13ac8123 that we can either
purge all or none in the catalog modifying xids list retrieved from a
serialized snapshot. It is quite possible that some of the xids in that
array are old enough to be pruned but not others.

As per buildfarm

Author: Amit Kapila and Masahiko Sawada
Reviwed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAA4eK1LBtv6ayE+TvCcPmC-xse=DVg=SmbyQD1nv_AaqcpUJEg@mail.gmail.com
2022-08-29 08:10:10 +05:30
Amit Kapila
7f13ac8123 Fix catalog lookup with the wrong snapshot during logical decoding.
Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION
records to know if the transaction has modified the catalog, and that
information is not serialized to snapshot. Therefore, after the restart,
if the logical decoding decodes only the commit record of the transaction
that has actually modified a catalog, we will miss adding its XID to the
snapshot. Thus, we will end up looking at catalogs with the wrong
snapshot.

To fix this problem, this change adds the list of transaction IDs and
sub-transaction IDs, that have modified catalogs and are running during
snapshot serialization, to the serialized snapshot. After restart or
otherwise, when we restore from such a serialized snapshot, the
corresponding list is restored in memory. Now, when decoding a COMMIT
record, we check both the list and the ReorderBuffer to see if the
transaction has modified catalogs.

Since this adds additional information to the serialized snapshot, we
cannot backpatch it. For back branches, we took another approach.
We remember the last-running-xacts list of the decoded RUNNING_XACTS
record after restoring the previously serialized snapshot. Then, we mark
the transaction as containing catalog changes if it's in the list of
initial running transactions and its commit record has
XACT_XINFO_HAS_INVALS. This doesn't require any file format changes but
the transaction will end up being added to the snapshot even if it has
only relcache invalidations. But that won't be a problem since we use
snapshot built during decoding only to read system catalogs.

This commit bumps SNAPBUILD_VERSION because of a change in SnapBuild.

Reported-by: Mike Oh
Author: Masahiko Sawada
Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi
Backpatch-through: 10
Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
2022-08-11 10:09:24 +05:30
Amit Kapila
366283961a Allow users to skip logical replication of data having origin.
This patch adds a new SUBSCRIPTION parameter "origin". It specifies
whether the subscription will request the publisher to only send changes
that don't have an origin or send changes regardless of origin. Setting it
to "none" means that the subscription will request the publisher to only
send changes that have no origin associated. Setting it to "any" means
that the publisher sends changes regardless of their origin. The default
is "any".
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=9999'
PUBLICATION pub1 WITH (origin = none);

This can be used to avoid loops (infinite replication of the same data)
among replication nodes.

This feature allows filtering only the replication data originating from
WAL but for initial sync (initial copy of table data) we don't have such a
facility as we can only distinguish the data based on origin from WAL. As
a follow-up patch, we are planning to forbid the initial sync if the
origin is specified as none and we notice that the publication tables were
also replicated from other publishers to avoid duplicate data or loops.

We forbid to allow creating origin with names 'none' and 'any' to avoid
confusion with the same name options.

Author: Vignesh C, Amit Kapila
Reviewed-By: Peter Smith, Amit Kapila, Dilip Kumar, Shi yu, Ashutosh Bapat, Hayato Kuroda
Discussion: https://postgr.es/m/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubhjcQ@mail.gmail.com
2022-07-21 08:47:38 +05:30
Andres Freund
5264add784 pgstat: add/extend tests for resetting various kinds of stats.
- subscriber stats reset path was untested
- slot stat sreset path for all slots was untested
- pg_stat_database.sessions etc was untested
- pg_stat_reset_shared() was untested, for any kind of shared stats
- pg_stat_reset() was untested

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-04-07 15:43:43 -07:00
Tomas Vondra
2c7ea57e56 Revert "Logical decoding of sequences"
This reverts a sequence of commits, implementing features related to
logical decoding and replication of sequences:

 - 0da92dc530
 - 80901b3291
 - b779d7d8fd
 - d5ed9da41d
 - a180c2b34d
 - 75b1521dae
 - 2d2232933b
 - 002c9dd97a
 - 05843b1aa4

The implementation has issues, mostly due to combining transactional and
non-transactional behavior of sequences. It's not clear how this could
be fixed, but it'll require reworking significant part of the patch.

Discussion: https://postgr.es/m/95345a19-d508-63d1-860a-f5c2f41e8d40@enterprisedb.com
2022-04-07 20:06:36 +02:00
Andres Freund
0f96965c65 pgstat: add pg_stat_force_next_flush(), use it to simplify tests.
In the stats collector days it was hard to write tests for the stats system,
because fundamentally delivery of stats messages over UDP was not
synchronous (nor guaranteed). Now we easily can force pending stats updates to
be flushed synchronously.

This moves stats.sql into a parallel group, there isn't a reason for it to run
in isolation anymore. And it may shake out some bugs.

Bumps catversion.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de
2022-04-06 23:35:56 -07:00
Alvaro Herrera
7103ebb7aa Add support for MERGE SQL command
MERGE performs actions that modify rows in the target table using a
source table or query. MERGE provides a single SQL statement that can
conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise
require multiple PL statements.  For example,

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular tables, partitioned tables and inheritance
hierarchies, including column and row security enforcement, as well as
support for row and statement triggers and transition tables therein.

MERGE is optimized for OLTP and is parameterizable, though also useful
for large scale ETL/ELT. MERGE is not intended to be used in preference
to existing single SQL commands for INSERT, UPDATE or DELETE since there
is some overhead.  MERGE can be used from PL/pgSQL.

MERGE does not support targetting updatable views or foreign tables, and
RETURNING clauses are not allowed either.  These limitations are likely
fixable with sufficient effort.  Rewrite rules are also not supported,
but it's not clear that we'd want to support them.

Author: Pavan Deolasee <pavan.deolasee@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
2022-03-28 16:47:48 +02:00
Noah Misch
e186f56f9c Close race condition in slot_creation_error.spec.
Use the pattern from detach-partition-concurrently-3.spec.  Per
buildfarm member wrasse.

Reviewed by Kyotaro Horiguchi and Andres Freund.

Discussion: https://postgr.es/m/20220318072837.GC2739027@rfd.leadboat.com
2022-03-18 18:18:00 -07:00
Tomas Vondra
a180c2b34d Stabilize test_decoding touching with sequences
Some of the test_decoding regression tests are unstable due to modifying
a sequence. The first increment of a sequence after a checkpoint is
always logged (and thus decoded), which makes the output unpredictable.
The runs are usually much shorter than a checkpoint internal, so these
failures are rare, but we've seen a couple of them on animals that are
either slow or are running with valgrind/clobber cache/...

Fixed by skipping sequence decoding in most tests, with the exception of
the test aimed at testing decoding of sequences.

Reported-by: Amita Kapila
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
2022-03-08 19:23:00 +01:00
Noah Misch
766075105c Use PG_TEST_TIMEOUT_DEFAULT for pg_regress suite non-elapsing timeouts.
Currently, only contrib/test_decoding has this property.  Use \getenv to
load the timeout value.

Discussion: https://postgr.es/m/20220218052842.GA3627003@rfd.leadboat.com
2022-03-04 18:53:13 -08:00
Tom Lane
618c16707a Rearrange libpq's error reporting to avoid duplicated error text.
Since commit ffa2e4670, libpq accumulates text in conn->errorMessage
across a whole query cycle.  In some situations, we may report more
than one error event within a cycle: the easiest case to reach is
where we report a FATAL error message from the server, and then a
bit later we detect loss of connection.  Since, historically, each
error PGresult bears the entire content of conn->errorMessage,
this results in duplication of the FATAL message in any output that
concatenates the contents of the PGresults.

Accumulation in errorMessage still seems like a good idea, especially
in view of the number of places that did ad-hoc error concatenation
before ffa2e4670.  So to fix this, let's track how much of
conn->errorMessage has been read out into error PGresults, and only
include new text in later PGresults.  The tricky part of that is
to be sure that we never discard an error PGresult once made (else
we'd risk dropping some text, a problem much worse than duplication).
While libpq formerly did that in some code paths, a little bit of
rearrangement lets us postpone making an error PGresult at all until
we are about to return it.

A side benefit of that postponement is that it now becomes practical
to return a dummy static PGresult in cases where we hit out-of-memory
while trying to manufacture an error PGresult.  This eliminates the
admittedly-very-rare case where we'd return NULL from PQgetResult,
indicating successful query completion, even though what actually
happened was an OOM failure.

Discussion: https://postgr.es/m/ab4288f8-be5c-57fb-2400-e3e857f53e46@enterprisedb.com
2022-02-18 15:35:21 -05:00
Andres Freund
278cdea6b9 Add isolation test for errors during logical slot creation.
I didn't include this test in 2f6501fa3c, because I was not sure the error
messages for the terminated connection is stable across platforms. But it
sounds like it is, and if not, we'd want to do something about the instability
anyway...

Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-14 21:53:36 -08:00
Amit Kapila
5e01001ffb WAL log unchanged toasted replica identity key attributes.
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.

Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
2022-02-14 08:55:58 +05:30
Tomas Vondra
b779d7d8fd Fix skip-empty-xacts with sequences in test_decoding
Regression tests need to use skip-empty-xacts = false, because there
might be accidental concurrent activity (like autovacuum), particularly
on slow machines. The tests added by 80901b3291 failed to do that in a
couple places, triggering occasional failures on buildfarm.

Fixing the tests however uncovered a bug in the code, because sequence
callbacks did not handle skip-empty-xacts properly. For trasactional
increments we need to check/update the xact_wrote_changes flag, and emit
the BEGIN if it's the first change in the transaction.

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20220212220413.b25amklo7t4xb7ni%40alap3.anarazel.de
2022-02-12 23:50:42 +01:00
Tomas Vondra
80901b3291 Add decoding of sequences to test_decoding
Commit 0da92dc530 improved the logical decoding infrastructure to handle
sequences, and did various changes to related parts (WAL logging etc.).
But it did not include any implementation of the new callbacks added to
OutputPluginCallbacks.

This extends test_decoding with two callbacks to decode sequences. The
decoding of sequences may be disabled using 'include-sequences', a new
option of the output plugin.

Author: Tomas Vondra, Cary Huang
Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund
Discussion: https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com
Discussion: https://postgr.es/m/1710ed7e13b.cd7177461430746.3372264562543607781@highgo.ca
2022-02-12 00:51:46 +01:00
Michael Paquier
ece8c76192 Remove assertion for replication origins in PREPARE TRANSACTION
When using replication origins, pg_replication_origin_xact_setup() is an
optional choice to be able to set a LSN and a timestamp to mark the
origin, which would be additionally added to WAL for transaction commits
or aborts (including 2PC transactions).  An assertion in the code path
of PREPARE TRANSACTION assumed that this data should always be set, so
it would trigger when using replication origins without setting up an
origin LSN.  Some tests are added to cover more this kind of scenario.

Oversight in commit 1eb6d65.

Per discussion with Amit Kapila and Masahiko Sawada.

Discussion: https://postgr.es/m/YbbBfNSvMm5nIINV@paquier.xyz
Backpatch-through: 11
2021-12-14 10:58:15 +09:00
Michael Paquier
1922d7c6e1 Add SQL functions to monitor the directory contents of replication slots
This commit adds a set of functions able to look at the contents of
various paths related to replication slots:
- pg_ls_logicalsnapdir, for pg_logical/snapshots/
- pg_ls_logicalmapdir, for pg_logical/mappings/
- pg_ls_replslotdir, for pg_replslot/<slot_name>/

These are intended to be used by monitoring tools.  Unlike pg_ls_dir(),
execution permission can be granted to non-superusers.  Roles members of
pg_monitor gain have access to those functions.

Bump catalog version.

Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CALj2ACWsfizZjMN6bzzdxOk1ADQQeSw8HhEjhmVXn_Pu+7VzLw@mail.gmail.com
2021-11-23 19:29:42 +09:00
Amit Kapila
29b5905470 Fix toast rewrites in logical decoding.
Commit 325f2ec555 introduced pg_class.relwrite to skip operations on
tables created as part of a heap rewrite during DDL. It links such
transient heaps to the original relation OID via this new field in
pg_class but forgot to do anything about toast tables. So, logical
decoding was not able to skip operations on internally created toast
tables. This leads to an error when we tried to decode the WAL for the
next operation for which it appeared that there is a toast data where
actually it didn't have any toast data.

To fix this, we set pg_class.relwrite for internally created toast tables
as well which allowed skipping operations on them during logical decoding.

Author: Bertrand Drouvot
Reviewed-by: David Zhang, Amit Kapila
Backpatch-through: 11, where it was introduced
Discussion: https://postgr.es/m/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
2021-08-25 09:53:07 +05:30
Tom Lane
4a054069a3 Improve display of query results in isolation tests.
Previously, isolationtester displayed SQL query results using some
ad-hoc code that clearly hadn't had much effort expended on it.
Field values longer than 14 characters weren't separated from
the next field, and usually caused misalignment of the columns
too.  Also there was no visual separation of a query's result
from subsequent isolationtester output.  This made test result
files confusing and hard to read.

To improve matters, let's use libpq's PQprint() function.  Although
that's long since unused by psql, it's still plenty good enough
for the purpose here.

Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.

Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
2021-06-23 11:13:00 -04:00
Tom Lane
741d7f1047 Use annotations to reduce instability of isolation-test results.
We've long contended with isolation test results that aren't entirely
stable.  Some test scripts insert long delays to try to force stable
results, which is not terribly desirable; but other erratic failure
modes remain, causing unrepeatable buildfarm failures.  I've spent a
fair amount of time trying to solve this by improving the server-side
support code, without much success: that way is fundamentally unable
to cope with diffs that stem from chance ordering of arrival of
messages from different server processes.

We can improve matters on the client side, however, by annotating
the test scripts themselves to show the desired reporting order
of events that might occur in different orders.  This patch adds
three types of annotations to deal with (a) test steps that might or
might not complete their waits before the isolationtester can see them
waiting; (b) test steps in different sessions that can legitimately
complete in either order; and (c) NOTIFY messages that might arrive
before or after the completion of a step in another session.  We might
need more annotation types later, but this seems to be enough to deal
with the instabilities we've seen in the buildfarm.  It also lets us
get rid of all the long delays that were previously used, cutting more
than a minute off the runtime of the isolation tests.

Back-patch to all supported branches, because the buildfarm
instabilities affect all the branches, and because it seems desirable
to keep isolationtester's capabilities the same across all branches
to simplify possible future back-patching of tests.

Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
2021-06-22 21:43:12 -04:00
Amit Kapila
6f4bdf8152 Fix assertion during streaming of multi-insert toast changes.
While decoding the multi-insert WAL we can't clean the toast untill we get
the last insert of that WAL record. Now if we stream the changes before we
get the last change, the memory for toast chunks won't be released and we
expect the txn to have streamed all changes after streaming.  This
restriction is mainly to ensure the correctness of streamed transactions
and it doesn't seem worth uplifting such a restriction just to allow this
case because anyway we will stream the transaction once such an insert is
complete.

Previously we were using two different flags (one for toast tuples and
another for speculative inserts) to indicate partial changes. Now instead
we replaced both of them with a single flag to indicate partial changes.

Reported-by: Pavan Deolasee
Author: Dilip Kumar
Reviewed-by: Pavan Deolasee, Amit Kapila
Discussion: https://postgr.es/m/CABOikdN-_858zojYN-2tNcHiVTw-nhxPwoQS4quExeweQfG1Ug@mail.gmail.com
2021-05-27 07:59:43 +05:30
Amit Kapila
fc69509131 Fix tests for replication slots stats.
Some of the tests were not considering that the slot's spill stats could be
received by the stats collector after we have reset the stats. Remove
those tests and don't check total bytes decoded and sent to output plugin
in the spilled stats test as we can send the spilled stats to the stats
collector before actually sending the changes to output plugin.

Reported-by: Tom Lane as per buildfarm
Author: Vignesh C, Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-05-13 10:23:27 +05:30
Amit Kapila
51ef917303 Another try to fix the test case added by commit f5fc2f5b23.
As per analysis, it appears that the 'drop slot' message from the previous
test and 'create slot' message of the new test are either missed or not
yet delivered to the stats collector due to which we will still see the
stats from the old slot. This can happen rarely which could be the reason
that we are seeing some failures in the buildfarm randomly. To avoid that
we are using a different slot name for the tests in
test_decoding/sql/stats.sql.

Reported-by: Tom Lane based on buildfarm reports
Author: Sawada Masahiko
Reviewed-by: Amit Kapila, Vignesh C
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-30 07:55:42 +05:30
Amit Kapila
c64dcc7fee Fix test case added by commit f5fc2f5b23.
In the new test after resetting the stats, we were not waiting for the
stats message to be delivered. Also, we need to decode the results for
the new test, otherwise, it will show the old stats.

In passing,
a. Change docs added by commit f5fc2f5b23 as per suggestion by
Justin Pryzby.
b. Bump the PGSTAT_FILE_FORMAT_ID as commit f5fc2f5b23 changes the file
format of stats.

Reported-by: Tom Lane based on buildfarm reports
Author: Vignesh C, Justin Pryzby
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-19 09:02:47 +05:30
Amit Kapila
f5fc2f5b23 Add information of total data processed to replication slot stats.
This adds the statistics about total transactions count and total
transaction data logically sent to the decoding output plugin from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats.

Suggested-by: Andres Freund
Author: Vignesh C and Amit Kapila
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
2021-04-16 07:34:43 +05:30
Peter Eisentraut
0b5e824528 Message improvement
The previous wording contained a superfluous comma.  Adjust phrasing
for grammatical correctness and clarity.
2021-04-07 07:42:44 +02:00
Andres Freund
5f79580ad6 Fix memory lifetime issues of replication slot stats.
When accessing replication slot stats, introduced in 9868167500,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.

The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.

No backpatch necessary, feature is only in HEAD.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
2021-03-17 16:21:46 -07:00
Amit Kapila
19890a064e Add option to enable two_phase commits via pg_create_logical_replication_slot.
Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without two_phase option enabled it will not get the prepared even though
prepare is after consistent snapshot. Now next time during getting changes,
if the two_phase option is enabled it can skip prepare because by that
time start decoding point has been moved. So the user will only get commit
prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
2021-03-03 07:34:11 +05:30