Teach _bt_preprocess_array_keys to eliminate redundant array equality
scan keys directly, rather than just marking them as redundant. Its
_bt_preprocess_keys caller is no longer required to ignore input scan
keys that were marked redundant in this way. Oversights like the one
fixed by commit f22e17f7 are no longer possible.
The new scheme also makes it easier for _bt_preprocess_keys to output a
so.keyData[] scan key array with _more_ scan keys than it was passed in
its scan.keyData[] input scan key array. An upcoming patch that adds
skip scan optimizations to nbtree will take advantage of this.
In passing, remove and rename certain _bt_preprocess_keys variables to
make the difference between our input scan key array and our output scan
key array clearer.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-Wz=9A_UtM7HzUThSkQ+BcrQsQZuNhWOvQWK06PRkEp=SKQ@mail.gmail.com
table_beginscan_parallel and index_beginscan_parallel contain
Asserts checking that the relation a worker will use in
a parallel scan is the same one the leader intended. However,
they were checking for relation OID match, which was not strong
enough to detect the mismatch problem fixed in 126ec0bc7.
What would be strong enough is to compare relfilenodes instead.
Arguably, that's a saner definition anyway, since a scan surely
operates on a physical relation not a logical one. Hence,
store and compare RelFileLocators not relation OIDs. Also
ensure that index_beginscan_parallel checks the index identity
not just the table identity.
Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us
Commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution, made
parallel index scans work with the new design for arrays via explicit
scheduling of primitive index scans. Under this scheme a parallel index
scan with array keys will perform the same number of index descents as
an equivalent serial index scan (barring corner cases where an
individual parallel worker discovers that it can advance the scan's
array keys without anybody needing to perform another descent of the
index to get to the relevant page on the leaf level).
Despite all this, the pgstats accounting wasn't updated; it continued to
increment the total number of index scans for the rel once per _bt_first
call, no matter the details. As a result, the number of (primitive)
index scans could be over-counted during parallel scans.
To fix, delay incrementing the count of index scans until after we've
established that another descent of the index (using either _bt_search
or _bt_endpoint) is required. That way pg_stat_user_tables.idx_scan
always advances in the same way, regardless of whether or not the scan
makes use of parallelism.
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-Wz=E7XrkvscBN0U6V81NK3Q-dQOmivvbEsjG-zwEfDdFpg@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
Backpatch: 17-, where nbtree SAOP execution was enhanced.
We need to do RestoreRelationMap before loading catalog-derived
state, else the worker may end up with catalog relcache entries
containing stale relfilenode data. Move up RestoreReindexState
too, on the principle that that should also happen before we
do much of any catalog access.
I think ideally these things would happen even before InitPostgres,
but there are various problems standing in the way of that, notably
that the relmapper thinks "active" mappings should be discarded at
transaction end. The implication of this is that InitPostgres and
RestoreLibraryState will see the same catalog state as an independent
backend would see, which is probably fine; at least, it's been like
that all along.
Per report from Justin Pryzby. There is a case to be made that
this should be back-patched. But given the lack of complaints
before 6e086fa2e and the short amount of time remaining before
17.0 wraps, I'll just put it in HEAD for now.
Discussion: https://postgr.es/m/ZuoU_8EbSTE14o1U@pryzbyj2023
This commit moves pg_wal_replay_wait() procedure to be a neighbor of
WAL-related functions in xlogfuncs.c. The implementation of LSN waiting
continues to reside in the same place.
By proposal from Michael Paquier.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
This opens the possibility to define keys for more types of statistics
kinds in PgStat_HashKey, the first case being 8-byte query IDs for
statistics like pg_stat_statements.
This increases the size of PgStat_HashKey from 12 to 16 bytes, while
PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps
the same size due to alignment.
xl_xact_stats_item, that tracks the stats items to drop in commit WAL
records, is increased from 12 to 16 bytes. Note that individual chunks
in commit WAL records should be multiples of sizeof(int), hence 8-byte
object IDs are stored as two uint32, based on a suggestion from Heikki
Linnakangas.
While on it, the field of PgStat_HashKey is renamed from "objoid" to
"objid", as for some stats kinds this field does not refer to OIDs but
just IDs, like for replication slot stats.
This commit bumps the following format variables:
- PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file
for non-serialized stats kinds in the dshash table.
- XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item.
- Catalog version, for the SQL function pg_stat_have_stats().
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz
* Rename $node_standby1 to $node_standby in 043_wal_replay_wait.pl as there
is only one standby.
* Remove useless debug printing in 043_wal_replay_wait.pl.
* Fix typo in one check description in 043_wal_replay_wait.pl.
* Fix some wording in comments and documentation.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/1d7b08f2-64a2-77fb-c666-c9a74c68eeda%40gmail.com
Reviewed-by: Alexander Lakhin
Commit 5bf748b8, which enhanced nbtree ScalarArrayOp execution, made
parallel index scans work with the new design for arrays via explicit
scheduling of primitive index scans. A backend that successfully
scheduled the scan's next primitive index scan saved its backend local
array keys in shared memory. Any backend could pick up the scheduled
primitive scan within _bt_first. This scheme decouples scheduling a
primitive scan from starting the scan (by performing another descent of
the index via a _bt_search call from _bt_first) to make things robust.
The scheme had a deadlock hazard, at least when the leader process
participated in the scan. _bt_parallel_seize had a code path that made
backends that were not in an immediate position to start a scheduled
primitive index scan wait for some other backend to do so instead.
Under the right circumstances, the leader process could wait here
forever: the leader would wait for any other backend to start the
primitive scan, while every worker was busy waiting on the leader to
consume tuples from the scan's tuple queue.
To fix, don't wait for a scheduled primitive index scan to be started by
some other eligible backend from within _bt_parallel_seize (when the
calling backend isn't in a position to do so itself). Return false
instead, while recording that the scan has a scheduled primitive index
scan in backend local state. This leaves the backend in the same state
as the existing case where a backend schedules (or tries to schedule)
another primitive index scan from within _bt_advance_array_keys, before
calling _bt_parallel_seize. _bt_parallel_seize already handles that
case by returning false without waiting, and without unsetting the
backend local state. Leaving the backend in this state enables it to
start a previously scheduled primitive index scan once it gets back to
_bt_first.
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
Matthias van de Meent, with tweaks by me.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzmMGaPa32u9x_FvEbPTUkP5e95i=QxR8054nvCRydP-sw@mail.gmail.com
Backpatch: 17-, where nbtree SAOP execution was enhanced.
Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints.
These are backed by GiST indexes instead of B-tree indexes, since they
are essentially exclusion constraints with = for the scalar parts of
the key and && for the temporal part.
(previously committed as 46a0cd4cef, reverted by 46a0cd4cefb; the new
part is this:)
Because 'empty' && 'empty' is false, the temporal PK/UQ constraint
allowed duplicates, which is confusing to users and breaks internal
expectations. For instance, when GROUP BY checks functional
dependencies on the PK, it allows selecting other columns from the
table, but in the presence of duplicate keys you could get the value
from any of their rows. So we need to forbid empties.
This all means that at the moment we can only support ranges and
multiranges for temporal PK/UQs, unlike the original patch (above).
Documentation and tests for this are added. But this could
conceivably be extended by introducing some more general support for
the notion of "empty" for other types.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
This is support function 12 for the GiST AM and translates
"well-known" RT*StrategyNumber values into whatever strategy number is
used by the opclass (since no particular numbers are actually
required). We will use this to support temporal PRIMARY
KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality.
This commit adds two implementations, one for internal GiST opclasses
(just an identity function) and another for btree_gist opclasses. It
updates btree_gist from 1.7 to 1.8, adding the support function for
all its opclasses.
(previously committed as 6db4598fcb, reverted by 8aee330af55; this is
essentially unchanged from those)
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
When we are building a hash index that is large enough to need
pre-sorting (larger than either maintenance_work_mem or NBuffers),
the initial sorting phase is interruptible, but the insertion
phase wasn't. Add the missing CHECK_FOR_INTERRUPTS().
Per bug #18616 from Alexander Lakhin. Back-patch to all
supported branches.
Pavel Borisov
Discussion: https://postgr.es/m/18616-acbb9e5caf41e964@postgresql.org
Remove redundant checks for locale->collate_is_c now that we always
have a valid pg_locale_t.
Also, remove pg_locale_deterministic() wrapper, which is no longer
useful after commit e9931bfb75. Just check the field directly,
consistent with other fields in pg_locale_t.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
hashvalidate(), which validates the signatures of support functions
for the hash AM, contained several hardcoded exceptions. For example,
hash/date_ops support function 1 was hashint4(), which would
ordinarily fail validation because the function argument is int4, not
date. But this works internally because int4 and date are of the same
size. There are several more exceptions like this that happen to work
and were allowed historically but would now fail the function
signature validation.
This patch removes those exceptions by providing new support functions
that have the proper declared signatures. They internally share most
of the code with the "wrong" functions they replace, so the behavior
is still the same.
With the exceptions gone, hashvalidate() is now simplified and relies
fully on check_amproc_signature().
hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat
because some extensions currently use them to build hash functions for
their own types, and we need to keep exposing these functions as
"LANGUAGE internal" functions for that to continue to work.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org
When systable_beginscan() and systable_beginscan_ordered() choose an
index scan, they remap the attribute numbers in the passed-in scan
keys to the attribute numbers of the index, and then write those
remapped attribute numbers back into the scan key passed by the
caller. This second part is surprising and gratuitous. It means that
a scan key cannot safely be used more than once (but it might
sometimes work, depending on circumstances). Also, there is no value
in providing these remapped attribute numbers back to the caller,
since they can't do anything with that.
Fix that by making a copy of the scan keys passed by the caller and
make the modifications there.
Also, some code that had to work around the previous situation is
simplified.
Discussion: https://www.postgresql.org/message-id/flat/f8c739d9-f48d-4187-b214-df3391ba41ab@eisentraut.org
This brings more clarity to heapam.c, by cleanly separating all the
logic related to WAL replay and the rest of Heap and Heap2, similarly
to other RMGRs like hash, btree, etc.
The header reorganization is also nice in heapam.c, cutting half of the
headers required.
Author: Li Yong
Reviewed-by: Sutou Kouhei, Michael Paquier
Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com
The index access methods all had similar code that copied the
passed-in scan keys to local storage. They all used memmove() for
that, which is not wrong, but it seems confusing not to use memcpy()
when that would work. Presumably, this was all once copied from
ancient code and never adjusted.
Discussion: https://www.postgresql.org/message-id/flat/f8c739d9-f48d-4187-b214-df3391ba41ab@eisentraut.org
This change makes the code slightly easier to reason about, because
there is actually no need to know if a specific caller of one of these
routines should fail hard on a PANIC, or just let it go through with a
DEBUG2.
The only caller of XLogFileReadAnyTLI() used DEBUG2, and XLogFileRead()
has never used its emode. This can be simplified since 1bb2558046
that has introduced XLogFileReadAnyTLI(), splitting both.
Author: Yugo Nagata
Discussion: https://postgr.es/m/20240906201043.a640f3b44e755d4db2b6943e@sraoss.co.jp
Similarly to 2065ddf5e3, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.
There is a difference with the other cases with the introduction of
PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups.
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
Add bounds checking to nbtree's lookahead/skip-within-a-page mechanism.
Otherwise it's possible for cases with lots of before-array-keys tuples
to overflow an int16 variable, causing the mechanism to generate an out
of bounds page offset number.
Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/6c68ac42-bbb5-8b24-103e-af0e279c536f@gmail.com
Backpatch: 17-, where nbtree SAOP execution was enhanced.
Rather than the SQL injection_points_load(), this commit changes the
injection point test introduced in 768a9fd553 to rely on the two
macros INJECTION_POINT_LOAD() and INJECTION_POINT_CACHED(), that have
been originally introduced for the sake of this test.
This runs the test as a two-step process: load the injection point, then
run its callback directly from the local cache loaded. What the test
did originally was also fine, but the point here is to have an example
in core of how to use these new macros.
While on it, fix the header ordering in multixact.c, as pointed out by
Alexander Korotkov. This was an oversight in 768a9fd553.
Per discussion with Álvaro Herrera.
Author: Michael Paquier
Discussion: https://postgr.es/m/ZsUnJUlSOBNAzwW1@paquier.xyz
Discussion: https://postgr.es/m/CAPpHfduzaBz7KMhwuVOZMTpG=JniPG4aUosXPZCxZydmzq_oEQ@mail.gmail.com
Before commit a0e0fb1ba5, multixact.c contained a case in the
multixact-read path where it would loop sleeping 1ms each time until
another multixact-create path completed, which was uncovered by any
tests. That commit changed the code to rely on a condition variable
instead. Add a test now, which relies on injection points and "loading"
thereof (because of it being in a critical section), per commit
4b211003ec.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0925F9A9-4D53-4B27-A87E-3D83A757B0E0@yandex-team.ru
This patch provides the additional logging information in the following
conflict scenarios while applying changes:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_exists: The updated row value violates a NOT DEFERRABLE unique constraint.
update_missing: The tuple to be updated is missing.
delete_differ: Deleting a row that was previously modified by another origin.
delete_missing: The tuple to be deleted is missing.
For insert_exists and update_exists conflicts, the log can include the origin
and commit timestamp details of the conflicting key with track_commit_timestamp
enabled.
update_differ and delete_differ conflicts can only be detected when
track_commit_timestamp is enabled on the subscriber.
We do not offer additional logging for exclusion constraint violations because
these constraints can specify rules that are more complex than simple equality
checks. Resolving such conflicts won't be straightforward. This area can be
further enhanced if required.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar
Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com
Commit 97ddda8a82 removed the rmtree()
behavior from XLOG_TBLSPC_CREATE, obsoleting that part of the comment.
The comment's point about XLOG_DBASE_CREATE was wrong when commit
fa0f466d53 introduced the point. (It
would have been accurate if that commit had predated commit
fbcbc5d06f introducing the second
checkpoint of CREATE DATABASE.) Nothing can skip log_smgrcreate() on
the basis of wal_level=minimal, so don't comment on that.
Commit c6b92041d3 expanded WAL skipping
from five specific operations to relfilenodes generally, hence the
CreateDatabaseUsingFileCopy() comment change.
Discussion: https://postgr.es/m/20231008022204.cc@rfd.leadboat.com
Previously, (auto)analyze used global variables VacuumPageHit,
VacuumPageMiss, and VacuumPageDirty to track buffer usage. However,
pgBufferUsage provides a more generic way to track buffer usage with
support functions.
This change replaces those global variables with pgBufferUsage in
analyze. Since analyze was the sole user of those variables, it
removes their declarations. Vacuum previously used those variables but
replaced them with pgBufferUsage as part of a bug fix, commit
5cd72cc0c.
Additionally, it adjusts the buffer usage message in both vacuum and
analyze for better consistency.
Author: Anthonin Bonnefoy
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
Teach nbtree backwards scans to avoid relocking a just-read leaf page to
read its current left sibling link when it isn't truly necessary. This
happened inside _bt_readnextpage whenever _bt_readpage had already
determined that there'll be no further matches to the left (or at least
none for the current primitive index scan, for a scan with array keys).
A new precheck inside _bt_readnextpage is all that we need to avoid
these useless lock acquisitions. Arguably, using a precheck like this
was a missed opportunity for commit 2ed5b87f96, which taught nbtree to
drop leaf page pins early to avoid blocking cleanup by VACUUM. Forwards
scans already managed to avoid relocking the page like this.
The optimization added by this commit is particularly helpful with
backwards scans that use array keys where the scan must perform multiple
primitive index scans. Such backwards scans will now avoid a useless
leaf page re-lock at the end of each primitive index scan.
Note that this commit does not attempt to avoid needlessly re-locking a
leaf page that was just read when the scan must follow the leaf page's
left link. That more ambitious optimization could work by stashing the
left link when the page is first read by a backwards scan, allowing the
subsequent _bt_readnextpage call to optimistically skip re-reading the
original page just to get a new copy of its left link. For now we only
address cases where we don't care about our original page's left link.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=xgs7PojG=EUvhgadwENzu_mY_riNh-w9wFPsaS717ew@mail.gmail.com
Coverity complained that the hash_create() call might access
hash_table_ctl->hctl. That's a false alarm, hash_create() only
accesses that field when passed the HASH_SHARED_MEM flag. Try to
silence it by using a plain local variable instead of a const. That's
how the HASHCTL is initialized in all the other hash_create() calls.
There was no need for these to be static buffers, local variables work
just as well. I think they were marked as 'static' to imply that they
are read-only, but 'const' is more appropriate for that, so change
them to const.
To make it possible to mark the variables as 'const', also add 'const'
decorations to the transformRelOptions() signature.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
Parallel workers failed after a sequence like
BEGIN;
CREATE USER foo;
SET SESSION AUTHORIZATION foo;
because check_session_authorization could not see the uncommitted
pg_authid row for "foo". This is because we ran RestoreGUCState()
in a separate transaction using an ordinary just-created snapshot.
The same disease afflicts any other GUC that requires catalog lookups
and isn't forgiving about the lookups failing.
To fix, postpone RestoreGUCState() into the worker's main transaction
after we've set up a snapshot duplicating the leader's. This affects
check_transaction_isolation and check_transaction_deferrable, which
think they should only run during transaction start. Make them
act like check_transaction_read_only, which already knows it should
silently accept the value when InitializingParallelWorker.
This un-reverts commit f5f30c22e. The original plan was to back-patch
that, but the fact that 0ae5b763e proved to be a pre-requisite shows
that the subtle API change for GUC hooks might actually break some of
them. The problem we're trying to fix seems not worth taking such a
risk for in stable branches.
Per bug #18545 from Andrey Rachitskiy.
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
The previous coding here threw an error from assign_client_encoding
if it was invoked in a parallel worker. That's a very fundamental
violation of the GUC hook API: assign hooks must not throw errors.
The place to complain is in the check hook, so move the test to
there, and use the regular check-hook API (ie return false) to
report it.
The reason this coding is a problem is that it breaks GUC rollback,
which may occur after we leave InitializingParallelWorker state.
That case seems not actually reachable before now, but commit
f5f30c22e made it reachable, so we need to fix this before that
can be un-reverted.
In passing, improve the commentary in ParallelWorkerMain, and
add a check for failure of SetClientEncoding. That's another
case that can't happen now but might become possible after
foreseeable code rearrangements (notably, if the shortcut of
skipping PrepareClientEncoding stops being OK).
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
Previously, passing NULL for pg_locale_t meant "use the libc provider
and the server environment". Now that the database collation is
represented as a proper pg_locale_t (not dependent on setlocale()),
remove special cases for NULL.
Leave wchar2char() and char2wchar() unchanged for now, because the
callers don't always have a libc-based pg_locale_t available.
Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org
Reviewed-by: Peter Eisentraut, Andreas Karlsson
pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed. This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.
The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top. During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.
pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock. This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.
Catversion is bumped.
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
This is used in the startup process to check that the pgstats file we
are reading includes the redo LSN referring to the shutdown checkpoint
where it has been written. The redo LSN in the pgstats file needs to
match with what the control file has.
This is intended to be used for an upcoming change that will extend the
write of the stats file to happen during checkpoints, rather than only
shutdown sequences.
Bump PGSTAT_FILE_FORMAT_ID.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Zp8o6_cl0KSgsnvS@paquier.xyz
This reverts commit f5f30c22ed.
Some buildfarm animals are failing with "cannot change
"client_encoding" during a parallel operation". It looks like
assign_client_encoding is unhappy at being asked to roll back a
client_encoding setting after a parallel worker encounters a
failure. There must be more to it though: why didn't I see this
during local testing? In any case, it's clear that moving the
RestoreGUCState() call is not as side-effect-free as I thought.
Given that the bug f5f30c22e intended to fix has gone unreported
for years, it's not something that's urgent to fix; I'm not
willing to risk messing with it further with only days to our
next release wrap.
Parallel workers failed after a sequence like
BEGIN;
CREATE USER foo;
SET SESSION AUTHORIZATION foo;
because check_session_authorization could not see the uncommitted
pg_authid row for "foo". This is because we ran RestoreGUCState()
in a separate transaction using an ordinary just-created snapshot.
The same disease afflicts any other GUC that requires catalog lookups
and isn't forgiving about the lookups failing.
To fix, postpone RestoreGUCState() into the worker's main transaction
after we've set up a snapshot duplicating the leader's. This affects
check_transaction_isolation and check_transaction_deferrable, which
think they should only run during transaction start. Make them
act like check_transaction_read_only, which already knows it should
silently accept the value when InitializingParallelWorker.
Per bug #18545 from Andrey Rachitskiy. Back-patch to all
supported branches, because this has been wrong for awhile.
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org