Currently, WaitForLSNReplay() immediately throws an error if waiting for LSN
replay is not successful. This commit teaches WaitForLSNReplay() to return
the result of waiting, while making pg_wal_replay_wait() responsible for
throwing an appropriate error.
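The split described above can be sketched as follows. This is only an
illustration of "the waiting routine returns a status, the caller decides
how to report it"; the names WaitResult, wait_for_replay() and
wait_result_message(), the simplified boolean inputs, and the message
strings are assumptions made for the example, not the actual PostgreSQL
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; the real names may differ. */
typedef enum WaitResult
{
    WAIT_RESULT_SUCCESS,
    WAIT_RESULT_TIMEOUT,
    WAIT_RESULT_NOT_IN_RECOVERY
} WaitResult;

/*
 * Stand-in for the waiting routine.  It never reports an error itself; it
 * only classifies the outcome.  The inputs are simplified flags here.
 */
static WaitResult
wait_for_replay(bool in_recovery, bool target_reached)
{
    if (target_reached)
        return WAIT_RESULT_SUCCESS;
    if (!in_recovery)
        return WAIT_RESULT_NOT_IN_RECOVERY;
    return WAIT_RESULT_TIMEOUT;
}

/*
 * Stand-in for the SQL-callable wrapper: it maps the result to an error
 * message (NULL means "no error").  The message texts are illustrative.
 */
static const char *
wait_result_message(WaitResult result)
{
    switch (result)
    {
        case WAIT_RESULT_SUCCESS:
            return NULL;
        case WAIT_RESULT_TIMEOUT:
            return "timed out while waiting for target LSN";
        case WAIT_RESULT_NOT_IN_RECOVERY:
            return "recovery is not in progress";
    }
    return "unexpected wait result";
}
```

With this shape, a later 'no_error' style option only needs to change what
the wrapper does with a non-success result; the waiting routine is untouched.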
This is preparation for adding a 'no_error' argument to pg_wal_replay_wait()
and a new function pg_wal_replay_wait_status(), which returns the last wait
result status.
Additionally, we stop distinguishing situations when we find our instance to
be not in a recovery state before entering the waiting loop and inside
the waiting loop. Standby promotion may happen at any moment, even between
issuing a procedure call statement and pg_wal_replay_wait() doing a first
check of recovery status. Thus, there is no point in distinguishing these
situations.
Also, since we may exit the waiting loop and see our instance not in recovery
without throwing an error, we need to call deleteLSNWaiter() in that case. We
do this unconditionally for the sake of simplicity, even if the standby was
already promoted after reaching the target LSN, in which case the startup
process has surely already deleted us.
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZtUF17gF0pNpwZDI%40paquier.xyz
Reviewed-by: Michael Paquier, Pavel Borisov
3c5db1d6b implemented the pg_wal_replay_wait() stored procedure. Due to
the patch development history, the implementation resided in
src/backend/commands/waitlsn.c (src/include/commands/waitlsn.h for headers).
014f9f34d moved pg_wal_replay_wait() itself to
src/backend/access/transam/xlogfuncs.c, near the WAL-manipulation functions.
But most of the implementation stayed in place.
The code in src/backend/commands/waitlsn.c has nothing to do with commands,
but is related to WAL. So, this commit moves this code into
src/backend/access/transam/xlogwait.c (src/include/access/xlogwait.h for
headers).
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org
Reviewed-by: Pavel Borisov
Currently, when a single relcache entry gets invalidated,
TypeCacheRelCallback() has to loop over all type cache entries to find
the appropriate typentry to invalidate. Unfortunately, using the syscache
here is impossible, because this callback could be called outside a
transaction, which makes catalog lookups impossible. This is why the present
commit introduces RelIdToTypeIdCacheHash to map a relation OID to its
composite type OID.
We keep a RelIdToTypeIdCacheHash entry for as long as the corresponding type
cache entry has something to clean. Therefore, RelIdToTypeIdCacheHash
shouldn't get bloated in the case of a flood of temporary tables.
There are many places in lookup_type_cache() where syscache invalidation,
user interruption, or even an error could occur. In order to handle this, we
keep an array of in-progress type cache entries. If lookup_type_cache() is
interrupted, this array is processed to keep RelIdToTypeIdCacheHash in a
consistent state.
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru
Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov
Reviewed-by: Andrei Lepikhov, Pavel Borisov, Jian He, Alexander Lakhin
Reviewed-by: Artur Zakirov
Previously, a Query generated through the transform phase would have
unset stmt_location, tracking the starting point of a query string.
Extensions relying on the statement location to extract its relevant
parts in the source text string would fall back to using the whole
statement instead, leading to confusing results like in
pg_stat_statements for queries relying on nested queries, like:
- EXPLAIN, with top-level and nested query using the same query string,
and a query ID coming from the nested query for the non-top-level
entry.
- Multi-statements, with only partial portions of queries being
normalized.
- COPY TO with a query, SELECT or DMLs.
This patch improves things by keeping track of the statement locations
and propagating them to Query during transform, allowing PGSS to only show
the relevant part of the query for nested queries. This leads to less
bloat in entries for non-top-level entries, as queries can now be
grouped within the same (toplevel, queryid) duos in pg_stat_statements.
The result gives a stricter one-to-one mapping between query IDs and their
query strings.
The regression tests introduced in 45e0ba30fc40 produce differences
reflecting the new logic.
Author: Anthonin Bonnefoy
Reviewed-by: Michael Paquier, Jian He
Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
Some buildfarm members complained about an always-true test in the
SOFT_ERROR_OCCURRED macro. Fix by reading the field directly rather
than using the macro.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/2144895.1729653514@sss.pgh.pa.us
SASL frontend mechanisms are implemented with pg_fe_sasl_mech and
not the _be_ variant which is the backend implementation. Spotted
while reading adjacent code.
The proposed OAUTHBEARER SASL mechanism will need to allow larger
messages in the exchange, since tokens are sent directly by the
client. Move this limit into the pg_be_sasl_mech struct so that
it can be changed per-mechanism.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+nqX_5=Se0W0Ynrr55Fha3CMzwv_R9P3rkpHb=1kG7ZTQ@mail.gmail.com
It was possible for the code to read out-of-bound data from the
"day_tab" table with some crafted input data. Let's treat these as
invalid input as the month number is incorrect.
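The kind of check involved can be sketched as below. The table layout and
the helper names (is_leap(), days_in_month()) are simplified assumptions
for illustration and do not reproduce the actual decoding code; the point
is only that the month number is validated before it is used as an index.

```c
#include <assert.h>
#include <stdbool.h>

/* Days per month for non-leap and leap years (index 0 = January). */
static const int day_tab[2][12] = {
    {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

static bool
is_leap(int year)
{
    return (year % 4 == 0) && (year % 100 != 0 || year % 400 == 0);
}

/*
 * Return the number of days in the given month (1..12), or -1 if the month
 * number is out of range.  The range check comes before the table lookup,
 * so crafted input can never index outside day_tab.
 */
static int
days_in_month(int year, int month)
{
    if (month < 1 || month > 12)
        return -1;
    return day_tab[is_leap(year) ? 1 : 0][month - 1];
}
```

A caller would treat the -1 result as invalid input and set its error code
accordingly.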
A test is added to test this case with a check on the errno returned by
the decoding routine. A test close to the new one added in this commit
was testing for a failure, but did not look at the errno generated, so
let's use this commit to also change it, adding a check on the errno
returned by DecodeDateTime().
Like the other test scripts, dt_test should likely be expanded to
include more checks based on the errnos generated in these code paths.
This is left as future work.
This issue has existed since 2e6f97560a83, so backpatch all the way down.
Reported-by: Pavel Nekrasov
Author: Bruce Momjian, Pavel Nekrasov
Discussion: https://postgr.es/m/18614-6bbe00117352309e@postgresql.org
Backpatch-through: 12
Enable manipulation of attribute statistics. Only superficial
validation is performed, so it's possible to add nonsense, and it's up
to the planner (or other users of statistics) to behave reasonably in
that case.
Bump catalog version.
Author: Corey Huinker
Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com
Previously, CREATE/ALTER EXTENSION gave basically no useful
context about errors reported while executing script files.
I think the idea was that you could run the same commands
manually to see the error, but that's often quite inconvenient.
Let's improve that.
If we get an error during raw parsing, we won't have a current
statement identified by a RawStmt node, but we should always get
a syntax error position. Show the portion of the script from
the last semicolon-newline before the error position to the first
one after it. There are cases where this might show only a
fragment of a statement, but that should be uncommon, and it
seems better than showing the whole script file.
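The fragment selection described above can be sketched as follows. The
function names fragment_start() and fragment_end(), the 0-based byte
offset convention, and the demo script are assumptions for this example
rather than the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Demo script with a typo ("SELEC") in its second statement. */
static const char demo_script[] = "SELECT 1;\nSELEC 2;\nSELECT 3;\n";

/*
 * Offset just after the last ";\n" that ends at or before errpos, or 0 if
 * there is none.  errpos is a 0-based byte offset into script.
 */
static size_t
fragment_start(const char *script, size_t errpos)
{
    size_t      i;

    assert(errpos <= strlen(script));

    for (i = errpos; i >= 2; i--)
    {
        if (script[i - 2] == ';' && script[i - 1] == '\n')
            return i;
    }
    return 0;
}

/*
 * Offset just after the ';' of the first ";\n" found at or after errpos,
 * or the script length if there is none.
 */
static size_t
fragment_end(const char *script, size_t errpos)
{
    size_t      len = strlen(script);
    size_t      i;

    assert(errpos <= len);

    for (i = errpos; i + 1 < len; i++)
    {
        if (script[i] == ';' && script[i + 1] == '\n')
            return i + 1;
    }
    return len;
}
```

For demo_script and an error offset of 14 (inside "SELEC"), the two
helpers yield the half-open range [10, 18), which is the text "SELEC 2;".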
Without an error cursor, if we have gotten past raw parsing (which
we probably have), we can report just the current SQL statement as
an item of error context.
In any case also report the script file name as error context,
since it might not be entirely obvious which of a series of
update scripts failed. We can also show an approximate script
line number in case whatever we printed of the query isn't
sufficiently identifiable.
The error-context code path is already exercised by some
test_extensions test cases, but add tests for the syntax-error
path.
Discussion: https://postgr.es/m/ZvV1ClhnbJLCz7Sm@msg.df7cb.de
Up to now, the parser's reporting of a statement's stmt_location
included any preceding whitespace or comments. This isn't really
desirable but was done to avoid accounting honestly for nonterminals
that reduce to empty. It causes problems for pg_stat_statements,
which partially compensates by manually stripping whitespace, but
is not bright enough to strip /*-style comments. There will be
more problems with an upcoming patch to improve reporting of errors
in extension scripts, so it's time to do something about this.
The thing we have to do to make it work right is to adjust
YYLLOC_DEFAULT to scan the inputs of each production to find the
first one that has a valid location (i.e., did not reduce to
empty). In theory this adds a little bit of per-reduction overhead,
but in practice it's negligible. I checked by measuring the time
to run raw_parser() on the contents of information_schema.sql, and
there was basically no change.
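The scan itself amounts to something like the sketch below. It is written
as a plain function over an array of locations, with -1 assumed to mean
"no location" (the symbol reduced to empty); the real change is made in
the YYLLOC_DEFAULT macro, which this does not reproduce.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the location of the first right-hand-side symbol that has a valid
 * location (>= 0), or -1 if every symbol reduced to empty.
 */
static int
first_valid_location(const int *rhs_locations, int n)
{
    int         i;

    for (i = 0; i < n; i++)
    {
        if (rhs_locations[i] >= 0)
            return rhs_locations[i];
    }
    return -1;
}
```

The loop runs over the symbols of a single production, which is typically
a handful of entries, consistent with the negligible overhead measured.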
Having done that, we can rely on any nonterminal that didn't reduce
to completely empty to have a correct starting location, and we don't
need the kluges the stmtmulti production formerly used.
This should have a side benefit of allowing parse error reports to
include an error position in some cases where they formerly failed to
do so, due to trying to report the position of an empty nonterminal.
I did not go looking for an example though. The one previously known
case where that could happen (OptSchemaEltList) no longer needs the
kluge it had; but I rather doubt that that was the only case.
Discussion: https://postgr.es/m/ZvV1ClhnbJLCz7Sm@msg.df7cb.de
Previously, ecpg_log() always called ECPGget_sqlca() to retrieve sqlca,
even though it was only needed for debug logging. This commit updates
ecpg_log() to call ECPGget_sqlca() only when debug logging is enabled.
Author: Yuto Sasaki
Reviewed-by: Alvaro Herrera, Tom Lane, Fujii Masao
Discussion: https://postgr.es/m/TY2PR01MB3628A85689649BABC9A1C6C3C1782@TY2PR01MB3628.jpnprd01.prod.outlook.com
... to fix bugs when the referenced table is partitioned.
The catalog representation we chose for foreign keys connecting
partitioned tables (in commit f56f8f8da6af) is inconvenient, in the
sense that a standalone table has a different way to represent the
constraint when referencing a partitioned table, than when the same
table becomes a partition (and vice versa). Because of this, we need to
create additional catalog rows on detach (pg_constraint and pg_trigger),
and remove them on attach. We were doing some of those things, but not
all of them, leading to missing catalog rows in certain cases.
The worst problem seems to be that we are missing action triggers after
detaching a partition, which means that you could update/delete rows
from the referenced partitioned table that still had referencing rows on
that table, the server failing to throw the required errors.
!!!
Note that this means existing databases with FKs that reference
partitioned tables might have rows that break relational integrity, on
tables that were once partitions on the referencing side of the FK.
Another possible problem is that trying to reattach a table
that had been detached would fail indicating that internal triggers
cannot be found, which from the user's point of view is nonsensical.
In branches 15 and above, we fix this by creating a new helper function
addFkConstraint() which is in charge of creating a standalone
pg_constraint row, and repurposing addFkRecurseReferencing() and
addFkRecurseReferenced() so that they're only the recursive routine for
each side of the FK, and they call addFkConstraint() to create
pg_constraint at each partitioning level and add the necessary triggers.
These new routines can be used during partition creation, partition
attach and detach, and foreign key creation. This reduces redundant
code and simplifies the flow.
In branches 14 and 13, we have a much simpler fix that consists of
simply removing the constraint on detach. The reason is that those
branches are missing commit f4566345cf40, which reworked the way this
works in a way that we didn't consider back-patchable at the time.
We opted to leave branch 12 alone, because it's different enough from
branch 13 that the fix doesn't apply; and because it is going into EOL mode
very soon, patching it now might be worse since there's no way to undo
the damage if it goes wrong.
Existing databases might need to be repaired.
In the future we might want to rethink the catalog representation to
avoid this problem, but for now the code seems to do what's required to
make the constraints operate correctly.
Co-authored-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Guillaume Lelarge <guillaume@lelarge.info>
Reported-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reported-by: Thomas Baehler (SBB CFF FFS) <thomas.baehler2@sbb.ch>
Discussion: https://postgr.es/m/20230420144344.40744130@karst
Discussion: https://postgr.es/m/20230705233028.2f554f73@karst
Discussion: https://postgr.es/m/GVAP278MB02787E7134FD691861635A8BC9032@GVAP278MB0278.CHEP278.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org
The approach of declaring a function pointer with an empty argument
list and hoping that the compiler will not complain about casting it
to another type no longer works with C23, because foo() is now
equivalent to foo(void).
We don't need to do this here. With a few struct forward declarations
we can supply a correct argument list without having to pull in
another header file.
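As an illustration of the technique, here is a minimal sketch with
made-up names (demo_context, demo_request, demo_hook_type); it is not the
declaration changed by this commit. Forward declarations of the structs
are enough to write a fully prototyped function pointer type.

```c
#include <assert.h>

/*
 * Forward declarations: enough to name pointer types in a prototype
 * without including the headers that define these structs.
 */
struct demo_context;
struct demo_request;

/* A fully prototyped callback type instead of "int (*hook) ()". */
typedef int (*demo_hook_type) (struct demo_context *ctx,
                               struct demo_request *req);

/* The full definitions would normally live in another header. */
struct demo_context
{
    int         scale;
};

struct demo_request
{
    int         value;
};

static int
demo_hook(struct demo_context *ctx, struct demo_request *req)
{
    return req->value * ctx->scale;
}

/* No cast is needed, because the types match exactly. */
static demo_hook_type installed_hook = demo_hook;

static int
run_hook(int value)
{
    struct demo_context ctx = {2};
    struct demo_request req = {value};

    return installed_hook(&ctx, &req);
}
```

Because the pointer type carries the real argument list, this compiles the
same way whether or not an empty parameter list means "(void)".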
(This is the only new warning with C23. Together with the previous
fix a67a49648d9, this makes the whole code compile cleanly under C23.)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/95c6a9bf-d306-43d8-b880-664ef08f2944%40eisentraut.org
There have never been any regression tests in PGSS for various query
patterns for nested queries combined with level tracking, like:
- Multi-statements.
- CREATE TABLE AS
- CREATE/REFRESH MATERIALIZED VIEW
- DECLARE CURSOR
- EXPLAIN, with a subset of the above supported.
- COPY.
All the tests added here track historical, sometimes confusing, existing
behaviors. For example, EXPLAIN stores two PGSS entries with the same
top-level query string but two different query IDs as one is calculated
for the top-level EXPLAIN (this part is right) and a second one for the
inner query in the EXPLAIN (this part is not right).
A couple of patches are under discussion to improve the situation, and
all the tests added here will prove useful to evaluate the changes
discussed.
Author: Anthonin Bonnefoy
Reviewed-by: Michael Paquier, Jian He
Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
If the query is rewritten into a NOTIFY command by a DO INSTEAD
rule, we'd get an assertion failure, or in non-assert builds
issue a rather confusing error message. Improve that.
Also fix a longstanding grammar mistake in a nearby error message.
Per bug #18664 from Alexander Lakhin. Back-patch to all supported
branches.
Tender Wang and Tom Lane
Discussion: https://postgr.es/m/18664-ffd0ebc2386598df@postgresql.org
The finished transaction list can contain XIDs that are older than the
serializable global xmin. It's a short-lived state;
ClearOldPredicateLocks() removes any such transactions from the list,
and it's called whenever the global xmin advances. But if another
backend calls SummarizeOldestCommittedSxact() in that window, it will
call SerialAdd() on an XID that's older than the global xmin, or if
there are no more transactions running, when global xmin is
invalid. That trips the assertion in SerialAdd().
Fixes bug #18658 reported by Andrew Bille. Thanks to Alexander Lakhin
for analysis. Backpatch to all versions.
Discussion: https://www.postgresql.org/message-id/18658-7dab125ec688c70b%40postgresql.org
This test can act as a template when implementing an isolation test with
injection points, and tracks in a much simpler way some of the behaviors
implied in the existing isolation test "inplace" that has been added in
c35f419d6efb. Particularly, a detach does not affect a backend wait; a
wait needs to be interrupted by a wakeup.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZxGTONm_ctQz--io@paquier.xyz
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect. While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance, when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.
Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit. Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing(). This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now process raw_expr directly in eval_const_expressions_mutator().
Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.
Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
pg_shadow is not "publicly readable". (pg_group is, but there seems
no need to make that distinction here.) Seems to be a thinko dating
clear back to 7762619e9.
Antonin Houska
Discussion: https://postgr.es/m/31926.1729252247@antos
Make nbtree backwards scans optimistically access the next page to be
read to the left by following a prevPage block number that's now stashed
in currPos when the leaf page is first read. This approach matches the
one taken during forward scans, which follow a symmetric nextPage block
number from currPos. We stash both a prevPage and a nextPage, since the
scan direction might change (when fetching from a scrollable cursor).
Backwards scans will no longer need to lock the same page twice, except
in rare cases where the scan detects a concurrent page split (or page
deletion). Testing has shown this optimization to be particularly
effective during parallel index-only backwards scans: ~12% reductions in
query execution time are quite possible.
We're much better off being optimistic; concurrent left sibling page
splits are rare in general. It's possible that we'll need to lock more
pages than the pessimistic approach would have, but only when there are
_multiple_ concurrent splits of the left sibling page we now start at.
If there's just a single concurrent left sibling page split, the new
approach to scanning backwards will at least break even relative to the
old one (we'll acquire the same number of leaf page locks as before).
The optimization from this commit has long been contemplated by comments
added by commit 2ed5b87f96, which changed the rules for locking/pinning
during nbtree index scans. The approach that that commit introduced to
leaf level link traversal when scanning forwards is now more or less
applied all the time, regardless of the direction we're scanning in.
Following uniform conventions around sibling link traversal is simpler.
The only real remaining difference between our forward and backwards
handling is that our backwards handling must still detect and recover
from any concurrent left sibling splits (and concurrent page deletions),
as documented in the nbtree README. That is structured as a single,
isolated extra step that takes place in _bt_readnextpage.
Also use this opportunity to further simplify the functions that deal
with reading pages and traversing sibling links on the leaf level, and
to document their preconditions and postconditions (with respect to
things like buffer locks, buffer pins, and seizing the parallel scan).
This enhancement completely supersedes the one recently added by commit
3f44959f.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com
The present wording about viewing shared_memory_size_in_huge_pages
seems to suggest that the parameter cannot be viewed after startup
at all, whereas the intent is to make it clear that you can't use
"postgres -C" to view this parameter while the server is running.
This commit rephrases this section to remove the ambiguity.
Author: Seino Yuki
Reviewed-by: Michael Paquier, David G. Johnston, Fujii Masao
Discussion: https://postgr.es/m/420584fd274f9ec4f337da55ffb3b790%40oss.nttdata.com
Backpatch-through: 15
Commit 5d2e1cc117b introduced some strsep() uses, but it did the
memory management wrong in some cases. We need to keep a separate
pointer to the allocated memory so that we can free it later, because
strsep() advances the pointer we pass to it, and at the end it
will be NULL, so any free() calls won't do anything.
(This fixes two of the four places changed in commit 5d2e1cc117b. The
other two don't have this problem.)
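The pattern in question looks roughly like this. count_fields() and the
':' separator are invented for the example and are not taken from the
patched code; strsep() and strdup() are assumed to be available, as on
glibc and the BSDs.

```c
#define _DEFAULT_SOURCE         /* for strsep() and strdup() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the ':'-separated fields in "input" using strsep() on a private
 * copy.  strsep() advances the cursor it is given and leaves it NULL at
 * the end, so the original allocation is kept in "buf" for free().
 * Returns -1 if the copy cannot be allocated.
 */
static int
count_fields(const char *input)
{
    char       *buf = strdup(input);
    char       *cursor = buf;
    int         nfields = 0;

    if (buf == NULL)
        return -1;

    while (strsep(&cursor, ":") != NULL)
        nfields++;

    /* cursor is NULL here; free(cursor) would be a no-op and leak buf. */
    assert(cursor == NULL);
    free(buf);
    return nfields;
}
```

Note that an empty input still yields one (empty) field, because strsep()
returns the empty token before setting the cursor to NULL.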
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
The previous code (from commit 5d2e1cc117b) did not detect end of
string correctly, so it would fail to error out if fewer than the
expected number of fields were present, which could then later lead to
a crash when NULL string pointers are accessed.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
Commit 3bf3ab8c56 initially introduced support for unlogged
materialized views, but this was later disallowed by commit 3223b25ff7.
Additionally, commit d25f519107 added more code for handling
unlogged materialized views. This commit cleans up all unused
code related to them.
If unlogged materialized views had been supported in any official
release, psql would need to retain code to handle them for compatibility
with older servers. However, since they were never included in
an official release, this code is no longer necessary.
Author: Pixian Shi
Reviewed-by: Yugo Nagata, Fujii Masao
Discussion: https://postgr.es/m/CAAccyYKRZ=OvAvgowiSH+OELbStLP=p2Ht=R3CgT=OaNSH5DAA@mail.gmail.com
The arguments of the function were listed in an incorrect order in the
description of the routine. This information can be seen with perldoc.
Issue spotted while working on this area of the code.
Backpatch-through: 17
Don't get confused by an unmatched right brace in the input.
(Previously, this led to discarding information about file-level
variables and then possibly crashing.)
Detect, rather than crash on, an attempt to index into a non-array
variable.
As before, in the absence of field complaints I'm not too
excited about back-patching these.
Per valgrind testing by Alexander Lakhin.
Discussion: https://postgr.es/m/a239aec2-6c79-5fc9-9272-cea41158a360@gmail.com
After repartitioning the inner side of a hash join that would have
exceeded the allowed size, we check if all the tuples from a parent
partition moved to one child partition. That is evidence that it
contains duplicate keys and later attempts to repartition will also
fail, so we should give up trying to limit memory (for lack of a better
fallback strategy).
A thinko prevented the check from working correctly in partition 0 (the
one that is partially loaded into memory already). After
repartitioning, we should check for extreme skew if the *parent*
partition's space_exhausted flag was set, not the child partition's.
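A simplified sketch of the corrected test follows. The function name
extreme_skew_detected() and its flat argument list are assumptions for
illustration; the real code works on shared hash join state rather than
plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * After a parent partition has been split into children, report extreme
 * skew if the *parent* had run out of space and all of its tuples landed
 * in a single child.  Repartitioning again would not help in that case.
 */
static bool
extreme_skew_detected(bool parent_space_exhausted,
                      size_t parent_ntuples,
                      const size_t *child_ntuples,
                      int nchildren)
{
    int         i;

    /* Only partitions that actually overflowed are of interest. */
    if (!parent_space_exhausted || parent_ntuples == 0)
        return false;

    for (i = 0; i < nchildren; i++)
    {
        if (child_ntuples[i] == parent_ntuples)
            return true;
    }
    return false;
}
```

Keying the check on the parent's flag is what makes it fire for partition
0 as well, whose child may not have its own flag set yet.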
The consequence was repeated futile repartitioning until per-partition
data exceeded various limits including "ERROR: invalid DSA memory alloc
request size 1811939328", OS allocation failure, or temporary disk space
errors. (We could also do something about some of those symptoms, but
that's material for separate patches.)
This problem only became likely when PostgreSQL 16 introduced support
for Parallel Hash Right/Full Join, allowing NULL keys into the hash
table. Repartitioning always leaves NULL in partition 0, no matter how
many times you do it, because the hash value is all zero bits. That's
unlikely for other hashed values, but they might still have caused
wasted extra effort before giving up.
Back-patch to all supported releases.
Reported-by: Craig Milhiser <craig@milhiser.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com
Instead of using Node *, we can use an incomplete struct. That way,
everything has the correct type and fewer casts are required. This
technique is already used elsewhere in node type definitions.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org
Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.
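In C this is a one-line change, sketched below with a hypothetical
DemoLexer struct whose fields are illustrative only and do not match the
real lexer definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified lexer struct; field names are illustrative. */
typedef struct DemoLexer
{
    const char *input;
    size_t      input_length;
    bool        incremental;
    void       *inc_state;
} DemoLexer;

/*
 * Build a "dummy" lexer on the stack.  The "= {0}" initializer sets every
 * member to zero/NULL before the few interesting fields are assigned, so
 * later checks on untouched members read defined values.
 */
static bool
dummy_lexer_has_inc_state(const char *input)
{
    DemoLexer   lex = {0};

    lex.input = input;
    lex.input_length = strlen(input);

    return lex.inc_state != NULL;
}
```

Without the initializer, reading lex.inc_state here would be a read of an
indeterminate value, which is the kind of use Valgrind flags.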
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com