Previously, invalidated logical and physical replication slots could
be copied using the pg_copy_logical_replication_slot and
pg_copy_physical_replication_slot functions. Replication slots that
were invalidated for reasons other than WAL removal retained their
restart_lsn. This meant that a new slot copied from an invalidated
slot could have a restart_lsn pointing to a WAL segment that might
have already been removed.
This commit restricts the copying of invalidated replication slots.
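For example, a copy attempt along these lines now fails (illustrative
session; the slot names are hypothetical and the exact error wording may
differ):
    -- 'old_slot' stands for a slot that has been invalidated
    SELECT pg_copy_logical_replication_slot('old_slot', 'new_slot');
    ERROR:  cannot copy invalidated replication slot "old_slot"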
Backpatch to v16, where slots could retain their restart_lsn when
invalidated for reasons other than WAL removal.
For v15 and earlier, this check is not required since slots can only
be invalidated due to WAL removal, and existing checks already handle
this issue.
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com
Backpatch-through: 16
This adds a new connection parameter which instructs libpq to
write out key material client-side into a file, in order to make
connection debugging with Wireshark and similar tools possible.
The file format used is the standardized NSS format.
Author: Abhishek Chanda <abhishek.becs@gmail.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/CAKiP-K85C8uQbzXKWf5wHQPkuygGUGcufke713iHmYWOe9q2dA@mail.gmail.com
The alleged "statistics pg_dump bug" that prevented us from enabling
stats dumping in commit 172259afb563 wasn't a pg_dump bug after all: it
was just a side effect of not running pg_dump at the right time (namely,
before giving autovacuum some time to do its thing and then disabling it
to stabilize things). Move the code around to fix this problem and
enable statistics dumping.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Diagnosed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/5f3703fd7f27da62a8f3615218f937507f522347.camel@j-davis.com
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
This renames %node_params to %old_node_params, @initdb_params to
@old_initdb_params, and adds separate @new_initdb_params and
%new_node_params rather than reusing the former in confusing ways.
Extracted from a larger patch from the same author.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
The check for non-inheritable constraints is performed later, and the
same comment is included at that point.
While we're here, remove one extraneous blank line.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACJufxETi6x86S8EkH8mRfOcm2AenoE9t1pyCFVMpU34gVhF3w@mail.gmail.com
Commit 286a365b9c2 added SpecialCasing.txt but did not add it to the
make target dependencies, so the invoked script would fail because
the required file wasn't downloaded first. (The meson version
appears to work correctly.)
The issue is that transactions prepared before two-phase decoding is
enabled can fail to replicate to the subscriber after being committed on a
promoted standby following a failover. This is because the two_phase_at
field of a slot, which tracks the LSN from which two-phase decoding
starts, is not synchronized to standby servers. Without two_phase_at,
logical decoding might incorrectly identify prepared transactions as
already replicated to the subscriber after promotion of the standby
server, causing them to be skipped.
To address the issue on HEAD, the two_phase_at field of the slot is now
exposed by the pg_replication_slots view, allowing slot synchronization
to copy this value to the corresponding synced slot on the standby
server.
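For instance, the synchronization point can now be checked on both the
primary and the standby with a query like the following (the slot name
is illustrative; two_phase_at is the newly exposed column):
    SELECT slot_name, two_phase, two_phase_at
    FROM pg_replication_slots
    WHERE slot_name = 'sub_slot';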
This bug is likely to occur if the user toggles the two_phase option to
true after initial slot creation. Given that altering the two_phase option
of a replication slot is not allowed in PostgreSQL 17, this bug is less
likely to occur. We can't change the view/function definition in
back-branches, so we can't push the same fix there, but we are
brainstorming an appropriate solution for PG17.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/TYAPR01MB5724CC7C288535BBCEEE65DA94A72@TYAPR01MB5724.jpnprd01.prod.outlook.com
compareentry() is declared to work on WordEntryIN structs, but
tsvectorrecv() is using it in two places to work on WordEntry
structs. This is almost okay, since WordEntry is the first
field of WordEntryIN. But on machines with 8-byte pointers,
WordEntryIN will have a larger alignment spec than WordEntry,
and it's at least theoretically possible that the compiler
could generate code that depends on the larger alignment.
Given the lack of field reports, this may be just a hypothetical bug
that upsets nothing except sanitizer tools. Or it may be real on
certain hardware but nobody's tried to use tsvectorrecv() on such
hardware. In any case we should fix it, and the fix is trivial:
just change compareentry() so that it works on WordEntry without any
mention of WordEntryIN. We can also get rid of the quite-useless
intermediate function WordEntryCMP.
Bug: #18875
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18875-07a29c49c825a608@postgresql.org
Backpatch-through: 13
In the previous commit, HeapBitmapScan's skip_fetch optimization was removed
due to being broken in ways that are not easily fixable. Add a test that verifies we
don't re-introduce this bug if somebody tries to re-add the feature.
Only add the test to master for now, as it's possible it's not entirely
stable. That seems sufficient, since we're not going to re-introduce the feature
on the backbranches. I did verify that the test passes on all branches. If the
test turns out to be unproblematic, we can backpatch it later, should we feel
a need to do so.
Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
The optimization does not take the removal of TIDs by a concurrent vacuum into
account. The concurrent vacuum can remove dead TIDs and make pages ALL_VISIBLE
while those dead TIDs are referenced in the bitmap. This can lead to a
skip_fetch scan returning too many tuples.
It likely would be possible to implement this optimization safely, but we
don't have the necessary infrastructure in place. Nor is it clear that it's
worth building that infrastructure, given how limited the skip_fetch
optimization is.
In the backbranches we just disable the optimization by always passing
need_tuples=true to table_beginscan_bm(). We can't perform API/ABI changes in
the backbranches and we want to make the change as minimal as possible.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reported-By: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com
Backpatch-through: 13
In the historical implementation of SQL functions (if they don't get
inlined), we built plans for all the contained queries at first call
within an outer query, and then re-used those plans for the duration
of the outer query, and then forgot everything. This was not ideal,
not least because the plans could not be customized to specific values
of the function's parameters. Our plancache infrastructure seems
mature enough to be used here. That will solve both the problem with
not being able to build custom plans and the problem with not being
able to share work across successive outer queries.
Aside from those performance concerns, this change fixes a
longstanding bugaboo with SQL functions: you could not write DDL that
would affect later statements in the same function. That's mostly
still true with new-style SQL functions, since the results of parse
analysis are baked into the stored query trees (and protected by
dependency records). But for old-style SQL functions, it will now
work much as it does with PL/pgSQL functions, because we delay parse
analysis and planning of each query until we're ready to run it.
Some edge cases that require replanning are now handled better too;
see for example the new rowsecurity test, where we now detect an RLS
context change that was previously missed.
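To illustrate the DDL point above, an old-style SQL function along these
lines (hypothetical names; check_function_bodies is turned off so the
body is not parse-analyzed at creation time) now works, because each
statement is planned only when it is about to run:
    SET check_function_bodies = off;
    CREATE FUNCTION make_and_count() RETURNS bigint
    LANGUAGE sql AS $$
      CREATE TABLE work_items (id int);
      INSERT INTO work_items VALUES (1), (2);
      SELECT count(*) FROM work_items;
    $$;
    SELECT make_and_count();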
One other edge-case change that might be worthy of a release note
is that we now insist that a SQL function's result be generated
by the physically-last query within it. Previously, if the last
original query was deleted by a DO INSTEAD NOTHING rule, we'd be
willing to take the result from the preceding query instead.
This behavior was undocumented except in source-code comments,
and it seems hard to believe that anyone's relying on it.
Along the way to this feature, we needed a few infrastructure changes:
* The plancache can now take either a raw parse tree or an
analyzed-but-not-rewritten Query as the starting point for a
CachedPlanSource. If given a Query, it is the caller's responsibility
to ensure that nothing will happen to invalidate that form of the query.
We use this for new-style SQL functions, where what's in pg_proc is
serialized Query(s) and we trust the dependency mechanism to disallow
DDL that would break those.
* The plancache now offers a way to invoke a post-rewrite callback
to examine/modify the rewritten parse tree when it is rebuilding
the parse trees after a cache invalidation. We need this because
SQL functions sometimes adjust the parse tree to make its output
exactly match the declared result type; if the plan gets rebuilt,
that has to be re-done.
* There is a new backend module utils/cache/funccache.c that
abstracts the idea of caching data about a specific function
usage (a particular function and set of input data types).
The code in it is moved almost verbatim from PL/pgSQL, which
has done that for a long time. We use that logic now for
SQL-language functions too, and maybe other PLs will have use
for it in the future.
Author: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/8216639.NyiUUSuA9g@aivenlaptop
For GiST, having a sortsupport function allows building the index
using the "sorted build" method, which is much faster.
For b-tree, the sortsupport routine doesn't give any new
functionality, but speeds up sorting a tiny bit. The difference is not
very significant, about 2% in cursory testing on my laptop, because
the range type comparison function has quite a lot of overhead from
detoasting. In any case, since we have the function for GiST anyway,
we might as well register it for the btree opfamily too.
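As a hypothetical example, an index such as the following can now be
built with the faster sorted method:
    CREATE TABLE reservations (during tsrange);
    CREATE INDEX reservations_during_idx
        ON reservations USING gist (during);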
Author: Bernd Helmle <mailings@oopsware.de>
Discussion: https://www.postgresql.org/message-id/64d324ce2a6d535d3f0f3baeeea7b25beff82ce4.camel@oopsware.de
Various places allocated shared memory by first allocating a small chunk
using ShmemInitStruct(), followed by ShmemAlloc() calls to allocate more
memory. Unfortunately, ShmemAlloc() does not update ShmemIndex, so this
affected pg_shmem_allocations - it showed only the initial chunk.
This commit modifies the following allocations to allocate everything
as a single chunk and then split it internally.
- PredXactList
- RWConflictPool
- PGPROC structures
- Fast-Path Lock Array
The fast-path lock array is allocated separately, not as a part of the
PGPROC structures allocation.
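With this change, these structures show up with their full size in the
view, e.g. via a query like:
    SELECT name, off, size, allocated_size
    FROM pg_shmem_allocations
    ORDER BY size DESC;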
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
pg_shmem_allocations tracks the memory allocated by ShmemInitStruct(),
but for shared hash tables that covered only the header and hash
directory. The remaining parts (segments and buckets) were allocated
later using ShmemAlloc(), which does not update the shmem accounting.
Thus, these allocations were not shown in pg_shmem_allocations.
This commit improves the situation by allocating all the hash table
parts at once, using a single ShmemInitStruct() call. This way the
ShmemIndex entries (and thus pg_shmem_allocations) better reflect the
proper size of the hash table.
This affects allocations for private (non-shared) hash tables too, as
the hash_create() code is shared. For non-shared tables, however, this
makes no practical difference.
This changes the alignment a bit. ShmemAlloc() aligns the chunks using
CACHELINEALIGN(), which means some parts (header, directory, segments)
were aligned this way. Allocating all parts as a single chunk removes
this (implicit) alignment. We considered adding explicit alignment, but
decided not to - the alignment seems to be merely a coincidence of using
the ShmemAlloc() API, not a necessity.
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2L28vHzRankszhqz7deXURxKncxfirnuW68zD7+hVAqaS5GQ@mail.gmail.com
Without this, an additional change to the same pg_attribute row
within the same command will fail. This is possible at least with
ALTER TABLE ADD COLUMN on a multiple-inheritance-pathway structure.
(Another potential hazard is that immediately-following operations
might not see the missingval.)
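A hypothetical sketch of such a structure, where the same pg_attribute
row of the grandchild is reached twice within one command:
    CREATE TABLE gp (a int);
    CREATE TABLE p1 () INHERITS (gp);
    CREATE TABLE p2 () INHERITS (gp);
    CREATE TABLE c  () INHERITS (p1, p2);
    -- the new column reaches c via both p1 and p2
    ALTER TABLE gp ADD COLUMN b int DEFAULT 0;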
Introduced by 95f650674, which split the former coding that
used a single pg_attribute update to change both atthasdef and
atthasmissing/attmissingval into two updates, but missed that
this should entail two CommandCounterIncrements as well. Like
that fix, back-patch through v13.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/025a3ffa-5eff-4a88-97fb-8f583b015965@gmail.com
Backpatch-through: 13
Currently, the cancel request key is a 32-bit token, which isn't very
much entropy. If you want to cancel another session's query, you can
brute-force it. In most environments, an unauthorized cancellation of
a query isn't very serious, but it nevertheless would be nice to have
more protection from it. Hence make the key longer, to make it harder
to guess.
The longer cancellation keys are generated when using the new protocol
version 3.2. For connections using version 3.0, short 4-byte keys are
still used.
The new longer key length is not hardcoded in the protocol anymore; the
client is expected to deal with variable-length keys of up to 256 bytes.
This flexibility allows e.g. a connection pooler to add more information
to the cancel key, which might be useful for finding the connection.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi
All supported versions of the PostgreSQL server send the
NegotiateProtocolVersion message when an unsupported minor protocol
version is requested by a client. But many other applications that
implement the PostgreSQL protocol (connection poolers, or other
databases) do not, and the same is true for PostgreSQL server versions
older than 9.3. Connecting to such other applications thus fails if a
client requests a protocol version different than 3.0.
This patch adds a max_protocol_version connection option to libpq that
specifies the protocol version that libpq should request from the
server. Currently only 3.0 is supported, but that will change in a
future commit that bumps the protocol version. Even after that version
bump the default will likely stay 3.0 for the time being. Once more of
the ecosystem supports the NegotiateProtocolVersion message we might
want to change the default to the latest minor version.
This also adds the similar min_protocol_version connection option, to
allow the client to specify that connecting should fail if a lower
protocol version is attempted by the server. This can be used to
ensure that certain protocol features are used, which can be
particularly useful if those features impact security.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
Previously libpq would always error out if the server sent a
NegotiateProtocolVersion message. This was fine because libpq only
supported a single protocol version and did not support any protocol
parameters. But in the upcoming commits, we will introduce a new
protocol version, and the NegotiateProtocolVersion message will start
to actually be used.
This patch modifies the client side checks to allow a range of
supported protocol versions, instead of only allowing the exact
version that was requested. Currently this "range" only contains the
3.0 version, but in a future commit we'll change this.
Also clarify the error messages, making them suitable for the world
where libpq will support multiple protocol versions and protocol
extensions.
Note that until the later commits that introduce a new protocol version,
this change does not have any behavioural effect, because libpq will
only request version 3.0 and will never send protocol parameters, and
therefore will never receive a NegotiateProtocolVersion message from
the server.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Discussion: https://www.postgresql.org/message-id/CAGECzQTfc_O%2BHXqAo5_-xG4r3EFVsTefUeQzSvhEyyLDba-O9w@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAGECzQRbAGqJnnJJxTdKewTsNOovUt4bsx3NFfofz3m2j-t7tA@mail.gmail.com
The changes made in commit d2b4b4c2259 contained incorrect comments:
They said that certain forward declarations were necessary to "avoid
including pathnodes.h here", but the file is itself pathnodes.h! So
change the comment to just say it's a forward declaration in one case,
and in the other case we don't need the declaration at all because it
already appeared earlier in the file.
timingsafe_bcmp() should be used instead of memcmp() or a naive
for-loop, when comparing passwords or secret tokens, to avoid leaking
information about the secret token by timing. This commit just
introduces the function but does not change any existing code to use
it yet.
Co-authored-by: Jelte Fennema-Nio <github-tech@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/7b86da3b-9356-4e50-aa1b-56570825e234@iki.fi
The test added in 93bc3d75d8e failed in a build with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined. The test intentionally forgets to exit
batchmode - normally that would trigger an error at the end of the
transaction, which the test verifies. However, with RELCACHE_FORCE_RELEASE
and CATCACHE_FORCE_RELEASE defined, we get other code (output function lookup)
entering batchmode and erroring out because batchmode isn't allowed to be
entered recursively.
Fix that by changing the queries in question to not output any rows. That's
not exactly pretty, but seems to avoid the problem reliably.
Eventually we might want to make RELCACHE_FORCE_RELEASE and
CATCACHE_FORCE_RELEASE GUCs, so we can disable them where necessary - this
isn't the first test having difficulty with those debug options. But that's
for later.
Per buildfarm member prion.
Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
The test added in 93bc3d75d8e sometimes fails on Windows, due to warnings like
WARNING: some useless files may be left behind in old database directory "base/16514"
The reason for that is createdb_failure_callback() does not ensure that there
are no open file descriptors for files in the partially created,
to-be-dropped, database. We do take care in dropdb(), but that involves
waiting for checkpoints and a ProcSignalBarrier, which we probably don't want
to do in an error callback. This should probably be fixed one day, but for
now 001_aio.pl needs to cope.
Per buildfarm animals fairywren and drongo.
Discussion: https://postgr.es/m/uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
This expands the NOT ENFORCED constraint flag, previously only
supported for CHECK constraints (commit ca87c415e2f), to foreign key
constraints.
Normally, when a foreign key constraint is created on a table, action
and check triggers are added to maintain data integrity. With this
patch, if a constraint is marked as NOT ENFORCED, integrity checks are
no longer required, making these triggers unnecessary. Consequently,
when creating a NOT ENFORCED foreign key constraint, triggers will not
be created, and the constraint will be marked as NOT VALID.
Similarly, if an existing foreign key constraint is changed to NOT
ENFORCED, the associated triggers will be dropped, and the constraint
will also be marked as NOT VALID. Conversely, if a NOT ENFORCED
foreign key constraint is changed to ENFORCED, the necessary triggers
will be created, and the constraint will be changed to VALID by
performing the necessary validation.
Since not-enforced foreign key constraints have no triggers, the
shortcut used for example in psql and pg_dump to skip looking for
foreign keys if the relation is known not to have triggers no longer
applies. (It already didn't work for partitioned tables.)
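A sketch of the new behavior (the ALTER CONSTRAINT ... ENFORCED form is
shown on the assumption that it mirrors the syntax used for CHECK
constraints):
    CREATE TABLE customers (id int PRIMARY KEY);
    CREATE TABLE orders (customer_id int);
    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id) NOT ENFORCED;
    -- no triggers are created; the constraint is marked NOT VALID
    ALTER TABLE orders ALTER CONSTRAINT orders_customer_fk ENFORCED;
    -- triggers are created and existing rows are validated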
Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Isaac Morland <isaac.morland@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Tested-by: Triveni N <triveni.n@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b962c5AcYW9KUt_R_ER5qs3fUGbe4az-SP-vuwPS-w-AGA@mail.gmail.com
If io_method is set in TEMP_CONFIG the test added in 93bc3d75d8e fails,
because it assumes the io_method specified at initdb is actually used.
Fix that by appending the io_method again, after initdb (and thus after
TEMP_CONFIG has been added by Cluster.pm).
Per buildfarm animal bumblebee.
Discussion: https://postgr.es/m/zh5u22wbpcyfw2ddl3lsvmsxf4yvsrvgxqwwmfjddc4c2khsgp@gfysyjsaelr5
Allow multiple backends to initialize WAL buffers concurrently. This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.
The new algorithm works as follows:
* reserve a page for initialization using XLogCtl->InitializeReserved,
* ensure the page is written out,
* once the page is initialized, try to advance XLogCtl->InitializedUpTo and
signal waiters using the XLogCtl->InitializedUpToCondVar condition
variable,
* repeat the previous steps until we reserve initialization up to the target
WAL position,
* wait until concurrent initialization finishes using the
XLogCtl->InitializedUpToCondVar condition variable.
Now, multiple backends can, in parallel, concurrently reserve pages,
initialize them, and advance XLogCtl->InitializedUpTo to point to the latest
initialized page.
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Tested-by: Michael Paquier <michael@paquier.xyz>
Even after reaching the minimum recovery point, if there are long-lived
write transactions with more than 64 subtransactions on the primary, the
recovery snapshot may not yet be ready for hot standby, delaying read-only
connections on the standby. Previously, when read-only connections were
not accepted due to this condition, the following error message was logged:
FATAL: the database system is not yet accepting connections
DETAIL: Consistent recovery state has not been yet reached.
This DETAIL message was misleading because the following message was
already logged in this case:
LOG: consistent recovery state reached
This contradiction, i.e., indicating that the recovery state was consistent
while also stating it wasn’t, caused confusion.
This commit improves the error message to better reflect the actual state:
FATAL: the database system is not yet accepting connections
DETAIL: Recovery snapshot is not yet ready for hot standby.
HINT: To enable hot standby, close write transactions with more than 64 subtransactions on the primary server.
To implement this, the commit introduces a new postmaster signal,
PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches
a consistent recovery state, it sends this signal to the postmaster,
allowing it to correctly recognize that state.
Since this is not a clear bug, the change is applied only to the master
branch and is not back-patched.
Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Discussion: https://postgr.es/m/02db8cd8e1f527a8b999b94a4bee3165@oss.nttdata.com
Push an ErrorContextCallback adding additional detail about the process
performing the I/O and the owner of the I/O when those are not the same.
For io_method worker, this adds context specifying which process owns
the I/O that the I/O worker is processing.
For io_method io_uring, this adds context only when a backend is
*completing* I/O for another backend. It specifies the pid of the owning
process.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/rdml3fpukrqnas7qc5uimtl2fyytrnu6ymc2vjf2zuflbsjuul%40hyizyjsexwmm
50e17ad28 (v14) and 29f45e299 (v15) made it so the planner could identify
IN and NOT IN clauses which have Const lists as right-hand arguments and,
when an appropriate hash function is available for the data types, mark
the ScalarArrayOpExpr as hashable so that the executor could execute it
more optimally by building and probing a hash table during expression
evaluation.
These commits both worked correctly when there was only a single
ScalarArrayOpExpr in the given expression being processed by the
planner, but when there were multiple, only the first was checked and any
subsequent ones were not identified, which resulted in less optimal
expression evaluation during query execution for all but the first found
ScalarArrayOpExpr.
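For instance, in a hypothetical query such as the one below, both IN
lists (not just the first) can now be marked hashable, assuming the
element types are hashable and the lists are long enough for hashing to
be chosen:
    SELECT *
    FROM orders
    WHERE status IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      AND region IN (11, 12, 13, 14, 15, 16, 17, 18, 19, 20);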
Backpatch to 14, where 50e17ad28 was introduced.
Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/29a76f51-97b0-4c07-87b7-ec8e3b5345c9@gmail.com
Backpatch-through: 14
Create a function that will sort the elements of an array
according to the element type's sort order. If the array
has more than one dimension, the sub-arrays of the first
dimension are sorted per normal array-comparison rules,
leaving their contents alone.
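A sketch of the intended behavior, assuming the function is exposed
under the name array_sort() (the name is not stated above):
    SELECT array_sort(ARRAY[3, 1, 2]);
    -- {1,2,3}
    SELECT array_sort(ARRAY[[3, 4], [1, 2]]);
    -- {{1,2},{3,4}}: the sub-arrays of the first dimension are
    -- reordered, their contents are left alone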
In support of this, add pg_type.typarray to the set of fields
cached by the typcache.
Author: Junwang Zhao <zhjwpku@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAEG8a3J41a4dpw_-F94fF-JPRXYxw-GfsgoGotKcjs9LVfEEvw@mail.gmail.com
As of 15.4, macOS has strchrnul(), but access to it is blocked behind
a check for MACOSX_DEPLOYMENT_TARGET >= 15.4. But our does-it-link
configure check finds it, so we try to use it, and fail with the
present default deployment target (namely 15.0). This accounts for
today's buildfarm failures on indri and sifaka.
This is the identical problem that we faced some years ago when Apple
introduced preadv and pwritev in the same way. We solved that in
commit f014b1b9b by using AC_CHECK_DECLS instead of AC_CHECK_FUNCS
to check the functions' availability. So do the same now for
strchrnul(). Interestingly, we already had a workaround for
"the link check doesn't agree with <string.h>" cases with glibc,
which we no longer need since only the header declaration is being
checked.
Testing this revealed that the meson version of this check has never
worked, because it failed to use "-Werror=unguarded-availability-new".
(Apparently nobody's tried to build with meson on macOS versions that
lack preadv/pwritev as standard.) Adjust that while at it. Also,
we had never put support for "-Werror=unguarded-availability-new"
into v13, but we need that now.
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/385134.1743523038@sss.pgh.pa.us
Backpatch-through: 13
mdreadv() has a codepath to zero out buffers when a read returns zero bytes,
guarded by a check for zero_damaged_pages || InRecovery.
The InRecovery codepath to zero out buffers in mdreadv() appears to be
unreachable. The only known paths to reach mdreadv()/mdstartreadv() in
recovery are XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf(), each
of which takes care to extend the relation if necessary. This looks to either
have been the case for a long time, or the code was never reachable.
The zero_damaged_pages path is incomplete, as missing segments are not
created.
Putting blocks into the buffer-pool that do not exist on disk is rather
problematic, as such blocks will, at least initially, not be found by scans
that rely on smgrnblocks(), as they are beyond EOF. It also can cause weird
problems with relation extension, as relation extension does not expect blocks
beyond EOF to exist.
Therefore we would like to remove that path.
mdstartreadv(), which I added in e5fe570b51c, does not implement this zeroing
logic. I had started a discussion about that a while ago (linked below), but
forgot to act on the conclusion of the discussion, namely to disable the
in-memory-zeroing behavior.
We could certainly implement equivalent zeroing logic in mdstartreadv(), but
it would have to be more complicated due to potential differences in the
zero_damaged_pages setting between the definer and completor of the IO. Given
that we want to remove the logic, that does not seem worth doing.
For now, put an Assert(false) and comments documenting this choice into
mdreadv(), and add comments documenting the deprecation of the path in
mdreadv() and its non-implementation in mdstartreadv(). If we, during testing,
discover that we do need the path, we can implement it at that time.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250330024513.ac.nmisch@google.com
Discussion: https://postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd
To make the tests possible, a few functions from bufmgr.c/localbuf.c had to be
exported, via buf_internals.h.
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Add a test to pg_upgrade's test suite that verifies that
dump-restore-dump of the regression database produces output equivalent
to dumping it directly. This was already being tested by running
pg_upgrade itself, but non-binary-upgrade mode was not being covered.
The regression database has accrued, over time, a sufficient collection
of interesting objects to ensure good coverage, but there hasn't been a
concerted effort to be completely exhaustive, so it is likely still
possible to have more.
This'd belong more naturally in the pg_dump test suite, but we chose to
put it in src/bin/pg_upgrade/t/002_pg_upgrade.pl because it needs a run
of the regression tests, which is already done here, so this has less
total test runtime impact. Also, experiments have shown that using
parallel dump/restore is slightly faster, so we use --format=directory -j2.
This test has already reported pg_dump bugs, as fixed in fd41ba93e463,
74563f6b9021, d611f8b1587b, 4694aedf63bf.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/CAExHW5uF5V=Cjecx3_Z=7xfh4rg2Wf61PT+hfquzjBqouRzQJQ@mail.gmail.com
Due to splitting the block ID into two 16-bit integers, BlockIdSet()
is more expensive than one might think. Doing it once per returned
tuple shows up as a small but reliably reproducible cost. It's simple
enough to set the block number just once per block in pagemode, so do
so.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
On Red Hat 9 systems (or similar), the packaged gcc targets x86-64-v2,
but clang does not. This has caused build failures in the wake of
commit e2809e3a1 when building --with-llvm.
The most expedient fix is to use the same function attributes for
the inlined function as we do for the global function.
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> (plus members skimmer and bumblebee)
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Todd Cook <cookt@blackduck.com>
Discussion: https://postgr.es/m/CANWCAZZSxs3a1YRKehkgk2OHKbrVn+xZ+AWW8Co2R_f70NqqmA@mail.gmail.com
95d6e9af0 added code to display the tuplestore storage type for
WindowAgg nodes and added a test to ensure the "Disk" storage method was
working correctly by setting work_mem to 64 and running a test which
caused the WindowAgg to go to disk. Seemingly, the number of rows
chosen there wasn't quite enough for that to happen on 32-bit x86.
Fix this by increasing the number of rows slightly.
I suspect the buildfarm didn't catch this as MEMORY_CONTEXT_CHECKING
builds will use a bit more memory for MemoryChunks to store the
requested_size and also because of the additional space to store the
chunk's sentinel byte.
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z-q3ZAM4OhE-4UiI@msg.df7cb.de
This code was passing literal strings to psqlscan_emit,
which is quite contrary to that function's specification:
"If you pass it something that is not part of the yytext
string, you are making a mistake". It accidentally worked
anyway, even in non-safe_encoding mode. psqlscan_emit
would compute a garbage "reference" pointer, but would
never dereference that since the passed string is all-ASCII.
So there's no live bug today, but that is a happenstance
outcome of psqlscan_emit's current implementation.
Let's make psqlscan_test_variable do what it's supposed to,
namely append directly to the output buffer. This is just
future-proofing against possible changes in psqlscan_emit,
so I don't feel a need to back-patch.
pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c
suitable for inlining where possible.
This may slightly reduce contention for the WAL insertion lock,
but that hasn't been tested. The motivation for this change is to avoid
a regression from a future commit that will use a function pointer for
non-constant input in all x86 builds.
While it's technically possible to make a similar change for Arm and
LoongArch, there are some questions about how inlining should work
since those platforms prefer stricter alignment. There are also no
immediate plans to add additional implementations for them.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com