1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-27 12:41:57 +03:00
Commit Graph

61136 Commits

Author SHA1 Message Date
3dbdf86c63 Fix failing regression test on x86-32 machines
95d6e9af0 added code to display the tuplestore storage type for
WindowAgg nodes and added a test to ensure the "Disk" storage method was
working correctly by setting work_mem to 64 and running a test which
caused the WindowAgg to go to disk.  Seemingly, the number of rows
chosen there wasn't quite enough for that to happen in x86 32-bit.

Fix this by increasing the number of rows slightly.

I suspect the buildfarm didn't catch this as MEMORY_CONTEXT_CHECKING
builds will use a bit more memory for MemoryChunks to store the
requested_size and also because of the additional space to store the
chunk's sentinel byte.

Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/Z-q3ZAM4OhE-4UiI@msg.df7cb.de
2025-04-01 10:52:25 +13:00
2fd3e2fa5c Fix accidentally-harmless thinko in psqlscan_test_variable().
This code was passing literal strings to psqlscan_emit,
which is quite contrary to that function's specification:
"If you pass it something that is not part of the yytext
string, you are making a mistake".  It accidentally worked
anyway, even in non-safe_encoding mode.  psqlscan_emit
would compute a garbage "reference" pointer, but would
never dereference that since the passed string is all-ASCII.
So there's no live bug today, but that is a happenstance
outcome of psqlscan_emit's current implementation.

Let's make psqlscan_test_variable do what it's supposed to,
namely append directly to the output buffer.  This is just
future-proofing against possible changes in psqlscan_emit,
so I don't feel a need to back-patch.
2025-03-31 12:16:32 -04:00
0fcf02ad45 doc: Mention clock synchronization recommendation for hot_standby_feedback
hot_standby_feedback mechanics assume that clocks are synchronized,
but it was not clear from documentation.

Author: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CAKZiRmwBcALLrDgCyEhHP1enUxtPMjyNM_d1A2Lng3_6Rf4Qfw%40mail.gmail.com
2025-03-31 16:54:50 +02:00
e2809e3a10 Inline CRC computation for small fixed-length input on x86
pg_crc32c.h now has a simplified copy of the loop in pg_crc32c_sse42.c
suitable for inlining where possible.

This may slightly reduce contention for the WAL insertion lock,
but that hasn't been tested. The motivation for this change is avoid
regressing for a future commit that will use a function pointer for
non-constant input in all x86 builds.

While it's technically possible to make a similar change for Arm and
LoongArch, there are some questions about how inlining should work
since those platforms prefer stricter alignment. There are also no
immediate plans to add additional implementations for them.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Raghuveer Devulapalli <raghuveer.devulapalli@intel.com>
Discussion: https://postgr.es/m/CANWCAZZEiTzhZcuwTiJ2=opiNpAUn1vuDRu1N02z61AthwRZLA@mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZYRhLHArpyfV4uRK-Rw9N5oV5HMkkKtBehcuTjNOMwCZg@mail.gmail.com
2025-03-31 13:17:21 +07:00
4694aedf63 Add relallfrozen to pg_dump statistics.
Author: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/CADkLM=desCuf3dVHasADvdUVRmb-5gO0mhMO5u9nzgv6i7U86Q@mail.gmail.com
2025-03-30 22:14:06 -07:00
2a5e709e72 Enable IO concurrency on all systems
Previously effective_io_concurrency and maintenance_io_concurrency could not
be set above 0 on machines without fadvise support. AIO enables IO concurrency
without such support, via io_method=worker.

Currently only subsystems using the read stream API will take advantage of
this. Other users of maintenance_io_concurrency (like recovery prefetching)
which leverage OS advice directly will not benefit from this change. In those
cases, maintenance_io_concurrency will have no effect on I/O behavior.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CAAKRu_atGgZePo=_g6T3cNtfMf0QxpvoUh5OUqa_cnPdhLd=gw@mail.gmail.com
2025-03-30 19:16:47 -04:00
ae3df4b341 read_stream: Introduce and use optional batchmode support
Submitting IO in larger batches can be more efficient than doing so
one-by-one, particularly for many small reads. It does, however, require
the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
batching (c.f. pgaio_enter_batchmode()). Basically, the callback may not:
a) block without first calling pgaio_submit_staged(), unless a
   to-be-waited-on lock cannot be part of a deadlock, e.g. because it is
   never held while waiting for IO.

b) directly or indirectly start another batch pgaio_enter_batchmode()

As this requires care and is nontrivial in some cases, batching is only
used with explicit opt-in.

This patch adds an explicit flag (READ_STREAM_USE_BATCHING) to read_stream and
uses it where appropriate.

There are two cases where batching would likely be beneficial, but where we
aren't using it yet:

1) bitmap heap scans, because the callback reads the VM

   This should soon be solved, because we are planning to remove the use of
   the VM, due to that not being sound.

2) The first phase of heap vacuum

   This could be made to support batchmode, but would require some care.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-30 18:36:41 -04:00
f4d0730bbc aio: Basic read_stream adjustments for real AIO
Adapt the read stream logic for real AIO:
- If AIO is enabled, we shouldn't issue advice, but if it isn't, we should
  continue issuing advice
- AIO benefits from reading ahead with direct IO
- If effective_io_concurrency=0, pass READ_BUFFERS_SYNCHRONOUSLY to
  StartReadBuffers() to ensure synchronous IO execution

There are further improvements we should consider:

- While in read_stream_look_ahead(), we can use AIO batch submission mode for
  increased efficiency. That however requires care to avoid deadlocks and thus
  done separately.
- It can be beneficial to defer starting new IOs until we can issue multiple
  IOs at once. That however requires non-trivial heuristics to decide when to
  do so.

Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
2025-03-30 18:26:44 -04:00
b27f8637ea docs: Reframe track_io_timing related docs as wait time
With AIO it does not make sense anymore to track the time for each individual
IO, as multiple IOs can be in-flight at the same time. Instead we now track
the time spent *waiting* for IOs.

This should be reflected in the docs. While, so far, we only do a subset of
reads, and no other operations, via AIO, describing the GUC and view columns
as measuring IO waits is accurate for synchronous and asynchronous IO.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/5dzyoduxlvfg55oqtjyjehez5uoq6hnwgzor4kkybkfdgkj7ag@rbi4gsmzaczk
2025-03-30 18:04:40 -04:00
12ce89fd07 bufmgr: Use AIO in StartReadBuffers()
This finally introduces the first actual use of AIO. StartReadBuffers() now
uses the AIO routines to issue IO.

As the implementation of StartReadBuffers() is also used by the functions for
reading individual blocks (StartReadBuffer() and through that
ReadBufferExtended()) this means all buffered read IO passes through the AIO
paths.  However, as those are synchronous reads, actually performing the IO
asynchronously would be rarely beneficial. Instead such IOs are flagged to
always be executed synchronously. This way we don't have to duplicate a fair
bit of code.

When io_method=sync is used, the IO patterns generated after this change are
the same as before, i.e. actual reads are only issued in WaitReadBuffers() and
StartReadBuffers() may issue prefetch requests.  This allows to bypass most of
the actual asynchronicity, which is important to make a change as big as this
less risky.

One thing worth calling out is that, if IO is actually executed
asynchronously, the precise meaning of what track_io_timing is measuring has
changed. Previously it tracked the time for each IO, but that does not make
sense when multiple IOs are executed concurrently. Now it only measures the
time actually spent waiting for IO. A subsequent commit will adjust the docs
for this.

While AIO is now actually used, the logic in read_stream.c will often prevent
using sufficiently many concurrent IOs. That will be addressed in the next
commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-30 18:02:23 -04:00
047cba7fa0 bufmgr: Implement AIO read support
This commit implements the infrastructure to perform asynchronous reads into
the buffer pool.

To do so, it:

- Adds readv AIO callbacks for shared and local buffers

  It may be worth calling out that shared buffer completions may be run in a
  different backend than where the IO started.

- Adds an AIO wait reference to BufferDesc, to allow backends to wait for
  in-progress asynchronous IOs

- Adapts StartBufferIO(), WaitIO(), TerminateBufferIO(), and their localbuf.c
  equivalents, to be able to deal with AIO

- Moves the code to handle BM_PIN_COUNT_WAITER into a helper function, as it
  now also needs to be called on IO completion

As of this commit, nothing issues AIO on shared/local buffers. A future commit
will update StartReadBuffers() to do so.

Buffer reads executed through this infrastructure will report invalid page /
checksum errors / warnings differently than before:

In the error case the error message will cover all the blocks that were
included in the read, rather than just the reporting the first invalid
block. If more than one block is invalid, the error will include information
about the range of the read, the first invalid block and the number of invalid
pages, with a HINT towards the server log for per-block details.

For the warning case (i.e. zero_damaged_buffers) we would previously emit one
warning message for each buffer in a multi-block read. Now there is only a
single warning message for the entire read, again referring to the server log
for more details in case of multiple checksum failures within a single larger
read.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-30 17:28:03 -04:00
ef64fe26ba aio: Add WARNING result status
If an IO succeeds, but issues a warning, e.g. due to a page verification
failure with zero_damaged_pages, we want to issue that warning in the context
of the issuer of the IO, not the process that executes the completion (always
the case for worker).

It's already possible for a completion callback to report a custom error
message, we just didn't have a result status that allowed a user of AIO to
know that a warning should be emitted even though the IO request succeeded.

All that's needed for that is a dedicated PGAIO_RS_ value.

Previously there were not enough bits in PgAioResult.id for the new
value. Increase. While at that, add defines for the amount of bits and static
asserts to check that the widths are appropriate.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com
2025-03-30 16:27:10 -04:00
d445990adc Let caller of PageIsVerified() control ignore_checksum_failure
For AIO the completion of a read into shared buffers (i.e. verifying the page
including the checksum, updating the BufferDesc to reflect the IO) can happen
in a different backend than the backend that started the IO. As
ignore_checksum_failure can differ between backends, we need to allow the
caller of PageIsVerified() control whether to ignore checksum failures.

The commit leaves a gap in the PIV_* values, as an upcoming commit, which
depends on this commit, will add PIV_LOG_LOG, which better fits just after
PIV_LOG_WARNING.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250329212929.a6.nmisch@google.com
2025-03-30 16:27:10 -04:00
b96d3c3897 pgstat: Allow checksum errors to be reported in critical sections
For AIO we execute completion callbacks in critical sections (to ensure that
AIO can in the future be used for WAL, which in turn requires that we can call
completion callbacks in critical sections, to get the resources for WAL
io). To report checksum errors a backend now has to call
pgstat_prepare_report_checksum_failure(), before entering a critical section,
which guarantees the relevant pgstats entry is in shared memory, the relevant
DSM segment is mapped into the backend's memory and the address is known via a
PgStat_EntryRef.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/wkjj4p2rmkevutkwc6tewoovdqznj6c6nvjmvii4oo5wmbh5sr@retq7d6uqs4j
2025-03-30 16:12:04 -04:00
4244cf6876 Add errhint_internal()
We have errmsg_internal(), errdetail_internal(), but not errhint_internal().

Sometimes it is useful to output a hint with already translated format
string (e.g. because there different messages depending on the condition). For
message/detail we do that with the _internal() variants, but we can't do that
with hint today.  It's possible to work around that that by using something
like
  str = psprintf(translated_format, args);
  ereport(...
          errhint("%s", str);
but that's not exactly pretty and makes it harder to avoid memory leaks.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/ym3dqpa4xcvoeknewcw63x77vnqdosbqcetjinb2zfoh65k55m@m4ozmwhr6lk6
2025-03-30 16:10:51 -04:00
49b82522f1 Remove incidental md5() function use from test
Replace md5() with sha256() in tests introduced in 14ffaece0f, to
allow test to pass in OpenSSL FIPS mode.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3518736.1743307492@sss.pgh.pa.us
2025-03-30 13:22:39 +02:00
d6d8054dc7 localbuf: Track pincount in BufferDesc as well
For AIO on temporary table buffers the AIO subsystem needs to be able to
ensure a pin on a buffer while AIO is going on, even if the IO issuing query
errors out. Tracking the buffer in LocalRefCount does not work, as it would
cause CheckForLocalBufferLeaks() to assert out.

Instead, also track the refcount in BufferDesc.state, not just
LocalRefCount. This also makes local buffers behave a bit more akin to shared
buffers.

Note that we still don't need locking, AIO completion callbacks for local
buffers are executed in the issuing session (i.e. nobody else has access to
the BufferDesc).

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-29 16:36:51 -04:00
08ccd56ac7 aio, bufmgr: Comment fixes/improvements
Some of these comments have been wrong for a while (12f3867f55), some I
recently introduced (da7226993f, 55b454d0e1). This includes an update to a
comment in FlushBuffer(), which will be copied in a future commit.

These changes seem big enough to be worth doing in separate commits.

Suggested-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com
2025-03-29 14:45:42 -04:00
50cb7505b3 aio: Implement support for reads in smgr/md/fd
This implements the following:

1) An smgr AIO target, for AIO on smgr files. This should be usable not just
   for md.c but also other SMGR implementation if we ever get them.
2) readv support in fd.c, which requires a small bit of infrastructure work in
   fd.c
3) smgr.c and md.c support for readv

There still is nothing performing AIO, but as of this commit it would be
possible.

As part of this change FileGetRawDesc() actually ensures that the file is
opened - previously it was basically not usable. It's used to reopen a file in
IO workers.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-29 13:38:35 -04:00
dee8002468 Fix mis-attribution of checksum failure stats to the wrong database
Checksum failure stats could be attributed to the wrong database in two cases:

- when a read of a shared relation encountered a checksum error , it would be
  attributed to the current database, instead of the "database" representing
  shared relations

- when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the
  source database would be attributed to the current database

The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does
not have access to the information about what database a page belongs to.

This fixes the issue by removing PIV_REPORT_STAT and delegating the
responsibility to report stats to the caller, which now can learn about the
number of stats via a new optional argument.

As this changes the signature of PageIsVerifiedExtended() and all callers
should adapt to the new signature, use the occasion to rename the function to
PageIsVerified() and remove the compatibility macro.

We could instead have fixed this by adding information about the database to
the args of PageIsVerified(), but there are soon-to-be-applied patches that
need to separate the stats reporting from the PageIsVerified() call
anyway. Those patches also include testing for the failure paths, something we
inexplicably have not had.

As there is no caller of pgstat_report_checksum_failure() left, remove it.

It'd be possible, but awkward to fix this in the back branches. We considered
doing the work not quite worth it, as mis-attributed stats should still elicit
concern. The emitted error messages do allow to attribute the errors
correctly.

Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az
Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz
2025-03-29 13:38:35 -04:00
68f97aeadb amcheck: Add a GIN index to the CREATE INDEX CONCURRENTLY tests
The existing CREATE INDEX CONCURRENTLY tests checking only B-Tree, but
can be cheaply extended to also check GIN. This helps increasing test
coverage for GIN amcheck, especially related to handling concurrent page
splits and posting list trees.

This already helped to identify several issues during development of the
GIN amcheck support.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
ca738bdc4c amcheck: Add a test with GIN index on JSONB data
Extend the existing test of GIN checks to also include an index on JSONB
data, using the jsonb_path_ops opclass. This is a common enough usage of
GIN that it makes sense to have better test coverage for it.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/BC221A56-977C-418E-A1B8-9EFC881D80C5%40enterprisedb.com
2025-03-29 16:47:44 +01:00
ec4327d106 amcheck: Fix indentation in verify_gin.c
I forgot to reindent the code after a couple last-minute adjustments
just before committing 14ffaece0f.

Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 16:47:44 +01:00
116e851db5 Fix "‘static’ is not at beginning of declaration" warning
b98be8a2a2 used "const static" instead of "static const". We normally use the
latter form.

Discussion: https://postgr.es/m/z4mc2hzecahyq3paupfsouhuupmzmgum45md3k5my6bmo7gvn7@z5j26doqamqy
2025-03-29 10:48:59 -04:00
14ffaece0f amcheck: Add gin_index_check() to verify GIN index
Adds a new function, validating two kinds of invariants on a GIN index:

- parent-child consistency: Paths in a GIN graph have to contain
  consistent keys. Tuples on parent pages consistently include tuples
  from child pages; parent tuples do not require any adjustments.

- balanced-tree / graph: Each internal page has at least one downlink,
  and can reference either only leaf pages or only internal pages.

The GIN verification is based on work by Grigory Kryachko, reworked by
Heikki Linnakangas and with various improvements by Andrey Borodin.
Investigation and fixes for multiple bugs by Kirill Reshke.

Author: Grigory Kryachko <GSKryachko@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:44:29 +01:00
53a2a1564a pgbench: Make set_random_seed() 64-bit everywhere.
Delete an intermediate variable, a redundant cast, a use of long and a
use of long long.  scanf() the seed directly into a uint64, now that we
can do that with SCNu64 from <inttypes.h>.

The previous coding was from pre-C99 times when %lld might not have been
there, so it read into an unsigned long.  Therefore behavior varied
by OS, and --random-seed would accept either 32 or 64 bit seeds.  Now
it's the same everywhere.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 15:24:42 +01:00
d70b17636d amcheck: Move common routines into a separate module
Before performing checks on an index, we need to take some safety
measures that apply to all index AMs. This includes:

* verifying that the index can be checked - Only selected AMs are
supported by amcheck (right now only B-Tree). The index has to be
valid and not a temporary index from another session.

* changing (and then restoring) user's security context

* obtaining proper locks on the index (and table, if needed)

* discarding GUC changes from the index functions

Until now this was implemented in the B-Tree amcheck module, but it's
something every AM will have to do. So relocate the code into a new
module verify_common for reuse.

The shared steps are implemented by amcheck_lock_relation_and_check(),
receiving the AM-specific verification as a callback. Custom parameters
may be supplied using a pointer.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-By: José Villanova <jose.arthur@gmail.com>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Mark Dilger <mark.dilger@enterprisedb.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru
2025-03-29 15:14:49 +01:00
fb9dff7663 Fix grammar in GIN README
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/CALdSSPgu9uAhVYojQ0yjG%3Dq5MaqmiSLUJPhz%2B-u7cA6K6Mc9UA%40mail.gmail.com
2025-03-29 15:14:25 +01:00
8b6a0e2392 Fix MERGE with DO NOTHING actions into a partitioned table.
ExecInitPartitionInfo() duplicates much of the logic in
ExecInitMerge(), except that it failed to handle DO NOTHING
actions. This would cause an "unknown action in MERGE WHEN clause"
error if a MERGE with any DO NOTHING actions attempted to insert into
a partition not already initialised by ExecInitModifyTable().

Bug: #18871
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Gurjeet Singh <gurjeet@singh.im>
Discussion: https://postgr.es/m/18871-b44e3c96de3bd2e8%40postgresql.org
Backpatch-through: 15
2025-03-29 09:58:40 +00:00
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
a0a4601765 Matview statistics depend on matview data.
REFRESH MATERIALIZED VIEW replaces the storage, which resets
statistics, so statistics must be restored afterward.

If both statistics and data are being dumped for a materialized view,
add a dependency from the former to the latter. Defer the statistics
to SECTION_POST_DATA, and use RESTORE_PASS_POST_ACL.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5s47kmubpbbRJzSM-Zfe0Tj2O3GBagB7YAyE8rQ-V24Uw@mail.gmail.com
2025-03-28 16:12:55 -07:00
775a06d44c Make group_similar_or_args() reorder clause list as little as possible
Currently, group_similar_or_args() permutes original positions of clauses
independently on whether it manages to find any groups of similar clauses.
While we are not providing any strict warranties on saving the original order
of OR-clauses, it is preferred that the original order be modified as little
as possible.

This commit changes the reordering algorithm of group_similar_or_args() in
the following way.  We reorder each group of similar clauses so that the
first item of the group stays in place, but all the other items are moved
after it.  So, if there are no similar clauses, the order of clauses stays
the same.  When there are some groups, only required reordering happens while
the rest of the clauses remain in their places.

Reported-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/3ac7c436-81e1-4191-9caf-b0dd70b51511%40gmail.com
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
2025-03-28 23:37:49 +02:00
519338ace4 Optimize popcount functions with ARM SVE intrinsics.
This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions.  Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.

Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com>
Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com
2025-03-28 16:20:20 -05:00
3c8e463b0d Revert "Tidy up locale thread safety in ECPG library."
This reverts commit 8e993bff53.

It causes various build failures on the buildfarm, to be investigated.

Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-28 21:27:37 +01:00
6be53c2767 Optimize popcount functions with ARM Neon intrinsics.
This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked().  As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks.  Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.

Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28 14:49:35 -05:00
51a0382e8d Fix crash if LockErrorCleanup() is called twice
The refactoring in commit 3c0fd64fec removed the clearing of
awaitedLock from LockErrorCleanup(). It's still needed, otherwise
LockErrorCleanup() during abort processing will try to update the
LOCALLOCK struct even after the lock has already been released. Put it
back.

Reported-by: Richard Guo <guofenglinux@gmail.com>
Reported-by: Robins Tharakan <tharakan@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4_dNX1SzBmvFdoY-LxJh_4W_BjtVd5i008ihfU-wFF=eg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/18832-38e5575b1bbd7277@postgresql.org
Discussion: https://www.postgresql.org/message-id/e11a30e5-c0d8-491d-8546-3a1b50c10ad4@gmail.com
2025-03-28 20:19:17 +02:00
9ac6f7e7ce Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64.
This macro protects x86_64-specific code, and a subsequent commit
will introduce AArch64-specific versions of that code.  To prevent
confusion, let's rename it to clearly indicate that it's for
x86_64.  We should likely move this code to its own file (perhaps
merging it with the AVX-512 popcount code), but that is left as a
future exercise.

Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
2025-03-28 12:27:47 -05:00
a5419bc72e Fix timestamp overflow in UUIDv7 implementation.
The uuidv7_interval() function previously converted a shifted
microsecond-precision timestamp (64-bit integer) to another 64-bit
integer representing a timestamp with nanosecond precision. This
conversion caused overflow for dates beyond the year 2262. The
millisecond and sub-millisecond parts were then extracted from this
nanosecond-precision timestamp and stored in UUIDv7 values.

With this commit, the millisecond and sub-millisecond parts are stored
directly into the UUIDv7 value without being converted back to a
nanosecond precision timestamp. Following RFC 9562, the timestamp is
stored as an unsigned integer, enabling support for dates up to the
year 10889.

Reported and fixed by Andrey Borodin, with cosmetic changes and
regression tests by me.

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru
2025-03-28 09:39:11 -07:00
8e993bff53 Tidy up locale thread safety in ECPG library.
Remove setlocale() and _configthreadlocal() as fallback strategy on
systems that don't have uselocale(), where ECPG tries to control
LC_NUMERIC formatting on input and output of floating point numbers.  It
was probably broken on some systems (NetBSD), and the code was also
quite messy and complicated, with obsolete configure tests (Windows).
It was also arguably broken, or at least had unstated environmental
requirements, if pgtypeslib code was called directly.

Instead, introduce PG_C_LOCALE to refer to the "C" locale as a locale_t
value.  It maps to the special constant LC_C_LOCALE when defined by libc
(macOS, NetBSD), or otherwise uses a process-lifetime locale_t that is
allocated on first use, just as ECPG previously did itself.  The new
replacement might be more widely useful.  Then change the float parsing
and printing code to pass that to _l() functions where appropriate.

Unfortunately the portability of those functions is a bit complicated.
First, many obvious and useful _l() functions are missing from POSIX,
though most standard libraries define some of them anyway.  Second,
although the thread-safe save/restore technique can be used to replace
the missing ones, Windows and NetBSD refused to implement standard
uselocale().  They might have a point: "wide scope" uselocale() is hard
to combine with other code and error-prone, especially in library code.
Luckily they have the  _l() functions we want so far anyway.  So we have
to be prepared for both ways of doing things:

1.  In ECPG, use strtod_l() for parsing, and supply a port.h replacement
using uselocale() over a limited scope if missing.

2.  Inside our own snprintf.c, use three different approaches to format
floats.  For frontend code, call libc's snprintf_l(), or wrap libc's
snprintf() in uselocale() if it's missing.  For backend code, snprintf.c
can keep assuming that the global locale's LC_NUMERIC is "C" and call
libc's snprintf() without change, for now.

(It might eventually be possible to call our in-tree Ryū routines to
display floats in snprintf.c, given the C-locale-always remit of our
in-tree snprintf(), but this patch doesn't risk changing anything that
complicated.)

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tristan Partin <tristan@partin.io>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2025-03-28 16:18:36 +01:00
2247281c47 Cast result of i64abs() back to int64
Without the cast, the return type could be long or long long,
depending on what int64 is underneath.  This doesn't affect code
correctness, but it could result in format-mismatch warnings when
attempting to printf such values using PRId64.

Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+hUKGJc4s+Wyb3EFOQNN9VVK+Qv40r2LK41o9PkS9ThxviTvQ@mail.gmail.com
2025-03-28 14:34:57 +01:00
83ccc85859 pg_overexplain: Use PG_MODULE_MAGIC_EXT.
I committed this contrib module just after Tom committed
55527368bd07248e91e3d37a782bf66b76f06865; adjust it to match.

Author: Man Zeng <zengman@halodbtech.com>
Discussion: http://postgr.es/m/174313513707.60295.16516085012903412705.pgcf@coridan.postgresql.org
2025-03-28 09:16:29 -04:00
9f0c36aea0 pg_overexplain: Call previous hooks as appropriate.
It makes no sense to remember the previous values of the hook variables
and then never bother calling those functions. Thanks to Andrei for
spotting my goof.

Author: Andrei Lepikhov <lepihov@gmail.com>
Discussion: http://postgr.es/m/41a344e3-ffb1-4296-8ba7-801f1e9642e5@gmail.com
2025-03-28 09:02:37 -04:00
cdc168ad4b Add support for not-null constraints on virtual generated columns
This was left out of the original patch for virtual generated columns
(commit 83ea6c5402).

This just involves a bit of extra work in the executor to expand the
generation expressions and run a "IS NOT NULL" test against them.

There is also a bit of work to make sure that not-null constraints are
checked during a table rewrite.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Navneet Kumar <thanit3111@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 13:53:37 +01:00
747ddd38cb Modernize some code a bit
Modernize code in ExecRelCheck() and ExecConstraints() a bit,
preparing the way for some new code.

Co-authored-by: jian he <jian.universality@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Navneet Kumar <thanit3111@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 10:49:15 +01:00
9a9ead1105 Rename a node field for clarity
Rename ResultRelInfo.ri_ConstraintExprs to ri_CheckConstraintExprs.
This reflects its specific purpose better and avoids confusion with
adjacent fields with similar but distinct purposes.

Discussion: https://postgr.es/m/CACJufxHArQysbDkWFmvK+D1TPHQWWTxWN15cMuUaTYX3xhQXgg@mail.gmail.com
2025-03-28 09:50:01 +01:00
fb2ea12f42 pg_createsubscriber: Add '--all' option.
The '--all' option indicates that the tool queries the source server
(publisher) for all databases and creates subscriptions on the target
server (subscriber) for databases with matching names. Without this user
needs to explicitly specify all databases by using -d option for each
database.

This simplifies converting a physical standby to a logical subscriber,
particularly during upgrades.

The options '--database', '--publication', '--subscription', and
'--replication-slot' cannot be used when '--all' is specified.

Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://postgr.es/m/CAHv8RjKhA=_h5vAbozzJ1Opnv=KXYQHQ-fJyaMfqfRqPpnC2bA@mail.gmail.com
2025-03-28 12:26:39 +05:30
890fc826c9 Use thread-safe strftime_l() instead of strftime().
This removes some setlocale() calls and a lot of commentary about how
dangerous that is.  strftime_l() is from POSIX 2008, and on Windows we
use _wcsftime_l().

While here, adjust error message for strftime_l() failure: it does not
in practice set errno (even though POSIX says it could), so no %m.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com
2025-03-28 07:13:43 +01:00
474d7a1fd8 Stablize tests added in 3abe9dc188.
The problem is that after the ALTER SUBSCRIPTION tap_sub SET PUBLICATION
command, we didn't wait for the new walsender to start on the publisher.
Immediately after ALTER, we performed Insert and expected it to replicate.
However, the replication could start from a point after the INSERT location,
and as the subscription isn't copying initial data, we could miss such an
Insert.

The fix is to wait for connection to be established between publisher and
subscriber before starting DML operations that are expected to replicate.

As per CI.

Reported-by: Andres Freund <andres@anarazel.de>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CALDaNm2ms1deM5EYNLFEfESv_Kw=Y4AiTB0LP=qGS-UpFwGbPg@mail.gmail.com
2025-03-28 11:03:05 +05:30
058b5152f0 Fix guc_malloc calls for consistency and OOM checks
check_createrole_self_grant and check_synchronized_standby_slots
were allocating memory on a LOG elevel without checking if the
allocation succeeded or not, which would have led to a segfault
on allocation failure.

On top of that, a number of callsites were using the ERROR level,
relying on erroring out rather than returning false to allow the
GUC machinery handle it gracefully.  Other callsites used WARNING
instead of LOG.  While neither being not wrong, this changes all
check_ functions do it consistently with LOG.

init_custom_variable gets a promoted elevel to FATAL to keep
the guc_malloc error handling in line with the rest of the
error handling in that function which already call FATAL.  If
we encounter an OOM in this callsite there is no graceful
handling to be had, better to error out hard.

Backpatch the fix to check_createrole_self_grant down to v16
and the fix to check_synchronized_standby_slots down to v17
where they were introduced.

Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Nikita <pm91.arapov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Bug: #18845
Discussion: https://postgr.es/m/18845-582c6e10247377ec@postgresql.org
Backpatch-through: 16
2025-03-27 22:57:34 +01:00
043799fa08 Use streaming read I/O in heap amcheck
Instead of directly invoking ReadBuffer() for each unskippable block in
the heap relation, verify_heapam() now uses the read stream API to
acquire the next buffer to check for corruption.

Author: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/flat/CAFY6G8eLyz7%2BsccegZYFj%3D5tAUR-GZ9uEq4Ch5gvwKqUwb_hCA%40mail.gmail.com
2025-03-27 14:04:14 -04:00