Specify whether the bucket bounds are inclusive or exclusive,
and improve some other vague language. Explain the behavior that
occurs when the "low" bound is greater than the "high" bound.
Make width_bucket_numeric's comment more like that for
width_bucket_float8, in particular noting that infinite
bounds are rejected (since they became possible in v14).
Reported-by: Ben Peachey Higdon <bpeacheyhigdon@gmail.com>
Author: Robert Treat <rob@xzilla.net>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/2BD74F86-5B89-4AC1-8F13-23CED3546AC1@gmail.com
Backpatch-through: 13
Commit 85e5e222b, which added (a forerunner of) this logic,
argued that
Adding the necessary complexity to make this work doesn't seem like
it would be repaid in significantly better plans, because in cases
where such a PHV exists, there is probably a corresponding join order
constraint that would allow a good plan to be found without using the
star-schema exception.
The flaw in this claim is that there may be other join-order
restrictions that prevent us from finding a join order that doesn't
involve a "dangerous" PHV. In particular we now recognize that
small join_collapse_limit or from_collapse_limit could prevent it.
Therefore, let's bite the bullet and make the case work.
We don't have to extend the executor's support for nestloop parameters
as I thought at the time, because we can instead push the evaluation
of the placeholder's expression into the left-hand input of the
NestLoop node. So there's not really a lot of downside to this
solution, and giving the planner more join-order flexibility should
have value beyond just avoiding failure.
Having said that, there surely is a nonzero risk of introducing
new bugs. Since this failure mode escaped detection for ten years,
such cases don't seem common enough to justify a lot of risk.
Therefore, let's put this fix into master but leave the back branches
alone (for now anyway).
Bug: #18953
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18953-1c9883a9d4afeb30@postgresql.org
While choosing an autogenerated name for an index, look for
pre-existing relations using a SnapshotDirty snapshot, instead of the
previous behavior that considered only committed-good pg_class rows.
This allows us to detect and avoid conflicts against indexes that are
still being built.
It's still possible to fail due to a race condition, but the window
is now just the amount of time that it takes DefineIndex to validate
all its parameters, call smgrcreate(), and enter the index's pg_class
row. Formerly the race window covered the entire time needed to
create and fill an index, which could be very long if the table is
large. Worse, if the conflicting index creation is part of a larger
transaction, it wouldn't be visible till COMMIT.
So this isn't a complete solution, but it should greatly ameliorate
the problem, and the patch is simple enough to be back-patchable.
It might at some point be useful to do the same for pg_constraint
entries (cf. ChooseConstraintName, ConstraintNameExists, and related
functions). However, in the absence of field complaints, I'll leave
that alone for now. The relation-name test should be good enough for
index-based constraints, while foreign-key constraints seem to be okay
since they require exclusive locks to create.
Bug: #18959
Reported-by: Maximilian Chrzan <maximilian.chrzan@here.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/18959-f63b53b864bb1417@postgresql.org
Backpatch-through: 13
We never create regress.def, and if we did this code would fail to
delete it, because "win" is not the correct PORTNAME for Windows.
This thinko seems to have originated in commit 7a6b562fd from 1999,
although it got moved around multiple times since then.
Author: Christoph Berg <myon@debian.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aFVR7R7VDX7y2ruc@msg.df7cb.de
The TAP tests that verify logical and physical replication slot behavior
during checkpoints (046_checkpoint_logical_slot.pl and
047_checkpoint_physical_slot.pl) inserted two batches of 2 million rows each,
generating approximately 520 MB of WAL. On slow machines, or when compiled
with '-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE', this caused the
tests to run for 8-9 minutes and occasionally time out, as seen on the
buildfarm animal prion.
This commit modifies the mentioned tests to utilize the $node->advance_wal()
function, thereby reducing runtime. Once we do not use the generated data,
the proposed function is a good alternative, which cuts the total wall-clock
run time.
While here, remove superfluous '\n' characters from several note() calls;
these appeared literally in the build-farm logs and looked odd. Also, remove
excessive 'shared_preload_libraries' GUC from the config and add a check for
'injection_points' extension availability.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/fbc5d94e-6fbd-4a64-85d4-c9e284a58eb2%40gmail.com
Backpatch-through: 17
In version 17 we added support for cross-partition EXCLUDE
constraints, as long as they included all partition key columns and
compared them with equality (see 8c852ba9a4). I updated the docs for
exclusion constraints, but I missed that the docs for CREATE TABLE
still said that they were not supported. This commit fixes that.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Co-authored-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/c955d292-b92d-42d1-a2a0-1ec6715a2546@illuminatedcomputing.com
Backpatch-through: 17
Increase consistency of --help and man page synopses between pg_dump
and pg_dumpall. These should now be very similar, as pg_dumpall can
now also produce non-text dump output. But actually, they had drifted
further apart.
- Use verb "export" consistently, instead of "dump" or "extract".
- Use "SQL script" instead of just "script" or "text file".
- Maintain consistent distinction between SQL script and other
formats/archives (which is relevant for pg_restore).
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://www.postgresql.org/message-id/flat/3f71d8a7-095b-4829-9b0b-fce09e9866b3%40eisentraut.org
Improve the clarity of LOG messages when a failover logical slot
synchronization fails, making the reasons more explicit for easier
debugging.
Update the documentation to outline scenarios where slot synchronization
can fail, especially during the initial sync, and emphasize that
pg_sync_replication_slot() is primarily intended for testing and
debugging purposes.
We also discussed improving the functionality of
pg_sync_replication_slot() so that it can be used reliably, but we would
take up that work for next version after some more discussion and review.
Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com
Commit 8492feb98f added support for parallel CREATE INDEX on GIN indexes.
However, previously two places in the documentation and two in the source
code comments still stated that only B-tree and BRIN indexes support
parallel builds.
This commit updates those references to correctly include GIN indexes.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/7d27d068-90e2-4022-9bd7-09b0fd3d4f47@oss.nttdata.com
Commit 302cf15759 added support for LIKE in CREATE FOREIGN TABLE.
In this feature, since indexes are not created for foreign tables,
comments on indexes are not copied either.
However, the documentation incorrectly stated that index comments
would be copied when using INCLUDING COMMENTS. This commit
corrects that by removing the mention of index comments.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/f86cd84f-a6a3-4451-bae7-5cca9e63b06d@oss.nttdata.com
I thought underscores wouldn't even work in "id"s, so I never checked to
see if anything referenced it, but it seems it does work, so adjust the
calling site for the dash syntax.
Commit 285613c60a introduced the min_protocol_version and
max_protocol_version connection options for libpq, but their descriptions
were placed in the middle of the unrelated ssl_min_protocol_version and
ssl_max_protocol_version entries.
This commit moves the min_protocol_version and max_protocol_version
descriptions to appear after the SSL-related options. This improves
the logical order and makes it easier for users to locate the relevant
settings in the libpq documentation.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/a3391f36-30f5-4d4a-825b-232476819de8@oss.nttdata.com
Fix two issues in parent_key validation in posting trees:
* It's not enough to check stack->parentblk is valid to determine if the
parentkey is valid. It's possible parentblk is set to a valid block
number, but parentkey is invalid. So check parentkey directly.
* We don't need to invalidate parentkey for all child pages of the
rightmost page. It's enough to invalidate it for the rightmost child
only, which means we can check more cases (less false negatives).
Issues reported by Arseniy Mukhin, along with a proposed patch. Review
by Andrey M. Borodin, cleanup and improvements by me.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAE7r3MJ611B9TE=YqBBncewp7-k64VWs+sjk7XF6fJUX77uFBA@mail.gmail.com
The checks introduced by commit 14ffaece0f did not get the parent key
checks quite right, missing some data corruption cases. In particular:
* The "rightlink" check was not working as intended, because rightlink
is a BlockNumber, and InvalidBlockNumber is 0xFFFFFFFF, so
!GinPageGetOpaque(page)->rightlink
almost always evaluates to false (except for rightlink=0). So in most
cases parenttup was left NULL, preventing any checks against parent.
* Use GinGetDownlink() to retrieve child blkno to avoid triggering
Assert, same as the core GIN code.
Issues reported by Arseniy Mukhin, along with a proposed patch. Review
by Andrey M. Borodin, cleanup and improvements by me.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAE7r3MJ611B9TE=YqBBncewp7-k64VWs+sjk7XF6fJUX77uFBA@mail.gmail.com
This tightens a couple checks in checking GIN indexes, which might have
resulted in incorrect results (false positives/negatives).
* The code skipped ordering checks if the entries were for different
attributes (for multi-column GIN indexes), possibly missing some cases
of data corruption. But the attribute number is part of the ordering,
so we can check that.
* The root page was skipped when checking entry order, but that is
unnecessary. The root page is subject to the same ordering rules, we
can process it just like any other page.
* The high key on the right-most page was not checked, but that is
needed only for inner pages (we don't store the high key for those).
For leaf pages we can check the high key just fine.
* Correct the detection of split pages. If the page gets split, the
cached parent key is greater than the current child key (not less, as
the code incorrectly expected).
Issues reported by Arseniy Mukhin, along with a proposed patch. Review
by Andrey M. Borodin, cleanup and improvements by me.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAE7r3MJ611B9TE=YqBBncewp7-k64VWs+sjk7XF6fJUX77uFBA@mail.gmail.com
Commit 4909b38af0 introduced logic to distribute invalidation messages
from catalog-modifying transactions to all concurrent in-progress
transactions. However, since each transaction distributes not only its
original invalidation messages but also previously distributed
messages to other transactions, this leads to an exponential increase
in allocation request size for invalidation messages, ultimately
causing memory allocation failure.
This commit fixes this issue by tracking distributed invalidation
messages separately per decoded transaction and not redistributing
these messages to other in-progress transactions. The maximum size of
distributed invalidation messages that one transaction can store is
limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the
distributed invalidation messages exceeds this threshold, we
invalidate all caches in locations where distributed invalidation
messages need to be executed.
Back-patch to all supported versions where we introduced the fix by
commit 4909b38af0.
Note that this commit adds two new fields to ReorderBufferTXN to store
the distributed transactions. This change breaks ABI compatibility in
back branches, affecting third-party extensions that depend on the
size of the ReorderBufferTXN struct, though this scenario seems
unlikely.
Additionally, it adds a new flag to the txn_flags field of
ReorderBufferTXN to indicate distributed invalidation message
overflow. This should not affect existing implementations, as it is
unlikely that third-party extensions use unused bits in the txn_flags
field.
Bug: #18938#18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13
Sometimes the TupleDesc used in verify_compact_attribute() is shared
among backends, and since CompactAttribute.attcacheoff gets updated
during tuple deformation, it was possible that another backend would
set attcacheoff on a given CompactAttribute in the small window of time
from when the attcacheoff from the live CompactAttribute was being set
in the 'tmp' CompactAttribute and before the Assert verifying that the
live and tmp CompactAttributes matched.
Here we adjust the code to make a copy of the live CompactAttribute so
that we're not trying to Assert against a shared copy of it.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/7195e408-758c-4031-8e61-4f842c716ac0@gmail.com
Previously there was no memory barrier enforcing correct memory ordering when
waiting for a free IO handle. However, in the much more common case of waiting
for IO to complete, memory barriers already were present.
On strongly ordered architectures like x86 this had no negative consequences,
but on some armv8 hardware (observed on Apple hardware), it was possible for
the update, in the IO worker, to PgAioHandle->state to become visible before
->distilled_result becoming visible, leading to rather confusing assertion
failures. The failures were rare enough that the bug sometimes took days to
reproduce when running 027_stream_regress in a loop.
Once finally debugged, it was easy enough to come up with a much quicker
repro: Trigger a lot of very fast IO by limiting io_combine_limit to 1 and
ensure that we always have to wait for a free handle by setting
io_max_concurrency to 1. Triggering lots of concurrent seqscans in that setup
triggers the issue within seconds.
One reason this was hard to debug was that the assertion failure most commonly
happened in WaitReadBuffers(), rather than in the AIO subsystem itself. The
assertions added in this commit make problems like this easier to understand.
Also add a comment to the IO worker explaining that we rely on the lwlock
acquisition for correct memory ordering.
I think it'd be good to add a tap test that stress tests buffer IO, but that's
material for a separate patch.
Thanks a lot to Alexander and Konstantin for all the debugging help.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Andres Freund <andres@anarazel.de>
Investigated-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/2dkz7azclpeiqcmouamdixyn5xhlzy4rvikxrbovyzvi6rnv5c@pz7o7osv2ahf
Our maintenance of typedefs.list has been a little haphazard
(and apparently we can't alphabetize worth a darn). Replace
the file with the authoritative list from our buildfarm, and
run pgindent using that.
I also updated the additions/exclusions lists in pgindent where
necessary to keep pgindent from messing things up significantly.
Notably, now that regex_t and some related names are macros not real
typedefs, we have to whitelist them explicitly. The exclusions list
has also drifted noticeably, presumably due to changes of system
headers on the buildfarm animals that contribute to the list.
Unlike in prior years, I've not manually added typedef names that
are missing from the buildfarm's list because they are not used to
declare any variables or fields. So there are a few places where
the typedef declaration itself is formatted worse than before,
e.g. typedef enum IoMethod. I could preserve the names that were
manually added to the list previously, but I'd really prefer to find
a less manual way of dealing with these cases. A quick grep finds
about 75 such symbols, most of which have never gotten any special
treatment.
Per discussion among pgsql-release, doing this now seems appropriate
even though we're still a week or two away from making the v18 branch.
In the \conninfo psql command, the "Client User" column shows the user who
established the connection, while the "Superuser" column reflects whether
the current user in the current execution context is a superuser. This means
the users referred to in these columns can differ, for example, if the current
user was changed with the SET ROLE command.
This commit adds a note to the \conninfo documentation to clarify
this behavior and avoid potential confusion.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/685961b8-b6ce-40bb-b2d5-c2ff135d3388@oss.nttdata.com
Commit bba2fbc623 modified \conninfo to display the protocol version
used by the current connection, but it only showed the major version (e.g., 3).
This commit updates \conninfo to display the full protocol version (e.g., 3.2).
Since support for new version 3.2 was added in v18, and the server supports
both 3.0 and 3.2, showing the complete version helps users understand
exactly which protocol version the current session is using.
Although this is a minor behavior change, it's considered a fix for
an oversight in the original patch and is included in v18.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/685961b8-b6ce-40bb-b2d5-c2ff135d3388@oss.nttdata.com
The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.
This commit introduces two new injection points to make these tests possible:
* checkpoint-before-old-wal-removal - triggered in the checkpointer process
just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
point to the next WAL segment.
Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17
The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.
The patch introduces a new in-memory state for slots: last_saved_restart_lsn,
which is used to calculate the oldest LSN for removing WAL segments. This
state is updated every time with the current restart_lsn at the moment when
the slot is saved to disk.
This fix changes the shared memory layout. It's applied to HEAD only because
we don't have to preserve ABI compatibility during the beta stage. Another
fix that doesn't affect the ABI is committed to back branches.
Discussion: https://postgr.es/m/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
_bt_readnextpage expects so->currPos.buf to be InvalidBuffer (and for
the position's page to be unlocked) when called. However, it does not
expect there to be no pins held on any page. In particular, so->markPos
might hold a separate pin, both before and after the call. Fix some
comments that seemed to suggest otherwise.
Follow-up commit to commit 7c319f54, which made _bt_killitems drop pins
it acquired itself.
Running COPY within a pipeline can break protocol synchronization in
multiple ways. psql is limited in terms of result processing if mixing
COPY commands with normal queries while controlling a pipeline with the
new meta-commands, as an effect of the following reasons:
- In COPY mode, the backend ignores additional Sync messages and will
not send a matching ReadyForQuery expected by the frontend. Doing a
\syncpipeline just after COPY will leave the frontend waiting for a
ReadyForQuery message that won't be sent, leaving psql out-of-sync.
- libpq automatically sends a Sync with the Copy message which is not
tracked in the command queue, creating an unexpected synchronisation
point that psql cannot really know about. While it is possible to track
such activity for a \copy, this cannot really be done sanely with plain
COPY queries. Backend failures during a COPY would leave the pipeline
in an aborted state while the backend would be in a clean state, ready
to process commands.
At the end, fixing those issues would require modifications in how libpq
handles pipeline and COPY. So, rather than implementing workarounds in
psql to shortcut the libpq internals (with command queue handling for
one), and because meta-commands for pipelines in psql are a new feature
with COPY in a pipeline having a limited impact compared to other
queries, this commit forbids the use of COPY within a pipeline to avoid
possible break of protocol synchronisation within psql. If there is a
use-case for COPY support within pipelines in libpq, this could always
be added in the future, if necessary.
Most of the changes of this commit impacts the tests for psql pipelines,
removing the tests related to COPY. Some TAP tests still exist for COPY
TO/FROM and \copy to/from, to check that that connections are aborted
when this operation is attempted.
Reported-by: Nikita Kalinin <n.kalinin@postgrespro.ru>
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/AC468509-06E8-4E2A-A4B1-63046A4AC6AB@postgrespro.ru