mirror of https://github.com/postgres/postgres.git synced 2025-09-03 15:22:11 +03:00
2207 Commits

Author SHA1 Message Date
Tom Lane
7e784d1dc1 Improve client error messages for immediate-stop situations.
Up to now, if the DBA issued "pg_ctl stop -m immediate", the message
sent to clients was the same as for a crash-and-restart situation.
This is confusing, not least because the message claims that the
database will soon be up again, something we have no business
predicting.

Improve things so that we can generate distinct messages for the two
cases (and also recognize an ad-hoc SIGQUIT, should somebody try that).
To do that, add a field to pmsignal.c's shared memory data structure
that the postmaster sets just before broadcasting SIGQUIT to its
children.  No interlocking seems to be necessary; the intervening
signal-sending and signal-receipt should sufficiently serialize accesses
to the field.  Hence, this isn't any riskier than the existing usages
of pmsignal.c.

We might in future extend this idea to improve other
postmaster-to-children signal scenarios, although none of them
currently seem to be as badly overloaded as SIGQUIT.

Discussion: https://postgr.es/m/559291.1608587013@sss.pgh.pa.us
2020-12-24 12:58:32 -05:00
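
The idea in commit 7e784d1dc1 (the parent records why it is about to broadcast SIGQUIT, and each child consults that record to choose its client message) can be sketched roughly as follows. This is a minimal model, not the code in pmsignal.c: the enum, the function names, and the message strings are all hypothetical, and a plain variable stands in for the shared-memory field.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical reasons; these names are illustrative, not PostgreSQL's. */
typedef enum
{
    QUIT_REASON_NOT_SENT = 0,   /* parent never broadcast SIGQUIT itself */
    QUIT_REASON_CRASH,          /* another child crashed */
    QUIT_REASON_IMMEDIATE_STOP  /* administrator asked for an immediate stop */
} QuitReason;

/* Stand-in for a field in shared memory, written only by the parent. */
volatile sig_atomic_t quit_reason = QUIT_REASON_NOT_SENT;

/* Parent side: record the reason, then (not shown) broadcast SIGQUIT. */
void
set_quit_reason(QuitReason reason)
{
    assert(reason != QUIT_REASON_NOT_SENT);
    quit_reason = reason;
}

/* Child side: pick a client message while handling SIGQUIT. */
const char *
quit_message(void)
{
    switch ((QuitReason) quit_reason)
    {
        case QUIT_REASON_CRASH:
            return "terminating: another server process crashed";
        case QUIT_REASON_IMMEDIATE_STOP:
            return "terminating: immediate shutdown requested";
        case QUIT_REASON_NOT_SENT:
            break;
    }
    /* SIGQUIT arrived although the parent set no reason: an ad-hoc signal. */
    return "terminating: unexpected SIGQUIT";
}
```

No lock protects the field in this sketch, mirroring the commit's argument that sending and receiving the signal orders the write before the read.
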
Fujii Masao
00f690a239 Revert "Get rid of the dedicated latch for signaling the startup process".
Revert ac22929a26, as well as the followup fix 113d3591b8, because it broke
the assumption that the startup process waiting for the recovery conflict
on buffer pin should be woken up only by buffer unpin or the timeout enabled
in ResolveRecoveryConflictWithBufferPin(). It caused, for example, the
SIGHUP signal handler or the walreceiver process to wake that startup process
up unnecessarily often.

Additionally, add comments explaining why the dedicated latch that
the reverted patch tried to get rid of should not be removed.

Thanks to Kyotaro Horiguchi for the discussion.

Author: Fujii Masao
Discussion: https://postgr.es/m/d8c0c608-021b-3c73-fffd-3240829ee986@oss.nttdata.com
2020-12-17 18:06:51 +09:00
Tom Lane
b3817f5f77 Improve hash_create()'s API for some added robustness.
Invent a new flag bit HASH_STRINGS to specify C-string hashing, which
was formerly the default; and add assertions insisting that exactly
one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set.
This is in hopes of preventing recurrences of the type of oversight
fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS).

Also, when HASH_STRINGS is specified, insist that the keysize be
more than 8 bytes.  This is a heuristic, but it should catch
accidental use of HASH_STRINGS for integer or pointer keys.
(Nearly all existing use-cases set the keysize to NAMEDATALEN or
more, so there's little reason to think this restriction should
be problematic.)

Tweak hash_create() to insist that the HASH_ELEM flag be set, and
remove the defaults it had for keysize and entrysize.  Since those
defaults were undocumented and basically useless, no callers
omitted HASH_ELEM anyway.

Also, remove memset's zeroing the HASHCTL parameter struct from
those callers that had one.  This has never been really necessary,
and while it wasn't a bad coding convention it was confusing that
some callers did it and some did not.  We might as well save a few
cycles by standardizing on "not".

Also improve the documentation for hash_create().

In passing, improve reinit.c's usage of a hash table by storing
the key as a binary Oid rather than a string; and, since that's
a temporary hash table, allocate it in CurrentMemoryContext for
neatness.

Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
2020-12-15 11:38:53 -05:00
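
The argument checks that commit b3817f5f77 describes for hash_create() can be sketched as a standalone predicate. This is an illustration only: the `SK_` flag values below are made up for the sketch and are not the values in PostgreSQL's headers, and the real function enforces these rules with assertions rather than a boolean return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values, chosen for this sketch only. */
#define SK_HASH_ELEM      0x01  /* caller supplied keysize and entrysize */
#define SK_HASH_STRINGS   0x02  /* keys are NUL-terminated C strings */
#define SK_HASH_BLOBS     0x04  /* keys are fixed-size binary blobs */
#define SK_HASH_FUNCTION  0x08  /* caller supplied a hash function */

/* Return true if the flags and keysize pass the checks described above. */
bool
hash_flags_ok(int flags, size_t keysize)
{
    int kind = flags & (SK_HASH_STRINGS | SK_HASH_BLOBS | SK_HASH_FUNCTION);

    /* Sizes must always be given explicitly; there are no defaults. */
    if ((flags & SK_HASH_ELEM) == 0)
        return false;

    /* Exactly one key-kind bit: nonzero and a power of two. */
    if (kind == 0 || (kind & (kind - 1)) != 0)
        return false;

    /* Heuristic: a string key of 8 bytes or less is probably a mistake. */
    if ((flags & SK_HASH_STRINGS) != 0 && keysize <= 8)
        return false;

    return true;
}

/* Assertion-style wrapper, closer in spirit to a development-build check. */
void
hash_flags_assert(int flags, size_t keysize)
{
    assert(hash_flags_ok(flags, keysize));
}
```

With these checks, passing `SK_HASH_ELEM` alone with an integer-sized key is rejected, which is the kind of oversight (a forgotten HASH_BLOBS) the commit message cites.
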
Peter Eisentraut
eb93f3a0b6 Convert elog(LOG) calls to ereport() where appropriate
User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2020-12-04 14:25:23 +01:00
Thomas Munro
57faaf376e Use truncate(2) where appropriate.
When truncating files by name, use truncate(2).  Windows hasn't got it,
so keep our previous coding based on ftruncate(2) as a fallback.

Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 15:42:22 +13:00
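
A rough sketch of the pattern in commit 57faaf376e: truncate by name in one system call where truncate(2) exists, and otherwise fall back to opening the file and calling ftruncate(2). The function names and the `SKETCH_NO_TRUNCATE` build flag are hypothetical; PostgreSQL's own code and platform tests differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose truncate()/ftruncate() declarations */
#endif

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: open the file, resize it through the descriptor, close it. */
int
truncate_via_fd(const char *path, off_t length)
{
    int fd;
    int rc;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    rc = ftruncate(fd, length);
    if (close(fd) != 0 && rc == 0)
        rc = -1;
    return rc;
}

/* Preferred path: a single system call, with no descriptor needed. */
int
truncate_by_name(const char *path, off_t length)
{
    assert(path != NULL);
#ifdef SKETCH_NO_TRUNCATE       /* hypothetical flag: platform lacks truncate() */
    return truncate_via_fd(path, length);
#else
    return truncate(path, length);
#endif
}

/* Helper for experimenting: file size in bytes, or -1 on error. */
off_t
file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? st.st_size : (off_t) -1;
}
```

Both functions return 0 on success and -1 on failure with errno set by the underlying call, so a caller can switch between them without changing its error handling.
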
Thomas Munro
9f35f94373 Free disk space for dropped relations on commit.
When committing a transaction that dropped a relation, we previously
truncated only the first segment file to free up disk space (the one
that won't be unlinked until the next checkpoint).

Truncate higher numbered segments too, even though we unlink them on
commit.  This frees the disk space immediately, even if other backends
have open file descriptors and might take a long time to get around to
handling shared invalidation events and closing them.  Also extend the
same behavior to the first segment, in recovery.

Back-patch to all supported releases.

Bug: #16663
Reported-by: Denis Patron <denis.patron@previnet.it>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: David Zhang <david.zhang@highgo.ca>
Discussion: https://postgr.es/m/16663-fe97ccf9932fc800%40postgresql.org
2020-12-01 13:21:03 +13:00
Alvaro Herrera
dcfff74fb1 Restore lock level to update statusFlags
Reverts 27838981be (some comments are kept).  Per discussion, it does
not seem safe to relax the lock level used for this; in order for it to
be safe, there would have to be memory barriers between the point we set
the flag and the point we set the transaction Xid, which perhaps would
not be so bad; but there would also have to be barriers at the readers'
side, which from a performance perspective might be bad.

Now maybe this analysis is wrong and it *is* safe for some reason, but
proof of that is not trivial.

Discussion: https://postgr.es/m/20201118190928.vnztes7c2sldu43a@alap3.anarazel.de
2020-11-26 12:30:48 -03:00
Thomas Munro
a7e65dc88b Fix WaitLatch(NULL) on Windows.
Further to commit 733fa9aa, on Windows when a latch is triggered but we
aren't currently waiting for it, we need to locate the latch's HANDLE
rather than calling ResetEvent(NULL).

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Discussion: https://postgr.es/m/CAEudQArTPi1YBc%2Bn1fo0Asy3QBFhVjp_QgyKG-8yksVn%2ByRTiw%40mail.gmail.com
2020-11-25 17:55:49 +13:00
Michael Paquier
d03d7549b2 Use macros instead of hardcoded offsets for LWLock initialization
This makes the code slightly easier to follow: the initialization
relied on hardcoded offsets that duplicated an equivalent set of macros
already defined and used in other places.

Author: Japin Li
Discussion: https://postgr.es/m/MEYP282MB1669FB410006758402F2C3A2B6E00@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
2020-11-24 12:39:58 +09:00
Tom Lane
789b938bf2 Centralize logic for skipping useless ereport/elog calls.
While ereport() and elog() themselves are quite cheap when the
error message level is too low to be printed, some places need to do
substantial work before they can call those macros at all.  To allow
optimizing away such setup work when nothing is to be printed, make
elog.c export a new function message_level_is_interesting(elevel)
that reports whether ereport/elog will do anything.  Make use of that
in various places that had ad-hoc direct tests of log_min_messages etc.
Also teach ProcSleep to use it to avoid some work.  (There may well
be other places that could usefully use this; I didn't search hard.)

Within elog.c, refactor a little bit to avoid having duplicate copies
of the policy-setting logic.  When that code was written, we weren't
relying on the availability of inline functions; so it had some
duplications in the name of efficiency, which I got rid of.

Alvaro Herrera and Tom Lane

Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
2020-11-23 19:10:46 -05:00
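
The pattern behind message_level_is_interesting() in commit 789b938bf2 is to ask first whether a message would be emitted at all, and only then pay for the setup work. A minimal sketch, assuming a single numeric threshold (the real server applies separate rules for the log and for the client); every name below is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical severity levels and threshold, standing in for real settings. */
enum { LVL_DEBUG = 10, LVL_LOG = 15, LVL_ERROR = 20 };
int min_level = LVL_LOG;        /* messages below this level are discarded */
int expensive_calls = 0;        /* counts how often the costly setup ran */

/* Would a message at this level be emitted anywhere? */
bool
level_is_interesting(int level)
{
    return level >= min_level;
}

/* Stand-in for costly setup work, such as describing a lock's waiters. */
static void
describe_state(char *buf, size_t len)
{
    assert(len > 0);
    expensive_calls++;
    snprintf(buf, len, "state: %d waiters", 3);
}

/* Report at the given level, skipping the setup when nothing would print. */
bool
report_state(int level)
{
    char buf[64];

    if (!level_is_interesting(level))
        return false;           /* describe_state() is never called */
    describe_state(buf, sizeof(buf));
    fprintf(stderr, "%s\n", buf);
    return true;
}
```

The counter makes the saving observable: a call at `LVL_DEBUG` with the default threshold leaves `expensive_calls` unchanged.
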
Alvaro Herrera
450c8230b1 Don't hold ProcArrayLock longer than needed in rare cases
While cancelling an autovacuum worker, we hold ProcArrayLock while
formatting a debugging log string.  We can make this shorter by saving
the data we need to produce the message and doing the formatting outside
the locked region.

This isn't terribly critical, as it occurs only rarely: when a
backend runs deadlock detection and it happens to be blocked by a
worker running autovacuum.  Still, there's no need to cause a hiccup
in ProcArrayLock processing, which can be very high-traffic in some
cases.

While at it, rework code so that we only print the string when it is
really going to be used, as suggested by Michael Paquier.

Discussion: https://postgr.es/m/20201118214127.GA3179@alvherre.pgsql
Reviewed-by: Michael Paquier <michael@paquier.xyz>
2020-11-23 18:55:23 -03:00
Thomas Munro
7888b09994 Add BarrierArriveAndDetachExceptLast().
Provide a way for one process to continue the remaining phases of a
(previously) parallel computation alone.  Later patches will use this to
extend Parallel Hash Join.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
2020-11-19 18:13:46 +13:00
Alvaro Herrera
27838981be Relax lock level for setting PGPROC->statusFlags
We don't actually need a lock to set PGPROC->statusFlags itself; what we
do need is a shared lock on either XidGenLock or ProcArrayLock in order to
ensure MyProc->pgxactoff keeps still while we modify the mirror array in
ProcGlobal->statusFlags.  Some places were using an exclusive lock for
that, which is excessive.  Relax those to use shared lock only.

procarray.c has a couple of places with somewhat brittle assumptions
about PGPROC changes: ProcArrayEndTransaction uses only shared lock, so
it's permissible to change MyProc only.  On the other hand,
ProcArrayEndTransactionInternal also changes other procs, so it must
hold exclusive lock.  Add asserts to ensure those assumptions continue
to hold.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20201117155501.GA13805@alvherre.pgsql
2020-11-18 13:24:22 -03:00
Tom Lane
2bd49b493a Don't Insert() a VFD entry until it's fully built.
Otherwise, if FDDEBUG is enabled, the debugging output fails because
it tries to read the fileName, which isn't set up yet (and should in
fact always be NULL).

AFAICT, this has been wrong since Berkeley.  Before 96bf88d52,
it would accidentally fail to crash on platforms where snprintf()
is forgiving about being passed a NULL pointer for %s; but the
file name intended to be included in the debug output wouldn't
ever have shown up.

Report and fix by Greg Nancarrow.  Although this is only visibly
broken in custom-made builds, it still seems worth back-patching
to all supported branches, as the FDDEBUG code is pretty useless
as it stands.

Discussion: https://postgr.es/m/CAJcOf-cUDgm9qYtC_B6XrC6MktMPNRby2p61EtSGZKnfotMArw@mail.gmail.com
2020-11-16 20:32:55 -05:00
Alvaro Herrera
cd9c1b3e19 Rename PGPROC->vacuumFlags to statusFlags
With more flags associated with a PGPROC entry that are not related to
vacuum (currently existing or planned), the name "statusFlags" describes
its purpose better.

(The same is done to the mirroring PROC_HDR->vacuumFlags.)

No functional changes in this commit.

This was suggested first by Hari Babu Kommi in [1] and then by Michael
Paquier at [2].

[1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com
[2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost
2020-11-16 19:42:55 -03:00
Michael Paquier
788dd0b839 Fix some typos
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/C36ADFDF-D09A-4EE5-B186-CB46C3653F4C@yesql.se
2020-11-14 11:43:10 +09:00
Michael Paquier
e152506ade Revert pg_relation_check_pages()
This reverts the following set of commits, following complaints about
the lack of portability of the central part of the code in bufmgr.c as
well as the use of partition mapping locks during page reads:
c780a7a9
f2b88396
b787d4ce
ce7f772c
60a51c6b

Per discussion with Andres Freund, Robert Haas and myself.

Bump catalog version.

Discussion: https://postgr.es/m/20201029181729.2nrub47u7yqncsv7@alap3.anarazel.de
2020-11-04 10:21:46 +09:00
Amit Kapila
8c2d8f6cc4 Fix typos.
Author: Hou Zhijie
Discussion: https://postgr.es/m/855a9421839d402b8b351d273c89a8f8@G08CNEXMBPEKD05.g08.fujitsu.local
2020-11-03 08:38:27 +05:30
Michael Paquier
8a15e735be Fix some grammar and typos in comments and docs
The documentation fixes are backpatched down to where they apply.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20201031020801.GD3080@telsasoft.com
Backpatch-through: 9.6
2020-11-02 15:14:41 +09:00
Andres Freund
1c7675a7a4 Fix wrong data table horizon computation during backend startup.
When ComputeXidHorizons() was called before MyDatabaseId is set,
e.g. because a dead row in a shared relation is encountered during
InitPostgres(), the horizon for normal tables was computed too
aggressively, ignoring all backends connected to a database.

During subsequent pruning in a data table the too aggressive horizon
could end up still being used, possibly leading to still needed tuples
being removed. Not good.

This is a bug in dc7420c2c9, which the test added in 94bc27b576 made
visible, if run with force_parallel_mode set to regress. In that case
the bug is reliably triggered, because "pruning_query" is run in a
parallel worker and the start of that parallel worker is likely to
encounter a dead row in pg_database.

The fix is trivial: Compute a more pessimistic data table horizon if
MyDatabaseId is not yet known.

Author: Andres Freund
Discussion: https://postgr.es/m/20201029040030.p4osrmaywhqaesd4@alap3.anarazel.de
2020-10-28 21:49:07 -07:00
Andres Freund
94bc27b576 Centralize horizon determination for temp tables, fixing bug due to skew.
This fixes a bug in the edge case where, for a temp table, heap_page_prune()
can end up with a different horizon than heap_vacuum_rel(), which can trigger
errors like "ERROR: cannot freeze committed xmax ...".

The bug was introduced due to interaction of a7212be8b9 "Set cutoff xmin more
aggressively when vacuuming a temporary table." with dc7420c2c9 "snapshot
scalability: Don't compute global horizons while building snapshots.".

The problem is caused by lazy_scan_heap() assuming that the only reason its
HeapTupleSatisfiesVacuum() call would return HEAPTUPLE_DEAD is if the tuple is
a HOT tuple, or if the tuple's inserting transaction has aborted since the
heap_page_prune() call. But after a7212be8b9 that was also possible in other
cases for temp tables, because heap_page_prune() uses a different visibility
test after dc7420c2c9.

The fix is fairly simple: Move the special case logic for temp tables from
vacuum_set_xid_limits() to the infrastructure introduced in dc7420c2c9. That
ensures that the horizon used for pruning is at least as aggressive as the one
used by lazy_scan_heap(). The concrete horizon used for temp tables is
slightly different than the logic in dc7420c2c9, but should always be as
aggressive as before (see comments).

A significant benefit of centralizing the logic in procarray.c is that now the
more aggressive horizons for temp tables do not just apply to VACUUM but
also to e.g. HOT pruning and the nbtree killtuples logic.

Because isTopLevel is not needed by vacuum_set_xid_limits() anymore, I
undid the related changes from a7212be8b9.

This commit also adds an isolation test ensuring that the more aggressive
vacuuming and pruning of temp tables keeps working.

Debugged-By: Amit Kapila <amit.kapila16@gmail.com>
Debugged-By: Tom Lane <tgl@sss.pgh.pa.us>
Debugged-By: Ashutosh Sharma <ashu.coek88@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20201014203103.72oke6hqywcyhx7s@alap3.anarazel.de
Discussion: https://postgr.es/m/20201015083735.derdzysdtqdvxshp@alap3.anarazel.de
2020-10-28 18:02:31 -07:00
Michael Paquier
c780a7a90a Add CheckBuffer() to check on-disk pages without shared buffer loading
CheckBuffer() is designed to be a concurrent-safe function able to run
sanity checks on a relation page without loading it into the shared
buffers.  The operation is done using a lock on the partition involved
in the shared buffer mapping hashtable and an I/O lock for the buffer
itself, preventing the risk of false positives due to any concurrent
activity.

The primary use of this function is the detection of on-disk corruptions
for relation pages.  If a page is found in shared buffers, the on-disk
page is checked if not dirty (a follow-up checkpoint would flush a valid
version of the page if dirty anyway), as it could be possible that a
page was present for a long time in shared buffers with its on-disk
version corrupted.  Such a scenario could lead to a corrupted cluster if
a host is unplugged, for example.  If the page is not found in shared
buffers, its on-disk state is checked.  PageIsVerifiedExtended() is used
to apply the same sanity checks as when a page gets loaded into shared
buffers.

This function will be used by an upcoming patch able to check the state
of on-disk relation pages using a SQL function.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by:  Masahiko Sawada
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
2020-10-28 11:12:46 +09:00
Michael Paquier
d401c5769e Extend PageIsVerified() to handle more custom options
This is useful for checks of relation pages without having to load the
pages into the shared buffers, and two cases can make use of that: page
verification in base backups and the online, lock-safe, flavor.

Compatibility is kept with past versions using a macro that calls the
new extended routine with the set of options compatible with the
original version.

Extracted from a larger patch by the same author.

Author: Anastasia Lubennikova
Reviewed-by: Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0cbd@postgrespro.ru
2020-10-26 09:55:28 +09:00
Peter Eisentraut
26ec6b5948 Avoid invalid alloc size error in shm_mq
In shm_mq_receive(), a huge payload could trigger an unjustified
"invalid memory alloc request size" error due to the way the buffer
size is increased.

Add error checks (documenting the upper limit) and avoid the error by
limiting the allocation size to MaxAllocSize.

Author: Markus Wanner <markus.wanner@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/3bb363e7-ac04-0ac4-9fe8-db1148755bfa%402ndquadrant.com
2020-10-19 08:52:25 +02:00
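
The buffer-growth fix in commit 26ec6b5948 comes down to clamping a doubling strategy at the allocation limit instead of letting it overshoot. A minimal sketch under stated assumptions: `SKETCH_MAX_ALLOC` is a stand-in cap just under 1 GB, and `grow_buffer_size` is a hypothetical helper, not the code in shm_mq_receive().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap for this sketch: just under 1 GB. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Pick a new buffer size that can hold `needed` bytes.  Returns 0 if the
 * request can never be satisfied, so the caller can raise a clear error.
 */
size_t
grow_buffer_size(size_t current, size_t needed)
{
    size_t newsize = current > 0 ? current : 1;

    assert(current <= SKETCH_MAX_ALLOC);
    if (needed > SKETCH_MAX_ALLOC)
        return 0;

    /* Double until large enough, but never step past the cap. */
    while (newsize < needed)
    {
        if (newsize > SKETCH_MAX_ALLOC / 2)
        {
            newsize = SKETCH_MAX_ALLOC; /* clamp instead of overshooting */
            break;
        }
        newsize *= 2;
    }
    return newsize;
}
```

Without the clamp, a payload slightly over half the cap would double to a size above the cap and fail, even though the payload itself fits.
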
Thomas Munro
70516a178a Handle EACCES errors from kevent() better.
While registering for postmaster exit events, we have to handle a couple
of edge cases where the postmaster is already gone.  Commit 815c2f09
missed one: EACCES must surely imply that PostmasterPid no longer
belongs to our postmaster process (or alternatively an unexpected
permissions model has been imposed on us).  Like ESRCH, this should be
treated as a WL_POSTMASTER_DEATH event, rather than being raised with
ereport().

No known problems reported in the wild.  Per code review from Tom Lane.
Back-patch to 13.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3624029.1602701929%40sss.pgh.pa.us
2020-10-15 18:34:21 +13:00
Thomas Munro
b94109ce37 Make WL_POSTMASTER_DEATH level-triggered on kqueue builds.
If WaitEventSetWait() reports that the postmaster has gone away, later
calls to WaitEventSetWait() should continue to report that.  Otherwise
further waits that occur in the proc_exit() path after we already
noticed the postmaster's demise could block forever.

Back-patch to 13, where the kqueue support landed.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3624029.1602701929%40sss.pgh.pa.us
2020-10-15 11:41:58 +13:00
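
The two kqueue commits above (70516a178a and b94109ce37) share one idea: once there is any evidence that the postmaster is gone, record it and keep reporting it. The sketch below models that without calling kevent() at all; the function names and the single global flag are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Once set, the parent is known to be gone and every later wait reports it. */
bool parent_gone = false;

/*
 * Classify a failed attempt to register for parent-exit notification.
 * Returns true if the error means "parent already gone", false if it is a
 * genuine error that the caller should report.
 */
bool
register_error_means_parent_gone(int err)
{
    assert(err != 0);
    if (err == ESRCH || err == EACCES)
    {
        parent_gone = true;
        return true;
    }
    return false;
}

/*
 * Level-triggered wait: `event_fired` models a one-shot kernel event.
 * The result stays true on every call after the first report.
 */
bool
wait_reports_parent_death(bool event_fired)
{
    if (event_fired)
        parent_gone = true;
    return parent_gone;
}
```

Because the flag is sticky, a later wait on an exit path still returns immediately rather than blocking on an event that was already consumed.
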
Andres Freund
7b28913bca Fix and test snapshot behavior on standby.
I (Andres) broke this in 623a9CA79bx, because I didn't think about the
way snapshots are built on standbys sufficiently. Unfortunately our
existing tests did not catch this, as they are all just querying with
psql (therefore ending up with fresh snapshots).

The fix is trivial: we just need to increment the transaction
completion counter in ExpireTreeKnownAssignedTransactionIds(), which
is the equivalent of ProcArrayEndTransaction() during recovery.

This commit also adds a new test doing some basic testing of the
correctness of snapshots built on standbys. To avoid the
aforementioned issue of one-shot psql invocations not exercising the snapshot
caching, the test uses long-lived psql sessions, similar to
013_crash_restart.pl. It'd be good to extend the test further.

Reported-By: Ian Barwick <ian.barwick@2ndquadrant.com>
Author: Andres Freund <andres@anarazel.de>
Author: Ian Barwick <ian.barwick@2ndquadrant.com>
Discussion: https://postgr.es/m/61291ffe-d611-f889-68b5-c298da9fb18f@2ndquadrant.com
2020-09-30 17:28:51 -07:00
Thomas Munro
dee663f784 Defer flushing of SLRU files.
Previously, we called fsync() after writing out individual pg_xact,
pg_multixact and pg_commit_ts pages due to cache pressure, leading to
regular I/O stalls in user backends and recovery.  Collapse requests for
the same file into a single system call as part of the next checkpoint,
as we already did for relation files, using the infrastructure developed
by commit 3eb77eba.  This can cause a significant improvement to
recovery performance, especially when it's otherwise CPU-bound.

Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer
that it applies to all the SLRU mini-buffer-pools as well as the main
buffer pool.  Rearrange things so that data collected in CheckpointStats
includes SLRU activity.

Also remove the Shutdown{CLOG,CommitTS,SUBTRANS,MultiXact}() functions,
because they were redundant after the shutdown checkpoint that
immediately precedes them.  (I'm not sure if they were ever needed, but
they aren't now.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (parts)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Discussion: https://postgr.es/m/CA+hUKGLJ=84YT+NvhkEEDAuUtVHMfQ9i-N7k_o50JmQ6Rpj_OQ@mail.gmail.com
2020-09-25 19:00:15 +12:00
Thomas Munro
733fa9aa51 Allow WaitLatch() to be used without a latch.
Due to flaws in commit 3347c982ba, using WaitLatch() without
WL_LATCH_SET could cause an assertion failure or crash.  Repair.

While here, also add a check that the latch we're switching to belongs
to this backend, when changing from one latch to another.

Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2020-09-23 15:17:30 +12:00
Peter Eisentraut
80fc96eceb Standardize order of use strict and use warnings in Perl code
The standard order in PostgreSQL and other code is use strict first,
but some code was uselessly inconsistent about this.
2020-09-21 17:04:36 +02:00
Peter Eisentraut
3d13867a2c Fix whitespace
2020-09-20 14:42:54 +02:00
David Rowley
19c60ad69a Optimize compactify_tuples function
This function could often be seen in profiles of vacuum and could often
be a significant bottleneck during recovery. The problem was that a qsort
was performed in order to sort an array of item pointers in reverse offset
order so that we could use that to safely move tuples up to the end of the
page without overwriting the memory of yet-to-be-moved tuples. i.e. we
used to compact the page starting at the back of the page and move towards
the front. The qsort that this required could be expensive for pages with
a large number of tuples.

In this commit, we take another approach to tuple compactification.

Now, instead of sorting the remaining item pointers array we first check
if the array is presorted and only memmove() the tuples that need to be
moved. This presorted check can be done very cheaply in the calling
functions when the array is being populated. This presorted case is very
fast.

When the item pointer array is not presorted we must copy tuples that need
to be moved into a temp buffer before copying them back into the page
again. This differs from what we used to do here as we're now copying the
tuples back into the page in reverse line pointer order. Previously we
left the existing order alone.  Reordering the tuples results in an
increased likelihood of hitting the pre-sorted case the next time around.
Any newly added tuple which consumes a new line pointer will also maintain
the correct sort order of tuples in the page which will also result in the
presorted case being hit the next time.  Only consuming an unused line
pointer can cause the tuples to go out of order again, but that will be
corrected next time the function is called for the page.

Benchmarks have shown that the non-presorted case is at least as
fast as the original qsort method even when the page just has a few
tuples. As the number of tuples becomes larger the new method maintains
its performance whereas the original qsort method became much slower when
the number of tuples on the page became large.

Author: David Rowley
Reviewed-by: Thomas Munro
Tested-by: Jakub Wartak
Discussion: https://postgr.es/m/CA+hUKGKMQFVpjr106gRhwk6R-nXv0qOcTreZuQzxgpHESAL6dw@mail.gmail.com
2020-09-16 13:22:20 +12:00
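
The two paths described in commit 19c60ad69a can be illustrated on a toy page. This is a simplified sketch, not PostgreSQL's compactify_tuples(): `Item` is a hypothetical stand-in for a line pointer, the page has no header, and the scratch buffer is heap-allocated for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified line pointer: where a tuple starts in the page and its length. */
typedef struct
{
    size_t off;
    size_t len;
} Item;

/* True if tuple offsets strictly decrease as the item index increases. */
bool
items_presorted(const Item *items, int n)
{
    for (int i = 1; i < n; i++)
        if (items[i].off >= items[i - 1].off)
            return false;
    return true;
}

/*
 * Pack all tuples against the end of the page and update their offsets.
 * Returns the offset where tuple data now begins, or (size_t) -1 if the
 * scratch buffer cannot be allocated.
 */
size_t
compact_page(unsigned char *page, size_t page_size, Item *items, int n)
{
    size_t upper = page_size;

    if (items_presorted(items, n))
    {
        /* Each tuple only moves toward the end, never onto an unmoved one. */
        for (int i = 0; i < n; i++)
        {
            assert(upper >= items[i].off + items[i].len);
            upper -= items[i].len;
            if (upper != items[i].off)
                memmove(page + upper, page + items[i].off, items[i].len);
            items[i].off = upper;
        }
        return upper;
    }

    /* Unsorted: stage the tuples in a scratch copy, laid out in item order. */
    unsigned char *scratch = malloc(page_size);

    if (scratch == NULL)
        return (size_t) -1;
    for (int i = 0; i < n; i++)
    {
        assert(upper >= items[i].len);
        upper -= items[i].len;
        memcpy(scratch + upper, page + items[i].off, items[i].len);
        items[i].off = upper;
    }
    memcpy(page + upper, scratch + upper, page_size - upper);
    free(scratch);
    return upper;
}
```

After the unsorted path runs, the offsets decrease with the item index, so the next call on the same page takes the cheaper presorted path, which is the effect the commit message describes.
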
Fujii Masao
95233011a0 Fix typos.
Author: Naoki Nakamichi
Discussion: https://postgr.es/m/b6919d145af00295a8e86ce4d034b7cd@oss.nttdata.com
2020-09-14 14:16:07 +09:00
Peter Eisentraut
3e0242b24c Message fixes and style improvements
2020-09-14 06:42:30 +02:00
Tom Lane
6693a96b32 Don't run atexit callbacks during signal exits from ProcessStartupPacket.
Although 58c6feccf fixed the case for SIGQUIT, we were still calling
proc_exit() from signal handlers for SIGTERM and timeout failures in
ProcessStartupPacket.  Fortunately, at the point where that code runs,
we haven't yet connected to shared memory in any meaningful way, so
there is nothing we need to undo in shared memory.  This means it
should be safe to use _exit(1) here, ie, not run any atexit handlers
but also inform the postmaster that it's not a crash exit.

To make sure nobody breaks the "nothing to undo" expectation, add
a cross-check that no on-shmem-exit or before-shmem-exit handlers
have been registered yet when we finish using these signal handlers.

This change is simple enough that maybe it could be back-patched,
but I won't risk that right now.

Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
2020-09-11 12:20:16 -04:00
Michael Paquier
aad546bd0a doc: Fix some grammar and inconsistencies
Some comments are fixed while at it.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20200818171702.GK17022@telsasoft.com
Backpatch-through: 9.6
2020-09-10 15:50:19 +09:00
Tom Lane
c9ae5cbb88 Install an error check into cancel_before_shmem_exit().
Historically, cancel_before_shmem_exit() just silently did nothing
if the specified callback wasn't the top-of-stack.  The folly of
ignoring this case was exposed by the bugs fixed in 303640199 and
bab150045, so let's make it throw elog(ERROR) instead.

There is a decent argument to be made that PG_ENSURE_ERROR_CLEANUP
should use some separate infrastructure, so it wouldn't break if
something inside the guarded code decides to register a new
before_shmem_exit callback.  However, a survey of the surviving
uses of before_shmem_exit() and PG_ENSURE_ERROR_CLEANUP doesn't
show any plausible conflicts of that sort today, so for now we'll
forgo the extra complexity.  (It will almost certainly become
necessary if anyone ever wants to wrap PG_ENSURE_ERROR_CLEANUP
around arbitrary user-defined actions, though.)

No backpatch, since this is developer support not a production issue.

Bharath Rupireddy, per advice from Andres Freund, Robert Haas, and myself

Discussion: https://postgr.es/m/CALj2ACWk7j4F2v2fxxYfrroOF=AdFNPr1WsV+AGtHAFQOqm_pw@mail.gmail.com
2020-09-08 15:54:25 -04:00
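
The behavior change in commit c9ae5cbb88 can be modeled with a small callback stack: cancelling only succeeds for the most recently registered callback, and a mismatch is reported rather than ignored. This is a sketch with hypothetical names and a boolean return in place of elog(ERROR); it is not the code in ipc.c.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*ExitCallback) (int arg);

#define MAX_CALLBACKS 8

static struct
{
    ExitCallback fn;
    int          arg;
} cb_stack[MAX_CALLBACKS];
static int cb_count = 0;

/* Push a callback; callbacks would run in reverse registration order. */
void
register_callback(ExitCallback fn, int arg)
{
    assert(cb_count < MAX_CALLBACKS);
    cb_stack[cb_count].fn = fn;
    cb_stack[cb_count].arg = arg;
    cb_count++;
}

/*
 * Pop the callback only if it is the most recently registered one.
 * Returns false when it is not, so the caller can raise an error instead
 * of silently leaving a stale entry behind.
 */
bool
cancel_callback(ExitCallback fn, int arg)
{
    if (cb_count > 0 &&
        cb_stack[cb_count - 1].fn == fn &&
        cb_stack[cb_count - 1].arg == arg)
    {
        cb_count--;
        return true;
    }
    return false;
}

/* Number of callbacks currently registered. */
int
callback_count(void)
{
    return cb_count;
}

/* Two do-nothing callbacks for experimenting with the stack. */
void cleanup_a(int arg) { (void) arg; }
void cleanup_b(int arg) { (void) arg; }
```

Registering `cleanup_a` and then `cleanup_b`, and trying to cancel `cleanup_a` first, returns false and leaves both entries in place, which is the case the commit turns into an error.
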
Andres Freund
5871f09c98 Fix autovacuum cancellation.
The problem is caused by me (Andres) having ProcSleep() look at the
wrong PGPROC entry in 5788e258bb.

Unfortunately it seems hard to write a reliable test for autovacuum
cancellations. Perhaps somebody will come up with a good approach, but
it seems worth fixing the issue even without a test.

Reported-By: Jeff Janes <jeff.janes@gmail.com>
Author: Jeff Janes <jeff.janes@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1wH2aUy+wDRDz+5RZALdcUnEofV1t9PzXS_gBJO9vZZ0Q@mail.gmail.com
2020-09-08 11:25:34 -07:00
Thomas Munro
861c6e7c8e Skip unnecessary stat() calls in walkdir().
Some kernels can tell us the type of a "dirent", so we can avoid a call
to stat() or lstat() in many cases.  Define a new function
get_dirent_type() to contain that logic, for use by the backend and
frontend versions of walkdir(), and perhaps other callers in future.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BFzxupGGN4GpUdbzZN%2Btn6FQPHo8w0Q%2BAPH5Wz8RG%2Bww%40mail.gmail.com
2020-09-07 18:28:06 +12:00
Tom Lane
695de5d1ed Split Makefile symbol CFLAGS_VECTOR into two symbols.
Replace CFLAGS_VECTOR with CFLAGS_UNROLL_LOOPS and CFLAGS_VECTORIZE,
allowing us to distinguish whether we want to apply -funroll-loops,
-ftree-vectorize, or both to a particular source file.  Up to now
the only consumer of the symbol has been checksum.c which wants
both, so that there was no need to distinguish; but that's about
to change.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
2020-09-06 21:28:16 -04:00
Magnus Hagander
2a093355aa Fix typo in comment
Author: Hou, Zhijie
2020-09-06 19:26:55 +02:00
Tom Lane
a5cc4dab6d Yet more elimination of dead stores and useless initializations.
I'm not sure what tool Ranier was using, but the ones I contributed
were found by using a newer version of scan-build than I tried before.

Ranier Vilela and Tom Lane

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com
2020-09-05 13:17:32 -04:00
Bruce Momjian
e36e936e0e remove redundant initializations
Reported-by: Ranier Vilela

Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com

Author: Ranier Vilela

Backpatch-through: master
2020-09-03 22:57:35 -04:00
Amit Kapila
4ab77697f6 Fix the SharedFileSetUnregister API.
Commit 808e13b282 introduced a few APIs to extend the existing BufFile
interface. In SharedFileSetDeleteOnProcExit, it tries to delete the list
element while traversing the list with the 'foreach' construct, which makes the
behavior of list traversal unpredictable.

Author: Amit Kapila
Reviewed-by: Dilip Kumar
Tested-by: Dilip Kumar and Neha Sharma
Discussion: https://postgr.es/m/CAA4eK1JhLatVcQ2OvwA_3s0ih6Hx9+kZbq107cXVsSWWukH7vA@mail.gmail.com
2020-09-01 08:11:39 +05:30
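
The hazard named in commit 4ab77697f6, deleting an element while iterating over the list that holds it, is a general one. The sketch below shows one common safe pattern on a plain singly linked list: keep a pointer to the link that leads to the current node. It is not the fix the commit applied, and PostgreSQL's List type works differently; all names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node
{
    int          value;
    struct Node *next;
} Node;

/* Prepend a value; returns the new head (or the old head if malloc fails). */
Node *
list_push(Node *head, int value)
{
    Node *n = malloc(sizeof(Node));

    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Remove every node equal to `value` and return how many were removed.
 * `link` always points at the pointer that leads to the current node, so
 * unlinking a node never invalidates the traversal cursor.
 */
int
list_remove_all(Node **head, int value)
{
    int    removed = 0;
    Node **link = head;

    assert(head != NULL);
    while (*link != NULL)
    {
        Node *cur = *link;

        if (cur->value == value)
        {
            *link = cur->next;  /* unlink first, then free */
            free(cur);
            removed++;
        }
        else
            link = &cur->next;  /* advance only when nothing was removed */
    }
    return removed;
}

/* Count the nodes in the list. */
int
list_length(const Node *head)
{
    int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

The cursor advances only when the current node is kept, which is what a loop macro that steps unconditionally cannot guarantee.
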
Michael Paquier
77c7267c37 Fix comment in procarray.c
The description of GlobalVisDataRels was missing, GlobalVisCatalogRels
being mentioned instead.

Author: Jim Nasby
Discussion: https://postgr.es/m/8e06c883-2858-1fd4-07c5-560c28b08dcd@amazon.com
2020-08-27 16:40:34 +09:00
Tom Lane
e942af7b82 Suppress compiler warning in non-cassert builds.
Oversight in 808e13b28, reported by Bruce Momjian.

Discussion: https://postgr.es/m/20200826160251.GB21909@momjian.us
2020-08-26 17:08:11 -04:00
Amit Kapila
808e13b282 Extend the BufFile interface.
Allow BufFile to support temporary files that can be used by a single
backend when the corresponding files need to survive across
transactions and need to be opened and closed multiple times. Such files
need to be created as a member of a SharedFileSet.

Additionally, this commit implements the interface for BufFileTruncate to
allow files to be truncated up to a particular offset and extends the
BufFileSeek API to support the SEEK_END case. This also adds an option to
provide a mode while opening the shared BufFiles instead of always opening
in read-only mode.

These enhancements to the BufFile interface are required by an upcoming
patch that allows the replication apply worker to handle streamed
in-progress transactions.

Author: Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
2020-08-26 07:36:43 +05:30
Fujii Masao
d259afa736 Fix typos in comments.
Author: Masahiko Sawada
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+fd4k4m9hFSrRLB3etPWO5_v5=MujVZWRtz63q+55hM0Dz25Q@mail.gmail.com
2020-08-21 12:35:22 +09:00
Andres Freund
1fe1f42e3e Acquire ProcArrayLock exclusively in ProcArrayClearTransaction.
This corrects an oversight by me in 2072932407, which made
ProcArrayClearTransaction() increment xactCompletionCount. That requires an
exclusive lock, obviously.

There are other approaches that avoid the exclusive acquisition, but given that a
2PC commit is fairly heavyweight, it doesn't seem worth doing so. I've not been
able to measure a performance difference, unsurprisingly.  I did add a
comment documenting that we could do so, should it ever become a bottleneck.

Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/1355915.1597794204@sss.pgh.pa.us
2020-08-19 18:24:33 -07:00
Andres Freund
07f32fcd23 Fix race condition in snapshot caching when 2PC is used.
When preparing a transaction xactCompletionCount needs to be
incremented, even though the transaction has not committed
yet. Otherwise the snapshot used within the transaction can
get reused outside of the prepared transaction. As GetSnapshotData()
does not include the current xid when building a snapshot, reuse would
not be correct.

Somewhat surprisingly the regression tests only rarely show incorrect
results without the fix. The reason for that is that often the
snapshot's xmax will be >= the backend xid, yielding a snapshot that
is correct, despite the bug.

I'm working on a reliable test for the bug, but it seems worth seeing
whether this fixes all the BF failures while I do.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org
2020-08-18 16:31:12 -07:00