1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00
Commit Graph

2291 Commits

Author SHA1 Message Date
Tom Lane
28d936031a Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
Andres Freund
d9d8aa9bb9 Deduplicate choice of horizon for a relation procarray.c.
5a1e1d8302 was a minimal bug fix for dc7420c2c9. To avoid future bugs of
that kind, deduplicate the choice of a relation's horizon into a new helper,
GlobalVisHorizonKindForRel().

As the code in question was only introduced in dc7420c2c9 it seems worth
backpatching this change as well, otherwise 14 will look different from all
other branches.

A different approach to this was suggested by Matthias van de Meent.

Author: Andres Freund
Discussion: https://postgr.es/m/20210621122919.2qhu3pfugxxp3cji@alap3.anarazel.de
Backpatch: 14, like 5a1e1d8302
2021-07-24 20:25:37 -07:00
Tom Lane
6310809c4a Fix check for conflicting session- vs transaction-level locks.
We have an implementation restriction that PREPARE TRANSACTION can't
handle cases where both session-lifespan and transaction-lifespan locks
are held on the same lockable object.  (That's because we'd otherwise
need to acquire a new PROCLOCK entry during post-prepare cleanup, which
is an operation that might fail.  The situation can only arise with odd
usages of advisory locks, so removing the restriction is probably not
worth the amount of effort it would take.)  AtPrepare_Locks attempted
to enforce this, but its logic was many bricks shy of a load, because
it only detected cases where the session and transaction locks had the
same lockmode.  Locks of different modes on the same object would lead
to the rather unhelpful message "PANIC: we seem to have dropped a bit
somewhere".

To fix, build a transient hashtable with one entry per locktag,
not one per locktag + mode, and use that to detect conflicts.

Per bug #17122 from Alexander Pyhalov.  This bug is ancient,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17122-04f3c32098a62233@postgresql.org
2021-07-24 18:35:52 -04:00
Thomas Munro
0c08856856 Don't use #if inside function-like macro arguments.
No concrete problem reported, but in the past it's been known to cause
problems on some compilers so let's avoid doing that.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/234364.1626704007%40sss.pgh.pa.us
2021-07-20 11:19:13 +12:00
Thomas Munro
04cad8f7bc Adjust commit 2dbe8905 for ancient macOS.
A couple of open flags used in an assertion didn't exist in macOS 10.4.
Per build farm animal prairiedog.  Also add O_EXCL while here (there are
a few more standard flags but they're not relevant and likely to be
missing).
2021-07-19 16:50:21 +12:00
Thomas Munro
2dbe890571 Support direct I/O on macOS.
Macs don't understand O_DIRECT, but they can disable caching with a
separate fcntl() call.  Extend the file opening functions in fd.c to
handle this for us if the caller passes in PG_O_DIRECT.

For now, this affects only WAL data and even then only if you set:

  max_wal_senders=0
  wal_level=minimal

This is not expected to be very useful on its own, but later proposed
patches will make greater use of direct I/O, and it'll be useful for
testing if developers on Macs can see the effects.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKG%2BADiyyHe0cun2wfT%2BSVnFVqNYPxoO6J9zcZkVO7%2BNGig%40mail.gmail.com
2021-07-19 11:01:01 +12:00
Tom Lane
a49d081235 Replace explicit PIN entries in pg_depend with an OID range test.
As of v14, pg_depend contains almost 7000 "pin" entries recording
the OIDs of built-in objects.  This is a fair amount of bloat for
every database, and it adds time to pg_depend lookups as well as
initdb.  We can get rid of all of those entries in favor of an OID
range check, i.e. "OIDs below FirstUnpinnedObjectId are pinned".

(template1 and the public schema are exceptions.  Those exceptions
are now wired into IsPinnedObject() instead of initdb's code for
filling pg_depend, but it's the same amount of cruft either way.)

The contents of pg_shdepend are modified likewise.

Discussion: https://postgr.es/m/3737988.1618451008@sss.pgh.pa.us
2021-07-15 11:41:47 -04:00
Peter Eisentraut
55b2a23407 Fix lack of message pluralization 2021-07-14 09:15:14 +02:00
Tom Lane
f10f0ae420 Replace RelationOpenSmgr() with RelationGetSmgr().
The idea behind this patch is to design out bugs like the one fixed
by commit 9d523119f.  Previously, once one did RelationOpenSmgr(rel),
it was considered okay to access rel->rd_smgr directly for some
not-very-clear interval.  But since that pointer will be cleared by
relcache flushes, we had bugs arising from overreliance on a previous
RelationOpenSmgr call still being effective.

Now, very little code except that in rel.h and relcache.c should ever
touch the rd_smgr field directly.  The normal coding rule is to use
RelationGetSmgr(rel) and not expect the result to be valid for longer
than one smgr function call.  There are a couple of places where using
the function every single time seemed like overkill, but they are now
annotated with large warning comments.

Amul Sul, after an idea of mine.

Discussion: https://postgr.es/m/CANiYTQsU7yMFpQYnv=BrcRVqK_3U3mtAzAsJCaqtzsDHfsUbdQ@mail.gmail.com
2021-07-12 17:01:36 -04:00
Thomas Munro
2f78338064 Remove more obsolete comments about semaphores.
Commit 6753333f stopped using semaphores as the sleep/wake mechanism for
heavyweight locks, but some obsolete references to that scheme remained
in comments.  As with similar commit 25b93a29, back-patch all the way.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CA%2BhUKGLafjB1uzXcy%3D%3D2L3cy7rjHkqOVn7qRYGBjk%3D%3DtMJE7Yg%40mail.gmail.com
2021-07-09 18:04:24 +12:00
Thomas Munro
a2595e039c Remove some dead stores.
Remove redundant local variable assignments left behind by commit
2fc7af5e96.

Author: Quan Zongliang <quanzongliang@yeah.net>
Reviewed-by: Jacob Champion <pchampion@vmware.com>
Discussion: https://postgr.es/m/de141d14-4fd6-3148-99bf-856b71aa948a%40yeah.net
2021-07-02 13:32:21 +12:00
David Rowley
3788c66788 Improve various places that double the size of a buffer
Several places were performing a tight loop to determine the first power
of 2 number that's > or >= the required memory.  Instead of using a loop
for that, we can use pg_nextpower2_32 or pg_nextpower2_64.  When we need a
power of 2 number equal to or greater than a given amount, we just pass
the amount to the nextpower2 function.  When we need a power of 2 greater
than the amount, we just pass the amount + 1.

Additionally, in tsearch there were a couple of locations that were
performing a while loop when a simple "if" would have done.  In both of
these locations only 1 item is being added, so the loop could only have
ever iterated once.  Changing the loop into an if statement makes the code
very slightly more optimal as the condition is checked once rather than
twice.

There are quite a few remaining locations that increase the size of the
buffer in the following form:

  while (reqsize >= buflen)
  {
     buflen *= 2;
     buf = repalloc(buf, buflen);
  }

These are not touched in this commit.  repalloc will error out for sizes
larger than MaxAllocSize.  Changing these to use pg_nextpower2_32 would
remove the chance of that error being raised.  It's unclear from the code
if the sizes could ever become that large, so err on the side of caution.

Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com
Reviewed-by: Zhihong Yu
2021-07-01 15:29:06 +12:00
Andrew Dunstan
e1c1c30f63 Pre branch pgindent / pgperltidy run
Along the way make a slight adjustment to
src/include/utils/queryjumble.h to avoid an unused typedef.
2021-06-28 11:05:54 -04:00
Peter Eisentraut
c31833779d Message style improvements 2021-06-28 08:36:44 +02:00
Andres Freund
5a1e1d8302 Use correct horizon when vacuuming catalog relations.
In dc7420c2c9 I (Andres) accidentally used
RelationIsAccessibleInLogicalDecoding() as the sole condition to use the
non-shared catalog horizon in GetOldestNonRemovableTransactionId(). That is
incorrect, as RelationIsAccessibleInLogicalDecoding() checks whether wal_level
is logical.

The correct check, as done e.g. in GlobalVisTestFor(), is to check
IsCatalogRelation() and RelationIsAccessibleInLogicalDecoding().

The observed misbehavior of this bug was that there could be an endless loop
in lazy_scan_prune(), because the horizons used in heap_page_prune() and the
individual tuple liveliness checks did not match. Likely there are other
potential consequences as well.

A later commit will unify the determination which horizon has to be used, and
add additional assertions to make it easier to catch a bug like this.

Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Diagnosed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wg32Y9+WJfw=aofkRx1ZRFt_Ev6bNPc4PSaz7PjSFtZgQ@mail.gmail.com
2021-06-21 05:13:46 -07:00
Noah Misch
5f1df62a45 Remove pg_wait_for_backend_termination().
It was unable to wait on a backend that had already left the procarray.
Users tolerant of that limitation can poll pg_stat_activity.  Other
users can employ the "timeout" argument of pg_terminate_backend().

Reviewed by Bharath Rupireddy.

Discussion: https://postgr.es/m/20210605013236.GA208701@rfd.leadboat.com
2021-06-14 17:29:37 -07:00
Noah Misch
0aac73e6a2 Copy-edit text for the pg_terminate_backend() "timeout" parameter.
Revert the pg_description entry to its v13 form, since those messages
usually remain shorter and don't discuss individual parameters.  No
catversion bump, since pg_description content does not impair backend
compatibility or application compatibility.

Justin Pryzby

Discussion: https://postgr.es/m/20210612182743.GY16435@telsasoft.com
2021-06-14 17:29:37 -07:00
Andres Freund
d8e950d3ae Improve and cleanup ProcArrayAdd(), ProcArrayRemove().
941697c3c1 changed ProcArrayAdd()/Remove() substantially. As reported by
zhanyi, I missed that due to the introduction of the PGPROC->pgxactoff
ProcArrayRemove() does not need to search for the position in
procArray->pgprocnos anymore - that's pgxactoff. Remove the search loop.

The removal of the search loop reduces assertion coverage a bit - but we can
easily do better than before by adding assertions to other loops over
pgprocnos elements.

Also do a bit of cleanup, mostly by reducing line lengths by introducing local
helper variables and adding newlines.

Author: zhanyi <w@hidva.com>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/tencent_5624AA3B116B3D1C31CA9744@qq.com
2021-06-11 21:33:07 -07:00
David Rowley
2ded19fa3a Fix typo and outdated information in README.barrier
README.barrier didn't seem to get the memo when atomics were added. Fix
that.

Author: Tatsuo Ishii, David Rowley
Discussion: https://postgr.es/m/20210516.211133.2159010194908437625.t-ishii%40sraoss.co.jp
Backpatch-through: 9.6, oldest supported release
2021-05-18 09:54:56 +12:00
Tom Lane
def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Thomas Munro
c2dc19342e Revert recovery prefetching feature.
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.

Commits reverted:

dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.

Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.

Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-10 16:06:09 +12:00
Peter Eisentraut
b94409a02f Prevent lwlock dtrace probes from unnecessary work
If dtrace is compiled in but disabled, the lwlock dtrace probes still
evaluate their arguments.  Since PostgreSQL 13, T_NAME(lock) does
nontrivial work, so it should be avoided if not needed.  To fix, make
these calls conditional on the *_ENABLED() macro corresponding to each
probe.

Reviewed-by: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
2021-05-03 12:18:27 +02:00
Peter Eisentraut
82c3cd9741 Factor out system call names from error messages
Instead, put them in via a format placeholder.  This reduces the
number of distinct translatable messages and also reduces the chances
of typos during translation.  We already did this for the system call
arguments in a number of cases, so this is just the same thing taken a
bit further.

Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com
2021-04-23 14:21:37 +02:00
Peter Eisentraut
9486844f30 Use correct format placeholder for WSAGetLastError()
Some code thought this was unsigned, but it's signed int.
2021-04-23 14:21:37 +02:00
Michael Paquier
7ef8b52cf0 Fix typos and grammar in comments and docs
Author: Justin Pryzby
Discussion: https://postgr.es/m/20210416070310.GG3315@telsasoft.com
2021-04-19 11:32:30 +09:00
Thomas Munro
1d257577e0 Optionally prefetch referenced data in recovery.
Introduce a new GUC recovery_prefetch, disabled by default.  When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.

The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.

The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.

Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
2021-04-08 23:20:42 +12:00
Magnus Hagander
aaf0432572 Add functions to wait for backend termination
This adds a function, pg_wait_for_backend_termination(), and a new
timeout argument to pg_terminate_backend(), which will wait for the
backend to actually terminate (with or without signaling it to do so
depending on which function is called). The default behaviour of
pg_terminate_backend() remains being timeout=0 which does not waiting.
For pg_wait_for_backend_termination() the default wait is 5 seconds.

Author: Bharath Rupireddy
Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama,
             Hou Zhijie, Magnus Hagander
Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com
2021-04-08 11:40:54 +02:00
Thomas Munro
2f27f8c511 Provide ReadRecentBuffer() to re-pin buffers by ID.
If you know the ID of a buffer that recently held a block that you would
like to pin, this function can be used check if it's still there.  It
can be used to avoid a second lookup in the buffer mapping table after
PrefetchBuffer() reports a cache hit.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
2021-04-08 17:50:25 +12:00
Peter Geoghegan
3c3b8a4b26 Truncate line pointer array during VACUUM.
Teach VACUUM to truncate the line pointer array of each heap page when a
contiguous group of LP_UNUSED line pointers appear at the end of the
array -- these unused and unreferenced items are excluded.  This process
occurs during VACUUM's second pass over the heap, right after LP_DEAD
line pointers on the page (those encountered/pruned during the first
pass) are marked LP_UNUSED.

Truncation avoids line pointer bloat with certain workloads,
particularly those involving continual range DELETEs and bulk INSERTs
against the same table.

Also harden heapam code to check for an out-of-range page offset number
in places where we weren't already doing so.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-Wzn6a64PJM1Ggzm=uvx2otsopJMhFQj_g1rAj4GWr3ZSzw@mail.gmail.com
2021-04-07 08:47:15 -07:00
Andres Freund
90c885cdab Increment xactCompletionCount during subtransaction abort.
Snapshot caching, introduced in 623a9ba79b, did not increment
xactCompletionCount during subtransaction abort. That could lead to an older
snapshot being reused. That is, at least as far as I can see, not a
correctness issue (for MVCC snapshots there's no difference between "in
progress" and "aborted"). The only difference between the old and new
snapshots would be a newer ->xmax.

While HeapTupleSatisfiesMVCC makes the same visibility determination, reusing
the old snapshot leads HeapTupleSatisfiesMVCC to not set
HEAP_XMIN_INVALID. Which subsequently causes the kill_prior_tuple optimization
to not kick in (via HeapTupleIsSurelyDead() returning false). The performance
effects of doing the same index-lookups over and over again is how the issue
was discovered...

Fix the issue by incrementing xactCompletionCount in
XidCacheRemoveRunningXids. It already acquires ProcArrayLock exclusively,
making that an easy proposition.

Add a test to ensure that kill_prior_tuple prevents index growth when it
involves aborted subtransaction of the current transaction.

Author: Andres Freund
Discussion: https://postgr.es/m/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de
Discussion: https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de
2021-04-06 09:24:50 -07:00
Fujii Masao
43620e3286 Add function to log the memory contexts of specified backend process.
Commit 3e98c0bafb added pg_backend_memory_contexts view to display
the memory contexts of the backend process. However its target process
is limited to the backend that is accessing to the view. So this is
not so convenient when investigating the local memory bloat of other
backend process. To improve this situation, this commit adds
pg_log_backend_memory_contexts() function that requests to log
the memory contexts of the specified backend process.

This information can be also collected by calling
MemoryContextStats(TopMemoryContext) via a debugger. But
this technique cannot be used in some environments because no debugger
is available there. So, pg_log_backend_memory_contexts() allows us to
see the memory contexts of specified backend more easily.

Only superusers are allowed to request to log the memory contexts
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its memory contexts at LOG_SERVER_ONLY level,
so that these memory contexts will appear in the server log but not
be sent to the client. It logs one message per memory context.
Because if it buffers all memory contexts into StringInfo to log them
as one message, which may require the buffer to be enlarged very much
and lead to OOM error since there can be a large number of memory
contexts in a backend.

When a backend process is consuming huge memory, logging all its
memory contexts might overrun available disk space. To prevent this,
now this patch limits the number of child contexts to log per parent
to 100. As with MemoryContextStats(), it supposes that practical cases
where the log gets long will typically be huge numbers of siblings
under the same parent context; while the additional debugging value
from seeing details about individual siblings beyond 100 will not be large.

There was another proposed patch to add the function to return
the memory contexts of specified backend as the result sets,
instead of logging them, in the discussion. However that patch is
not included in this commit because it had several issues to address.

Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra,
Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion.

Bump catalog version.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
2021-04-06 13:44:15 +09:00
Fujii Masao
ad8b674922 Shut down transaction tracking at startup process exit.
Maxim Orlov reported that the shutdown of standby server could result in
the following assertion failure. The cause of this issue was that,
when the shutdown caused the startup process to exit, recovery-time
transaction tracking was not shut down even if it's already initialized,
and some locks the tracked transactions were holding could not be released.
At this situation, if other process was invoked and the PGPROC entry that
the startup process used was assigned to it, it found such unreleased locks
and caused the assertion failure, during the initialization of it.

    TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))"

This commit fixes this issue by making the startup process shut down
transaction tracking and release all locks, at the exit of it.

Back-patch to all supported branches.

Reported-by: Maxim Orlov
Author: Fujii Masao
Reviewed-by: Maxim Orlov
Discussion: https://postgr.es/m/ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru
2021-04-06 02:25:37 +09:00
Andres Freund
225a22b19e Improve efficiency of wait event reporting, remove proc.h dependency.
pgstat_report_wait_start() and pgstat_report_wait_end() required two
conditional branches so far. One to check if MyProc is NULL, the other to
check if pgstat_track_activities is set. As wait events are used around
comparatively lightweight operations, and are inlined (reducing branch
predictor effectiveness), that's not great.

The dependency on MyProc has a second disadvantage: Low-level subsystems, like
storage/file/fd.c, report wait events, but architecturally it is preferable
for them to not depend on inter-process subsystems like proc.h (defining
PGPROC).  After this change including pgstat.h (nor obviously its
sub-components like backend_status.h, wait_event.h, ...) does not pull in IPC
related headers anymore.

These goals, efficiency and abstraction, are achieved by having
pgstat_report_wait_start/end() not interact with MyProc, but instead a new
my_wait_event_info variable. At backend startup it points to a local variable,
removing the need to check for MyProc being NULL. During process
initialization my_wait_event_info is redirected to MyProc->wait_event_info. At
shutdown this is reversed. Because wait event reporting now does not need to
know about where the wait event is stored, it does not need to know about
PGPROC anymore.

The removal of the branch for checking pgstat_track_activities is simpler:
Don't check anymore. The cost due to the branch are often higher than the
store - and even if not, pgstat_track_activities is rarely disabled.

The main motivator to commit this work now is that removing the (indirect)
pgproc.h include from pgstat.h simplifies a patch to move statistics reporting
to shared memory (which still has a chance to get into 14).

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-03 12:03:45 -07:00
Andres Freund
1d9c5d0ce2 Do not rely on pgstat.h to indirectly include storage/ headers.
An upcoming patch might remove the (now indirect) proc.h
include (which in turn includes other headers), and it's cleaner for
the modified files to include their dependencies directly anyway...

Discussion: https://postgr.es/m/20210402194458.2vu324hkk2djq6ce@alap3.anarazel.de
2021-04-02 20:02:47 -07:00
Stephen Frost
c9c41c7a33 Rename Default Roles to Predefined Roles
The term 'default roles' wasn't quite apt as these roles aren't able to
be modified or removed after installation, so rename them to be
'Predefined Roles' instead, adding an entry into the newly added
Obsolete Appendix to help users of current releases find the new
documentation.

Bruce Momjian and Stephen Frost

Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org
and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net
2021-04-01 15:32:06 -04:00
Etsuro Fujita
27e1f14563 Add support for asynchronous execution.
This implements asynchronous execution, which runs multiple parts of a
non-parallel-aware Append concurrently rather than serially to improve
performance when possible.  Currently, the only node type that can be
run concurrently is a ForeignScan that is an immediate child of such an
Append.  In the case where such ForeignScans access data on different
remote servers, this would run those ForeignScans concurrently, and
overlap the remote operations to be performed simultaneously, so it'll
improve the performance especially when the operations involve
time-consuming ones such as remote join and remote aggregation.

We may extend this to other node types such as joins or aggregates over
ForeignScans in the future.

This also adds the support for postgres_fdw, which is enabled by the
table-level/server-level option "async_capable".  The default is false.

Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself.  This commit
is mostly based on the patch proposed by Robert Haas, but also uses
stuff from the patch proposed by Kyotaro Horiguchi and from the patch
proposed by Thomas Munro.  Reviewed by Kyotaro Horiguchi, Konstantin
Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and
others.

Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com
Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com
Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
2021-03-31 18:45:00 +09:00
Peter Geoghegan
30aaab26e5 PageAddItemExtended(): Add LP_UNUSED assertion.
Assert that LP_UNUSED items have no storage.  If it's worth having
defensive code in non-assert builds then it's worth having an assertion
as well.
2021-03-28 20:10:02 -07:00
Michael Paquier
909b449e00 Fix concurrency issues with WAL segment recycling on Windows
This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename().  As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG:  could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied

This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used.  This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive.  The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename().  In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).

Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.

Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz
Backpatch-through: 13
2021-03-22 14:02:26 +09:00
Peter Eisentraut
96ae658e62 Move lwlock-release probe back where it belongs
The documentation specifically states that lwlock-release fires before
any released waiters have been awakened.  It worked that way until
ab5194e6f6, where is seems to have been
misplaced accidentally.  Move it back where it belongs.

Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
2021-03-21 08:02:30 +01:00
Thomas Munro
61752afb26 Provide recovery_init_sync_method=syncfs.
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery.  This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.

Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory.  This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.

The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.

Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
2021-03-20 12:07:28 +13:00
Tomas Vondra
cd91de0d17 Remove temporary files after backend crash
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).

This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.

Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
2021-03-18 17:38:28 +01:00
Fujii Masao
d75288fb27 Make archiver process an auxiliary process.
This commit changes WAL archiver process so that it's treated as
an auxiliary process and can use shared memory. This is an infrastructure
patch required for upcoming shared-memory based stats collector patch
series. These patch series basically need any processes including archiver
that can report the statistics to access to shared memory. Since this patch
itself is useful to simplify the code and when users monitor the status of
archiver, it's committed separately in advance.

This commit simplifies the code for WAL archiving. For example, previously
backends need to signal to archiver via postmaster when they notify
archiver that there are some WAL files to archive. On the other hand,
this commit removes that signal to postmaster and enables backends to
notify archier directly using shared latch.

Also, as the side of this change, the information about archiver process
becomes viewable at pg_stat_activity view.

Author: Kyotaro Horiguchi
Reviewed-by: Andres Freund, Álvaro Herrera, Julien Rouhaud, Tomas Vondra, Arthur Zakirov, Fujii Masao
Discussion: https://postgr.es/m/20180629.173418.190173462.horiguchi.kyotaro@lab.ntt.co.jp
2021-03-15 13:13:14 +09:00
Thomas Munro
1b88b8908e Specialize checkpointer sort functions.
When sorting a potentially large number of dirty buffers, the
checkpointer can benefit from a faster sort routine.  One reported
improvement on a large buffer pool system was 1.4s -> 0.6s.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
2021-03-12 23:56:02 +13:00
Amit Kapila
519e4c9ee2 Fix size overflow in calculation introduced by commits d6ad34f3 and bea449c6.
Reported-by: Thomas Munro
Author: Takayuki Tsunakawa
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CA+hUKG+oPoFizjABt=GXZWTEHx3oev5rAe2scjW2r6F1rguo5w@mail.gmail.com
2021-03-12 15:42:08 +05:30
Thomas Munro
43c6662496 Minor modernization for README.barrier.
Itanium is very uncommon and being discontinued.  ARM is everywhere.
Prefer ARM as an example of an architecture with weak memory ordering.
2021-03-12 15:36:16 +13:00
Thomas Munro
d87251048a Replace buffer I/O locks with condition variables.
1.  Backends waiting for buffer I/O are now interruptible.

2.  If something goes wrong in a backend that is currently performing
I/O, waiting backends no longer wake up until that backend reaches
AbortBufferIO() and broadcasts on the CV.  Previously, any waiters would
wake up (because the I/O lock was automatically released) and then
busy-loop until AbortBufferIO() cleared BM_IO_IN_PROGRESS.

3.  LWLockMinimallyPadded is removed, as it would now be unused.

Author: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier version, 2016)
Discussion: https://postgr.es/m/CA%2BhUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A%40mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr=C56Xng@mail.gmail.com
2021-03-11 10:36:17 +13:00
Thomas Munro
bd69ddfcdb Remove obsolete comment for WaitForProcSignalBarrier().
Commit 814f1d8b removed the behavior described.

Reported-by: Amit Kapila <amit.kapila16@gmail.com>
2021-03-02 09:30:57 +13:00
Thomas Munro
f5a5773a9d Allow condition variables to be used in interrupt code.
Adjust the condition variable sleep loop to work correctly when code
reached by its internal CHECK_FOR_INTERRUPTS() call interacts with
another condition variable.

There are no such cases currently, but a proposed patch would do this.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:24:47 +13:00
Thomas Munro
814f1d8bc3 Use condition variables for ProcSignalBarriers.
Instead of a poll/sleep loop, use a condition variable for precise
wake-up whenever a backend's pss_barrierGeneration advances.

Discussion: https://postgr.es/m/CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com
2021-03-01 17:23:43 +13:00
Thomas Munro
6148656a0b Use EVFILT_SIGNAL for kqueue latches.
Cut down on system calls and other overheads by waiting for SIGURG
explicitly with kqueue instead of using a signal handler and self-pipe.
Affects *BSD and macOS systems.

This leaves only the poll implementation with a signal handler and the
traditional self-pipe trick.

Discussion: https://postgr.es/m/CA+hUKGJjxPDpzBE0a3hyUywBvaZuC89yx3jK9RFZgfv_KHU7gg@mail.gmail.com
2021-03-01 14:20:04 +13:00