1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-21 00:42:43 +03:00
Commit Graph

687 Commits

Author SHA1 Message Date
Andres Freund
d6d8054dc7 localbuf: Track pincount in BufferDesc as well
For AIO on temporary table buffers the AIO subsystem needs to be able to
ensure a pin on a buffer while AIO is going on, even if the IO issuing query
errors out. Tracking the buffer in LocalRefCount does not work, as it would
cause CheckForLocalBufferLeaks() to assert out.

Instead, also track the refcount in BufferDesc.state, not just
LocalRefCount. This also makes local buffers behave a bit more akin to shared
buffers.

Note that we still don't need locking, AIO completion callbacks for local
buffers are executed in the issuing session (i.e. nobody else has access to
the BufferDesc).

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-29 16:36:51 -04:00
Andres Freund
08ccd56ac7 aio, bufmgr: Comment fixes/improvements
Some of these comments have been wrong for a while (12f3867f55), some I
recently introduced (da7226993f, 55b454d0e1). This includes an update to a
comment in FlushBuffer(), which will be copied in a future commit.

These changes seem big enough to be worth doing in separate commits.

Suggested-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com
2025-03-29 14:45:42 -04:00
Andres Freund
dee8002468 Fix mis-attribution of checksum failure stats to the wrong database
Checksum failure stats could be attributed to the wrong database in two cases:

- when a read of a shared relation encountered a checksum error , it would be
  attributed to the current database, instead of the "database" representing
  shared relations

- when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the
  source database would be attributed to the current database

The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does
not have access to the information about what database a page belongs to.

This fixes the issue by removing PIV_REPORT_STAT and delegating the
responsibility to report stats to the caller, which now can learn about the
number of stats via a new optional argument.

As this changes the signature of PageIsVerifiedExtended() and all callers
should adapt to the new signature, use the occasion to rename the function to
PageIsVerified() and remove the compatibility macro.

We could instead have fixed this by adding information about the database to
the args of PageIsVerified(), but there are soon-to-be-applied patches that
need to separate the stats reporting from the PageIsVerified() call
anyway. Those patches also include testing for the failure paths, something we
inexplicably have not had.

As there is no caller of pgstat_report_checksum_failure() left, remove it.

It'd be possible, but awkward to fix this in the back branches. We considered
doing the work not quite worth it, as mis-attributed stats should still elicit
concern. The emitted error messages do allow to attribute the errors
correctly.

Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az
Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz
2025-03-29 13:38:35 -04:00
Thomas Munro
ce1a75c4fe Support buffer forwarding in StartReadBuffers().
StartReadBuffers() reports a short read when it finds a cached block
that ends a range needing I/O by updating the caller's *nblocks.  It
doesn't want to have to unpin the trailing hit that it knows the caller
wants, so the v17 version used sleight of hand in the name of
simplicity: it included it in *nblocks as if it were part of the I/O,
but internally tracked the shorter real I/O size in io_buffers_len (now
removed).

This API change "forwards" the delimiting buffer to the next call.  It's
still pinned, and still stored in the caller's array, but *nblocks no
longer includes stray buffers that are not really part of the operation.
The expectation is that the caller still wants the rest of the blocks
and will call again starting from that point, and now it can pass the
already pinned buffer back in (or choose not to and release it).

The change is needed for the coming asynchronous I/O version's larger
version of the problem: by definition it must move BM_IO_IN_PROGRESS
negotiation from WaitReadBuffers() to StartReadBuffers(), but it might
already have many buffers pinned before it discovers a need to split an
I/O.  (The current synchronous I/O version hides that detail from
callers by looping over smaller reads if required to make all covered
buffers valid in WaitReadBuffers(), so it looks like one operation but
it might occasionally be several under the covers.)

Aside from avoiding unnecessary pin traffic, this will also be important
for later work on out-of-order streams: you can't prioritize data that
is already available right now if that fact is hidden from you.

The new API is natural for read_stream.c (see ed0b87ca).  After a short
read it leaves forwarded buffers where they fell in its circular queue
for the continuing call to pick up.

Single-block StartReadBuffer() and traditional ReadBuffer() share code
but are not affected by the change.  They don't do multi-block I/O.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-21 20:43:59 +13:00
Andres Freund
202b12774d bufmgr: Improve stats when a buffer is read in concurrently
Previously we would have the following inaccuracies when a backend tried to
read in a buffer, but that buffer was read in concurrently by another backend:
- the read IO was double-counted in the global buffer access stats (pgBufferUsage)
- the buffer hit was not accounted for in:
  - global buffer access statistics
  - pg_stat_io
  - relation level IO stats
  - vacuum cost balancing

While trying to read in a buffer that is concurrently read in by another
backend is not a common occurrence, it's also not that rare, e.g. due to
concurrent sequential scans on the same relation.  This scenario has become
more likely in PG 17, due to the introducing of read streams, which can pin
multiple buffers before calling StartBufferIO() for all the buffers.

This behaviour has historically grown, but there doesn't seem to be any reason
to continue with the wrong accounting.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_Zk-B08AzPsO-6680LUHLOCGaNJYofaxTFseLa=OepV1g@mail.gmail.com
2025-03-20 19:58:22 -04:00
Thomas Munro
10f6646847 Introduce io_max_combine_limit.
The existing io_combine_limit can be changed by users.  The new
io_max_combine_limit is fixed at server startup time, and functions as a
silent clamp on the user setting.  That in itself is probably quite
useful, but the primary motivation is:

aio_init.c allocates shared memory for all asynchronous IOs including
some per-block data, and we didn't want to waste memory you'd never used
by assuming they could be up to PG_IOV_MAX.  This commit already halves
the size of 'AioHandleIov' and 'AioHandleData'.  A follow-up commit can
now expand PG_IOV_MAX without affecting that.

Since our GUC system doesn't support dependencies or cross-checks
between GUCs, the user-settable one now assigns a "raw" value to
io_combine_limit_guc, and the lower of io_combine_limit_guc and
io_max_combine_limit is maintained in io_combine_limit.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
2025-03-19 15:23:54 +13:00
Andres Freund
771ba90298 localbuf: Introduce StartLocalBufferIO()
To initiate IO on a shared buffer we have StartBufferIO(). For temporary table
buffers no similar function exists - likely because the code for that
currently is very simple due to the lack of concurrency.

However, the upcoming AIO support will make it possible to re-encounter a
local buffer, while the buffer already is the target of IO. In that case we
need to wait for already in-progress IO to complete. This commit makes it
easier to add the necessary code, by introducing StartLocalBufferIO().

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
4b4d33b9ea localbuf: Introduce FlushLocalBuffer()
Previously we had two paths implementing writing out temporary table
buffers. For shared buffers, the logic for that is centralized in
FlushBuffer(). Introduce FlushLocalBuffer() to do the same for local buffers.

Besides being a nice cleanup on its own, it also makes an upcoming change
slightly easier.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
dd6f2618f6 localbuf: Introduce TerminateLocalBufferIO()
Previously TerminateLocalBufferIO() was open-coded in multiple places, which
doesn't seem like a great idea. While TerminateLocalBufferIO() currently is
rather simple, an upcoming patch requires additional code to be added to
TerminateLocalBufferIO(), making this modification particularly worthwhile.

For some reason FlushRelationBuffers() previously cleared BM_JUST_DIRTIED,
even though that's never set for temporary buffers. This is not carried over
as part of this change.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
0762a151b0 localbuf: Introduce InvalidateLocalBuffer()
Previously, there were three copies of this code, two of them
identical. There's no good reason for that.

This change is nice on its own, but the main motivation is the AIO patchset,
which needs to add extra checks the deduplicated code, which of course is
easier if there is only one version.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Andres Freund
fa6af9b25e localbuf: Fix dangerous coding pattern in GetLocalVictimBuffer()
If PinLocalBuffer() were to modify the buf_state, the buf_state in
GetLocalVictimBuffer() would be out of date. Currently that does not happen,
as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and
GetLocalVictimBuffer() passes false.

However, it's easy to make this not the case anymore - it cost me a few hours
to debug the consequences.

The minimal fix would be to just refetch the buf_state after after calling
PinLocalBuffer(), but the same danger exists in later parts of the
function. Instead, declare buf_state in the narrower scopes and re-read the
state in conditional branches.  Besides being safer, it also fits well with
an upcoming set of cleanup patches that move the contents of the conditional
branches in GetLocalVictimBuffer() into helper functions.

I "broke" this in 794f259447.

Arguably this should be backpatched, but as the relevant functions are not
exported and there is no actual misbehaviour, I chose to not backpatch, at
least for now.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15 22:07:48 -04:00
Thomas Munro
01261fb078 Improve buffer manager API for backend pin limits.
Previously the support functions assumed that the caller needed one pin
to make progress, and could optionally use some more, allowing enough
for every connection to do the same.  Add a couple more functions for
callers that want to know:

* what the maximum possible number could be, irrespective of currently
  held pins, for space planning purposes

* how many additional pins they could acquire right now, without the
  special case allowing one pin, for callers that already hold pins and
  could already make progress even if no extra pins are available

The pin limit logic began in commit 31966b15.  This refactoring is
better suited to read_stream.c, which will be adjusted to respect the
remaining limit as it changes over time in a follow-up commit.  It also
computes MaxProportionalPins up front, to avoid performing divisions
whenever a caller needs to check the balance.

Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-14 17:13:09 +13:00
Michael Paquier
6c349d83b6 Re-add GUC track_wal_io_timing
This commit is a rework of 2421e9a51d, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads.  Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths.  The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).

track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal.  pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled.  The read part matters during recovery
or if a XLogReader is used.

Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).

Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
2025-02-26 09:49:59 +09:00
Andres Freund
37c87e63f9 Change relpath() et al to return path by value
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.

The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.

We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.

To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.

As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
2025-02-25 09:02:07 -05:00
Michael Paquier
2421e9a51d Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time

a051e71e28 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects.  So, keeping the same information in pg_stat_wal
has little benefits.

Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.

The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-24 09:51:56 +09:00
Richard Guo
71d02dc478 Fix unsafe access to BufferDescriptors
When considering a local buffer, the GetBufferDescriptor() call in
BufferGetLSNAtomic() would be retrieving a shared buffer with a bad
buffer ID.  Since the code checks whether the buffer is shared before
using the retrieved BufferDesc, this issue did not lead to any
malfunction.  Nonetheless this seems like trouble waiting to happen,
so fix it by ensuring that GetBufferDescriptor() is only called when
we know the buffer is shared.

Author: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com
Backpatch-through: 13
2025-02-19 11:05:35 +09:00
Thomas Munro
9e17ac997f Remove obsolete comment.
Commit 755a4c10d1 prevented StartReadBuffers() from crossing md.c
segment boundaries in one operation, but a comment about that
possibility remained.
2025-02-14 13:16:05 +13:00
Peter Eisentraut
827b4060a8 Remove unnecessary (char *) casts [mem]
Remove (char *) casts around memory functions such as memcmp(),
memcpy(), or memset() where the cast is useless.  Since these
functions don't take char * arguments anyway, these casts are at best
complicated casts to (void *), about which see commit 7f798aca1d.

Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-12 08:50:13 +01:00
Michael Paquier
1e380fa7d8 Fix comment of StrategySyncStart()
The top comment of StrategySyncStart() mentions BufferSync(), but this
function calls BgBufferSync(), not BufferSync().

Oversight in 9cd00c457e.

Author: Ashutosh Bapat
Discussion: https://postgr.es/m/CAExHW5tgkjag8i-s=RFrCn5KAWDrC4zEPPkfUKczfccPOxBRQQ@mail.gmail.com
Backpatch-through: 13
2025-01-31 11:05:57 +09:00
Tom Lane
f6ff75f796 Make BufferIsExclusiveLocked and BufferIsDirty work for local buffers.
These functions tried to check the state of the buffer's content lock
even for local buffers.  Since we don't use the content lock for a
local buffer, that would lead to a "false" result from
LWLockHeldByMeInMode, which would mean a misleading "false" answer
from BufferIsExclusiveLocked (we'd rather that case always return
"true") or an assertion failure in BufferIsDirty.

The core code never applies these two functions to local buffers,
and apparently no extensions do either, since we've not heard
complaints.  Still, in the name of future-proofing, let's fix
them to act as though a pinned local buffer is content-locked.

Author: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/19396ef77f8.1098c4a1810508.2255483659262451647@zohocorp.com
2025-01-29 13:23:31 -05:00
Tom Lane
23d7562018 Remove PrintBufferDescs() and PrintPinnedBufs().
These have been #ifdef'd out for a long time, and in fact have
been uncompilable since commit 48354581a of 2016-04-10.  The
fact that nobody noticed for so long demonstrates their lack of
usefulness, so let's remove them rather than fix them.

Author: Jacob Brazeal <jacob.brazeal@gmail.com>
Discussion: https://postgr.es/m/CA+COZaB+9CN_f63PPRoVhHjYmCwwmb_9CWLxqCJdMWDqs1a-JA@mail.gmail.com
2025-01-19 14:00:22 -05:00
Michael Paquier
f92c854cf4 Make pg_stat_io count IOs as bytes instead of blocks for some operations
Currently in pg_stat_io view, IOs are counted as blocks of size
BLCKSZ.  There are two limitations with this design:
* The actual number of I/O requests sent to the kernel is lower because
I/O requests may be merged before being sent.  Additionally, it gives
the impression that all I/Os are done in block size, which shadows the
benefits of merging I/O requests.
* Some patches are under work to extend pg_stat_io for the tracking of
operations that may not be linked to the block size.  For example, WAL
read IOs are done in variable bytes and it is not possible to correctly
show these IOs in pg_stat_io view, and we want to keep all this data in
a single system view rather than spread it across multiple relations to
ease monitoring.

WaitReadBuffers() can now be tracked as a single read operation
worth N blocks.  Same for ExtendBufferedRelShared() and
ExtendBufferedRelLocal() for extensions.

Three columns are added to pg_stat_io for reads, writes and extensions
for the byte calculations.  op_bytes, which was always hardcoded to
BLCKSZ, is removed.  IO backend statistics are updated to reflect these
changes.

Bump catalog version.

Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ0oqxBaaHAEsj=xFqkzE3n5P=3RA1V_igXwL-RV7QRzyw@mail.gmail.com
2025-01-14 12:14:29 +09:00
Michael Paquier
f0bf7857be Merge pgstat_count_io_op_n() and pgstat_count_io_op()
The pgstat_count_io_op() function, which counts a single I/O operation,
wraps pgstat_count_io_op_n() with a counter value of 1.  The latter is
declared in pgstat.h and used nowhere in the code, so let's remove it in
favor of the former.

This change makes also the code more symmetric with
pgstat_count_io_op_time(), that already uses a similar set of arguments,
except that it counts also the I/O time.  This will ease a bit the
integration of a follow-up patch that adds byte-level tracking in
pg_stat_io for some of its attributes, lifting the current restriction
based on BLCKSZ as all I/O operations are assumed to be block-based.

Author: Nazir Bilal Yavuz
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CAN55FZ32ze812=yjyZg1QeXhKvACUM_Nu0_gyPQcUKKuVHL5xA@mail.gmail.com
2025-01-10 09:57:27 +09:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Peter Eisentraut
8743ea1b2e Remove useless casts to (const void *)
Similar to commit 7f798aca1d, but I didn't think to look for "const"
as well.
2024-12-06 18:49:01 +01:00
Peter Eisentraut
7f798aca1d Remove useless casts to (void *)
Many of them just seem to have been copied around for no real reason.
Their presence causes (small) risks of hiding actual type mismatches
or silently discarding qualifiers

Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
2024-11-28 08:27:20 +01:00
Peter Eisentraut
a274bbb1b3 Remove a useless cast to (void *) in hash_search() call
This pattern was previously cleaned up in 54a177a948, but a new
instance snuck in around the same time in 31966b151e.
2024-11-14 09:30:14 +01:00
Peter Eisentraut
e18512c000 Remove unused #include's from backend .c files
as determined by IWYU

These are mostly issues that are new since commit dbbca2cf29.

Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
2024-10-27 08:26:50 +01:00
Michael Paquier
57a36e890d Fix grammar of a comment in bufmgr.c
Author: Junwang Zhao
Discussion: https://postgr.es/m/CAEG8a3L5YjxXCjx0LhkwHdDGsNgpFGEqH7SqtXRPNP+dwFMVZQ@mail.gmail.com
2024-10-21 11:25:29 +09:00
Andres Freund
755a4c10d1 bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers()
With real AIO it doesn't make sense to cross segment boundaries with one
IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be
merged.

We could continue to cross segment boundaries when not using AIO, but it
doesn't really make sense, because md.c will never be able to perform the read
across the segment boundary in one system call. Which means we'll mark more
buffers as undergoing IO than really makes sense - if another backend desires
to read the same blocks, it'll be blocked longer than necessary. So it seems
better to just never cross the boundary.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
2024-10-08 11:37:45 -04:00
Andres Freund
488f826c72 bufmgr: Return early in ScheduleBufferTagForWriteback() if fsync=off
As pg_flush_data() doesn't do anything with fsync disabled, there's no point
in tracking the buffer for writeback. Arguably the better fix would be to
change pg_flush_data() to flush data even with fsync off, but that's a
behavioral change, whereas this is just a small optimization.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
2024-10-08 11:37:45 -04:00
Noah Misch
c582b75851 Add block_range_read_stream_cb(), to deduplicate code.
This replaces two functions for iterating over all blocks in a range.  A
pending patch will use this instead of adding a third.

Nazir Bilal Yavuz

Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com
2024-09-03 10:46:20 -07:00
Daniel Gustafsson
31a98934d1 Fix typos in code comments and test data
The typos in 005_negotiate_encryption.pl and pg_combinebackup.c
shall be backported to v17 where they were introduced.

Backpatch-through: v17
Discussion: https://postgr.es/m/Ztaj7BkN4658OMxF@paquier.xyz
2024-09-03 11:33:38 +02:00
Heikki Linnakangas
478846e768 Rename some shared memory initialization routines
To make them follow the usual naming convention where
FoobarShmemSize() calculates the amount of shared memory needed by
Foobar subsystem, and FoobarShmemInit() performs the initialization.

I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are
a little special. They need to be called before any of the other
ShmemInit() functions, because they set up the shared memory
bookkeeping itself. I also didn't rename InitProcGlobal(), because
unlike other Shmeminit functions, it's not called by individual
backends.

Reviewed-by: Andreas Karlsson
Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi
2024-08-29 09:46:21 +03:00
Masahiko Sawada
c584781bcc Use pgBufferUsage for buffer usage tracking in analyze.
Previously, (auto)analyze used global variables VacuumPageHit,
VacuumPageMiss, and VacuumPageDirty to track buffer usage. However,
pgBufferUsage provides a more generic way to track buffer usage with
support functions.

This change replaces those global variables with pgBufferUsage in
analyze. Since analyze was the sole user of those variables, it
removes their declarations. Vacuum previously used those variables but
replaced them with pgBufferUsage as part of a bug fix, commit
5cd72cc0c.

Additionally, it adjusts the buffer usage message in both vacuum and
analyze for better consistency.

Author: Anthonin Bonnefoy
Reviewed-by: Masahiko Sawada, Michael Paquier
Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
2024-08-13 18:49:45 -07:00
Noah Misch
840b3b5b4e Fix private struct field name to match the code using it.
Commit 8720a15e9a added the wrong name.

Nazir Bilal Yavuz

Discussion: https://postgr.es/m/20240720181405.5a.nmisch@google.com
2024-07-23 05:32:03 -07:00
Noah Misch
8720a15e9a Use read streams in CREATE DATABASE when STRATEGY=WAL_LOG.
While this doesn't significantly change runtime now, it arranges for
STRATEGY=WAL_LOG to benefit automatically from future optimizations to
the read_stream subsystem.  For large tables in the template database,
this does read 16x as many bytes per system call.  Platforms with high
per-call overhead, if any, may see an immediate benefit.

Nazir Bilal Yavuz

Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com
2024-07-20 04:22:12 -07:00
Noah Misch
af07a827b9 Refactor PinBufferForBlock() to remove checks about persistence.
There are checks in PinBufferForBlock() function to set persistence of
the relation.  This function is called for each block in the relation.
Instead, set persistence of the relation before PinBufferForBlock().

Nazir Bilal Yavuz

Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com
2024-07-20 04:22:12 -07:00
Noah Misch
e00c45f685 Remove "smgr_persistence == 0" dead code.
Reaching that code would have required multiple processes performing
relation extension during recovery, which does not happen.  That caller
has the persistence available, so pass it.  This was dead code as soon
as commit 210622c60e added it.

Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com
2024-07-20 04:22:12 -07:00
Thomas Munro
e656657f2b Fix RBM_ZERO_AND_LOCK.
Commit 210622c6 accidentally zeroed out pages even if they were found in
the buffer pool.  It should always lock the page, but it should only
zero pages that were not already valid.  Otherwise, concurrent readers
that hold only a pin could see corrupted page contents changing under
their feet.

While here, rename ZeroAndLockBuffer() to match the RBM_ flag name.
Also restore a some useful comments lost by 210622c6's refactoring, and
add some new ones to clarify why we need to use the BM_IO_IN_PROGRESS
infrastructure despite not doing I/O.

Reported-by: Noah Misch <noah@leadboat.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version)
Discussion: https://postgr.es/m/20240512171658.7e.nmisch@google.com
Discussion: https://postgr.es/m/7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4%40gmail.com
2024-06-10 12:32:59 +12:00
Peter Eisentraut
17974ec259 Revise GUC names quoting in messages again
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.

This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages.  It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
2024-05-17 11:44:26 +02:00
Daniel Gustafsson
950d4a2cb1 Fix typos and duplicate words
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18 21:28:07 +02:00
Masahiko Sawada
810f64a015 Revert indexed and enlargable binary heap implementation.
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:18:05 +09:00
Thomas Munro
13453eedd3 Add pg_buffercache_evict() function for testing.
When testing buffer pool logic, it is useful to be able to evict
arbitrary blocks.  This function can be used in SQL queries over the
pg_buffercache view to set up a wide range of buffer pool states.  Of
course, buffer mappings might change concurrently so you might evict a
block other than the one you had in mind, and another session might
bring it back in at any time.  That's OK for the intended purpose of
setting up developer testing scenarios, and more complicated interlocking
schemes to give stronger guararantees about that would likely be less
flexible for actual testing work anyway.  Superuser-only.

Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks)
Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Cary Huang <cary.huang@highgo.ca>
Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com>
Reviewed-by: Jim Nasby <jim.nasby@gmail.com>
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
2024-04-08 16:23:40 +12:00
Thomas Munro
98f320eb2e Increase default vacuum_buffer_usage_limit to 2MB.
The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago.  Commit 1cbbee03 recently made it
configurable but retained the traditional default.  The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL.  New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner.  Let's make the default 2MB.  That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be a considered a
small shared_buffers setting for a real system these days.  Users are
still free to set the GUC to a different value.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
2024-04-06 23:12:03 +13:00
Thomas Munro
3bd8439ed6 Allow BufferAccessStrategy to limit pin count.
While pinning extra buffers to look ahead, users of strategies are in
danger of using too many buffers.  For some strategies, that means
"escaping" from the ring, and in others it means forcing dirty data to
disk very frequently with associated WAL flushing.  Since external code
has no insight into any of that, allow individual strategy types to
expose a clamp that should be applied when deciding how many buffers to
pin at once.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
2024-04-06 23:11:45 +13:00
Masahiko Sawada
b840508644 Add functions to binaryheap for efficient key removal and update.
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).

This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.

The current code does not use the new indexing logic, but it will be
used by an upcoming patch.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:44:21 +09:00
Thomas Munro
210622c60e Provide vectored variant of ReadBuffer().
Break ReadBuffer() up into two steps.  StartReadBuffers() and
WaitReadBuffers() give us two main advantages:

1.  Multiple consecutive blocks can be read with one system call.
2.  Advice (hints of future reads) can optionally be issued to the
kernel ahead of time.

The traditional ReadBuffer() function is now implemented in terms of
those functions, to avoid duplication.

A new GUC io_combine_limit is defined, and the functions for limiting
per-backend pin counts are made into public APIs.  Those are provided
for use by callers of StartReadBuffers(), when deciding how many buffers
to read at once.  The following commit will add a higher level mechanism
for doing that automatically with a practical interface.

With some more infrastructure in later work, StartReadBuffers() could
be extended to start real asynchronous I/O instead of just issuing
advice and leaving WaitReadBuffers() to do the work synchronously.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de> (some optimization tweaks)
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2024-04-03 00:23:20 +13:00
Peter Eisentraut
dbbca2cf29 Remove unused #include's from backend .c files
as determined by include-what-you-use (IWYU)

While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that.  In some cases, a more
specific #include replaces another less specific one.

Some manual adjustments of the automatic result:

- IWYU currently doesn't know about includes that provide global
  variable declarations (like -Wmissing-variable-declarations), so
  those includes are being kept manually.

- All includes for port(ability) headers are being kept for now, to
  play it safe.

- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
  patch from exploding in size.

Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.

As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.

Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04 12:02:20 +01:00
Heikki Linnakangas
024c521117 Replace BackendIds with 0-based ProcNumbers
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.

The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.

One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for backend with ProcNumber 0.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:38:22 +02:00