postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-21 00:42:43 +03:00

Author	SHA1	Message	Date
Andres Freund	d6d8054dc7	localbuf: Track pincount in BufferDesc as well For AIO on temporary table buffers the AIO subsystem needs to be able to ensure a pin on a buffer while AIO is going on, even if the IO issuing query errors out. Tracking the buffer in LocalRefCount does not work, as it would cause CheckForLocalBufferLeaks() to assert out. Instead, also track the refcount in BufferDesc.state, not just LocalRefCount. This also makes local buffers behave a bit more akin to shared buffers. Note that we still don't need locking, AIO completion callbacks for local buffers are executed in the issuing session (i.e. nobody else has access to the BufferDesc). Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt	2025-03-29 16:36:51 -04:00
Andres Freund	08ccd56ac7	aio, bufmgr: Comment fixes/improvements Some of these comments have been wrong for a while (`12f3867f55`), some I recently introduced (`da7226993f`, `55b454d0e1`). This includes an update to a comment in FlushBuffer(), which will be copied in a future commit. These changes seem big enough to be worth doing in separate commits. Suggested-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250319212530.80.nmisch@google.com	2025-03-29 14:45:42 -04:00
Andres Freund	dee8002468	Fix mis-attribution of checksum failure stats to the wrong database Checksum failure stats could be attributed to the wrong database in two cases: - when a read of a shared relation encountered a checksum error , it would be attributed to the current database, instead of the "database" representing shared relations - when using CREATE DATABASE ... STRATEGY WAL_LOG checksum errors in the source database would be attributed to the current database The checksum stats reporting via PageIsVerifiedExtended(PIV_REPORT_STAT) does not have access to the information about what database a page belongs to. This fixes the issue by removing PIV_REPORT_STAT and delegating the responsibility to report stats to the caller, which now can learn about the number of stats via a new optional argument. As this changes the signature of PageIsVerifiedExtended() and all callers should adapt to the new signature, use the occasion to rename the function to PageIsVerified() and remove the compatibility macro. We could instead have fixed this by adding information about the database to the args of PageIsVerified(), but there are soon-to-be-applied patches that need to separate the stats reporting from the PageIsVerified() call anyway. Those patches also include testing for the failure paths, something we inexplicably have not had. As there is no caller of pgstat_report_checksum_failure() left, remove it. It'd be possible, but awkward to fix this in the back branches. We considered doing the work not quite worth it, as mis-attributed stats should still elicit concern. The emitted error messages do allow to attribute the errors correctly. Discussion: https://postgr.es/m/5tyic6epvdlmd6eddgelv47syg2b5cpwffjam54axp25xyq2ga@ptwkinxqo3az Discussion: https://postgr.es/m/mglpvvbhighzuwudjxzu4br65qqcxsnyvio3nl4fbog3qknwhg@e4gt7npsohuz	2025-03-29 13:38:35 -04:00
Thomas Munro	ce1a75c4fe	Support buffer forwarding in StartReadBuffers(). StartReadBuffers() reports a short read when it finds a cached block that ends a range needing I/O by updating the caller's nblocks. It doesn't want to have to unpin the trailing hit that it knows the caller wants, so the v17 version used sleight of hand in the name of simplicity: it included it in nblocks as if it were part of the I/O, but internally tracked the shorter real I/O size in io_buffers_len (now removed). This API change "forwards" the delimiting buffer to the next call. It's still pinned, and still stored in the caller's array, but *nblocks no longer includes stray buffers that are not really part of the operation. The expectation is that the caller still wants the rest of the blocks and will call again starting from that point, and now it can pass the already pinned buffer back in (or choose not to and release it). The change is needed for the coming asynchronous I/O version's larger version of the problem: by definition it must move BM_IO_IN_PROGRESS negotiation from WaitReadBuffers() to StartReadBuffers(), but it might already have many buffers pinned before it discovers a need to split an I/O. (The current synchronous I/O version hides that detail from callers by looping over smaller reads if required to make all covered buffers valid in WaitReadBuffers(), so it looks like one operation but it might occasionally be several under the covers.) Aside from avoiding unnecessary pin traffic, this will also be important for later work on out-of-order streams: you can't prioritize data that is already available right now if that fact is hidden from you. The new API is natural for read_stream.c (see `ed0b87ca`). After a short read it leaves forwarded buffers where they fell in its circular queue for the continuing call to pick up. Single-block StartReadBuffer() and traditional ReadBuffer() share code but are not affected by the change. They don't do multi-block I/O. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com	2025-03-21 20:43:59 +13:00
Andres Freund	202b12774d	bufmgr: Improve stats when a buffer is read in concurrently Previously we would have the following inaccuracies when a backend tried to read in a buffer, but that buffer was read in concurrently by another backend: - the read IO was double-counted in the global buffer access stats (pgBufferUsage) - the buffer hit was not accounted for in: - global buffer access statistics - pg_stat_io - relation level IO stats - vacuum cost balancing While trying to read in a buffer that is concurrently read in by another backend is not a common occurrence, it's also not that rare, e.g. due to concurrent sequential scans on the same relation. This scenario has become more likely in PG 17, due to the introducing of read streams, which can pin multiple buffers before calling StartBufferIO() for all the buffers. This behaviour has historically grown, but there doesn't seem to be any reason to continue with the wrong accounting. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_Zk-B08AzPsO-6680LUHLOCGaNJYofaxTFseLa=OepV1g@mail.gmail.com	2025-03-20 19:58:22 -04:00
Thomas Munro	10f6646847	Introduce io_max_combine_limit. The existing io_combine_limit can be changed by users. The new io_max_combine_limit is fixed at server startup time, and functions as a silent clamp on the user setting. That in itself is probably quite useful, but the primary motivation is: aio_init.c allocates shared memory for all asynchronous IOs including some per-block data, and we didn't want to waste memory you'd never used by assuming they could be up to PG_IOV_MAX. This commit already halves the size of 'AioHandleIov' and 'AioHandleData'. A follow-up commit can now expand PG_IOV_MAX without affecting that. Since our GUC system doesn't support dependencies or cross-checks between GUCs, the user-settable one now assigns a "raw" value to io_combine_limit_guc, and the lower of io_combine_limit_guc and io_max_combine_limit is maintained in io_combine_limit. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com	2025-03-19 15:23:54 +13:00
Andres Freund	771ba90298	localbuf: Introduce StartLocalBufferIO() To initiate IO on a shared buffer we have StartBufferIO(). For temporary table buffers no similar function exists - likely because the code for that currently is very simple due to the lack of concurrency. However, the upcoming AIO support will make it possible to re-encounter a local buffer, while the buffer already is the target of IO. In that case we need to wait for already in-progress IO to complete. This commit makes it easier to add the necessary code, by introducing StartLocalBufferIO(). Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	4b4d33b9ea	localbuf: Introduce FlushLocalBuffer() Previously we had two paths implementing writing out temporary table buffers. For shared buffers, the logic for that is centralized in FlushBuffer(). Introduce FlushLocalBuffer() to do the same for local buffers. Besides being a nice cleanup on its own, it also makes an upcoming change slightly easier. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	dd6f2618f6	localbuf: Introduce TerminateLocalBufferIO() Previously TerminateLocalBufferIO() was open-coded in multiple places, which doesn't seem like a great idea. While TerminateLocalBufferIO() currently is rather simple, an upcoming patch requires additional code to be added to TerminateLocalBufferIO(), making this modification particularly worthwhile. For some reason FlushRelationBuffers() previously cleared BM_JUST_DIRTIED, even though that's never set for temporary buffers. This is not carried over as part of this change. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	0762a151b0	localbuf: Introduce InvalidateLocalBuffer() Previously, there were three copies of this code, two of them identical. There's no good reason for that. This change is nice on its own, but the main motivation is the AIO patchset, which needs to add extra checks the deduplicated code, which of course is easier if there is only one version. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Andres Freund	fa6af9b25e	localbuf: Fix dangerous coding pattern in GetLocalVictimBuffer() If PinLocalBuffer() were to modify the buf_state, the buf_state in GetLocalVictimBuffer() would be out of date. Currently that does not happen, as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and GetLocalVictimBuffer() passes false. However, it's easy to make this not the case anymore - it cost me a few hours to debug the consequences. The minimal fix would be to just refetch the buf_state after after calling PinLocalBuffer(), but the same danger exists in later parts of the function. Instead, declare buf_state in the narrower scopes and re-read the state in conditional branches. Besides being safer, it also fits well with an upcoming set of cleanup patches that move the contents of the conditional branches in GetLocalVictimBuffer() into helper functions. I "broke" this in `794f259447`. Arguably this should be backpatched, but as the relevant functions are not exported and there is no actual misbehaviour, I chose to not backpatch, at least for now. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com	2025-03-15 22:07:48 -04:00
Thomas Munro	01261fb078	Improve buffer manager API for backend pin limits. Previously the support functions assumed that the caller needed one pin to make progress, and could optionally use some more, allowing enough for every connection to do the same. Add a couple more functions for callers that want to know: * what the maximum possible number could be, irrespective of currently held pins, for space planning purposes * how many additional pins they could acquire right now, without the special case allowing one pin, for callers that already hold pins and could already make progress even if no extra pins are available The pin limit logic began in commit `31966b15`. This refactoring is better suited to read_stream.c, which will be adjusted to respect the remaining limit as it changes over time in a follow-up commit. It also computes MaxProportionalPins up front, to avoid performing divisions whenever a caller needs to check the balance. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com	2025-03-14 17:13:09 +13:00
Michael Paquier	6c349d83b6	Re-add GUC track_wal_io_timing This commit is a rework of `2421e9a51d`, about which Andres Freund has raised some concerns as it is valuable to have both track_io_timing and track_wal_io_timing in some cases, as the WAL write and fsync paths can be a major bottleneck for some workloads. Hence, it can be relevant to not calculate the WAL timings in environments where pg_test_timing performs poorly while capturing some IO data under track_io_timing for the non-WAL IO paths. The opposite can be also true: it should be possible to disable the non-WAL timings and enable the WAL timings (the previous GUC setups allowed this possibility). track_wal_io_timing is added back in this commit, controlling if WAL timings should be calculated in pg_stat_io for the read, fsync and write paths, as done previously with pg_stat_wal. pg_stat_wal previously tracked only the sync and write parts (now removed), read stats is new data tracked in pg_stat_io, all three are aggregated if track_wal_io_timing is enabled. The read part matters during recovery or if a XLogReader is used. Extra note: more control over if the types of timings calculated in pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp). Reported-by: Andres Freund <andres@anarazel.de> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu	2025-02-26 09:49:59 +09:00
Andres Freund	37c87e63f9	Change relpath() et al to return path by value For AIO, and also some other recent patches, we need the ability to call relpath() in a critical section. Until now that was not feasible, as it allocated memory. The fact that relpath() allocated memory also made it awkward to use in log messages because we had to take care to free the memory afterwards. Which we e.g. didn't do for when zeroing out an invalid buffer. We discussed other solutions, e.g. filling a pre-allocated buffer that's passed to relpath(), but they all came with plenty downsides or were larger projects. The easiest fix seems to be to make relpath() return the path by value. To be able to return the path by value we need to determine the maximum length of a relation path. This patch adds a long #define that computes the exact maximum, which is verified to be correct in a regression test. As this change the signature of relpath(), extensions using it will need to adapt their code. We discussed leaving a backward-compat shim in place, but decided it's not worth it given the use of relpath() doesn't seem widespread. Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra	2025-02-25 09:02:07 -05:00
Michael Paquier	2421e9a51d	Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing The four following attributes are removed from pg_stat_wal: * wal_write * wal_sync * wal_write_time * wal_sync_time `a051e71e28` has added an equivalent of this information in pg_stat_io with more granularity as this now spreads across the backend types, IO context and IO objects. So, keeping the same information in pg_stat_wal has little benefits. Another benefit of this commit is the removal of PendingWalStats, simplifying an upcoming patch to add per-backend WAL statistics, which already support IO statistics and which have access to the write/sync stats data of WAL. The GUC track_wal_io_timing, that was used to enable or disable the aggregation of the write and sync timings for WAL, is also removed. pgstat_prepare_io_time() is simplified. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-24 09:51:56 +09:00
Richard Guo	71d02dc478	Fix unsafe access to BufferDescriptors When considering a local buffer, the GetBufferDescriptor() call in BufferGetLSNAtomic() would be retrieving a shared buffer with a bad buffer ID. Since the code checks whether the buffer is shared before using the retrieved BufferDesc, this issue did not lead to any malfunction. Nonetheless this seems like trouble waiting to happen, so fix it by ensuring that GetBufferDescriptor() is only called when we know the buffer is shared. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com Backpatch-through: 13	2025-02-19 11:05:35 +09:00
Thomas Munro	9e17ac997f	Remove obsolete comment. Commit `755a4c10d1` prevented StartReadBuffers() from crossing md.c segment boundaries in one operation, but a comment about that possibility remained.	2025-02-14 13:16:05 +13:00
Peter Eisentraut	827b4060a8	Remove unnecessary (char ) casts [mem] Remove (char ) casts around memory functions such as memcmp(), memcpy(), or memset() where the cast is useless. Since these functions don't take char * arguments anyway, these casts are at best complicated casts to (void *), about which see commit `7f798aca1d`. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-12 08:50:13 +01:00
Michael Paquier	1e380fa7d8	Fix comment of StrategySyncStart() The top comment of StrategySyncStart() mentions BufferSync(), but this function calls BgBufferSync(), not BufferSync(). Oversight in `9cd00c457e`. Author: Ashutosh Bapat Discussion: https://postgr.es/m/CAExHW5tgkjag8i-s=RFrCn5KAWDrC4zEPPkfUKczfccPOxBRQQ@mail.gmail.com Backpatch-through: 13	2025-01-31 11:05:57 +09:00
Tom Lane	f6ff75f796	Make BufferIsExclusiveLocked and BufferIsDirty work for local buffers. These functions tried to check the state of the buffer's content lock even for local buffers. Since we don't use the content lock for a local buffer, that would lead to a "false" result from LWLockHeldByMeInMode, which would mean a misleading "false" answer from BufferIsExclusiveLocked (we'd rather that case always return "true") or an assertion failure in BufferIsDirty. The core code never applies these two functions to local buffers, and apparently no extensions do either, since we've not heard complaints. Still, in the name of future-proofing, let's fix them to act as though a pinned local buffer is content-locked. Author: Srinath Reddy <srinath2133@gmail.com> Discussion: https://postgr.es/m/19396ef77f8.1098c4a1810508.2255483659262451647@zohocorp.com	2025-01-29 13:23:31 -05:00
Tom Lane	23d7562018	Remove PrintBufferDescs() and PrintPinnedBufs(). These have been #ifdef'd out for a long time, and in fact have been uncompilable since commit `48354581a` of 2016-04-10. The fact that nobody noticed for so long demonstrates their lack of usefulness, so let's remove them rather than fix them. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Discussion: https://postgr.es/m/CA+COZaB+9CN_f63PPRoVhHjYmCwwmb_9CWLxqCJdMWDqs1a-JA@mail.gmail.com	2025-01-19 14:00:22 -05:00
Michael Paquier	f92c854cf4	Make pg_stat_io count IOs as bytes instead of blocks for some operations Currently in pg_stat_io view, IOs are counted as blocks of size BLCKSZ. There are two limitations with this design: * The actual number of I/O requests sent to the kernel is lower because I/O requests may be merged before being sent. Additionally, it gives the impression that all I/Os are done in block size, which shadows the benefits of merging I/O requests. * Some patches are under work to extend pg_stat_io for the tracking of operations that may not be linked to the block size. For example, WAL read IOs are done in variable bytes and it is not possible to correctly show these IOs in pg_stat_io view, and we want to keep all this data in a single system view rather than spread it across multiple relations to ease monitoring. WaitReadBuffers() can now be tracked as a single read operation worth N blocks. Same for ExtendBufferedRelShared() and ExtendBufferedRelLocal() for extensions. Three columns are added to pg_stat_io for reads, writes and extensions for the byte calculations. op_bytes, which was always hardcoded to BLCKSZ, is removed. IO backend statistics are updated to reflect these changes. Bump catalog version. Author: Nazir Bilal Yavuz Reviewed-by: Bertrand Drouvot, Melanie Plageman Discussion: https://postgr.es/m/CAN55FZ0oqxBaaHAEsj=xFqkzE3n5P=3RA1V_igXwL-RV7QRzyw@mail.gmail.com	2025-01-14 12:14:29 +09:00
Michael Paquier	f0bf7857be	Merge pgstat_count_io_op_n() and pgstat_count_io_op() The pgstat_count_io_op() function, which counts a single I/O operation, wraps pgstat_count_io_op_n() with a counter value of 1. The latter is declared in pgstat.h and used nowhere in the code, so let's remove it in favor of the former. This change makes also the code more symmetric with pgstat_count_io_op_time(), that already uses a similar set of arguments, except that it counts also the I/O time. This will ease a bit the integration of a follow-up patch that adds byte-level tracking in pg_stat_io for some of its attributes, lifting the current restriction based on BLCKSZ as all I/O operations are assumed to be block-based. Author: Nazir Bilal Yavuz Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAN55FZ32ze812=yjyZg1QeXhKvACUM_Nu0_gyPQcUKKuVHL5xA@mail.gmail.com	2025-01-10 09:57:27 +09:00
Bruce Momjian	50e6eb731d	Update copyright for 2025 Backpatch-through: 13	2025-01-01 11:21:55 -05:00
Peter Eisentraut	8743ea1b2e	Remove useless casts to (const void *) Similar to commit `7f798aca1d`, but I didn't think to look for "const" as well.	2024-12-06 18:49:01 +01:00
Peter Eisentraut	7f798aca1d	Remove useless casts to (void *) Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org	2024-11-28 08:27:20 +01:00
Peter Eisentraut	a274bbb1b3	Remove a useless cast to (void *) in hash_search() call This pattern was previously cleaned up in `54a177a948`, but a new instance snuck in around the same time in `31966b151e`.	2024-11-14 09:30:14 +01:00
Peter Eisentraut	e18512c000	Remove unused #include's from backend .c files as determined by IWYU These are mostly issues that are new since commit `dbbca2cf29`. Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org	2024-10-27 08:26:50 +01:00
Michael Paquier	57a36e890d	Fix grammar of a comment in bufmgr.c Author: Junwang Zhao Discussion: https://postgr.es/m/CAEG8a3L5YjxXCjx0LhkwHdDGsNgpFGEqH7SqtXRPNP+dwFMVZQ@mail.gmail.com	2024-10-21 11:25:29 +09:00
Andres Freund	755a4c10d1	bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers() With real AIO it doesn't make sense to cross segment boundaries with one IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be merged. We could continue to cross segment boundaries when not using AIO, but it doesn't really make sense, because md.c will never be able to perform the read across the segment boundary in one system call. Which means we'll mark more buffers as undergoing IO than really makes sense - if another backend desires to read the same blocks, it'll be blocked longer than necessary. So it seems better to just never cross the boundary. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Andres Freund	488f826c72	bufmgr: Return early in ScheduleBufferTagForWriteback() if fsync=off As pg_flush_data() doesn't do anything with fsync disabled, there's no point in tracking the buffer for writeback. Arguably the better fix would be to change pg_flush_data() to flush data even with fsync off, but that's a behavioral change, whereas this is just a small optimization. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Noah Misch	c582b75851	Add block_range_read_stream_cb(), to deduplicate code. This replaces two functions for iterating over all blocks in a range. A pending patch will use this instead of adding a third. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com	2024-09-03 10:46:20 -07:00
Daniel Gustafsson	31a98934d1	Fix typos in code comments and test data The typos in 005_negotiate_encryption.pl and pg_combinebackup.c shall be backported to v17 where they were introduced. Backpatch-through: v17 Discussion: https://postgr.es/m/Ztaj7BkN4658OMxF@paquier.xyz	2024-09-03 11:33:38 +02:00
Heikki Linnakangas	478846e768	Rename some shared memory initialization routines To make them follow the usual naming convention where FoobarShmemSize() calculates the amount of shared memory needed by Foobar subsystem, and FoobarShmemInit() performs the initialization. I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are a little special. They need to be called before any of the other ShmemInit() functions, because they set up the shared memory bookkeeping itself. I also didn't rename InitProcGlobal(), because unlike other Shmeminit functions, it's not called by individual backends. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:21 +03:00
Masahiko Sawada	c584781bcc	Use pgBufferUsage for buffer usage tracking in analyze. Previously, (auto)analyze used global variables VacuumPageHit, VacuumPageMiss, and VacuumPageDirty to track buffer usage. However, pgBufferUsage provides a more generic way to track buffer usage with support functions. This change replaces those global variables with pgBufferUsage in analyze. Since analyze was the sole user of those variables, it removes their declarations. Vacuum previously used those variables but replaced them with pgBufferUsage as part of a bug fix, commit `5cd72cc0c`. Additionally, it adjusts the buffer usage message in both vacuum and analyze for better consistency. Author: Anthonin Bonnefoy Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com	2024-08-13 18:49:45 -07:00
Noah Misch	840b3b5b4e	Fix private struct field name to match the code using it. Commit `8720a15e9a` added the wrong name. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240720181405.5a.nmisch@google.com	2024-07-23 05:32:03 -07:00
Noah Misch	8720a15e9a	Use read streams in CREATE DATABASE when STRATEGY=WAL_LOG. While this doesn't significantly change runtime now, it arranges for STRATEGY=WAL_LOG to benefit automatically from future optimizations to the read_stream subsystem. For large tables in the template database, this does read 16x as many bytes per system call. Platforms with high per-call overhead, if any, may see an immediate benefit. Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	af07a827b9	Refactor PinBufferForBlock() to remove checks about persistence. There are checks in PinBufferForBlock() function to set persistence of the relation. This function is called for each block in the relation. Instead, set persistence of the relation before PinBufferForBlock(). Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	e00c45f685	Remove "smgr_persistence == 0" dead code. Reaching that code would have required multiple processes performing relation extension during recovery, which does not happen. That caller has the persistence available, so pass it. This was dead code as soon as commit `210622c60e` added it. Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Thomas Munro	e656657f2b	Fix RBM_ZERO_AND_LOCK. Commit `210622c6` accidentally zeroed out pages even if they were found in the buffer pool. It should always lock the page, but it should only zero pages that were not already valid. Otherwise, concurrent readers that hold only a pin could see corrupted page contents changing under their feet. While here, rename ZeroAndLockBuffer() to match the RBM_ flag name. Also restore a some useful comments lost by 210622c6's refactoring, and add some new ones to clarify why we need to use the BM_IO_IN_PROGRESS infrastructure despite not doing I/O. Reported-by: Noah Misch <noah@leadboat.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Discussion: https://postgr.es/m/20240512171658.7e.nmisch@google.com Discussion: https://postgr.es/m/7ed10231-ce47-03d5-d3f9-4aea0dc7d5a4%40gmail.com	2024-06-10 12:32:59 +12:00
Peter Eisentraut	17974ec259	Revise GUC names quoting in messages again After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes `a243569bf6` and `8d9978a717`, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com	2024-05-17 11:44:26 +02:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Masahiko Sawada	810f64a015	Revert indexed and enlargable binary heap implementation. This reverts commit `b840508644` and `bcb14f4abc`. These commits were made for commit `5bec1d6bc5` (Improve eviction algorithm in ReorderBuffer using max-heap for many subtransactions). However, per discussion, commit `efb8acc0d0` replaced binary heap + index with pairing heap, and made these commits unnecessary. Reported-by: Jeff Davis Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com	2024-04-11 17:18:05 +09:00
Thomas Munro	13453eedd3	Add pg_buffercache_evict() function for testing. When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guararantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com	2024-04-08 16:23:40 +12:00
Thomas Munro	98f320eb2e	Increase default vacuum_buffer_usage_limit to 2MB. The BAS_VACUUM ring size has been 256kB since commit `d526575f` introduced the mechanism 17 years ago. Commit `1cbbee03` recently made it configurable but retained the traditional default. The correct default size has been debated for years, but 256kB is certainly very small. VACUUM soon needs to write back data it dirtied only 32 blocks ago, which usually requires flushing the WAL. New experiments in prefetching pages for VACUUM exacerbated the problem by crashing into dirty data even sooner. Let's make the default 2MB. That's 1.6% of the default toy buffer pool size, and 0.2% of 1GB, which would be a considered a small shared_buffers setting for a real system these days. Users are still free to set the GUC to a different value. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com	2024-04-06 23:12:03 +13:00
Thomas Munro	3bd8439ed6	Allow BufferAccessStrategy to limit pin count. While pinning extra buffers to look ahead, users of strategies are in danger of using too many buffers. For some strategies, that means "escaping" from the ring, and in others it means forcing dirty data to disk very frequently with associated WAL flushing. Since external code has no insight into any of that, allow individual strategy types to expose a clamp that should be applied when deciding how many buffers to pin at once. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com	2024-04-06 23:11:45 +13:00
Masahiko Sawada	b840508644	Add functions to binaryheap for efficient key removal and update. Previously, binaryheap didn't support updating a key and removing a node in an efficient way. For example, in order to remove a node from the binaryheap, the caller had to pass the node's position within the array that the binaryheap internally has. Removing a node from the binaryheap is done in O(log n) but searching for the key's position is done in O(n). This commit adds a hash table to binaryheap in order to track the position of each nodes in the binaryheap. That way, by using newly added functions such as binaryheap_update_up() etc., both updating a key and removing a node can be done in O(1) on an average and O(log n) in worst case. This is known as the indexed binary heap. The caller can specify to use the indexed binaryheap by passing indexed = true. The current code does not use the new indexing logic, but it will be used by an upcoming patch. Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian, Tomas Vondra, Shubham Khanna Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com	2024-04-03 10:44:21 +09:00
Thomas Munro	210622c60e	Provide vectored variant of ReadBuffer(). Break ReadBuffer() up into two steps. StartReadBuffers() and WaitReadBuffers() give us two main advantages: 1. Multiple consecutive blocks can be read with one system call. 2. Advice (hints of future reads) can optionally be issued to the kernel ahead of time. The traditional ReadBuffer() function is now implemented in terms of those functions, to avoid duplication. A new GUC io_combine_limit is defined, and the functions for limiting per-backend pin counts are made into public APIs. Those are provided for use by callers of StartReadBuffers(), when deciding how many buffers to read at once. The following commit will add a higher level mechanism for doing that automatically with a practical interface. With some more infrastructure in later work, StartReadBuffers() could be extended to start real asynchronous I/O instead of just issuing advice and leaving WaitReadBuffers() to do the work synchronously. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> (some optimization tweaks) Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:23:20 +13:00
Peter Eisentraut	dbbca2cf29	Remove unused #include's from backend .c files as determined by include-what-you-use (IWYU) While IWYU also suggests to add a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org	2024-03-04 12:02:20 +01:00
Heikki Linnakangas	024c521117	Replace BackendIds with 0-based ProcNumbers Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:38:22 +02:00

1 2 3 4 5 ...

687 Commits