postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-18 02:02:55 +03:00

Author	SHA1	Message	Date
Michael Paquier	f554a95379	Remove initialization from PendingBackendStats `9a8dd2c5a6` has added an initialization to PendingBackendStats, which has been causing compilation warnings in the buildfarm. This code does not strictly require it as PendingBackendStats is always initialized with memset(0), so let's remove it. Per report from multiple buildfarm members, like ayu and batfish, via Tom Lane. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/1870853.1741749264@sss.pgh.pa.us	2025-03-12 20:37:43 +09:00
Heikki Linnakangas	043745c3a0	Improve snapmgr.c comment Add more details on the different kinds of snapshots, how to use them, and how the active snapshot stack works. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi	2025-03-11 23:28:38 +02:00
Heikki Linnakangas	8076c00592	Assert that a snapshot is active or registered before it's used The comment in GetTransactionSnapshot() said that you "should call RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be used very long". That felt too unclear to me. Make the comment more strongly worded. To enforce that rule and to catch potential bugs where a snapshot might get invalidated while it's still in use, add an assertion to HeapTupleSatisfiesMVCC() to check that the snapshot is registered or pushed to active stack. No new bugs were found by this, but it seems like good future-proofing. It's not a great place for the check; HeapTupleSatisfiesMVCC() is in fact safe to call with an unregistered snapshot, and the assertion won't catch other unsafe uses. But it goes a long way in practice. Fix a few cases that were playing fast and loose with that and just assumed that the snapshot cannot be invalidated during a scan. Those assumptions were not wrong, but they're not performance critical, so let's drop the excuses and just register the snapshot. These were false positives found by the new assertion. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi	2025-03-11 23:20:34 +02:00
Tom Lane	8b1b342544	Improve EXPLAIN's display of window functions. Up to now we just punted on showing the window definitions used in a plan, with window function calls represented as "OVER (?)". To improve that, show the window definition implemented by each WindowAgg plan node, and reference their window names in OVER. For nameless window clauses generated by "OVER (...)", assign unique names w1, w2, etc. In passing, re-order the properties shown for a WindowAgg node so that the Run Condition (if any) appears after the Window property and before the Filter (if any). This seems more sensible since the Run Condition is associated with the Window and acts before the Filter. Thanks to David G. Johnston and Álvaro Herrera for design suggestions. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/144530.1741469955@sss.pgh.pa.us	2025-03-11 11:19:54 -04:00
Peter Geoghegan	0fbceae841	Show index search count in EXPLAIN ANALYZE, take 2. Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit `5bf748b8`). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit `5ead85fb` (though that was quickly reverted by commit `d00107cd` following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit `5a1e6df3`, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Tomas Vondra <tomas@vondra.me> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com	2025-03-11 09:20:50 -04:00
Michael Paquier	76def4cdd7	Add WAL data to backend statistics This commit adds per-backend WAL statistics, providing the same information as pg_stat_wal, except that it is now possible to know how much WAL activity is happening in each backend rather than an overall aggregate of all the activity. Like pg_stat_wal, the implementation relies on pgWalUsage, tracking the difference of activity between two reports to pgstats. This data can be retrieved with a new system function called pg_stat_get_backend_wal(), that returns one tuple based on the PID provided in input. Like pg_stat_get_backend_io(), this is useful when joined with pg_stat_activity to get a live picture of the WAL generated for each running backend, showing how the activity is [un]balanced. pgstat_flush_backend() gains a new flag value, able to control the flush of the WAL stats. This commit relies mostly on the infrastructure provided by `9aea73fc61`, that has introduced backend statistics. Bump catalog version. A bump of PGSTAT_FILE_FORMAT_ID is not required, as backend stats do not persist on disk. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal	2025-03-11 09:04:11 +09:00
Alexander Korotkov	6bb6a62f3c	Use extended stats for precise estimation of bucket size in hash join Recognizing the real-life complexity where columns in the table often have functional dependencies, PostgreSQL's estimation of the number of distinct values over a set of columns can be underestimated (or much rarely, overestimated) when dealing with multi-clause JOIN. In the case of hash join, it can end up with a small number of predicted hash buckets and, as a result, picking non-optimal merge join. To improve the situation, we introduce one additional stage of bucket size estimation - having two or more join clauses estimator lookup for extended statistics and use it for multicolumn estimation. Clauses are grouped into lists, each containing expressions referencing the same relation. The result of the multicolumn estimation made over such a list is combined with others according to the caller's logic. Clauses that are not estimated are returned to the caller for further estimation. Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-03-10 13:42:01 +02:00
Peter Geoghegan	67fc4c9fd7	Make parallel nbtree index scans use an LWLock. Teach parallel nbtree index scans to use an LWLock (not a spinlock) to protect the scan's shared descriptor state. Preparation for an upcoming patch that will add skip scan optimizations to nbtree. That patch will create the need to occasionally allocate memory while the scan descriptor is locked, while copying datums that were serialized by another backend. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com	2025-03-08 11:10:14 -05:00
Peter Eisentraut	8021c77769	Make amcanorder independent of amconsistentordering Follow-up to commit `af4002b381`: Make amconsistentordering not depend on amcanorder. Although they are related, they are independent properties. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org	2025-03-08 09:37:06 +01:00
Michael Paquier	9a8dd2c5a6	Improve check for detection of pending data in backend statistics The callback pgstat_backend_have_pending_cb() is used as a way for pg_stat_report() to detect if there is any pending data for backend statistics. It did not include a check based on pgstat_tracks_backend_bktype(), that discards processes whose backend types do not support backend statistics. The logic is not a problem on HEAD, as processes that do not support backend statistics cannot touch PendingBackendStats, so the callback would always report that there is no pending data in this case. However, we would run into trouble once backend statistics include portions of pending stats that are not always zeroed, like pgWalUsage. There is no reason for pgstat_backend_have_pending_cb() to not check for pgstat_tracks_backend_bktype(), anyway, and this pattern is safer in the long run, so let's update the code to do so. While on it, this commit adds a proper initialization to PendingBackendStats. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/Z8l6EMM4ImVoWRkg@ip-10-97-1-34.eu-west-3.compute.internal	2025-03-08 10:56:30 +09:00
Peter Eisentraut	7f24c02743	Improve possible performance regression Commit `ce62f2f2a0` introduced calls to GetIndexAmRoutineByAmId() in lsyscache.c functions. This call is a bit more expensive than a simple syscache lookup. So rearrange the nesting so that we call that one last and do the cheaper checks first. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org	2025-03-07 11:46:33 +01:00
Peter Eisentraut	af4002b381	Rename amcancrosscompare After more discussion about commit `ce62f2f2a0`, rename the index AM property amcancrosscompare to two separate properties amconsistentequality and amconsistentordering. Also improve the documentation and update some comments that were previously missed. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org	2025-03-07 11:46:33 +01:00
Dean Rasheed	6da469bada	Allow casting between bytea and integer types. This allows smallint, integer, and bigint values to be cast to and from bytea. The bytea value is the two's complement representation of the integer, with the most significant byte first. For example: 1234::bytea -> \x000004d2 (-1234)::bytea -> \xfffffb2e Author: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Joel Jacobson <joel@compiler.org> Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com	2025-03-07 09:31:18 +00:00
Heikki Linnakangas	393e0d2314	Split WaitEventSet functions to separate source file latch.c now only contains the Latch related functions, which build on the WaitEventSet abstraction. Most of the platform-dependent stuff is now in waiteventset.c. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/8a507fb6-df28-49d3-81a5-ede180d7f0fb@iki.fi	2025-03-06 01:26:16 +02:00
Andrew Dunstan	4603903d29	Allow json{b}_strip_nulls to remove null array elements An additional paramater ("strip_in_arrays") is added to these functions. It defaults to false. If true, then null array elements are removed as well as null valued object fields. JSON that just consists of a single null is not affected. Author: Florents Tselai <florents.tselai@gmail.com> Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com	2025-03-05 10:04:02 -05:00
Michael Paquier	f4694e0f35	Fix some gaps in pg_stat_io with WAL receiver and WAL summarizer The WAL receiver and WAL summarizer processes gain each one a call to pgstat_report_wal(), to make sure that they report their WAL statistics to pgstats, gathering data for pg_stat_io. In the WAL receiver, the stats reports are timed with status updates sent to the primary, that depend on wal_receiver_status_interval and wal_receiver_timeout. This is a conservative choice, but perhaps we could be more aggressive with the frequency of the stats reports. An interesting historical fact is that the WAL receiver does writes and syncs of WAL, but it has never reported its statistics to pgstats in pg_stat_wal. In the WAL summarizer, the stats reports are done each time the process waits for WAL. While on it, pg_stat_io is adjusted so as these two processes do not report any rows when IOObject is not WAL, making the view easier to use with less rows. Two tests are added in TAP, checking statistics for the WAL summarizer and the WAL receiver. Status updates in the WAL receiver are currently possible in the recovery test 001_stream_rep.pl. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z8UKZyVSHUUQJHNb@paquier.xyz	2025-03-05 10:17:39 +09:00
Tomas Vondra	c878de1db4	Make FP_LOCK_SLOTS_PER_BACKEND look like a function The FP_LOCK_SLOTS_PER_BACKEND macro looks like a constant, but it depends on the max_locks_per_transaction GUC, and thus can change. This is non-obvious and confusing, so make it look more like a function by renaming it to FastPathLockSlotsPerBackend(). While at it, use the macro when initializing fast-path shared memory, instead of using the formula. Reported-by: Andres Freund Discussion: https://postgr.es/m/ffiwtzc6vedo6wb4gbwelon5nefqg675t5c7an2ta7pcz646cg%40qwmkdb3l4ett	2025-03-04 18:33:12 +01:00
Michael Paquier	c76db55c90	Split pgstat_bestart() into three different routines pgstat_bestart(), used post-authentication to set up a backend entry in the PgBackendStatus array, so as its data becomes visible in pg_stat_activity and related catalogs, has its logic divided into three routines with this commit, called in order at different steps of the backend initialization: * pgstat_bestart_initial() sets up the backend entry with a minimal amount of information, reporting it with a new BackendState called STATE_STARTING while waiting for backend initialization and client authentication to complete. The main benefit that this offers is observability, so as it is possible to monitor the backend activity during authentication. This step happens earlier than in the logic prior to this commit. pgstat_beinit() happens earlier as well, before authentication. * pgstat_bestart_security() reports the SSL/GSS status of the connection, once authentication completes. Auxiliary processes, for example, do not need to call this step, hence it is optional. This step is called after performing authentication, same as previously. * pgstat_bestart_final() reports the user and database IDs, takes the entry out of STATE_STARTING, and reports its application_name. This is called as the last step of the three, once authentication completes. An injection point is added, with a test checking that the "starting" phase of a backend entry is visible in pg_stat_activity. Some follow-up patches are planned to take advantage of this refactoring with more information provided in backend entries during authentication (LDAP hanging was a problem for the author, initially). Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAOYmi+=60deN20WDyCoHCiecgivJxr=98s7s7-C8SkXwrCfHXg@mail.gmail.com	2025-03-04 14:09:44 +09:00
Michael Paquier	40d3f82744	Add more assertions in palloc0() and palloc_extended() palloc() includes an assertion checking that an alloc() implementation never returns NULL for all MemoryContextMethods. This commit adds a similar assertion in palloc0(). In palloc_extend(), a different assertion is added, checking that MCXT_ALLOC_NO_OOM is set when an alloc() routine returns NULL. These additions can be useful to catch errors when implementing a new set of MemoryContextMethods routines. Author: Andreas Karlsson <andreas@proxel.se> Discussion: https://postgr.es/m/507e8eba-2035-4a12-a777-98199a66beb8@proxel.se	2025-03-04 10:53:10 +09:00
Melanie Plageman	06eae9e621	Trigger more frequent autovacuums with relallfrozen Calculate the insert threshold for triggering an autovacuum of a relation based on the number of unfrozen pages. By only considering the unfrozen portion of the table when calculating how many tuples to add to the insert threshold, we can trigger more frequent vacuums of insert-heavy tables. This increases the chances of vacuuming those pages when they still reside in shared buffers This also increases the number of autovacuums triggered by tuples inserted and not by wraparound risk. We prefer to freeze these pages during insert-triggered autovacuums, as anti-wraparound vacuums are not automatically canceled by conflicting lock requests. We calculate the unfrozen percentage of the table using the recently added (`99f8f3fbbc`) relallfrozen column of pg_class. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com	2025-03-03 14:42:00 -05:00
Melanie Plageman	99f8f3fbbc	Add relallfrozen to pg_class Add relallfrozen, an estimate of the number of pages marked all-frozen in the visibility map. pg_class already has relallvisible, an estimate of the number of pages in the relation marked all-visible in the visibility map. This is used primarily for planning. relallfrozen, together with relallvisible, is useful for estimating the outstanding number of all-visible but not all-frozen pages in the relation for the purposes of scheduling manual VACUUMs and tuning vacuum freeze parameters. A future commit will use relallfrozen to trigger more frequent vacuums on insert-focused workloads with significant volume of frozen data. Bump catalog version Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com	2025-03-03 11:18:05 -05:00
Tomas Vondra	8492feb98f	Allow parallel CREATE INDEX for GIN indexes Allow using parallel workers to build a GIN index, similarly to BTREE and BRIN. For large tables this may result in significant speedup when the build is CPU-bound. The work is divided so that each worker builds index entries on a subset of the table, determined by the regular parallel scan used to read the data. Each worker uses a local tuplesort to sort and merge the entries for the same key. The TID lists do not overlap (for a given key), which means the merge sort simply concatenates the two lists. The merged entries are written into a shared tuplesort for the leader. The leader needs to merge the sorted entries again, before writing them into the index. But this way a significant part of the work happens in the workers, and the leader is left with merging fewer large entries, which is more efficient. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for GIN indexes (e.g. uniqueness checks). Original patch by me, with reviews and substantial improvements by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com	2025-03-03 16:53:06 +01:00
Michael Paquier	3f1db99bfa	Handle auxiliary processes in SQL functions of backend statistics This commit impacts the following SQL functions, authorizing the access to the PGPROC entries of auxiliary processes when attempting to fetch or reset backend-level pgstats entries: - pg_stat_reset_backend_stats() - pg_stat_get_backend_io() This is relevant since `a051e71e28` for at least the WAL summarizer, WAL receiver and WAL writer processes, that has changed the backend statistics to authorize these three following the addition of WAL I/O statistics in pg_stat_io and backend statistics. The code is more flexible with future changes written this way, adapting automatically to any updates done in pgstat_tracks_backend_bktype(). While on it, pgstat_report_wal() gains a call to pgstat_flush_backend(), making sure that backend I/O statistics are updated when calling this routine. This makes the statistics report correctly for the WAL writer. WAL receiver and WAL summarizer do not call pgstat_report_wal() yet (spoiler: both should). It should be possible to lift some of the existing restrictions for other auxiliary processes, as well, but this is left as future work. Reported-by: Rahila Syed <rahilasyed90@gmail.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CAH2L28v9BwN8_y0k6FQ591=0g2Hj_esHLGj3bP38c9nmVykoiA@mail.gmail.com	2025-03-03 09:57:48 +09:00
Michael Paquier	c2a50ac678	Invent pgstat_fetch_stat_backend_by_pid() This code is extracted from pg_stat_get_backend_io() in pgstatfuncs.c, so as it can be shared with other areas that need backend pgstats entries while having the benefits of the various sanity checks refactored here. As per its name, this retrieves backend statistics based on a PID, with the option of retrieving a BackendType if given in input. Currently, this is used for the backend-level IO statistics. The next move would be to reuse that for the backend-level WAL statistics. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-28 11:20:31 +09:00
Peter Eisentraut	ce62f2f2a0	Generalize hash and ordering support in amapi Stop comparing access method OID values against HASH_AM_OID and BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see if it advertises its ability to perform the necessary ordering, hashing, or cross-type comparing functionality. A field amcanorder already existed, this uses it more widely. Fields amcanhash and amcancrosscompare are added for the other purposes. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-27 17:03:31 +01:00
Michael Paquier	495864a4cf	Refactor code of pg_stat_get_wal() building result tuple This commit adds to pgstatfuncs.c a new routine called pg_stat_wal_build_tuple(), helper routine for pg_stat_get_wal(). This is in charge of filling one tuple based on the contents of PgStat_WalStats retrieved from pgstats. This refactoring will be used by an upcoming patch introducing backend-level WAL statistics, simplifying the main patch. Note that it is not possible for stats_reset to be NULL in pg_stat_wal; backend statistics need to be able to handle this case. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-27 11:54:36 +09:00
Michael Paquier	0e42d31b0b	Adding new PgStat_WalCounters structure in pgstat.h This new structure contains the counters and the data related to the WAL activity statistics gathered from WalUsage, separated into its own structure so as it can be shared across more than one Stats structure in pg_stat.h. This refactoring will be used by an upcoming patch introducing backend-level WAL statistics. Bump PGSTAT_FILE_FORMAT_ID. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-26 16:48:54 +09:00
Michael Paquier	d7cbeaf261	Remove pgstat_flush_wal() All the processes that generate WAL should call pgstat_report_wal() to report all their statistics related to WAL, and this is already what happens in the tree. Keeping pgstat_report_wal() is confusing while the other routine is encouraged. This routine is not required since `fc415edf8c`, where it was lastly used in pgstat_report_stat() before an equivalent callback existed. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z71oPkJJICrRB5Ws@paquier.xyz	2025-02-26 15:37:28 +09:00
Michael Paquier	6c349d83b6	Re-add GUC track_wal_io_timing This commit is a rework of `2421e9a51d`, about which Andres Freund has raised some concerns as it is valuable to have both track_io_timing and track_wal_io_timing in some cases, as the WAL write and fsync paths can be a major bottleneck for some workloads. Hence, it can be relevant to not calculate the WAL timings in environments where pg_test_timing performs poorly while capturing some IO data under track_io_timing for the non-WAL IO paths. The opposite can be also true: it should be possible to disable the non-WAL timings and enable the WAL timings (the previous GUC setups allowed this possibility). track_wal_io_timing is added back in this commit, controlling if WAL timings should be calculated in pg_stat_io for the read, fsync and write paths, as done previously with pg_stat_wal. pg_stat_wal previously tracked only the sync and write parts (now removed), read stats is new data tracked in pg_stat_io, all three are aggregated if track_wal_io_timing is enabled. The read part matters during recovery or if a XLogReader is used. Extra note: more control over if the types of timings calculated in pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp). Reported-by: Andres Freund <andres@anarazel.de> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu	2025-02-26 09:49:59 +09:00
Andres Freund	37c87e63f9	Change relpath() et al to return path by value For AIO, and also some other recent patches, we need the ability to call relpath() in a critical section. Until now that was not feasible, as it allocated memory. The fact that relpath() allocated memory also made it awkward to use in log messages because we had to take care to free the memory afterwards. Which we e.g. didn't do for when zeroing out an invalid buffer. We discussed other solutions, e.g. filling a pre-allocated buffer that's passed to relpath(), but they all came with plenty downsides or were larger projects. The easiest fix seems to be to make relpath() return the path by value. To be able to return the path by value we need to determine the maximum length of a relation path. This patch adds a long #define that computes the exact maximum, which is verified to be correct in a regression test. As this change the signature of relpath(), extensions using it will need to adapt their code. We discussed leaving a backward-compat shim in place, but decided it's not worth it given the use of relpath() doesn't seem widespread. Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra	2025-02-25 09:02:07 -05:00
Andres Freund	5ee75e32fa	Add static asserts for MAX_BACKENDS limiting factors So far the various dependencies were documented in the comment above MAX_BACKENDS, but not checked. Discussion: https://postgr.es/m/CA+COZaBO_s3LfALq=b+HcBHFSOEGiApVjrRacCe4VP9m7CJsNQ@mail.gmail.com	2025-02-24 06:23:41 -05:00
Andres Freund	6394a3a61c	Move MAX_BACKENDS to procnumber.h MAX_BACKENDS influences many things besides postmaster. I e.g. noticed that we don't have static assertions ensuring BUF_REFCOUNT_MASK is big enough for MAX_BACKENDS, adding them would require including postmaster.h in buf_internals.h which doesn't seem right. While at that, add MAX_BACKENDS_BITS, as that's useful in various places for static assertions (to be added in subsequent commits). Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/wptizm4qt6yikgm2pt52xzyv6ycmqiutloyvypvmagn7xvqkce@d4xuv3mylpg4	2025-02-24 06:23:41 -05:00
Michael Paquier	2421e9a51d	Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timing The four following attributes are removed from pg_stat_wal: * wal_write * wal_sync * wal_write_time * wal_sync_time `a051e71e28` has added an equivalent of this information in pg_stat_io with more granularity as this now spreads across the backend types, IO context and IO objects. So, keeping the same information in pg_stat_wal has little benefits. Another benefit of this commit is the removal of PendingWalStats, simplifying an upcoming patch to add per-backend WAL statistics, which already support IO statistics and which have access to the write/sync stats data of WAL. The GUC track_wal_io_timing, that was used to enable or disable the aggregation of the write and sync timings for WAL, is also removed. pgstat_prepare_io_time() is simplified. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-24 09:51:56 +09:00
Peter Eisentraut	454c182f85	backend libpq void * argument for binary data Change some backend libpq functions to take void * for binary data instead of char *. This removes the need for numerous casts. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-23 14:27:02 +01:00
Peter Eisentraut	f98765f0ce	jsonb internal API void * argument for binary data Change some internal jsonb API functions to take void * for binary data instead of char *. This removes the need for numerous casts. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-23 08:34:55 +01:00
Masahiko Sawada	44fe30fdab	Add default_char_signedness field to ControlFileData. The signedness of the 'char' type in C is implementation-dependent. For instance, 'signed char' is used by default on x86 CPUs, while 'unsigned char' is used on aarch CPUs. Previously, we accidentally let C implementation signedness affect persistent data. This led to inconsistent results when comparing char data across different platforms. This commit introduces a new 'default_char_signedness' field in ControlFileData to store the signedness of the 'char' type. While this change does not encourage the use of 'char' without explicitly specifying its signedness, this field can be used as a hint to ensure consistent behavior for pre-v18 data files that store data sorted by the 'char' type on disk (e.g., GIN and GiST indexes), especially in cross-platform replication scenarios. Newly created database clusters unconditionally set the default char signedness to true. pg_upgrade (with an upcoming commit) changes this flag for clusters if the source database cluster has signedness=false. As a result, signedness=false setting will become rare over time. If we had known about the problem during the last development cycle that forced initdb (v8.3), we would have made all clusters signed or all clusters unsigned. Making pg_upgrade the only source of signedness=false will cause the population of database clusters to converge toward that retrospective ideal. Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION (for the additions in ControlFileData). Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com	2025-02-21 10:12:08 -08:00
Peter Eisentraut	329304c901	Support text position search functions with nondeterministic collations This allows using text position search functions with nondeterministic collations. These functions are - position, strpos - replace - split_part - string_to_array - string_to_table which all use common internal infrastructure. There was previously no internal implementation of this, so it was met with a not-supported error. This adds the internal implementation and removes the error. Unlike with deterministic collations, the search cannot use any byte-by-byte optimized techniques but has to go substring by substring. We also need to consider that the found match could have a different length than the needle and that there could be substrings of different length matching at a position. In most cases, we need to find the longest such substring (greedy semantics), but this can be configured by each caller. Reviewed-by: Euler Taveira <euler@eulerto.com> Discussion: https://www.postgresql.org/message-id/flat/582b2613-0900-48ca-8b0d-340c06f4d400@eisentraut.org	2025-02-21 12:21:17 +01:00
Michael Paquier	984410b923	Add missing deparsing of [NO] IDENT to XMLSERIALIZE() NO INDENT is the default, and is added if no explicit indentation flag was provided with XMLSERIALIZE(). Oversight in `483bdb2afe`. Author: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/bebd457e-5b43-46b3-8fc6-f6a6509483ba@uni-muenster.de Backpatch-through: 16	2025-02-21 17:30:56 +09:00
Peter Eisentraut	3e4d868615	Remove various unnecessary (char ) casts Remove a number of (char ) casts that are unnecessary. Or in some cases, rewrite the code to make the purpose of the cast clearer. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-20 19:49:27 +01:00
Daniel Gustafsson	b3f0be788a	Add support for OAUTHBEARER SASL mechanism This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device Authorization Grants, RFC 8628. In order to use this there is a new pg_hba auth method called oauth. When speaking to a OAuth- enabled server, it looks a bit like this: $ psql 'host=example.org oauth_issuer=... oauth_client_id=...' Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG Device authorization is currently the only supported flow so the OAuth issuer must support that in order for users to authenticate. Third-party clients may however extend this and provide their own flows. The built-in device authorization flow is currently not supported on Windows. In order for validation to happen server side a new framework for plugging in OAuth validation modules is added. As validation is implementation specific, with no default specified in the standard, PostgreSQL does not ship with one built-in. Each pg_hba entry can specify a specific validator or be left blank for the validator installed as default. This adds a requirement on libcurl for the client side support, which is optional to build, but the server side has no additional build requirements. In order to run the tests, Python is required as this adds a https server written in Python. Tests are gated behind PG_TEST_EXTRA as they open ports. This patch has been a multi-year project with many contributors involved with reviews and in-depth discussions: Michael Paquier, Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey Chudnovsky and Stephen Frost to name a few. While Jacob Champion is the main author there have been some levels of hacking by others. Daniel Gustafsson contributed the validation module and various bits and pieces; Thomas Munro wrote the client side support for kqueue. Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Daniel Gustafsson <daniel@yesql.se> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com> Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2025-02-20 16:25:17 +01:00
Amit Langote	525392d572	Don't lock partitions pruned by initial pruning Before executing a cached generic plan, AcquireExecutorLocks() in plancache.c locks all relations in a plan's range table to ensure the plan is safe for execution. However, this locks runtime-prunable relations that will later be pruned during "initial" runtime pruning, introducing unnecessary overhead. This commit defers locking for such relations to executor startup and ensures that if the CachedPlan is invalidated due to concurrent DDL during this window, replanning is triggered. Deferring these locks avoids unnecessary locking overhead for pruned partitions, resulting in significant speedup, particularly when many partitions are pruned during initial runtime pruning. * Changes to locking when executing generic plans: AcquireExecutorLocks() now locks only unprunable relations, that is, those found in PlannedStmt.unprunableRelids (introduced in commit `cbc127917e`), to avoid locking runtime-prunable partitions unnecessarily. The remaining locks are taken by ExecDoInitialPruning(), which acquires them only for partitions that survive pruning. This deferral does not affect the locks required for permission checking in InitPlan(), which takes place before initial pruning. ExecCheckPermissions() now includes an Assert to verify that all relations undergoing permission checks, none of which can be in the set of runtime-prunable relations, are properly locked. * Plan invalidation handling: Deferring locks introduces a window where prunable relations may be altered by concurrent DDL, invalidating the plan. A new function, ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle invalidation caused by deferred locking. If invalidation occurs, ExecutorStartCachedPlan() updates CachedPlan using the new UpdateCachedPlan() function and retries execution with the updated plan. To ensure all code paths that may be affected by this handle invalidation properly, all callers of ExecutorStart that may execute a PlannedStmt from a CachedPlan have been updated to use ExecutorStartCachedPlan() instead. UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new CachedPlan.stmt_context, created as a child of CachedPlan.context, allows freeing old PlannedStmts while preserving the CachedPlan structure and its statement list. This ensures that loops over statements in upstream callers of ExecutorStartCachedPlan() remain intact. ExecutorStart() and ExecutorStart_hook implementations now return a boolean value indicating whether plan initialization succeeded with a valid PlanState tree in QueryDesc.planstate, or false otherwise, in which case QueryDesc.planstate is NULL. Hook implementations are required to call standard_ExecutorStart() at the beginning, and if it returns false, they should do the same without proceeding. * Testing: To verify these changes, the delay_execution module tests scenarios where cached plans become invalid due to changes in prunable relations after deferred locks. * Note to extension authors: ExecutorStart_hook implementations must verify plan validity after calling standard_ExecutorStart(), as explained earlier. For example: if (prev_ExecutorStart) plan_valid = prev_ExecutorStart(queryDesc, eflags); else plan_valid = standard_ExecutorStart(queryDesc, eflags); if (!plan_valid) return false; <extension-code> return true; Extensions accessing child relations, especially prunable partitions, via ExecGetRangeTableRelation() must now ensure their RT indexes are present in es_unpruned_relids (introduced in commit `cbc127917e`), or they will encounter an error. This is a strict requirement after this change, as only relations in that set are locked. The idea of deferring some locks to executor startup, allowing locks for prunable partitions to be skipped, was first proposed by Tom Lane. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-02-20 17:09:48 +09:00
Alexander Korotkov	e983ee9380	Improve statistics estimation for single-column GROUP BY in sub-queries This commit follows the idea of the `4767bc8ff2`. If sub-query has only one GROUP BY column, we can consider its output variable as being unique. We can employ this fact in the statistics to make more precise estimations in the upper query block. Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-19 11:59:30 +02:00
Amit Kapila	ac0e33136a	Invalidate inactive replication slots. This commit introduces idle_replication_slot_timeout GUC that allows inactive slots to be invalidated at the time of checkpoint. Because checkpoints happen checkpoint_timeout intervals, there can be some lag between when the idle_replication_slot_timeout was exceeded and when the slot invalidation is triggered at the next checkpoint. To avoid such lags, users can force a checkpoint to promptly invalidate inactive slots. Note that the idle timeout invalidation mechanism is not applicable for slots that do not reserve WAL or for slots on the standby server that are synced from the primary server (i.e., standby slots having 'synced' field 'true'). Synced slots are always considered to be inactive because they don't perform logical decoding to produce changes. The slots can become inactive for a long period if a subscriber is down due to a system error or inaccessible because of network issues. If such a situation persists, it might be more practical to recreate the subscriber rather than attempt to recover the node and wait for it to catch up which could be time-consuming. Then, external tools could create replication slots (e.g., for migrations or upgrades) that may fail to remove them if an error occurs, leaving behind unused slots that take up space and resources. Manually cleaning them up can be tedious and error-prone, and without intervention, these lingering slots can cause unnecessary WAL retention and system bloat. As the duration of idle_replication_slot_timeout is in minutes, any test using that would be time-consuming. We are planning to commit a follow up patch for tests by using the injection point framework. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2025-02-19 09:29:50 +05:30
Michael Paquier	f2e4c2b203	Make the description of some GUCs more consistent This commit improves the description of a couple of GUCs, to be more consistent with the style of their surroundings: * array_nulls * enable_self_join_elimination * optimize_bounded_sort * row_security * synchronize_seqscans Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20250218.103240.1422205966404509831.horikyota.ntt@gmail.com	2025-02-19 08:42:35 +09:00
Alexander Korotkov	fc069a3a63	Implement Self-Join Elimination The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-17 12:44:12 +02:00
Alexander Korotkov	3fb58625d1	Revert: Get rid of WALBufMappingLock This commit reverts `6a2275b895`. Buildfarm failure on batta spots some concurrency issue, which requires further investigation.	2025-02-17 12:35:28 +02:00
Michael Paquier	eaf502747b	Move wal_buffers_full from PgStat_PendingWalStats to WalUsage wal_buffers_full has been introduced in pg_stat_wal in `8d9a935965`, as some information providing metrics for the tuning of the GUC wal_buffers. WalUsage has been introduced before that in `df3b181499`. Moving this field is proving to be beneficial for several reasons: - This information can now be made available in more layers, providing more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN, pgss and VACUUM/ANALYZE logs. - A patch is under discussion to provide statistics for WAL at backend level, and this move simplifies a bit the handling of pending statistics. The remaining data in PgStat_PendingWalStats now relates to write/sync counters and times, with equivalents present in pg_stat_io, that backend statistics are able to already track. So this should cut all the dependencies between PgStat_PendingWalStats and WAL stats at backend level. As of this change, wal_buffers_full only shows in pg_stat_wal. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-17 13:14:28 +09:00
Alexander Korotkov	6a2275b895	Get rid of WALBufMappingLock Allow multiple backends to initialize WAL buffers concurrently. This way `MemSet((char ) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without taking a single LWLock in exclusive mode. The new algorithm works as follows: reserve a page for initialization using XLogCtl->InitializeReserved, * ensure the page is written out, * once the page is initialized, try to advance XLogCtl->InitializedUpTo and signal to waiters using XLogCtl->InitializedUpToCondVar condition variable, * repeat previous steps until we reserve initialization up to the target WAL position, * wait until concurrent initialization finishes using a XLogCtl->InitializedUpToCondVar. Now, multiple backends can, in parallel, concurrently reserve pages, initialize them, and advance XLogCtl->InitializedUpTo to point to the latest initialized page. Author: Yura Sokolov <y.sokolov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>	2025-02-17 04:25:29 +02:00
Nathan Bossart	977d865c36	Describe special values in GUC descriptions more consistently. Many GUCs accept special values like -1 or an empty string to disable the feature, use a system default, etc. While the documentation consistently lists these special values, the GUC descriptions do not. Many such descriptions fail to mention the special values, and those that do vary in phrasing and placement. This commit aims to bring some consistency to this area by applying the following rules: * Special values should be listed at the end of the long description. * Descriptions should use numerals (e.g., "0") instead of words (e.g., "zero"). * Special value mentions should be concise and direct (e.g., "0 disables the timeout.", "An empty string means use the operating system setting."). * Multiple special values should be listed in ascending order. Of course, there are exceptions, such as max_pred_locks_per_relation and search_path, whose special values are too complex to include. And there are cases like listen_addresses, where the meaning of an empty string is arguably too obvious to include. In those cases, I've refrained from adding special value information to the GUC description. Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/Z6aIy4aywxUZHAo6%40nathan	2025-02-14 10:44:30 -06:00
Peter Eisentraut	ed5e5f0710	Remove unnecessary (char ) casts [xlog] Remove (char ) casts no longer needed after XLogRegisterData() and XLogRegisterBufData() argument type change. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-13 10:57:07 +01:00

... 3 4 5 6 7 ...

9273 Commits