postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-21 00:42:43 +03:00

Author	SHA1	Message	Date
Jeff Davis	38172d1856	Injection points for hash aggregation. Requires adding a guard against shift-by-32. Previously, that was impossible because the number of partitions was always greater than 1, but a new injection point can force the number of partitions to 1. Discussion: https://postgr.es/m/ff4e59305e5d689e03cd256a736348d3e7958f8f.camel@j-davis.com	2025-02-11 11:26:25 -08:00
Melanie Plageman	052026c9b9	Eagerly scan all-visible pages to amortize aggressive vacuum Aggressive vacuums must scan every unfrozen tuple in order to advance the relfrozenxid/relminmxid. Because data is often vacuumed before it is old enough to require freezing, relations may build up a large backlog of pages that are set all-visible but not all-frozen in the visibility map. When an aggressive vacuum is triggered, all of these pages must be scanned. These pages have often been evicted from shared buffers and even from the kernel buffer cache. Thus, aggressive vacuums often incur large amounts of extra I/O at the expense of foreground workloads. To amortize the cost of aggressive vacuums, eagerly scan some all-visible but not all-frozen pages during normal vacuums. All-visible pages that are eagerly scanned and set all-frozen in the visibility map are counted as successful eager freezes and those not frozen are counted as failed eager freezes. If too many eager scans fail in a row, eager scanning is temporarily suspended until a later portion of the relation. The number of failures tolerated is configurable globally and per table. To effectively amortize aggressive vacuums, we cap the number of successes as well. Capping eager freeze successes also limits the amount of potentially wasted work if these pages are modified again before the next aggressive vacuum. Once we reach the maximum number of blocks successfully eager frozen, eager scanning is disabled for the remainder of the vacuum of the relation. Original design idea from Robert Haas, with enhancements from Andres Freund, Tomas Vondra, and me Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com	2025-02-11 13:53:48 -05:00
Andres Freund	4dd09a1d41	config: Rename "Asynchronous Behavior" to "I/O" "I/O" seems more descriptive than "Asynchronous Behavior", given that some of the GUCs in the section don't relate to anything asynchronous. Most other abbreviations in the config sections are un-abbreviated, but "Input/Output" seems less likely to be helpful than just IO or I/O. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/x3tlw2jk5gm3r3mv47hwrshffyw7halpczkfbk3peksxds7bvc@lguk43z3bsyq	2025-02-11 12:53:40 -05:00
Andres Freund	740766d37c	config: Split "Worker Processes" out of "Asynchronous Behavior" Having all the worker related GUCs in the same section as IO controlling GUCs doesn't really make sense. Create a separate section for them. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/x3tlw2jk5gm3r3mv47hwrshffyw7halpczkfbk3peksxds7bvc@lguk43z3bsyq	2025-02-11 12:53:40 -05:00
Tom Lane	c366d2bdba	Allow extension functions to participate in in-place updates. Commit `1dc5ebc90` allowed PL/pgSQL to perform in-place updates of expanded-object variables that are being updated with assignments like "x := f(x, ...)". However this was allowed only for a hard-wired list of functions f(), since we need to be sure that f() will not modify the variable if it fails. It was always envisioned that we should make that extensible, but at the time we didn't have a good way to do so. Since then we've invented the idea of "support functions" to allow attaching specialized optimization knowledge to functions, and that is a perfect mechanism for doing this. Hence, adjust PL/pgSQL to use a support function request instead of hard-wired logic to decide if in-place update is safe. Preserve the previous optimizations by creating support functions for the three functions that were previously hard-wired. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com> Discussion: https://postgr.es/m/CACxu=vJaKFNsYxooSnW1wEgsAO5u_v1XYBacfVJ14wgJV_PYeg@mail.gmail.com	2025-02-11 12:49:34 -05:00
Jeff Davis	9f12da78d9	Lock table in ShareUpdateExclusive when importing index stats. Follow locking behavior of ANALYZE when importing statistics. In particular, when importing index statistics, the table must be locked in ShareUpdateExclusive mode. Fixes bug reportd by Jian He. ANALYZE doesn't update statistics on partitioned indexes, and the locking requirements are slightly different for in-place updates on partitioned indexes versus normal indexes. To be conservative, lock both the partitioned table and the partitioned index in ShareUpdateExclusive mode when importing stats for a partitioned index. Author: Corey Huinker Reported-by: Jian He Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CACJufxGreTY7qsCV8%2BBkuv0p5SXGTScgh%3DD%2BDq6%3D%2B_%3DXTp7FWg%40mail.gmail.com	2025-02-10 12:58:13 -08:00
Peter Eisentraut	9926f854d0	Cache NO ACTION foreign keys separately from RESTRICT foreign keys Now that we generate different SQL for temporal NO ACTION vs RESTRICT foreign keys, we should cache their query plans with different keys. Since the key also includes the constraint oid, this shouldn't be necessary, but we have been seeing build farm failures that suggest we might be sometimes using a cached NO ACTION plan to implement a RESTRICT constraint. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2025-02-09 13:43:56 +01:00
Peter Eisentraut	a9258629ed	Make TLS write functions' buffer arguments pointers const This also makes it match the equivalent APIs in libpq. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org	2025-02-09 12:43:30 +01:00
Peter Eisentraut	b92c03342d	Allow non-btree speculative insertion indexes Previously, only btrees were supported as the arbiter index for speculative insertion because there was no way to get the equality strategy number for other index methods. We have this now (commit `c09e5a6a01`), so we can support this. At the moment, only btree supports unique indexes, so this does not change anything in practice, but it would allow another index method that has amcanunique to be supported. Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-07 11:23:34 +01:00
Peter Eisentraut	bfe21b760e	Support non-btree indexes for foreign keys Previously, only btrees were supported as the referenced unique index for foreign keys because there was no way to get the equality strategy number for other index methods. We have this now (commit `c09e5a6a01`), so we can support this. In fact, this is now just a special case of the existing generalized "period" foreign key support, since that already knows how to lookup equality strategy numbers. Note that this does not change the requirement that the referenced index needs to be unique, and at the moment, only btree supports that, so this does not change anything in practice, but it would allow another index method that has amcanunique to be supported. Co-authored-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-07 11:23:34 +01:00
Peter Eisentraut	83ea6c5402	Virtual generated columns This adds a new variant of generated columns that are computed on read (like a view, unlike the existing stored generated columns, which are computed on write, like a materialized view). The syntax for the column definition is ... GENERATED ALWAYS AS (...) VIRTUAL and VIRTUAL is also optional. VIRTUAL is the default rather than STORED to match various other SQL products. (The SQL standard makes no specification about this, but it also doesn't know about VIRTUAL or STORED.) (Also, virtual views are the default, rather than materialized views.) Virtual generated columns are stored in tuples as null values. (A very early version of this patch had the ambition to not store them at all. But so much stuff breaks or gets confused if you have tuples where a column in the middle is completely missing. This is a compromise, and it still saves space over being forced to use stored generated columns. If we ever find a way to improve this, a bit of pg_upgrade cleverness could allow for upgrades to a newer scheme.) The capabilities and restrictions of virtual generated columns are mostly the same as for stored generated columns. In some cases, this patch keeps virtual generated columns more restricted than they might technically need to be, to keep the two kinds consistent. Some of that could maybe be relaxed later after separate careful considerations. Some functionality that is currently not supported, but could possibly be added as incremental features, some easier than others: - index on or using a virtual column - hence also no unique constraints on virtual columns - extended statistics on virtual columns - foreign-key constraints on virtual columns - not-null constraints on virtual columns (check constraints are supported) - ALTER TABLE / DROP EXPRESSION - virtual column cannot have domain type - virtual columns are not supported in logical replication The tests in generated_virtual.sql have been copied over from generated_stored.sql with the keyword replaced. This way we can make sure the behavior is mostly aligned, and the differences can be visible. Some tests for currently not supported features are currently commented out. Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org	2025-02-07 09:46:59 +01:00
Amit Langote	cbc127917e	Track unpruned relids to avoid processing pruned relations This commit introduces changes to track unpruned relations explicitly, making it possible for top-level plan nodes, such as ModifyTable and LockRows, to avoid processing partitions pruned during initial pruning. Scan-level nodes, such as Append and MergeAppend, already avoid the unnecessary processing by accessing partition pruning results directly via part_prune_index. In contrast, top-level nodes cannot access pruning results directly and need to determine which partitions remain unpruned. To address this, this commit introduces a new bitmapset field, es_unpruned_relids, which the executor uses to track the set of unpruned relations. This field is referenced during plan initialization to skip initializing certain nodes for pruned partitions. It is initialized with PlannedStmt.unprunableRelids, a new field that the planner populates with RT indexes of relations that cannot be pruned during runtime pruning. These include relations not subject to partition pruning and those required for execution regardless of pruning. PlannedStmt.unprunableRelids is computed during set_plan_refs() by removing the RT indexes of runtime-prunable relations, identified from PartitionPruneInfos, from the full set of relation RT indexes. ExecDoInitialPruning() then updates es_unpruned_relids by adding partitions that survive initial pruning. To support this, PartitionedRelPruneInfo and PartitionedRelPruningData now include a leafpart_rti_map[] array that maps partition indexes to their corresponding RT indexes. The former is used in set_plan_refs() when constructing unprunableRelids, while the latter is used in ExecDoInitialPruning() to convert partition indexes returned by get_matching_partitions() into RT indexes, which are then added to es_unpruned_relids. These changes make it possible for ModifyTable and LockRows nodes to process only relations that remain unpruned after initial pruning. ExecInitModifyTable() trims lists, such as resultRelations, withCheckOptionLists, returningLists, and updateColnosLists, to consider only unpruned partitions. It also creates ResultRelInfo structs only for these partitions. Similarly, child RowMarks for pruned relations are skipped. By avoiding unnecessary initialization of structures for pruned partitions, these changes improve the performance of updates and deletes on partitioned tables during initial runtime pruning. Due to ExecInitModifyTable() changes as described above, EXPLAIN on a plan for UPDATE and DELETE that uses runtime initial pruning no longer lists partitions pruned during initial pruning. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-02-07 17:15:09 +09:00
Nathan Bossart	401a6956fa	Disallow COPY FREEZE on foreign tables. This didn't actually work: the COPY succeeds, but the FREEZE optimization isn't applied. There doesn't seem to be an easy way to support FREEZE on foreign tables, so let's follow the precedent established by commit `5c9a5513a3` by raising an error early. This is arguably a bug fix, but due to the lack of reports, the minimal discussion on the mailing list, and the potential to break existing scripts, I am not back-patching it for now. Author: Sami Imseih <samimseih@gmail.com> Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0ujeNgKpE3OrLtR%3DeJGa5LkGMekFzQTwjgw%3DrzaLufQLQ%40mail.gmail.com	2025-02-06 15:23:40 -06:00
Nathan Bossart	527f8fec22	Fix autovacuum_vacuum_max_threshold's GUC description. Most GUCs that accept a special value to disable the feature mention it in their GUC description. This commit adds that information to autovacuum_vacuum_max_threshold's description. Oversight in commit `306dc520b9`.	2025-02-06 11:59:12 -06:00
Nathan Bossart	306dc520b9	Introduce autovacuum_vacuum_max_threshold. One way autovacuum chooses tables to vacuum is by comparing the number of updated or deleted tuples with a value calculated using autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. The threshold specifies the base value for comparison, and the scale factor specifies the fraction of the table size to add to it. This strategy ensures that smaller tables are vacuumed after fewer updates/deletes than larger tables, which is reasonable in many cases but can result in infrequent vacuums on very large tables. This is undesirable for a couple of reasons, such as very large tables incurring a huge amount of bloat between vacuums. This new parameter provides a way to set a limit on the value calculated with autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor so that very large tables are vacuumed more frequently. By default, it is set to 100,000,000 tuples, but it can be disabled by setting it to -1. It can also be adjusted for individual tables by changing storage parameters. Author: Nathan Bossart <nathandbossart@gmail.com> Co-authored-by: Frédéric Yhuel <frederic.yhuel@dalibo.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Michael Banck <mbanck@gmx.net> Reviewed-by: Joe Conway <mail@joeconway.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com> Reviewed-by: Vinícius Abrahão <vinnix.bsd@gmail.com> Reviewed-by: Robert Treat <rob@xzilla.net> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/956435f8-3b2f-47a6-8756-8c54ded61802%40dalibo.com	2025-02-05 15:48:18 -06:00
Amit Kapila	0ec3c295e7	Avoid updating inactive_since for invalid replication slots. It is possible for the inactive_since value of an invalid replication slot to be updated multiple times, which is unexpected behavior like during the release of the slot or at the time of restart. This is harmless because invalid slots are not allowed to be accessed but it is not prudent to update invalid slots. We are planning to invalidate slots due to other reasons like idle time and it will look odd that the slot's inactive_since displays the recent time in this field after invalidated due to idle time. So, this patch ensures that the inactive_since field of slots is not updated for invalid slots. In the passing, ensure to use the same inactive_since time for all the slots at restart while restoring them from the disk. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM7QdifQ_MHmMA=Cc4v8+MeckkwKncm2Nn6tX9wSCQ-+iw@mail.gmail.com	2025-02-05 08:56:14 +05:30
Alexander Korotkov	627d63419e	Allow usage of match_orclause_to_indexcol() for joins This commit allows transformation of OR-clauses into SAOP's for index scans within nested loop joins. That required the following changes. 1. Make match_orclause_to_indexcol() and group_similar_or_args() understand const-ness in the same way as match_opclause_to_indexcol(). This generally makes our approach more uniform. 2. Make match_join_clauses_to_index() pass OR-clauses to match_clause_to_index(). 3. Also switch match_join_clauses_to_index() to use list_append_unique_ptr() for adding clauses to *joinorclauses. That avoids possible duplicates when processing the same clauses with different indexes. Previously such duplicates were elimited in match_clause_to_index(), but now group_similar_or_args() each time generates distinct copies of grouped OR clauses. Discussion: https://postgr.es/m/CAPpHfdv%2BjtNwofg-p5z86jLYZUTt6tR17Wy00ta0dL%3DwHQN3ZA%40mail.gmail.com Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>	2025-02-04 23:21:49 +02:00
Alexander Korotkov	23ef119f58	Revise the header comment for match_clause_to_indexcol() Since `d4378c0005`, match_clause_to_indexcol() doesn't always return NULL for an OR clause. This commit reflects that in the function header comment. Reported-by: Pavel Borisov <pashkin.elfe@gmail.com>	2025-02-04 23:18:47 +02:00
Michael Paquier	a051e71e28	Add data for WAL in pg_stat_io and backend statistics This commit adds WAL IO stats to both pg_stat_io view and per-backend IO statistics (pg_stat_get_backend_io()). This change is possible since `f92c854cf4`, as WAL IO is not counted in blocks in some code paths where its stats data is measured (like WAL read in xlogreader.c). IOContext gains IOCONTEXT_INIT and IOObject IOOBJECT_WAL, with the following combinations allowed: - IOOBJECT_WAL/IOCONTEXT_NORMAL is used to track I/O operations done on already-created WAL segments. - IOOBJECT_WAL/IOCONTEXT_INIT is used for tracking I/O operations done when initializing WAL segments. The core changes are done in pg_stat_io.c, backend statistics inherit them. Backend statistics and pg_stat_io are now available for the WAL writer, the WAL receiver and the WAL summarizer processes. I/O timing data is controlled by the GUC track_io_timing, like the existing data of pg_stat_io for consistency. The timings related to IOOBJECT_WAL show up if the GUC is enabled (disabled by default). Bump pgstats file version, due to the additions in IOObject and IOContext, impacting the amount of data written for the fixed-numbered IO stats kind in the pgstats file. Author: Nazir Bilal Yavuz Reviewed-by: Bertrand Drouvot, Nitin Jadhav, Amit Kapila, Michael Paquier, Melanie Plageman, Bharath Rupireddy Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com	2025-02-04 16:50:00 +09:00
Peter Eisentraut	622f678c10	Integrate GistTranslateCompareType() into IndexAmTranslateCompareType() This turns GistTranslateCompareType() into a callback function of the gist index AM instead of a standalone function. The existing callers are changed to use IndexAmTranslateCompareType(). This then makes that code not hardcoded toward gist. This means in particular that the temporal keys code is now independent of gist. Also, this generalizes commit `74edabce7a`, so other index access methods other than the previously hardcoded ones could now work as REPLICA IDENTITY in a logical replication subscriber. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-03 10:53:18 +01:00
Michael Paquier	b998fedab7	Improve comment on top of pgstat_count_io_op_time() This commit adds more documentation to pgstat_count_io_op_time() in pgstat_io.c, explaining its internals for pgstat_count_buffer_*(), pgBufferUsage and the contexts where these are used. Extracted from a larger patch by the same author. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com	2025-02-03 11:19:58 +09:00
Michael Paquier	fcce828529	Fix typo in xlog.c "recovery" is not a verb. Introduced in `68cb5af46c`.	2025-02-03 09:22:45 +09:00
Peter Eisentraut	c09e5a6a01	Convert strategies to and from compare types For each Index AM, provide a mapping between operator strategies and the system-wide generic concept of a comparison type. For example, for btree, BTLessStrategyNumber maps to and from COMPARE_LT. Numerous places in the planner and executor think directly in terms of btree strategy numbers (and a few in terms of hash strategy numbers.) These should be converted over subsequent commits to think in terms of CompareType instead. (This commit doesn't make any use of this API yet.) Author: Mark Dilger <mark.dilger@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-02 10:26:04 +01:00
Peter Eisentraut	119fc30dd5	Move CompareType to separate header file We'll want to make use of it in more places, and we'd prefer to not have to include all of primnodes.h everywhere. Author: Mark Dilger <mark.dilger@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-02 08:11:57 +01:00
Michael Paquier	d61b9662b0	Mention jsonlog in description of logging_collector in GUC table logging_collector was only mentioning stderr and csvlog, and forgot about jsonlog. Oversight in `dc686681e0`, that has added support for jsonlog in log_destination. While on it, the description in the GUC table is tweaked to be more consistent with the documentation and postgresql.conf.sample. Author: Umar Hayat Reviewed-by: Ashutosh Bapat, Tom Lane Discussion: https://postgr.es/m/CAD68Dp1K_vBYqBEukHw=1jF7e76t8aszGZTFL2ugi=H7r=a7MA@mail.gmail.com Backpatch-through: 13	2025-02-02 11:31:21 +09:00
Peter Eisentraut	43493cceda	Add get_opfamily_name() function This refactors and simplifies various existing code to make use of the new function. Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-01 10:42:58 +01:00
Peter Eisentraut	a5709b5bb2	Rename GistTranslateStratnum() to GistTranslateCompareType() Follow up to commit `630f9a43ce`. The previous name had become confusing, because it doesn't actually translate a strategy number but a CompareType into a strategy number. We might add the inverse at some point, which would then probably be called something like GistTranslateStratnum. Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2025-02-01 10:18:46 +01:00
Tom Lane	53a4936505	Doc: add commentary about cowboy assignment of maintenance_work_mem. Whilst working on commit `041e8b95b` I happened to notice that parallel_vacuum_main() assigns directly to the maintenance_work_mem GUC. This is definitely not per project conventions, so I tried to fix it to use SetConfigOption(). But that fails with "parameter cannot be set during a parallel operation". It doesn't seem worth working on a cleaner answer, at least not till we have a few more instances of similar problems. But add some commentary, just so nobody gets the idea that this is an approved way to set a GUC.	2025-01-31 15:17:15 -05:00
Tom Lane	d4c3a6b8ad	Remove obsolete restriction on the range of log_rotation_size. When syslogger.c was first written, we didn't want to assume that all platforms have 64-bit ftello. But we've been assuming that since v13 (cf commit `799d22461`), so let's use that in syslogger.c and allow log_rotation_size to range up to INT_MAX kilobytes. The old code effectively limited log_rotation_size to 2GB regardless of platform. While nobody's complained, that doesn't seem too far away from what might be thought reasonable these days. I noticed this while searching for instances of "1024L" in connection with commit `041e8b95b`. These were the last such instances. (We still have instances of L-suffixed literals, but most of them are associated with wait intervals for pg_usleep or similar functions. I don't see any urgent reason to change that.)	2025-01-31 14:36:56 -05:00
Tom Lane	041e8b95b8	Get rid of our dependency on type "long" for memory size calculations. Consistently use "Size" (or size_t, or in some places int64 or double) as the type for variables holding memory allocation sizes. In most places variables' data types were fine already, but we had an ancient habit of computing bytes from kilobytes-units GUCs with code like "work_mem * 1024L". That risks overflow on Win64 where they did not make "long" as wide as "size_t". We worked around that by restricting such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB on Win64. This patch removes that restriction, after replacing such calculations with "work_mem * (Size) 1024" or variants of that. It should be noted that this patch was constructed by searching outwards from the GUCs that have MAX_KILOBYTES as upper limit. So I can't positively guarantee there are no other places doing memory-size arithmetic in int or long variables. I do however feel pretty confident that increasing MAX_KILOBYTES on Win64 is safe now. Also, nothing in our code should be dealing in multiple-gigabyte allocations without authorization from a relevant GUC, so it seems pretty likely that this search caught everything that could be at risk of overflow. Author: Vladlen Popolitov <v.popolitov@postgrespro.ru> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217	2025-01-31 13:52:40 -05:00
Daniel Gustafsson	e21d6f2971	Move PG_MAX_AUTH_TOKEN_LENGTH to libpq/auth.h Future SASL mechanism, like OAUTHBEARER, will use this as a limit on token messages coming from the client, so promote it to the header file to make it available. This patch is extracted from a larger body of work aimed at adding support for OAUTHBEARER in libpq. Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAOYmi+kJqzo6XsR9TEhvVfeVNQ-TyFM5LATypm9yoQVYk=4Wrw@mail.gmail.com	2025-01-31 15:39:35 +01:00
Amit Langote	76aa615943	Fix bad indentation introduced in commit `d47cbf474` Per buildfarm member koel	2025-01-31 16:44:24 +09:00
Amit Langote	d47cbf474e	Perform runtime initial pruning outside ExecInitNode() This commit builds on the prior change that moved PartitionPruneInfos out of individual plan nodes into a list in PlannedStmt, making it possible to initialize PartitionPruneStates without traversing the plan tree and perform runtime initial pruning before ExecInitNode() initializes the plan trees. These tasks are now handled in a new routine, ExecDoInitialPruning(), which is called by InitPlan() before calling ExecInitNode() on various plan trees. ExecDoInitialPruning() performs the initial pruning and saves the result -- a Bitmapset of indexes for surviving child subnodes -- in es_part_prune_results, a list in EState. PartitionPruneStates created for initial pruning are stored in es_part_prune_states, another list in EState, for later use during exec pruning. Both lists are parallel to es_part_prune_infos, which holds the PartitionPruneInfos from PlannedStmt, enabling shared indexing. PartitionPruneStates initialized in ExecDoInitialPruning() now include only the PartitionPruneContexts for initial pruning steps. Exec pruning contexts are initialized later in ExecInitPartitionExecPruning() when the parent plan node is initialized, as the exec pruning step expressions depend on the parent node's PlanState. The existing function PartitionPruneFixSubPlanMap() has been repurposed for this initialization to avoid duplicating a similar loop structure for finding PartitionedRelPruningData to initialize exec pruning contexts for. It has been renamed to InitExecPruningContexts() to reflect its new primary responsibility. The original logic to "fix subplan maps" remains intact but is now encapsulated within the renamed function. This commit removes two obsolete Asserts in partkey_datum_from_expr(). The ExprContext used for pruning expression evaluation is now independent of the parent PlanState, making these Asserts unnecessary. By centralizing pruning logic and decoupling it from the plan initialization step (ExecInitNode()), this change sets the stage for future patches that will use the result of initial pruning to save the overhead of redundant processing for pruned partitions. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-01-31 15:47:15 +09:00
Amit Kapila	f41d8468dd	Raise an error while trying to acquire an invalid slot. Once a replication slot is invalidated, it cannot be altered or used to fetch changes. However, a process could still acquire an invalid slot and fail later. For example, if a process acquires a logical slot that was invalidated due to wal_removed, it will eventually fail in CreateDecodingContext() when attempting to access the removed WAL. Similarly, for physical replication slots, even if the slot is invalidated and invalidation_reason is set to wal_removed, the walsender does not currently check for invalidation when starting physical replication. Instead, replication starts, and an error is only reported later while trying to access WAL. Similarly, we prohibit modifying slot properties for invalid slots but give the error for the same after acquiring the slot. This patch improves error handling by detecting invalid slots earlier at the time of slot acquisition which is the first step. This also helped in unifying different ERROR messages at different places and gave a consistent message for invalid slots. This means that the message for invalid slots will change to a generic message. This will also be helpful for future patches where we are planning to invalidate slots due to more reasons like idle_timeout because we don't have to modify multiple places in such cases and avoid the chances of missing out on a particular place. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM6pBL5hPnSQ+5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw@mail.gmail.com	2025-01-31 10:27:35 +05:30
Michael Paquier	ce5c620fb6	Add pgstat_drop_matching_entries() to pgstats This allows users of the cumulative statistics to drop entries in the shared hash stats table, deleting as well local references. Callers of this function can optionally define a callback able to filter which entries to drop, similarly to pgstat_reset_matching_entries() with its callback do_reset(). pgstat_drop_all_entries() is refactored so as it uses this new function. Author: Lukas Fitti Discussion: https://postgr.es/m/CAP53PkwuFbo3NkwZgxwNRMjMfqPEqidD-SggaoQ4ijotBVLJAA@mail.gmail.com	2025-01-31 12:27:19 +09:00
Michael Paquier	1e380fa7d8	Fix comment of StrategySyncStart() The top comment of StrategySyncStart() mentions BufferSync(), but this function calls BgBufferSync(), not BufferSync(). Oversight in `9cd00c457e`. Author: Ashutosh Bapat Discussion: https://postgr.es/m/CAExHW5tgkjag8i-s=RFrCn5KAWDrC4zEPPkfUKczfccPOxBRQQ@mail.gmail.com Backpatch-through: 13	2025-01-31 11:05:57 +09:00
Tom Lane	b9d232b9de	Use "ssize_t" not "long" in max_stack_depth-related code. This change adapts these functions to the machine's address width without depending on "long" to be the right size. (It isn't on Win64, for example.) While it seems unlikely anyone would care to run with a stack depth limit exceeding 2GB, this is part of a general push to avoid using type "long" to represent memory sizes. It's convenient to use ssize_t rather than the perhaps-more-obvious choice of size_t/Size, because the code involved depends on working with a signed data type. Our MAX_KILOBYTES limit already ensures that ssize_t will be sufficient to represent the maximum value of max_stack_depth. Extracted from a larger patch by Vladlen, plus additional hackery by me. Author: Vladlen Popolitov <v.popolitov@postgrespro.ru> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217	2025-01-30 16:44:47 -05:00
Tom Lane	b9aa4166fa	Avoid integer overflow while testing wal_skip_threshold condition. smgrDoPendingSyncs had two distinct risks of integer overflow while deciding which way to ensure durability of a newly-created relation. First, it accumulated the total size of all forks in a variable of type BlockNumber (uint32). While we restrict an individual fork's size to fit in that, I don't believe there's such a restriction on all of them added together. Second, it proceeded to multiply the sum by BLCKSZ, which most certainly could overflow a uint32. (The exact expression is total_blocks * BLCKSZ / 1024. The compiler might choose to optimize that to total_blocks * 8, which is not at quite as much risk of overflow as a literal reading would be, but it's still wrong.) If an overflow did occur it could lead to a poor choice to shove a very large relation into WAL instead of fsync'ing it. This wouldn't be fatal, but it could be inefficient. Change total_blocks to uint64 which should be plenty, and rearrange the comparison calculation to be overflow-safe. I noticed this while looking for ramifications of the proposed change in MAX_KILOBYTES. It's not entirely clear to me why wal_skip_threshold is limited to MAX_KILOBYTES in the first place, but in any case this code is unsafe regardless of the range of wal_skip_threshold. Oversight in `c6b92041d` which introduced wal_skip_threshold, so back-patch to v13. Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217 Backpatch-through: 13	2025-01-30 15:36:44 -05:00
Melanie Plageman	a5358c14b2	Move BitmapTableScan per-scan setup into a helper Add BitmapTableScanSetup(), a helper which contains all of the code that must be done on every scan of the table in a bitmap table scan. This includes scanning the index, building the bitmap, and setting up the scan descriptors. Pushing this setup into a helper function makes BitmapHeapNext() more readable. Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/CAN55FZ1vXu%2BZdT0_MM-i1vbTdfHHf0KR3cK6R5gs6dNNNpyrJw%40mail.gmail.com	2025-01-30 15:28:33 -05:00
Tom Lane	115a365519	Simplify executor's handling of CaseTestExpr & CoerceToDomainValue. Instead of deciding at runtime whether to read from casetest.value or caseValue_datum, split EEOP_CASE_TESTVAL into two opcodes and make the decision during expression compilation. Similarly for EEOP_DOMAIN_TESTVAL. This actually results in net less code, mainly because llvmjit_expr.c's code for handling these opcodes gets shorter. The performance gain is doubtless negligible, but this seems worth changing anyway on grounds of simplicity and understandability. Author: Andreas Karlsson <andreas@proxel.se> Co-authored-by: Xing Guo <higuoxing@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CACpMh+AiBYAWn+D1aU7Rsy-V1tox06Cbc0H3qA7rwL5zdJ=anQ@mail.gmail.com	2025-01-30 13:21:42 -05:00
Amit Langote	bb3ec16e14	Move PartitionPruneInfo out of plan nodes into PlannedStmt This moves PartitionPruneInfo from plan nodes to PlannedStmt, simplifying traversal by centralizing all PartitionPruneInfo structures in a single list in it, which holds all instances for the main query and its subqueries. Instead of plan nodes (Append or MergeAppend) storing PartitionPruneInfo pointers, they now reference an index in this list. A bitmapset field is added to PartitionPruneInfo to store the RT indexes corresponding to the apprelids field in Append or MergeAppend. This allows execution pruning logic to verify that it operates on the correct plan node, mainly to facilitate debugging. Duplicated code in set_append_references() and set_mergeappend_references() is refactored into a new function, register_pruneinfo(). This updates RT indexes by applying rtoffet and adds PartitionPruneInfo to the global list in PlannerGlobal. By allowing pruning to be performed without traversing the plan tree, this change lays the groundwork for runtime initial pruning to occur independently of plan tree initialization. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com	2025-01-30 11:57:32 +09:00
Tom Lane	ba0da16bd0	Require callers of coerce_to_domain() to supply base type/typmod. In view of the issue fixed in commit `0da39aa76`, it no longer seems like a great idea for coerce_to_domain() to offer to perform a lookup that its caller probably should have done already. The caller should be providing a value of the domain's base type, so it's hard to envision a valid case where it hasn't looked up that type. After `0da39aa76` there is only one caller using the option for internal lookup, and that one can trivially be rearranged to not do that. So this seems more like a bug-encouraging misfeature than a useful shortcut; let's get rid of it (in HEAD only, there's no need to break any external callers in back branches). Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us	2025-01-29 15:42:25 -05:00
Tom Lane	0da39aa766	Handle default NULL insertion a little better. If a column is omitted in an INSERT, and there's no column default, the code in preptlist.c generates a NULL Const to be inserted. Furthermore, if the column is of a domain type, we wrap the Const in CoerceToDomain, so as to throw a run-time error if the domain has a NOT NULL constraint. That's fine as far as it goes, but there are two problems: 1. We're being sloppy about the type/typmod that the Const is labeled with. It really should have the domain's base type/typmod, since it's the input to CoerceToDomain not the output. This can result in coerce_to_domain inserting a useless length-coercion function (useless because it's being applied to a null). The coercion would typically get const-folded away later, but it'd be better not to create it in the first place. 2. We're not applying expression preprocessing (specifically, eval_const_expressions) to the resulting expression tree. The planner's primary expression-preprocessing pass already happened, so that means the length coercion step and CoerceToDomain node miss preprocessing altogether. This is at the least inefficient, since it means the length coercion and CoerceToDomain will actually be executed for each inserted row, though they could be const-folded away in most cases. Worse, it seems possible that missing preprocessing for the length coercion could result in an invalid plan (for example, due to failing to perform default-function-argument insertion). I'm not aware of any live bug of that sort with core datatypes, and it might be unreachable for extension types as well because of restrictions of CREATE CAST, but I'm not entirely convinced that it's unreachable. Hence, it seems worth back-patching the fix (although I only went back to v14, as the patch doesn't apply cleanly at all in v13). There are several places in the rewriter that are building null domain constants the same way as preptlist.c. While those are before the planner and hence don't have any reachable bug, they're still applying a length coercion that will be const-folded away later, uselessly wasting cycles. Hence, make a utility routine that all of these places can call to do it right. Making this code more careful about the typmod assigned to the generated NULL constant has visible but cosmetic effects on some of the plans shown in contrib/postgres_fdw's regression tests. Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us Backpatch-through: 14	2025-01-29 15:31:55 -05:00
Tom Lane	f6ff75f796	Make BufferIsExclusiveLocked and BufferIsDirty work for local buffers. These functions tried to check the state of the buffer's content lock even for local buffers. Since we don't use the content lock for a local buffer, that would lead to a "false" result from LWLockHeldByMeInMode, which would mean a misleading "false" answer from BufferIsExclusiveLocked (we'd rather that case always return "true") or an assertion failure in BufferIsDirty. The core code never applies these two functions to local buffers, and apparently no extensions do either, since we've not heard complaints. Still, in the name of future-proofing, let's fix them to act as though a pinned local buffer is content-locked. Author: Srinath Reddy <srinath2133@gmail.com> Discussion: https://postgr.es/m/19396ef77f8.1098c4a1810508.2255483659262451647@zohocorp.com	2025-01-29 13:23:31 -05:00
John Naylor	128897b101	Fix grammatical typos around possessive "its" Some places spelled it "it's", which is short for "it is". In passing, fix a couple other nearby grammatical errors. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com	2025-01-29 14:39:14 +07:00
Amit Kapila	75eb9766ec	Rename pubgencols_type to pubgencols in pg_publication. The column added in commit `e65dbc9927`, pubgencols_type, was inconsistent with the naming conventions of other columns in the pg_publication catalog. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CALDaNm1u-ufVOW-RUsXSooqzkpohxfZYy=z78fbcr_9Pq5hbCg@mail.gmail.com	2025-01-28 10:42:46 +05:30
Michael Paquier	30a6ed0ce4	Track per-relation cumulative time spent in [auto]vacuum and [auto]analyze This commit adds four fields to the statistics of relations, aggregating the amount of time spent for each operation on a relation: - total_vacuum_time, for manual vacuum. - total_autovacuum_time, for vacuum done by the autovacuum daemon. - total_analyze_time, for manual analyze. - total_autoanalyze_time, for analyze done by the autovacuum daemon. This gives users the option to derive the average time spent for these operations with the help of the related "count" fields. Bump catalog version (for the catalog changes) and PGSTAT_FILE_FORMAT_ID (for the additions in PgStat_StatTabEntry). Author: Sami Imseih Reviewed-by: Bertrand Drouvot, Michael Paquier Discussion: https://postgr.es/m/CAA5RZ0uVOGBYmPEeGF2d1B_67tgNjKx_bKDuL+oUftuoz+=Y1g@mail.gmail.com	2025-01-28 09:57:32 +09:00
Michael Paquier	65281391a9	Print out error position for some ALTER TABLE ALTER COLUMN type A ParseState exists in ATPrepAlterColumnType() since its introduction in `077db40fa1`, and it has never relied on a query string that could be used to point at a location in the origin string on error. The output of some regression tests are updated, showing the error location where applicable. Six error strings are upgraded with the error location. Author: Jian He Discussion: https://postgr.es/m/CACJufxGfbPfWLjcEz33G9eW_epDW0UDi2H05i9eSTPKGJ4rxSA@mail.gmail.com	2025-01-27 13:51:23 +09:00
Álvaro Herrera	0a16c8326c	Add missing CommandCounterIncrement For commit `b663b9436e` I thought this was useless, but turns out not to be for the case where a partitioned table has two identical foreign key constraints which can both be matched by the same constraint in a partition during attach. This CCI makes the match search for the second constraint in the parent ignore the constraint in the child that has already been matched by the first constraint in the parent. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/c599253c-1ccd-4161-80fc-c9065e037a09@gmail.com	2025-01-26 17:34:28 +01:00
Noah Misch	d28cd3e7b2	At update of non-LP_NORMAL TID, fail instead of corrupting page header. The right mix of DDL and VACUUM could corrupt a catalog page header such that PageIsVerified() durably fails, requiring a restore from backup. This affects only catalogs that both have a syscache and have DDL code that uses syscache tuples to construct updates. One of the test permutations shows a variant not yet fixed. This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with TM_Deleted. I think core and PGXN are indifferent to that. Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported versions). The test case is v17+, since it uses INJECTION_POINT. Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org	2025-01-25 11:28:14 -08:00

1 2 3 4 5 ...

26488 Commits