postgres

mirror of https://github.com/postgres/postgres.git synced 2025-10-22 14:32:25 +03:00

Author	SHA1	Message	Date
Noah Misch	a07e03fd8f	Fix data loss at inplace update after heap_update(). As previously-added tests demonstrated, heap_inplace_update() could instead update an unrelated tuple of the same catalog. It could lose the update. Losing relhasindex=t was a source of index corruption. Inplace-updating commands like VACUUM will now wait for heap_update() commands like GRANT TABLE and GRANT DATABASE. That isn't ideal, but a long-running GRANT already hurts VACUUM progress more just by keeping an XID running. The VACUUM will behave like a DELETE or UPDATE waiting for the uncommitted change. For implementation details, start at the systable_inplace_update_begin() header comment and README.tuplock. Back-patch to v12 (all supported versions). In back branches, retain a deprecated heap_inplace_update(), for extensions. Reported by Smolkin Grigory. Reviewed by Nitin Motiani, (in earlier versions) Heikki Linnakangas, and (in earlier versions) Alexander Lakhin. Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com	2024-09-24 15:25:18 -07:00
Tom Lane	54562c9cfa	Improve Asserts checking relation matching in parallel scans. table_beginscan_parallel and index_beginscan_parallel contain Asserts checking that the relation a worker will use in a parallel scan is the same one the leader intended. However, they were checking for relation OID match, which was not strong enough to detect the mismatch problem fixed in `126ec0bc7`. What would be strong enough is to compare relfilenodes instead. Arguably, that's a saner definition anyway, since a scan surely operates on a physical relation not a logical one. Hence, store and compare RelFileLocators not relation OIDs. Also ensure that index_beginscan_parallel checks the index identity not just the table identity. Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us	2024-09-20 16:37:55 -04:00
Michael Paquier	b14e9ce7d5	Extend PgStat_HashKey.objid from 4 to 8 bytes This opens the possibility to define keys for more types of statistics kinds in PgStat_HashKey, the first case being 8-byte query IDs for statistics like pg_stat_statements. This increases the size of PgStat_HashKey from 12 to 16 bytes, while PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps the same size due to alignment. xl_xact_stats_item, that tracks the stats items to drop in commit WAL records, is increased from 12 to 16 bytes. Note that individual chunks in commit WAL records should be multiples of sizeof(int), hence 8-byte object IDs are stored as two uint32, based on a suggestion from Heikki Linnakangas. While on it, the field of PgStat_HashKey is renamed from "objoid" to "objid", as for some stats kinds this field does not refer to OIDs but just IDs, like for replication slot stats. This commit bumps the following format variables: - PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file for non-serialized stats kinds in the dshash table. - XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item. - Catalog version, for the SQL function pg_stat_have_stats(). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz	2024-09-18 12:44:15 +09:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as `46a0cd4cef`, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	7406ab623f	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. (previously committed as `6db4598fcb`, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:29 +02:00
Michael Paquier	00c76cf21c	Move logic related to WAL replay of Heap/Heap2 into its own file This brings more clarity to heapam.c, by cleanly separating all the logic related to WAL replay and the rest of Heap and Heap2, similarly to other RMGRs like hash, btree, etc. The header reorganization is also nice in heapam.c, cutting half of the headers required. Author: Li Yong Reviewed-by: Sutou Kouhei, Michael Paquier Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com	2024-09-12 13:32:05 +09:00
Peter Eisentraut	56fead44dc	Add amgettreeheight index AM API routine The only current implementation is for btree where it calls _bt_getrootheight(). Other index types can now also use this to pass information to their amcostestimate routine. Previously, btree was hardcoded and other index types could not hook into the optimizer at this point. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-09-10 10:03:23 +02:00
Peter Eisentraut	2b5f57977f	Add const qualifiers to XLogRegister*() functions Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org	2024-09-03 08:06:03 +02:00
Peter Geoghegan	1343ae954c	Give nbtree move right function internal linkage. Declare _bt_moveright() static. This is a minor modularity win; the routine was already private to nbtsearch.c for all practical purposes. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WgWVzCNEXQB_op5MMZMDgJ3fg3AhVm6bq2iZPpJNXGhWw@mail.gmail.com	2024-08-12 14:36:55 -04:00
David Rowley	ffabb56c94	Fix a series of typos and outdated references Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/c1d63754-cb85-2d8a-8409-bde2c4d2d04b@gmail.com	2024-08-12 23:27:09 +12:00
Heikki Linnakangas	1e35951e71	Turn a few 'validnsps' static variables into locals There was no need for these to be static buffers, local variables work just as well. I think they were marked as 'static' to imply that they are read-only, but 'const' is more appropriate for that, so change them to const. To make it possible to mark the variables as 'const', also add 'const' decorations to the transformRelOptions() signature. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi	2024-08-06 23:03:43 +03:00
Robert Haas	8a53539bd6	Wait for WAL summarization to catch up before creating .partial file. When a standby is promoted, CleanupAfterArchiveRecovery() may decide to rename the final WAL file from the old timeline by adding ".partial" to the name. If WAL summarization is enabled and this file is renamed before its partial contents are summarized, WAL summarization breaks: the summarizer gets stuck at that point in the WAL stream and just errors out. To fix that, first make the startup process wait for WAL summarization to catch up before renaming the file. Generally, this should be quick, and if it's not, the user can shut off summarize_wal and try again. To make this fix work, also teach the WAL summarizer that after a promotion has occurred, no more WAL can appear on the previous timeline: previously, the WAL summarizer wouldn't switch to the new timeline until we actually started writing WAL there, but that meant that when the startup process was waiting for the WAL summarizer, it was waiting for an action that the summarizer wasn't yet prepared to take. In the process of fixing these bugs, I realized that the logic to wait for WAL summarization to catch up was spread out in a way that made it difficult to reuse properly, so this code refactors things to make it easier. Finally, add a test case that would have caught this bug and the previously-fixed bug that WAL summarization sometimes needs to back up when the timeline changes. Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com	2024-07-26 15:00:48 -04:00
Thomas Munro	f6bef362ca	Refactor tidstore.c iterator buffering. Previously, TidStoreIterateNext() would expand the set of offsets for each block into an internal buffer that it overwrote each time. In order to be able to collect the offsets for multiple blocks before working with them, change the contract. Now, the offsets are obtained by a separate call to TidStoreGetBlockOffsets(), which can be called at a later time. TidStoreIteratorResult objects are safe to copy and store in a queue. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/CAAKRu_bbkmwAzSBgnezancgJeXrQZXy4G4kBTd+5=cr86H5yew@mail.gmail.com	2024-07-24 17:32:35 +12:00
Amit Kapila	1462aad2e4	Allow altering of two_phase option of a SUBSCRIPTION. The two_phase option is controlled by both the publisher (as a slot option) and the subscriber (as a subscription option), so the slot option must also be modified. Changing the 'two_phase' option for a subscription from 'true' to 'false' is permitted only when there are no pending prepared transactions corresponding to that subscription. Otherwise, the changes of already prepared transactions can be replicated again along with their corresponding commit leading to duplicate data or errors. To avoid data loss, the 'two_phase' option for a subscription can only be changed from 'false' to 'true' once the initial data synchronization is completed. Therefore this is performed later by the logical replication worker. Author: Hayato Kuroda, Ajin Cherian, Amit Kapila Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166	2024-07-24 10:13:36 +05:30
Peter Eisentraut	774d47b6c0	Move all extern declarations for GUC variables to header files Add extern declarations in appropriate header files for global variables related to GUC. In many cases, this was handled quite inconsistently before, with some GUC variables declared in a header file and some only pulled in via ad-hoc extern declarations in various .c files. Also add PGDLLIMPORT qualifications to those variables. These were previously missing because src/tools/mark_pgdllimport.pl has only been used with header files. This also fixes -Wmissing-variable-declarations warnings for GUC variables (not yet part of the standard warning options). Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-24 06:31:07 +02:00
Peter Eisentraut	935e675f3c	Get rid of a global variable bootstrap_data_checksum_version can just as easily be passed to where it is used via function arguments. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 10:00:41 +02:00
Michael Paquier	ffb0603929	Improve comments in slru.{c,h} about segment name format slru.h described incorrectly how SLRU segment names are formatted depending on the segment number and if long or short segment names are used. This commit closes the gap with a better description, fitting with the reality. Reported-by: Noah Misch Author: Aleksander Alekseev Discussion: https://postgr.es/m/20240626002747.dc.nmisch@google.com Backpatch-through: 17	2024-07-23 16:54:51 +09:00
Robert Haas	402b586d0a	Do not summarize WAL if generated with wal_level=minimal. To do this, we must include the wal_level in the first WAL record covered by each summary file; so add wal_level to struct Checkpoint and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY. This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the Checkpoint is also stored in the control file, also PG_CONTROL_VERSION. It's not great to do that so late in the release cycle, but the alternative seems to ship v17 without robust protections against this scenario, which could result in corrupted incremental backups. A side effect of this patch is that, when a server with wal_level=replica is started with summarize_wal=on for the first time, summarization will no longer begin with the oldest WAL that still exists in pg_wal, but rather from the first checkpoint after that. This change should be harmless, because a WAL summary for a partial checkpoint cycle can never make an incremental backup possible when it would otherwise not have been. Report by Fujii Masao. Patch by me. Review and/or testing by Jakub Wartak and Fujii Masao. Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com	2024-07-18 12:09:48 -04:00
Noah Misch	4a7f91b3d3	Remove comment about xl_heap_inplace "AT END OF STRUCT". Commit `2c03216d83` moved the tuple data from there to the buffer-0 data. Back-patch to v12 (all supported versions), the plan for the next change to this struct. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com	2024-06-27 19:21:06 -07:00
Alexander Korotkov	70a845c04a	Remove extra comment at TableAmRoutine.scan_analyze_next_block The extra comment was accidentally copied here by `6377e12a` from heapam_scan_analyze_next_block(). Reported-by: Matthias van de Meent Discussion: https://postgr.es/m/CAEze2WjC5PiweG-4oe0hB_Zr59iF3tRE1gURm8w4Cg5b6JEBGw%40mail.gmail.com	2024-06-22 16:17:50 +03:00
Tom Lane	92c49d1062	Fix insertion of SP-GiST REDIRECT tuples during REINDEX CONCURRENTLY. Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may insert some REDIRECT tuples. This will typically happen in a transaction that lacks an XID, which leads either to assertion failure in spgFormDeadTuple or to insertion of a REDIRECT tuple with zero xid. The latter's not good either, since eventually VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid, resulting in either an assertion failure or a garbage answer. In practice, since REINDEX CONCURRENTLY locks out index scans till it's done, it doesn't matter whether it inserts REDIRECTs or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM reduces such a REDIRECT to a PLACEHOLDER. So in non-assert builds there's no observable problem here, other than perhaps a little index bloat. But it's not behaving as intended. To fix, remove the failing Assert in spgFormDeadTuple, acknowledging that we might sometimes insert a zero XID; and guard VACUUM's GlobalVisTestIsRemovableXid() call with a test for valid XID, ensuring that we'll reduce such a REDIRECT the first time VACUUM sees it. (Versions before v14 use TransactionIdPrecedes here, which won't fail on zero xid, so they really have no bug at all in non-assert builds.) Another solution could be to not create REDIRECTs at all during REINDEX CONCURRENTLY, making the relevant code paths treat that case like index build (which likewise knows that no concurrent index scans can be happening). That would allow restoring the Assert in spgFormDeadTuple, but we'd still need the VACUUM change because redirection tuples with zero xid may be out there already. But there doesn't seem to be a nice way for spginsert() to tell that it's being called in REINDEX CONCURRENTLY without some API changes, so we'll leave that as a possible future improvement. In HEAD, also rename the SpGistState.myXid field to redirectXid, which seems less misleading (since it might not in fact be our transaction's XID) and is certainly less uninformatively generic. Per bug #18499 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org	2024-06-17 14:30:59 -04:00
Peter Geoghegan	6207f08f70	Harmonize function parameter names for Postgres 17. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit `035ce1fe`).	2024-06-12 17:01:51 -04:00
Peter Eisentraut	be5942aee7	Remove unused typedefs There were a few typedefs that were never used to define a variable or field. This had the effect that the buildfarm's typedefs.list would not include them, and so they would need to be re-added manually to keep the overall pgindent result perfectly clean. We can easily get rid of these typedefs to avoid the issue. In a few cases, we just let the enum or struct stand on its own without a typedef around it. In one case, an enum was abused to define flag bits; that's better done with macro definitions. This fixes all the remaining issues with missing entries in the buildfarm's typedefs.list. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1919000.1715815925@sss.pgh.pa.us	2024-05-17 07:36:12 +02:00
Peter Eisentraut	8aee330af5	Revert temporal primary keys and foreign keys This feature set did not handle empty ranges correctly, and it's now too late for PostgreSQL 17 to fix it. The following commits are reverted: `6db4598fcb` Add stratnum GiST support function `46a0cd4cef` Add temporal PRIMARY KEY and UNIQUE constraints `86232a49a4` Fix comment on gist_stratnum_btree `030e10ff1a` Rename pg_constraint.conwithoutoverlaps to conperiod `a88c800deb` Use daterange and YMD in without_overlaps tests instead of tsrange. `5577a71fb0` Use half-open interval notation in without_overlaps tests `34768ee361` Add temporal FOREIGN KEY contraints `482e108cd3` Add test for REPLICA IDENTITY with a temporal key `c3db1f30cb` doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE `144c2ce0cc` Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org	2024-05-16 08:17:46 +02:00
Tomas Vondra	41d2c6f952	Add missing index_insert_cleanup calls The optimization for inserts into BRIN indexes added by `c1ec02be1d` relies on a cache that needs to be explicitly released after calling index_insert(). The commit however failed to invoke the cleanup in validate_index(), which calls index_insert() indirectly through table_index_validate_scan(). After inspecting index_insert() callers, it seems unique_key_recheck() is missing the call too. Fixed by adding the two missing index_insert_cleanup() calls. The commit does two additional improvements. The aminsertcleanup() signature is modified to have the index as the first argument, to make it more like the other AM callbacks. And the aminsertcleanup() callback is invoked even if the ii_AmCache is NULL, so that it can decide if the cleanup is necessary. Author: Alvaro Herrera, Tomas Vondra Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/202401091043.e3nrqiad6gb7@alvherre.pgsql	2024-04-19 16:08:34 +02:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Peter Geoghegan	f698704155	Fix nbtree posting list comment. Oversight in commit `0d861bbb70`.	2024-04-16 11:20:41 -04:00
Peter Geoghegan	aa1def44c3	Fix nbtree page recycling comment. Oversight in commit `e5d8a99903`.	2024-04-16 11:14:36 -04:00
Alexander Korotkov	6377e12a5a	revert: Generalize relation analyze in table AM interface This commit reverts `27bc1772fc` and `dd1f6b0c17`. Per review by Andres Freund. Discussion: https://postgr.es/m/20240415201057.khoyxbwwxfgzomeo%40awork3.anarazel.de	2024-04-16 13:14:20 +03:00
Alexander Korotkov	922c4c461d	Revert: Allow table AM to store complex data structures in rd_amcache This commit reverts `02eb07ea89` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:02:49 +03:00
Alexander Korotkov	8dd0bb84da	Revert: Allow table AM tuple_insert() method to return the different slot This commit reverts `c35a3fb5e0` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:02:45 +03:00
Alexander Korotkov	193e6d18e5	Revert: Allow locking updated tuples in tuple_update() and tuple_delete() This commit reverts `87985cc925` and `818861eb57` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:01:34 +03:00
Alexander Korotkov	da841aa4dc	Revert: Let table AM insertion methods control index insertion This commit reverts `b1484a3f19` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 16:01:30 +03:00
Alexander Korotkov	bc1e2092eb	Revert: Custom reloptions for table AM This commit reverts `9bd99f4c26` and `422041542f` per review by Andres Freund. Discussion: https://postgr.es/m/20240410165236.rwyrny7ihi4ddxw4%40awork3.anarazel.de	2024-04-11 15:46:35 +03:00
Alexander Korotkov	dd1f6b0c17	Provide a way block-level table AMs could re-use acquire_sample_rows() While keeping API the same, this commit provides a way for block-level table AMs to re-use existing acquire_sample_rows() by providing custom callbacks for getting the next block and the next tuple. Reported-by: Andres Freund Discussion: https://postgr.es/m/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de Reviewed-by: Pavel Borisov	2024-04-08 14:39:48 +03:00
Alexander Korotkov	422041542f	Fill CommonRdOptions with default values in extract_autovac_opts() Reported-by: Thomas Munro Reported-by: Pavel Borisov Discussion: https://postgr.es/m/CA%2BhUKGLZzLR50RBvuqOO3MZ%3DF54ETz-rTp1PDX9uDGP_GqyYqA%40mail.gmail.com	2024-04-08 12:18:23 +03:00
Alexander Korotkov	9bd99f4c26	Custom reloptions for table AM Let table AM define custom reloptions for its tables. This allows specifying AM-specific parameters by the WITH clause when creating a table. The reloptions, which could be used outside of table AM, are now extracted into the CommonRdOptions data structure. These options could be by decision of table AM directly specified by a user or calculated in some way. The new test module test_tam_options evaluates the ability to set up custom reloptions and calculate fields of CommonRdOptions on their base. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent, Jess Davis	2024-04-08 11:23:28 +03:00
John Naylor	8a1b31e6e5	Use bump context for TID bitmaps stored by vacuum Vacuum does not pfree individual entries, and only frees the entire storage space when finished with it. This allows using a bump context, eliminating the chunk header in each leaf allocation. Most leaf allocations will be 16 to 32 bytes, so that's a significant savings. TidStoreCreateLocal gets a boolean parameter to indicate that the created store is insert-only. This requires a separate tree context for iteration, since we free the iteration state after iteration completes. Discussion: https://postgr.es/m/CANWCAZac%3DpBePg3rhX8nXkUuaLoiAJJLtmnCfZsPEAS4EtJ%3Dkg%40mail.gmail.com Discussion: https://postgr.es/m/CANWCAZZQFfxvzO8yZHFWtQV+Z2gAMv1ku16Vu7KWmb5kZQyd1w@mail.gmail.com	2024-04-08 14:39:49 +07:00
Thomas Munro	041b96802e	Use streaming I/O in ANALYZE. The ANALYZE command prefetches and reads sample blocks chosen by a BlockSampler algorithm. Instead of calling [Prefetch\|Read]Buffer() for each block, ANALYZE now uses the streaming API introduced in `b5a9b18cd0`. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com	2024-04-08 13:16:28 +12:00
Thomas Munro	b7b0f3f272	Use streaming I/O in sequential scans. Instead of calling ReadBuffer() for each block, heap sequential scans and TID range scans now use the streaming API introduced in `b5a9b18cd0`. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com	2024-04-08 01:53:57 +12:00
Tomas Vondra	04e72ed617	BitmapHeapScan: Push skip_fetch optimization into table AM Commit `7c70996ebf` introduced an optimization to allow bitmap scans to operate like index-only scans by not fetching a block from the heap if none of the underlying data is needed and the block is marked all visible in the visibility map. With the introduction of table AMs, a FIXME was added to this code indicating that the skip_fetch logic should be pushed into the table AM-specific code, as not all table AMs may use a visibility map in the same way. This commit resolves this FIXME for the current block. The layering violation is still present in BitmapHeapScans's prefetching code, which uses the visibility map to decide whether or not to prefetch a block. However, this can be addressed independently. Author: Melanie Plageman Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com	2024-04-07 00:24:14 +02:00
Peter Geoghegan	5bf748b86b	Enhance nbtree ScalarArrayOp execution. Commit `9e8da0f7` taught nbtree to handle ScalarArrayOpExpr quals natively. This works by pushing down the full context (the array keys) to the nbtree index AM, enabling it to execute multiple primitive index scans that the planner treats as one continuous index scan/index path. This earlier enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed scans with ScalarArrayOp quals to return ordered results (with some notable restrictions, described further down). Take this general approach a lot further: teach nbtree SAOP index scans to decide how to execute ScalarArrayOp scans (when and where to start the next primitive index scan) based on physical index characteristics. This can be far more efficient. All SAOP scans will now reliably avoid duplicative leaf page accesses (just like any other nbtree index scan). SAOP scans whose array keys are naturally clustered together now require far fewer index descents, since we'll reliably avoid starting a new primitive scan just to get to a later offset from the same leaf page. The scan's arrays now advance using binary searches for the array element that best matches the next tuple's attribute value. Required scan key arrays (i.e. arrays from scan keys that can terminate the scan) ratchet forward in lockstep with the index scan. Non-required arrays (i.e. arrays from scan keys that can only exclude non-matching tuples) "advance" without the process ever rolling over to a higher-order array. Naturally, only required SAOP scan keys trigger skipping over leaf pages (non-required arrays cannot safely end or start primitive index scans). Consequently, even index scans of a composite index with a high-order inequality scan key (which we'll mark required) and a low-order SAOP scan key (which we won't mark required) now avoid repeating leaf page accesses -- that benefit isn't limited to simpler equality-only cases. In general, all nbtree index scans now output tuples as if they were one continuous index scan -- even scans that mix a high-order inequality with lower-order SAOP equalities reliably output tuples in index order. This allows us to remove a couple of special cases that were applied when building index paths with SAOP clauses during planning. Bugfix commit `807a40c5` taught the planner to avoid generating unsafe path keys: path keys on a multicolumn index path, with a SAOP clause on any attribute beyond the first/most significant attribute. These cases are now all safe, so we go back to generating path keys without regard for the presence of SAOP clauses (just like with any other clause type). Affected queries can now exploit scan output order in all the usual ways (e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early). Also undo changes from follow-up bugfix commit `a4523c5a`, which taught the planner to produce alternative index paths, with path keys, but without low-order SAOP index quals (filter quals were used instead). We'll no longer generate these alternative paths, since they can no longer offer any meaningful advantages over standard index qual paths. Affected queries thereby avoid all of the disadvantages that come from using filter quals within index scan nodes. They can avoid extra heap page accesses from using filter quals to exclude non-matching tuples (index quals will never have that problem). They can also skip over irrelevant sections of the index in more cases (though only when nbtree determines that starting another primitive scan actually makes sense). There is a theoretical risk that removing restrictions on SAOP index paths from the planner will break compatibility with amcanorder-based index AMs maintained as extensions. Such an index AM could have the same limitations around ordered SAOP scans as nbtree had up until now. Adding a pro forma incompatibility item about the issue to the Postgres 17 release notes seems like a good idea. Author: Peter Geoghegan <pg@bowt.ie> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com	2024-04-06 11:47:10 -04:00
David Rowley	44086b0975	Preliminary refactor of heap scanning functions To allow the use of the read stream API added in `b5a9b18cd` for sequential scans on heap tables, here we make some adjustments to make that change less invasive and perhaps make the code easier to follow in the process. Here heapgetpage() gets broken into two functions: 1) The part which reads the block has now been moved into a function named heapfetchbuf(). 2) The part which performed pruning and populated the scan's rs_vistuples[] array is now moved into a new function named heap_prepare_pagescan(). The functionality provided by heap_prepare_pagescan() was only ever required by SO_ALLOW_PAGEMODE scans, so the branching that was previously done in heapgetpage() is no longer needed as we simply just don't call heap_prepare_pagescan() from heapgettup() in the refactored code. Author: Melanie Plageman Discussion: https://postgr.es/m/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com	2024-04-04 16:41:13 +13:00
Alexander Korotkov	97ce821e3e	Fix the parameters order for TableAmRoutine.relation_copy_for_cluster() Specify OldTable first, NewTable second as used by table_relation_copy_for_cluster() and as implemented in heapam_relation_copy_for_cluster(). Backpatch to PostgreSQL 12, where TableAmRoutine was introduced. Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM Author: Japin Li Reviewed-by: Pavel Borisov Backpatch-through: 12	2024-04-04 00:34:28 +03:00
Heikki Linnakangas	6dbb490261	Combine freezing and pruning steps in VACUUM Execute both freezing and pruning of tuples in the same heap_page_prune() function, now called heap_page_prune_and_freeze(), and emit a single WAL record containing all changes. That reduces the overall amount of WAL generated. This moves the freezing logic from vacuumlazy.c to the heap_page_prune_and_freeze() function. The main difference in the coding is that in vacuumlazy.c, we looked at the tuples after the pruning had already happened, but in heap_page_prune_and_freeze() we operate on the tuples before pruning. The heap_prepare_freeze_tuple() function is now invoked after we have determined that a tuple is not going to be pruned away. VACUUM no longer needs to loop through the items on the page after pruning. heap_page_prune_and_freeze() does all the work. It now returns the list of dead offsets, including existing LP_DEAD items, to the caller. Similarly it's now responsible for tracking 'all_visible', 'all_frozen', and 'hastup' on the caller's behalf. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov	2024-04-03 19:32:28 +03:00
Alexander Korotkov	867cc7b6dd	Revert "Custom reloptions for table AM" This reverts commit `c95c25f9af` due to multiple design issues spotted after commit. Reported-by: Jeff Davis Discussion: https://postgr.es/m/11550b536211d5748bb2865ed6cb3502ff073bf7.camel%40j-davis.com	2024-04-02 11:29:00 +03:00
Heikki Linnakangas	3d0f730bf1	Introduce 'options' argument to heap_page_prune() Currently there is only one option, HEAP_PAGE_PRUNE_MARK_UNUSED_NOW which replaces the old boolean argument, but upcoming patches will introduce at least one more. Having a lot of boolean arguments makes it hard to see at the call sites what the arguments mean, so prefer a bitmask of options with human-readable names. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://www.postgresql.org/message-id/20240401172219.fngjosaqdgqqvg4e@liskov	2024-04-02 00:56:05 +03:00
Alexander Korotkov	b1484a3f19	Let table AM insertion methods control index insertion Previously, the executor did index insert unconditionally after calling table AM interface methods tuple_insert() and multi_insert(). This commit introduces the new parameter insert_indexes for these two methods. Setting '*insert_indexes' to true saves the current logic. Setting it to false indicates that table AM cares about index inserts itself and doesn't want the caller to do that. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger	2024-03-30 22:53:56 +02:00
Alexander Korotkov	c95c25f9af	Custom reloptions for table AM Let table AM define custom reloptions for its tables. This allows to specify AM-specific parameters by WITH clause when creating a table. The code may use some parts from prior work by Hao Wu. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn Reviewed-by: Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:36:25 +02:00
Alexander Korotkov	27bc1772fc	Generalize relation analyze in table AM interface Currently, there is just one algorithm for sampling tuples from a table written in acquire_sample_rows(). Custom table AM can just redefine the way to get the next block/tuple by implementing scan_analyze_next_block() and scan_analyze_next_tuple() API functions. This approach doesn't seem general enough. For instance, it's unclear how to sample this way index-organized tables. This commit allows table AM to encapsulate the whole sampling algorithm (currently implemented in acquire_sample_rows()) into the relation_analyze() API function. Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com Reviewed-by: Pavel Borisov, Matthias van de Meent	2024-03-30 22:34:04 +02:00

1 2 3 4 5 ...

1815 Commits