postgres

mirror of https://github.com/postgres/postgres.git synced 2025-12-21 05:21:08 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	7cfdb4d1e7	Update TransactionXmin when MyProc->xmin is updated GetSnapshotData() set TransactionXmin = MyProc->xmin, but when SnapshotResetXmin() advanced MyProc->xmin, it did not advance TransactionXmin correspondingly. That meant that TransactionXmin could be older than MyProc->xmin, and XIDs between TransactionXmin and the real MyProc->xmin could be vacuumed away. One known consequence is in pg_subtrans lookups: we might try to look up the status of an XID that was already truncated away. Back-patch to all supported versions. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/d27a046d-a1e4-47d1-a95c-fbabe41debb4@iki.fi	2024-12-21 23:42:52 +02:00
Thomas Munro	0350b876b0	Fix corruption when relation truncation fails. RelationTruncate() does three things, while holding an AccessExclusiveLock and preventing checkpoints: 1. Logs the truncation. 2. Drops buffers, even if they're dirty. 3. Truncates some number of files. Step 2 could previously be canceled if it had to wait for I/O, and step 3 could and still can fail in file APIs. All orderings of these operations have data corruption hazards if interrupted, so we can't give up until the whole operation is done. When dirty pages were discarded but the corresponding blocks were left on disk due to ERROR, old page versions could come back from disk, reviving deleted data (see pgsql-bugs #18146 and several like it). When primary and standby were allowed to disagree on relation size, standbys could panic (see pgsql-bugs #18426) or revive data unknown to visibility management on the primary (theorized). Changes: * WAL is now unconditionally flushed first * smgrtruncate() is now called in a critical section, preventing interrupts and causing PANIC on file API failure * smgrtruncate() has a new parameter for existing fork sizes, because it can't call smgrnblocks() itself inside a critical section The changes apply to RelationTruncate(), smgr_redo() and pg_truncate_visibility_map(). That last is also brought up to date with other evolutions of the truncation protocol. The VACUUM FileTruncate() failure mode had been discussed in older reports than the ones referenced below, with independent analysis from many people, but earlier theories on how to fix it were too complicated to back-patch. The more recently invented cancellation bug was diagnosed by Alexander Lakhin. Other corruption scenarios were spotted by me while iterating on this patch and earlier commit `75818b3a`. Back-patch to all supported releases. 
Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reported-by: rootcause000@gmail.com Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org	2024-12-20 23:57:18 +13:00
Peter Geoghegan	9e85b20da7	Avoid nbtree index scan SAOP scanBehind confusion. Consistently reset so->scanBehind at the beginning of nbtree array advancement, even during sktrig_required=false calls (calls where array advancement is triggered by an unsatisfied non-required array scan key). Otherwise, it's possible for queries to fail to return all relevant tuples to the scan given a low-order required scan key that was previously deemed "satisfied" by a truncated high key attribute value. This only happened at the point where a later non-required array scan key needed to be "advanced" once on the next leaf page (that is, once the right sibling of the truncated high key page was reached). The underlying issue was that later code within _bt_advance_array_keys assumed that the so->scanBehind flag must have been set using the current page's high key (not the previous page's high key). Any later successful recheck call to _bt_check_compare would therefore spuriously be prevented from making _bt_advance_array_keys return true, based on the faulty belief that the truncated attribute must be from the scan's current tuple (i.e. the non-pivot tuple at the start of the next page). _bt_advance_array_keys would return false for the tuple, ultimately resulting in _bt_checkkeys failing to return a matching tuple. Oversight in commit `5bf748b8`, which enhanced nbtree ScalarArrayOp execution. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WzkJKncfqyAUTeuB5GgRhT1vhsWO2q11dbZNqKmvjopP_g@mail.gmail.com Backpatch: 17-, where commit `5bf748b8` first appears.	2024-12-19 11:08:53 -05:00
David Rowley	7b8d45d278	Fix Assert failure in WITH RECURSIVE UNION queries If the non-recursive part of a recursive CTE ended up using TTSOpsBufferHeapTuple as the table slot type, then a duplicate value could cause an Assert failure in CheckOpSlotCompatibility() when checking the hash table for the duplicate value. The expected slot type for the deform step was TTSOpsMinimalTuple so the Assert failed when the TTSOpsBufferHeapTuple slot was used. This is a long-standing bug which we likely didn't notice because it seems much more likely that the non-recursive term would have required projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility is ok with. There doesn't seem to be any harm done here other than the Assert failure. Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would have properly existed in the ExprState. The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops' parameter. This means the ExprState's EEOP_*_FETCHSOME step won't expect a fixed slot type. This makes CheckOpSlotCompatibility() happy as no checking is performed when the ExprEvalStep is not expecting a fixed slot type. Reported-by: Richard Guo Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com Backpatch-through: 13, all supported versions	2024-12-19 13:12:18 +13:00
Nathan Bossart	18452b70ac	Accommodate very large dshash tables. If a dshash table grows very large (e.g., the dshash table for cumulative statistics when there are millions of tables), resizing it may fail with an error like: ERROR: invalid DSA memory alloc request size 1073741824 To fix, permit dshash resizing to allocate more than 1 GB by providing the DSA_ALLOC_HUGE flag. Reported-by: Andreas Scherbaum Author: Matthias van de Meent Reviewed-by: Cédric Villemain, Michael Paquier, Andres Freund Discussion: https://postgr.es/m/80a12d59-0d5e-4c54-866c-e69cd6536471%40pgug.de Backpatch-through: 13	2024-12-17 15:24:45 -06:00
Tomas Vondra	42eae257cf	Update comments about index parallel builds Commit `b437571714` allowed parallel builds for BRIN, but left behind two comments claiming only btree indexes support parallel builds. Reported by Egor Rogov, along with similar issues in SGML docs. Backpatch to 17, where parallel builds for BRIN were introduced. Reported-by: Egor Rogov Backpatch-through: 17 Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru	2024-12-17 15:48:29 +01:00
Nathan Bossart	d09fbf645e	Revert "Don't truncate database and user names in startup packets." This reverts commit `562bee0fc1`. We received a report from the field about this change in behavior, so it seems best to revert this commit and to add proper multibyte-aware truncation as a follow-up exercise. Fixes bug #18711. Reported-by: Adam Rauch Reviewed-by: Tom Lane, Bertrand Drouvot, Bruce Momjian, Thomas Munro Discussion: https://postgr.es/m/18711-7503ee3e449d2c47%40postgresql.org Backpatch-through: 17	2024-12-12 15:52:04 -06:00
Noah Misch	4bd9de3f41	Fix elog(FATAL) before PostmasterMain() or just after fork(). Since commit `97550c0711`, these failed with "PANIC: proc_exit() called in child process" due to uninitialized or stale MyProcPid. That was reachable if close() failed in ClosePostmasterPorts() or setlocale(category, "C") failed, both unlikely. Back-patch to v13 (all supported versions). Discussion: https://postgr.es/m/20241208034614.45.nmisch@google.com	2024-12-10 13:52:02 -08:00
Michael Paquier	67ef403d0e	Fix comments of GUC hooks for timezone_abbreviations The GUC assign and check hooks used "assign_timezone_abbreviations", which was incorrect. Issue noticed while browsing this area of the code, introduced in `0a20ff54f5`. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/Z1eV6Y8yk77GZhZI@paquier.xyz Backpatch-through: 16	2024-12-10 13:02:24 +09:00
Daniel Gustafsson	9add1bbfa6	Fix small memory leaks in GUC checks Follow-up commit to `a9d58bfe8a`. Backpatch down to v16 where this was added in order to keep the code consistent for future backpatches. Author: Tofig Aliev <t.aliev@postgrespro.ru> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/bba4313fdde9db46db279f96f3b748b1@postgrespro.ru Backpatch-through: 16	2024-12-09 20:58:23 +01:00
Tom Lane	556f7b7bc1	Simplify executor's determination of whether to use parallelism. Our parallel-mode code only works when we are executing a query in full, so ExecutePlan must disable parallel mode when it is asked to do partial execution. The previous logic for this involved passing down a flag (variously named execute_once or run_once) from callers of ExecutorRun or PortalRun. This is overcomplicated, and unsurprisingly some of the callers didn't get it right, since it requires keeping state that not all of them have handy; not to mention that the requirements for it were undocumented. That led to assertion failures in some corner cases. The only state we really need for this is the existing QueryDesc.already_executed flag, so let's just put all the responsibility in ExecutePlan. (It could have been done in ExecutorRun too, leading to a slightly shorter patch -- but if there's ever more than one caller of ExecutePlan, it seems better to have this logic in the subroutine than the callers.) This makes those ExecutorRun/PortalRun parameters unnecessary. In master it seems okay to just remove them, returning the API for those functions to what it was before parallelism. Such an API break is clearly not okay in stable branches, but for them we can just leave the parameters in place after documenting that they do nothing. Per report from Yugo Nagata, who also reviewed and tested this patch. Back-patch to all supported branches. Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp	2024-12-09 14:38:19 -05:00
Michael Paquier	bb93b33d7e	Improve comment about dropped entries in pgstat.c pgstat_write_statsfile() discards any entries marked as dropped from being written to the stats file at shutdown, and also included an assertion based on the same condition. The intention of the assertion is to track that no pgstats entries should be left around as terminating backends should drop any entries they still hold references on before the stats file is written by the checkpointer, and it is not worth taking down the server in this case if there is a bug making that possible. Let's improve the comment of this area to document clearly what's intended. Based on a discussion with Bertrand Drouvot and Anton A. Melnikov. Author: Bertrand Drouvot Discussion: https://postgr.es/m/a13e8cdf-b97a-4ecb-8f42-aaa367974e29@postgrespro.ru Backpatch-through: 15	2024-12-09 14:35:44 +09:00
Michael Paquier	dc5f905418	Fix invalidation of local pgstats references for entry reinitialization `818119afcc` has introduced the "generation" concept in pgstats entries, incrementing a counter when a pgstats entry is reinitialized, but it did not count on the fact that backends still holding local references to such entries need to be refreshed if the cache age is outdated. The previous logic only updated local references when an entry was dropped, but it needs also to consider entries that are reinitialized. This matters for replication slot stats (as well as custom pgstats kinds in 18~), where concurrent drops and creates of a slot could cause incorrect stats to be locally referenced. This would lead to an assertion failure at shutdown when writing out the stats file, as the backend holding an outdated local reference would not be able to drop during its shutdown sequence the stats entry that should be dropped, as the last process holding a reference to the stats entry. The checkpointer was then complaining about such an entry late in the shutdown sequence, after the shutdown checkpoint is finished with the control file updated, causing the stats file to not be generated. In non-assert builds, the entry would just be skipped with the stats file written. Note that only logical replication slots use statistics. A test case based on TAP is added to test_decoding, where a persistent connection peeking at a slot's data is kept with concurrent drops and creates of the same slot. This is based on the isolation test case that Anton has sent. As it requires a node shutdown with a check to make sure that the stats file is written with this specific sequence of events, TAP is used instead. Reported-by: Anton A. Melnikov Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/56bf8ff9-dd8c-47b2-872a-748ede82af99@postgrespro.ru Backpatch-through: 15	2024-12-09 10:46:03 +09:00
David Rowley	9d5ce4f1a0	Fix possible crash during WindowAgg evaluation When short-circuiting WindowAgg node evaluation on the top-level WindowAgg node using quals on monotonic window functions, because the WindowAgg run condition can mean there's no need to evaluate subsequent window function results in the same partition once the run condition becomes false, it was possible that the executor would use stale results from the previous invocation of the window function in some cases. A fix for this was partially done by a5832722, but that commit only fixed the issue for non-top-level WindowAgg nodes. I mistakenly thought that the top-level WindowAgg didn't have this issue, but Jayesh's example case clearly shows that's incorrect. At the time, I also thought that this only affected 32-bit systems as all window functions which then supported run conditions returned BIGINT, however, that's wrong as ExecProject is still called and that could cause evaluation of any other window function belonging to the same WindowAgg node, one of which may return a byref type. The only queries affected by this are WindowAggs with a "Run Condition" which contains at least one window function with a byref result type, such as lead() or lag() on a byref column. The window clause must also contain a PARTITION BY clause (without a PARTITION BY, execution of the WindowAgg stops immediately when the run condition becomes false and there's no risk of using the stale results). Reported-by: Jayesh Dehankar Discussion: https://postgr.es/m/193261e2c4d.3dd3cd7c1842.871636075166132237@zohocorp.com Backpatch-through: 15, where WindowAgg run conditions were added	2024-12-09 14:24:07 +13:00
Tom Lane	ec7b89cc53	Ensure that pg_amop/amproc entries depend on their lefttype/righttype. Usually an entry in pg_amop or pg_amproc does not need a dependency on its amoplefttype/amoprighttype/amproclefttype/amprocrighttype types, because there is an indirect dependency via the argument types of its referenced operator or procedure, or via the opclass it belongs to. However, for some support procedures in some index AMs, the argument types of the support procedure might not mention the column data type at all. Also, the amop/amproc entry might be treated as "loose" in the opfamily, in which case it lacks a dependency on any particular opclass; or it might be a cross-type entry having a reference to a datatype that is not its opclass' opcintype. The upshot of all this is that there are cases where a datatype can be dropped while leaving behind amop/amproc entries that mention it, because there is no path in pg_depend showing that those entries depend on that type. Such entries are harmless in normal activity, because they won't get used, but they cause problems for maintenance actions such as dropping the operator family. They also cause pg_dump to produce bogus output. The previous commit put a band-aid on the DROP OPERATOR FAMILY failure, but a real fix is needed. To fix, add pg_depend entries showing that a pg_amop/pg_amproc entry depends on its lefttype/righttype. To avoid bloating pg_depend too much, skip this if the referenced operator or function has that type as an input type. (I did not bother with considering the possible indirect dependency via the opclass' opcintype; at least in the reported case, that wouldn't help anyway.) Probably, the reason this has escaped notice for so long is that add-on datatypes and relevant opclasses/opfamilies are usually packaged as extensions nowadays, so that there's no way to drop a type without dropping the referencing opclasses/opfamilies too. 
Still, in the absence of pg_depend entries there's nothing that constrains DROP EXTENSION to drop the opfamily entries before the datatype, so it seems possible for a DROP failure to occur anyway. The specific case that was reported doesn't fail in v13, because v13 prefers to attach the support procedure to the opclass not the opfamily. But it's surely possible to construct other edge cases that do fail in v13, so patch that too. Per report from Yoran Heling. Back-patch to all supported branches. Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021	2024-12-07 15:56:28 -05:00
Tom Lane	5b44a317ae	Make getObjectDescription robust against dangling amproc type links. Yoran Heling reported a case where a data type could be dropped while references to its OID remain behind in pg_amproc. This causes getObjectDescription to fail, which blocks dropping the operator family (since our DROP code likes to construct descriptions of everything it's dropping). The proper fix for this requires adding more pg_depend entries. But to allow DROP to go through with already-corrupt catalogs, tweak getObjectDescription to print "???" for the type instead of failing when it processes such an entry. I changed the logic for pg_amop similarly, for consistency, although it is not known that the problem can manifest in pg_amop. Per report from Yoran Heling. Back-patch to all supported branches (although the problem may be unreachable in v13). Discussion: https://postgr.es/m/Z1MVCOh1hprjK5Sf@gmai021	2024-12-07 14:28:16 -05:00
Tom Lane	765f76d8cd	Fix is_digit labeling of to_timestamp's FFn format codes. These format codes produce or consume strings of digits, so they should be labeled with is_digit = true, but they were not. This has effect in only one place, where is_next_separator() is checked to see if the preceding format code should slurp up all the available digits. Thus, with a format such as '...SSFF3' with remaining input '12345', the 'SS' code would consume all five digits (and then complain about seconds being out of range) when it should eat only two digits. Per report from Nick Davies. This bug goes back to `d589f9446` where the FFn codes were introduced, so back-patch to v13. Discussion: https://postgr.es/m/AM8PR08MB6356AC979252CFEA78B56678B6312@AM8PR08MB6356.eurprd08.prod.outlook.com	2024-12-07 13:12:32 -05:00
John Naylor	83ce20d671	Fix use-after-free in parallel_vacuum_reset_dead_items parallel_vacuum_reset_dead_items used a local variable to hold a pointer from the passed vacrel, purely as a shorthand. This pointer was later freed and a new allocation was made and stored to the struct. Then the local pointer was mistakenly referenced again. This apparently happened not to break anything since the freed chunk would have been put on the context's freelist, so it was accidentally the same pointer anyway, in which case the DSA handle was correctly updated. The minimal fix is to change two places so they access dead_items through the vacrel. This coding style is a maintenance hazard, so while at it get rid of most other similar usages, which were inconsistently used anyway. Analysis and patch by Vallimaharajan G, with further defensive coding by me. Back-patch to v17, when TidStore came in. Discussion: https://postgr.es/m/1936493cc38.68cb2ef27266.7456585136086197135@zohocorp.com	2024-12-04 16:59:12 +07:00
Álvaro Herrera	9abdc1841e	Fix synchronized_standby_slots GUC check hook The validate_sync_standby_slots subroutine requires an LWLock, so it cannot run in processes without PGPROC; skip it there to avoid a crash. This replaces the current test for ReplicationSlotCtl being not null, which appears to be a solution for the same problem but less general. I also rewrote a related comment that mentioned ReplicationSlotCtl in StandbySlotsHaveCaughtup. This code came in with commit bf279ddd1c28; backpatch to 17. Reported-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/202411281216.sutbxtr6idnn@alvherre.pgsql	2024-12-03 17:50:57 +01:00
Álvaro Herrera	5ffbbcfa16	Drop "Lock" suffix from LWLock wait event names Commit `da952b415f` unintentionally reverted the SQL-visible part of commit `14a9101091`, which breaks queries joining pg_wait_events with pg_stat_activity. Remove the suffix again. Backpatch to 17. Reported-by: Christophe Courtois <christophe.courtois@dalibo.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/18728-450924477056a339%40postgresql.org Discussion: https://postgr.es/m/Z01w1+LihtRiS0Te@ip-10-97-1-34.eu-west-3.compute.internal	2024-12-03 15:50:03 +01:00
Thomas Munro	d4ffbf47b2	RelationTruncate() must set DELAY_CHKPT_START. Previously, it set only DELAY_CHKPT_COMPLETE. That was important, because it meant that if the XLOG_SMGR_TRUNCATE record preceded a XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also happen on disk before the XLOG_CHECKPOINT_ONLINE record was written. However, it didn't guarantee that the sync request for the truncation was processed before the XLOG_CHECKPOINT_ONLINE record was written. By setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE record is written to WAL before the redo pointer of a concurrent checkpoint, the sync request queued by that operation must be processed by that checkpoint, rather than being left for the following one. This is a refinement of commit `412ad7a556`. Back-patch to all supported releases, like that commit. Author: Robert Haas <robertmhaas@gmail.com> Reported-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com	2024-12-03 10:19:47 +13:00
Tom Lane	78883cd905	Avoid mislabeling of lateral references, redux. As I'd feared, commit `5c9d8636d` was still a few bricks shy of a load. We can't just leave pulled-up lateral-reference Vars with no new nullingrels: we have to carefully compute what subset of the to-be-replaced Var's nullingrels apply to them, else we still get "wrong varnullingrels" errors. This is a bit tedious, but it looks like we can use the nullingrel data this patch computes for other purposes, enabling better optimization. We don't want to inject unnecessary plan changes into stable branches though, so leave that idea for a later HEAD-only patch. Patch by me, but thanks to Richard Guo for devising a test case that broke `5c9d8636d`, and for preliminary investigation about how to fix it. As before, back-patch to v16. Discussion: https://postgr.es/m/E1tGn4j-0003zi-MP@gemulon.postgresql.org	2024-11-30 12:42:20 -05:00
Tom Lane	72822a99d4	Avoid mislabeling of lateral references when pulling up a subquery. If we are pulling up a subquery that's under an outer join, and the subquery's target list contains a strict expression that uses both a subquery variable and a lateral-reference variable, it's okay to pull up the expression without wrapping it in a PlaceHolderVar. That's safe because if the subquery variable is forced to NULL by the outer join, the expression result will come out as NULL too, so we don't have to force that outcome by evaluating the expression below the outer join. It'd be correct to wrap in a PHV, but that can lead to very significantly worse plans, since we'd then have to use a nestloop plan to pass down the lateral reference to where the expression will be evaluated. However, when we do that, we should not mark the lateral reference variable as being nulled by the outer join, because it isn't after we pull up the expression in this way. So the marking logic added by `cb8e50a4a` was incorrect in this detail, leading to "wrong varnullingrels" errors from the consistency-checking logic in setrefs.c. It seems to be sufficient to just not mark lateral references at all in this case. (I have a nagging feeling that more complexity may be needed in cases where there are several levels of outer join, but some attempts to break it with that didn't succeed.) Per report from Bertrand Mamasam. Back-patch to v16, as the previous patch was. Discussion: https://postgr.es/m/CACZ67_UA_EVrqiFXJu9XK50baEpH=ofEPJswa2kFxg6xuSw-ww@mail.gmail.com	2024-11-28 17:33:16 -05:00
Michael Paquier	7668e85a40	Revert "Handle better implicit transaction state of pipeline mode" This reverts commit `d77f91214f` on all stable branches, due to concerns regarding the compatibility side effects this could create in a minor release. The change still exists on HEAD. Discussion: https://postgr.es/m/CA+TgmoZqRgeFTg4+Yf_CMRRXiHuNz1u6ZC4FvVk+rxw0RmOPnw@mail.gmail.com Backpatch-through: 13	2024-11-28 09:43:21 +09:00
Álvaro Herrera	6e793582bc	Fix pg_get_constraintdef for NOT NULL constraints on domains We added pg_constraint rows for all not-null constraints, first for tables and later for domains; but while the ones for tables were reverted, the ones for domains were not. However, we did accidentally revert ruleutils.c support for the ones on domains in `6f8bb7c1e9`, which breaks running pg_get_constraintdef() on them. Put that back. This is only needed in branch 17, because we've reinstated this code in branch master with commit `14e87ffa5c`. Add some new tests in both branches. I couldn't find anything else that needs de-reverting. Reported-by: Erki Eessaar <erki.eessaar@taltech.ee> Reviewed-by: Magnus Hagander <magnus@hagander.net> Discussion: https://postgr.es/m/AS8PR01MB75110350415AAB8BBABBA1ECFE222@AS8PR01MB7511.eurprd01.prod.exchangelabs.com	2024-11-27 13:50:27 +01:00
Michael Paquier	d77f91214f	Handle better implicit transaction state of pipeline mode When using a pipeline, a transaction starts from the first command and is committed with a Sync message or when the pipeline ends. Functions like IsInTransactionBlock() or PreventInTransactionBlock() were already able to understand a pipeline as being in a transaction block, but it was not the case of CheckTransactionBlock(). This function is called for example to generate a WARNING for SET LOCAL, complaining that it is used outside of a transaction block. The current state of the code caused multiple problems, like: - SET LOCAL executed at any stage of a pipeline issued a WARNING, even if the command was at least second in line where the pipeline is in a transaction state. - LOCK TABLE failed when invoked at any step of a pipeline, even if it should be able to work within a transaction block. The pipeline protocol assumes that the first command of a pipeline is not part of a transaction block, and that any follow-up command is considered to be within a transaction block. This commit changes the backend so that an implicit transaction block is started each time the first Execute message of a pipeline has finished processing, with this implicit transaction block ended once a sync is processed. The checks based on XACT_FLAGS_PIPELINING in the routines checking if we are in a transaction block are not necessary: it is enough to rely on the existing ones. Some tests are added to pgbench, that can be backpatched down to v17 when \syncpipeline is involved and down to v14 where \startpipeline and \endpipeline are available. This is unfortunately limited regarding the error patterns that can be checked, but it provides coverage for various pipeline combinations to check if these succeed or fail. These tests are able to capture the case of SET LOCAL's WARNING. 
The author has proposed a different feature to improve the coverage by adding similar meta-commands to psql where error messages could be checked, something more useful for the cases where commands cannot be used in transaction blocks, like REINDEX CONCURRENTLY or VACUUM. This is considered as future work for v18~. Author: Anthonin Bonnefoy Reviewed-by: Jelte Fennema-Nio, Michael Paquier Discussion: https://postgr.es/m/CAO6_XqrWO8uNBQrSu5r6jh+vTGi5Oiyk4y8yXDORdE2jbzw8xw@mail.gmail.com Backpatch-through: 13	2024-11-27 09:31:37 +09:00
Álvaro Herrera	b0e572819d	Clean up newlines following left parentheses Most came in during the 17 cycle, so backpatch there. Some (particularly reorderbuffer.h) are very old, but backpatching doesn't seem useful. Like commits `c9d2977519`, `c4f113e8fe`.	2024-11-26 17:10:07 +01:00
Peter Eisentraut	ad89c8bda1	Rename C23 keyword constexpr is a keyword in C23. Rename a conflicting identifier for future-proofing. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/08abc832-1384-4aca-a535-1a79765b565e%40eisentraut.org Discussion: https://www.postgresql.org/message-id/flat/87o72eo9iu.fsf%40gentoo.org	2024-11-26 13:35:42 +01:00
Tom Lane	97be02ad00	Fix NULLIF()'s handling of read-write expanded objects. If passed a read-write expanded object pointer, the EEOP_NULLIF code would hand that same pointer to the equality function and then (unless equality was reported) also return the same pointer as its value. This is no good, because a function that receives a read-write expanded object pointer is fully entitled to scribble on or even delete the object, thus corrupting the NULLIF output. (This problem is likely unobservable with the equality functions provided in core Postgres, but it's easy to demonstrate with one coded in plpgsql.) To fix, make sure the pointer passed to the equality function is read-only. We can still return the original read-write pointer as the NULLIF result, allowing optimization of later operations. Per bug #18722 from Alexander Lakhin. This has been wrong since we invented expanded objects, so back-patch to all supported branches. Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org	2024-11-25 18:09:10 -05:00
Noah Misch	718af10dab	Avoid "you don't own a lock of type ExclusiveLock" in GRANT TABLESPACE. This WARNING appeared because SearchSysCacheLocked1() read cc_relisshared before catcache initialization, when the field is false unconditionally. On the basis of reading false there, it constructed a locktag as though pg_tablespace weren't relisshared. Only shared catalogs could be affected, and only GRANT TABLESPACE was affected in practice. SearchSysCacheLocked1() callers use one other shared-relation syscache, DATABASEOID. DATABASEOID is initialized by the end of CheckMyDatabase(), making the problem unreachable for pg_database. Back-patch to v13 (all supported versions). This has no known impact before v16, where ExecGrant_common() first appeared. Earlier branches avoid trouble by having a separate ExecGrant_Tablespace() that doesn't use LOCKTAG_TUPLE. However, leaving this unfixed in v15 could ensnare a future back-patch of a SearchSysCacheLocked1() call. Reported by Aya Iwata. Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com	2024-11-25 14:42:38 -08:00
Amit Kapila	5f46439d59	Doc: Clarify the `inactive_since` field description. Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com	2024-11-25 10:58:06 +05:30
Heikki Linnakangas	9695835538	Fix data loss when restarting the bulk_write facility If a user started a bulk write operation on a fork with existing data to append data in bulk, the bulk_write machinery would zero out all previously written pages up to the last page written by the new bulk_write operation. This is not an issue for PostgreSQL itself, because we never use the bulk_write facility on a non-empty fork. But there are use cases where it makes sense. TimescaleDB extension is known to do that to merge partitions, for example. Backpatch to v17, where the bulk_write machinery was introduced. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reported-By: Erik Nordström <erik@timescale.com> Reviewed-by: Erik Nordström <erik@timescale.com> Discussion: https://www.postgresql.org/message-id/CACAa4VJ%2BQY4pY7M0ECq29uGkrOygikYtao1UG9yCDFosxaps9g@mail.gmail.com	2024-11-22 16:29:22 +02:00
Álvaro Herrera	e2b08a6295	Fix outdated bit in README.tuplock Apparently this information has been outdated since first committed, because we adopted a different implementation during development per reviews and this detail was not updated in the README. This has been wrong since commit `0ac5ad5134` introduced the file in 2013. Backpatch to all live branches. Reported-by: Will Mortensen <will@extrahop.com> Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com	2024-11-21 16:54:36 +01:00
Michael Paquier	afe9b0d9fe	Fix memory leak in pgoutput for the WAL sender RelationSyncCache, the hash table in charge of tracking the relation schemas sent through pgoutput, was forgetting to free the TupleDesc associated to the two slots used to store the new and old tuples, causing some memory to be leaked each time a relation is invalidated when the slots of an existing relation entry are cleaned up. This is rather hard to notice as the bloat is pretty minimal, but a long-running WAL sender would be in trouble over time depending on the workload. sysbench has proved to be pretty good at showing the problem, coupled with some memory monitoring of the WAL sender. Issue introduced in `52e4f0cd47`, that has added row filters for tables logically replicated. Author: Boyu Yang Reviewed-by: Michael Paquier, Hou Zhijie Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM Backpatch-through: 15	2024-11-21 15:14:11 +09:00
Tom Lane	fea81aee83	Avoid assertion failure if a setop leaf query contains setops. Ordinarily transformSetOperationTree will collect all UNION/ INTERSECT/EXCEPT steps into the setOperations tree of the topmost Query, so that leaf queries do not contain any setOperations. However, it cannot thus flatten a subquery that also contains WITH, ORDER BY, FOR UPDATE, or LIMIT. I (tgl) forgot that in commit `07b4c48b6` and wrote an assertion in rule deparsing that a leaf's setOperations would always be empty. If it were nonempty then we would want to parenthesize the subquery to ensure that the output represents the setop nesting correctly (e.g. UNION below INTERSECT had better get parenthesized). So rather than just removing the faulty Assert, let's change it into an additional case to check to decide whether to add parens. We don't expect that the additional case will ever fire, but it's cheap insurance. Man Zeng and Tom Lane Discussion: https://postgr.es/m/tencent_7ABF9B1F23B0C77606FC5FE3@qq.com	2024-11-20 12:03:47 -05:00
Tom Lane	c1ebef3c10	Compare collations before merging UNION operations. In the dim past we figured it was okay to ignore collations when combining UNION set-operation nodes into a single N-way UNION operation. I believe that was fine at the time, but it stopped being fine when we added nondeterministic collations: the semantics of distinct-ness are affected by those. v17 made it even less fine by allowing per-child sorting operations to be merged via MergeAppend, although I think we accidentally avoided any live bug from that. Add a check that collations match before deciding that two UNION nodes are equivalent. I also failed to resist the temptation to comment plan_union_children() a little better. Back-patch to all supported branches (v13 now), since they all have nondeterministic collations. Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us	2024-11-19 18:26:19 -05:00
Noah Misch	1c05004a89	Fix per-session activation of ALTER {ROLE\|DATABASE} SET role. After commit `5a2fed911a`, the catalog state resulting from these commands ceased to affect sessions. Restore the longstanding behavior, which is like beginning the session with a SET ROLE command. If cherry-picking the CVE-2024-10978 fixes, default to including this, too. (This fixes an unintended side effect of fixing CVE-2024-10978.) Back-patch to v12, like that commit. The release team decided to include v12, despite the original intent to halt v12 commits earlier this week. Tom Lane and Noah Misch. Reported by Etienne LAFARGE. Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com	2024-11-15 20:39:59 -08:00
Masahiko Sawada	568e78a653	Fix a possibility of logical replication slot's restart_lsn going backwards. Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards. A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value. In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments. This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields. The WAL removal issue was reported by Hubert Depesz Lubaczewski. Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced. Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me Backpatch-through: 13	2024-11-15 17:06:08 -08:00
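The 'if' versus 'else if' distinction described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the PostgreSQL source: the `Slot` class, `propose()`, `acknowledge()`, and the `chained` flag are hypothetical names, and LSNs are modeled as plain integers with 0 meaning "invalid".

```python
from dataclasses import dataclass

INVALID = 0  # stand-in for an invalid LSN (assumption of this sketch)


@dataclass
class Slot:
    restart_lsn: int
    confirmed_flush: int
    candidate_valid: int = INVALID
    candidate_lsn: int = INVALID


def propose(slot, current_lsn, restart_lsn, chained):
    """Offer a new restart point; `chained` selects the 'else if' form."""
    earlier_branch_matched = True
    if restart_lsn <= slot.restart_lsn:
        pass  # stale proposal: must never move restart_lsn backwards
    elif current_lsn <= slot.confirmed_flush:
        slot.candidate_valid, slot.candidate_lsn = current_lsn, restart_lsn
    else:
        earlier_branch_matched = False
    # Unchained form: a separate 'if', evaluated even for stale proposals.
    # Chained form: reached only when neither earlier branch matched.
    reachable = (not chained) or (not earlier_branch_matched)
    if reachable and slot.candidate_valid == INVALID:
        slot.candidate_valid, slot.candidate_lsn = current_lsn, restart_lsn


def acknowledge(slot, lsn):
    """Subscriber confirms `lsn`; apply a pending candidate if it is covered."""
    slot.confirmed_flush = lsn
    if slot.candidate_valid != INVALID and slot.candidate_valid <= lsn:
        slot.restart_lsn = slot.candidate_lsn
        slot.candidate_valid = slot.candidate_lsn = INVALID


def run(chained):
    slot = Slot(restart_lsn=500, confirmed_flush=600)
    # Decoding restarted from an older point and offers an old restart point.
    propose(slot, current_lsn=300, restart_lsn=200, chained=chained)
    acknowledge(slot, 700)
    return slot.restart_lsn
```

In this model, `run(chained=False)` lets the stale proposal become the candidate and the next acknowledgment moves `restart_lsn` from 500 back to 200, while `run(chained=True)` leaves it at 500.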
Tom Lane	5f28e6ba7f	Avoid assertion due to disconnected NFA sub-graphs in regex parsing. In commit `08c0d6ad6` which introduced "rainbow" arcs in regex NFAs, I didn't think terribly hard about what to do when creating the color complement of a rainbow arc. Clearly, the complement cannot match any characters, and I took the easy way out by just not building any arcs at all in the complement arc set. That mostly works, but Nikolay Shaplov found a case where it doesn't: if we decide to delete that sub-NFA later because it's inside a "{0}" quantifier, delsub() suffered an assertion failure. That's because delsub() relies on the target sub-NFA being fully connected. That was always true before, and the best fix seems to be to restore that property. Hence, invent a new arc type CANTMATCH that can be generated in place of an empty color complement, and drop it again later when we start NFA optimization. (At that point we don't need to do delsub() any more, and besides there are other cases where NFA optimization can lead to disconnected subgraphs.) It appears that this bug has no consequences in a non-assert-enabled build: there will be some transiently leaked NFA states/arcs, but they'll get cleaned up eventually. Still, we don't like assertion failures, so back-patch to v14 where rainbow arcs were introduced. Per bug #18708 from Nikolay Shaplov. Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org	2024-11-15 18:23:38 -05:00
Michael Paquier	1d6a03ea41	Fix race conditions with drop of reused pgstats entries This fixes a set of race conditions with cumulative statistics where a shared stats entry could be dropped while it should still be valid in the event when it is reused: an entry may refer to a different object but requires the same hash key. This can happen with various stats kinds, like: - Replication slots that compute internally an index number, for different slot names. - Stats kinds that use an OID in the object key, where a wraparound causes the same key to be used if an OID is used for the same object. - As of PostgreSQL 18, custom pgstats kinds could also be an issue, depending on their implementation. This issue is fixed by introducing a counter called "generation" in the shared entries via PgStatShared_HashEntry, initialized at 0 when an entry is created and incremented when the same entry is reused, to avoid concurrent issues on drop because of other backends still holding a reference to it. This "generation" is copied to the local copy that a backend holds when looking at an object, then cross-checked with the shared entry to make sure that the entry is not dropped even if its "refcount" justifies that if it has been reused. This problem could show up when a backend shuts down and needs to discard any entries it still holds, causing statistics to be removed when they should not, or even an assertion failure. Another report involved a failure in a standby after an OID wraparound, where the startup process would FATAL on a "can only drop stats once", stopping recovery abruptly. The buildfarm has been sporadically complaining about the problem, as well, but the window is hard to reach with the in-core tests. 
Note that the issue can be reproduced easily by adding a sleep before dshash_find() in pgstat_release_entry_ref() to enlarge the problematic window while repeating test_decoding's isolation test oldest_xmin a couple of times, for example, as pointed out by Alexander Lakhin. Reported-by: Alexander Lakhin, Peter Smith Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org Backpatch-through: 15	2024-11-15 11:32:13 +09:00
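The "generation" cross-check described in this commit message can be sketched as a toy, single-threaded model. Everything here is an assumption made for illustration: `SharedEntry`, `LocalRef`, `acquire()`, `reuse()`, and `release()` are hypothetical names, and the race is simulated by running the reuse step inside the release window rather than with real concurrency.

```python
from dataclasses import dataclass


@dataclass
class SharedEntry:
    refcount: int = 0
    generation: int = 0
    dropped: bool = False
    freed: bool = False


@dataclass
class LocalRef:
    shared: SharedEntry
    generation: int  # copied from the shared entry when the ref is taken


def acquire(entry):
    entry.refcount += 1
    return LocalRef(entry, entry.generation)


def reuse(entry):
    """Re-create a dropped entry for a different object with the same key."""
    entry.dropped = False
    entry.generation += 1
    entry.refcount += 1  # reference held on behalf of the entry itself


def release(ref, racing_reuse, check_generation):
    entry = ref.shared
    entry.refcount -= 1
    was_last = entry.refcount == 0
    if racing_reuse:
        reuse(entry)  # another backend slips in inside the window
    # Free only if we were the last holder and, when checking, only if
    # the entry is still the generation this reference was taken against.
    if was_last and (not check_generation or entry.generation == ref.generation):
        entry.freed = True


def scenario(check_generation, racing_reuse=True):
    entry = SharedEntry(refcount=1)  # created: one reference for existence
    ref = acquire(entry)             # a backend looks the entry up
    entry.dropped = True             # the object is dropped ...
    entry.refcount -= 1              # ... giving up the existence reference
    release(ref, racing_reuse, check_generation)
    return entry.freed
```

In this model, `scenario(check_generation=False)` frees an entry that has just been reused, whereas `scenario(check_generation=True)` leaves the reused entry alone. Without a racing reuse, the last release still frees the dropped entry as before.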
Michael Paquier	73731b2432	Fix comment in injection_point.c InjectionPointEntry->name was described as a hash key, which was fine when introduced in `d86d20f0ba`, but it is not now. Oversight in `86db52a506`, that has changed the way injection points are stored in shared memory from a hash table to an array. Backpatch-through: 17	2024-11-13 13:58:19 +09:00
Alexander Korotkov	a6fa869cfa	Fix arrays comparison in CompareOpclassOptions() The current code calls array_eq() and does not provide FmgrInfo. This commit provides initialization of FmgrInfo and uses C collation as the safe option for text comparison because we don't know anything about the semantics of opclass options. Backpatch to 13, where opclass options were introduced. Reported-by: Nicolas Maus Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org Backpatch-through: 13	2024-11-12 01:51:20 +02:00
Tom Lane	f4f5d27d87	Parallel workers use AuthenticatedUserId for connection privilege checks. Commit `5a2fed911` had an unexpected side-effect: the parallel worker launched for the new test case would fail if it couldn't use a superuser-reserved connection slot. The reason that test failed while all our pre-existing ones worked is that the connection privilege tests in InitPostgres had been based on the superuserness of the leader's AuthenticatedUserId, but after the rearrangements of `5a2fed911` we were testing the superuserness of CurrentUserId, which the new test case deliberately made to be a non-superuser. This all seems very accidental and probably not the behavior we really want, but a security patch is no time to be redesigning things. Pending some discussion about desirable semantics, hack it so that InitPostgres continues to pay attention to the superuserness of AuthenticatedUserId when starting a parallel worker. Nathan Bossart and Tom Lane, per buildfarm member sawshark. Security: CVE-2024-10978	2024-11-11 17:05:53 -05:00
Tom Lane	cd82afdda5	Fix improper interactions between session_authorization and role. The SQL spec mandates that SET SESSION AUTHORIZATION implies SET ROLE NONE. We tried to implement that within the lowest-level functions that manipulate these settings, but that was a bad idea. In particular, guc.c assumes that it doesn't matter in what order it applies GUC variable updates, but that was not the case for these two variables. This problem, compounded by some hackish attempts to work around it, led to some security-grade issues: * Rolling back a transaction that had done SET SESSION AUTHORIZATION would revert to SET ROLE NONE, even if that had not been the previous state, so that the effective user ID might now be different from what it had been. * The same for SET SESSION AUTHORIZATION in a function SET clause. * If a parallel worker inspected current_setting('role'), it saw "none" even when it should see something else. Also, although the parallel worker startup code intended to cope with the current role's pg_authid row having disappeared, its implementation of that was incomplete so it would still fail. Fix by fully separating the miscinit.c functions that assign session_authorization from those that assign role. To implement the spec's requirement, teach set_config_option itself to perform "SET ROLE NONE" when it sets session_authorization. (This is undoubtedly ugly, but the alternatives seem worse. In particular, there's no way to do it within assign_session_authorization without incompatible changes in the API for GUC assign hooks.) Also, improve ParallelWorkerMain to directly set all the relevant user-ID variables instead of relying on some of them to get set indirectly. That allows us to survive not finding the pg_authid row during worker startup. In v16 and earlier, this includes back-patching `9987a7bf3` which fixed a violation of GUC coding rules: SetSessionAuthorization is not an appropriate place to be throwing errors from. 
Security: CVE-2024-10978	2024-11-11 10:29:54 -05:00
Nathan Bossart	edcda9bb4c	Ensure cached plans are correctly marked as dependent on role. If a CTE, subquery, sublink, security invoker view, or coercion projection references a table with row-level security policies, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Reported-by: Wolfgang Walther Reviewed-by: Noah Misch Security: CVE-2024-10976 Backpatch-through: 12	2024-11-11 09:00:00 -06:00
Peter Eisentraut	6bf5bf11c3	Translation updates Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 2592030f456910263c8972668576f954fce10595	2024-11-11 13:52:24 +01:00
Tom Lane	943b65358e	Improve fix for not entering parallel mode when holding interrupts. Commit `ac04aa84a` put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by `ac04aa84a` serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as `ac04aa84a` was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com	2024-11-08 13:42:01 -05:00
Amit Langote	a0cdfc8893	Disallow partitionwise join when collations don't match If the collation of any join key column doesn’t match the collation of the corresponding partition key, partitionwise joins can yield incorrect results. For example, rows that would match under the join key collation might be located in different partitions due to the partitioning collation. In such cases, a partitionwise join would yield different results from a non-partitionwise join, so disallow it in such cases. Reported-by: Tender Wang <tndrwang@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12	2024-11-08 17:19:35 +09:00
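The mismatch described in this commit message can be seen in a small toy model. It is a sketch under stated assumptions, not planner code: the partition key is range-partitioned in byte order, the join compares keys case-insensitively (standing in for a different join collation), and all function names are hypothetical.

```python
def partition_of(key):
    # Range partitioning under a byte-order collation: uppercase ASCII
    # letters sort before "a", so they land in partition 0.
    return 0 if key < "a" else 1


def keys_match(a, b):
    # Join condition under a case-insensitive collation (assumption).
    return a.casefold() == b.casefold()


def plain_join(left, right):
    return [(l, r) for l in left for r in right if keys_match(l, r)]


def partitionwise_join(left, right):
    # Join each pair of matching partitions independently.
    out = []
    for p in (0, 1):
        lp = [k for k in left if partition_of(k) == p]
        rp = [k for k in right if partition_of(k) == p]
        out.extend(plain_join(lp, rp))
    return out


left, right = ["abc"], ["ABC"]
full = plain_join(left, right)
per_partition = partitionwise_join(left, right)
```

Here `full` contains the pair `("abc", "ABC")`, but `per_partition` is empty because the two rows were routed to different partitions, which is the kind of divergence the commit rules out by disallowing the partitionwise plan.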
Amit Langote	b6484ca953	Disallow partitionwise grouping when collations don't match If the collation of any grouping column doesn’t match the collation of the corresponding partition key, partitionwise grouping can yield incorrect results. For example, rows that would be grouped under the grouping collation may end up in different partitions under the partitioning collation. In such cases, full partitionwise grouping would produce results that differ from those without partitionwise grouping, so disallowed that. Partial partitionwise aggregation is still allowed, as the Finalize step reconciles partition-level aggregates with grouping requirements across all partitions, ensuring that the final output remains consistent. This commit also fixes group_by_has_partkey() by ensuring the RelabelType node is stripped from grouping expressions when matching them to partition key expressions to avoid false mismatches. Bug: #18568 Reported-by: Webbo Han <1105066510@qq.com> Author: Webbo Han <1105066510@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12	2024-11-08 16:07:13 +09:00
Richard Guo	78b1c553bb	Fix inconsistent RestrictInfo serial numbers When we generate multiple clones of the same qual condition to cope with outer join identity 3, we need to ensure that all the clones get the same serial number. To achieve this, we reset the root->last_rinfo_serial counter each time we produce RestrictInfo(s) from the qual list (see deconstruct_distribute_oj_quals). This approach works only if we ensure that we are not changing the qual list in any way that'd affect the number of RestrictInfos built from it. However, with `b262ad440`, an IS NULL qual on a NOT NULL column might result in an additional constant-FALSE RestrictInfo. And different versions of the same qual clause can lead to different conclusions about whether it can be reduced to constant-FALSE. This would affect the number of RestrictInfos built from the qual list for different versions, causing inconsistent RestrictInfo serial numbers across multiple clones of the same qual. This inconsistency can confuse users of these serial numbers, such as rebuild_joinclause_attr_needed, and lead to planner errors such as "ERROR: variable not found in subplan target lists". To fix, reset the root->last_rinfo_serial counter after generating the additional constant-FALSE RestrictInfo. Back-patch to v17 where the issue crept in. In v17, I failed to make a test case that would expose this bug, so no test case for v17. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-B6kafn+LmPuh-TYFwFyEm-vVj3Qqv7Yo-69CEv14rRg@mail.gmail.com	2024-11-08 11:24:26 +09:00


25779 Commits