postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-22 12:22:45 +03:00

Author	SHA1	Message	Date
Alexander Korotkov	d4d11940df	Remove the wrong assertion from match_orclause_to_indexcol() Obviously, the constant could be zero. Also, add the relevant check to regression tests. Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-siKJdtWhcbqk4Y-xG12do2Ckm1qw672GNsSnDqL9FQg%40mail.gmail.com	2024-11-25 09:07:30 +02:00
Amit Kapila	d05a387d9d	Doc: Clarify the `inactive_since` field description. Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com	2024-11-25 11:12:32 +05:30
Alexander Korotkov	ae4569161a	Teach bitmap path generation about transforming OR-clauses to SAOP's When optimizer generates bitmap paths, it considers breaking OR-clause arguments one-by-one. But now, a group of similar OR-clauses can be transformed into SAOP during index matching. So, bitmap paths should keep up. This commit teaches bitmap paths generation machinery to group similar OR-clauses into dedicated RestrictInfos. Those RestrictInfos are considered both to match index as a whole (as SAOP), or to match as a set of individual OR-clause argument one-by-one (the old way). Therefore, bitmap path generation will takes advantage of OR-clauses to SAOP's transformation. The old way of handling them is also considered. So, there shouldn't be planning regression. Discussion: https://postgr.es/m/CAPpHfdu5iQOjF93vGbjidsQkhHvY2NSm29duENYH_cbhC6x%2BMg%40mail.gmail.com Author: Alexander Korotkov, Andrey Lepikhov Reviewed-by: Alena Rybakina, Andrei Lepikhov, Jian he, Robert Haas Reviewed-by: Peter Geoghegan	2024-11-24 01:41:45 +02:00
Alexander Korotkov	d4378c0005	Transform OR-clauses to SAOP's during index matching This commit makes match_clause_to_indexcol() match "(indexkey op C1) OR (indexkey op C2) ... (indexkey op CN)" expression to the index while transforming it into "indexkey op ANY(ARRAY[C1, C2, ...])" (ScalarArrayOpExpr node). This transformation allows handling long OR-clauses with single IndexScan avoiding diving them into a slower BitmapOr. We currently restrict Ci to be either Const or Param to apply this transformation only when it's clearly beneficial. However, in the future, we might switch to a liberal understanding of constants, as it is in other cases. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina, Andrey Lepikhov, Alexander Korotkov Reviewed-by: Peter Geoghegan, Ranier Vilela, Alexander Korotkov, Robert Haas Reviewed-by: Jian He, Tom Lane, Nikolay Shaplov	2024-11-24 01:40:20 +02:00
Jeff Davis	869ee4f10e	Disallow modifying statistics on system columns. Reported-by: Heikki Linnakangas Discussion: https://postgr.es/m/df3e1c41-4e6c-40ad-9636-98deefe488cd@iki.fi	2024-11-22 12:40:24 -08:00
Nathan Bossart	efdc7d7475	Add INT64_HEX_FORMAT and UINT64_HEX_FORMAT to c.h. Like INT64_FORMAT and UINT64_FORMAT, these macros produce format strings for 64-bit integers. However, INT64_HEX_FORMAT and UINT64_HEX_FORMAT generate the output in hexadecimal instead of decimal. Besides introducing these macros, this commit makes use of them in several places. This was originally intended to be part of commit `5d6187d2a2`, but I left it out because I felt there was a nonzero chance that back-patching these new macros into c.h could cause problems with third-party code. We tend to be less cautious with such changes in new major versions. Note that UINT64_HEX_FORMAT was originally added in commit `ee1b30f128`, but it was placed in test_radixtree.c, so it wasn't widely available. This commit moves UINT64_HEX_FORMAT to c.h. Discussion: https://postgr.es/m/ZwQvtUbPKaaRQezd%40nathan	2024-11-22 12:41:57 -06:00
Heikki Linnakangas	ee937f0409	Fix data loss when restarting the bulk_write facility If a user started a bulk write operation on a fork with existing data to append data in bulk, the bulk_write machinery would zero out all previously written pages up to the last page written by the new bulk_write operation. This is not an issue for PostgreSQL itself, because we never use the bulk_write facility on a non-empty fork. But there are use cases where it makes sense. TimescaleDB extension is known to do that to merge partitions, for example. Backpatch to v17, where the bulk_write machinery was introduced. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reported-By: Erik Nordström <erik@timescale.com> Reviewed-by: Erik Nordström <erik@timescale.com> Discussion: https://www.postgresql.org/message-id/CACAa4VJ%2BQY4pY7M0ECq29uGkrOygikYtao1UG9yCDFosxaps9g@mail.gmail.com	2024-11-22 16:28:24 +02:00
Michael Paquier	c06e71d1ac	Add write_to_file to PgStat_KindInfo for pgstats kinds This new field controls if entries of a stats kind should be written or not to the on-disk pgstats file when shutting down an instance. This affects both fixed and variable-numbered kinds. This is useful for custom statistics by itself, and a patch is under discussion to add a new builtin stats kind where the write of the stats is not necessary. All the built-in stats kinds, as well as the two custom stats kinds in the test module injection_points, set this flag to "true" for now, so as stats entries are written to the on-disk pgstats file. Author: Bertrand Drouvot Reviewed-by: Nazir Bilal Yavuz Discussion: https://postgr.es/m/Zz7T47nHwYgeYwOe@ip-10-97-1-34.eu-west-3.compute.internal	2024-11-22 10:12:26 +09:00
Peter Eisentraut	53dcba9be5	pgindent run for commit `79b575d3bc`	2024-11-21 21:40:17 +01:00
Álvaro Herrera	7300ff1bd8	Fix outdated bit in README.tuplock Apparently this information has been outdated since first committed, because we adopted a different implementation during development per reviews and this detail was not updated in the README. This has been wrong since commit `0ac5ad5134` introduced the file in 2013. Backpatch to all live branches. Reported-by: Will Mortensen <will@extrahop.com> Discussion: https://postgr.es/m/CAMpnoC6yEQ=c0Rdq-J7uRedrP7Zo9UMp6VZyP23QMT68n06cvA@mail.gmail.com	2024-11-21 16:54:36 +01:00
Peter Eisentraut	79b575d3bc	Fix ALTER TABLE / REPLICA IDENTITY for temporal tables REPLICA IDENTITY USING INDEX did not accept a GiST index. This should be allowed when used as a temporal primary key. Author: Paul Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/04579cbf-b134-45e1-8f2d-8c54c849c1ee@illuminatedcomputing.com	2024-11-21 13:50:18 +01:00
Álvaro Herrera	da94e871e8	Unify repetitive error messages	2024-11-21 10:54:30 +01:00
Michael Paquier	ea792bfd93	Fix memory leak in pgoutput for the WAL sender RelationSyncCache, the hash table in charge of tracking the relation schemas sent through pgoutput, was forgetting to free the TupleDesc associated to the two slots used to store the new and old tuples, causing some memory to be leaked each time a relation is invalidated when the slots of an existing relation entry are cleaned up. This is rather hard to notice as the bloat is pretty minimal, but a long-running WAL sender would be in trouble over time depending on the workload. sysbench has proved to be pretty good at showing the problem, coupled with some memory monitoring of the WAL sender. Issue introduced in `52e4f0cd47`, that has added row filters for tables logically replicated. Author: Boyu Yang Reviewed-by: Michael Paquier, Hou Zhijie Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM Backpatch-through: 15	2024-11-21 15:14:02 +09:00
Peter Geoghegan	7074337698	Refine nbtree = redundancy preprocessing comment. Spell out how a = key associated with a SAOP array renders a > key against the same index column redundant at the relevant point inside _bt_preprocess_keys. Follow-up to commit `5bf748b8`.	2024-11-20 13:37:08 -05:00
Tom Lane	94131cd53c	Avoid assertion failure if a setop leaf query contains setops. Ordinarily transformSetOperationTree will collect all UNION/ INTERSECT/EXCEPT steps into the setOperations tree of the topmost Query, so that leaf queries do not contain any setOperations. However, it cannot thus flatten a subquery that also contains WITH, ORDER BY, FOR UPDATE, or LIMIT. I (tgl) forgot that in commit `07b4c48b6` and wrote an assertion in rule deparsing that a leaf's setOperations would always be empty. If it were nonempty then we would want to parenthesize the subquery to ensure that the output represents the setop nesting correctly (e.g. UNION below INTERSECT had better get parenthesized). So rather than just removing the faulty Assert, let's change it into an additional case to check to decide whether to add parens. We don't expect that the additional case will ever fire, but it's cheap insurance. Man Zeng and Tom Lane Discussion: https://postgr.es/m/tencent_7ABF9B1F23B0C77606FC5FE3@qq.com	2024-11-20 12:03:47 -05:00
Fujii Masao	6c8f670323	file_fdw: Add REJECT_LIMIT option to file_fdw. Commit `4ac2a9bece` introduced the REJECT_LIMIT option for the COPY command. This commit extends the support for this option to file_fdw. As well as REJECT_LIMIT option for COPY, this option limits the maximum number of erroneous rows that can be skipped. If the number of data type conversion errors exceeds this limit, accessing the file_fdw foreign table will fail with an error, even when on_error = 'ignore' is specified. Since the CREATE/ALTER FOREIGN TABLE commands require foreign table options to be single-quoted, this commit updates defGetCopyRejectLimitOption() to handle also string value for them, in addition to int64 value for COPY command option. Author: Atsushi Torikoshi Reviewed-by: Fujii Masao, Yugo Nagata, Kirill Reshke Discussion: https://postgr.es/m/bab68a9fc502b12693f0755b6f35f327@oss.nttdata.com	2024-11-20 23:53:19 +09:00
Tom Lane	a43d7a8c7c	Compare collations before merging UNION operations. In the dim past we figured it was okay to ignore collations when combining UNION set-operation nodes into a single N-way UNION operation. I believe that was fine at the time, but it stopped being fine when we added nondeterministic collations: the semantics of distinct-ness are affected by those. v17 made it even less fine by allowing per-child sorting operations to be merged via MergeAppend, although I think we accidentally avoided any live bug from that. Add a check that collations match before deciding that two UNION nodes are equivalent. I also failed to resist the temptation to comment plan_union_children() a little better. Back-patch to all supported branches (v13 now), since they all have nondeterministic collations. Discussion: https://postgr.es/m/3605568.1731970579@sss.pgh.pa.us	2024-11-19 18:26:19 -05:00
Fujii Masao	c166454496	Improve error message for database object stats manipulation functions. Previously, database object statistics manipulation functions like pg_set_relation_stats() reported unclear error and hint messages when executed during recovery. These messages were "internal", making it difficult for users to understand the issue: ERROR: cannot acquire lock mode ShareUpdateExclusiveLock on database objects while recovery is in progress HINT: Only RowExclusiveLock or less can be acquired on database objects during recovery. This commit updates the error handling so that, if these functions are called during recovery, they produce clearer messages: ERROR: recovery is in progress HINT: Statistics cannot be modified during recovery. The related documentation has also been updated to explicitly clarify that these functions are not available during recovery. Author: Fujii Masao Reviewed-by: Heikki Linnakangas, Maxim Orlov Discussion: https://postgr.es/m/6d313829-5f56-4a28-ae4b-bd01bf1ae791@oss.nttdata.com	2024-11-20 02:00:50 +09:00
Peter Geoghegan	18ea6b3d0d	nbtree: consistently use minoff variable. This was arguably an oversight in commit `29b64d1de7`, which moved this code from nbtutils.c to its nbtsearch.c caller.	2024-11-18 13:35:28 -05:00
Michael Paquier	c1c09007e2	Improve some code format in gist.c Author: Tender Wang Discussion: https://postgr.es/m/CAHewXNmD=K7XmsHq=L1SyyzZYvwU4oaMG9EKSSMe4OrXfykLzg@mail.gmail.com	2024-11-18 13:41:10 +09:00
Michael Paquier	03a42c9652	Use pg_memory_is_all_zeros() in PageIsVerifiedExtended() Relying on pg_memory_is_all_zeros(), which would apply SIMD instructions when dealing with an aligned page, is proving to be at least three times faster than the original size_t-based comparisons when checking if a BLCKSZ page is full of zeros. Note that PageIsVerifiedExtended() is called each time a page is read from disk, and making it faster is a good thing. Author: Bertrand Drouvot Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com	2024-11-18 11:52:35 +09:00
Noah Misch	7b88529f43	Fix per-session activation of ALTER {ROLE\|DATABASE} SET role. After commit `5a2fed911a`, the catalog state resulting from these commands ceased to affect sessions. Restore the longstanding behavior, which is like beginning the session with a SET ROLE command. If cherry-picking the CVE-2024-10978 fixes, default to including this, too. (This fixes an unintended side effect of fixing CVE-2024-10978.) Back-patch to v12, like that commit. The release team decided to include v12, despite the original intent to halt v12 commits earlier this week. Tom Lane and Noah Misch. Reported by Etienne LAFARGE. Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com	2024-11-15 20:39:56 -08:00
Masahiko Sawada	e5ed873b1b	Fix a possibility of logical replication slot's restart_lsn going backwards. Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards. A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value. In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments. This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields. The WAL removal issue was reported by Hubert Depesz Lubaczewski. Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced. Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me Backpatch-through: 13	2024-11-15 17:06:11 -08:00
Tom Lane	b69bdcee9c	Avoid assertion due to disconnected NFA sub-graphs in regex parsing. In commit `08c0d6ad6` which introduced "rainbow" arcs in regex NFAs, I didn't think terribly hard about what to do when creating the color complement of a rainbow arc. Clearly, the complement cannot match any characters, and I took the easy way out by just not building any arcs at all in the complement arc set. That mostly works, but Nikolay Shaplov found a case where it doesn't: if we decide to delete that sub-NFA later because it's inside a "{0}" quantifier, delsub() suffered an assertion failure. That's because delsub() relies on the target sub-NFA being fully connected. That was always true before, and the best fix seems to be to restore that property. Hence, invent a new arc type CANTMATCH that can be generated in place of an empty color complement, and drop it again later when we start NFA optimization. (At that point we don't need to do delsub() any more, and besides there are other cases where NFA optimization can lead to disconnected subgraphs.) It appears that this bug has no consequences in a non-assert-enabled build: there will be some transiently leaked NFA states/arcs, but they'll get cleaned up eventually. Still, we don't like assertion failures, so back-patch to v14 where rainbow arcs were introduced. Per bug #18708 from Nikolay Shaplov. Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org	2024-11-15 18:23:38 -05:00
Fujii Masao	9a70f67667	Remove unnecessary backslash from CopyFrom() code. Commit `4ac2a9bece` accidentally added an unnecessary backslash to CopyFrom() code. This commit removes it. Author: Yugo Nagata Reviewed-by: Tender Wang Discussion: https://postgr.es/m/20241112114609.4175a2e175282edd1463dbc6@sraoss.co.jp	2024-11-16 01:59:33 +09:00
Peter Eisentraut	9321d2fdf8	Fix collation handling for foreign keys Allowing foreign keys where the referenced and the referencing columns have collations with different notions of equality is problematic. This can only happen when using nondeterministic collations, for example, if the referencing column is case-insensitive and the referenced column is not, or vice versa. It does not happen if both collations are deterministic. To show one example: CREATE COLLATION case_insensitive (provider = icu, deterministic = false, locale = 'und-u-ks-level2'); CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY); CREATE TABLE fktable (x text COLLATE case_insensitive REFERENCES pktable ON UPDATE CASCADE ON DELETE CASCADE); INSERT INTO pktable VALUES ('A'), ('a'); INSERT INTO fktable VALUES ('A'); BEGIN; DELETE FROM pktable WHERE x = 'a'; TABLE fktable; ROLLBACK; BEGIN; DELETE FROM pktable WHERE x = 'A'; TABLE fktable; ROLLBACK; Both of these DELETE statements delete the one row from fktable. So this means that one row from fktable references two rows in pktable, which should not happen. (That's why a primary key or unique constraint is required on pktable.) When nondeterministic collations were implemented, the SQL standard available to yours truly said that referential integrity checks should be performed with the collation of the referenced column, and so that's how we implemented it. But this turned out to be a mistake in the SQL standard, for the same reasons as above, that was later (SQL:2016) fixed to require both collations to be the same. So that's what we are aiming for here. We don't have to be quite so strict. We can allow different collations if they are both deterministic. This is also good for backward compatibility. So the new rule is that the collations either have to be the same or both deterministic. Or in other words, if one of them is nondeterministic, then both have to be the same. Users upgrading from before that have affected setups will need to make changes to their schemas (i.e., change one or both collations in affected foreign-key relationships) before the upgrade will succeed. Some of the nice test cases for the previous situation in collate.icu.utf8.sql are now obsolete. They are changed to just check the error checking of the new rule. Note that collate.sql already contained a test for foreign keys with different deterministic collations. A bunch of code in ri_triggers.c that added a COLLATE clause to enforce the referenced column's collation can be removed, because both columns now have to have the same notion of equality, so it doesn't matter which one to use. Reported-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/78d824e0-b21e-480d-a252-e4b84bc2c24b@illuminatedcomputing.com	2024-11-15 14:55:54 +01:00
Peter Eisentraut	d31bbfb659	Proper object locking for GRANT/REVOKE Refactor objectNamesToOids() to use get_object_address() internally if possible. Not only does this save a lot of code, it also allows us to use the object locking provided by get_object_address() for GRANT/REVOKE. There was previously a code comment that complained about the lack of locking in objectNamesToOids(), which is now fixed. The check in ExecGrant_Type_check() is obsolete because get_object_address_type() already does the same check. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org	2024-11-15 11:03:48 +01:00
Heikki Linnakangas	cfd7f36c83	jit: Stop emitting some unnecessary instructions In EEOP_BOOL_AND_STEP* and EEOP_BOOL_OR_STEP*, we emitted pointlesss store instructions to store to resnull/resvalue values that were just loaded from the same fields in the previous instructions. They will surely get optimized away by LLVM if any optimizations are enabled, but it's better to not emit them in the first place. In EEOP_BOOL_NOT_STEP, similar story with resnull. In EEOP_NULLIF, when it returns NULL, there was also a redundant store to resvalue just after storing a 0 to it. The value of resvalue doesn't matter when resnull is set, so in fact even storing the 0 is unnecessary, but I kept that because we tend to do that for general tidiness. Author: Xing Guo <higuoxing@gmail.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/CACpMh%2BC%3Dg13WdvzLRSponsVWGgxwDSMzQWM4Gz0heOyaA0-N6g@mail.gmail.com	2024-11-15 10:06:36 +02:00
Peter Eisentraut	e468ec0fdd	Add an assertion in get_object_address() Some places declared a Relation before calling get_object_address() only to assert that the relation is NULL after the call. The new assertion allows passing NULL as the relation argument at those places making the code cleaner and easier to understand. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/ZzG34eNrT83W/Orz@ip-10-97-1-34.eu-west-3.compute.internal	2024-11-15 08:52:43 +01:00
Michael Paquier	818119afcc	Fix race conditions with drop of reused pgstats entries This fixes a set of race conditions with cumulative statistics where a shared stats entry could be dropped while it should still be valid in the event when it is reused: an entry may refer to a different object but requires the same hash key. This can happen with various stats kinds, like: - Replication slots that compute internally an index number, for different slot names. - Stats kinds that use an OID in the object key, where a wraparound causes the same key to be used if an OID is used for the same object. - As of PostgreSQL 18, custom pgstats kinds could also be an issue, depending on their implementation. This issue is fixed by introducing a counter called "generation" in the shared entries via PgStatShared_HashEntry, initialized at 0 when an entry is created and incremented when the same entry is reused, to avoid concurrent issues on drop because of other backends still holding a reference to it. This "generation" is copied to the local copy that a backend holds when looking at an object, then cross-checked with the shared entry to make sure that the entry is not dropped even if its "refcount" justifies that if it has been reused. This problem could show up when a backend shuts down and needs to discard any entries it still holds, causing statistics to be removed when they should not, or even an assertion failure. Another report involved a failure in a standby after an OID wraparound, where the startup process would FATAL on a "can only drop stats once", stopping recovery abruptly. The buildfarm has been sporadically complaining about the problem, as well, but the window is hard to reach with the in-core tests. Note that the issue can be reproduced easily by adding a sleep before dshash_find() in pgstat_release_entry_ref() to enlarge the problematic window while repeating test_decoding's isolation test oldest_xmin a couple of times, for example, as pointed out by Alexander Lakhin. Reported-by: Alexander Lakhin, Peter Smith Author: Kyotaro Horiguchi, Michael Paquier Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org Backpatch-through: 15	2024-11-15 11:31:58 +09:00
Heikki Linnakangas	5b00786857	Pass MyPMChildSlot as an explicit argument to child process All the other global variables passed from postmaster to child have the same value in all the processes, while MyPMChildSlot is more like a parameter to each child process. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:12:32 +02:00
Heikki Linnakangas	a78af04270	Assign a child slot to every postmaster child process Previously, only backends, autovacuum workers, and background workers had an entry in the PMChildFlags array. With this commit, all postmaster child processes, including all the aux processes, have an entry. Dead-end backends still don't get an entry, though, and other processes that don't touch shared memory will never mark their PMChildFlags entry as active. We now maintain separate freelists for different kinds of child processes. That ensures that there are always slots available for autovacuum and background workers. Previously, pre-authentication backends could prevent autovacuum or background workers from starting up, by using up all the slots. The code to manage the slots in the postmaster process is in a new pmchild.c source file. Because postmaster.c is just so large. Assigning pmsignal slot numbers is now pmchild.c's responsibility. This replaces the PMChildInUse array in pmsignal.c. Some of the comments in postmaster.c still talked about the "stats process", but that was removed in commit `5891c7a8ed`. Fix those while we're at it. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:12:28 +02:00
Heikki Linnakangas	bb861414fe	Kill dead-end children when there's nothing else left Previously, the postmaster would never try to kill dead-end child processes, even if there were no other processes left. A dead-end backend will eventually exit, when authentication_timeout expires, but if a dead-end backend is the only thing that's preventing the server from shutting down, it seems better to kill it immediately. It's particularly important, if there was a bug in the early startup code that prevented a dead-end child from timing out and exiting normally. Includes a test for that case where a dead-end backend previously prevented the server from shutting down. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:12:04 +02:00
Heikki Linnakangas	18d67a8d7d	Replace postmaster.c's own backend type codes with BackendType Introduce a separate BackendType for dead-end children, so that we don't need a separate dead_end flag. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-11-14 16:06:16 +02:00
Peter Eisentraut	a274bbb1b3	Remove a useless cast to (void *) in hash_search() call This pattern was previously cleaned up in `54a177a948`, but a new instance snuck in around the same time in `31966b151e`.	2024-11-14 09:30:14 +01:00
Peter Geoghegan	4e6e375b00	Add nbtree amgettuple return item function. This makes it easier to add precondition assertions. We now assert that the last call to _bt_readpage succeeded, and that the current item index is within the bounds of the currPos items array. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Discussion: https://postgr.es/m/CAH2-WznFkEs9K1PtNruti5JjawY-dwj+gkaEh_k1ZE+1xLLGkA@mail.gmail.com	2024-11-13 09:50:57 -05:00
Michael Paquier	d74b590983	Fix comment in injection_point.c InjectionPointEntry->name was described as a hash key, which was fine when introduced in `d86d20f0ba`, but it is not now. Oversight in `86db52a506`, that has changed the way injection points are stored in shared memory from a hash table to an array. Backpatch-through: 17	2024-11-13 13:58:09 +09:00
Peter Geoghegan	3be30d0075	Fix obsolete nbtree page reuse FSM comment. Oversight in commit `d088ba5a`.	2024-11-12 22:09:00 -05:00
Álvaro Herrera	ff239c3bf4	Silence compilers about extractNotNullColumn() Multiple buildfarm animals warn that a newly added Assert() is impossible to fail; remove it to avoid the noise. While at it, use direct assignment to obtain the value we need, avoiding an unnecessary memcpy(). (I decided to remove the "pfree" call for the detoasted short-datum; because this is only used for DDL, it's not problematic to leak such a small allocation.) Noted by Tom Lane about `14e87ffa5c`. Discussion: https://postgr.es/m/3649828.1731083171@sss.pgh.pa.us	2024-11-12 11:35:43 +01:00
Alexander Korotkov	db22b90024	Fix arrays comparison in CompareOpclassOptions() The current code calls array_eq() and does not provide FmgrInfo. This commit provides initialization of FmgrInfo and uses C collation as the safe option for text comparison because we don't know anything about the semantics of opclass options. Backpatch to 13, where opclass options were introduced. Reported-by: Nicolas Maus Discussion: https://postgr.es/m/18692-72ea398df3ec6712%40postgresql.org Backpatch-through: 13	2024-11-12 01:44:20 +02:00
Tom Lane	73c9f91a1b	Parallel workers use AuthenticatedUserId for connection privilege checks. Commit `5a2fed911` had an unexpected side-effect: the parallel worker launched for the new test case would fail if it couldn't use a superuser-reserved connection slot. The reason that test failed while all our pre-existing ones worked is that the connection privilege tests in InitPostgres had been based on the superuserness of the leader's AuthenticatedUserId, but after the rearrangements of `5a2fed911` we were testing the superuserness of CurrentUserId, which the new test case deliberately made to be a non-superuser. This all seems very accidental and probably not the behavior we really want, but a security patch is no time to be redesigning things. Pending some discussion about desirable semantics, hack it so that InitPostgres continues to pay attention to the superuserness of AuthenticatedUserId when starting a parallel worker. Nathan Bossart and Tom Lane, per buildfarm member sawshark. Security: CVE-2024-10978	2024-11-11 17:05:53 -05:00
Tom Lane	5a2fed911a	Fix improper interactions between session_authorization and role. The SQL spec mandates that SET SESSION AUTHORIZATION implies SET ROLE NONE. We tried to implement that within the lowest-level functions that manipulate these settings, but that was a bad idea. In particular, guc.c assumes that it doesn't matter in what order it applies GUC variable updates, but that was not the case for these two variables. This problem, compounded by some hackish attempts to work around it, led to some security-grade issues: * Rolling back a transaction that had done SET SESSION AUTHORIZATION would revert to SET ROLE NONE, even if that had not been the previous state, so that the effective user ID might now be different from what it had been. * The same for SET SESSION AUTHORIZATION in a function SET clause. * If a parallel worker inspected current_setting('role'), it saw "none" even when it should see something else. Also, although the parallel worker startup code intended to cope with the current role's pg_authid row having disappeared, its implementation of that was incomplete so it would still fail. Fix by fully separating the miscinit.c functions that assign session_authorization from those that assign role. To implement the spec's requirement, teach set_config_option itself to perform "SET ROLE NONE" when it sets session_authorization. (This is undoubtedly ugly, but the alternatives seem worse. In particular, there's no way to do it within assign_session_authorization without incompatible changes in the API for GUC assign hooks.) Also, improve ParallelWorkerMain to directly set all the relevant user-ID variables instead of relying on some of them to get set indirectly. That allows us to survive not finding the pg_authid row during worker startup. In v16 and earlier, this includes back-patching `9987a7bf3` which fixed a violation of GUC coding rules: SetSessionAuthorization is not an appropriate place to be throwing errors from. Security: CVE-2024-10978	2024-11-11 10:29:54 -05:00
Nathan Bossart	cd7ab57532	Ensure cached plans are correctly marked as dependent on role. If a CTE, subquery, sublink, security invoker view, or coercion projection references a table with row-level security policies, we neglected to mark the plan as potentially dependent on which role is executing it. This could lead to later executions in the same session returning or hiding rows that should have been hidden or returned instead. Reported-by: Wolfgang Walther Reviewed-by: Noah Misch Security: CVE-2024-10976 Backpatch-through: 12	2024-11-11 09:00:00 -06:00
Michael Paquier	e7a9496de9	Add two attributes to pg_stat_database for parallel workers activity Two attributes are added to pg_stat_database: * parallel_workers_to_launch, counting the total number of parallel workers that were planned to be launched. * parallel_workers_launched, counting the total number of parallel workers actually launched. The ratio of both fields can provide hints that there are not enough slots available when launching parallel workers, also useful when pg_stat_statements is not deployed on an instance (i.e. `cf54a2c002`). This commit relies on `de3a2ea3b2`, that has added two fields to EState, that get incremented when executing Gather or GatherMerge nodes. A test is added in select_parallel, where parallel workers are spawned. Bump catalog version. Author: Benoit Lobréau Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com	2024-11-11 10:40:48 +09:00
Peter Geoghegan	caca6d8d27	Assert consistency of currPage that ended scan. When _bt_readnextpage is called with our nbtree parallel scan already seized (i.e. when it is directly called by _bt_first), we never expect a prior call to _bt_readpage for lastcurrblkno to already indicate that the scan should end -- the _bt_first caller's blkno must always be read. After all, the "prior" _bt_readpage call (the call for lastcurrblkno) probably took place in some other backend (and it might not even have finished by the time our backend reaches _bt_first/_bt_readnextpage). Add a documenting assertion to the path where _bt_readnextpage ends the parallel scan based on information about lastcurrblkno from so->currPos. Assert that the most recent _bt_readpage call that set so->currPos is in fact lastcurrblkno's _bt_readpage call. Follow-up to bugfix commit `b5ee4e52`.	2024-11-08 16:34:41 -05:00
Tom Lane	b8df690492	Improve fix for not entering parallel mode when holding interrupts. Commit `ac04aa84a` put the shutoff for this into the planner, which is not ideal because it doesn't prevent us from re-using a previously made parallel plan. Revert the planner change and instead put the shutoff into InitializeParallelDSM, modeling it on the existing code there for recovering from failure to allocate a DSM segment. However, that code path is mostly untested, and testing a bit harder showed there's at least one bug: ExecHashJoinReInitializeDSM is not prepared for us to have skipped doing parallel DSM setup. I also thought the Assert in ReinitializeParallelWorkers is pretty ill-advised, and replaced it with a silent Min() operation. The existing test case added by `ac04aa84a` serves fine to test this version of the fix, so no change needed there. Patch by me, but thanks to Noah Misch for the core idea that we could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED. Back-patch to v12, as `ac04aa84a` was. Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com	2024-11-08 13:42:10 -05:00
Peter Geoghegan	b5ee4e5202	Avoid nbtree parallel scan currPos confusion. Commit `1bd4bc85`, which refactored nbtree sibling link traversal, made _bt_parallel_seize reset the scan's currPos so that things were consistent with the state of a serial backend moving between pages. This overlooked the fact that _bt_readnextpage relied on the existing currPos state to decide when to end the scan -- even though it came from before the scan was seized. As a result of all this, parallel nbtree scans could needlessly behave like full index scans. To fix, teach _bt_readnextpage to explicitly allow the use of an already read page's so->currPos when deciding whether to end the scan -- even during parallel index scans (allow it consistently now). This requires moving _bt_readnextpage's seizure of the scan to earlier in its loop. That way _bt_readnextpage either deals with the true so->currPos state, or an initialized-by-_bt_parallel_seize currPos state set from when the scan was seized. Now _bt_steppage (the most important _bt_readnextpage caller) takes the same uniform approach to setting up its call using details taken from so->currPos -- regardless of whether the scan happens to be parallel or serial. The new loop structure in _bt_readnextpage is prone to getting confused by P_NONE blknos set when the rightmost or leftmost page was reached. We could avoid that by adding an explicit check, but that would be ugly. Avoid this problem by teaching _bt_parallel_seize to end the parallel scan instead of returning a P_NONE next block/blkno. Doing things this way was arguably a missed opportunity for commit `1bd4bc85`. It allows us to remove a similar "blkno == P_NONE" check from _bt_first. Oversight in commit `1bd4bc85`, which refactored sibling link traversal (as part of optimizing nbtree backward scan locking). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Diagnosed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Reviewed-By: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Discussion: https://postgr.es/m/f8efb9c0f8d1a71b44fd7f8e42e49c25@oss.nttdata.com	2024-11-08 13:10:10 -05:00
Álvaro Herrera	14e87ffa5c	Add pg_constraint rows for not-null constraints We now create contype='n' pg_constraint rows for not-null constraints on user tables. Only one such constraint is allowed for a column. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter or remove it, instead matching by the name of the column that they apply to. This means we don't require the constraint names to be identical across a hierarchy. The inheritance status of these constraints can be controlled: now we can be sure that if a parent table has one, then all children will have it as well. They can optionally be marked NO INHERIT, and then children are free not to have one. (There's currently no support for altering a NO INHERIT constraint into inheriting down the hierarchy, but that's a desirable future feature.) This also opens the door for having these constraints be marked NOT VALID, as well as allowing UNIQUE+NOT NULL to be used for functional dependency determination, as envisioned by commit `e49ae8d3bc`. It's likely possible to allow DEFERRABLE constraints as followup work, as well. psql shows these constraints in \d+, though we may want to reconsider if this turns out to be too noisy. Earlier versions of this patch hid constraints that were on the same columns of the primary key, but I'm not sure that that's very useful. If clutter is a problem, we might be better off inventing a new \d++ command and not showing the constraints in \d+. For now, we omit these constraints on system catalog columns, because they're unlikely to achieve anything. The main difference to the previous attempt at this (`b0e96f3119`) is that we now require that such a constraint always exists when a primary key is in the column; we didn't require this previously which had a number of unpalatable consequences. With this requirement, the code is easier to reason about. For example: - We no longer have "throwaway constraints" during pg_dump. We needed those for the case where a table had a PK without a not-null underneath, to prevent a slow scan of the data during restore of the PK creation, which was particularly problematic for pg_upgrade. - We no longer have to cope with attnotnull being set spuriously in case a primary key is dropped indirectly (e.g., via DROP COLUMN). Some bits of code in this patch were authored by Jian He. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql	2024-11-08 13:28:48 +01:00
Amit Langote	075acdd933	Disallow partitionwise join when collations don't match If the collation of any join key column doesn’t match the collation of the corresponding partition key, partitionwise joins can yield incorrect results. For example, rows that would match under the join key collation might be located in different partitions due to the partitioning collation. In such cases, a partitionwise join would yield different results from a non-partitionwise join, so disallow it in such cases. Reported-by: Tender Wang <tndrwang@gmail.com> Author: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12	2024-11-08 17:25:24 +09:00
Amit Langote	90fe6251c8	Disallow partitionwise grouping when collations don't match If the collation of any grouping column doesn’t match the collation of the corresponding partition key, partitionwise grouping can yield incorrect results. For example, rows that would be grouped under the grouping collation may end up in different partitions under the partitioning collation. In such cases, full partitionwise grouping would produce results that differ from those without partitionwise grouping, so disallowed that. Partial partitionwise aggregation is still allowed, as the Finalize step reconciles partition-level aggregates with grouping requirements across all partitions, ensuring that the final output remains consistent. This commit also fixes group_by_has_partkey() by ensuring the RelabelType node is stripped from grouping expressions when matching them to partition key expressions to avoid false mismatches. Bug: #18568 Reported-by: Webbo Han <1105066510@qq.com> Author: Webbo Han <1105066510@qq.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com Backpatch-through: 12	2024-11-08 16:07:22 +09:00

... 24 25 26 27 28 ...

27434 Commits