postgres

mirror of https://github.com/postgres/postgres.git synced 2025-10-16 17:07:43 +03:00

Author	SHA1	Message	Date
Andres Freund	d38bab5edd	pgbench: Increase RLIMIT_NOFILE if necessary pgbench already had code to check if the soft rlimit is too low for the specified number of connections. If too low, it errored out, telling the user to increase the limit. However, we can do better: If the hard limit allows, increase the soft limit to be sufficiently for the number of connections. It is common for the soft limit to be considerably lower than the hard limit, due to the danger of soft limits > 1024 breaking programs that use the select(2), as explained in [1]. [1]: https://0pointer.net/blog/file-descriptor-limits.html Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d%3Dh9J%3DD1rStFCMR%2Bt7yh_Kwj-g87aLQ%40mail.gmail.com	2025-02-19 19:35:09 -05:00
Michael Paquier	9b1cb58c5f	test_escape: Fix output of --help The short option name -f was not listed, only its long option name --force-unsupported. Author: Japin Li Discussion: https://postgr.es/m/ME0P300MB04452BD1FB1B277D4C1C20B9B6C52@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Backpatch-through: 13	2025-02-20 09:30:54 +09:00
Tomas Vondra	9ba7bcc894	Correct relation size estimate with low fillfactor Since commit `29cf61ade3`, table_block_relation_estimate_size() considers fillfactor when estimating number of rows in a relation before the first ANALYZE. The formula however did not consider tuples may be larger than available space determined by fillfactor, ending with density 0. This ultimately means the relation was estimated to contain a single row. The executor however places at least one tuple per page, even with very low fillfactor values, so the density should be at least 1. Fixed by clamping the density estimate using clamp_row_est(). Reported by Heikki Linnakangas. Fix by me, with regression test inspired by example provided by Heikki. Backpatch to 17, where the issue was introduced. Reported-by: Heikki Linnakangas Backpatch-through: 17 Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi	2025-02-19 23:53:37 +01:00
Tom Lane	e596e077bb	Assert that ExecOpenIndices and ExecCloseIndices are not repeated. These functions should be called at most once per ResultRelInfo; it's wasteful to do otherwise, and certainly the pattern of opening twice and then closing twice is a bad idea. Moreover, aminsertcleanup functions might not be prepared to be called twice, as the just-hardened code in BRIN demonstrates. This amounts to an API change, since such coding patterns were safe even if wasteful before v17. Hence, apply to HEAD only. (Extension code violating this new rule faces some risk in v17, but we just fixed brininsertcleanup and there are probably few other aminsertcleanup functions as yet. So the odds of breaking usable code seem higher than the odds of doing something useful with a back-patch.) Bug: #18815 Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com> Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org	2025-02-19 16:45:12 -05:00
Tom Lane	9ff68679b5	Fix crash in brininsertcleanup during logical replication. Logical replication crashes if the subscriber's partitioned table has a BRIN index. There are two independently blamable causes, and this patch fixes both: 1. brininsertcleanup fails if called twice for the same IndexInfo, because it half-destroys its BrinInsertState but leaves it still linked from ii_AmCache. brininsert would also fail in that state, so it's pretty hard to see any advantage to this coding. Fully remove the BrinInsertState, instead, so that a new brininsert call would create a new cache. 2. A logical replication subscriber sometimes does ExecOpenIndices twice on the same ResultRelInfo, followed by doing ExecCloseIndices twice; the second call reaches the brininsertcleanup bug. Quite aside from tickling unexpected cases in aminsertcleanup methods, this seems very wasteful, because the IndexInfos built in the first ExecOpenIndices call are just lost during the second call, and have to be rebuilt at possibly-nontrivial cost. We should establish a coding rule that you don't do that. The problematic coding is that when the target table is partitioned, apply_handle_tuple_routing calls ExecFindPartition which does ExecOpenIndices (and expects that ExecCleanupTupleRouting will close the indexes again). Using the ResultRelInfo made by ExecFindPartition, it calls apply_handle_delete_internal or apply_handle_insert_internal, both of which think they need to do ExecOpenIndices/ExecCloseIndices for themselves. They do in the main non-partitioned code paths, but not here. The simplest fix is to pull their ExecOpenIndices/ExecCloseIndices calls out and put them in the call sites for the non-partitioned cases. (We could have refactored apply_handle_update_internal similarly, but I did not do so today because there's no bug there: the partitioned code path doesn't call it.) Also, remove the always-duplicative open/close calls within apply_handle_tuple_routing itself. Since brininsertcleanup and indeed the whole aminsertcleanup mechanism are new in v17, there's no observable bug in older branches. A case could be made for trying to avoid these duplicative open/close calls in the older branches, but for now it seems not worth the trouble and risk of new bugs. Bug: #18815 Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com> Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org Backpatch-through: 17	2025-02-19 16:35:15 -05:00
Tomas Vondra	a1b4f289be	Consider BufFiles when adjusting hashjoin parameters Until now ExecChooseHashTableSize() considered only the size of the in-memory hash table, and ignored the memory needed for the batch files. Which can be a significant amount, because each batch needs two BufFiles (each with a BLCKSZ buffer). The same issue applies to increasing the number of batches during execution. It's also possible to trigger a "batch explosion", e.g. due to duplicate values or skew. We've seen reports of joins with hundreds of thousands (or even millions) of batches, consuming gigabytes of memory, triggering OOM errors. These cases may be fairly rare, but it's clearly possible to hit them. These issues can't be prevented during planning. Even if we improve that, it does not help with execution-time batch explosion. We can however reduce the impact and use as little memory as possible. This patch improves the behavior by adjusting how the memory is divided between the hash table and batch files. It may be better to use fewer batch files, even if it means the hash table will exceed the limit. The capacity of the hash node may be increased either by doubling he number of batches, or doubling the size of the in-memory hash table. The outcome is the same, but the memory usage may be very different. For low nbatch values it's better to add batches, for high nbatch values it's better to allow a larger hash table. The patch considers both options, both during the initial sizing and then during execution, to minimize how much the limit gets exceeded. It might seem this patch is relaxing the memory limit - allowing it to be exceeded. But that's not really the case. It has always been like that, except the memory used by batches was ignored. Allowing the hash table to grow may also prevent the batch explosion. If there's a large batch that can't be split (due to hash collisions or duplicate values), at some point the memory limit will increase enough for the batch to fit into the hash table. This patch was in the works for a long time. The early versions were posted in 2019, and revived every year or two when we happened to get the next report of OOM due to a hashjoin batch explosion. Each of those patch versions were reviewed by a couple people. I'm mentioning only Melanie Plageman and Robert Haas, because they reviewed the last version, and the older patches are very different. Reviewed-by: Melanie Plageman, Robert Haas Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development	2025-02-19 21:08:20 +01:00
Andres Freund	8b886a4e34	tests: BackgroundPsql: Fix potential for lost errors on windows This addresses various corner cases in BackgroundPsql: - On windows stdout and stderr may arrive out of order, leading to errors not being reported, or attributed to the wrong statement. To fix, emit the "query-separation banner" on both stdout and stderr and wait for both. - Very occasionally the "query-separation banner" would not get removed, because we waited until the banner arrived, but then replaced the banner plus newline. To fix, wait for banner and newline. - For interactive psql replacing $banner\n is not sufficient, interactive psql outputs \r\n. - For interactive psql, where commands are echoed to stdout, the \echo command, rather than its output, would be matched. This would sometimes lead to output from the prior query, or wait_connect(), being returned in the next command. This also affected wait_connect(), leading to sometimes sending queries to psql before the connection actually was established. While debugging these issues I also found that it's hard to know whether a query separation banner was attributed to the right query. Make that easier by counting the queries each BackgroundPsql instance has emitted and include the number in the banner. Also emit psql stdout/stderr in query() and wait_connect() as Test::More notes, without that it's rather hard to debug some issues in CI and buildfarm. As this can cause issues not just to-be-added tests, but also existing ones, backpatch the fix to all supported versions. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft Backpatch-through: 13	2025-02-19 10:45:48 -05:00
Álvaro Herrera	80d7f99049	Add ATAlterConstraint struct for ALTER .. CONSTRAINT Replace the use of Constraint with a new ATAlterConstraint struct, which allows us to pass additional information. No functionality is added by this commit. This is necessary for future work that allows altering constraints in other ways. I (Álvaro) took the liberty of restructuring the code for ALTER CONSTRAINT beyond what Amul did. The original coding before Amul's patch was unnecessarily baroque, and this change makes things simpler by removing one level of subroutine. Also, partly remove the assumption that only partitioned tables are relevant (by passing sensible 'recurse' arguments) and no longer ignore whether ONLY was specified. I say 'partly' because the current coding only walks down via the 'conparentid' relationship, which is only used for partitioned tables; but future patches could handle ONLY or not for other types of constraint changes for legacy inheritance trees too. Author: Amul Sul <sulamul@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com	2025-02-19 13:06:13 +01:00
Alexander Korotkov	e983ee9380	Improve statistics estimation for single-column GROUP BY in sub-queries This commit follows the idea of the `4767bc8ff2`. If sub-query has only one GROUP BY column, we can consider its output variable as being unique. We can employ this fact in the statistics to make more precise estimations in the upper query block. Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-19 11:59:30 +02:00
Amit Kapila	8a695d7998	Add a test for commit `ac0e33136a` using the injection point. This test uses an injection point to bypass the time overhead caused by the idle_replication_slot_timeout GUC, which has a minimum value of one minute. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com	2025-02-19 15:02:22 +05:30
Michael Paquier	302cf15759	Add support for LIKE in CREATE FOREIGN TABLE LIKE enables the creation of foreign tables based on the column definitions, constraints and objects of the defined source relation(s). This feature mirrors the behavior of CREATE TABLE LIKE, but ignores the INCLUDING sub-options that do not make sense for foreign tables: INDEXES, COMPRESSION, IDENTITY and STORAGE. The supported sub-options are COMMENTS, CONSTRAINTS, DEFAULTS, GENERATED and STATISTICS, mapping with the clauses already supported by the command. Note that the restriction with LIKE in CREATE FOREIGN TABLE was added in `a0c6dfeecf`. Author: Zhang Mingli Reviewed-by: Álvaro Herrera, Sami Imseih, Michael Paquier Discussion: https://postgr.es/m/42d3f855-2275-4361-a42a-826172ca2dc4@Spark	2025-02-19 15:50:37 +09:00
Amit Langote	e7563e3c75	doc: Fix some issues with JSON_TABLE() examples 1. Remove an unused PASSING variable. 2. Adjust formatting of JSON data used in an example to be valid under strict mode Reported-by: Miłosz Chmura <mieszko4@gmail.com> Author: Robert Treat <rob@xzilla.net> Discussion: https://postgr.es/m/173859550337.1071.4748984213168572913@wrigleys.postgresql.org	2025-02-19 15:08:17 +09:00
Amit Kapila	ac0e33136a	Invalidate inactive replication slots. This commit introduces idle_replication_slot_timeout GUC that allows inactive slots to be invalidated at the time of checkpoint. Because checkpoints happen checkpoint_timeout intervals, there can be some lag between when the idle_replication_slot_timeout was exceeded and when the slot invalidation is triggered at the next checkpoint. To avoid such lags, users can force a checkpoint to promptly invalidate inactive slots. Note that the idle timeout invalidation mechanism is not applicable for slots that do not reserve WAL or for slots on the standby server that are synced from the primary server (i.e., standby slots having 'synced' field 'true'). Synced slots are always considered to be inactive because they don't perform logical decoding to produce changes. The slots can become inactive for a long period if a subscriber is down due to a system error or inaccessible because of network issues. If such a situation persists, it might be more practical to recreate the subscriber rather than attempt to recover the node and wait for it to catch up which could be time-consuming. Then, external tools could create replication slots (e.g., for migrations or upgrades) that may fail to remove them if an error occurs, leaving behind unused slots that take up space and resources. Manually cleaning them up can be tedious and error-prone, and without intervention, these lingering slots can cause unnecessary WAL retention and system bloat. As the duration of idle_replication_slot_timeout is in minutes, any test using that would be time-consuming. We are planning to commit a follow up patch for tests by using the injection point framework. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2025-02-19 09:29:50 +05:30
Tom Lane	b464e51ab3	Update to latest Snowball sources. It's been some time since we did this, partly because the upstream snowball project hasn't formally tagged a new release since 2021. The main motivation for doing it now is to absorb a bug fix (their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which prevents a null pointer dereference crash if SN_create_env() gets a malloc failure at just the wrong point. We'll patch the back branches with only that change, but we might as well do the full sync dance on HEAD. Aside from a bunch of mostly-minor tweaks to existing stemmers, this update adds a new stemmer for Estonian. It also removes the existing stemmer for Romanian using ISO-8859-2 encoding. Upstream apparently concluded that ISO-8859-2 doesn't provide an adequate representation of some Romanian characters, and the UTF-8 implementation should be used instead. While at it, update the README's instructions for doing a sync, which have not been adjusted during the addition of meson tooling. Thanks to Maksim Korotkov for discovering the null-pointer bug and submitting the fix to upstream snowball. Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru> Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435	2025-02-18 21:13:54 -05:00
Richard Guo	71d02dc478	Fix unsafe access to BufferDescriptors When considering a local buffer, the GetBufferDescriptor() call in BufferGetLSNAtomic() would be retrieving a shared buffer with a bad buffer ID. Since the code checks whether the buffer is shared before using the retrieved BufferDesc, this issue did not lead to any malfunction. Nonetheless this seems like trouble waiting to happen, so fix it by ensuring that GetBufferDescriptor() is only called when we know the buffer is shared. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com Backpatch-through: 13	2025-02-19 11:05:35 +09:00
Richard Guo	c39392ebae	Fix freeing a child join's SpecialJoinInfo In try_partitionwise_join, we try to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child join relations for them. To reduce memory accumulation during each iteration, one step we take is freeing the SpecialJoinInfos created for the child joins. A child join's SpecialJoinInfo is a copy of the parent join's SpecialJoinInfo, with some members being translated copies of their counterparts in the parent. However, when freeing the bitmapset members in a child join's SpecialJoinInfo, we failed to check whether they were translated copies. As a result, we inadvertently freed the members that were still in use by the parent SpecialJoinInfo, leading to crashes when those freed members were accessed. To fix, check if each member of the child join's SpecialJoinInfo is a translated copy and free it only if that's the case. This requires passing the parent join's SpecialJoinInfo as a parameter to free_child_join_sjinfo. Back-patch to v17 where this bug crept in. Bug: #18806 Reported-by: 孟令彬 <m_lingbin@126.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org Backpatch-through: 17	2025-02-19 10:02:32 +09:00
Michael Paquier	aef6f907f6	test_escape: Fix handling of short options in getopt_long() This addresses two errors in the module, based on the set of options supported: - '-c', for --conninfo, was not listed. - '-f', for --force-unsupported, was not listed. While on it, these are now listed in an alphabetical order. Author: Japin Li Discussion: https://postgr.es/m/ME0P300MB04451FB20CE0346A59C25CADB6FA2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Backpatch-through: 13	2025-02-19 09:45:42 +09:00
Michael Paquier	f2e4c2b203	Make the description of some GUCs more consistent This commit improves the description of a couple of GUCs, to be more consistent with the style of their surroundings: * array_nulls * enable_self_join_elimination * optimize_bounded_sort * row_security * synchronize_seqscans Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20250218.103240.1422205966404509831.horikyota.ntt@gmail.com	2025-02-19 08:42:35 +09:00
Bruce Momjian	06dc1ffd24	doc: add example of sign mismatch with POSIX/ISO-8601 time zones Author: Laurenz Albe Discussion: https://postgr.es/m/eb4d1e15c6822c1937be1491118500dd9201492f.camel@cybertec.at	2025-02-18 15:51:31 -05:00
Jeff Davis	a1f7f80bfe	Update outdated comments in nodeAgg.c. Author: Zhang Mingli Reviewed-by: Richard Guo Discussion: https://postgr.es/m/198a8d1e-0792-4e7f-828e-902aa342f36e@Spark	2025-02-18 10:37:50 -08:00
Melanie Plageman	c623e8593e	Reduce scope of heap vacuum per_buffer_data Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope. In lazy_scan_heap()'s phase I heap vacuuming, the read stream API returns a pointer to the next block number to vacuum. As long as read_stream_next_buffer() returns a valid buffer, per_buffer_data should always be valid. Move per_buffer_data into a tighter scope and make sure it is reset to NULL on each iteration so that we get a core dump instead of bogus data from a previous block if something goes wrong in the read stream API. Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us	2025-02-18 09:29:10 -05:00
Daniel Gustafsson	95ef3d9029	Add PGErrorVerbosity to typedefs.list PGErrorVerbosity was missing which resulted in incorrect whitespace alignment going back all the way to `e3860ffa4d`. No backpatch for this though since we don't pgindent backbranches. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com	2025-02-18 13:23:13 +01:00
David Rowley	593509202f	Fix poorly written regression test `bd10ec529` added code to allow redundant functionally dependent GROUP BY columns to be removed using unique indexes and NOT NULL constraints as proofs of functional dependency. In that commit, I (David) added a test to ensure that when there are multiple indexes available to remove columns that we pick the index that allows us to remove the most columns. This test was faulty as it assumed the t3 table's primary key index was valid to use as functional dependency proof, but that's not the case since that's defined as deferrable. Here we adjust the tests added by that commit to use the t2 table instead. That's defined with a non-deferrable primary key. Author: songjinzhou <tsinghualucky912@foxmail.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/tencent_CD414C79D39668455DF80D35143B87634C08@qq.com	2025-02-19 00:42:22 +13:00
Amit Kapila	217919dd09	Raise a WARNING for max_slot_wal_keep_size in pg_createsubscriber. During the pg_createsubscriber execution, it is possible that the required WAL is removed from the primary/publisher node due to 'max_slot_wal_keep_size'. This patch raises a WARNING during the '--dry-run' mode if the 'max_slot_wal_keep_size' is set to a non-default value on the primary/publisher node. Author: Shubham Khanna <khannashubham1197@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CAHv8Rj+deqsQXOMa7Tck8CBQUbsua=+4AuMVQ2=MPM0f-ZHbjA@mail.gmail.com	2025-02-18 12:15:43 +05:30
John Naylor	53d3daa491	Specialize intarray sorting There is at least one report in the field of storing millions of integers in arrays, so it seems like a good time to specialize intarray's qsort function. In doing so, streamline the comparators: Previously there were three, two for each direction for sorting and one passed to qunique_arg. To preserve the early exit in the case of descending input, pass the direction as an argument to the comparator. This requires giving up duplicate detection, which previously allowed skipping the qunique_arg() call. Testing showed no regressions this way. In passing, get rid of nearby checks that the input has at least two elements, since preserving them would make some macros less readable. These are not necessary for correctness, and seem like premature optimizations. Author: Andrey M. Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/098A3E67-E4A6-4086-9C66-B1EAEB1DFE1C@yandex-team.ru	2025-02-18 11:04:55 +07:00
Amit Kapila	164bac92f0	Doc: Improve pg_replication_slots.inactive_since description. Author: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAHut+PssvVMTWVtUPto6HbPO8pgVsvtzndt_FdBomA_Oq4zf3w@mail.gmail.com	2025-02-18 09:23:43 +05:30
Thomas Munro	2509b857cc	Fix typo in `2a8a0067`. Builds configured with Valgrind but without assertions would fail due to a typo in the recent change. This should be included when back-patching `2a8a0067` into v17.	2025-02-18 14:44:59 +13:00
Daniel Gustafsson	9cdc21b533	Fix translator notes in comments The translator comments detailing what a %s inclusion refers to were accidentally including too many address types. In practice this is not a problem since it's not a translated string, but to minimize any risk of confusion let's fix them anwyays. Even though this exists in backbranches there is little use for backpatch as the translation work has already happened there, so let's avoid the churn. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/ME0P300MB04458DE627480614ABE639D2B6FB2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM	2025-02-17 20:23:34 +01:00
Tomas Vondra	c407d5426b	Add tab completion for ALTER USER/ROLE RESET Currently tab completion for ALTER USER RESET shows a list of all configuration parameters that may be set on a role, irrespectively of which parameters are actually set. This patch improves tab completion to offer only parameters that are set. Author: Robins Tharakan Reviewed-By: Tomas Vondra Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com	2025-02-17 18:12:15 +01:00
Tomas Vondra	9df8727c50	Add tab completion for ALTER DATABASE RESET Currently tab completion for ALTER DATABASE RESET shows a list of all configuration parameters that may be set on a database, irrespectively of which parameters are actually set. This patch improves tab completion to offer only parameters that are set. Author: Robins Tharakan Reviewed-By: Tomas Vondra Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com	2025-02-17 18:12:15 +01:00
Alexander Korotkov	fc069a3a63	Implement Self-Join Elimination The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-17 12:44:12 +02:00
Alexander Korotkov	3fb58625d1	Revert: Get rid of WALBufMappingLock This commit reverts `6a2275b895`. Buildfarm failure on batta spots some concurrency issue, which requires further investigation.	2025-02-17 12:35:28 +02:00
Amit Langote	75dfde1363	Fix an oversight in `cbc127917` to handle MERGE correctly ExecInitModifyTable() forgot to trim MERGE-related lists to exclude entries for result relations pruned during initial pruning, so fix that. While at it, make the function's use of the pruned resultRelations list, rather than ModifyTable.resultRelations, more consistent. Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith) Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/e72c94d9-e5f9-4753-9bc1-69d72bd54b8a@gmail.com	2025-02-17 16:12:03 +09:00
Michael Paquier	6a8a7ce476	Add information about WAL buffers full to VACUUM/ANALYZE (VERBOSE) This commit adds the information about the number of times WAL buffers have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in the logs generated by autovacuum, complementing the existing information stored by WalUsage. This is the last part of the backend code where the value of wal_buffers_full can be reported, similarly to all the other fields of WalUsage. `320545bfcf` and `ce5bcc4a9f` have done the same for EXPLAIN and pgss. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-17 15:09:51 +09:00
Michael Paquier	320545bfcf	Add information about WAL buffers being full to EXPLAIN (WAL) This is similar to `ce5bcc4a9f`, relying on the addition of wal_buffers_full to WalUsage. This time, the information is added to the output generated by EXPLAIN (WAL). Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-17 14:50:33 +09:00
Michael Paquier	ce5bcc4a9f	pg_stat_statements: Add wal_buffers_full wal_buffers_full tracks the number of times WAL buffers become full, giving hints to be able to tune the GUC wal_buffers. Up to now, this information was only available in pg_stat_wal. With this field available in WalUsage since `eaf502747b`, exposing it in pg_stat_statements is straight-forward, and it offers more granularity at query level. pg_stat_statements does not need a version bump as one has been done in commit `cf54a2c002` for this development cycle. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-17 13:55:17 +09:00
Michael Paquier	eaf502747b	Move wal_buffers_full from PgStat_PendingWalStats to WalUsage wal_buffers_full has been introduced in pg_stat_wal in `8d9a935965`, as some information providing metrics for the tuning of the GUC wal_buffers. WalUsage has been introduced before that in `df3b181499`. Moving this field is proving to be beneficial for several reasons: - This information can now be made available in more layers, providing more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN, pgss and VACUUM/ANALYZE logs. - A patch is under discussion to provide statistics for WAL at backend level, and this move simplifies a bit the handling of pending statistics. The remaining data in PgStat_PendingWalStats now relates to write/sync counters and times, with equivalents present in pg_stat_io, that backend statistics are able to already track. So this should cut all the dependencies between PgStat_PendingWalStats and WAL stats at backend level. As of this change, wal_buffers_full only shows in pg_stat_wal. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal	2025-02-17 13:14:28 +09:00
Alexander Korotkov	6a2275b895	Get rid of WALBufMappingLock Allow multiple backends to initialize WAL buffers concurrently. This way `MemSet((char ) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without taking a single LWLock in exclusive mode. The new algorithm works as follows: reserve a page for initialization using XLogCtl->InitializeReserved, * ensure the page is written out, * once the page is initialized, try to advance XLogCtl->InitializedUpTo and signal to waiters using XLogCtl->InitializedUpToCondVar condition variable, * repeat previous steps until we reserve initialization up to the target WAL position, * wait until concurrent initialization finishes using a XLogCtl->InitializedUpToCondVar. Now, multiple backends can, in parallel, concurrently reserve pages, initialize them, and advance XLogCtl->InitializedUpTo to point to the latest initialized page. Author: Yura Sokolov <y.sokolov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>	2025-02-17 04:25:29 +02:00
Richard Guo	fbc0fe9a2e	Adjust tuples estimate for appendrels In set_append_rel_size(), we currently set rel->tuples to rel->rows for an appendrel. Generally, rel->tuples is the raw number of tuples in the relation and rel->rows is the estimated number of tuples after the relation's restriction clauses have been applied. Although an appendrel itself doesn't directly enforce any quals today, its child relations may. Therefore, setting rel->tuples equal to rel->rows for an appendrel isn't always appropriate. Doing so can lead to issues in cost estimates in some cases. For instance, when estimating the number of distinct values from an appendrel, we would not be able to adjust the estimate based on the restriction selectivity. This patch addresses this by setting an appendrel's tuples to the total number of tuples accumulated from each live child, which better aligns with reality. This is arguably a bug, but nobody has complained about that until now, so no back-patch. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com	2025-02-17 11:13:15 +09:00
Tom Lane	a7f95859ef	In fmtIdEnc(), handle failure of enlargePQExpBuffer(). Coverity complained that we weren't doing that, and it's right. This fix just makes fmtIdEnc() honor the general convention that OOM causes a PQExpBuffer to become marked "broken", without any immediate error. In the pretty-unlikely case that we actually did hit OOM here, the end result would be to return an empty string to the caller, probably resulting in invalid SQL syntax in an issued command (if nothing else went wrong, which is even more unlikely). It's tempting to throw an "out of memory" error if the buffer becomes broken, but there's not a lot of point in doing that only here and not in hundreds of other PQExpBuffer-using places in pg_dump and similar callers. The whole issue could do with some non-time-crunched redesign, perhaps. This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes.	2025-02-16 12:46:35 -05:00
Tom Lane	9f45e6a91d	Make escaping functions retain trailing bytes of an invalid character. Instead of dropping the trailing byte(s) of an invalid or incomplete multibyte character, replace only the first byte with a known-invalid sequence, and process the rest normally. This seems less likely to confuse incautious callers than the behavior adopted in `5dc1e42b4`. While we're at it, adjust PQescapeStringInternal to produce at most one bleat about invalid multibyte characters per string. This matches the behavior of PQescapeInternal, and avoids the risk of producing tons of repetitive junk if a long string is simply given in the wrong encoding. This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com Backpatch-through: 13	2025-02-15 16:20:21 -05:00
Thomas Munro	2a8a00674e	Fix explicit valgrind interaction in read_stream.c. By calling wipe_mem() on per-buffer data memory that has been released, we are also telling Valgrind that the memory is "noaccess". We need to set it to "undefined" before giving it to the registered callback to fill in, when a slot is reused. As discovered by build farm animal skink when the VACUUM streamification patches landed (the first users of per-buffer data). Pushing to master only for now, to clear the error on skink. It's also possible that external code might discover the per-buffer data feature in v17, and reasonable to expect Valgrind not to produce spurious memcheck reports, but the back-patch is deferred until after the imminent minor release is out of the way. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com	2025-02-15 13:14:03 +13:00
Andres Freund	efdadeb223	Fix PQescapeLiteral()/PQescapeIdentifier() length handling In `5dc1e42b4f` I fixed bugs in various escape functions, unfortunately as part of that I introduced a new bug in PQescapeLiteral()/PQescapeIdentifier(). The bug is that I made PQescapeInternal() just use strlen(), rather than taking the specified input length into account. That's bad, because it can lead to including input that wasn't intended to be included (in case len is shorter than null termination of the string) and because it can lead to reading invalid memory if the input string is not null terminated. Expand test_escape to this kind of bug: a) for escape functions with length support, append data that should not be escaped and check that it is not b) add valgrind requests to detect access of bytes that should not be touched Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Andres Freund <andres@anarazel.de Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/Z64jD3u46gObCo1p@pryzbyj2023 Backpatch: 13	2025-02-14 18:09:19 -05:00
Nathan Bossart	7720082ae5	Add delay time to VACUUM/ANALYZE (VERBOSE) and autovacuum logs. Commit `bb8dff9995` added this information to the pg_stat_progress_vacuum and pg_stat_progress_analyze system views. This commit adds the same information to the output of VACUUM and ANALYZE with the VERBOSE option and to the autovacuum logs. Suggested-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal	2025-02-14 14:53:28 -06:00
Daniel Gustafsson	9ad1b3d01f	pgcrypto: Add support for CFB mode in AES encryption Cipher Feedback Mode, CFB, is a self-synchronizing stream cipher which is very similar to CBC performed in reverse. Since OpenSSL supports it, we can easily plug it into the existing cipher selection code without any need for infrastructure changes. This patch was simultaneously submitted by Umar Hayat and Vladyslav Nebozhyn, the latter whom suggested the feauture. The committed patch is Umar's version. Author: Umar Hayat <postgresql.wizard@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAPBGcbxo9ASzq14VTpQp3mnUJ5omdgTWUJOvWV0L6nNigWE5jw@mail.gmail.com	2025-02-14 21:18:37 +01:00
Nathan Bossart	760bf588de	Use PqMsg_Progress macro in HandleParallelMessage(). Commit `a99cc6c6b4` introduced the PqMsg_Progress macro but missed updating HandleParallelMessage() accordingly. Backpatch-through: 17	2025-02-14 12:57:13 -06:00
Melanie Plageman	c3e775e608	Use streaming read I/O in VACUUM's third phase Make vacuum's third phase (its second pass over the heap), which reaps dead items collected in the first phase and marks them as reusable, use the read stream API. This commit adds a new read stream callback, vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and returns the next block number to read for vacuum. Author: Melanie Plageman <melanieplageman@gmail.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com	2025-02-14 12:57:49 -05:00
Melanie Plageman	9256822608	Use streaming read I/O in VACUUM's first phase Make vacuum's first phase, which prunes and freezes tuples and records dead TIDs, use the read stream API by by converting heap_vac_scan_next_block() to a read stream callback. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aLwANZpxHc0tC-6OT0OQT4TftDGkKAO5yigMUOv_Tcsw%40mail.gmail.com	2025-02-14 12:57:43 -05:00
Melanie Plageman	32acad7d1d	Convert heap_vac_scan_next_block() boolean parameters to flags The read stream API only allows one piece of extra per block state to be passed back to the API user (per_buffer_data). lazy_scan_heap() needs two pieces of per-buffer data: whether or not the block was all-visible in the visibility map and whether or not it was eagerly scanned. Convert these two pieces of information to flags so that they can be populated by heap_vac_scan_next_block() and returned to lazy_scan_heap(). A future commit will turn heap_vac_scan_next_block() into the read stream callback for heap phase I vacuuming. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAAKRu_bmx33jTqATP5GKNFYwAg02a9dDtk4U_ciEjgBHZSVkOQ%40mail.gmail.com	2025-02-14 12:57:37 -05:00
Nathan Bossart	977d865c36	Describe special values in GUC descriptions more consistently. Many GUCs accept special values like -1 or an empty string to disable the feature, use a system default, etc. While the documentation consistently lists these special values, the GUC descriptions do not. Many such descriptions fail to mention the special values, and those that do vary in phrasing and placement. This commit aims to bring some consistency to this area by applying the following rules: * Special values should be listed at the end of the long description. * Descriptions should use numerals (e.g., "0") instead of words (e.g., "zero"). * Special value mentions should be concise and direct (e.g., "0 disables the timeout.", "An empty string means use the operating system setting."). * Multiple special values should be listed in ascending order. Of course, there are exceptions, such as max_pred_locks_per_relation and search_path, whose special values are too complex to include. And there are cases like listen_addresses, where the meaning of an empty string is arguably too obvious to include. In those cases, I've refrained from adding special value information to the GUC description. Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/Z6aIy4aywxUZHAo6%40nathan	2025-02-14 10:44:30 -06:00

1 2 3 4 5 ...

60565 Commits