postgres

mirror of https://github.com/postgres/postgres.git synced 2025-08-31 17:02:12 +03:00

Author	SHA1	Message	Date
Amit Kapila	9b788aafdc	Fix uninitialized access to InitialRunningXacts during decoding. In commit `272248a0c`, we introduced an InitialRunningXacts array to remember transactions and subtransactions that were running when the xl_running_xacts record that we decoded was written. This array was allocated in the snapshot builder memory context after we restore serialized snapshot but we forgot to reset the array while freeing the builder memory context. So, the next time when we start decoding in the same session where we don't restore any serialized snapshot, we ended up using the uninitialized array and that can lead to unpredictable behavior. This problem doesn't exist in HEAD as instead of using InitialRunningXacts, we added the list of transaction IDs and sub-transaction IDs, that have modified catalogs and are running during snapshot serialization, to the serialized snapshot (see commit `7f13ac8123`). Reported-by: Maxim Orlov Author: Masahiko Sawada Reviewed-by: Amit Kapila, Maxim Orlov Backpatch-through: 11 Discussion: https://postgr.es/m/CACG=ezZoz_KG+Ryh9MrU_g5e0HiVoHocEvqFF=NRrhrwKmEQJQ@mail.gmail.com	2022-11-25 08:56:54 +05:30
Amit Kapila	5c51afe23d	Add CHECK_FOR_INTERRUPTS while restoring changes during decoding. Previously in commit `42681dffaf`, we added CFI during decoding changes but missed another similar case that can happen while restoring changes spilled to disk back into memory in a loop. Reported-by: Robert Haas Author: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CA+TgmoaLObg0QbstbC8ykDwOdD1bDkr4AbPpB=0DPgA2JW0mFg@mail.gmail.com	2022-10-21 12:08:14 +05:30
Amit Kapila	216af69aec	Fix executing invalidation messages generated by subtransactions during decoding. This problem has been introduced by commit `272248a0c1` where we started assigning the subtransactions to the top-level transaction when we mark both the top-level transaction and its subtransactions as containing catalog changes. After we assign subtransactions to the top-level transaction, we were not allowed to execute any invalidations associated with it when we decide to skip the transaction. The reason to assign the subtransactions to the top-level transaction was to avoid the assertion failure in AssertTXNLsnOrder() as they have the same LSN when we sometimes start accumulating transaction changes for partial transactions after the restart. Now that with commit `64ff0fe4e8`, we skip this assertion check until we reach the LSN at which we start decoding the contents of the transaction, so, there is no reason for such an assignment anymore. The assignment change was introduced in 15 and prior versions but this bug doesn't exist in branches prior to 14 since we don't add invalidation messages to subtransactions. We decided to backpatch through 11 for consistency but not for 10 since its final release is near. Reported-by: Kuroda Hayato Author: Masahiko Sawada Reviewed-by: Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com	2022-10-21 09:22:20 +05:30
Amit Kapila	5f7076cb60	Fix assertion failures while processing NEW_CID record in logical decoding. When the logical decoding restarts from NEW_CID, since there is no association between the top transaction and its subtransaction, both are created as top transactions and have the same LSN. This caused the assertion failure in AssertTXNLsnOrder(). This patch skips the assertion check until we reach the LSN at which we start decoding the contents of the transaction, specifically start_decoding_at LSN in SnapBuild. This is okay because we don't guarantee to make the association between top transaction and subtransaction until we try to decode the actual contents of transaction. The ordering of the records prior to the start_decoding_at LSN should have been checked before the restart. The other assertion failure is due to the reason that we forgot to track that we have considered top-level transaction id in the list of catalog changing transactions that were committed when one of its subtransactions is marked as containing catalog change. Reported-by: Tomas Vondra, Osumi Takamichi Author: Masahiko Sawada, Kuroda Hayato Reviewed-by: Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi, Masahiko Sawada Backpatch-through: 10 Discussion: https://postgr.es/m/a89b46b6-0239-2fd5-71a9-b19b1f7a7145%40enterprisedb.com Discussion: https://postgr.es/m/TYCPR01MB83733C6CEAE47D0280814D5AED7A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com	2022-10-20 09:07:04 +05:30
Amit Kapila	51e9469a41	Add CHECK_FOR_INTERRUPTS while decoding changes. While decoding changes in a loop, if we skip all the changes there is no CFI making the loop uninterruptible. Reported-by: Whale Song and Andrey Borodin Bug: 17580 Author: Masahiko Sawada Reviwed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/17580-849c1d5b6d7eb422@postgresql.org Discussion: https://postgr.es/m/B319ECD6-9A28-4CDF-A8F4-3591E0BF2369@yandex-team.ru	2022-08-23 08:42:51 +05:30
Amit Kapila	e721123b7a	Fix catalog lookup with the wrong snapshot during logical decoding. Previously, we relied on HEAP2_NEW_CID records and XACT_INVALIDATION records to know if the transaction has modified the catalog, and that information is not serialized to snapshot. Therefore, after the restart, if the logical decoding decodes only the commit record of the transaction that has actually modified a catalog, we will miss adding its XID to the snapshot. Thus, we will end up looking at catalogs with the wrong snapshot. To fix this problem, this changes the snapshot builder so that it remembers the last-running-xacts list of the decoded RUNNING_XACTS record after restoring the previously serialized snapshot. Then, we mark the transaction as containing catalog changes if it's in the list of initial running transactions and its commit record has XACT_XINFO_HAS_INVALS. To avoid ABI breakage, we store the array of the initial running transactions in the static variables InitialRunningXacts and NInitialRunningXacts, instead of storing those in SnapBuild or ReorderBuffer. This approach has a false positive; we could end up adding the transaction that didn't change catalog to the snapshot since we cannot distinguish whether the transaction has catalog changes only by checking the COMMIT record. It doesn't have the information on which (sub) transaction has catalog changes, and XACT_XINFO_HAS_INVALS doesn't necessarily indicate that the transaction has catalog change. But that won't be a problem since we use snapshot built during decoding only to read system catalogs. On the master branch, we took a more future-proof approach by writing catalog modifying transactions to the serialized snapshot which avoids the above false positive. But we cannot backpatch it because of a change in the SnapBuild. Reported-by: Mike Oh Author: Masahiko Sawada Reviewed-by: Amit Kapila, Shi yu, Takamichi Osumi, Kyotaro Horiguchi, Bertrand Drouvot, Ahsan Hadi Backpatch-through: 10 Discussion: https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com	2022-08-11 08:55:31 +05:30
Fujii Masao	87e5044877	Prevent BASE_BACKUP in the middle of another backup in the same session. Multiple non-exclusive backups are able to be run conrrently in different sessions. But, in the same session, only one non-exclusive backup can be run at the same moment. If pg_backup_start (pg_start_backup in v14 or before) is called in the middle of another non-exclusive backup in the same session, an error is thrown. However, previously, in logical replication walsender mode, even if that walsender session had already called pg_backup_start and started a non-exclusive backup, it could execute BASE_BACKUP command and start another non-exclusive backup. Which caused subsequent pg_backup_stop to throw an error because BASE_BACKUP unexpectedly reset the session state marked by pg_backup_start. This commit prevents BASE_BACKUP command in the middle of another non-exclusive backup in the same session. Back-patch to all supported branches. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Michael Paquier, Robert Haas Discussion: https://postgr.es/m/3374718f-9fbf-a950-6d66-d973e027f44c@oss.nttdata.com	2022-07-20 09:52:36 +09:00
Amit Kapila	ed2a7a6bf6	Fix memory leak due to LogicalRepRelMapEntry.attrmap. When rebuilding the relation mapping on subscribers, we were not releasing the attribute mapping's memory which was no longer required. The attribute mapping used in logical tuple conversion was refactored in PG13 (by commit `e1551f96e6`) but we forgot to update the related code that frees the attribute map. Author: Hou Zhijie Reviewed-by: Amit Langote, Amit Kapila, Shi yu Backpatch-through: 10, where it was introduced Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com	2022-06-23 08:37:40 +05:30
Tom Lane	d628ce048d	Fix off-by-one loop termination condition in pg_stat_get_subscription(). pg_stat_get_subscription scanned one more LogicalRepWorker array entry than is really allocated. In the worst case this could lead to SIGSEGV, if the LogicalRepCtx data structure is near the end of shared memory. That seems quite unlikely though (thanks to the ordering of calls in CreateSharedMemoryAndSemaphores) and we've heard no field reports of it. A more likely misbehavior is one row of garbage data in the function's result, but even that is not real likely because of the check that the pid field matches some live backend. Report and fix by Kuntal Ghosh. This bug is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAGz5QCJykEDzW6jQK6Yz7Qh_PMtD=95de_7QoocbVR2Qy8hWZA@mail.gmail.com	2022-06-07 15:34:30 -04:00
Amit Kapila	87c1dd246a	Fix the logical replication timeout during large transactions. The problem is that we don't send keep-alive messages for a long time while processing large transactions during logical replication where we don't send any data of such transactions. This can happen when the table modified in the transaction is not published or because all the changes got filtered. We do try to send the keep_alive if necessary at the end of the transaction (via WalSndWriteData()) but by that time the subscriber-side can timeout and exit. To fix this we try to send the keepalive message if required after processing certain threshold of changes. Reported-by: Fabrice Chapuis Author: Wang wei and Amit Kapila Reviewed By: Masahiko Sawada, Euler Taveira, Hou Zhijie, Hayato Kuroda Backpatch-through: 10 Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com	2022-05-11 10:12:23 +05:30
Amit Kapila	a90de822e4	Fix the check to limit sync workers. We don't allow to invoke more sync workers once we have reached the sync worker limit per subscription. But the check to enforce this also doesn't allow to launch an apply worker if it gets restarted. This code was introduced by commit `de43897122` but we caught the problem only with the test added by recent commit `c91f71b9dc` which started failing occasionally in the buildfarm. As per buildfarm. Diagnosed-by: Amit Kapila, Masahiko Sawada, Tomas Vondra Author: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com https://postgr.es/m/f90d2b03-4462-ce95-a524-d91464e797c8@enterprisedb.com	2022-04-19 09:29:34 +05:30
Tom Lane	efae4401c4	Improve subscriber's error message for wrong publication relkind. Pre-v13 versions only support logical replication from plain tables, while v13 and later also allow partitioned tables to be published. If you tried to subscribe an older server to such a publication, you got "table XXX not found on publisher", which is pretty unhelpful/confusing. Arrange to deliver a more on-point error message. As commit `c314c147c` did in v13, remove the relkind check from the query WHERE clause altogether, so that "not there" is distinguishable from "wrong relkind". Per report from Radoslav Nedyalkov. Patch v10-v12. Discussion: https://postgr.es/m/2952568.1644876730@sss.pgh.pa.us	2022-02-15 12:21:28 -05:00
Tom Lane	4ec54498c5	Fix limitations on what SQL commands can be issued to a walsender. In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org	2022-01-24 15:33:34 -05:00
Tom Lane	449a696236	Remember to reset yy_start state when firing up repl_scanner.l. Without this, we get odd behavior when the previous cycle of lexing exited in a non-default exclusive state. Every other copy of this code is aware that it has to do BEGIN(INITIAL), but repl_scanner.l did not get that memo. The real-world impact of this is probably limited, since most replication clients would abandon their connection after getting a syntax error. Still, it's a bug. This mistake is old, so back-patch to all supported branches. Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us	2022-01-24 12:09:46 -05:00
Tom Lane	f4ab856e5d	Doc: improve protocol spec for logical replication Type messages. protocol.sgml documented the layout for Type messages, but completely dropped the ball otherwise, failing to explain what they are, when they are sent, or what they're good for. While at it, do a little copy-editing on the description of Relation messages. In passing, adjust the comment for apply_handle_type() to make it clearer that we choose not to do anything when receiving a Type message, not that we think it has no use whatsoever. Per question from Stefen Hillman. Discussion: https://postgr.es/m/CAPgW8pMknK5pup6=T4a_UG=Cz80Rgp=KONqJmTdHfaZb0RvnFg@mail.gmail.com	2021-11-10 13:12:58 -05:00
Alvaro Herrera	5ef2100475	Handle XLOG_OVERWRITE_CONTRECORD in DecodeXLogOp Failing to do so results in inability of logical decoding to process the WAL stream. Handle it by doing nothing. Backpatch all the way back. Reported-by: Petr Jelínek <petr.jelinek@enterprisedb.com>	2021-11-01 13:07:23 -03:00
Amit Kapila	d61cdad0b5	Back-patch "Add parent table name in an error in reorderbuffer.c." This was originally done in commit `5e77625b26` for 15 only, as a troubleshooting aid but multiple people showed interest in back-patching this. Author: Jeremy Schneider Reviewed-by: Amit Kapila Backpatch-through: 9.6 Discussion: https://postgr.es/m/808ed65b-994c-915a-361c-577f088b837f@amazon.com	2021-10-21 09:59:33 +05:30
Michael Paquier	506aa1f71a	Reset properly snapshot export state during transaction abort During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6	2021-10-18 11:56:57 +09:00
Fujii Masao	aacb3cfaf5	Fix issue with WAL archiving in standby. Previously, walreceiver always closed the currently-opened WAL segment and created its archive notification file, after it finished writing the current segment up and received any WAL data that should be written into the next segment. If walreceiver exited just before any WAL data in the next segment arrived at standby, it did not create the archive notification file of the current segment even though that's known completed. This behavior could cause WAL archiving of the segment to be delayed until subsequent restartpoints or checkpoints created its notification file. To fix the issue, this commit changes walreceiver so that it creates an archive notification file of a current WAL segment immediately if that's known completed before receiving next WAL data. Back-patch to all supported branches. Reported-by: Kyotaro Horiguchi Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/20200630.165503.1465894182551545886.horikyota.ntt@gmail.com	2021-09-09 23:59:19 +09:00
Amit Kapila	a8e7eb159c	Fix incorrect error code in StartupReplicationOrigin(). ERRCODE_CONFIGURATION_LIMIT_EXCEEDED was used for checksum failure, use ERRCODE_DATA_CORRUPTED instead. Reported-by: Tatsuhito Kasahara Author: Tatsuhito Kasahara Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAP0=ZVLHtYffs8SOWcFJWrBGoRzT9QQbk+_aP+E5AHLNXiOorA@mail.gmail.com	2021-08-30 09:48:56 +05:30
Tom Lane	32d50b8952	Don't try to print data type names in slot_store_error_callback(). The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de	2021-07-02 16:04:54 -04:00
Tom Lane	c39983600b	Make walsenders show their replication commands in pg_stat_activity. A walsender process that has executed a SQL command left the text of that command in pg_stat_activity.query indefinitely, which is quite confusing if it's in RUNNING state but not doing that query. An easy and useful fix is to treat replication commands as if they were SQL queries, and show them in pg_stat_activity according to the same rules as for regular queries. While we're at it, it seems also sensible to set debug_query_string, allowing error logging and debugging to see the replication command. While here, clean up assorted silliness in exec_replication_command: * Clean up SQLCmd code path, and fix its only-accidentally-not-buggy memory management. * Remove useless duplicate call of SnapBuildClearExportedSnapshot(). * replication_scanner_finish() was never called. Back-patch of commit `f560209c6` into v10-v13. I'd originally felt that this didn't merit back-patching, but subsequent confusion while debugging walsender problems suggests that it'll be useful. Also, the original commit has now aged long enough to provide some comfort that it won't cause problems. Discussion: https://postgr.es/m/2673480.1624557299@sss.pgh.pa.us Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us	2021-06-25 10:46:10 -04:00
Amit Kapila	1f8a934e0a	Fix decoding of speculative aborts. During decoding for speculative inserts, we were relying for cleaning toast hash on confirmation records or next change records. But that could lead to multiple problems (a) memory leak if there is neither a confirmation record nor any other record after toast insertion for a speculative insert in the transaction, (b) error and assertion failures if the next operation is not an insert/update on the same table. The fix is to start queuing spec abort change and clean up toast hash and change record during its processing. Currently, we are queuing the spec aborts for both toast and main table even though we perform cleanup while processing the main table's spec abort record. Later, if we have a way to distinguish between the spec abort record of toast and the main table, we can avoid queuing the change for spec aborts of toast tables. Reported-by: Ashutosh Bapat Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com	2021-06-15 09:02:32 +05:30
Tom Lane	9eecea7f37	Don't use Asserts to check for violations of replication protocol. Using an Assert to check the validity of incoming messages is an extremely poor decision. In a debug build, it should not be that easy for a broken or malicious remote client to crash the logrep worker. The consequences could be even worse in non-debug builds, which will fail to make such checks at all, leading to who-knows-what misbehavior. Hence, promote every Assert that could possibly be triggered by wrong or out-of-order replication messages to a full test-and-ereport. To avoid bloating the set of messages the translation team has to cope with, establish a policy that replication protocol violation error reports don't need to be translated. Hence, all the new messages here use errmsg_internal(). A couple of old messages are changed likewise for consistency. Along the way, fix some non-idiomatic or outright wrong uses of hash_search(). Most of these mistakes are new with the "streaming replication" patch (commit `464824323`), but a couple go back a long way. Back-patch as appropriate. Discussion: https://postgr.es/m/1719083.1623351052@sss.pgh.pa.us	2021-06-12 12:59:15 -04:00
Tom Lane	eea081ad01	Rearrange logrep worker's snapshot handling some more. It turns out that worker.c's code path for TRUNCATE was also careless about establishing a snapshot while executing user-defined code, allowing the checks added by commit `84f5c2908` to fail when a trigger is fired in that context. We could just wrap Push/PopActiveSnapshot around the truncate call, but it seems better to establish a policy of holding a snapshot throughout execution of a replication step. To help with that and possible future requirements, replace the previous ensure_transaction calls with pairs of begin/end_replication_step calls. Per report from Mark Dilger. Back-patch to v11, like the previous changes. Discussion: https://postgr.es/m/B4A3AF82-79ED-4F4C-A4E5-CD2622098972@enterprisedb.com	2021-06-10 12:27:27 -04:00
Tom Lane	ef94805096	Restore the portal-level snapshot after procedure COMMIT/ROLLBACK. COMMIT/ROLLBACK necessarily destroys all snapshots within the session. The original implementation of intra-procedure transactions just cavalierly did that, ignoring the fact that this left us executing in a rather different environment than normal. In particular, it turns out that handling of toasted datums depends rather critically on there being an outer ActiveSnapshot: otherwise, when SPI or the core executor pop whatever snapshot they used and return, it's unsafe to dereference any toasted datums that may appear in the query result. It's possible to demonstrate "no known snapshots" and "missing chunk number N for toast value" errors as a result of this oversight. Historically this outer snapshot has been held by the Portal code, and that seems like a good plan to preserve. So add infrastructure to pquery.c to allow re-establishing the Portal-owned snapshot if it's not there anymore, and add enough bookkeeping support that we can tell whether it is or not. We can't, however, just re-establish the Portal snapshot as part of COMMIT/ROLLBACK. As in normal transaction start, acquiring the first snapshot should wait until after SET and LOCK commands. Hence, teach spi.c about doing this at the right time. (Note that this patch doesn't fix the problem for any PLs that try to run intra-procedure transactions without using SPI to execute SQL commands.) This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD, rename that to allow_nonatomic. replication/logical/worker.c also needs some fixes, because it wasn't careful to hold a snapshot open around AFTER trigger execution. That code doesn't use a Portal, which I suspect someday we're gonna have to fix. But for now, just rearrange the order of operations. This includes back-patching the recent addition of finish_estate() to centralize the cleanup logic there. This also back-patches commit `2ecfeda3e` into v13, to improve the test coverage for worker.c (it was that test that exposed that worker.c's snapshot management is wrong). Per bug #15990 from Andreas Wicht. Back-patch to v11 where intra-procedure COMMIT was added. Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org	2021-05-21 14:03:53 -04:00
Amit Kapila	71787b23e3	Fix deadlock for multiple replicating truncates of the same table. While applying the truncate change, the logical apply worker acquires RowExclusiveLock on the relation being truncated. This allowed truncate on the relation at a time by two apply workers which lead to a deadlock. The reason was that one of the workers after updating the pg_class tuple tries to acquire SHARE lock on the relation and started to wait for the second worker which has acquired RowExclusiveLock on the relation. And when the second worker tries to update the pg_class tuple, it starts to wait for the first worker which leads to a deadlock. Fix it by acquiring AccessExclusiveLock on the relation before applying the truncate change as we do for normal truncate operation. Author: Peter Smith, test case by Haiying Tang Reviewed-by: Dilip Kumar, Amit Kapila Backpatch-through: 11 Discussion: https://postgr.es/m/CAHut+PsNm43p0jM+idTvWwiGZPcP0hGrHMPK9TOAkc+a4UpUqw@mail.gmail.com	2021-05-21 08:31:10 +05:30
Alvaro Herrera	b0e6e08b19	Rename the logical replication global "wrconn" The worker.c global wrconn is only meant to be used by logical apply/ tablesync workers, but there are other variables with the same name. To reduce future confusion rename the global from "wrconn" to "LogRepWorkerWalRcvConn". While this is just cosmetic, it seems better to backpatch it all the way back to 10 where this code appeared, to avoid future backpatching issues. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com	2021-05-12 19:13:54 -04:00
Tom Lane	20f11ca0db	Don't leak malloc'd error string in libpqrcv_check_conninfo(). We leaked the error report from PQconninfoParse, when there was one. It seems unlikely that real usage patterns would repeat the failure often enough to create serious bloat, but let's back-patch anyway to keep the code similar in all branches. Found via valgrind testing. Back-patch to v10 where this code was added. Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us	2021-03-18 22:21:58 -04:00
Amit Kapila	7adc408f4b	Fix memory leak in SnapBuildSerialize. The memory for the snapshot was leaked while serializing it to disk during logical decoding. This memory will be freed only once walsender stops streaming the changes. This can lead to a huge memory increase when master logs Standby Snapshot too frequently say when the user is trying to create many replication slots. Reported-by: funnyxj.fxj@alibaba-inc.com Diagnosed-by: funnyxj.fxj@alibaba-inc.com Author: Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/033ab54c-6393-42ee-8ec9-2b399b5d8cde.funnyxj.fxj@alibaba-inc.com	2021-01-13 08:50:13 +05:30
Jeff Davis	4ee058a3bd	Revert "Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE." This reverts commit `3a9e64aa0d`. Commit `4bad60e3` fixed the root of the problem that `3a9e64aa` worked around. This enables proper pipelining of commands after terminating replication, eliminating an undocumented limitation. Discussion: https://postgr.es/m/3d57bc29-4459-578b-79cb-7641baf53c57%40iki.fi Backpatch-through: 9.5	2020-12-14 23:49:06 -08:00
Bruce Momjian	b5b7072a67	doc: wire protocol data type for history file content is bytea Document that though the history file content is marked as bytea, it is the same a text, and neither is btyea-escaped or encoding converted. Reported-by: Brar Piening Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de Backpatch-through: 13 - 9.5 (not master)	2020-11-12 14:33:28 -05:00
Tom Lane	3a89ea0eb6	Fix and simplify some usages of TimestampDifference(). Introduce TimestampDifferenceMilliseconds() to simplify callers that would rather have the difference in milliseconds, instead of the select()-oriented seconds-and-microseconds format. This gets rid of at least one integer division per call, and it eliminates some apparently-easy-to-mess-up arithmetic. Two of these call sites were in fact wrong: * pg_prewarm's autoprewarm_main() forgot to multiply the seconds by 1000, thus ending up with a delay 1000X shorter than intended. That doesn't quite make it a busy-wait, but close. * postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute microseconds not milliseconds, thus ending up with a delay 1000X longer than intended. Somebody along the way had noticed this problem but misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather than fixing the units. This was relatively harmless in context, because we don't care that much about exactly how long this delay is; still, it's wrong. There are a few more callers of TimestampDifference() that don't have a direct need for seconds-and-microseconds, but can't use TimestampDifferenceMilliseconds() either because they do need microsecond precision or because they might possibly deal with intervals long enough to overflow 32-bit milliseconds. It might be worth inventing another API to improve that, but that seems outside the scope of this patch; so those callers are untouched here. Given the fact that we are fixing some bugs, and the likelihood that future patches might want to back-patch code that uses this new API, back-patch to all supported branches. Alexey Kondratov and Tom Lane Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru	2020-11-10 22:51:56 -05:00
Fujii Masao	462ff79669	Archive timeline history files in standby if archive_mode is set to "always". Previously the standby server didn't archive timeline history files streamed from the primary even when archive_mode is set to "always", while it archives the streamed WAL files. This could cause the PITR to fail because there was no required timeline history file in the archive. The cause of this issue was that walreceiver didn't mark those files as ready for archiving. This commit makes walreceiver mark those streamed timeline history files as ready for archiving if archive_mode=always. Then the archiver process archives the marked timeline history files. Back-patch to all supported versions. Reported-by: Grigory Smolkin Author: Grigory Smolkin, Fujii Masao Reviewed-by: David Zhang, Anastasia Lubennikova Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru	2020-09-29 16:24:47 +09:00
Tom Lane	2c74f7139e	Fix bogus cache-invalidation logic in logical replication worker. The code recorded cache invalidation events by zeroing the "localreloid" field of affected cache entries. However, it's possible for an inval event to occur even while we have the entry open and locked. So an ill-timed inval could result in "cache lookup failed for relation 0" errors, if the worker's code tried to use the cleared field. We can fix that by creating a separate bool field to record whether the entry needs to be revalidated. (In the back branches, cram the bool into what had been padding space, to avoid an ABI break in the somewhat unlikely event that any extension is looking at this struct.) Also, rearrange the logic in logicalrep_rel_open so that it does the right thing in cases where table_open would fail. We should retry the lookup by name in that case, but we didn't. The real-world impact of this is probably small. In the first place, the error conditions are very low probability, and in the second place, the worker would just exit and get restarted. We only noticed because in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly, preventing the worker from making progress. Nonetheless, it's clearly a bug, and it impedes a useful type of testing; so back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us	2020-09-16 12:07:31 -04:00
Tom Lane	7156a0eac3	Fix over-eager ping'ing in logical replication receiver. Commit `3f60f690f` only partially fixed the broken-status-tracking issue in LogicalRepApplyLoop: we need ping_sent to have the same lifetime as last_recv_timestamp. The effects are much less serious than what that commit fixed, though. AFAICS this would just lead to extra ping requests being sent, once per second until the sender responds. Still, it's a bug, so backpatch to v10 as before. Discussion: https://postgr.es/m/959627.1599248476@sss.pgh.pa.us	2020-09-04 20:20:05 -04:00
Noah Misch	5a936d64c8	Empty search_path in logical replication apply worker and walsender. This is like CVE-2018-1058 commit `582edc369c`. Today, a malicious user of a publisher or subscriber database can invoke arbitrary SQL functions under an identity running replication, often a superuser. This fix may cause "does not exist" or "no schema has been selected to create in" errors in a replication process. After upgrading, consider watching server logs for these errors. Objects accruing schema qualification in the wake of the earlier commit are unlikely to need further correction. Back-patch to v10, which introduced logical replication. Security: CVE-2020-14349	2020-08-10 09:22:59 -07:00
Alvaro Herrera	1fa6eec974	walsnd: Don't set waiting_for_ping_response spuriously Ashutosh Bapat noticed that when logical walsender needs to wait for WAL, and it realizes that it must send a keepalive message to walreceiver to update the sent-LSN, which does not request a reply from walreceiver, it wrongly sets the flag that it's going to wait for that reply. That means that any future would-be sender of feedback messages ends up not sending a feedback message, because they all believe that a reply is expected. With built-in logical replication there's not much harm in this, because WalReceiverMain will send a ping-back every wal_receiver_timeout/2 anyway; but with other logical replication systems (e.g. pglogical) it can cause significant pain. This problem was introduced in commit `41d5f8ad73`, where the request-reply flag was changed from true to false to WalSndKeepalive, without at the same time removing the line that sets waiting_for_ping_response. Just removing that line would be a sufficient fix, but it seems better to shift the responsibility of setting the flag to WalSndKeepalive itself instead of requiring caller to do it; this is clearly less error-prone. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Backpatch: 9.5 and up Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql	2020-08-08 12:31:55 -04:00
Robert Haas	f7013683d9	Fix minor problems with non-exclusive backup cleanup. The previous coding imagined that it could call before_shmem_exit() when a non-exclusive backup began and then remove the previously-added handler by calling cancel_before_shmem_exit() when that backup ended. However, this only works provided that nothing else in the system has registered a before_shmem_exit() hook in the interim, because cancel_before_shmem_exit() is documented to remove a callback only if it is the latest callback registered. It also only works if nothing can ERROR out between the time that sessionBackupState is reset and the time that cancel_before_shmem_exit(), which doesn't seem to be strictly true. To fix, leave the handler installed for the lifetime of the session, arrange to install it just once, and teach it to quietly do nothing if there isn't a non-exclusive backup in process. This was originally committed to master as `303640199d`, but I did not back-patch at the time because the consequences were minor. However, now there's been a second report of this causing trouble with a slightly different test case than the one I reported originally, so now I'm back-patching as far as v11 where JIT was introduced. Patch by me, reviewed by Kyotaro Horiguchi, Michael Paquier (who preferred a different approach, but got outvoted), Fujii Masao, and Tom Lane, and with comments by various others. New problem report from Bharath Rupireddy. Discussion: http://postgr.es/m/CA+TgmobMjnyBfNhGTKQEDbqXYE3_rXWpc4CM63fhyerNCes3mA@mail.gmail.com Discussion: http://postgr.es/m/CALj2ACWk7j4F2v2fxxYfrroOF=AdFNPr1WsV+AGtHAFQOqm_pw@mail.gmail.com	2020-08-06 14:06:24 -04:00
Tom Lane	e8de627a3e	Fix construction of updated-columns bitmap in logical replication. Commit `b9c130a1f` failed to apply the publisher-to-subscriber column mapping while checking which columns were updated. Perhaps less significantly, it didn't exclude dropped columns either. This could result in an incorrect updated-columns bitmap and thus wrong decisions about whether to fire column-specific triggers on the subscriber while applying updates. In HEAD (since commit `9de77b545`), it could also result in accesses off the end of the colstatus array, as detected by buildfarm member skink. Fix the logic, and adjust 003_constraints.pl so that the problem is exposed in unpatched code. In HEAD, also add some assertions to check that we don't access off the ends of these newly variable-sized arrays. Back-patch to v10, as `b9c130a1f` was. Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com	2020-07-20 13:40:16 -04:00
Michael Paquier	1657e14b50	Fix oldest xmin and LSN computation across repslots after advancing Advancing a replication slot did not recompute the oldest xmin and LSN values across replication slots, preventing resource removal like segments not recycled at checkpoint time. The original commit that introduced the slot advancing in `9c7d06d` never did the update of those oldest values, and `b0afdca` removed this code. This commit adds a TAP test to check segment recycling with advancing for physical slots, enforcing an extra segment switch before advancing to check if the segment gets correctly recycled after a checkpoint. Reported-by: Andres Freund Reviewed-by: Alexey Kondratov, Kyptaro Horiguchi Discussion: https://postgr.es/m/20200609171904.kpltxxvjzislidks@alap3.anarazel.de Backpatch-through: 11	2020-06-18 16:35:36 +09:00
Michael Paquier	b41a85f533	Fix instance of elog() called while holding a spinlock This broke the project rule to not call any complex code while a spinlock is held. Issue introduced by `b89e151`. Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com Backpatch-through: 9.5	2020-06-04 10:18:06 +09:00
Tom Lane	7a8cb4a61e	Don't call palloc() while holding a spinlock, either. Fix some more violations of the "only straight-line code inside a spinlock" rule. These are hazardous not only because they risk holding the lock for an excessively long time, but because it's possible for palloc to throw elog(ERROR), leaving a stuck spinlock behind. copy_replication_slot() had two separate places that did pallocs while holding a spinlock. We can make the code simpler and safer by copying the whole ReplicationSlot struct into a local variable while holding the spinlock, and then referencing that copy. (While that's arguably more cycles than we really need to spend holding the lock, the struct isn't all that big, and this way seems far more maintainable than copying fields piecemeal. Anyway this is surely much cheaper than a palloc.) That bug goes back to v12. InvalidateObsoleteReplicationSlots() not only did a palloc while holding a spinlock, but for extra sloppiness then leaked the memory --- probably for the lifetime of the checkpointer process, though I didn't try to verify that. Fortunately that silliness is new in HEAD. pg_get_replication_slots() had a cosmetic violation of the rule, in that it only assumed it's safe to call namecpy() while holding a spinlock. Still, that's a hazard waiting to bite somebody, and there were some other cosmetic coding-rule violations in the same function, so clean it up. I back-patched this as far as v10; the code exists before that but it looks different, and this didn't seem important enough to adapt the patch further back. Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com	2020-06-03 12:36:00 -04:00
Amit Kapila	1567ed9b3a	Fix comment in slot.c. Reported-by: Sawada Masahiko Author: Sawada Masahiko Reviewed-by: Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/CA+fd4k4Ws7M7YQ8PqSym5WB1y75dZeBTd1sZJUQdfe0KJQ-iSA@mail.gmail.com	2020-05-18 08:16:45 +05:30
Tom Lane	ee39a4b9e4	Fix bogus initialization of replication origin shared memory state. The previous coding zeroed out offsetof(ReplicationStateCtl, states) more bytes than it was entitled to, as a consequence of starting the zeroing from the wrong pointer (or, if you prefer, using the wrong calculation of how much to zero). It's unsurprising that this has not caused any reported problems, since it can be expected that the newly-allocated block is at the end of what we've used in shared memory, and we always make the shmem block substantially bigger than minimally necessary. Nonetheless, this is wrong and it could bite us someday; plus it's a dangerous model for somebody to copy. This dates back to the introduction of this code (commit `5aa235042`), so back-patch to all supported branches.	2020-05-15 19:05:39 -04:00
Tom Lane	b3fa6d016d	Fix race conditions in synchronous standby management. We have repeatedly seen the buildfarm reach the Assert(false) in SyncRepGetSyncStandbysPriority. This apparently is due to failing to consider the possibility that the sync_standby_priority values in shared memory might be inconsistent; but they will be whenever only some of the walsenders have updated their values after a change in the synchronous_standby_names setting. That function is vastly too complex for what it does, anyway, so rewriting it seems better than trying to apply a band-aid fix. Furthermore, the API of SyncRepGetSyncStandbys is broken by design: it returns a list of WalSnd array indexes, but there is nothing guaranteeing that the contents of the WalSnd array remain stable. Thus, if some walsender exits and then a new walsender process takes over that WalSnd array slot, a caller might make use of WAL position data that it should not, potentially leading to incorrect decisions about whether to release transactions that are waiting for synchronous commit. To fix, replace SyncRepGetSyncStandbys with a new function SyncRepGetCandidateStandbys that copies all the required data from shared memory while holding the relevant mutexes. If the associated walsender process then exits, this data is still safe to make release decisions with, since we know that that much WAL was sent to a valid standby server. This incidentally means that we no longer need to treat sync_standby_priority as protected by the SyncRepLock rather than the per-walsender mutex. SyncRepGetSyncStandbys is no longer used by the core code, so remove it entirely in HEAD. However, it seems possible that external code is relying on that function, so do not remove it from the back branches. Instead, just remove the known-incorrect Assert. When the bug occurs, the function will return a too-short list, which callers should treat as meaning there are not enough sync standbys, which seems like a reasonably safe fallback until the inconsistent state is resolved. Moreover it's bug-compatible with what has been happening in non-assert builds. We cannot do anything about the walsender-replacement race condition without an API/ABI break. The bogus assertion exists back to 9.6, but 9.6 is sufficiently different from the later branches that the patch doesn't apply at all. I chose to just remove the bogus assertion in 9.6, feeling that the probability of a bad outcome from the walsender-replacement race condition is too low to justify rewriting the whole patch for 9.6. Discussion: https://postgr.es/m/21519.1585272409@sss.pgh.pa.us	2020-04-18 14:02:44 -04:00
Peter Eisentraut	250041a561	Save errno across LWLockRelease() calls Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>	2020-04-05 10:03:41 +02:00
Peter Eisentraut	ba4cc05ce2	Drop slot's LWLock before returning from SaveSlotToPath() When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation. Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com> Reported-by: Craig Ringer <craig@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com	2020-03-26 13:51:07 +01:00
Noah Misch	ae86e46c3b	Back-patch log_newpage_range(). Back-patch a subset of commit `9155580fd5` to v11, v10, 9.6, and 9.5. Include the latest repairs to this function. Use a new XLOG_FPI_MULTI value instead of reusing XLOG_FPI. That way, if an older server reads WAL from this function, that server will PANIC instead of applying just one page of the record. The next commit adds a call to this function. Discussion: https://postgr.es/m/20200304.162919.898938381201316571.horikyota.ntt@gmail.com	2020-03-21 09:38:33 -07:00
Michael Paquier	da2a7180aa	Add prefix checks in exclude lists for pg_rewind, pg_checksums and base backups An instance of PostgreSQL crashing with a bad timing could leave behind temporary pg_internal.init files, potentially causing failures when verifying checksums. As the same exclusion lists are used between pg_rewind, pg_checksums and basebackup.c, all those tools are extended with prefix checks to keep everything in sync, with dedicated checks added for pg_internal.init. Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and checksum verification for base backups have been introduced. Reported-by: Michael Banck Author: Michael Paquier Reviewed-by: Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/62031974fd8e941dd8351fbc8c7eff60d59c5338.camel@credativ.de Backpatch-through: 11	2020-02-24 18:14:22 +09:00

1 2 3 4 5 ...

799 Commits