postgres

mirror of https://github.com/postgres/postgres.git synced 2025-08-30 06:01:21 +03:00

Author	SHA1	Message	Date
Amit Kapila	f3a8f73ec2	Use Enums for logical replication message types at more places. Commit `644f0d7cc9` added logical replication message type enums to use instead of character literals but some char substitutions were overlooked. Author: Peter Smith Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAHut+PsTG=Vrv8hgrvOnAvCNR21jhqMdPk2n0a1uJPoW0p+UfQ@mail.gmail.com	2020-11-26 09:21:14 +05:30
Tom Lane	789b938bf2	Centralize logic for skipping useless ereport/elog calls. While ereport() and elog() themselves are quite cheap when the error message level is too low to be printed, some places need to do substantial work before they can call those macros at all. To allow optimizing away such setup work when nothing is to be printed, make elog.c export a new function message_level_is_interesting(elevel) that reports whether ereport/elog will do anything. Make use of that in various places that had ad-hoc direct tests of log_min_messages etc. Also teach ProcSleep to use it to avoid some work. (There may well be other places that could usefully use this; I didn't search hard.) Within elog.c, refactor a little bit to avoid having duplicate copies of the policy-setting logic. When that code was written, we weren't relying on the availability of inline functions; so it had some duplications in the name of efficiency, which I got rid of. Alvaro Herrera and Tom Lane Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us	2020-11-23 19:10:46 -05:00
Heikki Linnakangas	c532d15ddd	Split copy.c into four files. Copy.c has grown really large. Split it into more manageable parts: - copy.c now contains only a few functions that are common to COPY FROM and COPY TO. - copyto.c contains code for COPY TO. - copyfrom.c contains code for initializing COPY FROM, and inserting the tuples to the correct table. - copyfromparse.c contains code for reading from the client/file/program, and parsing the input text/CSV/binary format into tuples. All of these parts are fairly complicated, and fairly independent of each other. There is a patch being discussed to implement parallel COPY FROM, which will add a lot of new code to the COPY FROM path, and another patch which would allow INSERTs to use the same multi-insert machinery as COPY FROM, both of which will require refactoring that code. With those two patches, there's going to be a lot of code churn in copy.c anyway, so now seems like a good time to do this refactoring. The CopyStateData struct is also split. All the formatting options, like FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which is used by both COPY FROM and TO. Other state data are kept in separate CopyFromStateData and CopyToStateData structs. Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi	2020-11-23 10:50:50 +02:00
Alvaro Herrera	27838981be	Relax lock level for setting PGPROC->statusFlags. We don't actually need a lock to set PGPROC->statusFlags itself; what we do need is a shared lock on either XidGenLock or ProcArrayLock in order to ensure MyProc->pgxactoff keeps still while we modify the mirror array in ProcGlobal->statusFlags. Some places were using an exclusive lock for that, which is excessive. Relax those to use shared lock only. procarray.c has a couple of places with somewhat brittle assumptions about PGPROC changes: ProcArrayEndTransaction uses only shared lock, so it's permissible to change MyProc only. On the other hand, ProcArrayEndTransactionInternal also changes other procs, so it must hold exclusive lock. Add asserts to ensure those assumptions continue to hold. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20201117155501.GA13805@alvherre.pgsql	2020-11-18 13:24:22 -03:00
Amit Kapila	9653f24ad8	Fix 'skip-empty-xacts' option in test_decoding for streaming mode. In streaming mode, the transaction can be decoded in multiple streams and those streams can be interleaved with streams of other transactions. So, we can't remember the transaction's write status in the logical decoding context because that might get changed due to some other transactions and lead to wrong answers for 'skip-empty-xacts' option. We decided to keep each transaction's write status in the ReorderBufferTxn to avoid interleaved streams changing the status of some unrelated transactions. Diagnosed-by: Amit Kapila Author: Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1LR7=XNM_TLmpZMFuV8ZQpoxkem--NZJYf8YXmesbvwLA@mail.gmail.com	2020-11-17 12:14:53 +05:30
Alvaro Herrera	cd9c1b3e19	Rename PGPROC->vacuumFlags to statusFlags. With more flags associated with a PGPROC entry that are not related to vacuum (currently existing or planned), the name "statusFlags" describes its purpose better. (The same is done to the mirroring PROC_HDR->vacuumFlags.) No functional changes in this commit. This was suggested first by Hari Babu Kommi in [1] and then by Michael Paquier at [2]. [1] https://postgr.es/m/CAJrrPGcsDC-oy1AhqH0JkXYa0Z2AgbuXzHPpByLoBGMxfOZMEQ@mail.gmail.com [2] https://postgr.es/m/20200820060929.GB3730@paquier.xyz Author: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20201116182446.qcg3o6szo2zookyr@localhost	2020-11-16 19:42:55 -03:00
Bruce Momjian	66a8f09048	Change wire protocol data type for history file content. This was marked as BYTEA, but is more like TEXT, which is how we already pass the history timeline file name. Internally, we don't do any encoding or bytea escape handling, but TEXT seems closest. This should cause no behavioral change. Reported-by: Brar Piening Discussion: https://postgr.es/m/6a1b9cd9-17e3-df67-be55-86102af6bdf5@gmx.de Backpatch-through: master	2020-11-12 14:08:59 -05:00
Fujii Masao	1a2ae7c50f	Use standard SIGHUP and SIGTERM handlers in walreceiver. Commit `1e53fe0e70` changed background processes so that they use the standard SIGHUP handler. Similarly, this commit makes walreceiver use the standard SIGHUP and SIGTERM handlers, to simplify the code. As a side effect of this commit, walreceiver can wake up and process the configuration files promptly when receiving SIGHUP, because the standard SIGHUP handler sets the latch. Previously, there could be a time lag between the receipt of SIGHUP and the processing of configuration files, since the dedicated handler didn't set the latch. Author: Bharath Rupireddy, tweaked by Fujii Masao Reviewed-by: Kyotaro Horiguchi, Fujii Masao Discussion: https://postgr.es/m/CALj2ACXPorUqePswDtOeM_s82v9RW32E1fYmOPZ5NuE+TWKj_A@mail.gmail.com	2020-11-12 13:25:23 +09:00
Michael Paquier	03f9cd93ea	Remove useless SHA256 initialization when not using backup manifests. Attempting to take a base backup with Postgres linking to a build of OpenSSL with FIPS enabled currently fails with or even without a backup manifest requested because of this mandatory SHA256 initialization used for the manifest file itself. However, there is no need to do this initialization at all if backup manifests are not needed because there is no data to append to the manifest. Note that being able to use backup manifests with OpenSSL+FIPS requires a switch of the SHA2 implementation to use EVP, which would cause an ABI breakage so this cannot be backpatched to 13 as it has been already released, but at least avoiding this SHA256 initialization gives users the possibility to take a base backup even when specifying --no-manifest with pg_basebackup. Author: Michael Paquier Discussion: https://postgr.es/m/20201110020014.GE1887@paquier.xyz Backpatch-through: 13	2020-11-12 10:56:33 +09:00
Tom Lane	ec29427ce2	Fix and simplify some usages of TimestampDifference(). Introduce TimestampDifferenceMilliseconds() to simplify callers that would rather have the difference in milliseconds, instead of the select()-oriented seconds-and-microseconds format. This gets rid of at least one integer division per call, and it eliminates some apparently-easy-to-mess-up arithmetic. Two of these call sites were in fact wrong: * pg_prewarm's autoprewarm_main() forgot to multiply the seconds by 1000, thus ending up with a delay 1000X shorter than intended. That doesn't quite make it a busy-wait, but close. * postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute microseconds not milliseconds, thus ending up with a delay 1000X longer than intended. Somebody along the way had noticed this problem but misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather than fixing the units. This was relatively harmless in context, because we don't care that much about exactly how long this delay is; still, it's wrong. There are a few more callers of TimestampDifference() that don't have a direct need for seconds-and-microseconds, but can't use TimestampDifferenceMilliseconds() either because they do need microsecond precision or because they might possibly deal with intervals long enough to overflow 32-bit milliseconds. It might be worth inventing another API to improve that, but that seems outside the scope of this patch; so those callers are untouched here. Given the fact that we are fixing some bugs, and the likelihood that future patches might want to back-patch code that uses this new API, back-patch to all supported branches. Alexey Kondratov and Tom Lane Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru	2020-11-10 22:51:54 -05:00
Peter Eisentraut	bdc4edbea6	Move catalog index declarations. Move the system catalog index declarations from catalog/indexing.h to the respective parent tables' catalog/pg_*.h files. The original reason for having it split was that the old genbki system produced the output in the order of the catalog files it read, so all the indexing stuff needed to come separately. But this is no longer the case, and keeping it together makes more sense. Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/c7cc82d6-f976-75d6-2e3e-b03d2cab26bb@2ndquadrant.com	2020-11-07 12:26:24 +01:00
Amit Kapila	644f0d7cc9	Use Enum for top level logical replication message types. The logical replication protocol uses a single-byte character to identify a message type. The code uses character literals for these. Use an enum so that 1. All the character literals used can be found in a single place. This makes it easy to add more types without the risk of conflicts. 2. It's easy to locate the code handling a given message type. 3. When used with switch statements, it is easy to identify the missing cases using -Wswitch. Author: Ashutosh Bapat Reviewed-by: Kyotaro Horiguchi, Andres Freund, Peter Smith and Amit Kapila Discussion: https://postgr.es/m/CAExHW5uPzQ7L0oAd_ENyvaiYMOPgkrAoJpE+ZY5-obdcVT6NPg@mail.gmail.com	2020-11-02 08:18:18 +05:30
Amit Kapila	8e90ec5580	Track statistics for streaming of changes from ReorderBuffer. This adds the statistics about transactions streamed to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats and call pg_stat_reset_replication_slot to reset the stats of a particular slot. Users can pass NULL in pg_stat_reset_replication_slot to reset stats of all the slots. Commit `9868167500` has added the basic infrastructure to capture the stats of slot and this commit extends the statistics collector to track additional information about slots. Bump the catversion as we have added new columns in the catalog entry. Author: Ajin Cherian and Amit Kapila Reviewed-by: Sawada Masahiko and Dilip Kumar Discussion: https://postgr.es/m/CAA4eK1+chpEomLzgSoky-D31qev19AmECNiEAietPQUGEFhtVA@mail.gmail.com	2020-10-29 09:11:51 +05:30
Tom Lane	ad77039fad	Calculate extraUpdatedCols in query rewriter, not parser. It's unsafe to do this at parse time because addition of generated columns to a table would not invalidate stored rules containing UPDATEs on the table ... but there might now be dependent generated columns that were not there when the rule was made. This also fixes an oversight that rewriteTargetView failed to update extraUpdatedCols when transforming an UPDATE on an updatable view. (Since the new calculation is downstream of that, rewriteTargetView doesn't actually need to do anything; but before, there was a demonstrable bug there.) In v13 and HEAD, this leads to easily-visible bugs because (since commit `c6679e4fc`) we won't recalculate generated columns that aren't listed in extraUpdatedCols. In v12 this bitmap is mostly just used for trigger-firing decisions, so you'd only notice a problem if a trigger cared whether a generated column had been updated. I'd complained about this back in May, but then forgot about it until bug #16671 from Michael Paul Killian revived the issue. Back-patch to v12 where this field was introduced. If existing stored rules contain any extraUpdatedCols values, they'll be ignored because the rewriter will overwrite them, so the bug will be fixed even for existing rules. (But note that if someone were to update to 13.1 or 12.5, store some rules with UPDATEs on tables having generated columns, and then downgrade to a prior minor version, they might observe issues similar to what this patch fixes. That seems unlikely enough to not be worth going to a lot of effort to fix.) Discussion: https://postgr.es/m/10206.1588964727@sss.pgh.pa.us Discussion: https://postgr.es/m/16671-2fa55851859fb166@postgresql.org	2020-10-28 13:47:02 -04:00
Heikki Linnakangas	fb5883da86	Remove PartitionRoutingInfo struct. The extra indirection needed to access its members via its enclosing ResultRelInfo seems pointless. Move all the fields from PartitionRoutingInfo to ResultRelInfo. Author: Amit Langote Reviewed-by: Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com	2020-10-19 14:42:55 +03:00
Alvaro Herrera	b05fe7b442	Review logical replication tablesync code. Most importantly, remove optimization in LogicalRepSyncTableStart that skips the normal walrcv_startstreaming/endstreaming dance. The optimization is not critically important for production uses anyway, since it only fires in cases with no activity, and saves an uninteresting amount of work even then. Critically, it obscures bugs by hiding the interesting code path from test cases. Also: in GetSubscriptionRelState, remove pointless relation open; access pg_subscription_rel->srsubstate with GETSTRUCT as is typical rather than SysCacheGetAttr; remove unused 'missing_ok' argument. In wait_for_relation_state_change, use explicit catalog snapshot invalidation rather than obscurely (and expensively) through GetLatestSnapshot. In various places: sprinkle comments more liberally and rewrite a number of them. Other cosmetic code improvements. No backpatch, since no bug is being fixed here. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/20201010190637.GA5774@alvherre.pgsql	2020-10-15 11:35:51 -03:00
David Rowley	110d81728a	Fixup some appendStringInfo and appendPQExpBuffer calls. A number of places were using appendStringInfo() when they could have been using appendStringInfoString() instead. While there's no functionality change there, it's just more efficient to use appendStringInfoString() when no formatting is required. Likewise for some appendStringInfoString() calls which were just appending a single char. We can just use appendStringInfoChar() for that. Additionally, many places were using appendPQExpBuffer() when they could have used appendPQExpBufferStr(). Change those too. Patch by Zhijie Hou, but further searching by me found significantly more places that deserved the same treatment. Author: Zhijie Hou, David Rowley Discussion: https://postgr.es/m/cb172cf4361e4c7ba7167429070979d4@G08CNEXMBPEKD05.g08.fujitsu.local	2020-10-15 20:35:17 +13:00
Amit Kapila	d7eb52d718	Execute invalidation messages for each XLOG_XACT_INVALIDATIONS message during logical decoding. Prior to commit `c55040ccd0` we had no way of knowing the invalidations before commit. So, while decoding we used to execute all the invalidations at each command end as we had no way of knowing which invalidations happened before that command. Due to this, transactions involving large amounts of DDL used to take more time and also led to high CPU usage. But now we know the specific invalidations at each command end, so we execute only the required invalidations. It has been observed that decoding of a transaction containing truncation of a table with 1000 partitions finishes in 1s, whereas before this patch it used to take 4-5 minutes. Author: Dilip Kumar Reviewed-by: Amit Kapila and Keisuke Kuroda Discussion: https://postgr.es/m/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK+NqhAWJyw8psQ@mail.gmail.com	2020-10-15 08:17:51 +05:30
Alvaro Herrera	4e9821b6fa	Restore replication protocol's duplicate command tags. I removed the duplicate command tags for START_REPLICATION inadvertently in commit `07082b08cc`, but the replication protocol requires them. The fact that the replication protocol was broken was not noticed because all our test cases use an optimized code path that exits early, failing to verify that the behavior is correct for non-optimized cases. Put them back. Also document this protocol quirk. Add a test case that shows the failure. It might still succeed even without the patch when run on a fast enough server, but it suffices to show the bug in enough cases that it would be noticed in buildfarm. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Henry Hinze <henry.hinze@gmail.com> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/16643-eaadeb2a1a58d28c@postgresql.org	2020-10-14 20:12:26 -03:00
Heikki Linnakangas	a04daa97a4	Remove es_result_relation_info from EState. Maintaining 'es_result_relation_info' correctly at all times has become cumbersome, especially with partitioning where each partition gets its own result relation info. Having to set and reset it across arbitrary operations has caused bugs in the past. This changes all the places that used 'es_result_relation_info', to receive the currently active ResultRelInfo via function parameters instead. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com	2020-10-14 11:41:40 +03:00
Heikki Linnakangas	1375422c78	Create ResultRelInfos later in InitPlan, index them by RT index. Instead of allocating all the ResultRelInfos upfront in one big array, allocate them in ExecInitModifyTable(). es_result_relations is now an array of ResultRelInfo pointers, rather than an array of structs, and it is indexed by the RT index. This simplifies things: we get rid of the separate concept of a "result rel index", and don't need to set it in setrefs.c anymore. This also allows follow-up optimizations (not included in this commit yet) to skip initializing ResultRelInfos for target relations that were not needed at runtime, and removal of the es_result_relation_info pointer. The EState arrays of regular result rels and root result rels are merged into one array. Similarly, the resultRelations and rootResultRelations lists in PlannedStmt are merged into one. It's not actually clear to me why they were kept separate in the first place, but now that the es_result_relations array is indexed by RT index, it certainly seems pointless. The PlannedStmt->resultRelations list is now only needed for ExecRelationIsTargetRelation(). One visible effect of this change is that ExecRelationIsTargetRelation() will now return 'true' also for the partition root, if a partitioned table is updated. That seems like a good thing, although the function isn't used in core code, and I don't see any reason for an FDW to call it on a partition root. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com	2020-10-13 12:57:02 +03:00
Amit Kapila	f13f2e4841	Fix typos in logical.c and reorderbuffer.c. Reviewed-by: Sawada Masahiko Discussion: https://postgr.es/m/CAA4eK1K6zTpuqf_d7wXCBjo_EF0_B6Fz3Ecp71Vq18t=wG-nzg@mail.gmail.com	2020-10-09 08:16:43 +05:30
Amit Kapila	9868167500	Track statistics for spilling of changes from ReorderBuffer. This adds the statistics about transactions spilled to disk from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats and call pg_stat_reset_replication_slot to reset the stats of a particular slot. Users can pass NULL in pg_stat_reset_replication_slot to reset stats of all the slots. This commit extends the statistics collector to track this information about slots. Author: Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila and Dilip Kumar Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com	2020-10-08 09:09:08 +05:30
Amit Kapila	f07707099c	Display the names of missing columns in error during logical replication. In logical replication when a subscriber is missing some columns, it currently emits an error message that says "some" columns are missing, but it doesn't specify the missing column names. Change that to display the missing column names, which makes the error more informative to the user. We have decided not to backpatch this commit as this is a minor usability improvement and no user has reported this. Reported-by: Bharath Rupireddy Author: Bharath Rupireddy Reviewed-by: Kyotaro Horiguchi and Amit Kapila Discussion: https://postgr.es/m/CALj2ACVkW-EXH_4pmBK8tNeHRz5ksUC4WddGactuCjPiBch-cg@mail.gmail.com	2020-10-07 08:14:19 +05:30
Fujii Masao	fd26f78231	Archive timeline history files in standby if archive_mode is set to "always". Previously the standby server didn't archive timeline history files streamed from the primary even when archive_mode is set to "always", while it archives the streamed WAL files. This could cause the PITR to fail because there was no required timeline history file in the archive. The cause of this issue was that walreceiver didn't mark those files as ready for archiving. This commit makes walreceiver mark those streamed timeline history files as ready for archiving if archive_mode=always. Then the archiver process archives the marked timeline history files. Back-patch to all supported versions. Reported-by: Grigory Smolkin Author: Grigory Smolkin, Fujii Masao Reviewed-by: David Zhang, Anastasia Lubennikova Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru	2020-09-29 16:21:46 +09:00
Amit Kapila	079d0cacf4	Fix the logical replication from HEAD to lower versions. Commit `464824323e` changed the logical replication protocol to allow the streaming of in-progress transactions and used the new version of protocol irrespective of the server version. Use the appropriate version of the protocol based on the server version. Reported-by: Ashutosh Sharma Author: Dilip Kumar Reviewed-by: Ashutosh Sharma and Amit Kapila Discussion: https://postgr.es/m/CAE9k0P=9OpXcNrcU5Gsvd5MZ8GFpiN833vNHzX6Uc=8+h1ft1Q@mail.gmail.com	2020-09-26 10:13:51 +05:30
Amit Kapila	24fb35e111	Update file header comments for logical/relation.c. Author: Amit Langote Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CA+HiwqE20oZoix13JyCeALpTf_SmjarZWtBFe5sND6zz+iupAw@mail.gmail.com	2020-09-18 10:14:30 +05:30
Tom Lane	babef40c9a	Teach walsender to update its process title for replication commands. Because the code path taken for SQL commands executed in a walsender will update the process title, we pretty much have to update the title for replication commands as well. Otherwise, the title shows "idle" for the rest of a logical walsender's lifetime once it's executed any SQL command. Playing with this, I confirm that a walsender now typically spends most of its life reporting `walsender postgres [local] START_REPLICATION`. Considering this in isolation, it might be better to have it say `walsender postgres [local] sending replication data`. However, consistency with the other cases seems to be a stronger argument. In passing, remove duplicative pgstat_report_activity call. Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us	2020-09-16 21:06:50 -04:00
Alvaro Herrera	07082b08cc	Fix bogus completion tag usage in walsender. Since commit `fd5942c18f` (2012, 9.3-era), walsender has been sending completion tags for certain replication commands twice -- and they're not even consistent. Apparently neither libpq nor JDBC has a problem with it, but it's not kosher. Fix by removing the EndCommand() call in the common code path for them all, and inserting calls to EndReplicationCommand() specifically in those places where it's needed. EndReplicationCommand() is a new simple function to send the completion tag for replication commands. Do this instead of sending a generic SELECT completion tag for them all, which was also pretty bogus (if innocuous). While at it, change StartReplication() to use EndReplicationCommand() instead of pg_puttextmessage(). In commit `2f9661311b`, I failed to realize that replication commands are not close-enough kin of regular SQL commands, so the DROP_REPLICATION_SLOT tag I added is undeserved and a type pun. Take it out. Backpatch to 13, where the latter commit appeared. The duplicate tag has been sent since 9.3, but since nothing is broken, it doesn't seem worth fixing. Per complaints from Tom Lane. Discussion: https://postgr.es/m/1347966.1600195735@sss.pgh.pa.us	2020-09-16 21:16:25 -03:00
Tom Lane	44fc6e259b	Centralize setup of SIGQUIT handling for postmaster child processes. We decided that the policy established in commit `7634bd4f6` for the bgwriter, checkpointer, walwriter, and walreceiver processes, namely that they should accept SIGQUIT at all times, really ought to apply uniformly to all postmaster children. Therefore, get rid of the duplicative and inconsistent per-process code for establishing that signal handler and removing SIGQUIT from BlockSig. Instead, make InitPostmasterChild do it. The handler set up by InitPostmasterChild is SignalHandlerForCrashExit, which just summarily does _exit(2). In interactive backends, we almost immediately replace that with quickdie, since we would prefer to try to tell the client that we're dying. However, this patch is changing the behavior of autovacuum (both launcher and workers), as well as walsenders. Those processes formerly also used quickdie, but AFAICS that was just mindless copy-and-paste: they don't have any interactive client that's likely to benefit from being told this. The stats collector continues to be an outlier, in that it thinks SIGQUIT means normal exit. That should probably be changed for consistency, but there's another patch set where that's being dealt with, so I didn't do so here. Discussion: https://postgr.es/m/644875.1599933441@sss.pgh.pa.us	2020-09-16 16:04:36 -04:00
Tom Lane	2000b6c10a	Don't fetch partition check expression during InitResultRelInfo. Since there is only one place that actually needs the partition check expression, namely ExecPartitionCheck, it's better to fetch it from the relcache there. In this way we will never fetch it at all if the query never has use for it, and we still fetch it just once when we do need it. The reason for taking an interest in this is that if the relcache doesn't already have the check expression cached, fetching it requires obtaining AccessShareLock on the partition root. That means that operations that look like they should only touch the partition itself will also take a lock on the root. In particular we observed that TRUNCATE on a partition may take a lock on the partition's root, contributing to a deadlock situation in parallel pg_restore. As written, this patch does have a small cost, which is that we are microscopically reducing efficiency for the case where a partition has an empty check expression. ExecPartitionCheck will be called, and will go through the motions of setting up and checking an empty qual, where before it would not have been called at all. We could avoid that by adding a separate boolean flag to track whether there is a partition expression to test. However, this case only arises for a default partition with no siblings, which surely is not an interesting case in practice. Hence adding complexity for it does not seem like a good trade-off. Amit Langote, per a suggestion by me Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com	2020-09-16 14:28:18 -04:00
Tom Lane	3d65b0593c	Fix bogus cache-invalidation logic in logical replication worker. The code recorded cache invalidation events by zeroing the "localreloid" field of affected cache entries. However, it's possible for an inval event to occur even while we have the entry open and locked. So an ill-timed inval could result in "cache lookup failed for relation 0" errors, if the worker's code tried to use the cleared field. We can fix that by creating a separate bool field to record whether the entry needs to be revalidated. (In the back branches, cram the bool into what had been padding space, to avoid an ABI break in the somewhat unlikely event that any extension is looking at this struct.) Also, rearrange the logic in logicalrep_rel_open so that it does the right thing in cases where table_open would fail. We should retry the lookup by name in that case, but we didn't. The real-world impact of this is probably small. In the first place, the error conditions are very low probability, and in the second place, the worker would just exit and get restarted. We only noticed because in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly, preventing the worker from making progress. Nonetheless, it's clearly a bug, and it impedes a useful type of testing; so back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us	2020-09-16 12:07:31 -04:00
Amit Kapila	69bd60672a	Fix initialization of RelationSyncEntry for streaming transactions. In commit `464824323e`, for each RelationSyncEntry we maintained the list of xids (streamed_txns) for which we have already sent the schema. This helps us to track when to send the schema to the downstream node for replication of streaming transactions. Before this list got initialized, we were processing invalidation messages which access this list and led to an assertion failure. In passing, clean up the nearby code: * Initialize the list of xids with NIL instead of NULL which is our usual coding practice. * Remove the MemoryContext switch for creating a RelationSyncEntry in dynahash. Diagnosed-by: Amit Kapila and Tom Lane Author: Amit Kapila Reviewed-by: Tom Lane and Dilip Kumar Discussion: https://postgr.es/m/904373.1600033123@sss.pgh.pa.us	2020-09-16 07:45:44 +05:30
Tom Lane	f560209c6e	Make walsenders show their replication commands in pg_stat_activity. A walsender process that has executed a SQL command left the text of that command in pg_stat_activity.query indefinitely, which is quite confusing if it's in RUNNING state but not doing that query. An easy and useful fix is to treat replication commands as if they were SQL queries, and show them in pg_stat_activity according to the same rules as for regular queries. While we're at it, it seems also sensible to set debug_query_string, allowing error logging and debugging to see the replication command. While here, clean up assorted silliness in exec_replication_command: * The SQLCmd path failed to restore CurrentMemoryContext to the caller's value, and failed to delete the temp context created in this routine. It's only through great good fortune that these oversights did not result in long-term memory leaks or other problems. It seems cleaner to code SQLCmd as a separate early-exit path, so do it like that. * Remove useless duplicate call of SnapBuildClearExportedSnapshot(). * replication_scanner_finish() was never called. None of those things are significant enough to merit a backpatch, so this is for HEAD only. Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us	2020-09-14 12:35:00 -04:00
Peter Eisentraut	3e0242b24c	Message fixes and style improvements	2020-09-14 06:42:30 +02:00
Amit Kapila	ddd5f6d260	Remove unused function declaration in logicalproto.h. In passing, fix a typo in pgoutput.c. Reported-by: Tomas Vondra Author: Tomas Vondra Reviewed-by: Dilip Kumar Discussion: https://postgr.es/m/20200909084353.pncuclpbwlr7vylh@development	2020-09-12 07:47:53 +05:30
Tom Lane	7634bd4f6d	Accept SIGQUIT during error recovery in auxiliary processes. The bgwriter, checkpointer, walwriter, and walreceiver processes claimed to allow SIGQUIT "at all times". In reality SIGQUIT would get re-blocked during error recovery, because we didn't update the actual signal mask immediately, so sigsetjmp() would save and reinstate a mask that includes SIGQUIT. This appears to be simply a coding oversight. There's never a good reason to hold off SIGQUIT in these processes, because it's going to just call _exit(2) which should be safe enough, especially since the postmaster is going to tear down shared memory afterwards. Hence, stick in PG_SETMASK() calls to install the modified BlockSig mask immediately. Also try to improve the comments around sigsetjmp blocks. Most of them were just referencing postgres.c, which is misleading because actually postgres.c manages the signals differently. No back-patch, since there's no evidence that this is causing any problems in the field. Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com	2020-09-11 16:01:36 -04:00
Alvaro Herrera	9f1cf97bb5	Print WAL logical message contents in pg_waldump. This helps debuggability when looking at WAL streams containing logical messages. Author: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAExHW5sWx49rKmXbg5H1Xc1t+nRv9PaYKQmgw82HPt6vWDVmDg@mail.gmail.com	2020-09-10 19:37:02 -03:00
Michael Paquier	aad546bd0a	doc: Fix some grammar and inconsistencies. Some comments are fixed while on it. Author: Justin Pryzby Discussion: https://postgr.es/m/20200818171702.GK17022@telsasoft.com Backpatch-through: 9.6	2020-09-10 15:50:19 +09:00
Tom Lane	f3e1e66196	Minor fixes in docs and error messages. Alexander Lakhin Discussion: https://postgr.es/m/ce7debdd-c943-d7a7-9b41-687107b27831@gmail.com	2020-09-09 11:53:39 -04:00
Tom Lane	c8746f999e	Fix over-eager ping'ing in logical replication receiver. Commit `3f60f690f` only partially fixed the broken-status-tracking issue in LogicalRepApplyLoop: we need ping_sent to have the same lifetime as last_recv_timestamp. The effects are much less serious than what that commit fixed, though. AFAICS this would just lead to extra ping requests being sent, once per second until the sender responds. Still, it's a bug, so backpatch to v10 as before. Discussion: https://postgr.es/m/959627.1599248476@sss.pgh.pa.us	2020-09-04 20:33:36 -04:00
Tom Lane	9a851039aa	Remove still more useless assignments. Fix some more things scan-build pointed to as dead stores. In some of these cases, rearranging the code a little leads to more readable code IMO. It's all cosmetic, though. Discussion: https://postgr.es/m/CAEudQAo1+AcGppxDSg8k+zF4+Kv+eJyqzEDdbpDg58-=MQcerQ@mail.gmail.com	2020-09-04 20:33:36 -04:00
Amit Kapila	ac15b499f7	Fix inline marking introduced in commit `464824323e`. Forgot to add inline marking in changes_filename() declaration. In passing, add inline marking for a similar function subxact_filename(). Reported-By: Nathan Bossart Discussion: https://postgr.es/m/E98FBE8F-B878-480D-A728-A60C6EED3047@amazon.com	2020-09-04 11:25:16 +05:30
Amit Kapila	464824323e	Add support for streaming to built-in logical replication. To add support for streaming of in-progress transactions into the built-in logical replication, we need to do three things: * Extend the logical replication protocol, to identify in-progress transactions, and allow adding additional bits of information (e.g. XID of subtransactions). * Modify the output plugin (pgoutput) to implement the new stream API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle streamed in-progress transactions by spilling the data to disk and then replaying them on commit. We however must explicitly disable streaming replication during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover we don't have a replication connection open so we have nowhere to send the data anyway. Author: Tomas Vondra, Dilip Kumar and Amit Kapila Reviewed-by: Amit Kapila, Kuntal Ghosh and Ajin Cherian Tested-by: Neha Sharma, Mahendra Singh Thalor and Ajin Cherian Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com	2020-09-03 07:54:07 +05:30
Fujii Masao	be9788e998	Avoid unnecessary acquisition of SyncRepLock at transaction commit time. In the SyncRepWaitForLSN() routine called at transaction commit time, SyncRepLock is necessary to atomically both check the shared sync_standbys_defined flag and operate the sync replication wait-queue. On the other hand, when the flag is false, the lock is not necessary because the wait-queue is not touched. But due to the changes by commit `48c9f49265`, previously the lock was taken whatever the flag was. This could cause unnecessary performance overhead at every transaction commit. Therefore this commit avoids that unnecessary acquisition of SyncRepLock. Author: Fujii Masao Reviewed-by: Asim Praveen, Masahiko Sawada Discussion: https://postgr.es/m/20200406050332.nsscfqjzk2d57zyx@alap3.anarazel.de	2020-09-02 10:55:55 +09:00
Andres Freund	623a9ba79b	snapshot scalability: cache snapshots using a xact completion counter. Previous commits made it faster/more scalable to compute snapshots. But not building a snapshot is still faster. Now that GetSnapshotData() does not maintain RecentGlobal* anymore, that is actually not too hard: This commit introduces xactCompletionCount, which tracks the number of top-level transactions with xids (i.e. which may have modified the database) that completed in some form since the start of the server. We can avoid rebuilding the snapshot's contents whenever the current xactCompletionCount is the same as it was when the snapshot was originally built. Currently this check happens while holding ProcArrayLock. While it's likely possible to perform the check without acquiring ProcArrayLock, it seems better to do that separately / later, some careful analysis is required. Even with the lock this is a significant win on its own. On a smaller two socket machine this gains another ~1.03x, on a larger machine the effect is roughly double (earlier patch version tested though). If we were able to safely avoid the lock there'd be another significant gain on top of that. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-17 21:08:30 -07:00
Andres Freund	5788e258bb	snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags. Similar to the previous commit this increases the chance that data frequently needed by GetSnapshotData() stays in l2 cache. As we now take care to not unnecessarily write to ProcGlobal->vacuumFlags, there should be very few modifications to the ProcGlobal->vacuumFlags array. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-14 15:33:35 -07:00
Andres Freund	1f51c17c68	snapshot scalability: Move PGXACT->xmin back to PGPROC. Now that xmin isn't needed for GetSnapshotData() anymore, it leads to unnecessary cacheline ping-pong to have it in PGXACT, as it is updated considerably more frequently than the other PGXACT members. After the changes in `dc7420c2c9`, this is a very straight-forward change. For highly concurrent, snapshot acquisition heavy, workloads this change alone can significantly increase scalability. E.g. plain pgbench on a smaller 2 socket machine gains 1.07x for read-only pgbench, 1.22x for read-only pgbench when submitting queries in batches of 100, and 2.85x for batches of 100 'SELECT';. The latter numbers are obviously not to be expected in the real-world, but micro-benchmark the snapshot computation scalability (previously spending ~80% of the time in GetSnapshotData()). Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-13 16:25:21 -07:00
Andres Freund	dc7420c2c9	snapshot scalability: Don't compute global horizons while building snapshots. To make GetSnapshotData() more scalable, it cannot look at each proc's xmin: While snapshot contents do not need to change whenever a read-only transaction commits or a snapshot is released, a proc's xmin is modified in those cases. The frequency of xmin modifications leads to, particularly on higher core count systems, many cache misses inside GetSnapshotData(), despite the data underlying a snapshot not changing. That is the most significant source of GetSnapshotData() scaling poorly on larger systems. Without accessing xmins, GetSnapshotData() cannot calculate accurate horizons / thresholds as it has so far. But we don't really have to: The horizons don't actually change that much between GetSnapshotData() calls. Nor are the horizons actually used every time a snapshot is built. The trick this commit introduces is to delay computation of accurate horizons until their use and using horizon boundaries to determine whether accurate horizons need to be computed. The use of RecentGlobal[Data]Xmin to decide whether a row version could be removed has been replaced with new GlobalVisTest* functions. These use two thresholds to determine whether a row can be pruned: 1) definitely_needed, indicating that rows deleted by XIDs >= definitely_needed are definitely still visible. 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can definitely be removed. GetSnapshotData() updates definitely_needed to be the xmin of the computed snapshot. When testing whether a row can be removed (with GlobalVisTestIsRemovableXid()) and the tested XID falls in between the two (i.e. XID >= maybe_needed && XID < definitely_needed) the boundaries can be recomputed to be more accurate. As it is not cheap to compute accurate boundaries, we limit the number of times that happens in short succession. 
As the boundaries used by GlobalVisTestIsRemovableXid() are never reset (with maybe_needed updated by GetSnapshotData()), it is likely that further tests can benefit from an earlier computation of accurate horizons. To avoid regressing performance when old_snapshot_threshold is set (as that requires an accurate horizon to be computed), heap_page_prune_opt() doesn't unconditionally call TransactionIdLimitedForOldSnapshots() anymore. Both the computation of the limited horizon, and the triggering of errors (with SetOldSnapshotThresholdTimestamp()) are now only done when necessary to remove tuples. This commit just removes the accesses to PGXACT->xmin from GetSnapshotData(), but other members of PGXACT residing in the same cache line are accessed. Therefore this in itself does not result in a significant improvement. Subsequent commits will take advantage of the fact that GetSnapshotData() now does not need to access xmins anymore. Note: This contains a workaround in heap_page_prune_opt() to keep the snapshot_too_old tests working. While that workaround is ugly, the tests currently are not meaningful, and it seems best to address them separately. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de	2020-08-12 16:03:49 -07:00
Peter Eisentraut	1784f278a6	Replace remaining StrNCpy() by strlcpy(). They are equivalent, except that StrNCpy() zero-fills the entire destination buffer instead of providing just one trailing zero. For all but a tiny number of callers, that's just overhead rather than being desirable. Remove StrNCpy() as it is now unused. In some cases, namestrcpy() is the more appropriate function to use. While we're here, simplify the API of namestrcpy(): Remove the return value, don't check for NULL input. Nothing was using that anyway. Also, remove a few unused name-related functions. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com	2020-08-10 23:20:37 +02:00

1002 Commits