postgres

mirror of https://github.com/postgres/postgres.git synced 2025-09-02 04:21:28 +03:00

Author	SHA1	Message	Date
Thomas Munro	bdb657edd6	Remove configure probe and related tests for getrlimit. getrlimit() is in SUSv2 and all targeted systems have it. Windows doesn't have it. We could just use #ifndef WIN32, but for a little more explanation about why we're making things conditional, let's retain the HAVE_GETRLIMIT macro. It's defined in port.h for Unix systems. On systems that have it, it's not necessary to test for RLIMIT_CORE, RLIMIT_STACK or RLIMIT_NOFILE macros, since SUSv2 requires those and all targeted systems have them. Also remove references to a pre-historic alternative spelling of RLIMIT_NOFILE, and coding that seemed to believe that Cygwin didn't have it. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com	2022-08-05 09:18:34 +12:00
Robert Haas	bbe08b8869	Use TRUNCATE to preserve relfilenode for pg_largeobject + index. Commit `9a974cbcba` arranged to preserve the relfilenode of user tables across pg_upgrade, but failed to notice that pg_upgrade treats pg_largeobject as a user table and thus it needs the same treatment. Otherwise, large objects will appear to vanish after a pg_upgrade. Commit `d498e052b4` fixed this problem by teaching pg_dump to UPDATE pg_class.relfilenode for pg_largeobject and its index. However, because an UPDATE on the catalog rows doesn't change anything on disk, this can leave stray files behind in the new cluster. They will normally be empty, but it's a little bit untidy. Hence, this commit arranges to do the same thing using DDL. Specifically, it makes TRUNCATE work for the pg_largeobject catalog when in binary-upgrade mode, and it then uses that command in binary-upgrade dumps as a way of setting pg_class.relfilenode for pg_largeobject and its index. That way, the old files are removed from the new cluster. Discussion: http://postgr.es/m/CA+TgmoYYMXGUJO5GZk1-MByJGu_bB8CbOL6GJQC8=Bzt6x6vDg@mail.gmail.com	2022-07-28 16:03:42 -04:00
Robert Haas	851f4cc75c	Clean up some residual confusion between OIDs and RelFileNumbers. Commit `b0a55e4329` missed a few places where we are referring to the number used as a part of the relation filename as an "OID". We now want to call that a "RelFileNumber". Some of these places actually made it sound like the OID in question is pg_class.oid rather than pg_class.relfilenode, which is especially good to clean up. Dilip Kumar with some editing by me.	2022-07-28 10:20:29 -04:00
Fujii Masao	d396606ebe	Fix comment in procarray.c. Commit `fea10a6434` renamed VariableCacheData.nextFullXid to nextXid. But commit `dc7420c2c9` introduced the comment mentioning nextFullXid. This commit changes "nextFullXid" to "nextXid" in the comment. Author: Zhang Mingli Discussion: https://postgr.es/m/642BA615-4B28-4B0C-BDF6-4D33E366BCDF@gmail.com	2022-07-28 14:56:20 +09:00
Robert Haas	3ac88fddd9	Convert macros to static inline functions (buf_internals.h) Dilip Kumar, reviewed by Vignesh C, Ashutosh Sharma, and me. Discussion: http://postgr.es/m/CAFiTN-tYbM7D+2UGiNc2kAFMSQTa5FTeYvmg-Vj2HvPdVw2Gvg@mail.gmail.com	2022-07-27 13:54:37 -04:00
Heikki Linnakangas	7a08f78aea	Fix ReadRecentBuffer for local buffers. It incorrectly used GetBufferDescriptor instead of GetLocalBufferDescriptor, causing it to not find the correct buffer in most cases, and performing an out-of-bounds memory read in the corner case that temp_buffers > shared_buffers. It also bumped the usage-count on the buffer, even if it was previously pinned. That won't lead to crashes or incorrect results, but it's different from what the shared-buffer case does, and different from the usual code in LocalBufferAlloc. Fix that too, and make the code ordering match LocalBufferAlloc() more closely, so that it's easier to verify that it's doing the same thing. Currently, ReadRecentBuffer() is only used with non-temp relations, in WAL redo, so the broken code is currently dead code. However, it could be used by extensions. Backpatch-through: 14 Discussion: https://www.postgresql.org/message-id/2d74b46f-27c9-fb31-7f99-327a87184cc0%40iki.fi Reviewed-by: Thomas Munro, Zhang Mingli, Richard Guo	2022-07-25 08:52:46 +03:00
Andres Freund	4f20506fe0	Add output path arg in generate-lwlocknames.pl This is in preparation for building postgres with meson / ninja. When building with meson, commands are run at the root of the build tree. Add an option to put build output into the appropriate place. This can be utilized by src/tools/msvc/ for a minor simplification, which also provides some coverage for the new option. Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/5e216522-ba3c-f0e6-7f97-5276d0270029@enterprisedb.com	2022-07-18 12:24:32 -07:00
Thomas Munro	3b8d23a3e1	Make dsm_impl_posix_resize more future-proof. Commit `4518c798` blocks signals for a short region of code, but it assumed that whatever called it had the signal mask set to UnBlockSig on entry. That may be true today (or may even not be, in extensions in the wild), but it would be better not to make that assumption. We should save-and-restore the caller's signal mask. The PG_SETMASK() portability macro couldn't be used for that, which is why it wasn't done before. But... considering that commit `a65e0864` established back in 9.6 that supported POSIX systems have sigprocmask(), and that this is POSIX-only code, there is no reason not to use standard sigprocmask() directly to achieve that. Back-patch to all supported releases, like `4518c798` and `80845b7c`. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com	2022-07-16 12:22:42 +12:00
Thomas Munro	80845b7c0b	Don't clobber postmaster sigmask in dsm_impl_resize. Commit `4518c798` intended to block signals in regular backends that allocate DSM segments, but dsm_impl_resize() is also reached by dsm_postmaster_startup(). It's not OK to clobber the postmaster's signal mask, so only manipulate the signal mask when under the postmaster. Back-patch to all releases, like `4518c798`. Discussion: https://postgr.es/m/CA%2BhUKGKNpK%3D2OMeea_AZwpLg7Bm4%3DgYWk7eDjZ5F6YbozfOf8w%40mail.gmail.com	2022-07-15 02:00:09 +12:00
Thomas Munro	5794491058	Avoid shadowing a variable in sync.c. It was confusing to reuse the variable name 'entry' in two scopes. Use distinct variable names. Reported-by: Ranier Vilela <ranier.vf@gmail.com> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CAEudQArDrFyQ15Am3rgWBunGBVZFDb90onTS8SRiFAWHeiLiFA%40mail.gmail.com	2022-07-15 00:06:32 +12:00
Thomas Munro	7bae3bbf62	Create a distinct wait event for POSIX DSM allocation. Previously we displayed "DSMFillZeroWrite" while in posix_fallocate(), because we shared the same wait event for "mmap" and "posix" DSM types. Let's introduce a new wait event "DSMAllocate", to be more accurate. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220711174518.yldckniicknsxgzl%40awork3.anarazel.de	2022-07-14 23:56:28 +12:00
Thomas Munro	712704d353	Remove redundant ftruncate() for POSIX DSM memory. In early releases of the DSM infrastructure, it was possible to resize segments. That was removed in release 12 by commit `3c60d0fa`. Now the ftruncate() + posix_fallocate() sequence during DSM segment creation has a redundant step: we're always extending from zero to the desired size, so we might as well just call posix_fallocate(). Let's also include the remaining ftruncate() call (non-Linux POSIX systems) in the wait event reporting, for good measure. Discussion: https://postgr.es/m/CA%2BhUKGJSm-nq8s%2B_59zb7NbFQF-OS%3DxTnTAiGLrQpuSmU2y_1A%40mail.gmail.com	2022-07-14 23:56:22 +12:00
Thomas Munro	4518c798b2	Block signals while allocating DSM memory. On Linux, we call posix_fallocate() on shm_open()'d memory to avoid later potential SIGBUS (see commit `899bd785`). Based on field reports of systems stuck in an EINTR retry loop there, we made it possible to break out of that loop via slightly odd coding where the CHECK_FOR_INTERRUPTS() call was somewhat removed from the loop (see commit `422952ee`). On further reflection, that was not a great choice for at least two reasons: 1. If interrupts were held, the CHECK_FOR_INTERRUPTS() would do nothing and the EINTR error would be surfaced to the user. 2. If EINTR was reported but neither QueryCancelPending nor ProcDiePending was set, then we'd dutifully retry, but with a bit more understanding of how posix_fallocate() works, it's now clear that you can get into a loop that never terminates. posix_fallocate() is not a function that can do some of the job and tell you about progress if it's interrupted, it has to undo what it's done so far and report EINTR, and if signals keep arriving faster than it can complete (cf recovery conflict signals), you're stuck. Therefore, for now, we'll simply block most signals to guarantee progress. SIGQUIT is not blocked (see InitPostmasterChild()), because its expected handler doesn't return, and unblockable signals like SIGCONT are not expected to arrive at a high rate. For good measure, we'll include the ftruncate() call in the blocked region, and add a retry loop. Back-patch to all supported releases. Reported-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Nicola Contu <nicola.contu@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql	2022-07-14 18:01:27 +12:00
Robert Haas	09c5acee8e	Rename some functions to mention Relation instead of RelFileLocator. This is definitely shorter, and hopefully clearer. Kyotaro Horiguchi, reviewed by Dilip Kumar and by me Discussion: http://postgr.es/m/20220707.174436.1885393789789795413.horikyota.ntt@gmail.com	2022-07-12 10:26:48 -04:00
Thomas Munro	718aa43a4e	Further tidy-up for old CPU architectures. Further to commit `92d70b77`, let's drop the code we carry for the following untested architectures: M68K, M88K, M32R, SuperH. We have no idea if anything actually works there, and surely as vintage hardware and microcontrollers they would be underpowered for modern purposes. We could always consider re-adding SuperH based on evidence of usage and build farm support, if someone shows up to provide it. While here, SPARC is usually written in all caps. Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> (the idea, not the patch) Discussion: https://postgr.es/m/959917.1657522169%40sss.pgh.pa.us	2022-07-12 11:05:32 +12:00
Robert Haas	b2d5b4c6e0	Fix mistake in comment. Kyotaro Horiguchi Discussion: http://postgr.es/m/20220708.145951.382076151410075693.horikyota.ntt@gmail.com	2022-07-11 13:33:21 -04:00
Peter Eisentraut	2cd2569c72	Convert macros to static inline functions (bufpage.h) Remove PageIsValid() and PageSizeIsValid(), which weren't used and seem unnecessary. Some code using these formerly-macros needs some adjustments because it was previously playing loose with the Page vs. PageHeader types, which is no longer possible with the functions instead of macros. Reviewed-by: Amul Sul <sulamul@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com	2022-07-11 07:21:52 +02:00
Thomas Munro	eed959a457	Fix lock assertions in dshash.c. dshash.c previously maintained flags to be able to assert that you didn't hold any partition lock. These flags could get out of sync with reality in error scenarios. Get rid of all that, and make assertions about the locks themselves instead. Since LWLockHeldByMe() loops internally, we don't want to put that inside another loop over all partition locks. Introduce a new debugging-only interface LWLockAnyHeldByMe() to avoid that. This problem was noted by Tom and Andres while reviewing changes to support the new shared memory stats system, and later showed up in reality while working on commit `389869af`. Back-patch to 11, where dshash.c arrived. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220311012712.botrpsikaufzteyt@alap3.anarazel.de Discussion: https://postgr.es/m/CA%2BhUKGJ31Wce6HJ7xnVTKWjFUWQZPBngxfJVx4q0E98pDr3kAw%40mail.gmail.com	2022-07-11 16:43:29 +12:00
Robert Haas	b0a55e4329	Change internal RelFileNode references to RelFileNumber or RelFileLocator. We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. 
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com	2022-07-06 11:39:09 -04:00
Andres Freund	3f8148c256	Revert 019_replslot_limit.pl related debugging aids. This reverts most of `91c0570a79`, `f28bf667f6`, `fe0972ee5e`, `afdeff1052`. The only thing left is the retry loop in 019_replslot_limit.pl that avoids spurious failures by retrying a couple times. We haven't seen any hard evidence that this is caused by anything but slow process shutdown. We did not find any cases where walsenders did not vanish after waiting for longer. Therefore there's no reason for this debugging code to remain. Discussion: https://postgr.es/m/20220530190155.47wr3x2prdwyciah@alap3.anarazel.de Backpatch: 15-	2022-07-05 11:01:10 -07:00
Michael Paquier	eb64ceac7e	Remove durable_rename_excl() A previous commit replaced all the calls to this function with durable_rename() as of `dac1ff3`, making it used nowhere in the tree. Using it in extension code is also risky based on the issues described in this previous commit, so let's remove it. This makes possible the removal of HAVE_WORKING_LINK. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13	2022-07-05 12:54:25 +09:00
Thomas Munro	389869af59	Harden dsm_impl.c against unexpected EEXIST. Previously, we trusted the OS not to report EEXIST unless we'd passed in IPC_CREAT \| IPC_EXCL or O_CREAT \| O_EXCL, as appropriate. Solaris's shm_open() can in fact do that, causing us to crash because we didn't ereport and then we blithely assumed the mapping was successful. Let's treat EEXIST just like any other error, unless we're actually trying to create a new segment. This applies to shm_open(), where this behavior has been seen, and also to the equivalent operations for our sysv and mmap modes just on principle. Based on the underlying reason for the error, namely contention on a lock file managed by Solaris librt for each distinct name, this problem is only likely to happen on 15 and later, because the new shared memory stats system produces shm_open() calls for the same path from potentially large numbers of backends concurrently during authentication. Earlier releases only shared memory segments between a small number of parallel workers under one Gather node. You could probably hit it if you tried hard enough though, and we should have been more defensive in the first place. Therefore, back-patch to all supported releases. Per build farm animal margay. This isn't the end of the story, though, it just changes random crashes into random "File exists" errors; more work needed for a green build farm. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKqKrCV5xKWfh9rnm%3Do%3DDwZLTLtnsj_XpUi9g5%3DV%2B9oyg%40mail.gmail.com	2022-07-01 14:17:54 +12:00
Heikki Linnakangas	adf6d5dfb2	Fix visibility check when XID is committed in CLOG but not in procarray. TransactionIdIsInProgress had a fast path to return 'false' if the single-item CLOG cache said that the transaction was known to be committed. However, that was wrong, because a transaction is first marked as committed in the CLOG but doesn't become visible to others until it has removed its XID from the proc array. That could lead to an error: ERROR: t_xmin is uncommitted in tuple to be updated or for an UPDATE to go ahead without blocking, before the previous UPDATE on the same row was made visible. The window is usually very short, but synchronous replication makes it much wider, because the wait for synchronous replica happens in that window. Another thing that makes it hard to hit is that it's hard to get such a commit-in-progress transaction into the single item CLOG cache. Normally, if you call TransactionIdIsInProgress on such a transaction, it determines that the XID is in progress without checking the CLOG and without populating the cache. One way to prime the cache is to explicitly call pg_xact_status() on the XID. Another way is to use a lot of subtransactions, so that the subxid cache in the proc array is overflown, making TransactionIdIsInProgress rely on pg_subtrans and CLOG checks. This has been broken ever since it was introduced in 2008, but the race condition is very hard to hit, especially without synchronous replication. There were a couple of reports of the error starting from summer 2021, but no one was able to find the root cause then. TransactionIdIsKnownCompleted() is now unused. In 'master', remove it, but I left it in place in backbranches in case it's used by extensions. Also change pg_xact_status() to check TransactionIdIsInProgress(). Previously, it only checked the CLOG, and returned "committed" before the transaction was actually made visible to other queries. 
Note that this also means that you cannot use pg_xact_status() to reproduce the bug anymore, even if the code wasn't fixed. Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with the pg_xact_status() change added by me. Author: Simon Riggs Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru	2022-06-27 08:21:08 +03:00
Thomas Munro	3ab4fc5dcf	Don't trust signalfd() on illumos. Since commit `6a2a70a02`, we've used signalfd() to receive latch wakeups when building with WAIT_USE_EPOLL (default for Linux and illumos), and our traditional self-pipe when falling back to WAIT_USE_POLL (default for other Unixes with neither epoll() nor kqueue()). Unexplained hangs and kernel panics have been reported on illumos systems, apparently linked to this use of signalfd(), leading illumos users and build farm members to have to define WAIT_USE_POLL explicitly as a work-around. A bug report exists at https://www.illumos.org/issues/13700 but no fix is available yet. Let's provide a way for illumos users to go back to self-pipes with epoll(), like releases before 14, and choose that by default. No change for Linux users. To help with development/debugging, macros WAIT_USE_{EPOLL,POLL} and WAIT_USE_{SIGNALFD,SELF_PIPE} can be defined explicitly to override the defaults. Back-patch to 14, where we started using signalfd(). Reported-by: Japin Li <japinli@hotmail.com> Reported-by: Olaf Bohlen <olbohlen@eenfach.de> (off-list) Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/MEYP282MB1669C8D88F0997354C2313C1B6CA9%40MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	2022-06-26 10:55:21 +12:00
Tom Lane	7ab5b4eb48	Be more careful about GucSource for internally-driven GUC settings. The original advice for hard-wired SetConfigOption calls was to use PGC_S_OVERRIDE, particularly for PGC_INTERNAL GUCs. However, that's really overkill for PGC_INTERNAL GUCs, since there is no possibility that we need to override a user-provided setting. Instead use PGC_S_DYNAMIC_DEFAULT in most places, so that the value will appear with source = 'default' in pg_settings and thereby not be shown by psql's new \dconfig command. The one exception is that when changing in_hot_standby in a hot-standby session, we still use PGC_S_OVERRIDE, because people felt that seeing that in \dconfig would be a good thing. Similarly use PGC_S_DYNAMIC_DEFAULT for the auto-tune value of wal_buffers (if possible, that is if wal_buffers wasn't explicitly set to -1), and for the typical 2MB value of max_stack_depth. In combination these changes remove four not-very-interesting entries from the typical output of \dconfig, all of which people fingered as "why is that showing up?" in the discussion thread. Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us	2022-06-08 13:26:18 -04:00
Alvaro Herrera	e28bb88519	Revert changes to CONCURRENTLY that "sped up" Xmin advance This reverts commit `d9d076222f` "VACUUM: ignore indexing operations with CONCURRENTLY". These changes caused indexes created with the CONCURRENTLY option to miss heap tuples that were HOT-updated and HOT-pruned during the index creation. Before these changes, HOT pruning would have been prevented by the Xmin of the transaction creating the index, but because this change was precisely to allow the Xmin to move forward ignoring that backend, now other backends scanning the table can prune them. This is not a problem for VACUUM (which requires a lock that conflicts with a CREATE INDEX CONCURRENTLY operation), but HOT-prune can definitely occur. In other words, Xmin advancement was sped up, but at the cost of corrupting the resulting index. Regrettably, this means that the new feature in PG14 that RIC/CIC on very large tables no longer force VACUUM to retain very old tuples goes away. We might try to implement it again in a later release, but for now the risk of indexes missing tuples is too high and there's no easy fix. Backpatch to 14, where this change appeared. Reported-by: Peter Slavov <pet.slavov@gmail.com> Diagnosys-by: Andrey Borodin <x4mmm@yandex-team.ru> Diagnosys-by: Michael Paquier <michael@paquier.xyz> Diagnosys-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/17485-396609c6925b982d%40postgresql.org	2022-05-31 21:24:59 +02:00
Robert Haas	f5bfba5413	shm_mq_sendv: Fix flushing bug when receiver not yet attached. With the old logic, when the receiver had not yet attached, we would never call shm_mq_inc_bytes_written(), even if force_flush = true was specified. That could result in a situation where data that the sender believes it has sent is never received. Along the way, remove a useless function prototype for a nonexistent function from shm_mq.h. Commit `46846433a0` introduced these problems. Pavan Deolasee, with a few changes by me. Discussion: https://postgr.es/m/CABOikdPkwtLLCTnzzmpSMXo3QZa2yXq0J7Q61ssdLFAJYrOVvQ@mail.gmail.com	2022-05-31 08:46:54 -04:00
Thomas Munro	12e28aac8e	Add debugging help in OwnLatch(). Build farm animal gharial recently failed a few times in a parallel worker's call to OwnLatch() with "ERROR: latch already owned". Let's turn that into a PANIC and show the PID of the owner, to try to learn more. Discussion: https://postgr.es/m/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com	2022-05-31 12:06:11 +12:00
Alvaro Herrera	8d061acd12	Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS This is a slight, convenient semantics change from what commit `0f0cfb4940` ("Fix parallel operations that prevent oldest xmin from advancing") introduced that lets us simplify the coding in the one place where it is used. Backpatch to 13. This is related to commit `6fea65508a` ("Tighten ComputeXidHorizons' handling of walsenders") rewriting the code site where this is used, which has not yet been backpatched, but it may well be in the future. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/202204191637.eldwa2exvguw@alvherre.pgsql	2022-05-19 16:20:32 +02:00
Alvaro Herrera	c4f113e8fe	Clean up newlines following left parentheses Like commit `c9d2977519`.	2022-05-13 23:52:35 +02:00
Robert Haas	4f2400cb3f	Add a new shmem_request_hook hook. Currently, preloaded libraries are expected to request additional shared memory and LWLocks in _PG_init(). However, it is not unusual for such requests to depend on MaxBackends, which won't be initialized at that time. Such requests could also depend on GUCs that other modules might change. This introduces a new hook where modules can safely use MaxBackends and GUCs to request additional shared memory and LWLocks. Furthermore, this change restricts requests for shared memory and LWLocks to this hook. Previously, libraries could make requests until the size of the main shared memory segment was calculated. Unlike before, we no longer silently ignore requests received at invalid times. Instead, we FATAL if someone tries to request additional shared memory or LWLocks outside of the hook. Nathan Bossart and Julien Rouhaud Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13 Discussion: https://postgr.es/m/Yn2jE/lmDhKtkUdr@paquier.xyz	2022-05-13 09:31:06 -04:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Thomas Munro	0d3431497d	Add logging for excessive ProcSignalBarrier waits. To enable diagnosis of systems that are not processing ProcSignalBarrier requests promptly, add a LOG message every 5 seconds if we seem to be wedged. Although you could already see this state as a wait event in pg_stat_activity, the log message also shows the PID of the process that is preventing progress. Also add DEBUG1 logging around the whole wait loop. Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BTgmoYJ03r5359gQutRGP9BtigYCg3_UskcmnVjBf-QO3-0pQ%40mail.gmail.com	2022-05-11 18:03:03 +12:00
Thomas Munro	b74e94dc27	Rethink PROCSIGNAL_BARRIER_SMGRRELEASE. With sufficiently bad luck, it was possible for IssuePendingWritebacks() to reopen a file after we'd processed PROCSIGNAL_BARRIER_SMGRRELEASE and before the file was unlinked by some other backend. That left a small hole in commit 4eb21763's plan to fix all spurious errors from DROP TABLESPACE and similar on Windows. Fix by closing md.c's segments, instead of just closing fd.c's descriptors, and then teaching smgrwriteback() not to open files that aren't already open. Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/20220209220004.kb3dgtn2x2k2gtdm%40alap3.anarazel.de	2022-05-07 16:32:10 +12:00
Andres Freund	8f1537d10e	Fix possibility of self-deadlock in ResolveRecoveryConflictWithBufferPin(). The tests added in `9f8a050f68` failed nearly reliably on FreeBSD in CI, and occasionally on the buildfarm. That turns out to be caused not by a bug in the test, but by a longstanding bug in recovery conflict handling. The standby timeout handler, used by ResolveRecoveryConflictWithBufferPin(), executed SendRecoveryConflictWithBufferPin() inside a signal handler. A bad idea, because the deadlock timeout handler (or a spurious latch set) could have interrupted ProcWaitForSignal(). If unlucky that could cause a self-deadlock on ProcArrayLock, if the deadlock check is in SendRecoveryConflictWithBufferPin()->CancelDBBackends(). To fix, set a flag in StandbyTimeoutHandler(), and check the flag in ResolveRecoveryConflictWithBufferPin(). Subsequently the recovery conflict tests will be backpatched. Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de Backpatch: 10-	2022-05-02 18:25:00 -07:00
Etsuro Fujita	d89f97e83e	Fix typo in comment.	2022-05-02 16:45:00 +09:00
Michael Paquier	55b5686511	Revert recent changes with durable_rename_excl() This reverts commits `2c902bb` and `ccfbd92`. Per buildfarm members kestrel, rorqual and calliphoridae, the assertions checking that a TLI history file should not exist when created by a WAL receiver have been failing, and switching to durable_rename() over durable_rename_excl() would cause the newest TLI history file to overwrite the existing one. We need to think harder about such cases, so revert the new logic for now. Note that all the failures have been reported in the test 025_stuck_on_old_timeline. Discussion: https://postgr.es/m/511362.1651116498@sss.pgh.pa.us	2022-04-28 13:08:16 +09:00
Michael Paquier	2c902bbf19	Remove durable_rename_excl() `ccfbd92` has replaced all existing in-core callers of this function in favor of durable_rename(). durable_rename_excl() is by nature unsafe on crashes happening at the wrong time, so just remove it. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13	2022-04-28 11:10:40 +09:00
Tom Lane	6fea65508a	Tighten ComputeXidHorizons' handling of walsenders. ComputeXidHorizons (nee GetOldestXmin) thought that it could identify walsenders by checking for proc->databaseId == 0. Perhaps that was safe when the code was written, but it's been wrong at least since autovacuum was invented. Background processes that aren't connected to any particular database, such as the autovacuum launcher and logical replication launcher, look like that too. This imprecision is harmful because when such a process advertises an xmin, the result is to hold back dead-tuple cleanup in all databases, though it'd be sufficient to hold it back in shared catalogs (which are the only relations such a process can access). Aside from being generally inefficient, this has recently been seen to cause regression test failures in the buildfarm, as a consequence of the logical replication launcher's startup transaction preventing VACUUM from marking pages of a user table as all-visible. We only want that global hold-back effect for the case where a walsender is advertising a hot standby feedback xmin. Therefore, invent a new PGPROC flag that says that a process' xmin should be considered globally, and check that instead of using the incorrect databaseId == 0 test. Currently only a walsender sets that flag, and only if it is not connected to any particular database. (This is for bug-compatibility with the undocumented behavior of the existing code, namely that feedback sent by a client who has connected to a particular database would not be applied globally. I'm not sure this is a great definition; however, such a client is capable of issuing plain SQL commands, and I don't think we want xmins advertised for such commands to be applied globally. Perhaps this could do with refinement later.) While at it, I rewrote the comment in ComputeXidHorizons, and re-ordered the commented-upon if-tests, to make them match up for intelligibility's sake. 
This is arguably a back-patchable bug fix, but given the lack of complaints I think it prudent to let it age awhile in HEAD first. Discussion: https://postgr.es/m/1346227.1649887693@sss.pgh.pa.us	2022-04-15 17:50:05 -04:00
David Rowley	a00fd066b1	Add missing spaces after single-line comments Only 1 of 3 of these changes appear to be handled by pgindent. That change is new to v15. The remaining two appear to be left alone by pgindent. The exact reason for that is not 100% clear to me. It seems related to the fact that it's a line that contains only a single line comment and no actual code. It does not seem worth investigating this in too much detail. In any case, these do not conform to our usual practices, so fix them. Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com	2022-04-14 09:28:56 +12:00
Alvaro Herrera	24d2b2680a	Remove extraneous blank lines before block-closing braces These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch	2022-04-13 19:16:02 +02:00
Robert Haas	7fc0e7de9f	Revert the addition of GetMaxBackends() and related stuff. This reverts commits `0147fc7`, `4567596`, `aa64f23`, and `5ecd018`. There is no longer agreement that introducing this function was the right way to address the problem. The consensus now seems to favor trying to make a correct value for MaxBackends available to modules executing their _PG_init() functions. Nathan Bossart Discussion: http://postgr.es/m/20220323045229.i23skfscdbvrsuxa@jrouhaud	2022-04-12 14:45:23 -04:00
David Rowley	b0e5f02ddc	Fix various typos and spelling mistakes in code comments Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com	2022-04-11 20:49:41 +12:00
Robert Haas	f37015a161	Rename delayChkpt to delayChkptFlags. Before commit `412ad7a556`, delayChkpt was a Boolean. Now it's an integer. Extensions using it need to be appropriately updated, so let's rename the field to make sure that a hard compilation failure occurs. Replacing delayChkpt with delayChkptFlags made a few comments extend past 80 characters, so I reflowed them and changed some wording very slightly. The back-branches will need a different change to restore compatibility with existing minor releases; this is just for master. Per suggestion from Tom Lane. Discussion: http://postgr.es/m/a7880f4d-1d74-582a-ada7-dad168d046d1@enterprisedb.com	2022-04-08 11:44:17 -04:00
Michael Paquier	efb0ef909f	Track I/O timing for temporary file blocks in EXPLAIN (BUFFERS) Previously, the output of the EXPLAIN (BUFFERS) option showed only the I/O timing spent reading and writing shared and local buffers. This commit adds on top of that the I/O timing for temporary buffers in the output of EXPLAIN (for spilled external sorts, hashes, materialization, etc.). This can be helpful for users in cases where the I/O related to temporary buffers is the bottleneck. Like its cousin, this information is available only when track_io_timing is enabled. In testing, the patch showed an extra overhead of up to 1% even when using gettimeofday() as the implementation for interval timings; that is barely within the usual noise range, but still measurable. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com	2022-04-08 11:27:21 +09:00
Peter Geoghegan	10a8d13823	Truncate line pointer array during heap pruning. Reclaim space from the line pointer array when heap pruning leaves behind a contiguous group of LP_UNUSED items at the end of the array. This happens during subsequent page defragmentation. Certain kinds of heap line pointer bloat are ameliorated by this new optimization. Follow-up work to commit `3c3b8a4b26`, which taught VACUUM to truncate the line pointer array in about the same way during VACUUM's second pass over the heap. We now apply line pointer array truncation during both the first and the second pass over the heap made by VACUUM. We can also perform line pointer array truncation during opportunistic pruning. Matthias van de Meent, with small tweaks by me. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com Discussion: https://postgr.es/m/CAEze2Wg36%2B4at2eWJNcYNiW2FJmht34x3YeX54ctUSs7kKoNcA%40mail.gmail.com	2022-04-07 15:42:12 -07:00
Thomas Munro	5dc0418fab	Prefetch data referenced by the WAL, take II. Introduce a new GUC recovery_prefetch. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Since not all OSes have that system call, "try" is provided so that it can be enabled where available. Better mechanisms for asynchronous I/O are possible in later work. Set to "try" for now for test coverage. Default setting to be finalized before release. The GUC wal_decode_buffer_size limits the distance we can look ahead in bytes of decoded data. The existing GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os allowed, based on pessimistic heuristics used to infer that I/Os have begun and completed. We'll also not look more than maintenance_io_concurrency * 4 block references ahead. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version) Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version) Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version) Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version) Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	2022-04-07 19:42:14 +12:00
Andres Freund	5891c7a8ed	pgstat: store statistics in shared memory. Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-By: Tomas Vondra <tomas.vondra@2ndquadrant.com> (in a much earlier version) Reviewed-By: Arthur Zakirov <a.zakirov@postgrespro.ru> (in a much earlier version) Reviewed-By: Antonin Houska <ah@cybertec.at> (in a much earlier version) Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de Discussion: https://postgr.es/m/20210319235115.y3wz7hpnnrshdyv6@alap3.anarazel.de	2022-04-06 21:29:46 -07:00
Andres Freund	8b1dccd37c	pgstat: scaffolding for transactional stats creation / drop. One problematic part of the current statistics collector design is that there is no reliable way of getting rid of statistics entries. Because of that pgstat_vacuum_stat() (called by [auto-]vacuum) matches all stats for the current database with the catalog contents and tries to drop now-superfluous entries. That's quite expensive. What's worse, it doesn't work on physical replicas, despite physical replicas collecting statistics entries. This commit introduces infrastructure to create / drop statistics entries transactionally, together with the underlying catalog objects (functions, relations, subscriptions). pgstat_xact.c maintains a list of stats entries created / dropped transactionally in the current transaction. To ensure the removal of statistics entries is durable, dropped statistics entries are included in commit / abort (and prepare) records, which also ensures that stats entries are dropped on standbys. Statistics entries created separately from creating the underlying catalog object (e.g. when stats were previously lost due to an immediate restart) are not WAL logged. However that can only happen outside of the transaction creating the catalog object, so it does not lead to "leaked" statistics entries. For this to work, functions creating / dropping functions / relations / subscriptions need to call into pgstat. For subscriptions this was already done when dropping subscriptions, via pgstat_report_subscription_drop() (now renamed to pgstat_drop_subscription()). This commit does not actually drop stats yet, it just provides the infrastructure. It is however a largely independent piece of infrastructure, so committing it separately makes sense. Bumps XLOG_PAGE_MAGIC. 
Author: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de	2022-04-06 18:27:52 -07:00
Andres Freund	46a2d2499a	dsm: allow use in single user mode. It might seem pointless to allow use of dsm in single user mode, but otherwise subsystems might need dedicated single user mode code paths. Besides changing the assert, all that's needed is to make some Windows code assuming the presence of postmaster conditional. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA+hUKGL9hY_VY=+oUK+Gc1iSRx-Ls5qeYJ6q=dQVZnT3R63Taw@mail.gmail.com	2022-04-06 12:40:04 -07:00


2549 Commits