postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-12 05:01:15 +03:00

Author	SHA1	Message	Date
Michael Paquier	6286efb524	Lower error level from PANIC to FATAL when restoring slots at startup When restoring slot information from disk at startup and filling in shared memory information, the startup process would issue a PANIC message if more slots are found than what max_replication_slots allows, and then Postgres generates a core dump, recommending to increase max_replication_slots. This gives users a switch to crash Postgres at will by creating slots, lower the configuration to not support it, and then restart it. Making Postgres crash hard in this case is overdoing it just to give a recommendation to users. So instead use a FATAL, which makes Postgres fail to start without crashing, still giving the recommendation. This is more consistent with what happens for prepared transactions for example. Author: Michael Paquier Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20181030025109.GD1644@paquier.xyz	2018-11-02 07:59:24 +09:00
Andres Freund	8a99f8a827	Fix error message typo introduced `691d79a079`. Reported-By: Michael Paquier Discussion: https://postgr.es/m/20181101003405.GB1727@paquier.xyz Backpatch: 9.4-, like the previous commit	2018-11-01 10:44:29 -07:00
Andres Freund	691d79a079	Disallow starting server with insufficient wal_level for existing slot. Previously it was possible to create a slot, change wal_level, and restart, even if the new wal_level was insufficient for the slot. That's a problem for both logical and physical slots, because the necessary WAL records are not generated. This removes a few tests in newer versions that, somewhat inexplicably, whether restarting with a too low wal_level worked (a buggy behaviour!). Reported-By: Joshua D. Drake Author: Andres Freund Discussion: https://postgr.es/m/20181029191304.lbsmhshkyymhw22w@alap3.anarazel.de Backpatch: 9.4-, where replication slots where introduced	2018-10-31 15:46:39 -07:00
Andres Freund	62649bad83	Correct constness of a few variables. This allows the compiler / linker to mark affected pages as read-only. There's other cases, but they're a bit more invasive, and should go through some review. These are easy. They were found with objdump -j .data -t src/backend/postgres\|awk '{print $4, $5, $6}'\|sort -r\|less Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya@alap3.anarazel.de	2018-10-15 21:01:14 -07:00
Thomas Munro	e73ca79fc7	Move the replication lag tracker into heap memory. Andres Freund complained about the 128KB of .bss occupied by LagTracker. It's only needed in the walsender process, so allocate it in heap memory there. Author: Thomas Munro Discussion: https://postgr.es/m/20181015200754.7y7zfuzsoux2c4ya%40alap3.anarazel.de	2018-10-16 11:04:41 +13:00
Peter Eisentraut	35584fd05f	Make spelling of "acknowledgment" consistent I used the preferred U.S. spelling, as we do in other cases.	2018-10-15 10:06:45 +02:00
Andres Freund	e9edc1ba0b	Fix logical decoding error when system table w/ toast is repeatedly rewritten. Repeatedly rewriting a mapped catalog table with VACUUM FULL or CLUSTER could cause logical decoding to fail with: ERROR, "could not map filenode \"%s\" to relation OID" To trigger the problem the rewritten catalog had to have live tuples with toasted columns. The problem was triggered as during catalog table rewrites the heap_insert() check that prevents logical decoding information to be emitted for system catalogs, failed to treat the new heap's toast table as a system catalog (because the new heap is not recognized as a catalog table via RelationIsLogicallyLogged()). The relmapper, in contrast to the normal catalog contents, does not contain historical information. After a single rewrite of a mapped table the new relation is known to the relmapper, but if the table is rewritten twice before logical decoding occurs, the relfilenode cannot be mapped to a relation anymore. Which then leads us to error out. This only happens for toast tables, because the main table contents aren't re-inserted with heap_insert(). The fix is simple, add a new heap_insert() flag that prevents logical decoding information from being emitted, and accept during decoding that there might not be tuple data for toast tables. Unfortunately that does not fix pre-existing logical decoding errors. Doing so would require not throwing an error when a filenode cannot be mapped to a relation during decoding, and that seems too likely to hide bugs. If it's crucial to fix decoding for an existing slot, temporarily changing the ERROR in ReorderBufferCommit() to a WARNING appears to be the best fix. Author: Andres Freund Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de Backpatch: 9.4-, where logical decoding was introduced	2018-10-10 13:53:02 -07:00
Tom Lane	d73f4c74dd	In the executor, use an array of pointers to access the rangetable. Instead of doing a lot of list_nth() accesses to es_range_table, create a flattened pointer array during executor startup and index into that to get at individual RangeTblEntrys. This eliminates one source of O(N^2) behavior with lots of partitions. (I'm not exactly convinced that it's the most important source, but it's an easy one to fix.) Amit Langote and David Rowley Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 15:48:17 -04:00
Tom Lane	9ddef36278	Centralize executor's opening/closing of Relations for rangetable entries. Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan. This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.) To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID. Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-10-04 14:03:42 -04:00
Tom Lane	fdba460a26	Create an RTE field to record the query's lock mode for each relation. Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query). This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.) This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test. catversion bump due to change of stored rules. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp	2018-09-30 13:55:51 -04:00
Andres Freund	29c94e03c7	Split ExecStoreTuple into ExecStoreHeapTuple and ExecStoreBufferHeapTuple. Upcoming changes introduce further types of tuple table slots, in preparation of making table storage pluggable. New storage methods will have different representation of tuples, therefore the slot accessor should refer explicitly to heap tuples. Instead of just renaming the functions, split it into one function that accepts heap tuples not residing in buffers, and one accepting ones in buffers. Previously one function was used for both, but that was a bit awkward already, and splitting will allow us to represent slot types for tuples in buffers and normal memory separately. This is split out from the patch introducing abstract slots, as this largely consists out of mechanical changes. Author: Ashutosh Bapat Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de	2018-09-25 16:27:48 -07:00
Michael Paquier	d6e98ebe37	Improve some error message strings and errcodes This makes a bit less work for translators, by unifying error strings a bit more with what the rest of the code does, this time for three error strings in autoprewarm and one in base backup code. After some code review of slot.c, some file-access errcodes are reported but lead to an incorrect internal error, while corrupted data makes the most sense, similarly to the previous work done in `e41d0a1`. Also, after calling rmtree(), a WARNING gets reported, which is a duplicate of what the internal call report, so make the code more consistent with all other code paths calling this function. Author: Michael Paquier Discussion: https://postgr.es/m/20180902200747.GC1343@paquier.xyz	2018-09-04 11:06:04 -07:00
Tomas Vondra	4ddd8f5f55	Fix memory leak in TRUNCATE decoding When decoding a TRUNCATE record, the relids array was being allocated in the main ReorderBuffer memory context, but not released with the change resulting in a memory leak. The array was also ignored when serializing/deserializing the change, assuming all the information is stored in the change itself. So when spilling the change to disk, we've only we have serialized only the pointer to the relids array. Thanks to never releasing the array, the pointer however remained valid even after loading the change back to memory, preventing an actual crash. This fixes both the memory leak and (de)serialization. The relids array is still allocated in the main ReorderBuffer memory context (none of the existing ones seems like a good match, and adding an extra context seems like an overkill). The allocation is wrapped in a new ReorderBuffer API functions, to keep the details within reorderbuffer.c, just like the other ReorderBufferGet methods do. Author: Tomas Vondra Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com Backpatch: 11, where decoding of TRUNCATE was introduced	2018-09-03 02:10:24 +02:00
Michael Paquier	caa0c6ceba	Fix initial sync of slot parent directory when restoring status At the beginning of recovery, information from replication slots is recovered from disk to memory. In order to ensure the durability of the information, the status file as well as its parent directory are synced. It happens that the sync on the parent directory was done directly using the status file path, which is logically incorrect, and the current code has been doing a sync on the same object twice in a row. Reported-by: Konstantin Knizhnik Diagnosed-by: Konstantin Knizhnik Author: Michael Paquier Discussion: https://postgr.es/m/9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru Backpatch-through: 9.4-	2018-09-02 12:40:30 -07:00
Tom Lane	44cac93464	Avoid using potentially-under-aligned page buffers. There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de	2018-09-01 15:27:17 -04:00
Noah Misch	ab0ed6153a	Ignore server-side delays when enforcing wal_sender_timeout. Healthy clients of servers having poor I/O performance, such as buildfarm members hamster and tern, saw unexpected timeouts. That disagreed with documentation. This fix adds one gettimeofday() call whenever ProcessRepliesIfAny() finds no client reply messages. Back-patch to 9.4; the bug's symptom is rare and mild, and the code all moved between 9.3 and 9.4. Discussion: https://postgr.es/m/20180826034600.GA1105084@rfd.leadboat.com	2018-08-31 22:59:58 -07:00
Jeff Davis	ba9d35b8eb	Reconsider new file extension in commit `91f26d5f`. Andres and Tom objected to the choice of the ".tmp" extension. Changing to Andres's suggestion of ".spill". Discussion: https://postgr.es/m/88092095-3348-49D8-8746-EB574B1D30EA%40anarazel.de	2018-08-25 22:52:46 -07:00
Jeff Davis	91f26d5fe4	Change extension of spilled ReorderBufferChange data to ".tmp". The previous extension, ".snap", was chosen for historical reasons and became confusing. Discussion: https://postgr.es/m/CAMp0ubd_P8vBGx8=MfDXQJZxHA5D_Zarw5cCkDxJ_63+pWRJ9w@mail.gmail.com	2018-08-25 09:36:09 -07:00
Tomas Vondra	fa73b377ee	Close the file descriptor in ApplyLogicalMappingFile The function was forgetting to close the file descriptor, resulting in failures like this: ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b" LOCATION: OpenTransientFile, fd.c:2161 Simply close the file at the end, and backpatch to 9.4 (where logical decoding was introduced). While at it, fix a nearby typo. Discussion: https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com	2018-08-16 16:49:57 +02:00
Heikki Linnakangas	8e19a82640	Don't run atexit callbacks in quickdie signal handlers. exit() is not async-signal safe. Even if the libc implementation is, 3rd party libraries might have installed unsafe atexit() callbacks. After receiving SIGQUIT, we really just want to exit as quickly as possible, so we don't really want to run the atexit() callbacks anyway. The original report by Jimmy Yih was a self-deadlock in startup_die(). However, this patch doesn't address that scenario; the signal handling while waiting for the startup packet is more complicated. But at least this alleviates similar problems in the SIGQUIT handlers, like that reported by Asim R P later in the same thread. Backpatch to 9.3 (all supported versions). Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com	2018-08-08 19:10:32 +03:00
Michael Paquier	5a23c74b63	Reset properly errno before calling write() `6cb3372` enforces errno to ENOSPC when less bytes than what is expected have been written when it is unset, though it forgot to properly reset errno before doing a system call to write(), causing errno to potentially come from a previous system call. Reported-by: Tom Lane Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us	2018-08-05 05:31:18 +09:00
Alvaro Herrera	c40489e449	Fix logical replication slot initialization This was broken in commit `9c7d06d606`, which inadvertently gave the wrong value to fast_forward in one StartupDecodingContext call. Fix by flipping the value. Add a test for the obvious error, namely trying to initialize a replication slot with an nonexistent output plugin. While at it, move the CreateDecodingContext call earlier, so that any errors are reported before sending the CopyBoth message. Author: Dave Cramer <davecramer@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CADK3HHLVkeRe1v4P02-5hj55H3_yJg3AEtpXyEY5T3wuzO2jSg@mail.gmail.com	2018-08-01 17:47:15 -04:00
Alvaro Herrera	4f10e7ea7b	Set ActiveSnapshot when logically replaying inserts Input functions for the inserted tuples may require a snapshot, when they are replayed by native logical replication. An example is a domain with a constraint using a SQL-language function, which prior to this commit failed to apply on the subscriber side. Reported-by: Mai Peng <maily.peng@webedia-group.com> Co-authored-by: Minh-Quan TRAN <qtran@itscaro.me> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/4EB4BD78-BFC3-4D04-B8DA-D53DF7160354@webedia-group.com Discussion: https://postgr.es/m/153211336163.1404.11721804383024050689@wrigleys.postgresql.org	2018-07-30 16:30:07 -04:00
Michael Paquier	e41d0a1090	Add proper errcodes to new error messages for read() failures Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the most sense here. While on the way, fix one errcode_for_file_access missing in origin.c since the code has been created, and remove one assignment of errno to 0 before calling read(), as this was around to fit with what was present before `811b6e36` where errno would not be set when not enough bytes are read. I have noticed the first one, and Tom has pinged me about the second one. Author: Michael Paquier Reported-by: Tom Lane Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us	2018-07-23 09:37:36 +09:00
Andres Freund	86eaf208ea	Hand code string to integer conversion for performance. As benchmarks show, using libc's string-to-integer conversion is pretty slow. At least part of the reason for that is that strtol[l] have to be more generic than what largely is required inside pg. This patch considerably speeds up int2/int4 input (int8 already was already using hand-rolled code). Most of the existing pg_atoi callers have been converted. But as one requires pg_atoi's custom delimiter functionality, and as it seems likely that there's external pg_atoi users, it seems sensible to just keep pg_atoi around. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de	2018-07-22 14:58:23 -07:00
Alvaro Herrera	1573995f55	Rewrite comments in replication slot advance implementation The code added by `9c7d06d606` was a bit obscure; clarify that by rewriting the comments. Lack of clarity has already caused bugs, so it's a worthy goal. Co-authored-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Michaël Paquier <michael@paquier.xyz> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com> Discussion: https://postgr.es/m/87y3fgoyrn.fsf@ars-thinkpad	2018-07-19 14:15:44 -04:00
Tom Lane	3cb646264e	Use a ResourceOwner to track buffer pins in all cases. Historically, we've allowed auxiliary processes to take buffer pins without tracking them in a ResourceOwner. However, that creates problems for error recovery. In particular, we've seen multiple reports of assertion crashes in the startup process when it gets an error while holding a buffer pin, as for example if it gets ENOSPC during a write. In a non-assert build, the process would simply exit without releasing the pin at all. We've gotten away with that so far just because a failure exit of the startup process translates to a database crash anyhow; but any similar behavior in other aux processes could result in stuck pins and subsequent problems in vacuum. To improve this, institute a policy that we must always have a resowner backing any attempt to pin a buffer, which we can enforce just by removing the previous special-case code in resowner.c. Add infrastructure to make it easy to create a process-lifespan AuxProcessResourceOwner and clear out its contents at appropriate times. Replace existing ad-hoc resowner management in bgwriter.c and other aux processes with that. (Thus, while the startup process gains a resowner where it had none at all before, some other aux process types are replacing an ad-hoc resowner with this code.) Also use the AuxProcessResourceOwner to manage buffer pins taken during StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap process or a standalone backend rather than a true auxiliary process. In passing, remove some other ad-hoc resource owner creations that had gotten cargo-culted into various other places. As far as I can tell that was all unnecessary, and if it had been necessary it was incomplete, due to lacking any provision for clearing those resowners later. (Also worth noting in this connection is that a process that hasn't called InitBufferPoolBackend has no business accessing buffers; so there's more to do than just add the resowner if we want to touch buffers in processes not covered by this patch.) Although this fixes a very old bug, no back-patch, because there's no evidence of any significant problem in non-assert builds. Patch by me, pursuant to a report from Justin Pryzby. Thanks to Robert Haas and Kyotaro Horiguchi for reviews. Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com	2018-07-18 12:15:16 -04:00
Heikki Linnakangas	6b387179ba	Fix misc typos, mostly in comments. A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.	2018-07-18 16:17:32 +03:00
Michael Paquier	94019c879a	Fix more portability issues with casts to Size when using off_t This should tame the beast, as there are no other places where off_t is used in the new error messages. Reported again by longfin, which complained about walsender.c while I spotted the other two ones while double-checking.	2018-07-18 09:51:53 +09:00
Michael Paquier	811b6e36a9	Rework error messages around file handling Some error messages related to file handling are using the code path context to define their state. For example, 2PC-related errors are referring to "two-phase status files", or "relation mapping file" is used for catalog-to-filenode mapping, however those prove to be difficult to translate, and are not more helpful than just referring to the path of the file being worked on. So simplify all those error messages by just referring to files with their path used. In some cases, like the manipulation of WAL segments, the context is actually helpful so those are kept. Calls to the system function read() have also been rather inconsistent with their error handling sometimes not reporting the number of bytes read, and some other code paths trying to use an errno which has not been set. The in-core functions are using a more consistent pattern with this patch, which checks for both errno if set or if an inconsistent read is happening. So as to care about pluralization when reading an unexpected number of byte(s), "could not read: read %d of %zu" is used as error message, with %d field being the output result of read() and %zu the expected size. This simplifies the work of translators with less variations of the same message. Author: Michael Paquier Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz	2018-07-18 08:01:23 +09:00
Robert Haas	32df1c9afa	Add subtransaction handling for table synchronization workers. Since the old logic was completely unaware of subtransactions, a change made in a subsequently-aborted subtransaction would still cause workers to be stopped at toplevel transaction commit. Fix that by managing a stack of worker lists rather than just one. Amit Khandekar and Robert Haas Discussion: http://postgr.es/m/CAJ3gD9eaG_mWqiOTA2LfAug-VRNn1hrhf50Xi1YroxL37QkZNg@mail.gmail.com	2018-07-16 17:33:22 -04:00
Michael Paquier	ce89ad0fa0	Fix argument of pg_create_logical_replication_slot for slot name All attributes and arguments using a slot name map to the data type "name", but this function has been using "text". This is cosmetic, as even if text is used then the slot name would be truncated to 64 characters anyway and stored as such. The documentation already said so and the function already assumed that the argument was of this type when fetching its value. Bump catalog version. Author: Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoADYz_-eAqH5AVFaCaojcRgwpo9PW=u8kgTMys63oB8Cw@mail.gmail.com	2018-07-13 09:32:12 +09:00
Michael Paquier	9a7b7adc13	Make logical WAL sender report streaming state appropriately WAL senders sending logically-decoded data fail to properly report in "streaming" state when starting up, hence as long as one extra record is not replayed, such WAL senders would remain in a "catchup" state, which is inconsistent with the physical cousin. This can be easily reproduced by for example using pg_recvlogical and restarting the upstream server. The TAP tests have been slightly modified to detect the failure and strengthened so as future tests also make sure that a node is in streaming state when waiting for its catchup. Backpatch down to 9.4 where this code has been introduced. Reported-by: Sawada Masahiko Author: Simon Riggs, Sawada Masahiko Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com	2018-07-12 10:19:35 +09:00
Michael Paquier	56a7147213	Block replication slot advance for these not yet reserving WAL Such replication slots are physical slots freshly created without WAL being reserved, which is the default behavior, which have not been used yet as WAL consumption resources to retain WAL. This prevents advancing a slot to a position older than any WAL available, which could falsify calculations for WAL segment recycling. This also cleans up a bit the code, as ReplicationSlotRelease() would be called on ERROR, and improves error messages. Reported-by: Kyotaro Horiguchi Author: Michael Paquier Reviewed-by: Andres Freund, Álvaro Herrera, Kyotaro Horiguchi Discussion: https://postgr.es/m/20180626071305.GH31353@paquier.xyz	2018-07-11 08:56:24 +09:00
Alvaro Herrera	a22445ff0b	Flip argument order in XLogSegNoOffsetToRecPtr Commit `fc49e24fa6` added an input argument after the existing output argument. Flip those. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20180708182345.imdgovmkffgtihhk@alvherre.pgsql	2018-07-09 14:33:38 -04:00
Alvaro Herrera	0ce5cf2ef2	Allow replication slots to be dropped in single-user mode Starting with commit `9915de6c1c`, replication slot drop uses a condition variable sleep to wait until the current user of the slot goes away. This is more user friendly than the previous behavior of erroring out if the slot is in use, but it fails with a not-for-user-consumption error message in single-user mode; plus, if you're using single-user mode because you don't want to start the server in the regular mode (say, disk is full and WAL won't recycle because of the slot), it's inconvenient. Fix by skipping the cond variable sleep in single-user mode, since there can't be anybody to wait for anyway. Reported-by: tushar <tushar.ahuja@enterprisedb.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/3b2f809f-326c-38dd-7a9e-897f957a4eb1@enterprisedb.com	2018-07-06 16:38:30 -04:00
Alvaro Herrera	3ca966c06f	logical decoding: beware of an unset specinsert change Coverity complains that there is no protection in the code (at least in non-assertion-enabled builds) against speculative insertion failing to follow the expected protocol. Add an elog(ERROR) for the case.	2018-07-05 17:42:37 -04:00
Michael Paquier	c072e80337	Correct function name in comment of logical decoding code Reported-by: Dave Cramer Author: Euler Taveira Discussion: https://postgr.es/m/CADK3HHKnPGJDLhjOFBY6+70Wd14iEH8c2GKw7UrOuUHp_GNFrA@mail.gmail.com	2018-07-02 13:30:12 +09:00
Andrew Dunstan	1e9c858090	pgindent run prior to branching	2018-06-30 12:25:49 -04:00
Alvaro Herrera	f49a80c481	Fix "base" snapshot handling in logical decoding Two closely related bugs are fixed. First, xmin of logical slots was advanced too early. During xl_running_xacts processing, xmin of the slot was set to the oldest running xid in the record, but that's wrong: actually, snapshots which will be used for not-yet-replayed transactions might consider older txns as running too, so we need to keep xmin back for them. The problem wasn't noticed earlier because DDL which allows to delete tuple (set xmax) while some another not-yet-committed transaction looks at it is pretty rare, if not unique: e.g. all forms of ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock conflicting with any inserts. The included test case (test_decoding's oldest_xmin) uses ALTER of a composite type, which doesn't have such interlocking. To deal with this, we must be able to quickly retrieve oldest xmin (oldest running xid among all assigned snapshots) from ReorderBuffer. To fix, add another list of ReorderBufferTXNs to the reorderbuffer, where transactions are sorted by base-snapshot-LSN. This is slightly different from the existing (sorted by first-LSN) list, because a transaction can have an earlier LSN but a later Xmin, if its first record does not obtain an xmin (eg. xl_xact_assignment). Note this new list doesn't fully replace the existing txn list: we still need that one to prevent WAL recycling. The second issue concerns SnapBuilder snapshots and subtransactions. SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a transaction that is known to be a subtxn, which is good in the common case that the top-level transaction already has one (no point in doing so), but a bug otherwise. To fix, arrange to transfer the snapshot from the subtxn to its top-level txn as soon as the kinship gets known. test_decoding's snapshot_transfer verifies this. Also, fix a minor memory leak: refcount of toplevel's old base snapshot was not decremented when the snapshot is transferred from child. Liberally sprinkle code comments, and rewrite a few existing ones. This part is my (Álvaro's) contribution to this commit, as I had to write all those comments in order to understand the existing code and Arseny's patch. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad	2018-06-26 16:48:10 -04:00
Alvaro Herrera	322548a8ab	Update obsolete comments Commit `9fab40ad32` removed some pre-allocating logic in reorderbuffer.c, but left outdated comments in place. Repair. Author: Álvaro Herrera	2018-06-25 15:45:48 -04:00
Michael Paquier	6cb3372411	Address set of issues with errno handling System calls mixed up in error code paths are causing two issues which several code paths have not correctly handled: 1) For write() calls, sometimes the system may return less bytes than what has been written without errno being set. Some paths were careful enough to consider that case, and assumed that errno should be set to ENOSPC, other calls missed that. 2) errno generated by a system call is overwritten by other system calls which may succeed once an error code path is taken, causing what is reported to the user to be incorrect. This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but this is let as future work, which is not targeted for back-branches anyway. Author: Michael Paquier Reviewed-by: Ashutosh Sharma Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz	2018-06-25 11:19:05 +09:00
Peter Eisentraut	8a07ebb3c1	Convert debug message from ereport to elog	2018-06-12 11:33:39 -04:00
Michael Paquier	f8795d2ec8	Fix oversight from `9e149c8` with spin-lock handling Calling an external function while a pin-lock is held is a bad idea as those are designed to be short-lived. The stress of a first commit into a large git history may contribute to that. Reported-by: Andres Freund Discussion: https://postgr.es/m/20180611164952.vmxdpdpirdtkdsz6@alap3.anarazel.de	2018-06-12 06:52:34 +09:00
Michael Paquier	f731cfa94c	Fix a couple of bugs with replication slot advancing feature A review of the code has showed up a couple of issues fixed by this commit: - Physical slots have been using the confirmed LSN position as a start comparison point which is always 0/0, instead use the restart LSN position (logical slots need to use the confirmed LSN position, which was correct). - The actual slot update was incorrect for both physical and logical slots. Physical slots need to use their restart_lsn as base comparison point (confirmed_flush was used because of previous point), and logical slots need to begin reading WAL from restart_lsn (confirmed_flush was used as well), while confirmed_flush is compiled depending on the decoding context and record read, and is the LSN position returned back to the caller. - Never return 0/0 if a slot cannot be advanced. This way, if a slot is advanced while the activity is idle, then the same position is returned to the caller over and over without raising an error. Instead return the LSN the slot has been advanced to. With repetitive calls, the same position is returned hence caller can directly monitor the difference in progress in bytes by doing simply LSN difference calculations, which should be monotonic. Note that as the slot is owned by the backend advancing it, then the read of those fields is fine lock-less, while updates need to happen while the slot mutex is held, so fix that on the way as well. Other locks for in-memory data of replication slots have been already fixed previously. Some of those issues have been pointed out by Petr and Simon during the patch, while I noticed some of them after looking at the code. This also visibly takes of a recently-discovered bug causing assertion failures which can be triggered by a two-step slot forwarding which first advanced the slot to a WAL page boundary and secondly advanced it to the latest position, say 'FF/FFFFFFF' to make sure that the newest LSN is used as forward point. It would have been nice to drop a test for that, but the set of operators working on pg_lsn limits it, so this is left for a future exercise. Author: Michael Paquier Reviewed-by: Petr Jelinek, Simon Riggs Discussion: https://postgr.es/m/CANP8+jLyS=X-CAk59BJnsxKQfjwrmKicHQykyn52Qj-Q=9GLCw@mail.gmail.com Discussion: https://www.postgresql.org/message-id/2840048a-1184-417a-9da8-3299d207a1d7%40postgrespro.ru	2018-06-11 09:26:13 +09:00
Michael Paquier	9e149c847f	Fix and document lock handling for in-memory replication slot data While debugging issues on HEAD for the new slot forwarding feature of Postgres 11, some monitoring of the code surrounding in-memory slot data has proved that the lock handling may cause inconsistent data to be read by read-only callers of slot functions, particularly pg_get_replication_slots() which fetches data for the system view pg_replication_slots, or modules looking directly at slot information. The code paths involved in those problems concern logical decoding initialization (down to 9.4) and WAL reservation for slots (new as of 10). A set of comments documenting all the lock handlings, particularly the dependency with LW locks for slots and the in_use flag as well as the internal mutex lock is added, based on a suggested by Simon Riggs. Some of the fixed code exists down to 9.4 where WAL decoding has been introduced, but as those race conditions are really unlikely going to happen as those concern code paths for slot and decoding creation, just fix the problem on HEAD. Author: Michael Paquier Discussion: https://postgr.es/m/20180528085747.GA27845@paquier.xyz	2018-06-10 19:39:26 +09:00
Tom Lane	b2328bf62b	Fix some assorted compiler warnings on Windows. Don't overflow the result type of constant expressions. Don't negate unsigned types. Define HAVE_STDBOOL_H for Visual C++ 2013 and later. Thomas Munro Reviewed-By: Michael Paquier and Tom Lane Discussion: https://postgr.es/m/CAEepm%3D3%3DTDYEXUEcHpEx%2BTwc31wo7PA0oBAiNt6sWmq93MW02A%40mail.gmail.com	2018-05-01 19:38:26 -04:00
Peter Eisentraut	92e1583b43	Don't do logical replication of TRUNCATE of zero tables When due to publication configuration, a TRUNCATE change ends up with zero tables to be published, don't send the message out, just skip it. It's not wrong, but obviously useless overhead.	2018-04-30 13:49:20 -04:00
Tom Lane	bdf46af748	Post-feature-freeze pgindent run. Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us	2018-04-26 14:47:16 -04:00
Peter Eisentraut	df044026fc	Fix typo in logical truncate replication This could result in some misbehavior in a cascading replication setup.	2018-04-23 13:38:22 -04:00

... 19 20 21 22 23 ...

1736 Commits