postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-19 13:42:17 +03:00

Author	SHA1	Message	Date
Andres Freund	1a7a43672b	Don't use !! but != 0/NULL to force boolean evaluation. I introduced several uses of !! to force bit arithmetic to be boolean, but per discussion the project prefers != 0/NULL. Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com	2016-03-27 18:10:19 +02:00
Peter Eisentraut	b555ed8102	Merge wal_level "archive" and "hot_standby" into new name "replica" The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: David Steele <david@pgmasters.net>	2016-03-18 23:56:03 +01:00
Robert Haas	53be0b1add	Provide much better wait information in pg_stat_activity. When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov, Vladimir Borodin, and many others.	2016-03-10 12:44:09 -05:00
Andres Freund	1d4a0ab19a	Avoid unlikely data-loss scenarios due to rename() without fsync. Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Andres Freund	606e0f9841	Introduce durable_rename() and durable_link_or_rename(). Renaming a file using rename(2) is not guaranteed to be durable in face of crashes; especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, fsync the containing directory. This sequence is not generally adhered to currently; which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that. Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason extend several copies of the same logic, so centralize the link() callers. This commit does not yet make use of the new functions; they're used in a followup commit. Author: Michael Paquier, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches	2016-03-09 18:53:53 -08:00
Peter Eisentraut	a40814d7aa	Handle invalid libpq sockets in more places Also, make error messages consistent. From: Michael Paquier <michael.paquier@gmail.com>	2016-03-08 21:10:33 -05:00
Andres Freund	b63bea5fd3	Further improvements to `c8f621c43`. Coverity and inspection for the issue addressed in `fd45d16f` found some questionable code. Specifically coverity noticed that the wrong length was added in ReorderBufferSerializeChange() - without immediate negative consequences as the variable isn't used afterwards. During code-review and testing I noticed that a bit of space was wasted when allocating tuple bufs in several places. Thirdly, the debug memset()s in ReorderBufferGetTupleBuf() reduce the error checking valgrind can do. Backpatch: 9.4, like `c8f621c43`.	2016-03-07 14:24:03 -08:00
Andres Freund	fd45d16f62	Fix wrong allocation size in `c8f621c43`. In `c8f621c43` I forgot to account for MAXALIGN when allocating a new tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not cause active problems on a number of platforms because the affected pointer is already aligned, but others, like ppc and hppa, trigger this in the regression test, due to a debug memset clearing memory. Fix that. Backpatch: 9.4, like the previous commit.	2016-03-06 16:27:20 -08:00
Andres Freund	c8f621c43a	logical decoding: Fix handling of large old tuples with replica identity full. When decoding the old version of an UPDATE or DELETE change, and if that tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or failed in more subtle ways in non-assert builds. Normally individual tuples aren't bigger than MaxHeapTupleSize, with big datums toasted. But that's not the case for the old version of a tuple for logical decoding; the replica identity is logged as one piece. With the default replica identity btree limits that to small tuples, but that's not the case for FULL. Change the tuple buffer infrastructure to separate allocate over-large tuples, instead of always going through the slab cache. This unfortunately requires changing the ReorderBufferTupleBuf definition, we need to store the allocated size someplace. To avoid requiring output plugins to recompile, don't store HeapTupleHeaderData directly after HeapTupleData, but point to it via t_data; that leaves rooms for the allocated size. As there's no reason for an output plugin to look at ReorderBufferTupleBuf->t_data.header, remove the field. It was just a minor convenience having it directly accessible. Reported-By: Adam Dratwiński Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com	2016-03-05 18:02:20 -08:00
Andres Freund	0bda14d54c	logical decoding: old/newtuple in spooled UPDATE changes was switched around. Somehow I managed to flip the order of restoring old & new tuples when de-spooling a change in a large transaction from disk. This happens to only take effect when a change is spooled to disk which has old/new versions of the tuple. That only is the case for UPDATEs where he primary key changed or where replica identity is changed to FULL. The tests didn't catch this because either spooled updates, or updates that changed primary keys, were tested; not both at the same time. Found while adding tests for the following commit. Backpatch: 9.4, where logical decoding was added	2016-03-05 18:02:20 -08:00
Andres Freund	d9e903f3cb	logical decoding: Tell reorderbuffer about all xids. Logical decoding's reorderbuffer keeps transactions in an LSN ordered list for efficiency. To make that's efficiently possible upper-level xids are forced to be logged before nested subtransaction xids. That only works though if these records are all looked at: Unfortunately we didn't do so for e.g. row level locks, which are otherwise uninteresting for logical decoding. This could lead to errors like: "ERROR: subxact logged without previous toplevel record". It's not sufficient to just look at row locking records, the xid could appear first due to a lot of other types of records (which will trigger the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny). So invent infrastructure to tell reorderbuffer about xids seen, when they'd otherwise not pass through reorderbuffer.c. Reported-By: Jarred Ward Bug: #13844 Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org Backpatch: 9.4, where logical decoding was added	2016-03-05 18:02:20 -08:00
Andres Freund	7c17aac69d	logical decoding: fix decoding of a commit's commit time. When adding replication origins in `5aa235042`, I somehow managed to set the timestamp of decoded transactions to InvalidXLogRecptr when decoding one made without a replication origin. Fix that, and the wrong type of the new commit_time variable. This didn't trigger a regression test failure because we explicitly don't show commit timestamps in the regression tests, as they obviously are variable. Add a test that checks that a decoded commit's timestamp is within minutes of NOW() from before the commit. Reported-By: Weiping Qu Diagnosed-By: Artur Zakirov Discussion: 56D4197E.9050706@informatik.uni-kl.de, 56D42918.1010108@postgrespro.ru Backpatch: 9.5, where `5aa235042` originates.	2016-03-02 23:42:21 -08:00
Alvaro Herrera	10b4852215	Fix typos Author: Amit Langote	2016-02-29 18:11:58 -03:00
Peter Eisentraut	18777c38e9	Improve error message about active replication slot The old phrasing was awkward if a replication slot is activated and deactivated repeatedly.	2016-02-17 21:23:28 -05:00
Robert Haas	63461a63f9	Make builtin lwlock tranche names consistent. Previously, we had a mix of styles. Amit Kapila	2016-02-12 08:07:11 -05:00
Robert Haas	e98fd78607	Fix typo in comment. Michael Paquier	2016-02-05 08:11:00 -05:00
Robert Haas	7191ce8bea	Make all built-in lwlock tranche IDs fixed. This makes the values more stable, which seems like a good thing for anybody who needs to look at at them. Alexander Korotkov and Amit Kapila	2016-02-02 06:45:55 -05:00
Magnus Hagander	e51ab85cd9	Fix typos in comments Author: Michael Paquier	2016-02-01 11:43:48 +01:00
Robert Haas	2251179e6a	Migrate replication slot I/O locks into a separate tranche. This is following in a long train of similar changes and for the same reasons - see `b319356f0e` and `fe702a7b3f` inter alia. Author: Amit Kapila Reviewed-by: Alexander Korotkov, Robert Haas	2016-01-29 09:45:38 -05:00
Simon Riggs	422a55a687	Refactor to create generic WAL page read callback Previously we didn’t have a generic WAL page read callback function, surprisingly. Logical decoding has logical_read_local_xlog_page(), which was actually generic, so move that to xlogfunc.c and rename to read_local_xlog_page(). Maintain logical_read_local_xlog_page() so existing callers still work. As requested by Michael Paquier, Alvaro Herrera and Andres Freund	2016-01-20 17:18:58 -08:00
Tom Lane	26d538dc93	Clean up some lack-of-STRICT issues in the core code, too. A scan for missed proisstrict markings in the core code turned up these functions: brin_summarize_new_values pg_stat_reset_single_table_counters pg_stat_reset_single_function_counters pg_create_logical_replication_slot pg_create_physical_replication_slot pg_drop_replication_slot The first three of these take OID, so a null argument will normally look like a zero to them, resulting in "ERROR: could not open relation with OID 0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX functions. The other three will dump core on a null argument, though this is mitigated by the fact that they won't do so until after checking that the caller is superuser or has rolreplication privilege. In addition, the pg_logical_slot_get/peek[_binary]_changes family was intentionally marked nonstrict, but failed to make nullness checks on all the arguments; so again a null-pointer-dereference crash is possible but only for superusers and rolreplication users. Add the missing ARGISNULL checks to the latter functions, and mark the former functions as strict in pg_proc. Make that change in the back branches too, even though we can't force initdb there, just so that installations initdb'd in future won't have the issue. Since none of these bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX functions hardly misbehave at all), it seems sufficient to do this. In addition, fix some order-of-operations oddities in the slot_get_changes family, mostly cosmetic, but not the part that moves the function's last few operations into the PG_TRY block. As it stood, there was significant risk for an error to exit without clearing historical information from the system caches. The slot_get_changes bugs go back to 9.4 where that code was introduced. Back-patch appropriate subsets of the pg_proc changes into all active branches, as well.	2016-01-09 16:58:32 -05:00
Alvaro Herrera	b1a9bad9e7	pgstat: add WAL receiver status view & SRF This new view provides insight into the state of a running WAL receiver in a HOT standby node. The information returned includes the PID of the WAL receiver process, its status (stopped, starting, streaming, etc), start LSN and TLI, last received LSN and TLI, timestamp of last message send and receipt, latest end-of-WAL LSN and time, and the name of the slot (if any). Access to the detailed data is only granted to superusers; others only get the PID. Author: Michael Paquier Reviewer: Haribabu Kommi	2016-01-07 16:21:19 -03:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Robert Haas	6e7b335930	Remove duplicate word. Kyotaro Horiguchi	2015-12-18 12:43:52 -05:00
Robert Haas	4496226782	Fix copy-and-paste error in logical decoding callback. This could result in the error context misidentifying where the error actually occurred. Craig Ringer	2015-12-18 12:17:35 -05:00
Magnus Hagander	a91bdf67c4	Consistently set all fields in pg_stat_replication to null instead of 0 Previously the "sent" field would be set to 0 and all other xlog pointers be set to NULL if there were no valid values (such as when in a backup sending walsender).	2015-12-13 16:53:38 +01:00
Magnus Hagander	263c19572b	Properly initialize write, flush and replay locations in walsender slots These would leak random xlog positions if a walsender used for backup would a walsender slot previously used by a replication walsender. In passing also fix a couple of cases where the xlog pointer is directly compared to zero instead of using XLogRecPtrIsInvalid, noted by Michael Paquier.	2015-12-13 16:46:56 +01:00
Tom Lane	00cdd83521	Adopt the GNU convention for handling tar-archive members exceeding 8GB. The POSIX standard for tar headers requires archive member sizes to be printed in octal with at most 11 digits, limiting the representable file size to 8GB. However, GNU tar and apparently most other modern tars support a convention in which oversized values can be stored in base-256, allowing any practical file to be a tar member. Adopt this convention to remove two limitations: * pg_dump with -Ft output format failed if the contents of any one table exceeded 8GB. * pg_basebackup failed if the data directory contained any file exceeding 8GB. (This would be a fatal problem for installations configured with a table segment size of 8GB or more, and it has also been seen to fail when large core dump files exist in the data directory.) File sizes under 8GB are still printed in octal, so that no compatibility issues are created except in cases that would have failed entirely before. In addition, this patch fixes several bugs in the same area: * In 9.3 and later, we'd defined tarCreateHeader's file-size argument as size_t, which meant that on 32-bit machines it would write a corrupt tar header for file sizes between 4GB and 8GB, even though no error was raised. This broke both "pg_dump -Ft" and pg_basebackup for such cases. * pg_restore from a tar archive would fail on tables of size between 4GB and 8GB, on machines where either "size_t" or "unsigned long" is 32 bits. This happened even with an archive file not affected by the previous bug. * pg_basebackup would fail if there were files of size between 4GB and 8GB, even on 64-bit machines. * In 9.3 and later, "pg_basebackup -Ft" failed entirely, for any file size, on 64-bit big-endian machines. In view of these potential data-loss bugs, back-patch to all supported branches, even though removal of the documented 8GB limit might otherwise be considered a new feature rather than a bug fix.	2015-11-21 20:21:31 -05:00
Andres Freund	f3a764b0da	Set replication origin when decoding commit records. By accident the replication origin was not set properly in DecodeCommit(). That's bad because the origin is passed to the output plugins origin filter, and accessible from the output plugin via ReorderBufferTXN->origin_id. Accessing the origin of individual changes worked before the fix, which is why this wasn't notices earlier. Reported-By: Craig Ringer Author: Craig Ringer Discussion: CAMsr+YFhBJLp=qfSz3-J+0P1zLkE8zNXM2otycn20QRMx380gw@mail.gmail.com Backpatch: 9.5, where replication origins where introduced	2015-11-09 00:03:35 +01:00
Peter Eisentraut	a8d585c091	Message style improvements Message style, plurals, quoting, spelling, consistency with similar messages	2015-10-28 20:38:36 -04:00
Alvaro Herrera	0cd836a4e8	Measure string lengths only once Bernd Helmle complained that CreateReplicationSlot() was assigning the same value to the same variable twice, so we could remove one of them. Code inspection reveals that we can actually remove both assignments: according to the author the assignment was there for beauty of the strlen line only, and another possible fix to that is to put the strlen in its own line, so do that. To be consistent within the file, refactor all duplicated strlen() calls, which is what we do elsewhere in the backend anyway. In basebackup.c, snprintf already returns the right length; no need for strlen afterwards. Backpatch to 9.4, where replication slots were introduced, to keep code identical. Some of this is older, but the patch doesn't apply cleanly and it's only of cosmetic value anyway. Discussion: http://www.postgresql.org/message-id/BE2FD71DEA35A2287EA5F018@eje.credativ.lan	2015-10-27 13:20:40 -03:00
Robert Haas	8f6bb851bd	Remove more volatile qualifiers. Prior to commit `0709b7ee72`, access to variables within a spinlock-protected critical section had to be done through a volatile pointer, but that should no longer be necessary. This continues work begun in `df4077cda2` and `6ba4ecbf47`. Thomas Munro and Michael Paquier	2015-10-06 15:45:02 -04:00
Andres Freund	920218cbc0	Improve errhint() about replication slot naming restrictions. The existing hint talked about "may only contain letters", but the actual requirement is more strict: only lower case letters are allowed. Reported-By: Rushabh Lathia Author: Rushabh Lathia Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com Backpatch: 9.4-, where replication slots were added	2015-10-03 15:29:08 +02:00
Alvaro Herrera	17f5831c81	Fix "sesssion" typo It was introduced alongside replication origins, by commit `5aa2350426`, so backpatch to 9.5. Pointed out by Fujii Masao	2015-09-28 19:13:42 -03:00
Andres Freund	c314ead5be	Add ability to reserve WAL upon slot creation via replication protocol. Since `6fcd885` it is possible to immediately reserve WAL when creating a slot via pg_create_physical_replication_slot(). Extend the replication protocol to allow that as well. Although, in contrast to the SQL interface, it is possible to update the reserved location via the replication interface, it is still useful being able to reserve upon creation there. Otherwise the logic in ReplicationSlotReserveWal() has to be repeated in slot employing clients. Author: Michael Paquier Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com	2015-09-06 13:30:57 +02:00
Heikki Linnakangas	c80b5f66c6	Fix misc typos. Oskari Saarenmaa. Backpatch to stable branches where applicable.	2015-09-05 11:35:49 +03:00
Andres Freund	e95126cf04	Don't use function definitions looking like old-style ones. This fixes a bunch of somewhat pedantic warnings with new compilers. Since by far the majority of other functions definitions use the (void) style it just seems to be consistent to do so as well in the remaining few places.	2015-08-15 17:25:00 +02:00
Andres Freund	a4b059fdde	Remove duplicated assignment in pg_create_physical_replication_slot. Reported-By: Gurjeet Singh	2015-08-12 17:35:50 +02:00
Alvaro Herrera	4901b2f495	Don't include rel.h when relcache.h is sufficient Trivial change to reduce exposure of rel.h.	2015-08-11 13:03:14 -03:00
Andres Freund	6fcd88511f	Allow pg_create_physical_replication_slot() to reserve WAL. When creating a physical slot it's often useful to immediately reserve the current WAL position instead of only doing after the first feedback message arrives. That e.g. allows slots to guarantee that all the WAL for a base backup will be available afterwards. Logical slots already have to reserve WAL during creation, so generalize that logic into being usable for both physical and logical slots. Catversion bump because of the new parameter. Author: Gurjeet Singh Reviewed-By: Andres Freund Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com	2015-08-11 12:34:31 +02:00
Andres Freund	093d0c83c1	Introduce macros determining if a replication slot is physical or logical. These make the code a bit easier to read, and make it easier to add a more explicit notion of a slot's type at some point in the future. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com	2015-08-11 12:32:48 +02:00
Andres Freund	3b425b7c02	Minor cleanups in slot related code. Fix a bunch of typos, and remove two superflous includes. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com Backpatch: 9.4	2015-08-11 12:32:48 +02:00
Andres Freund	3f811c2d6f	Add confirmed_flush column to pg_replication_slots. There's no reason not to expose both restart_lsn and confirmed_flush since they have rather distinct meanings. The former is the oldest WAL still required and valid for both physical and logical slots, whereas the latter is the location up to which a logical slot's consumer has confirmed receiving data. Most of the time a slot will require older WAL (i.e. restart_lsn) than the confirmed position (i.e. confirmed_flush_lsn). Author: Marko Tiikkaja, editorialized by me Discussion: 559D110B.1020109@joh.to	2015-08-10 13:28:18 +02:00
Andres Freund	5c4b25acce	Fix copy & paste mistake in pg_get_replication_slots(). XLogRecPtr was compared with InvalidTransactionId instead of InvalidXLogRecPtr. As both are defined to the same value this doesn't cause any actual problems, but it's still wrong. Backpatch: 9.4-master, bug was introduced in 9.4	2015-08-10 13:28:18 +02:00
Andres Freund	18e8613564	Address points made in post-commit review of replication origins. Amit reviewed the replication origins patch and made some good points. Address them. This fixes typos in error messages, docs and comments and adds a missing error check (although in a should-never-happen scenario). Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com Backpatch: 9.5, where replication origins were introduced.	2015-08-07 15:09:05 +02:00
Andres Freund	a855118be3	Fix debug message output when connecting to a logical slot. Previously the message erroneously printed the same LSN twice as the assignment to the start_lsn variable was before the message. Correct that. Reported-By: Marko Tiikkaja Author: Marko Tiikkaja Backpatch: 9.5, where logical decoding was introduced	2015-08-05 13:26:01 +02:00
Andres Freund	b2f6f749c7	Fix logical decoding bug leading to inefficient reopening of files. When spilling transaction data to disk a simple typo caused the output file to be closed and reopened for every serialized change. That happens to not have a huge impact on linux, which is why it probably wasn't noticed so far, but on windows that appears to trigger actual disk writes after every change. Not fun. The bug fortunately does not have any impact besides speed. A change could end up being in the wrong segment (last instead of next), but since we read all files to the end, that's just ugly, not really problematic. It's not a problem to upgrade, since transaction spill files do not persist across restarts. Bug: #13484 Reported-By: Olivier Gosseaume Discussion: 20150703090217.1190.63940@wrigleys.postgresql.org Backpatch to 9.4, where logical decoding was added.	2015-07-07 13:12:46 +02:00
Peter Eisentraut	385522c7dc	Fix typo	2015-06-10 21:30:17 -04:00
Tom Lane	32f628be74	Fix assorted inconsistencies in our calls of readlink(). Ensure that we null-terminate the result string (one place in pg_rewind). Be paranoid about out-of-range results from readlink() (should not happen, but there is no good reason for some call sites to be careful about it and others not). Consistently use the whole buffer, not sometimes one byte less. Ensure we emit an appropriate errcode() in all cases. Spell the error messages the same way. The only serious bug here is the missing null-termination in pg_rewind, which is new code, so no need for a back-patch. Abhijit Menon-Sen and Tom Lane	2015-05-28 12:17:22 -04:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00

1 2 3 4 5 ...

400 Commits