postgres

mirror of https://github.com/postgres/postgres.git synced 2025-12-07 12:02:30 +03:00

Author	SHA1	Message	Date
Alvaro Herrera	f49a80c481	Fix "base" snapshot handling in logical decoding Two closely related bugs are fixed. First, xmin of logical slots was advanced too early. During xl_running_xacts processing, xmin of the slot was set to the oldest running xid in the record, but that's wrong: actually, snapshots which will be used for not-yet-replayed transactions might consider older txns as running too, so we need to keep xmin back for them. The problem wasn't noticed earlier because DDL which allows to delete tuple (set xmax) while some another not-yet-committed transaction looks at it is pretty rare, if not unique: e.g. all forms of ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock conflicting with any inserts. The included test case (test_decoding's oldest_xmin) uses ALTER of a composite type, which doesn't have such interlocking. To deal with this, we must be able to quickly retrieve oldest xmin (oldest running xid among all assigned snapshots) from ReorderBuffer. To fix, add another list of ReorderBufferTXNs to the reorderbuffer, where transactions are sorted by base-snapshot-LSN. This is slightly different from the existing (sorted by first-LSN) list, because a transaction can have an earlier LSN but a later Xmin, if its first record does not obtain an xmin (eg. xl_xact_assignment). Note this new list doesn't fully replace the existing txn list: we still need that one to prevent WAL recycling. The second issue concerns SnapBuilder snapshots and subtransactions. SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a transaction that is known to be a subtxn, which is good in the common case that the top-level transaction already has one (no point in doing so), but a bug otherwise. To fix, arrange to transfer the snapshot from the subtxn to its top-level txn as soon as the kinship gets known. test_decoding's snapshot_transfer verifies this. Also, fix a minor memory leak: refcount of toplevel's old base snapshot was not decremented when the snapshot is transferred from child. Liberally sprinkle code comments, and rewrite a few existing ones. This part is my (Álvaro's) contribution to this commit, as I had to write all those comments in order to understand the existing code and Arseny's patch. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad	2018-06-26 16:48:10 -04:00
Michael Paquier	6cb3372411	Address set of issues with errno handling System calls mixed up in error code paths are causing two issues which several code paths have not correctly handled: 1) For write() calls, sometimes the system may return less bytes than what has been written without errno being set. Some paths were careful enough to consider that case, and assumed that errno should be set to ENOSPC, other calls missed that. 2) errno generated by a system call is overwritten by other system calls which may succeed once an error code path is taken, causing what is reported to the user to be incorrect. This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but this is let as future work, which is not targeted for back-branches anyway. Author: Michael Paquier Reviewed-by: Ashutosh Sharma Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz	2018-06-25 11:19:05 +09:00
Peter Eisentraut	9e945f8626	Fix Latin spelling "c.f." should be "cf.".	2018-01-11 08:32:01 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Peter Eisentraut	0c5803b450	Refactor new file permission handling The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>	2017-09-23 10:16:18 -04:00
Tom Lane	21d304dfed	Final pgindent + perltidy run for v10.	2017-08-14 17:29:33 -04:00
Alvaro Herrera	2336f84284	Reword comment for clarity Reported by Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com	2017-08-12 23:26:35 -04:00
Andres Freund	5af4456a56	Fix thinko introduced in `2bef06d516` et al. The callers for GetOldestSafeDecodingTransactionId() all inverted the argument for the argument introduced in `2bef06d516`. Luckily this appears to be inconsequential for the moment, as we wait for concurrent in-progress transaction when assembling a snapshot. Additionally this could only make a difference when adding a second logical slot, because only a pre-existing slot could cause an issue by lowering the returned xid dangerously much. Reported-By: Antonin Houska Discussion: https://postgr.es/m/32704.1496993134@localhost Backport: 9.4-, where `2bef06d516` was backpatched to.	2017-08-06 14:20:55 -07:00
Tom Lane	382ceffdf7	Phase 3 of pgindent updates. Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:35:54 -04:00
Tom Lane	c7b8998ebb	Phase 2 of pgindent updates. Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit `e3860ffa4d` wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 15:19:25 -04:00
Andres Freund	6c2003f8a1	Don't force-assign transaction id when exporting a snapshot. Previously we required every exported transaction to have an xid assigned. That was used to check that the exporting transaction is still running, which in turn is needed to guarantee that that necessary rows haven't been removed in between exporting and importing the snapshot. The exported xid caused unnecessary problems with logical decoding, because slot creation has to wait for all concurrent xid to finish, which in turn serializes concurrent slot creation. It also prohibited snapshots to be exported on hot-standby replicas. Instead export the virtual transactionid, which avoids the unnecessary serialization and the inability to export snapshots on standbys. This changes the file name of the exported snapshot, but since we never documented what that one means, that seems ok. Author: Petr Jelinek, slightly editorialized by me Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com	2017-06-14 11:57:21 -07:00
Bruce Momjian	a6fd7b7a5f	Post-PG 10 beta1 pgindent run perltidy run not included.	2017-05-17 16:31:56 -04:00
Tom Lane	c079673dcb	Preventive maintenance in advance of pgindent run. Reformat various places in which pgindent will make a mess, and fix a few small violations of coding style that I happened to notice while perusing the diffs from a pgindent dry run. There is one actual bug fix here: the need-to-enlarge-the-buffer code path in icu_convert_case was obviously broken. Perhaps it's unreachable in our usage? Or maybe this is just sadly undertested.	2017-05-16 20:36:35 -04:00
Andres Freund	524dbc1433	Avoid superfluous work for commits during logical slot creation. Before `955a684e04` logical decoding snapshot maintenance needed to cope with transactions it might not have seen in their entirety. For such transactions we'd to assume they modified the catalog (could have happened before we were watching), and thus a new snapshot had to be built, and distributed to concurrently running transactions. That's problematic because building a new snapshot isn't that cheap , especially as the the array of committed transactions needs to be sorted. When creating a slot on a server with a lot of transactions, this could make logical slot creation infeasibly expensive. After `955a684e04` there's no need to deal with transaction that aren't guaranteed to be fully observable. That allows to avoid building snapshots for transactions that haven't modified catalog, even before reaching consistency. While this isn't necessarily a bugfix, slot creation being impossible in some production workloads, is severe enough to warrant backpatching. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced	2017-05-13 15:06:40 -07:00
Andres Freund	955a684e04	Fix race condition leading to hanging logical slot creation. The snapshot assembly during the creation of logical slots relied waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xact record listing transactions as ready, that have already WAL-logged an commit/abort record, as the locking just prevents the ProcArray to be adjusted, and the commit record has to be logged first. That lead to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. That hang resolved only if a xl_running_xacts record without any running transactions happened to be logged, far from certain on a busy server. It's impractical to prevent that via more heavyweight locking, the likelihood of deadlocks and significantly increased contention would be too big. Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid - that actually ends up significantly simplifying the code. That has two disadvantages: 1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish. 2) Previously a slot, unless the race occurred, could be created when the all transaction perceived as running based on commit/abort records, now we have to wait for the next xl_running_xacts record. To address that, trigger logging new xl_running_xacts record from within snapbuild.c exactly when necessary. Unfortunately snabuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced	2017-05-13 14:21:00 -07:00
Andres Freund	56e19d938d	Don't use on-disk snapshots for exported logical decoding snapshot. Logical decoding stores historical snapshots on disk, so that logical decoding can restart without having to reconstruct a snapshot from scratch (for which the resources are not guaranteed to be present anymore). These serialized snapshots were also used when creating a new slot via the walsender interface, which can export a "full" snapshot (i.e. one that can read all tables, not just catalog ones). The problem is that the serialized snapshots are only useful for catalogs and not for normal user tables. Thus the use of such a serialized snapshot could result in an inconsistent snapshot being exported, which could lead to queries returning wrong data. This would only happen if logical slots are created while another logical slot already exists. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 15:29:15 -07:00
Andres Freund	2bef06d516	Preserve required !catalog tuples while computing initial decoding snapshot. The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol. This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to it hit it. Ongoing decoding is not affected by this bug. To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok. In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock. Reported-By: Erik Rijkers Analyzed-By: Petr Jelinek Author: Petr Jelinek, heavily editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.	2017-04-27 13:13:36 -07:00
Peter Eisentraut	6275f5d28a	Fix new warnings from GCC 7 This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.	2017-04-17 13:59:46 -04:00
Tom Lane	5459cfd3ad	Fix typos in logical replication support for initial data copy. Fix an incorrect assert condition (noted by Coverity), and spell the new name of the function correctly. Typos introduced in commit `7c4f52409`. Michael Paquier	2017-03-26 17:44:35 -04:00
Peter Eisentraut	7c4f52409a	Logical replication support for initial data copy Add functionality for a new subscription to copy the initial data in the tables and then sync with the ongoing apply process. For the copying, add a new internal COPY option to have the COPY source data provided by a callback function. The initial data copy works on the subscriber by receiving COPY data from the publisher and then providing it locally into a COPY that writes to the destination table. A WAL receiver can now execute full SQL commands. This is used here to obtain information about tables and publications. Several new options were added to CREATE and ALTER SUBSCRIPTION to control whether and when initial table syncing happens. Change pg_dump option --no-create-subscription-slots to --no-subscription-connect and use the new CREATE SUBSCRIPTION ... NOCONNECT option for that. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Tested-by: Erik Rijkers <er@xs4all.nl>	2017-03-23 08:55:37 -04:00
Robert Haas	249cf070e3	Create and use wait events for read, write, and fsync operations. Previous commits, notably `53be0b1add` and `6f3bd98ebf`, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com	2017-03-18 07:43:01 -04:00
Andres Freund	61d0c320b5	Improve grammar / fix typos in snapbuild.c. Author: Erik Rijkers Discussion: https://postgr.es/m/797c6c4496a1ae49cc69e90aa768bac2@xs4all.nl	2017-03-14 17:04:36 -07:00
Magnus Hagander	1bfebffe81	Fix typo in comment Masahiko Sawada	2017-03-13 12:10:54 +01:00
Tom Lane	9e3755ecb2	Remove useless duplicate inclusions of system header files. c.h #includes a number of core libc header files, such as <stdio.h>. There's no point in re-including these after having read postgres.h, postgres_fe.h, or c.h; so remove code that did so. While at it, also fix some places that were ignoring our standard pattern of "include postgres[_fe].h, then system header files, then other Postgres header files". While there's not any great magic in doing it that way rather than system headers last, it's silly to have just a few files deviating from the general pattern. (But I didn't attempt to enforce this globally, only in files I was touching anyway.) I'd be the first to say that this is mostly compulsive neatnik-ism, but over time it might save enough compile cycles to be useful.	2017-02-25 16:12:55 -05:00
Robert Haas	85c11324ca	Rename user-facing tools with "xlog" in the name to say "wal". This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.	2017-02-09 16:23:46 -05:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Tom Lane	ea268cdc9a	Add macros to make AllocSetContextCreate() calls simpler and safer. I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>	2016-08-27 17:50:38 -04:00
Magnus Hagander	55d57359f2	Fix typos in comments and debug message Antonin Houska	2016-07-18 18:46:57 +02:00
Simon Riggs	3fe3511d05	Generic Messages for Logical Decoding API and mechanism to allow generic messages to be inserted into WAL that are intended to be read by logical decoding plugins. This commit adds an optional new callback to the logical decoding API. Messages are either text or bytea. Messages can be transactional, or not, and are identified by a prefix to allow multiple concurrent decoding plugins. (Not to be confused with Generic WAL records, which are intended to allow crash recovery of extensible objects.) Author: Petr Jelinek and Andres Freund Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs Discussion: 5685F999.6010202@2ndquadrant.com	2016-04-06 10:05:41 +01:00
Andres Freund	d9e903f3cb	logical decoding: Tell reorderbuffer about all xids. Logical decoding's reorderbuffer keeps transactions in an LSN ordered list for efficiency. To make that's efficiently possible upper-level xids are forced to be logged before nested subtransaction xids. That only works though if these records are all looked at: Unfortunately we didn't do so for e.g. row level locks, which are otherwise uninteresting for logical decoding. This could lead to errors like: "ERROR: subxact logged without previous toplevel record". It's not sufficient to just look at row locking records, the xid could appear first due to a lot of other types of records (which will trigger the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny). So invent infrastructure to tell reorderbuffer about xids seen, when they'd otherwise not pass through reorderbuffer.c. Reported-By: Jarred Ward Bug: #13844 Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org Backpatch: 9.4, where logical decoding was added	2016-03-05 18:02:20 -08:00
Magnus Hagander	e51ab85cd9	Fix typos in comments Author: Michael Paquier	2016-02-01 11:43:48 +01:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Peter Eisentraut	a8d585c091	Message style improvements Message style, plurals, quoting, spelling, consistency with similar messages	2015-10-28 20:38:36 -04:00
Andres Freund	e95126cf04	Don't use function definitions looking like old-style ones. This fixes a bunch of somewhat pedantic warnings with new compilers. Since by far the majority of other functions definitions use the (void) style it just seems to be consistent to do so as well in the remaining few places.	2015-08-15 17:25:00 +02:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Heikki Linnakangas	fa60fb63e5	Fix more typos in comments. Patch by CharSyam, plus a few more I spotted with grep.	2015-05-20 19:45:43 +03:00
Heikki Linnakangas	4fc72cc7bb	Collection of typo fixes. Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.	2015-05-20 16:56:22 +03:00
Magnus Hagander	3b075e9d7b	Fix typos in comments Dmitriy Olshevskiy	2015-05-17 14:58:04 +02:00
Andres Freund	6aab1f45ac	Fix various typos and grammar errors in comments. Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru	2015-04-26 18:42:31 +02:00
Heikki Linnakangas	e2999abcd1	Fix assertion failure in logical decoding. Logical decoding set SnapshotData's regd_count field to avoid the snapshot manager from prematurely freeing snapshots that are generated by the decoding system. That was always an abuse of the field, as it was never supposed to be used outside the snapshot manager. Commit `94028691` made snapshot manager's tracking of the snapshots smarter, and that scheme fell apart. The snapshot manager got confused and hit the assertion, when a snapshot that was marked with regd_count==1 was not found in the heap, where the snapshot manager tracks registered the snapshots. To fix, don't abuse the regd_count field like that. Logical decoding still abuses the active_count field for similar purposes, but that's currently harmless. The assertion failure was first reported by Michael Paquier	2015-04-16 21:50:07 +03:00
Heikki Linnakangas	4f700bcd20	Reorganize our CRC source files again. Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.	2015-04-14 17:03:42 +03:00
Fujii Masao	f8b031bca8	Fix an obsolete reference to SnapshotNow in comment. Peter Geoghegan	2015-03-04 12:25:48 +09:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Heikki Linnakangas	2c03216d83	Revamp the WAL record format. Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.	2014-11-20 18:46:41 +02:00
Peter Eisentraut	a15d387c22	Improve logical decoding log messages suggestions from Robert Haas	2014-11-13 20:44:34 -05:00
Andres Freund	5a2c184058	Fix xmin/xmax horizon computation during logical decoding initialization. When building the initial historic catalog snapshot there were scenarios where snapbuild.c would use incorrect xmin/xmax values when starting from a xl_running_xacts record. The values used were always a bit suspect, but happened to be correct in the easy to test cases. Notably the values used when the the initial snapshot was computed while no other transactions were running were correct. This is likely to be the cause of the occasional buildfarm failures on animals markhor and tick; but it's quite possible to reproduce problems without CLOBBER_CACHE_ALWAYS. Backpatch to 9.4, where logical decoding was introduced.	2014-11-13 20:34:30 +01:00
Andres Freund	ec5896aed3	Fix several weaknesses in slot and logical replication on-disk serialization. Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and snapbuild.c were missing the FIN_CRC32 call when computing/checking checksums of on disk files. That doesn't lower the the error detection capabilities of the checksum, but is inconsistent with other usages. In a followup mail Heikki also noticed that, contrary to a comment, the 'version' and 'length' struct fields of replication slot's on disk data where not covered by the checksum. That's not likely to lead to actually missed corruption as those fields are cross checked with the expected version and the actual file length. But it's wrong nonetheless. As fixing these issues makes existing on disk files unreadable, bump the expected versions of on disk files for both slots and logical decoding historic catalog snapshots. This means that loading old files will fail with ERROR: "replication slot file ... has unsupported version 1" and ERROR: "snapbuild state file ... has unsupported version 1 instead of 2" respectively. Given the low likelihood of anybody already using these new features in a production setup that seems acceptable. Fixing these issues made me notice that there's no regression test covering the loading of historic snapshot from disk - so add one. Backpatch to 9.4 where these features were introduced.	2014-11-12 18:52:49 +01:00
Peter Eisentraut	8339f33d68	Message improvements	2014-11-11 20:02:30 -05:00
Heikki Linnakangas	5028f22f6e	Switch to CRC-32C in WAL and other places. The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, seems safer to use a well-known algorithm. Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that. The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so might as well.	2014-11-04 11:39:48 +02:00

1 2

62 Commits