1
0
mirror of https://github.com/postgres/postgres.git synced 2025-12-07 12:02:30 +03:00
Commit Graph

62 Commits

Author SHA1 Message Date
Alvaro Herrera
f49a80c481 Fix "base" snapshot handling in logical decoding
Two closely related bugs are fixed.  First, xmin of logical slots was
advanced too early.  During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them.  The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.

To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN.  This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment).  Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.

The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise.  To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.

Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.

Liberally sprinkle code comments, and rewrite a few existing ones.  This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.

Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
2018-06-26 16:48:10 -04:00
Michael Paquier
6cb3372411 Address set of issues with errno handling
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set.  Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.

This patch uses the brute-force approach of correcting all those code
paths.  Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.

Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
2018-06-25 11:19:05 +09:00
Peter Eisentraut
9e945f8626 Fix Latin spelling
"c.f." should be "cf.".
2018-01-11 08:32:01 -05:00
Bruce Momjian
9d4649ca49 Update copyright for 2018
Backpatch-through: certain files through 9.3
2018-01-02 23:30:12 -05:00
Peter Eisentraut
0c5803b450 Refactor new file permission handling
The file handling functions from fd.c were called with a diverse mix of
notations for the file permissions when they were opening new files.
Almost all files created by the server should have the same permissions
set.  So change the API so that e.g. OpenTransientFile() automatically
uses the standard permissions set, and OpenTransientFilePerm() is a new
function that takes an explicit permissions set for the few cases where
it is needed.  This also saves an unnecessary argument for call sites
that are just opening an existing file.

While we're reviewing these APIs, get rid of the FileName typedef and
use the standard const char * for the file name and mode_t for the file
mode.  This makes these functions match other file handling functions
and removes an unnecessary layer of mysteriousness.  We can also get rid
of a few casts that way.

Author: David Steele <david@pgmasters.net>
2017-09-23 10:16:18 -04:00
Tom Lane
21d304dfed Final pgindent + perltidy run for v10. 2017-08-14 17:29:33 -04:00
Alvaro Herrera
2336f84284 Reword comment for clarity
Reported by Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoB+ycZ2z-4Ye=6MfQ_r0aV5r6cvVPw4kOyPdp6bHqQoBQ@mail.gmail.com
2017-08-12 23:26:35 -04:00
Andres Freund
5af4456a56 Fix thinko introduced in 2bef06d516 et al.
The callers for GetOldestSafeDecodingTransactionId() all inverted the
argument for the argument introduced in 2bef06d516. Luckily this
appears to be inconsequential for the moment, as we wait for
concurrent in-progress transaction when assembling a
snapshot. Additionally this could only make a difference when adding a
second logical slot, because only a pre-existing slot could cause an
issue by lowering the returned xid dangerously much.

Reported-By: Antonin Houska
Discussion: https://postgr.es/m/32704.1496993134@localhost
Backport: 9.4-, where 2bef06d516 was backpatched to.
2017-08-06 14:20:55 -07:00
Tom Lane
382ceffdf7 Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.

By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis.  However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent.  That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.

This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:35:54 -04:00
Tom Lane
c7b8998ebb Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.

Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code.  The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there.  BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs.  So the
net result is that in about half the cases, such comments are placed
one tab stop left of before.  This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.

Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.

This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.

Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 15:19:25 -04:00
Andres Freund
6c2003f8a1 Don't force-assign transaction id when exporting a snapshot.
Previously we required every exported transaction to have an xid
assigned. That was used to check that the exporting transaction is
still running, which in turn is needed to guarantee that that
necessary rows haven't been removed in between exporting and importing
the snapshot.

The exported xid caused unnecessary problems with logical decoding,
because slot creation has to wait for all concurrent xid to finish,
which in turn serializes concurrent slot creation.   It also
prohibited snapshots to be exported on hot-standby replicas.

Instead export the virtual transactionid, which avoids the unnecessary
serialization and the inability to export snapshots on standbys. This
changes the file name of the exported snapshot, but since we never
documented what that one means, that seems ok.

Author: Petr Jelinek, slightly editorialized by me
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f598b4b8-8cd7-0d54-0939-adda763d8c34@2ndquadrant.com
2017-06-14 11:57:21 -07:00
Bruce Momjian
a6fd7b7a5f Post-PG 10 beta1 pgindent run
perltidy run not included.
2017-05-17 16:31:56 -04:00
Tom Lane
c079673dcb Preventive maintenance in advance of pgindent run.
Reformat various places in which pgindent will make a mess, and
fix a few small violations of coding style that I happened to notice
while perusing the diffs from a pgindent dry run.

There is one actual bug fix here: the need-to-enlarge-the-buffer code
path in icu_convert_case was obviously broken.  Perhaps it's unreachable
in our usage?  Or maybe this is just sadly undertested.
2017-05-16 20:36:35 -04:00
Andres Freund
524dbc1433 Avoid superfluous work for commits during logical slot creation.
Before 955a684e04 logical decoding snapshot maintenance needed to
cope with transactions it might not have seen in their entirety. For
such transactions we'd to assume they modified the catalog (could have
happened before we were watching), and thus a new snapshot had to be
built, and distributed to concurrently running transactions.

That's problematic because building a new snapshot isn't that cheap ,
especially as the the array of committed transactions needs to be
sorted.  When creating a slot on a server with a lot of transactions,
this could make logical slot creation infeasibly expensive.

After 955a684e04 there's no need to deal with transaction that
aren't guaranteed to be fully observable.  That allows to avoid
building snapshots for transactions that haven't modified catalog,
even before reaching consistency.

While this isn't necessarily a bugfix, slot creation being impossible
in some production workloads, is severe enough to warrant
backpatching.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 15:06:40 -07:00
Andres Freund
955a684e04 Fix race condition leading to hanging logical slot creation.
The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.

That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
   the all transaction perceived as running based on commit/abort
   records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to hack
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
2017-05-13 14:21:00 -07:00
Andres Freund
56e19d938d Don't use on-disk snapshots for exported logical decoding snapshot.
Logical decoding stores historical snapshots on disk, so that logical
decoding can restart without having to reconstruct a snapshot from
scratch (for which the resources are not guaranteed to be present
anymore).  These serialized snapshots were also used when creating a
new slot via the walsender interface, which can export a "full"
snapshot (i.e. one that can read all tables, not just catalog ones).

The problem is that the serialized snapshots are only useful for
catalogs and not for normal user tables.  Thus the use of such a
serialized snapshot could result in an inconsistent snapshot being
exported, which could lead to queries returning wrong data.  This
would only happen if logical slots are created while another logical
slot already exists.

Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 15:29:15 -07:00
Andres Freund
2bef06d516 Preserve required !catalog tuples while computing initial decoding snapshot.
The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.

This could cause a corrupted initial snapshot being exported.  The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it.  Ongoing decoding is not
affected by this bug.

To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol.  To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.

In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.

Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
2017-04-27 13:13:36 -07:00
Peter Eisentraut
6275f5d28a Fix new warnings from GCC 7
This addresses the new warning types -Wformat-truncation
-Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
2017-04-17 13:59:46 -04:00
Tom Lane
5459cfd3ad Fix typos in logical replication support for initial data copy.
Fix an incorrect assert condition (noted by Coverity), and spell the new
name of the function correctly.  Typos introduced in commit 7c4f52409.

Michael Paquier
2017-03-26 17:44:35 -04:00
Peter Eisentraut
7c4f52409a Logical replication support for initial data copy
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.

For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function.  The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.

A WAL receiver can now execute full SQL commands.  This is used here to
obtain information about tables and publications.

Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.

Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.

Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
2017-03-23 08:55:37 -04:00
Robert Haas
249cf070e3 Create and use wait events for read, write, and fsync operations.
Previous commits, notably 53be0b1add and
6f3bd98ebf, made it possible to see from
pg_stat_activity when a backend was stuck waiting for another backend,
but it's also fairly common for a backend to be stuck waiting for an
I/O.  Add wait events for those operations, too.

Rushabh Lathia, with further hacking by me.  Reviewed and tested by
Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed.

Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
2017-03-18 07:43:01 -04:00
Andres Freund
61d0c320b5 Improve grammar / fix typos in snapbuild.c.
Author: Erik Rijkers
Discussion: https://postgr.es/m/797c6c4496a1ae49cc69e90aa768bac2@xs4all.nl
2017-03-14 17:04:36 -07:00
Magnus Hagander
1bfebffe81 Fix typo in comment
Masahiko Sawada
2017-03-13 12:10:54 +01:00
Tom Lane
9e3755ecb2 Remove useless duplicate inclusions of system header files.
c.h #includes a number of core libc header files, such as <stdio.h>.
There's no point in re-including these after having read postgres.h,
postgres_fe.h, or c.h; so remove code that did so.

While at it, also fix some places that were ignoring our standard pattern
of "include postgres[_fe].h, then system header files, then other Postgres
header files".  While there's not any great magic in doing it that way
rather than system headers last, it's silly to have just a few files
deviating from the general pattern.  (But I didn't attempt to enforce this
globally, only in files I was touching anyway.)

I'd be the first to say that this is mostly compulsive neatnik-ism,
but over time it might save enough compile cycles to be useful.
2017-02-25 16:12:55 -05:00
Robert Haas
85c11324ca Rename user-facing tools with "xlog" in the name to say "wal".
This means pg_receivexlog because pg_receivewal, pg_resetxlog
becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
2017-02-09 16:23:46 -05:00
Heikki Linnakangas
181bdb90ba Fix typos in comments.
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.

Josh Soref

Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
2017-02-06 11:33:58 +02:00
Bruce Momjian
1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
Tom Lane
ea268cdc9a Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters.  While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea.  Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.

While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.

In passing, change TopMemoryContext to use the default allocation
parameters.  Formerly it could only be extended 8K at a time.  That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there.  There seems no good reason
not to let it use growing blocks like most other contexts.

Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain.  The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.

Discussion: <21072.1472321324@sss.pgh.pa.us>
2016-08-27 17:50:38 -04:00
Magnus Hagander
55d57359f2 Fix typos in comments and debug message
Antonin Houska
2016-07-18 18:46:57 +02:00
Simon Riggs
3fe3511d05 Generic Messages for Logical Decoding
API and mechanism to allow generic messages to be inserted into WAL that are
intended to be read by logical decoding plugins. This commit adds an optional
new callback to the logical decoding API.

Messages are either text or bytea. Messages can be transactional, or not, and
are identified by a prefix to allow multiple concurrent decoding plugins.

(Not to be confused with Generic WAL records, which are intended to allow crash
recovery of extensible objects.)

Author: Petr Jelinek and Andres Freund
Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs
Discussion: 5685F999.6010202@2ndquadrant.com
2016-04-06 10:05:41 +01:00
Andres Freund
d9e903f3cb logical decoding: Tell reorderbuffer about all xids.
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that's efficiently possible upper-level
xids are forced to be logged before nested subtransaction xids.  That
only works though if these records are all looked at: Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.

This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".

It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.

Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
Magnus Hagander
e51ab85cd9 Fix typos in comments
Author: Michael Paquier
2016-02-01 11:43:48 +01:00
Bruce Momjian
ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
Peter Eisentraut
a8d585c091 Message style improvements
Message style, plurals, quoting, spelling, consistency with similar
messages
2015-10-28 20:38:36 -04:00
Andres Freund
e95126cf04 Don't use function definitions looking like old-style ones.
This fixes a bunch of somewhat pedantic warnings with new
compilers. Since by far the majority of other functions definitions use
the (void) style it just seems to be consistent to do so as well in the
remaining few places.
2015-08-15 17:25:00 +02:00
Bruce Momjian
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
Heikki Linnakangas
fa60fb63e5 Fix more typos in comments.
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20 19:45:43 +03:00
Heikki Linnakangas
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
Magnus Hagander
3b075e9d7b Fix typos in comments
Dmitriy Olshevskiy
2015-05-17 14:58:04 +02:00
Andres Freund
6aab1f45ac Fix various typos and grammar errors in comments.
Author: Dmitriy Olshevskiy
Discussion: 553D00A6.4090205@bk.ru
2015-04-26 18:42:31 +02:00
Heikki Linnakangas
e2999abcd1 Fix assertion failure in logical decoding.
Logical decoding set SnapshotData's regd_count field to avoid the
snapshot manager from prematurely freeing snapshots that are generated
by the decoding system. That was always an abuse of the field, as it was
never supposed to be used outside the snapshot manager. Commit 94028691
made snapshot manager's tracking of the snapshots smarter, and that scheme
fell apart. The snapshot manager got confused and hit the assertion, when
a snapshot that was marked with regd_count==1 was not found in the heap,
where the snapshot manager tracks registered the snapshots.

To fix, don't abuse the regd_count field like that. Logical decoding still
abuses the active_count field for similar purposes, but that's currently
harmless.

The assertion failure was first reported by Michael Paquier
2015-04-16 21:50:07 +03:00
Heikki Linnakangas
4f700bcd20 Reorganize our CRC source files again.
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.

Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-14 17:03:42 +03:00
Fujii Masao
f8b031bca8 Fix an obsolete reference to SnapshotNow in comment.
Peter Geoghegan
2015-03-04 12:25:48 +09:00
Bruce Momjian
4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Heikki Linnakangas
2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Peter Eisentraut
a15d387c22 Improve logical decoding log messages
suggestions from Robert Haas
2014-11-13 20:44:34 -05:00
Andres Freund
5a2c184058 Fix xmin/xmax horizon computation during logical decoding initialization.
When building the initial historic catalog snapshot there were
scenarios where snapbuild.c would use incorrect xmin/xmax values when
starting from a xl_running_xacts record. The values used were always a
bit suspect, but happened to be correct in the easy to test
cases. Notably the values used when the the initial snapshot was
computed while no other transactions were running were correct.

This is likely to be the cause of the occasional buildfarm failures on
animals markhor and tick; but it's quite possible to reproduce
problems without CLOBBER_CACHE_ALWAYS.

Backpatch to 9.4, where logical decoding was introduced.
2014-11-13 20:34:30 +01:00
Andres Freund
ec5896aed3 Fix several weaknesses in slot and logical replication on-disk serialization.
Heikki noticed in 544E23C0.8090605@vmware.com that slot.c and
snapbuild.c were missing the FIN_CRC32 call when computing/checking
checksums of on disk files. That doesn't lower the the error detection
capabilities of the checksum, but is inconsistent with other usages.

In a followup mail Heikki also noticed that, contrary to a comment,
the 'version' and 'length' struct fields of replication slot's on disk
data where not covered by the checksum. That's not likely to lead to
actually missed corruption as those fields are cross checked with the
expected version and the actual file length. But it's wrong
nonetheless.

As fixing these issues makes existing on disk files unreadable, bump
the expected versions of on disk files for both slots and logical
decoding historic catalog snapshots.  This means that loading old
files will fail with
ERROR: "replication slot file ... has unsupported version 1"
and
ERROR: "snapbuild state file ... has unsupported version 1 instead of
2" respectively. Given the low likelihood of anybody already using
these new features in a production setup that seems acceptable.

Fixing these issues made me notice that there's no regression test
covering the loading of historic snapshot from disk - so add one.

Backpatch to 9.4 where these features were introduced.
2014-11-12 18:52:49 +01:00
Peter Eisentraut
8339f33d68 Message improvements 2014-11-11 20:02:30 -05:00
Heikki Linnakangas
5028f22f6e Switch to CRC-32C in WAL and other places.
The old algorithm was found to not be the usual CRC-32 algorithm, used by
Ethernet et al. We were using a non-reflected lookup table with code meant
for a reflected lookup table. That's a strange combination that AFAICS does
not correspond to any bit-wise CRC calculation, which makes it difficult to
reason about its properties. Although it has worked well in practice, seems
safer to use a well-known algorithm.

Since we're changing the algorithm anyway, we might as well choose a
different polynomial. The Castagnoli polynomial has better error-correcting
properties than the traditional CRC-32 polynomial, even if we had
implemented it correctly. Another reason for picking that is that some new
CPUs have hardware support for calculating CRC-32C, but not CRC-32, let
alone our strange variant of it. This patch doesn't add any support for such
hardware, but a future patch could now do that.

The old algorithm is kept around for tsquery and pg_trgm, which use the
values in indexes that need to remain compatible so that pg_upgrade works.
While we're at it, share the old lookup table for CRC-32 calculation
between hstore, ltree and core. They all use the same table, so might as
well.
2014-11-04 11:39:48 +02:00