1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00
Commit Graph

1304 Commits

Author SHA1 Message Date
Robert Haas
fef2c17807 Properly set relpersistence for fake relcache entries.
This can result in buffers failing to be properly flushed at
checkpoint time, leading to data loss.

Report, diagnosis, and patch by Jeff Davis.
2012-09-14 09:39:10 -04:00
Robert Haas
657face6fe Add missing period to detail message.
Per note from Peter Eisentraut.
2012-08-30 13:27:18 -04:00
Tom Lane
180ce0af33 Fix issues with checks for unsupported transaction states in Hot Standby.
The GUC check hooks for transaction_read_only and transaction_isolation
tried to check RecoveryInProgress(), so as to disallow setting read/write
mode or serializable isolation level (respectively) in hot standby
sessions.  However, GUC check hooks can be called in many situations where
we're not connected to shared memory at all, resulting in a crash in
RecoveryInProgress().  Among other cases, this results in EXEC_BACKEND
builds crashing during child process start if default_transaction_isolation
is serializable, as reported by Heikki Linnakangas.  Protect those calls
by silently allowing any setting when not inside a transaction; which is
okay anyway since these GUCs are always reset at start of transaction.

Also, add a check to GetSerializableTransactionSnapshot() to complain
if we are in hot standby.  We need that check despite the one in
check_XactIsoLevel() because default_transaction_isolation could be
serializable.  We don't want to complain any sooner than this in such
cases, since that would prevent running transactions at all in such a
state; but a transaction can be run, if SET TRANSACTION ISOLATION is done
before setting a snapshot.  Per report some months ago from Robert Haas.

Back-patch to 9.1, since these problems were introduced by the SSI patch.

Kevin Grittner and Tom Lane, with ideas from Heikki Linnakangas
2012-08-24 13:09:17 -04:00
Tom Lane
7a7d970b36 Only allow autovacuum to be auto-canceled by a directly blocked process.
In the original coding of the autovacuum cancel feature, commit
acac68b2bc, an autovacuum process was
considered a target for cancellation if it was found to hard-block any
process examined in the deadlock search.  This patch tightens the test so
that the autovacuum must directly hard-block the current process.  This
should make the behavior more predictable in general, and in particular
it ensures that an autovacuum will not be canceled with less than
deadlock_timeout grace period.  In the old coding, it was possible for an
autovacuum to be canceled almost instantly, given unfortunate timing of two
or more other processes' lock attempts.

This also justifies the logging methodology in the recent commit
d7318d43d891bd63e82dcfc27948113ed7b1db80; without this restriction, that
patch isn't providing enough information to see the connection of the
canceling process to the autovacuum.  Like that one, patch all the way
back.
2012-07-26 14:29:37 -04:00
Robert Haas
a1195a5c26 Log a better message when canceling autovacuum.
The old message was at DEBUG2, so typically it didn't show up in the
log at all.  As a result, in most cases where autovacuum was canceled,
the only information that was logged was the table being vacuumed,
with no indication as to what problem caused the cancel.  Crank up
the level to LOG and add some more details to assist with debugging.

Back-patch all the way, per discussion on pgsql-hackers.
2012-07-26 09:18:32 -04:00
Tom Lane
2f961b1b5f Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs.  This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes.  Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim.  (There are other places that are assuming
this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it.  The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
2012-07-17 16:57:22 -04:00
Simon Riggs
557433f48a Fix bug in early startup of Hot Standby with subtransactions.
When HS startup is deferred because of overflowed subtransactions, ensure
that we re-initialize KnownAssignedXids for when both existing and incoming
snapshots have non-zero qualifying xids.

Fixes bug #6661 reported by Valentine Gogichashvili.

Analysis and fix by Andres Freund
2012-06-08 17:35:22 +01:00
Tom Lane
1c0e678678 Overdue code review for transaction-level advisory locks patch.
Commit 62c7bd31c8 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees.  More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally.  This fix therefore
removes that flag altogether.  We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned.  Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.

PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object.  Perhaps it will be
worth improving that someday.

Assorted other minor cleanup and documentation editing, as well.

Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.
2012-05-04 17:43:35 -04:00
Heikki Linnakangas
86073a2b7a Correctly detect SSI conflicts of prepared transactions after crash.
A prepared transaction can get new conflicts in and out after preparing, so
we cannot rely on the in- and out-flags stored in the statefile at prepare-
time. As a quick fix, make the conservative assumption that after a restart,
all prepared transactions are considered to have both in- and out-conflicts.
That can lead to unnecessary rollbacks after a crash, but that shouldn't be
a big problem in practice; you don't want prepared transactions to hang
around for a long time anyway.

Dan Ports
2012-02-29 15:43:43 +02:00
Simon Riggs
8572cc495c Resolve timing issue with logging locks for Hot Standby.
We log AccessExclusiveLocks for replay onto standby nodes,
but because of timing issues on ProcArray it is possible to
log a lock that is still held by a just committed transaction
that is very soon to be removed. To avoid any timing issue we
avoid applying locks made by transactions with InvalidXid.

Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee
2012-02-01 09:31:07 +00:00
Heikki Linnakangas
02f377dbe5 Fix corner case in cleanup of transactions using SSI.
When the only remaining active transactions are READ ONLY, we do a "partial
cleanup" of committed transactions because certain types of conflicts
aren't possible anymore. For committed r/w transactions, we release the
SIREAD locks but keep the SERIALIZABLEXACT. However, for committed r/o
transactions, we can go further and release the SERIALIZABLEXACT too. The
problem was with the latter case: we were returning the SERIALIZABLEXACT to
the free list without removing it from the finished list.

The only real change in the patch is the SHMQueueDelete line, but I also
reworked some of the surrounding code to make it obvious that r/o and r/w
transactions are handled differently -- the existing code felt a bit too
clever.

Dan Ports
2012-01-18 17:12:58 +02:00
Robert Haas
f9f0484504 Fix variable confusion in BufferSync().
As noted by Heikki Linnakangas, the previous coding confused the "flags"
variable with the "mask" variable.  The affect of this appears to be that
unlogged buffers would get written out at every checkpoint rather than
only at shutdown time.  Although that's arguably an acceptable failure
mode, I'm back-patching this change, since it seems like a poor idea to
rely on this happening to work.
2012-01-06 08:35:41 -05:00
Tom Lane
a63a7a5b09 Avoid crashing when we have problems unlinking files post-commit.
smgrdounlink takes care to not throw an ERROR if it fails to unlink
something, but that caution was rendered useless by commit
3396000684, which put an smgrexists call in
front of it; smgrexists *does* throw error if anything looks funny, such
as getting a permissions error from trying to open the file.  If that
happens post-commit, you get a PANIC, and what's worse the same logic
appears in the WAL replay code, so the database even fails to restart.

Restore the intended behavior by removing the smgrexists call --- it isn't
accomplishing anything that we can't do better by adjusting mdunlink's
ideas of whether it ought to warn about ENOENT or not.

Per report from Joseph Shraibman of unrecoverable crash after trying to
drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
Backpatch to 8.4, where the bogus coding was introduced.
2011-12-20 15:00:41 -05:00
Tom Lane
590ceed6f2 Avoid floating-point underflow while tracking buffer allocation rate.
When the system is idle for awhile after activity, the "smoothed_alloc"
state variable in BgBufferSync converges slowly to zero.  With standard
IEEE float arithmetic this results in several iterations with denormalized
values, which causes kernel traps and annoying log messages on some
poorly-designed platforms.  There's no real need to track such small values
of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero
as soon as it's too small to be interesting for our purposes.  This issue
is purely cosmetic, since the iterations don't happen fast enough for the
kernel traps to pose any meaningful performance problem, but still it seems
worth shutting up the log messages.

The kernel log messages were previously reported by a number of people,
but kudos to Greg Matthews for tracking down exactly where they were coming
from.
2011-11-19 00:35:59 -05:00
Simon Riggs
bf70bf4c71 Derive oldestActiveXid at correct time for Hot Standby.
There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.

Bug report by Chris Redekop
2011-11-02 08:53:40 +00:00
Simon Riggs
93b915b3d1 Start Hot Standby faster when initial snapshot is incomplete.
If the initial snapshot had overflowed then we can start whenever
the latest snapshot is empty, not overflowed or as we did already,
start when the xmin on primary was higher than xmax of our starting
snapshot, which proves we have full snapshot data.

Bug report by Chris Redekop
2011-11-02 08:46:11 +00:00
Tom Lane
d1d094e4cf Simplify and improve ProcessStandbyHSFeedbackMessage logic.
There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin.  So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter.  Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had.  Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.

Also improve the comments about this in GetOldestXmin itself.
2011-10-20 19:43:45 -04:00
Peter Eisentraut
90ffdf83d2 Add "Reason code" prefix to internal SSI error messages
This makes it clearer that the error message is perhaps not supposed
to be understood by users, and it also makes it somewhat clearer that
it was not accidentally omitted from translation.

Idea from Heikki Linnakangas, except that we don't mark "Reason code"
for translation at this point, because that would make the
implementation too cumbersome.
2011-08-15 15:24:44 +03:00
Tom Lane
989f530d3f Back-patch assorted latch-related fixes.
Fix a whole bunch of signal handlers that had been hacked to do things that
might change errno, without adding the necessary save/restore logic for
errno.  Also make some minor fixes in unix_latch.c, and clean up bizarre
and unsafe scheme for disowning the process's latch.  While at it, rename
the PGPROC latch field to procLatch for consistency with 9.2.

Issues noted while reviewing a patch by Peter Geoghegan.
2011-08-10 12:20:45 -04:00
Tom Lane
6760a4d402 Documentation improvement and minor code cleanups for the latch facility.
Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code.  Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them.  In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.
2011-08-09 15:30:51 -04:00
Tom Lane
1318f1ad77 Move CheckRecoveryConflictDeadlock() call to a safer place.
This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown.  This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.

Fixes breakage induced by commit c85c941470.
Back-patch to all affected branches.
2011-08-02 15:16:37 -04:00
Tom Lane
0dd6a09e3d Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId.
It was initialized in the wrong place and to the wrong value.  With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should a HS backend be holding pin on buffer number 1
while trying to acquire a lock.
2011-08-02 13:24:00 -04:00
Peter Eisentraut
42221b597e Minor message style adjustment 2011-07-27 23:55:32 +03:00
Peter Eisentraut
e4ce3b7775 Change debug message from ereport to elog 2011-07-19 07:51:17 +03:00
Tom Lane
fbd847a587 Replace errdetail("%s", ...) with errdetail_internal("%s", ...).
There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case.  This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.
2011-07-16 14:22:35 -04:00
Tom Lane
3b41d490bb Use errdetail_internal() for SSI transaction cancellation details.
Per discussion, these seem too technical to be worth translating.

Kevin Grittner
2011-07-16 14:22:33 -04:00
Heikki Linnakangas
ec194651b8 Fix one overflow and one signedness error, caused by the patch to calculate
OLDSERXID_MAX_PAGE based on BLCKSZ. MSVC compiler warned about these.
2011-07-08 17:33:46 +03:00
Heikki Linnakangas
9b47c67ccd There's a small window wherein a transaction is committed but not yet
on the finished list, and we shouldn't flag it as a potential conflict
if so. We can also skip adding a doomed transaction to the list of
possible conflicts because we know it won't commit.

Dan Ports and Kevin Grittner.
2011-07-08 00:39:40 +03:00
Heikki Linnakangas
fdf8f751e7 SSI has a race condition, where the order of commit sequence numbers of
transactions might not match the order the work done in those transactions
become visible to others. The logic in SSI, however, assumed that it does.
Fix that by having two sequence numbers for each serializable transaction,
one taken before a transaction becomes visible to others, and one after it.
This is easier than trying to make the the transition totally atomic, which
would require holding ProcArrayLock and SerializableXactHashLock at the same
time. By using prepareSeqNo instead of commitSeqNo in a few places where
commit sequence numbers are compared, we can make those comparisons err on
the safe side when we don't know for sure which committed first.

Per analysis by Kevin Grittner and Dan Ports, but this approach to fix it
is different from the original patch.
2011-07-07 23:32:25 +03:00
Robert Haas
a392b52478 Adjust OLDSERXID_MAX_PAGE based on BLCKSZ.
The value when BLCKSZ = 8192 is unchanged, but with larger-than-normal
block sizes we might need to crank things back a bit, as we'll have
more entries per page than normal in that case.

Kevin Grittner
2011-07-07 15:09:17 -04:00
Heikki Linnakangas
046d52f7d3 Fix a bug with SSI and prepared transactions:
If there's a dangerous structure T0 ---> T1 ---> T2, and T2 commits first,
we need to abort something. If T2 commits before both conflicts appear,
then it should be caught by OnConflict_CheckForSerializationFailure. If
both conflicts appear before T2 commits, it should be caught by
PreCommit_CheckForSerializationFailure. But that is actually run when
T2 *prepares*. Fix that in OnConflict_CheckForSerializationFailure, by
treating a prepared T2 as if it committed already.

This is mostly a problem for prepared transactions, which are in prepared
state for some time, but also for regular transactions because they also go
through the prepared state in the SSI code for a short moment when they're
committed.

Kevin Grittner and Dan Ports
2011-07-07 18:13:13 +03:00
Peter Eisentraut
06f04b6dc4 Message style tweaks 2011-07-05 00:17:25 +03:00
Peter Eisentraut
6b23ba1093 Unify spelling of "canceled", "canceling", "cancellation"
We had previously (af26857a27)
established the U.S. spellings as standard.
2011-07-02 23:30:01 +03:00
Tom Lane
1c20ba4f41 Undo overly enthusiastic de-const-ification.
s/const//g wasn't exactly what I was suggesting here ... parameter
declarations of the form "const structtype *param" are good and useful,
so put those occurrences back.  Likewise, avoid casting away the const
in a "const void *" parameter.
2011-06-22 23:05:06 -04:00
Heikki Linnakangas
fbaa7a23e4 Remove pointless const qualifiers from function arguments in the SSI code.
As Tom Lane pointed out, "const Relation foo" doesn't guarantee that you
can't modify the data the "foo" pointer points to. It just means that you
can't change the pointer to point to something else within the function,
which is not very useful.
2011-06-22 12:21:34 +03:00
Tom Lane
0272fd1c00 Minor editing for README-SSI.
Fix some grammatical issues, try to clarify a couple of proofs, make the
terminology more consistent.
2011-06-21 18:01:34 -04:00
Heikki Linnakangas
390c52131b Fix bug in PreCommit_CheckForSerializationFailure. A transaction that has
already been marked as PREPARED cannot be killed. Kill the current
transaction instead.

One of the prepared_xacts regression tests actually hits this bug. I
removed the anomaly from the duplicate-gids test so that it fails in the
intended way, and added a new test to check serialization failures with
a prepared transaction.

Dan Ports
2011-06-21 15:02:32 +03:00
Heikki Linnakangas
0d905db20b Fix bug introduced by recent SSI patch to merge ROLLED_BACK and
MARKED_FOR_DEATH flags into one. We still need the ROLLED_BACK flag to
mark transactions that are in the process of being rolled back. To be
precise, ROLLED_BACK now means that a transaction has already been
discounted from the count of transactions with the oldest xmin, but not
yet removed from the list of active transactions.

Dan Ports
2011-06-21 15:02:26 +03:00
Peter Eisentraut
365c72f205 Capitalization fixes 2011-06-19 00:39:19 +03:00
Heikki Linnakangas
6b339a1ade Update README-SSI. Add a section to describe the "dangerous structure" that
SSI is based on, as well as the optimizations about relative commit times
and read-only transactions. Plus a bunch of other misc fixes and
improvements.

Dan Ports
2011-06-16 21:20:47 +03:00
Heikki Linnakangas
9131bc772b pgindent run of recent SSI changes. Also, remove an unnecessary #include.
Kevin Grittner
2011-06-16 16:17:16 +03:00
Heikki Linnakangas
2b44a2d62d The rolled-back flag on serializable xacts was pointless and redundant with
the marked-for-death flag. It was only set for a fleeting moment while a
transaction was being cleaned up at rollback. All the places that checked
for the rolled-back flag should also check the marked-for-death flag, as
both flags mean that the transaction will roll back. I also renamed the
marked-for-death into "doomed", which is a lot shorter name.
2011-06-15 13:40:24 +03:00
Heikki Linnakangas
ff4e078773 Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC
snapshots, like in REINDEX, are basically non-transactional operations. The
DDL operation itself might participate in SSI, but there's separate
functions for that.

Kevin Grittner and Dan Ports, with some changes by me.
2011-06-15 12:12:56 +03:00
Heikki Linnakangas
5de9c165f0 Remove now-unnecessary casts.
Kevin Grittner
2011-06-12 22:51:39 +03:00
Heikki Linnakangas
cb2d158c58 Fix locking while setting flags in MySerializableXact.
Even if a flag is modified only by the backend owning the transaction, it's
not safe to modify it without a lock. Another backend might be setting or
clearing a different flag in the flags field concurrently, and that
operation might be lost because setting or clearing a bit in a word is not
atomic.

Make did-write flag a simple backend-private boolean variable, because it
was only set or tested in the owning backend (except when committing a
prepared transaction, but it's not worthwhile to optimize for the case of a
read-only prepared transaction). This also eliminates the need to add
locking where that flag is set.

Also, set the did-write flag when doing DDL operations like DROP TABLE or
TRUNCATE -- that was missed earlier.
2011-06-10 23:41:10 +03:00
Alvaro Herrera
fba105b109 Use "transient" files for blind writes, take 2
"Blind writes" are a mechanism to push buffers down to disk when
evicting them; since they may belong to different databases than the one
a backend is connected to, the backend does not necessarily have a
relation to link them to, and thus no way to blow them away.  We were
keeping those files open indefinitely, which would cause a problem if
the underlying table was deleted, because the operating system would not
be able to reclaim the disk space used by those files.

To fix, have bufmgr mark such files as transient to smgr; the lower
layer is allowed to close the file descriptor when the current
transaction ends.  We must be careful to have any other access of the
file to remove the transient markings, to prevent unnecessary expensive
system calls when evicting buffers belonging to our own database (which
files we're likely to require again soon.)

This commit fixes a bug in the previous one, which neglected to cleanly
handle the LRU ring that fd.c uses to manage open files, and caused an
unacceptable failure just before beta2 and was thus reverted.
2011-06-10 13:43:02 -04:00
Alvaro Herrera
3d114b63b2 Use a constant sprintf format to silence compiler warning 2011-06-10 13:38:50 -04:00
Heikki Linnakangas
c79c570bd8 Small comment fixes and enhancements. 2011-06-10 17:22:46 +03:00
Alvaro Herrera
9261557eb1 Revert "Use "transient" files for blind writes"
This reverts commit 54d9e8c6c1, which
caused a failure on the buildfarm.  Not a good thing to have just before
a beta release.
2011-06-09 16:41:44 -04:00
Alvaro Herrera
54d9e8c6c1 Use "transient" files for blind writes
"Blind writes" are a mechanism to push buffers down to disk when
evicting them; since they may belong to different databases than the one
a backend is connected to, the backend does not necessarily have a
relation to link them to, and thus no way to blow them away.  We were
keeping those files open indefinitely, which would cause a problem if
the underlying table was deleted, because the operating system would not
be able to reclaim the disk space used by those files.

To fix, have bufmgr mark such files as transient to smgr; the lower
layer is allowed to close the file descriptor when the current
transaction ends.  We must be careful to have any other access of the
file to remove the transient markings, to prevent unnecessary expensive
system calls when evicting buffers belonging to our own database (which
files we're likely to require again soon.)
2011-06-09 16:25:49 -04:00