enabled) and autovacuum is on. Since there will be a steady stream of autovac
worker processes exiting and dropping gmon.out files, allowing them to make
separate subdirectories results in serious bloat; and it seems unlikely that
anyone will care about those profiles anyway. Limit the damage by forcing all
autovac workers to dump in one subdirectory, PGDATA/gprof/avworker/.
Per report from Jrg Beyer and subsequent discussion.
having several of them. Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).
Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
later than latestCompletedXid, per Florian Pflug. Also some minor
improvements in the XIDCACHE_DEBUG code --- make sure each call of
TransactionIdIsInProgress is counted one way or another.
For XIDs of our own transaction and subtransactions, it's cheaper to ask
TransactionIdIsCurrentTransactionId() than to look in shared memory.
Also, the xids[] work array is always the same size within any given
process, so malloc it just once instead of doing a palloc/pfree on every
call; aside from being faster this lets us get rid of some goto's, since
we no longer have any end-of-function pfree to do. Both ideas by Heikki.
ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort. Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide. Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time. Since XidGenLock is sometimes held across I/O this can be a
significant win. Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench. Concept by Florian Pflug, implementation
by Tom Lane.
no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway. Per discussion. Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.
Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.
that examine fields that could change under them. This is just to make
really sure that when we are fetching a value 'only once', that's what
actually happens. Possibly this is a bug that should be back-patched,
but in the absence of solid evidence that it's needed, I won't bother.
rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.
Florian Pflug, with some editorialization by Tom.
continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.
Patch from ITAGAKI Takahiro, with some editorializing of my own.
which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID"? Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.
will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.
are in their commit critical sections via flags in the ProcArray. Checkpoint
can watch the ProcArray to determine when it's safe to proceed. This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits. Heikki, with
some kibitzing from Tom.
pointer" in every Snapshot struct. This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though). More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.
There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.
Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots. I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches. Might want to reconsider
doing that later.
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
accessing it, like DROP DATABASE. This allows the regression tests to pass
with autovacuum enabled, which open the gates for finally enabling autovacuum
by default.
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
modules; the first try was not usable in EXEC_BACKEND builds (e.g.,
Windows). Instead, just provide some entry points to increase the
allocation requests during postmaster start, and provide a dedicated
LWLock that can be used to synchronize allocation operations performed
by backends. Per discussion with Marc Munro.
to performance. (A wholesale effort to get rid of strncpy should be
undertaken sometime, but not during beta.) This commit also fixes dynahash.c
to correctly truncate overlength string keys for hashtables, so that its
callers don't have to anymore.
PGPROC array into snapshots, and use this information to avoid visits
to pg_subtrans in HeapTupleSatisfiesSnapshot. This appears to solve
the pg_subtrans-related context swap storm problem that's been reported
by several people for 8.1. While at it, modify GetSnapshotData to not
take an exclusive lock on ProcArrayLock, as closer analysis shows that
shared lock is always sufficient.
Itagaki Takahiro and Tom Lane
vacuums. This allows a OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.
Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.
to the low-order bits of the entry hash value. Also make some incidental
cleanups in the dynahash API, such as not exporting the hash header
structs to the world.
current commands; instead, store current-status information in shared
memory. This substantially reduces the overhead of stats_command_string
and also ensures that pg_stat_activity is fully up to date at all times.
Per my recent proposal.
into a single mostly-physical-order scan of the index. This requires some
ticklish interlocking considerations, but should create no material
performance impact on normal index operations (at least given the
already-committed changes to make scans work a page at a time). VACUUM
itself should get significantly faster in any index that's degenerated to a
very nonlinear page order. Also, we save one pass over the index entirely,
except in the case where there were no deletions to do and so only one pass
happened anyway.
Original patch by Heikki Linnakangas, rework by Tom Lane.
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
in favor of having just one set of macros that don't do HOLD/RESUME_INTERRUPTS
(hence, these correspond to the old SpinLockAcquire_NoHoldoff case).
Given our coding rules for spinlock use, there is no reason to allow
CHECK_FOR_INTERRUPTS to be done while holding a spinlock, and also there
is no situation where ImmediateInterruptOK will be true while holding a
spinlock. Therefore doing HOLD/RESUME_INTERRUPTS while taking/releasing a
spinlock is just a waste of cycles. Qingqing Zhou and Tom Lane.
file. The original code probed the PGPROC array separately for each PID,
which was not good for large numbers of backends: not only is the runtime
O(N^2) but most of it is spent holding ProcArrayLock. Instead, take the
lock just once and copy the active PIDs into an array, then use qsort
and bsearch so that the lookup time is more like O(N log N).
reduce contention for the former single LockMgrLock. Per my recent
proposal. I set it up for 16 partitions, but on a pgbench test this
gives only a marginal further improvement over 4 partitions --- we need
to test more scenarios to choose the number of partitions.
the data defining the semantics of a lock method (ie, conflict resolution
table and ancillary data, which is all constant) and the hash tables
storing the current state. The only thing we give up by this is the
ability to use separate hashtables for different lock methods, but there
is no need for that anyway. Put some extra fields into the LockMethod
definition structs to clean up some other uglinesses, like hard-wired
tests for DEFAULT_LOCKMETHOD and USER_LOCKMETHOD. This commit doesn't
do anything about the performance issues we were discussing, but it clears
away some of the underbrush that's in the way of fixing that.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.