Instead of entering them on transaction startup, we materialize them
only when someone wants to wait, which will occur only during CREATE
INDEX CONCURRENTLY. In Hot Standby mode, the startup process must also
be able to probe for conflicting VXID locks, but the lock need never be
fully materialized, because the startup process does not use the normal
lock wait mechanism. Since most VXID locks never need to touch the
lock manager partition locks, this can significantly reduce blocking
contention on read-heavy workloads.
Patch by me. Review by Jeff Davis.
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
backend. If so, send a LOG message to the postmaster log, and if the table
is beyond the vacuum-for-wraparound horizon, forcibly drop it. Per recent
discussions. Perhaps we ought to back-patch this, but it probably needs
to age a bit in HEAD first.
unnecessary cache resets. The major changes are:
* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.
* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't already
have a signal outstanding for it. When it finishes catching up, it will
in turn signal SIGUSR1 to the next-furthest-back guy, if there is one that
is far enough behind to justify a signal. The PMSIGNAL_WAKEN_CHILDREN
mechanism is removed.
* We don't attempt to clean out dead messages after every message-receipt
operation; rather, we do it on the insertion side, and only when the queue
fullness passes certain thresholds.
* Split SInvalLock into SInvalReadLock and SInvalWriteLock so that readers
don't block writers nor vice versa (except during the infrequent queue
cleanout operations).
* Transfer multiple sinval messages for each acquisition of a read or
write lock.
deals with the queue, including locking etc, is all in sinvaladt.c. This means
that the struct definition of the queue, and the queue pointer, are now
internal "implementation details" inside sinvaladt.c.
Per my proposal dated 25-Jun-2007 and followup discussion.
rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.
Florian Pflug, with some editorialization by Tom.
to 'Size' (that is, size_t), and install overflow detection checks in it.
This allows us to remove the former arbitrary restrictions on NBuffers
etc. It won't make any difference in a 32-bit machine, but in a 64-bit
machine you could theoretically have terabytes of shared buffers.
(How efficiently we could manage 'em remains to be seen.) Similarly,
num_temp_buffers, work_mem, and maintenance_work_mem can be set above
2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional
work by moi.
communication structure, and make it its own module with its own lock.
This should reduce contention at least a little, and it definitely makes
the code seem cleaner. Per my recent proposal.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
connections by the superuser only.
This patch replaces the last patch I sent a couple of days ago.
It closes a connection that has not been authorised by a superuser if it would
leave less than the GUC variable ReservedBackends
(superuser_reserved_connections in postgres.conf) backend process slots free
in the SISeg. This differs to the first patch which only reserved the last
ReservedBackends slots in the procState array. This has made the free slot
test more expensive due to the use of a lock.
After thinking about a comment on the first patch I've also made it a fatal
error if the number of reserved slots is not less than the maximum number of
connections.
Nigel J. Andrews
> Changes to avoid collisions with WIN32 & MFC names...
> 1. Renamed:
> a. PROC => PGPROC
> b. GetUserName() => GetUserNameFromId()
> c. GetCurrentTime() => GetCurrentDateTime()
> d. IGNORE => IGNORE_DTF in include/utils/datetime.h & utils/adt/datetim
>
> 2. Added _P to some lex/yacc tokens:
> CONST, CHAR, DELETE, FLOAT, GROUP, IN, OUT
Jan
SI messages now include the relevant database OID, so that operations
in one database do not cause useless cache flushes in backends attached
to other databases. Declare SI messages properly using a union, to
eliminate the former assumption that Oid is the same size as int or Index.
Rewrite the nearly-unreadable code in inval.c, and document it better.
Arrange for catcache flushes at end of command/transaction to happen before
relcache flushes do --- this avoids loading a new tuple into the catcache
while setting up new relcache entry, only to have it be flushed again
immediately.
IPC key assignment will now work correctly even when multiple postmasters
are using same logical port number (which is possible given -k switch).
There is only one shared-mem segment per postmaster now, not 3.
Rip out broken code for non-TAS case in bufmgr and xlog, substitute a
complete S_LOCK emulation using semaphores in spin.c. TAS and non-TAS
logic is now exactly the same.
When deadlock is detected, "Deadlock detected" is now the elog(ERROR)
message, rather than a NOTICE that comes out before an unhelpful ERROR.
that search loops only have to scan that far and not through all maxBackends
entries. This eliminates a performance penalty for setting maxBackends
much higher than the average number of active backends. Also, eliminate
no-longer-used 'backend tag' concept. Remove setting of environment
variables at backend start (except for CYR_RECODE), since none of them
are being examined by the backend any longer.
* Buffer refcount cleanup (per my "progress report" to pghackers, 9/22).
* Add links to backend PROC structs to sinval's array of per-backend info,
and use these links for routines that need to check the state of all
backends (rather than the slow, complicated search of the ShmemIndex
hashtable that was used before). Add databaseOID to PROC structs.
* Use this to implement an interlock that prevents DESTROY DATABASE of
a database containing running backends. (It's a little tricky to prevent
a concurrently-starting backend from getting in there, since the new
backend is not able to lock anything at the time it tries to look up
its database in pg_database. My solution is to recheck that the DB is
OK at the end of InitPostgres. It may not be a 100% solution, but it's
a lot better than no interlock at all...)
* In ALTER TABLE RENAME, flush buffers for the relation before doing the
rename of the physical files, to ensure we don't get failures later from
mdblindwrt().
* Update TRUNCATE patch so that it actually compiles against current
sources :-(.
You should do "make clean all" after pulling these changes.
offended my aesthestic sensibility that there was so much unreadable code
doing so little. Rewritten code is about half the size, faster, and
(I hope) much more intelligible.
the SInval spinlock while it is calling the passed invalFunction or
resetFunction. This is necessary to avoid deadlock with lmgr change;
InvalidateSharedInvalid can be called recursively now. It should be
a good performance improvement anyway --- holding a spinlock for more
than a very short interval is a no-no.
through MAXBACKENDS array entries used to be fine when MAXBACKENDS = 64.
It's not so cool with MAXBACKENDS = 1024 (or more!), especially not in a
frequently-used routine like SIDelExpiredDataEntries. Repair by making
procState array size be the soft MaxBackends limit rather than the hard
limit, and by converting SIGetProcStateLimit() to a macro.