was broken for a replication connection and no messages were
displayed on either standby or primary, at any debug level.
Connection messages needed to diagnose session drop/reconnect
events. Use LOG mode for now, discuss lowering in later releases.
The purpose of this change is to eliminate the need for every caller
of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
GetSysCacheOid, and SearchSysCacheList to know the maximum number
of allowable keys for a syscache entry (currently 4). This will
make it far easier to increase the maximum number of keys in a
future release should we choose to do so, and it makes the code
shorter, too.
Design and review by Tom Lane.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.
Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.
Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.
Fujii Masao, with additional hacking by me
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
Create a new catalog pg_db_role_setting where they are now stored, and better
encapsulate the code that deals with settings into its realm. The old
datconfig and rolconfig columns are removed.
psql has gained a \drds command to display the settings.
Backwards compatibility warning: while the backwards-compatible system views
still have the config columns, they no longer completely represent the
configuration for a user or database.
Catalog version bumped.
to fix the problem that SetClientEncoding needs to be done before
InitializeClientEncoding, as reported by Zdenek Kotala. We get at least
the small consolation of being able to remove the bizarre API detail that
had InitPostgres returning whether user is a superuser.
via the "flat files" facility. This requires making it enough like a backend
to be able to run transactions; it's no longer an "auxiliary process" but
more like the autovacuum worker processes. Also, its signal handling has
to be brought into line with backends/workers. In particular, since it
now has to handle procsignal.c processing, the special autovac-launcher-only
signal conditions are moved to SIGUSR2.
Alvaro, with some cleanup from Tom
(That flat file is now completely useless, but removal will come later.)
To do this, postpone client authentication into the startup transaction
that's run by InitPostgres. We still collect the startup packet and do
SSL initialization (if needed) at the same time we did before. The
AuthenticationTimeout is applied separately to startup packet collection
and the actual authentication cycle. (This is a bit annoying, since it
means a couple extra syscalls; but the signal handling requirements inside
and outside a transaction are sufficiently different that it seems best
to treat the timeouts as completely independent.)
A small security disadvantage is that if the given database name is invalid,
this will be reported to the client before any authentication happens.
We could work around that by connecting to database "postgres" instead,
but consensus seems to be that it's not worth introducing such surprising
behavior.
Processing of all command-line switches and GUC options received from the
client is now postponed until after authentication. This means that
PostAuthDelay is much less useful than it used to be --- if you need to
investigate problems during InitPostgres you'll have to set PreAuthDelay
instead. However, allowing an unauthenticated user to set any GUC options
whatever seems a bit too risky, so we'll live with that.
To make this work in the base case, pg_database now has a nailed-in-cache
relation descriptor that is initialized using hardwired knowledge in
relcache.c. This means pg_database is added to the set of relations that
need to have a Schema_pg_xxx macro maintained in pg_attribute.h. When this
path is taken, we'll have to do a seqscan of pg_database to find the row
we need.
In the normal case, we are able to do an indexscan to find the database's row
by name. This is made possible by storing a global relcache init file that
describes only the shared catalogs and their indexes (and therefore is usable
by all backends in any database). A new backend loads this cache file,
finds its database OID after an indexscan on pg_database, and then loads
the local relcache init file for that database.
This change should effectively eliminate number of databases as a factor
in backend startup time, even with large numbers of databases. However,
the real reason for doing it is as a first step towards getting rid of
the flat files altogether. There are still several other sub-projects
to be tackled before that can happen.
This patch gets us out from under the Unix limitation of two user-defined
signal types. We already had done something similar for signals directed to
the postmaster process; this adds multiplexing for signals directed to
backends and auxiliary processes (so long as they're connected to shared
memory).
As proof of concept, replace the former usage of SIGUSR1 and SIGUSR2
for backends with use of the multiplexing mechanism. There are still some
hard-wired definitions of SIGUSR1 and SIGUSR2 for other process types,
but getting rid of those doesn't seem interesting at the moment.
Fujii Masao
Otherwise, the LC_CTYPE/COLLATE setting gets reverted when using plperl, which
leads to incorrect query results and index corruption.
This was accidentally broken in the per-database locale patch in 8.4. Pointed
out by Andrew Gierth.
already did that on Windows, but it's needed on other platforms too when
LC_CTYPE=C. With other locales, we enforce (or trust) that the codeset of
the locale matches the server encoding so we don't need to bind it
explicitly. It should do no harm in that case either, but I don't have
full faith in the PG encoding -> OS codeset mapping table yet. Per recent
discussion on pgsql-hackers.
its usual buffer cleaning duties during archive recovery, and it's responsible
for performing restartpoints.
This requires some changes in postmaster. When the startup process has done
all the initialization and is ready to start WAL redo, it signals the
postmaster to launch the background writer. The postmaster is signaled again
when the point in recovery is reached where we know that the database is in
consistent state. Postmaster isn't interested in that at the moment, but
that's the point where we could let other backends in to perform read-only
queries. The postmaster is signaled third time when the recovery has ended,
so that postmaster knows that it's safe to start accepting connections.
The startup process now traps SIGTERM, and performs a "clean" shutdown. If
you do a fast shutdown during recovery, a shutdown restartpoint is performed,
like a shutdown checkpoint, and postmaster kills the processes cleanly. You
still have to continue the recovery at next startup, though.
Currently, the background writer is only launched during archive recovery.
We could launch it during crash recovery as well, but it seems better to keep
that codepath as simple as possible, for the sake of robustness. And it
couldn't do any restartpoints during crash recovery anyway, so it wouldn't be
that useful.
log_restartpoints is gone. Use log_checkpoints instead. This is yet to be
documented.
This whole operation is a pre-requisite for Hot Standby, but has some value of
its own whether the hot standby patch makes 8.4 or not.
Simon Riggs, with lots of modifications by me.
ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.
This is a stripped-down version of Radek Strnad's patch, with further
changes by me.
GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
depend on the latter being correctly set elsewhere, and while it is more
expensive, this code path is not performance-critical. This is a real
risk for autovacuum, because it can execute whole cycles without doing
a single vacuum, which would mean that RecentGlobalXmin would stay at its
initialization value, FirstNormalTransactionId, causing a bogus value to be
inserted in pg_database. This bug could explain some recent reports of
failure to truncate pg_clog.
At the same time, change the initialization of RecentGlobalXmin to
InvalidTransactionId, and ensure that it's set to something else whenever
it's going to be used. Using it as FirstNormalTransactionId in HOT page
pruning could incur in data loss. InitPostgres takes care of setting it
to a valid value, but the extra checks are there to prevent "special"
backends from behaving in unusual ways.
Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
do CancelBackup at a sane place, fix some oversights in the state transitions,
allow only superusers to connect while we are waiting for backup mode to end.
deals with the queue, including locking etc, is all in sinvaladt.c. This means
that the struct definition of the queue, and the queue pointer, are now
internal "implementation details" inside sinvaladt.c.
Per my proposal dated 25-Jun-2007 and followup discussion.
transaction, unless rolled back or overridden by a SET clause for the same
variable attached to a surrounding function call. Per discussion, these
seem the best semantics. Note that this is an INCOMPATIBLE CHANGE: in 8.0
through 8.2, SET LOCAL's effects disappeared at subtransaction commit
(leading to behavior that made little sense at the SQL level).
I took advantage of the opportunity to rewrite and simplify the GUC variable
save/restore logic a little bit. The old idea of a "tentative" value is gone;
it was a hangover from before we had a stack. Also, we no longer need a stack
entry for every nesting level, but only for those in which a variable's value
actually changed.
There are still some loose ends: I didn't do anything about the SET FROM
CURRENT idea yet, and it's not real clear whether we are happy with the
interaction of SET LOCAL with function-local settings. The documentation
is a bit spartan, too.
or abort within a backend; rearrange InitPostgres processing to make it so.
Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
why the core regression tests didn't expose it?). This possibly is another
reason for missing stats updates ...
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan). This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.
Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too. Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
The workers do the actual vacuum work. This allows for future improvements,
like allowing multiple autovacuum jobs running in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
contrib functionality. Along the way, remove the USER_LOCKS configuration
symbol, since it no longer makes any sense to try to compile that out.
No user documentation yet ... mmoncure has promised to write some.
Thanks to Abhijit Menon-Sen for creating a first draft to work from.
it's not necessary to have three separate calls anymore. This patch also
fixes things so we don't try to read pg_internal.init until after we've
obtained lock on the target database; which was fairly harmless, but it's
certainly cleaner this way.
The former approach used ExclusiveLock on pg_database, which being a
cluster-wide lock meant only one of these operations could proceed at
a time; worse, it also blocked all incoming connections in ReverifyMyDatabase.
Now that we have LockSharedObject(), we can use locks of different types
applied to databases considered as objects. This allows much more
flexible management of the interlocking: two CREATE DATABASEs need not
block each other, and need not block connections except to the template
database being used. Similarly DROP DATABASE doesn't block unrelated
operations. The locking used in flatfiles.c is also much narrower in
scope than before. Per recent proposal.
in various places that were previously doing ad hoc pg_database searches.
This may speed up database-related privilege checks a little bit, but
the main motivation is to eliminate the performance reason for having
ReverifyMyDatabase do such a lot of stuff (viz, avoiding repeat scans
of pg_database during backend startup). The locking reason for having
that routine is about to go away, and it'd be good to have the option
to break it up.
This commit doesn't make much functional change, but it does eliminate some
duplicated code --- for instance, PageIsNew tests are now done inside
XLogReadBuffer rather than by each caller.
The GIST xlog code still needs a lot of love, but I'll worry about that
separately.
an LWLock instead of a spinlock. This hardly matters on Unix machines
but should improve startup performance on Windows (or any port using
EXEC_BACKEND). Per previous discussion.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.