Previously, this prevented promoted standby servers from being upgraded
because of a missing WAL history file. (Timeline 1 doesn't need a
history file, and we don't copy WAL files anyway.)
Report by Christian Echerer(?), Alexey Klyukin
Backpatch through 9.0
This patch causes pg_upgrade to error out during its check phase if:
(1) template0 is marked connectable
or
(2) any other database is marked non-connectable
This is done because, in the first case, pg_upgrade would fail because
the pg_dumpall --globals restore would fail, and in the second case, the
database would not be restored, leading to data loss.
Report by Matt Landry (1), Stephen Frost (2)
Backpatch through 9.0
DST law changes in Egypt, Mongolia, Palestine.
Historical corrections for Canada and Chile.
Revised zone abbreviation for America/Adak (HST/HDT not HAST/HADT).
Commit 81c45081 introduced a new RBM_ZERO_AND_LOCK mode to ReadBuffer, which
takes a lock on the buffer before zeroing it. However, you cannot take a
lock on a local buffer, and you got a segfault instead. The version of that
patch committed to master included a check for !isLocalBuf, and therefore
didn't crash, but oddly I missed that in the back-patched versions. This
patch adds that check to the back-branches too.
RBM_ZERO_AND_LOCK mode is only used during WAL replay, and in hash indexes.
WAL replay only deals with shared buffers, so the only way to trigger the
bug is with a temporary hash index.
Reported by Artem Ignatyev, analysis by Tom Lane.
If a row that potentially violates a deferred exclusion constraint is
HOT-updated later in the same transaction, the exclusion constraint would
be reported as violated when the check finally occurs, even if the row(s)
the new row originally conflicted with have since been removed. This
happened because the wrong TID was passed to check_exclusion_constraint(),
causing the live HOT-updated row to be seen as a conflicting row rather
than recognized as the row-under-test.
Per bug #13148 from Evan Martin. It's been broken since exclusion
constraints were invented, so back-patch to all supported branches.
Analysis by Noah Misch shows that the 25% threshold set by commit
53bb309d2d is lower than any other,
similar autovac threshold. While we don't know exactly what value
will be optimal for all users, it is better to err a little on the
high side than on the low side. A higher value increases the risk
that users might exhaust the available space and start seeing errors
before autovacuum can clean things up sufficiently, but a user who
hits that problem can compensate for it by reducing
autovacuum_multixact_freeze_max_age to a value dependent on their
average multixact size. On the flip side, if the emergency cap
imposed by that patch kicks in too early, the user will experience
excessive wraparound scanning and will be unable to mitigate that
problem by configuration. The new value will hopefully reduce the
risk of such bad experiences while still providing enough headroom
to avoid multixact member exhaustion for most users.
Along the way, adjust the documentation to reflect the effects of
commit 04e6d3b877, which taught
autovacuum to run for multixact wraparound even when autovacuum
is configured off.
Commit b69bf30b9b advanced the stop point
at vacuum time, but this has subsequently been shown to be unsafe as a
result of analysis by myself and Thomas Munro and testing by Thomas
Munro. The crux of the problem is that the SLRU deletion logic may
get confused about what to remove if, at exactly the right time during
the checkpoint process, the head of the SLRU crosses what used to be
the tail.
This patch, by me, fixes the problem by advancing the stop point only
following a checkpoint. This has the additional advantage of making
the removal logic work during recovery more like the way it works during
normal running, which is probably good.
At least one of the calls to DetermineSafeOldestOffset which this patch
removes was already dead, because MultiXactAdvanceOldest is called only
during recovery and DetermineSafeOldestOffset was set up to do nothing
during recovery. That, however, is inconsistent with the principle that
recovery and normal running should work similarly, and was confusing to
boot.
Along the way, fix some comments that previous patches in this area
neglected to update. It's not clear to me whether there's any
concrete basis for the decision to use only half of the multixact ID
space, but it's neither necessary nor sufficient to prevent multixact
member wraparound, so the comments should not say otherwise.
Commit b69bf30b9b failed to take into
account the possibility that there might be no multixacts in existence
at all.
Report by Thomas Munro; patch by me.
As discussed, the default setting of include_realm=0 can be dangerous in
multi-realm environments because it is then impossible to differentiate
users with the same username but who are from two different realms.
Recommend include_realm=1 and note that the default setting may change
in a future version of PostgreSQL and therefore users may wish to
explicitly set include_realm to avoid issues while upgrading.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.
To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases. This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.
This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off. We'll need to fix
that via a separate patch.
Thomas Munro and Robert Haas
The old formula didn't have enough parentheses, so it would do the wrong
thing, and it used / rather than % to find a remainder. The effect of
these oversights is that the stop point chosen by the logic introduced in
commit b69bf30b9b might be rather
meaningless.
Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.
The Service Control Manager should be notified regularly during a shutdown
that takes a long time. Previously we would increaes the counter, but forgot
to actually send the notification to the system. The loop counter was also
incorrectly initalized in the event that the startup of the system took long
enough for it to increase, which could cause the shutdown process not to wait
as long as expected.
Krystian Bigaj, reviewed by Michael Paquier
These functions should return SETOF TEXT[], like the core functions they
are wrappers for; but they were incorrectly declared as returning just
TEXT[]. This mistake had two results: first, if there was no match you got
a scalar null result, whereas what you should get is an empty set (zero
rows). Second, the 'g' flag was effectively ignored, since you would get
only one result array even if there were multiple matches, as reported by
Jeff Certain.
While ignoring 'g' is a clear bug, the behavior for no matches might well
have been thought to be the intended behavior by people who hadn't compared
it carefully to the core regexp_matches() functions. So we should tread
carefully about introducing this change in the back branches. Still, it
clearly is a bug and so providing some fix is desirable.
After discussion, the conclusion was to introduce the change in a 1.1
version of the citext extension (as we would need to do anyway); 1.0 still
contains the incorrect behavior. 1.1 is the default and only available
version in HEAD, but it is optional in the back branches, where 1.0 remains
the default version. People wishing to adopt the fix in back branches will
need to explicitly do ALTER EXTENSION citext UPDATE TO '1.1'. (I also
provided a downgrade script in the back branches, so people could go back
to 1.0 if necessary.)
This should be called out as an incompatible change in the 9.5 release
notes, although we'll also document it in the next set of back-branch
release notes. The notes should mention that any views or rules that use
citext's regexp_matches() functions will need to be dropped before
upgrading to 1.1, and then recreated again afterwards.
Back-patch to 9.1. The bug goes all the way back to citext's introduction
in 8.4, but pre-9.1 there is no extension mechanism with which to manage
the change. Given the lack of previous complaints it seems unnecessary to
change this behavior in 9.0, anyway.
pg_win32_is_junction() was a typo for pgwin32_is_junction(). open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.
Per reports from Andrew Dunstan and David Rowley.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.
Back-patch to all supported versions.
Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
The first bug is not releasing a tupdesc when doing an early return out
of the function. The second bug is a logic error in choosing when to do
an early return if given an empty jsonb object.
Bug reports from Pavel Stehule and Tom Lane respectively.
Backpatch to 9.4 where these were introduced.
When altering the deferredness state of a foreign key constraint, we
correctly updated the catalogs and then invalidated the relcache state for
the target relation ... but that's not the only relation with relevant
triggers. Must invalidate the other table as well, or the state change
fails to take effect promptly for operations triggered on the other table.
Per bug #13224 from Christian Ullrich.
In passing, reorganize regression test case for this feature so that it
isn't randomly injected into the middle of an unrelated test sequence.
Oversight in commit f177cbfe67. Back-patch
to 9.4 where the faulty code was added.
We need to create the pg_multixact/offsets file deleted by pg_upgrade
much earlier than we originally were: it was in TrimMultiXact(), which
runs after we exit recovery, but it actually needs to run earlier than
the first call to SetMultiXactIdLimit (before recovery), because that
routine already wants to read the first offset segment.
Per pg_upgrade trouble report from Jeff Janes.
While at it, silence a compiler warning about a pointless assert that an
unsigned variable was being tested non-negative. This was a signed
constant in Thomas Munro's patch which I changed to unsigned before
commit. Pointed out by Andres Freund.
Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore. This leads to loss of files that
are critical to interpret existing tuples's Xmax values.
To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten. This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough. We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)
On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum. If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact. Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts. (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely. The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).
Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.
Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.
If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor
Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug. Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.
This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.
Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that. In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks. It's broken as far back as we had antijoins, so back-patch to all
supported branches.
Each of the libraries incorporates src/port files, which often check
FRONTEND. Build systems disagreed on whether to build libpgtypes this
way. Only libecpg incorporates files that rely on it today. Back-patch
to 9.0 (all supported versions) to forestall surprises.
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80b, which split what
had been set_rel_pathlist() and child routines into two sets of
functions. But I (tgl) evidently missed updating this comment.
Back-patch to 9.2 to avoid unnecessary divergence among branches.
Amit Langote
When the startup process recovers transactions by scanning pg_twophase
directory, it should clear MyLockedGxact after it's done processing each
transaction. Like we do during normal operation, at PREPARE TRANSACTION.
Otherwise, if the startup process exits due to an error, it will try to
clear the locking_backend field of the last recovered transaction. That's
usually harmless, but if the error happens in MarkAsPreparing, while
holding TwoPhaseStateLock, the shmem-exit hook will try to acquire
TwoPhaseStateLock again, and deadlock with itself.
This fixes bug #13128 reported by Grant McAlister. The bug was introduced
by commit bb38fb0d, so backpatch to all supported versions like that
commit.
SLRU_SEGMENTS_PER_PAGE -> SLRU_PAGES_PER_SEGMENT
I introduced this ancient typo in subtrans.c and later propagated it to
multixact.c. I fixed the latter in f741300c, but only back to 9.3;
backpatch to all supported branches for consistency.
After a timeline switch, we would leave behind recycled WAL segments that
are in the future, but on the old timeline. After promotion, and after they
become old enough to be recycled again, we would notice that they don't have
a .ready or .done file, create a .ready file for them, and archive them.
That's bogus, because the files contain garbage, recycled from an older
timeline (or prealloced as zeros). We shouldn't archive such files.
This could happen when we're following a timeline switch during replay, or
when we switch to new timeline at end-of-recovery.
To fix, whenever we switch to a new timeline, scan the data directory for
WAL segments on the old timeline, but with a higher segment number, and
remove them. Those don't belong to our timeline history, and are most
likely bogus recycled or preallocated files. They could also be valid files
that we streamed from the primary ahead of time, but in any case, they're
not needed to recover to the new timeline.
It was previously possible to have the launcher re-execute its main loop
before shutting down if some other signal was received or an error
occurred after getting SIGTERM, as reported by Qingqing Zhou.
While investigating, Tom Lane further noticed that if autovacuum had
been disabled in the config file, it would misbehave by trying to start
a new worker instead of bailing out immediately -- it would consider
itself as invoked in emergency mode.
Fix both problems by checking the shutdown flag in a few more places.
These problems have existed since autovacuum was introduced, so
backpatch all the way back.
While gcc doesn't complain if you declare a function "static" and then
define it not-static, other compilers do; and in any case the code is
highly misleading this way. Add the missing "static" keywords to a
couple of recent patches. Per buildfarm member pademelon.
Considering the number of cases in which "unused" command line arguments
are silently ignored by compilers, it's fairly astonishing that anybody
thought this warning was useful; it's certainly nothing but an annoyance
when building Postgres. One such case is that neither gcc nor clang
complain about unrecognized -Wno-foo switches, making it more difficult
to figure out whether the switch does anything than one could wish.
Back-patch to 9.3, which is as far back as the patch applies conveniently
(we'd have to back-patch PGAC_PROG_CC_VAR_OPT to go further, and it doesn't
seem worth that).
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node. However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.
The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes. But we might do
so if the expressions are sort keys for a mergejoin, for example.
Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable. For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)
Per report from Qingqing Zhou. The added regression test case is based
on his example.
This has been broken for a very long time, so back-patch to all active
branches.
Some of the TAP tests were supposing that PG programs would accept switches
after non-switch arguments on their command lines. While GNU getopt_long()
does allow that, our own implementation does not, and it's nowhere
suggested in our documentation that such cases should work. Adjust the
tests to use only the documented syntax.
Back-patch to 9.4, since without this the TAP tests fail when run with
src/port's getopt_long() implementation.
Michael Paquier
Commit ed9cc2b5df made it unnecessary to pass
start_nblkno to _hash_splitbucket(), and for that matter unnecessary to
have the internal nblkno variable either. My compiler didn't complain
about that, but some did. I also rearranged the use of oblkno a bit to
make that case more parallel.
Report and initial patch by Petr Jelinek, rearranged a bit by me.
Back-patch to all branches, like the previous patch.
While a new backend nominally participates in sinval signaling starting
from the SharedInvalBackendInit call near the top of InitPostgres, it
cannot recognize sinval messages for unshared catalogs of its database
until it has set up MyDatabaseId. This is not problematic for the catcache
or relcache, which by definition won't have loaded any data from or about
such catalogs before that point. However, commit 568d4138c6
introduced a mechanism for re-using MVCC snapshots for catalog scans, and
made invalidation of those depend on recognizing relevant sinval messages.
So it's possible to establish a catalog snapshot to read pg_authid and
pg_database, then before we set MyDatabaseId, receive sinval messages that
should result in invalidating that snapshot --- but do not, because we
don't realize they are for our database. This mechanism explains the
intermittent buildfarm failures we've seen since commit 31eae6028e.
That commit was not itself at fault, but it introduced a new regression
test that does reconnections concurrently with the "vacuum full pg_am"
command in vacuum.sql. This allowed the pre-existing error to be exposed,
given just the right timing, because we'd fail to update our information
about how to access pg_am. In principle any VACUUM FULL on a system
catalog could have created a similar hazard for concurrent incoming
connections. Perhaps there are more subtle failure cases as well.
To fix, force invalidation of the catalog snapshot as soon as we've
set MyDatabaseId.
Back-patch to 9.4 where the error was introduced.