Per a suggestion from Tom Lane. Back-patch to 9.0 (all supported
versions). While only 9.4 and up have code known to elicit this
compiler bug, we were disabling inlining by accident until commit
43d89a23d5.
Although initdb has long discouraged use of a filesystem mount-point
directory as a PG data directory, this point was covered nowhere in the
user-facing documentation. Also, with the popularity of pg_upgrade,
we really need to recommend that the PG user own not only the data
directory but its parent directory too. (Without a writable parent
directory, operations such as "mv data data.old" fail immediately.
pg_upgrade itself doesn't do that, but wrapper scripts for it often do.)
Hence, adjust the "Creating a Database Cluster" section to address
these points. I also took the liberty of wordsmithing the discussion
of NFS a bit.
These considerations aren't by any means new, so back-patch to all
supported branches.
Don't print a WARNING if we get ESRCH from a kill() that's attempting
to cancel an autovacuum worker. It's possible (and has been seen in the
buildfarm) that the worker is already gone by the time we are able to
execute the kill, in which case the failure is harmless. About the only
plausible reason for reporting such cases would be to help debug corrupted
lock table contents, but this is hardly likely to be the most important
symptom if that happens. Moreover, issuing a WARNING might scare users
more than is warranted.
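
For illustration, the intended behavior is roughly this (a sketch in
plain POSIX C, not the actual backend code):

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    /* ESRCH from kill() just means the target already exited, which is
     * harmless when we're canceling an autovacuum worker. */
    static void
    cancel_autovacuum_worker(pid_t pid)
    {
        if (kill(pid, SIGINT) < 0 && errno != ESRCH)
            fprintf(stderr, "could not signal process %d: %s\n",
                    (int) pid, strerror(errno));
    }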
Also, since sending a signal to an autovacuum worker is now entirely a
routine thing, and the worker will log the query cancel on its end anyway,
reduce the message saying we're doing that from LOG to DEBUG1 level.
Very minor cosmetic cleanup as well.
Since the main practical reason for doing this is to avoid unnecessary
buildfarm failures, back-patch to all active branches.
While postgres' use of SSL renegotiation is a good idea in theory, it
turned out not to work well in practice. The specification and OpenSSL's
implementation of it have led to several security issues. Postgres' use
of renegotiation also had its share of bugs.
Additionally OpenSSL has a bunch of bugs around renegotiation, reported
and open for years, that regularly lead to connections breaking with
obscure error messages. We tried increasingly complex workarounds to get
around these bugs, but we didn't find anything complete.
Since these connection breakages often lead to hard-to-debug problems,
e.g. spuriously failing base backups and significant latency spikes when
synchronous replication is used, we have decided to change the default
setting for SSL renegotiation to 0 (disabled) in the released back
branches and remove it entirely in 9.5 and master.
Author: Michael Paquier, with changes by me
Discussion: 20150624144148.GQ4797@alap3.anarazel.de
Backpatch: 9.0-9.4; 9.5 and master get a different patch
join_clause_is_movable_into() is approximate, in the sense that it might
sometimes return "false" when actually it would be valid to push the given
join clause down to the specified level. This is okay ... but there was
an Assert in get_joinrel_parampathinfo() that's only safe if the answers
are always exact. Comment out the Assert, and add a bunch of commentary
to clarify what's going on.
Per fuzz testing by Andreas Seltenreich. The added regression test is
a pretty silly query, but it's based on his crasher example.
Back-patch to 9.2 where the faulty logic was introduced.
It does currently, and I don't see us changing that any time soon, but we
don't make that assumption anywhere else.
Per Tom Lane's suggestion. Backpatch to 9.2, like the previous patch that
added this assumption.
In GIN, an all-zeros page would be leaked forever, and never reused. Just
add them to the FSM in vacuum, and they will be reinitialized when grabbed
from the FSM. On master and 9.5, attempting to access the page's opaque
struct also caused an assertion failure, although that was otherwise
harmless.
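
The vacuum-side idea looks approximately like this (a hedged sketch
using PostgreSQL's bufpage.h/indexfsm.h API, not the committed code):

    /* Assumes the usual vacuum-scan context: a pinned, locked buffer
     * for block number blkno of the given index. */
    static void
    recycle_if_all_zeros(Relation index, Buffer buffer, BlockNumber blkno)
    {
        Page    page = BufferGetPage(buffer);

        if (PageIsNew(page))
        {
            UnlockReleaseBuffer(buffer);
            /* register the page with the FSM; it will be initialized
             * properly when grabbed for reuse */
            RecordFreeIndexPage(index, blkno);
        }
    }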
Reported by Jeff Janes. Backpatch to all supported versions.
SP-GiST initialized an all-zeros page during vacuum, but that was not
WAL-logged, which is not safe: the write could be torn when the page is
flushed to disk, leaving a half-initialized index page. To fix,
leave it in the all-zeros state, and add it to the FSM. It will be
initialized when reused. Also don't set the page-deleted flag when recycling
an empty page. That was also not WAL-logged, and a torn write of that would
cause the page to have an invalid checksum.
Backpatch to 9.2, where SP-GiST indexes were added.
The planner generally expects that the estimated rowcount of any relation
is at least one row, *unless* it has been proven empty by constraint
exclusion or similar mechanisms, which is marked by installing a dummy path
as the rel's cheapest path (cf. IS_DUMMY_REL). When I split up
allpaths.c's processing of base rels into separate set_base_rel_sizes and
set_base_rel_pathlists steps, the intention was that dummy rels would get
marked as such during the "set size" step; this is what justifies an Assert
in indxpath.c's get_loop_count that other relations should either be dummy
or have positive rowcount. Unfortunately I didn't get that quite right
for append relations: if all the child rels have been proven empty then
set_append_rel_size would come up with a rowcount of zero, which is
correct, but it didn't then do set_dummy_rel_pathlist. (We would have
ended up with the right state after set_append_rel_pathlist, but that's
too late, if we generate indexpaths for some other rel first.)
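
In code terms, the fix amounts to roughly the following in
set_append_rel_size (a paraphrase with a hypothetical condition name,
not the verbatim patch):

    /* If every child rel has been proven empty, mark the parent dummy
     * now, during the "set size" step, rather than leaving a
     * zero-rowcount rel with no cheapest path installed. */
    if (all_children_proven_empty)
    {
        set_dummy_rel_pathlist(rel);
        return;
    }

and, on the FDW side of the change, a defensive clamp such as

    rel->rows = clamp_row_est(rel->rows);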
In addition to fixing the actual bug, I installed an Assert enforcing this
convention in set_rel_size; that then allows simplification of a couple
of now-redundant tests for zero rowcount in set_append_rel_size.
Also, to cover the possibility that third-party FDWs have been careless
about not returning a zero rowcount estimate, apply clamp_row_est to
whatever an FDW comes up with as the rows estimate.
Per report from Andreas Seltenreich. Back-patch to 9.2. Earlier branches
did not have the separation between set_base_rel_sizes and
set_base_rel_pathlists steps, so there was no intermediate state where an
appendrel would have had inconsistent rowcount and pathlist. It's possible
that adding the Assert to set_rel_size would be a good idea in older
branches too; but since they're not under development any more, it's likely
not worth the trouble.
This was broken by commit 0e7e355f27 and
friends, which overlooked the fact that gzopen() will treat a "-1" in the
mode argument as an invalid character (which it silently ignores) followed
by a flag selecting compression level 1. Now, when this value is
encountered, no compression-level flag is passed to gzopen at all, leaving
it to use the zlib default.
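
Concretely, the failure mode and the fix look about like this (an
illustrative sketch; function and variable names are hypothetical):

    #include <stdio.h>
    #include <zlib.h>

    /* compression == -1 (Z_DEFAULT_COMPRESSION) must not be glued onto
     * the mode string: gzopen() would parse "wb-1" as 'w', 'b', an
     * invalid '-' (silently ignored), and compression level '1'. */
    static gzFile
    open_compressed_output(const char *path, int compression)
    {
        char    mode[8];

        if (compression >= 0 && compression <= 9)
            snprintf(mode, sizeof(mode), "wb%d", compression);
        else
            snprintf(mode, sizeof(mode), "wb"); /* use zlib's default */
        return gzopen(path, mode);
    }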
Also, enforce the documented allowed range for pg_dump's -Z option,
namely 0 .. 9, and remove some consequently dead code from
pg_backup_tar.c.
Problem reported by Marc Mamin.
Backpatch to 9.1, like the patch that introduced the bug.
If there were no subtransactions (or multixacts) active, we would calculate
the oldest XID to be equal to the next XID. That's correct, but if the next
XID happens to be on the next pg_subtrans (pg_multixact) page, the page does
not exist yet,
and SimpleLruTruncate will produce an "apparent wraparound" warning. The
warning is harmless in this case, but looks very alarming to users.
Backpatch to all supported versions. Patch and analysis by Thomas Munro.
As reported by Bill Parker, PL/Tcl did not validate some malloc() calls
against NULL return. Fix by using palloc() in a new long-lived memory
context instead. This allows us to simplify error handling too, by
simply deleting the memory context instead of doing retail frees.
There's still a lot that could be done to improve PL/Tcl's memory
handling ...
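
The shape of the new arrangement is roughly this (a sketch; note that
the spelling of the AllocSetContextCreate size arguments varies across
branches):

    /* requires postgres.h and utils/memutils.h */
    MemoryContext   tmpcxt;
    char           *buf;

    tmpcxt = AllocSetContextCreate(TopMemoryContext,
                                   "pltcl temporary context",
                                   ALLOCSET_DEFAULT_SIZES);
    /* palloc-family allocations ereport() on OOM, so no NULL checks */
    buf = MemoryContextAlloc(tmpcxt, 1024);
    /* ... work with buf and any other allocations in tmpcxt ... */
    MemoryContextDelete(tmpcxt);    /* one call frees everything */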
This is pretty ancient, so backpatch all the way back.
Author: Michael Paquier and Álvaro Herrera
Discussion: https://www.postgresql.org/message-id/CAFrbyQwyLDYXfBOhPfoBGqnvuZO_Y90YgqFM11T2jvnxjLFmqw@mail.gmail.com
In the previous coding, timeout would be noticed and reported only when
poll() or socket() returned zero (or the equivalent behavior on Windows).
Ordinarily that should work well enough, but it seems conceivable that we
could get into a state where poll() always returns a nonzero value --- for
example, if it is noticing a condition on one of the file descriptors that
we do not think is reason to exit the loop. If that happened, we'd be in a
busy-wait loop that would fail to terminate even when the timeout expires.
We can make this more robust at essentially no cost, by deciding to exit
of our own accord if we compute a zero or negative time-remaining-to-wait.
Previously the code noted this but just clamped the time-remaining to zero,
expecting that we'd detect timeout on the next loop iteration.
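
The adjusted loop logic amounts to roughly this (a fragment; timeout_ms,
elapsed_ms, pfds, nfds, and result are assumed to be maintained by the
surrounding loop):

    /* Recompute the remaining time each iteration and declare timeout
     * ourselves, instead of trusting poll() to return exactly 0. */
    long    cur_timeout = timeout_ms - elapsed_ms;

    if (cur_timeout <= 0)
    {
        result |= WL_TIMEOUT;
        break;
    }
    rc = poll(pfds, nfds, (int) cur_timeout);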
Back-patch to 9.2. While 9.1 had a version of WaitLatchOrSocket, it was
primitive compared to later versions, and did not guarantee reliable
detection of timeouts anyway. (Essentially, this is a refinement of
commit 3e7fdcffd6, which was back-patched only as far as 9.2.)
xlc provides "long long" unconditionally at C99-compatible language
levels, and this option provokes a warning. The warning interferes with
"configure" tests that fail in response to any warning. Notably, before
commit 85a2a8903f, it interfered with the
test for -qnoansialias. Back-patch to 9.0 (all supported versions).
It's standard for quicksort implementations, after having partitioned the
input into two subgroups, to recurse to process the smaller partition and
then handle the larger partition by iterating. This method guarantees
that no more than log2(N) levels of recursion can be needed. However,
Bentley and McIlroy argued that checking to see which partition is smaller
isn't worth the cycles, and so their code doesn't do that but just always
recurses on the left partition. In most cases that's fine; but with
worst-case input we might need O(N) levels of recursion, and that means
that qsort could be driven to stack overflow. Such an overflow seems to
be the only explanation for today's report from Yiqing Jin of a SIGSEGV
in med3_tuple while creating an index of a couple billion entries with a
very large maintenance_work_mem setting. Therefore, let's spend the few
additional cycles and lines of code needed to choose the smaller partition
for recursion.
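
The technique in minimal generic form (an illustration only, not
PostgreSQL's actual qsort template):

    #include <stddef.h>

    /* Recurse on the smaller partition and iterate on the larger, so
     * recursion depth stays around log2(n) even for worst-case input. */
    static void
    sort_ints(int *a, size_t n)
    {
        while (n > 1)
        {
            int     pivot = a[0];
            size_t  i = (size_t) -1;    /* wraps to 0 on first i++ */
            size_t  j = n;
            size_t  left;

            for (;;)                    /* Hoare partition */
            {
                do { i++; } while (a[i] < pivot);
                do { j--; } while (a[j] > pivot);
                if (i >= j)
                    break;
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            }

            left = j + 1;               /* a[0..j] vs. a[j+1..n-1] */
            if (left <= n - left)
            {
                sort_ints(a, left);     /* recurse on the smaller side */
                a += left;              /* ... iterate on the larger */
                n -= left;
            }
            else
            {
                sort_ints(a + left, n - left);
                n = left;
            }
        }
    }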
Also, fix up the qsort code so that it properly uses size_t not int for
some intermediate values representing numbers of items. This would only
be a live risk when sorting more than INT_MAX bytes (in qsort/qsort_arg)
or tuples (in qsort_tuple), which I believe would never happen with any
caller in the current core code --- but perhaps it could happen with
call sites in third-party modules? In any case, this is trouble waiting
to happen, and the corrected code is probably if anything shorter and
faster than before, since it removes sign-extension steps that had to
happen when converting between int and size_t.
In passing, move a couple of CHECK_FOR_INTERRUPTS() calls so that it's
not necessary to preserve the value of "r" across them, and prettify
the output of gen_qsort_tuple.pl a little.
Back-patch to all supported branches. The odds of hitting this issue
are probably higher in 9.4 and up than before, due to the new ability
to allocate sort workspaces exceeding 1GB, but there's no good reason
to believe that it's impossible to crash older branches this way.
This allows PostgreSQL modules and their dependencies to have undefined
symbols, resolved at runtime. Perl module shared objects rely on that
in Perl 5.8.0 and later. This fixes the crash when PL/PerlU loads such
modules, as the hstore_plperl test suite does. Module authors can link
using -Wl,-G to permit undefined symbols; by default, linking will fail
as it has. Back-patch to 9.0 (all supported versions).
Per Coverity (not that any of these are so non-obvious that they should not
have been caught before commit). The extent of leakage is probably minor
to unnoticeable, but a leak is a leak. Back-patch as necessary.
Michael Paquier
The documentation implied that there was seldom any reason to use the
array_append, array_prepend, and array_cat functions directly. But that's
not really true, because they can help make it clear which case is meant,
which the || operator can't do since it's overloaded to represent all three
cases. Add some discussion and examples illustrating the potentially
confusing behavior that can ensue if the parser misinterprets what was
meant.
Per a complaint from Michael Herold. Back-patch to 9.2, which is where ||
started to behave this way.
Ordinarily, a failure (unexpected exit status) of the startup subprocess
should be considered fatal, so the postmaster should just close up shop
and quit. However, if we sent the startup process a SIGQUIT or SIGKILL
signal, the failure is hardly "unexpected", and we should attempt restart;
this is necessary for recovery from ordinary backend crashes in hot-standby
scenarios. I attempted to implement the latter rule with a two-line patch
in commit 442231d7f7, but it now emerges that
that patch was a few bricks shy of a load: it failed to distinguish the
case of a signaled startup process from the case where the new startup
process crashes before reaching database consistency. That resulted in
infinitely respawning a new startup process only to have it crash again.
To handle this properly, we really must track whether we have sent the
*current* startup process a kill signal. Rather than add yet another
ad-hoc boolean to the postmaster's state, I chose to unify this with the
existing RecoveryError flag into an enum tracking the startup process's
state. That seems more consistent with the postmaster's general state
machine design.
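
A sketch of the unified state (the enum in the actual commit may differ
in detail):

    typedef enum
    {
        STARTUP_NOT_RUNNING,
        STARTUP_RUNNING,
        STARTUP_SIGNALED,   /* we sent it SIGQUIT or SIGKILL */
        STARTUP_CRASHED     /* it failed without our signaling it */
    } StartupStatusEnum;

    static StartupStatusEnum StartupStatus = STARTUP_NOT_RUNNING;

    /* On startup-process exit: restart only if STARTUP_SIGNALED; a
     * STARTUP_CRASHED process must not be respawned, else we'd loop
     * forever on a crash occurring before consistency is reached. */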
Back-patch to 9.0, like the previous patch.
Tom fixed one of these in commit 7f32dbcd, but an almost identical one
remained in the libpq docs. Per his comment:
HP's web server has apparently become case-sensitive sometime recently.
Per bug #13479 from Daniel Abraham. Corrected link identified by Alvaro.
POSIX does not specify the -q option, and many implementations do not
offer it. Don't bother changing the MSVC build system, because having
non-GNU diff on Windows is vanishingly unlikely. Back-patch to 9.2,
where this invocation was introduced.
The psql crash happened when no current connection existed. (The second
new check is optional given today's undocumented NULL argument handling
in PQhost() etc.) Back-patch to 9.0 (all supported versions).
SUSv2-era shells don't set the PWD variable, though anything more modern
does. In the buildfarm environment this could lead to test.sh executing
with PWD pointing to $HOME or another high-level directory, so that there
were conflicts between concurrent executions of the test in different
branch subdirectories. This appears to be the explanation for recent
intermittent failures on buildfarm members binturong and dingo (and might
well have something to do with the buildfarm script's failure to capture
log files from pg_upgrade tests, too).
To fix, just use `pwd` in place of $PWD. AFAICS test.sh is the only place
in our source tree that depended on $PWD. Back-patch to all versions
containing this script.
Per buildfarm. Thanks to Oskari Saarenmaa for diagnosing the problem.
If an allocation fails in the main message handling loop, pqParseInput3
or pqParseInput2, it should not be treated as "not enough data available
yet". Otherwise libpq will wait indefinitely for more data to arrive from
the server and get stuck forever.
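
The distinction being drawn, sketched with hypothetical helper names
(not libpq's exact code):

    /* Returning EOF from the parser means "wait for more input"; an
     * allocation failure must instead consume the message and report
     * an error, or the wait never ends. */
    if (avail < msgLength)
        return EOF;             /* genuinely need more bytes */

    buf = malloc(msgLength);
    if (buf == NULL)
    {
        report_oom(conn);                   /* hypothetical: set error */
        discard_message(conn, msgLength);   /* hypothetical: eat input */
        return 0;               /* message handled, albeit as an error */
    }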
This isn't a complete fix - getParamDescriptions and getCopyStart still
have the same issue, but it's a step in the right direction.
Michael Paquier and me. Backpatch to all supported versions.
Build.bat and vcregress.bat got similar treatment years ago. I'm not sure
why install.bat wasn't treated at the same time, but it seems like a good
idea anyway.
The immediate problem with the old install.bat was that it had quoting
issues, and wouldn't work if the target directory's name contained spaces.
This fixes that problem.
I committed this to master yesterday; this is a backpatch of the same for
all supported versions.
The .backup file name can be passed to pg_archivecleanup even if
it includes the extension specified by the -x option.
However, the documentation previously warned users, incorrectly,
not to do that.
Back-patch to 9.2 where pg_archivecleanup's -x option and
the warning were added.
Expose PG_VERSION_NUM (e.g., "90600") as a Make variable; but for
consistency with the other Make variables holding similar info,
call the variable just VERSION_NUM not PG_VERSION_NUM.
There was some discussion of making this value available as a pg_config
value as well. However, that would entail substantially more work than
this two-line patch. Given that there was not exactly universal consensus
that we need this at all, let's just do a minimal amount of work for now.
Back-patch of commit a5d489ccb7, so that this
variable is actually useful for its intended purpose sometime before 2020.
Michael Paquier, reviewed by Pavel Stehule
VACUUM FREEZE generated false cancellations of standby queries on an
otherwise idle master, caused by an off-by-one error on cutoff_xid
that goes back to the original commit.
Backpatch to all versions 9.0+.
Analysis and report by Marco Nenciarini
Bug fix by Simon Riggs
Commit f3b5565dd4 was a couple of bricks shy
of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index
into the relcache init file, because that index is not used by any
syscache. However, we have historically nailed that index into cache for
performance reasons. The upshot was that load_relcache_init_file always
decided that the init file was busted and silently ignored it, resulting
in a significant hit to backend startup speed.
To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around
RelationSupportsSysCache(), which can know about additional relations
that should be in the init file despite being unknown to syscache.c.
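
In outline, the wrapper looks like this (a paraphrase of the description
above; TriggerRelidNameIndexId is the real catalog constant for that
index):

    bool
    RelationIdIsInInitFile(Oid relationId)
    {
        if (relationId == TriggerRelidNameIndexId)
        {
            /* nailed for performance even though no syscache uses it */
            return true;
        }
        return RelationSupportsSysCache(relationId);
    }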
Also install some guards against future mistakes of this type: make
write_relcache_init_file Assert that all nailed relations get written to
the init file, and make load_relcache_init_file emit a WARNING if it takes
the "wrong number of nailed relations" exit path. Now that we remove the
init files during postmaster startup, that case should never occur in the
field, even if we are starting a minor-version update that added or removed
rels from the nailed set. So the warning shouldn't ever be seen by end
users, but it will show up in the regression tests if somebody breaks this
logic.
Back-patch to all supported branches, like the previous commit.
Commit c03ad5602f introduced a planner
performance regression for UPDATE/DELETE on large inheritance sets.
It required copying the append_rel_list (which is of size proportional to
the number of inherited tables) once for each inherited table, thus
resulting in O(N^2) time and memory consumption. While it's difficult to
avoid that in general, the extra work only has to be done for
append_rel_list entries that actually reference subquery RTEs, which
inheritance-set entries will not. So we can buy back essentially all of
the loss in cases without subqueries in FROM; and even for those, the added
work is mainly proportional to the number of UNION ALL subqueries.
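
Schematically, the savings come from a test along these lines
(hypothetical predicate name; the real patch is more involved):

    /* Copy an append_rel_list entry only if it must be adjusted for a
     * subquery RTE; inheritance-set entries can simply be shared. */
    foreach(lc, root->append_rel_list)
    {
        AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);

        if (references_subquery_rte(appinfo))
            newlist = lappend(newlist, copyObject(appinfo));
        else
            newlist = lappend(newlist, appinfo);    /* share, don't copy */
    }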
Back-patch to 9.2, like the previous commit.
Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
This supplements the GNU libc bug #6530 workarounds introduced in commit
54cd4f0457. On affected systems, a
tar-format pg_basebackup failed when some filename beneath the data
directory was not valid character data in the postmaster/walsender
locale. Back-patch to 9.1, where pg_basebackup was introduced. Extant,
bug-prone conversion specifications receive only ASCII bytes or involve
low-importance messages.
This avoids the problem that it might go to sleep for an unreasonable
amount of time in unusual conditions like the server clock moving
backwards by a large amount.
(Simply moving the server clock forward again doesn't solve the problem
unless you wake up the autovacuum launcher manually, say by sending it
SIGHUP.)
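
The defense is just a clamp on the computed sleep, along these lines
(a sketch; the cap's name and value are illustrative):

    #define MAX_AUTOVAC_SLEEPTIME 300   /* seconds */

    /* Never nap longer than the cap, so a backwards clock step cannot
     * put the launcher to sleep for an arbitrarily long time. */
    if (nap.tv_sec > MAX_AUTOVAC_SLEEPTIME)
    {
        nap.tv_sec = MAX_AUTOVAC_SLEEPTIME;
        nap.tv_usec = 0;
    }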
Per trouble report from Prakash Itnal in
https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com
Analyzed independently by Haribabu Kommi and Tom Lane.
We already tried to improve this once, but the "improved" text was rather
off-target if you had provided a USING clause. Also, it seems helpful
to provide the exact text of a suggested USING clause, so users can just
copy-and-paste it when needed. Per complaint from Keith Rarick and a
suggestion from Merlin Moncure.
Back-patch to 9.2 where the current wording was adopted.
We don't know why a few Windows users have seen this fail, but the
taciturnity of the error message certainly isn't helping debug it.
Let's at least find out which LC category isn't working.
HotStandbyActiveInReplay, introduced in 061b079f, only allowed WAL
replay to happen in the startup process, missing the single-user case.
This buglet is fairly harmless, as it only causes problems when single-user
mode in an assertion-enabled build is used to replay a btree vacuum
record.
Backpatch to 9.2. 061b079f was backpatched further, but the assertion
was not.
When we invalidate the relcache entry for a system catalog or index, we
must also delete the relcache "init file" if the init file contains a copy
of that rel's entry. The old way of doing this relied on a specially
maintained list of the OIDs of relations present in the init file: we made
the list either when reading the file in, or when writing the file out.
The problem is that when writing the file out, we included only rels
present in our local relcache, which might have already suffered some
deletions due to relcache inval events. In such cases we correctly decided
not to overwrite the real init file with incomplete data --- but we still
used the incomplete initFileRelationIds list for the rest of the current
session. This could result in wrong decisions about whether the session's
own actions require deletion of the init file, potentially allowing an init
file created by some other concurrent session to be left around even though
it's been made stale.
Since we don't support changing the schema of a system catalog at runtime,
the only likely scenario in which this would cause a problem in the field
involves a "vacuum full" on a catalog concurrently with other activity, and
even then it's far from easy to provoke. Remarkably, this has been broken
since 2002 (in commit 7863404417), but we had
never seen a reproducible test case until recently. If it did happen in
the field, the symptoms would probably involve unexpected "cache lookup
failed" errors to begin with, then "could not open file" failures after the
next checkpoint, as all accesses to the affected catalog stopped working.
Recovery would require manually removing the stale "pg_internal.init" file.
To fix, get rid of the initFileRelationIds list, and instead consult
syscache.c's list of relations used in catalog caches to decide whether a
relation is included in the init file. This should be a tad more efficient
anyway, since we're replacing linear search of a list with ~100 entries
with a binary search. It's a bit ugly that the init file contents are now
so directly tied to the catalog caches, but in practice that won't make
much difference.
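
The membership test thus becomes a binary search over a sorted OID
array, approximately (identifier names are illustrative):

    #include <stdlib.h>

    /* sorted list of rel OIDs used by catalog caches, built elsewhere */
    static Oid  SysCacheSupportingRelOid[100];
    static int  SysCacheSupportingRelOidSize;

    static int
    oid_compare(const void *a, const void *b)
    {
        Oid     oa = *(const Oid *) a;
        Oid     ob = *(const Oid *) b;

        return (oa > ob) - (oa < ob);
    }

    bool
    RelationSupportsSysCache(Oid relid)
    {
        return bsearch(&relid, SysCacheSupportingRelOid,
                       SysCacheSupportingRelOidSize, sizeof(Oid),
                       oid_compare) != NULL;
    }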
Back-patch to all supported branches.
We should set MyProc->databaseId after acquiring the per-database lock,
not beforehand. The old way risked deadlock against processes trying to
copy or delete the target database, since they would first acquire the lock
and then wait for processes with matching databaseId to exit; that left a
window wherein an incoming process could set its databaseId and then block
on the lock, while the other process had the lock and waited in vain for
the incoming process to exit.
CountOtherDBBackends() would time out and fail after 5 seconds, so this
just resulted in an unexpected failure, not a permanent lockup, but it's
still annoying when it happens. A real-world example of a use case is that
short-duration connections to a template database should not cause CREATE
DATABASE to fail.
Doing it in the other order should be fine since the contract has always
been that processes searching the ProcArray for a database ID must hold the
relevant per-database lock while searching. Thus, this actually removes
the former race condition that required an assumption that storing to
MyProc->databaseId is atomic.
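
Schematically, the corrected ordering is (using the real lock-manager
call; surrounding details elided):

    /* take the per-database lock first ... */
    LockSharedObject(DatabaseRelationId, dbid, 0, RowExclusiveLock);

    /* ... and only then advertise ourselves as attached to it; anyone
     * scanning the ProcArray for this database ID holds the same lock,
     * so they can neither miss us nor wait for us in vain */
    MyProc->databaseId = dbid;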
It's been like this for a long time, so back-patch to all active branches.