The old code used SEQ_MINVALUE to get the smallest int64 value. This
was done as a convenience to avoid having to deal with INT64_IS_BUSTED,
but that is obsolete now. Also, it is incorrect because the smallest
int64 value is actually SEQ_MINVALUE-1. Fix by writing out the constant
the long way, as is done elsewhere in the code.
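For illustration, a hedged sketch of the long-way spelling (the macro
name here is ours; INT64CONST is the backend's portable 64-bit literal
macro):

    /* The smallest int64, written out the long way.  Negating the
     * literal 9223372036854775808 directly doesn't work: that
     * magnitude overflows int64 before the minus sign applies. */
    #define MY_INT64_MIN    (-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1)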
After pg_upgrade, it is possible that some tuples' Xmax have multixacts
corresponding to the old installation; such multixacts cannot have
running members anymore. In many code sites we already know not to read
them and clobber them silently, but at least when VACUUM tries to freeze
a multixact or determine whether one needs freezing, there's an attempt
to resolve it to its member transactions by calling GetMultiXactIdMembers,
and if the multixact value is "in the future" with regard to the
current valid multixact range, an error like this is raised:
ERROR: MultiXactId 123 has not been created yet -- apparent wraparound
and vacuuming fails. Per discussion with Andrew Gierth, it is completely
bogus to try to resolve multixacts coming from before a pg_upgrade,
regardless of where they stand with regard to the current valid
multixact range.
It's possible to get out from under this problem by doing SELECT FOR UPDATE
of the problem tuples, but if tables are large, this is slow and
tedious, so a more thorough solution is desirable.
To fix, we realize that multixacts in xmax created in 9.2 and previous
have a specific bit pattern that is never used in 9.3 and later (we
already knew this, per comments and infomask tests sprinkled in various
places, but we weren't leveraging this knowledge appropriately).
Whenever the infomask of the tuple matches that bit pattern, we just
ignore the multixact completely as if Xmax wasn't set; or, in the case
of tuple freezing, we act as if an unwanted value is set and clobber it
without decoding. This guarantees that no errors will be raised, and
that the values will be progressively removed until all tables are
clean. Most callers of GetMultiXactIdMembers are patched to recognize
directly that the value is a removable "empty" multixact and avoid
calling GetMultiXactIdMembers altogether.
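As a sketch of the test involved (the macro name is ours; the infomask
bits are the real heap tuple header flags), the pre-9.3 pattern is a
lock-only multixact xmax that carries neither of the lock-strength bits
that 9.3 and later always set for lock-only xmax (KEYSHR, EXCL, or both
for SHARE):

    /* Sketch: does this tuple's xmax predate a pg_upgrade from 9.2
     * or earlier?  Pre-9.3 share-lock multixacts are marked lock-only
     * but carry neither 9.3-era lock-strength bit, a combination that
     * never occurs in 9.3+. */
    #define XMAX_IS_LOCKED_UPGRADED(infomask) \
        (((infomask) & HEAP_XMAX_IS_MULTI) != 0 && \
         ((infomask) & HEAP_XMAX_LOCK_ONLY) != 0 && \
         (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0))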
To avoid changing the signature of GetMultiXactIdMembers() in back
branches, we keep the "allow_old" boolean flag but rename it to
"from_pgupgrade"; if the flag is true, we always return an empty set
instead of looking up the multixact. (I suppose we could remove the
argument in the master branch, but I chose not to do so in this commit).
This was broken all along, but the user-facing error message first
appeared because of commit 8e9a16ab8f and was partially fixed in a25c2b7c4d.
This fix, backpatched all the way back to 9.3, goes approximately in the
same direction as a25c2b7c4d but should cover all cases.
Bug analysis by Andrew Gierth and Álvaro Herrera.
A number of public reports match this bug:
https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk
https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com
https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org
Buildfarm member skink failed with symptoms suggesting that an
auto-analyze had happened and changed the plan displayed for a
test query. Although this is evidently of low probability,
regression tests that sometimes fail are no fun, so add commands
to force a bitmap scan to be chosen.
This patch essentially reverts commit 4c6780fd17, in favor of a much
simpler solution for the case where the new cluster would choose to create
a TOAST table but the old cluster doesn't have one: just don't create a
TOAST table.
The existing code failed in at least two different ways if the situation
arose: (1) ALTER TABLE RESET didn't grab an exclusive lock, so that the
lock sanity check in create_toast_table failed; (2) pg_upgrade did not
provide a pg_type OID for the new toast table, so that the crosscheck in
TypeCreate failed. While both these problems were introduced by later
patches, they show that the hack being used to cause TOAST table creation
is overwhelmingly fragile (and untested). I also note that before the
TypeCreate crosscheck was added, the code would have resulted in assigning
an indeterminate pg_type OID to the toast table, possibly causing a later
OID conflict in that catalog; so that it didn't really work even when
committed.
If we simply don't create a TOAST table, there will only be a problem if
the code tries to store a tuple that's wider than a page, and field
compression isn't sufficient to get it under a page. Given that the TOAST
creation threshold is intended to be about a quarter of a page, it's very
hard to believe that cross-version differences in the
do-we-need-a-toast-table heuristic could result in an observable
problem. So let's just follow the old version's conclusion about
whether a TOAST table is needed.
(If we ever do change needs_toast_table() so much that this conclusion
doesn't apply, we can devise a solution at that time, and hopefully do
it in a less klugy way than 4c6780fd17 did.)
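For context, the quarter-page intent is visible in the long-standing
threshold definitions, roughly as follows (paraphrased from the heap
TOAST headers; exact spelling varies a little across versions):

    /* A toast table is deemed necessary only when a worst-case tuple
     * could exceed this threshold, i.e. when fewer than four tuples
     * would fit on a page. */
    #define TOAST_TUPLES_PER_PAGE   4
    #define TOAST_TUPLE_THRESHOLD   MaximumBytesPerTuple(TOAST_TUPLES_PER_PAGE)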
Back-patch to 9.3, like the previous patch.
Discussion: <8110.1462291671@sss.pgh.pa.us>
CHECK_PAGE_OFFSET_RANGE() has been unused forever.
CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
the bt_page_stats() and bt_page_items() functions were moved from the
pgstattuple module to pageinspect. It still exists in
pageinspect/btreefuncs.c.
Daniel Gustafsson
This didn't work because when we dropped and re-established a database
connection, we did not bother to reset session-specific state such as
the statements-are-prepared flags.
The st->prepared[] array certainly needs to be flushed, and I cleared a
couple of other fields as well that couldn't possibly retain meaningful
state for a new connection.
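A minimal sketch of the reset involved, assuming pgbench's per-client
state struct (field names as in CState; surrounding logic elided):

    /* On reconnect, per-session state is gone on the server side too:
     * forget that any statements were prepared on the old connection. */
    PQfinish(st->con);
    st->con = NULL;
    memset(st->prepared, 0, sizeof(st->prepared));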
In passing, fix some bogus comments and strange field order choices.
Per report from Robins Tharakan.
Renaming a file using rename(2) is not guaranteed to be durable in the
face of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.
Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability. The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.
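A minimal sketch of the idea behind durable_rename(), assuming POSIX
semantics (this is not the actual implementation; error handling is
abbreviated):

    #include <fcntl.h>
    #include <libgen.h>
    #include <string.h>
    #include <unistd.h>

    /* fsync the file, rename it, then fsync the containing directory
     * so the new name itself survives an OS crash. */
    static int
    rename_durably(const char *oldfile, const char *newfile)
    {
        char    dirbuf[1024];
        int     fd;

        /* make sure the file's contents are on disk first */
        if ((fd = open(oldfile, O_RDWR)) < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);

        if (rename(oldfile, newfile) != 0)
            return -1;

        /* POSIX dirname() may scribble on its argument; use a copy */
        strncpy(dirbuf, newfile, sizeof(dirbuf) - 1);
        dirbuf[sizeof(dirbuf) - 1] = '\0';

        /* fsync the directory entry so the rename itself is durable */
        if ((fd = open(dirname(dirbuf), O_RDONLY)) < 0)
            return -1;
        if (fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }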
Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
The ltree/ltree_gist/ltxtquery headers store data at MAXALIGN alignment,
requiring some padding bytes. So far we left these uninitialized. Zero
them by using palloc0.
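A sketch of the pattern (identifiers illustrative rather than the exact
ltree code):

    /* MAXALIGN'ing the header size can leave pad bytes between header
     * and data; palloc0() zeroes the whole allocation, so no
     * uninitialized pad bytes ever reach disk. */
    size = MAXALIGN(HDRSIZE) + datalen;     /* HDRSIZE: illustrative */
    result = (ltree *) palloc0(size);       /* pad bytes come back zeroed */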
Author: Andres Freund
Reported-By: Andres Freund / valgrind / buildfarm animal skink
Backpatch: 9.1-
Suppress creation of the pg_upgrade delete script when the new data
directory is inside the old data directory.
Reported-by: IRC
Backpatch-through: 9.3, where delete script tests were added
Dead or half-dead index leaf pages were incorrectly reported as live, as a
consequence of a code rearrangement I made (during a moment of severe brain
fade, evidently) in commit d287818eb5.
The index metapage was not counted in index_size, causing that result to
disagree with the actual index size on disk.
Index root pages were not counted in internal_pages, which is inconsistent
compared to the case of a root that's also a leaf (one-page index), where
the root would be counted in leaf_pages. Aside from that inconsistency,
this could lead to additional transient discrepancies between the reported
page counts and index_size, since it's possible for pgstatindex's scan to
see zero or multiple pages marked as BTP_ROOT, if the root moves due to
a split during the scan. With these fixes, index_size will always be
exactly one page more than the sum of the displayed page counts.
Also, the index_size result was incorrectly documented as being measured in
pages; it's always been measured in bytes. (While fixing that, I couldn't
resist doing some small additional wordsmithing on the pgstattuple docs.)
Including the metapage causes the reported index_size to not be zero for
an empty index. To preserve the desired property that the pgstattuple
regression test results are platform-independent (ie, BLCKSZ configuration
independent), scale the index_size result in the regression tests.
The documentation issue was reported by Otsuka Kenji, and the inconsistent
root page counting by Peter Geoghegan; the other problems noted by me.
Back-patch to all supported branches, because this has been broken for
a long time.
The original code wasn't careful to test the file descriptor returned by
PQsocket() for an invalid socket. If an invalid socket did turn up,
that would amount to calling FD_ISSET with fd = -1, which invokes
undefined behavior.
To fix, test file descriptor for validity and stop further processing if
that fails.
Problem noticed by Coverity.
There is an existing FD_ISSET callsite that does check for invalid
sockets beforehand, but the error message reported by it was
strerror(errno); in testing the aforementioned change, that turns out to
result in "bad socket: Success", which isn't terribly helpful. Instead
use PQerrorMessage() in both places, which is more likely to contain a
useful error message.
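A minimal sketch of the combined fix, assuming a PGconn *conn and an
fd_set input_mask in scope (only documented libpq calls are used):

    int     sock = PQsocket(conn);

    if (sock < 0)
    {
        /* report via libpq's message, not errno */
        fprintf(stderr, "bad socket: %s", PQerrorMessage(conn));
        return false;
    }
    FD_SET(sock, &input_mask);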
Backpatch-through: 9.1.
Commit e09996ff8d was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.
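In outline, the tightened check looks something like this (the lexing
helper is hypothetical):

    /* Classify the input as a JSON number only if the numeric lexer
     * consumed the entire, non-empty string; "2016-01-01" fails
     * because lexing stops at the '-' after "2016". */
    static bool
    looks_like_json_number(const char *str, size_t len)
    {
        size_t  numlen = json_number_prefix_len(str, len);  /* hypothetical */

        return len > 0 && numlen == len;
    }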
The order of inclusion of .o files makes a difference in linker output;
not a functional difference, but still a bitwise difference, which annoys
some packagers who would like reproducible builds.
Report and patch by Christoph Berg
Both the Blowfish and DES implementations of crypt() can take an
arbitrarily long time, depending on the number of rounds specified by the
caller; make sure they can be interrupted.
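The shape of the fix, sketched (the round function is hypothetical;
CHECK_FOR_INTERRUPTS() is the real backend macro):

    /* Service cancel/die interrupts inside the expensive loop so a
     * query cancel takes effect promptly. */
    for (i = 0; i < nrounds; i++)
    {
        CHECK_FOR_INTERRUPTS();
        run_one_crypt_round(state);     /* hypothetical */
    }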
Author: Andreas Karlsson
Reviewer: Jeff Janes
Backpatch to 9.1.
Also fix getErrorText() to return the right error string on failure.
This behavior now matches that of other operating systems.
Report by Noah Misch
Backpatch through 9.1
The tsquery, ltxtquery and query_int data types have a common ancestor.
Having acquired check_stack_depth() calls independently, each was
missing at least one call. Back-patch to 9.0 (all supported versions).
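The guard each recursive parser entry point needs, sketched (all names
except check_stack_depth() are illustrative):

    static ITEM *
    parse_expr(ParseState *state)
    {
        check_stack_depth();    /* ereport(ERROR) before overflowing */

        if (next_token(state) == TOK_LPAREN)
        {
            ITEM   *inner = parse_expr(state);  /* guarded recursion */

            expect(state, TOK_RPAREN);
            return inner;
        }
        return parse_leaf(state);
    }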
Certain short salts crashed the backend or disclosed a few bytes of
backend memory. For existing salt-induced error conditions, emit a
message saying as much. Back-patch to 9.0 (all supported versions).
Josh Kupershmidt
Security: CVE-2015-5288
This case seems to have been overlooked when unvalidated check constraints
were introduced, in 9.2. The code would attempt to dump such constraints
over again for each child table, even though adding them to the parent
table is sufficient.
In 9.2 and 9.3, also fix contrib/pg_upgrade/Makefile so that the "make
clean" target fully cleans up after a failed test. This evidently got
dealt with at some point in 9.4, but it wasn't back-patched. I ran into
it while testing this fix ...
Per bug #13656 from Ingmar Brouns.
If we have a local Var of say varchar type with default collation, and
we apply a RelabelType to convert that to text with default collation, we
don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
It should be okay to compare that to a remote Var, so long as the remote
Var determines the comparison collation. (When we actually ship such an
expression to the remote side, the local Var would become a Param with
default collation, meaning the remote Var would in fact control the
comparison collation, because non-default implicit collation overrides
default implicit collation in parse_collate.c.) To fix, be more precise
about what FDW_COLLATE_NONE means: it applies either to a noncollatable
data type or to a collatable type with default collation, if that collation
can't be traced to a remote Var. (When it can, FDW_COLLATE_SAFE is
appropriate.) We were essentially using that interpretation already at
the Var/Const/Param level, but we weren't bubbling it up properly.
An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
value to describe the second situation, but that would add more code
without changing the actual behavior, so it didn't seem worthwhile.
Also, since we're clarifying the rule to be that we care about whether
operator/function input collations match, there seems no need to fail
immediately upon seeing a Const/Param/non-foreign-Var with nondefault
collation. We only have to reject if it appears in a collation-sensitive
context (for example, "var IS NOT NULL" is perfectly safe from a collation
standpoint, whatever collation the var has). So just set the state to
UNSAFE rather than failing immediately.
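In outline, the clarified classification (the enum is postgres_fdw's;
the comments paraphrase the new rule):

    typedef enum
    {
        FDW_COLLATE_NONE,       /* noncollatable type, or default
                                 * collation not traceable to a
                                 * foreign Var */
        FDW_COLLATE_SAFE,       /* collation derives from a foreign Var */
        FDW_COLLATE_UNSAFE      /* non-default collation from elsewhere;
                                 * reject only on reaching a collation-
                                 * sensitive operator or function */
    } FDWCollateState;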
Per report from Jeevan Chalke. This essentially corrects some sloppy
thinking in commit ed3ddf918b, so back-patch
to 9.3 where that logic appeared.
This setting contains extra configuration for the temp instance, as used
in pg_regress' --temp-config flag.
Backpatch to 9.2 where test.sh was introduced.
Modify pg_dump to restore postgres/template1 databases to non-default
tablespaces by switching out of the database to be moved, then switching
back.
Also, to fix potential cases where the old/new tablespaces might not
match, fix pg_upgrade to process new/old tablespaces separately in all
cases.
Report by Marti Raudsepp
Patch by Marti Raudsepp, me
Backpatch through 9.0
We were missing a few return checks on OpenSSL calls. Should be pretty
harmless, since we haven't seen any user reports about problems, and
this is not a high-traffic module anyway; still, a bug is a bug, so
backpatch this all the way back to 9.0.
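The pattern of the fix, sketched with a real OpenSSL call:

    /* Test OpenSSL return values rather than assuming success. */
    BIO    *bio = BIO_new(BIO_s_mem());

    if (bio == NULL)
        ereport(ERROR,
                (errmsg("could not create OpenSSL BIO")));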
Author: Michael Paquier, while reviewing another sslinfo patch
The regression tests for sepgsql were broken by changes in the base
distro's as-shipped policies. Specifically, the definition of
unconfined_t in the system default policy was changed to bypass
multi-category rules, which the regression test depended on.
Fix that by defining a custom privileged domain
(sepgsql_regtest_superuser_t) and using it instead of system's
unconfined_t domain. The new sepgsql_regtest_superuser_t domain
performs almost like the current unconfined_t, but restricted by
multi-category policy as the traditional unconfined_t was.
The custom policy module is a self-defined domain, and so should not
be affected by related future system policy changes. However, it still
uses the unconfined_u:unconfined_r pair for selinux-user and role.
Those definitions have not been changed for several years and seem
less risky to rely on than the unconfined_t domain. Additionally, if
we define custom user/role, they would need to be manually defined
at the operating system level, adding more complexity to an already
non-standard and complex regression test.
Back-patch to 9.3. The regression tests will need more work before
working correctly on 9.2. Starting with 9.2, sepgsql has had dependencies
on libselinux versions that are only available on newer distros with
the changed set of policies (e.g. RHEL 7.x). On 9.1 sepgsql works
fine with the older distros with original policy set (e.g. RHEL 6.x),
and on which the existing regression tests work fine. We might eventually
want to change the 9.1 sepgsql regression tests to be more independent
of the underlying OS policies; however, more work would be needed to
make that happen, and it is not clear that it is worth the effort.
Kohei KaiGai with review by Adam Brightwell and me, commentary by
Stephen, Alvaro, Tom, Robert, and others.
EANs beginning with 979 (but not 9790; those are ISMNs) are accepted
as ISBN numbers, but they cannot be represented in the old, 10-digit ISBN
format. They must be output in the new 13-digit ISBN-13 format. We printed
out an incorrect value for those.
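The distinguishing test, sketched (variable names illustrative):

    /* EANs in the 979 range, except the 9790 ISMN prefix, have no
     * 10-digit form and must be printed as ISBN-13. */
    if (strncmp(ean, "979", 3) == 0 && strncmp(ean, "9790", 4) != 0)
        format_as_isbn13 = true;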
Also add a regression test, to test this and some other basic functionality
of the module.
Patch by Fabien Coelho. This fixes bug #13442, reported by B.Z. Backpatch
to 9.1, where we started to recognize ISBN-13 numbers.
POSIX does not specify the -q option, and many implementations do not
offer it. Don't bother changing the MSVC build system, because having
non-GNU diff on Windows is vanishingly unlikely. Back-patch to 9.2,
where this invocation was introduced.
SUSv2-era shells don't set the PWD variable, though anything more modern
does. In the buildfarm environment this could lead to test.sh executing
with PWD pointing to $HOME or another high-level directory, so that there
were conflicts between concurrent executions of the test in different
branch subdirectories. This appears to be the explanation for recent
intermittent failures on buildfarm members binturong and dingo (and might
well have something to do with the buildfarm script's failure to capture
log files from pg_upgrade tests, too).
To fix, just use `pwd` in place of $PWD. AFAICS test.sh is the only place
in our source tree that depended on $PWD. Back-patch to all versions
containing this script.
Per buildfarm. Thanks to Oskari Saarenmaa for diagnosing the problem.
This has been the predominant outcome. When the output of decrypting
with a wrong key coincidentally resembled an OpenPGP packet header,
pgcrypto could instead report "Corrupt data", "Not text data" or
"Unsupported compression algorithm". The distinct "Corrupt data"
message added no value. The latter two error messages misled when the
decrypted payload also exhibited fundamental integrity problems. Worse,
error message variance in other systems has enabled cryptologic attacks;
see RFC 4880 section "14. Security Considerations". Whether these
pgcrypto behaviors are likewise exploitable is unknown.
In passing, document that pgcrypto does not resist side-channel attacks.
Back-patch to 9.0 (all supported versions).
Security: CVE-2015-3167
Previously, this prevented promoted standby servers from being upgraded
because of a missing WAL history file. (Timeline 1 doesn't need a
history file, and we don't copy WAL files anyway.)
Report by Christian Echerer(?), Alexey Klyukin
Backpatch through 9.0
This patch causes pg_upgrade to error out during its check phase if:
(1) template0 is marked connectable
or
(2) any other database is marked non-connectable
This is done because, in the first case, pg_upgrade would fail because
the pg_dumpall --globals restore would fail, and in the second case, the
database would not be restored, leading to data loss.
Report by Matt Landry (1), Stephen Frost (2)
Backpatch through 9.0
These functions should return SETOF TEXT[], like the core functions they
are wrappers for; but they were incorrectly declared as returning just
TEXT[]. This mistake had two results: first, if there was no match you got
a scalar null result, whereas what you should get is an empty set (zero
rows). Second, the 'g' flag was effectively ignored, since you would get
only one result array even if there were multiple matches, as reported by
Jeff Certain.
While ignoring 'g' is a clear bug, the behavior for no matches might well
have been thought to be the intended behavior by people who hadn't compared
it carefully to the core regexp_matches() functions. So we should tread
carefully about introducing this change in the back branches. Still, it
clearly is a bug and so providing some fix is desirable.
After discussion, the conclusion was to introduce the change in a 1.1
version of the citext extension (as we would need to do anyway); 1.0 still
contains the incorrect behavior. 1.1 is the default and only available
version in HEAD, but it is optional in the back branches, where 1.0 remains
the default version. People wishing to adopt the fix in back branches will
need to explicitly do ALTER EXTENSION citext UPDATE TO '1.1'. (I also
provided a downgrade script in the back branches, so people could go back
to 1.0 if necessary.)
This should be called out as an incompatible change in the 9.5 release
notes, although we'll also document it in the next set of back-branch
release notes. The notes should mention that any views or rules that use
citext's regexp_matches() functions will need to be dropped before
upgrading to 1.1, and then recreated again afterwards.
Back-patch to 9.1. The bug goes all the way back to citext's introduction
in 8.4, but pre-9.1 there is no extension mechanism with which to manage
the change. Given the lack of previous complaints it seems unnecessary to
change this behavior in 9.0, anyway.
While gcc doesn't complain if you declare a function "static" and then
define it not-static, other compilers do; and in any case the code is
highly misleading this way. Add the missing "static" keywords to a
couple of recent patches. Per buildfarm member pademelon.
As with initdb, these programs need to run with a restricted token, and
if they don't, pg_upgrade will fail when run as a user with Administrator
privileges.
Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable branches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.
Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
It's all very well to claim that a simplistic sort is fast in easy
cases, but O(N^2) in the worst case is not good ... especially if the
worst case is as easy to hit as "descending order input". Replace that
bit with our standard qsort.
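A sketch of the replacement (int32 as typedef'd in the backend's c.h;
array names illustrative):

    #include <stdlib.h>

    /* Standard qsort comparator for int32 keys; using qsort avoids
     * the insertion sort's O(N^2) blowup on descending input. */
    static int
    cmp_int32(const void *a, const void *b)
    {
        int32   av = *(const int32 *) a;
        int32   bv = *(const int32 *) b;

        return (av > bv) - (av < bv);
    }

    /* usage: qsort(values, nvalues, sizeof(int32), cmp_int32); */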
Per bug #12866 from Maksym Boguk. Back-patch to all active branches.
This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20. Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath. Back-patch to 9.0 (all
supported versions).
Security: CVE-2015-0243
Most callers pass a stack buffer. The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation. Back-patch to 9.0 (all supported versions).
Marko Tiikkaja
Security: CVE-2015-0243
Coverity points out that mdc_finish returns a pointer to a local buffer
(which of course is gone as soon as the function returns), leaving open
a risk of misbehaviors possibly as bad as a stack overwrite.
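The hazard in miniature (illustrative code, not pgcrypto's):

    /* Returning a local array is invalid: its storage vanishes when
     * the function returns, leaving the caller a dangling pointer. */
    static uint8 *
    bad_finish(void)
    {
        uint8   hash[20];

        compute_hash(hash);     /* hypothetical */
        return hash;            /* WRONG: points to dead stack memory */
    }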
In reality, the only possible call site is in process_data_packets()
which does not examine the returned pointer at all. So there's no
live bug, but nonetheless the code is confusing and risky. Refactor
to avoid the issue by letting process_data_packets() call mdc_finish()
directly instead of going through the pullf_read() API.
Although this is only cosmetic, it seems good to back-patch so that
the logic in pgp-decrypt.c stays in sync across all branches.
Marko Kreen
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426b it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
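The minimal check, sketched (assumes an SPI context, as in connectby's
implementation):

    /* The constructed query must return at least the key and
     * parent-key columns. */
    if (SPI_tuptable->tupdesc->natts < 2)
        ereport(ERROR,
                (errcode(ERRCODE_DATATYPE_MISMATCH),
                 errmsg("query must return at least two columns")));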
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me