brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a file kilobytes from where the input bytea is located.
Have them throw errors if called by a non-superuser.
Report and initial patch: Andreas Seltenreich
Security: CVE-2016-3065
In the plancache, we check if the environment we planned the query under
has changed in a way which requires us to re-plan, such as when the user
for whom the plan was prepared changes and RLS is being used (and,
therefore, there may be different policies to apply).
Unfortunately, while those values were set and checked, they were not
being reset when the query was re-planned and therefore, in cases where
we change role, re-plan, and then change role again, we weren't
re-planning again. This leads to potentially incorrect policies being
applied in cases where role-specific policies are used and a given query
is planned under one role and then executed under other roles, which
could happen under security definer functions or when a common user and
query is planned initially and then re-used across multiple SET ROLEs.
Further, extensions which made use of CopyCachedPlan() may suffer from
similar issues as the RLS-related fields were not properly copied as
part of the plan and therefore RevalidateCachedQuery() would copy in the
current settings without invalidating the query.
Fix by using the same approach used for 'search_path', where we set the
correct values in CompleteCachedPlan(), check them early on in
RevalidateCachedQuery() and then properly reset them if re-planning.
Also, copy through the values during CopyCachedPlan().
Pointed out by Ashutosh Bapat. Reviewed by Michael Paquier.
Back-patch to 9.5 where RLS was introduced.
Security: CVE-2016-2193
Previously pg_rewind did not fsync any files. That's problematic, given
that the target directory is modified. If the database was started
afterwards, 2ce439f33 luckily already caused the data directory to be
synced to disk at postmaster startup; reducing the scope of the problem.
To fix, use initdb -S, at the end of the pg_rewind run. It doesn't seem
worthwhile to duplicate the code into pg_rewind, and initdb -S is
already used that way by pg_upgrade.
Reported-By: Andres Freund
Author: Michael Paquier, somewhat edited by me
Discussion: 20160310034352.iuqgvpmg5qmnxtkz@alap3.anarazel.deCAB7nPqSytVG1o4S3S2pA1O=692ekurJ+fckW2PywEG3sNw54Ow@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
This was a relatively harmless leak, as createBackupLabel() is only
called once per pg_rewind invocation.
Author: Michael Paquier
Reported-By: Michael Paquier
Discussion: CAB7nPqRnOw30gOXe2_SPLjh37bgm4V+txbYAPwoXb97nGQ297w@mail.gmail.com
Backpatch: 9.5, where pg_rewind was introduced
Returning the direct result of bit arithmetic, in a macro intended to be
used in a boolean manner, can be problematic if the return value is
stored in a variable of type 'bool'. If bool is implemented using C99's
_Bool, that can lead to comparison failures if the variable is then
compared again with the expression (see ginStepRight() for an example
that fails), as _Bool forces the result to be 0/1. That happens in some
configurations of newer MSVC compilers. It's also problematic when
storing the result of such an expression in a narrower type.
Several gin macros have been declared in that style since gin's initial
commit in 8a3631f8d86.
There's a lot more macros like this, but this is the only one causing
regression test failures; and I don't want to commit and backpatch a
larger patch with lots of conflicts just before the next set of minor
releases.
Discussion: 20150811154237.GD17575@awork2.anarazel.de
Backpatch: All supported branches
We really need to sync all of our IANA-derived timezone code with upstream,
but that's going to be a large patch and I certainly don't care to shove
such a thing into stable branches immediately before a release. As a
stopgap, copy just the tzcode2016c logic that checks validity of timezone
abbreviations. This prevents getting multiple "time zone abbreviation
differs from POSIX standard" bleats with tzdata 2014b and later.
DST law changes in Azerbaijan, Chile, Haiti, Palestine, and Russia (Altai,
Astrakhan, Kirov, Sakhalin, Ulyanovsk regions). Historical corrections
for Lithuania, Moldova, Russia (Kaliningrad, Samara, Volgograd).
As of 2015b, the keepers of the IANA timezone database started to use
numeric time zone abbreviations (e.g., "+04") instead of inventing
abbreviations not found in the wild like "ASTT". This causes our rather
old copy of zic to whine "warning: time zone abbreviation differs from
POSIX standard" several times during "make install". This warning is
harmless according to the IANA folk, and I don't see any problems with
these abbreviations in some simple tests; but it seems like now would be
a good time to update our copy of the tzcode stuff. I'll look into that
soon.
Unfortunately, every version of glibc thus far tested has bugs whereby
strcoll() ordering does not match strxfrm() ordering as required by
the standard. This can result in, for example, corrupted indexes.
Disabling abbreviated keys in these cases slows down non-C-collation
string sorting considerably, but there seems to be no practical
alternative. Users who are confident that their libc implementations
are solid in this regard can re-enable the optimization by compiling
with TRUST_STRXFRM.
Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1
should REINDEX if there is a possibility that they may have been
affected by this problem.
Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with
help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch
by me, reviewed by Peter Geoghegan and Tom Lane.
User-facing (even tested by regression tests) error conditions were thrown
with elog(), hence had wrong SQLSTATE and were untranslatable. And the
error message texts weren't up to project style, either.
jsonb_set() could produce wrong answers or incorrect error reports, or in
the worst case even crash, when trying to convert a path-array element into
an integer for use as an array subscript. Per report from Vitaly Burovoy.
Back-patch to 9.5 where the faulty code was introduced (in commit
c6947010ceb42143).
Michael Paquier
In commit afb9249d06f47d7a, we (probably I) made ExecLockRows assign
null test tuples to all relations of the query while setting up to do an
EvalPlanQual recheck for a newly-updated locked row. This was sheerest
brain fade: we should only set test tuples for relations that are lockable
by the LockRows node, and in particular empty test tuples are only sensible
for inheritance child relations that weren't the source of the current
tuple from their inheritance tree. Setting a null test tuple for an
unrelated table causes it to return NULLs when it should not, as exhibited
in bug #14034 from Bronislav Houdek. To add insult to injury, doing it the
wrong way required two loops where one would suffice; so the corrected code
is even a bit shorter and faster.
Add a regression test case based on his example, and back-patch to 9.5
where the bug was introduced.
Modern Perl has removed psed from its core distribution, so it might not
be readily available on some build platforms. We therefore replace its
use with a Perl script generated by s2p, which is equivalent to the sed
script. The latter is retained for non-MSVC builds to avoid creating a
new hard dependency on Perl for non-Windows tarball builds.
Backpatch to all live branches.
Michael Paquier and me.
In HEAD, fix incorrect field width for hours part of OF when tm_gmtoff is
negative. This was introduced by commit 2d87eedc1d4468d3 as a result of
falsely applying a pattern that's correct when + signs are omitted, which
is not the case for OF.
In 9.4, fix missing abs() call that allowed a sign to be attached to the
minutes part of OF. This was fixed in 9.5 by 9b43d73b3f9bef27, but for
inscrutable reasons not back-patched.
In all three versions, ensure that the sign of tm_gmtoff is correctly
reported even when the GMT offset is less than 1 hour.
Add regression tests, which evidently we desperately need here.
Thomas Munro and Tom Lane, per report from David Fetter
This didn't work because when we dropped and re-established a database
connection, we did not bother to reset session-specific state such as
the statements-are-prepared flags.
The st->prepared[] array certainly needs to be flushed, and I cleared a
couple of other fields as well that couldn't possibly retain meaningful
state for a new connection.
In passing, fix some bogus comments and strange field order choices.
Per report from Robins Tharakan.
INSERT ... ON CONFLICT's precheck may have to wait on the outcome of
another insertion, which may or may not itself be a speculative
insertion. This wait is not necessarily associated with an exclusion
constraint, but was always reported that way in log messages if the wait
happened to involve a tuple that had no speculative token.
Initially discovered through use of ON CONFLICT DO NOTHING, where
spurious references to exclusion constraints in log messages were more
likely.
Patch by Peter Geoghegan.
Reviewed by Julien Rouhaud.
Back-patch to 9.5 where INSERT ... ON CONFLICT was added.
Previously, we included <xlocale.h> only if necessary to get the definition
of type locale_t. According to notes in PGAC_TYPE_LOCALE_T, this is
important because on some versions of glibc that file supplies an
incompatible declaration of locale_t. (This info may be obsolete, because
on my RHEL6 box that seems to be the *only* definition of locale_t; but
there may still be glibc's in the wild for which it's a live concern.)
It turns out though that on FreeBSD and maybe other BSDen, you can get
locale_t from stdlib.h or locale.h but mbstowcs_l() and friends only from
<xlocale.h>. This was leaving us compiling calls to mbstowcs_l() and
friends with no visible prototype, which causes a warning and could
possibly cause actual trouble, since it's not declared to return int.
Hence, adjust the configure checks so that we'll include <xlocale.h>
either if it's necessary to get type locale_t or if it's necessary to
get a declaration of mbstowcs_l().
Report and patch by Aleksander Alekseev, somewhat whacked around by me.
Back-patch to all supported branches, since we have been using
mbstowcs_l() since 9.1.
On the machines I tried this on, pressing TAB after SECURITY LABEL led to
being offered ON and FOR as intended, plus random other keywords (varying
across machines). But if you were a bit more unlucky you'd get a crash,
as reported by nummervet@mail.ru in bug #14019.
Seems to have been an aboriginal error in the SECURITY LABEL patch,
commit 4d355a8336e0f226. Hence, back-patch to all supported versions.
There's no bug in HEAD, though, thanks to our recent tab-completion
rewrite.
Commit d88976cfa1302e8d removed this code from ginFreeScanKeys():
- if (entry->list)
- pfree(entry->list);
evidently in the belief that that ItemPointer array is allocated in the
keyCtx and so would be reclaimed by the following MemoryContextReset.
Unfortunately, it isn't and it won't. It'd likely be a good idea for
that to become so, but as a simple and back-patchable fix in the
meantime, restore this code to ginFreeScanKeys().
Also, add a similar pfree to where startScanEntry() is about to zero out
entry->list. I am not sure if there are any code paths where this
change prevents a leak today, but it seems like cheap future-proofing.
In passing, make the initial allocation of so->entries[] use palloc
not palloc0. The code doesn't depend on unused entries being zero;
if it did, the array-enlargement code in ginFillScanEntry() would be
wrong. So using palloc0 initially can only serve to confuse readers
about what the invariant is.
Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in
<CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
This longstanding functionality evidently got lost in commit
3d6d1b585524aab6. Noted while studying an OOM report from Jaime
Casanova. Backpatch to 9.5 where the bug was introduced.
Commit a2dabf0e1dda93c8 had the bright idea that it could modify a "const"
global variable if it merely casted away const from a pointer. This does
not work on platforms where the compiler puts "const" variables into
read-only storage. Depressingly, we evidently have no such platforms in
our buildfarm ... an oversight I have now remedied. (The one platform
that is known to catch this is recent OS X with -fno-common.)
Per report from Chris Ruprecht. Back-patch to 9.5 where the bogus
code was introduced.
The chapter "Interfacing Extensions To Indexes" and CREATE OPERATOR
CLASS reference page were missed when BRIN was added. We document
all our other index access methods there, so make sure BRIN complies.
Author: Álvaro Herrera
Reported-By: Julien Rouhaud, Tom Lane
Reviewed-By: Emre Hasegeli
Discussion: https://www.postgresql.org/message-id/56CF604E.9000303%40dalibo.com
Backpatch: 9.5, where BRIN was introduced
The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit
build that is later used on a CPU that supports AVX2 instructions using a
version of Windows before 7SP1/2008R2SP1.
Detect this combination, and in those cases turn off the generation of
FMA3, per recommendation from the Visual Studio team.
The bug is actually in the CRT shipping with Visual Studio 2013, but
Microsoft have stated they're only fixing it in newer major versions.
The fix is therefor conditioned specifically on being built with this
version of Visual Studio, and not previous or later versions.
Author: Christian Ullrich
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes. Use the previously added durable_rename()/durable_link_or_rename()
in various places where we previously just renamed files.
Most of the changed call sites are arguably not critical, but it seems
better to err on the side of too much durability. The most prominent
known case where the previously missing fsyncs could cause data loss is
crashes at the end of a checkpoint. After the actual checkpoint has been
performed, old WAL files are recycled. When they're filled, their
contents are fdatasynced, but we did not fsync the containing
directory. An OS/hardware crash in an unfortunate moment could then end
up leaving that file with its old name, but new content; WAL replay
would thus not replay it.
Reported-By: Tomas Vondra
Author: Michael Paquier, Tomas Vondra, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
Renaming a file using rename(2) is not guaranteed to be durable in face
of crashes; especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, fsync the containing directory. This sequence is not
generally adhered to currently; which exposes us to data loss risks. To
avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all that.
Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason extend several copies of the
same logic, so centralize the link() callers.
This commit does not yet make use of the new functions; they're used in
a followup commit.
Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches
An index search using a row comparison such as ROW(a, b) > ROW('x', 'y')
would stop upon reaching a NULL entry in the "b" column, ignoring the
fact that there might be non-NULL "b" values associated with later values
of "a". This happens because _bt_mark_scankey_required() marks the
subsidiary scankey for "b" as required, which is just wrong: it's for
a column after the one with the first inequality key (namely "a"), and
thus can't be considered a required match.
This bit of brain fade dates back to the very beginnings of our support
for indexed ROW() comparisons, in 2006. Kind of astonishing that no one
came across it before Glen Takahashi, in bug #14010.
Back-patch to all supported versions.
Note: the given test case doesn't actually fail in unpatched 9.1, evidently
because the fix for bug #6278 (i.e., stopping at nulls in either scan
direction) is required to make it fail. I'm sure I could devise a case
that fails in 9.1 as well, perhaps with something involving making a cursor
back up; but it doesn't seem worth the trouble.
Python's allocator does some low-level tricks for efficiency;
unfortunately they trigger valgrind errors. Those tricks can be disabled
making instrumentation easier; but few people testing postgres will have
such a build of python. So add broad suppressions of the resulting
errors.
See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind
This possibly will suppress valid errors, but without it it's basically
impossible to use valgrind with plpython code.
Author: Andres Freund
Backpatch: 9.4, where we started to maintain valgrind suppressions
ltree/ltree_gist/ltxtquery's headers stores data at MAXALIGN alignment,
requiring some padding bytes. So far we left these uninitialized. Zero
those by using palloc0.
Author: Andres Freund
Reported-By: Andres Freund / valgrind / buildarm animal skink
Backpatch: 9.1-
plperl_ref_from_pg_array() didn't consider the case that postgrs arrays
can have 0 dimensions (when they're empty) and accessed the first
dimension without a check. Fix that by special casing the empty array
case.
Author: Alex Hunsaker
Reported-By: Andres Freund / valgrind / buildfarm animal skink
Discussion: 20160308063240.usnzg6bsbjrne667@alap3.anarazel.de
Backpatch: 9.1-
Commit 385f337c9f39b21dca96ca4770552a10a6d5af24 added a new argument
to the FDW GetForeignPlan method, but failed to update the documentation
to match.
Etsuro Fujita
Coverity and inspection for the issue addressed in fd45d16f found some
questionable code.
Specifically coverity noticed that the wrong length was added in
ReorderBufferSerializeChange() - without immediate negative consequences
as the variable isn't used afterwards. During code-review and testing I
noticed that a bit of space was wasted when allocating tuple bufs in
several places. Thirdly, the debug memset()s in
ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.
Backpatch: 9.4, like c8f621c43.
A thinko in a96761391 caused pg_ctl to get it exactly backwards when
deciding whether to report problems to the Windows eventlog or to stderr.
Per bug #14001 from Manuel Mathar, who also identified the fix.
Like the previous patch, back-patch to all supported branches.
In c8f621c43 I forgot to account for MAXALIGN when allocating a new
tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not
cause active problems on a number of platforms because the affected
pointer is already aligned, but others, like ppc and hppa, trigger this
in the regression test, due to a debug memset clearing memory.
Fix that.
Backpatch: 9.4, like the previous commit.
There were two places in spell.c that supposed that they could search
for a location in a string produced by lowerstr() and then transpose
the offset into the original string. But this fails completely if
lowerstr() transforms any characters into characters of different byte
length, as can happen in Turkish UTF8 for instance.
We'd added some comments about this coding in commit 51e78ab4ff328296,
but failed to realize that it was not merely confusing but wrong.
Coverity complained about this code years ago, but in such an opaque
fashion that nobody understood what it was on about. I'm not entirely
sure that this issue *is* what it's on about, actually, but perhaps
this patch will shut it up -- and in any case the problem is clear.
Back-patch to all supported branches.
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds. Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity btree limits that to small tuples, but that's not the
case for FULL.
Change the tuple buffer infrastructure to separate allocate over-large
tuples, instead of always going through the slab cache.
This unfortunately requires changing the ReorderBufferTupleBuf
definition, we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
rooms for the allocated size. As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.
Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
Somehow I managed to flip the order of restoring old & new tuples when
de-spooling a change in a large transaction from disk. This happens to
only take effect when a change is spooled to disk which has old/new
versions of the tuple. That only is the case for UPDATEs where he
primary key changed or where replica identity is changed to FULL.
The tests didn't catch this because either spooled updates, or updates
that changed primary keys, were tested; not both at the same time.
Found while adding tests for the following commit.
Backpatch: 9.4, where logical decoding was added
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that's efficiently possible upper-level
xids are forced to be logged before nested subtransaction xids. That
only works though if these records are all looked at: Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.
This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".
It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.
Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
Previously recovery_min_apply_delay was applied even before recovery
had reached consistency. This could cause us to wait a long time
unexpectedly for read-only connections to be allowed. It's problematic
because the standby was useless during that wait time.
This patch changes recovery_min_apply_delay so that it's applied once
the database has reached the consistent state. That is, even if the delay
is set, the standby tries to replay WAL records as fast as possible until
it has reached consistency.
Author: Michael Paquier
Reviewed-By: Julien Rouhaud
Reported-By: Greg Clough
Backpatch: 9.4, where recovery_min_apply_delay was added
Bug: #13770
Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org