Coverity and inspection for the issue addressed in fd45d16f found some
questionable code.
Specifically coverity noticed that the wrong length was added in
ReorderBufferSerializeChange() - without immediate negative consequences
as the variable isn't used afterwards. During code-review and testing I
noticed that a bit of space was wasted when allocating tuple bufs in
several places. Thirdly, the debug memset()s in
ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.
Backpatch: 9.4, like c8f621c43.
A thinko in a96761391 caused pg_ctl to get it exactly backwards when
deciding whether to report problems to the Windows eventlog or to stderr.
Per bug #14001 from Manuel Mathar, who also identified the fix.
Like the previous patch, back-patch to all supported branches.
In c8f621c43 I forgot to account for MAXALIGN when allocating a new
tuplebuf in ReorderBufferGetTupleBuf(). That happens to currently not
cause active problems on a number of platforms because the affected
pointer is already aligned, but others, like ppc and hppa, trigger this
in the regression test, due to a debug memset clearing memory.
Fix that.
Backpatch: 9.4, like the previous commit.
There were two places in spell.c that supposed that they could search
for a location in a string produced by lowerstr() and then transpose
the offset into the original string. But this fails completely if
lowerstr() transforms any characters into characters of different byte
length, as can happen in Turkish UTF8 for instance.
We'd added some comments about this coding in commit 51e78ab4ff,
but failed to realize that it was not merely confusing but wrong.
Coverity complained about this code years ago, but in such an opaque
fashion that nobody understood what it was on about. I'm not entirely
sure that this issue *is* what it's on about, actually, but perhaps
this patch will shut it up -- and in any case the problem is clear.
Back-patch to all supported branches.
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds. Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity btree limits that to small tuples, but that's not the
case for FULL.
Change the tuple buffer infrastructure to separate allocate over-large
tuples, instead of always going through the slab cache.
This unfortunately requires changing the ReorderBufferTupleBuf
definition, we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
rooms for the allocated size. As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.
Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
Somehow I managed to flip the order of restoring old & new tuples when
de-spooling a change in a large transaction from disk. This happens to
only take effect when a change is spooled to disk which has old/new
versions of the tuple. That only is the case for UPDATEs where he
primary key changed or where replica identity is changed to FULL.
The tests didn't catch this because either spooled updates, or updates
that changed primary keys, were tested; not both at the same time.
Found while adding tests for the following commit.
Backpatch: 9.4, where logical decoding was added
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that's efficiently possible upper-level
xids are forced to be logged before nested subtransaction xids. That
only works though if these records are all looked at: Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.
This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".
It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.
Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
Previously recovery_min_apply_delay was applied even before recovery
had reached consistency. This could cause us to wait a long time
unexpectedly for read-only connections to be allowed. It's problematic
because the standby was useless during that wait time.
This patch changes recovery_min_apply_delay so that it's applied once
the database has reached the consistent state. That is, even if the delay
is set, the standby tries to replay WAL records as fast as possible until
it has reached consistency.
Author: Michael Paquier
Reviewed-By: Julien Rouhaud
Reported-By: Greg Clough
Backpatch: 9.4, where recovery_min_apply_delay was added
Bug: #13770
Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org
The existing code confuses the byte length of the string (which is
relevant when passing it to pg_strncasecmp) with the character length
of the string (which is relevant when it is used with the SQL substring
function). Separate those two concepts.
Report and patch by Kyotaro Horiguchi, reviewed by Thomas Munro and
reviewed and further revised by me.
This makes the flag more visible for testers using the default file as a
template, increasing the likelyhood that the test suite will be run.
Also have the flag be displayed in the fake "configure" output, if set.
This patch is two new lines only, but perltidy decides to shift things
around which makes it appear a bit bigger.
Author: Michaël Paquier
Reviewed-by: Craig Ringer
Discussion: https://www.postgresql.org/message-id/CAB7nPqRet6UAP2APhZAZw%3DVhJ6w-Q-gGLdZkrOqFgd2vc9-ZDw%40mail.gmail.com
Otherwise running installcheck-force on a server with
synchronous_commit=off will result in the tests failing. All the other
tests already do so...
Backpatch: 9.4, where logical decoding was added
A thinko concerning nesting depth caused json_to_record() to produce bogus
output if a field of its input object contained a sub-object with a field
name matching one of the requested output column names. Per bug #13996
from Johann Visagie.
I added a regression test case based on his example, plus parallel tests
for json_to_recordset, jsonb_to_record, jsonb_to_recordset. The latter
three do not exhibit the same bug (which suggests that we may be missing
some opportunities to share code...) but testing seems like a good idea
in any case.
Back-patch to 9.4 where these functions were introduced.
This error message was written with only ON SELECT rules in mind, but since
then we also made RETURNING-clause targetlists go through the same logic.
This means that you got a rather off-topic error message if you tried to
add a rule with RETURNING to a table having dropped columns. Ideally we'd
just support that, but some preliminary investigation says that it might be
a significant amount of work. Seeing that Nicklas Avén's complaint is the
first one we've gotten about this in the ten years or so that the code's
been like that, I'm unwilling to put much time into it. Instead, improve
the error report by issuing a different message for RETURNING cases, and
revise the associated comment based on this investigation.
Discussion: 1456176604.17219.9.camel@jordogskog.no
It's harmless, but might confuse readers. Seems to have been introduced
in 6bc8ef0b7f. Back-patch, just to avoid cosmetic cross-branch
differences.
Amit Langote
When converting an RTE with securityQuals into a security barrier
subquery RTE, ensure that the Vars in the new subquery's targetlist
all have varlevelsup = 0 so that they correctly refer to the
underlying base relation being wrapped.
The original code was creating new Vars by copying them from existing
Vars referencing the base relation found elsewhere in the query, but
failed to account for the fact that such Vars could come from sublink
subqueries, and hence have varlevelsup > 0. In practice it looks like
this could only happen with nested security barrier views, where the
outer view has a WHERE clause containing a correlated subquery, due to
the order in which the Vars are processed.
Bug: #13988
Reported-by: Adam Guthrie
Backpatch-to: 9.4, where updatable SB views were introduced
A failure partway through PGLC_localeconv() led to a situation where
the next call would call free_struct_lconv() a second time, leading
to free() on already-freed strings, typically leading to a core dump.
Add a flag to remember whether we need to do that.
Per report from Thom Brown. His example case only provokes the failure
as far back as 9.4, but nonetheless this code is obviously broken, so
back-patch to all supported branches.
StartupSUBTRANS() incorrectly handled cases near the max pageid in the subtrans
data structure, which in some cases could lead to errors in startup for Hot
Standby.
This patch wraps the pageids correctly, avoiding any such errors.
Identified by exhaustive crash testing by Jeff Janes.
Jeff Janes
Suppress creation of the pg_upgrade delete script when the new data
directory is inside the old data directory.
Reported-by: IRC
Backpatch-through: 9.3, where delete script tests were added
Dead or half-dead index leaf pages were incorrectly reported as live, as a
consequence of a code rearrangement I made (during a moment of severe brain
fade, evidently) in commit d287818eb5.
The index metapage was not counted in index_size, causing that result to
not agree with the actual index size on-disk.
Index root pages were not counted in internal_pages, which is inconsistent
compared to the case of a root that's also a leaf (one-page index), where
the root would be counted in leaf_pages. Aside from that inconsistency,
this could lead to additional transient discrepancies between the reported
page counts and index_size, since it's possible for pgstatindex's scan to
see zero or multiple pages marked as BTP_ROOT, if the root moves due to
a split during the scan. With these fixes, index_size will always be
exactly one page more than the sum of the displayed page counts.
Also, the index_size result was incorrectly documented as being measured in
pages; it's always been measured in bytes. (While fixing that, I couldn't
resist doing some small additional wordsmithing on the pgstattuple docs.)
Including the metapage causes the reported index_size to not be zero for
an empty index. To preserve the desired property that the pgstattuple
regression test results are platform-independent (ie, BLCKSZ configuration
independent), scale the index_size result in the regression tests.
The documentation issue was reported by Otsuka Kenji, and the inconsistent
root page counting by Peter Geoghegan; the other problems noted by me.
Back-patch to all supported branches, because this has been broken for
a long time.
A function name that's double-quoted in SQL can contain almost any
characters, but we were using that name directly as part of the name
generated for the Python-level function, and Python doesn't like
anything that isn't pretty much a standard identifier. To fix,
replace anything that isn't an ASCII letter or digit with an underscore
in the generated name. This doesn't create any risk of duplicate Python
function names because we were already appending the function OID to
the generated name to ensure uniqueness. Per bug #13960 from Jim Nasby.
Patch by Jim Nasby, modified a bit by me. Back-patch to all
supported branches.
Clarify the description of which transactions will block a CREATE INDEX
CONCURRENTLY command from proceeding, and mention that the index might
still not be usable after CREATE INDEX completes. (This happens if the
index build detected broken HOT chains, so that pg_index.indcheckxmin gets
set, and there are open old transactions preventing the xmin horizon from
advancing past the index's initial creation. I didn't want to explain what
broken HOT chains are, though, so I omitted an explanation of exactly when
old transactions prevent the index from being used.)
Per discussion with Chris Travers. Back-patch to all supported branches,
since the same text appears in all of them.
In runtime.sgml, the old formulas for calculating the reasonable
values of SEMMNI and SEMMNS were incorrect. They have forgotten to
count the number of semaphores which both the checkpointer process
(introduced in 9.2) and the background worker processes (introduced
in 9.3) need.
This commit fixes those formulas so that they count the number of
semaphores which the checkpointer process and the background worker
processes need.
Report and patch by Kyotaro Horiguchi. Only the patch for 9.3 was
modified by me. Back-patch to 9.2 where the checkpointer process was
added and the number of needed semaphores was increased.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Backpatch: 9.2
Discussion: http://www.postgresql.org/message-id/20160203.125119.66820697.horiguchi.kyotaro@lab.ntt.co.jp
The original code wasn't careful to test the file descriptor returned by
PQsocket() for an invalid socket. If an invalid socket did turn up,
that would amount to calling FD_ISSET with fd = -1, whereby undefined
behavior can be invoked.
To fix, test file descriptor for validity and stop further processing if
that fails.
Problem noticed by Coverity.
There is an existing FD_ISSET callsite that does check for invalid
sockets beforehand, but the error message reported by it was
strerror(errno); in testing the aforementioned change, that turns out to
result in "bad socket: Success" which isn't terribly helpful. Instead
use PQerrorMessage() in both places which is more likely to contain an
useful error message.
Backpatch-through: 9.1.
Reportedly, some compilers warn about tests like "c < 0" if c is unsigned,
and hence complain about the character range checks I added in commit
3bb3f42f37. This is a bit of a pain since
the regex library doesn't really want to assume that chr is unsigned.
However, since any such reconfiguration would involve manual edits of
regcustom.h anyway, we can put it on the shoulders of whoever wants to
do that to adjust this new range-checking macro correctly.
Per gripes from Coverity and Andres.
Many automated test suites call pg_ctl. Buildfarm members axolotl,
hornet, mandrill, shearwater, sungazer and tern have failed when server
shutdown took longer than the pg_ctl default 60s timeout. This addition
permits slow hosts to easily raise the timeout without us editing a
--timeout argument into every test suite pg_ctl call. Back-patch to 9.1
(all supported versions) for the sake of automated testing.
Reviewed by Tom Lane.
It turns out that on FreeBSD-derived platforms (including OS X), the
*scanf() family of functions is pretty much brain-dead about multibyte
characters. In particular it will apply isspace() to individual bytes
of input even when those bytes are part of a multibyte character, thus
allowing false recognition of a field-terminating space.
We appear to have little alternative other than instituting a coding
rule that *scanf() is not to be used if the input string might contain
multibyte characters. (There was some discussion of relying on "%ls",
but that probably just moves the portability problem somewhere else,
and besides it doesn't fully prevent BSD *scanf() from using isspace().)
This patch is a down payment on that: it gets rid of use of sscanf()
to parse ispell dictionary files, which are certainly at great risk
of having a problem. The code is cleaner this way anyway, though
a bit longer.
In passing, improve a few comments.
Report and patch by Artur Zakirov, reviewed and somewhat tweaked by me.
Back-patch to all supported branches.
Previously, our regex code defined CHR_MAX as 0xfffffffe, which is a
bad choice because it is outside the range of type "celt" (int32).
Characters approaching that limit could lead to infinite loops in logic
such as "for (c = a; c <= b; c++)" where c is of type celt but the
range bounds are chr. Such loops will work safely only if CHR_MAX+1
is representable in celt, since c must advance to beyond b before the
loop will exit.
Fortunately, there seems no reason not to restrict CHR_MAX to 0x7ffffffe.
It's highly unlikely that Unicode will ever assign codes that high, and
none of our other backend encodings need characters beyond that either.
In addition to modifying the macro, we have to explicitly enforce character
range restrictions on the values of \u, \U, and \x escape sequences, else
the limit is trivially bypassed.
Also, the code for expanding case-independent character ranges in bracket
expressions had a potential integer overflow in its calculation of the
number of characters it could generate, which could lead to allocating too
small a character vector and then overwriting memory. An attacker with the
ability to supply arbitrary regex patterns could easily cause transient DOS
via server crashes, and the possibility for privilege escalation has not
been ruled out.
Quite aside from the integer-overflow problem, the range expansion code was
unnecessarily inefficient in that it always produced a result consisting of
individual characters, abandoning the knowledge that we had a range to
start with. If the input range is large, this requires excessive memory.
Change it so that the original range is reported as-is, and then we add on
any case-equivalent characters that are outside that range. With this
approach, we can bound the number of individual characters allowed without
sacrificing much. This patch allows at most 100000 individual characters,
which I believe to be more than the number of case pairs existing in
Unicode, so that the restriction will never be hit in practice.
It's still possible for range() to take awhile given a large character code
range, so also add statement-cancel detection to its loop. The downstream
function dovec() also lacked cancel detection, and could take a long time
given a large output from range().
Per fuzz testing by Greg Stark. Back-patch to all supported branches.
Security: CVE-2016-0773
Apparently by accident the above commit was backpatched to all supported
branches, except 9.4. This appears to be an error, as the issue is just
as present there.
Given the short amount of time before the next minor release, and given
the issue is documented to be fixed for 9.4, it seems like a good idea
to push this now.
Original-Author: Michael Meskes
Discussion: 75DB81BEEA95B445AE6D576A0A5C9E9364CBC11F@BPXM05GP.gisp.nec.co.jp
Get rid of the false implication that PRIMARY KEY is exactly equivalent to
UNIQUE + NOT NULL. That was more-or-less true at one time in our
implementation, but the standard doesn't say that, and we've grown various
features (many of them required by spec) that treat a pkey differently from
less-formal constraints. Per recent discussion on pgsql-general.
I failed to resist the temptation to do some other wordsmithing in the
same area.
Future PL/Java versions will close CVE-2016-0766 by making these GUCs
PGC_SUSET. This PostgreSQL change independently mitigates that PL/Java
vulnerability, helping sites that update PostgreSQL more frequently than
PL/Java. Back-patch to 9.1 (all supported versions).
deparseReturningList ended up adding up RETURNING NULL to the code, but
code elsewhere saw an empty list of attributes and concluded that it
should not expect tuples from the remote side.
Etsuro Fujita and Robert Haas, reviewed by Thom Brown
If a view is split into CREATE TABLE + CREATE RULE to break a circular
dependency, then any triggers on the view must be dumped/reloaded after
the CREATE RULE; else the backend may reject the CREATE TRIGGER because
it's the wrong type of trigger for a plain table. This works all right
in plain dump/restore because of pg_dump's sorting heuristic that places
triggers after rules. However, when using parallel restore, the ordering
must be enforced by a dependency --- and we didn't have one.
Fixing this is a mere matter of adding an addObjectDependency() call,
except that we need to be able to find all the triggers belonging to the
view relation, and there was no easy way to do that. Add fields to
pg_dump's TableInfo struct to remember where the associated TriggerInfo
struct(s) are.
Per bug report from Dennis Kögel. The failure can be exhibited at least
as far back as 9.1, so back-patch to all supported branches.
Commit e09996ff8d was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.