By using only the macro that checks infomask bits
HEAP_XMAX_IS_LOCKED_ONLY to verify whether a multixact is not an
updater, and not the full HeapTupleHeaderIsOnlyLocked, it would come to
the wrong result in case of a multixact containing an aborted update;
therefore returning the wrong result code. This would cause predicate.c
to break completely (as in bug report #8273 from David Leverton), and
certain index builds would misbehave. As far as I can tell, other
callers of the bogus routine would make harmless mistakes or not be
affected by the difference at all; so this was a pretty narrow case.
Also, no other user of the HEAP_XMAX_IS_LOCKED_ONLY macro is as
careless; they all check specifically for the HEAP_XMAX_IS_MULTI case,
and they all verify whether the updater is InvalidXid before concluding
that it's a valid updater. So there doesn't seem to be any similar bug.
An ancient logic error in cfindloop() could cause the regex engine to fail
to find matches that begin later than the start of the string. This
function is only used when the regex pattern contains a back reference,
and so far as we can tell the error is only reachable if the pattern is
non-greedy (i.e. its first quantifier uses the ? modifier). Furthermore,
the actual match must begin after some potential match that satisfies the
DFA but then fails the back-reference's match test.
Reported and fixed by Jeevan Chalke, with cosmetic adjustments by me.
In pg_basebackup.c:reached_end_position(), we're reading from an
internal pipe with our own background process but we're possibly
reading more bytes than will actually fit into our buffer due to
an off-by-one error. As we're reading from an internal pipe
there's no real risk here, but it's good form to not depend on
such convenient arrangements.
Bug spotted by the Coverity scanner.
Back-patch to 9.2 where this showed up.
In pg_dump.c:getEventTriggers, check what major version we are on
before calling createPQExpBuffer() to avoid leaking that bit of
memory.
Leak discovered by the Coverity scanner.
Back-patch to 9.3 where support for dumping event triggers was
added.
In receivelog.c:writeTimeLineHistoryFile(), we were not properly
closing the open'd file descriptor in error cases. While this
wouldn't matter much if we were about to exit due to such an
error, that's not the case with pg_receivexlog as it can be a
long-running process and these errors are non-fatal.
This resource leak was found by the Coverity scanner.
Back-patch to 9.3 where this issue first appeared.
In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring
out how many 'tapes' we can use (maxTapes) and then multiplying the
result by the tape buffer overhead for each. Unfortunately, when
we are on a system with an 8-byte long, we allow work_mem to be
larger than 2GB and that allows maxTapes to be large enough that the
32bit arithmetic can overflow when multiplied against the buffer
overhead.
When this overflow happens, we end up adding the overflow to the
amount of space available, causing the amount of memory allocated to
be larger than work_mem.
Note that to reach this point, you have to set work mem to at least
24GB and be sorting a set which is at least that size. Given that a
user who can set work_mem to 24GB could also set it even higher, if
they were looking to run the system out of memory, this isn't
considered a security issue.
This overflow risk was found by the Coverity scanner.
Back-patch to all supported branches, as this issue has existed
since before 8.4.
In streamutil.c:GetConnection(), upgrade failure to parse the
connection string to an exit(1) instead of simply returning NULL.
Most callers already immediately exited, but pg_receivexlog would
loop on this case, continually trying to re-parse the connection
string (which can't be changed after pg_receivexlog has started).
GetConnection() was already expected to exit(1) in some cases
(eg: failure to allocate memory or if unable to determine the
integer_datetimes flag), so this change shouldn't surprise anyone.
Began looking at this due to the Coverity scanner complaining that
we were leaking err_msg in this case- no longer an issue since we
just exit(1) immediately.
The command strings read by the child processes during parallel
pg_dump, after being read and handled, were not being free'd.
This patch corrects this relatively minor memory leak.
Leak found by the Coverity scanner.
Back patch to 9.3 where parallel pg_dump was introduced.
This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of
the object's provenance. REINDEX is an earlier example of this pattern.
As a downside, functions called from materialized views must tolerate
running in a security-restricted operation. CREATE MATERIALIZED VIEW
need not change user ID. Nonetheless, avoid creation of materialized
views that will invariably fail REFRESH by making it, too, start a
security-restricted operation.
Back-patch to 9.3 so materialized views have this from the beginning.
Reviewed by Kevin Grittner.
The code in set_append_rel_pathlist() for building parameterized paths
for append relations (inheritance and UNION ALL combinations) supposed
that the cheapest regular path for a child relation would still be cheapest
when reparameterized. Which might not be the case, particularly if the
added join conditions are expensive to compute, as in a recent example from
Jeff Janes. Fix it to compare child path costs *after* reparameterizing.
We can short-circuit that if the cheapest pre-existing path is already
parameterized correctly, which seems likely to be true often enough to be
worth checking for.
Back-patch to 9.2 where parameterized paths were introduced.
Commit 31a891857a128828d47d93c63e041f3b69cbab70 added some tests in
plpgsql.sql that used a function rather unthinkingly named "foo()".
However, rangefuncs.sql has some much older tests that create a function
of that name, and since these test scripts run in parallel, there is a
chance of failures if the timing is just right. Use another name to
avoid that. Per buildfarm (failure seen today on "hamerkop", but
probably it's happened before and not been noticed).
An INSERT into such a view should work just like an INSERT into its base
table, ie the insertion should go directly into that table ... not be
duplicated into each child table, as was happening before, per bug #8275
from Rushabh Lathia. On the other hand, the current behavior for
UPDATE/DELETE seems reasonable: the update/delete traverses the child
tables, or not, depending on whether the view specifies ONLY or not.
Add some regression tests covering this area.
Dean Rasheed
Specifically, permit attaching them to the error in RAISE and retrieving
them from a caught error in GET STACKED DIAGNOSTICS. RAISE enforces
nothing about the content of the fields; for its purposes, they are just
additional string fields. Consequently, clarify in the protocol and
libpq documentation that the usual relationships between error fields,
like a schema name appearing wherever a table name appears, are not
universal. This freedom has other applications; consider a FDW
propagating an error from an RDBMS having no schema support.
Back-patch to 9.3, where core support for the error fields was
introduced. This prevents the confusion of having a release where libpq
exposes the fields and PL/pgSQL does not.
Pavel Stehule, lexical revisions by Noah Misch.
With -Wtype-limits, gcc correctly points out that size_t can never be < 0.
Backpatch to 9.3 and 9.2. It's been like this forever, but in <= 9.1 you got
a lot other warnings with -Wtype-limits anyway (at least with my version of
gcc).
Andres Freund
When there's a comment on an index that was created with UNIQUE or PRIMARY
KEY constraint syntax, we need to label the comment as depending on the
constraint not the index, since only the constraint object actually appears
in the dump. This incorrect dependency can lead to parallel pg_restore
trying to restore the comment before the index has been created, per bug
#8257 from Lloyd Albin.
This patch fixes pg_dump to produce the right dependency in dumps made
in the future. Usually we also try to hack pg_restore to work around
bogus dependencies, so that existing (wrong) dumps can still be restored in
parallel mode; but that doesn't seem practical here since there's no easy
way to relate the constraint dump entry to the comment after the fact.
Andres Freund
On Unix-ish platforms, EWOULDBLOCK may be the same as EAGAIN, which is
*not* a success return, at least not on Linux. We need to treat it as a
failure to avoid giving a misleading error message. Per the Single Unix
Spec, only EINPROGRESS and EINTR returns indicate that the connection
attempt is in progress.
On Windows, on the other hand, EWOULDBLOCK (WSAEWOULDBLOCK) is the expected
case. We must accept EINPROGRESS as well because Cygwin will return that,
and it doesn't seem worth distinguishing Cygwin from native Windows here.
It's not very clear whether EINTR can occur on Windows, but let's leave
that part of the logic alone in the absence of concrete trouble reports.
Also, remove the test for errno == 0, effectively reverting commit
da9501bddb42222dc33c031b1db6ce2133bcee7b, which AFAICS was just a thinko;
or at best it might have been a workaround for a platform-specific bug,
which we can hope is gone now thirteen years later. In any case, since
libpq makes no effort to reset errno to zero before calling connect(),
it seems unlikely that that test has ever reliably done anything useful.
Andres Freund and Tom Lane
Clang 3.3 correctly complains that a variable of type enum
MultiXactStatus cannot hold a value of -1, which makes sense. Change
the declared type of the variable to int instead, and apply casting as
necessary to avoid the warning.
Per notice from Andres Freund
In binary upgrade mode, we need to recreate and then drop dropped
columns so that all the columns get the right attribute number. This is
true for foreign tables as well as for native tables. For foreign
tables we have been getting the first part right but not the second,
leading to bogus columns in the upgraded database. Fix this all the way
back to 9.1, where foreign tables were introduced.
In replication, when we shutdown the master, walsender tries to send
all the outstanding WAL records to the standby, and then to exit. This
basically means that all the WAL records are fully synced between
two servers after the clean shutdown of the master. So, after
promoting the standby to new master, we can restart the stopped
master as new standby without the need for a fresh backup from
new master.
But there was one problem so far: though walsender tries to send all
the outstanding WAL records, it doesn't wait for them to be replicated
to the standby. Then, before receiving all the WAL records,
walreceiver can detect the closure of connection and exit. We cannot
guarantee that there is no missing WAL in the standby after clean
shutdown of the master. In this case, backup from new master is
required when restarting the stopped master as new standby.
This patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the
standby before closing the replication connection.
Per discussion, this is a fix that needs to get backpatched rather than
new feature. So, back-patch to 9.1 where enough infrastructure for
this exists.
Patch by me, reviewed by Andres Freund.
In some cases with higher numbers of subtransactions
it was possible for us to incorrectly initialize
subtrans leading to complaints of missing pages.
Bug report by Sergey Konoplev
Analysis and fix by Andres Freund
In Danish collations, there are letter combinations which sort
higher than 'Z'. A test for values > 'WA' was picking up rows
where the value started with 'AA', causing the test to fail.
Backpatch to 9.2, where the failing test was added.
Per report from Svenne Krap and analysis by Jeff Janes
MarkBufferDirtyHint() writes WAL, and should know if it's got a
standard buffer or not. Currently, the only callers where buffer_std
is false are related to the FSM.
In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive.
Back-patch to 9.3.
This avoids platform-dependent behavior wherein pg_sleep() might fail to be
interrupted by statement timeout, query cancel, SIGTERM, etc. Also, since
there's no reason to wake up once a second any more, we can reduce the
power consumption of a sleeping backend a tad.
Back-patch to 9.3, since use of SA_RESTART for SIGALRM makes this a bigger
issue than it used to be.
The exclusion of SIGALRM dates back to Berkeley days, when Postgres used
SIGALRM in only one very short stretch of code. Nowadays, allowing it to
interrupt kernel calls doesn't seem like a very good idea, since its use
for statement_timeout means SIGALRM could occur anyplace in the code, and
there are far too many call sites where we aren't prepared to deal with
EINTR failures. When third-party code is taken into consideration, it
seems impossible that we ever could be fully EINTR-proof, so better to
use SA_RESTART always and deal with the implications of that. One such
implication is that we should not assume pg_usleep() will be terminated
early by a signal. Therefore, long sleeps should probably be replaced
by WaitLatch operations where practical.
Back-patch to 9.3 so we can get some beta testing on this change.
SP-GiST's original scheme for avoiding deadlocks during concurrent index
insertions doesn't work, as per report from Hailong Li, and there isn't any
evident way to make it work completely. We could possibly lock individual
inner tuples instead of their whole pages, but preliminary experimentation
suggests that the performance penalty would be huge. Instead, if we fail
to get a buffer lock while descending the tree, just restart the tree
descent altogether. We keep the old tuple positioning rules, though, in
hopes of reducing the number of cases where this can happen.
Teodor Sigaev, somewhat edited by Tom Lane
elog.c has historically treated LOG messages as low-priority during
bootstrap and standalone operation. This has led to confusion and even
masked a bug, because the normal expectation of code authors is that
elog(LOG) will put something into the postmaster log, and that wasn't
happening during initdb. So get rid of the special-case rule and make
the priority order the same as it is in normal operation. To keep from
cluttering initdb's output and the behavior of a standalone backend,
tweak the severity level of three messages routinely issued by xlog.c
during startup and shutdown so that they won't appear in these cases.
Per my proposal back in December.
pg_filedump and other external utility programs are likely to want to be
able to check Postgres page checksums. To avoid messy duplication of code,
move the checksumming functionality into an exported header file, much as
we did awhile back for the CRC code.
In passing, get rid of an unportable assumption that a static char[] array
will be word-aligned, and do some other minor code beautification.
Memory was allocated based on the sizeof a type that was not the type of
the pointer that the result was being assigned to. The types happen to
be of the same size, but it's still wrong.
In most scenarios a portal without a ResourceOwner is dead and not subject
to any further execution, but a portal for a cursor WITH HOLD remains in
existence with no ResourceOwner after the creating transaction is over.
In this situation, if we attempt to "execute" the portal directly to fetch
data from it, we were setting CurrentResourceOwner to NULL, leading to a
segfault if the datatype output code did anything that required a resource
owner (such as trying to fetch system catalog entries that weren't already
cached). The case appears to be impossible to provoke with stock libpq,
but psqlODBC at least is able to cause it when working with held cursors.
Simplest fix is to just skip the assignment to CurrentResourceOwner, so
that any resources used by the data output operations will be managed by
the transaction-level resource owner instead. For consistency I changed
all the places that install a portal's resowner as current, even though
some of them are probably not reachable with a held cursor's portal.
Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing
a self-contained test case). Back-patch to all supported versions.
Several loops in the JSON parser examined a byte in memory just before
checking whether its address was in-bounds, so they could read one byte
beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so
no back-patch.