Commit 9dc842f08 of the 8.2 era prevented MultiXact truncation during crash
recovery, because there was no guarantee that enough state had been
set up, and because removing data during crash recovery wasn't deemed
to be a good idea anyway. Since then, due to Hot Standby, streaming
replication and PITR, the amount of time a cluster can spend doing crash
recovery has increased significantly, to the point that a cluster may
never come out of it at all. This has made it no longer defensible to skip
truncating the contents of pg_multixact/.
To fix, take care to set up enough state for multixact truncation before
crash recovery starts (easy, since checkpoints contain the required
information), and move the current end-of-recovery actions to a new
TrimMultiXact() function, analogous to TrimCLOG().
At some later point, this should probably be done the way clog.c does it,
which is to just WAL-log truncations; but we can't do that in the back
branches.
Back-patch to 9.0. 8.4 also has the problem, but since there's no hot
standby there, it's much less pressing. In 9.2 and earlier, this patch
is simpler than in newer branches, because multixact access during
recovery isn't required. Add appropriate checks to make sure that's not
happening.
Andres Freund
Ensure that the invocation command for postgres or pg_ctl runservice
double-quotes the executable's pathname; failure to do this leads to
trouble when the path contains spaces.
Also, ensure that the path ends in ".exe" in both cases and uses
backslashes rather than slashes as directory separators. The latter issue
is reported to confuse some third-party tools such as Symantec Backup Exec.
Also, rewrite the function to avoid buffer overrun issues by using a
PQExpBuffer instead of a fixed-size static buffer. Combinations of
very long executable pathnames and very long data directory pathnames
could have caused trouble before, for example.
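For illustration only, here is a hedged sketch of the buffer-handling idea
(hypothetical helper name and simplified logic, not the committed pg_ctl code):
a PQExpBuffer has no fixed length limit, and the path is emitted double-quoted
and forced to end in ".exe"; slash-to-backslash conversion is omitted for brevity.
#include <string.h>
#include "pqexpbuffer.h"
/* Hypothetical helper: append a double-quoted executable path, forcing an
 * ".exe" suffix, plus a quoted data-directory argument. */
static void
append_service_command(PQExpBuffer cmd, const char *exe_path, const char *datadir)
{
    size_t      len = strlen(exe_path);
    appendPQExpBuffer(cmd, "\"%s", exe_path);
    if (len < 4 || strcmp(exe_path + len - 4, ".exe") != 0)
        appendPQExpBufferStr(cmd, ".exe");
    appendPQExpBuffer(cmd, "\" runservice -D \"%s\"", datadir);
}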
Back-patch to all active branches, since this code has been like this
for a long while.
Naoya Anzai and Tom Lane, reviewed by Rajeev Rastogi
Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages had been scanned. If the new
number of pages happened to be smaller than or equal to the number of pages
scanned, it incorrectly decided that all pages had been scanned.
This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds, with some rows suddenly missing. This has likely escaped notice
so far because roughly 2^31 xids must be consumed before the effect becomes
visible, and any full-table vacuum before that point fixes the issue.
The incorrect logic was introduced by commit
b4b6923e03. Backpatch this fix down to 8.4,
like that commit.
Andres Freund, with some modifications by me.
Make the ECPG preprocessor emit ';' if the variable type for a list of
variables is varchar. This fixes this test case:
int main(void)
{
exec sql begin declare section;
varchar a[50], b[50];
exec sql end declare section;
return 0;
}
Since varchars are internally turned into custom structs and
the type name is emitted for these variable declarations,
the preprocessed code previously had:
struct varchar_1 { ... } a , struct varchar_2 { ... } b ;
The comma in the generated C file was a syntax error.
There are no regression test changes since it's not exercised.
Patch by Boszormenyi Zoltan <zb@cybertec.at>
This function formerly crashed if called as a statement-level trigger,
or if a column-name argument wasn't given.
In passing, add the trigger name to all error messages from the function.
(None of them are expected cases, so this shouldn't pose any compatibility
risk.)
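A minimal sketch of the kind of guards involved (hypothetical function name and
message wording, using the standard trigger-manager macros; not the committed code):
#include "postgres.h"
#include "fmgr.h"
#include "commands/trigger.h"
/* Hypothetical trigger function showing the defensive checks: refuse to run
 * outside the trigger manager, at statement level, or without the expected
 * column-name argument, and mention the trigger's name in the errors. */
Datum
example_update_trigger(PG_FUNCTION_ARGS)
{
    TriggerData *trigdata = (TriggerData *) fcinfo->context;
    const char *trigname;
    if (!CALLED_AS_TRIGGER(fcinfo))
        elog(ERROR, "example_update_trigger: not fired by trigger manager");
    trigname = trigdata->tg_trigger->tgname;
    if (!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
        elog(ERROR, "example_update_trigger: trigger \"%s\" must be fired FOR EACH ROW", trigname);
    if (trigdata->tg_trigger->tgnargs < 1)
        elog(ERROR, "example_update_trigger: trigger \"%s\" requires a column-name argument", trigname);
    /* ... per-row work would go here ... */
    return PointerGetDatum(trigdata->tg_trigtuple);
}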
Marc Cousin, reviewed by Sawada Masahiko
The previous coding labeled expressions such as pg_index.indkey[1:3] as
being of int2vector type, which is not right because the subscript bounds
of such a result don't, in general, satisfy the restrictions of int2vector.
To fix, implicitly promote the result of slicing int2vector to int2[],
or oidvector to oid[]. This is similar to what we've done with domains
over arrays, which is a good analogy because these types are very much
like restricted domains of the corresponding regular-array types.
A side-effect is that we now also forbid array-element updates on such
columns: for example, "update pg_index set indkey[4] = 42" would have worked
before if you were superuser (and corrupted your catalogs irretrievably,
no doubt), but it's now disallowed. This seems like a good thing since, again,
some choices of subscripting would've led to results not satisfying the
restrictions of int2vector. The case of an array-slice update was
rejected before, though with a different error message than you get now.
We could make these cases work in future if we added a cast from int2[]
to int2vector (with a cast function checking the subscript restrictions)
but it seems unlikely that there's any value in that.
Per report from Ronan Dunklau. Back-patch to all supported branches
because of the crash risks involved.
If logging is enabled, either ereport() or fprintf() might stomp on errno
internally, causing this function to return the wrong result. That might
only end in a misleading error report, but in any code that's examining
errno to decide what to do next, the consequences could be far graver.
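The fix follows the usual save-and-restore idiom; a self-contained sketch of
the pattern (hypothetical function name, not the actual code in this file):
#include <errno.h>
#include <stdio.h>
/* Hypothetical illustration: capture errno before any logging call that might
 * clobber it, and restore it afterwards so the caller sees the original error. */
static int
log_and_preserve_errno(const char *msg)
{
    int     save_errno = errno;
    fprintf(stderr, "LOG: %s\n", msg);  /* fprintf may change errno internally */
    errno = save_errno;
    return save_errno;
}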
This has been broken since the very first version of this file in 2006
... it's a bit astonishing that we didn't identify this long ago.
Reported by Amit Kapila, though this isn't his proposed fix.
A pointer to a C string was treated as a pointer to a "name" datum and
passed to SPI_execute_plan(). This pointer would then end up being
passed through datumCopy(), which would try to copy the entire 64 bytes
of name data, thus running past the end of the C string. Fix by
converting the string to a proper name structure.
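A hedged sketch of the conversion (hypothetical wrapper; the real code differs):
building a fixed-width NameData means datumCopy() has a full NAMEDATALEN bytes
of valid storage behind the datum.
#include "postgres.h"
#include "executor/spi.h"
#include "utils/builtins.h"
/* Hypothetical wrapper: pass a column name to a prepared plan as a proper
 * "name" datum rather than as a bare C-string pointer. */
static void
execute_plan_with_name(SPIPlanPtr plan, const char *colname)
{
    NameData    name;
    Datum       values[1];
    namestrcpy(&name, colname);         /* truncate/pad to NAMEDATALEN */
    values[0] = NameGetDatum(&name);
    SPI_execute_plan(plan, values, NULL, true, 0);
}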
Found by LLVM AddressSanitizer.
pullup_replace_vars()'s decisions about whether a pulled-up replacement
expression needs to be wrapped in a PlaceHolderVar depend on the assumption
that what looks like a Var behaves like a Var. However, if the Var is a
join alias reference, later flattening of join aliases might replace the
Var with something that's not a Var at all, and should have been wrapped.
To fix, do a forcible pass of flatten_join_alias_vars() on the subquery
targetlist before we start to copy items out of it. We'll re-run that
processing on the pulled-up expressions later, but that's harmless.
Per report from Ken Tanzer; the added regression test case is based on his
example. This bug has been there since the PlaceHolderVar mechanism was
invented, but has escaped detection because the circumstances that trigger
it are fairly narrow. You need a flattenable query underneath an outer
join, which contains another flattenable query inside a join of its own,
with a dangerous expression (a constant or something else non-strict)
in that one's targetlist.
Having seen this, I'm wondering if it wouldn't be prudent to do all
alias-variable flattening earlier, perhaps even in the rewriter.
But that would probably not be a back-patchable change.
The command we're telling people to type needs to include double-quoting
around the unfortunately-chosen extension name. Twiddle the textual
quoting so that it looks somewhat sane. Per gripe from roadrunner6.
These bugs can cause data loss on standbys started with hot_standby=on at
the moment they start to accept read-only queries, by marking committed
transactions as uncommitted. The likelihood of such corruption is small
unless the primary has a high transaction rate.
5a031a5556 fixed bugs in HS's startup logic
by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state
was reached, missing the fact that both clog and subtrans are written to
before that. This only failed to fail in common cases because the use
of ExtendCLOG in procarray.c was superfluous, since clog extensions are
actually WAL-logged.
In f44eedc3f0f347a856eea8590730769125964597 I then tried to fix the missing
extensions of pg_subtrans due to the former commit's changes - which are
not WAL-logged - by performing the extensions when switching to a state
> STANDBY_INITIALIZED and not performing xid assignments before that -
again missing the fact that ExtendCLOG is unnecessary - but screwed up
twice: once because latestObservedXid wasn't updated anymore in that
state due to the earlier commit, and once by having an off-by-one error in
the loop performing extensions. This means that whenever a
CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed
between the start of the checkpoint recovery started from and the first
xl_running_xact record, old transactions' commit bits in pg_clog could be
overwritten if they started and committed in that window.
Fix this mess by not performing ExtendCLOG() in HS at all anymore, since
it's unneeded and evidently dangerous, and by performing subtrans
extensions even before reaching STANDBY_SNAPSHOT_PENDING.
Analysis and patch by Andres Freund. Reported by Christophe Pettus.
Backpatch down to 9.0, like the previous commit that caused this.
A couple of places that should have been iterating over WORDS_PER_CHUNK
words were iterating over WORDS_PER_PAGE words instead. This thinko
accidentally failed to fail, because (at least on common architectures
with default BLCKSZ) WORDS_PER_CHUNK is a bit less than WORDS_PER_PAGE,
and the extra words being looked at were always zero so nothing happened.
Still, it's a bug waiting to happen if anybody ever fools with the
parameters affecting TIDBitmap sizes, and it's a small waste of cycles
too. So back-patch to all active branches.
Etsuro Fujita
If a page is deleted, and reused for something else, just as a search is
following a rightlink to it from its left sibling, the search would continue
scanning whatever the new contents of the page are. That could lead to
incorrect query results, or even something more curious if the page is
reused for a different kind of page.
To fix, modify the search algorithm to lock the next page before releasing
the previous one, and refrain from deleting pages from the leftmost branch
of the tree.
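A schematic sketch of the lock-coupling step (hypothetical helper; the real
search code is more involved):
#include "postgres.h"
#include "access/gin_private.h"
#include "storage/bufmgr.h"
/* Hypothetical helper: move to the right sibling while still holding the
 * current page's lock, so the sibling cannot be deleted and recycled
 * underneath us. */
static Buffer
step_right_locked(Relation index, Buffer curbuf, BlockNumber rightlink)
{
    Buffer  nextbuf = ReadBuffer(index, rightlink);
    LockBuffer(nextbuf, GIN_SHARE);     /* lock the next page first ... */
    UnlockReleaseBuffer(curbuf);        /* ... only then release the old one */
    return nextbuf;
}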
Add a new Concurrency section to the README, explaining why this works.
There is a lot more one could say about concurrency in GIN, but that's for
another patch.
Backpatch to all supported versions.
This change prevents us from doing inappropriate subquery flattening in
cases such as dangerous functions hidden inside a sub-SELECT in the
targetlist of another sub-SELECT. That could result in unexpected behavior
due to multiple evaluations of a volatile function, as in a recent
complaint from Etienne Dube. It's been questionable from the very
beginning whether these functions should look into subqueries (as noted in
their comments), and this case seems to provide proof that they should.
Because the new code only descends into SubLinks, not SubPlans or
InitPlans, the change only affects the planner's behavior during
prepjointree processing and not later on --- for example, you can still get
it to use a volatile function in an indexqual if you wrap the function in
(SELECT ...). That's a historical behavior, for sure, but it's reasonable
given that the executor's evaluation rules for subplans don't depend on
whether there are volatile functions inside them. In any case, we need to
constrain the behavioral change as narrowly as we can to make this
reasonable to back-patch.
Back-patch commits 8e68816cc2 and
8dace66e07 into the stable branches.
Buildfarm testing revealed no great portability surprises, and it
seems useful to have this robustness improvement in all branches.
Before jamming a desired targetlist into a plan node, one really ought to
make sure the plan node can handle projections, and insert a buffering
Result plan node if not. planagg.c forgot to do this, which is a hangover
from the days when it only dealt with IndexScan plan types. MergeAppend
doesn't project though, not to mention that it gets unhappy if you remove
its possibly-resjunk sort columns. The code accidentally failed to fail
for cases in which the min/max argument was a simple Var, because the new
targetlist would be equivalent to the original "flat" tlist anyway.
For any more complex case, it's been broken since 9.1 where we introduced
the ability to optimize min/max using MergeAppend, as reported by Raphael
Bauduin. Fix by duplicating the logic from grouping_planner that decides
whether we need a Result node.
In 9.2 and 9.1, this requires back-porting the tlist_same_exprs() function
introduced in commit 4387cf956b, else we'd
uselessly add a Result node in cases that worked before. It's rather
tempting to back-patch that whole commit so that we can avoid extra Result
nodes in mainline cases too; but I'll refrain, since that code hasn't
really seen all that much field testing yet.
Insertion of default arguments doesn't work for window functions, which is
likely to cause a crash at runtime if the implementation code doesn't check
the number of actual arguments carefully. It doesn't seem worth working
harder than this for pre-9.2 branches.
For rather inscrutable reasons, SQL:2008 disallows copying-and-modifying a
window definition that has any explicit framing clause. The error message
we gave for this only made sense if the referencing window definition
itself contains an explicit framing clause, which it might well not.
Moreover, in the context of an OVER clause it's not exactly obvious that
"OVER (windowname)" implies copy-and-modify while "OVER windowname" does
not. This has led to multiple complaints, eg bug #5199 from Iliya
Krapchatov. Change to a hopefully more intelligible error message, and
in the case where we have just "OVER (windowname)", add a HINT suggesting
that omitting the parentheses will fix it. Also improve the related
documentation. Back-patch to all supported branches.
Callers expect that they only have to set the right resource owner when
creating a BufFile, not during subsequent operations on it. While we could
insist this be fixed at the caller level, it seems more sensible for the
BufFile to take care of it. Without this, some temp files belonging to
a BufFile can go away too soon, eg at the end of a subtransaction,
leading to errors or crashes.
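The pattern used is the usual CurrentResourceOwner save/swap/restore; a hedged
sketch with a hypothetical struct and helper (the real BufFile code differs):
#include "postgres.h"
#include "utils/resowner.h"
/* Hypothetical BufFile-like struct that remembers the resource owner that was
 * current when it was created. */
typedef struct ExampleBufFile
{
    ResourceOwner   resowner;   /* owner captured at creation time */
    /* ... file segments, buffer, etc. ... */
} ExampleBufFile;
/* Hypothetical helper: make the file's own resource owner current while opening
 * another temp-file segment, so the segment lives exactly as long as the ones
 * created with the BufFile, not just until the current subtransaction ends. */
static void
example_extend(ExampleBufFile *file)
{
    ResourceOwner   oldowner = CurrentResourceOwner;
    CurrentResourceOwner = file->resowner;
    /* ... open the next temporary file segment here ... */
    CurrentResourceOwner = oldowner;
}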
Reported and fixed by Andres Freund. Back-patch to all active branches.
Formerly, when using a SQL-spec timezone setting with a fixed GMT offset
(called a "brute force" timezone in the code), the session_timezone
variable was not updated to match the nominal timezone; rather, all code
was expected to ignore session_timezone if HasCTZSet was true. This is
obviously fragile, though a search of the code finds only
timeofday() failing to honor the rule. A bigger problem was that
DetermineTimeZoneOffset() supposed that if its pg_tz parameter was
pointer-equal to session_timezone, then HasCTZSet should override the
parameter. This would cause datetime input containing an explicit zone
name to be treated as referencing the brute-force zone instead, if the
zone name happened to match the session timezone that had prevailed
before installing the brute-force zone setting (as reported in bug #8572).
The same malady could affect AT TIME ZONE operators.
To fix, set up session_timezone so that it matches the brute-force zone
specification, which we can do using the POSIX timezone definition syntax
"<abbrev>offset", and get rid of the bogus lookaside check in
DetermineTimeZoneOffset(). Aside from fixing the erroneous behavior in
datetime parsing and AT TIME ZONE, this will cause the timeofday() function
to print its result in the user-requested time zone rather than some
previously-set zone. It might also affect results in third-party
extensions, if there are any that make use of session_timezone without
considering HasCTZSet, but in all cases the new behavior should be saner
than before.
Back-patch to all supported branches.
The C and POSIX standards state that strncpy's behavior is undefined when
source and destination areas overlap. While it remains dubious whether any
implementations really misbehave when the pointers are exactly equal, some
platforms are now starting to force the issue by complaining when an
undefined call occurs. (In particular, OS X 10.9 has been seen to dump core
here, though the exact set of circumstances needed to trigger that remains
elusive. Similar behavior can be expected to be optional on Linux and
other platforms in the near future.) So tweak the code to explicitly do
nothing when nothing need be done.
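A self-contained sketch of the guard (hypothetical function name, not the
actual call site):
#include <string.h>
/* Hypothetical illustration: when source and destination are the same pointer
 * there is nothing to copy, and calling strncpy() on overlapping areas would
 * be undefined behavior, so simply skip the call. */
static void
copy_string_guarded(char *dst, const char *src, size_t len)
{
    if (dst != src)
        strncpy(dst, src, len);
}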
Back-patch to all active branches. In HEAD, this also lets us get rid of
an exception in valgrind.supp.
Per discussion of a report from Matthias Schmitt.
1. In heap_hot_search_buffer(), the PredicateLockTuple() call is passed the
wrong offset number. heapTuple->t_self is set to the tid of the first
tuple in the chain that's visited, not the one actually being read.
2. CheckForSerializableConflictIn() uses the tuple's t_ctid field
instead of t_self to check for existing predicate locks on the tuple. If
the tuple was updated, but the updater rolled back, t_ctid points to the
aborted dead tuple.
Reported by Hannu Krosing. Backpatch to 9.1.
If a tuple was frozen while its predicate locks mattered,
read-write dependencies could be missed, resulting in failure to
detect conflicts which could lead to anomalies in committed
serializable transactions.
This field was added to the tag when we still thought that it was
necessary to carry locks forward to a new version of an updated
row. That was later proven to be unnecessary, which allowed
simplification of the code, but elimination of xmin from the tag
was missed at the time.
Per report and analysis by Heikki Linnakangas.
Backpatch to 9.1.
lo_open registers the currently active snapshot, and checks if the
large object exists after that. Normally, snapshots registered by lo_open
are unregistered at end of transaction when the lo descriptor is closed, but
if we error out before the lo descriptor is added to the list of open
descriptors, it is leaked. Fix by moving the snapshot registration to after
checking if the large object exists.
Reported by Pavel Stehule. Backpatch to 8.4. The snapshot registration
system was introduced in 8.4, so prior versions are not affected (and not
supported, anyway).
This is a backpatch of commits d942f9d9, 82b01026, and 6697aa2bc, back
to release 9.1 where we introduced extensions which make heavy use of
the PGXS infrastructure.
There is a rare race condition, when a transaction that inserted a tuple
aborts while vacuum is processing the page containing the inserted tuple.
Vacuum prunes the page first, which normally removes any dead tuples, but
if the inserting transaction aborts right after that, the loop after
pruning will see a dead tuple and remove it instead. That's OK, but if the
page is on a table with no indexes, and the page becomes completely empty
after removing the dead tuple (or tuples) on it, it will be immediately
marked as all-visible. That, too, is harmless in itself, but the sanity check
in vacuum would then throw a warning, because it thinks that the page contains
dead tuples and was nevertheless marked as all-visible, even though it just
vacuumed away those dead tuples and so the page doesn't actually contain any.
Spotted this while reading the code. It's difficult to hit the race
condition otherwise, but can be done by putting a breakpoint after the
heap_page_prune() call.
Backpatch all the way to 8.4, where this code first appeared.
Though @libdir@ almost always matches @abs_builddir@ in this context,
the test could only fail if they differed. Back-patch to 9.1, where the
test was introduced.
Hamid Quddus Akhtar
In libpq, we set up and pass to OpenSSL callback routines to handle
locking. When we run out of SSL connections, we try to clean things
up by de-registering the hooks. Unfortunately, we had a few calls
into the OpenSSL library after these hooks were de-registered during
SSL cleanup, which led to deadlocks. This moves the thread callback
cleanup to be after all SSL-cleanup-related OpenSSL library calls.
I've been unable to reproduce the deadlock with this fix.
In passing, also move the close_SSL call to be after unlocking our
ssl_config mutex when in a failure state. While it looks pretty
unlikely to be an issue, it could have resulted in deadlocks if we
ended up in this code path due to something other than SSL_new
failing. Thanks to Heikki for pointing this out.
Back-patch to all supported versions; note that the close_SSL issue
only goes back to 9.0, so that hunk isn't included in the 8.4 patch.
Initially found and reported by Vesa-Matti J Kari; many thanks to
both Heikki and Andres for their help running down the specific
issue and reviewing the patch.