This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK. For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry. This
syncs these functions with the logic now present in handle_sig_alarm.
AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK. However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow. It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.
Serious oversight in commit 16e1b7a1b7f7ffd8a18713e83c8cd72c9ce48e07:
we should not allow an interrupt to take control away from mainline code
except when ImmediateInterruptOK is set. Just to be safe, let's adopt
the same save-clear-restore dance that's been used for many years in
HandleCatchupInterrupt and HandleNotifyInterrupt, so that nothing bad
happens if a timeout handler invokes code that tests or even manipulates
ImmediateInterruptOK.
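Illustrated as a minimal sketch (an illustrative flag stands in for
ImmediateInterruptOK, and the real handlers do considerably more):

#include <errno.h>
#include <signal.h>

static volatile sig_atomic_t interrupts_are_immediate = 0;

static void
timeout_handler(int signum)
{
    int save_errno = errno;
    sig_atomic_t save_immediate = interrupts_are_immediate;

    (void) signum;

    /* clear the flag so nothing invoked below can longjmp out of us */
    interrupts_are_immediate = 0;

    /* ... the handler's real work happens here ... */

    /* restore whatever state we found at entry */
    interrupts_are_immediate = save_immediate;
    errno = save_errno;
}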
Per report of "stuck spinlock" failures from Christophe Pettus, though
many other symptoms are possible. Diagnosis by Andres Freund.
The operation that removes the remaining dead tuples from the page must
be WAL-logged before the setting of the VM bit. Otherwise, if you replay
the WAL to a point between those two records, you end up with the VM bit set, but
the dead tuples are still there.
Backpatch to 9.3, where this bug was introduced.
Make the COPY test, which loads most of the large static tables used in
the tests, also explicitly ANALYZE those tables. This allows us to get
rid of various ad-hoc, and rather redundant, ANALYZE commands that had
gotten stuck into various test scripts over time to ensure we got
consistent plan choices. (We could have done a database-wide ANALYZE,
but that would cause stats to get attached to the small static tables
too, which results in plan changes compared to the historical behavior.
I'm not sure that's a good idea, so not going that far for now.)
Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing
regression tests for lack of an "ANALYZE tenk1" in the subselect test.
There's no need for this in 8.4 since we didn't print any plans back
then.
The test only needs the one table to be vacuumed. Vacuuming the
database may affect other tests.
Per gripe from Tom Lane. Back-patch to 9.3, where the test was added.
An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...)
could produce an invalid plan that results in a crash at execution time,
if the planner attempts to flatten the outer IN into a semi-join.
This happens because convert_testexpr() was not expecting any nested
SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging
to the inner SubLink. (I think the comment denying that this case could
happen was wrong when written; it's certainly been wrong for quite a long
time, since very early versions of the semijoin flattening logic.)
Per report from Teodor Sigaev. Back-patch to all supported branches.
In 247c76a98909, I added some code to do fine-grained checking of
MultiXact status of locking/updating transactions when traversing an
update chain. There was a thinko in that patch which would make the
traversal abort, that is, return HeapTupleUpdated, when the other
transaction is a committed lock-only transaction. In this case we should ignore it
and return success instead. Of course, in the case where there is a
committed update, HeapTupleUpdated is the correct return value.
A user-visible symptom of this bug is that in REPEATABLE READ and
SERIALIZABLE transaction isolation modes spurious serializability errors
can occur:
ERROR: could not serialize access due to concurrent update
In order for this to happen, there needs to be a tuple that's key-share-
locked and also updated, and the update must abort; a subsequent
transaction trying to acquire a new lock on that tuple would abort with
the above error. The reason is that the initial FOR KEY SHARE is seen
as committed by the new locking transaction, which triggers this bug.
(If the UPDATE commits, then the serialization error is correctly
reported.)
When running a query in READ COMMITTED mode, what happens is that the
locking is aborted by the HeapTupleUpdated return value, then
EvalPlanQual fetches the newest version of the tuple, which is then the
only version that gets locked. (The second time the tuple is checked
there is no misbehavior on the committed lock-only, because it's not
checked by the code that traverses update chains; so no bug.) Only the
newest version of the tuple is locked, not older ones, but this is
harmless.
The isolation test added by this commit illustrates the desired
behavior, including the proper serialization errors that get thrown.
Backpatch to 9.3.
Current OpenSSL code includes a BIO_clear_retry_flags() step in the
sock_write() function. Either we failed to copy the code correctly, or
they added this since we copied it. In any case, lack of the clear step
appears to be the cause of the server lockup after connection loss reported
in bug #8647 from Valentine Gogichashvili. Assume that this is correct
coding for all OpenSSL versions, and hence back-patch to all supported
branches.
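For reference, the fixed function has roughly this shape; this is a
schematic sketch only, and the way the socket is fetched from the BIO
varies across OpenSSL versions:

#include <openssl/bio.h>
#include <sys/socket.h>

static int
my_sock_write(BIO *h, const char *buf, int size)
{
    int res = (int) send(BIO_get_fd(h, NULL), buf, (size_t) size, 0);

    BIO_clear_retry_flags(h);           /* the step that was missing */
    if (res <= 0 && BIO_sock_should_retry(res))
        BIO_set_retry_write(h);         /* request a retry only when appropriate */
    return res;
}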
Diagnosis and patch by Alexander Kukushkin.
HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while
checking the contents of a multixact and finding it contains an aborted
update, by setting the HEAP_XMAX_INVALID bit. This would lead to
concurrent transactions not noticing any previous locks held by
transactions that might still be running, and thus being able to acquire
subsequent locks they wouldn't normally be able to acquire.
This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3,
like that commit.
This change reverts the change to the delete-abort-savept isolation test
in 1ce150b7bb, because that behavior change was caused by this bug.
Noticed by Andres Freund while investigating a different issue reported
by Noah Misch.
Insertion into a non-leaf GIN page didn't make a full-page image of the page,
which is wrong. The code used to do it correctly, but was changed (commit
853d1c3103fa961ae6219f0281885b345593d101) because the redo-routine didn't
track incomplete splits correctly when the page was restored from a full
page image. Of course, that was not the right way to fix it; the redo routine
should've been fixed instead. The redo-routine was surreptitiously fixed
in 2010 (commit 4016bdef8aded77b4903c457050622a5a1815c16), so all we need
to do now is revert the code that creates the record to its original form.
This doesn't change the format of the WAL record.
Backpatch to all supported versions.
pg_dumpall's charter is to be able to recreate a database cluster's
contents in a virgin installation, but it was failing to honor that
contract if the cluster had any ALTER DATABASE SET
default_transaction_read_only settings. By emitting a SET command
at the start of each connection opened by pg_dumpall output,
errors are avoided and the source cluster is successfully
recreated.
There was discussion of whether to also set this for the connection
applying pg_dump output, but it was felt that it was both less
appropriate in that context, and far easier to work around.
Backpatch to all supported branches.
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look
into a multixact to check the members against cutoff_xid. This means
that a very old Xid could survive hidden within a multi, possibly
outliving its CLOG storage. In the distant future, this would cause
clog lookup failures:
ERROR: could not access status of transaction 3883960912
DETAIL: Could not open file "pg_clog/0E78": No such file or directory.
This was mostly problematic when the updating transaction aborted, since
in that case the row wouldn't get pruned away earlier in vacuum and the
multixact could possibly survive for a long time. In many cases, data
that is inaccessible for this reason can be brought back
heuristically.
As a second bug, heap_freeze_tuple() didn't properly handle multixacts
that need to be frozen according to cutoff_multi, but whose updater xid
is still alive. Instead of preserving the update Xid, it just set Xmax
invalid, which leads to both old and new tuple versions becoming
visible. This is pretty rare in practice, but a real threat
nonetheless. Existing corrupted rows, unfortunately, cannot be repaired
in an automated fashion.
Existing physical replicas might have already incorrectly frozen tuples
because of different behavior than in master, which might only become
apparent in the future once pg_multixact/ is truncated; it is
recommended that all clones be rebuilt after upgrading.
Following code analysis prompted by a bug report by J Smith in message
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
and privately by F-Secure.
Backpatch to 9.3, where freezing of MultiXactIds was introduced.
Analysis and patch by Andres Freund, with some tweaks by Álvaro.
It is dangerous to do so, because some code expects to be able to see
the true Xmax even if it is aborted (particularly while traversing HOT
chains). So don't do it, and instead rely on the callers to check for
abortedness, if necessary.
Several race conditions and bugs are fixed in the process. One isolation
test's expected output changes due to these fixes.
This also reverts commit c235a6a589b, which is no longer necessary.
Backpatch to 9.3, where this function was introduced.
Andres Freund
Commit 9dc842f08 of 8.2 era prevented MultiXact truncation during crash
recovery, because there was no guarantee that enough state had been
setup, and because it wasn't deemed to be a good idea to remove data
during crash recovery anyway. Since then, due to Hot-Standby, streaming
replication and PITR, the amount of time a cluster can spend doing crash
recovery has increased significantly, to the point that a cluster may
even never come out of it. This makes it no longer defensible to skip
truncating the contents of pg_multixact/ during crash recovery.
To fix, take care to set up enough state for multixact truncation before
crash recovery starts (easy since checkpoints contain the required
information), and move the current end-of-recovery actions to a new
TrimMultiXact() function, analogous to TrimCLOG().
At some later point, this should probably be done similarly to the way
clog.c is doing it, which is to just WAL log truncations, but we can't
do that for the back branches.
Back-patch to 9.0. 8.4 also has the problem, but since there's no hot
standby there, it's much less pressing. In 9.2 and earlier, this patch
is simpler than in newer branches, because multixact access during
recovery isn't required. Add appropriate checks to make sure that's not
happening.
Andres Freund
While autovacuum dutifully launched anti-multixact-wraparound vacuums
when the multixact "age" was reached, the vacuum code was not aware that
it needed to make them be full table vacuums. As the resulting
partial-table vacuums aren't capable of actually increasing relminmxid,
autovacuum continued to launch anti-wraparound vacuums that didn't have
the intended effect, until age of relfrozenxid caused the vacuum to
finally be a full table one via vacuum_freeze_table_age.
To fix, introduce logic for multixacts similar to that for plain
TransactionIds, using the same GUCs.
Backpatch to 9.3, where permanent MultiXactIds were introduced.
Andres Freund, some cleanup by Álvaro
Parts of the code used autovacuum_freeze_max_age to determine whether
anti-multixact-wraparound vacuums are necessary, while others used a
hardcoded 200000000 value. This leads to problems when
autovacuum_freeze_max_age is set to a non-default value. Use the GUC
everywhere.
Backpatch to 9.3, where vacuuming of multixacts was introduced.
Andres Freund
Ensure that the invocation command for postgres or pg_ctl runservice
double-quotes the executable's pathname; failure to do this leads to
trouble when the path contains spaces.
Also, ensure that the path ends in ".exe" in both cases and uses
backslashes rather than slashes as directory separators. The latter issue
is reported to confuse some third-party tools such as Symantec Backup Exec.
Also, rewrite the function to avoid buffer overrun issues by using a
PQExpBuffer instead of a fixed-size static buffer. Combinations of
very long executable pathnames and very long data directory pathnames
could have caused trouble before, for example.
Back-patch to all active branches, since this code has been like this
for a long while.
Naoya Anzai and Tom Lane, reviewed by Rajeev Rastogi
The various places that transferred fast-path locks to the main lock table
neglected to release the PGPROC's backendLock if SetupLockInTable failed
due to being out of shared memory. In most cases this is no big deal since
ensuing error cleanup would release all held LWLocks anyway. But there are
some hot-standby functions that don't consider failure of
FastPathTransferRelationLocks to be a hard error, and in those cases this
oversight could lead to system lockup. For consistency, make all of these
places look the same as FastPathTransferRelationLocks.
Noted while looking for the cause of Dan Wood's bugs --- this wasn't it,
but it's a bug anyway.
Prevent handle_sig_alarm from losing control partway through due to a query
cancel (either an asynchronous SIGINT, or a cancel triggered by one of the
timeout handler functions). That would at least result in failure to
schedule any required future interrupt, and might result in actual
corruption of timeout.c's data structures, if the interrupt happened while
we were updating those.
We could still lose control if an asynchronous SIGINT arrives just as the
function is entered. This wouldn't break any data structures, but it would
have the same effect as if the SIGALRM interrupt had been silently lost:
we'd not fire any currently-due handlers, nor schedule any new interrupt.
To forestall that scenario, forcibly reschedule any pending timer interrupt
during AbortTransaction and AbortSubTransaction. We can avoid any extra
kernel call in most cases by not doing that until we've allowed
LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events.
Another hazard is that some platforms (at least Linux and *BSD) block a
signal before calling its handler and then unblock it on return. When we
longjmp out of the handler, the unblock doesn't happen, and the signal is
left blocked indefinitely. Again, we can fix that by forcibly unblocking
signals during AbortTransaction and AbortSubTransaction.
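In plain POSIX terms, the two defensive steps amount to something like
the sketch below; the real code derives the timer delay from its pending
timeout events rather than using a fixed 1 microsecond:

#include <signal.h>
#include <sys/time.h>

static void
recover_possibly_lost_interrupts(void)
{
    struct itimerval due = {{0, 0}, {0, 1}};    /* fire again almost immediately */
    sigset_t unblock;

    /* force a fresh SIGALRM in case one was swallowed by a longjmp */
    setitimer(ITIMER_REAL, &due, NULL);

    /* unblock signals, since longjmp'ing out of a handler skips the
     * kernel's usual unblock-on-return */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGALRM);
    sigaddset(&unblock, SIGINT);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}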
These latter two problems do not manifest when the longjmp reaches
postgres.c, because the error recovery code there kills all pending timeout
events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal
mask is restored. So errors thrown outside any transaction should be OK
already, and cleaning up in AbortTransaction and AbortSubTransaction should
be enough to fix these issues. (We're assuming that any code that catches
a query cancel error and doesn't re-throw it will do at least a
subtransaction abort to clean up; but that was pretty much required already
by other subsystems.)
Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when
disabling that event: if a lock timeout interrupt happened after the lock
was granted, the ensuing query cancel is still going to happen at the next
CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user
cancel.
Per reports from Dan Wood.
Back-patch to 9.3 where the new timeout handling infrastructure was
introduced. We may at some point decide to back-patch the signal
unblocking changes further, but I'll desist from that until we hear
actual field complaints about it.
We have for a long time checked the head pointer of each of the backend's
proclock lists and skipped acquiring the corresponding locktable partition
lock if the head pointer was NULL. This was safe enough in the days when
proclock lists were changed only by the owning backend, but it is pretty
questionable now that the fast-path patch added cases where backends add
entries to other backends' proclock lists. However, we don't really wish
to revert to locking each partition lock every time, because in simple
transactions that would add a lot of useless lock/unlock cycles on
already-heavily-contended LWLocks. Fortunately, the only way that another
backend could be modifying our proclock list at this point would be if it
was promoting a formerly fast-path lock of ours; and any such lock must be
one that we'd decided not to delete in the previous loop over the locallock
table. So it's okay if we miss seeing it in this loop; we'd just decide
not to delete it again. However, once we've detected a non-empty list,
we'd better re-fetch the list head pointer after acquiring the partition
lock. This guards against possibly fetching a corrupt-but-non-null pointer
if pointer fetch/store isn't atomic. It's not clear if any practical
architectures are like that, but we've never assumed that before and don't
wish to start here. In any case, the situation certainly deserves a code
comment.
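A minimal pthreads analogue of the resulting pattern (not the actual
LWLock-based code):

#include <pthread.h>
#include <stddef.h>

typedef struct Entry
{
    struct Entry *next;
} Entry;

static Entry *list_head;    /* other threads may add entries */
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;

static void
walk_entries(void)
{
    /* cheap unlocked pre-check; a missed new entry is benign here,
     * per the argument above */
    if (list_head == NULL)
        return;

    pthread_mutex_lock(&partition_lock);
    /* re-fetch the head under the lock, in case the unlocked read saw
     * a corrupt-but-non-null value on a platform with non-atomic
     * pointer fetch/store */
    for (Entry *e = list_head; e != NULL; e = e->next)
    {
        /* ... process the entry ... */
    }
    pthread_mutex_unlock(&partition_lock);
}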
While at it, refactor the partition traversal loop to use a for() construct
instead of a while() loop with goto's.
Back-patch, just in case the risk is real and not hypothetical.
Instead of simply checking the KEYS_UPDATED bit, we need to check
whether each lock held on the future version of the tuple conflicts with
the lock we're trying to acquire.
Per bug report #8434 by Tomonari Katsumata.
Not doing so causes us to traverse an update chain that has been broken
by concurrent page pruning. All other code that traverses update chains
uses this check as one of the cases in which to stop iterating, so
replicate it here too. Failure to do so leads to erroneous CLOG,
subtrans or multixact lookups.
Per discussion following the bug report by J Smith in
CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com
as diagnosed by Andres Freund.
If a transaction updates/deletes a tuple just before aborting, and a
concurrent transaction tries to prune the page concurrently, the pruner
may see HeapTupleSatisfiesVacuum return HEAPTUPLE_DELETE_IN_PROGRESS,
but a later call to HeapTupleGetUpdateXid() return InvalidXid. This
would cause an assertion failure in development builds, but would be
otherwise Mostly Harmless.
Fix by checking whether the updater Xid is valid before trying to apply
it as the page prune point.
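Schematically, with get_update_xid() as an illustrative stand-in for
HeapTupleGetUpdateXid():

#include <stdint.h>

typedef uint32_t TransactionId;

#define TransactionIdIsValid(xid) ((xid) != (TransactionId) 0)

extern TransactionId get_update_xid(const void *tuple);

static void
maybe_record_prune_point(void *prune_state, const void *tuple)
{
    TransactionId xmax = get_update_xid(tuple);

    (void) prune_state;

    /* an updater that aborted at just the wrong moment can leave xmax
     * invalid, so only use it as the prune point when it is valid */
    if (TransactionIdIsValid(xmax))
    {
        /* ... record xmax as the prune point in prune_state ... */
    }
}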
Reported by Andres in 20131124000203.GA4403@alap2.anarazel.de
The reason for the fetch failure is that the tuple was removed because
it was dead; so the failure is innocuous and can be ignored. Moreover,
there's no need for further work and we can return success to the caller
immediately. EvalPlanQualFetch is doing something very similar to this
already.
Report and test case from Andres Freund in
20131124000203.GA4403@alap2.anarazel.de
When acquiring a lock in fast-path mode, we must reset the locallock
object's lock and proclock fields to NULL. They are not necessarily that
way to start with, because the locallock could be left over from a failed
lock acquisition attempt earlier in the transaction. Failure to do this
led to all sorts of interesting misbehaviors when LockRelease tried to
clean up no-longer-related lock and proclock objects in shared memory.
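In outline, with illustrative names standing in for the LOCALLOCK
bookkeeping:

#include <stddef.h>

typedef struct LocalLockEntry
{
    void *lock;        /* pointer into the shared lock table, or NULL */
    void *proclock;    /* likewise */
    int   nLocks;
} LocalLockEntry;

static void
acquire_fastpath(LocalLockEntry *locallock)
{
    /* the entry may be left over from a failed acquisition earlier in
     * the transaction, so scrub its stale shared-memory pointers
     * instead of trusting them */
    locallock->lock = NULL;
    locallock->proclock = NULL;

    /* ... record the lock in the per-backend fast-path array ... */
    locallock->nLocks++;
}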
Per report from Dan Wood.
In passing, modify LockRelease to elog not just Assert if it doesn't find
lock and proclock objects for a formerly fast-path lock, matching the code
in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a
bug but it will help in diagnosing any future bugs in this area.
Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry
to break out of their loops over the fastpath array once they've found the
sole matching entry. This was inconsistently done in some search loops
and not others.
Improve assorted related comments, too.
Back-patch to 9.2 where the fast-path mechanism was introduced.
Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages have been scanned. If the new
number of pages happened to be smaller than or equal to the number of pages
scanned, it incorrectly decided that all pages were scanned.
This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds with some rows suddenly missing. This likely has escaped notice
so far because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.
The incorrect logic was introduced by commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. Backpatch this fix down to 8.4,
like that commit.
Andres Freund, with some modifications by me.
Fix the preprocessor to emit valid C when a single ECPG declaration
defines multiple variables and the type of the variables is varchar.
This fixes this test case:
int main(void)
{
exec sql begin declare section;
varchar a[50], b[50];
exec sql end declare section;
return 0;
}
Since varchars are internally turned into custom structs and
the type name is emitted for these variable declarations,
the preprocessed code previously had:
struct varchar_1 { ... } a _,_ struct varchar_2 { ... } b ;
The comma in the generated C file was a syntax error.
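With the fix, each variable ends with its own terminator instead of the
bogus comma, presumably along the lines of:

struct varchar_1 { ... } a ; struct varchar_2 { ... } b ;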
There are no regression test changes since the case isn't exercised there.
Patch by Boszormenyi Zoltan <zb@cybertec.at>
The previous coding labeled expressions such as pg_index.indkey[1:3] as
being of int2vector type; which is not right because the subscript bounds
of such a result don't, in general, satisfy the restrictions of int2vector.
To fix, implicitly promote the result of slicing int2vector to int2[],
or oidvector to oid[]. This is similar to what we've done with domains
over arrays, which is a good analogy because these types are very much
like restricted domains of the corresponding regular-array types.
A side-effect is that we now also forbid array-element updates on such
columns, eg while "update pg_index set indkey[4] = 42" would have worked
before if you were superuser (and corrupted your catalogs irretrievably,
no doubt) it's now disallowed. This seems like a good thing since, again,
some choices of subscripting would've led to results not satisfying the
restrictions of int2vector. The case of an array-slice update was
rejected before, though with a different error message than you get now.
We could make these cases work in future if we added a cast from int2[]
to int2vector (with a cast function checking the subscript restrictions)
but it seems unlikely that there's any value in that.
Per report from Ronan Dunklau. Back-patch to all supported branches
because of the crash risks involved.
If logging is enabled, either ereport() or fprintf() might stomp on errno
internally, causing this function to return the wrong result. That might
only end in a misleading error report, but in any code that's examining
errno to decide what to do next, the consequences could be far graver.
This has been broken since the very first version of this file in 2006
... it's a bit astonishing that we didn't identify this long ago.
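The fix pattern, as a self-contained sketch with illustrative names:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static ssize_t
logged_read(int fd, void *buf, size_t len, FILE *logfile)
{
    ssize_t rc = read(fd, buf, len);

    if (rc < 0 && logfile != NULL)
    {
        int save_errno = errno;     /* fprintf() may stomp on errno */

        fprintf(logfile, "read on fd %d failed\n", fd);
        errno = save_errno;         /* restore it for the caller */
    }
    return rc;
}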
Reported by Amit Kapila, though this isn't his proposed fix.
A pointer to a C string was treated as a pointer to a "name" datum and
passed to SPI_execute_plan(). This pointer would then end up being
passed through datumCopy(), which would try to copy the entire 64 bytes
of name data, thus running past the end of the C string. Fix by
converting the string to a proper name structure.
Found by LLVM AddressSanitizer.
pullup_replace_vars()'s decisions about whether a pulled-up replacement
expression needs to be wrapped in a PlaceHolderVar depend on the assumption
that what looks like a Var behaves like a Var. However, if the Var is a
join alias reference, later flattening of join aliases might replace the
Var with something that's not a Var at all, and should have been wrapped.
To fix, do a forcible pass of flatten_join_alias_vars() on the subquery
targetlist before we start to copy items out of it. We'll re-run that
processing on the pulled-up expressions later, but that's harmless.
Per report from Ken Tanzer; the added regression test case is based on his
example. This bug has been there since the PlaceHolderVar mechanism was
invented, but has escaped detection because the circumstances that trigger
it are fairly narrow. You need a flattenable query underneath an outer
join, which contains another flattenable query inside a join of its own,
with a dangerous expression (a constant or something else non-strict)
in that one's targetlist.
Having seen this, I'm wondering if it wouldn't be prudent to do all
alias-variable flattening earlier, perhaps even in the rewriter.
But that would probably not be a back-patchable change.
These bugs can cause data loss on standbys started with hot_standby=on at
the moment they start to accept read only queries, by marking committed
transactions as uncommitted. The likelihood of such corruptions is small
unless the primary has a high transaction rate.
5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup logic
by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state
was reached, missing the fact that both clog and subtrans are written to
before that. This only failed to fail in common cases because the usage
of ExtendCLOG in procarray.c was superfluous since clog extensions are
actually WAL logged.
In f44eedc3f0f347a856eea8590730769125964597 I then tried to fix the missing
extensions of pg_subtrans due to the former commit's changes - which are
not WAL logged - by performing the extensions when switching to a state
> STANDBY_INITIALIZED and not performing xid assignments before that -
again missing the fact that ExtendCLOG is unnecessary - but screwed up
twice: once because latestObservedXid wasn't updated anymore in that
state due to the earlier commit, and once by having an off-by-one error in
the loop performing extensions. This means that whenever a
CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed
between the start of the checkpoint recovery started from and the first
xl_running_xact record, old transactions' commit bits in pg_clog could be
overwritten if they started and committed in that window.
Fix this mess by not performing ExtendCLOG() in HS at all anymore, since
it's unneeded and evidently dangerous, and by performing subtrans
extensions even before reaching STANDBY_SNAPSHOT_PENDING.
Analysis and patch by Andres Freund. Reported by Christophe Pettus.
Backpatch down to 9.0, like the previous commit that caused this.
Previously, the -d option of pg_isready was broken. When the name of the
database was specified by the -d option, pg_isready failed with an error.
When the conninfo string specified by the -d option contained a host name
but no numeric IP address (i.e., hostaddr), pg_isready
displayed the wrong connection message. Also, the -d option could not
handle a valid URI prefix at all. This commit fixes these bugs of pg_isready.
Backpatch to 9.3, where pg_isready was introduced.
Per report from Josh Berkus and Robert Haas.
Original patch by Fabrízio de Royes Mello, heavily modified by me.
Previously, if VACUUM skipped vacuuming a page because it's pinned, it
didn't count that page as scanned. However, that meant that relfrozenxid
was not bumped up either, which prevented anti-wraparound vacuum from
doing its job.
Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes.
Backpatch to 9.2, where the skip-locked-pages behavior was introduced.
A couple of places that should have been iterating over WORDS_PER_CHUNK
words were iterating over WORDS_PER_PAGE words instead. This thinko
accidentally failed to fail, because (at least on common architectures
with default BLCKSZ) WORDS_PER_CHUNK is a bit less than WORDS_PER_PAGE,
and the extra words being looked at were always zero so nothing happened.
Still, it's a bug waiting to happen if anybody ever fools with the
parameters affecting TIDBitmap sizes, and it's a small waste of cycles
too. So back-patch to all active branches.
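As a self-contained illustration (the values below are made up; the real
ones depend on BLCKSZ and the word size):

#define WORDS_PER_PAGE  34
#define WORDS_PER_CHUNK 32

typedef struct PagetableEntry
{
    unsigned int words[WORDS_PER_PAGE];     /* sized for the larger case */
} PagetableEntry;

/* Count the bits set in a chunk-type entry. The buggy version iterated
 * to WORDS_PER_PAGE; it "worked" only because words[32..33] were always
 * zero for chunks. */
static int
chunk_popcount(const PagetableEntry *chunk)
{
    int count = 0;

    for (int w = 0; w < WORDS_PER_CHUNK; w++)   /* not WORDS_PER_PAGE */
        for (unsigned int bits = chunk->words[w]; bits != 0; bits &= bits - 1)
            count++;
    return count;
}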
Etsuro Fujita
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates. I'd worried about this in the commit message
for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs. That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.
Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc. We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times. In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins. There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though. There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.
Back-patch to 9.2, like the previous patch in this area.
Simple oversight in commit 1cb108efb0e60d87e4adec38e7636b6e8efbeb57 ---
recursively examining a subquery output column is only sane if the
original Var refers to a single output column. Found by Kevin Grittner.
The pretty-printing logic in ruleutils.c operates by inserting a newline
and some indentation whitespace into strings that are already valid SQL.
This naturally results in leaving some trailing whitespace before the
newline in many cases; which can be annoying when processing the output
with other tools, as complained of by Joe Abbate. We can fix that in
a pretty localized fashion by deleting any trailing whitespace before
we append a pretty-printing newline. In addition, we have to modify the
code inserted by commit 2f582f76b1945929ff07116cd4639747ce9bb8a1 so that
we also delete trailing whitespace when transposing items from temporary
buffers into the main result string, when a temporary item starts with a
newline.
This results in rather voluminous changes to the regression test results,
but it's easily verified that they are only removal of trailing whitespace.
Back-patch to 9.3, because the aforementioned commit resulted in many
more cases of trailing whitespace than had occurred in earlier branches.
Although the SQL spec forbids duplicate table aliases, historically
we've allowed queries like
SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z
on the grounds that the aliased join (z) hides the aliases within it,
therefore there is no conflict between the two RTEs named "x". The
LATERAL patch broke this, on the misguided basis that "x" could be
ambiguous if tab3 were a LATERAL subquery. To avoid breaking existing
queries, it's better to allow this situation and complain only if
tab3 actually does contain an ambiguous reference. We need only remove
the check that was throwing an error, because the column lookup code
is already prepared to handle ambiguous references. Per bug #8444.
This is a similar fix to c6ec8793aa59d1842082e14b4b4aae7d4bd883fd in
9.2. This should never happen in 9.3 and newer since the special case
cannot happen there, but this patch synchronizes up the code so there
is no confusion on why they're different. An empty block is as harmless
in 9.3 as it was in 9.2, and can safely be ignored.
If a page is deleted, and reused for something else, just as a search is
following a rightlink to it from its left sibling, the search would continue
scanning whatever the new contents of the page are. That could lead to
incorrect query results, or even something more curious if the page is
reused for a different kind of page.
To fix, modify the search algorithm to lock the next page before releasing
the previous one, and refrain from deleting pages from the leftmost branch
of the tree.
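A minimal analogue of the lock-coupling idea, using pthread mutexes
rather than buffer locks:

#include <pthread.h>
#include <stddef.h>

typedef struct Page
{
    pthread_mutex_t lock;
    struct Page    *right;      /* rightlink to the next page */
    /* ... page contents ... */
} Page;

/* Walk right along a level with lock coupling: take the next page's
 * lock before dropping the current one, so a concurrent delete/reuse
 * of the next page cannot slip in between the two steps. */
static void
scan_right(Page *start)
{
    pthread_mutex_lock(&start->lock);
    for (Page *cur = start; cur != NULL; )
    {
        Page *next = cur->right;

        if (next != NULL)
            pthread_mutex_lock(&next->lock);    /* lock next first... */
        pthread_mutex_unlock(&cur->lock);       /* ...then release current */
        cur = next;
    }
}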
Add a new Concurrency section to the README, explaining why this works.
There is a lot more one could say about concurrency in GIN, but that's for
another patch.
Backpatch to all supported versions.
This change prevents us from doing inappropriate subquery flattening in
cases such as dangerous functions hidden inside a sub-SELECT in the
targetlist of another sub-SELECT. That could result in unexpected behavior
due to multiple evaluations of a volatile function, as in a recent
complaint from Etienne Dube. It's been questionable from the very
beginning whether these functions should look into subqueries (as noted in
their comments), and this case seems to provide proof that they should.
Because the new code only descends into SubLinks, not SubPlans or
InitPlans, the change only affects the planner's behavior during
prepjointree processing and not later on --- for example, you can still get
it to use a volatile function in an indexqual if you wrap the function in
(SELECT ...). That's a historical behavior, for sure, but it's reasonable
given that the executor's evaluation rules for subplans don't depend on
whether there are volatile functions inside them. In any case, we need to
constrain the behavioral change as narrowly as we can to make this
reasonable to back-patch.
contain_volatile_functions() is best applied to the output of
expression_planner(), not its input, so that insertion of function
default arguments and constant-folding have been done. (See comments
at CheckMutability, for instance.) It's perhaps unlikely that anyone
will notice a difference in practice, but still we should do it properly.
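The resulting pattern is roughly the same shape as CheckMutability; a
sketch using PostgreSQL's planner API (backend headers required):

#include "postgres.h"
#include "optimizer/clauses.h"
#include "optimizer/planner.h"

static bool
expression_is_volatile(Expr *expr)
{
    /* preprocess first, so that function default arguments have been
     * inserted and constant-folding has been done, then test the result */
    expr = expression_planner(expr);
    return contain_volatile_functions((Node *) expr);
}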
In passing, change variable type from Node* to Expr* to reduce the net
number of casts needed.
Noted while perusing uses of contain_volatile_functions().