During the development of d47cfef711 the CFI()s in ExecScan() were
moved back and forth, ending up in the wrong place. Thus queries that
largely spend their time in ExecScan(), and have neither projection
nor a qual, can't be cancelled in a timely manner.
Reported-By: Jeff Janes
Author: Andres Freund
Discussion: https://postgr.es/m/CAMkU=1weDXp8eLLPt9SO1LEUsJYYK9cScaGhLKpuN+WbYo9b5g@mail.gmail.com
Backpatch: 10, as d47cfef711
The order in which GRANTs are output is important as GRANTs which have
been GRANT'd by individuals via WITH GRANT OPTION GRANTs have to come
after the GRANT which included the WITH GRANT OPTION. This happens
naturally in the backend during normal operation as we only change
existing ACLs in-place, only add new ACLs to the end, and when removing
an ACL we remove any which depend on it also.
Also, adjust the comments in acl.h to make this clear.
Unfortunately, the updates to pg_dump to handle initial privileges
involved pulling apart ACLs and then combining them back together and
could end up putting them back together in an invalid order, leading to
dumps which wouldn't restore.
Fix this by adjusting the queries used by pg_dump to ensure that the
ACLs are rebuilt in the same order in which they were originally.
Back-patch to 9.6 where the changes for initial privileges were done.
Test queries added by commit 69835bc89 are giving unexpected results
on some smaller buildfarm critters. I think probably the seqscan
logic is kicking in to cause the scans to not start at the beginning
of the table. Add ORDER BY to make them be indexscans instead.
Per buildfarm member chipmunk.
Historically, the selectivity functions have simply not distinguished
< from <=, or > from >=, arguing that the fraction of the population that
satisfies the "=" aspect can be considered to be vanishingly small, if the
comparison value isn't any of the most-common-values for the variable.
(If it is, the code path that executes the operator against each MCV will
take care of things properly.) But that isn't really true unless we're
dealing with a continuum of variable values, and in practice we seldom are.
If "x = const" would estimate a nonzero number of rows for a given const
value, then it follows that we ought to estimate different numbers of rows
for "x < const" and "x <= const", even if the const is not one of the MCVs.
Handling this more honestly makes a significant difference in edge cases,
such as the estimate for a tight range (x BETWEEN y AND z where y and z
are close together).
Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly
split scalargtsel into scalargtsel/scalargesel. Adjust <= and >=
operator definitions to reference the new selectivity functions.
Improve the core ineq_histogram_selectivity() function to make a
correction for equality. (Along the way, I learned quite a bit about
exactly why that function gives good answers, which I tried to memorialize
in improved comments.)
The corresponding join selectivity functions were, and remain, just stubs.
But I chose to split them similarly, to avoid confusion and to prevent the
need for doing this exercise again if someone ever makes them less stubby.
In passing, change ineq_histogram_selectivity's clamp for extreme
probability estimates so that it varies depending on the histogram
size, instead of being hardwired at 0.0001. With the default histogram
size of 100 entries, you still get the old clamp value, but bigger
histograms should allow us to put more faith in edge values.
Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh
Discussion: https://postgr.es/m/12232.1499140410@sss.pgh.pa.us
The documentation claimed that one should send
"pg_same_as_startup_message" as the user name in the SCRAM messages, but
this did not match the actual implementation, so remove it.
This makes it clear that pg_upgrade standby upgrade instructions should
only be used in link mode, adds examples, and explains how rsync works
with links.
Reported-by: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.6c.c0e592c5af4ef0a2.15e785dcb61@tc7-visena
Backpatch-through: 9.5
The previous error message when attempting to run a general SQL command
in a physical replication WAL sender was a bit sloppy.
Reported-by: Fujii Masao <masao.fujii@gmail.com>
Commit 83aaac41c66959a3ebaec7daadc4885b5f98f561 introduced the use of
LDAP_NO_ATTRS to avoid requesting a dummy attribute when doing search+bind
LDAP authentication. It turns out that not all LDAP implementations define
that macro, but its value is fixed by the protocol so we can define it
ourselves if it's missing.
Author: Thomas Munro
Reported-By: Ashutosh Sharma
Discussion: https://postgr.es/m/CAE9k0Pm6FKCfPCiAr26-L_SMGOA7dT_k0%2B3pEbB8%2B-oT39xRpw%40mail.gmail.com
This patch adds ERROR, SQLSTATE, and ROW_COUNT, which are updated after
every query, as well as LAST_ERROR_MESSAGE and LAST_ERROR_SQLSTATE,
which are updated only when a query fails. The expected usage of these
is for scripting.
Fabien Coelho, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/alpine.DEB.2.20.1704042158020.12290@lancre
Before, only filters of the form "(<ldapsearchattribute>=<user>)"
could be used to search an LDAP server. Introduce ldapsearchfilter
so that more general filters can be configured using patterns, like
"(|(uid=$username)(mail=$username))" and "(&(uid=$username)
(objectClass=posixAccount))". Also allow search filters to be included
in an LDAP URL.
Author: Thomas Munro
Reviewed-By: Peter Eisentraut, Mark Cave-Ayland, Magnus Hagander
Discussion: https://postgr.es/m/CAEepm=0XTkYvMci0WRubZcf_1am8=gP=7oJErpsUfRYcKF2gwg@mail.gmail.com
When copying from an active database tree, it's possible for files to be
deleted after we see them in a readdir() scan but before we can open them.
(Once we've got a file open, we don't expect any further errors from it
getting unlinked, though.) Tweak RecursiveCopy so it can cope with this
case, so as to avoid irreproducible test failures.
Back-patch to 9.6 where this code was added. In v10 and HEAD, also
remove unused "use RecursiveCopy" in one recovery test script.
Michael Paquier and Tom Lane
Discussion: https://postgr.es/m/24621.1504924323@sss.pgh.pa.us
This is primarily useful for making tests of this utility more
deterministic, to avoid the complexity of starting pg_receivewal as a
deamon in TAP tests.
While this is less useful than the equivalent pg_recvlogical option,
users can as well use it for example to enforce WAL streaming up to a
end-of-backup position, to save only a minimal amount of WAL.
Use this new option to stream WAL data in a deterministic way within a
new set of TAP tests.
Author: Michael Paquier <michael.paquier@gmail.com>
This allows the compiler/linker to move the static variables to a
read-only segment. Not all the signature changes are necessary, but
it seems better to apply const in a consistent manner.
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20170910232154.asgml44ji2b7lv3d@alap3.anarazel.de
AFTER triggers using transition tables crashed if they were fired due
to a foreign key ON CASCADE update. This is because ExecEndModifyTable
flushes the transition tables, on the assumption that any trigger that
could need them was already fired during ExecutorFinish. Normally
that's true, because we don't allow transition-table-using triggers
to be deferred. However, foreign key CASCADE updates force any
triggers on the referencing table to be deferred to the outer query
level, by means of the EXEC_FLAG_SKIP_TRIGGERS flag. I don't recall
all the details of why it's like that and am pretty loath to redesign
it right now. Instead, just teach ExecEndModifyTable to skip destroying
the TransitionCaptureState when that flag is set. This will allow the
transition table data to survive until end of the current subtransaction.
This isn't a terribly satisfactory solution, because (1) we might be
leaking the transition tables for much longer than really necessary,
and (2) as things stand, an AFTER STATEMENT trigger will fire once per
RI updating query, ie once per row updated or deleted in the referenced
table. I suspect that is not per SQL spec. But redesigning this is a
research project that we're certainly not going to get done for v10.
So let's go with this hackish answer for now.
In passing, tweak AfterTriggerSaveEvent to not save the transition_capture
pointer into the event record for a deferrable trigger. This is not
necessary to fix the current bug, but it avoids letting dangling pointers
to long-gone transition tables persist in the trigger event queue. That's
at least a safety feature. It might also allow merging shared trigger
states in more cases than before.
I added a regression test that demonstrates the crash on unpatched code,
and also exposes the behavior of firing the AFTER STATEMENT triggers
once per row update.
Per bug #14808 from Philippe Beaudoin. Back-patch to v10.
Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
This improves the regression tests' coverage of rbtree.c from pretty
awful (because some of the functions aren't used yet) to basically 100%.
Victor Drobny, reviewed by Aleksander Alekseev and myself
Discussion: https://postgr.es/m/c9d61310e16e75f8acaf6cb1c48b7b77@postgrespro.ru
This code isn't used, and there's no clear reason why anybody would ever
want to use it. These traversal mechanisms don't yield a visitation order
that is semantically meaningful for any external purpose, nor are they
any faster or simpler than the left-to-right or right-to-left traversals.
(In fact, some rough testing suggests they are slower :-(.) Moreover,
these mechanisms are impossible to test in any arm's-length fashion; doing
so requires knowledge of the red-black tree's internal implementation.
Hence, let's just jettison them.
Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us
The previous coding of get_qual_for_list() was careful to copy everything
it was using from the input data structure. The new version missed
making a copy of pass-by-ref datum values that it's inserting into Consts.
This is not optional, however, as revealed by buildfarm failures on
machines running -DRELCACHE_FORCE_RELEASE: we're copying from a relcache
entry that could go away before the required lifespan of our output
expression. I'm pretty sure -DCLOBBER_CACHE_ALWAYS machines won't like
this either, but none of them have reported in yet.
map_partition_varattnos() failed to set its found_whole_row output
parameter if the given expression list was NIL. This seems to be
a pre-existing bug that chanced to be exposed by commit 6f6b99d13.
It might be unreachable in v10, but I have little faith in that
proposition, so back-patch.
Per buildfarm.
Not completely sure, but I think bowerbird is spitting up on attempting
to include ">" in a temporary file name. (Why in the world are we
writing this stuff into files at all? A hash would be a better answer.)
* Remove no-such-user test case, output isn't stable, and we really
don't need to be testing such cases here anyway.
* Fix the process exit code test logic to match PostgresNode::psql
(but I didn't bother with looking at the "core" flag).
* Give up on inf/nan tests.
Per buildfarm.
The encoding ID was converted between string and number too many times,
probably a remnant from the shell script days.
Reviewed-by: Aleksandr Parfenov <a.parfenov@postgrespro.ru>
* Our own version of getopt_long doesn't support abbreviation of
long options.
* It doesn't do automatic rearrangement of non-option arguments to the end,
either.
* Test was way too optimistic about the platform independence of
NaN and Infinity outputs. I rather imagine we might have to lose
those tests altogether, but for the moment just allow case variation
and fully spelled out Infinity.
Per buildfarm.
In commit fccebe421, we hacked get_actual_variable_range() to scan the
index with SnapshotDirty, so that if there are many uncommitted tuples
at the end of the index range, it wouldn't laboriously scan through all
of them looking for a live value to return. However, that didn't fix it
for the case of many recently-dead tuples at the end of the index;
SnapshotDirty recognizes those as committed dead and so we're back to
the same problem.
To improve the situation, invent a "SnapshotNonVacuumable" snapshot type
and use that instead. The reason this helps is that, if the snapshot
rejects a given index entry, we know that the indexscan will mark that
index entry as killed. This means the next get_actual_variable_range()
scan will proceed past that entry without visiting the heap, making the
scan a lot faster. We may end up accepting a recently-dead tuple as
being the estimated extremal value, but that doesn't seem much worse than
the compromise we made before to accept not-yet-committed extremal values.
The cost of the scan is still proportional to the number of dead index
entries at the end of the range, so in the interval after a mass delete
but before VACUUM's cleaned up the mess, it's still possible for
get_actual_variable_range() to take a noticeable amount of time, if you've
got enough such dead entries. But the constant factor is much much better
than before, since all we need to do with each index entry is test its
"killed" bit.
We chose to back-patch commit fccebe421 at the time, but I'm hesitant to
do so here, because this form of the problem seems to affect many fewer
people. Also, even when it happens, it's less bad than the case fixed
by commit fccebe421 because we don't get the contention effects from
expensive TransactionIdIsInProgress tests.
Dmitriy Sarafannikov, reviewed by Andrey Borodin
Discussion: https://postgr.es/m/05C72CF7-B5F6-4DB9-8A09-5AC897653113@yandex.ru
It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These
two styles have been applied inconsistently. After discussion, we'll
use the more verbose style for plain function pointer variables, to make
it clear that it's a variable, and the shorter style when the function
pointer is in a struct (s.func() or s->func()), because then it's clear
that it's not a plain function name, and otherwise the excessive
punctuation makes some of those invocations hard to read.
Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
This doesn't allow routing tuple to the foreign partitions themselves,
but it permits tuples to be routed to regular partitions despite the
presence of foreign partitions in the same inheritance hierarchy.
Etsuro Fujita, reviewed by Amit Langote and by me.
Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp
Issuing a savepoint-related command in a Query message that contains
multiple SQL statements led to a FATAL exit with a complaint about
"unexpected state STARTED". This is a shortcoming of commit 4f896dac1,
which attempted to prevent such misbehaviors in multi-statement strings;
its quick hack of marking the individual statements as "not top-level"
does the wrong thing in this case, and isn't a very accurate description
of the situation anyway.
To fix, let's introduce into xact.c an explicit model of what happens for
multi-statement Query strings. This is an "implicit transaction block
in progress" state, which for many purposes works like the normal
TBLOCK_INPROGRESS state --- in particular, IsTransactionBlock returns true,
causing the desired result that PreventTransactionChain will throw error.
But in case of error abort it works like TBLOCK_STARTED, allowing the
transaction to be cancelled without need for an explicit ROLLBACK command.
Commit 4f896dac1 is reverted in toto, so that we go back to treating the
individual statements as "top level". We could have left it as-is, but
this allows sharpening the error message for PreventTransactionChain
calls inside functions.
Except for getting a normal error instead of a FATAL exit for savepoint
commands, this patch should result in no user-visible behavioral change
(other than that one error message rewording). There are some things
we might want to do in the line of changing the appearance or wording of
error and warning messages around this behavior, which would be much
simpler to do now that it's an explicitly modeled state. But I haven't
done them here.
Although this fixes a long-standing bug, no backpatch. The consequences
of the bug don't seem severe enough to justify the risk that this commit
itself creates some new issue.
Patch by me, but it owes something to previous investigation by
Takayuki Tsunakawa, who also reported the bug in the first place.
Also thanks to Michael Paquier for reviewing.
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
In the generic atomic ops that rely on a loop around a CAS primitive,
there's no need to force the initial read of the "old" value to be atomic.
In the typically-rare case that we get a torn value, that simply means
that the first CAS attempt will fail; but it will update "old" to the
atomically-read value, so the next attempt has a chance of succeeding.
It was already being done that way in pg_atomic_exchange_u64_impl(),
but let's duplicate the approach in the rest.
(Given the current coding of the pg_atomic_read functions, this change
is a no-op anyway on popular platforms; it only makes a difference where
pg_atomic_read_u64_impl() is implemented as a CAS.)
In passing, also remove unnecessary take-a-pointer-and-dereference-it
coding in the pg_atomic_read functions. That seems to have been based
on a misunderstanding of what the C standard requires. What actually
matters is that the pointer be declared as pointing to volatile, which
it is.
I don't believe this will change the assembly code at all on x86
platforms (even ignoring the likelihood that these implementations
get overridden by others); but it may help on less-mainstream CPUs.
Discussion: https://postgr.es/m/13707.1504718238@sss.pgh.pa.us