Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like
Nested Loop
-> Scan X
-> Nested Loop
-> Scan Y
-> Scan Z
Filter: Z.Z = X.X
The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node. On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members. So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there. If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.
This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape. (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.) Many
thanks to Alexander Kirkouski for submitting a reproducible test case.
The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
The previous coding here was formally undefined, though it seems to
accidentally work on most platforms in the buildfarm. Caught by some
OpenBSD platforms in which libc contains an assertion check for
overlapping areas passed to memcpy().
Thomas Munro
Previously, database clusters created by a TAP test were shut down by
DESTROY methods attached to the PostgresNode objects representing them.
The trouble with that is that if the objects survive into the final global
destruction phase (which they do), Perl executes the DESTROY methods in an
unspecified order. Thus, the order of shutdown of multiple clusters was
indeterminate, which might lead to not-very-reproducible errors getting
logged (eg from a slave whose master might or might not get killed first).
Worse, the File::Temp objects representing the temporary PGDATA directories
might get destroyed before the PostgresNode objects, resulting in attempts
to delete PGDATA directories that still have live servers in them. On
Windows, this would lead to directory deletion failures; on Unix, it
usually had no effects worse than erratic "could not open temporary
statistics file "pg_stat/global.tmp": No such file or directory" log
messages.
While none of this would affect the reported result of the TAP test, which
is already determined, it could be very confusing when one is trying to
understand from the logs what went wrong with a failed test.
To fix, do the postmaster shutdowns in an END block rather than at object
destruction time. The END block will execute at a well-defined (and
reasonable) time during script termination, and it will stop the
postmasters in order of PostgresNode object creation. (Perhaps we should
change that to be reverse order of creation, but the main point here is
that we now have control which we did not before.) Use "pg_ctl stop", not
an asynchronous kill(SIGQUIT), so that we wait for the postmasters to shut
down before proceeding with directory deletion.
Deletion of temporary directories still happens in an unspecified order
during global destruction, but I can see no reason to care about that
once the postmasters are stopped.
Commit fab84c7787f25756 tried to get away without doing an actual bind(),
but buildfarm results show that that doesn't get the job done. So we must
really bind to the target port --- and at least on my Linux box, we need a
listen() as well, or conflicts won't be detected. We rely on SO_REUSEADDR
to prevent problems from starting a postmaster on the socket immediately
after we've bound to it in the test code. (There may be platforms where
that doesn't work too well. But fortunately, we only really care whether
this works on Windows, and there the default behavior should be OK.)
Buildfarm members bowerbird and jacana have shown intermittent "could not
bind IPv4 socket" failures in the BinInstallCheck stage since mid-December,
shortly after commits 1caef31d9e550408 and 9821492ee417a591 changed the
logic for selecting which port to use in temporary installations. One
plausible explanation is that we are randomly selecting ports that are
already in use for some non-Postgres purpose. Although the code tried
to defend against already-in-use ports, it used pg_isready to probe
the port which is quite unhelpful: if some non-Postgres server responds
at the given address, pg_isready will generally say "no response",
leading to exactly the wrong conclusion about whether the port is free.
Instead, let's use a simple TCP connect() call to see if anything answers
without making assumptions about what it is. Note that this means there's
no direct check for a conflicting Unix socket, but that should be okay
because there should be no other Unix sockets in use in the temporary
socket directory created for a test run.
This is only a partial solution for the TCP case, since if the port number
is in use for an outgoing connection rather than a listening socket, we'll
fail to detect that. We could try to bind() to the proposed port as a
means of detecting that case, but that would introduce its own failure
modes, since the system might consider the address to remain reserved for
some period of time after we drop the bound socket. Close study of the
errors returned by bowerbird and jacana suggests that what we're seeing
there may be conflicts with listening not outgoing sockets, so let's try
this and see if it improves matters. It's certainly better than what's
there now, in any case.
Michael Paquier, adjusted by me to work on non-Windows as well as Windows
Given a left join containing a full join in its righthand side, with
the left join's joinclause referencing only one side of the full join
(in a non-strict fashion, so that the full join doesn't get simplified),
the planner could fail with "failed to build any N-way joins" or related
errors. This happened because the full join was seen as overlapping the
left join's RHS, and then recent changes within join_is_legal() caused
that function to conclude that the full join couldn't validly be formed.
Rather than try to rejigger join_is_legal() yet more to allow this,
I think it's better to fix initsplan.c so that the required join order
is explicit in the SpecialJoinInfo data structure. The previous coding
there essentially ignored full joins, relying on the fact that we don't
flatten them in the joinlist data structure to preserve their ordering.
That's sufficient to prevent a wrong plan from being formed, but as this
example shows, it's not sufficient to ensure that the right plan will
be formed. We need to work a bit harder to ensure that the right plan
looks sane according to the SpecialJoinInfos.
Per bug #14105 from Vojtech Rylko. This was apparently induced by
commit 8703059c6 (though now that I've seen it, I wonder whether there
are related cases that could have failed before that); so back-patch
to all active branches. Unfortunately, that patch also went into 9.0,
so this bug is a regression that won't be fixed in that branch.
When we shoehorned "x op ANY (array)" into the SQL syntax, we created a
fundamental ambiguity as to the proper treatment of a sub-SELECT on the
righthand side: perhaps what's meant is to compare x against each row of
the sub-SELECT's result, or perhaps the sub-SELECT is meant as a scalar
sub-SELECT that delivers a single array value whose members should be
compared against x. The grammar resolves it as the former case whenever
the RHS is a select_with_parens, making the latter case hard to reach ---
but you can get at it, with tricks such as attaching a no-op cast to the
sub-SELECT. Parse analysis would throw away the no-op cast, leaving a
parsetree with an EXPR_SUBLINK SubLink directly under a ScalarArrayOpExpr.
ruleutils.c was not clued in on this fine point, and would naively emit
"x op ANY ((SELECT ...))", which would be parsed as the first alternative,
typically leading to errors like "operator does not exist: text = text[]"
during dump/reload of a view or rule containing such a construct. To fix,
emit a no-op cast when dumping such a parsetree. This might well be
exactly what the user wrote to get the construct accepted in the first
place; and even if she got there with some other dodge, it is a valid
representation of the parsetree.
Per report from Karl Czajkowski. He mentioned only a case involving
RLS policies, but actually the problem is very old, so back-patch to
all supported branches.
Report: <20160421001832.GB7976@moraine.isi.edu>
In commit 2ffa86962077c588 we made pg_ctl recognize an environment variable
PGCTLTIMEOUT to set the default timeout for starting and stopping the
postmaster. However, pg_regress uses pg_ctl only for the "stop" end of
that; it has bespoke code for starting the postmaster, and that code has
historically had a hard-wired 60-second timeout. Further buildfarm
experience says it'd be a good idea if that timeout were also controlled
by PGCTLTIMEOUT, so let's make it so. Like the previous patch, back-patch
to all active branches.
Discussion: <13969.1461191936@sss.pgh.pa.us>
Print the actual value of each function result that's expected to be exact,
rather than merely emitting a NULL if it's not right. Although we print
these with extra_float_digits = 3, we should not trust that the platform
will produce a result visibly different from the expected value if it's off
only in the last place; hence, also include comparisons against the exact
values as before. This is a bit bulkier and uglier than the previous
printout, but it will provide more information and be easier to interpret
if there's a test failure.
Discussion: <18241.1461073100@sss.pgh.pa.us>
Although OID acts pretty much like user data, the other system columns do
not, so an index on one would likely misbehave. And it's pretty hard to
see a use-case for one, anyway. Let's just forbid the case rather than
worry about whether it should be supported.
David Rowley
For reasons of sheer brain fade, we (I) was calling systable_endscan()
immediately after systable_getnext() and expecting the tuple returned
by systable_getnext() to still be valid.
That's clearly wrong. Move the systable_endscan() down below the tuple
usage.
Discovered initially by Pavel Stehule and then also by Alvaro.
Add a regression test based on Alvaro's testing.
The original coding of this test used table and view names like "t",
"tv", "foo", etc. This tended to interfere with doing simple manual
tests in the regression database; not to mention that it posed a
considerable risk of conflict with other regression test scripts.
Prefix these names with "mvtest_" to avoid such conflicts.
Also, change transiently-created role name to be "regress_xxx" per
discussions about being careful with regression-test role creation.
Careless coding added by commit 07cacba983ef79be could result in a crash
or a bizarre error message if someone tried to select an index on the
OID column as the replica identity index for a table. Back-patch to 9.4
where the feature was introduced.
Discussion: CAKJS1f8TQYgTRDyF1_u9PVCKWRWz+DkieH=U7954HeHVPJKaKg@mail.gmail.com
David Rowley
The regression test checks whether the output of pg_stat_replication is
expected or not after changing synchronous_standby_names and reloading
the configuration file. Regarding this test logic, previously there was
a timing issue which made the test result unstable. That is,
pg_stat_replication could return unexpected result during small window
after the configuration file was reloaded before new setting value
took effect, and which made the test fail.
This commit changes the test logic so that it uses a loop with a timeout
to give some room for the test to pass. Now the test fails only when
pg_stat_replication keeps returning unexpected result for 30 seconds.
Michael Paquier
\crosstabview interpreted its arguments in an unusual way, including
doing case-insensitive matching of unquoted column names, which is
surely not the right thing. Rip that out in favor of doing something
equivalent to the dequoting/case-folding rules used by other psql
commands. To keep it simple, change the syntax so that the optional
sort column is specified as a separate argument, instead of the
also-quite-unusual syntax that attached it to the colH argument with
a colon.
Also, rework the error messages to be closer to project style.
For a long time, opclasscmds.c explained that "we do not create a
dependency link to the AM [for an opclass or opfamily], because we don't
currently support DROP ACCESS METHOD". Commit 473b93287040b200 invented
DROP ACCESS METHOD, but it batted only 1 for 2 on adding the dependency
links, and 0 for 2 on updating the comments about the topic.
In passing, undo the same commit's entirely inappropriate decision to
blow away an existing index as a side-effect of create_am.sql.
Everyplace else thinks it's _WIN64, so make these places fall in line.
The pg_regress.c usage is not going to result in any change in behavior,
only suppressing (or not) a compiler warning about downcasting HANDLEs.
So there seems no need for back-patching there.
The libpq/win32.mak usage might represent an actual bug, if anyone were
using this script to build for Windows64, which perhaps nobody is.
Given the lack of field complaints, no back-patch here either.
pg_regress.c problem found by Christian Ullrich, the other by me.
To avoid any possible overlap with existing roles on a system when
doing a 'make installcheck', use role names which start with
'regress_'.
Pointed out by Tom.
\crosstabview is a completely different way to display results from a
query: instead of a vertical display of rows, the data values are placed
in a grid where the column and row headers come from the data itself,
similar to a spreadsheet.
The sort order of the horizontal header can be specified by using
another column in the query, and the vertical header determines its
ordering from the order in which they appear in the query.
This only allows displaying a single value in each cell. If more than
one value correspond to the same cell, an error is thrown. Merging of
values can be done in the query itself, if necessary. This may be
revisited in the future.
Author: Daniel Verité
Reviewed-by: Pavel Stehule, Dean Rasheed
This creates an initial set of default roles which administrators may
use to grant access to, historically, superuser-only functions. Using
these roles instead of granting superuser access reduces the number of
superuser roles required for a system. Documention for each of the
default roles has been added to user-manag.sgml.
Bump catversion to 201604082, as we had a commit that bumped it to
201604081 and another that set it back to 201604071...
Reviews by José Luis Tallón and Robert Haas
This will prevent users from creating roles which begin with "pg_" and
will check for those roles before allowing an upgrade using pg_upgrade.
This will allow for default roles to be provided at initdb time.
Reviews by José Luis Tallón and Robert Haas
This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.
This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
Output order from the pg_indexes view might vary depending on the
phase of the moon, so add ORDER BY to ensure stable results of tests
added by commit 386e3d7609c49505e079c40c65919d99feb82505.
Per buildfarm.
As noticed by Tom Lane changing operation's number in commit
bb140506df605fab58f48926ee1db1f80bdafb59 causes on-disk format incompatibility.
Revert to previous numbering, that is reason to add special array to store
priorities of operation. Also it reverts order of tsquery to previous.
Author: Dmitry Ivanov
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.
Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.
This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.
Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.
Please watch for planner performance regressions; circumstances seem
unlikely but the law of unintended consequences may apply somewhen.
Additional complex tests welcome to prove this before release.
Tests can be performed using SET enable_fkey_estimates = on | off
using scripts provided during Hackers discussions, message id:
552335D9.3090707@2ndquadrant.com
Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
We shouldn't be adding roles during the regression tests as that can
cause back-to-back installcheck runs to fail and users running the
regression tests likley don't want those extra roles.
Pointed out by Tom
While prior to this patch the user-visible effect on the database
of any set of successfully committed serializable transactions was
always consistent with some one-at-a-time order of execution of
those transactions, the presence of declarative constraints could
allow errors to occur which were not possible in any such ordering,
and developers had no good workarounds to prevent user-facing
errors where they were not necessary or desired. This patch adds
a check for serialization failure ahead of duplicate key checking
so that if a developer explicitly (redundantly) checks for the
pre-existing value they will get the desired serialization failure
where the problem is caused by a concurrent serializable
transaction; otherwise they will get a duplicate key error.
While it would be better if the reads performed by the constraints
could count as part of the work of the transaction for
serialization failure checking, and we will hopefully get there
some day, this patch allows a clean and reliable way for developers
to work around the issue. In many cases existing code will already
be doing the right thing for this to "just work".
Author: Thomas Munro, with minor editing of docs by me
Reviewed-by: Marko Tiikkaja, Kevin Grittner
Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery.
On-disk and binary in/out format of tsquery are backward compatible.
It has two side effect:
- change order for tsquery, so, users, who has a btree index over tsquery,
should reindex it
- less number of parenthesis in tsquery output, and tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
Commit c22650cd6450854e1a75064b698d7dcbb4a8821a sparked a discussion
about diverse interpretations of "token user" in error messages. Expel
old and new specimens of that phrase by making all GetTokenInformation()
callers report errors the way GetTokenUser() has been reporting them.
These error conditions almost can't happen, so users are unlikely to
observe this change.
Reviewed by Tom Lane and Stephen Frost.
Now that all of the infrastructure exists, add in the ability to
dump out the ACLs of the objects inside of pg_catalog or the ACLs
for objects which are members of extensions, but only if they have
been changed from their original values.
The original values are tracked in pg_init_privs. When pg_dump'ing
9.6-and-above databases, we will dump out the ACLs for all objects
in pg_catalog and the ACLs for all extension members, where the ACL
has been changed from the original value which was set during either
initdb or CREATE EXTENSION.
This should not change dumps against pre-9.6 databases.
Reviews by Alexander Korotkov, Jose Luis Tallon
This new catalog holds the privileges which the system was
initialized with at initdb time, along with any permissions set
by extensions at CREATE EXTENSION time. This allows pg_dump
(and any other similar use-cases) to detect when the privileges
set on initdb-created or extension-created objects have been
changed from what they were set to at initdb/extension-creation
time and handle those changes appropriately.
Reviews by Alexander Korotkov, Jose Luis Tallon
It inserts a new value into an jsonb array at arbitrary position or
a new key to jsonb object.
Author: Dmitry Dolgov
Reviewers: Petr Jelinek, Vitaly Burovoy, Andrew Dunstan
This introduces a new dependency type which marks an object as depending
on an extension, such that if the extension is dropped, the object
automatically goes away; and also, if the database is dumped, the object
is included in the dump output. Currently the grammar supports this for
indexes, triggers, materialized views and functions only, although the
utility code is generic so adding support for more object types is a
matter of touching the parser rules only.
Author: Abhijit Menon-Sen
Reviewed-by: Alexander Korotkov, Álvaro Herrera
Discussion: http://www.postgresql.org/message-id/20160115062649.GA5068@toroid.org
has_parallel_hazard() was ignoring the proparallel markings for
aggregates, which is no good. Fix that. There was no way to mark
an aggregate as actually being parallel-safe, either, so add a
PARALLEL option to CREATE AGGREGATE.
Patch by me, reviewed by David Rowley.
This lets us use parallel aggregate for a variety of useful cases
that didn't work before, like sum(int8), sum(numeric), several
versions of avg(), and various other functions.
Add some regression tests, as well, testing the general sanity of
these and future catalog entries.
David Rowley, reviewed by Tomas Vondra, with a few further changes
by me.
\gexec executes the just-entered query, like \g, but instead of printing
the results it takes each field as a SQL command to send to the server.
Computing a series of queries to be executed is a fairly common thing,
but up to now you always had to resort to kluges like writing the queries
to a file and then inputting the file. Now it can be done with no
intermediate step.
The implementation is fairly straightforward except for its interaction
with FETCH_COUNT. ExecQueryUsingCursor isn't capable of being called
recursively, and even if it were, its need to create a transaction
block interferes unpleasantly with the desired behavior of \gexec after
a failure of a generated query (i.e., that it can continue). Therefore,
disable use of ExecQueryUsingCursor when doing the master \gexec query.
We can still apply it to individual generated queries, however, and there
might be some value in doing so.
While testing this feature's interaction with single-step mode, I (tgl) was
led to conclude that SendQuery needs to recognize SIGINT (cancel_pressed)
as a negative response to the single-step prompt. Perhaps that's a
back-patchable bug fix, but for now I just included it here.
Corey Huinker, reviewed by Jim Nasby, Daniel Vérité, and myself
When adjusting the estimate for the number of distinct values from a
rel in a grouped query to take into account the selectivity of the
rel's restrictions, use a formula that is less likely to produce
under-estimates.
The old formula simply multiplied the number of distinct values in the
rel by the restriction selectivity, which would be correct if the
restrictions were fully correlated with the grouping expressions, but
can produce significant under-estimates in cases where they are not
well correlated.
The new formula is based on the random selection probability, and so
assumes that the restrictions are not correlated with the grouping
expressions. This is guaranteed to produce larger estimates, and of
course risks over-estimating in cases where the restrictions are
correlated, but that has less severe consequences than
under-estimating, which might lead to a HashAgg that consumes an
excessive amount of memory.
This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.
Author: Tomas Vondra, with some hacking be me
Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane
Discussion: http://www.postgresql.org/message-id/flat/56CD0381.5060502@2ndquadrant.com
In the test_slot_timelines test module, we were abusing passing NULL
values which was received as zeroes in x86, but this breaks in ARM
(buildfarm member hamster) by crashing instead. Fix the breakage by
marking these functions as STRICT; the InvalidXid value that was
previously implicit in NULL values (on x86 at least) can now be passed
as 0. Failing to follow the fmgr protocol to check for NULLs beforehand
was causing ARM to fail, as evidenced by segmentation faults in
buildfarm member hamster.
In order to use the new functionality in the test script, use COALESCE
in the right spot to avoid forwarding NULL values.
This was diagnosed from the hamster crash by Craig Ringer, who also
proposed a different patch (checking for NULL values explicitely in the
C function code, and keeping the non-strictness in the C functions).
I decided to go with this approach instead.
Our actual convention, contrary to what I said in 59a2111b23f, is not to
quote type names, as evidenced by unquoted use of format_type_be()
result value in error messages. Remove quotes from recently tweaked
messages accordingly.
Per note from Tom Lane