If the operator is a strict btree equality operator, and X isn't volatile,
then the clause must yield true for any non-null value of X, or null if X
is null. At top level of a WHERE clause, we can ignore the distinction
between false and null results, so it's valid to simplify the clause to
"X IS NOT NULL". This is a useful improvement mainly because we'll get
a far better selectivity estimate in most cases.
Because such cases seldom arise in well-written queries, it is unappetizing
to expend a lot of planner cycles looking for them ... but it turns out
that there's a place we can shoehorn this in practically for free, because
equivclass.c already has to detect and reject candidate equivalences of the
form X = X. That doesn't catch every place that it would be valid to
simplify to X IS NOT NULL, but it catches the typical case. Working harder
doesn't seem justified.
Patch by me, reviewed by Petr Jelinek
Discussion: https://postgr.es/m/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
Instead of joining two partitioned tables in their entirety we can, if
it is an equi-join on the partition keys, join the matching partitions
individually. This involves teaching the planner about "other join"
rels, which are related to regular join rels in the same way that
other member rels are related to baserels. This can use significantly
more CPU time and memory than regular join planning, because there may
now be a set of "other" rels not only for every base relation but also
for every join relation. In most practical cases, this probably
shouldn't be a problem, because (1) it's probably unusual to join many
tables, each with many partitions, using the partition keys for all
joins, (2) if you do hit that scenario, you probably have a big
enough machine to handle the increased memory cost of planning, and (3)
the resulting plan is highly likely to be better, so what you spend in
planning you'll make up on the execution side. All the same, for now,
turn this feature off by default.
Currently, we can only perform joins between two tables whose
partitioning schemes are absolutely identical. It would be nice to
cope with other scenarios, such as extra partitions on one side or the
other with no match on the other side, but that will have to wait for
a future patch.
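A minimal sketch of the intended scenario (the table definitions and the
name of the enabling GUC here are assumptions, not part of this commit):
    CREATE TABLE t1 (a int, b text) PARTITION BY RANGE (a);
    CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES FROM (0) TO (100);
    CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES FROM (100) TO (200);
    CREATE TABLE t2 (a int, c text) PARTITION BY RANGE (a);
    CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES FROM (0) TO (100);
    CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES FROM (100) TO (200);
    SET enable_partition_wise_join = on;  -- assumed GUC name; off by default
    -- Identical partitioning plus an equi-join on the partition key allows
    -- joining t1_p1 to t2_p1 and t1_p2 to t2_p2 individually:
    SELECT * FROM t1 JOIN t2 USING (a);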
Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit
Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit
Khandekar, and by me. A few final adjustments by me.
Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com
Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.
Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.
Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call them with the top-parent
relids to determine whether a child join is allowable.
Ashutosh Bapat. Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.
Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
Currently, child relations are always base relations, so when we
translate parent relids to child relids, we only need to translate
a single relid. However, the proposed partition-wise join feature
will create child joins, which will mean we need to translate a set
of parent relids to the corresponding child relids. This is
preliminary refactoring to make that possible.
Ashutosh Bapat. Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me. Some adjustments, mostly
cosmetic, by me.
Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4d wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false. predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms. Add one.
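To illustrate the difference in null semantics (hypothetical table):
    CREATE TABLE t (x int CHECK (x > 0));
    INSERT INTO t VALUES (NULL);   -- accepted: in CHECK, a null result means true
    SELECT * FROM t WHERE x > 0;   -- excludes that row: in WHERE, null means false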
Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped. Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.
Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.
Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
I'd always assumed that backend/optimizer/geqo/'s remarkably poor
showing on code coverage metrics was because we weren't exercising
it much in the regression tests. But it turns out that a good chunk
of the problem is that there's a bunch of code that is physically
unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
but is being built anyway. Making the called code have #if guards
similar to the calling code saves a couple of kilobytes of executable
size and should make the coverage numbers more reflective of reality.
It's arguable that we should just delete all the unused recombination
mechanisms altogether, but I didn't feel a need to go that far today.
If the inner relation can be proven unique, that is, it can have no more
than one matching row for any row of the outer query, then we might as
well implement the semijoin as a plain inner join, allowing substantially
more freedom to the planner. This is a form of outer join strength
reduction, but it can't be implemented in reduce_outer_joins() because
we don't have enough info about the individual relations at that stage.
Instead do it much like remove_useless_joins(): once we've built base
relations, we can make another pass over the SpecialJoinInfo list and
get rid of any entries representing reducible semijoins.
This is essentially a followon to the inner-unique patch (commit 9c7f5229a)
and makes use of the proof machinery that that patch created. We need only
minor refactoring of innerrel_is_unique's API to support this usage.
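For example (hypothetical schema; b.id is a primary key, so the inner side
of the semijoin is provably unique):
    CREATE TABLE a (b_id int);
    CREATE TABLE b (id int PRIMARY KEY);
    -- The semijoin implied by IN can now be implemented as a plain inner
    -- join, since at most one row of b can match any given row of a:
    SELECT * FROM a WHERE a.b_id IN (SELECT id FROM b);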
Per performance complaint from Teodor Sigaev.
Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row. This saves useless scanning in nestloop
and hash joins. In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.
Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses. This could be improved later.
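A sketch of a query shape that benefits (schema hypothetical):
    CREATE TABLE orders (id int PRIMARY KEY, total numeric);
    CREATE TABLE order_lines (order_id int, qty int);
    -- The join clause matches orders' primary key, so each outer row has at
    -- most one match and the executor can stop scanning the inner side as
    -- soon as that match is found:
    SELECT * FROM order_lines l JOIN orders o ON o.id = l.order_id;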
David Rowley, reviewed and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
Follow-on patch in the multivariate statistics patch series.
CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
Commit 7b504eb282 added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).
Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution. At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs. The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.
An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.
Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.
The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.
An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement. No tests previously covered that
possibility, so one is added.
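For instance, a (hypothetical) statement like the following is now rejected
with a more informative error message:
    WITH cte AS (SELECT 1 AS x)
    UPDATE cte SET x = 2;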
Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.
If the compiler supports typeof or something similar, cast the result to
the input type. This creates a greater amount of type safety. In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
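For example (hypothetical table t), a grouping sets query like
    SELECT a, b, count(*)
    FROM t
    GROUP BY GROUPING SETS ((a), (b), (a, b));
may now be executed with a MixedAggregate node that hashes some of the
grouping sets while producing others from sorted input, subject to the
work_mem limit.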
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
Partitioned tables do not contain any data; only their unpartitioned
descendants need to be scanned. However, the partitioned tables still
need to be locked, even though they're not scanned. To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.
Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages. First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off. Second, it paves the way to getting rid of
the storage for partitioned tables altogether.
Amit Langote, reviewed by me.
Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
Commit 51ee6f3160 accidentally changed
the behavior around inheritance hierarchies; before, we always
considered parallel paths even for very small inheritance children,
because otherwise an inheritance hierarchy with even one small child
wouldn't be eligible for parallelism. That exception was inadvertently
removed; put it back.
In passing, also adjust the degree-of-parallelism computation for
index-only scans not to consider the number of heap pages fetched.
Otherwise, we'll avoid parallel index-only scans on tables that are
mostly all-visible, which isn't especially logical.
Robert Haas and Amit Kapila, per a report from Ashutosh Sharma.
Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
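A sketch of the kind of query that benefits (table t is hypothetical; assume
each worker can produce its share of rows already sorted by a, e.g. via a
parallel index scan or a per-worker Sort below the Gather Merge):
    SELECT * FROM t WHERE b > 100 ORDER BY a;
The leader then merges the workers' sorted streams rather than re-sorting
the combined output.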
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
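A sketch of a query shape that can use this (schema and indexes are
hypothetical):
    -- Assuming btree indexes on t(a) and t(b): one process builds the bitmap
    -- from the indexes, then all cooperating workers jointly scan the heap
    -- blocks it identifies.
    SELECT count(*) FROM t WHERE a < 1000 OR b > 9000;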
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
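A small usage example (the table, its xml column, and the column definitions
are illustrative only):
    SELECT xt.*
    FROM xmldata d,
         XMLTABLE('/rows/row' PASSING d.doc
                  COLUMNS id int PATH '@id',
                          name text PATH 'name' NOT NULL) AS xt;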
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.
Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.
Patch by me, reviewed by Dilip Kumar.
Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
In combination with 569174f1be, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
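For example, the two thresholds can now be tuned independently (the values
shown are arbitrary):
    SET min_parallel_table_scan_size = '8MB';    -- table pages to be scanned
    SET min_parallel_index_scan_size = '512kB';  -- index pages to be scanned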
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
Currently, we only need this logic in order to cost a Bitmap Heap
Scan. But a pending patch for Parallel Bitmap Heap Scan also uses
it to help figure out how many workers to use for the scan, which
has to be determined prior to costing. So, move the logic to
a separate function to make that easier.
Dilip Kumar. The patch series of which this is a part has been
reviewed by Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, and me; it is not clear from the email
discussion which of those people have looked specifically at this
part.
Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com
Evaluation of set returning functions (SRFs) in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would have allowed retaining the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs reach the end of
their input at the same time), we decided instead to return rows only until
all SRFs are exhausted, returning NULL for any that are already exhausted. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
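For example:
    SELECT generate_series(1,3), generate_series(1,2);
    -- old behavior: 6 rows (least common multiple of the two SRF periods)
    -- new behavior: 3 rows, with the second column reading 1, 2, NULL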
As a side effect, the previously prohibited case of multiple set-returning
arguments to a function is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.
Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of now-superfluous code dealing with set-returning expressions
lying around. But as the changes to get rid of it are verbose and largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
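A sketch of the situation being addressed (the table, policy, and function
are hypothetical):
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
    CREATE POLICY p ON accounts USING (owner = current_user);
    -- f() is a user-defined, potentially leaky function: it must not run
    -- before the policy qual above.  A leakproof qual, by contrast, could
    -- still be chosen for earlier evaluation, e.g. as an indexqual.
    SELECT * FROM accounts WHERE f(balance);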
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass. Add a check to
allow the same type of deduction to be made in this case too.
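For example (hypothetical table with a boolean column "flag"):
    CREATE INDEX ON t (flag, created_at);
    -- "flag = true" is canonicalized to just "flag"; the planner can now
    -- recognize that the first index column is effectively constant and use
    -- the index's ordering to satisfy the ORDER BY:
    SELECT * FROM t WHERE flag ORDER BY created_at LIMIT 10;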
Per a complaint from Dima Pavlov. Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us
We need to scan the whole parse tree for parallel-unsafe functions.
If there are none, we'll later need to determine whether particular
subtrees contain any parallel-restricted functions. The previous coding
retained no knowledge from the first scan, even though this is very
wasteful in the common case where the query contains only parallel-safe
functions. We can bypass all of the later scans by remembering that fact.
This provides a small but measurable speed improvement when the case
applies, and shouldn't cost anything when it doesn't.
Patch by me, reviewed by Robert Haas
Discussion: <3740.1471538387@sss.pgh.pa.us>
It's rather silly to make a separate pass over the tlist + HAVING qual,
and a separate set of visits to the syscache, when get_agg_clause_costs
already has all the required information in hand. This nets out as less
code as well as fewer cycles.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.
Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are). By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.
While at it, get rid of Aggref.aggoutputtype. That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.
Assorted comment cleanup as well (there's still more that I want to do
in that line).
catversion bump for change in Aggref node contents. Should be the last
one for partial-aggregation changes.
Discussion: <29309.1466699160@sss.pgh.pa.us>
Commit e06a38965's original coding for constructing the execution-time
expression tree for a combining aggregate was rather messy, involving
duplicating quite a lot of code in setrefs.c so that it could inject
a nonstandard matching rule for Aggrefs. Get rid of that in favor of
explicitly constructing a combining Aggref with a partial Aggref as input,
then allowing setrefs.c's normal matching logic to match the partial Aggref
to the output of the lower plan node and hence replace it with a Var.
In passing, rename and redocument make_partialgroup_input_target to have
some connection to what it actually does.
The original upper-planner-pathification design (commit 3fc6e2d7f5)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting). We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6. So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't. This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.
Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node. Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that. That was uncertainly correct and
definitely ugly/unmaintainable.
To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case. apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path. (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node. We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)
create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate. But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.
I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed. This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.
In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150. I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't. For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080. It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel. Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.
There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well. We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.
Tomas Vondra and Tom Lane
Discussion: <31041.1465069446@sss.pgh.pa.us>
Commit 04ae11f62e removed some broken
code to apply the scan/join target to partial paths, but its theory
that this processing step is totally unnecessary turns out to be wrong.
Put similar code back again, but this time, check for parallel-safety
and avoid in-place modifications to paths that may already have been
used as part of some other path.
(This is not an entirely elegant solution to this problem; it might
be better, for example, to postpone generate_gather_paths for the
topmost scan/join rel until after the scan/join target has been
applied. But this is not the time for such redesign work.)
Amit Kapila and Robert Haas
The main point of doing this is to allow the cutoff to be set very small,
even zero, to allow parallel-query behavior to be tested on relatively
small tables such as we typically use in the regression tests. But it
might be of use to users too. The number-of-workers scaling behavior in
create_plain_partial_paths() is pretty ad-hoc and subject to change, so
we won't expose anything about that, but the notion of not considering
parallel query at all for tables below size X seems reasonably stable.
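For instance, to let the planner consider parallel plans even for the tiny
tables used in the regression tests:
    SET min_parallel_relation_size = 0;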
Amit Kapila, per a suggestion from me
Discussion: <17170.1465830165@sss.pgh.pa.us>
As noted by Andres Freund, we'd accumulated quite a few similar functions
in clauses.c that examine all functions in an expression tree to see if
they satisfy some boolean test. Reduce the duplication by inventing a
function check_functions_in_node() that applies a simple callback function
to each SQL function OID appearing in a given expression node. This also
fixes some arguable oversights; for example, contain_mutable_functions()
did not check aggregate or window functions for mutability. I doubt that
that represents a live bug at the moment, because we don't really consider
mutability for aggregates; but it might someday be one.
I chose to put check_functions_in_node() in nodeFuncs.c because it seemed
like other modules might wish to use it in future. That in turn forced
moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for
nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean.
In passing, teach contain_leaked_vars_walker() about a few more expression
node types it can safely look through, and improve the rather messy and
undercommented code in has_parallel_hazard_walker().
Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
This terminology provoked widespread complaints. So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers. Rename structure members to match.
These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
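Illustrative spellings after the rename:
    SET max_parallel_workers_per_gather = 4;    -- formerly max_parallel_degree
    ALTER TABLE t SET (parallel_workers = 4);   -- formerly parallel_degree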
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb. We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult). The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2. In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now. This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.
Discussion: <20160429102531.GA13701@huehner.biz>
Given a three-or-more-way equivalence class, such as X.X = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is, a plan tree like
    Nested Loop
      -> Scan X
      -> Nested Loop
           -> Scan Y
           -> Scan Z
                Filter: Z.Z = X.X
The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node. On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members. So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there. If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.
This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape. (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.) Many
thanks to Alexander Kirkouski for submitting a reproducible test case.
The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.
The original patch kind of ignored the fact that we were doing something
different from a costing point of view, but nobody noticed. This patch
fixes that oversight.
David Rowley
Per discussion, this gives potential users of the hook more flexibility,
because they can build custom Paths that implement only one stage of
upper processing atop core-provided Paths for earlier stages.
In cases where joins use multiple columns we currently assess each join
separately causing gross mis-estimates for join cardinality.
This patch adds use of FK information for the first time into the
planner. When FKs are present and we have multi-column join information,
plan estimates will be drastically improved. Cases with multiple FKs
are handled, though partial matches are ignored currently.
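A sketch of the kind of multi-column join this improves (schema is
hypothetical):
    CREATE TABLE parent (a int, b int, PRIMARY KEY (a, b));
    CREATE TABLE child  (a int, b int,
                         FOREIGN KEY (a, b) REFERENCES parent (a, b));
    -- Previously the two join clauses were estimated independently and their
    -- selectivities multiplied, grossly underestimating the join size; with
    -- the FK recognized, the estimate is about one parent row per child row.
    SELECT * FROM child c JOIN parent p ON p.a = c.a AND p.b = c.b;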
Net effect is substantial performance improvements for joins in many
common cases. Additional planning time is isolated to cases that are
currently performing poorly, measured at 0.08 - 0.15 ms.
Please watch for planner performance regressions; circumstances seem
unlikely, but the law of unintended consequences may apply at some point.
Additional complex tests welcome to prove this before release.
Tests can be performed by toggling SET enable_fkey_estimates = on | off
with the scripts provided in the Hackers discussion, message id:
552335D9.3090707@2ndquadrant.com
Authors: Tomas Vondra and David Rowley
Reviewed and tested by Simon Riggs, adding comments only
Previously, the planner would reject an index-only scan if any restriction
clause for its table used a column not available from the index, even
if that restriction clause would later be dropped from the plan entirely
because it's implied by the index's predicate. This is a fairly common
situation for partial indexes because predicates using columns not included
in the index are often the most useful kind of predicate, and we have to
duplicate (or at least imply) the predicate in the WHERE clause in order
to get the index to be considered at all. So index-only scans were
essentially unavailable with such partial indexes.
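For example (hypothetical partial index):
    CREATE INDEX t_a_partial ON t (a) WHERE b = 42;
    -- "b = 42" is implied by the index predicate and will be dropped from
    -- the plan, so an index-only scan is now possible even though b is not
    -- stored in the index:
    SELECT a FROM t WHERE b = 42;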
To fix, we have to do detection of implied-by-predicate clauses much
earlier in the planner. This patch puts it in check_index_predicates
(nee check_partial_indexes), meaning it gets done for every partial index,
whereas we previously only considered this issue at createplan time,
so that the work was only done for an index actually selected for use.
That could result in a noticeable planning slowdown for queries against
tables with many partial indexes. However, testing suggested that there
isn't really a significant cost, especially not with reasonable numbers
of partial indexes. We do get a small additional benefit, which is that
cost_index is more accurate since it correctly discounts the evaluation
cost of clauses that will be removed. We can also avoid considering such
clauses as potential indexquals, which saves useless matching cycles in
the case where the predicate columns aren't in the index, and prevents
generating bogus plans that double-count the clause's selectivity when
the columns are in the index.
Tomas Vondra and Kyotaro Horiguchi, reviewed by Kevin Grittner and
Konstantin Knizhnik, and whacked around a little by me