1
0
mirror of https://github.com/postgres/postgres.git synced 2025-10-31 10:30:33 +03:00
Commit Graph

1127 Commits

Author SHA1 Message Date
Tom Lane
586b98fdf1 Make type "name" collation-aware.
The "name" comparison operators now all support collations, making them
functionally equivalent to "text" comparisons, except for the different
physical representation of the datatype.  They do, in fact, mostly share
the varstr_cmp and varstr_sortsupport infrastructure, which has been
slightly enlarged to handle the case.

To avoid changes in the default behavior of the datatype, set name's
typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
by default comparisons to a name value will continue to use strcmp
semantics.  (This would have been the case for system catalog columns
anyway, because of commit 6b0faf723, but doing this makes it true for
user-created name columns as well.  In particular, this avoids
locale-dependent changes in our regression test results.)

In consequence, tweak a couple of places that made assumptions about
collatable base types always having typcollation DEFAULT_COLLATION_OID.
I have not, however, attempted to relax the restriction that user-
defined collatable types must have that.  Hence, "name" doesn't
behave quite like a user-defined type; it acts more like a domain
with COLLATE "C".  (Conceivably, if we ever get rid of the need for
catalog name columns to be fixed-length, "name" could actually become
such a domain over text.  But that'd be a pretty massive undertaking,
and I'm not volunteering.)

Discussion: https://postgr.es/m/15938.1544377821@sss.pgh.pa.us
2018-12-19 17:46:25 -05:00
Tom Lane
001bb9f3ed Add stack depth checks to key recursive functions in backend/nodes/*.c.
Although copyfuncs.c has a check_stack_depth call in its recursion,
equalfuncs.c, outfuncs.c, and readfuncs.c lacked one.  This seems
unwise.

Likewise fix planstate_tree_walker(), in branches where that exists.

Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us
2018-12-10 11:12:43 -05:00
Alvaro Herrera
705d433fd5 Revise attribute handling code on partition creation
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.

This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition.  One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent.  That would amount to a behavioral
change that's better not back-patched.

This rewrite makes the code simpler:

1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;

2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly.  This works
because we know the list in the parent is already correct and there can
only be one parent.

This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.

This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched.  No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.

Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code.  Ashutosh Bapat reviewed Amit's fix.  Amit
reviewed mine.

Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/152746742177.1291.9847032632907407358@wrigleys.postgresql.org
2018-11-08 16:22:09 -03:00
Alvaro Herrera
c7d43c4d8a Correct attach/detach logic for FKs in partitions
There was no code to handle foreign key constraints on partitioned
tables in the case of ALTER TABLE DETACH; and if you happened to ATTACH
a partition that already had an equivalent constraint, that one was
ignored and a new constraint was created.  Adding this to the fact that
foreign key cloning reuses the constraint name on the partition instead
of generating a new name (as it probably should, to cater to SQL
standard rules about constraint naming within schemas), the result was a
pretty poor user experience -- the most visible failure was that just
detaching a partition and re-attaching it failed with an error such as

  ERROR:  duplicate key value violates unique constraint "pg_constraint_conrelid_contypid_conname_index"
  DETAIL:  Key (conrelid, contypid, conname)=(26702, 0, test_result_asset_id_fkey) already exists.

because it would try to create an identically-named constraint in the
partition.  To make matters worse, if you tried to drop the constraint
in the now-independent partition, that would fail because the constraint
was still seen as dependent on the constraint in its former parent
partitioned table:
  ERROR:  cannot drop inherited constraint "test_result_asset_id_fkey" of relation "test_result_cbsystem_0001_0050_monthly_2018_09"

This fix attacks the problem from two angles: first, when the partition
is detached, the constraint is also marked as independent, so the drop
now works.  Second, when the partition is re-attached, we scan existing
constraints searching for one matching the FK in the parent, and if one
exists, we link that one to the parent constraint.  So we don't end up
with a duplicate -- and better yet, we don't need to scan the referenced
table to verify that the constraint holds.

To implement this I made a small change to previously planner-only
struct ForeignKeyCacheInfo to contain the constraint OID; also relcache
now maintains the list of FKs for partitioned tables too.

Backpatch to 11.

Reported-by: Michael Vitale (bug #15425)
Discussion: https://postgr.es/m/15425-2dbc9d2aa999f816@postgresql.org
2018-10-12 12:37:37 -03:00
Tom Lane
52ed730d51 Remove some unnecessary fields from Plan trees.
In the wake of commit f2343653f, we no longer need some fields that
were used before to control executor lock acquisitions:

* PlannedStmt.nonleafResultRelations can go away entirely.

* partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
However, ModifyTable still needs to know the RT index of the partition
root table if any, which was formerly kept in the first entry of that
list.  Add a new field "rootRelation" to remember that.  rootRelation is
partly redundant with nominalRelation, in that if it's set it will have
the same value as nominalRelation.  However, the latter field has a
different purpose so it seems best to keep them distinct.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-07 14:33:17 -04:00
Tom Lane
9ddef36278 Centralize executor's opening/closing of Relations for rangetable entries.
Create an array estate->es_relations[] paralleling the es_range_table,
and store references to Relations (relcache entries) there, so that any
given RT entry is opened and closed just once per executor run.  Scan
nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation
is no more; relation closing is now done centrally in ExecEndPlan.

This is slightly more complex than one would expect because of the
interactions with relcache references held in ResultRelInfo nodes.
The general convention is now that ResultRelInfo->ri_RelationDesc does
not represent a separate relcache reference and so does not need to be
explicitly closed; but there is an exception for ResultRelInfos in the
es_trig_target_relations list, which are manufactured by
ExecGetTriggerResultRel and have to be cleaned up by
ExecCleanUpTriggerState.  (That much was true all along, but these
ResultRelInfos are now more different from others than they used to be.)

To allow the partition pruning logic to make use of es_relations[] rather
than having its own relcache references, adjust PartitionedRelPruneInfo
to store an RT index rather than a relation OID.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
some mods by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-10-04 14:03:42 -04:00
Tom Lane
fdba460a26 Create an RTE field to record the query's lock mode for each relation.
Add RangeTblEntry.rellockmode, which records the appropriate lock mode for
each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock,
or RowExclusiveLock depending on the RTE's role in the query).

This patch creates the field and makes all creators of RTE nodes fill it
in reasonably, but for the moment nothing much is done with it.  The plan
is to replace assorted post-parser logic that re-determines the right
lockmode to use with simple uses of rte->rellockmode.  For now, just add
Asserts in each of those places that the rellockmode matches what they are
computing today.  (In some cases the match isn't perfect, so the Asserts
are weaker than you might expect; but this seems OK, as per discussion.)

This passes check-world for me, but it seems worth pushing in this state
to see if the buildfarm finds any problems in cases I failed to test.

catversion bump due to change of stored rules.

Amit Langote, reviewed by David Rowley and Jesper Pedersen,
and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
2018-09-30 13:55:51 -04:00
Tom Lane
d0cfc3d6a4 Add a debugging option to stress-test outfuncs.c and readfuncs.c.
In the normal course of operation, query trees will be serialized only if
they are stored as views or rules; and plan trees will be serialized only
if they get passed to parallel-query workers.  This leaves an awful lot of
opportunity for bugs/oversights to not get detected, as indeed we've just
been reminded of the hard way.

To improve matters, this patch adds a new compile option
WRITE_READ_PARSE_PLAN_TREES, which is modeled on the longstanding option
COPY_PARSE_PLAN_TREES; but instead of passing all parse and plan trees
through copyObject, it passes them through nodeToString + stringToNode.
Enabling this option in a buildfarm animal or two will catch problems
at least for cases that are exercised by the regression tests.

A small problem with this idea is that readfuncs.c historically has
discarded location fields, on the reasonable grounds that parse
locations in a retrieved view are not relevant to the current query.
But doing that in WRITE_READ_PARSE_PLAN_TREES breaks pg_stat_statements,
and it could cause problems for future improvements that might try to
report error locations at runtime.  To fix that, provide a variant
behavior in readfuncs.c that makes it restore location fields when
told to.

In passing, const-ify the string arguments of stringToNode and its
subsidiary functions, just because it annoyed me that they weren't
const already.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 17:11:54 -04:00
Tom Lane
db1071d4ee Fix some minor issues exposed by outfuncs/readfuncs testing.
A test patch to pass parse and plan trees through outfuncs + readfuncs
exposed several issues that need to be fixed to get clean matches:

Query.withCheckOptions failed to get copied; it's intentionally ignored
by outfuncs/readfuncs on the grounds that it'd always be NIL anyway in
stored rules.  This seems less than future-proof, and it's not even
saving very much, so just undo the decision and treat the field like
all others.

Several places that convert a view RTE into a subquery RTE, or similar
manipulations, failed to clear out fields that were specific to the
original RTE type and should be zero in a subquery RTE.  Since readfuncs.c
will leave such fields as zero, equalfuncs.c thinks the nodes are different
leading to a reported mismatch.  It seems like a good idea to clear out the
no-longer-needed fields, even though in principle nothing should look at
them; the node ought to be indistinguishable from how it would look if
we'd built a new node instead of scribbling on the old one.

BuildOnConflictExcludedTargetlist randomly set the resname of some
TargetEntries to "" not NULL.  outfuncs/readfuncs don't distinguish those
cases, and so the string will read back in as NULL ... but equalfuncs.c
does distinguish.  Perhaps we ought to try to make things more consistent
in this area --- but it's just useless extra code space for
BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for
now by making it do that.

catversion bumped because the change in handling of Query.withCheckOptions
affects stored rules.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 15:08:28 -04:00
Tom Lane
09991e5a47 Fix some probably-minor oversights in readfuncs.c.
The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and
colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't
restore them.  This doesn't cause obvious failures, because the only things
that look at those fields are expandRTE() and get_rte_attribute_type(),
which are mostly used during parse analysis, before anything would've
passed the parsetree through outfuncs/readfuncs.  But expandRTE() is used
in build_physical_tlist(), which means that that function will return a
wrong answer for a TABLEFUNC RTE that came from a view.  Very accidentally,
this doesn't cause serious problems, because what it will return is NIL
which callers will interpret as "couldn't build a physical tlist because
of dropped columns".  So you still get a plan that works, though it's
marginally less efficient than it could be.  There are also some other
expandRTE() calls associated with transformation of whole-row Vars in
the planner.  I have been unable to exhibit misbehavior from that, and
it may be unreachable in any case that anyone would care about ... but
I'm not entirely convinced, so this seems like something we should back-
patch a fix for.  Fortunately, we can fix it without forcing a change
of stored rules and a catversion bump, because we can just copy these
lists from the subsidiary TableFunc object.

readfuncs.c was also missing support for NamedTuplestoreScan plan nodes.
This accidentally fails to break parallel query because a query using
a named tuplestore would never be considered parallel-safe anyway.
However, project policy since parallel query came in is that all plan
node types should have outfuncs/readfuncs support, so this is clearly
an oversight that should be repaired.

Noted while fooling around with a patch to test outfuncs/readfuncs more
thoroughly.  That exposed some other issues too, but these are the only
ones that seem worth back-patching.

Back-patch to v10 where both of these features came in.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us
2018-09-18 13:02:27 -04:00
Tom Lane
3844adbf3c Add outfuncs.c support for RawStmt nodes.
I noticed while poking at a report from Andrey Lepikhov that the
recent addition of RawStmt nodes at the top of raw parse trees
makes it impossible to print any raw parse trees whatsoever,
because outfuncs.c doesn't know RawStmt and hence fails to descend
into it.

While we generally lack outfuncs.c support for utility statements,
there is reasonably complete support for what you can find in a
raw SELECT statement.  It was not my intention to make that all
dead code ... so let's add support for RawStmt.

Back-patch to v10 where RawStmt appeared.
2018-09-16 13:02:47 -04:00
Andrew Gierth
728202b63c Order active window clauses for greater reuse of Sort nodes.
By sorting the active window list lexicographically by the sort clause
list but putting longer clauses before shorter prefixes, we generate
more chances to elide Sort nodes when building the path.

Author: Daniel Gustafsson (with some editorialization by me)
Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
2018-09-14 17:35:42 +01:00
Etsuro Fujita
7cfdc77023 Disable support for partitionwise joins in problematic cases.
Commit f49842d, which added support for partitionwise joins, built the
child's tlist by applying adjust_appendrel_attrs() to the parent's.  So in
the case where the parent's included a whole-row Var for the parent, the
child's contained a ConvertRowtypeExpr.  To cope with that, that commit
added code to the planner, such as setrefs.c, but some code paths still
assumed that the tlist for a scan (or join) rel would only include Vars
and PlaceHolderVars, which was true before that commit, causing errors:

* When creating an explicit sort node for an input path for a mergejoin
  path for a child join, prepare_sort_from_pathkeys() threw the 'could not
  find pathkey item to sort' error.
* When deparsing a relation participating in a pushed down child join as a
  subquery in contrib/postgres_fdw, get_relation_column_alias_ids() threw
  the 'unexpected expression in subquery output' error.
* When performing set_plan_references() on a local join plan generated by
  contrib/postgres_fdw for EvalPlanQual support for a pushed down child
  join, fix_join_expr() threw the 'variable not found in subplan target
  lists' error.

To fix these, two approaches have been proposed: one by Ashutosh Bapat and
one by me.  While the former keeps building the child's tlist with a
ConvertRowtypeExpr, the latter builds it with a whole-row Var for the
child not to violate the planner assumption, and tries to fix it up later,
But both approaches need more work, so refuse to generate partitionwise
join paths when whole-row Vars are involved, instead.  We don't need to
handle ConvertRowtypeExprs in the child's tlists for now, so this commit
also removes the changes to the planner.

Previously, partitionwise join computed attr_needed data for each child
separately, and built the child join's tlist using that data, which also
required an extra step for adding PlaceHolderVars to that tlist, but it
would be more efficient to build it from the parent join's tlist through
the adjust_appendrel_attrs() transformation.  So this commit builds that
list that way, and simplifies build_joinrel_tlist() and placeholder.c as
well as part of set_append_rel_size() to basically what they were before
partitionwise join went in.

Back-patch to PG11 where partitionwise join was introduced.

Report by Rajkumar Raghuwanshi.  Analysis by Ashutosh Bapat, who also
provided some of regression tests.  Patch by me, reviewed by Robert Haas.

Discussion: https://postgr.es/m/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg@mail.gmail.com
2018-08-31 20:34:06 +09:00
Tom Lane
1c2cb2744b Fix run-time partition pruning for appends with multiple source rels.
The previous coding here supposed that if run-time partitioning applied to
a particular Append/MergeAppend plan, then all child plans of that node
must be members of a single partitioning hierarchy.  This is totally wrong,
since an Append could be formed from a UNION ALL: we could have multiple
hierarchies sharing the same Append, or child plans that aren't part of any
hierarchy.

To fix, restructure the related plan-time and execution-time data
structures so that we can have a separate list or array for each
partitioning hierarchy.  Also track subplans that are not part of any
hierarchy, and make sure they don't get pruned.

Per reports from Phil Florent and others.  Back-patch to v11, since
the bug originated there.

David Rowley, with a lot of cosmetic adjustments by me; thanks also
to Amit Langote for review.

Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com
2018-08-01 19:42:52 -04:00
Tom Lane
6574f19127 Remove dead code left behind by 1b6801051. 2018-07-30 19:11:02 -04:00
Alvaro Herrera
1b68010518 Change bms_add_range to be a no-op for empty ranges
In commit 84940644de, bms_add_range was added with an API to fail with
an error if an empty range was specified.  This seems arbitrary and
unhelpful, so turn that case into a no-op instead.  Callers that require
further verification on the arguments or result can apply them by
themselves.

This fixes the bug that partition pruning throws an API error for a case
involving the default partition of a default partition, as in the
included test case.

Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/16590.1532622503@sss.pgh.pa.us
2018-07-30 18:44:33 -04:00
Michael Paquier
9ebe0572ce Refactor cluster_rel() to handle more options
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM.  As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.

This only reworks the API and its callers, without providing anything
user-facing.  Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions.  This could be used as well as a base to extend the
grammar of CLUSTER later on.

Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
2018-07-24 11:37:32 +09:00
Heikki Linnakangas
5220bb7533 Expand run-time partition pruning to work with MergeAppend
This expands the support for the run-time partition pruning which was added
for Append in 499be013de to also allow unneeded subnodes of a MergeAppend
to be removed.

Author: David Rowley
Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com
2018-07-19 13:49:43 +03:00
Tom Lane
ff4f889164 Fix bugs with degenerate window ORDER BY clauses in GROUPS/RANGE mode.
nodeWindowAgg.c failed to cope with the possibility that no ordering
columns are defined in the window frame for GROUPS mode or RANGE OFFSET
mode, leading to assertion failures or odd errors, as reported by Masahiko
Sawada and Lukas Eder.  In RANGE OFFSET mode, an ordering column is really
required, so add an Assert about that.  In GROUPS mode, the code would
work, except that the node initialization code wasn't in sync with the
execution code about when to set up tuplestore read pointers and spare
slots.  Fix the latter for consistency's sake (even though I think the
changes described below make the out-of-sync cases unreachable for now).

Per SQL spec, a single ordering column is required for RANGE OFFSET mode,
and at least one ordering column is required for GROUPS mode.  The parser
enforced the former but not the latter; add a check for that.

We were able to reach the no-ordering-column cases even with fully spec
compliant queries, though, because the planner would drop partitioning
and ordering columns from the generated plan if they were redundant with
earlier columns according to the redundant-pathkey logic, for instance
"PARTITION BY x ORDER BY y" in the presence of a "WHERE x=y" qual.
While in principle that's an optimization that could save some pointless
comparisons at runtime, it seems unlikely to be meaningful in the real
world.  I think this behavior was not so much an intentional optimization
as a side-effect of an ancient decision to construct the plan node's
ordering-column info by reverse-engineering the PathKeys of the input
path.  If we give up redundant-column removal then it takes very little
code to generate the plan node info directly from the WindowClause,
ensuring that we have the expected number of ordering columns in all
cases.  (If anyone does complain about this, the planner could perhaps
be taught to remove redundant columns only when it's safe to do so,
ie *not* in RANGE OFFSET mode.  But I doubt anyone ever will.)

With these changes, the WindowAggPath.winpathkeys field is not used for
anything anymore, so remove it.

The test cases added here are not actually very interesting given the
removal of the redundant-column-removal logic, but they would represent
important corner cases if anyone ever tries to put that back.

Tom Lane and Masahiko Sawada.  Back-patch to v11 where RANGE OFFSET
and GROUPS modes were added.

Discussion: https://postgr.es/m/CAD21AoDrWqycq-w_+Bx1cjc+YUhZ11XTj9rfxNiNDojjBx8Fjw@mail.gmail.com
Discussion: https://postgr.es/m/153086788677.17476.8002640580496698831@wrigleys.postgresql.org
2018-07-11 12:07:20 -04:00
Tom Lane
321f648a31 Assorted cosmetic cleanup of run-time-partition-pruning code.
Use "subplan" rather than "subnode" to refer to the child plans of
a partitioning Append; this seems a bit more specific and hence
clearer.  Improve assorted comments.  No non-cosmetic changes.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 18:24:34 -04:00
Tom Lane
939449de0e Relocate partition pruning structs to a saner place.
These struct definitions were originally dropped into primnodes.h,
which is a poor choice since that's mainly intended for primitive
expression node types; these are not in that category.  What they
are is auxiliary info in Plan trees, so move them to plannodes.h.

For consistency, also relocate some related code that was apparently
placed with the aid of a dartboard.

There's no interesting code changes in this commit, just reshuffling.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 16:30:14 -04:00
Tom Lane
73b7f48f78 Improve run-time partition pruning to handle any stable expression.
The initial coding of the run-time-pruning feature only coped with cases
where the partition key(s) are compared to Params.  That is a bit silly;
we can allow it to work with any non-Var-containing stable expression, as
long as we take special care with expressions containing PARAM_EXEC Params.
The code is hardly any longer this way, and it's considerably clearer
(IMO at least).  Per gripe from Pavel Stehule.

David Rowley, whacked around a bit by me

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com
2018-06-10 15:22:32 -04:00
Noah Misch
ef31095000 Reconcile nodes/*funcs.c with PostgreSQL 11 work.
This covers new fields in two outfuncs.c functions having no readfuncs.c
counterpart.  Thus, this changes only debugging output.
2018-05-31 16:07:13 -07:00
Tom Lane
cfffe83ba8 Fix incorrect field type for PlannedStmt.jitFlags in outfuncs/readfuncs.
This field was a bool at one point, but now it's an int.
Spotted by Hari Babu; trivial patch is by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAJrrPGedKiFE2fqntSauUfhapCksOJzam+QtHfSgx86LhXLeOQ@mail.gmail.com
2018-04-28 16:46:24 -04:00
Tom Lane
bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
Alvaro Herrera
055fb8d33d Add GUC enable_partition_pruning
This controls both plan-time and execution-time new-style partition
pruning.  While finer-grain control is possible (maybe using an enum GUC
instead of boolean), there doesn't seem to be much need for that.

This new parameter controls partition pruning for all queries:
trivially, SELECT queries that affect partitioned tables are naturally
under its control since they are using the new technology.  However,
while UPDATE/DELETE queries do not use the new code, we make the new GUC
control their behavior also (stealing control from
constraint_exclusion), because it is more natural, and it leads to a
more natural transition to the future in which those queries will also
use the new pruning code.

Constraint exclusion still controls pruning for regular inheritance
situations (those not involving partitioned tables).

Author: David Rowley
Review: Amit Langote, Ashutosh Bapat, Justin Pryzby, David G. Johnston
Discussion: https://postgr.es/m/CAKJS1f_0HwsxJG9m+nzU+CizxSdGtfe6iF_ykPYBiYft302DCw@mail.gmail.com
2018-04-23 17:57:43 -03:00
Teodor Sigaev
a5ab8928d7 Make bms_prev_member work correctly with a 64 bit bitmapword
5c067521 erroneously had coded bms_prev_member assuming that a bitmapword
would always hold 32 bits and started it's search on what it thought was the
highest 8-bits of the word.  This was not the case if bitmapwords were 64
bits.

In passing add a test to exercise this function a little. Previously there was
no coverage at all.

David Rowly
2018-04-23 17:59:17 +03:00
Simon Riggs
08ea7a2291 Revert MERGE patch
This reverts commits d204ef6377,
83454e3c2b and a few more commits thereafter
(complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

 f1464c5380 Improve parse representation for MERGE
 ddb4158579 MERGE syntax diagram correction
 530e69e59b Allow cpluspluscheck to pass by renaming variable
 01b88b4df5 MERGE minor errata
 3af7b2b0d4 MERGE fix variable warning in non-assert builds
 a5d86181ec MERGE INSERT allows only one VALUES clause
 4b2d44031f MERGE post-commit review
 4923550c20 Tab completion for MERGE
 aa3faa3c7a WITH support in MERGE
 83454e3c2b New files for MERGE
 d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier
2018-04-12 11:22:56 +01:00
Alvaro Herrera
499be013de Support partition pruning at execution time
Existing partition pruning is only able to work at plan time, for query
quals that appear in the parsed query.  This is good but limiting, as
there can be parameters that appear later that can be usefully used to
further prune partitions.

This commit adds support for pruning subnodes of Append which cannot
possibly contain any matching tuples, during execution, by evaluating
Params to determine the minimum set of subnodes that can possibly match.
We support more than just simple Params in WHERE clauses. Support
additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the
   join can be used to determine the minimum set of inner side partitions to
   scan.

2. Initplans: Once an initplan has been executed we can then determine which
   partitions match the value from the initplan.

Partition pruning is performed in two ways.  When Params external to the plan
are found to match the partition key we attempt to prune away unneeded Append
subplans during the initialization of the executor.  This allows us to bypass
the initialization of non-matching subplans meaning they won't appear in the
EXPLAIN or EXPLAIN ANALYZE output.

For parameters whose value is only known during the actual execution
then the pruning of these subplans must wait.  Subplans which are
eliminated during this stage of pruning are still visible in the EXPLAIN
output.  In order to determine if pruning has actually taken place, the
EXPLAIN ANALYZE must be viewed.  If a certain Append subplan was never
executed due to the elimination of the partition then the execution
timing area will state "(never executed)".  Whereas, if, for example in
the case of parameterized nested loops, the number of loops stated in
the EXPLAIN ANALYZE output for certain subplans may appear lower than
others due to the subplan having been scanned fewer times.  This is due
to the list of matching subnodes having to be evaluated whenever a
parameter which was found to match the partition key changes.

This commit required some additional infrastructure that permits the
building of a data structure which is able to perform the translation of
the matching partition IDs, as returned by get_matching_partitions, into
the list index of a subpaths list, as exist in node types such as
Append, MergeAppend and ModifyTable.  This allows us to translate a list
of clauses into a Bitmapset of all the subpath indexes which must be
included to satisfy the clause list.

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi,
Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
2018-04-07 17:54:39 -03:00
Alvaro Herrera
5c0675215e Add bms_prev_member function
This works very much like the existing bms_last_member function, only it
traverses through the Bitmapset in the opposite direction from the most
significant bit down to the least significant bit.  A special prevbit value of
-1 may be used to have the function determine the most significant bit.  This
is useful for starting a loop.  When there are no members less than prevbit,
the function returns -2 to indicate there are no more members.

Author: David Rowley
Discussion: https://postgr.es/m/CAKJS1f-K=3d5MDASNYFJpUpc20xcBnAwNC1-AOeunhn0OtkWbQ@mail.gmail.com
2018-04-07 17:54:39 -03:00
Teodor Sigaev
8224de4f42 Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.

Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.

In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.

Bump catalog version

Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
			 David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/56168952.4010101@postgrespro.ru
2018-04-07 23:00:39 +03:00
Alvaro Herrera
9fdb675fc5 Faster partition pruning
Add a new module backend/partitioning/partprune.c, implementing a more
sophisticated algorithm for partition pruning.  The new module uses each
partition's "boundinfo" for pruning instead of constraint exclusion,
based on an idea proposed by Robert Haas of a "pruning program": a list
of steps generated from the query quals which are run iteratively to
obtain a list of partitions that must be scanned in order to satisfy
those quals.

At present, this targets planner-time partition pruning, but there exist
further patches to apply partition pruning at execution time as well.

This commit also moves some definitions from include/catalog/partition.h
to a new file include/partitioning/partbounds.h, in an attempt to
rationalize partitioning related code.

Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen.
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2018-04-06 16:44:05 -03:00
Simon Riggs
f1464c5380 Improve parse representation for MERGE
Separation of parser data structures from executor, as
requested by Tom Lane. Further improvements possible.

While there, implement error for multiple VALUES clauses via parser
to allow line number of error, as requested by Andres Freund.

Author: Pavan Deolasee

Discussion: https://www.postgresql.org/message-id/CABOikdPpqjectFchg0FyTOpsGXyPoqwgC==OLKWuxgBOsrDDZw@mail.gmail.com
2018-04-06 09:38:59 +01:00
Simon Riggs
aa3faa3c7a WITH support in MERGE
Author: Peter Geoghegan
Recursive support removed, no tests
Docs added by me
2018-04-03 12:13:59 +01:00
Simon Riggs
d204ef6377 MERGE SQL Command following SQL:2016
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs

Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
2018-04-03 09:28:16 +01:00
Simon Riggs
7cf8a5c302 Revert "Modified files for MERGE"
This reverts commit 354f13855e.
2018-04-02 21:34:15 +01:00
Simon Riggs
354f13855e Modified files for MERGE 2018-04-02 21:12:47 +01:00
Bruce Momjian
20b4323bd1 C comments: "a" <--> "an" corrections
Reported-by: Michael Paquier, Abhijit Menon-Sen

Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz

Author: Michael Paquier, Abhijit Menon-Sen, me
2018-03-29 15:18:53 -04:00
Andres Freund
cc415a56d0 Basic planner and executor integration for JIT.
This adds simple cost based plan time decision about whether JIT
should be performed. jit_above_cost, jit_optimize_above_cost are
compared with the total cost of a plan, and if the cost is above them
JIT is performed / optimization is performed respectively.

For that PlannedStmt and EState have a jitFlags (es_jit_flags) field
that stores information about what JIT operations should be performed.

EState now also has a new es_jit field, which can store a
JitContext. When there are no errors the context is released in
standard_ExecutorEnd().

It is likely that the default values for jit_[optimize_]above_cost
will need to be adapted further, but in my test these values seem to
work reasonably.

Author: Andres Freund, with feedback by Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-03-22 11:51:58 -07:00
Peter Eisentraut
ec87efde8d Simplify parse representation of savepoint commands
Instead of embedding the savepoint name in a list and then requiring
complex code to unpack it, just add another struct field to store it
directly.

Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
2018-03-16 13:18:06 -04:00
Peter Eisentraut
17bb625017 Move strtoint() to common
Several places used similar code to convert a string to an int, so take
the function that we already had and make it globally available.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-13 10:21:09 -04:00
Peter Eisentraut
6cf86f4354 Change internal integer representation of Value node
A Value node would store an integer as a long.  This causes needless
portability risks, as long can be of varying sizes.  Change it to use
int instead.  All code using this was already careful to only store
32-bit values anyway.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
2018-03-13 09:56:25 -04:00
Alvaro Herrera
5564c11815 Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so.  Patch it up so that it does.  Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.

While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.

Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.

In pg11, comments on statistics objects are cloned too.  In pg10 they
are not, because I (Álvaro) was too coward to change the parse node as
required to support it.  Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason.  Any corresponding user-visible changes (docs) are backpatched,
though.

Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
2018-03-05 19:37:19 -03:00
Peter Eisentraut
76b6aa41f4 Support parameters in CALL
To support parameters in CALL, move the parse analysis of the procedure
and arguments into the global transformation phase, so that the parser
hooks can be applied.  And then at execution time pass the parameters
from ProcessUtility on to ExecuteCallStmt.
2018-02-22 21:36:48 -05:00
Tom Lane
0a459cec96 Support all SQL:2011 options for window frame clauses.
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions.  We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion.  That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value.  Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that.  (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types.  Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011.  As far as I can see, we are now fully
up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
2018-02-07 00:06:56 -05:00
Peter Eisentraut
533c5d8bdd Fix application of identity values in some cases
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN.  To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.

For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence.  The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet.  So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-02-02 14:39:10 -05:00
Tom Lane
4971d2a322 Remove the obsolete WITH clause of CREATE FUNCTION.
This clause was superseded by SQL-standard syntax back in 7.3.
We've kept it around for backwards-compatibility purposes ever since;
but 15 years seems like long enough for that, especially seeing that
there are undocumented weirdnesses in how it interacts with the
SQL-standard syntax for specifying the same options.

Michael Paquier, per an observation by Daniel Gustafsson;
some small cosmetic adjustments to nearby code by me.

Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz
2018-01-26 12:25:44 -05:00
Robert Haas
2f17844104 Allow UPDATE to move rows between partitions.
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint.  In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one.  This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented.  (There is a pending patch to improve the
situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me.  A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
2018-01-19 15:33:06 -05:00
Alvaro Herrera
8b08f7d482 Local partitioned indexes
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.

As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones.  Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped on isolation unless they are detached first.

To support pg_dump'ing these indexes, add commands
    CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
    ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index).  These reconstruct prior
database state exactly.

Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
	Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
2018-01-19 11:49:22 -03:00
Tom Lane
3cb1b2a880 Rewrite list_qsort() to avoid trashing its input list.
The initial implementation of list_qsort(), from commit ab7271677,
re-used the ListCells of the input list while not touching the List
header.  This meant that anybody who still had a pointer to the
original header would now be in possession of a corrupted list,
a problem that seems sure to bite us eventually.

One possible solution is to re-use the original List header as well,
giving the function the semantics of update-in-place.  However, that
doesn't seem like a very good idea either given the way that the
function is used in the planner: create_path functions aren't normally
supposed to modify their input lists.  It doesn't look like there would
be a problem today, but it's not hard to foresee a time when modifying
a list of Paths in-place could have side-effects on some other append
path.

On the whole, and in view of the likelihood that this function might
be used in other contexts in the future, it seems best to get rid of
the micro-optimization of re-using the input list cells.  Just build
a new list.

Discussion: https://postgr.es/m/16912.1515449066@sss.pgh.pa.us
2018-01-09 13:25:53 -05:00