One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
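For illustration, a sketch of the hazard (not the committed regression
test; names are invented):

    create table parent (f1 int);
    create table child (check (f1 = 1 or null)) inherits (parent);
    insert into child values (2);  -- passes: (false or null) is not false
    -- The buggy canonicalization treated the constant NULL as FALSE,
    -- reducing the constraint to "f1 = 1", so constraint exclusion
    -- could wrongly skip child here and miss the row:
    select * from parent where f1 = 2;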
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
Commit b08df9cab left things rather poorly documented as far as the
exact semantics of "clause_is_check" mode went. Also, that mode did
not really work correctly for predicate_refuted_by; although given the
lack of specification as to what it should do, as well as the lack
of any actual use-case, that's perhaps not surprising.
Rename "clause_is_check" to "weak" proof mode, and provide specifications
for what it should do. I defined weak refutation as meaning "truth of A
implies non-truth of B", which makes it possible to use the mode in the
part of relation_excluded_by_constraints that checks for mutually
contradictory WHERE clauses. Fix up several places that did things wrong
for that definition. (As far as I can see, these errors would only lead
to failure-to-prove, not incorrect claims of proof, making them not
serious bugs even aside from the fact that v10 contains no use of this
mode. So there seems no need for back-patching.)
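For instance, a sketch of a case the weak definition covers (this needs
constraint_exclusion = on for a plain table; note that truth of
"x is null" only makes "x > 0" null, not false, so strong refutation
would not apply):

    set constraint_exclusion = on;
    explain select * from tbl where x is null and x > 0;
    -- the plan collapses to a Result node with One-Time Filter: false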
In addition, teach predicate_refuted_by_recurse that it can use
predicate_implied_by_recurse after all when processing a strong NOT-clause,
so long as it asks for the correct proof strength. This is an optimization
that could have been included in commit b08df9cab, but wasn't.
Also, simplify and generalize the logic that checks for whether nullness of
the argument of IS [NOT] NULL would force overall nullness of the predicate
or clause. (This results in a change in the partition_prune test's output,
as it is now able to prune an all-nulls partition that it did not recognize
before.)
In passing, in PartConstraintImpliedByRelConstraint, remove bogus
conversion of the constraint list to explicit-AND form and then right back
again; that accomplished nothing except forcing a useless extra level of
recursion inside predicate_implied_by.
Discussion: https://postgr.es/m/5983.1520487191@sss.pgh.pa.us
Apparently, it doesn't work to use a plain cstring as a Name datum: you
may end up with random bytes because the bytes after the terminating \0
are never zeroed, as indicated by valgrind. I introduced this
bug in 5564c11815, so backpatch this fix to REL_10_STABLE, like that
commit.
While at it, fix a slightly misleading comment, pointed out by David
Rowley.
The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so. Patch it up so that it does. Also add an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested or
excluded individually.
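For instance (a sketch; table and statistics names are invented):

    create table src (a int, b int);
    create statistics src_stats (dependencies) on a, b from src;
    create table dst1 (like src including statistics);  -- clones src_stats
    create table dst2 (like src including all);         -- now does too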
While at it, reorder the INCLUDING options, both in code and in docs,
into alphabetical order, which makes more sense than the
feature-implementation order previously used.
Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.
In pg11, comments on statistics objects are cloned too. In pg10 they
are not, because I (Álvaro) was too cowardly to change the parse node as
required to support it. Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason. Any corresponding user-visible changes (docs) are backpatched,
though.
Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com
The new column distinguishes normal functions, procedures, aggregates,
and window functions. This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0. Also change prorettype to be VOIDOID for procedures.
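For example, the new column can be inspected directly:

    -- prokind: 'f' = normal function, 'p' = procedure,
    --          'a' = aggregate, 'w' = window function
    select proname, prokind from pg_proc where proname in ('sum', 'lag');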
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
A before-update row trigger may choose to return the "new" or "old" tuple
unmodified. ExecBRUpdateTriggers failed to consider the second
possibility, and would proceed to free the "old" tuple even if it was the
one returned, leading to subsequent access to already-deallocated memory.
In debug builds this reliably leads to an "invalid memory alloc request
size" failure; in production builds it might accidentally work, but data
corruption is also possible.
This is a very old bug. There are probably a couple of reasons it hasn't
been noticed up to now. It would be more usual to return NULL if one
wanted to suppress the update action; returning "old" is significantly less
efficient since the update will occur anyway. Also, none of the standard
PLs would ever cause this because they all returned freshly-manufactured
tuples even if they were just copying "old". But commit 4b93f5799 changed
that for plpgsql, making it possible to see the bug with a plpgsql trigger.
Still, this is certainly legal behavior for a trigger function, so it's
ExecBRUpdateTriggers's fault not plpgsql's.
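For reference, the now-reachable pattern looks like this in plpgsql
(a sketch; as noted below, the committed test uses a C-language trigger
instead):

    create table t (f1 int);
    insert into t values (1);
    create function keep_old() returns trigger language plpgsql
      as $$ begin return old; end $$;   -- legal: hand back the OLD tuple
    create trigger t_bru before update on t
      for each row execute procedure keep_old();
    update t set f1 = 2;  -- with the bug, touches already-freed memory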
It seems worth creating a test case that exercises returning "old" directly
with a C-language trigger; testing this through plpgsql seems unreliable
because its behavior might change again.
Report and fix by Rushabh Lathia; regression test case by me.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com
To support parameters in CALL, move the parse analysis of the procedure
and arguments into the global transformation phase, so that the parser
hooks can be applied. And then at execution time pass the parameters
from ProcessUtility on to ExecuteCallStmt.
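For example (a sketch; procedure and table names are invented), a
PL/pgSQL variable can now appear as a CALL argument:

    create table tbl (a integer);
    create procedure insert_data(a integer) language sql
      as $$ insert into tbl values (a); $$;
    do $$
    declare v integer := 42;
    begin
      call insert_data(v);   -- v is resolved through the parser hooks
    end $$;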
It's not necessary to fully initialize the executor data structures
for partitions to which no tuples are ever routed. Consider, for
example, an INSERT statement that inserts only one row: it only cares
about the partition to which that one row is routed. The new function
ExecInitPartitionInfo performs the initialization in question only
when a particular partition is about to receive a tuple. This includes
creating, validating, and saving a pointer to the ResultRelInfo,
setting up for speculative insertions, translating WCOs and
initializing the resulting expressions, translating returning lists
and building the appropriate projection information, and setting up a
tuple conversion map.
One thing that's not deferred is locking the child partitions; that
seems desirable but would need more thought. Still, testing shows
that this makes single-row inserts significantly faster on a table
with many partitions without harming the bulk-insert case.
Amit Langote, reviewed by Etsuro Fujita, with a few changes by me
Discussion: http://postgr.es/m/8975331d-d961-cbdd-f862-fdd3d97dc2d0@lab.ntt.co.jp
If we restrict unique constraints on partitioned tables so that they
must always include the partition key, then our standard approach to
unique indexes already works --- each unique key is forced to exist
within a single partition, so enforcing the unique restriction in each
index individually is enough to have it enforced globally. Therefore we
can implement unique indexes on partitions by simply removing a few
restrictions (and adding others).
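A sketch of what this enables (names invented):

    create table t (a int, b int, unique (a, b)) partition by range (a);
    create table t0 partition of t for values from (0) to (10);
    insert into t values (1, 1);
    insert into t values (1, 1);  -- fails: duplicate key in t0's index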
Discussion: https://postgr.es/m/20171222212921.hi6hg6pem2w2t36z@alvherre.pgsql
Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime
Casanova, Amit Langote
The reason for doing so is that it will allow expression evaluation to
optimize based on the underlying tupledesc. In particular it will
allow JITing tuple deforming together with the expression itself.
For that, expression initialization needs to be moved to after the
relevant slots are initialized - mostly unproblematic, except in the
case of nodeWorktablescan.c.
After doing so there's no need for ExecAssignResultType() and
ExecAssignResultTypeFromTL() anymore, as all former callers have been
converted to create a slot with a fixed descriptor.
When creating a slot with a fixed descriptor, tts_values/isnull can be
allocated together with the main slot, reducing allocation overhead
and increasing cache density a bit.
Author: Andres Freund
Discussion: https://postgr.es/m/20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
Prematurely freeing the EState used to evaluate CALL arguments led, in some
cases, to passing dangling pointers to the procedure. This was masked in
trivial cases because the argument pointers would point to Const nodes in
the original expression tree, and in some other cases because the result
value would end up in the standalone ExprContext rather than in memory
belonging to the EState --- but that wasn't exactly high quality
programming either, because the standalone ExprContext was never
explicitly freed, breaking assorted API contracts.
In addition, using a separate EState for each argument was just silly.
So let's use just one EState, and one ExprContext, and make the latter
belong to the former rather than be standalone, and clean up the EState
(and hence the ExprContext) post-call.
While at it, improve the function's commentary a bit.
Discussion: https://postgr.es/m/29173.1518282748@sss.pgh.pa.us
CALL statements cannot support sub-SELECTs in the arguments of the called
procedure, since they just use ExecEvalExpr to evaluate such arguments.
Teach transformSubLink() to reject the case, as it already does for other
contexts in which subqueries are not supported.
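So, for example, something like this now fails cleanly at parse
analysis (procedure name invented), with an error along the lines of
"cannot use subquery in CALL argument":

    call p((select max(x) from tab));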
In passing, s/EXPR_KIND_CALL/EXPR_KIND_CALL_ARGUMENT/ to make that enum
symbol line up more closely with the phrasing of the error messages it is
associated with. And fix someone's weak grasp of English grammar in the
preceding EXPR_KIND_PARTITION_EXPRESSION addition. Also update an
incorrect comment in resolve_unique_index_expr (possibly it was correct
when written, but nowadays transformExpr definitely does reject SRFs here).
Per report from Pavel Stehule --- but this resolves only one of the bugs
he mentions.
Discussion: https://postgr.es/m/CAFj8pRDxOwPPzpA8i+AQeDQFj7bhVw-dR2==rfWZ3zMGkm568Q@mail.gmail.com
Doing so causes EXPLAIN ANALYZE to show trigger statistics multiple
times. Commit 2f17844104 seems to
be to blame for this.
Amit Langote, reviewed by Amit Khandekar, Etsuro Fujita, and me.
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions. We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion. That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value. Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that. (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)
The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types. Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.
In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011. As far as I can see, we are now fully
up to spec on window framing options.
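For instance (a sketch):

    select x,
           sum(x) over (order by x
                        range between 10 preceding and current row),
           count(*) over (order by x
                          groups between 1 preceding and 1 following
                          exclude current row)
    from generate_series(1, 20) g(x);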
Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.
Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.
Oliver Ford, reviewed and whacked around some by me
Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
Investigation of 2d2d06b7e2 revealed that
identity values were not applied in some further cases, including
logical replication subscribers, VALUES RTEs, and ALTER TABLE ... ADD
COLUMN. To fix all that, apply the identity column expression in
build_column_default() instead of repeating the same logic at each call
site.
For ALTER TABLE ... ADD COLUMN ... IDENTITY, the previous coding
completely ignored that existing rows for the new column should have
values filled in from the identity sequence. The coding using
build_column_default() fails for this because the sequence ownership
isn't registered until after ALTER TABLE, and we can't do it before
because we don't have the column in the catalog yet. So we specially
remember in ColumnDef the sequence name that we decided on and build a
custom NextValueExpr using that.
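For example (a sketch), the ADD COLUMN case now backfills existing
rows:

    create table t (v text);
    insert into t values ('one'), ('two');
    alter table t add column id int generated always as identity;
    select v, id from t;   -- id is 1 and 2, not NULL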
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
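For example (a sketch, assuming the max_parallel_maintenance_workers
setting introduced with this work):

    set max_parallel_maintenance_workers = 4;  -- cap workers for the build
    create index on big_table (col);           -- may now run in parallel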
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
We have a lot of code in which option names, which from the user's
viewpoint are logically keywords, are passed through the grammar as plain
identifiers, and then matched to string literals during command execution.
This approach avoids making words into lexer keywords unnecessarily. Some
places matched these strings using plain strcmp, some using pg_strcasecmp.
But the latter should be unnecessary since identifiers would have been
downcased on their way through the parser. Aside from any efficiency
concerns (probably not a big factor), the lack of consistency in this area
creates a hazard of subtle bugs due to different places coming to different
conclusions about whether two option names are the same or different.
Hence, standardize on using strcmp() to match any option names that are
expected to have been fed through the parser.
This does create a user-visible behavioral change, which is that while
formerly all of these would work:
    alter table foo set (fillfactor = 50);
    alter table foo set (FillFactor = 50);
    alter table foo set ("fillfactor" = 50);
    alter table foo set ("FillFactor" = 50);
now the last case will fail because that double-quoted identifier is
different from the others. However, none of our documentation says that
you can use a quoted identifier in such contexts at all, and we should
discourage doing so since it would break if we ever decide to parse such
constructs as true lexer keywords rather than poor man's substitutes.
So this shouldn't create a significant compatibility issue for users.
Daniel Gustafsson, reviewed by Michael Paquier, small changes by me
Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
This clause was superseded by SQL-standard syntax back in 7.3.
We've kept it around for backwards-compatibility purposes ever since;
but 15 years seems like long enough for that, especially seeing that
there are undocumented weirdnesses in how it interacts with the
SQL-standard syntax for specifying the same options.
Michael Paquier, per an observation by Daniel Gustafsson;
some small cosmetic adjustments to nearby code by me.
Discussion: https://postgr.es/m/20180115022748.GB1724@paquier.xyz
get_relation_info() was too optimistic about opening indexes in
partitioned tables, which would raise errors when any queries were
planned on such tables. Fix by ignoring any indexes of the partitioned
kind.
CLUSTER (and ALTER TABLE CLUSTER ON) had a similar problem. Fix by
disallowing these commands in partitioned tables.
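For example (a sketch):

    alter table parted cluster on parted_idx;  -- now rejected with an error
    cluster parted using parted_idx;           -- likewise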
Fallout from 8b08f7d482.
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language. Add similar underlying functions to SPI. Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call. Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.
- SPI
Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags. The only option flag right now is
SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction
control commands; otherwise they are not allowed. This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called. A nonatomic SPI connection uses different
memory management. A normal SPI connection allocates its memory in
TopTransactionContext. For nonatomic connections we use PortalContext
instead. As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.
SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.
- portalmem.c
Some adjustments were made in the code that cleans up portals at
transaction abort. The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.
In AtAbort_Portals(), remove the code that marks an active portal as
failed. As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort. And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception. So the code in AtAbort_Portals() is never used
anyway.
In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much. This mirrors similar code in
PreCommit_Portals().
- PL/Perl
Gets new functions spi_commit() and spi_rollback()
- PL/pgSQL
Gets new commands COMMIT and ROLLBACK.
Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.
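For example (a sketch, assuming a table test1 with an int column a):

    create procedure transaction_test() language plpgsql
    as $$
    begin
      for i in 0..9 loop
        insert into test1 (a) values (i);
        if i % 2 = 0 then
          commit;      -- ends the transaction and starts a new one
        else
          rollback;
        end if;
      end loop;
    end $$;
    call transaction_test();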
- PL/Python
Gets new functions plpy.commit and plpy.rollback.
- PL/Tcl
Gets new commands commit and rollback.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
When an UPDATE causes a row to no longer match the partition
constraint, try to move it to a different partition where it does
match the partition constraint. In essence, the UPDATE is split into
a DELETE from the old partition and an INSERT into the new one. This
can lead to surprising behavior in concurrency scenarios because
EvalPlanQual rechecks won't work as they normally did; the known
problems are documented. (There is a pending patch to improve the
situation further, but it needs more review.)
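For example (a sketch):

    create table t (a int) partition by range (a);
    create table t1 partition of t for values from (0) to (10);
    create table t2 partition of t for values from (10) to (20);
    insert into t values (5);
    update t set a = 15;   -- deleted from t1, inserted into t2
    select tableoid::regclass, a from t;   -- now shows t2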
Amit Khandekar, reviewed and tested by Amit Langote, David Rowley,
Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro
Herrera, Amit Kapila, and me. A few final revisions by me.
Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that. It's only used in
aclcheck_error. By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with
b256f24264. But this conversion was never
done for the ACL commands until now.
This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Stephen Frost <sfrost@snowman.net>
When CREATE INDEX is run on a partitioned table, create catalog entries
for an index on the partitioned table (which is just a placeholder since
the table proper has no data of its own), and recurse to create actual
indexes on the existing partitions; create them in future partitions
also.
As a convenience gadget, if the new index definition matches some
existing index in partitions, these are picked up and used instead of
creating new ones. Whichever way these indexes come about, they become
attached to the index on the parent table and are dropped alongside it,
and cannot be dropped in isolation unless they are detached first.
To support pg_dump'ing these indexes, add commands
CREATE INDEX ON ONLY <table>
(which creates the index on the parent partitioned table, without
recursing) and
ALTER INDEX ATTACH PARTITION
(which is used after the indexes have been created individually on each
partition, to attach them to the parent index). These reconstruct prior
database state exactly.
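In a dump this comes out roughly as (a sketch; names invented):

    create index t_a_idx on only t (a);   -- placeholder, no recursion
    create index t1_a_idx on t1 (a);      -- index on one partition
    alter index t_a_idx attach partition t1_a_idx;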
Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
Langote, Jesper Pedersen, Simon Riggs, David Rowley
Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
For no apparent reason, this function was using a 16-bit-wide inhseqno
value, rather than the correct 32-bit width that is stored in
the pg_inherits catalog. This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:
ERROR: duplicate key value violates unique constraint «pg_inherits_relid_seqno_index»
DETAIL: Key (inhrelid, inhseqno)=(329371, 0) already exists.
Needless to say, having so many parents is an uncommon situation, which
explains why this error has never been reported, despite having been
introduced with the Postgres95 1.01 sources in commit d31084e9d1:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349
Backpatch all the way back.
David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
This should have been done in commit 18ce3a4ab, which added that parameter
to ExplainOneQuery, but it was overlooked. This makes it impossible for
a user of the hook to pass the queryEnv down to ExplainOnePlan.
It's too late to change this API in v10, I suppose, but fortunately
passing NULL to ExplainOnePlan will work in nearly all interesting
cases in v10. That might not be true forever, so we'd better fix it.
Tatsuro Yamada, reviewed by Thomas Munro
Discussion: https://postgr.es/m/890e8dd9-c1c7-a422-6892-874f5eaee048@lab.ntt.co.jp
After having gotten rid of PortalGetHeapMemory(), there seems little
reason to keep one Portal access macro around that offers no actual
abstraction and isn't consistently used anyway.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Rename PortalMemory to TopPortalContext, to avoid confusion with
PortalContext and align naming with similar top-level memory contexts.
Rename PortalData's "heap" field to portalContext. The "heap" naming
seems quite antiquated and confusing. Also get rid of the
PortalGetHeapMemory() macro and access the field directly, which we do
for other portal fields, so this abstraction doesn't buy anything.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
At present, we always raise an ERROR if the partition constraint
is violated, but a pending patch for UPDATE tuple routing will
consider instead moving the tuple to the correct partition.
Refactor to make that simpler.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Discussion: http://postgr.es/m/CAJ3gD9cue54GbEzfV-61nyGpijvjZgCcghvLsB0_nL8Nm8HzCA@mail.gmail.com
Generalize is_partition_attr to has_partition_attrs and make it
accessible from outside tablecmds.c. Change map_partition_varattnos
to clarify that it can be used for mapping between any two relations
in a partitioning hierarchy, not just parent -> child.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me.
Some comment changes by me.
Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
Instead of having ExecSetupPartitionTupleRouting return multiple out
parameters, have it return a pointer to a structure containing all of
those different things. Also, provide and use a cleanup function,
ExecCleanupTupleRouting, instead of cleaning up all of the resources
allocated by ExecSetupPartitionTupleRouting individually.
Amit Khandekar, reviewed by Amit Langote, David Rowley, and me
Discussion: http://postgr.es/m/CAJ3gD9fWfxgKC+PfJZF3hkgAcNOy-LpfPxVYitDEXKHjeieWQQ@mail.gmail.com
Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3bd2 specifically to support this case. In reality, one of the
sessions would be aborted by a mysterious "deadlock detected" error.
Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e47510 keeping track of
(registering) the catalog snapshot. To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.
Backpatch to 9.4, which introduced MVCC catalog scans.
Include an isolationtester spec that, for me (Álvaro), reproduces the
deadlock with the unpatched code 8 times out of 10.
Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
In a race case, EXPLAIN ANALYZE could fail to display correct nbatch
and size information. Refactor so that participants report only on
batches they worked on rather than trying to report on all of them,
and teach explain.c to consider the HashInstrumentation object from
all participants instead of picking the first one it can find. This
should fix an occasional build farm failure in the "join" regression
test.
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
Hash indexes depend on estimates of the numbers of tuples and pages in
the relation; an incorrect value can cause an index to grow
significantly. VACUUM FULL recreates the heap and reindexes all indexes
before the stats are renewed, so the rebuild used stale values. The
patch fixes that, so that indexes see correct values.
Backpatch to v10 only, because earlier versions don't have a usable
hash index, and hash index growth is the only user-visible symptom.
Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net
This patch does three interrelated things:
* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.
* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual. This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not. plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant. While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.
* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for. This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal. copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.) This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).
Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables. The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.
In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState. The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.
Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existence of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.
The errors are emitted with ereport rather than elog, despite being
"should never happen" messages, so that a proper error code is
reported. To avoid superfluous translations, mark the messages as
internal.
Author: Andres Freund and Alvaro Herrera
Reviewed-By: Alvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
A COPY into a table should apply identity sequence values just like it
does for ordinary defaults. This was previously forgotten, leading to
null values being inserted, which in turn would fail because identity
columns have not-null constraints.
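For example (a sketch):

    create table t (id int generated always as identity, v text);
    copy t (v) from stdin;
    one
    two
    \.
    select * from t;   -- id is 1 and 2 instead of a not-null violation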
Author: Michael Paquier <michael.paquier@gmail.com>
Reported-by: Steven Winfield <steven.winfield@cantabcapital.com>
Bug: #14952
If a hash join appears in a parallel query, there may be no hash table
available for explain.c to inspect even though a hash table may have
been built in other processes. This could happen either because
parallel_leader_participation was set to off or because the leader
happened to hit the end of the outer relation immediately (even though
the complete relation is not empty) and decided not to build the hash
table.
Commit bf11e7ee introduced a way for workers to exchange
instrumentation via the DSM segment for Sort nodes even though they
are not parallel-aware. This commit does the same for Hash nodes, so
that explain.c has a way to find instrumentation data from an
arbitrary participant that actually built the hash table.
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D3DUQC2-z252N55eOcZBer6DPdM%3DFzrxH9dZc5vYLsjaA%40mail.gmail.com
Hopefully, the additional logging will help avoid confusion that
could otherwise result.
Nathan Bossart, reviewed by Michael Paquier, Fabrízio Mello, and me
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar. This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.
This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL). There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql. Support for defining
procedures is available in all the languages supplied by the core
distribution.
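For example (a sketch; names invented):

    create table logt (v text);
    create procedure note(v text) language sql
      as $$ insert into logt values (v); $$;
    call note('hello');
    drop routine note(text);   -- ROUTINE covers functions and procedures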
While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
heap_drop_with_catalog and ATExecDetachPartition neglected to check for
SearchSysCache failures, as noted in bugs #14927 and #14928 from Pan Bian.
Such failures are pretty unlikely, since we should already have some sort
of lock on the rel at these points, but it's neither a good idea nor
per project style to omit a check for failure.
Also, StorePartitionKey contained a syscache lookup that it never did
anything with, including never releasing the result. Presumably the
reason why we don't see refcount-leak complaints is that the lookup
always fails; but in any case it's pretty useless, so remove it.
All of these errors were evidently introduced by the relation
partitioning feature. Back-patch to v10 where that came in.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/20171127090105.1463.3962@wrigleys.postgresql.org
Discussion: https://postgr.es/m/20171127091341.1468.72696@wrigleys.postgresql.org