* Separate calculation of hash value from the lookup.
* Split build_hash_table() into two functions.
* Change lookup_hash_entry() to return AggStatePerGroup. That's all
the caller needed, anyway.
These changes are to support the upcoming Disk-based Hash Aggregation
work.
Discussion: https://postgr.es/m/31f5ab871a3ad5a1a91a7a797651f20e77ac7ce3.camel%40j-davis.com
When updating a table row with generated columns, only recompute those
generated columns whose base columns have changed in this update and
keep the rest unchanged. This can result in a significant performance
benefit. The required information was already kept in
RangeTblEntry.extraUpdatedCols; we just have to make use of it.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
Commit 4eaea3db introduced TupleHashTableHash(), but the signature
didn't match the other exposed functions. Separate it into internal
and external versions. The external version hides the details behind
an API more consistent with the other external functions, and the
internal version is still suitable for simplehash.
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan()
and uses the scan option for seq scan. This change caused two issues.
(1) The change caused Tid scan to take a predicate lock on the entire
relation in serializable transaction even when relation-level
lock is not necessary. This could lead to an unexpected
serialization error.
(2) The change caused Tid scan to increment the number of seq_scan
in pg_stat_*_tables views even though it's not seq scan. This
could confuse the users.
This commit adds the scan option for Tid scan and makes Tid scan
use it, to avoid those issues.
Back-patch to v12, where the bug was introduced.
Author: Tatsuhito Kasahara
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
Expose two new entry points: one for only calculating the hash value
of a tuple, and another for looking up a hash entry when the hash
value is already known. This will be useful for disk-based Hash
Aggregation to avoid recomputing the hash value for the same tuple
after saving and restoring it from disk.
Discussion: https://postgr.es/m/37091115219dd522fd9ed67333ee8ed1b7e09443.camel%40j-davis.com
It's already tracked via ExprState->parent, so we don't need to also
include it in ExprEvalStep. When that code originally was written
ExprState->parent didn't exist, but it since has been introduced in
6719b238e8.
Author: Andres Freund
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
This mostly consists of using C99 style for loops, moving variables
into narrower scopes, and a smattering of other minor improvements.
Done separately to make it easier to review patches with actual
functional changes.
Author: Andres Freund
Discussion: https://postgr.es/m/20191023163849.sosqbfs5yenocez3@alap3.anarazel.de
When replica identity is FULL (an admittedly unusual case), the loop
that searches for tuples in execReplication.c didn't stop scanning the
table when once a matching tuple was found. Add the missing 'break'.
Note slight behavior change: we now return the first matching tuple
rather than the last one. They are supposed to be indistinguishable
anyway, so this shouldn't matter.
Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/379743f6-ae91-b866-f7a2-5624e6d2b0a4@postgrespro.ru
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars. However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason. Remove those newlines, and reflow some
of those lines for some extra naturality.
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark
and relsubs_done arrays from a prior instantiation. But since they are
allocated in the es_query_cxt of the recheckestate, that's just wrong;
EvalPlanQualEnd() will blow away that storage. Therefore we were using
storage that could have been reallocated to something else, causing all
sorts of havoc.
I think this was modeled on the old code's handling of es_epqTupleSlot,
but since the code was anyway clearing the arrays at re-use, there's
clearly no expectation of importing any outside state. So it's just
a dubious savings of a couple of pallocs, which is negligible compared
to setting up a new planstate tree. Therefore, just allocate the
arrays always. (I moved the allocations slightly for readability.)
In principle this bug could cause a problem whenever EPQ rechecks are
needed in more than one target table of a ModifyTable plan node.
In practice it seems not quite so easy to trigger as that; I couldn't
readily duplicate a crash with a partitioned target table, for instance.
That's probably down to incidental choices about when to free or
reallocate stuff. The added isolation test case does seem to reliably
show an assertion failure, though.
Per report from Oleksii Kliukin. Back-patch to v12 where the bug was
introduced (evidently by commit 3fb307bc4).
Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com
This gives more information to the user about the error and it makes such
messages consistent with the other similar messages in the code.
Reported-by: Simon Riggs
Author: Mahendra Singh and Simon Riggs
Reviewed-by: Beena Emerson and Amit Kapila
Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com
Currently, Parallel Hash Join cannot be used for full/right joins,
so there is no point in setting the match flag. It turns out that
the cache coherence traffic generated by those writes slows down
large systems running many-core joins, so let's stop doing that.
In future, if we need to use match bits in parallel joins, we might
want to consider setting them only if not already set.
Back-patch to 11, where Parallel Hash Join arrived.
Reported-by: Deng, Gang
Discussion: https://postgr.es/m/0F44E799048C4849BAE4B91012DB910462E9897A%40SHSMSX103.ccr.corp.intel.com
The code checking whether an aggregate transition value needs to be
reparented into the current context has always only compared the
transition return value with the previous transition value by datum,
i.e. without regard for NULLness. This normally works, because when
the transition function returns NULL (via fcinfo->isnull), it'll
return a value that won't be the same as its input value.
But there's no hard requirement that that's the case. And it turns
out, it's possible to hit this case (see discussion or reproducers),
leading to a non-null transition value not being reparented, followed
by a crash caused by that.
Instead of adding another comparison of NULLness, instead have
ExecAggTransReparent() ensure that pergroup->transValue ends up as 0
when the new transition value is NULL. That avoids having to add an
additional branch to the much more common cases of the transition
function returning the old transition value (which is a pointer in
this case), and when the new value is different, but not NULL.
In branches since 69c3936a14, also deduplicate the reparenting code
between the expression evaluation based transitions, and the path for
ordered aggregates.
Reported-By: Teodor Sigaev, Nikita Glukhov
Author: Andres Freund
Discussion: https://postgr.es/m/bd34e930-cfec-ea9b-3827-a8bc50891393@sigaev.ru
Backpatch: 9.4-, this issue has existed since at least 7.4
Commit 9b63c13f0 turns out to have been fundamentally misguided:
the parent node's subPlan list is by no means the only way in which
a child SubPlan node can be hooked into the outer execution state.
As shown in bug #16213 from Matt Jibson, we can also get short-lived
tuple table slots added to the outer es_tupleTable list. At this point
I have little faith that there aren't other possible connections as
well; the long time it took to notice this problem shows that this
isn't a heavily-exercised situation.
Therefore, revert that fix, returning to the coding that passed a
NULL parent plan pointer down to the transiently-built subexpressions.
That gives us a pretty good guarantee that they won't hook into the
outer executor state in any way. But then we need some other solution
to make SubPlans work. Adopt the solution speculated about in the
previous commit's log message: do expression initialization at plan
startup for just those VALUES rows containing SubPlans, abandoning the
goal of reclaiming memory intra-query for those rows. In practice it
seems unlikely that queries containing a vast number of VALUES rows
would be using SubPlans in them, so this should not give up much.
(BTW, this test case also refutes my claim in connection with the prior
commit that the issue only arises with use of LATERAL. That was just
wrong: some variants of SubLink always produce SubPlans.)
As with previous patch, back-patch to all supported branches.
Discussion: https://postgr.es/m/16213-871ac3bc208ecf23@postgresql.org
Previously, check_xact_readonly() was responsible for determining
which types of queries could not be run in a read-only transaction,
standard_ProcessUtility() was responsibility for prohibiting things
which were allowed in read only transactions but not in recovery, and
utility commands were basically prohibited in bulk in parallel mode by
calls to CommandIsReadOnly() in functions.c and spi.c. This situation
was confusing and error-prone. Accordingly, move all the checks to a
new function ClassifyUtilityCommandAsReadOnly(), which determines the
degree to which a given statement is read only.
In the old code, check_xact_readonly() inadvertently failed to handle
several statement types that actually should have been prohibited,
specifically T_CreatePolicyStmt, T_AlterPolicyStmt, T_CreateAmStmt,
T_CreateStatsStmt, T_AlterStatsStmt, and T_AlterCollationStmt. As a
result, thes statements were erroneously allowed in read only
transactions, parallel queries, and standby operation. Generally, they
would fail anyway due to some lower-level error check, but we
shouldn't rely on that. In the new code structure, future omissions
of this type should cause ClassifyUtilityCommandAsReadOnly() to
complain about an unrecognized node type.
As a fringe benefit, this means we can allow certain types of utility
commands in parallel mode, where it's safe to do so. This allows
ALTER SYSTEM, CALL, DO, CHECKPOINT, COPY FROM, EXPLAIN, and SHOW.
It might be possible to allow additional commands with more work
and thought.
Along the way, document the thinking process behind the current set
of checks, as per discussion especially with Peter Eisentraut. There
is some interest in revising some of these rules, but that seems
like a job for another patch.
Patch by me, reviewed by Tom Lane, Stephen Frost, and Peter
Eisentraut.
Discussion: http://postgr.es/m/CA+TgmoZ_rLqJt5sYkvh+JpQnfX0Y+B2R+qfi820xNih6x-FQOQ@mail.gmail.com
A view with conditional INSTEAD rules and no unconditional INSTEAD
rules or INSTEAD OF triggers is not auto-updatable. Previously we
relied on a check in the executor to catch this, but that's
problematic since the planner may fail to properly handle such a query
and thus return a particularly unhelpful error to the user, before
reaching the executor check.
Instead, trap this in the rewriter and report the correct error there.
Doing so also allows us to include more useful error detail than the
executor check can provide. This doesn't change the existing behaviour
of updatable views; it merely ensures that useful error messages are
reported when a view isn't updatable.
Per report from Pengzhou Tang, though not adopting that suggested fix.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAG4reAQn+4xB6xHJqWdtE0ve_WqJkdyCV4P=trYr4Kn8_3_PEA@mail.gmail.com
Use the parser's standard type coercion machinery to convert the
output column(s) of a SQL function's final SELECT or RETURNING
to the type(s) they should have according to the function's declared
result type. We'll allow any case where an assignment-level
coercion is available. Previously, we failed unless the required
coercion was a binary-compatible one (and the documentation ignored
this, falsely claiming that the types must match exactly).
Notably, the coercion now accounts for typmods, so that cases where
a SQL function is declared to return a composite type whose columns
are typmod-constrained now behave as one would expect. Arguably
this aspect is a bug fix, but the overall behavioral change here
seems too large to consider back-patching.
A nice side-effect is that functions can now be inlined in a
few cases where we previously failed to do so because of type
mismatches.
Discussion: https://postgr.es/m/18929.1574895430@sss.pgh.pa.us
This follows multiple complains from Peter Geoghegan, Andres Freund and
Alvaro Herrera that this issue ought to be dug more before actually
happening, if it happens.
Discussion: https://postgr.es/m/20191226144606.GA5659@alvherre.pgsql
Formerly the rd_partkey and rd_partdesc data structures were always
populated immediately when a relcache entry was built or rebuilt.
This patch changes things so that they are populated only when they
are first requested. (Hence, callers *must* now always use
RelationGetPartitionKey or RelationGetPartitionDesc; just fetching
the pointer directly is no longer acceptable.)
This seems to have some performance benefits, but the main reason to do
it is that it eliminates a recursive-reload failure that occurs if the
partkey or partdesc expressions contain any references to the relation's
rowtype (as discovered by Amit Langote). In retrospect, since loading
these data structures might result in execution of nearly-arbitrary code
via eval_const_expressions, it was a dumb idea to require that to happen
during relcache entry rebuild.
Also, fix things so that old copies of a relcache partition descriptor
will be dropped when the cache entry's refcount goes to zero. In the
previous coding it was possible for such copies to survive for the
lifetime of the session, as I'd complained of in a previous discussion.
(This management technique still isn't perfect, but it's better than
before.) Improve the commentary explaining how that works and why
it's safe to hand out direct pointers to these relcache substructures.
In passing, improve RelationBuildPartitionDesc by using the same
memory-context-parent-swap approach used by RelationBuildPartitionKey,
thereby making it less dependent on strong assumptions about what
partition_bounds_copy does. Avoid doing get_rel_relkind in the
critical section, too.
Patch by Amit Langote and Tom Lane; Robert Haas deserves some credit
for prior work in the area, too. Although this is a pre-existing
problem, no back-patch: the patch seems too invasive to be safe to
back-patch, and the bug it fixes is a corner case that seems
relatively unlikely to cause problems in the field.
Discussion: https://postgr.es/m/CA+HiwqFUzjfj9HEsJtYWcr1SgQ_=iCAvQ=O2Sx6aQxoDu4OiHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com
The following renaming is done so as source files related to index
access methods are more consistent with table access methods (the
original names used for index AMs ware too generic, and could be
confused as including features related to table AMs):
- amapi.h -> indexam.h.
- amapi.c -> indexamapi.c. Here we have an equivalent with
backend/access/table/tableamapi.c.
- amvalidate.c -> indexamvalidate.c.
- amvalidate.h -> indexamvalidate.h.
- genam.c -> indexgenam.c.
- genam.h -> indexgenam.h.
This has been discussed during the development of v12 when table AM was
worked on, but the renaming never happened.
Author: Michael Paquier
Reviewed-by: Fabien Coelho, Julien Rouhaud
Discussion: https://postgr.es/m/20191223053434.GF34339@paquier.xyz
Using \ is unnecessary and ugly, so remove that. While at it, stitch
the literals back into a single line: we've long discouraged splitting
error message literals even when they go past the 80 chars line limit,
to improve greppability.
Leave contrib/tablefunc alone.
Discussion: https://postgr.es/m/20191223195156.GA12271@alvherre.pgsql
Our algorithm for choosing batch numbers turned out not to work
effectively for multi-billion key inner relations. We would use
more hash bits than we have, and effectively concentrate all tuples
into a smaller number of batches than we intended. While ideally
we should switch to wider hashes, for now, change the algorithm to
one that effectively gives up bits from the bucket number when we
don't have enough bits. That means we'll finish up with longer
bucket chains than would be ideal, but that's better than having
batches that don't fit in work_mem and can't be divided.
Batch-patch to all supported releases.
Author: Thomas Munro
Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion
Reported-by: James Coleman
Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
Tuple conversion support in tupconvert.c is able to convert rowtypes
between two relations, inner and outer, which are logically equivalent
but have a different ordering or even dropped columns (used mainly for
inheritance tree and partitions). This makes use of attribute mappings,
which are simple arrays made of AttrNumber elements with a length
matching the number of attributes of the outer relation. The length of
the attribute mapping has been treated as completely independent of the
mapping itself until now, making it easy to pass down an incorrect
mapping length.
This commit refactors the code related to attribute mappings and moves
it into an independent facility called attmap.c, extracted from
tupconvert.c. This merges the attribute mapping with its length,
avoiding to try to guess what is the length of a mapping to use as this
is computed once, when the map is built.
This will avoid mistakes like what has been fixed in dc816e58, which has
used an incorrect mapping length by matching it with the number of
attributes of an inner relation (a child partition) instead of an outer
relation (a partitioned table).
Author: Michael Paquier
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
Previously, if the startup pruning logic proved that all child nodes
of an Append or MergeAppend could be pruned, we still kept one, just
to keep EXPLAIN from failing. The previous commit removed the
ruleutils.c limitation that required this kluge, so drop it. That
results in less-confusing EXPLAIN output, as per a complaint from
Yuzuko Hosoya.
David Rowley
Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
This patch causes EXPLAIN to always assign a separate table alias to the
parent RTE of an append relation (inheritance set); before, such RTEs
were ignored if not actually scanned by the plan. Since the child RTEs
now always have that same alias to start with (cf. commit 55a1954da),
the net effect is that the parent RTE usually gets the alias used or
implied by the query text, and the children all get that alias with "_N"
appended. (The exception to "usually" is if there are duplicate aliases
in different subtrees of the original query; then some of those original
RTEs will also have "_N" appended.)
This results in more uniform output for partitioned-table plans than
we had before: the partitioned table itself gets the original alias,
and all child tables have aliases with "_N", rather than the previous
behavior where one of the children would get an alias without "_N".
The reason for giving the parent RTE an alias, even if it isn't scanned
by the plan, is that we now use the parent's alias to qualify Vars that
refer to an appendrel output column and appear above the Append or
MergeAppend that computes the appendrel. But below the append, Vars
refer to some one of the child relations, and are displayed that way.
This seems clearer than the old behavior where a Var that could carry
values from any child relation was displayed as if it referred to only
one of them.
While at it, change ruleutils.c so that the code paths used by EXPLAIN
deal in Plan trees not PlanState trees. This effectively reverts a
decision made in commit 1cc29fe7c, which seemed like a good idea at
the time to make ruleutils.c consistent with explain.c. However,
it's problematic because we'd really like to allow executor startup
pruning to remove all the children of an append node when possible,
leaving no child PlanState to resolve Vars against. (That's not done
here, but will be in the next patch.) This requires different handling
of subplans and initplans than before, but is otherwise a pretty
straightforward change.
Discussion: https://postgr.es/m/001001d4f44b$2a2cca50$7e865ef0$@lab.ntt.co.jp
Revert part of commit 19df1702f5.
Early shutdown was added by that commit so that we could collect
statistics from workers, but unfortunately, it interacted badly with
rescans. The problem is that we ended up destroying the parallel context
which is required for rescans. This leads to rescans of a Limit node over
a Gather node to produce unpredictable results as it tries to access
destroyed parallel context. By reverting the early shutdown code, we
might lose statistics in some cases of Limit over Gather [Merge], but that
will require further study to fix.
Reported-by: Jerry Sievers
Diagnosed-by: Thomas Munro
Author: Amit Kapila, testcase by Vignesh C
Backpatch-through: 9.6
Discussion: https://postgr.es/m/87ims2amh6.fsf@jsievers.enova.com
Call ExecShutdownNode() after ExecutePlan()'s loop, rather than at each
break. We had forgotten to do that in one case. The omission caused
intermittent "temporary file leak" warnings from multi-batch parallel
hash joins with a LIMIT clause.
Back-patch to 11. Though the problem exists in theory in earlier
parallel query releases, nothing really depended on it.
Author: Kyotaro Horiguchi
Reviewed-by: Thomas Munro, Amit Kapila
Discussion: https://postgr.es/m/20191111.212418.2222262873417235945.horikyota.ntt%40gmail.com
Avoid creating transiently-inconsistent slot states where possible,
by not setting TTS_FLAG_SHOULDFREE until after the slot actually has
a free'able tuple pointer, and by making sure that we reset tts_nvalid
and related derived state before we replace the tuple contents. This
would only matter if something were to examine the slot after we'd
suffered some kind of error (e.g. out of memory) while manipulating
the slot. We typically don't do that, so these changes might just be
cosmetic --- but even if so, it seems like good future-proofing.
Also remove some redundant Asserts, and add a couple for consistency.
Back-patch to v12 where all this code was rewritten.
Discussion: https://postgr.es/m/16095-c3ff2e5283b8dba5@postgresql.org
When maintaining or merging patches, one of the most common sources
for conflicts are the list of objects in makefiles. Especially when
the split across lines has been changed on both sides, which is
somewhat common due to attempting to stay below 80 columns, those
conflicts are unnecessarily laborious to resolve.
By splitting, and alphabetically sorting, OBJS style lines into one
object per line, conflicts should be less frequent, and easier to
resolve when they still occur.
Author: Andres Freund
Discussion: https://postgr.es/m/20191029200901.vww4idgcxv74cwes@alap3.anarazel.de
The old names for the attribute-detoasting functions names included
the word "heap," which seems outdated now that the heap is only one of
potentially many table access methods.
On the other hand, toast_insert_or_update and toast_delete are
heap-specific, so rename them by adding "heap_" as a prefix.
Not all of the work of making the TOAST system fully accessible to AMs
other than the heap is done yet, but there seems to be little harm in
getting this renaming out of the way now. Commit
8b94dab066 already divided up the
functions among various files partially according to whether it was
intended that they should be heap-specific or AM-agnostic, so this is
just clarifying the division contemplated by that commit.
Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
That avoids unnecessary work during both interpreted execution, and
JIT compiled expression evaluation. Both benefit from fewer expression
steps needing be processed, and for interpreted execution there now is
a fastpath dedicated to just fetching a value from a virtual
slot. That's e.g. beneficial for hashjoins over nodes that perform
projections, as the hashed columns are currently fetched individually.
Author: Soumyadeep Chakraborty, Andres Freund
Discussion: https://postgr.es/m/CAE-ML+9OKSN71+mHtfMD-L24oDp8dGTfaVjDU6U+j+FNAW5kRQ@mail.gmail.com
In the course of 5567d12ce0, 356687bd8 and 317ffdfeaa, I changed
BuildTupleHashTable[Ext]'s call to ExecBuildGroupingEqual to not pass
in the parent node, but NULL. Which in turn prevents the tuple
equality comparator from being JIT compiled. While that fixes
bug #15486, it is not actually necessary after all of the above commits,
as we don't re-build the comparator when using the new
BuildTupleHashTableExt() interface (as the content of the hashtable
are reset, but the TupleHashTable itself is not).
Therefore re-allow jit compilation for callers that use
BuildTupleHashTableExt with a separate context for "metadata" and
content.
As in the previous commit, there's ongoing work to make this easier to
test to prevent such regressions in the future, but that
infrastructure is not going to be backpatchable.
The performance impact of not JIT compiling hashtable equality
comparators can be substantial e.g. for aggregation queries that
aggregate a lot of input rows to few output rows (when there are a lot
of output groups, there will be fewer comparisons).
Author: Andres Freund
Discussion: https://postgr.es/m/20190927072053.njf6prdl3vb7y7qb@alap3.anarazel.de
Backpatch: 11, just as 5567d12ce0
For many queries the fact that the tuple descriptor from the lower
node was not taken into account when determining whether the type of a
slot is fixed, lead to tuple deforming for such upper nodes not to be
JIT accelerated.
I broke this in 675af5c01e.
There is ongoing work to enable writing regression tests for related
behavior (including a patch that would have detected this
regression), by optionally showing such details in EXPLAIN. But as it
seems unlikely that that will be suitable for stable branches, just
merge the fix for now.
While it's fairly close to the 12 release window, the fact that 11
continues to perform JITed tuple deforming in these cases, that
there's still cases where we do so in 12, and the fact that the
performance regression can be sizable, weigh in favor of fixing it
now.
Author: Andres Freund
Discussion: https://postgr.es/m/20190927072053.njf6prdl3vb7y7qb@alap3.anarazel.de
Backpatch: 12-, where 675af5c01e was merged.
The code used the destination slot's natts where it intended to
use the source slot's natts. Adding an Assert shows that there
is no case in "make check-world" where these counts are different,
so maybe this is a harmless bug, but it's still a bug.
Takayuki Tsunakawa
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1FD34C0E@G01JPEXMBYT05
In ad0bda5d24 I changed the EvalPlanQual machinery to store
substitution tuples in slot, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples will be inefficient
for future tableams. But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.
As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow to fetch the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.
As a third issue, the test coverage for EPQ was clearly insufficient.
Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug, it seems better to delay fetching
non-locking rowmarks when they're actually needed, rather than
eagerly. They're referenced at most once, and in cases where EPQ
fails, might never be referenced. Fetching them when needed also
increases locality a bit.
To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains necessary data. To do so move EPQ related data from
EState to EPQState, and, only for EStates creates as part of EPQ,
reference the associated EPQState from EState.
To fix the second issue, change EPQ initialization to allow use of
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).
As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.
Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.
Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion:
https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.comhttps://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
detoast.c/h contain functions required to detoast a datum, partially
or completely, plus a few other utility functions for examining the
size of toasted datums.
toast_internals.c/h contain functions that are used internally to the
TOAST subsystem but which (mostly) do not need to be accessed from
outside.
heaptoast.c/h contains code that is intrinsically specific to the
heap AM, either because it operates on HeapTuples or is based on the
layout of a heap page.
detoast.c and toast_internals.c are placed in
src/backend/access/common rather than src/backend/access/heap. At
present, both files still have dependencies on the heap, but that will
be improved in a future commit.
Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro,
Andres Freund, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
The message was included as a parameter when this function was added in
dcb2bda9b7, but I don't think it has ever served any useful purpose.
Let's stop spreading it pointlessly.
Reviewed by Amit Langote and Peter Eisentraut.
Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql