1
0
mirror of https://github.com/postgres/postgres.git synced 2025-09-02 04:21:28 +03:00
Commit Graph

2585 Commits

Author SHA1 Message Date
Melanie Plageman
2128cebcdb Remove extra prefetch iterator setup for Bitmap Table Scan
1a0da347a7 replaced Bitmap Table Scan's separate private and
shared bitmap iterators with a unified iterator. It accidentally set up
the prefetch iterator twice for non-parallel bitmap table scans. Remove
the extra set up call to tbm_begin_iterate().
2024-12-19 11:55:18 -05:00
Melanie Plageman
754c610e13 Fix bitmap table scan crash on iterator release
1a0da347a7 replaced Bitmap Table Scan's individual private and
shared iterators with a unified iterator. It neglected, however, to
check if the iterator had already been cleaned up before doing so on
rescan. Add this check both on rescan and end scan to be safe.

Reported-by: Richard Guo
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs48nrhcLY1kcd-u9oD%2B6yiS631F_8Fx8ZGsO-BYDwH%2Bbyw%40mail.gmail.com
2024-12-19 11:55:03 -05:00
David Rowley
08cdb079d4 Optimize grouping equality checks with virtual slots
8f4ee9626 fixed an old Assert failure that could happen when the slot
type used to look up the hash table for BuildTupleHashTableExt() users
wasn't a TTSOpsMinimalTuple slot.  The fix for that in the back branches
had to be to pass the TupleTableSlotOps as NULL, however in master,
since we have the inputOps parameter as was added by d96d1d515, we can
pass that down instead.

At least one caller uses a fixed slot that's always TTSOpsVirtual, so
passing down inputOps for these cases allows ExecBuildGroupingEqual() to
skip adding the EEOP_INNER_FETCHSOME ExprEvalStep.

This should increase the performance of hashed subplans very slightly.

Author: Tom Lane, David Rowley
Discussion: https://postgr.es/m/2543667.1734483723@sss.pgh.pa.us
2024-12-19 13:57:21 +13:00
David Rowley
8f4ee96269 Fix Assert failure in WITH RECURSIVE UNION queries
If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value.  The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.

This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.

There doesn't seem to be any harm done here other than the Assert
failure.  Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.

The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter.  This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type.  This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.

Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Backpatch-through: 13, all supported versions
2024-12-19 13:11:39 +13:00
Melanie Plageman
1a0da347a7 Bitmap Table Scans use unified TBMIterator
With the repurposing of TBMIterator as an interface for both parallel
and serial iteration through TIDBitmaps in commit 7f9d4187e7,
bitmap table scans may now use it.

Modify bitmap table scan code to use the TBMIterator. This requires
moving around a bit of code, so a few variables are initialized
elsewhere.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2024-12-18 18:43:39 -05:00
Melanie Plageman
7f9d4187e7 Add common interface for TBMIterators
Add and use TBMPrivateIterator, which replaces the current TBMIterator
for serial use cases, and repurpose TBMIterator to be a unified
interface for both the serial ("private") and parallel ("shared") TID
Bitmap iterator interfaces. This encapsulation simplifies call sites for
callers supporting both parallel and serial TID Bitmap access.
TBMIterator is not yet used in this commit.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
2024-12-18 18:19:28 -05:00
David Rowley
d96d1d5152 Fix incorrect slot type in BuildTupleHashTableExt
0f5738202 adjusted the execGrouping.c code so it made use of ExprStates to
generate hash values.  That commit made a wrong assumption that the slot
type to pass to ExecBuildHash32FromAttrs() is always &TTSOpsMinimalTuple.
That's not the case as the slot type depends on the slot type passed to
LookupTupleHashEntry(), which for nodeRecursiveunion.c, could be any of
the current slot types.

Here we fix this by adding a new parameter to BuildTupleHashTableExt()
to allow the slot type to be passed in.  In the case of nodeSubplan.c
and nodeAgg.c the slot type is always &TTSOpsVirtual, so for both of
those cases, it's beneficial to pass the known slot type as that allows
ExecBuildHash32FromAttrs() to skip adding the tuple deform step to the
resulting ExprState.  Another possible fix would have been to have
ExecBuildHash32FromAttrs() set "fetch.kind" to NULL so that
ExecComputeSlotInfo() always determines the EEOP_INNER_FETCHSOME is
required, however, that option isn't favorable as slows down aggregation
and hashed subplan evaluation due to the extra (needless) deform step.

Thanks to Nathan Bossart for bisecting to find the offending commit
based on Paul's report.

Reported-by: Paul Ramsey <pramsey@cleverelephant.ca>
Discussion: https://postgr.es/m/99F064C1-B3EB-4BE7-97D2-D2A0AA487A71@cleverelephant.ca
2024-12-18 12:05:55 +13:00
David Rowley
0f5738202b Use ExprStates for hashing in GROUP BY and SubPlans
This speeds up obtaining hash values for GROUP BY and hashed SubPlans by
using the ExprState support for hashing, thus allowing JIT compilation for
obtaining hash values for these operations.

This, even without JIT compilation, has been shown to improve Hash
Aggregate performance in some cases by around 15% and hashed NOT IN
queries in one case by over 30%, however, real-world cases are likely to
see smaller gains as the test cases used were purposefully designed to
have high hashing overheads by keeping the hash table small to prevent
additional memory overheads that would be a factor when working with large
hash tables.

In passing, fix a hypothetical bug in ExecBuildHash32Expr() so that the
initial value is stored directly in the ExprState's result field if
there are no expressions to hash.  None of the current users of this
function use an initial value, so the bug is only hypothetical.

Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpYSO3kc9UryMevWqthTBrxgfd9djiAjKHMPUSQeX9vdQ@mail.gmail.com
2024-12-11 13:47:16 +13:00
David Rowley
50416cc484 Speedup Hash Joins with dedicated functions for ExprState hashing
Hashing of a single Var is a very common operation for ExprState to
perform.  Here we add dedicated ExecJust* functions which helps speed up
Hash Joins by removing the interpretation overhead in ExecInterpExpr().

This change currently only affects Hash Joins on a single column.  Hash
Joins with multiple join keys or an expression still run through
ExecInterpExpr().

Some testing has shown up to 10% query performance increases on recent AMD
hardware and nearly 7% increase on an Apple M2 for a query performing a
hash join with a large number of lookups on a small hash table.

This change was extracted from a larger patch which adjusts GROUP BY /
hashed subplans / hashed set operations to use ExprState hashing.

Discussion: https://postgr.es/m/CAApHDvr8Zc0ZgzVoCZLdHGOFNhiJeQ6vrUcS9V7N23zMWQb-eA@mail.gmail.com
2024-12-11 11:32:15 +13:00
Tom Lane
9828905303 Doc: add some commentary about ExecutorRun's NoMovement special case.
Robert Haas expressed concern about whether commit 3eea7a0c9 exposed
the parallel-execution machinery to a case it isn't tested for, namely
a second non-parallel execution of a plan after a parallel execution.
Investigation shows that that can't happen because of pquery.c's
manipulation of the scan direction, but it sure wasn't obvious to
start with.  Add some commentary about that.

Discussion: https://postgr.es/m/CA+TgmoagyKQy=HFw+wLo0AKTYWHui+iKswZ8Jnqqd-cFby-WVg@mail.gmail.com
2024-12-10 17:17:28 -05:00
Peter Eisentraut
74edabce7a Support for GiST in get_equal_strategy_number()
A WITHOUT OVERLAPS primary key or unique constraint is accepted as a
REPLICA IDENTITY, since it guarantees uniqueness.  But subscribers
applying logical decoding messages would fail because there was not
support for looking up the equals operator for a gist index.  This
fixes that: For GiST indexes we can use the stratnum GiST support
function.

Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-12-10 13:26:09 +01:00
Peter Eisentraut
a2a475b011 Replace get_equal_strategy_number_for_am() by get_equal_strategy_number()
get_equal_strategy_number_for_am() gets the equal strategy number for
an AM.  This currently only supports btree and hash.  In the more
general case, this also depends on the operator class (see for example
GistTranslateStratnum()).  To support that, replace this function with
get_equal_strategy_number() that takes an opclass and derives it from
there.  (This function already existed before as a static function, so
the signature is kept for simplicity.)

This patch is only a refactoring, it doesn't add support for other
index AMs such as gist.  This will be done separately.

Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-12-10 12:53:27 +01:00
Peter Eisentraut
321c287351 Improve internal logical replication error for missing equality strategy
This "shouldn't happen", except right now it can with a temporal gist
index (to be fixed soon), because of missing gist support in
get_equal_strategy_number().  But right now, the error is not caught
right away, but instead you get the subsequent error about a "missing
operator 0".  This makes the error more accurate.

Author: Paul Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-12-10 12:30:42 +01:00
Tom Lane
3eea7a0c97 Simplify executor's determination of whether to use parallelism.
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution.  The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun.  This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented.  That led to assertion failures in some corner
cases.  The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan.  (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)

This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.

Per report from Yugo Nagata, who also reviewed and tested
this patch.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
2024-12-09 14:38:19 -05:00
Richard Guo
5668a857de Fix right-semi-joins in HashJoin rescans
When resetting a HashJoin node for rescans, if it is a single-batch
join and there are no parameter changes for the inner subnode, we can
just reuse the existing hash table without rebuilding it.  However,
for join types that depend on the inner-tuple match flags in the hash
table, we need to reset these match flags to avoid incorrect results.
This applies to right, right-anti, right-semi, and full joins.

When I introduced "Right Semi Join" plan shapes in aa86129e1, I failed
to reset the match flags in the hash table for right-semi joins in
rescans.  This oversight has been shown to produce incorrect results.
This patch fixes it.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-nQF9io2WL2SkD0eXvfPdyBc9Q=hRwfQHCGV2usa0jyA@mail.gmail.com
2024-12-09 20:36:23 +09:00
Amit Kapila
2d0152d614 Improve the error message introduced in commit 87ce27de69.
The error detail message "Replica identity consists of an unpublished
generated column." implies that the entire replica identity is made up of
an unpublished generated column which may not be the case.

Reported-by: Peter Smith
Author: Shlok Kyal
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/CAHut+PuwMhKx0PhOA4APhJTLoBGNykbeCQpr_CuwGT-SkswG5w@mail.gmail.com
2024-12-09 09:11:45 +05:30
David Rowley
1fe5a347e3 Fix possible crash during WindowAgg evaluation
When short-circuiting WindowAgg node evaluation on the top-level
WindowAgg node using quals on monotonic window functions, because the
WindowAgg run condition can mean there's no need to evaluate subsequent
window function results in the same partition once the run condition
becomes false, it was possible that the executor would use stale results
from the previous invocation of the window function in some cases.

A fix for this was partially done by a5832722, but that commit only
fixed the issue for non-top-level WindowAgg nodes.  I mistakenly thought
that the top-level WindowAgg didn't have this issue, but Jayesh's example
case clearly shows that's incorrect.  At the time, I also thought that
this only affected 32-bit systems as all window functions which then
supported run conditions returned BIGINT, however, that's wrong as
ExecProject is still called and that could cause evaluation of any other
window function belonging to the same WindowAgg node, one of which may
return a byref type.

The only queries affected by this are WindowAggs with a "Run Condition"
which contains at least one window function with a byref result type,
such as lead() or lag() on a byref column.  The window clause must also
contain a PARTITION BY clause (without a PARTITION BY, execution of the
WindowAgg stops immediately when the run condition becomes false and
there's no risk of using the stale results).

Reported-by: Jayesh Dehankar
Discussion: https://postgr.es/m/193261e2c4d.3dd3cd7c1842.871636075166132237@zohocorp.com
Backpatch-through: 15, where WindowAgg run conditions were added
2024-12-09 14:23:21 +13:00
Amit Kapila
87ce27de69 Ensure stored generated columns must be published when required.
Ensure stored generated columns that are part of REPLICA IDENTITY must be
published explicitly for UPDATE and DELETE operations to be published. We
can publish generated columns by listing them in the column list or by
enabling the publish_generated_columns option.

This commit changes the behavior of the test added in commit adedf54e65 by
giving an ERROR for the UPDATE operation in such cases. There is no way to
trigger the bug reported in commit adedf54e65 but we didn't remove the
corresponding code change because it is still relevant when replicating
changes from a publisher with version less than 18.

We decided not to backpatch this behavior change to avoid the risk of
breaking existing output plugins that may be sending generated columns by
default although we are not aware of any such plugin. Also, we didn't see
any reports related to this on STABLE branches which is another reason not
to backpatch this change.

Author: Shlok Kyal, Hou Zhijie
Reviewed-by: Vignesh C, Amit Kapila
Discussion: https://postgr.es/m/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA@mail.gmail.com
2024-12-04 09:45:18 +05:30
David Rowley
4171c44c9b Revert "Introduce CompactAttribute array in TupleDesc"
This reverts commit d28dff3f6c.

Quite a large number of buildfarm members didn't like this commit and
it's not yet clear why.  Reverting this before too many animals turn
red.

Discussion: https://postgr.es/m/CAApHDvr9i6T5=iAwQCxFDgMsthr_obVxgwBaEJkC8KUH6yM3Hw@mail.gmail.com
2024-12-03 17:12:38 +13:00
David Rowley
d28dff3f6c Introduce CompactAttribute array in TupleDesc
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.  With this
change, NAMEDATALEN could be increased with a much smaller negative impact
on performance.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
Reviewed-by: Andres Freund, Victor Yegorov
2024-12-03 16:50:59 +13:00
Peter Eisentraut
7f798aca1d Remove useless casts to (void *)
Many of them just seem to have been copied around for no real reason.
Their presence causes (small) risks of hiding actual type mismatches
or silently discarding qualifiers

Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
2024-11-28 08:27:20 +01:00
Tom Lane
5b8728cd7f Fix NULLIF()'s handling of read-write expanded objects.
If passed a read-write expanded object pointer, the EEOP_NULLIF
code would hand that same pointer to the equality function
and then (unless equality was reported) also return the same
pointer as its value.  This is no good, because a function that
receives a read-write expanded object pointer is fully entitled
to scribble on or even delete the object, thus corrupting the
NULLIF output.  (This problem is likely unobservable with the
equality functions provided in core Postgres, but it's easy to
demonstrate with one coded in plpgsql.)

To fix, make sure the pointer passed to the equality function
is read-only.  We can still return the original read-write
pointer as the NULLIF result, allowing optimization of later
operations.

Per bug #18722 from Alexander Lakhin.  This has been wrong
since we invented expanded objects, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org
2024-11-25 18:09:09 -05:00
Nathan Bossart
cd7ab57532 Ensure cached plans are correctly marked as dependent on role.
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it.  This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12
2024-11-11 09:00:00 -06:00
Michael Paquier
e7a9496de9 Add two attributes to pg_stat_database for parallel workers activity
Two attributes are added to pg_stat_database:
* parallel_workers_to_launch, counting the total number of parallel
workers that were planned to be launched.
* parallel_workers_launched, counting the total number of parallel
workers actually launched.

The ratio of both fields can provide hints that there are not enough
slots available when launching parallel workers, also useful when
pg_stat_statements is not deployed on an instance (i.e. cf54a2c002).

This commit relies on de3a2ea3b2, that has added two fields to EState,
that get incremented when executing Gather or GatherMerge nodes.

A test is added in select_parallel, where parallel workers are spawned.

Bump catalog version.

Author: Benoit Lobréau
Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
2024-11-11 10:40:48 +09:00
Tom Lane
b8df690492 Improve fix for not entering parallel mode when holding interrupts.
Commit ac04aa84a put the shutoff for this into the planner, which is
not ideal because it doesn't prevent us from re-using a previously
made parallel plan.  Revert the planner change and instead put the
shutoff into InitializeParallelDSM, modeling it on the existing code
there for recovering from failure to allocate a DSM segment.

However, that code path is mostly untested, and testing a bit harder
showed there's at least one bug: ExecHashJoinReInitializeDSM is not
prepared for us to have skipped doing parallel DSM setup.  I also
thought the Assert in ReinitializeParallelWorkers is pretty
ill-advised, and replaced it with a silent Min() operation.

The existing test case added by ac04aa84a serves fine to test this
version of the fix, so no change needed there.

Patch by me, but thanks to Noah Misch for the core idea that we
could shut off worker creation when !INTERRUPTS_CAN_BE_PROCESSED.
Back-patch to v12, as ac04aa84a was.

Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
2024-11-08 13:42:10 -05:00
David Rowley
87f81a5563 Fix hypothetical bug in ExprState building for hashing
adf97c156 gave ExprStates the ability to hash expressions and return a
single hash value.  That commit supports seeding the hash value with an
initial value to have that blended into the final hash value.

Here we fix a hypothetical bug where if there are zero expressions to
hash, the initial value is stored in the wrong location.  The existing
code stored the initial value in an intermediate location expecting that
when the expressions were hashed that those steps would store the final
hash value in the ExprState.resvalue field.  However, that wouldn't happen
when there are zero expressions to hash.  The correct thing to do instead
is to have a special case for zero expressions and when we hit that case,
store the initial value directly in the ExprState.resvalue.  The reason
that this is a hypothetical bug is that no code currently calls
ExecBuildHash32Expr passing a non-zero initial value.

Discussion: https://postgr.es/m/CAApHDvpMAL_zxbMRr1LOex3O7Y7R7ZN2i8iUFLQhqQiJMAg3qw@mail.gmail.com
2024-11-06 09:16:00 +13:00
Daniel Gustafsson
fb7e27abfb Remove duplicate words in comments
A few comments contained duplicate "the" in sentences, fix by removing
one occurrence.

Author: Vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/CALDaNm2aEEiPwGJmPdzBxROVvs8n75yCjKz4K1f1B2TdWpzxTA@mail.gmail.com
2024-10-31 11:38:03 +01:00
David Rowley
3974bc3196 Remove unused field from SubPlanState struct
bf6c614a2 did some conversion work to use ExprState instead of manually
calling equality functions to check if one set of values is not distinct
from another set.  That patch removed many of the fields that became
redundant as a result of that change, but it forgot to remove
SubPlanState.tab_eq_funcs.  Fix that.

In passing, fix the header comment for TupleHashEntryData to correctly
spell the field name it's talking about.

Author: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CA+FpmFeycdombFzrjZw7Rmc29CVm4OOzCWwu=dVBQ6q=PX8SvQ@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvrWR2jYVhec=COyF2g2BE_ns91NDsCHAMFiXbyhEujKdQ@mail.gmail.com
2024-10-31 13:44:15 +13:00
Peter Eisentraut
e18512c000 Remove unused #include's from backend .c files
as determined by IWYU

These are mostly issues that are new since commit dbbca2cf29.

Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
2024-10-27 08:26:50 +01:00
Melanie Plageman
de380a62b5 Make table_scan_bitmap_next_block() async-friendly
Move all responsibility for indicating a block is exhuasted into
table_scan_bitmap_next_tuple() and advance the main iterator in
heap-specific code. This flow control makes more sense and is a step
toward using the read stream API for bitmap heap scans.

Previously, table_scan_bitmap_next_block() returned false to indicate
table_scan_bitmap_next_tuple() should not be called for the tuples on
the page. This happened both when 1) there were no visible tuples on the
page and 2) when the block returned by the iterator was past the end of
the table. BitmapHeapNext() (generic bitmap table scan code) handled the
case when the bitmap was exhausted.

It makes more sense for table_scan_bitmap_next_tuple() to return false
when there are no visible tuples on the page and
table_scan_bitmap_next_block() to return false when the bitmap is
exhausted or there are no more blocks in the table.

As part of this new design, TBMIterateResults are no longer used as a
flow control mechanism in BitmapHeapNext(), so we removed
table_scan_bitmap_next_tuple's TBMIterateResult parameter.

Note that the prefetch iterator is still saved in the
BitmapHeapScanState node and advanced in generic bitmap table scan code.
This is because 1) it was not necessary to change the prefetch iterator
location to change the flow control in BitmapHeapNext() 2) modifying
prefetch iterator management requires several more steps better split
over multiple commits and 3) the prefetch iterator will be removed once
the read stream API is used.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas, Mark Dilger
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
2024-10-25 10:11:58 -04:00
Melanie Plageman
7bd7aa4d30 Move EXPLAIN counter increment to heapam_scan_bitmap_next_block
Increment the lossy and exact page counters for EXPLAIN of bitmap heap
scans in heapam_scan_bitmap_next_block(). Note that other table AMs will
need to do this as well

Pushing the counters into heapam_scan_bitmap_next_block() is required to
be able to use the read stream API for bitmap heap scans. The bitmap
iterator must be advanced from inside the read stream callback, so
TBMIterateResults cannot be used as a flow control mechanism in
BitmapHeapNext().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
2024-10-25 10:11:46 -04:00
Amit Langote
55e6d712af Remove unnecessary word in a comment
Relations opened by the executor are only closed once in
ExecCloseRangeTableRelations(), so the word "again" in the comment
for ExecGetRangeTableRelation() is misleading and unnecessary.

Discussion: https://postgr.es/m/CA+HiwqHnw-zR+u060i3jp4ky5UR0CjByRFQz50oZ05de7wUg=Q@mail.gmail.com
Backpatch-through: 12
2024-10-23 17:54:48 +09:00
Amit Langote
11c87216d1 SQL/JSON: Fix some oversights in commit b6e1157e7
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect.  While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance,  when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.

Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit.  Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing().  This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now  process raw_expr directly in eval_const_expressions_mutator().

Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.

Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
2024-10-20 12:20:55 +09:00
Thomas Munro
98c7c7152d Fix extreme skew detection in Parallel Hash Join.
After repartitioning the inner side of a hash join that would have
exceeded the allowed size, we check if all the tuples from a parent
partition moved to one child partition.  That is evidence that it
contains duplicate keys and later attempts to repartition will also
fail, so we should give up trying to limit memory (for lack of a better
fallback strategy).

A thinko prevented the check from working correctly in partition 0 (the
one that is partially loaded into memory already).  After
repartitioning, we should check for extreme skew if the *parent*
partition's space_exhausted flag was set, not the child partition's.
The consequence was repeated futile repartitioning until per-partition
data exceeded various limits including "ERROR: invalid DSA memory alloc
request size 1811939328", OS allocation failure, or temporary disk space
errors.  (We could also do something about some of those symptoms, but
that's material for separate patches.)

This problem only became likely when PostgreSQL 16 introduced support
for Parallel Hash Right/Full Join, allowing NULL keys into the hash
table.  Repartitioning always leaves NULL in partition 0, no matter how
many times you do it, because the hash value is all zero bits.  That's
unlikely for other hashed values, but they might still have caused
wasted extra effort before giving up.

Back-patch to all supported releases.

Reported-by: Craig Milhiser <craig@milhiser.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/CA%2BwnhO1OfgXbmXgC4fv_uu%3DOxcDQuHvfoQ4k0DFeB0Qqd-X-rQ%40mail.gmail.com
2024-10-17 22:11:59 +13:00
David Rowley
9ca67658d1 Don't store intermediate hash values in ExprState->resvalue
adf97c156 made it so ExprStates could support hashing and changed Hash
Join to use that instead of manually extracting Datums from tuples and
hashing them one column at a time.

When hashing multiple columns or expressions, the code added in that
commit stored the intermediate hash value in the ExprState's resvalue
field.  That was a mistake as steps may be injected into the ExprState
between each hashing step that look at or overwrite the stored
intermediate hash value.  EEOP_PARAM_SET is an example of such a step.

Here we fix this by adding a new dedicated field for storing
intermediate hash values and adjust the code so that all apart from the
final hashing step store their result in the intermediate field.

In passing, rename a variable so that it's more aligned to the
surrounding code and also so a few lines stay within the 80 char margin.

Reported-by: Andres Freund
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com
2024-10-17 14:25:08 +13:00
Tom Lane
c96de42c4b Further refine _SPI_execute_plan's rule for atomic execution.
Commit 2dc1deaea turns out to have been still a brick shy of a load,
because CALL statements executing within a plpgsql exception block
could still pass the wrong snapshot to stable functions within the
CALL's argument list.  That happened because standard_ProcessUtility
forces isAtomicContext to true if IsTransactionBlock is true, which
it always will be inside a subtransaction.  Then ExecuteCallStmt
would think it does not need to push a new snapshot --- but
_SPI_execute_plan didn't do so either, since it thought it was in
nonatomic mode.

The best fix for this seems to be for _SPI_execute_plan to operate
in atomic execution mode if IsSubTransaction() is true, even when the
SPI context as a whole is non-atomic.  This makes _SPI_execute_plan
have the same rules about when non-atomic execution is allowed as
_SPI_commit/_SPI_rollback have about when COMMIT/ROLLBACK are allowed,
which seems appropriately symmetric.  (If anyone ever tries to allow
COMMIT/ROLLBACK inside a subtransaction, this would all need to be
rethought ... but I'm unconvinced that such a thing could be logically
consistent at all.)

For further consistency, also check IsSubTransaction() in
SPI_inside_nonatomic_context.  That does not matter for its
one present-day caller StartTransaction, which can't be reached
inside a subtransaction.  But if any other callers ever arise,
they'd presumably want this definition.

Per bug #18656 from Alexander Alehin.  Back-patch to all
supported branches, like previous fixes in this area.

Discussion: https://postgr.es/m/18656-cade1780866ef66c@postgresql.org
2024-10-16 17:36:40 -04:00
Peter Eisentraut
c594f1ad2b Track scan reversals in MergeJoin
The MergeJoin struct was tracking "mergeStrategies", which were an
array of btree strategy numbers, purely for the purpose of comparing
it later against btree strategies to determine if the scan direction
was forward or reverse.  Change that.  Instead, track
"mergeReversals", an array of bool, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2024-10-14 15:36:18 +02:00
Michael Paquier
de3a2ea3b2 Introduce two fields in EState to track parallel worker activity
These fields can be set by executor nodes to record how many parallel
workers were planned to be launched and how many of them have been
actually launched within the number initially planned.  This data is
able to give an approximation of the parallel worker draught a system
is facing, making easier the tuning of related configuration parameters.

These fields will be used by some follow-up patches to populate other
parts of the system with their data.

Author: Guillaume Lelarge, Benoit Lobréau
Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com
2024-10-09 08:07:48 +09:00
Michael Paquier
4572d59e3c Improve style of two code paths
In execGrouping.c, execTuplesMatchPrepare() was doing a memory
allocation that was not necessary when the number of columns was 0.
In foreign.c, pg_options_to_table() was assigning twice a variable to
the same value.

Author: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAqup0agbSzMjSLSTn=OANyCzxENF1+HrSYnr3WyZib7=Q@mail.gmail.com
2024-10-08 10:51:20 +09:00
Michael Paquier
4dd3087300 Remove assertion checking query ID in execMain.c
This assertion has been added by 24f5205948, but Alexander Lakhin has
proved that the ExecutorRun() one can be broken by using a PL function
that manipulates compute_query_id and track_activities, while the ones
in ExecutorFinish() and ExecutorEnd() could be triggered when cleaning
up portals at the beginning of a new query execution.

Discussion: https://postgr.es/m/b37d8e6c-e83d-e157-8865-1b2460a6aef2@gmail.com
2024-10-04 12:51:17 +09:00
Michael Paquier
5deb56387e Expand assertion check for query ID reporting in executor
As formulated, the assertion added in the executor by 24f5205948 to
check that a query ID is set had two problems:
- track_activities may be disabled while compute_query_id is enabled,
causing the query ID to not be reported to pg_stat_activity.
- debug_query_string may not be set in some context.  The only path
where this would matter is visibly autovacuum, should parallel workers
be enabled there at some point.  This is not the case currently.

There was no test showing the interactions between the query ID and
track_activities, so let's add one based on a scan of pg_stat_activity.
This assertion is still an experimentation at this stage, but let's see
if this shows more paths where query IDs are not properly set while they
should.

Discussion: https://postgr.es/m/Zvn5616oYXmpXyHI@paquier.xyz
2024-10-01 08:56:21 +09:00
Noah Misch
aac2c9b4fd For inplace update durability, make heap_update() callers wait.
The previous commit fixed some ways of losing an inplace update.  It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update().  This includes MERGE commands.  To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions).  In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update.  The
v14+ UPDATE fix needs commit 86dc90056d,
and it wasn't worth reimplementing that fix without such infrastructure.

Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.

Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
2024-09-24 15:25:18 -07:00
David Rowley
5d56d07ca3 Optimize tuplestore usage for WITH RECURSIVE CTEs
nodeRecursiveunion.c makes use of two tuplestores and, until now, would
delete and recreate one of these tuplestores after every recursive
iteration.

Here we adjust that behavior and instead reuse one of the existing
tuplestores and just empty it of all tuples using tuplestore_clear().

This saves some free/malloc roundtrips and has shown a 25-30% performance
improvement for queries that perform very little work between recursive
iterations.

This also paves the way to add some EXPLAIN ANALYZE telemetry output for
recursive common table expressions, similar to what was done in 1eff8279d
and 95d6e9af0.  Previously calling tuplestore_end() would have caused
the maximum storage space used to be lost.

Reviewed-by: Tatsuo Ishii
Discussion: https://postgr.es/m/CAApHDvr9yW0YRiK8A2J7nvyT8g17YzbSfOviEWrghazKZbHbig@mail.gmail.com
2024-09-19 15:20:35 +12:00
Michael Paquier
24f5205948 Add some sanity checks in executor for query ID reporting
This commit adds three sanity checks in code paths of the executor where
it is possible to use hooks, checking that a query ID is reported in
pg_stat_activity if compute_query_id is enabled:
- ExecutorRun()
- ExecutorFinish()
- ExecutorEnd()

This causes the test in pg_stat_statements added in 933848d16d to
complain immediately in ExecutorRun().  The idea behind this commit is
to help extensions to detect if they are missing query ID reports when a
query goes through the executor.  Perhaps this will prove to be a bad
idea, but let's see where this experience goes in v18 and newer
versions.

Reviewed-by: Sami Imseih
Discussion: https://postgr.es/m/ZuJb5xCKHH0A9tMN@paquier.xyz
2024-09-18 14:43:37 +09:00
Michael Paquier
933848d16d Add missing query ID reporting in extended query protocol
This commit adds query ID reports for two code paths when processing
extended query protocol messages:
- When receiving a bind message, setting it to the first Query retrieved
from a cached cache.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.

An advantage of this method is that this is able to cover all the types
of portals handled in the extended query protocol, particularly these
two when the report done in ExecutorStart() is not enough (neither is an
addition in ExecutorRun(), actually, for the second point):
- Multiple execute messages, with multiple ExecutorRun().
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going though ExecutorStart() and ExecuteRun(), followed by one
or more execute messages doing only fetches from the tuplestore created
in the first message.  This corresponds to the case where
execute_is_fetch is set, for example.

Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it.  Query ID reporting is optimistic
and more calls to pgstat_report_query_id() don't matter as the first
report takes priority except if the report is forced.  The comment in
ExecutorStart() is adjusted to reflect better the reality with the
extended query protocol.

The test added in pg_stat_statements is a courtesy of Robert Haas.  This
uses psql's \bind metacommand, hence this part is backpatched down to
v16.

Reported-by:  Kaido Vaikla, Erik Wienhold
Author: Sami Imseih
Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org
Backpatch-through: 14
2024-09-18 09:59:09 +09:00
Peter Eisentraut
fc0438b4e8 Add temporal PRIMARY KEY and UNIQUE constraints
Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints.
These are backed by GiST indexes instead of B-tree indexes, since they
are essentially exclusion constraints with = for the scalar parts of
the key and && for the temporal part.

(previously committed as 46a0cd4cef, reverted by 46a0cd4cefb; the new
part is this:)

Because 'empty' && 'empty' is false, the temporal PK/UQ constraint
allowed duplicates, which is confusing to users and breaks internal
expectations.  For instance, when GROUP BY checks functional
dependencies on the PK, it allows selecting other columns from the
table, but in the presence of duplicate keys you could get the value
from any of their rows.  So we need to forbid empties.

This all means that at the moment we can only support ranges and
multiranges for temporal PK/UQs, unlike the original patch (above).
Documentation and tests for this are added.  But this could
conceivably be extended by introducing some more general support for
the notion of "empty" for other types.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-09-17 11:29:30 +02:00
Amit Langote
dd8bea88ab SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps
When the ON ERROR / ON EMPTY behavior is to return NULL, returning
NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's
no need to create separate steps to check the error/empty flag or
those to evaluate the the constant NULL expression.  This speeds up
common cases because the default ON ERROR / ON EMPTY behavior for
JSON_QUERY() and JSON_VALUE() is to return NULL.  However, these steps
are necessary if the RETURNING type is a domain, as constraints on the
domain may need to be checked.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
2024-09-09 13:46:58 +09:00
Amit Langote
4d7e24e0f4 Revert recent SQL/JSON related commits
Reverts 68222851d5, 565caaa79a, and 3a97460970, because a few
BF animals didn't like one or all of them.
2024-09-06 12:53:01 +09:00
Amit Langote
3a97460970 SQL/JSON: Avoid initializing unnecessary ON ERROR / ON EMPTY steps
When the ON ERROR / ON EMPTY behavior is to return NULL, returning
NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's
no need to create separate steps to check the error/empty flag or
those to evaluate the the constant NULL expression.  This speeds up
common cases because the default ON ERROR / ON EMPTY behavior for
JSON_QUERY() and JSON_VALUE() is to return NULL.  However, these steps
are necessary if the RETURNING type is a domain, as constraints on the
domain may need to be checked.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17
2024-09-06 12:05:40 +09:00
Peter Eisentraut
4af123ad45 Fix misleading error message context
Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Stepan Neretin <sncfmgg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAw+OkVW=FgMKHKyvY3CgtWy3cWdY7XT+S5TJaTttu=oA@mail.gmail.com
2024-09-05 15:19:00 +02:00