mirror of https://github.com/postgres/postgres.git synced 2025-11-25 12:03:53 +03:00

Disable parallel plans for RIGHT_SEMI joins

RIGHT_SEMI joins rely on the HEAP_TUPLE_HAS_MATCH flag to guarantee
that only the first match for each inner tuple is considered.
However, in a parallel hash join, the inner relation is stored in a
shared global hash table that can be probed by multiple workers
concurrently.  This allows different workers to inspect and set the
match flags of the same inner tuples at the same time.

If two workers probe the same inner tuple concurrently, both may see
the match flag as unset and emit the same tuple, leading to duplicate
output rows and violating RIGHT_SEMI join semantics.

For now, we disable parallel plans for RIGHT_SEMI joins.  In the long
term, it may be possible to support parallel execution by performing
atomic operations on the match flag, for example using a CAS or
similar mechanism.
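
As a rough illustration of the CAS idea only (this is not PostgreSQL code
and not part of this patch), the sketch below uses C11 atomics on a
stand-in struct.  The names SketchTuple, SKETCH_HAS_MATCH and
sketch_claim_match are hypothetical, and the sketch assumes the match
flag lives in a 16-bit word that every worker can access atomically.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; stands in for the real match flag. */
#define SKETCH_HAS_MATCH 0x0001u

/* Hypothetical stand-in for a tuple header in the shared hash table. */
typedef struct SketchTuple
{
	_Atomic uint16_t flags;
} SketchTuple;

/*
 * Try to claim the first match for an inner tuple.
 *
 * Returns true for exactly one caller per tuple: the one whose
 * compare-and-swap flips the bit from unset to set.  Every other
 * caller, concurrent or later, sees the bit already set and gets
 * false, so it must not emit the tuple.
 */
static bool
sketch_claim_match(SketchTuple *tuple)
{
	uint16_t	old = atomic_load(&tuple->flags);

	while ((old & SKETCH_HAS_MATCH) == 0)
	{
		if (atomic_compare_exchange_weak(&tuple->flags, &old,
										 (uint16_t) (old | SKETCH_HAS_MATCH)))
			return true;	/* we set the flag; emit the tuple */
		/* CAS failed and refreshed 'old'; the loop re-checks the bit */
	}

	return false;			/* someone else already claimed the match */
}
```

A worker would emit the inner tuple only when sketch_claim_match()
returns true.  The plain check-then-set sequence described above is
what lets two workers both observe the flag as unset.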

Backpatch to v18, where RIGHT_SEMI join was introduced.

Bug: #19094
Reported-by: Lori Corbani <Lori.Corbani@jax.org>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19094-6ed410eb5b256abd@postgresql.org
Backpatch-through: 18
Richard Guo
2025-10-30 11:58:45 +09:00
parent 50eb4e1181
commit 257ee78341
3 changed files with 62 additions and 6 deletions

@@ -3080,6 +3080,33 @@ select * from tbl_rs t1 join
3 | 3 | 4 | 4
(6 rows)
--
-- regression test for bug with parallel-hash-right-semi join
--
begin;
-- encourage use of parallel plans
set local parallel_setup_cost=0;
set local parallel_tuple_cost=0;
set local min_parallel_table_scan_size=0;
set local max_parallel_workers_per_gather=4;
-- ensure we don't get parallel hash right semi join
explain (costs off)
select * from tenk1 t1
where exists (select 1 from tenk1 t2 where fivethous = t1.fivethous)
and t1.fivethous < 5;
                    QUERY PLAN                    
--------------------------------------------------
 Gather
   Workers Planned: 4
   ->  Parallel Hash Semi Join
         Hash Cond: (t1.fivethous = t2.fivethous)
         ->  Parallel Seq Scan on tenk1 t1
               Filter: (fivethous < 5)
         ->  Parallel Hash
               ->  Parallel Seq Scan on tenk1 t2
(8 rows)
rollback;
--
-- regression test for bug #13908 (hash join with skew tuples & nbatch increase)
--

@@ -759,6 +759,26 @@ select * from tbl_rs t1 join
(select t1.a+t3.a from tbl_rs t3) and t2.a < 5)
on true;
--
-- regression test for bug with parallel-hash-right-semi join
--
begin;
-- encourage use of parallel plans
set local parallel_setup_cost=0;
set local parallel_tuple_cost=0;
set local min_parallel_table_scan_size=0;
set local max_parallel_workers_per_gather=4;
-- ensure we don't get parallel hash right semi join
explain (costs off)
select * from tenk1 t1
where exists (select 1 from tenk1 t2 where fivethous = t1.fivethous)
and t1.fivethous < 5;
rollback;
--
-- regression test for bug #13908 (hash join with skew tuples & nbatch increase)
--