1
0
mirror of https://github.com/postgres/postgres.git synced 2025-12-19 17:02:53 +03:00

Make row compares robust during nbtree array scans.

Recent nbtree bugfix commit 5f4d98d4 added a special case to the code
that sets up a page-level prefix of keys that are definitely satisfied
by every tuple on the page: whenever _bt_set_startikey reached a row
compare key, we'd refuse to apply the pstate.forcenonrequired behavior
in scans where that usually happens (scans with a higher-order array
key).  That hack made the scan avoid essentially the same infinite
cycling behavior that also affected nbtree scans with redundant keys
(keys that preprocessing could not eliminate) prior to commit f09816a0.
There are now serious doubts about this row compare workaround.

Testing has shown that a scan with a row compare key and an array key
could still read the same leaf page twice (without the scan's direction
changing), which isn't supposed to be possible following the SAOP
enhancements added by Postgres 17 commit 5bf748b8.  Also, we still
allowed a required row compare key to be used with forcenonrequired mode
when its header key happened to be beyond the pstate.ikey set by
_bt_set_startikey, which was complicated and brittle.

The underlying problem was that row compares had inconsistent rules
around how scans start (which keys can be used for initial positioning
purposes) and how scans end (which keys can set continuescan=false).
Quals with redundant keys that could not be eliminated by preprocessing
also had that same quality to them prior to today's bugfix f09816a0.  It
now seems prudent to bring row compare keys in line with the new charter
for required keys, by making the start and end rules symmetric.

This commit fixes two points of disagreement between _bt_first and
_bt_check_rowcompare.  Firstly, _bt_check_rowcompare was capable of
ending the scan at the point where it needed to compare an ISNULL-marked
row compare member that came immediately after a required row compare
member.  _bt_first now has symmetric handling for NULL row compares.
Secondly, _bt_first had its own ideas about which keys were safe to use
for initial positioning purposes.  It could use fewer or more keys than
_bt_check_rowcompare.  _bt_first now uses the same requiredness markings
as _bt_check_rowcompare for this.

Now that _bt_first and _bt_check_rowcompare agree on how to start and
end scans, we can get rid of the forcenonrequired special case, without
any risk of infinite cycling.  This approach also makes row compare keys
behave more like regular scalar keys, particularly within _bt_first.

Fixing these inconsistencies necessitates dealing with a related issue
with the way that row compares were marked required by preprocessing: we
didn't mark any lower-order row members required following 2016 bugfix
commit a298a1e0.  That approach was over broad.  The bug in question was
actually an oversight in how _bt_check_rowcompare dealt with tuple NULL
values that failed to satisfy a scan key marked required in the opposite
scan direction (it was a bug in 2011 commits 6980f817 and 882368e8, not
a bug in 2006 commit 3a0a16cb).  Go back to marking row compare members
as required using the original 2006 rules, and fix the 2016 bug in a
more principled way: by limiting use of the "set continuescan=false with
a key required in the opposite scan direction upon encountering a NULL
tuple value" optimization to the first/most significant row member key.
While it isn't safe to use an implied IS NOT NULL qualifier to end the
scan when it comes from a required lower-order row compare member key,
it _is_ generally safe for such a required member key to end the scan --
provided the key is marked required in the _current_ scan direction.

This fixes what was arguably an oversight in either commit 5f4d98d4 or
commit 8a510275.  It is a direct follow-up to today's commit f09816a0.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wz=pcijHL_mA0_TJ5LiTB28QpQ0cGtT-ccFV=KzuunNDDQ@mail.gmail.com
Backpatch-through: 18
This commit is contained in:
Peter Geoghegan
2025-07-02 09:48:15 -04:00
parent f09816a0a7
commit bd3f59fdb7
5 changed files with 412 additions and 229 deletions

View File

@@ -195,54 +195,123 @@ ORDER BY proname DESC, proargtypes DESC, pronamespace DESC LIMIT 1;
(1 row)
--
-- Add coverage for RowCompare quals whose rhs row has a NULL that ends scan
-- Forwards scan RowCompare qual whose row arg has a NULL that affects our
-- initial positioning strategy
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) < ('abs', NULL)
WHERE (proname, proargtypes) >= ('abs', NULL) AND proname <= 'abs'
ORDER BY proname, proargtypes, pronamespace;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((ROW(proname, proargtypes) < ROW('abs'::name, NULL::oidvector)) AND (proname = 'abs'::name))
Index Cond: ((ROW(proname, proargtypes) >= ROW('abs'::name, NULL::oidvector)) AND (proname <= 'abs'::name))
(2 rows)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) < ('abs', NULL)
WHERE (proname, proargtypes) >= ('abs', NULL) AND proname <= 'abs'
ORDER BY proname, proargtypes, pronamespace;
proname | proargtypes | pronamespace
---------+-------------+--------------
(0 rows)
--
-- Add coverage for backwards scan RowCompare quals whose rhs row has a NULL
-- that ends scan
-- Forwards scan RowCompare quals whose row arg has a NULL that ends scan
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) > ('abs', NULL)
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Index Only Scan Backward using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((ROW(proname, proargtypes) > ROW('abs'::name, NULL::oidvector)) AND (proname = 'abs'::name))
WHERE proname >= 'abs' AND (proname, proargtypes) < ('abs', NULL)
ORDER BY proname, proargtypes, pronamespace;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((proname >= 'abs'::name) AND (ROW(proname, proargtypes) < ROW('abs'::name, NULL::oidvector)))
(2 rows)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) > ('abs', NULL)
WHERE proname >= 'abs' AND (proname, proargtypes) < ('abs', NULL)
ORDER BY proname, proargtypes, pronamespace;
proname | proargtypes | pronamespace
---------+-------------+--------------
(0 rows)
--
-- Backwards scan RowCompare qual whose row arg has a NULL that affects our
-- initial positioning strategy
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname >= 'abs' AND (proname, proargtypes) <= ('abs', NULL)
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Index Only Scan Backward using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((proname >= 'abs'::name) AND (ROW(proname, proargtypes) <= ROW('abs'::name, NULL::oidvector)))
(2 rows)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname >= 'abs' AND (proname, proargtypes) <= ('abs', NULL)
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
proname | proargtypes | pronamespace
---------+-------------+--------------
(0 rows)
--
-- Add coverage for recheck of > key following array advancement on previous
-- (left sibling) page that used a high key whose attribute value corresponding
-- to the > key was -inf (due to being truncated when the high key was created).
-- Backwards scan RowCompare qual whose row arg has a NULL that ends scan
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE (proname, proargtypes) > ('abs', NULL) AND proname <= 'abs'
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Only Scan Backward using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((ROW(proname, proargtypes) > ROW('abs'::name, NULL::oidvector)) AND (proname <= 'abs'::name))
(2 rows)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE (proname, proargtypes) > ('abs', NULL) AND proname <= 'abs'
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
proname | proargtypes | pronamespace
---------+-------------+--------------
(0 rows)
-- Makes B-Tree preprocessing deal with unmarking redundant keys that were
-- initially marked required (test case relies on current row compare
-- preprocessing limitations)
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'zzzzzz' AND (proname, proargtypes) > ('abs', NULL)
AND pronamespace IN (1, 2, 3) AND proargtypes IN ('26 23', '5077')
ORDER BY proname, proargtypes, pronamespace;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using pg_proc_proname_args_nsp_index on pg_proc
Index Cond: ((ROW(proname, proargtypes) > ROW('abs'::name, NULL::oidvector)) AND (proname = 'zzzzzz'::name) AND (proargtypes = ANY ('{"26 23",5077}'::oidvector[])) AND (pronamespace = ANY ('{1,2,3}'::oid[])))
(2 rows)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'zzzzzz' AND (proname, proargtypes) > ('abs', NULL)
AND pronamespace IN (1, 2, 3) AND proargtypes IN ('26 23', '5077')
ORDER BY proname, proargtypes, pronamespace;
proname | proargtypes | pronamespace
---------+-------------+--------------
(0 rows)
--
-- Performs a recheck of > key following array advancement on previous (left
-- sibling) page that used a high key whose attribute value corresponding to
-- the > key was -inf (due to being truncated when the high key was created).
--
-- XXX This relies on the assumption that tenk1_thous_tenthous has a truncated
-- high key "(183, -inf)" on the first page that we'll scan. The test will only

View File

@@ -143,38 +143,83 @@ SELECT proname, proargtypes, pronamespace
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC LIMIT 1;
--
-- Add coverage for RowCompare quals whose rhs row has a NULL that ends scan
-- Forwards scan RowCompare qual whose row arg has a NULL that affects our
-- initial positioning strategy
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) < ('abs', NULL)
WHERE (proname, proargtypes) >= ('abs', NULL) AND proname <= 'abs'
ORDER BY proname, proargtypes, pronamespace;
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) < ('abs', NULL)
WHERE (proname, proargtypes) >= ('abs', NULL) AND proname <= 'abs'
ORDER BY proname, proargtypes, pronamespace;
--
-- Add coverage for backwards scan RowCompare quals whose rhs row has a NULL
-- that ends scan
-- Forwards scan RowCompare quals whose row arg has a NULL that ends scan
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) > ('abs', NULL)
WHERE proname >= 'abs' AND (proname, proargtypes) < ('abs', NULL)
ORDER BY proname, proargtypes, pronamespace;
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname >= 'abs' AND (proname, proargtypes) < ('abs', NULL)
ORDER BY proname, proargtypes, pronamespace;
--
-- Backwards scan RowCompare qual whose row arg has a NULL that affects our
-- initial positioning strategy
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname >= 'abs' AND (proname, proargtypes) <= ('abs', NULL)
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'abs' AND (proname, proargtypes) > ('abs', NULL)
WHERE proname >= 'abs' AND (proname, proargtypes) <= ('abs', NULL)
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
--
-- Add coverage for recheck of > key following array advancement on previous
-- (left sibling) page that used a high key whose attribute value corresponding
-- to the > key was -inf (due to being truncated when the high key was created).
-- Backwards scan RowCompare qual whose row arg has a NULL that ends scan
--
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE (proname, proargtypes) > ('abs', NULL) AND proname <= 'abs'
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE (proname, proargtypes) > ('abs', NULL) AND proname <= 'abs'
ORDER BY proname DESC, proargtypes DESC, pronamespace DESC;
-- Makes B-Tree preprocessing deal with unmarking redundant keys that were
-- initially marked required (test case relies on current row compare
-- preprocessing limitations)
explain (costs off)
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'zzzzzz' AND (proname, proargtypes) > ('abs', NULL)
AND pronamespace IN (1, 2, 3) AND proargtypes IN ('26 23', '5077')
ORDER BY proname, proargtypes, pronamespace;
SELECT proname, proargtypes, pronamespace
FROM pg_proc
WHERE proname = 'zzzzzz' AND (proname, proargtypes) > ('abs', NULL)
AND pronamespace IN (1, 2, 3) AND proargtypes IN ('26 23', '5077')
ORDER BY proname, proargtypes, pronamespace;
--
-- Performs a recheck of > key following array advancement on previous (left
-- sibling) page that used a high key whose attribute value corresponding to
-- the > key was -inf (due to being truncated when the high key was created).
--
-- XXX This relies on the assumption that tenk1_thous_tenthous has a truncated
-- high key "(183, -inf)" on the first page that we'll scan. The test will only