mirror of https://github.com/postgres/postgres.git synced 2025-05-28 05:21:27 +03:00

60120 Commits

Author SHA1 Message Date
David Rowley
db448ce5ad Optimize alignment calculations in tuple form/deform
Here we convert CompactAttribute.attalign from a char, which is directly
derived from pg_attribute.attalign, into a uint8, which stores the number
of bytes to align the column's value by in the tuple.

This allows tuple deformation and tuple size calculations to move away
from using the inefficient att_align_nominal() macro, which manually
checks each TYPALIGN_* char to translate that into the alignment bytes
for the given type.  Effectively, this commit changes those to TYPEALIGN
calls, which are branchless and only perform some simple arithmetic with
some bit-twiddling.

The removed branches were often mispredicted by CPUs, especially so in
real-world tables which often contain a mishmash of different types
with different alignment requirements.
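
For illustration, the difference boils down to something like the
following sketch (the real code uses the att_align_nominal() and
TYPEALIGN() macros; cases abbreviated here):

    #include <stdint.h>

    /* Branchy translation, in the spirit of att_align_nominal(): each
     * TYPALIGN_* character must be tested before aligning. */
    static inline uintptr_t
    align_by_char(uintptr_t offset, char attalign)
    {
        switch (attalign)
        {
            case 'd': return (offset + 7) & ~(uintptr_t) 7;
            case 'i': return (offset + 3) & ~(uintptr_t) 3;
            case 's': return (offset + 1) & ~(uintptr_t) 1;
            default:  return offset;    /* 'c': no alignment needed */
        }
    }

    /* Branchless TYPEALIGN-style arithmetic, possible once the alignment
     * is stored directly as a byte count (1, 2, 4, or 8). */
    static inline uintptr_t
    align_by_bytes(uintptr_t offset, uint8_t alignby)
    {
        return (offset + alignby - 1) & ~((uintptr_t) alignby - 1);
    }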

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-21 09:43:26 +13:00
Heikki Linnakangas
1f81b48a9d Mark CatalogSnapshotData static
Like CurrentSnapshotData, it should not be accessed directly outside
snapmgr.c.
2024-12-20 19:37:50 +02:00
Heikki Linnakangas
d5a7bd5670 Fix variable reference in comment
This used to say "nsubxcnt isn't decreased when subtransactions
abort", but there's no variable called nsubxcnt. Commit 8548ddc61b
changed it to "subxcnt", among other typo fixes, but that was wrong
too: the comment actually talks about txn->nsubtxns. That's the field
that's incremented but never decremented and is used for the
allocation earlier in the function.
2024-12-20 19:36:33 +02:00
Melanie Plageman
94bb6c4410 Fix overflow danger in SampleHeapTupleVisible(), take 2
28328ec87b45725 addressed one overflow danger in
SampleHeapTupleVisible() but introduced another, albeit a less likely
one. Modify the binary search code to remove this danger.
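
The usual remedy, sketched generically (not the committed code):

    #include <stdint.h>

    /* "(low + high) / 2" can wrap around when both indexes are large;
     * computing the midpoint as an offset from "low" cannot. */
    static inline uint32_t
    binsearch_mid(uint32_t low, uint32_t high)
    {
        return low + (high - low) / 2;
    }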

Reported-by: Richard Guo
Reviewed-by: Richard Guo, Ranier Vilela
Discussion: https://postgr.es/m/CAMbWs4_bE%2BNscChbKWzw6HZOipCUyXfA5133qvoXQ654D3B2gQ%40mail.gmail.com
2024-12-20 09:43:44 -05:00
Thomas Munro
38c579b089 Fix corruption when relation truncation fails.
RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:

1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.

Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs.  All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done.  When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it).  When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).

Changes:

 * WAL is now unconditionally flushed first
 * smgrtruncate() is now called in a critical section, preventing
   interrupts and causing PANIC on file API failure
 * smgrtruncate() has a new parameter for existing fork sizes,
   because it can't call smgrnblocks() itself inside a critical section

The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map().  That last is also brought up to date with
other evolutions of the truncation protocol.
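
The hardened ordering, sketched with the argument lists simplified (the
real smgrtruncate() signature carries the fork sizes mentioned above):

    /* 1. Make the truncation WAL durable before touching anything. */
    XLogFlush(lsn);

    /* 2+3. Drop buffers and shorten the files with interrupts held off;
     * a file API failure here becomes PANIC instead of silent corruption. */
    START_CRIT_SECTION();
    smgrtruncate(reln, forks, nforks, old_blocks, new_blocks);
    END_CRIT_SECTION();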

The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch.  The more recently invented cancellation bug was
diagnosed by Alexander Lakhin.  Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.

Back-patch to all supported releases.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reported-by: rootcause000@gmail.com
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org
2024-12-20 23:57:02 +13:00
David Rowley
02a8d0c452 Remove pg_attribute.attcacheoff column
The column is no longer needed as the offset is now cached in the
CompactAttribute struct per commit 5983a4cff.

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-20 23:22:37 +13:00
Michael Paquier
546371599e Relax regression test for fsync check of backend-level stats
One test added in 9aea73fc61d4 did not take into account that the
backend may have some fsync even after a checkpoint.  Let's relax it to
be more flexible.

Per report from buildfarm member grassquit, via Alexander Lakhin.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/6143ab0a-9e88-4790-8d9d-50ba45657761@gmail.com
2024-12-20 19:00:18 +09:00
David Rowley
5983a4cffc Introduce CompactAttribute array in TupleDesc, take 2
The new compact_attrs array stores a few select fields from
FormData_pg_attribute in a more compact way, using only 16 bytes per
column instead of the 104 bytes that FormData_pg_attribute uses.  Using
CompactAttribute allows performance-critical operations such as tuple
deformation to be performed without looking at the FormData_pg_attribute
element in TupleDesc which means fewer cacheline accesses.

For some workloads, tuple deformation can be the most CPU intensive part
of processing the query.  Some testing with 16 columns on a table
where the first column is variable length showed around a 10% increase in
transactions per second for an OLAP type query performing aggregation on
the 16th column.  However, in certain cases, the increases were much
higher, up to ~25% on one AMD Zen4 machine.

This also makes pg_attribute.attcacheoff redundant.  A follow-on commit
will remove it, thus shrinking the FormData_pg_attribute struct by 4
bytes.
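
The shape of the idea, as a hypothetical sketch (the field list here is
illustrative, not the committed definition):

    /* A handful of hot per-column fields, packed into 16 bytes, so tuple
     * deformation never has to touch the 104-byte FormData_pg_attribute. */
    typedef struct CompactAttribute
    {
        int32   attcacheoff;    /* cached offset into tuple, or -1 */
        int16   attlen;         /* fixed length, or -1/-2 for varlena etc. */
        bool    attbyval;       /* pass-by-value? */
        bool    attispackable;  /* can the varlena header be packed? */
        char    attalign;       /* alignment requirement */
        /* ... remaining flags/padding up to 16 bytes ... */
    } CompactAttribute;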

Author: David Rowley
Reviewed-by: Andres Freund, Victor Yegorov
Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
2024-12-20 22:31:26 +13:00
Melanie Plageman
8ac0021b6f Remove final mention of FREEZE_PAGE from comments
b7493e1ab35 removed leftover mentions of XLOG_HEAP2_FREEZE_PAGE records
from comments but neglected to remove one mention of FREEZE_PAGE.

Reported off-list by Alexander Lakhin
2024-12-19 18:52:19 -05:00
Tom Lane
e0a2721f7c Get rid of old version of BuildTupleHashTable().
It was reasonable to preserve the old API of BuildTupleHashTable()
in the back branches, but in HEAD we should actively discourage use
of that version.  There are no remaining callers in core, so just
get rid of it.  Then rename BuildTupleHashTableExt() back to
BuildTupleHashTable().

While at it, fix up the miserably-poorly-maintained header comment
for BuildTupleHashTable[Ext].  It looks like more than one patch in
this area has had the opinion that updating comments is beneath them.

Discussion: https://postgr.es/m/538343.1734646986@sss.pgh.pa.us
2024-12-19 18:07:00 -05:00
Tom Lane
f0b900086a Use ExecGetCommonSlotOps infrastructure in more places.
Append, MergeAppend, and RecursiveUnion can all use the support
functions added in commit 276279295.  The first two can report a
fixed result slot type if all their children return the same fixed
slot type.  That does nothing for the append step itself, but might
allow optimizations in the parent plan node.  RecursiveUnion can
optimize tuple hash table operations in the same way as SetOp now
does.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
2024-12-19 17:07:14 -05:00
Tom Lane
8d96f57d5c Improve planner's handling of SetOp plans.
Remove the code for inserting flag columns in the inputs of a SetOp.
That was the only reason why there would be resjunk columns in a
set-operations plan tree, so we can get rid of some code that
supported that, too.

Get rid of choose_hashed_setop() in favor of building Paths for
the hashed and sorted alternatives, and letting them fight it out
within add_path().

Remove set_operation_ordered_results_useful(), which was giving wrong
answers due to examining the wrong ancestor node: we need to examine
the immediate SetOperationStmt parent not the topmost node.  Instead
make each caller of recurse_set_operations() pass down the relevant
parent node.  (This thinko seems to have led only to wasted planning
cycles and possibly-inferior plans, not wrong query answers.  Perhaps
we should back-patch it, but I'm not doing so right now.)

Teach generate_nonunion_paths() to consider pre-sorted inputs for
sorted SetOps, rather than always generating a Sort node.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
2024-12-19 17:02:25 -05:00
Tom Lane
2762792952 Convert SetOp to read its inputs as outerPlan and innerPlan.
The original design for set operations involved appending the two
input relations into one and adding a flag column that allows
distinguishing which side each row came from.  Then the SetOp node
pries them apart again based on the flag.  This is bizarre.  The
only apparent reason to do it is that when sorting, we'd only need
one Sort node not two.  But since sorting is at least O(N log N),
sorting all the data is actually worse than sorting each side
separately --- plus, we have no chance of taking advantage of
presorted input.  On top of that, adding the flag column frequently
requires an additional projection step that adds cycles, and then
the Append node isn't free either.  Let's get rid of all of that
and make the SetOp node have two separate children, using the
existing outerPlan/innerPlan infrastructure.

This initial patch re-implements nodeSetop.c and does a bare minimum
of work on the planner side to generate correctly-shaped plans.
In particular, I've tried not to change the cost estimates here,
so that the visible changes in the regression test results will only
involve removal of useless projection steps and not any changes in
whether to use sorted vs hashed mode.

For SORTED mode, we combine successive identical tuples from each
input into groups, and then merge-join the groups.  The tuple
comparisons now use SortSupport instead of simple equality, but
the group-formation part should involve roughly the same number of
tuple comparisons as before.  The cross-comparisons between left and
right groups probably add to that, but I'm not sure how to quantify
how many more comparisons we might need.

For HASHED mode, nodeSetop's logic is almost the same as before,
just refactored into two separate loops instead of one loop that
has an assumption that it will see all the left-hand inputs first.

In both modes, I added early-exit logic to not bother reading the
right-hand relation if the left-hand input is empty, since neither
INTERSECT nor EXCEPT modes can produce any output if the left input
is empty.  This could have been done before in the hashed mode, but
not in sorted mode.  Sorted mode can also stop as soon as it exhausts
the left input; any remaining right-hand tuples cannot have matches.

Also, this patch adds some infrastructure for detecting whether
child plan nodes all output the same type of tuple table slot.
If they do, the hash table logic can use slightly more efficient
code based on assuming that that's the input slot type it will see.
We'll make use of that infrastructure in other plan node types later.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/1850138.1731549611@sss.pgh.pa.us
2024-12-19 16:23:45 -05:00
Melanie Plageman
2128cebcdb Remove extra prefetch iterator setup for Bitmap Table Scan
1a0da347a7ac98db replaced Bitmap Table Scan's separate private and
shared bitmap iterators with a unified iterator. It accidentally set up
the prefetch iterator twice for non-parallel bitmap table scans. Remove
the extra set up call to tbm_begin_iterate().
2024-12-19 11:55:18 -05:00
Melanie Plageman
754c610e13 Fix bitmap table scan crash on iterator release
1a0da347a7ac98db replaced Bitmap Table Scan's individual private and
shared iterators with a unified iterator. It neglected, however, to
check if the iterator had already been cleaned up before doing so on
rescan. Add this check both on rescan and end scan to be safe.

Reported-by: Richard Guo
Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs48nrhcLY1kcd-u9oD%2B6yiS631F_8Fx8ZGsO-BYDwH%2Bbyw%40mail.gmail.com
2024-12-19 11:55:03 -05:00
Peter Geoghegan
31b0a8f040 Avoid nbtree index scan SAOP scanBehind confusion.
Consistently reset so->scanBehind at the beginning of nbtree array
advancement, even during sktrig_required=false calls (calls where array
advancement is triggered by an unsatisfied non-required array scan key).
Otherwise, it's possible for queries to fail to return all relevant
tuples to the scan given a low-order required scan key that was
previously deemed "satisfied" by a truncated high key attribute value.
This only happened at the point where a later non-required array scan
key needed to be "advanced" once on the next leaf page (that is, once
the right sibling of the truncated high key page was reached).

The underlying issue was that later code within _bt_advance_array_keys
assumed that the so->scanBehind flag must have been set using the
current page's high key (not the previous page's high key).  Any later
successful recheck call to _bt_check_compare would therefore spuriously
be prevented from making _bt_advance_array_keys return true, based on
the faulty belief that the truncated attribute must be from the scan's
current tuple (i.e. the non-pivot tuple at the start of the next page).
_bt_advance_array_keys would return false for the tuple, ultimately
resulting in _bt_checkkeys failing to return a matching tuple.

Oversight in commit 5bf748b8, which enhanced nbtree ScalarArrayOp
execution.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkJKncfqyAUTeuB5GgRhT1vhsWO2q11dbZNqKmvjopP_g@mail.gmail.com
Backpatch: 17-, where commit 5bf748b8 first appears.
2024-12-19 11:08:55 -05:00
Peter Eisentraut
3e4bacb171 bootstrap: pure parser and reentrant scanner
Use the flex %option reentrant and the bison option %pure-parser to
make the generated scanner and parser pure, reentrant, and
thread-safe.

Make the generated scanner use palloc() etc. instead of malloc() etc.

For the bootstrap scanner and parser, reentrancy and memory management
aren't that important, but we make this change here anyway so that all
the scanners and parsers in the backend use a similar set of options
and APIs.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-19 15:37:44 +01:00
Peter Eisentraut
399d0f1e11 Small whitespace improvement
Author: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-19 13:00:31 +01:00
Peter Eisentraut
382092a0cd Prevent redeclaration of typedef yyscan_t
Fix for 1f0de66ea2a: We need to prevent redeclaration of typedef
yyscan_t.  (This will work with C11 but not currently with C99.)  The
generated scanner files provide their own typedef, but we also need to
provide one for the interfaces that we expose.  So we need to add some
preprocessor guards to avoid a redefinition.  (This is how the
generated scanner files do it internally as well.)  This way
everything now works independent of the order in which things are
included.
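
The guard is the same one flex emits internally, roughly:

    #ifndef YY_TYPEDEF_YY_SCANNER_T
    #define YY_TYPEDEF_YY_SCANNER_T
    typedef void *yyscan_t;
    #endif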

Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-19 11:24:43 +01:00
Michael Paquier
9aea73fc61 Add backend-level statistics to pgstats
This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends.  This acts as an upper bound for the number of stats entries
that can exist at once.  The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
so each stats entry exists for as long as its backend is up and
running.  These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).

Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend.  The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.

Auxiliary processes are not included in this set of statistics.  These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.

This commit also includes pg_stat_reset_backend_stats(), a function
able to reset all the stats associated with a single backend.

Bump catalog version and PGSTAT_FILE_FORMAT_ID.

Author: Bertrand Drouvot
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Michael Paquier, Nazir Bilal Yavuz
Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
2024-12-19 13:19:22 +09:00
Michael Paquier
ff7c40d7fd Extract logic filling pg_stat_get_io()'s tuplestore into its own routine
This commit adds pg_stat_io_build_tuples(), a helper routine for
pg_stat_get_io(), that fills its result tuplestore based on the contents
of PgStat_BktypeIO.  This will be used in a follow-up commit that uses
the same structures as pg_stat_io for reporting, including the same
object types and contexts, but for a different statistics kind.

Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
2024-12-19 10:16:02 +09:00
David Rowley
08cdb079d4 Optimize grouping equality checks with virtual slots
8f4ee9626 fixed an old Assert failure that could happen when the slot
type used to look up the hash table for BuildTupleHashTableExt() users
wasn't a TTSOpsMinimalTuple slot.  The fix for that in the back branches
had to be to pass the TupleTableSlotOps as NULL, however in master,
since we have the inputOps parameter as was added by d96d1d515, we can
pass that down instead.

At least one caller uses a fixed slot that's always TTSOpsVirtual, so
passing down inputOps for these cases allows ExecBuildGroupingEqual() to
skip adding the EEOP_INNER_FETCHSOME ExprEvalStep.

This should increase the performance of hashed subplans very slightly.

Author: Tom Lane, David Rowley
Discussion: https://postgr.es/m/2543667.1734483723@sss.pgh.pa.us
2024-12-19 13:57:21 +13:00
David Rowley
8f4ee96269 Fix Assert failure in WITH RECURSIVE UNION queries
If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value.  The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.

This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.

There doesn't seem to be any harm done here other than the Assert
failure.  Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.

The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter.  This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type.  This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.

Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Backpatch-through: 13, all supported versions
2024-12-19 13:11:39 +13:00
Melanie Plageman
b7493e1ab3 Remove leftover mentions of XLOG_HEAP2_FREEZE_PAGE records
f83d709760d merged the separate XLOG_HEAP2_FREEZE_PAGE records into a
new combined prune, freeze, and vacuum record with opcode
XLOG_HEAP2_PRUNE_VACUUM_SCAN. Remove the last few references to
XLOG_HEAP2_FREEZE_PAGE records which were accidentally left behind.

Reported-by: Tomas Vondra
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA%2BTgmoY1tYff-1CEn8kYt5FsOrynTbtr%3DUZw%3D7mTC1Hv1HpeBQ%40mail.gmail.com
2024-12-18 18:47:21 -05:00
Melanie Plageman
1a0da347a7 Bitmap Table Scans use unified TBMIterator
With the repurposing of TBMIterator as an interface for both parallel
and serial iteration through TIDBitmaps in commit 7f9d4187e7bab10329cc,
bitmap table scans may now use it.

Modify bitmap table scan code to use the TBMIterator. This requires
moving around a bit of code, so a few variables are initialized
elsewhere.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2024-12-18 18:43:39 -05:00
Melanie Plageman
7f9d4187e7 Add common interface for TBMIterators
Add and use TBMPrivateIterator, which replaces the current TBMIterator
for serial use cases, and repurpose TBMIterator to be a unified
interface for both the serial ("private") and parallel ("shared") TID
Bitmap iterator interfaces. This encapsulation simplifies call sites for
callers supporting both parallel and serial TID Bitmap access.
TBMIterator is not yet used in this commit.
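
The unified shape, sketched (simplified from the actual commit):

    /* One iterator type; a flag selects the underlying implementation. */
    typedef struct TBMIterator
    {
        bool        shared;
        union
        {
            TBMPrivateIterator *private_iterator;
            TBMSharedIterator  *shared_iterator;
        }           i;
    } TBMIterator;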

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/063e4eb4-32d9-439e-a0b1-75565a9835a8%40iki.fi
2024-12-18 18:19:28 -05:00
Melanie Plageman
28328ec87b Fix overflow danger in SampleHeapTupleVisible()
68d9662be1c4b70 made HeapScanDesc->rs_ntuples unsigned but neglected to
change how it was being used in SampleHeapTupleVisible().

Return early if rs_ntuples is 0 to avoid overflowing and incorrectly
executing the loop code in SampleHeapTupleVisible().
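
Generically, the hazard and the fix look like this (a sketch, not the
committed code):

    #include <stdbool.h>
    #include <stdint.h>

    /* With unsigned n, "n - 1" wraps to UINT32_MAX when n == 0, so the
     * search must bail out before computing the upper bound. */
    static bool
    contains(const uint32_t *sorted, uint32_t n, uint32_t key)
    {
        if (n == 0)
            return false;       /* the early return added by the fix */

        uint32_t lo = 0;
        uint32_t hi = n - 1;    /* safe only because n > 0 */

        while (lo < hi)
        {
            uint32_t mid = lo + (hi - lo) / 2;

            if (sorted[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return sorted[lo] == key;
    }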

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAot_xQoZyPZjpj1aBUPrPykY5mOPHGyvfe%3Djz%2BWowdA3A%40mail.gmail.com
2024-12-18 18:16:43 -05:00
Melanie Plageman
68d9662be1 Make rs_cindex and rs_ntuples unsigned
HeapScanDescData.rs_cindex and rs_ntuples can't be less than 0. All scan
types using the heap scan descriptor expect these values to be >= 0.
Make that expectation clear by making rs_cindex and rs_ntuples unsigned.

Also remove the test in heapam_scan_bitmap_next_tuple() that checks if
rs_cindex < 0. This was never true, but now that rs_cindex is unsigned,
it makes even less sense.

While we are at it, initialize both rs_cindex and rs_ntuples to 0 in
initscan().

Author: Melanie Plageman
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/CAAKRu_ZxF8cDCM_BFi_L-t%3DRjdCZYP1usd1Gd45mjHfZxm0nZw%40mail.gmail.com
2024-12-18 11:47:38 -05:00
Peter Eisentraut
1f0de66ea2 seg: pure parser and reentrant scanner
Use the flex %option reentrant and the bison option %pure-parser to
make the generated scanner and parser pure, reentrant, and
thread-safe.

Make the generated scanner use palloc() etc. instead of malloc() etc.
Previously, we only used palloc() for the buffer, but flex would still
use malloc() for its internal structures.  As a result, there could be
some small memory leaks in case of uncaught errors.  (We do catch
normal syntax errors as soft errors.)  Now, all the memory is under
palloc() control, so there are no more such issues.

Simplify flex scan buffer management: Instead of constructing the
buffer from pieces and then using yy_scan_buffer(), we can just use
yy_scan_string(), which does the same thing internally.

The previous code was necessary because we allocated the buffer with
palloc() and the rest of the state was handled by malloc().  But this
is no longer the case; everything is under palloc() now.

(We could even get rid of the yylex_destroy() call and just let the
memory context cleanup handle everything.  But for now, we preserve
the existing behavior.)
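
The reentrant call pattern, sketched (real scanners use a per-module yy
prefix, and "input_str" here is illustrative):

    yyscan_t    scanner;

    if (yylex_init(&scanner) != 0)
        elog(ERROR, "yylex_init() failed: %m");

    yy_scan_string(input_str, scanner);   /* replaces yy_scan_buffer() setup */

    /* ... run yylex(scanner) / yyparse() as needed ... */

    yylex_destroy(scanner);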

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-18 08:47:53 +01:00
Peter Eisentraut
802fe923e3 cube: pure parser and reentrant scanner
Use the flex %option reentrant and the bison option %pure-parser to
make the generated scanner and parser pure, reentrant, and
thread-safe.

Make the generated scanner use palloc() etc. instead of malloc() etc.
Previously, we only used palloc() for the buffer, but flex would still
use malloc() for its internal structures.  As a result, there could be
some small memory leaks in case of uncaught errors.  (We do catch
normal syntax errors as soft errors.)  Now, all the memory is under
palloc() control, so there are no more such issues.

Simplify flex scan buffer management: Instead of constructing the
buffer from pieces and then using yy_scan_buffer(), we can just use
yy_scan_string(), which does the same thing internally.  (Actually, we
use yy_scan_bytes() here because we already have the length.)

The previous code was necessary because we allocated the buffer with
palloc() and the rest of the state was handled by malloc().  But this
is no longer the case; everything is under palloc() now.

(We could even get rid of the yylex_destroy() call and just let the
memory context cleanup handle everything.  But for now, we preserve
the existing behavior.)

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
2024-12-18 08:47:34 +01:00
Michael Paquier
477728b5d6 psql: Add more information about service name
This commit adds support for the following items in psql, able to show a
service name, when available:
- Variable SERVICE.
- Substitution %s in PROMPT{1,2,3}.

This relies on 4b99fed7541e, that has made the service name available in
PGconn for libpq.

Author: Michael Banck
Reviewed-by: Greg Sabino Mullane
Discussion: https://postgr.es/m/6723c612.050a0220.1567f4.b94a@mx.google.com
2024-12-18 15:16:12 +09:00
Michael Paquier
4b99fed754 libpq: Add service name to PGconn and PQservice()
This commit adds one field to PGconn for the database service name (if
any), with PQservice() as routine to retrieve it.  Like the other
routines of this area, NULL is returned as result if the connection is
NULL.

A follow-up patch will make use of this feature to be able to display
the service name in the psql prompt.
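
A minimal client-side sketch (the service name "mydb" is hypothetical):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("service=mydb");

        if (PQstatus(conn) == CONNECTION_OK)
            printf("connected via service: %s\n", PQservice(conn));

        PQfinish(conn);
        return 0;
    }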

Author: Michael Banck
Reviewed-by: Greg Sabino Mullane
Discussion: https://postgr.es/m/6723c612.050a0220.1567f4.b94a@mx.google.com
2024-12-18 14:53:42 +09:00
Tom Lane
3f06324705 Fix memory leak in pg_restore with zstd-compressed data.
EndCompressorZstd() neglected to free everything.  This was
most visible with a lot of large objects in the dump.

Per report from Tomasz Szypowski.  Back-patch to v16
where this code came in.

Discussion: https://postgr.es/m/DU0PR04MB94193D038A128EF989F922D199042@DU0PR04MB9419.eurprd04.prod.outlook.com
2024-12-17 22:31:26 -05:00
David Rowley
d96d1d5152 Fix incorrect slot type in BuildTupleHashTableExt
0f5738202 adjusted the execGrouping.c code so it made use of ExprStates to
generate hash values.  That commit made a wrong assumption that the slot
type to pass to ExecBuildHash32FromAttrs() is always &TTSOpsMinimalTuple.
That's not the case as the slot type depends on the slot type passed to
LookupTupleHashEntry(), which for nodeRecursiveunion.c, could be any of
the current slot types.

Here we fix this by adding a new parameter to BuildTupleHashTableExt()
to allow the slot type to be passed in.  In the case of nodeSubplan.c
and nodeAgg.c the slot type is always &TTSOpsVirtual, so for both of
those cases, it's beneficial to pass the known slot type as that allows
ExecBuildHash32FromAttrs() to skip adding the tuple deform step to the
resulting ExprState.  Another possible fix would have been to have
ExecBuildHash32FromAttrs() set "fetch.kind" to NULL so that
ExecComputeSlotInfo() always determines that the EEOP_INNER_FETCHSOME
step is required; however, that option isn't favorable as it slows down
aggregation and hashed subplan evaluation due to the extra (needless)
deform step.

Thanks to Nathan Bossart for bisecting to find the offending commit
based on Paul's report.

Reported-by: Paul Ramsey <pramsey@cleverelephant.ca>
Discussion: https://postgr.es/m/99F064C1-B3EB-4BE7-97D2-D2A0AA487A71@cleverelephant.ca
2024-12-18 12:05:55 +13:00
Nathan Bossart
84f1b0b031 Accommodate very large dshash tables.
If a dshash table grows very large (e.g., the dshash table for
cumulative statistics when there are millions of tables), resizing
it may fail with an error like:

	ERROR: invalid DSA memory alloc request size 1073741824

To fix, permit dshash resizing to allocate more than 1 GB by
providing the DSA_ALLOC_HUGE flag.
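
Sketched, the fix swaps the allocation call on the resize path:

    /* dsa_allocate() rejects requests above 1 GB; DSA_ALLOC_HUGE lifts
     * that cap for the enlarged table. */
    dsa_pointer p = dsa_allocate_extended(area, new_size, DSA_ALLOC_HUGE);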

Reported-by: Andreas Scherbaum
Author: Matthias van de Meent
Reviewed-by: Cédric Villemain, Michael Paquier, Andres Freund
Discussion: https://postgr.es/m/80a12d59-0d5e-4c54-866c-e69cd6536471%40pgug.de
Backpatch-through: 13
2024-12-17 15:24:45 -06:00
Tom Lane
7a80e381d1 Skip useless calculation of join RTE column names during EXPLAIN.
There's no need for set_simple_column_names() to compute unique
column names for join RTEs, because a finished plan tree will
not contain any join alias Vars that we could need names for.
Its other, internal callers will not pass it any join RTEs
anyway, so the upshot is we can just skip join RTEs here.

Aside from getting rid of a klugy against-its-documentation use of
set_relation_column_names, this can speed up EXPLAIN substantially
when considering many-join queries, because the upper join RTEs
tend to have a lot of columns.

Sami Imseih, with cosmetic changes by me

Discussion: https://postgr.es/m/CAA5RZ0th3q-0p1pri58z9grG8r8azmEBa8o1rtkwhLmJg_cH+g@mail.gmail.com
2024-12-17 15:52:12 -05:00
Melanie Plageman
dc6acfd910 Count pages set all-visible and all-frozen in VM during vacuum
Heap vacuum already counts and logs pages with newly frozen tuples. Now
count and log the number of pages newly set all-visible and all-frozen
in the visibility map.

Pages that are all-visible but not all-frozen are debt for future
aggressive vacuums. The counts of newly all-visible and all-frozen pages
give us insight into the rate at which this debt is being accrued and
paid down.

Author: Melanie Plageman
Reviewed-by: Masahiko Sawada, Alastair Turner, Nitin Jadhav, Andres Freund, Bilal Yavuz, Tomas Vondra
Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d,
https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
2024-12-17 14:19:13 -05:00
Melanie Plageman
4b565a198b Make visibilitymap_set() return previous state of vmbits
It can be useful to know the state of a relation page's VM bits before
visibilitymap_set(). visibilitymap_set() has the old value on hand, so
returning it is simple. This commit does not use visibilitymap_set()'s
new return value.
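
A sketch of how a future caller might consume it (argument list
abbreviated from the real one):

    uint8       old_vmbits;

    old_vmbits = visibilitymap_set(rel, heapBlk, heapBuf, recptr,
                                   vmbuffer, cutoff_xid, flags);

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        /* the page was newly set all-visible; e.g. count it */
    }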

Author: Melanie Plageman
Reviewed-by: Masahiko Sawada, Andres Freund, Nitin Jadhav, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZQe26xdvAqo4weHLR%3DivQ8J4xrSfDDD8uXnh-O-6P6Lg%40mail.gmail.com#6d8d2b4219394f774889509bf3bdc13d,
https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
2024-12-17 14:19:03 -05:00
Melanie Plageman
f020baa066 Rename LVRelState->frozen_pages
Rename frozen_pages to new_frozen_tuple_pages in LVRelState, the struct
used for tracking state during vacuuming of a heap relation.
frozen_pages sounds like it tracks pages set all-frozen. That is a
misnomer. It only includes pages with at least one newly frozen tuple.
It also includes pages that are not all-frozen.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Masahiko Sawada, Nitin Jadhav, Bilal Yavuz

Discussion: https://postgr.es/m/ctdjzroezaxmiyah3gwbwm67defsrwj2b5fpfs4ku6msfpxeia%40mwjyqlhwr2wu
2024-12-17 14:18:59 -05:00
Tom Lane
21fb39cb07 Set max_safe_fds whenever we create shared memory and semaphores.
Formerly we skipped this in bootstrap/check mode and in single-user
mode.  That's bad in check mode because it may allow accepting a
value of max_connections that doesn't actually work: on platforms
where semaphores consume file descriptors, there may not be enough
free FDs left over to satisfy fd.c, causing postmaster start to
fail.  It's also not great in single-user mode, because fd.c will
operate with just the minimum allowable value of max_safe_fds,
resulting in excess file open/close overhead if anything moderately
complicated is done in single-user mode.  (There may be some penalty
for bootstrap mode too, though probably not much.)

Discussion: https://postgr.es/m/2081982.1734393311@sss.pgh.pa.us
2024-12-17 12:23:26 -05:00
Tom Lane
c91963da13 Set the stack_base_ptr in main(), not in random other places.
Previously we did this in PostmasterMain() and InitPostmasterChild(),
which meant that stack depth checking was disabled in non-postmaster
server processes, for instance in single-user mode.  That seems like
a fairly bad idea, since there's no a-priori restriction on the
complexity of queries we will run in single-user mode.  Moreover, this
led to not having quite the same stack depth limit in all processes,
which likely has no real-world effect but it offends my inner neatnik.
Setting the depth in main() guarantees that check_stack_depth() is
armed and has a consistent interpretation of stack depth in all forms
of server processes.

While at it, move the code associated with checking the stack depth
out of tcop/postgres.c (which was never a great home for it) into
a new file src/backend/utils/misc/stack_depth.c.
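
The resulting pattern, sketched:

    int
    main(int argc, char *argv[])
    {
        set_stack_base();       /* armed once, for every server process form */
        /* ... dispatch to postmaster, single-user, or bootstrap mode ... */
    }

    /* any recursive backend code can then guard itself */
    static void
    walk_tree(Node *node)
    {
        check_stack_depth();    /* ereport(ERROR) if the limit is exceeded */
        /* ... recurse ... */
    }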

Discussion: https://postgr.es/m/2081982.1734393311@sss.pgh.pa.us
2024-12-17 12:08:42 -05:00
Tomas Vondra
957ba9ff14 Detect version mismatch in brin_page_items
Commit dae761a87ed modified brin_page_items() to return the new "empty"
flag for each BRIN range. But the new output parameter was added in the
middle, which may cause crashes when using the new binary with the old
function definition.

The ideal solution would be to introduce API versioning similar to what
pg_stat_statements does, but it's too late for that as PG17 was already
released (so we can't introduce a new extension version). We could do
something similar in brin_page_items() by checking the number of output
columns (and ignoring the new flag), but it doesn't seem very nice.

Instead, simply error out and suggest updating the extension to the
latest version. pageinspect is a superuser-only extension, and there's
not much reason to run an older version. Moreover, there's a precedent
for this approach in 691e8b2e18.

Reported by Ľuboslav Špilák, investigation and patch by me. Backpatch to
17, same as dae761a87ed.

Reported-by: Ľuboslav Špilák
Reviewed-by: Michael Paquier, Hayato Kuroda, Peter Geoghegan
Backpatch-through: 17
Discussion: https://postgr.es/m/VI1PR02MB63331C3D90E2104FD12399D38A5D2@VI1PR02MB6333.eurprd02.prod.outlook.com
Discussion: https://postgr.es/m/flat/3385a58f-5484-49d0-b790-9a198a0bf236@vondra.me
2024-12-17 17:48:55 +01:00
Tomas Vondra
8cd44db42a Update comments about index parallel builds
Commit b43757171470 allowed parallel builds for BRIN, but left behind
two comments claiming only btree indexes support parallel builds.

Reported by Egor Rogov, along with similar issues in SGML docs.
Backpatch to 17, where parallel builds for BRIN were introduced.

Reported-by: Egor Rogov
Backpatch-through: 17
Discussion: https://postgr.es/m/114e2d5d-125e-07d8-94aa-5ad175fb7443@postgrespro.ru
2024-12-17 15:40:07 +01:00
Peter Eisentraut
fb1a18810f Remove ts_locale.c's lowerstr()
lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as
str_tolower() that the rest of the system uses, except that the former
don't use the common locale provider framework but instead use the
global libc locale settings.

This patch replaces uses of lowerstr*() with str_tolower(...,
DEFAULT_COLLATION_OID).  For instances that use a libc locale
globally, this will result in exactly the same behavior.  For
instances that use other locale providers, you now get consistent
behavior and are no longer dependent on the libc locale settings (for
this case; there are others).

Most uses of these functions are for processing dictionary and
configuration files.  In those cases, using the default collation
seems appropriate.  At least we don't have a more specific collation
available.  But the code in contrib/pg_trgm should really depend on
the collation of the columns being processed.  This is not done here,
this can be done in a separate patch.

(You can probably construct some edge cases where this change would
create some locale-related upgrade incompatibility, for example if
before you used a combination of ICU and a differently-behaving libc
locale.  We can document this in the release notes, but I don't think
there is anything more we can do about this.)

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17 14:04:55 +01:00
Peter Eisentraut
d3aad4ac57 Remove ts_locale.c's t_isdigit(), t_isspace(), t_isprint()
These do the same thing as the standard isdigit(), isspace(), and
isprint() but with multibyte and encoding support.  But all the
callers are only interested in analyzing single-byte ASCII characters.
So this extra layer is overkill and we can replace the uses with the
standard functions.
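
The replacement pattern, sketched; the unsigned char cast matters
because isdigit() has undefined behavior for negative char values:

    #include <ctype.h>
    #include <stdbool.h>

    /* was: t_isdigit(ptr), the multibyte-aware wrapper */
    static bool
    ascii_digit(const char *ptr)
    {
        return isdigit((unsigned char) *ptr) != 0;
    }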

All the t_is*() functions in ts_locale.c are under scrutiny because
they don't use the common locale provider framework but instead use
the global libc locale settings.  For the functions being touched by
this patch, we don't need all that anyway, as mentioned above, so the
simplest solution is to just remove them.  The few remaining t_is*()
functions will need a different treatment in a separate patch.

pg_trgm has some compile-time options with macros such as
KEEPONLYALNUM.  These are not documented, and the non-default variant
is not supported by any test cases.  As part of this undertaking, I'm
removing the non-default variant, as it is in the way of cleanup.  So
in this case, the not-KEEPONLYALNUM code path is gone.

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://www.postgresql.org/message-id/flat/653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d%40eisentraut.org
2024-12-17 12:52:29 +01:00
Richard Guo
60be3f9f0a Avoid unnecessary wrapping for more complex expressions
When pulling up a subquery that is under an outer join, if the
subquery's target list contains a strict expression that uses a
subquery variable, it's okay to pull up the expression without
wrapping it in a PlaceHolderVar: if the subquery variable is forced to
NULL by the outer join, the expression result will come out as NULL
too.

If the strict expression does not contain any subquery variables, the
current code always wraps it in a PlaceHolderVar.  While this is not
incorrect, the analysis could be tighter: if the strict expression
contains any variables of rels that are under the same lowest nulling
outer join as the subquery, we can also avoid wrapping it.  This is
safe because if the subquery variable is forced to NULL by the outer
join, the variables of rels that are under the same lowest nulling
outer join will also be forced to NULL, resulting in the expression
evaluating to NULL as well.  Therefore, it's not necessary to force
the expression to be evaluated below the outer join.  It could be
beneficial to get rid of such PHVs because they could imply lateral
dependencies, which force us to resort to nestloop joins.

This patch checks if the lateral references in the strict expression
contain any variables of rels under the same lowest nulling outer join
as the subquery, and avoids wrapping the expression in that case.

This is fundamentally a generalization of the optimizations for bare
Vars and PHVs introduced in commit f64ec81a8.

No backpatch as this could result in plan changes.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_ENtfRdLaM_bXAxiKRYO7DmwDBDG4_2=VTDi0mJP-jAw@mail.gmail.com
2024-12-17 19:53:01 +09:00
Amit Kapila
2364f61488 Doc: Fix the wrong link on pg_createsubscriber page.
Commit 84db9a0eb1 added an incorrect link for
'initial data synchronization'. It pointed to a subsection of Row
Filter and didn't provide the required information.

Author: Peter Smith
Reviewed-by: Vignesh C, Pavel Luzanov
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAHut+PtnA4DB_pcv4TDr4NjUSM1=P2N_cuZx5DX09k7LVmaqUA@mail.gmail.com
2024-12-17 15:08:29 +05:30
Michael Paquier
fee2b3ea2e Tweak some comments related to variable-numbered stats in pgstat.c
These comments referred to database objects, but depending on the stats
kind dealt with, this may not be true.

Issues found while reviewing a different patch in this area.

Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
2024-12-17 14:32:35 +09:00
Michael Paquier
0f23dedc91 Print out error position for some more DDLs
The following commands gain some information about the error position in
the query, should they fail when looking at the type used:
- CREATE TYPE (LIKE)
- CREATE TABLE OF

Both are related to typenameType() where the type name lookup is done.
These calls gain the ParseState that already exists in these paths.
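
Sketched with abbreviated arguments (variable names illustrative):

    int32       typmod;

    /* before: no ParseState available, so the error cannot point back
     * into the query string */
    tup = typenameType(NULL, ofTypename, &typmod);

    /* after: the ParseState already present in these code paths is
     * threaded through, giving the report an error cursor */
    tup = typenameType(pstate, ofTypename, &typmod);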

Author: Kirill Reshke, Jian He
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/CALdSSPhqfvKbDwqJaY=yEePi_aq61GmMpW88i6ZH7CMG_2Z4Cg@mail.gmail.com
2024-12-17 09:44:06 +09:00
Michael Paquier
e116b703f0 pg_combinebackup: Fix PITR comparison test in 002_compare_backups
The test was creating both the dumps to compare from the same database
on the same node, so it would never detect any mismatches when comparing
the logical dumps of the two servers.

Fixing this issue has revealed that there is a difference in the dumps:
the tablespace paths are different.  This commit uses compare_text()
with a custom comparison function to erase the difference (slightly
tweaked to be able to work with WIN32 and non-WIN32 paths).  This way,
the non-relevant parts of the tablespace path are ignored by the check
with the basic structure of the query string still compared.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87h67653ns.fsf@wibble.ilmari.org
Backpatch-through: 17
2024-12-17 09:23:49 +09:00