1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-27 21:43:08 +03:00
Commit Graph

213 Commits

Author SHA1 Message Date
Heikki Linnakangas
ad853bb877 Fix misc typos, mostly in comments
The only user-visible change is the fix in the "malformed
pg_dependencies" error detail. That one is new in commit e1405aa5e3,
so no backpatching required.
2026-01-08 18:10:08 +02:00
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Michael Paquier
0c3c5c3b06 Use palloc_object() and palloc_array() in more areas of the tree
The idea is to encourage more the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().

The following paths are included in this batch, treating all the areas
proposed by the author for the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/

Similar work has been done in 31d3847a37.

The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-09 14:53:17 +09:00
Alexander Korotkov
8af3ae0d4b Add pairingheap_initialize() for shared memory usage
The existing pairingheap_allocate() uses palloc(), which allocates
from process-local memory. For shared memory use cases, the pairingheap
structure must be allocated via ShmemAlloc() or embedded in a shared
memory struct. Add pairingheap_initialize() to initialize an already-
allocated pairingheap structure in-place, enabling shared memory usage.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Tom Lane
1ea5bdb00b Improve planner's estimates of tuple hash table sizes.
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.

To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)

I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables.  That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.

The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation.  On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-11-02 16:57:26 -05:00
David Rowley
5c0a20003b Fix reset of incorrect hash iterator in GROUPING SETS queries
This fixes an unlikely issue when fetching GROUPING SET results from
their internally stored hash tables.  It was possible in rare cases that
the hash iterator would be set up incorrectly which could result in a
crash.

This was introduced in 4d143509c, so backpatch to v18.

Many thanks to Yuri Zamyatin for reporting and helping to debug this
issue.

Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
2025-10-18 16:07:04 +13:00
Peter Eisentraut
a5b35fcedb Remove PointerIsValid()
This doesn't provide any value over the standard style of checking the
pointer directly or comparing against NULL.

Also remove related:
- AllocPointerIsValid() [unused]
- IndexScanIsValid() [had one user]
- HeapScanIsValid() [unused]
- InvalidRelation [unused]

Leaving HeapTupleIsValid(), ItemIdIsValid(), PortalIsValid(),
RelationIsValid for now, to reduce code churn.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/ad50ab6b-6f74-4603-b099-1cd6382fb13d%40eisentraut.org
Discussion: https://www.postgresql.org/message-id/CA+hUKG+NFKnr=K4oybwDvT35dW=VAjAAfiuLxp+5JeZSOV3nBg@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/bccf2803-5252-47c2-9ff0-340502d5bd1c@iki.fi
2025-09-24 15:17:20 +02:00
Peter Eisentraut
a0ed19e0a9 Use PRI?64 instead of "ll?" in format strings (continued).
Continuation of work started in commit 15a79c73, after initial trial.

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
2025-03-29 10:43:57 +01:00
Tatsuo Ishii
a9dcbb4d5c Add new StringInfo APIs to allow callers to specify the buffer size.
Previously StringInfo APIs allocated buffers with fixed initial
allocation size of 1024 bytes. This may be too large and inappropriate
for some callers that can do with smaller memory buffers. To fix this,
introduce new APIs that allow callers to specify initial buffer size.

extern StringInfo makeStringInfoExt(int initsize);
extern void initStringInfoExt(StringInfo str, int initsize);

Existing APIs (makeStringInfo() and initStringInfo()) are changed to
call makeStringInfoExt and initStringInfoExt respectively (via inline
helper functions makeStringInfoInternal and initStringInfoInternal),
with the default buffer size of 1024.

Reviewed-by: Nathan Bossart, David Rowley, Michael Paquier, Gurjeet Singh
Discussion: https://postgr.es/m/20241225.123704.1194662271286702010.ishii%40postgresql.org
2025-01-11 08:23:46 +09:00
John Naylor
3e70da2781 Always use the caller-provided context for radix tree leaves
Previously, it would not have worked for a caller to pass a slab
context, since it would have been used for other things which likely
had incompatible size. In an attempt to be helpful and avoid possible
space wastage due to aset's power-of-two rounding, RT_CREATE would
create an additional slab context if the value type was fixed-length
and larger than pointer size. The problem was, we have since added
the bump context type, and the generation context was a possibility as
well, so silently overriding the caller's choice may actually be worse.

Commit e8a6f1f908 arranged so that the caller-provided context is
used only for leaves, so it's safe for the caller to use slab here
if they wish. As demonstration, use slab in one of the radix tree
regression tests.

Reviewed by Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZZDCo4k5oURg_pPxM6+WZ1oiG=sqgjmQiELuyP0Vtrwig@mail.gmail.com
2025-01-06 13:26:02 +07:00
John Naylor
e8a6f1f908 Get rid of radix tree's general purpose memory context
Previously, this was notionally used only for the entry point of the
tree and as a convenient parent for other contexts.

For shared memory, the creator previously allocated the entry point
in this context, but attaching backends didn't have access to that,
so they just used the caller's context. For the sake of consistency,
allocate every instance of an entry point in the caller's context.

For local memory, allocate the control object in the caller's context
as well. This commit also makes the "leaf context" the notional parent
of the child contexts used for nodes, so it's a bit of a misnomer,
but a future commit will make the node contexts independent rather
than children, so leave it this way for now to avoid code churn.

The memory context parameter for RT_CREATE is now unused in the case
of shared memory, so remove it and adjust callers to match.

In passing, remove unused "context" member from struct TidStore,
which seems to have been an oversight.

Reviewed by Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZZDCo4k5oURg_pPxM6+WZ1oiG=sqgjmQiELuyP0Vtrwig@mail.gmail.com
2025-01-06 11:21:21 +07:00
John Naylor
960013f2a1 Use caller's memory context for radix tree iteration state
Typically only one iterator is present at any time, so it's overkill
to devote an entire context for this. Get rid of it and use the
caller's context.

This is tidy-up work, so no backpatch in this form. However, a
hypothetical extension to v17 that tried to start iteration from
an attaching backend would result in a crash, so that'll be fixed
separately in a way that doesn't change behavior in core.

Patch by me, reported and reviewed by Masahiko Sawada

Discussion: https://postgr.es/m/CAD21AoBB2U47V=F+wQRB1bERov_of5=BOZGaybjaV8FLQyqG3Q@mail.gmail.com
2025-01-06 09:01:58 +07:00
Bruce Momjian
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
Masahiko Sawada
724890ffb7 Include necessary header files in radixtree.h.
When #include'ing radixtree.h with RT_SHMEM, it could happen to raise
compiler errors due to missing some declarations of types and
functions.

This commit also removes the inclusion of postgres.h since it's
against our usual convention.

Backpatch to v17, where radixtree.h was introduced.

Reviewed-by: Heikki Linnakangas, Álvaro Herrera
Discussion: https://postgr.es/m/CAD21AoCU9YH%2Bb9Rr8YRw7UjmB%3D1zh8GKQkWNiuN9mVhMvkyrRg%40mail.gmail.com
Backpatch-through: 17
2024-12-09 13:07:06 -08:00
Alexander Korotkov
3a7ae6b3d9 Revert pg_wal_replay_wait() stored procedure
This commit reverts 3c5db1d6b0, and subsequent improvements and fixes
including 8036d73ae3, 867d396ccd, 3ac3ec580c, 0868d7ae70, 85b98b8d5a,
2520226c95, 014f9f34d2, e658038772, e1555645d7, 5035172e4a, 6cfebfe88b,
73da6b8d1b, and e546989a26.

The reason for reverting is a set of remaining issues.  Most notably, the
stored procedure appears to need more effort than the utility statement
to turn the backend into a "snapshot-less" state.  This makes an approach
to use stored procedures questionable.

Catversion is bumped.

Discussion: https://postgr.es/m/Zyhj2anOPRKtb0xW%40paquier.xyz
2024-11-04 22:47:57 +02:00
Alexander Korotkov
3c5db1d6b0 Implement pg_wal_replay_wait() stored procedure
pg_wal_replay_wait() is to be used on standby and specifies waiting for
the specific WAL location to be replayed.  This option is useful when
the user makes some data changes on primary and needs a guarantee to see
these changes are on standby.

The queue of waiters is stored in the shared memory as an LSN-ordered pairing
heap, where the waiter with the nearest LSN stays on the top.  During
the replay of WAL, waiters whose LSNs have already been replayed are deleted
from the shared memory pairing heap and woken up by setting their latches.

pg_wal_replay_wait() needs to wait without any snapshot held.  Otherwise,
the snapshot could prevent the replay of WAL records, implying a kind of
self-deadlock.  This is why it is only possible to implement
pg_wal_replay_wait() as a procedure working without an active snapshot,
not a function.

Catversion is bumped.

Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru
Author: Kartyshov Ivan, Alexander Korotkov
Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila
Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira
Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi
2024-08-02 21:16:56 +03:00
John Naylor
fd49e8f323 Prevent access of uninitialized memory in radix tree nodes
RT_NODE_16_SEARCH_EQ() performs comparisions using vector registers
on x64-64 and aarch64. We apply a mask to the resulting bitfield
to eliminate irrelevant bits that may be set. This ensures correct
behavior, but Valgrind complains of the partially-uninitialised
values. So far the warnings have only occurred on aarch64, which
explains why this hasn't been seen earlier.

To fix this warning, initialize the whole fixed-sized part of the nodes
upon allocation, rather than just do the minimum initialization to
function correctly. The initialization for node48 is a bit different
in that the 256-byte slot index array must be populated with "invalid
index" rather than zero. Experimentation has shown that compilers
tend to emit code that uselessly memsets that array twice. To avoid
pessimizing this path, swap the order of the slot_idxs[] and isset[]
arrays so we can initialize with two non-overlapping memset calls.

Reported by Tomas Vondra
Analysis and patch by Tom Lane, reviewed by Masahiko Sawada. I
investigated the behavior of memset calls to overlapping regions,
leading to the above tweaks to node48 as discussed in the thread.

Discussion: https://postgr.es/m/120c63ad-3d12-415f-a7bf-3da451c31bf6%40enterprisedb.com
2024-06-21 17:29:39 +07:00
John Naylor
ed52df3b19 Small cosmetic fixes in radix tree template
- Bring memory context names in line with other naming
- Fix typos, reported off-list by Alexander Lakhin
- Remove copy-paste errors from comments
- Remove duplicate #undef
2024-04-27 14:42:01 +07:00
Masahiko Sawada
bb7f195ff7 radixtree: Fix SIGSEGV at update of embeddable value to non-embeddable.
Also, fix a memory leak when updating from non-embeddable to
embeddable. Both were unreachable without adding C code.

Reported-by: Noah Misch
Author: Noah Misch
Reviewed-by: Masahiko Sawada, John Naylor
Discussion: https://postgr.es/m/20240424210319.4c.nmisch%40google.com
2024-04-25 21:48:52 +09:00
Daniel Gustafsson
950d4a2cb1 Fix typos and duplicate words
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18 21:28:07 +02:00
Alexander Korotkov
772faafca1 Revert: Implement pg_wal_replay_wait() stored procedure
This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2,
ee79928441, and 74eaf66f98 per review by Heikki Linnakangas.

Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi
2024-04-11 17:28:15 +03:00
Masahiko Sawada
810f64a015 Revert indexed and enlargable binary heap implementation.
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:18:05 +09:00
John Naylor
0fe5f64367 Teach radix tree to embed values at runtime
Previously, the decision to store values in leaves or within the child
pointer was made at compile time, with variable length values using
leaves by necessity. This commit allows introspecting the length of
variable length values at runtime for that decision. This requires
the ability to tell whether the last-level child pointer is actually
a value, so we use a pointer tag in the lowest level bit.

Use this in TID store. This entails adding a byte to the header to
reserve space for the tag. Commit f35bd9bf3 stores up to three offsets
within the header with no bitmap, and now the header can be embedded
as above. This reduces worst-case memory usage when TIDs are sparse.

Reviewed (in an earlier version) by Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZYw+_KAaUNruhJfE=h6WgtBKeDG32St8vBJBEY82bGVRQ@mail.gmail.com
Discussion: https://postgr.es/m/CAD21AoBci3Hujzijubomo1tdwH3XtQ9F89cTNQ4bsQijOmqnEw@mail.gmail.com
2024-04-08 18:54:35 +07:00
John Naylor
8a1b31e6e5 Use bump context for TID bitmaps stored by vacuum
Vacuum does not pfree individual entries, and only frees the entire
storage space when finished with it. This allows using a bump context,
eliminating the chunk header in each leaf allocation. Most leaf
allocations will be 16 to 32 bytes, so that's a significant savings.
TidStoreCreateLocal gets a boolean parameter to indicate that the
created store is insert-only.

This requires a separate tree context for iteration, since we free
the iteration state after iteration completes.

Discussion: https://postgr.es/m/CANWCAZac%3DpBePg3rhX8nXkUuaLoiAJJLtmnCfZsPEAS4EtJ%3Dkg%40mail.gmail.com
Discussion: https://postgr.es/m/CANWCAZZQFfxvzO8yZHFWtQV+Z2gAMv1ku16Vu7KWmb5kZQyd1w@mail.gmail.com
2024-04-08 14:39:49 +07:00
Andres Freund
af7e90a277 simplehash: Free collisions array in SH_STAT
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.

It's unclear why coverity started warning now.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
2024-04-07 19:08:41 -07:00
Alexander Korotkov
bf1e650806 Use the pairing heap instead of a flat array for LSN replay waiters
06c418e163 introduced pg_wal_replay_wait() procedure allowing to wait for
the particular LSN to be replayed on standby.  The waiters were stored in
the flat array.  Even though scanning small arrays is fast, that might be a
problem at scale (a lot of waiting processes).

This commit replaces the flat shared memory array with the pairing heap,
which holds the waiter with the least LSN at the top.  This gives us O(log N)
complexity for both inserting and removing waiters.

Reported-by: Alvaro Herrera
Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
2024-04-03 18:15:41 +03:00
Masahiko Sawada
b840508644 Add functions to binaryheap for efficient key removal and update.
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).

This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.

The current code does not use the new indexing logic, but it will be
used by an upcoming patch.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:44:21 +09:00
Masahiko Sawada
bcb14f4abc Make binaryheap enlargeable.
The node array space of the binaryheap is doubled when there is no
available space.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:27:43 +09:00
Daniel Gustafsson
a96a8b15fa Remove superfluous trailing semicolons
Two semicolons were accidentally added to rows which were already
terminated semicolons.  While harmless, fix by removing these.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4_fnJ0+yOgFioswzLE7t6R8P6cqbuacFVeZqbESFAjs1A@mail.gmail.com
2024-03-29 23:51:43 +01:00
Masahiko Sawada
80d5d4937c Fix potential integer handling issue in radixtree.h.
Coverity complained about the integer handling issue; if we start with
an arbitrary non-negative shift value, the loop may decrement it down
to something less than zero before exiting. This commit adds an
assertion to make sure the 'shift' is always 0 after the loop, and
uses 0 as the shift to get the key chunk in the following operation.

Introduced by ee1b30f12.

Reported-by: Tom Lane as per coverity
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/2089517.1711299216%40sss.pgh.pa.us
2024-03-25 12:06:41 +09:00
Daniel Gustafsson
b783186515 Add destroyStringInfo function for cleaning up StringInfos
destroyStringInfo() is a counterpart to makeStringInfo(), freeing a
palloc'd StringInfo and its data. This is a convenience function to
align the StringInfo API with the PQExpBuffer API. Originally added
in the OAuth patchset, it was extracted and committed separately in
order to aid upcoming JSON work.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com
2024-03-16 23:18:28 +01:00
John Naylor
e444ebcb85 Fix incorrect format specifier for int64
Follow-up to ee1b30f12, per buildfarm member mamba.

Discussion: https://postgr.es/m/CANWCAZYwyRMU%2BOTVOjK%3Dno1hm-W3ZQ5vrSFM1MFAaLtLydvwzA%40mail.gmail.com
2024-03-07 14:26:52 +07:00
John Naylor
ac234e6377 Fix redefinition of typedefs
Per buildfarm members sifaka and longfin, clang with
-Wtypedef-redefinition warns of duplicate typedefs unless building with
C11. Follow-up to ee1b30f12.

Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZauSg%3DLUbBbXhpeQtBuPifmzQNTYS6O8NsoAPz1zL-Txg%40mail.gmail.com
2024-03-07 14:11:49 +07:00
John Naylor
ee1b30f128 Add template for adaptive radix tree
This implements a radix tree data structure based on the design in
"The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases"
by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main
technique that makes it adaptive is using several different node types,
each with a different capacity of elements, and a different algorithm
for accessing them. The nodes start small and grow/shrink as needed.

The main advantage over hash tables is efficient sorted iteration and
better memory locality when successive keys are lexicographically
close together. The implementation currently assumes 64-bit integer
keys, and traversing the tree is in general slower than a linear
probing hash table, so this is not a general-purpose associative array.

The paper describes two other techniques not implemented here,
namely "path compression" and "lazy expansion". These can further
reduce memory usage and speed up traversal, but the former would add
significant complexity and the latter requires storing the full key
with the value. We do trivially compress the path when leading bytes
of the key are zeros, however.

For value storage, we use "combined pointer/value slots", as
recommended in the paper. Values of size equal or smaller than the the
platform's pointer type are stored in the array of child pointers in
the last level node, while larger values are each stored in a separate
allocation. This is for now fixed at compile time, but it would be
fairly trivial to allow determining at runtime how variable-length
values are stored.

One innovation in our implementation compared to the ART paper is
decoupling the notion of node "size class" from "kind". The size
classes within a given node kind have the same underlying type, but
a variable capacity for children, so we can introduce additional node
sizes with little additional code.

To enable different use cases to specialize for different value types
and for shared/local memory, we use macro-templatized code generation
in the same manner as simplehash.h and sort_template.h.

Future commits will use this infrastructure for storing TIDs.

Patch by Masahiko Sawada and John Naylor, but a substantial amount of
credit is due to Andres Freund, whose proof-of-concept was a valuable
source of coding idioms and awareness of performance pitfalls, and
who reviewed earlier versions.

Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-07 12:40:11 +07:00
Nathan Bossart
e1724af42c Fix comments for the dshash_parameters struct.
A recent commit added a copy_function member to the
dshash_parameters struct, but it missed updating a couple of
comments that refer to the function pointer members of this struct.
One of those comments also refers to a tranche_name member and non-
arg variants of the function pointer members, all of which were
either removed during development or removed shortly after dshash
table support was committed.

Oversights in commits 8c0d7bafad, d7694fc148, and 42a1de3013.

Discussion: https://postgr.es/m/20240227045213.GA2329190%40nathanxps13
2024-02-27 09:44:59 -06:00
Nathan Bossart
42a1de3013 Add helper functions for dshash tables with string keys.
Presently, string keys are not well-supported for dshash tables.
The dshash code always copies key_size bytes into new entries'
keys, and dshash.h only provides compare and hash functions that
forward to memcmp() and tag_hash(), both of which do not stop at
the first NUL.  This means that callers must pad string keys so
that the data beyond the first NUL does not adversely affect the
results of copying, comparing, and hashing the keys.

To better support string keys in dshash tables, this commit does
a couple things:

* A new copy_function field is added to the dshash_parameters
  struct.  This function pointer specifies how the key should be
  copied into new table entries.  For example, we only want to copy
  up to the first NUL byte for string keys.  A dshash_memcpy()
  helper function is provided and used for all existing in-tree
  dshash tables without string keys.

* A set of helper functions for string keys are provided.  These
  helper functions forward to strcmp(), strcpy(), and
  string_hash(), all of which ignore data beyond the first NUL.

This commit also adjusts the DSM registry's dshash table to use the
new helper functions for string keys.

Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
2024-02-26 15:47:13 -06:00
Nathan Bossart
6b80394781 Introduce overflow-safe integer comparison functions.
This commit adds integer comparison functions that are designed to
be as efficient as possible while avoiding overflow.  A follow-up
commit will make use of these functions in many of the in-tree
qsort() comparators.  The new functions are not better in all cases
(e.g., when the comparator function is inlined), so it is important
to consider the context before using them.

Author: Mats Kindahl
Reviewed-by: Tom Lane, Heikki Linnakangas, Andres Freund, Thomas Munro, Andrey Borodin, Fabrízio de Royes Mello
Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com
2024-02-16 13:37:02 -06:00
Bruce Momjian
29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Peter Eisentraut
141752bbb0 Fix typos in simplehash.h
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/18252-d46d27900a277d87@postgresql.org
2024-01-02 10:18:47 +01:00
Jeff Davis
b282fa88df simplehash: preserve consistency in case of OOM.
Compute size first, then allocate, then update the structure.

Previously, an out-of-memory when growing could leave the hashtable in
an inconsistent state.

Discussion: https://postgr.es/m/20231117201334.eyb542qr5yk4gilq@awork3.anarazel.de
Reviewed-by: Andres Freund
Reviewed-by: Gurjeet Singh
2023-11-17 13:58:16 -08:00
David Rowley
f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h

Here we introduce 2 new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.

Incompatibility note:

Here we also forego adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Nathan Bossart
c103d07381 Add function for removing arbitrary nodes in binaryheap.
This commit introduces binaryheap_remove_node(), which can be used
to remove any node from a binary heap.  The implementation is
straightforward.  The target node is replaced with the last node in
the heap, and then we sift as needed to preserve the heap property.
This new function is intended for use in a follow-up commit that
will improve the performance of pg_restore.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 14:06:08 -07:00
Nathan Bossart
5af0263afd Make binaryheap available to frontend code.
There are a couple of places in frontend code that could make use
of this simple binary heap implementation.  This commit makes
binaryheap usable in frontend code, much like commit 26aaf97b68 did
for StringInfo.  Like StringInfo, the header file is left in lib/
to reduce the likelihood of unnecessary breakage.

The frontend version of binaryheap exposes a void *-based API since
frontend code does not have access to the Datum definitions.  This
seemed like a better approach than switching all existing uses to
void * or making the Datum definitions available to frontend code.

Reviewed-by: Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 12:18:33 -07:00
Andres Freund
f0a94d81e4 Fix type of iterator variable in SH_START_ITERATE
Also add comment to make the reasoning behind the Assert() more explicit (per
Tom).

Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAocXNJ6s1VLz+hMamLAQAiewRoW17OJ6-+9GACKfj6iPQ@mail.gmail.com
Backpatch: 11-
2023-07-06 09:57:28 -07:00
Michael Paquier
ef7002dbe0 Fix various typos in code and tests
Most of these are recent, and the documentation portions are new as of
v16 so there is no need for a backpatch.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20230208155644.GM1653@telsasoft.com
2023-02-09 14:43:53 +09:00
Tom Lane
3b4ac33254 Avoid type cheats for invalid dsa_handles and dshash_table_handles.
Invent separate macros for "invalid" values of these types, so that
we needn't embed knowledge of their representations into calling code.
These are all zeroes anyway ATM, so this is not fixing any live bug,
but it makes the code cleaner and more future-proof.

I (tgl) also chose to move DSM_HANDLE_INVALID into dsm_impl.h,
since it seems like it should live beside the typedef for dsm_handle.

Hou Zhijie, Nathan Bossart, Kyotaro Horiguchi, Tom Lane

Discussion: https://postgr.es/m/OS0PR01MB5716860B1454C34E5B179B6694C99@OS0PR01MB5716.jpnprd01.prod.outlook.com
2023-01-25 11:48:38 -05:00
Andres Freund
51384cc40c Add detached node functions to ilist
These allow to test whether an element is in a list by checking whether
prev/next are NULL. Needed to replace SHMQueueIsDetached() when converting
from SHM_QUEUE to ilist.h style lists.

Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20221120055930.t6kl3tyivzhlrzu2@awork3.anarazel.de
Discussion: https://postgr.es/m/20200211042229.msv23badgqljrdg2@alap3.anarazel.de
2023-01-18 11:41:14 -08:00
Peter Eisentraut
c8ad4d8166 Constify the arguments of ilist.c/h functions
Const qualifiers ensure that we don't do something stupid in the
function implementation.  Additionally they clarify the interface.  As
an example:

    void
    slist_delete(slist_head *head, const slist_node *node)

Here one can instantly tell that node->next is not going to be set to
NULL.  Finally, const qualifiers potentially allow the compiler to do
more optimizations.  This being said, no benchmarking was done for
this patch.

The functions that return non-const pointers like slist_next_node(),
dclist_next_node() etc. are not affected by the patch intentionally.

Author: Aleksander Alekseev
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAJ7c6TM2%3D08mNKD9aJg8vEY9hd%2BG4L7%2BNvh30UiNT3kShgRgNg%40mail.gmail.com
2023-01-12 08:00:51 +01:00
Michael Paquier
33ab0a2a52 Fix typos in comments, code and documentation
While on it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes.  One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
2023-01-03 16:26:14 +09:00
Bruce Momjian
c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00