ca9c6a5680 consolidated most of the vacuum-related GUCs' documentation
into a new subsection. af2317652d then enforced this order in
postgresql.conf.sample. This commit reorganizes the GUC groups in
guc_tables.c/h to match the updated ordering in the docs.
Reported-by: Álvaro Herrera
Reviewed-by: Álvaro Herrera, Alena Rybakina
Discussion: https://postgr.es/m/202501132046.m4mcvxxswznu%40alvherre.pgsql
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.
To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.
Back-patch to all supported versions.
Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
This includes all the definitions for the various PGSTAT_KIND_* values,
the range allowed for custom stats kinds and some macros related all
that.
One use-case behind this split is the possibility to use this
information for frontend tools, without having to rely on pgstat.h and a
backend footprint.
Author: Michael Paquier
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Z24fyb3ipXKR38oS@paquier.xyz
This commit changes the way pending backend statistics are tracked by
moving them into a new structure called PgStat_BackendPending, removing
PgStat_BackendPendingIO. PgStat_BackendPending currently only includes
PgStat_PendingIO for the pending I/O stats.
pgstat_flush_backend() is extended with a "flags" argument to control
which parts of the stats of a backend should be flushed.
With this refactoring, it becomes easier to plug into backend statistics
more data. A patch to add information related to WAL in this stats kind
is under discussion.
Author: Bertrand Drouvot
Discussion: https://postgr.es/m/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends. This acts as an upper-bound for the number of stats entries
that can exist at once. The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
making the stats entry exist for as long as their backend is up and
running. These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).
Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity. A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend. The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.
Auxiliary processes are not included in this set of statistics. These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.
This commit includes also pg_stat_reset_backend_stats(), function able
to reset all the stats associated to a single backend.
Bump catalog version and PGSTAT_FILE_FORMAT_ID.
Author: Bertrand Drouvot
Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, Michael Paquier, Nazir
Bilal Yavuz
Discussion: https://postgr.es/m/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
Create API entry points pg_strlower(), etc., that work with any
provider and give the caller control over the destination
buffer. Then, move provider-specific logic into pg_locale_builtin.c,
pg_locale_icu.c, and pg_locale_libc.c as appropriate.
Discussion: https://postgr.es/m/7aa46d77b377428058403723440862d12a8a129a.camel@j-davis.com
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
Remove the 'whenTaken' and 'lsn' fields from SnapshotData. After the
removal of the "snapshot too old" feature, they were never set to a
non-zero value.
This largely reverts commit 3e2f3c2e42, which added the
OldestActiveSnapshot tracking, and the init_toast_snapshot()
function. That was only required for setting the 'whenTaken' and 'lsn'
fields. SnapshotToast is now a constant again, like SnapshotSelf and
SnapshotAny. I kept a thin get_toast_snapshot() wrapper around
SnapshotToast though, to check that you have a registered or active
snapshot. That's still a useful sanity check.
Reviewed-by: Nathan Bossart, Andres Freund, Tom Lane
Discussion: https://www.postgresql.org/message-id/cd4b4f8c-e63a-41c0-95f6-6e6cd9b83f6d@iki.fi
Redefine our exact width types with standard C99 types and macros,
including int64_t, INT64_MAX, INT64_C(), PRId64 etc. We were already
using <stdint.h> types in a few places.
One complication is that Windows' <inttypes.h> uses format strings like
"%I64d", "%I32", "%I" for PRI*64, PRI*32, PTR*PTR, instead of mapping to
other standardized format strings like "%lld" etc as seen on other known
systems. Teach our snprintf.c to understand them.
This removes a lot of configure clutter, and should also allow 64-bit
numbers and other standard types to be used in localized messages
without casting.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
Commit ccd38024bc, which introduced the pg_signal_autovacuum_worker
role, added a call to pgstat_get_beentry_by_proc_number() for the
purpose of determining whether the process is an autovacuum worker.
This function calls pgstat_read_current_status(), which can be
fairly expensive and may return cached, out-of-date information.
Since we just need to look up the target backend's BackendType, and
we already know its ProcNumber, we can instead inspect the
BackendStatusArray directly, which is much less expensive and
possibly more up-to-date. There are some caveats with this
approach (which are documented in the code), but it's still
substantially better than before.
Reported-by: Andres Freund
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/ujenaa2uabzfkwxwmfifawzdozh3ljr7geozlhftsuosgm7n7q%40g3utqqyyosb6
This new field controls if entries of a stats kind should be written or
not to the on-disk pgstats file when shutting down an instance. This
affects both fixed and variable-numbered kinds.
This is useful for custom statistics by itself, and a patch is under
discussion to add a new builtin stats kind where the write of the stats
is not necessary. All the built-in stats kinds, as well as the two
custom stats kinds in the test module injection_points, set this flag to
"true" for now, so as stats entries are written to the on-disk pgstats
file.
Author: Bertrand Drouvot
Reviewed-by: Nazir Bilal Yavuz
Discussion: https://postgr.es/m/Zz7T47nHwYgeYwOe@ip-10-97-1-34.eu-west-3.compute.internal
pg_memory_is_all_zeros() is currently implemented to do only a
byte-per-byte comparison. While being sufficient for its existing
callers for pgstats entries, it could lead to performance regressions
should it be used for larger memory areas, like 8kB blocks, or even
future commits.
This commit optimizes the implementation of this function to be more
efficient for larger sizes, written in a way so that compilers can
optimize the code. This is portable across 32b and 64b architectures.
The implementation handles three cases, depending on the size of the
input provided:
* If less than sizeof(size_t), do a simple byte-by-byte comparison.
* If between sizeof(size_t) and (sizeof(size_t) * 8 - 1):
** Phase 1: byte-by-byte comparison, until the pointer is aligned.
** Phase 2: size_t comparisons, with aligned pointers, up to the last
aligned location possible.
** Phase 3: byte-by-byte comparison, until the end location.
* If more than (sizeof(size_t) * 8) bytes, this is the same as case 2
except that an additional phase is placed between Phase 1 and Phase 2,
with 8 * sizeof(size_t) comparisons using bitwise OR, to encourage
compilers to use SIMD instructions if available.
The last improvement proves to be at least 3 times faster than the
size_t comparisons, which is something currently used for the all-zero
page check in PageIsVerifiedExtended().
The optimization tricks that would encourage the use of SIMD
instructions have been suggested by David Rowley.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Ranier Vilela
Discussion: https://postgr.es/m/CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
This fixes a set of race conditions with cumulative statistics where a
shared stats entry could be dropped while it should still be valid in
the event when it is reused: an entry may refer to a different object
but requires the same hash key. This can happen with various stats
kinds, like:
- Replication slots that compute internally an index number, for
different slot names.
- Stats kinds that use an OID in the object key, where a wraparound
causes the same key to be used if an OID is used for the same object.
- As of PostgreSQL 18, custom pgstats kinds could also be an issue,
depending on their implementation.
This issue is fixed by introducing a counter called "generation" in the
shared entries via PgStatShared_HashEntry, initialized at 0 when an
entry is created and incremented when the same entry is reused, to avoid
concurrent issues on drop because of other backends still holding a
reference to it. This "generation" is copied to the local copy that a
backend holds when looking at an object, then cross-checked with the
shared entry to make sure that the entry is not dropped even if its
"refcount" justifies that if it has been reused.
This problem could show up when a backend shuts down and needs to
discard any entries it still holds, causing statistics to be removed
when they should not, or even an assertion failure. Another report
involved a failure in a standby after an OID wraparound, where the
startup process would FATAL on a "can only drop stats once", stopping
recovery abruptly. The buildfarm has been sporadically complaining
about the problem, as well, but the window is hard to reach with the
in-core tests.
Note that the issue can be reproduced easily by adding a sleep before
dshash_find() in pgstat_release_entry_ref() to enlarge the problematic
window while repeating test_decoding's isolation test oldest_xmin a
couple of times, for example, as pointed out by Alexander Lakhin.
Reported-by: Alexander Lakhin, Peter Smith
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/CAA4eK1KxuMVyAryz_Vk5yq3ejgKYcL6F45Hj9ZnMNBS-g+PuZg@mail.gmail.com
Discussion: https://postgr.es/m/17947-b9554521ad963c9c@postgresql.org
Backpatch-through: 15
We now create contype='n' pg_constraint rows for not-null constraints on
user tables. Only one such constraint is allowed for a column.
We propagate these constraints to other tables during operations such as
adding inheritance relationships, creating and attaching partitions and
creating tables LIKE other tables. These related constraints mostly
follow the well-known rules of conislocal and coninhcount that we have
for CHECK constraints, with some adaptations: for example, as opposed to
CHECK constraints, we don't match not-null ones by name when descending
a hierarchy to alter or remove it, instead matching by the name of the
column that they apply to. This means we don't require the constraint
names to be identical across a hierarchy.
The inheritance status of these constraints can be controlled: now we
can be sure that if a parent table has one, then all children will have
it as well. They can optionally be marked NO INHERIT, and then children
are free not to have one. (There's currently no support for altering a
NO INHERIT constraint into inheriting down the hierarchy, but that's a
desirable future feature.)
This also opens the door for having these constraints be marked NOT
VALID, as well as allowing UNIQUE+NOT NULL to be used for functional
dependency determination, as envisioned by commit e49ae8d3bc. It's
likely possible to allow DEFERRABLE constraints as followup work, as
well.
psql shows these constraints in \d+, though we may want to reconsider if
this turns out to be too noisy. Earlier versions of this patch hid
constraints that were on the same columns of the primary key, but I'm
not sure that that's very useful. If clutter is a problem, we might be
better off inventing a new \d++ command and not showing the constraints
in \d+.
For now, we omit these constraints on system catalog columns, because
they're unlikely to achieve anything.
The main difference to the previous attempt at this (b0e96f3119) is
that we now require that such a constraint always exists when a primary
key is in the column; we didn't require this previously which had a
number of unpalatable consequences. With this requirement, the code is
easier to reason about. For example:
- We no longer have "throwaway constraints" during pg_dump. We needed
those for the case where a table had a PK without a not-null
underneath, to prevent a slow scan of the data during restore of the
PK creation, which was particularly problematic for pg_upgrade.
- We no longer have to cope with attnotnull being set spuriously in
case a primary key is dropped indirectly (e.g., via DROP COLUMN).
Some bits of code in this patch were authored by Jian He.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: 何建 (jian he) <jian.universality@gmail.com>
Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql
A CacheInvalidateHeapTuple* callee might call
CatalogCacheInitializeCache(), which needs a relcache entry. Acquiring
a valid relcache entry might scan pg_class. Hence, to prevent
undetected LWLock self-deadlock, CacheInvalidateHeapTuple* callers must
not hold BUFFER_LOCK_EXCLUSIVE on buffers of pg_class. Move the
CacheInvalidateHeapTupleInplace() before the BUFFER_LOCK_EXCLUSIVE. No
back-patch, since I've reverted commit
243e9b40f1 from non-master branches.
Reported by Alexander Lakhin. Reviewed by Alexander Lakhin.
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
This new function tests if a memory region starting at a given location
for a defined length is made only of zeroes. This unifies in a single
path the all-zero checks that were happening in a couple of places of
the backend code:
- For pgstats entries of relation, checkpointer and bgwriter, where
some "all_zeroes" variables were previously used with memcpy().
- For all-zero buffer pages in PageIsVerifiedExtended().
This new function uses the same forward scan as the check for all-zero
buffer pages, applying it to the three pgstats paths mentioned above.
Author: Bertrand Drouvot
Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Peter Smith
Discussion: https://postgr.es/m/ZupUDDyf1hHI4ibn@ip-10-97-1-34.eu-west-3.compute.internal
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption. Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll(). All branches change the ABI of
extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions.
Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Currently, when a single relcache entry gets invalidated,
TypeCacheRelCallback() has to loop over all type cache entries to find
appropriate typentry to invalidate. Unfortunately, using the syscache here
is impossible, because this callback could be called outside a transaction
and this makes impossible catalog lookups. This is why present commit
introduces RelIdToTypeIdCacheHash to map relation OID to its composite type
OID.
We are keeping RelIdToTypeIdCacheHash entry while corresponding type cache
entry have something to clean. Therefore, RelIdToTypeIdCacheHash shouldn't
get bloat in the case of temporary tables flood.
There are many places in lookup_type_cache() where syscache invalidation,
user interruption, or even error could occur. In order to handle this, we
keep an array of in-progress type cache entries. In the case of
lookup_type_cache() interruption this array is processed to keep
RelIdToTypeIdCacheHash in a consistent state.
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru
Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov
Reviewed-by: Andrei Lepikhov, Pavel Borisov, Jian He, Alexander Lakhin
Reviewed-by: Artur Zakirov
Instead of XXX_IN_XLOCALE_H for several features XXX, let's just
include <xlocale.h> if HAVE_XLOCALE_H. The reason for the extra
complication was apparently that some old glibc systems also had an
<xlocale.h>, and you weren't supposed to include it directly, but it's
gone now (as far as I can tell it was harmless to do so anyway).
Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
The previous commit fixed some ways of losing an inplace update. It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update(). This includes MERGE commands. To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update. The
v14+ UPDATE fix needs commit 86dc90056d,
and it wasn't worth reimplementing that fix without such infrastructure.
Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.
Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
Ensure that error paths within these functions do not leak a collator,
and return the result rather than using an out parameter. (Error paths
in the caller may still result in a leaked collator, which will be
addressed separately.)
In make_libc_collator(), if the first newlocale() succeeds and the
second one fails, close the first locale_t object.
The function make_icu_collator() doesn't have any external callers, so
change it to be static.
Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com
This opens the possibility to define keys for more types of statistics
kinds in PgStat_HashKey, the first case being 8-byte query IDs for
statistics like pg_stat_statements.
This increases the size of PgStat_HashKey from 12 to 16 bytes, while
PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps
the same size due to alignment.
xl_xact_stats_item, that tracks the stats items to drop in commit WAL
records, is increased from 12 to 16 bytes. Note that individual chunks
in commit WAL records should be multiples of sizeof(int), hence 8-byte
object IDs are stored as two uint32, based on a suggestion from Heikki
Linnakangas.
While on it, the field of PgStat_HashKey is renamed from "objoid" to
"objid", as for some stats kinds this field does not refer to OIDs but
just IDs, like for replication slot stats.
This commit bumps the following format variables:
- PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file
for non-serialized stats kinds in the dshash table.
- XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item.
- Catalog version, for the SQL function pg_stat_have_stats().
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz
Remove redundant checks for locale->collate_is_c now that we always
have a valid pg_locale_t.
Also, remove pg_locale_deterministic() wrapper, which is no longer
useful after commit e9931bfb75. Just check the field directly,
consistent with other fields in pg_locale_t.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
1eff8279d added an API to tuplestore.c to allow callers to obtain
storage telemetry data. That API wasn't quite good enough for callers
that perform tuplestore_clear() as the telemetry functions only
accounted for the current state of the tuplestore, not the maximums
before tuplestore_clear() was called.
There's a pending patch that would like to add tuplestore telemetry
output to EXPLAIN ANALYZE for WindowAgg. That node type uses
tuplestore_clear() before moving to the next window partition and we
want to show the maximum space used, not the space used for the final
partition.
Reviewed-by: Tatsuo Ishii, Ashutosh Bapat
Discussion: https://postgres/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com
This commit adds two callbacks in pgstats to have a better control of
the flush timing of pgstat_report_stat(), whose operation depends on the
three PGSTAT_*_INTERVAL variables:
- have_fixed_pending_cb(), to check if a stats kind has any pending
data waiting for a flush. This is used as a fast path if there are no
pending statistics to flush, and this check is done for fixed-numbered
statistics only if there are no variable-numbered statistics to flush.
A flush will need to happen if at least one callback reports any pending
data.
- flush_fixed_cb(), to do the actual flush.
These callbacks are currently used by the SLRU, WAL and IO statistics,
generalizing the concept for all stats kinds (builtin and custom).
The SLRU and IO stats relied each on one global variable to determine
whether a flush should happen; these are now local to pgstat_slru.c and
pgstat_io.c, cleaning up a bit how the pending flush states are tracked
in pgstat.c.
pgstat_flush_io() and pgstat_flush_wal() are still required, but we do
not need to check their return result anymore.
Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi
Discussion: https://postgr.es/m/ZtaVO0N-aTwiAk3w@paquier.xyz
Instead always fetch the locale and look at the ctype_is_c field.
hba.c relies on regexes working for the C locale without needing
catalog access, which worked before due to a special case for
C_COLLATION_OID in lc_ctype_is_c(). Move the special case to
pg_set_regex_collation() now that lc_ctype_is_c() is gone.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
pgstat_initialize() is currently used by the WAL stats as a code path to
take some custom actions when a backend starts. A callback is added to
generalize the concept so as all stats kinds can do the same, for
builtin and custom kinds, if set.
Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi
Discussion: https://postgr.es/m/ZtZr1K4PLdeWclXY@paquier.xyz
To make them follow the usual naming convention where
FoobarShmemSize() calculates the amount of shared memory needed by
Foobar subsystem, and FoobarShmemInit() performs the initialization.
I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are
a little special. They need to be called before any of the other
ShmemInit() functions, because they set up the shared memory
bookkeeping itself. I also didn't rename InitProcGlobal(), because
unlike other Shmeminit functions, it's not called by individual
backends.
Reviewed-by: Andreas Karlsson
Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi
We advance origin progress during abort on successful streaming and
application of ROLLBACK in parallel streaming mode. But the origin
shouldn't be advanced during an error or unsuccessful apply due to
shutdown. Otherwise, it will result in a transaction loss as such a
transaction won't be sent again by the server.
Reported-by: Hou Zhijie
Author: Hayato Kuroda and Shveta Malik
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
The TRACE_SORT macro guarded the availability of the trace_sort GUC
setting. But it has been enabled by default ever since it was
introduced in PostgreSQL 8.1, and there have been no reports that
someone wanted to disable it. So just remove the macro to simplify
things. (For the avoidance of doubt: The trace_sort GUC is still
there. This only removes the rarely-used macro guarding it.)
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/be5f7162-7c1d-44e3-9a78-74dcaa6529f2%40eisentraut.org
The code intends to allow GUCs to be set within parallel workers
via function SET clauses, but not otherwise. However, doing so fails
for "session_authorization" and "role", because the assign hooks for
those attempt to set the subsidiary "is_superuser" GUC, and that call
falls foul of the "not otherwise" prohibition. We can't switch to
using GUC_ACTION_SAVE for this, so instead add a new GUC variable
flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set
anyway. (This is okay because is_superuser has context PGC_INTERNAL
and thus only hard-wired calls can change it. We'd need more thought
before applying the flag to other GUCs; but maybe there are other
use-cases.) This isn't the prettiest fix perhaps, but other
alternatives we thought of would be much more invasive.
While here, correct a thinko in commit 059de3ca4: when rejecting
a GUC setting within a parallel worker, we should return 0 not -1
if the ereport doesn't longjmp. (This seems to have no consequences
right now because no caller cares, but it's inconsistent.) Improve
the comments to try to forestall future confusion of the same kind.
Despite the lack of field complaints, this seems worth back-patching.
Thanks to Nathan Bossart for the idea to invent a new flag,
and for review.
Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us
This new function iterates hash entries with given hash values. This function
is designed to avoid full sequential hash search in the syscache invalidation
callbacks.
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru
Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov
Reviewed-by: Andrei Lepikhov
When pg_dump retrieves the list of database objects and performs the
data dump, there was possibility that objects are replaced with others
of the same name, such as views, and access them. This vulnerability
could result in code execution with superuser privileges during the
pg_dump process.
This issue can arise when dumping data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.
To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.
Back-patch to all supported branches.
Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
This is useful for extensions to get snapshot and shmem data for custom
cumulative statistics when these have a fixed number of objects, so as
these do not need to know about the snapshot internals, aka pgStatLocal.
An upcoming commit introducing an example template for custom cumulative
stats with fixed-numbered objects will make use of these. I have
noticed that this is useful for extension developers while hacking my
own example, actually.
Author: Michael Paquier
Reviewed-by: Dmitry Dolgov, Bertrand Drouvot
Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz
This commit adds support in the backend for $subject, allowing
out-of-core extensions to plug their own custom kinds of cumulative
statistics. This feature has come up a few times into the lists, and
the first, original, suggestion came from Andres Freund, about
pg_stat_statements to use the cumulative statistics APIs in shared
memory rather than its own less efficient internals. The advantage of
this implementation is that this can be extended to any kind of
statistics.
The stats kinds are divided into two parts:
- The in-core "builtin" stats kinds, with designated initializers, able
to use IDs up to 128.
- The "custom" stats kinds, able to use a range of IDs from 128 to 256
(128 slots available as of this patch), with information saved in
TopMemoryContext. This can be made larger, if necessary.
There are two types of cumulative statistics in the backend:
- For fixed-numbered objects (like WAL, archiver, etc.). These are
attached to the snapshot and pgstats shmem control structures for
efficiency, and built-in stats kinds still do that to avoid any
redirection penalty. The data of custom kinds is stored in a first
array in snapshot structure and a second array in the shmem control
structure, both indexed by their ID, acting as an equivalent of the
builtin stats.
- For variable-numbered objects (like tables, functions, etc.). These
are stored in a dshash using the stats kind ID in the hash lookup key.
Internally, the handling of the builtin stats is unchanged, and both
fixed and variabled-numbered objects are supported. Structure
definitions for builtin stats kinds are renamed to reflect better the
differences with custom kinds.
Like custom RMGRs, custom cumulative statistics can only be loaded with
shared_preload_libraries at startup, and must allocate a unique ID
shared across all the PostgreSQL extension ecosystem with the following
wiki page to avoid conflicts:
https://wiki.postgresql.org/wiki/CustomCumulativeStats
This makes the detection of the stats kinds and their handling when
reading and writing stats much easier than, say, allocating IDs for
stats kinds from a shared memory counter, that may change the ID used by
a stats kind across restarts. When under development, extensions can
use PGSTAT_KIND_EXPERIMENTAL.
Two examples that can be used as templates for fixed-numbered and
variable-numbered stats kinds will be added in some follow-up commits,
with tests to provide coverage.
Some documentation is added to explain how to use this plugin facility.
Author: Michael Paquier
Reviewed-by: Dmitry Dolgov, Bertrand Drouvot
Discussion: https://postgr.es/m/Zmqm9j5EO0I4W8dx@paquier.xyz
This converts
COPY_PARSE_PLAN_TREES
WRITE_READ_PARSE_PLAN_TREES
RAW_EXPRESSION_COVERAGE_TEST
into run-time parameters
debug_copy_parse_plan_trees
debug_write_read_parse_plan_trees
debug_raw_expression_coverage_test
They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS.
The compile-time symbols are kept for build farm compatibility, but
they now just determine the default value of the run-time settings.
Furthermore, support for these settings is not compiled in at all
unless assertions are enabled, or the new symbol
DEBUG_NODE_TESTS_ENABLED is defined at compile time, or any of the
legacy compile-time setting symbols are defined. So there is no
run-time overhead in production builds. (This is similar to the
handling of DISCARD_CACHES_ENABLED.)
Discussion: https://www.postgresql.org/message-id/flat/30747bd8-f51e-4e0c-a310-a6e2c37ec8aa%40eisentraut.org
Now that the result of pg_newlocale_from_collation() is always
non-NULL, then we can move the collate_is_c and ctype_is_c flags into
pg_locale_t. That simplifies the logic in lc_collate_is_c() and
lc_ctype_is_c(), removing the dependence on setlocale().
This commit also eliminates the multi-stage initialization of the
collation cache.
As long as we have catalog access, then it's now safe to call
pg_newlocale_from_collation() without checking lc_collate_is_c()
first.
Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org
Reviewed-by: Peter Eisentraut, Andreas Karlsson