This function clears the data associated with an extended statistics
object, making the object look as if it were newly created.
The caller of this function provides the following arguments to
identify the extended stats to clear:
- The name of the relation.
- The schema name of the relation.
- The name of the extended stats object.
- The schema name of the extended stats object.
- Whether the stats are inherited or not.
The first two parameters are especially important to ensure a consistent
lookup and ACL checks for the relation on which the extended stats
object to clear is based, relying first on a RangeVar lookup where
permissions are checked without locking the relation, which is critical
to prevent denial-of-service attacks when using this kind of function
(see also 688dc6299a for a similar concern). The third to fifth
arguments identify the extended stats records to clear.
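A minimal usage sketch, assuming the function is exposed as
pg_clear_extended_stats() and takes its arguments in the order listed
above; both the name and the argument order are placeholders for
illustration:
  -- hypothetical name and argument order, matching the list above
  SELECT pg_clear_extended_stats('tbl', 'public',      -- relation name, schema
                                 'tbl_stx', 'public',  -- stats object name, schema
                                 false);               -- inherited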
This has been extracted from a larger patch by the same author, as a
piece that is useful on its own. I have rewritten large portions of
it. The tests have been extended while discussing this piece,
resulting in what this commit includes. The intention behind this
feature is to add support for the import of extended statistics across
dumps and upgrades, this change building one piece that the rest of the
changes will be able to rely on.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
This patch completes the transition to making inet_ops the default for
inet/cidr columns, rather than btree_gist's opclasses. Once we do
that, though, pg_upgrade has a big problem. A dump from an older
version will see btree_gist's opclasses as being default, so it will
not mention the opclass explicitly in CREATE INDEX commands, which
would cause the restore to create the indexes using inet_ops. Since
that's not compatible with what's actually in the files, havoc would
ensue.
This isn't readily fixable, because the CREATE INDEX command strings
are built by the older server's pg_get_indexdef() function; pg_dump
hasn't nearly enough knowledge to modify those strings successfully.
Even if we cared to put in the work to make that happen in pg_dump,
it would be counterproductive because the end goal here is to get
people off of these opclasses. Allowing such indexes to persist
through pg_upgrade wouldn't advance that goal.
Therefore, this patch just adds code to pg_upgrade to detect indexes
that would be problematic and refuse to upgrade.
There's another issue too: even without any indexes to worry about,
pg_dump in binary-upgrade mode will reproduce the "CREATE OPERATOR
CLASS ... DEFAULT" commands for btree_gist's opclasses, and those
will fail because now we have a built-in opclass that provides a
conflicting default. We could ask users to drop the btree_gist
extension altogether before upgrading, but that would carry very
severe penalties. It would affect perfectly-valid indexes for other
data types, and it would drop operators that might be relied on in
views or other database objects. Instead, put a hack in DefineOpClass
to ignore the DEFAULT clauses for these opclasses when in
binary-upgrade mode. This will result in installing a version of
btree_gist that isn't quite the version it claims to be, but that can
be fixed by issuing ALTER EXTENSION UPDATE afterwards.
Since we don't apply that hack when not in binary-upgrade mode,
it is now impossible to install any version of btree_gist
less than 1.9 via CREATE EXTENSION.
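For example, once pg_upgrade has completed, the extension's catalog
state can be brought back in line with its advertised version with:
  ALTER EXTENSION btree_gist UPDATE;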
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/2483812.1754072263@sss.pgh.pa.us
When repurposing a standby as a logical replica, pg_createsubscriber
saves for the new replica a set of configuration parameters into
postgresql.auto.conf, to force recovery patterns when the physical
replica is promoted.
While not wrong in practice, this approach can cause issues when
recovery is forced again on the logical replica or a base backup taken
from it, as the recovery parameters are not reset on the target server
once pg_createsubscriber is done with the node.
This commit aims at improving the situation by changing the way
recovery parameters are saved on the target node. Instead of writing
all the configuration to postgresql.auto.conf, this file now uses an
include_if_exists directive that points to a new file,
pg_createsubscriber.conf. This new file contains all the recovery
configuration, and is renamed to pg_createsubscriber.conf.disabled when
pg_createsubscriber exits. This approach resets the recovery
parameters, and offers the benefit of keeping a trace of the setup used
when the target node was promoted, for debugging purposes. If
pg_createsubscriber.conf cannot be renamed (unlikely scenario), a
warning is issued to inform users that a manual intervention may be
required to reset this configuration.
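Concretely, the target node now ends up with something like this
(illustrative sketch; the exact contents written by the tool may
differ):
  # appended to postgresql.auto.conf by pg_createsubscriber
  include_if_exists = 'pg_createsubscriber.conf'

  # pg_createsubscriber.conf holds the recovery parameters, and is renamed
  # to pg_createsubscriber.conf.disabled once pg_createsubscriber exits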
This commit includes a test case to demonstrate the problematic
scenario: a standby node created from a base backup of what was the
target node of pg_createsubscriber does not get confused when started.
If this new logic is removed, the test fails with the standby unable to
start due to an incorrect recovery target setup, the startup process
failing quickly with a FATAL error.
I have provided the design idea for the patch, which Alyona has written
(with some code adjustments from me). This could be considered a bug,
but after discussion it has been put into the bucket of improvements.
Redesigning pg_createsubscriber would not be acceptable in the stable
branches anyway.
Author: Alyona Vinter <dlaaren8@gmail.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrey Rudometov <unlimitedhikari@gmail.com>
Discussion: https://postgr.es/m/CAGWv16K6L6Pzm99i1KiXLjFWx2bUS3DVsR6yV87-YR9QO7xb3A@mail.gmail.com
The USER and IN GROUP clauses of CREATE ROLE are deprecated, and
commit 8e78f0a1 removed them from the CREATE ROLE syntax synopsis
in the docs. However, the CREATE USER and CREATE GROUP docs still
listed these clauses.
Since CREATE USER is equivalent to CREATE ROLE ... WITH LOGIN and
CREATE GROUP is equivalent to CREATE ROLE, their documented syntax
synopsis should match CREATE ROLE to avoid confusion.
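For reference, the documented equivalences are (role names are
illustrative):
  CREATE USER app_user;    -- same as: CREATE ROLE app_user WITH LOGIN;
  CREATE GROUP readers;    -- same as: CREATE ROLE readers;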
Therefore this commit removes the deprecated USER and IN GROUP
clauses from the CREATE USER and CREATE GROUP syntax synopses
in the docs.
Author: Japin Li <japinli@hotmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/MEAPR01MB3031C30E72EF16CFC08C8565B687A@MEAPR01MB3031.ausprd01.prod.outlook.com
Commit c7eab0e97 changed the default password_encryption setting to
'scram-sha-256', so update the example for creating a user with an
assigned password.
In addition, commit 08951a7c9 added new options that in turn pass
default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE
command, so fix this omission as well for v16 and later.
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi
Backpatch-through: 14
This new identifier type provides support for 64-bit unsigned values,
to be used in catalogs, similarly to OIDs. An advantage of a dedicated
data type is that it becomes easier to grep for it in the code when
assigning this type to a catalog attribute, linking it to dedicated
APIs and internal structures.
The following are added in this commit, with dedicated tests (a short
usage sketch follows the list):
- Casts with integer types and OID.
- btree and hash operators.
- min/max functions.
- C type with related macros and defines, named around "Oid8".
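A minimal sketch of the SQL-level pieces, assuming the type is exposed
under the name "oid8" (the name mirrors the C naming above and is an
assumption here):
  -- casts from integer types and OID, plus min()/max() support
  SELECT 42::int8::oid8 AS from_int8,
         '100'::oid::oid8 AS from_oid;
  SELECT min(v), max(v) FROM (VALUES (1::oid8), (2::oid8)) AS s(v);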
This has been mentioned as useful on its own on the thread to add
support for 64-bit TOAST values, so that it becomes possible to attach
this data type to the TOAST code and catalog definitions. However, as
this concept can apply to many more areas, it is implemented as its own
independent change. This is based on a discussion with Andres Freund
and Tom Lane.
Bump catalog version.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>
Discussion: https://postgr.es/m/1891064.1754681536@sss.pgh.pa.us
Background workers gain a new flag, called BGWORKER_INTERRUPTIBLE, that
makes it possible to terminate such workers when they are connected to
a database involved in one of the following commands:
ALTER DATABASE RENAME TO
ALTER DATABASE SET TABLESPACE
CREATE DATABASE
DROP DATABASE
This is useful to give background workers the same behavior as backends
and autovacuum workers, which are stopped when these commands are
executed. The default behavior, which has existed since 9.3, is still
to never terminate bgworkers connected to the database involved in any
of these commands. The new flag has to be set to terminate the workers.
A couple of tests are added to worker_spi to track the commands that
impact the termination of the workers. Additionally, there is a test
case for a non-interruptible worker that relies on an injection point
to reduce the wait time in CountOtherDBBackends() from 5s to 0.3s for
faster test runs. The tests rely on the contents of the server logs to
check if a worker has been started or terminated:
- LOG generated by worker_spi_main() at startup, once the connection to
the database is done.
- FATAL in bgworker_die() when terminated.
A couple of tests run in the CI have shown that this method is stable
enough. The safe_psql() calls that scan pg_stat_activity could be
replaced with some poll_query_until() for more stability, if the
current method proves to be an issue in the buildfarm.
Author: Aya Iwata <iwata.aya@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/OS7PR01MB11964335F36BE41021B62EAE8EAE4A@OS7PR01MB11964.jpnprd01.prod.outlook.com
It's been almost a year since we last did this, and upstream has
been busy. They've added stemmers for Polish and Esperanto,
and also deprecated their old Dutch stemmer in favor of the
Kraaij-Pohlmann algorithm. (The "dutch" stemmer is now the
latter, and "dutch_porter" is the old algorithm.)
Upstream also decided to rename their internal header "header.h"
to something less generic: "snowball_runtime.h". Seems like a good
thing, but it complicates this patch a bit because we were relying on
interposing our own version of "header.h" to control system header
inclusion order. (We're partially failing at that now, because now the
generated stemmer files include <stddef.h> before snowball_runtime.h.
I think that'll be okay, but if the buildfarm complains then we'll
have to do more-extensive editing of the generated files.)
I realized that we weren't documenting the available stemmers in
any user-visible place, except indirectly through sample \dFd output.
That's incomplete because we only provide built-in dictionaries for
the recommended stemmers for each language, not alternative stemmers
such as dutch_porter. So I added a list to the documentation.
I did not do anything with the stopword lists. If those are still
available from snowballstem.org, they are mighty well hidden.
Discussion: https://postgr.es/m/1185975.1767569534@sss.pgh.pa.us
This commit extends the WAIT FOR LSN command with an optional MODE option in
the WITH clause that specifies which LSN type to wait for:
WAIT FOR LSN '<lsn>' [WITH (MODE '<mode>', ...)]
where mode can be:
- 'standby_replay' (default): Wait for WAL to be replayed to the specified
LSN,
- 'standby_write': Wait for WAL to be written (received) to the specified
LSN,
- 'standby_flush': Wait for WAL to be flushed to disk at the specified LSN,
- 'primary_flush': Wait for WAL to be flushed to disk on the primary server.
The default mode is 'standby_replay', matching the original behavior when MODE
is not specified. This follows the pattern used by COPY and EXPLAIN
commands, where options are specified as string values in the WITH clause.
Modes are explicitly named to distinguish between primary and standby
operations:
- Standby modes ('standby_replay', 'standby_write', 'standby_flush') can only
be used during recovery (on a standby server),
- Primary mode ('primary_flush') can only be used on a primary server.
The 'standby_write' and 'standby_flush' modes are useful for scenarios where
applications need to ensure WAL has been received or persisted on the standby
without necessarily waiting for replay to complete. The 'primary_flush' mode
allows waiting for WAL to be flushed on the primary server.
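For illustration (the LSN value is arbitrary):
  -- on a standby: wait until WAL up to this LSN has been flushed locally
  WAIT FOR LSN '0/3000628' WITH (MODE 'standby_flush');
  -- on the primary: wait until WAL up to this LSN has been flushed to disk
  WAIT FOR LSN '0/3000628' WITH (MODE 'primary_flush');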
This commit also includes:
- Documentation updates for the new syntax and mode descriptions,
- Test coverage for all four modes, including error cases and concurrent
waiters,
- Wakeup logic in walreceiver for standby write/flush waiters,
- Wakeup logic in WAL writer for primary flush waiters.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>
Add a new "location" column to the pg_available_extensions and
pg_available_extension_versions views, exposing the directory where
the extension is located.
The default system location is shown as '$system', the same value
that can be used to configure the extension_control_path GUC.
User-defined locations are only visible to superusers; otherwise
'<insufficient privilege>' is returned as the column value, the same
behaviour that we already use in pg_stat_activity.
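For example (output values depend on the installation):
  SELECT name, default_version, location
    FROM pg_available_extensions
   ORDER BY name;
  -- '$system' for extensions under the default installation directory,
  -- '<insufficient privilege>' for user-defined locations without superuser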
I failed to resist the temptation to do a little extra editorializing of
the TAP test script.
Catalog version bumped.
Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Rohit Prasad <rohit.prasad@arm.com>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Manni Wood <manni.wood@enterprisedb.com>
Reviewed-By: Euler Taveira <euler@eulerto.com>
Reviewed-By: Quan Zongliang <quanzongliang@yeah.net>
This <note> was originally written to describe the double levels
of de-backslashing encountered when a backslash-aware string
literal is used to hold the text representation of a composite
value. It still made sense when we switched to mostly using E'...'
syntax for that type of literal. However, commit f77de4b0c mangled
it completely by changing the example literal to be SQL-standard.
The extra pass of de-backslashing described in the text doesn't
actually occur with the example as written, unless you happen to
be using standard_conforming_strings = off.
We could restore this <note> to self-consistency by reverting the
change from f77de4b0c, but on the whole I judge that its time has
passed. standard_conforming_strings = off is nearly obsolete,
and may soon be fully so. But without that, the behavior isn't
so complicated as to justify a discursive note. I observe that
the nearby section about array I/O syntax has no equivalent text,
although that syntax is equally subject to this issue.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2998401.1767038920@sss.pgh.pa.us
Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call. That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil. Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index. There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM. Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.
(This is similar to the way we were already handling TableAmRoutine
structs. This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)
This commit needs to be called out in the v19 release notes as an API
change for extension index AMs. An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
This new function exposes at SQL level some information related to
multixacts, not available until now. This data is useful for monitoring
purposes, especially for workloads that make heavy use of multixacts
(see the sketch after the list):
- num_mxids, number of MultiXact IDs in use.
- num_members, number of member entries in use.
- members_size, bytes used by these members in pg_multixact/members/.
- oldest_multixact, oldest MultiXact still needed.
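A monitoring sketch; the function name below is a placeholder for
illustration, not necessarily the name introduced by this commit:
  -- hypothetical function name; column names as listed above
  SELECT num_mxids, num_members,
         pg_size_pretty(members_size) AS members_size,
         oldest_multixact
    FROM pg_get_multixact_usage();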
This patch was originally proposed when MultiXactOffset was still
32 bits, to monitor wraparound. This part is not relevant anymore since
bd8d9c9bdf, which has widened MultiXactOffset to 64 bits. The
monitoring of disk space usage for the members is still relevant.
Some tests are added to check this function, in the shape of one
isolation test with concurrent transactions that take a ROW SHARE lock,
and some SQL tests for pg_read_all_stats. Some documentation is added
to explain patterns that can be derived from the information provided
by the function.
Bump catalog version.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.
This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed while wal_level is set to
'replica'.
When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.
While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling of logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.
A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.
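A rough illustration of the intended behavior (the displayed values are
assumptions based on the description above):
  SHOW effective_wal_level;     -- 'replica' while no logical slot exists
  SELECT pg_create_logical_replication_slot('s1', 'pgoutput');
  SHOW effective_wal_level;     -- now 'logical', bumped synchronously
  SELECT pg_drop_replication_slot('s1');
  -- reverts to 'replica' lazily, once the checkpointer processes the change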
Bump PG_CONTROL_VERSION, as this adds a new field to the CheckPoint
struct.
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
The definition of PGoauthBearerRequest uses a temporary SOCKTYPE macro
to hide the difference between Windows and Berkeley socket handles,
since we don't surface pgsocket in our public API. This macro doesn't
need to escape the header, because implementers will choose the correct
socket type based on their platform, so I #undef'd it immediately after
use.
I didn't namespace that helper, though, so if anyone else needs a
SOCKTYPE macro, libpq-fe.h will now unhelpfully get rid of it. This
doesn't seem too far-fetched, given its proximity to existing POSIX
macro names.
Add a PQ_ prefix to avoid collisions, update and improve the surrounding
documentation, and backpatch.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAOYmi%2BmrGg%2Bn_X2MOLgeWcj3v_M00gR8uz_D7mM8z%3DdX1JYVbg%40mail.gmail.com
Backpatch-through: 18
Allow pg_createsubscriber to reuse existing publications instead of
failing when they already exist on the publisher.
Previously, pg_createsubscriber would fail if any specified publication
already existed. Now, existing publications are reused as-is with their
current configuration, and non-existing publications are created
automatically with FOR ALL TABLES.
This change provides flexibility when working with mixed scenarios of
existing and new publications. Users should verify that existing
publications have the desired configuration before reusing them, and can
use --dry-run with verbose mode to see which publications will be reused
and which will be created.
Only publications created by pg_createsubscriber are cleaned up during
error cleanup operations. Pre-existing publications are preserved unless
'--clean=publications' is explicitly specified, which drops all
publications.
This feature would be helpful for pub-sub configurations where users want
to subscribe to a subset of tables from the publisher.
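For instance, a publication pre-created on the publisher for a subset
of tables is now reused as-is instead of causing a failure (names are
illustrative):
  -- run on the publisher before pg_createsubscriber
  CREATE PUBLICATION app_pub FOR TABLE public.orders, public.customers;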
Author: Shubham Khanna <khannashubham1197@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: tianbing <tian_bing_0531@163.com>
Discussion: https://postgr.es/m/CAHv8Rj%2BsxWutv10WiDEAPZnygaCbuY2RqiLMj2aRMH-H3iZwyA%40mail.gmail.com
Previously, pg_sync_replication_slots() would finish without synchronizing
slots that didn't meet requirements, rather than failing outright. This
could leave some failover slots unsynchronized if required catalog rows or
WAL segments were missing or at risk of removal, while the standby
continued removing needed data.
To address this, the function now waits for the primary slot to advance to
a position where all required data is available on the standby before
completing synchronization. It retries cyclically until all failover slots
that existed on the primary at the start of the call are synchronized.
Slots created after the function begins are not included. If the standby
is promoted during this wait, the function exits gracefully and the
temporary slots will be removed.
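Sketch of the call on the standby; with this change it only returns
once every failover slot present on the primary at call time is
synchronized, or the standby gets promoted:
  SELECT pg_sync_replication_slots();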
Author: Ajin Cherian <itsajin@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yilin Zhang <jiezhilove@126.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com