pg_stat_statements produces multiple entries for queries like
SELECT something FROM table WHERE col IN (1, 2, 3, ...)
depending on the number of parameters, because every element of
ArrayExpr is individually jumbled. Most of the time that's undesirable,
especially if the list becomes too large.
Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually. This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.
The default value is off, meaning the previous behavior is maintained.
Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
The previous commit introduced the infrastructure to start io_workers. This
commit actually makes the workers execute IOs.
IO workers consume IOs from a shared memory submission queue, run traditional
synchronous system calls, and perform the shared completion handling
immediately. Client code submits most requests by pushing IOs into the
submission queue, and waits (if necessary) using condition variables. Some
IOs cannot be performed in another process due to lack of infrastructure for
reopening the file, and must processed synchronously by the client code when
submitted.
For now the default io_method is changed to "worker". We should re-evaluate
that around beta1, we might want to be careful and set the default to "sync"
for 18.
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
This commit contains the basic, system-wide, infrastructure for
io_method=worker. It does not yet actually execute IO, this commit just
provides the infrastructure for running IO workers, kept separate for easier
review.
The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually
we'd like to make the number of workers dynamically scale up/down based on the
current "IO load".
To allow the number of IO workers to be increased without a restart, we need
to reserve PGPROC entries for the workers unconditionally. This has been
judged to be worth the cost. If it turns out to be problematic, we can
introduce a PGC_POSTMASTER GUC to control the maximum number.
As io workers might be needed during shutdown, e.g. for AIO during the
shutdown checkpoint, a new PMState phase is added. IO workers are shut down
after the shutdown checkpoint has been performed and walsender/archiver have
shut down, but before the checkpointer itself shuts down. See also
87a6690cc69.
Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType.
Reviewed-by: Noah Misch <noah@leadboat.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
Discussion: https://postgr.es/m/20210223100344.llw5an2aklengrmn@alap3.anarazel.de
Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
Since many clients default to the X25519 curve in the TLS handshake,
the fact that the server by defualt doesn't support it cause an extra
roundtrip for each TLS connection. By adding multiple curves, which
is supported since 3d1ef3a15c3eb68da, we can reduce the risk of extra
roundtrips.
Author: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/20240616234612.6cslu7nqexquvwj7@awork3.anarazel.de
Since its introduction in fc34b0d9de27a, the default
maintenance_io_concurrency has been larger than the default
effective_io_concurrency. maintenance_io_concurrency primarily
controlled prefetching done on behalf of the whole system, for
operations like recovery. Therefore it makes sense for it to have a
value equal to or greater than effective_io_concurrency, which controls
I/O concurrency for reading a relation in a bitmap heap scan.
ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase
maintenance_io_concurrency as well. For now, though, we'll keep the
defaults of effective_io_concurrency and maintenance_io_concurrency
equal to one another (16).
On fast, high IOPs systems, significantly higher values of
maintenance_io_concurrency are observably beneficial [1]. However, such
values would flood low IOPs systems and increase overall system I/O
latency.
It is worth mentioning that since 9256822608f and c3e775e608f,
maintenance_io_concurrency also controls the I/O concurrency of each
vacuum worker. Since many autovacuum workers may be simultaneously
issuing I/Os, we want to keep maintenance_io_concurrency appropriately
conservative.
[1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com
Suggested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
This commit just does the minimal wiring up of the AIO subsystem, added in the
next commit, to the rest of the system. The next commit contains more details
about motivation and architecture.
This commit is kept separate to make it easier to review, separating the
changes across the tree, from the implementation of the new subsystem.
We discussed squashing this commit with the main commit before merging AIO,
but there has been a mild preference for keeping it separate.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
This commit introduces a new GUC, log_lock_failure, which controls whether
a detailed log message is produced when a lock acquisition fails. Currently,
it only supports logging lock failures caused by SELECT ... NOWAIT.
The log message includes information about all processes holding or
waiting for the lock that couldn't be acquired, helping users analyze and
diagnose the causes of lock failures.
Currently, this option does not log failures from SELECT ... SKIP LOCKED,
as that could generate excessive log messages if many locks are skipped,
causing unnecessary noise.
This mechanism can be extended in the future to support for logging
lock failures from other commands, such as LOCK TABLE ... NOWAIT.
Author: Yuki Seino <seinoyu@oss.nttdata.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54131@oss.nttdata.com
Add log_connections option 'setup_durations' which logs durations of
several key parts of connection establishment and backend setup.
For an incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. This logging provides visibility
into authentication and fork duration as well as the end-to-end
connection establishment and backend initialization time.
To make this portable, the timings captured in the postmaster (socket
creation time, fork initiation time) are passed through the
BackendStartupData.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
Convert the boolean log_connections GUC into a list GUC comprised of the
connection aspects to log.
This gives users more control over the volume and kind of connection
logging.
The current log_connections options are 'receipt', 'authentication', and
'authorization'. The empty string disables all connection logging. 'all'
enables all available connection logging.
For backwards compatibility, the most common values for the
log_connections boolean are still supported (on, off, 1, 0, true, false,
yes, no). Note that previously supported substrings of on, off, true,
false, yes, and no are no longer supported.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
Calculate the insert threshold for triggering an autovacuum of a
relation based on the number of unfrozen pages.
By only considering the unfrozen portion of the table when calculating
how many tuples to add to the insert threshold, we can trigger more
frequent vacuums of insert-heavy tables. This increases the chances of
vacuuming those pages when they still reside in shared buffers
This also increases the number of autovacuums triggered by tuples
inserted and not by wraparound risk. We prefer to freeze these pages
during insert-triggered autovacuums, as anti-wraparound vacuums are not
automatically canceled by conflicting lock requests.
We calculate the unfrozen percentage of the table using the recently
added (99f8f3fbbc8f) relallfrozen column of pg_class.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
This commit is a rework of 2421e9a51d20, about which Andres Freund has
raised some concerns as it is valuable to have both track_io_timing and
track_wal_io_timing in some cases, as the WAL write and fsync paths can
be a major bottleneck for some workloads. Hence, it can be relevant to
not calculate the WAL timings in environments where pg_test_timing
performs poorly while capturing some IO data under track_io_timing for
the non-WAL IO paths. The opposite can be also true: it should be
possible to disable the non-WAL timings and enable the WAL timings (the
previous GUC setups allowed this possibility).
track_wal_io_timing is added back in this commit, controlling if WAL
timings should be calculated in pg_stat_io for the read, fsync and write
paths, as done previously with pg_stat_wal. pg_stat_wal previously
tracked only the sync and write parts (now removed), read stats is new
data tracked in pg_stat_io, all three are aggregated if
track_wal_io_timing is enabled. The read part matters during recovery
or if a XLogReader is used.
Extra note: more control over if the types of timings calculated in
pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp).
Reported-by: Andres Freund <andres@anarazel.de>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
The four following attributes are removed from pg_stat_wal:
* wal_write
* wal_sync
* wal_write_time
* wal_sync_time
a051e71e28a1 has added an equivalent of this information in pg_stat_io
with more granularity as this now spreads across the backend types, IO
context and IO objects. So, keeping the same information in pg_stat_wal
has little benefits.
Another benefit of this commit is the removal of PendingWalStats,
simplifying an upcoming patch to add per-backend WAL statistics, which
already support IO statistics and which have access to the write/sync
stats data of WAL.
The GUC track_wal_io_timing, that was used to enable or disable the
aggregation of the write and sync timings for WAL, is also removed.
pgstat_prepare_io_time() is simplified.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
This commit implements OAUTHBEARER, RFC 7628, and OAuth 2.0 Device
Authorization Grants, RFC 8628. In order to use this there is a
new pg_hba auth method called oauth. When speaking to a OAuth-
enabled server, it looks a bit like this:
$ psql 'host=example.org oauth_issuer=... oauth_client_id=...'
Visit https://oauth.example.org/login and enter the code: FPQ2-M4BG
Device authorization is currently the only supported flow so the
OAuth issuer must support that in order for users to authenticate.
Third-party clients may however extend this and provide their own
flows. The built-in device authorization flow is currently not
supported on Windows.
In order for validation to happen server side a new framework for
plugging in OAuth validation modules is added. As validation is
implementation specific, with no default specified in the standard,
PostgreSQL does not ship with one built-in. Each pg_hba entry can
specify a specific validator or be left blank for the validator
installed as default.
This adds a requirement on libcurl for the client side support,
which is optional to build, but the server side has no additional
build requirements. In order to run the tests, Python is required
as this adds a https server written in Python. Tests are gated
behind PG_TEST_EXTRA as they open ports.
This patch has been a multi-year project with many contributors
involved with reviews and in-depth discussions: Michael Paquier,
Heikki Linnakangas, Zhihong Yu, Mahendrakar Srinivasarao, Andrey
Chudnovsky and Stephen Frost to name a few. While Jacob Champion
is the main author there have been some levels of hacking by others.
Daniel Gustafsson contributed the validation module and various bits
and pieces; Thomas Munro wrote the client side support for kqueue.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Reviewed-by: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
This commit introduces idle_replication_slot_timeout GUC that allows
inactive slots to be invalidated at the time of checkpoint. Because
checkpoints happen checkpoint_timeout intervals, there can be some lag
between when the idle_replication_slot_timeout was exceeded and when the
slot invalidation is triggered at the next checkpoint. To avoid such lags,
users can force a checkpoint to promptly invalidate inactive slots.
Note that the idle timeout invalidation mechanism is not applicable for
slots that do not reserve WAL or for slots on the standby server that are
synced from the primary server (i.e., standby slots having 'synced' field
'true'). Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.
The slots can become inactive for a long period if a subscriber is down
due to a system error or inaccessible because of network issues. If such a
situation persists, it might be more practical to recreate the subscriber
rather than attempt to recover the node and wait for it to catch up which
could be time-consuming.
Then, external tools could create replication slots (e.g., for migrations
or upgrades) that may fail to remove them if an error occurs, leaving
behind unused slots that take up space and resources. Manually cleaning
them up can be tedious and error-prone, and without intervention, these
lingering slots can cause unnecessary WAL retention and system bloat.
As the duration of idle_replication_slot_timeout is in minutes, any test
using that would be time-consuming. We are planning to commit a follow up
patch for tests by using the injection point framework.
Author: Nisha Moond <nisha.moond412@gmail.com>
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
This commit adds the amount of time spent sleeping due to
cost-based delay to the pg_stat_progress_vacuum and
pg_stat_progress_analyze system views. A new configuration
parameter named track_cost_delay_timing, which is off by default,
controls whether this information is gathered. For vacuum, the
reported value includes the sleep time of any associated parallel
workers. However, parallel workers only report their sleep time
once per second to avoid overloading the leader process.
Bumps catversion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Sergei Kornilov <sk@zsrv.org>
Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
Aggressive vacuums must scan every unfrozen tuple in order to advance
the relfrozenxid/relminmxid. Because data is often vacuumed before it is
old enough to require freezing, relations may build up a large backlog
of pages that are set all-visible but not all-frozen in the visibility
map. When an aggressive vacuum is triggered, all of these pages must be
scanned. These pages have often been evicted from shared buffers and
even from the kernel buffer cache. Thus, aggressive vacuums often incur
large amounts of extra I/O at the expense of foreground workloads.
To amortize the cost of aggressive vacuums, eagerly scan some
all-visible but not all-frozen pages during normal vacuums.
All-visible pages that are eagerly scanned and set all-frozen in the
visibility map are counted as successful eager freezes and those not
frozen are counted as failed eager freezes.
If too many eager scans fail in a row, eager scanning is temporarily
suspended until a later portion of the relation. The number of failures
tolerated is configurable globally and per table.
To effectively amortize aggressive vacuums, we cap the number of
successes as well. Capping eager freeze successes also limits the amount
of potentially wasted work if these pages are modified again before the
next aggressive vacuum. Once we reach the maximum number of blocks
successfully eager frozen, eager scanning is disabled for the remainder
of the vacuum of the relation.
Original design idea from Robert Haas, with enhancements from
Andres Freund, Tomas Vondra, and me
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
"I/O" seems more descriptive than "Asynchronous Behavior", given that some of
the GUCs in the section don't relate to anything asynchronous.
Most other abbreviations in the config sections are un-abbreviated, but
"Input/Output" seems less likely to be helpful than just IO or I/O.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/x3tlw2jk5gm3r3mv47hwrshffyw7halpczkfbk3peksxds7bvc@lguk43z3bsyq
One way autovacuum chooses tables to vacuum is by comparing the
number of updated or deleted tuples with a value calculated using
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor.
The threshold specifies the base value for comparison, and the
scale factor specifies the fraction of the table size to add to it.
This strategy ensures that smaller tables are vacuumed after fewer
updates/deletes than larger tables, which is reasonable in many
cases but can result in infrequent vacuums on very large tables.
This is undesirable for a couple of reasons, such as very large
tables incurring a huge amount of bloat between vacuums.
This new parameter provides a way to set a limit on the value
calculated with autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor so that very large tables are
vacuumed more frequently. By default, it is set to 100,000,000
tuples, but it can be disabled by setting it to -1. It can also be
adjusted for individual tables by changing storage parameters.
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Reviewed-by: Joe Conway <mail@joeconway.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Vinícius Abrahão <vinnix.bsd@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/956435f8-3b2f-47a6-8756-8c54ded61802%40dalibo.com
ca9c6a5680d consolidated most of vacuum-related GUCs' documentation into
a new subsection. It neglected, however, to reorganize
postgresql.conf.sample to match the new order. Do this now.
Reported-by: Álvaro Herrera
Discussion: https://postgr.es/m/202501110902.5banlseavz7c%40alvherre.pgsql
This commit introduces a new parameter named
autovacuum_worker_slots that controls how many autovacuum worker
slots to reserve during server startup. Modifying this new
parameter's value does require a server restart, but it should
typically be set to the upper bound of what you might realistically
need to set autovacuum_max_workers. With that new parameter in
place, autovacuum_max_workers can now be changed with a SIGHUP
(e.g., pg_ctl reload).
If autovacuum_max_workers is set higher than
autovacuum_worker_slots, a WARNING is emitted, and the server will
only start up to autovacuum_worker_slots workers at a given time.
If autovacuum_max_workers is set to a value less than the number of
currently-running autovacuum workers, the existing workers will
continue running, but no new workers will be started until the
number of running autovacuum workers drops below
autovacuum_max_workers.
Reviewed-by: Sami Imseih, Justin Pryzby, Robert Haas, Andres Freund, Yogesh Sharma
Discussion: https://postgr.es/m/20240410212344.GA1824549%40nathanxps13
MD5 has been considered to be unsuitable for use as a cryptographic
hash algorithm for some time. Furthermore, MD5 password hashes in
PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing
the username and hashed password is sufficient to authenticate.
The SCRAM-SHA-256 method added in v10 is not subject to these
problems and is considered to be superior to MD5.
This commit marks MD5 password support in PostgreSQL as deprecated
and to be removed in a future release. The documentation now
contains several deprecation notices, and CREATE ROLE and ALTER
ROLE now emit deprecation warnings when setting MD5 passwords. The
warnings can be disabled by setting the md5_password_warnings
parameter to "off".
Reviewed-by: Greg Sabino Mullane, Jim Nasby
Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan
The ordering of DISTINCT items is semantically insignificant, so we
can reorder them as needed. In fact, in the parser, we absorb the
sorting semantics of the sortClause as much as possible into the
distinctClause, ensuring that one clause is a prefix of the other.
This can help avoid a possible need to re-sort.
In this commit, we attempt to adjust the DISTINCT keys to match the
input path's pathkeys. This can likewise help avoid re-sorting, or
allow us to use incremental-sort to save efforts.
For DISTINCT ON expressions, the parser already ensures that they
match the initial ORDER BY expressions. When reordering the DISTINCT
keys, we must ensure that the resulting pathkey list matches the
initial distinctClause pathkeys.
This introduces a new GUC, enable_distinct_reordering, which allows
the optimization to be disabled if needed.
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com
The ssl_ciphers GUC can only set cipher suites for TLSv1.2, and lower,
connections. For TLSv1.3 connections a different OpenSSL API must be
used. This adds a new GUC, ssl_tls13_ciphers, which can be used to
configure a colon separated list of cipher suites to support when
performing a TLSv1.3 handshake.
Original patch by Erica Zhang with additional hacking by me.
Author: Erica Zhang <ericazhangy2021@qq.com>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/tencent_063F89FA72CCF2E48A0DF5338841988E9809@qq.com
The ssl_ecdh_curve GUC only accepts a single value, but the TLS
handshake can list multiple curves in the groups extension (the
extension has been renamed to contain more than elliptic curves).
This changes the GUC to accept a colon-separated list of curves.
This commit also renames the GUC to ssl_groups to match the new
nomenclature for the TLS extension.
Original patch by Erica Zhang with additional hacking by me.
Author: Erica Zhang <ericazhangy2021@qq.com>
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/tencent_063F89FA72CCF2E48A0DF5338841988E9809@qq.com
Since the introduction of TID store, vacuum uses far less memory in
the common case than in versions 16 and earlier. Invoking multiple
rounds of index vacuuming in turn requires a much larger table. It'd
be a good idea anyway to cover this case in regression testing, and a
lower limit is less painful for slow buildfarm animals. The reason to
do it now is to re-enable coverage of the bugfix in commit 83c39a1f7f.
For consistency, give autovacuum_work_mem the same treatment.
Suggested by Andres Freund
Tested by Melanie Plageman
Backpatch to v17, where TID store was introduced
Discussion: https://postgr.es/m/20240516205458.ohvlzis5b5tvejru@awork3.anarazel.de
Discussion: https://postgr.es/m/20240722164745.fvaoh6g6zprisqgp%40awork3.anarazel.de
The standby_slot_names GUC allows the specification of physical standby
slots that must be synchronized before the logical walsenders associated
with logical failover slots. However, for this purpose, the GUC name is
too generic.
Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Masahiko Sawada
Backpatch-through: 17
Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us
They were under "File Locations", which doesn't make sense. Move them
to Resource Usage / Memory, which matches their categorization in the
source code and in the documentation.
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
on the preliminary stage of optimization when we are still working with the
expression tree.
Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op'
is an operator which returns boolean result and has a commuter (for the case
of reverse order of constant and non-constant parts of the expression,
like 'Cn op expr').
Sometimes it can lead to not optimal plan. This is why there is a
or_to_any_transform_limit GUC. It specifies a threshold value of length of
arguments in an OR expression that triggers the OR-to-ANY transformation.
Generally, more groupable OR arguments mean that transformation will be more
likely to win than to lose.
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina <lena.ribackina@yandex.ru>
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Ranier Vilela <ranier.vf@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago. Commit 1cbbee03 recently made it
configurable but retained the traditional default. The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL. New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner. Let's make the default 2MB. That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be a considered a
small shared_buffers setting for a real system these days. Users are
still free to set the GUC to a different value.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
Break ReadBuffer() up into two steps. StartReadBuffers() and
WaitReadBuffers() give us two main advantages:
1. Multiple consecutive blocks can be read with one system call.
2. Advice (hints of future reads) can optionally be issued to the
kernel ahead of time.
The traditional ReadBuffer() function is now implemented in terms of
those functions, to avoid duplication.
A new GUC io_combine_limit is defined, and the functions for limiting
per-backend pin counts are made into public APIs. Those are provided
for use by callers of StartReadBuffers(), when deciding how many buffers
to read at once. The following commit will add a higher level mechanism
for doing that automatically with a practical interface.
With some more infrastructure in later work, StartReadBuffers() could
be extended to start real asynchronous I/O instead of just issuing
advice and leaving WaitReadBuffers() to do the work synchronously.
Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de> (some optimization tweaks)
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
This is marked PGC_SIGHUP, so it can only be set in a configuration
file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE,
so it can't be set using ALTER SYSTEM. When set to false, the
ALTER SYSTEM command is disallowed.
There was considerable concern that this would be misinterpreted as
a security feature, which it is not, because a determined superuser
has various ways of bypassing it. Hence, a lot of work has gone into
wordsmithing the documentation, in the hopes of avoiding any such
confusion.
Jelte Fennemia-Nio and Gabriele Bartolini, with wording suggestions
for the documentation from many others.
Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server making them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.
The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.
Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after physical
slots specified in 'standby_slot_names' have confirmed WAL receipt for those.
Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches. In order for this to work with good
performance, two additional changes are made:
1. the cache is divided in "banks" (to borrow terminology from CPU
caches), and algorithms such as eviction buffer search only affect
one specific bank. This forestalls the problem that linear searching
for a specific buffer across the whole cache takes too long: we only
have to search the specific bank, whose size is small. This work is
authored by Andrey Borodin.
2. Change the locking regime for the SLRU banks, so that each bank uses
a separate LWLock. This allows for increased scalability. This work
is authored by Dilip Kumar. (A part of this was previously committed as
d172b717c6f4.)
Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next. This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this. I (Álvaro) also added
lots of comments to make sure the design is sound.
The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.
The default values for these parameters are similar to the previous
sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB. (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.) The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0. These values based on
shared_buffers may need to be revisited, but that's an easy change.
There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally). However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.
The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.
A new parameter sync_replication_slots enables or disables this new
process.
On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.
A functionality to allow logical walsenders to wait for the physical will
be done in a subsequent commit.
Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
This commit adds timeout that is expected to be used as a prevention
of long-running queries. Any session within the transaction will be
terminated after spanning longer than this timeout.
However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.
Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin <amborodin@acm.org>
Author: Japin Li <japinli@hotmail.com>
Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
When evaluating a query with a multi-column GROUP BY clause, we can minimize
sort operations or avoid them if we synchronize the order of GROUP BY clauses
with the ORDER BY sort clause or sort order, which comes from the underlying
query tree. Grouping does not imply any ordering, so we can compare
the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg,
we simply compared keys in the order specified in the query. This commit
explores alternative ordering of the keys, trying to find a cheaper one.
The ordering of group keys may interact with other parts of the query, some of
which may not be known while planning the grouping. For example, there may be
an explicit ORDER BY clause or some other ordering-dependent operation higher up
in the query, and using the same ordering may allow using either incremental
sort or even eliminating the sort entirely.
The patch always keeps the ordering specified in the query, assuming the user
might have additional insights.
This introduces a new GUC enable_group_by_reordering so that the optimization
may be disabled if needed.
Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
Author: Andrei Lepikhov, Teodor Sigaev
Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov
Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu
Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina
In commit 3e51b278d I (tgl) caused initdb to leave lc_messages and
other lc_xxx GUCs commented-out in the installed postgresql.conf file
if they were going to be set to 'C'. This was a hack for cosmetic
purposes, and it was buggy because lc_messages' wired-in default is
not 'C' but '' (empty string). That led to --no-locale not having
the expected effect, since the postmaster would then obtain
lc_messages from its startup environment.
Let's just revert to the prior behavior of always de-commenting the
lc_xxx entries; the argument for changing that longstanding behavior
was weak in the first place.
Also, fix postgresql.conf.sample's erroneous claim that the default
value of lc_messages is 'C'. I suspect that was what misled me into
making this mistake in the first place.
Report and patch by Kyotaro Horiguchi. Back-patch to v16 where
the problem was introduced.
Discussion: https://postgr.es/m/20231122.162700.1995154567625541112.horikyota.ntt@gmail.com
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
This avoids the wraparound in async.c and removes the corresponding code
complexity. The maximum amount of allocated SLRU pages for NOTIFY / LISTEN
queue is now determined by the max_notify_queue_pages GUC. The default
value is 1048576. It allows to consume up to 8 GB of disk space which is
exactly the limit we had previously.
Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev
Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov
Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov
Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund
Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev
Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com
In order to troubleshoot misbehaving or buggy event triggers, the
documented advice is to enter single-user mode. In an attempt to
reduce the number of situations where single-user mode is required
(or even recommended) for non-extraordinary maintenance, this GUC
allows to temporarily suspend event triggers.
This was originally extracted from a larger patchset which aimed
at supporting event triggers on login events.
Reviewed-by: Ted Yu <yuzhihong@gmail.com>
Reviewed-by: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/9140106E-F9BF-4D85-8FC8-F2D3C094A6D9@yesql.se
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709@postgrespro.ru