When checking for expression indexes that may be affected by a Unicode
update during upgrade, check for a few more functions. Specifically,
check for documented regexp functions, as well as the new CASEFOLD()
function.
Also, fully-qualify references to pg_catalog.text and
pg_catalog.regtype.
Discussion: https://postgr.es/m/399b656a3abb0c9283538a040f72199c0601525c.camel@j-davis.com
Backpatch-through: 18
With tables defined like this,
CREATE TABLE ip (id int PRIMARY KEY);
CREATE TABLE ic (id int) INHERITS (ip);
ALTER TABLE ic ALTER id DROP NOT NULL;
pg_upgrade fails during the schema restore phase due to this error:
ERROR: column "id" in child table must be marked NOT NULL
This can only be fixed by marking the child column as NOT NULL before
the upgrade, which could take an arbitrary amount of time (because ic's
data must be scanned). Have pg_upgrade's check mode warn if that
condition is found, so that users know what to adjust before running the
upgrade for real.
Author: Ali Akbar <the.apaan@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/CACQjQLoMsE+1pyLe98pi0KvPG2jQQ94LWJ+PTiLgVRK4B=i_jg@mail.gmail.com
Since pg_upgrade does not transfer the cumulative statistics used
to trigger autovacuum and autoanalyze, the server may take much
longer than expected to process them post-upgrade. Currently, we
recommend analyzing only relations for which optimizer statistics
were not transferred by using the --analyze-in-stages and
--missing-stats-only options. This commit appends another
recommendation to analyze all relations to update the relevant
cumulative statistics by using the --analyze-only option. This is
similar to the recommendation for pg_stat_reset().
Reported-by: Christoph Berg <myon@debian.org>
Reviewed-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aAfxfKC82B9NvJDj%40msg.df7cb.de
This function was initializing the "task" variable before a couple
of early returns. To fix, postpone the initialization until just
before it's needed.
Per Coverity.
Discussion: https://postgr.es/m/Z_KMsUH2-FEbiNjC%40nathan
This new option instructs pg_upgrade to move the data directories
from the old cluster to the new cluster and then to replace the
catalog files with those generated for the new cluster. This mode
can outperform --link, --clone, --copy, and --copy-file-range,
especially on clusters with many relations.
However, this mode creates many garbage files in the old cluster,
which can prolong the file synchronization step if
--sync-method=syncfs is used. To handle that, we recommend using
--sync-method=fsync with this mode, and pg_upgrade internally uses
"initdb --sync-only --no-sync-data-files" for file synchronization.
pg_upgrade will synchronize the catalog files as they are
transferred. We assume that the database files transferred from
the old cluster were synchronized prior to upgrade.
This mode also complicates reverting to the old cluster, so we
recommend restoring from backup upon failure during or after file
transfer. We did consider teaching pg_upgrade how to generate a
revert script for such failures, but we decided against it due to
the rarity of failing during file transfer, the complexity of
generating the script, and the potential for misusing the script.
The new mode is limited to clusters located in the same file
system. With some effort, we could probably support upgrades
between different file systems, but this mode is unlikely to offer
much benefit if we have to copy the files across file system
boundaries.
It is also limited to upgrades from version 10 or newer. There are
a few known obstacles for using swap mode to upgrade from older
versions. For example, the visibility map format changed in v9.6,
and the sequence tuple format changed in v10. In fact, swap mode
omits the --sequence-data option in its uses of pg_dump and instead
reuses the old cluster's sequence data files. While teaching swap
mode to deal with these kinds of changes is surely possible (and we
may have to deal with similar problems in the future, anyway), it
doesn't seem worth the effort to support upgrades from
long-unsupported versions.
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
This commit introduces a new GUC option max_active_replication_origins
to control the maximum number of active replication
origins. Previously, this was controlled by
'max_replication_slots'. Having a separate GUC option provides better
flexibility for setting up subscribers, as they may not require
replication slots (for cascading replication) but always require
replication origins.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Discussion: https://postgr.es/m/b81db436-8262-4575-b7c4-bc0c1551000b@app.fastmail.com
Now that pg_upgrade can carry over most optimizer statistics, we
should recommend using vacuumdb's new --missing-stats-only option
to only analyze relations that are missing statistics.
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
This change adds a new option --set-char-signedness to pg_upgrade. It
enables user to set arbitrary signedness during pg_upgrade. This helps
cases where user who knew they copied the v17 source cluster from
x86 (signedness=true) to ARM (signedness=false) can pg_upgrade
properly without the prerequisite of acquiring an x86 VM.
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
I wanted to avoid adjusting this code too much when converting
these tasks to use the new parallelization framework (see commit
40e2e5e92b), which is why this is being done as a follow-up commit.
These stylistic adjustments result in fewer lines of code and fewer
levels of indentation in some places.
While at it, add names to the UpgradeTaskSlotState enum and the
UpgradeTaskSlot struct. I'm not aware of any established project
policy in this area, but let's at least be consistent within the
same file.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/ZunW7XHLd2uTts4f%40nathan
There's no need to add another level of indentation to this status
message. pg_log() will put it in the right place.
Oversight in commit 347758b120.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/ZunW7XHLd2uTts4f%40nathan
Backpatch-through: 17
This commit makes use of the new task framework in pg_upgrade to
parallelize the check for incompatible user-defined encoding
conversions, i.e., those defined on servers older than v14. This
step will now process multiple databases concurrently when
pg_upgrade's --jobs option is provided a value greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the check for tables declared WITH OIDS. This step
will now process multiple databases concurrently when pg_upgrade's
--jobs option is provided a value greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the check for usage of incompatible polymorphic
functions, i.e., those with arguments of type anyarray/anyelement
rather than the newer anycompatible variants. This step will now
process multiple databases concurrently when pg_upgrade's --jobs
option is provided a value greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the check for user-defined postfix operators. This
step will now process multiple databases concurrently when
pg_upgrade's --jobs option is provided a value greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the check for contrib/isn functions that rely on the
bigint data type. This step will now process multiple databases
concurrently when pg_upgrade's --jobs option is provided a value
greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the checks for incompatible data types, i.e., data
types whose on-disk format has changed, data types that have been
removed, etc. This step will now process multiple databases
concurrently when pg_upgrade's --jobs option is provided a value
greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
This commit makes use of the new task framework in pg_upgrade to
parallelize the part of check_old_cluster_subscription_state() that
verifies each of the subscribed tables is in the 'i' (initialize)
or 'r' (ready) state. This check will now process multiple
databases concurrently when pg_upgrade's --jobs option is provided
a value greater than 1.
Reviewed-by: Daniel Gustafsson, Ilya Gladyshev
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
At the moment, pg_upgrade stores whether it is doing a "live check"
(i.e., the user specified --check and the old server is still
running) in a local variable scoped to main(). This live_check
variable is passed to several functions. To further complicate
matters, a few call sites provide a hard-coded "false" as the
live_check argument. Specifically, this is done when calling these
functions for the new cluster, for which any live-check-only paths
won't apply.
This commit moves the live_check variable to the global user_opts
variable, which stores information about the options the user
specified on the command line. This allows us to remove the
live_check parameter from several functions. For the functions
with callers that provide a hard-coded "false" as the live_check
argument (e.g., get_control_data()), we verify the given cluster is
the old cluster before taking any live-check-only paths.
This small refactoring effort helps simplify some proposed changes
that would parallelize many of pg_upgrade's once-in-each-database
tasks using libpq's asynchronous APIs. By removing the live_check
parameter, we can more easily convert the functions to callbacks
for the new parallel system.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
Presently, pg_upgrade obtains the number of subscriptions in the
to-be-upgraded cluster by first querying pg_subscription in every
database for the number of subscriptions in only that database.
Then, in count_old_cluster_subscriptions(), it adds all the values
collected in the first step. This is expensive, especially when
there are many databases.
Fortunately, there is a better way to retrieve the subscription
count. Since pg_subscription is a shared catalog, we only need to
connect to a single database and query it once. This commit
modifies pg_upgrade to use that approach, which also allows us to
trim several lines of code. In passing, move the call to
get_db_subscription_count(), which has been renamed to
get_subscription_count(), from get_db_rel_and_slot_infos() to the
dedicated >= v17 section in check_and_dump_old_cluster().
We may be able to make similar improvements to
get_old_cluster_logical_slot_infos(), but that is left as a future
exercise.
Reviewed-by: Michael Paquier, Amit Kapila
Discussion: https://postgr.es/m/ZprQJv_TxccN3tkr%40nathan
Backpatch-through: 17
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and reformat-dat-files too.
Coverity identified this as a resource leak. It's surely of no
consequence given that the function is called only once per run, but
freeing the storage is no more work than dismissing the complaint.
Minor oversight in commit 347758b12.
The checks for data type usage were each connecting to all databases
in the cluster and running their query. On clusters which have a lot
of databases this can become unnecessarily expensive. This moves the
checks to run in a single connection instead to minimize setup and
teardown overhead.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/BB4C76F-D416-4F9F-949E-DBE950D37787@yesql.se
Most callers of strerror() are removed from the backend code. The
remaining callers require special handling with a saved errno from a
previous system call. The frontend code still needs strerror() where
error states need to be handled outside of fprintf.
Note that pg_regress is not changed to use %m as the TAP output may
clobber errno, since those functions call fprintf() and friends before
evaluating the format string.
Support for %m in src/port/snprintf.c has been added in d6c55de1f9,
hence all the stable branches currently supported include it.
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
The copy_file_range() system call is available on at least Linux and
FreeBSD, and asks the kernel to use efficient ways to copy ranges of a
file. Options available to the kernel include sharing block ranges
(similar to --clone mode), and pushing down block copies to the storage
layer.
For automated testing, see PG_TEST_PG_UPGRADE_MODE. (Perhaps in a later
commit we could consider setting this mode for one of the CI targets.)
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGKe7Hb0-UNih8VD5UNZy5-ojxFb3Pr3xSBBL8qj2M2%3DdQ%40mail.gmail.com
This feature will allow us to replicate the changes on subscriber nodes
after the upgrade.
Previously, only the subscription metadata information was preserved.
Without the list of relations and their state, it's not possible to
re-enable the subscriptions without missing some records as the list of
relations can only be refreshed after enabling the subscription (and
therefore starting the apply worker). Even if we added a way to refresh
the subscription while enabling a publication, we still wouldn't know
which relations are new on the publication side, and therefore should be
fully synced, and which shouldn't.
To preserve the subscription relations, this patch teaches pg_dump to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.
The subscription's replication origin is needed to ensure that we don't
replicate anything twice.
To preserve the replication origins, this patch teaches pg_dump to update
the replication origin along with creating a subscription by using
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.
pg_upgrade will check that all the subscription relations are in 'i'
(init) or in 'r' (ready) state and will error out if that's not the case,
logging the reason for the failure. This helps to avoid the risk of any
dangling slot or origin after the upgrade.
Author: Vignesh C, Julien Rouhaud, Shlok Kyal
Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
While reading information from the old cluster, a list of logical
slots is fetched. At the later part of upgrading, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.
If the old node has invalid slots or slots with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.
The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.
Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
The new_cluster parameter in check_for_new_tablespace_dir was
shadowing the globally defined new_cluster variable, causing
compiler warnings when running with -Wshadow. The function is
only applicable to the new cluster, so remove the parameter
rather than rename to match check_new_cluster_is_empty which
also only applies to the new cluster.
Author: Peter Smith <peter.b.smith@fujitsu.com>
Discussion: https://postgr.es/m/CAHut+PvS_PHLntWy1yTgXv0O1tWm4iVcKBQFzpoQRDsm2Ce_Fg@mail.gmail.com
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
The check for the number of roles in the target cluster for an upgrade
selects the existing roles and performs a COUNT(*) over the result. A
value of one is the expected query result value indicating that only
the install user is present in the new cluster. The result was converted
with the function for converting a string containing an Oid into a numeric,
which avoids potential overflow but makes the code less readable since
it's not actually an Oid at all.
Discussion: https://postgr.es/m/41AB5F1F-4389-4B25-9668-5C430375836C@yesql.se
Previously, pg_upgrade checked that the old and new clusters were
compatible, including the locale and encoding. But the new cluster was
just created, and only template0 from the new cluster will be
preserved (template1 and postgres are both recreated during the
upgrade process).
Because template0 is not sensitive to locale or encoding, just update
the pg_database entry to be the same as template0 from the original
cluster.
This commit makes it easier to change the default initdb locale or
encoding settings without causing needless incompatibilities.
Discussion: https://postgr.es/m/d62b2874-729b-d26a-2d0a-0d64f509eca4@enterprisedb.com
Reviewed-by: Peter Eisentraut