1
0
mirror of https://github.com/postgres/postgres.git synced 2025-06-14 18:42:34 +03:00

Revert per-index collation version tracking feature.

Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
This commit is contained in:
Thomas Munro
2021-05-07 20:17:42 +12:00
parent a288d94c91
commit ec48314708
55 changed files with 463 additions and 1354 deletions

View File

@ -2374,6 +2374,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
<symbol>LC_CTYPE</symbol> for this collation object
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>collversion</structfield> <type>text</type>
</para>
<para>
Provider-specific version of the collation. This is recorded when the
collation is created and then checked when it is used, to detect
changes in the collation definition that could lead to data corruption.
</para></entry>
</row>
</tbody>
</tgroup>
</table>
@ -3317,18 +3328,6 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
A code defining the specific semantics of this dependency relationship; see text
</para></entry>
</row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>refobjversion</structfield> <type>text</type>
</para>
<para>
An optional version for the referenced object. Currently used for
indexes' collations (see <xref linkend="collation-versions"/>).
</para>
</entry>
</row>
</tbody>
</tgroup>
</table>

View File

@ -948,54 +948,6 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
</tip>
</sect3>
</sect2>
<sect2 id="collation-versions">
<title>Collation Versions</title>
<para>
The sort order defined by a collation is not necessarily fixed over time.
<productname>PostgreSQL</productname> relies on external libraries that
are subject to operating system upgrades, and can also differ between
servers involved in binary replication and file-system-level migration.
Persistent data structures such as B-trees that depend on sort order might
be corrupted by any resulting change.
<productname>PostgreSQL</productname> defends against this by recording the
current version of each referenced collation for any index that depends on
it in the
<link linkend="catalog-pg-depend"><structname>pg_depend</structname></link>
catalog, if the collation provider makes that information available. If the
provider later begins to report a different version, a warning will be
issued when the index is accessed, until either the
<xref linkend="sql-reindex"/> command or the
<xref linkend="sql-alterindex"/> command is used to update the version.
</para>
<para>
Version information is available from the
<literal>icu</literal> provider on all operating systems. For the
<literal>libc</literal> provider, versions are currently only available
on systems using the GNU C library (most Linux systems), FreeBSD and
Windows.
</para>
<note>
<para>
When using the GNU C library for collations, the C library's version
is used as a proxy for the collation version. Many Linux distributions
change collation definitions only when upgrading the C library, but this
approach is imperfect as maintainers are free to back-port newer
collation definitions to older C library releases.
</para>
<para>
When using Windows collations, version information is only available for
collations defined with BCP 47 language tags such as
<literal>en-US</literal>. Currently, <command>initdb</command> selects
a default locale using a traditional Windows language and country
string such as <literal>English_United States.1252</literal>. The
<literal>--lc-collate</literal> option can be used to provide an explicit
locale name in BCP 47 format.
</para>
</note>
</sect2>
</sect1>
<sect1 id="multibyte">

View File

@ -26547,9 +26547,11 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
</para>
<para>
Returns the actual version of the collation object as it is currently
installed in the operating system. <literal>null</literal> is returned
on operating systems where <productname>PostgreSQL</productname>
doesn't have support for versions.
installed in the operating system. If this is different from the
value in
<structname>pg_collation</structname>.<structfield>collversion</structfield>,
then objects depending on the collation might need to be rebuilt. See
also <xref linkend="sql-altercollation"/>.
</para></entry>
</row>

View File

@ -21,6 +21,8 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
ALTER COLLATION <replaceable>name</replaceable> REFRESH VERSION
ALTER COLLATION <replaceable>name</replaceable> RENAME TO <replaceable>new_name</replaceable>
ALTER COLLATION <replaceable>name</replaceable> OWNER TO { <replaceable>new_owner</replaceable> | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
ALTER COLLATION <replaceable>name</replaceable> SET SCHEMA <replaceable>new_schema</replaceable>
@ -86,9 +88,70 @@ ALTER COLLATION <replaceable>name</replaceable> SET SCHEMA <replaceable>new_sche
</listitem>
</varlistentry>
<varlistentry>
<term><literal>REFRESH VERSION</literal></term>
<listitem>
<para>
Update the collation's version.
See <xref linkend="sql-altercollation-notes"/> below.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="sql-altercollation-notes" xreflabel="Notes">
<title>Notes</title>
<para>
When using collations provided by the ICU library, the ICU-specific version
of the collator is recorded in the system catalog when the collation object
is created. When the collation is used, the current version is
checked against the recorded version, and a warning is issued when there is
a mismatch, for example:
<screen>
WARNING: collation "xx-x-icu" has version mismatch
DETAIL: The collation in the database was created using version 1.2.3.4, but the operating system provides version 2.3.4.5.
HINT: Rebuild all objects affected by this collation and run ALTER COLLATION pg_catalog."xx-x-icu" REFRESH VERSION, or build PostgreSQL with the right library version.
</screen>
A change in collation definitions can lead to corrupt indexes and other
problems because the database system relies on stored objects having a
certain sort order. Generally, this should be avoided, but it can happen
in legitimate circumstances, such as when
using <command>pg_upgrade</command> to upgrade to server binaries linked
with a newer version of ICU. When this happens, all objects depending on
the collation should be rebuilt, for example,
using <command>REINDEX</command>. When that is done, the collation version
can be refreshed using the command <literal>ALTER COLLATION ... REFRESH
VERSION</literal>. This will update the system catalog to record the
current collator version and will make the warning go away. Note that this
does not actually check whether all affected objects have been rebuilt
correctly.
</para>
<para>
When using collations provided by <literal>libc</literal> and
<productname>PostgreSQL</productname> was built with the GNU C library, the
C library's version is used as a collation version. Since collation
definitions typically change only with GNU C library releases, this provides
some defense against corruption, but it is not completely reliable.
</para>
<para>
Currently, there is no version tracking for the database default collation.
</para>
<para>
The following query can be used to identify all collations in the current
database that need to be refreshed and the objects that depend on them:
<programlisting><![CDATA[
SELECT pg_describe_object(refclassid, refobjid, refobjsubid) AS "Collation",
pg_describe_object(classid, objid, objsubid) AS "Object"
FROM pg_depend d JOIN pg_collation c
ON refclassid = 'pg_collation'::regclass AND refobjid = c.oid
WHERE c.collversion <> pg_collation_actual_version(c.oid)
ORDER BY 1, 2;
]]></programlisting></para>
</refsect1>
<refsect1>
<title>Examples</title>

View File

@ -25,7 +25,6 @@ ALTER INDEX [ IF EXISTS ] <replaceable class="parameter">name</replaceable> RENA
ALTER INDEX [ IF EXISTS ] <replaceable class="parameter">name</replaceable> SET TABLESPACE <replaceable class="parameter">tablespace_name</replaceable>
ALTER INDEX <replaceable class="parameter">name</replaceable> ATTACH PARTITION <replaceable class="parameter">index_name</replaceable>
ALTER INDEX <replaceable class="parameter">name</replaceable> [ NO ] DEPENDS ON EXTENSION <replaceable class="parameter">extension_name</replaceable>
ALTER INDEX <replaceable class="parameter">name</replaceable> ALTER COLLATION <replaceable class="parameter">collation_name</replaceable> REFRESH VERSION
ALTER INDEX [ IF EXISTS ] <replaceable class="parameter">name</replaceable> SET ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )
ALTER INDEX [ IF EXISTS ] <replaceable class="parameter">name</replaceable> RESET ( <replaceable class="parameter">storage_parameter</replaceable> [, ... ] )
ALTER INDEX [ IF EXISTS ] <replaceable class="parameter">name</replaceable> ALTER [ COLUMN ] <replaceable class="parameter">column_number</replaceable>
@ -113,20 +112,6 @@ ALTER INDEX ALL IN TABLESPACE <replaceable class="parameter">name</replaceable>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>ALTER COLLATION <replaceable class="parameter">collation_name</replaceable> REFRESH VERSION</literal></term>
<listitem>
<para>
Silences warnings about mismatched collation versions, by declaring
that the index is compatible with the current collation definition.
Be aware that incorrect use of this command can hide index corruption.
If you don't know whether a collation's definition has changed
incompatibly, <xref linkend="sql-reindex"/> is a safe alternative.
See <xref linkend="collation-versions"/> for more information.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>SET ( <replaceable class="parameter">storage_parameter</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )</literal></term>
<listitem>

View File

@ -27,6 +27,7 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> (
[ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
[ PROVIDER = <replaceable>provider</replaceable>, ]
[ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
[ VERSION = <replaceable>version</replaceable> ]
)
CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
</synopsis>
@ -148,6 +149,26 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable>version</replaceable></term>
<listitem>
<para>
Specifies the version string to store with the collation. Normally,
this should be omitted, which will cause the version to be computed
from the actual version of the collation as provided by the operating
system. This option is intended to be used
by <command>pg_upgrade</command> for copying the version from an
existing installation.
</para>
<para>
See also <xref linkend="sql-altercollation"/> for how to handle
collation version mismatches.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable>existing_collation</replaceable></term>

View File

@ -215,21 +215,6 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>--index-collation-versions-unknown</option></term>
<listitem>
<para>
When upgrading indexes from releases before 14 that didn't track
collation versions, <application>pg_upgrade</application>
assumes by default that the upgraded indexes are compatible with the
currently installed versions of relevant collations (see
<xref linkend="collation-versions"/>). Specify
<option>--index-collation-versions-unknown</option> to mark
them as needing to be rebuilt instead.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-?</option></term>
<term><option>--help</option></term>

View File

@ -40,15 +40,6 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
several scenarios in which to use <command>REINDEX</command>:
<itemizedlist>
<listitem>
<para>
The index depends on the sort order of a collation, and the definition
of the collation has changed. This can cause index scans to fail to
find keys that are present. See <xref linkend="collation-versions"/> for
more information.
</para>
</listitem>
<listitem>
<para>
An index has become corrupted, and no longer contains valid