1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-07 19:06:32 +03:00
Commit Graph

210 Commits

Author SHA1 Message Date
Jeff Davis
c0ef1234df Fix whitespace in commit 005c6b833f. 2024-07-28 13:34:52 -07:00
Jeff Davis
1c461a8d8d Refactor: make default_locale internal to pg_locale.c.
Discussion: https://postgr.es/m/2228884bb1f1a02614b39f71a90c94d2cc8a3a2f.camel@j-davis.com
Reviewed-by: Peter Eisentraut, Andreas Karlsson
2024-07-28 13:07:25 -07:00
Jeff Davis
005c6b833f Change collation cache to use simplehash.h.
Speeds up text comparison expressions when using a collation other
than the database default collation. Does not affect larger operations
such as ORDER BY, because the lookup is only done once.

Discussion: https://postgr.es/m/7bb9f018d20a7b30b9a7f6231efab1b5e50c7720.camel@j-davis.com
Reviewed-by: John Naylor, Andreas Karlsson
2024-07-28 12:39:57 -07:00
Peter Eisentraut
5d2e1cc117 Replace some strtok() with strsep()
strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases.  Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.

Affected by this are parsing of:

- Stored SCRAM secrets
  ("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>")

- ICU collation attributes
  ("und@colStrength=primary;colCaseLevel=yes") for ICU older than
  version 54

- PG_COLORS environment variable
  ("error=01;31:warning=01;35:note=01;36:locus=01")

- pg_regress command-line options with comma-separated list arguments
  (--dbname, --create-role) (currently only used pg_regress_ecpg)

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-07-22 15:45:46 +02:00
Michael Paquier
b81a71aa05 Assign error codes where missing for user-facing failures
All the errors triggered in the code paths patched here would cause the
backend to issue an internal_error errcode, which is a state that should
be used only for "can't happen" situations.  However, these code paths
are reachable by the regression tests, and could be seen by users in
valid cases.  Some regression tests expect internal errcodes as they
manipulate the backend state to cause corruption (like checksums), or
use elog() because it is more convenient (like injection points), these
have no need to change.

This reduces the number of internal failures triggered in a check-world
by more than half, while providing correct errcodes for these valid
cases.

Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz
2024-07-04 09:48:40 +09:00
Peter Eisentraut
17974ec259 Revise GUC names quoting in messages again
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.

This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages.  It partially supersedes a243569bf6 and
8d9978a717, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.

Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
2024-05-17 11:44:26 +02:00
Jeff Davis
832c4f657f Remove obsolete comment.
Per suggestion from Peter, the comment was not helpful, so remove it
rather than fixing it.

Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/d9421b21-e759-4b74-a039-c487b469c1f3@eisentraut.org
2024-05-07 11:44:47 -07:00
Jeff Davis
46a44dc372 Use version for builtin collations.
Given that the version field already exists, there's little reason not
to use it. Suggestion from Peter Eisentraut.

Discussion: https://postgr.es/m/613c120a-5413-4fa7-a501-6590eae558f8@eisentraut.org
Reviewed-by: Peter Eisentraut
2024-03-29 10:53:26 -07:00
Jeff Davis
f69319f2f1 Support C.UTF-8 locale in the new builtin collation provider.
The builtin C.UTF-8 locale has similar semantics to the libc locale of
the same name. That is, code point sort order (fast, memcmp-based)
combined with Unicode semantics for character operations such as
pattern matching, regular expressions, and
LOWER()/INITCAP()/UPPER(). The character semantics are based on
Unicode simple case mappings.

The builtin provider's C.UTF-8 offers several important advantages
over libc:

 * faster sorting -- benefits from additional optimizations such as
   abbreviated keys and varstrfastcmp_c
 * faster case conversion, e.g. LOWER(), at least compared with some
   libc implementations
 * available on all platforms with identical semantics, and the
   semantics are stable, testable, and documentable within a given
   Postgres major version

Being based on memcmp, the builtin C.UTF-8 locale does not offer
natural language sort order. But it is an improvement for most use
cases that might otherwise use libc's "C.UTF-8" locale, as well as
many use cases that use libc's "C" locale.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
2024-03-19 15:24:41 -07:00
Jeff Davis
60769c62dc Fix another warning, introduced by 846311051e.
Discussion: https://postgr.es/m/3703896.1710799495@sss.pgh.pa.us
Reported-by: Tom Lane
2024-03-18 15:34:43 -07:00
Jeff Davis
846311051e Address more review comments on commit 2d819a08a1.
Based on comments from Peter Eisentraut.

 * Document CREATE DATABASE ... BUILTIN_LOCALE.
 * Determine required encoding based on locale name for CREATE
   COLLATION. Use -1 for "C" (requires catversion bump).
 * initdb output fixups.
 * Make ctype_is_c a constant true for now.
 * Fixups to ICU 010_create_database.pl test.

Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org
2024-03-18 11:58:13 -07:00
Jeff Davis
61f352ece9 Fix unreachable code warning from commit 2d819a08a1.
Found by Coverity.

Discussion: https://postgr.es/m/3422201.1710711993@sss.pgh.pa.us
Reported-by: Tom Lane
2024-03-18 09:15:47 -07:00
Jeff Davis
2d819a08a1 Introduce "builtin" collation provider.
New provider for collations, like "libc" or "icu", but without any
external dependency.

Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.

The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.

By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
2024-03-13 23:33:44 -07:00
Jeff Davis
f696c0cd5f Catalog changes preparing for builtin collation provider.
Rename pg_collation.colliculocale to colllocale, and
pg_database.daticulocale to datlocale. These names reflects that the
fields will be useful for the upcoming builtin provider as well, not
just for ICU.

This is purely a rename; no changes to the meaning of the fields.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut
2024-03-09 14:48:18 -08:00
Peter Eisentraut
dbbca2cf29 Remove unused #include's from backend .c files
as determined by include-what-you-use (IWYU)

While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that.  In some cases, a more
specific #include replaces another less specific one.

Some manual adjustments of the automatic result:

- IWYU currently doesn't know about includes that provide global
  variable declarations (like -Wmissing-variable-declarations), so
  those includes are being kept manually.

- All includes for port(ability) headers are being kept for now, to
  play it safe.

- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
  patch from exploding in size.

Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.

As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.

Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04 12:02:20 +01:00
Bruce Momjian
29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Michael Paquier
8d9978a717 Apply quotes more consistently to GUC names in logs
Quotes are applied to GUCs in a very inconsistent way across the code
base, with a mix of double quotes or no quotes used.  This commit
removes double quotes around all the GUC names that are obviously
referred to as parameters with non-English words (use of underscore,
mixed case, etc).

This is the result of a discussion with Álvaro Herrera, Nathan Bossart,
Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
2023-11-30 14:11:45 +09:00
Michael Paquier
b8f44a4779 Refactor error messages for unsupported providers in pg_locale.c
These code paths should not be reached normally, but if they are an
error with "(null)" as information for the collation provider would show
up if no locale is set, while we can assume that we are referring to
libc.

This refactors the code so as the provider is always reported even if no
locale is set.  The name of the function where the error happens is
added, while on it, as it can be helpful for debugging.

Issue introduced by d87d548cd0, so backpatch down to 16.

Author: Michael Paquier, Ranier Vilela
Reviewed-by: Jeff Davis, Kyotaro Horiguchi
Discussion: https://postgr.es/m/7073610042fcf97e1bea2ce08b7e0214b5e11094.camel@j-davis.com
Backpatch-through: 16
2023-09-14 08:35:02 +09:00
Thomas Munro
4e9fa6d56b Don't expose Windows' mbstowcs_l() and wcstombs_l().
Windows has similar functions with leading underscores.  Previously, we
provided the rename via a macro in win32_port.h.  In fact its functions
are not always good replacements for the Unix functions, since they
can't deal with UTF-8.  They are only currently used by pg_locale.c,
which is careful to redirect to other Windows routines for UTF-8.  Given
that portability hazard, it seem unlikely to be a good idea to encourage
any other code to think of these functions as being available outside
pg_locale.c.  Any code that thinks it wants these functions probably
wants our wchar2char() or char2wchar() routines instead, or it won't
actually work on Windows in UTF-8 databases.

Furthermore, some major libc implementations including glibc don't have
them (they only have the standard variants without _l), so external code
is very unlikely to require them to exist.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com
2023-07-11 09:34:22 +12:00
Peter Eisentraut
e53a611523 Message wording improvements 2023-07-10 10:47:24 +02:00
Thomas Munro
8d9a9f034e All supported systems have locale_t.
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems.  For Windows, win32_port.h redirects to a partial
implementation called _locale_t.  We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.

Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed.  With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.

Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
2023-07-09 11:55:18 +12:00
Jeff Davis
f3a01af29b ICU: do not convert locale 'C' to 'en-US-u-va-posix'.
Older versions of ICU canonicalize "C" to "en-US-u-va-posix"; but
starting in ICU version 64, the "C" locale is considered
obsolete. Postgres commit ea1db8ae70 introduced code to always
canonicalize "C" to "en-US-u-va-posix" for consistency and
convenience, but it was deemed too confusing.

This commit removes that code, so that "C" is treated like other ICU
locale names: canonicalization is attempted, and if it fails, the
behavior is controlled by icu_validation_level.

A similar change was previously committed as f7faa9976c, then reverted
due to an ICU-version-dependent test failure. This commit un-reverts
it, omitting the test because we now expect the behavior to depend on
the version of ICU being used.

Discussion: https://postgr.es/m/3a200aca-4672-4b37-fc91-5d198a323503%40eisentraut.org
Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com
2023-06-21 13:18:25 -07:00
Tom Lane
0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Jeff Davis
1c634f6647 ICU: check for U_STRING_NOT_TERMINATED_WARNING.
Fixes memory error in cases where the length of the language name
returned by uloc_getLanguage() is exactly ULOC_LANG_CAPACITY, in which
case the status is set to U_STRING_NOT_TERMINATED_WARNING.

Also check in call sites for other ICU functions that are expected to
return a C string to be safe (no bug is known at these other call
sites).

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com
2023-05-17 14:18:45 -07:00
Jeff Davis
6de31ce446 Reduce icu_validation_level default to WARNING.
Discussion: https://postgr.es/m/daa9f060aa2349ebc84444515efece49e7b32c5d.camel@j-davis.com
2023-05-17 13:18:40 -07:00
Jeff Davis
455f948b0d Revert "ICU: do not convert locale 'C' to 'en-US-u-va-posix'."
This reverts commit f7faa9976c.

Discussion: https://postgr.es/m/483826.1683582475@sss.pgh.pa.us
2023-05-08 20:50:51 -07:00
Jeff Davis
f7faa9976c ICU: do not convert locale 'C' to 'en-US-u-va-posix'.
The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com
2023-05-08 10:34:51 -07:00
Michael Paquier
8961cb9a03 Fix typos in comments
The changes done in this commit impact comments with no direct
user-visible changes, with fixes for incorrect function, variable or
structure names.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
2023-05-02 12:23:08 +09:00
Thomas Munro
7d3d72b55e Remove obsolete defense against strxfrm() bugs.
Old versions of Solaris and illumos had buffer overrun bugs in their
strxfrm() implementations.  The bugs were fixed more than a decade ago
and the relevant releases are long out of vendor support.  It's time to
remove the defense added by commit be8b06c3.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA+hUKGJ-ZPJwKHVLbqye92-ZXeLoCHu5wJL6L6HhNP7FkJ=meA@mail.gmail.com
2023-04-20 13:20:14 +12:00
Peter Geoghegan
d6f0f95a6b Harmonize some more function parameter names.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places.  These
inconsistencies were all introduced relatively recently, after the code
base had parameter name mismatches fixed in bulk (see commits starting
with commits 4274dc22 and 035ce1fe).

pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.

Like all earlier commits that cleaned up function parameter names, this
commit was written with help from clang-tidy.
2023-04-13 10:15:20 -07:00
Jeff Davis
36320cbc16 Fix MSVC warning introduced in ea1db8ae70.
Discussion: https://postgr.es/m/CA+hUKGJR1BhCORa5WdvwxztD3arhENcwaN1zEQ1Upg20BwjKWA@mail.gmail.com
Reported-by: Thomas Munro
2023-04-04 15:43:18 -07:00
Jeff Davis
ea1db8ae70 Canonicalize ICU locale names to language tags.
Convert to BCP47 language tags before storing in the catalog, except
during binary upgrade or when the locale comes from an existing
collation or template database.

The resulting language tags can vary slightly between ICU
versions. For instance, "@colBackwards=yes" is converted to
"und-u-kb-true" in older versions of ICU, and to the simpler (but
equivalent) "und-u-kb" in newer versions.

The process of canonicalizing to a language tag also understands more
input locale string formats than ucol_open(). For instance,
"fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is
ignored; effectively treating it the same as the locale "fr" and
opening the wrong collator. Canonicalization properly interprets the
language and region, resulting in the language tag "fr-CA", which can
then be understood by ucol_open().

This commit fixes a problem in prior versions due to ucol_open()
misinterpreting locale strings as described above. For instance,
creating an ICU collation with locale "fr_CA.UTF-8" would store that
string directly in the catalog, which would later be passed to (and
misinterpreted by) ucol_open(). After this commit, the locale string
will be canonicalized to language tag "fr-CA" in the catalog, which
will be properly understood by ucol_open(). Because this fix affects
the resulting collator, we cannot change the locale string stored in
the catalog for existing databases or collations; otherwise we'd risk
corrupting indexes. Therefore, only canonicalize locales for
newly-created (not upgraded) collations/databases. For similar
reasons, do not backport.

Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2023-04-04 10:38:58 -07:00
Jeff Davis
1671f990dd Validate ICU locales.
For ICU collations, ensure that the locale's language exists in ICU,
and that the locale can be opened.

Basic validation helps avoid minor mistakes and misspellings, which
often fall back to the root locale instead of the intended
locale. It's even more important to avoid such mistakes in ICU
versions 54 and earlier, where the same (misspelled) locale string
could fall back to different locales depending on the environment.

Discussion: https://postgr.es/m/11b1eeb7e7667fdd4178497aeb796c48d26e69b9.camel@j-davis.com
Discussion: https://postgr.es/m/df2efad0cae7c65180df8e5ebb709e5eb4f2a82b.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2023-03-28 16:34:29 -07:00
Jeff Davis
8b3eb0c584 Fix error inconsistency in older ICU versions.
To support older ICU versions, we rely on
icu_set_collation_attributes() to do error checking that is handled
directly by ucol_open() in newer ICU versions. Commit 3b50275b12
introduced a slight inconsistency, where the error report includes the
fixed-up locale string, rather than the locale string passed to
pg_ucol_open().

Refactor slightly so that pg_ucol_open() handles the errors from both
ucol_open() and icu_set_collation_attributes(), making it easier to
see any differences between the error reports. It also makes
pg_ucol_open() responsible for closing the UCollator on error, which
seems like the right place.

Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com
Reviewed-by: Peter Eisentraut
2023-03-28 08:24:18 -07:00
Daniel Gustafsson
d435f15fff Add SysCacheGetAttrNotNull for guaranteed not-null attrs
When extracting an attr from a cached tuple in the syscache with
SysCacheGetAttr the isnull parameter must be checked in case the
attr cannot be NULL.  For cases when this is known beforehand, a
wrapper is introduced which perform the errorhandling internally
on behalf of the caller, invoking an elog in case of a NULL attr.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/AD76405E-DB45-46B6-941F-17B1EB3A9076@yesql.se
2023-03-25 22:49:33 +01:00
Jeff Davis
a03b3b6b4a Avoid potential UCollator leak for older ICU versions.
ICU versions 53 and earlier rely on icu_set_collation_attributes() to
process the attributes in the locale string. Avoid leaking the
already-opened UCollator object if an error is encountered.

Discussion: https://postgr.es/m/04182066-7655-344a-b8b7-040b1b2490fb%40enterprisedb.com
Reviewed-by: Peter Eisentraut
2023-03-24 08:48:03 -07:00
Jeff Davis
9a24289915 pg_locale.c: change ereport() to elog().
Discussion: https://postgr.es/m/73553013-3926-0f34-0fb8-f37909fe4902@enterprisedb.com
Reported-by: Peter Eisentraut
2023-03-24 08:47:42 -07:00
Peter Eisentraut
a9bc04b211 Fix incorrect format placeholders
The fields of NLSVERSIONINFOEX are of type DWORD, which is unsigned
long, so the results of the computations being printed are also of
type unsigned long.
2023-03-24 07:21:40 +01:00
Jeff Davis
3b50275b12 Handle the "und" locale in ICU versions 54 and older.
The "und" locale is an alternative spelling of the root locale, but it
was not recognized until ICU 55. To maintain common behavior across
all supported ICU versions, check for "und" and replace with "root"
before opening.

Previously, the lack of support for "und" was dangerous, because
versions 54 and older fall back to the environment when a locale is
not found. If the user specified "und" for the language (which is
expected and documented), it could not only resolve to the wrong
collator, but it could unexpectedly change (which could lead to
corrupt indexes).

This effectively reverts commit d72900bded, which worked around the
problem for the built-in "unicode" collation, and is no longer
necessary.

Discussion: https://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com
Discussion: https://postgr.es/m/0c6fa66f2753217d2a40480a96bd2ccf023536a1.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2023-03-23 10:08:27 -07:00
Jeff Davis
a326aac8f1 Wrap ICU ucol_open().
Hide details of supporting older ICU versions in a wrapper
function. The current code only needs to handle
icu_set_collation_attributes(), but a subsequent commit will add
additional version-specific code.

Discussion: https://postgr.es/m/7ee414ad-deb5-1144-8a0e-b34ae3b71cd5@enterprisedb.com
Reviewed-by: Peter Eisentraut
2023-03-23 09:15:25 -07:00
Jeff Davis
869650fa86 Support language tags in older ICU versions (53 and earlier).
By calling uloc_canonicalize() before parsing the attributes, the
existing locale attribute parsing logic works on language tags as
well.

Fix a small memory leak, too.

Discussion: http://postgr.es/m/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2023-03-21 16:12:37 -07:00
Jeff Davis
f413941f41 Fix t_isspace(), etc., when datlocprovider=i and datctype=C.
Check whether the datctype is C to determine whether t_isspace() and
related functions use isspace() or iswspace().

Previously, t_isspace() checked whether the database default collation
was C; which is incorrect when the default collation uses the ICU
provider.

Discussion: https://postgr.es/m/79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com
Reviewed-by: Peter Eisentraut
Backpatch-through: 15
2023-03-17 12:08:46 -07:00
Peter Eisentraut
30a53b7929 Allow tailoring of ICU locales with custom rules
This exposes the ICU facility to add custom collation rules to a
standard collation.

New options are added to CREATE COLLATION, CREATE DATABASE, createdb,
and initdb to set the rules.

Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
2023-03-08 16:56:37 +01:00
Tom Lane
ded7b7bbc3 Silence more compiler warnings introduced by d87d548cd0.
Per buildfarm, there are still a couple of functions where we
get warnings from compilers that don't know that elog(ERROR)
doesn't return.
2023-02-26 14:05:03 -05:00
Jeff Davis
05fc551796 Silence compiler warnings introduced by d87d548cd0.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20230224002029.GQ1653@telsasoft.com
2023-02-24 09:11:35 -08:00
Jeff Davis
6974a8f768 Refactor to introduce pg_locale_deterministic().
Avoids the need of callers to test for NULL, and also avoids the need
to access the pg_locale_t structure directly.

Reviewed-by: Peter Eisentraut, Peter Geoghegan
Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
2023-02-23 11:17:41 -08:00
Jeff Davis
d87d548cd0 Refactor to add pg_strcoll(), pg_strxfrm(), and variants.
Offers a generally better separation of responsibilities for collation
code. Also, a step towards multi-lib ICU, which should be based on a
clean separation of the routines required for collation providers.

Callers with NUL-terminated strings should call pg_strcoll() or
pg_strxfrm(); callers with strings and their length should call the
variants pg_strncoll() or pg_strnxfrm().

Reviewed-by: Peter Eisentraut, Peter Geoghegan
Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
2023-02-23 10:55:20 -08:00
Bruce Momjian
c8e1ba736b Update copyright for 2023
Backpatch-through: 11
2023-01-02 15:00:37 -05:00
Tom Lane
0a20ff54f5 Split up guc.c for better build speed and ease of maintenance.
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation.  It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:

* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
  SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
  built-in GUC variables, along with some already-exported constant
  tables.
* GUC check/assign/show hook functions are moved to the variable's
  home module, whenever that's clearly identifiable.  A few hard-
  to-classify hooks ended up in commands/variable.c, which was
  already a home for miscellaneous GUC hook functions.

To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module.  That allowed removal of #include "guc.h"
from some existing headers.  The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.

There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments.  But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.

Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.

Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
2022-09-13 11:11:45 -04:00
Tom Lane
5c7121bcf8 Fix function-defined-but-not-used warning.
Buildfarm member jacana (MinGW) has been complaining that
get_iso_localename is defined but not used.  This is evidently
fallout from the recent removal of VS2013 support in pg_locale.c.
Rearrange the #ifs so that get_iso_localename and its subroutine
search_locale_enum won't get built on MinGW.

I also noticed that a comment in get_iso_localename cross-
referenced a comment in IsoLocaleName that isn't there anymore.
Put back what I think is the referenced material.
2022-08-06 13:32:29 -04:00