postgres

mirror of https://github.com/postgres/postgres.git synced 2025-11-06 07:49:08 +03:00

Author	SHA1	Message	Date
Bruce Momjian	50e6eb731d	Update copyright for 2025 Backpatch-through: 13	2025-01-01 11:21:55 -05:00
Jeff Davis	86a5d6006a	Refactor string case conversion into provider-specific files. Create API entry points pg_strlower(), etc., that work with any provider and give the caller control over the destination buffer. Then, move provider-specific logic into pg_locale_builtin.c, pg_locale_icu.c, and pg_locale_libc.c as appropriate. Discussion: https://postgr.es/m/7aa46d77b377428058403723440862d12a8a129a.camel@j-davis.com	2024-12-16 09:35:18 -08:00
Jeff Davis	1ba0782ce9	Perform provider-specific initialization in new functions. Reviewed-by: Andreas Karlsson Discussion: https://postgr.es/m/4548a168-62cd-457b-8d06-9ba7b985c477@proxel.se	2024-12-02 23:24:35 -08:00
Jeff Davis	e3fa2b037c	Fix unintentional behavior change in commit `e9931bfb75`. Prior to that commit, there was special case to use ASCII case mapping behavior for the libc provider with a single-byte encoding when that's the default collation. Commit `e9931bfb75` mistakenly eliminated that special case; this commit restores it. Discussion: https://postgr.es/m/01a104f0d2179d756261e90d96fd65c36ad6fcf0.camel@j-davis.com	2024-12-02 21:59:02 -08:00
Thomas Munro	1758d42446	Require ucrt if using MinGW. Historically we tolerated the absence of various C runtime library features for the benefit of the MinGW tool chain, because it used ancient msvcrt.dll for a long period of time. It now uses ucrt by default (like Windows 10+, Visual Studio 2015+), and that's the only configuration we're testing. In practice, we effectively required ucrt already in PostgreSQL 17, when commit `8d9a9f03` required _create_locale etc, first available in msvcr120.dll (Visual Studio 2013, the last of the pre-ucrt series of runtimes), and for MinGW users that practically meant ucrt because it was difficult or impossible to use msvcr120.dll. That may even not have been the first such case, but old MinGW configurations had already dropped off our testing radar so we weren't paying much attention. This commit formalizes the requirement. It also removes a couple of obsolete comments that discussed msvcrt.dll limitations, and some tests of !defined(_MSC_VER) to imply msvcrt.dll. There are many more anachronisms, but it'll take some time to figure out how to remove them all. APIs affected relate to locales, UTF-8, threads, large files and more. Thanks to Peter Eisentraut for the documentation change. It's not really necessary to talk about ucrt explicitly in such a short section, since it's the default for MinGW-w64 and MSYS2. It's enough to prune references and broken links to much older tools. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/d9e7731c-ca1b-477c-9298-fa51e135574a%40eisentraut.org	2024-11-27 23:13:45 +13:00
Álvaro Herrera	e6c32d9fad	Clean up newlines following left parentheses Most came in during the 17 cycle, so backpatch there. Some (particularly reorderbuffer.h) are very old, but backpatching doesn't seem useful. Like commits `c9d2977519`, `c4f113e8fe`.	2024-11-26 17:10:07 +01:00
Jeff Davis	3aa2373c11	Refactor the code to create a pg_locale_t into new function. Reviewed-by: Andreas Karlsson Discussion: https://postgr.es/m/59da7ee4-5e1a-4727-b464-a603c6ed84cd@proxel.se	2024-10-25 16:31:08 -07:00
Jeff Davis	66ac94cdc7	Move libc-specific code from pg_locale.c into pg_locale_libc.c. Move implementation of pg_locale_t code for libc collations into pg_locale_libc.c. Other locale-related code, such as pg_perm_setlocale(), remains in pg_locale.c for now. Discussion: https://postgr.es/m/flat/2830211e1b6e6a2e26d845780b03e125281ea17b.camel@j-davis.com	2024-10-14 12:48:43 -07:00
Jeff Davis	f244a2bb4c	Move ICU-specific code from pg_locale.c into pg_locale_icu.c. Discussion: https://postgr.es/m/flat/2830211e1b6e6a2e26d845780b03e125281ea17b.camel@j-davis.com	2024-10-14 12:13:26 -07:00
Thomas Munro	adbb27ac89	Reject non-ASCII locale names. Commit `bf03cfd1` started scanning all available BCP 47 locale names on Windows. This caused an abort/crash in the Windows runtime library if the default locale name contained non-ASCII characters, because of our use of the setlocale() save/restore pattern with "char" strings. After switching to another locale with a different encoding, the saved name could no longer be understood, and setlocale() would abort. "Turkish_Türkiye.1254" is the example from recent reports, but there are other examples of countries and languages with non-ASCII characters in their names, and they appear in Windows' (old style) locale names. To defend against this: 1. In initdb, reject non-ASCII locale names given explicity on the command line, or returned by the operating system environment with setlocale(..., ""), or "canonicalized" by the operating system when we set it. 2. In initdb only, perform the save-and-restore with Windows' non-standard wchar_t variant of setlocale(), so that it is not subject to round trip failures stemming from char string encoding confusion. 3. In the backend, we don't have to worry about the save-and-restore problem because we have already vetted the defaults, so we just have to make sure that CREATE DATABASE also rejects non-ASCII names in any new databases. SET lc_XXX doesn't suffer from the problem, but the ban applies to it too because it uses check_locale(). CREATE COLLATION doesn't suffer from the problem either, but it doesn't use check_locale() so it is not included in the new ban for now, to minimize the change. Anyone who encounters the new error message should either create a new duplicated locale with an ASCII-only name using Windows Locale Builder, or consider using BCP 47 names like "tr-TR". Users already couldn't initialize a cluster with "Turkish_Türkiye.1254" on PostgreSQL 16+, but the new failure mode is an error message that explains why, instead of a crash. Back-patch to 16, where `bf03cfd1` landed. Older versions are affected in theory too, but only 16 and later are causing crash reports. Reviewed-by: Andrew Dunstan <andrew@dunslane.net> (the idea, not the patch) Reported-by: Haifang Wang (Centific Technologies Inc) <v-haiwang@microsoft.com> Discussion: https://postgr.es/m/PH8PR21MB3902F334A3174C54058F792CE5182%40PH8PR21MB3902.namprd21.prod.outlook.com	2024-10-05 13:50:02 +13:00
Jeff Davis	ac30021356	Allow length=-1 for NUL-terminated input to pg_strncoll(), etc. Like ICU, allow a length of -1 to be specified for NUL-terminated arguments to pg_strncoll(), pg_strnxfrm(), and pg_strnxfrm_prefix(). Simplifies the code and comments. Discussion: https://postgr.es/m/2d758e07dff26bcc7cbe2aec57431329bfe3679a.camel@j-davis.com	2024-09-24 15:15:18 -07:00
Jeff Davis	ceeaaed87a	Tighten up make_libc_collator() and make_icu_collator(). Ensure that error paths within these functions do not leak a collator, and return the result rather than using an out parameter. (Error paths in the caller may still result in a leaked collator, which will be addressed separately.) In make_libc_collator(), if the first newlocale() succeeds and the second one fails, close the first locale_t object. The function make_icu_collator() doesn't have any external callers, so change it to be static. Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com	2024-09-24 12:01:45 -07:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit `e9931bfb75`. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Jeff Davis	51edc4ca54	Remove lc_ctype_is_c(). Instead always fetch the locale and look at the ctype_is_c field. hba.c relies on regexes working for the C locale without needing catalog access, which worked before due to a special case for C_COLLATION_OID in lc_ctype_is_c(). Move the special case to pg_set_regex_collation() now that lc_ctype_is_c() is gone. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-06 13:23:21 -07:00
Jeff Davis	06421b0843	Remove lc_collate_is_c(). Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-04 14:35:25 -07:00
Jeff Davis	12d3345c0d	Remember last collation to speed up collation cache. This optimization is to avoid a performance regression in an upcoming patch that will remove lc_collate_is_c(). Discussion: https://postgr.es/m/96a559be83329bc66074a3925ebcfa8ceb16dfc5.camel@j-davis.com Discussion: https://postgr.es/m/646f662e145ab38cff1c04d475f4448f53fc5042.camel@j-davis.com Discussion: https://postgr.es/m/54565933-d82f-4d7c-8f47-288b1b570fd8@eisentraut.org	2024-09-03 16:30:03 -07:00
Peter Eisentraut	a2bbc58f74	thread-safety: gmtime_r(), localtime_r() Use gmtime_r() and localtime_r() instead of gmtime() and localtime(), for thread-safety. There are a few affected calls in libpq and ecpg's libpgtypes, which are probably effectively bugs, because those libraries already claim to be thread-safe. There is one affected call in the backend. Most of the backend otherwise uses the custom functions pg_gmtime() and pg_localtime(), which are implemented differently. While we're here, change the call in the backend to gmtime() instead of localtime(), since for that use time zone behavior is irrelevant, and this side-steps any questions about when time zones are initialized by localtime_r() vs localtime(). Portability: gmtime_r() and localtime_r() are in POSIX but are not available on Windows. Windows has functions gmtime_s() and localtime_s() that can fulfill the same purpose, so we add some small wrappers around them. (Note that these _s() functions are also different from the _s() functions in the bounds-checking extension of C11. We are not using those here.) On MinGW, you can get the POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately before including <time.h>. This leads to a conflict at least in plpython because apparently _POSIX_C_SOURCE gets defined in some header there, and then our replacement definitions conflict with the system definitions. To avoid that sort of thing, we now always define _POSIX_C_SOURCE on MinGW and use the POSIX-style functions here. Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7d70@eisentraut.org	2024-08-23 07:43:04 +02:00
Thomas Munro	2724ff381a	Fix harmless LC_COLLATE[_MASK] confusion. Commit `ca051d8b10` called newlocale(LC_COLLATE, ...) instead of newlocale(LC_COLLATE_MASK, ...), in code reached only on FreeBSD. They have the same value on that OS, explaining why it worked. Fix. Back-patch to 14, where `ca051d8b10` landed.	2024-08-19 22:12:55 +12:00
Jeff Davis	a890ad2149	selfuncs.c: use pg_strxfrm() instead of strxfrm(). pg_strxfrm() takes a pg_locale_t, so it works properly with all providers. This improves estimates for ICU when performing linear interpolation within a histogram bin. Previously, convert_string_datum() always used strxfrm() and relied on setlocale(). That did not produce good estimates for non-default or non-libc collations. Discussion: https://postgr.es/m/89475ee5487d795124f4e25118ea8f1853edb8cb.camel@j-davis.com	2024-08-06 12:25:12 -07:00
Jeff Davis	e9931bfb75	Remove support for null pg_locale_t most places. Previously, passing NULL for pg_locale_t meant "use the libc provider and the server environment". Now that the database collation is represented as a proper pg_locale_t (not dependent on setlocale()), remove special cases for NULL. Leave wchar2char() and char2wchar() unchanged for now, because the callers don't always have a libc-based pg_locale_t available. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-08-05 18:31:48 -07:00
Jeff Davis	679c5084cf	Relax check for return value from second call of pg_strnxfrm(). strxfrm() is not guaranteed to return the exact number of bytes needed to store the result; it may return a higher value. Discussion: https://postgr.es/m/32f85d88d1f64395abfe5a10dd97a62a4d3474ce.camel@j-davis.com Reviewed-by: Heikki Linnakangas Backpatch-through: 16	2024-07-30 16:23:20 -07:00
Jeff Davis	72fe6d24a3	Make collation not depend on setlocale(). Now that the result of pg_newlocale_from_collation() is always non-NULL, then we can move the collate_is_c and ctype_is_c flags into pg_locale_t. That simplifies the logic in lc_collate_is_c() and lc_ctype_is_c(), removing the dependence on setlocale(). This commit also eliminates the multi-stage initialization of the collation cache. As long as we have catalog access, then it's now safe to call pg_newlocale_from_collation() without checking lc_collate_is_c() first. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-30 00:58:06 -07:00
Jeff Davis	8240401437	Do not return NULL from pg_newlocale_from_collation(). Previously, pg_newlocale_from_collation() returned NULL as a special case for the DEFAULT_COLLATION_OID if the provider was libc. In that case the behavior would depend on the last call to setlocale(). Now, consistent with the other providers, it will return a pointer to default_locale, which is not dependent on setlocale(). Note: for the C and POSIX locales, the locale_t structure within the pg_locale_t will still be zero, because those locales are implemented with internal logic and do not use libc at all. lc_collate_is_c() and lc_ctype_is_c() still depend on setlocale() to determine the current locale, which will be removed in a subsequent commit. Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-29 15:18:06 -07:00
Jeff Davis	c0ef1234df	Fix whitespace in commit `005c6b833f`.	2024-07-28 13:34:52 -07:00
Jeff Davis	1c461a8d8d	Refactor: make default_locale internal to pg_locale.c. Discussion: https://postgr.es/m/2228884bb1f1a02614b39f71a90c94d2cc8a3a2f.camel@j-davis.com Reviewed-by: Peter Eisentraut, Andreas Karlsson	2024-07-28 13:07:25 -07:00
Jeff Davis	005c6b833f	Change collation cache to use simplehash.h. Speeds up text comparison expressions when using a collation other than the database default collation. Does not affect larger operations such as ORDER BY, because the lookup is only done once. Discussion: https://postgr.es/m/7bb9f018d20a7b30b9a7f6231efab1b5e50c7720.camel@j-davis.com Reviewed-by: John Naylor, Andreas Karlsson	2024-07-28 12:39:57 -07:00
Peter Eisentraut	5d2e1cc117	Replace some strtok() with strsep() strtok() considers adjacent delimiters to be one delimiter, which is arguably the wrong behavior in some cases. Replace with strsep(), which has the right behavior: Adjacent delimiters create an empty token. Affected by this are parsing of: - Stored SCRAM secrets ("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>") - ICU collation attributes ("und@colStrength=primary;colCaseLevel=yes") for ICU older than version 54 - PG_COLORS environment variable ("error=01;31:warning=01;35:note=01;36:locus=01") - pg_regress command-line options with comma-separated list arguments (--dbname, --create-role) (currently only used pg_regress_ecpg) Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: David Steele <david@pgmasters.net> Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org	2024-07-22 15:45:46 +02:00
Michael Paquier	b81a71aa05	Assign error codes where missing for user-facing failures All the errors triggered in the code paths patched here would cause the backend to issue an internal_error errcode, which is a state that should be used only for "can't happen" situations. However, these code paths are reachable by the regression tests, and could be seen by users in valid cases. Some regression tests expect internal errcodes as they manipulate the backend state to cause corruption (like checksums), or use elog() because it is more convenient (like injection points), these have no need to change. This reduces the number of internal failures triggered in a check-world by more than half, while providing correct errcodes for these valid cases. Reviewed-by: Robert Haas Discussion: https://postgr.es/m/Zic_GNgos5sMxKoa@paquier.xyz	2024-07-04 09:48:40 +09:00
Peter Eisentraut	17974ec259	Revise GUC names quoting in messages again After further review, we want to move in the direction of always quoting GUC names in error messages, rather than the previous (PG16) wildly mixed practice or the intermittent (mid-PG17) idea of doing this depending on how possibly confusing the GUC name is. This commit applies appropriate quotes to (almost?) all mentions of GUC names in error messages. It partially supersedes `a243569bf6` and `8d9978a717`, which had moved things a bit in the opposite direction but which then were abandoned in a partial state. Author: Peter Smith <smithpb2250@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com	2024-05-17 11:44:26 +02:00
Jeff Davis	832c4f657f	Remove obsolete comment. Per suggestion from Peter, the comment was not helpful, so remove it rather than fixing it. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/d9421b21-e759-4b74-a039-c487b469c1f3@eisentraut.org	2024-05-07 11:44:47 -07:00
Jeff Davis	46a44dc372	Use version for builtin collations. Given that the version field already exists, there's little reason not to use it. Suggestion from Peter Eisentraut. Discussion: https://postgr.es/m/613c120a-5413-4fa7-a501-6590eae558f8@eisentraut.org Reviewed-by: Peter Eisentraut	2024-03-29 10:53:26 -07:00
Jeff Davis	f69319f2f1	Support C.UTF-8 locale in the new builtin collation provider. The builtin C.UTF-8 locale has similar semantics to the libc locale of the same name. That is, code point sort order (fast, memcmp-based) combined with Unicode semantics for character operations such as pattern matching, regular expressions, and LOWER()/INITCAP()/UPPER(). The character semantics are based on Unicode simple case mappings. The builtin provider's C.UTF-8 offers several important advantages over libc: * faster sorting -- benefits from additional optimizations such as abbreviated keys and varstrfastcmp_c * faster case conversion, e.g. LOWER(), at least compared with some libc implementations * available on all platforms with identical semantics, and the semantics are stable, testable, and documentable within a given Postgres major version Being based on memcmp, the builtin C.UTF-8 locale does not offer natural language sort order. But it is an improvement for most use cases that might otherwise use libc's "C.UTF-8" locale, as well as many use cases that use libc's "C" locale. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider	2024-03-19 15:24:41 -07:00
Jeff Davis	60769c62dc	Fix another warning, introduced by `846311051e`. Discussion: https://postgr.es/m/3703896.1710799495@sss.pgh.pa.us Reported-by: Tom Lane	2024-03-18 15:34:43 -07:00
Jeff Davis	846311051e	Address more review comments on commit `2d819a08a1`. Based on comments from Peter Eisentraut. * Document CREATE DATABASE ... BUILTIN_LOCALE. * Determine required encoding based on locale name for CREATE COLLATION. Use -1 for "C" (requires catversion bump). * initdb output fixups. * Make ctype_is_c a constant true for now. * Fixups to ICU 010_create_database.pl test. Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org	2024-03-18 11:58:13 -07:00
Jeff Davis	61f352ece9	Fix unreachable code warning from commit `2d819a08a1`. Found by Coverity. Discussion: https://postgr.es/m/3422201.1710711993@sss.pgh.pa.us Reported-by: Tom Lane	2024-03-18 09:15:47 -07:00
Jeff Davis	2d819a08a1	Introduce "builtin" collation provider. New provider for collations, like "libc" or "icu", but without any external dependency. Initially, the only locale supported by the builtin provider is "C", which is identical to the libc provider's "C" locale. The libc provider's "C" locale has always been treated as a special case that uses an internal implementation, without using libc at all -- so the new builtin provider uses the same implementation. The builtin provider's locale is independent of the server environment variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the database collation locale can be "C" while LC_COLLATE and LC_CTYPE are set to "en_US", which is impossible with the libc provider. By offering a new builtin provider, it clarifies that the semantics of a collation using this provider will never depend on libc, and makes it easier to document the behavior. Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider	2024-03-13 23:33:44 -07:00
Jeff Davis	f696c0cd5f	Catalog changes preparing for builtin collation provider. Rename pg_collation.colliculocale to colllocale, and pg_database.daticulocale to datlocale. These names reflects that the fields will be useful for the upcoming builtin provider as well, not just for ICU. This is purely a rename; no changes to the meaning of the fields. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Peter Eisentraut	2024-03-09 14:48:18 -08:00
Peter Eisentraut	dbbca2cf29	Remove unused #include's from backend .c files as determined by include-what-you-use (IWYU) While IWYU also suggests to add a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org	2024-03-04 12:02:20 +01:00
Bruce Momjian	29275b1d17	Update copyright for 2024 Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12	2024-01-03 20:49:05 -05:00
Michael Paquier	8d9978a717	Apply quotes more consistently to GUC names in logs Quotes are applied to GUCs in a very inconsistent way across the code base, with a mix of double quotes or no quotes used. This commit removes double quotes around all the GUC names that are obviously referred to as parameters with non-English words (use of underscore, mixed case, etc). This is the result of a discussion with Álvaro Herrera, Nathan Bossart, Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2023-11-30 14:11:45 +09:00
Michael Paquier	b8f44a4779	Refactor error messages for unsupported providers in pg_locale.c These code paths should not be reached normally, but if they are an error with "(null)" as information for the collation provider would show up if no locale is set, while we can assume that we are referring to libc. This refactors the code so as the provider is always reported even if no locale is set. The name of the function where the error happens is added, while on it, as it can be helpful for debugging. Issue introduced by `d87d548cd0`, so backpatch down to 16. Author: Michael Paquier, Ranier Vilela Reviewed-by: Jeff Davis, Kyotaro Horiguchi Discussion: https://postgr.es/m/7073610042fcf97e1bea2ce08b7e0214b5e11094.camel@j-davis.com Backpatch-through: 16	2023-09-14 08:35:02 +09:00
Thomas Munro	4e9fa6d56b	Don't expose Windows' mbstowcs_l() and wcstombs_l(). Windows has similar functions with leading underscores. Previously, we provided the rename via a macro in win32_port.h. In fact its functions are not always good replacements for the Unix functions, since they can't deal with UTF-8. They are only currently used by pg_locale.c, which is careful to redirect to other Windows routines for UTF-8. Given that portability hazard, it seem unlikely to be a good idea to encourage any other code to think of these functions as being available outside pg_locale.c. Any code that thinks it wants these functions probably wants our wchar2char() or char2wchar() routines instead, or it won't actually work on Windows in UTF-8 databases. Furthermore, some major libc implementations including glibc don't have them (they only have the standard variants without _l), so external code is very unlikely to require them to exist. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKG%2Bt_CHPzEoPnKyARJBJgE9-GxNajJo6ZuSfRK_KWFO%2B6w%40mail.gmail.com	2023-07-11 09:34:22 +12:00
Peter Eisentraut	e53a611523	Message wording improvements	2023-07-10 10:47:24 +02:00
Thomas Munro	8d9a9f034e	All supported systems have locale_t. locale_t is defined by POSIX.1-2008 and SUSv4, and available on all targeted systems. For Windows, win32_port.h redirects to a partial implementation called _locale_t. We can now remove a lot of compile-time tests for HAVE_LOCALE_T, and associated comments and dead code branches that were needed for older computers. Since configure + MinGW builds didn't detect locale_t but now we assume that all systems have it, further inconsistencies among the 3 Windows build systems were revealed. With this commit, we no longer define HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but we have logic to deal with that so that replacements are available where appropriate. Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Tristan Partin <tristan@neon.tech> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com	2023-07-09 11:55:18 +12:00
Jeff Davis	f3a01af29b	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. Older versions of ICU canonicalize "C" to "en-US-u-va-posix"; but starting in ICU version 64, the "C" locale is considered obsolete. Postgres commit `ea1db8ae70` introduced code to always canonicalize "C" to "en-US-u-va-posix" for consistency and convenience, but it was deemed too confusing. This commit removes that code, so that "C" is treated like other ICU locale names: canonicalization is attempted, and if it fails, the behavior is controlled by icu_validation_level. A similar change was previously committed as `f7faa9976c`, then reverted due to an ICU-version-dependent test failure. This commit un-reverts it, omitting the test because we now expect the behavior to depend on the version of ICU being used. Discussion: https://postgr.es/m/3a200aca-4672-4b37-fc91-5d198a323503%40eisentraut.org Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-06-21 13:18:25 -07:00
Tom Lane	0245f8db36	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql	2023-05-19 17:24:48 -04:00
Jeff Davis	1c634f6647	ICU: check for U_STRING_NOT_TERMINATED_WARNING. Fixes memory error in cases where the length of the language name returned by uloc_getLanguage() is exactly ULOC_LANG_CAPACITY, in which case the status is set to U_STRING_NOT_TERMINATED_WARNING. Also check in call sites for other ICU functions that are expected to return a C string to be safe (no bug is known at these other call sites). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com	2023-05-17 14:18:45 -07:00
Jeff Davis	6de31ce446	Reduce icu_validation_level default to WARNING. Discussion: https://postgr.es/m/daa9f060aa2349ebc84444515efece49e7b32c5d.camel@j-davis.com	2023-05-17 13:18:40 -07:00
Jeff Davis	455f948b0d	Revert "ICU: do not convert locale 'C' to 'en-US-u-va-posix'." This reverts commit `f7faa9976c`. Discussion: https://postgr.es/m/483826.1683582475@sss.pgh.pa.us	2023-05-08 20:50:51 -07:00
Jeff Davis	f7faa9976c	ICU: do not convert locale 'C' to 'en-US-u-va-posix'. The conversion was intended to be for convenience, but it's more likely to be confusing than useful. The user can still directly specify 'en-US-u-va-posix' if desired. Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com Discussion: https://postgr.es/m/37520ec1ae9591f83132f82dbd625f3fc2d69c16.camel@j-davis.com	2023-05-08 10:34:51 -07:00

1 2 3 4 5

233 Commits