int_to_roman() only accepts plain "int" input, which is fine since
we're going to produce '###############' for any value above 3999
anyway. However, the numeric and int8 variants of to_char() would
throw an error if the given input exceeded the integer range, while
the float-input variants invoked undefined-per-C-standard behavior.
Fix things so that you uniformly get '###############' for out of
range input.
Also add test cases covering this code, plus the equally-untested
EEEE, V, and PL format codes.
Discussion: https://postgr.es/m/2956175.1725831136@sss.pgh.pa.us
Instead always fetch the locale and look at the ctype_is_c field.
hba.c relies on regexes working for the C locale without needing
catalog access, which worked before due to a special case for
C_COLLATION_OID in lc_ctype_is_c(). Move the special case to
pg_set_regex_collation() now that lc_ctype_is_c() is gone.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
Digging into the history, this was not necessary even when it was
added, but might have been some time before that. In any case, there
is no use for this now.
Previously, passing NULL for pg_locale_t meant "use the libc provider
and the server environment". Now that the database collation is
represented as a proper pg_locale_t (not dependent on setlocale()),
remove special cases for NULL.
Leave wchar2char() and char2wchar() unchanged for now, because the
callers don't always have a libc-based pg_locale_t available.
Discussion: https://postgr.es/m/cfd9eb85-c52a-4ec9-a90e-a5e4de56e57d@eisentraut.org
Reviewed-by: Peter Eisentraut, Andreas Karlsson
This brings the titlecasing implementation for the builtin provider
out of formatting.c and into unicode_case.c, along with
unicode_strlower() and unicode_strupper(). Accepts an arbitrary word
boundary callback.
Simple for now, but can be extended to support the Unicode Default
Case Conversion algorithm with full case mapping.
Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
Reviewed-by: Peter Eisentraut
This introduces the following SQL/JSON functions for querying JSON
data using jsonpath expressions:
JSON_EXISTS(), which can be used to apply a jsonpath expression to a
JSON value to check if it yields any values.
JSON_QUERY(), which can be used to to apply a jsonpath expression to
a JSON value to get a JSON object, an array, or a string. There are
various options to control whether multi-value result uses array
wrappers and whether the singleton scalar strings are quoted or not.
JSON_VALUE(), which can be used to apply a jsonpath expression to a
JSON value to return a single scalar value, producing an error if it
multiple values are matched.
Both JSON_VALUE() and JSON_QUERY() functions have options for
handling EMPTY and ERROR conditions, which can be used to specify
the behavior when no values are matched and when an error occurs
during jsonpath evaluation, respectively.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Jian He <jian.universality@gmail.com>
Reviewers have included (in no particular order):
Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup,
Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson,
Justin Pryzby, Álvaro Herrera, Jian He, Anton A. Melnikov,
Nikita Malakhov, Peter Eisentraut, Tomas Vondra
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
The builtin C.UTF-8 locale has similar semantics to the libc locale of
the same name. That is, code point sort order (fast, memcmp-based)
combined with Unicode semantics for character operations such as
pattern matching, regular expressions, and
LOWER()/INITCAP()/UPPER(). The character semantics are based on
Unicode simple case mappings.
The builtin provider's C.UTF-8 offers several important advantages
over libc:
* faster sorting -- benefits from additional optimizations such as
abbreviated keys and varstrfastcmp_c
* faster case conversion, e.g. LOWER(), at least compared with some
libc implementations
* available on all platforms with identical semantics, and the
semantics are stable, testable, and documentable within a given
Postgres major version
Being based on memcmp, the builtin C.UTF-8 locale does not offer
natural language sort order. But it is an improvement for most use
cases that might otherwise use libc's "C.UTF-8" locale, as well as
many use cases that use libc's "C" locale.
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
New provider for collations, like "libc" or "icu", but without any
external dependency.
Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.
The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.
By offering a new builtin provider, it clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.
Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
Formerly, these were only supported in to_char(), but there seems
little reason for that restriction. We should at least have enough
support to permit round-tripping the output of to_char().
In that spirit, TZ accepts either zone abbreviations or numeric
(HH or HH:MM) offsets, which are the cases that to_char() can output.
In an ideal world we'd make it take full zone names too, but
that seems like it'd introduce an unreasonable amount of ambiguity,
since the rules for POSIX-spec zone names are so lax.
OF is a subset of this, accepting only HH or HH:MM.
One small benefit of this improvement is that we can simplify
jsonpath's executeDateTimeMethod function, which no longer needs
to consider the HH and HH:MM cases separately. Moreover, letting
it accept zone abbreviations means it will accept "Z" to mean UTC,
which is emitted by JSON.stringify() for example.
Patch by me, reviewed by Aleksander Alekseev and Daniel Gustafsson
Discussion: https://postgr.es/m/1681086.1686673242@sss.pgh.pa.us
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.
The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.
Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.
Bump catalog version.
Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.
Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.
Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed. With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
The band-aid applied in commit f0bedf3e4 turns out to still need
some work: it made sure we didn't set Np->last_relevant too small
(to the left of the decimal point), but it didn't prevent setting
it too large (off the end of the partially-converted string).
This could result in fetching data beyond the end of the allocated
space, which with very bad luck could cause a SIGSEGV, though
I don't see any hazard of interesting memory disclosure.
Per bug #17839 from Thiago Nunes. The bug's pretty ancient,
so back-patch to all supported versions.
Discussion: https://postgr.es/m/17839-aada50db24d7b0da@postgresql.org
Remove some code guarded by IS_MINUS() or IS_PLUS(), where the entire
stanza is inside an else-block where both of these are false. This
should slightly improve test coverage.
While at it, remove coding that apparently assumes that unsetting a
bit is so expensive that we have to first check if it's already set
in the first place.
Per Coverity report from Ranier Vilela
Analysis and review by Justin Pryzby
Discussion: https://www.postgresql.org/message-id/20221223010818.GP1153%40telsasoft.com
Replace the error trapping scheme introduced in 5bc450629 with our
shiny new errsave/ereturn mechanism. This doesn't have any real
functional impact (although I think that the new coding is able
to report a few more errors softly than v15 did). And I doubt
there's any measurable performance difference either. But this
gets rid of an ad-hoc, one-of-a-kind design in favor of a mechanism
that will be widely used going forward, so it should be a net win
for code readability.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
This patch converts the input functions for date, time, timetz,
timestamp, timestamptz, and interval to the new soft-error style.
There's some related stuff in formatting.c that remains to be
cleaned up, but that seems like a separable project.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
Pay down some ancient technical debt (dating to commit 022fd9966):
fix a couple of places in datetime parsing that were throwing
ereport's immediately instead of returning a DTERR code that could be
interpreted by DateTimeParseError. The reason for that was that there
was no mechanism for passing any auxiliary data (such as a zone name)
to DateTimeParseError, and these errors seemed to really need it.
Up to now it didn't matter that much just where the error got thrown,
but now we'd like to have a hard policy that datetime parse errors
get thrown from just the one place.
Hence, invent a "DateTimeErrorExtra" struct that can be used to
carry any extra values needed for specific DTERR codes. Perhaps
in the future somebody will be motivated to use this to improve
the specificity of other DateTimeParseError messages, but for now
just deal with the timezone-error cases.
This is on the way to making the datetime input functions report
parse errors softly; but it's really an independent change, so
commit separately.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
More could be done in this line, but I just grabbed some low-hanging
fruit. Principal objective was to remove the need for several ugly
unconstify() usages in formatting.c.
The reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
datetime.c's parsing logic has assumed that strtod() will accept
a string that looks like ".", which it does in glibc, but not on
some less-common platforms such as AIX. The result of this was
that datetime fields like "123." would be accepted on some platforms
but not others; which is a sufficiently odd case that it's not that
surprising we've heard no field complaints. But commit e39f99046
extended that assumption to new places, and happened to add a test
case that exposed the platform dependency. Remove this dependency
by special-casing situations without any digits after the decimal
point.
(Again, this is in part a pre-existing bug but I don't feel a
compulsion to back-patch.)
Also, rearrange e39f99046's changes in formatting.c to avoid a
Coverity complaint that we were copying an uninitialized field.
Discussion: https://postgr.es/m/1592893.1648969747@sss.pgh.pa.us
DecodeInterval (interval input) was careless about integer-overflow
hazards, allowing bogus results to be obtained for sufficiently
large input values. Also, since it initially converted the input
to a "struct tm", it was impossible to produce the full range of
representable interval values.
Meanwhile, EncodeInterval (interval output) and a few other
functions could suffer failures if asked to process sufficiently
large interval values, because they also relied on being able to
represent an interval in "struct tm" which is not designed to
handle that.
Fix all this stuff by introducing new struct types that are more
fit for purpose.
While this is clearly a bug fix, it's also an API break for any
code that's calling these functions directly. So back-patching
doesn't seem wise, especially in view of the lack of field
complaints.
Joe Koshakow, editorialized a bit by me
Discussion: https://postgr.es/m/CAAvxfHff0JLYHwyBrtMx_=6wr=k2Xp+D+-X3vEhHjJYMj+mQcg@mail.gmail.com
This introduces the SQL/JSON functions for querying JSON data using
jsonpath expressions. The functions are:
JSON_EXISTS()
JSON_QUERY()
JSON_VALUE()
All of these functions only operate on jsonb. The workaround for now is
to cast the argument to jsonb.
JSON_EXISTS() tests if the jsonpath expression applied to the jsonb
value yields any values. JSON_VALUE() must return a single value, and an
error occurs if it tries to return multiple values. JSON_QUERY() must
return a json object or array, and there are various WRAPPER options for
handling scalar or multi-value results. Both these functions have
options for handling EMPTY and ERROR conditions.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
The upper case versions "OF", "TZH", and "TZM" are already supported,
and all other format codes that are supported in upper case are also
supported in lower case, so we should support these as well for
consistency.
Nitin Jadhav, with a tiny cosmetic change by me. Reviewed by Suraj
Kharage and David Zhang.
Discussion: http://postgr.es/m/CAMm1aWZ-oZyKd75+8D=VJ0sAoSwtdXWLP-MAWD4D8R1Dgandzw@mail.gmail.com
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0. Instead, now we call it anyway and have it return
0 if the default collation was passed. It already did this, so we
just have to adjust the callers. This simplifies all the call sites
and also makes future enhancements easier.
After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
Using Roman numbers (via "RM" or "rm") for a conversion to calculate a
number of months has never considered the case of negative numbers,
where a conversion could easily cause out-of-bound memory accesses. The
conversions in themselves were not completely consistent either, as
specifying 12 would result in NULL, but it should mean XII.
This commit reworks the conversion calculation to have a more
consistent behavior:
- If the number of months and years is 0, return NULL.
- If the number of months is positive, return the exact month number.
- If the number of months is negative, do a backward calculation, with
-1 meaning December, -2 November, etc.
Reported-by: Theodor Arsenij Larionov-Trichkin
Author: Julien Rouhaud
Discussion: https://postgr.es/m/16953-f255a18f8c51f1d5@postgresql.org
backpatch-through: 9.6
Previously, a conversion such as
to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years. Fix the off-by-one problem.
Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD. This is how
the negative-century case works, so it seems sane to do likewise.
Continue to read "year 0000" as 1 BC. Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.
Per bug #16419 from Saeed Hubaishan. Back-patch to all supported
branches.
Dar Alathar-Yemen and Tom Lane
Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
The SQL standard doesn't require jsonpath .datetime() method to support the
ISO 8601 format. But our to_json[b]() functions convert timestamps to text in
the ISO 8601 format in the sake of compatibility with javascript. So, we add
support of the ISO 8601 to the jsonpath .datetime() in the sake compatibility
with to_json[b]().
The standard mode of datetime parsing currently supports just template patterns
and separators in the format string. In order to implement ISO 8601, we have to
add support of the format string double quotes to the standard parsing mode.
Discussion: https://postgr.es/m/94321be0-cc96-1a81-b6df-796f437f7c66%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Backpatch-through: 13
They are equivalent, except that StrNCpy() zero-fills the entire
destination buffer instead of providing just one trailing zero. For
all but a tiny number of callers, that's just overhead rather than
being desirable.
Remove StrNCpy() as it is now unused.
In some cases, namestrcpy() is the more appropriate function to use.
While we're here, simplify the API of namestrcpy(): Remove the return
value, don't check for NULL input. Nothing was using that anyway.
Also, remove a few unused name-related functions.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/44f5e198-36f6-6cdb-7fa9-60e34784daae%402ndquadrant.com
Add infinities that behave the same as they do in the floating-point
data types. Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.
The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.
Patch by me, reviewed by Dean Rasheed and Andrew Gierth
Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast. This improves
some other cases involving field/variable names that match typedefs,
too. The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.
Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
The additional pain from level 4 is excessive for the gain.
Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.
Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
Use it at level 4, a bit more restrictive than the default level, and
tweak our commanding comments to FALLTHROUGH.
(However, leave zic.c alone, since it's external code; to avoid the
warnings that would appear there, change CFLAGS for that file in the
Makefile.)
Author: Julien Rouhaud <rjuju123@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol
Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead. (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".) Thus, if we have a semi inside
the macro, the compiler sees "something;;". Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites. In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons whereever possible.
The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old. (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)
John Naylor and Tom Lane
Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
to_char() has long allowed the TM (translation mode) prefix to
specify output of translated month or day names; but that prefix
had no effect in input format strings. Now it does. to_date()
and to_timestamp() will now recognize the same month or day names
that to_char() would output for the same format code. Matching is
case-insensitive (per the active collation's notion of what that
means), just as it has always been for English month/day names
without the TM prefix.
(As per the discussion thread, there are lots of cases that this
feature will not handle, such as alternate day names. But being
able to accept what to_char() will output seems useful enough.)
In passing, fix some shaky English and violations of message
style guidelines in jsonpath errors for the .datetime() method,
which depends on this code.
Juan José Santamaría Flecha, reviewed and modified by me,
with other commentary from Alvaro Herrera, Tomas Vondra,
Arthur Zakirov, Peter Eisentraut, Mark Dilger.
Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com
Windows has this, and so do all other live platforms according to the
buildfarm; it's been required by POSIX since SUSv2. So remove the
configure probe and tests of HAVE_WCHAR_H.
This is part of a series of commits to get rid of no-longer-relevant
configure checks and dead src/port/ code. I'm committing them separately
to make it easier to back out individual changes if they prove less
portable than I expect.
Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
We used to strategically place newlines after some function call left
parentheses to make pgindent move the argument list a few chars to the
left, so that the whole line would fit under 80 chars. However,
pgindent no longer does that, so the newlines just made the code
vertically longer for no reason. Remove those newlines, and reflow some
of those lines for some extra naturality.
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
I had supposed that the from_char_seq_search() call sites were
all passing the constant arrays you'd expect them to pass ...
but on looking closer, the one for DY format was passing the
days[] array not days_short[]. This accidentally worked because
the day abbreviations in English are all the same as the first
three letters of the full day names. However, once we took out
the "maximum comparison length" logic, it stopped working.
As penance for that oversight, add regression test cases covering
this, as well as every other switch case in DCH_from_char() that
was not reached according to the code coverage report.
Also, fold the DCH_RM and DCH_rm cases into one --- now that
seq_search is case independent, there's no need to pass different
comparison arrays for those cases.
Back-patch, as the previous commit was.
seq_search(), which is used to match input substrings to constants
such as month and day names, had a lot of bizarre and unnecessary
behaviors. It was mostly possible to avert our eyes from that before,
but we don't want to duplicate those behaviors in the upcoming patch
to allow recognition of non-English month and day names. So it's time
to clean this up. In particular:
* seq_search scribbled on the input string, which is a pretty dangerous
thing to do, especially in the badly underdocumented way it was done here.
Fortunately the input string is a temporary copy, but that was being made
three subroutine levels away, making it something easy to break
accidentally. The behavior is externally visible nonetheless, in the form
of odd case-folding in error reports about unrecognized month/day names.
The scribbling is evidently being done to save a few calls to pg_tolower,
but that's such a cheap function (at least for ASCII data) that it's
pretty pointless to worry about. In HEAD I switched it to be
pg_ascii_tolower to ensure it is cheap in all cases; but there are corner
cases in Turkish where this'd change behavior, so leave it as pg_tolower
in the back branches.
* seq_search insisted on knowing the case form (all-upper, all-lower,
or initcap) of the constant strings, so that it didn't have to case-fold
them to perform case-insensitive comparisons. This likewise seems like
excessive micro-optimization, given that pg_tolower is certainly very
cheap for ASCII data. It seems unsafe to assume that we know the case
form that will come out of pg_locale.c for localized month/day names, so
it's better just to define the comparison rule as "downcase all strings
before comparing". (The choice between downcasing and upcasing is
arbitrary so far as English is concerned, but it might not be in other
locales, so follow citext's lead here.)
* seq_search also had a parameter that'd cause it to report a match
after a maximum number of characters, even if the constant string were
longer than that. This was not actually used because no caller passed
a value small enough to cut off a comparison. Replicating that behavior
for localized month/day names seems expensive as well as useless, so
let's get rid of that too.
* from_char_seq_search used the maximum-length parameter to truncate
the input string in error reports about not finding a matching name.
This leads to rather confusing reports in many cases. Worse, it is
outright dangerous if the input string isn't all-ASCII, because we
risk truncating the string in the middle of a multibyte character.
That'd lead either to delivering an illegible error message to the
client, or to encoding-conversion failures that obscure the actual
data problem. Get rid of that in favor of truncating at whitespace
if any (a suggestion due to Alvaro Herrera).
In addition to fixing these things, I const-ified the input string
pointers of DCH_from_char and its subroutines, to make sure there
aren't any other scribbling-on-input problems.
The risk of generating a badly-encoded error message seems like
enough of a bug to justify back-patching, so patch all supported
branches.
Discussion: https://postgr.es/m/29432.1579731087@sss.pgh.pa.us