The existing code in NUM_numpart_from_char has hard-wired logic to treat
'.' as decimal point, even when we're using a locale-aware format string
and the locale says that '.' is the thousands separator. This results in
clearly wrong answers in FM mode (where we must be able to identify the
decimal point location), as per bug report from Patryk Kordylewski.
Since the initialization code in NUM_prepare_locale already sets up
Np->decimal as either the locale decimal-point string or "." depending
on which decimal-point format code was used, there's really no need to
have any extra logic at all in NUM_numpart_from_char: we only need to
test for a match to Np->decimal.
(Note: AFAICS there's nothing in here that explicitly checks for thousands
separators --- rather, any unmatched character is silently skipped over.
That's pretty bogus IMO but it's not the issue being complained of.)
This is a longstanding bug, but it's possible that some existing apps
are depending on '.' being recognized as decimal point even when using
a D format code. Hence, no back-patch. We should probably list this
as a potential incompatibility in the 9.3 release notes.
formatting.c used locale-dependent case folding rules in some code paths
where the result isn't supposed to be locale-dependent, for example
to_char(timestamp, 'DAY'). Since the source data is always just ASCII
in these cases, that usually didn't matter ... but it does matter in
Turkish locales, which have unusual treatment of "i" and "I". To confuse
matters even more, the misbehavior was only visible in UTF8 encoding,
because in single-byte encodings we used pg_toupper/pg_tolower which
don't have locale-specific behavior for ASCII characters. Fix by providing
intentionally ASCII-only case-folding functions and using these where
appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active
branches, since it's been like this for a long time.
Dates outside the supported range could be entered, but would not print
reasonably, and operations such as conversion to timestamp wouldn't behave
sanely either. Since this has the potential to result in undumpable table
data, it seems worth back-patching.
Hitoshi Harada
century specifications just like positive/AD centuries. Previously the
behavior was either wrong or inconsistent with positive/AD handling.
Centuries without years now always assume the first year of the century,
which is now documented.
The Solaris Studio compiler warns about these instances, unlike more
mainstream compilers such as gcc. But manual inspection showed that
the code is clearly not reachable, and we hope no worthy compiler will
complain about removing this code.
A thinko in commit 029dfdf1157b6d837a7b7211cd35b00c6bcd767c caused the year
519 to be handled differently from either adjacent year, which was not the
intention AFAICS. Report and diagnosis by Marc Cousin.
In passing, remove redundant re-tests of year value.
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available. Where those functions are not
available, temporarily select the correct locale with uselocale().
This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.
I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions. Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
pg_newlocale_from_collation does not have enough context to give an error
message that's even a little bit useful, so move the responsibility for
complaining up to its callers. Also, reword ERRCODE_INDETERMINATE_COLLATION
error messages in a less jargony, more message-style-guide-compliant
fashion.
Install just one instance of the "C" and "POSIX" collations into
pg_collation, rather than one per encoding. Make these instances exist
and do something useful even in machines without locale_t support: to wit,
it's now possible to force comparisons and case-folding functions to use C
locale in an otherwise non-C database, whether or not the platform has
support for using any additional collations.
Fix up severely broken upper/lower/initcap functions, too: the C/POSIX
fastpath now does what it is supposed to, and non-default collations are
handled correctly in single-byte database encodings.
Merge the two separate collation hashtables that were being maintained in
pg_locale.c, and be more wary of the possibility that we fail partway
through filling a cache entry.
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
rather than only sort-of working as the previous attempt had left it.
Clean up some unnecessary differences between the way these were coded and
the way the YYYY case was coded. Update the regression test cases that
proved that it wasn't working.
even when not in FM mode. This improves compatibility with Oracle and with
our pre-8.4 behavior, as per bug #4862.
Brendan Jurd
Add a couple of regression test cases for this. In passing, get rid of the
labeling of the individual test cases; doesn't seem to be good for anything
except causing extra work when inserting a test...
Tom Lane
to date, as per bug #4702 and subsequent discussion. In particular, make it
work for years specified using AD/BC or CC fields, and fix the test for "no
year specified" so that it doesn't trigger inappropriately for 1 BC (which it
was doing even in code paths that had nothing to do with to_timestamp). I
also did some minor code beautification in the non-ISO-day-number code path.
This area has been busted all along, but because the code has been rewritten
repeatedly, it would be considerable trouble to back-patch. It's such a
corner case that it doesn't seem worth the effort.
format codes are misapplied to a numeric argument. (The code still produces
a pretty bogus error message in such cases, but I'll settle for stopping the
crash for now.) Per bug #4700 from Sergey Burladyan.
Problem exists in all supported branches, so patch all the way back.
In HEAD, also clean up some ugly coding in the nearby cache management
code.
pg_database_encoding_max_length() predicts the maximum character length
returned by wchar2char(). Per Hiroshi Inoue, MB_CUR_MAX isn't usable on
Windows because we allow encoding = UTF8 when the locale says differently;
and getting rid of it seems a good idea on general principles because it
narrows our dependence on libc's locale API just a little bit more.
Also install a check for overflow of the buffer size computation.
is treated like a non-digit separator. This fixes the inconsistency in
examples like:
to_timestamp('2008-01-2', 'YYYY-MM-DD') -- didn't work
and
to_timestamp('2008-1-02', 'YYYY-MM-DD') -- did work