In HEAD, fix incorrect field width for hours part of OF when tm_gmtoff is
negative. This was introduced by commit 2d87eedc1d4468d3 as a result of
incorrectly applying a pattern that is correct when + signs are omitted,
which is not the case for OF.
In 9.4, fix missing abs() call that allowed a sign to be attached to the
minutes part of OF. This was fixed in 9.5 by 9b43d73b3f9bef27, but for
inscrutable reasons not back-patched.
In all three versions, ensure that the sign of tm_gmtoff is correctly
reported even when the GMT offset is less than 1 hour.
Add regression tests, which evidently we desperately need here.
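For illustration, a minimal standalone sketch of the intended sign handling
(not the formatting.c code): the sign is kept separate from the zero-padded
hours field and survives even when the offset is under one hour.

    #include <stdio.h>
    #include <stdlib.h>

    /* Print an OF-style offset from a tm_gmtoff-like value (seconds east
     * of UTC): sign, two-digit hours, and minutes only when nonzero. */
    static void print_of(long gmtoff)
    {
        int off = (int) gmtoff;
        int hours = abs(off) / 3600;
        int minutes = (abs(off) / 60) % 60;
        char sign = (off < 0) ? '-' : '+';

        if (minutes == 0)
            printf("%c%02d\n", sign, hours);
        else
            printf("%c%02d:%02d\n", sign, hours, minutes);
    }

    int main(void)
    {
        print_of(19800);     /* +05:30 */
        print_of(-1800);     /* -00:30: the sign must survive hours == 0 */
        print_of(-28800);    /* -08 */
        return 0;
    }
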
Thomas Munro and Tom Lane, per report from David Fetter
Tighten the semantics of boundary-case timestamptz so that we allow
timestamps >= '4714-11-24 00:00+00 BC' and < 'ENDYEAR-01-01 00:00+00 AD'
exactly, no more and no less, while still allowing timestamps within that
range to be entered using non-GMT timezone offsets (which could make the
nominal date 4714-11-23 BC or ENDYEAR-01-01 AD). This eliminates
dump/reload failure conditions for timestamps near the endpoints.
To do this, separate checking of the inputs for date2j() from the
final range check, and allow the Julian date code to handle a range
slightly wider than the nominal range of the datatypes.
Also add a bunch of checks to detect out-of-range dates and timestamps
that formerly could be returned by operations such as date-plus-integer.
All C-level functions that return date, timestamp, or timestamptz should
now be proof against returning a value that doesn't pass IS_VALID_DATE()
or IS_VALID_TIMESTAMP().
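As a toy illustration of the ordering that matters here (units, bounds, and
names are made up, not the real datatype limits): the entered local time is
rotated to UTC first and only then checked against the exact half-open range.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t ts_t;                   /* seconds since some epoch; illustrative */

    /* accept >= min and < end: no more and no less */
    static bool range_check(ts_t utc, ts_t min_incl, ts_t end_excl)
    {
        return utc >= min_incl && utc < end_excl;
    }

    int main(void)
    {
        ts_t min = 0, end = 86400;          /* a one-day "supported range"; illustrative */
        ts_t nominal_local = -3600;         /* nominally before the range start... */
        int  gmtoff = -7200;                /* ...but entered with a -02:00 offset */
        ts_t utc = nominal_local - gmtoff;  /* rotate to UTC before checking */

        printf("%s\n", range_check(utc, min, end) ? "accepted" : "rejected");
        return 0;
    }
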
Vitaly Burovoy, reviewed by Anastasia Lubennikova, and substantially
whacked around by me
For time masks, like HH24, MI, SS, CC, MM, do not count the negative
sign as part of the zero-padding length specified by the mask, e.g. have
to_char('-4 years'::interval, 'YY') return '-04', not '-4'.
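A minimal sketch of that padding rule, independent of the formatting.c
internals: zero-pad the magnitude to the mask width first and attach the sign
afterwards, so the sign never consumes a padding position.

    #include <stdio.h>
    #include <stdlib.h>

    /* Zero-pad the magnitude to the requested width, then prepend the sign. */
    static void pad_with_sign(int value, int width, char *buf, size_t buflen)
    {
        snprintf(buf, buflen, "%s%0*d",
                 value < 0 ? "-" : "", width, abs(value));
    }

    int main(void)
    {
        char buf[32];

        pad_with_sign(-4, 2, buf, sizeof(buf));
        printf("%s\n", buf);                /* -04, not -4 */
        return 0;
    }
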
Report by Craig Ringer
Commit ef3f9e642d2b2bba suppressed one cause of warnings here, but
recent clang on OS X is still unhappy because we're passing a "long"
to abs(). The fact that tm_gmtoff is declared as long is no doubt a
hangover from days when int might be only 16 bits; but Postgres has
never been able to run on such machines, so we can just cast it to int
with no worries. For consistency, also cast to int in the other
uses of tm_gmtoff in this stanza.
Note: this code is still broken on machines that don't follow C99
integer-division-truncates-towards-zero rules. Given the lack of
complaints about it, I don't feel a large desire to complicate things
enough to cope with the pre-C99 rules.
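For reference, a few standalone lines showing both the int cast and the C99
division behavior being relied on (illustrative only):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long gmtoff = -5400;        /* -01:30, declared long the way tm_gmtoff is */
        int  off = (int) gmtoff;    /* offsets easily fit in int, and abs(int) is quiet */

        /* With C99 truncation toward zero: -5400 / 3600 == -1 and
         * -5400 % 3600 == -1800, so quotient and remainder keep matching
         * signs.  Pre-C99 implementations were allowed to round the
         * quotient toward negative infinity instead. */
        printf("hours=%d minutes=%d\n", off / 3600, abs(off % 3600) / 60);
        return 0;
    }
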
Revert "to_char(float4/8): zero pad to specified length". There are
too many platform-specific problems, and the proper rounding is missing.
Also revert companion patch 9d61b9953c1489cbb458ca70013cf5fca1bb7710.
Previously, zero padding was limited to the internal length, rather than
the specified length. This allows it to match to_char(int/numeric), which
always padded to the specified length.
Regression tests added.
BACKWARD INCOMPATIBILITY
We reserve space for the full amount, not one less. The affected checks
deal with localized month and day names. Today's DCH_MAX_ITEM_SIZ value
would suffice for a 60-byte day name, while the longest known is the
49-byte mn_CN.utf-8 word for "Saturday." Thus, the upshot of this
change is merely to avoid misdirecting future readers of the code; users
are not expected to see errors either way.
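A sketch of the kind of headroom check in question, with MAX_ITEM standing in
for a DCH_MAX_ITEM_SIZ-style constant: before emitting the next item, require
room for a full maximum-size item, not one byte less.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_ITEM 64

    static bool
    room_for_item(const char *out_start, const char *out_pos, size_t out_size)
    {
        size_t used = (size_t) (out_pos - out_start);

        /* reserve the full MAX_ITEM bytes, not MAX_ITEM - 1 */
        return out_size - used >= MAX_ITEM;
    }

    int main(void)
    {
        char buf[128];

        printf("%d\n", room_for_item(buf, buf + 65, sizeof(buf)));   /* 0: only 63 left */
        printf("%d\n", room_for_item(buf, buf + 64, sizeof(buf)));   /* 1: exactly 64 left */
        return 0;
    }
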
Previously very long localized month and weekday strings could
overflow the allocated buffers, causing a server crash.
Reported and patch reviewed by Noah Misch. Backpatch to all
supported versions.
Security: CVE-2015-0241
Previously very long field masks for floats could access memory
beyond the existing buffer allocated to hold the result.
Reported by Andres Freund and Peter Geoghegan. Backpatch to all
supported versions.
Security: CVE-2015-0241
When there are consecutive spaces (or other non-format-code characters) in
the format, we should advance over exactly that many characters of input.
The previous coding mistakenly did a "skip whitespace" action between such
characters, possibly allowing more input to be skipped than the user
intended. We only need to skip whitespace just before an actual field.
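A toy parser showing the intended rule, with '#' standing in for a real field
code (this is not the to_timestamp code): each literal pattern character
consumes exactly one input character, and whitespace is skipped only
immediately before a field.

    #include <ctype.h>
    #include <stdio.h>

    static const char *
    parse(const char *input, const char *pattern)
    {
        for (const char *p = pattern; *p; p++)
        {
            if (*p == '#')                   /* a field: skip whitespace, then read digits */
            {
                while (isspace((unsigned char) *input))
                    input++;
                while (isdigit((unsigned char) *input))
                    input++;
            }
            else if (*input != '\0')
                input++;                     /* literal char: advance exactly one input char */
        }
        return input;                        /* where parsing stopped */
    }

    int main(void)
    {
        /* two literal spaces in the pattern consume exactly two input chars */
        const char *rest = parse("12  34xyz", "#  #");

        printf("leftover: \"%s\"\n", rest);  /* leftover: "xyz" */
        return 0;
    }
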
This is really a bug fix, but given the minimal number of field complaints
and the risk of breaking applications coded to expect the old behavior,
let's not back-patch it.
Jeevan Chalke
These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.
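The mechanism is plain const-qualification; a minimal sketch with a made-up
table (both the pointer array and the strings are const, so outside the PIC
caveat above the whole thing can live in read-only storage shared across
forked backends):

    #include <stdio.h>

    static const char *const day_names[] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"
    };

    int main(void)
    {
        printf("%s\n", day_names[6]);       /* Saturday */
        return 0;
    }
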
Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.
Oskari Saarenmaa, with some editorialization by me
Remove the variable from the enclosing scopes so that nothing can be
relying on it. The net result of this refactoring is that we get rid
of a few unnecessary strlen() calls.
Original patch from Greg Jaskiewicz, substantially expanded by me.
Add ability for to_char() to output the timezone's UTC offset (OF). We
already have the ability to return the timezone abbreviation (TZ/tz).
Per request from Andrew Dunstan
The existing code in NUM_numpart_from_char has hard-wired logic to treat
'.' as decimal point, even when we're using a locale-aware format string
and the locale says that '.' is the thousands separator. This results in
clearly wrong answers in FM mode (where we must be able to identify the
decimal point location), as per bug report from Patryk Kordylewski.
Since the initialization code in NUM_prepare_locale already sets up
Np->decimal as either the locale decimal-point string or "." depending
on which decimal-point format code was used, there's really no need to
have any extra logic at all in NUM_numpart_from_char: we only need to
test for a match to Np->decimal.
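A sketch of the resulting test, using made-up names rather than the
formatting.c ones: compare the input against whatever decimal-point string was
set up for the format instead of hard-wiring '.'.

    #include <stdbool.h>
    #include <string.h>

    struct numparse
    {
        const char *decimal;    /* "." for a literal '.', or the locale's string for D */
    };

    static bool at_decimal_point(const struct numparse *np, const char *input)
    {
        return strncmp(input, np->decimal, strlen(np->decimal)) == 0;
    }

    int main(void)
    {
        struct numparse np = { "," };       /* e.g. a locale whose decimal point is ',' */

        return at_decimal_point(&np, ",50") ? 0 : 1;
    }
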
(Note: AFAICS there's nothing in here that explicitly checks for thousands
separators --- rather, any unmatched character is silently skipped over.
That's pretty bogus IMO but it's not the issue being complained of.)
This is a longstanding bug, but it's possible that some existing apps
are depending on '.' being recognized as decimal point even when using
a D format code. Hence, no back-patch. We should probably list this
as a potential incompatibility in the 9.3 release notes.
formatting.c used locale-dependent case folding rules in some code paths
where the result isn't supposed to be locale-dependent, for example
to_char(timestamp, 'DAY'). Since the source data is always just ASCII
in these cases, that usually didn't matter ... but it does matter in
Turkish locales, which have unusual treatment of "i" and "I". To confuse
matters even more, the misbehavior was only visible in UTF8 encoding,
because in single-byte encodings we used pg_toupper/pg_tolower which
don't have locale-specific behavior for ASCII characters. Fix by providing
intentionally ASCII-only case-folding functions and using these where
appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active
branches, since it's been like this for a long time.
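The idea of an intentionally ASCII-only case-folding helper fits in a few
lines; this sketch folds only a-z and leaves every other byte untouched, so
the result cannot vary with LC_CTYPE, Turkish or otherwise.

    #include <stdio.h>

    static char ascii_toupper(char ch)
    {
        if (ch >= 'a' && ch <= 'z')
            return (char) (ch - ('a' - 'A'));
        return ch;                          /* non-ASCII bytes and non-letters untouched */
    }

    int main(void)
    {
        char word[] = "friday";

        for (char *p = word; *p; p++)
            *p = ascii_toupper(*p);
        printf("%s\n", word);               /* FRIDAY, regardless of locale */
        return 0;
    }
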
Dates outside the supported range could be entered, but would not print
reasonably, and operations such as conversion to timestamp wouldn't behave
sanely either. Since this has the potential to result in undumpable table
data, it seems worth back-patching.
Hitoshi Harada
Properly handle negative/BC century specifications just like positive/AD centuries. Previously the
behavior was either wrong or inconsistent with positive/AD handling.
Centuries without years now always assume the first year of the century,
which is now documented.
The Solaris Studio compiler warns about these instances, unlike more
mainstream compilers such as gcc. But manual inspection showed that
the code is clearly not reachable, and we hope no worthy compiler will
complain about removing this code.
A thinko in commit 029dfdf1157b6d837a7b7211cd35b00c6bcd767c caused the year
519 to be handled differently from either adjacent year, which was not the
intention AFAICS. Report and diagnosis by Marc Cousin.
In passing, remove redundant re-tests of year value.
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
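A sketch of the corrected stripping rule, independent of the formatting.c data
structures: trailing zeroes are removed only to the right of the decimal point
and the scan never walks past it.

    #include <stdio.h>
    #include <string.h>

    static void strip_fractional_zeroes(char *num)
    {
        char *dot = strchr(num, '.');

        if (dot == NULL)
            return;                         /* no decimal point: nothing to strip */

        char *end = num + strlen(num) - 1;
        while (end > dot && *end == '0')    /* never walk left past the decimal point */
            *end-- = '\0';
        /* whether a now-bare trailing '.' should also go is a separate policy choice */
    }

    int main(void)
    {
        char a[] = "1.500", b[] = "120.", c[] = "120";

        strip_fractional_zeroes(a);
        strip_fractional_zeroes(b);
        strip_fractional_zeroes(c);
        printf("%s %s %s\n", a, b, c);      /* 1.5 120. 120 */
        return 0;
    }
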
Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
These functions should take a pg_locale_t, not a collation OID, and should
call mbstowcs_l/wcstombs_l where available. Where those functions are not
available, temporarily select the correct locale with uselocale().
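A sketch of that fallback pattern, assuming a POSIX-2008 system;
HAVE_MBSTOWCS_L here is an illustrative configure-style symbol, not
necessarily the real one.

    #define _GNU_SOURCE
    #include <locale.h>
    #include <stdlib.h>
    #include <wchar.h>

    static size_t
    mb_to_wc_with_locale(wchar_t *to, const char *from, size_t tolen, locale_t loc)
    {
    #ifdef HAVE_MBSTOWCS_L
        return mbstowcs_l(to, from, tolen, loc);    /* per-call locale, where available */
    #else
        locale_t save = uselocale(loc);             /* temporarily select the target locale */
        size_t   result = mbstowcs(to, from, tolen);

        uselocale(save);                            /* restore the previous locale */
        return result;
    #endif
    }

    int main(void)
    {
        locale_t loc = newlocale(LC_ALL_MASK, "", (locale_t) 0);
        wchar_t  buf[16];
        size_t   n = mb_to_wc_with_locale(buf, "abc", 16, loc);

        freelocale(loc);
        return n == (size_t) -1;
    }
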
This change removes the bogus assumption that all locales selectable in
a given database have the same wide-character conversion method; in
particular, the collate.linux.utf8 regression test now passes with
LC_CTYPE=C, so long as the database encoding is UTF8.
I decided to move the char2wchar/wchar2char functions out of mbutils.c and
into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus
don't really belong with the mbutils.c functions. Keeping them where they
were would have required importing pg_locale_t into pg_wchar.h somehow,
which did not seem like a good plan.
pg_newlocale_from_collation does not have enough context to give an error
message that's even a little bit useful, so move the responsibility for
complaining up to its callers. Also, reword ERRCODE_INDETERMINATE_COLLATION
error messages in a less jargony, more message-style-guide-compliant
fashion.
Install just one instance of the "C" and "POSIX" collations into
pg_collation, rather than one per encoding. Make these instances exist
and do something useful even in machines without locale_t support: to wit,
it's now possible to force comparisons and case-folding functions to use C
locale in an otherwise non-C database, whether or not the platform has
support for using any additional collations.
Fix up severely broken upper/lower/initcap functions, too: the C/POSIX
fastpath now does what it is supposed to, and non-default collations are
handled correctly in single-byte database encodings.
Merge the two separate collation hashtables that were being maintained in
pg_locale.c, and be more wary of the possibility that we fail partway
through filling a cache entry.
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch