json_build_object and json_build_array and the jsonb equivalents did not
correctly process explicit VARIADIC arguments. They are modified to use
the new extract_variadic_args() utility function which abstracts away
the details of the call method.
Michael Paquier, reviewed by Tom Lane and Dmitry Dolgov.
Backpatch to 9.5 for the jsonb fixes and 9.4 for the json fixes, as
that's where they originated.
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc. Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.
Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
Columns with array pseudotypes have not been identified as arrays, so
they have been rendered as strings in the json and jsonb conversion
routines. This change allows them to be rendered as json arrays, making
it possible to deal correctly with the anyarray columns in pg_stats.
This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names). This continues what started
as f402b9950120.
Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
Commit e09996ff8dee3f70 was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.
Commit bda76c1c8cfb1d11751ba6be88f0242850481733 caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json(). Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely. Per bug #13687 from Stepan Perlov. Back-patch to 9.4, like
the previous commit.
In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
Sufficiently-deep recursion heretofore elicited a SIGSEGV. If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections. That applies to 9.3, where the json parser
adopted recursive descent, and later versions. Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.
Oskari Saarenmaa, reviewed by Michael Paquier.
Security: CVE-2015-5289
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.
Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.
There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.
Backpatch to 9.5 where the jsonb variants were introduced.
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting. Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator). This inconsistency seemed likely to be
counter-intuitive to users. To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators. This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.
For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count. From there on, it's as
if a positive-wise value was originally provided.
Note that there is still a minor inconsistency here across jsonb
deletion operators. Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized by as an integer
literal) is being used on an object, which is wrong-headed. The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements. However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).
In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte. This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator). Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.
Peter Geoghegan and Andrew Dunstan
Commit ab14a73a6c raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.
The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.
Per gripe on pgsql-general.
We've been trying to support \u0000 in JSON values since commit
78ed8e03c67d7333, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a816320a2b53. However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all. It's also
entirely bogus to output \u0000 when de-escaped output is called for.
The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output. However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.
To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a816320a2b53, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)
In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.
Back-patch to 9.4, where we have to do something because 0ad1a816320a2b53
broke things for many cases having nothing to do with \u0000. 9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
The functions are:
to_jsonb()
jsonb_object()
jsonb_build_object()
jsonb_build_array()
jsonb_agg()
jsonb_object_agg()
Also along the way some better logic is implemented in
json_categorize_type() to match that in the newly implemented
jsonb_categorize_type().
Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
Davide S. reported that json_agg() sometimes produced multiple trailing
right brackets. This turns out to be because json_agg_finalfn() attaches
the final right bracket, and was doing so by modifying the aggregate state
in-place. That's verboten, though unfortunately it seems there's no way
for nodeAgg.c to check for such mistakes.
Fix that back to 9.3 where the broken code was introduced. In 9.4 and
HEAD, likewise fix json_object_agg(), which had copied the erroneous logic.
Make some cosmetic cleanups as well.
We expose a function IsValidJsonNumber that internally calls the lexer
for json numbers. That allows us to use the same test everywhere,
instead of inventing a broken test for hstore conversions. The new
function is also used in datum_to_json, replacing the code that is now
moved to the new function.
Backpatch to 9.3 where hstore_to_json_loose was introduced.
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json. This capability would be better added as an independent
function rather than being bolted on to row_to_json. Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.
Pointed out by Tom and discussed with Andrew and Robert.
We removed a similar ban on this in json_object recently, but the ban in
datum_to_json was left, which generate4d sprutious errors in othee json
generators, notable json_build_object.
Along the way, add an assertion that datum_to_json is not passed a null
key. All current callers comply with this rule, but the assertion will
catch any possible future misbehaviour.
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json. This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.
This also makes row_to_json into a single function with default values,
rather than having multiple functions. In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).
Pavel Stehule
Commit f30015b6d794c15d52abbb3df3a65081fbefb1ed made this happen for
timestamp and timestamptz, but it seems pretty inconsistent to not
do it for simple dates as well.
(In passing, I re-pgindent'd json.c.)
There's actually no need for any special case for unknown-type literals,
since we only need to push the value through its output function and
unknownout() works fine. The code that was here was completely bizarre
anyway, and would fail outright in cases that should work, not to mention
suffering from some copy-and-paste bugs.
Fix an obvious typo in json_build_object()'s complaint about invalid
number of arguments, and make the errhint a bit more sensible too.
Per discussion about how to word the improved hint, change the few places
in the documentation that refer to JSON object field names as "names" to
say "keys" instead, since that's what we've said in the vast majority of
places in the docs. Arguably "name" is more correct, since that's the
terminology used in RFC 7159; but we're stuck with "key" in view of the
naming of json_object_keys() so let's at least be self-consistent.
I adjusted a few code comments to match this as well, and failed to
resist the temptation to clean up some odd whitespace choices in the
same area, as well as a useless duplicate PG_ARGISNULL() check. There's
still quite a bit of code that uses the phrase "field name" in non-user-
visible ways, so I left those usages alone.
The isxdigit() calls relied on undefined behavior. The isascii() call
was well-defined, but our prevailing style is to include the cast.
Back-patch to 9.4, where the isxdigit() calls were introduced.
Per gripe from Peter Eisentraut and Tom Lane.
The output is slightly different, but still ISO 8601 compliant: to_char
doesn't output the minutes when time zone offset is an integer number of
hours, while EncodeDateTime outputs ":00".
The code is slightly adapted from code in xml.c
Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences, These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed, all that is looked for is
\uXXXX where X is a hexidecimal digit.
This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.
Per complaint from Teodor Sigaev.
Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().
This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.
These functions were relying on typcategory to identify arrays and
composites, which is not reliable and not the normal way to do it.
Using typcategory to identify boolean, numeric types, and json itself is
also pretty questionable, though the code in those cases didn't seem to be
at risk of anything worse than wrong output. Instead, use the standard
lsyscache functions to identify arrays and composites, and rely on a direct
check of the type OID for the other cases.
In HEAD, also be sure to look through domains so that a domain is treated
the same as its base type for conversions to JSON. However, this is a
small behavioral change; given the lack of field complaints, we won't
back-patch it.
In passing, refactor so that there's only one copy of the code that decides
which conversion strategy to apply, not multiple copies that could (and
have) gotten out of sync.
This code really needs to be refactored so that there aren't so many copies
that can diverge. Not to mention that this whole approach is probably
wrong. But for the moment I'll just stick my finger in the dike.
Per report from Michael Paquier.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.
Catalog version bumped.
Andrew Dunstan, reviewed by Marko Tiikkaja.