Apply const qualifiers to char * arguments and fields throughout the
jsonapi. This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and reformat-dat-files too.
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error(). This needs to be added to the various nls.mk files
that use this. Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.
It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input. This
commit switches token_terminator so as the error uses data up to the
end of the JSON input.
More work should be done so as this code could rely on an equivalent of
report_invalid_encoding() so as incorrect byte sequences can show in
error messages in a readable form. This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence. A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.
A test is added on HEAD as test_json_parser makes this issue
straight-forward to check.
Note that pg_encoding_mblen_bounded() no longer has any callers. This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13
JsonNonTerminal and JsonParserSem were added in commit 3311ea86ed
These names of these two enums are not actually used, so there is no
need for typedefs. Instead use plain enums to declare the constants.
Noticed by Alvaro Herera.
Per gripes from Michael Paquier
Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz
Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.
While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that remove a bogus condition on
freeing the memory allocated for them.
This parser uses an explicit prediction stack, unlike the present
recursive descent parser where the parser state is represented on the
call stack. This difference makes the new parser suitable for use in
incremental parsing of huge JSON documents that cannot be conveniently
handled piece-wise by the recursive descent parser. One potential use
for this will be in parsing large backup manifests associated with
incremental backups.
Because this parser is somewhat slower than the recursive descent
parser, it is not replacing that parser, but is an additional parser
available to callers.
For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all
JSON parsing is done with the non-recursive parser, in which case only
trivial regression differences in error messages should be observed.
Author: Andrew Dunstan
Reviewed-By: Jacob Champion
Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
Allocate memory for the error message inside memory owned by the
JsonLexContext and move responsibility away from the caller for
freeing it. This means that we can partially revert b44669b2ca
as this is now safe to use in FRONTEND code. The motivation for
this comes from the OAuth and incremental JSON patchsets but it
also adds value on its own.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived. With new features
being added for SQL/JSON this is no longer the case. Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.
At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.
This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it. AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.
Per Coverity complaint about datum_to_jsonb_internal().
Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
The majority of error exit cases in json_lex_string() failed to
set lex->token_terminator, causing problems for the error context
reporting code: it would see token_terminator less than token_start
and do something more or less nuts. In v14 and up the end result
could be as bad as a crash in report_json_context(). Older
versions accidentally avoided that fate; but all versions produce
error context lines that are far less useful than intended,
because they'd stop at the end of the prior token instead of
continuing to where the actually-bad input is.
To fix, invent some macros that make it less notationally painful
to do the right thing. Also add documentation about what the
function is actually required to do; and in >= v14, add an assertion
in report_json_context about token_terminator being sufficiently
far advanced.
Per report from Nikolay Shaplov. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
This requires a bit of further infrastructure-extension to allow
trapping errors reported by numeric_in and pg_unicode_to_server,
but otherwise it's pretty straightforward.
In the case of jsonb_in, we are only capturing errors reported
during the initial "parse" phase. The value-construction phase
(JsonbValueToJsonb) can also throw errors if assorted implementation
limits are exceeded. We should improve that, but it seems like a
separable project.
Andrew Dunstan and Tom Lane
Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
Formerly, semantic action functions for the JSON parser returned void,
so that there was no way for them to affect the parser's behavior.
That means in particular that they can't force an error exit except by
longjmp'ing. That won't do in the context of our project to make input
functions return errors softly. Hence, change them to return the same
JsonParseErrorType enum value as the parser itself uses. If an action
function returns anything besides JSON_SUCCESS, the parse is abandoned
and that error code is returned.
Action functions can thus easily return the same error conditions that
the parser already knows about. As an escape hatch for expansion, also
invent a code JSON_SEM_ACTION_FAILED that the core parser does not know
the exact meaning of. When returning this code, an action function
must use some out-of-band mechanism for reporting the error details.
This commit simply makes the API change and causes all the existing
action functions to return JSON_SUCCESS, so that there is no actual
change in behavior here. This is long enough and boring enough that
it seemed best to commit it separately from the changes that make
real use of the new mechanism.
In passing, remove a duplicate assignment of
transform_string_values_scalar.
Discussion: https://postgr.es/m/1436686.1670701118@sss.pgh.pa.us
This prevents marking the argument string for translation for gettext,
and it also prevents the given string (which is already translated) from
being translated at runtime.
Also, mark the strings used as arguments to check_rolespec_name for
translation.
Backpatch all the way back as appropriate. None of this is caught by
any tests (necessarily so), so I verified it manually.
Commit 3838fa269 added a lookahead loop to allow building strings multiple
bytes at a time. This loop could exit because it reached the end of input,
yet did not check for that before checking if we reached the end of a
valid string. To fix, put the end of string check back in the outer loop.
Per Valgrind animal skink
This commit removes a dependency to the central logging facilities in
the JSON parsing routines of src/common/, which existed to log errors
when seeing error codes that do not match any existing values in
JsonParseErrorType, which is not something that should never happen.
The routine providing a detailed error message based on the error code
is made backend-only, the existing code being unsafe to use in the
frontend as the error message may finish by being palloc'd or point to a
static string, so there is no way to know if the memory of the message
should be pfree'd or not. The only user of this routine in the frontend
was pg_verifybackup, that is changed to use a more generic error message
on parsing failure.
Note that making this code more resilient to OOM failures if used in
shared libraries would require much more work as a lot of code paths
still rely on palloc() & friends, but we are not sure yet if we need to
go down to that. Still, removing the dependency to logging is a step
toward more portability.
This cleans up the handling of check_stack_depth() while on it, as it
exists only in the backend.
Per discussion with Jacob Champion and Tom Lane.
Discussion: https://postgr.es/m/YNwL7kXwn3Cckbd6@paquier.xyz
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.
This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them. However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.
Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.) Those functions were already
made proof against this class of problem, cf CVE-2006-2313.
To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it. (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem. So we just want a
version for those callers that don't have any better way of handling
this issue.)
Per private report from houjingyi. Back-patch to all supported branches.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was. Simplify the logic a bit. Add test cases.
Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself
Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
Thomas Munro fixed a longstanding annoyance in pg_bsd_indent, that
it would misformat lines containing IsA() macros on the assumption
that the IsA() call should be treated like a cast. This improves
some other cases involving field/variable names that match typedefs,
too. The only places that get worse are a couple of uses of the
OpenSSL macro STACK_OF(); we'll gladly take that trade-off.
Discussion: https://postgr.es/m/20200114221814.GA19630@alvherre.pgsql
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
SQL includes provisions for numeric Unicode escapes in string
literals and identifiers. Previously we only accepted those
if they represented ASCII characters or the server encoding
was UTF-8, making the conversion to internal form trivial.
This patch adjusts things so that we'll call the appropriate
encoding conversion function in less-trivial cases, allowing
the escape sequence to be accepted so long as it corresponds
to some character available in the server encoding.
This also applies to processing of Unicode escapes in JSONB.
However, the old restriction still applies to client-side
JSON processing, since that hasn't got access to the server's
encoding conversion infrastructure.
This patch includes some lexer infrastructure that simplifies
throwing errors with error cursors pointing into the middle of
a string (or other complex token). For the moment I only used
it for errors relating to Unicode escapes, but we might later
expand the usage to some other cases.
Patch by me, reviewed by John Naylor.
Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
To make this work, (1) makeJsonLexContextCstringLen now takes the
encoding to be used as an argument; (2) check_stack_depth() is made to
do nothing in frontend code, and (3) elog(ERROR, ...) is changed to
pg_log_fatal + exit in frontend code.
Mark Dilger, reviewed and slightly revised by me.
Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com