Previously, a Query generated through the transform phase would have
unset stmt_location, tracking the starting point of a query string.
Extensions relying on the statement location to extract its relevant
parts in the source text string would fallback to use the whole
statement instead, leading to confusing results like in
pg_stat_statements for queries relying on nested queries, like:
- EXPLAIN, with top-level and nested query using the same query string,
and a query ID coming from the nested query when the non-top-level
entry.
- Multi-statements, with only partial portions of queries being
normalized.
- COPY TO with a query, SELECT or DMLs.
This patch improves things by keeping track of the statement locations
and propagate it to Query during transform, allowing PGSS to only show
the relevant part of the query for nested query. This leads to less
bloat in entries for non-top-level entries, as queries can now be
grouped within the same (toplevel, queryid) duos in pg_stat_statements.
The result gives a stricter one-one mapping between query IDs and its
query strings.
The regression tests introduced in 45e0ba30fc40 produce differences
reflecting the new logic.
Author: Anthonin Bonnefoy
Reviewed-by: Michael Paquier, Jian He
Discussion: https://postgr.es/m/CAO6_XqqM6S9bQ2qd=75W+yKATwoazxSNhv5sjW06fjGAtHbTUA@mail.gmail.com
Up to now, the parser's reporting of a statement's stmt_location
included any preceding whitespace or comments. This isn't really
desirable but was done to avoid accounting honestly for nonterminals
that reduce to empty. It causes problems for pg_stat_statements,
which partially compensates by manually stripping whitespace, but
is not bright enough to strip /*-style comments. There will be
more problems with an upcoming patch to improve reporting of errors
in extension scripts, so it's time to do something about this.
The thing we have to do to make it work right is to adjust
YYLLOC_DEFAULT to scan the inputs of each production to find the
first one that has a valid location (i.e., did not reduce to
empty). In theory this adds a little bit of per-reduction overhead,
but in practice it's negligible. I checked by measuring the time
to run raw_parser() on the contents of information_schema.sql, and
there was basically no change.
Having done that, we can rely on any nonterminal that didn't reduce
to completely empty to have a correct starting location, and we don't
need the kluges the stmtmulti production formerly used.
This should have a side benefit of allowing parse error reports to
include an error position in some cases where they formerly failed to
do so, due to trying to report the position of an empty nonterminal.
I did not go looking for an example though. The one previously known
case where that could happen (OptSchemaEltList) no longer needs the
kluge it had; but I rather doubt that that was the only case.
Discussion: https://postgr.es/m/ZvV1ClhnbJLCz7Sm@msg.df7cb.de
This is preliminary patch. It adds NOT NULL checking for the result of
pg_stat_statements_reset() function. It is needed for upcoming patch
"Track statement entry timestamp" that will change the result type of
this function to the timestamp of a reset performed.
Discussion: https://postgr.es/m/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
Author: Andrei Zubkov
Reviewed-by: Julien Rouhaud, Hayato Kuroda, Yuki Seino, Chengxi Sun
Reviewed-by: Anton Melnikov, Darren Rush, Michael Paquier, Sergei Kornilov
Reviewed-by: Alena Rybakina, Andrei Lepikhov
This commit expands more the refactoring of the regression tests of
pg_stat_statements, with tests moved out of pg_stat_statements.sql into
separate files. The following file structure is now used:
- select is mostly the former pg_stat_statements.sql, renamed.
- dml for INSERT/UPDATE/DELETE and MERGE
- user_activity, to test role-level checks and stat resets.
- wal, to check the WAL generation after some queries.
Like e8dbdb1, there is no change in terms of code coverage or results,
and this finishes the split I was aiming for in these tests. Most of
the tests used "test" of "pgss_test" as names for the tables used, these
are renamed to less generic names.
Reviewed-by: Bertrand Drouvot
Discussion: https://postgr.es/m/Y/7Y9U/y/keAW3qH@paquier.xyz