mirror of https://github.com/postgres/postgres.git synced 2025-07-30 11:03:19 +03:00

Introduce squashing of constant lists in query jumbling

pg_stat_statements produces multiple entries for queries like
    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of
ArrayExpr is individually jumbled.  Most of the time that's undesirable,
especially if the list becomes too large.

Fix this by introducing a new GUC query_id_squash_values which modifies
the node jumbling code to only consider the first and last element of a
list of constants, rather than each list element individually.  This
affects both the query_id generated by query jumbling, as well as
pg_stat_statements query normalization so that it suppresses printing of
the individual elements of such a list.

The default value is off, meaning the previous behavior is maintained.

Author: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Sergey Dudoladov (mysterious, off-list)
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Marcos Pegoraro <marcos@f10.com.br>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Tested-by: Yasuo Honda <yasuo.honda@gmail.com>
Tested-by: Sergei Kornilov <sk@zsrv.org>
Tested-by: Maciek Sakrejda <m.sakrejda@gmail.com>
Tested-by: Chengxi Sun <sunchengxi@highgo.com>
Tested-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
Álvaro Herrera
2025-03-18 18:56:11 +01:00
parent 247ce06b88
commit 62d712ecfd
15 changed files with 945 additions and 22 deletions
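
As a rough illustration of the squashing rule described in the commit message, the C sketch below hashes a list of constants either element by element or, when squashing is on, by its first and last element only. It is not the code from this commit: SketchConst, sketch_hash_bytes and sketch_jumble_const_list are hypothetical names, FNV-1a stands in for the real jumble hash, and the sketch assumes that a constant contributes its type rather than its value and that lists shorter than two elements are left unsquashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a constant node; not a PostgreSQL type. */
typedef struct SketchConst
{
    uint32_t    type_id;    /* assumed: only the type feeds the hash */
} SketchConst;

/* One FNV-1a pass over raw bytes; stands in for the real jumble hash. */
static uint64_t
sketch_hash_bytes(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/*
 * Hash a list of constants.  With squash enabled and at least two
 * elements, only the first and last element are considered, so the
 * length of the list no longer influences the result.
 */
static uint64_t
sketch_jumble_const_list(const SketchConst *elems, size_t n, bool squash)
{
    uint64_t    h = UINT64_C(0xcbf29ce484222325);

    assert(n == 0 || elems != NULL);

    if (squash && n >= 2)
    {
        h = sketch_hash_bytes(h, &elems[0].type_id,
                              sizeof(elems[0].type_id));
        h = sketch_hash_bytes(h, &elems[n - 1].type_id,
                              sizeof(elems[n - 1].type_id));
        return h;
    }

    /* Previous behavior: every element contributes individually. */
    for (size_t i = 0; i < n; i++)
        h = sketch_hash_bytes(h, &elems[i].type_id,
                              sizeof(elems[i].type_id));
    return h;
}
```

Under these assumptions, lists of seven and eight constants of the same type hash to the same value when squash is true and, barring a hash collision, to different values when it is false. That is the effect the pg_stat_statements example in the documentation change relies on.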

@@ -8701,6 +8701,36 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 </listitem>
 </varlistentry>
+<varlistentry id="guc-query-id-squash-values" xreflabel="query_id_squash_values">
+<term><varname>query_id_squash_values</varname> (<type>bool</type>)
+<indexterm>
+<primary><varname>query_id_squash_values</varname> configuration parameter</primary>
+</indexterm>
+</term>
+<listitem>
+<para>
+Specifies how a list of constants (e.g., for an <literal>IN</literal>
+clause) contributes to the query identifier computation.
+Normally, every element of such a list contributes to the query
+identifier separately, which means that two queries that only differ
+in the number of elements in such a list would get different query
+identifiers.
+If this parameter is on, a list of constants will not contribute
+to the query identifier. This means that two queries whose only
+difference is the number of constants in such a list are going to get the
+same query identifier.
+</para>
+<para>
+Only constants are affected; bind parameters do not benefit from this
+functionality. The default value is <literal>off</literal>.
+</para>
+<para>
+This parameter also affects how <xref linkend="pgstatstatements"/>
+generates normalized query texts.
+</para>
+</listitem>
+</varlistentry>
 <varlistentry id="guc-log-statement-stats">
 <term><varname>log_statement_stats</varname> (<type>boolean</type>)
 <indexterm>

@@ -630,9 +630,27 @@
 <para>
 In some cases, queries with visibly different texts might get merged into a
-single <structname>pg_stat_statements</structname> entry. Normally this will happen
-only for semantically equivalent queries, but there is a small chance of
-hash collisions causing unrelated queries to be merged into one entry.
+single <structname>pg_stat_statements</structname> entry; as explained above,
+this is expected to happen for semantically equivalent queries.
+In addition, if <varname>query_id_squash_values</varname> is enabled
+and the only difference between queries is the number of elements in a list
+of constants, the list will get squashed down to a single element but shown
+with a commented-out list indicator:
+<screen>
+=# SET query_id_squash_values = on;
+=# SELECT pg_stat_statements_reset();
+=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7);
+=# SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5, 6, 7, 8);
+=# SELECT query, calls FROM pg_stat_statements
+WHERE query LIKE 'SELECT%';
+-[ RECORD 1 ]------------------------------
+query | SELECT * FROM test WHERE a IN ($1 /*, ... */)
+calls | 2
+</screen>
+In addition to these cases, there is a small chance of hash collisions
+causing unrelated queries to be merged into one entry.
 (This cannot happen for queries belonging to different users or databases,
 however.)
 </para>