mirror of https://github.com/postgres/postgres.git synced 2025-07-05 07:21:24 +03:00

Make EXPLAIN report maximum hashtable usage across multiple rescans.

Before discarding the old hash table in ExecReScanHashJoin, capture
its statistics, ensuring that we report the maximum hashtable size
across repeated rescans of the hash input relation.  We can repurpose
the existing code for reporting hashtable size in parallel workers
to help with this, making the patch pretty small.  This also ensures
that if rescans happen within parallel workers, we get the correct
maximums across all instances.
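As a rough illustration of the accumulation described above, the following is a minimal standalone sketch, not the code from this patch.  The names HashStats and accum_hash_stats are hypothetical stand-ins; the real patch works on HashInstrumentation and the executor's hash table structure.  The sketch assumes the accumulator starts out zeroed and that each hash table's numbers are folded in just before that table is discarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-node hash table statistics. */
typedef struct HashStats
{
    int     nbuckets;    /* number of buckets */
    int     nbatch;      /* number of batches */
    size_t  space_peak;  /* peak memory usage in bytes */
} HashStats;

static int
max_int(int a, int b)
{
    return (a > b) ? a : b;
}

static size_t
max_size(size_t a, size_t b)
{
    return (a > b) ? a : b;
}

/*
 * Fold the statistics of one hash table into an accumulator, keeping the
 * field-wise maximum.  The accumulator is assumed to be zero-initialized
 * before the first call.
 */
void
accum_hash_stats(HashStats *acc, const HashStats *current)
{
    assert(acc != NULL && current != NULL);

    acc->nbuckets = max_int(acc->nbuckets, current->nbuckets);
    acc->nbatch = max_int(acc->nbatch, current->nbatch);
    acc->space_peak = max_size(acc->space_peak, current->space_peak);
}
```

Calling accum_hash_stats once per hash table, just before the table is thrown away, leaves the accumulator holding the largest value seen for each field.  Note that in this sketch the maxima are taken per field, so the reported values need not all come from the same rescan.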

Konstantin Knizhnik and Tom Lane, per diagnosis by Thomas Munro
of a trouble report from Alvaro Herrera.

Discussion: https://postgr.es/m/20200323165059.GA24950@alvherre.pgsql
Tom Lane
2020-04-11 12:39:19 -04:00
parent 5c27bce7f3
commit 969f9d0b4b
5 changed files with 87 additions and 49 deletions


@@ -2358,7 +2358,7 @@ typedef struct HashInstrumentation
     int nbuckets_original; /* planned number of buckets */
     int nbatch; /* number of batches at end of execution */
     int nbatch_original; /* planned number of batches */
-    size_t space_peak; /* peak memory usage in bytes */
+    Size space_peak; /* peak memory usage in bytes */
 } HashInstrumentation;
 /* ----------------
@@ -2381,8 +2381,20 @@ typedef struct HashState
     HashJoinTable hashtable; /* hash table for the hashjoin */
     List *hashkeys; /* list of ExprState nodes */
-    SharedHashInfo *shared_info; /* one entry per worker */
-    HashInstrumentation *hinstrument; /* this worker's entry */
+    /*
+     * In a parallelized hash join, the leader retains a pointer to the
+     * shared-memory stats area in its shared_info field, and then copies the
+     * shared-memory info back to local storage before DSM shutdown. The
+     * shared_info field remains NULL in workers, or in non-parallel joins.
+     */
+    SharedHashInfo *shared_info;
+    /*
+     * If we are collecting hash stats, this points to an initially-zeroed
+     * collection area, which could be either local storage or in shared
+     * memory; either way it's for just one process.
+     */
+    HashInstrumentation *hinstrument;
     /* Parallel hash state. */
     struct ParallelHashJoinState *parallel_state;