1
0
mirror of https://github.com/postgres/postgres.git synced 2026-01-26 09:41:40 +03:00
Files
postgres/src/test
Tom Lane bd3e3e9e56 Ensure sanity of hash-join costing when there are no MCV statistics.
estimate_hash_bucket_stats is defined to return zero to *mcv_freq if
it cannot obtain a value for the frequency of the most common value.
Its sole caller final_cost_hashjoin ignored this provision and would
blindly believe the zero value, resulting in computing zero for the
largest bucket size.  In consequence, the safety check that intended
to prevent the largest bucket from exceeding get_hash_memory_limit()
was ineffective, allowing very silly plans to be chosen if statistics
were missing.

After fixing final_cost_hashjoin to disregard zero results for
mcv_freq, a second problem appeared: some cases that should use hash
joins failed to.  This is because estimate_hash_bucket_stats was
unaware of the fact that ANALYZE won't store MCV statistics if it
doesn't find any multiply-occurring values.  Thus the lack of an MCV
stats entry doesn't necessarily mean that we know nothing; we may
well know that the column is unique.  The former coding returned zero
for *mcv_freq in this case, which was pretty close to correct, but now
final_cost_hashjoin doesn't believe it and disables the hash join.
So check to see if there is a HISTOGRAM stats entry; if so, ANALYZE
has in fact run for this column and must have found it to be unique.
In that case report the MCV frequency as 1 / rows, instead of claiming
ignorance.

Reporting a more accurate *mcv_freq in this case can also affect the
bucket-size skew adjustment further down in estimate_hash_bucket_stats,
causing hash-join cost estimates to change slightly.  This affects
some plan choices in the core regression tests.  The first diff in
join.out corresponds to a case where we have no stats and should not
risk a hash join, but the remaining changes are caused by producing
a better bucket-size estimate for unique join columns.  Those are all
harmless changes so far as I can tell.

The existing behavior was introduced in commit 4867d7f62 in v11.
It appears from the commit log that disabling the bucket-size safety
check in the absence of statistics was intentional; but we've now seen
a case where the ensuing behavior is bad enough to make that seem like
a poor decision.  In any case the lack of other problems with that
safety check after several years helps to justify enforcing it more
strictly.  However, we won't risk back-patching this, in case any
applications are depending on the existing behavior.

Bug: #19363
Reported-by: Jinhui Lai <jinhui.lai@qq.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2380165.1766871097@sss.pgh.pa.us
Discussion: https://postgr.es/m/19363-8dd32fc7600a1153@postgresql.org
2025-12-29 13:01:27 -05:00
..
2025-12-28 09:17:42 +09:00
2025-01-01 11:21:55 -05:00
2025-01-01 11:21:55 -05:00
2025-12-09 13:53:03 +02:00
2025-12-28 09:17:42 +09:00
2025-01-01 11:21:55 -05:00

PostgreSQL tests
================

This directory contains a variety of test infrastructure as well as some of the
tests in PostgreSQL. Not all tests are here -- in particular, there are more in
individual contrib/ modules and in src/bin.

Not all these tests get run by "make check". Check src/test/Makefile to see
which tests get run automatically.

authentication/
  Tests for authentication (but see also below)

examples/
  Demonstration programs for libpq that double as regression tests via
  "make check"

isolation/
  Tests for concurrent behavior at the SQL level

kerberos/
  Tests for Kerberos/GSSAPI authentication and encryption

ldap/
  Tests for LDAP-based authentication

locale/
  Sanity checks for locale data, encodings, etc

mb/
  Tests for multibyte encoding (UTF-8) support

modules/
  Extensions used only or mainly for test purposes, generally not suitable
  for installing in production databases

perl/
  Infrastructure for Perl-based TAP tests

recovery/
  Test suite for recovery and replication

regress/
  PostgreSQL's main regression test suite, pg_regress

ssl/
  Tests to exercise and verify SSL certificate handling

subscription/
  Tests for logical replication