Introduce analyze_sample_percentage variable

This variable controls how much of the table ANALYZE TABLE samples. If ANALYZE TABLE with histogram collection is too slow, setting analyze_sample_percentage to a lower percentage of the table's rows reduces the time taken. Setting it to 0 uses a formula to compute how many rows to sample: the number of rows collected has a minimum of 50000 and increases logarithmically with a coefficient of 4096. The coefficient is chosen so that we expect an error of less than 3% in our estimations, according to the paper "Random Sampling for Histogram Construction: How much is enough?" by Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya, ACM SIGMOD, 1998. The drawback of sampling is that avg_frequency is computed imprecisely and will yield a smaller number than the real one.
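
To make the two modes concrete, here is a rough sketch of how the sample size could be derived. It is not the server's code. The constants 50000 and 4096 come from the description above. The function name, the exact shape of the logarithmic term, and the clamping to the table size are assumptions made for illustration only.

```python
import math

MIN_SAMPLE_ROWS = 50000   # minimum stated in the commit message
LOG_COEFFICIENT = 4096    # coefficient stated in the commit message


def rows_to_sample(total_rows, sample_percentage):
    """Hypothetical helper: estimate how many rows ANALYZE TABLE samples.

    sample_percentage mirrors analyze_sample_percentage (0..100).
    """
    if total_rows <= 0:
        return 0
    if sample_percentage > 0:
        # Explicit setting: sample that share of the table's rows.
        wanted = math.ceil(total_rows * sample_percentage / 100.0)
        return min(total_rows, wanted)
    # Automatic mode (assumed form): a fixed minimum plus a term that
    # grows with the logarithm of the table size, clamped so that a
    # small table is simply read in full.
    estimate = MIN_SAMPLE_ROWS + LOG_COEFFICIENT * math.log(total_rows)
    return min(total_rows, math.ceil(estimate))


if __name__ == "__main__":
    for rows in (1_000, 1_000_000, 100_000_000):
        print(rows, rows_to_sample(rows, 0), rows_to_sample(rows, 10))
```

Under these assumptions, small tables are scanned completely in automatic mode, while for large tables the sample grows only slowly with table size, which is where the time savings come from.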
2025-08-08 11:22:35 +03:00 · 2019-02-15 01:23:00 +02:00
parent 47f15ea73c
commit f0773b7842
8 changed files with 265 additions and 10 deletions
--- a/mysql-test/main/mysqld--help.result
+++ b/mysql-test/main/mysqld--help.result
@@ -15,6 +15,10 @@ The following specify which files/extra groups are read (specified before remain
 --alter-algorithm[=name] 
 Specify the alter table algorithm. One of: DEFAULT, COPY,
 INPLACE, NOCOPY, INSTANT
+ --analyze-sample-percentage=# 
+ Percentage of rows from the table ANALYZE TABLE will
+ sample to collect table statistics. Set to 0 to let
+ MariaDB decide what percentage of rows to sample.
 -a, --ansi          Use ANSI SQL syntax instead of MySQL syntax. This mode
 will also set transaction isolation level 'serializable'.
 --auto-increment-increment[=#] 
@@ -1385,6 +1389,7 @@ The following specify which files/extra groups are read (specified before remain
 Variables (--variable-name=value)
 allow-suspicious-udfs FALSE
 alter-algorithm DEFAULT
+analyze-sample-percentage 100
 auto-increment-increment 1
 auto-increment-offset 1
 autocommit TRUE