Introduce analyze_sample_percentage variable

The variable controls how many rows ANALYZE TABLE samples. If ANALYZE TABLE with histogram collection is too slow, one can reduce the time taken by setting analyze_sample_percentage to a lower percentage of the total number of rows. Setting it to 0 uses a formula to compute how many rows to sample: the number of rows collected has a minimum of 50000 and increases logarithmically with a coefficient of 4096. The coefficient is chosen so that we expect an error of less than 3% in our estimations, according to the paper "Random Sampling for Histogram Construction: How much is enough?" by Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998. The drawback of sampling is that the avg_frequency value is computed imprecisely and will yield a smaller number than the real one.
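A minimal sketch of the row-count rule described above, not the server's implementation. The function name rows_to_sample and the exact form 4096 * ln(total_rows) are assumptions made for illustration; the commit message only states the 50000 minimum, the logarithmic growth, and the coefficient 4096. Capping the result at the table size is also an assumption.

```python
import math

MIN_SAMPLE_ROWS = 50_000  # minimum stated in the commit message
LOG_COEFFICIENT = 4096    # coefficient stated in the commit message


def rows_to_sample(total_rows: int, sample_percentage: float) -> int:
    """Hypothetical helper: estimate how many rows ANALYZE TABLE would read."""
    if not 0.0 <= sample_percentage <= 100.0:
        raise ValueError("sample_percentage must be between 0 and 100")
    if total_rows <= 0:
        return 0
    if sample_percentage > 0:
        # Explicit setting: sample that percentage of the table.
        return min(total_rows, math.ceil(total_rows * sample_percentage / 100.0))
    # sample_percentage == 0: automatic mode.
    # ASSUMPTION: the logarithmic term is 4096 * ln(total_rows); the real
    # expression used by the server may differ.
    automatic = max(MIN_SAMPLE_ROWS, math.ceil(LOG_COEFFICIENT * math.log(total_rows)))
    # ASSUMPTION: never sample more rows than the table holds.
    return min(total_rows, automatic)
```

Under these assumptions a small table is read in full, while for very large tables the automatic sample grows only slowly with the table size, which is where the time saving comes from.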
2025-07-29 05:21:33 +03:00 · 2019-02-15 01:23:00 +02:00
parent 47f15ea73c
commit f0773b7842
8 changed files with 265 additions and 10 deletions
--- a/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
+++ b/mysql-test/suite/sys_vars/r/sysvars_server_notembedded.result
@@ -40,6 +40,20 @@ NUMERIC_BLOCK_SIZE	NULL
 ENUM_VALUE_LIST	DEFAULT,COPY,INPLACE,NOCOPY,INSTANT
 READ_ONLY	NO
 COMMAND_LINE_ARGUMENT	OPTIONAL
+VARIABLE_NAME	ANALYZE_SAMPLE_PERCENTAGE
+SESSION_VALUE	100.000000
+GLOBAL_VALUE	100.000000
+GLOBAL_VALUE_ORIGIN	COMPILE-TIME
+DEFAULT_VALUE	100.000000
+VARIABLE_SCOPE	SESSION
+VARIABLE_TYPE	DOUBLE
+VARIABLE_COMMENT	Percentage of rows from the table ANALYZE TABLE will sample to collect table statistics. Set to 0 to let MariaDB decide what percentage of rows to sample.
+NUMERIC_MIN_VALUE	0
+NUMERIC_MAX_VALUE	100
+NUMERIC_BLOCK_SIZE	NULL
+ENUM_VALUE_LIST	NULL
+READ_ONLY	NO
+COMMAND_LINE_ARGUMENT	REQUIRED
 VARIABLE_NAME	AUTOCOMMIT
 SESSION_VALUE	ON
 GLOBAL_VALUE	ON