1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-01 03:47:19 +03:00

Introduce analyze_sample_percentage variable

The variable controls the amount of sampling analyze table performs.

If ANALYZE table with histogram collection is too slow, one can reduce the
time taken by setting analyze_sample_percentage to a lower value of the
total number of rows.
Setting it to 0 will use a formula to compute how many rows to sample:

The number of rows collected is capped to a minimum of 50000 and
increases logarithmically with a coffecient of 4096. The coffecient is
chosen so that we expect an error of less than 3% in our estimations
according to the paper:
"Random Sampling for Histogram Construction: How much is enough?”
– Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.

The drawback of sampling is that avg_frequency number is computed
imprecisely and will yeild a smaller number than the real one.
This commit is contained in:
Vicențiu Ciorbaru
2019-02-15 01:23:00 +02:00
parent 47f15ea73c
commit f0773b7842
8 changed files with 265 additions and 10 deletions

View File

@ -622,6 +622,7 @@ typedef struct system_variables
ulong optimizer_selectivity_sampling_limit;
ulong optimizer_use_condition_selectivity;
ulong use_stat_tables;
double sample_percentage;
ulong histogram_size;
ulong histogram_type;
ulong preload_buff_size;