1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00
Commit Graph

351 Commits

Author SHA1 Message Date
Michael Okoko
096995b106 Fix column range cardinality crash when histogram is null
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:09 +03:00
Michael Okoko
058a90e6f5 Use existing statistics test to improve coverage for JSON statistics
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:09 +03:00
Michael Okoko
3692adebd4 Fix avg_frequency statistics and remove stderr dumps
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bff65a813e Implement point selectivity for JSON histograms
* Also merges tests relating to JSON statistics into one file

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
547f805311 Refactor histogram point selectivity
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
e10d99ce87 Backfill json histogram bounds during building
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
3d952cd8bd Improve tests and test results to cover larger cases
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
63cbd0748b replace range_selectivity methods for Histograms and add tests
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c129689ddc Use binary search to compute range selectivity
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
c605285bb8 fix returned value type for empty json objects
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
69f24c238e Use generic Histogram_base class for Histogram_builders
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
21e0f5487f MDEV-21130: Histograms: use JSON as on-disk format
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
2022-01-19 18:10:08 +03:00
Michael Okoko
e778d12f83 report parse error when parsing JSON histogram fails
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
fe2e516a50 inform test result of zero hist_size for json histogram
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bf4d0dcfe2 implement parse and serialize for histogram json 2022-01-19 18:10:08 +03:00
Michael Okoko
9bba595528 remove unneeded shared methods
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
1fa7af749e Split histogram classes and into JSON and binary classes
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
1998b787ac MDEV-21130: Histograms: use JSON as on-disk format
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
2022-01-19 18:10:08 +03:00
Michael Okoko
fb2edab3eb Extract json parser functions from class
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
6bc2df5fa4 Add parser to read JSON array (of histograms) into string vector
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
524322ad3e Properly initialize bucket bounds vector
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
d4d539803b Fix garbage null values at end of histogram json
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
a378735862 Fix garbage null values at end of json array elements
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
9954aecc2b Store bucket bounds and extend test cases for JSON histogram
This fixes the memory allocation for json histogram builder and add more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
237447de63 rough base for json histogram builder
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
79cdb535da add json statistics test and change histogram column type to blob 2022-01-19 18:10:07 +03:00
Michael Okoko
2aca7b0c33 Prepare JSON as valid histogram_type
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Sergei Golubchik
f33e57a9e6 Merge branch '10.4' into 10.5 2021-02-23 13:06:22 +01:00
Sergei Golubchik
e841957416 Merge branch '10.3' into 10.4 2021-02-23 09:25:57 +01:00
Sergei Golubchik
0ab1e3914c Merge branch '10.2' into 10.3 2021-02-22 22:42:27 +01:00
Sergei Golubchik
c4f0133444 cleanup: stat tables
don't allocate Column_statistics_collected objects that won't
be used.

minor style fixes (StringBuffer<>, etc)
2021-02-22 19:27:12 +01:00
Sergei Golubchik
06a791aa12 MDEV-23753: SIGSEGV in Column_stat::store_stat_fields
only collect persistent stats for columns explicitly listed
by the user in the  ANALYZE TABLE PERSISTENT FOR COLUMNS (...)
clause. The engine can extend table->read_set as much as
it wants, it should not affect the collected statistics.

Test case from the 3b94309a6c applies - it used to crash,
because ha_partition extended table->read_set after the loop that
initialized some objects based on bits in the read_set but before the
loop that used these objects based on bits in the read_set.
2021-02-22 19:27:12 +01:00
Sergei Golubchik
caad32ca92 Revert "MDEV-23753: SIGSEGV in Column_stat::store_stat_fields"
This reverts the commit 3b94309a6c but keeps the test

Because the fix is a hack that isn't supposed to do anything,
and relies on a side-effect of rnd_init inside ha_partition.

A different fix is coming up.
2021-02-22 19:27:12 +01:00
Varun Gupta
a461e4d306 MDEV-19474: Histogram statistics are used even with optimizer_use_condition_selectivity=3
The issue here was histogram statistics were being used even when
the level of optimizer_use_condition_selectivity doesn't allow
usage of statistics from histogram.

The histogram statistics are read for a table only when
optimizer_use_condition_selectivity > 3. But the TABLE structure can be
stored in the internal table cache and be reused for the next query.
So in this case the histogram statistics will be available for the next query.

The fix would be to make sure to use the histogram statistics only when
optimizer_use_condition_selectivity > 3.
2021-02-16 11:53:13 +05:30
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Varun Gupta
072b39da66 MDEV-22583: Selectivity for BIT columns in filtered column for EXPLAIN is incorrect
For BIT columns when EITS is collected, we store the integral value in
text representation in the min and max fields of the statistical table
When this value is retrieved from the statistical table to original table
field then we try to store the text representation in the original field
which is INCORRECT.

The value that is retrieved should be converted to integral type and that
value should be stored back in the original field. This would get us the
correct estimate for selectivity of the predicate.
2021-01-30 22:36:51 +05:30
Nikita Malyavin
21809f9a45 MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-27 00:50:55 +10:00
Varun Gupta
3b94309a6c MDEV-23753: SIGSEGV in Column_stat::store_stat_fields
For EITS collection min and max fields are allocated for each column
that is set in the read_set bitmap of a table. This allocation of min and max
fields happens inside alloc_statistics_for_table.

For a partitioned table ha_rnd_init is called inside the function
collect_statistics_for_table which sets the read_set bitmap for the columns
inside the partition expression. This happens only when there is a write lock
on the partitioned table.
But the allocation happens before this, so min and max fields are not allocated
for the columns involved in the partition expression.
This resulted in a crash, as the EITS statistics were collected but there was
no min and max field to store the value to.

The fix would be to call ha_rnd_init inside the function alloc_statistics_for_table
that would make sure that min and max fields are allocated for the columns
involved in the partition expression.
2021-01-12 18:47:35 +05:30
Nikita Malyavin
e25623e78a MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-08 16:04:29 +10:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Sergei Golubchik
00f54b56b1 cleanup: RAII helper for changing thd->count_cuted_rows 2020-11-25 22:19:59 +01:00
Marko Mäkelä
4d4865de6f Merge 10.4 into 10.5 2020-07-20 15:55:59 +03:00
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Varun Gupta
dfdfeecb03 MDEV-22851: Engine independent index statistics are incorrect for large tables on Windows
An oveflow was happening on windows because on Windows sizeof(ulong) is 4 bytes
while it is 8 bytes on Linux.
Switched avg_frequency and avg length for column statistics to ulonglong.
Switched avg_frequency for index statistics to ulonglong.
2020-07-15 11:27:32 +05:30