mirror of https://github.com/MariaDB/server.git synced 2025-08-05 13:16:09 +03:00
Commit Graph

343 Commits

Author SHA1 Message Date
Michael Okoko
c129689ddc Use binary search to compute range selectivity
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
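The binary-search idea in the commit above can be sketched as follows. This is a minimal illustration, not the server's code: it assumes an equi-height histogram stored as a sorted vector of bucket endpoints, with each bucket holding the same share of rows, and the names `position_in_histogram` and `range_selectivity` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a value to its cumulative position in [0, 1].
// `bounds` holds n+1 sorted endpoints for n buckets; each bucket is
// assumed to contain 1/n of the rows (equi-height histogram).
double position_in_histogram(const std::vector<double> &bounds, double value)
{
  assert(bounds.size() >= 2);
  if (value <= bounds.front())
    return 0.0;
  if (value >= bounds.back())
    return 1.0;

  // Binary search for the first endpoint strictly greater than value.
  auto it = std::upper_bound(bounds.begin(), bounds.end(), value);
  std::size_t bucket = static_cast<std::size_t>(it - bounds.begin()) - 1;

  double lo = bounds[bucket];
  double hi = bounds[bucket + 1];
  // Linear interpolation inside the bucket that contains the value.
  double inside = (hi > lo) ? (value - lo) / (hi - lo) : 0.0;
  std::size_t n_buckets = bounds.size() - 1;
  return (static_cast<double>(bucket) + inside) / static_cast<double>(n_buckets);
}

// Estimated fraction of rows with min_value <= column <= max_value.
double range_selectivity(const std::vector<double> &bounds,
                         double min_value, double max_value)
{
  if (max_value < min_value)
    return 0.0;
  return position_in_histogram(bounds, max_value) -
         position_in_histogram(bounds, min_value);
}
```

With endpoints `{0, 10, 20, 30, 40}`, the range `[10, 30]` covers two of the four buckets. The lookup costs O(log n) in the number of buckets rather than a linear scan.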
Michael Okoko
c605285bb8 fix returned value type for empty json objects
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
69f24c238e Use generic Histogram_base class for Histogram_builders
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
21e0f5487f MDEV-21130: Histograms: use JSON as on-disk format
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
2022-01-19 18:10:08 +03:00
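The three steps listed in the commit above can be sketched for an integer column as follows. This is an illustrative sketch only: the function names are hypothetical, the "binary form" is assumed to be a plain 64-bit integer, and the fraction is assumed to be a linear interpolation between the bucket endpoints X and Y.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Step 1 (hypothetical): convert the string form of a value to binary form.
// Here the binary form is simply a 64-bit signed integer.
int64_t to_binary(const std::string &text)
{
  return std::stoll(text);
}

// Step 2: compare two values in binary form; returns -1, 0 or 1.
int compare_values(int64_t a, int64_t b)
{
  return (a > b) - (a < b);
}

// Step 3: fraction of the [x, y] range that lies below val,
// assuming values are spread uniformly between x and y.
double fraction_in_range(int64_t x, int64_t y, int64_t val)
{
  if (compare_values(val, x) <= 0)
    return 0.0;
  if (compare_values(val, y) >= 0)
    return 1.0;
  // Convert to double before subtracting to avoid integer overflow.
  return (static_cast<double>(val) - static_cast<double>(x)) /
         (static_cast<double>(y) - static_cast<double>(x));
}
```

Comparing in binary form matters because string comparison would order "9" after "10".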
Michael Okoko
e778d12f83 report parse error when parsing JSON histogram fails
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
fe2e516a50 inform test result of zero hist_size for json histogram
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
bf4d0dcfe2 implement parse and serialize for histogram json 2022-01-19 18:10:08 +03:00
Michael Okoko
9bba595528 remove unneeded shared methods
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
1fa7af749e Split histogram classes into JSON and binary classes
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Sergei Petrunia
1998b787ac MDEV-21130: Histograms: use JSON as on-disk format
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
2022-01-19 18:10:08 +03:00
Michael Okoko
fb2edab3eb Extract json parser functions from class
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
6bc2df5fa4 Add parser to read JSON array (of histograms) into string vector
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
524322ad3e Properly initialize bucket bounds vector
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
d4d539803b Fix garbage null values at end of histogram json
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:08 +03:00
Michael Okoko
a378735862 Fix garbage null values at end of json array elements
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
9954aecc2b Store bucket bounds and extend test cases for JSON histogram
This fixes the memory allocation for the json histogram builder and adds more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
237447de63 rough base for json histogram builder
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Michael Okoko
79cdb535da add json statistics test and change histogram column type to blob 2022-01-19 18:10:07 +03:00
Michael Okoko
2aca7b0c33 Prepare JSON as valid histogram_type
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2022-01-19 18:10:07 +03:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Sergei Golubchik
f33e57a9e6 Merge branch '10.4' into 10.5 2021-02-23 13:06:22 +01:00
Sergei Golubchik
e841957416 Merge branch '10.3' into 10.4 2021-02-23 09:25:57 +01:00
Sergei Golubchik
0ab1e3914c Merge branch '10.2' into 10.3 2021-02-22 22:42:27 +01:00
Sergei Golubchik
c4f0133444 cleanup: stat tables
don't allocate Column_statistics_collected objects that won't
be used.

minor style fixes (StringBuffer<>, etc)
2021-02-22 19:27:12 +01:00
Sergei Golubchik
06a791aa12 MDEV-23753: SIGSEGV in Column_stat::store_stat_fields
only collect persistent stats for columns explicitly listed
by the user in the  ANALYZE TABLE PERSISTENT FOR COLUMNS (...)
clause. The engine can extend table->read_set as much as
it wants, it should not affect the collected statistics.

Test case from the 3b94309a6c applies - it used to crash,
because ha_partition extended table->read_set after the loop that
initialized some objects based on bits in the read_set but before the
loop that used these objects based on bits in the read_set.
2021-02-22 19:27:12 +01:00
Sergei Golubchik
caad32ca92 Revert "MDEV-23753: SIGSEGV in Column_stat::store_stat_fields"
This reverts the commit 3b94309a6c but keeps the test

Because the fix is a hack that isn't supposed to do anything,
and relies on a side-effect of rnd_init inside ha_partition.

A different fix is coming up.
2021-02-22 19:27:12 +01:00
Varun Gupta
a461e4d306 MDEV-19474: Histogram statistics are used even with optimizer_use_condition_selectivity=3
The issue was that histogram statistics were being used even when
the level of optimizer_use_condition_selectivity does not allow
the use of statistics from histograms.

The histogram statistics are read for a table only when
optimizer_use_condition_selectivity > 3. But the TABLE structure can be
stored in the internal table cache and be reused for the next query.
So in this case the histogram statistics will be available for the next query.

The fix is to make sure that the histogram statistics are used only when
optimizer_use_condition_selectivity > 3.
2021-02-16 11:53:13 +05:30
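The check described in the commit above can be sketched as follows. This is a simplified illustration: `CachedTable` and `may_use_histograms` are hypothetical names, and only the `> 3` threshold comes from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for a table object that may be reused from the
// table cache with histograms already loaded by an earlier query.
struct CachedTable
{
  bool histograms_loaded;
};

// Histograms may be present in the cached object, but they are consulted
// only when the current selectivity level allows it.
bool may_use_histograms(const CachedTable &table, unsigned selectivity_level)
{
  return selectivity_level > 3 && table.histograms_loaded;
}
```

The point of the guard is that the presence of loaded data is not enough on its own: the level in effect for the current query decides.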
Sergei Golubchik
25d9d2e37f Merge branch 'bb-10.4-release' into bb-10.5-release 2021-02-15 16:43:15 +01:00
Sergei Golubchik
00a313ecf3 Merge branch 'bb-10.3-release' into bb-10.4-release
Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
was null-merged. 10.4 version of the fix is coming up separately
2021-02-12 17:44:22 +01:00
Sergei Golubchik
60ea09eae6 Merge branch '10.2' into 10.3 2021-02-01 13:49:33 +01:00
Varun Gupta
072b39da66 MDEV-22583: Selectivity for BIT columns in filtered column for EXPLAIN is incorrect
For BIT columns, when EITS is collected, we store the integral value in
text representation in the min and max fields of the statistical table.
When this value was retrieved from the statistical table into the original
table field, we tried to store the text representation in the original field,
which is incorrect.

The value that is retrieved should be converted to integral type and that
value should be stored back in the original field. This would get us the
correct estimate for selectivity of the predicate.
2021-01-30 22:36:51 +05:30
Nikita Malyavin
21809f9a45 MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was a debug-only failure, though the problem is much wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is almost
always), make the program flow cumbersome and impossible to follow,
not to mention the errors they cause, like this MDEV-17556, where the tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by the dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-27 00:50:55 +10:00
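The rule "substitute the pointer, never the underlying bitmap" can be sketched as follows. The types here are simplified stand-ins rather than the server's MY_BITMAP and TABLE definitions, and the function names are hypothetical; only the pointer-to-pointer shape of the interface follows the commit message.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for MY_BITMAP (not the server's definition).
struct Bitmap
{
  std::vector<bool> bits;
};

// Simplified stand-in for TABLE: stored bitmaps plus a pointer to the
// one currently in use.
struct Table
{
  Bitmap all_set;    // every column marked
  Bitmap tmp_set;    // scratch bitmap
  Bitmap *read_set;  // points at one of the stored bitmaps
};

// Redirect *bitmap to all_set and return the previous pointer so the
// caller can restore it. No stored bitmap is modified.
Bitmap *use_all_columns(Table *table, Bitmap **bitmap)
{
  Bitmap *old = *bitmap;
  *bitmap = &table->all_set;
  return old;
}

// Put the previous pointer back.
void restore_column_map(Bitmap **bitmap, Bitmap *old)
{
  *bitmap = old;
}
```

Because only the pointer changes, clearing `tmp_set` while `read_set` is redirected cannot wipe `all_set`, which is the failure the commit describes.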
Varun Gupta
3b94309a6c MDEV-23753: SIGSEGV in Column_stat::store_stat_fields
For EITS collection min and max fields are allocated for each column
that is set in the read_set bitmap of a table. This allocation of min and max
fields happens inside alloc_statistics_for_table.

For a partitioned table ha_rnd_init is called inside the function
collect_statistics_for_table which sets the read_set bitmap for the columns
inside the partition expression. This happens only when there is a write lock
on the partitioned table.
But the allocation happens before this, so min and max fields are not allocated
for the columns involved in the partition expression.
This resulted in a crash, as the EITS statistics were collected but there was
no min and max field to store the value to.

The fix is to call ha_rnd_init inside the function alloc_statistics_for_table,
which makes sure that min and max fields are allocated for the columns
involved in the partition expression.
2021-01-12 18:47:35 +05:30
Nikita Malyavin
e25623e78a MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.

This was a debug-only failure, though the problem is much wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
 bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
 sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
 directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.

The last three bullets in the list, when used together (which is almost
always), make the program flow cumbersome and impossible to follow,
not to mention the errors they cause, like this MDEV-17556, where the tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by the dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.

To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
 orig_*, all_set bitmaps are never substituted already.

This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
 to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
 MY_BITMAP* instead of my_bitmap_map*

These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
2021-01-08 16:04:29 +10:00
Marko Mäkelä
6a1e655cb0 Merge 10.4 into 10.5 2020-12-02 18:29:49 +02:00
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Sergei Golubchik
00f54b56b1 cleanup: RAII helper for changing thd->count_cuted_rows 2020-11-25 22:19:59 +01:00
Marko Mäkelä
4d4865de6f Merge 10.4 into 10.5 2020-07-20 15:55:59 +03:00
Marko Mäkelä
4b959bd8df Merge 10.3 into 10.4 2020-07-20 15:34:59 +03:00
Marko Mäkelä
acc58fd835 Merge 10.2 into 10.3 2020-07-20 15:11:59 +03:00
Marko Mäkelä
ca9276e37e Merge 10.1 into 10.2 2020-07-20 14:53:24 +03:00
Varun Gupta
dfdfeecb03 MDEV-22851: Engine independent index statistics are incorrect for large tables on Windows
An overflow was happening on Windows because sizeof(ulong) is 4 bytes there,
while it is 8 bytes on Linux.
Switched avg_frequency and avg length for column statistics to ulonglong.
Switched avg_frequency for index statistics to ulonglong.
2020-07-15 11:27:32 +05:30
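The width difference behind this overflow can be illustrated with fixed-width types. This is a sketch under stated assumptions: `uint32_t` models a 4-byte ulong, `uint64_t` models ulonglong, and the multiplication by a scale factor is a hypothetical stand-in for whatever arithmetic the statistics code performs.

```cpp
#include <cassert>
#include <cstdint>

// Models arithmetic in a 4-byte unsigned type: the result wraps
// modulo 2^32 once it exceeds 4294967295.
uint32_t scaled_narrow(uint32_t rows, uint32_t scale)
{
  return rows * scale;
}

// Models the same arithmetic in an 8-byte unsigned type: the operand is
// widened before multiplying, so the full result is kept.
uint64_t scaled_wide(uint32_t rows, uint32_t scale)
{
  return static_cast<uint64_t>(rows) * scale;
}
```

For 5,000,000 rows and a scale of 1,000, the wide version returns 5,000,000,000 while the narrow version wraps to 705,032,704.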
Marko Mäkelä
701efbb25b Merge 10.4 into 10.5 2020-06-03 09:45:39 +03:00
Marko Mäkelä
8059148154 Merge 10.3 into 10.4 2020-06-03 07:32:09 +03:00
Varun Gupta
d5e8b4d7f9 MDEV-22509: Server crashes in Field_inet6::store_inet6_null_with_warn / Field::maybe_null
For fields of type INET, during EITS collection the min and max values are stored in text
representation in the statistical table.
While retrieving the value from the statistical table, the value was stored back in the original
field using binary form instead of text, and this resulted in the crash.

Introduced 2 functions in the Field structure:
  1) store_to_statistical_minmax_field
  2) store_from_statistical_minmax_field
2020-06-02 17:43:45 +05:30
Marko Mäkelä
8300f639a1 Merge 10.2 into 10.3 2020-06-02 10:25:11 +03:00
Marko Mäkelä
d72eebaa3d Merge 10.1 into 10.2 2020-06-01 09:33:03 +03:00
Sergey Vojtovich
c279878493 Thread safe histograms loading
Previously multiple threads were allowed to load histograms concurrently.
There were no known problems caused by this. But given the amount of data
races in this code, problems would happen sooner or later.

To avoid a scalability bottleneck, histogram loading is protected by
a per-TABLE_SHARE atomic variable.

Whenever histograms were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.

Whenever histograms have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If histograms are being loaded
concurrently, statement waits until load is completed.

- Table_statistics::total_hist_size moved to TABLE_STATISTICS_CB: only
  meaningful within TABLE_SHARE (not used for collected stats).
- TABLE_STATISTICS_CB::histograms_can_be_read and
  TABLE_STATISTICS_CB::histograms_are_read are replaced with a tri state
  atomic variable.
- Simplified away alloc_histograms_for_table_share().

Note: there's still likely a data race if a thread attempts accessing
histograms data after it failed to load it (because of concurrent load).
It was there previously and goes out of the scope of this effort. One way
of fixing it could be reviving TABLE::histograms_are_read and adding
appropriate checks whenever it is needed.

Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 21:53:54 +04:00
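The tri-state scheme described in the commit above can be sketched with `std::atomic`. This is a minimal illustration, not the server's implementation: the type and member names are hypothetical, waiting is modeled as a yield loop, and load failure is not handled.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The three states of the per-share variable (names are hypothetical).
enum class LoadState : int { NOT_LOADED, LOADING, LOADED };

struct StatsControl
{
  std::atomic<LoadState> state{LoadState::NOT_LOADED};

  // Returns true if the caller became the loader and must call
  // finish_loading() afterwards; returns false once data is loaded.
  bool start_loading()
  {
    for (;;)
    {
      LoadState seen = state.load(std::memory_order_acquire);
      if (seen == LoadState::LOADED)
        return false;  // hot path: a single load-acquire check
      if (seen == LoadState::NOT_LOADED &&
          state.compare_exchange_weak(seen, LoadState::LOADING,
                                      std::memory_order_acquire,
                                      std::memory_order_acquire))
        return true;   // this thread won the right to load
      // Another thread is loading (or the exchange failed): wait and retry.
      std::this_thread::yield();
    }
  }

  // Publishes the loaded data to threads that later observe LOADED.
  void finish_loading()
  {
    state.store(LoadState::LOADED, std::memory_order_release);
  }
};
```

The release store in `finish_loading()` pairs with the acquire load on the hot path, so a thread that sees `LOADED` also sees the data written by the loader.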
Sergey Vojtovich
609a0d3db3 Thread safe statistics loading
Previously multiple threads were allowed to load statistics concurrently.
There were no known problems caused by this. But given the amount of data
races in this code, problems would happen sooner or later.

To avoid a scalability bottleneck, statistics loading is protected by
a per-TABLE_SHARE atomic variable.

Whenever statistics were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.

Whenever statistics have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If statistics are being loaded
concurrently, statement waits until load is completed.

TABLE_STATISTICS_CB::stats_can_be_read and
TABLE_STATISTICS_CB::stats_is_read are replaced with a tri state atomic
variable.

Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 21:53:54 +04:00
Sergey Vojtovich
1055a7f4fc Simplified away statistics_for_tables_is_needed()
Removed redundant loops, integrated the logic into the caller instead.
Unified condition in read_statistics_for_tables(), less
"table_share != NULL" checks, no more potential "table_share == NULL"
dereferencing.

Part of MDEV-19061 - table_share used for reading statistical tables is
                     not protected
2020-05-29 21:53:54 +04:00