mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-22 17:44:29 +03:00

Author	SHA1	Message	Date
Monty	a49ebf71af	Fixed memory leak when using histograms. This was introduced in the last merge with 10.6. The reason is that 10.6 does not need anything special to free histograms, as everything is allocated on a memroot. In 10.10 histograms use the vector class, which has some problems: - No automatic free - No memory usage accounting (we should at some point remove vector usage because of the above problems). Fixed by explicitly freeing histograms when freeing TABLE_STATISTICS objects.	2023-10-17 15:12:49 +03:00
Marko Mäkelä	d5e15424d8	Merge 10.6 into 10.10 The MDEV-29693 conflict resolution is from Monty, as well as is a bug fix where ANALYZE TABLE wrongly built histograms for single-column PRIMARY KEY. Also includes a fix for safe_malloc error reporting. Other things: - Copied main.log_slow from 10.4 to avoid mtr issue Disabled test: - spider/bugfix.mdev_27239 because we started to get +Error 1429 Unable to connect to foreign data source: localhost -Error 1158 Got an error reading communication packets - main.delayed - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED This part is disabled for now as it fails randomly with different warnings/errors (no corruption).	2023-10-14 13:36:11 +03:00
Monty	e3b36b8f1b	MDEV-31957 Concurrent ALTER and ANALYZE collecting statistics can result in stale statistical data Example of what causes the problem: T1: ANALYZE TABLE starts to collect statistics T2: ALTER TABLE starts by deleting statistics for all changed fields, then creates a temp table and copies data to it. T1: ANALYZE ends and writes to the statistics tables. T2: ALTER TABLE renames temp table in place of the old table. Now the statistics from analyze matches the old deleted tables. Fixed by waiting to delete old statistics until ALTER TABLE is the only one using the old table and ensure that rename of columns can handle swapping of column names. rename_columns_in_stat_table() (former rename_column_in_stat_tables()) now takes a list of columns to rename. It uses the following algorithm to update column_stats to be able to handle circular renames - While there are columns to be renamed and it is the first loop or last rename loop did change something. - Loop over all columns to be renamed - Change column name in column_stat - If fail because of duplicate key - If this is first change attempt for this column - Change column name to a temporary column name - If there was a conflicting row, replace it with the current row. else - Remove entry from column list - Loop over all remaining columns in the list - Remove the conflicting row - Change column from temporary name to final name in column_stat Other things: - Don't flush tables for every operation. Only flush when all updates are done. - Rename of columns was not handled in case of ALGORITHM=copy (old bug). - Fixed that we do not collect statistics for hidden hash columns used by UNIQUE constraint on long values. - Fixed that we do not collect statistics for blob columns referred by generated virtual columns. This was achieved by storing the fields for which we want to have statistics in table->has_value_set instead of in table->read_set. - Rename of indexes was not handled for persistent statistics. 
- This is now handled similarly to rename of columns. Renamed columns are now stored in 'rename_stat_indexes' and handled in Alter_info::delete_statistics() together with dropped indexes. - ALTER TABLE .. ADD INDEX may, instead of creating a new index, rename an existing generated foreign key index. This was not reflected in the index_stats table because this was handled in mysql_prepare_create_table() instead of in the mysql_alter() code. Fixed by adding a call in mysql_prepare_create_table() to drop the changed index. I also had to change the code that 'marked the index' to be ignored with code that would not destroy the original index name. Reviewer: Sergei Petrunia <sergey@mariadb.com>	2023-10-03 08:25:30 +03:00
Monty	a6bf4b5807	MDEV-29693 ANALYZE TABLE still flushes table definition cache when engine-independent statistics is used This commit enables reloading of engine-independent statistics without flushing the table from table definition cache. This is achieved by allowing multiple versions of the TABLE_STATISTICS_CB object and having independent pointers to it in TABLE and TABLE_SHARE. The TABLE_STATISTICS_CB objects have reference pointers and are freed when no one is pointing to them anymore. TABLE's TABLE_STATISTICS_CB pointer is updated to use the TABLE_SHARE's pointer when read_statistics_for_tables() is called at the beginning of a query. Main changes: - read_statistics_for_table() will allocate a new TABLE_STATISTICS_CB object. - All get_stat_values() functions have a new parameter that tells where collected data should be stored. get_stat_values() are not using the table_field object anymore to store data. - All get_stat_values() functions return 1 if they found any data in the statistics tables. Other things: - Fixed INSERT DELAYED to not read statistics tables. - Removed Statistics_state from TABLE_STATISTICS_CB as this is not needed anymore as we are not changing TABLE_SHARE->stats_cb while calculating or loading statistics. - Store values used with store_from_statistical_minmax_field() in TABLE_STATISTICS_CB::mem_root. This allowed me to remove the function delete_stat_values_for_table_share(). - Field_blob::store_from_statistical_minmax_field() is implemented but is not normally used as we do not yet support EIS statistics for blobs. For example Field_blob::update_min() and Field_blob::update_max() are not implemented. Note that the function can be called if there is a concurrent "ALTER TABLE MODIFY field BLOB" running because of a bug in ALTER TABLE where it deletes entries from column_stats before it has an exclusive lock on the table. - Use result of field->val_str(&val) as a pointer to the result instead of val (safety fix). 
- Allocate memory for collected statistics in THD::mem_root, not in TABLE::mem_root. Allocating in TABLE::mem_root could cause the TABLE object to grow if an ANALYZE TABLE was run many times on the same table. This was done in allocate_statistics_for_table(), create_min_max_statistical_fields_for_table() and create_min_max_statistical_fields_for_table_share(). - Store in TABLE_STATISTICS_CB::stats_available which statistics were found in the statistics tables. - Removed index_table from class Index_prefix_calc as it was not used. - Added TABLE_SHARE::LOCK_statistics to ensure we don't load EITS in parallel. First thread will load it, others will reuse the loaded data. - Eliminate read_histograms_for_table(). The loading happens within read_statistics_for_tables() if histograms are needed. One downside is that if we have read statistics without histograms before and someone requires histograms, we have to read all statistics again (once) from the statistics tables. A smaller downside is the need to call alloc_root() for each individual histogram. Before we could allocate all the space for histograms with a single alloc_root. - Fixed bug in MyISAM and Aria where they did not properly notice that table had changed after analyze table. This was not a problem before this patch as then the MyISAM and Aria tables were flushed as part of ANALYZE table, which hid this issue. - Fixed a bug in ANALYZE table where table->records could be seen as 0 in collect_statistics_for_table(). The effect of this unlikely bug was that a full table scan could be done even if analyze_sample_percentage was not set to 1. - Changed multiple mallocs in a row to use multi_alloc_root(). - Added a mutex protection in update_statistics_for_table() to ensure that several tables are not updating the statistics at the same time. 
Some of the changes in sql_statistics.cc are based on a patch from Oleg Smirnov <olernov@gmail.com> Co-authored-by: Oleg Smirnov <olernov@gmail.com> Co-authored-by: Vicentiu Ciorbaru <cvicentiu@gmail.com> Reviewer: Sergei Petrunia <sergey@mariadb.com>	2023-08-18 13:28:39 +03:00
Sergei Petrunia	ce4956f322	Code cleanup	2022-01-19 18:14:07 +03:00
Sergei Petrunia	db8f15be93	MDEV-27229: Estimation for filtered rows less precise ... #5 Followup: remove this line from get_column_range_cardinality() set_if_bigger(res, col_stats->get_avg_frequency()); and make sure it is only used with the binary histograms. For JSON histograms, it makes the estimates unnecessarily imprecise.	2022-01-19 18:10:12 +03:00
Sergei Petrunia	1d14176ec4	MDEV-26519: Improved histograms: Make JSON parser efficient Previous JSON parser was using an API which made the parsing inefficient: the same JSON contents was parsed again and again. Switch to using a lower-level parsing API which allows to do parsing in an efficient way.	2022-01-19 18:10:11 +03:00
Sergei Petrunia	05877df472	MDEV-26849: JSON Histograms: point selectivity estimates are off .. for non-existent values. Handle this special case.	2022-01-19 18:10:11 +03:00
Sergei Petrunia	702f4efcd9	More "straightforward" memory management Do not put Histogram objects on MEM_ROOT at all	2022-01-19 18:10:10 +03:00
Sergei Petrunia	9271bd17f7	More code cleanups Remove Histogram_*::is_available(), it is not applicable anymore. Fix compilation on Windows	2022-01-19 18:10:10 +03:00
Sergei Petrunia	1d98168547	Move JSON histograms code into its own files	2022-01-19 18:10:10 +03:00
Sergei Petrunia	4ab2b78b65	Histogram code cleanup and fixes Factor the code that updates count, count_distinct, count_distinct_single_occurrence into class Basic_stats_collector Change from Histogram_builder and its descendant Histogram_builder_json to Histogram_builder (the interface), and Histogram_binary_builder, Histogram_json_builder. In Histogram_json_builder, do not forget to collect the right bound of the right-most bucket.	2022-01-19 18:10:10 +03:00
Sergei Petrunia	859c14ff01	Better names: s/histogram_/histogram/, s/Histogram_json/Histogram_json_hb/	2022-01-19 18:10:09 +03:00
Sergei Petrunia	fc6a4a33b2	Cleanup histogram collection code	2022-01-19 18:10:09 +03:00
Sergei Petrunia	02a67307d3	Fix compilation on Windows	2022-01-19 18:10:09 +03:00
Sergei Petrunia	3486bf4110	Code cleanup + reduce the diff size	2022-01-19 18:10:09 +03:00
Sergei Petrunia	a93b377863	Fix histogram memory management There are "local" histograms that are allocated by one thread for one TABLE object, and "global" that are allocated for TABLE_SHARE.	2022-01-19 18:10:09 +03:00
Sergei Petrunia	fcf58a5e0f	Code cleanup part#2: do not copy key values in xxx_selectivity() functions	2022-01-19 18:10:09 +03:00
Sergei Petrunia	2a1cdbabec	Fix JSON parsing: future-proof data representation in JSON, code cleanup	2022-01-19 18:10:09 +03:00
Sergei Petrunia	a0b4a86822	Code cleanup part #2 .	2022-01-19 18:10:09 +03:00
Sergei Petrunia	72c0ba43b2	Code cleanup part #1	2022-01-19 18:10:09 +03:00
Sergei Petrunia	f76e310ace	Rename histogram_type=JSON to JSON_HB	2022-01-19 18:10:09 +03:00
Michael Okoko	bff65a813e	Implement point selectivity for JSON histograms * Also merges tests relating to JSON statistics into one file Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	547f805311	Refactor histogram point selectivity Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	63cbd0748b	replace range_selectivity methods for Histograms and add tests Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	c129689ddc	Use binary search to compute range selectivity * it also adds an "explain select" statement to the test so that the fprintf calls can print the computed intervals to mysqld.1.err Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	69f24c238e	Use generic Histogram_base class for Histogram_builders This fixes the wrong calculation for avg_frequency in json histograms by replacing the specific histogram objects with the generic Histogram_base class. It also restores get/set size functions as they were useful in calculating fields for binary histogram. Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Sergei Petrunia	21e0f5487f	MDEV-21130: Histograms: use JSON as on-disk format A demo of how to use in-memory data structure for histogram. The patch shows how to * convert string form of data to binary form * compare two values in binary form * compute a fraction for val in [X, Y] range. grep for GSOC-TODO for notes.	2022-01-19 18:10:08 +03:00
Michael Okoko	fe2e516a50	inform test result of zero hist_size for json histogram Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	bf4d0dcfe2	implement parse and serialize for histogram json	2022-01-19 18:10:08 +03:00
Michael Okoko	9bba595528	remove unneeded shared methods Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Michael Okoko	1fa7af749e	Split histogram classes and into JSON and binary classes Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:08 +03:00
Sergei Petrunia	1998b787ac	MDEV-21130: Histograms: use JSON as on-disk format Preparation for handling different kinds of histograms: - In Column_statistics, change "Histogram histogram" into "Histogram *histogram_". This allows for different kinds of Histogram classes with virtual functions. - [Almost] remove the usage of Histogram->set_values and Histogram->set_size. The code outside the histogram should not make any assumptions about what/how is stored in the Histogram. - Introduce drafts of methods to read/save histograms to/from disk.	2022-01-19 18:10:08 +03:00
Michael Okoko	9954aecc2b	Store bucket bounds and extend test cases for JSON histogram This fixes the memory allocation for json histogram builder and add more column types for testing. Some challenges at the moment include: * Garbage value at the end of JSON array still persists. * Garbage value also gets appended to bucket values if the column is a primary key. * There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests. Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:07 +03:00
Michael Okoko	2aca7b0c33	Prepare JSON as valid histogram_type Signed-off-by: Michael Okoko <okokomichaels@outlook.com>	2022-01-19 18:10:07 +03:00
Sergei Golubchik	e841957416	Merge branch '10.3' into 10.4	2021-02-23 09:25:57 +01:00
Sergei Golubchik	0ab1e3914c	Merge branch '10.2' into 10.3	2021-02-22 22:42:27 +01:00
Varun Gupta	a461e4d306	MDEV-19474: Histogram statistics are used even with optimizer_use_condition_selectivity=3 The issue here was histogram statistics were being used even when the level of optimizer_use_condition_selectivity doesn't allow usage of statistics from histogram. The histogram statistics are read for a table only when optimizer_use_condition_selectivity > 3. But the TABLE structure can be stored in the internal table cache and be reused for the next query. So in this case the histogram statistics will be available for the next query. The fix would be to make sure to use the histogram statistics only when optimizer_use_condition_selectivity > 3.	2021-02-16 11:53:13 +05:30
Marko Mäkelä	4b959bd8df	Merge 10.3 into 10.4	2020-07-20 15:34:59 +03:00
Marko Mäkelä	acc58fd835	Merge 10.2 into 10.3	2020-07-20 15:11:59 +03:00
Marko Mäkelä	ca9276e37e	Merge 10.1 into 10.2	2020-07-20 14:53:24 +03:00
Varun Gupta	dfdfeecb03	MDEV-22851: Engine independent index statistics are incorrect for large tables on Windows An overflow was happening on Windows because on Windows sizeof(ulong) is 4 bytes while it is 8 bytes on Linux. Switched avg_frequency and avg length for column statistics to ulonglong. Switched avg_frequency for index statistics to ulonglong.	2020-07-15 11:27:32 +05:30
Marko Mäkelä	8059148154	Merge 10.3 into 10.4	2020-06-03 07:32:09 +03:00
Marko Mäkelä	8300f639a1	Merge 10.2 into 10.3	2020-06-02 10:25:11 +03:00
Marko Mäkelä	d72eebaa3d	Merge 10.1 into 10.2	2020-06-01 09:33:03 +03:00
Sergey Vojtovich	c279878493	Thread safe histograms loading Previously multiple threads were allowed to load histograms concurrently. There were no known problems caused by this. But given amount of data races in this code, it'd happen sooner or later. To avoid scalability bottleneck, histograms loading is protected by per-TABLE_SHARE atomic variable. Whenever histograms were loaded by preceding statement (hot-path), a scalable load-acquire check is performed. Whenever histograms have to be loaded anew, mutual exclusion for loaders is established by atomic variable. If histograms are being loaded concurrently, statement waits until load is completed. - Table_statistics::total_hist_size moved to TABLE_STATISTICS_CB: only meaningful within TABLE_SHARE (not used for collected stats). - TABLE_STATISTICS_CB::histograms_can_be_read and TABLE_STATISTICS_CB::histograms_are_read are replaced with a tri state atomic variable. - Simplified away alloc_histograms_for_table_share(). Note: there's still likely a data race if a thread attempts accessing histograms data after it failed to load it (because of concurrent load). It was there previously and goes out of the scope of this effort. One way of fixing it could be reviving TABLE::histograms_are_read and adding appropriate checks whenever it is needed. Part of MDEV-19061 - table_share used for reading statistical tables is not protected	2020-05-29 21:53:54 +04:00
Marko Mäkelä	c11e5cdd12	Merge 10.3 into 10.4	2019-10-10 11:19:25 +03:00
Marko Mäkelä	892378fb9d	Merge 10.2 into 10.3	2019-10-09 13:25:11 +03:00
Marko Mäkelä	24232ec12c	Merge 10.1 into 10.2	2019-10-09 08:30:23 +03:00
Sergey Vojtovich	adefaeffcc	MDEV-19536 - Server crash or ASAN heap-use-after-free in is_temporary_table / read_statistics_for_tables_if_needed Regression after `279a907`, read_statistics_for_tables_if_needed() was called after open_normal_and_derived_tables() failure. Fixed by moving read_statistics_for_tables() call to a branch of get_schema_stat_record() where result of open_normal_and_derived_tables() is checked. Removed THD::force_read_stats, added read_statistics_for_tables() instead. Simplified away statistics_for_command_is_needed().	2019-10-07 13:30:22 +04:00


97 Commits