Discovered this while working on MDEV-34720: test_if_cheaper_ordering()
uses rec_per_key, while the original estimate for the access method
is produced in best_access_path() by using actual_rec_per_key().
Make test_if_cheaper_ordering() also use actual_rec_per_key().
Also make several getter functions "const" to make this compile.
Also adjusted the testcase to handle this (the change is backported from
11.0)
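For illustration only, a hedged sketch of the substitution (assuming a
KEY *keyinfo and key part index kp in scope; this is not the actual diff):
    double refs= keyinfo->actual_rec_per_key(kp);  // was: keyinfo->rec_per_key[kp]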
This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-case-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolerable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all non-BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
with
ident1.streq(ident2)
- Taking advantage of the C++11 user-defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Usage example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
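For reference, such a literal operator could look roughly like this (a hedged
sketch; the real operators live in lex_ident.h / m_strings.h and may use a
different constructor):
    Lex_ident_column operator"" _Lex_ident_column(const char *str, size_t length)
    {
      // Illustrative only: wrap the character data in a LEX_CSTRING
      return Lex_ident_column(LEX_CSTRING{str, length});
    }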
Other usage of persistent statistics is checking 'stats_is_read' in the
caller, which is why this was not noticed earlier.
Other things:
- Simplified no_stat_values_provided
This was introduced in the last merge with 10.6.
The reason is that 10.6 does not need anything special to free histograms
as everything is allocated on a memroot.
In 10.10 histograms are using the vector class, which has some problems:
- No automatic free
- No memory usage accounting
(we should at some point remove vector usage because of the above problem)
Fixed by explicitly freeing histograms when freeing TABLE_STATISTICS
objects.
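A rough sketch of the explicit cleanup (the member and type names here are
illustrative, not the actual ones):
    void free_statistics(TABLE_STATISTICS *stats)
    {
      // The vector does not free its elements, so delete each histogram
      // explicitly before the statistics object itself goes away.
      for (Histogram_base *h : stats->histograms)   // hypothetical member
        delete h;
      stats->histograms.clear();
    }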
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for
a single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled tests:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
Example of what causes the problem:
T1: ANALYZE TABLE starts to collect statistics
T2: ALTER TABLE starts by deleting statistics for all changed fields,
then creates a temp table and copies data to it.
T1: ANALYZE ends and writes to the statistics tables.
T2: ALTER TABLE renames temp table in place of the old table.
Now the statistics from ANALYZE match the old, deleted table.
Fixed by waiting to delete old statistics until ALTER TABLE is
the only one using the old table, and ensuring that rename of columns
can handle swapping of column names.
rename_columns_in_stat_table() (former rename_column_in_stat_tables())
now takes a list of columns to rename. It uses the following algorithm
to update column_stats, so that it can handle circular renames:
- While there are columns to be renamed, and this is the first loop or the
  last rename loop changed something:
  - Loop over all columns to be renamed
    - Change the column name in column_stat
      - If this fails because of a duplicate key
        - If this is the first change attempt for this column
          - Change the column name to a temporary column name
          - If there was a conflicting row, replace it with the current row.
        else
          - Remove the entry from the column list
- Loop over all remaining columns in the list
  - Remove the conflicting row
  - Change the column from its temporary name to the final name in column_stat
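A simplified standalone sketch of the temporary-name idea (a std::map stands
in for the column_stats table; the real code also retries failed renames and
handles errors as described above):
    #include <map>
    #include <string>
    #include <vector>

    struct Rename { std::string from, to; };

    // Apply renames so that circular renames (a->b, b->a) work by parking
    // conflicting rows under temporary names first.
    static void rename_columns(std::map<std::string, int> &column_stats,
                               const std::vector<Rename> &renames)
    {
      std::vector<Rename> parked;                    // rows moved to temp names
      for (const Rename &r : renames)
      {
        auto it= column_stats.find(r.from);
        if (it == column_stats.end())
          continue;                                  // nothing to rename
        if (!column_stats.count(r.to))
        {                                            // no duplicate key: rename
          column_stats[r.to]= it->second;
          column_stats.erase(it);
        }
        else
        {                                            // duplicate key: temp name
          std::string tmp= "#tmp#" + r.to;
          column_stats[tmp]= it->second;
          column_stats.erase(it);
          parked.push_back({tmp, r.to});
        }
      }
      for (const Rename &r : parked)
      {
        column_stats.erase(r.to);                    // remove conflicting row
        column_stats[r.to]= column_stats[r.from];    // temp name -> final name
        column_stats.erase(r.from);
      }
    }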
Other things:
- Don't flush tables for every operation. Only flush when all updates
are done.
- Rename of columns was not handled in case of ALGORITHM=copy (old bug).
- Fixed that we do not collect statistics for hidden hash columns
used by UNIQUE constraint on long values.
- Fixed that we do not collect statistics for blob columns referred to by
generated virtual columns. This was achieved by storing the fields for
which we want to have statistics in table->has_value_set instead of
in table->read_set.
- Rename of indexes was not handled for persistent statistics.
- This is now handled similarly to the rename of columns. Renamed indexes
are now stored in 'rename_stat_indexes' and handled in
Alter_info::delete_statistics() together with dropped indexes.
- ALTER TABLE .. ADD INDEX may, instead of creating a new index, rename
an existing generated foreign key index. This was not reflected in
the index_stats table because it was handled in
mysql_prepare_create_table() instead of in the mysql_alter() code.
Fixed by adding a call in mysql_prepare_create_table() to drop the
changed index.
I also had to replace the code that 'marked the index' to be ignored
with code that would not destroy the original index name.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
This commit enables reloading of engine-independent statistics
without flushing the table from the table definition cache.
This is achieved by allowing multiple versions of the
TABLE_STATISTICS_CB object and having independent pointers to it in
TABLE and TABLE_SHARE. The TABLE_STATISTICS_CB objects are reference
counted and are freed when no one points to them anymore.
TABLE's TABLE_STATISTICS_CB pointer is updated to use the
TABLE_SHARE's pointer when read_statistics_for_tables() is called at
the beginning of a query.
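The ownership model, roughly (a simplified sketch with assumed member names,
not the actual declarations):
    #include <atomic>

    /* TABLE_SHARE and each TABLE hold a pointer to a TABLE_STATISTICS_CB;
       the object stays alive while anyone still points to it. */
    struct TABLE_STATISTICS_CB
    {
      std::atomic<unsigned> usage_count{1};     /* created with one user */
      MEM_ROOT mem_root;                        /* collected statistics */

      ~TABLE_STATISTICS_CB() { free_root(&mem_root, MYF(0)); }
      void acquire() { usage_count.fetch_add(1); }
      void release()
      {
        if (usage_count.fetch_sub(1) == 1)      /* last reference gone */
          delete this;
      }
    };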
Main changes:
- read_statistics_for_table() will allocate a new TABLE_STATISTICS_CB
object.
- All get_stat_values() functions have a new parameter that tells
where the collected data should be stored. get_stat_values() no longer
uses the table_field object to store data.
- All get_stat_values() functions return 1 if they found any
data in the statistics tables.
Other things:
- Fixed INSERT DELAYED to not read statistics tables.
- Removed Statistics_state from TABLE_STATISTICS_CB as this is not
needed anymore, since we are not changing TABLE_SHARE->stats_cb while
calculating or loading statistics.
- Store values used with store_from_statistical_minmax_field() in
TABLE_STATISTICS_CB::mem_root. This allowed me to remove the function
delete_stat_values_for_table_share().
- Field_blob::store_from_statistical_minmax_field() is implemented
but is not normally used as we do not yet support EIS statistics
for blobs. For example Field_blob::update_min() and
Field_blob::update_max() are not implemented.
Note that the function can be called if there is a concurrent
"ALTER TABLE MODIFY field BLOB" running because of a bug in
ALTER TABLE where it deletes entries from column_stats
before it has an exclusive lock on the table.
- Use the result of field->val_str(&val) as a pointer to the result
instead of val (safety fix).
- Allocate memory for collected statistics in THD::mem_root, not in
TABLE::mem_root. The latter could cause the TABLE object to grow if
ANALYZE TABLE was run many times on the same table.
This was done in allocate_statistics_for_table(),
create_min_max_statistical_fields_for_table() and
create_min_max_statistical_fields_for_table_share().
- Store in TABLE_STATISTICS_CB::stats_available which statistics were
found in the statistics tables.
- Removed index_table from class Index_prefix_calc as it was not used.
- Added TABLE_SHARE::LOCK_statistics to ensure we don't load EITS
in parallel. The first thread will load it; others will reuse the
loaded data.
- Eliminate read_histograms_for_table(). The loading happens within
read_statistics_for_tables() if histograms are needed.
One downside is that if we have read statistics without histograms
before and someone requires histograms, we have to read all statistics
again (once) from the statistics tables.
A smaller downside is the need to call alloc_root() for each
individual histogram. Before we could allocate all the space for
histograms with a single alloc_root.
- Fixed a bug in MyISAM and Aria where they did not properly notice
that the table had changed after ANALYZE TABLE. This was not a problem
before this patch, as the MyISAM and Aria tables were then flushed
as part of ANALYZE TABLE, which hid this issue.
- Fixed a bug in ANALYZE TABLE where table->records could be seen as 0
in collect_statistics_for_table(). The effect of this unlikely bug
was that a full table scan could be done even if
analyze_sample_percentage was not set to 1.
- Changed multiple mallocs in a row to use multi_alloc_root() (see the
sketch after this list).
- Added a mutex protection in update_statistics_for_table() to ensure
that several tables are not updating the statistics at the same time.
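The multi_alloc_root() change mentioned above follows this pattern (a hedged
example with made-up variable names; arguments are (pointer, length) pairs
terminated by NullS, and the call returns NULL on out-of-memory):
    Column_statistics *stats;
    uchar *min_buf, *max_buf;
    if (!multi_alloc_root(thd->mem_root,
                          &stats, sizeof(Column_statistics),
                          &min_buf, field->pack_length(),
                          &max_buf, field->pack_length(),
                          NullS))
      return 1;                                 /* out of memory */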
Some of the changes in sql_statistics.cc are based on a patch from
Oleg Smirnov <olernov@gmail.com>
Co-authored-by: Oleg Smirnov <olernov@gmail.com>
Co-authored-by: Vicentiu Ciorbaru <cvicentiu@gmail.com>
Reviewer: Sergei Petrunia <sergey@mariadb.com>
- agressively -> aggressively
- exising -> existing
- occured -> occurred
- releated -> related
- seperated -> separated
- sucess -> success
- use use -> use
All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
The previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents were parsed again and again.
Switch to a lower-level parsing API which allows the parsing
to be done in an efficient way.
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector.
Change from Histogram_builder and its descendant Histogram_builder_json
to Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.
In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
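The resulting class structure is approximately as follows (an illustrative
outline; the method names are assumptions, not the real declarations):
    class Histogram_builder                      // the interface
    {
    public:
      virtual ~Histogram_builder() {}
      // Called for each group of equal values found while scanning the column
      virtual int next(void *elem, unsigned long long elem_cnt)= 0;
      // Called once at the end, e.g. so the JSON builder can emit the right
      // bound of the right-most bucket
      virtual void finalize()= 0;
    };
    class Histogram_binary_builder : public Histogram_builder { /* ... */ };
    class Histogram_json_builder   : public Histogram_builder { /* ... */ };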
* It also adds an "explain select" statement to the test so that the fprintf calls
can print the computed intervals to mysqld.1.err.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the wrong calculation of avg_frequency in JSON histograms
by replacing the specific histogram objects with the generic Histogram_base class.
It also restores the get/set size functions, as they were useful in calculating fields
for the binary histogram.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
A demo of how to use an in-memory data structure for a histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.
grep for GSOC-TODO for notes.
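For the last point, the fraction can be computed along these lines once the
values are converted to a comparable double position (a sketch under that
assumption, not the patch itself):
    #include <algorithm>

    // Fraction of the [min_val, max_val] domain covered by the query range
    // [X, Y]; usable as a rough selectivity estimate.
    static double range_fraction(double min_val, double max_val,
                                 double X, double Y)
    {
      if (max_val <= min_val)
        return 1.0;                    // degenerate or single-value domain
      double lo= std::max(X, min_val);
      double hi= std::min(Y, max_val);
      if (hi <= lo)
        return 0.0;                    // no overlap with the domain
      return (hi - lo) / (max_val - min_val);
    }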
Preparation for handling different kinds of histograms:
- In Column_statistics, change "Histogram histogram" into
"Histogram *histogram_". This allows for different kinds
of Histogram classes with virtual functions.
- [Almost] remove the usage of Histogram->set_values and
Histogram->set_size. The code outside the histogram should
not make any assumptions about what/how is stored in the Histogram.
- Introduce drafts of methods to read/save histograms to/from disk.
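A rough shape of that interface (hedged sketch; the method names are
assumptions standing in for the read/save drafts and the estimation calls):
    class Histogram
    {
    public:
      virtual ~Histogram() {}
      // Draft read/save methods ("to/from disk" above)
      virtual bool parse(MEM_ROOT *mem_root, const char *data, size_t size)= 0;
      virtual void serialize(String *to) const= 0;
      // Estimation entry point; callers no longer use set_values()/set_size()
      // or assume anything about the internal representation
      virtual double range_selectivity(double min_pos, double max_pos) const= 0;
    };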
This fixes the memory allocation for the JSON histogram builder and adds more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
The issue here was that histogram statistics were being used even when
the level of optimizer_use_condition_selectivity does not allow
usage of statistics from histograms.
The histogram statistics are read for a table only when
optimizer_use_condition_selectivity > 3. But the TABLE structure can be
stored in the internal table cache and be reused for the next query.
So in this case the histogram statistics will be available for the next query.
The fix is to make sure that the histogram statistics are used only when
optimizer_use_condition_selectivity > 3.
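A minimal sketch of the guard (not the exact patch):
    // Consult histogram data only when the session variable allows it
    // (levels 4 and above).
    static bool histograms_usable(const THD *thd)
    {
      return thd->variables.optimizer_use_condition_selectivity > 3;
    }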