This patch also fixes:
MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
MDEV-33088 Cannot create triggers in the database `MYSQL`
MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0
- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER
- Adding a wrapper function CHARSET_INFO::streq(), to compare
two strings for equality. For now it calls strnncoll() internally.
In the future it will turn into a virtual function.
- Adding new accent sensitive case insensitive collations:
- utf8mb4_general1400_as_ci
- utf8mb3_general1400_as_ci
They implement accent sensitive case insensitive comparison.
The weight of a character is equal to the code point of its
upper case variant. These collations use Unicode-14.0.0 casefolding data.
The result of
my_charset_utf8mb3_general1400_as_ci.strcoll()
is very close to the former
my_charset_utf8mb3_general_ci.strcasecmp()
There is only a difference in a couple dozen rare characters, because:
- the switch from "tolower" to "toupper" comparison, to make
utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
- the switch from Unicode-3.0.0 to Unicode-14.0.0
This difference should be tolarable. See the list of affected
characters in the MDEV description.
Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all BMP characters
as equal.
- Adding classes representing names of the file based database objects:
Lex_ident_db
Lex_ident_table
Lex_ident_trigger
Their comparison collation depends on the underlying
file system case sensitivity and on --lower-case-table-names
and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.
- Adding classes representing names of other database objects,
whose names have case insensitive comparison style,
using my_charset_utf8mb3_general1400_as_ci:
Lex_ident_column
Lex_ident_sys_var
Lex_ident_user_var
Lex_ident_sp_var
Lex_ident_ps
Lex_ident_i_s_table
Lex_ident_window
Lex_ident_func
Lex_ident_partition
Lex_ident_with_element
Lex_ident_rpl_filter
Lex_ident_master_info
Lex_ident_host
Lex_ident_locale
Lex_ident_plugin
Lex_ident_engine
Lex_ident_server
Lex_ident_savepoint
Lex_ident_charset
engine_option_value::Name
- All the mentioned Lex_ident_xxx classes implement a method streq():
if (ident1.streq(ident2))
do_equal();
This method works as a wrapper for CHARSET_INFO::streq().
- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
in class members and in function/method parameters.
- Replacing all calls like
system_charset_info->coll->strcasecmp(ident1, ident2)
to
ident1.streq(ident2)
- Taking advantage of the c++11 user defined literal operator
for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
data types. Use example:
const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;
is now a shorter version of:
const Lex_ident_column primary_key_name=
Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
This patch fixes the issue with passing the DEFAULT or IGNORE values to
positional parameters for some kind of SQL statements to be executed
as prepared statements.
The main idea of the patch is to associate an actual value being passed
by the USING clause with the positional parameter represented by
the Item_param class. Such association must be performed on execution of
UPDATE statement in PS/SP mode. Other corner cases that results in
server crash is on handling CREATE TABLE when positional parameter
placed after the DEFAULT clause or CALL statement and passing either
the value DEFAULT or IGNORE as an actual value for the positional parameter.
This case is fixed by checking whether an error is set in diagnostics
area at the function pack_vcols() on return from the function pack_expression()
use the original, not the truncated, field in the long unique prefix,
that is, in the hash(left(field, length)) expression.
because MyISAM CHECK/REPAIR in compute_vcols() moves table->field
but not prefix fields from keyparts.
Also, implement Field_string::cmp_prefix() for prefix comparison
of CHAR columns to work.
Enable unusable key notes for non-equality predicates:
<, <=, =>, >, BETWEEN, IN, LIKE
Note, in some scenarios it displays duplicate notes, e.g.
for queries with ORDER BY:
SELECT * FROM t1
WHERE indexed_string_column >= 10
ORDER BY indexed_string_column
LIMIT 5;
This should be tolarable. Getting rid of the diplicate note
completely would need a much more complex patch, which is
not desiable in 10.6.
Details:
- Changing RANGE_OPT_PARAM::note_unusable_keys from bool
to a new data type Item_func::Bitmap, so the caller can
choose with a better granuality which predicates
should raise unusable key notes inside the range optimizer:
a. all predicates (=, <=>, <, <=, =>, >, BETWEEN, IN, LIKE)
b. all predicates except equality (=, <=>)
c. none of the predicates
"b." is needed because in some scenarios equality predicates (=, <=>)
send unusable key notes at an earlier stage, before the range optimizer,
during update_ref_and_keys(). Calling the range optimizer with
"all predicates" would produce duplicate notes for = and <=> in such cases.
- Fixing get_quick_record_count() to call the range optimizer
with "all predicates except equality" instead of "none of the predicates".
Before this change the range optimizer suppressed all notes for
non-equality predicates: <, <=, =>, >, BETWEEN, IN, LIKE.
This actually fixes the reported problem.
- Fixing JOIN::make_range_rowid_filters() to call the range optimizer
with "all predicates except equality" instead of "all predicates".
Before this change the range optimizer produced duplicate notes
for = and <=> during a rowid_filter optimization.
- Cleanup:
Adding the op_collation argument to Field::raise_note_cannot_use_key_part()
and displaying the operation collation rather than the argument collation
in the unusable key note. This is important for operations with more than
two arguments: BETWEEN and IN, e.g.:
SELECT * FROM t1
WHERE column_utf8mb3_general_ci
BETWEEN 'a' AND 'b' COLLATE utf8mb3_unicode_ci;
SELECT * FROM t1
WHERE column_utf8mb3_general_ci
IN ('a', 'b' COLLATE utf8mb3_unicode_ci);
The note for 'a' now prints utf8mb3_unicode_ci as the collation.
which is the collation of the entire operation:
Cannot use key key1 part[0] for lookup:
"`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
"'a'" of collation `utf8mb3_unicode_ci`
Before this change it printed the collation of 'a',
so the note was confusing:
Cannot use key key1 part[0] for lookup:
"`column_utf8mb3_general_ci`" of collation `utf8mb3_general_ci` >=
"'a'" of collation `utf8mb3_general_ci`"
(Variant#3: Allow cross-charset comparisons, use a special
CHARSET_INFO to create lookup keys. Review input addressed.)
Equalities that compare utf8mb{3,4}_general_ci strings, like:
WHERE ... utf8mb3_key_col=utf8mb4_value (MB3-4-CMP)
can now be used to construct ref[const] access and also participate
in multiple-equalities.
This means that utf8mb3_key_col can be used for key-lookups when
compared with an utf8mb4 constant, field or expression using '=' or
'<=>' comparison operators.
This is controlled by optimizer_switch='cset_narrowing=on', which is
OFF by default.
IMPLEMENTATION
Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci.
This is valid as any utf8mb3 value is also an utf8mb4 value.
When making index lookup value for utf8mb3_key_col, we do "Charset
Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are
copied as-is, as they can be represented in utf8mb3. Characters that are
outside the BMP cannot be represented in utf8mb3 and are replaced
with U+FFFD, the "Replacement Character".
In utf8mb4_general_ci, the Replacement Character compares as equal to any
character that's not in BMP. Because of this, the constructed lookup value
will find all index records that would be considered equal by the original
condition (MB3-4-CMP).
Approved-by: Monty <monty@mariadb.org>
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
The crash inside my_vsnprintf_utf32() happened correctly,
because the caller methods:
Field_string::sql_rpl_type()
Field_varstring::sql_rpl_type()
mis-used the charset library and sent pure ASCII data to the
virtual function snprintf() of a utf32 CHARSET_INFO.
It was wrong to use Field::charset() in sql_rpl_type().
We're printing the metadata (the data type) here, not the column data.
The string contraining the data type of a CHAR/VARCHAR column
is a pure ASCII string.
Fixing to use res->charset() to print, like all virtual implementations
of sql_type() do.
Review was done by Andrei Elkin.
Thanks to Andrei for proposing MTR test improvents.
There where several reasons why the test failed:
- Constructors for Field_double and Field_float changed an argument
to the constructor instead of a the correct class variable.
- gcc 7.5.0 produced wrong code when inlining Field_double constructor
into Field_test_double constructor.
Fixed by changing the correct class variable and make the constructors
not inline to go around the gcc bug.
Raise notes if indexes cannot be used:
- in case of data type or collation mismatch (diferent error messages).
- in case if a table field was replaced to something else
(e.g. Item_func_conv_charset) during a condition rewrite.
Added option to write warnings and notes to the slow query log for
slow queries.
New variables added/changed:
- note_verbosity, with is a set of the following options:
basic - All old notes
unusable_keys - Print warnings about keys that cannot be used
for select, delete or update.
explain - Print unusable_keys warnings for EXPLAIN querys.
The default is 'basic,explain'. This means that for old installations
the only notable new behavior is that one will get notes about
unusable keys when one does an EXPLAIN for a query. One can turn all
of all notes by either setting note_verbosity to "" or setting sql_notes=0.
- log_slow_verbosity has a new option 'warnings'. If this is set
then warnings and notes generated are printed in the slow query log
(up to log_slow_max_warnings times per statement).
- log_slow_max_warnings - Max number of warnings written to
slow query log.
Other things:
- One can now use =ALL for any 'set' variable to set all options at once.
For example using "note_verbosity=ALL" in a config file or
"SET @@note_verbosity=ALL' in SQL.
- mysqldump will in the future use @@note_verbosity=""' instead of
@sql_notes=0 to disable notes.
- Added "enum class Data_type_compatibility" and changing the return type
of all Field::can_optimize*() methods from "bool" to this new data type.
Reviewer & Co-author: Alexander Barkov <bar@mariadb.com>
- The code that prints out the notes comes mainly from Alexander
remove old deprecation helpers that were not used anywhere.
create new deprecation helpers and enforce their usage
this also removes inconsistencies in reporting deprecation:
sometimes it was ER_WARN_DEPRECATED_SYNTAX (1287),
sometimes ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT (1681),
sometimes a warning, sometimes a note.
it should always be
* ER_WARN_DEPRECATED_SYNTAX
* a warning (because it's something actionable, not purely informational)
There are two functions to extract a Field::val_str() value
as a LEX_STRING or LEX_CSTRING pointing to the data allocated on a MEM_ROOT:
char *get_field(MEM_ROOT *mem, Field *field);
bool get_field(MEM_ROOT *mem, Field *field, class String *res);
The first function requires strlen() calls to make a LEX_CSTRING/LEX_STRING.
The second function requires a redundant String buffer,
which is used only as a temporary proxy value pointing to a MEM_ROOT fragment
(and does not use any String dynamic allocation methods).
This patch add a native way to extract a Field::val_str() value
as a LEX_STRING or LEX_CSTRING pointing to a MEM_ROOT fragment.
It helps to remove redundant strlen() calls and redundant String buffers.
- Adding a new method:
LEX_STRING Field::val_lex_string_strmake(MEM_ROOT *mem);
- Reusing the new method Field::val_lex_string_strmake() in;
bool get_field(MEM_ROOT *mem, Field *field, String *res);
Also, moving it from table.cc to a static function in sql_help.cc.
It is used in sql_help.cc only, and we don't want it to be reused
in other parts of the code (to avoid redundant String buffers).
- Reusing the new method Field::val_lex_string_strmake() in this function:
char *get_field(MEM_ROOT *mem, Field *field);
- Replacing get_field() to Field::val_lex_string_strmake() in these files:
sql_plugin.cc (redundant String buffers were removed)
sql_udf.cc (redundant strlen() calls were removed)
Note, this function:
char *get_field(MEM_ROOT *mem, Field *field);
is still used in a number of files:
event_data_objects.cc
event_db_repository.cc
sql_acl.cc
sql_servers.cc
These remaining calls will be removed by separate patches,
and get_field() will be removed after that.
This commits enables reloading of engine-independent statistics
without flushing the table from table definition cache.
This is achieved by allowing multiple version of the
TABLE_STATISTICS_CB object and having independent pointers to it in
TABLE and TABLE_SHARE. The TABLE_STATISTICS_CB object have reference
pointers and are freed when no one is pointing to it anymore.
TABLE's TABLE_STATISTICS_CB pointer is updated to use the
TABLE_SHARE's pointer when read_statistics_for_tables() is called at
the beginning of a query.
Main changes:
- read_statistics_for_table() will allocate an new TABLE_STATISTICS_CB
object.
- All get_stat_values() functions has a new parameter that tells
where collected data should be stored. get_stat_values() are not
using the table_field object anymore to store data.
- All get_stat_values() functions returns 1 if they found any
data in the statistics tables.
Other things:
- Fixed INSERT DELAYED to not read statistics tables.
- Removed Statistics_state from TABLE_STATISTICS_CB as this is not
needed anymore as wer are not changing TABLE_SHARE->stats_cb while
calculating or loading statistics.
- Store values used with store_from_statistical_minmax_field() in
TABLE_STATISTICS_CB::mem_root. This allowed me to remove the function
delete_stat_values_for_table_share().
- Field_blob::store_from_statistical_minmax_field() is implemented
but is not normally used as we do not yet support EIS statistics
for blobs. For example Field_blob::update_min() and
Field_blob::update_max() are not implemented.
Note that the function can be called if there is an concurrent
"ALTER TABLE MODIFY field BLOB" running because of a bug in
ALTER TABLE where it deletes entries from column_stats
before it has an exclusive lock on the table.
- Use result of field->val_str(&val) as a pointer to the result
instead of val (safetly fix).
- Allocate memory for collected statistics in THD::mem_root, not in
in TABLE::mem_root. This could cause the TABLE object to grow if a
ANALYZE TABLE was run many times on the same table.
This was done in allocate_statistics_for_table(),
create_min_max_statistical_fields_for_table() and
create_min_max_statistical_fields_for_table_share().
- Store in TABLE_STATISTICS_CB::stats_available which statistics was
found in the statistics tables.
- Removed index_table from class Index_prefix_calc as it was not used.
- Added TABLE_SHARE::LOCK_statistics to ensure we don't load EITS
in parallel. First thread will load it, others will reuse the
loaded data.
- Eliminate read_histograms_for_table(). The loading happens within
read_statistics_for_tables() if histograms are needed.
One downside is that if we have read statistics without histograms
before and someone requires histograms, we have to read all statistics
again (once) from the statistics tables.
A smaller downside is the need to call alloc_root() for each
individual histogram. Before we could allocate all the space for
histograms with a single alloc_root.
- Fixed bug in MyISAM and Aria where they did not properly notice
that table had changed after analyze table. This was not a problem
before this patch as then the MyISAM and Aria tables where flushed
as part of ANALYZE table which did hide this issue.
- Fixed a bug in ANALYZE table where table->records could be seen as 0
in collect_statistics_for_table(). The effect of this unlikely bug
was that a full table scan could be done even if
analyze_sample_percentage was not set to 1.
- Changed multiple mallocs in a row to use multi_alloc_root().
- Added a mutex protection in update_statistics_for_table() to ensure
that several tables are not updating the statistics at the same time.
Some of the changes in sql_statistics.cc are based on a patch from
Oleg Smirnov <olernov@gmail.com>
Co-authored-by: Oleg Smirnov <olernov@gmail.com>
Co-authored-by: Vicentiu Ciorbaru <cvicentiu@gmail.com>
Reviewer: Sergei Petrunia <sergey@mariadb.com>