The failing reason was inconsistent truncation rules: the value of virtual
column could have been evaluated to '2000' sometimes instead of '0000' for
value 'a'.
The reason why `c YEAR AS ('aaaa')` was not evaluated same is that len=4 is
a special case insidew Field_year::store.
The correct fix is: always evaluate a bad value to 0000 instead 2000.
The truncated values should be evaluated as usual.
$support_virtual_index is finally changed to 1 in gcol.gcol_ins_upd_innodb,
which is also enough for testing.
The test from original bug report is also added.
also avoid an oxymoron of using `MYSQL_PLUGIN_IMPORT` under
`#ifdef MYSQL_SERVER`, and empty_clex_str is so trivial that a plugin
can define it if needed.
- Removed Tokudb (no need to test this anymore with valgrind)
- Added __attribute__(unused)) to a few places to be able to compile even
if valgrind/memcheck.h is not installed.
Reviewer: Marko Mäkelä <marko.makela@mariadb.com>
This change removed 68 explict strlen() calls from the code.
The following renames was done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print function could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost everything where mechanical changes except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There where compiler issues with ensuring that all character set names
points to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers works fine).
To get around this, I declared defines for each character set name
length.
Changes:
- To detect automatic strlen() I removed the methods in String that
uses 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usage of append(const char*) is changed to either use
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed for some functions the name argument from const char * to
to const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removes String_space() as it was only used in one cases and that
could be simplified to not use String_space(), thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done to both clarify the difference between this 'store' function
and also to make it easier to find unoptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now used LEX_CSTRING for name.
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtain but was not used.
- Removed some not needed 'virtual' definition for functions that was
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
case memory overruns.
- Simplified some loops when adding char * to a String with delimiters.
- Changed order of class fields to remove dead alignment space.
- Changed bool fields in Item to bit fields.
- Used packed enum's for some fields in common classes
- Removed not used Item::rsize.
- Changed some class variables from uint/int to smaller type int's.
- Ensured that field_index is uint16 in all classes and functions. Fixed
also that we proparly compare with NO_CACHED_FIELD_INDEX when checking
if variable is not set.
- Removed checking of highest bit of unireg_check (has not been used in
a long time)
- Fixed wrong arguments to make_cond_for_table() for join_tab_idx_arg
from false to 0.
One of the result was reducing the size if class Item with ~24 bytes
aspects of decimals and integers
For fields and Item's uint8 should be good enough. After
discussions with Alexander Barkov we choose uint16 (for now)
as some format functions may accept +256 digits.
The reason for this patch was to make the usage and storage of decimal
digits simlar. Before this patch decimals was stored/used as uint8,
int and uint. The lengths for numbers where also using a lot of
different types.
Changed most decimal variables and functions to use the new typedef.
squash! af7f09106b6c1dc20ae8c480bff6fd22d266b184
Use decimal_digits_t for all aspects of digits (total, precision
and scale), both for decimals and integers.
Problem:
The problem happened because of a conceptual flaw in the server code:
a. The table level CHARSET/COLLATE clause affected all data types,
including numeric and temporal ones:
CREATE TABLE t1 (a INT) CHARACTER SET utf8 [COLLATE utf8_general_ci];
In the above example, the Column_definition_attributes
(and then the FRM record) for the column "a" erroneously inherited
"utf8" as its character set.
b. The "ALTER TABLE t1 CONVERT TO CHARACTER SET csname" statement
also erroneously affected Column_definition_attributes::charset
for numeric and temporal data types and wrote "csname" as their
character set into FRM files.
So now we have arbitrary non-relevant charset ID values for numeric
and temporal data types in all FRM files in the world :)
The code in the server and the other engines did not seem to be affected
by this flaw. Only InnoDB inplace ALTER was affected.
Solution:
Fixing the code in the way that only character string data types
(CHAR,VARCHAR,TEXT,ENUM,SET):
- inherit the table level CHARSET/COLLATE clause
- get the charset value according to "CONVERT TO CHARACTER SET csname".
Numeric and temporal data types now always get &my_charset_numeric
in Column_definition_attributes::charset and always write its ID into FRM files:
- no matter what the table level CHARSET/COLLATE clause is, and
- no matter what "CONVERT TO CHARACTER SET" says.
Details:
1. Adding helper classes to pass small parts of HA_CREATE_INFO
into Type_handler methods:
- Column_derived_attributes - to pass table level CHARSET/COLLATE,
so columns that do not have explicit CHARSET/COLLATE clauses
can derive them from the table level, e.g.
CREATE TABLE t1 (a VARCHAR(1), b CHAR(1)) CHARACTER SET utf8;
- Column_bulk_alter_attributes - to pass bulk attribute changes
generated by the ALTER related code. These bulk changes affect
multiple columns at the same time:
ALTER TABLE ... CONVERT TO CHARACTER SET csname;
Note, passing the whole HA_CREATE_INFO directly to Type_handler
would not be good: HA_CREATE_INFO is huge and would need not desired
dependencies in sql_type.h and sql_type.cc. The Type_handler API should
use smallest possible data types!
2. Type_handler::Column_definition_prepare_stage1() is now responsible
to set Column_definition::charset properly, according to the data type,
for example:
- For string data types, Column_definition_attributes::charset is set from
the table level CHARSET/COLLATE clause (if not specified explicitly in
the column definition).
- For numeric and temporal fields, Column_definition_attributes::charset is
set to &my_charset_numeric, no matter what the table level
CHARSET/COLLATE says.
- For GEOMETRY, Column_definition_attributes::charset is set to
&my_charset_bin, no matter what the table level CHARSET/COLLATE says.
Previously this code (setting `charset`) was outside of of
Column_definition_prepare_stage1(), namely in
mysql_prepare_create_table(), and was erroneously called for
all data types.
3. Adding Type_handler::Column_definition_bulk_alter(), to handle
"ALTER TABLE .. CONVERT TO". Previously this code was inside
get_sql_field_charset() and was erroneously called for all data types.
4. Removing the Schema_specification_st parameter from
Type_handler::Column_definition_redefine_stage1().
Column_definition_attributes::charset is now fully properly initialized by
Column_definition_prepare_stage1(). So we don't need access to the
table level CHARSET/COLLATE clause in Column_definition_redefine_stage1()
any more.
5. Other changes:
- Removing global function get_sql_field_charset()
- Moving the part of the former get_sql_field_charset(), which was
responsible to inherit the table level CHARSET/COLLATE clause to
new methods:
-- Column_definition_attributes::explicit_or_derived_charset() and
-- Column_definition::prepare_charset_for_string().
This code is only needed for string data types.
Previously it was erroneously called for all data types.
- Moving another part, which was responsible to apply the
"CONVERT TO" clause, to
Type_handler_general_purpose_string::Column_definition_bulk_alter().
- Replacing the call for get_sql_field_charset() in sql_partition.cc
to sql_field->explicit_or_derived_charset() - it is perfectly enough.
The old code was redundant: get_sql_field_charset() was called from
sql_partition.cc only when there were no a "CONVERT TO CHARACTER SET"
clause involved, so its purpose was only to inherit the table
level CHARSET/COLLATE clause.
- Moving the code handling the BINCMP_FLAG flag from
mysql_prepare_create_table() to
Column_definition::prepare_charset_for_string():
This code is responsible to resolve the BINARY comparison style
into the corresponding _bin collation, to do the following transparent
rewrite:
CREATE TABLE t1 (a VARCHAR(10) BINARY) CHARSET utf8; ->
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
This code is only needed for string data types.
Previously it was erroneously called for all data types.
6. Renaming Table_scope_and_contents_source_pod_st::table_charset
to alter_table_convert_to_charset, because the only purpose it's used for
is handlering "ALTER .. CONVERT". The new name is much more self-descriptive.
HAVE_valgrind_or_MSAN to HAVE_valgrind was incorrect in
af784385b4a2b286000fa2c658e34283fe7bba60.
In my_valgrind.h when clang exists (hence no __has_feature(memory_sanitizer),
and -DWITH_VALGRIND=1, but without memcheck.h, we end up with a MEM_CHECK_DEFINED
being empty.
If we are also doing a CMAKE_BUILD_TYPE=Debug this results a number of
[-Werror,-Wunused-variable] errors because MEM_CHECK_DEFINED is empty.
With MEM_CHECK_DEFINED empty, there becomes no uses of this of the
fixed field and innodb variables in this patch.
So we stop using HAVE_valgrind as catchall and use the name
HAVE_CHECK_MEM to indicate that a CHECK_MEM_DEFINED function exists.
Reviewer: Monty
Corrects: af784385b4a2b286000fa2c658e34283fe7bba60
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
The assertion failed in handler::ha_reset upon SELECT under
READ UNCOMMITTED from table with index on virtual column.
This was the debug-only failure, though the problem is mush wider:
* MY_BITMAP is a structure containing my_bitmap_map, the latter is a raw
bitmap.
* read_set, write_set and vcol_set of TABLE are the pointers to MY_BITMAP
* The rest of MY_BITMAPs are stored in TABLE and TABLE_SHARE
* The pointers to the stored MY_BITMAPs, like orig_read_set etc, and
sometimes all_set and tmp_set, are assigned to the pointers.
* Sometimes tmp_use_all_columns is used to substitute the raw bitmap
directly with all_set.bitmap
* Sometimes even bitmaps are directly modified, like in
TABLE::update_virtual_field(): bitmap_clear_all(&tmp_set) is called.
The last three bullets in the list, when used together (which is mostly
always) make the program flow cumbersome and impossible to follow,
notwithstanding the errors they cause, like this MDEV-17556, where tmp_set
pointer was assigned to read_set, write_set and vcol_set, then its bitmap
was substituted with all_set.bitmap by dbug_tmp_use_all_columns() call,
and then bitmap_clear_all(&tmp_set) was applied to all this.
To untangle this knot, the rule should be applied:
* Never substitute bitmaps! This patch is about this.
orig_*, all_set bitmaps are never substituted already.
This patch changes the following function prototypes:
* tmp_use_all_columns, dbug_tmp_use_all_columns
to accept MY_BITMAP** and to return MY_BITMAP * instead of my_bitmap_map*
* tmp_restore_column_map, dbug_tmp_restore_column_maps to accept
MY_BITMAP* instead of my_bitmap_map*
These functions now will substitute read_set/write_set/vcol_set directly,
and won't touch underlying bitmaps.
This follows up commit
commit 94a520ddbe39ae97de1135d98699cf2674e6b77e and
commit 7c5519c12d46ead947d341cbdcbb6fbbe4d4fe1b.
After these changes, the default test suites on a
cmake -DWITH_UBSAN=ON build no longer fail due to passing
null pointers as parameters that are declared to never be null,
but plenty of other runtime errors remain.
Problem:
Queries like this showed performance degratation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and make it even better in some cases.
See benchmark results in MDEV.
Also, this change makes futher separations of Date, Time, Datetime
from each other, which will make it possible not to derive them from
a too heavy (40 bytes) MYSQL_TIME, and replace them to smaller data
type specific containers.
Implementing methods:
- Field::val_time_packed()
- Field::val_datetime_packed()
- Item_field::val_datetime_packed(THD *thd);
- Item_field::val_time_packed(THD *thd);
to give a faster access to temporal packed longlong representation of a Field,
which is used in temporal Arg_comparator's to DATE, TIME, DATETIME data types.
The same idea is used in MySQL-5.6+.
This improves performance.
Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)
An alternative implementation (replacing the one based on repertoire).
This implementation makes Field send itself to Protocol_text using
data type specific Protocol methods rather than field->val_str()
followed by protocol_text->store_str().
As now Field sends itself in the same way to all protocol types
(e.g. Protocol_binary, Protocol_text, Protocol_local),
the method Field::send_binary() was renamed just to Field::send().
Note, this change introduces symmetry between Field and Item,
because Items also send themself using a single method Item::send(),
which is used for *all* protocol types.
Performance improvement is achieved by the fact that Protocol_text
implements these data type specific methods using store_numeric_string_aux()
rather than store_string_aux(). The conversion now happens only when
character_set_results is not ASCII compatible character sets
(e.g. UCS2, UTF16, UTF32).
In the old code (before any MDEV-23162 work, e.g. as of 10.5.4),
Protocol_text::store(Field*) used val_str() for all data types.
So the execution went through the character set conversion routines
even for numeric and temporal data types.
Benchmarking summary (see details in MDEV-23478):
The new approach stably demonstrates additional improvement comparing
to the previous implementation (the smaller time - the better):
Original - the commit before MDEV-23162
be98036f25ac8cfb34fa5bb5066975d79f595aec
1m9.336s
1m9.290s
1m9.300s
MDEV-23162 - the repertoire optimization
1m6.101s
1m5.988s
1m6.264s
MDEV-23478 - this commit
1m2.150s
1m2.079s
1m2.099s
and
MDEV-23414 Assertion `res->charset() == item->collation.collation' failed in Type_handler_string_result::make_packed_sort_key_part
pack_sort_string() *must* take a collation from the Item, not from the
String value. Because when casting a string to _binary the original
String is not copied for performance reasons, it's reused but its
collation does not match Item's collation anymore.
Note, that String's collation cannot be simply changed to _binary,
because for an Item_string literal the original String must stay
unchanged for the duration of the query.
this partially reverts 61c15ebe323
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.