Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)
Type_handler_temporal_result::Item_func_min_max_fix_attributes()
in an expression GREATEST(string,date), e.g:
SELECT GREATEST('1', CAST('2020-12-12' AS DATE));
incorrectly evaluated decimals as 6 (like for DATETIME).
Adding a separate virtual implementation:
Type_handler_date_common::Item_func_min_max_fix_attributes()
This makes the code simpler.
Changing that in case of *INT and hex hybrid input:
- ROUND(x,NULL) creates a column with the same type as x.
The old code created a DOUBLE column, which was not relevant at all.
This change simplifies the code a lot.
- ROUND(x,non_constant) creates a column of the INT, BIGINT or DECIMAL
data type (depending on the exact type of x).
The old code created a column of the DOUBLE data type,
which lead to precision loss. Hence MDEV-23366.
- ROUND(bigint_30,negative_constant) creates a column of the DECIMAL(30,0)
data type. The old code created DECIMAL(29,0), which looked strange:
the data type promoted to a higher one, but max length reduced.
Now the length attribute is preserved.
Item_func_round::fix_arg_int() did not take into account cases
when the result of ROUND(bigint_subject,negative_precision)
could go outside of the BIGINT range. The old code only incremented
max_length, but did not extend change the data type.
Fixing to extend the data type (together with max_length increment).
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.
Fixing ROUND(date,0), TRUNCATE(date,x), FLOOR(date), CEILING(date)
to return the `int(8) unsigned` data type.
Details:
1. Cleanup: moving virtual implementations
- Type_handler_temporal_result::Item_func_int_val_fix_length_and_dec()
- Type_handler_temporal_result::Item_func_round_fix_length_and_dec()
to Type_handler_date_common. Other temporal data type handlers
override these methods anyway. So they were only DATE specific.
This change makes the code clearer.
2. Backporting DTCollation_numeric from 10.5, to reuse the code easier.
3. Adding the `preferred_attrs` argument to Item_func_round::fix_arg_int(). Now
Type_handler_xxx::Item_func_round_val_fix_length_and_dec() work as follows:
- The INT-alike and YEAR handlers copy preferred_attrs from args[0].
- The DATE handler passes explicit attributes, to get `int(8) unsigned`.
- The hex hybrid handler passes NULL, so fix_arg_int() calculates attributes.
4. Type_handler_date_common::Item_func_int_val_fix_length_and_dec()
now sets the type handler and attributes to get `int(8) unsigned`.
1. Fixing ROUND(x) and TRUNCATE(x,0) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
input to preserve the exact data type of the argument when it's possible.
2. Fixing FLOOR(x) and CEILING(x) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
to preserve the exact data type of the argument.
3. Adding dedicated Type_handler_year::Item_func_round_fix_length_and_dec()
to easier handle ROUND(x) and TRUNCATE(x,y) for the YEAR(2) and YEAR(4)
input. They still return INT(2) UNSIGNED and INT(4) UNSIGNED correspondingly,
as before.
Implementing dedicated fixing methods:
- Type_handler_bit::Item_func_round_fix_length_and_dec()
- Type_handler_bit::Item_func_int_val_fix_length_and_dec()
- Type_handler_typelib::Item_func_round_fix_length_and_dec()
because the inherited methods did not work well.
Fixing:
- Type_handler_typelib::Item_func_int_val_fix_length_and_dec
It did not work well, because it used args[0]->max_length to
calculate the result data type. In case of ENUM and SET it was
not correct, because in FLOOR() and CEILING() context
ENUM and SET return not more than 5 digits (65535 is the biggest
possible value).
Misc:
- Changing the API of
Type_handler_bit::Bit_decimal_notation_int_digits(const Item *item)
to a more generic form:
Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits(uint nbits)
- Fixing Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits() to
return the exact number of decimal digits for all nbits 1..64.
The old implementation was approximate.
This change gives better (more precise) data types.
- Type_handler_hex_hybrid did not override
Type_handler_string_result::Item_func_round_fix_length_and_dec(),
so the result type of ROUND(0xFFFFFFFFFFFFFFFF) was erroneously
calculated ad DOUBLE with a wrong length.
Overriding Item_func_round_fix_length_and_dec(), to calculated
the result type as INT/BIGINT.
Also, fixing Item_func_round::fix_arg_int() to use
args[0]->decimal_precision() instead of args[0]->max_length
when calculating this->max_length, to get a correct result
for hex hybrids.
- Type_handler_hex_hybrid::Item_func_int_val_fix_length_and_dec()
called item->fix_length_and_dec_int_or_decimal(), which did not
produce a correct result data type for hex hybrid.
Implementing a dedicated code instead, to return INT UNSIGNED or
BIGINT UNSIGNED depending in the number of digits in the arguments.
Item_func_div::fix_length_and_dec_temporal() set the return data type to
integer in case of @div_precision_increment==0 for temporal input with FSP=0.
This caused Item_func_div to call int_op(), which is not implemented,
so a crash on DBUG_ASSERT(0) happened.
Fixing fix_length_and_dec_temporal() to set the result type to DECIMAL.
The bug was that when long item-strings was converted to VARCHAR,
type_handler::string_type_handler() didn't take into account max
VARCHAR length. The resulting Aria temporary table was created with
a VARCHAR field of length 1 when it should have been 65537. This caused
MariaDB to send impossible records to ma_write() and Aria reported
eventually the table as crashed.
Fixed by updating Type_handler::string_type_handler() to not create too long
VARCHAR fields. To make things extra safe, I also added checks in when
writing dynamic Aria records to ensure we find the wrong record during write
instead of during read.
Make Field::is_equal() const and return bool as it's a naturally fitting
type for it. Also it's agrument was narrowed to Column_definition.
InnoDB can change type of some columns by itself. InnoDB-specific code used to
reside in Field_xxx:is_equal() methods. Now engine-specific stuff was
moved to a virtual methods of handler::can_convert{string,varstring,blob,geom}.
These methods are called by Field::can_be_converted_by_engine() which is a
double dispatch pattern.
Some InnoDB-specific code still resides in compare_keys_but_name(). It should
be moved from here someday to handler::compare_key_parts(...) or similar.
IS_EQUAL_WITH_REINTERPRET_COMPATIBLE_CHARSET
IS_EQUAL_WITH_REINTERPRET_COMPATIBLE_CHARSET_BUT_COLLATE: both was removed
IS_EQUAL_NO, IS_EQUAL_YES are not needed now and should be removed
along with deprecated handler::check_if_incompatible_data().
HA_EXTENDED_TYPES_CONVERSION: was removed as such logic is not needed now by
server code.
ALTER_COLUMN_EQUAL_PACK_LENGTH: was renamed to a more generic
ALTER_COLUMN_TYPE_CHANGE_BY_ENGINE
Patch is about two cases:
1) On some collate changes it's possible to rebuild only secondary indexes
2) For non-indexed columns collate can be changed INSTANTly
Implemented mostly in Field_{string,varstring,blob}::is_equal().
Make this method return how exactly collationa differs.
This information is later used by fill_alter_inplace_info() to pass
correct info to engine.
This patch fixes:
- MDEV-19284 INSTANT ALTER with ucs2-to-utf16 conversion produces bad data
- MDEV-19285 INSTANT ALTER from ascii_general_ci to latin1_general_ci produces corrupt data
These regressions were introduced in 10.4.3 by:
- MDEV-15564 Avoid table rebuild in ALTER TABLE on collation or charset changes
Changes:
1. Cleanup: Adding a helper method
Field_longstr::csinfo_change_allows_instant_alter(),
to remove some duplicate code in field.cc.
2. Cleanup: removing Type_handler::Charsets_are_compatible() and static
function charsets_are_compatible() and
introducing new methods in the recently added class Charset instead:
- encoding_allows_reinterpret_as()
- encoding_and_order_allow_reinterpret_as()
3. Bug fix: Removing the code that allowed instant conversion for
ascii-to->8bit and ucs2-to->utf16.
This actually fixes MDEV-19284 and MDEV-19285.
4. Bug fix: Adding a helper method Charset::collation_specific_name().
The old corresponding code in Type_handler::Charsets_are_compatible()
was not safe against (badly named) user-defined collations whose
character set name can be longer than collation name.
This also fixes:
MDEV-17299 Assertion `maybe_null' failed in make_sortkey
Note, during merge of the 10.1 version of MDEV-17299,
please use the 10.3 version of the code (i.e. null merge the 10.1 version).
1. Renaming Type_handler_json to Type_handler_json_longtext
There will be other JSON handlers soon, e.g. Type_handler_json_varchar.
2. Making the code more symmetric for data types:
- Adding a new virtual method
Type_handler::Column_definition_validate_check_constraint()
- Moving JSON-specific code from sql_yacc.yy to
Type_handler_json_longtext::Column_definition_validate_check_constraint()
3. Adding new files sql_type_json.cc and sql_type_json.h
and moving Type_handler+JSON related code into these files.
Allow ALGORITHM=INSTANT (or avoid touching any data)
when changing the collation, or in some cases, the character set,
of a non-indexed CHAR or VARCHAR column. There is no penalty
for subsequent DDL or DML operations, and compatibility with
older MariaDB versions will be unaffected.
Character sets may be changed when the old encoding is compatible
with the new one. For example, changing from ASCII to anything
ASCII-based, or from 3-byte to 4-byte UTF-8 can sometimes be
performed instantly.
This is joint work with Eugene Kosov.
The test cases as well as ALTER_CONVERT_TO, charsets_are_compatible(),
Type_handler::Charsets_are_compatible() are his work.
The Field_str::is_equal(), Field_varstring::is_equal() and
the InnoDB changes were mostly rewritten by me due to conflicts
with MDEV-15563.
Limitations:
Changes of indexed columns will still require
ALGORITHM=COPY. We should allow ALGORITHM=NOCOPY and allow
the indexes to be rebuilt inside the storage engine,
without copying the entire table.
Instant column size changes (in bytes) are not supported by
all storage engines.
Instant CHAR column changes are only allowed for InnoDB
ROW_FORMAT=REDUNDANT. We could allow this for InnoDB
when the CHAR internally uses a variable-length encoding,
say, when converting from 3-byte UTF-8 to 4-byte UTF-8.
Instant VARCHAR column changes are allowed for InnoDB
ROW_FORMAT=REDUNDANT, and for others only if the size
in bytes does not change from 128..255 bytes to more
than 256 bytes.
Inside InnoDB, this slightly changes the way how MDEV-15563
works and fixes the result of the innodb.instant_alter_extend test.
We change the way how ALTER_COLUMN_EQUAL_PACK_LENGTH_EXT
is handled. All column extension, type changes and renaming
now go through a common route, except when ctx->is_instant()
is in effect, for example, instant ADD or DROP COLUMN has
been initiated. Only in that case we will go through
innobase_instant_try() and rewrite all column metadata.
get_type(field, prtype, mtype, len): Convert a SQL data type into
InnoDB column metadata.
innobase_rename_column_try(): Remove the update of SYS_COLUMNS.
innobase_rename_or_enlarge_column_try(): New function,
replacing part of innobase_rename_column_try() and all of
innobase_enlarge_column_try(). Also changes column types.
innobase_rename_or_enlarge_columns_cache(): Also change
the column type.
When creating a field of type JSON, it will be automatically
converted to TEXT with CHECK (json_valid(`a`)), if there wasn't any
previous check for the column.
Additional things:
- Added two bug fixes that was found while testing JSON. These bug
fixes has also been pushed to 10.3 (with a test case), but as they
where minimal and needed to get this task done and tested, the fixes
are repeated here.
- CREATE TABLE ... SELECT drops constraints for columns that
are both in the create and select part.
- If one has both a default expression and check constraint for a
column, one can get the error "Expression for field `a` is refering
to uninitialized field `a`.
- Removed some duplicate MYSQL_PLUGIN_IMPORT symbols