SELECT DISTINCT did not work with expressions with sum functions.
Distinct was only done on the values stored in the intermediate temporary
tables, which only stored the value of each sum function.
In other words:
SELECT DISTINCT sum(a),sum(b),avg(c) ... worked.
SELECT DISTINCT sum(a),sum(b) > 2,sum(c)+sum(d) would not work.
The later query would do ONLY apply distinct on the sum(a) part.
Reviewer: Sergei Petrunia <sergey@mariadb.com>
This was fixed by extending remove_dup_with_hash_index() and
remove_dup_with_compare() to take into account the columns in the result
list that where not stored in the temporary table.
Note that in many cases the above dup removal functions are not used as
the optimizer may be able to either remove duplicates early or it will
discover that duplicate remove is not needed. The later happens for
example if the group by fields is part of the result.
Other things:
- Backported from 11.0 the change of Sort_param.tmp_buffer from char* to
String.
- Changed Type_handler::make_sort_key() to take String as a parameter
instead of Sort_param. This was done to allow make_sort_key() functions
to be reused by distinct elimination functions.
This makes Type_handler_string_result::make_sort_key() similar to code
in 11.0
- Simplied error handling in remove_dup_with_compare() to remove code
duplication.
This patch is the result of running
run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
Code style changes have been done on top. The result of this change
leads to the following improvements:
1. Binary size reduction.
* For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by
~400kb.
* A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
2. Compiler can better understand the intent of the code, thus it leads
to more optimization possibilities. Additionally it enabled detecting
unused variables that had an empty default constructor but not marked
so explicitly.
Particular change required following this patch in sql/opt_range.cc
result_keys, an unused template class Bitmap now correctly issues
unused variable warnings.
Setting Bitmap template class constructor to default allows the compiler
to identify that there are no side-effects when instantiating the class.
Previously the compiler could not issue the warning as it assumed Bitmap
class (being a template) would not be performing a NO-OP for its default
constructor. This prevented the "unused variable warning".
Problem:
The problem happened because of a conceptual flaw in the server code:
a. The table level CHARSET/COLLATE clause affected all data types,
including numeric and temporal ones:
CREATE TABLE t1 (a INT) CHARACTER SET utf8 [COLLATE utf8_general_ci];
In the above example, the Column_definition_attributes
(and then the FRM record) for the column "a" erroneously inherited
"utf8" as its character set.
b. The "ALTER TABLE t1 CONVERT TO CHARACTER SET csname" statement
also erroneously affected Column_definition_attributes::charset
for numeric and temporal data types and wrote "csname" as their
character set into FRM files.
So now we have arbitrary non-relevant charset ID values for numeric
and temporal data types in all FRM files in the world :)
The code in the server and the other engines did not seem to be affected
by this flaw. Only InnoDB inplace ALTER was affected.
Solution:
Fixing the code in the way that only character string data types
(CHAR,VARCHAR,TEXT,ENUM,SET):
- inherit the table level CHARSET/COLLATE clause
- get the charset value according to "CONVERT TO CHARACTER SET csname".
Numeric and temporal data types now always get &my_charset_numeric
in Column_definition_attributes::charset and always write its ID into FRM files:
- no matter what the table level CHARSET/COLLATE clause is, and
- no matter what "CONVERT TO CHARACTER SET" says.
Details:
1. Adding helper classes to pass small parts of HA_CREATE_INFO
into Type_handler methods:
- Column_derived_attributes - to pass table level CHARSET/COLLATE,
so columns that do not have explicit CHARSET/COLLATE clauses
can derive them from the table level, e.g.
CREATE TABLE t1 (a VARCHAR(1), b CHAR(1)) CHARACTER SET utf8;
- Column_bulk_alter_attributes - to pass bulk attribute changes
generated by the ALTER related code. These bulk changes affect
multiple columns at the same time:
ALTER TABLE ... CONVERT TO CHARACTER SET csname;
Note, passing the whole HA_CREATE_INFO directly to Type_handler
would not be good: HA_CREATE_INFO is huge and would need not desired
dependencies in sql_type.h and sql_type.cc. The Type_handler API should
use smallest possible data types!
2. Type_handler::Column_definition_prepare_stage1() is now responsible
to set Column_definition::charset properly, according to the data type,
for example:
- For string data types, Column_definition_attributes::charset is set from
the table level CHARSET/COLLATE clause (if not specified explicitly in
the column definition).
- For numeric and temporal fields, Column_definition_attributes::charset is
set to &my_charset_numeric, no matter what the table level
CHARSET/COLLATE says.
- For GEOMETRY, Column_definition_attributes::charset is set to
&my_charset_bin, no matter what the table level CHARSET/COLLATE says.
Previously this code (setting `charset`) was outside of of
Column_definition_prepare_stage1(), namely in
mysql_prepare_create_table(), and was erroneously called for
all data types.
3. Adding Type_handler::Column_definition_bulk_alter(), to handle
"ALTER TABLE .. CONVERT TO". Previously this code was inside
get_sql_field_charset() and was erroneously called for all data types.
4. Removing the Schema_specification_st parameter from
Type_handler::Column_definition_redefine_stage1().
Column_definition_attributes::charset is now fully properly initialized by
Column_definition_prepare_stage1(). So we don't need access to the
table level CHARSET/COLLATE clause in Column_definition_redefine_stage1()
any more.
5. Other changes:
- Removing global function get_sql_field_charset()
- Moving the part of the former get_sql_field_charset(), which was
responsible to inherit the table level CHARSET/COLLATE clause to
new methods:
-- Column_definition_attributes::explicit_or_derived_charset() and
-- Column_definition::prepare_charset_for_string().
This code is only needed for string data types.
Previously it was erroneously called for all data types.
- Moving another part, which was responsible to apply the
"CONVERT TO" clause, to
Type_handler_general_purpose_string::Column_definition_bulk_alter().
- Replacing the call for get_sql_field_charset() in sql_partition.cc
to sql_field->explicit_or_derived_charset() - it is perfectly enough.
The old code was redundant: get_sql_field_charset() was called from
sql_partition.cc only when there were no a "CONVERT TO CHARACTER SET"
clause involved, so its purpose was only to inherit the table
level CHARSET/COLLATE clause.
- Moving the code handling the BINCMP_FLAG flag from
mysql_prepare_create_table() to
Column_definition::prepare_charset_for_string():
This code is responsible to resolve the BINARY comparison style
into the corresponding _bin collation, to do the following transparent
rewrite:
CREATE TABLE t1 (a VARCHAR(10) BINARY) CHARSET utf8; ->
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
This code is only needed for string data types.
Previously it was erroneously called for all data types.
6. Renaming Table_scope_and_contents_source_pod_st::table_charset
to alter_table_convert_to_charset, because the only purpose it's used for
is handlering "ALTER .. CONVERT". The new name is much more self-descriptive.
Allow materialization strategy when collations on the
inner and outer sides of an IN subquery are the same and the
character set of the inner side is a proper subset of the character
set on the outer side.
This allows conversion from utf8mb3 to utf8mb4
as the former is a subset of the later.
This is only allowed when IN predicate is converted to an IN subquery
Backported part of the patch (d6a00d9b18f) of MDEV-17905.
The issue here was the system variable max_sort_length was being applied
to decimals and it was truncating the value for decimals to the number
of bytes set by max_sort_length.
This was leading to a buffer overflow as the values were written
to the buffer without truncation and then we moved the offset to
the number of bytes(set by max_sort_length), that are needed for comparison.
The fix is to not apply max_sort_length for fixed size types like INT,
DECIMALS and only apply max_sort_length for CHAR, VARCHARS, TEXT and
BLOBS.
Problem:
Queries like this showed performance degratation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and make it even better in some cases.
See benchmark results in MDEV.
Also, this change makes futher separations of Date, Time, Datetime
from each other, which will make it possible not to derive them from
a too heavy (40 bytes) MYSQL_TIME, and replace them to smaller data
type specific containers.
Implementing methods:
- Field::val_time_packed()
- Field::val_datetime_packed()
- Item_field::val_datetime_packed(THD *thd);
- Item_field::val_time_packed(THD *thd);
to give a faster access to temporal packed longlong representation of a Field,
which is used in temporal Arg_comparator's to DATE, TIME, DATETIME data types.
The same idea is used in MySQL-5.6+.
This improves performance.
Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)
Type_handler_temporal_result::Item_func_min_max_fix_attributes()
in an expression GREATEST(string,date), e.g:
SELECT GREATEST('1', CAST('2020-12-12' AS DATE));
incorrectly evaluated decimals as 6 (like for DATETIME).
Adding a separate virtual implementation:
Type_handler_date_common::Item_func_min_max_fix_attributes()
This makes the code simpler.
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.
Fixing ROUND(date,0), TRUNCATE(date,x), FLOOR(date), CEILING(date)
to return the `int(8) unsigned` data type.
Details:
1. Cleanup: moving virtual implementations
- Type_handler_temporal_result::Item_func_int_val_fix_length_and_dec()
- Type_handler_temporal_result::Item_func_round_fix_length_and_dec()
to Type_handler_date_common. Other temporal data type handlers
override these methods anyway. So they were only DATE specific.
This change makes the code clearer.
2. Backporting DTCollation_numeric from 10.5, to reuse the code easier.
3. Adding the `preferred_attrs` argument to Item_func_round::fix_arg_int(). Now
Type_handler_xxx::Item_func_round_val_fix_length_and_dec() work as follows:
- The INT-alike and YEAR handlers copy preferred_attrs from args[0].
- The DATE handler passes explicit attributes, to get `int(8) unsigned`.
- The hex hybrid handler passes NULL, so fix_arg_int() calculates attributes.
4. Type_handler_date_common::Item_func_int_val_fix_length_and_dec()
now sets the type handler and attributes to get `int(8) unsigned`.
1. Fixing ROUND(x) and TRUNCATE(x,0) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
input to preserve the exact data type of the argument when it's possible.
2. Fixing FLOOR(x) and CEILING(x) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
to preserve the exact data type of the argument.
3. Adding dedicated Type_handler_year::Item_func_round_fix_length_and_dec()
to easier handle ROUND(x) and TRUNCATE(x,y) for the YEAR(2) and YEAR(4)
input. They still return INT(2) UNSIGNED and INT(4) UNSIGNED correspondingly,
as before.
Implementing dedicated fixing methods:
- Type_handler_bit::Item_func_round_fix_length_and_dec()
- Type_handler_bit::Item_func_int_val_fix_length_and_dec()
- Type_handler_typelib::Item_func_round_fix_length_and_dec()
because the inherited methods did not work well.
Fixing:
- Type_handler_typelib::Item_func_int_val_fix_length_and_dec
It did not work well, because it used args[0]->max_length to
calculate the result data type. In case of ENUM and SET it was
not correct, because in FLOOR() and CEILING() context
ENUM and SET return not more than 5 digits (65535 is the biggest
possible value).
Misc:
- Changing the API of
Type_handler_bit::Bit_decimal_notation_int_digits(const Item *item)
to a more generic form:
Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits(uint nbits)
- Fixing Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits() to
return the exact number of decimal digits for all nbits 1..64.
The old implementation was approximate.
This change gives better (more precise) data types.
- Type_handler_hex_hybrid did not override
Type_handler_string_result::Item_func_round_fix_length_and_dec(),
so the result type of ROUND(0xFFFFFFFFFFFFFFFF) was erroneously
calculated ad DOUBLE with a wrong length.
Overriding Item_func_round_fix_length_and_dec(), to calculated
the result type as INT/BIGINT.
Also, fixing Item_func_round::fix_arg_int() to use
args[0]->decimal_precision() instead of args[0]->max_length
when calculating this->max_length, to get a correct result
for hex hybrids.
- Type_handler_hex_hybrid::Item_func_int_val_fix_length_and_dec()
called item->fix_length_and_dec_int_or_decimal(), which did not
produce a correct result data type for hex hybrid.
Implementing a dedicated code instead, to return INT UNSIGNED or
BIGINT UNSIGNED depending in the number of digits in the arguments.
This patch fixes:
- MDEV-19284 INSTANT ALTER with ucs2-to-utf16 conversion produces bad data
- MDEV-19285 INSTANT ALTER from ascii_general_ci to latin1_general_ci produces corrupt data
These regressions were introduced in 10.4.3 by:
- MDEV-15564 Avoid table rebuild in ALTER TABLE on collation or charset changes
Changes:
1. Cleanup: Adding a helper method
Field_longstr::csinfo_change_allows_instant_alter(),
to remove some duplicate code in field.cc.
2. Cleanup: removing Type_handler::Charsets_are_compatible() and static
function charsets_are_compatible() and
introducing new methods in the recently added class Charset instead:
- encoding_allows_reinterpret_as()
- encoding_and_order_allow_reinterpret_as()
3. Bug fix: Removing the code that allowed instant conversion for
ascii-to->8bit and ucs2-to->utf16.
This actually fixes MDEV-19284 and MDEV-19285.
4. Bug fix: Adding a helper method Charset::collation_specific_name().
The old corresponding code in Type_handler::Charsets_are_compatible()
was not safe against (badly named) user-defined collations whose
character set name can be longer than collation name.