mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref

(Variant#3: Allow cross-charset comparisons, use a special
CHARSET_INFO to create lookup keys. Review input addressed.)

Equalities that compare utf8mb{3,4}_general_ci strings, like:

  WHERE ... utf8mb3_key_col=utf8mb4_value    (MB3-4-CMP)

can now be used to construct ref[const] access and also participate
in multiple-equalities.
This means that utf8mb3_key_col can be used for key lookups when
compared with a utf8mb4 constant, field or expression using the '=' or
'<=>' comparison operators.

This is controlled by optimizer_switch='cset_narrowing=on', which is
OFF by default.

IMPLEMENTATION
Item value comparison in (MB3-4-CMP) is done using utf8mb4_general_ci.
This is valid because any utf8mb3 value is also a utf8mb4 value.

When making the index lookup value for utf8mb3_key_col, we do "Charset
Narrowing": characters that are in the Basic Multilingual Plane (=BMP) are
copied as-is, as they can be represented in utf8mb3. Characters that are
outside the BMP cannot be represented in utf8mb3 and are replaced
with U+FFFD, the "Replacement Character".
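
The narrowing step described above can be sketched on raw UTF-8 bytes.
This is an illustration only, not the code from this commit:
narrow_to_bmp is a hypothetical helper name, and the sketch assumes the
input is well-formed UTF-8, so that a lead byte of 0xF0 or above always
starts a 4-byte (non-BMP) sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not MariaDB's Utf8_narrow): copies a well-formed
// UTF-8 string, replacing every 4-byte sequence (a character outside the
// BMP) with U+FFFD, whose UTF-8 encoding is EF BF BD.
std::string narrow_to_bmp(const std::string &in)
{
  std::string out;
  for (std::size_t i= 0; i < in.size(); )
  {
    unsigned char lead= static_cast<unsigned char>(in[i]);
    // Sequence length derived from the lead byte (assumes valid input).
    std::size_t len= lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (len == 4)
      out.append("\xEF\xBF\xBD");   // non-BMP: emit the Replacement Character
    else
      out.append(in, i, len);       // BMP: copy the bytes unchanged
    i+= len;
  }
  return out;
}
```

Every sequence in the result is at most 3 bytes long, which is what
lets the value be stored in a utf8mb3 key buffer.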

In utf8mb4_general_ci, the Replacement Character compares as equal to any
character that's not in the BMP. Because of this, the constructed lookup value
will find all index records that would be considered equal by the original
condition (MB3-4-CMP).
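
The argument above can be checked on a toy model. The sketch below is
an assumption-laden simplification, not the server's collation code:
sketch_weight, sketch_equal and narrow are hypothetical names, strings
are vectors of code points, and case folding and space padding are
deliberately left out so that only the non-BMP rule is modelled.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<char32_t> CodePoints;

// Simplified stand-in for a utf8mb4_general_ci weight: every non-BMP
// character gets the weight of U+FFFD. Case folding is NOT modelled.
char32_t sketch_weight(char32_t cp)
{
  return cp > 0xFFFF ? 0xFFFD : cp;
}

// Equality under the simplified weights above.
bool sketch_equal(const CodePoints &a, const CodePoints &b)
{
  if (a.size() != b.size())
    return false;
  for (std::size_t i= 0; i < a.size(); i++)
    if (sketch_weight(a[i]) != sketch_weight(b[i]))
      return false;
  return true;
}

// Charset Narrowing on code points: BMP characters are copied as-is,
// all others become U+FFFD.
CodePoints narrow(const CodePoints &v)
{
  CodePoints out;
  for (std::size_t i= 0; i < v.size(); i++)
    out.push_back(v[i] > 0xFFFF ? 0xFFFD : v[i]);
  return out;
}
```

In this model a value and its narrowed form always compare as equal,
so a lookup with the narrowed key matches the same stored utf8mb3
values as the original comparison would.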

Approved-by: Monty <monty@mariadb.org>
Sergei Petrunia
2023-09-19 18:22:49 +03:00
parent 6a674c3142
commit 4941ac9192
23 changed files with 1001 additions and 39 deletions


@@ -8998,11 +8998,28 @@ SEL_ARG *Field_str::get_mm_leaf(RANGE_OPT_PARAM *prm, KEY_PART *key_part,
                                 const Item_bool_func *cond,
                                 scalar_comparison_op op, Item *value)
 {
+  int err;
   DBUG_ENTER("Field_str::get_mm_leaf");
   if (can_optimize_scalar_range(prm, key_part, cond, op, value) !=
       Data_type_compatibility::OK)
     DBUG_RETURN(0);
-  int err= value->save_in_field_no_warnings(this, 1);
+  {
+    /*
+      Do CharsetNarrowing if necessary
+      This means that we are temporary changing the character set of the
+      current key field to make key lookups possible.
+      This is needed when comparing an utf8mb3 key field with an utf8mb4 value.
+      See cset_narrowing.h for more details.
+    */
+    bool do_narrowing=
+      Utf8_narrow::should_do_narrowing(this, value->collation.collation);
+    Utf8_narrow narrow(this, do_narrowing);
+    err= value->save_in_field_no_warnings(this, 1);
+    narrow.stop();
+  }
   if ((op != SCALAR_CMP_EQUAL && is_real_null()) || err < 0)
     DBUG_RETURN(&null_element);
   if (err > 0)