mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

4 Commits

Author SHA1 Message Date
Alexander Barkov
f6118acda9 A follow-up patch MDEV-27266 Improve UCA collation performance for utf8mb3 and utf8mb4
Moving these members:

   CHARSET_INFO *cs;
   const MY_UCA_WEIGHT_LEVEL *level;

from my_uca_scanner to a new separate structure my_uca_scanner_param.

Rationale:

During a comparison of two strings, these members were initialized twice
(once for each string).

After the change, these members are initialized only once inside
a shared instance of my_uca_scanner_param, and that instance is
shared between the two scanners (its const address is passed as a new parameter
to the underlying scanner functions).

This change gives a slight performance improvement (~5%).
2022-09-02 13:23:24 +04:00
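The refactoring above can be illustrated with a minimal C sketch. It is not MariaDB's code. Data that is identical for both strings lives in one const structure that is initialized once per comparison and shared by both scanners, while each scanner keeps only its own cursor. The types charset_info_t, weight_level_t, scanner_param_t and scanner_t, and the functions toy_level_init(), scanner_init(), scanner_next() and toy_strnncoll(), are hypothetical simplifications standing in for CHARSET_INFO, MY_UCA_WEIGHT_LEVEL, my_uca_scanner_param and my_uca_scanner. The sketch also assumes a toy case-insensitive collation with exactly one weight per byte. The cs field is carried only to mirror the original structure and is not used.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for CHARSET_INFO and MY_UCA_WEIGHT_LEVEL. */
typedef struct { const char *name; } charset_info_t;
typedef struct { const unsigned short *weights; } weight_level_t; /* one weight per byte */

/* Data identical for both strings, initialized once per comparison
   (the role my_uca_scanner_param plays in the commit). */
typedef struct {
  const charset_info_t *cs;
  const weight_level_t *level;
} scanner_param_t;

/* Per-string state only (what remains in the scanner itself). */
typedef struct {
  const unsigned char *sbeg;
  const unsigned char *send;
} scanner_t;

static void toy_level_init(weight_level_t *level, unsigned short *w) {
  for (int i = 0; i < 256; i++)
    w[i] = (unsigned short) toupper(i); /* toy case-insensitive weights */
  level->weights = w;
}

static void scanner_init(scanner_t *s, const unsigned char *str, size_t len) {
  s->sbeg = str;
  s->send = str + len;
}

/* Returns the next weight, or -1 at the end of the string. The shared
   parameters arrive through a const pointer instead of living in s. */
static int scanner_next(scanner_t *s, const scanner_param_t *param) {
  if (s->sbeg >= s->send)
    return -1;
  return param->level->weights[*s->sbeg++];
}

static int toy_strnncoll(const scanner_param_t *param,
                         const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen) {
  scanner_t sa, sb;
  int wa, wb;
  assert(param != NULL && param->level != NULL);
  scanner_init(&sa, a, alen);
  scanner_init(&sb, b, blen);
  do {
    wa = scanner_next(&sa, param);
    wb = scanner_next(&sb, param);
  } while (wa == wb && wa != -1);
  return wa - wb; /* negative, zero or positive, like strnncoll */
}
```

A caller fills one scanner_param_t and passes its address to toy_strnncoll(). Both scanners then read cs and level through that pointer, so neither value is set up twice per comparison.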
Alexander Barkov
d8f172c11c MDEV-27266 Improve UCA collation performance for utf8mb3 and utf8mb4
Adding two levels of optimization:

1. For every byte pair [00..FF][00..FF] that:
  a. consists of two ASCII characters or makes a well-formed two-byte character
  b. whose total weight string fits into 4 weights
     (concatenated weight string in case of two ASCII characters,
     or a single weight string in case of a two-byte character)
  c. whose weight is context independent (i.e. does not depend on contractions
     or previous context pairs)
  store weights in a separate array of MY_UCA_2BYTES_ITEM,
  so during scanner_next() we can scan two bytes at a time.
  Byte pairs that do not match the conditions a-c are marked in this array
  as not applicable for optimization and scanned as before.

2. For every byte pair which is applicable for optimization in #1
   and which produces only one or two weights, store the
   weights in one more array of MY_UCA_WEIGHT2, so that at the beginning
   of strnncoll*() equal prefixes can be skipped using an even more efficient
   loop. This loop consumes two bytes at a time and continues while the
   two bytes on both sides produce weight strings of equal length
   (i.e. one weight on both sides, or two weights on both sides).
   This allows efficient comparison of:
   - Context independent sequences consisting of two ASCII characters
   - Context independent 2-byte characters
   - Contractions consisting of two ASCII characters, e.g. Czech "ch".
   - Some tricky cases: "ss" vs "SHARP S"
     ("ss" produces two weights, 0xC39F also produces two weights)
2022-08-10 15:04:50 +02:00
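The two optimization levels above can be illustrated with a toy C sketch. It is not MariaDB's implementation, and the following details are assumptions chosen for illustration:

- the table layouts: pair_item_t holds a count and up to four weights, with a count of 0 meaning "not applicable"; weight2_t treats a zero weight as "absent";
- the weight function;
- the single contraction "ch";
- only one two-byte character being populated (U+00DF SHARP S, bytes 0xC3 0x9F, weighted like "SS").

build_tables() fills the level-1 pair table. A pair whose second byte could start a contraction is marked not applicable, because its weight depends on the following byte. The function then derives the level-2 table from entries that have one or two weights. skip_equal_prefix() runs the two-bytes-at-a-time loop and returns the byte offset at which a regular scanner would resume.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical level-1 item: count == 0 means "not applicable, scan as before". */
typedef struct {
  unsigned char count;   /* number of weights produced by the byte pair (0..4) */
  unsigned short w[4];
} pair_item_t;

/* Hypothetical level-2 item: w[0] == 0 means not applicable, w[1] == 0 means one weight. */
typedef struct {
  unsigned short w[2];
} weight2_t;

static pair_item_t tab1[256 * 256];
static weight2_t tab2[256 * 256];

#define CH_WEIGHT 0x0200 /* toy weight for the contraction "ch" */

/* Toy case-insensitive weight for an ASCII byte; +1 keeps 0 free as a marker. */
static unsigned short ascii_weight(unsigned b) {
  return (unsigned short) (toupper((int) b) + 1);
}

/* In this toy collation 'c' or 'C' may start the contraction "ch". */
static int is_contraction_head(unsigned b) {
  return b == 'c' || b == 'C';
}

static void build_tables(void) {
  for (unsigned b0 = 0; b0 < 256; b0++) {
    for (unsigned b1 = 0; b1 < 256; b1++) {
      pair_item_t *it = &tab1[b0 * 256 + b1];
      it->count = 0; /* not applicable unless a rule below matches */
      if (b0 < 0x80 && b1 < 0x80) {
        if (is_contraction_head(b1))
          continue; /* weight of b1 depends on the next byte: context dependent */
        if (is_contraction_head(b0) && (b1 == 'h' || b1 == 'H')) {
          it->count = 1; /* contraction of two ASCII bytes: one weight */
          it->w[0] = CH_WEIGHT;
        } else {
          it->count = 2; /* two ASCII characters: concatenated weights */
          it->w[0] = ascii_weight(b0);
          it->w[1] = ascii_weight(b1);
        }
      } else if (b0 == 0xC3 && b1 == 0x9F) {
        it->count = 2; /* U+00DF SHARP S sorts like "SS" in this toy */
        it->w[0] = it->w[1] = ascii_weight('S');
      }
    }
  }
  /* Level 2: keep only pairs that produce one or two weights. */
  for (size_t i = 0; i < 256 * 256; i++) {
    tab2[i].w[0] = tab2[i].w[1] = 0;
    if (tab1[i].count == 1 || tab1[i].count == 2) {
      tab2[i].w[0] = tab1[i].w[0];
      if (tab1[i].count == 2)
        tab2[i].w[1] = tab1[i].w[1];
    }
  }
  /* Sanity check of the toy data: "ss" and SHARP S share a level-2 entry. */
  assert(tab2['s' * 256 + 's'].w[0] == tab2[0xC3 * 256 + 0x9F].w[0] &&
         tab2['s' * 256 + 's'].w[1] == tab2[0xC3 * 256 + 0x9F].w[1]);
}

/* Skips, two bytes per step, the longest prefix on which both strings produce
   identical weight strings; returns the byte offset where scanning resumes. */
static size_t skip_equal_prefix(const unsigned char *a, size_t alen,
                                const unsigned char *b, size_t blen) {
  size_t pos = 0;
  while (pos + 2 <= alen && pos + 2 <= blen) {
    const weight2_t *wa = &tab2[a[pos] * 256 + a[pos + 1]];
    const weight2_t *wb = &tab2[b[pos] * 256 + b[pos + 1]];
    if (wa->w[0] == 0 || wb->w[0] == 0)
      break; /* a pair is not applicable for this loop */
    if (wa->w[0] != wb->w[0] || wa->w[1] != wb->w[1])
      break; /* different weights or different number of weights */
    pos += 2;
  }
  return pos;
}
```

After build_tables(), comparing "strasse" with the UTF-8 bytes of "straße" skips 6 bytes. The pair "ss" and the pair 0xC3 0x9F both yield two S weights, and the final single byte "e" is left for the regular scanner. For "xch", the first pair "xc" is not applicable, so the loop stops at offset 0.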
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Alexander Barkov
b915f79e4e MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings 2022-01-21 12:16:07 +04:00