MDEV-7947 strcmp() takes 0.37% in OLTP RO

This patch ensures that all identical character sets share the same cs->csname. This allows us to replace strcmp() in my_charset_same() with a comparison of pointers. This fixes a long-standing performance issue that could cause a strcmp() call for every item sent through the protocol class to the end user.

One consequence of this patch is that we no longer allow a character set definition in the Index.xml file to change the csname of an existing character set. This is by design, as changing the names of existing character sets is extremely dangerous, especially as some storage engines record only character set numbers.

As we now have a hash over the character sets' csname, we can in the future use it for faster access to a specific character set. This could be done by making the hash non-unique and using it to find the next character set with the same csname.
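
As an illustration only, the idea can be sketched in a few lines of C. The DEMO_CHARSET struct, the demo_* names and the charset numbers below are hypothetical stand-ins, not the real CHARSET_INFO or the actual my_charset_same() code. The sketch assumes every charset with the same name points at one shared name array, which is what makes the pointer comparison equivalent to the string comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for CHARSET_INFO. */
typedef struct demo_charset
{
  unsigned number;        /* illustrative id, not taken from the patch */
  const char *csname;     /* points at a shared name array */
} DEMO_CHARSET;

/* One definition per name, in the spirit of the charset_name_* arrays. */
static const char demo_name_utf8[]= "utf8";
static const char demo_name_latin2[]= "latin2";

static const DEMO_CHARSET demo_charsets[]=
{
  { 1, demo_name_utf8 },    /* two collations of the same charset ... */
  { 2, demo_name_utf8 },    /* ... share one csname pointer */
  { 3, demo_name_latin2 }
};

/* Old style: compares the names character by character. */
static int demo_same_strcmp(const DEMO_CHARSET *a, const DEMO_CHARSET *b)
{
  return a == b || strcmp(a->csname, b->csname) == 0;
}

/* New style: valid only because equal names share the same pointer. */
static int demo_same_ptr(const DEMO_CHARSET *a, const DEMO_CHARSET *b)
{
  return a->csname == b->csname;
}

/* Returns the number of pairs on which the two checks disagree. */
static int demo_count_mismatches(void)
{
  size_t n= sizeof(demo_charsets) / sizeof(demo_charsets[0]);
  int mismatches= 0;
  for (size_t i= 0; i < n; i++)
    for (size_t j= 0; j < n; j++)
      if (demo_same_strcmp(&demo_charsets[i], &demo_charsets[j]) !=
          demo_same_ptr(&demo_charsets[i], &demo_charsets[j]))
        mismatches++;
  assert(mismatches == 0);
  return mismatches;
}
```

The pointer check is only safe while the sharing invariant holds, which is why a definition that renames an existing character set has to be rejected rather than given a fresh name string.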
2025-08-08 11:22:35 +03:00 · 2020-07-20 19:26:31 +03:00
parent 46ffd47f42
commit dbcd3384e0
30 changed files with 386 additions and 245 deletions
--- a/strings/strings_def.h
+++ b/strings/strings_def.h
@@ -130,4 +130,13 @@ int my_wc_to_printable_generic(CHARSET_INFO *cs, my_wc_t wc,
 int my_wc_to_printable_8bit(CHARSET_INFO *cs, my_wc_t wc,
                            uchar *s, uchar *e);

+/* Some common character set names */
+extern const char charset_name_latin2[];
+extern const char charset_name_utf8[];
+extern const char charset_name_utf16[];
+extern const char charset_name_utf32[];
+extern const char charset_name_ucs2[];
+extern const char charset_name_ucs2[];
+extern const char charset_name_utf8mb4[];
+
 #endif /*STRINGS_DEF_INCLUDED */