mirror of https://github.com/MariaDB/server.git synced 2025-07-29 05:21:33 +03:00

Patch for the following bugs:

  - BUG#11986: Stored routines and triggers can fail if the code
    has a non-ascii symbol
  - BUG#16291: mysqldump corrupts string-constants with non-ascii-chars
  - BUG#19443: INFORMATION_SCHEMA does not support charsets properly
  - BUG#21249: Character set of SP-var can be ignored
  - BUG#25212: Character set of string constant is ignored (stored routines)
  - BUG#25221: Character set of string constant is ignored (triggers)

There were a few general problems that caused these bugs:
1. Character set information of the original (definition) query for views,
   triggers, stored routines and events was lost.
2. mysqldump output the query in the client character set, which may be
   unable to encode the definition query.
3. INFORMATION_SCHEMA used strings with mixed encodings to display object
   definitions.

1. No query-definition-character set.

To compile a query into executable code, some extra data (such as
environment variables or the database character set) is used. The problem
here was that this context was not preserved, so on the next load it could
differ from the original one and produce a different result.

The context contains the following data:
  - client character set;
  - connection collation (character set and collation);
  - collation of the owner database;

The fix is to store this context and use it each time we parse (compile)
and execute the object (stored routine, trigger, ...).
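
A minimal sketch of the save/switch/restore idea, assuming a simplified
session object. All names here (CreationContext, Session, ContextGuard,
parse_with_stored_context) are hypothetical and are not the types used in
the patch; character sets and collations are plain strings for brevity.

```cpp
#include <string>

// Hypothetical holder for the three pieces of context listed above.
struct CreationContext {
  std::string client_charset;        // client character set
  std::string connection_collation;  // connection character set and collation
  std::string database_collation;    // collation of the owner database
};

// Hypothetical stand-in for the session state that influences parsing.
struct Session {
  CreationContext ctx;
};

// RAII guard: switch the session to the stored context and restore the
// previous one when the guard goes out of scope.
class ContextGuard {
 public:
  ContextGuard(Session &session, const CreationContext &stored)
      : session_(session), saved_(session.ctx) {
    session_.ctx = stored;
  }
  ~ContextGuard() { session_.ctx = saved_; }
  ContextGuard(const ContextGuard &) = delete;
  ContextGuard &operator=(const ContextGuard &) = delete;

 private:
  Session &session_;
  CreationContext saved_;
};

// "Parse" an object under the context it was created with. The return
// value is the client character set that was active while parsing, which
// only serves to make the effect observable.
std::string parse_with_stored_context(Session &session,
                                      const CreationContext &stored) {
  ContextGuard guard(session, stored);
  return session.ctx.client_charset;
}
```

The guard keeps the switch local: whatever the current session settings
are, the object is always parsed under its stored context, and the session
is left unchanged afterwards.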

2. Wrong mysqldump-output.

The original query can contain several encodings (by means of character set
introducers). The problem here was that we tried to convert the original
query to the mysqldump-client character set.

Moreover, we stored queries in different character sets for different
objects (views, for example, used UTF8, while triggers used the original
character set).

The solution is
  - to store definition queries in the original character set;
  - to change the SHOW CREATE statements to output the definition query in
    the binary character set (i.e. without any conversion);
  - to introduce the SHOW CREATE TRIGGER statement;
  - to dump special statements to switch the context to the original one
    before dumping and restore it afterwards.
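
The last item can be sketched as follows. This is an illustration only:
the DumpContext type and the wrap_definition function are hypothetical,
and the exact statements mysqldump emits are an assumption here. The
sketch only shows the save, switch, replay, restore order, with the
definition bytes passed through unconverted.

```cpp
#include <string>

// Hypothetical container for the stored creation context of one object.
struct DumpContext {
  std::string client_charset;
  std::string connection_collation;
};

// Wrap a definition query so that it is replayed under its original
// context. The definition itself is appended byte for byte.
std::string wrap_definition(const DumpContext &ctx,
                            const std::string &definition) {
  std::string out;
  // Save the current session settings.
  out += "SET @saved_cs_client = @@character_set_client;\n";
  out += "SET @saved_col_connection = @@collation_connection;\n";
  // Switch to the context the object was created in.
  out += "SET character_set_client = " + ctx.client_charset + ";\n";
  out += "SET collation_connection = " + ctx.connection_collation + ";\n";
  // Replay the definition without any conversion.
  out += definition + ";\n";
  // Restore the saved settings.
  out += "SET character_set_client = @saved_cs_client;\n";
  out += "SET collation_connection = @saved_col_connection;\n";
  return out;
}
```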

Note that, in order to preserve the database collation in effect at
creation time, an additional ALTER DATABASE statement might be used (to
temporarily switch the database collation back to the original value). In
this case, the ALTER DATABASE privilege will be required. This is a
backward-incompatible change.

3. INFORMATION_SCHEMA showed non-UTF8 strings.

The fix is to generate a UTF8 query during parsing, store it in the object
and show it in the INFORMATION_SCHEMA.

Basically, the idea is to create a copy of the original query and convert
it to UTF8. Character set introducers are removed and all text literals are
converted to UTF8.

This UTF8 query is intended to provide user-readable output. It must not be
used to recreate the object.  Specialized SHOW CREATE statements should be
used for this.

The reason for this limitation is the following: the original query can
contain symbols from several character sets (by means of character set
introducers).

Example:

  - original query:
    CREATE VIEW v1 AS SELECT _cp1251 'Hello' AS c1;

  - UTF8 query (for INFORMATION_SCHEMA):
    CREATE VIEW v1 AS SELECT 'Hello' AS c1;
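
The introducer-removal step of the example above can be sketched as a
standalone function. This is a simplified illustration, not the code from
the patch: strip_introducers and is_ident_char are hypothetical names, the
literal bytes are copied as-is instead of being converted to UTF8, and
only single-quoted literals with '' escapes are recognized. In the patch
itself the UTF8 body is built inside the lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that may appear in an unquoted identifier.
static bool is_ident_char(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Return a copy of the query with character set introducers
// (_charset 'text') dropped and everything else copied verbatim.
std::string strip_introducers(const std::string &q) {
  std::string out;
  std::size_t i = 0;
  while (i < q.size()) {
    char c = q[i];
    if (c == '\'') {
      // Copy a string literal through its closing quote.
      out += c;
      ++i;
      while (i < q.size()) {
        out += q[i];
        if (q[i] == '\'') {
          if (i + 1 < q.size() && q[i + 1] == '\'') {
            out += '\'';  // '' is an escaped quote inside the literal
            i += 2;
            continue;
          }
          ++i;
          break;
        }
        ++i;
      }
      continue;
    }
    bool at_token_start = (i == 0) || !is_ident_char(q[i - 1]);
    if (c == '_' && at_token_start) {
      std::size_t j = i + 1;
      while (j < q.size() && is_ident_char(q[j])) ++j;  // charset name
      std::size_t k = j;
      while (k < q.size() && q[k] == ' ') ++k;          // optional spaces
      if (j > i + 1 && k < q.size() && q[k] == '\'') {
        i = k;  // drop the introducer; the literal is copied next
        continue;
      }
    }
    out += c;
    ++i;
  }
  return out;
}
```

Because the introducers are gone, the resulting text no longer records
which character set each literal was written in, which is why it is
suitable for display only.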
anozdrin/alik@ibm.
2007-06-28 21:34:54 +04:00
parent 64cac0d6ad
commit 9fae9ef66f
82 changed files with 11828 additions and 937 deletions

@ -112,7 +112,8 @@ enum enum_sql_command {
SQLCOM_SHOW_CONTRIBUTORS,
SQLCOM_CREATE_SERVER, SQLCOM_DROP_SERVER, SQLCOM_ALTER_SERVER,
SQLCOM_CREATE_EVENT, SQLCOM_ALTER_EVENT, SQLCOM_DROP_EVENT,
- SQLCOM_SHOW_CREATE_EVENT, SQLCOM_SHOW_EVENTS,
+ SQLCOM_SHOW_CREATE_EVENT, SQLCOM_SHOW_EVENTS,
+ SQLCOM_SHOW_CREATE_TRIGGER,
/* This should be the last !!! */
@ -1330,6 +1331,26 @@ public:
return (uint) ((m_ptr - m_tok_start) - 1);
}
/** Get the utf8-body string. */
const char *get_body_utf8_str()
{
return m_body_utf8;
}
/** Get the utf8-body length. */
uint get_body_utf8_length()
{
return m_body_utf8_ptr - m_body_utf8;
}
void body_utf8_start(THD *thd, const char *begin_ptr);
void body_utf8_append(const char *ptr);
void body_utf8_append(const char *ptr, const char *end_ptr);
void body_utf8_append_literal(THD *thd,
const LEX_STRING *txt,
CHARSET_INFO *txt_cs,
const char *end_ptr);
/** Current thread. */
THD *m_thd;
@ -1361,6 +1382,9 @@ private:
/** Beginning of the query text in the input stream, in the raw buffer. */
const char* m_buf;
/** Length of the raw buffer. */
uint m_buf_length;
/** Echo the parsed stream to the pre-processed buffer. */
bool m_echo;
@ -1388,6 +1412,18 @@ private:
*/
const char* m_cpp_tok_end;
/** UTF8-body buffer created during parsing. */
char *m_body_utf8;
/** Pointer to the current position in the UTF8-body buffer. */
char *m_body_utf8_ptr;
/**
Position in the pre-processed buffer. The query from m_cpp_buf to
m_cpp_utf8_processed_ptr is converted to UTF8-body.
*/
const char *m_cpp_utf8_processed_ptr;
public:
/** Current state of the lexical analyser. */
@ -1410,6 +1446,29 @@ public:
/** State of the lexical analyser for comments. */
enum_comment_state in_comment;
/**
Starting position of the TEXT_STRING or IDENT in the pre-processed
buffer.
NOTE: this member must be used within MYSQLlex() function only.
*/
const char *m_cpp_text_start;
/**
Ending position of the TEXT_STRING or IDENT in the pre-processed
buffer.
NOTE: this member must be used within MYSQLlex() function only.
*/
const char *m_cpp_text_end;
/**
Character set specified by the character-set-introducer.
NOTE: this member must be used within MYSQLlex() function only.
*/
CHARSET_INFO *m_underscore_cs;
};
@ -1444,7 +1503,7 @@ typedef struct st_lex : public Query_tables_list
DYNAMIC_ARRAY plugins;
plugin_ref plugins_static_buffer[INITIAL_LEX_PLUGIN_LIST_SIZE];
- CHARSET_INFO *charset, *underscore_charset;
+ CHARSET_INFO *charset;
/* store original leaf_tables for INSERT SELECT and PS/SP */
TABLE_LIST *leaf_tables_insert;
@ -1635,6 +1694,8 @@ typedef struct st_lex : public Query_tables_list
const char *fname_start;
const char *fname_end;
LEX_STRING view_body_utf8;
/*
Reference to a struct that contains information in various commands
to add/create/drop/change table spaces.