MDEV-16020 SP variables inside GROUP BY..WITH ROLLUP break replication

The code passing positions in the query to constructors of Rewritable_query_parameter descendants (e.g. Item_splocal) was not reliable. It used various Lex_input_stream methods: - get_tok_start() - get_tok_start_prev() - get_tok_end() - get_ptr() to find positions of the recently scanned tokens. The challenge was mostly to choose between get_tok_start() and get_tok_start_prev(), taking into account to the current grammar (depending if lookahead takes place before or after we read the positions in every particular rule). But this approach did not work at all in combination with token contractions, when MYSQLlex() translates two tokens into one token ID, for example: WITH ROLLUP -> WITH_ROLLUP_SYM As a result, the tokenizer is already one more token ahead. So in query fragment: "GROUP BY d, spvar WITH ROLLUP" get_tok_start() points to "ROLLUP". get_tok_start_prev() points to "WITH". As a result, it was "WITH" who was erroneously replaced to NAME_CONST() instead of "spvar". This patch modifies the code to do it a different way. Changes: 1. For keywords and identifiers, the tokenizer now returns LEX_CTRING pointing directly to the query fragment. So query positions are now just available using: - $1.str - for the beginning of a token - $1.str+$1.length - for the end of a token 2. Identifiers are not allocated on the THD memory root in the tokenizer any more. Allocation is now done on later stages, in methods like LEX::create_item_ident(). 3. Two LEX_CSTRING based structures were added: - Lex_ident_cli_st - used to store the "client side" identifier representation, pointing to the query fragment. Note, these identifiers are encoded in @@character_set_client and can have broken byte sequences. - Lex_ident_sys_st - used to store the "server side" identifier representation, pointing to the THD allocated memory. This representation guarantees that the identifier was checked for being well-formed, and is encoded in utf8. 4. To distinguish between two identifier types in the grammar, two Bison types were added: <ident_cli> and <ident_sys> 5. All non-reserved keywords were marked as being of the type <ident_cli>. All reserved keywords are still of the type NONE. 6. All curly brackets in rules collecting non-reserved keywords into non-terminal symbols were removed, e.g.: Was: keyword_sp_data_type: BIT_SYM {} | BOOLEAN_SYM {} Now: keyword_sp_data_type: BIT_SYM | BOOLEAN_SYM This is important NOT to have brackets here!!!! This is needed to make sure that the underlying Lex_ident_cli_ststructure correctly passes up to the calling rule. 6. The code to scan identifiers and keywords was moved from lex_one_token() into new Lex_input_stream methods: scan_ident_sysvar() scan_ident_start() scan_ident_middle() scan_ident_delimited() This was done to: - get rid of enormous amount of references to &yylval->lex_str - and remove a lot of references like lip->xxx 7. The allocating functionality which puts identifiers on the THD memory root now resides in methods of Lex_ident_sys_st, and in THD::to_ident_sys_alloc(). get_quoted_token() was removed. 8. Cleanup: check_simple_select() was moved as a method to LEX. 9. Cleanup: Some more functionality was moved from *.yy to new methods were added to LEX: make_item_colon_ident_ident() make_item_func_call_generic() create_item_qualified_asterisk()
2025-07-27 18:02:13 +03:00 · 2018-04-27 22:11:18 +04:00
parent 6c5e60f1b1
commit 96a301bbbe
13 changed files with 2862 additions and 2395 deletions
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@ -3658,6 +3658,26 @@ public:
    lex_str->length= length;
    return lex_str;
  }
+  // Remove double quotes:  aaa""bbb -> aaa"bbb
+  bool quote_unescape(LEX_CSTRING *dst, const LEX_CSTRING *src, char quote)
+  {
+    const char *tmp= src->str;
+    const char *tmpend= src->str + src->length;
+    char *to;
+    if (!(dst->str= to= (char *) alloc(src->length + 1)))
+    {
+      dst->length= 0; // Safety
+      return true;
+    }
+    for ( ; tmp < tmpend; )
+    {
+      if ((*to++= *tmp++) == quote)
+        tmp++;                                  // Skip double quotes
+    }
+    *to= 0;                                     // End null for safety
+    dst->length= to - dst->str;
+    return false;
+  }

  LEX_CSTRING *make_clex_string(const char* str, size_t length)
  {
@ -3701,7 +3721,6 @@ public:
  bool convert_with_error(CHARSET_INFO *dstcs, LEX_STRING *dst,
                          CHARSET_INFO *srccs,
                          const char *src, size_t src_length);
-
  /*
    If either "dstcs" or "srccs" is &my_charset_bin,
    then performs native copying using cs->cset->copy_fix().
@ -3720,6 +3739,17 @@ public:

  bool convert_string(String *s, CHARSET_INFO *from_cs, CHARSET_INFO *to_cs);

+  /*
+    Check if the string is wellformed, raise an error if not wellformed.
+    @param str    - The string to check.
+    @param length - the string length.
+  */
+  bool check_string_for_wellformedness(const char *str,
+                                       size_t length,
+                                       CHARSET_INFO *cs) const;
+
+  bool to_ident_sys_alloc(Lex_ident_sys_st *to, const Lex_ident_cli_st *from);
+
  /*
    Create a string literal with optional client->connection conversion.
    @param str        - the string in the client character set
@ -3827,7 +3857,7 @@ public:
  void set_stmt_da(Diagnostics_area *da)
  { m_stmt_da= da; }

-  inline CHARSET_INFO *charset() { return variables.character_set_client; }
+  inline CHARSET_INFO *charset() const { return variables.character_set_client; }
  void update_charset();
  void update_charset(CHARSET_INFO *character_set_client,
                      CHARSET_INFO *collation_connection)