mirror of https://github.com/MariaDB/server.git synced 2025-07-29 05:21:33 +03:00

MDEV-25292 Atomic CREATE OR REPLACE TABLE

Atomic CREATE OR REPLACE keeps the old table intact if the command
fails or the server crashes. This is done by creating a table with
a temporary name and filling it with data (for CREATE OR REPLACE ..
SELECT), then renaming the original table to another temporary
(backup) name and renaming the replacement table to the original
name. The backup table is kept until the last point of possible
failure; if a failure happens, the replacement table is discarded
and the backup is restored. When the command is complete and logged,
the backup table is deleted.

Atomic replace algorithm

  Two DDL chains are used for CREATE OR REPLACE:
  ddl_log_state_create (C) and ddl_log_state_rm (D).

  1. (C) Log CREATE_TABLE_ACTION of TMP table (drops TMP table);
  2. Create new table as TMP;
  3. Do everything with TMP (like insert data);

  finalize_atomic_replace():
  4. Link chains: (D) is executed only if (C) is closed;
  5. (D) Log DROP_ACTION of BACKUP;
  6. (C) Log RENAME_TABLE_ACTION from ORIG to BACKUP (replays BACKUP -> ORIG);
  7. Rename ORIG to BACKUP;
  8. (C) Log CREATE_TABLE_ACTION of ORIG (drops ORIG);
  9. Rename TMP to ORIG;

  finalize_ddl() in case of success:
  10. Close (C);
  11. Replay (D): BACKUP is dropped.

  finalize_ddl() in case of error:
  10. Close (D);
  11. Replay (C):
    1) ORIG is dropped (only after finalize_atomic_replace());
    2) BACKUP renamed to ORIG (only after finalize_atomic_replace());
    3) drop TMP.

  If a crash happens, (C) or (D) is replayed in reverse order: (C) is
  replayed if the crash happens before it is closed, otherwise (D) is
  replayed.
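
  The numbered steps can be sketched as a toy model. This is only an
  illustration, not server code: the Tables map, the Chain struct,
  rename_table() and create_or_replace() are hypothetical names, and
  a "table" is just a name mapped to a string. The sketch assumes a
  chain is replayed in reverse order of logging, as stated above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: table name -> contents. Not the server's data structures.
using Tables= std::map<std::string, std::string>;
using Action= std::function<void(Tables &)>;

// A DDL chain: logged actions are replayed in reverse order of logging.
struct Chain
{
  std::vector<Action> actions;
  void log(Action a) { actions.push_back(std::move(a)); }
  void close() { actions.clear(); }
  void replay(Tables &db)
  {
    for (auto it= actions.rbegin(); it != actions.rend(); ++it)
      (*it)(db);
    actions.clear();
  }
};

// Renames a table if it exists; does nothing otherwise.
static void rename_table(Tables &db, const std::string &from,
                         const std::string &to)
{
  auto it= db.find(from);
  if (it == db.end())
    return;
  std::string contents= it->second;
  db.erase(it);
  db[to]= contents;
}

enum Outcome { SUCCESS, FAIL_BEFORE_SWAP, FAIL_AFTER_SWAP };

// Runs the numbered steps on a copy of `db` and returns the final state.
Tables create_or_replace(Tables db, Outcome outcome)
{
  Chain C, D;
  C.log([](Tables &t) { t.erase("TMP"); });                      // step 1
  db["TMP"]= "new";                                              // steps 2-3
  if (outcome != FAIL_BEFORE_SWAP)
  {
    // finalize_atomic_replace()
    D.log([](Tables &t) { t.erase("BACKUP"); });                 // step 5
    C.log([](Tables &t) { rename_table(t, "BACKUP", "ORIG"); }); // step 6
    rename_table(db, "ORIG", "BACKUP");                          // step 7
    C.log([](Tables &t) { t.erase("ORIG"); });                   // step 8
    rename_table(db, "TMP", "ORIG");                             // step 9
  }
  if (outcome == SUCCESS)
  {
    C.close();     // step 10
    D.replay(db);  // step 11: BACKUP is dropped
  }
  else
  {
    D.close();     // step 10
    C.replay(db);  // step 11: undo in reverse order
  }
  return db;
}
```

  Starting from a single table ORIG, the model ends with only ORIG in
  every case: with the new contents on success and with the old
  contents on either kind of failure.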

Temporary table for CREATE OR REPLACE

  Before dropping the "old" table, CREATE OR REPLACE creates a "tmp"
  table. ddl_log_state_create holds the drop of the "tmp" table. When
  everything is OK (data is inserted, "tmp" is ready),
  ddl_log_state_rm is written to replace "old" with "tmp". Until
  ddl_log_state_create is closed, ddl_log_state_rm is not executed.

  After the binlogging is done, ddl_log_state_create is closed. At
  that point ddl_log_state_rm is executed and "old" is replaced with
  "tmp". That is, the final rename is done by the DDL log.

  Given this important role of the DDL log in the CREATE OR REPLACE
  operation, replay of ddl_log_state_rm must fail at the first error
  and print the error message if possible. For example, a foreign key
  error is discovered at this phase: InnoDB refuses to drop the "old"
  table and returns the corresponding foreign key error code.

Additional notes

  - CREATE TABLE without REPLACE is not affected by this commit.

  - Engines having HTON_EXPENSIVE_RENAME flag set are not affected by
    this commit.

  - CREATE TABLE .. SELECT XID usage is fixed, so there is no longer a
    need to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments
    in do_postlock()). The XID is now correctly updated so that it
    disables DDL_LOG_DROP_TABLE_ACTION. Note that the binary log is
    flushed at the final stage when the table is ready, so if we have
    the XID in the binary log we don't need to drop the table.

  - Three variations of CREATE OR REPLACE are handled:

    1. CREATE OR REPLACE TABLE t1 (..);
    2. CREATE OR REPLACE TABLE t1 LIKE t2;
    3. CREATE OR REPLACE TABLE t1 SELECT ..;

  - Test case uses 6 combinations for engines (aria, aria_notrans,
    myisam, ib, lock_tables, expensive_rename) and 2 combinations for
    binlog types (row, stmt). The combinations help to check for
    differences between the results. Failures are tested for all three
    variations above.

  - expensive_rename tests CREATE OR REPLACE without atomic
    replace. The effect should be the same as with the old behaviour
    before this commit.

  - The trigger mechanism is unaffected by this change. This is tested
    in create_replace.test.

  - LOCK TABLES is affected. Lock restoration must be done after the
    "rm" chain is replayed.

  - Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
    checkpoint was not executed before for normal CREATE TABLE but is
    executed now.

  - CREATE TABLE will now also roll back if writing to the binary log
    fails. See rpl_gtid_strict.test.

Rename and drop via DDL log

  We replay ddl_log_state_rm to drop the old table and rename the
  temporary table. In that case we must throw the correct error
  message if ddl_log_revert() fails (for example, on an FK error).

  If the table is deleted earlier, not via the DDL log, and a crash
  happens, the create chain is not closed. The linked drop chain is
  not executed and the new table is not installed, but the old table
  is already deleted.

ddl_log.cc changes

  Now we can place an action before DDL_LOG_DROP_INIT_ACTION and it
  will be replayed after DDL_LOG_DROP_TABLE_ACTION.

  The report_error parameter of ddl_log_revert() makes it fail at the
  first error and print the error message if possible.
  ddl_log_execute_action() can now print an error message.

  Since we can now handle errors from ddl_log_execute_action() (in
  case of non-recovery execution), unconditionally setting
  "error= TRUE" is wrong (it was wrong anyway because it was
  overwritten at the end of the function).
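
  The fail-at-first-error behaviour can be illustrated with a small
  stand-alone sketch. Action, ReplayResult and replay() are
  hypothetical names, not the ddl_log.cc interface. The sketch also
  assumes that a replay without report_error records the failure and
  continues with the remaining actions; the commit message does not
  state this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical logged action with a pre-set result: 0 means success.
struct Action
{
  std::string name;
  int error;
};

struct ReplayResult
{
  int executed;         // number of actions attempted
  bool failed;          // true if any action returned an error
  std::string message;  // set only when report_error is true
};

ReplayResult replay(const std::vector<Action> &actions, bool report_error)
{
  ReplayResult res{0, false, ""};
  for (const Action &a : actions)
  {
    res.executed++;
    if (!a.error)
      continue;
    res.failed= true;
    if (report_error)
    {
      // Fail at the first error and keep a message for the user.
      res.message= a.name + ": error " + std::to_string(a.error);
      return res;
    }
    // Assumed recovery-style behaviour: note the failure and continue.
  }
  return res;
}
```

  With report_error set, a failing second action stops the replay
  after two attempts and leaves a message; without it, all actions are
  attempted and only the failed flag is set.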

On XID usage

  Like with all other atomic DDL operations, the XID is used to avoid
  inconsistency between master and slave in the case of a crash after
  the binary log is written and before ddl_log_state_create is closed.
  On recovery, XIDs are taken from the binary log and the
  corresponding DDL log events are disabled. That is done by
  ddl_log_close_binlogged_events().
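
  The recovery rule can be sketched as follows. This is a simplified
  model, not the real ddl_log_close_binlogged_events(): DdlChain and
  close_binlogged() are hypothetical, and the set of XIDs stands in
  for whatever recovery reads from the binary log.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy stand-in for a DDL log chain; xid 0 means "no XID recorded".
struct DdlChain
{
  uint64_t xid;
  bool active;
};

// Deactivates every active chain whose XID is present in the binary log
// and returns how many chains were disabled.
int close_binlogged(std::vector<DdlChain> &chains,
                    const std::set<uint64_t> &binlog_xids)
{
  int closed= 0;
  for (DdlChain &c : chains)
  {
    if (c.active && c.xid != 0 && binlog_xids.count(c.xid))
    {
      c.active= false;  // statement was binlogged: do not revert it
      closed++;
    }
  }
  return closed;
}
```

  A chain whose XID reached the binary log is not replayed, so the
  table created by the logged statement stays in place.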

On linking two chains together

  Chains are executed in ascending order of the entry_pos of their
  execute entries. But the order of entry_pos assignment is undefined:
  the first chain may get a bigger number and the second chain a
  smaller one. In that case the execution order is reversed: the
  second chain is executed first.

  To avoid that we link one chain to another. While the base chain
  (ddl_log_state_create) is active the secondary chain
  (ddl_log_state_rm) is not executed. That is, only one of two linked
  chains can be executed.
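
  A minimal sketch of the ordering problem and of the effect of
  linking. ExecuteEntry, master_pos and chains_to_replay() are
  hypothetical and only model the rule described here; they are not
  the ddl_log_link_chains() implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct ExecuteEntry
{
  std::string chain;  // chain name, for illustration only
  int entry_pos;      // position of the execute entry in the log
  int master_pos;     // entry_pos of the chain it is linked to, or -1
  bool active;        // false once the chain is closed
};

// Returns the chains that would be replayed, in replay order.
std::vector<std::string> chains_to_replay(std::vector<ExecuteEntry> entries)
{
  std::sort(entries.begin(), entries.end(),
            [](const ExecuteEntry &a, const ExecuteEntry &b)
            { return a.entry_pos < b.entry_pos; });
  std::vector<std::string> order;
  for (const ExecuteEntry &e : entries)
  {
    if (!e.active)
      continue;
    // A linked chain is skipped while its base chain is still active.
    bool blocked= false;
    for (const ExecuteEntry &m : entries)
      if (m.entry_pos == e.master_pos && m.active)
        blocked= true;
    if (!blocked)
      order.push_back(e.chain);
  }
  return order;
}
```

  If "create" gets entry_pos 7 and "rm" gets entry_pos 3, unlinked
  chains replay "rm" first. Once "rm" is linked to "create", only
  "create" is replayed while it is active, and only "rm" after
  "create" is closed.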

  The interface ddl_log_link_chains() was added in "MDEV-22166
  ddl_log_write_execute_entry() extension".

More on CREATE OR REPLACE .. SELECT

  We use create_and_open_tmp_table(), like in ALTER TABLE, to create
  a temporary TABLE object (tmp_table is
  (NON_)TRANSACTIONAL_TMP_TABLE).

  After we have created such a TABLE object we use
  create_info->tmp_table() instead of table->s->tmp_table when we need
  to check for a parser-requested temporary table.

  External locking is required for the temporary table created by
  create_and_open_tmp_table(). For example, it disables logging for
  Aria transactional tables; without it (when no mysql_lock_tables()
  is done) they cannot work correctly.

  To take the external lock, the patch requires the Aria table to
  work in non-transactional mode. That is usually done by
  ha_enable_transaction(false). But we cannot disable transactions
  completely because: 1. binlog rollback removes pending row events
  (binlog_remove_pending_rows_event()), and the row events are added
  during the CREATE .. SELECT data insertion phase; 2. the replication
  slave depends heavily on transactions and cannot work without them.

  So we put the temporary Aria table into non-transactional mode with
  the "thd->transaction->on hack". See the comment for the on_save
  variable.

  Note that Aria tables have an internal_table mode. But we cannot use
  it because of:

  if (!internal_table)
  {
    mysql_mutex_lock(&THR_LOCK_myisam);
    old_info= test_if_reopen(name_buff);
  }

  For internal_table, test_if_reopen() is not called and we get a new
  MARIA_SHARE for each file handler. In that case duplicate errors are
  missed because insert and lookup in CREATE .. SELECT are done via
  two different handlers (see create_lookup_handler()).

  For a temporary table, we must call ha_reset() before dropping the
  TABLE_SHARE with drop_temporary_table(). ha_reset() releases the
  storage share. Without that the share is kept and the second CREATE
  OR REPLACE .. SELECT fails with:

    HA_ERR_TABLE_EXIST (156): MyISAM table '#sql-create-b5377-4-t2' is
    in use (most likely by a MERGE table). Try FLUSH TABLES.

  HA_EXTRA_PREPARE_FOR_DROP also removes MYISAM_SHARE, but that is
  not needed as ha_reset() does the job.

  ha_reset() is usually done by
  mark_tmp_table_as_free_for_reuse(). But we don't need that mechanism
  for our temporary table.

Atomic_info in HA_CREATE_INFO

  Many functions in CREATE TABLE pass the same parameters. These
  parameters are part of the table creation info and should be in
  HA_CREATE_INFO (or a similar structure). Passing parameters via a
  single structure makes adding new data and refactoring much easier.

InnoDB changes (revised by Marko Mäkelä)

  row_rename_table_for_mysql(): Specify the treatment of FOREIGN KEY
  constraints in a 4-valued enum parameter. In cases where FOREIGN KEY
  constraints cannot exist (partitioned tables, or internal tables of
  FULLTEXT INDEX), we can use the mode RENAME_IGNORE_FK.
  The mode RENAME_REBUILD is for any DDL operation that rebuilds the
  table inside InnoDB, such as TRUNCATE and native ALTER TABLE
  (or OPTIMIZE TABLE). The mode RENAME_ALTER_COPY is used solely
  during non-native ALTER TABLE in ha_innobase::rename_table().
  Normal ha_innobase::rename_table() will use the mode RENAME_FK.

  CREATE OR REPLACE will rename the old table (if one exists) along
  with its FOREIGN KEY constraints into a temporary name. The replacement
  table will be initially created with another temporary name.
  Unlike in ALTER TABLE, all FOREIGN KEY constraints must be renamed
  and not inherited as part of these operations, using the mode RENAME_FK.

  dict_get_referenced_table(): Let the callers convert names when needed.

  create_table_info_t::create_foreign_keys(): CREATE OR REPLACE creates
  the replacement table with a temporary table name, so for
  self-references foreign->referenced_table will be a table with
  temporary name and charset conversion must be skipped for it.

Reviewed by:

  Michael Widenius <monty@mariadb.org>
Aleksey Midenkov
2022-08-31 11:55:04 +03:00
parent 86da0f4ee8
commit 93c8252f02
73 changed files with 8640 additions and 1107 deletions


@ -3877,7 +3877,8 @@ select_insert::select_insert(THD *thd_arg, TABLE_LIST *table_list_par,
sel_result(result),
table_list(table_list_par), table(table_par), fields(fields_par),
autoinc_value_of_last_inserted_row(0),
insert_into_view(table_list_par && table_list_par->view != 0)
insert_into_view(table_list_par && table_list_par->view != 0),
binary_logged(false), atomic_replace(false), create_info(NULL)
{
bzero((char*) &info,sizeof(info));
info.handle_duplicates= duplic;
@ -3886,6 +3887,37 @@ select_insert::select_insert(THD *thd_arg, TABLE_LIST *table_list_par,
info.update_values= update_values;
info.view= (table_list_par->view ? table_list_par : 0);
info.table_list= table_list_par;
tmp_table= table ? table->s->tmp_table != NO_TMP_TABLE : false;
}
select_create::select_create(THD *thd, TABLE_LIST *table_arg,
Table_specification_st *create_info_par,
Alter_info *alter_info_arg,
List<Item> &select_fields,
enum_duplicates duplic, bool ignore,
TABLE_LIST *select_tables_arg):
select_insert(thd, table_arg, NULL, &select_fields, 0, 0, duplic,
ignore, NULL),
orig_table(table_arg),
select_tables(select_tables_arg),
alter_info(alter_info_arg),
m_plock(NULL), exit_done(0),
saved_tmp_table_share(0)
{
DBUG_ASSERT(create_info_par->default_table_charset);
bzero(&ddl_log_state_create, sizeof(ddl_log_state_create));
bzero(&ddl_log_state_rm, sizeof(ddl_log_state_rm));
create_info= create_info_par;
if (!thd->is_current_stmt_binlog_format_row() ||
!ha_check_storage_engine_flag(create_info->db_type,
HTON_NO_BINLOG_ROW_OPT))
atomic_replace= create_info->is_atomic_replace();
else
DBUG_ASSERT(!atomic_replace);
create_info->ddl_log_state_create= &ddl_log_state_create;
create_info->ddl_log_state_rm= &ddl_log_state_rm;
tmp_table= create_info->tmp_table();
}
@ -4226,10 +4258,10 @@ void select_insert::store_values(List<Item> &values)
bool select_insert::prepare_eof()
{
int error;
#ifndef DBUG_OFF
bool const trans_table= table->file->has_transactions_and_rollback();
#endif
bool changed;
bool binary_logged= 0;
killed_state killed_status= thd->killed;
DBUG_ENTER("select_insert::prepare_eof");
DBUG_PRINT("enter", ("trans_table: %d, table_type: '%s'",
@ -4247,12 +4279,31 @@ bool select_insert::prepare_eof()
error= thd->get_stmt_da()->sql_errno();
if (info.ignore || info.handle_duplicates != DUP_ERROR)
if (table->file->ha_table_flags() & HA_DUPLICATE_POS)
table->file->ha_rnd_end();
if (table->file->ha_table_flags() & HA_DUPLICATE_POS)
table->file->ha_rnd_end();
table->file->extra(HA_EXTRA_END_ALTER_COPY);
table->file->extra(HA_EXTRA_NO_IGNORE_DUP_KEY);
table->file->extra(HA_EXTRA_WRITE_CANNOT_REPLACE);
if (atomic_replace)
{
DBUG_ASSERT(table->s->tmp_table);
/*
Note: InnoDB does autocommit on external unlock.
We cannot do commit twice and we must commit after binlog
(flush row events is done at commit), so we cannot do it here.
Test: rpl.create_or_replace_row
*/
ulonglong save_options_bits= thd->variables.option_bits;
thd->variables.option_bits|= OPTION_NOT_AUTOCOMMIT;
int lock_error= table->file->ha_external_lock(thd, F_UNLCK);
thd->variables.option_bits= save_options_bits;
if (lock_error)
DBUG_RETURN(true); /* purecov: inspected */
}
if (likely((changed= (info.copied || info.deleted || info.updated))))
{
/*
@ -4270,19 +4321,54 @@ bool select_insert::prepare_eof()
DBUG_ASSERT(trans_table || !changed ||
thd->transaction->stmt.modified_non_trans_table);
if (unlikely(error))
{
if ((thd->transaction->stmt.modified_non_trans_table ||
thd->log_current_statement()) && !atomic_replace)
{
if (binlog_query())
table->file->print_error(error,MYF(0));
}
else
table->file->ha_release_auto_increment();
DBUG_RETURN(true);
}
DBUG_RETURN(false);
}
bool select_insert::binlog_query()
{
/* For atomic_replace table was already closed in send_eof(). */
DBUG_ASSERT(table || atomic_replace);
const bool trans_table= table ? table->file->has_transactions_and_rollback() :
false;
killed_state killed_status= thd->killed;
DBUG_ENTER("select_insert::binlog_query");
/*
Write to binlog before commiting transaction. No statement will
be written by the binlog_query() below in RBR mode. All the
events are in the transaction cache and will be written when
ha_autocommit_or_rollback() is issued below.
*/
if ((WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()) &&
(likely(!error) || thd->transaction->stmt.modified_non_trans_table ||
thd->log_current_statement()))
if ((WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open()))
{
debug_crash_here("ddl_log_create_before_binlog");
if (create_info && !create_info->tmp_table())
{
thd->binlog_xid= thd->query_id;
/* Remember xid's for the case of row based logging */
ddl_log_update_xid(create_info->ddl_log_state_create, thd->binlog_xid);
if (create_info->ddl_log_state_rm->is_active() && !atomic_replace)
ddl_log_update_xid(create_info->ddl_log_state_rm, thd->binlog_xid);
}
int errcode= 0;
int res;
if (likely(!error))
if (thd->is_error())
thd->clear_error();
else
errcode= query_error_code(thd, killed_status == NOT_KILLED);
@ -4291,20 +4377,24 @@ bool select_insert::prepare_eof()
res= thd->binlog_query(THD::ROW_QUERY_TYPE,
thd->query(), thd->query_length(),
trans_table, FALSE, FALSE, errcode);
/*
NOTE: binlog_xid must be cleared after commit because pending row events
are written at commit phase.
*/
if (res > 0)
{
table->file->ha_release_auto_increment();
thd->binlog_xid= 0;
if (table)
table->file->ha_release_auto_increment();
DBUG_RETURN(true);
}
binary_logged= res == 0 || !table->s->tmp_table;
binary_logged= res == 0 || !tmp_table;
}
table->s->table_creation_was_logged|= binary_logged;
table->file->ha_release_auto_increment();
if (unlikely(error))
if (table)
{
table->file->print_error(error,MYF(0));
DBUG_RETURN(true);
/* NOTE: used in binlog_drop_table(), not needed for atomic_replace */
table->s->table_creation_was_logged|= binary_logged;
table->file->ha_release_auto_increment();
}
DBUG_RETURN(false);
@ -4351,7 +4441,8 @@ bool select_insert::send_eof()
{
bool res;
DBUG_ENTER("select_insert::send_eof");
res= (prepare_eof() || (!suppress_my_ok && send_ok_packet()));
res= (prepare_eof() || binlog_query() ||
(!suppress_my_ok && send_ok_packet()));
DBUG_RETURN(res);
}
@ -4416,7 +4507,11 @@ void select_insert::abort_result_set()
res= thd->binlog_query(THD::ROW_QUERY_TYPE, thd->query(),
thd->query_length(),
transactional_table, FALSE, FALSE, errcode);
binary_logged= res == 0 || !table->s->tmp_table;
/* TODO: Update binary_logged in do_postlock() for RBR? */
const bool tmp_table= create_info ? create_info->tmp_table() :
(bool) table->s->tmp_table;
binary_logged= res == 0 || !tmp_table;
}
if (changed)
query_cache_invalidate3(thd, table, 1);
@ -4494,6 +4589,8 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
List_iterator_fast<Item> it(*items);
Item *item;
bool save_table_creation_was_logged;
int create_table_mode= C_ORDINARY_CREATE;
LEX_CUSTRING frm= {0, 0};
DBUG_ENTER("select_create::create_table_from_items");
tmp_table.s= &share;
@ -4564,6 +4661,13 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
create_info->mdl_ticket= table_list->table->mdl_ticket;
}
if (atomic_replace)
{
if (create_info->make_tmp_table_list(thd, &table_list,
&create_table_mode))
DBUG_RETURN(NULL);
}
/*
Create and lock table.
@ -4581,11 +4685,14 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
open_table().
*/
if (!mysql_create_table_no_lock(thd, &ddl_log_state_create, &ddl_log_state_rm,
if (!mysql_create_table_no_lock(thd,
&orig_table->db,
&orig_table->table_name,
&table_list->db,
&table_list->table_name,
create_info, alter_info, NULL,
C_ORDINARY_CREATE, table_list))
create_table_mode, table_list,
atomic_replace ? &frm : NULL))
{
DEBUG_SYNC(thd,"create_table_select_before_open");
@ -4595,7 +4702,58 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
*/
table_list->table= 0;
if (!create_info->tmp_table())
if (atomic_replace)
{
char tmp_path[FN_REFLEN + 1];
build_table_filename(tmp_path, sizeof(tmp_path) - 1, table_list->db.str,
table_list->table_name.str, "", FN_IS_TMP);
table_list->table=
thd->create_and_open_tmp_table(&frm, tmp_path, orig_table->db.str,
orig_table->table_name.str, false);
/*
NOTE: if create_and_open_tmp_table() fails the table is dropped by
ddl_log_state_create
*/
if (table_list->table)
{
table_list->table->s->tmp_table= TMP_TABLE_ATOMIC_REPLACE;
/*
NOTE: Aria tables require table locking to work in transactional
mode. Since we don't lock our temporary table we get problems with
unproperly initialized transactional mode: seg-fault while accessing
uninitialized trn member (reproduced by
atomic.create_replace,aria,stmt).
This hack disables logging for Aria table (that is not needed anyway
for a temporary table).
*/
TABLE *table= table_list->table;
int error;
/* Disable logging of inserted rows */
mysql_trans_prepare_alter_copy_data(thd);
if ((DBUG_IF("atomic_replace_external_lock_fail") &&
(error= HA_ERR_LOCK_TABLE_FULL)) ||
(error= table->file->ha_external_lock(thd, F_WRLCK)))
{
table->file->print_error(error, MYF(0));
/*
Enable transaction logging. We cannot call ha_enable_transaction()
as this would write the transaction to the binary log
*/
thd->transaction->on= true;
table->file->ha_reset();
thd->drop_temporary_table(table, NULL, false);
table_list->table= 0;
goto err;
}
table_list->table->s->can_do_row_logging= 1;
}
}
else if (!create_info->tmp_table())
{
Open_table_context ot_ctx(thd, MYSQL_OPEN_REOPEN);
TABLE_LIST::enum_open_strategy save_open_strategy;
@ -4636,13 +4794,18 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
}
else
table_list->table= 0; // Create failed
err:
DBUG_ASSERT(!table_list->table || frm.str || !atomic_replace);
my_free(const_cast<uchar *>(frm.str));
if (unlikely(!(table= table_list->table)))
{
if (likely(!thd->is_error())) // CREATE ... IF NOT EXISTS
my_ok(thd); // succeed, but did nothing
ddl_log_complete(&ddl_log_state_rm);
ddl_log_complete(&ddl_log_state_create);
const bool error= thd->is_error();
/* CREATE ... IF NOT EXISTS succeed, but did nothing */
if (likely(!error))
my_ok(thd);
create_info->finalize_ddl(thd, error);
DBUG_RETURN(NULL);
}
@ -4661,9 +4824,13 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
mysql_lock_tables() below should never fail with request to reopen table
since it won't wait for the table lock (we have exclusive metadata lock on
the table) and thus can't get aborted.
In case of atomic_replace we have already called ha_external_lock() above
on the newly created temporary table.
*/
if (unlikely(!((*lock)= mysql_lock_tables(thd, &table, 1, 0)) ||
postlock(thd, &table)))
if ((!atomic_replace &&
unlikely(!((*lock)= mysql_lock_tables(thd, &table, 1, 0)))) ||
postlock(thd, &table))
{
/* purecov: begin tested */
/*
@ -4680,13 +4847,24 @@ TABLE *select_create::create_table_from_items(THD *thd, List<Item> *items,
*lock= 0;
}
drop_open_table(thd, table, &table_list->db, &table_list->table_name);
ddl_log_complete(&ddl_log_state_rm);
ddl_log_complete(&ddl_log_state_create);
if (atomic_replace)
create_info->finalize_ddl(thd, 1);
else
{
debug_crash_here("ddl_log_create_log_complete");
ddl_log_complete(&ddl_log_state_create);
debug_crash_here("ddl_log_create_log_complete2");
}
thd->transaction->on= true;
DBUG_RETURN(NULL);
/* purecov: end */
}
DBUG_ASSERT(
create_info->tmp_table() ||
thd->mdl_context.is_lock_owner(MDL_key::TABLE, orig_table->db.str,
orig_table->table_name.str, MDL_SHARED));
table->s->table_creation_was_logged= save_table_creation_was_logged;
if (!table->s->tmp_table)
if (!create_info->tmp_table())
table->file->prepare_for_row_logging();
/*
@ -4734,11 +4912,10 @@ int select_create::postlock(THD *thd, TABLE **tables)
if (unlikely(error))
return error;
TABLE const *const table = *tables;
if (thd->is_current_stmt_binlog_format_row() &&
!table->s->tmp_table)
return binlog_show_create_table(thd, *tables, create_info);
return 0;
if (thd->is_current_stmt_binlog_format_row() && !create_info->tmp_table())
error= binlog_show_create_table(thd, *tables, create_info);
return error;
}
@ -4765,7 +4942,11 @@ select_create::prepare(List<Item> &_values, SELECT_LEX_UNIT *u)
if (!(table= create_table_from_items(thd, &values, &extra_lock)))
{
if (create_info->or_replace())
/*
TODO: Use create_info->table_was_deleted
(now binlog.binlog_stm_binlog fails).
*/
if (create_info->or_replace() && !atomic_replace)
{
/* Original table was deleted. We have to log it */
log_drop_table(thd, &table_list->db, &table_list->table_name,
@ -4779,6 +4960,8 @@ select_create::prepare(List<Item> &_values, SELECT_LEX_UNIT *u)
DBUG_RETURN(-1);
}
DBUG_ASSERT(table == table_list->table);
if (create_info->tmp_table())
{
/*
@ -4788,17 +4971,26 @@ select_create::prepare(List<Item> &_values, SELECT_LEX_UNIT *u)
list to keep them inaccessible from inner statements.
e.g. CREATE TEMPORARY TABLE `t1` AS SELECT * FROM `t1`;
*/
saved_tmp_table_share= thd->save_tmp_table_share(table_list->table);
saved_tmp_table_share= thd->save_tmp_table_share(table);
}
if (extra_lock)
{
DBUG_ASSERT(m_plock == NULL);
if (create_info->tmp_table())
if (table->s->tmp_table)
{
/* Table is a temporary table, don't write table map to binary log */
m_plock= &m_lock;
}
else
{
/*
Table is a normal table. Inform binlog_write_table_maps() that
it should write the table map for the current table.
*/
m_plock= &thd->extra_lock;
}
*m_plock= extra_lock;
}
@ -4890,6 +5082,14 @@ static int binlog_show_create_table(THD *thd, TABLE *table,
create_info, WITH_DB_NAME);
DBUG_ASSERT(result == 0); /* show_create_table() always return 0 */
/*
NOTE: why it does show_create_table() even if !mysql_bin_log.is_open()?
Because Galera needs it even if there is no binlog.
(I assume Galera will hijack the binlog information and use it itself
if there is no binlog). That is the the only thing that makes sence
looking at the if statement... Monty
*/
if (WSREP_EMULATE_BINLOG(thd) || mysql_bin_log.is_open())
{
int errcode= query_error_code(thd, thd->killed == NOT_KILLED);
@ -5011,30 +5211,12 @@ bool select_create::send_eof()
is in select_insert::prepare_eof(). For that reason, we
mark the flag at this point.
*/
if (table->s->tmp_table)
if (create_info->tmp_table())
thd->transaction->stmt.mark_created_temp_table();
if (thd->slave_thread)
thd->variables.binlog_annotate_row_events= 0;
debug_crash_here("ddl_log_create_before_binlog");
/*
In case of crash, we have to add DROP TABLE to the binary log as
the CREATE TABLE will already be logged if we are not using row based
replication.
*/
if (!thd->is_current_stmt_binlog_format_row())
{
if (ddl_log_state_create.is_active()) // Not temporary table
ddl_log_update_phase(&ddl_log_state_create, DDL_CREATE_TABLE_PHASE_LOG);
/*
We can ignore if we replaced an old table as ddl_log_state_create will
now handle the logging of the drop if needed.
*/
ddl_log_complete(&ddl_log_state_rm);
}
if (prepare_eof())
{
abort_result_set();
@ -5042,7 +5224,7 @@ bool select_create::send_eof()
}
debug_crash_here("ddl_log_create_after_prepare_eof");
if (table->s->tmp_table)
if (create_info->tmp_table())
{
/*
Now is good time to add the new table to THD temporary tables list.
@ -5068,11 +5250,11 @@ bool select_create::send_eof()
tables. This can fail, but we should unlock the table
nevertheless.
*/
if (!table->s->tmp_table)
if (!create_info->tmp_table())
{
#ifdef WITH_WSREP
if (WSREP(thd) &&
table->file->ht->db_type == DB_TYPE_INNODB)
create_info->db_type->db_type == DB_TYPE_INNODB)
{
if (thd->wsrep_trx_id() == WSREP_UNDEFINED_TRX_ID)
{
@ -5107,15 +5289,46 @@ bool select_create::send_eof()
thd->get_stmt_da()->set_overwrite_status(true);
}
#endif /* WITH_WSREP */
thd->binlog_xid= thd->query_id;
/* Remember xid's for the case of row based logging */
ddl_log_update_xid(&ddl_log_state_create, thd->binlog_xid);
ddl_log_update_xid(&ddl_log_state_rm, thd->binlog_xid);
if (atomic_replace)
{
table_list= orig_table;
create_info->table= orig_table->table;
thd->transaction->on= true;
table->file->ha_reset();
/*
Remove the temporary table structures from memory but keep the table
files.
*/
thd->drop_temporary_table(table, NULL, false);
table= NULL;
if (create_info->finalize_atomic_replace(thd, orig_table))
{
abort_result_set();
DBUG_RETURN(true);
}
}
if (binlog_query())
{
abort_result_set();
DBUG_RETURN(true);
}
debug_crash_here("ddl_log_create_after_binlog");
trans_commit_stmt(thd);
if (!(thd->variables.option_bits & OPTION_GTID_BEGIN))
trans_commit_implicit(thd);
thd->binlog_xid= 0;
/*
If are using statement based replication the table will be deleted here
in case of a crash as we can't use xid to check if the query was logged
(as the query was logged before commit!)
*/
create_info->finalize_ddl(thd, false);
#ifdef WITH_WSREP
if (WSREP(thd))
{
@ -5147,17 +5360,22 @@ bool select_create::send_eof()
ddl_log.org_database= table_list->db;
ddl_log.org_table= table_list->table_name;
ddl_log.org_table_id= create_info->tabledef_version;
/*
Since atomic replace doesn't do mysql_rm_table_no_locks() we have
to log DROP entry now. It was already prepared in create_table_impl().
*/
if (create_info->drop_entry.query.length)
{
DBUG_ASSERT(atomic_replace);
backup_log_ddl(&create_info->drop_entry);
}
backup_log_ddl(&ddl_log);
}
/*
If are using statement based replication the table will be deleted here
in case of a crash as we can't use xid to check if the query was logged
(as the query was logged before commit!)
*/
debug_crash_here("ddl_log_create_after_binlog");
ddl_log_complete(&ddl_log_state_rm);
ddl_log_complete(&ddl_log_state_create);
debug_crash_here("ddl_log_create_log_complete");
else if (binlog_query())
{
abort_result_set();
DBUG_RETURN(true);
}
/*
exit_done must only be set after last potential call to
@ -5165,10 +5383,9 @@ bool select_create::send_eof()
*/
exit_done= 1; // Avoid double calls
send_ok_packet();
if (m_plock)
{
DBUG_ASSERT(!atomic_replace);
MYSQL_LOCK *lock= *m_plock;
*m_plock= NULL;
m_plock= NULL;
@ -5187,11 +5404,20 @@ bool select_create::send_eof()
create_info->
pos_in_locked_tables,
table, lock))
{
send_ok_packet();
DBUG_RETURN(false); // ok
}
/* Fail. Continue without locking the table */
thd->clear_error();
}
mysql_unlock_tables(thd, lock);
}
else if (atomic_replace && create_info->pos_in_locked_tables &&
create_info->finalize_locked_tables(thd))
DBUG_RETURN(true);
send_ok_packet();
DBUG_RETURN(false);
}
@ -5229,6 +5455,7 @@ void select_create::abort_result_set()
thd->variables.option_bits&= ~OPTION_BIN_LOG;
select_insert::abort_result_set();
thd->transaction->stmt.modified_non_trans_table= FALSE;
thd->transaction->on= true;
thd->variables.option_bits= save_option_bits;
/* possible error of writing binary log is ignored deliberately */
@ -5236,9 +5463,7 @@ void select_create::abort_result_set()
if (table)
{
bool tmp_table= table->s->tmp_table;
bool table_creation_was_logged= (!tmp_table ||
table->s->table_creation_was_logged);
bool tmp_table= create_info->tmp_table();
if (tmp_table)
{
DBUG_ASSERT(saved_tmp_table_share);
@ -5260,7 +5485,14 @@ void select_create::abort_result_set()
m_plock= NULL;
}
drop_open_table(thd, table, &table_list->db, &table_list->table_name);
if (atomic_replace)
{
(void) table->file->ha_external_lock(thd, F_UNLCK);
(void) thd->drop_temporary_table(table, NULL, true);
}
else
drop_open_table(thd, table, &table_list->db,
&table_list->table_name);
table=0; // Safety
if (thd->log_current_statement())
{
@ -5268,23 +5500,8 @@ void select_create::abort_result_set()
{
/* Remove logging of drop, create + insert rows */
binlog_reset_cache(thd);
/* Original table was deleted. We have to log it */
if (table_creation_was_logged)
{
thd->binlog_xid= thd->query_id;
ddl_log_update_xid(&ddl_log_state_create, thd->binlog_xid);
ddl_log_update_xid(&ddl_log_state_rm, thd->binlog_xid);
debug_crash_here("ddl_log_create_before_binlog");
log_drop_table(thd, &table_list->db, &table_list->table_name,
&create_info->org_storage_engine_name,
create_info->db_type == partition_hton,
&create_info->tabledef_version,
tmp_table);
debug_crash_here("ddl_log_create_after_binlog");
thd->binlog_xid= 0;
}
}
else if (!tmp_table)
else if (!tmp_table && !atomic_replace)
{
backup_log_info ddl_log;
bzero(&ddl_log, sizeof(ddl_log));
@ -5299,8 +5516,8 @@ void select_create::abort_result_set()
}
}
ddl_log_complete(&ddl_log_state_rm);
ddl_log_complete(&ddl_log_state_create);
create_info->finalize_ddl(thd, !binary_logged);
DBUG_ASSERT(!thd->binlog_xid);
if (create_info->table_was_deleted)
{
@ -5308,6 +5525,7 @@ void select_create::abort_result_set()
(void) trans_rollback_stmt(thd);
thd->locked_tables_list.unlock_locked_table(thd, create_info->mdl_ticket);
}
else if (atomic_replace && create_info->pos_in_locked_tables)
(void) create_info->finalize_locked_tables(thd);
DBUG_VOID_RETURN;
}