BREAKS RBR
Analysis:
--------
A table created using a query of the format:
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
breaks the Row Based Replication.
The query above creates a table having a field of datatype
'bigint' with a display width of 3000 which is beyond the
maximum acceptable value of 255.
In the RBR mode, CREATE TABLE SELECT statement is
replicated as a combination of CREATE TABLE statement
equivalent to one the returned by SHOW CREATE TABLE and
row events for rows inserted. When this CREATE TABLE event
is executed on the slave, an error is reported:
Display width out of range for column 'a' (max = 255)
The following is the output of 'SHOW CREATE TABLE t1':
CREATE TABLE t1(`a` bigint(3000) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem is due to the combination of two facts:
1) The above CREATE TABLE SELECT statement uses the display
width of the result of DIV operation as the display width
of the column created without validating the width for out
of bound condition.
2) The DIV operation incorrectly returns the length of its first
argument as the display width of its result; thus allowing
creation of a table with an incorrect display width of 3000
for the field.
Fix:
----
This fix changes the DIV operation implementation to correctly
evaluate the display width of its result. We check if DIV's
results estimated width crosses maximum width for integer
value (21) and if yes set it to this maximum value.
This patch also fixes fixes maximum display width evaluation
for DIV function when its first argument is in UCS2.
The problem is the async binlog checkpointing; this could on rare
occasions occur too late, causing SHOW BINLOG EVENTS to show the
wrong events and cause .result file difference.
Replication caches the character sets used in a query, to be able to quickly
reuse them for the next query in the common case of them not having changed.
In parallel replication, this caching needs to be per-worker-thread. The
code was not modified to handle this correctly, so the caching in one worker
could cause another worker to run a query using the wrong character set,
causing replication corruption.
This will help the following replicaition scenario:
MariaDB 10.0 master (statement replication) -> MariaDB 10.0 slave (row based replication) -> MySQL or MariaDB 5.x slave
mysql-test/r/mysqld--help.result:
Updated help text
mysql-test/suite/rpl/r/create_or_replace_mix.result:
Added more tests
mysql-test/suite/rpl/r/create_or_replace_row.result:
Added more tests
mysql-test/suite/rpl/r/create_or_replace_statement.result:
Added more tests
mysql-test/suite/rpl/t/create_or_replace.inc:
Added more tests
sql/handler.h:
Added org_options so that we can detect what come from the query and what was possible added later.
sql/sql_insert.cc:
Only write CREATE OR REPLACE if was originally specified or if we delete a conflicting table as part of create
sql/sql_parse.cc:
Remember orginal create options
sql/sql_table.cc:
Only write CREATE OR REPLACE if was originally specified or if we delete a conflicting table as part of create
sql/sys_vars.cc:
Updated help text
Due to how gap locks work, two transactions could group commit together on the
master, but get lock conflicts and then deadlock due to different thread
scheduling order on slave.
For now, remove these deadlocks by running the parallel slave in READ
COMMITTED mode. And let InnoDB/XtraDB allow statement-based binlogging for the
parallel slave in READ COMMITTED.
We are also investigating a different solution long-term, which is based on
relaxing the gap locks only between the transactions running in parallel for
one slave, but not against possibly external transactions.
When a transaction fails in parallel replication, it should signal the error
to any following transactions doing wait_for_prior_commit() on it. But the
code for this was incorrect, and would not correctly remember a prior error
when sending the signal. This caused corruption when slave stopped due to an
error.
Fix by remembering the error code when we first get an error, and passing the
saved error code to wakeup_subsequent_commits().
Thanks to nanyi607rao who reported this bug on
maria-developers@lists.launchpad.net and analysed the root cause.
Now if CREATE OR REPLACE fails but we have deleted a table already, we will generate a DROP TABLE in the binary log.
This fixes this issue.
In addition, for a failing CREATE OR REPLACE TABLE ... SELECT we don't generate a log of all the inserted rows, only the DROP TABLE.
I added code for not logging DROP TEMPORARY TABLE for tables where the CREATE TABLE was not logged. This code will be activated in 10.1
by removing the code protected by DONT_LOG_DROP_OF_TEMPORARY_TABLES.
mysql-test/suite/rpl/r/create_or_replace_mix.result:
More test cases
mysql-test/suite/rpl/r/create_or_replace_row.result:
More test cases
mysql-test/suite/rpl/r/create_or_replace_statement.result:
More test cases
mysql-test/suite/rpl/t/create_or_replace.inc:
More test cases
sql/log.cc:
Added binlog_reset_cache() to clear the binary log.
sql/log.h:
Added prototype
sql/sql_insert.cc:
If CREATE OR REPLACE TABLE ... SELECT fails:
- Don't log anything if nothing changed
- If table was deleted, log a DROP TABLE.
Remember if we table creation of temporary tables was logged.
sql/sql_table.cc:
Added log_drop_table()
Remember if we table creation of temporary tables was logged.
If CREATE OR REPLACE TABLE ... SELECT fails and a table was deleted, log a DROP TABLE.
sql/sql_table.h:
Added prototype
sql/sql_truncate.cc:
Remember if we table creation of temporary tables was logged.
sql/table.h:
Added table_creation_was_logged
mysql-test/suite/rpl/t/rpl_000011-slave.opt:
Renamed test case as it's slave that needs to restarted
support-files/compiler_warnings.supp:
Fixed bad characters in suppression
mysql-test/suite/rpl/t/rpl_000011-master.opt:
Added master.opt file to ensure that other tests don't interfere with rpl_000011
plugin/server_audit/server_audit.c:
Fixed compiler error on solaris
support-files/compiler_warnings.supp:
Ignore warning from xtradb
Reason for the bug was an optimization for higher connect speed where we moved when global status was updated,
but forgot to update states when slave thread dies.
Fixed by adding thd->add_status_to_global() before deleting slave thread's thd.
mysys/my_delete.c:
Added missing newline
sql/mysqld.cc:
Use add_status_to_global()
sql/slave.cc:
Added missing add_status_to_global()
sql/sql_class.cc:
Use add_status_to_global()
sql/sql_class.h:
Simplify adding local status to global by adding add_status_to_global()
When an rpl_group_info object was returned from the free list, the
rgi->deferred_events_collecting and rgi->deferred_events was not correctly
re-inited. Additionally, the rgi->deferred_events was incorrectly freed in
free_rgi(), which causes unnecessary malloc/free (or crash when re-init is not
done).
Thanks to user nanyi607rao, who reported this bug on maria-developers@.
The problem was when a GTID event was part of a group commit, and so contained
a commit id. The code that replaces GTID with a BEGIN event for old slaves did
not correctly handle this case.
Fix the code so that the GTID with commit id can also be properly replaced
with a BEGIN query event. The extra two bytes are in the BEGIN event replaced
with a dummy, empty time zone string.
Older master has no GTID events, so such events are not available for
deciding on scheduling of event groups and so on.
With this patch, we run such events from old masters single-threaded, in the
sql driver thread.
This seems better than trying to make the parallel code handle the data from
older masters; while possible, this would require a lot of testing (as well as
possibly some extra overhead in the scheduling of events), which hardly seems
worthwhile.
Clean up and improve the parallel implementation code, mainly related to
scheduling of work to threads and handling of stop and errors.
Fix a lot of bugs in various corner cases that could lead to crashes or
corruption.
Fix that a single replication domain could easily grab all worker threads and
stall all other domains; now a configuration variable
--slave-domain-parallel-threads allows to limit the number of
workers.
Allow next event group to start as soon as previous group begins the commit
phase (as opposed to when it ends it); this allows multiple event groups on
the slave to participate in group commit, even when no other opportunities for
parallelism are available.
Various fixes:
- Fix some races in the rpl.rpl_parallel test case.
- Fix an old incorrect assertion in Log_event iocache read.
- Fix repeated malloc/free of wait_for_commit and rpl_group_info objects.
- Simplify wait_for_commit wakeup logic.
- Fix one case in queue_for_group_commit() where killing one thread would
fail to correctly signal the error to the next, causing loss of the
transaction after slave restart.
- Fix leaking of pthreads (and their allocated stack) due to missing
PTHREAD_CREATE_DETACHED attribute.
- Fix how one batch of group-committed transactions wait for the previous
batch before starting to execute themselves. The old code had a very
complex scheduling where the first transaction was handled differently,
with subtle bugs in corner cases. Now each event group is always scheduled
for a new worker (in a round-robin fashion amongst available workers).
Keep a count of how many transactions have started to commit, and wait for
that counter to reach the appropriate value.
- Fix slave stop to wait for all workers to actually complete processing;
before, the wait was for update of last_committed_sub_id, which happens a
bit earlier, and could leave worker threads potentially accessing bits of
the replication state that is no longer valid after slave stop.
- Fix a couple of places where the test suite would kill a thread waiting
inside enter_cond() in connection with debug_sync; debug_sync + kill can
crash in rare cases due to a race with mysys_var_current_mutex in this
case.
- Fix some corner cases where we had enter_cond() but no exit_cond().
- Fix that we could get failure in wait_for_prior_commit() but forget to flag
the error with my_error().
- Fix slave stop (both for normal stop and stop due to error). Now, at stop
we pick a specific safe point (in terms of event groups executed) and make
sure that all event groups before that point are executed to completion,
and that no event group after start executing; this ensures a safe place to
restart replication, even for non-transactional stuff/DDL. In error stop,
make sure that all prior event groups are allowed to execute to completion,
and that any later event groups that have started are rolled back, if
possible. The old code could leave eg. T1 and T3 committed but T2 not, or
it could even leave half a transaction not rolled back in some random
worker, which would cause big problems when that worker was later reused
after slave restart.
- Fix the accounting of amount of events queued for one worker. Before, the
amount was reduced immediately as soon as the events were dequeued (which
happens all at once); this allowed twice the amount of events to be queued
in memory for each single worker, which is not what users would expect.
- Fix that an error set during execution of one event was sometimes not
cleared before executing the next, causing problems with the error
reporting.
- Fix incorrect handling of thd->killed in worker threads.
As part of the fix we don't anymore generate a create table statement when doing a
CREATE TABLE IF NOT EXISTS table_that_exist LiKE temporary_table
if the 'table_that_exist' existed.
This is because it's not self evident if we should generate a create statement
matching the existing table or the temporary_table.
The old code generated a table like the existing table in row based replication and like the temporary table
in statement based replication.
It's better to ensure that both cases works the same way.
mysql-test/suite/rpl/r/rpl_row_create_table.result:
Updated results
(Now we don't anymore CREATE TABLE IF NOT EXISTS LIKE if the table existed)
sql/sql_base.cc:
More DBUG_PRINT
sql/sql_error.cc:
More DBUG_PRINT
sql/sql_table.cc:
Don't generate a create table statement when doing a
CREATE TABLE IF NOT EXISTS table_that_exist like temporary_table if the table existed.
As a side-effect of purge_relay_logs(), sql_slave_skip_counter
was silently ignored in GTID mode.
But sql_slave_skip_counter in fact is not a good match with GTID.
And it is not really needed either, as users can explicitly set
@@gtid_slave_pos to skip specific GTIDs, in a way that matches
well how GTID replication works.
So with this patch, we give an error on attempts to set
sql_slave_skip_counter when using GTID, with a suggestion to use
gtid_slave_pos instead, if needed.
MDEV-4984: Implement MASTER_GTID_WAIT() and @@LAST_GTID.
MDEV-4726: Race in mysql-test/suite/rpl/t/rpl_gtid_stop_start.test
MDEV-5636: Deadlock in RESET MASTER
Rewrite the gtid_waiting::wait_for_gtid() function.
The code was rubbish (and buggy). Now the logic is
much clearer.
Also fix a missing slave sync that could cause test failure.
Some GTID test cases were using include/wait_condition.inc with a
condition like SELECT COUNT(*)=4 FROM t1 to wait for the slave to
catch up with the master. This causes races and test failures, as the
changes to the tables become visible at the COMMIT of the SQL thread
(or even before in case of MyISAM), but the changes to
@@gtid_slave_pos only become visible a little bit after the COMMIT.
Now that we have MASTER_GTID_WAIT(), just use that to sync up in a
GTID-friendly way, wrapped in nice include/save_master_gtid.inc and
include/sync_with_master_gtid.inc scripts.
MASTER_GTID_WAIT() is similar to MASTER_POS_WAIT(), but works with a
GTID position rather than an old-style filename/offset.
@@LAST_GTID gives the GTID assigned to the last transaction written
into the binlog.
Together, the two can be used by applications to obtain the GTID of
an update on the master, and then do a MASTER_GTID_WAIT() for that
position on any read slave where it is important to get results that
are caught up with the master at least to the point of the update.
The implementation of MASTER_GTID_WAIT() is implemented in a way
that tries to minimise the performance impact on the SQL threads,
even in the presense of many waiters on single GTID positions (as
from @@LAST_GTID).
mysql-test/r/lowercase_table2.result:
Updated result
(The change happend because we don't try to open the table anymore as part of create table)
mysql-test/suite/rpl/r/create_or_replace_mix.result:
Fixed result file
mysql-test/suite/rpl/r/create_or_replace_row.result:
Fixed result file
mysql-test/suite/rpl/r/create_or_replace_statement.result:
Fixed result file
mysql-test/suite/rpl/t/create_or_replace.inc:
Drop open temporary table
mysys/my_delete.c:
Added missing newline
plugin/metadata_lock_info/mysql-test/metadata_lock_info/r/user_lock.result:
Fixed result
(Lock names was before off by one. Was corrected by my previous patch)
sql/sql_select.cc:
Fixed compiler warnings by adding missing casts
storage/connect/ha_connect.cc:
Fixed compiler warnings
storage/innobase/os/os0file.cc:
Fixed compiler warnings
storage/xtradb/btr/btr0btr.cc:
Fixed compiler warnings
storage/xtradb/handler/ha_innodb.cc:
removed not used function
strings/ctype-uca.c:
Fixed compiler warnings
support-files/compiler_warnings.supp:
Added suppression for warnings that are wrong or are not serious andthat we don't plan to fix.
- CREATE TABLE is by default executed on the slave as CREATE OR REPLACE
- DROP TABLE is by default executed on the slave as DROP TABLE IF NOT EXISTS
This means that a slave will by default continue even if we try to create
a table that existed on the slave (the table will be deleted and re-created) or
if we try to drop a table that didn't exist on the slave.
This should be safe as instead of having the slave stop because of an inconsistency between
master and slave, it will fix the inconsistency.
Those that would prefer to get a stopped slave instead for the above cases can set slave_ddl_exec_mode to STRICT.
- Ensure that a CREATE OR REPLACE TABLE which dropped a table is replicated
- DROP TABLE that generated an error on master is handled as an identical DROP TABLE on the slave (IF NOT EXISTS is not added in this case)
- Added slave_ddl_exec_mode variable to decide how DDL's are replicated
New logic for handling BEGIN GTID ... COMMIT from the binary log:
- When we find a BEGIN GTID, we start a transaction and set OPTION_GTID_BEGIN
- When we find COMMIT, we reset OPTION_GTID_BEGIN and execute the normal COMMIT code.
- While OPTION_GTID_BEGIN is set:
- We don't generate implict commits before or after statements
- All tables are regarded as transactional tables in the binary log (to ensure things are executed exactly as on the master)
- We reset OPTION_GTID_BEGIN also on rollback
This will help ensuring that we don't get any sporadic commits (and thus new GTID's) on the slave and will help keep the GTID's between master and slave in sync.
mysql-test/extra/rpl_tests/rpl_log.test:
Added testing of mode slave_ddl_exec_mode=STRICT
mysql-test/r/mysqld--help.result:
New help messages
mysql-test/suite/rpl/r/create_or_replace_mix.result:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/r/create_or_replace_row.result:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/r/create_or_replace_statement.result:
Testing replication of create or replace
mysql-test/suite/rpl/r/rpl_gtid_startpos.result:
Test must be run in slave_ddl_exec_mode=STRICT as part of the test depends on that DROP TABLE should fail on slave.
mysql-test/suite/rpl/r/rpl_row_log.result:
Updated result
mysql-test/suite/rpl/r/rpl_row_log_innodb.result:
Updated result
mysql-test/suite/rpl/r/rpl_row_show_relaylog_events.result:
Updated result
mysql-test/suite/rpl/r/rpl_stm_log.result:
Updated result
mysql-test/suite/rpl/r/rpl_stm_mix_show_relaylog_events.result:
Updated result
mysql-test/suite/rpl/r/rpl_temp_table_mix_row.result:
Updated result
mysql-test/suite/rpl/t/create_or_replace.inc:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_mix.cnf:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_mix.test:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_row.cnf:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_row.test:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_statement.cnf:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/create_or_replace_statement.test:
Testing of CREATE OR REPLACE TABLE with replication
mysql-test/suite/rpl/t/rpl_gtid_startpos.test:
Test must be run in slave_ddl_exec_mode=STRICT as part of the test depends on that DROP TABLE should fail on slave.
mysql-test/suite/rpl/t/rpl_stm_log.test:
Removed some lines
mysql-test/suite/sys_vars/r/slave_ddl_exec_mode_basic.result:
Testing of slave_ddl_exec_mode
mysql-test/suite/sys_vars/t/slave_ddl_exec_mode_basic.test:
Testing of slave_ddl_exec_mode
sql/handler.cc:
Regard all tables as transactional in commit if OPTION_GTID_BEGIN is set.
This is to ensure that statments are not commited too early if non transactional tables are used.
sql/log.cc:
Regard all tables as transactional in commit if OPTION_GTID_BEGIN is set.
Also treat 'direct' log events as transactional (to get them logged as they where on the master)
sql/log_event.cc:
Ensure that the new error from DROP TABLE when trying to drop a view is treated same as the old one.
Store error code that slave expects in THD.
Set OPTION_GTID_BEGIN if we find a BEGIN.
Reset OPTION_GTID_BEGIN if we find a COMMIT.
sql/mysqld.cc:
Added slave_ddl_exec_mode_options
sql/mysqld.h:
Added slave_ddl_exec_mode_options
sql/rpl_gtid.cc:
Reset OPTION_GTID_BEGIN if we record a gtid (safety)
sql/sql_class.cc:
Regard all tables as transactional in commit if OPTION_GTID_BEGIN is set.
sql/sql_class.h:
Added to THD: log_current_statement and slave_expected_error
sql/sql_insert.cc:
Ensure that CREATE OR REPLACE is logged if table was deleted.
Don't do implicit commit for CREATE if we are under OPTION_GTID_BEGIN
sql/sql_parse.cc:
Change CREATE TABLE -> CREATE OR REPLACE TABLE for slaves
Change DROP TABLE -> DROP TABLE IF EXISTS for slaves
CREATE TABLE doesn't force implicit commit in case of OPTION_GTID_BEGIN
Don't do commits before or after any statement if OPTION_GTID_BEGIN was set.
sql/sql_priv.h:
Added OPTION_GTID_BEGIN
sql/sql_show.cc:
Enhanced store_create_info() to also be able to handle CREATE OR REPLACE
sql/sql_show.h:
Updated prototype
sql/sql_table.cc:
Ensure that CREATE OR REPLACE is logged if table was deleted.
sql/sys_vars.cc:
Added slave_ddl_exec_mode
sql/transaction.cc:
Added warning if we got a GTID under OPTION_GTID_BEGIN