mirror of https://github.com/MariaDB/server.git synced 2026-01-06 05:22:24 +03:00
Commit Graph

3131 Commits

Author SHA1 Message Date
Sergei Golubchik
b7434bacbd include/master-slave.inc must always be included last 2017-09-20 18:17:50 +02:00
Alexander Barkov
434e283507 MDEV-13685 Can not replay binary log due to Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation 'concat' 2017-09-15 12:25:06 +04:00
Sergei Golubchik
496cea45e2 update error messages for 10.0 2017-07-27 12:42:40 +02:00
Vicențiu Ciorbaru
786ad0a158 Merge remote-tracking branch 'origin/5.5' into 10.0 2017-07-25 00:41:54 +03:00
Sergei Golubchik
9a5fe1f4ea Merge remote-tracking branch 'mysql/5.5' into 5.5 2017-07-18 14:59:10 +02:00
Andrei Elkin
946a07e8a8 Fix for MDEV-9670 server_id mysteriously set to 0
Problem was that in a circular replication setup the master remembers
the position of events it has generated itself when reading from a slave.
If there are no new events in the queue from the slave, a
Gtid_list_log_event is generated to remember the last skipped event.
The problem happens if there is a network delay and a
Gtid_list_log_event is generated in the middle of a transaction, in which
case there will be an implicit commit and a new transaction with serverid=0
will be logged.

The fix was to not generate any Gtid_list_log_events in the middle of a
transaction.
2017-07-02 19:47:30 +03:00
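A minimal C++ sketch of the rule this fix enforces, assuming a simplified event stream in which a GTID event opens an event group and an XID event closes it. The `Event` and `SkipTracker` types are hypothetical; only the Gtid_list_log_event concept comes from the commit message.

```cpp
// Hypothetical, simplified model of the MDEV-9670 rule: a Gtid_list
// placeholder for skipped self-originated events may only be emitted
// when the master is not in the middle of an event group (transaction).
struct Event {
  enum Kind { GTID, QUERY, XID } kind;  // XID stands in for any group end
};

class SkipTracker {
 public:
  // Called for each skipped event (one that originated on this server).
  void on_skipped(const Event& ev) {
    if (ev.kind == Event::GTID) in_group_ = true;   // a new group starts
    if (ev.kind == Event::XID) in_group_ = false;   // the group is complete
  }

  // Called when no new events are queued (e.g. during a network delay).
  // Emitting mid-group would split the transaction, so refuse.
  bool may_emit_gtid_list() const { return !in_group_; }

 private:
  bool in_group_ = false;
};
```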
Venkatesh Duggirala
bb9e547a86 Bug#18950197 RPL_SEMI_SYNC_UNINSTALL_PLUGIN FAILS BECAUSE
RPL_SEMI_SYNC_MASTER_CLIENTS=1

Analysis: Uninstalling rpl_semi_sync_slave on the slave
          triggers removal of the slave logic on the master, which
          reduces Rpl_semi_sync_master_clients by one.
          But this happens asynchronously on the master. An assert
          that checks this value against zero will therefore fail
          on slow pb2 machines.

Fix: Change assert into wait_for_status_var condition.
2017-05-25 12:39:20 +05:30
Sergei Golubchik
725e47bfb5 Merge branch '5.5' into 10.0 2017-05-20 00:59:40 +02:00
Sachin Setiya
7d57ba6e28 MDEV-11092 :- Fix Previous commit of MDEV-11092 2017-05-19 13:02:45 +05:30
Sachin Setiya
b5cdf01404 MDEV-11092 Assertion `!writer.checksum_len || writer.remains == 0' failed
Problem:-
This crash happens because the logged statement is quite big, and while
writing the Annotate_rows_log_event an EFBIG error is thrown, but we ignore
this error and do not call cache_data->set_incident().

Solution:-
When we normally write a Binlog_log_event we check for the EFBIG error, but
we did not do this for Annotate_rows_log_event. We now check for this error
and call cache_data->set_incident() accordingly.

# Conflicts:
#	sql/log.cc
2017-05-18 17:13:37 +05:30
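A hedged sketch of the error-handling pattern described above: every write into the binlog cache, including the Annotate_rows_log_event write, checks for EFBIG and marks the cache so that an incident is logged. `CacheData` and `write_with_incident_check()` are hypothetical stand-ins; only `set_incident()` is named in the commit.

```cpp
#include <cerrno>

// Hypothetical stand-in for the binlog cache state.
struct CacheData {
  bool incident = false;
  void set_incident() { incident = true; }
};

// write_event stands in for the real event writer; it returns 0 on
// success or an errno value on failure.
int write_with_incident_check(CacheData& cache, int (*write_event)()) {
  int err = write_event();
  if (err == EFBIG)        // cache size limit exceeded
    cache.set_incident();  // do not silently continue with a broken cache
  return err;
}
```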
Vicențiu Ciorbaru
1acfa942ed Merge branch '5.5' into 10.0 2017-03-03 01:37:54 +02:00
Sujatha Sivakumar
e619295e1b Bug#24901077: RESET SLAVE ALL DOES NOT ALWAYS RESET SLAVE
Description:
============
If you have a relay log index file that has ended up with
some relay log files that do not exist, then RESET SLAVE
ALL is not enough to get back to a clean state.

Analysis:
=========
In the bug scenario the slave server is stopped, and
some of the relay logs got deleted but the relay log index
file was not updated.

During slave server restart replication initialization fails
as some of the required relay logs are missing. User
executes RESET SLAVE/RESET SLAVE ALL command to start a
clean slave. As per the documentation RESET SLAVE command
clears the master info and relay log info repositories,
deletes all the relay log files, and starts a new relay log
file. But in a scenario where the slave server's
Relay_log_info object is not initialized slave will not
purge the existing relay logs. Hence the index file still
remains in a bad state. Users will not be able to start
the slave unless these files are cleared.

Fix:
===
RESET SLAVE/RESET SLAVE ALL commands should do the cleanup
even in a scenario where Relay_log_info object
initialization failed.

Backported a flag named 'error_on_rli_init_info' which is
required to identify slave's Relay_log_info object
initialization failure. This flag exists in MySQL-5.6
onwards as part of BUG#14021292 fix.

During RESET SLAVE/RESET SLAVE ALL execution this flag
indicates the Relay_log_info initialization failure.
In such a case open the relay log index/relay log files
and do the required clean up.
2017-02-28 10:00:51 +05:30
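A simplified model of the RESET SLAVE cleanup, assuming the Relay_log_info state can be reduced to two booleans and the set of files named in the index. Apart from the `error_on_rli_init_info` flag taken from the commit message, all names here (including the relay log file names in any example) are hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical reduction of Relay_log_info for this illustration.
struct RelayLogInfo {
  bool inited = false;                  // initialization succeeded
  bool error_on_rli_init_info = false;  // initialization was attempted and failed
  std::set<std::string> index;          // relay logs listed in the index file
};

// Returns true if relay logs were purged.
bool reset_slave(RelayLogInfo& rli) {
  // Before the fix, the equivalent of "if (!rli.inited) return" skipped
  // the purge, leaving a stale index that blocked START SLAVE.
  if (!rli.inited && !rli.error_on_rli_init_info)
    return false;
  // In the failed-init case the real server first opens the relay log
  // index and relay log files; this model just drops the stale entries.
  rli.index.clear();
  rli.error_on_rli_init_info = false;
  return true;
}
```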
Elena Stepanova
343ba58562 MDEV-10631 rpl.rpl_mdev6386 failed in buildbot
The failure happens due to a race condition between processing
a row event (INSERT) and an automatically generated event
DROP TEMPORARY TABLE. Even though DROP has a higher GTID, it can
become visible in @@gtid_slave_pos before the row event with
a lower GTID has been committed. Since the test makes the slave
synchronize with the master using GTID, the waiting stops
as soon as GTID of the DROP TEMPORARY TABLE becomes visible,
and if changes from the previous event haven't been applied yet,
the error occurs.

According to Kristian (see the comment to MDEV-10631), the real
problem is that DROP TEMPORARY TABLE is logged in the row mode
at all. For this particular test, since DROP does not do anything,
nothing prevents it from competing with the prior transaction.

The workaround for the test is to add a meaningful event
after DROP TEMPORARY TABLE, so that the slave would wait on its
GTID instead of the one from DROP.

Additionally (unrelated to this problem) removed FLUSH TABLES,
which, as the comment stated, should have been removed after
MDEV-6403 was fixed.
2017-02-20 01:48:26 +02:00
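A simplified model of the race, assuming the slave position is tracked as the highest committed seq_no per replication domain. `GtidSlavePos` is hypothetical and ignores the real server's commit-ordering machinery; it only shows why a wait on the DROP TEMPORARY TABLE GTID can return before the earlier row event has been applied.

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-domain position: a waiter asking whether seq_no N has
// been reached only compares against this number, not against whether
// every lower seq_no has actually committed.
class GtidSlavePos {
 public:
  void commit(std::uint32_t domain, std::uint64_t seq_no) {
    std::uint64_t& cur = pos_[domain];
    if (seq_no > cur) cur = seq_no;
  }

  bool reached(std::uint32_t domain, std::uint64_t seq_no) const {
    auto it = pos_.find(domain);
    return it != pos_.end() && it->second >= seq_no;
  }

 private:
  std::map<std::uint32_t, std::uint64_t> pos_;
};
```

For example, if the no-op DROP (seq_no 11) commits before the row event (seq_no 10), `reached(0, 10)` is already true even though the row event has not been applied.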
Elena Stepanova
b70cd26d73 MDEV-11668 rpl.rpl_heartbeat_basic fails sporadically in buildbot
On a slow builder, a delay between binlog events on master could
occur, which would cause a heartbeat which is not expected by the
test. The solution is to monitor the timing of binlog events
on the master and only perform the heartbeat check if no critical
delays have happened.

Additionally, an unused variable was removed (this change is
unrelated to the bugfix).
2017-02-17 00:57:24 +02:00
vicentiu
e9aed131ea Merge remote-tracking branch 'origin/5.5' into 10.0 2017-01-06 17:09:59 +02:00
Elena Stepanova
9bf92706d1 MDEV-8518 rpl.sec_behind_master-5114 fails sporadically in buildbot
- fix the test to avoid false-negatives before MDEV-5114 patch;
- fix the race condition which made the test fail on slow builders
2017-01-04 14:50:10 +02:00
Sachin Setiya
d02a77bc5f MDEV-11636 Extra persistent columns on slave always gets NULL in RBR
Problem:- In replication, if the slave has extra persistent columns, these
columns are not computed while applying the write-set from the master.

Solution:- While applying row events from the server, we now generate values
for the extra persistent columns.
2017-01-01 16:45:44 +05:30
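A sketch of the slave-side fix, assuming a row image is a vector of nullable integers and each extra PERSISTENT column is described by an expression over the row. `Row`, `SlaveColumn`, and `apply_row_event()` are hypothetical names; the code requires C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// A row is a list of nullable column values (std::nullopt means NULL).
using Row = std::vector<std::optional<long>>;

// Hypothetical slave column: ordinary columns leave `generated` empty;
// PERSISTENT generated columns carry their expression.
struct SlaveColumn {
  std::function<long(const Row&)> generated;
};

// Applies a master row image to a slave table that has extra columns.
Row apply_row_event(const Row& master_image,
                    const std::vector<SlaveColumn>& slave_cols) {
  Row row(slave_cols.size());  // every column starts as NULL
  for (std::size_t i = 0; i < master_image.size() && i < row.size(); ++i)
    row[i] = master_image[i];
  // The fix: compute extra persistent columns instead of leaving them NULL.
  for (std::size_t i = master_image.size(); i < row.size(); ++i)
    if (slave_cols[i].generated)
      row[i] = slave_cols[i].generated(row);
  return row;
}
```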
Sergei Golubchik
b03b38dd65 cleanup: rpl.rpl_row_mysqlbinlog
some trivial simplifications. drinking the ocean,
one drop at a time
2016-12-17 00:16:15 +01:00
Sergei Golubchik
3e8155c637 Merge branch '5.5' into 10.0 2016-12-09 16:33:48 +01:00
Sergei Golubchik
7f2fd34500 MDEV-11231 Server crashes in check_duplicate_key on CREATE TABLE ... SELECT
be consistent and don't include the table name into the error message,
no other CREATE TABLE error does it.

(the crash happened, because thd->lex->query_tables was NULL)
2016-12-04 01:59:35 +01:00
Sergei Golubchik
f640527e65 typo fixed: s/MSYQL/MYSQL/ 2016-12-03 22:02:00 +01:00
Kristian Nielsen
717f212840 MDEV-10863: parallel replication tries to continue from wrong position
This occurred when the SQL thread (but not the IO thread) stops while
GTID and parallel replication are used with multiple domain ids in the
GTID position, and is restarted.

In this case, the SQL thread needs to start some way back in the relay log,
applying or skipping events within each replication domain as
appropriate.

The SQL thread starts at the beginning of an old relay log file, and
this position may be in the middle of an event group. The bug was that
such a partial event group could be re-applied, causing replication
corruption.

This patch fixes the issue, by making sure to skip any initial events
that were part of an earlier (already applied) event group.
2016-11-04 12:33:42 +01:00
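A minimal sketch of the skip logic, assuming a relay log file can be reduced to a sequence of event kinds in which a GTID event marks the start of each event group. The `Ev` enum and `first_applicable_index()` are hypothetical and ignore the per-domain bookkeeping the real code also does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical event kinds; a GTID event starts each event group.
enum class Ev { GTID, ROW, COMMIT };

// Returns the index of the first event that may be applied when the SQL
// thread starts reading at the beginning of an old relay log file. Events
// before the first GTID event are the tail of a group that began in an
// earlier file and was already applied, so they are skipped.
std::size_t first_applicable_index(const std::vector<Ev>& events) {
  std::size_t i = 0;
  while (i < events.size() && events[i] != Ev::GTID)
    ++i;
  return i;
}
```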
Sergei Golubchik
22490a0d70 MDEV-8345 STOP SLAVE should not cause an ERROR to be logged to the error log
cherry-pick from 5.7:
  commit 6b24763
  Author: Manish Kumar <manish.4.kumar@oracle.com>
  Date:   Tue Mar 27 13:10:42 2012 +0530

  BUG#12977988 - ON STOP SLAVE: ERROR READING PACKET FROM SERVER: LOST CONNECTION
                 TO MYSQL SERVER
  BUG#11761457 - ERROR 2013 + "ERROR READING RELAY LOG EVENT" ON STOP SLAVE
2016-10-26 18:44:34 +02:00
Sergei Golubchik
25932708b1 backport include/search_pattern_in_file.inc from 10.1 2016-10-26 18:44:34 +02:00
Kristian Nielsen
998f987eda Upstream MIPS test fixes from Debian Bug 838557.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=838557

MIPS has a different errno for "directory not empty".
2016-10-21 22:37:51 +02:00
Kristian Nielsen
b34d7fba31 Debian bug#837369 - test failures on hppa
ENOTEMPTY is 247 on hppa, not 39 like on Linux, according to this report:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837369

So add another replacement for this to rpl.rpl_drop_db and
binlog.binlog_database tests (there were already a couple of similar
replacements for other platforms).

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2016-09-11 11:18:27 +02:00
Monty
b51109693e MDEV-10630 rpl.rpl_mdev6020 fails in buildbot with timeout
The issue was that when running with valgrind the wait for master_pos_wait()
was not long enough.

This patch also fixes two other failures that could affect rpl_mdev6020:
- check_if_conflicting_replication_locks() didn't properly check domains
- 'did_mark_start_commit' was set after signals to other threads were sent,
  which could cause the variable to be read too early.
2016-08-22 10:16:00 +03:00
Sergei Golubchik
a10fd659aa Fixed for failures in buildbot: Replication
1. remove unnecessary rpl-tokudb combination file.
2. fix rpl_ignore_table to cleanup properly (not leave test
   grants in memory)
3. check_temp_dir() is supposed to set the error in stmt_da - do
   it even when called multiple times, this fixes a crash when
   rpl.rpl_slave_load_tmpdir_not_exist is run twice.
2016-06-22 10:40:43 +02:00
Elena Stepanova
67b4a6f576 MDEV-8859 rpl.rpl_mdev382 sporadically fails to finish due to disappeared expect file
The combination of --remove_file and --write_file on .expect file creates
a race condition which can be hit by MTR which reads the file in a loop.
Instead, .expect file should be changed with --append_file.
It was fixed in 10.x, but in 5.5 the sporadic failure still affected buildbot.
Fixed 3 test files which use the problematic combination
2016-06-12 20:14:51 +03:00
Sujatha Sivakumar
df7ecf64f5 Bug#23251517: SEMISYNC REPLICATION HANGING
Revert following bug fix:

Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

This fix results in a deadlock between slave IO thread
and SQL thread.
2016-05-13 16:42:45 +05:30
Sergei Golubchik
872649c7ba Merge branch '5.5' into 10.0 2016-04-26 23:05:26 +02:00
Sergei Golubchik
b069d19284 Merge branch 'mysql/5.5' into 5.5 2016-04-20 15:25:55 +02:00
Sujatha Sivakumar
3a8f43bec7 Bug#22897202: RPL_IO_THD_WAIT_FOR_DISK_SPACE HAS OCCASIONAL
FAILURES

Analysis:
=========
The test script does not ensure that "assert_grep.inc" is
called only after the 'Disk is full' error has been written
to the error log.

The test checks for the "Queueing master event to the relay log"
state, but this state is set before 'queue_event' is invoked,
while the actual 'Disk is full' error happens at a much lower
level. The test might therefore reset the debug point before
the disk-full simulation even occurs, in which case the
"Disk is full" message never appears in the error log.

To guarantee determinism, we need a mechanism by which, after
the "Disk is full" error message is written to the error log,
the test is signalled to execute SHOW SLAVE STATUS (SSS) and
then reset the debug point.

Fix:
===
Added debug sync point to make script deterministic.
2016-04-19 11:44:34 +05:30
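A simplified stand-in for the debug sync point, assuming the IO thread calls `signal()` right after the "Disk is full" message reaches the error log, and the test calls `wait_for()` before running SHOW SLAVE STATUS and resetting the debug point. The `SyncPoints` class and any signal name are hypothetical; this is not the server's DEBUG_SYNC facility.

```cpp
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

// Hypothetical, simplified named-signal rendezvous.
class SyncPoints {
 public:
  // Called by the IO thread after "Disk is full" is in the error log.
  void signal(const std::string& name) {
    {
      std::lock_guard<std::mutex> lk(m_);
      fired_.insert(name);
    }
    cv_.notify_all();
  }

  // Called by the test; blocks until the named signal has been sent,
  // whether that happened before or after this call.
  void wait_for(const std::string& name) {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return fired_.count(name) > 0; });
  }

  bool fired(const std::string& name) {
    std::lock_guard<std::mutex> lk(m_);
    return fired_.count(name) > 0;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::set<std::string> fired_;
};
```

Because the signal is recorded and `wait_for()` checks it under the lock, the relative timing of the two threads no longer matters: if the signal fires first, the wait returns immediately.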
Sujatha Sivakumar
5102a7f278 Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

Fixing a post push test issue.
2016-03-07 18:19:26 +05:30
Otto Kekäläinen
1777fd5f55 Fix spelling: occurred, execute, which etc 2016-03-04 02:09:37 +02:00
Sujatha Sivakumar
8361151765 Bug#20685029: SLAVE IO THREAD SHOULD STOP WHEN DISK IS
FULL
Bug#21753696: MAKE SHOW SLAVE STATUS NON BLOCKING IF IO
THREAD WAITS FOR DISK SPACE

Problem:
========
Currently SHOW SLAVE STATUS blocks if the IO thread waits for
disk space. This blocks automation tools that verify server
health from taking relevant action, and eventually causes
SHOW SLAVE STATUS calls to pile up.

Analysis:
=========
SHOW SLAVE STATUS hangs on mi->data_lock if relay log write
is waiting for free disk space while holding mi->data_lock.
mi->data_lock is needed to protect the format description
event (mi->format_description_event) which is accessed by
the clients running FLUSH LOGS and slave IO thread. Note
relay log writes don't need to be protected by
mi->data_lock, LOCK_log is used to protect relay log between
IO and SQL thread (see MYSQL_BIN_LOG::append_event). The
code takes mi->data_lock to protect
mi->format_description_event during relay log rotate which
might get triggered right after relay log write.

Fix:
====
Release the data_lock just for the duration of writing into
relay log.

Made change to ensure the following lock order is maintained
to avoid deadlocks.

data_lock, LOCK_log

data_lock is held during relay log rotations to protect
the description event.
2016-03-01 12:29:51 +05:30
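A hedged sketch of the lock discipline described above, assuming `std::mutex` stand-ins for mi->data_lock and LOCK_log and a vector of strings as the relay log. `MasterInfo`, `queue_event()` and `show_slave_status()` are simplified, hypothetical versions of the real code.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical reduction of the master info state.
struct MasterInfo {
  std::mutex data_lock;                   // protects format_description
  std::mutex LOCK_log;                    // protects the relay log
  std::string format_description = "FD";  // stand-in for the FD event
  std::vector<std::string> relay_log;     // stand-in for relay log contents
};

void queue_event(MasterInfo& mi, const std::string& ev, bool rotate_needed) {
  std::unique_lock<std::mutex> data(mi.data_lock);
  // ... work that needs data_lock ...
  data.unlock();  // not held while the write may wait for disk space
  {
    std::lock_guard<std::mutex> log(mi.LOCK_log);
    mi.relay_log.push_back(ev);
  }
  data.lock();    // re-acquired after the write
  if (rotate_needed) {
    // Lock order: data_lock first, then LOCK_log.
    std::lock_guard<std::mutex> log(mi.LOCK_log);
    mi.relay_log.push_back("rotate:" + mi.format_description);
  }
}

// SHOW SLAVE STATUS only needs data_lock, so it no longer waits behind a
// relay log write that is blocked on a full disk.
std::string show_slave_status(MasterInfo& mi) {
  std::lock_guard<std::mutex> data(mi.data_lock);
  return mi.format_description;
}
```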
Venkatesh Duggirala
29cc2c2883 BUG#20574550 MAIN.MERGE TEST CASE FAILS IF BINLOG_FORMAT=ROW
The main.merge test case was failing when tested using row based
binlog format.

While analyzing the issue, the following problems were found:

a) The server is calling binlog related code even when a statement will
   not be binlogged;
b) The child table list was not present in the table structure at the time
   the create table statement was generated;
c) The tables in the child table list will not be opened yet when
   generating table create info using row based replication;
d) CREATE TABLE LIKE TEMP_TABLE does not preserve original table storage
   engine when using row based replication;

This patch addressed all above issues.

@ sql/sql_class.h

Added a function to determine if the binary log is disabled for
  the current session. This is related to issue (a) above.

@ sql/sql_table.cc

Added code to skip binary logging related code if the statement
  will not be binlogged. This is related to issue (a) above.

Added code to add the children to the query list of the table that
  will have its CREATE TABLE generated. This is related to issue (b)
  above.

Added code to force the storage engine to be generated into the
  CREATE TABLE. This is related to issue (d) above.

@ storage/myisammrg/ha_myisammrg.cc

Added a test to skip a table getting info about a child table if the
  child table is not opened. This is related to issue (c) above.
2016-02-26 09:01:49 +05:30
Sergei Golubchik
271fed4106 Merge branch '5.5' into 10.0 2016-02-15 22:50:59 +01:00
Sergei Golubchik
f3444df415 Merge branch 'mysql/5.5' into 5.5
reverted about half of commits as either not applicable or
outright wrong
2016-02-09 11:27:40 +01:00
Deepthi Eranti_Sreenivas
1624c26d42 Bug#22086528: TEST CODE DISABLED THOUGH THE HISTORIC REASONS - BUGS - ARE FIXED
Post push fix for 5.5 and 5.6. Disabled the test code due to Bug#22587377
2016-01-22 16:51:21 +05:30
Deepthi Eranti_Sreenivas
7d19d4b2dd Bug#22086528 : TEST CODE DISABLED THOUGH THE HISTORIC REASONS - BUGS - ARE FIXED
Problem:
mysql-test/suite/rpl/t/rpl_killed_ddl.test

This test contains code which was disabled because of certain bugs.
BUG#44041 declared to be a duplicate of Bug#45516, which was fixed in 2010
BUG#43353 fixed in 2012
BUG#44171 fixed in 2010

Fix:
Enabled the test code related to the above mentioned bugs.
2016-01-20 18:23:16 +05:30
Kristian Nielsen
74b1af19e9 Merge branch 'tmp' into tmp-10.0
Conflicts:
	sql/slave.cc
2016-01-15 12:50:23 +01:00
Kristian Nielsen
06b2e327fc Fix error handling for GTID and domain-based parallel replication
This occurs when replication stops with an error, domain-based parallel
replication is used, and the GTID position contains more than one domain.
Furthermore, it relates to the case where the SQL thread is restarted
without first stopping the IO thread.

In this case, the file/offset relay-log position does not correctly
represent the slave's multi-dimensional position, because other domains may
be far ahead of, or behind, the domain with the failing event. So the code
reverts the relay log position back to the start of a relay log file that is
known to be before all active domains.

There was a bug that when the SQL thread was restarted, the
rli->relay_log_state was incorrectly initialised from @@gtid_slave_pos. This
position will likely be too far ahead, due to reverting the relay log
position. Thus, if the replication fails again after the SQL thread restart,
the rli->restart_gtid_pos might be updated incorrectly. This in turn would
cause a second SQL thread restart to replicate from the wrong position, if
the IO thread was still left running.

The fix is to initialise rli->relay_log_state from @@gtid_slave_pos only
when we actually purge and re-fetch relay logs from the master, not at every
SQL thread start.

A related problem is the use of sql_slave_skip_counter to resolve
replication failures in this kind of scenario. Since the slave position is
multi-dimensional, sql_slave_skip_counter cannot work properly; it is
indeterminate exactly which event is to be skipped, and is unlikely to work
as expected for the user. So make this an error in the case where
domain-based parallel replication is used with multiple domains, suggesting
instead that the user set @@gtid_slave_pos to reliably skip the desired event.
2016-01-15 12:48:14 +01:00
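A minimal sketch of the new check, assuming the decision depends only on the skip counter, whether domain-based parallel replication is active, and how many domains the GTID position contains. The function name, parameters, and hint text are hypothetical and are not the server's actual error message.

```cpp
#include <cstddef>
#include <string>

struct SkipDecision {
  bool allowed;
  std::string hint;
};

// Hypothetical check: with a multi-dimensional slave position, "skip N
// events" has no well-defined meaning, so refuse it and point the user
// at @@gtid_slave_pos instead.
SkipDecision check_skip_counter(unsigned long skip_counter,
                                bool domain_parallel,
                                std::size_t gtid_domains) {
  if (skip_counter > 0 && domain_parallel && gtid_domains > 1)
    return {false, "set @@gtid_slave_pos to skip the event instead"};
  return {true, ""};
}
```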
Monty
8fcc0bfefa Fixed bug in semi_sync replication tests.
The problem was that wait_for_slave_io_to_start reported that the io thread
was ready while it was still initializing. This caused the test suite to
continue too early, for example before the semi sync plugin was properly
enabled.

Fixed by introducing a new internal stage: "Preparing". Slave_IO_Running is
now set to "Yes" only when all initializing is done and the IO thread is
ready to read things from the master.

The only test affected by this change is rpl_flsh_tbls, which got stuck in
the preparing phase while trying to read the GTID position from a table.
Fixed by having this test wait for Preparing instead of Yes.
2016-01-03 13:27:59 +02:00
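A sketch of how the new stage changes what Slave_IO_Running reports, assuming a four-state model of the IO thread. The enum and function are hypothetical; the point is that "Yes" is returned only after initialization (such as reading the GTID position or enabling the semi-sync plugin) has finished, so a test waiting for "Yes" can no longer continue too early.

```cpp
// Hypothetical IO thread states.
enum class IoThreadState { Stopped, Preparing, Connecting, Running };

// Value reported as Slave_IO_Running for each state.
const char* slave_io_running(IoThreadState s) {
  switch (s) {
    case IoThreadState::Stopped:    return "No";
    case IoThreadState::Preparing:  return "Preparing";  // still initializing
    case IoThreadState::Connecting: return "Connecting";
    case IoThreadState::Running:    return "Yes";        // ready to read events
  }
  return "No";  // not reached for valid enum values
}
```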
Sergei Golubchik
2116649dee after-merge fix replication tests
* mostly update result files
* also updating test/include files to match 5.6
2015-12-15 20:25:06 +01:00
Venkatesh Duggirala
2735f0b920 Bug#21205695 DROP TABLE MAY CAUSE SLAVES TO BREAK
Problem:
    ========
    1) Drop table queries are re-generated by the server
    before writing the events (queries) into the binlog
    for various reasons. If the table name/db name contains
    non-regular characters (like Latin characters),
    the generated query is wrong, and this breaks
    replication.
    2) In the edge case where the table name/db name contains
    64 characters, the server throws an assert:
    assert(M_TBLLEN < 128)
    3) In the edge case where the db name contains 64 Latin
    characters, the binlog content is misinterpreted,
    which leads to replication failure.

    Analysis & Fix :
    ================
    1) Parser reads the table name from the query and converts
    it to standard charset(utf8) and stores it in table_name variable.
    When drop table query is regenerated with the same table_name
    variable, it should be converted back to the original charset
    from standard charset(utf8).

    2) A Latin character takes two bytes in utf8. The limit
    for an identifier is 64 characters. SYSTEM_CHARSET_MBMAXLEN is set to '3'.
    So a tablename/dbname may contain up to 3 * 64 bytes.
    Hence the assert is changed to
    (M_TBLLEN <= NAME_CHAR_LEN*SYSTEM_CHARSET_MBMAXLEN)

    3) db_len in the binlog event header takes 1 byte.
       db_len ranges from 0 to 192 bytes (3 * 64).
       While reading the db_len from the event, the server
       was casting to uint instead of uchar, which led
       to a bad db_len. This problem is fixed by changing the
       cast type to uchar.
2015-12-01 15:38:11 +05:30
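A small illustration of item 3, assuming the header byte is read through a plain `char` pointer. On platforms where `char` is signed, converting the byte straight to `unsigned` sign-extends values of 128 and above; converting through `unsigned char` first keeps the intended 0 to 255 range. The function names are hypothetical; the constants repeat the values given in the commit message.

```cpp
// Values quoted in the commit message.
constexpr unsigned NAME_CHAR_LEN = 64;
constexpr unsigned SYSTEM_CHARSET_MBMAXLEN = 3;
static_assert(NAME_CHAR_LEN * SYSTEM_CHARSET_MBMAXLEN == 192,
              "an identifier may need up to 192 bytes, so db_len can exceed 127");

// Buggy: a byte >= 128 read through a signed char becomes negative and
// is then sign-extended when converted to unsigned.
unsigned db_len_buggy(const char* header, int offset) {
  return static_cast<unsigned>(header[offset]);
}

// Fixed: convert to unsigned char first, so the result is always 0..255.
unsigned db_len_fixed(const char* header, int offset) {
  return static_cast<unsigned char>(header[offset]);
}
```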
Monty
654547b5b4 Fixed problems found by buildbot:
- Better error from check_slave_param
- Better error message from TokuDB if it can't be compiled.
- Marked rpl_mixed_drop_create_temp_table and
  rpl_stm_drop_create_temp_table as big tests to stop timeout
  failures on power8
- Added sync_slave_with_master to semisync_future-7591 to
  ensure that slave is up to date with master before calling
  rpl_end.
- Disabled compiler warnings from connect and mroonga and on
  MacOSX.

Mroonga:
- Fixed bug when testing if file is a normal file that can be deleted
- Marked a lot of date and datetime tests to not run on macosx.
  This is because mktime() can't handle negative years and this
  restricts mroonga so that it can only store dates after the year 1900.
2015-11-27 02:06:58 +02:00
Monty
f813a00029 Fixed failing test cases and compiler warnings found by buildbot
- Added some extra commands to rpl_start_stop to ensure that the
  IO thread has connected to the master before we shut down the server.
- if signal returns signalhandler_t, use this with the alarm code
- Added missing tests to sys_vars
- Fixed some possible overflow bugs in tabxml.cpp
2015-11-24 20:04:12 +02:00
Venkatesh Duggirala
bb56c30ad7 Bug#17047208 REPLICATION DIFFERENCE FOR MULTIPLE TRIGGERS
Problem & Analysis: If a DML statement invokes a trigger or a
    stored function that inserts into an AUTO_INCREMENT column,
    that DML has to be marked as an 'unsafe' statement. If the
    tables are locked in the transaction prior to the DML statement
    (using LOCK TABLES), then the same statement is not marked as
    'unsafe'. The unsafeness check is guarded by
    if (!thd->locked_tables_mode). Hence if we lock the tables
    prior to the DML statement, it does *not* enter this if
    condition, and the statement is not marked as unsafe.

    Fix: Irrespective of the locked_tables_mode value, the unsafeness
    check should be done. With this patch, the code is moved
    out to the 'decide_logging_format()' function, where all these checks
    happen, and also without 'if(!thd->locked_tables_mode)'.
    Along with the specified test case in the bug scenario
    (BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS), we also identified that the
    other cases BINLOG_STMT_UNSAFE_AUTOINC_NOT_FIRST,
    BINLOG_STMT_UNSAFE_WRITE_AUTOINC_SELECT, BINLOG_STMT_UNSAFE_INSERT_TWO_KEYS
    are also guarded by thd->locked_tables_mode, which is not right. All
    of those checks were also moved to the 'decide_logging_format()' function.
2015-11-19 13:59:27 +05:30
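A before/after sketch of this fix, assuming the unsafeness decision can be reduced to a single input flag. The enum values reuse the names from the commit message, but their bit values and the `StmtInfo`, `unsafe_flags_before()` and `unsafe_flags_after()` names are hypothetical.

```cpp
#include <cstdint>

// Names borrowed from the commit message; bit values are made up.
enum UnsafeFlag : std::uint32_t {
  BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS      = 1u << 0,
  BINLOG_STMT_UNSAFE_AUTOINC_NOT_FIRST    = 1u << 1,
  BINLOG_STMT_UNSAFE_WRITE_AUTOINC_SELECT = 1u << 2,
  BINLOG_STMT_UNSAFE_INSERT_TWO_KEYS      = 1u << 3,
};

struct StmtInfo {
  bool invokes_autoinc_trigger = false;  // trigger/function inserts into AUTO_INCREMENT
  bool locked_tables_mode = false;       // LOCK TABLES is active
};

// Before: the check sat inside if (!thd->locked_tables_mode).
std::uint32_t unsafe_flags_before(const StmtInfo& s) {
  std::uint32_t flags = 0;
  if (!s.locked_tables_mode && s.invokes_autoinc_trigger)
    flags |= BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS;
  return flags;
}

// After: the check runs unconditionally, as in decide_logging_format().
// The other three flags would likewise be computed here.
std::uint32_t unsafe_flags_after(const StmtInfo& s) {
  std::uint32_t flags = 0;
  if (s.invokes_autoinc_trigger)
    flags |= BINLOG_STMT_UNSAFE_AUTOINC_COLUMNS;
  return flags;
}
```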
Monty
f383cbcb03 Added some selects to rpl_parallel2.test to find out where it fails in buildbot 2015-11-18 14:46:30 +02:00