mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

4561 Commits

Author SHA1 Message Date
Marko Mäkelä
5098d708a0 Merge 10.2 into 10.3 2019-11-12 16:42:58 +02:00
Marko Mäkelä
2350066e63 Merge 10.1 into 10.2 2019-11-12 14:36:37 +02:00
Sujatha
7df07c7666 MDEV-20953: binlog_encryption.rpl_corruption failed in buildbot due to wrong error code
Problem:
========
CURRENT_TEST: binlog_encryption.rpl_corruption

mysqltest: In included file "./include/wait_for_slave_io_error.inc":
...
At line 72: Slave stopped with wrong error code
**** Slave stopped with wrong error code: 1743 (expected 1595,1913) ****

Analysis:
========
The test emulates corruption at various stages of replication, for example
in the binlog file, in the network, and in the relay log. It verifies that
all corruption cases are handled through appropriate error messages.

The test cases which emulate network failure expect the following errors:
--ER_SLAVE_RELAY_LOG_WRITE_FAILURE (1595)
--ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE (1743)

Ideally the test should expect error codes 1595 and 1743, but it actually
waits on the incorrect pair 1595,1913.

Fix:
===
Added the appropriate error code for 'ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE':
replaced 1913 with 1743.
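A minimal sketch of the corrected wait, assuming the standard
$slave_io_errno parameter of include/wait_for_slave_io_error.inc:

  --let $slave_io_errno= 1595,1743
  --source include/wait_for_slave_io_error.inc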
2019-11-12 16:31:08 +05:30
Andrei Elkin
40e65e878e rpl_semi_sync_gtid_reconnect results merge 2019-11-11 21:12:14 +02:00
Andrei Elkin
d103c5a489 merge 10.2->10.3 with conflict resolutions 2019-11-11 16:28:21 +02:00
Andrei Elkin
26fd880d5e manual merge 10.1->10.2 2019-11-11 16:03:43 +02:00
Andrei Elkin
13db50fc03 MDEV-19376 Repl_semi_sync_master::commit_trx assertion failure: ... || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)
The assert indicates that the current transaction was left uncleaned in the
semisync master's cache when it was signaled to proceed upon receiving its
ack.

The missed cleanup turns out to be caused by a flaw in the GTID connect
mode: the binlog file *name* of the last received event, as submitted by
the connecting slave, was adopted into
Repl_semi_sync_master::m_reply_file_name as part of semisync
initialization.

Notice that the initialization still refines the *position* part of the
submitted binlog coordinates. This master-side refinement of filename:pos
is specific to the GTID connect mode and serves to compute the latest
binlog file from which to resume feeding the slave.
Effectively, in the GTID connect mode the computed resumption filename:pos
may be smaller than the submitted one, in which case a transaction
committing after the connect may be logged with a filename:pos that is also
less than the submitted coordinates, and that triggers the assert.

Fixed by making the semisync initialization use the refined filename:pos,
which is guaranteed to be less than the binlog coordinates of any newly
generated transaction.
2019-11-10 16:16:37 +02:00
Oleksandr Byelkin
3ad37ed0eb Merge 10.4 into 10.5 2019-11-07 08:52:30 +01:00
Marko Mäkelä
ec40980ddd Merge 10.3 into 10.4 2019-11-01 15:23:18 +02:00
Marko Mäkelä
1c022aaf58 MDEV-20487: Fix a test
The test rpl.rpl_failed_drop_tbl_binlog exercises a scenario where
the adaptive hash index is enabled. Try to explicitly enable the
adaptive hash index for this test, and skip the test if it does not
succeed (that is, if the server was not built WITH_INNODB_AHI=ON).
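One possible shape of such a guard in mysqltest (a sketch; the skip message
is illustrative):

  --error 0, ER_UNKNOWN_SYSTEM_VARIABLE
  SET GLOBAL innodb_adaptive_hash_index= ON;
  if (`SELECT COUNT(*) = 0 FROM information_schema.global_variables WHERE variable_name = 'innodb_adaptive_hash_index'`)
  {
    --skip Test requires a server built WITH_INNODB_AHI=ON
  }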
2019-10-28 21:28:21 +02:00
Michael Widenius
06d2e1d828 read-only slave using statement replication should replicate tmp tables
Relates to MDEV-17863 DROP TEMPORARY TABLE creates a transaction in
binary log on read only server

Other things:
- Fixed so that INSERT INTO normal_table SELECT FROM tmp_table is
  replicated as row events if tmp_table doesn't exist on the slave
  (sketched below).
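A hedged illustration of the scenario (hypothetical table names), with
binlog_format=STATEMENT on the master and a read-only slave:

  CREATE TEMPORARY TABLE tmp_t (a INT);  # must still replicate to the read-only slave
  INSERT INTO tmp_t VALUES (1);
  INSERT INTO t1 SELECT * FROM tmp_t;    # row events if tmp_t is missing on the slave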
2019-10-21 17:17:09 +03:00
Marko Mäkelä
d04f2de80a Merge 10.4 into 10.5 2019-10-11 08:41:36 +03:00
Marko Mäkelä
c11e5cdd12 Merge 10.3 into 10.4 2019-10-10 11:19:25 +03:00
sachin
1e0f09cacb MDEV-16239 Many test in rpl suite fails
Fix rpl_skip_error test.
  We can't reset Slave_skipped_errors (even with FLUSH STATUS), so instead
of the absolute slave_skipped_errors value we look at its delta (see the
sketch after this list).
Fix rpl.rpl_binlog_errors and binlog_encryption.rpl_binlog_errors
  We create $load_file and $load_file2 but never remove them.
Fix rpl_000011.test
  Use the delta instead of the real value, since FLUSH STATUS won't flush
a LONGLONG variable.
Fix rpl_row_find_row_debug
  Instead of searching the whole log_error_ file we use
search_pattern_in_file, which runs the pattern search only on the latest
test run instead of the full file.
Fix rpl_ip_mix and rpl_ip_mix2
  We should call RESET SLAVE ALL because we also want to reset master_host;
otherwise SHOW SLAVE STATUS won't be empty, making --repeat N a failure.
Fix rpl_rotate_logs
  First we have to remove the master.info file (cleanup), and second we
have to call RESET SLAVE ALL, because otherwise we won't read the
master.info file, as we already have the master config in memory. That
would make START SLAVE pass, while it should fail because the file's
permission is 000.
Fix circular_serverid0 test
  The reason is that ++dbug_rows_event_count == 2 in queue_event does not
take --repeat into account, so dbug_rows_event_count is now reset in the
if body.
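A sketch of the delta technique for rpl_skip_error, using standard
mysqltest helpers (variable names are illustrative):

  --let $skipped_before= query_get_value(SHOW GLOBAL STATUS LIKE 'Slave_skipped_errors', Value, 1)
  # ... run the statements whose errors are expected to be skipped ...
  --let $skipped_after= query_get_value(SHOW GLOBAL STATUS LIKE 'Slave_skipped_errors', Value, 1)
  --eval SELECT $skipped_after - $skipped_before AS slave_skipped_errors_delta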
2019-10-08 13:34:25 +05:30
Alexander Barkov
1ae09ec863 Merge remote-tracking branch 'origin/10.4' into 10.5 2019-10-01 11:44:27 +04:00
Alexander Barkov
dc588e3d3f Merge remote-tracking branch 'origin/10.3' into 10.4 2019-10-01 10:45:52 +04:00
Alexander Barkov
7e44c455f4 Merge remote-tracking branch 'origin/10.2' into 10.3 2019-10-01 09:37:40 +04:00
Alexander Barkov
f203245e9e Merge remote-tracking branch 'origin/10.1' into 10.2 2019-10-01 07:11:54 +04:00
Sujatha
9b80f9300d MDEV-20645: Replication consistency is broken as workers miss the error notification from an earlier failed group.
Analysis:
========
In general, suppose there are three groups:
1 - Inserts 32, which fails due to a local entry '32' on the slave.
2 - Inserts 33
3 - Inserts 34

Each group considers itself a waiter, and it waits for the prior group, its
'waitee'. This is done in 'register_wait_for_prior_event_group_commit'. If
no other parallel group is being scheduled, there will be no waitee.

Let us assume 3 groups are being scheduled in parallel.

3 -> waits for 2 -> waits for 1

Upon completion, '1' checks whether there is any registered subsequent
waiter. If so, it wakes up that waiter with its execution status. This
execution status is stored in wakeup_error.

If '1' failed, it sends the corresponding wakeup_error to '2'. Then '2'
aborts and propagates the error to '3', so all further commits are aborted.
This mechanism works only when all transactions reach a stage where they
are waiting for their prior commit to complete.

In optimistic mode the following scenario occurs.

1,2,3 are scheduled in parallel.

3 - Reaches the group commit code and waits for 2 to complete.
1 - Errors out and sets stop_on_error_sub_id=1.

When a group's execution results in an error, its sub_id is stored in
'stop_on_error_sub_id'. Any new group queued for execution checks whether
its sub_id is > stop_on_error_sub_id. If so, its execution is skipped,
since a prior group's execution failed, and 'skip_event_group=1' is set.
Since the execution of the SQL thread is about to stop, we just skip
execution of all the following event groups. We still do all the normal
waiting and wakeup processing between the event groups as a simple way to
ensure that everything is stopped and cleaned up correctly.

Upon its error, transaction '1' checks for registered waiters. Since there
are none, it simply goes away.

2 - Starts execution and checks whether it has a waitee.

Since wait_commit_sub_id == entry->last_committed_sub_id, no waitee is set.

Secondly, 'entry->stop_on_error_sub_id' was set by the execution of '1'.
The 'handle_parallel_thread' code now checks whether the current group's
'sub_id' is greater than the one stored in 'stop_on_error_sub_id'.

Since it is, 'skip_event_group=true' is set and 'wait_for_prior_commit' is
simply called to wake up all waiters. Group '2' didn't have any waitee and
its execution was skipped, hence its wakeup_error=0. It sends a positive
wakeup signal to '3', which commits. This results in a missed transaction:
33 is missed while 34 is committed.

Fix:
===
When a worker learns that an earlier transaction's execution has failed and
it should not proceed further, it should mark its own execution status as
failed so that it alerts its followers to abort as well.
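A hedged sketch reproducing the scenario (hypothetical table; slave side
shown first):

  # Slave: conflicting local row, optimistic parallel apply
  SET GLOBAL slave_parallel_mode= optimistic;
  SET GLOBAL slave_parallel_threads= 3;
  INSERT INTO t1 VALUES (32);
  # Master: three groups that may be applied in parallel on the slave
  INSERT INTO t1 VALUES (32);   # group 1: duplicate key error on the slave
  INSERT INTO t1 VALUES (33);   # group 2: must not be silently skipped
  INSERT INTO t1 VALUES (34);   # group 3
  # Before the fix the slave could commit 34 while missing 33.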
2019-09-30 13:22:37 +05:30
Marko Mäkelä
1333da90b5 Merge 10.4 into 10.5 2019-09-24 10:07:56 +03:00
Marko Mäkelä
5a92ccbaea Merge 10.3 into 10.4
Disable MDEV-20576 assertions until MDEV-20595 has been fixed.
2019-09-23 17:35:29 +03:00
Sujatha
90a9c4cae7 MDEV-20217: Semi_sync: Last_IO_Error: Fatal error: Failed to run 'after_queue_event' hook
Fix:
===
Implemented upstream fix.

commit 7d3d0fc303
Author: He Zhenxing <zhenxing.he@sun.com>

Backport Bug#45852 Semisynch: Last_IO_Error: Fatal error: Failed
to run 'after_queue_event' hook

Errors when sending a reply to the master should never cause the IO thread
to stop, because the master can fall back to asynchronous replication if it
does not get a reply from the slave.

The problem is fixed by deliberately ignoring the return value of
slave_reply.
2019-09-16 15:45:24 +05:30
Marko Mäkelä
4081b7b27a Merge 10.4 into 10.5 2019-09-06 17:16:40 +03:00
Marko Mäkelä
780d2bb8a7 Merge 10.4 into 10.5 2019-09-06 14:25:20 +03:00
Sergei Golubchik
244f0e6dd8 Merge branch '10.3' into 10.4 2019-09-06 11:53:10 +02:00
Marko Mäkelä
537f8594a6 Merge 10.2 into 10.3 2019-09-04 17:52:04 +03:00
Sachin
53ec9047c9 MDEV-20137 rpl.mdev_17588 fails in buildbot with "Table doesn't exist"
Fix the test case.
2019-09-04 17:52:43 +05:30
Sachin
0e38cd37c7 MDEV-20137 rpl.mdev_17588 fails in buildbot with "Table doesn't exist"
Fix the test case.
2019-09-04 17:45:25 +05:30
Monty
a071e0e029 Merge branch '10.2' into 10.3 2019-09-03 13:17:32 +03:00
Monty
9cba6c5aa3 Updated mtr files to support different compiled-in options
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc

Changes:
- Don't print out the value of system variables, as one can't depend on
  them to be constants.
- Don't set global variables to 'default', as the default may not be the
  same as the value the test was started with if there was an additional
  option file. Instead, save the original value and restore it at the end
  of the test (see the sketch below).
- Tests that depend on the latin1 character set should include
  default_charset.inc or set the character set to latin1.
- Tests that depend on the original optimizer switch should include
  default_optimizer_switch.inc.
- Tests that depend on the value of a specific system variable should set
  it in the test (like optimizer_use_condition_selectivity).
- Split subselect3.test into subselect3.test and subselect3.inc to make it
  easier to set and reset system variables.
- Added .opt files for tests that require specific options that could be
  changed by external configuration files.
- Fixed result files in rocksdb & tokudb that had not been updated for
  a while.
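For example, the save-and-restore pattern for a system variable looks like
this (a sketch with an arbitrary variable):

  SET @save_optimizer_switch= @@optimizer_switch;
  SET optimizer_switch= 'index_merge=off';
  # ... test body ...
  SET optimizer_switch= @save_optimizer_switch;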
2019-09-01 19:17:35 +03:00
Marko Mäkelä
db4a27ab73 Merge 10.3 into 10.4 2019-08-31 06:53:45 +03:00
Marko Mäkelä
e41eb044f1 Merge 10.2 into 10.3 2019-08-28 10:18:41 +03:00
Sujatha
e7b71e0daa MDEV-19925: Column ... cannot be converted from type 'varchar(20)' to type 'varchar(20)'
Cherry picking:
Bug#25135304: RBR: WRONG FIELD LENGTH IN ERROR MESSAGE
commit 47bd3f7cf3c8518f62b1580ec65af2ba7ac13b95

Description:
============
In row-based replication, when replicating from a table with a field whose
character set is utf8mb3 to the same table with the same field's character
set being utf8mb4, a confusing error message is produced:

For VARCHAR: VARCHAR(1) 'utf8mb3' to VARCHAR(1) 'utf8mb4'
"Column 0 of table 'test.t1' cannot be converted from type 'varchar(3)' to
type 'varchar(1)'"

A similar issue exists for the CHAR type.

Issues with respect to BLOB types:

For BLOB: LONGBLOB to TINYBLOB - the error message displays an incorrect
blob type.
"Column 0 of table 'test.t1' cannot be converted from type 'tinyblob' to
type 'tinyblob'"

For BINARY to BINARY - the error message displays an incorrect type for the
master-side field.
"Column 0 of table 'test.t' cannot be converted from type 'char(1)' to type
'binary(10)'"
A similar issue exists for the VARBINARY type; it is displayed as 'VARCHAR'.

Analysis:
=========
In row-based replication, charset information is not sent as part of the
metadata from master to slave.

For a VARCHAR field, its character length is converted into the equivalent
number of octets/bytes and stored internally. When displaying the data to
the user, it is converted back to the original character length.

For example:
VARCHAR(2) with charset utf8mb3 is stored as 2*3 = VARCHAR(6);
when displayed to the user it becomes VARCHAR(6)/3 = VARCHAR(2).

At present the internally converted octet length is sent from master to
slave without the charset information. On the slave side, if the type
conversion fails, the 'show_sql_type' function is used to get type-specific
information from the metadata. Since no charset information is available,
the field type is displayed as VARCHAR(6).

This results in a confusing error message.

For CHAR fields:
CHAR(1) with utf8mb3 -> CHAR(3)
CHAR(1) with utf8mb4 -> CHAR(4)

The 'show_sql_type' function, which retrieves type information from the
metadata, uses (bytes / local charset length) to get the actual character
length. If the slave's charset is 'utf8mb4', then

CHAR(3/4) -> CHAR(0)
CHAR(4/4) -> CHAR(1).

This results in a confusing error message.

Analysis of the BLOB type issue:

A BLOB's length is represented in two forms:
1. The actual length, i.e.
  (length < 256)   type= MYSQL_TYPE_TINY_BLOB;
  (length < 65536) type= MYSQL_TYPE_BLOB; ...

2. The packlength - the number of bytes used to represent the length of
   the blob:
  1 - tinyblob
  2 - blob ...

In row-based replication only the packlength is written to the binary log.
On the slave side this packlength is interpreted as the actual length of
the blob. Hence the length is always < 256 and the type is displayed as
tiny blob.

Analysis of the BINARY to BINARY type issue:
The character set information is needed to identify a field's type as char
or binary. Since the master-side character set information is not available
on the slave, both binary and char fields are displayed as char.

Fix:
===
For CHAR and VARCHAR fields, display their length in octets for both the
source and target fields. For the target field, display the charset
information if it is relevant.

For blob types, changed the code to use the packlength and display the
appropriate blob type in the error message.

For binary and varbinary fields, use the slave-side character set as the
reference to map them to binary or varbinary fields.
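A hedged illustration of the originally confusing case (hypothetical
table):

  # Master:
  CREATE TABLE t1 (c1 VARCHAR(1) CHARACTER SET utf8mb3);
  # Slave:
  CREATE TABLE t1 (c1 VARCHAR(1) CHARACTER SET utf8mb4);
  # Row-based replication used to report:
  #   Column 0 ... cannot be converted from type 'varchar(3)' to 'varchar(1)'
  # After the fix both lengths are reported in octets, with charset
  # information for the target field where relevant.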
2019-08-27 13:05:04 +05:30
Monty
97dd057702 Fixed issues when running mtr with --valgrind
- Note that some issues were also fixed in 10.2 and 10.4. I fixed them
  here as well, to be able to continue making 10.5 valgrind-safe.
- Disable connection-thread warnings during shutdown.
2019-08-23 22:03:54 +02:00
Marko Mäkelä
67ddb6507d Merge 10.4 into 10.5 2019-08-16 14:35:32 +03:00
Marko Mäkelä
c221bcdce7 Merge 10.3 into 10.4 2019-08-16 10:51:20 +03:00
Sujatha
828191b6a0 MDEV-20348: DROP TABLE IF EXISTS killed on master but was replicated
Merge branch '10.2' into 10.3
2019-08-15 14:09:53 +05:30
Sujatha
29e560cdf3 MDEV-20348: DROP TABLE IF EXISTS killed on master but was replicated
Problem:
=======
DROP TABLE IF EXISTS was killed. The table still exists on
the master but the DDL was still logged.

Analysis:
=========
During the execution of a DROP TABLE command, the "ha_delete_table" call is
invoked to delete the table. If the query is killed at this point, the kill
is not handled within the code. This results in two issues:
1) The table which was not dropped still gets written into the binary log.
2) The code continues further even upon receiving 'KILL QUERY'.

Fix:
===
Upon receiving the KILL command, the query should stop its current
execution. Tables which were successfully dropped prior to the KILL command
should be included in the binary log.
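A hedged sketch of the intended post-fix behaviour (hypothetical tables):

  DROP TABLE IF EXISTS t1, t2, t3;  # KILL QUERY arrives after t1 is dropped
  # Execution stops at the kill; only the drop of t1 is written to the
  # binary log.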
2019-08-14 22:53:16 +05:30
Marko Mäkelä
1d15a28e52 Merge 10.3 into 10.4 2019-08-14 18:06:51 +03:00
Alexander Barkov
c1599821a5 Merge remote-tracking branch 'origin/10.4' into 10.5 2019-08-13 23:49:10 +04:00
Marko Mäkelä
65d48b4a7b Merge 10.2 to 10.3 2019-08-13 19:28:51 +03:00
Marko Mäkelä
624dd71b94 Merge 10.4 into 10.5 2019-08-13 18:57:00 +03:00
Alexander Barkov
95cdc1ca5f Merge commit '43882e764d6867c6855b1ff057758a3f08b25c55' into 10.4 2019-08-13 11:42:31 +04:00
Marko Mäkelä
609ea2f37b MDEV-17614: After-merge fix
MDEV-17614 flags INSERT…ON DUPLICATE KEY UPDATE unsafe for statement-based
replication when there are multiple unique indexes. This correctly fixes
something whose attempted fix in MySQL 5.7
in mysql/mysql-server@c93b0d9a97
caused lock conflicts. That change was reverted in MySQL 5.7.26
in mysql/mysql-server@066b6fdd43
(with a substantial amount of other changes).

In MDEV-17073 we already disabled the unfortunate MySQL change when
statement-based replication was not being used. Now, thanks to MDEV-17614,
we can actually remove the change altogether.

This reverts commit 8a346f31b9 (MDEV-17073)
and mysql/mysql-server@c93b0d9a97 while
keeping the test cases.
2019-08-12 18:50:54 +03:00
Marko Mäkelä
be33124c9d Merge 10.1 into 10.2 2019-08-12 18:25:35 +03:00
Monty
fe8181aca1 Fixed issues found by valgrind
- mysqltest didn't free read_command_buf
- wait_for_slave_param did write different things to the log if valgrind
  was used.
- The table open cache should not write the initial variable value, as it
  can depend on the configuration or on whether valgrind is used
- A variable in GetResult was used uninitialized
2019-08-12 15:41:14 +03:00
Sachin
284c72eacf MDEV-17614 INSERT on dup key update is replication unsafe
Problem:-
When mysql executes INSERT ... ON DUPLICATE KEY UPDATE, the storage engine
checks if the inserted row would generate a duplicate key error. If yes, it
returns the existing row to mysql, mysql updates it and sends it back to
the storage engine. When the table has more than one unique or primary key,
this statement is sensitive to the order in which the storage engine checks
the keys. Depending on this order, the storage engine may return different
rows to mysql, and hence mysql can update different rows. The order in
which the storage engine checks keys is not deterministic. For example,
InnoDB checks keys in an order that depends on the order in which indexes
were added to the table: the first added index is checked first. So if
master and slave have added indexes in different orders, the slave may go
out of sync.

Solution:-
Make INSERT...ON DUPLICATE KEY UPDATE unsafe when using statement or mixed
format and there is more than one unique key. There are two exceptions:
  1. An auto-increment key is not counted, because InnoDB will take a gap
    lock for the failed insert and a concurrent insert will get the next
    increment value. But if the user supplies the auto-inc value, it can
    be unsafe.
  2. Only unique keys for which insertion is performed are counted.

This patch also addresses bug id #72921.
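A minimal illustration of a statement that is now flagged unsafe under
statement/mixed binlog format (hypothetical table with two unique keys):

  CREATE TABLE t1 (a INT UNIQUE, b INT UNIQUE, c INT);
  INSERT INTO t1 VALUES (1, 1, 0)
    ON DUPLICATE KEY UPDATE c= c + 1;  # updated row depends on key-check order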
2019-08-09 19:36:56 +05:30
Thirunarayanan Balathandayuthapani
47f8a18fec MDEV-20247 Replication hangs with "preparing" and never starts
- The commit ab6dd77408 wrongly set the
condition inside innobase_srv_conc_enter_innodb(). The problem is that
InnoDB makes the thread sleep indefinitely if it is a replication
slave thread.

Thanks to Sujatha Sivakumar for contributing the replication test case.
2019-08-07 12:35:04 +05:30
Sujatha
eef7540405 MDEV-18930: Failed CREATE OR REPLACE TEMPORARY not written into binary log makes data on master and slave diverge
Problem:
=======
A failed CREATE OR REPLACE TEMPORARY TABLE statement, which dropped the
table but failed at a later stage of creating the temporary table, is not
written to the binary log in row-based replication. This causes the slave
to diverge.

Analysis:
========
CREATE OR REPLACE statements work as shown below.

CREATE OR REPLACE TABLE table_name (a int);
is basically the same as:

DROP TABLE IF EXISTS table_name;
CREATE TABLE table_name (a int);

Hence every CREATE OR REPLACE TABLE command which dropped a table should be
written to the binary log, even when the following CREATE TABLE part fails.
To achieve this, during the execution of a CREATE OR REPLACE command, the
'thd->log_current_statement' flag is set when a table is dropped. When
table creation results in an error within the 'mysql_create_table' code,
the error-handling part looks for this flag. If it is set, the failed
CREATE OR REPLACE statement is written into the binary log in spite of the
error. This ensures that the slave doesn't diverge from the master. In
row-based replication the error-handling code returns very early if the
table is temporary, based on the assumption that temporary tables are not
replicated in row-based replication.

It fails to handle the case where a temporary table was created under
statement-based replication at an earlier stage and the binary log format
was then changed to row because of an unsafe statement. In this case, when
a CREATE OR REPLACE statement is executed on this temporary table, the
table is dropped but the query is not written to the binary log. Hence the
slave diverges.

Fix:
===
In the error-handling code, check the return status of the create table
operation. If it is successful, the replication mode is row-based, and the
table is temporary, then return. Otherwise proceed further to the code
which checks the thd->log_current_statement flag and does the appropriate
logging.
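A hedged sketch of the diverging scenario under MIXED format (the unsafe
statement and the failing CREATE are only indicative):

  SET SESSION binlog_format= MIXED;
  CREATE TEMPORARY TABLE t1 (a VARCHAR(64));  # logged as a statement
  INSERT INTO t1 VALUES (UUID());             # unsafe: logging switches to row
  CREATE OR REPLACE TEMPORARY TABLE t1 (a INT, a INT);  # DROP succeeds, CREATE fails
  # Before the fix nothing was logged here, so the slave kept its t1 and
  # diverged.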
2019-08-05 14:34:31 +05:30
Alexey Botchkov
c6efbc543d MDEV-17544 No warning when trying to name a primary key constraint.
Warning added.
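For illustration, a statement of the kind that now raises a warning (the
exact warning text is not quoted here):

  CREATE TABLE t1 (a INT, CONSTRAINT my_pk PRIMARY KEY (a));
  # the name 'my_pk' is ignored: a primary key is always named PRIMARY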
2019-07-30 21:57:48 +04:00