1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-30 05:23:50 +03:00
Commit Graph

3892 Commits

Author SHA1 Message Date
Sujatha
b0d38448a1 MDEV-25958: rpl_semi_sync_fail_over.test fails in buildbot
Post push fix.

Added a missing WAIT_FOR signal line.
2021-08-19 16:35:44 +05:30
Sujatha
475f69b985 MDEV-25958: rpl_semi_sync_fail_over.test fails in buildbot
Analysis:
========
In case multi binlog truncation scenario debug sync points are in the
following order.

Two inserts are done on master as shown below.

INSERT INTO t1 VALUES (4, REPEAT("x", 4100)
commit_after_release_LOCK_after_binlog_sync

INSERT INTO t1 VALUES (5, REPEAT("x", 4100)
commit_after_release_LOCK_log

First insert debug sync ensures that transaction is synced to binlog and
not committed but it reached slave through semi sync.

Second insert debug sync ensures that transaction is synced to binlog and
not committed. It doesn't ensure that 'INSERT 5' reached slave.

Most of the times INSERT 5 reaches slave, hence when it is promoted as
master it sends 4,5 to slave. But occasionally 5 may not reach slave in
those cases post recovery master will have only 4. When row 6 is inserted
Master has 4-6 and Slave has 4,5,6.

This results in test failure.

Fix:
===
For the first insert use 'commit_before_get_LOCK_commit_ordered' debug sync
point, it will ensure that binlog was sent to slave and slave has
acknowledged the receipt. Now enable debug code such that when the next
transaction is written to binary log, dump thread will read and send it
across the network and notify the server to be get killed. Insert row 5
and wait for notification from dump thread. Kill the server. This ensures
that both 4 and 5 have reached the semi-sync slave.

Added a new test case:
Insert two rows on master such that first is present in master's binlog and
reached semi sync slave. Second insert should be flushed to binlog but not
sent to slave. Now crash and fail over to slave. The promoted master will send
the extra transaction to slave.
2021-08-19 11:59:39 +05:30
Oleksandr Byelkin
6efb5e9f5e Merge branch '10.5' into 10.6 2021-08-02 10:11:41 +02:00
Oleksandr Byelkin
ae6bdc6769 Merge branch '10.4' into 10.5 2021-07-31 23:19:51 +02:00
Oleksandr Byelkin
7841a7eb09 Merge branch '10.3' into 10.4 2021-07-31 22:59:58 +02:00
Sergei Golubchik
2575eaa502 dissapear -> disappear 2021-07-26 12:40:01 +02:00
Marko Mäkelä
641f09398f Merge 10.5 into 10.6 2021-07-22 10:11:08 +03:00
Daniel Black
1519013f51 mtr: aix - stack-trace is optional 2021-07-22 15:22:47 +10:00
Sergei Golubchik
6190a02f35 Merge branch '10.2' into 10.3 2021-07-21 20:11:07 +02:00
Aleksey Midenkov
22e4baaa5d MDEV-25595 DROP part of failed CREATE OR REPLACE is not written into binary log
Do log_drop_table() in case of failed mysql_prepare_create_table().
2021-07-06 00:47:41 +03:00
Alice Sherepa
8bb225d10a MDEV-16708: fixed test rpl_innodb_bug28430, failing with --ps 2021-06-18 18:03:26 +02:00
Dmitry Shulga
510662e81b MDEV-16708: more fixes to test cases 2021-06-17 19:30:24 +02:00
Dmitry Shulga
97e8d27bed MDEV-16708: fix in test failures(added --enable_prepared_warnings/--disable_prepared_warnings) 2021-06-17 19:30:24 +02:00
Sujatha
6c39eaeb12 MDEV-21117: refine the server binlog-based recovery for semisync
Problem:
=======
When the semisync master is crashed and restarted as slave it could
recover transactions that former slaves may never have seen.
A known method existed to clear out all prepared transactions
with --tc-heuristic-recover=rollback does not care to adjust
binlog accordingly.

Fix:
===
The binlog-based recovery is made to concern of the slave semisync role of
post-crash restarted server.
No changes in behavior is done to the "normal" binloggging server
and the semisync master.

When the restarted server is configured with
  --rpl-semi-sync-slave-enabled=1
the refined recovery attempts to roll back prepared transactions
and truncate binlog accordingly.
In case of a partially committed (that is committed at least
in one of the engine participants) such transaction gets committed.
It's guaranteed no (partially as well) committed transactions
exist beyond the truncate position.
In case there exists a non-transactional replication event
(being in a way a committed transaction) past the
computed truncate position the recovery ends with an error.

As after master crash and failover to slave, the demoted-to-slave
ex-master must be ready to face and accept its own (generated by)
events, without generally necessary --replicate-same-server-id.
So the acceptance conditions are relaxed for the semisync slave
to accept own events without that option.
While gtid_strict_mode ON ensures no duplicate transaction can be
(re-)executed the master_use_gtid=none slave has to be
configured with --replicate-same-server-id.

*NOTE* for reviewers.

This patch does not handle the user XA which is done
in next git commit.
2021-06-11 19:49:39 +03:00
Rucha Deodhar
4e19539c14 MDEV-22189: Change error messages inside code to have mariadb instead of
mysql

Fix: Changed error messages, rerecorded results and changed other relevant
files.
2021-05-24 11:38:13 +05:30
Monty
47010ccffa MDEV-23842 Atomic RENAME TABLE
- Major rewrite of ddl_log.cc and ddl_log.h
  - ddl_log.cc described in the beginning how the recovery works.
  - ddl_log.log has unique signature and is dynamic. It's easy to
    add more information to the header and other ddl blocks while still
    being able to execute old ddl entries.
  - IO_SIZE for ddl blocks is now dynamic. Can be changed without affecting
    recovery of old logs.
  - Code is more modular and is now usable outside of partition handling.
  - Renamed log file to dll_recovery.log and added option --log-ddl-recovery
    to allow one to specify the path & filename.
- Added ddl_log_entry_phase[], number of phases for each DDL action,
  which allowed me to greatly simply set_global_from_ddl_log_entry()
- Changed how strings are stored in log entries, which allows us to
  store much more information in a log entry.
- ddl log is now always created at start and deleted on normal shutdown.
  This simplices things notable.
- Added probes debug_crash_here() and debug_simulate_error() to simply
  crash testing and allow crash after a given number of times a probe
  is executed. See comments in debug_sync.cc and rename_table.test for
  how this can be used.
- Reverting failed table and view renames is done trough the ddl log.
  This ensures that the ddl log is tested also outside of recovery.
- Added helper function 'handler::needs_lower_case_filenames()'
- Extend binary log with Q_XID events. ddl log handling is using this
  to check if a ddl log entry was logged to the binary log (if yes,
  it will be deleted from the log during ddl_log_close_binlogged_events()
- If a DDL entry fails 3 time, disable it. This is to ensure that if
  we have a crash in ddl recovery code the server will not get stuck
  in a forever crash-restart-crash loop.

mysqltest.cc changes:
- --die will now replace $variables with their values
- $error will contain the error of the last failed statement

storage engine changes:
- maria_rename() was changed to be more robust against crashes during
  rename.
2021-05-19 22:54:12 +02:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Marko Mäkelä
f09d33f521 Merge 10.5 into 10.6 2021-05-18 11:13:45 +03:00
Marko Mäkelä
cc2651b74c Merge 10.4 into 10.5 2021-05-18 09:21:59 +03:00
Marko Mäkelä
4240704abc Merge 10.3 into 10.4 2021-05-18 08:59:12 +03:00
Marko Mäkelä
ca3f497564 Merge 10.2 into 10.3, except MDEV-25682 2021-05-18 08:40:19 +03:00
Sujatha
410e3c1a9a MDEV-17515: GTID Replication in optimistic mode deadlock
Problem:
=======
In slave_parallel_mode=optimistic configuration, when admin commands and
DML operation on the same table are scheduled simultaneously for execution,
it results in lock conflict and slave server either hangs due to
deadlock or goes down with an assert.

Analysis:
========
Admin commands OPTIMIZE, REPAIR and ANALYZE are written to binary log as
ordinary transactions. When 'slave_parallel_mode' is 'optimistic' DMLs are
allowed to run in parallel. But these locks are not detected by parallel
replication deadlock detection-and-handling mechanism. At times they result
in deadlock or assertion.

Fix:
===
Flag admin commands as DDL in Gtid_log_event at the time of writing to
binary log. Add a new bit EXECUTED_TABLE_ADMIN_CMD to
'm_unsafe_rollback_flags'. During 'mysql_admin_table' command execution it
accepts a list of tables to be processed and executes them in a loop. Upon
successful execution enable 'EXECUTED_TABLE_ADMIN_CMD' bit in
thd->transaction.stmt_unsafe_rollback_flags. Gtid_log_event constructor
will notice this flag and mark the current transaction with 'FL_DDL' flag.
Gtid_log_events marked as FL_DDL will not be scheduled parallel execution,
on the slave. They will execute in isolation to prevent deadlocks.

Note: Removed the call to 'trans_commit_implicit' from 'mysql_admin_table'
function as 'mysql_execute_command' will take care of invoking
'trans_commit_implicit'.
2021-05-17 16:38:58 +05:30
Sachin Kumar
e607f3398c MDEV-25336 Parallel replication causes failed assert while restarting
Problem:- When slave is shutdown, we will get this assertion failure
sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0
&& next != 0' failed.

Solution:- In close_connections when we call threads.get() it resets to
prev and next to NULL. And in parallel worker thread(handle_rpl_parallel_thread)
calls unlink_not_visible_thd() which assert on prev and next being not NULL.
.unlink_not_visible_thd() should be always called first before threads.get()
is called. To make sure worker calls unlink_not_visible_thd() in
slave_prepare_for_shutdown() we are deactivating the  worker thread pool
which in turn will close all worker threads. Since this is already done in 10.4
and 10.5 I am backPorting MDEV-20821 and MDEV-22370 to 10.2. Mdev-22370
is improving the MDEV-20821 patch.
2021-05-14 11:50:12 +01:00
Andrei Elkin
3616640a31 MDEV-20821 parallel slave server shutdown hang
Parallel slave server shutdown found to be hanging in
close_connections() triggered by shutdown due to a slave worker thread
would not be notified to exit in case the worker was sitting idle.

Fixed with destroying the worker pool earlier that is in
slave_prepare_for_shutdown() when all their driver threads have already left.
A test file is added to simulate the bug condition as well as check
multi-sourced and not-idle worker cases.
2021-05-14 11:49:26 +01:00
Sujatha
fe9450676f MDEV-25502: rpl.rpl_perfschema_applier_status_by_worker failed in bb with: Test assertion failed
Problem:
=======
Test fails with 3 different symptoms
connection slave;
Assertion text: 'Last_Seen_Transaction should show .'
Assertion condition: '"0-1-1" = ""'
Assertion condition, interpolated: '"0-1-1" = ""'
Assertion result: '0'

connection slave;
Assertion text: 'Value returned by SSS and PS table for Last_Error_Number
                 should be same.'
Assertion condition: '"1146" = "0"'
Assertion condition, interpolated: '"1146" = "0"'
Assertion result: '0'

connection slave;
Assertion text: 'Value returned by PS table for worker_idle_time should be
                >= 1'
Assertion condition: '"0" >= "1"'
Assertion condition, interpolated: '"0" >= "1"'
Assertion result: '0'

Fix1:
====
Performance schema table's Last_Seen_Transaction is compared with 'SELECT
gtid_slave_pos'. Since DDLs are not transactional changes to user table and
gtid_slave_pos table are not guaranteed to be synchronous. To fix the
issue Gtid_IO_Pos value from SHOW SLAVE STATUS command will be used to
verify the correctness of Performance schema specific
Last_Seen_Transaction.

Fix2:
====
On error worker thread information is stored as part of backup pool. Access
to this backup pool should be protected by 'LOCK_rpl_thread_pool' mutex so
that simultaneous START SLAVE cannot destroy the backup pool, while it is
being queried by performance schema.

Fix3:
====
When a worker is waiting for events if performance schema table is queried,
at present it just returns the difference between current_time and
start_time.  This is incorrect. It should be worker_idle_time +
(current_time - start_time).

For example a worker thread was idle for 10 seconds and then it got events
to process. Upon completion it goes to idle state, now if the pfs table is
queried it should return current_idle time  + worker_idle_time.
2021-05-13 10:34:32 +05:30
Marko Mäkelä
f8665314d4 Merge 10.5 into 10.6 2021-05-10 11:42:04 +03:00
Marko Mäkelä
0e1437e147 Merge 10.4 into 10.5 2021-05-10 10:01:15 +03:00
Marko Mäkelä
8c73fab7f7 Merge 10.3 into 10.4 2021-05-10 09:52:01 +03:00
Marko Mäkelä
916b237b3f Merge 10.5 into 10.6 2021-05-07 15:00:27 +03:00
Oleksandr Byelkin
72753d2b65 Merge branch 'bb-10.3-release' into 10.3 2021-05-07 11:51:22 +02:00
Nikita Malyavin
3f55c56951 Merge branch bb-10.4-release into bb-10.5-release 2021-05-05 23:57:11 +03:00
Nikita Malyavin
509e4990af Merge branch bb-10.3-release into bb-10.4-release 2021-05-05 23:03:01 +03:00
Sujatha
625e44a80d MDEV-25597: Disable rpl_semi_sync_slave_compressed_protocol.test 2021-05-05 15:46:22 +05:30
Nikita Malyavin
a8a925dd22 Merge branch bb-10.2-release into bb-10.3-release 2021-05-04 14:49:31 +03:00
Sujatha
11597e02f3 MDEV-25502: rpl.rpl_perfschema_applier_status_by_worker failed in bb with: Test assertion failed
Problem:
=======
Test assertion fails on slave.

Assertion text: 'Last_Seen_Transaction should show .'
Assertion condition: '"0-1-1" = ""'
Assertion condition, interpolated: '"0-1-1" = ""'
Assertion result: '0'

Analysis:
========
Test case creates a table on master and it waits for it to be replicated to
slave and applied by slave applier. On completion 'Last_Seen_Transaction'
value from 'performance_schema.replication_applier_status_by_worker' table
is compared with '@@gtid_slave_pos' to ensure its correctness. The test
should ensure that user table and 'gtid_slave_pos' table are of same engine
type 'InnoDB' to get consistent test results. This guarantees that
'gtid_slave_pos' table is updated as part of transaction commit. In the of
such an engine consistency, user table gets created in default MyISAM
storage engine and 'mysql.gtid_slave_pos' table gets created in Aria
storage engine. When the test code reaches above assert there might be a
delay in an update to 'gtid_slave_pos' table, which leads to test assertion
failure.

Fix:
===
Use InnoDB engine for both user table and 'mysql.gtid_slave_pos' table.
2021-05-04 16:35:56 +05:30
Sujatha
abe6eb10a6 MDEV-16146: MariaDB slave stops with following errors.
Problem:
========
180511 11:07:58 [ERROR] Slave I/O: Unexpected master's heartbeat data:
heartbeat is not compatible with local info;the event's data: log_file_name
mysql-bin.000009 log_pos 1054262041, Error_code: 1623

Analysis:
=========
In replication setup when master server doesn't have any events to send to
slave server it sends an 'Heartbeat_log_event'. This event carries the
current binary log filename and offset details. The offset values is stored
within 4 bytes of event header. When the size of binary log is higher than
UINT32_MAX the log_pos values will not fit in 4 bytes memory.  It overflows
and hence slave stops with an error.

Fix:
===
Since we cannot extend the common_header of Log_event class, a greater than
4GB value of Log_event::log_pos is made to be transported with a HeartBeat
event's sub-header.  Log_event::log_pos in such case is set to zero to
indicate that the 8 byte sub-header is allocated in the event.

In case of cross version replication following behaviour is expected

OLD - Server without fix
NEW - Server with fix

OLD<->NEW : works bidirectionally as long as the binlog offset is
            (normally) within 4GB.

When log_pos > UINT32_MAX
OLD->NEW  : The 'log_pos' is bound to overflow and NEW slave may report
            an invalid event/incompatible heart beat event error.
NEW->OLD  : Since patched server sets log_pos=0 on overflow, OLD slave will
            report invalid event error.
2021-04-30 20:34:31 +05:30
Marko Mäkelä
55e0ce1401 Merge 10.5 into 10.6 2021-04-29 16:35:26 +03:00
Sergei Golubchik
e8b9d8d38c MDEV-25530 Error 1451 on slave: Cannot delete or update a parent row: a foreign key constraint fails
after dfb41fddf6 tables that failed to drop are excluded from the
binlogged DROP TABLE statement. It means that the slave should not
expect any errors when executing DROP TABLE, and the binlog should
report that no error has happened, even if it was.

Do not write error code into the binlogged DROP TABLE,
and remove all code that was needed to compute it.
2021-04-29 13:17:31 +02:00
Marko Mäkelä
559efad44e Merge 10.4 into 10.5 2021-04-27 09:10:47 +03:00
Marko Mäkelä
90a306a7ab Merge 10.3 into 10.4 2021-04-27 08:53:50 +03:00
Sujatha
391f1aa6ee MDEV-24773: slave_compressed_protocol doesn't work properly with semi-sync replication
Back port upstream fix

commit 1800b015a1d487330f7b15f2020b887be348a66b
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date:   Fri Sep 8 20:29:22 2017 +0530

Bug#26027024    SLAVE_COMPRESSED_PROTOCOL DOESN'T WORK WITH
SEMI-SYNC REPLICATION IN MYSQL-5.7

Analysis: In mysql-5.6, dump thread (the thread that is created
on Master after Slave requested for a binlog dump) is also used
to receive acknowledgements from the Slave and act on them accordingly.
For performance reasons, a special thread called Ack Receiver thread
is added in mysql-5.7 Semi synchronous replication plugin.
This thread does not have special handling to receive acknowledgements
if Slave has enabled compression in the protocol. Hence Master is
unable to handle any slave if Slave_compressed_protocol is enabled
on it.

Fix: Enable compress flag on the communication channels if the Slave
has Slave_compressed_protocol ON.
2021-04-26 11:09:39 +05:30
Marko Mäkelä
4930f9c94b Merge 10.5 into 10.6 2021-04-21 11:45:00 +03:00
Oleksandr Byelkin
a3099a3b4a MDEV-24312 master_host has 60 character limit, increase to 255 bytes
Also increase user name up to 128.

The work was started by Rucha Deodhar <rucha.deodhar@mariadb.com>,
contains audit plugin fixes by Alexey Botchkov <holyfoot@askmonty.org>.
2021-04-20 16:36:56 +02:00
Sujatha
410832313e MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_status' table.

This table captures SQL_THREAD status.
2021-04-16 09:11:48 +05:30
Sujatha
767648cc2b MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_configuration' table.

This table captures SQL_THREAD configuration parameters.
2021-04-16 09:05:39 +05:30
Sujatha
70642871bc MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_applier_status_by_coordinator' table.

This table captures SQL_THREAD status in case of both single threaded and
multi threaded slave configuration. When multi_source replication is enabled
this table will display each source specific SQL_THREAD status.

Added new columns for:
 - LAST_SEEN_TRANSACTION
 - LAST_TRANS_RETRY_COUNT
2021-04-16 09:02:00 +05:30
Sujatha
2674365c8e MDEV-16437: merge 5.7 P_S replication instrumentation and tables
Merge 'replication_connection_configuration' table.

Replaced following column:
  - AUTO_POSITION with USING_GTID
Added new columns for:
  - IGNORE_SERVER_IDS
  - DO_DOMAIN_IDS
  - IGNORE_SERVER_IDS
Removed following columns as they are not part of mariadb replication
connection configuration:
  - NETWORK_INTERFACE
  - TLS_VERSION

@sql/mysqld.cc
  Changed "master-retry-count" default value to 100000.
2021-04-16 08:54:19 +05:30
Eugene Kosov
f46536e7c3 fix perfschema tests in a non-perfschema builds 2021-04-14 23:09:06 +03:00
Marko Mäkelä
d2e2d32933 Merge 10.5 into 10.6 2021-04-14 12:32:27 +03:00
Marko Mäkelä
6c3e860cbf Merge 10.4 into 10.5 2021-04-14 11:35:39 +03:00