1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-12 20:49:12 +03:00
Commit Graph

74 Commits

Author SHA1 Message Date
Kristian Nielsen
596921dab8 MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure
Clear any pending deadlock kill after completing XA PREPARE, and before
updating the mysql.gtid_slave_pos table in a separate transaction.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-05-02 21:07:51 +02:00
Kristian Nielsen
d90a2b44ad MDEV-33668: More precise dependency tracking of XA XID in parallel replication
Keep track of each recently active XID, recording which worker it was queued
on. If an XID might still be active, choose the same worker to queue event
groups that refer to the same XID to avoid conflicts.

Otherwise, schedule the XID freely in the next round-robin slot.

This way, XA PREPARE can normally be scheduled without restrictions (unless
duplicate XID transactions come close together). This improves scheduling
and parallelism over the old method, where the worker thread to schedule XA
PREPARE on was fixed based on a hash value of the XID.

XA COMMIT will normally be scheduled on the same worker as XA PREPARE, but
can be a different one if the XA PREPARE is far back in the event history.

Testcase and code for trimming dynamic array due to Andrei.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-09 11:42:34 +03:00
Kristian Nielsen
f9ecaa87ce MDEV-33668: Refactor parallel replication round-robin scheduling to use explicit FIFO
This is a preparatory patch to facilitate the next commit to improve
the scheduling of XA transactions in parallel replication.

When choosing the scheduling bucket for the next event group in
rpl_parallel_entry::choose_thread(), use an explicit FIFO for the
round-robin selection instead of a simple cyclic counter i := (i+1) % N.

This allows to schedule XA COMMIT/ROLLBACK dependencies explicitly without
changing the round-robin scheduling of other event groups.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-04-09 11:42:34 +03:00
Marko Mäkelä
2b01e5103d Merge 10.5 into 10.6 2023-12-19 18:41:42 +02:00
Marko Mäkelä
4ae105a37d Merge 10.4 into 10.5 2023-12-18 08:59:07 +02:00
Brandon Nesterenko
8dad51481b MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include

A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.

This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2023-12-11 07:45:23 -07:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Kristian Nielsen
a8ea6627a4 MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica
The problem was an incorrect unmark_start_commit() in
signal_error_to_sql_driver_thread(). If an event group gets an error, this
unmark could run after the following GCO started, and the subsequent
re-marking could access de-allocated GCO.

The offending unmark_start_commit() looks obviously incorrect, and the fix
is to just remove it. It was introduced in the MDEV-8302 patch, the commit
message of which suggests it was added there solely to satisfy an assertion
in ha_rollback_trans(). So update this assertion instead to not trigger for
event groups that experienced an error (rgi->worker_error). When an error
occurs in an event group, all following event groups are skipped anyway, so
the unmark should never be needed in this case.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Kristian Nielsen
60bec1d54d MDEV-13915: STOP SLAVE takes very long time on a busy system
At STOP SLAVE, worker threads will continue applying event groups until the
end of the current GCO before stopping. This is a left-over from when only
conservative mode was available. In optimistic and aggressive mode, often
_all_ queued event will be in the same GCO, and slave stop will be
needlessly delayed.

This patch instead records at STOP SLAVE time the latest (highest sub_id)
event group that has started. Then worker threads will continue to apply
event groups up to that event group, but skip any following. The result is
that each worker thread will complete its currently running event group, and
then the slave will stop.

If the slave is caught up, and STOP SLAVE is run in the middle of an event
group that is already executing in a worker thread, then that event group
will be rolled back and the slave stop immediately, as normal.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-07-12 09:41:32 +02:00
Marko Mäkelä
5bada1246d Merge 10.5 into 10.6 2023-04-11 16:15:19 +03:00
Oleksandr Byelkin
ac5a534a4c Merge remote-tracking branch '10.4' into 10.5 2023-03-31 21:32:41 +02:00
Andrei
216d99bb39 MDEV-26071: rpl.rpl_perfschema_applier_status_by_worker failed in bb …
…with: Test assertion failed

Problem:
=======
Assertion text: 'Value returned by SSS and PS table for Last_Error_Number
should be same.'
Assertion condition: '"1146" = "0"'
Assertion condition, interpolated: '"1146" = "0"'
Assertion result: '0'

Analysis:
========
In parallel replication when slave is started the worker pool gets
activated and it gets cleared when slave stops. Each time the worker pool
gets activated a backup worker pool also gets created to store worker
specific perforance schema information in case of errors. On error, all
relevant information is copied from rpl_parallel_thread to rli and it gets
cleared from thread.  Then server waits for all workers to complete their
work, during this stage performance schema table specific worker info is
stored into the backup pool and finally the actual pool gets cleared. If
users query the performance schema table to know the status of workers the
information from backup pool will be used. The test simulates
ER_NO_SUCH_TABLE error and verifies the worker information in pfs table.
Test works fine if execution occurs in following order.

Step 1. Error occurred 'worker information is copied to backup pool'.
Step 2. handle_slave_sql invokes 'rpl_parallel_resize_pool_if_no_slaves' to
deactivate worker pool, it marks the pool->count=0
Step 3. PFS table is queried, since actual pool is deactivated backup pool
information is read.

If the Step 3 happens prior to Step2 the pool is yet to be deactivated and
the actual pool is read, which doesn't have any error details as they were
cleared. Hence test ocasionally fails.

Fix:
===
Upon error mark the back pool as being active so that if PFS table is
quried since the backup pool is flagged as valid its information will be
read, in case it is not flagged regular pool will be read.

This work is one of the last pieces created by the late Sujatha Sivakumar.
2023-03-24 15:56:24 +02:00
Andrei
d4339620be MDEV-30780 optimistic parallel slave hangs after hit an error
The hang could be seen as show slave status displaying an error like
    Last_Error: Could not execute Write_rows_v1
along with
    Slave_SQL_Running: Yes

accompanied with one of the replication threads in show-processlist
characteristically having status like

   2394 | system user  |    | NULL | Slave_worker | 50852| closing tables

It turns out that closing tables worker got entrapped in endless looping
in mark_start_commit_inner() across already garbage-collected gco items.

The reclaimed gco links are explained with actually possible
out-of-order groups of events termination due to the Last_Error.
This patch reinforces the correct ordering to perform
finish_event_group's cleanup actions, incl unlinking gco:s
from the active list.
2023-03-16 18:55:19 +02:00
Sujatha
fe9450676f MDEV-25502: rpl.rpl_perfschema_applier_status_by_worker failed in bb with: Test assertion failed
Problem:
=======
Test fails with 3 different symptoms
connection slave;
Assertion text: 'Last_Seen_Transaction should show .'
Assertion condition: '"0-1-1" = ""'
Assertion condition, interpolated: '"0-1-1" = ""'
Assertion result: '0'

connection slave;
Assertion text: 'Value returned by SSS and PS table for Last_Error_Number
                 should be same.'
Assertion condition: '"1146" = "0"'
Assertion condition, interpolated: '"1146" = "0"'
Assertion result: '0'

connection slave;
Assertion text: 'Value returned by PS table for worker_idle_time should be
                >= 1'
Assertion condition: '"0" >= "1"'
Assertion condition, interpolated: '"0" >= "1"'
Assertion result: '0'

Fix1:
====
Performance schema table's Last_Seen_Transaction is compared with 'SELECT
gtid_slave_pos'. Since DDLs are not transactional changes to user table and
gtid_slave_pos table are not guaranteed to be synchronous. To fix the
issue Gtid_IO_Pos value from SHOW SLAVE STATUS command will be used to
verify the correctness of Performance schema specific
Last_Seen_Transaction.

Fix2:
====
On error worker thread information is stored as part of backup pool. Access
to this backup pool should be protected by 'LOCK_rpl_thread_pool' mutex so
that simultaneous START SLAVE cannot destroy the backup pool, while it is
being queried by performance schema.

Fix3:
====
When a worker is waiting for events if performance schema table is queried,
at present it just returns the difference between current_time and
start_time.  This is incorrect. It should be worker_idle_time +
(current_time - start_time).

For example a worker thread was idle for 10 seconds and then it got events
to process. Upon completion it goes to idle state, now if the pfs table is
queried it should return current_idle time  + worker_idle_time.
2021-05-13 10:34:32 +05:30
Sujatha
f9bd7f2012 MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Step 3:
======

Preserve worker pool information on either STOP SLAVE/Error.  In case STOP
SLAVE is executed worker threads will be gone, hence worker threads will be
unavailable. Querying the table at this stage will give empty rows. To
address this case when worker threads are about to stop, due to an error or
forced stop, create a backup pool and preserve the data which is relevant to
populate performance schema table. Clear the backup pool upon slave start.
2021-04-08 17:19:51 +05:30
Sujatha
036ee61246 MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Step2:
=====
Add two extra columns mentioned below.
---------------------------------------------------------------------------
|Column Name:           |        Description:                             |
|-------------------------------------------------------------------------|
|                       |                                                 |
|WORKER_IDLE_TIME       | Total idle time in seconds that the worker      |
|                       | thread has spent waiting for work from          |
|                       | co-ordinator thread                             |
|                       |                                                 |
|LAST_TRANS_RETRY_COUNT | Total number of retries attempted by last       |
|                       | transaction                                     |
---------------------------------------------------------------------------
2021-04-08 17:19:51 +05:30
Sujatha
94f1d0f84d MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker
Step1:
=====
Backport 'replication_applier_status_by_worker' from upstream.

Iterate through rpl_parallel_thread_pool and display slave worker thread
specific information as part of 'replication_applier_status_by_worker'
table.

---------------------------------------------------------------------------
|Column Name:           |        Description:                             |
|-------------------------------------------------------------------------|
|                       |                                                 |
|CHANNEL_NAME           | Name of replication channel through which the   |
|                       | transaction is received.                        |
|                       |                                                 |
|THREAD_ID              | Thread_Id as displayed in 'performance_schema.  |
|                       | threads' table for thread with name             |
|                       | 'thread/sql/rpl_parallel_thread'                |
|                       |                                                 |
|                       | THREAD_ID will be NULL when worker threads are  |
|                       | stopped due to an error/force stop              |
|                       |                                                 |
|SERVICE_STATE          | Thread is running or not                        |
|                       |                                                 |
|LAST_SEEN_TRANSACTION  | Last GTID executed by worker                    |
|                       |                                                 |
|LAST_ERROR_NUMBER      | Last Error that occured on a particular worker  |
|                       |                                                 |
|LAST_ERROR_MESSAGE     | Last error specific message                     |
|                       |                                                 |
|LAST_ERROR_TIMESTAMP   | Time stamp of last error                        |
|                       |                                                 |
---------------------------------------------------------------------------

CHANNEL_NAME will be empty when the worker has not processed any
transaction. Channel_name points to valid source channel_name when it is
processing a transaction/event group.
2021-04-08 17:19:51 +05:30
Marko Mäkelä
c515b1d092 Merge 10.4 into 10.5 2020-06-18 13:58:54 +03:00
Sachin
592a10d079 MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL
Problem:- When we issue FTWRL with shutdown in parallel, there is race between
FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool)
before FTWRL can lock it. So we can get crash on FTWRL thread

Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for
FTWRL thread to complete its work , and then destroy.
So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed
later in end_slave()
2020-06-17 02:22:46 +05:30
Andrei Elkin
c8ae357341 MDEV-742 XA PREPAREd transaction survive disconnect/server restart
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.

Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of

    - binary logging extension to write prepared XA part of
      transaction signified with
      its XID in a new XA_prepare_log_event. The concusion part -
      with Commit or Rollback decision - is logged separately as
      Query_log_event.
      That is in the binlog the XA consists of two separate group of
      events.

      That makes the whole XA possibly interweaving in binlog with
      other XA:s or regular transaction but with no harm to
      replication and data consistency.

      Gtid_log_event receives two more flags to identify which of the
      two XA phases of the transaction it represents. With either flag
      set also XID info is added to the event.

      When binlog is ON on the server XID::formatID is
      constrained to 4 bytes.

    - engines are made aware of the server policy to keep up user
      prepared XA:s so they (Innodb, rocksdb) don't roll them back
      anymore at their disconnect methods.

    - slave applier is refined to cope with two phase logged XA:s
      including parallel modes of execution.

This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.

CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc

Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
   completion with ER_XA_RBROLLBACK.

2. binlog-* filtered XA when it changes engine data is regarded as
   loggable even when nothing got cached for binlog.  An empty
   XA-prepare group is recorded. Consequent Commit-or-Rollback
   succeeds in the Engine(s) as well as recorded into binlog.

3. The same applies to the non-transactional engine XA.

4. @@skip_log_bin=OFF does not record anything at XA-prepare
   (obviously), but the completion event is recorded into binlog to
   admit inconsistency with slave.

The following actions are taken by the patch.

At XA-prepare:
   when empty binlog cache - don't do anything to binlog if RO,
   otherwise write empty XA_prepare (assert(binlog-filter case)).

At Disconnect:
   when Prepared && RO (=> no binlogging was done)
     set Xid_cache_element::error := ER_XA_RBROLLBACK
     *keep* XID in the cache, and rollback the transaction.

At XA-"complete":
   Discover the error, if any don't binlog the "complete",
   return the error to the user.

Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
2020-03-14 22:45:48 +02:00
Michael Widenius
d82ac8eaaf Change "static int" to enum in classes
This was done when static int where used as bit fields or enums
2017-04-18 12:23:40 +03:00
Marko Mäkelä
adc91387e3 Merge 10.0 into 10.1 2017-03-03 13:27:12 +02:00
Monty
e65f667bb6 MDEV-9573 'Stop slave' hangs on replication slave
The reason for this is that stop slave takes LOCK_active_mi over the
whole operation while some slave operations will also need LOCK_active_mi
which causes deadlocks.

Fixed by introducing object counting for Master_info and not taking
LOCK_active_mi over stop slave or even stop_all_slaves()

Another benefit of this approach is that it allows:
- Multiple threads can run SHOW SLAVE STATUS at the same time
- START/STOP/RESET/SLAVE STATUS on a slave will not block other slaves
- Simpler interface for handling get_master_info()
- Added some missing unlock of 'log_lock' in error condtions
- Moved rpl_parallel_inactivate_pool(&global_rpl_thread_pool) to end
  of stop_slave() to not have to use LOCK_active_mi inside
  terminate_slave_threads()
- Changed argument for remove_master_info() to Master_info, as we always
  have this available
- Fixed core dump when doing FLUSH TABLES WITH READ LOCK and parallel
  replication. Problem was that waiting for pause_for_ftwrl was not done
  when deleting rpt->current_owner after a force_abort.
2017-02-28 16:10:46 +01:00
Sergei Golubchik
3b0c7ac1f9 Merge branch '10.0' into 10.1 2016-03-21 13:02:53 +01:00
Otto Kekäläinen
1777fd5f55 Fix spelling: occurred, execute, which etc 2016-03-04 02:09:37 +02:00
Sergei Golubchik
a2bcee626d Merge branch '10.0' into 10.1 2015-12-21 21:24:22 +01:00
Monty
b30a768e7b Fixed failures in rpl_parallel2
Problem was that we used same condition variable with 2 different mutex.
Fixed by changing to use COND_rpl_thread_stop instead of COND_parallel_entry
for stopping threads.

Patch by Kristian Nielsen
2015-11-23 19:58:30 +02:00
Kristian Nielsen
8f2e05f41c Merge branch 'mdev7818-4' into 10.1
Conflicts:
	mysql-test/suite/perfschema/r/stage_mdl_global.result
	sql/rpl_rli.cc
	sql/sql_parse.cc
2015-11-13 14:24:40 +01:00
Kristian Nielsen
ba02550166 MDEV-7818: Deadlock occurring with parallel replication and FTWRL
Problem is that FLUSH TABLES WITH READ LOCK first blocks threads from
starting new commits, then waits for running commits to complete. But
in-order parallel replication needs commits to happen in a particular
order, so this can easily deadlock.

To fix this problem, this patch introduces a way to temporarily pause
the parallel replication worker threads. Before starting FTWRL, we let
all worker threads complete in-progress transactions, and then
wait. Then we proceed to take the global read lock. Once the lock is
obtained, we unpause the worker threads. Now commits are blocked from
starting by the global read lock, so the deadlock will no longer occur.
2015-11-13 14:02:15 +01:00
Kristian Nielsen
903f8dc72d Merge MDEV-8147 into 10.1 2015-05-26 15:03:22 +02:00
Kristian Nielsen
e5f1e841dc MDEV-8147: Assertion `m_lock_type == 2' failed in handler::ha_close() during parallel replication
When the slave processes the master restart format_description event,
parallel replication needs to complete any prior events before processing
the restart event (which closes temporary tables and such stuff).

This happens in wait_for_workers_idle(), however it was not waiting long
enough. The wait was using wait_for_prior_commit(), but at that points table
can still be open. This lead to assertion in this case.

So change wait_for_workers_idle() to wait until all worker threads have
reached finish_event_group(), at which point all tables should have been
closed.
2015-05-26 13:04:15 +02:00
Kristian Nielsen
167332597f Merge 10.0 -> 10.1.
Conflicts:
	mysql-test/suite/multi_source/multisource.result
	sql/sql_base.cc
2015-04-17 15:18:44 +02:00
Kristian Nielsen
f573b65e41 Merge MDEV-7847 and MDEV-7882 into 10.0.
Conflicts:
	mysql-test/suite/rpl/r/rpl_parallel.result
	sql/rpl_parallel.cc
2015-03-30 15:10:29 +02:00
Kristian Nielsen
c41e4d3b49 Merge MDEV-7847 and MDEV-7882 into 10.0.
Conflicts:
	mysql-test/suite/rpl/r/rpl_parallel.result
	mysql-test/suite/rpl/t/rpl_parallel.test
2015-03-30 14:51:25 +02:00
Kristian Nielsen
880f2273fd MDEV-7847: "Slave worker thread retried transaction 10 time(s) in vain, giving up", followed by replication hanging
This patch fixes a bug in the error handling in parallel replication, when one
worker thread gets a failure and other worker threads processing later
transactions have to rollback and abort.

The problem was with the lifetime of group_commit_orderer objects (GCOs).
A GCO is freed when we register that its last event group has committed. This
relies on register_wait_for_prior_commit() and wait_for_prior_commit() to
ensure that the fact that T2 has committed implies that any earlier T1 has
also committed, and can thus no longer execute mark_start_commit().

However, in the error case, the code was skipping the
register_wait_for_prior_commit() and wait_for_prior_commit() calls. Thus
commit ordering was not guaranteed, and a GCO could be freed too early. Then a
later mark_start_commit() would reference deallocated GCO, which could lead to
lost wakeup (causing slave threads to hang) or other corruption.

This patch makes also the error case respect commit order. This way, also the
error case gets the GCO lifetime correct, and the hang no longer occurs.
2015-03-30 14:33:44 +02:00
Kristian Nielsen
bd2ae787ea MDEV-7825: Parallel replication race condition on gco->flags, possibly resulting in slave hang
The patch for optimistic parallel replication as a memory optimisation moved
the gco->installed field into a bit in gco->flags. However, that is just plain
wrong. The gco->flags field is owned by the SQL driver thread, but
gco->installed is used by the worker threads, so this will cause a race
condition.

The user-visible problem might be conflicts between transactions and/or slave
threads hanging.

So revert this part of the optimistic parallel replication patch, going back
to using a separate field gco->installed like in 10.0.
2015-03-24 16:33:51 +01:00
Kristian Nielsen
ed04c40b01 MDEV-5289: master server starts slave parallel threads
Delay spawning parallel replication worker threads until a slave SQL
thread is running, and de-spawn them when the last SQL thread stops.

This is especially useful to avoid needless threads on a master in a
setup where same my.cnf is used on masters and slaves.
2015-03-11 09:18:16 +01:00
Kristian Nielsen
95d7208859 Merge MDEV-6589 and MDEV-6403 into 10.1.
Conflicts:
	sql/log.cc
	sql/rpl_rli.cc
	sql/sql_repl.cc
2015-03-04 13:49:37 +01:00
Kristian Nielsen
ad0d203f2e MDEV-6589: Incorrect relay log start position when restarting SQL thread after error in parallel replication
The problem occurs in parallel replication in GTID mode, when we are using
multiple replication domains. In this case, if the SQL thread stops, the
slave GTID position may refer to a different point in the relay log for each
domain.

The bug was that when the SQL thread was stopped and restarted (but the IO
thread was kept running), the SQL thread would resume applying the relay log
from the point of the most advanced replication domain, silently skipping all
earlier events within other domains. This caused replication corruption.

This patch solves the problem by storing, when the SQL thread stops with
multiple parallel replication domains active, the current GTID
position. Additionally, the current position in the relay logs is moved back
to a point known to be earlier than the current position of any replication
domain. Then when the SQL thread restarts from the earlier position, GTIDs
encountered are compared against the stored GTID position. Any GTID that was
already applied before the stop is skipped to avoid duplicate apply.

This patch should have no effect if multi-domain GTID parallel replication is
not used. Similarly, if both SQL and IO thread are stopped and restarted, the
patch has no effect, as in this case the existing relay logs are removed and
re-fetched from the master at the current global @@gtid_slave_pos.
2015-03-04 13:36:04 +01:00
Sergei Golubchik
4b21cd21fe Merge branch '10.0' into merge-wip 2015-01-31 21:48:47 +01:00
Kristian Nielsen
f27817c1d0 MDEV-7326: Server deadlock in connection with parallel replication
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.

This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.

The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
2015-01-07 14:45:39 +01:00
Kristian Nielsen
db21fddc37 MDEV-6676: Optimistic parallel replication
Implement a new mode for parallel replication. In this mode, all transactions
are optimistically attempted applied in parallel. In case of conflicts, the
offending transaction is rolled back and retried later non-parallel.

This is an early-release patch to facilitate testing, more changes to user
interface / options will be expected. The new mode is not enabled by default.
2014-12-06 08:49:50 +01:00
Kristian Nielsen
eec04fb4f6 MDEV-6680: Performance of domain_parallel replication is disappointing
The code that handles free lists of various objects passed to worker threads
in parallel replication handles freeing in batches, to avoid taking and
releasing LOCK_rpl_thread too often. However, it was possible for freeing to
be delayed to the point where one thread could stall the SQL driver thread due
to full queue, while other worker threads might be idle. This could
significantly degrade possible parallelism and thus performance.

Clean up the batch freeing code so that it is more robust and now able to
regularly free batches of object, so that normally the queue will not run full
unless the SQL driver thread is really far ahead of the worker threads.
2014-11-13 10:20:48 +01:00
Kristian Nielsen
c6a60f6d79 MDEV-6321: close_temporary_tables() in format description event not serialised correctly
After-review fixes.

Mainly catching if the wait in wait_for_workers_idle() is aborted due to kill.
In this case, we should return an error and not proceed to execute the format
description event, as other threads might still be running for a bit until the
error is caught in all threads.
2014-08-20 10:59:39 +02:00
Kristian Nielsen
453c29c3f7 MDEV-6321: close_temporary_tables() in format description event not serialised correctly
Follow-up patch, fixing a possible deadlock issue.

If the master crashes in the middle of an event group, there can be an active
transaction in a worker thread when we encounter the following master restart
format description event. In this case, we need to notify that worker thread
to abort and roll back the partial event group. Otherwise a deadlock occurs:
the worker thread waits for the commit that never arrives, and the SQL driver
thread waits for the worker thread to complete its event group, which it never
does.
2014-08-19 14:26:42 +02:00
Kristian Nielsen
4cb1e0eea0 MDEV-6321: close_temporary_tables() in format description event not serialised correctly
When a master server starts up, it logs a special format_description event at
the start of a new binlog to mark that is has restarted. This is used by a
slave to drop all temporary tables - this is needed in case the master crashed
and did not have a chance to send explicit DROP TEMPORARY TABLE statements to
the slave.

In parallel replication, we need to be careful when dropping the temporary
tables - we need to be sure that no prior events are still executing that
might be using the temporary tables to be dropped, _and_ that no following
events have started executing that might have created new temporary tables
that should not be dropped.

This was not handled correctly, which could cause errors about access to not
existing temporary tables or even crashes. This patch implements that such
format_description events cause serialisation of event execution; all prior
events are executed to completion first, then the format_description event is
executed, dropping temporary tables, then following events are queued for
execution.

Master restarts should be sufficiently infrequent that the resulting loss of
parallelism should be of minimal impact.
2014-07-02 12:51:45 +02:00
Kristian Nielsen
cfa1ce81bb MDEV-6551: Some replication errors are ignored if slave_parallel_threads > 0
The problem occured when using parallel replication, and an error occured that
caused the SQL thread to stop when the IO thread had already reached a
following binlog file from the master (or otherwise performed a relay log
rotation).

In this case, the Rotate Event at the end of the relay log file could still be
executed, even though an earlier event in that relay log file had gotten an
error. This would cause the position to be incorrectly updated, so that upon
restart of the SQL thread, the event that had failed would be silently skipped
and ignored, causing replication corruption.

Fixed by checking before executing Rotate Event, whether an earlier event
has failed. If so, the Rotate Event is not executed, just dequeued, same as
for other normal events following a failing event.
2014-08-15 11:31:13 +02:00
unknown
bd4153a8c2 MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel
replication causing replication to fail.

Remove the temporary fix for MDEV-5914, which used READ COMMITTED for parallel
replication worker threads. Replace it with a better, more selective solution.

The issue is with certain edge cases of InnoDB gap locks, for example between
INSERT and ranged DELETE. It is possible for the gap lock set by the DELETE to
block the INSERT, if the DELETE runs first, while the record lock set by
INSERT does not block the DELETE, if the INSERT runs first. This can cause a
conflict between the two in parallel replication on the slave even though they
ran without conflicts on the master.

With this patch, InnoDB will ask the server layer about the two involved
transactions before blocking on a gap lock. If the server layer tells InnoDB
that the transactions are already fixed wrt. commit order, as they are in
parallel replication, InnoDB will ignore the gap lock and allow the two
transactions to proceed in parallel, avoiding the conflict.

Improve the fix for MDEV-6020. When InnoDB itself detects a deadlock, it now
asks the server layer for any preferences about which transaction to roll
back. In case of parallel replication with two transactions T1 and T2 fixed to
commit T1 before T2, the server layer will ask InnoDB to roll back T2 as the
deadlock victim, not T1. This helps in some cases to avoid excessive deadlock
rollback, as T2 will in any case need to wait for T1 to complete before it can
itself commit.

Also some misc. fixes found during development and testing:

 - Remove thd_rpl_is_parallel(), it is not used or needed.

 - Use KILL_CONNECTION instead of KILL_QUERY when a parallel replication
   worker thread is killed to resolve a deadlock with fixed commit
   ordering. There are some cases, eg. in sql/sql_parse.cc, where a KILL_QUERY
   can be ignored if the query otherwise completed successfully, and this
   could cause the deadlock kill to be lost, so that the deadlock was not
   correctly resolved.

 - Fix random test failure due to missing wait_for_binlog_checkpoint.inc.

 - Make sure that deadlock or other temporary errors during parallel
   replication are not printed to the the error log; there were some places
   around the replication code with extra error logging. These conditions can
   occur occasionally and are handled automatically without breaking
   replication, so they should not pollute the error log.

 - Fix handling of rgi->gtid_sub_id. We need to be able to access this also at
   the end of a transaction, to be able to detect and resolve deadlocks due to
   commit ordering. But this value was also used as a flag to mark whether
   record_gtid() had been called, by being set to zero, losing the value. Now,
   introduce a separate flag rgi->gtid_pending, so rgi->gtid_sub_id remains
   valid for the entire duration of the transaction.

 - Fix one place where the code to handle ignored errors called reset_killed()
   unconditionally, even if no error was caught that should be ignored. This
   could cause loss of a deadlock kill signal, breaking deadlock detection and
   resolution.

 - Fix a couple of missing mysql_reset_thd_for_next_command(). This could
   cause a prior error condition to remain for the next event executed,
   causing assertions about errors already being set and possibly giving
   incorrect error handling for following event executions.

 - Fix code that cleared thd->rgi_slave in the parallel replication worker
   threads after each event execution; this caused the deadlock detection and
   handling code to not be able to correctly process the associated
   transactions as belonging to replication worker threads.

 - Remove useless error code in slave_background_kill_request().

 - Fix bug where wfc->wakeup_error was not cleared at
   wait_for_commit::unregister_wait_for_prior_commit(). This could cause the
   error condition to wrongly propagate to a later wait_for_prior_commit(),
   causing spurious ER_PRIOR_COMMIT_FAILED errors.

 - Do not put the binlog background thread into the processlist. It causes
   too many result differences in mtr, but also it probably is not useful
   for users to pollute the process list with a system thread that does not
   really perform any user-visible tasks...
2014-06-10 10:13:15 +02:00
unknown
787c470cef MDEV-5262: Missing retry after temp error in parallel replication
Handle retry of event groups that span multiple relay log files.

 - If retry reaches the end of one relay log file, move on to the next.

 - Handle refcounting of relay log files, and avoid purging relay log
   files until all event groups have completed that might have needed
   them for transaction retry.
2014-05-15 15:52:08 +02:00