1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

2867 Commits

Author SHA1 Message Date
Marko Mäkelä
12995559f9 Merge 10.4 into 10.5 2023-12-19 18:30:58 +02:00
Sergei Golubchik
8c8bce05d2 Merge branch '10.11' into 11.0 2023-12-19 15:53:18 +01:00
Kristian Nielsen
eaa4968fc5 MDEV-10653: Fix segfault in SHOW MASTER STATUS with NULL inuse_relaylog
The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle()
function to use Relay_log_info::last_inuse_relaylog to check for idle
workers. But the code was missing a NULL check. Also, there was one place
during SQL slave thread start which was missing mutex synchronisation when
updating inuse_relaylog.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-12-19 12:08:54 +01:00
Sergei Golubchik
fd0b47f9d6 Merge branch '10.6' into 10.11 2023-12-18 11:19:04 +01:00
Marko Mäkelä
4ae105a37d Merge 10.4 into 10.5 2023-12-18 08:59:07 +02:00
Sergei Golubchik
e95bba9c58 Merge branch '10.5' into 10.6 2023-12-17 11:20:43 +01:00
Brandon Nesterenko
8dad51481b MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave
AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include

A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.

This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2023-12-11 07:45:23 -07:00
Sergei Golubchik
98a39b0c91 Merge branch '10.4' into 10.5 2023-12-02 01:02:50 +01:00
Monty
1ffa8c5072 Fixed build failure on aarch64-macos
debug_sync.h was wrongly combined with replication
2023-11-28 15:31:44 +02:00
Vladislav Vaintroub
013fc02a23 MDEV-32567 Remove thr_alarm from server codebase
This allows to simplify net_real_read() and net_real_write() a bit.

Removed some superfluous #ifdef/ifndef MYSQL_SERVER from net_serv.cc
The code always runs in server, either normal or embedded.
Dead code for switching socket between blocking and non-blocking modes,
is also removed.

Removed pthread_kill() with alarm signal that woke up main thread on
server shutdown. Used shutdown(2) on polling sockets instead, to the same
effect.

Removed yet another superstitious pthread_kill(), that ran on non-Windows
in terminate_slave_thread().
2023-11-23 11:52:38 +11:00
Vladislav Vaintroub
bb8e1bf7a2 Merge 11.3 into 11.4 2023-11-21 15:43:20 +01:00
Anel Husakovic
a7d186a17d MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc
- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>
            <andrei.elkin@mariadb.com>
2023-11-16 10:41:11 +01:00
Oleksandr Byelkin
34272bd6a5 Merge branch '11.2' into 11.3 2023-11-14 18:33:03 +01:00
Oleksandr Byelkin
48af85db21 Merge branch '10.11' into 11.0 2023-11-08 17:09:44 +01:00
Oleksandr Byelkin
fecd78b837 Merge branch '10.10' into 10.11 2023-11-08 16:46:47 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Oleksandr Byelkin
b83c379420 Merge branch '10.5' into 10.6 2023-11-08 15:57:05 +01:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Marko Mäkelä
7b842f1536 Merge 11.2 into 11.3 2023-10-27 10:48:29 +03:00
Kristian Nielsen
8eee9806fb MDEV-31273: Eliminate Log_event::checksum_alg
This is a preparatory commit for pre-computing checksums outside of
holding LOCK_log, no functional changes.

Which checksum algorithm is used (if any) when writing an event does not
belong in the event, it is a property of the log being written to.

Instead decide the checksum algorithm when constructing the
Log_event_writer object, and store it there.

Introduce a client-only Log_event::read_checksum_alg to be able to
print the checksum read, and a
Format_description_log_event::source_checksum_alg which is the
checksum algorithm (if any) to use when reading events from a log.

Also eliminate some redundant `enum` keywords on the enum_binlog_checksum_alg
type.

Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-10-26 20:45:35 +02:00
Brandon Nesterenko
c5f776e9fa MDEV-32265: seconds_behind_master is inaccurate for Delayed replication
If a replica is actively delaying a transaction when restarted (STOP
SLAVE/START SLAVE), when the sql thread is back up,
Seconds_Behind_Master will present as 0 until the configured
MASTER_DELAY has passed. That is, before the restart,
last_master_timestamp is updated to the timestamp of the delayed
event. Then after the restart, the negation of sql_thread_caught_up
is skipped because the timestamp of the event has already been used
for the last_master_timestamp, and their update is grouped together
in the same conditional block.

This patch fixes this by separating the negation of
sql_thread_caught_up out of the timestamp-dependent block, so it is
called any time an idle parallel slave queues an event to a worker.

Note that sql_thread_caught_up is still left in the check for internal
events, as SBM should remain idle in such case to not "magically" begin
incrementing.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2023-10-23 14:25:03 -06:00
Brandon Nesterenko
0c1bf5e247 MDEV-27247: Add keywords "SQL_BEFORE_GTIDS" and "SQL_AFTER_GTIDS" for START SLAVE UNTIL
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:

START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”

When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.

START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.

Example:
=========
If a primary server has a binary log consisting of the following GTIDs:

0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3

If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.

START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”

The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.

If the replica is started with SQL_AFTER_GTIDS, i.e.

START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”

then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.

Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
2023-10-23 06:40:05 -06:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Marko Mäkelä
6a470db552 Merge 10.5 into 10.6 2023-09-14 15:25:53 +03:00
Yuchen Pei
cb1965bd9d Merge branch '10.4' into 10.5 2023-09-14 16:30:11 +10:00
Marko Mäkelä
0f9acce3f2 Merge 10.5 into 10.6 2023-09-14 09:01:15 +03:00
Brandon Nesterenko
1407f99963 MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart
The SQL thread and a user connection executing SHOW SLAVE STATUS
have a race condition on Last_SQL_Errno, such that a slave which
previously errored and stopped, on its next start, SHOW SLAVE STATUS
can show that the SQL Thread is running while the previous error is
also showing.

The fix is to move when the last error is cleared when the SQL
thread starts to occur before setting the status of
Slave_SQL_Running.

Thanks to Kristian Nielson for his work diagnosing the problem!

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielson <knielsen@knielsen-hq.org>
2023-09-13 12:01:47 -06:00
sjaakola
a3cbc44b24 MDEV-31833 replication breaks when using optimistic replication and replica is a galera node
MariaDB async replication SQL thread was stopped for any failure
in applying of replication events and error message logged for the failure
was: "Node has dropped from cluster". The assumption was that event applying
failure is always due to node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped with first
applying failure.

To support optimistic parallel replication retrying logic this commit will
now skip replication slave abort, if node remains in cluster (wsrep_ready==ON)
and replication is configured for optimistic or aggressive retry logic.

During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-09-12 02:37:30 +02:00
Alexander Barkov
baf00fc553 Merge remote-tracking branch 'origin/10.11' into 11.0 2023-08-18 07:34:54 +04:00
Sergei Petrunia
725bd56834 Merge 10.10 into 10.11 2023-08-17 13:44:05 +03:00
Marko Mäkelä
9cd2989589 Merge 10.6 into 10.10 2023-08-16 15:28:42 +03:00
Kristian Nielsen
7c9837ce74 Merge 10.4 into 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 18:02:18 +02:00
Kristian Nielsen
18acbaf416 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:39:49 +02:00
Kristian Nielsen
900c4d6920 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:35:30 +02:00
Marko Mäkelä
5f6e987481 Merge 10.11 into 11.0 2023-08-15 12:02:07 +03:00
Marko Mäkelä
acc90ce363 Merge 10.10 into 10.11 2023-08-15 11:24:41 +03:00
Marko Mäkelä
17f5f1cba9 Merge 10.6 into 10.10 2023-08-15 11:22:36 +03:00
Marko Mäkelä
3fee1b4471 Merge 10.5 into 10.6 2023-08-15 11:21:34 +03:00
Marko Mäkelä
599c4d9a40 Merge 10.4 into 10.5 2023-08-15 11:10:27 +03:00
Oleksandr Byelkin
51f9d62005 Merge branch '10.11' into 11.0 2023-08-09 07:53:48 +02:00
Oleksandr Byelkin
036df5f970 Merge branch '10.10' into 10.11 2023-08-08 14:57:31 +02:00
Jan Lindström
277968aa4c MDEV-31413 : Node has been dropped from the cluster on Startup / Shutdown with async replica
There was two related problems:

(1) Galera node that is defined as a slave to async MariaDB
master at restart might do SST (state stransfer) and
part of that it will copy mysql.gtid_slave_pos table.
Problem is that updates on that table are not replicated
on a cluster. Therefore, table from donor that is not
slave is copied and joiner looses gtid position it was
and start executing events from wrong position of the binlog.
This incorrect position could break replication and
causes node to be dropped and requiring user action.

(2) Slave sql thread might start executing events before
galera is ready (wsrep_ready=ON) and that could also
cause node to be dropped from the cluster.

In this fix we enable replication of mysql.gtid_slave_pos
table on a cluster. In this way all nodes in a cluster
will know gtid slave position and even after SST joiner
knows correct gtid position to start.

Furthermore, we wait galera to be ready before slave
sql thread executes any events to prevent too early
execution.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2023-08-08 03:25:56 +02:00
Oleksandr Byelkin
ced243a099 Merge branch '10.9' into 10.10 2023-08-05 20:34:09 +02:00
Oleksandr Byelkin
34a8e78581 Merge branch '10.6' into 10.9 2023-08-04 08:01:06 +02:00
Oleksandr Byelkin
6bf8483cac Merge branch '10.5' into 10.6 2023-08-01 15:08:52 +02:00
Oleksandr Byelkin
7564be1352 Merge branch '10.4' into 10.5 2023-07-26 16:02:57 +02:00
Brandon Nesterenko
063f4ac25e MDEV-30619: Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers
MDEV-31749 sporadic assert in MDEV-30619 new test

If the workers of a parallel replica are busy (potentially with long
queues), but the SQL thread has no events left to distribute (so it
goes idle), then the next event that comes from the primary will
update mi->last_master_timestamp with its timestamp, even if the
workers have not yet finished.

This patch changes the parallel replica logic which updates
last_master_timestamp after idling from using solely sql_thread_caught_up
(added in MDEV-29639) to using the latter with rli queued/dequeued
event counters.
That is, if  the queued count is equal to the dequeued count, it
means all events have been processed and the replica is considered
idle when the driver thread has also distributed all events.

Low level details of the commit include
- to make a more generalized test for Seconds_Behind_Master on
  the parallel replica, rpl_delayed_parallel_slave_sbm.test
  is renamed to rpl_parallel_sbm.test for this purpose.
- pause_sql_thread_on_next_event usage was removed
  with the MDEV-30619 fixes. Rather than remove it, we adapt it
  to the needs of this test case
- added test case to cover SBM spike of relay log read and LMT
  update that was fixed by MDEV-29639
- rpl_seconds_behind_master_spike.test is made to use
  the negate_clock_diff_with_master debug eval.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2023-07-25 16:36:14 +03:00
Sergei Golubchik
0005f2f06c Merge branch 'bb-10.11-release' into bb-11.0-release 2023-06-05 19:27:00 +02:00