1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-11 05:52:26 +03:00
Commit Graph

3659 Commits

Author SHA1 Message Date
Oleksandr Byelkin
fecd78b837 Merge branch '10.10' into 10.11 2023-11-08 16:46:47 +01:00
Oleksandr Byelkin
04d9a46c41 Merge branch '10.6' into 10.10 2023-11-08 16:23:30 +01:00
Oleksandr Byelkin
b83c379420 Merge branch '10.5' into 10.6 2023-11-08 15:57:05 +01:00
Oleksandr Byelkin
6cfd2ba397 Merge branch '10.4' into 10.5 2023-11-08 12:59:00 +01:00
Monty
e5a5573f78 rpl.rpl_invoked_features fails sporadically with "Duplicate key error"
The reason was that Event e11 was re-executed before
"ALTER EVENT e11 DISABLE" had been executed.

Fixed by increasing re-schedule time

Other things:
- Removed double accounting of 'execution_count'. It was incremented in
  top->mark_last_executed(thd) that was executed a few lines earlier.
2023-11-03 11:42:52 +02:00
Monty
22ffe4dd22 rpl.rpl_invoked_features fails sporadically with "Duplicate key error"
The reason was that Event e11 was re-executed before
"ALTER EVENT e11 DISABLE" had been executed.

Fixed by increasing re-schedule time

Other things:
- Removed double accounting of 'execution_count'. It was incremented in
  top->mark_last_executed(thd) that was executed a few lines earlier.
2023-11-02 11:49:03 +02:00
Brandon Nesterenko
80ea3590de MDEV-32655: rpl_semi_sync_slave_compressed_protocol.test assert_only_after is wrong
The MTR test rpl.rpl_semi_sync_slave_compressed_protocol scans the
log file to ensure there is no magic number error. It attempts to
only scan the log files of the current test; however, the variable
which controls this, , is initialized incorrectly,
and it thereby scans the entire log file, which includes output from
prior tests. This causes it to fail if a test which expects this
error runs previously on the same worker.

This patch fixes the assert_only_after so the test only scans
through its own log contents.
2023-11-01 09:10:17 -06:00
Brandon Nesterenko
c341743e83 MDEV-32651: Lost Debug_sync signal in rpl_sql_thd_start_errno_cleared
The test rpl.rpl_sql_thd_start_errno_cleared can lose a debug_sync
signal, as there is a RESET immediately following a SIGNAL. When the
signal is lost, the sql_thread is stuck in a WAIT_FOR clause until
it times out, resulting in long test times (albeit still
successful).

This patch extends the test to ensure the debug_sync signal was
received before issuing the RESET
2023-11-01 07:35:07 -06:00
Kristian Nielsen
6fa69ad747 MDEV-27436: binlog corruption (/tmp no space left on device at the same moment)
This commit fixes several bugs in error handling around disk full when
writing the statement/transaction binlog caches:

1. If the error occurs during a non-transactional statement, the code
attempts to binlog the partially executed statement (as it cannot roll
back). The stmt_cache->error was still set from the disk full error. This
caused MYSQL_BIN_LOG::write_cache() to get an error while trying to read the
cache to copy it to the binlog. This was then wrongly interpreted as a disk
full error writing to the binlog file. As a result, a partial event group
containing just a GTID event (no query or commit) was binlogged. Fixed by
checking if an error is set in the statement cache, and if so binlog an
INCIDENT event instead of a corrupt event group, as for other errors.

2. For LOAD DATA LOCAL INFILE, if a disk full error occured while writing to
the statement cache, the code would attempt to abort and read-and-discard
any remaining data sent by the client. The discard code would however
continue trying to write data to the statement cache, and wrongly interpret
another disk full error as end-of-file from the client. This left the client
connection with extra data which corrupts the communication for the next
command, as well as again causing an corrupt/incomplete event to be
binlogged. Fixed by restoring the default read function before reading any
remaining data from the client connection.

Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-10-31 11:48:00 +01:00
Kristian Nielsen
b8f9f796ff MDEV-31273: Precompute binlog checksums
Compute binlog checksums (when enabled) already when writing events
into the statement or transaction caches, where before it was done
when the caches are copied to the real binlog file. This moves the
checksum computation outside of holding LOCK_log, improving
scalabitily.

At stmt/trx cache write time, the final end_log_pos values are not
known, so with this patch these will be set to 0. Events that are
written directly to the binlog file (not through stmt/trx cache) keep
the correct end_log_pos value. The GTID and COMMIT/XID events at the
start and end of event groups are written directly, so the zero
end_log_pos is only for events in the middle of event groups, which
do not negatively affect replication.

An option --binlog-legacy-event-pos, off by default, is provided to
disable this behavior to provide backwards compatibility with any
external applications that might rely on end_log_pos in events in the
middle of event groups.

Checksums cannot be pre-computed when binlog encryption is enabled, as
encryption relies on correct end_log_pos to provide part of the
nonce/IV.

Checksum pre-computation is also disabled for WSREP/Galera, as it uses
events differently in its write-sets and so on. Extending pre-computation of
checksums to Galera where it makes sense could be added in a future patch.

The current --binlog-checksum configuration is saved in
binlog_cache_data at transaction start and used to pre-compute
checksums in cache, if applicable. When the cache is later copied to
the binlog, a check is made if the saved value still matches the
configured global value; if so, the events are block-copied directly
into the binlog file. If --binlog-checksum was changed during the
transaction, events are re-written to the binlog file one-by-one and
the checksums recomputed/discarded as appropriate.

Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-10-27 19:57:43 +02:00
Andrei
728bca44e8 MDEV-32593 Assertion failure upon CREATE SEQUENCE
A recently added by MDEV-32593 assert conditions are corrected.
2023-10-27 12:26:34 +03:00
Marko Mäkelä
7b842f1536 Merge 11.2 into 11.3 2023-10-27 10:48:29 +03:00
Yuchen Pei
d0f8dfbcf0 Merge branch '11.1' into 11.2 2023-10-27 18:11:56 +11:00
Andrei
9c43343213 MDEV-32365 detailize the semisync replication magic number error
Semisync ack (master side) receiver thread is made to report
details of faced errors.
In case of 'magic byte' error, a hexdump of the received packet
is always (level) NOTEd into the error log.
In other cases an exact server level error is print out
as a warning (as it may not be critical) under log_warnings > 2.

An MTR test added for the magic byte error. For others existing mtr
tests cover that, provided log_warnings > 2 is set.
2023-10-26 20:24:44 +03:00
Brandon Nesterenko
c5f776e9fa MDEV-32265: seconds_behind_master is inaccurate for Delayed replication
If a replica is actively delaying a transaction when restarted (STOP
SLAVE/START SLAVE), when the sql thread is back up,
Seconds_Behind_Master will present as 0 until the configured
MASTER_DELAY has passed. That is, before the restart,
last_master_timestamp is updated to the timestamp of the delayed
event. Then after the restart, the negation of sql_thread_caught_up
is skipped because the timestamp of the event has already been used
for the last_master_timestamp, and their update is grouped together
in the same conditional block.

This patch fixes this by separating the negation of
sql_thread_caught_up out of the timestamp-dependent block, so it is
called any time an idle parallel slave queues an event to a worker.

Note that sql_thread_caught_up is still left in the check for internal
events, as SBM should remain idle in such case to not "magically" begin
incrementing.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2023-10-23 14:25:03 -06:00
Brandon Nesterenko
0c1bf5e247 MDEV-27247: Add keywords "SQL_BEFORE_GTIDS" and "SQL_AFTER_GTIDS" for START SLAVE UNTIL
New Feature:
============
This patch extends the START SLAVE UNTIL command with options
SQL_BEFORE_GTIDS and SQL_AFTER_GTIDS to allow user control of
whether the replica stops before or after a provided GTID state. Its
syntax is:

START SLAVE UNTIL (SQL_BEFORE_GTIDS|SQL_AFTER_GTIDS)=”<gtid_list>”

When providing SQL_BEFORE_GTIDS=”<gtid_list>”, for each domain
specified in the gtid_list, the replica will execute transactions up
to the GTID found, and immediately stop processing events in that
domain (without executing the transaction of the specified GTID).
Once all domains have stopped, the replica will stop. Events
originating from domains that are not specified in the list are not
replicated.

START SLAVE UNTIL SQL_AFTER_GTIDS=”<gtid_list>” is an alias to the
default behavior of START SLAVE UNTIL master_gtid_pos=”<gtid_list>”.
That is, the replica will only execute transactions originating from
domain ids provided in the list, and will stop once all transactions
provided in the UNTIL list have all been executed.

Example:
=========
If a primary server has a binary log consisting of the following GTIDs:

0-1-1
1-1-1
0-1-2
1-1-2
0-1-3
1-1-3

If a fresh replica (i.e. one with an empty GTID position,
@@gtid_slave_pos='') is started with SQL_BEFORE_GTIDS, i.e.

START SLAVE UNTIL SQL_BEFORE_GTIDS=”1-1-2”

The resulting gtid_slave_pos of the replica will be “1-1-1”.
This is because the replica will execute only events from domain 1
until it sees the transaction with sequence number 2, and
immediately stop without executing it.

If the replica is started with SQL_AFTER_GTIDS, i.e.

START SLAVE UNTIL SQL_AFTER_GTIDS=”1-1-2”

then the resulting gtid_slave_pos of the replica will be “1-1-2”.
This is because it will only execute events from domain 1 until it
has executed the provided GTID.

Reviewed By:
============
Kristian Nielson <knielsen@knielsen-hq.org>
2023-10-23 06:40:05 -06:00
Andrei
1fe4a71b67 MDEV-31792 Assertion fails in MDL_context::acquire_lock upon parallel replication of CREATE SEQUENCE
The assert's reason was in missed FL_DDL flagging of CREATE-or-REPLACE
Query event.
MDEV-27365 fixes covered only the non-pre-existing table execution branch so
did not see a possibility of implicit commit in
the middle of execution in a rollback branch when the being CREATEd
sequence table is actually replaced.
The pre-existing table branch cleared the DDL modification
flag so the query lost FL_DDL in binlog and its parallel execution
on slave may have ended up with the assert to indicate the query
is raced by a following in binlog order event.

Fixed with applying the MDEV-27365 pattern.
An mtr test is added to cover the rollback situation.
The description test [ pass ] with a generous number of mtr parallel
reties.
2023-10-23 15:39:51 +03:00
Marko Mäkelä
9b2a65e41a Merge 11.0 into 11.1 2023-10-19 08:26:16 +03:00
Marko Mäkelä
be24e75229 Merge 10.11 into 11.0 2023-10-19 08:12:16 +03:00
Marko Mäkelä
2ecc0443ec Merge 10.10 into 10.11 2023-10-17 16:04:21 +03:00
Marko Mäkelä
d5e15424d8 Merge 10.6 into 10.10
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.

Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue

Disabled test:
- spider/bugfix.mdev_27239 because we started to get
  +Error	1429 Unable to connect to foreign data source: localhost
  -Error	1158 Got an error reading communication packets
- main.delayed
  - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
    This part is disabled for now as it fails randomly with different
    warnings/errors (no corruption).
2023-10-14 13:36:11 +03:00
Alexander Barkov
6400b199ac MDEV-32249 strings/ctype-ucs2.c:2336: my_vsnprintf_utf32: Assertion `(n % 4) == 0' failed in my_vsnprintf_utf32 on INSERT
The crash inside my_vsnprintf_utf32() happened correctly,
because the caller methods:
  Field_string::sql_rpl_type()
  Field_varstring::sql_rpl_type()
mis-used the charset library and sent pure ASCII data to the
virtual function snprintf() of a utf32 CHARSET_INFO.

It was wrong to use Field::charset() in sql_rpl_type().
We're printing the metadata (the data type) here, not the column data.
The string contraining the data type of a CHAR/VARCHAR column
is a pure ASCII string.

Fixing to use res->charset() to print, like all virtual implementations
of sql_type() do.

Review was done by Andrei Elkin.
Thanks to Andrei for proposing MTR test improvents.
2023-10-11 22:39:36 +04:00
Sergei Golubchik
df4bfefbb8 compile-time deprecation reminders
remove old deprecation helpers that were not used anywhere.

create new deprecation helpers and enforce their usage

this also removes inconsistencies in reporting deprecation:
sometimes it was ER_WARN_DEPRECATED_SYNTAX (1287),
sometimes ER_WARN_DEPRECATED_SYNTAX_NO_REPLACEMENT (1681),
sometimes a warning, sometimes a note.

it should always be
* ER_WARN_DEPRECATED_SYNTAX
* a warning (because it's something actionable, not purely informational)
2023-09-30 14:43:12 +02:00
Sergei Golubchik
82174dae06 MDEV-32104 remove deprecated features
In particular:

* @@debug
  deprecated since 5.5.37
* sr_YU locale
  deprecated since 10.0.11
* "engine_condition_pushdown" in the @@optimizer_switch
  deprecated since 10.1.1
* @@date_format, @@datetime_format, @@time_format, @@max_tmp_tables
  deprecated since  10.1.2
* @@wsrep_causal_reads
  deprecated since 10.1.3
* "parser" in mroonga table comment
  deprecated since 10.2.11
2023-09-30 14:43:12 +02:00
Sergei Golubchik
970c885d1a Merge branch '11.1' into 11.2 2023-09-24 09:38:34 +02:00
Sergei Golubchik
f031889ae4 Merge branch '11.0' into 11.1 2023-09-24 01:46:43 +02:00
Marko Mäkelä
6a470db552 Merge 10.5 into 10.6 2023-09-14 15:25:53 +03:00
Yuchen Pei
cb1965bd9d Merge branch '10.4' into 10.5 2023-09-14 16:30:11 +10:00
Marko Mäkelä
0f9acce3f2 Merge 10.5 into 10.6 2023-09-14 09:01:15 +03:00
Brandon Nesterenko
1407f99963 MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart
The SQL thread and a user connection executing SHOW SLAVE STATUS
have a race condition on Last_SQL_Errno, such that a slave which
previously errored and stopped, on its next start, SHOW SLAVE STATUS
can show that the SQL Thread is running while the previous error is
also showing.

The fix is to move when the last error is cleared when the SQL
thread starts to occur before setting the status of
Slave_SQL_Running.

Thanks to Kristian Nielson for his work diagnosing the problem!

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielson <knielsen@knielsen-hq.org>
2023-09-13 12:01:47 -06:00
Brandon Nesterenko
7de0c7b569 MDEV-31038: rpl.rpl_xa_prepare_gtid_fail clean up
- Removed commented out and unused lines.
- Updated test to reference true failure of timeout
  rather than deadlock
- Switched save variables from MTR to user
- Forced relay-log purge to not potentially re-execute
  an already prepared transaction
2023-09-13 10:59:26 -06:00
Alexander Barkov
baf00fc553 Merge remote-tracking branch 'origin/10.11' into 11.0 2023-08-18 07:34:54 +04:00
Sergei Golubchik
18ddde4826 Merge branch '11.1' into 11.2 2023-08-18 00:59:16 +02:00
Sergei Petrunia
725bd56834 Merge 10.10 into 10.11 2023-08-17 13:44:05 +03:00
Marko Mäkelä
9cd2989589 Merge 10.6 into 10.10 2023-08-16 15:28:42 +03:00
Kristian Nielsen
7c9837ce74 Merge 10.4 into 10.5
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 18:02:18 +02:00
Kristian Nielsen
805e0668c9 MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication
Remove the exception that InnoDB does not report auto-increment locks waits
to the parallel replication.

There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:

1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2

Here, T3 needs to be deadlock killed on the wait by T1.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:40:02 +02:00
Kristian Nielsen
18acbaf416 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:39:49 +02:00
Kristian Nielsen
900c4d6920 MDEV-31655: Parallel replication deadlock victim preference code errorneously removed
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:35:30 +02:00
Kristian Nielsen
920789e9d4 MDEV-31482: Lock wait timeout with INSERT-SELECT, autoinc, and statement-based replication
Remove the exception that InnoDB does not report auto-increment locks waits
to the parallel replication.

There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:

1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2

Here, T3 needs to be deadlock killed on the wait by T1.

Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB lock code changes.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2023-08-15 16:34:09 +02:00
Nikita Malyavin
a1af525588 MDEV-31804 Assertion `thd->m_transaction_psi == __null' fails
... upon replicating online ALTER

When an online event is applied and slave_exec_mode is idempotent,
Write_rows_log_event::do_before_row_operations had reset
thd->lex->sql_command to SQLCOM_REPLACE.

This led to that a statement was detected as a row-type during binlogging,
and was logged as not standalone.

So the corresponding Gtid_log_event, when applied on replica, did not exit
early and created a new PSI transaction. Hence the difference with
non-online ALTER.
2023-08-15 14:00:28 +02:00
Nikita Malyavin
982b689566 MDEV-31838 Assertion fails upon replication online alter with MINIMAL row
Replica honors its own binlog_row_image value when it sets up read_set.
When a slave thread is applying replicated row events in parallel with the
running online alter, we need all columns to be read from the table
(for the online alter logged row event) not only those that were present in
the pre-image for the replicated row event.

Avoid shrinking the set when online alter is running, i.e. leave it all set
2023-08-15 14:00:28 +02:00
Andrei
bac8f189ed MDEV-31755 Replica's DML event deadlocks wit online alter table
The deadlock was caused by too strong MDL acquired by the start ALTER.

Replica's ALTER TABLE replication consists of two phases:
1. Start ALTER (SA) -- the event is emittd in the very beginning,
allowing replication start ALTER in parallel
2. Commit ALTER (CA) -- ensures that master finishes successfully

CA is normally received by wait_for_master call.
If parallel DML was run, the following sequence will take place:

|- SA
|- DML
|- CA

If CA is handled after MDL upgrade, it'll will deadlock with DML.

While MDL is shared by the start ALTER wait for its 2nd part
to allow concurrent DMLs to grab the lock.

The fix uses wait_for_master reentrancy -- no need to avoid a second call
in the end of mysql_alter_table.

Since SA and CA are marked with FL_DDL, the DML issued in-between cannot be
rescheduled before or after them. However, SA "commits" (by he call of
write_bin_log_start_alter and, subsequently,
thd->wakeup_subsequent_commits) before the copy stage begins, unlocking
the DMLs to run on this table. That is, these DMLs will be executed
concurrently with the copy stage, making Online alter effective on replicas
as well

Co-authored-by: Nikita Malyavin (nikitamalyavin@gmail.com)
2023-08-15 14:00:28 +02:00
Marko Mäkelä
5f6e987481 Merge 10.11 into 11.0 2023-08-15 12:02:07 +03:00
Marko Mäkelä
acc90ce363 Merge 10.10 into 10.11 2023-08-15 11:24:41 +03:00
Marko Mäkelä
17f5f1cba9 Merge 10.6 into 10.10 2023-08-15 11:22:36 +03:00
Marko Mäkelä
3fee1b4471 Merge 10.5 into 10.6 2023-08-15 11:21:34 +03:00
Nikita Malyavin
5f206259e5 MDEV-30985 Replica stops with error on ALTER ONLINE with Geometry Types
The bug is inherent for row-based replication as well.
To reproduce, a virtual (not stored) field of a blob type computed from
another field of a different blob type is required.

The following happens during an update or delete row event:
1. A row is unpacked.
2. Virtual fields are updated. Field b1 stores the pointer in
   Field_blob::value and references it in table->record[0].
3. record[0] is stored to record[1] in Rows_log_event::find_row.
4. A new record is fetched from handler. (e.g. ha_rnd_next)
5. Virtual columns are updated (only non-stored).
6. Field b1 receives new value. Old value is deallocated
   (Field_blob::val_str).
7. record_compare is called. record[0] and record[1] are compared.
8. record[1] contains a reference to a freed value.

record_compare is used in replication to find a matching record for update
or delete. Virtual columns that are not stored should be definitely skipped
both for correctness, and for this bug fix.

STORED virtual columns, on the other hand, may be required and shouldn't be
skipped. Stored columns are not affected, since they are not updated after
handler's fetch.
2023-08-15 10:16:13 +02:00
Nikita Malyavin
30756775d5 MDEV-29069 follow-up: support partially usable keys 2023-08-15 10:16:13 +02:00
Nikita Malyavin
daebec6093 rename rpl/rpl_alter_instant -> rpl/rpl_alter_innodb 2023-08-15 10:16:12 +02:00