- `default_client` is included already in rpl_1slave_base.cnf`, so
remove it from `my.cnf`
- Remove option group for `mysqld` server as and add comment how to
override specific settings for specific server
- Reviewer: <brandon.nesterenko@mariadb.com>
The SQL thread and a user connection executing SHOW SLAVE STATUS
have a race condition on Last_SQL_Errno, such that a slave which
previously errored and stopped, on its next start, SHOW SLAVE STATUS
can show that the SQL Thread is running while the previous error is
also showing.
The fix is to move when the last error is cleared when the SQL
thread starts to occur before setting the status of
Slave_SQL_Running.
Thanks to Kristian Nielson for his work diagnosing the problem!
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielson <knielsen@knielsen-hq.org>
- Removed commented out and unused lines.
- Updated test to reference true failure of timeout
rather than deadlock
- Switched save variables from MTR to user
- Forced relay-log purge to not potentially re-execute
an already prepared transaction
Remove the exception that InnoDB does not report auto-increment locks waits
to the parallel replication.
There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:
1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2
Here, T3 needs to be deadlock killed on the wait by T1.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:
MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master
Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Remove the exception that InnoDB does not report auto-increment locks waits
to the parallel replication.
There was an assumption that these waits could not cause conflicts with
in-order parallel replication and thus need not be reported. However, this
assumption is wrong and it is possible to get conflicts that lead to hangs
for the duration of --innodb-lock-wait-timeout. This can be seen with three
transactions:
1. T1 is waiting for T3 on an autoinc lock
2. T2 is waiting for T1 to commit
3. T3 is waiting on a normal row lock held by T2
Here, T3 needs to be deadlock killed on the wait by T1.
Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB lock code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent,
Write_rows_log_event::do_before_row_operations had reset
thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging,
and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit
early and created a new PSI transaction. Hence the difference with
non-online ALTER.
Replica honors its own binlog_row_image value when it sets up read_set.
When a slave thread is applying replicated row events in parallel with the
running online alter, we need all columns to be read from the table
(for the online alter logged row event) not only those that were present in
the pre-image for the replicated row event.
Avoid shrinking the set when online alter is running, i.e. leave it all set
The deadlock was caused by too strong MDL acquired by the start ALTER.
Replica's ALTER TABLE replication consists of two phases:
1. Start ALTER (SA) -- the event is emittd in the very beginning,
allowing replication start ALTER in parallel
2. Commit ALTER (CA) -- ensures that master finishes successfully
CA is normally received by wait_for_master call.
If parallel DML was run, the following sequence will take place:
|- SA
|- DML
|- CA
If CA is handled after MDL upgrade, it'll will deadlock with DML.
While MDL is shared by the start ALTER wait for its 2nd part
to allow concurrent DMLs to grab the lock.
The fix uses wait_for_master reentrancy -- no need to avoid a second call
in the end of mysql_alter_table.
Since SA and CA are marked with FL_DDL, the DML issued in-between cannot be
rescheduled before or after them. However, SA "commits" (by he call of
write_bin_log_start_alter and, subsequently,
thd->wakeup_subsequent_commits) before the copy stage begins, unlocking
the DMLs to run on this table. That is, these DMLs will be executed
concurrently with the copy stage, making Online alter effective on replicas
as well
Co-authored-by: Nikita Malyavin (nikitamalyavin@gmail.com)
The bug is inherent for row-based replication as well.
To reproduce, a virtual (not stored) field of a blob type computed from
another field of a different blob type is required.
The following happens during an update or delete row event:
1. A row is unpacked.
2. Virtual fields are updated. Field b1 stores the pointer in
Field_blob::value and references it in table->record[0].
3. record[0] is stored to record[1] in Rows_log_event::find_row.
4. A new record is fetched from handler. (e.g. ha_rnd_next)
5. Virtual columns are updated (only non-stored).
6. Field b1 receives new value. Old value is deallocated
(Field_blob::val_str).
7. record_compare is called. record[0] and record[1] are compared.
8. record[1] contains a reference to a freed value.
record_compare is used in replication to find a matching record for update
or delete. Virtual columns that are not stored should be definitely skipped
both for correctness, and for this bug fix.
STORED virtual columns, on the other hand, may be required and shouldn't be
skipped. Stored columns are not affected, since they are not updated after
handler's fetch.
* in online ALTER it must include the complete new row,
note that an UPDATE should set all extra columns to their
default values, as if UPDATE was completely done before the ALTER.
* in rpl WRITE_ROWS_EVENT it must include all extra slave columns,
but not existing columns unmarked in the m_cols (sequences do that)
* in rpl UPDATE/DELETE events it should follow m_cols_ai
also: default values must be updated for WRITE_ROWS_EVENT and
for UPDATE/DELETE in the online ALTER mode, see above.
Update the result file accordingly.
Extend bitmap_copy() to support arguments of different lengths
We shouldn't rely on `fill_extra_persistent_columns`, as it only updates
fields which have an index > cols->n_bits (replication bitmap width).
Actually, it should never be used, as its approach is error-prone.
Normal update_virtual_fields+update_default_fields should be done.
unpack_row() must calculate all stored and indexed vcols
(in fill_extra_persistent_columns()).
Also Update and Delete row events must mark in read_set
all columns needed for calculating all stored and indexed vcols.
If it's done properly in do_apply_event(), it no longer needs
to be repeated per row.
Additionally fixes the bugs uncovered in:
- `MDEV-28332: Alter on temporary table causes ER_TABLE_EXISTS_ERROR note`
Since there is no `warning` issued by shadowing of base table, this MDEV
is irrelevant. Keeping for review purposes and for future development
in case shadowing is going to be implemented
- `MDEV-28334: SHOW TABLE STATUS shows all temporary tables ignoring database and conditions`
- `MDEV-28453: SHOW commands are inconsistent for temporary tables`
Reviewed by: <monty@mariadb.org>,
<vicentiu@mariadb.org>
The test case accessed slave-relay-bin.000003 without waiting for the IO
thread to write it first. If the IO thread was slow, this could fail.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
A simple "SET SESSION gtid_seq_no= DEFAULT" did not work, it would straight
up crash the server! Also, explicitly setting gtid_seq_no to 0 gave an error
in --gtid-strict-mode=1.
Setting to DEFAULT or 0 should disable any prior setting of
gtid_seq_no, so that the next transaction is allocated the next GTID
in sequence, as normal.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This patch adds for "--ps-protocol" second execution
of queries "SELECT".
Also in this patch it is added ability to disable/enable
(--disable_ps2_protocol/--enable_ps2_protocol) second
execution for "--ps-prototocol" in testcases.
MDEV-31749 sporadic assert in MDEV-30619 new test
If the workers of a parallel replica are busy (potentially with long
queues), but the SQL thread has no events left to distribute (so it
goes idle), then the next event that comes from the primary will
update mi->last_master_timestamp with its timestamp, even if the
workers have not yet finished.
This patch changes the parallel replica logic which updates
last_master_timestamp after idling from using solely sql_thread_caught_up
(added in MDEV-29639) to using the latter with rli queued/dequeued
event counters.
That is, if the queued count is equal to the dequeued count, it
means all events have been processed and the replica is considered
idle when the driver thread has also distributed all events.
Low level details of the commit include
- to make a more generalized test for Seconds_Behind_Master on
the parallel replica, rpl_delayed_parallel_slave_sbm.test
is renamed to rpl_parallel_sbm.test for this purpose.
- pause_sql_thread_on_next_event usage was removed
with the MDEV-30619 fixes. Rather than remove it, we adapt it
to the needs of this test case
- added test case to cover SBM spike of relay log read and LMT
update that was fixed by MDEV-29639
- rpl_seconds_behind_master_spike.test is made to use
the negate_clock_diff_with_master debug eval.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>