It places a limit N (a timeout value in milliseconds) on how long
a statement is permitted to execute before the server terminates it.
Syntax:
SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ...
Only top-level SELECT statements support the hint.
Nothing is broken by this, but the function
Rows_log_event::write_data_header
slightly over-allocates its buf by 2 bytes. It allocates 10
(ROWS_HEADER_LEN_V2), but only 8 are ever written.
This patch fixes the buf size to match what is written (i.e.
ROWS_HEADER_LEN_V1, or 8 bytes).
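For illustration, a minimal self-contained sketch of the sizing involved,
assuming the V1 rows post-header layout of a 6-byte table id followed by
2 bytes of flags (the constants mirror the names above; this is not the
actual server code):

  #include <cstddef>
  #include <cstdint>

  constexpr std::size_t ROWS_HEADER_LEN_V1= 8;   // 6-byte table id + 2-byte flags
  constexpr std::size_t ROWS_HEADER_LEN_V2= 10;  // old buffer size; 2 bytes never written
  static_assert(ROWS_HEADER_LEN_V2 - ROWS_HEADER_LEN_V1 == 2,
                "the old buffer over-allocated by exactly 2 bytes");

  // Fills only the 8 bytes that are actually written, so a buffer of
  // ROWS_HEADER_LEN_V1 is sufficient.
  std::size_t fill_rows_post_header(uint8_t (&buf)[ROWS_HEADER_LEN_V1],
                                    uint64_t table_id, uint16_t flags)
  {
    for (int i= 0; i < 6; i++)
      buf[i]= static_cast<uint8_t>(table_id >> (8 * i));  // little-endian table id
    buf[6]= static_cast<uint8_t>(flags & 0xff);           // 2-byte flags
    buf[7]= static_cast<uint8_t>(flags >> 8);
    return sizeof(buf);                                   // 8
  }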
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
This commit fixes a bug in MSAN verification in debug builds
related to a member of the lex->definer structure that may
be uninitialized for some SET statements. The bug is caused by
debug output in wsrep-specific code and does not affect
server functionality.
Implementation of this task adds the ability to raise a signal with
SQLSTATE '02TRG' from a BEFORE INSERT/UPDATE/DELETE trigger and handles
this signal as an indicator meaning 'throw away the current row' while
processing the INSERT/UPDATE/DELETE statement. The signal with
SQLSTATE '02TRG' has special meaning only when it is raised inside
BEFORE triggers; for AFTER triggers this value of SQLSTATE isn't treated
in any special way. In accordance with the SQL standard, the SQLSTATE class '02'
means NO DATA, and the current implementation of the MariaDB server sets
sql_errno for this class to the value ER_SIGNAL_NOT_FOUND.
Implementation of this task assigns the value ER_SIGNAL_SKIP_ROW_FROM_TRIGGER
to sql_errno in the Diagnostics_area in case the signal is raised from a trigger
and SQLSTATE has the value '02TRG'.
To catch a signal with SQLSTATE '02TRG' and handle it in a special way, the methods
Table_triggers_list::process_triggers
select_insert::store_values
select_create::store_values
Rows_log_event::process_triggers
and the overloaded function
fill_record_n_invoke_before_triggers
were extended with an extra out parameter that returns a flag telling whether
to skip the current values being processed by the INSERT/UPDATE/DELETE
statement (see the sketch below). This extra parameter is passed as nullptr
for AFTER triggers; for BEFORE triggers it points to a variable that stores
a marker of whether to skip the current record or store it by calling
write_record().
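A minimal, self-contained sketch of this out-parameter flow; the names
run_before_trigger() and insert_rows() are hypothetical stand-ins for the
real callers listed above:

  #include <cstdio>
  #include <vector>

  // Stand-in for firing a BEFORE trigger: returns true only on a real error;
  // a SIGNAL with SQLSTATE '02TRG' just sets *skip_row instead.
  static bool run_before_trigger(int row_value, bool *skip_row)
  {
    if (skip_row)                    // non-null only for BEFORE triggers
      *skip_row= (row_value < 0);    // e.g. the trigger skips negative rows
    return false;
  }

  static void insert_rows(const std::vector<int> &rows)
  {
    for (int v : rows)
    {
      bool skip= false;
      if (run_before_trigger(v, &skip))
        return;                      // a genuine error aborts the statement
      if (skip)
        continue;                    // '02TRG': throw away this row, keep going
      std::printf("write_record(%d)\n", v);  // the row is actually stored
    }
  }

  int main() { insert_rows({1, -2, 3}); }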
In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's
IO_CACHE to signal an error back to the caller. When reading the active
relay log, this flag is also being used by the IO thread, and setting it can
randomly cause the IO thread to wrongly detect an IO error on writing and
permanently disable the relay log.
This was seen sporadically in test case rpl.rpl_from_mysql80. The read
error set by the SQL thread in the IO_CACHE would be interpreted as a
write error by the IO thread, which would cause it to throw a fatal
error and close the relay log. And this would later cause CHANGE
MASTER to try to purge a closed relay log, resulting in a nullptr crash.
This happens when the SQL thread is not able to parse an event read from the
relay log, as here when replicating unknown events from a MySQL master, and
potentially also for other reasons.
Also fix a mistake in my_b_flush_io_cache() introduced back in 2001
(fa09f2cd7e) where my_b_flush_io_cache() could wrongly return an error set
in IO_CACHE::error, even if the flush operation itself succeeded.
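The fix can be illustrated with a self-contained toy (not the actual mysys
code); the point is only that a stale error value must not be reported for a
flush that succeeded:

  #include <cstdio>

  struct ToyCache { int error; bool dirty; };   // stand-in for IO_CACHE

  static int toy_flush(ToyCache *c, bool write_ok)
  {
    if (!c->dirty)
      return 0;                 // nothing to write
    c->dirty= false;
    if (!write_ok)
      return c->error= -1;      // a genuine flush failure sets and returns the error
    return 0;                   // fixed: do not return c->error here, it may still
                                // hold an unrelated error from an earlier read
  }

  int main()
  {
    ToyCache c{ -1 /* stale error left by an earlier read */, true };
    std::printf("%d\n", toy_flush(&c, true));   // prints 0 after the fix
  }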
Also fix another sporadic failure in rpl.rpl_from_mysql80 where the output
of MASTER_POS_WAIT() depended on the timing of the SQL and IO threads.
Reviewed-by: Monty <monty@mariadb.org>
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Two-Phase ALTER added a sa_seq_no field, but `Gtid_log_event::write()`'s
size calculation doesn't include an addend for it.
This patch resizes the buffer to match `write()`'s code.
Handle null bits for record comparison in row events the same way as in
handler::calculate_checksum(), forcing bits that can be undefined to 1.
These bits are the trailing unused bits, as well as the first bit for
tables not using HA_OPTION_PACK_RECORD.
The csv storage engine leaves these bits at 0, while the row-based
replication has them set to 1, which would otherwise cause a 'can't find record' error.
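A self-contained sketch of that normalization, assuming the undefined bits
are exactly the trailing unused bits of the null bitmap plus bit 0 when
HA_OPTION_PACK_RECORD is not set (the real code is in the row-event
comparison path, not this helper):

  #include <cstddef>
  #include <cstdint>

  // Force bits whose value is undefined to 1 in a record's null-bit bytes, so
  // that two images of the same row compare equal regardless of how the
  // storage engine left those bits.
  static void normalize_null_bits(uint8_t *null_bytes, std::size_t n_null_bytes,
                                  unsigned n_used_bits, bool pack_record)
  {
    if (n_null_bytes == 0)
      return;
    if (!pack_record)
      null_bytes[0]|= 1;                        // first bit is unused: force to 1
    unsigned used_in_last_byte= n_used_bits % 8;
    if (used_in_last_byte)                      // trailing unused bits in last byte
      null_bytes[n_null_bytes - 1]|=
          static_cast<uint8_t>(0xFF << used_in_last_byte);
  }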
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The assertion occurred in the SQL thread if an event group was incompletely
written, missing the end XID or COMMIT event, and immediately followed by a
new event group. This could also lead to the incomplete event group being
committed, and with the wrong GTID.
Fix by rolling back any active transaction from a prior event group when
applying the following GTID event.
Getting an incomplete event group like this is somewhat rare. If the
server crashes in the middle of writing an event group, the server restart
will write a new format description event, which makes the slave roll back
the partial event group. But presumably it could happen if the master
experiences temporary write errors in the binlog, such as intermittent
disk full.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Parallel slave failed to retry in retry_event_group() with error
WSREP: Parallel slave worker failed at wsrep_before_command() hook
Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up the previous transaction by calling wsrep_after_statement().
Also move the call that resets the error to after the call to
wsrep_after_statement(), to make sure that it remains effective.
Add an MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.
Other issues which were detected when testing with sysbench:
Check if the parallel slave is killed for retry before waiting for prior
commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This
is required with slave-parallel-mode=optimistic to avoid a deadlock
when a slave later in commit order manages to reach the prepare phase
before a lock conflict is detected.
Suppress a wsrep-applier-specific warning for slave threads.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Fill in the `todo:gtid` in `check_and_remove_stale_alter`
(Note that `Master_info::master_id` is a `ulong`,
unlike `rpl_gtid::server_id`)
> We could have caught this before MDEV-11675 was
> published if we'd had this validation earlier 😇 .
> ⸺ Brandon, reply in #3360 (MDEV-21978)
Co-authored-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
That PR uncovered countless issues in `my_snprintf` uses.
This commit backports a squashed subset of their fixes.
(Excludes previous parts #3485 and #3493)
The problem was a missing thd->set_time() call before binlog event
execution in wsrep_apply_events.
Removed part of the earlier commit 1363580 because it had
nothing to do with VERSIONED tables.
Note that this commit does not contain an mtr test case
because the actual timestamps in the binlog file depend on the
actual time when events are executed and how long
their execution takes.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n),
and the same for thd->calloc(). By default the type is char,
so the old usage thd->alloc(size) works too.
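A simplified, self-contained sketch of the idea (the real templates live on
THD/MEM_ROOT and allocate from a mem_root; ToyThd is just a stand-in):

  #include <cstddef>
  #include <cstdlib>

  struct ToyThd
  {
    void *alloc(std::size_t size) { return std::malloc(size); }  // old untyped call
    // New typed wrappers; the default type char keeps thd->alloc(size) working.
    template <typename T= char> T *alloc(std::size_t n)
    { return static_cast<T *>(alloc(n * sizeof(T))); }
    template <typename T= char> T *calloc(std::size_t n)
    { return static_cast<T *>(std::calloc(n, sizeof(T))); }
  };

  int main()
  {
    ToyThd thd;
    long *a= thd.alloc<long>(10);   // instead of (long*)thd.alloc(sizeof(long)*10)
    char *b= static_cast<char *>(thd.alloc(16));   // old usage still compiles
    std::free(a);
    std::free(b);
  }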
1. Binlog commit by rotate (MDEV-32014) should not be
used with Galera, yet while WSREP binlog emulation
is active, the code path could lead into
binlog_cache_data::write_prepare() in an invalid
state, leading to errors in MTR. To fix, an extra
check is added to ensure the binlog is actually
active before calling write_prepare().
2. If the #binlog_cache_files directory exists on a
mariadbd run without opt_log_bin, the directory
was treated as a table/database, leading to errors.
To fix, on startup, if opt_log_bin is disabled and
#binlog_cache_files exists (in the default log
directory), the directory is deleted (and an
informational message is provided in the error
log).
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
for large transaction
Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(e.g. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.
The solution in this patch is to rename the binlog cache file to
a binlog file instead of copying it, if the committing transaction has
a large binlog cache. Rename is a very fast operation; it doesn't
block other transactions for a long time.
Design
======
* binlog_large_commit_threshold
type: ulonglong
scope: global
dynamic: yes
default: 128MB
Only binlog cache temporary files larger than 128MB are
renamed to a binlog file.
* #binlog_cache_files directory
To support rename, all binlog cache temporary files are managed
as normal files now. The `#binlog_cache_files` directory is in the same
directory as the binlog files. It is created at server startup if it doesn't
exist. Otherwise, all files in the directory are deleted at startup.
The temporary files are named with an ML_ prefix plus the memory address
of the binlog_cache_data object, which guarantees the name is unique.
* Reserve space
To support the rename feature, enough space must be reserved at the
beginning of the binlog cache file. The space is required for the
Format description, Gtid list, checkpoint and Gtid events when
renaming it to a binlog file.
Since binlog_cache_data's cache_log is directly accessed by the binlog,
online alter and wsrep code, it is not easy to update all the code. Thus
the binlog cache will not reserve space if it is not a session binlog cache
or if a wsrep session is enabled.
- m_file_reserved_bytes
Stores the number of bytes reserved at the beginning of the cache file.
It is initialized in write_prepare() and cleared by reset().
The reserved file header is hidden from callers, so there is no
change for callers. E.g.
- get_byte_position() still gets the length of binlog data
written to the cache, not the file length.
- truncate(0) will truncate the file to m_file_reserved_bytes, not to 0.
- write_prepare()
write_prepare() is called every time anything is written
into the cache. It calls init_file_reserved_bytes() to create
the cache file (if it doesn't exist) and reserve suitable space if
the data written exceeds the buffer's size.
* Binlog_commit_by_rotate
It is used to encapsulate the code for renaming a binlog cache
temporary file to a binlog file.
- should_commit_by_rotate()
It is called by write_transaction_to_binlog_events() to check if
a binlog cache should be renamed to a binlog file (see the sketch
after this description).
- commit()
This is the entry point for renaming a binlog cache and committing the
transaction. Both rename and commit are protected by LOCK_log,
thus no other transaction can write anything into the renamed
binlog before it.
Rename happens in a rotation. After the new binlog file is generated,
replace_binlog_file() is called to:
- copy data from the new binlog file to its binlog cache file.
- write the gtid event.
- rename the binlog cache file to a binlog file.
After that the rotation continues to completion. Then the transaction
is committed in a separate group by itself. Its cache file will be
detached and its cache log reset before calling
trx_group_commit_with_engines(). Thus only the Xid event is written.
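A hedged, self-contained sketch of that decision (the real logic is in
Binlog_commit_by_rotate; the field and variable names here are illustrative):

  #include <cstdint>

  struct ToyBinlogCache
  {
    uint64_t data_length;     // bytes of binlog data written to the cache file
    uint64_t reserved_bytes;  // header space reserved by write_prepare(), 0 if none
  };

  // Default of the binlog_large_commit_threshold system variable (128MB).
  static uint64_t binlog_large_commit_threshold= 128ULL * 1024 * 1024;

  // Only caches that actually reserved header space (i.e. session caches with
  // no wsrep involvement) and exceed the threshold are renamed instead of copied.
  static bool should_commit_by_rotate(const ToyBinlogCache &cache)
  {
    return cache.reserved_bytes > 0 &&
           cache.data_length > binlog_large_commit_threshold;
  }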
The problem was that we did not detect that the table was partitioned,
in which case we should find the actual underlying storage
engine.
We should not use RSU for !InnoDB tables.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
This commit adds 3 new status variables to 'show all slaves status':
- Master_last_event_time ; timestamp of the last event read from the
master by the IO thread.
- Slave_last_event_time ; Master timestamp of the last event committed
on the slave.
- Master_Slave_time_diff: The difference of the above two timestamps.
All the above variables are NULL until the slave has started and the
slave has read one query event from the master that changes data.
- Added information_schema.slave_status, which allows us to remove:
- show_master_info(), show_master_info_get_fields(),
send_show_master_info_data(), show_all_master_info()
- class Sql_cmd_show_slave_status.
- Protocol::store(I_List<i_string_pair>* str_list) as it is not
used anymore.
- Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to
use the SELECT code path, as all other SHOW ... STATUS commands.
Other things:
- Xid_log_time is set to the time of commit to allow a slave that reads the
binary log to calculate Master_last_event_time and
Slave_last_event_time.
This is needed as there is no 'exec_time' for row events.
- Fixed that Load_log_event calculates exec_time identically to
Query_event.
- Updated RESET SLAVE to reset Master/Slave_last_event_time
- Updated SQL thread's update on first transaction read-in to
only update Slave_last_event_time on group events.
- Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions
if allocation of 'field' would fail.
Reviewed By:
Brandon Nesterenko <brandon.nesterenko@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>
There are two problems.
First, replication fails when XA transactions are used where the
slave has replicate_do_db set and the client has touched a different
database when running DML such as inserts. This is because XA
commands are not treated as keywords, and are thereby not exempt
from the replication filter. The effect of this is that during an XA
transaction, if its logged “use db” from the master is filtered out
by the replication filter, then XA END will be ignored, yet its
corresponding XA PREPARE will be executed in an invalid state,
thereby breaking replication.
Second, if the slave replicates an XA transaction which results in
an empty transaction, the XA START through XA PREPARE first phase of
the transaction won’t be binlogged, yet the XA COMMIT will be
binlogged. This will break replication in chain configurations.
The first problem is fixed by treating XA commands in
Query_log_event as keywords, thus allowing them to bypass the
replication filter. Note that Query_log_event::is_trans_keyword() is
changed to accept a new parameter to define its mode, to either
check for XA commands or regular transaction commands, but not both.
In addition, mysqlbinlog is adapted to use this mode so its
--database filter does not remove XA commands from its output.
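A rough, self-contained sketch of that two-mode check (the real method is
Query_log_event::is_trans_keyword(); the keyword lists and the parameter
name here are simplified assumptions):

  #include <cstring>

  // Check either XA commands or regular transaction keywords, but not both,
  // depending on the requested mode.
  static bool is_trans_keyword_sketch(const char *query, bool xa_mode)
  {
    if (xa_mode)
      return std::strncmp(query, "XA ", 3) == 0;   // XA START/END/PREPARE/COMMIT/ROLLBACK
    return std::strncmp(query, "BEGIN", 5) == 0 ||
           std::strncmp(query, "COMMIT", 6) == 0 ||
           std::strncmp(query, "ROLLBACK", 8) == 0 ||
           std::strncmp(query, "SAVEPOINT", 9) == 0;
  }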
The second problem is fixed by overwriting the XA state in the XID
cache to be XA_ROLLBACK_ONLY, so at commit time, the server knows to
rollback the transaction and skip its binlogging. If the xid cache
is cleared before an XA transaction receives its completion command
(e.g. on server shutdown), then before reporting ER_XAER_NOTA when
the completion command is executed, the filter is first checked if
the database is ignored, and if so, the error is ignored.
Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>