Most things where wrong in the test suite.
The one thing that was a bug was that table_map_id was in some places
defined as ulong and in other places as ulonglong. On Linux 64 bit this
is not a problem as ulong == ulonglong, but on windows this caused failures.
Fixed by ensuring that all instances of table_map_id are ulonglong.
This patch augments Gtid_log_event with the user thread-id.
In particular that compensates for the loss of this info in
Rows_log_events.
Gtid_log_event::thread_id gets visible in mysqlbinlog output like
#231025 16:21:45 server id 1 end_log_pos 537 CRC32 0x1cf1d963 GTID 0-1-2 ddl thread_id=10
as 64 bit unsigned integer.
While the size of Gtid event has grown by 8-9 bytes
replication from OLD <-> NEW is not affected by it.
This work was started by the late Sujatha Sivakumar.
Brandon Nesterenko took it over, reviewed initial patches and extended
the work.
Reviewed-by: <andrei.elkin@mariadb.com>
An existing binlog checksum can be overridden to 0 if writing a NULL
payload when using Zlib for the computation. That is, calling into
Zlib's crc32 with empty data initializes an incremental CRC
computation to 0.
This patch changes the Log_event_writer::write_data() to exit
immediately if there is nothing to write, thereby bypassing the
checksum computation. This follows the pattern of
Log_event_writer::encrypt_and_write(), which also exits immediately
if there is no data to write.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Compute binlog checksums (when enabled) already when writing events
into the statement or transaction caches, where before it was done
when the caches are copied to the real binlog file. This moves the
checksum computation outside of holding LOCK_log, improving
scalabitily.
At stmt/trx cache write time, the final end_log_pos values are not
known, so with this patch these will be set to 0. Events that are
written directly to the binlog file (not through stmt/trx cache) keep
the correct end_log_pos value. The GTID and COMMIT/XID events at the
start and end of event groups are written directly, so the zero
end_log_pos is only for events in the middle of event groups, which
do not negatively affect replication.
An option --binlog-legacy-event-pos, off by default, is provided to
disable this behavior to provide backwards compatibility with any
external applications that might rely on end_log_pos in events in the
middle of event groups.
Checksums cannot be pre-computed when binlog encryption is enabled, as
encryption relies on correct end_log_pos to provide part of the
nonce/IV.
Checksum pre-computation is also disabled for WSREP/Galera, as it uses
events differently in its write-sets and so on. Extending pre-computation of
checksums to Galera where it makes sense could be added in a future patch.
The current --binlog-checksum configuration is saved in
binlog_cache_data at transaction start and used to pre-compute
checksums in cache, if applicable. When the cache is later copied to
the binlog, a check is made if the saved value still matches the
configured global value; if so, the events are block-copied directly
into the binlog file. If --binlog-checksum was changed during the
transaction, events are re-written to the binlog file one-by-one and
the checksums recomputed/discarded as appropriate.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This is a preparatory commit for pre-computing checksums outside of
holding LOCK_log, no functional changes.
Which checksum algorithm is used (if any) when writing an event does not
belong in the event, it is a property of the log being written to.
Instead decide the checksum algorithm when constructing the
Log_event_writer object, and store it there.
Introduce a client-only Log_event::read_checksum_alg to be able to
print the checksum read, and a
Format_description_log_event::source_checksum_alg which is the
checksum algorithm (if any) to use when reading events from a log.
Also eliminate some redundant `enum` keywords on the enum_binlog_checksum_alg
type.
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This is a preparatory patch for precomputing binlog checksums outside
of holding LOCK_log, no functional changes.
Replace Log_event::writer with just passing the writer object as a
function parameter to Log_event::write().
This is mainly for code clarity. Having to set ev->writer before every
call to ev->write() is error-prone (what if it's forgotten in some
code place?), while passing it as parameter as usual makes it explicit
how the dataflow is.
As a minor point, it also improves the code, as the compiler now can
save the function parameter in a register across nested calls (when it
is a class member, compiler needs to reload across nested calls in
case the object would be modified during the call).
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The MDEV-29693 conflict resolution is from Monty, as well as is
a bug fix where ANALYZE TABLE wrongly built histograms for
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
MariaDB async replication SQL thread was stopped for any failure
in applying of replication events and error message logged for the failure
was: "Node has dropped from cluster". The assumption was that event applying
failure is always due to node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped with first
applying failure.
To support optimistic parallel replication retrying logic this commit will
now skip replication slave abort, if node remains in cluster (wsrep_ready==ON)
and replication is configured for optimistic or aggressive retry logic.
During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Replacing my_casedn_str() called on local char[] buffer variables
to CharBuffer::copy_casedn() calls.
This is a sub-task for MDEV-31531 Remove my_casedn_str()
Details:
- Adding a helper template class IdentBuffer (a CharBuffer descendant),
which assumes utf8 data. Like CharBuffer, it's initialized to an empty
string in the constructor, but can be populated with lower-cased data
later.
- Adding a helper template class IdentBufferCasedn, which initializes
to lower case right in the constructor.
- Removing char[] buffers, replacing them to IdentBuffer and IdentBufferCasedn.
- Changing the data type of "db" and "table" parameters from
"const char*" to LEX_CSTRING in the following functions:
find_field_in_table_ref()
insert_fields()
set_thd_db()
mysql_grant()
to reuse IdentBuffer easeir.
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent,
Write_rows_log_event::do_before_row_operations had reset
thd->lex->sql_command to SQLCOM_REPLACE.
This led to that a statement was detected as a row-type during binlogging,
and was logged as not standalone.
So the corresponding Gtid_log_event, when applied on replica, did not exit
early and created a new PSI transaction. Hence the difference with
non-online ALTER.
Pack these fields together:
event_owns_temp_buf
cache_type
slave_exec_mode
checksum_alg
Make them bitfields to fit a single 2-byte hole.
This saves 24 bytes per event.
SLAVE_EXEC_MODE_LAST_BIT is rewritten as
> SLAVE_EXEC_MODE_LAST= SLAVE_EXEC_MODE_IDEMPOTENT
to avoid a false-positive -Wbitfield-enum-conversion warning:
Bit-field 'slave_exec_mode' is not wide enough to store all enumerators of
'enum_slave_exec_mode'.
Replica honors its own binlog_row_image value when it sets up read_set.
When a slave thread is applying replicated row events in parallel with the
running online alter, we need all columns to be read from the table
(for the online alter logged row event) not only those that were present in
the pre-image for the replicated row event.
Avoid shrinking the set when online alter is running, i.e. leave it all set
Add a new virtual function that will increase the inserted rows count
for the insert log event and decrease it for the delete event.
Reuses Rows_log_event::m_row_count on the replication side, which was only
set on the logging side.
Don't skip row_end if it wasn't set explicitly.
Also another segfault was caused by accessing rpl_write_set on slave during
the row update/delete.
The reason was a default_column_bitmaps() call, which also sets
rpl_write_set to NULL.
Previously, the related behavior was changed in commit afd3ee97ad, where
one such call was removed from Update_rows_log_event::do_exec_row, but the
same one was mistakenly left in Delete_rows_log_event. Now it's also
removed.
...in bitmap_intersect
m_cols_ai was accessed during the Delete event, however this field is only
related to Updates.
Moving it to Update_rows_event would require too much effort. So instead:
* Only access m_cols_ai in Update events (conditional branch is added in
Rows_log_event::do_add_row_data)
* Clean up m_cols_ai operations in Rows_log_event constructor.
m_cols_ai.bitmap is first set to NULL, indicating invalid event.
Then it is initialized:
-> For Update events, a new bitmap is created.
-> For other events, debug mode, m_cols_ai.bitmap is set to 1, indicating
that the value is correct, but it shouldn't be accessed. To make sure
we'll have a failure, n_bits is also set to 1.
-> In release mode, m_cols_ai mirrors m_cols, providing extra safety
in production.
The bug is inherent for row-based replication as well.
To reproduce, a virtual (not stored) field of a blob type computed from
another field of a different blob type is required.
The following happens during an update or delete row event:
1. A row is unpacked.
2. Virtual fields are updated. Field b1 stores the pointer in
Field_blob::value and references it in table->record[0].
3. record[0] is stored to record[1] in Rows_log_event::find_row.
4. A new record is fetched from handler. (e.g. ha_rnd_next)
5. Virtual columns are updated (only non-stored).
6. Field b1 receives new value. Old value is deallocated
(Field_blob::val_str).
7. record_compare is called. record[0] and record[1] are compared.
8. record[1] contains a reference to a freed value.
record_compare is used in replication to find a matching record for update
or delete. Virtual columns that are not stored should be definitely skipped
both for correctness, and for this bug fix.
STORED virtual columns, on the other hand, may be required and shouldn't be
skipped. Stored columns are not affected, since they are not updated after
handler's fetch.
ONLINE ALTER TABLE uses binlog events like the replication does.
Before it was never used outside of replication, so significant
change was required. For example, a single event had a statement-like
befavior: it locked the tables, opened it, and closed them in the end. But
for ONLINE ALTER we use preopened table.
A crash scenario is following: lex->query_tables was set to NULL in
restore_empty_query_table_list when alter event is applied.
Then lex->query_tables->prev_global was write-accessed in
LEX::first_lists_tables_same, leading to a segfault.
In replication restore_empty_query_table_list would mean resetting lex
before next query or event.
In ONLINE ALTER TABLE we reuse a locked table between the events, so
we should avoid it. Here the need to reset lex state (or close the tables)
can be determined by nonzero rgi->tables_to_lock_count.
If no table is locked, then event doesn't own the tables.
The same was already done before for rgi->slave_close_thread_tables call.
previously, fields with DEFAULTs were allowed just when expression is
deterministic. In case of online alter, we should recursively check that
underlying fields of expression also either have explicit values, or
have DEFAULT following this validity rule.