This is a preparatory commit for pre-computing checksums outside of
holding LOCK_log, no functional changes.
Which checksum algorithm is used (if any) when writing an event does not
belong in the event; it is a property of the log being written to.
Instead, decide the checksum algorithm when constructing the
Log_event_writer object, and store it there.
Introduce a client-only Log_event::read_checksum_alg to be able to
print the checksum read, and a
Format_description_log_event::source_checksum_alg which is the
checksum algorithm (if any) to use when reading events from a log.
Also eliminate some redundant `enum` keywords on the enum_binlog_checksum_alg
type.
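A minimal sketch of the idea, with hypothetical simplified declarations
(the real Log_event_writer carries much more state):

    // The checksum algorithm is decided when the writer is constructed,
    // as a property of the log being written, not carried in each event.
    enum enum_binlog_checksum_alg
    {
      BINLOG_CHECKSUM_ALG_OFF= 0,
      BINLOG_CHECKSUM_ALG_CRC32= 1
    };

    class Log_event_writer_sketch
    {
      enum_binlog_checksum_alg checksum_alg;  // fixed for the log's lifetime
    public:
      explicit Log_event_writer_sketch(enum_binlog_checksum_alg alg)
        : checksum_alg(alg) {}
      bool checksumming() const
      { return checksum_alg != BINLOG_CHECKSUM_ALG_OFF; }
    };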
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
This is a preparatory patch for precomputing binlog checksums outside
of holding LOCK_log, no functional changes.
Replace Log_event::writer with just passing the writer object as a
function parameter to Log_event::write().
This is mainly for code clarity. Having to set ev->writer before every
call to ev->write() is error-prone (what if it is forgotten somewhere?),
while passing the writer as a normal function parameter makes the
dataflow explicit.
As a minor point, it also improves the generated code: the compiler can
now keep the function parameter in a register across nested calls (when
it is a class member, the compiler must reload it after each nested call
in case the object was modified during the call).
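A minimal sketch of the resulting shape, with hypothetical simplified
helpers (the real write() path has more stages and error handling):

    class Log_event_writer;  // knows the log being written and its checksum

    class Log_event_sketch
    {
    public:
      // The writer is an explicit parameter instead of a member that had to
      // be assigned before every call; each call site shows the dataflow.
      bool write(Log_event_writer *writer)
      {
        return write_header(writer) || write_data_header(writer) ||
               write_data_body(writer);
      }
    private:
      // Nested helpers receive the same parameter; the compiler may keep it
      // in a register, as no callee can change it behind the caller's back.
      bool write_header(Log_event_writer *) { return false; }
      bool write_data_header(Log_event_writer *) { return false; }
      bool write_data_body(Log_event_writer *) { return false; }
    };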
Reviewed-by: Monty <monty@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
The MDEV-29693 conflict resolution is from Monty, as is a bug fix where
ANALYZE TABLE wrongly built histograms for a single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid an mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
The MariaDB async replication SQL thread was stopped on any failure in
applying replication events, and the error message logged for the failure
was: "Node has dropped from cluster". The assumption was that an event
applying failure is always due to the node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons, and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped at the
first applying failure.
To support the optimistic parallel replication retry logic, this commit
skips the replication slave abort if the node remains in the cluster
(wsrep_ready==ON) and replication is configured for optimistic or
aggressive retry logic.
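A hypothetical sketch of the decision (names invented; the real check
lives in the wsrep applier error handling):

    // Abort the SQL thread only when the node has really dropped from the
    // cluster; otherwise leave the parallel-replication retry logic to run.
    static bool skip_slave_abort(bool wsrep_ready_on,
                                 bool optimistic_or_aggressive_retries)
    {
      return wsrep_ready_on && optimistic_or_aggressive_retries;
    }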
During the development of this fix, the galera.galera_as_slave_nonprim
test showed some problems. The test was analyzed and appears to need some
attention. One excessive sleep command was removed in this commit, but
more fixes are still needed to make it fully deterministic. After this
commit, galera_as_slave_nonprim passes, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
Replacing my_casedn_str() calls on local char[] buffer variables
with CharBuffer::copy_casedn() calls.
This is a sub-task for MDEV-31531 Remove my_casedn_str()
Details:
- Adding a helper template class IdentBuffer (a CharBuffer descendant),
  which assumes utf8 data. Like CharBuffer, it is initialized to an empty
  string in the constructor, but can be populated with lower-cased data
  later (see the sketch after this list).
- Adding a helper template class IdentBufferCasedn, which initializes
to lower case right in the constructor.
- Removing char[] buffers, replacing them with IdentBuffer and IdentBufferCasedn.
- Changing the data type of "db" and "table" parameters from
"const char*" to LEX_CSTRING in the following functions:
find_field_in_table_ref()
insert_fields()
set_thd_db()
mysql_grant()
to reuse IdentBuffer more easily.
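A toy, ASCII-only sketch of the pattern (the real CharBuffer/IdentBuffer
are charset-aware templates in the server's string utilities):

    #include <cctype>
    #include <cstddef>

    template <size_t SIZE>
    class IdentBufferSketch
    {
      char buf[SIZE + 1];
      size_t len;
    public:
      IdentBufferSketch() : len(0) { buf[0]= '\0'; }  // starts empty
      // Populate with lower-cased data later, never overflowing the buffer.
      IdentBufferSketch &copy_casedn(const char *str, size_t length)
      {
        len= length < SIZE ? length : SIZE;
        for (size_t i= 0; i < len; i++)
          buf[i]= (char) std::tolower((unsigned char) str[i]);
        buf[len]= '\0';
        return *this;
      }
      const char *ptr() const { return buf; }
    };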
... upon replicating online ALTER
When an online event is applied and slave_exec_mode is idempotent,
Write_rows_log_event::do_before_row_operations used to reset
thd->lex->sql_command to SQLCOM_REPLACE.
This led to the statement being detected as row-type during binlogging,
and being logged as not standalone.
So the corresponding Gtid_log_event, when applied on the replica, did not
exit early and created a new PSI transaction. Hence the difference from
non-online ALTER.
Pack these fields together:
event_owns_temp_buf
cache_type
slave_exec_mode
checksum_alg
Make them bitfields so that they fit into a single 2-byte hole.
This saves 24 bytes per event.
SLAVE_EXEC_MODE_LAST_BIT is rewritten as
> SLAVE_EXEC_MODE_LAST= SLAVE_EXEC_MODE_IDEMPOTENT
to avoid a false-positive -Wbitfield-enum-conversion warning:
Bit-field 'slave_exec_mode' is not wide enough to store all enumerators of
'enum_slave_exec_mode'.
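A compilable sketch of the packing, with simplified stand-in enums (the
real values and widths live in the server headers):

    // Stand-in enums; values illustrative only.
    enum enum_event_cache_type { EVENT_INVALID_CACHE= 0, EVENT_STMT_CACHE,
                                 EVENT_TRANSACTIONAL_CACHE, EVENT_NO_CACHE };
    enum enum_slave_exec_mode { SLAVE_EXEC_MODE_STRICT= 0,
                                SLAVE_EXEC_MODE_IDEMPOTENT,
                                // aliasing LAST keeps the enum's value range
                                // within the 1-bit field:
                                SLAVE_EXEC_MODE_LAST=
                                  SLAVE_EXEC_MODE_IDEMPOTENT };
    enum enum_binlog_checksum_alg { BINLOG_CHECKSUM_ALG_OFF= 0,
                                    BINLOG_CHECKSUM_ALG_CRC32= 1,
                                    BINLOG_CHECKSUM_ALG_UNDEF= 255 };

    struct Packed_fields_sketch
    {
      bool event_owns_temp_buf:1;
      enum_event_cache_type cache_type:3;
      enum_slave_exec_mode slave_exec_mode:1;
      enum_binlog_checksum_alg checksum_alg:8;  // must hold ALG_UNDEF (255)
    };
    // 1+3+1+8 = 13 bits, so all four fields can share one 2-byte hole in
    // the Log_event layout instead of occupying four separate words.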
The replica honors its own binlog_row_image value when it sets up the
read_set. When a slave thread is applying replicated row events in
parallel with a running online ALTER, we need all columns to be read from
the table (for the online-alter-logged row event), not only those present
in the pre-image of the replicated row event.
Avoid shrinking the set when online ALTER is running, i.e. leave it fully
set.
Add a new virtual function that increases the inserted rows count for the
insert log event and decreases it for the delete event. It reuses
Rows_log_event::m_row_count on the replication side, which was previously
only set on the logging side.
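A toy sketch of the hook (names hypothetical):

    struct Rows_event_sketch
    {
      long m_row_count= 0;                 // reused on the replication side
      virtual void adjust_row_count()= 0;  // the new virtual function
      virtual ~Rows_event_sketch() {}
    };

    struct Write_rows_sketch : Rows_event_sketch
    {
      void adjust_row_count() override { m_row_count++; }  // insert event
    };

    struct Delete_rows_sketch : Rows_event_sketch
    {
      void adjust_row_count() override { m_row_count--; }  // delete event
    };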
Don't skip row_end if it wasn't set explicitly.
Another segfault was caused by accessing rpl_write_set on the slave during
row update/delete.
The reason was a default_column_bitmaps() call, which also resets
rpl_write_set to NULL.
Previously, the related behavior was changed in commit afd3ee97ad, where
one such call was removed from Update_rows_log_event::do_exec_row, but the
same one was mistakenly left in Delete_rows_log_event. Now it's also
removed.
...in bitmap_intersect
m_cols_ai was accessed during the Delete event; however, this field is
only relevant to Updates.
Moving it to Update_rows_event would require too much effort. So instead:
* Only access m_cols_ai in Update events (a conditional branch is added in
  Rows_log_event::do_add_row_data)
* Clean up the m_cols_ai operations in the Rows_log_event constructor.
  m_cols_ai.bitmap is first set to NULL, indicating an invalid event.
  Then it is initialized:
  -> For Update events, a new bitmap is created.
  -> For other events, in debug mode, m_cols_ai.bitmap is set to 1,
     indicating that the value is correct but must not be accessed. To make
     sure we get a failure on access, n_bits is also set to 1.
  -> In release mode, m_cols_ai mirrors m_cols, providing extra safety
     in production.
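A toy sketch of the constructor policy (stand-in bitmap type; the real
code uses MY_BITMAP):

    #include <cstdint>

    struct Bitmap_sketch { uint32_t *bitmap; uint32_t n_bits; };

    static void init_cols_ai(Bitmap_sketch *m_cols_ai,
                             const Bitmap_sketch *m_cols, bool is_update)
    {
      m_cols_ai->bitmap= nullptr;        // NULL marks an invalid event
      if (is_update)
      {
        // allocate a fresh after-image bitmap here
      }
      else
      {
    #ifndef DBUG_OFF
        // Debug builds: poison the field so any stray access fails loudly;
        // n_bits= 1 makes the failure certain.
        m_cols_ai->bitmap= (uint32_t *) 1;
        m_cols_ai->n_bits= 1;
    #else
        // Release builds: mirror m_cols, which is always valid, as extra
        // safety in production.
        *m_cols_ai= *m_cols;
    #endif
      }
    }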
The bug is inherent to row-based replication as well.
To reproduce it, a virtual (not stored) field of a blob type, computed
from another field of a different blob type, is required.
The following happens during an update or delete row event:
1. A row is unpacked.
2. Virtual fields are updated. Field b1 stores the pointer in
Field_blob::value and references it in table->record[0].
3. record[0] is stored to record[1] in Rows_log_event::find_row.
4. A new record is fetched from the handler (e.g. ha_rnd_next).
5. Virtual columns are updated (only non-stored).
6. Field b1 receives a new value. The old value is deallocated
(Field_blob::val_str).
7. record_compare is called. record[0] and record[1] are compared.
8. record[1] contains a reference to a freed value.
record_compare is used in replication to find a matching record for update
or delete. Virtual columns that are not stored should definitely be
skipped, both for correctness and for this bug fix.
STORED virtual columns, on the other hand, may be required and shouldn't
be skipped. Stored columns are not affected, since they are not updated
after the handler's fetch.
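A toy sketch of the rule (stand-in field type): only columns whose bytes
in record[] stay valid after the handler's fetch take part in
record_compare().

    struct Field_sketch
    {
      bool is_virtual;  // generated column?
      bool is_stored;   // STORED generated column?
    };

    static bool compare_this_field(const Field_sketch &f)
    {
      // Non-stored virtual columns may point at blob memory already freed
      // in step 6 above, so they must be skipped; STORED generated columns
      // live in the record buffer and stay comparable.
      return !f.is_virtual || f.is_stored;
    }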
ONLINE ALTER TABLE uses binlog events like replication does. Before, they
were never used outside of replication, so a significant change was
required. For example, a single event had statement-like behavior: it
locked the tables, opened them, and closed them at the end. But for
ONLINE ALTER we use a pre-opened table.
The crash scenario is the following: lex->query_tables was set to NULL in
restore_empty_query_table_list when the alter event was applied.
Then lex->query_tables->prev_global was write-accessed in
LEX::first_lists_tables_same, leading to a segfault.
In replication, restore_empty_query_table_list would mean resetting the
lex before the next query or event.
In ONLINE ALTER TABLE we reuse a locked table between events, so we should
avoid that. Whether the lex state needs to be reset (or the tables closed)
can be determined by a nonzero rgi->tables_to_lock_count:
if no table is locked, the event doesn't own the tables.
The same check was already done for the rgi->slave_close_thread_tables
call.
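A toy sketch of the guard (stand-in types; the real ones are THD, LEX and
rpl_group_info):

    struct Lex_sketch { void reset_query_tables() {} };
    struct Rgi_sketch { int tables_to_lock_count; };

    static void maybe_reset_after_event(Lex_sketch *lex, Rgi_sketch *rgi)
    {
      // Online ALTER reuses a pre-opened table, leaving
      // tables_to_lock_count at zero; then the event doesn't own the
      // tables and must not reset the lex (or close the tables).
      if (rgi->tables_to_lock_count)
        lex->reset_query_tables();
    }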
Previously, fields with DEFAULTs were allowed only when the expression was
deterministic. In the case of online ALTER, we should recursively check
that the underlying fields of the expression also either have explicit
values, or have DEFAULTs following this validity rule.
We can't rely on keys formed from columns that were added during this
ALTER. These columns can be set to non-deterministic values, which can
result in a broken or incorrect search.
The same applies to keys that contain reliable columns but also bogus
ones. Using them could narrow the search, but they are ignored as well.
Also, added columns shouldn't be considered during the record match. To
determine them, the table->has_value_set bitmap is used.
To fill the has_value_set bitmap in the find_key call, an extra unpack_row
call has been added.
In the replication case, extra replica columns can also fall into this
category; we try to ignore them, too.
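A toy sketch of the selection rule (stand-in types): a key is usable for
the row lookup only if every column it covers has a value coming from the
logged row, as tracked by the has_value_set bitmap that the extra
unpack_row call fills.

    #include <vector>

    struct Key_part_sketch { unsigned field_index; };
    struct Key_sketch { std::vector<Key_part_sketch> parts; };

    static bool key_is_reliable(const Key_sketch &key,
                                const std::vector<bool> &has_value_set)
    {
      for (const Key_part_sketch &kp : key.parts)
        if (!has_value_set[kp.field_index])  // added or extra replica column
          return false;
      return true;
    }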
* in online ALTER it must include the complete new row,
note that an UPDATE should set all extra columns to their
default values, as if UPDATE was completely done before the ALTER.
* in rpl WRITE_ROWS_EVENT it must include all extra slave columns,
but not existing columns unmarked in the m_cols (sequences do that)
* in rpl UPDATE/DELETE events it should follow m_cols_ai
also: default values must be updated for WRITE_ROWS_EVENT and
for UPDATE/DELETE in the online ALTER mode, see above.
Update the result file accordingly.
Extend bitmap_copy() to support arguments of different lengths
wsrep expected that any Rows_log_event::do_apply_event() failure could
only happen in the applier thread, because no other thread ever applies
row events. With online ALTER this is no longer true.
Division by zero is a good example: sql_mode is basically ignored by
replication, see Bug#56662.
The behavior for ONLINE should remain the same as for non-ONLINE ALTER.
We shouldn't rely on fill_extra_persistent_columns, as it only updates
fields which have an index > cols->n_bits (the replication bitmap width).
Actually, it should never be used, as its approach is error-prone.
A normal update_virtual_fields + update_default_fields should be done.
ALTER ONLINE TABLE acquires the table with TL_READ. MyISAM normally
acquires TL_WRITE for DML, which makes the DML hang until the table is
freed. We deadlock once ALTER upgrades its MDL lock.
Solution:
Unlock the table earlier. We don't need to hold TL_READ once we have
finished copying: relay log replication requires no data locks on the
`from` table.
In RBR, only show warnings for values that are to be written into the
table, that is, only for the after-image. Don't show data conversion
warnings for the before-image.
* Log rows in online_alter_binlog.
* Table online data is replicated within a dedicated binlog file.
* Cached data is written on commit.
* Versioning is fully supported.
* Works both with and without the binlog enabled.
* For now, setting savepoints is forbidden while ONLINE ALTER goes on.
Extra support is required: we could simply log the SAVEPOINT query events
and replicate them together with the row events, but this is not
implemented for now.
* Cache flipping:
We want to take care of a possible bottleneck in the online alter binlog
reading/writing in advance.
IO_CACHE provides nothing better than sequential access; besides, only a
single write is mutex-protected, which is not suitable, since we should
write a transaction atomically.
To solve this, a special layer on top of Event_log is implemented.
There are two IO_CACHE files underneath: one for reading, and one for
writing.
Once the read cache is empty, an exclusive lock is acquired (we may have
to wait for a currently active transaction to finish writing), and flip()
is emitted, i.e. the write cache is reopened for reading, and the read
cache is emptied and reopened for writing.
This resembles the buffer flip that happens in accelerated graphics
(DirectX/OpenGL/etc).
Cache_flip_event_log is considered non-blocking for a single reader and a
single writer in this sense, with the only lock held by the reader during
the flip.
An alternative approach, implementing a fair concurrent circular buffer,
is described in MDEV-24676.
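A minimal in-memory sketch of the flip idea (hypothetical types; the real
Cache_flip_event_log keeps two IO_CACHEs):

    #include <algorithm>
    #include <cstring>
    #include <mutex>
    #include <vector>

    class Flip_buffer_sketch
    {
      std::vector<char> bufs[2];
      size_t read_pos= 0;
      int rd= 0, wr= 1;        // which buffer is read from / written to
      std::mutex write_lock;   // a writer holds it for a whole transaction
    public:
      void write_trx(const char *data, size_t len)
      {
        std::lock_guard<std::mutex> g(write_lock);  // atomic transaction
        bufs[wr].insert(bufs[wr].end(), data, data + len);
      }
      size_t read(char *to, size_t len)
      {
        if (read_pos == bufs[rd].size())
          flip();
        size_t n= std::min(len, bufs[rd].size() - read_pos);
        if (n)
          std::memcpy(to, bufs[rd].data() + read_pos, n);
        read_pos+= n;
        return n;
      }
    private:
      void flip()
      {
        // The only lock the reader ever takes: wait for an active
        // transaction to finish writing, then swap the buffers' roles.
        std::lock_guard<std::mutex> g(write_lock);
        bufs[rd].clear();      // emptied and reopened for writing
        std::swap(rd, wr);
        read_pos= 0;
      }
    };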
* Cache managers:
We have two cache sinks: statement and transactional.
It is important that the changes are cached first per-statement and then
per-transaction.
If a statement fails, only the statement data is rolled back. The
transaction moves along, however.
It turns out there is no guarantee that the TABLE will persist in
thd->open_tables until the transaction commit moment.
If an error occurs, the statement's tables are purged.
Therefore, we can't store the caches in TABLE. Ideally, it should be the
handlerton, but we cut a corner and store it in THD in a list.
unpack_row() must calculate all stored and indexed vcols
(in fill_extra_persistent_columns()).
Also, Update and Delete row events must mark in the read_set
all columns needed for calculating all stored and indexed vcols.
If this is done properly in do_apply_event(), it no longer needs
to be repeated per row.
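A toy sketch (stand-in types): the marking is done once per event instead
of once per row.

    #include <vector>

    struct Vcol_sketch
    {
      bool stored_or_indexed;
      std::vector<unsigned> depends_on;  // base column indexes
    };

    static void mark_vcol_dependencies(const std::vector<Vcol_sketch> &vcols,
                                       std::vector<bool> &read_set)
    {
      for (const Vcol_sketch &vc : vcols)
        if (vc.stored_or_indexed)
          for (unsigned base : vc.depends_on)
            read_set[base]= true;
    }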
This patch adds a way to override default collations
(or "character set collations") for desired character sets.
The SQL standard says:
> Each collation known in an SQL-environment is applicable to one
> or more character sets, and for each character set, one or more
> collations are applicable to it, one of which is associated with
> it as its character set collation.
In MariaDB, character set collations have been hard-coded so far,
e.g. utf8mb4_general_ci has been the hard-coded character set collation
for utf8mb4.
This patch makes it possible to override (globally per server, or per
session) the character set collations, so that, for example, uca1400_ai_ci
can be set as the character set collation for Unicode character sets
(instead of the compiled-in xxx_general_ci).
The array of overridden character set collations is stored in a new
(session and global) system variable @@character_set_collations and
can be set as a comma separated list of charset=collation pairs, e.g.:
SET @@character_set_collations='utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci';
The variable is empty by default, which means the hard-coded
character set collations are used (e.g. utf8mb4_general_ci for utf8mb4).
The variable can also be set globally by passing it on the server startup
command line, and/or in my.cnf.
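For example, an illustrative my.cnf snippet for the global setting:

    [mysqld]
    character_set_collations=utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci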