Problem:
========
Auto purge of relaylogs stops when relay-log-file is
'slave-relay-log.999999' and slave_parallel_threads is enabled.
Analysis:
=========
The problem is that in Relay_log_info::inc_group_relay_log_pos() function,
when two log names are compared via strcmp() function, it gives correct
result, when log name sequence numbers are of same digits(6 digits), But
when the number goes to 7 digits, a 999999 compares greater than
1000000, which is wrong, hence the bug.
Fix:
====
Extract the numeric extension part of the file name, convert it into
unsigned long and compare.
Thanks to David Zhao for the contribution.
Problem:
=======
SHOW BINLOG EVENTS FROM <"random"-pos> caused a variety of failures as
reported in MDEV-18046. They are fixed but that approach is not future-proof
as well as is not optimal to create extra check for being constructed event
parameters.
Analysis:
=========
"show binlog events from <pos>" code considers the user given position as a
valid event start position. The code starts reading data from this event start
position onwards and tries to map it to a set of known events. Each event has
a specific event structure and asserts have been added to ensure that, read
event data, satisfies the event specific requirements. When a random position
is supplied to "show binlog events command" the event structure specific
checks will fail and they result in assert.
For example: https://jira.mariadb.org/browse/MDEV-18046
In the bug description user executes CREATE TABLE/INSERT and ALTER SQL
commands.
When a crazy offset like "SHOW BINLOG EVENTS FROM 365" is provided code
assumes offset 365 as valid event begin and proceeds to EVENT_LEN_OFFSET reads
some random length and comes up with a crazy event which didn't exits in the
binary log. In this quoted example scenario, event read at offset 365 is
considered as "Update_rows_log_event", which is not present in binary log.
Since this is a random event its validation fails and code results in
assert/segmentation fault, as shown below.
mysqld: /data/src/10.4/sql/log_event.cc:10863: Rows_log_event::Rows_log_event(
const char*, uint, const Format_description_log_event*):
Assertion `var_header_len >= 2' failed.
181220 15:27:02 [ERROR] mysqld got signal 6 ;
#7 0x00007fa0d96abee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x000055e744ef82de in Rows_log_event::Rows_log_event (this=0x7fa05800d390,
buf=0x7fa05800d080 "", event_len=254, description_event=0x7fa058006d60) at
/data/src/10.4/sql/log_event.cc:10863
#9 0x000055e744f00cf8 in Update_rows_log_event::Update_rows_log_event
Since we are reading random data repeating the same command SHOW BINLOG EVENTS
FROM 365 produces different types of crashes with different events. MDEV-18046
reported 10 such crashes.
In order to avoid such scenarios user provided starting offset needs to be
validated for its correctness. Best way of doing this is to make use of
checksums if they are available. MDEV-18046 fix introduced the checksum based
validation.
The issue still remains in cases where binlog checksums are disabled. Please
find the following bug reports.
MDEV-22473: binlog.binlog_show_binlog_event_random_pos failed in buildbot,
server crashed in read_log_event
MDEV-22455: Server crashes in Table_map_log_event,
binlog.binlog_invalid_read_in_rotate failed in buildbot
Fix:
====
When binlog checksum is disabled, perform scan(via reading event by event), to
validate the requested FROM <pos> offset. Starting from offset 4 read the
event_length of next_event in the binary log. Using the next_event length
advance current offset to point to next event. Repeat this process till the
current offset is less than or equal to crazy offset. If current offset is
higher than crazy offset provide appropriate invalid input offset error.
Problem:
=======
The "Start binlog_dump" message hasn't been updated to include the slave's
requested GTID position:
20:05:05 139836760311552 [Note] Start binlog_dump to slave_server(2), pos(, 4)
For diagnostic purposes, it would be helpful if the GTID position were
included.
Fix:
===
Imporve "Start binlog_dump" print message to include "using_gtid" and
"GTID position" requested by slave.
Ex:
[Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1),
gtid('1-1-201,2-2-100')
[Note] Start binlog_dump to slave_server(3), pos('mariadb-bin.004142',
507988273), using_gtid(0), gtid('')
Lifted long standing limitation to the XA of rolling it back at the
transaction's
connection close even if the XA is prepared.
Prepared XA-transaction is made to sustain connection close or server
restart.
The patch consists of
- binary logging extension to write prepared XA part of
transaction signified with
its XID in a new XA_prepare_log_event. The concusion part -
with Commit or Rollback decision - is logged separately as
Query_log_event.
That is in the binlog the XA consists of two separate group of
events.
That makes the whole XA possibly interweaving in binlog with
other XA:s or regular transaction but with no harm to
replication and data consistency.
Gtid_log_event receives two more flags to identify which of the
two XA phases of the transaction it represents. With either flag
set also XID info is added to the event.
When binlog is ON on the server XID::formatID is
constrained to 4 bytes.
- engines are made aware of the server policy to keep up user
prepared XA:s so they (Innodb, rocksdb) don't roll them back
anymore at their disconnect methods.
- slave applier is refined to cope with two phase logged XA:s
including parallel modes of execution.
This patch does not address crash-safe logging of the new events which
is being addressed by MDEV-21469.
CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc
Are addressed along the following policies.
1. The read-only at reconnect marks XID to fail for future
completion with ER_XA_RBROLLBACK.
2. binlog-* filtered XA when it changes engine data is regarded as
loggable even when nothing got cached for binlog. An empty
XA-prepare group is recorded. Consequent Commit-or-Rollback
succeeds in the Engine(s) as well as recorded into binlog.
3. The same applies to the non-transactional engine XA.
4. @@skip_log_bin=OFF does not record anything at XA-prepare
(obviously), but the completion event is recorded into binlog to
admit inconsistency with slave.
The following actions are taken by the patch.
At XA-prepare:
when empty binlog cache - don't do anything to binlog if RO,
otherwise write empty XA_prepare (assert(binlog-filter case)).
At Disconnect:
when Prepared && RO (=> no binlogging was done)
set Xid_cache_element::error := ER_XA_RBROLLBACK
*keep* XID in the cache, and rollback the transaction.
At XA-"complete":
Discover the error, if any don't binlog the "complete",
return the error to the user.
Kudos
-----
Alexey Botchkov took to drive this work initially.
Sergei Golubchik, Sergei Petrunja, Marko Mäkelä provided a number of
good recommendations.
Sergei Voitovich made a magnificent review and improvements to the code.
They all deserve a bunch of thanks for making this work done!
MDEV-18046: Assortment of crashes, assertion failures and ASAN errors in mysql_show_binlog_events
Problem:
========
SHOW BINLOG EVENTS FROM <pos> reports following assert when ASAN is enabled.
uint32 binlog_get_uncompress_len(const char*):
Assertion `(buf[0] & 0xe0) == 0x80' failed
Fix:
===
**Part11: Converted debug assert to error handler code**
Problem:
========
SHOW BINLOG EVENTS FROM <pos> causes a variety of failures, some of which are
listed below. It is not a race condition issue, but there is some
non-determinism in it.
Analysis:
========
"show binlog events from <pos>" code considers the user given position as a
valid event start position. The code starts reading data from this event start
position onwards and tries to map it to a set of known events. Each event has
a specific event structure and asserts have been added to ensure that read
event data satisfies the event specific requirements. When a random position
is supplied to "show binlog events command" the event structure specific
checks will fail and they result in assert.
Fix:
====
The fix is split into different parts. Each part addresses either an ASAN
issue or an assert/crash.
**Part1: Checksum based position validation when checksum is enabled**
Using checksum validate the very first event read at the user specified
position. If there is a checksum mismatch report an appropriate error for the
invalid event.
The assert indicates that the current transaction got caught uncleaned from
the semisync master's cache when it is signaled to proceed upon its
ack receive.
The reason of missed cleanup turns out to be a flaw in the gtid
connect mode.
A submitted by connecting slave value of its last received event's
binlog file *name* was adopted into
{{Repl_semi_sync_master::m_reply_file_name}} as a part of semisync
initialization.
Notice that the initialization still refines the position part of the
submitted last received event's binlog coordinates.
The master side binlog filename:pos refinement is
specific to the gtid connect mode for purpose of computing the latest
binlog file to resume slave feeding from.
Effectively in the gtid connect mode the computed resumption filename:pos
may appear smaller in which case a new post-connect time committing
transaction may be logged with its filename:pos also less than the
submitted coordinates and that triggers the assert.
Fixed with making the semisync initialization to use the refined filename:pos.
It is guaranteed to be less than any new generated transaction's binlog:pos.
Now both offset and limit are stored and do not chenged during execution
(offset is decreased during processing in versions before 10.5).
(Big part of this changes made by Monty)
Analysis
Mysqlbinlog output for encrypted binary log
#Q> insert into tab1 values (3,'row 003')
#190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19
# at 940
#190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F
Here we can see Table_map_log_event ends at 980 but Next event starts at 940.
And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave
Solution:-
Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog),
So that mysqlbinlog can update its log_pos.
Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not
have so We only update slave master pos when master actually have the
FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT.
Also added the test case when new server reads the data from old server which
does not send START_ENCRYPTION_EVENT to slave.
Master Slave Upgrade Scenario.
When Slave is updated first, Slave will have extra logic of handling
START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT.
So there will be no issue.
When Master is updated first, It will send START_ENCRYPTION_EVENT to
slave , But slave will ignore this event in queue_event.
On some systems with 10,000+ binlogs, show binary logs could block
log rotation for more than 10 seconds.
This patch fixes this by first caching all binary log names and
releases all mutexes while calculating the sizes of the binary logs.
Other things:
- Ensure that reinit_io_cache() sets end_of_file when moving to read_cache.
This ensures that external changes of the underlying file is known to
the cache.
- get_binlog_list() is made more efficent and show_binlogs() is changed
to call get_binlog_list()
Reviewed by Andrei Elkin
The patches features an optional shutdown behavior to hold on until
after all connected slaves have been sent the last binlogged event.
The connected slave is one whose START SLAVE has been acknowledged and
that was not stopped since that though it could be technically
reconnecting in background.
The solution therefore disallows killing the dump thread until is has
found EOF of the latest binlog file. It is up to the shutdown
requester (DBA) to set up a sufficiently large shutdown timeout value
for shudown to wait patiently until lagging behind slaves have been
synchronized. On the other hand if a specific slave needs exclusion
from synchronization the DBA would have to stop it manually which
would terminate its dump thread.
`mysqladmin shutdown' is extended with a `--wait_for_all_slaves' option
which translates to `SHUTDOW WAIT FOR ALL SLAVES' sql query
to enable the feature on the client side.
The patch also performs a small refactoring of the server shutdown
around close_connections() to introduce kill thread phases which
are two as of current.