mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
2961 Commits

Author SHA1 Message Date
Kristian Nielsen
585785c7bc Binlog-in-engine: Handle mixing transactional and non-transactional tables
When updating non-transactional tables inside a multi-statement transaction,
and binlog_direct_non_transactional_updates=1, then the non-transactional
updates are binlogged directly through the statement cache while the
transaction cache is still being added to in the main transaction.

Thus, move the engine_binlog_info out from binlog_cache_mngr and into the
individual stmt/trx binlog_cache_data, so that we can have separate
engine_binlog_info active for the statement and the transaction cache.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
7a67f72979 Binlog-in-engine: Also binlog non-innodb event groups
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
97e9106e5a Binlog-in-engine: Make --binlog-storage-engine available as read-only system variable
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
6e7f1f95f0 Binlog-in-engine: Handle single event writes larger than binlog size
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
685b0b0def Binlog-in-engine: Implement dynamically changing binlog max size
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
31ba7922a0 Binlog-in-engine: Implement savepoint support
Support for SAVEPOINT, ROLLBACK TO SAVEPOINT, rolling back a failed
statement (keeping active transaction), and rolling back transaction.

For savepoints (and start-of-statement), if the binlog data to be rolled
back is still in the in-memory part of the trx cache, we can just truncate
the cache to that point.

But if we need to spill cache contents as out-of-band data containing one or
more savepoints/start-of-statement points, then split the spill at each point
and inform the engine of the savepoints.

In InnoDB, at savepoint set, save the state of the forest of perfect binary
trees being built. Then at rollback, restore the appropriate state.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
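A minimal sketch of the in-memory case described in 31ba7922a0, assuming the transaction cache is a plain byte buffer and a savepoint only needs to remember the buffer length at the time it was set. The class and method names are hypothetical and are not taken from the server source; the spill-to-engine case is not modelled.

```python
class TrxCacheSketch:
    """Hypothetical in-memory transaction cache (not the server's class)."""

    def __init__(self):
        self.buf = bytearray()
        self.savepoints = []  # (name, offset) in the order they were set

    def write(self, data):
        self.buf += data

    def savepoint(self, name):
        # Remember how much binlog data was cached when the savepoint was set.
        self.savepoints.append((name, len(self.buf)))

    def rollback_to(self, name):
        # Find the most recent savepoint with this name.
        for i in range(len(self.savepoints) - 1, -1, -1):
            if self.savepoints[i][0] == name:
                break
        else:
            raise KeyError(name)
        # Truncate the cached data back to that point and forget any
        # savepoints that were set after it.
        del self.buf[self.savepoints[i][1]:]
        del self.savepoints[i + 1:]
```

Rolling back a failed statement would work the same way, with the start-of-statement offset playing the role of an unnamed savepoint.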
Kristian Nielsen
84da20e658 MDEV-34705: Binlog-in-engine: Protect against concurrent RESET MASTER and dump threads
This is actually an existing problem in the old binlog implementation, and
this patch is applicable to old binlog also. The problem is that RESET
MASTER can run concurrently with binlog dump threads / connected slaves.
This will remove the binlog from under the feet of the reader, which can
cause all sorts of strange behaviour.

This patch fixes the problem by disallowing RESET MASTER while dump
threads (or another RESET MASTER or SHOW BINARY LOGS) are running. An error
is thrown in this case; the user must stop slaves and/or kill dump threads
to make the RESET MASTER go through. A slave that connects in the middle of
RESET MASTER will wait for it to complete.

Fix a lot of test cases to kill any lingering dump threads before doing
RESET MASTER, mostly just by sourcing include/kill_binlog_dump_threads.inc.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
d26851a575 MDEV-34705: Binlog-in-engine: Crash-safe slave
This patch makes replication crash-safe with the new binlog implementation,
even when --innodb-flush-log-at-trx-commit=0|2. The point is to not send any
binlog events to the slave until they have become durable on master. This
prevents a slave from replicating a transaction that is then lost during
master recovery, which would make the slave diverge from the master.

Keep track of which point in the binlog has been durably synced to disk
(meaning the corresponding LSN has been durably synced to disk in the InnoDB
redo log). Each write to the binlog inserts an entry with offset and
corresponding LSN in a FIFO. Dump threads will first read only up to the
durable point in the binlog. A dump thread will then check the LSN fifo, and
do an InnoDB redo log sync if anything is pending. Then the FIFO is emptied
of any LSNs that have now become durable, and the durable point in the
binlog is updated and reading the binlog can continue.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
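The durable-point bookkeeping described in d26851a575 can be sketched roughly as follows. This is an illustration under simplifying assumptions: a single thread, integer offsets and LSNs, and a `sync_redo_log()` method that merely stands in for a real InnoDB redo log sync. All names are hypothetical.

```python
from collections import deque


class DurabilityTrackerSketch:
    """Hypothetical tracker of the durable point in the binlog."""

    def __init__(self):
        self.pending = deque()   # (binlog_end_offset, lsn) in write order
        self.durable_offset = 0  # dump threads read only up to here
        self.flushed_lsn = 0     # stand-in for the redo log's durable LSN

    def on_binlog_write(self, end_offset, lsn):
        # Each binlog write records its end offset and the matching LSN.
        self.pending.append((end_offset, lsn))

    def sync_redo_log(self, up_to_lsn):
        # Placeholder for a real redo log sync.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def advance(self):
        # Called by a dump thread that has caught up with durable_offset.
        if self.pending:
            self.sync_redo_log(self.pending[-1][1])
        # Drop every entry whose LSN is now durable and move the
        # durable point forward accordingly.
        while self.pending and self.pending[0][1] <= self.flushed_lsn:
            self.durable_offset = self.pending.popleft()[0]
        return self.durable_offset
```

A reader would send events up to `durable_offset`, call `advance()` when it reaches that point, and continue from the returned offset.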
Kristian Nielsen
baec2064a1 MDEV-34705: Binlog-in-engine: Fix hang with event group of specific size
If the event group fitted in the binlog cache without the GTID event but not
with, the code would attempt to spill part of the GTID event as out-of-band
data, which is not correct. In release builds this would hang the server as
the spilling would try to lock an already owned mutex.

Fix by checking if the GTID event fits, and spilling any non-GTID data as
oob if it does not.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-03 14:44:51 +02:00
Kristian Nielsen
1b8ce5d554 MDEV-34705: Binlog-in-engine: Few bug fixes
Fix that spilling of out-of-band data to the binlog could happen
concurrently with binlog group commit, by holding LOCK_commit_ordered
over all binlog writes now.

Fix silly use-after-free bug where data was accessed in the old buffer after
realloc().

Improve the wording of the error when specifying an argument for --log-bin.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-05-20 11:13:56 +02:00
Kristian Nielsen
9e13086ab8 MDEV-34705: Binlog-in-engine: Fix leftover fsync of legacy binlog
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-05-15 14:30:46 +02:00
Kristian Nielsen
f0d4b63bac MDEV-34705: Binlog-in-engine: Implement refcounting outstanding OOB records
Keep track of, for each binlog file, how many open transactions have
out-of-band data starting in that file. Then at the start of each new binlog
file, in the header page, record the file_no of the earliest file that this
file might contain commit records with references back to OOB records in
that earlier file.

Use this in PURGE BINARY LOGS, so that when a dump thread (slave connection)
is active in file number N, and that file (or a later one) may require
looking back in an earlier file number M for out-of-band records, purge will
stop already at file number M. This way, we avoid that purge accidentally
deletes some binlog file that a dump thread would later get an error on
because it needs to read out-of-band data.

This patch also includes placeholder data for a similar facility for XA
references. The actual implementation of support for XA is for later though.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-19 12:26:28 +02:00
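One way to picture the purge restriction from f0d4b63bac, assuming each binlog file header exposes a single "earliest referenced file" number and that dump threads report the file they are currently reading. The function names and the dictionary-based representation are hypothetical.

```python
def earliest_ref_for_new_file(oob_refcounts, new_file_no):
    """Value to record in the header page of a new binlog file.

    oob_refcounts maps file_no -> number of open transactions whose
    out-of-band data starts in that file.
    """
    live = [f for f, count in oob_refcounts.items() if count > 0]
    return min(live + [new_file_no])


def purge_limit(earliest_oob_ref, reader_files, requested_limit):
    """Return the first file_no that must be kept.

    earliest_oob_ref maps file_no -> earliest file recorded in its header.
    reader_files lists the file numbers that dump threads are active in.
    Files with file_no below the returned value may be purged.
    """
    limit = requested_limit
    last = max(earliest_oob_ref, default=-1)
    for n in reader_files:
        limit = min(limit, n)
        # File n, or any later file, may refer back to an earlier file.
        for f in range(n, last + 1):
            limit = min(limit, earliest_oob_ref.get(f, f))
    return limit
```

With a reader in file 3 and file 4 referring back to file 2, a request to purge everything before file 4 would stop at file 2.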
Kristian Nielsen
d496e5278d MDEV-34705: Binlog-in-engine: Integration with server-layer code
Mostly various fixes to avoid initializing or creating any data or files for
the legacy binlog.

A possible later refinement could be to sub-class the binlog class
differently for legacy and in-engine binlogs, writing separate virtual
functions for behaviour that differ, extracting common functionality into
sub-methods. This could remove some if (opt_binlog_engine_hton)
conditionals.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-10 19:16:55 +02:00
Kristian Nielsen
b3c6bbdbd3 MDEV-34705: Binlog-in-engine: First working recovery
Still needs more testing.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
68f37e6e58 MDEV-34705: Binlog-in-engine: Implement DELETE_DOMAIN_ID for FLUSH
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
0671add213 MDEV-34705: Binlog-in-engine: Implement PURGE BINARY LOGS
Still to do: restrict auto-purge so that it does not purge any binlog
file with out-of-band data that might still be needed by a connected slave.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
d4b37fcc85 MDEV-34705: Binlog-in-engine: Handful of fixes
Fix missing WORDS_BIGENDIAN define in ut0compr_int.cc.

Fix misaligned read buffer for O_DIRECT.

Fix wrong/missing update_binlog_end_pos() in binlog group commit.

Fix race where active_binlog_file_no incremented too early.

Fix wrong assertion when reader reaches the very start of (active+1).

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
dd8ffe952d MDEV-34705: Binlog-in-engine: Misc. small fixes to make normal test suite mostly pass
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
c67b014c9c MDEV-34705: Binlog-in-engine: Implement RESET MASTER
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6889c8e4cf MDEV-34705: Binlog-in-engine: Implement FLUSH BINARY LOGS
No DELETE_DOMAIN_ID supported yet, will come in a later commit, after PURGE
is implemented.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6f6baf9655 MDEV-34705: Binlog-in-engine: Read side of out-of-band binlogging
With this commit, the out-of-band binlogging of large event groups in
multiple smaller records interleaved with other event groups is now working.

Instead of flushing the binlog cache to disk when it reaches
@@binlog_cache_size, the cache is binlogged as an out-of-band
record. Then at transaction commit, a commit record is written containing
just the GTID and a link to the out-of-band data.

To facilitate append-only operation, the binlogged records do not have a
"next" pointer. Instead, they are written out as a forest of perfect binary
trees, the leftmost leaf of one tree pointing to the root of the previous
tree. This structure is used in the binlog reader to efficiently read out
the event group data consecutively for the binlog dump thread, needing to
maintain only O(log(N)) amount of memory during the reading.

As part of this commit, the existing binlog reader code is refactored to be
greatly improved, with a much cleaner explicit state machine and handling of
chunk/page/file boundaries etc.

Also fixes some bugs in gtid_search::find_gtid_pos().

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
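The append-only forest described in 6f6baf9655 can be illustrated with a small model. This is one possible reading of the commit message, not the on-disk format: it assumes every record carries data, that a record merging two equal-height trees stores links to both roots, and that a record starting a new tree stores a link to the root of the previous tree. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OobNode:
    data: object
    height: int
    left: Optional[int] = None   # index of left child root
    right: Optional[int] = None  # index of right child root
    prev: Optional[int] = None   # leaf only: root of the previous tree


def build_forest(chunks):
    """Append one node per chunk; nodes only ever refer to earlier nodes."""
    nodes, roots = [], []  # roots: stack of (node_index, height)
    for data in chunks:
        idx = len(nodes)
        if len(roots) >= 2 and roots[-1][1] == roots[-2][1]:
            # Two trees of equal height: the new node becomes their parent.
            r, h = roots.pop()
            l, _ = roots.pop()
            nodes.append(OobNode(data, h + 1, left=l, right=r))
            roots.append((idx, h + 1))
        else:
            # Start a new single-node tree linked to the previous root.
            prev = roots[-1][0] if roots else None
            nodes.append(OobNode(data, 0, prev=prev))
            roots.append((idx, 0))
    return nodes, (roots[-1][0] if roots else None)


def read_in_order(nodes, last_root):
    """Return the data in write order, given only the last root."""
    roots, r = [], last_root
    while r is not None:
        roots.append(r)
        n = r
        while nodes[n].left is not None:  # walk down to the leftmost leaf
            n = nodes[n].left
        r = nodes[n].prev
    out = []
    for root in reversed(roots):
        stack = [(root, False)]  # explicit post-order traversal
        while stack:
            idx, expanded = stack.pop()
            node = nodes[idx]
            if expanded or node.left is None:
                out.append(node.data)
            else:
                stack.append((idx, True))
                stack.append((node.right, False))
                stack.append((node.left, False))
    return out
```

In this model the list of roots and the traversal stack both stay proportional to the tree height, which is where the O(log(N)) memory bound comes from.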
Kristian Nielsen
07232f1e45 MDEV-34705: out-of band binlogging, fix trx_cache handling for out-of-band
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:48 +02:00
Kristian Nielsen
c80d87f8c5 MDEV-34705: out-of band binlogging, partial untested commit to do a separate refactoring of end_event
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
ce2269353f MDEV-34705: Binlog-in-engine: Working replication to slave
Only GTID slave connection is supported, at least for now.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
586ed18fe9 MDEV-34705: Code to restore binlog GTID state at restart
To restore the binlog state, after finding the position in the old binlog to
continue from, read the full gtid state saved at the start of the binlog
file as well as the most recent differential gtid state written shortly
before the starting position. Then construct a binlog reader to read the
remaining few events (if any), and update with any GTIDs read to obtain the
final restored GTID binlog state.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
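The restore steps in 586ed18fe9 amount to layering three sources of information. A minimal sketch, assuming the GTID state can be modelled as a mapping from (domain_id, server_id) to the latest seq_no and that a differential state lists the pairs changed since the full state at the start of the file. The function name and data shapes are hypothetical.

```python
def restore_binlog_state(full_state, diff_state, tail_gtids):
    """Rebuild the binlog GTID state at restart.

    full_state: {(domain_id, server_id): seq_no} from the start of the file.
    diff_state: changed pairs from the most recent differential state.
    tail_gtids: (domain_id, server_id, seq_no) tuples read from the few
                events after the differential state, in binlog order.
    """
    state = dict(full_state)
    # The differential state overrides only the pairs that changed.
    state.update(diff_state)
    # Replay the remaining events to reach the final state.
    for domain_id, server_id, seq_no in tail_gtids:
        state[(domain_id, server_id)] = seq_no
    return state
```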
Kristian Nielsen
18b9ec637e MDEV-34705: Binlog in Engine: Searchability for GTID position
Every N bytes (hardcoded at 64k for now, to become a configurable setting),
write the binlog GTID state into the binlog tablespace. This makes it
possible to quickly find a given GTID position by binary search to the prior
GTID state in the tablespace and then a small linear scan from that point.

The full binlog state is dumped at the start of the binlog file; remaining
states dumped are differential states containing only the changed
(domain_id, server_id) pairs, to save space if binlog space is large.

This commit only implements the writing of the binlog state to the
tablespace at regular intervals. The binary search will be implemented in a
subsequent commit.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
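The lookup that 18b9ec637e prepares for can be sketched as a binary search over the periodic state records followed by a short scan. The model below is deliberately simplified and hypothetical: a single replication domain, so each state record reduces to the highest seq_no written before that offset, and the binlog is represented as parallel lists of event offsets and seq_nos.

```python
import bisect


def find_gtid_offset(snap_offsets, snap_states, event_offsets, event_seqs,
                     target_seq):
    """Return the offset of the event with target_seq, or None.

    snap_offsets[i] is where the i-th state record was written, and
    snap_states[i] is the highest seq_no written before that offset.
    Both lists, and the event lists, are sorted by offset.
    """
    # Binary search: last state record that is still before the target.
    i = bisect.bisect_left(snap_states, target_seq) - 1
    start = snap_offsets[i] if i >= 0 else 0
    # Small linear scan from that point.
    j = bisect.bisect_left(event_offsets, start)
    for off, seq in zip(event_offsets[j:], event_seqs[j:]):
        if seq == target_seq:
            return off
    return None
```

With multiple domains, the comparison against a state record would be made per (domain_id, server_id) pair rather than on a single number.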
Kristian Nielsen
094c772213 MDEV-34705: Binlog in Engine: Also binlog standalone (eg. DDL) in the engine
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
219f643ba0 MDEV-34705: Binlog in Engine: Change option to --binlog-storage-engine to get a hton available
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
1db620338d MDEV-34705: Binlog in Engine: Early draft, first binlogging of DML to InnoDB tablespace
The option --innodb-in-engine now causes InnoDB DML commits to include
binlogging in the same mtr. Binlog group commit now skips binlogging to
old file-based binlog and passes events to InnoDB instead.

Many things unfinished still, like allocating new tablespaces when the first
one is filled, writing large event groups out-of-band to not bloat the
InnoDB commit record in the redo log and exceed max mtr size, writing DDL
and all other events to the InnoDB binlog, skipping the creation of the
old-style binlog, reading the new style binlog from InnoDB, etc. etc.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Marko Mäkelä
3ae8f114e2 Merge 10.11 into 11.4 2025-04-02 10:15:08 +03:00
Julius Goryavsky
74f0b99edf Merge branch '10.6' into '10.11' 2025-04-02 06:33:39 +02:00
Julius Goryavsky
03c31ab099 Merge branch '10.5' into '10.6' 2025-04-02 04:43:24 +02:00
Jan Lindström
25737dbab7 MDEV-33850 : For Galera, create sequence with low cache got signal 6 error: [ERROR] WSREP: FSM: no such a transition REPLICATING -> COMMITTED
The problem was that the transaction was BF-aborted after certification
succeeded and transaction tried to rollback and during
rollback binlog stmt cache containing sequence value reservations
was written into binlog.

Transaction must replay because certification succeeded but
transaction must not be written into binlog yet, it will
be done during commit after the replay.

Fix is to skip binlog write if transaction must replay and
in replay we need to reset binlog stmt cache.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-04-02 04:29:40 +02:00
Daniele Sciascia
d698b784c8 MDEV-35658 Assertion `commit_trx' failed in test galera_as_master
The test issues a simple INSERT statement, while sql_log_bin = 0.
This option disables writes to binlog.  However, since MDEV-7205,
the option does not affect Galera, so changes are still replicated.
So sql_log_bin=off only "partially" disables the binlog, and the INSERT
will involve both binlog and innodb, thus requiring internal 2 phase
commit (2PC). In 2PC INSERT is first prepared, which will make it
transition to PREPARED state in innodb, and later committed which
causes the new assertion from MDEV-24035 to fail.
Running the same test with sql_log_bin enabled also results in 2PC,
but the execution has one more step for ordered commit, between prepare
and commit. Ordered commit causes the transaction state to transition
back to TRX_STATE_NOT_STARTED, thus avoiding the assertion.
This patch makes sure that when sql_log_bin=off, the ordered commit
step is not skipped, thus going through the expected state transitions
in the storage engine.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-04-02 04:29:40 +02:00
Sergei Golubchik
7d657fda64 Merge branch '10.11' into 11.4 2025-01-30 12:01:11 +01:00
Sergei Golubchik
e69f8cae1a Merge branch '10.6' into 10.11 2025-01-30 11:55:13 +01:00
Sergei Golubchik
066e8d6aea Merge branch '10.5' into 10.6 2025-01-29 11:17:38 +01:00
Oleksandr Byelkin
47f87c5f88 MDEV-20281 "[ERROR] Failed to write to mysql.slow_log:" without error reason
Add "backup" reasons for failed logging, for use when no error reason was issued.
2025-01-25 20:37:51 +01:00
Kristian Nielsen
72e1cc8f52 MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server
In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's
IO_CACHE to signal an error back to the caller. When reading the active
relay log, this flag is also being used by the IO thread, and setting it can
randomly cause the IO thread to wrongly detect IO error on writing and
permanently disable the relay log.

This was seen sporadically in test case rpl.rpl_from_mysql80. The read
error set by the SQL thread in the IO_CACHE would be interpreted as a
write error by the IO thread, which would cause it to throw a fatal
error and close the relay log. And this would later cause CHANGE
MASTER to try to purge a closed relay log, resulting in nullptr crash.

The trigger is the SQL thread being unable to parse an event read from the
relay log. This can happen, as here, when replicating unknown events from a
MySQL master, and potentially for other reasons as well.

Also fix a mistake in my_b_flush_io_cache() introduced back in 2001
(fa09f2cd7e) where my_b_flush_io_cache() could wrongly return an error set
in IO_CACHE::error, even if the flush operation itself succeeded.

Also fix another sporadic failure in rpl.rpl_from_mysql80 where the output
of MASTER_POS_WAIT() depended on timing of SQL and IO thread.

Reviewed-by: Monty <monty@mariadb.org>
Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-01-24 09:15:20 +00:00
Sergei Golubchik
f1a7693bc0 Merge branch '10.11' into 11.4 2025-01-14 23:45:41 +01:00
Sergei Golubchik
221aa5e08f Merge branch '10.6' into 10.11 2025-01-10 13:14:42 +01:00
Marko Mäkelä
17f01186f5 Merge 10.11 into 11.4 2025-01-09 07:58:08 +02:00
Kristian Nielsen
39f93b6eab MDEV-29744: Fix incorrect locking order of LOCK_log/LOCK_commit_ordered and LOCK_global_system_variables
The LOCK_global_system_variables must not be held when taking mutexes
such as LOCK_commit_ordered and LOCK_log, as this causes inconsistent
mutex locking order that can theoretically cause the server to
deadlock.

To avoid this, temporarily release LOCK_global_system_variables in two
system variable update functions, like it is done in many other
places.

Enforce the correct locking order at server startup, to more easily
catch (in debug builds) any remaining wrong orders that may be hidden
elsewhere in the code.

Note that when this is merged to 11.4, similar unlock/lock of
LOCK_global_system_variables must be added in update_binlog_space_limit()
as is done in binlog_checksum_update() and fix_max_binlog_size(), as this
is a new function added in 11.4 that also needs the same fix. Tests will
fail with wrong mutex order until this is done.

Reviewed-by: Sergei Golubchik <serg@mariadb.org>
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-01-08 17:52:34 +01:00
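The unlock/lock pattern from 39f93b6eab, shown as a language-neutral sketch rather than the server's C++ code. It assumes the caller enters the update function already holding the global system variables lock, as the commit describes; the lock objects and the function name are hypothetical stand-ins.

```python
import threading

# Hypothetical stand-ins for LOCK_global_system_variables and LOCK_log.
lock_global_sysvars = threading.Lock()
lock_log = threading.Lock()


def update_with_log_lock(apply_change):
    """Run apply_change() under lock_log.

    Must be called with lock_global_sysvars held. The global lock is
    released first so that it is never held while lock_log is taken,
    and is re-acquired before returning to the caller.
    """
    lock_global_sysvars.release()
    try:
        with lock_log:
            apply_change()
    finally:
        lock_global_sysvars.acquire()
```

The `finally` clause matters: the caller expects to still hold the global lock on return, even if the change itself fails.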
Marko Mäkelä
a54d151fc1 Merge 10.6 into 10.11 2024-12-19 15:38:53 +02:00
Julius Goryavsky
155203c352 Merge branch '10.5' into '10.6' 2024-12-13 01:45:35 +01:00
Alexander Barkov
ab9182470d MDEV-31366 Assertion `thd->start_time' failed in bool LOGGER::slow_log_print(THD*, const char*, size_t, ulonglong)
Fixing a wrong DBUG_ASSERT.

thd->start_time and thd->start_time_sec_part cannot be 0 at the same time.

But thd->start_time can be 0 when thd->start_time_sec_part is not 0,
e.g. after:

SET timestamp=0.99;
2024-12-12 20:32:56 +01:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
ParadoxV5
d5f16d6305 Extract some of #3360 fixes to 10.6.x
That PR uncovered countless issues on `my_snprintf` uses.
This commit backports a squashed subset of their fixes (excludes #3485).
2024-11-18 13:29:04 +11:00
Brandon Nesterenko
b07258a0d5 MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC
For a primary configured with wait_point=AFTER_SYNC, if two threads
T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were
binlogging at the same time, T1 could accidentally wait for its
semi-sync ACK using the binlog coordinates of T2. Prior to
MDEV-33551, this only resulted in delayed transactions, because all
transactions shared the same condition variable for ACK signaling.
However, with the MDEV-33551 changes, each thread has its own
condition variable to signal. So T1 could wait indefinitely when
either:
  1) T1's ACK is received but not T2's when T1 goes into
wait_after_sync(), because the ACK receiver thread has already
notified about the T1 ACK, but T1 was _actually_ waiting on T2's
ACK, and therefore tries to wait (in vain).

  2) T1 goes to wait_after_sync() before any ACKs have arrived. When
T1's ACK comes in, T1 is woken up; however, it sees that it needs to wait
longer (because it was actually waiting on T2's ACK), and goes back to
waiting (this time, in vain).

Note that the actual cause of T1 waiting on T2's binlog coordinates
is when MYSQL_BIN_LOG::write() would call
Repl_semisync_master::wait_after_sync(), the binlog offset parameter
was read as the end of MYSQL_BIN_LOG::log_file, which is shared
among transactions. So if T2 had updated the binary log _after_ T1
had released LOCK_log, but not yet invoked wait_after_sync(), it
would use the end of the binary log file as the binlog offset, which
was that of T2 (or any future transaction).

The fix in this patch ensures consistency between the binary log
coordinates a transaction uses between report_binlog_update() and
wait_after_sync().

Reviewed By
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>
2024-11-04 10:45:58 -07:00
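The coordinate mix-up described in b07258a0d5 can be reproduced in a toy model. The sketch assumes a binlog whose only shared state is its end position and does not model ACKs, condition variables, or the real semi-sync classes. All names are hypothetical.

```python
import threading


class BinlogSketch:
    """Hypothetical binlog with a shared end position."""

    def __init__(self):
        self.lock = threading.Lock()
        self.end_pos = 0  # shared: moves forward with every writer

    def write(self, nbytes):
        # Return the end position of *this* write, captured under the lock.
        with self.lock:
            self.end_pos += nbytes
            return self.end_pos


def wait_pos_buggy(log, my_end_pos):
    # Reads the shared end of the log after the lock was released,
    # which may already belong to a later transaction.
    return log.end_pos


def wait_pos_fixed(log, my_end_pos):
    # Uses the same coordinates that were reported for this transaction.
    return my_end_pos
```

If T2 writes between T1's write and T1's wait, the buggy variant makes T1 wait on T2's position, while the fixed variant keeps T1 on its own.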