mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

18623 Commits

Author SHA1 Message Date
Kristian Nielsen
585785c7bc Binlog-in-engine: Handle mixing transactional and non-transactional tables
When updating non-transactional tables inside a multi-statement transaction,
and binlog_direct_non_transactional_updates=1, then the non-transactional
updates are binlogged directly through the statement cache while the
transaction cache is still being added to in the main transaction.

Thus, move the engine_binlog_info out from binlog_cache_mngr and into the
individual stmt/trx binlog_cache_data, so that we can have separate
engine_binlog_info active for the statement and the transaction cache.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
97e9106e5a Binlog-in-engine: Make --binlog-storage-engine available as read-only system variable
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
6e7f1f95f0 Binlog-in-engine: Handle single event writes larger than binlog size
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
685b0b0def Binlog-in-engine: Implement dynamically changing binlog max size
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
31ba7922a0 Binlog-in-engine: Implement savepoint support
Support for SAVEPOINT, ROLLBACK TO SAVEPOINT, rolling back a failed
statement (keeping active transaction), and rolling back transaction.

For savepoints (and start-of-statement), if the binlog data to be rolled
back is still in the in-memory part of trx cache we can just truncate the
cache to the point.

But if we need to spill cache contents as out-of-band data containing one or
more savepoint/start-of-statement points, then split the spill at each point
and inform the engine of the savepoints.

In InnoDB, at savepoint set, save the state of the forest of perfect binary
trees being built. Then at rollback, restore the appropriate state.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
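The two rollback paths described in commit 31ba7922a0 can be modelled roughly as follows. This is an illustrative Python sketch under stated assumptions, not the server's C++ code: `TrxCache`, `engine_log`, and the tuple formats are hypothetical names invented for the example. A savepoint that still lies in the in-memory part is handled by truncation, while a spill is split at every savepoint it contains so that the engine learns where each one falls.

```python
class TrxCache:
    """Toy transaction binlog cache; all names here are hypothetical."""

    def __init__(self):
        self.flushed = 0        # bytes already spilled as out-of-band data
        self.buf = bytearray()  # in-memory part of the cache
        self.savepoints = {}    # name -> absolute offset in the event group
        self.engine_log = []    # what the engine was told, in order

    def write(self, data):
        self.buf += data

    def savepoint(self, name):
        self.savepoints[name] = self.flushed + len(self.buf)

    def spill(self):
        """Spill the in-memory part, split at each savepoint inside it."""
        end = self.flushed + len(self.buf)
        marks = {off for off in self.savepoints.values()
                 if self.flushed <= off < end}
        start = self.flushed
        for cut in sorted(marks | {end}):
            if cut > start:
                chunk = self.buf[start - self.flushed:cut - self.flushed]
                self.engine_log.append(("oob", bytes(chunk)))
            if cut in marks:
                self.engine_log.append(("savepoint", cut))
            start = cut
        self.flushed, self.buf = end, bytearray()

    def rollback_to(self, name):
        off = self.savepoints[name]
        # Savepoints set after this one are discarded.
        self.savepoints = {n: o for n, o in self.savepoints.items() if o <= off}
        if off >= self.flushed:
            # Still in memory: just truncate the cache to the point.
            del self.buf[off - self.flushed:]
            return "truncated"
        # Already spilled: the engine has to restore its saved state.
        self.engine_log.append(("rollback_to", off))
        self.flushed = off
        self.buf.clear()
        return "engine"
```

In this model the engine only gets involved once data beyond the savepoint has left the in-memory cache, which matches the split between the two cases in the commit message.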
Kristian Nielsen
84da20e658 MDEV-34705: Binlog-in-engine: Protect against concurrent RESET MASTER and dump threads
This is actually an existing problem in the old binlog implementation, and
this patch is applicable to old binlog also. The problem is that RESET
MASTER can run concurrently with binlog dump threads / connected slaves.
This will remove the binlog from under the feet of the reader, which can
cause all sorts of strange behaviour.

This patch fixes the problem by disallowing RESET MASTER while dump
threads (or another RESET MASTER or SHOW BINARY LOGS) are running. An error is
thrown in this case; the user must stop slaves and/or kill dump threads to make
the RESET MASTER go through. A slave that connects in the middle of RESET
MASTER will wait for it to complete.

Fix a lot of test cases to kill any lingering dump threads before doing
RESET MASTER, mostly just by sourcing include/kill_binlog_dump_threads.inc.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
d26851a575 MDEV-34705: Binlog-in-engine: Crash-safe slave
This patch makes replication crash-safe with the new binlog implementation,
even when --innodb-flush-log-at-trx-commit=0|2. The point is to not send any
binlog events to the slave until they have become durable on master, thus
avoiding that a slave may replicate a transaction that is lost during master
recovery, diverging the slave from the master.

Keep track of which point in the binlog has been durably synced to disk
(meaning the corresponding LSN has been durably synced to disk in the InnoDB
redo log). Each write to the binlog inserts an entry with offset and
corresponding LSN in a FIFO. Dump threads will first read only up to the
durable point in the binlog. A dump thread will then check the LSN fifo, and
do an InnoDB redo log sync if anything is pending. Then the FIFO is emptied
of any LSNs that have now become durable, and the durable point in the
binlog is updated and reading the binlog can continue.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
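The durable-point bookkeeping described in commit d26851a575 can be sketched as below. This is a simplified Python model, not the actual implementation: `DurabilityTracker`, `on_binlog_write`, `advance`, and the `sync_redo_log` callback are hypothetical names, and the callback merely stands in for an InnoDB redo log sync by returning the LSN that is durable afterwards.

```python
from collections import deque


class DurabilityTracker:
    """Toy model of the (binlog offset, LSN) FIFO; names are hypothetical."""

    def __init__(self, sync_redo_log):
        self.fifo = deque()        # (end_offset, lsn) pairs in write order
        self.durable_offset = 0    # dump threads may read up to here
        self.durable_lsn = 0
        self._sync_redo_log = sync_redo_log  # returns the LSN now durable

    def on_binlog_write(self, end_offset, lsn):
        # Each binlog write records where it ends and which redo LSN covers it.
        self.fifo.append((end_offset, lsn))

    def advance(self):
        """Called by a dump thread that has read up to durable_offset."""
        if self.fifo:
            # Something is pending: sync the redo log before looking again.
            self.durable_lsn = max(self.durable_lsn, self._sync_redo_log())
        # Drop every entry whose LSN is now durable and move the point forward.
        while self.fifo and self.fifo[0][1] <= self.durable_lsn:
            self.durable_offset, _ = self.fifo.popleft()
        return self.durable_offset
```

A dump thread in this model never sends bytes past `durable_offset`, so a slave cannot receive a transaction that the master might lose during recovery.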
Kristian Nielsen
baec2064a1 MDEV-34705: Binlog-in-engine: Fix hang with event group of specific size
If the event group fitted in the binlog cache without the GTID event but not
with, the code would attempt to spill part of the GTID event as out-of-band
data, which is not correct. In release builds this would hang the server as
the spilling would try to lock an already owned mutex.

Fix by checking if the GTID event fits, and spilling any non-GTID data as
oob if it does not.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-03 14:44:51 +02:00
Kristian Nielsen
7a306564d7 MDEV-34705: Binlog-in-engine: mariadb-backup integration
InnoDB binlog files are now backed up along with other InnoDB data by
mariadb-backup.

The files are copied after backup locks have been released. Backup files
created later than the backup LSN are skipped. Then during --prepare, any
data missing from the hot-copied binlog files will be restored by the
binlog recovery code, and any excess data written after the backup LSN will
be zeroed out.

A couple of test cases take a consistent backup of a server with active
traffic during the backup, then provision a slave from the restored binlog
position and check that the slave can replicate from the original master
and get identical data.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-25 15:28:32 +02:00
Kristian Nielsen
f0d4b63bac MDEV-34705: Binlog-in-engine: Implement refcounting outstanding OOB records
Keep track of, for each binlog file, how many open transactions have
out-of-band data starting in that file. Then at the start of each new binlog
file, in the header page, record the file_no of the earliest file that this
file might contain commit records with references back to OOB records in
that earlier file.

Use this in PURGE BINARY LOGS, so that when a dump thread (slave connection)
is active in file number N, and that file (or a later one) may require
looking back in an earlier file number M for out-of-band records, purge will
stop already at file number M. This way, we avoid that purge accidentally
deletes some binlog file that a dump thread would later get an error on
because it needs to read out-of-band data.

This patch also includes placeholder data for a similar facility for XA
references. The actual implementation of support for XA is for later though.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-19 12:26:28 +02:00
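The refcounting scheme in commit f0d4b63bac can be illustrated with a small model. This is a hedged Python sketch, not the server code: `OobRefTracker`, `purge_limit`, and their fields are hypothetical names, and the header page is reduced to a single "earliest referenced file" number per binlog file.

```python
class OobRefTracker:
    """Toy model of per-file OOB refcounts; all names are hypothetical."""

    def __init__(self):
        self.open_refs = {}     # file_no -> open trx whose OOB data starts there
        self.earliest_ref = {}  # file_no -> earliest file it may refer back to

    def trx_start_oob(self, file_no):
        self.open_refs[file_no] = self.open_refs.get(file_no, 0) + 1

    def trx_end(self, file_no):
        self.open_refs[file_no] -= 1
        if self.open_refs[file_no] == 0:
            del self.open_refs[file_no]

    def start_new_file(self, file_no):
        # Recorded in the header page of the new file: the oldest file that
        # still has open transactions with out-of-band data, if any.
        oldest_open = min(self.open_refs, default=file_no)
        self.earliest_ref[file_no] = min(oldest_open, file_no)


def purge_limit(tracker, reader_file_no, last_file_no):
    """First file that must be kept for a dump thread in reader_file_no."""
    return min(tracker.earliest_ref.get(n, n)
               for n in range(reader_file_no, last_file_no + 1))
```

Purge in this model may delete files strictly below `purge_limit(...)`; taking the minimum over the reader's file and all later files reflects the "that file (or a later one)" wording of the commit message.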
Kristian Nielsen
d496e5278d MDEV-34705: Binlog-in-engine: Integration with server-layer code
Mostly various fixes to avoid initializing or creating any data or files for
the legacy binlog.

A possible later refinement could be to sub-class the binlog class
differently for legacy and in-engine binlogs, writing separate virtual
functions for behaviour that differs, extracting common functionality into
sub-methods. This could remove some if (opt_binlog_engine_hton)
conditionals.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-10 19:16:55 +02:00
Kristian Nielsen
da3e9edafb MDEV-34705: Binlog-in-engine: Fix race between reader and flush
A reader could latch a page that was currently being flushed to disk, while
the flushing thread is temporarily releasing the mutex. If the page was
complete with data when the flushing started, the flush thread would not
correctly wait for the reader to release the latch, and the page could be
freed while the reader was still using it.

Also adjust a couple of assertions to reflect the addition of the file header
page as page 0.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-07 08:47:46 +02:00
Kristian Nielsen
e1055af14f MDEV-34705: Binlog-in-engine: Implement file header page
Now the first page of each binlog tablespace file is reserved as a file
header, replacing the use of extra fields in the first gtid state record of
the file. The header is primarily used during recovery, especially to get
the file LSN before which no redo should be applied to the file.

Using a dedicated page makes it possible to durably sync the file header to
disk after RESET MASTER (and at first server startup) and not have it
overwritten (and potentially corrupted) later; this guarantees that the
recovery will have at least one file header to look at to determine from
which LSN to apply redo records.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
21751e21f1 MDEV-34705: Binlog-in-engine: Use separate 4k pagesize for binlog files
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
e4935b716a MDEV-34705: Binlog-in-engine: Use the whole page for binlog data
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
8b3b6770f4 MDEV-34705: Binlog-in-engine: Implement page checksum
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
4cdb059b8c MDEV-34705: Binlog-in-engine: Recovery testcase + few bugfixes
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
b3c6bbdbd3 MDEV-34705: Binlog-in-engine: First working recovery
Still needs more testing.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
1582a6d885 MDEV-34705: Binlog-in-engine: Recovery intermediate commit
Add test case binlog_in_engine.recover with a first very simple recovery
test.

The test currently fails during InnoDB recovery:

2025-03-02 11:35:44 0 [ERROR] InnoDB: Missing FILE_DELETE or FILE_MODIFY for [page id: space=4294967281, page number=0] at 62894; set innodb_force_recovery=1 to ignore the record.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
9e1fe70bfe MDEV-34705: Binlog-in-engine: Implement SHOW BINLOG EVENTS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
980a8e6c42 MDEV-34705: Binlog-in-engine: Implement legacy SHOW MASTER STATUS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
86fbbbe273 MDEV-34705: Binlog-in-engine: No use of InnoDB tablespace and bufferpool
In preparation for a simplified, lower-level recovery of binlog files
implemented in InnoDB, remove use of InnoDB tablespaces and buffer pool from
the binlog code. Instead, a custom binlog page fifo replaces the general
buffer pool for binlog pages, and tablespaces are replaced by simple file_no
references.

The new binlog page fifo is deliberately naively written in this commit for
simplicity, until the new recovery is complete and proven with tests; later
it can be improved for better efficiency and scalability. This first version
uses a simple global mutex, linear scans of linked lists, repeated
alloc/free of pages, and a simple background flush thread that uses
synchronous pwrite() one page after another. Error handling is also mostly
omitted in this first version.

The page header/footer is not changed in this commit, nor is the pagesize,
to be done in a later patch.

The call to mtr_t::write_binlog() is currently commented-out in function
fsp_log_binlog_write() as it asserts in numerous places. To be enabled when
those asserts are fixed. For the same reason, the code does not yet
implement binlog_write_up_to(lsn_t lsn), to be done once mtr_t operations
are working.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
68f37e6e58 MDEV-34705: Binlog-in-engine: Implement DELETE_DOMAIN_ID for FLUSH
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
0671add213 MDEV-34705: Binlog-in-engine: Implement PURGE BINARY LOGS
Still ToDo: restrict auto-purge so that it does not purge any binlog
file with out-of-band data that might still be needed by a connected slave.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
9e3ec748fd MDEV-34705: Binlog-in-engine: Buildbot fixes
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
dd8ffe952d MDEV-34705: Binlog-in-engine: Misc. small fixes to make normal test suite mostly pass
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
c67b014c9c MDEV-34705: Binlog-in-engine: Implement RESET MASTER
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6889c8e4cf MDEV-34705: Binlog-in-engine: Implement FLUSH BINARY LOGS
No DELETE_DOMAIN_ID supported yet, will come in a later commit, after PURGE
is implemented.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
947de2bfaf MDEV-34705: Binlog-in-engine: Implement SHOW BINARY LOGS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
f0fdaa9665 MDEV-34705: Binlog-in-engine: Configurable binlog directory
Add option --binlog-directory, used to place the binlogs outside the data
directory (eg. to put them on different disk/file system).

Disallow specifying the binlog name in --log-bin when
--binlog-storage-engine is used, as the name is then not user configurable.

A ToDo (not implemented in this commit) is to use the --binlog-directory
value, if given, also for the legacy binlog implementation.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6f6baf9655 MDEV-34705: Binlog-in-engine: Read side of out-of-band binlogging
With this commit, the out-of-band binlogging of large event groups in
multiple smaller records interleaved with other event groups is now working.

Instead of flushing the binlog cache to disk when it reaches
@@binlog_cache_size, the cache is now binlogged as an out-of-band
record. Then at transaction commit, a commit record is written containing
just the GTID and a link to the out-of-band data.

To facilitate append-only operation, the binlogged records do not have a
"next" pointer. Instead, they are written out as a forest of perfect binary
trees, the leftmost leaf of one tree pointing to the root of the previous
tree. This structure is used in the binlog reader to efficiently read out
the event group data consecutively for the binlog dump thread, needing to
maintain only O(log(N)) amount of memory during the reading.

As part of this commit, the existing binlog reader code is refactored to be
greatly improved, with a much cleaner explicit state machine and handling of
chunk/page/file boundaries etc.

Also fixes some bugs in the gtid_search::find_gtid_pos().

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
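The forest-of-perfect-binary-trees layout from commit 6f6baf9655 can be modelled in a few lines. This is an illustrative Python sketch and only one possible reading of the description: `Node`, `OobWriter`, and `read_in_order` are hypothetical names, a list index stands in for a file position, and the rule "merge the two newest trees when they have equal height" is an assumption about how the forest is built. Every pointer refers to an earlier record, so the structure stays append-only.

```python
class Node:
    def __init__(self, data, height, left=None, right=None, prev=None):
        self.data = data
        self.height = height  # 0 for a leaf
        self.left = left      # index of left child (inner nodes only)
        self.right = right    # index of right child (inner nodes only)
        self.prev = prev      # leaf only: root of the previous tree, if any


class OobWriter:
    """Append-only writer; the forest stack holds (root_index, height)."""

    def __init__(self):
        self.nodes = []
        self.forest = []

    def append(self, data):
        idx = len(self.nodes)
        if len(self.forest) >= 2 and self.forest[-1][1] == self.forest[-2][1]:
            # Two newest trees have equal height: new record becomes their parent.
            (left, h), (right, _) = self.forest[-2], self.forest[-1]
            del self.forest[-2:]
            self.nodes.append(Node(data, h + 1, left=left, right=right))
            self.forest.append((idx, h + 1))
        else:
            # Otherwise start a new one-node tree linked to the previous root.
            prev = self.forest[-1][0] if self.forest else None
            self.nodes.append(Node(data, 0, prev=prev))
            self.forest.append((idx, 0))
        return idx


def _postorder(nodes, i):
    node = nodes[i]
    if node.height > 0:
        yield from _postorder(nodes, node.left)
        yield from _postorder(nodes, node.right)
    yield node.data


def read_in_order(nodes, last_root):
    """Yield record data in write order, starting from the newest root."""
    roots = []
    root = last_root
    while root is not None:
        roots.append(root)
        leaf = root
        while nodes[leaf].height > 0:  # walk down to the leftmost leaf
            leaf = nodes[leaf].left
        root = nodes[leaf].prev
    for r in reversed(roots):          # oldest tree first
        yield from _postorder(nodes, r)
```

In this sketch the reader holds only the list of roots and one root-to-leaf path at a time, both of which grow logarithmically with the number of records, which is the property the commit message relies on.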
Kristian Nielsen
ce2269353f MDEV-34705: Binlog-in-engine: Working replication to slave
Only GTID slave connection is supported, at least for now.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
951a472437 MDEV-34705: Implement starting from a specific GTID position
To find the target position, we first loop backwards over binlog files,
reading the initial GTID state written at the start to find the file to
start in. We then binary search on the differential GTID states written
every --innodb-binlog-state-interval bytes.

This patch does only minimal changes to the dump thread code in sql_repl.cc
to be able to send out binlog data to the client. Some re-factoring/cleanup
should be done in a follow-up patch to more cleanly separate the two code
paths, avoid a lot of if-statements and make the binlog-in-engine code path
free of much of the cruft from the legacy binlog implementation.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
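The two-stage search in commit 951a472437 (backwards over files, then binary search within one file) can be sketched like this. It is a deliberately simplified Python model: a GTID state is reduced to a single increasing sequence number, whereas a real GTID state covers multiple domains, and `find_start` and the `(initial_state, checkpoints)` layout are hypothetical.

```python
def find_start(files, target_seq):
    """Return (file_no, offset) to start scanning from, or None.

    files[i] is (initial_state, checkpoints), where checkpoints is a list of
    (offset, state) pairs written at regular intervals, in increasing order.
    """
    # Stage 1: loop backwards over files until one starts at or before target.
    for file_no in range(len(files) - 1, -1, -1):
        initial_state, checkpoints = files[file_no]
        if initial_state <= target_seq:
            break
    else:
        return None  # target is older than the oldest available file

    # Stage 2: binary search for the last checkpoint with state <= target.
    lo, hi = 0, len(checkpoints)
    while lo < hi:
        mid = (lo + hi) // 2
        if checkpoints[mid][1] <= target_seq:
            lo = mid + 1
        else:
            hi = mid
    offset = checkpoints[lo - 1][0] if lo > 0 else 0
    return file_no, offset
```

From the returned offset a reader would still scan forward event by event to reach the exact position; the search only narrows down where that scan begins.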
Kristian Nielsen
586ed18fe9 MDEV-34705: Code to restore binlog GTID state at restart
To restore the binlog state, after finding the position in the old binlog to
continue from, read the full gtid state saved at the start of the binlog
file as well as the most recent differential gtid state written shortly
before the starting position. Then construct a binlog reader to read the
remaining few events (if any), and update with any GTIDs read to obtain the
final restored GTID binlog state.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
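The restore procedure in commit 586ed18fe9 combines three inputs. A minimal Python sketch follows, assuming that a GTID state maps a (domain_id, server_id) pair to its latest sequence number and that a differential state lists only the entries changed since the start of the file; both the representation and the name `restore_gtid_state` are assumptions made for illustration.

```python
def restore_gtid_state(full_state, latest_diff, tail_gtids):
    """Rebuild the binlog GTID state after a restart (toy model).

    full_state  -- state saved at the start of the binlog file
    latest_diff -- most recent differential state before the resume position
    tail_gtids  -- (domain_id, server_id, seq_no) of the remaining events
    """
    state = dict(full_state)   # start from the state at the top of the file
    state.update(latest_diff)  # apply the most recent differential state
    for domain_id, server_id, seq_no in tail_gtids:
        state[(domain_id, server_id)] = seq_no  # replay the last few events
    return state
```

Only the events after the last differential state need to be read in this model, which is why the commit message speaks of "the remaining few events".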
Kristian Nielsen
1794fd427a MDEV-34705: Binlog in Engine: Resume from existing binlogs
When (re-)starting the server, check for any existing binlog files.
Open the last two found (if any), and find the position that was last
written before the restart. Continue binlogging from that point rather
than creating new binlog files.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
75c334a9f8 MDEV-34705: Binlog in Engine
Initial code to read in the binlog dump thread events from InnoDB binlog.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
44bd9f84c7 MDEV-34705: Binlog in Engine: Start of binlog reader (untested, incomplete)
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
60436e40bd MDEV-34705: Binlog in Engine: Fix re-using ids for binlog tablespaces
Before creating the next binlog tablespace N+2, flush out and close the old
binlog tablespace N, so that the new tablespace can re-use the tablespace
id without conflict.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
d2d62133a8 MDEV-34705: Binlog in Engine: Allocate next binlog tablespace as needed.
Only works for two tablespace files though. For the third, we need to
implement closing the first one, so that the tablespace id can be reused.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
1db620338d MDEV-34705: Binlog in Engine: Early draft, first binlogging of DML to InnoDB tablespace
The option --innodb-in-engine now causes InnoDB DML commits to include
binlogging in the same mtr. Binlog group commit now skips binlogging to
old file-based binlog and passes events to InnoDB instead.

Many things unfinished still, like allocating new tablespaces when the first
one is filled, writing large event groups out-of-band to not bloat the
InnoDB commit record in the redo log and exceed max mtr size, writing DDL
and all other events to the InnoDB binlog, skipping the creation of the
old-style binlog, reading the new style binlog from InnoDB, etc. etc.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
64315911ba MDEV-34705: Binlog in Engine: Very first sketch, able to create and write an InnoDB tablespace
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Marko Mäkelä
3ae8f114e2 Merge 10.11 into 11.4 2025-04-02 10:15:08 +03:00
Marko Mäkelä
aaec841865 Merge 10.6 into 10.11 2025-04-02 09:33:20 +03:00
Marko Mäkelä
4c0e2f1aca MDEV-35813: even more robust test case
The test in commit 1756b0f37d
is occasionally failing if there are unexpectedly many page cleaner
batches that are updating the log checkpoint by small amounts.
This occurs in particular when running the server under Valgrind.

Let us insert the same number of records with a larger number of
statements in the hope that the test will then be more likely to pass.
2025-04-02 08:12:29 +03:00
Julius Goryavsky
74f0b99edf Merge branch '10.6' into '10.11' 2025-04-02 06:33:39 +02:00
Denis Protivensky
c01bff4a10 MDEV-36360: Don't grab table-level X locks for applied inserts
It prevents a crash in wsrep_report_error() which happened when appliers would run
with FK and UK checks disabled and erroneously execute plain inserts as bulk inserts.

Moreover, in release builds such a behavior could lead to deadlocks between two applier
threads if a thread waiting for a table-level lock was ordered before the lock holder.
In that case the lock holder would proceed to commit order and wait forever for the
now-blocked other applier thread to commit first.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-04-02 04:50:30 +02:00
Julius Goryavsky
b983a911e9 galera mtr tests: synchronization between branches and editions 2025-04-02 04:50:11 +02:00
Julius Goryavsky
5003dac220 MDEV-36116: Post-merge correction for 10.6+ 2025-04-02 04:49:32 +02:00
Julius Goryavsky
03c31ab099 Merge branch '10.5' into '10.6' 2025-04-02 04:43:24 +02:00
Jan Lindström
25737dbab7 MDEV-33850 : For Galera, create sequence with low cache got signal 6 error: [ERROR] WSREP: FSM: no such a transition REPLICATING -> COMMITTED
The problem was that the transaction was BF-aborted after certification
succeeded; the transaction then tried to roll back, and during
rollback the binlog stmt cache containing sequence value reservations
was written into the binlog.

Transaction must replay because certification succeeded but
transaction must not be written into binlog yet, it will
be done during commit after the replay.

Fix is to skip binlog write if transaction must replay and
in replay we need to reset binlog stmt cache.

Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>
2025-04-02 04:29:40 +02:00