1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-05 13:16:09 +03:00
Commit Graph

2241 Commits

Author SHA1 Message Date
Kristian Nielsen
585785c7bc Binlog-in-engine: Handle mixing transactional and non-transactional tables
When updating non-transactional tables inside a multi-statement transaction,
and binlog_direct_non_transactional_updates=1, then the non-transactional
updates are binlogged directly through the statement cache while the
transaction cache is still being added to in the main transaction.

Thus, move the engine_binlog_info out from binlog_cache_mngr and into the
individual stmt/trx binlog_cache_data, so that we can have separate
engine_binlog_info active for the statement and the transaction cache.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
685b0b0def Binlog-in-engine: Implement dynamically changing binlog max size
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
31ba7922a0 Binlog-in-engine: Implement savepoint support
Support for SAVEPOINT, ROLLBACK TO SAVEPOINT, rolling back a failed
statement (keeping active transaction), and rolling back transaction.

For savepoints (and start-of-statement), if the binlog data to be rolled
back is still in the in-memory part of trx cache we can just truncate the
cache to the point.

But if we need to spill cache contents as out-of-band data containing one or
more savepoints/start-of-statement point, then split the spill at each point
and inform the engine of the savepoints.

In InnoDB, at savepoint set, save the state of the forest of perfect binary
trees being built. Then at rollback, restore the appropriate state.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
95ea6e15a6 MDEV-34705: Binlog-in-engine: Binlog reader to read whole page at a time
Instead of returning only one chunk at a time, make
ha_innodb_binlog_reader::read_data() try to read all chunks on the page.
This reduces the number of times each reader has to latch pages in the page
fifo, which contends for a global mutex also shared with the writer.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
d26851a575 MDEV-34705: Binlog-in-engine: Crash-safe slave
This patch makes replication crash-safe with the new binlog implementation,
even when --innodb-flush-log-at-trx-commit=0|2. The point is to not send any
binlog events to the slave until they have become durable on master, thus
avoiding that a slave may replicate a transaction that is lost during master
recovery, diverging the slave from the master.

Keep track of which point in the binlog has been durably synced to disk
(meaning the corresponding LSN has been durably synced to disk in the InnoDB
redo log). Each write to the binlog inserts an entry with offset and
corresponding LSN in a FIFO. Dump threads will first read only up to the
durable point in the binlog. A dump thread will then check the LSN fifo, and
do an InnoDB redo log sync if anything is pending. Then the FIFO is emptied
of any LSNs that have now become durable, and the durable point in the
binlog is updated and reading the binlog can continue.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-07-23 16:19:50 +02:00
Kristian Nielsen
f0d4b63bac MDEV-34705: Binlog-in-engine: Implement refcounting outstanding OOB records
Keep track of, for each binlog file, how many open transactions have
out-of-band data starting in that file. Then at the start of each new binlog
file, in the header page, record the file_no of the earliest file that this
file might contain commit records with references back to OOB records in
that earlier file.

Use this in PURGE BINARY LOGS, so that when a dump thread (slave connection)
is active in file number N, and that file (or a later one) may require
looking back in an earlier file number M for out-of-band records, purge will
stop already at file number M. This way, we avoid that purge accidentally
deletes some binlog file that a dump thread would later get an error on
because it needs to read out-of-band data.

This patch also includes placeholder data for a similar facility for XA
references. The actual implementation of support for XA is for later though.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-19 12:26:28 +02:00
Kristian Nielsen
d496e5278d MDEV-34705: Binlog-in-engine: Integration with server-layer code
Mostly various fixes to avoid initializing or creating any data or files for
the legacy binlog.

A possible later refinement could be to sub-class the binlog class
differently for legacy and in-engine binlogs, writing separate virtual
functions for behaviour that differ, extracting common functionality into
sub-methods. This could remove some if (opt_binlog_engine_hton)
conditionals.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-10 19:16:55 +02:00
Kristian Nielsen
9651561c11 MDEV-34705: Binlog-in-engine: Work-around compiler warning
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-07 11:49:55 +02:00
Kristian Nielsen
9e1fe70bfe MDEV-34705: Binlog-in-engine: Implement SHOW BINLOG EVENTS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
980a8e6c42 MDEV-34705: Binlog-in-engine: Implement legacy SHOW MASTER STATUS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:51 +02:00
Kristian Nielsen
68f37e6e58 MDEV-34705: Binlog-in-engine: Implement DELETE_DOMAIN_ID for FLUSH
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
0671add213 MDEV-34705: Binlog-in-engine: Implement PURGE BINARY LOGS
Still ToDo: is to restrict auto-purge so that it does not purge any binlog
file with out-of-band data that might still be needed by a connected slave.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
c67b014c9c MDEV-34705: Binlog-in-engine: Implement RESET MASTER
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6889c8e4cf MDEV-34705: Binlog-in-engine: Implement FLUSH BINARY LOGS
No DELETE_DOMAIN_ID supported yet, will come in a later commit, after PURGE
is implemented.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
947de2bfaf MDEV-34705: Binlog-in-engine: Implement SHOW BINARY LOGS
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
f0fdaa9665 MDEV-34705: Binlog-in-engine: Configurable binlog directory
Add option --binlog-directory, used to place the binlogs outside the data
directory (eg. to put them on different disk/file system).

Disallow specifying the binlog name in --log-bin when
--binlog-storage-engine is used, as the name is then not user configurable.

A ToDo (not implemented in this commit) is to use the --binlog-directory
value, if given, also for the legacy binlog implementation.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
6f6baf9655 MDEV-34705: Binlog-in-engine: Read side of out-of-band binlogging
With this commit, the out-of-band binlogging of large event groups in
multiple smaller records interleaved with other event groups is now working.

Instead of flushing the binlog cache to disk when they reach
@@binlog_cache_size, instead the cache is binlogged as an out-of-band
record. Then at transaction commit, a commit record is written containing
just the GTID and a link to the out-of-band data.

To facilitate append-only operation, the binlogged records do not have a
"next" pointer. Instead, they are written out as a forest of perfect binary
trees, the leftmost leaf of one tree pointing to the root of the previous
tree. This structure is used in the binlog reader to efficiently read out
the event group data consecutively for the binlog dump thread, needing to
maintain only O(log(N)) amount of memory during the reading.

As part of this commit, the existing binlog reader code is refactored to be
greatly improved, with a much cleaner explicit state machine and handling of
chunk/page/file boundaries etc.

Also fixes some bugs in the gtid_search::find_gtid_pos().

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
9230e75249 MDEV-34705: Binlog-in-engine: Small visibility tweak in handler_binlog_reader
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:50 +02:00
Kristian Nielsen
07232f1e45 MDEV-34705: out-of band binlogging, fix trx_cache handling for out-of-band
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:01:48 +02:00
Kristian Nielsen
c80d87f8c5 MDEV-34705: out-of band binlogging, partial untested commit to do a separate refactoring of end_event
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
ce2269353f MDEV-34705: Binlog-in-engine: Working replication to slave
Only GTID slave connection is supported, at least for now.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
951a472437 MDEV-34705: Inplement starting from a specific GTID position
To find the target position, we first loop backwards over binlog files,
reading the initial GTID state written at the start to find the file to
start in. We then binary search on the differential GTID states written
every --innodb-binlog-state-interval bytes.

This patch does only minimal changes to the dump thread code in sql_repl.cc
to be able to send out binlog data to the client. Some re-factoring/cleanup
should be done in a follow-up patch to more cleanly separate the two code
paths, avoid a lot of if-statements and make the binlog-in-engine code path
free of much of the cruft from the legacy binlog implementation.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
18b9ec637e MDEV-34705: Binlog in Engine: Searchability for GTID position
Every N bytes (hardcoded at 64k for now, to become a configurable setting),
write the binlog GTID state into the binlog tablespace. This allows to
quickly find a given GTID position by binary search to the prior GTID state
in the tablespace and then a small linear scan from that point.

The full binlog state is dumped at the start of the binlog file; remaining
states dumped are differential states containing only the changed
(domain_id, server_id) pairs, to save space if binlog space is large.

This commit only implements the writing of the binlog state to the
tablespace at regular intervals. The binary search to be implemented in a
subsequent commit.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
13d6249f19 MDEV-34705: Binlog in Engine: Pre-allocate binlog tablespaces in background thread
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
094c772213 MDEV-34705: Binlog in Engine: Also binlog standalone (eg. DDL) in the engine
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:17 +02:00
Kristian Nielsen
75c334a9f8 MDEV-34705: Binlog in Engine
Initial code to read in the binlog dump thread events from InnoDB binlog.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Kristian Nielsen
44bd9f84c7 MDEV-34705: Binlog in Engine: Start of binlog reader (untested, incomplete)
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2025-04-06 10:00:16 +02:00
Sergei Golubchik
7d657fda64 Merge branch '10.11 into 11.4 2025-01-30 12:01:11 +01:00
Sergei Golubchik
e69f8cae1a Merge branch '10.6' into 10.11 2025-01-30 11:55:13 +01:00
Marko Mäkelä
98dbe3bfaf Merge 10.5 into 10.6 2025-01-20 09:57:37 +02:00
Aleksey Midenkov
e1e1e50bba MDEV-35343 DML debug logging
Usage:

mtr --mysqld=--debug=d,dml,query:i:o,/tmp/dml.log

Example output:

T@6    : dispatch_command: query: insert into t1 values ('a')
T@6    : handler::ha_write_row: exit: INSERT: t1(a) = 0
T@6    : dispatch_command: query: alter ignore table t1 add unique index (data)
T@6    : handler::ha_write_row: exit: INSERT: t1(a) = 0
T@6    : dispatch_command: query: alter ignore table t1 add unique index (data)
T@6    : handler::ha_write_row: exit: INSERT: t1(a) = 0

T@6    : dispatch_command: query: replace into t1 values ('b'), ('c'), ('a'), ('b')
T@6    : handler::ha_write_row: exit: INSERT: t1(b) = 0
T@6    : handler::ha_write_row: exit: INSERT: t1(c) = 0
T@6    : handler::ha_write_row: exit: INSERT: t1(a) = 121
T@6    : write_record: exit: DELETE: t1(a) = 0
T@6    : handler::ha_write_row: exit: INSERT: t1(a) = 0
T@6    : handler::ha_write_row: exit: INSERT: t1(b) = 121
T@6    : write_record: exit: DELETE: t1(b) = 0
T@6    : handler::ha_write_row: exit: INSERT: t1(b) = 0
2025-01-14 18:56:13 +03:00
Kristian Nielsen
0f47db8525 Merge 10.11 -> 11.4
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 11:01:42 +01:00
Kristian Nielsen
e7c6cdd842 Merge 10.6 -> 10.11
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
2024-12-05 10:11:58 +01:00
Julius Goryavsky
cefdc3e67d Merge branch '10.5' into '10.6' 2024-12-03 13:08:12 +01:00
Aleksey Midenkov
55b5993205 Cleanup: make_keypart_map inline
for easier debugging.
2024-12-03 13:49:42 +03:00
Marko Mäkelä
2719cc4925 Merge 10.11 into 11.4 2024-12-02 11:35:34 +02:00
Marko Mäkelä
3d23adb766 Merge 10.6 into 10.11 2024-11-29 13:43:17 +02:00
Marko Mäkelä
7d4077cc11 Merge 10.5 into 10.6 2024-11-29 12:37:46 +02:00
Brandon Nesterenko
7a8eb26bda MDEV-34348: Fix casting related to plugins
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict

Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
2024-11-23 08:14:23 -07:00
Monty
93fb364cd9 Removed not used ha_drop_table()
This was done after changing call in sql_select.cc from
ha_drop_table() to drop_table(), like in 11.5
2024-11-20 09:59:43 +02:00
Oleksandr Byelkin
c770bce898 Merge branch '11.2' into 11.4 2024-10-30 15:11:17 +01:00
Oleksandr Byelkin
69d033d165 Merge branch '10.11' into 11.2 2024-10-29 16:42:46 +01:00
Oleksandr Byelkin
3d0fb15028 Merge branch '10.6' into 10.11 2024-10-29 15:24:38 +01:00
Oleksandr Byelkin
f00711bba2 Merge branch '10.5' into 10.6 2024-10-29 14:20:03 +01:00
Oleg Smirnov
6bd1cb0ea0 MDEV-34880 Incorrect result for query with derived table having TEXT field
When a derived table which has distinct values and BLOB fields is
materialized, an index is created over all columns to ensure only
unique values are placed to the result.
This index is created in a special mode HA_UNIQUE_HASH to support BLOBs.
Later the optimizer may incorrectly choose this index to retrieve values
from the derived table, although such type of index cannot be used
for data retrieval.

This commit excludes HA_UNIQUE_HASH indexes from adding to
`JOIN::keyuse` array thus preventing their subsequent usage for
data retrieval
2024-10-23 17:55:00 +07:00
Monty
0de2613e7a Fixed that SHOW CREATE TABLE for sequences shows used table options 2024-10-16 17:24:46 +03:00
Oleksandr Byelkin
1d0e94c55f Merge branch '10.5' into 10.6 2024-10-09 08:38:48 +02:00
Aleksey Midenkov
d37bb140b1 MDEV-31297 Create table as select on system versioned tables do not
work consistently on replication

Row-based replication does not execute CREATE .. SELECT but instead
CREATE TABLE. CREATE .. SELECT creates implict system fields on
unusual place: in-between declared fields and select fields. That was
done because select_field_pos logic requires select fields go last in
create_list.

So, CREATE .. SELECT on master and CREATE TABLE on slave create system
fields on different positions and replication gets field mismatch.

To fix this we've changed CREATE .. SELECT to create implicit system
fields on usual place in the end and updated select_field_pos for
handling this case.
2024-10-08 13:08:10 +03:00
Marko Mäkelä
b53b81e937 Merge 11.2 into 11.4 2024-10-03 14:32:14 +03:00
Monty
6f6c1911dc MDEV-34251 Conditional jump or move depends on uninitialised value in ha_handler_stats::has_stats
Fixed by checking handler_stats if it's active instead of
thd->variables.log_slow_verbosity & LOG_SLOW_VERBOSITY_ENGINE.

Reviewed-by: Sergei Petrunia <sergey@mariadb.com>
2024-10-03 13:45:26 +03:00