mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

Author	SHA1	Message	Date
Kristian Nielsen	585785c7bc	Binlog-in-engine: Handle mixing transactional and non-transactional tables When updating non-transactional tables inside a multi-statement transaction, and binlog_direct_non_transactional_updates=1, then the non-transactional updates are binlogged directly through the statement cache while the transaction cache is still being added to in the main transaction. Thus, move the engine_binlog_info out from binlog_cache_mngr and into the individual stmt/trx binlog_cache_data, so that we can have separate engine_binlog_info active for the statement and the transaction cache. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	7a67f72979	Binlog-in-engine: Also binlog non-innodb event groups Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	97e9106e5a	Binlog-in-engine: Make --binlog-storage-engine available as read-only system variable Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	6e7f1f95f0	Binlog-in-engine: Handle single event writes larger than binlog size Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	685b0b0def	Binlog-in-engine: Implement dynamically changing binlog max size Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	31ba7922a0	Binlog-in-engine: Implement savepoint support Support for SAVEPOINT, ROLLBACK TO SAVEPOINT, rolling back a failed statement (keeping active transaction), and rolling back transaction. For savepoints (and start-of-statement), if the binlog data to be rolled back is still in the in-memory part of trx cache we can just truncate the cache to the point. But if we need to spill cache contents as out-of-band data containing one or more savepoints/start-of-statement point, then split the spill at each point and inform the engine of the savepoints. In InnoDB, at savepoint set, save the state of the forest of perfect binary trees being built. Then at rollback, restore the appropriate state. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	84da20e658	MDEV-34705: Binlog-in-engine: Protect against concurrent RESET MASTER and dump threads This is actually an existing problem in the old binlog implementation, and this patch is applicable to old binlog also. The problem is that RESET MASTER can run concurrently with binlog dump threads / connected slaves. This will remove the binlog from under the feet of the reader, which can cause all sorts of strange behaviour. This patch fixes the problem by disallowing to run RESET MASTER when dump threads (or other RESET MASTER or SHOW BINARY LOGS) are running. An error is thrown in this case, user must stop slaves and/or kill dump threads to make the RESET MASTER go through. A slave that connects in the middle of RESET MASTER will wait for it to complete. Fix a lot of test cases to kill any lingering dump threads before doing RESET MASTER, mostly just by sourcing include/kill_binlog_dump_threads.inc. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	d26851a575	MDEV-34705: Binlog-in-engine: Crash-safe slave This patch makes replication crash-safe with the new binlog implementation, even when --innodb-flush-log-at-trx-commit=0|2. The point is to not send any binlog events to the slave until they have become durable on master, thus avoiding that a slave may replicate a transaction that is lost during master recovery, diverging the slave from the master. Keep track of which point in the binlog has been durably synced to disk (meaning the corresponding LSN has been durably synced to disk in the InnoDB redo log). Each write to the binlog inserts an entry with offset and corresponding LSN in a FIFO. Dump threads will first read only up to the durable point in the binlog. A dump thread will then check the LSN fifo, and do an InnoDB redo log sync if anything is pending. Then the FIFO is emptied of any LSNs that have now become durable, and the durable point in the binlog is updated and reading the binlog can continue. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-23 16:19:50 +02:00
Kristian Nielsen	baec2064a1	MDEV-34705: Binlog-in-engine: Fix hang with event group of specific size If the event group fitted in the binlog cache without the GTID event but not with, the code would attempt to spill part of the GTID event as out-of-band data, which is not correct. In release builds this would hang the server as the spilling would try to lock an already owned mutex. Fix by checking if the GTID event fits, and spilling any non-GTID data as oob if it does not. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-07-03 14:44:51 +02:00
Kristian Nielsen	1b8ce5d554	MDEV-34705: Binlog-in-engine: Few bug fixes Fix that spilling of out-of-band data to the binlog could happen concurrently with binlog group commit, by holding LOCK_commit_ordered over all binlog writes now. Fix silly use-after-free bug where data was accessed in the old buffer after realloc(). Improve the wording of the error when specifying an argument for --log-bin. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-05-20 11:13:56 +02:00
Kristian Nielsen	9e13086ab8	MDEV-34705: Binlog-in-engine: Fix leftover fsync of legacy binlog Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-05-15 14:30:46 +02:00
Kristian Nielsen	f0d4b63bac	MDEV-34705: Binlog-in-engine: Implement refcounting outstanding OOB records Keep track of, for each binlog file, how many open transactions have out-of-band data starting in that file. Then at the start of each new binlog file, in the header page, record the file_no of the earliest file that this file might contain commit records with references back to OOB records in that earlier file. Use this in PURGE BINARY LOGS, so that when a dump thread (slave connection) is active in file number N, and that file (or a later one) may require looking back in an earlier file number M for out-of-band records, purge will stop already at file number M. This way, we avoid that purge accidentally deletes some binlog file that a dump thread would later get an error on because it needs to read out-of-band data. This patch also includes placeholder data for a similar facility for XA references. The actual implementation of support for XA is for later though. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-19 12:26:28 +02:00
Kristian Nielsen	d496e5278d	MDEV-34705: Binlog-in-engine: Integration with server-layer code Mostly various fixes to avoid initializing or creating any data or files for the legacy binlog. A possible later refinement could be to sub-class the binlog class differently for legacy and in-engine binlogs, writing separate virtual functions for behaviour that differ, extracting common functionality into sub-methods. This could remove some if (opt_binlog_engine_hton) conditionals. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-10 19:16:55 +02:00
Kristian Nielsen	b3c6bbdbd3	MDEV-34705: Binlog-in-engine: First working recovery Still needs more testing. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:51 +02:00
Kristian Nielsen	68f37e6e58	MDEV-34705: Binlog-in-engine: Implement DELETE_DOMAIN_ID for FLUSH Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	0671add213	MDEV-34705: Binlog-in-engine: Implement PURGE BINARY LOGS Still ToDo: is to restrict auto-purge so that it does not purge any binlog file with out-of-band data that might still be needed by a connected slave. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	d4b37fcc85	MDEV-34705: Binlog-in-engine: Handful of fixes Fix missing WORDS_BIGENDIAN define in ut0compr_int.cc. Fix misaligned read buffer for O_DIRECT. Fix wrong/missing update_binlog_end_pos() in binlog group commit. Fix race where active_binlog_file_no incremented too early. Fix wrong assertion when reader reaches the very start of (active+1). Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	dd8ffe952d	MDEV-34705: Binlog-in-engine: Misc. small fixes to make normal test suite mostly pass Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	c67b014c9c	MDEV-34705: Binlog-in-engine: Implement RESET MASTER Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	6889c8e4cf	MDEV-34705: Binlog-in-engine: Implement FLUSH BINARY LOGS No DELETE_DOMAIN_ID supported yet, will come in a later commit, after PURGE is implemented. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	6f6baf9655	MDEV-34705: Binlog-in-engine: Read side of out-of-band binlogging With this commit, the out-of-band binlogging of large event groups in multiple smaller records interleaved with other event groups is now working. Instead of flushing the binlog cache to disk when they reach @@binlog_cache_size, instead the cache is binlogged as an out-of-band record. Then at transaction commit, a commit record is written containing just the GTID and a link to the out-of-band data. To facilitate append-only operation, the binlogged records do not have a "next" pointer. Instead, they are written out as a forest of perfect binary trees, the leftmost leaf of one tree pointing to the root of the previous tree. This structure is used in the binlog reader to efficiently read out the event group data consecutively for the binlog dump thread, needing to maintain only O(log(N)) amount of memory during the reading. As part of this commit, the existing binlog reader code is refactored to be greatly improved, with a much cleaner explicit state machine and handling of chunk/page/file boundaries etc. Also fixes some bugs in the gtid_search::find_gtid_pos(). Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:50 +02:00
Kristian Nielsen	07232f1e45	MDEV-34705: out-of band binlogging, fix trx_cache handling for out-of-band Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:01:48 +02:00
Kristian Nielsen	c80d87f8c5	MDEV-34705: out-of band binlogging, partial untested commit to do a separate refactoring of end_event Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:17 +02:00
Kristian Nielsen	ce2269353f	MDEV-34705: Binlog-in-engine: Working replication to slave Only GTID slave connection is supported, at least for now. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:17 +02:00
Kristian Nielsen	586ed18fe9	MDEV-34705: Code to restore binlog GTID state at restart To restore the binlog state, after finding the position in the old binlog to continue from, read the full gtid state saved at the start of the binlog file as well as the most recent differential gtid state written shortly before the starting position. Then construct a binlog reader to read the remaining few events (if any), and update with any GTIDs read to obtain the final restored GTID binlog state. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:17 +02:00
Kristian Nielsen	18b9ec637e	MDEV-34705: Binlog in Engine: Searchability for GTID position Every N bytes (hardcoded at 64k for now, to become a configurable setting), write the binlog GTID state into the binlog tablespace. This allows to quickly find a given GTID position by binary search to the prior GTID state in the tablespace and then a small linear scan from that point. The full binlog state is dumped at the start of the binlog file; remaining states dumped are differential states containing only the changed (domain_id, server_id) pairs, to save space if binlog space is large. This commit only implements the writing of the binlog state to the tablespace at regular intervals. The binary search to be implemented in a subsequent commit. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:17 +02:00
Kristian Nielsen	094c772213	MDEV-34705: Binlog in Engine: Also binlog standalone (eg. DDL) in the engine Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:17 +02:00
Kristian Nielsen	219f643ba0	MDEV-34705: Binlog in Engine: Change option to --binlog-storage-engine to get a hton available Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:16 +02:00
Kristian Nielsen	1db620338d	MDEV-34705: Binlog in Engine: Early draft, first binlogging of DML to InnoDB tablespace The option --innodb-in-engine now causes InnoDB DML commits to include binlogging in the same mtr. Binlog group commit now skips binlogging to old file-based binlog and passes events to InnoDB instead. Many things unfinished still, like allocating new tablespaces when the first one is filled, writing large event groups out-of-band to not bloat the InnoDB commit record in the redo log and exceed max mtr size, writing DDL and all other events to the InnoDB binlog, skipping the creation of the old-style binlog, reading the new style binlog from InnoDB, etc. etc. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-04-06 10:00:16 +02:00
Marko Mäkelä	3ae8f114e2	Merge 10.11 into 11.4	2025-04-02 10:15:08 +03:00
Julius Goryavsky	74f0b99edf	Merge branch '10.6' into '10.11'	2025-04-02 06:33:39 +02:00
Julius Goryavsky	03c31ab099	Merge branch '10.5' into '10.6'	2025-04-02 04:43:24 +02:00
Jan Lindström	25737dbab7	MDEV-33850 : For Galera, create sequence with low cache got signal 6 error: [ERROR] WSREP: FSM: no such a transition REPLICATING -> COMMITTED Problem was that transaction was BF-aborted after certification succeeded and transaction tried to rollback and during rollback binlog stmt cache containing sequence value reservations was written into binlog. Transaction must replay because certification succeeded but transaction must not be written into binlog yet, it will be done during commit after the replay. Fix is to skip binlog write if transaction must replay and in replay we need to reset binlog stmt cache. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-04-02 04:29:40 +02:00
Daniele Sciascia	d698b784c8	MDEV-35658 Assertion `commit_trx' failed in test galera_as_master The test issues a simple INSERT statement, while sql_log_bin = 0. This option disables writes to binlog. However, since MDEV-7205, the option does not affect Galera, so changes are still replicated. So sql_log_bin=off, "partially" disabled the binlog and the INSERT will involve both binlog and innodb, thus requiring internal 2 phase commit (2PC). In 2PC INSERT is first prepared, which will make it transition to PREPARED state in innodb, and later committed which causes the new assertion from MDEV-24035 to fail. Running the same test with sql_log_bin enabled also results in 2PC, but the execution has one more step for ordered commit, between prepare and commit. Ordered commit causes the transaction state to transition back to TRX_STATE_NOT_STARTED. Thus avoiding the assertion. This patch makes sure that when sql_log_bin=off, the ordered commit step is not skipped, thus going through the expected state transitions in the storage engine. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-04-02 04:29:40 +02:00
Sergei Golubchik	7d657fda64	Merge branch '10.11' into 11.4	2025-01-30 12:01:11 +01:00
Sergei Golubchik	e69f8cae1a	Merge branch '10.6' into 10.11	2025-01-30 11:55:13 +01:00
Sergei Golubchik	066e8d6aea	Merge branch '10.5' into 10.6	2025-01-29 11:17:38 +01:00
Oleksandr Byelkin	47f87c5f88	MDEV-20281 "[ERROR] Failed to write to mysql.slow_log:" without error reason Add "backup" (in case of absence issued by error) reasons for failed logging.	2025-01-25 20:37:51 +01:00
Kristian Nielsen	72e1cc8f52	MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's IO_CACHE to signal an error back to the caller. When reading the active relay log, this flag is also being used by the IO thread, and setting it can randomly cause the IO thread to wrongly detect IO error on writing and permanently disable the relay log. This was seen sporadically in test case rpl.rpl_from_mysql80. The read error set by the SQL thread in the IO_CACHE would be interpreted as a write error by the IO thread, which would cause it to throw a fatal error and close the relay log. And this would later cause CHANGE MASTER to try to purge a closed relay log, resulting in nullptr crash. SQL thread is not able to parse an event read from the relay log. This can happen like here when replicating unknown events from a MySQL master, potentially also for other reasons. Also fix a mistake in my_b_flush_io_cache() introduced back in 2001 (`fa09f2cd7e`) where my_b_flush_io_cache() could wrongly return an error set in IO_CACHE::error, even if the flush operation itself succeeded. Also fix another sporadic failure in rpl.rpl_from_mysql80 where the output of MASTER_POS_WAIT() depended on timing of SQL and IO thread. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-01-24 09:15:20 +00:00
Sergei Golubchik	f1a7693bc0	Merge branch '10.11' into 11.4	2025-01-14 23:45:41 +01:00
Sergei Golubchik	221aa5e08f	Merge branch '10.6' into 10.11	2025-01-10 13:14:42 +01:00
Marko Mäkelä	17f01186f5	Merge 10.11 into 11.4	2025-01-09 07:58:08 +02:00
Kristian Nielsen	39f93b6eab	MDEV-29744: Fix incorrect locking order of LOCK_log/LOCK_commit_ordered and LOCK_global_system_variables The LOCK_global_system_variables must not be held when taking mutexes such as LOCK_commit_ordered and LOCK_log, as this causes inconsistent mutex locking order that can theoretically cause the server to deadlock. To avoid this, temporarily release LOCK_global_system_variables in two system variable update functions, like it is done in many other places. Enforce the correct locking order at server startup, to more easily catch (in debug builds) any remaining wrong orders that may be hidden elsewhere in the code. Note that when this is merged to 11.4, similar unlock/lock of LOCK_global_system_variables must be added in update_binlog_space_limit() as is done in binlog_checksum_update() and fix_max_binlog_size(), as this is a new function added in 11.4 that also needs the same fix. Tests will fail with wrong mutex order until this is done. Reviewed-by: Sergei Golubchik <serg@mariadb.org> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-01-08 17:52:34 +01:00
Marko Mäkelä	a54d151fc1	Merge 10.6 into 10.11	2024-12-19 15:38:53 +02:00
Julius Goryavsky	155203c352	Merge branch '10.5' into '10.6'	2024-12-13 01:45:35 +01:00
Alexander Barkov	ab9182470d	MDEV-31366 Assertion `thd->start_time' failed in bool LOGGER::slow_log_print(THD, const char, size_t, ulonglong) Fixing a wrong DBUG_ASSERT. thd->start_time and thd->start_time_sec_part cannot be 0 at the same time. But thd->start_time can be 0 when thd->start_time_sec_part is not 0, e.g. after: SET timestamp=0.99;	2024-12-12 20:32:56 +01:00
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
ParadoxV5	d5f16d6305	Extract some of #3360 fixes to 10.6.x That PR uncovered countless issues on `my_snprintf` uses. This commit backports a squashed subset of their fixes (excludes #3485).	2024-11-18 13:29:04 +11:00
Brandon Nesterenko	b07258a0d5	MDEV-35109: Semi-sync Replication stalling Primary using wait point=AFTER_SYNC For a primary configured with wait_point=AFTER_SYNC, if two threads T1 (binlogging through MYSQL_BIN_LOG::write()) and T2 were binlogging at the same time, T1 could accidentally wait for its semi-sync ACK using the binlog coordinates of T2. Prior to MDEV-33551, this only resulted in delayed transactions, because all transactions shared the same condition variable for ACK signaling. However, with the MDEV-33551 changes, each thread has its own condition variable to signal. So T1 could wait indefinitely when either: 1) T1's ACK is received but not T2's when T1 goes into wait_after_sync(), because the ACK receiver thread has already notified about the T1 ACK, but T1 was _actually_ waiting on T2's ACK, and therefore tries to wait (in vain). 2) T1 goes to wait_after_sync() before any ACKs have arrived. When T1's ACK comes in, T1 is woken up; however, sees it needs to wait more (because it was actually waiting on T2's ACK), and goes to wait again (this time, in vain). Note that the actual cause of T1 waiting on T2's binlog coordinates is when MYSQL_BIN_LOG::write() would call Repl_semisync_master::wait_after_sync(), the binlog offset parameter was read as the end of MYSQL_BIN_LOG::log_file, which is shared among transactions. So if T2 had updated the binary log _after_ T1 had released LOCK_log, but not yet invoked wait_after_sync(), it would use the end of the binary log file as the binlog offset, which was that of T2 (or any future transaction). The fix in this patch ensures consistency between the binary log coordinates a transaction uses between report_binlog_update() and wait_after_sync(). Reviewed By ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2024-11-04 10:45:58 -07:00
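
Commit d26851a575 in the list above describes a FIFO of (binlog offset, LSN) pairs that keeps dump threads from reading past the durable point of the binlog. The following is a minimal Python sketch of that bookkeeping only, not the server's C++ code. The names `DurableTracker`, `record_write`, `advance`, `read_limit` and the `sync_redo_log` callback are hypothetical and chosen for illustration; the commit message does not name the actual functions.

```python
from collections import deque


class DurableTracker:
    """Illustrative model of the offset/LSN FIFO (all names are hypothetical)."""

    def __init__(self):
        self.pending = deque()   # FIFO of (binlog_end_offset, lsn), oldest first
        self.durable_offset = 0  # dump threads may read up to this offset

    def record_write(self, end_offset, lsn):
        # Each binlog write queues the offset it ends at and the LSN covering it.
        self.pending.append((end_offset, lsn))

    def advance(self, durable_lsn):
        # Drop every entry whose LSN is now durable and move the durable
        # point forward to the offset of the last entry dropped.
        while self.pending and self.pending[0][1] <= durable_lsn:
            self.durable_offset, _ = self.pending.popleft()
        return self.durable_offset

    def read_limit(self, sync_redo_log):
        # Called by a reader that has caught up to the durable point.
        # sync_redo_log stands in for a redo log sync and is assumed to
        # return the LSN that is durable once the sync completes.
        if self.pending:
            self.advance(sync_redo_log())
        return self.durable_offset
```

In this model a reader never sees an offset whose LSN has not been reported durable, which is the property the commit relies on to keep a slave from replicating a transaction that the master could lose in recovery.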


2961 Commits
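
Commit 6f6baf9655 in the list above describes out-of-band records written append-only as a forest of perfect binary trees, with the leftmost leaf of each tree pointing to the root of the previous tree, so that a reader needs only O(log(N)) memory. The Python sketch below shows one way such a forest could be built and read back in write order. The merge rule (a new record becomes the parent of the two newest trees when they have equal height, otherwise a new leaf) is an assumption made for illustration; the commit message does not spell out the on-disk layout. `OobRecord`, `OobWriter`, `read_in_order` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class OobRecord:
    data: bytes
    left: Optional[int] = None       # index of left child (inner nodes only)
    right: Optional[int] = None      # index of right child (inner nodes only)
    prev_root: Optional[int] = None  # leaves: root of the tree before it, if any


class OobWriter:
    """Append-only writer; every record is written once and never updated."""

    def __init__(self):
        self.log: List[OobRecord] = []
        self.roots: List[Tuple[int, int]] = []  # (index, height) of current trees

    def append(self, data: bytes) -> int:
        idx = len(self.log)
        if len(self.roots) >= 2 and self.roots[-1][1] == self.roots[-2][1]:
            # Two newest trees have equal height: the new record becomes
            # their parent, forming a perfect tree one level taller.
            (left, height), (right, _) = self.roots[-2], self.roots[-1]
            del self.roots[-2:]
            self.log.append(OobRecord(data, left=left, right=right))
            self.roots.append((idx, height + 1))
        else:
            # Otherwise start a new single-node tree that links back to
            # the root of the tree written just before it.
            prev = self.roots[-1][0] if self.roots else None
            self.log.append(OobRecord(data, prev_root=prev))
            self.roots.append((idx, 0))
        return idx


def _post_order(log: List[OobRecord], idx: int) -> Iterator[bytes]:
    rec = log[idx]
    if rec.left is not None:
        yield from _post_order(log, rec.left)
        yield from _post_order(log, rec.right)
    yield rec.data  # children were written before their parent


def read_in_order(log: List[OobRecord], last: int) -> Iterator[bytes]:
    """Yield record data in write order, starting from the last record."""
    roots = []
    node: Optional[int] = last
    while node is not None:
        roots.append(node)
        leaf = node
        while log[leaf].left is not None:  # walk down to the leftmost leaf
            leaf = log[leaf].left
        node = log[leaf].prev_root         # hop to the previous tree's root
    for root in reversed(roots):
        yield from _post_order(log, root)
```

In this sketch the reader holds only the list of tree roots plus one traversal stack, both logarithmic in the number of records, and no record ever needs a forward "next" pointer that would require rewriting earlier data.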