mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-09 11:41:36 +03:00

Author	SHA1	Message	Date
Monty	22024da64e	MDEV-36143 Row event replication with Aria does not honour BLOCK_COMMIT This commit fixes a bug where Aria tables are used in (master->slave1->slave2) and a backup is taken on slave2. In this case it is possible that the replication position in the backup, stored in mysql.gtid_slave_pos, will be wrong. This will lead to replication errors if one is trying to use the backup as a new slave. Analyze: Replicated row events are committed with trans_commit_stmt() and thd->transaction->all.ha_list != 0. This means that backup_commit_lock is not taken for Aria tables, which means the rows are committed and binary logged on the slave under BLOCK_COMMIT, which should not happen. This issue does not occur on the master as thd->transaction->all.ha_list is 0 under AUTO_COMMIT, which sets 'is_real_trans' and 'rw_trans', which in turn causes backup_commit_lock to be taken. Fixed by checking in ha_check_and_coalesce_trx_read_only() if all handlers support rollback and, if not, waiting for BLOCK_COMMIT also for statement commit.	2025-06-02 14:02:53 +03:00
Hemant Dangi	d38558f99b	MDEV-36117: MDL BF-BF conflict on ALTER and UPDATE with multi-level foreign key parents Issue: MariaDB acquires additional MDL locks on UPDATE/INSERT/DELETE statements on tables with foreign keys. For example, if table t1 references t2, an UPDATE to t1 will MDL lock t2 in addition to t1. A replica may deliver an ALTER t1 and UPDATE t2 concurrently for applying. The UPDATE may then acquire an MDL lock on t1, followed by a conflict when the ALTER attempts to take an MDL lock on t1, causing a BF-BF conflict. Solution: Additional keys for the referenced/foreign table need to be added to avoid potential MDL conflicts with concurrent updates and DDLs. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-05-20 20:59:10 +02:00
Julius Goryavsky	b983a911e9	galera mtr tests: synchronization between branches and editions	2025-04-02 04:50:11 +02:00
Julius Goryavsky	03c31ab099	Merge branch '10.5' into '10.6'	2025-04-02 04:43:24 +02:00
Denis Protivensky	dd54ce9e10	MDEV-36116: Remove debug assert in TOI when executing thread is killed Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-04-02 04:29:40 +02:00
Julius Goryavsky	41565615c5	galera: synchronization changes to stop random test failures	2025-04-02 04:29:34 +02:00
Julius Goryavsky	e3d7d5ca26	Merge branch '10.5' into '10.6'	2025-02-27 04:02:33 +01:00
Jan Lindström	b167730499	MDEV-34891 : SST failure occurs when gtid_strict_mode is enabled Problem was that the initial GTID was set in wsrep_before_prepare out of order. In practice, the GTID was set to the same value as the previously executed transaction's GTID. In recovery, a valid GTID was found from the prepared transaction and this transaction was committed, so the same GTID was executed twice. This is fixed by setting an invalid GTID in wsrep_before_prepare; later, in wsrep_before_commit, the actual correct GTID is set, and this is done while we are in the commit monitor, i.e. the assignment is done in order of replication. In recovery, if a prepared transaction is found, we check its GTID: if it is invalid, the transaction will be rolled back, and if it is valid, it will be committed. Initialize gtid seqno from recovered seqno when bootstrapping a new cluster. Added two test cases for both mariabackup and rsync SST methods to show that GTIDs remain consistent on cluster and that all expected rows are in the table. Added tests for wsrep GTID recovery with binlog on and off. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-02-18 19:30:04 +01:00
Jan Lindström	bb64a51037	MDEV-35941 : galera_bf_abort_lock_table fails with wait for metadata lock Problem was a missing case in wsrep_handle_mdl_conflict. The test case was trying to confirm that the LOCK TABLE thread is not BF-aborted. However, as the case was missing, it was BF-aborted. The test case passed because BF-aborting takes time and the wait condition used might see the expected thread status before it was BF-aborted. The test naturally failed if BF-aborting was done early enough. Fix is to add the missing case for SQLCOM_LOCK_TABLES to wsrep_handle_mdl_conflict. Note that using LOCK TABLE is still not recommended on a cluster because it could cause a cluster hang. This is a 10.5 specific commit that will then be overridden by another one for 10.6+. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-02-12 13:35:47 +01:00
Julius Goryavsky	7b040e53cc	galera mtr tests: fixes for test failures, 'cosmetic' changes and unification between versions	2025-02-12 12:25:09 +01:00
Jan Lindström	3009b5439d	MDEV-35941 : galera_bf_abort_lock_table fails with wait for metadata lock Problem was a missing case in wsrep_handle_mdl_conflict. The test case was trying to confirm that the LOCK TABLE thread is not BF-aborted. However, as the case was missing, it was BF-aborted. The test case passed because BF-aborting takes time and the wait condition used might see the expected thread status before it was BF-aborted. The test naturally failed if BF-aborting was done early enough. Fix is to add the missing case for SQLCOM_LOCK_TABLES to wsrep_handle_mdl_conflict. Note that using LOCK TABLE is still not recommended on a cluster because it could cause a cluster hang. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-02-12 01:23:41 +01:00
Julius Goryavsky	d6f31ed263	Merge branch '10.5' into '10.6'	2025-02-03 10:44:13 +01:00
Daniele Sciascia	10fd2c207a	MDEV-35946 Assertion `thd->is_error()' failed in Sql_cmd_dml::prepare Fix a regression that caused assertion thd->is_error() after sync wait failures. If wsrep_sync_wait() fails, make sure an appropriate error is set. Partially revert `75dd0246f8`. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-02-03 10:03:50 +01:00
Julius Goryavsky	53c693ec2f	Merge branch '10.5' into '10.6'	2025-02-02 12:55:16 +01:00
Jan Lindström	22414d2ed0	MDEV-27861: Creating partitioned tables should not be allowed with wsrep_osu_method=TOI and wsrep_strict_ddl=ON Problem was incorrect handling of partitioned tables: because db_type == DB_TYPE_PARTITION_DB, wsrep_should_replicate_ddl incorrectly marked DDL as not replicable. However, for partitioned tables we should check the implementing storage engine from table->file->partition_ht() if available, because if the partition handler is InnoDB, all DDL should be allowed even with wsrep_strict_ddl. For other storage engines, DDL should not be allowed and an error should be issued. This is the 10.5 version of the fix. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-02-02 04:54:42 +01:00
Sergei Golubchik	066e8d6aea	Merge branch '10.5' into 10.6	2025-01-29 11:17:38 +01:00
Hemant Dangi	c0b11e75ff	MDEV-34218: Mariadb Galera cluster fails when replicating from Mysql 5.7 on use of DDL Issue: Mariadb Galera cluster fails to replicate from Mysql 5.7 when configured with MASTER_USE_GTID=no option for CHANGE MASTER. HOST: mysql, mysql 5.7.44 binlog_format=ROW HOST: m1, mariadb 10.6 GALERA NODE replicating from HOST mysql, Using_Gtid: No (log file and position) HOST: m2 mariadb 10.6 GALERA NODE HOST: m3 mariadb 10.6 GALERA NODE Error on m1: 2024-05-22 16:11:07 1 [ERROR] WSREP: Vote 0 (success) on 78cebda7-1876-11ef-896b-8a58fca50d36:2565 is inconsistent with group. Leaving cluster. Error on m2 and m3: 2024-05-22 16:11:06 2 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 42, event_type: -94 2024-05-22 16:11:06 2 [ERROR] WSREP: applier could not read binlog event, seqno: 2565, len: 482 It fails in Gtid_log_event::is_valid() check on secondary node when sequence number sent from primary is 0. On primary for applier or slave thread sequence number is set to 0, when both thd->variables.gtid_seq_no and thd->variables.wsrep_gtid_seq_no have value 0. Solution: Skip adding Gtid Event on primary for applier or slave thread when both thd->variables.gtid_seq_no and thd->variables.wsrep_gtid_seq_no have value 0. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-01-27 17:17:11 +01:00
Nikita Malyavin	765458c93c	fix my_error usage	2025-01-26 16:15:46 +01:00
sjaakola	552cba92de	MDEV-35710 support for threadpool When client connections use threadpool, i.e. the configuration has: thread_handling = pool-of-threads, it turned out that during wsrep replication shutdown, not all client connections could be closed. The reason was that some client threads had stmt_da in state DA_EOF, and this state was earlier used to detect if a client connection was issuing the SHUTDOWN command. To fix this, the connection executing SHUTDOWN is now detected by looking at the actual command being executed: thd->get_command() == COM_SHUTDOWN. During replication shutdown, all other connections except the SHUTDOWN executor are terminated. This commit has a new mtr test galera.galera_threadpool, which opens a number of threadpool client connections, and then restarts the node to verify that connections in threadpool are terminated during shutdown. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-01-24 17:09:34 +01:00
Julius Goryavsky	50cf189717	MDEV-35018 addendum: improved warnings handling Fixed regression after original MDEV-35018 fix related to warnings appearing when running MDEV-26266 test and other possible problems with warnings.	2025-01-24 17:09:21 +01:00
Daniele Sciascia	841a7d391b	MDEV-35018 MDL BF-BF conflict on DROP TABLE DROP TABLE on child and UPDATE of parent table can cause an MDL BF-BF conflict when applied concurrently. DROP TABLE takes MDL locks on both the child and its parent table; however, it did not add certification keys for the parent table. This patch adds the following: * Append certification keys corresponding to all parent tables before DROP TABLE replication. * Fix wsrep_append_fk_parent_table() so that it works when it is given a table list containing temporary tables. * Make sure function wsrep_append_fk_parent_table() is only called for local transactions. That was not the case for ALTER TABLE. * Add a test case that verifies that UPDATE parent depends on preceding DROP TABLE child. * Adapt galera_ddl_fk_conflict test to work with DROP TABLE as well. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2025-01-24 17:02:44 +01:00
Julius Goryavsky	3cd9f9d1b3	Merge branch '10.5' into '10.6'	2024-12-18 05:09:23 +01:00
Daniele Sciascia	75dd0246f8	Remove error handling from wsrep_sync_wait() Let the wsrep-lib error be set/overridden at the end of dispatch_command(). Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-12-17 09:52:32 +01:00
Kristian Nielsen	0166c89e02	Merge 10.5 -> 10.6 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-12-05 09:20:36 +01:00
Teemu Ollakka	a2575a0703	MDEV-35465 Async replication stops working on Galera async replica node when parallel replication is enabled Parallel slave failed to retry in retry_event_group() with error WSREP: Parallel slave worker failed at wsrep_before_command() hook Fix wsrep transaction cleanup/restart in retry_event_group() to properly clean up previous transaction by calling wsrep_after_statement(). Also move call to reset error after call to wsrep_after_statement() to make sure that it remains effective. Add a MTR test galera_as_slave_parallel_retry to reproduce the error when the fix is not present. Other issues which were detected when testing with sysbench: Check if parallel slave is killed for retry before waiting for prior commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This is required with slave-parallel-mode=optimistic to avoid deadlock when a slave later in commit order manages to reach prepare phase before a lock conflict is detected. Suppress wsrep applier specific warning for slave threads. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-12-03 15:05:32 +01:00
Jan Lindström	f5aed74573	MDEV-35486 : MDEV-33997 test failed Problem was that in wsrep_to_isolation_end the saved_lock_wait_timeout variable was set to thd->variables.lock_wait_timeout when RSU is used, and the variable value was 0, leading to sporadic lock wait timeout errors. Fixed by removing the incorrect variable assignment. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-11-27 13:00:08 +01:00
Thirunarayanan Balathandayuthapani	074831ec61	Merge branch 10.5 into 10.6	2024-11-08 18:17:15 +05:30
Jan Lindström	e4a3a11dcc	MDEV-35344 : Galera test failure on galera_sync_wait_upto Ignoring configured server_id should not be a warning because correct configuration is documented. Changed message to info level with more detailed message what was configured and what will be actually used. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-11-06 04:59:10 +01:00
Denis Protivensky	6d5fe9ed0d	MDEV-28378: Don't hang trying to peek log event past the end of log While applying a CTAS log event, we peek the relay log to see if CTAS contains inserted rows or if it's empty. The peek function didn't check for an end-of-file condition when it tried to get the next event from the log, and thus it hung. The fix includes checking for end-of-file while peeking for log events and considering a returned XID_EVENT value as a sign of an empty CTAS. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-11-06 04:59:09 +01:00
Oleksandr Byelkin	f00711bba2	Merge branch '10.5' into 10.6	2024-10-29 14:20:03 +01:00
Jan Lindström	b3be3c2157	MDEV-30653 : With wsrep_mode=REPLICATE_ARIA only part of mixed-engine transactions is replicated Replication of non-transactional engines is experimental and uses TOI. This naturally means that if there is an open transaction with a transactional engine, its changes will be rolled back. Fixed by adding an error message, issued as a warning, if a non-transactional engine is part of a multi-engine transaction. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-10-23 04:00:52 +02:00
Monty	bddbef3573	MDEV-34533 asan error about stack overflow when writing record in Aria The problem was that when using clang + asan, we do not get a correct value for the thread stack, as some local variables are not allocated on the normal stack. It looks like, for example, clang 18.1.3, when compiling with -O2 -fsanitize=address, puts local variables and things allocated by alloca() in other areas than on the stack. The following code shows the issue: Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection (connect=0x5080000027b8, put_in_cache=<optimized out>) at sql/sql_connect.cc:1399 THD *thd; 1399 thd->thread_stack= (char*) &thd; (gdb) p &thd (THD **) 0x7fffedee7060 (gdb) p $sp (void *) 0x7fffef4e7bc0 The address of thd is 24M away from the stack pointer (gdb) info reg ... rsp 0x7fffef4e7bc0 0x7fffef4e7bc0 ... r13 0x7fffedee7060 140737185214560 r13 is pointing to the address of the thd. Probably some kind of "local stack" used by the sanitizer. I have verified this with gdb on a recursive call that calls alloca() in a loop. In this case all objects were stored in a local heap, not on the stack. To solve this issue in a portable way, I have added two functions: my_get_stack_pointer() returns the address of the current stack pointer. The code is using asm instructions for intel 32/64 bit, powerpc, arm 32/64 bit and sparc 32/64 bit. Supported compilers are gcc, clang and MSVC. For MSVC 64 bit we are using _AddressOfReturnAddress(). As a fallback for other compilers/arch we use the address of a local variable. my_get_stack_bounds() returns the address of the stack base and the stack size using pthread_attr_getstack() or NtCurrentTeb(), with fallback to using the address of a local variable and a user-provided stack size. Server changes are: - Moving setting of thread_stack to THD::store_globals() using my_get_stack_bounds(). - Removing setting of thd->thread_stack, except in functions that allocate a lot on the stack before calling store_globals(). When using estimates for stack start, we reduce stack_size with MY_STACK_SAFE_MARGIN (8192) to take into account the stack used before calling store_globals(). I also added a unittest, stack_allocation-t, to verify the new code. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2024-10-16 17:24:46 +03:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Denis Protivensky	9f61aa4f8a	MDEV-34822 pre-fix: Make wsrep_ready flag read lock-free It's read for every command execution, and during slave replication for every applied event. It's also planned to be used during write set applying, so almost every server thread is going to compete for the mutex covering this variable, especially considering how rarely it changes. Converting wsrep_ready to an atomic variable relieves this contention. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-26 00:04:56 +02:00
Denis Protivensky	4e2c02a12c	MDEV-33133: MDL conflict handling code should skip BF-aborted trxs It's possible that MDL conflict handling code is called more than once for a transaction when: - it holds more than one conflicting MDL lock - reschedule_waiters() is executed, which results in repeated attempts to BF-abort an already aborted transaction. In such situations, it might be that BF-aborting logic sees a partially rolled back transaction and erroneously decides on future actions for such a transaction. The specific situation tested and fixed is when an SR transaction applied in the node gets BF-aborted by a started TOI operation. It's then caught with the server transaction already rolled back, but with no MDL locks yet released. This caused wrong state detection for such a transaction during repeated MDL conflict handling code execution. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-03 07:45:57 +02:00
Julius Goryavsky	d5a669b6b6	Merge branch '10.5' into '10.6'	2024-09-03 07:44:51 +02:00
Denis Protivensky	235f33e360	MDEV-33133: MDL conflict handling code should skip BF-aborted trxs It's possible that MDL conflict handling code is called more than once for a transaction when: - it holds more than one conflicting MDL lock - reschedule_waiters() is executed, which results in repeated attempts to BF-abort an already aborted transaction. In such situations, it might be that BF-aborting logic sees a partially rolled back transaction and erroneously decides on future actions for such a transaction. The specific situation tested and fixed is when an SR transaction applied in the node gets BF-aborted by a started TOI operation. It's then caught with the server transaction already rolled back, but with no MDL locks yet released. This caused wrong state detection for such a transaction during repeated MDL conflict handling code execution. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-01 16:19:59 +02:00
Julius Goryavsky	bac0804d81	Merge branch '10.5' into '10.6'	2024-09-01 06:51:25 +02:00
Alexey Yurchenko	731a5aba0b	Use only MySQL code for TOI error vote For TOI events specifically we have a situation where, in case of the same error, different nodes may generate different messages. This may be for two reasons: - different locale setting between the current client session and server default (we can reasonably require server locales to be identical on all nodes, but the user can change the message locale for the session) - non-deterministic course of STATEMENT execution, e.g. for ALTER TABLE On the other hand, we may reasonably expect TOI event failures since they are executed after replication, so we must ensure that voting is consistent. For that purpose, error codes should be sufficiently unique and deterministic for TOI event failures, as DDLs normally deal with a single object, so we can merely use MySQL error codes to vote on. Notice that this problem does not happen with regular transactional writesets, since the originator node will always vote success and replica nodes are assumed to have the same global locale setting. As such, different error messages indicate different errors even if the error code is the same (e.g. ER_DUP_KEY can happen on different rows or tables). Use only the MySQL error code (without the error message) for error voting in case of TOI event failure. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-09-01 02:58:27 +02:00
Julius Goryavsky	c21aa486a8	MDEV-32633: additional post-merge changes for 10.5+	2024-06-03 09:48:13 +02:00
Denis Protivensky	0cc9b49751	MDEV-32633: Fix Galera cluster <-> native replication interaction It's possible to establish Galera multi-cluster setups connected through native replication when every Galera cluster is configured to have a separate domain ID. For this setup to work, we need to replace domain ID values in generated GTID events, when they are written at transaction commit, with the values configured by Wsrep replication. At the same time, it's possible that the GTID event already contains a correct domain ID if it comes through native replication from another Galera cluster. In this case, when such an event is applied either through a native replication slave thread or through the Wsrep applier, we write the GTID event on transaction start and avoid writing it during transaction commit. The code contained multiple problems that were fixed: - applying GTID events didn't work because they are applied without a running server transaction and the Wsrep transaction was not started - GTID event generation on transaction start didn't contain the proper "standalone" and "is_transactional" flags that the original applied GTID event contained - the condition determining that the GTID event is written on transaction start, to avoid writing it on commit, relied on the GTID event being the first found in the transaction/statement caches, which wasn't the case and resulted in duplicate GTID events being written - instead of relying on the caches to find a GTID event, a simple check is introduced that follows the exact rules for checking if the event is written at transaction start as described above - the test case is improved to check that exact GTID events are applied after two Galera clusters have synced. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-06-03 09:48:13 +02:00
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00
Julius Goryavsky	52c45332a8	MDEV-34071: Failure during the galera_3nodes_sr.GCF-336 test This commit fixes sporadic failures in galera_3nodes_sr.GCF-336 test. The following changes have been made here: 1) A small addition to the test itself which should make it more deterministic by waiting for non-primary state before COMMIT; 2) More careful handling of the wsrep_ready variable in the server code (it should always be protected with mutex). No additional tests are required.	2024-05-06 03:16:59 +02:00
Marko Mäkelä	829cb1a49c	Merge 10.5 into 10.6	2024-04-17 14:14:58 +03:00
Oleksandr Byelkin	9b18275623	Merge branch '10.4' into 10.5	2024-04-16 11:04:14 +02:00
Marko Mäkelä	ccb7a1e9a1	Merge 10.5 into 10.6	2024-03-27 15:00:56 +02:00
Daniele Sciascia	c71dc39529	MDEV-26499 Fix error "mysql_shutdown failed" during MTR tests - Fix to avoid mysqltest client getting killed abruptly during mysql_shutdown(). When Galera replication is shutdown, wait for THDs with `thd->stmt_da()->is_eof()` to disconnect (these are about to disconnect anyway). - Extract duplicate code from `wsrep_stop_replication()` and `wsrep_shutdown_replication()` in a new function. - No need to use a custom `shutdown_mysqld.inc` in galera suite. Delete it, so that the one in `mysql-test/include/` is used. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-03-27 04:31:45 +01:00
Denis Protivensky	7bf3c3124a	MDEV-33136: Properly BF-abort user transactions with explicit locks User transactions may acquire explicit MDL locks from the InnoDB level when persistent statistics are re-read for a table. If such a transaction was subject to BF-abort, it was improperly detected as a system transaction and wouldn't get aborted. The fix: Check if a transaction holding explicit MDL locks is a user transaction in the MDL conflict handling code. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-03-27 01:25:22 +01:00
Jan Lindström	e9d334434d	MDEV-32787 : Assertion `!wsrep_has_changes(thd) || (thd->lex->sql_command == SQLCOM_CREATE_TABLE && !thd->is_current_stmt_binlog_format_row()) || thd->wsrep_cs().transaction().state() == wsrep::transaction::s_aborted' failed in void wsrep_commit_empty(THD*, bool) When we commit an empty transaction, we should allow the wsrep transaction to be in s_must_replay state for DDL that was killed during certification. Fix is tested with RQG because a deterministic mtr test case was not found. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-03-25 12:10:53 +01:00
Marko Mäkelä	8bd5a3de7f	Merge 10.5 into 10.6	2024-01-03 14:24:47 +02:00


540 Commits