mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	a8d4642375	Merge branch '10.11' into 11.4	2025-04-26 10:53:02 +02:00
ParadoxV5	1d5557d9c0	MDEV-36663 Semi-sync Replica Can't Kill Dump Thread When Using SSL When a replica stops an established semi-sync connection, it is supposed to kill the corresponding binlog dump thread on the primary server. However, when connections are configured to use SSL, this new connection created by the replica to kill the dump thread doesn't have any logic to configure SSL options, and thereby the connection can't be made, and the dump thread will never be killed. This patch adds logic to configure the semi-sync kill connection with SSL. The exising logic to set up the connection options for the regular connection was extracted into a function that the semi-sync kill connection invokes. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>	2025-04-23 17:20:47 -06:00
Oleksandr Byelkin	20b818f45e	Merge branch '10.6' into 10.11	2025-04-21 11:23:11 +02:00
Monty	51c5b75335	Always call mysql_cond_broadcast(&rli->data_cond) under data_lock This is a safetly fix to try to fix random failures in parallel_backup_xa_debug reported as: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 1034, 300, '')' returned -1 One possible reason could be lost signals, which this patch fixes.	2025-04-19 11:03:43 +03:00
Marko Mäkelä	f5bd250f5b	Merge 10.11 into 11.4	2025-03-28 13:55:21 +02:00
Marko Mäkelä	ab0f2a00b6	Merge 10.6 into 10.11	2025-03-27 08:01:47 +02:00
Marko Mäkelä	191209d8ab	Merge 10.5 into 10.6	2025-03-26 17:09:57 +02:00
Kristian Nielsen	d931bb8174	MDEV-36287: Server crash in SHOW SLAVE STATUS concurrent with STOP SLAVE In SHOW SLAVE STATUS, do not access members of the SQL thread's THD without holding mi->run_lock. Otherwise the THD can go away in case of concurrent STOP SLAVE, leading to invalid memory references and server crash. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-03-15 11:16:00 +01:00
Monty	1331c73243	Moved server_threads.erase(thd) to end of handle_slave_sql() The effect is that 'show processlist' will show the Slave SQL thread until the thread ends. This may help finding cases where the Slave SQL thread could hang for some time during the cleanup part. The Slave SQL thread will have the state "Slave SQL thread ending' during this stage. Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-03-09 12:49:44 +02:00
Sergei Golubchik	7d657fda64	Merge branch '10.11 into 11.4	2025-01-30 12:01:11 +01:00
Sergei Golubchik	e69f8cae1a	Merge branch '10.6' into 10.11	2025-01-30 11:55:13 +01:00
Kristian Nielsen	72e1cc8f52	MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's IO_CACHE to signal an error back to the caller. When reading the active relay log, this flag is also being used by the IO thread, and setting it can randomly cause the IO thread to wrongly detect IO error on writing and permanently disable the relay log. This was seen sporadically in test case rpl.rpl_from_mysql80. The read error set by the SQL thread in the IO_CACHE would be interpreted as a write error by the IO thread, which would cause it to throw a fatal error and close the relay log. And this would later cause CHANGE MASTER to try to purge a closed relay log, resulting in nullptr crash. SQL thread is not able to parse an event read from the relay log. This can happen like here when replicating unknown events from a MySQL master, potentially also for other reasons. Also fix a mistake in my_b_flush_io_cache() introduced back in 2001 (`fa09f2cd7e`) where my_b_flush_io_cache() could wrongly return an error set in IO_CACHE::error, even if the flush operation itself succeeded. Also fix another sporadic failure in rpl.rpl_from_mysql80 where the outout of MASTER_POS_WAIT() depended on timing of SQL and IO thread. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-01-24 09:15:20 +00:00
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Marko Mäkelä	7d4077cc11	Merge 10.5 into 10.6	2024-11-29 12:37:46 +02:00
Brandon Nesterenko	dbfee9fc2b	MDEV-34348: Consolidate cmp function declarations Partial commit of the greater MDEV-34348 scope. MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict The functions queue_compare, qsort2_cmp, and qsort_cmp2 all had similar interfaces, and were used interchangable and unsafely cast to one another. This patch consolidates the functions all into the qsort_cmp2 interface. Reviewed By: ============ Marko Mäkelä <marko.makela@mariadb.com>	2024-11-23 08:14:22 -07:00
ParadoxV5	cf2d49ddcf	Extract some of #3360 fixes to 10.5.x That PR uncovered countless issues on `my_snprintf` uses. This commit backports a squashed subset of their fixes.	2024-11-21 22:43:56 +11:00
Thirunarayanan Balathandayuthapani	074831ec61	Merge branch 10.5 into 10.6	2024-11-08 18:17:15 +05:30
Denis Protivensky	6d5fe9ed0d	MDEV-28378: Don't hang trying to peek log event past the end of log While applying CTAS log event, we peek the relay log to see if CTAS contains inserted rows or if it's empty. The peek function didn't check for end-of-file condition when tried to get the next event from the log, and thus it hanged. The fix includes checking for end-of-file while peeking for log events and considering returned XID_EVENT value as a sign of an empty CTAS. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-11-06 04:59:09 +01:00
Brandon Nesterenko	5290fa043b	MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync This is a preparatory commit for MDEV-35109 to make its testing code cleaner (and harden other tests too). The DEBUG_DBUG point simulate_delay_semisync_slave_reply up to this patch used my_sleep() to delay an ACK response, but sleeps are prone to test failures on machines that run tests when already having a heavy load (e.g. on buildbot). This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC to coordinate exactly when a slave should send its reply, which is safer and faster. As DEBUG_SYNC can't be used while a server is shutting down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES logic, we use and extend wait_for_pattern_in_file.inc to wait for an informational error message in the logic to indicate that the shutdown process has reached the intended state (i.e. indicating that the shutdown has been delayed to await semi-sync ACKs). Specifically, the extensions are as follows: 1. wait_for_pattern_in_file.inc is extended with parameter wait_for_pattern_count as a number that indicates the number of times a pattern should occur in the file before return control back to the calling script. 2. search_for_pattern_in_file.inc is extended with parameter SEARCH_ABORT_IS_SUCCESS to inverse the error/success logic, so the SEARCH_ABORT condition can be used to indicate success, rather than error.	2024-11-04 10:45:58 -07:00
Oleksandr Byelkin	c770bce898	Merge branch '11.2' into 11.4	2024-10-30 15:11:17 +01:00
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Oleksandr Byelkin	f00711bba2	Merge branch '10.5' into 10.6	2024-10-29 14:20:03 +01:00
Monty	bddbef3573	MDEV-34533 asan error about stack overflow when writing record in Aria The problem was that when using clang + asan, we do not get a correct value for the thread stack as some local variables are not allocated at the normal stack. It looks like that for example clang 18.1.3, when compiling with -O2 -fsanitize=addressan it puts local variables and things allocated by alloca() in other areas than on the stack. The following code shows the issue Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection (connect=0x5080000027b8, put_in_cache=<optimized out>) at sql/sql_connect.cc:1399 THD thd; 1399 thd->thread_stack= (char) &thd; (gdb) p &thd (THD *) 0x7fffedee7060 (gdb) p $sp (void ) 0x7fffef4e7bc0 The address of thd is 24M away from the stack pointer (gdb) info reg ... rsp 0x7fffef4e7bc0 0x7fffef4e7bc0 ... r13 0x7fffedee7060 140737185214560 r13 is pointing to the address of the thd. Probably some kind of "local stack" used by the sanitizer I have verified this with gdb on a recursive call that calls alloca() in a loop. In this case all objects was stored in a local heap, not on the stack. To solve this issue in a portable way, I have added two functions: my_get_stack_pointer() returns the address of the current stack pointer. The code is using asm instructions for intel 32/64 bit, powerpc, arm 32/64 bit and sparc 32/64 bit. Supported compilers are gcc, clang and MSVC. For MSVC 64 bit we are using _AddressOfReturnAddress() As a fallback for other compilers/arch we use the address of a local variable. my_get_stack_bounds() that will return the address of the base stack and stack size using pthread_attr_getstack() or NtCurrentTed() with fallback to using the address of a local variable and user provided stack size. Server changes are: - Moving setting of thread_stack to THD::store_globals() using my_get_stack_bounds(). - Removing setting of thd->thread_stack, except in functions that allocates a lot on the stack before calling store_globals(). When using estimates for stack start, we reduce stack_size with MY_STACK_SAFE_MARGIN (8192) to take into account the stack used before calling store_globals(). I also added a unittest, stack_allocation-t, to verify the new code. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2024-10-16 17:24:46 +03:00
Marko Mäkelä	b53b81e937	Merge 11.2 into 11.4	2024-10-03 14:32:14 +03:00
Marko Mäkelä	12a91b57e2	Merge 10.11 into 11.2	2024-10-03 13:24:43 +03:00
Marko Mäkelä	63913ce5af	Merge 10.6 into 10.11	2024-10-03 10:55:08 +03:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after SLAVE START, but did not address a possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the SLAVE STOP). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increase the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Oleksandr Byelkin	1640c9b06e	Merge branch '11.2' into 11.4	2024-08-04 17:27:48 +02:00
Oleksandr Byelkin	80abd847da	Merge branch '10.11' into 11.1	2024-08-03 09:32:42 +02:00
Oleksandr Byelkin	0fe39d368a	Merge branch '10.6' into 10.11	2024-07-22 15:14:50 +02:00
Monty	e0cff1e72b	Fixed failure in rpl.rpl_change_master_demote : "IO thread should not be running..." The issue was that the test did not take into account that the IO thread could have been in COMMAND=Connecting state, which happens before the COMMANMD=Slave_IO state. The test is a bit fragile as it depends on the COMMAND state to be syncronised with the Slave_IO_State, which is not the case. I added a new proc state and some more information to the error output to be able to diagnose future failures more easily.	2024-07-11 11:15:47 +03:00
Julius Goryavsky	4026f04425	Merge branch 10.5 into 10.6	2024-07-09 11:56:47 +02:00
Brandon Nesterenko	744580d5a7	MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary The IO thread can report error code 2013 into the error log when it is stopped during the initial connection process to the primary, as well as when trying to read an event. However, because the IO thread is being stopped, its connection to the primary is force-killed by the signaling thread (see THD::awake_no_mutex()), and thereby these connection errors should be ignored. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-08 10:39:17 -06:00
Oleksandr Byelkin	99b370e023	Merge branch '11.2' into 11.4	2024-05-21 19:38:51 +02:00
Sergei Golubchik	f9807aadef	Merge branch '10.11' into 11.0	2024-05-12 12:18:28 +02:00
Sergei Golubchik	a6b2f820e0	Merge branch '10.6' into 10.11	2024-05-10 20:02:18 +02:00
Sergei Golubchik	7b53672c63	Merge branch '10.5' into 10.6	2024-05-08 20:06:00 +02:00
Julius Goryavsky	52c45332a8	MDEV-34071: Failure during the galera_3nodes_sr.GCF-336 test This commit fixes sporadic failures in galera_3nodes_sr.GCF-336 test. The following changes have been made here: 1) A small addition to the test itself which should make it more deterministic by waiting for non-primary state before COMMIT; 2) More careful handling of the wsrep_ready variable in the server code (it should always be protected with mutex). No additional tests are required.	2024-05-06 03:16:59 +02:00
Sergei Golubchik	cea083af9f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-05-05 21:37:07 +02:00
Sergei Golubchik	9cf718859f	cleanup: use THD_STAGE_INFO, not thd_proc_info and put master-slave.inc last in the series of includes	2024-04-24 22:08:52 +02:00
Sergei Golubchik	018d537ec1	Merge branch '10.6' into 10.11	2024-04-22 15:23:10 +02:00
Sergei Golubchik	d8368ae289	Merge '10.5' into 10.6	2024-04-20 14:47:26 +02:00
Kristian Nielsen	57f6a1ca98	MDEV-19415: use-after-free on charsets_dir from slave connect The slave IO thread sets MYSQL_SET_CHARSET_DIR. The code for this option however is not thread-safe in sql-common/client.c. The value set is temporarily written to mysys global variable `charsets-dir` and can be seen by other threads running in parallel, which can result in use-after-free error. Problem was visible as random failures of test cases in suite multi_source with Valgrind or MSAN. Work-around by not setting this option for slave connect, it is redundant anyway as it is just setting the default value. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-20 13:41:08 +02:00
Monty	da47c0370d	Fixed calculating of last_master_timestamp for parallel replication. This effects the Seconds_Behind_Master value.	2024-04-10 17:01:24 +03:00
Marko Mäkelä	fec2fd6add	Merge 10.11 into 11.0	2024-03-28 10:51:36 +02:00
Marko Mäkelä	788953463d	Merge 10.6 into 10.11 Some fixes related to commit `f838b2d799` and Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row() for system-versioned tables were provided by Nikita Malyavin. This was required by test versioning.rpl,trx_id,row.	2024-03-28 09:16:57 +02:00
Marko Mäkelä	b8a6719889	MDEV-26642/MDEV-26643/MDEV-32898 Implement innodb_snapshot_isolation https://jepsen.io/analyses/mysql-8.0.34 highlights that the transaction isolation levels in the InnoDB storage engine do not correspond to any widely accepted definitions, such as "Generalized Isolation Level Definitions" https://pmg.csail.mit.edu/papers/icde00.pdf (PL-1 = READ UNCOMMITTED, PL-2 = READ COMMITTED, PL-2.99 = REPEATABLE READ, PL-3 = SERIALIZABLE). Only READ UNCOMMITTED in InnoDB seems to match the above definition. The issue is that InnoDB does not detect write/write conflicts (Section 4.4.3, Definition 6) in the above. It appears that as soon as we implement write/write conflict detection (SET SESSION innodb_snapshot_isolation=ON), the default isolation level (SET TRANSACTION ISOLATION LEVEL REPEATABLE READ) will become Snapshot Isolation (similar to Postgres), as defined in Section 4.2 of "A Critique of ANSI SQL Isolation Levels", MSR-TR-95-51, June 1995 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf Locking reads inside InnoDB used to read the latest committed version, ignoring what should actually be visible to the transaction. The added test innodb.lock_isolation illustrates this. The statement UPDATE t SET a=3 WHERE b=2; is executed in a transaction that was started before a read view or a snapshot of the current transaction was created, and committed before the current transaction attempts to execute UPDATE t SET b=3; If SET innodb_snapshot_isolation=ON is in effect when the second transaction was started, the second transaction will be aborted with the error ER_CHECKREAD. By default (innodb_snapshot_isolation=OFF), the second transaction would execute inconsistently, displaying an incorrect SELECT COUNT(*) FROM t in its read view. If innodb_snapshot_isolation=ON, if an attempt to acquire a lock on a record that does not exist in the current read view is made, an error DB_RECORD_CHANGED (HA_ERR_RECORD_CHANGED, ER_CHECKREAD) will be raised. This error will be treated in the same way as a deadlock: the transaction will be rolled back. lock_clust_rec_read_check_and_lock(): If the current transaction has a read view where the record is not visible and innodb_snapshot_isolation=ON, fail before trying to acquire the lock. row_sel_build_committed_vers_for_mysql(): If innodb_snapshot_isolation=ON, disable the "semi-consistent read" logic that had been implemented by myself on the directions of Heikki Tuuri in order to address https://bugs.mysql.com/bug.php?id=3300 that was motivated by a customer wanting UPDATE to skip locked rows that do not match the WHERE condition. It looks like my changes were included in the MySQL 5.1.5 commit ad126d90e019f223470e73e1b2b528f9007c4532; at that time, employees of Innobase Oy (a recent acquisition of Oracle) had lost write access to the repository. The only reason why we set innodb_snapshot_isolation=OFF by default is backward compatibility with applications, such as the one that motivated the implementation of "semi-consistent read" back in 2005. In a later major release, we can default to innodb_snapshot_isolation=ON. Thanks to Peter Alvaro, Kyle Kingsbury and Alexey Gotsman for their work on https://github.com/jepsen-io/ and to Kyle and Alexey for explanations and some testing of this fix. Thanks to Vladislav Lesin for the initial test for MDEV-26643, as well as reviewing these changes.	2024-03-20 09:48:03 +02:00

1 2 3 4 5 ...

2838 Commits