mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

Author	SHA1	Message	Date
Apostolis Stamatis	247e2f8d4d	MDEV-29499 Improving the 'Can't execute init_slave query' error message with the actual failure Currently, there are multiple error codes reported for the issue 'Can't execute init_slave query'. Those error codes are the underlying reason the init_slave query cannot be executed, but this makes it difficult to detect the issue in an automated way. This patch introduces a new error code, ER_INIT_SLAVE_ERROR, to unify all the errors related to the init_slave query. The ER_INIT_SLAVE_ERROR error is raised for any issue related to the init_slave query, and the underlying error code and message are included in the Last_SQL_Error field. Reviewed by: Jimmy Hu <jimmy.hu@mariadb.com> Brandon Nesterenko <brandon.nesterenko@mariadb.com>	2025-06-13 15:28:38 -06:00
Oleksandr Byelkin	f1102da37a	Merge branch '11.8' into 12.0	2025-05-22 09:22:55 +02:00
Vasilii Lakhin	40c5b62531	Fix remaining typos	2025-04-29 11:18:00 +10:00
Sergei Golubchik	237e24497b	Merge remote-tracking branch 'github/bb-11.4-release' into bb-11.8-serg	2025-04-27 19:40:00 +02:00
Oleksandr Byelkin	a8d4642375	Merge branch '10.11' into 11.4	2025-04-26 10:53:02 +02:00
ParadoxV5	1d5557d9c0	MDEV-36663 Semi-sync Replica Can't Kill Dump Thread When Using SSL When a replica stops an established semi-sync connection, it is supposed to kill the corresponding binlog dump thread on the primary server. However, when connections are configured to use SSL, the new connection that the replica creates to kill the dump thread has no logic to configure SSL options, so the connection can't be made and the dump thread is never killed. This patch adds logic to configure the semi-sync kill connection with SSL. The existing logic to set up the connection options for the regular connection was extracted into a function that the semi-sync kill connection invokes. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com>	2025-04-23 17:20:47 -06:00
ParadoxV5	cbd6755869	MDEV-27669: Add `skip-slave-start` info message When a slave does not start the slave threads on restart and does not report any startup failure to the error log either, this can be (and most likely is) due to `skip-slave-start` being set in the config file(s) or on the command line. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2025-04-22 17:39:13 -06:00
Oleksandr Byelkin	20b818f45e	Merge branch '10.6' into 10.11	2025-04-21 11:23:11 +02:00
Monty	51c5b75335	Always call mysql_cond_broadcast(&rli->data_cond) under data_lock This is a safety fix intended to address random failures in parallel_backup_xa_debug, reported as: sync_slave_with_master failed: 'select master_pos_wait('master-bin.000001', 1034, 300, '')' returned -1. One possible cause is lost signals, which this patch fixes.	2025-04-19 11:03:43 +03:00
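The locking rule in this commit can be sketched as follows. This is a minimal illustration, assuming C++ `std::mutex`/`std::condition_variable` in place of the server's `mysql_mutex_t`/`mysql_cond_t` wrappers; `RelayLogInfoSketch`, `advance_to()`, and `wait_for_pos()` are hypothetical stand-ins for `rli`, `data_lock`, `data_cond`, and a `MASTER_POS_WAIT()`-style waiter, not the server's code.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the relay log info object: the executed
// position is guarded by data_lock, and waiters block on data_cond.
struct RelayLogInfoSketch {
  std::mutex data_lock;
  std::condition_variable data_cond;
  std::uint64_t group_master_log_pos = 0;

  // Writer side: update the state and broadcast while holding data_lock.
  // A waiter checks the predicate with the lock held and releases it
  // atomically inside wait, so it cannot miss this wakeup.
  void advance_to(std::uint64_t pos) {
    std::lock_guard<std::mutex> guard(data_lock);
    group_master_log_pos = pos;
    data_cond.notify_all();  // broadcast under data_lock
  }

  // Waiter side: re-checks the predicate under the same lock after every
  // wakeup; returns false if the position was not reached before timeout.
  bool wait_for_pos(std::uint64_t pos, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(data_lock);
    return data_cond.wait_for(guard, timeout,
                              [&] { return group_master_log_pos >= pos; });
  }
};
```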
Sergei Golubchik	9b824e62d4	Merge branch '11.8' into main	2025-04-18 17:11:01 +02:00
Marko Mäkelä	bb1d88b6dc	Merge 11.4 into 11.8	2025-04-02 14:07:01 +03:00
Marko Mäkelä	f5bd250f5b	Merge 10.11 into 11.4	2025-03-28 13:55:21 +02:00
Marko Mäkelä	ab0f2a00b6	Merge 10.6 into 10.11	2025-03-27 08:01:47 +02:00
Marko Mäkelä	191209d8ab	Merge 10.5 into 10.6	2025-03-26 17:09:57 +02:00
Kristian Nielsen	d931bb8174	MDEV-36287: Server crash in SHOW SLAVE STATUS concurrent with STOP SLAVE In SHOW SLAVE STATUS, do not access members of the SQL thread's THD without holding mi->run_lock. Otherwise the THD can go away in case of concurrent STOP SLAVE, leading to invalid memory references and server crash. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-03-15 11:16:00 +01:00
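The locking rule from MDEV-36287 can be illustrated with a small C++ sketch. `MasterInfoSketch`, `SqlThreadState`, and `proc_info` are hypothetical stand-ins for `Master_info`, the SQL thread's `THD`, and a field that SHOW SLAVE STATUS reads; the only point shown is that the reader and the code tearing the state down take the same `run_lock`, so the state cannot be freed between the null check and the read.

```cpp
#include <memory>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical per-thread state owned by the SQL thread (stand-in for THD).
struct SqlThreadState {
  std::string proc_info;
};

// Hypothetical stand-in for Master_info: run_lock guards the pointer to
// the SQL thread's state, which STOP SLAVE destroys.
struct MasterInfoSketch {
  std::mutex run_lock;
  std::unique_ptr<SqlThreadState> sql_thd;

  void start() {
    std::lock_guard<std::mutex> g(run_lock);
    sql_thd = std::make_unique<SqlThreadState>(SqlThreadState{"Reading event"});
  }

  // STOP SLAVE side: frees the state only while holding run_lock.
  void stop() {
    std::lock_guard<std::mutex> g(run_lock);
    sql_thd.reset();
  }

  // SHOW SLAVE STATUS side: copies what it needs under run_lock instead of
  // dereferencing the other thread's state without the lock.
  std::optional<std::string> sql_state() {
    std::lock_guard<std::mutex> g(run_lock);
    if (!sql_thd) return std::nullopt;
    return sql_thd->proc_info;
  }
};
```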
Sergey Vojtovich	feb1cf9086	Corrections to parent "fix typos" commit	2025-03-14 12:08:56 +04:00
Vasilii Lakhin	717c12de0e	Fix typos in C comments inside sql/	2025-03-14 12:08:56 +04:00
Monty	1331c73243	Moved server_threads.erase(thd) to end of handle_slave_sql() The effect is that 'show processlist' will show the Slave SQL thread until the thread ends. This may help finding cases where the Slave SQL thread could hang for some time during the cleanup part. The Slave SQL thread will have the state 'Slave SQL thread ending' during this stage. Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-03-09 12:49:44 +02:00
ParadoxV5	5091986cea	misc. `sql/slave.cc` & co. refactor * `get_master_version_and_clock()` de-duplicate label using fall-through * `io_slave_killed()` & `check_io_slave_killed()`: * reüse the result from the level lower * add distinguishing docs * `try_to_reconnect()`: extract `'` from `if`-`else` * `handle_slave_io()`: Both `while`s have the same condition; looks like the outer `while` can simply be an `if`. * `connect_to_master()`: * assume `mysql_errno()` is not 0 on connection error * utilize 0’s falsiness in the loop * extend docs * `sql/sql_show.cc`: refactor SHOW ALL REPLICAS filter’s condition * `sql/mysqld.cc`: init `master-retry-count` with `master_retry_count` Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-02-26 20:37:53 -07:00
ParadoxV5	e2dbd9b6ac	MDEV-35304: Add `Connects_Tried` and `Master_Retry_Count` to SSS When the IO thread (re)connects to a primary, no updates are available besides unique errors that cause the failure. These new `Master_info` numbers supplement SHOW SLAVE STATUS’s (most-recent) ‘Connecting’ state with statistics on (re)connect attempts: * `Connects_Tried`: how many retries have been attempted so far. This was previously a local variable that only counted re-attempts; it’s now meaningful even after the “Connecting” state concludes. * `Master_Retry_Count` (from MDEV-25674): out of how many configured Side-note: Some of the tests updated by this commit dump the entire SHOW SLAVE STATUS, which might include non-deterministic entries. Reviewed-by: Kristian Nielsen <knielsen@knielsen-hq.org> Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>	2025-02-26 20:37:53 -07:00
ParadoxV5	7094a75596	MDEV-25674: Add CHANGE MASTER TO master_retry_count This new CHANGE MASTER TO field specifies the `--master-retry-count` (global option: the number of Primary connection attempts) for each multi-source replica, i.e., per-channel `performance_schema.replication_connection_configuration.CONNECTION_RETRY_COUNT`. `--master-retry-count` remains the default for new `CHANGE MASTER TO`s. This new keyword and `master-info` entry match those of pre-‘REPLICATION SOURCE’ MySQL.	2025-02-26 20:37:53 -07:00
ParadoxV5	66f52ba630	slave.cc `try_to_reconnect` remove `retry_counter` `try_to_reconnect()` wraps `safe_reconnect()` with logging, but the latter already loops reconnection attempts up to `master_retry_count` times with `mi->connect_retry`-msec sleeps in between. This means `try_to_reconnect()` has been counting disconnects of the replication session (since it doesn’t loop) while `safe_reconnect()` was counting actual tries (which may be multiple per disconnect). In practice, this outer counter’s only benefit was to override, with 1, the edge case `--master-retry-count=0`, which the inner loop doesn’t cover.	2025-02-26 20:37:53 -07:00
Sergei Golubchik	ba01c2aaf0	Merge branch '11.4' into 11.7 * rpl.rpl_system_versioning_partitions updated for MDEV-32188 * innodb.row_size_error_log_warnings_3 changed error for MDEV-33658 (checks are done in a different order)	2025-02-06 16:46:36 +01:00
ParadoxV5	697b88bf75	SHOW REPLICA STATUS: mark columns as unsigned Update all integer columns of SHOW REPLICA STATUS (technically INFORMATION_SCHEMA.SLAVE_STATUS) to unsigned because, well, they are (:. Some `uint32` ones were accidentally using the `Field::store(double nr)` overload because they forgot the `, true` for `Field::store(longlong nr, bool unsigned_val)`. The mistake’s harmless, fortunately, as `double` supports over 15 significant decimal digits, well over `uint32`’s 9-and-a-half.	2025-01-31 20:56:41 -07:00
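The claim that the accidental `double` overload was harmless can be checked directly: a C++ `double` has a 53-bit significand, so every `uint32` value converts exactly, while 64-bit values above 2^53 may not. The helper names below are hypothetical; this illustrates the arithmetic only, not `Field::store()` itself.

```cpp
#include <cstdint>
#include <limits>

// A double carries a 53-bit significand, so any 32-bit unsigned value
// is exactly representable.
static_assert(std::numeric_limits<double>::digits >= 32,
              "double must hold any uint32 exactly");

// Round-trips a 32-bit value through double, as the store(double) overload
// would have seen it; always lossless.
bool u32_survives_double_round_trip(std::uint32_t v) {
  double d = static_cast<double>(v);
  return static_cast<std::uint32_t>(d) == v;
}

// For comparison: 64-bit values above 2^53 can be rounded, which is why
// the (longlong, bool unsigned_val) overload matters for wider columns.
bool u64_survives_double_round_trip(std::uint64_t v) {
  double d = static_cast<double>(v);
  if (d >= 18446744073709551616.0)  // 2^64: converting back would be UB
    return false;
  return static_cast<std::uint64_t>(d) == v;
}
```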
Sergei Golubchik	7d657fda64	Merge branch '10.11' into 11.4	2025-01-30 12:01:11 +01:00
Sergei Golubchik	e69f8cae1a	Merge branch '10.6' into 10.11	2025-01-30 11:55:13 +01:00
Kristian Nielsen	72e1cc8f52	MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's IO_CACHE to signal an error back to the caller. When reading the active relay log, this flag is also being used by the IO thread, and setting it can randomly cause the IO thread to wrongly detect IO error on writing and permanently disable the relay log. This was seen sporadically in test case rpl.rpl_from_mysql80. The read error set by the SQL thread in the IO_CACHE would be interpreted as a write error by the IO thread, which would cause it to throw a fatal error and close the relay log. And this would later cause CHANGE MASTER to try to purge a closed relay log, resulting in nullptr crash. SQL thread is not able to parse an event read from the relay log. This can happen like here when replicating unknown events from a MySQL master, potentially also for other reasons. Also fix a mistake in my_b_flush_io_cache() introduced back in 2001 (`fa09f2cd7e`) where my_b_flush_io_cache() could wrongly return an error set in IO_CACHE::error, even if the flush operation itself succeeded. Also fix another sporadic failure in rpl.rpl_from_mysql80 where the output of MASTER_POS_WAIT() depended on timing of SQL and IO thread. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-01-24 09:15:20 +00:00
Marko Mäkelä	33907f9ec6	Merge 11.4 into 11.7	2024-12-02 17:51:17 +02:00
Marko Mäkelä	2719cc4925	Merge 10.11 into 11.4	2024-12-02 11:35:34 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Marko Mäkelä	7d4077cc11	Merge 10.5 into 10.6	2024-11-29 12:37:46 +02:00
Brandon Nesterenko	dbfee9fc2b	MDEV-34348: Consolidate cmp function declarations Partial commit of the greater MDEV-34348 scope. MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict The functions queue_compare, qsort2_cmp, and qsort_cmp2 all had similar interfaces, and were used interchangeably and unsafely cast to one another. This patch consolidates the functions all into the qsort_cmp2 interface. Reviewed By: Marko Mäkelä <marko.makela@mariadb.com>	2024-11-23 08:14:22 -07:00
ParadoxV5	cf2d49ddcf	Extract some of #3360 fixes to 10.5.x That PR uncovered countless issues with `my_snprintf` calls. This commit backports a squashed subset of their fixes.	2024-11-21 22:43:56 +11:00
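A minimal sketch of why a single comparator type removes the need for function-pointer casts. The typedef name `cmp_with_param_fn` and its parameter order (context first, then the two elements) are assumptions for illustration; check the server headers for the actual `qsort_cmp2` declaration. `sort_with_param()` and `cmp_int()` are hypothetical helpers. The sort only ever calls the comparator through its declared type, which is what `-Wcast-function-type-strict` asks for.

```cpp
#include <cstddef>
#include <cstring>

// Assumed shape of a unified comparator: context pointer, then two elements.
typedef int (*cmp_with_param_fn)(void *param, const void *a, const void *b);

// Tiny insertion sort over fixed-size elements; the comparator is always
// called through its declared type, so no function-pointer cast is needed.
void sort_with_param(void *base, std::size_t n, std::size_t elem_size,
                     cmp_with_param_fn cmp, void *param) {
  unsigned char *arr = static_cast<unsigned char *>(base);
  unsigned char tmp[64];
  if (elem_size > sizeof tmp) return;  // sketch limitation
  for (std::size_t i = 1; i < n; i++) {
    std::memcpy(tmp, arr + i * elem_size, elem_size);
    std::size_t j = i;
    while (j > 0 && cmp(param, arr + (j - 1) * elem_size, tmp) > 0) {
      std::memcpy(arr + j * elem_size, arr + (j - 1) * elem_size, elem_size);
      j--;
    }
    std::memcpy(arr + j * elem_size, tmp, elem_size);
  }
}

// Comparator that uses its context: param points to a "descending" flag.
int cmp_int(void *param, const void *a, const void *b) {
  int x, y;
  std::memcpy(&x, a, sizeof x);
  std::memcpy(&y, b, sizeof y);
  int r = (x > y) - (x < y);
  return *static_cast<bool *>(param) ? -r : r;
}
```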
Thirunarayanan Balathandayuthapani	074831ec61	Merge branch 10.5 into 10.6	2024-11-08 18:17:15 +05:30
Oleksandr Byelkin	9e1fb104a3	Merge tag '11.4' into 11.6 MariaDB 11.4.4 release	2024-11-08 07:17:00 +01:00
Denis Protivensky	6d5fe9ed0d	MDEV-28378: Don't hang trying to peek log event past the end of log While applying CTAS log event, we peek the relay log to see if CTAS contains inserted rows or if it's empty. The peek function didn't check for the end-of-file condition when it tried to get the next event from the log, and thus it hung. The fix includes checking for end-of-file while peeking for log events and considering a returned XID_EVENT value as a sign of an empty CTAS. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2024-11-06 04:59:09 +01:00
Brandon Nesterenko	5290fa043b	MDEV-35109 PREP: simulate_delay_semisync_slave_reply use debug_sync This is a preparatory commit for MDEV-35109 to make its testing code cleaner (and harden other tests too). The DEBUG_DBUG point simulate_delay_semisync_slave_reply up to this patch used my_sleep() to delay an ACK response, but sleeps are prone to test failures on machines that are already under heavy load while running tests (e.g. on buildbot). This patch changes this DEBUG_DBUG sleep to use DEBUG_SYNC to coordinate exactly when a slave should send its reply, which is safer and faster. As DEBUG_SYNC can't be used while a server is shutting down, to synchronize threads with SHUTDOWN WAIT FOR SLAVES logic, we use and extend wait_for_pattern_in_file.inc to wait for an informational message in the error log indicating that the shutdown process has reached the intended state (i.e. that the shutdown has been delayed to await semi-sync ACKs). Specifically, the extensions are as follows: 1. wait_for_pattern_in_file.inc is extended with parameter wait_for_pattern_count, a number that indicates how many times a pattern should occur in the file before returning control to the calling script. 2. search_for_pattern_in_file.inc is extended with parameter SEARCH_ABORT_IS_SUCCESS to invert the error/success logic, so the SEARCH_ABORT condition can be used to indicate success rather than error.	2024-11-04 10:45:58 -07:00
Oleksandr Byelkin	c770bce898	Merge branch '11.2' into 11.4	2024-10-30 15:11:17 +01:00
Oleksandr Byelkin	69d033d165	Merge branch '10.11' into 11.2	2024-10-29 16:42:46 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Oleksandr Byelkin	f00711bba2	Merge branch '10.5' into 10.6	2024-10-29 14:20:03 +01:00
Monty	bddbef3573	MDEV-34533 asan error about stack overflow when writing record in Aria The problem was that when using clang + asan, we do not get a correct value for the thread stack, as some local variables are not allocated on the normal stack. It looks like, for example, clang 18.1.3, when compiling with -O2 -fsanitize=address, puts local variables and things allocated by alloca() in areas other than the stack. The following gdb session shows the issue: Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection (connect=0x5080000027b8, put_in_cache=<optimized out>) at sql/sql_connect.cc:1399 THD thd; 1399 thd->thread_stack= (char*) &thd; (gdb) p &thd (THD *) 0x7fffedee7060 (gdb) p $sp (void *) 0x7fffef4e7bc0 The address of thd is about 22 MiB (0x1600b60 bytes) away from the stack pointer. (gdb) info reg ... rsp 0x7fffef4e7bc0 0x7fffef4e7bc0 ... r13 0x7fffedee7060 140737185214560 r13 is pointing to the address of thd, probably some kind of "local stack" used by the sanitizer. I have verified this with gdb on a recursive call that calls alloca() in a loop. In this case all objects were stored in a local heap, not on the stack. To solve this issue in a portable way, I have added two functions: my_get_stack_pointer() returns the address of the current stack pointer. The code uses asm instructions for Intel 32/64 bit, PowerPC, ARM 32/64 bit and SPARC 32/64 bit. Supported compilers are gcc, clang and MSVC. For MSVC 64 bit we use _AddressOfReturnAddress(). As a fallback for other compilers/architectures we use the address of a local variable. my_get_stack_bounds() returns the address of the stack base and the stack size, using pthread_attr_getstack() or NtCurrentTeb(), with a fallback to the address of a local variable and a user-provided stack size. Server changes are: - Moving setting of thread_stack to THD::store_globals() using my_get_stack_bounds(). - Removing setting of thd->thread_stack, except in functions that allocate a lot on the stack before calling store_globals(). When using estimates for the stack start, we reduce stack_size by MY_STACK_SAFE_MARGIN (8192) to take into account the stack used before calling store_globals(). I also added a unit test, stack_allocation-t, to verify the new code. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2024-10-16 17:24:46 +03:00
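On Linux with glibc, the stack bounds that `my_get_stack_bounds()` obtains through `pthread_attr_getstack()` can be queried as in the following sketch. It assumes the GNU extension `pthread_getattr_np()` and is not the server's implementation, which also covers Windows and other fallbacks. `StackBounds`, `get_current_stack_bounds()`, and `approx_stack_pointer()` are hypothetical; the last one shows the local-variable fallback the commit mentions, and under ASan that address can land outside the real stack, which is the failure mode described in the commit.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: lowest usable address and size in bytes.
struct StackBounds {
  std::uintptr_t low = 0;
  std::size_t size = 0;
};

// Linux/glibc-only (assumption): pthread_getattr_np() is a GNU extension.
bool get_current_stack_bounds(StackBounds *out) {
  pthread_attr_t attr;
  if (pthread_getattr_np(pthread_self(), &attr) != 0)
    return false;
  void *addr = nullptr;
  std::size_t size = 0;
  int rc = pthread_attr_getstack(&attr, &addr, &size);  // addr = lowest byte
  pthread_attr_destroy(&attr);
  if (rc != 0)
    return false;
  out->low = reinterpret_cast<std::uintptr_t>(addr);
  out->size = size;
  return true;
}

// Fallback estimate: the address of a local variable. Without a sanitizer
// fake stack this lies inside the real stack; with one, it may not.
std::uintptr_t approx_stack_pointer() {
  volatile char marker = 0;
  return reinterpret_cast<std::uintptr_t>(&marker);
}
```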
Marko Mäkelä	43465352b9	Merge 11.4 into 11.6	2024-10-03 16:09:56 +03:00
Marko Mäkelä	b53b81e937	Merge 11.2 into 11.4	2024-10-03 14:32:14 +03:00
Marko Mäkelä	12a91b57e2	Merge 10.11 into 11.2	2024-10-03 13:24:43 +03:00
Marko Mäkelä	63913ce5af	Merge 10.6 into 10.11	2024-10-03 10:55:08 +03:00
Marko Mäkelä	7e0afb1c73	Merge 10.5 into 10.6	2024-10-03 09:31:39 +03:00
Brandon Nesterenko	68938d2b42	MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail The failing test case validates Seconds_Behind_Master for a delayed slave, while STOP SLAVE is executed during a delay. The test fixes initially added to the test (commit `b04c857596`) added a table lock to ensure a transaction could not finish before validating the Seconds_Behind_Master field after START SLAVE, but did not address the possibility that the transaction could finish before running the STOP SLAVE command, which invalidates the validations for the rest of the test case. Specifically, this would result in 1) a timeout in “Waiting for table metadata lock” on the replica, which expects the transaction to retry after slave restart and hit a lock conflict on the locked tables (added in `b04c857596`), and 2) that Seconds_Behind_Master should have increased, but did not. The failure can be reproduced by synchronizing the slave to the master before the MDEV-32265 echo statement (i.e. before the STOP SLAVE). This patch fixes the test by adding a mechanism to use DEBUG_SYNC to synchronize a MASTER_DELAY, rather than continually increasing the duration of the delay each time the test fails on buildbot. This is to ensure that on slow machines, a delay does not pass before the test gets a chance to validate results. Additionally, it decreases overall test time because the test can continue immediately after validation, thereby bypassing the remainder of a full delay for each transaction.	2024-09-17 06:29:20 -06:00
Oleksandr Byelkin	d6444022ca	Merge branch 'bb-11.5-release' into bb-11.6-release	2024-08-06 17:28:38 +02:00
Oleksandr Byelkin	ea75a0b600	Merge branch '11.4' into 11.5	2024-08-05 17:50:18 +02:00


2867 Commits