mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-16 20:23:18 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	9e1fb104a3	Merge tag '11.4' into 11.6 MariaDB 11.4.4 release	2024-11-08 07:17:00 +01:00
Oleksandr Byelkin	3d0fb15028	Merge branch '10.6' into 10.11	2024-10-29 15:24:38 +01:00
Oleksandr Byelkin	f00711bba2	Merge branch '10.5' into 10.6	2024-10-29 14:20:03 +01:00
Monty	bddbef3573	MDEV-34533 asan error about stack overflow when writing record in Aria The problem was that when using clang + asan, we do not get a correct value for the thread stack as some local variables are not allocated at the normal stack. It looks like that for example clang 18.1.3, when compiling with -O2 -fsanitize=addressan it puts local variables and things allocated by alloca() in other areas than on the stack. The following code shows the issue Thread 6 "mariadbd" hit Breakpoint 3, do_handle_one_connection (connect=0x5080000027b8, put_in_cache=<optimized out>) at sql/sql_connect.cc:1399 THD thd; 1399 thd->thread_stack= (char) &thd; (gdb) p &thd (THD *) 0x7fffedee7060 (gdb) p $sp (void ) 0x7fffef4e7bc0 The address of thd is 24M away from the stack pointer (gdb) info reg ... rsp 0x7fffef4e7bc0 0x7fffef4e7bc0 ... r13 0x7fffedee7060 140737185214560 r13 is pointing to the address of the thd. Probably some kind of "local stack" used by the sanitizer I have verified this with gdb on a recursive call that calls alloca() in a loop. In this case all objects was stored in a local heap, not on the stack. To solve this issue in a portable way, I have added two functions: my_get_stack_pointer() returns the address of the current stack pointer. The code is using asm instructions for intel 32/64 bit, powerpc, arm 32/64 bit and sparc 32/64 bit. Supported compilers are gcc, clang and MSVC. For MSVC 64 bit we are using _AddressOfReturnAddress() As a fallback for other compilers/arch we use the address of a local variable. my_get_stack_bounds() that will return the address of the base stack and stack size using pthread_attr_getstack() or NtCurrentTed() with fallback to using the address of a local variable and user provided stack size. Server changes are: - Moving setting of thread_stack to THD::store_globals() using my_get_stack_bounds(). - Removing setting of thd->thread_stack, except in functions that allocates a lot on the stack before calling store_globals(). When using estimates for stack start, we reduce stack_size with MY_STACK_SAFE_MARGIN (8192) to take into account the stack used before calling store_globals(). I also added a unittest, stack_allocation-t, to verify the new code. Reviewed-by: Sergei Golubchik <serg@mariadb.org>	2024-10-16 17:24:46 +03:00
Marko Mäkelä	a5b80531fb	Merge 11.4 into 11.6	2024-09-04 10:38:25 +03:00
Marko Mäkelä	cfcf27c6fe	Merge 10.6 into 10.11	2024-08-29 07:47:29 +03:00
Marko Mäkelä	48becffd07	Merge 10.5 into 10.6	2024-08-27 08:52:10 +03:00
Kristian Nielsen	b4c2e23954	MDEV-34696: do_gco_wait() completes too early on InnoDB dict stats updates Before doing mark_start_commit(), check that there is no pending deadlock kill. If there is a pending kill, we won't commit (we will abort, roll back, and retry). Then we should not mark the commit as started, since that could potentially make the following GCO start too early, before we completed the commit after the retry. This condition could trigger in some corner cases, where InnoDB would take temporarily table/row locks that are released again immediately, not held until the transaction commits. This happens with dict_stats updates and possibly auto-increment locks. Such locks can be passed to thd_rpl_deadlock_check() and cause a deadlock kill to be scheduled in the background. But since the blocking locks are held only temporarily, they can be released before the background kill happens. This way, the kill can be delayed until after mark_start_commit() has been called. Thus we need to check the synchronous indication rgi->killed_for_retry, not just the asynchroneous thd->killed. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-08-26 14:39:24 +02:00
Monty	25b5c63905	MDEV-33856: Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps This commit adds 3 new status variables to 'show all slaves status': - Master_last_event_time ; timestamp of the last event read from the master by the IO thread. - Slave_last_event_time ; Master timestamp of the last event committed on the slave. - Master_Slave_time_diff: The difference of the above two timestamps. All the above variables are NULL until the slave has started and the slave has read one query event from the master that changes data. - Added information_schema.slave_status, which allows us to remove: - show_master_info(), show_master_info_get_fields(), send_show_master_info_data(), show_all_master_info() - class Sql_cmd_show_slave_status. - Protocol::store(I_List<i_string_pair>* str_list) as it is not used anymore. - Changed old SHOW SLAVE STATUS and SHOW ALL SLAVES STATUS to use the SELECT code path, as all other SHOW ... STATUS commands. Other things: - Xid_log_time is set to time of commit to allow slave that reads the binary log to calculate Master_last_event_time and Slave_last_event_time. This is needed as there is not 'exec_time' for row events. - Fixed that Load_log_event calculates exec_time identically to Query_event. - Updated RESET SLAVE to reset Master/Slave_last_event_time - Updated SQL thread's update on first transaction read-in to only update Slave_last_event_time on group events. - Fixed possible (unlikely) bugs in sql_show.cc ...old_format() functions if allocation of 'field' would fail. Reviewed By: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Kristian Nielsen <knielsen@knielsen-hq.org>	2024-07-25 08:57:27 -06:00
Vladislav Vaintroub	186a1afe63	MDEV-32537 due to Linux, restrict thread name to 15 characters, also in PS. Rename some threads to workaround this restrictions, e.g "rpl_parallel_thread"->"rpl_parallel", "slave_background" -> "slave_bg" etc.	2024-07-09 13:20:49 +02:00
Vladislav Vaintroub	5bd0516488	MDEV-32537 Name threads to improve debugging experience and diagnostics. Use SetThreadDescription/pthread_setname_np to give threads a name.	2024-07-09 13:17:20 +02:00
Vladislav Vaintroub	584fc85e21	MDEV-32537 Name threads to improve debugging experience and diagnostics. Use SetThreadDescription/pthread_setname_np to give threads a name.	2024-07-09 13:17:20 +02:00
Alexander Barkov	8f4ec79d09	Merge remote-tracking branch 'origin/11.4' into 11.5	2024-07-08 12:25:04 +04:00
Marko Mäkelä	22ba7e4ff8	Merge 10.6 into 10.11	2024-05-30 16:04:00 +03:00
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
Monty	2464ee758a	MDEV-33655 Remove alter_algorithm Remove alter_algorithm but keep the variable as no-op (with a warning). The reasons for removing alter_algorithm are: - alter_algorithm was introduced as a replacement for the old_alter_table that was used to force the usage of the original alter table algorithm (copy) in the cases where the new alter algorithm did not work. The new option was added as a way to force the usage of a specific algorithm when it should instead have made it possible to disable algorithms that would not work for some reason. - alter_algorithm introduced some cases where ALTER TABLE would not work without specifying the ALGORITHM=XXX option together with ALTER TABLE. - Having different values of alter_algorithm on master and slave could cause slave to stop unexpectedly. - ALTER TABLE FORCE, as used by mariadb-upgrade, would not always work if alter_algorithm was set for the server. - As part of the MDEV-33449 "improving repair of tables" it become clear that alter- algorithm made it harder to provide a better and more consistent ALTER TABLE FORCE and REPAIR TABLE and it would be better to remove it.	2024-05-27 12:39:03 +02:00
Monty	dfdedd46e4	MDEV-32188 make TIMESTAMP use whole 32-bit unsigned range This patch extends the timestamp from 2038-01-19 03:14:07.999999 to 2106-02-07 06:28:15.999999 for 64 bit hardware and OS where 'long' is 64 bits. This is true for 64 bit Linux but not for Windows. This is done by treating the 32 bit stored int as unsigned instead of signed. This is safe as MariaDB has never accepted dates before the epoch (1970). The benefit of this approach that for normal timestamp the storage is compatible with earlier version. However for tables using system versioning we before stored a timestamp with the year 2038 as the 'max timestamp', which is used to detect current values. This patch stores the new 2106 year max value as the max timestamp. This means that old tables using system versioning needs to be updated with mariadb-upgrade when moving them to 11.4. That will be done in a separate commit.	2024-05-27 12:39:02 +02:00
Robin Newhouse	dc38d8ea80	Minimize unsafe C functions with safe_strcpy() Similar to #2480. `567b681` introduced safe_strcpy() to minimize the use of C with potentially unsafe memory overflow with strcpy() whose use is discouraged. Replace instances of strcpy() with safe_strcpy() where possible, limited here to files in the `sql/` directory. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2024-05-17 13:33:16 +01:00
Kristian Nielsen	383ee364dc	Merge 10.6 to 10.11	2024-05-07 08:45:31 +02:00
Kristian Nielsen	596921dab8	MDEV-34042: Deadlock kill of XA PREPARE can break replication / rpl.rpl_parallel_multi_domain_xa sporadic failure Clear any pending deadlock kill after completing XA PREPARE, and before updating the mysql.gtid_slave_pos table in a separate transaction. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-05-02 21:07:51 +02:00
Sergei Golubchik	018d537ec1	Merge branch '10.6' into 10.11	2024-04-22 15:23:10 +02:00
Marko Mäkelä	829cb1a49c	Merge 10.5 into 10.6	2024-04-17 14:14:58 +03:00
Kristian Nielsen	16aa4b5f59	Merge from 10.4 to 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-15 17:46:49 +02:00
Kristian Nielsen	d90a2b44ad	MDEV-33668: More precise dependency tracking of XA XID in parallel replication Keep track of each recently active XID, recording which worker it was queued on. If an XID might still be active, choose the same worker to queue event groups that refer to the same XID to avoid conflicts. Otherwise, schedule the XID freely in the next round-robin slot. This way, XA PREPARE can normally be scheduled without restrictions (unless duplicate XID transactions come close together). This improves scheduling and parallelism over the old method, where the worker thread to schedule XA PREPARE on was fixed based on a hash value of the XID. XA COMMIT will normally be scheduled on the same worker as XA PREPARE, but can be a different one if the XA PREPARE is far back in the event history. Testcase and code for trimming dynamic array due to Andrei. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-09 11:42:34 +03:00
Kristian Nielsen	f9ecaa87ce	MDEV-33668: Refactor parallel replication round-robin scheduling to use explicit FIFO This is a preparatory patch to facilitate the next commit to improve the scheduling of XA transactions in parallel replication. When choosing the scheduling bucket for the next event group in rpl_parallel_entry::choose_thread(), use an explicit FIFO for the round-robin selection instead of a simple cyclic counter i := (i+1) % N. This allows to schedule XA COMMIT/ROLLBACK dependencies explicitly without changing the round-robin scheduling of other event groups. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-09 11:42:34 +03:00
Marko Mäkelä	788953463d	Merge 10.6 into 10.11 Some fixes related to commit `f838b2d799` and Rows_log_event::do_apply_event() and Update_rows_log_event::do_exec_row() for system-versioned tables were provided by Nikita Malyavin. This was required by test versioning.rpl,trx_id,row.	2024-03-28 09:16:57 +02:00
Kristian Nielsen	0a6f46965a	MDEV-33475: --gtid-ignore-duplicate can double-apply event in case of parallel replication retry When rolling back and retrying a transaction in parallel replication, don't release the domain ownership (for --gtid-ignore-duplicates) as part of the rollback. Otherwise another master connection could grab the ownership and double-apply the transaction in parallel with the retry. Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-03-13 16:59:10 +01:00
Monty	9a132d423a	MDEV-33620 Improve times and states in show processlist for replication This will makes it easier to find out what replication workers are doing and what they are waiting for. Things changed in processlist: - Slave_SQL time was not consistent. Now time for state "Slave has read all relay log; waiting for more updates" shows how long it has waited for getting the next event. - Slave_worker threads did often show "Closing tables" for a long time. Now the state is reverted to the previous state after "Closing tables" is done. - Commit and Rollback states where not shown for replication (and some other threads). Now Commit and Rollback states are always shown and the state is reverted to previous state when the Commit/Rollback have finished. Code changes: - Added thd->set_time_for_next_stage() for parallel replication when when starting to wait for prior transactions to commit, group commit, and FTWRL and for free space in thread pool. Before we reset the time only after the above events. - Moved THD_STAGE_INFO(stage_rollback) and THD_STAGE_INFO(stage_commit) from sql_parse.cc to transaction.cc to ensure this is done for all commits and not only 'normal connection queries'. Test case changes: - close_thread_tables() reverting stage to previous stage caused the counter in performance_schema to be increased. In many case it is the 'sql/starting' stage that was effected. - We only change to "Commit" stage if there is a need for a commit. This caused some "Commit" stages to disapper from perfschema reports. TODO in 11.#: - Slave_IO always showes "Waiting for master to send event" and the time is from SLAVE START. We should in 11.# change this to be the time since reading the last event.	2024-03-08 15:23:17 +02:00
Marko Mäkelä	2b99e5f7ef	Merge 10.6 into 10.11	2023-12-20 15:58:36 +02:00
Marko Mäkelä	2b01e5103d	Merge 10.5 into 10.6	2023-12-19 18:41:42 +02:00
Marko Mäkelä	12995559f9	Merge 10.4 into 10.5	2023-12-19 18:30:58 +02:00
Kristian Nielsen	eaa4968fc5	MDEV-10653: Fix segfault in SHOW MASTER STATUS with NULL inuse_relaylog The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle() function to use Relay_log_info::last_inuse_relaylog to check for idle workers. But the code was missing a NULL check. Also, there was one place during SQL slave thread start which was missing mutex synchronisation when updating inuse_relaylog. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-19 12:08:54 +01:00
Kristian Nielsen	1cbba45e6e	Attempt to fix rare race in test for MDEV-8031 The error-injection inject_mdev8031 simulates a deadlock kill in a specific place, by setting killed_for_retry to RETRY_KILL_KILLED directly. If a real deadlock kill triggers at the same time, it is possible for the thread to complete its transaction retry and set rgi_slave to NULL before the real readlock kill can complete in the background. This will cause a segfault due to null-pointer access. Fix by changing the error injection to do a real background deadlock kill, which ensures that the thread will wait for any pending background kills to complete. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-12-19 12:08:53 +01:00
Marko Mäkelä	4ae105a37d	Merge 10.4 into 10.5	2023-12-18 08:59:07 +02:00
Brandon Nesterenko	8dad51481b	MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include A replication parallel worker thread can deadlock with another connection running SHOW SLAVE STATUS. That is, if the replication worker thread is in do_gco_wait() and is killed, it will already hold the LOCK_parallel_entry, and during error reporting, try to grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in reverse order. It will initially grab the err_lock, and then try to grab LOCK_parallel_entry. This leads to a deadlock when both threads have grabbed their first lock without the second. This patch implements the MDEV-31894 proposed fix to optimize the workers_idle() check to compare the last in-use relay log’s queued_count==dequeued_count for idleness. This removes the need for workers_idle() to grab LOCK_parallel_entry, as these values are atomically updated. Huge thanks to Kristian Nielsen for diagnosing the problem! Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2023-12-11 07:45:23 -07:00
Marko Mäkelä	d5e15424d8	Merge 10.6 into 10.10 The MDEV-29693 conflict resolution is from Monty, as well as is a bug fix where ANALYZE TABLE wrongly built histograms for single-column PRIMARY KEY. Also includes a fix for safe_malloc error reporting. Other things: - Copied main.log_slow from 10.4 to avoid mtr issue Disabled test: - spider/bugfix.mdev_27239 because we started to get +Error 1429 Unable to connect to foreign data source: localhost -Error 1158 Got an error reading communication packets - main.delayed - Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED This part is disabled for now as it fails randomly with different warnings/errors (no corruption).	2023-10-14 13:36:11 +03:00
Marko Mäkelä	0f9acce3f2	Merge 10.5 into 10.6	2023-09-14 09:01:15 +03:00
sjaakola	a3cbc44b24	MDEV-31833 replication breaks when using optimistic replication and replica is a galera node MariaDB async replication SQL thread was stopped for any failure in applying of replication events and error message logged for the failure was: "Node has dropped from cluster". The assumption was that event applying failure is always due to node dropping out. With optimistic parallel replication, event applying can fail for natural reasons and applying should be retried to handle the failure. This retry logic was never exercised because the slave SQL thread was stopped with first applying failure. To support optimistic parallel replication retrying logic this commit will now skip replication slave abort, if node remains in cluster (wsrep_ready==ON) and replication is configured for optimistic or aggressive retry logic. During the development of this fix, galera.galera_as_slave_nonprim test showed some problems. The test was analyzed, and it appears to need some attention. One excessive sleep command was removed in this commit, but it will need more fixes still to be fully deterministic. After this commit galera_as_slave_nonprim is successful, though. Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>	2023-09-12 02:37:30 +02:00
Marko Mäkelä	0dd25f28f7	Merge 10.5 into 10.6	2023-09-11 14:46:39 +03:00
Marko Mäkelä	f8f7d9de2c	Merge 10.4 into 10.5	2023-09-11 11:29:31 +03:00
Kristian Nielsen	e937a64d46	MDEV-10356: rpl.rpl_parallel_temptable failure due to incorrect commit optimization of temptables The problem was that parallel replication of temporary tables using statement-based binlogging could overlap the COMMIT in one thread with a DML or DROP TEMPORARY TABLE in another thread using the same temporary table. Temporary tables are not safe for concurrent access, so this caused reference to freed memory and possibly other nastiness. The fix is to disable the optimisation with overlapping commits of one transaction with the start of a later transaction, when temporary tables are in use. Then the following event groups will be blocked from starting until the one using temporary tables is completed. This also fixes occasional test failures of rpl.rpl_parallel_temptable seen in Buildbot. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-09-07 14:40:05 +02:00
Marko Mäkelä	448c2077fb	Merge 10.5 into 10.6	2023-08-21 15:50:31 +03:00
Marko Mäkelä	5895a3622b	Merge 10.4 into 10.5	2023-08-17 10:33:36 +03:00
Marko Mäkelä	9cd2989589	Merge 10.6 into 10.10	2023-08-16 15:28:42 +03:00
Kristian Nielsen	34e8585437	MDEV-29974: Missed kill waiting for worker queues to drain When the SQL driver thread goes to wait for room in the parallel slave worker queue, there was a race where a kill at the right moment could be ignored and the wait proceed uninterrupted by the kill. Fix by moving the THD::check_killed() to occur _after_ doing ENTER_COND(). This bug was seen as sporadic failure of the testcase rpl.rpl_parallel (rpl.rpl_parallel_gco_wait_kill since 10.5), with "Slave stopped with wrong error code". Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-16 14:07:06 +02:00
Kristian Nielsen	7c9837ce74	Merge 10.4 into 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 18:02:18 +02:00
Kristian Nielsen	18acbaf416	MDEV-31655: Parallel replication deadlock victim preference code errorneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Reviewed-by: Marko Mäkelä <marko.makela@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:39:49 +02:00
Kristian Nielsen	900c4d6920	MDEV-31655: Parallel replication deadlock victim preference code errorneously removed Restore code to make InnoDB choose the second transaction as a deadlock victim if two transactions deadlock that need to commit in-order for parallel replication. This code was erroneously removed when VATS was implemented in InnoDB. Also add a test case for InnoDB choosing the right deadlock victim. Also fixes this bug, with testcase that reliably reproduces: MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master Note: This should be null-merged to 10.6, as a different fix is needed there due to InnoDB locking code changes. Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-08-15 16:35:30 +02:00
Oleksandr Byelkin	34a8e78581	Merge branch '10.6' into 10.9	2023-08-04 08:01:06 +02:00
Oleksandr Byelkin	6bf8483cac	Merge branch '10.5' into 10.6	2023-08-01 15:08:52 +02:00

1 2 3 4 5 ...

303 Commits