mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-12 20:49:12 +03:00

Author	SHA1	Message	Date
Kristian Nielsen	72e1cc8f52	MDEV-35806: Error in read_log_event() corrupts relay log writer, crashes server In Log_event::read_log_event(), don't use IO_CACHE::error of the relay log's IO_CACHE to signal an error back to the caller. When reading the active relay log, this flag is also being used by the IO thread, and setting it can randomly cause the IO thread to wrongly detect IO error on writing and permanently disable the relay log. This was seen sporadically in test case rpl.rpl_from_mysql80. The read error set by the SQL thread in the IO_CACHE would be interpreted as a write error by the IO thread, which would cause it to throw a fatal error and close the relay log. And this would later cause CHANGE MASTER to try to purge a closed relay log, resulting in nullptr crash. SQL thread is not able to parse an event read from the relay log. This can happen like here when replicating unknown events from a MySQL master, potentially also for other reasons. Also fix a mistake in my_b_flush_io_cache() introduced back in 2001 (`fa09f2cd7e`) where my_b_flush_io_cache() could wrongly return an error set in IO_CACHE::error, even if the flush operation itself succeeded. Also fix another sporadic failure in rpl.rpl_from_mysql80 where the outout of MASTER_POS_WAIT() depended on timing of SQL and IO thread. Reviewed-by: Monty <monty@mariadb.org> Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2025-01-24 09:15:20 +00:00
Sergei Golubchik	addc828363	Merge branch '10.5' into 10.6	2025-01-09 10:15:53 +01:00
Sergei Golubchik	82fd202fa4	fix "enforce no trailing \n in Diagnostic_area messages" cannot have an assert in Warning_info::push_warning() because SQL command SIGNAL can set an absolutely arbitrary message, even an empty one or ending with '\n' move the assert into push_warning() and my_message_sql(). followup for `9508a44c37`	2025-01-09 09:28:28 +01:00
Marko Mäkelä	b251cb6a4f	Merge 10.5 into 10.6	2025-01-08 08:48:21 +02:00
Sergei Golubchik	9508a44c37	enforce no trailing \n in Diagnostic_area messages that is in my_error(), push_warning(), etc	2025-01-07 16:31:39 +01:00
Monty	87ee1e75bc	MDEV-35643 Add support for MySQL 8.0 binlog events MDEV-29533 Crash when MariaDB is replica of MySQL 8.0 MySQL 8.0 has added the following new events in the MySQL binary log PARTIAL_UPDATE_ROWS_EVENT TRANSACTION_PAYLOAD_EVENT HEARTBEAT_LOG_EVENT_V2 - PARTIAL_UPDATE_ROWS_EVENT is used by MySQL to generate update statements using JSON_SET, JSON_REPLACE and JSON_REMOVE to make update of JSON columns more efficient. These events can be disabled by setting 'binlog-row-value-options=""' - TRANSACTION_PAYLOAD_EVENT is used by MySQL to signal that a row event is compressed. It an be disably by setting 'binlog_transaction_compression=0'. - HEARTBEAT_LOG_EVENT_V2 is written to the binary log many times per seconds. It can be ignored by the server. What this patch does: - If PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found, the server will stop with an error message of how to disable the MySQL server to generate such events. - HEARTBEAT_LOG_EVENT_V2 events are ignored. - mariadb-binlog will write the name of the new events. - mariadb-binlog will stop if PARTIAL_UPDATE_ROWS_EVENT or TRANSACTION_PAYLOAD_EVENT is found, unless --force is given. - Fixes a crash in mariadb-binlog if a character set unknown to MariaDB is found. (MDEV-29533) From Kristian Nielsen: - Add test case for MySQL 8.0 to MariaDB replication and fixed a a small typo in post_header_len initialization. Reviewer: knielsen@mariadb.org	2025-01-05 16:40:11 +02:00
Marko Mäkelä	5ba542e9ee	Merge 10.5 into 10.6	2024-05-30 14:27:07 +03:00
Robin Newhouse	dc38d8ea80	Minimize unsafe C functions with safe_strcpy() Similar to #2480. `567b681` introduced safe_strcpy() to minimize the use of C with potentially unsafe memory overflow with strcpy() whose use is discouraged. Replace instances of strcpy() with safe_strcpy() where possible, limited here to files in the `sql/` directory. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2024-05-17 13:33:16 +01:00
Sergei Golubchik	e83d92ee5e	sporadic failures of rpl.rpl_semi_sync_fail_over in the $case=2 - it's wrong to kill after the first binlog EOF, because that might happen between INSERT(4) and INSERT(5). So, wait for the slave to acknowledge INSERT(5) before killing the master, that is, both connection threads must pass repl_semisync_master.wait_after_sync()	2024-04-21 22:54:52 +02:00
Marko Mäkelä	829cb1a49c	Merge 10.5 into 10.6	2024-04-17 14:14:58 +03:00
Kristian Nielsen	16aa4b5f59	Merge from 10.4 to 10.5 Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2024-04-15 17:46:49 +02:00
Marko Mäkelä	59e7289b6c	Fix g++-14 -Wmaybe-uninitialized	2024-03-19 08:10:42 +02:00
Michael Widenius	7af50e4df4	MDEV-32551: "Read semi-sync reply magic number error" warnings on master rpl_semi_sync_slave_enabled_consistent.test and the first part of the commit message comes from Brandon Nesterenko. A test to show how to induce the "Read semi-sync reply magic number error" message on a primary. In short, if semi-sync is turned on during the hand-shake process between a primary and replica, but later a user negates the rpl_semi_sync_slave_enabled variable while the replica's IO thread is running; if the io thread exits, the replica can skip a necessary call to kill_connection() in repl_semisync_slave.slave_stop() due to its reliance on a global variable. Then, the replica will send a COM_QUIT packet to the primary on an active semi-sync connection, causing the magic number error. The test in this patch exits the IO thread by forcing an error; though note a call to STOP SLAVE could also do this, but it ends up needing more synchronization. That is, the STOP SLAVE command also tries to kill the VIO of the replica, which makes a race with the IO thread to try and send the COM_QUIT before this happens (which would need more debug_sync to get around). See THD::awake_no_mutex for details as to the killing of the replica’s vio. Notes: - The MariaDB documentation does not make it clear that when one enables semi-sync replication it does not matter if one enables it first in the master or slave. Any order works. Changes done: - The rpl_semi_sync_slave_enabled variable is now a default value for when semisync is started. The variable does not anymore affect semisync if it is already running. This fixes the original reported bug. Internally we now use repl_semisync_slave.get_slave_enabled() instead of rpl_semi_sync_slave_enabled. To check if semisync is active on should check the @@rpl_semi_sync_slave_status variable (as before). - The semisync protocol conflicts in the way that the original MySQL/MariaDB client-server protocol was designed (client-server send and reply packets are strictly ordered and includes a packet number to allow one to check if a packet is lost). When using semi-sync the master and slave can send packets at 'any time', so packet numbering does not work. The 'solution' has been that each communication starts with packet number 1, but in some cases there is still a chance that the packet number check can fail. Fixed by adding a flag (pkt_nr_can_be_reset) in the NET struct that one can use to signal that packet number checking should not be done. This is flag is set when semi-sync is used. - Added Master_info::semi_sync_reply_enabled to allow one to configure some slaves with semisync and other other slaves without semisync. Removed global variable semi_sync_need_reply that would not work with multi-master. - Repl_semi_sync_master::report_reply_packet() can now recognize the COM_QUIT packet from semisync slave and not give a "Read semi-sync reply magic number error" error for this case. The slave will be removed from the Ack listener. - On Windows, don't stop semisync Ack listener just because one slave connection is using socket_id > FD_SETSIZE. - Removed busy loop in Ack_receiver::run() by using "Self-pipe trick" to signal new slave and stop Ack_receiver. - Changed some Repl_semi_sync_slave functions that always returns 0 from int to void. - Added Repl_semi_sync_slave::slave_reconnect(). - Removed dummy_function Repl_semi_sync_slave::reset_slave(). - Removed some duplicate semisync notes from the error log. - Add test of "if (get_slave_enabled() && semi_sync_need_reply)" before calling Repl_semi_sync_slave::slave_reply(). (Speeds up the code as we can skip all initializations). - If epl_semisync_slave.slave_reply() fails, we disable semisync for that connection. - We do not call semisync.switch_off() if there are no active slaves. Instead we check in Repl_semi_sync_master::commit_trx() if there are no active threads. This simplices the code. - Changed assert() to DBUG_ASSERT() to ensure that the DBUG log is flushed in case of asserts. - Removed the internal rpl_semi_sync_slave_status as it is not needed anymore. The @@rpl_semi_sync_slave_status status variable is now mapped to rpl_semi_sync_enabled. - Removed rpl_semi_sync_slave_enabled as it is not needed anymore. Repl_semi_sync_slave::get_slave_enabled() contains the active status. - Added checking that we do not add a slave twice with Ack_receiver::add_slave(). This could happen with old code. - Removed Repl_semi_sync_master::check_and_switch() as it is not needed anymore. - Ensure that when we call Ack_receiver::remove_slave() that the slave is removed from the listener before function returns. - Call listener.listen_on_sockets() outside of mutex for better performance and less contested mutex. - Ensure that listening is ignoring newly added slaves when checking for responses. - Fixed the master ack_receiver listener is not killed if there are no connected slaves (and thus stop semisync handling of future connections). This could happen if all slaves sockets where would be marked as unreliable. - Added unlink() to base_ilist_iterator and remove() to I_List_iterator. This enables us to remove 'dead' slaves in Ack_recever::run(). - kill_zombie_dump_threads() now does killing of dump threads properly. - It can now kill several threads (should be impossible but could happen if IO slaves reconnects very fast). - We now wait until the dump thread is done before starting the dump. - Added an error if kill_zombie_dump_threads() fails. - Set thd->variables.server_id before calling kill_zombie_dump_threads(). This simplies the code. - Added a lot of comments both in code and tests. - Removed DBUG_EVALUATE_IF "failed_slave_start" as it is not used. Test changes: - rpl.rpl_session_var2 added which runs rpl.rpl_session_var test with semisync enabled. - Some timings changed slight with startup of slave which caused rpl_binlog_dump_slave_gtid_state_info.text to fail as it checked the error log file before the slave had started properly. Fixed by adding wait_for_pattern_in_file.inc that allows waiting for the pattern to appear in the log file. - Tests have been updated so that we first set rpl_semi_sync_master_enabled on the master and then set rpl_semi_sync_slave_enabled on the slaves (this is according to how the MariaDB documentation document how to setup semi-sync). - Error text "Master server does not have semi-sync enabled" has been replaced with "Master server does not support semi-sync" for the case when the master supports semi-sync but semi-sync is not enabled. Other things: - Some trivial cleanups in Repl_semi_sync_master::update_sync_header(). - We should in 11.3 changed the default value for rpl-semi-sync-master-wait-no-slave from TRUE to FALSE as the TRUE does not make much sense as default. The main difference with using FALSE is that we do not wait for semisync Ack if there are no slave threads. In the case of TRUE we wait once, which did not bring any notable benefits except slower startup of master configured for using semisync. Co-author: Brandon Nesterenko <brandon.nesterenko@mariadb.com> This solves the problem reported in MDEV-32960 where a new slave may not be registered in time and the master disables semi sync because of that.	2024-01-23 13:03:11 +02:00
Marko Mäkelä	3a96eba25f	Merge 10.5 into 10.6	2024-01-17 13:35:05 +02:00
Alexander Barkov	fa3171df08	MDEV-27666 User variable not parsed as geometry variable in geometry function Adding GEOMETRY type user variables.	2024-01-16 18:53:23 +04:00
Sergei Golubchik	e95bba9c58	Merge branch '10.5' into 10.6	2023-12-17 11:20:43 +01:00
Sergei Golubchik	98a39b0c91	Merge branch '10.4' into 10.5	2023-12-02 01:02:50 +01:00
Brandon Nesterenko	c42aadc388	MDEV-32628: Cryptic ERROR message & inconsistent behavior on incorrect SHOW BINLOG EVENTS FROM ... Calling SHOW BINLOG EVENTS FROM <offset> with an invalid offset writes error messages into the server log about invalid reads. The read errors that occur from this command should only be relayed back to the user though, and not written into the server log. This is because they are read-only and have no impact on server operation, and the client only need be informed to correct the parameter. This patch fixes this by omitting binary log read errors from the server when the invocation happens from SHOW BINLOG EVENTS. Additionally, redundant error messages are omitted when calling the string based read_log_event from the IO_Cache based read_log_event, as the later already will report the error of the former. Reviewed By: ============ Kristian Nielsen <knielsen@knielsen-hq.org> Andrei Elkin <andrei.elkin@mariadb.com>	2023-11-17 09:43:56 -08:00
Kristian Nielsen	6fa69ad747	MDEV-27436: binlog corruption (/tmp no space left on device at the same moment) This commit fixes several bugs in error handling around disk full when writing the statement/transaction binlog caches: 1. If the error occurs during a non-transactional statement, the code attempts to binlog the partially executed statement (as it cannot roll back). The stmt_cache->error was still set from the disk full error. This caused MYSQL_BIN_LOG::write_cache() to get an error while trying to read the cache to copy it to the binlog. This was then wrongly interpreted as a disk full error writing to the binlog file. As a result, a partial event group containing just a GTID event (no query or commit) was binlogged. Fixed by checking if an error is set in the statement cache, and if so binlog an INCIDENT event instead of a corrupt event group, as for other errors. 2. For LOAD DATA LOCAL INFILE, if a disk full error occured while writing to the statement cache, the code would attempt to abort and read-and-discard any remaining data sent by the client. The discard code would however continue trying to write data to the statement cache, and wrongly interpret another disk full error as end-of-file from the client. This left the client connection with extra data which corrupts the communication for the next command, as well as again causing an corrupt/incomplete event to be binlogged. Fixed by restoring the default read function before reading any remaining data from the client connection. Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com> Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>	2023-10-31 11:48:00 +01:00
Marko Mäkelä	5bada1246d	Merge 10.5 into 10.6	2023-04-11 16:15:19 +03:00
Oleksandr Byelkin	ac5a534a4c	Merge remote-tracking branch '10.4' into 10.5	2023-03-31 21:32:41 +02:00
Anel Husakovic	c596ad734d	MDEV-30269: Remove rpl_semi_sync_[slave,master] usage in code - Description: - Before 10.3.8 semisync was a plugin that is built into the server with MDEV-13073,starting with commit `cbc71485e2`. There are still some usage of `rpl_semi_sync_master` in mtr. Note: - To recognize the replica in the `dump_thread`, replica is creating local variable `rpl_semi_sync_slave` (the keyword of plugin) in function `request_transmit`, that is catched by primary in `is_semi_sync_slave()`. This is the user variable and as such not related to the obsolete plugin. - Found in `sys_vars.all_vars` and `rpl_semi_sync_wait_point` tests, usage of plugins `rpl_semi_sync_master`, `rpl_semi_sync_slave`. The former test is disabled by default (`sys_vars/disabled.def`) and marked as `obsolete`, however this patch will remove the queries. - Add cosmetic fixes to semisync codebase Reviewer: <brandon.nesterenko@mariadb.com> Closes PR #2528, PR #2380	2023-03-23 13:39:46 +01:00
Sergei Golubchik	900d7bf360	Merge branch '10.5' into 10.6	2022-10-02 22:14:21 +02:00
Sergei Golubchik	3a2116241b	Merge branch '10.4' into 10.5	2022-10-02 14:38:13 +02:00
Sergei Golubchik	d4f6d2f08f	Merge branch '10.3' into 10.4	2022-10-01 23:07:26 +02:00
Mikhail Chalov	9de9f105b5	Use memory safe snprintf() in Connect Engine and elsewhere (#2210 ) Continue with similar changes as done in `19af1890` to replace sprintf(buf, ...) with snprintf(buf, sizeof(buf), ...), specifically in the "easy" cases where buf is allocated with a size known at compile time. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2022-09-28 15:45:25 +01:00
Marko Mäkelä	829e8111c7	Merge 10.5 into 10.6	2022-09-26 14:34:43 +03:00
Marko Mäkelä	6286a05d80	Merge 10.4 into 10.5	2022-09-26 13:34:38 +03:00
Marko Mäkelä	3c92050d1c	Fix build without either ENABLED_DEBUG_SYNC or DBUG_OFF There are separate flags DBUG_OFF for disabling the DBUG facility and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility. Let us allow debug builds without DEBUG_SYNC. Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to define ENABLED_DEBUG_SYNC.	2022-09-23 17:37:52 +03:00
sjaakola	ef2dbb8dbc	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 20:40:35 +02:00
Jan Lindström	d5bc05798f	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 20:38:11 +02:00
sjaakola	5c230b21bf	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 09:52:52 +03:00
Jan Lindström	aa7ca987db	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `eac8341df4`.	2021-10-29 09:52:40 +03:00
Marko Mäkelä	73f5cbd0b6	Merge 10.5 into 10.6	2021-10-21 16:06:34 +03:00
Marko Mäkelä	5f8561a6bc	Merge 10.4 into 10.5	2021-10-21 15:26:25 +03:00
Marko Mäkelä	489ef007be	Merge 10.3 into 10.4	2021-10-21 14:57:00 +03:00
Marko Mäkelä	e4a7c15dd6	Merge 10.2 into 10.3	2021-10-21 13:41:04 +03:00
Brandon Nesterenko	2291f8ef73	MDEV-25284: Assertion `info->type == READ_CACHE \|\| info->type == WRITE_CACHE' failed Problem: ======== This patch addresses two issues. First, if a CHANGE MASTER command is issued and an error happens while locating the replica’s relay logs, the logs can be put into an invalid state where future updates fail and future CHANGE MASTER calls crash the server. More specifically, right before a replica purges the relay logs (part of the `CHANGE MASTER TO` logic), the relay log is temporarily closed with state LOG_TO_BE_OPENED. If the server errors in-between the temporary log closure and purge, i.e. during the function find_log_pos, the log should be closed. MDEV-25284 reveals the log is not properly closed. Second, upon issuing a RESET SLAVE ALL command, a slave’s GTID filters are not cleared (DO_DOMAIN_IDS, IGNORE_DOMIAN_IDS, IGNORE_SERVER_IDS). MySQL had a similar bug report, Bug #18816897, which fixed this issue to clear IGNORE_SERVER_IDS after issuing RESET SLAVE ALL in version 5.7. Solution: ========= To fix the first problem, the CHANGE MASTER error handling logic was extended to transition the relay log state to LOG_CLOSED from LOG_TO_BE_OPENED. To fix the second problem, the RESET SLAVE ALL logic is extended to clear the domain_id filter and ignore_server_ids. Reviewed By: ============ Andrei Elkin <andrei.elkin@mariadb.com>	2021-10-18 10:43:51 -06:00
Marko Mäkelä	f3fcf5f45c	Merge 10.5 to 10.6	2021-08-19 12:25:00 +03:00
Sujatha	475f69b985	MDEV-25958: rpl_semi_sync_fail_over.test fails in buildbot Analysis: ======== In case multi binlog truncation scenario debug sync points are in the following order. Two inserts are done on master as shown below. INSERT INTO t1 VALUES (4, REPEAT("x", 4100) commit_after_release_LOCK_after_binlog_sync INSERT INTO t1 VALUES (5, REPEAT("x", 4100) commit_after_release_LOCK_log First insert debug sync ensures that transaction is synced to binlog and not committed but it reached slave through semi sync. Second insert debug sync ensures that transaction is synced to binlog and not committed. It doesn't ensure that 'INSERT 5' reached slave. Most of the times INSERT 5 reaches slave, hence when it is promoted as master it sends 4,5 to slave. But occasionally 5 may not reach slave in those cases post recovery master will have only 4. When row 6 is inserted Master has 4-6 and Slave has 4,5,6. This results in test failure. Fix: === For the first insert use 'commit_before_get_LOCK_commit_ordered' debug sync point, it will ensure that binlog was sent to slave and slave has acknowledged the receipt. Now enable debug code such that when the next transaction is written to binary log, dump thread will read and send it across the network and notify the server to be get killed. Insert row 5 and wait for notification from dump thread. Kill the server. This ensures that both 4 and 5 have reached the semi-sync slave. Added a new test case: Insert two rows on master such that first is present in master's binlog and reached semi sync slave. Second insert should be flushed to binlog but not sent to slave. Now crash and fail over to slave. The promoted master will send the extra transaction to slave.	2021-08-19 11:59:39 +05:30
Marko Mäkelä	4a25957274	Merge 10.4 into 10.5	2021-08-18 18:22:35 +03:00
Brandon Nesterenko	46c3e7e353	MDEV-20215: binlog.show_concurrent_rotate failed in buildbot with wrong result Problem: ======= There are two issues that are addressed in this patch: 1) SHOW BINARY LOGS uses caching to store the binary logs that exist in the log directory; however, if new events are written to the logs, the caching strategy is unaware. This is okay for users, as it is okay for SHOW to return slightly old data. The test, however, can result in inconsistent data. It runs two connections concurrently, where one shows the logs, and the other adds a new file. The output of SHOW BINARY LOGS then depends on when the cache is built, with respect to the time that the second connection rotates the logs. 2) There is a race condition between RESET MASTER and SHOW BINARY LOGS. More specifically, where they both need the binary log lock to begin, SHOW BINARY LOGS only needs the lock to build its cache. If RESET MASTER is issued after SHOW BINARY LOGS has built its cache and before it has returned the results, the presented data may be incorrect. Solution: ======== 1) As it is okay for users to see stale data, to make the test consistent, use DEBUG_SYNC to force the race condition (problem 2) to make SHOW BINARY LOGS build a cache before RESET MASTER is called. Then, use additional logic from the next part of the solution to rebuild the cache. 2) Use an Atomic_counter to keep track of the number of times RESET MASTER has been called. If the value of the counter changes after building the cache, the cache should be rebuilt and the analysis should be restarted. Reviewed By: ============ Andrei Elkin: <andrei.elkin@mariadb.com>	2021-08-13 10:53:19 -06:00
Monty	85d6278fed	Change replication to use uchar for all buffers instead of char This change is to get rid of randomly failing tests, especially those that reads random position of the binary log. From looking at the logs it's clear that some failures is because of a read char (with value >= 128) is converted to a big long value. Using uchar everywhere makes this much less likely to happen. Another benefit is that a lot of cast of char to uchar could be removed. Other things: - Removed some extra space before '=' and '+=' in assignments - Fixed indentations and lines > 80 characters - Replace '16' with 'element_size' (from class definition) in Gtid_list_log_event()	2021-05-19 22:54:12 +02:00
Monty	b6ff139aa3	Reduce usage of strlen() Changes: - To detect automatic strlen() I removed the methods in String that uses 'const char ' without a length: - String::append(const char) - Binary_string(const char str) - String(const char str, CHARSET_INFO cs) - append_for_single_quote(const char ) All usage of append(const char) is changed to either use String::append(char), String::append(const char, size_t length) or String::append(LEX_CSTRING) - Added STRING_WITH_LEN() around constant string arguments to String::append() - Added overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows. - Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str. - Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING. - Changed for some functions the name argument from const char * to to const LEX_CSTRING &: - Item::Item_func_fix_attributes() - Item::check_type_...() - Type_std_attributes::agg_item_collations() - Type_std_attributes::agg_item_set_converter() - Type_std_attributes::agg_arg_charsets...() - Type_handler_hybrid_field_type::aggregate_for_result() - Type_handler_geometry::check_type_geom_or_binary() - Type_handler::Item_func_or_sum_illegal_param() - Predicant_to_list_comparator::add_value_skip_null() - Predicant_to_list_comparator::add_value() - cmp_item_row::prepare_comparators() - cmp_item_row::aggregate_row_elements_for_comparison() - Cursor_ref::print_func() - Removes String_space() as it was only used in one cases and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf(). - Added some const LEX_CSTRING's for common strings: - NULL_clex_str, DATA_clex_str, INDEX_clex_str. - Changed primary_key_name to a LEX_CSTRING - Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does. - Rename of protocol function: bool store(const char from, CHARSET_INFO cs) to bool store_string_or_null(const char from, CHARSET_INFO cs). This was done to both clarify the difference between this 'store' function and also to make it easier to find unoptimal usage of store() calls. - Added Protocol::store(const LEX_CSTRING, CHARSET_INFO) - Changed some 'const char' arrays to instead be of type LEX_CSTRING. - class Item_func_units now used LEX_CSTRING for name. Other things: - Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated. - Fixed a lot of instances where the length of the argument to append is known or easily obtain but was not used. - Removed some not needed 'virtual' definition for functions that was inherited from the parent. I added override to these. - Fixed Ordered_key::print() to preallocate needed buffer. Old code could case memory overruns. - Simplified some loops when adding char to a String with delimiters.	2021-05-19 22:27:48 +02:00
Monty	36cdd5c3cd	Optimize usage of c_ptr(), c_ptr_quick() and String::alloc() The problem was that when one used String::alloc() to allocate a string, the String ensures that there is space for an extra NULL byte in the buffer and if not, reallocates the string. This is a problem with the String::set_int() that calls alloc(21), which forces extra malloc/free calls to happen. - We do not anymore re-allocate String if alloc() is called with the Allocated_length. This reduces number of malloc() allocations, especially one big re-allocation in Protocol::send_result_Set_metadata() for almost every query that produced a result to the connnected client. - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE This can now be done as alloc() doesn't increase buffers if new length is not bigger than old one. - c_ptr() is redesigned to be safer (but a bit longer) than before. - Remove wrong usage of c_ptr_quick() c_ptr_quick() was used in many cases to get the pointer to the used buffer, even when it didn't need to be \0 terminated. In this case ptr() is a better substitute. Another problem with c_ptr_quick() is that it did not guarantee that the string would be \0 terminated. - item_val_str(), an API function not used currently by the server, now always returns a null terminated string (before it didn't always do that). - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old mixed usage of performance keys caused assert's when String buffers where shrunk. - Binary_string::shrink() is simplifed - Fixed bug in String(const char str, size_t len, CHARSET_INFO cs) that used Binary_string((char ) str, len) instead of Binary_string(str,len). - Changed argument to String() creations and String.set() functions to use 'const char' instead of 'char*'. This ensures that Alloced_length is not set, which gives safety against someone trying to change the original string. This also would allow us to use !Alloced_length in c_ptr() if needed. - Changed string_ptr_cmp() to use memcmp() instead of c_ptr() to avoid a possible malloc during string comparision.	2021-05-19 22:27:27 +02:00
Nikita Malyavin	3f55c56951	Merge branch bb-10.4-release into bb-10.5-release	2021-05-05 23:57:11 +03:00
Nikita Malyavin	509e4990af	Merge branch bb-10.3-release into bb-10.4-release	2021-05-05 23:03:01 +03:00
Nikita Malyavin	a8a925dd22	Merge branch bb-10.2-release into bb-10.3-release	2021-05-04 14:49:31 +03:00
Sujatha	abe6eb10a6	MDEV-16146: MariaDB slave stops with following errors. Problem: ======== 180511 11:07:58 [ERROR] Slave I/O: Unexpected master's heartbeat data: heartbeat is not compatible with local info;the event's data: log_file_name mysql-bin.000009 log_pos 1054262041, Error_code: 1623 Analysis: ========= In replication setup when master server doesn't have any events to send to slave server it sends an 'Heartbeat_log_event'. This event carries the current binary log filename and offset details. The offset values is stored within 4 bytes of event header. When the size of binary log is higher than UINT32_MAX the log_pos values will not fit in 4 bytes memory. It overflows and hence slave stops with an error. Fix: === Since we cannot extend the common_header of Log_event class, a greater than 4GB value of Log_event::log_pos is made to be transported with a HeartBeat event's sub-header. Log_event::log_pos in such case is set to zero to indicate that the 8 byte sub-header is allocated in the event. In case of cross version replication following behaviour is expected OLD - Server without fix NEW - Server with fix OLD<->NEW : works bidirectionally as long as the binlog offset is (normally) within 4GB. When log_pos > UINT32_MAX OLD->NEW : The 'log_pos' is bound to overflow and NEW slave may report an invalid event/incompatible heart beat event error. NEW->OLD : Since patched server sets log_pos=0 on overflow, OLD slave will report invalid event error.	2021-04-30 20:34:31 +05:30
Sergei Petrunia	bf310b4cfb	MDEV-25305: MyRocks: Killing server during RESET MASTER can lose last transactions rocksdb_checkpoint_request() should call FlushWAL(sync=true) (which does write-out and sync), not just SyncWAL() (which just syncs without writing out) Followup: the test requires debug sync facility (This is a backport to 10.5)	2021-03-31 10:26:00 +03:00

1 2 3 4 5 ...

1286 Commits