mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Marko Mäkelä	25ac047baf	Merge 10.5 into 10.6	2021-11-09 09:11:50 +02:00
Marko Mäkelä	9c18b96603	Merge 10.4 into 10.5	2021-11-09 08:50:33 +02:00
Marko Mäkelä	47ab793d71	Merge 10.3 into 10.4	2021-11-09 08:40:14 +02:00
Marko Mäkelä	f7054ff5df	Merge mariadb-10.3.32 into 10.3	2021-11-09 07:59:36 +02:00
Aleksey Midenkov	c8cece9144	MDEV-26928 Column-inclusive WITH SYSTEM VERSIONING doesn't work with explicit system fields versioning_fields flag indicates that any columns were specified WITH SYSTEM VERSIONING. In that case we imply WITH SYSTEM VERSIONING for the whole table and WITHOUT SYSTEM VERSIONING for the other columns.	2021-11-02 11:49:47 +03:00
sjaakola	ef2dbb8dbc	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 20:40:35 +02:00
Jan Lindström	30337addfc	MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) Revert "MDEV-23328 Server hang due to Galera lock conflict resolution" This reverts commit `29bbcac0ee`.	2021-10-29 10:00:05 +03:00
sjaakola	5c230b21bf	MDEV-23328 Server hang due to Galera lock conflict resolution Mutex order violation when wsrep bf thread kills a conflicting trx, the stack is wsrep_thd_LOCK() wsrep_kill_victim() lock_rec_other_has_conflicting() lock_clust_rec_read_check_and_lock() row_search_mvcc() ha_innobase::index_read() ha_innobase::rnd_pos() handler::ha_rnd_pos() handler::rnd_pos_by_record() handler::ha_rnd_pos_by_record() Rows_log_event::find_row() Update_rows_log_event::do_exec_row() Rows_log_event::do_apply_event() Log_event::apply_event() wsrep_apply_events() and mutexes are taken in the order lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data When a normal KILL statement is executed, the stack is innobase_kill_query() kill_handlerton() plugin_foreach_with_mask() ha_kill_query() THD::awake() kill_one_thread() and mutexes are victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex This patch is the plan D variant for fixing potetial mutex locking order exercised by BF aborting and KILL command execution. In this approach, KILL command is replicated as TOI operation. This guarantees total isolation for the KILL command execution in the first node: there is no concurrent replication applying and no concurrent DDL executing. Therefore there is no risk of BF aborting to happen in parallel with KILL command execution either. Potential mutex deadlocks between the different mutex access paths with KILL command execution and BF aborting cannot therefore happen. TOI replication is used, in this approach, purely as means to provide isolated KILL command execution in the first node. KILL command should not (and must not) be applied in secondary nodes. In this patch, we make this sure by skipping KILL execution in secondary nodes, in applying phase, where we bail out if applier thread is trying to execute KILL command. This is effective, but skipping the applying of KILL command could happen much earlier as well. This also fixed unprotected calls to wsrep_thd_abort that will use wsrep_abort_transaction. This is fixed by holding THD::LOCK_thd_data while we abort transaction. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-10-29 09:52:52 +03:00
Sergei Golubchik	9e32f229d4	plugin can signal that it cannot be unloaded by failing deinit() if plugin->deinit() returns a failure, it is no longer ignored, it means that the plugin isn't ready to be unloaded from memory yet. So it's marked "dying", deinitialized as much as possible, but stays in memory until shutdown. also: * increment MARIA_PLUGIN_INTERFACE_VERSION * rewrite ha_rocksdb to use the new approach, update the test	2021-10-27 15:55:14 +02:00
Aleksey Midenkov	d324c03d0c	Vanilla cleanups and refactorings Dead code cleanup: part_info->num_parts usage was wrong and working incorrectly in mysql_drop_partitions() because num_parts is already updated in prep_alter_part_table(). We don't have to update part_info->partitions because part_info is destroyed at alter_partition_lock_handling(). Cleanups: - DBUG_EVALUATE_IF() macro replaced by shorter form DBUG_IF(); - Typo in ER_KEY_COLUMN_DOES_NOT_EXITS. Refactorings: - Splitted write_log_replace_delete_frm() into write_log_delete_frm() and write_log_replace_frm(); - partition_info via DDL_LOG_STATE; - set_part_info_exec_log_entry() removed. DBUG_EVALUATE removed DBUG_EVALUTATE was only added for consistency together with DBUG_EVALUATE_IF. It is not used anywhere in the code. DBUG_SUICIDE() fix on release build On release DBUG_SUICIDE() was statement. It was wrong as DBUG_SUICIDE() is used in expression context.	2021-10-26 17:07:46 +02:00
Thirunarayanan Balathandayuthapani	045757af4c	MDEV-24621 In bulk insert, pre-sort and build indexes one page at a time When inserting a number of rows into an empty table, InnoDB will buffer and pre-sort the records for each index, and build the indexes one page at a time. For each index, a buffer of innodb_sort_buffer_size will be created. If the buffer ran out of memory then we will create temporary files for storing the data. At the end of the statement, we will sort and apply the buffered records. Ideally, we would do this at the end of the transaction or only when starting to execute a non-INSERT statement on the table. However, it could be awkward if duplicate keys or similar errors would be reported during the execution of a later statement. This will be addressed in MDEV-25036. Any columns longer than 2000 bytes will buffered in temporary files. innodb_prepare_commit_versioned(): Apply all bulk buffered insert operation, at the end of each statement. ha_commit_trans(): Handle errors from innodb_prepare_commit_versioned(). row_merge_buf_write(): This function should accept blob file handle too and it should write the field data which are greater than 2000 bytes row_merge_bulk_t: Data structure to maintain the data during bulk insert operation. trx_mod_table_time_t::start_bulk_insert(): Notify the start of bulk insert operation and create new buffer for the given table trx_mod_table_time_t::add_tuple(): Buffer a record. trx_mod_table_time_t::write_bulk(): Do bulk insert operation present in the transaction trx_mod_table_time_t::bulk_buffer_exist(): Whether the buffer storage exist for the bulk transaction trx_mod_table_time_t::write_bulk(): Write all buffered insert operation for the transaction and the table. row_ins_clust_index_entry_low(): Insert the data into the bulk buffer if it is already exist. row_ins_sec_index_entry(): Insert the secondary tuple if the bulk buffer already exist. row_merge_bulk_buf_add(): Insert the tuple into bulk buffer insert operation. row_merge_buf_blob(): Write the field data whose length is more than 2000 bytes into blob temporary file. Write the file offset and length into the tuple field. row_merge_copy_blob_from_file(): Copy the blob from blob file handler based on reference of the given tuple. row_merge_insert_index_tuples(): Handle blob for bulk insert operation. row_merge_bulk_t::row_merge_bulk_t(): Constructor. Initialize the buffer and file for all the indexes expect fts index. row_merge_bulk_t::create_tmp_file(): Create new temporary file for the given index. row_merge_bulk_t::write_to_tmp_file(): Write the content from buffer to disk file for the given index. row_merge_bulk_t::add_tuple(): Insert the tuple into the merge buffer for the given index. If the memory ran out then InnoDB should sort the buffer and write into file. row_merge_bulk_t::write_to_index(): Do bulk insert operation from merge file/merge buffer for the given index row_merge_bulk_t::write_to_table(): Do bulk insert operation for all the indexes. dict_stats_update(): If a bulk insert transaction is in progress, treat the table as empty. The index creation could hold latches for extended amounts of time.	2021-10-26 15:01:44 +03:00
Marko Mäkelä	a8379e53e8	Merge 10.5 into 10.6 The changes to galera.galear_var_replicate_myisam_on in commit `d9b933bec6` are omitted due to conflicts with commit `27d66d644c`.	2021-10-13 13:28:12 +03:00
Marko Mäkelä	99bb3fb656	Merge 10.4 into 10.5	2021-10-13 12:33:56 +03:00
Marko Mäkelä	a736a3174a	Merge 10.3 into 10.4	2021-10-13 12:03:32 +03:00
Aleksey Midenkov	d31f953789	MDEV-22660 SIGSEGV on adding system versioning and modifying system column Second alter subcommand correctly removed VERS_ROW_END flag. We throw ER_VERS_PERIOD_COLUMNS in such case.	2021-10-11 13:36:07 +03:00
Aleksey Midenkov	911c803db1	MDEV-22660 System versioning cleanups - Cleaned up Vers_parse_info::check_sys_fields(); - Renamed VERS_SYS_START_FLAG, VERS_SYS_END_FLAG to VERS_ROW_START, VERS_ROW_END.	2021-10-11 13:36:06 +03:00
Marko Mäkelä	03c09837fc	Merge 10.5 into 10.6	2021-09-16 20:17:12 +03:00
Monty	f03fee06b0	Improve error messages from Aria - Error on commit now returns HA_ERR_COMMIT_ERROR instead of HA_ERR_INTERNAL_ERROR - If checkpoint fails, it will now print out where it failed.	2021-09-15 19:27:34 +03:00
Vladislav Vaintroub	ae85835cc7	Fix warnings from -DPLUGIN_PARTITION=NO, portably. Also fix Spider's cmake.	2021-09-05 20:00:13 +02:00
Marko Mäkelä	cc4e20e56f	Merge 10.5 into 10.6	2021-08-26 10:20:17 +03:00
Marko Mäkelä	87ff4ba7c8	Merge 10.4 into 10.5	2021-08-26 08:46:57 +03:00
Marko Mäkelä	15b691b7bd	After-merge fix `f84e28c119` In a rebase of the merge, two preceding commits were accidentally reverted: commit `112b23969a` (MDEV-26308) commit `ac2857a5fb` (MDEV-25717) Thanks to Daniele Sciascia for noticing this.	2021-08-25 17:35:44 +03:00
Marko Mäkelä	f84e28c119	Merge 10.3 into 10.4	2021-08-18 16:51:52 +03:00
Leandro Pacheco	112b23969a	MDEV-26308 : Galera test failure on galera.galera_split_brain Contains following fixes: * allow TOI commands to timeout while trying to acquire TOI with override lock_wait_timeout with a LONG_TIMEOUT only after succesfully entering TOI * only ignore lock_wait_timeout on TOI * fix galera_split_brain test as TOI operation now returns ER_LOCK_WAIT_TIMEOUT after lock_wait_timeout * explicitly test for TOI Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-08-18 08:57:33 +03:00
Oleksandr Byelkin	6efb5e9f5e	Merge branch '10.5' into 10.6	2021-08-02 10:11:41 +02:00
Oleksandr Byelkin	ae6bdc6769	Merge branch '10.4' into 10.5	2021-07-31 23:19:51 +02:00
mkaruza	eb26e20df5	MDEV-22421 Galera assertion !wsrep_has_changes(thd) \|\| (thd->lex->sql_command == SQLCOM_CREATE_TABLE && !thd->is_current_stmt_binlog_format_row()) Updates to transaction registry table shouldn't be replicated in cluster so there is no need to append wsrep keys. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-07-28 14:54:18 +03:00
Rucha Deodhar	beb401b25f	This commit is a fixup for MDEV-22189. Changes: changed mysqld -> mariadbd for some more error messages that were left.	2021-07-26 22:59:10 +05:30
Andrei Elkin	e95f78f475	MDEV-21117 post-push to cover a "custom" xid format Due to wsrep uses its own xid format for its recovery, the xid hashing has to be refined. When a xid object is not in the server "mysql" format, the hash record made to contain the xid also in the full format.	2021-06-16 23:21:12 +03:00
Andrei Elkin	79a2dbc879	MDEV-21117 post-push fixes 1. work around MDEV-25912 to not apply assert at wsrep running time; 2. handle wsrep mode of the server recovery 3. convert hton calls to static binlog_commit ones. 4. satisfy MSAN complain on uninitialized std::pair	2021-06-15 19:18:11 +03:00
Sujatha	6c39eaeb12	MDEV-21117: refine the server binlog-based recovery for semisync Problem: ======= When the semisync master is crashed and restarted as slave it could recover transactions that former slaves may never have seen. A known method existed to clear out all prepared transactions with --tc-heuristic-recover=rollback does not care to adjust binlog accordingly. Fix: === The binlog-based recovery is made to concern of the slave semisync role of post-crash restarted server. No changes in behavior is done to the "normal" binloggging server and the semisync master. When the restarted server is configured with --rpl-semi-sync-slave-enabled=1 the refined recovery attempts to roll back prepared transactions and truncate binlog accordingly. In case of a partially committed (that is committed at least in one of the engine participants) such transaction gets committed. It's guaranteed no (partially as well) committed transactions exist beyond the truncate position. In case there exists a non-transactional replication event (being in a way a committed transaction) past the computed truncate position the recovery ends with an error. As after master crash and failover to slave, the demoted-to-slave ex-master must be ready to face and accept its own (generated by) events, without generally necessary --replicate-same-server-id. So the acceptance conditions are relaxed for the semisync slave to accept own events without that option. While gtid_strict_mode ON ensures no duplicate transaction can be (re-)executed the master_use_gtid=none slave has to be configured with --replicate-same-server-id. NOTE for reviewers. This patch does not handle the user XA which is done in next git commit.	2021-06-11 19:49:39 +03:00
Marko Mäkelä	1bd681c8b3	MDEV-25506 (3 of 3): Do not delete .ibd files before commit This is a complete rewrite of DROP TABLE, also as part of other DDL, such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE. The background DROP TABLE queue hack is removed. If a transaction needs to drop and create a table by the same name (like TRUNCATE TABLE does), it must first rename the table to an internal #sql-ib name. No committed version of the data dictionary will include any #sql-ib tables, because whenever a transaction renames a table to a #sql-ib name, it will also drop that table. Either the rename will be rolled back, or the drop will be committed. Data files will be unlinked after the transaction has been committed and a FILE_RENAME record has been durably written. The file will actually be deleted when the detached file handle returned by fil_delete_tablespace() will be closed, after the latches have been released. It is possible that a purge of the delete of the SYS_INDEXES record for the clustered index will execute fil_delete_tablespace() concurrently with the DDL transaction. In that case, the thread that arrives later will wait for the other thread to finish. HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag. ha_innobase::truncate() now requires that all other references to the table be released in advance. This was implemented by Monty. ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected, we will "hijack" the current transaction, drop the table in the current transaction and commit the current transaction. This essentially fixes MDEV-21602. There is a FIXME comment about making the check less failure-prone. ha_innobase::truncate(), ha_innobase::delete_table(): Implement a fast path for temporary tables. We will no longer allow temporary tables to use the adaptive hash index. dict_table_t::mdl_name: The original table name for the purpose of acquiring MDL in purge, to prevent a race condition between a DDL transaction that is dropping a table, and purge processing undo log records of DML that had executed before the DDL operation. For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the dict_table_t::mdl_name will differ from dict_table_t::name. dict_table_t::parse_name(): Use mdl_name instead of name. dict_table_rename_in_cache(): Update mdl_name. For the internal FTS_ tables of FULLTEXT INDEX, purge would acquire MDL on the FTS_ table name, but not on the main table, and therefore it would be able to run concurrently with a DDL transaction that is dropping the table. Previously, the DROP TABLE queue hack prevented a race between purge and DDL. For now, we introduce purge_sys.stop_FTS() to prevent purge from opening any table, while a DDL transaction that may drop FTS_ tables is in progress. The function fts_lock_table(), which will be invoked before the dictionary is locked, will wait for purge to release any table handles. trx_t::drop_table_statistics(): Drop statistics for the table. This replaces dict_stats_drop_index(). We will drop or rename persistent statistics atomically as part of DDL transactions. On lock conflict for dropping statistics, we will fail instantly with DB_LOCK_WAIT_TIMEOUT, because we will be holding the exclusive data dictionary latch. trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory(). Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT in addition to DB_DUPLICATE_KEY. The call to fts_commit() is entirely misplaced here and may obviously break the consistency of transactions that affect FULLTEXT INDEX. It needs to be fixed separately. dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175). The counter was a work-around for missing meta-data locking (MDL) on the SQL layer, and not really needed in MariaDB. ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28. HA_ERR_TABLE_IN_FK_CHECK: Remove. row_ins_check_foreign_constraints(): Do not acquire dict_sys.latch either. The SQL-layer MDL will protect us. This was reviewed by Thirunarayanan Balathandayuthapani and tested by Matthias Leich.	2021-06-09 17:06:07 +03:00
Marko Mäkelä	a722ee88f3	Merge 10.5 into 10.6	2021-06-01 11:39:38 +03:00
Marko Mäkelä	9c7a456a92	Merge 10.4 into 10.5	2021-06-01 10:38:09 +03:00
sjaakola	e212415690	MDEV-25551 applying crash with tables without PK The underlying problem with MDEV-25551 turned out to be that transactions having changes for tables with no primary key, were not safe to apply in parallel. This is due to excessive locking in innodb side, and even non related row modifications could end up in lock conflict during applying. The fix for MDEV-25551 has disabled parallel applying for tables with no PK. This fix depends on change for wsrep-lib, where a separate PR allows application to modify transaction flags in wsrep-lib. This commit has also separate mtr test for verifying that transactions modifying a table with no primary key, will not apply in parallel. This test is a modified version of initial test created by Gabor Orosz, the reporterr of MDEV-25551. Another mtr test was added in galera_sr suite, for testing if modifying tables with no primary key would causes issues for streaming replication use cases. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>	2021-05-26 07:41:05 +03:00
Sergei Golubchik	b2340cdd07	fix compilation w/o partitioning	2021-05-19 22:54:14 +02:00
Monty	83e529eced	MDEV-18465 Logging of DDL statements during backup Many of the changes was needed to be able to collect and print engine name and table version id's in the ddl log.	2021-05-19 22:54:13 +02:00
Monty	7762ee5dbe	MDEV-25180 Atomic ALTER TABLE MDEV-25604 Atomic DDL: Binlog event written upon recovery does not have default database The purpose of this task is to ensure that ALTER TABLE is atomic even if the MariaDB server would be killed at any point of the alter table. This means that either the ALTER TABLE succeeds (including that triggers, the status tables and the binary log are updated) or things should be reverted to their original state. If the server crashes before the new version is fully up to date and commited, it will revert to the original table and remove all temporary files and tables. If the new version is commited, crash recovery will use the new version, and update triggers, the status tables and the binary log. The one execption is ALTER TABLE .. RENAME .. where no changes are done to table definition. This one will work as RENAME and roll back unless the whole statement completed, including updating the binary log (if enabled). Other changes: - Added handlerton->check_version() function to allow the ddl recovery code to check, in case of inplace alter table, if the table in the storage engine is of the new or old version. - Added handler->table_version() so that an engine can report the current version of the table. This should be changed each time the table definition changes. - Added ha_signal_ddl_recovery_done() and handlerton::signal_ddl_recovery_done() to inform all handlers when ddl recovery has been done. (Needed by InnoDB). - Added handlerton call inplace_alter_table_committed, to signal engine that ddl_log has been closed for the alter table query. - Added new handerton flag HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we should call hton->notify_tabledef_changed() during mysql_inplace_alter_table. This was required as MyRocks and InnoDB needed the call at different times. - Added function server_uuid_value() to be able to generate a temporary xid when ddl recovery writes the query to the binary log. This is needed to be able to handle crashes during ddl log recovery. - Moved freeing of the frm definition to end of mysql_alter_table() to remove duplicate code and have a common exit strategy. ------- InnoDB part of atomic ALTER TABLE (Implemented by Marko Mäkelä) innodb_check_version(): Compare the saved dict_table_t::def_trx_id to determine whether an ALTER TABLE operation was committed. We must correctly recover dict_table_t::def_trx_id for this to work. Before purge removes any trace of DB_TRX_ID from system tables, it will make an effort to load the user table into the cache, so that the dict_table_t::def_trx_id can be recovered. ha_innobase::table_version(): return garbage, or the trx_id that would be used for committing an ALTER TABLE operation. In InnoDB, table names starting with #sql-ib will remain special: they will be dropped on startup. This may be revisited later in MDEV-18518 when we implement proper undo logging and rollback for creating or dropping multiple tables in a transaction. Table names starting with #sql will retain some special meaning: dict_table_t::parse_name() will not consider such names for MDL acquisition, and dict_table_rename_in_cache() will treat such names specially when handling FOREIGN KEY constraints. Simplify InnoDB DROP INDEX. Prevent purge wakeup To ensure that dict_table_t::def_trx_id will be recovered correctly in case the server is killed before ddl_log_complete(), we will block the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS between ha_innobase::commit_inplace_alter_table(commit=true) (purge_sys.stop_SYS()) and purge_sys.resume_SYS(). The completion callback purge_sys.resume_SYS() must be between ddl_log_complete() and MDL release. -------- MyRocks support for atomic ALTER TABLE (Implemented by Sergui Petrunia) Implement these SE API functions: - ha_rocksdb::table_version() - hton->check_version = rocksdb_check_versionMyRocks data dictionary now stores table version for each table. (Absence of table version record is interpreted as table_version=0, that is, which means no upgrade changes are needed) - For inplace alter table of a partitioned table, call the underlying handlerton when checking if the table is ok. This assumes that the partition engine commits all changes at once.	2021-05-19 22:54:13 +02:00
Monty	6aa9a552c2	MDEV-24576 Atomic CREATE TABLE There are a few different cases to consider Logging of CREATE TABLE and CREATE TABLE ... LIKE - If REPLACE is used and there was an existing table, DDL log the drop of the table. - If discovery of table is to be done - DDL LOG create table else - DDL log create table (with engine type) - create the table - If table was created - Log entry to binary log with xid - Mark DDL log completed Crash recovery: - If query was in binary log do nothing and exit - If discoverted table - Delete the .frm file -else - Drop created table and frm file - If table was dropped, write a DROP TABLE statement in binary log CREATE TABLE ... SELECT required a little more work as when one is using statement logging the query is written to the binary log before commit is done. This was fixed by adding a DROP TABLE to the binary log during crash recovery if the ddl log entry was not closed. In this case the binary log will contain: CREATE TABLE xxx ... SELECT .... DROP TABLE xxx; Other things: - Added debug_crash_here() functionality to Aria to be able to test crash in create table between the creation of the .MAI and the .MAD files.	2021-05-19 22:54:13 +02:00
Monty	7a588c30b1	MDEV-24408 Crash-safe DROP DATABASE Description of how DROP DATABASE works after this patch - Collect list of tables - DDL log tables as they are dropped - DDL log drop database - Delete db.opt - Delete data directory - Log either DROP TABLE or DROP DATABASE to binary log - De active ddl log entry This is in line of how things where before (minus ddl logging) except that we delete db.opt file last to not loose it if DROP DATABASE fails. On recovery we have to ensure that all dropped tables are logged in binary log and that they are properly dropped (as with atomic drop table). No new tables be dropped as part of recovery. Recovery of active drop database ddl log entry: - If drop database was logged to ddl log but was not found in the binary log: - drop the db.opt file and database directory. - Log DROP DATABASE to binary log - If drop database was not logged to ddl log - Update binary log with DROP TABLE of the dropped tables. If table list is longer than max_allowed_packet, then the query will be split into multiple DROP TABLE/VIEW queries. Other things: - Added DDL_LOG_STATE and 'current database' as arguments to mysql_rm_table_no_locks(). This was needed to be able to combine ddl logging of DROP DATABASE and DROP TABLE and make the generated DROP TABLE statements shorter. - To make the DROP TABLE statement created by ddl log shorter, I changed the binlogged query to use current directory and omit the directory part for all tables in the current directory. - Merged some DROP TABLE and DROP VIEW code in ddl logger. This was done to be able get separate DROP VIEW and DROP TABLE statements in the binary log. - Added a 'recovery_state' variable to remember the state of dropped tables and views. - Moved out code that drops database objects (stored procedures) from mysql_rm_db_internal() to drop_database_objects() for better code reuse. - Made mysql_rm_db_internal() global so that could be used by the ddl recovery code.	2021-05-19 22:54:13 +02:00
Monty	e3cfb7c803	MDEV-23844 Atomic DROP TABLE (single table) Logging logic: - Log tables, just before they are dropped, to the ddl log - After the last table for the statement is dropped, log an xid for the whole ddl log event In case of crash: - Remove first any active DROP TABLE events from the ddl log that matches xids found in binary log (this mean the drop was successful and was propery logged). - Loop over all active DROP TABLE events - Ensure that the table is completely dropped - Write a DROP TABLE entry to the binary log with the dropped tables. Other things: - Added code to ha_drop_table() to be able to tell the difference if a get_new_handler() failed because of out-of-memory or because the handler refused/was not able to create a a handler. This was needed to get sequences to work as sequences needs a share object to be passed to get_new_handler() - TC_LOG_BINLOG::recover() was changed to always collect Xid's from the binary log and always call ddl_log_close_binlogged_events(). This was needed to be able to collect DROP TABLE events with embedded Xid's (used by ddl log). - Added a new variable "$grep_script" to binlog filter to be able to find only rows that matches a regexp. - Had to adjust some test that changed because drop statements are a bit larger in the binary log than before (as we have to store the xid) Other things: - MDEV-25588 Atomic DDL: Binlog query event written upon recovery is corrupt fixed (in the original commit).	2021-05-19 22:54:12 +02:00
Monty	47010ccffa	MDEV-23842 Atomic RENAME TABLE - Major rewrite of ddl_log.cc and ddl_log.h - ddl_log.cc described in the beginning how the recovery works. - ddl_log.log has unique signature and is dynamic. It's easy to add more information to the header and other ddl blocks while still being able to execute old ddl entries. - IO_SIZE for ddl blocks is now dynamic. Can be changed without affecting recovery of old logs. - Code is more modular and is now usable outside of partition handling. - Renamed log file to dll_recovery.log and added option --log-ddl-recovery to allow one to specify the path & filename. - Added ddl_log_entry_phase[], number of phases for each DDL action, which allowed me to greatly simply set_global_from_ddl_log_entry() - Changed how strings are stored in log entries, which allows us to store much more information in a log entry. - ddl log is now always created at start and deleted on normal shutdown. This simplices things notable. - Added probes debug_crash_here() and debug_simulate_error() to simply crash testing and allow crash after a given number of times a probe is executed. See comments in debug_sync.cc and rename_table.test for how this can be used. - Reverting failed table and view renames is done trough the ddl log. This ensures that the ddl log is tested also outside of recovery. - Added helper function 'handler::needs_lower_case_filenames()' - Extend binary log with Q_XID events. ddl log handling is using this to check if a ddl log entry was logged to the binary log (if yes, it will be deleted from the log during ddl_log_close_binlogged_events() - If a DDL entry fails 3 time, disable it. This is to ensure that if we have a crash in ddl recovery code the server will not get stuck in a forever crash-restart-crash loop. mysqltest.cc changes: - --die will now replace $variables with their values - $error will contain the error of the last failed statement storage engine changes: - maria_rename() was changed to be more robust against crashes during rename.	2021-05-19 22:54:12 +02:00
Monty	d754d3d951	Less noise in the error log - Updated error messages for recovery - Changed printing of debug sync point information to make it fit 80 char	2021-05-19 22:54:12 +02:00
Monty	249263525f	Avoid creating the .frm file twice in some cases Other things: - Updated code comments & fixed indentation - Removed an old QQ (temporary) comment that does not apply anymore	2021-05-19 22:54:12 +02:00
Monty	a206658b98	Change CHARSET_INFO character set and collaction names to LEX_CSTRING This change removed 68 explict strlen() calls from the code. The following renames was done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything where mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There where compiler issues with ensuring that all character set names points to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers works fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Monty	b6ff139aa3	Reduce usage of strlen() Changes: - To detect automatic strlen() I removed the methods in String that uses 'const char ' without a length: - String::append(const char) - Binary_string(const char str) - String(const char str, CHARSET_INFO cs) - append_for_single_quote(const char ) All usage of append(const char) is changed to either use String::append(char), String::append(const char, size_t length) or String::append(LEX_CSTRING) - Added STRING_WITH_LEN() around constant string arguments to String::append() - Added overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows. - Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str. - Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING. - Changed for some functions the name argument from const char * to to const LEX_CSTRING &: - Item::Item_func_fix_attributes() - Item::check_type_...() - Type_std_attributes::agg_item_collations() - Type_std_attributes::agg_item_set_converter() - Type_std_attributes::agg_arg_charsets...() - Type_handler_hybrid_field_type::aggregate_for_result() - Type_handler_geometry::check_type_geom_or_binary() - Type_handler::Item_func_or_sum_illegal_param() - Predicant_to_list_comparator::add_value_skip_null() - Predicant_to_list_comparator::add_value() - cmp_item_row::prepare_comparators() - cmp_item_row::aggregate_row_elements_for_comparison() - Cursor_ref::print_func() - Removes String_space() as it was only used in one cases and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf(). - Added some const LEX_CSTRING's for common strings: - NULL_clex_str, DATA_clex_str, INDEX_clex_str. - Changed primary_key_name to a LEX_CSTRING - Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does. - Rename of protocol function: bool store(const char from, CHARSET_INFO cs) to bool store_string_or_null(const char from, CHARSET_INFO cs). This was done to both clarify the difference between this 'store' function and also to make it easier to find unoptimal usage of store() calls. - Added Protocol::store(const LEX_CSTRING, CHARSET_INFO) - Changed some 'const char' arrays to instead be of type LEX_CSTRING. - class Item_func_units now used LEX_CSTRING for name. Other things: - Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated. - Fixed a lot of instances where the length of the argument to append is known or easily obtain but was not used. - Removed some not needed 'virtual' definition for functions that was inherited from the parent. I added override to these. - Fixed Ordered_key::print() to preallocate needed buffer. Old code could case memory overruns. - Simplified some loops when adding char to a String with delimiters.	2021-05-19 22:27:48 +02:00
Alexey Botchkov	d2e2219e46	MDEV-25146 JSON_TABLE: Non-descriptive + wrong error messages upon trying to store array or object. More informative messages added. Do not issue additional messages if already handled.	2021-04-21 10:21:45 +04:00
Alexey Botchkov	e9fd327ee3	MDEV-17399 Add support for JSON_TABLE. The specific table handler for the table functions was introduced, and used to implement JSON_TABLE.	2021-04-21 10:21:43 +04:00
Marko Mäkelä	d2e2d32933	Merge 10.5 into 10.6	2021-04-14 12:32:27 +03:00
Marko Mäkelä	6c3e860cbf	Merge 10.4 into 10.5	2021-04-14 11:35:39 +03:00

... 7 8 9 10 11 ...

3175 Commits