We implement an idea that was suggested by Michael 'Monty' Widenius
in October 2017: When InnoDB is inserting into an empty table or partition,
we can write a single undo log record TRX_UNDO_EMPTY, which will cause
ROLLBACK to clear the table.
For this to work, the insert into an empty table or partition must be
covered by an exclusive table lock that will be held until the transaction
has been committed or rolled back, or the INSERT operation has been
rolled back (and the table is empty again), in lock_table_x_unlock().
Clustered index records that are covered by the TRX_UNDO_EMPTY record
will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot
be distinguished from what MDEV-12288 leaves behind after purging the
history of row-logged operations.
Concurrent non-locking reads must be adjusted: If the read view was
created before the INSERT into an empty table, then we must continue
to imagine that the table is empty, and not try to read any records.
If the read view was created after the INSERT was committed, then
all records must be visible normally. To implement this, we introduce
the field dict_table_t::bulk_trx_id.
This special handling only applies to the very first INSERT statement
of a transaction for the empty table or partition. If a subsequent
statement in the transaction is modifying the initially empty table again,
we must enable row-level undo logging, so that we will be able to
roll back to the start of the statement in case of an error (such as
duplicate key).
INSERT IGNORE will continue to use row-level logging and locking, because
implementing it would require the ability to roll back the latest row.
Since the undo log that we write only allows us to roll back the entire
statement, we cannot support INSERT IGNORE. We will introduce a
handler::extra() parameter HA_EXTRA_IGNORE_INSERT to indicate to storage
engines that INSERT IGNORE is being executed.
In many test cases, we add an extra record to the table, so that during
the 'interesting' part of the test, row-level locking and logging will
be used.
Replicas will continue to use row-level logging and locking until
MDEV-24622 has been addressed. Likewise, this optimization will be
disabled in Galera cluster until MDEV-24623 enables it.
dict_table_t::bulk_trx_id: The latest active or committed transaction
that initiated an insert into an empty table or partition.
Protected by exclusive table lock and a clustered index leaf page latch.
ins_node_t::bulk_insert: Whether bulk insert was initiated.
trx_t::mod_tables: Use C++11 style accessors (emplace instead of insert).
Unlike earlier, this collection will cover also temporary tables.
trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
is_bulk_insert(), was_bulk_insert().
trx_undo_report_row_operation(): Before accessing any undo log pages,
invoke trx->mod_tables.emplace() in order to determine whether undo
logging was disabled, or whether this is the first INSERT and we are
supposed to write a TRX_UNDO_EMPTY record.
row_ins_clust_index_entry_low(): If we are inserting into an empty
clustered index leaf page, set the ins_node_t::bulk_insert flag for
the subsequent trx_undo_report_row_operation() call.
lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
Remove the redundant parameter 'flags' that can be checked in the caller.
btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
DB_TRX_ID,DB_ROLL_PTR after invoking trx_undo_report_row_operation().
trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
the next statement will not be covered by table-level undo logging.
ReadView::changes_visible(trx_id_t) const: New accessor for the case
where the trx_id_t is not read from a potentially corrupted index page
but directly from the memory. In this case, we can skip a sanity check.
row_sel(), row_sel_try_search_shortcut(), row_search_mvcc():
row_sel_try_search_shortcut_for_mysql(),
row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.
row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().
lock_sec_rec_cons_read_sees(): Replaced with lower-level code.
btr_root_page_init(): Refactored from btr_create().
dict_index_t::clear(), dict_table_t::clear(): Empty an index or table,
for the ROLLBACK of an INSERT operation.
ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
into an empty table.
This is joint work with Thirunarayanan Balathandayuthapani,
who created a working prototype.
Thanks to Matthias Leich for extensive testing.
The regression that was introduced in
commit 723f87e9d3
was fixed as part of MDEV-13333
(commit 3b37edee1a)
without a test case, because the MDEV-13333 test case
is even less deterministic than these ones.
Implementation of MDEV-7660 introduced unwanted incompatible change:
modifications under LOCK TABLES with autocommit enabled are rolled back on
disconnect. Previously everything was committed, because LOCK TABLES didn't
adjust autocommit setting.
This patch restores original behavior by reverting some changes done in
MDEV-7660:
- sql/sql_parse.cc: do not reset autocommit on LOCK TABLES
- sql/sql_base.cc: do not set autocommit on UNLOCK TABLES
- test cases: main.lock_tables_lost_commit, main.partition_explicit_prune,
rpl.rpl_switch_stm_row_mixed, tokudb.nested_txn_implicit_commit,
tokudb_bugs.db806
But it makes InnoDB tables under LOCK TABLES ... READ [LOCAL] not protected
against DML. To restore protection some changes from WL#6671 were merged,
specifically MDL_SHARED_READ_ONLY and test cases.
WL#6671 merge highlights:
- Not all tests merged.
- In MySQL LOCK TABLES ... READ acquires MDL_SHARED_READ_ONLY for all engines,
in MariaDB MDL_SHARED_READ is always acquired first and then upgraded to
MDL_SHARED_READ_ONLY for InnoDB only.
- The above allows us to omit MDL_SHARED_WRITE_LOW_PRIO implementation in
MariaDB, which is rather useless with InnoDB. In MySQL it is needed to
preserve locking behavior between low priority writes and LOCK TABLES ... READ
for non-InnoDB engines (covered by sys_vars.sql_low_priority_updates_func).
- Omitted HA_NO_READ_LOCAL_LOCK, we rely on lock_count() instead.
- Omitted "piglets": in MariaDB stream of DML against InnoDB table may lead to
concurrent LOCK TABLES ... READ starvation.
- HANDLER ... OPEN acquires MDL_SHARED_READ instead of MDL_SHARED in MariaDB.
- Omitted SNRW->X MDL lock upgrade for IMPORT/DISCARD TABLESPAECE under LOCK
TABLES.
- Omitted strong locks for views, triggers and SP under LOCK TABLES.
- Omitted IX schema lock for LOCK TABLES READ.
- Omitted deadlock weight juggling for LOCK TABLES.
Full WL#6671 merge status:
- innodb.innodb-lock: fully merged
- main.alter_table: not merged due to different HANDLER solution
- main.debug_sync: fully merged
- main.handler_innodb: not merged due to different HANDLER solution
- main.handler_myisam: not merged due to different HANDLER solution
- main.innodb_mysql_lock: fully merged
- main.insert_notembedded: fully merged
- main.lock: not merged (due to no strong locks for views)
- main.lock_multi: not merged
- main.lock_sync: fully merged (partially in MDEV-7660)
- main.mdl_sync: not merged
- main.partition_debug_sync: not merged due to different HANDLER solution
- main.status: fully merged
- main.view: fully merged
- perfschema.mdl_func: not merged (no such test in MariaDB)
- perfschema.table_aggregate_global_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_global_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_global_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_global_4u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_hist_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_hist_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_hist_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_hist_4u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_thread_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_thread_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_thread_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_aggregate_thread_4u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_global_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_global_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_global_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_global_4u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_hist_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_hist_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_hist_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_hist_4u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_thread_2u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_thread_2u_3t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_thread_4u_2t: not merged (didn't fail in MariaDB)
- perfschema.table_lock_aggregate_thread_4u_3t: not merged (didn't fail in MariaDB)
- sys_vars.sql_low_priority_updates_func: not merged
- include/thr_rwlock.h: not merged, rw_pr_lock_assert_write_owner and
rw_pr_lock_assert_not_write_owner are macros in MariaDB
- sql/handler.h: not merged (HA_NO_READ_LOCAL_LOCK)
- sql/mdl.cc: partially merged (MDL_SHARED_READ_ONLY only)
- sql/mdl.h: partially merged (MDL_SHARED_READ_ONLY only)
- sql/lock.cc: fully merged
- sql/sp_head.cc: not merged
- sql/sp_head.h: not merged
- sql/sql_base.cc: partially merged (MDL_SHARED_READ_ONLY only)
- sql/sql_base.h: not merged
- sql/sql_class.cc: fully merged
- sql/sql_class.h: fully merged
- sql/sql_handler.cc: merged partially (different solution in MariaDB)
- sql/sql_parse.cc: partially merged, mostly omitted low priority write part
- sql/sql_reload.cc: not merged comment change
- sql/sql_table.cc: not merged SNRW->X upgrade for IMPORT/DISCARD TABLESPACE
- sql/sql_view.cc: not merged
- sql/sql_yacc.yy: not merged (MDL_SHARED_WRITE_LOW_PRIO, MDL_SHARED_READ_ONLY)
- sql/table.cc: not merged (MDL_SHARED_WRITE_LOW_PRIO)
- sql/table.h: not merged (MDL_SHARED_WRITE_LOW_PRIO)
- sql/trigger.cc: not merged
- storage/innobase/handler/ha_innodb.cc: merged store_lock()/lock_count()
changes (in MDEV-7660), didn't merge HA_NO_READ_LOCAL_LOCK
- storage/innobase/handler/ha_innodb.h: fully merged in MDEV-7660
- storage/myisammrg/ha_myisammrg.cc: not merged comment change
- storage/perfschema/table_helper.cc: not merged (no MDL support in MariaDB PFS)
- unittest/gunit/mdl-t.cc: not merged
- unittest/gunit/mdl_sync-t.cc: not merged
MariaDB specific changes:
- handler.heap: different HANDLER solution, MDEV-7660
- handler.innodb: different HANDLER solution, MDEV-7660
- handler.interface: different HANDLER solution, MDEV-7660
- handler.myisam: different HANDLER solution, MDEV-7660
- main.mdl_sync: MDEV-7660 specific changes
- main.partition_debug_sync: removed test due to different HANDLER solution,
MDEV-7660
- main.truncate_coverage: removed test due to different HANDLER solution,
MDEV-7660
- mysql-test/include/mtr_warnings.sql: additional cleanup, MDEV-7660
- mysql-test/lib/v1/mtr_report.pl: additional cleanup, MDEV-7660
- plugin/metadata_lock_info/metadata_lock_info.cc: not in MySQL
- sql/sql_handler.cc: MariaDB specific fix for mysql_ha_read(), MDEV-7660
create table t1 (a smallint primary key auto_increment);
insert into t1 values(32767);
insert into t1 values(NULL);
ERROR 1062 (23000): Duplicate entry '32767' for key 'PRIMARY
Now on always gets error HA_ERR_AUTOINC_RANGE=167 "Out of range value for column", independent of
store engine, SQL Mode or number of inserted rows. This is an unique error that is easier to test for in replication.
Another bug fix is that we now get an error when trying to insert a too big auto-generated value, even in non-strict mode.
Before one get insted the max column value inserted.
This patch also fixes some issues with inserting negative numbers in an auto-increment column.
Fixed the ER_DUP_ENTRY and HA_ERR_AUTOINC_ERANGE are compared the same between master and slave.
This ensures that replication works between an old server to a new slave for auto-increment overflow errors.
Added SQLSTATE errors for handler errors
Smaller bug fixes:
* Added warnings for duplicate key errors when using INSERT IGNORE
* Fixed bug when using --skip-log-bin followed by --log-bin, which did set log-bin to "0"
* Allow one to see how cmake is called by using --just-print --just-configure
BUILD/FINISH.sh:
--just-print --just-configure now shows how cmake would be invoked. Good for understanding parameters to cmake.
cmake/configure.pl:
--just-print --just-configure now shows how cmake would be invoked. Good for understanding parameters to cmake.
include/CMakeLists.txt:
Added handler_state.h
include/handler_state.h:
SQLSTATE for handler error messages.
Required for HA_ERR_AUTOINC_ERANGE, but solves also some other cases.
mysql-test/extra/binlog_tests/binlog.test:
Fixed old wrong behaviour
Added more tests
mysql-test/extra/binlog_tests/binlog_insert_delayed.test:
Reset binary log to only print what's necessary in show_binlog_events
mysql-test/extra/rpl_tests/rpl_auto_increment.test:
Update to new error codes
mysql-test/extra/rpl_tests/rpl_insert_delayed.test:
Ignore warnings as this depends on how the test is run
mysql-test/include/strict_autoinc.inc:
On now gets an error on overflow
mysql-test/r/auto_increment.result:
Update results after fixing error message
mysql-test/r/auto_increment_ranges_innodb.result:
Test new behaviour
mysql-test/r/auto_increment_ranges_myisam.result:
Test new behaviour
mysql-test/r/commit_1innodb.result:
Added warnings for duplicate key error
mysql-test/r/create.result:
Added warnings for duplicate key error
mysql-test/r/insert.result:
Added warnings for duplicate key error
mysql-test/r/insert_select.result:
Added warnings for duplicate key error
mysql-test/r/insert_update.result:
Added warnings for duplicate key error
mysql-test/r/mix2_myisam.result:
Added warnings for duplicate key error
mysql-test/r/myisam_mrr.result:
Added warnings for duplicate key error
mysql-test/r/null_key.result:
Added warnings for duplicate key error
mysql-test/r/replace.result:
Update to new error codes
mysql-test/r/strict_autoinc_1myisam.result:
Update to new error codes
mysql-test/r/strict_autoinc_2innodb.result:
Update to new error codes
mysql-test/r/strict_autoinc_3heap.result:
Update to new error codes
mysql-test/r/trigger.result:
Added warnings for duplicate key error
mysql-test/r/xtradb_mrr.result:
Added warnings for duplicate key error
mysql-test/suite/binlog/r/binlog_innodb_row.result:
Updated result
mysql-test/suite/binlog/r/binlog_row_binlog.result:
Out of range data for auto-increment is not inserted anymore
mysql-test/suite/binlog/r/binlog_statement_insert_delayed.result:
Updated result
mysql-test/suite/binlog/r/binlog_stm_binlog.result:
Out of range data for auto-increment is not inserted anymore
mysql-test/suite/binlog/r/binlog_unsafe.result:
Updated result
mysql-test/suite/innodb/r/innodb-autoinc.result:
Update to new error codes
mysql-test/suite/innodb/r/innodb-lock.result:
Updated results
mysql-test/suite/innodb/r/innodb.result:
Updated results
mysql-test/suite/innodb/r/innodb_bug56947.result:
Updated results
mysql-test/suite/innodb/r/innodb_mysql.result:
Updated results
mysql-test/suite/innodb/t/innodb-autoinc.test:
Update to new error codes
mysql-test/suite/maria/maria3.result:
Updated result
mysql-test/suite/maria/mrr.result:
Updated result
mysql-test/suite/optimizer_unfixed_bugs/r/bug43617.result:
Updated result
mysql-test/suite/rpl/r/rpl_auto_increment.result:
Update to new error codes
mysql-test/suite/rpl/r/rpl_insert_delayed,stmt.rdiff:
Updated results
mysql-test/suite/rpl/r/rpl_loaddatalocal.result:
Updated results
mysql-test/t/auto_increment.test:
Update to new error codes
mysql-test/t/auto_increment_ranges.inc:
Test new behaviour
mysql-test/t/auto_increment_ranges_innodb.test:
Test new behaviour
mysql-test/t/auto_increment_ranges_myisam.test:
Test new behaviour
mysql-test/t/replace.test:
Update to new error codes
mysys/my_getopt.c:
Fixed bug when using --skip-log-bin followed by --log-bin, which did set log-bin to "0"
sql/handler.cc:
Ignore negative values for signed auto-increment columns
Always give an error if we get an overflow for an auto-increment-column (instead of inserting the max value)
Ensure that the row number is correct for the out-of-range-value error message.
******
Fixed wrong printing of column namn for "Out of range value" errors
Fixed that INSERT_ID is correctly replicated also for out-of-range autoincrement values
Fixed that print_keydup_error() can also be used to generate warnings
******
Return HA_ERR_AUTOINC_ERANGE (167) instead of ER_WARN_DATA_OUT_OF_RANGE for auto-increment overflow
sql/handler.h:
Allow INSERT IGNORE to continue also after out-of-range inserts.
Fixed that print_keydup_error() can also be used to generate warnings
sql/log_event.cc:
Added DBUG_PRINT
Fixed the ER_AUTOINC_READ_FAILED, ER_DUP_ENTRY and HA_ERR_AUTOINC_ERANGE are compared the same between master and slave.
This ensures that replication works between an old server to a new slave for auto-increment overflow errors.
sql/sql_insert.cc:
Add warnings for duplicate key errors when using INSERT IGNORE
sql/sql_state.c:
Added handler errors
sql/sql_table.cc:
Update call to print_keydup_error()
storage/innobase/handler/ha_innodb.cc:
Fixed increment handling of auto-increment columns to be consistent with rest of MariaDB.
storage/xtradb/handler/ha_innodb.cc:
Fixed increment handling of auto-increment columns to be consistent with rest of MariaDB.
The bug was accidentally fixed by fixing
Bug#11759688 52020: InnoDB can still deadlock on just INSERT...ON DUPLICATE KEY
a.k.a. the reintroduction of
Bug#7975 deadlock without any locking, simple select and update