row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.
ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().
ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.
btr_cur_update_alloc_zip(): Make the function public.
mysql_row_templ_t: Add clust_rec_field_no.
row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)
[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
This is a regression from the fix for bug no 38999. A storage engine capable
of reading only a subset of a table's columns updates corresponding bits in
the read buffer to signal that it has read NULL values for the corresponding
columns. It cannot, and should not, update any other bits. Bug no 38999
occurred because the implementation of UPDATE statements compare the NULL bits
using memcmp, inadvertently comparing bits that were never requested from the
storage engine. The regression was caused by the storage engine trying to
alleviate the situation by writing to all NULL bits, even those that it had no
knowledge of. This has devastating effects for the index merge algorithm,
which relies on all NULL bits, except those explicitly requested, being left
unchanged.
The fix reverts the fix for bug no 38999 in both InnoDB and InnoDB plugin and
changes the server's method of comparing records. For engines that always read
entire rows, we proceed as usual. For engines capable of reading only select
columns, the record buffers are now compared on a column by column basis. An
assertion was also added so that non comparable buffers are never read. Some
relevant copy-pasted code was also consolidated in a new function.
This is a simple optimization issue. All stats are related to only indexed
columns, index size or number of rows in the whole table. UPDATEs that touch
only non-indexed columns cannot affect stats and we can avoid calling the
function row_update_statistics_if_needed() which may result in unnecessary I/O.
Approved by: Marko (rb://466)
Fix compiler warning:
row/row0vers.c: In function 'row_vers_impl_x_locked_off_kernel':
row/row0vers.c:74:9: error: variable 'err' set but not used [-Werror=unused-but-set-variable]
Fix compiler warning:
row/row0umod.c: In function 'row_undo_mod_clust_low':
row/row0umod.c:117:9: error: variable 'success' set but not used [-Werror=unused-but-set-variable]
Fix compiler warning:
row/row0purge.c: In function 'row_purge_step':
row/row0purge.c:687:9: error: variable 'err' set but not used [-Werror=unused-but-set-variable]
Remove a bogus debug assertion that triggered the bug.
Add assertions precisely where records must not be delete-marked.
And a comment to clarify when the record is allowed to be delete-marked.
The bug is due to a double delete of a BLOB, once via:
rollback -> btr_cur_pessimistic_delete()
and the second time via purge.
The bug is in row_upd_clust_rec_by_insert(). There we relinquish ownership
of the non-updated BLOB columns in btr_cur_mark_extern_inherited_fields()
before building the row entry that will be inserted and whose contents will
be logged in the UNDO log. However, we don't set the BLOB column later to
INHERITED so that a possible rollback will not free the original row's
non-updated BLOB entries. This is because the condition that checks for
that is in :
if (node->upd_ext) {}.
node->upd_ext is non-NULL only if a BLOB column was updated and that column
is part of some key ordering (see row_upd_replace()). This results in the
non-update BLOB columns being deleted during a rollback and subsequently by
purge again.
rb://413
(READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED columns).
Records that lack incompletely written externally stored columns may
be accessed by READ UNCOMMITTED transaction even without involving a
crash during an INSERT or UPDATE operation. I verified this as follows.
(1) added a delay after the mini-transaction for writing the clustered
index 'stub' record was committed (patch attached)
(2) started mysqld in gdb, setting breakpoints to the where the
assertions about READ UNCOMMITTED were added in the bug fix
(3) invoked ibtest3 --create-options=key_block_size=2
to create BLOBs in a COMPRESSED table
(4) invoked the following:
yes 'set transaction isolation level read uncommitted;
checksum table blobt3;select sleep(1);'|mysql -uroot test
(5) noted that one of the breakpoints was triggered
(return(NULL) in btr_rec_copy_externally_stored_field())
=== modified file 'storage/innodb_plugin/row/row0ins.c'
--- storage/innodb_plugin/row/row0ins.c 2010-06-30 08:17:25 +0000
+++ storage/innodb_plugin/row/row0ins.c 2010-06-30 08:17:25 +0000
@@ -2120,6 +2120,7 @@ function_exit:
rec_t* rec;
ulint* offsets;
mtr_start(&mtr);
+ os_thread_sleep(5000000);
btr_cur_search_to_nth_level(index, 0, entry, PAGE_CUR_LE,
BTR_MODIFY_TREE, &cursor, 0,
=== modified file 'storage/innodb_plugin/row/row0upd.c'
--- storage/innodb_plugin/row/row0upd.c 2010-06-30 08:11:55 +0000
+++ storage/innodb_plugin/row/row0upd.c 2010-06-30 08:11:55 +0000
@@ -1763,6 +1763,7 @@ row_upd_clust_rec(
rec_offs_init(offsets_);
mtr_start(mtr);
+ os_thread_sleep(5000000);
ut_a(btr_pcur_restore_position(BTR_MODIFY_TREE, pcur, mtr));
rec = btr_cur_get_rec(btr_cur);
dict_table_get_format(index->table)
The REDUNDANT and COMPACT formats store a local 768-byte prefix of
each externally stored column. No row_ext cache is needed, but we
initialized one nevertheless. When the BLOB pointer was zero, we would
ignore the locally stored prefix as well. This triggered an assertion
failure in row_undo_mod_upd_exist_sec().
row_build(): Allow ext==NULL when a REDUNDANT or COMPACT table
contains externally stored columns.
row_undo_search_clust_to_pcur(), row_upd_store_row(): Invoke
row_build() with ext==NULL on REDUNDANT and COMPACT tables.
rb://382 approved by Jimmy Yang
columns
When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.
TODO: Add assertions for checking the above assumptions hold when an
incomplete BLOB is encountered.
btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.
row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE. Assert that
the record is not delete-marked.
row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.
row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.
rb://380 approved by Jimmy Yang
and clarifies the invariant in dict_table_get_on_id().
In Mar 2007 Marko observed a crash during recovery, the crash resulted from
an UNDO operation on a system table. His solution was to acquire an X lock on
the data dictionary, this in hindsight was an overkill. It is unclear what
caused the crash, current hypothesis is that it was a memory corruption.
The X lock results in performance issues by when undoing changes due to
rollback during normal operation on regular tables.
Why the change is safe:
======================
The InnoDB code has changed since the original X lock change was made. In the
new code we always lock the data dictionary in X mode during startup when
UNDOing operations on the system tables (this is a given). This ensures that
the crash Marko observed cannot happen as long as all transactions that update
the system tables follow the standard rules by setting the appropriate DICT_OP
flag when writing the log records when they make the changes.
If transactions violate the above mentioned rule then during recovery (at
startup) the rollback code (see trx0roll.c) will not acquire the X lock
and we will see the crash again. This will however be a different bug.
when renaming tables
Allocate the table name using ut_malloc() instead of table->heap because
the latter cannot be freed.
Adjust dict_sys->size calculations all over the code.
Change dict_table_t::name from const char* to char* because we need to
ut_malloc()/ut_free() it.
Reviewed by: Inaam, Marko, Heikki (rb://384)
Approved by: Heikki (rb://384)
(InnoDB plugin branch)
mysql-test/suite/innodb_plugin/r/innodb_mysql.result:
test case
mysql-test/suite/innodb_plugin/t/innodb_mysql.test:
test case
storage/innodb_plugin/row/row0sel.c:
init null bytes with default values as they might be
left uninitialized in some cases and these uninited bytes
might be copied into mysql record buffer that leads to
valgrind warnings on next use of the buffer.
In semi-consistent read, only unlock freshly locked non-matching records.
lock_rec_lock_fast(): Return LOCK_REC_SUCCESS,
LOCK_REC_SUCCESS_CREATED, or LOCK_REC_FAIL instead of TRUE/FALSE.
enum db_err: Add DB_SUCCESS_LOCKED_REC for indicating a successful
operation where a record lock was created.
lock_sec_rec_read_check_and_lock(),
lock_clust_rec_read_check_and_lock(), lock_rec_enqueue_waiting(),
lock_rec_lock_slow(), lock_rec_lock(), row_ins_set_shared_rec_lock(),
row_ins_set_exclusive_rec_lock(), sel_set_rec_lock(),
row_sel_get_clust_rec_for_mysql(): Return DB_SUCCESS_LOCKED_REC if a
new record lock was created. Adjust callers.
row_unlock_for_mysql(): Correct the function documentation.
row_prebuilt_t::new_rec_locks: Correct the documentation.
lock_rec_unlock(): Cache first_lock and rewrite while() loops as for().
btr_cur_optimistic_update(): Use common error handling return.
row_create_prebuilt(): Add Valgrind instrumentation.
Store the max_space_id in the data dictionary header in order to avoid
space_id reuse.
DICT_HDR_MIX_ID: Renamed to DICT_HDR_MAX_SPACE_ID, DICT_HDR_MIX_ID_LOW.
dict_hdr_get_new_id(): Return table_id, index_id, space_id or a subset of them.
fil_system_t: Add ibool space_id_reuse_warned.
fil_create_new_single_table_tablespace(): Get the space_id from the caller.
fil_space_create(): Issue a warning if the fil_system->max_assigned_id
is exceeded.
fil_assign_new_space_id(): Return TRUE/FALSE and take a pointer to the
space_id as a parameter. Make the function public.
fil_init(): Initialize all fil_system fields by mem_zalloc(). Remove
explicit initializations of certain fields to 0 or NULL.
------------------------------------------------------------
revno: 3094
revision-id: vasil.dimov@oracle.com-20100513074652-0cvlhgkesgbb2bfh
parent: vasil.dimov@oracle.com-20100512173700-byf8xntxjur1hqov
committer: Vasil Dimov <vasil.dimov@oracle.com>
branch nick: mysql-trunk-innodb
timestamp: Thu 2010-05-13 10:46:52 +0300
message:
Followup to Bug#51920, fix binlog.binlog_killed
This is a followup to the fix of
Bug#51920 Innodb connections in row lock wait ignore KILL until lock wait
timeout
in that fix (rb://279) the behavior was changed to honor when a trx is
interrupted during lock wait, but the returned error code was still
"lock wait timeout" when it should be "interrupted".
This change fixes the non-deterministically failing test binlog.binlog_killed,
that failed like this:
binlog.binlog_killed 'stmt' [ fail ]
Test ended at 2010-05-12 11:39:08
CURRENT_TEST: binlog.binlog_killed
mysqltest: At line 208: query 'reap' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0...
Approved by: Sunny Bains (rb://344)
------------------------------------------------------------
Also make InnoDB thinks that /*/ only starts a comment. (Bug #53644).
This fixes the bugs in the InnoDB Plugin.
ha_innodb.h: Use trx_query_string() instead of trx_query() when
available (MySQL 5.1.42 or later).
innobase_get_stmt(): New function, to retrieve the currently running
SQL statement.
struct trx_struct: Remove mysql_query_str. Use innobase_get_stmt() instead.
dict_strip_comments(): Add and observe the parameter sql_length. Treat
/*/ as the start of a comment.
dict_create_foreign_constraints(), row_table_add_foreign_constraints():
Add the parameter sql_length.
Bugfix for 53290, fast unique index creation fails on duplicate null values
Summary:
Bug in the fast index creation code incorrectly considers null
values to be duplicates during block merging. Innodb policy is that
multiple null values are allowed in a unique index. Null duplicates
were correctly ignored while sorting individual blocks and with slow
index creation.
Test Plan:
mtr, including new test, load dbs using deferred index creation
DiffCamp Revision: 110840
Reviewed By: mcallaghan
CC: mcallaghan, mysql-devel@lists
Revert Plan:
OK