Regression from bug#14621190 due to disabled optimistic restoration
of cursor, which required full key lookup instead of verifying
if previously positioned btree cursor could be reused.
Fixed by enable optimistic restore and adjust cursor afterward.
rb#3324 approved by Marko.
INDEX_READ_MAP HAD NO MATCH
If index_read_map is called for exact search and no matching records
exists it will position the cursor on the next record, but still having the
relative position to BTR_PCUR_ON.
This will make a call for index_next to read yet another next record,
instead of returning the record the cursor points to.
Fixed by setting pcur->rel_pos = BTR_PCUR_BEFORE if an exact
[prefix] search is done, but failed.
Also avoids optimistic restoration if rel_pos != BTR_PCUR_ON,
since btr_cur may be different than old_rec.
rb#3324, approved by Marko and Jimmy
DICT_TABLE_GET_FORMAT(CLUST_INDEX->TABLE) >= 1
The function row_sel_sec_rec_is_for_clust_rec() was incorrectly
preparing to compare a NULL column prefix in a secondary index with a
non-NULL column in a clustered index.
This can trigger an assertion failure in 5.1 plugin and later. In the
built-in InnoDB of MySQL 5.1 and earlier, we would apparently only do
some extra work, by trimming the clustered index field for the
comparison.
The code might actually have worked properly apart from this debug
assertion failure. It is merely doing some extra work in fetching a
BLOB column, and then comparing it to NULL (which would return the
same result, no matter what the BLOB contents is).
While the test case involves CHECK TABLE, this could theoretically
occur during any read that uses a secondary index on a column prefix
of a column that can be NULL.
rb#3101 approved by Mattias Jonsson
WITH AN ASSERTION
Recently we added check to handle kill query signal for long operating
queries.
While the query interruption is reported it must to ensure cursor is restore
to proper state for HANDLER interface to work correctly.
Normal select query will not face this problem, as on recieving interrupt,
select query is aborted and new select query result in re-initialization
(including cursor).
rb://1836. Approved by Marko.
LEN <= SIZEOF(ULONGLONG)
This bug was caught in the WL#6255 ALTER TABLE...ADD COLUMN in MySQL
5.6, but there is a bug in all InnoDB versions that support
auto-increment columns.
row_search_autoinc_read_column(): When reading the maximum value of
the auto-increment column, and the column only contains NULL values,
return 0. This corresponds to the case when the table is empty in
row_search_max_autoinc().
rb:1415 approved by Sunny Bains
SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE
LOOKUPS (honoring kill query while accessing sec_index)
If secondary index is being used for select query evaluation and this
query is operating with consistent read snapshot it might take good time for
secondary index to return back control to mysql as MVCC would kick in.
If user issues "kill query <id>" while query is actively accessing
secondary index it will not be honored as there is no hook to check
for this condition. Added hook for this check.
-----
Parallely secondary index taking too long to evaluate for consistent
read snapshot case is being examined for performance improvement. WL#6540.
SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE
LOOKUPS (honoring kill query while accessing sec_index)
If secondary index is being used for select query evaluation and this
query is operating with consistent read snapshot it might take good time for
secondary index to return back control to mysql as MVCC would kick in.
If user issues "kill query <id>" while query is actively accessing
secondary index it will not be honored as there is no hook to check
for this condition. Added hook for this check.
-----
Parallely secondary index taking too long to evaluate for consistent
read snapshot case is being examined for performance improvement. WL#6540.
The ha_innobase table handler contained two search key buffers
(srch_key_val1, srch_key_val2) of fixed size used to store the search
key. The size of these buffers where fixed at
REC_VERSION_56_MAX_INDEX_COL_LEN + 2. But this size is not sufficient
to hold the search key. Hence the following assert in
row_sel_convert_mysql_key_to_innobase() failed.
2438 /* Storing may use at most data_len bytes of buf */
2439
2440 if (UNIV_LIKELY(!is_null)) {
2441 ut_a(buf + data_len <= original_buf + buf_len);
2442 row_mysql_store_col_in_innobase_format(
2443 dfield, buf,
2444 FALSE, /* MySQL key value format col */
2445 key_ptr + data_offset, data_len,
2446 dict_table_is_comp(index->table));
2447 buf += data_len;
2448 }
The buffer size is now calculated with the formula
MAX_KEY_LENGTH + MAX_REF_PARTS*2. This properly takes into account
the extra bytes needed to store the length for each column. An index
can contain a maximum of MAX_REF_PARTS columns in it, and for each
column 2 bytes are needed to store length.
rb://1238 approved by Marko and Vasil Dimov.
print page dump
buf_page_print(): Remove the ut_ad(0) from the beginning. Add two flags
(enum buf_page_print_flags) that can be bitwise-ORed together:
BUF_PAGE_PRINT_NO_CRASH:
Do not crash debug builds at the end of buf_page_print().
BUF_PAGE_PRINT_NO_FULL:
Do not print the full page dump. This can be useful when adding
diagnostic printout to flushing or to the doublewrite buffer.
trx_sys_doublewrite_init_or_restore_page(): Replace exit(1) with ut_error,
so that we can get a core dump if this extraordinary condition happens.
rb:924 approved by Sunny Bains
This fix does not remove the underlying cause of the assertion
failure. It just works around the problem, allowing a corrupted
secondary index to be fixed by DROP INDEX and CREATE INDEX (or in the
worst case, by re-creating the table).
ibuf_delete(): If the record to be purged is the last one in the page
or it is not delete-marked, refuse to purge it. Instead, write an
error message to the error log and let a debug assertion fail.
ibuf_set_del_mark(): If the record to be delete-marked is not found,
display some more information in the error log and let a debug
assertion fail.
row_undo_mod_del_unmark_sec_and_undo_update(),
row_upd_sec_index_entry(): Let a debug assertion fail when the record
to be delete-marked is not found.
buf_page_print(): Add ut_ad(0) so that corruption will be more
prominent in stress testing with debug binaries. Add ut_ad(0) here and
there where corruption is noticed.
btr_corruption_report(): Display some data on page_is_comp() mismatch.
btr_assert_not_corrupted(): A wrapper around btr_corruption_report().
Assert that page_is_comp() agrees with the table flags.
rb:911 approved by Inaam Rana
CREATE TABLE bug13510739 (c INTEGER NOT NULL, PRIMARY KEY (c)) ENGINE=INNODB;
INSERT INTO bug13510739 VALUES (1), (2), (3), (4);
DELETE FROM bug13510739 WHERE c=2;
HANDLER bug13510739 OPEN;
HANDLER bug13510739 READ `primary` = (2);
HANDLER bug13510739 READ `primary` NEXT; <-- crash
The bug is that in the particular testcase row_search_for_mysql() picked up
a delete-marked record and quit, leaving the cursor non-positioned state and
on the subsequent 'get next' call the code crashed because of the
non-positioned cursor.
In row0sel.cc (line numbers from mysql-trunk):
4653 if (rec_get_deleted_flag(rec, comp)) {
...
4679 if (index == clust_index && unique_search) {
4680
4681 err = DB_RECORD_NOT_FOUND;
4682
4683 goto normal_return;
4684 }
it quit from here, not storing the cursor position.
In contrast, if the record=2 is not found at all (e.g. sleep(1) after DELETE
to let the purge wipe it away completely) then 'get = 2' does find record=3
and quits from here:
4366 if (0 != cmp_dtuple_rec(search_tuple, rec, offsets)) {
...
4394 btr_pcur_store_position(pcur, &mtr);
4395
4396 err = DB_RECORD_NOT_FOUND;
4397 #if 0
4398 ut_print_name(stderr, trx, FALSE, index->name);
4399 fputs(" record not found 3\n", stderr);
4400 #endif
4401
4402 goto normal_return;
Another fix could be to extend the condition on line 4366 to hold only if
seach_tuple matches rec AND if rec is not delete marked.
Notice that in the above test case if we wait about 1 second somewhere after
DELETE and before 'get = 2', then the testcase does not crash and returns 4
instead. Not sure if this is the correct behavior, but this bugfix removes
the crash and makes the code return what it also returns in the non-crashing
case (if rec=2 is not found during 'get = 2', e.g. we have sleep(1) there).
Approved by: Marko (http://bur03.no.oracle.com/rb/r/863/)
Also addressed issues in bug #11745133, where we could mark a table
corrupted instead of crashing the server when found a corrupted buffer/page
if the table created with innodb_file_per_table on.
row_sel_field_store_in_mysql_format(): Do not pad the unused part of
the buffer reserved for a True VARCHAR column (introduced in 5.0.3).
Add Valgrind instrumentation ensuring that the unused part will be
flagged uninitialized.
row_sel_copy_cached_field_for_mysql(): New function: Copy a field
that is in the MySQL row format, not copying the unused tail of
VARCHAR columns.
row_sel_pop_cached_row_for_mysql(): Invoke
row_sel_copy_cached_field_for_mysql() for copying fields.
When the row is long, copy it field-by-field.
rb:715 approved by Inaam Rana
With this change, the index prefix column length lifted from 767 bytes
to 3072 bytes if "innodb_large_prefix" is set to "true".
rb://603 approved by Marko
Remove most references to thread id in InnoDB. Three references
remain: the current holder of a mutex, and the current x-lock holder
of a rw-lock, and some references in UNIV_SYNC_DEBUG checks. This
allows MySQL to change the thread associated to a client connection.
Tighten the UNIV_SYNC_DEBUG checks, trying to ensure that no InnoDB
mutex or x-lock is being held when returning control to MySQL. The
only semaphore that may be held is the btr_search_latch in shared mode.
sync_thread_levels_empty_except_dict(): A wrapper for
sync_thread_levels_empty_gen(TRUE).
sync_thread_levels_nonempty_trx(): Check that the current thread is
not holding any InnoDB semaphores, except btr_search_latch if
trx->has_search_latch.
sync_thread_levels_empty(): Unused function; remove.
trx_t: Remove mysql_thread_id and mysql_process_no.
srv_slot_t: Remove id and handle.
row_search_for_mysql(), srv_conc_enter_innodb(),
srv_conc_force_enter_innodb(), srv_conc_force_exit_innodb(),
srv_conc_exit_innodb(), srv_suspend_mysql_thread: Assert
!sync_thread_levels_nonempty_trx().
rb:634 approved by Sunny Bains
Additional fixes in 5.5:
ibuf_set_del_mark(): Add diagnostics when setting a buffered delete-mark fails.
ibuf_delete(): Correct a misleading comment about non-found records.
rec_print(): Add a const qualifier to the index parameter.
Bug #56680 wrong InnoDB results from a case-insensitive covering index
row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.
ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().
ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.
btr_cur_update_alloc_zip(): Make the function public.
mysql_row_templ_t: Add clust_rec_field_no.
row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)
[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.
REC_INFO_DELETED_FLAG: Move the definition from rem0rec.ic to rem0rec.h.
ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().
ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.
mysql_row_templ_t: Add clust_rec_field_no.
row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)
buf_LRU_free_block(): Refactored from
buf_LRU_search_and_free_block(). This is needed for the
innodb_change_buffering_debug diagnostics.
[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
This is the 5.5 version of the fix. The 5.1 version was too complicated to
merge and was null merged.
This is a regression from the fix for bug no 38999. A storage engine capable
of reading only a subset of a table's columns updates corresponding bits in
the read buffer to signal that it has read NULL values for the corresponding
columns. It cannot, and should not, update any other bits. Bug no 38999
occurred because the implementation of UPDATE statements compare the NULL bits
using memcmp, inadvertently comparing bits that were never requested from the
storage engine. The regression was caused by the storage engine trying to
alleviate the situation by writing to all NULL bits, even those that it had no
knowledge of. This has devastating effects for the index merge algorithm,
which relies on all NULL bits, except those explicitly requested, being left
unchanged.
The fix reverts the fix for bug no 38999 in both InnoDB and InnoDB plugin and
changes the server's method of comparing records. For engines that always read
entire rows, we proceed as usual. For engines capable of reading only select
columns, the record buffers are now compared on a column by column basis. An
assertion was also added so that non comparable buffers are never read. Some
relevant copy-pasted code was also consolidated in a new function.
This is a regression from the fix for bug no 38999. A storage engine capable
of reading only a subset of a table's columns updates corresponding bits in
the read buffer to signal that it has read NULL values for the corresponding
columns. It cannot, and should not, update any other bits. Bug no 38999
occurred because the implementation of UPDATE statements compare the NULL bits
using memcmp, inadvertently comparing bits that were never requested from the
storage engine. The regression was caused by the storage engine trying to
alleviate the situation by writing to all NULL bits, even those that it had no
knowledge of. This has devastating effects for the index merge algorithm,
which relies on all NULL bits, except those explicitly requested, being left
unchanged.
The fix reverts the fix for bug no 38999 in both InnoDB and InnoDB plugin and
changes the server's method of comparing records. For engines that always read
entire rows, we proceed as usual. For engines capable of reading only select
columns, the record buffers are now compared on a column by column basis. An
assertion was also added so that non comparable buffers are never read. Some
relevant copy-pasted code was also consolidated in a new function.
revno: 3545
revision-id: marko.makela@oracle.com-20100818110110-zfs0i1vfrccfb4yw
parent: vasil.dimov@oracle.com-20100817193934-1yl7zz2odikxauf8
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-08-18 14:01:10 +0300
message:
Bug#55626: MIN and MAX reading a delete-marked record from secondary index
Remove a bogus debug assertion that triggered the bug.
Add assertions precisely where records must not be delete-marked.
And a comment to clarify when the record is allowed to be delete-marked.
------------------------------------------------------------
revno: 3533
revision-id: marko.makela@oracle.com-20100630093149-wmc37t128gic933v
parent: marko.makela@oracle.com-20100629131219-pjbkpk5rsqztmw27
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Wed 2010-06-30 12:31:49 +0300
message:
Correct some comments that were added in the fix of Bug #54358
(READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED columns).
Records that lack incompletely written externally stored columns may
be accessed by READ UNCOMMITTED transaction even without involving a
crash during an INSERT or UPDATE operation. I verified this as follows.
(1) added a delay after the mini-transaction for writing the clustered
index 'stub' record was committed (patch attached)
(2) started mysqld in gdb, setting breakpoints to the where the
assertions about READ UNCOMMITTED were added in the bug fix
(3) invoked ibtest3 --create-options=key_block_size=2
to create BLOBs in a COMPRESSED table
(4) invoked the following:
yes 'set transaction isolation level read uncommitted;
checksum table blobt3;select sleep(1);'|mysql -uroot test
(5) noted that one of the breakpoints was triggered
(return(NULL) in btr_rec_copy_externally_stored_field())
=== modified file 'storage/innodb_plugin/row/row0ins.c'
--- storage/innodb_plugin/row/row0ins.c 2010-06-30 08:17:25 +0000
+++ storage/innodb_plugin/row/row0ins.c 2010-06-30 08:17:25 +0000
@@ -2120,6 +2120,7 @@ function_exit:
rec_t* rec;
ulint* offsets;
mtr_start(&mtr);
+ os_thread_sleep(5000000);
btr_cur_search_to_nth_level(index, 0, entry, PAGE_CUR_LE,
BTR_MODIFY_TREE, &cursor, 0,
=== modified file 'storage/innodb_plugin/row/row0upd.c'
--- storage/innodb_plugin/row/row0upd.c 2010-06-30 08:11:55 +0000
+++ storage/innodb_plugin/row/row0upd.c 2010-06-30 08:11:55 +0000
@@ -1763,6 +1763,7 @@ row_upd_clust_rec(
rec_offs_init(offsets_);
mtr_start(mtr);
+ os_thread_sleep(5000000);
ut_a(btr_pcur_restore_position(BTR_MODIFY_TREE, pcur, mtr));
rec = btr_cur_get_rec(btr_cur);
------------------------------------------------------------
revno: 3529
revision-id: marko.makela@oracle.com-20100629125518-m3am4ia1ffjr0d0j
parent: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb
committer: Marko Mäkelä <marko.makela@oracle.com>
branch nick: 5.1-innodb
timestamp: Tue 2010-06-29 15:55:18 +0300
message:
Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED
columns
When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.
TODO: Add assertions for checking the above assumptions hold when an
incomplete BLOB is encountered.
btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.
row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE. Assert that
the record is not delete-marked.
row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.
row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.
rb://380 approved by Jimmy Yang
Merge and adjust a forgotten change to fix this bug.
rb://393 approved by Jimmy Yang
------------------------------------------------------------------------
r3794 | marko | 2009-01-07 14:14:53 +0000 (Wed, 07 Jan 2009) | 18 lines
branches/6.0: Allow the minimum length of a multi-byte character to be
up to 4 bytes. (Bug #35391)
dtype_t, dict_col_t: Replace mbminlen:2, mbmaxlen:3 with mbminmaxlen:5.
In this way, the 5 bits can hold two values of 0..4, and the storage size
of the fields will not cross the 64-bit boundary. Encode the values as
DATA_MBMAX * mbmaxlen + mbminlen. Define the auxiliary macros
DB_MBMINLEN(mbminmaxlen), DB_MBMAXLEN(mbminmaxlen), and
DB_MINMAXLEN(mbminlen, mbmaxlen).
Try to trim and pad UTF-16 and UTF-32 with spaces as appropriate.
Alexander Barkov suggested the use of cs->cset->fill(cs, buff, len, 0x20).
ha_innobase::store_key_val_for_row() now does that, but the added function
row_mysql_pad_col() does not, because it doesn't have the MySQL TABLE object.
rb://49 approved by Heikki Tuuri
------------------------------------------------------------------------
------------------------------------------------------------
revno: 3506
revision-id: sergey.glukhov@sun.com-20100609121718-04mpk5kjxvnrxdu8
parent: sergey.glukhov@sun.com-20100609120734-ndy2281wau9067zv
committer: Sergey Glukhov <Sergey.Glukhov@sun.com>
branch nick: mysql-5.1-innodb
timestamp: Wed 2010-06-09 16:17:18 +0400
message:
Bug#38999 valgrind warnings for update statement in function compare_record()
(InnoDB plugin branch)
@ mysql-test/suite/innodb_plugin/r/innodb_mysql.result
test case
@ mysql-test/suite/innodb_plugin/t/innodb_mysql.test
test case
@ storage/innodb_plugin/row/row0sel.c
init null bytes with default values as they might be
left uninitialized in some cases and these uninited bytes
might be copied into mysql record buffer that leads to
valgrind warnings on next use of the buffer.
Valgrind warning happpens because of uninitialized null bytes.
In row_sel_push_cache_row_for_mysql() function we fill fetch cache
with necessary field values, row_sel_store_mysql_rec() is called
for this and leaves null bytes untouched.
Later row_sel_pop_cached_row_for_mysql() rewrites table record
buffer with uninited null bytes. We can see the problem from the
test case:
At 'SELECT...' we call row_sel_push...->row_sel_store...->row_sel_pop_cached...
chain which rewrites table->record[0] buffer with uninitialized null bytes.
When we call 'UPDATE...' statement, compare_record uses this buffer and
valgrind warning occurs.
The fix is to init null bytes with default values.
mysql-test/suite/innodb/r/innodb_mysql.result:
test case
mysql-test/suite/innodb/t/innodb_mysql.test:
test case
mysql-test/t/ps_3innodb.test:
enable valgrind testing
storage/innobase/row/row0sel.c:
init null bytes with default values as they might be
left uninitialized in some cases and these uninited bytes
might be copied into mysql record buffer that leads to
valgrind warnings on next use of the buffer.