compressed pages
After loading a compressed-only page in buf_page_get_gen() we allocate a new
block for decompression. The problem is that the compressed page is neither
buffer-fixed nor I/O-fixed by the time we call buf_LRU_get_free_block(),
so it may end up being evicted and returned back as a new block.
buf_page_get_gen(): Temporarily buffer-fix the compressed-only block
while allocating memory for an uncompressed page frame.
This should prevent this form of the infinite loop, which is more likely
with a small innodb_buffer_pool_size.
rb#2511 approved by Jimmy Yang, Sunny Bains
DICT_TABLE_GET_FORMAT(CLUST_INDEX->TABLE) >= 1
The function row_sel_sec_rec_is_for_clust_rec() was incorrectly
preparing to compare a NULL column prefix in a secondary index with a
non-NULL column in a clustered index.
This can trigger an assertion failure in 5.1 plugin and later. In the
built-in InnoDB of MySQL 5.1 and earlier, we would apparently only do
some extra work, by trimming the clustered index field for the
comparison.
The code might actually have worked properly apart from this debug
assertion failure. It is merely doing some extra work in fetching a
BLOB column, and then comparing it to NULL (which would return the
same result, no matter what the BLOB contents is).
While the test case involves CHECK TABLE, this could theoretically
occur during any read that uses a secondary index on a column prefix
of a column that can be NULL.
rb#3101 approved by Mattias Jonsson
There was a race condition in the rollback of TRX_UNDO_UPD_DEL_REC.
Once row_undo_mod_clust() has rolled back the changes by the rolling-back
transaction, it attempts to purge the delete-marked record, if possible, in a
separate mini-transaction.
However, row_undo_mod_remove_clust_low() fails to check if the DB_TRX_ID of
the record that it found after repositioning the cursor, is still the same.
If it is not, it means that the record was purged and another record was
inserted in its place.
So, the rollback would have performed an incorrect purge, breaking the
locking rules and causing corruption.
The problem was found by creating a table that contains a unique
secondary index and a primary key, and two threads running REPLACE
with only one value for the unique column, so that the uniqueness
constraint would be violated all the time, leading to statement
rollback.
This bug exists in all InnoDB versions (I checked MySQL 3.23.53).
It has become easier to repeat in 5.5 and 5.6 thanks to scalability
improvements and a dedicated purge thread.
rb#3085 approved by Jimmy Yang
FAILED BLOB WRITE
btr_store_big_rec_extern_fields(): Relax a debug assertion so that
some BLOB pointers may remain zero if an error occurs.
btr_free_externally_stored_field(), row_undo_ins(): Allow the BLOB
pointer to be zero on any rollback.
rb#3059 approved by Jimmy Yang, Kevin Lewis
Since the mtr_t struct is marked as invalid in DEBUG_VALGRIND build
during mtr_commit, checking mtr->inside_ibuf will cause this warning.
Also since mtr->inside_ibuf cannot be set in mtr_commit (assert check)
and mtr->state is set to MTR_COMMITTED, the 'ut_ad(!ibuf_inside(&mtr))'
check is not needed if 'ut_ad(mtr.state == MTR_COMMITTED)' is also
checked.
SHUTDOWN IS IN PROGRESS
PROBLEM
-------
In the background thread srv_master_thread() we have a
a one second delay loop which will continuously monitor
server activity .If the server is inactive (with out any
user activity) or in a shutdown state we do some background
activity like flushing the changes.In the current code
we are not checking if server is in shutdown state before
sleeping for one second.
FIX
---
If server is in shutdown state ,then dont go to one second
sleep.
Problem:
When the user specified foreign key name contains "_ibfk_", InnoDB wrongly
tries to rename it.
Solution:
When a table is renamed, all its associated foreign keys will also be renamed,
only if the foreign key names are automatically generated. If the foreign key
names are given by the user, even if it has _ibfk_ in it, it must not be
renamed.
rb#2935 approved by Jimmy, Krunal and Satya
Since log_throttle is not available in 5.5. Logging of
error message for failure of thread to create new connection
in "create_thread_to_handle_connection" is not backported.
Since, function "my_plugin_log_message" is not available in
5.5 version and since there is incompatibility between
sql_print_XXX function compiled with g++ and alog files with
gcc to use sql_print_error, changes related to audit log
plugin is not backported.
SERIALIZABLE
Problem:
The documentation claims that WITH CONSISTENT SNAPSHOT will work for both
REPEATABLE READ and SERIALIZABLE isolation levels. But it will work only
for REPEATABLE READ isolation level. Also, the clause WITH CONSISTENT
SNAPSHOT is silently ignored when it is not applicable to the given isolation
level.
Solution:
Generate a warning when the clause WITH CONSISTENT SNAPSHOT is ignored.
rb#2797 approved by Kevin.
Note: Support team wanted to push this to 5.5+.
MULTI-FILE TABLESPACE
ANALYSIS
--------
When a tablespace has multiple data files, InnoDB fails to
open the tablespace. This is because for each ibd file,
the first page is checked.But the first page of all ibd file
need not be the first page of the tablespace. Only the first
page of the tablespace contains the tablespace header. When
we check the first page of an ibd file that is not the first
page of the tablespace, then the "tablespace flags" is not
really available.This was wrongly used to check if a page is
corrupt or not.
FIX
---
Use the tablespace flags only if the page number is 0
in a tablespace.
[Approved by Inaam rb#2836 ]
Analysis
--------
The pthread_mutex commit_threads_m was initiliazed but never
used.
Fix
---
Removing the commit_threads_m mutex from the code base.
[ Approved by Marko rb#2475]
DDL AND I_S QUERIES
Skip partially created indexes (ones whose name starts with TEMP_INDEX_PREFIX)
from stats gathering.
Because InnoDB reports HA_INPLACE_ADD_INDEX_NO_WRITE to MySQL, the latter
allows parallel execution of ha_innobase::add_index() and ha_innobase::info().
Reviewed by: Inaam (rb:2613)
IF IT HAS A WRONG COUNT
If CHECK TABLE finds that a secondary index contains the wrong
number of entries, it used to report an error but not mark the
index as corrupt. The error means that the index should be rebuilt,
which can be done with ALTER TABLE DROP INDEX and ALTER TABLE ADD
INDEX. But just in case the DBA does not pay any attention to the
output of CHECK TABLE, the secondary index should be marked as
corrupted so that it is not used again.
Approved by Inaam in RB:2607
ON DELETION ORDER
Problem:
When a InnoDB index page is under-filled, we will merge it with either
the left sibling node or the right sibling node. But this checking is
incorrect. When the left sibling node is available, even if merging
is not possible with left sibling node, we do not check for the
possibility of merging with the right sibling node.
Solution:
If left sibling node is available, and merging with left sibling node
is not possible, then check if merge with right sibling node is
possible.
rb#2506 approved by jimmy & ima.
i_s_innodb_buffer_page_get_info(): Do not read the buffer block frame
contents of read-fixed blocks, because it may be invalid or
uninitialized. When we are going to decompress or read a block, we
will put it into buf_pool->page_hash and buf_pool->LRU, read-fix the
block and release the mutexes for the duration of the reading or
decompression.
rb#2500 approved by Jimmy Yang
USING THE PLUGIN INTERFACE.
ISSUE: No support for floating-point plugin
system variables.
SOLUTION: Allowing plugins to define and expose floating-point
system variables of type double. MYSQL_SYSVAR_DOUBLE
and MYSQL_THDVAR_DOUBLE are added.
ISSUE: Fractional part of the def, min, max values of system
variables are ignored.
SOLUTION: Adding functions that are used to store the raw
representation of a double in the raw bits of unsigned
longlong in a way that the binary representation
remains the same.
ESCAPED WITH BACKSLASH
Problem:
When the CREATE TABLE statement used COMMENTS with escape sequences like
'foo\'s', InnoDB did not parse is correctly when trying to extract the
foreign key information. Because of this, the foreign keys specified
in the CREATE TABLE statement were not created.
Solution:
Make the InnoDB internal parser aware of escape sequences.
rb#2457 approved by Kevin.
INSERT BUFFER MERGE
Problem:
When the record is merged from the change buffer to the actual page,
in a particular condition, it is assumed that the deleted rec will
be re-used by the inserted rec. With this assumption the lock is
restored on the pointer to the deleted rec itself, thinking that
it is pointing to the newly inserted rec.
Solution:
Just before restoring the lock, update the rec pointer to point
to the newly inserted record. An assert has been added to verify
this. This assert will fail without the fix and will pass with
the fix.
rb#2449 in review by Marko and Jimmy
INSERT BUFFER MERGE
Problem:
When the record is merged from the change buffer to the actual page,
in a particular condition, it is assumed that the deleted rec will
be re-used by the inserted rec. With this assumption the lock is
restored on the pointer to the deleted rec itself, thinking that
it is pointing to the newly inserted rec.
Solution:
Just before restoring the lock, update the rec pointer to point
to the newly inserted record. An assert has been added to verify
this. This assert will fail without the fix and will pass with
the fix.
rb#2449 in review by Marko and Jimmy
INNODB_FAST_SHUTDOWN IS 2
Problem:
When innodb_fast_shutdown is set to 2 and the master thread enters
flush loop, under some circumstances it will not be able to exit it.
This may lead to a shutdown hanging.
This is happening because of the following:
1. In the flush_loop block of code, if the srv_fast_shutdown is
equal to 2 (very fast shutdown), then we do not flush dirty
pages in buffer pool to disk.
2. In the same flush_loop block of code, if the number of dirty
pages is more than user specified limit, we go to step 1.
This results in infinite loop.
Solution:
When we are in the process of doing a very fast shutdown, don't
do step 2 above.
rb#2328 approved by Inaam.
When a record contains no user data bytes (such as when the PRIMARY
KEY is an empty string and all secondary index fields are NULL or the
empty string), page_zip_decompress() could fail to set the record
heap_no correctly.
page_zip_decompress_node_ptrs(), page_zip_decompress_sec(),
page_zip_decompress_clust(): Set heap_no also at the end of the
compressed data stream.
rb#2448 approved by Jimmy Yang and Inaam Rana
AFTER A ROW IS READ
Approved by: Sunny Bains rb://2425
Don't release concurrency tickets when asked to release
btr_search_latch. This is a 5.5 only bug. It is already
fixed in 5.6 upwards.
== Analysis ==
Both change buffer pages and on-disk indexes pages are marked as
FIL_PAGE_INDEX. So all ibuf index pages will classify as INDEX with NULL
table_name and index_name.
== Solution ==
A new page type for ibuf data pages named I_S_PAGE_TYPE_IBUF is defined. All
these pages whose index_id equal (DICT_IBUF_ID_MIN + IBUF_SPACE_ID) will
classify as IBUF_DATA instead of INDEX in INNODB_BUFFER_PAGE
and INNODB_BUFFER_PAGE_LRU.
This fix is only for IS reporting, both on-disk and buffer pool structures
keep unchanged.
Approved by both Marko and Jimmy. rb#2334
innobase_convert_to_filename_charset() was by mistake kept within
the conditional compilation of UNIV_COMPILE_TEST_FUNCS. Now placing
the function out of UNIV_COMPILE_TEST_FUNCS. Also, removed the
unnecessary log message (as in 5.6+).
TRANSACTION ROLLBACK
Problem:
=======
"prepare_commit_mutex" is acquired during "innobase_xa_prepare"
and it is freed only in "innobase_commit". After prepare,
if the commit operation fails the transaction is rolled back
but the mutex is not released.
Analysis:
========
During transaction commit process transaction is prepared and
the "prepare_commit_mutex" is acquired to preserve the order
of commit. After prepare write to binlog is initiated.
File: sql/handler.cc
if (error || (is_real_trans && xid &&
-----> (error= !(cookie= tc_log->log_xid(thd, xid)))))
{
ha_rollback_trans(thd, all);
In the above code "tc_log->log_xid" operation fails.
When the write to binlog fails the transaction is rolled back
with out freeing the mutex. A subsequent "INSERT" operation
tries to acquire the same mutex during its commit process
and the server aborts.
Fix:
===
"prepare_commit_mutex" is freed during "innobase_rollback".
Bug #16754901 PARS_INFO_FREE NOT CALLED IN DICT_CREATE_ADD_FOREIGN_TO_DICTIONARY
Problem:
There are two situations here. The constraint name is explicitly
given by the user and the constraint name is automatically generated
by InnoDB. In the case of generated constraint name, it is formed by
adding table name as prefix. The table names are stored internally in
my_charset_filename. In the case of constraint name explicitly given
by the user, it is stored in UTF8 format itself. So, in some
situations the constraint name is in utf8 and in some situations it is
in my_charset_filename format. Hence this problem.
Solution:
Always store the foreign key constraint name in UTF-8 even when
automatically generated.
Bug #16754901 PARS_INFO_FREE NOT CALLED IN DICT_CREATE_ADD_FOREIGN_TO_DICTIONARY
Problem:
There was a memory leak in the function dict_create_add_foreign_to_dictionary().
The allocated pars_info_t object is not freed in the error code path.
Solution:
Allocate the pars_info_t object after the error checking.
rb#2368 in review
eliminate a race condition over recv_sys->n_addrs which might result in a database corruption
in recovery, without reporting a recovery error.
recv_recover_page_func(): move the code segment that decrements recv_sys->n_addrs
to the end of the function, after the call to mtr_commit()
rb://2282 approved by Inaam
After a clean shutdown, InnoDB will not check the *.ibd file headers,
for maximum performance. This is unchanged before and after this
patch.
What this fix addresses is the case when crash recovery is
needed. Previously, InnoDB could load a corrupted tablespace file.
buf_page_is_corrupted(): Add the parameter check_lsn.
fil_check_first_page(): New function, to perform a consistency check
on the first page of a file. This can be overridden by setting
innodb_force_recovery.
fil_read_first_page(), fil_open_single_table_tablespace(),
fil_load_single_table_tablespace(): Invoke fil_check_first_page().
open_or_create_data_files(): Check the status of
fil_open_single_table_tablespace().
rb#2352 approved by Jimmy Yang
OPENING MISSING PARTITION
In the ha_innobase::open() call, for normal tables, there is no retry logic.
But for partitioned tables, there is a retry logic introduced as fix for:
http://bugs.mysql.com/bug.php?id=33349https://support.mysql.com/view.php?id=21080
The Bug#33349, does not provide sufficient information to analyze the original
problem. The original problem reported by bug#33349 is also minor (just an
annoyance and no loss of functionality). Most importantly, the retry logic
has been introduced without any associated test case.
So we are removing the retry logic for partitioned tables. When the original
problem occurs, a different solution will be explored.