There are two threads. In one thread, dml operation is going on
involving cascaded update operation. In another thread, alter
table add foreign key constraint is happening. Under these
circumstances, it is possible for the dml thread to access a
dict_foreign_t object that has been freed by the ddl thread.
The debug sync test case provides the sequence of operations.
Without fix, the test case will crash the server (because of
newly added assert). With fix, the alter table stmt will return
an error message.
Backporting the fix from MySQL 5.5 to 5.1
rb:961
rb:947
On shutdown(), Windows can drop traffic still queued for sending even if that
wasn't specifically requested. As a result, fatal errors (those after
signaling which the server will drop the connection) were sometimes only
seen as "connection lost" on the client side, because the server-side
shutdown() erraneously discarded the correct error message before sending
it.
If on Windows, we now use the Windows API to access the (non-broken) equivalent
of shutdown().
Backport from trunk
On shutdown(), Windows can drop traffic still queued for sending even if that
wasn't specifically requested. As a result, fatal errors (those after
signaling which the server will drop the connection) were sometimes only
seen as "connection lost" on the client side, because the server-side
shutdown() erraneously discarded the correct error message before sending
it.
If on Windows, we now use the Windows API to access the (non-broken) equivalent
of shutdown().
Backport from trunk
include/violite.h:
export mysql_socket_shutdown(). It lives in vio in the backport.
sql/mysqld.cc:
Go through our own shutdown() rather than straight to the POSIX one.
vio/viosocket.c:
Define mysql_socket_shutdown(). On UNIXoid systems, it's just a wrapper for shutdown(), but
on Window, it uses DisconnectEx, which is magic.
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.
This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.
When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.
Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.
The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.
The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.
In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.
btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.
btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().
btr_page_free(): Add a debug assertion that the page was a B-tree page.
btr_lift_page_up(): Return the father block.
btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.
btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.
btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.
btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.
fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.
xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.
fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().
fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.
fsp_free_page(): Add ut_ad(0) to the error outcomes.
fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.
fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().
fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.
fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.
fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().
buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.
mtr_t: Add n_freed_pages (number of pages that have been freed).
page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().
trx_undo_rec_copy(): Add a debug assertion for the length.
trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.
trx_undo_report_row_operation(): Add debug assertions.
trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.
page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().
row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.
row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.
row_build(): Tighten a debug assertion about null BLOB pointers.
row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.
rb:939 approved by Jimmy Yang
OF WIDE RECORDS
row_ins_index_entry_low(), row_upd_clust_rec(): Make a redo log
checkpoint if a DEBUG flag is set. Add DEBUG_SYNC around
btr_store_big_rec_extern_fields().
rb:946 approved by Jimmy Yang
$SUBJ$
1. Took a diff between the previous base version and the
mysql sources.
2. Added the new 2.1.4 base version.
3. Reviewed and re-applied the diff from step #1.
During FIC error handling the trx->error_state was not being set to DB_SUCCESS
after failure, before attempting the next DDL SQL operation. This reset to
DB_SUCCESS is somewhat of a requirement though not explicitly stated anywhere.
The fix is to reset it to DB_SUCCESS in row0merge.cc if row_merge_rename_indexes
or row_merge_drop_index functions fail, also reset to DB_SUCCESS at trx commit.
rb://935 Approved by Jimmy Yang.
IS EXECUTED TWICE FROM P
This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123", then it wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql() which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice do double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)
CORRUPTED WHEN RUN CONCURRENTLY WITH
ISSUE: Table corruption due to concurrent queries.
Different threads running check, repair query
along with insert. Locks not properly acquired
in repair query. Rows are inserted inbetween
repair query.
SOLUTION: Mutex lock is acquired before the
repair call. Concurrent queries wont
effect the call to repair.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
CASES RESETS DATA POINTER TO SMAL
ISSUE: Myisamchk doing sort recover
on a table reduces data_file_length.
Maximum size of data file decreases,
lesser number of rows are stored.
SOLUTION: Size of data_file_length is
fixed to the original length.
rb://914
approved by: Marko Makela
Poll in fil_rename_tablespace() after setting ::stop_ios flag can
result in a hang because the other thread actually dispatching the IO
won't wake IO helper threads or flush the tablespace before starting
wait in fil_mutex_enter_and_prepare_for_io().
KEY HANDLING ON SUBSEQUENT CREATE TABLE IF NOT EXISTS
PROBLEM:
--------
Consider a SP routine which does CREATE TABLE
with REFERENCES clause. The first call to this routine
invokes parser and the parsed items are cached, so as
to avoid parsing for the second execution of the routine.
It is obsevered that valgrind reports a warning
upon read of thd->lex->alter_info->key_list->Foreign_key object,
which seem to be pointing to a invalid memory address
during second time execution of the routine. Accessing this object
theoretically could cause a crash.
ANALYSIS:
---------
The problem stems from the fact that for some reason
elements of ref_columns list in thd->lex->alter_info->
key_list->Foreign_key object are changed to point to
objects allocated on runtime memory root.
During the first execution of routine we create
a copy of thd->lex->alter_info object.
As part of this process we create a clones of objects in
Alter_info::key_list and of Foreign_key object in particular.
Then Foreign_key object is cloned for some reason we
perform shallow copies of both Foreign_key::ref_columns
and Foreign_key::columns list. So new instance of
Foreign_key object starts to SHARE contents of ref_columns
and columns list with the original instance.
After that as part of cloning process we call
list_copy_and_replace_each_value() for elements of
ref_columns list. As result ref_columns lists in both
original and cloned Foreign_key object start to contain
pointers to Key_part_spec objects allocated on runtime
memory root because of shallow copy.
So when we start copying of thd->lex->alter_info object
during the second execution of stored routine we indeed
encounter pointer to the Key_part_spec object allocated
on runtime mem-root which was cleared during at the end
of previous execution. This is done in sp_head::execute(),
by a call to free_root(&execute_mem_root,MYF(0));
As result we get valgrind warnings about accessing
unreferenced memory.
FIX:
----
The safest solution to this problem is to
fix Foreign_key(Foreign_key, MEM_ROOT) constructor to do
a deep copy of columns lists, similar to Key(Key, MEM_ROOT)
constructor.
Bug#13011410 CRASH IN FILESORT CODE WITH GROUP BY/ROLLUP
The assert in 13580775 is visible in 5.6 only,
but shows that all versions are vulnerable.
13011410 crashes in all versions.
filesort tries to re-use the sort buffer between invocations in order to save
malloc/free overhead.
The fix for Bug 11748783 - 37359: FILESORT CAN BE MORE EFFICIENT.
added an assert that buffer properties (num_records, record_length) are
consistent between invocations. Indeed, they are not necessarily consistent.
Fix: re-allocate the sort buffer if properties change.
mysql-test/r/partition.result:
New tests.
mysql-test/t/partition.test:
New tests.
sql/filesort.cc:
If we already have allocated a sort buffer in a previous execution,
then verify that it is big enough for the current one.
sql/table.h:
Add sort_keys_size; Number of bytes allocated for the sort_keys buffer.