1
0
mirror of https://github.com/MariaDB/server.git synced 2025-09-02 09:41:40 +03:00
Commit Graph

72 Commits

Author SHA1 Message Date
Marko Mäkelä
f77329ace9 Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.

This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.

When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.

Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.

The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.

The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.

In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.

btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.

btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().

btr_page_free(): Add a debug assertion that the page was a B-tree page.

btr_lift_page_up(): Return the father block.

btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.

btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.

btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.

btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.

fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.

xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.

fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().

fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.

fsp_free_page(): Add ut_ad(0) to the error outcomes.

fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.

fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().

fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.

fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.

fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().

buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.

mtr_t: Add n_freed_pages (number of pages that have been freed).

page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().

trx_undo_rec_copy(): Add a debug assertion for the length.

trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.

trx_undo_report_row_operation(): Add debug assertions.

trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.

page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.

row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.

row_build(): Tighten a debug assertion about null BLOB pointers.

row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.

rb:939 approved by Jimmy Yang
2012-02-17 11:42:04 +02:00
Marko Mäkelä
a5d2554db0 btr_cur_search_to_nth_level(): Add a debug assertion
and some Valgrind instrumentation.
2012-01-25 10:15:27 +02:00
Karen Langford
e1df69f75a Merge from mysql-5.1.60-release 2011-11-17 00:26:16 +01:00
Marko Mäkelä
77eb01b827 Bug#13358468 ASSERTION FAILURE IN BTR_PCUR_GET_BLOCK
btr_pcur_restore_position_func(): When the cursor was positioned at
the tree infimum or supremum, initialize pos_state and latch_mode. The
assertion failed, because pos_state was BTR_PCUR_WAS_POSITIONED.  In
the test failure of WL#5874, the purge thread attempted to restore the
cursor position on the infimum record (the clustered index was empty).

btr_pcur_detach(), btr_pcur_is_detached(): Unused functions, remove.

rb:804 approved by Inaam Rana
2011-11-08 14:15:22 +02:00
Marko Mäkelä
2c67d5066d Revert revno:3452.71.32 (Bug#12612184 fix).
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()

The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
2011-10-26 12:23:57 +03:00
Marko Mäkelä
91b5e9352a Revert most of revno 3560.9.1 (Bug#12704861)
This was an attempt to address problems with the Bug#12612184 fix.
Even with this follow-up fix, crash recovery can be broken.
Let us fix the bug later.
2011-10-26 11:44:28 +03:00
Marko Mäkelä
8a0be8a541 Bug#13116045 Compilation failure using GCC 4.6.1 in btr/btr0cur.c
btr_record_not_null_field_in_rec(): Remove the parameter rec.
Use rec_offs_nth_sql_null() instead of rec_get_nth_field().

rb:788 approved by Jimmy Yang
2011-10-21 06:32:16 +03:00
Marko Mäkelä
6d59064948 Bug#13006367 62487: innodb takes 3 minutes to clean up the adaptive
hash index at shutdown

btr_search_disable(): Just drop the entire adaptive hash index,
without dropping every record separately.

buf_pool_clear_hash_index(): Renamed and simplified from
buf_pool_drop_hash_index(). Set block->index = NULL for every block in
the buffer pool. Do not release the btr_search_latch. The caller will
have to adjust other data structures.

Remove block->is_hashed. It is redundant, should be always equal to
block->index != NULL.

Remove btr_search_fully_disabled, btr_search_enabled_mutex, and
SYNC_SEARCH_SYS_CONF. We drop the AHI in one pass, without releasing
the btr_search_latch in between.

Replace void* with const rec_t* and add assertions on btr_search_latch
and btr_search_enabled to ha0ha.h, ha0ha.ic, ha0ha.c.

page_set_max_trx_id(): Ignore the adaptive hash index. I forgot to
push this in rb:750.

btr0sea.c: Always after acquiring btr_search_latch, check for
block->index==NULL or !btr_search_enabled. We can now set
block->index=NULL while only holding btr_search_latch in exclusive
mode. Always acquire btr_search_latch before reading block->index,
except in shortcuts when testing for block->index == NULL.

ha_clear(), ha_search(): Unused function, remove.

buf_page_peek_if_search_hashed(): Remove. This function may avoid
latching a page at the cost of doing a duplicate buf_pool->page_hash
lookup.

rb:775 approved by Inaam Rana
2011-10-12 09:00:49 +03:00
unknown
40761a9a73 Merge from mysql-5.1.59-release 2011-09-15 18:48:54 +02:00
Marko Mäkelä
8c545acd53 Bug#12948130 UNNECESSARY X-LOCKING OF ADAPTIVE HASH INDEX (BTR_SEARCH_LATCH)
InnoDB acquires an x-latch on btr_search_latch for certain in-place updates
that do affect the adaptive hash index. These operations do not really need
to be protected by the btr_search_latch:

* updating DB_TRX_ID
* updating DB_ROLL_PTR
* updating PAGE_MAX_TRX_ID
* updating the delete-mark flag

rb:750 approved by Sunny Bains
2011-09-08 16:10:24 +03:00
Marko Mäkelä
41f229cd9e Bug#12704861 Corruption after a crash during BLOB update
The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.

Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().

btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.

btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.

btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).

btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).

btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.

btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.

btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.

fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.

fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.

fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".

mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.

row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.

row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.

buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.

There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.

rb:693 approved by Sunny Bains
2011-08-29 11:16:42 +03:00
Marko Mäkelä
669ff03703 Bug #11766591 59733: Possible deadlock when buffered changes are to be
discarded in buf_page_create()

This bug turned out to be a false alarm, a bug in the UNIV_SYNC_DEBUG
diagnostic code. Because of this, the patch was not backported to the
built-in InnoDB in MySQL 5.1. Furthermore, there is no test case for
InnoDB Plugin in MySQL 5.1, because the delete buffering in MySQL 5.5
makes triggering the failure much easier.

When a freed page for which there exist orphaned buffered changes is
allocated and reused for something else, buf_page_create() will discard
the buffered changes by invoking ibuf_merge_or_delete_for_page().
This would violate the InnoDB latching order.

Tweak the latching order as follows. Move SYNC_IBUF_MUTEX below
SYNC_FSP_PAGE, where it logically belongs, and assign new latching
levels for the ibuf->index->lock and the insert buffer B-tree pages:

#define SYNC_IBUF_MUTEX		370	/* ibuf_mutex */
#define SYNC_IBUF_INDEX_TREE	360
#define SYNC_IBUF_TREE_NODE_NEW	359
#define SYNC_IBUF_TREE_NODE	358

btr_block_get(), btr_page_get(): In UNIV_SYNC_DEBUG, add the parameter
"index" for determining the appropriate latching order
(SYNC_IBUF_TREE_NODE or SYNC_TREE_NODE).

btr_page_alloc_for_ibuf(), btr_create(): Use SYNC_IBUF_TREE_NODE_NEW
instead of SYNC_TREE_NODE_NEW for insert buffer pages.

btr_cur_search_to_nth_level(), btr_pcur_restore_position_func(): Use
SYNC_IBUF_TREE_NODE instead of SYNC_TREE_NODE for insert buffer pages.

btr_search_guess_on_hash(): Assert that the index is not an insert buffer tree.

dict_index_add_to_cache(): Use SYNC_IBUF_INDEX_TREE for the insert
buffer tree (ibuf->index->lock).

ibuf0ibuf.c: Use SYNC_IBUF_TREE_NODE or SYNC_IBUF_TREE_NODE_NEW for
all B-tree pages.

ibuf_merge_or_delete_for_page(): Assert that the user page is
BUF_IO_READ fixed. Only in this way it is OK to latch it as
SYNC_IBUF_TREE_NODE instead of the proper SYNC_TREE_NODE (which would
violate the changed latching order).

sync_thread_add_level(): Remove the special tweak for
SYNC_IBUF_MUTEX. Add rules for the added latching levels.

rb:591 approved by Jimmy Yang
2011-08-15 12:11:43 +03:00
Marko Mäkelä
f2f4b19678 Bug#12626794 61240: UNUSED FUNCTIONS ... 2011-08-10 14:56:14 +03:00
Inaam Rana
e60e65052f Bug 12635227 - 61188: DROP TABLE EXTREMELY SLOW
approved by: Marko
rb://681

Coalescing of free buf_page_t descriptors can prove to be one severe
bottleneck in performance of compression. One such workload where it
hurts badly is DROP TABLE. This patch removes buf_page_t allocations
from buf_buddy and uses ut_malloc instead.
In order to further reduce overhead of colaescing we no longer attempt
to coalesce a block if the corresponding free_list is less than 16 in
size.
2011-06-17 16:20:20 -04:00
Vasil Dimov
0dfe86f53f Silence bogus compiler warning introduced in
marko.makela@oracle.com-20110616072721-8bo92ctixq6eqavr
2011-06-16 16:11:43 +03:00
Marko Mäkelä
417a267927 Re-enable the debug assertions for Bug#12650861.
Replace UNIV_BLOB_NULL_DEBUG with UNIV_DEBUG||UNIV_BLOB_LIGHT_DEBUG. 
Fix known bogus failures.

btr_cur_optimistic_update(): If rec_offs_any_null_extern(), assert that
the current transaction is an incomplete transaction that is being
rolled back in crash recovery.

row_build(): If rec_offs_any_null_extern(), assert that the transaction
that last updated the record was recovered during crash recovery
(and will soon be rolled back).
2011-06-16 11:51:04 +03:00
Marko Mäkelä
5b4ceba58d Bug#12612184 Race condition after btr_cur_pessimistic_update()
btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
adjust. If adjust=TRUE, adjust the cursor position after compressing
the page.

btr_lift_page_up(): Return a pointer to the father page.

BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().

btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
Also, do not release the index tree x-lock if *big_rec != NULL.

btr_cur_mtr_commit_and_start(): Commits and restarts a
mini-transaction so that it will retain an x-lock on index->lock and
the page of the cursor. This is invoked when
btr_cur_pessimistic_update() returns *big_rec != NULL.

In all callers of btr_cur_pessimistic_update() that do not pass
BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.

btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.

page_rec_get_nth(): Return the nth record on the page (an inverse
function of page_rec_get_n_recs_before()). Refactored from
page_get_middle_rec().

page_get_middle_rec(): Invoke page_rec_get_nth().

page_cur_insert_rec_zip_reorg(): Make use of the page directory
shortcuts in page_rec_get_nth() instead of scanning the whole list of
records.

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update().

row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
commit and start the mini-transaction without releasing the x-locks on
index->lock and the cursor page, and write the big_rec. Releasing the
page latch in mtr_commit() caused a race condition.

row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update(). If it returns a big_rec, invoke
btr_cur_mtr_commit_and_start() in order to commit and start the
mini-transaction without releasing the x-locks on index->lock and the
cursor page, and write the big_rec. Releasing the page latch in
mtr_commit() caused a race condition.

sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
bypass the latching order rules.

rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
to sync_thread_add_level().

rb:678 approved by Jimmy Yang
2011-06-16 10:27:21 +03:00
Marko Mäkelä
a862937699 Introduce UNIV_BLOB_NULL_DEBUG for temporarily hiding Bug#12650861.
Some ut_a(!rec_offs_any_null_extern()) assertion failures are indicating
genuine BLOB bugs, others are bogus failures when rolling back incomplete
transactions at crash recovery. This needs more work, and until I get a
chance to work on it, other testing must not be disrupted by this.
2011-06-15 10:16:59 +03:00
Marko Mäkelä
6348b7375a BLOB instrumentation for Bug#12612184 Race condition in row_upd_clust_rec()
If UNIV_DEBUG or UNIV_BLOB_LIGHT_DEBUG is enabled, add
!rec_offs_any_null_extern() assertions, ensuring that records do not
contain null pointers to externally stored columns in inappropriate
places.

btr_cur_optimistic_update(): Assert !rec_offs_any_null_extern().
Incomplete records must never be updated or deleted. This assertion
will cover also the pessimistic route.

row_build(): Assert !rec_offs_any_null_extern(). Search tuples must
never be built from incomplete index entries.

row_rec_to_index_entry(): Assert !rec_offs_any_null_extern() unless
ROW_COPY_DATA is requested. ROW_COPY_DATA is used for
multi-versioning, and therefore it might be valid to copy the most
recent (uncommitted) version while it contains a null pointer to
off-page columns.

row_vers_build_for_consistent_read(),
row_vers_build_for_semi_consistent_read(): Assert !rec_offs_any_null_extern()
on all versions except the most recent one.

trx_undo_prev_version_build(): Assert !rec_offs_any_null_extern() on
the previous version.

rb:682 approved by Sunny Bains
2011-06-09 13:31:15 +03:00
Marko Mäkelä
4f4b404e59 Bug#11849231 inflateInit() invoked without initializing all memory
According to the zlib documentation, next_in and avail_in
must be initialized before invoking inflateInit or inflateInit2.
Furthermore, the zalloc function must clear the allocated memory.

btr_copy_zblob_prefix(): Replace the d_stream parameter with buf,len
and return the copied length.

page_zip_decompress(): Invoke inflateInit2 a little later.

page_zip_zalloc(): Rename from page_zip_alloc().
Invoke mem_heap_zalloc() instead of mem_heap_alloc().

rb:619 approved by Jimmy Yang
2011-03-15 12:01:02 +02:00
Marko Mäkelä
0f8ae318c7 Bug #58549 Race condition in buf_LRU_drop_page_hash_for_tablespace()
and compressed tables

buf_LRU_drop_page_hash_for_tablespace(): after releasing and
reacquiring the buffer pool mutex, do not dereference any block
descriptor pointer that is not known to be a pointer to an
uncompressed page frame (type buf_block_t; state ==
BUF_BLOCK_FILE_PAGE). Also, defer the acquisition of the block_mutex
until it is needed.

buf_page_get_gen(): Add mode == BUF_GET_IF_IN_POOL_PEEK for
buffer-fixing a block without making it young in the LRU list.

buf_page_get_gen(), buf_page_init(), buf_LRU_block_remove_hashed_page():
Set bpage->state = BUF_BLOCK_ZIP_FREE before buf_buddy_free(bpage),
so that similar race conditions might be detected a little easier.

btr_search_drop_page_hash_when_freed(): Use BUF_GET_IF_IN_POOL_PEEK
when dropping the hash indexes.

rb://528 approved by Jimmy Yang
2011-02-28 13:51:18 +02:00
Marko Mäkelä
89621ad738 Implement UNIV_BLOB_DEBUG. An early version of this caught Bug #55284.
This option is known to be broken when tablespaces contain off-page
columns after crash recovery. It has only been tested when creating
the data files from the scratch.

btr_blob_dbg_t: A map from page_no:heap_no:field_no to first_blob_page_no.
This map is instantiated for every clustered index in index->blobs.
It is protected by index->blobs_mutex.

btr_blob_dbg_msg_issue(): Issue a diagnostic message.
Invoked when btr_blob_dbg_msg is set.

btr_blob_dbg_rbt_insert(): Insert a btr_blob_dbg_t into index->blobs.

btr_blob_dbg_rbt_delete(): Remove a btr_blob_dbg_t from index->blobs.

btr_blob_dbg_cmp(): Comparator for btr_blob_dbg_t.

btr_blob_dbg_add_blob(): Add a BLOB reference to the map.

btr_blob_dbg_add_rec(): Add all BLOB references from a record to the map.

btr_blob_dbg_print(): Display the map of BLOB references in an index.

btr_blob_dbg_remove_rec(): Remove all BLOB references of a record from
the map.

btr_blob_dbg_is_empty(): Check that no BLOB references exist to or
from a page. Disowned references from delete-marked records are
tolerated.

btr_blob_dbg_op(): Perform an operation on all BLOB references on a
B-tree page.

btr_blob_dbg_add(): Add all BLOB references from a B-tree page to the
map.

btr_blob_dbg_remove(): Remove all BLOB references from a B-tree page
from the map.

btr_blob_dbg_restore(): Restore the BLOB references after a failed
page reorganize.

btr_blob_dbg_set_deleted_flag(): Modify the 'deleted' flag in the BLOB
references of a record.

btr_blob_dbg_owner(): Own or disown a BLOB reference.

btr_page_create(), btr_page_free_low(): Assert that no BLOB references exist.

btr_create(): Create index->blobs for clustered indexes.

btr_page_reorganize_low(): Invoke btr_blob_dbg_remove() before copying
the records. Invoke btr_blob_dbg_restore() if the operation fails.

btr_page_empty(), btr_lift_page_up(), btr_compress(), btr_discard_page():
Invoke btr_blob_dbg_remove().

btr_cur_del_mark_set_clust_rec(): Invoke btr_blob_dbg_set_deleted_flag().

Other cases of modifying the delete mark are either in the secondary
index or during crash recovery, which we do not promise to support.

btr_cur_set_ownership_of_extern_field(): Invoke btr_blob_dbg_owner().

btr_store_big_rec_extern_fields(): Invoke btr_blob_dbg_add_blob().

btr_free_externally_stored_field(): Invoke btr_blob_dbg_assert_empty()
on the first BLOB page.

page_cur_insert_rec_low(), page_cur_insert_rec_zip(),
page_copy_rec_list_end_to_created_page(): Invoke btr_blob_dbg_add_rec().

page_cur_insert_rec_zip_reorg(), page_copy_rec_list_end(),
page_copy_rec_list_start(): After failure, invoke
btr_blob_dbg_remove() and btr_blob_dbg_add().

page_cur_delete_rec(): Invoke btr_blob_dbg_remove_rec().

page_delete_rec_list_end(): Invoke btr_blob_dbg_op(btr_blob_dbg_remove_rec).

page_zip_reorganize(): Invoke btr_blob_dbg_remove() before copying the records.

page_zip_copy_recs(): Invoke btr_blob_dbg_add().

row_upd_rec_in_place(): Invoke btr_blob_dbg_rbt_delete() and
btr_blob_dbg_rbt_insert().

innobase_start_or_create_for_mysql(): Warn when UNIV_BLOB_DEBUG is enabled.

rb://550 approved by Jimmy Yang
2011-02-08 12:56:23 +02:00
Marko Mäkelä
5adf2313f7 Bug #55284 diagnostics: Introduce UNIV_BLOB_LIGHT_DEBUG, enabled by UNIV_DEBUG
btr_rec_get_field_ref_offs(), btr_rec_get_field_ref(): New functions.
Get the pointer to an externally stored field.

btr_cur_set_ownership_of_extern_field(): Assert that the BLOB has not
already been disowned.

btr_store_big_rec_extern_fields(): Rename to
btr_store_big_rec_extern_fields_func() and add the debug parameter
update_in_place. All pointers to externally stored columns in the
record must either be zero or they must be pointers to inherited
columns, owned by this record or an earlier record version. For any
BLOB that is stored, the BLOB pointer must previously have been
zero. When the function completes, all BLOB pointers must be nonzero
and owned by the record.

rb://549 approved by Jimmy Yang
2011-02-02 15:51:08 +02:00
Marko Mäkelä
f2eacde4cd Bug #55284 diagnostics: When UNIV_DEBUG, do not tolerate garbage in
Antelope files in btr_check_blob_fil_page_type(). Unfortunately, we
must keep the check in production builds, because InnoDB wrote
uninitialized garbage to FIL_PAGE_TYPE until fairly recently (5.1.x).

rb://546 approved by Jimmy Yang
2011-02-02 14:10:12 +02:00
Marko Mäkelä
e952ee1158 Bug#59230 assert 0 row_upd_changes_ord_field_binary() in post-crash
trx rollback or purge

This patch does not relax the failing debug assertion during purge.
That will be revisited once we have managed to repeat the assertion failure.

row_upd_changes_ord_field_binary_func(): Renamed from
row_upd_changes_ord_field_binary(). Add the parameter que_thr_t* in
UNIV_DEBUG builds. When the off-page column cannot be retrieved,
assert that the current transaction is a recovered one and that it is
the one that is currently being rolled back.

row_upd_changes_ord_field_binary(): A wrapper macro for
row_upd_changes_ord_field_binary_func() that discards the que_thr_t*
parameter unless UNIV_DEBUG is defined.

row_purge_upd_exist_or_extern_func(): Renamed from
row_purge_upd_exist_or_extern(). Add the parameter que_thr_t* in
UNIV_DEBUG builds.

row_purge_upd_exist_or_extern(): A wrapper macro for
row_purge_upd_exist_or_extern_func() that discards the que_thr_t*
parameter unless UNIV_DEBUG is defined.

Make trx_roll_crash_recv_trx const. If there were a 'do not
dereference' attribute, it would be appropriate as well.

rb://588 approved by Jimmy Yang
2011-01-31 09:56:51 +02:00
Jimmy Yang
71e8043bae Fix Bug #59465 btr_estimate_number_of_different_key_vals use incorrect offset
for external_size
      
rb://581 approved by Marko
2011-01-28 00:50:10 -08:00
Marko Mäkelä
60a622d1c1 Bug#59707 Unused compression-related parameters in buffer pool functions
buf_block_alloc(): ulint zip_size is always 0.
buf_LRU_get_free_block(): ulint zip_size is always 0.
buf_LRU_free_block(): ibool* buf_pool_mutex_released is always NULL.

Remove these parameters.

buf_LRU_get_free_block(): Simplify the initialization of block->page.zip
and release buf_pool_mutex() earlier.
2011-01-25 09:56:18 +02:00
Jimmy Yang
9cd4d49840 Fix Bug#30423 "InnoDBs treatment of NULL in index stats causes bad
"rows examined" estimates". This change implements "innodb_stats_method"
with options of "nulls_equal", "nulls_unequal" and "null_ignored".
      
rb://553 approved by Marko
2011-01-14 09:02:28 -08:00
Vasil Dimov
fd7de284d4 (InnoDB Plugin) Fix Bug#59303 Correct URL in crash message
old URL: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
new URL: http://dev.mysql.com/doc/refman/5.1/en/forcing-innodb-recovery.html

Notice that there is a redirect from the old URL to the new URL, so visiting
the old URL does not give "page not found" error.
2011-01-06 09:12:53 +02:00
Marko Mäkelä
721a890e48 Bug #55284 Double BLOB free due to lock wait while updating PRIMARY KEY
This bug fix requires that Bug #58912 be fixed as well (bzr revision id
marko.makela@oracle.com-20101221093919-mcmmgd4zpse9567d). Otherwise,
another double BLOB free could occur when InnoDB would try to perform
an update-in-place as delete-and-insert-by-update-in-place.

row_upd_clust_rec_by_insert(): Do not disown the externally stored
columns from the old record (btr_cur_mark_extern_inherited_fields())
until after checking the foreign key constraints and successfully
inserting the updated record. If a lock wait timeout occurs between
the delete-marking of the old record and the insertion of the updated
record, mark the columns inherited before retrying the insert.
Distinguish the state UPD_NODE_INSERT_BLOB from
UPD_NODE_INSERT_CLUSTERED.

btr_cur_del_mark_set_clust_rec(): Replace the cursor with
block,rec,index,offsets so that the offsets need not be recalculated.
Assert that rec is on a clustered index leaf page.

btr_cur_disown_inherited_fields(): Renamed from
btr_cur_mark_extern_inherited_fields(). Use
upd_get_field_by_field_no(). Assert that there are externally stored
columns. Assert that a mini-transaction is passed. Remove the return
status. (The only caller, row_upd_clust_rec_by_insert(), will have
determined that some fields have changed ownership.)

btr_cur_mark_dtuple_inherited_extern(): Rename to
row_upd_clust_rec_by_insert_inherit_func() and declare as static. Add
the debug parameters rec, offsets. When rec is given, assert that the
off-page columns match those in the inesrt tuple and that the off-page
columns are owned by the record. Assert that the non-updated off-page
columns in the insert tuple are owned, and mark them inherited.

row_upd_clust_rec_by_insert_inherit(): A wrapper macro for
row_upd_clust_rec_by_insert_inherit_func().

row_undo_mod_upd_exist_sec(): Adjust a comment about
row_upd_clust_rec_by_insert().

rb:508 approved by Jimmy Yang
2010-12-21 13:27:22 +02:00
Marko Mäkelä
590fdf7792 Bug#58912 InnoDB unnecessarily avoids update-in-place on column prefix indexes
row_upd_changes_ord_field_binary(): Do not return TRUE if the update
vector changes a column that is covered by a prefix index, but does
not change the column prefix. Add the row_ext_t parameter for
determining whether the prefixes of externally stored columns match.

dfield_datas_are_binary_equal(): Add the parameter len, for comparing
column prefixes when len > 0.

innodb.test: Add a test case where the patch of Bug #55284 failed
without this fix.

rb:537 approved by Jimmy Yang
2010-12-21 11:39:19 +02:00
Vasil Dimov
ce782ee2c2 Merge mysql-5.1-innodb from bk-internal into my local repo 2010-11-10 10:40:22 +02:00
Marko Mäkelä
eab53d4e80 Non-functional change: Remove bogus const qualifiers
and make some function comments more accurate.
2010-11-03 11:16:11 +02:00
Vasil Dimov
49cbbf6ea4 Fix Bug#53046 dict_update_statistics_low can still be run concurrently on same table
Replace the array of mutexes that used to protect
dict_index_t::stat_n_diff_key_vals[] with an array of rw locks that protects
all the stats related members in dict_table_t and all of its indexes.

Approved by:	Jimmy (rb://503)
2010-11-02 18:57:20 +02:00
Marko Mäkelä
c38071ec18 Bug #56680 wrong InnoDB results from a case-insensitive covering index
row_search_for_mysql(): When a secondary index record might not be
visible in the current transaction's read view and we consult the
clustered index and optionally some undo log records, return the
relevant columns of the clustered index record to MySQL instead of the
secondary index record.

ibuf_insert_to_index_page_low(): New function, refactored from
ibuf_insert_to_index_page().

ibuf_insert_to_index_page(): When we are inserting a record in place
of a delete-marked record and some fields of the record differ, update
that record just like row_ins_sec_index_entry_by_modify() would do.

btr_cur_update_alloc_zip(): Make the function public.

mysql_row_templ_t: Add clust_rec_field_no.

row_sel_store_mysql_rec(), row_sel_push_cache_row_for_mysql(): Add the
flag rec_clust, for returning data at clust_rec_field_no instead of
rec_field_no. Resurrect the debug assertion that the record not be
marked for deletion. (Bug #55626)

[UNIV_DEBUG || UNIV_IBUF_DEBUG] ibuf_debug, buf_page_get_gen(),
buf_flush_page_try():
Implement innodb_change_buffering_debug=1 for evicting pages from the
buffer pool, so that change buffering will be attempted more
frequently.
2010-10-19 09:04:15 +03:00
Georgi Kodinov
292a72a043 merged mysql-5.1 into mysql-5.1-bugteam 2010-10-05 11:11:56 +03:00
Vasil Dimov
bcde57a3bd (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0sea.c: In function 'btr_search_update_hash_on_delete':
btr/btr0sea.c:1498:9: error: variable 'found' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:36:29 +03:00
Vasil Dimov
8e3c62cc57 (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0pcur.c: In function 'btr_pcur_move_backward_from_page':
btr/btr0pcur.c:455:9: error: variable 'space' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:35:37 +03:00
Vasil Dimov
07cded6187 (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0cur.c: In function 'btr_free_externally_stored_field':
btr/btr0cur.c:4281:16: error: variable 'rec_block' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:33:02 +03:00
Vasil Dimov
57e860c6a1 (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0cur.c: In function 'btr_cur_optimistic_update':
btr/btr0cur.c:1839:10: error: variable 'orig_rec' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:30:37 +03:00
Vasil Dimov
67a1603c93 (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0btr.c: In function 'btr_compress':
btr/btr0btr.c:2564:9: error: variable 'level' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:14:42 +03:00
Vasil Dimov
424ec78a46 (partially) Fix Bug#55227 Fix compiler warnings in innodb with gcc 4.6
Fix compiler warning:
btr/btr0btr.c: In function 'btr_page_split_and_insert':
btr/btr0btr.c:1898:11: error: variable 'insert_page' set but not used [-Werror=unused-but-set-variable]
2010-09-14 21:12:19 +03:00
Jimmy Yang
9b3a3944e4 Merge from mysql-5.1-bugteam to mysql-5.1-security 2010-09-01 17:43:02 -07:00
Inaam Rana
0812058d02 Backport of revno 3148 mysql-innodb-trunk
Currently we do a full validation of AHI whenever check tables is
  called on any table. This patch fixes this by only doing this full
  check in debug versions.
  
  bug#55716
  rb://423
  approved by: Marko
2010-08-05 11:34:44 -04:00
Sunny Bains
b37256b1ab Fix bug# 55543 - InnoDB Plugin: Signal 6: Assertion failure in file fil/fil0fil.c line 4306
The bug is due to a double delete of a BLOB, once via:

    rollback -> btr_cur_pessimistic_delete()

and the second time via purge.

The bug is in row_upd_clust_rec_by_insert(). There we relinquish ownership
of the non-updated BLOB columns in btr_cur_mark_extern_inherited_fields()
before building the row entry that will be inserted and whose contents will
be logged in the UNDO log. However, we don't set the BLOB column later to
INHERITED so that a possible rollback will not free the original row's
non-updated BLOB entries. This is because the condition that checks for
that is in :

	if (node->upd_ext) {}.

node->upd_ext is non-NULL only if a BLOB column was updated and that column
is part of some key ordering (see row_upd_replace()). This results in the
non-update BLOB columns being deleted during a rollback and subsequently by
purge again.

rb://413
2010-08-05 19:18:17 +10:00
Jimmy Yang
7f19cc824f Port fix for bug #54311 from mysql-trunk-innodb to mysql-5.1-innodb codeline.
Bug #54311: Crash on CHECK PARTITION after concurrent LOAD DATA
and adaptive_hash_index=OFF
2010-06-30 22:06:01 -07:00
Marko Mäkelä
707a3bef6e Correct some comments that were added in the fix of Bug #54358
(READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED columns).

Records that lack incompletely written externally stored columns may
be accessed by READ UNCOMMITTED transaction even without involving a
crash during an INSERT or UPDATE operation. I verified this as follows.

(1) added a delay after the mini-transaction for writing the clustered
index 'stub' record was committed (patch attached)
(2) started mysqld in gdb, setting breakpoints to the where the
assertions about READ UNCOMMITTED were added in the bug fix
(3) invoked ibtest3 --create-options=key_block_size=2
to create BLOBs in a COMPRESSED table
(4) invoked the following:
yes 'set transaction isolation level read uncommitted;
checksum table blobt3;select sleep(1);'|mysql -uroot test
(5) noted that one of the breakpoints was triggered
(return(NULL) in btr_rec_copy_externally_stored_field())

=== modified file 'storage/innodb_plugin/row/row0ins.c'
--- storage/innodb_plugin/row/row0ins.c	2010-06-30 08:17:25 +0000
+++ storage/innodb_plugin/row/row0ins.c	2010-06-30 08:17:25 +0000
@@ -2120,6 +2120,7 @@ function_exit:
 		rec_t*	rec;
 		ulint*	offsets;
 		mtr_start(&mtr);
+		os_thread_sleep(5000000);
 
 		btr_cur_search_to_nth_level(index, 0, entry, PAGE_CUR_LE,
 					    BTR_MODIFY_TREE, &cursor, 0,

=== modified file 'storage/innodb_plugin/row/row0upd.c'
--- storage/innodb_plugin/row/row0upd.c	2010-06-30 08:11:55 +0000
+++ storage/innodb_plugin/row/row0upd.c	2010-06-30 08:11:55 +0000
@@ -1763,6 +1763,7 @@ row_upd_clust_rec(
 		rec_offs_init(offsets_);
 
 		mtr_start(mtr);
+		os_thread_sleep(5000000);
 
 		ut_a(btr_pcur_restore_position(BTR_MODIFY_TREE, pcur, mtr));
 		rec = btr_cur_get_rec(btr_cur);
2010-06-30 12:31:49 +03:00
Marko Mäkelä
62084feb55 Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED
columns

When the server crashes after a record stub has been inserted and
before all its off-page columns have been written, the record will
contain incomplete off-page columns after crash recovery. Such records
may only be accessed at the READ UNCOMMITTED isolation level or when
rolling back a recovered transaction in recv_recovery_rollback_active().
Skip these records at the READ UNCOMMITTED isolation level.

TODO: Add assertions for checking the above assumptions hold when an
incomplete BLOB is encountered.

btr_rec_copy_externally_stored_field(): Return NULL if the field is
incomplete.

row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.

row_sel_store_mysql_rec(): Return FALSE if not all columns could be
retrieved. Previously this function always returned TRUE.  Assert that
the record is not delete-marked.

row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
could be retrieved.

row_search_for_mysql(): Skip records containing incomplete off-page
columns. Assert that the transaction isolation level is READ
UNCOMMITTED.

rb://380 approved by Jimmy Yang
2010-06-29 15:55:18 +03:00
Marko Mäkelä
90cb7fa432 Minor cleanup.
lock_rec_unlock(): Cache first_lock and rewrite while() loops as for().

btr_cur_optimistic_update(): Use common error handling return.

row_create_prebuilt(): Add Valgrind instrumentation.
2010-06-01 15:07:51 +03:00
Vasil Dimov
da0b6d611f Merge a change from mysql-trunk-innodb:
------------------------------------------------------------
  revno: 3127
  revision-id: vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1
  parent: vasil.dimov@oracle.com-20100531105923-kpjwl4rbgfpfj13c
  committer: Vasil Dimov <vasil.dimov@oracle.com>
  branch nick: mysql-trunk-innodb
  timestamp: Mon 2010-05-31 18:23:41 +0300
  message:
    Fix Bug #53947 InnoDB: Assertion failure in thread 4224 in file .\sync\sync0sync.c line 324
    
    Destroy the rw-lock object before freeing the memory it is occupying.
    If we do not do this, then the mutex that is contained in the rw-lock
    object btr_search_latch_temp->mutex gets "freed" and subsequently
    mutex_free() from sync_close() hits a mutex whose memory has been
    freed and crashes.
    
    Approved by:	Heikki (via IRC)
    Discussed with:	Calvin
2010-05-31 19:35:40 +03:00