1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-30 11:22:14 +03:00
Commit Graph

7 Commits

Author SHA1 Message Date
Marko Mäkelä
f77329ace9 Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.

This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.

When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.

Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.

The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.

The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.

In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.

btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.

btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().

btr_page_free(): Add a debug assertion that the page was a B-tree page.

btr_lift_page_up(): Return the father block.

btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.

btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.

btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.

btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.

fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.

xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.

fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().

fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.

fsp_free_page(): Add ut_ad(0) to the error outcomes.

fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.

fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().

fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.

fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.

fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().

buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.

mtr_t: Add n_freed_pages (number of pages that have been freed).

page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().

trx_undo_rec_copy(): Add a debug assertion for the length.

trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.

trx_undo_report_row_operation(): Add debug assertions.

trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.

page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.

row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.

row_build(): Tighten a debug assertion about null BLOB pointers.

row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.

rb:939 approved by Jimmy Yang
2012-02-17 11:42:04 +02:00
Marko Mäkelä
3d58fd6900 Bug#13418887 ERROR IN DIAGNOSTIC FUNCTION PAGE_REC_PRINT()
When printing information about a ROW_FORMAT=REDUNDANT record, pass
the correct flag to rec_get_next_offs().

rb:821 approved by Jimmy Yang
2011-12-12 13:48:24 +02:00
Marko Mäkelä
2c67d5066d Revert revno:3452.71.32 (Bug#12612184 fix).
Bug#12612184 RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE()

The fix introduced potentially more severe crash recovery problems
than the bug causes. Revert the fix for now.
2011-10-26 12:23:57 +03:00
Marko Mäkelä
8c545acd53 Bug#12948130 UNNECESSARY X-LOCKING OF ADAPTIVE HASH INDEX (BTR_SEARCH_LATCH)
InnoDB acquires an x-latch on btr_search_latch for certain in-place updates
that do affect the adaptive hash index. These operations do not really need
to be protected by the btr_search_latch:

* updating DB_TRX_ID
* updating DB_ROLL_PTR
* updating PAGE_MAX_TRX_ID
* updating the delete-mark flag

rb:750 approved by Sunny Bains
2011-09-08 16:10:24 +03:00
Marko Mäkelä
5b4ceba58d Bug#12612184 Race condition after btr_cur_pessimistic_update()
btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
adjust. If adjust=TRUE, adjust the cursor position after compressing
the page.

btr_lift_page_up(): Return a pointer to the father page.

BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().

btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
Also, do not release the index tree x-lock if *big_rec != NULL.

btr_cur_mtr_commit_and_start(): Commits and restarts a
mini-transaction so that it will retain an x-lock on index->lock and
the page of the cursor. This is invoked when
btr_cur_pessimistic_update() returns *big_rec != NULL.

In all callers of btr_cur_pessimistic_update() that do not pass
BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.

btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.

page_rec_get_nth(): Return the nth record on the page (an inverse
function of page_rec_get_n_recs_before()). Refactored from
page_get_middle_rec().

page_get_middle_rec(): Invoke page_rec_get_nth().

page_cur_insert_rec_zip_reorg(): Make use of the page directory
shortcuts in page_rec_get_nth() instead of scanning the whole list of
records.

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update().

row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
commit and start the mini-transaction without releasing the x-locks on
index->lock and the cursor page, and write the big_rec. Releasing the
page latch in mtr_commit() caused a race condition.

row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
btr_cur_pessimistic_update(). If it returns a big_rec, invoke
btr_cur_mtr_commit_and_start() in order to commit and start the
mini-transaction without releasing the x-locks on index->lock and the
cursor page, and write the big_rec. Releasing the page latch in
mtr_commit() caused a race condition.

sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
bypass the latching order rules.

rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
to sync_thread_add_level().

rb:678 approved by Jimmy Yang
2011-06-16 10:27:21 +03:00
Satya B
e011c02e2f Applying InnoDB Plugin 1.0.5 snapshot ,part 12
From r5995 to r6043

Detailed revision comments:

r5995 | marko | 2009-09-28 03:52:25 -0500 (Mon, 28 Sep 2009) | 17 lines
branches/zip: Do not write to PAGE_INDEX_ID after page creation,
not even when restoring an uncompressed page after a compression failure.

btr_page_reorganize_low(): On compression failure, do not restore
those page header fields that should not be affected by the
reorganization.  Instead, compare the fields.

page_zip_decompress(): Add the parameter ibool all, for copying all
page header fields.  Pass the parameter all=TRUE on block read
completion, redo log application, and page_zip_validate(); pass
all=FALSE in all other cases.

page_zip_reorganize(): Do not restore the uncompressed page on
failure.  It will be restored (to pre-modification state) by the
caller anyway.

rb://167, Issue #346
r5996 | marko | 2009-09-28 07:46:02 -0500 (Mon, 28 Sep 2009) | 4 lines
branches/zip: Address Issue #350 in comments.

lock_rec_queue_validate(), lock_rec_queue_validate(): Note that
this debug code may violate the latching order and cause deadlocks.
r5997 | marko | 2009-09-28 08:03:58 -0500 (Mon, 28 Sep 2009) | 12 lines
branches/zip: Remove an assertion failure when the InnoDB data dictionary
is inconsistent with the MySQL .frm file.

ha_innobase::index_read(): When the index cannot be found,
return an error.

ha_innobase::change_active_index(): When prebuilt->index == NULL,
set also prebuilt->index_usable = FALSE.  This is not needed for
correctness, because prebuilt->index_usable is only checked by
row_search_for_mysql(), which requires prebuilt->index != NULL.

This addresses Issue #349.  Approved by Heikki Tuuri over IM.
r6005 | vasil | 2009-09-29 03:09:52 -0500 (Tue, 29 Sep 2009) | 4 lines
branches/zip:

ChangeLog: wrap around 78th column, not earlier.

r6006 | vasil | 2009-09-29 05:15:25 -0500 (Tue, 29 Sep 2009) | 4 lines
branches/zip:

Add ChangeLog entry for the release of 1.0.4.

r6007 | vasil | 2009-09-29 08:19:59 -0500 (Tue, 29 Sep 2009) | 6 lines
branches/zip:

Fix the year, should be 2009.

Pointed by:	Calvin

r6026 | marko | 2009-09-30 02:18:24 -0500 (Wed, 30 Sep 2009) | 1 line
branches/zip: Add some debug assertions for checking FSEG_MAGIC_N.
r6028 | marko | 2009-09-30 08:55:23 -0500 (Wed, 30 Sep 2009) | 3 lines
branches/zip: recv_no_log_write: New debug flag for tracking down
Mantis Issue #347.  No modifications should be made to the database
while recv_apply_hashed_log_recs() is about to complete.
r6029 | calvin | 2009-09-30 15:32:02 -0500 (Wed, 30 Sep 2009) | 4 lines
branches/zip: non-functional changes

Fix typo.

r6031 | marko | 2009-10-01 06:24:33 -0500 (Thu, 01 Oct 2009) | 49 lines
branches/zip: Clean up after a crash during DROP INDEX.
When InnoDB crashes while dropping an index, ensure that
the index will be completely dropped during crash recovery.

row_merge_drop_index(): Before dropping an index, rename the index to
start with TEMP_INDEX_PREFIX_STR and commit the change, so that
row_merge_drop_temp_indexes() will drop the index after crash
recovery if the server crashes while dropping the index.

fseg_inode_try_get(): New function, forked from fseg_inode_get().
Return NULL if the file segment index node is free.

fseg_inode_get(): Assert that the file segment index node is not free.

fseg_free_step(): If the file segment index node is already free,
print a diagnostic message and return TRUE.

fsp_free_seg_inode(): Write a nonzero number to FSEG_MAGIC_N, so that
allocated-and-freed file segment index nodes can be better
distinguished from uninitialized ones.

This is rb://174, addressing Issue #348.

Tested by restarting mysqld upon the completion of the added
log_write_up_to() invocation below, during DROP INDEX.  The index was
dropped after crash recovery, and re-issuing the DROP INDEX did not
crash the server.

Index: btr/btr0btr.c
===================================================================
--- btr/btr0btr.c	(revision 6026)
+++ btr/btr0btr.c	(working copy)
@@ -42,6 +42,7 @@ Created 6/2/1994 Heikki Tuuri
 #include "ibuf0ibuf.h"
 #include "trx0trx.h"
+#include "log0log.h"
 
 /*
 Latching strategy of the InnoDB B-tree
 --------------------------------------
@@ -873,6 +874,8 @@ leaf_loop:
 
 		goto leaf_loop;
 	}
+
+	log_write_up_to(mtr.end_lsn, LOG_WAIT_ALL_GROUPS, TRUE);
 top_loop:
 	mtr_start(&mtr);
 
r6033 | calvin | 2009-10-01 15:19:46 -0500 (Thu, 01 Oct 2009) | 4 lines
branches/zip: fix a typo in error message

Reported as bug#47763.

r6043 | inaam | 2009-10-05 09:45:35 -0500 (Mon, 05 Oct 2009) | 12 lines
branches/zip  rb://176

Do not invalidate buffer pool while an LRU batch is active. Added
code to buf_pool_invalidate() to wait for the running batches to finish.

This patch also resets the state of buf_pool struct at invalidation. This
addresses the concern where buf_pool->freed_page_clock becomes non-zero
because we read in a system tablespace page for file format info at
startup.

Approved by: Marko
2009-10-09 19:43:15 +05:30
Satya B
3945d5e554 Adding innodb_plugin-1.0.4 as storage/innodb_plugin. 2009-05-27 15:15:59 +05:30