The fix of Bug#12612184 broke crash recovery. When a record that
contains off-page columns (BLOBs) is updated, we must first write redo
log about the BLOB page writes, and only after that write the redo log
about the B-tree changes. The buggy fix would log the B-tree changes
first, meaning that after recovery, we could end up having a record
that contains a null BLOB pointer.
Because we will be redo logging the writes off the off-page columns
before the B-tree changes, we must make sure that the pages chosen for
the off-page columns are free both before and after the B-tree
changes. In this way, the worst thing that can happen in crash
recovery is that the BLOBs are written to free pages, but the B-tree
changes are not applied. The BLOB pages would correctly remain free in
this case. To achieve this, we must allocate the BLOB pages in the
mini-transaction of the B-tree operation. A further quirk is that BLOB
pages are allocated from the same file segment as leaf pages. Because
of this, we must temporarily "hide" any leaf pages that were freed
during the B-tree operation by "fake allocating" them prior to writing
the BLOBs, and freeing them again before the mtr_commit() of the
B-tree operation, in btr_mark_freed_leaves().
btr_cur_mtr_commit_and_start(): Remove this faulty function that was
introduced in the Bug#12612184 fix. The problem that this function was
trying to address was that when we did mtr_commit() the BLOB writes
before the mtr_commit() of the update, the new BLOB pages could have
overwritten clustered index B-tree leaf pages that were freed during
the update. If recovery applied the redo log of the BLOB writes but
did not see the log of the record update, the index tree would be
corrupted. The correct solution is to make the freed clustered index
pages unavailable to the BLOB allocation. This function is also a
likely culprit of InnoDB hangs that were observed when testing the
Bug#12612184 fix.
btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
or freed (nonfree=FALSE) before committing the mini-transaction.
btr_freed_leaves_validate(): A debug function for checking that all
clustered index leaf pages that have been marked free in the
mini-transaction are consistent (have not been zeroed out).
btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
number of the allocated page, or FIL_NULL if out of space. Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or if this is a "fake allocation"
(init_mtr=NULL) by btr_mark_freed_leaves(nonfree=TRUE).
btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
initialized and X-latched in a different mini-transaction than the one
that is used for the allocation. Invoke btr_page_alloc_low(). If a
clustered index leaf page was previously freed in mtr, remove it from
the memo of previously freed pages.
btr_page_free(): Assert that the page is a B-tree page and it has been
X-latched by the mini-transaction. If the freed page was a leaf page
of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
the mini-transaction.
btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
which is NULL (old behaviour in inserts) and the same as local_mtr in
updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
instead of the mini-transaction that is used for writing the BLOBs.
fsp_alloc_from_free_frag(): Refactored from
fsp_alloc_free_page(). Allocate the specified page from a partially
free extent.
fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
parameter "mtr_t* init_mtr" for specifying the mini-transaction where
the page should be initialized, or NULL if this is a "fake allocation"
that prevents the reuse of a previously freed B-tree page for BLOB
storage. If init_mtr==NULL, try harder to reallocate the specified page
and assert that it succeeded.
fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
specifying the mini-transaction where the page should be initialized.
Do not allow init_mtr == NULL, because this function is never to be
used for "fake allocations".
mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
mtr->freed_clust_leaf for quickly determining if any
MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
row_ins_index_entry_low(): When columns are being made off-page in
insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
the mini-transaction as the alloc_mtr to
btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
row_build(): Correct a comment, and add a debug assertion that a
record that contains NULL BLOB pointers must be a fresh insert.
row_upd_clust_rec(): When columns are being moved off-page, invoke
btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
buf_reset_check_index_page_at_flush(): Remove. The function
fsp_init_file_page_low() already sets
bpage->check_index_page_at_flush=FALSE.
There is a known issue in tablespace extension. If the request to
allocate a BLOB page leads to the tablespace being extended, crash
recovery could see BLOB writes to pages that are off the tablespace
file bounds. This should trigger an assertion failure in fil_io() at
crash recovery. The safe thing would be to write redo log about the
tablespace extension to the mini-transaction of the BLOB write, not to
the mini-transaction of the record update. However, there is no redo
log record for file extension in the current redo log format.
rb:693 approved by Sunny Bains
- Fix st_select_lex::set_explain_type() to allow producing exactly the
same EXPLAINs as it did before. SHOW EXPLAIN output may produce
select_type=SIMPLE instead or select_type=PRIMARY or vice versa (which
is ok because values of select_type weren't self-consistent in this
regard to begin with)
Analysis:
Constant table optimization of the outer query finds that
the right side of the equality is a constant that can
be used for an eq_ref access to fetch one row from t1,
and substitute t1 with a constant. Thus constant optimization
triggers evaluation of the subquery during the optimize
phase of the outer query.
The innermost subquery requires a plan with a temporary
table because with InnoDB tables the exact count of rows
is not known, and the empty tables cannot be optimzied
way. JOIN::exec for the innermost subquery substitutes
the subquery tables with a temporary table.
When EXPLAIN gets to print the tables in the innermost
subquery, EXPLAIN needs to print the name of each table
through the corresponding TABLE_LIST object. However,
the temporary table created during execution doesn't
have a corresponding TABLE_LIST, so we get a null
pointer exception.
Solution:
The solution is to forbid using expensive constant
expressions for eq_ref access for contant table
optimization. Notice that eq_ref with a subquery
providing the value is still possible during regular
execution.
Suppress the known warnings generated by filesort().
The real fix belongs to worklog 1509:
Pack values of non-sorted fields in the sort buffer
(which is basically the same issue, but in an optimization context:
We are writing the entire sort buffer to disk,
including un-used space for varchar columns.)
mysql-test/valgrind.supp:
Add new Memcheck suppressions for filesort.
sql/filesort.cc:
Remove the ifdef HAVE_purify/bzero code, use valgrind suppressions instead.
PARTITONING, ON INDEX CREATE
If the first partition succeeded in adding a index, but a successive partition failed,
then the first partition had still the new index.
The fix reverts the added indexes from previous partitions on failure.
Analysis:
During the first execution of the query through the stored
procedure, the optimization phase calls
substitute_for_best_equal_field(), which calls
Item_in_optimizer::transform(). The latter replaces
Item_in_subselect::left_expr with args[0] via assignment.
In this test case args[0] is an Item_outer_ref which is
created/deallocated for each re-execution. As a result,
during the second execution Item_in_subselect::left_expr
pointed to freed memory, which resulted in a crash.
Solution:
The solution is to use change_item_tree(), so that the
origianal left expression is restored after each execution.
Analysis:
Partial matching is used even when there are no NULLs in
a materialized subquery, as long as the left NOT IN operand
may contain NULL values.
This case was not handled correctly in two different places.
First, the implementation of parital matching did not clear
the set of matching columns when the merge process advanced
to the next row.
Second, there is no need to perform partial matching at all
when the left operand has no NULLs.
Solution:
First fix subselect_rowid_merge_engine::partial_match() to
properly cleanup the bitmap of matching keys when advancing
to the next row.
Second, change subselect_partial_match_engine::exec() so
that when the materialized subquery doesn't contain any
NULLs, and the left operand of [NOT] IN doesn't contain
NULLs either, the method returns without doing any
unnecessary partial matching. The correct result in this
case is in Item::in_value.
Added a second internal variable $mysql_errname
This is set the same way as $mysql_errno
Can be used like "if ($mysql_errname == ER_NO_SUCH_TABLE)...."
When the WHERE/HAVING condition of a subquery has been transformed
by the optimizer the pointer stored the 'where'/'having' field
of the SELECT_LEX structure used for the subquery must be updated
accordingly. Otherwise the pointer may refer to an invalid item.
This can lead to the reported assertion failure for some queries
with correlated subqueries
CRASHES SERVER
Flushing of MERGE table or one of its child tables, which was
locked by flushing thread using LOCK TABLES, might have caused
crashes or assertion failures if the thread failed to reopen
child or parent table.
Particularly, this might have happened when another connection
killed this FLUSH TABLE statement/connection.
Also this problem might have occurred when we failed to reopen
MERGE table or one of its children when executing DDL statement
under LOCK TABLES.
The problem was caused by the fact that reopen_tables() might
have failed to reopen child table but still tried to reopen,
reattach children for and re-lock its parent. Vice versa it
might have failed to reopen parent but kept references from
children to parent around. Since reopen_tables() closes table
it has failed to reopen and therefore frees all associated
memory such dangling references led to crashes when followed.
This patch solves this problem by ensuring that we always close
parent table and all its children if we fail to reopen this
table or one of its children. Same happens if we fail to reattach
children to parent.
Affects 5.1 only.
mysql-test/r/merge.result:
A test case for BUG#11763712.
mysql-test/t/merge.test:
A test case for BUG#11763712.
sql/sql_base.cc:
When flushing tables under LOCK TABLES, all locked
and flushed tables are released and then reopened.
It may happen that we failed to reopen some tables,
in this case we reopen as much tables as possible.
If it was not possible to reopen MERGE child, MERGE
parent is unusable and must be removed from thread
open tables list.
If it was not possible to reopen MERGE parent, all
MERGE child table objects are unusable as well, at
least because their locks are handled by MERGE parent.
They must also be removed from thread open tables
list.
In other words if it was impossible to reopen any
object of a MERGE table or reattach child tables,
all objects of this MERGE table must be considered
unusable and closed.
The bug is a duplicate of MySQL's Bug#11764086,
however MySQL's fix is incomplete for MariaDB, so
this fix is slightly different.
In addition, this patch renames
Item_func_not_all::top_level() to is_top_level_item()
to make it in line with the analogous methods of
Item_in_optimizer, and Item_subselect.
Analysis:
It is possible to determine whether a predicate is
NULL-rejecting only if it is a top-level one. However,
this was not taken into account for Item_in_optimizer.
As a result, a NOT IN predicate was erroneously
considered as NULL-rejecting, and the NULL-complemented
rows generated by the outer join were rejected before
being checked by the NOT IN predicate.
Solution:
Change Item_in_optimizer to be considered as
NULL-rejecting only if it a top-level predicate.
- add_ref_to_table_cond() should not just overwrite pre_idx_push_select_cond
with the contents tab->select_cond.
pre_idx_push_select_cond exists precisely for the reason that it may contain
a condition that is a strict superset of what is in tab->select_cond.
The fix is to inject generated equality into pre_idx_push_select_cond.
Also addressed issues in bug #11745133, where we could mark a table
corrupted instead of crashing the server when found a corrupted buffer/page
if the table created with innodb_file_per_table on.
- Make simplify_joins() set maybe_null=FALSE for tables that were on the
inner sides of inner joins and then were moved to the inner sides of semi-joins.
Undo previous fix, it is not reliable
Drop setting $MYSQL_LIBDIR, mtr can't be sure anyway
Test is set to override plugin-dir to some known existing dir
When merging a view / derived table the function SELECT_LEX::merge_subquery
incorrectly updated the list SELECT_LEX::leaf_tables. Erroneously it
appended the leaf_tables list of the merged object L and then removed the
reference to the merged object T from the SELECT_LEX::leaf_tables list.
A correct implementation should insert the list L into the
SELECT_LEX::leaf_tables list in place of the element of the list that
refers to T.
The bug could lead to wrong results or even crashes for queries with
nested outer joins over views / derived tables.
The bug was that when using bulk insert combined with lock table, we intitalized the io cache with the wrong file position.
This fixed a bug where MariaDB could not read in a table dump done with mysqldump.
mysql-test/suite/maria/r/locking.result:
Test case for locking + write cache bug
mysql-test/suite/maria/t/locking.test:
Test case for locking + write cache bug
storage/maria/ma_extra.c:
Initialize write cache used with bulk insert to correct file length.
(The old code didn't work if one was using LOCK TABLE for the given table).
to mysql-5.5.16-release.
Original revision:
# revision-id: dmitry.lenev@oracle.com-20110811155849-feyt3h7tj48padiu
# parent: tatjana.nuernberg@oracle.com-20110811120945-c6x9a5d2du8s9oj2
# committer: Dmitry Lenev <Dmitry.Lenev@oracle.com>
# branch nick: mysql-5.5-12828477
# timestamp: Thu 2011-08-11 19:58:49 +0400
# message:
# Fix for bug #12828477 - "MDL SUBSYSTEM CREATES BIG OVERHEAD
# FOR CERTAIN QUERIES TO INFORMATION_SCHEMA".
#
# The problem was that metadata locking subsystem introduced
# too much overhead for queries to I_S which were processed by
# opening only .FRM or .TRG files and had to scanned a lot of
# tables (e.g. SELECT COUNT(*) FROM I_S.TRIGGERS was affected).
# The same effect was not observed for similar queries which
# performed full-blown table open in order to fill I_S table.
#
# The problem stemmed from the fact that in case when I_S
# implementation opened only .FRM or .TRG file for each table
# processed it didn't release metadata lock it has acquired on
# the table after finishing its processing. As result, list
# of acquired metadata locks were growing until the end of
# statement. Since acquisition of each new lock required
# search in the list of already acquired locks performance
# degraded.
#
# The same effect is not observed when I_S implementation
# performs full-blown table open for each table being
# processed, as in the latter cases metadata lock on the
# table is released right after table processing.
#
# This fix addressed the problem by ensuring that I_S
# implementation releases metadata lock after processing
# the table in both cases of full-blown table open and in
# case when only .FRM or .TRG file is read.
The issues was:
- For some tables with a lot of not packed fields, we didn't allocate enough memory in head page which caused DBUG_ASSERT's
- Removed wrong DBUG_ASSERT()
- Fixed a problem with underflow() where it generates a key page where all keys didn't fit.
- Max key length is now limited by block_size/3 (was block_size /2). This is required for underflow() to work with packed keys.
mysql-test/lib/v1/mysql-test-run.pl:
Remove --alignment=8 as this doesn't work on 64 bit systems
mysql-test/suite/maria/r/small_blocksize.result:
Test case for Aria bug
mysql-test/suite/maria/t/small_blocksize-master.opt:
Test case for Aria bug
mysql-test/suite/maria/t/small_blocksize.test:
Test case for Aria bug
storage/maria/ha_maria.cc:
Fixed comment
storage/maria/ma_bitmap.c:
Fixed wrong variable usage in find_where_to_split_row() where we allocated too little memory for head page.
We did not take into account space for head extents (long VARCHAR) when trying to split row on head page. This caused us to allocate too little space from bitmap which lead to ASSERT failures later.
storage/maria/ma_blockrec.c:
Made some argument const (to ensure they was not accidently changed)
Removed wrong DBUG_ASSERT()
storage/maria/ma_blockrec.h:
Removed not used variable
storage/maria/ma_delete.c:
Added my_afree() in case of error
More comments and DBUG_ASSERT() for underflow()
storage/maria/ma_open.c:
Make keyinfo->underflow_block_length smaller for packed keys. This has to be done as for long packed keys, underflow() otherwise generates a key page where all keys didn't fit.
storage/maria/ma_page.c:
New DBUG_ASSERT()
storage/maria/ma_write.c:
Fixed comment
storage/maria/maria_def.h:
We have to have space for at least 3 keys on a key page.
(Otherwise the underflow() code doesn't work for packed keys, even when we have an underflow() for an empty key page)