PROBLEM: After WL#4144, when using MyISAM MERGE tables, the routine
open_and_lock_tables appends the base tables that make up the MERGE
table to the list of tables to lock. This has two side-effects in
replication:
1. On the master side, we log additional table maps for the base
tables, since they appear in the list of locked tables, even
though we don't really use them at the slave.
2. On the slave side, when opening a MERGE table while applying a
ROW event, additional tables are appended to the list of tables
to lock.
Side-effect #1 is not harmful. It just means that when using MyISAM MERGE
tables a few more table maps may be logged.
Side-effect #2 is harmful, because the list rli->tables_to_lock is a
list of structures that extend TABLE_LIST, in which the extra fields are
filled in from the table map events that are processed. Since
open_and_lock_tables appends tables to the list after all table map
events have been processed, we end up with entries without
replication/table map data on them. Thus, when trying to access that
information for these extra tables, the server crashes.
SOLUTION: We fix side-effect #2 by making sure that we access the
replication part of the structure only for those entries in the list that
were accounted for when processing the corresponding table map events. All
in all, we never go beyond rli->tables_to_lock_count.
We also deploy an assertion when clearing rli->tables_to_lock, making
sure that the base tables are no longer in the list (they were closed in
close_thread_tables).
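For illustration, a minimal sketch of the scenario under row-based replication
(table names are hypothetical and not taken from the original test case):
  SET SESSION binlog_format = ROW;
  CREATE TABLE t1 (a INT) ENGINE=MyISAM;
  CREATE TABLE t2 (a INT) ENGINE=MyISAM;
  CREATE TABLE tm (a INT) ENGINE=MERGE UNION=(t1, t2) INSERT_METHOD=FIRST;
  -- On the master this may log extra table maps for t1 and t2; on the
  -- slave the applier must only touch the entries covered by
  -- rli->tables_to_lock_count when applying the row event.
  INSERT INTO tm VALUES (1);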
CHECK_SIMPLE_EQUALITY
PROBLEM:
Crash in "check_simple_equality" when using a subquery with "IN" and
"ALL" in prepare.
ANALYSIS:
Crash can be reproduced using a simplified query like this one:
prepare s from "select 1 from g1 where 1 < all (
select @:=(1 in (select 1 from g1)) from g1)";
This bug is currently present only in 5.5 and 5.1. It is fixed as part
of worklog #1110 in 5.6. We are taking one change to fix this
in 5.5 and 5.1.
The problem is that we try to evaluate "is_null"
on an argument which is part of a subquery
(in Item_is_not_null_test::update_used_tables()).
But this evaluation should only happen when we do not have a subquery
present, that is, when "with_subselect" is not set.
With respect to the above query, we create an object of type
"Item_in_optimizer", which by definition is always associated with a
subquery. While in 5.6 we set "with_subselect" to true for the
"Item_in_optimizer" object, we do not do the same in 5.5. As a result,
the evaluation of "is_null" leads to a core dump.
So, we are now setting "with_subselect" to true for "Item_in_optimizer"
in 5.1 and 5.5.
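A self-contained variant of the simplified repro above might look like this
(the table definition and data are hypothetical; per the description, the
crash happened during PREPARE):
  CREATE TABLE g1 (f1 INT);
  INSERT INTO g1 VALUES (1), (2);
  -- Before the fix, this PREPARE crashed in check_simple_equality:
  PREPARE s FROM "SELECT 1 FROM g1 WHERE 1 < ALL (
      SELECT @:=(1 IN (SELECT 1 FROM g1)) FROM g1)";
  DEALLOCATE PREPARE s;
  DROP TABLE g1;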
mysql-test/r/func_in.result:
Result file changes for the test case added
mysql-test/t/func_in.test:
Test case added for Bug#13012483
sql/item_cmpfunc.h:
Changed Item_in_optimizer::Item_in_optimizer() to set "with_subselect"
to true.
PARTITION STATISTICS
The problem was the fix for bug#11756867: it always used the first
partitions, and stopped after it had checked 10 [sub]partitions
(or until it found a partition which would contain a match).
This results in bad statistics for tables where the first 10 partitions
don't represent the majority of the data (such as when the first 10
partitions only contain a few rows in total).
The solution was to take statistics from the partitions containing
the most rows instead:
Added an array of partition ids which is sorted by number of records
in descending order.
This array is used in records_in_range to cover as many records as
possible in as few calls as possible.
Also changed the limit on how many partitions to use for the statistics
from a static maximum of 10 partitions into a dynamic model:
the maximum number of partitions is now log2(total number of partitions),
taken from the ordered array.
It will continue calling the partitions' records_in_range until it has
checked:
(total rows in matching partitions) * (maximum number of partitions)
/ (number of used partitions)
Also reverted the changes for ha_partition::scan_time() and
ha_partition::estimate_rows_upper_bound() to before
the fix of bug#11756867, since they are not as slow as
records_in_range.
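As a hypothetical illustration of the layout that produced bad estimates
before this change (names and data distribution are made up, not from the
original report):
  CREATE TABLE t1 (a INT, b INT, KEY (b))
  PARTITION BY RANGE (a) (
    PARTITION p0 VALUES LESS THAN (10),       -- only a few rows
    PARTITION p1 VALUES LESS THAN (20),       -- only a few rows
    PARTITION p2 VALUES LESS THAN (1000000),  -- millions of rows
    PARTITION p3 VALUES LESS THAN MAXVALUE    -- millions of rows
  );
  -- Before the fix, records_in_range estimates for a condition such as
  --   SELECT * FROM t1 WHERE b BETWEEN 100 AND 200;
  -- were taken from the first (nearly empty) partitions; now the
  -- partitions holding the most rows are sampled first.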
RESULT FROM PREVIOUS TRANSACTION
The current Query Cache API is not fully compatible with
the partitioning engine.
There is no good way to implement support for the QC because:
1) a static callback for ha_partition would need to have access
to all partition names and call the underlying callback for each
[sub]partition with the correct name.
2) pruning would be impossible, even if one used the ulonglong
engine_data, because if engine_data is changed, the table is
invalidated by the QC.
So the only viable solution to avoid incorrect data is to not allow
caching of queries using partitioned tables.
(There are some extra changes, due to removal of \r as line break)
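The user-visible effect can be sketched roughly as follows (assuming the query
cache is enabled; names and values are illustrative only):
  CREATE TABLE tp (a INT) PARTITION BY HASH (a) PARTITIONS 4;
  INSERT INTO tp VALUES (1), (2), (3);
  -- A query on a partitioned table is no longer stored in the query cache:
  SELECT SQL_CACHE * FROM tp;
  SHOW STATUS LIKE 'Qcache_queries_in_cache';  -- expected not to increase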
On shutdown(), Windows can drop traffic still queued for sending even if that
wasn't specifically requested. As a result, fatal errors (those after
signaling which the server will drop the connection) were sometimes only
seen as "connection lost" on the client side, because the server-side
shutdown() erroneously discarded the correct error message before sending
it.
On Windows, we now use the Windows API to access the (non-broken) equivalent
of shutdown().
Backport from trunk
If a query's end time is before its start time, the system clock has been turned back
(daylight savings time etc.). When the system clock is changed, we can't tell for certain whether a
given query was actually slow. We did not protect against logging such a query with a bogus
execution time (resulting from end_time - start_time being negative), and possibly logging it
even though it did not really take long to run.
We now have a sanity check in place.
sql/sql_parse.cc:
Make sure the end time is not before the start time - otherwise, we can be SURE the system clock
was changed in between, but not by how much. In other words, when the clock is changed,
we don't know how long a query ran, or whether it was slow.
On shutdown(), Windows can drop traffic still queued for sending even if that
wasn't specifically requested. As a result, fatal errors (those after
signaling which the server will drop the connection) were sometimes only
seen as "connection lost" on the client side, because the server-side
shutdown() erroneously discarded the correct error message before sending
it.
On Windows, we now use the Windows API to access the (non-broken) equivalent
of shutdown().
Backport from trunk
include/violite.h:
export mysql_socket_shutdown(). It lives in vio in the backport.
sql/mysqld.cc:
Go through our own shutdown() rather than straight to the POSIX one.
vio/viosocket.c:
Define mysql_socket_shutdown(). On UNIXoid systems, it's just a wrapper for shutdown(), but
on Windows, it uses DisconnectEx, which is magic.
This patch is a backport of some of the cleanups/refactorings that were done
as part of WL#1393 Optimizing filesort with small limit.
mysql-test/t/bug13633383.test:
New test case.
mysql-test/valgrind.supp:
Changed signature for find_all_keys
Fixed a typo in the comment.
Fixing test cases which were previously not throwing due to the
disable warnings macro.
sql/sql_base.cc:
Change in indentation and fixing a typo in the comment.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table may lead to the master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
based on the rows selected from another table as unsafe. This
will cause the execution of such statements to throw a warning
and forces the statement to be logged in ROW format if the logging
format is MIXED.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
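As a hypothetical illustration (table names are made up; with
binlog_format = MIXED such statements now raise an unsafe-statement warning
and are logged in ROW format):
  CREATE TABLE src (a INT);
  CREATE TABLE dst (id INT AUTO_INCREMENT PRIMARY KEY, a INT);
  INSERT INTO src VALUES (1), (2), (3);
  -- Writes to an auto_increment column based on rows selected from
  -- another table; now marked unsafe:
  INSERT INTO dst (a) SELECT a FROM src;
  -- CREATE TABLE ... SELECT is now unsafe as well:
  CREATE TABLE dst2 (id INT AUTO_INCREMENT PRIMARY KEY, a INT)
    SELECT a FROM src;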
sql/share/errmsg-utf8.txt:
Added new warning messages.
sql/sql_base.cc:
-Created a function to check statements that write to
tables with an auto_increment column and have a SELECT.
-Marked all the statements that write to a table
with auto_increment column based on rows fetched
from other table(s) as unsafe.
sql/sql_table.cc:
Mark CREATE TABLE [with auto_increment column] as unsafe.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table may lead to the master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
as unsafe. This will cause the execution of such statements to
throw a warning and forces the statement to be logged in ROW format if
the logging format is MIXED.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table, will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
sql/share/errmsg-utf8.txt:
Added new Warning messages
sql/sql_base.cc:
Created a new function that checks for SELECT + write on an auto_increment table;
made all such statements unsafe.
sql/sql_parse.cc:
Made CREATE TABLE [with auto_increment column] + SELECT unsafe.
IS EXECUTED TWICE FROM P
This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
ALTER TABLE AFTER DROP PARTITION
Bug#13608188 - 64038: CRASH IN HANDLER::HA_THD ON ALTER TABLE AFTER
REPAIR NON-EXISTING PARTITION
Backport of bug#13357766 from -trunk to -5.5.
The state of some partitions was not reset on failure, leading
to invalid states of partitions in subsequent statements.
Fixed by reverting to the original state for all partitions
if not all partition names were resolved.
Also adding extra security by forcing tables to be reopened
in case of error in mysql_alter_table.
(There is also removal of \r at the end of some lines.)
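A hedged repro sketch of the kind of statement sequence involved (table and
partition names are made up, not the original test case):
  CREATE TABLE t1 (a INT)
    PARTITION BY RANGE (a) (PARTITION p0 VALUES LESS THAN (10));
  -- Fails because the partition does not exist, but previously could
  -- leave partition state behind:
  ALTER TABLE t1 REPAIR PARTITION p_missing;
  -- A subsequent ALTER on the same table could then crash in
  -- handler::ha_thd() before the fix:
  ALTER TABLE t1 ADD PARTITION (PARTITION p1 VALUES LESS THAN (20));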
KEY HANDLING ON SUBSEQUENT CREATE TABLE IF NOT EXISTS
PROBLEM:
--------
Consider an SP routine which does CREATE TABLE
with a REFERENCES clause. The first call to this routine
invokes the parser, and the parsed items are cached so as
to avoid parsing for the second execution of the routine.
It is observed that valgrind reports a warning
upon read of the thd->lex->alter_info->key_list->Foreign_key object,
which seems to be pointing to an invalid memory address
during the second execution of the routine. Accessing this object
could theoretically cause a crash.
ANALYSIS:
---------
The problem stems from the fact that, for some reason,
elements of the ref_columns list in the thd->lex->alter_info->
key_list->Foreign_key object are changed to point to
objects allocated on the runtime memory root.
During the first execution of the routine we create
a copy of the thd->lex->alter_info object.
As part of this process we create clones of the objects in
Alter_info::key_list, and of the Foreign_key object in particular.
When the Foreign_key object is cloned, for some reason we
perform shallow copies of both the Foreign_key::ref_columns
and Foreign_key::columns lists. So the new instance of the
Foreign_key object starts to SHARE the contents of the ref_columns
and columns lists with the original instance.
After that, as part of the cloning process, we call
list_copy_and_replace_each_value() for the elements of the
ref_columns list. As a result, the ref_columns lists in both the
original and the cloned Foreign_key object start to contain
pointers to Key_part_spec objects allocated on the runtime
memory root, because of the shallow copy.
So when we start copying the thd->lex->alter_info object
during the second execution of the stored routine, we indeed
encounter a pointer to a Key_part_spec object allocated
on the runtime mem-root which was cleared at the end
of the previous execution. This is done in sp_head::execute(),
by a call to free_root(&execute_mem_root, MYF(0)).
As a result we get valgrind warnings about accessing
unreferenced memory.
FIX:
----
The safest solution to this problem is to
fix the Foreign_key(Foreign_key, MEM_ROOT) constructor to do
a deep copy of the columns lists, similar to the Key(Key, MEM_ROOT)
constructor.
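A hedged repro sketch of the scenario (names are hypothetical; the second CALL
is the one that read freed memory before the fix):
  CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=InnoDB;
  DELIMITER //
  CREATE PROCEDURE p1()
  BEGIN
    DROP TABLE IF EXISTS child;
    CREATE TABLE child (id INT,
                        FOREIGN KEY (id) REFERENCES parent (id))
      ENGINE=InnoDB;
  END//
  DELIMITER ;
  CALL p1();  -- first execution parses and caches the routine
  CALL p1();  -- second execution re-uses the cached items; valgrind
              -- reported reads of freed memory here before the fix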
Bug#13011410 CRASH IN FILESORT CODE WITH GROUP BY/ROLLUP
The assert in 13580775 is visible in 5.6 only,
but shows that all versions are vulnerable.
13011410 crashes in all versions.
filesort tries to re-use the sort buffer between invocations in order to save
malloc/free overhead.
The fix for Bug 11748783 - 37359: FILESORT CAN BE MORE EFFICIENT
added an assert that buffer properties (num_records, record_length) are
consistent between invocations. Indeed, they are not necessarily consistent.
Fix: re-allocate the sort buffer if the properties change.
mysql-test/r/partition.result:
New tests.
mysql-test/t/partition.test:
New tests.
sql/filesort.cc:
If we already have allocated a sort buffer in a previous execution,
then verify that it is big enough for the current one.
sql/table.h:
Add sort_keys_size; Number of bytes allocated for the sort_keys buffer.
BUG#13519696 - 62940: SELECT RESULTS VARY WITH VERSION AND
WITH/WITHOUT INDEX RANGE SCAN
BUG#13453382 - REGRESSION SINCE 5.1.39, RANGE OPTIMIZER WRONG
RESULTS WITH DECIMAL CONVERSION
BUG#13463488 - 63437: CHAR & BETWEEN WITH INDEX RETURNS WRONG
RESULT AFTER MYSQL 5.1.
Those are all cases where the range optimizer got it wrong
with > and >=.
mysql-test/r/range.result:
Without the code fix for DECIMAL, "select count(val) from t2 where val > 0.1155"
(which uses a range scan) returned 127 instead of 128.
Moreover, both
select * from t1 force index (primary) where a=1 and c>= 2.9;
and
select * from t1 force index (primary) where a=1 and c> 2.9;
would miss "1 1 3".
Without the code fix for strings, both
SELECT * FROM t1 WHERE F1 >= 'A ';
and
SELECT * FROM t1 WHERE F1 BETWEEN 'A ' AND 'AAAAA';
would miss "A A A".
sql/item.cc:
Preamble to the explanations below: opt_range.cc:get_mm_leaf() does
this (this is not changed by the patch): changes
column > value
to
column OP V
where:
* V is what is in "column" after we stored "value" in it
(such store operation may have done rounding...)
* OP is > or >=, depending on what's correct.
For example, if c is an INT column,
c > 2.9 is changed to
c OP 3
where OP is >= ('>' would not be correct).
The bugs below are cases where we chose OP wrongly.
Note that such transformations are visible in the optimizer trace.
1) Fix for STRING. In the scenario with CHAR(5) in range.test, this happens,
in get_mm_tree(), for the condition F1>='A ':
* value->save_in_field_no_warnings(field, 1) wants to store the right argument
(named 'item') into the CHAR(5) field; this stores 'A ' (the item's value)
padded with spaces (which changes nothing: still 'A ')
* we come to
case Item_func::GE_FUNC:
/* Don't use open ranges for partial key_segments */
if ((!(key_part->flag & HA_PART_KEY_SEG)) &&
(stored_field_cmp_to_item(param->thd, field, value) < 0))
tree->min_flag= NEAR_MIN;
tree->max_flag=NO_MAX_RANGE;
What this wants to do is: if the field's value is strictly smaller
than the item's, then ">=" can be changed to ">" (this is an optimization,
it can help pruning one useless partition).
* stored_field_cmp_to_item() is called; it compares the field's
and item's values: the item's value (Item_string::val_str()) is
'A ' and the field's value (Field_string::val_str()) is
'A' (yes, val_str() removes end spaces unless sql_mode='PAD_CHAR_TO_FULL_LENGTH');
and the comparison is done with stringcmp() which considers
end spaces as relevant; as end spaces differ, function returns a
negative number, and ">='A '" becomes ">'A'" (i.e. the NEAR_MIN
flag is turned on).
During execution the index range scan code will search for "A", find
a match, but exclude it (because of ">"), wrongly.
The badness is the string comparison done by stored_field_cmp_to_item():
we use the result of this function to determine where the index search
should start, so it should compare the way the index search does;
index search comparisons are done with ha_key_cmp(), which uses
a collation-aware comparison (in our case, my_strnncollsp_simple(),
which ignores end spaces); so stored_field_cmp_to_item()
needs to do the same. When this is fixed, the condition becomes
">='A '".
2) Fix for DECIMAL: just like in other comparisons in stored_field_cmp_to_item(),
we must first pass the field and then the item; otherwise expectations
on what <0 and >0 mean (inferiority, superiority) get violated.
In the test in range.test about c>2.9: c is an INT column, so 2.9
gets stored as 3, then stored_field_cmp_to_item() compares 3
and 2.9; because of the wrong order of arguments passed
to my_decimal_cmp(), range optimizer
thinks that 3 is < 2.9 and thus changes "c> 2.9" to "c> 3".
After fixing the order, it changes to the correct "c>= 3".
In the test in range.inc for val > 0.1155, it was changed to
val > 0.116, now it is changed to val >= 0.116.
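To make the rounding direction concrete, here is a hypothetical worked example
for an INT column c, mirroring the c > 2.9 case above (the table is made up):
  CREATE TABLE t (a INT, c INT, PRIMARY KEY (a, c));
  INSERT INTO t VALUES (1, 1), (1, 2), (1, 3);
  -- Storing 2.9 in the INT column yields 3, and 3 > 2.9, so the correct
  -- rewrite is "c >= 3": the row (1, 3) must be returned.
  SELECT * FROM t FORCE INDEX (PRIMARY) WHERE a = 1 AND c > 2.9;
  -- Storing 3.1 in the INT column also yields 3, but 3 < 3.1, so the
  -- correct rewrite is "c > 3": the row (1, 3) must NOT be returned.
  SELECT * FROM t FORCE INDEX (PRIMARY) WHERE a = 1 AND c > 3.1;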
Bug#11758543 50756: BIGINT '100' MATCHES 1.001E2
Expressions of the form
BIGINT_COL <compare> <non-integer constant>
should be done either as decimal, or float.
Currently however, such comparisons are done as int,
which means that the constant may be truncated,
and yield false positives/negatives for all queries
where compare is '>' '<' '>=' '<=' '=' '!='.
BIGINT_COL IN <list of constants>
and
BIGINT_COL BETWEEN <constant> AND <constant>
are also affected.
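A hypothetical illustration of the false positive in the bug title (table and
data are made up):
  CREATE TABLE t (a BIGINT);
  INSERT INTO t VALUES (100);
  -- 1.001E2 is 100.1; comparing as int truncates the constant to 100,
  -- so this query wrongly matched the row before the fix. Compared as
  -- decimal or float, it correctly returns no rows.
  SELECT * FROM t WHERE a = 1.001E2;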
mysql-test/r/bigint.result:
New tests.
mysql-test/r/func_in.result:
BIGINT <=> string comparison should be done as float,
so a warning for the value 'abc' is appropriate.
mysql-test/t/bigint.test:
New tests.
sql/item_cmpfunc.cc:
In convert_constant_item() we verify that the constant item
can be stored in the given field.
For BIGINT columns (MYSQL_TYPE_LONGLONG) we must verify that the
stored constant value is actually comparable as int,
i.e. that the value was not truncated.
For between: compare as int only if both arguments convert correctly to int.
MEMORY LEAK.
Background:
- There are caches for stored functions and stored procedures (SP-cache);
- There is no similar cache for events;
- Triggers are cached together with TABLE objects;
- Those SP-caches are per-session (i.e. specific to each session);
- A stored routine is represented by a sp_head-instance internally;
- SP-cache basically contains sp_head-objects of stored routines, which
have been executed in a session;
- sp_head-object is added into the SP-cache before the corresponding
stored routine is executed;
- SP-cache is flushed at the end of the session.
The problem was that the SP-cache might grow without any limit. Although this
was not a pure memory leak (the SP-cache is flushed when the session is closed),
it is still a problem, because the user might consume a lot of memory by
executing many stored routines.
The patch fixes this problem in the least intrusive way. A soft limit
(similar to the size of the table definition cache) is introduced. To represent
this limit, the new runtime configuration parameter 'stored_program_cache'
is introduced. The value of this parameter is stored in the new global
variable stored_program_cache_size, which is used to check the size of the
SP-cache for overflow.
The parameter 'stored_program_cache' limits the number of cached routines for
each thread. It has the following min/default/max values, as given by support:
min = 256, default = 256, max = 512 * 1024.
Also it should be noted that this parameter limits the size of
each cache (for stored procedures and for stored functions) separately.
The SP-cache size is checked after a top-level statement is parsed.
If the SP-cache size exceeds the limit specified by the parameter
'stored_program_cache', then the SP-cache is flushed and the memory allocated
for the cache objects is freed. This approach allows the cache to be flushed
safely when there are dependencies among stored routines.
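The new parameter behaves like any other system variable; a brief usage sketch
(the default shown matches the limits listed above):
  SHOW GLOBAL VARIABLES LIKE 'stored_program_cache';  -- default is 256
  -- Raise the soft limit if sessions legitimately use many routines:
  SET GLOBAL stored_program_cache = 1024;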
sql/mysqld.cc:
Added global variable stored_program_cache_size to store value of
configuration parameter 'stored-program-cache'.
sql/mysqld.h:
Added declaration of global variable stored_program_cache_size.
sql/sp_cache.cc:
Extended the interface of sp_cache by adding the helper routine
sp_cache_enforce_limit to check the size of the stored routines cache for
overflow. Also added the method enforce_limit to class sp_cache, which
implements the check of the cache size for overflow.
sql/sp_cache.h:
Extended the interface of sp_cache by adding the standalone routine
sp_cache_enforce_limit to check the size of the stored routines cache
for overflow.
sql/sql_parse.cc:
Added a flush of the sp_cache after processing the next SQL statement
received from a client.
sql/sql_prepare.cc:
Added a flush of the sp_cache after preparation/execution of the next prepared
SQL statement received from a client.
sql/sys_vars.cc:
Added support for configuration parameter stored-program-cache.
- Reverting the patch for Bug #12584302.
The patch will be reverted in 5.1 and 5.5.
The patch will not be reverted in 5.6; the change will
be properly documented in 5.6.
- Backporting DBUG_ASSERT not to crash on '0000-01-00'
(already fixed in mysql-trunk (5.6))
Problem : The basic problem is the way the thread sleeps in mysql-5.5 and also in mysql-5.1
when we execute a STOP SLAVE on the Windows platform.
On the Windows platform, if STOP SLAVE is executed after the master dies, we have
a long wait before STOP SLAVE returns a value. This is because the
thread sleeps, and that sleep is uninterruptible in the two versions above.
This was fixed by Davi's patch for BUG#11765860 in mysql-trunk. Backporting
his patch to mysql-5.5 fixes the problem.
Solution : A new pair of mutex and condition variable is introduced to synchronize thread
sleep and finalization. A new mutex is required because the slave threads are
terminated while holding the slave thread locks (run_lock), which can not be
relinquished during termination as this would affect the lock order.
mysql-test/suite/rpl/r/rpl_start_stop_slave.result:
The result file associated with the test added.
mysql-test/suite/rpl/t/rpl_start_stop_slave.test:
A test to check the new functionality.
sql/rpl_mi.cc:
The constructor initializes the new mutex and condition variable for the master_info.
sql/rpl_mi.h:
The condition variable and mutex have been added for the master_info.
sql/rpl_rli.cc:
The constructor initializes the new mutex and condition variable for the relay_log_info.
sql/rpl_rli.h:
The condition variable and mutex have been added for the relay_log_info.
sql/slave.cc:
Use a timed wait on a condition variable to implement an interruptible sleep.
The wait is registered with the THD object so that the thread will be woken
up if killed.
A follow-up patch corrects the max sizes of printed strings and changes llstr() to %lld.
Credits go to Davi, who provided great feedback.
sql/share/errmsg-utf8.txt:
The max size for the whole message is 512, so a part of it - like '%-.512s' - should be less;
reducing it to 320 is safe and with good chances won't cut off a part of a rather long
message in Last_IO_Error = 'Got fatal error 1236 ...'.
sql/sql_repl.cc:
llstr() is replaced by %lld.
The server crashes when receiving a COM_BINLOG_DUMP command with a position of 0 or
larger than the file size.
The execution proceeds to an error block where the pointer to the last read replication
coordinates is NULL, and dereferencing it crashed the server.
Fixed by making the coordinates, previously maintained only for heartbeat, available
("public") unconditionally.
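A hedged sketch of how such a dump request can arise from the slave side (the
file name and position are made up; after the fix the master reports error 1236
instead of crashing):
  -- On the slave, point replication at a position beyond the end of the
  -- master's binlog file:
  CHANGE MASTER TO MASTER_LOG_FILE = 'master-bin.000001',
                   MASTER_LOG_POS = 1000000000;
  START SLAVE;
  -- The I/O thread sends COM_BINLOG_DUMP with the bogus position; the
  -- master should answer with fatal error 1236 rather than crash:
  SHOW SLAVE STATUS\G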
mysql-test/extra/rpl_tests/rpl_start_stop_slave.test:
regression test for bug#3593869-64035 is added.
mysql-test/suite/rpl/r/rpl_cant_read_event_incident.result:
results updated (error message format is changed).
mysql-test/suite/rpl/r/rpl_log_pos.result:
results updated (error message format is changed).
mysql-test/suite/rpl/r/rpl_manual_change_index_file.result:
results updated (error message format is changed).
mysql-test/suite/rpl/r/rpl_packet.result:
results updated (error message format is changed).
mysql-test/suite/rpl/r/rpl_stm_start_stop_slave.result:
results updated (error message format is changed).
mysql-test/suite/rpl/t/rpl_stm_start_stop_slave.test:
The slave is stopped by the bug#3593869-64035 tests, so
--let $rpl_only_running_threads= 1 is set prior to rpl_end.
sql/share/errmsg-utf8.txt:
Increasing the max length of explanatory message to 512.
sql/sql_repl.cc:
Making `coord' carry the coordinates of the last event read from the binlog,
regardless of heartbeat.
Renaming, small cleanup, and simplification of the code after `if (coord)' becomes unnecessary.
Adding yet another, third pair of coordinates - the starting replication coordinates -
to the error text.