SLOW/CRASHES SEMAPHORE
Problem:
There are 200,000 tables - fk_000001, fk_000002 ... fk_200000. All of them
are related to the same parent_table through a foreign key constraint.
When the parent_table is loaded into the dictionary cache, all the child
tables will also be loaded. This takes a lot of time. Since this operation
happens while the dictionary latch is held, the scenario leads to a
"long semaphore wait" situation and the server gets killed.
Analysis:
A simple performance analysis showed that the slowness is because of the
dict_foreign_find() function. It does a linear search on two linked lists,
table->foreign_list and table->referenced_list, looking for a particular
foreign key object using foreign->id as the key. It is called twice
for each foreign key object.
Solution:
Introduce red-black trees, table->foreign_rbt and table->referenced_rbt,
which act as indexes on table->foreign_list and table->referenced_list
respectively, using foreign->id as the key. These rbt structures are
used solely by dict_foreign_find().
rb#5599 approved by Vasil
"HAVING SUM(DISTINCT)": WRONG RESULTS.
ISSUE:
------
If a query uses loose index scan and has both
AGG(DISTINCT) and MIN()/MAX() functions, then the result values
of MIN()/MAX() are set improperly.
When the query has AGG(DISTINCT), end_select is set to
end_send_group. end_send_group keeps doing aggregation
until it sees a record from the next group, and only then
sends out the result row of that group.
Since the query also has MIN()/MAX() and loose index scan is
used, the values of MIN()/MAX() are set as part of loose index
scan itself. Setting the MIN()/MAX() values as part of loose
index scan overwrites the values computed in end_send_group.
This caused invalid results.
For such queries to work, loose index scan would have to stop
performing MIN()/MAX() aggregation and let end_send_group do it
instead. But according to the current design, loose index scan
can produce only one row per group key. If we have both MIN()
and MAX(), it would have to give two records out. This is not
possible, as the interface has to use the common buffer
record[0] for both records at a time.
SOLUTIONS:
----------
For such queries to work we would need a new interface for loose
index scan. Hence, loose index scan is not chosen for such
cases. A new rule, SA7, is introduced to take care of this.
SA7: "If Q has both AGG_FUNC(DISTINCT ...) and
MIN/MAX() functions then loose index scan access
method is not used."
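For illustration only (table and column names here are hypothetical), a
query of the shape that SA7 now excludes from loose index scan:
SELECT f1, SUM(DISTINCT f1), MAX(f2) FROM t1 GROUP BY f1;
With SA7 in place the optimizer picks another access method, and
end_send_group alone computes both the DISTINCT aggregate and MAX().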
SHOW PROCESSLIST, SHOW BINLOGS
Fixing a post-push test failure (MTR does not like being given
127.0.0.1 for localhost in case of an --embedded run; it thinks
it is an external IP address).
SHOW PROCESSLIST, SHOW BINLOGS
Problem: A deadlock was occurring when 4 threads were
involved in acquiring locks in the following way:
Thread 1: Dump thread (the slave is reconnecting, so on the
master a new dump thread is trying to kill zombie dump
threads). It acquired the zombie thread's LOCK_thd_data
and is about to acquire mysys_var->current_mutex
(which is LOCK_log).
Thread 2: An application thread is executing SHOW BINLOGS; it
has acquired LOCK_log and is about to acquire LOCK_index.
Thread 3: An application thread is executing PURGE BINARY LOGS;
it has acquired LOCK_index and is about to acquire
LOCK_thread_count.
Thread 4: An application thread is executing SHOW PROCESSLIST;
it has acquired LOCK_thread_count and is about to acquire
the zombie dump thread's LOCK_thd_data.
Deadlock Cycle:
Thread 1 -> Thread 2 -> Thread 3 -> Thread 4 -> Thread 1
The same deadlock was observed even when Thread 4 was executing
'SELECT * FROM information_schema.processlist' and had acquired
LOCK_thread_count and was about to acquire the zombie dump
thread's LOCK_thd_data.
Analysis:
There are four locks involved in the deadlock: LOCK_log,
LOCK_thread_count, LOCK_index and LOCK_thd_data.
LOCK_log, LOCK_thread_count and LOCK_index are global mutexes,
whereas LOCK_thd_data is local to a thread.
We can divide these four locks into two groups.
Group 1 consists of LOCK_log and LOCK_index, and the order
should be LOCK_log followed by LOCK_index.
Group 2 consists of the other two mutexes,
LOCK_thread_count and LOCK_thd_data, and the order should
be LOCK_thread_count followed by LOCK_thd_data.
Unfortunately, there is no predefined lock order to follow in
MySQL when it comes to locks across these two groups.
In the problematic example above, there is no problem in the way
the locks are acquired if you look at each thread individually,
but if you combine all 4 threads, they end up in a deadlock.
Fix:
Since everything seems to be fine in the way the threads take locks,
this patch changes the duration of the locks in Thread 4
to break the deadlock. That is, before the patch, Thread 4
(the 'SHOW PROCESSLIST' command) in the mysqld_list_processes()
function acquires LOCK_thread_count for the complete duration
of the function, and it also acquires/releases
each thread's LOCK_thd_data.
LOCK_thread_count is used to protect addition and
deletion of threads in the global threads list. While SHOW
PROCESSLIST is looping through all the existing threads,
it is a problem if a thread exits, but there is no problem
if a new thread is added to the system. Hence a new mutex,
LOCK_thd_remove, is introduced; it protects the deletion
of a thread from the global threads list. All threads that are
exiting should acquire LOCK_thd_remove
followed by LOCK_thread_count. (They should take LOCK_thread_count
as well because other places in the code still assume that thread
exit is protected by LOCK_thread_count; this fix changes
only the 'SHOW PROCESSLIST' query logic.)
(E.g. the unlink_thd logic will be protected with
LOCK_thd_remove.)
The logic of mysqld_list_processes (and fill_schema_processlist)
is now protected with 'LOCK_thd_remove' instead of
'LOCK_thread_count'.
The new locking order after this patch is:
LOCK_thd_remove -> LOCK_thd_data -> LOCK_log ->
LOCK_index -> LOCK_thread_count
ISSUE:
------
For a UNION of SELECTs, the rows examined by the query will be the sum
of the rows examined by the individual SELECT operations and the rows
examined for the UNION operation. The value of the session-level
global counter used to count the rows examined by a
SELECT statement should be accumulated and reset before it
is used for the next SELECT statement. But we missed
resetting it. Because of this the examined row count of a
SELECT query is accounted more than once.
SOLUTION:
---------
In UNION processing, reset the session-level global counter used to
accumulate the count of examined rows after its value is saved.
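As a hypothetical illustration (table and column names are made up),
for a statement like
SELECT a FROM t1 WHERE a > 10 UNION SELECT a FROM t2 WHERE a > 20;
the Rows_examined value reported for the statement (for example in the
slow query log) should be the rows read by the first SELECT, plus the
rows read by the second SELECT, plus the rows handled for the UNION
result, with neither SELECT counted twice.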
Problem:
If there is a predicate on a column referenced by MIN/MAX and
that predicate is not present in all the disjunctions on
keyparts earlier in the compound index, Loose Index Scan will
not return correct results.
Analysis:
When loose index scan is chosen, the range optimizer currently
groups all the predicates that contain group keyparts separately
from those that contain min/max keyparts. It then applies all the
conditions on the group parts first to the fetched row.
Then, in the call to next_max, it processes the conditions
that have a min/max keypart.
For example, in the following query:
Select f1, max(f2) from t1 where (f1 = 10 and f2 = 13) or
(f1 = 3) group by f1;
Condition (f2 = 13) would be applied even for rows that
satisfy (f1 = 3) thereby giving wrong results.
Solution:
Do not choose loose index scan for such cases. A new rule,
WA2, is introduced to take care of this.
WA2: "If there are predicates on C, these predicates must
be in conjunction with all predicates on all earlier keyparts
in I."
To do this, the fix reuses the function get_constant_key_infix().
Since this function used to fail for all multi-range conditions, it
is rewritten to recognize when the sub-conditions are
equivalent across the disjuncts; in that case it now succeeds.
To achieve this, a new helper function called all_same()
is introduced.
The fix also moves the test of NGA3 up to its former only
caller, get_constant_key_infix().
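For contrast, a hypothetical variant of the query above in which the
predicate on the min/max column is identical in every disjunct, so the
all_same() check succeeds and loose index scan remains applicable:
Select f1, max(f2) from t1 where (f1 = 10 and f2 = 13) or
(f1 = 3 and f2 = 13) group by f1;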
A typo led to not including the last list values (partition).
Also improved pruning to skip the last partition if it is not used.
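A minimal sketch of the affected case (table, column and values are
hypothetical): a value that belongs to the last partition in the list
must still be found after pruning.
CREATE TABLE t1 (a INT)
PARTITION BY LIST (a) (PARTITION p0 VALUES IN (1, 2),
                       PARTITION p1 VALUES IN (3, 4));
SELECT * FROM t1 WHERE a = 4; -- must scan p1, the last list partition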
rb#4762 approved by Aditya and Marko.
Problem: Uninstallation of the semisync plugin causes replication to
break.
Analysis: Semisync-enabled replication is a mutual agreement between
Master and Slave established when the connection (I/O thread) is made.
Once the I/O thread is started, if semisync is enabled on both
master and slave, the master appends a special magic header to events
using semisync plugin functions and sends them to the slave, and the
slave expects each event to have that special magic header format
and reads those bytes using semisync plugin functions.
If, while semisync replication is in use, a user uninstalls the
plugin on the master, the slave gets confused while
interpreting the event's content because it expects the special
magic header at the beginning of the event. The slave SQL thread
stops with a "Missing magic number in the header" error.
A similar problem happens if the plugin is uninstalled on the
slave while semisync replication is in use. The master sends
the events with the magic header, and the slave does not know about the
added magic header and thinks it received a corrupted event.
Hence the slave SQL thread stops with a "Found corrupted event" error.
Fix: Uninstallation of the semisync plugin is blocked while semisync
replication is in use, and such attempts throw an 'ER_UNKNOWN_ERROR' error.
To detect that semisync replication is in use, this patch uses the
semisync status variable values.
> On the master, it checks that 'Rpl_semi_sync_master_status' is OFF
before allowing the uninstallation of the rpl_semi_sync_master plugin.
>> Rpl_semi_sync_master_status is OFF when
>>> there is no dump thread running
>>> there are no semisync slaves
> On the slave, it checks that 'Rpl_semi_sync_slave_status' is OFF
before allowing the uninstallation of the rpl_semi_sync_slave plugin.
>> Rpl_semi_sync_slave_status is OFF when
>>> there is no I/O thread running
>>> replication is asynchronous.
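A sketch of the resulting behaviour on the master (statements shown only
for illustration; the slave side is analogous with rpl_semi_sync_slave):
SHOW STATUS LIKE 'Rpl_semi_sync_master_status'; -- ON while a semisync slave is connected
UNINSTALL PLUGIN rpl_semi_sync_master;          -- now rejected with ER_UNKNOWN_ERROR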
ARE PERMANENTLY SKIPPED IN 5.5/5.6).
The problem was that some result files were not updated,
so the tests were skipped.
The fix is to record updated result files.
BREAKS RBR
Analysis:
--------
A table created using a query of the format:
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
breaks the Row Based Replication.
The query above creates a table having a field of datatype
'bigint' with a display width of 3000 which is beyond the
maximum acceptable value of 255.
In RBR mode, a CREATE TABLE ... SELECT statement is
replicated as a combination of a CREATE TABLE statement
equivalent to the one returned by SHOW CREATE TABLE and
row events for the inserted rows. When this CREATE TABLE event
is executed on the slave, an error is reported:
Display width out of range for column 'a' (max = 255)
The following is the output of 'SHOW CREATE TABLE t1':
CREATE TABLE t1(`a` bigint(3000) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
The problem is due to the combination of two facts:
1) The above CREATE TABLE ... SELECT statement uses the display
width of the result of the DIV operation as the display width
of the created column, without validating the width for an
out-of-bounds condition.
2) The DIV operation incorrectly returns the length of its first
argument as the display width of its result, thus allowing
creation of a table with an incorrect display width of 3000
for the field.
Fix:
----
This fix changes the DIV operation implementation to correctly
evaluate the display width of its result. We check whether DIV's
result's estimated width crosses the maximum width for an integer
value (21) and, if so, set it to this maximum value.
This patch also fixes the maximum display width evaluation
for the DIV function when its first argument is in UCS2.
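With the fix, the example above should produce a display width capped at
the integer maximum (a sketch of the expected outcome; the exact
SHOW CREATE TABLE formatting may vary):
CREATE TABLE t1 AS SELECT REPEAT('A',1000) DIV 1 AS a;
SHOW CREATE TABLE t1; -- expected: `a` bigint(21) DEFAULT NULL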
Bug#18396916 MAIN.OUTFILE_LOADDATA TEST FAILS ON ARM, AARCH64, PPC/PPC64
The recorded results for the failing tests were wrong.
They were introduced by the patch for
Bug#30946 mysqldump silently ignores --default-character-set when used with --tab
Correct results were returned for platforms where 'char' is implemented as unsigned.
This was reported as
Bug#46895 Test "outfile_loaddata" fails (reproducible)
Bug#11755168 46895: TEST "OUTFILE_LOADDATA" FAILS (REPRODUCIBLE)
The patch for that bug fixed only parts of the problem,
leaving the incorrect results in the .result file.
Solution: use 'uchar' for field_terminator and line_terminator on all platforms.
Also: remove some unnecessary casts, leaving the ones we actually need.
WRITTEN WHILE ROWS REMAINS
Problem:
========
When TRUNCATE TABLE fails while using a transactional
engine, even though the operation errors out we still
continue and log it to the binlog. Because of this the master
still has the data, but the truncate is written to the binary
log, which causes inconsistency.
Analysis:
========
TRUNCATE TABLE can be performed either by dropping and re-creating
the table or by deleting rows. In the second case the existing
code is written in such a way that, even if an error occurs,
the truncate statement is always binlogged, which is not
correct.
Binlogging of the TRUNCATE TABLE statement should check whether
the truncate is executed transactionally or not. If the table
uses a transactional engine, we log the TRUNCATE TABLE only on
successful completion.
If the table is non-transactional, there is a possibility that on
error partial changes have already been made; hence in such cases
we do log in spite of errors, as some of the rows might have
been removed, so the statement has to be sent to the slave.
Fix:
===
Using the table handler, we identify whether the truncate is being
executed in a transactional mode or not, and the statement
is binlogged accordingly.
LOCAL AND IMPORT ERRORS
Description:
-----------
This bug happens because the current algorithm is designed so
that, in the case of a LOCAL load of data, on error the
remaining part of the file is read in order to return the proper
error message to the client side.
The problem with the current implementation is that the data stream
from the client side is cleared only in the case where line delimiters
exist, which is not the case with, for example, fixed-width
fields.
Fix:
----
Ported the patch provided by Sinisa Milivojevic in the bug report for
this issue to 5.5+ versions.
As part of this patch, the code is changed to clear the data stream
by calling the new member function READ_INFO::skip_data_till_eof().
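A hypothetical example of the affected case, using the fixed-row
(non-delimited) format that is selected when both the field terminator
and the enclosure are empty, so there are no line delimiters to rely on:
LOAD DATA LOCAL INFILE '/tmp/data.txt' INTO TABLE t1
FIELDS TERMINATED BY '' ENCLOSED BY '' LINES TERMINATED BY '';
-- on error, the remaining client data stream must still be drained
-- (READ_INFO::skip_data_till_eof)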
FAILING ASSERTION: FLEN == LEN
Problem:
A broken invariant was triggered when building a unique index on a
binary column while the input data contained duplicate keys. This was
broken in debug builds only.
Fix:
The fixed length of the binary datatype can be greater than the length
of the shorter prefix on which the index is being created.
ACCEPTED BUT PARSED INCORRECTLY
When setting the value of a system variable,
we can set it in the following ways:
set sys_var="Iden1.Iden2"; //1
set sys_var='Iden1.Iden2'; //2
set sys_var=Iden1.Iden2; //3
set sys_var=.ident1.ident2; //4
set sys_var=`Iden1.Iden2`; //5
While parsing, case 1 (when ANSI_QUOTES is enabled) and case 2
are taken as string literals (an item of type Item_string is made).
Cases 3 and 4 are taken as Item_field, where Iden1 is a table name and
Iden2 is a field name.
Case 5 is again of Item_field type, where Iden1.Iden2 is taken as the
field name.
Now in cases 3 and 4, when we assign some value to a system variable
(which can take string or enumeration type data), we set only the
field part.
This means only the Iden2 value is set for the system variable, which
results in a wrong result.
Solution:
(for string type) We need to document that setting a string-valued
system variable using an identifier is not allowed; otherwise it
results in unexpected behaviour.
(for enumeration type)
If we pass Iden1.Iden2, we give the error ER_WRONG_TYPE_FOR_VAR
(Incorrect argument type to variable).
Problem:
In the clustered index, when an update operation is done the overall
scenario (after rb#4479) is as follows:
1. Delete mark the old record that is to be updated.
2. The old record disowns the blobs.
3. Insert the new record into clustered index.
4. For non-updated blobs, the new record must own them. This is
   verified by an assert.
5. Non-updated blobs are marked as inherited in the new record.
Scenario involving DB_LOCK_WAIT:
If step 3 times out (DB_LOCK_WAIT), then we skip steps 1 and 2 and
continue from step 3. This skipping is achieved by the
UPD_NODE_INSERT_BLOB state.
In this case, step 4 is not correct: because of step 1, the new
record need not own the blobs. Hence the assert failure.
Solution:
The assert in step 4 is removed. Instead code is added to ensure that
the record owns the blob.
Note:
This is a regression caused by rb#4479.
rb#4571 approved by Marko
AUTO_INCREMENT_INCREMENT
Problem:
=======
When the auto_increment_increment system variable decreases,
the immediately next value of the auto-increment column is not affected.
Solution:
========
Get the previously inserted value of the auto-increment column by
subtracting the previous auto_increment_increment from the next
auto-increment value. After that, calculate the current autoinc value
using the newly changed auto_increment_increment variable.
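A hypothetical illustration of the intended behaviour (table and values
are made up):
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY);
SET SESSION auto_increment_increment = 10;
INSERT INTO t1 VALUES (NULL);
SET SESSION auto_increment_increment = 5;  -- decrease the step
INSERT INTO t1 VALUES (NULL);  -- this very next value should already honour the smaller step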
Approved by Sunny [rb#4394]
Problem:
The function row_upd_changes_ord_field_binary() is used to decide whether to
use row_upd_clust_rec_by_insert() or row_upd_clust_rec(). The function
row_upd_changes_ord_field_binary() does not make use of charset information.
Based on binary comparison it decides that r1 and r2 differ in their ordering
fields.
In the function row_upd_clust_rec_by_insert(), an update is done by delete +
insert. These operations internally make use of cmp_dtuple_rec_with_match()
to compare records r1 and r2. This comparison takes place with the use of
charset information.
This means that it is possible for the deleted record to be reused in the
subsequent insert. In the given scenario, the characters 'a' and 'A' are
considered equal in my_charset_latin1. When this happens, the ownership
information of externally stored blobs is not correctly handled.
Solution:
When an update is done by delete followed by insert, disown the relevant
externally stored fields during the delete marking itself (within the same
mtr). If the insert succeeds, then nothing with respect to blob ownership
needs to be done. If the insert fails, then the disown done earlier will be
removed when the operation is rolled back.
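A sketch of the scenario (schema is hypothetical): the update changes
only the letter case of an ordering field, so the binary check sees a
difference while the collation-aware comparison does not, and the row
carries an externally stored blob.
CREATE TABLE t1 (c VARCHAR(10) CHARACTER SET latin1, b BLOB,
                 PRIMARY KEY (c)) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('a', REPEAT('x', 20000));
UPDATE t1 SET c = 'A' WHERE c = 'a'; -- 'a' and 'A' compare equal in my_charset_latin1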
rb#4479 approved by Marko.
The maximum value for innodb_thread_sleep_delay is 4294967295 (32-bit) or
18446744073709551615 (64-bit) microseconds. This is way too big, since
the value of innodb_thread_sleep_delay is limited by
innodb_adaptive_max_sleep_delay if that variable is set to a non-zero
value (its default is 150,000).
Solution:
The maximum value of innodb_thread_sleep_delay should be the same as
the maximum value of innodb_adaptive_max_sleep_delay, which is 1000000.
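A sketch of the effect (the out-of-range value below is arbitrary):
values above the new maximum should be clamped to 1000000, normally
with a truncation warning.
SET GLOBAL innodb_thread_sleep_delay = 2000000; -- clamped to 1000000 after this fix
SHOW VARIABLES LIKE 'innodb_thread_sleep_delay';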
Approved by Jimmy, rb#4429
FAILS TO DROP CREATED EVENTS:
- Check for triggers should exclude mtr's own
- Move the code to before CHECKSUM TABLE as it might affect the result
of some audit_log tests (it does in 5.6)
- Replace SHOW STATUS LIKE 'slave_open_temp_tables' to be like in 5.6
Bug#17867117 - ERROR RESULT WHEN "COUNT + DISTINCT + CASE WHEN" NEED MERGE_WALK
Problem:
COUNT(DISTINCT) gives an incorrect result when it uses a Unique
tree and its last inserted record has a NULL value.
Here is how COUNT(DISTINCT) is processed, given that this query is not
using loose index scan.
When a row is produced as a result of joining tables (there is only
one table here), we store the SELECTed value in a Unique tree. This
allows elimination of any duplicates, and thus implements DISTINCT.
When we have processed all rows like this, we walk the Unique tree,
counting its elements, in Aggregator_distinct::endup() (tree->walk());
for each element we call Item_sum_count::add(). That function wants to
ignore any NULL value, and for that it checks
item_sum->args[0]->null_value. This is a mistake: when walking the
Unique tree, the value to be aggregated is not item_sum->args[0] but
rather table->field[0].
Solution:
Instead of item_sum->args[0]->null_value, use arg_is_null(), which
knows where to look (as in the fix for bug 57932).
As a consequence of this solution, we have to make arg_is_null() a
little more general:
1) Because it was so far only used for AVG() (which always has a
single argument), this function was looking at a single argument; now
that it has to work with COUNT(DISTINCT expression1, expression2), it
must look at all arguments.
2) Because we start using arg_is_null() for COUNT(DISTINCT), i.e. in
Item_sum_count::add(), it implies that we are also using it for
COUNT (no DISTINCT) (same add()). For COUNT (no DISTINCT), the
nullness to check is that of item_sum->args[0]. But the null_value
of such an item is reliable only if val_*() has been called on it.
So far arg_is_null() was always used after a call to arg_val*(), so it
could rely on null_value; but for COUNT there is no call to arg_val*(),
so arg_is_null() has to call is_null() instead.
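A minimal hypothetical example of the affected shape (no loose index
scan):
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES (1), (2), (2), (NULL);
SELECT COUNT(DISTINCT a) FROM t1; -- must return 2; the NULL must be ignored while walking the Unique tree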
Testcase for 16539979 by Neeraj. Testcase for 17867117 contributed by
Xiaobin Lin from Taobao.
Fix:
-------
Created separate suites called innodb_zip and i_innodb_zip that contain all compression tests.
Running the new suites with the following compression-related parameters:
* innodb_compression_level = {1/9}
* innodb_log_compressed_pages = {ON/OFF}
Problem:
We have created a table with the UTF8_BIN collation.
When the query has an ORDER BY clause over a function
call, we get the result in incorrect order.
Note: the bug is not there in 5.5.
Analysis:
In 5.5, for UTF16_BIN, the min and max multi-byte lengths are 2 and 4
respectively. In make_sortkey(), for a 2-byte character we assume
that the resultant length will be 2 bytes/character. But when we
use my_strnxfrm_unicode_full_bin(), we store sorting weights using
3 bytes per character. This results in truncated sort keys.
The same thing happens for UTF8MB4, where the min multi-byte length is
1 byte and the max is 4 bytes. We assume the resultant data is
1 byte/character, which results in truncated sort keys.
Solution:
strnxfrm (i.e. use of the MY_CS_STRNXFRM macro) is used for sorting,
where the resultant length does not depend on the source length.
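A hypothetical illustration (table, data and function are made up):
ordering by a function result over a UTF8_BIN column, which exercises
the sort-key path described above.
CREATE TABLE t1 (c VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
INSERT INTO t1 VALUES ('b'), ('A'), ('C');
SELECT c FROM t1 ORDER BY LOWER(c); -- expected order: 'A', 'b', 'C'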
This worklog aims at testing the two following scenarios:
1) Whenever the mysql_binlog_send method (dump thread)
reaches the end of file when reading events from the binlog, before
checking if it should wait for more events, there was a test to
check if the file being read was still active, i.e., it was the last
known binlog. However, it was possible that something was written to
the binary log and then a rotation would happen, after EOF was
detected and before the check for active was performed. In this
case, the end of the binary log would not be read by the dump
thread, and this would cause the slave to lose updates.
This test verifies that the problem has been fixed. It waits during
this window while forcing a rotation in the binlog.
2) Verify that the dump thread can send events in the active file
correctly after encountering an IO error.
INDEX_READ_MAP HAD NO MATCH
If index_read_map is called for an exact search and no matching record
exists, it will position the cursor on the next record, but still have
the relative position set to BTR_PCUR_ON.
This makes a subsequent call to index_next read yet another record,
instead of returning the record the cursor points to.
Fixed by setting pcur->rel_pos = BTR_PCUR_BEFORE if an exact
[prefix] search is done, but failed.
Also avoids optimistic restoration if rel_pos != BTR_PCUR_ON,
since btr_cur may be different from old_rec.
rb#3324, approved by Marko and Jimmy
DURING INNODB RECOVERY
Problem:
=======
The connection 'master' is dropped by mysqltest after
rpl_end.inc. At this point, the dropping of temporary tables
on the connection 'master' is not synced to the slave.
So, the temporary tables replicated from the master remain
on the slave, leading to an inconsistent close of the test.
The following test thus complains about the presence of
temporary table(s) left over from the previous test.
Fix:
===
- Put explicit drop commands in the replication tests so
that the temporary tables are dropped on the slave as well
(see the sketch below).
- Added a check for Slave_open_temp_tables in
mtr_check.sql to warn about any remaining temporary
tables at the close of a test.
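A sketch of the cleanup pattern added to the affected tests (table name
is hypothetical):
-- on the master connection, before rpl_end.inc:
DROP TEMPORARY TABLE IF EXISTS tmp_t1;
-- then sync the slave so that the drop is replicated before the test finishes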
IT IS DONE IN-PLACE
With change buffer enabled, InnoDB doesn't write a transaction log
record when it merges a record from the insert buffer to a secondary
index page if the insertion is performed as an update-in-place.
Fixed by logging the 'update-in-place' operation on secondary index
pages.
Approved by Marko. rb#2429
THAN LOCALHOST
This is a test bug, and the explanation for the behaviour can be found
on the bug page. The select is modified to select users where
user != root for the line where the failure is encountered on machines
with no hostname other than localhost.