timestamp: Thu 2011-12-01 15:12:10 +0100
Fix for Bug#13430436 PERFORMANCE DEGRADATION IN SYSBENCH ON INNODB DUE TO ICP
When running sysbench on InnoDB there is a performance degradation due
to index condition pushdown (ICP). Several of the queries in sysbench
have a WHERE condition that the optimizer uses for executing these
queries as range scans. The upper and lower limit of the range scan
will ensure that the WHERE condition is fulfilled. Still, the WHERE
condition remains part of the query, and if ICP is enabled it will be
pushed down to InnoDB as an index condition.
Because the range scan's upper and lower limits ensure that the WHERE
condition is fulfilled, the pushed index condition will not filter out
any records. As a result, the use of ICP for these queries causes a
performance overhead for sysbench. This overhead comes from the
resources spent on determining the part of the condition that can be
pushed down to InnoDB and from executing the pushed index condition
inside InnoDB.
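As an illustration (the table and values below are only an assumption,
modeled loosely on the sysbench oltp schema, not taken from the patch),
this is the kind of query where the pushed condition cannot reject any
row, because the range bounds on the primary key already imply it:

  CREATE TABLE sbtest (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    k  INT UNSIGNED NOT NULL DEFAULT 0,
    c  CHAR(120)    NOT NULL DEFAULT '',
    PRIMARY KEY (id)
  ) ENGINE=InnoDB;

  -- The optimizer evaluates the WHERE clause as range bounds on the
  -- clustered primary key.  With ICP enabled the same condition is also
  -- pushed down to InnoDB and evaluated for every row in the range,
  -- even though it can never filter one out.
  SELECT c FROM sbtest WHERE id BETWEEN 100 AND 199;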
With the default configuration for sysbench the range scans will use
the primary key. This is a clustered index in InnoDB. Using ICP on a
clustered index provides the lowest performance benefit, since the
entire record is already part of the clustered index, and at the same
time executing the pushed index condition on a clustered index has the
highest relative overhead in InnoDB.
The fix for removing the overhead ICP introduces when running sysbench
is to disable use of ICP when the index used by the query is a
clustered index.
When WL#6061 is implemented this change should be re-evaluated.
Problem was that now we can merge derived tables (subqueries in the FROM clause).
Fix: in case of a detected conflict and the presence of a derived table "over" the table which caused the conflict, try the materialization strategy.
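A purely illustrative sketch of such a statement (table and column names
are assumptions, not taken from the bug report): the conflicting table
appears underneath a derived table in the FROM clause of a subquery.

  CREATE TABLE t1 (a INT) ENGINE=InnoDB;

  -- Referring to the update target t1 directly in the subquery would be
  -- a conflict.  Here t1 is wrapped in the derived table dt; as long as
  -- dt is materialized (instead of merged back into the outer query)
  -- the conflict does not arise.
  UPDATE t1 SET a = a + 1
  WHERE a IN (SELECT a FROM (SELECT a FROM t1) AS dt);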
The replication slave sets first error 1913 and immediately after error
1595. Thus it is possible, but unlikely, to get 1913. The original test
seems to realise this, but uses an invalid error code - my guess is
that this was a temporary code used in a feature tree, and that fixing
it was forgotten when merging to main. The removed "1923" is
something committed by mistake during tests.
If the flag 'optimize_join_buffer_size' is set to 'off' and the value
of the system variable 'join_buffer_size' is greater than the value of
the system variable 'join_buffer_space_limit', then no join cache can
be employed to join the tables of the executed query.
A bug in the function JOIN_CACHE::alloc_buffer allowed a join buffer to
be used even in this case, while another bug in the function
revise_cache_usage could cause a crash of the server in this case if the
chosen execution plan for the query contained an outer join or semi-join
operation.
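For reference, a hedged sketch of a session that ends up in this state
(the exact values and table definitions are arbitrary assumptions):

  CREATE TABLE t1 (a INT) ENGINE=InnoDB;
  CREATE TABLE t2 (a INT) ENGINE=InnoDB;

  -- join_buffer_size larger than join_buffer_space_limit, and shrinking
  -- of the join buffer disabled:
  SET optimizer_switch = 'optimize_join_buffer_size=off';
  SET join_buffer_space_limit = 131072;
  SET join_buffer_size = 1048576;

  -- With these settings no join cache may be employed; with the bugs
  -- described above a join buffer could still be allocated, and an
  -- outer join like this one could crash the server:
  SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a;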
This patch is a backport of some of the cleanups/refactorings that were done
as part of WL#1393 Optimizing filesort with small limit.
mysql-test/t/bug13633383.test:
New test case.
mysql-test/valgrind.supp:
Changed signature for find_all_keys
Fixed a typo in the comment.
Fixing test cases which previously were not throwing warnings due to
the disable warnings macro.
sql/sql_base.cc:
Change in indentation and fixing a typo in the comment.
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
based on the rows selected from another table as unsafe. This
will cause the execution of such statements to throw a warning and
force the statement to be logged in ROW if the logging
format is mixed.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
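As an illustration of the two cases (table names are assumptions), both
of the following statements are now flagged as unsafe:

  CREATE TABLE src (a INT) ENGINE=InnoDB;
  CREATE TABLE dst (id INT AUTO_INCREMENT PRIMARY KEY, a INT) ENGINE=InnoDB;

  -- 1. Writing to a table with an auto_increment column from rows read
  --    from another table: the order in which src is read may differ
  --    between master and slave, so the generated id values may differ.
  INSERT INTO dst (a) SELECT a FROM src;

  -- 2. CREATE TABLE ... SELECT:
  CREATE TABLE dst2 (id INT AUTO_INCREMENT PRIMARY KEY, a INT)
    SELECT a FROM src;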
sql/share/errmsg-utf8.txt:
Added new warning messages.
sql/sql_base.cc:
-Created a function to check for statements that write to
tables with an auto_increment column and include a SELECT.
-Marked all the statements that write to a table
with auto_increment column based on rows fetched
from other table(s) as unsafe.
sql/sql_table.cc:
Mark CREATE TABLE [with auto_increment column] as unsafe.
- mysql-test-run.pl --valgrind complains when all tests succeed.
- perfschema.all_instances fails on non-Linux, where ENABLE_TEMP_POOL
is not set and therefore BITMAP mutex is not used.
- MDEV-132: main.mysqldump fails because it depends on exact size of stdio
buffers.
- MDEV-99: rpl.rpl_cant_read_event_incident fails due to a race where the
slave manages to connect while the test case is in the middle of setting up
the master, causing the slave to replicate extra/wrong events.
- MDEV-133: rpl.rpl_rotate_purge_deadlock fails because it issues a
DEBUG_SYNC SIGNAL immediately followed by RESET; this means that sometimes
the intended recipient has no time to see the signal before it is cleared
by the RESET, causing the wait to time out.
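To illustrate the MDEV-133 race (the signal name is a placeholder), the
problematic pattern in the test was essentially:

  -- Connection 1 signals and then immediately wipes all syncpoint state:
  SET DEBUG_SYNC = 'now SIGNAL go';
  SET DEBUG_SYNC = 'RESET';

  -- Connection 2 may not have started waiting yet; once the RESET has
  -- cleared the signal, this wait can only time out:
  SET DEBUG_SYNC = 'now WAIT_FOR go';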
Problem: Statements that write to tables with auto_increment columns
based on the selection from another table may lead to master
and slave going out of sync, as the order in which the rows
are retrieved from the table may differ on master and slave.
Solution: We mark writing to a table with an auto_increment column
as unsafe. This will cause the execution of such statements to
throw a warning and force the statement to be logged in ROW if
the logging format is mixed.
Changes:
1. All the statements that write to a table with auto_increment
column(s) based on the rows fetched from another table will now
be unsafe.
2. CREATE TABLE with SELECT will now be unsafe.
sql/share/errmsg-utf8.txt:
Added new warning messages.
sql/sql_base.cc:
Created a new function that checks for select + write on an autoinc table
and made all such statements unsafe.
sql/sql_parse.cc:
Made CREATE TABLE [with auto_increment column] + SELECT unsafe.
storage/innobase/handler/handler0alter.cc:
For NEWDATE, key_type says unsigned, thus col->prtype says unsigned,
but field->flags says signed. Use the same flag for value retrieval
that was used for value storage.
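A hedged guess at the kind of statement that reaches this code path (the
change is in the fast ALTER TABLE code, so the actual repro in the bug
report may differ):

  CREATE TABLE t (d DATE NOT NULL) ENGINE=InnoDB;
  INSERT INTO t VALUES ('2011-12-01');

  -- Building the new index reads the stored DATE (internal type NEWDATE)
  -- values back out of the clustered index; the signed/unsigned flag used
  -- for retrieval must match the one used when the value was stored.
  ALTER TABLE t ADD INDEX d_idx (d);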
IS EXECUTED TWICE FROM P
This bug is a duplicate of bug 12567331, which was pushed to the
optimizer backporting tree on 2011-06-11. This is just a back-port of
the fix. Both test cases are included as they differ somewhat.
The actual Bug#11754376 does not exist in MySQL 5.5 because at startup
we drop entries for temporary tables from InnoDB dictionary cache (only
if ROW_FORMAT is not REDUNDANT). But nevertheless the bug in
normalize_table_name_low() is present so we fix it.
GRACEFUL SHUTDOWN
During startup mysql picks up .frm files from the tmpdir directory and
tries to drop those tables in the storage engine.
The problem is that when tmpdir ends in / then ha_innobase::delete_table()
is passed a string like "/var/tmp//#sql123". It then wrongly normalizes it
to "/#sql123" and calls row_drop_table_for_mysql(), which of course fails
to delete the table entry from the InnoDB dictionary cache.
ha_innobase::delete_table() returns an error but nevertheless mysql wipes
away the .frm file and the entry in the InnoDB dictionary cache remains
orphaned with no easy way to remove it.
The "no easy" way to remove it is to create a similar temporary table again,
copy its .frm file to tmpdir under "#sql123.frm" and restart mysqld with
tmpdir=/var/tmp (no trailing slash) - this way mysql will pick up the .frm file
after restart and will try to issue drop table for "/var/tmp/#sql123"
(notice: no double slash), ha_innobase::delete_table() will normalize it to
"tmp/#sql123" and row_drop_table_for_mysql() will successfully remove the
table entry from the dictionary cache.
The solution is to fix normalize_table_name_low() to normalize things like
"/var/tmp//table" correctly to "tmp/table".
This patch also adds a test function which invokes
normalize_table_name_low() with various inputs to make sure it works
correctly and a mtr test that calls this test function.
Reviewed by: Marko (http://bur03.no.oracle.com/rb/r/929/)