This patch ports the work that facebook has performed
to make innochecksum handle compressed tables.
the basic idea is to use actual innodb-code to perform
checksum verification rather than duplicating in innochecksum.cc.
to make this work, innodb code has been annotated with
lots of #ifndef UNIV_INNOCHECKSUM so that it can be
compiled outside of storage/innobase.
A new testcase is also added that verifies that innochecksum
works on compressed/non-compressed tables.
Merged from commit fabc79d2ea976c4ff5b79bfe913e6bc03ef69d42
from https://code.google.com/p/google-mysql/
The actual steps to produce this patch are:
take innochecksum from 5.6.14
apply changes in innodb from facebook patches needed to make innochecksum compile
apply changes in innochecksum from facebook patches
add handcrafted testcase
The referenced facebook patches used are:
91e25120e7847fe76ea51135628a5a4dbf7c240c
mysql-test/suite/maria/insert_select.result:
Added test case
mysql-test/suite/maria/insert_select.test:
Added test case
mysys/thr_lock.c:
Ensure we don't allow concurrent_insert when a read_no_write lock is in use
- The code that tested if
WHERE expr=value AND expr=const
can be rewritten to:
WHERE const=value AND expr=const
was incomplete in case of STRING_RESULT.
- Moving the test into a new function, to reduce duplicate code.
FULLTEXT indexes do not permit index first lookups. By calling:
ha_index_first() with a garbage parameter, random data gets overwritten
that causes the table->field array to be corrupted. Subsequently, when
the field array is accessed, a segfault occurs.
By not allowing index statistics for FULLTEXT indexes, the problem is
resolved.
Fixing a wrong assymetric code in Arg_comparator::set_cmp_func().
It existed for a long time, but showed up in 10.0.14 after the fix
for "MDEV-6666 Malformed result for CONCAT(utf8_column, binary_string)".
MDEV-7254 Assigned expression is evaluated twice when updating column TIMESTAMP NOT NULL
The test type_timestamp failed depending on the build machine time zone.
Setting a fixed time zone for the test.
column TIMESTAMP NOT NULL
Analysis: Problem was that value->is_null() function is called
even when user had explicitly set the value for timestamp
field. Calling this function had the side effect that
expression was evaluated twice.
Fix: (by Sergei Golubchik) check instead value->null_value.
Structure of the table created by the test to archive mysql.slow_log
data didn't match the structure of mysql.slow_log. The failure only
appeared if the slow_log was not empty, which was very rare.
Updated the structure of the table.
The problem was a too low timeout for slave reconnect. It was set to 9 seconds
(10 retries with 1 second in-between). This is occasinally too short on some
Buildbot hosts, when the test crashes and restarts the master while the slave
IO thread is running.
Fix by increasing --master-retry-count for this test.
The test case injects a DBUG that will crash the server during replication,
then does a START SLAVE. We need to use --error 0,2006,2013 on the START
SLAVE, so that we will not fail the test if the server has time to crash
before the START SLAVE returns to the client.
Fixes a failure seen in Buildbot.
Test causes OS error printout and we need to supress this
error message on tests. Additionally, test could cause
different error codes on different OSs.
The bug occurs when a transaction does a retry after all transactions have
done mark_start_commit() in a batch of group commit from the master. In this
case, the retrying transaction can unmark_start_commit() after the following
batch has already started running and de-allocated the GCO. Then after retry,
the transaction will re-do mark_start_commit() on a de-allocated GCO, and also
wakeup of later GCOs can be lost.
This was seen "in the wild" by a user, even though it is not known exactly
what circumstances can lead to retry of one transaction after all transactions
in a group have reached the commit phase.
The lifetime around GCO was somewhat clunky anyway. With this patch, a GCO
lives until rpl_parallel_entry::last_committed_sub_id has reached the last
transaction in the GCO. This guarantees that the GCO will still be alive when
a transaction does mark_start_commit(). Also, we now loop over the list of
active GCOs for wakeup, to ensure we do not lose a wakeup even in the
problematic case.
Use include/sync_with_master_gtid.inc instead of --sync_with_master to avoid a
race in the test case.
In parallel replication, the old-style slave position (which is used by
--sync_with_master) is updated out-of-order between parallel threads. This
makes it possible for the position to be updated past DROP TEMPORARY TABLE t2
just before the commit of INSERT INTO t1 SELECT * FROM t2 becomes visible.
In this case, there is a small window where a SELECT just after
--sync_with_master may not see the changes from the INSERT.
Fix a possible race in the test case when restarting the server.
Make sure we have disconnected before waiting for the reconnect
that signals that the server is back up. Otherwise, we may in
rare cases continue the test while the old server is shutting
down, eventually leading to "connection lost" failure.
There was a race. The test case was expecting the slave to start processing a
particular DELETE statement, then the test would stop the slave at this
point. But there was missing something to wait until the slave would actually
reach this point; thus depending on timing it was possible that the slave
would be stopped too early, causing .result file difference.
Fixed by adding an appropriate wait to the test case.
Problem with test is that test causes OS failures that change.
Idea with test is just to test that server does not crash, no other
output is necessary.
Fix rare failures in test case rpl.rpl_gtid_basic:
- Add another possible error code when a connection is killed.
- Make sure that the IO thread has had time to complete its stop after START
SLAVE UNTIL. Otherwise, START SLAVE might run before IO thread stop,
leaving the test case with a stopped IO thread that eventually causes a
wait timeout.
There was a race, a small window between updating slave position and updating
Seconds_Behind_Master, during which the test case could see the wrong value.
Fix by waiting for the expected status to appear.
The replication relay log position was sometimes updated incorrectly at the
end of a transaction in parallel replication. This happened because the relay
log file name was taken from the current Relay_log_info (SQL driver thread),
not the correct value for the transaction in question.
The result was that if a transaction was applied while the SQL driver thread
was at least one relay log file ahead, _and_ the SQL thread was subsequently
stopped before applying any events from the most recent relay log file, then
the relay log position would be incorrect - wrong relay log file name. Thus,
when the slave was started again, usually a relay log read error would result,
or in rare cases, if the position happened to be readable, the slave might
even skip arbitrary amounts of events.
In GTID mode, the relay log position is reset when both slave threads are
restarted, so this bug would only be seen in non-GTID mode, or in GTID mode
when only the SQL thread, not the IO thread, was stopped.
Simply disallowing equality propagation into LIKE.
A more delicate fix is be possible, but it would need too many changes,
which is not desirable in 10.0 at this point.
MDEV-7106: Sporadic test failure in multi_source.gtid
MDEV-7153: Yet another sporadic failure of multi_source.gtid in buildbot
This patch fixes three races in the multi_source.gtid test case that could
cause sporadic failures:
1. Do not put SHOW ALL SLAVES STATUS in the output, the output is not stable.
2. Ensure that slave1 has replicated as far as expected, before stopping its
connection to master1 (otherwise the following wait will time out due to rows
not replicated from master1).
3. Ensure that slave2 has replicated far enough before connecting slave1 to it
(otherwise we get an error during connect that slave1 is ahead of slave2).
Problem is on test it tried to verify that no files were left
on test database.
Fix: There's no need to list other file types, it can only
list *.par files
I saw two test failures in rpl.rpl_gtid_crash where we get this in the error
log:
141123 12:47:54 [Note] InnoDB: Restoring possible half-written data pages
141123 12:47:54 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Warning: database page corruption or a failed
InnoDB: file read of space 6 page 3.
InnoDB: Trying to recover it from the doublewrite buffer.
141123 12:47:54 [Note] InnoDB: Recovered the page from the doublewrite buffer.
This test case deliberately crashes the server, and if this crash happens
right in the middle of writing a buffer pool page to disk, it is not
unexpected that we can get a half-written page. The page is recovered
correctly from the doublewrite buffer.
So this patch adds a suppression for this warning in the error log for this
test case.
When a master slave restarts, it logs a special restart format description
event in its binlog. When the slave sees this event, it knows it needs to roll
back any active partial transaction, in case the master crashed previously in
the middle of writing such transaction to its binlog.
However, there was a bug where this rollback did not reset rgi->pending_gtid.
This caused the @@gtid_slave_pos to be updated incorrectly with the GTID of
the partial transaction that was rolled back.
Fix this by always clearing rgi->pending_gtid in cleanup_context(), hopefully
preventing similar bugs from turning up in other special cases where a
transaction is rolled back during replication.
Thanks to Pavel Ivanov for tracking down the issue and providing a test case.
after Operating system error number 36 in a file operation.
Analysis: os_file_get_status did not handle error ENAMETOOLONG
correctly.
Fix: Add correct handling for error ENAMETOOLONG. Note that on InnoDB
case the error is not passed all the way up to server. That would
be bigger rewamp.