This is not a fix for the bug. It only adds debug info so
that we can analyze the bug better the next time it happens.
Please revert the patch after the bug is fixed.
The reason for the failure is that the IO thread passes through a sequence of state
changes before it eventually gets stuck with its running state at No.
It is unreasonable to wait for the running status when the whole idea of the test is
to reach the IO thread error.
Fixed by changing the waiting condition.
Problem: master binlog has 'create table t1'. The master binlog
was removed before the slave could replicate it. In the test's cleanup
code, the master did 'drop table t1', which caused the slave SQL
thread to stop with an error since it did not know about t1.
Fix: t1 is just an auxiliary construction, only needed on the
master. Hence, we turn off binlogging before t1 is created,
drop t1 as soon as we no longer need it, and then turn
binlogging on again.
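A minimal sketch of the intended pattern (the column definition here is only a
placeholder for whatever the test actually needs):

    # On the master: keep the auxiliary table out of the binlog entirely.
    SET sql_log_bin = 0;
    CREATE TABLE t1 (a INT);
    # ... use t1 as scaffolding on the master only ...
    DROP TABLE t1;
    SET sql_log_bin = 1;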
Many dump threads can exist due to the way the new version of mtr governs suites.
For this immediate problem the test is refined not to use I_S but rather to reconnect
explicitly, while preserving the logic of the verification of the old target bug fix.
Problem: the test set @@global.init_slave to garbage at a time
which was not guaranteed to be after the time when the slave's
SQL thread used it. That would cause the slave's SQL thread to
stop in rare cases.
Fix: The test does not care about the value of
@@global.init_slave, except that it should be different on
master and slave. Hence, we set @@global.init_slave to
something that is valid SQL.
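Schematically, the assignment becomes something like the following (the value used
here is only an example of valid SQL that differs from the master's setting):

    connection slave;
    SET GLOBAL init_slave = 'SELECT 1';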
Post-post-push fix. The result file for rpl_rbr_to_sbr needs to be
updated. While I was there, made rpl_rbr_to_sbr clean up after itself
by reverting @@binlog_format to the value it had before the test
started.
Problem was that ha_partition had the HA_FILE_BASED flag set
(since it uses a .par file), but after open it uses the first partition's
flags, which results in different case handling for create and for
open.
Solution was to change the underlying partition name so that it was consistent.
(Only happens when lower_case_table_names = 2, i.e. on Mac OS X, and with
storage engines without HA_FILE_BASED, like InnoDB and Memory.)
(Recommit after renaming check_lowercase_names to
get_canonical_filename and moving it from handler.h to mysql_priv.h)
NOTE: if a mixed-case name for a partitioned table was created when
lower_case_table_names = 2, it should be renamed or dropped before using
the updated version (see bug#37402 for more info).
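For example, assuming a table that was created with the hypothetical mixed-case
name MixedParts, it could be renamed through an intermediate name in one statement
before the upgrade:

    RENAME TABLE MixedParts TO tmp_parts, tmp_parts TO mixedparts;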
Problem 1: tests often fail in pushbuild with a timeout when waiting
for the slave to start, stop, or receive an error.
Fix 1: Updated the wait_for_slave_* macros in the following way:
- The timeout is increased by a factor of ten
- Refactored the macros so that wait_for_slave_param does the work for
the other macros.
Problem 2: Tests are often incorrectly written, lacking a
source include/wait_for_slave_to_[start|stop].inc.
Fix 2: Improved the chance to get it right by adding
include/start_slave.inc and include/stop_slave.inc, and updated tests
to use these.
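Typical usage in a test then looks like this sketch:

    connection slave;
    source include/stop_slave.inc;
    # ... reconfigure or restart pieces of replication ...
    source include/start_slave.inc;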
Problem 3: The built-in test language command
wait_for_slave_to_stop is a misnomer (it does not wait for the slave IO
thread) and does not give as much debug info in case of failure as
the otherwise equivalent macro
source include/wait_for_slave_sql_to_stop.inc
Fix 3: Replaced all calls to the built-in command by a call to the
macro.
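That is, schematically, a line such as

    wait_for_slave_to_stop;

becomes

    source include/wait_for_slave_sql_to_stop.inc;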
Problem 4: Some, but not all, of the wait_for_slave_* macros had an
implicit connection slave. This made some tests confusing to read,
and made it more difficult to use the macro in circular replication
scenarios, where the connection named master needs to wait.
Fix 4: Removed the implicit connection slave from all
wait_for_slave_* macros, and updated tests to use an explicit
connection slave where necessary.
Problem 5: The macros wait_slave_status.inc and wait_show_pattern.inc
were unused. Moreover, using them is difficult and error-prone.
Fix 5: remove these macros.
Problem 6: log_bin_trust_function_creators_basic failed when running
tests because it assumed @@global.log_bin_trust_function_creators=1,
and some tests modified this variable without resetting it to its
original value.
Fix 6: All tests that use this variable have been updated so that
they reset the value at end of test.
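The save/restore pattern used is, as a sketch (the name of the user variable
holding the saved value is arbitrary):

    SET @save_ltfc = @@global.log_bin_trust_function_creators;
    # ... test body that changes the variable ...
    SET GLOBAL log_bin_trust_function_creators = @save_ltfc;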
The problem is that relying on the output of the 'ls' command is not
portable, as its behavior is not the same between systems and it might
not even be available at all (e.g., on Windows).
So I added list_files, which relies on the portable mysys library instead.
(and also list_files_write_file and list_files_append_file,
since the test was using '--exec ls' in that way.)
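For example, a listing that used to be done with '--exec ls' can be written as
follows (the directory and pattern here are only illustrative):

    --list_files $MYSQLTEST_VARDIR/tmp *.txt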
Problem was that ha_partition::records was not implemented, thus
using the default handler::records, which is not correct if the engine
does not support HA_STATS_RECORDS_IS_EXACT.
Solution was to implement ha_partition::records as a wrapper around
the underlying partitions' records.
The rows column in EXPLAIN PARTITIONS will now include the total
number of records in the partitioned table.
(Recommit after removing commented-out code)
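A sketch of the user-visible effect (table and data made up for illustration):

    CREATE TABLE tp (a INT) PARTITION BY HASH (a) PARTITIONS 4;
    INSERT INTO tp VALUES (1),(2),(3),(4),(5),(6),(7),(8);
    # The rows column now reflects the total record count over all partitions.
    EXPLAIN PARTITIONS SELECT * FROM tp;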
Problem was that auto_repair, is_crashed and check_and_repair were not
implemented in ha_partition.
Solution: implemented is_crashed and check_and_repair as a loop over all
partitions, and auto_repair using the first partition.
(Recommit after fixing review comments)
Problem: the test syncs the slave with a 'wait_condition', waiting until
table t1 has 5000 rows. However, there is no guarantee that t1
makes it to the slave before the wait_condition runs.
Fix: sync_slave_with_master just after t1 was created.
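Schematically (the column list for t1 is only a placeholder):

    connection master;
    CREATE TABLE t1 (a INT);
    # Guarantee that t1 exists on the slave before any later wait_condition.
    sync_slave_with_master;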
Problem: rpl_ndb_transaction fails because it assumes nothing
is written to the binlog at a certain point. However, ndb may
binlog updates in ndb system tables at a nondeterministic
time point after an ndb table update has been committed.
Fix: break the test into two. rpl_ndb_transaction still does
the ndb updates needed by the first half of the test. The new
test case rpl_bug26395 includes the part that assumes nothing
more will be written to the binlog.
Problem 1: main.loaddata tried to trigger an error caused by
reading files outside the vardir, by reading itself. However,
if loaddata.test is not world-readable (e.g., umask=0077),
then another error is triggered.
Fix 1: allow the other error too.
Problem 2: rpl_slave_skip and rpl_innodb_mixed_dml tried to
copy a file from mysql-test/suite/rpl/data to mysql-test/var
and then read it. That failed too if umask=0077, since the
file would not become world-readable.
Fix 2: move the files from mysql-test/suite/rpl/data to
mysql-test/std_data and update tests accordingly. Remove
the directory mysql-test/suite/rpl/data.
This bug has been fixed in two slightly different ways in
6.0-rpl and {5.1,6.0}-bugteam. To avoid future merge
problems, I'm now copying the 6.0-rpl fix to 5.1-bugteam.
The previous fix for the bug was incomplete. The test failed
because t2 did not exist on the slave (since the slave was
lagging) when the
wait_condition was executed. Fixed by inserting
sync_slave_with_master just after t2 was created.
Test was failing due to the addition of a '\x05' character in result sets.
Latest builds of the server have shown this problem to have disappeared.
Removing the code within the test that disables the test on Mac OS X.
Recommit due to a tree error on an earlier, approved patch.
Bug#36787 Test funcs_1.charset_collation_1 failing
Details:
1. Skip charset_collation_1 if the charset "ucs2_bin" is
   missing (a property which distinguishes "vanilla" builds
   from the others)
2. Let builds with version_comment LIKE "%Advanced%"
   (found for 5.1) execute charset_collation_3.
3. Update the comments in charset_collation.inc so that they
   reflect the current findings.
In order to handle CHAR() fields, 8 bits were reserved for
the size of the CHAR field. However, instead of denoting the
number of characters in the field, field_length was used which
denotes the number of bytes in the field.
Since UTF-8 fields can have three bytes per character (and
has been extended to have four bytes per character in 6.0),
an extra two bits have been encoded in the field metadata
word for fields of type Field_string (i.e., CHAR fields).
Since the metadata word is filled, the extra bits have been
encoded in the upper 4 bits of the real type (the most
significant byte of the metadata word) by computing the
bitwise xor of the extra two bits. Since the upper 4 bits
of the real type are always 1111 for Field_string, this
means that for fields of length <256, the encoding is
identical to the encoding used in pre-5.1.26 servers, but
for lengths of 256 or more, an unrecognized type is formed,
causing an old slave (that does not handle lengths of 256
or more) to stop.
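As a rough numeric illustration of the scheme described above (the exact bit
positions are assumed here, not quoted from the server code): for a CHAR field
occupying 300 bytes, the low 8 bits of the length go into the second metadata
byte, and the two remaining high bits are xored into the upper nibble of the
type byte (MYSQL_TYPE_STRING = 254):

    SELECT 254 ^ ((300 & 0x300) >> 4) AS type_byte,   -- 238, no longer 254
           300 & 0xFF                 AS length_low;  -- 44

An old slave expecting 254 for a CHAR field sees 238 instead, which it does not
recognize, and stops.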
Problem: rpl_switch_stm_row_mixed did not wait until the row events generated by
INSERT DELAYED were written to the master binlog before it synchronized the slave
with the master. This caused sporadic errors where these rows were missing on the
slave.
Fix: wait until all rows appear on the slave.
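As a sketch, with a placeholder table name and row count:

    connection slave;
    let $wait_condition= SELECT COUNT(*) = 100 FROM t1;
    source include/wait_condition.inc;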
This is a backport, applying to 5.1-bugteam the same change that was
previously applied to 6.0-rpl.
The crash appeared to be the result of allocating an instance of Discrete_interval
with automatic storage duration and then referring to it outside its declaration scope.
Fixed by correcting the backup-and-restore scheme for auto_inc_intervals_forced,
introduced by bug#33029, to use shallow copying; also added simulation code that
forces execution of those parts of the former bug's fix that target a master and
slave running incompatible, bug#33029-prone versions.
On a slow environment like valgrind the test is vulnerable
because it does not check whether the slave has stopped by the time
the new session issues `start slave;' -- disabling the
test till it is fixed.