The reason for the crash was rotate_relay_log (mi=0x0) did not verify
the passed value of active_mi. There are more cases where active_mi
is supposed to be non-zero e.g change_master(), stop_slave(), and it's
reasonable to protect from a similar crash all of them with common
fixes.
Fixed with spliting end_slave() in slave threads release and slave
data clean-up parts (a new close_active_mi()). The new function is
invoked at the very end of close_connections() so that all users of
active_mi are proven to have left.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
with gcc 4.3.2
Compiling MySQL with gcc 4.3.2 and later produces a number of
warnings, many of which are new with the recent compiler
versions.
This bug will be resolved in more than one patch to limit the
size of changesets. This is the first patch, fixing a number
of the warnings, predominantly "suggest using parentheses
around && in ||", and empty for and while bodies.
The server was not cleaning the last IO error and error number when
resetting slave.
This patch addresses this issue by backporting into 5.1 part of the
patch in BUG 34654. A fix for this issue had already been pushed into
6.0 as part of the aforementioned bug, however the patch also included
some refactoring. The fix for 5.1 does not take into account the
refactoring part.
stop/start slave
When stopping and restarting the slave while it is replicating
temporary tables, the server would crash or raise an assertion
failure. This was due to the fact that although temporary tables are
saved between slave threads restart, the reference to the thread in
use (table->in_use) was not being properly updated when the restart
happened (it would still reference the old/invalid thread instead of
the new one).
This patch addresses this issue by resetting the reference to the new
slave thread on slave thread restart.
165 changesets with 23 conflicts:
Text conflict in mysql-test/r/lock_multi.result
Text conflict in mysql-test/t/lock_multi.test
Text conflict in mysql-test/t/mysqldump.test
Text conflict in sql/item_strfunc.cc
Text conflict in sql/log.cc
Text conflict in sql/log_event.cc
Text conflict in sql/parse_file.cc
Text conflict in sql/slave.cc
Text conflict in sql/sp.cc
Text conflict in sql/sp_head.cc
Text conflict in sql/sql_acl.cc
Text conflict in sql/sql_base.cc
Text conflict in sql/sql_class.cc
Text conflict in sql/sql_crypt.cc
Text conflict in sql/sql_db.cc
Text conflict in sql/sql_lex.cc
Text conflict in sql/sql_parse.cc
Text conflict in sql/sql_select.cc
Text conflict in sql/sql_table.cc
Text conflict in sql/sql_view.cc
Text conflict in storage/innobase/handler/ha_innodb.cc
Text conflict in storage/myisam/mi_packrec.c
Text conflict in tests/mysql_client_test.c
Updates to Innobase, taken from main 5.1:
bzr: ERROR: Some change isn't sane:
File mysql-test/r/innodb-semi-consistent.result is owned by Innobase and should not be updated.
File mysql-test/t/innodb-semi-consistent.test is owned by Innobase and should not be updated.
File storage/innobase/handler/ha_innodb.cc is owned by Innobase and should not be updated.
File storage/innobase/ibuf/ibuf0ibuf.c is owned by Innobase and should not be updated.
File storage/innobase/include/row0mysql.h is owned by Innobase and should not be updated.
File storage/innobase/include/srv0srv.h is owned by Innobase and should not be updated.
File storage/innobase/include/trx0trx.h is owned by Innobase and should not be updated.
File storage/innobase/include/trx0trx.ic is owned by Innobase and should not be updated.
File storage/innobase/lock/lock0lock.c is owned by Innobase and should not be updated.
File storage/innobase/page/page0cur.c is owned by Innobase and should not be updated.
File storage/innobase/row/row0mysql.c is owned by Innobase and should not be updated.
File storage/innobase/row/row0sel.c is owned by Innobase and should not be updated.
File storage/innobase/srv/srv0srv.c is owned by Innobase and should not be updated.
File storage/innobase/trx/trx0trx.c is owned by Innobase and should not be updated.
(Set env var 'ALLOW_UPDATE_INNOBASE_OWNED' to override.)
The issue of the current bug is unguarded access to mi->slave_running
by the shutdown thread calling end_slave() that is bug#29968
(alas happened not to be cross-linked with the current bug)
Fixed:
with removing the unguarded read of the running status
and perform reading it in terminate_slave_thread()
at time run_lock is taken (mostly bug#29968 backporting, still with some
improvements over that patch - see the error reporting from
terminate_slave_thread()).
Issue of bug#38716 is fixed here for 5.0 branch as well.
Note:
There has been a separate artifact identified -
a race condition between init_slave() and end_slave() -
reported as Bug#44467.
In order to define the --slave-load-tmpdir, the init_relay_log_file()
was calling fn_format(MY_PACK_FILENAME) which internally was indirectly
calling strmov_overlapp() (through pack_dirname) and the following
warning message was being printed out while running in Valgrind:
"source and destination overlap in strcpy".
We fixed the issue by removing the flag MY_PACK_FILENAME as it was not
necessary. In a nutshell, with this flag the function fn_format() tried
to replace a directory by either "~", "." or "..". However, we wanted
exactly to remove such strings.
In this patch, we also refactored the functions init_relay_log_file()
and check_temp_dir(). The former was refactored to call the fn_format()
with the flag MY_SAFE_PATH along with the MY_RETURN_REAL_PATH, in order
to avoid issues with long directories and return an absolute path,
respectively. The flag MY_SAFE_UNPACK_FILENAME was removed too as it was
responsible for removing "~", "." or ".." only from the file parameter
and we wanted to remove such strings from the directory parameter in
the fn_format(). This result is stored in an rli variable, which is then
processed by the other function in order to verify if the directory exists
and if we are able to create files in it.
We didn't expect "error: no error", although this is
in fact a legitimate state (if something is erroneous
on the master, but not on the slave, e.g. INSERT fails
on master due to UNIQUE constraint which does not exist
on slave).
We now list the ignore for "0: no error" the same way
as any other ignore; moreover, if no or an empty
--slave-skip-errors is passed at start-up, we show
"OFF" instead of empty list, as intended. (The code
for that was there, but was only run for the empty-
argument case, even if it subsequently tested for
both conditions.)
RBR was not considering the option --slave-skip-errors.
To fix the problem, we are reporting the ignored ERROR(s) as warnings thus avoiding
stopping the SQL Thread. Besides, it fixes the output of "SHOW VARIABLES LIKE
'slave_skip_errors'" which was showing nothing when the value "all" was assigned
to --slave-skip-errors.
@sql/log_event.cc
skipped rbr errors when the option skip-slave-errors is set.
@sql/slave.cc
fixed the output of for SHOW VARIABLES LIKE 'slave_skip_errors'"
@test-cases
fixed the output of rpl.rpl_idempotency
updated the test case rpl_skip_error
Bug#319 if while a non-transactional slave is replicating a transaction possible problem
It is impossible to roll back a mixed engines transaction when one of the engine is
non-transaction. In replication that fact is crucial because the slave can not safely
re-apply a transction that was interrupted with STOP SLAVE.
Fixed with making STOP SLAVE not be effective immediately in the case the current
group of replication events has modified a non-transaction table. In order for slave to leave
either the group needs finishing or the user issues KILL QUERY|CONNECTION slave_thread_id.
Compiling with debug and assigning an invalid directory to --slave-load-tmpdir
was crashing the slave due to the following assertion DBUG_ASSERT(! is_set() ||
can_overwrite_status). This assertion assumes that a thread can change its
state once (i.e. ok,error, etc) before aborting, cleaning/resuming or completing
its execution unless the overwrite flag (i.e. can_overwrite_status) is true.
The Append_block_log_event::do_apply_event which is responsible for creating
temporary file(s) was not cleaning the thread state. Thus a failure while
trying to create a file in an invalid temporary directory was causing the crash.
To fix the problem we check if the temporary directory is valid before starting
the SQL Thread and reset the thread state before creating a file in
Append_block_log_event::do_apply_event.
Some errors that cause the slave SQL thread to stop are not shown in the
Slave_SQL_Error column of "SHOW SLAVE STATUS". Instead, the error is only
in the server's error log.
That makes it difficult to analyze the error for the user. One example of an error
that stops the slave but is not shown by "SHOW SLAVE STATUS" is when @@global.init_slave
is set incorrectly (e.g., it contains something that is not valid SQL).
Three failures were not correctly reported:
1 - Failures during slave thread initialization
2 - Failures while initializing the relay log position right after
starting the slave thread.
3 - Failures while processing queries passed through the init_slave
option.
This patch fixes the issues by reporting the errors through relay-info->report.
- Remove bothersome warning messages. This change focuses on the warnings
that are covered by the ignore file: support-files/compiler_warnings.supp.
- Strings are guaranteed to be max uint in length
Several system variables did not behave like system variables should do.
When trying to SET them or use them in SELECT, they were reported as
"unknown system variable". But they appeared in SHOW VARIABLES.
This has been fixed by removing the "fixed_vars" array of variables
and integrating the variables into the normal system variables chain.
All of these variables do now behave as read-only global-only
variables. Trying to SET them tells they are read-only, trying to
SELECT the session value tells they are global only. Selecting the
global value works. It delivers the same value as SHOW VARIABLES.
In order to handle CHAR() fields, 8 bits were reserved for
the size of the CHAR field. However, instead of denoting the
number of characters in the field, field_length was used which
denotes the number of bytes in the field.
Since UTF-8 fields can have three bytes per character (and
has been extended to have four bytes per character in 6.0),
an extra two bits have been encoded in the field metadata
work for fields of type Field_string (i.e., CHAR fields).
Since the metadata word is filled, the extra bits have been
encoded in the upper 4 bits of the real type (the most
significant byte of the metadata word) by computing the
bitwise xor of the extra two bits. Since the upper 4 bits
of the real type always is 1111 for Field_string, this
means that for fields of length <256, the encoding is
identical to the encoding used in pre-5.1.26 servers, but
for lengths of 256 or more, an unrecognized type is formed,
causing an old slave (that does not handle lengths of 256
or more) to stop.
The crash appeared to be a result of allocating an instance of Discrete_interval
automatically that that was referred in out-of-declaration scope.
Fixed with correcting backing up and restoring scheme of
auto_inc_intervals_forced, introduced by bug#33029, by means of shallow copying;
added simulation code that forces executing those fixes of the former bug that
targeted at master-and-slave having incompatible bug#33029-prone versions.
using a trig in SP
For all 5.0 and up to 5.1.12 exclusive, when a stored routine or
trigger caused an INSERT into an AUTO_INCREMENT column, the
generated AUTO_INCREMENT value should not be written into the
binary log, which means if a statement does not generate
AUTO_INCREMENT value itself, there will be no Intvar event (SET
INSERT_ID) associated with it even if one of the stored routine
or trigger caused generation of such a value. And meanwhile, when
executing a stored routine or trigger, it would ignore the
INSERT_ID value even if there is a INSERT_ID value available set
by a SET INSERT_ID statement.
Starting from MySQL 5.1.12, the generated AUTO_INCREMENT value is
written into the binary log, and the value will be used if
available when executing the stored routine or trigger.
Prior fix of this bug in MySQL 5.0 and prior MySQL 5.1.12
(referenced as the buggy versions in the text below), when a
statement that generates AUTO_INCREMENT value by the top
statement was executed in the body of a SP, all statements in the
SP after this statement would be treated as if they had generated
AUTO_INCREMENT by the top statement. When a statement that did
not generate AUTO_INCREMENT value by the top statement but by a
function/trigger called by it, an erroneous Intvar event would be
associated with the statement, this erroneous INSERT_ID value
wouldn't cause problem when replicating between masters and
slaves of 5.0.x or prior 5.1.12, because the erroneous INSERT_ID
value was not used when executing functions/triggers. But when
replicating from buggy versions to 5.1.12 or newer, which will
use the INSERT_ID value in functions/triggers, the erroneous
value will be used, which would cause duplicate entry error and
cause the slave to stop.
The patch for 5.1 fixed it to ignore the SET INSERT_ID value when
executing functions/triggers if it is replicating from a master
of buggy versions, another patch for 5.0 fixed it not to generate
the erroneous Intvar event.
Problem: if the IO slave thread is attempting to connect,
STOP SLAVE waits for the attempt to finish.
It may take a long time.
Fix: don't wait, stop the slave immediately.
MASTER_POS_WAIT return values are different than expected when the server is not a slave.
It returns -1 instead of NULL.
Fixed with correcting st_relay_log_info::wait_for_pos() to return the proper
value in the case of rli info is not inited.