Problem:
========
When attempting to delay a Slave attached with GTID, there appears to be an
extra delay applied initially. For example, this output reflects a Slave that is
already delayed by 43200 seconds. When switching to GTID replication,
replication is paused until SQL_Remaining_Delay counts down to 0:
CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO
MASTER_DELAY=43200;
Seconds_Behind_Master: 44847
Using_Gtid: Current_Pos
SQL_Delay: 43200
SQL_Remaining_Delay: 43089
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master
executed event
Analysis:
=========
When slave initiates a GTID based connection request to master, the master sends
two GTID_LIST events. The first one is actual GTID_LIST event and the second
one is a fake GTID_LIST event. This is sent by master to provide its current
binlary log file position. The fake GTID_LIST events will have their ev->when=0.
'when' (the timestamp) is set to 0 so that slave could distinguish between real
and fake Rotate events.
On slave side when MASTER_DELAY is configured to "X" the applier will ensure
that there is a time delay of "X" seconds before the event is applied.
General behaviour of MASTER_DELAY example:-
Master
timestamp of event e1=10
timestamp of event e2=11
On slave MASTER_DELAY=5
Event e1 will be applied at = 15
e2 will be applied at =16
In bug scenario:-
On Master: With GTIDs
timestamp of event e1=10
timestamp of event e2=0
On Slave:
e1 will be applied at = 10 + 5 =15
For e2, since "e2->when=0" e2->when is set to current timestamp.
i.e since the e2->when and current timestamp on slave is the same applier waits
for additional master_delay=5 seconds. the ev->when contributes to
"rli->last_master_timestamp".
rli->last_master_timestamp= ev->when + (time_t) ev->exec_time;
Fake events should not update the "ev->when" to "current timestamp" on slave.
Fix:
===
Remove the assignment of current timestamp to "ev->when" when "ev->when=0".
select from I_S
Problem:
========
When applier thread tries to access 'variable_name' of
INFORMATION_SCHEMA.SESSION_VARIABLES table through triggers, it results in an
abnormal exit of slave server.
Analysis:
========
At the time of replication of stored routines and triggers, their associated
security context will be sent by the master. The applier thread on the slave
server will use this information to set the required security context for the
execution of stored routines and triggers. This is achieved as follows.
->The stored routine object has a member named 'm_security_ctx' which holds the
security context received from master.
->The applier thread's security_ctx is stored into a 'backup' object.
->Set the applier thread's security_ctx to 'm_security_ctx'.
->Upon the completion of stored routine execution restore the original security
context of applier thread from the backup.
During the above process the 'm_security_ctx' object is not initialized
properly. Hence the 'external_user' of 'm_security_ctx' has invalid value for
this variable and accessing this variable results in abnormal exit of server.
Fix:
===
Invoke the Security_context::init() call from the constructor of stored routine
so that 'm_security_ctx' gets initialized properly.
32 bit int
Row-based slave applier could not parse correctly the table id when
the value exceeded the max of 32 bit unsigned int.
The reason turns out in that the being parsed value placeholder
was sized as 4 bytes.
The type is fixed to ulonglong.
Additionally the patch works around Rows_log_event::m_table_id 4 bytes
size on 32 bits platforms. In case of last_table_id value overflows
the 4 byte max, there won't be the zero value for m_table_id generated
and the first wrapped-around value is one, this is thanks to excluding
UINT_MAX32 + 1 from TABLE_SHARE::table_map_id.
The error message modified.
Then the TABLE_SHARE::error_table_name() implementation taken from 10.3,
to be used as a name of the table in this message.
There was a failure in rpl_delayed_slave after recent MDEV-14528 commit.
The parallel applier should not set its
Relay_log::last_master_timestamp from Format-descriptor log event.
The latter may reflect a deep past so Seconds-behind-master will be
computed through it and displayed all time while the first possibly
"slow" group of events is executed.
The main MDEV-14528 is refined, rpl_delayed_slave now passes also
in the parallel mode.
When replicated events are from Master unaware of MariaDB GTID their handling by
the Parallel slave misses Seconds_Behind_Master updating.
In the bug condition the Show-Slave-Status' field remains unchanged.
Because in such case event execution is sequential the bug
is fixed with deploying the same logics as in the explicit single-threaded mode
with is to set
Relay_log_event::last_master_timestamp
member early at the end of event reading from the relay log.
set both `password` and `authentication_string` columns in `mysql`.`user`
table for now.
Suppress the "password was ignored" warning if the password is
the same as the authentication string
consistently) on Replication Slave
lower_case_table_names 0 -> 1 replication works, it's safe as long as
mixed case names mapping to the lower case ones is one-to-one
If a mtr test runs multiple servers and only some of them get
restarted on whatever reason with new command-line parameters,
then subsequent mtr test may fail, because no cleanup is performed.
Replication and Galera test suites are affected.
In the mtr script, there is a server_need_restart function
that decides whether we need to start a new mysqld process before
the new (next) test. If the mysqld parameters were changed in the
previous test - not necessarily the parameters of the primary mysqld
server, maybe even the secondary server parameters - this function
decides to start a new mysqld process. But since it does not remove
the old (changed) parameters, the new process starts with the
parameters changed by the *previous* test.
To correct this error, we must delete the modified process
parameters after checking that they have been changed during
the previous test.
This patch also simplifies and makes more stable the
galera_drop_database test, during debugging of which this
problem was detected.
https://jira.mariadb.org/browse/MDEV-17421
This would happen especially in optimistic parallel replication, where there
is a good chance that a transaction will be rolled back (due to conflicts)
after it has executed record_gtid(). If the transaction did any deletions of
old rows as part of record_gtid(), those deletions will be undone as well.
And the code did not properly ensure that the deletions would be re-tried.
This patch makes record_gtid() remember the list of deletions done as part
of a transaction. Then in rpl_slave_state::update() when the changes have
been committed, we discard the list. However, in case of error and rollback,
in cleanup_context() we will instead put the list back into
rpl_global_gtid_slave_state so that the deletions will be re-tried later.
Probably fixes part of the cause of MDEV-12147 as well.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
with spatial index
So the issue is since it is spatial index , at the time of searching index
for key (Rows_log_event::find_row) we use wrong field image we use
Field::itRAW while we should be using Field::itMBR
Actually if we use "set password for " command this changes the checksum
of mysql.user table
-localhost root Y Y Y Y Y Y Y Y YY Y Y Y Y Y Y Y Y Y Y Y $
Y Y Y Y Y Y Y 0 00 0 N N 0.000000
+localhost root Y Y Y Y Y Y Y Y YY Y Y Y Y Y Y Y Y Y Y Y Y
Y Y Y Y Y Y Y 0 00 0 mysql_native_password N N 0.000000
In short we replace '' with mysql_native_password
which make checksum to be different, and hence check test case fails.
So we use UPDATE mysql.user command.
as a separate source for data
Actually MDEV-15867 and MDEV-16192 are same, Slave adds "or replace" to create
table stmt. So create table t1 is create or replace on slave. So this bug
is not because of replication, We can get this bug on general server if we
manually add or replace to create query.
Problem:- So if we try to create table t1 (same name as of temp table t1 ) via
CREATE or replace TABLE t AS SELECT * FROM t;
Since in this query we are creating table from select * from t1 , we call
unique_table function to see whether if source and destination table are same.
But there is one issue unique_table does not account if source table is tmp table
in this case source and destination table can be same.
Solution:- We will change find_dup_table to not to look for temp table if
CHECK_DUP_SKIP_TEMP_TABLE flag is on.
Don't let SET PASSWORD to set the password, if auth_string is set.
Now SET PASSWORD always sets the plugin/auth_string fields and clears
the password field (on pre-plugin mysql.user table it works as before).
specific temporary errors
The optimistic parallel slave's worker thread could face a run-time error due to
the algorithm's specifics which allows for conflicts like the reported
"Can't find record in 'table'".
A typical stack is like
{noformat}
#0 handler::print_error (this=0x61c00008f8a0, error=149, errflag=0) at handler.cc:3650
#1 0x0000555555e95361 in write_record (thd=thd@entry=0x62a0000a2208, table=table@entry=0x61f00008ce88, info=info@entry=0x7fffdee356d0) at sql_insert.cc:1944
#2 0x0000555555ea7767 in mysql_insert (thd=thd@entry=0x62a0000a2208, table_list=0x61b00012ada0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at sql_insert.cc:1039
#3 0x0000555555efda90 in mysql_execute_command (thd=thd@entry=0x62a0000a2208) at sql_parse.cc:3927
#4 0x0000555555f0cc50 in mysql_parse (thd=0x62a0000a2208, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at sql_parse.cc:7449
#5 0x00005555566d4444 in Query_log_event::do_apply_event (this=0x61200005b9c8, rgi=<optimized out>, query_arg=<optimized out>, q_len_arg=<optimized out>) at log_event.cc:4508
#6 0x00005555566d639e in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at log_event.cc:4185
#7 0x0000555555d738cf in Log_event::apply_event (rgi=0x61d0001ea080, this=0x61200005b9c8) at log_event.h:1343
#8 apply_event_and_update_pos_apply (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080, reason=<optimized out>) at slave.cc:3479
#9 0x0000555555d8596b in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080) at slave.cc:3623
#10 0x00005555562aca83 in rpt_handle_event (qev=qev@entry=0x6190000fa088, rpt=rpt@entry=0x62200002bd68) at rpl_parallel.cc:50
#11 0x00005555562bd04e in handle_rpl_parallel_thread (arg=arg@entry=0x62200002bd68) at rpl_parallel.cc:1258
{noformat}
Here {{handler::print_error}} computes whether to error log the
current error when --log-warnings > 1. The decision flag is consulted
bu {{my_message_sql()}} which can be eventually called.
In the bug case the decision is to log.
However in the optimistic mode slave applier case any conflict is
attempted to resolve with rollback and retry to success. Hence the
logging is at least extraneous.
The case is fixed with adding a new flag {{ME_LOG_AS_WARN}} which
{{handler::print_error}} may propagate further on through {{my_error}}
when the error comes from an optimistically running slave worker thread.
The new flag effectively requests the warning level for the errlog record,
while the thread's DA records the actual error (which is regarded as temporary one
by the parallel slave error handler).