The problem was that in SST the binlog index (log_bin_index) name and directory
were not handled and passed to the rsync SST script.
wsrep_sst_common.sh
Read the binlog index dirname and filename if the --binlog-index
parameter is provided. Read the binlog filenames from that file
on the donor and write the transferred binlog filenames to that
file on the joiner.
mysqld.cc, mysqld.h
Moved opt_binlog_index_name from static to global and added
an extern declaration for it.
wsrep_sst.cc
generate_binlog_index_opt_val
New function to generate the binlog index name if opt_binlog_index_name is
given in the configuration.
sst_prepare_other
Add binlog index configuration to SST command.
wsrep_sst.h
Add new SST parameter --binlog-index
Add test case.
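For illustration only, a minimal hypothetical C++ sketch (not the actual wsrep_sst.cc code) of how a --binlog-index argument could be composed from opt_binlog_index_name and appended to the SST command line:
{noformat}
#include <string>

// Hypothetical helper, loosely following the role described for
// generate_binlog_index_opt_val(): turn the configured binlog index
// name into a "--binlog-index" argument, or into nothing when the
// option was not set, so the SST script falls back to its default.
static std::string binlog_index_opt(const char *opt_binlog_index_name)
{
  if (!opt_binlog_index_name || !*opt_binlog_index_name)
    return "";
  return std::string(" --binlog-index '") + opt_binlog_index_name + "'";
}

// Usage when composing the SST command (donor or joiner side):
//   sst_cmd += binlog_index_opt(opt_binlog_index_name);
{noformat}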
If we have a 2+ node cluster which is replicating from an async master
with binlog_format set to STATEMENT, and multi-row inserts are executed
on a table with an auto_increment column such that the values are automatically
generated by MySQL, then the node generates wrong auto_increment
values, different from those generated on the async master.
The causes and fixes:
1. We need to improve how the auto-increment values are updated
after the cluster size changes.
2. If wsrep_auto_increment_control is switched on during operation of
the node, then we should immediately update the auto_increment_increment
and auto_increment_offset global variables, without waiting for the next
invocation of the wsrep_view_handler_cb() callback. In the current version
these variables retain their initial values if wsrep_auto_increment_control
is switched on during operation of the node, which leads to inconsistent
results on different nodes in some scenarios.
3. If wsrep_auto_increment_control is switched off during operation of the node,
then we must restore the original values of the auto_increment_increment and
auto_increment_offset global variables, as set by the user. To make this
possible, we need to add "shadow copies" of these variables (which store
the latest values set by the user).
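A simplified sketch of the intended semantics, assumed from the description above (not the actual wsrep code): with auto-increment control on, the globals are derived from cluster size and node position; with it off, the user-set "shadow" values are restored.
{noformat}
struct AutoIncSettings
{
  unsigned long increment;
  unsigned long offset;
};

// "Shadow copies": the latest values explicitly set by the user.
static AutoIncSettings user_set= {1, 1};

static void update_auto_increment(bool control_on,
                                  unsigned long cluster_size,
                                  unsigned long node_index,
                                  AutoIncSettings *global)
{
  if (control_on)
  {
    // One auto-increment "slot" per node, offset by the node's position,
    // so different nodes never generate colliding values.
    global->increment= cluster_size;
    global->offset   = node_index + 1;
  }
  else
  {
    // Control switched off: restore exactly what the user had configured.
    *global= user_set;
  }
}
{noformat}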
Only close stdin if it was open initially. Otherwise we may close a file
descriptor that has been reused for a different purpose (specifically for the binlog
index file in the case of this bug).
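A minimal sketch of the idea (simplified, not the server code): remember whether fd 0 was open to begin with, and only close it in that case, so a reused descriptor number is never closed by mistake.
{noformat}
#include <unistd.h>
#include <fcntl.h>

static bool stdin_was_open= false;

static void remember_stdin_state()
{
  // F_GETFD fails with EBADF if fd 0 is not open at this point.
  stdin_was_open= (fcntl(STDIN_FILENO, F_GETFD) != -1);
}

static void maybe_close_stdin()
{
  // Close fd 0 only if it was open initially; otherwise the number may
  // by now belong to a different file (e.g. the binlog index file).
  if (stdin_was_open)
    close(STDIN_FILENO);
}
{noformat}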
Specific temporary errors of the optimistic parallel slave:
The optimistic parallel slave's worker thread could face a run-time error because
the algorithm, by design, allows for conflicts such as the reported
"Can't find record in 'table'".
A typical stack is like:
{noformat}
#0 handler::print_error (this=0x61c00008f8a0, error=149, errflag=0) at handler.cc:3650
#1 0x0000555555e95361 in write_record (thd=thd@entry=0x62a0000a2208, table=table@entry=0x61f00008ce88, info=info@entry=0x7fffdee356d0) at sql_insert.cc:1944
#2 0x0000555555ea7767 in mysql_insert (thd=thd@entry=0x62a0000a2208, table_list=0x61b00012ada0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at sql_insert.cc:1039
#3 0x0000555555efda90 in mysql_execute_command (thd=thd@entry=0x62a0000a2208) at sql_parse.cc:3927
#4 0x0000555555f0cc50 in mysql_parse (thd=0x62a0000a2208, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at sql_parse.cc:7449
#5 0x00005555566d4444 in Query_log_event::do_apply_event (this=0x61200005b9c8, rgi=<optimized out>, query_arg=<optimized out>, q_len_arg=<optimized out>) at log_event.cc:4508
#6 0x00005555566d639e in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at log_event.cc:4185
#7 0x0000555555d738cf in Log_event::apply_event (rgi=0x61d0001ea080, this=0x61200005b9c8) at log_event.h:1343
#8 apply_event_and_update_pos_apply (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080, reason=<optimized out>) at slave.cc:3479
#9 0x0000555555d8596b in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080) at slave.cc:3623
#10 0x00005555562aca83 in rpt_handle_event (qev=qev@entry=0x6190000fa088, rpt=rpt@entry=0x62200002bd68) at rpl_parallel.cc:50
#11 0x00005555562bd04e in handle_rpl_parallel_thread (arg=arg@entry=0x62200002bd68) at rpl_parallel.cc:1258
{noformat}
Here {{handler::print_error}} computes whether to write the
current error to the error log when --log-warnings > 1. The decision flag is consulted
by {{my_message_sql()}}, which may eventually be called.
In the bug case the decision is to log.
However, in the optimistic-mode slave applier case, any conflict is
resolved by rolling back and retrying until success. Hence the
logging is at least extraneous.
The case is fixed by adding a new flag {{ME_LOG_AS_WARN}}, which
{{handler::print_error}} may propagate further through {{my_error}}
when the error comes from an optimistically running slave worker thread.
The new flag effectively requests the warning level for the errlog record,
while the thread's DA records the actual error (which is regarded as a temporary one
by the parallel slave error handler).
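A self-contained, hedged sketch of this flow (hypothetical types and flag value, not the real server API):
{noformat}
#include <cstdio>

typedef unsigned int myf;
static const myf ME_LOG_AS_WARN= 1U << 10;   // value chosen for illustration

struct WorkerCtx { bool optimistic_parallel_worker; };

static void report_to_error_log(int err, myf flags)
{
  std::fprintf(stderr, "[%s] handler error %d\n",
               (flags & ME_LOG_AS_WARN) ? "Warning" : "ERROR", err);
}

static void print_error_sketch(const WorkerCtx &ctx, int err, myf flags)
{
  // An optimistically running worker expects to roll back and retry,
  // so the error-log record is demoted to a warning; the thread's DA
  // (not shown here) still records the actual, temporary error.
  if (ctx.optimistic_parallel_worker)
    flags|= ME_LOG_AS_WARN;
  report_to_error_log(err, flags);
}
{noformat}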
The functionality of the socket system variable is extended
here such that a leading '@' indicates that the socket
will be an abstract socket. The socket name will be
the remainder of the name after the '@'. This is consistent
with the approach used by systemd for socket activation.
Thanks to Sergey Vojtovich:
On OS X sockaddr_un is defined as:
struct sockaddr_un
{
  u_char sun_len;
  u_char sun_family;
  char   sun_path[104];
};
There is a comment in man 7 unix (on linux):
"
On Linux, the above offsetof() expression equates to the same value as sizeof(sa_family_t),
but some other implementations include other fields before sun_path, so the offsetof()
expression more portably describes the size of the address structure.
"
As such, use the offsetof() expression on Linux and keep the previous sizeof(UNIXaddr)
on other platforms, as that is what worked before and those platforms don't
support abstract sockets, so there's no compatibility problem.
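A minimal standalone sketch of the resulting bind logic (Linux-only, simplified, not the actual server code): a leading '@' becomes a leading NUL byte in sun_path, and the address length is the offsetof() expression plus the name length.
{noformat}
#include <sys/socket.h>
#include <sys/un.h>
#include <stddef.h>
#include <string.h>

static int bind_unix_socket(int fd, const char *name)
{
  struct sockaddr_un addr;
  memset(&addr, 0, sizeof(addr));
  addr.sun_family= AF_UNIX;
  strncpy(addr.sun_path, name, sizeof(addr.sun_path) - 1);

  socklen_t len= sizeof(addr);                    /* regular pathname socket */
  if (name[0] == '@')
  {
    addr.sun_path[0]= '\0';                       /* abstract namespace */
    len= offsetof(struct sockaddr_un, sun_path) + strlen(name);
  }
  return bind(fd, (struct sockaddr *) &addr, len);
}
{noformat}
For "@abc" this yields an address length of 6 on Linux, matching the strace output below.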
{noformat}
strace -fe trace=networking mysqld --skip-networking --socket @abc ...
...
[pid 10578] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 22
[pid 10578] setsockopt(22, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 10578] bind(22, {sa_family=AF_UNIX, sun_path=@"abc"}, 6) = 0
[pid 10578] listen(22, 80) = 0
...
Version: '10.3.6-MariaDB-log' socket: '@abc' port: 0 Source distribution
$ lsof -p 10578
mysqld 10578 dan 22u unix 0x00000000087e688c 0t0 4787815 @abc type=STREAM
{noformat}
MDEV-13073 effectively made the master semisync component dependent on
the plugin one through the instantiation of THD by its Ack thread.
The thread therefore must close its resources prior to
plugin_shutdown(), which was not the case.
Fixed by implementing this requirement.
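A purely illustrative ordering sketch (all names here are invented stand-ins, not the real routines):
{noformat}
#include <cstdio>

// Hypothetical stand-ins for the real shutdown routines.
static void stop_semisync_ack_thread_and_free_its_thd()
{ std::puts("Ack thread stopped, its THD released"); }
static void plugin_shutdown_stub()
{ std::puts("plugin infrastructure torn down"); }

// The Ack thread owns a THD, so it must release it before the plugin
// infrastructure that THD depends on is shut down.
static void shutdown_order_sketch()
{
  stop_semisync_ack_thread_and_free_its_thd();
  plugin_shutdown_stub();
}
{noformat}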
Recent changes in semisync initialization via MDEV-13073 introduced
instantiation of THD too early from the server components'
point of view, which led to a segfault.
Fixed by relocating the semisync component initialization
to a later point, when thread-specific memory can be used.
Use the systemd EXTEND_TIMEOUT_USEC notification to advise systemd of progress,
moving towards progress-based measures rather than purely time-based ones.
Progress is reported at numerous shutdown/startup locations, including:
* For innodb_fast_shutdown=0, trx_roll_must_shutdown(), while rolling back incomplete transactions.
* For merging the change buffer (in srv_shutdown(bool ibuf_merge)).
* For purging history, srv_do_purge().
Thanks Marko for feedback and suggestions.
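A hedged sketch of the mechanism (assumes libsystemd's sd_notify() is available; the helper name and logging are illustrative, not the server code):
{noformat}
#include <systemd/sd-daemon.h>
#include <cstdio>

// Called whenever a long shutdown/startup phase makes measurable progress,
// e.g. after rolling back another batch of incomplete transactions.
static void service_report_progress(unsigned long long more_usec,
                                    const char *phase)
{
  char state[64];
  // Ask systemd to extend the service start/stop timeout by more_usec
  // microseconds instead of killing the service on a fixed time budget.
  std::snprintf(state, sizeof(state), "EXTEND_TIMEOUT_USEC=%llu", more_usec);
  sd_notify(0, state);   // 0: keep the NOTIFY_SOCKET environment variable
  std::fprintf(stderr, "progress: %s\n", phase);
}
{noformat}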
The upper 1M limit for max_prepared_stmt_count was set over 10 years
ago. It does not suit current hardware: a sysbench oltp_read_write
test with 512 threads will hit this limit.