Problem:
========
CURRENT_TEST: binlog_encryption.rpl_corruption
mysqltest: In included file "./include/wait_for_slave_io_error.inc":
...
At line 72: Slave stopped with wrong error code
**** Slave stopped with wrong error code: 1743 (expected 1595,1913) ****
Analysis:
========
The test emulates the corruption at the various stages of replication for
example in binlog file, in network and in relay log etc. It verifies that all
corruption cases are handled through appropriate error messages.
The test cases which emulate network failure expect following errors.
--ER_SLAVE_RELAY_LOG_WRITE_FAILURE (1595)
--ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE (1743)
Ideally test should expect error codes as 1595 and 1743.
But the test actually waits on incorrect error code 1595,1913
Fix:
===
Added appropriate error code for 'ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE'.
Replaced 1913 with 1743.
Analysis
Mysqlbinlog output for encrypted binary log
#Q> insert into tab1 values (3,'row 003')
#190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19
# at 940
#190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F
Here we can see Table_map_log_event ends at 980 but Next event starts at 940.
And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave
Solution:-
Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog),
So that mysqlbinlog can update its log_pos.
Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not
have so We only update slave master pos when master actually have the
FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT.
Also added the test case when new server reads the data from old server which
does not send START_ENCRYPTION_EVENT to slave.
Master Slave Upgrade Scenario.
When Slave is updated first, Slave will have extra logic of handling
START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT.
So there will be no issue.
When Master is updated first, It will send START_ENCRYPTION_EVENT to
slave , But slave will ignore this event in queue_event.
Analysis:
========
In general if there are three groups.
1 - Inserts 32 which fails due to local entry '32' on slave.
2 - Inserts 33
3 - Inserts 34
Each group considers itself as a waiter and it waits for prior group 'waitee'.
This is done in 'register_wait_for_prior_event_group_commit'. If there is no
other parallel group being scheduled then no waitee will be there.
Let us assume 3 groups are being scheduled in parallel.
3-> waits for 2-> waits for->1
'1' upon completion it checks is there any registered subsequent waiter. If
so it wakes up the subsequent waiter with its execution status. This execution
status is stored in wakeup_error.
If '1' failed then it sends corresponding wakeup_error to 2. Then '2' aborts
and it propagates error to '3'. So all further commits are aborted. This
mechanism works only when all transactions reach a stage where they are
waiting for their prior commit to complete.
In case of optimistic following scenario occurs.
1,2,3 are scheduled in parallel.
3 - Reaches group_commit_code waits for 2 to complete.
1 - errors out sets stop_on_error_sub_id=1.
When a group execution results in error its corresponding sub_id is set to
'stop_on_error_sub_id'. Any new groups queued for execution will check if
their sub_id is > stop_on_error_sub_id. If it is true their execution will be
skipped as prior group execution failed. 'skip_event_group=1' will be set.
Since the execution of SQL thread is about to stop we just skip execution of
all the following event groups. We still do all the normal waiting and wakeup
processing between the event groups as a simple way to ensure that everything
is stopped and cleaned up correctly.
Upon error '1' transaction checks for registered waiters. Since no one is
there it simply goes away.
2 - Starts the execution. It checks do I have a waitee.
Since wait_commit_sub_id == entry->last_committed_sub_id no waitee is set.
Secondly: 'entry->stop_on_error_sub_id' is set by '1'st execution. Now
'handle_parallel_thread' code checks if the current group 'sub_id' is greater
than the 'sub_id' set within 'stop_on_error_sub_id'.
Since the above is true 'skip_event_group=true' is set. Simply call
'wait_for_prior_commit' to wakeup all waiters. Group '2' didn't had any
waitee and its execution is skipped. Hence its wakeup_error=0.It sends a
positive wakeup signal to '3'. Which commits. This results in a missed
transaction. i.e 33 is missed and 34 is committed.
Fix:
===
When a worker learns that an earlier transaction execution has failed, and it
should not proceed for further execution, it should mark its own execution
status as failed so that it alerts its followers to abort as well.
This allows one to run the test suite even if any of the following
options are changed:
- character-set-server
- collation-server
- join-cache-level
- log-basename
- max-allowed-packet
- optimizer-switch
- query-cache-size and query-cache-type
- skip-name-resolve
- table-definition-cache
- table-open-cache
- Some innodb options
etc
Changes:
- Don't print out the value of system variables as one can't depend on
them to being constants.
- Don't set global variables to 'default' as the default may not
be the same as the test was started with if there was an additional
option file. Instead save original value and reset it at end of test.
- Test that depends on the latin1 character set should include
default_charset.inc or set the character set to latin1
- Test that depends on the original optimizer switch, should include
default_optimizer_switch.inc
- Test that depends on the value of a specific system variable should
set it in the test (like optimizer_use_condition_selectivity)
- Split subselect3.test into subselect3.test and subselect3.inc to
make it easier to set and reset system variables.
- Added .opt files for test that required specfic options that could
be changed by external configuration files.
- Fixed result files in rockdsb & tokudb that had not been updated for
a while.
MDEV-17625 Different warnings when comparing a garbage to DATETIME vs TIME
- Splitting processes of data type conversion (to TIME/DATE,DATETIME)
and warning generation.
Warning are now only get collected during conversion (in an "int" variable),
and are pushed in the very end of conversion (not in parallel).
Warnings generated by the low level routines str_to_xxx() and number_to_xxx()
can now be changed at the end, when TIME_FUZZY_DATES is applied,
from "Invalid value" to "Truncated invalid value".
Now "Illegal value" is issued only when the low level routine returned
an error and TIME_FUZZY_DATES was not set. Otherwise, if the low level
routine returned "false" (success), or if NULL was converted to a zero
datetime by TIME_FUZZY_DATES, then "Truncated illegal value"
is issued. This gives better warnings.
- Methods Type_handler::Item_get_date() and
Type_handler::Item_func_hybrid_field_type_get_date() now only
convert and collect warning information, but do not push warnings.
- Changing the return data type for Type_handler::Item_get_date()
and Type_handler::Item_func_hybrid_field_type_get_date() from
"bool" to "void". The conversion result (success vs error) can be
checked by testing ltime->time_type. MYSQL_TIME_{NONE|ERROR}
mean mean error, other values mean success.
- Adding new wrapper methods Type_handler::Item_get_date_with_warn() and
Type_handler::Item_func_hybrid_field_type_get_date_with_warn()
to do conversion followed by raising warnings, and changing
the code to call new Type_handler::***_with_warn() methods.
- Adding a helper class Temporal::Status, a wrapper
for MYSQL_TIME_STATUS with automatic initialization.
- Adding a helper class Temporal::Warn, to collect warnings
but without actually raising them. Moving a part of ErrConv
into a separate class ErrBuff, and deriving both Temporal::Warn
and ErrConv from ErrBuff. The ErrBuff part of Temporal::Warn
is used to collect textual representation of the input data.
- Adding a helper class Temporal::Warn_push. It's used
to collect warning information during conversion, and
automatically pushes warnings to the diagnostics area
on its destructor time (in case of non-zero warning).
- Moving more code from various functions inside class Temporal.
- Adding more Temporal_hybrid constructors and
protected Temporal methods make_from_xxx(),
which convert and only collect warning information, but do not
actually raise warnings.
- Now the low level functions str_to_datetime() and str_to_time()
always set status->warning if the return value is "true" (error).
- Now the low level functions number_to_time() and number_to_datetime()
set the "*was_cut" argument if the return value is "true" (error).
- Adding a few DBUG_ASSERTs to make sure that str_to_xxx() and
number_to_xxx() always set warnings on error.
- Adding new warning flags MYSQL_TIME_WARN_EDOM and MYSQL_TIME_WARN_ZERO_DATE
for the code symmetry. Before this change there was a special
code path for (rc==true && was_cut==0) which was treated by
Field_temporal::store_invalid_with_warning as "zero date violation".
Now was_cut==0 always means that there are no any error/warnings/notes
to be raised, not matter what rc is.
- Using new Temporal_hybrid constructors in combination with
Temporal::Warn_push inside str_to_datetime_with_warn(),
double_to_datetime_with_warn(), int_to_datetime_with_warn(),
Field::get_date(), Item::get_date_from_string(), and a few other places.
- Removing methods Dec_ptr::to_datetime_with_warn(),
Year::to_time_with_warn(), my_decimal::to_datetime_with_warn(),
Dec_ptr::to_datetime_with_warn().
Fixing Sec6::to_time() and Sec6::to_datetime() to
convert and only collect warnings, without raising warnings.
Now warning raising functionality resides in Temporal::Warn_push.
- Adding classes Longlong_hybrid_null and Double_null, to
return both value and the "IS NULL" flag. Adding methods
Item::to_double_null(), to_longlong_hybrid_null(),
Item_func_hybrid_field_type::to_longlong_hybrid_null_op(),
Item_func_hybrid_field_type::to_double_null_op().
Removing separate classes VInt and VInt_op, as they
have been replaced by a single class Longlong_hybrid_null.
- Adding a helper method Temporal::type_name_by_timestamp_type(),
moving a part of make_truncated_value_warning() into it,
and reusing in Temporal::Warn::push_conversion_warnings().
- Removing Item::make_zero_date() and
Item_func_hybrid_field_type::make_zero_mysql_time().
They provided duplicate functionality.
Now this code resides in Temporal::make_fuzzy_date().
The latter is now called for all Item types when data type
conversion (to DATE/TIME/DATETIME) is involved, including
Item_field and Item_direct_view_ref.
This fixes MDEV-17563: Item_direct_view_ref now correctly converts
NULL to a zero date when TIME_FUZZY_DATES says so.
The code in Type_handler_blob****::make_conversion_table_field()
erroneously assumed that row format replication uses
MYSQL_TYPE_TINYBLOB, MYSQL_TYPE_BLOB, MYSQL_TYPE_MEDIUMBLOB,
MYSQL_TYPE_LONGBLOB type codes to tranfer BLOB variations.
In fact, all BLOB variations use MYSQL_TYPE_BLOB as the type
code, while the BLOB packlength (1,2,3 or 4) it tranferred
in metadata.
The bug was introduced by aee068085d
(MDEV-9238 Wrap create_virtual_tmp_table() into a class, split into different steps)
Main problem was that no log-event print function checked for disk
full error on the IO_CACHE.
All changes in this patch only affects mysqlbinlog, not the server!
- Changed all log-event print functions to return 1 on error
- Fixed memory usage when not using --flashback.
- Added printing of number of rows in row events. Can be disabled with
--print-row-count=0
- Print annotated rows when using mysqlbinlog --short-form
- Fixed that mysqlbinlog --debug works
- Fixed create_drop_binlog.test test failure
- Reorganized fields in PRINT_EVENT_INFO to be according to size to
optimize storage
- Don't change print_row_event_position or print_row_counts if set by user
- Remove some testing of argument to my_free is 0
- base64-output=never is now supported and works in all context
- Updated help information for --base64-output and --short-form
- print_row_count is now on by default. Reset automatically if --short-form
is used
- Removed obsolote warning for mysql 5.6.0
- More DBUG_PRINT for mysqltest.cc
- my_b_write_byte() now checks for flush failures. This fixed a memory
overrun on disk full
- my_b_printf() now returns 1 on failure, 0 on ok. This simplifies code
and no old code was using the old return value of my_b_printf().
- my_b_Write_backtick_quote() now returns 1 on failure and 0 on ok
- Fixed some error conditions in log printing that was not previously
handled.
- Slave_rows_error_report() can now handle longlong positions
- Write_on_release_cache() rewritten so that we can detect errors
on flush. Not depending on automatic release anymore.
- Changed types for Pos and End_log_pos to 64 bit in SHOW BINLOG EVENTS
- Fixed that copy_event_cache_to_string_and_reinit() works with strings
longer than 4G (Changed to use LEX_STRING instead of String)
- Restricted binlog_rows_event_max_size to UINT32_MAX-1 as String's are
anyway restricted to UINT32_MAX
- Fixed bug in rpl_binlog_state::write_to_iocache() which hide write
failures (duplicate variable name)
- Fixed bug in String::append if original string was not allocated
- Stop mysqlbinlog output at once if there is an error.
- Before printing error message, flush result file. This ensures that
the error message is printed last. (Easier to find)
and specifically the ack receiving functionality.
Semisync is turned to be static instead of plugin so its functions
are invoked at the same points as RUN_HOOKS.
The RUN_HOOKS and the observer interface remain to be removed by later
patch.
Todo:
React on killed status by repl_semisync_master.wait_after_sync(). Currently
Repl_semi_sync_master::commit_trx does not check the killed status.
There were few bugfixes found that are present in mysql and its unclear
whether/how they are covered. Those include:
Bug#15985893: GTID SKIPPED EVENTS ON MASTER CAUSE SEMI SYNC TIME-OUTS
Bug#17932935 CALLING IS_SEMI_SYNC_SLAVE() IN EACH FUNCTION CALL
HAS BAD PERFORMANCE
Bug#20574628: SEMI-SYNC REPLICATION PERFORMANCE DEGRADES WITH A HIGH NUMBER OF THREADS
Recording mtr.add_suppression("Could not use .*")
instead of mtr.add_suppression("Could not open .*")
in tests:
- binlog_encryption.rpl_binlog_errors
- binlog_encryption.binlog_index
The following options will be removed:
innodb_file_format
innodb_file_format_check
innodb_file_format_max
innodb_large_prefix
They have been deprecated in MySQL 5.7.7 (and MariaDB 10.2.2) in WL#7703.
The file_format column in two INFORMATION_SCHEMA tables will be removed:
innodb_sys_tablespaces
innodb_sys_tables
Code to update the file format tag at the end of page 0:5
(TRX_SYS_PAGE in the InnoDB system tablespace) will be removed.
When initializing a new database, the bytes will remain 0.
All references to the Barracuda file format will be removed.
Some references to the Antelope file format (meaning
ROW_FORMAT=REDUNDANT or ROW_FORMAT=COMPACT) will remain.
This basically ports WL#7704 from MySQL 8.0.0 to MariaDB 10.3.1:
commit 4a69dc2a95995501ed92d59a1de74414a38540c6
Author: Marko Mäkelä <marko.makela@oracle.com>
Date: Wed Mar 11 22:19:49 2015 +0200
1. Special mode to search in error logs: if SEARCH_RANGE is not set,
the file is considered an error log and the search is performed
since the last CURRENT_TEST: line
2. Number of matches is printed too. "FOUND 5 /foo/ in bar".
Use greedy .* at the end of the pattern if number of matches
isn't stable. If nothing is found it's still "NOT FOUND",
not "FOUND 0".
3. SEARCH_ABORT specifies the prefix of the output.
Can be "NOT FOUND" or "FOUND" as before,
but also "FOUND 5 " if needed.
the test restarts the server, giving it 60 seconds to shutdown
and then killing it mercilessly.
make sure the server closes all MyISAM tables before shutdown,
as we cannot reliably expect it to make the deadline.
The reason is a simple race condition. Initially the test was meant to synchronize with master
before showing tables, but it turned out that the slave IO thread should fail by this point,
and synchronization was removed along with a server bugfix. Now added an intermediate sync
instead, to make sure that slave has replicated events before the point of failure