As of now, InnoDB does not store a trx_id for each record in a secondary index.
The idea behind this is the following: store only a per-page max_trx_id, and
delete-mark the records when they are deleted/updated.
When a read starts, it remembers the lowest id of the currently active
transactions. InnoDB refers to it as trx->read_view->m_up_limit_id.
See also ReadView::open.
When the page is fetched, its max_trx_id is compared to m_up_limit_id.
If the value is lower, and the secondary index record is not delete-marked,
then this page is safe to read as is. Otherwise, access to the clustered
index may be needed. See the page_get_max_trx_id call in row_search_mvcc, and the
corresponding switch (row_search_idx_cond_check(...)) below.
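For illustration, a minimal standalone sketch of that check (simplified
stand-in types; only max_trx_id and m_up_limit_id echo the names above):

  #include <cstdint>

  typedef uint64_t trx_id_t;

  // Simplified stand-ins for the structures mentioned above.
  struct SecIndexPage { trx_id_t max_trx_id; };     // per-page max trx id
  struct ReadViewLite { trx_id_t m_up_limit_id; };  // lowest active id at open

  // True when the secondary index record may be trusted as is: every
  // transaction that could have modified the page committed before the
  // read view was opened, and the record is not delete-marked.
  // Otherwise the clustered index may need to be accessed.
  static bool sec_rec_ok_without_clust_access(const SecIndexPage &page,
                                              const ReadViewLite &view,
                                              bool delete_marked)
  {
    return page.max_trx_id < view.m_up_limit_id && !delete_marked;
  }
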
Virtual columns are required to be updated if the record was
delete-marked. The motivation behind this is documented in
Row_sel_get_clust_rec_for_mysql::operator() near the
row_sel_sec_rec_is_for_clust_rec call.
This was basically a description of why virtual column computation can
normally happen during a SELECT and, more generally, during a vcol index access.
Sometimes the stats tables are updated by InnoDB. This starts a new
transaction, and it can happen that it has not finished by the moment of the
SELECT execution, forcing virtual column recomputation. If the result is
something that normally produces a warning, like a division by zero, then
the warning could be emitted in a racy manner.
The solution is to suppress the warnings when a column is computed
for the described purpose.
An ignore_warnings argument is added to innobase_get_computed_value.
Currently, it is true only for the call from
row_sel_sec_rec_is_for_clust_rec.
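A rough sketch of the idea (illustrative code only, not the real
innobase_get_computed_value signature):

  #include <cstdio>

  // Illustrative stand-in for a virtual-column computation. When the value
  // is computed only to compare a delete-marked secondary index record with
  // its clustered index record, warnings such as division by zero are
  // suppressed so they cannot surface in a racy manner.
  static int compute_vcol(int a, int b, bool ignore_warnings, int *out)
  {
    if (b == 0)
    {
      if (!ignore_warnings)
        std::fprintf(stderr, "Warning: division by zero\n");
      *out= 0;                        // placeholder for a NULL result
      return 1;                       // value could not be computed normally
    }
    *out= a / b;
    return 0;
  }
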
The underlying causes of all the bugs mentioned below are the same. This patch
fixes all of them:
1) MDEV-25028: ASAN use-after-poison in
base_list_iterator::next or Assertion `sl->join == 0' upon
INSERT .. RETURNING via PS
2) MDEV-25187: Assertion `inited == NONE || table->open_by_handler'
failed or Direct leak in init_dynamic_array2 upon INSERT .. RETURNING
and memory leak in init_dynamic_array2
3) MDEV-28740: crash in INSERT RETURNING subquery in prepared statements
4) MDEV-27165: crash in base_list_iterator::next
5) MDEV-29686: Assertion `slave == 0' failed in
st_select_lex_node::attach_single
Analysis:
Consider this statement:
INSERT(1)...SELECT(2)...(SELECT(3)...) RETURNING (SELECT(4)...)
When RETURNING is encountered, add_slave() changes how the selects are linked.
It makes builtin_select (1) a slave of SELECT (2). This loses the already
existing slave (3) (which is the nested select of the SELECT of
INSERT...SELECT). In reality, builtin_select (1) shouldn't be a slave of
SELECT (2) because it is not nested within it. Also, the push_select() call
used to get the correct context changed how the selects are linked.
During reinit_stmt_before_use(), we expect the selects to
be cleaned up and have join=0. Since these selects are not linked correctly,
the clean-up does not happen correctly, so join is not NULL. Hence the crash.
Fix:
If we are parsing RETURNING, set is_parsing_returning= true for the
current select. Get rid of add_slave(). In place of push_select(), use
push_context() to have the correct context (the context of builtin_select)
for resolving the items in item_list, and add these items to the item_list
of builtin_select.
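A rough standalone sketch of the fixed flow (simplified stand-ins for
LEX/SELECT_LEX; only is_parsing_returning, push_context and builtin_select
echo names from the fix, everything else is illustrative):

  #include <string>
  #include <vector>

  // Simplified stand-ins for LEX / SELECT_LEX and the name-resolution
  // context stack.
  struct SelectLite
  {
    bool is_parsing_returning= false;
    std::vector<std::string> item_list;
  };

  struct LexLite
  {
    SelectLite builtin_select;            // the select of the INSERT itself
    SelectLite *current_select= &builtin_select;
    std::vector<SelectLite*> context_stack;

    void push_context(SelectLite *sel) { context_stack.push_back(sel); }
    void pop_context() { context_stack.pop_back(); }

    // Parsing "RETURNING <items>": no add_slave() re-linking of selects
    // (that broke the nesting of existing subselects); instead, mark the
    // current select and resolve the items in builtin_select's context.
    void start_returning()
    {
      current_select->is_parsing_returning= true;
      push_context(&builtin_select);
    }
    void add_returning_item(const std::string &item)
    {
      builtin_select.item_list.push_back(item); // items go to builtin_select
    }
    void end_returning()
    {
      current_select->is_parsing_returning= false;
      pop_context();
    }
  };
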
The problem is that if the table definition cache (TDC) is full of real tables
which are in the table cache, a view definition cannot stay there and will be
pushed out by its own underlying tables.
In the situation above, the old mechanism of detecting whether the definition
in the PS matches the current version always requires a reprepare and so
prevents executing the PS.
One workaround is to increase the TDC size; another is to improve the version
check for views/triggers (which is done here; see the sketch after the list).
Now in suspicious cases we check:
- the timestamp (microseconds) of the view, to be sure that the version
really has changed;
- the creation time (microseconds) of a trigger, relative to the time
(microseconds) of statement preparation.
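A hedged sketch of these checks (simplified stand-in structures, not the
actual server metadata code):

  #include <cstdint>

  // The coarse version counter can be lost when the view is pushed out of
  // the TDC, so in suspicious cases microsecond timestamps are compared.
  struct ViewMeta     { uint64_t mtime_us; };   // view modification time
  struct TriggerMeta  { uint64_t ctime_us; };   // trigger creation time
  struct PreparedStmt
  {
    uint64_t prepare_time_us;
    uint64_t view_mtime_us_at_prepare;
  };

  static bool view_needs_reprepare(const PreparedStmt &ps, const ViewMeta &v)
  {
    // Reprepare only if the view really changed since preparation.
    return v.mtime_us != ps.view_mtime_us_at_prepare;
  }

  static bool trigger_needs_reprepare(const PreparedStmt &ps,
                                      const TriggerMeta &trg)
  {
    // A trigger created after the statement was prepared invalidates it.
    return trg.ctime_us > ps.prepare_time_us;
  }
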
Problem:
========
Replication can break while applying a query log event if its
respective command errors on the primary, but is ignored by the
replication filter within Grant_tables on the replica. The bug
reported by MDEV-28530 shows this with REVOKE ALL PRIVILEGES using a
non-existent user. The primary will binlog the REVOKE command with
an error code, and the replica will think the command executed with
success because the replication filter will ignore the command while
accessing the Grant_tables classes. When the replica performs an
error check, it sees the difference between the error codes, and
replication breaks.
Solution:
========
If the replication filter check done by Grant_tables logic ignores
the tables, reset thd->slave_expected_error to 0 so that
Query_log_event::do_apply_event() can be made aware that the
underlying query was ignored when it compares errors.
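A hedged sketch of the solution (illustrative stand-ins; thd->slave_expected_error
is the member named above):

  // When the replication filter makes the replica skip the grant tables,
  // the error the primary recorded for the statement no longer applies, so
  // the expected error is cleared before Query_log_event::do_apply_event()
  // compares error codes.
  struct ThdLite { int slave_expected_error; };

  static void grant_tables_open(ThdLite *thd, bool ignored_by_rpl_filter)
  {
    if (ignored_by_rpl_filter)
      thd->slave_expected_error= 0;  // query was ignored: expect no error
  }
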
Note that this bug also affects DROP USER if not all users exist
in the provided list, and the patch fixes and tests this case.
Reviewed By:
============
andrei.elkin@mariadb.com
Making changes to wsrep_mysqld.h causes large parts of server code to
be recompiled. The reason is that wsrep_mysqld.h is included by
sql_class.h, even though very little of wsrep_mysqld.h is needed in
sql_class.h. This commit introduces a new header file, wsrep_on.h,
which is meant to be included from sql_class.h, and contains only
macros and variable declarations used to determine whether wsrep is
enabled.
Also, header wsrep.h should only contain definitions that are also
used outside of sql/. Therefore, move WSREP_TO_ISOLATION* and
WSREP_SYNC_WAIT macros to wsrep_mysqld.h.
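A hedged sketch of the shape of the new header (the concrete macros and
variables in wsrep_on.h may differ):

  /* wsrep_on.h (sketch): only what sql_class.h needs to ask "is wsrep
     enabled?", so that edits to wsrep_mysqld.h no longer rebuild everything
     that includes sql_class.h. */
  #ifndef WSREP_ON_INCLUDED
  #define WSREP_ON_INCLUDED

  extern bool wsrep_enabled;          /* illustrative global switch */
  #define WSREP_ON (wsrep_enabled)

  #endif /* WSREP_ON_INCLUDED */
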
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
TIMESTAMP columns were compared as strings in ALL/ANY comparison,
which did not work well near a DST time change.
Changing the ALL/ANY comparison to use the "Native" representation to compare
TIMESTAMP columns, like the simple comparison does.
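As an illustration of why this matters (standalone sketch, not the server's
Native type): a UTC-based (seconds, microseconds) comparison is stable across
a DST change, whereas formatted local-time strings are not, because during
the autumn fall-back the same wall-clock string occurs twice.

  #include <cstdint>
  #include <tuple>

  struct TimestampNative
  {
    int64_t  sec;    // seconds since epoch (UTC)
    uint32_t usec;   // microseconds
  };

  // "Native" comparison: unaffected by the local-time representation.
  static int cmp_native(const TimestampNative &a, const TimestampNative &b)
  {
    auto ta= std::tie(a.sec, a.usec);
    auto tb= std::tie(b.sec, b.usec);
    return ta < tb ? -1 : (ta == tb ? 0 : 1);
  }
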
When creating a recursive CTE, the column types are taken from the
non recursive part of the CTE (this is according to the SQL standard).
This patch adds code to abort the CTE if the calculated values in the
recursive part do not fit in the fields of the created temporary table.
The new code only affects recursive CTE, so it should not cause any notable
problems for old applications.
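As a minimal illustration of the new behaviour (generic sketch, not the
actual Field code; a 16-bit column stands in for whatever type the
non-recursive part produced):

  #include <cstdint>
  #include <limits>
  #include <stdexcept>

  // The temporary table's column type comes from the non-recursive part,
  // so a value produced by the recursive part may not fit; abort instead
  // of silently truncating.
  static int16_t store_in_cte_column(int64_t value_from_recursive_part)
  {
    if (value_from_recursive_part < std::numeric_limits<int16_t>::min() ||
        value_from_recursive_part > std::numeric_limits<int16_t>::max())
      throw std::range_error("value from the recursive part does not fit");
    return static_cast<int16_t>(value_from_recursive_part);
  }
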
Other things:
- Fixed that we get correct row numbers for warnings generated with
WITH RECURSIVE
Reviewer: Alexander Barkov <bar@mariadb.com>
MDEV-21810 MBR: Unexpected "Unsafe statement" warning for unsafe IODKU
The MDEV-17614 fixes to replication unsafety for INSERT ON DUP KEY UPDATE
on a table with two or more unique keys left a flaw. The fixes checked the
safety condition for each inserted record, with the idea of catching a
user-supplied value for an autoincrement column; when that happens, the
autoincrement column becomes a source of unsafety too.
It was not expected that after a duplicate error the next record's
write_set may become different, so the unsafety decision computed for that
specific record corrupts the query's binlogging state, and when
@@binlog_format is MIXED nothing gets binlogged.
This case has already been fixed in 10.5.2 by 91ab42a823, which
relocated/optimized THD::decide_logging_format_low() out of the record insert
loop. The safety decision is now computed once, and at the right time.
Pertinent parts of that commit are cherry-picked here.
Also, a spurious warning about unsafety is removed when @@binlog_format is
MIXED; the original MDEV-17614 test result is corrected.
The original MDEV-17614 test is extended and made more readable.
Problem:
========
If a primary is shut down while it has an active semi-sync connection
and is awaiting an ACK, the primary hard kills the active communication
thread and does not ensure the transaction was received by a replica.
This can lead to an inconsistent replication state.
Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
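A simplified sketch of the phase-1 waiting described above (all names
illustrative; the real code uses the server's own synchronization primitives):

  #include <chrono>
  #include <condition_variable>
  #include <mutex>

  // Instead of hard-killing a client thread that is still waiting for a
  // semi-sync ACK, shutdown waits until the ACK arrives or the semi-sync
  // timeout expires; only then (phase 2) is the connection killed.
  struct AckWaiter
  {
    std::mutex mtx;
    std::condition_variable cond;
    bool ack_received= false;

    // Called from the ack receiver thread when the replica's ACK arrives.
    void signal_ack()
    {
      std::lock_guard<std::mutex> lk(mtx);
      ack_received= true;
      cond.notify_all();
    }

    // Called by shutdown before killing the waiting thread.
    // Returns true if the ACK arrived in time, false on timeout.
    bool wait_for_ack_or_timeout(std::chrono::milliseconds timeout)
    {
      std::unique_lock<std::mutex> lk(mtx);
      return cond.wait_for(lk, timeout, [this] { return ack_received; });
    }
  };
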
Notes:
1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, if a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before killing threads in phase 1 to let the replicas
safely stop (if applicable).
2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
The bug was that the in_vector array in Item_func_in was allocated in the
statement arena, not in table->expr_arena.
Revert part of 5acd391e8b2d. Instead, change the arena correctly
in fix_all_session_vcol_exprs().
Remove TABLE_ARENA, which was introduced in 5acd391e8b2d to force
item tree changes to be rolled back (because they were allocated in the
wrong arena and didn't persist; now they do).
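A rough standalone sketch of the arena switch (simplified stand-ins; only
expr_arena and fix_all_session_vcol_exprs echo names used above):

  #include <cstddef>
  #include <memory>
  #include <vector>

  // The point of the fix is *where* the allocation happens: data built
  // while fixing vcol expressions (such as the in_vector array) must come
  // from the table's expr_arena, which lives as long as the TABLE, not
  // from the statement arena, which is reset per statement.
  struct Arena
  {
    std::vector<std::unique_ptr<char[]>> blocks;
    void *alloc(std::size_t n)
    {
      blocks.push_back(std::make_unique<char[]>(n));
      return blocks.back().get();
    }
  };
  struct TableLite { Arena expr_arena; };
  struct ThdLite   { Arena stmt_arena; Arena *active= &stmt_arena; };

  // RAII helper: make the table's arena active while the vcol expressions
  // are fixed, then restore the previous arena.
  struct SwitchArena
  {
    ThdLite *thd; Arena *saved;
    SwitchArena(ThdLite *t, Arena *a) : thd(t), saved(t->active)
    { t->active= a; }
    ~SwitchArena() { thd->active= saved; }
  };

  static void fix_all_session_vcol_exprs(ThdLite *thd, TableLite *table)
  {
    SwitchArena guard(thd, &table->expr_arena);
    void *in_vector_storage= thd->active->alloc(64); // lives with the TABLE
    (void) in_vector_storage;
  }
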
Problem: In regular replication, when the master binlogged using statement
format, the slave might not have written an event to its binary log when the
Query event was aimed at a temporary table.
Specifically, this was observed with LOAD DATA INFILE.
This was possible because, unlike the master, the slave holds temporary
tables in its pool, so the master-side check for the existence of a
temporary table at the binlogging-format decision did not apply.
Solution: replace THD::has_thd_temporary_tables() with
THD::has_temporary_tables(), which allows identifying temporary table
presence on either side.
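A hedged sketch of the difference between the two checks (illustrative
stand-ins, not the actual THD code):

  // On the slave, replicated temporary tables live in a pool rather than
  // being owned directly by the THD, so the old THD-only check missed them
  // when deciding the binlogging format for statements such as LOAD DATA.
  struct TmpPoolLite
  {
    int tables= 0;
    bool empty() const { return tables == 0; }
  };

  struct ThdSketch
  {
    int thd_temporary_tables= 0;              // owned by this THD
    const TmpPoolLite *slave_pool= nullptr;   // slave-side pool, if any

    // Old check: only tables owned by this THD.
    bool has_thd_temporary_tables() const
    { return thd_temporary_tables != 0; }

    // New check: temporary tables present on either side.
    bool has_temporary_tables() const
    {
      return thd_temporary_tables != 0 ||
             (slave_pool && !slave_pool->empty());
    }
  };
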
--
Reviewed by Andrei Elkin.
Using parallel slave applying can cause a deadlock between DDL and
other events. A GTID with a lower seqno can be blocked in Galera when the node
has entered TOI mode, while the DDL GTID, which has a higher seqno, can be
blocked before previous GTIDs are applied locally.
The fix is to check for prior commits before entering TOI.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Handle BF abort for prepared statement execution so that EXECUTE processing will continue
until parameter setup is complete, before the BF abort bails out of the statement execution.
The THD class has a new boolean member, wsrep_delayed_BF_abort, which is set if a BF abort is observed
in do_command() right after reading the client's packet, and the client has sent a PS EXECUTE command.
In such a case, the deadlock error is not returned immediately to the client; instead, the PS execution
is started. However, the PS execution loop now checks whether wsrep_delayed_BF_abort is set, and
stops the PS execution after the type information has been assigned for the PS.
With this, the PS protocol type information, which is present in the first PS EXECUTE command, is not lost
even if the first PS EXECUTE command was marked to abort.
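A simplified sketch of the resulting control flow (wsrep_delayed_BF_abort is
the member named above; the rest is an illustrative stand-in):

  struct ThdSketch { bool wsrep_delayed_BF_abort= false; };

  enum PsStep { PS_READ_PARAMS, PS_ASSIGN_TYPES, PS_EXECUTE_BODY };

  // Returns false when the EXECUTE must stop here and report the deadlock.
  static bool ps_step_allowed(const ThdSketch &thd, PsStep step)
  {
    if (!thd.wsrep_delayed_BF_abort)
      return true;                  // no pending BF abort: run normally
    // A BF abort was seen right after reading the client packet: still run
    // the steps that consume the parameter/type information sent with the
    // first EXECUTE, then bail out before executing the statement itself.
    return step != PS_EXECUTE_BODY;
  }
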
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
extra2_read_len resolved by keeping the implementation
in sql/table.cc and exposing it for use by ha_partition.cc.
Remove the identical implementation in unireg.h.
(ref: bfed2c7d57a7ca34936d6ef0688af7357592dc40)
Problem:
Parse-time conversion from binary to tricky character sets like utf32
produced ill-formed strings. So, later a crash happened in debug builds,
or a wrong SHOW CREATE TABLE output was returned in release builds.
Fix:
1. Backporting a few methods from 10.3:
- THD::check_string_for_wellformedness()
- THD::convert_string() overloads
- THD::make_text_string_connection()
2. Adding a new method THD::reinterpret_string_from_binary(),
which makes sure to either return a well-formed string
(optionally prepended with zero bytes), or return an error.
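A hedged sketch of the idea for a UTF-32-like charset (illustrative only;
the real method goes through the server's character set framework):

  #include <cstddef>
  #include <cstdint>
  #include <string>

  // Pad the binary string on the left with zero bytes to a whole number of
  // 4-byte characters, then reject it if any character is not a valid
  // Unicode code point.
  static bool reinterpret_binary_as_utf32(const std::string &bin,
                                          std::string *out)
  {
    std::string s((4 - bin.size() % 4) % 4, '\0');   // left zero-padding
    s+= bin;
    for (std::size_t i= 0; i < s.size(); i+= 4)
    {
      uint32_t cp= ((uint32_t)(uint8_t) s[i]     << 24) |
                   ((uint32_t)(uint8_t) s[i + 1] << 16) |
                   ((uint32_t)(uint8_t) s[i + 2] <<  8) |
                    (uint32_t)(uint8_t) s[i + 3];
      if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;                                // ill-formed: error
    }
    *out= s;
    return true;                                     // well-formed result
  }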