In the BASE of this patch, when a slave parallel worker proceeds from
the wait-for-prior-commit stage into retrying, its backup-lock related
sub-state, specifically `THD::backup_commit_lock`, may not be reset,
leaving the pointer dangling.
That caused a segfault when the pointer was dereferenced during the
worker's retry.
The reason `THD::backup_commit_lock` was left dangling was an unexpected
THD state: a non-NULL `THD::backup_commit_lock` combined with a NULL
`mdl_backup.ticket`.
This combination turns out to be possible when the slave worker is killed
for retry *and* a few instructions later fails to (re-)acquire the Backup
MDL when exiting from `MYSQL_BIN_LOG::queue_for_group_commit()`.
The failure is not reported as a timeout from `MDL_context::acquire_lock`,
nor does it have to be, because the worker finds itself killed before it
even starts waiting.
The bug is fixed by amending the `backup_commit_lock` reset condition
at the end of `ha_commit_trans()`. The amended reset remains careful to
affect only the stack-allocated lock.
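A minimal, self-contained sketch (mock types, not the actual server code)
of the amended reset: the pointer is cleared whenever it still refers to the
stack-allocated request, even if no ticket was ever granted.

  struct MDL_ticket;                         // opaque in this sketch
  struct MDL_request { MDL_ticket *ticket= nullptr; };
  struct THD         { MDL_request *backup_commit_lock= nullptr; };

  static void commit_trans_sketch(THD *thd, bool need_backup_lock)
  {
    MDL_request mdl_backup;                  // stack-allocated, as in the server
    if (need_backup_lock)
      thd->backup_commit_lock= &mdl_backup;  // may stay ticket-less if the
                                             // worker was killed before waiting
    /* ... commit work ... */

    // Amended reset: do not require mdl_backup.ticket to be set; it is enough
    // that the THD still points at this stack frame's request, so only the
    // stack-allocated lock is affected.
    if (thd->backup_commit_lock == &mdl_backup)
      thd->backup_commit_lock= nullptr;      // never leave the pointer dangling
  }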
A test is added to confirm the fix by reproducing all the stages
described above. On the patch BASE it causes the segfault.
@@new_mode is a set of flags that control newly introduced features.
All flags are off by default. Setting a flag in @@new_mode enables a new,
different server behaviour and/or set of features.
We also introduce a new option, 'hidden_values', for some system variable
types to hide values that we do not wish to show as options.
- Don't print hidden values in mysqld --help output.
- Make get_options() use the same logic as check_new_mode_value() does.
- Setting @@new_mode=ALL shouldn't give warnings.
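A self-contained sketch (hypothetical flag names, not the actual sys_var
classes) of the 'hidden_values' idea: all values remain settable, but the
hidden ones are filtered out when the value list is printed for --help.

  #include <cstdio>
  #include <cstdint>
  #include <vector>
  #include <string>

  struct Set_variable_sketch
  {
    std::vector<std::string> names;          // all accepted flag names
    uint64_t hidden_values;                  // bitmask of names hidden in --help

    void print_help_values() const
    {
      bool first= true;
      for (size_t i= 0; i < names.size(); i++)
        if (!(hidden_values & (1ULL << i)))  // don't print hidden values
        {
          std::printf("%s%s", first ? "" : ",", names[i].c_str());
          first= false;
        }
      std::printf("\n");
    }
  };

  int main()
  {
    Set_variable_sketch new_mode{ {"FLAG_A", "FLAG_B", "INTERNAL_ONLY"},
                                  1ULL << 2 };   // hide "INTERNAL_ONLY"
    new_mode.print_help_values();                // prints: FLAG_A,FLAG_B
    return 0;
  }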
We had a protection against it, by allowing versioned delete if:
trx->id != table->vers_start_id()
For REPLACE this check fails: REPLACE calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.
The same problem hits Field::is_max, which may have checked the wrong record.
Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts a pointer to the
field data, rather than to the record.
Method val_int() would be too much effort to refactor to accept the argument,
so instead the value is fetched directly from the record, as is done in
Field_longlong.
See also MDEV-30046.
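A self-contained sketch (mock types; the real changes are in Field and
TABLE) of the refactored shape: the caller passes the record to read, so
REPLACE can check the row it actually deletes. is_max similarly gains an
optional pointer argument, but to the field data rather than to the record.

  #include <cstdint>
  #include <cstring>

  typedef unsigned char uchar;

  struct Table_sketch
  {
    uchar *record[3];           // record[0], record[1], record[2], as in TABLE
    size_t vers_start_offset;   // offset of the row_start field in a record

    // refactored: always takes a pointer to the record to read from
    uint64_t vers_start_id(const uchar *record_arg) const
    {
      uint64_t id;              // fetched directly from the record, the way
      std::memcpy(&id, record_arg + vers_start_offset, sizeof id);
      return id;                // Field_longlong does it
    }
  };

  // The versioned-delete protection for REPLACE then checks the record that
  // is really being deleted:
  //   if (trx_id != table->vers_start_id(table->record[2]))
  //     /* the row was not created by this transaction: delete is allowed */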
Idempotent write_row works the same as REPLACE: if there is a duplicate
record in the table, then it will be deleted and re-inserted, with the
same update optimization.
The code in Rows_log_event::write_row was basically copy-pasted from
write_record.
What's done:
The REPLACE operation was unified across replication and SQL. It is now
represented as a Write_record class that holds the whole state and allows
re-using some resources between the row writes.
The Replace, IODKU and single insert implementations are split across
different methods, resulting in much cleaner code.
The entry point is preserved as a single Write_record::write_record() call.
The implementation to call is chosen at construction time.
This allowed several optimizations to be done:
1. The table key list is not iterated for every row. We find the last unique
key in the order of checking once and preserve it across the rows. See
last_uniq_key().
2. ib_handler::referenced_by_foreign_key acquires a global lock. This call was
done per row as well. Now all the table config that allows an optimized
replace is folded into a single boolean field, can_optimize. All the fields
to check even fit into a single register on a 64-bit platform.
3. DUP_REPLACE and DUP_UPDATE cases now have one less level of indirection.
4. modified_non_trans_tables is checked and set only when it's really needed.
5. Obsolete bitmap manipulations are removed.
Also:
* Unify replace initialization step across implementations:
add prepare_for_replace and finalize_replace
* alloca is removed in favor of mem_root allocation. This memory is reused
across the rows.
* An rpl-related callback is added to the replace branch, meaning that an
extra check is made per replaced row even in the common case. It can be
avoided with templates if considered a problem.
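A rough, self-contained sketch (class and member names are illustrative) of
the Write_record shape described above: the state is built once, the
implementation is picked at construction, and write_record() stays the
single per-row entry point.

  struct TABLE;                                  // opaque in this sketch
  enum enum_duplicates { DUP_ERROR, DUP_REPLACE, DUP_UPDATE };

  class Write_record_sketch
  {
    TABLE *table;
    unsigned last_unique_key;            // found once, reused for every row
    bool can_optimize;                   // per-table checks folded into one flag
    int (Write_record_sketch::*impl)();  // picked in the constructor

    int single_insert()           { /* plain write path          */ return 0; }
    int replace_row()             { /* delete+reinsert or update */ return 0; }
    int on_duplicate_key_update() { /* IODKU path                */ return 0; }

  public:
    Write_record_sketch(TABLE *t, enum_duplicates dup, bool optimizable)
      : table(t), last_unique_key(0), can_optimize(optimizable),
        impl(dup == DUP_REPLACE ? &Write_record_sketch::replace_row :
             dup == DUP_UPDATE  ? &Write_record_sketch::on_duplicate_key_update
                                : &Write_record_sketch::single_insert)
    {}

    // Single entry point, called once per row by both the SQL layer
    // (INSERT/REPLACE) and the replication row-event code.
    int write_record() { return (this->*impl)(); }
  };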
- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.
Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param
Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
in case of a long unique conflict ha_write_row() used delete_row()
to remove the newly inserted row, and it used rnd_pos() to position
the cursor before deletion. This rnd_pos() was freeing and reallocating
blobs in record[0]. So when the code for FOR PORTION OF did
store_record(record[2]);
ha_write_row()
restore_record(record[2]);
it ended up with blob pointers to freed memory.
Let's use lookup_handler for deletion.
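A self-contained sketch (mock handler type, simplified signatures) of the
direction of the fix: the duplicate is deleted through the lookup_handler
clone, so the main handler never calls rnd_pos() and the blob pointers in
record[0] stay valid.

  typedef unsigned char uchar;

  struct Handler_sketch
  {
    // rnd_pos() re-reads a row into `buf` and may free/reallocate its blobs
    int ha_rnd_pos(uchar *buf, uchar *pos) { (void) buf; (void) pos; return 0; }
    int ha_delete_row(const uchar *buf)    { (void) buf; return 0; }
  };

  int remove_long_unique_duplicate(Handler_sketch *main_handler,
                                   Handler_sketch *lookup_handler,
                                   const uchar *inserted_row)
  {
    (void) main_handler;
    // before: main_handler->ha_rnd_pos() followed by ha_delete_row(),
    //         which clobbered the blobs saved around ha_write_row()
    // after:  delete through the clone; record[0] is left untouched
    return lookup_handler->ha_delete_row(inserted_row);
  }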
let's disallow UPDATE IGNORE in READ COMMITTED when the table
has a UNIQUE constraint that is USING HASH or is WITHOUT OVERLAPS.
This rarely-used combination should not block a release, it will be
fixed in MDEV-37233
mark creating an index as an SQLCOM_CREATE_INDEX operation.
currently the only effect of this is to tell InnoDB that it's not
a "CREATE TABLE" as such, so it is not affected by innodb_force_primary_key
followup for 9703c90712 (MDEV-37199 UNIQUE KEY USING HASH accepting duplicate records)
ha_partition can return HA_ERR_KEY_NOT_FOUND even in the middle
of the index scan
followup for 9703c90712 (MDEV-37199 UNIQUE KEY USING HASH accepting duplicate records)
maintain the invariant, that handler::ha_update_row()
is always invoked as handler::ha_update_row(record[0], record[1])
followup for 9703c90712 (MDEV-37199 UNIQUE KEY USING HASH accepting duplicate records)
when looking for long unique duplicates and the new row is already
inserted, we cannot simply "skip one conflict"; we must skip exactly
the new row and find a conflict which isn't the new row - otherwise
table->file->dup_ref can be set incorrectly and REPLACE won't work.
Server-level UNIQUE constraints (namely, WITHOUT OVERLAPS and USING HASH)
only worked with InnoDB in REPEATABLE READ isolation mode, when the
constraint was checked first and then the row was inserted or updated.
Gap locks prevented race conditions when a concurrent connection
could've also checked the constraint and inserted/updated a row
at the same time.
In READ COMMITTED there are no gap locks. To avoid race conditions,
we now check the constraint *after* the row operation. This is
enabled by the HA_CHECK_UNIQUE_AFTER_WRITE table flag that InnoDB
sets for READ COMMITTED transactions.
Checking the constraint after the row operation is more complex.
First, the constraint will see the current (inserted/updated) row,
and needs to skip it. Second, IGNORE operations become tricky,
as we need to revert the insert/update and continue statement execution.
write_row() (INSERT IGNORE) is reverted with delete_row(). Conveniently
it deletes the current row, that is, the last inserted row.
update_row(a,b) (UPDATE IGNORE) is reverted with a reversed update,
update_row(b,a). Conveniently, it updates the current row too.
Except in InnoDB when the PK is updated - in this case InnoDB internally
performs delete+insert, but does not move the cursor, so the "current"
row is the deleted one and the reverse update doesn't work.
This combination now throws an "unsupported" error and will
be fixed in MDEV-37233
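A self-contained sketch (mock types; HA_CHECK_UNIQUE_AFTER_WRITE is the real
flag name, everything else is illustrative) of the check-after-write and
IGNORE-revert logic described above.

  typedef unsigned char uchar;

  struct Handler_sketch
  {
    bool check_unique_after_write;      // HA_CHECK_UNIQUE_AFTER_WRITE, set by
                                        // InnoDB for READ COMMITTED
    int  ha_write_row(const uchar *)                 { return 0; }
    int  ha_delete_row(const uchar *)                { return 0; }
    int  ha_update_row(const uchar *, const uchar *) { return 0; }
    bool unique_constraint_violated(const uchar *)   { return false; }
  };

  // INSERT IGNORE: the new row is written first, checked after, and reverted
  // with a delete of the current (= last inserted) row.
  int write_row_ignore(Handler_sketch *h, const uchar *rec)
  {
    int err= h->ha_write_row(rec);
    if (!err && h->check_unique_after_write &&
        h->unique_constraint_violated(rec))   // the check must skip rec itself
      err= h->ha_delete_row(rec);             // revert, statement continues
    return err;
  }

  // UPDATE IGNORE: reverted with the reversed update update_row(new, old).
  // (Not possible in InnoDB when the PK changed: the internal delete+insert
  // leaves the cursor on the deleted row, hence the "unsupported" error.)
  int update_row_ignore(Handler_sketch *h, const uchar *old_rec,
                        const uchar *new_rec)
  {
    int err= h->ha_update_row(old_rec, new_rec);
    if (!err && h->check_unique_after_write &&
        h->unique_constraint_violated(new_rec))
      err= h->ha_update_row(new_rec, old_rec);  // reverse update
    return err;
  }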
Lots of different cases, SELECT, SELECT DEFAULT(),
UPDATE t SET x=DEFAULT, prepared statements,
opening of a table for the I_S, prelocking (so TL_WRITE),
insert with subquery (so SQLCOM_SELECT), etc.
Don't check NEXTVAL privileges in fix_fields() anymore, it cannot
possibly handle all the cases correctly. Make a special method
Item_func_nextval::check_access() for that and invoke it from
* fix_fields on explicit SELECT NEXTVAL()
(but not if NEXTVAL() is used in a DEFAULT clause)
* when the DEFAULT bareword is used in, say, UPDATE t SET x=DEFAULT
(but not if DEFAULT() itself is used in a DEFAULT clause)
* in CREATE TABLE
* in ALTER TABLE ALGORITHM=INPLACE (that doesn't go CREATE TABLE path)
* on INSERT
helpers
* Virtual_column_info::check_access() to walk the item tree and invoke
Item::check_access()
* TABLE::check_sequence_privileges() to iterate default expressions
and invoke Virtual_column_info::check_access()
also, single-table UPDATE in prepared statements now associates
value items with fields just as multi-update already did; this fixes the
case of PREPARE s "UPDATE t SET x=?"; EXECUTE s USING DEFAULT.
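A self-contained sketch (mock classes, simplified signatures) of how the
helpers fit together: TABLE::check_sequence_privileges() iterates the default
expressions, each Virtual_column_info::check_access() walks its item tree,
and Item_func_nextval::check_access() does the actual privilege check.

  #include <vector>

  struct THD;                                    // opaque in this sketch

  struct Item_sketch
  {
    std::vector<Item_sketch*> args;              // the item tree
    virtual bool check_access(THD *)             { return false; }
    virtual ~Item_sketch() = default;

    bool walk_check_access(THD *thd)             // Item::check_access() walk
    {
      if (check_access(thd))
        return true;                             // error already reported
      for (Item_sketch *a : args)
        if (a->walk_check_access(thd))
          return true;
      return false;
    }
  };

  struct Item_func_nextval_sketch : Item_sketch
  {
    bool check_access(THD *thd) override
    { (void) thd; /* check the sequence privileges here */ return false; }
  };

  struct Virtual_column_info_sketch
  {
    Item_sketch *expr_item;
    bool check_access(THD *thd) { return expr_item->walk_check_access(thd); }
  };

  struct Table_sketch
  {
    std::vector<Virtual_column_info_sketch*> default_exprs;
    bool check_sequence_privileges(THD *thd)
    {
      for (Virtual_column_info_sketch *vc : default_exprs)
        if (vc->check_access(thd))
          return true;
      return false;
    }
  };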
handler::clone() call did not work with read only tables like S3.
It gave a wrong error message (out of memory instead of a permission
error) and aborted the query.
The issue was that the clone call passed a wrong parameter to ha_open().
This is now fixed. I also changed the clone call to provide the correct
error message if things fail.
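A minimal, hypothetical sketch (mock types and simplified signatures, not
the real handler API) of the idea behind the fix: open the clone with the
same mode as the original table (read-only for S3) and propagate the real
ha_open() error instead of reporting out-of-memory.

  struct Table_sketch { int open_mode; };       // e.g. a read-only mode for S3

  struct Handler_sketch
  {
    Table_sketch *table;

    int ha_open(Table_sketch *t, const char *name, int mode, int test_if_locked)
    { (void) t; (void) name; (void) mode; (void) test_if_locked; return 0; }

    Handler_sketch *clone_sketch(const char *name, Handler_sketch *new_handler)
    {
      // pass the mode the original table was opened with, so read-only
      // engines do not fail; on failure, let the caller see ha_open()'s
      // error rather than a misleading out-of-memory one
      if (new_handler->ha_open(table, name, table->open_mode, 0))
        return nullptr;
      return new_handler;
    }
  };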
This patch fixes an 'out of memory' error when using the S3 engine
for queries that could use multiple indexes together to find the matching
rows, like the following:
SELECT * FROM t1 WHERE key1 = 99 OR key2 = 2
This commit fixes a bug where Aria tables are used in
(master->slave1->slave2) and a backup is taken on slave2. In this case
it is possible that the replication position in the backup, stored in
mysql.gtid_slave_pos, will be wrong. This will lead to replication
errors if one is trying to use the backup as a new slave.
Analysis:
Replicated row events are committed with trans_commit_stmt() and
thd->transaction->all.ha_list != 0.
This means that backup_commit_lock is not taken for Aria tables,
which means that rows can be committed and binary logged on the slave
during BLOCK_COMMIT, which should not happen.
This issue does not occur on the master as thd->transaction->all.ha_list
is == 0 under AUTO_COMMIT, which sets 'is_real_trans' and 'rw_trans',
which in turn causes backup_commit_lock to be taken.
Fixed by checking in ha_check_and_coalesce_trx_read_only() whether all
handlers support rollback and, if not, waiting for BLOCK_COMMIT also on
statement commit.
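A self-contained sketch (mock types) of the added rule in
ha_check_and_coalesce_trx_read_only(): if any participating engine cannot
roll back, the statement commit must also wait for the backup BLOCK_COMMIT
lock.

  struct Handlerton_sketch { bool can_rollback; };

  struct Ha_info_sketch
  {
    Handlerton_sketch *ht;
    Ha_info_sketch    *next;
  };

  // Returns true when a statement commit must take the BLOCK_COMMIT backup
  // lock even though it is not the commit of a real (all) transaction:
  // some engine in the list (e.g. Aria) cannot roll back, so its changes
  // become durable and binlogged right here.
  static bool need_backup_lock_for_stmt_commit(Ha_info_sketch *ha_list)
  {
    for (Ha_info_sketch *hi= ha_list; hi; hi= hi->next)
      if (!hi->ht->can_rollback)
        return true;
    return false;
  }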
To check the rows, the table needs to be opened. To that end, and like
MDEV-36038, we force COPY algorithm on ALTER TABLE ... SEQUENCE=1.
This also results in checking the sequence state / metadata.
The table structure was already validated before this patch.
7544fd4cae had to make use of a static array to avoid a memory
use-after-free or a leak.
Instead, let us make a function returning String; this is the only way
to automatically manage the memory after the function has returned.
To make it all correct, a move constructor is added. Normally it is
expected that the constructor will be elided upon return of an object
by value, but if something goes differently, or -fno-elide-constructors is
used, we could have a problem. So this is why the move constructor avoids
copy-elision-related UB.
dbug_print_row returning char* is still there for convenient use in a
debugger.
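A self-contained sketch (a stand-in class, not the server's String) of why
the move constructor matters when an object is returned by value.

  #include <cstring>
  #include <cstdlib>

  class String_sketch
  {
    char  *ptr;
    size_t len;
  public:
    String_sketch() : ptr(nullptr), len(0) {}
    explicit String_sketch(const char *s) : len(std::strlen(s))
    {
      ptr= static_cast<char*>(std::malloc(len + 1));
      std::memcpy(ptr, s, len + 1);
    }
    // Move constructor: if copy elision does not happen (e.g. with
    // -fno-elide-constructors), ownership is transferred instead of the
    // buffer being copied or, worse, freed twice.
    String_sketch(String_sketch &&other) noexcept
      : ptr(other.ptr), len(other.len)
    { other.ptr= nullptr; other.len= 0; }
    String_sketch(const String_sketch &)= delete;
    ~String_sketch() { std::free(ptr); }
    const char *c_ptr() const { return ptr; }
  };

  // Returning by value: the memory is released automatically when the
  // caller's object goes out of scope; no static buffer is needed.
  String_sketch dbug_print_row_sketch(const char *row_as_text)
  {
    return String_sketch(row_as_text);
  }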
Currently, execution of commit by one phase proceeds to commit by the
engines even when binlog_commit() does not succeed.
There are two issues with that:
1. the absence of binlog_rollback(), or of the lower-level
`binlog_cache_data::reset()`, along the subsequent execution after the
failing statement eventually raises an assert on a non-empty binlog
cache, see the MDEV description:
# --error assert(sql/log.cc:1712(binlog_close_connection))
# --disconnect default
2. engines, including rollback-capable ones, commit in this
particular error situation.
Both effects can be observed with a new mtr test that would fail when run on
a BASE of this commit.
The BASE has to include the MDEV-35207 et al. fixes because the test is written
with CREATE-TABLE-SELECTs.
A new test file verifies the new rollback behaviour, including
cases with a side effect on a modified non-transactional engine, which
exposes another issue, MDEV-36027 (TODO: fix).