Timestamp-versioned row deletion was exposed to a collision problem: if the
current timestamp has not advanced, a sequence of row delete+insert could fail
with a duplicate key error: the row delete would find a conflicting history row
and return an error.
This is true for both REPLACE and DELETE statements; however, in REPLACE the
"optimized" path is usually taken, especially in the tests. There, delete+insert
is replaced with a single versioned row update. In the end, both paths end up
as ha_update_row + ha_write_row.
The solution is to handle a history collision somehow.
From the design perspective, the user shouldn't experience loss of history rows,
unless there's a technical limitation.
To the contrary, trxid-based changes should never generate history for the same
transaction, see MDEV-15427.
If two operations on the same row happen so quickly that they get the same
timestamp, the history row shouldn't be lost. We can still write a history row,
though it will have row_start == row_end.
We cannot store more than one such historical row, as this would violate the
unique constraint on row_end. So we have to physically delete the row if such
a history row is already present.
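As an illustration, a sequence along these lines (hypothetical table) can
produce two history rows competing for the same row_end when both statements
land on the same timestamp:

CREATE TABLE t1 (id INT PRIMARY KEY, a INT) WITH SYSTEM VERSIONING;
INSERT INTO t1 VALUES (1, 10);
REPLACE INTO t1 VALUES (1, 20);
REPLACE INTO t1 VALUES (1, 30); -- if this lands on the same timestamp as the
                                -- previous REPLACE, the generated history rows
                                -- collide on row_end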
In this commit:
1. Improve TABLE::delete_row to handle the history collision: if the update
results in a duplicate key error, delete the row for real.
2. Use TABLE::delete_row in the non-optimized path of REPLACE, where the
system-versioned case now belongs entirely.
We had a protection against it, by allowing a versioned delete only if:
trx->id != table->vers_start_id()
For REPLACE this check fails: REPLACE calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant.
The same problem hits Field::is_max, which may have checked the wrong record.
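For illustration only, a hypothetical trxid-versioned table where REPLACE runs
in the same transaction as the preceding INSERT is the kind of case this check
has to recognize:

CREATE TABLE t1 (
  id INT PRIMARY KEY, a INT,
  row_start BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
  row_end   BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
  PERIOD FOR SYSTEM_TIME (row_start, row_end)
) WITH SYSTEM VERSIONING ENGINE=InnoDB;
START TRANSACTION;
INSERT INTO t1 (id, a) VALUES (1, 10);
REPLACE INTO t1 (id, a) VALUES (1, 20); -- same transaction: no history row should
                                        -- be generated, so the check must look at
                                        -- the row actually being deleted (record[2])
COMMIT;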
Fix:
* Refactor Field::is_max to optionally accept a pointer as an argument.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
record. The difference from is_max is that is_max accepts a pointer to the
field data, rather than to the record.
Method val_int() would take too much effort to refactor to accept the argument,
so instead the value is fetched from the record directly, as is done in
Field_longlong.
After 633417308f (MDEV-37312) lookup_handler is locked with F_WRLCK,
because it may be used for deleting rows.
But lookup_handler is locked with F_WRLCK after prune_partitions(),
while the main handler is locked before it and may expect all
partitions to be in the read list, non-pruned.
Let's prepare the lookup handler before prune_partitions().
- Removed duplicate words, like "the the" and "to to"
- Removed duplicate lines (one double sort line found in mysql.cc)
- Fixed some typos found while searching for duplicate words.
Command used to find duplicate words:
egrep -rI "\s([a-zA-Z]+)\s+\1\s" | grep -v param
Thanks to Artjoms Rimdjonoks for the command and pointing out the
spelling errors.
Multi-delete code now invokes ha_rnd_end after applying deferred deletes,
following the same pattern as multi-update.
The CSV engine batches deletes made via calls to delete_row, then applies them
all at once during ha_rnd_end. For each batched row, the CSV engine decrements
an internal counter of the total rows in the table. The multi-delete code was
not calling ha_rnd_end, so this internal counter was not consistent with the
total number of rows in the table, triggering the assertion.
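A hypothetical reproduction along these lines (table and column names assumed)
exercises that path:

CREATE TABLE c1 (a INT NOT NULL) ENGINE=CSV;
CREATE TABLE c2 (a INT NOT NULL) ENGINE=CSV;
INSERT INTO c1 VALUES (1),(2);
INSERT INTO c2 VALUES (1),(2);
DELETE c1, c2 FROM c1, c2 WHERE c1.a = c2.a; -- the deferred deletes are applied
                                             -- when ha_rnd_end runs on the CSV tables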
In the CSV engine, explicitly delete the destination file before renaming the
source file over it. This avoids a file rename problem on Windows. This
change would have been necessary before the latest multi-delete changes, had
this test case existed at that point in time, because the end_read_record call
in do_table_deletes swallowed the rename failure on Windows.
replication problems
DELETE HISTORY did not process a parameterized PS properly, as the
history expression was checked at the prepare stage, when the parameters
were not yet substituted. In that case check_units() succeeded, as there
is no invalid type: Item_param has type_handler_null, which is
inherited from the string type and is a valid type for a history
expression. The warning was thrown when the expression was evaluated
for comparison on delete execution (when the parameter was already
substituted).
The fix postpones check_units() until the first PS execution. We have
to postpone the WHERE condition processing until the first execution and
update select_lex.where on every execution, as it is reset to the
post-prepare state.
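A hypothetical case along these lines illustrates the problem:

CREATE TABLE t1 (a INT) WITH SYSTEM VERSIONING;
PREPARE stmt FROM 'DELETE HISTORY FROM t1 BEFORE SYSTEM_TIME ?';
SET @t = NOW();
EXECUTE stmt USING @t; -- the parameter is only known at execution time, so the
                       -- history expression can only be checked here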
Avoid ASAN failure by collecting statistics from Result objects
before cleaning them up. In related single-table cases, statistics
are maintained directly by the single-table update and delete
functions.
* Let GCC `-Wformat` check formats sent to these `my_vsnprintf_ex` users
* Migrate them from the old extension specifiers
to the new `-Wformat`-compatible suffixes
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
(checks are done in a different order)
Reintroduces the delete_while_scanning optimization for multi_delete.
Reverses some test changes from the initial feature development now
that we delete on the fly once again.
We now allow multitable queries with order by and limit, such as:
delete t1.*, t2.* from t1, t2 order by t1.id desc limit 3;
To predict what rows will be deleted, run the equivalent select:
select t1.*, t2.* from t1, t2 order by t1.id desc limit 3;
Additionally, index hints are now supported with single table delete statements:
delete from t2 use index(xid) order by (id) limit 2;
This approach changes the multi_delete SELECT result interceptor to use a temporary
table to collect row ids pertaining to the rows that will be deleted, rather than
directly deleting rows from the target table(s). Row ids are collected during
send_data, then read during send_eof to delete target rows. In the event that the
temporary table created in memory is not big enough for all matching rows, it is
converted to an Aria table.
Other changes:
- Deleting from a sequence now affects zero rows instead of emitting an error (see the example after this list)
Limitations:
- The federated connector does not create implicit row ids, so we have to use a
key when conditionally deleting. See the change in federated_maybe_16324629.test
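A sketch of the sequence behavior change (hypothetical sequence name):

CREATE SEQUENCE s1;
DELETE FROM s1; -- now affects zero rows instead of returning an error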
Implementation of this task adds the ability to raise a signal with
SQLSTATE '02TRG' from a BEFORE INSERT/UPDATE/DELETE trigger and handles
this signal as an indicator meaning 'throw away the current row'
while processing the INSERT/UPDATE/DELETE statement. The signal with
SQLSTATE '02TRG' has special meaning only in case it is raised inside
BEFORE triggers; for AFTER triggers this value of SQLSTATE isn't treated
in any special way. In accordance with the SQL standard, the SQLSTATE class '02'
means NO DATA, and sql_errno for this class is set to the value
ER_SIGNAL_NOT_FOUND by the current implementation of MariaDB server.
Implementation of this task assigns the value ER_SIGNAL_SKIP_ROW_FROM_TRIGGER
to sql_errno in Diagnostics_area in case the signal is raised from a trigger
and SQLSTATE has the value '02TRG'.
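A minimal usage sketch (hypothetical table and trigger names):

CREATE TABLE t1 (a INT);
DELIMITER //
CREATE TRIGGER t1_bi BEFORE INSERT ON t1 FOR EACH ROW
BEGIN
  IF NEW.a < 0 THEN
    SIGNAL SQLSTATE '02TRG'; -- ask the server to skip this row
  END IF;
END//
DELIMITER ;
INSERT INTO t1 VALUES (-1), (1); -- only the row (1) is inserted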
To catch a signal with SQLSTATE '02TRG' and handle it in a special way, the methods
Table_triggers_list::process_triggers
select_insert::store_values
select_create::store_values
Rows_log_event::process_triggers
and the overloaded function
fill_record_n_invoke_before_triggers
were extended with an extra out parameter that returns a flag telling whether
to skip the current values being processed by the INSERT/UPDATE/DELETE
statement. This extra parameter is passed as nullptr in case of an AFTER trigger;
for a BEFORE trigger it points to a variable storing a marker of whether
to skip the current record or store it by calling write_record().
(Review input addressed)
After this patch, the optimizer can handle virtual column expressions
in WHERE/ON clauses. If the table has an indexed virtual column:
ALTER TABLE t1
ADD COLUMN vcol INT AS (col1+1),
ADD INDEX idx1(vcol);
and the query uses the exact virtual column expression:
SELECT * FROM t1 WHERE col1+1 <= 100
then the optimizer will be able to use index idx1 for it.
This is achieved by walking the WHERE/ON clauses and replacing instances
of virtual column expression (like "col1+1" above) with virtual column's
Item_field (like "vcol"). The latter can be processed by the optimizer.
Replacement is considered (and done) only in items that are potentially
usable by the range optimizer.
It's incorrect to zero out table->triggers->extra_null_bitmap
before a statement, because if an insert uses an explicit field list
and omits a field that has no default value, the field should
get NULL implicitly. So extra_null_bitmap should have 1s for all
fields that have no defaults.
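A sketch of the scenario (hypothetical table and trigger):

CREATE TABLE t1 (a INT, b INT NOT NULL);
CREATE TRIGGER t1_bi BEFORE INSERT ON t1 FOR EACH ROW
  SET NEW.b = COALESCE(NEW.b, 0); -- NEW.b must read as NULL here, since b was
                                  -- omitted from the field list and has no default
INSERT INTO t1 (a) VALUES (1);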
* create extra_null_bitmap_init and initialize it as above
* copy extra_null_bitmap_init to extra_null_bitmap for inserts
* still zero out extra_null_bitmap for updates/deletes where
all fields definitely have a value
* make not_null_fields_have_null_values() send
ER_NO_DEFAULT_FOR_FIELD for fields with no default and no value,
otherwise creation of a trigger with an empty body would change the
error message
The problems were that:
1) resources were freed asymmetrically: in normal execution in send_eof,
in case of error in the destructor.
2) the destructor was not called for result objects in case of an SP
(so if the last SP execution ended with an error, resources were not
freed on reinit before execution (cleanup() is called before the next
execution) and the destructor was also not called due to the lack of a
delete call for the object).
Result cleanup() was renamed to reset_for_next_ps_execution() to better
reflect its function.
All result methods were revised and the freeing of resources was made symmetric.
The destructor of the result object is now called for SPs.
Added the invalidation that was previously skipped in case of an error in insert.
Removed the misleading naming of reset(thd) (which could be confused
with reset()).
This bug has the same nature as the issues
MDEV-34718: Trigger doesn't work correctly with bulk update
MDEV-24411: Trigger doesn't work correctly with bulk insert
To fix the issue covering all use cases, resetting thd->bulk_param
temporarily to nullptr before invoking triggers and restoring
its original value on finishing execution of the trigger is moved to the method
Table_triggers_list::process_triggers
which is ultimately invoked for any kind of trigger.
Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces, and were used interchangeably
and unsafely cast to one another.
This patch consolidates all of these functions into the
qsort_cmp2 interface.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
Create templates
thd->alloc<X>(n) to use instead of (X*)thd->alloc(sizeof(X)*n)
and the same for thd->calloc(). By default the type is char,
so the old usage thd->alloc(size) works too.
This partially reverts 43623f04a9.
Engines have to set ::position() after ::write_row(), otherwise
the server won't be able to refer to the row just inserted.
This is important for high-level indexes.
The heap part isn't reverted, so heap doesn't support high-level indexes.
To fix this, it'll need info->lastpos in addition to info->current_ptr.