Problem:
Queries like this showed performance degratation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and make it even better in some cases.
See benchmark results in MDEV.
Also, this change makes futher separations of Date, Time, Datetime
from each other, which will make it possible not to derive them from
a too heavy (40 bytes) MYSQL_TIME, and replace them to smaller data
type specific containers.
* Allocate items on thd->mem_root while refixing vcol exprs
* Make vcol tree changes register and roll them back after the statement is executed.
Explanation:
Due to collation implementation specifics an Item tree could change while fixing.
The tricky thing here is to make it on a proper arena.
It's usually not a problem when a field is deterministic, however, makes a pain vice-versa, during allocation allocating.
A non-deterministic field should be refixed on each statement, since it depends on the environment state.
Changing the tree will be temporary and therefore it should be reverted after the statement execution.
Fix stale virtual field value in 4 cases: when virtual field depends
on row_start/row_end in timestamp/trx_id versioned table. row_start
dep is recalculated in vers_update_fields() (SQL and InnoDB
layer). row_end dep is recalculated on history row insert.
In AddressSanitizer, we only want memory poisoning to happen
in connection with custom memory allocation or freeing.
The primary use of MEM_UNDEFINED is for declaring memory uninitialized
in Valgrind or MemorySanitizer. We do not want MEM_UNDEFINED to
have the unwanted side effect that AddressSanitizer would no longer
be able to complain about accessing unallocated memory.
MEM_UNDEFINED(): Define as no-op for AddressSanitizer.
MEM_MAKE_ADDRESSABLE(): Define as MEM_UNDEFINED() or
ASAN_UNPOISON_MEMORY_REGION().
MEM_CHECK_ADDRESSABLE(): Wrap also __asan_region_is_poisoned().
- Some of the bug fixes are backports from 10.5!
- The fix in innobase/fil/fil0fil.cc is just a backport to get less
error messages in mysqld.1.err when running with valgrind.
- Renamed HAVE_valgrind_or_MSAN to HAVE_valgrind
- Removed not needed bzero in void TABLE::initialize_quick_structures().
- Replaced bzero with TRASH_ALLOC() to have this change verfied with
memory checkers
- Added missing table->quick_keys.is_set in table_cond_selectivity()
Make sure to initialize members of TABLE::reginfo when TABLE::init is called. In this case the problem
was that table->reginfo.join_tab was set for the SELECT query and then was reused by the UPDATE query.
This case occurred only when the SELECT query had a degenerate join.
Problem:- Calling mark_columns_per_binlog_row_image() earlier may change the
result of mark_virtual_columns_for_write() , Since it can set the bitmap on
for virtual column, and henceforth mark_virtual_column_deps(field) will
never be called in mark_virtual_column_with_deps.
This bug is not specific for long unique, It also fails for this case
create table t2(id int primary key, a blob, b varchar(20) as (LEFT(a,2)));
Previously multiple threads were allowed to load histograms concurrently.
There were no known problems caused by this. But given amount of data
races in this code, it'd happen sooner or later.
To avoid scalability bottleneck, histograms loading is protected by
per-TABLE_SHARE atomic variable.
Whenever histograms were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.
Whenever histograms have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If histograms are being loaded
concurrently, statement waits until load is completed.
- Table_statistics::total_hist_size moved to TABLE_STATISTICS_CB: only
meaningful within TABLE_SHARE (not used for collected stats).
- TABLE_STATISTICS_CB::histograms_can_be_read and
TABLE_STATISTICS_CB::histograms_are_read are replaced with a tri state
atomic variable.
- Simplified away alloc_histograms_for_table_share().
Note: there's still likely a data race if a thread attempts accessing
histograms data after it failed to load it (because of concurrent load).
It was there previously and goes out of the scope of this effort. One way
of fixing it could be reviving TABLE::histograms_are_read and adding
appropriate checks whenever it is needed.
Part of MDEV-19061 - table_share used for reading statistical tables is
not protected
Previously multiple threads were allowed to load statistics concurrently.
There were no known problems caused by this. But given amount of data
races in this code, it'd happen sooner or later.
To avoid scalability bottleneck, statistics loading is protected by
per-TABLE_SHARE atomic variable.
Whenever statistics were loaded by preceding statement (hot-path), a
scalable load-acquire check is performed.
Whenever statistics have to be loaded anew, mutual exclusion for loaders
is established by atomic variable. If statistics are being loaded
concurrently, statement waits until load is completed.
TABLE_STATISTICS_CB::stats_can_be_read and
TABLE_STATISTICS_CB::stats_is_read are replaced with a tri state atomic
variable.
Part of MDEV-19061 - table_share used for reading statistical tables is
not protected
Respect system fields in NO_ZERO_DATE mode.
This is the subject for refactoring in MDEV-19597
Conflict resolution from 7d5223310789f967106d86ce193ef31b315ecff0
update_virtual_field() is called as part of index rebuild in
ha_myisam::repair() (MDEV-5800) which is done on bulk INSERT finish.
Assertion in update_virtual_field() was put as part of MDEV-16222
because update_virtual_field() returns in_use->is_error(). The idea:
wrongly mixed semantics of error status before update_virtual_field()
and the status returned by update_virtual_field(). The former can
falsely influence the latter.
MDEV-21398 Deadlock (server hang) or assertion failure in
Diagnostics_area::set_error_status upon ALTER under lock
This failure could only happen if one locked the same table
multiple times and then did an ALTER TABLE on the table.
Major change is to change all instances of
table->m_needs_reopen= true;
to
table->mark_table_for_reopen();
The main fix for the problem was to ensure that we mark all
instances of the table in the locked_table_list and when we
reopen the tables, we first close all tables before reopening
and locking them.
Other things:
- Don't call thd->locked_tables_list.reopen_tables if there
are no tables marked for reopen. (performance)
The code incorrectly assumed in multiple places that TYPELIB
values cannot have 0x00 bytes inside. In fact they can:
CREATE TABLE t1 (a ENUM(0x61, 0x0062) CHARACTER SET BINARY);
Note, the TYPELIB value encoding used in FRM is ambiguous about 0x00.
So this fix is partial.
It fixes 0x00 bytes in many (but not all) places:
- In the middle or in the end of a value:
CREATE TABLE t1 (a ENUM(0x6100) ...);
CREATE TABLE t1 (a ENUM(0x610062) ...);
- In the beginning of the first value:
CREATE TABLE t1 (a ENUM(0x0061));
CREATE TABLE t1 (a ENUM(0x0061), b ENUM('b'));
- In the beginning of the second (and following) value of the *last* ENUM/SET
in the table:
CREATE TABLE t1 (a ENUM('a',0x0061));
CREATE TABLE t1 (a ENUM('a'), b ENUM('b',0x0061));
However, it does not fix 0x00 when:
- 0x00 byte is in the beginning of a value of a non-last ENUM/SET
causes an error:
CREATE TABLE t1 (a ENUM('a',0x0061), b ENUM('b'));
ERROR 1033 (HY000): Incorrect information in file: './test/t1.frm'
This is an ambuguous case and will be fixed separately.
We need a new TYPELIB encoding to fix this.
Details:
- unireg.cc
The function pack_header() incorrectly used strlen() to detect
a TYPELIB value length. Adding a new function typelib_values_packed_length()
which uses TYPELIB::type_lengths[n] to detect the n-th value length,
and reusing the new function in pack_header() and packed_fields_length()
- table.cc
fix_type_pointers() assumed in multiple places that values cannot have
0x00 inside and used strlen(TYPELIB::type_names[n]) to set
the corresponding TYPELIB::type_lengths[n].
Also, fix_type_pointers() did not check the encoded data for consistency.
Rewriting fix_type_pointers() code to populate TYPELIB::type_names[n] and
TYPELIB::type_lengths[n] at the same time, so no additional loop
with strlen() is needed any more.
Adding many data consistency tests.
Fixing the main loop in fix_type_pointers() to use memchr() instead of
strchr() to handle 0x00 properly.
Fixing create_key_infos() to return the result in a LEX_STRING rather
that in a char*.
compare_keys_but_name(): do not use KEY_PART_INFO::field for
Field::is_equal(). Following the logic of that code we need to
compare fields of a table. But KEY_PART_INFO::field sometimes
(when key part is shorter than table field) is a different field.
In that case Field::is_equal() returns incorrect result and
problems occur.
KEY_PART_INFO::field may become some strange field in
open_frm_error open_table_from_share(). I think this is an
incorrect logic, some tecnhical debt. I'm not fixing it right now,
because I don't have time. But I'm making Field::field_length
a const class member. Then, the only fishy code which changed that
field requires now a const_cast<>. I'm bringing attention to that
code with it. This change should not affect logic of the
program in any way.
"write set" for replication finally got its correct place
(mark_columns_per_binlog_row_image()). When done generally in
mark_columns_needed_for_update() it affects optimization
algorithm. used_key_is_modified, query_plan.using_io_buffer are
wrongly set and that leads to wrong prepare_for_keyread() which limits
read_set.
Turn read cache off for update and multi-update for versioned
table. no_cache is reinited on each TABLE open because it is
applicable for specific algorithms.
As a side fix vers_insert_history_row() honors vers_write setting.
Aria with row_format=fixed uses IO_CACHE of type READ_CACHE for
sequential read in update loop. When history row is inserted inside
this loop the cache misses it and fails with error.
TODO:
Currently maria_extra() does not support SEQ_READ_APPEND. Probably it
might be possible to use this type of cache.
MDEV-18957 UPDATE with LIMIT clause is wrong for versioned partitioned tables
UPDATE, DELETE: replace linear search of current/historical records
with vers_setup_conds().
Additional DML cases in view.test
The issue here is the wrong estimate of the cardinality of a partial join,
the cardinality is too high because the function table_cond_selectivity()
returns an absurd number 100 while selectivity cannot be greater than 1.
When accessing table t by outer reference t1.a via index we do not perform any
range analysis for t. Yet we see TABLE::quick_key_parts[key] and
TABLE->quick_rows[key] contain a non-zero value though these should have been
remained untouched and equal to 0.
Thus real cause of the problem is that TABLE::init does not clean the arrays
TABLE::quick_key_parts[] and TABLE::>quick_rows[].
It should have done it because the TABLE structure created for any
instance of a table can be reused for many queries.
- Any temporary tables created under read-only mode will never be logged
to binary log. Any usage of these tables to update normal tables, even
after read-only has been disabled, will use row base logging (as the
temporary table will not be on the slave).
- Analyze, check and repair table will not be logged in read-only mode.
Other things:
- Removed not used varaibles in
MYSQL_BIN_LOG::flush_and_set_pending_rows_event.
- Set table_share->table_creation_was_logged for all normal tables.
- THD::binlog_query() now returns -1 if statement was not logged., This
is used to update table_share->table_creation_was_logged.
- Don't log admin statements in opt_readonly is set.
- Table's that doesn't have table_creation_was_logged will set binlog format to row
logging.
- Removed not needed/wrong setting of table->s->table_creation_was_logged
in create_table_from_items()