MDEV-20589: Server still crashes in Field::set_warning_truncated_wrong_value
- Use dbug_tmp_use_all_columns() to mark that all fields can be used
- Remove field->is_stat_field (not needed)
- Remove extra arguments to Field::clone() that should not be there
- Safety fix for Field::set_warning_truncated_wrong_value() to not crash
if table is zero in production builds (We have got crashes several times
here so better to be safe than sorry).
- Threat wrong character string warnings identical to other field
conversion warnings. This removes some warnings we before got from
internal conversion errors. There is no good reason why a user would
get an error in case of 'key_field='wrong-utf8-string' but not for
'field=wrong-utf8-string'. The old code could also easily give
thousands of no-sence warnings for one single statement.
The flag is_stat_field is not set for the min_value and max_value of field items
inside table share. This is a must requirement as we don't want to throw
warnings of truncation when we read values from the statistics table to the column
statistics of table share fields.
The MDEV-20265 commit e746f451d57def4be679caafc29976741b3e89f7
introduces DBUG_ASSERT(right_op == r_tbl) in
st_select_lex::add_cross_joined_table(), and that assertion would
fail in several tests that exercise joins. That commit was skipped
in this merge, and a separate fix of MDEV-20265 will be necessary in 10.4.
For MDEV-15955, the fix in create_tmp_field_from_item() would cause a
compilation error. After a discussion with Alexander Barkov, the fix
was omitted and only the test case was kept.
In 10.3 and later, MDEV-15955 is fixed properly by overriding
create_tmp_field() in Item_func_user_var.
A histogram size that is odd in size with DOUBLE precision will leave the last
byte unwritten. When collecting histograms, this causes the last byte to
be uninitialized in the record. memset the buffer to 0 first to make
sure this does not happen.
if columns or indexes are modified/renamed/dropped in an ALTER TABLE,
stat tables must be updated accordingly (e.g. all statistics for a column
should be dropped). But if a stat table doesn't exist, it's not a reason
to fail the whole ALTER TABLE operation - such an error should be ignored.
To read histograms for a table, we should check if the allocation of statistics was done or not,
if not done we should not try to read histograms for such a table.
The command SHOW INDEXES ignored setting of the system variable
use_stat_tables to the value of 'preferably' and and showed statistical
data received from the engine. Similarly queries over the table
STATISTICS from INFORMATION_SCHEMA ignored this setting. It happened
because the function fill_schema_table_by_open() did not read any data
from statistical tables.
For single table updates and multi-table updates , engine independent statistics were not being
read even if the statistics were collected.
Fixed it, so when the optimizer_use_condition_selectivity > 2 then we would read the available
statistics for update queries.
To fix the crash there we need to make sure that the
server while storing the statistical values in statistical tables should do it
in a multi-byte safe way.
Also there is no need to throw warnings if there is truncation while storing
values from statistical fields.
and, again, *don't use thd->clear_error()*
this fixed main.sp_notembedded failure on various amd64 platforms
(where ER_STACK_OVERRUN_NEED_MORE happens to fire in open_stat_tables()
under Dummy_error_handler)
The variable controls the amount of sampling analyze table performs.
If ANALYZE table with histogram collection is too slow, one can reduce the
time taken by setting analyze_sample_percentage to a lower value of the
total number of rows.
Setting it to 0 will use a formula to compute how many rows to sample:
The number of rows collected is capped to a minimum of 50000 and
increases logarithmically with a coffecient of 4096. The coffecient is
chosen so that we expect an error of less than 3% in our estimations
according to the paper:
"Random Sampling for Histogram Construction: How much is enough?”
– Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.
The drawback of sampling is that avg_frequency number is computed
imprecisely and will yeild a smaller number than the real one.
The add method does not need to provide the row order number. It was
only used to detect if the minimum/maximum value was populated once or not, so
as to force an update for the first encounter of a value.
Also, apply the MDEV-17957 changes to encrypted page checksums,
and remove error message output from the checksum function,
because these messages would be useless noise when mariabackup
is retrying reads of corrupted-looking pages, and not that
useful during normal server operation either.
The error messages in fil_space_verify_crypt_checksum()
should be refactored separately.
- Call delete_statistics_tables() after lock_table_names in drop tables.
This avoids a deadlock issue with FTWRL and future backup locks.
- Added some missing clear_error()
- Ensure we don't clear error caused by the caller
- Updated function comments
Added to new values to the server variable use_stat_tables.
The values are COMPLEMENTARY_FOR_QUERIES and PREFERABLY_FOR_QUERIES.
Both these values don't allow to collect EITS for queries like
analyze table t1;
To collect EITS we would need to use the syntax with persistent like
analyze table t1 persistent for columns (col1,col2...) index (idx1, idx2...) / ALL
Changing the default value from NEVER to PREFERABLY_FOR_QUERIES.
The problem here is EITS statistics does not calculate statistics for the partitions of the table.
So a temporary solution would be to not read EITS statistics for partitioned tables.
Also disabling reading of EITS for columns that participate in the partition list of a table.