This commit adds a new feature to the server: comments at the database
level. The maximum allowed comment length is 1024 bytes. If the comment
exceeds this limit, the new code 4144 is raised as an error or a warning,
depending on whether thd->is_strict_mode() is true or false. The database
comment is also stored in the db.opt file and exposed in the
information_schema.schemata table.
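A minimal sketch of the new syntax (the database name and comment text are
made up for illustration; the new information_schema.schemata column is
assumed to be named schema_comment):
CREATE DATABASE db1 COMMENT 'Accounting data';
-- A comment longer than 1024 bytes raises code 4144 as an error
-- (strict mode) or a warning (non-strict mode).
SELECT schema_name, schema_comment
  FROM information_schema.schemata
 WHERE schema_name = 'db1';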
To push an index condition for each join tab, we have to calculate the index
condition that can be pushed and then remove that index condition from the
original condition. This is done by the function make_cond_remainder.
The problem is that make_cond_remainder does not remove the index condition
when it contains an OR operator: the code that was supposed to remove the
pushed index condition from the WHERE clause was not executed when the
condition had an OR operator, whereas with AND the pushed index condition
was removed from the WHERE clause as expected. Fixed this by making
make_cond_remainder handle the OR operator.
Also updated the results of multiple test files which had been incorrectly
updated by commit e0c1b3f242.
This problem affects all versions starting from 5.5, but since this is a
performance improvement, it is fixed only in 10.4.
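A hypothetical query of the affected shape (table and column names are made
up); with index condition pushdown, the indexed part of the WHERE clause,
including the OR, is pushed to the engine and should no longer be
re-evaluated as an attached server-level condition:
CREATE TABLE t1 (a INT, b INT, c INT, KEY k1 (a, b)) ENGINE=InnoDB;
-- "a = 5 AND (b = 1 OR b = 2)" can be pushed as the index condition;
-- the remaining server-level condition should only be "c = 3".
EXPLAIN
SELECT * FROM t1 WHERE a = 5 AND (b = 1 OR b = 2) AND c = 3;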
Parentheses around table names and derived tables should be allowed
in FROM clauses and some other contexts, as they were in earlier versions.
Returned the test queries that used such parentheses in 10.3 to their
original form. Adjusted test results accordingly.
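Examples of the kind of syntax that should be accepted again (the table
names are made up):
SELECT * FROM (t1);
SELECT * FROM (t1 JOIN t2 ON t1.a = t2.a);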
Windows does atomic writes, as long as they are aligned and a multiple
of the sector size; this is documented in MSDN.
Fix the innodb.doublewrite test to always use the doublewrite buffer
(even if atomic writes are autodetected).
If a required privilege is missing, dump the output of "SHOW GRANTS"
into the mariabackup log.
This will help troubleshooting and make the bug reproducible.
The statement
SET GLOBAL innodb_encryption_rotate_key_age=0;
would have the unwanted side effect that ENCRYPTION=DEFAULT tablespaces
would no longer be encrypted or decrypted according to the setting of
innodb_encrypt_tables.
We implement a trigger, so that whenever one of the following is executed:
SET GLOBAL innodb_encrypt_tables=OFF;
SET GLOBAL innodb_encrypt_tables=ON;
SET GLOBAL innodb_encrypt_tables=FORCE;
all wrong-state ENCRYPTION=DEFAULT tablespaces will be added to
fil_system_t::rotation_list, so that the encryption will be added
or removed.
Note: This will *NOT* happen automatically after a server restart.
Before reading the first page of a data file, InnoDB cannot know
the encryption status of the data file. The statement
SET GLOBAL innodb_encrypt_tables will have the side effect that
all not-yet-read InnoDB data files will be accessed in order to
determine the encryption status.
innodb_encrypt_tables_validate(): Stop disallowing
SET GLOBAL innodb_encrypt_tables when innodb_encryption_rotate_key_age=0.
This reverts part of commit 50eb40a2a8
that addressed MDEV-11738 and MDEV-11581.
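For example, the following sequence, which innodb_encrypt_tables_validate()
used to reject, is now accepted, and the second statement fills the rotation
list so that ENCRYPTION=DEFAULT tablespaces are encrypted or decrypted
accordingly:
SET GLOBAL innodb_encryption_rotate_key_age = 0;
SET GLOBAL innodb_encrypt_tables = ON;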
fil_system_t::read_page0(): Trigger a call to fil_node_t::read_page0().
Refactored from fil_space_get_space().
fil_crypt_rotation_list_fill(): If innodb_encryption_rotate_key_age=0,
initialize fil_system->rotation_list. This is invoked both on
SET GLOBAL innodb_encrypt_tables and
on SET GLOBAL innodb_encryption_rotate_key_age=0.
fil_space_set_crypt_data(): Remove.
fil_parse_write_crypt_data(): Simplify the logic.
This is joint work with Marko Mäkelä.
Problem:
========
The mysqlbinlog tool is leaking memory, causing failures in various tests
when the server is compiled and tested with AddressSanitizer or
LeakSanitizer, like this:
cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN:BOOL=ON /path/to/source
make -j$(nproc)
cd mysql-test
ASAN_OPTIONS=abort_on_error=1 ./mtr --parallel=auto
Analysis:
=========
Two types of leaks were observed during the above execution.
1) Leak in Log_event::read_log_event(char const*, unsigned int, char const**,
Format_description_log_event const*, char)
File: sql/log_event.cc:2150
For all row-based replication events, the memory allocated during
read_log_event() is not freed after the event is processed. The
event-specific memory has to be retained only when the flashback option of
the mysqlbinlog tool is enabled. In that case all events are retained until
the end statement is received, and then they are processed in reverse order
and destroyed. But the existing code retains all events irrespective of
flashback mode; hence the memory leaks.
2) read_remote_annotate_event(unsigned char*, unsigned long, char const**)
File: client/mysqlbinlog.cc:194
In general, the Annotate event is not printed immediately, because all
subsequent rbr events may be filtered away. Instead it is printed together
with the first Table map event that is not filtered away, or with the last
rbr event that gets processed. While reading a remote Annotate event, memory
is allocated for the event buffer, and the event's temp_buf is made to point
to the allocated buffer, as shown below. The TRUE flag requests proper
cleanup via free_temp_buf(), i.e. when the Annotate event is deleted, its
destructor takes care of freeing temp_buf.
/*
Ensure the event->temp_buf is pointing to the allocated buffer.
(TRUE = free temp_buf on the event deletion)
*/
event->register_temp_buf((char*)event_buf, TRUE);
But the existing code does the following when it receives a remote
Annotate event:
if (remote_opt)
ev->temp_buf= 0;
That is, the code immediately sets temp_buf= 0, so the free_temp_buf() call
returns empty-handed, having lost its reference to the allocated temporary
buffer. This results in a memory leak.
Fix:
====
1) If not in flashback mode, destroy the memory for events once they are
processed.
2) Remove the ev->temp_buf= 0 assignment for the remote option, and let the
proper cleanup be done as part of free_temp_buf().
InnoDB could return the same list again and again if the buffer
passed to trx_recover_for_mysql() is smaller than the number of
transactions that InnoDB recovered in XA PREPARE state.
We introduce the transaction state TRX_PREPARED_RECOVERED, which
is like TRX_PREPARED, but will be set during trx_recover_for_mysql()
so that each transaction will only be returned once.
Because init_server_components() invokes ha_recover() twice,
we must reset the state of the transactions back to TRX_PREPARED
after returning the complete list, so that repeated traversals
will see the complete list again instead of an empty list.
Without this tweak, the test main.tc_heuristic_recover would hang
in MariaDB 10.1.
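For reference, the scenario involved looks roughly like this (the table and
xid names are made up); each prepared transaction must be reported by
XA RECOVER exactly once per traversal:
XA START 'xid1';
INSERT INTO t1 VALUES (1);
XA END 'xid1';
XA PREPARE 'xid1';
-- After a server restart, the prepared transaction is reported by:
XA RECOVER;
-- and can then be resolved with XA COMMIT 'xid1' or XA ROLLBACK 'xid1'.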
dict_create_foreign_constraints_low(): Tolerate the keywords
IGNORE and ONLINE between the keywords ALTER and TABLE.
We should really remove the hacky FOREIGN KEY constraint parser
from InnoDB.
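A hypothetical statement of the affected form (the table and column names
are made up); the InnoDB constraint parser re-scans the SQL text and
previously did not tolerate the extra keyword between ALTER and TABLE:
ALTER IGNORE TABLE child ADD FOREIGN KEY (parent_id) REFERENCES parent (id);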
The patch fixes an assertion failure in the semisync master module. The
assertion caught an attempt to switch semisync off (due to
rpl_semi_sync_master_wait_no_slave = OFF) when it had not even been
initialized (because rpl_semi_sync_master_enabled = OFF).
The switching-off execution branch is relocated under the one that executes
enable_master() first.
A minor cleanup is done to remove the int return value from two functions
that returned nothing but an error code which could not actually occur in
those functions.
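A sketch of the kind of statement sequence involved, assuming the semisync
master functionality is available (exact reproduction steps are not part of
this message):
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
-- Previously this could hit the assertion, because semisync master
-- had not been initialized via enable_master():
SET GLOBAL rpl_semi_sync_master_wait_no_slave = OFF;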
The Archive storage engine assumed that any query that attempts to read from
the table calls ha_archive::info() beforehand; ha_archive would flush
unwritten data in that call (which makes it visible to the reads).
Break this assumption. Flush the data when the table is opened for reading.
This way, one can run multiple write statements without causing a flush, but
as soon as the data might be needed, it is flushed.
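A minimal illustration (the table name is made up):
CREATE TABLE t1 (a INT) ENGINE=ARCHIVE;
INSERT INTO t1 VALUES (1);
INSERT INTO t1 VALUES (2);
-- Opening the table for reading now flushes the buffered rows,
-- even if ha_archive::info() was not called first:
SELECT COUNT(*) FROM t1;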
DROP DATABASE would internally execute DROP TABLE on every contained
table and finally remove the directory. In InnoDB, DROP TABLE is
sometimes executed in the background. The table would be renamed to
a name that starts with #sql. The existence of these files would
prevent DROP DATABASE from succeeding.
CREATE OR REPLACE DATABASE can internally execute DROP DATABASE if
the directory already exists. This could fail because of the InnoDB
background DROP TABLE, possibly involving tables that were left over
from earlier tests.
This patch was originally made by Anel Husakovic.
Skip `PACKAGE` and `PACKAGE BODY` records quickly.
These stored objects do not have any parameters or return values
(only procedures and functions do).
So there is no need to build a `CREATE` statement
(in `Sp_handler::sp_load_for_information_schema()`) and parse it:
this would not give us any data useful for `INFORMATION_SCHEMA.PARAMETERS`.
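For context, a sketch of the kind of objects involved (requires
sql_mode=ORACLE; the names are made up). When
`INFORMATION_SCHEMA.PARAMETERS` is filled, the `pkg1` package and package
body records are now skipped instead of being turned into a `CREATE`
statement and parsed:
SET sql_mode=ORACLE;
DELIMITER $$
CREATE PACKAGE pkg1 AS
  PROCEDURE p1(a INT);
END;
$$
DELIMITER ;
SELECT specific_name, parameter_name
  FROM information_schema.parameters
 WHERE specific_schema = DATABASE();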
InnoDB crash recovery used to read every data page for which
redo log exists. This is unnecessary for those pages that are
initialized by the redo log. If a newly created page is corrupted,
recovery could unnecessarily fail. It would suffice to reinitialize
the page based on the redo log records.
To add insult to injury, InnoDB crash recovery could hang if it
encountered a corrupted page. We will fix that problem as well.
InnoDB would normally refuse to start up if it encounters a
corrupted page on recovery, but that can be overridden by
setting innodb_force_recovery=1.
Data pages are completely initialized by the records
MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS.
MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE,
which notifies that a page has been freed and its contents
can be discarded (filled with zeroes).
The record MLOG_INDEX_LOAD notifies that redo logging has
been re-enabled after being disabled. We can avoid loading
the page if all buffered redo log records predate the
MLOG_INDEX_LOAD record.
For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD
records were written before commit aa3f7a107c.
Hence, we will skip these optimizations for tables whose
name starts with FTS_.
This is joint work with Thirunarayanan Balathandayuthapani.
fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the
latest recovered MLOG_INDEX_LOAD record for a tablespace.
mlog_init: Page initialization operations discovered during
redo log scanning. FIXME: This really belongs in recv_sys->addr_hash,
and should be removed in MDEV-19176.
recv_addr_state: Add the new state RECV_WILL_NOT_READ to
indicate that according to mlog_init, the page will be
initialized based on redo log record contents.
recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state
if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS
as page initialization. This works around bugs in the crash
recovery of ROW_FORMAT=COMPRESSED tables.
recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record
by resetting the state to RECV_NOT_PROCESSED and by updating
the fil_name_t::enable_lsn.
recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn
to fil_space_t::enable_lsn.
recv_recover_page(): Add the parameter init_lsn, to ignore
any log records that precede the page initialization.
Add DBUG output about skipped operations.
buf_page_create(): Initialize FIL_PAGE_LSN, so that
recv_recover_page() will not wrongly skip applying
the page-initialization record due to the field containing
some newer LSN as a leftover from a different page.
Do not invoke ibuf_merge_or_delete_for_page() during
crash recovery.
recv_apply_hashed_log_recs(): Remove some unnecessary lookups.
Note if a corrupted page was found during recovery.
After invoking buf_page_create(), do invoke
ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge()
in the last recovery batch.
ibuf_merge_or_delete_for_page(): Relax a debug assertion.
innobase_start_or_create_for_mysql(): Abort startup if
a corrupted page was found during recovery. Corrupted pages
will not be flagged if innodb_force_recovery is set.
However, the recv_sys->found_corrupt_fs flag can be set
regardless of innodb_force_recovery if file names are found
to be incorrect (for example, multiple files with the same
tablespace ID).
Similar to what was done in commit aa3f7a107c
for FULLTEXT INDEX, we must ensure that MLOG_INDEX_LOAD records will always
be written if redo logging was disabled.
row_merge_build_indexes(): Invoke row_merge_write_redo() also when
online operation is not being executed or an error occurs.
In case of an error, invoke flush_observer->interrupted() so that
the pages will not be flushed but merely evicted from the buffer pool.
Before resuming redo logging, it is crucial for the correctness of
mariabackup and InnoDB crash recovery to flush or evict all affected pages
and to write MLOG_INDEX_LOAD records.
With the MDEV-15562 instant DROP COLUMN, clustered index records
will contain traces of dropped columns, as follows:
In ROW_FORMAT=REDUNDANT, dropped columns will be stored as 0 bytes,
but they will consume 1 or 2 bytes per column in the record header.
In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, dropped columns will
be stored as NULL if allowed. This will consume 1 bit per nullable
column.
In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, dropped NOT NULL columns
will be stored as 0 bytes if allowed. This will consume 1 byte per
NOT NULL variable-length column. Fixed-length columns will be stored
using the fixed number of bytes.
The metadata record will be 20 bytes larger than user records, because
it will contain a metadata BLOB pointer.
We must refuse ALGORITHM=INSTANT (and require a table rebuild) if
the metadata record would grow too big to fit in the index page.
If SQL_MODE includes STRICT_TRANS_TABLES or STRICT_ALL_TABLES, we should
refuse ALGORITHM=INSTANT if the maximum length of user records would
exceed the maximum size of an index page, similar to what
row_create_index_for_mysql() does during CREATE TABLE. This limit
would kick in when the default values for any instantly added columns
in the metadata record are NULL or short, but the allowed maximum values
are long.
instant_alter_column_possible(): Add the parameter "bool strict" to
enable checks for the user record size, and always check the metadata
record size.
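As an illustration (the table and column names are made up), an instant
operation of this kind is accepted only when the resulting metadata record,
and in strict mode also the maximum user record, still fits within an index
page; otherwise the statement is refused and a rebuilding ALGORITHM
(INPLACE or COPY) must be used:
CREATE TABLE t1 (id INT PRIMARY KEY, c1 VARCHAR(100), c2 VARCHAR(100))
ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
SET SQL_MODE='STRICT_TRANS_TABLES';
ALTER TABLE t1 DROP COLUMN c1, ALGORITHM=INSTANT;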