1
0
mirror of https://github.com/MariaDB/server.git synced 2025-11-30 05:23:50 +03:00
Commit Graph

2052 Commits

Author SHA1 Message Date
Marko Mäkelä
8a346f31b9 MDEV-17073 INSERT…ON DUPLICATE KEY UPDATE became more deadlock-prone
thd_rpl_stmt_based(): A new predicate to check if statement-based
replication is active. (This can also hold when replication is not
in use, but binlog is.)

que_thr_stop(), row_ins_duplicate_error_in_clust(),
row_ins_sec_index_entry_low(), row_ins(): On a duplicate key error,
only lock all index records when statement-based replication is in use.
2018-11-02 18:52:39 +02:00
Sergei Golubchik
a6e0000494 Merge branch '10.0' into 10.1 2018-10-31 10:53:22 +01:00
Sergei Golubchik
44f6f44593 Merge branch '10.0' into 10.1 2018-10-30 15:10:01 +01:00
Marko Mäkelä
dc91ea5bb7 MDEV-12023 Assertion failure sym_node->table != NULL on startup
row_drop_table_for_mysql(): Avoid accessing non-existing dictionary tables.

dict_create_or_check_foreign_constraint_tables(): Add debug instrumentation
for creating and dropping a table before the creation of any non-core
dictionary tables.

trx_purge_add_update_undo_to_history(): Adjust a debug assertion, so that
it will not fail due to the test instrumentation.
2018-10-30 15:53:55 +02:00
Marko Mäkelä
6ced789186 MDEV-12023 Assertion failure sym_node->table != NULL on startup
row_drop_table_for_mysql(): Avoid accessing non-existing dictionary tables.

dict_create_or_check_foreign_constraint_tables(): Add debug instrumentation
for creating and dropping a table before the creation of any non-core
dictionary tables.

trx_purge_add_update_undo_to_history(): Adjust a debug assertion, so that
it will not fail due to the test instrumentation.
2018-10-30 13:29:19 +02:00
Marko Mäkelä
30c3d6db32 MDEV-17533 Merge new release of InnoDB 5.6.42 to 10.0
Also, add a test for a bug that does not seem to affect MariaDB.
2018-10-25 13:05:23 +03:00
Marko Mäkelä
2549f98289 MDEV-17532 Performance_schema reports wrong directory for the temporary files of ALTER TABLE…ALGORITHM=INPLACE
row_merge_file_create_low(): Pass the directory of the temporary file
to the PSI_FILE_CALL.
2018-10-25 13:04:41 +03:00
Marko Mäkelä
5dd3b52f95 MDEV-17531 Crash in RENAME TABLE with FOREIGN KEY and FULLTEXT INDEX
In RENAME TABLE, when an error occurs while renaming FOREIGN KEY
constraint, that error would be overwritten when renaming the
InnoDB internal tables related to FULLTEXT INDEX.

row_rename_table_for_mysql(): Do not attempt to rename the internal
tables if an error already occurred.

This problem was originally reported as Oracle Bug#27545888.
2018-10-25 13:03:29 +03:00
Marko Mäkelä
a21e01a53d MDEV-17541 KILL QUERY during lock wait in FOREIGN KEY check causes hang
row_ins_check_foreign_constraint(): Do not overwrite hard errors
with the soft error DB_LOCK_WAIT. This prevents an infinite
wait loop when DB_INTERRUPTED was returned. For DB_LOCK_WAIT,
row_insert_for_mysql() would keep invoking row_ins_step() and the
transaction would remain active until the server shutdown is initiated.
2018-10-25 10:35:46 +03:00
Marko Mäkelä
0e5a4ac253 MDEV-15662 Instant DROP COLUMN or changing the order of columns
Allow ADD COLUMN anywhere in a table, not only adding as the
last column.

Allow instant DROP COLUMN and instant changing the order of columns.

The added columns will always be added last in clustered index records.
In new records, instantly dropped columns will be stored as NULL or
empty when possible.

Information about dropped and reordered columns will be written in
a metadata BLOB (mblob), which is stored before the first 'user' field
in the hidden metadata record at the start of the clustered index.
The presence of mblob is indicated by setting the delete-mark flag in
the metadata record.

The metadata BLOB stores the number of clustered index fields,
followed by an array of column information for each field.
For dropped columns, we store the NOT NULL flag, the fixed length,
and for variable-length columns, whether the maximum length exceeded
255 bytes. For non-dropped columns, we store the column position.

Unlike with MDEV-11369, when a table becomes empty, it cannot
be converted back to the canonical format. The reason for this is
that other threads may hold cached objects such as
row_prebuilt_t::ins_node that could refer to dropped or reordered
index fields.

For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC,
we must store the n_core_null_bytes in the root page, so that the
chain of node pointer records can be followed in order to reach the
leftmost leaf page where the metadata record is located.
If the mblob is present, we will zero-initialize the strings
"infimum" and "supremum" in the root page, and use the last byte of
"supremum" for storing the number of null bytes (which are allocated
but useless on node pointer pages). This is necessary for
btr_cur_instant_init_metadata() to be able to navigate to the mblob.

If the PRIMARY KEY contains any variable-length column and some
nullable columns were instantly dropped, the dict_index_t::n_nullable
in the data dictionary could be smaller than it actually is in the
non-leaf pages. Because of this, the non-leaf pages could use more
bytes for the null flags than the data dictionary expects, and we
could be reading the lengths of the variable-length columns from the
wrong offset, and thus reading the child page number from wrong place.
This is the result of two design mistakes that involve unnecessary
storage of data: First, it is nonsense to store any data fields for
the leftmost node pointer records, because the comparisons would be
resolved by the MIN_REC_FLAG alone. Second, there cannot be any null
fields in the clustered index node pointer fields, but we nevertheless
reserve space for all the null flags.

Limitations (future work):

MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
MDEV-17468 Avoid table rebuild on operations on generated columns
MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large

btr_page_reorganize_low(): Preserve any metadata in the root page.
Call lock_move_reorganize_page() only after restoring the "infimum"
and "supremum" records, to avoid a memcmp() assertion failure.

dict_col_t::DROPPED: Magic value for dict_col_t::ind.

dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant().
Do not assert that the column was instantly added, because we
sometimes call this unconditionally for all columns.
Convert an instantly added column to a "core column". The old name
remove_instant() could be mistaken to refer to "instant DROP COLUMN".

dict_col_t::is_added(): Rename from dict_col_t::is_instant().

dtype_t::metadata_blob_init(): Initialize the mblob data type.

dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(),
upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits
refer to a metadata record.

dict_table_t::instant: Metadata about dropped or reordered columns.

dict_table_t::prepare_instant(): Prepare
ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE.
innobase_instant_try() will pass this to dict_table_t::instant_column().
On rollback, dict_table_t::rollback_instant() will be called.

dict_table_t::instant_column(): Renamed from instant_add_column().
Add the parameter col_map so that columns can be reordered.
Copy and adjust v_cols[] as well.

dict_table_t::find(): Find an old column based on a new column number.

dict_table_t::serialise_columns(), dict_table_t::deserialise_columns():
Convert the mblob.

dict_index_t::instant_metadata(): Create the metadata record
for instant ALTER TABLE. Invoke dict_table_t::serialise_columns().

dict_index_t::reconstruct_fields(): Invoked by
dict_table_t::deserialise_columns().

dict_index_t::clear_instant_alter(): Move the fields for the
dropped columns to the end, and sort the surviving index fields
in ascending order of column position.

ha_innobase::check_if_supported_inplace_alter(): Do not allow
adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists
due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.)

instant_alter_column_possible(): Add a parameter for InnoDB table,
to check for additional conditions, such as the maximum number of
index fields.

ha_innobase_inplace_ctx::first_alter_pos: The first column whose position
is affected by instant ADD, DROP, or changing the order of columns.

innobase_build_col_map(): Skip added virtual columns.

prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol.
Remove some unnecessary code. Note that the call to
innodb_base_col_setup() should be executed later.

commit_try_norebuild(): If ctx->is_instant(), let the virtual
columns be added or dropped by innobase_instant_try().

innobase_instant_try(): Fill in a zero default value for the
hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459).
If any columns were dropped or reordered (or added not last),
delete any SYS_COLUMNS records for the following columns, and
insert SYS_COLUMNS records for all subsequent stored columns as well
as for all virtual columns. If any virtual column is dropped, rewrite
all virtual column metadata. Use a shortcut only for adding
virtual columns. This is because innobase_drop_virtual_try()
assumes that the dropped virtual columns still exist in ctx->old_table.

innodb_update_cols(): Renamed from innodb_update_n_cols().

innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change
the return type to bool, and invoke my_error() when detecting an error.

innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS.
Refactored from innobase_add_one_virtual() and innobase_instant_add_col().

innobase_instant_add_col(): Replace the parameter dfield with type.

innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS
and all columns from SYS_VIRTUAL.

innobase_add_virtual_try(), innobase_drop_virtual_try(): Let
the caller invoke innodb_update_cols().

innobase_rename_column_try(): Skip dropped columns.

commit_cache_norebuild(): Update table->fts->doc_col.

dict_mem_table_col_rename_low(): Skip dropped columns.

trx_undo_rec_get_partial_row(): Skip dropped columns.

trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly.

trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields.
Log metadata records consistently.
Apparently, the first fields of a clustered index may be updated
in an update_undo vector when the index is ID_IND of SYS_FOREIGN,
as part of renaming the table during ALTER TABLE. Normally, updates of
the PRIMARY KEY should be logged as delete-mark and an insert.

row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec():
Use trx_undo_metadata.

row_undo_mod_clust_low(): On metadata rollback, roll back the root page too.

row_undo_mod_clust(): Relax an assertion. The delete-mark flag was
repurposed for ALTER TABLE metadata records.

row_rec_to_index_entry_impl(): Add the template parameter mblob
and the optional parameter info_bits for specifying the desired new
info bits. For the metadata tuple, allow conversion between the original
format (ADD COLUMN only) and the generic format (with hidden BLOB).
Add the optional parameter "pad" to determine whether the tuple should
be padded to the index fields (on ALTER TABLE it should), or whether
it should remain at its original size (on rollback).

row_build_index_entry_low(): Clean up the code, removing
redundant variables and conditions. For instantly dropped columns,
generate a dummy value that is NULL, the empty string, or a
fixed length of NUL bytes, depending on the type of the dropped column.

row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY
of a record that contained a dropped column whose value was stored
externally, we will be inserting a dummy NULL or empty string value
to the field of the dropped column. The externally stored column would
eventually be dropped when purge removes the delete-marked record for
the old PRIMARY KEY value.

btr_index_rec_validate(): Recognize the metadata record.

btr_discard_only_page_on_level(): Preserve the generic instant
ALTER TABLE metadata.

btr_set_instant(): Replaces page_set_instant(). This sets a clustered
index root page to the appropriate format, or upgrades from
the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format.

btr_cur_instant_init_low(): Read and validate the metadata BLOB page
before reconstructing the dictionary information based on it.

btr_cur_instant_init_metadata(): Do not read any lengths from the
metadata record header before reading the BLOB. At this point, we
would not actually know how many nullable fields the metadata record
contains.

btr_cur_instant_root_init(): Initialize n_core_null_bytes in one
of two possible ways.

btr_cur_trim(): Handle the mblob record.

row_metadata_to_tuple(): Convert a metadata record to a data tuple,
based on the new info_bits of the metadata record.

btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed.
Invoke dtuple_convert_big_rec() for metadata records if the record is
too large, or if the mblob is not yet marked as externally stored.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
When the last user record is deleted, do not delete the
generic instant ALTER TABLE metadata record. Only delete
MDEV-11369 instant ADD COLUMN metadata records.

btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size.

btr_pcur_store_position(): Allow a logically empty page to contain
a metadata record for generic ALTER TABLE.

REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW.
This is for the old instant ADD COLUMN (MDEV-11369) only.

REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record,
with additional information for dropped or reordered columns.

rec_info_bits_valid(): Remove. The only case when this would fail
is when the record is the generic ALTER TABLE metadata record.

rec_is_alter_metadata(): Check if a record is the metadata record
for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function
must not be invoked on node pointer records, because the delete-mark
flag in those records may be set (it is garbage), and then a debug
assertion could fail because index->is_instant() does not necessarily
hold.

rec_is_add_metadata(): Check if a record is MDEV-11369 ADD COLUMN metadata
record (not more generic instant ALTER TABLE).

rec_get_converted_size_comp_prefix_low(): Assume that the metadata
field will be stored externally. In dtuple_convert_big_rec() during
the rec_get_converted_size() call, it would not be there yet.

rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple.

rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(),
rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>.
With mblob=true, process a record with a metadata BLOB.

rec_copy_prefix_to_buf(): Assert that no fields beyond the key and
system columns are being copied. Exclude the metadata BLOB field.

rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple
into a record.

row_upd_index_replace_metadata(): Apply an update vector to an
alter_metadata tuple.

row_log_allocate(): Replace dict_index_t::is_instant()
with a more appropriate condition that ignores dict_table_t::instant.
Only a table on which the MDEV-11369 ADD COLUMN was performed
can "lose its instantness" when it becomes empty. After
instant DROP COLUMN or reordering columns, we cannot simply
convert the table to the canonical format, because the data
dictionary cache and all possibly existing references to it
from other client connection threads would have to be adjusted.

row_quiesce_write_index_fields(): Do not crash when the table contains
an instantly dropped column.

Thanks to Thirunarayanan Balathandayuthapani for discussing the design
and implementing an initial prototype of this.
Thanks to Matthias Leich for testing.
2018-10-19 18:57:23 +03:00
Marko Mäkelä
9b14e37717 Merge 10.3 into 10.4 2018-10-18 12:22:48 +03:00
Marko Mäkelä
3eaae09669 Extend the use of innodb_default_row_format.combinations 2018-10-18 11:27:19 +03:00
Marko Mäkelä
d88c136b9f Merge 10.3 into 10.4 2018-10-17 19:11:42 +03:00
Marko Mäkelä
2fa4ed031c MDEV-17483 Insert on delete-marked record can wrongly inherit old values for instantly added column
row_ins_clust_index_entry_low(): Do not call dtuple_t::trim()
before row_ins_clust_index_entry_by_modify(), so that the values
of all columns will be available in row_upd_build_difference_binary().
If applicable, the tuple can be trimmed in btr_cur_optimistic_update()
or btr_cur_pessimistic_update(), which will be called by
row_ins_clust_index_entry_by_modify().
2018-10-17 18:55:46 +03:00
Marko Mäkelä
853a0a4368 MDEV-13564: Set innodb_safe_truncate=ON by default
The setting innodb_safe_truncate=ON reduces compatibility with older
versions of MariaDB and backup tools in two ways.

First, we will be writing TRX_UNDO_RENAME_TABLE records, which older
versions do not know about. These records could be misinterpreted if
a DDL transaction was recovered and would be rolled back.
Such rollback is only possible if the server was killed while
an incomplete DDL transaction was persisted. On transaction completion,
the insert_undo log pages would only be repurposed for new undo log
allocations, and their contents would not matter. So, older versions
will not have a problem with innodb_safe_truncate=ON if the server was
shut down cleanly.

Second, to prevent such recovery failure, innodb_safe_truncate=ON will
cause a modification of the redo log format identifier, which will
prevent older versions from starting up after a crash. MariaDB Server
versions older than 10.2.13 will refuse to start up altogether, even
after clean shutdown.

A server restart with innodb_safe_truncate=OFF will restore compatibility
with older server and backup versions.
2018-10-17 17:35:38 +03:00
Marko Mäkelä
7e869a2767 Merge 10.2 into 10.3 2018-10-11 23:09:10 +03:00
Marko Mäkelä
81a5b6ccd5 MDEV-17433 Allow InnoDB start up with empty ib_logfile0 from mariabackup --prepare
A prepared backup from Mariabackup does not really need to contain any
redo log file, because all log will have been applied to the data files.

When the user copies a prepared backup to a data directory (overwriting
existing files), it could happen that the data directory already contained
redo log files from the past. mariabackup --copy-back) would delete the
old redo log files, but a user’s own copying script might not do that.
To prevent corruption caused by mixing an old redo log file with data
files from a backup, starting with MDEV-13311, Mariabackup would create
a zero-length ib_logfile0 that would prevent startup.

Actually, there is no need to prevent InnoDB from starting up when a
single zero-length file ib_logfile0 is present. Only if there exist
multiple data files of different lengths, then we should refuse to
start up due to inconsistency. A single zero-length ib_logfile0 should
be treated as if the log files were missing: create new log files
according to the configuration.

open_log_file(): Remove. There is no need to open the log files
at this point, because os_file_get_status() already determined
the size of the file.

innobase_start_or_create_for_mysql(): Move the creation of new
log files a little later, not when finding out that the first log
file does not exist, but after finding out that it does not exist
or it exists as a zero-length file.
2018-10-11 23:00:48 +03:00
Marko Mäkelä
6319c0b541 MDEV-13564: Replace innodb_unsafe_truncate with innodb_safe_truncate
Rename the 10.2-specific configuration option innodb_unsafe_truncate
to innodb_safe_truncate, and invert its value.

The default (for now) is innodb_safe_truncate=OFF, to avoid
disrupting users with an undo and redo log format change within
a Generally Available (GA) release series.
2018-10-11 15:10:13 +03:00
Marko Mäkelä
3448ceb02a MDEV-13564: Implement innodb_unsafe_truncate=ON for compatibility
While MariaDB Server 10.2 is not really guaranteed to be compatible
with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format
change that could be present in XtraBackup, but was reverted from
MariaDB in MDEV-12289), we do not want to disrupt users who have
deployed xtrabackup and MariaDB Server 10.2 in their environments.

With this change, MariaDB 10.2 will continue to use the backup-unsafe
TRUNCATE TABLE code, so that neither the undo log nor the redo log
formats will change in an incompatible way.

Undo tablespace truncation will keep using the redo log only. Recovery
or backup with old code will fail to shrink the undo tablespace files,
but the contents will be recovered just fine.

In the MariaDB Server 10.2 series only, we introduce the configuration
parameter innodb_unsafe_truncate and make it ON by default. To allow
MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE
operations, use loose_innodb_unsafe_truncate=OFF.

MariaDB Server 10.3.10 and later releases will always use the
backup-safe TRUNCATE TABLE, and this parameter will not be
added there.

recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables()
unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan
tables if RENAME operations are not transactional within InnoDB.

LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.

log_init(), log_group_file_header_flush(),
srv_prepare_to_delete_redo_log_files(),
innobase_start_or_create_for_mysql(): Choose the redo log format
and subformat based on the value of innodb_unsafe_truncate.
2018-10-11 08:17:04 +03:00
Marko Mäkelä
2a955c7a83 Merge 10.3 into 10.4 2018-10-10 10:36:51 +03:00
Marko Mäkelä
61b32df931 Merge 10.2 into 10.3 2018-10-10 06:45:19 +03:00
Marko Mäkelä
00b6c7d8fc MDEV-16946 innodb.alter_kill failed in buildbot with wrong result
Ensure that no redo log checkpoint occurs in a critical section
of a recovery test.
2018-10-10 06:31:43 +03:00
Marko Mäkelä
2610c26a53 MDEV-16273 innodb.alter_kill fails Unknown storage engine 'InnoDB'
The test is shutting down InnoDB, corrupting a file, and finally
restarting InnoDB. Before the shutdown, the test created the table
and inserted some records. Before MDEV-12288, there would be no access
to the table after server restart, but after MDEV-12288 purge would
reset the transaction identifier after the INSERT, and this would
sometimes happen after the restart.

To make the test deterministic, wait for purge to complete before the
shutdown.
2018-10-10 06:14:14 +03:00
Marko Mäkelä
43ee6915fa Merge 10.2 into 10.3 2018-10-09 09:11:30 +03:00
Marko Mäkelä
1ff22b2062 MDEV-17289: Skip the test for non-debug server 2018-10-06 13:43:13 +03:00
Thirunarayanan Balathandayuthapani
33fadbfefc MDEV-17289: Add a test case 2018-10-05 17:36:31 +03:00
Marko Mäkelä
444c380ceb Merge 10.3 into 10.4 2018-10-05 08:09:49 +03:00
Sergei Golubchik
57e0da50bb Merge branch '10.2' into 10.3 2018-09-28 16:37:06 +02:00
Marko Mäkelä
304857764b MDEV-13564 Mariabackup does not work with TRUNCATE
Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
CREATE, and DROP.
2018-09-25 17:29:43 +03:00
Marko Mäkelä
4a246fecbd Merge 10.3 into 10.4 2018-09-19 08:21:03 +03:00
Marko Mäkelä
ac24289e66 MDEV-16328 ALTER TABLE…page_compression_level should not rebuild table
The table option page_compression_level is something that only
affects future writes, not actually the data format. Therefore,
we can allow instant changes of this option.

Similarly, the table option page_compressed can be set on a
previously uncompressed table without rebuilding the table,
because an uncompressed page would be considered valid when
reading a page_compressed table.

Removing the page_compressed option will continue to require
the table to be rebuilt.

ha_innobase_inplace_ctx::page_compression_level: The requested
page_compression_level at the start of ALTER TABLE, or 0 if
page_compressed=OFF.

alter_options_need_rebuild(): Renamed from
create_option_need_rebuild(). Allow page_compression_level and
page_compressed to be changed as above, without rebuilding the table.

ha_innobase::check_if_supported_inplace_alter(): Allow ALGORITHM=INSTANT
for ALTER_OPTIONS if the table is not to be rebuilt. If rebuild is
needed, set ha_alter_info->unsupported_reason.

innobase_page_compression_try(): Update SYS_TABLES.TYPE according
to the table flags, for an instant change of page_compression_level
or page_compressed.

commit_cache_norebuild(): Adjust dict_table_t::flags, fil_space_t::flags
and (if needed) FSP_SPACE_FLAGS if page_compression_level was specified.
2018-09-17 14:37:39 +03:00
Marko Mäkelä
171fbbb968 Merge 10.3 into 10.4 2018-09-14 15:23:34 +03:00
Marko Mäkelä
aba5c72be2 MDEV-17196 Crash during instant ADD COLUMN with long DEFAULT value
A debug assertion would fail if an instant ADD COLUMN operation
involves splitting the leftmost leaf page and storing a default
value off-page. Another debug assertion could fail if the
default value does not fit in an undo log page.

btr_cur_pessimistic_update(): Invoke rec_offs_make_valid()
in order to prevent rec_offs_validate() assertion failure.

innobase_add_instant_try(): Invoke btr_cur_pessimistic_update()
with the BTR_KEEP_POS_FLAG, which is the correct course of action
when BLOBs may need to be written. Whenever returning true,
ensure that my_error() will have been called.
2018-09-14 15:06:58 +03:00
Oleksandr Byelkin
28f08d3753 Merge branch '10.1' into 10.2 2018-09-14 08:47:22 +02:00
Marko Mäkelä
5567a8c936 MDEV-17138 Reduce redo log volume for undo tablespace initialization
Implement a 10.4 redo log format, which extends the 10.3 format
by introducing the MLOG_MEMSET record.

MLOG_MEMSET: A new redo log record type for filling an area with a byte.

mlog_memset(): Write the MLOG_MEMSET record.

mlog_parse_nbytes(): Handle MLOG_MEMSET as well.

trx_rseg_header_create(): Reduce the redo log volume by making use of
mlog_memset() and the zero-initialization that happens inside page
allocation.

fil_addr_null: Remove.

flst_init(): Create a variant that takes a zero-initialized
buf_block_t* as a parameter, and only writes the FIL_NULL using
mlog_memset().

flst_zero_addr(): A variant of flst_write_addr() that writes
a null address using mlog_memset() for the FIL_NULL.

The following fixes are replacing some use of MLOG_WRITE_STRING
with the more compact MLOG_MEMSET record, or eliminating
redundant redo log writes:

btr_store_big_rec_extern_fields(): Invoke mlog_memset() for
zero-initializing the tail of the ROW_FORMAT=COMPRESSED BLOB page.

trx_sysf_create(), trx_rseg_format_upgrade(): Invoke mlog_memset()
for zero-initializing the page trailer.

fsp_header_init(), trx_rseg_header_create():
Remove redundant zero-initializations.
2018-09-11 21:32:36 +03:00
Marko Mäkelä
67fa97dc2c Merge 10.3 into 10.4 2018-09-11 21:31:47 +03:00
Marko Mäkelä
b02c722e7a MDEV-17158 TRUNCATE is not atomic after MDEV-13564 2018-09-10 15:40:11 +03:00
Marko Mäkelä
75f8e86f57 MDEV-17158 TRUNCATE is not atomic after MDEV-13564
It turned out that ha_innobase::truncate() would prematurely
commit the transaction already before the completion of the
ha_innobase::create(). All of this must be atomic.

innodb.truncate_crash: Use the correct DEBUG_SYNC point, and
tolerate non-truncation of the table, because the redo log
for the TRUNCATE transaction commit might be flushed due to
some InnoDB background activity.

dict_build_tablespace_for_table(): Merge to the function
dict_build_table_def_step().

dict_build_table_def_step(): If a table is being created during
an already started data dictionary transaction (such as TRUNCATE),
persistently write the table_id to the undo log header before
creating any file. In this way, the recovery of TRUNCATE will be
able to delete the new file before rolling back the rename of
the original table.

dict_table_rename_in_cache(): Add the parameter replace_new_file,
used as part of rolling back a TRUNCATE operation.

fil_rename_tablespace_check(): Add the parameter replace_new.
If the parameter is set and a file identified by new_path exists,
remove a possible tablespace and also the file.

create_table_info_t::create_table_def(): Remove some debug assertions
that no longer hold. During TRUNCATE, the transaction will already
have been started (and performed a rename operation) before the
table is created. Also, remove a call to dict_build_tablespace_for_table().

create_table_info_t::create_table(): Add the parameter create_fk=true.
During TRUNCATE TABLE, do not add FOREIGN KEY constraints to the
InnoDB data dictionary, because they will also not be removed.

row_table_add_foreign_constraints(): If trx=NULL, do not modify
the InnoDB data dictionary, but only load the FOREIGN KEY constraints
from the data dictionary.

ha_innobase::create(): Lock the InnoDB data dictionary cache only
if no transaction was passed by the caller. Unlock it in any case.

innobase_rename_table(): Add the parameter commit = true.
If !commit, do not lock or unlock the data dictionary cache.

ha_innobase::truncate(): Lock the data dictionary before invoking
rename or create, and let ha_innobase::create() unlock it and
also commit or roll back the transaction.

trx_undo_mark_as_dict(): Renamed from trx_undo_mark_as_dict_operation()
and declared global instead of static.

row_undo_ins_parse_undo_rec(): If table_id is set, this must
be rolling back the rename operation in TRUNCATE TABLE, and
therefore replace_new_file=true.
2018-09-10 14:59:58 +03:00
Marko Mäkelä
5a1868b58d MDEV-13564 Mariabackup does not work with TRUNCATE
This is a merge from 10.2, but the 10.2 version of this will not
be pushed into 10.2 yet, because the 10.2 version would include
backports of MDEV-14717 and MDEV-14585, which would introduce
a crash recovery regression: Tables could be lost on
table-rebuilding DDL operations, such as ALTER TABLE,
OPTIMIZE TABLE or this new backup-friendly TRUNCATE TABLE.
The test innodb.truncate_crash occasionally loses the table due to
the following bug:

MDEV-17158 log_write_up_to() sometimes fails
2018-09-07 22:15:06 +03:00
Marko Mäkelä
73ed19e44f MDEV-14585 Automatically remove #sql- tables in InnoDB dictionary during recovery
This is a backport of the following commits:
commit b4165985c9
commit 69e88de0fe
commit 40f4525f43
commit 656f66def2

Now that MDEV-14717 made RENAME TABLE crash-safe within InnoDB,
it should be safe to drop the #sql- tables within InnoDB during
crash recovery. These tables can be one of two things:

(1) #sql-ib related to deferred DROP TABLE (follow-up to MDEV-13407)
or to table-rebuilding ALTER TABLE...ALGORITHM=INPLACE
(since MDEV-14378, only related to the intermediate copy of a table),

(2) #sql- related to the intermediate copy of a table during
ALTER TABLE...ALGORITHM=COPY

We will not drop tables whose name starts with #sql2, because
the server can be killed during an ALGORITHM=COPY operation at
a point where the original table was renamed to #sql2 but the
finished intermediate copy was not yet renamed from #sql-
to the original table name.

If an old version of MariaDB Server before 10.2.13 (MDEV-11415)
was killed while ALTER TABLE...ALGORITHM=COPY was in progress,
after recovery there could be undo log records for some records that were
inserted into an intermediate copy of the table. Due to these undo log
records, InnoDB would resurrect locks at recovery, and the intermediate
table would be locked while we are trying to drop it. This would cause
a call to row_rename_table_for_mysql(), either from
row_mysql_drop_garbage_tables() or from the rollback of a RENAME
operation that was part of the ALTER TABLE.

row_rename_table_for_mysql(): Do not attempt to parse FOREIGN KEY
constraints when renaming from #sql-something to #sql-something-else,
because it does not make any sense.

row_drop_table_for_mysql(): When deferring DROP TABLE due to locks,
do not rename the table if its name already starts with the #sql-
prefix, which is what row_mysql_drop_garbage_tables() uses.
Previously, the too strict prefix #sql-ib was used, and some
tables were renamed unnecessarily.
2018-09-07 22:10:03 +03:00
Marko Mäkelä
8dcacd3b01 Follow-up to MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"
This is a backport of commit 88aff5f471.

The InnoDB background DROP TABLE queue is something that we should
really remove, but are unable to until we remove dict_operation_lock
so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. We should
in some way ensure that the deferred-dropped tables will be dropped
after server restart.

The existence of two separate transactions complicates the error handling
of CREATE TABLE...SELECT. We should really not break locks in DROP TABLE.

Our solution to these problems is to rename the table to a temporary
name, and to drop such-named tables on InnoDB startup. Also, the
queue will use table IDs instead of names from now on.

check-testcase.test: Ignore #sql-ib*.ibd files, because tables may enter
the background DROP TABLE queue shortly before the test finishes.

innodb.drop_table_background: Test CREATE...SELECT and the creation of
tables whose file name starts with #sql-ib.

innodb.alter_crash: Adjust the recovery, now that the #sql-ib tables
will be dropped on InnoDB startup.

row_mysql_drop_garbage_tables(): New function, to drop all #sql-ib tables
on InnoDB startup.

row_drop_table_for_mysql_in_background(): Remove an unnecessary and
misplaced call to log_buffer_flush_to_disk(). (The call should have been
after the transaction commit. We do not care about flushing the redo log
here, because the table would be dropped again at server startup.)

Remove the entry from the list after the table no longer exists.

If server shutdown has been initiated, empty the list without actually
dropping any tables. They will be dropped again on startup.

row_drop_table_for_mysql(): Do not call lock_remove_all_on_table().
Instead, if locks exist, defer the DROP TABLE until they do not exist.
If the table name does not start with #sql-ib, rename it to that prefix
before adding it to the background DROP TABLE queue.
2018-09-07 22:10:03 +03:00
Marko Mäkelä
754727bb8a MDEV-14378 In ALGORITHM=INPLACE, use a common name for the intermediate tables or partitions
This is a backport of commit 07e9ff1fe1.

Allow DROP TABLE `#mysql50##sql-...._.` to drop tables that were
being rebuilt by ALGORITHM=INPLACE

NOTE: If the server is killed after the table-rebuilding ALGORITHM=INPLACE
commits inside InnoDB but before the .frm file has been replaced, then
the recovery will involve something else than DROP TABLE.

NOTE: If the server is killed in a true inplace ALTER TABLE commits
inside InnoDB but before the .frm file has been replaced, then we
are really out of luck. To properly handle that situation, we would
need a transactional mysql.ddl_fixup table that directs recovery to
rename or remove files.

prepare_inplace_alter_table_dict(): Use the altered_table->s->table_name
for generating the new_table_name.

table_name_t::part_suffix: The start of the partition name suffix.

table_name_t::dbend(): Return the end of the schema name.

table_name_t::dblen(): Return the length of the schema name, in bytes.

table_name_t::basename(): Return the name without the schema name.

table_name_t::part(): Return the partition name, or NULL if none.

row_drop_table_for_mysql(): Assert for #sql, not #sql-ib.
2018-09-07 22:10:03 +03:00
Marko Mäkelä
cf2a4426a2 MDEV-14717 RENAME TABLE in InnoDB is not crash-safe
This is a backport of commit 0bc36758ba
and commit 9eb3fcc9fb.

InnoDB in MariaDB 10.2 appears to only write MLOG_FILE_RENAME2
redo log records during table-rebuilding ALGORITHM=INPLACE operations.
We must write the records for any .ibd file renames, so that the
operations are crash-safe.

If InnoDB is killed during a RENAME TABLE operation, it can happen that
the transaction for updating the data dictionary will be rolled back.
But, nothing will roll back the renaming of the .ibd file
(the MLOG_FILE_RENAME2 only guarantees roll-forward), or for that matter,
the renaming of the dict_table_t::name in the dict_sys cache. We introduce
the undo log record TRX_UNDO_RENAME_TABLE to fix this.

fil_space_for_table_exists_in_mem(): Remove the parameters
adjust_space, table_id and some code that was trying to work around
these deficiencies.

fil_name_write_rename(): Write a MLOG_FILE_RENAME2 record.

dict_table_rename_in_cache(): Invoke fil_name_write_rename().

trx_undo_rec_copy(): Set the first 2 bytes to the length of the
copied undo log record.

trx_undo_page_report_rename(), trx_undo_report_rename():
Write a TRX_UNDO_RENAME_TABLE record with the old table name.

row_rename_table_for_mysql(): Invoke trx_undo_report_rename()
before modifying any data dictionary tables.

row_undo_ins_parse_undo_rec(): Roll back TRX_UNDO_RENAME_TABLE
by invoking dict_table_rename_in_cache(), which will take care
of both renaming the table and the file.

ha_innobase::truncate(): Remove a work-around.
2018-09-07 22:10:02 +03:00
Marko Mäkelä
e67b1070bb MDEV-17049 Enable innodb_undo tests on buildbot
Remove the innodb_undo suite, and move and adapt the tests.
Remove unnecessary restarts, and add innodb_page_size_small.inc
for combinations.

innodb.undo_truncate is the merge of innodb_undo.truncate
and innodb_undo.truncate_multi_client.

Add the global status variable innodb_undo_truncations.
Without this, the test innodb.undo_truncate would occasionally
report that truncation did not happen. The test was only waiting
for the history list length to reach 0, but the undo tablespace
truncation would only take place some time after that.

Undo tablespace truncation will only occasionally occur with
innodb_page_size=32k, and typically never occur (with this amount
of undo log operations) with innodb_page_size=64k. We disable
these combinations.

innodb.undo_truncate_recover was formerly called
innodb_undo.truncate_recover.
2018-09-07 22:10:02 +03:00
Marko Mäkelä
055a3334ad MDEV-13564 Mariabackup does not work with TRUNCATE
Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
CREATE, and DROP.

Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2
is killed before the DROP operation is committed. If MariaDB Server 10.2
is killed during TRUNCATE, it is also possible that the old table
was renamed to #sql-ib*.ibd but the data dictionary will refer to the
table using the original name.

In MariaDB Server 10.3, RENAME inside InnoDB is transactional,
and #sql-* tables will be dropped on startup. So, this new TRUNCATE
will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying
storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating
mysql.gtid_slave_pos, evict any cached table handles from
the table definition cache, so that there will be no stale
references to the old table after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing
atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB
undo and redo log. Some convoluted logic was added to the InnoDB
crash recovery, and some extra synchronization (including a redo log
checkpoint) was introduced to make this work. This synchronization
has caused performance problems and race conditions, and the extra
log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep
the logic for parsing and applying the extra log files, but we will
no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE
(with full redo and undo logging and proper rollback). This will
be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table().
Because RENAME cannot be fully rolled back before MariaDB 10.3
due to missing undo logging, add some explicit rename-back in
case the operation fails.

ha_innobase::delete(): Introduce a variant that takes sqlcom as
a parameter. In TRUNCATE TABLE, we do not want to touch any
FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx.
In TRUNCATE, the new table must be created in the same transaction
that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters
file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping
row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(),
fil_prepare_for_truncate(), fil_reinit_space_header_for_table(),
row_truncate_table_for_mysql(), TruncateLogger,
row_truncate_prepare(), row_truncate_rollback(),
row_truncate_complete(), row_truncate_fts(),
row_truncate_update_system_tables(),
row_truncate_foreign_key_checks(), row_truncate_sanity_checks():
Remove.

row_upd_check_references_constraints(): Remove a check for
TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some
race-condition like scenarios. The test innodb-innodb.truncate does
not use any synchronization.

We add a redo log subformat to indicate backup-friendly format.
MariaDB 10.4 will remove support for the old TRUNCATE logging,
so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve
limitations.

== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only
possible when innodb_undo_tablespaces is set to at least 2.
The logging is implemented similar to the WL#6501 TRUNCATE,
that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within
a single mini-transaction that reinitializes the undo log
tablespace file. Unfortunately, due to the redo log format
of some operations, currently, the total redo log written by
undo tablespace truncation will be more than the combined size
of the truncated undo tablespace. It should be acceptable
to have a little more than 1 megabyte of log in a single
mini-transaction. This will be fixed in MDEV-17138 in
MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo
tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now
only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated.
This check is for tablespaces of tables. Potentially used
tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning
for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero
page number (new size of the tablespace in pages) to inform
crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2
can be written for undo tablespaces (without .ibd file suffix)
for a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false
so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation,
buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log
records remain buffered, after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write
much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace
in a single mini-transaction, which will be flushed to the redo log
before the file size is trimmed.

recv_addr_trim(): Discard any redo logs for pages that were
logged after the new end of a file, before the truncation LSN.
If the rec_list becomes empty, reduce n_addrs. After removing
any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before
applying any log records. The undo tablespace files must be open
at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(),
buf_LRU_flush_or_remove_pages(): Add a parameter for specifying
the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the
extra logging, and some unnecessary crash points. Merge the code
from trx_undo_truncate_tablespace(). First, flush all to-be-discarded
pages (beyond the new end of the file), then trim the space->size
to make the page allocation deterministic. At the only remaining
crash injection point, flush the redo log, so that the recovery
can be tested.
2018-09-07 22:10:02 +03:00
Oleksandr Byelkin
31081593aa Merge branch '11.0' into 10.1 2018-09-06 22:45:19 +02:00
Marko Mäkelä
4caf3e08a8 Add MDEV-11080, MDEV-16709 tests for the MDEV-13333 fix
The regression that was introduced in
commit 723f87e9d3
was fixed as part of MDEV-13333
(commit 3b37edee1a)
without a test case, because the MDEV-13333 test case
is even less deterministic than these ones.
2018-09-04 20:21:57 +03:00
Sergei Golubchik
9180e8666b MDEV-16465 Invalid (old?) table or database name or hang in ha_innobase::delete_table and log semaphore wait upon concurrent DDL with foreign keys
ALTER TABLE locks the table with TL_READ_NO_INSERT, to prevent the
source table modifications while it's being copied. But there's an
indirect way of modifying a table, via cascade FK actions.

After previous commits, an attempt to modify an FK parent table
will cause FK children to be prelocked, so the table-being-altered
cannot be modified by a cascade FK action, because ALTER holds a
lock and prelocking will wait.

But if a new FK is being added by this very ALTER, then the target
table is not locked yet (it's a temporary table). So, we have to
lock FK parents explicitly.
2018-09-04 09:49:53 +02:00
Sergei Golubchik
dd74332d2c MDEV-12669 Circular foreign keys cause a loop and OOM upon LOCK TABLE
table_already_fk_prelocked() was looking for a table in the wrong
list (not the complete list of prelocked tables, but only in its tail,
starting from the current table - which is always empty for the last
added table), so for circular FKs it kept adding same tables to the list
indefinitely.

Backport of d6d7e169fb
2018-09-04 09:49:52 +02:00
Sergei Golubchik
64a23c1c8a extend prelocking to FK-accessed tables
Backport of f136291098
2018-09-04 08:37:44 +02:00