mirror of https://github.com/MariaDB/server.git synced 2025-12-01 17:39:21 +03:00
Commit Graph

170 Commits

Author SHA1 Message Date
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
198ed24cac MDEV-19513: Rename dict_operation_lock to dict_sys.latch
dict_sys.lock(), dict_sys_lock(): Acquire both mutex and latch.

dict_sys.unlock(), dict_sys_unlock(): Release both mutex and latch.

dict_sys.assert_locked(): Assert that both mutex and latch are held.
2019-05-17 15:26:33 +03:00
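A minimal sketch of the lock()/unlock()/assert_locked() trio described in the commit above. The class name DictSysSketch, the member names, and the acquisition order (latch first, then mutex) are assumptions made for illustration. The real dict_sys.latch is an InnoDB rw-lock; it is simplified here to a std::mutex held in exclusive mode.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for dict_sys; not the InnoDB implementation.
class DictSysSketch {
  std::mutex latch_;     // plays the role of dict_sys.latch (really an rw-lock)
  std::mutex mutex_;     // plays the role of dict_sys.mutex
  bool locked_ = false;  // bookkeeping, written only while both are held

public:
  // Acquire both, always in the same order (latch first is an assumption).
  void lock() {
    latch_.lock();
    mutex_.lock();
    locked_ = true;
  }

  // Release both in the reverse order of acquisition.
  void unlock() {
    locked_ = false;
    mutex_.unlock();
    latch_.unlock();
  }

  // Only meaningful when called by the thread that may hold the locks.
  bool is_locked() const { return locked_; }
  void assert_locked() const { assert(locked_); }
};
```

Taking the two locks through a single pair of functions keeps the acquisition order identical at every call site, which is the usual way to rule out lock-order inversions.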
Marko Mäkelä
5fd7502e77 MDEV-19513: Allocate dict_sys statically
dict_sys_t::create(): Renamed from dict_init().

dict_sys_t::close(): Renamed from dict_close().

dict_sys_t::add(): Sliced from dict_table_t::add_to_cache().

dict_sys_t::remove(): Renamed from dict_table_remove_from_cache().

dict_sys_t::prevent_eviction(): Renamed from
dict_table_move_from_lru_to_non_lru().

dict_sys_t::acquire(): Replaces dict_move_to_mru() and some more logic.

dict_sys_t::resize(): Renamed from dict_resize().

dict_sys_t::find(): Replaces dict_lru_find_table() and
dict_non_lru_find_table().
2019-05-17 14:32:53 +03:00
Marko Mäkelä
874f8f30f2 Merge 10.2 into 10.3 2019-05-14 17:25:25 +03:00
Marko Mäkelä
be85d3e61b Merge 10.2 into 10.3 2019-05-14 17:18:46 +03:00
Marko Mäkelä
b93ecea65c Remove unnecessary pointer indirection for rw_lock_t
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.

Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
2019-05-13 18:46:12 +03:00
Marko Mäkelä
26a14ee130 Merge 10.1 into 10.2 2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru
c0ac0b8860 Update FSF address 2019-05-11 19:25:02 +03:00
Marko Mäkelä
d3dcec5d65 Merge 10.3 into 10.4 2019-05-05 15:06:44 +03:00
Marko Mäkelä
b6f4cccd19 Merge 10.2 into 10.3 2019-05-03 20:14:09 +03:00
Marko Mäkelä
3db94d2403 MDEV-19346: Remove dummy InnoDB log checkpoints
log_checkpoint(), log_make_checkpoint_at(): Remove the parameter
write_always. It seems that the primary purpose of this parameter
was to ensure in the function recv_reset_logs() that both checkpoint
header pages will be overwritten, when the function is called from
the never-enabled function recv_recovery_from_archive_start().

create_log_files(): Merge recv_reset_logs() to its only caller.

Debug instrumentation: Prefer to flush the redo log, instead of
triggering a redo log checkpoint.

page_header_set_field(): Disable a debug assertion that will
always fail due to MDEV-19344, now that we no longer initiate
a redo log checkpoint before an injected crash.

In recv_reset_logs() there used to be two calls to
log_make_checkpoint_at(). The apparent purpose of this was
to ensure that both InnoDB redo log checkpoint header pages
will be initialized or overwritten.
The second call was removed (without any explanation) in MySQL 5.6.3:
mysql/mysql-server@4ca37968da

In MySQL 5.6.8 WL#6494, starting with
mysql/mysql-server@00a0ba8ad9
the function recv_reset_logs() was not only invoked during
InnoDB data file initialization, but also during a regular
startup when the redo log is being resized.

mysql/mysql-server@45e9167983
in MySQL 5.7.2 removed the UNIV_LOG_ARCHIVE code, but still
did not remove the parameter write_always.
2019-05-03 20:02:11 +03:00
Marko Mäkelä
8b480df63e Merge 10.3 into 10.4 2019-03-25 17:18:15 +02:00
Marko Mäkelä
c3a6c683e2 Merge 10.2 into 10.3 2019-03-25 11:03:19 +02:00
Marko Mäkelä
b59d484696 MDEV-14126: Remove page_is_root()
The predicate page_is_root(), which was added in MariaDB Server 10.2.2,
is based on a wrong assumption.

Under some circumstances, InnoDB can transform B-trees into a degenerate
state where a non-leaf page has no sibling pages. Because of this,
we cannot assume that a page that has no siblings is the root page.
This bug will be tracked as MDEV-19022.

Because of the bug that may affect many InnoDB data files, we must remove
and replace the wrong predicate. Using the wrong predicate can cause
corruption. A leaf page is not allowed to be empty except if it is the
root page, and the entire table is empty.
2019-03-25 10:53:00 +02:00
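The flaw described in the commit above can be illustrated with a small sketch. The struct names and the kNoSibling constant (a stand-in for FIL_NULL) are hypothetical, and comparing against the root page number recorded for the index is only one possible replacement; the commit message does not say which check was used.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kNoSibling = 0xFFFFFFFFu;  // stand-in for FIL_NULL

// Hypothetical, reduced views of a B-tree page and of its index.
struct PageSketch { uint32_t page_no, prev, next; };
struct IndexSketch { uint32_t root_page_no; };

// The assumption that the commit calls wrong: "no siblings means root".
inline bool has_no_siblings(const PageSketch& p) {
  return p.prev == kNoSibling && p.next == kNoSibling;
}

// A check that does not depend on the shape of the tree (an assumption).
inline bool is_root(const PageSketch& p, const IndexSketch& idx) {
  return p.page_no == idx.root_page_no;
}
```

In a degenerate tree, a non-root page on a level of its own satisfies has_no_siblings() while is_root() still returns false for it.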
Marko Mäkelä
6b6fa3cdb1 MDEV-18644: Support full_crc32 for page_compressed
This is a follow-up task to MDEV-12026, which introduced
innodb_checksum_algorithm=full_crc32 and a simpler page format.
MDEV-12026 did not enable full_crc32 for page_compressed tables,
which we will be doing now.

This is joint work with Thirunarayanan Balathandayuthapani.

For innodb_checksum_algorithm=full_crc32 we change the
page_compressed format as follows:

FIL_PAGE_TYPE: The most significant bit will be set to indicate
page_compressed format. The least significant bits will contain
the compressed page size, rounded up to a multiple of 256 bytes.

The checksum will be stored in the last 4 bytes of the page
(whether it is the full page or a page_compressed page whose
size is determined by FIL_PAGE_TYPE), covering all preceding
bytes of the page. If encryption is used, then the page will
be encrypted between compression and computing the checksum.
For page_compressed, FIL_PAGE_LSN will not be repeated at
the end of the page.

FSP_SPACE_FLAGS (already implemented as part of MDEV-12026):
We will store the innodb_compression_algorithm that may be used
to compress pages. Previously, the choice of algorithm was written
to each compressed data page separately, and one would be unable
to know in advance which compression algorithm(s) are used.

fil_space_t::full_crc32_page_compressed_len(): Determine if the
page_compressed algorithm of the tablespace needs to know the
exact length of the compressed data. If yes, we will reserve and
write an extra byte for this right before the checksum.

buf_page_is_compressed(): Determine if a page uses page_compressed
(in any innodb_checksum_algorithm).

fil_page_decompress(): Pass also fil_space_t::flags so that the
format can be determined.

buf_page_is_zeroes(): Check if a page is full of zero bytes.

buf_page_full_crc32_is_corrupted(): Renamed from
buf_encrypted_full_crc32_page_is_corrupted(). For full_crc32,
we always simply validate the checksum against the page contents,
while the physical page size is explicitly specified by an
unencrypted part of the page header.

buf_page_full_crc32_size(): Determine the size of a full_crc32 page.

buf_dblwr_check_page_lsn(): Make this a debug-only function, because
it involves potentially costly lookups of fil_space_t.

create_table_info_t::check_table_options(),
ha_innobase::check_if_supported_inplace_alter(): Do allow the creation
of SPATIAL INDEX with full_crc32 also when page_compressed is used.

commit_cache_norebuild(): Preserve the compression algorithm when
updating the page_compression_level.

dict_tf_to_fsp_flags(): Set the flags for page compression algorithm.
FIXME: Maybe there should be a table option page_compression_algorithm
and a session variable to back it?
2019-03-18 14:08:43 +02:00
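A rough sketch of the FIL_PAGE_TYPE encoding described in the commit above, treating the field as a 16-bit value. The function names are hypothetical, and storing the rounded-up size in units of 256 bytes in the low 15 bits is an assumption; the commit message only states that the size is rounded up to a multiple of 256 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Most significant bit of the 16-bit FIL_PAGE_TYPE marks page_compressed.
constexpr uint16_t kCompressedMarker = 1U << 15;

// Encode a compressed payload size (in bytes) into the page type field.
// Assumption: the low 15 bits hold the size in 256-byte units.
inline uint16_t encode_page_type(uint32_t compressed_size) {
  const uint32_t units = (compressed_size + 255) / 256;  // round up
  return static_cast<uint16_t>(kCompressedMarker | units);
}

inline bool is_page_compressed(uint16_t page_type) {
  return (page_type & kCompressedMarker) != 0;
}

// Physical size in bytes of a page_compressed page, per the sketch above.
inline uint32_t decoded_size(uint16_t page_type) {
  return static_cast<uint32_t>(page_type & ~kCompressedMarker) << 8;
}
```

For example, a 1000-byte payload would be recorded as four 256-byte units, so a reader would treat the page as 1024 bytes long and look for the checksum in the last 4 bytes of that range.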
Marko Mäkelä
58f3ff7175 Merge 10.3 into 10.4 2019-03-11 18:27:58 +02:00
Marko Mäkelä
814205f306 Merge 10.2 into 10.3 2019-03-11 17:49:36 +02:00
Marko Mäkelä
1ab049e572 MDEV-18878 Purge: Optimize away futile table lookups
If a table has been dropped, rebuilt, or its tablespace has been
discarded or the table is corrupted, it does not make sense to
look up that table again while purging old undo log records.

purge_node_t::purge_node_t(): Replaces row_purge_node_create().

que_common_t::que_common_t(): Constructor.

row_import_update_index_root(): Remove the constant parameter
dict_locked=true, and update the table->def_trx_id in the cache.

purge_node_t::unavailable_table_id: The latest unavailable table ID,
to avoid future lookups.

purge_node_t::def_trx_id: The latest modification of the table
identified by unavailable_table_id, or TRX_ID_MAX.

purge_node_t::is_skipped(): Determine if a table should be skipped.

purge_node_t::skip(): Note that a table should be skipped.
2019-03-11 17:17:24 +02:00
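The bookkeeping described in the commit above could look roughly like the following sketch. The struct name PurgeNodeSketch and the exact comparison in is_skipped() are assumptions; only the member names unavailable_table_id and def_trx_id and the method names skip() and is_skipped() come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

using table_id_t = uint64_t;
using trx_id_t = uint64_t;
constexpr trx_id_t kTrxIdMax = UINT64_MAX;  // stand-in for TRX_ID_MAX

// Hypothetical reduction of purge_node_t to the two fields in question.
struct PurgeNodeSketch {
  table_id_t unavailable_table_id = 0;  // 0 is assumed to be an invalid ID
  trx_id_t def_trx_id = 0;

  // Remember that lookups of this table are futile for undo log records
  // written up to and including the transaction `limit`.
  void skip(table_id_t id, trx_id_t limit) {
    unavailable_table_id = id;
    def_trx_id = limit;
  }

  // Assumed rule: skip a record if it belongs to the remembered table and
  // is not newer than the remembered modification.
  bool is_skipped(table_id_t id, trx_id_t rec_trx_id) const {
    return id == unavailable_table_id && rec_trx_id <= def_trx_id;
  }
};
```

Only the most recently noted table is remembered, which keeps the check to two comparisons per undo log record.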
Thirunarayanan Balathandayuthapani
c0f47a4a58 MDEV-12026: Implement innodb_checksum_algorithm=full_crc32
MariaDB data-at-rest encryption (innodb_encrypt_tables)
had repurposed the same unused data field that was repurposed
in MySQL 5.7 (and MariaDB 10.2) for the Split Sequence Number (SSN)
field of SPATIAL INDEX. Because of this, MariaDB was unable to
support encryption on SPATIAL INDEX pages.

Furthermore, InnoDB page checksums skipped some bytes, and there
are multiple variations and checksum algorithms. By default,
InnoDB accepts all variations of all algorithms that ever existed.
This unnecessarily weakens the page checksums.

We hereby introduce two more innodb_checksum_algorithm variants
(full_crc32, strict_full_crc32) that are special in a way:
When either setting is active, newly created data files will
carry a flag (fil_space_t::full_crc32()) that indicates that
all pages of the file will use a full CRC-32C checksum over the
entire page contents (excluding the bytes where the checksum
is stored, at the very end of the page). Such files will always
use that checksum, no matter what the parameter
innodb_checksum_algorithm is assigned to.

For old files, the old checksum algorithms will continue to be
used. The value strict_full_crc32 will be equivalent to strict_crc32
and the value full_crc32 will be equivalent to crc32.

ROW_FORMAT=COMPRESSED tables will only use the old format.
These tables do not support new features, such as larger
innodb_page_size or instant ADD/DROP COLUMN. They may be
deprecated in the future. We do not want an unnecessary
file format change for them.

The new full_crc32() format also cleans up the MariaDB tablespace
flags. We will reserve flags to store the page_compressed
compression algorithm, and to store the compressed payload length,
so that checksum can be computed over the compressed (and
possibly encrypted) stream and can be validated without
decrypting or decompressing the page.

In the full_crc32 format, there no longer are separate before-encryption
and after-encryption checksums for pages. The single checksum is
computed on the page contents that is written to the file.

We do not make the new algorithm the default for two reasons.
First, MariaDB 10.4.2 was a beta release, and the default values
of parameters should not change after beta. Second, we did not
yet implement the full_crc32 format for page_compressed pages.
This will be fixed in MDEV-18644.

This is joint work with Marko Mäkelä.
2019-02-19 18:50:19 +02:00
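To illustrate the idea of a single checksum over the entire page, excluding only the trailing bytes where the checksum itself is stored, here is a small sketch. It uses a plain bitwise CRC-32C rather than any hardware-accelerated routine; the function names are hypothetical, and storing the value in big-endian byte order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
inline uint32_t crc32c(const uint8_t* data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int k = 0; k < 8; k++) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Compute the checksum over all bytes except the last 4 and store it there
// (big-endian byte order is an assumption of this sketch).
inline void store_checksum(std::vector<uint8_t>& page) {
  assert(page.size() >= 4);
  const size_t n = page.size() - 4;
  const uint32_t c = crc32c(page.data(), n);
  page[n] = static_cast<uint8_t>(c >> 24);
  page[n + 1] = static_cast<uint8_t>(c >> 16);
  page[n + 2] = static_cast<uint8_t>(c >> 8);
  page[n + 3] = static_cast<uint8_t>(c);
}

// Validate without interpreting the payload in any way.
inline bool verify_checksum(const std::vector<uint8_t>& page) {
  assert(page.size() >= 4);
  const size_t n = page.size() - 4;
  const uint32_t stored = (uint32_t(page[n]) << 24) |
                          (uint32_t(page[n + 1]) << 16) |
                          (uint32_t(page[n + 2]) << 8) | uint32_t(page[n + 3]);
  return stored == crc32c(page.data(), n);
}
```

Because the checksum covers the bytes exactly as they are written to the file, validation of this kind needs neither decryption nor decompression.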
Marko Mäkelä
9f56dd7382 Merge 10.3 into 10.4 2019-02-11 17:55:25 +02:00
Marko Mäkelä
4e7ee166a9 MDEV-18295 IMPORT TABLESPACE fails with instant-altered tables
When importing a tablespace, we must initialize dummy DEFAULT NULL
values for any instantly added columns in order to avoid a debug
assertion failure when PageConverter::update_records() invokes
rec_get_offsets(). Finally, when the operation completes, we must
evict and reload the table definition, so that the correct
default values for instantly added columns will be loaded.

ha_innobase::discard_or_import_tablespace(): On successful
IMPORT TABLESPACE, evict and reload the table definition,
so that btr_cur_instant_init() will load the correct metadata.

PageConverter::update_index_page(): Fill in dummy DEFAULT NULL values
for instantly added columns. These will be replaced upon the
completion of the operation by evicting and reloading the metadata.

row_discard_tablespace(): Invoke dict_table_t::remove_instant().
After DISCARD TABLESPACE, the table is no longer in "instant ALTER"
format, because there is no data file attached.
2019-02-11 14:42:48 +02:00
Marko Mäkelä
0a1c3477bf MDEV-18493 Remove page_size_t
MySQL 5.7 introduced the class page_size_t and increased the size of
buffer pool page descriptors by introducing this object to them.

Maybe the intention of this exercise was to prepare for a future
where the buffer pool could accommodate multiple page sizes.
But that future never arrived, not even in MySQL 8.0. It is much
easier to manage a pool of a single page size, and typically all
storage devices of an InnoDB instance benefit from using the same
page size.

Let us remove page_size_t from MariaDB Server. This will make it
easier to remove support for ROW_FORMAT=COMPRESSED (or make it a
compile-time option) in the future, just by removing various
occurrences of zip_size.
2019-02-07 12:21:35 +02:00
Marko Mäkelä
b5763ecd01 Merge 10.3 into 10.4 2018-12-18 11:33:53 +02:00
Marko Mäkelä
45531949ae Merge 10.2 into 10.3 2018-12-18 09:15:41 +02:00
Marko Mäkelä
7d245083a4 Merge 10.1 into 10.2 2018-12-17 20:15:38 +02:00
Marko Mäkelä
8c43f96388 Follow-up to MDEV-12112: corruption in encrypted table may be overlooked
The initial fix only covered a part of Mariabackup.
This fix hardens InnoDB and XtraDB in a similar way, in order
to reduce the probability of mistaking a corrupted encrypted page
for a valid unencrypted one.

This is based on work by Thirunarayanan Balathandayuthapani.

fil_space_verify_crypt_checksum(): Assert that key_version!=0.
Let the callers guarantee that. Now that we have this assertion,
we also know that buf_page_is_zeroes() cannot hold.
Also, remove all diagnostic output and related parameters,
and let the relevant callers emit such messages.
Last but not least, validate the post-encryption checksum
according to the innodb_checksum_algorithm (only accepting
one checksum for the strict variants), and no longer
try to validate the page as if it was unencrypted.

buf_page_is_zeroes(): Move to the compilation unit of the only callers,
and declare static.

xb_fil_cur_read(), buf_page_check_corrupt(): Add a condition before
calling fil_space_verify_crypt_checksum(). This is a non-functional
change.

buf_dblwr_process(): Validate the page only as encrypted or unencrypted,
but not both.
2018-12-17 19:33:44 +02:00
Marko Mäkelä
0abd2766b1 Merge 10.2 into 10.3
Also, related to MDEV-15522, MDEV-17304, MDEV-17835,
remove the Galera xtrabackup tests, because xtrabackup never worked
with MariaDB Server 10.3 due to InnoDB redo log format changes.
2018-11-30 09:38:56 +02:00
Marko Mäkelä
447e493179 Remove some unnecessary InnoDB #include 2018-11-29 12:53:44 +02:00
Marko Mäkelä
7dcbc33db5 Merge 10.3 into 10.4 2018-11-26 17:20:07 +02:00
Marko Mäkelä
06e5f28f9f MDEV-12266: Remove a level of pointer indirection
Replace table->space->id with table->space_id.
2018-11-22 17:10:26 +02:00
Marko Mäkelä
4be0855cf5 MDEV-17794 Do not assign persistent ID for temporary tables
InnoDB in MySQL 5.7 introduced two new parameters to the function
dict_hdr_get_new_id(), to allow redo logging to be disabled when
assigning identifiers to temporary tables or during the
backup-unfriendly TRUNCATE TABLE that was replaced in MariaDB
by MDEV-13564.

Now that MariaDB 10.4.0 removed the crash recovery code for the
backup-unfriendly TRUNCATE, we can revert dict_hdr_get_new_id()
to be used only for persistent data structures.

dict_table_assign_new_id(): Remove. This was a simple 2-line function
that was called from a few places.

dict_table_open_on_id_low(): Declare in the only file where it
is called.

dict_sys_t::temp_id_hash: A separate lookup table for temporary tables.
Table names will be in the common dict_sys_t::table_hash.

dict_sys_t::get_temporary_table_id(): Assign a temporary table ID.

dict_sys_t::get_table(): Look up a persistent table.

dict_sys_t::get_temporary_table(): Look up a temporary table.

dict_sys_t::temp_table_id: The sequence of temporary table identifiers.
Starts from DICT_HDR_FIRST_ID, so that we can continue to simply compare
dict_table_t::id to a few constants for the persistent hard-coded
data dictionary tables.

undo_node_t::state: Distinguish temporary and persistent tables.

lock_check_dict_lock(), lock_get_table_id(): Assert that there cannot
be locks on temporary tables.

row_rec_to_index_entry_impl(): Assert that there cannot be metadata
records on temporary tables.

row_undo_ins_parse_undo_rec(): Distinguish temporary and persistent tables.
Move some assertions from the only caller. Return whether the table was
found.

row_undo_ins(): Add some assertions.

row_undo_mod_clust(), row_undo_mod(): Do not assign node->state.
Let row_undo() do that.

row_undo_mod_parse_undo_rec(): Distinguish temporary and persistent tables.
Move some assertions from the only caller. Return whether the table was
found.

row_undo_try_truncate(): Renamed and simplified from trx_roll_try_truncate().

row_undo_rec_get(): Replaces trx_roll_pop_top_rec_of_trx() and
trx_roll_pop_top_rec(). Fetch an undo log record, and assign undo->state
accordingly.

trx_undo_truncate_end(): Acquire the rseg->mutex only for the minimum
required duration, and release it between mini-transactions.
2018-11-22 15:42:52 +02:00
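The split between persistent and temporary table identifiers described in the commit above might be sketched as follows. The class and struct names are hypothetical, the value 10 for the DICT_HDR_FIRST_ID stand-in is an assumption, and the common name lookup (dict_sys_t::table_hash) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using table_id_t = uint64_t;
constexpr table_id_t kFirstId = 10;  // stand-in for DICT_HDR_FIRST_ID (assumed)

struct TableSketch {
  table_id_t id;
  std::string name;
  bool temporary;
};

// Hypothetical reduction of the dictionary cache to its two ID lookups.
class DictCacheSketch {
  std::unordered_map<table_id_t, TableSketch> table_id_hash_;  // persistent
  std::unordered_map<table_id_t, TableSketch> temp_id_hash_;   // temporary
  table_id_t temp_table_id_ = kFirstId;  // in-memory sequence, never logged

public:
  // Assign a temporary table ID without touching any persistent counter.
  table_id_t get_temporary_table_id() { return temp_table_id_++; }

  void add(const TableSketch& t) {
    (t.temporary ? temp_id_hash_ : table_id_hash_)[t.id] = t;
  }

  // Look up a persistent table; returns nullptr if not cached.
  const TableSketch* get_table(table_id_t id) const {
    auto it = table_id_hash_.find(id);
    return it == table_id_hash_.end() ? nullptr : &it->second;
  }

  // Look up a temporary table; returns nullptr if not cached.
  const TableSketch* get_temporary_table(table_id_t id) const {
    auto it = temp_id_hash_.find(id);
    return it == temp_id_hash_.end() ? nullptr : &it->second;
  }
};
```

Since the two ID spaces are looked up separately, a temporary table and a persistent table may carry the same numeric ID without conflict.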
Marko Mäkelä
dde2ca4aa1 Merge 10.3 into 10.4 2018-11-19 20:22:33 +02:00
Marko Mäkelä
fd58bb71e2 Merge 10.2 into 10.3 2018-11-19 18:45:53 +02:00
Marko Mäkelä
ff88e4bb8a Remove many redundant #include from InnoDB 2018-11-19 11:42:14 +02:00
Marko Mäkelä
eea0c3c3e7 MDEV-17750: Remove unnecessary rec_get_offsets() in IMPORT TABLESPACE
row_import_set_sys_max_row_id(): Change the return type to void,
and access the first column (DB_ROW_ID) directly.
2018-11-16 20:37:00 +02:00
Marko Mäkelä
074c684099 Merge 10.3 into 10.4 2018-11-06 16:24:16 +02:00
Marko Mäkelä
df563e0c03 Merge 10.2 into 10.3
main.derived_cond_pushdown: Move all 10.3 tests to the end,
trim trailing white space, and add an "End of 10.3 tests" marker.
Add --sorted_result to tests where the ordering is not deterministic.

main.win_percentile: Add --sorted_result to tests where the
ordering is no longer deterministic.
2018-11-06 09:40:39 +02:00
Sergei Golubchik
44f6f44593 Merge branch '10.0' into 10.1 2018-10-30 15:10:01 +01:00
Sergei Golubchik
87d852f102 Merge branch 'merge/merge-innodb-5.6' into 10.0 2018-10-28 01:22:18 +02:00
Sergei Golubchik
da34c7de5d 5.6.42 2018-10-27 21:05:16 +02:00
Eugene Kosov
14be814380 MDEV-17491 micro optimize page_id_t
page_id_t: remove m_fold member

various places: pass page_id_t by value instead of by reference
2018-10-25 18:46:27 +03:00
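A sketch of the kind of change the commit above describes: a page identifier that fits in 8 bytes, is passed by value, and computes its fold value on demand instead of caching it in a member. The class name PageIdSketch and the fold formula are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction of page_id_t without a cached m_fold member.
class PageIdSketch {
  uint32_t space_;
  uint32_t page_no_;

public:
  constexpr PageIdSketch(uint32_t space, uint32_t page_no)
      : space_(space), page_no_(page_no) {}

  constexpr uint32_t space() const { return space_; }
  constexpr uint32_t page_no() const { return page_no_; }

  // Computed when needed; the formula is an assumption of this sketch.
  constexpr uint64_t fold() const {
    return (uint64_t(space_) << 20) + space_ + page_no_;
  }

  constexpr bool operator==(PageIdSketch o) const {
    return space_ == o.space_ && page_no_ == o.page_no_;
  }
};

// Two 32-bit fields: small enough to travel in a register.
static_assert(sizeof(PageIdSketch) == 8, "expected an 8-byte identifier");

// Pass by value: no pointer dereference is needed in the callee.
inline bool same_space(PageIdSketch a, PageIdSketch b) {
  return a.space() == b.space();
}
```

With no cached member to keep in sync, the object is trivially copyable, which is what makes passing it by value cheap.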
Marko Mäkelä
0e5a4ac253 MDEV-15662 Instant DROP COLUMN or changing the order of columns
Allow ADD COLUMN anywhere in a table, not only adding as the
last column.

Allow instant DROP COLUMN and instant changing the order of columns.

The added columns will always be added last in clustered index records.
In new records, instantly dropped columns will be stored as NULL or
empty when possible.

Information about dropped and reordered columns will be written in
a metadata BLOB (mblob), which is stored before the first 'user' field
in the hidden metadata record at the start of the clustered index.
The presence of mblob is indicated by setting the delete-mark flag in
the metadata record.

The metadata BLOB stores the number of clustered index fields,
followed by an array of column information for each field.
For dropped columns, we store the NOT NULL flag, the fixed length,
and for variable-length columns, whether the maximum length exceeded
255 bytes. For non-dropped columns, we store the column position.

Unlike with MDEV-11369, when a table becomes empty, it cannot
be converted back to the canonical format. The reason for this is
that other threads may hold cached objects such as
row_prebuilt_t::ins_node that could refer to dropped or reordered
index fields.

For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC,
we must store the n_core_null_bytes in the root page, so that the
chain of node pointer records can be followed in order to reach the
leftmost leaf page where the metadata record is located.
If the mblob is present, we will zero-initialize the strings
"infimum" and "supremum" in the root page, and use the last byte of
"supremum" for storing the number of null bytes (which are allocated
but useless on node pointer pages). This is necessary for
btr_cur_instant_init_metadata() to be able to navigate to the mblob.

If the PRIMARY KEY contains any variable-length column and some
nullable columns were instantly dropped, the dict_index_t::n_nullable
in the data dictionary could be smaller than it actually is in the
non-leaf pages. Because of this, the non-leaf pages could use more
bytes for the null flags than the data dictionary expects, and we
could be reading the lengths of the variable-length columns from the
wrong offset, and thus reading the child page number from the wrong place.
This is the result of two design mistakes that involve unnecessary
storage of data: First, it is nonsense to store any data fields for
the leftmost node pointer records, because the comparisons would be
resolved by the MIN_REC_FLAG alone. Second, there cannot be any null
fields in the clustered index node pointer fields, but we nevertheless
reserve space for all the null flags.

Limitations (future work):

MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
MDEV-17468 Avoid table rebuild on operations on generated columns
MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large

btr_page_reorganize_low(): Preserve any metadata in the root page.
Call lock_move_reorganize_page() only after restoring the "infimum"
and "supremum" records, to avoid a memcmp() assertion failure.

dict_col_t::DROPPED: Magic value for dict_col_t::ind.

dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant().
Do not assert that the column was instantly added, because we
sometimes call this unconditionally for all columns.
Convert an instantly added column to a "core column". The old name
remove_instant() could be mistaken to refer to "instant DROP COLUMN".

dict_col_t::is_added(): Renamed from dict_col_t::is_instant().

dtype_t::metadata_blob_init(): Initialize the mblob data type.

dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(),
upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits
refer to a metadata record.

dict_table_t::instant: Metadata about dropped or reordered columns.

dict_table_t::prepare_instant(): Prepare
ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE.
innobase_instant_try() will pass this to dict_table_t::instant_column().
On rollback, dict_table_t::rollback_instant() will be called.

dict_table_t::instant_column(): Renamed from instant_add_column().
Add the parameter col_map so that columns can be reordered.
Copy and adjust v_cols[] as well.

dict_table_t::find(): Find an old column based on a new column number.

dict_table_t::serialise_columns(), dict_table_t::deserialise_columns():
Convert the mblob.

dict_index_t::instant_metadata(): Create the metadata record
for instant ALTER TABLE. Invoke dict_table_t::serialise_columns().

dict_index_t::reconstruct_fields(): Invoked by
dict_table_t::deserialise_columns().

dict_index_t::clear_instant_alter(): Move the fields for the
dropped columns to the end, and sort the surviving index fields
in ascending order of column position.

ha_innobase::check_if_supported_inplace_alter(): Do not allow
adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists
due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.)

instant_alter_column_possible(): Add a parameter for InnoDB table,
to check for additional conditions, such as the maximum number of
index fields.

ha_innobase_inplace_ctx::first_alter_pos: The first column whose position
is affected by instant ADD, DROP, or changing the order of columns.

innobase_build_col_map(): Skip added virtual columns.

prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol.
Remove some unnecessary code. Note that the call to
innodb_base_col_setup() should be executed later.

commit_try_norebuild(): If ctx->is_instant(), let the virtual
columns be added or dropped by innobase_instant_try().

innobase_instant_try(): Fill in a zero default value for the
hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459).
If any columns were dropped or reordered (or added not last),
delete any SYS_COLUMNS records for the following columns, and
insert SYS_COLUMNS records for all subsequent stored columns as well
as for all virtual columns. If any virtual column is dropped, rewrite
all virtual column metadata. Use a shortcut only for adding
virtual columns. This is because innobase_drop_virtual_try()
assumes that the dropped virtual columns still exist in ctx->old_table.

innodb_update_cols(): Renamed from innodb_update_n_cols().

innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change
the return type to bool, and invoke my_error() when detecting an error.

innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS.
Refactored from innobase_add_one_virtual() and innobase_instant_add_col().

innobase_instant_add_col(): Replace the parameter dfield with type.

innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS
and all columns from SYS_VIRTUAL.

innobase_add_virtual_try(), innobase_drop_virtual_try(): Let
the caller invoke innodb_update_cols().

innobase_rename_column_try(): Skip dropped columns.

commit_cache_norebuild(): Update table->fts->doc_col.

dict_mem_table_col_rename_low(): Skip dropped columns.

trx_undo_rec_get_partial_row(): Skip dropped columns.

trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly.

trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields.
Log metadata records consistently.
Apparently, the first fields of a clustered index may be updated
in an update_undo vector when the index is ID_IND of SYS_FOREIGN,
as part of renaming the table during ALTER TABLE. Normally, updates of
the PRIMARY KEY should be logged as delete-mark and an insert.

row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec():
Use trx_undo_metadata.

row_undo_mod_clust_low(): On metadata rollback, roll back the root page too.

row_undo_mod_clust(): Relax an assertion. The delete-mark flag was
repurposed for ALTER TABLE metadata records.

row_rec_to_index_entry_impl(): Add the template parameter mblob
and the optional parameter info_bits for specifying the desired new
info bits. For the metadata tuple, allow conversion between the original
format (ADD COLUMN only) and the generic format (with hidden BLOB).
Add the optional parameter "pad" to determine whether the tuple should
be padded to the index fields (on ALTER TABLE it should), or whether
it should remain at its original size (on rollback).

row_build_index_entry_low(): Clean up the code, removing
redundant variables and conditions. For instantly dropped columns,
generate a dummy value that is NULL, the empty string, or a
fixed length of NUL bytes, depending on the type of the dropped column.

row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY
of a record that contained a dropped column whose value was stored
externally, we will be inserting a dummy NULL or empty string value
to the field of the dropped column. The externally stored column would
eventually be dropped when purge removes the delete-marked record for
the old PRIMARY KEY value.

btr_index_rec_validate(): Recognize the metadata record.

btr_discard_only_page_on_level(): Preserve the generic instant
ALTER TABLE metadata.

btr_set_instant(): Replaces page_set_instant(). This sets a clustered
index root page to the appropriate format, or upgrades from
the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format.

btr_cur_instant_init_low(): Read and validate the metadata BLOB page
before reconstructing the dictionary information based on it.

btr_cur_instant_init_metadata(): Do not read any lengths from the
metadata record header before reading the BLOB. At this point, we
would not actually know how many nullable fields the metadata record
contains.

btr_cur_instant_root_init(): Initialize n_core_null_bytes in one
of two possible ways.

btr_cur_trim(): Handle the mblob record.

row_metadata_to_tuple(): Convert a metadata record to a data tuple,
based on the new info_bits of the metadata record.

btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed.
Invoke dtuple_convert_big_rec() for metadata records if the record is
too large, or if the mblob is not yet marked as externally stored.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
When the last user record is deleted, do not delete the
generic instant ALTER TABLE metadata record. Only delete
MDEV-11369 instant ADD COLUMN metadata records.

btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size.

btr_pcur_store_position(): Allow a logically empty page to contain
a metadata record for generic ALTER TABLE.

REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW.
This is for the old instant ADD COLUMN (MDEV-11369) only.

REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record,
with additional information for dropped or reordered columns.

rec_info_bits_valid(): Remove. The only case when this would fail
is when the record is the generic ALTER TABLE metadata record.

rec_is_alter_metadata(): Check if a record is the metadata record
for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function
must not be invoked on node pointer records, because the delete-mark
flag in those records may be set (it is garbage), and then a debug
assertion could fail because index->is_instant() does not necessarily
hold.

rec_is_add_metadata(): Check if a record is the MDEV-11369 ADD COLUMN metadata
record (not the more generic instant ALTER TABLE metadata record).

rec_get_converted_size_comp_prefix_low(): Assume that the metadata
field will be stored externally. In dtuple_convert_big_rec() during
the rec_get_converted_size() call, it would not be there yet.

rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple.

rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(),
rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>.
With mblob=true, process a record with a metadata BLOB.

rec_copy_prefix_to_buf(): Assert that no fields beyond the key and
system columns are being copied. Exclude the metadata BLOB field.

rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple
into a record.

row_upd_index_replace_metadata(): Apply an update vector to an
alter_metadata tuple.

row_log_allocate(): Replace dict_index_t::is_instant()
with a more appropriate condition that ignores dict_table_t::instant.
Only a table on which the MDEV-11369 ADD COLUMN was performed
can "lose its instantness" when it becomes empty. After
instant DROP COLUMN or reordering columns, we cannot simply
convert the table to the canonical format, because the data
dictionary cache and all possibly existing references to it
from other client connection threads would have to be adjusted.

row_quiesce_write_index_fields(): Do not crash when the table contains
an instantly dropped column.

Thanks to Thirunarayanan Balathandayuthapani for discussing the design
and implementing an initial prototype of this.
Thanks to Matthias Leich for testing.
2018-10-19 18:57:23 +03:00
Marko Mäkelä
755187c853 Terminology: 'metadata record' instead of 'default row'
For instant ALTER TABLE, we store a hidden metadata record at the
start of the clustered index, to indicate how the format of the
records differs from the latest table definition.

The term 'default row' is too specific, because it applies to
instant ADD COLUMN only, and we will be supporting more classes
of instant ALTER TABLE later on. For instant ADD COLUMN, we
store the initial default values in the metadata record.
2018-09-19 07:21:24 +03:00
Marko Mäkelä
7830fb7f45 Merge 10.2 into 10.3 2018-08-28 12:22:56 +03:00
Marko Mäkelä
cccdb176a6 MDEV-16862 build failure for WITH_INNODB_AHI=0
Fix the build, which was broken by MDEV-16515.
2018-08-21 12:10:18 +03:00
Marko Mäkelä
05459706f2 Merge 10.2 into 10.3 2018-08-03 15:57:23 +03:00
Marko Mäkelä
814ae57daf Merge 10.1 into 10.2 2018-08-03 13:02:56 +03:00
Marko Mäkelä
0d3972c6be Merge 10.0 into 10.1 2018-08-03 12:03:10 +03:00
Marko Mäkelä
9dfef6e29b Fix -Wclass-memaccess warnings in InnoDB,XtraDB 2018-08-03 11:53:57 +03:00
Marko Mäkelä
ef3070e997 Merge 10.1 into 10.2 2018-08-02 08:19:57 +03:00