mirror of https://github.com/MariaDB/server.git synced 2025-05-25 13:42:52 +03:00

23 Commits

Author SHA1 Message Date
Sergei Golubchik
f71d7f2f0f Merge branch '10.5' into 10.6 2024-03-13 21:02:34 +01:00
Thirunarayanan Balathandayuthapani
57cc8605eb MDEV-19044 Alter table corrupts while applying the modification log
Problem:
========
- InnoDB reads the length of the variable-length field incorrectly
while applying the modification log of an instant table.

Solution:
========
rec_init_offsets_comp_ordinary(): For the temporary file record of
an instant table, InnoDB should read the length of the variable-length
field from the record itself.
2024-02-27 12:59:46 +05:30
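A minimal SQL sketch of the kind of scenario this fix concerns, with illustrative table and column names: an instantly altered table is later rebuilt online, and concurrent changes are applied from the modification (row) log.

    CREATE TABLE t (id INT PRIMARY KEY, v VARCHAR(300)) ENGINE=InnoDB;
    INSERT INTO t VALUES (1, REPEAT('a', 300));
    ALTER TABLE t ADD COLUMN c VARCHAR(300) DEFAULT '', ALGORITHM=INSTANT;
    -- a subsequent online rebuild applies concurrent changes from the
    -- modification log; that apply step misread the variable-length field
    ALTER TABLE t FORCE, ALGORITHM=INPLACE, LOCK=NONE;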
Marko Mäkelä
cd04673a17 MDEV-32050 fixup: innodb.instant_alter_crash (take 2)
We must disable persistent statistics, because a transaction commit
from dict_stats_save() would occasionally interfere with this test.
2023-11-20 16:57:57 +02:00
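For reference, persistent statistics can be disabled either per table or for the whole server; a brief sketch of both forms (the exact mechanism used by the test is not shown here):

    CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=0;
    -- or, for all InnoDB tables:
    SET GLOBAL innodb_stats_persistent=OFF;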
Marko Mäkelä
55a96c055a MDEV-32050 fixup: innodb.instant_alter_crash
This test occasionally fails with a failure to purge history.
Let us try to purge everything before starting the interesting part,
to make that occasional failure go away.
2023-11-16 16:39:02 +02:00
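One way to force a full purge from SQL before the interesting part, assuming the innodb_max_purge_lag_wait system variable of this server series (setting it blocks until the purge lag has dropped to the given value):

    SET GLOBAL innodb_max_purge_lag_wait=0;  -- wait until the history list is empty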
Marko Mäkelä
14685b10df MDEV-32050: Deprecate&ignore innodb_purge_rseg_truncate_frequency
The motivation of introducing the parameter
innodb_purge_rseg_truncate_frequency in
mysql/mysql-server@28bbd66ea5 and
mysql/mysql-server@8fc2120fed
seems to have been to avoid stalls due to freeing undo log pages
or truncating undo log tablespaces. In MariaDB Server,
innodb_undo_log_truncate=ON should be a much lighter operation
than in MySQL, because it will not involve any log checkpoint.

Another source of performance stalls should be
trx_purge_truncate_rseg_history(), which is shrinking the history list
by freeing the undo log pages whose undo records have been purged.
To alleviate that, we will introduce a purge_truncation_task that will
offload this from the purge_coordinator_task. In that way, the next
innodb_purge_batch_size pages may be parsed and purged while the pages
from the previous batch are being freed and the history list is shrunk.

The processing of innodb_undo_log_truncate=ON will still remain the
responsibility of the purge_coordinator_task.

purge_coordinator_state::count: Remove. We will ignore
innodb_purge_rseg_truncate_frequency, and act as if it had been
set to 1 (the maximum shrinking frequency).

purge_coordinator_state::do_purge(): Invoke an asynchronous task
purge_truncation_callback() to free the undo log pages.

purge_sys_t::iterator::free_history(): Free those undo log pages
that have been processed. This used to be a part of
trx_purge_truncate_history().

purge_sys_t::clone_end_view(): Take a new value of purge_sys.head
as a parameter, so that it will be updated while holding exclusive
purge_sys.latch. This is needed for race-free access to the field
in purge_truncation_callback().

Reviewed by: Vladislav Lesin
2023-10-25 09:11:58 +03:00
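The user-visible effect is that the old parameter is still accepted but ignored, while undo tablespace truncation remains controlled by the existing variables; a sketch (innodb_max_undo_log_size, not mentioned above, is the size threshold that triggers truncation):

    -- deprecated; accepted but ignored (always acts as if set to 1):
    SET GLOBAL innodb_purge_rseg_truncate_frequency=128;
    -- undo tablespace truncation remains controlled by:
    SET GLOBAL innodb_undo_log_truncate=ON;
    SET GLOBAL innodb_max_undo_log_size=16777216;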
Marko Mäkelä
5bada1246d Merge 10.5 into 10.6 2023-04-11 16:15:19 +03:00
Thirunarayanan Balathandayuthapani
dfdcd7ffab MDEV-26198 Assertion `0' failed in row_log_table_apply_op during
redundant table rebuild

- InnoDB ALTER fails to apply the online log during a redundant table
rebuild. The problem is that InnoDB reads the length flags of the
record incorrectly while applying the temporary log record.

rec_init_offsets_comp_ordinary(): For finding the n_core_null_bytes,
InnoDB should use the same logic as rec_convert_dtuple_to_rec_comp().
2023-03-14 13:34:23 +05:30
Marko Mäkelä
e572c745dc MDEV-29504/MDEV-29849 TRUNCATE breaks FOREIGN KEY locking
ha_innobase::referenced_by_foreign_key(): Protect the check with
dict_sys.freeze(), to prevent races with TRUNCATE TABLE.
The test innodb.instant_alter_crash has been adjusted for this
additional locking.

dict_table_is_referenced_by_foreign_key(): Removed (merged to
the only caller).

create_table_info_t::create_table(): Ignore missing indexes for
FOREIGN KEY constraints if foreign_key_checks=0.

create_table_info_t::create_table_update_dict(): Rewritten as
a static function. Do not return any error.

ha_innobase::create(): When trx!=nullptr and we are operating
on a persistent table, do not rollback, commit, or release the
data dictionary latch.

ha_innobase::truncate(): Protect the entire critical section
with an exclusive dict_sys.latch, so that
ha_innobase::referenced_by_foreign_key() on referenced tables
will return a consistent result. In case of a failure,
invoke dict_load_foreigns() to also restore any FOREIGN KEY
constraints.

ha_innobase::free_foreign_key_create_info(): Define inline.

lock_release(): Disregard innodb_evict_tables_on_commit_debug=ON
when dict_sys.locked() holds. It would hold when fts_load_stopword()
is invoked by create_table_info_t::create_table_update_dict().

dict_sys_t::locked(): Return whether the current thread is holding
the exclusive dict_sys.latch.

dict_sys_t::frozen_not_locked(): Return whether any thread is
holding a shared dict_sys.latch.

In the test main.mysql_upgrade, the InnoDB persistent statistics
will no longer be recalculated in ha_innobase::open() as part of
CHECK TABLE ... FOR UPGRADE. They were deleted earlier in the test.

Tested by: Matthias Leich
2022-11-08 17:34:34 +02:00
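A sketch of the user-visible behaviour around this area, with illustrative table names: a table referenced by a FOREIGN KEY cannot be truncated, and with foreign_key_checks=0 a referencing table may be created even though its referenced table does not exist.

    CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=InnoDB;
    CREATE TABLE child (id INT PRIMARY KEY, pid INT,
                        FOREIGN KEY (pid) REFERENCES parent(id)) ENGINE=InnoDB;
    TRUNCATE TABLE parent;   -- refused: parent is referenced by child
    TRUNCATE TABLE child;    -- fine: child references, but is not referenced
    SET foreign_key_checks = 0;
    -- with checks disabled, the FOREIGN KEY target need not exist yet
    CREATE TABLE child2 (id INT PRIMARY KEY, pid INT,
                         FOREIGN KEY (pid) REFERENCES no_such_table(id)) ENGINE=InnoDB;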
Marko Mäkelä
d2e649aec2 MDEV-29440 InnoDB instant ALTER TABLE recovery must use READ UNCOMMITTED
In commit 8f8ba758559e473f643baa0a0601d321c42517b9 (MDEV-27234)
the data dictionary recovery was changed to use READ COMMITTED
so that table-rebuild operations (OPTIMIZE TABLE, TRUNCATE TABLE,
some forms of ALTER TABLE) would be recovered correctly.

However, for operations that avoid a table rebuild thanks to
being able to instantly ADD, DROP or reorder columns, recovery
must use the READ UNCOMMITTED isolation level so that changes to
the hidden metadata record can be rolled back.

We will detect instant operations by looking for uncommitted changes
to SYS_COLUMNS when there is no uncommitted change of SYS_TABLES.ID
for the table. In any table-rebuilding DDL operation, the SYS_TABLES.ID
(and likely also the table name) will be updated.

As part of rolling back the instant ALTER TABLE operation, after the
operation on the hidden metadata record has been rolled back, a rollback
of an INSERT into SYS_COLUMNS in row_undo_ins_remove_clust_rec() will
invoke trx_t::evict_table() to discard the READ UNCOMMITTED definition
of the table. After that, subsequent recovery steps will load and use
the correct table definition.

Reviewed by: Thirunarayanan Balathandayuthapani
Tested by: Matthias Leich
2022-09-08 14:57:50 +03:00
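The distinction this message relies on (instant ALTER keeps the table, a rebuild creates a new one) can be observed from SQL, since a rebuild assigns a new SYS_TABLES.ID. A sketch using the INFORMATION_SCHEMA.INNODB_SYS_TABLES view; database and table names are illustrative:

    CREATE TABLE t (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
    SELECT TABLE_ID FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test/t';
    ALTER TABLE t ADD COLUMN c INT, ALGORITHM=INSTANT;   -- TABLE_ID unchanged
    ALTER TABLE t FORCE, ALGORITHM=INPLACE;              -- rebuild: new TABLE_ID
    SELECT TABLE_ID FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test/t';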
Marko Mäkelä
bdae5508b8 Cleanup: Remove a test hack
Merge commit 2b6f8044903dc974c32e071bc6a7c4099481ae80 introduced
the debug injection point dict_sys_mutex_avoid to make the test
innodb.instant_alter_crash work even after the MDEV-23991 fix
(commit afc9d00c66db946c8240fe1fa6b345a3a8b6fec1).

Thanks to DDL being atomic and crash-safe in MariaDB 10.6
(mainly thanks to commit 7762ee5dbec5c336628c06bbe950837257276e57)
we do not actually need this hack anymore.
2021-07-26 14:51:19 +03:00
Monty
7762ee5dbe MDEV-25180 Atomic ALTER TABLE
MDEV-25604 Atomic DDL: Binlog event written upon recovery does not
           have default database

The purpose of this task is to ensure that ALTER TABLE is atomic even if
the MariaDB server would be killed at any point of the alter table.
This means that either the ALTER TABLE succeeds (including that triggers,
the status tables and the binary log are updated) or things should be
reverted to their original state.

If the server crashes before the new version is fully up to date and
committed, it will revert to the original table and remove all
temporary files and tables.
If the new version is committed, crash recovery will use the new version
and update the triggers, the status tables and the binary log.
The one exception is ALTER TABLE .. RENAME .. where no changes are made
to the table definition. This works like RENAME and is rolled back unless
the whole statement completed, including updating the binary log (if
enabled).

Other changes:
- Added handlerton->check_version() function to allow the ddl recovery
  code to check, in case of inplace alter table, if the table in the
  storage engine is of the new or old version.
- Added handler->table_version() so that an engine can report the current
  version of the table. This should be changed each time the table
  definition changes.
- Added  ha_signal_ddl_recovery_done() and
  handlerton::signal_ddl_recovery_done() to inform all handlers when
  ddl recovery has been done. (Needed by InnoDB).
- Added handlerton call inplace_alter_table_committed, to signal engine
  that ddl_log has been closed for the alter table query.
- Added new handlerton flag
  HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we
  should call hton->notify_tabledef_changed() during
  mysql_inplace_alter_table. This was required as MyRocks and InnoDB
  needed the call at different times.
- Added function server_uuid_value() to be able to generate a temporary
  xid when ddl recovery writes the query to the binary log. This is
  needed to be able to handle crashes during ddl log recovery.
- Moved freeing of the frm definition to end of mysql_alter_table() to
  remove duplicate code and have a common exit strategy.

-------
InnoDB part of atomic ALTER TABLE
(Implemented by Marko Mäkelä)
innodb_check_version(): Compare the saved dict_table_t::def_trx_id
to determine whether an ALTER TABLE operation was committed.

We must correctly recover dict_table_t::def_trx_id for this to work.
Before purge removes any trace of DB_TRX_ID from system tables, it
will make an effort to load the user table into the cache, so that
the dict_table_t::def_trx_id can be recovered.

ha_innobase::table_version(): return garbage, or the trx_id that would
be used for committing an ALTER TABLE operation.

In InnoDB, table names starting with #sql-ib will remain special:
they will be dropped on startup. This may be revisited later in
MDEV-18518 when we implement proper undo logging and rollback
for creating or dropping multiple tables in a transaction.

Table names starting with #sql will retain some special meaning:
dict_table_t::parse_name() will not consider such names for
MDL acquisition, and dict_table_rename_in_cache() will treat such
names specially when handling FOREIGN KEY constraints.

Simplify InnoDB DROP INDEX.
Prevent purge wakeup

To ensure that dict_table_t::def_trx_id will be recovered correctly
in case the server is killed before ddl_log_complete(), we will block
the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS
between ha_innobase::commit_inplace_alter_table(commit=true)
(purge_sys.stop_SYS()) and purge_sys.resume_SYS().
The completion callback purge_sys.resume_SYS() must be between
ddl_log_complete() and MDL release.

--------

MyRocks support for atomic ALTER TABLE
(Implemented by Sergei Petrunia)

Implement these SE API functions:
- ha_rocksdb::table_version()
- hton->check_version = rocksdb_check_version
  The MyRocks data dictionary now stores a table version for each table.
  (Absence of a table version record is interpreted as table_version=0,
  which means no upgrade changes are needed.)
- For inplace alter table of a partitioned table, call the underlying
  handlerton when checking if the table is ok. This assumes that the
  partition engine commits all changes at once.
2021-05-19 22:54:13 +02:00
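As a user-level illustration of the exception described in this message, with illustrative table names:

    -- equivalent to RENAME TABLE t1 TO t2; rolled back unless the whole
    -- statement, including binary logging, completed
    ALTER TABLE t1 RENAME TO t2;
    -- any other ALTER either commits fully (triggers, status tables and
    -- binary log updated) or is reverted during crash recovery
    ALTER TABLE t2 ADD COLUMN c INT, ALGORITHM=INPLACE;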
Marko Mäkelä
589cf8dbf3 Merge 10.3 into 10.4 2020-12-01 19:51:14 +02:00
Marko Mäkelä
73f34336e3 MDEV-24323 Crash on recovery after kill during instant ADD COLUMN
row_undo_ins_parse_undo_rec(): Do not try to read non-existing
virtual column information for the metadata record.
2020-12-01 15:24:49 +02:00
Marko Mäkelä
7b2bb67113 Merge 10.3 into 10.4 2020-10-29 13:38:38 +02:00
Marko Mäkelä
2b6f804490 Merge 10.2 into 10.3 2020-10-28 10:44:40 +02:00
Oleksandr Byelkin
c07325f932 Merge branch '10.3' into 10.4 2019-05-19 20:55:37 +02:00
Marko Mäkelä
0c405b064d MDEV-18009: Clean up the test case
Avoid accessing the table cache while the ALTER TABLE statement
is blocked by DEBUG_SYNC. Use explicit COMMIT for forcing the
redo log flush (whose main purpose is to ensure that the
incomplete state of the blocked ALTER TABLE statement is persisted).
2019-05-09 12:33:45 +03:00
Marko Mäkelä
646a3bd72d MDEV-18009 Missing redo log flush in innodb.instant_alter_crash
Ensure that the 'auxiliary transactions' that are there for
flushing the incomplete undo log of the to-be-recovered DDL
transactions are actually making modifications.

This is a backport of 2fe40a7af05aaa4ee64afcc82ba7e462eecacaae
from MariaDB 10.4.
2019-05-09 12:33:45 +03:00
Marko Mäkelä
2fe40a7af0 MDEV-18009 Missing redo log flush in innodb.instant_alter_crash
Ensure that the 'auxiliary transactions' that are there for
flushing the incomplete undo log of the to-be-recovered DDL
transactions are actually making modifications.
2018-12-20 18:18:21 +02:00
Marko Mäkelä
0e5a4ac253 MDEV-15662 Instant DROP COLUMN or changing the order of columns
Allow ADD COLUMN anywhere in a table, not only adding as the
last column.

Allow instant DROP COLUMN and instant changing the order of columns.

The added columns will always be added last in clustered index records.
In new records, instantly dropped columns will be stored as NULL or
empty when possible.

Information about dropped and reordered columns will be written in
a metadata BLOB (mblob), which is stored before the first 'user' field
in the hidden metadata record at the start of the clustered index.
The presence of mblob is indicated by setting the delete-mark flag in
the metadata record.

The metadata BLOB stores the number of clustered index fields,
followed by an array of column information for each field.
For dropped columns, we store the NOT NULL flag, the fixed length,
and for variable-length columns, whether the maximum length exceeded
255 bytes. For non-dropped columns, we store the column position.

Unlike with MDEV-11369, when a table becomes empty, it cannot
be converted back to the canonical format. The reason for this is
that other threads may hold cached objects such as
row_prebuilt_t::ins_node that could refer to dropped or reordered
index fields.

For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC,
we must store the n_core_null_bytes in the root page, so that the
chain of node pointer records can be followed in order to reach the
leftmost leaf page where the metadata record is located.
If the mblob is present, we will zero-initialize the strings
"infimum" and "supremum" in the root page, and use the last byte of
"supremum" for storing the number of null bytes (which are allocated
but useless on node pointer pages). This is necessary for
btr_cur_instant_init_metadata() to be able to navigate to the mblob.

If the PRIMARY KEY contains any variable-length column and some
nullable columns were instantly dropped, the dict_index_t::n_nullable
in the data dictionary could be smaller than it actually is in the
non-leaf pages. Because of this, the non-leaf pages could use more
bytes for the null flags than the data dictionary expects, and we
could be reading the lengths of the variable-length columns from the
wrong offset, and thus reading the child page number from wrong place.
This is the result of two design mistakes that involve unnecessary
storage of data: First, it is nonsense to store any data fields for
the leftmost node pointer records, because the comparisons would be
resolved by the MIN_REC_FLAG alone. Second, there cannot be any null
fields in the clustered index node pointer fields, but we nevertheless
reserve space for all the null flags.

Limitations (future work):

MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
MDEV-17468 Avoid table rebuild on operations on generated columns
MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large

btr_page_reorganize_low(): Preserve any metadata in the root page.
Call lock_move_reorganize_page() only after restoring the "infimum"
and "supremum" records, to avoid a memcmp() assertion failure.

dict_col_t::DROPPED: Magic value for dict_col_t::ind.

dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant().
Do not assert that the column was instantly added, because we
sometimes call this unconditionally for all columns.
Convert an instantly added column to a "core column". The old name
remove_instant() could be mistaken to refer to "instant DROP COLUMN".

dict_col_t::is_added(): Rename from dict_col_t::is_instant().

dtype_t::metadata_blob_init(): Initialize the mblob data type.

dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(),
upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits
refer to a metadata record.

dict_table_t::instant: Metadata about dropped or reordered columns.

dict_table_t::prepare_instant(): Prepare
ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE.
innobase_instant_try() will pass this to dict_table_t::instant_column().
On rollback, dict_table_t::rollback_instant() will be called.

dict_table_t::instant_column(): Renamed from instant_add_column().
Add the parameter col_map so that columns can be reordered.
Copy and adjust v_cols[] as well.

dict_table_t::find(): Find an old column based on a new column number.

dict_table_t::serialise_columns(), dict_table_t::deserialise_columns():
Convert the mblob.

dict_index_t::instant_metadata(): Create the metadata record
for instant ALTER TABLE. Invoke dict_table_t::serialise_columns().

dict_index_t::reconstruct_fields(): Invoked by
dict_table_t::deserialise_columns().

dict_index_t::clear_instant_alter(): Move the fields for the
dropped columns to the end, and sort the surviving index fields
in ascending order of column position.

ha_innobase::check_if_supported_inplace_alter(): Do not allow
adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists
due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.)

instant_alter_column_possible(): Add a parameter for InnoDB table,
to check for additional conditions, such as the maximum number of
index fields.

ha_innobase_inplace_ctx::first_alter_pos: The first column whose position
is affected by instant ADD, DROP, or changing the order of columns.

innobase_build_col_map(): Skip added virtual columns.

prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol.
Remove some unnecessary code. Note that the call to
innodb_base_col_setup() should be executed later.

commit_try_norebuild(): If ctx->is_instant(), let the virtual
columns be added or dropped by innobase_instant_try().

innobase_instant_try(): Fill in a zero default value for the
hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459).
If any columns were dropped or reordered (or added not last),
delete any SYS_COLUMNS records for the following columns, and
insert SYS_COLUMNS records for all subsequent stored columns as well
as for all virtual columns. If any virtual column is dropped, rewrite
all virtual column metadata. Use a shortcut only for adding
virtual columns. This is because innobase_drop_virtual_try()
assumes that the dropped virtual columns still exist in ctx->old_table.

innodb_update_cols(): Renamed from innodb_update_n_cols().

innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change
the return type to bool, and invoke my_error() when detecting an error.

innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS.
Refactored from innobase_add_one_virtual() and innobase_instant_add_col().

innobase_instant_add_col(): Replace the parameter dfield with type.

innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS
and all columns from SYS_VIRTUAL.

innobase_add_virtual_try(), innobase_drop_virtual_try(): Let
the caller invoke innodb_update_cols().

innobase_rename_column_try(): Skip dropped columns.

commit_cache_norebuild(): Update table->fts->doc_col.

dict_mem_table_col_rename_low(): Skip dropped columns.

trx_undo_rec_get_partial_row(): Skip dropped columns.

trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly.

trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields.
Log metadata records consistently.
Apparently, the first fields of a clustered index may be updated
in an update_undo vector when the index is ID_IND of SYS_FOREIGN,
as part of renaming the table during ALTER TABLE. Normally, updates of
the PRIMARY KEY should be logged as delete-mark and an insert.

row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec():
Use trx_undo_metadata.

row_undo_mod_clust_low(): On metadata rollback, roll back the root page too.

row_undo_mod_clust(): Relax an assertion. The delete-mark flag was
repurposed for ALTER TABLE metadata records.

row_rec_to_index_entry_impl(): Add the template parameter mblob
and the optional parameter info_bits for specifying the desired new
info bits. For the metadata tuple, allow conversion between the original
format (ADD COLUMN only) and the generic format (with hidden BLOB).
Add the optional parameter "pad" to determine whether the tuple should
be padded to the index fields (on ALTER TABLE it should), or whether
it should remain at its original size (on rollback).

row_build_index_entry_low(): Clean up the code, removing
redundant variables and conditions. For instantly dropped columns,
generate a dummy value that is NULL, the empty string, or a
fixed length of NUL bytes, depending on the type of the dropped column.

row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY
of a record that contained a dropped column whose value was stored
externally, we will be inserting a dummy NULL or empty string value
to the field of the dropped column. The externally stored column would
eventually be dropped when purge removes the delete-marked record for
the old PRIMARY KEY value.

btr_index_rec_validate(): Recognize the metadata record.

btr_discard_only_page_on_level(): Preserve the generic instant
ALTER TABLE metadata.

btr_set_instant(): Replaces page_set_instant(). This sets a clustered
index root page to the appropriate format, or upgrades from
the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format.

btr_cur_instant_init_low(): Read and validate the metadata BLOB page
before reconstructing the dictionary information based on it.

btr_cur_instant_init_metadata(): Do not read any lengths from the
metadata record header before reading the BLOB. At this point, we
would not actually know how many nullable fields the metadata record
contains.

btr_cur_instant_root_init(): Initialize n_core_null_bytes in one
of two possible ways.

btr_cur_trim(): Handle the mblob record.

row_metadata_to_tuple(): Convert a metadata record to a data tuple,
based on the new info_bits of the metadata record.

btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed.
Invoke dtuple_convert_big_rec() for metadata records if the record is
too large, or if the mblob is not yet marked as externally stored.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
When the last user record is deleted, do not delete the
generic instant ALTER TABLE metadata record. Only delete
MDEV-11369 instant ADD COLUMN metadata records.

btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size.

btr_pcur_store_position(): Allow a logically empty page to contain
a metadata record for generic ALTER TABLE.

REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW.
This is for the old instant ADD COLUMN (MDEV-11369) only.

REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record,
with additional information for dropped or reordered columns.

rec_info_bits_valid(): Remove. The only case when this would fail
is when the record is the generic ALTER TABLE metadata record.

rec_is_alter_metadata(): Check if a record is the metadata record
for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function
must not be invoked on node pointer records, because the delete-mark
flag in those records may be set (it is garbage), and then a debug
assertion could fail because index->is_instant() does not necessarily
hold.

rec_is_add_metadata(): Check if a record is MDEV-11369 ADD COLUMN metadata
record (not more generic instant ALTER TABLE).

rec_get_converted_size_comp_prefix_low(): Assume that the metadata
field will be stored externally. In dtuple_convert_big_rec() during
the rec_get_converted_size() call, it would not be there yet.

rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple.

rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(),
rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>.
With mblob=true, process a record with a metadata BLOB.

rec_copy_prefix_to_buf(): Assert that no fields beyond the key and
system columns are being copied. Exclude the metadata BLOB field.

rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple
into a record.

row_upd_index_replace_metadata(): Apply an update vector to an
alter_metadata tuple.

row_log_allocate(): Replace dict_index_t::is_instant()
with a more appropriate condition that ignores dict_table_t::instant.
Only a table on which the MDEV-11369 ADD COLUMN was performed
can "lose its instantness" when it becomes empty. After
instant DROP COLUMN or reordering columns, we cannot simply
convert the table to the canonical format, because the data
dictionary cache and all possibly existing references to it
from other client connection threads would have to be adjusted.

row_quiesce_write_index_fields(): Do not crash when the table contains
an instantly dropped column.

Thanks to Thirunarayanan Balathandayuthapani for discussing the design
and implementing an initial prototype of this.
Thanks to Matthias Leich for testing.
2018-10-19 18:57:23 +03:00
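At the SQL level, the work described above allows statements such as the following to complete without a table rebuild (illustrative names; specifying ALGORITHM=INSTANT makes the server refuse the operation rather than silently fall back to a copy):

    CREATE TABLE t (a INT PRIMARY KEY, b INT, c VARCHAR(100)) ENGINE=InnoDB;
    ALTER TABLE t DROP COLUMN b, ALGORITHM=INSTANT;            -- instant DROP COLUMN
    ALTER TABLE t ADD COLUMN d INT AFTER a, ALGORITHM=INSTANT; -- add not-last (reorder)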
Monty
a7e352b54d Changed database, tablename and alias to be LEX_CSTRING
This was done in, among other things:
- thd->db and thd->db_length
- TABLE_LIST tablename, db, alias and schema_name
- Audit plugin database name
- lex->db
- All db and table names in Alter_table_ctx
- st_select_lex db

Other things:
- Changed a lot of functions to take const LEX_CSTRING* as argument
  for db, table_name and alias. See init_one_table() as an example.
- Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
- Changed some lists from LEX_STRING to LEX_CSTRING
- threads_mysql.result changed because the processlist db column wasn't
  always correctly updated
- New append_identifier() function that takes LEX_CSTRING* as arguments
- Added new element tmp_buff to Alter_table_ctx to separate temp name
  handling from temporary space
- Ensure we store the length after my_casedn_str() of table/db names
- Removed not used version of rename_table_in_stat_tables()
- Changed Natural_join_column::table_name and db_name() to never return
  NULL (used for print)
- thd->get_db() now returns db as a printable string (thd->db.str or "")
2018-01-30 21:33:55 +02:00
Marko Mäkelä
6edba392d8 Work around MDEV-14029 Server does not remove #sql*.frm files after crash during ALTER TABLE
For the two ALTER TABLE statements that are interrupted by killing the
server, manually remove the #sql*.frm files, because server startup
will not remove the files.
2017-10-09 12:55:01 +03:00
Marko Mäkelä
a4948dafcd MDEV-11369 Instant ADD COLUMN for InnoDB
For InnoDB tables, adding, dropping and reordering columns has
required a rebuild of the table and all its indexes. Since MySQL 5.6
(and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing
concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT
and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously,
with only minor changes performed to the table structure. The counter
innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS
is incremented whenever a table rebuild operation is converted into
an instant ADD COLUMN operation.

ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:

MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY
and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT,
and a new field PAGE_INSTANT will contain the original number of fields
in the clustered index ('core' fields).
If instant ADD COLUMN has not been used or the table becomes empty,
or the very first instant ADD COLUMN operation is rolled back,
the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset
to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf,
between the page infimum and the first user record. This record is
distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the
same format as records that contain values for the instantly added
columns. This 'default row' always has the same number of fields as
the clustered index according to the table definition. The values of
'core' fields are to be ignored. For other fields, the 'default row'
will contain the default values as they were during the ALTER TABLE
statement. (If the column default values are changed later, those
values will only be stored in the .frm file. The 'default row' will
contain the original evaluated values, which must be the same for
every row.) The 'default row' must be completely hidden from
higher-level access routines. Assertions have been added to ensure
that no 'default row' is ever present in the adaptive hash index
or in locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must
reside between the number of 'core' fields (dict_index_t::n_core_fields
introduced in this work) and dict_index_t::n_fields. If the number
of fields is less than dict_index_t::n_fields, the missing fields
are replaced with the column value of the 'default row'.
Note: The number of fields in the record may shrink if some of the
last instantly added columns are updated to the value that is
in the 'default row'. The function btr_cur_trim() implements this
'compression' on update and rollback; dtuple::trim() implements it
on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new
status value REC_STATUS_COLUMNS_ADDED will indicate the presence of
a new record header that will encode n_fields-n_core_fields-1 in
1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header
always explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for
covering the insert of the 'default row' record when instant ADD COLUMN
is used for the first time. Subsequent instant ADD COLUMN can use
TRX_UNDO_UPD_EXIST_REC.

This is joint work with Vin Chen (陈福荣) from Tencent. The design
that was discussed in April 2017 would not have allowed import or
export of data files, because instead of the 'default row' it would
have introduced a data dictionary table. The test
rpl.rpl_alter_instant is exactly as contributed in pull request #408.
The test innodb.instant_alter is based on a contributed test.

The redo log record format changes for ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPACT are as contributed. (With this change present,
crash recovery from MariaDB 10.3.1 will fail in spectacular ways!)
Also the semantics of higher-level redo log records that modify the
PAGE_INSTANT field is changed. The redo log format version identifier
was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1.

Everything else has been rewritten by me. Thanks to Elena Stepanova,
the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the
PAGE_FREE list after deleting or shortening the 'default row' record,
by calling either btr_page_empty() or btr_page_reorganize(). We must
know the size of each entry in the PAGE_FREE list. If rollback left a
freed copy of the 'default row' in the PAGE_FREE list, we would be
unable to determine its size (if it is in ROW_FORMAT=COMPACT or
ROW_FORMAT=DYNAMIC) because it would contain more fields than the
rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly
added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are
two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL.

dict_col_t::def_val: The 'default row' value of the column.  If the
column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(),
instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of
a column.

dict_col_t::name(const dict_table_t& table): Replaces
dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields.
For secondary indexes and if instant ADD COLUMN has not been used,
this will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to
represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that
n_core_null_bytes was not initialized yet from the clustered index
root page.

dict_index_t: Add the accessors is_instant(), is_clust(),
get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata
for instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status
of a clustered index when the table becomes empty, or the very first
instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(),
supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for
instant ADD COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback
of instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create the ctx->new_table,
and only then decide if the table really needs to be rebuilt.
We must split the creation of table or index metadata from the
creation of the dictionary table records and the creation of
the data. In this way, we can transform a table-rebuilding operation
into an instant ADD COLUMN operation. Dictionary objects will only
be added to cache when table rebuilding or index creation is needed.
The ctx->instant_table will never be added to cache.

dict_table_t::add_to_cache(): Modified and renamed from
dict_table_add_to_cache(). Do not modify the table metadata.
Let the callers invoke dict_table_add_system_columns() and if needed,
set can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the
system columns (which will now exist in the dict_table_t object
already at this point).

dict_create_table_step(): Expect the callers to invoke
dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution
graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke
dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual().
Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(),
btr_cur_instant_root_init():
Load additional metadata from the clustered index and set
dict_index_t::n_core_null_bytes. This is invoked
when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded
dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is
executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to
NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for
CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns
in the cache only if instant ADD COLUMN is not being used.
(Instant ADD COLUMN would copy all column metadata from
instant_table to old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields.
This is repurposing the 16-bit field PAGE_DIRECTION, of which only the
least significant 3 bits were used. The original byte containing
PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B.

page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(),
page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION
and set PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes,
and assume that heap_no is always set.
Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records,
even if the record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built
on the core fields. Instant ADD COLUMN only applies to the
clustered index, and we should never build a search key that has
more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR.
All these columns are always present.

dict_index_build_data_tuple(): Remove assertions that would be
duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose
number of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two
MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance.
A dictionary transaction must be flagged as such before it has generated
any undo log records. This is because trx_undo_assign_undo() will mark
the transaction as a dictionary transaction in the undo log header
right before the very first undo log record is being written.

btr_index_rec_validate(): Account for instant ADD COLUMN

row_undo_ins_remove_clust_rec(): On the rollback of an insert into
SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the
last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(),
trx_undo_update_rec_get_update(): Handle the 'default row'
as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right
before insert or update. After instant ADD COLUMN, if the last fields
of a clustered index tuple match the 'default row', there is no
need to store them. While trimming the entry, we must hold a page latch,
so that the table cannot be emptied and the 'default row' be deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low():
Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling
row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number
of fields to be between n_core_fields and n_fields. Do not support
infimum,supremum. They are never supposed to be stored in dtuple_t,
because page creation nowadays uses a lower-level method for initializing
them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the
number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the
'default row', handle rollback specially. For user records, omit
fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
Skip locking and adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if needed.
In the temporary file that is applied by row_log_table_apply(),
we must identify whether the records contain the extra header for
instantly added columns. For now, we will allocate an additional byte
for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table
has been subject to instant ADD COLUMN. The ROW_T_DELETE records are
fine, as they will be converted and will only contain 'core' columns
(PRIMARY KEY and some system columns) that are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(),
rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED:
An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of
rec_init_offsets_comp_ordinary().
2017-10-06 09:50:10 +03:00
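A brief sketch of the user-visible behaviour, with illustrative names. At the time of this commit ALGORITHM=INSTANT did not exist yet (it is listed above as future work in MDEV-13134), so a plain ADD COLUMN is converted to an instant operation when possible, and the conversion is visible in the status counter mentioned in this message:

    CREATE TABLE t (id INT PRIMARY KEY, a INT) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
    INSERT INTO t VALUES (1, 1);
    ALTER TABLE t ADD COLUMN b INT DEFAULT 0;   -- appended without a table rebuild
    SHOW GLOBAL STATUS LIKE 'innodb_instant_alter_column';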