MDEV-19581 Valgrind error with WolfSSL and encrypted binlog
WolfSSL can read memory out of bounds in EVP_CipherUpdate()
in decrypt/NOPAD mode, when the input length is not multiple of AES block
size.
The workaround ensures that input will have some padding at the end
by having slightly larger allocated buffer, or padding the structures
with 16 more bytes.
dict_sys.lock(), dict_sys_lock(): Acquire both mutex and latch.
dict_sys.unlock(), dict_sys_unlock(): Release both mutex and latch.
dict_sys.assert_locked(): Assert that both mutex and latch are held.
In MySQL 5.7.8 an extra level of pointer indirection was added to
dict_operation_lock and some other rw_lock_t without solid justification,
in mysql/mysql-server@52720f1772.
Let us revert that change and remove the rather useless rw_lock_t
constructor and destructor and the magic_n field. In this way,
some unnecessary pointer dereferences and heap allocation will be avoided
and debugging might be a little easier.
MySQL 5.7 introduced the class page_size_t and increased the size of
buffer pool page descriptors by introducing this object to them.
Maybe the intention of this exercise was to prepare for a future
where the buffer pool could accommodate multiple page sizes.
But that future never arrived, not even in MySQL 8.0. It is much
easier to manage a pool of a single page size, and typically all
storage devices of an InnoDB instance benefit from using the same
page size.
Let us remove page_size_t from MariaDB Server. This will make it
easier to remove support for ROW_FORMAT=COMPRESSED (or make it a
compile-time option) in the future, just by removing various
occurrences of zip_size.
PROBLEM
-------
1. We are inserting a base column entry which causes an invalid value
by the function provided to generate virtual column,but we go ahead
and insert this due to ignore keyword.
2. We then delete this record, making this record delete marked in innodb.
If we try to insert another record with the same pk as the deleted
record and if the rec is not purged ,then we try to undelete mark this
record and try to build a update vector with previous and updated value
and while calculating the value of virtual column we get error from
server that we cannot calculate this from base column.
Innodb assumes that innobase_get_computed_value() Should always return
a valid value for the base column present in the row. The failure of
this call was not handled ,so we were crashing.
FIX
dict_index_t::db_trx_id(): Return the position of DB_TRX_ID.
Only valid for the clustered index.
dict_index_t::db_roll_ptr(): Return the position of DB_ROLL_PTR.
Only valid for the clustered index.
dict_index_get_sys_col_pos(): Remove. This was performing unnecessarily
complex computations, which only made sense for DB_ROW_ID, which would
exist either as the first field in the clustered index or as the last
field in a secondary index (only when a DB_ROW_ID column is materialised).
row_sel_store_row_id_to_prebuilt(): Remove, and replace with simpler code.
row_upd_index_entry_sys_field(): Remove.
btr_cur_log_sys(): Replaces row_upd_write_sys_vals_to_log().
btr_cur_write_sys(): Write DB_TRX_ID,DB_ROLL_PTR to a data tuple.
Allow ADD COLUMN anywhere in a table, not only adding as the
last column.
Allow instant DROP COLUMN and instant changing the order of columns.
The added columns will always be added last in clustered index records.
In new records, instantly dropped columns will be stored as NULL or
empty when possible.
Information about dropped and reordered columns will be written in
a metadata BLOB (mblob), which is stored before the first 'user' field
in the hidden metadata record at the start of the clustered index.
The presence of mblob is indicated by setting the delete-mark flag in
the metadata record.
The metadata BLOB stores the number of clustered index fields,
followed by an array of column information for each field.
For dropped columns, we store the NOT NULL flag, the fixed length,
and for variable-length columns, whether the maximum length exceeded
255 bytes. For non-dropped columns, we store the column position.
Unlike with MDEV-11369, when a table becomes empty, it cannot
be converted back to the canonical format. The reason for this is
that other threads may hold cached objects such as
row_prebuilt_t::ins_node that could refer to dropped or reordered
index fields.
For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC,
we must store the n_core_null_bytes in the root page, so that the
chain of node pointer records can be followed in order to reach the
leftmost leaf page where the metadata record is located.
If the mblob is present, we will zero-initialize the strings
"infimum" and "supremum" in the root page, and use the last byte of
"supremum" for storing the number of null bytes (which are allocated
but useless on node pointer pages). This is necessary for
btr_cur_instant_init_metadata() to be able to navigate to the mblob.
If the PRIMARY KEY contains any variable-length column and some
nullable columns were instantly dropped, the dict_index_t::n_nullable
in the data dictionary could be smaller than it actually is in the
non-leaf pages. Because of this, the non-leaf pages could use more
bytes for the null flags than the data dictionary expects, and we
could be reading the lengths of the variable-length columns from the
wrong offset, and thus reading the child page number from wrong place.
This is the result of two design mistakes that involve unnecessary
storage of data: First, it is nonsense to store any data fields for
the leftmost node pointer records, because the comparisons would be
resolved by the MIN_REC_FLAG alone. Second, there cannot be any null
fields in the clustered index node pointer fields, but we nevertheless
reserve space for all the null flags.
Limitations (future work):
MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
MDEV-17468 Avoid table rebuild on operations on generated columns
MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large
btr_page_reorganize_low(): Preserve any metadata in the root page.
Call lock_move_reorganize_page() only after restoring the "infimum"
and "supremum" records, to avoid a memcmp() assertion failure.
dict_col_t::DROPPED: Magic value for dict_col_t::ind.
dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant().
Do not assert that the column was instantly added, because we
sometimes call this unconditionally for all columns.
Convert an instantly added column to a "core column". The old name
remove_instant() could be mistaken to refer to "instant DROP COLUMN".
dict_col_t::is_added(): Rename from dict_col_t::is_instant().
dtype_t::metadata_blob_init(): Initialize the mblob data type.
dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(),
upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits
refer to a metadata record.
dict_table_t::instant: Metadata about dropped or reordered columns.
dict_table_t::prepare_instant(): Prepare
ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE.
innobase_instant_try() will pass this to dict_table_t::instant_column().
On rollback, dict_table_t::rollback_instant() will be called.
dict_table_t::instant_column(): Renamed from instant_add_column().
Add the parameter col_map so that columns can be reordered.
Copy and adjust v_cols[] as well.
dict_table_t::find(): Find an old column based on a new column number.
dict_table_t::serialise_columns(), dict_table_t::deserialise_columns():
Convert the mblob.
dict_index_t::instant_metadata(): Create the metadata record
for instant ALTER TABLE. Invoke dict_table_t::serialise_columns().
dict_index_t::reconstruct_fields(): Invoked by
dict_table_t::deserialise_columns().
dict_index_t::clear_instant_alter(): Move the fields for the
dropped columns to the end, and sort the surviving index fields
in ascending order of column position.
ha_innobase::check_if_supported_inplace_alter(): Do not allow
adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists
due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.)
instant_alter_column_possible(): Add a parameter for InnoDB table,
to check for additional conditions, such as the maximum number of
index fields.
ha_innobase_inplace_ctx::first_alter_pos: The first column whose position
is affected by instant ADD, DROP, or changing the order of columns.
innobase_build_col_map(): Skip added virtual columns.
prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol.
Remove some unnecessary code. Note that the call to
innodb_base_col_setup() should be executed later.
commit_try_norebuild(): If ctx->is_instant(), let the virtual
columns be added or dropped by innobase_instant_try().
innobase_instant_try(): Fill in a zero default value for the
hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459).
If any columns were dropped or reordered (or added not last),
delete any SYS_COLUMNS records for the following columns, and
insert SYS_COLUMNS records for all subsequent stored columns as well
as for all virtual columns. If any virtual column is dropped, rewrite
all virtual column metadata. Use a shortcut only for adding
virtual columns. This is because innobase_drop_virtual_try()
assumes that the dropped virtual columns still exist in ctx->old_table.
innodb_update_cols(): Renamed from innodb_update_n_cols().
innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change
the return type to bool, and invoke my_error() when detecting an error.
innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS.
Refactored from innobase_add_one_virtual() and innobase_instant_add_col().
innobase_instant_add_col(): Replace the parameter dfield with type.
innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS
and all columns from SYS_VIRTUAL.
innobase_add_virtual_try(), innobase_drop_virtual_try(): Let
the caller invoke innodb_update_cols().
innobase_rename_column_try(): Skip dropped columns.
commit_cache_norebuild(): Update table->fts->doc_col.
dict_mem_table_col_rename_low(): Skip dropped columns.
trx_undo_rec_get_partial_row(): Skip dropped columns.
trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly.
trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields.
Log metadata records consistently.
Apparently, the first fields of a clustered index may be updated
in an update_undo vector when the index is ID_IND of SYS_FOREIGN,
as part of renaming the table during ALTER TABLE. Normally, updates of
the PRIMARY KEY should be logged as delete-mark and an insert.
row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec():
Use trx_undo_metadata.
row_undo_mod_clust_low(): On metadata rollback, roll back the root page too.
row_undo_mod_clust(): Relax an assertion. The delete-mark flag was
repurposed for ALTER TABLE metadata records.
row_rec_to_index_entry_impl(): Add the template parameter mblob
and the optional parameter info_bits for specifying the desired new
info bits. For the metadata tuple, allow conversion between the original
format (ADD COLUMN only) and the generic format (with hidden BLOB).
Add the optional parameter "pad" to determine whether the tuple should
be padded to the index fields (on ALTER TABLE it should), or whether
it should remain at its original size (on rollback).
row_build_index_entry_low(): Clean up the code, removing
redundant variables and conditions. For instantly dropped columns,
generate a dummy value that is NULL, the empty string, or a
fixed length of NUL bytes, depending on the type of the dropped column.
row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY
of a record that contained a dropped column whose value was stored
externally, we will be inserting a dummy NULL or empty string value
to the field of the dropped column. The externally stored column would
eventually be dropped when purge removes the delete-marked record for
the old PRIMARY KEY value.
btr_index_rec_validate(): Recognize the metadata record.
btr_discard_only_page_on_level(): Preserve the generic instant
ALTER TABLE metadata.
btr_set_instant(): Replaces page_set_instant(). This sets a clustered
index root page to the appropriate format, or upgrades from
the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format.
btr_cur_instant_init_low(): Read and validate the metadata BLOB page
before reconstructing the dictionary information based on it.
btr_cur_instant_init_metadata(): Do not read any lengths from the
metadata record header before reading the BLOB. At this point, we
would not actually know how many nullable fields the metadata record
contains.
btr_cur_instant_root_init(): Initialize n_core_null_bytes in one
of two possible ways.
btr_cur_trim(): Handle the mblob record.
row_metadata_to_tuple(): Convert a metadata record to a data tuple,
based on the new info_bits of the metadata record.
btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed.
Invoke dtuple_convert_big_rec() for metadata records if the record is
too large, or if the mblob is not yet marked as externally stored.
btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete():
When the last user record is deleted, do not delete the
generic instant ALTER TABLE metadata record. Only delete
MDEV-11369 instant ADD COLUMN metadata records.
btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size.
btr_pcur_store_position(): Allow a logically empty page to contain
a metadata record for generic ALTER TABLE.
REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW.
This is for the old instant ADD COLUMN (MDEV-11369) only.
REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record,
with additional information for dropped or reordered columns.
rec_info_bits_valid(): Remove. The only case when this would fail
is when the record is the generic ALTER TABLE metadata record.
rec_is_alter_metadata(): Check if a record is the metadata record
for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function
must not be invoked on node pointer records, because the delete-mark
flag in those records may be set (it is garbage), and then a debug
assertion could fail because index->is_instant() does not necessarily
hold.
rec_is_add_metadata(): Check if a record is MDEV-11369 ADD COLUMN metadata
record (not more generic instant ALTER TABLE).
rec_get_converted_size_comp_prefix_low(): Assume that the metadata
field will be stored externally. In dtuple_convert_big_rec() during
the rec_get_converted_size() call, it would not be there yet.
rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple.
rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(),
rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>.
With mblob=true, process a record with a metadata BLOB.
rec_copy_prefix_to_buf(): Assert that no fields beyond the key and
system columns are being copied. Exclude the metadata BLOB field.
rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple
into a record.
row_upd_index_replace_metadata(): Apply an update vector to an
alter_metadata tuple.
row_log_allocate(): Replace dict_index_t::is_instant()
with a more appropriate condition that ignores dict_table_t::instant.
Only a table on which the MDEV-11369 ADD COLUMN was performed
can "lose its instantness" when it becomes empty. After
instant DROP COLUMN or reordering columns, we cannot simply
convert the table to the canonical format, because the data
dictionary cache and all possibly existing references to it
from other client connection threads would have to be adjusted.
row_quiesce_write_index_fields(): Do not crash when the table contains
an instantly dropped column.
Thanks to Thirunarayanan Balathandayuthapani for discussing the design
and implementing an initial prototype of this.
Thanks to Matthias Leich for testing.
dict_index_t::first_user_field(): Return the first data field in
a clustered index, that is, the field after the PRIMARY KEY and
the two system columns DB_TRX_ID, DB_ROLL_PTR.
dtuple_convert_big_rec(): Remove some local variables.
For instant ALTER TABLE, we store a hidden metadata record at the
start of the clustered index, to indicate how the format of the
records differs from the latest table definition.
The term 'default row' is too specific, because it applies to
instant ADD COLUMN only, and we will be supporting more classes
of instant ALTER TABLE later on. For instant ADD COLUMN, we
store the initial default values in the metadata record.
row_log_table_get_pk_col(): Replace a condition that was inadvertently
removed in MDEV-16365. PRIMARY KEY columns are never allowed to be NULL,
and failure to enforce the constraint caused a null pointer to be
dereferenced in mem_heap_dup().
During a table-rebuilding online ALTER TABLE, if
dict_index_t::remove_instant() was invoked on the source table
(because it became empty), we would inadvertently change the way
how log records are written and parsed. We must keep the online_log
format unchanged throughout the whole table-rebuilding operation.
dict_col_t::def_t: Name the type of dict_col_t::def_val.
rec_get_n_add_field_len(), rec_set_n_add_field(): Define globally,
because these will be needed in row_log_table_low().
rec_init_offsets_temp(), rec_init_offsets_comp_ordinary(): Add
the parameter def_val for explicitly passing the default values
of the instantly added columns of the source table, so that
dict_index_t::instant_field_value() will not be called during
row_log_table_apply(). This allows us to consistently parse the
online_log records, even if the source table was converted
to the canonical non-instant format during the rebuild operation.
row_log_t::non_core_fields[]: The default values of the
instantly added columns on the source table; copied
during ha_innobase::prepare_inplace_alter_table()
while the table is exclusively locked.
row_log_t::instant_field_value(): Accessor to non_core_fields[],
analogous to dict_index_t::instant_field_value().
row_log_table_low(): Add fake_extra_size bytes to the record
header if the source table was converted to the canonical format
during the operation.
row_log_allocate(): Initialize row_log_t::non_core_fields.
NULL values when there is no DEFAULT
Copy and inplace algorithm works similarly for
NULL to NOT NULL conversion for the following cases:
(1) strict sql mode - Should give error.
(2) non-strict sql mode - Should give warnings alone
(3) alter ignore table command. - Should give warnings alone.
The predicate dict_table_is_discarded() checks whether
ALTER TABLE…DISCARD TABLESPACE has been executed.
Replace most occurrences of dict_table_is_discarded() with
checks of dict_table_t::space. A few checks for the flag
DICT_TF2_DISCARDED are necessary; write them inline.
Because !is_readable() implies !space, some checks for
dict_table_is_discarded() were redundant.
If creating a secondary index fails (typically, ADD UNIQUE INDEX fails
due to duplicate key), it is possible that concurrently running UPDATE
or DELETE will access the index stub and hit the debug assertion.
It does not make any sense to keep updating an uncommitted index whose
creation has failed.
dict_index_t::is_corrupted(): Replaces dict_index_is_corrupted().
Also take online_status into account.
Replace some calls to dict_index_is_clust() with calls to
dict_index_t::is_primary().
During an online table rebuild, a table could be emptied and converted
from 'instant ADD' format to plain (pre-10.3) format. All online_log
records for rebuilding the table must be written and parsed in the
format of the table that existed at the start of the operation.
row_log_t::n_core_fields: A new field for recording index->n_core_fields
when online ALTER is initiated in row_log_allocate().
row_log_t::is_instant(): Determine if the log is in the instant format.
Only invoked by the row_log_table_ family of functions.
dict_index_t::get_n_nullable(): Remove is_instant() debug assertions.
Because a table can be converted to non-instant format during a
table-rebuilding ALTER TABLE, these assertions would be bogus when
executing row_log_table_apply().
rec_init_offsets_temp(): Add the parameter n_core for passing the
original index->n_core_fields.
rec_init_offsets_temp(): Add a 3-parameter variant.
rec_init_offsets_comp_ordinary(): Add the parameter n_core for
passing the index->n_core_fields.
Issue:
------
Prefix for externally stored columns were being stored in online_log when a
table is altered and alter causes table to be rebuilt. Space in online_log is
limited and if length of prefix of externally stored columns is very big, then
it is being written to online log without making sure if it fits. This leads to
memory corruption.
Fix:
----
After fix for Bug#16544143, there is no need to store prefixes of externally
stored columnd in online_log. Thus remove the code which stores column prefixes
for externally stored columns. Also, before writing anything on online_log,
make sure it fits to available memory to avoid memory corruption.
Read RB page for more details.
Reviewed-by: Annamalai Gurusami <annamalai.gurusami@oracle.com>
RB: 18239
Remove unused InnoDB function parameters and functions.
i_s_sys_virtual_fill_table(): Do not allocate heap memory.
mtr_is_block_fix(): Replace with mtr_memo_contains().
mtr_is_page_fix(): Replace with mtr_memo_contains_page().
- Allow NOT NULL constraint to replace the NULL value in the row with
explicit or implicit default value.
- If the default value is non-const value then inplace alter won't
support it.
- ALTER IGNORE will ignore the error if the concurrent DML contains
NULL value.