1
0
mirror of https://github.com/MariaDB/server.git synced 2025-12-01 17:39:21 +03:00
Commit Graph

306 Commits

Author SHA1 Message Date
Marko Mäkelä
5a1868b58d MDEV-13564 Mariabackup does not work with TRUNCATE
This is a merge from 10.2, but the 10.2 version of this will not
be pushed into 10.2 yet, because the 10.2 version would include
backports of MDEV-14717 and MDEV-14585, which would introduce
a crash recovery regression: Tables could be lost on
table-rebuilding DDL operations, such as ALTER TABLE,
OPTIMIZE TABLE or this new backup-friendly TRUNCATE TABLE.
The test innodb.truncate_crash occasionally loses the table due to
the following bug:

MDEV-17158 log_write_up_to() sometimes fails
2018-09-07 22:15:06 +03:00
Marko Mäkelä
980d1bf1a9 MDEV-14717: Prevent crash-downgrade to earlier MariaDB 10.2
A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding
ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version
would trigger a debug assertion failure during rollback,
in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the
TRX_UNDO_RENAME_TABLE record would be misinterpreted as an
update_undo log record, and typically the file name would be
interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching
record would be found, row_undo_mod() would hit ut_error in
switch (node->rec_type). Typically, ut_a(table2 == NULL) would
fail when opening the table from SQL.

Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2
versions by changing the InnoDB redo log format identifier to the
10.3 identifier, and by introducing a subformat identifier so that
10.2 can continue to refuse crash-downgrade from 10.3 or later.
After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would
still be possible thanks to MDEV-14909. A downgrade to older 10.2
versions is only possible after removing the log files (not recommended).

LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format).

log_group_t: Add subformat. For 10.2, we will use subformat 1,
and will refuse crash recovery from any other subformat of the
10.3 format, that is, a genuine 10.3 redo log.

recv_find_max_checkpoint(): Allow startup after clean shutdown
from a future LOG_HEADER_FORMAT_10_4 (unencrypted only).
We cannot handle the encrypted 10.4 redo log block format,
which was introduced in MDEV-12041. Allow crash recovery from
the original 10.2 format as well as the new format.
In Mariabackup --backup, do not allow any startup from 10.3 or 10.4
redo logs.

recv_recovery_from_checkpoint_start(): Skip redo log apply for
clean 10.3 redo log, but not for the new 10.2 redo log
(10.3 format, subformat 1).

srv_prepare_to_delete_redo_log_files(): On format or subformat
mismatch, set srv_log_file_size = 0, so that we will display the
correct message.

innobase_start_or_create_for_mysql(): Check for format or subformat
mismatch.

xtrabackup_backup_func(): Remove debug assertions that were made
redundant by the code changes in recv_find_max_checkpoint().
2018-09-07 22:10:03 +03:00
Marko Mäkelä
055a3334ad MDEV-13564 Mariabackup does not work with TRUNCATE
Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name,
CREATE, and DROP.

Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2
is killed before the DROP operation is committed. If MariaDB Server 10.2
is killed during TRUNCATE, it is also possible that the old table
was renamed to #sql-ib*.ibd but the data dictionary will refer to the
table using the original name.

In MariaDB Server 10.3, RENAME inside InnoDB is transactional,
and #sql-* tables will be dropped on startup. So, this new TRUNCATE
will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying
storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating
mysql.gtid_slave_pos, evict any cached table handles from
the table definition cache, so that there will be no stale
references to the old table after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing
atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB
undo and redo log. Some convoluted logic was added to the InnoDB
crash recovery, and some extra synchronization (including a redo log
checkpoint) was introduced to make this work. This synchronization
has caused performance problems and race conditions, and the extra
log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep
the logic for parsing and applying the extra log files, but we will
no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE
(with full redo and undo logging and proper rollback). This will
be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table().
Because RENAME cannot be fully rolled back before MariaDB 10.3
due to missing undo logging, add some explicit rename-back in
case the operation fails.

ha_innobase::delete(): Introduce a variant that takes sqlcom as
a parameter. In TRUNCATE TABLE, we do not want to touch any
FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx.
In TRUNCATE, the new table must be created in the same transaction
that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters
file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping
row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(),
fil_prepare_for_truncate(), fil_reinit_space_header_for_table(),
row_truncate_table_for_mysql(), TruncateLogger,
row_truncate_prepare(), row_truncate_rollback(),
row_truncate_complete(), row_truncate_fts(),
row_truncate_update_system_tables(),
row_truncate_foreign_key_checks(), row_truncate_sanity_checks():
Remove.

row_upd_check_references_constraints(): Remove a check for
TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some
race-condition like scenarios. The test innodb-innodb.truncate does
not use any synchronization.

We add a redo log subformat to indicate backup-friendly format.
MariaDB 10.4 will remove support for the old TRUNCATE logging,
so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve
limitations.

== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only
possible when innodb_undo_tablespaces is set to at least 2.
The logging is implemented similar to the WL#6501 TRUNCATE,
that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within
a single mini-transaction that reinitializes the undo log
tablespace file. Unfortunately, due to the redo log format
of some operations, currently, the total redo log written by
undo tablespace truncation will be more than the combined size
of the truncated undo tablespace. It should be acceptable
to have a little more than 1 megabyte of log in a single
mini-transaction. This will be fixed in MDEV-17138 in
MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo
tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now
only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated.
This check is for tablespaces of tables. Potentially used
tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning
for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero
page number (new size of the tablespace in pages) to inform
crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2
can be written for undo tablespaces (without .ibd file suffix)
for a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false
so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation,
buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log
records remain buffered, after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write
much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace
in a single mini-transaction, which will be flushed to the redo log
before the file size is trimmed.

recv_addr_trim(): Discard any redo logs for pages that were
logged after the new end of a file, before the truncation LSN.
If the rec_list becomes empty, reduce n_addrs. After removing
any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before
applying any log records. The undo tablespace files must be open
at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(),
buf_LRU_flush_or_remove_pages(): Add a parameter for specifying
the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the
extra logging, and some unnecessary crash points. Merge the code
from trx_undo_truncate_tablespace(). First, flush all to-be-discarded
pages (beyond the new end of the file), then trim the space->size
to make the page allocation deterministic. At the only remaining
crash injection point, flush the redo log, so that the recovery
can be tested.
2018-09-07 22:10:02 +03:00
Marko Mäkelä
f6d4f624eb MDEV-12041: innodb_encrypt_log key rotation
This will change the InnoDB encrypted redo log format only.
Unencrypted redo log will keep using the MariaDB 10.3 format.
In the new encrypted redo log format, 4 additional bytes will
be reserved in the redo log block trailer for storing the
encryption key version.

For performance reasons, the encryption key rotation
(checking if the latest encryption key version is being used)
is only done at log_checkpoint().

LOG_HEADER_FORMAT_CURRENT: Remove.

LOG_HEADER_FORMAT_ENC_10_4: The encrypted 10.4 format.

LOG_BLOCK_KEY: The encryption key version field.

LOG_BLOCK_TRL_SIZE: Remove.

log_t: Add accessors framing_size(), payload_size(), trailer_offset(),
to be used instead of referring to LOG_BLOCK_TRL_SIZE.

log_crypt_t: An operation passed to log_crypt().

log_crypt(): Perform decryption, encryption, or encryption with key
rotation. Return an error if key rotation at decryption fails.
On encryption, keep using the previous key if the rotation fails.
At startup, old-format encrypted redo log may be written before
the redo log is upgraded (rebuilt) to the latest format.

log_write_up_to(): Add the parameter rotate_key=false.

log_checkpoint(): Invoke log_write_up_to() with rotate_key=true.
2018-08-13 16:04:37 +03:00
Sergei Golubchik
36e59752e7 Merge branch '10.2' into 10.3 2018-06-30 16:39:20 +02:00
Sergei Golubchik
ef64856b97 don't crash on innodb_undo_tablespaces=1 2018-06-21 23:49:37 +02:00
Marko Mäkelä
cd15e764a8 MDEV-16159 Use atomic memory access for purge_sys
Thanks to Sergey Vojtovich for feedback and many ideas.

purge_state_t: Remove. The states are replaced with
purge_sys_t::enabled() and purge_sys_t::paused() as follows:
PURGE_STATE_INIT, PURGE_STATE_EXIT, PURGE_STATE_DISABLED: !enabled().
PURGE_STATE_RUN, PURGE_STATE_STOP: paused() distinguishes these.

purge_sys_t::m_paused: Renamed from purge_sys_t::n_stop.
Protected by atomic memory access only, not purge_sys_t::latch.

purge_sys_t::m_enabled: An atomically updated Boolean that
replaces purge_sys_t::state.

purge_sys_t:🏃 Remove, because it duplicates
srv_sys.n_threads_active[SRV_PURGE].

purge_sys_t::running(): Accessor for srv_sys.n_threads_active[SRV_PURGE].

purge_sys_t::stop(): Renamed from trx_purge_stop().

purge_sys_t::resume(): Renamed from trx_purge_run().
Do not acquire latch; solely rely on atomics.

purge_sys_t::is_initialised(), purge_sys_t::m_initialised: Remove.

purge_sys_t::create(), purge_sys_t::close(): Instead of invoking
is_initialised(), check whether event is NULL.

purge_sys_t::event: Move before latch, so that fields that are
protected by latch can reside on the same cache line with latch.

srv_start_wait_for_purge_to_start(): Merge to the only caller srv_start().
2018-05-15 23:01:18 +03:00
Marko Mäkelä
e9f2609747 Merge 10.2 into 10.3 2018-05-09 16:52:45 +03:00
Thirunarayanan Balathandayuthapani
fe95cb2e40 MDEV-16125 Shutdown crash when innodb_force_recovery >= 2
Problem:
=======
InnoDB master thread encounters the shutdown state as SRV_SHUTDOWN_FLUSH_PHASE
when innodb_force_recovery >=2 and slow scheduling of master thread during
shutdown.

Fix:
====
There is no need for master thread itself if innodb_force_recovery >=2.
Don't create the master thread if innodb_force_recovery >= 2
2018-05-09 19:01:29 +05:30
Marko Mäkelä
baa5a43d8c MDEV-16045: Replace log_group_t with log_t::files
There is only one log_sys and only one log_sys.log.

log_t::files::create(): Replaces log_init().

log_t::files::close(): Replaces log_group_close(), log_group_close_all().

fil_close_log_files(): if (free) log_sys.log_close();
The callers that passed free=true used to call log_group_close_all().

log_header_read(): Replaces log_group_header_read().

log_t::files::file_header_bufs_ptr: Use a single allocation.

log_t::files::file_header_bufs[]: Statically allocate the pointers.

log_t::files::set_fields(): Replaces log_group_set_fields().

log_t::files::calc_lsn_offset(): Replaces log_group_calc_lsn_offset().
Simplify the computation by using fewer variables.

log_t::files::read_log_seg(): Replaces log_group_read_log_seg().

log_sys_t::complete_checkpoint(): Replaces log_io_complete_checkpoint().

fil_aio_wait(): Move the logic from log_io_complete().
2018-04-29 09:46:24 +03:00
Marko Mäkelä
d73a898d64 MDEV-16045: Allocate log_sys statically
There is only one redo log subsystem in InnoDB. Allocate the object
statically, to avoid unnecessary dereferencing of the pointer.

log_t::create(): Renamed from log_sys_init().

log_t::close(): Renamed from log_shutdown().

log_t::checkpoint_buf_ptr: Remove. Allocate log_t::checkpoint_buf
statically.
2018-04-29 09:46:24 +03:00
Marko Mäkelä
715e4f4320 MDEV-12218 Clean up InnoDB parameter validation
Bind more InnoDB parameters directly to MYSQL_SYSVAR and
remove "shadow variables".

innodb_change_buffering: Declare as ENUM, not STRING.

innodb_flush_method: Declare as ENUM, not STRING.

innodb_log_buffer_size: Bind directly to srv_log_buffer_size,
without rounding it to a multiple of innodb_page_size.

LOG_BUFFER_SIZE: Remove.

SysTablespace::normalize_size(): Renamed from normalize().

innodb_init_params(): A new function to initialize and validate
InnoDB startup parameters.

innodb_init(): Renamed from innobase_init(). Invoke innodb_init_params()
before actually trying to start up InnoDB.

srv_start(bool): Renamed from innobase_start_or_create_for_mysql().
Added the input parameter create_new_db.

SRV_ALL_O_DIRECT_FSYNC: Define only for _WIN32.

xb_normalize_init_values(): Merge to innodb_init_param().
2018-04-29 09:41:42 +03:00
Marko Mäkelä
9ed2b2b2b8 Do not divide or multiply by srv_page_size
Instead, shift by srv_page_size_shift.
2018-04-28 20:52:22 +03:00
Marko Mäkelä
a90100d756 Replace univ_page_size and UNIV_PAGE_SIZE
Try to use one variable (srv_page_size) for innodb_page_size.

Also, replace UNIV_PAGE_SIZE_SHIFT with srv_page_size_shift.
2018-04-28 20:45:45 +03:00
Marko Mäkelä
ba19764209 Fix most -Wsign-conversion in InnoDB
Change innodb_buffer_pool_size, innodb_fill_factor to unsigned.
2018-04-28 20:45:45 +03:00
Marko Mäkelä
99fa7c6c2f Merge 10.2 into 10.3 2018-04-26 22:58:41 +03:00
Marko Mäkelä
6e04af1b78 Remove dead code HAVE_LZO1X
MariaDB uses HAVE_LZO, not HAVE_LZO1X (which was never defined).
Also, the variable srv_lzo_disabled was never defined or read
(only declared and assigned to, in unreachable code).
2018-04-26 12:59:13 +03:00
Vladislav Vaintroub
321771f89f MDEV-15895 : make Innodb merge temp tables use pfs_os_file_t for
file IO, rather than int.

On Windows, it is suboptimal to depend on C runtime, as it has limited
number of file descriptors. This change eliminates
os_file_read_no_error_handling_int_fd(), os_file_write_int_fd(),
OS_FILE_FROM_FD() macro.
2018-04-17 09:07:38 +01:00
Sergey Vojtovich
0993d6b81b MDEV-15773 - trx_sys.mysql_trx_list -> trx_sys.trx_list
Replaced "list of transactions created for MySQL" with "list of all
transactions". This simplifies code and allows further removal of
trx_sys.m_views.
2018-04-04 14:09:37 +04:00
Marko Mäkelä
4cad42392a MDEV-12266: Change dict_table_t::space to fil_space_t*
InnoDB always keeps all tablespaces in the fil_system cache.
The fil_system.LRU is only for closing file handles; the
fil_space_t and fil_node_t for all data files will remain
in main memory. Between startup to shutdown, they can only be
created and removed by DDL statements. Therefore, we can
let dict_table_t::space point directly to the fil_space_t.

dict_table_t::space_id: A numeric tablespace ID for the corner cases
where we do not have a tablespace. The most prominent examples are
ALTER TABLE...DISCARD TABLESPACE or a missing or corrupted file.

There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files,
even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.

There still are many functions that use numeric tablespace IDs instead
of fil_space_t*, and many functions could be converted to fil_space_t
member functions. Also, Tablespace and Datafile should be merged with
fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use
fil_space_t& instead of a numeric ID, and after moving to a single
buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to
fil_space_t::page_hash.

FilSpace: Remove. Only few calls to fil_space_acquire() will remain,
and gradually they should be removed.

mtr_t::set_named_space_id(ulint): Renamed from set_named_space(),
to prevent accidental calls to this slower function. Very few
callers remain.

fseg_create(), fsp_reserve_free_extents(): Take fil_space_t*
as a parameter instead of a space_id.

fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(),
fil_name_write_rename(), fil_rename_tablespace(). Mariabackup
passes the parameter log=false; InnoDB passes log=true.

dict_mem_table_create(): Take fil_space_t* instead of space_id
as parameter.

dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter
'status' with 'bool cached'.

dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.

fil_ibd_open(): Return the tablespace.

fil_space_t::set_imported(): Replaces fil_space_set_imported().

truncate_t: Change many member function parameters to fil_space_t*,
and remove page_size parameters.

row_truncate_prepare(): Merge to its only caller.

row_drop_table_from_cache(): Assert that the table is persistent.

dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL
if the tablespace has been discarded.

row_import_update_discarded_flag(): Remove a constant parameter.
2018-03-29 22:02:05 +03:00
Marko Mäkelä
330ecb906d MDEV-12266: fsp_flags_try_adjust(): Remove lookup 2018-03-29 20:47:42 +03:00
Marko Mäkelä
332e805e2c MDEV-12266: Refactor trx_rseg_header_create() 2018-03-29 20:47:38 +03:00
Marko Mäkelä
c577192d6c MDEV-12266: fsp_flags_try_adjust(): Remove a lookup
fsp_header_init(): Take fil_space_t* as a parameter.
2018-03-29 20:47:29 +03:00
Marko Mäkelä
2ac8b1a907 MDEV-12266: Add fil_system.sys_space, temp_space
Add fil_system_t::sys_space, fil_system_t::temp_space.
These will replace lookups for TRX_SYS_SPACE or SRV_TMP_SPACE_ID.

mtr_t::m_undo_space, mtr_t::m_sys_space: Remove.

mtr_t::set_sys_modified(): Remove.

fil_space_get_type(), fil_space_get_n_reserved_extents(): Remove.

fsp_header_get_tablespace_size(), fsp_header_inc_size():
Merge to the only caller, innobase_start_or_create_for_mysql().
2018-03-29 19:18:11 +03:00
Marko Mäkelä
600c85e85a Allocate the singleton fil_system statically
fil_system_t::create(): Replaces fil_init(), fsp_init().

fil_system_t::close(): Replaces fil_close().

fil_system_t::max_n_open: Remove. Use srv_max_n_open_files directly.
2018-03-29 19:18:10 +03:00
Marko Mäkelä
980dd09be6 Merge 10.2 into 10.3 2018-03-29 17:03:34 +03:00
Marko Mäkelä
4d9969c216 MDEV-15719 ALTER TABLE…ALGORITHM=INPLACE is unnecessarily refused due to innodb_force_recovery
ha_innobase::check_if_supported_inplace_alter(): Only check for
high_level_read_only. Do not unnecessarily refuse
ALTER TABLE...ALGORITHM=INPLACE if innodb_force_recovery was
specified as 1, 2, or 3.

innobase_start_or_create_for_mysql(): Block all writes from SQL
if the system tablespace was initialized with 'newraw'.
2018-03-29 13:20:59 +03:00
Sergei Golubchik
b1818dccf7 Merge branch '10.2' into 10.3 2018-03-28 17:31:57 +02:00
Sergey Vojtovich
e147a4a067 Fixed build failure 2018-03-23 00:32:16 +04:00
Marko Mäkelä
e80a842000 Merge 10.1 into 10.2 2018-03-22 18:02:40 +02:00
Eugene Kosov
8d32959b09 fix data races
srv_last_monitor_time: make all accesses relaxed atomical

WARNING: ThreadSanitizer: data race (pid=12041)
  Write of size 8 at 0x000003949278 by thread T26 (mutexes: write M226445748578513120):
    #0 thd_destructor_proxy storage/innobase/handler/ha_innodb.cc:314:14 (mysqld+0x19b5505)

  Previous read of size 8 at 0x000003949278 by main thread:
    #0 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4180:11 (mysqld+0x1a03404)
    #1 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc5ec73)
    #2 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x134908d)
    #3 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13484f0)
    #4 init_server_components() sql/mysqld.cc:5345:7 (mysqld+0xbf720f)
    #5 mysqld_main(int, char**) sql/mysqld.cc:5940:7 (mysqld+0xbf107d)
    #6 main sql/main.cc:25:10 (mysqld+0xbe971b)

  Location is global 'srv_running' of size 8 at 0x000003949278 (mysqld+0x000003949278)

WARNING: ThreadSanitizer: data race (pid=27869)
  Atomic write of size 4 at 0x7b4800000c00 by thread T8:
    #0 __tsan_atomic32_exchange llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:589 (mysqld+0xbd4eac)
    #1 TTASEventMutex<GenericPolicy>::exit() storage/innobase/include/ib0mutex.h:467:7 (mysqld+0x1a8d4cb)
    #2 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit() storage/innobase/include/ib0mutex.h:609:10 (mysqld+0x1a7839e)
    #3 fil_validate() storage/innobase/fil/fil0fil.cc:5535:2 (mysqld+0x1abd913)
    #4 fil_validate_skip() storage/innobase/fil/fil0fil.cc:204:9 (mysqld+0x1aba601)
    #5 fil_aio_wait(unsigned long) storage/innobase/fil/fil0fil.cc:5296:2 (mysqld+0x1abbae6)
    #6 io_handler_thread storage/innobase/srv/srv0start.cc:340:3 (mysqld+0x21abe1e)

  Previous read of size 4 at 0x7b4800000c00 by main thread (mutexes: write M1273, write M1271):
    #0 TTASEventMutex<GenericPolicy>::state() const storage/innobase/include/ib0mutex.h:530:10 (mysqld+0x21c66e2)
    #1 sync_array_detect_deadlock(sync_array_t*, sync_cell_t*, sync_cell_t*, unsigned long) storage/innobase/sync/sync0arr.cc:746:14 (mysqld+0x21c1c7a)
    #2 sync_array_wait_event(sync_array_t*, sync_cell_t*&) storage/innobase/sync/sync0arr.cc:465:6 (mysqld+0x21c1708)
    #3 TTASEventMutex<GenericPolicy>::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:516:6 (mysqld+0x1a8c206)
    #4 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:635:10 (mysqld+0x1a782c3)
    #5 fil_mutex_enter_and_prepare_for_io(unsigned long) storage/innobase/fil/fil0fil.cc:1131:3 (mysqld+0x1a9a92e)
    #6 fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) storage/innobase/fil/fil0fil.cc:5082:2 (mysqld+0x1ab8de2)
    #7 buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1112:3 (mysqld+0x1cb970a)
    #8 buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1270:3 (mysqld+0x1cb7d70)
    #9 buf_flush_try_neighbors(page_id_t const&, buf_flush_t, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1493:9 (mysqld+0x1cc9674)
    #10 buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:1565:13 (mysqld+0x1cbadf3)
    #11 buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1825:3 (mysqld+0x1cbbcb8)
    #12 buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:1895:16 (mysqld+0x1cbb459)
    #13 buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:2065:2 (mysqld+0x1cbcfe1)
    #14 buf_flush_lists(unsigned long, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:2167:8 (mysqld+0x1cbd5a3)
    #15 log_preflush_pool_modified_pages(unsigned long) storage/innobase/log/log0log.cc:1400:13 (mysqld+0x1eefc3b)
    #16 log_make_checkpoint_at(unsigned long, bool) storage/innobase/log/log0log.cc:1751:10 (mysqld+0x1eefb16)
    #17 buf_dblwr_create() storage/innobase/buf/buf0dblwr.cc:335:2 (mysqld+0x1cd2141)
    #18 innobase_start_or_create_for_mysql() storage/innobase/srv/srv0start.cc:2539:10 (mysqld+0x21b4d8e)
    #19 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4193:8 (mysqld+0x1a5e3d7)
    #20 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc74d33)
    #21 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x1376d5d)
    #22 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13761c0)
    #23 init_server_components() sql/mysqld.cc:5348:7 (mysqld+0xc0d0ff)
    #24 mysqld_main(int, char**) sql/mysqld.cc:5943:7 (mysqld+0xc06f9d)
    #25 main sql/main.cc:25:10 (mysqld+0xbff71b)

WARNING: ThreadSanitizer: data race (pid=29031)
  Write of size 8 at 0x0000039e48e0 by thread T15:
    #0 srv_monitor_thread storage/innobase/srv/srv0srv.cc:1699:24 (mysqld+0x21a254e)

  Previous write of size 8 at 0x0000039e48e0 by thread T14:
    #0 srv_refresh_innodb_monitor_stats() storage/innobase/srv/srv0srv.cc:1165:24 (mysqld+0x21a3124)
    #1 srv_error_monitor_thread storage/innobase/srv/srv0srv.cc:1836:3 (mysqld+0x21a2d40)

  Location is global 'srv_last_monitor_time' of size 8 at 0x0000039e48e0 (mysqld+0x0000039e48e0)
2018-03-22 14:42:15 +04:00
Daniel Black
7bb661cd40 innodb: os_file_create_tmpfile always called with NULL -> simplify 2018-03-15 13:49:00 +11:00
Sergey Vojtovich
131d9a5d0c Allocate lock_sys statically
There is only one lock_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.

lock_sys_t::create(): The deferred constructor.

lock_sys_t::close(): The early destructor.
2018-02-23 08:18:18 +02:00
Marko Mäkelä
a8656d58d4 Fix the startup with innodb_force_recovery=5
At innodb_force_recovery=5 or bigger, trx_lists_init_at_db_start()
no longer initialises the purge_sys. Adjust an assertion accordingly.
2018-02-22 09:49:50 +02:00
Marko Mäkelä
fb335b48b5 Allocate purge_sys statically
There is only one purge_sys. Allocate it statically in order to avoid
dereferencing a pointer whenever accessing it. Also, align some
members to their own cache line in order to avoid false sharing.

purge_sys_t::create(): The deferred constructor.

purge_sys_t::close(): The early destructor.

undo::Truncate::create(): The deferred constructor.
Because purge_sys.undo_trunc is constructed before the start-up
parameters are parsed, the normal constructor would copy a
wrong value of srv_purge_rseg_truncate_frequency.

TrxUndoRsegsIterator: Do not forward-declare an inline constructor,
because the static construction of purge_sys.rseg_iter would not have
access to it.
2018-02-22 09:30:41 +02:00
Marko Mäkelä
947efe17ed MDEV-15158 On commit, do not write to the TRX_SYS page
This is based on a prototype by
Thirunarayanan Balathandayuthapani <thiru@mariadb.com>.

Binlog and Galera write-set replication information was written into
TRX_SYS page on each commit. Instead of writing to the TRX_SYS during
normal operation, InnoDB can make use of rollback segment header pages,
which are already being written to during a commit.

The following list of fields in rollback segment header page are added:
    TRX_RSEG_BINLOG_OFFSET
    TRX_RSEG_BINLOG_NAME (NUL-terminated; empty name = not present)
    TRX_RSEG_WSREP_XID_FORMAT (0=not present; 1=present)
    TRX_RSEG_WSREP_XID_GTRID
    TRX_RSEG_WSREP_XID_BQUAL
    TRX_RSEG_WSREP_XID_DATA

trx_sys_t: Introduce the fields
recovered_binlog_filename, recovered_binlog_offset, recovered_wsrep_xid.

To facilitate upgrade from older mysql or mariaDB versions, we will read
the information in TRX_SYS page. It will be overridden by the
information that we find in rollback segment header pages.

Mariabackup --prepare will read the metadata from the rollback
segment header pages via trx_rseg_array_init(). It will still
not read any undo log pages or recover any transactions.
2018-02-20 21:36:36 +02:00
Marko Mäkelä
cc3b5d1fe7 Merge bb-10.2-ext into 10.3 2018-02-15 11:48:30 +02:00
Marko Mäkelä
27ea2963fc Dead code removal: sess_t
The session object is not really needed for anything.
We can directly create and free the dummy purge_sys->query->trx.
2018-02-15 10:01:05 +02:00
Vladislav Vaintroub
d995dd2865 Windows : reenable warning C4805 (unsafe mix of types in bool operations) 2018-02-07 20:12:12 +00:00
Vladislav Vaintroub
6c279ad6a7 MDEV-15091 : Windows, 64bit: reenable and fix warning C4267 (conversion from 'size_t' to 'type', possible loss of data)
Handle string length as size_t, consistently (almost always:))
Change function prototypes to accept size_t, where in the past
ulong or uint were used. change local/member variables to size_t
when appropriate.

This fix excludes rocksdb, spider,spider, sphinx and connect for now.
2018-02-06 12:55:58 +00:00
Marko Mäkelä
c7d0448797 MDEV-15132 Avoid accessing the TRX_SYS page
InnoDB maintains an internal persistent sequence of transaction
identifiers. This sequence is used for assigning both transaction
start identifiers (DB_TRX_ID=trx->id) and end identifiers (trx->no)
as well as end identifiers for the mysql.transaction_registry table
that was introduced in MDEV-12894.

TRX_SYS_TRX_ID_WRITE_MARGIN: Remove. After this many updates of
the sequence we used to update the TRX_SYS page. We can avoid accessing
the TRX_SYS page if we modify the InnoDB startup so that resurrecting
the sequence from other pages of the transaction system.

TRX_SYS_TRX_ID_STORE: Deprecate. The field only exists for the purpose
of upgrading from an earlier version of MySQL or MariaDB.

Starting with this fix, MariaDB will rely on the fields
TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO in the undo log header page of
each non-committed transaction, and on the new field
TRX_RSEG_MAX_TRX_ID in rollback segment header pages.

Because of this change, setting innodb_force_recovery=5 or 6 may cause
the system to recover with trx_sys.get_max_trx_id()==0. We must adjust
checks for invalid DB_TRX_ID and PAGE_MAX_TRX_ID accordingly.

We will change the startup and shutdown messages to display the
trx_sys.get_max_trx_id() in addition to the log sequence number.

trx_sys_t::flush_max_trx_id(): Remove.

trx_undo_mem_create_at_db_start(), trx_undo_lists_init():
Add an output parameter max_trx_id, to be updated from
TRX_UNDO_TRX_ID, TRX_UNDO_TRX_NO.

TRX_RSEG_MAX_TRX_ID: New field, for persisting
trx_sys.get_max_trx_id() at the time of the latest transaction commit.
Startup is not reading the undo log pages of committed transactions.
We want to avoid additional page accesses on startup, as well as
trouble when all undo logs have been emptied.
On startup, we will simply determine the maximum value from all pages
that are being read anyway.

TRX_RSEG_FORMAT: Redefined from TRX_RSEG_MAX_SIZE.

Old versions of InnoDB wrote uninitialized garbage to unused data fields.
Because of this, we cannot simply introduce a new field in the
rollback segment pages and expect it to be always zero, like it would
if the database was created by a recent enough InnoDB version.

Luckily, it looks like the field TRX_RSEG_MAX_SIZE was always written
as 0xfffffffe. We will indicate a new subformat of the page by writing
0 to this field. This has the nice side effect that after a downgrade
to older versions of InnoDB, transactions should fail to allocate any
undo log, that is, writes will be blocked. So, there is no problem of
getting corrupted transaction identifiers after downgrading.

trx_rseg_t::max_size: Remove.

trx_rseg_header_create(): Remove the parameter max_size=ULINT_MAX.

trx_purge_add_undo_to_history(): Update TRX_RSEG_MAX_SIZE
(and TRX_RSEG_FORMAT if needed). This is invoked on transaction commit.

trx_rseg_mem_restore(): If TRX_RSEG_FORMAT contains 0,
read TRX_RSEG_MAX_SIZE.

trx_rseg_array_init(): Invoke trx_sys.init_max_trx_id(max_trx_id + 1)
where max_trx_id was the maximum that was encountered in the rollback
segment pages and the undo log pages of recovered active, XA PREPARE,
or some committed transactions. (See trx_purge_add_undo_to_history()
which invokes trx_rsegf_set_nth_undo(..., FIL_NULL, ...);
not all committed transactions will be immediately detached from the
rollback segment header.)
2018-01-31 10:24:19 +02:00
Marko Mäkelä
6058f92f5c Simplify undo log access during InnoDB startup
trx_rseg_mem_restore(): Update the max_trx_id from the undo log pages.

trx_sys_init_at_db_start(): Remove; merge with trx_lists_init_at_db_start().

trx_undo_lists_init(): Move to the only calling module, trx0rseg.cc.

trx_undo_mem_create_at_db_start(): Declare globally. Return the number
of pages.
2018-01-31 08:52:16 +02:00
Marko Mäkelä
8d1d38f953 Simplify access to the TRX_SYS page
trx_sysf_t: Remove.

trx_sysf_get(): Return the TRX_SYS page, not a pointer within it.

trx_sysf_rseg_get_space(), trx_sysf_rseg_get_page_no():
Remove a parameter, and merge the declaration and definition.
Take the TRX_SYS page as a parameter.

TRX_SYS_N_RSEGS: Correct the comment.

trx_sysf_rseg_find_free(), trx_sys_update_mysql_binlog_offset(),
trx_sys_update_wsrep_checkpoint(): Take the TRX_SYS page as a parameter.

trx_rseg_header_create(): Add a parameter for the TRX_SYS page.

trx_sysf_rseg_set_space(), trx_sysf_rseg_set_page_no(): Remove;
merge to the only caller, trx_rseg_header_create().
2018-01-31 08:52:16 +02:00
Marko Mäkelä
54c715acca Avoid an assertion failure on aborted startup
srv_init_abort_low(): Call srv_shutdown_bg_undo_sources() so that if
startup aborts while creating InnoDB system tables, the shutdown will
proceed correctly.
2018-01-31 08:52:16 +02:00
Sergey Vojtovich
db5bb785f9 Allocate trx_sys.mvcc at link time
trx_sys.mvcc was allocated dynamically for no good reason.
2018-01-20 16:10:36 +04:00
Marko Mäkelä
f8882cce93 Replace trx_sys_t* trx_sys with trx_sys_t trx_sys
There is only one transaction system object in InnoDB.
Allocate the storage for it at link time, not at runtime.

lock_rec_fetch_page(): Use the correct fetch mode BUF_GET.
Pages may never be deallocated from a tablespace while
record locks are pointing to them.
2018-01-20 16:10:36 +04:00
Sergey Vojtovich
d09f146934 MDEV-14756 - Remove trx_sys_t::rw_trx_list
Reduce divergence between trx_sys_t::rw_trx_hash and trx_sys_t::rw_trx_list
by not adding recovered COMMITTED transactions to trx_sys_t::rw_trx_list.

Such transactions are discarded immediately without creating trx object.

This also required to split rollback and cleanup phases of recovery. To
reflect these updates the following renames happened:
trx_rollback_or_clean_all_recovered() -> trx_rollback_all_recovered()
trx_rollback_or_clean_is_active -> trx_rollback_is_active
trx_rollback_or_clean_recovered() -> trx_rollback_recovered()
trx_cleanup_at_db_startup() -> trx_cleanup_recovered()

Also removed a hack from lock_trx_release_locks(). Instead let recovery
rollback thread to skip committed XA transactions.
2018-01-20 16:09:26 +04:00
Marko Mäkelä
4ef2e43080 Merge bb-10.2-ext into 10.3 2018-01-17 16:33:40 +02:00
Marko Mäkelä
f44017384a MDEV-14968 On upgrade, InnoDB reports "started; log sequence number 0"
srv_prepare_to_delete_redo_log_files(): Initialize srv_start_lsn.
2018-01-16 20:02:38 +02:00
Marko Mäkelä
be85c2dc88 Mariabackup --prepare: Do not access transactions or data dictionary
innobase_start_or_create_for_mysql(): Only start the data dictionary
and transaction subsystems in normal server startup and during
mariabackup --export.
2018-01-16 13:57:30 +02:00