mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00
Commit Graph

26 Commits

Author SHA1 Message Date
Marko Mäkelä
4ca355d863 MDEV-33894: Resurrect innodb_log_write_ahead_size
As part of commit 685d958e38 (MDEV-14425)
the parameter innodb_log_write_ahead_size was removed, because it was
thought that determining the physical block size would be a sufficient
replacement.

However, we can only determine the physical block size on Linux or
Microsoft Windows. On some file systems, the physical block size
is not relevant. For example, XFS uses a block size of 4096 bytes
even if the underlying block size may be smaller.

On Linux, we failed to determine the physical block size if
innodb_log_file_buffered=OFF was not requested or possible.
This will be fixed.

log_sys.write_size: The value of the reintroduced parameter
innodb_log_write_ahead_size. To keep it simple, this is read-only
and a power of two between 512 and 4096 bytes, so that the previous
alignment guarantees are fulfilled. This will replace the previous
log_sys.get_block_size().

log_sys.block_size, log_t::get_block_size(): Remove.

log_t::set_block_size(): Ensure that write_size will not be less
than the physical block size. There is no point in invoking this
function with 512 or less, because that is the minimum value of
write_size.
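
A minimal sketch of the write_size constraint described above, written in
Python rather than the server's C++. The function names are hypothetical,
and ignoring a physical block size outside the supported range is an
assumption; only the power-of-two range of 512 to 4096 bytes and the rule
that write_size must not be less than the physical block size come from
this commit message.

```python
MIN_WRITE_SIZE = 512
MAX_WRITE_SIZE = 4096


def is_valid_write_size(size: int) -> bool:
    """Return True if size is a power of two between 512 and 4096 bytes."""
    return MIN_WRITE_SIZE <= size <= MAX_WRITE_SIZE and size & (size - 1) == 0


def apply_block_size(write_size: int, physical_block_size: int) -> int:
    """Raise write_size to the physical block size when that is larger.

    Assumption: a physical block size that is not itself a valid
    write size is ignored and write_size is left unchanged.
    """
    if not is_valid_write_size(write_size):
        raise ValueError(f"unsupported write size: {write_size}")
    if is_valid_write_size(physical_block_size):
        return max(write_size, physical_block_size)
    return write_size
```

Calling the hypothetical apply_block_size() with a block size of 512 can
never change the result, which mirrors the remark that invoking
set_block_size() with 512 or less is pointless.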

innodb_params_adjust(): Add some disabled code for adjusting
the minimum value and default value of innodb_log_write_ahead_size
to reflect the log_sys.write_size.

log_t::set_recovered(): Mark the recovery completed. This is the
place to adjust some things if we want to allow write_size>4096.

log_t::resize_write_buf(): Refer to write_size.

log_t::resize_start(): Refer to write_size instead of get_block_size().

log_write_buf(): Simplify some arithmetic and remove a goto.

log_t::write_buf(): Refer to write_size. If we are writing less than
that, do not switch buffers, but keep writing to the same buffer.
Move some code to improve the locality of reference.

recv_scan_log(): Refer to write_size instead of get_block_size().

os_file_create_func(): For type==OS_LOG_FILE on Linux, always invoke
os_file_log_maybe_unbuffered(), so that log_sys.set_block_size() will
be invoked even if we are not attempting to use O_DIRECT.

recv_sys_t::find_checkpoint(): Read the entire log header
in a single 12 KiB request into log_sys.buf.

Tested with:
./mtr --loose-innodb-log-write-ahead-size=4096
./mtr --loose-innodb-log-write-ahead-size=2048
2024-06-27 16:38:08 +03:00
Thirunarayanan Balathandayuthapani
baf276e6d4 MDEV-19229 Allow innodb_undo_tablespaces to be changed after database creation
trx_sys_t::undo_log_nonempty: Set to true if there are undo logs
to rollback and purge.

The algorithm for re-creating the undo tablespace when
trx_sys_t::undo_log_nonempty is disabled:

1) trx_sys_t::reset_page(): Reset the TRX_SYS page and assign all
rollback segment slots from 1..127 to FIL_NULL

2) Free the rollback segment header page of system tablespace
for the slots 1..127

3) Update the binlog and WSREP information in system tablespace
rollback segment header
Steps (1), (2) and (3) should happen atomically within a
single mini-transaction.

4) srv_undo_delete_old_tablespaces(): Delete the old undo tablespaces
present in the undo log directory

5) Make checkpoint to get rid of old undo log tablespaces redo logs

6) Assign new start space id for the undo log tablespaces

7) Re-create the specified undo log tablespaces. InnoDB uses the same
mtr for this step and step (6)

8) Make checkpoint again, so that server or mariabackup
can read the undo log tablespace page0 before applying
the redo logs

srv_undo_tablespaces_reinit(): Recreate the undo log tablespaces.
This resets the TRX_SYS page, deletes the old undo tablespaces, and
updates the binlog offset and the write-set replication checkpoint
in the system rollback segment page

trx_rseg_update_binlog_offset(): Added 2 new parameters to pass
binlog file name and binlog offset

trx_rseg_array_init(): Return error if the rollback segment
slot points to non-existent tablespace

srv_undo_tablespaces_init(): Added new parameter mtr
to initialize all undo tablespaces

trx_assign_rseg_low(): Allow the transaction to use the rollback
segment slots(1..127) even if InnoDB failed to change to the
requested innodb_undo_tablespaces=0

srv_start(): Override the user specified value of
innodb_undo_tablespaces variable with already existing actual
undo tablespaces

wf_incremental_process(): Detect whether the TRX_SYS page has been
modified since the last backup. If it has, the incremental backup
fails with a message asking for a new full backup

xb_assign_undo_space_start(): Remove the function, because page 0
of undo001 already contains the first undo space id

Added test case to test the scenario during startup and mariabackup
incremental process too.

Reviewed-by : Marko Mäkelä
Tested-by : Matthias Leich
2022-10-25 11:19:36 +05:30
Marko Mäkelä
618d820646 Merge 10.7 into 10.8 2022-10-13 10:42:41 +03:00
Marko Mäkelä
6dc157f8a6 Merge 10.5 into 10.6 2022-10-06 09:22:39 +03:00
Marko Mäkelä
de078e060e Merge 10.4 into 10.5 2022-10-06 08:29:56 +03:00
Marko Mäkelä
f600690c6b MDEV-29710: Skip some more tests on Valgrind 2022-10-05 20:37:54 +03:00
Marko Mäkelä
685d958e38 MDEV-14425 Improve the redo log for concurrency
The InnoDB redo log used to be formatted in blocks of 512 bytes.
The log blocks were encrypted and the checksum was calculated while
holding log_sys.mutex, creating a serious scalability bottleneck.

We remove the fixed-size redo log block structure altogether and
essentially turn every mini-transaction into a log block of its own.
This allows encryption and checksum calculations to be performed
on local mtr_t::m_log buffers, before acquiring log_sys.mutex.
The mutex only protects a memcpy() of the data to the shared
log_sys.buf, as well as the padding of the log, in case the
to-be-written part of the log would not end in a block boundary of
the underlying storage. For now, the "padding" consists of writing
a single NUL byte, to allow recovery and mariadb-backup to detect
the end of the circular log faster.

Like the previous implementation, we will overwrite the last log block
over and over again, until it has been completely filled. It would be
possible to write only up to the last completed block (if no more
recent write was requested), or to write dummy FILE_CHECKPOINT records
to fill the incomplete block, by invoking the currently disabled
function log_pad(). This would require adjustments to some logic around
log checkpoints, page flushing, and shutdown.

An upgrade after a crash of any previous version is not supported.
Logically empty log files from a previous version will be upgraded.

An attempt to start up InnoDB without a valid ib_logfile0 will be
refused. Previously, the redo log used to be created automatically
if it was missing. Only with innodb_force_recovery=6, it is
possible to start InnoDB in read-only mode even if the log file
does not exist. This allows the contents of a possibly corrupted
database to be dumped.

Because a prepared backup from an earlier version of mariadb-backup
will create a 0-sized log file, we will allow an upgrade from such
log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system
tablespace looks valid.

The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced
with 64-byte log checkpoint blocks at 0x1000 and 0x2000.

The start of log records will move from 0x800 to 0x3000. This allows us
to use 4096-byte aligned blocks for all I/O in a future revision.
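
The offsets above can be checked with a few lines of arithmetic. This is
only an illustrative Python sketch: the constant and function names are
hypothetical, while the offsets and block sizes are the ones stated in
this commit message.

```python
# Old layout: 512-byte checkpoint blocks, log records start at 0x800.
OLD_CHECKPOINT_OFFSETS = (0x200, 0x600)
OLD_START_OFFSET = 0x800

# New layout: 64-byte checkpoint blocks, log records start at 0x3000.
CHECKPOINT_OFFSETS = (0x1000, 0x2000)
CHECKPOINT_BLOCK_SIZE = 64
START_OFFSET = 0x3000

IO_BLOCK_SIZE = 4096


def is_aligned(offset: int, block_size: int = IO_BLOCK_SIZE) -> bool:
    """Return True if offset is a multiple of block_size."""
    return offset % block_size == 0


# Every new offset is a multiple of 4096; the old start offset was not.
new_layout_aligned = all(
    is_aligned(offset) for offset in CHECKPOINT_OFFSETS + (START_OFFSET,)
)
old_start_aligned = is_aligned(OLD_START_OFFSET)
header_kib = START_OFFSET // 1024  # size of the area before the first record
```

With these values, the area in front of the first log record spans
0x3000 = 12288 bytes, that is, three 4096-byte blocks.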

We extend the MDEV-12353 redo log record format as follows.

(1) Empty mini-transactions or extra NUL bytes will not be allowed.
(2) The end-of-minitransaction marker (a NUL byte) will be replaced
with a 1-bit sequence number, which will be toggled each time when the
circular log file wraps back to the beginning.
(3) After the sequence bit, a CRC-32C checksum of all data
(excluding the sequence bit) will be written.
(4) If the log is encrypted, 8 bytes will be written before
the checksum and included in it. This is part of the
initialization vector (IV) of encrypted log data.
(5) File names, page numbers, and checkpoint information will not be
encrypted. Only the payload bytes of page-level log will be encrypted.
The tablespace ID and page number will form part of the IV.
(6) For padding, arbitrary-length FILE_CHECKPOINT records may be written,
with all-zero payload, and with the normal end marker and checksum.
The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON.
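
Item (2) can be illustrated with a small Python sketch of a circular log
whose sequence bit flips on every wrap-around. This is not the server
code: the helper names, the first_lsn parameter, the mapping from LSN to
file offset, and the initial bit value are all assumptions made for the
illustration. Only the 0x3000 start of log records and the toggling on
wrap-around come from this commit message.

```python
START_OFFSET = 0x3000  # first byte available for log records


def capacity(file_size: int) -> int:
    """Bytes usable for log records in a file of file_size bytes."""
    return file_size - START_OFFSET


def file_offset(lsn: int, first_lsn: int, file_size: int) -> int:
    """Map an LSN to an offset in the circular part of the file (assumed mapping)."""
    return START_OFFSET + (lsn - first_lsn) % capacity(file_size)


def wrap_count(lsn: int, first_lsn: int, file_size: int) -> int:
    """How many times the log has wrapped back to START_OFFSET."""
    return (lsn - first_lsn) // capacity(file_size)


def sequence_bit(lsn: int, first_lsn: int, file_size: int, initial_bit: int = 1) -> int:
    """Sequence bit expected at lsn; it toggles on every wrap-around.

    Assumption: initial_bit is a free parameter of this sketch.
    """
    return initial_bit ^ (wrap_count(lsn, first_lsn, file_size) & 1)
```

A reader scanning such a log can stop as soon as the bit found in a record
differs from the bit expected for the current pass, because that record
must be left over from the previous pass over the file.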

In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will
no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup
will require a valid log file. When resizing the log, we will create
a logically empty ib_logfile101 at the current LSN and use an atomic rename
to replace ib_logfile0 with it. See the test innodb.log_file_size.

Because there is no mandatory padding in the log file, we are able
to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn.

The parameter innodb_log_write_ahead_size and the
INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed.

The minimum value of innodb_log_buffer_size will be increased to 2MiB
(because log_sys.buf will replace recv_sys.buf) and the increment
adjusted to 4096 bytes (the maximum log block size).

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed:

os_log_fsyncs
os_log_pending_fsyncs
log_pending_log_flushes
log_pending_checkpoint_writes

The following status variables will be removed:

Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs)
Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design)

log_sys.get_block_size(): Return the physical block size of the log file.
This is only implemented on Linux and Microsoft Windows for now, and for
the power-of-2 block sizes between 64 and 4096 bytes (the minimum and
maximum size of a checkpoint block). If the block size is anything else,
the traditional 512-byte size will be used via normal file system
buffering.

If the file system buffers can be bypassed, a message like the following
will be issued:

InnoDB: File system buffers for log disabled (block size=512 bytes)
InnoDB: File system buffers for log disabled (block size=4096 bytes)

This has been tested on Linux and Microsoft Windows with both sizes.

On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC.
Tests in 3 different environments where the log is stored in a device
with a physical block size of 512 bytes are yielding better throughput
without O_DIRECT. This could be due to the fact that in the event the
last log block is being overwritten (if multiple transactions would
become durable at the same time, and each of them will write a small
number of bytes to the last log block), it should be faster to re-copy
data from log_sys.buf or log_sys.flush_buf to the kernel buffer,
to be finally written at fdatasync() time.

The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for
data files. This option will enable O_DIRECT on the log file on Linux.
It may be unsafe to use when the storage device does not support
FUA (Force Unit Access) mode.

When the server is compiled WITH_PMEM=ON, we will use memory-mapped
I/O for the log file if the log resides on a "mount -o dax" device.
We will identify PMEM in a start-up message:

InnoDB: log sequence number 0 (memory-mapped); transaction id 3

On Linux, we will also invoke mmap() on any ib_logfile0 that resides
in /dev/shm, effectively treating the log file as persistent memory.
This should speed up "./mtr --mem" and increase the test coverage of
PMEM on non-PMEM hardware. It also allows users to estimate how much
the performance would be improved by installing persistent memory.
On other tmpfs file systems such as /run, we will not use mmap().

mariadb-backup: Eliminated several variables. We will refer
directly to recv_sys and log_sys.

backup_wait_for_lsn(): Detect non-progress of
xtrabackup_copy_logfile(). In this new log format with
arbitrary-sized blocks, we can only detect log file overrun
indirectly, by observing that the scanned log sequence number
is not advancing.

xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit,
because we are not allowed to modify the server's log file, and our
memory mapping is read-only.

trx_flush_log_if_needed_low(): Do not use the callback on pmem.
Using neither flush_lock nor write_lock around PMEM writes seems
to yield the best performance. The pmem_persist() calls may
still be somewhat slower than the pwrite() and fdatasync() based
interface (PMEM mounted without -o dax).

recv_sys_t::buf: Remove. We will use log_sys.buf for parsing.

recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE.

recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn.

recv_sys_t, log_sys_t: Removed many data members.

recv_sys.lsn: Renamed from recv_sys.recovered_lsn.
recv_sys.offset: Renamed from recv_sys.recovered_offset.
log_sys.buf_size: Replaces srv_log_buffer_size.

recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset]
when the buffer is being allocated from the memory heap.

recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is
backed by ib_logfile0. The pointer will wrap from recv_sys.len
(log_sys.file_size) to log_sys.START_OFFSET. For the record that
wraps around, we may copy file name or record payload data to
the auxiliary buffer decrypt_buf in order to have a contiguous
block of memory. The maximum size of a record is less than
innodb_page_size bytes.

recv_sys_t::parse(): Take the smart pointer as a template parameter.
Do not temporarily add a trailing NUL byte to FILE_ records, because
we are not supposed to modify the memory-mapped log file. (It is
attached in read-write mode already during recovery.)

recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse().

recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be
returned on PMEM, use recv_ring to wrap around the buffer to the start.

mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free
on PMEM, because it has no meaning on the mmap-based log.

log_sys.write_to_buf: Count writes to log_sys.buf. Replaces
srv_stats.log_write_requests and export_vars.innodb_log_write_requests.
Protected by log_sys.mutex. Updated consistently in log_close().
Previously, mtr_t::commit() conditionally updated the count,
which was inconsistent.

log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf,
for writing to log_sys.log (the ib_logfile0). Replaces
srv_stats.log_writes and export_vars.innodb_log_writes.
Protected by log_sys.mutex.

log_sys.waits: Count waits in append_prepare(). Replaces
srv_stats.log_waits and export_vars.innodb_log_waits.

recv_recover_page(): Do not unnecessarily acquire
log_sys.flush_order_mutex. We are inserting the blocks in arbitrary
order anyway, to be adjusted in recv_sys.apply(true).

We will change the definition of flush_lock and write_lock to
avoid potential false sharing. Depending on sizeof(log_sys) and
CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could
share a cache line with each other or with the last data members
of log_sys.

Thanks to Matthias Leich for providing https://rr-project.org traces
for various failures during the development, and to
Thirunarayanan Balathandayuthapani for his help in debugging
some of the recovery code. And thanks to the developers of the
rr debugger for a tool without which extensive changes to InnoDB
would be very challenging to get right.

Thanks to Vladislav Vaintroub for useful feedback and
to him, Axel Schwenke and Krunal Bauskar for testing the performance.
2022-01-21 16:03:47 +02:00
Daniel Black
d9f7a6b331 MDEV-27158: humanize the bytes in innodb info/error messages
Log messages like total size = 17179869184, chunk size = 134217728
are hard to read. Normalizing the values to IEC units makes them easier to read.
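
As an illustration of the idea, here is a minimal Python sketch of such a
conversion. It is not the server's implementation: the function name is
hypothetical, and the handling of values below 1 KiB is an assumption.
The three decimal places follow the sample messages in this commit
message, such as 128.000MiB.

```python
IEC_UNITS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB")


def humanize_bytes(size: int) -> str:
    """Format a byte count with the largest IEC unit that keeps the value >= 1."""
    value = float(size)
    unit = 0
    while value >= 1024 and unit < len(IEC_UNITS) - 1:
        value /= 1024
        unit += 1
    return f"{value:.3f}{IEC_UNITS[unit]}"


print(humanize_bytes(17179869184))  # 16.000GiB
print(humanize_bytes(134217728))    # 128.000MiB
```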

Idea thanks to Axel Schwenke.

Review thanks to Eugene Kosov and Marko Mäkelä

$ mariadblocal --innodb-buffer-pool-size=30G --innodb-log-file-size=128M
Installing MariaDB/MySQL system tables in '/tmp/build-mariadb-server-10.7-datadir' ...
2021-12-09  9:54:04 0 [Note] /home/dan/repos/build-mariadb-server-10.7/sql/mysqld (server 10.7.2-MariaDB) starting as process 250473 ...
2021-12-09  9:54:04 0 [Note] InnoDB: The first data file './ibdata1' did not exist. A new tablespace will be created!
2021-12-09  9:54:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-12-09  9:54:04 0 [Note] InnoDB: Number of transaction pools: 1
2021-12-09  9:54:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-12-09  9:54:04 0 [Note] InnoDB: Using liburing
2021-12-09  9:54:04 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 128.000MiB
2021-12-09  9:54:04 0 [Note] InnoDB: Completed initialization of buffer pool
2021-12-09  9:54:04 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2021-12-09  9:54:04 0 [Note] InnoDB: Setting file './ibdata1' size to 12.000MiB. Physically writing the file full; Please wait ...
2021-12-09  9:54:04 0 [Note] InnoDB: File './ibdata1' size is now 12.000MiB.
2021-12-09  9:54:04 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 96.000MiB
2021-12-09  9:54:04 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2021-12-09  9:54:04 0 [Note] InnoDB: New log file created, LSN=10317
2021-12-09  9:54:04 0 [Note] InnoDB: Doublewrite buffer not found: creating new
2021-12-09  9:54:04 0 [Note] InnoDB: Doublewrite buffer created
2021-12-09  9:54:04 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-09  9:54:04 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-09  9:54:04 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2021-12-09  9:54:04 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2021-12-09  9:54:04 0 [Note] InnoDB: 10.7.2 started; log sequence number 0; transaction id 3
OK
2021-12-09  9:54:04 0 [Note] sql/mysqld (server 10.7.2-MariaDB) starting as process 250501 ...
2021-12-09  9:54:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-12-09  9:54:04 0 [Note] InnoDB: Number of transaction pools: 1
2021-12-09  9:54:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-12-09  9:54:04 0 [Note] InnoDB: Using liburing
2021-12-09  9:54:04 0 [Note] InnoDB: Initializing buffer pool, total size = 30.000GiB, chunk size = 128.000MiB
2021-12-09  9:54:04 0 [Note] InnoDB: Completed initialization of buffer pool
2021-12-09  9:54:04 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2021-12-09  9:54:04 0 [Note] InnoDB: Resizing redo log from 96.000MiB to 128.000MiB; LSN=41361
2021-12-09  9:54:04 0 [Note] InnoDB: Starting to delete and rewrite log file.
2021-12-09  9:54:04 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 128.000MiB
2021-12-09  9:54:04 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2021-12-09  9:54:04 0 [Note] InnoDB: New log file created, LSN=41361
2021-12-09  9:54:04 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-09  9:54:04 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-09  9:54:04 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2021-12-09  9:54:04 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2021-12-09  9:54:04 0 [Note] InnoDB: 10.7.2 started; log sequence number 41349; transaction id 14
2021-12-09  9:54:04 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/build-mariadb-server-10.7-datadir/ib_buffer_pool
2021-12-09  9:54:04 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-12-09  9:54:04 0 [Note] InnoDB: Buffer pool(s) load completed at 211209  9:54:04
2021-12-09  9:54:04 0 [Note] sql/mysqld: ready for connections.
Version: '10.7.2-MariaDB'  socket: '/tmp/build-mariadb-server-10.7.sock'  port: 0  Source distribution
2021-12-09  9:56:57 0 [Note] sql/mysqld (initiated by: unknown): Normal shutdown
2021-12-09  9:56:57 0 [Note] InnoDB: FTS optimize thread exiting.
2021-12-09  9:56:57 0 [Note] InnoDB: Starting shutdown...
2021-12-09  9:56:57 0 [Note] InnoDB: Dumping buffer pool(s) to /tmp/build-mariadb-server-10.7-datadir/ib_buffer_pool
2021-12-09  9:56:57 0 [Note] InnoDB: Buffer pool(s) dump completed at 211209  9:56:57
2021-12-09  9:56:57 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2021-12-09  9:56:57 0 [Note] InnoDB: Shutdown completed; log sequence number 42602; transaction id 15
2021-12-09  9:56:57 0 [Note] sql/mysqld: Shutdown complete
2022-01-18 14:20:59 +02:00
Marko Mäkelä
3f5726768f Merge 10.5 into 10.6 2022-01-04 09:26:38 +02:00
Julius Goryavsky
55bb933a88 Merge branch 10.4 into 10.5 2021-12-26 12:51:04 +01:00
Marko Mäkelä
ef9517eb81 MDEV-27268 Failed InnoDB initialization leaves garbage files behind
create_log_files(): Check log_set_capacity() before modifying
or creating any log files.

innobase_start_or_create_for_mysql(): If create_log_files()
fails and we were initializing a new database, delete the
system tablespace files before exiting.
2021-12-15 14:17:55 +02:00
Marko Mäkelä
3a427c568b Cleanup: Remove useless message "If you are installing InnoDB" 2021-05-14 08:26:51 +03:00
Marko Mäkelä
cf552f5886 MDEV-25312 Replace fil_space_t::name with fil_space_t::name()
A consistency check for fil_space_t::name is causing recovery failures
in MDEV-25180 (Atomic ALTER TABLE). So, we'd better remove that field
altogether.

fil_space_t::name was more or less a copy of dict_table_t::name
(except for some special cases), and it was not being used for
anything useful.

There used to be a name_hash, but it had been removed already in
commit a75dbfd718 (MDEV-12266).

We will also remove os_normalize_path(), OS_PATH_SEPARATOR,
OS_PATH_SEPARATOR_ALT. On Microsoft Windows, we will treat \ and /
roughly in the same way. The intention is that for per-table
tablespaces, the filenames will always follow the pattern
prefix/databasename/tablename.ibd. (Any \ in the prefix must not
be converted.)

ut_basename_noext(): Remove (unused function).

read_link_file(): Replaces RemoteDatafile::read_link_file().
We will ensure that the last two path component separators are
forward slashes (converting up to 2 trailing backslashes on
Microsoft Windows), so that everywhere else we can
assume that data file names end in "/databasename/tablename.ibd".
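
The separator handling described for read_link_file() can be sketched as
follows. This is a Python illustration with a hypothetical function name,
not the C++ code of the commit; it assumes the input is a complete path
ending in databasename/tablename.ibd.

```python
def normalize_tail_separators(path: str) -> str:
    """Make the last two path component separators forward slashes.

    Separators further left (the prefix) are left untouched.
    """
    chars = list(path)
    seen = 0
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] in "/\\":
            chars[i] = "/"
            seen += 1
            if seen == 2:
                break
    return "".join(chars)


print(normalize_tail_separators(r"C:\data\db\t.ibd"))  # C:\data/db/t.ibd
```

After this step, code elsewhere can locate the database and table names by
searching for "/" only, even on Microsoft Windows.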

Note: On Microsoft Windows, path names that start with \\?\ must
not contain / as path component separators. Previously, such paths
did work in the DATA DIRECTORY argument of InnoDB tables.

Reviewed by: Vladislav Vaintroub
2021-04-07 18:01:13 +03:00
Eugene Kosov
9ef2d29ff4 MDEV-14425 deprecate and ignore innodb_log_files_in_group
Now there can be only one log file instead of several which
logically work as a single file.

Possible names of redo log files: ib_logfile0,
ib_logfile101 (for just created one)

innodb_log_files_in_group: The value of this variable is not used
by InnoDB. Possible values are still 1..100, so as not to break upgrades

LOG_FILE_NAME: add constant of value "ib_logfile0"
LOG_FILE_NAME_PREFIX: add constant of value "ib_logfile"

get_log_file_path(): convenience function that returns full
path of a redo log file

SRV_N_LOG_FILES_MAX: removed

srv_n_log_files: we can't remove this for compatibility reasons,
but now server doesn't use this variable

log_sys_t::file::fd: now just one, not std::vector

log_sys_t::log_capacity: removed word 'group'

find_and_check_log_file(): part of logic from huge srv_start()
moved here

recv_sys_t::files: file descriptors of redo log files.
There can be several of those in case we're upgrading
from older MariaDB version.

recv_sys_t::remove_extra_log_files: whether to remove
ib_logfile{1,2,3...} after successful upgrade.

recv_sys_t::read(): open if needed and read from one
of several log files

recv_sys_t::files_size(): open if needed and return files count

redo_file_sizes_are_correct(): check that redo log files
sizes are equal. Just to log an error for a user.
Corresponding check was moved from srv0start.cc

namespace deprecated: Holds all deprecated variables, to
prevent us developers from using them
2020-02-19 12:21:59 +03:00
Marko Mäkelä
95e903261e MDEV-21216 InnoDB does dirty read of TRX_SYS page before recovery
InnoDB startup was discovering undo tablespaces in a dirty way.
It was reading a possibly stale copy of the TRX_SYS page before
processing any redo log records.

srv_start(): Do not call buf_pool_invalidate(). Invoke
trx_rseg_get_n_undo_tablespaces() after the recovery has been initiated.

recv_recovery_from_checkpoint_start(): Assert that the buffer pool is
empty. This used to be guaranteed by the buf_pool_invalidate() call.

trx_rseg_get_n_undo_tablespaces(): Move to the calling compilation unit,
and reimplement in a simpler way.

srv_undo_tablespace_create(): Remove the constant parameter
size=SRV_UNDO_TABLESPACE_SIZE_IN_PAGES.

srv_undo_tablespace_open(): Reimplement in a cleaner way, with
more robust error handling.

srv_all_undo_tablespaces_open(): Split from srv_undo_tablespaces_init().

srv_undo_tablespaces_init(): Read all "undo001","undo002" tablespace
files directly, without consulting the TRX_SYS page via calling
trx_rseg_get_n_undo_tablespaces().

This is joint work with Thirunarayanan Balathandayuthapani.
2019-12-04 15:34:28 +02:00
Aleksey Midenkov
fbbdd44f72 Tests: removed --transaction-registry option [#387]
1. Reverts "Tests: disabled TRT for some IB tests [#302]"
6d78496aee

2. Removes setting TRANSACTION_REGISTRY=0 in mysqldump

--system-versioning-transaction-registry now is OFF by default.

This commit should be reverted if the default changes.

Tests affected: mysqldump mysqldump-max openssl_1
2017-12-21 12:51:57 +03:00
Aleksey Midenkov
6d78496aee Tests: disabled TRT for some IB tests [#302]
As they fail on TRT schema check:
innodb.log_file
innodb.table_flags
innodb.row_format_redundant
encryption.innodb_encrypt_log_corruption
encryption.innodb_first_page
2017-11-17 11:51:41 +03:00
Marko Mäkelä
84e4e4506f Reduce the granularity of innodb_log_file_size
In Mariabackup, we would want the backed-up redo log file size to be
a multiple of 512 bytes, or OS_FILE_LOG_BLOCK_SIZE. However, at startup,
InnoDB would be picky, requiring the file size to be a multiple of
innodb_page_size.

Furthermore, InnoDB would require the parameter to be a multiple of
one megabyte, while the minimum granularity is 512 bytes. Because
the data-file-oriented fil_io() API is being used for writing the
InnoDB redo log, writes will for now require innodb_log_file_size to
be a multiple of the maximum innodb_page_size (65536 bytes).

To complicate matters, InnoDB startup divided srv_log_file_size by
UNIV_PAGE_SIZE, so that initially, the unit was bytes, and later it
was innodb_page_size. We will simplify this and keep srv_log_file_size
in bytes at all times.

innobase_log_file_size: Remove. Remove some obsolete checks against
overflow on 32-bit systems. srv_log_file_size is always 64 bits, and
the maximum size 512GiB in multiples of innodb_page_size always fits
in ulint (which is 32 or 64 bits). 512GiB would be 8,388,608*64KiB or
134,217,728*4KiB.
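
The page counts quoted above can be verified with a short calculation.
The constant names in this Python sketch are hypothetical; the numbers are
the ones from this commit message.

```python
KiB = 1024
GiB = 1024 ** 3

MAX_LOG_FILE_SIZE = 512 * GiB  # maximum size mentioned above, in bytes

pages_64k = MAX_LOG_FILE_SIZE // (64 * KiB)  # 8388608
pages_4k = MAX_LOG_FILE_SIZE // (4 * KiB)    # 134217728

# Both page counts fit in an unsigned 32-bit integer,
# while the size in bytes (2**39) does not.
counts_fit_in_32_bits = max(pages_64k, pages_4k) < 2 ** 32
bytes_fit_in_32_bits = MAX_LOG_FILE_SIZE < 2 ** 32
```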

log_init(): Remove the parameter file_size that was always passed as
srv_log_file_size.

log_set_capacity(): Add a parameter for passing the requested file size.

srv_log_file_size_requested: Declare static in srv0start.cc.

create_log_file(), create_log_files(),
innobase_start_or_create_for_mysql(): Invoke fil_node_create()
with srv_log_file_size expressed in multiples of innodb_page_size.

innobase_start_or_create_for_mysql(): Require the redo log file sizes
to be multiples of 512 bytes.
2017-06-29 23:15:05 +03:00
Sergei Petrunia
5e0ed6912f Merge 10.2 into bb-10.2-mariarocks 2017-04-03 13:48:05 +03:00
Sergei Golubchik
b2865a437f search_pattern_in_file.inc changes
1. Special mode to search in error logs: if SEARCH_RANGE is not set,
   the file is considered an error log and the search is performed
   since the last CURRENT_TEST: line
2. Number of matches is printed too. "FOUND 5 /foo/ in bar".
   Use greedy .* at the end of the pattern if number of matches
   isn't stable. If nothing is found it's still "NOT FOUND",
   not "FOUND 0".
3. SEARCH_ABORT specifies the prefix of the output.
   Can be "NOT FOUND" or "FOUND" as before,
   but also "FOUND 5 " if needed.
2017-03-31 19:28:58 +02:00
Marko Mäkelä
124bae082b MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance
InnoDB divides the allocation of undo logs into rollback segments.
The DB_ROLL_PTR system column of clustered indexes can address up to
128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only
created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin
for MySQL 5.1, all 128 rollback segments were created.

MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs.
On upgrade, unless a slow shutdown (innodb_fast_shutdown=0)
was performed on the old server instance, these rollback segments
could be in use by transactions that are in XA PREPARE state or
transactions that were left behind by a server kill followed by a
normal shutdown immediately after restart.

Persistent tables cannot refer to temporary undo logs or vice versa.
Therefore, we should keep two distinct sets of rollback segments:
one for persistent tables and another for temporary tables. In this way,
all 128 rollback segments will be available for both types of tables,
which could improve performance. Also, MariaDB 10.2 will remain more
compatible than MySQL 5.7 with data files from earlier versions of
MySQL or MariaDB.

trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary
rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will
be solely for persistent undo logs.

srv_tmp_undo_logs: Remove. Use the constant TRX_SYS_N_RSEGS.

srv_available_undo_logs: Change the type to ulong.

trx_rseg_get_on_id(): Remove. Instead, let the callers refer to
trx_sys directly.

trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters.
These functions only deal with persistent undo logs.

trx_temp_rseg_create(): New function, to create all temporary rollback
segments at server startup.

trx_rseg_t::is_persistent(): Determine if the rollback segment is for
persistent tables.

trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on
context (such as table handle) whether the DB_ROLL_PTR is referring to
a persistent undo log.

trx_sys_create_rsegs(): Remove all parameters, which were always passed
as global variables. Instead, modify the global variables directly.

enum trx_rseg_type_t: Remove.

trx_t::get_temp_rseg(): A method to ensure that a temporary
rollback segment has been assigned for the transaction.

trx_t::assign_temp_rseg(): Replaces trx_assign_rseg().

trx_purge_free_segment(), trx_purge_truncate_rseg_history():
Remove the redundant variable noredo=false.
Temporary undo logs are discarded immediately at transaction commit
or rollback, not lazily by purge.

trx_purge_mark_undo_for_truncate(): Remove references to the
temporary rollback segments.

trx_purge_mark_undo_for_truncate(): Remove a check for temporary
rollback segments. Only the dedicated persistent undo log tablespaces
can be truncated.

trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the
parameter is_temp.

trx_rseg_mem_restore(): Split from trx_rseg_mem_create().
Initialize the undo log and the rollback segment from the file
data structures.

trx_sysf_get_n_rseg_slots(): Renamed from
trx_sysf_used_slots_for_redo_rseg(). Count the persistent
rollback segment headers that have been initialized.

trx_sys_close(): Also free trx_sys->temp_rsegs[].

get_next_redo_rseg(): Merged to trx_assign_rseg_low().

trx_assign_rseg_low(): Remove the parameters and access the
global variables directly. Revert to simple round-robin, now that
the whole trx_sys->rseg_array[] is for persistent undo log again.
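
For readers unfamiliar with the term, simple round-robin over the 128
persistent slots could look like the Python sketch below. The class and
its members are hypothetical, and skipping unused slots is an assumption;
the real trx_assign_rseg_low() is C++ and takes more conditions into
account than shown here.

```python
TRX_SYS_N_RSEGS = 128


class RoundRobinRsegs:
    """Hand out rollback segment slots in circular order."""

    def __init__(self, slots):
        # slots[i] is None when rollback segment i has not been created.
        assert len(slots) == TRX_SYS_N_RSEGS
        self.slots = slots
        self.next_slot = 0

    def assign(self) -> int:
        """Return the next usable slot number, advancing the cursor."""
        for _ in range(TRX_SYS_N_RSEGS):
            slot = self.next_slot
            self.next_slot = (self.next_slot + 1) % TRX_SYS_N_RSEGS
            if self.slots[slot] is not None:
                return slot
        raise RuntimeError("no rollback segment available")
```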

get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg().

srv_undo_tablespaces_init(): Remove some parameters and use the
global variables directly. Clarify some error messages.

Adjust the test innodb.log_file. Apparently, before these changes,
InnoDB somehow ignored missing dedicated undo tablespace files that
are pointed by the TRX_SYS header page, possibly losing part of
essential transaction system state.
2017-03-31 18:53:04 +03:00
Marko Mäkelä
2af28a363c MDEV-11782: Redefine the innodb_encrypt_log format
Write only one encryption key to the checkpoint page.
Use 4 bytes of nonce. Encrypt more of each redo log block,
only skipping the 4-byte field LOG_BLOCK_HDR_NO which the
initialization vector is derived from.

Issue notes, not warning messages for rewriting the redo log files.

recv_recovery_from_checkpoint_finish(): Do not generate any redo log,
because we must avoid that before rewriting the redo log files, or
otherwise a crash during a redo log rewrite (removing or adding
encryption) may end up making the database unrecoverable.
Instead, do these tasks in innobase_start_or_create_for_mysql().

Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some
unreachable code and duplicated error messages for log corruption.

LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo
log format.

log_group_t::is_encrypted(), log_t::is_encrypted(): Determine
if the redo log is in encrypted format.

recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.

srv_prepare_to_delete_redo_log_files(): Display NOTE messages about
adding or removing encryption. Do not issue warnings for redo log
resizing any more.

innobase_start_or_create_for_mysql(): Rebuild the redo logs also when
the encryption changes.

innodb_log_checksums_func_update(): Always use the CRC-32C checksum
if innodb_encrypt_log. If needed, issue a warning
that innodb_encrypt_log implies innodb_log_checksums.

log_group_write_buf(): Compute the checksum on the encrypted
block contents, so that transmission errors or incomplete blocks can be
detected without decrypting.

Rewrite most of the redo log encryption code. Only remember one
encryption key at a time (but remember up to 5 when upgrading from the
MariaDB 10.1 format.)
2017-02-15 08:07:20 +02:00
Marko Mäkelä
743ac7c2d0 MDEV-12061 Allow innodb_log_files_in_group=1
The InnoDB redo log consists of a list of files that logically form
a bigger file, as if the individual files were concatenated together.

The first file will always be written on redo log checkpoint, because
the two checkpoint pages are at the start of the single logical
redo log file.
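
The idea of several files forming one logical file can be sketched as a
mapping from a logical offset to a file number and an offset within that
file. This Python sketch uses a hypothetical function name, assumes that
all files have the same size, and ignores any per-file headers.

```python
def locate(logical_offset: int, file_size: int, n_files: int):
    """Return (file_index, offset_in_file) for an offset in the logical file."""
    total = file_size * n_files
    if not 0 <= logical_offset < total:
        raise ValueError("offset outside the logical log file")
    return divmod(logical_offset, file_size)


print(locate(1500, 1000, 2))  # (1, 500)
```

With n_files=1 the mapping is the identity, and offset 0, where the
checkpoint pages live, always falls into the first file.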

There is no technical reason why InnoDB requires at least 2 files
to exist. Let us reduce the minimum number to 1. In that way,
restoring from backups will become easier, since InnoDB can directly
deal with a single backed-up redo log file.
2017-02-15 08:07:20 +02:00
Marko Mäkelä
a440d6ed3a MDEV-11948 innodb.log_file fails in buildbot on CentOS 5
Rewrite the test so that the main server is restarted, instead of
--exec $MYSQLD_CMD. In this way, the test can be run with Valgrind
and with any --mysqld=--innodb-page-size.

Also remove the workaround --skip-innodb-use-native-aio. It should
not be needed when we are inheriting the server parameters from
the test environment.
2017-02-06 10:45:18 +02:00
Marko Mäkelä
7128328d41 Remove a work-around for MDEV-11689.
Also, work around MDEV-11948 by disabling native asynchronous I/O.
2017-01-31 10:23:21 +02:00
Marko Mäkelä
2de0e42af5 Import and adjust the InnoDB redo log tests from MySQL 5.7. 2017-01-27 17:53:02 +02:00