mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-12-06 05:42:06 +03:00

Author	SHA1	Message	Date
Alexander Barkov	a923d6f49c	MDEV-28769 Assertion `(m_ci->state & 32) || m_with_collate' failed in Lex_exact_charset_opt_extended_collate::Lex_exact_charset_opt_extended_collate on SET NAMES These system variables: @@character_set_client @@character_set_connection @@character_set_database @@character_set_filesystem @@character_set_results @@character_set_server can now be set in numeric format only to IDs of default collations, e.g.: SET @@character_set_xxx=9; -- OK (latin2_general_ci is default) SET @@character_set_xxx=2; -- ERROR (latin2_czech_cs is not default) SET @@character_set_xxx=21; -- ERROR (latin2_hungarian_ci is not default) Before this change the server accepted IDs of non-default collations so all three examples above worked without errors, but this could lead to unexpected behavior in later statements.	2022-06-16 10:38:35 +04:00
Marko Mäkelä	51a4fcd565	Merge 10.9 into 10.10	2022-06-15 10:07:31 +03:00
Marko Mäkelä	9fe784ff7e	Merge 10.8 into 10.9	2022-06-15 10:01:51 +03:00
Marko Mäkelä	4c0cd953ab	MDEV-28766: SET GLOBAL innodb_log_file_buffering In commit `c4c8830709` (MDEV-28111) we disabled the file system cache on the InnoDB write-ahead log file (ib_logfile0) by default on Linux. It turns out that especially with innodb_flush_log_at_trx_commit=2, writing to the log via the file system cache typically improves throughput, especially on slow storage or at a small number of concurrent transactions. For other values of innodb_flush_log_at_trx_commit, direct writes were observed to be mostly but not always faster. Whether it pays off to disable the file system cache on the log may depend on the type of storage, the workload, and the operating system kernel version. On Linux and Microsoft Windows, we will introduce the settable Boolean global variable innodb_log_file_buffering that indicates whether the file system cache on the redo log file is enabled. The default value is innodb_log_file_buffering=OFF. If the server is started up with innodb_flush_log_at_trx_commit=2, the value will be changed to innodb_log_file_buffering=ON. When a persistent memory interface is being used for the log, the value cannot be changed from innodb_log_file_buffering=OFF. On Linux, when the physical block size cannot be determined to be a power of 2 between 64 and 4096 bytes, the file system cache cannot be disabled, and innodb_log_file_buffering=ON cannot be changed. Server log messages will indicate whether the file system cache is enabled for the redo log: [Note] InnoDB: Buffered log writes (block size=512 bytes) [Note] InnoDB: File system buffers for log disabled (block size=512 bytes) After this change, the startup parameter innodb_flush_method will no longer control whether O_DIRECT will be set on the redo log on Linux. On other operating systems that support O_DIRECT, no interface has been implemented for controlling the file system cache for the redo log.
The innodb_flush_method values O_DIRECT, O_DIRECT_NO_FSYNC, O_DSYNC will enable O_DIRECT for data files, not the log. Tested by: Matthias Leich, Axel Schwenke	2022-06-14 17:46:47 +03:00
Marko Mäkelä	32edabd1f2	Merge 10.9 into 10.10	2022-06-09 15:26:09 +03:00
Marko Mäkelä	5a33a37682	Merge 10.8 into 10.9	2022-06-07 09:20:07 +03:00
Marko Mäkelä	fdc039db29	MDEV-28540 Deprecate and ignore the parameter innodb_prefix_index_cluster_optimization The parameter innodb_prefix_index_cluster_optimization used to enable an optimization that was added in `cb37c55768` and was disabled by default. We will unconditionally enable the extension and mark the parameter as deprecated. Related to this, the counters Innodb_secondary_index_triggered_cluster_reads and Innodb_secondary_index_triggered_cluster_reads_avoided allowed to determine the usefulness of this optimization. Now that the configuration parameter is disabled, the counters do not serve any useful purpose and can be removed. row_search_with_covering_prefix(): Fix a bug that caused an incorrect result to be returned.	2022-06-03 12:20:20 +03:00
Marko Mäkelä	6b9bba41e8	MDEV-28554: Remove innodb_version INNODB_VERSION_STR: Replaced with PACKAGE_VERSION (non-functional change). INNODB_VERSION_SHORT: Replaced with direct use of MYSQL_VERSION_MAJOR << 8 | MYSQL_VERSION_MINOR. check_version(): Simplify the mariadb-backup version check, and require the server version to be MariaDB 10.8 or later, because that is when the InnoDB redo log format was last changed.	2022-06-03 12:20:19 +03:00
Haidong Ji	41068a890e	MDEV-27314 Condense innodb buffer pool resize message InnoDB buffer pool resize messages are more succinct from this change: Before: ``` 2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool from 14745600 to 19660800 bytes. 2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool. 2022-05-07 17:10:33 8 [Note] InnoDB: Completed resizing buffer pool. (New size: 19660800 bytes). ``` After: ``` 2022-05-07 17:10:33 0 [Note] InnoDB: Completed resizing buffer pool from 14745600 to 19660800 bytes. ``` Additionally, the INNODB_BUFFER_POOL_RESIZE_STATUS has more complete info: it contains both the old and new buffer pool size values.	2022-05-26 12:10:29 +10:00
Tingyao Nian	b3df1ec97a	MDEV-24815 Add 'allow-suspicious-udfs' and 'skip-grant-tables' to system variables Make two existing command line options "allow-suspicious-udfs" and "skip-grant-tables" visible as global system variables. Both options have security implications, but users were not able to check their states in the server prior to this change. This was a security issue, as the user may not be aware if the options are enabled. By adding them into system variables, it increases users’ visibility into their security configurations. Create new MTR tests to verify that the system variables align with the command line options. Minor adjustments to the existing MTR due to the new members in system variables. Before: mysql> SHOW VARIABLES WHERE Variable_Name LIKE 'allow_suspicious_udfs' OR Variable_Name LIKE 'skip_grant_tables'; Empty set (0.000 sec) After: mysql> SHOW VARIABLES WHERE Variable_Name LIKE 'allow_suspicious_udfs' OR Variable_Name LIKE 'skip_grant_tables'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | allow_suspicious_udfs | OFF | | skip_grant_tables | OFF | +-----------------------+-------+ All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.	2022-05-26 11:23:13 +10:00
Sergei Golubchik	bf2bdd1a1a	Merge branch '10.8' into 10.9	2022-05-19 14:07:55 +02:00
Sergei Golubchik	b7ffccf49b	Merge branch '10.7' into 10.8	2022-05-18 13:26:48 +02:00
Sergei Golubchik	99a433ed1c	Merge branch '10.6' into 10.7	2022-05-18 10:34:38 +02:00
Sergei Golubchik	b2187662bc	Merge branch '10.5' into 10.6	2022-05-18 10:30:47 +02:00
Alexey Botchkov	b03ab1270d	MDEV-28490 Strange result truncation with group_concat_max_len=1GB. Arithmetic can overflow the uint type when group_concat_max_len is multiplied by collation.mbmaxlen (which can easily be 4). So use ulonglong for the calculations.	2022-05-15 23:28:06 +04:00
Marko Mäkelä	504a3b32f6	Merge 10.8 into 10.9	2022-04-28 15:54:03 +03:00
Marko Mäkelä	133c2129cd	Merge 10.7 into 10.8	2022-04-27 10:43:00 +03:00
Marko Mäkelä	638afc4acf	Merge 10.6 into 10.7	2022-04-26 18:59:40 +03:00
Marko Mäkelä	e135edec3a	Merge 10.5 into 10.6	2022-04-26 15:21:20 +03:00
Marko Mäkelä	c009ce7dd0	MDEV-27094 Debug builds include useless InnoDB "disabled" options This is a backport of commit `4489a89c71` in order to remove the test innodb.redo_log_during_checkpoint that would cause trouble in the DBUG subsystem invoked by safe_mutex_lock() via log_checkpoint(). Before commit `7cffb5f6e8` these mutexes were of different type. The following options were introduced in commit `2e814d4702` (mariadb-10.2.2) and have little use: innodb_disable_resize_buffer_pool_debug had no effect even in MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in mysql/mysql-server@5c4094cf49 to work around a problem that was fixed in mysql/mysql-server@2957ae4f99 (but the parameter was not removed). innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug are only used by the test innodb.redo_log_during_checkpoint that will be removed as part of this commit. innodb_dict_stats_disabled_debug is only used by that test, and it is redundant because one could simply use innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute of the table in the test to achieve the same effect.	2022-04-22 12:48:40 +03:00
Marko Mäkelä	fae0ccad6e	Merge 10.5 into 10.6	2022-04-21 17:46:40 +03:00
Daniel Black	580cbd18b3	Merge branch 10.4 into 10.5 A few of constaint -> constraint	2022-04-21 15:47:03 +10:00
Rucha Deodhar	5945e420f1	MDEV-24920: Merge "old" SQL variable to "old_mode" sql variable Analysis: There are 2 server variables: "old_mode" and "old". "old" is no longer needed as "old_mode" has replaced it (however, it is still used in some places in the code). "old_mode" and "old" have the same purpose: to emulate behavior from previous MariaDB versions. So they can be merged to avoid confusion. Fix: Deprecate the "old" variable and create another mode for @@old_mode to mimic the behavior of the previous "old" variable. Create specific modes for the specific tasks that the --old sql variable was doing earlier and use the new modes instead.	2022-04-20 00:30:22 +05:30
Rucha Deodhar	3327bb6098	MDEV-22266: Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed on SELECT after setting tmp_disk_table_size. Analysis: The mismatch in the number of warnings between "194 warnings" and "64 rows in set" is because of the max_error_count variable, which has a default value of 64. About the corrupted tables: the error that occurs because of an insufficient tmp_disk_table_size is not reported correctly, and we continue to execute the statement. Because the previous error (about the table being full) is not reported correctly, this error moves up the stack and is wrongly reported as a parsing error later on, while parsing the frm file of one of the information schema tables. This parsing error gives the corrupted table error. As for the innodb error, it occurs even when tmp_disk_table_size is not insufficient (the default), but the internal error handler takes care of it and the error doesn't show. When tmp_disk_table_size is insufficient, the fatal error which wasn't reported correctly moves up the stack, so the internal error handler is not called and the errors are shown. Fix: Report the error correctly.	2022-04-12 01:22:51 +05:30
Marko Mäkelä	6cb6ba8b7b	Merge 10.8 into 10.9	2022-04-06 13:33:33 +03:00
Marko Mäkelä	b2baeba415	Merge 10.7 into 10.8	2022-04-06 13:28:25 +03:00
Marko Mäkelä	2d8e38bc94	Merge 10.6 into 10.7	2022-04-06 13:00:09 +03:00
Marko Mäkelä	ff99413804	MDEV-25975: Merge 10.5 into 10.6	2022-04-06 12:45:14 +03:00
Marko Mäkelä	5d8dcfd86c	MDEV-25975: Merge 10.4 into 10.5	2022-04-06 10:30:49 +03:00
Marko Mäkelä	d172df9913	MDEV-25975: Merge 10.3 into 10.4	2022-04-06 09:18:38 +03:00
Marko Mäkelä	e9735a8185	MDEV-25975 innodb_disallow_writes causes shutdown to hang We will remove the parameter innodb_disallow_writes because it is badly designed and implemented. The parameter was never allowed at startup. It was only internally used by Galera snapshot transfer. If a user executed SET GLOBAL innodb_disallow_writes=ON; the server could hang even on subsequent read operations. During Galera snapshot transfer, we will block writes to implement an rsync friendly snapshot, as follows: sst_flush_tables() will acquire a global lock by executing FLUSH TABLES WITH READ LOCK, which will block any writes at the high level. sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true), will suspend or disable InnoDB background tasks or threads that could initiate writes. As part of this, log_make_checkpoint() will be invoked to ensure that anything in the InnoDB buf_pool.flush_list will be written to the data files. This has the nice side effect that the Galera joiner will avoid crash recovery. The changes to sql/wsrep.cc and to the tests are based on a prototype that was developed by Jan Lindström. Reviewed by: Jan Lindström	2022-04-06 08:06:49 +03:00
Marko Mäkelä	177345dadc	MDEV-27812 Allow SET GLOBAL innodb_log_file_size We support online log resizing by replicating the current ib_logfile0 to a new file ib_logfile101, which will eventually replace the ib_logfile0 on the first applicable log checkpoint. Unless the log is located in a persistent memory file system (PMEM), an attempt to SET GLOBAL innodb_log_file_size to less than innodb_log_buffer_size will be refused. (With PMEM, a.k.a. mmap() based log, that parameter has no meaning.) Should the server be killed while the log was being resized, both files ib_logfile0 and ib_logfile101 may exist on startup, and since commit `3b06415cb8` the extra file ib_logfile101 will be removed. We will initiate checkpoint flushing by invoking buf_flush_ahead(), to let buf_flush_page_cleaner() write out pages until the buf_flush_async_lsn target has been reached. On a log checkpoint, if the new checkpoint LSN is not older than log_sys.resize_lsn (the start LSN of the ib_logfile101), we can switch files and complete the log resizing. Else, we will attempt to switch files on the next checkpoint. Log resizing can be aborted by killing the connection that is executing the SET GLOBAL statement. If the ib_logfile101 wraps around to the beginning, we must advance the log_sys.resize_lsn. In the resized log file, the sequence bit will always be written as 1 (no wrap-around). The log will be duplicated in log_t::resize_write(), invoked by mtr_t::finish_write(). When the log is being written via system calls (not PMEM), the initial log_sys.resize_lsn is the current log_sys.first_lsn, plus an integer multiple of log_sys.block_size, corresponding to the LSN at the start of the block that was written by log_sys.write_lsn. The log_sys.resize_buf will be of the same size as the log_sys.buf. During resizing, the contents of log_sys.buf and log_sys.resize_buf will be identical, except that the sequence bit of each mini-transaction will always be 1 in log_sys.resize_buf. 
If resizing is in progress, log_t::write_buf() will write log_sys.resize_buf to log_sys.resize_log (ib_logfile101). If the file would wrap around, the buffer will be written to log_sys.START_OFFSET and the log_sys.resize_lsn advanced accordingly. When using mmap() on /dev/shm or a PMEM mount -o dax file system, the initial log_sys.resize_lsn will be the log_sys.lsn at the time the resizing is initiated. If the log file wraps around during resizing, then the log_sys.resize_lsn will be advanced by (log_sys.resize_target - log_sys.START_OFFSET). log_t::resize_start(), log_t::resize_abort(), log_t::write_checkpoint(): Unless the log is mmap() based, acquire flush_lock and write_lock. In any case, acquire exclusive log_sys.latch to prevent race conditions. log_t::resize_rename(): Renamed from log_t::rename_resized(), and moved some code to the previous sole caller srv_start(). Thanks to Vladislav Vaintroub for helpful review comments and to Matthias Leich for testing this, in particular, testing crash recovery, multiple concurrent SET GLOBAL innodb_log_file_size and frequently killed connections.	2022-03-02 16:53:04 +02:00
Marko Mäkelä	32d741b5b0	Merge 10.7 into 10.8	2022-02-25 16:24:13 +02:00
Marko Mäkelä	3d88f9f34c	Merge 10.6 into 10.7	2022-02-25 16:09:16 +02:00
Marko Mäkelä	06eaca9b86	Merge 10.5 into 10.6 (MDEV-27913)	2022-02-25 12:15:16 +02:00
Marko Mäkelä	f42d6234bd	Merge 10.4 into 10.5 (MDEV-27913)	2022-02-25 11:47:27 +02:00
Marko Mäkelä	0eabc285a3	Merge 10.3 into 10.4 (MDEV-27913)	2022-02-25 10:55:57 +02:00
Thirunarayanan Balathandayuthapani	a76731e1a1	MDEV-27913 innodb_ft_cache_size max possible value (80000000) is too small for practical purposes - Make innodb_ft_cache_size and innodb_ft_total_cache_size dynamic variables, increase the maximum value of innodb_ft_cache_size to 512MB on 32-bit systems and 1 TB on 64-bit systems, and set the maximum value of innodb_ft_total_cache_size to 1 TB on 64-bit systems. - Print a warning if the fts cache exceeds innodb_ft_cache_size, and unlock the cache once the fts cache memory drops below innodb_ft_cache_size.	2022-02-24 22:41:23 +05:30
Oleksandr Byelkin	4fb2cb1a30	Merge branch '10.7' into 10.8	2022-02-04 14:50:25 +01:00
Oleksandr Byelkin	9ed8deb656	Merge branch '10.6' into 10.7	2022-02-04 14:11:46 +01:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Andrei	fe2d90cca9	MDEV-11675. Convert the new session var to bool type and test changes The new @@binlog_alter_two_phase is converted to `my_bool` type.	2022-01-31 22:57:39 +02:00
Oleksandr Byelkin	a576a1cea5	Merge branch '10.3' into 10.4	2022-01-30 09:46:52 +01:00
Oleksandr Byelkin	41a163ac5c	Merge branch '10.2' into 10.3	2022-01-29 15:41:05 +01:00
Sachin	0c5d1342ae	MDEV-11675 Lag Free Alter On Slave This commit implements two phase binloggable ALTER. When the new @@session.binlog_alter_two_phase = YES, an ALTER query gets logged in two parts: the START ALTER and the COMMIT or ROLLBACK ALTER. START ALTER is written to the binlog as soon as the necessary locks have been acquired for the table. The timing is such that any concurrent DML statements that update the same table are either committed, thus logged into the binary log having done work on the old version of the table, or will be queued for execution on its new version. The "COMPLETE" COMMIT or ROLLBACK ALTER is written at the very point of a normal "single-piece" ALTER, that is, after most of the query work is done. When its result is positive, COMMIT ALTER is written; otherwise ROLLBACK ALTER is written with the specific error that happened after the START ALTER phase. Replication of two-phase binloggable ALTER is cross-version safe. Specifically, an OLD slave merely does not recognize the START ALTER part, while still being able to process and memorize its gtid. Two phase logged ALTER is read from the binlog by mysqlbinlog to produce BINLOG 'string', where 'string' contains a base64 encoded Query_log_event containing either the start part of ALTER, or a completion part. The Query details can be displayed with the `-v` flag, similarly to ROW format events. Notice that mysqlbinlog output containing parts of two-phase binloggable ALTER can be processed correctly only by a binlog_alter_two_phase server. @@log_warnings > 2 can reveal details of binlogging and slave side processing of the ALTER parts. The current commit also carries fixes to the following list of reported bugs: MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.
Thanks to all people involved in the early discussion of the feature, including Kristian Nielsen, and to those who helped to design, implement and test: Sergei Golubchik, Andrei Elkin who took on the burden of completing the implementation, Sujatha Sivakumar, Brandon Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.	2022-01-27 21:25:07 +02:00
Daniel Black	83dd7db69d	MDEV-27314 InnoDB Buffer Pool Resize output cleanup (mtr postfix) More tests depending on 'Completed resizing buffer pool.' output	2022-01-24 17:28:06 +11:00
Marko Mäkelä	685d958e38	MDEV-14425 Improve the redo log for concurrency The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped.
Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will be written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of them will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time.
The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. 
recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. 
We are inserting the blocks in arbitrary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.	2022-01-21 16:03:47 +02:00
Jan Lindström	e32c21cb93	Changing wsrep_slave_threads parameter requires that cluster is connected so moved test here.	2022-01-11 09:43:59 +02:00
Marko Mäkelä	7dfaded962	Merge 10.6 into 10.7	2022-01-04 09:55:58 +02:00
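
The overflow described in commit b03ab1270d (MDEV-28490) in the list above can be illustrated with a minimal C sketch. The helper names `max_bytes_narrow` and `max_bytes_wide` are hypothetical and are not taken from the server source; `uint32_t` and `uint64_t` are assumed here as stand-ins for the server's `uint` and `ulonglong` types.

```c
#include <stdint.h>

/* Hypothetical helper: multiply in 32-bit arithmetic, as in the
   behaviour the commit message describes. The product wraps modulo 2^32. */
static uint32_t max_bytes_narrow(uint32_t max_len, uint32_t mbmaxlen)
{
  return max_len * mbmaxlen;
}

/* Hypothetical helper: widen one operand to 64 bits before multiplying,
   so the full product is kept. */
static uint64_t max_bytes_wide(uint32_t max_len, uint32_t mbmaxlen)
{
  return (uint64_t) max_len * mbmaxlen;
}
```

With a length limit of 1GB (2^30) and an mbmaxlen of 4, the narrow product is 2^32, which wraps to 0 in 32-bit arithmetic, while the widened product is 4294967296.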


1327 Commits