1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00
Commit Graph

8132 Commits

Author SHA1 Message Date
Marko Mäkelä
638afc4acf Merge 10.6 into 10.7 2022-04-26 18:59:40 +03:00
Marko Mäkelä
e135edec3a Merge 10.5 into 10.6 2022-04-26 15:21:20 +03:00
Brandon Nesterenko
d16c3aca3c MDEV-26473: mysqld got exception 0xc0000005 (rpl_slave_state/rpl_load_gtid_slave_state)
Problem:
========
During mysqld initialization, if the number of GTIDs added since
that last purge of the mysql.gtid_slave_pos tables is greater than
or equal to the –-gtid-cleanup-batch-size value, a race condition
can occur. Specifically, the binlog background thread will submit
the bg_gtid_delete_pending job to the mysql handle manager; however,
the mysql handle manager may not be initialized, leading to crashes.

Solution:
========
Force the mysql handle manager to initialize/start before the binlog
background thread is created.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
2022-04-25 08:07:17 -06:00
Marko Mäkelä
4faef6e240 Cleanup: Remove IF_VALGRIND
The purpose of the compress() wrapper my_compress_buffer() was twofold:
silence Valgrind warnings about uninitialized memory access before
zlib 1.2.4, and have PERFORMANCE_SCHEMA instrumentation of some zlib
related memory allocation. Because of PERFORMANCE_SCHEMA, we cannot
trivially replace my_compress_buffer() with compress().

az_open(): Remove a crc32() call. Any CRC of the empty string is 0.
2022-04-25 09:40:40 +03:00
Marko Mäkelä
232af0c7bf Do not disable --symbolic-links on Valgrind (or MSAN)
The option --symbolic-links was originally disabled by default under
Purify (and later Valgrind) in 51156c5af2
without any explanation.
2022-04-25 09:36:30 +03:00
Brandon Nesterenko
a83c7ab1ea MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state
Problem:
========
If a primary is shutdown during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.

Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.

Notes:
 1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, If a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT

Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>
2022-04-22 12:59:54 -06:00
Marko Mäkelä
fae0ccad6e Merge 10.5 into 10.6 2022-04-21 17:46:40 +03:00
Marko Mäkelä
620c55e708 Merge 10.4 into 10.5 2022-04-21 15:33:50 +03:00
Marko Mäkelä
394784095e Merge 10.3 into 10.4 2022-04-21 11:33:59 +03:00
Rucha Deodhar
72a1250585 MDEV-28029: No warnings if server starts with "--old"
Analysis: When --old option is used, the corresponding --old-mode variables
are set but warning is not given.
Fix: Use sql_print_warning() to give warning.
2022-04-20 11:05:15 +05:30
Rucha Deodhar
5945e420f1 MDEV-24920: Merge "old" SQL variable to "old_mode" sql variable
Analysis: There are 2 server variables- "old_mode" and "old". "old" is no
longer needed as "old_mode" has replaced it (however still used in some places
 in the code). "old_mode" and "old" has same purpose- emulate behavior from
previous MariaDB versions. So they can be merged to avoid confusion.
Fix: Deprecate "old" variable and create another mode for @@old_mode to mimic
behavior of previous "old" variable. Create specific modes for specifix task
that --old sql variable was doing earlier and use the new modes instead.
2022-04-20 00:30:22 +05:30
Nayuta Yanagisawa
cbf9d8a8d5 Merge 10.7 into 10.8 2022-04-13 17:52:27 +09:00
Marko Mäkelä
aa3a9d1ef5 Merge 10.6 into 10.7 2022-04-12 16:11:29 +03:00
Sergei Golubchik
bbdec04d59 MDEV-24317 Data race in LOGGER::init_error_log at sql/log.cc:1443 and in LOGGER::error_log_print at sql/log.cc:1181
don't initialize error_log_handler_list in set_handlers()
* error_log_handler_list is initialized to LOG_FILE early, in init_base()
* set_handlers always reinitializes it to LOG_FILE, so it's pointless
* after init_base() concurrent threads start using sql_log_warning,
  so following set_handlers() shouldn't modify error_log_handler_list
  without some protection
2022-04-12 13:07:20 +02:00
Vladislav Vaintroub
5a4a37076d MDEV-10183 implement service_manager_extend_timeout on Windows
The implementation calls SetServiceStatus() with updated
SERVICE_STATUS::dwHint and SERVICE_STATUS::dwCheckpoint
2022-04-11 07:49:43 +02:00
Daniel Black
88ce8a3d8b Merge 10.7 into 10.8 2022-03-25 15:06:56 +11:00
Daniel Black
e86986a157 Merge 10.6 into 10.7 2022-03-24 18:57:07 +11:00
Daniel Black
065f995e6d Merge branch 10.5 into 10.6 2022-03-18 12:17:11 +11:00
Daniel Black
b73d852779 Merge 10.4 to 10.5 2022-03-17 17:03:24 +11:00
Daniel Black
069139a549 Merge 10.3 to 10.4
extra2_read_len resolved by keeping the implementation
in sql/table.cc by exposed it for use by ha_partition.cc

Remove identical implementation in unireg.h
(ref: bfed2c7d57)
2022-03-16 16:39:10 +11:00
Daniel Black
a950086036 Merge 10.2 (part) into 10.3
commit '6de482a6fefac0c21daf33ed465644151cdf879f'

10.3 no longer errors in truncate_notembedded.test
but per comments, a non-crash is all that we are after.
2022-03-15 16:44:52 +11:00
Haidong Ji
114476f2ec MDEV-27978 fix wrong name in error when max_session_mem_used exceeded
Fixed typo in my_malloc_size_cb_func. There is no max-thread-mem-used
sys variable in MariaDB, only max-session-mem-used. The relevant entry
in sys_vars.cc is also fixed.

Added a fallback case in case we could allocate the 256 bytes for the
error message containing the exact setting.
2022-03-08 15:13:09 +11:00
Marko Mäkelä
a3b96d4584 Merge 10.7 into 10.8 2022-02-22 13:20:18 +02:00
Marko Mäkelä
507084517f Merge 10.6 into 10.7 2022-02-22 12:47:48 +02:00
Haidong Ji
120480d15b MDEV-27394 added memset to zero out addr.un.sun_path string
If this char[] is not zeroed out, inaccurate log message described in
MDEV-27394 will show up.
2022-02-22 09:41:16 +11:00
Oleksandr Byelkin
4fb2cb1a30 Merge branch '10.7' into 10.8 2022-02-04 14:50:25 +01:00
Oleksandr Byelkin
9ed8deb656 Merge branch '10.6' into 10.7 2022-02-04 14:11:46 +01:00
Oleksandr Byelkin
f5c5f8e41e Merge branch '10.5' into 10.6 2022-02-03 17:01:31 +01:00
Oleksandr Byelkin
cf63eecef4 Merge branch '10.4' into 10.5 2022-02-01 20:33:04 +01:00
Sachin
0c5d1342ae MDEV-11675 Lag Free Alter On Slave
This commit implements two phase binloggable ALTER.
When a new

      @@session.binlog_alter_two_phase = YES

ALTER query gets logged in two parts, the START ALTER and the COMMIT
or ROLLBACK ALTER. START Alter is written in binlog as soon as
necessary locks have been acquired for the table. The timing is
such that any concurrent DML:s that update the same table are either
committed, thus logged into binary log having done work on the old
version of the table, or will be queued for execution on its new
version.

The "COMPLETE" COMMIT or ROLLBACK ALTER are written at the very point
of a normal "single-piece" ALTER that is after the most of
the query work is done. When its result is positive COMMIT ALTER is
written, otherwise ROLLBACK ALTER is written with specific error
happened after START ALTER phase.
Replication of two-phase binloggable ALTER is
cross-version safe. Specifically the OLD slave merely does not
recognized the start alter part, still being able to process and
memorize its gtid.

Two phase logged ALTER is read from binlog by mysqlbinlog to produce
BINLOG 'string', where 'string' contains base64 encoded
Query_log_event containing either the start part of ALTER, or a
completion part. The Query details can be displayed with `-v` flag,
similarly to ROW format events.  Notice, mysqlbinlog output containing
parts of two-phase binloggable ALTER is processable correctly only by
binlog_alter_two_phase server.

@@log_warnings > 2 can reveal details of binlogging and slave side
processing of the ALTER parts.

The current commit also carries fixes to the following list of
reported bugs:
MDEV-27511, MDEV-27471, MDEV-27349, MDEV-27628, MDEV-27528.

Thanks to all people involved into early discussion of the feature
including Kristian Nielsen, those who helped to design, implement and
test: Sergei Golubchik, Andrei Elkin who took the burden of the
implemenation completion, Sujatha Sivakumar, Brandon
Nesterenko, Alice Sherepa, Ramesh Sivaraman, Jan Lindstrom.
2022-01-27 21:25:07 +02:00
Marko Mäkelä
685d958e38 MDEV-14425 Improve the redo log for concurrency
The InnoDB redo log used to be formatted in blocks of 512 bytes.
The log blocks were encrypted and the checksum was calculated while
holding log_sys.mutex, creating a serious scalability bottleneck.

We remove the fixed-size redo log block structure altogether and
essentially turn every mini-transaction into a log block of its own.
This allows encryption and checksum calculations to be performed
on local mtr_t::m_log buffers, before acquiring log_sys.mutex.
The mutex only protects a memcpy() of the data to the shared
log_sys.buf, as well as the padding of the log, in case the
to-be-written part of the log would not end in a block boundary of
the underlying storage. For now, the "padding" consists of writing
a single NUL byte, to allow recovery and mariadb-backup to detect
the end of the circular log faster.

Like the previous implementation, we will overwrite the last log block
over and over again, until it has been completely filled. It would be
possible to write only up to the last completed block (if no more
recent write was requested), or to write dummy FILE_CHECKPOINT records
to fill the incomplete block, by invoking the currently disabled
function log_pad(). This would require adjustments to some logic around
log checkpoints, page flushing, and shutdown.

An upgrade after a crash of any previous version is not supported.
Logically empty log files from a previous version will be upgraded.

An attempt to start up InnoDB without a valid ib_logfile0 will be
refused. Previously, the redo log used to be created automatically
if it was missing. Only with with innodb_force_recovery=6, it is
possible to start InnoDB in read-only mode even if the log file
does not exist. This allows the contents of a possibly corrupted
database to be dumped.

Because a prepared backup from an earlier version of mariadb-backup
will create a 0-sized log file, we will allow an upgrade from such
log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system
tablespace looks valid.

The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced
with 64-byte log checkpoint blocks at 0x1000 and 0x2000.

The start of log records will move from 0x800 to 0x3000. This allows us
to use 4096-byte aligned blocks for all I/O in a future revision.

We extend the MDEV-12353 redo log record format as follows.

(1) Empty mini-transactions or extra NUL bytes will not be allowed.
(2) The end-of-minitransaction marker (a NUL byte) will be replaced
with a 1-bit sequence number, which will be toggled each time when the
circular log file wraps back to the beginning.
(3) After the sequence bit, a CRC-32C checksum of all data
(excluding the sequence bit) will written.
(4) If the log is encrypted, 8 bytes will be written before
the checksum and included in it. This is part of the
initialization vector (IV) of encrypted log data.
(5) File names, page numbers, and checkpoint information will not be
encrypted. Only the payload bytes of page-level log will be encrypted.
The tablespace ID and page number will form part of the IV.
(6) For padding, arbitrary-length FILE_CHECKPOINT records may be written,
with all-zero payload, and with the normal end marker and checksum.
The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON.

In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will
no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup
will require a valid log file. When resizing the log, we will create
a logically empty ib_logfile101 at the current LSN and use an atomic rename
to replace ib_logfile0 with it. See the test innodb.log_file_size.

Because there is no mandatory padding in the log file, we are able
to create a dummy log file as of an arbitrary log sequence number.
See the test mariabackup.huge_lsn.

The parameter innodb_log_write_ahead_size and the
INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed.

The minimum value of innodb_log_buffer_size will be increased to 2MiB
(because log_sys.buf will replace recv_sys.buf) and the increment
adjusted to 4096 bytes (the maximum log block size).

The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed:

os_log_fsyncs
os_log_pending_fsyncs
log_pending_log_flushes
log_pending_checkpoint_writes

The following status variables will be removed:

Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs)
Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design)

log_sys.get_block_size(): Return the physical block size of the log file.
This is only implemented on Linux and Microsoft Windows for now, and for
the power-of-2 block sizes between 64 and 4096 bytes (the minimum and
maximum size of a checkpoint block). If the block size is anything else,
the traditional 512-byte size will be used via normal file system
buffering.

If the file system buffers can be bypassed, a message like the following
will be issued:

InnoDB: File system buffers for log disabled (block size=512 bytes)
InnoDB: File system buffers for log disabled (block size=4096 bytes)

This has been tested on Linux and Microsoft Windows with both sizes.

On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC.
Tests in 3 different environments where the log is stored in a device
with a physical block size of 512 bytes are yielding better throughput
without O_DIRECT. This could be due to the fact that in the event the
last log block is being overwritten (if multiple transactions would
become durable at the same time, and each of will write a small
number of bytes to the last log block), it should be faster to re-copy
data from log_sys.buf or log_sys.flush_buf to the kernel buffer,
to be finally written at fdatasync() time.

The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for
data files. This option will enable O_DIRECT on the log file on Linux.
It may be unsafe to use when the storage device does not support
FUA (Force Unit Access) mode.

When the server is compiled WITH_PMEM=ON, we will use memory-mapped
I/O for the log file if the log resides on a "mount -o dax" device.
We will identify PMEM in a start-up message:

InnoDB: log sequence number 0 (memory-mapped); transaction id 3

On Linux, we will also invoke mmap() on any ib_logfile0 that resides
in /dev/shm, effectively treating the log file as persistent memory.
This should speed up "./mtr --mem" and increase the test coverage of
PMEM on non-PMEM hardware. It also allows users to estimate how much
the performance would be improved by installing persistent memory.
On other tmpfs file systems such as /run, we will not use mmap().

mariadb-backup: Eliminated several variables. We will refer
directly to recv_sys and log_sys.

backup_wait_for_lsn(): Detect non-progress of
xtrabackup_copy_logfile(). In this new log format with
arbitrary-sized blocks, we can only detect log file overrun
indirectly, by observing that the scanned log sequence number
is not advancing.

xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit,
because we are not allowed to modify the server's log file, and our
memory mapping is read-only.

trx_flush_log_if_needed_low(): Do not use the callback on pmem.
Using neither flush_lock nor write_lock around PMEM writes seems
to yield the best performance. The pmem_persist() calls may
still be somewhat slower than the pwrite() and fdatasync() based
interface (PMEM mounted without -o dax).

recv_sys_t::buf: Remove. We will use log_sys.buf for parsing.

recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE.

recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn.

recv_sys_t, log_sys_t: Removed many data members.

recv_sys.lsn: Renamed from recv_sys.recovered_lsn.
recv_sys.offset: Renamed from recv_sys.recovered_offset.
log_sys.buf_size: Replaces srv_log_buffer_size.

recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset]
when the buffer is being allocated from the memory heap.

recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is
backed by ib_logfile0. The pointer will wrap from recv_sys.len
(log_sys.file_size) to log_sys.START_OFFSET. For the record that
wraps around, we may copy file name or record payload data to
the auxiliary buffer decrypt_buf in order to have a contiguous
block of memory. The maximum size of a record is less than
innodb_page_size bytes.

recv_sys_t::parse(): Take the smart pointer as a template parameter.
Do not temporarily add a trailing NUL byte to FILE_ records, because
we are not supposed to modify the memory-mapped log file. (It is
attached in read-write mode already during recovery.)

recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse().

recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be
returned on PMEM, use recv_ring to wrap around the buffer to the start.

mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free
on PMEM, because it has no meaning on the mmap-based log.

log_sys.write_to_buf: Count writes to log_sys.buf. Replaces
srv_stats.log_write_requests and export_vars.innodb_log_write_requests.
Protected by log_sys.mutex. Updated consistently in log_close().
Previously, mtr_t::commit() conditionally updated the count,
which was inconsistent.

log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf,
for writing to log_sys.log (the ib_logfile0). Replaces
srv_stats.log_writes and export_vars.innodb_log_writes.
Protected by log_sys.mutex.

log_sys.waits: Count waits in append_prepare(). Replaces
srv_stats.log_waits and export_vars.innodb_log_waits.

recv_recover_page(): Do not unnecessarily acquire
log_sys.flush_order_mutex. We are inserting the blocks in arbitary
order anyway, to be adjusted in recv_sys.apply(true).

We will change the definition of flush_lock and write_lock to
avoid potential false sharing. Depending on sizeof(log_sys) and
CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could
share a cache line with each other or with the last data members
of log_sys.

Thanks to Matthias Leich for providing https://rr-project.org traces
for various failures during the development, and to
Thirunarayanan Balathandayuthapani for his help in debugging
some of the recovery code. And thanks to the developers of the
rr debugger for a tool without which extensive changes to InnoDB
would be very challenging to get right.

Thanks to Vladislav Vaintroub for useful feedback and
to him, Axel Schwenke and Krunal Bauskar for testing the performance.
2022-01-21 16:03:47 +02:00
Vladislav Vaintroub
e222e44d1b Merge branch 'preview-10.8-MDEV-26713-Windows-i18-support' into 10.8 2022-01-18 21:37:52 +01:00
Jan Lindström
f43ef9ba3a MDEV-25977 : Warning: Memory not freed: 32 on SET GLOBAL wsrep_sst_auth=USER
Add missing wsrep_sst_auth_free call.
2022-01-18 07:10:48 +02:00
Marko Mäkelä
7dfaded962 Merge 10.6 into 10.7 2022-01-04 09:55:58 +02:00
Marko Mäkelä
3f5726768f Merge 10.5 into 10.6 2022-01-04 09:26:38 +02:00
Marko Mäkelä
c9db50b585 Merge 10.4 into 10.5 2022-01-03 07:23:49 +02:00
Monty
a48d2ec866 Add --valgrind to VERSION() string for valgrind builds
Fixes main.sp-no-valgrind for valgrind builds not done with BUILD scripts
2021-12-28 16:37:14 +02:00
Sergei Golubchik
89a0364fc8 MDEV-27304 SHOW ... result columns are right-aligned
--version=value was setting sys_var::CONFIG (meaning, the value
came from the config file), but the filename was left as NULL.
2021-12-27 13:28:25 +01:00
Julius Goryavsky
55bb933a88 Merge branch 10.4 into 10.5 2021-12-26 12:51:04 +01:00
Julius Goryavsky
681b7784b6 Merge branch 10.3 into 10.4 2021-12-25 12:13:03 +01:00
Julius Goryavsky
3376668ca8 Merge branch 10.2 into 10.3 2021-12-23 14:14:04 +01:00
Vladislav Vaintroub
ea0a5cb0a4 MDEV-27092 Windows - services that have non-ASCII characters do not work with activeCodePage=UTF8
CreateServiceA, OpenServiceA, and couple of other functions do not work
correctly with non-ASCII character, in the special case where application
has defined activeCodePage=UTF8.

Workaround by redefining affected ANSI functions to own wrapper, which
works by converting narrow(ANSI) to wide, then calling wide function.

Deprecate original ANSI service functions, via declspec, so that we can catch
their use.
2021-12-15 19:13:57 +01:00
Sergei Golubchik
0745db7179 don't use buffered_option_error_reporter without perfschema
it's not printed, not cleaned up without perfschema,
so isn't supposed to be written into either

this fixes "Memory not freed" warnings when early command line
options produce warnings in non-perfschema builds
2021-12-10 09:40:50 +01:00
Marko Mäkelä
7e8a13d9d7 Merge 10.6 into 10.7 2021-11-19 17:45:52 +02:00
Marko Mäkelä
dc8def73f7 Merge 10.5 into 10.6 2021-11-16 16:30:45 +02:00
Vladislav Vaintroub
72afeaf8a5 MDEV-18439 Windows builds should have core_file=1 by default
reapply patch 96b472c0ae
It was not merged correctly into 10.5+
2021-11-13 10:31:46 +01:00
Marko Mäkelä
06988bdcaa Merge 10.6 into 10.7 2021-11-09 09:40:29 +02:00
Marko Mäkelä
25ac047baf Merge 10.5 into 10.6 2021-11-09 09:11:50 +02:00
Marko Mäkelä
9c18b96603 Merge 10.4 into 10.5 2021-11-09 08:50:33 +02:00
Marko Mäkelä
47ab793d71 Merge 10.3 into 10.4 2021-11-09 08:40:14 +02:00