The uring_ member of the aio_uring class needed to be
defined to make MSAN happy.
To make aio_uring consistent with aio_libaio in the 10.11 branch,
"using namespace tpool" was added and the tpool:: scope qualifiers
were removed. In 10.6, aio_uring is defined in the tpool namespace;
however, changing that would just cause merge conflicts.
When io_setup() fails, it is a serious issue, normally because
fs.aio-max-nr is configured too low in the kernel.
We adjust the error message to be a Warning because, like
"native AIO failed: falling back to innodb_use_native_aio=OFF",
it is user actionable.
A default configuration of the server, even after raising
innodb_write_io_threads and innodb_read_io_threads, could not
exceed the default fs.aio-max-nr value. If a user is
constructing multiple instances of MariaDB that exceed this
value, then they should be seeing the warning and taking action.
There are CI environments, as Otto points out on Launchpad,
that have insufficient fs.aio-max-nr configured to run mtr
in parallel. This, however, is a genuine distro problem
to resolve.
For us, and our developers, we'd rather see the warning
so we can fix CI and dev instances that are insufficiently
configured.
The io_setup man page has a very short but descriptive set of
causes for io_setup failures. It is safer to refer to this,
now with a strerror description rather than a number.
The mtr suppressions are removed because the errors were moved
out of InnoDB a while ago, and since a recent change the server
could never generate a warning of these forms.
A read through liburing has its output buffer filled as defined
by the nature of the system call.
Instrumenting the library proved difficult, as the completion
events don't include the opcode or pointers that were there on
submission. It was just easier to do this in the application code.
As noted in commit c36834c832,
the MemorySanitizer-instrumented builds so far only work with
the synchronous I/O interface (innodb_use_native_aio=OFF).
It is not that hard to make WITH_MSAN=ON work with the
Linux libaio, without even instrumenting that library itself.
aio_linux::getevent_thread_routine(): Declare the buffer that
is returned by my_getevents() as initialized. Declare
the data returned by a successful aio_opcode::AIO_PREAD as initialized.
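A minimal sketch of the idea, assuming the MemorySanitizer interface
header is available (the function name below is illustrative, not the
actual MariaDB code): after an uninstrumented read completes, the
destination memory is explicitly marked as initialized so that MSAN
does not report false positives on data the kernel wrote.

    #include <sanitizer/msan_interface.h>  /* __msan_unpoison() */
    #include <sys/types.h>
    #include <cstddef>

    /* Hypothetical completion hook: 'buffer' was filled by a syscall
       issued through an uninstrumented library, so MSAN still
       considers it poisoned. */
    static void mark_read_buffer_initialized(void *buffer,
                                             ssize_t bytes_read)
    {
      if (bytes_read > 0)
        __msan_unpoison(buffer, static_cast<size_t>(bytes_read));
    }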
Reviewed by: Daniel Black
This controls which Linux implementation to use for
innodb_use_native_aio=ON.
innodb_linux_aio=auto is equivalent to innodb_linux_aio=io_uring when
it is available, falling back to innodb_linux_aio=aio when not.
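For illustration, an explicit setting could look like this in the
configuration file (the option values are those described above):

    [mariadbd]
    innodb_use_native_aio = ON
    innodb_linux_aio      = auto   # io_uring if available, else aio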
Debian packaging is no longer exclusively aio or uring, so
for those older Debian or Ubuntu releases, it's a remove_uring
directive. For more recent releases, liburing is added as a
mandatory dependency for consistent packaging.
WITH_LIBAIO is now an option independent of WITH_URING.
The LINUX_NATIVE_AIO preprocessor constant is renamed to HAVE_LIBAIO,
analogous to the existing HAVE_URING.
tpool::is_aio_supported(): A common feature check.
is_linux_native_aio_supported(): Remove. This had originally been added in
mysql/mysql-server@0da310b69d in 2012
to fix an issue where io_submit() on CentOS 5.5 would return EINVAL
for a /tmp/#sql*.ibd file associated with CREATE TEMPORARY TABLE.
But, starting with commit 2e814d4702 InnoDB
temporary tables will be written to innodb_temp_data_file_path.
The 2012 commit said that the error could occur on "old kernels".
Any GNU/Linux distribution that we currently support should be based
on a newer Linux kernel; for example, Red Hat Enterprise Linux 7
was released in 2014.
tpool::create_linux_aio(): Wraps the Linux implementations:
create_libaio() and create_liburing(), each defined in separate
compilation units (aio_linux.cc, aio_libaio.cc, aio_liburing.cc).
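A rough sketch of how such a wrapper can dispatch between the two
implementations; the exact signature and fallback behaviour here are
assumptions for illustration, not the actual tpool code:

    namespace tpool
    {
      aio *create_linux_aio(thread_pool *pool, int max_io)
      {
    #ifdef HAVE_URING
        /* Prefer io_uring when it was built in and can be set up. */
        if (aio *ret= create_liburing(pool, max_io))
          return ret;
    #endif
    #ifdef HAVE_LIBAIO
        return create_libaio(pool, max_io);
    #else
        return nullptr;  /* no native AIO implementation built in */
    #endif
      }
    }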
The CMake definitions are simplified using target_sources() and
target_compile_definitions(), all available since CMake 2.8.12.
With this change, there is no need to include ${CMAKE_SOURCE_DIR}/tpool
or to add TPOOL_DEFINES flags anymore; target_link_libraries(lib tpool)
does all that.
This is joint work with Daniel Black and Vladislav Vaintroub.
As noted by Jens Axboe, errno isn't set; the error is returned
by the io_uring_queue_init function.
As such, users were getting the following as the common case
when EPERM was actually the error:
"mariadbd: io_uring_queue_init() failed with errno 0".
Add the correct relevant information around EPERM to the error message.
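io_uring_queue_init() returns 0 on success and a negative errno value
on failure, so the check can look roughly like this sketch (the
message wording is illustrative, not the exact server output):

    #include <liburing.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    static bool init_ring(io_uring *ring, unsigned entries)
    {
      int ret= io_uring_queue_init(entries, ring, 0);
      if (ret != 0)
      {
        /* ret is a negative errno value; errno itself is unchanged. */
        fprintf(stderr, "io_uring_queue_init() failed with %s\n",
                strerror(-ret));
        if (ret == -EPERM)
          fprintf(stderr, "io_uring may be disabled by a sysctl or by "
                  "a seccomp/container policy\n");
        return false;
      }
      return true;
    }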
log_file_t::read(), log_file_t::write(): Invoke pread() or pwrite()
directly, so that we can give more accurate diagnostics in case of
a failure, and so that we will avoid the overhead of setting up 5(!)
stack frames and related objects.
tpool::pwrite(): Add a missing const qualifier.
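A minimal sketch of the direct approach, assuming a hypothetical
read_exact() helper (not the actual log_file_t code): the exact errno
and byte counts are available for diagnostics at the point of failure.

    #include <unistd.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    /* Read exactly 'len' bytes at 'offset', retrying on EINTR and on
       partial reads, and report a precise error on failure. */
    static bool read_exact(int fd, void *buf, size_t len, off_t offset)
    {
      char *p= static_cast<char*>(buf);
      while (len)
      {
        ssize_t n= pread(fd, p, len, offset);
        if (n > 0)
        {
          p+= n; len-= n; offset+= n;
          continue;
        }
        if (n == -1 && errno == EINTR)
          continue;
        fprintf(stderr, "pread(fd=%d, len=%zu, offset=%lld) failed: %s\n",
                fd, len, static_cast<long long>(offset),
                n == 0 ? "unexpected end of file" : strerror(errno));
        return false;
      }
      return true;
    }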
This is a server hang and not an issue with backup. While concurrent
DDLs in the server get into a hanged state, mariabackup waits for the
DDLs to finish, trying to acquire MDL_BACKUP_BLOCK_DDL.
The server hang is serious in nature and is caused by the thread pool
state being incorrectly set to thread creation pending while no
creation is actually pending. Once a thread pool reaches such a state,
no new thread gets created in the pool.
While it could possibly affect all thread pools in the server, the
InnoDB thread pool is the victim in the current bug, where an IO job
gets blocked when the pool is stuck with far fewer threads than
intended.
Available workers are blocked in purge waiting for a page lock to be
released by an IO write (SX lock), causing a complete deadlock.
The issue is caused by the state variable m_thread_creation_pending
introduced by MDEV-31095: 9e62ab7aaf. We check and set the variable
early while attempting to create a new thread in the pool, but fail to
reset it if we exit the flow for other reasons, like the maximum number
of threads being reached, or getting into the thread creation
throttling path.
Fix: The simple fix is to make sure that the state is reset back in case
we don't actually attempt to create the thread.
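A hedged sketch of the pattern behind the fix (only
m_thread_creation_pending is a real name from this change; the
surrounding function and helpers are illustrative): every path that
bails out before actually starting a thread must clear the flag it
set optimistically.

    /* Illustrative sketch, not the actual tpool code. */
    bool thread_pool::maybe_create_thread()
    {
      /* Set optimistically, before we know whether a thread will
         really be created. */
      m_thread_creation_pending= true;

      if (thread_count() >= max_threads() || creation_is_throttled())
      {
        /* Before the fix, these early exits left the flag set, so the
           pool believed a creation was pending forever and never
           started another worker. */
        m_thread_creation_pending= false;
        return false;
      }
      return start_new_thread();  /* clears the flag once the thread runs */
    }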
The previous solution, which would switch the timer off entirely,
turned out to be deadlock prone.
This patch fixes the previous attempt to switch between long/short
interval periods in MDEV-24295. Now, the initial state of the timer is
fixed (it is ON).
Also, avoid switching the timer to longer periods if there is any
activity in the pool.
Before the patch, the maintenance timer would tick every 0.4 seconds.
After this patch, the timer will tick every 0.4 seconds when necessary
(there is delayed thread creation), switching off completely after 20
seconds of being idle.
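The intended behaviour can be sketched roughly as follows; the 0.4 s
and 20 s values come from the description above, everything else
(names, structure) is illustrative:

    /* Illustrative pseudo-logic of the maintenance timer. */
    void maintenance_timer_tick()
    {
      if (thread_creation_was_delayed())
      {
        last_activity= now();
        create_delayed_threads();
        schedule_next_tick(/* ms= */ 400);
      }
      else if (now() - last_activity < 20000 /* ms */)
        schedule_next_tick(/* ms= */ 400);
      /* else: idle for 20 seconds, stop ticking until new activity
         switches the timer back on */
    }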
When the constant OS_AIO_N_PENDING_IOS_PER_THREAD is changed from 256 to 1
and the server is run with the minimum parameters
innodb_read_io_threads=1 and innodb_write_io_threads=2, two hangs
were observed.
tpool::cache<T>::put(T*): Ensure that get() in io_slots::acquire()
will be woken up when the cache previously was empty.
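The essence can be sketched with a bounded cache protected by a mutex
and a condition variable (a simplified sketch, not the actual
tpool::cache): the notification in put() matters exactly when the
cache was previously empty, because that is when a get() may be
blocked.

    #include <condition_variable>
    #include <mutex>
    #include <utility>
    #include <vector>

    template <typename T> class simple_cache
    {
      std::mutex m_mtx;
      std::condition_variable m_cv;
      std::vector<T*> m_free;
    public:
      explicit simple_cache(std::vector<T*> slots)
        : m_free(std::move(slots)) {}
      T *get()
      {
        std::unique_lock<std::mutex> lk(m_mtx);
        m_cv.wait(lk, [this]{ return !m_free.empty(); });
        T *slot= m_free.back();
        m_free.pop_back();
        return slot;
      }
      void put(T *slot)
      {
        std::lock_guard<std::mutex> lk(m_mtx);
        const bool was_empty= m_free.empty();
        m_free.push_back(slot);
        if (was_empty)
          m_cv.notify_one();  /* wake a get() waiting on an empty cache */
      }
    };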
buf_pool_t::io_buf_t::reserve(): Schedule a possibly partial doublewrite
batch so that os_aio_wait_until_no_pending_writes() has a chance of
returning. Add a Boolean parameter and pass wait_for_reads=false inside
buf_page_decrypt_after_read(), because those calls will be executed
inside a read completion callback, and therefore
os_aio_wait_until_no_pending_reads() would block indefinitely.
The MDEV-29693 conflict resolution is from Monty, as is
a bug fix where ANALYZE TABLE wrongly built histograms for a
single-column PRIMARY KEY.
Also includes a fix for safe_malloc error reporting.
Other things:
- Copied main.log_slow from 10.4 to avoid an mtr issue
Disabled test:
- spider/bugfix.mdev_27239 because we started to get
+Error 1429 Unable to connect to foreign data source: localhost
-Error 1158 Got an error reading communication packets
- main.delayed
- Bug#54332 Deadlock with two connections doing LOCK TABLE+INSERT DELAYED
This part is disabled for now as it fails randomly with different
warnings/errors (no corruption).
Add threadpool functionality to restrict concurrency during "batch"
periods (where tasks are added in rapid succession).
This will throttle thread creation more aggressively than usual, while
keeping performance at least on par.
One of these cases is buffer pool load, where async read IOs are
executed without any throttling. There can be as many as 650K read IOs
for loading a 10GB buffer pool.
Another one is recovery, where "fake read" IOs are executed.
Why are there more threads than we expect?
Worker threads are not recognized as idle until they return to the
standby list, and to return to that list, they need to acquire a
mutex currently held in submit_task(). In those cases, submit_task()
has no worker to wake, and would create threads until the default
concurrency level (2*ncpus) is satisfied. Only after that would
throttling happen.
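A crude illustration of the kind of heuristic this describes (all
names and thresholds below are invented for the sketch, not the actual
tpool logic): during a rapid-submission "batch", thread creation is
capped well below the usual 2*ncpus limit.

    #include <chrono>

    struct batch_throttle
    {
      using clock= std::chrono::steady_clock;
      clock::time_point last_submit{};
      unsigned active_threads= 0, batch_cap= 4, ncpus= 8;

      /* Decide whether submit_task() may create another worker. */
      bool may_create_thread()
      {
        const clock::time_point t= clock::now();
        const bool batch=
          (t - last_submit) < std::chrono::microseconds(100);
        last_submit= t;
        if (batch)
          return active_threads < batch_cap;   /* throttle hard */
        return active_threads < 2 * ncpus;     /* normal limit */
      }
    };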
tpool::cache::m_mtx: Add PERFORMANCE_SCHEMA instrumentation
(wait/synch/mutex/innodb/tpool_cache_mutex). This covers the
InnoDB read_slots and write_slots for asynchronous data page I/O.
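A hedged sketch of the usual PERFORMANCE_SCHEMA registration pattern
for such a mutex (the key and variable names are illustrative; the
actual change instruments the mutex inside tpool::cache):

    #include <mysql/psi/mysql_thread.h>

    static PSI_mutex_key key_tpool_cache_mutex;
    static PSI_mutex_info tpool_cache_mutex_info[]=
    {
      { &key_tpool_cache_mutex, "tpool_cache_mutex", 0 }
    };

    static mysql_mutex_t cache_mutex;

    static void init_instrumented_cache_mutex()
    {
      mysql_mutex_register("innodb", tpool_cache_mutex_info, 1);
      mysql_mutex_init(key_tpool_cache_mutex, &cache_mutex,
                       MY_MUTEX_INIT_FAST);
    }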