In resolving MDEV-33301 (76a27155b4) we
moved all the capabilities from CapabilityBoundingSet to AmbientCapabilities
where only add/moving CAP_IPC_LOCK was intended.
The effect of this is the defaulting running MariaDB HAS the capabiltiy
CAP_DAC_OVERRIDE CAP_AUDIT_WRITE allowing it to access any file,
even while running as a non-root user.
Resolve this by making CAP_IPC_LOCK apply to AmbientCapabilities and
leave the remaining CAP_DAC_OVERRIDE CAP_AUDIT_WRITE to CapabilityBoundingSet
for the use by auth_pam_tool.
Per https://github.com/systemd/systemd/issues/36529 OOM counts
as a on-abnormal condition. To ensure that MariaDB testart on
OOM the Restart is changes to on-abnormal which an extension
on the current on-abort condition.
.. even with MDEV-9095 fix
CapabilityBounding sets require filesystem setcap attributes
for the executable to gain privileges during execution.
A side effect of this however is the getauxvec(AT_SECURE) gets
set, and the secure_getenv from OpenSSL internals on
OPENSSL_CONF environment variable will get ignored (openssl gh issue
21770).
According to capabilities(7), Ambient capabilities don't trigger
ld.so triggering the secure execution mode.
Include SELinux and Apparmor capabilities for ipc_lock
Originally requested to be infinity, but rolled back to 99%
to allow for a remote ssh connection or the odd needed system
job. This is up from 15% which is the effective default of
DefaultTasksMax.
Thanks Rick Pizzi for the bug report.
Add SYSTEMD_READWRITEPATH-variable to mariadb{@,}.service.in to make sure that
if one is not building RPM or DEB packages then make sure there is ReadWritePaths
directive is defined in systemd service file.
This ensures that tar-ball installation has permissions to write database default
installation path (default: /usr/local/mysql/data) even if it's located
under /usr. Writing to that location is prevented by 'ProtectSystem=full'
systemd directive by default.
Prefixing the path with "-" in systemd causes there to not be an error if the
path doesn't exist. This may occur if the user has configured a datadir
elsewhere.
Reviewer: Daniel Black
Quoting MDEV reporter Daniel Lewart:
Starting MariaDB with default configuration causes the following problems:
"[Warning] Could not increase number of max_open_files to more than 16384 (request: 32186)"
silently reduces table_open_cache_instances from 8 (default) to 4
Default Server System Variables:
extra_max_connections = 1
max_connections = 151
table_open_cache = 2000
table_open_cache_instances = 8
thread_pool_size = 4
LimitNOFILE=16834 is in the following files:
support-files/mariadb.service.in
support-files/mariadb@.service.in
Looking at sql/mysqld.cc lines 3837-3917:
wanted_files= (extra_files + max_connections + extra_max_connections +
tc_size * 2 * tc_instances);
wanted_files+= threadpool_size;
Plugging in the default values:
wanted_files = (30 + 151 + 1 + 2000 * 2 * 8 + 4) = 32186
However, systemd configuration has LimitNOFILE = 16384, which is far smaller.
I suggest increasing LimitNOFILE to 32768.
liburing is a new optional dependency (WITH_URING=auto|yes|no)
that replaces libaio when it is available.
aio_uring: class which wraps io_uring stuff
aio_uring::bind()/unbind(): optional optimization
aio_uring::submit_io(): mutex prevents data race. liburing calls are
thread-unsafe. But if you look into it's implementation you'll see
atomic operations. They're used for synchronization between kernel and
user-space only. That's why our own synchronization is still needed.
For systemd, we add LimitMEMLOCK=524288 (ulimit -l 524288)
because the io_uring_setup system call that is invoked
by io_uring_queue_init() requests locked memory. The value
was found empirically; with 262144, we would occasionally
fail to enable io_uring when using the maximum values of
innodb_read_io_threads=64 and innodb_write_io_threads=64.
aio_uring::thread_routine(): Tolerate -EINTR return from
io_uring_wait_cqe(), because it may occur on shutdown
on Ubuntu 20.10 (Groovy Gorilla).
This was mostly implemented by Eugene Kosov. Systemd integration
and improved startup/shutdown error handling by Marko Mäkelä.
Replace all references to /usr/sbin/mysqld (and bin and libexec) with
mariadbd, so that the binary server will always be 'mariadbd'.
Also update all places that reference the server binary in other ways,
such as AppArmor profiles and scripts that previously expected to find
a 'mysqld' in process lists.
When trying to start mariadb via systemctl, WSREP failed
to start mysqld for wsrep recovery, because the binary
"galera-recovery" is neither searching the mysqld in the
same folder as the binary itself nor in the path variable
but instead expects the root to be /usr/local/mysql.
This fix changes the current directory to the desired
directory before starting mysqld.
The arg was introduced as part of 75bcf1f9ad
to fix a SELinux problem caused by mysqld_safe accessing files it should
not be via the my_which function.
The root cause for this was fixed in 10.3, via
355ee6877b which eliminated the my_which
function from mysqld_safe entirely. Thus, in 10.3, this --basedir flag
is not necessary.
The unit files made systemd print:
systemd[1]: Started MariaDB 10.3.13 database server (multi-instance).
Let's add the instance name, so starting mariadb@foo.service
makes it print:
systemd[1]: Started MariaDB 10.3.13 database server (multi-instance foo).
Include comment header that describes overrides.
Unit description now includes @VERSION@.
After=syslog.target removed - redunant
Add --basedir=@prefix to prevent /root/.my.cnf lookups. This is
placed after $MYSQLD_OPTIONS in case a user sets a --{no,}default
type options which has to be first in the mysqld arguements.
Additional changes to multi instance (support-files/mariadb@.service.in):
* added @SYSTEMD_EXECSTARTPRE@ / @SYSTEMD_EXECSTARTPOST@
* removed mariadb@bootstrap reference as galera_new_cluster as
it's a little too proment.
* use_galera_new_cluster.conf updated to override pre/post steps
to ensure it has no side effects
Signed-off-by: Daniel Black <daniel@linux.vnet.ibm.com>
* wait() for the child process to die, let it rest in peace
* fix incorrect parentheses
* if there was no password on the command line or in .cnf file,
pkt will be "", and we need to request the user to enter the password
* make sure that auth->salt is always allocated on a permanent memroot.
when called from set_user_salt_if_needed(), user_copy and its auth_str
are on the thd memroot, but auth_copy->salt is then copied to auth->salt
* adjust service files so that systemd wouldn't interfere with our
setuid executables
also
* print the pam error message in debug mode
By removing Galera functionality, we remove PermissionsStartOnly=true
and hence make this service more flexible for running multiple
instances each on a different user.