allow to set what subsystems are used by
libcontainer/cgroups/fs.Manager.
subsystemsUnified is used on a system running with cgroups v2 unified
mode.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Transient units (and transient slice units) have been available for quite a
long time and RHEL 7 with systemd v219 (likely the oldest OS we care about at
this point) supports that. A system running a systemd without these features is
likely to break a lot of other stuff that runc/libcontainer care about.
Regarding delegated slices, modern systemd doesn't allow it and
runc/libcontainer run fine on it, so we might as well just stop requesting it
on older versions of systemd which allowed it. (Those versions never really
changed behavior significantly when that option was passed anyways.)
Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
This dependency is only needed in package "github.com/coreos/go-systemd/util"
and we only use it for IsRunningSystemd(), which is a simple Go function that
just stats a file.
Let's just borrow it here, so we remove the dependency and can remove that
package from vendored build.
This also removes dependencies on dlopen and on trying to find libsystemd.so
or libsystemd-login.so in the system.
Tested that this still builds and works as expected.
Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
`Init` on the `Process` struct specifies whether the process is the first process in the container. This needs to be set to `true` when running the container.
Signed-off-by: Andreas Stocker <astocker@anexia-it.com>
when exec as root and config.Cwd is not owned by root, exec will fail
because root doesn't have the caps.
So, Chdir should be done before setting the caps.
Signed-off-by: Kurnia D Win <kurnia.d.win@gmail.com>
This will permit us to extend the internals of systemd.Manager to include
further information about the system, such as whether cgroupv1, cgroupv2 or
both are in effect.
Furthermore, it allows a future refactor of moving more of UseSystemd() code
into the factory initialization function.
Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
In the exception handling of initProcess.start(), we need to add the
missing IntelRdtManager.Destroy() handler in defer func.
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
We discovered in umoci that setting a dummy type of "none" would result
in file-based bind-mounts no longer working properly, which is caused by
a restriction for when specconv will change the device type to "bind" to
work around rootfs_linux.go's ... issues.
However, bind-mounts don't have a type (and Linux will ignore any type
specifier you give it) because the type is copied from the source of the
bind-mount. So we should always overwrite it to avoid user confusion.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Refactor configuring logging into a reusable component
so that it can be nicely used in both main() and init process init()
Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Claudia Beresford <cberesford@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
Add support for children processes logging (including nsexec).
A pipe is used to send logs from children to parent in JSON.
The JSON format used is the same used by logrus JSON formatted,
i.e. children process can use standard logrus APIs.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
Whenever processes are spawned using nsexec, a zombie runc:[1:CHILD]
process will always be created and will need to be reaped by the parent
Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>
secure_getenv is a Glibc extension and so this code does not compile
on Musl libc any more after this patch.
secure_getenv is only intended to be used in setuid binaries, in
order that they should not trust their environment. It simply returns
NULL if the binary is running setuid. If runc was installed setuid,
the user can already do anything as root, so it is game over, so this
check is not needed.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Work is ongoing in the kernel to support different kernel
keyrings per user namespace. We want to allow SELinux to manage
kernel keyrings inside of the container.
Currently when runc creates the kernel keyring it gets the label which runc is
running with ususally `container_runtime_t`, with this change the kernel keyring
will be labeled with the container process label container_t:s0:C1,c2.
Container running as container_t:s0:c1,c2 can manage keyrings with the same label.
This change required a revendoring or the SELinux go bindings.
github.com/opencontainers/selinux.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>