opencontainers/runc

mirror of https://github.com/opencontainers/runc.git synced 2025-07-08 18:21:57 +03:00

Author	SHA1	Message	Date
Sohan Kunkerkar	cde1d0908a	libcontainer: force apps to think fips is enabled/disabled for testing The motivation behind this change is to provide a flexible mechanism for containers within a Kubernetes cluster to opt out of FIPS mode when necessary. This change enables apps to simulate FIPS mode being enabled or disabled for testing purposes. Users can control whether apps believe FIPS mode is on or off by manipulating `/proc/sys/crypto/fips_enabled`. Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>	2024-04-10 18:58:34 -04:00
Aleksa Sarai	cdff09ab87	rootfs: fix 'can we mount on top of /proc' check Our previous test for whether we can mount on top of /proc incorrectly assumed that it would only be called with bind-mount sources. This meant that having a non bind-mount entry for a pseudo-filesystem (like overlayfs) with a dummy source set to /proc on the host would let you bypass the check, which could easily lead to security issues. In addition, the check should be applied more uniformly to all mount types, so fix that as well. And add some tests for some of the tricky cases to make sure we protect against them properly. Fixes: `331692baa7` ("Only allow proc mount if it is procfs") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-14 11:36:42 +11:00
Aleksa Sarai	8e8b136c49	tree-wide: use /proc/thread-self for thread-local state With the idmap work, we will have a tainted Go thread in our thread-group that has a different mount namespace to the other threads. It seems that (due to some bad luck) the Go scheduler tends to make this thread the thread-group leader in our tests, which results in very baffling failures where /proc/self/mountinfo produces gibberish results. In order to avoid this, switch to using /proc/thread-self for everything that is thread-local. This primarily includes switching all file descriptor paths (CLONE_FS), all of the places that check the current cgroup (technically we never will run a single runc thread in a separate cgroup, but better to be safe than sorry), and the aforementioned mountinfo code. We don't need to do anything for the following because the results we need aren't thread-local: * Checks that certain namespaces are supported by stat(2)ing /proc/self/ns/... * /proc/self/exe and /proc/self/cmdline are not thread-local. * While threads can be in different cgroups, we do not do this for the runc binary (or libcontainer) and thus we do not need to switch to the thread-local version of /proc/self/cgroups. * All of the CLONE_NEWUSER files are not thread-local because you cannot set the usernamespace of a single thread (setns(CLONE_NEWUSER) is blocked for multi-threaded programs). Note that we have to use runtime.LockOSThread when we have an open handle to a tid-specific procfs file that we are operating on multiple times. Go can reschedule us such that we are running on a different thread and then kill the original thread (causing -ENOENT or similarly confusing errors). This is not strictly necessary for most usages of /proc/thread-self (such as using /proc/thread-self/fd/$n directly) since only operating on the actual inodes associated with the tid requires this locking, but because of the pre-3.17 fallback for CentOS, we have to do this in most cases. In addition, CentOS's kernel is too old for /proc/thread-self, which requires us to emulate it -- however in rootfs_linux.go, we are in the container pid namespace but /proc is the host's procfs. This leads to the incredibly frustrating situation where there is no way (on pre-4.1 Linux) to figure out which /proc/self/task/... entry refers to the current tid. We can just use /proc/self in this case. Yes this is all pretty ugly. I also wish it wasn't necessary. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-14 11:36:41 +11:00
Irwin D'Souza	b76b6b9338	Allow mounting of /proc/sys/kernel/ns_last_pid The CAP_CHECKPOINT_RESTORE linux capability provides the ability to update /proc/sys/kernel/ns_last_pid. However, because this file is under /proc, and by default both K8s and CRI-O specify that /proc/sys should be mounted as Read-Only, by default even with the capability specified, a process will not be able to write to ns_last_pid. To get around this, a pod author can specify a volume mount and a hostpath to bind-mount /proc/sys/kernel/ns_last_pid. However, runc does not allow specifying mounts under /proc. This commit adds /proc/sys/kernel/ns_last_pid to the validProcMounts string array to enable a pod author to mount ns_last_pid as read-write. The default remains unchanged; unless explicitly requested as a volume mount, ns_last_pid will remain read-only regardless of whether or not CAP_CHECKPOINT_RESTORE is specified. Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>	2022-04-07 14:08:59 -04:00
Kir Kolyshkin	9ff64c3d97	*: rm redundant linux build tag For files that end with _linux.go or _linux_test.go, there is no need to specify linux build tag, as it is assumed from the file name. In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go for the file name to make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-30 20:15:00 -07:00
Michael Crosby	331692baa7	Only allow proc mount if it is procfs Fixes #2128 This allows proc to be bind mounted for host and rootless namespace usecases but it removes the ability to mount over the top of proc with a directory. ```bash > sudo docker run --rm apparmor docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/volumes/aae28ea068c33d60e64d1a75916cf3ec2dc3634f97571854c9ed30c8401460c1/_data\\\" to rootfs \\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged\\\" at \\\"/proc\\\" caused \\\"\\\\\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged/proc\\\\\\\" cannot be mounted because it is not of type proc\\\"\"": unknown. > sudo docker run --rm -v /proc:/proc apparmor docker-default (enforce) root 18989 0.9 0.0 1288 4 ? Ss 16:47 0:00 sleep 20 ``` Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-09-24 11:00:18 -04:00
Giuseppe Scrivano	636b664027	linux: drop check for /proc as invalid dest It is now allowed to bind-mount /proc. This is useful for rootless containers when the PID namespace is shared with the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-08-30 09:56:18 +02:00
Michael Crosby	70b16a5ab9	Remove check for binding to / In order to mount root filesystems inside the container's mount namespace as part of the spec we need to have the ability to do a bind mount to / as the destination. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-09-29 15:26:09 -07:00
Akihiro Suda	42234a85d1	Fix setupDev logic in rootfs_linux.go setupDev was introduced in #96, but broken since #536 because spec 0.3.0 introduced default devices. Fix #80 again Fix docker/docker#21808 Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com> Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2016-04-11 10:29:40 -07:00
Michael Crosby	8f97d39dd2	Move libcontainer into subdirectory Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:15 -07:00

10 Commits
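
Taken together, the commits above describe a destination check for mounts at or under /proc: /proc itself may only be covered by a real procfs, and paths below /proc are rejected unless they are explicitly allowed. The Go sketch below illustrates that idea only. `checkProcMount`, `allowedProcDests`, and the `sourceIsProcfs` flag are hypothetical names chosen for this example and are not runc's actual API; the only allow-list entry shown is the one named in the log (`/proc/sys/kernel/ns_last_pid`), and the real list may contain others. The sketch also assumes the caller has already determined the filesystem type of the mount source.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// allowedProcDests is an illustrative allow-list (hypothetical name).
// Only ns_last_pid is taken from the commit log above.
var allowedProcDests = []string{
	"/proc/sys/kernel/ns_last_pid",
}

// checkProcMount is a hypothetical helper, not runc's implementation.
// dest is the mount destination relative to the container root.
// sourceIsProcfs is an assumption of this sketch: the caller is expected
// to have inspected the mount source and reports whether it is procfs.
func checkProcMount(dest string, sourceIsProcfs bool) error {
	// Normalise the destination so "proc", "/proc/" and "/proc/../proc"
	// are all treated the same way.
	clean := path.Clean("/" + dest)

	if clean == "/proc" {
		// Mounting on /proc itself is fine only for a real procfs.
		if sourceIsProcfs {
			return nil
		}
		return fmt.Errorf("%q cannot be mounted because it is not of type proc", clean)
	}

	// Anything outside /proc is not this check's concern.
	if !strings.HasPrefix(clean, "/proc/") {
		return nil
	}

	// Inside /proc, only explicitly allowed destinations pass.
	for _, ok := range allowedProcDests {
		if clean == ok {
			return nil
		}
	}
	return fmt.Errorf("%q is inside /proc and is not an allowed destination", clean)
}

func main() {
	dests := []string{"/proc", "/proc/sys/kernel/ns_last_pid", "/proc/sys", "/procfoo"}
	for _, d := range dests {
		fmt.Printf("%-32s procfs source: %v\n", d, checkProcMount(d, true))
	}
}
```

Passing the filesystem type in as a boolean keeps the sketch deterministic and testable; a real check would have to inspect the mount source itself, for every mount type and not only bind mounts, which is the gap commit cdff09ab87 describes.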