1
0
mirror of https://github.com/opencontainers/runc.git synced 2025-07-30 17:43:06 +03:00
Commit Graph

1574 Commits

Author SHA1 Message Date
5935bf8c21 libct/cgroups/fs: introduce NewManager()
...and use it from libcontainer/factory_linux.go.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:06:05 -07:00
24f945e08d libct/cgroups/systemd/v2: return a public interface
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:06:02 -07:00
63854b0ea8 newSetnsProcess: reuse state.CgroupPaths
c.cgroupManager.GetPaths() are called twice here: once in currentState()
and then in newSetnsProcess(). Reuse the result of the first call, which
is stored into state.CgroupPaths.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:05:59 -07:00
9a3e632625 notify: simplify usage
Instead of passing the whole map of paths, pass the path to the memory
controller which these functions actually require.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:05:58 -07:00
b18a9650f8 test: update devicefilter tests
The test cases need to take into account the assembly modifications.
The instruction:
	LdXMemH dst: r2 src: r1 off: 0 imm: 0
has been replaced with:
        LdXMemW dst: r2 src: r1 off: 0 imm: 0
        And32Imm dst: r2 imm: 65535

Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
2020-05-08 07:31:05 +01:00
128cb60f58 ebpf: fix big endian issue for s390x
Load the full 32 bits word and take the lower 16 bits, instead of
reading just 16 bits.

Same fix as 07bae05e61

Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
2020-05-08 07:31:05 +01:00
bf15cc99b1 cgroup v2: support rootless systemd
Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 .

$ podman --cgroup-manager=systemd run -it --rm --runtime=runc \
  --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine
/ # cat /proc/self/cgroup
0::/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max
44040192
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max
42000 100000
/ # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max
42

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-05-08 12:39:20 +09:00
492cfd8bf9 Merge pull request #2352 from lifubang/eventsv2
fix runc events error in cgroup v2
2020-05-08 12:51:05 +09:00
657407ff23 fix runc events error in cgroup v2
Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-05-07 22:18:46 +08:00
b48bbdd08d vendor: opencontainers/selinux v1.5.1, update deprecated uses
full diff: https://github.com/opencontainers/selinux/v1.4.0...v1.5.1

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-05 15:53:40 +02:00
407e9f9d0d Add reading of information from cpuacct.usage_all
Remove logrus logs from tests

Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
2020-05-05 08:51:12 +02:00
a57358e016 Merge pull request #2370 from lifubang/swap0
let runc disable swap in cgroup v2
2020-05-04 16:57:12 -07:00
402d645c5c Simplify ticks, as the value is a constant
See for example in the Musl libc source code https://git.musl-libc.org/cgit/musl/tree/src/conf/sysconf.c#n29

This removes the cgo dependency for the system package.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-04 23:05:46 +02:00
9df0b5e268 libcontainer: RunningInUserNS() use sync.Once
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-04 15:53:33 +02:00
6161d255b6 Merge pull request #2375 from tedyu/wait-lazy-close
Close fd in case fd.Write() returns error
2020-05-03 09:03:40 -07:00
a70f354680 let runc disable swap in cgroup v2
In cgroup v2, when memory and memorySwap set to the same value which is greater than zero,
runc should write zero in `memory.swap.max` to disable swap.

Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-05-03 20:57:36 +08:00
db29dce076 Close fd in case fd.Write() returns error
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-05-02 20:06:08 -07:00
64ca54816c libcontainer: simplify error message
The error message was including both the rootfs path, and the full
mount path, which also includes the path of the rootfs.

This patch removes the rootfs path from the error message, as it
was redundant, and made the error message overly verbose

Before this patch (errors wrapped for readability):

```
container_linux.go:348: starting container process caused: process_linux.go:438:
container init caused: rootfs_linux.go:58: mounting "/foo.txt"
to rootfs "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged"
at "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged/usr/share/nginx/html"
caused: not a directory: unknown
```

With this patch applied:

```
container_linux.go:348: starting container process caused: process_linux.go:438:
container init caused: rootfs_linux.go:58: mounting "/foo.txt"
to rootfs at "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged/usr/share/nginx/html"
caused: not a directory: unknown
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-03 02:59:46 +02:00
2adfd20ac9 libcontainer: don't double-quote errors
genericError.Error() was formatting the underlying error using `%q`; as a
result, quotes in underlying errors were escaped multiple times, which
caused the output to become hard to read, for example (wrapped for readability):

```
container_linux.go:345: starting container process caused "process_linux.go:430:
container init caused \"rootfs_linux.go:58: mounting \\\"/foo.txt\\\"
to rootfs \\\"/var/lib/docker/overlay2/f49a0ae0ec6646c818dcf05dbcbbdd79fc7c42561f3684fbb1fc5d2b9d3ad192/merged\\\"
at \\\"/var/lib/docker/overlay2/f49a0ae0ec6646c818dcf05dbcbbdd79fc7c42561f3684fbb1fc5d2b9d3ad192/merged/usr/share/nginx/html\\\"
caused \\\"not a directory\\\"\"": unknown
```

With this patch applied:

```
container_linux.go:348: starting container process caused: process_linux.go:438:
container init caused: rootfs_linux.go:58: mounting "/foo.txt"
to rootfs "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged"
at "/var/lib/docker/overlay2/de506d67da606b807009e23b548fec60d72359c77eec88785d8c7ecd54a6e4b2/merged/usr/share/nginx/html"
caused: not a directory: unknown
```

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-03 02:55:15 +02:00
c3b0b13fe9 cgroups/fs2: don't always parse /proc/self/cgroup
Function defaultPath always parses /proc/self/cgroup, but
the resulting value is not always used.

Avoid unnecessary reading/parsing by moving the code
to just before its use.

Modify the test case accordingly.

[v2: test: use UnifiedMountpoint, skip test if not on v2]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-28 22:16:36 -07:00
0a4dcc0203 Merge pull request #2331 from lifubang/StartTransientUnit
check that StartTransientUnit/StopUnit succeeds

LGTMs: @AkihiroSuda @kolyshkin 
Closes #2313, #2309
2020-04-28 10:47:52 -07:00
bfa1b2aab3 check that StartTransientUnit and StopUnit succeeds
Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-04-28 15:46:28 +08:00
60c647e3b8 fs2: fix cgroup.subtree_control EPERM on rootless + add CI
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-27 13:30:15 +09:00
799d94818d intelrdt: Add Cache Monitoring Technology stats
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
2020-04-25 09:43:48 +02:00
b19f9cecfe Merge pull request #2343 from lifubang/updateSystemdScope
fix data inconsistency when using runc update in systemd driven cgroup
2020-04-24 23:34:19 -07:00
0fd8d468ea Merge pull request #2318 from lifubang/linuxResources
cgroupv2: use default allowed devices when linux resources is null
2020-04-25 09:00:23 +09:00
634e51b52c Merge pull request #2335 from kolyshkin/cgroupv2-cpt
Fix cgroupv2 checkpoint/restore
2020-04-24 08:47:36 -07:00
49ca1fd074 Merge pull request #2347 from kolyshkin/v2-allow-all-devs
cgroupv2: allow to set EnableAllDevices=true
2020-04-24 16:09:40 +09:00
c420a3ec7f Merge pull request #2324 from kolyshkin/criu-freezer
libcontainer: fix Checkpoint wrt cgroupv2
2020-04-23 19:24:38 -07:00
440244268b Merge pull request #2330 from KentaTada/use-linuxnamespace-const
libcontainer: use consts of Namespace from runtime-spec
2020-04-23 18:58:29 -07:00
55d5c99ca7 libct/mountToRootfs: rm useless code
To make a bind mount read-only, it needs to be remounted. This is what
the code removed does, but it is not needed here.

We have to deal with three cases here:

1. cgroup v2 unified mode. In this case the mount is real mount with
   fstype=cgroup2, and there is no need to have a bind mount on top,
   as we pass readonly flag to the mount as is.

2. cgroup v1 + cgroupns (enableCgroupns == true). In this case the
   "mount" is in fact a set of real mounts with fstype=cgroup, and
   they are all performed in mountCgroupV1, with readonly flag
   added if needed.

3. cgroup v1 as is (enableCgroupns == false). In this case
   mountCgroupV1() calls mountToRootfs() again with an argument
   from the list obtained from getCgroupMounts(), i.e. a bind
   mount with the same flags as the original mount has (plus
   unix.MS_BIND | unix.MS_REC), and mountToRootfs() does remounting
   (under the case "bind":).

So, the code which this patch is removing is not needed -- it
essentially does nothing in case 3 above (since the bind mount
is already remounted readonly), and in cases 1 and 2 it
creates an unneeded extra bind mount on top of a real one (or set of
real ones).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-23 16:49:12 -07:00
20959b1666 libcontainer/integration/checkpoint_test: simplify
Since commit 9280e3566d it is not longer needed to have `cgroup2'
mount.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-23 15:22:32 -07:00
1d4ccc8e0c fix data inconsistent when runc update in systemd driven cgroup v1
Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-04-23 19:32:57 +08:00
7682a2b2a5 fix data inconsistent when runc update in systemd driven cgroup v2
Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-04-23 19:32:07 +08:00
4474795388 libcontainer: use x/sys/unix instead of the hardcoded value
PR_SET_CHILD_SUBREAPER is defined in x/sys/unix.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2020-04-23 10:49:51 +09:00
9280e3566d checkpoint/restore: fix cgroupv2 handling
In case of cgroupv2 unified hierarchy, the /sys/fs/cgroup mount
is the real mount with fstype of cgroup2 (rather than a set of
external bind mounts like for cgroupv1).

So, we should not add it to the list of "external bind mounts"
on both checkpoint and restore.

Without this fix, checkpoint integration tests fail on cgroup v2.

Also, same is true for cgroup v1 + cgroupns.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-22 11:26:43 -07:00
75a92ea615 cgroupv2: allow to set EnableAllDevices=true
In this case we just do not install any eBPF rules
checking the devices.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-22 11:05:36 -07:00
46be7b612e Merge pull request #2299 from kolyshkin/fs2-init-ctrl
cgroupv2: fix fs2 driver initialization
2020-04-20 21:27:42 -07:00
ab276b1c09 cgroups/fs2/Destroy: use Remove, ignore ENOENT
1. There is no need to try removing it recursively.

2. Do not treat ENOENT as an error (similar to fs
   and systemd v1 drivers).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
4b4bc995ad CreateCgroupPath: only enable needed controllers
1. Instead of enabling all available controllers, figure out which
   ones are required, and only enable those.

2. Amend all setFoo() functions to call isFooSet(). While this might
   seem unnecessary, it might actually help to uncover a bug.
   Imagine someone:
    - adds a cgroup.Resources.CpuFoo setting;
    - modifies setCpu() to apply the new setting;
    - but forgets to amend isCpuSet() accordingly <-- BUG

   In this case, a test case modifying CpuFoo will help
   to uncover the BUG. This is the reason why it's added.

This patch *could be* amended by enabling controllers on a best-effort
basis, i.e. :

 - do not return an error early if we can't enable some controllers;
 - if we fail to enable all controllers at once (usually because one
   of them can't be enabled), try enabling them one by one.

Currently this is not implemented, and it's not clear whether this
would be a good way to go or not.

[v2: add/use is${Controller}Set() functions]
[v3: document neededControllers()]
[v4: drop "best-effort" part]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
bb47e35843 cgroup/systemd: reorganize
1. Rename the files
  - v1.go: cgroupv1 aka legacy;
  - v2.go: cgroupv2 aka unified hierarchy;
  - unsupported.go: when systemd is not available.

2. Move the code that is common between v1 and v2 to common.go

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
de1134156b cgroups/fs2/CreateCgroupPath: nit
This slightly improves code readability.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
b5c1949f2a cgroups/fs2/CreateCgroupPath: reinstate check
This check was removed in commit 5406833a65. Now, when this
function is called from a few places, it is no longer obvious
that the path always starts with /sys/fs/cgroup/, so reinstate
the check just to be on the safe side.

This check also ensures that elements[3:] can be used safely.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
813cb3eb94 cgroupv2: fix fs2 cgroup init
fs2 cgroup driver was not working because it did not enable controllers
while creating cgroup directory; instead it was merely doing MkdirAll()
and gathered the list of available controllers in NewManager().

Also, cgroup should be created in Apply(), not while creating a new
manager instance.

To fix:

1. Move the createCgroupsv2Path function from systemd driver to fs2 driver,
   renaming it to CreateCgroupPath. Use in Apply() from both fs2 and
   systemd drivers.

2. Delay available controllers map initialization to until it is needed.

With this patch:
 - NewManager() only performs minimal initialization (initializin
   m.dirPath, if not provided);
 - Apply() properly creates cgroup path, enabling the controllers;
 - m.controllers is initialized lazily on demand.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
60eaed2ed6 cgroupv2: move sanity path check to common code
The fs2 cgroup driver has a sanity check for path.
Since systemd driver is relying on the same path,
it makes sense to move this check to the common code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
dbeff89491 cgroupv2/systemd: privatize UnifiedManager
... and its Cgroup field. There is no sense to keep it public.

This was generated by gorename.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:27:40 -07:00
88c13c0713 cgroupv2: use SecureJoin in systemd driver
It seems that some paths are coming from user and are therefore
untrusted.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:20:22 -07:00
9c80cd672d cgroupv2: rm legacy Paths from systemd driver
Having map of per-subsystem paths in systemd unified cgroups
driver does not make sense and makes the code less readable.

To get rid of it, move the systemd v1-or-v2 init code to
libcontainer/factory_linux.go which already has a function
to deduce unified path out of paths map.

End result is much cleaner code. Besides, we no longer write pid
to the same cgroup file 7 times in Apply() like we did before.

While at it
 - add `rootless` flag which is passed on to fs2 manager
 - merge getv2Path() into GetUnifiedPath(), don't overwrite
   path if it is set during initialization (on Load).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-19 16:19:51 -07:00
3de8613327 libcontainer: use consts of Namespace from runtime-spec
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2020-04-19 23:21:40 +09:00
480bca91be cgroups/fs2: move type decl to beginning
It was weird having it somewhere in the middle.

No code change, just moving it around.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-18 18:43:41 -07:00