1
0
mirror of https://github.com/opencontainers/runc.git synced 2025-08-08 12:42:06 +03:00
Commit Graph

183 Commits

Author SHA1 Message Date
Kir Kolyshkin
8c5a19f79b libct/cgroups/fs: rename some files
no changes, just a few git renames

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:54 -07:00
Kir Kolyshkin
7db2d3e146 libcontainer/cgroups: rm FindCgroupMountpointDir
This function is cgroupv1-specific, is only used once, and its name
is very close to the name of another function, FindCgroupMountpoint.

Inline it into the (only) caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:15 -07:00
Kir Kolyshkin
dd2426d067 libct/cgroups: fix m.paths map access
This fixes a few cases of accessing m.paths map directly without holding
the mutex lock.

Fixes: 9087f2e82
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 18:30:16 -07:00
Kir Kolyshkin
a77d7b1d0f libct: don't use GetPaths
Since commit 714c91e9f7, method GetPaths() should only be used
for saving container state. For other uses, we have a new method,
Path(), which is cleaner.

Fix GetPaths() usage introduced by recent commits 859a780d6f and 9087f2e82.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 18:27:34 -07:00
Kir Kolyshkin
5b247e739c Merge pull request #2338 from lifubang/systemdcgroupv2
fix path error in systemd when stopped

LGTMs: @mrunalp @AkihiroSuda
2020-06-15 18:01:13 -07:00
Katarzyna Kujawa
71e63de4a3 Fix #2469 omit memory.numa_stat when not available
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
2020-06-15 11:39:34 +02:00
lifubang
9087f2e827 fix path error in systemd when stopped
When we use cgroup with systemd driver, the cgroup path will be auto removed
by systemd when all processes exited. So we should check cgroup path exists
when we access the cgroup path, for example in `kill/ps`, or else we will
got an error.

Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-06-02 18:17:43 +08:00
Katarzyna Kujawa
92f831bf0c Fix #2440 omit cpuacct.usage_all when not available
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
2020-06-02 09:24:11 +02:00
Kir Kolyshkin
3249e2379c cgroupv1: check cpu shares in place
Commit 4e65e0e90a added a check for cpu shares. Apparently, the
kernel allows to set a value higher than max or lower than min without
an error, but the value read back is always within the limits.

The check (which was later moved out to a separate CheckCpushares()
function) is always performed after setting the cpu shares, so let's
move it to the very place where it is set.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-29 16:46:28 -07:00
Kir Kolyshkin
d57f5bb286 cgroupv1: don't ignore MemorySwap if Memory==-1
Commit 18ebc51b3cc3 "Reset Swap when memory is set to unlimited (-1)"
added handling of the case when a user updates the container limits
to set memory to unlimited (-1) but do not set any other limits.
Apparently, in this case, if swap limit was previously set, kernel fails
to set memory.limit_in_bytes to -1 if memory.memsw.limit_in_bytes is
not set to -1.

What the above commit fails to handle correctly is the request when
Memory is set to -1 and MemorySwap is set to some specific limit N
(where N > 0). In this case, the value of N is silently discarded
and MemorySwap is set to -1 instead.

This is wrong thing to do, as the limit set, even if incorrectly,
should not be ignored.

Fix this by only assigning MemorySwap == -1 in case it was not
explicitly set.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-20 17:23:40 -07:00
Akihiro Suda
2fa3c286b5 fix "libcontainer/cgroups/fs/cpuset.go:63:14: undefined: fmt"
The compilation error had ocurred because of a bad rebase during #2401 and #2413

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-05-19 23:38:20 +09:00
Akihiro Suda
f369199ff6 Merge pull request #2413 from JFHwang/2392-spec-check
Add nil check of spec.Process in validateProcessSpec()
2020-05-19 08:11:22 +09:00
Mrunal Patel
53a4649776 Merge pull request #2401 from kolyshkin/fs-cpuset-mountinfo
libct/cgroup: rm GetClosestMountpointAncestor using moby/sys/mountinfo parser
2020-05-18 10:43:55 -07:00
John Hwang
7fc291fd45 Replace formatted errors when unneeded
Signed-off-by: John Hwang <John.F.Hwang@gmail.com>
2020-05-16 18:13:21 -07:00
Akihiro Suda
3f1e886991 Merge pull request #2391 from cyphar/devices-cgroup
cgroup: devices: major cleanups and minimal transition rules
2020-05-14 09:57:06 +09:00
Kir Kolyshkin
2db3240f35 libct/cgroups: rm GetClosestMountpointAncestor
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).

Remove it, replacing with the implementation based on moby/sys/mountinfo
parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 17:32:06 -07:00
Kir Kolyshkin
f160352682 libct/cgroup: prep to rm GetClosestMountpointAncestor
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).

Prepare to remove it by replacing with the implementation based on
the parser from github.com/moby/sys/mountinfo parser.

This commit is here to make sure the proposed replacement passes the
unit test.

Funny, but the unit test need to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).

Validated by

 $ go test -v -run Ance
 === RUN   TestGetClosestMountpointAncestor
 --- PASS: TestGetClosestMountpointAncestor (0.00s)
 PASS
 ok  	github.com/opencontainers/runc/libcontainer/cgroups	0.002s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 16:26:16 -07:00
Kir Kolyshkin
41855317b6 Merge pull request #2271 from katarzyna-z/kk-cpuacct-usage-all
Add reading of information from cpuacct.usage_all
2020-05-13 13:33:05 -07:00
Aleksa Sarai
afe83489d4 cgroupv1: devices: use minimal transition rules with devices.Emulator
Now that all of the infrastructure for devices.Emulator is in place, we
can finally implement minimal transition rules for devices cgroups. This
allows for minimal disruption to running containers if a rule update is
requested. Only in very rare circumstances (black-list cgroups and mode
switching) will a clear-all rule be written. As a result, containers
should no longer see spurious errors.

A similar issue affects the cgroupv2 devices setup, but that is a topic
for another time (as the solution is drastically different).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-05-13 17:42:43 +10:00
Aleksa Sarai
24388be71e configs: use different types for .Devices and .Resources.Devices
Making them the same type is simply confusing, but also means that you
could accidentally use one in the wrong context. This eliminates that
problem. This also includes a whole bunch of cleanups for the types
within DeviceRule, so that they can be used more ergonomically.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-05-13 17:38:45 +10:00
Aleksa Sarai
b2bec9806f cgroup: devices: eradicate the Allow/Deny lists
These lists have been in the codebase for a very long time, and have
been unused for a large portion of that time -- specconv doesn't
generate them and the only user of these flags has been tests (which
doesn't inspire much confidence).

In addition, we had an incorrect implementation of a white-list policy.
This wasn't exploitable because all of our users explicitly specify
"deny all" as the first rule, but it was a pretty glaring issue that
came from the "feature" that users can select whether they prefer a
white- or black- list. Fix this by always writing a deny-all rule (which
is what our users were doing anyway, to work around this bug).

This is one of many changes needed to clean up the devices cgroup code.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-05-13 17:38:45 +10:00
Aleksa Sarai
859a780d6f cgroups: add GetFreezerState() helper to Manager
This is effectively a nicer implementation of the container.isPaused()
helper, but to be used within the cgroup code for handling some fun
issues we have to fix with the systemd cgroup driver.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-05-13 17:38:45 +10:00
Kir Kolyshkin
714c91e9f7 Simplify cgroup path handing in v2 via unified API
This unties the Gordian Knot of using GetPaths in cgroupv2 code.

The problem is, the current code uses GetPaths for three kinds of things:

1. Get all the paths to cgroup v1 controllers to save its state (see
   (*linuxContainer).currentState(), (*LinuxFactory).loadState()
   methods).

2. Get all the paths to cgroup v1 controllers to have the setns process
    enter the proper cgroups in `(*setnsProcess).start()`.

3. Get the path to a specific controller (for example,
   `m.GetPaths()["devices"]`).

Now, for cgroup v2 instead of a set of per-controller paths, we have only
one single unified path, and a dedicated function `GetUnifiedPath()` to get it.

This discrepancy between v1 and v2 cgroupManager API leads to the
following problems with the code:

 - multiple if/else code blocks that have to treat v1 and v2 separately;

 - backward-compatible GetPaths() methods in v2 controllers;

 -  - repeated writing of the PID into the same cgroup for v2;

Overall, it's hard to write the right code with all this, and the code
that is written is kinda hard to follow.

The solution is to slightly change the API to do the 3 things outlined
above in the same manner for v1 and v2:

1. Use `GetPaths()` for state saving and setns process cgroups entering.

2. Introduce and use Path(subsys string) to obtain a path to a
   subsystem. For v2, the argument is ignored and the unified path is
   returned.

This commit converts all the controllers to the new API, and modifies
all the users to use it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 12:04:06 -07:00
Kir Kolyshkin
1d143562d2 libct/cgroups/fs: access m.paths under lock
1. Prevent theoretical "concurrent map access" error to m.paths.

2. There is no need to call m.Paths -- we can access m.paths directly.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:09:55 -07:00
Kir Kolyshkin
fc620fdf81 libct/cgroups/fs: privatize Manager and its fields
This was generated entirely by gorename -- nothing to review here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:07:00 -07:00
Kir Kolyshkin
5935bf8c21 libct/cgroups/fs: introduce NewManager()
...and use it from libcontainer/factory_linux.go.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-08 10:06:05 -07:00
Katarzyna Kujawa
407e9f9d0d Add reading of information from cpuacct.usage_all
Remove logrus logs from tests

Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
2020-05-05 08:51:12 +02:00
Sebastiaan van Stijn
402d645c5c Simplify ticks, as the value is a constant
See for example in the Musl libc source code https://git.musl-libc.org/cgit/musl/tree/src/conf/sysconf.c#n29

This removes the cgo dependency for the system package.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-05-04 23:05:46 +02:00
Kir Kolyshkin
af6b9e7fa9 nit: do not use syscall package
In many places (not all of them though) we can use `unix.`
instead of `syscall.` as these are indentical.

In particular, x/sys/unix defines:

```go
type Signal = syscall.Signal
type Errno = syscall.Errno
type SysProcAttr = syscall.SysProcAttr

const ENODEV      = syscall.Errno(0x13)
```

and unix.Exec() calls syscall.Exec().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-18 16:16:49 -07:00
Michael Crosby
5c6216b1ed Merge pull request #2278 from iwankgb/memory.numa_stats
Exposing memory.numa_stats
2020-04-14 11:32:51 -04:00
iwankgb
7fe0a98e79 Exposing memory.numa_stats
Making information on page usage by type and NUMA node available

Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
2020-04-08 17:40:09 +02:00
Kir Kolyshkin
b2272b2cba libcontainer: use errors.Is() and errors.As()
Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.

The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 20:34:01 -07:00
Kir Kolyshkin
c39f87a47a Revert "Merge pull request #2280 from kolyshkin/errors-unwrap"
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. More to say,
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().

This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 19:41:11 -07:00
Kir Kolyshkin
bd737f1e94 libct/cgroups/fs: use errors.Unwrap
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 20:07:04 -07:00
Kir Kolyshkin
66778b3c28 libct/setKernelMemory: use errors.Unwrap
This simplifies code a lot.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 20:07:04 -07:00
Boris Popovschi
3b992087b8 Fix skip message for cgroupv2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-03 14:27:12 +02:00
Mrunal Patel
5cc0deaf7a Merge pull request #2169 from AkihiroSuda/split-fs
cgroup2: split fs2 from fs
2020-01-13 16:23:27 -08:00
Julio Montes
8ddd892072 libcontainer: add method to get cgroup config from cgroup Manager
`configs.Cgroup` contains the configuration used to create cgroups. This
configuration must be saved to disk, since it's required to restore the
cgroup manager that was used to create the cgroups.
Add method to get cgroup configuration from cgroup Manager to allow API users
save it to disk and restore a cgroup manager later.

fixes #2176

Signed-off-by: Julio Montes <julio.montes@intel.com>
2019-12-17 22:46:03 +00:00
Akihiro Suda
88e8350de2 cgroup2: split fs2 from fs
split fs2 package from fs, as mixing up fs and fs2 is very likely to result in
unmaintainable code.

Inspired by containerd/cgroups#109

Fix #2157

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-12-06 15:42:10 +09:00
Michael Crosby
8bb10af481 Merge pull request #2165 from AkihiroSuda/travis-f31
.travis.yml: add Fedora 31 vagrant box (for cgroup2)
2019-12-05 16:26:51 -05:00
Akihiro Suda
ccd4436fc4 .travis.yml: add Fedora 31 vagrant box (for cgroup2)
As the baby step, only unit tests are executed.

Failing tests are currently skipped and will be fixed in follow-up PRs.

Fix #2124

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-31 16:53:01 +09:00
Akihiro Suda
faf673ee45 cgroup2: port over eBPF device controller from crun
The implementation is based on https://github.com/containers/crun/blob/0.10.2/src/libcrun/ebpf.c

Although ebpf.c is originally licensed under LGPL-3.0-or-later, the author
Giuseppe Scrivano agreed to relicense the file in Apache License 2.0:
https://github.com/opencontainers/runc/issues/2144#issuecomment-543116397

See libcontainer/cgroups/ebpf/devicefilter/devicefilter_test.go for tested configurations.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-31 14:01:46 +09:00
Qiang Huang
e57a774066 Merge pull request #2149 from AkihiroSuda/cgroup2-ps
cgroup2: implement `runc ps`
2019-10-31 09:44:39 +08:00
Qiang Huang
d239ca8425 Merge pull request #2148 from AkihiroSuda/cg2-ignore-cpuset-when-no-config
cgroup2: cpuset_v2: skip Apply when no limit is specified
2019-10-29 21:57:58 +08:00
Akihiro Suda
74a3fe5d1b cgroup2: do not parse /proc/cgroups
/proc/cgroups is meaningless for v2 and should be ignored.

https://github.com/torvalds/linux/blob/v5.3/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features

* Now GetAllSubsystems() parses /sys/fs/cgroup/cgroup.controller, not /proc/cgroups.
  The function result also contains "pseudo" controllers: {"devices", "freezer"}.
  As it is hard to detect availability of pseudo controllers, pseudo controllers
  are always assumed to be available.

* Now IOGroupV2.Name() returns "io", not "blkio"

Fix #2155 #2156

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-28 00:00:33 +09:00
Akihiro Suda
dbd771e475 cgroup2: implement runc ps
Implemented `runc ps` for cgroup v2 , using a newly added method `m.GetUnifiedPath()`.
Unlike the v1  implementation that checks `m.GetPaths()["devices"]`, the v2 implementation does not require the device controller to be available.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-19 01:59:24 +09:00
Akihiro Suda
d918e7f408 cpuset_v2: skip Apply when no limit is specified
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-19 00:33:31 +09:00
Akihiro Suda
033936ef76 io_v2.go: remove blkio v1 code
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-18 21:33:48 +09:00
tianye15
28e58a0f6a Support different field counts of cpuaact.stats
Signed-off-by: skilxnTL <tylxltt@gmail.com>
2019-09-29 10:20:58 +08:00
Giuseppe Scrivano
1932917b71 libcontainer: add initial support for cgroups v2
allow to set what subsystems are used by
libcontainer/cgroups/fs.Manager.

subsystemsUnified is used on a system running with cgroups v2 unified
mode.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:25 +02:00