1
0
mirror of https://github.com/opencontainers/runc.git synced 2025-08-01 05:06:52 +03:00
Commit Graph

73 Commits

Author SHA1 Message Date
89516d17dd libct/cgroups/readProcsFile: ret errorr if scan failed
Not sure why but the errors from scanner were ignored. Such errors
can happen if open(2) has succeeded but the subsequent read(2) fails.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-17 12:33:01 -07:00
0681d456fc libct/cgroups/utils: move cgroup v1 code to separate file
In most project, "utils" is a big mess, and this is not an exception.
Try to clean it up a bit by moving cgroup v1 specific code to a separate
source file.

There are no code changes in this commit, just moving it from one file
to another.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:07 -07:00
7db2d3e146 libcontainer/cgroups: rm FindCgroupMountpointDir
This function is cgroupv1-specific, is only used once, and its name
is very close to the name of another function, FindCgroupMountpoint.

Inline it into the (only) caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:15 -07:00
d244b4058e libct/cgroups: improve ParseCgroupFile docs
In particular, state that for cgroup v2 the result is very different.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:08 -07:00
5785aabc13 libct/cgroups: make isSubsystemAvailable v1-specific
This function is only called from cgroupv1 code, so there is no need
for it to implement cgroupv2 stuff.

Make it v1-specific, and panic if it is called from v2 code (since this
is an internal function, the panic would mean incorrect runc code).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:04 -07:00
142d0f2d5d libct/cgroups/utils: make FindCgroupMountpoint* v1-specific
It's bad and wrong to use these functions for any cgroupv2 code,
and there are no existing users (in runc, at least).

Make them return an error in such case.

Also, remove the cgroupv2-specific handling from
findCgroupMountpointAndRootFromReader().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:58 -07:00
44b75e760e libct/cgroups: separate getCgroupMountsV1
This function should not really be used for cgroupv2 code.
Currently it is used in kubernetes code, so we can't remove
the v2 case yet.

Add a TODO item to remove v2 code once kubernetes is converted
to not use it, and separate out v1 code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:06 -07:00
3834222d88 libct/cgroups/utils: getControllerPath return err for v2
This function is not used and were never used in any cgroupv2 code.

To have it stay that way, let it return error in case it's called
for v2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 20:23:59 -07:00
4189cb65f8 cgroups: remove cgroup.Resources.CpuMax
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let the fs2 do the same.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:15:38 -07:00
3c6e8ac4d2 cgroupv2: set mem+swap to max if mem set to max
... and mem+swap is not explicitly set otherwise.

This ensures compatibility with cgroupv1 controller which interprets
things this way.

With this fixed, we can finally enable swap tests for cgroupv2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-22 21:32:16 -07:00
2db3240f35 libct/cgroups: rm GetClosestMountpointAncestor
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).

Remove it, replacing with the implementation based on moby/sys/mountinfo
parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 17:32:06 -07:00
f160352682 libct/cgroup: prep to rm GetClosestMountpointAncestor
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).

Prepare to remove it by replacing with the implementation based on
the parser from github.com/moby/sys/mountinfo parser.

This commit is here to make sure the proposed replacement passes the
unit test.

Funny, but the unit test need to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).

Validated by

 $ go test -v -run Ance
 === RUN   TestGetClosestMountpointAncestor
 --- PASS: TestGetClosestMountpointAncestor (0.00s)
 PASS
 ok  	github.com/opencontainers/runc/libcontainer/cgroups	0.002s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 16:26:16 -07:00
a70f354680 let runc disable swap in cgroup v2
In cgroup v2, when memory and memorySwap set to the same value which is greater than zero,
runc should write zero in `memory.swap.max` to disable swap.

Signed-off-by: lifubang <lifubang@acmcoder.com>
2020-05-03 20:57:36 +08:00
e58a406b77 libcontainer: remove unneeded import
Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2020-04-09 20:14:39 +09:00
c86be8a2c1 cgroupv2: fix setting MemorySwap
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).

Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.

[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-07 20:45:53 -07:00
b2272b2cba libcontainer: use errors.Is() and errors.As()
Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.

The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 20:34:01 -07:00
c39f87a47a Revert "Merge pull request #2280 from kolyshkin/errors-unwrap"
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. More to say,
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().

This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-02 19:41:11 -07:00
272c83e169 libct/cgroups: use errors.Unwrap
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-31 20:07:04 -07:00
b45db5d3b2 libcontainer/cgroup: obsolete Get*Cgroup for v2
These functions should not be called from any code handling
the cgroup2 unified hierarchy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:20:00 -07:00
5542a2c77d libcontainer/cgroups: GetAllPids: optimize
1. Return earlier if there is an error.

2. Do not use filepath.Split on every entry, use info.Name() instead.

3. Make readProcsFile() accept file name as an argument, to avoid
   unnecessary file name and directory splitting and merging.

4. Skip on info.IsDir() -- this avoids an error when cgroup name is
   set to "cgroup.procs".

This is still not very good since filepath.Walk() performs an unnecessary
stat(2) on every entry, but better than before.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:27:36 -07:00
aa269315a4 cgroup2: add CpuMax conversion
Fix #2243

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:58:39 +09:00
64e9a97981 cgroup2: fix conversion
* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000

Fix #2244
Follow-up to #2212 #2213

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:57:07 +09:00
4b8134f63b Convert blkioWeight to io.weight properly
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-18 15:44:07 +02:00
7c439cc6f6 Added conversion for cpu.weight v2
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-12 11:32:34 +02:00
74a3fe5d1b cgroup2: do not parse /proc/cgroups
/proc/cgroups is meaningless for v2 and should be ignored.

https://github.com/torvalds/linux/blob/v5.3/Documentation/admin-guide/cgroup-v2.rst#deprecated-v1-core-features

* Now GetAllSubsystems() parses /sys/fs/cgroup/cgroup.controller, not /proc/cgroups.
  The function result also contains "pseudo" controllers: {"devices", "freezer"}.
  As it is hard to detect availability of pseudo controllers, pseudo controllers
  are always assumed to be available.

* Now IOGroupV2.Name() returns "io", not "blkio"

Fix #2155 #2156

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-10-28 00:00:33 +09:00
b28f58f31b Set unified mountpoint in find mnt func
This is needed for the fsv2 cgroups to work when there is a unified mountpoint.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-10-15 15:40:03 -04:00
1932917b71 libcontainer: add initial support for cgroups v2
allow to set what subsystems are used by
libcontainer/cgroups/fs.Manager.

subsystemsUnified is used on a system running with cgroups v2 unified
mode.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:25 +02:00
6f77e35daf Export list of HugePageSizeUnits
This will allow others to import it instead of copying it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 20:17:30 +02:00
c6445b1c1c Add tests for GetHugePageSize
Add tests to avoid regressions

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 17:27:32 +02:00
273e7b74a7 Fix cgroup hugetlb size prefix for kB
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).

The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of ram (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with config flag CONFIG_HUGETLBFS=y

As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.

Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"

And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-29 21:52:43 +02:00
bdf3524b34 Retry adding pids to cgroups when EINVAL occurs
The kernel will sometimes return EINVAL when writing a pid to a
cgroup.procs file. It does so when the task being added still has the
state TASK_NEW.

See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
2018-12-17 15:34:47 +00:00
769d6c4a75 Fix some typos
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-09 23:52:54 +08:00
76520a4bf0 Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint
Respect container's cgroup path
2018-11-16 14:06:54 -05:00
df3fa115f9 Add support for cgroup namespace
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:

```
"namespaces": [
	{
		"type": "pid"
	},
	{
		"type": "network"
	},
	{
		"type": "ipc"
	},
	{
		"type": "uts"
	},
	{
		"type": "mount"
	},
	{
		"type": "cgroup"
	}
],

```

Note that if you want to run a container which has shared cgroup ns with
another container, then it's strongly recommended that you set
proper `CgroupsPath` of both containers(the second container's cgroup
path must be the subdirectory of the first one). Or there might be
some unexpected results.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-31 10:51:43 -04:00
a1d5398afa Respect container's cgroup path
Respect the container's cgroup path when finding the container's
cgroup mount point, which is useful in multi-tenant environments, where
containers have their own unique cgroup mounts

Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
2018-09-25 17:43:36 +01:00
578fe65e4f merge branch 'pr-1817'
Fix duplicate entries and missing entries in getCgroupMountsHelper
  Add test for testing cgroup mounts on bedrock linux
  Stop relying on number of subsystems for cgroups

LGTMs: @crosbymichael @cyphar
Closes #1817
2018-09-19 19:48:17 +10:00
feb90346e0 doc: fix typo
Signed-off-by: Yan Zhu <yanzhu@alauda.io>
2018-09-07 11:58:59 +08:00
a2faaa1317 Fix duplicate entries and missing entries in getCgroupMountsHelper
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2018-07-31 20:12:18 -07:00
5ee0648bfb Stop relying on number of subsystems for cgroups
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.

Without the 'all' flag, ignore duplicate subsystems after the first.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:00:58 +01:00
18cd7e06f7 Merge pull request #1372 from cloudfoundry-incubator/cpuset-mount-root
Handle container creation when cgroups have already been mounted in another location
2017-05-25 09:53:57 -07:00
baeef29858 rootless: add rootless cgroup manager
The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
f5c5aac958 Create containers when cgroups already mounted
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.

This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.

Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.

Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
2017-03-15 10:10:30 +00:00
0fefa36f3a Merge pull request #1278 from datawolf/scanner
move error check out of the for loop
2017-01-20 17:49:44 +00:00
b8cefd7d8f Merge pull request #1266 from mrunalp/ignore_cgroup_v2
Ignore cgroup2 mountpoints
2017-01-20 17:26:46 +00:00
3a71eb0256 move error check out of the for loop
The `bufio.Scanner.Scan` method returns false either by reaching the
end of the input or an error. After Scan returns false, the Err method
will return any error that occurred during scanning, except that if it
was io.EOF, Err will return nil.

We should check the error when Scan return false(out of the for loop).

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-18 05:02:39 +00:00
e7b57cb042 Ignore cgroup2 mountpoints
Our current cgroup parsing logic assumes cgroup v1 mounts
so we should ignore cgroup2 mounts for now

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 12:34:50 -08:00
4732f46fd9 small refactor
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-04 11:39:44 +08:00
f557996401 Add flag to allow getting all mounts for cgroups subsystems
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-15 15:19:27 -04:00
ebe85bf180 Allow cgroup creation without attaching a pid
Signed-off-by: Buddha Prakash <buddhap@google.com>
2016-07-20 13:49:48 -07:00
394610a396 cgroups: Parse correctly if cgroup path contains :
Prior to this change a cgroup with a `:` character in it's path was not
parsed correctly (as occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).

This fixes that issue and adds a test.

Signed-off-by: Euan Kemp <euank@coreos.com>
2016-06-10 23:09:03 -07:00