1
0
mirror of https://github.com/opencontainers/runc.git synced 2025-09-18 05:23:14 +03:00

217 Commits

Author SHA1 Message Date
Kir Kolyshkin
89e59902c4 Modernize code for Go 1.24
Brought to you by

	modernize -fix -test ./...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-08-27 19:11:02 -07:00
Kir Kolyshkin
17570625c0 Use for range over integers
This appears in Go 1.22 (see https://tip.golang.org/ref/spec#For_range).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-31 17:15:06 -07:00
Kir Kolyshkin
a638f1330b .golangci.yml: add nolintlint, fix found issues
The errrolint linter can finally ignore errors from Close,
and it also ignores direct comparisons of errors from x/sys/unix.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-24 11:59:54 -07:00
Kir Kolyshkin
65e0f2b719 libct/int: use destroyContainer
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-24 10:02:47 -07:00
Kir Kolyshkin
1aebfa3eab libct/int: don't use _ = runContainerOk
There is no need to explicitly ignore returned value.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-24 10:02:47 -07:00
Kir Kolyshkin
5ac77ed6d9 libct/int: add/use needUserNS helper
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-13 08:40:27 -07:00
Kir Kolyshkin
a75076b4a4 Switch to opencontainers/cgroups
This removes libcontainer/cgroups packages and starts
using those from github.com/opencontainers/cgroups repo.

Mostly generated by:

  git rm -f libcontainer/cgroups

  find . -type f -name "*.go" -exec sed -i \
    's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \
    {} +

  go get github.com/opencontainers/cgroups@v0.0.1
  make vendor
  gofumpt -w .

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-28 15:20:33 -08:00
Kir Kolyshkin
7dc2486889 libct: switch to numeric UID/GID/groups
This addresses the following TODO in the code (added back in 2015
by commit 845fc65e5):

> // TODO: fix libcontainer's API to better support uid/gid in a typesafe way.

Historically, libcontainer internally uses strings for user, group, and
additional (aka supplementary) groups.

Yet, runc receives those credentials as part of runtime-spec's process,
which uses integers for all of them (see [1], [2]).

What happens next is:

1. runc start/run/exec converts those credentials to strings (a User
   string containing "UID:GID", and a []string for additional GIDs) and
   passes those onto runc init.
2. runc init converts them back to int, in the most complicated way
   possible (parsing container's /etc/passwd and /etc/group).

All this conversion and, especially, parsing is totally unnecessary,
but is performed on every container exec (and start).

The only benefit of all this is, a libcontainer user could use user and
group names instead of numeric IDs (but runc itself is not using this
feature, and we don't know if there are any other users of this).

Let's remove this back and forth translation, hopefully increasing
runc exec performance.

The only remaining need to parse /etc/passwd is to set HOME environment
variable for a specified UID, in case $HOME is not explicitly set in
process.Env. This can now be done right in prepareEnv, which simplifies
the code flow a lot. Alas, we can not use standard os/user.LookupId, as
it could cache host's /etc/passwd or the current user (even with the
osusergo tag).

PS Note that the structures being changed (initConfig and Process) are
never saved to disk as JSON by runc, so there is no compatibility issue
for runc users.

Still, this is a breaking change in libcontainer, but we never promised
that libcontainer API will be stable (and there's a special package
that can handle it -- github.com/moby/sys/user). Reflect this in
CHANGELOG.

For 3998.

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#posix-platform-user
[2]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/specs-go/config.go#L86

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 17:49:17 -08:00
Kir Kolyshkin
6c9ddcc648 libct: switch from libct/devices to libct/cgroups/devices/config
Use the old package name as an alias to minimize the patch.

No functional change; this just eliminates a bunch of deprecation
warnings.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-31 16:51:09 -08:00
Kir Kolyshkin
6171da6005 libct/configs: add HookList.SetDefaultEnv
1. Make CommandHook.Command a pointer, which reduces the amount of data
   being copied when using hooks, and allows to modify command hooks.

2. Add SetDefaultEnv, which is to be used by the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-09 18:22:53 +08:00
Kir Kolyshkin
390641d148 libct/int: improve TestExecInEnvironment
This is a slight refactor of TestExecInEnvironment, making it more
strict wrt checking the exec output.

1. Explain why DEBUG is added twice to the env.
2. Reuse the execEnv for the check.
3. Make the check more strict -- instead of looking for substrings,
   check line by line.
4. Add a check for extra environment variables.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-09 18:22:53 +08:00
Kir Kolyshkin
9a54594752 libct/int: add BenchmarkExecInBigEnv
Here's what it shows on my laptop (with -count 10 -benchtime 10s,
summarized by benchstat):

	                │   sec/op    │
	ExecTrue-20       8.477m ± 2%
	ExecInBigEnv-20   61.53m ± 1%

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-09 18:22:53 +08:00
Kir Kolyshkin
a56f85f87b libct/*: switch from configs to cgroups
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-11 19:08:40 -08:00
Kir Kolyshkin
1e674098f5 libct/int: add exec benchmark
This is a benchmark which checks how fast we can execute /bin/true
inside a container.

Results from my machine are below. As you can see, in default setup
about 70% of exec time is spent for CVE-2019-5736 (copying runc binary),
and using either RUNC_DMZ=true or memfd-bind helps a lot.

This can also be used for profiling (using -test.cpuprofile option).

=== Default setup ===

[kir@kir-tp1 integration]$ sudo ./integration.test -test.run xxx -test.v -test.benchtime 5s -test.count 5 -test.bench . .
goos: linux
goarch: amd64
pkg: github.com/opencontainers/runc/libcontainer/integration
cpu: 12th Gen Intel(R) Core(TM) i7-12800H
BenchmarkExecTrue
BenchmarkExecTrue-20    	     327	  24475677 ns/op
BenchmarkExecTrue-20    	     244	  25242718 ns/op
BenchmarkExecTrue-20    	     232	  26187174 ns/op
BenchmarkExecTrue-20    	     237	  26780030 ns/op
BenchmarkExecTrue-20    	     318	  18487219 ns/op
PASS

=== With DMZ enabled ===

[kir@kir-tp1 integration]$ sudo -E RUNC_DMZ=true ./integration.test -test.run xxx -test.v -test.benchtime 5s -test.count 5 -test.bench . .
goos: linux
goarch: amd64
pkg: github.com/opencontainers/runc/libcontainer/integration
cpu: 12th Gen Intel(R) Core(TM) i7-12800H
BenchmarkExecTrue
BenchmarkExecTrue-20    	     694	   8263744 ns/op
BenchmarkExecTrue-20    	     778	   8483228 ns/op
BenchmarkExecTrue-20    	     784	   8456018 ns/op
BenchmarkExecTrue-20    	     732	   8160239 ns/op
BenchmarkExecTrue-20    	     769	   8236972 ns/op
PASS

=== With memfd-bind ===

[kir@kir-tp1 integration]$ sudo systemctl start  memfd-bind@$(systemd-escape -p $PWD/integration.test)
[kir@kir-tp1 integration]$ sudo ./integration.test -test.run xxx -test.v -test.benchtime 5s -test.count 5 -test.bench . .
goos: linux
goarch: amd64
pkg: github.com/opencontainers/runc/libcontainer/integration
cpu: 12th Gen Intel(R) Core(TM) i7-12800H
BenchmarkExecTrue
BenchmarkExecTrue-20    	     800	   7538839 ns/op
BenchmarkExecTrue-20    	     717	   7424755 ns/op
BenchmarkExecTrue-20    	     848	   7747787 ns/op
BenchmarkExecTrue-20    	     800	   7668740 ns/op
BenchmarkExecTrue-20    	     751	   7304373 ns/op
PASS

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-10-24 13:39:26 -07:00
Kir Kolyshkin
cb20148703 libct/int: use testing.TB for utils
...so that they can be used for benchmarks, too.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-10-24 13:39:26 -07:00
Kir Kolyshkin
0de1953333 runc spec, libct/int: do not add ambient capabilities
Commit 98fe566c removed inheritable capabilities from the example spec
(used by runc spec) and from the libcontainer/integration test config,
but neglected to also remove ambient capabilities.

An ambient capability could only be set if the same inheritable
capability is set, so as a result of the above change ambient
capabilities were not set (but due to a bug in gocapability package,
those errors are never reported).

Once we start using a library with the fix [1], that bug will become
apparent (both bats-based and libct/int tests will fail).

[1]: https://github.com/kolyshkin/capability/pull/3

Fixes: 98fe566c ("runc: do not set inheritable capabilities")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-25 21:48:29 -07:00
Kir Kolyshkin
2c398bb41d libct/int/seccomp_test: simplify exit code checks
In all the three cases, we check that the program returned non-zero exit
code. This can be done in a much simpler manner.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-08-10 14:46:43 -07:00
Aleksa Sarai
81d63f265a merge #4331 into opencontainers/runc:main
Sebastiaan van Stijn (1):
  libct/userns: split userns detection from internal userns code

LGTMs: kolyshkin cyphar
2024-07-03 14:24:05 +10:00
Sebastiaan van Stijn
30b530ca94 libct/userns: split userns detection from internal userns code
Commit 4316df8b53 isolated RunningInUserNS
to a separate package to make it easier to consume without bringing in
additional dependencies, and with the potential to move it separate in
a similar fashion as libcontainer/user was moved to a separate module
in commit ca32014adb. While RunningInUserNS
is fairly trivial to implement, it (or variants of this utility) is used
in many codebases, and moving to a separate module could consolidate
those implementations, as well as making it easier to consume without
large dependency trees (when being a package as part of a larger code
base).

Commit 1912d5988b and follow-ups introduced
cgo code into the userns package, and code introduced in those commits
are not intended for external use, therefore complicating the potential
of moving the userns package separate.

This commit moves the new code to a separate package; some of this code
was included in v1.1.11 and up, but I could not find external consumers
of `GetUserNamespaceMappings` and `IsSameMapping`. The `Mapping` and
`Handles` types (added in ba0b5e2698) only
exist in main and in non-stable releases (v1.2.0-rc.x), so don't need
an alias / deprecation.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-30 20:06:30 +02:00
Sebastiaan van Stijn
c14213399a remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go fix -mod=readonly ./...

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-29 15:45:25 +02:00
Kir Kolyshkin
42cea2ecb4 libct: don't allow to start second init process
By definition, every container has only 1 init (i.e. PID 1) process.

Apparently, libcontainer API supported running more than 1 init, and
at least one tests mistakenly used it.

Let's not allow that, erroring out if we already have init. Doing
otherwise _probably_ results in some confusion inside the library.

Fix two cases in libct/int which ran two inits inside a container.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-06-10 22:30:03 -07:00
Kir Kolyshkin
36be6d0510 libct/int: checkpoint test: skip pre-dump if not avail
Since we're now testing on ARM, the test case fails when trying to do
pre-dump since MemTrack is not available.

Skip the pre-dump part if so.

This also reverts part of commit 3f4a73d6 as it is no longer needed
(now, instead of skipping the whole test, we're just skipping the
pre-dump).

[Review with --ignore-all-space]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-05-10 11:04:58 -07:00
Kir Kolyshkin
e42d981d67 libct/int: rm double logging in checkpoint_test
Since commit c77aaa3f the tail of criu log is printed by runc, so
there's no need to do the same thing in tests.

This also fixes a test failure on ARM where showLog fails (because
there's no log file) and thus the conditional t.Skip is not called.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-05-09 15:18:29 -07:00
Kir Kolyshkin
62a314656a libct/int/cpt: simplify test pre-check
criu check --feature userns also tests for the /proc/self/ns/user
presense, so remove the redundant check, and simplify the error message.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-05-09 15:18:29 -07:00
lifubang
4ea0bf88fd update/add some tests for rlimit
issues:
https://github.com/opencontainers/runc/issues/4195
https://github.com/opencontainers/runc/pull/4265#discussion_r1588599809

Signed-off-by: lifubang <lifubang@acmcoder.com>
2024-05-08 10:57:10 +00:00
Aleksa Sarai
8e1cd2f56d init: verify after chdir that cwd is inside the container
If a file descriptor of a directory in the host's mount namespace is
leaked to runc init, a malicious config.json could use /proc/self/fd/...
as a working directory to allow for host filesystem access after the
container runs. This can also be exploited by a container process if it
knows that an administrator will use "runc exec --cwd" and the target
--cwd (the attacker can change that cwd to be a symlink pointing to
/proc/self/fd/... and wait for the process to exec and then snoop on
/proc/$pid/cwd to get access to the host). The former issue can lead to
a critical vulnerability in Docker and Kubernetes, while the latter is a
container breakout.

We can (ab)use the fact that getcwd(2) on Linux detects this exact case,
and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we
just do os.Getwd() after chdir we can easily detect this case and error
out.

In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc
init", making this exploitable. On runc main it just so happens that the
leaked /sys/fs/cgroup gets clobbered and thus this is only consistently
exploitable for runc 1.1.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Co-developed-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
[refactored the implementation and added more comments]
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2024-01-24 00:20:58 +11:00
Akihiro Suda
3f4a73d632 TestCheckpoint: skip on ErrCriuMissingFeatures
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-12-19 18:47:03 +09:00
Aleksa Sarai
8e8b136c49 tree-wide: use /proc/thread-self for thread-local state
With the idmap work, we will have a tainted Go thread in our
thread-group that has a different mount namespace to the other threads.
It seems that (due to some bad luck) the Go scheduler tends to make this
thread the thread-group leader in our tests, which results in very
baffling failures where /proc/self/mountinfo produces gibberish results.

In order to avoid this, switch to using /proc/thread-self for everything
that is thread-local. This primarily includes switching all file
descriptor paths (CLONE_FS), all of the places that check the current
cgroup (technically we never will run a single runc thread in a separate
cgroup, but better to be safe than sorry), and the aforementioned
mountinfo code. We don't need to do anything for the following because
the results we need aren't thread-local:

 * Checks that certain namespaces are supported by stat(2)ing
   /proc/self/ns/...

 * /proc/self/exe and /proc/self/cmdline are not thread-local.

 * While threads can be in different cgroups, we do not do this for the
   runc binary (or libcontainer) and thus we do not need to switch to
   the thread-local version of /proc/self/cgroups.

 * All of the CLONE_NEWUSER files are not thread-local because you
   cannot set the usernamespace of a single thread (setns(CLONE_NEWUSER)
   is blocked for multi-threaded programs).

Note that we have to use runtime.LockOSThread when we have an open
handle to a tid-specific procfs file that we are operating on multiple
times. Go can reschedule us such that we are running on a different
thread and then kill the original thread (causing -ENOENT or similarly
confusing errors). This is not strictly necessary for most usages of
/proc/thread-self (such as using /proc/thread-self/fd/$n directly) since
only operating on the actual inodes associated with the tid requires
this locking, but because of the pre-3.17 fallback for CentOS, we have
to do this in most cases.

In addition, CentOS's kernel is too old for /proc/thread-self, which
requires us to emulate it -- however in rootfs_linux.go, we are in the
container pid namespace but /proc is the host's procfs. This leads to
the incredibly frustrating situation where there is no way (on pre-4.1
Linux) to figure out which /proc/self/task/... entry refers to the
current tid. We can just use /proc/self in this case.

Yes this is all pretty ugly. I also wish it wasn't necessary.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:41 +11:00
Aleksa Sarai
09822c3da8 configs: disallow ambiguous userns and timens configurations
For userns and timens, the mappings (and offsets, respectively) cannot
be changed after the namespace is first configured. Thus, configuring a
container with a namespace path to join means that you cannot also
provide configuration for said namespace. Previously we would silently
ignore the configuration (and just join the provided path), but we
really should be returning an error (especially when you consider that
the configuration userns mappings are used quite a bit in runc with the
assumption that they are the correct mapping for the userns -- but in
this case they are not).

In the case of userns, the mappings are also required if you _do not_
specify a path, while in the case of the time namespace you can have a
container with a timens but no mappings specified.

It should be noted that the case checking that the user has not
specified a userns path and a userns mapping needs to be handled in
specconv (as opposed to the configuration validator) because with this
patchset we now cache the mappings of path-based userns configurations
and thus the validator can't be sure whether the mapping is a cached
mapping or a user-specified one. So we do the validation in specconv,
and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-05 17:46:09 +11:00
Aleksa Sarai
f81ef1493d libcontainer: sync: cleanup synchronisation code
This includes quite a few cleanups and improvements to the way we do
synchronisation. The core behaviour is unchanged, but switching to
embedding json.RawMessage into the synchronisation structure will allow
us to do more complicated synchronisation operations in future patches.

The file descriptor passing through the synchronisation system feature
will be used as part of the idmapped-mount and bind-mount-source
features when switching that code to use the new mount API outside of
nsexec.c.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-08-15 19:54:24 -07:00
Kir Kolyshkin
883aef789b libct/init: unify init, fix its error logic
This commit does two things:

1. Consolidate StartInitialization calling logic into Init().
2. Fix init error handling logic.

The main issues at hand are:
- the "unable to convert _LIBCONTAINER_INITPIPE" error from
  StartInitialization is never shown;
- errors from WriteSync and WriteJSON are never shown;
- the StartInit calling code is triplicated;
- using panic is questionable.

Generally, our goals are:
 - if there's any error, do our best to show it;
 - but only show each error once;
 - simplify the code, unify init implementations.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-08-04 13:00:35 -07:00
Francis Laniel
c47f58c4e9 Capitalize [UG]idMappings as [UG]IDMappings
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
2023-07-21 13:55:34 +02:00
Kir Kolyshkin
f8ad20f500 runc kill: drop -a option
As of previous commit, this is implied in a particular scenario. In
fact, this is the one and only scenario that justifies the use of -a.

Drop the option from the documentation. For backward compatibility, do
recognize it, and retain the feature of ignoring the "container is
stopped" error when set.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-06-08 09:30:40 -07:00
Kir Kolyshkin
9583b3d1c2 libct: move killing logic to container.Signal
By default, the container has its own PID namespace, and killing (with
SIGKILL) its init process from the parent PID namespace also kills all
the other processes.

Obviously, it does not work that way when the container is sharing its
PID namespace with the host or another container, since init is no
longer special (it's not PID 1). In this case, killing container's init
will result in a bunch of other processes left running (and thus the
inability to remove the cgroup).

The solution to the above problem is killing all the container
processes, not just init.

The problem with the current implementation is, the killing logic is
implemented in libcontainer's initProcess.wait, and thus only available
to libcontainer users, but not the runc kill command (which uses
nonChildProcess.kill and does not use wait at all). So, some workarounds
exist:
 - func destroy(c *Container) calls signalAllProcesses;
 - runc kill implements -a flag.

This code became very tangled over time. Let's simplify things by moving
the killing all processes from initProcess.wait to container.Signal,
and documents the new behavior.

In essence, this also makes `runc kill` to automatically kill all container
processes when the container does not have its own PID namespace.
Document that as well.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-06-08 09:29:25 -07:00
Kir Kolyshkin
2a7dcbbb40 libct: fix shared pidns detection
When someone is using libcontainer to start and kill containers from a
long lived process (i.e. the same process creates and removes the
container), initProcess.wait method is used, which has a kludge to work
around killing containers that do not have their own PID namespace.

The code that checks for own PID namespace is not entirely correct.
To be exact, it does not set sharePidns flag when the host/caller PID
namespace is implicitly used. As a result, the above mentioned kludge
does not work.

Fix the issue, add a test case (which fails without the fix).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-06-08 09:23:29 -07:00
Kir Kolyshkin
5b8f8712a4 libct: signalAllProcesses: remove child reaping
There are two very distinct usage scenarios for signalAllProcesses:

* when used from the runc binary ("runc kill" command), the processes
  that it kills are not the children of "runc kill", and so calling
  wait(2) on each process is totally useless, as it will return ECHLD;

* when used from a program that have created the container (such as
  libcontainer/integration test suite), that program can and should call
  wait(2), not the signalling code.

So, the child reaping code is totally useless in the first case, and
should be implemented by the program using libcontainer in the second
case. I was not able to track down how this code was added, my best
guess is it happened when this code was part of dockerd, which did not
have a proper child reaper implemented at that time.

Remove it, and add a proper documentation piece.

Change the integration test accordingly.

PS the first attempt to disable the child reaping code in
signalAllProcesses was made in commit bb912eb00c, which used a
questionable heuristic to figure out whether wait(2) should be called.
This heuristic worked for a particular use case, but is not correct in
general.

While at it:
 - simplify signalAllProcesses to use unix.Kill;
 - document (container).Signal.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-06-08 09:23:29 -07:00
Kir Kolyshkin
7e481ee2eb libct/int: remove logger from init
Currently, TestInit sets up logrus, and init uses it to log an error
from StartInitialization(). This is solely used by TestExecInError
to check that error returned from StartInitialization is the one it
expects.

Note that the very same error is communicated to the runc init parent
and is ultimately returned by container.Run(), so checking what
StartInitialization returned is redundant.

Remove logrus setup and use from TestMain/init.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-05-17 12:46:27 -07:00
utam0k
d9230602e9 Implement to set a domainname
opencontainers/runtime-spec#1156

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-04-12 13:31:20 +00:00
Kir Kolyshkin
f2e71b085d libct/int: make TestFdLeaks more robust
The purpose of this test is to check that there are no extra file
descriptors left open after repeated calls to runContainer. In fact,
the first call to runContainer leaves a few file descriptors opened,
and this is by design.

Previously, this test relied on two things:
1. some other tests were run before it (and thus all such opened-once
   file descriptors are already opened);
2.  explicitly excluding fd opened to /sys/fs/cgroup.

Now, if we run this test separately, it will fail (because of 1 above).
The same may happen if the tests are run in a random order.

To fix this, add a container run before collection the initial fd list,
so those fds that are opened once are included and won't be reported.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-02-22 02:58:47 -08:00
Kir Kolyshkin
be7e03940f libct/int: wording nits
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-02-22 02:58:47 -08:00
Kir Kolyshkin
7c75e84e22 libc/int: add/use runContainerOk wrapper
This is to de-duplicate the code that checks that err is nil
and that the exit code is zero.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-02-22 02:58:47 -08:00
Kir Kolyshkin
4fd4af5b1c CI: workaround CentOS Stream 9 criu issue
Older criu builds fail to work properly on CentOS Stream 9 due to
changes in glibc's rseq.

Skip criu tests if an older criu version is found.

Fixes: https://github.com/opencontainers/runc/issues/3532

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-07-27 17:51:41 -07:00
Kir Kolyshkin
b6967fa84c Decouple cgroup devices handling
This commit separates the functionality of setting cgroup device
rules out of libct/cgroups to libct/cgroups/devices package. This
package, if imported, sets the function variables in libct/cgroups and
libct/cgroups/systemd, so that a cgroup manager can use those to manage
devices. If those function variables are nil (when libct/cgroups/devices
are not imported), a cgroup manager returns the ErrDevicesUnsupported
in case any device rules are set in Resources.

It also consolidates the code from libct/cgroups/ebpf and
libct/cgroups/ebpf/devicefilter into libct/cgroups/devices.

Moved some tests in libct/cg/sd that require device management to
libct/sd/devices.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-05-18 11:17:08 -07:00
Kir Kolyshkin
98fe566c52 runc: do not set inheritable capabilities
Do not set inheritable capabilities in runc spec, runc exec --cap,
and in libcontainer integration tests.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-05-12 08:14:50 +10:00
Kir Kolyshkin
102b8abd26 libct: rm BaseContainer and Container interfaces
The only implementation of these is linuxContainer. It does not make
sense to have an interface with a single implementation, and we do not
foresee other types of containers being added to runc.

Remove BaseContainer and Container interfaces, moving their methods
documentation to linuxContainer.

Rename linuxContainer to Container.

Adopt users from using interface to using struct.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-23 11:04:12 -07:00
Kir Kolyshkin
6a3fe1618f libcontainer: remove LinuxFactory
Since LinuxFactory has become the means to specify containers state
top directory (aka --root), and is only used by two methods (Create
and Load), it is easier to pass root to them directly.

Modify all the users and the docs accordingly.

While at it, fix Create and Load docs (those that were originally moved
from the Factory interface docs).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:44:31 -07:00
Kir Kolyshkin
8358a0ecbb libct: StartInitialization: decouple from factory
StartInitialization does not have to be a method of Factory (while
it is clear why it was done that way initially, now we only have
Linux containers so it does not make sense).

Fix callers and docs accordingly.

No change in functionality.

Also, since this was the only user of libcontainer.New with the empty
string as an argument, the corresponding check can now be removed
from it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-22 23:29:24 -07:00
Kir Kolyshkin
0fec1c2d8c libct: Mount: rm {Pre,Post}mountCmds
Those were added by commit 59c5c3ac0 back in Apr 2015, but AFAICS were
never used and are obsoleted by more generic container hooks (initially
added by commit 05567f2c94 in Sep 2015).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-01-26 15:51:55 -08:00
Kir Kolyshkin
953e56c56f libct/int: runContainer: drop console arg
It is not and was never ever used.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-11-29 20:10:22 -08:00
Kir Kolyshkin
972aea3af0 libct/configs/validate: allow / in sysctl names
Runtime spec says:

> sysctl (object, OPTIONAL) allows kernel parameters to be modified at
> runtime for the container. For more information, see the sysctl(8)
> man page.

and sysctl(8) says:

> variable
>    The name of a key to read from. An example is
>    kernel.ostype. The '/' separator is also accepted in place of a '.'.

Apparently, runc config validator do not support sysctls with / as a
separator. Fortunately this is a one-line fix.

Add some more test data where / is used as a separator.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-29 09:45:55 -07:00