mirror of https://github.com/opencontainers/runtime-spec.git synced 2025-09-18 05:27:41 +03:00

216 Commits

Author SHA1 Message Date
Akihiro Suda
bfdffd548a Merge pull request #1282 from askervin/5aD-oci-mempolicy
Add support for Linux memory policy
2025-08-04 17:16:26 +09:00
Markus Lehtonen
34a39b9070 config-linux: add intelRdt.enableMonitoring (#1287)
Add a parameter for enabling per-container resctrl monitoring.

This supersedes and replaces the previous "enableCMT" and "enableMBM"
settings whose functionality was very vaguely specified. Separate
parameter for every monitoring metric does not seem to make much sense, in
particular because in the resctrl filesystem it is not possible to
selectively enable a subset of the monitoring features. You always get
all the metrics that the system provides. Also, with separate settings
(and corresponding check if the specific metric is available) the user
cannot specify "enable whatever is available" - setting everything to
"true" might fail because one of the metrics is not available on the
platform. In addition, having separate parameters is very
future-unproof, making support for new monitoring metrics unnecessarily
cumbersome to add. New metrics are certain to be added in new hardware
generations, e.g. perf/energy monitoring in the near future
(https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the
runtime-spec for each one of them feels like overkill without much
benefit. It is easier to have one switch for "enable container-specific
metrics" and let the user read whatever metrics the platform provides.

Moreover, it is not even possible to turn off monitoring (from the
resctrl filesystem). For example, you always get the metrics for all
CTRL_MON groups (closIDs). However, that is not always very useful as
there likely are a lot of applications packed in the same group. The new
intelRdt.enableMonitoring parameter will enable creation of a MON group
specific to a single container allowing monitoring of resctrl metrics on
per-container granularity.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-06-29 12:08:38 +09:00
Markus Lehtonen
d2f4f9097a config-linux: add schemata field to IntelRdt (#1230)
* config-linux: add schemata field to IntelRdt

Add a new "schemata" field to the Linux IntelRdt configuration. This
addresses the complexity of separate schema fields and resolves the
issue of supporting currently uncovered RDT features like L2 cache
allocation and CDP (Code and Data Prioritization).

The new field is for specifying the complete schemata (all schemas) to
be written to the schemata file in Linux resctrl fs. The aim is for
simple usage and runtime implementation (by not requiring any
parsing/filtering of data or otherwise re-implement parsing or
validation of the Linux resctrl interface) and also to support all RDT
features now and in the future (i.e. schemas like L2, L2CODE, L2DATA,
L3CODE and L3DATA and who knows L4 or something else in the future).

Behavior of existing fields is not changed but it is required that the
new schemata field is applied last.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>

* Add linux.intelRdt.schemata to features.md

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>

---------

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-05-09 21:00:57 +09:00
Antti Kervinen
57c949588e Add support for Linux memory policy
Enable setting a NUMA memory policy for the container. New
linux.memoryPolicy object contains inputs to the set_mempolicy(2)
syscall.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2025-04-23 10:32:29 +03:00
Antonio Ojea
e935f995dd Define Linux Network Devices (#1271)
The proposed "netdevices" field provides a declarative way to
specify which host network devices should be moved into a container's
network namespace.

This approach is similar to the existing "devices" field used for block
devices but uses a dictionary keyed by the interface name instead.

The proposed scheme is based on the existing representation of network
device by the `struct net_device`
https://docs.kernel.org/networking/netdevices.html.

This proposal focuses solely on moving existing network devices into
the container namespace. It does not cover the complexities of
network configuration or network interface creation, emphasizing the
separation of device management and network configuration.

Signed-off-by: Antonio Ojea <aojea@google.com>
2025-04-01 18:56:57 +09:00
Kir Kolyshkin
a5b01166ad Merge pull request #1273 from kershawmehta/zos
zos updates
2025-01-29 19:50:13 -08:00
Kershaw Mehta
1df9fa9f2b zos updates - add zos namespaces, remove zos devices
This PR proposes updates to the OCI runtime spec with
z/OS platform-specific details, including adding
namespaces, adding noNewPrivileges flag, and removing
devices. These changes are currently in use by the
IBM z/OS Container Platform (zOSCP) product - details
can be found here:
https://www.ibm.com/products/zos-container-platform.

Signed-off-by: Neil Johnson <najohnsn@us.ibm.com>
Signed-off-by: Kershaw Mehta <kershaw@us.ibm.com>
2025-01-16 14:27:04 -05:00
Akihiro Suda
d61dee6691 Merge pull request #1258 from kiashok/cpuAffinity-oci
Add support for windows CPU affinity
2025-01-07 03:05:19 +09:00
Kirtana Ashok
b9e8fdb005 Add support for windows CPU affinity
Signed-off-by: Kirtana Ashok <kiashok@microsoft.com>
2024-12-16 10:28:10 -08:00
utam0k
b37b687479 ci: Add a github actions workflow for lint
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-12-10 20:52:21 +09:00
Akihiro Suda
8cfc4074b2 specs-go: sync SCMP_ARCH_* constants with libseccomp main (#1229)
The following constants are defined in the main branch of libseccomp,
but not included in its latest release (v2.5) yet:

* SCMP_ARCH_LOONGARCH64  (seccomp/libseccomp@6966ec7)
* SCMP_ARCH_M68K         (seccomp/libseccomp@dd5c9c2)
* SCMP_ARCH_SH           (seccomp/libseccomp@c12945d)
* SCMP_ARCH_SHEB         (seccomp/libseccomp@c12945d)

These constant names are unlikely to change before v2.6 GA,
so we can safely refer to them in specs-go.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-12-09 20:36:42 +09:00
Sebastiaan van Stijn
9ceba9f40b update http links to https
Most of these either redirect (so changing saves an extra redirect),
or have a TLS version available.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-11-04 12:28:14 +01:00
Kir Kolyshkin
119ae426a1 Add CPU affinity to executed processes
This allows setting the initial and final CPU affinity for a process being
run in a container, which is needed to solve the issue described in [1].

[1] https://github.com/opencontainers/runc/issues/3922

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-06-11 16:47:33 -07:00
Kijima Daigo
d4aa6d8a2d chore: format JSON file make -C schema fmt
Signed-off-by: Kijima Daigo <norimaking777@gmail.com>
2024-06-10 22:13:53 +09:00
Akihiro Suda
5e98fec96d features: add potentiallyUnsafeConfigAnnotations
Fix issue 1202

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-10-22 22:43:23 +09:00
Rodrigo Campos
f329913c57 features-linux: Expose idmap information
High level container runtimes sometimes need to know if the OCI runtime
supports idmap mounts or not, as the OCI runtime silently ignores
unknown fields.

This means that if it doesn't support idmap mounts, a container with
userns will be started, without idmap mounts, and the files created on
the volumes will have a "garbage" owner/group. Furthermore, as the
userns mapping is not guaranteed to be stable over time, it will be
completely unusable.

Let's expose idmap support in the features subcommand, so high level
container runtimes use the feature safely.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-08-23 15:38:52 +02:00
Giuseppe Scrivano
d46c8b28bb schema: fix definition for ioPriority
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-05-22 14:53:06 +02:00
Akihiro Suda
8e0dce84f7 Merge pull request #1191 from utam0k/io-prio
Add I/O Priority Configuration for process group in Linux Containers
2023-05-22 20:13:49 +09:00
utam0k
504f70ef81 Add I/O Priority Configuration for Process Group in Linux Containers
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-05-19 10:24:58 +00:00
Giuseppe Scrivano
89478497a5 spec: add scheduler entity
extend the process struct to represent scheduling attributes for a
process based on the sched_setattr(2) syscall.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-04-18 12:23:51 +02:00
Chris Bandy
6152be404b schema: remove duplicate keys
commit b6980b01b0 introduced the issue.

Signed-off-by: Chris Bandy <bandy.chris@gmail.com>
2023-04-04 20:26:08 -05:00
Giuseppe Scrivano
b6980b01b0 schema: fix schema for timeOffsets
commit 36bb632767 introduced the issue.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-29 22:30:21 +02:00
Akihiro Suda
689874fc76 Add features.md to formalize the runc features JSON
Add `features.md` and `features-linux.md`, to formalize the `runc features` JSON that was introduced in runc v1.1.0.

A runtime caller MAY use this JSON to detect the features implemented by the runtime.

The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go
(opencontainers/runc PR 3296, opencontainers/runc PR 3310)

Differences since runc v1.1.0:
- Add `.linux.intelRdt.enabled` field
- Add `.linux.cgroup.rdma` field
- Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in runc PR 3588)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-22 04:04:57 +09:00
Austin Vazquez
c9b5d0e19a Remove references to deprecated io/ioutil package
Signed-off-by: Austin Vazquez <macedonv@amazon.com>
2023-03-16 15:33:06 +00:00
Qiang Huang
7301c34549 Merge pull request #1151 from KentaTada/add-time-namespac
Add support for time namespace
2023-02-01 11:38:51 +08:00
Kenta Tada
36bb632767 Add support for time namespace
The time namespace is a new kernel feature available in 5.6+ to
isolate the system monotonic and boot-time clocks.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2023-01-24 21:20:51 +09:00
Akihiro Suda
6188d9e9ef Merge pull request #1120 from kailun-qin/add-cfs-burst
config-linux: add CFS bandwidth burst
2023-01-23 20:05:01 +09:00
Kir Kolyshkin
494a5a6aca Merge pull request #1158 from kolyshkin/check-before-update
config-linux: add memory.checkBeforeUpdate
2022-09-09 13:48:39 -07:00
Alban Crequy
4bcd065f24 seccomp: Add flag SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
Linux 5.19 introduced a new seccomp flag:
SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV

It is useful for seccomp notify when handling notification from Golang
programs which are often preempted by the runtime with SIGURG.

Signed-off-by: Alban Crequy <albancrequy@microsoft.com>
2022-09-07 12:11:41 +02:00
Kailun Qin
d931d4b8ab config-linux: add CFS bandwidth burst
Burstable CFS controller is introduced in Linux 5.14. This helps with
parallel workloads that might be bursty. They can get throttled even
when their average utilization is under quota. And they may be latency
sensitive at the same time so that throttling them is undesired.

This feature borrows time now against the future underrun, at the cost
of increased interference against the other system users, by introducing
`cfs_burst_us` into CFS bandwidth control to enact the cap on unused
bandwidth accumulation, which will then be used additionally for burst.

The patch adds the support/control for CFS bandwidth burst.

Fixes https://github.com/opencontainers/runtime-spec/issues/1119

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
2022-09-02 09:40:53 -04:00
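
The throttling effect described in the commit above can be illustrated with a toy model. This is a simplified sketch, not the kernel's implementation: the function name `simulate_throttling`, the per-period demand list, and the rule that unserved demand is simply counted as throttled time are assumptions made for illustration. The model refills the budget by the quota each period and caps the carried-over budget at quota plus burst.

```python
def simulate_throttling(demands_us, quota_us, burst_us=0):
    """Return the CPU time (in microseconds) left unserved in each period.

    Hypothetical toy model: each period the budget is refilled by quota_us,
    and the total budget is capped at quota_us + burst_us, so at most
    burst_us of unused time carries over from earlier periods.
    """
    runtime_us = 0
    throttled = []
    for demand_us in demands_us:
        # Refill for the new period; cap accumulation at quota + burst.
        runtime_us = min(runtime_us + quota_us, quota_us + burst_us)
        used_us = min(demand_us, runtime_us)
        runtime_us -= used_us
        # Demand that exceeds the budget is what would be throttled.
        throttled.append(demand_us - used_us)
    return throttled


# A bursty workload whose average demand (about 67 ms) is below the 100 ms quota.
demands = [20_000, 20_000, 160_000]
without_burst = simulate_throttling(demands, quota_us=100_000)
with_burst = simulate_throttling(demands, quota_us=100_000, burst_us=100_000)
```

Under this model the third period is short by 60 ms when no burst is configured, and is fully served when a 100 ms burst lets the unused time from the first two periods carry over.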
Kir Kolyshkin
9e658bcd71 config-linux: add memory.checkBeforeUpdate
This setting can be used to mimic cgroup v1 behavior on cgroup v2,
when setting the new memory limit during update operation.

In cgroup v1, a limit which is lower than the current usage is rejected.

In cgroup v2, such a low limit causes an OOM kill.

Ref: https://github.com/opencontainers/runc/issues/3509

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-08-29 10:48:45 -07:00
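
The difference between the two cgroup behaviors described in the commit above can be sketched as a pre-update check. This is a minimal illustration, not how any particular runtime implements it: `validate_memory_limit`, `LimitBelowUsageError`, and the idea that the caller already knows the current usage are all hypothetical.

```python
class LimitBelowUsageError(ValueError):
    """Raised when a new memory limit is lower than the current usage."""


def validate_memory_limit(current_usage, new_limit, check_before_update):
    """Return new_limit if it may be applied, otherwise raise.

    Hypothetical helper: with check_before_update set, a limit below the
    current usage is rejected (the cgroup v1 behavior). Without it the
    limit is passed through, which on cgroup v2 may lead to an OOM kill.
    """
    if check_before_update and new_limit < current_usage:
        raise LimitBelowUsageError(
            f"new limit {new_limit} is below current usage {current_usage}"
        )
    return new_limit
```

A caller would run this check immediately before writing the new limit, accepting that usage can still change between the check and the write.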
Charlie Doern
744912b29a add domainname spec entity
add the domainname entity so that container runtimes can add special handling similar to hostname. The current workaround of adding a sysctl for kernel.domainname only works with rootful execution in most cases. This will allow for rootless execution.

container runtimes will be able to add special handling as they do for hostname, using setdomainname to add the entry to /proc/sys/kernel/domainname.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2022-08-08 10:19:42 -04:00
Tianon Gravi
72c1f0b44f Merge pull request #1143 from AlexeyPerevalov/IdMapMounts
IDMapping field for mount point
2022-06-01 09:40:19 -07:00
Alexey Perevalov
9d1130dc3b IDMapping field for mount point
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Co-authored-by: Giuseppe Scrivano <giuseppe@scrivano.org>
2022-05-26 17:03:17 +08:00
Giuseppe Scrivano
bc545ecf66 schema: add cpu idle
commit 9d363b36f6 added the feature but
didn't update the json schema file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2022-04-20 18:12:37 +02:00
Vincent Batts
2fde0ec207 Merge pull request #1084 from Iceber/schema-golang-1.16
schema: make with golang 1.16
2022-04-20 10:59:45 -04:00
Vincent Batts
ba3abe1642 Merge pull request #1083 from Iceber/schema-readme
schema: update README.md
2022-04-20 10:51:27 -04:00
Vincent Batts
0d6cc581ae Merge pull request #1076 from Creatone/creatone/mon-support
config-linux: Add Intel RDT CMT and MBM Linux support
2021-09-10 07:50:17 -04:00
flouthoc
f0ac327307 defs-zos: [Fix] prevent schema parsers from hitting recursion-loop while resolving types.
Signed-off-by: flouthoc <flouthoc.git@gmail.com>
2021-08-07 18:04:01 +05:30
Paweł Szulik
cc7f6ec598 config-linux: Add Intel RDT CMT and MBM Linux support
Add support for Intel Resource Director Technology (RDT) /
Cache Monitoring Technology (CMT) and Memory Bandwidth Monitoring (MBM).

Example:

"linux": {
    "intelRdt": {
        "enableCMT": true,
        "enableMBM": true
    }
}

This is the prerequisite of this runc proposal:
https://github.com/opencontainers/runc/issues/2519

For more information about Intel RDT CMT and MBM, please refer to:
https://github.com/opencontainers/runc/issues/2519

Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
2021-07-13 08:53:11 +02:00
Neil Johnson
c83b45e7d1 Introduce zos as platform.
Signed-off-by: Neil Johnson <najohnsn@us.ibm.com>
Signed-off-by: Steele Ray Desmond <steele.desmond@ibm.com>
2021-07-09 11:47:37 -04:00
Rodrigo Campos
0f84938403 schema/defs-linux: Fix inconsistencies with seccomp notify
Commit "Add Seccomp Notify support"
(58798e75e9) just added
SECCOMP_FILTER_FLAG_NEW_LISTENER to the schema and not to the list of
flags in config-linux.md. However, it was a mistake to add it to the
schema, as the user will never really need to specify that flag.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-03-18 17:29:02 +01:00
Rodrigo Campos
58798e75e9 Add Seccomp Notify support
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- https://github.com/opencontainers/runtime-spec/pull/1073#issuecomment-719465555

Docs presented at the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
2021-03-09 18:54:39 +01:00
Giuseppe Scrivano
f7ef278d1b seccomp: allow to override default errno return code
the specs already support overriding the errno code for the syscalls
but the default value is hardcoded to EPERM.

Add a new attribute to override the default value.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-02-22 16:47:57 +01:00
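
The fallback order implied by the commit above (per-syscall errno, then the overridable default, then EPERM) can be sketched as follows. The commit message does not name the new attribute; the field names used here (`defaultAction`, `defaultErrnoRet`, `syscalls`, `names`, `action`, `errnoRet`) are assumed to match the seccomp section of the runtime-spec, and `errno_for_syscall` is a hypothetical helper. The lookup is deliberately simplified: the first rule naming the syscall wins and argument conditions are ignored.

```python
import errno


def errno_for_syscall(seccomp, syscall_name):
    """Return the errno a syscall would fail with, or None if it is not
    handled by SCMP_ACT_ERRNO.

    Simplified sketch: first matching rule wins, argument filters ignored.
    """
    for rule in seccomp.get("syscalls", []):
        if syscall_name in rule.get("names", []):
            if rule.get("action") != "SCMP_ACT_ERRNO":
                return None
            # A per-rule errno overrides everything; EPERM if unspecified.
            return rule.get("errnoRet", errno.EPERM)
    if seccomp.get("defaultAction") != "SCMP_ACT_ERRNO":
        return None
    # No rule matched: use the overridable default, falling back to EPERM.
    return seccomp.get("defaultErrnoRet", errno.EPERM)


# Illustrative profile; the values are chosen only for this example.
profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": errno.ENOSYS,
    "syscalls": [
        {"names": ["mkdir"], "action": "SCMP_ACT_ERRNO", "errnoRet": errno.EACCES},
        {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
    ],
}
```

With this profile, `mkdir` resolves to `EACCES`, an unlisted syscall resolves to `ENOSYS`, and removing `defaultErrnoRet` makes the unlisted case fall back to `EPERM`.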
Iceber Gu
3f30167c3b schema: make with golang 1.16
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-02-08 21:13:12 +08:00
Iceber Gu
34a75447b4 schema: update README.md
Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>
2021-02-08 21:05:31 +08:00
Sascha Grunert
2fe047519c Add support for SCMP_ACT_KILL_THREAD
The seccomp action was added to libseccomp a while ago, so I guess
the runtime spec should support it as well:

b2f15f3d02

Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2020-08-25 15:37:51 +02:00
Mrunal Patel
f9c09b4ea1 Merge pull request #1040 from giuseppe/cgroup-v2
cgroup: add cgroup v2 support
2020-08-17 13:42:27 -07:00
Mrunal Patel
d438e29be5 Merge pull request #1059 from KentaTada/support-riscv64
Update seccomp architectures to support RISCV64
2020-08-06 14:57:16 -07:00
John Bartholomew
11bfea26e8 Fix int64 and uint64 type value ranges
Signed-off-by: John Bartholomew <jpa.bartholomew@gmail.com>
2020-07-30 23:59:21 +01:00