mirror of https://github.com/moby/buildkit.git synced 2025-04-18 18:04:03 +03:00

rootless: update docs and examples

Fix issue 5763

- Discourage `--oci-worker-no-process-sandbox`, due to the leakage of
  the processes (by design).
  Instead, encourage setting `systempaths=unconfined` in `docker run`.
  This corresponds to `securityContext.procMount: Unmasked` in Kubernetes,
  however, the configuration is hard on Kubernetes, as it has to be used
  in conjunction with `hostUsers: false`.

- Remove `--device /dev/fuse`, as fuse-overlayfs is no longer used typically.

- Use the new Kubernetes struct for AppArmor

- Add a hint about `kernel.apparmor_restrict_unprivileged_userns`

- Remove `$` from command snippets for ease of copypasting

- Make `job.*.yaml` more practical

- Add `*.userns.yaml`. Needs `UserNamespacesSupport` feature gate to be enabled.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Akihiro Suda 2025-02-21 14:50:24 +09:00
parent 18db8b3e29
commit 3a91b50be1
No known key found for this signature in database
GPG Key ID: 49524C6F9F638F1A
11 changed files with 318 additions and 75 deletions


@@ -12,18 +12,19 @@ Rootless mode allows running BuildKit daemon as a non-root user.
[RootlessKit](https://github.com/rootless-containers/rootlesskit/) needs to be installed.
```console
$ rootlesskit buildkitd
```bash
rootlesskit buildkitd
```
```console
$ buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```bash
buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```
To isolate BuildKit daemon's network namespace from the host (recommended):
```console
$ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
```
> [!TIP]
> To isolate BuildKit daemon's network namespace from the host (recommended):
> ```bash
> rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
> ```
## Running BuildKit in Rootless mode (containerd worker)
@@ -31,15 +32,28 @@ $ rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
Run containerd in rootless mode using rootlesskit following [containerd's document](https://github.com/containerd/containerd/blob/main/docs/rootless.md).
```
$ containerd-rootless.sh
```bash
containerd-rootless.sh
CONTAINERD_NAMESPACE=default containerd-rootless-setuptool.sh install-buildkit-containerd
```
Then let buildkitd join the same namespace as containerd.
<details>
<summary>Advanced guide</summary>
<p>
Alternatively, you can specify the full command line flags as follows:
```bash
containerd-rootless.sh --config /path/to/config.toml
containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true
```
$ containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true --containerd-worker-snapshotter=native
```
</p>
</details>
## Containerized deployment
@@ -48,36 +62,45 @@ See [`../examples/kubernetes`](../examples/kubernetes).
### Docker
```console
$ docker run \
```bash
docker run \
--name buildkitd \
-d \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--device /dev/fuse \
moby/buildkit:rootless --oci-worker-no-process-sandbox
$ buildctl --addr docker-container://buildkitd build ...
--security-opt systempaths=unconfined \
moby/buildkit:rootless
buildctl --addr docker-container://buildkitd build ...
```
If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
> [!TIP]
> If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
>
> ```bash
> docker run --name buildkitd -d --privileged moby/buildkit:rootless
> ```
```console
$ docker run --name buildkitd -d --privileged moby/buildkit:rootless
```
Justification of the `--security-opt` flags:
#### About `--device /dev/fuse`
Adding `--device /dev/fuse` to the `docker run` arguments is required only if you want to use `fuse-overlayfs` snapshotter.
* `seccomp=unconfined`: For allowing several syscalls such as `unshare` (used by runc) and `mount` (used by snapshotters, etc).
#### About `--oci-worker-no-process-sandbox`
* `apparmor=unconfined`: For allowing mounting filesystems, etc.
This flag is not needed when the host operating system does not use AppArmor.
By adding `--oci-worker-no-process-sandbox` to the `buildkitd` arguments, BuildKit can be executed in a container without adding `--privileged` to `docker run` arguments.
However, you still need to pass `--security-opt seccomp=unconfined --security-opt apparmor=unconfined` to `docker run`.
* `systempaths=unconfined`: For disabling the masks for the `/proc` mount in the container, so that each of `ExecOp`
(corresponds to a `RUN` instruction in Dockerfile) can have a dedicated `/proc` filesystem.
`systempaths=unconfined` potentially allows reading and writing dangerous kernel files from a container, but it is safe when you are running `buildkitd` as non-root.
Note that `--oci-worker-no-process-sandbox` allows build executor containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
To allow running rootless `buildkitd` without `--oci-worker-no-process-sandbox`, run `docker run` with `--security-opt systempaths=unconfined`. (For Kubernetes, set `securityContext.procMount` to `Unmasked`.)
The `--security-opt systempaths=unconfined` flag disables the masks for the `/proc` mount in the container and potentially allows reading and writing dangerous kernel files, but it is safe when you are running `buildkitd` as non-root.
> [!TIP]
> Instead of `--security-opt systempaths=unconfined`, `buildkitd` can be also executed with `--oci-worker-no-process-sandbox` (flag of `buildkitd`, not `docker`)
> to avoid creating a new PID namespace and mounting a new `/proc` for it.
>
> Using `--oci-worker-no-process-sandbox` is discouraged, as it cannot terminate processes that did not exit during an `ExecOp`.
> Also, `--oci-worker-no-process-sandbox` allows `ExecOp` containers to `kill` (and potentially `ptrace` depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
>
> Despite these caveats, the [Kubernetes examples](../examples/kubernetes) use `--oci-worker-no-process-sandbox`, as Kubernetes lacks the equivalent of `systempaths=unconfined`.
> (`securityContext.procMount=Unmasked` is similar, but different in the sense that it depends on `hostUsers: false`)
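
The relationship between `procMount` and `hostUsers` mentioned above can be sketched as a Pod manifest. This is a hypothetical, untested fragment and not one of the manifests in `examples/kubernetes`: the Pod name is a placeholder, and it assumes a cluster where user namespaces are available (and, depending on the Kubernetes version, where the `ProcMountType` feature gate is enabled).

```yaml
# Hypothetical sketch: the Kubernetes counterpart of `systempaths=unconfined`.
apiVersion: v1
kind: Pod
metadata:
  name: buildkitd-procmount   # placeholder name
spec:
  # procMount: Unmasked has to be used in conjunction with a user namespace
  hostUsers: false
  containers:
    - name: buildkitd
      image: moby/buildkit:rootless
      securityContext:
        # Disables the masks on /proc, like systempaths=unconfined in `docker run`
        procMount: Unmasked
        seccompProfile:
          type: Unconfined
        appArmorProfile:
          type: Unconfined
        runAsUser: 1000
        runAsGroup: 1000
```

Whether this combination works depends on the cluster configuration, which is why the shipped examples keep using `--oci-worker-no-process-sandbox`.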
### Change UID/GID
@@ -90,7 +113,7 @@ Actual ID (shown in the host and the BuildKit daemon container)| Mapped ID (show
... | ...
165535 | 65536
```
```console
$ docker exec buildkitd id
uid=1000(user) gid=1000(user)
$ docker exec buildkitd ps aux
@@ -99,15 +122,16 @@ PID USER TIME COMMAND
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
29 user 0:00 ps aux
$ docker exec buildkitd cat /etc/subuid
user:100000:65536
```
To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
```
$ vi Dockerfile
$ make images
$ docker run ... moby/buildkit:local-rootless ...
```bash
vi Dockerfile
make images
docker run ... moby/buildkit:local-rootless ...
```
## Troubleshooting
@@ -120,7 +144,9 @@ $ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
```
### Error related to `fuse-overlayfs`
Try running `buildkitd` with `--oci-worker-snapshotter=native`:
Run `docker run` with `--device /dev/fuse`.
Also try running `buildkitd` with `--oci-worker-snapshotter=native`:
```console
$ rootlesskit buildkitd --oci-worker-snapshotter=native
@@ -137,12 +163,19 @@ Run `sysctl -w user.max_user_namespaces=N` (N=positive integer, like 63359) on t
See [`../examples/kubernetes/sysctl-userns.privileged.yaml`](../examples/kubernetes/sysctl-userns.privileged.yaml).
### Error `fork/exec /proc/self/exe: permission denied` with `This error might have happened because /proc/sys/kernel/apparmor_restrict_unprivileged_userns is set to 1`
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or a file under `/etc/sysctl.d`) and run `sudo sysctl --system`.
### Error `mount proc:/proc (via /proc/self/fd/6), flags: 0xe: operation not permitted`
This error is known to happen when BuildKit is executed in a container without the `--oci-worker-no-sandbox` flag.
Make sure that `--oci-worker-no-process-sandbox` is specified (See [below](#docker)).
This error is known to happen when BuildKit is executed in a container without the `--security-opt systempaths=unconfined` flag.
Make sure to specify it (See [above](#docker)).
## Distribution-specific hint
Using Ubuntu kernel is recommended.
### Ubuntu, 24.04 or later
Add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or a file under `/etc/sysctl.d`) and run `sudo sysctl --system`.
### Container-Optimized OS from Google
Make sure to have an `emptyDir` volume below:
```yaml


@@ -6,16 +6,26 @@ This directory contains Kubernetes manifests for `Pod`, `Deployment` (with `Serv
* `StatefulSet`: good for client-side load balancing, without registry-side cache
* `Job`: good if you don't want to have daemon pods
Using Rootless mode (`*.rootless.yaml`) is recommended because Rootless mode image is executed as non-root user (UID 1000) and doesn't need `securityContext.privileged`.
See [`../../docs/rootless.md`](../../docs/rootless.md).
## Variants
See also ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
- `*.privileged.yaml`: Launches the Pod as the fully privileged root user.
- `*.rootless.yaml`: Launches the Pod as a non-root user, whose UID is 1000.
- `*.userns.yaml`: Launches the Pod as a non-root user. The UID is determined by kubelet.
Needs kubelet and kube-apiserver to be reconfigured to enable the
[`UserNamespacesSupport`](https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/) feature gate.
It is recommended to use `*.rootless.yaml` to minimize the chance of container breakout attacks.
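For the `*.userns.yaml` variant, the feature gate has to be switched on for both kubelet and kube-apiserver on clusters where it is not already enabled by default. A minimal sketch of the kubelet side, assuming the kubelet is configured through a `KubeletConfiguration` file (where that file lives and how it is rolled out depends on your distribution):

```yaml
# Sketch only: merge into the existing kubelet configuration file.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  UserNamespacesSupport: true
```

On the kube-apiserver side, the corresponding command-line flag is `--feature-gates=UserNamespacesSupport=true`. User namespaces also depend on support from the container runtime and the node kernel, so check the Kubernetes documentation linked above before relying on this.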
See also:
- [`../../docs/rootless.md`](../../docs/rootless.md).
- ["Building Images Efficiently And Securely On Kubernetes With BuildKit" (KubeCon EU 2019)](https://kccnceu19.sched.com/event/MPX5).
## `Pod`
```console
$ kubectl apply -f pod.rootless.yaml
$ buildctl \
```bash
kubectl apply -f pod.rootless.yaml
buildctl \
--addr kube-pod://buildkitd \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
@@ -29,25 +39,27 @@ If rootless mode doesn't work, try `pod.privileged.yaml`.
Setting up mTLS is highly recommended.
`./create-certs.sh SAN [SAN...]` can be used for creating certificates.
```console
$ ./create-certs.sh 127.0.0.1
```bash
./create-certs.sh 127.0.0.1
```
The daemon certificates are created as a `Secret` manifest named `buildkit-daemon-certs`.
```console
$ kubectl apply -f .certs/buildkit-daemon-certs.yaml
```bash
kubectl apply -f .certs/buildkit-daemon-certs.yaml
```
Apply the `Deployment` and `Service` manifest:
```console
$ kubectl apply -f deployment+service.rootless.yaml
$ kubectl scale --replicas=10 deployment/buildkitd
```bash
kubectl apply -f deployment+service.rootless.yaml
kubectl scale --replicas=10 deployment/buildkitd
```
Run `buildctl` with TLS client certificates:
```console
$ kubectl port-forward service/buildkitd 1234
$ buildctl \
```bash
kubectl port-forward service/buildkitd 1234
buildctl \
--addr tcp://127.0.0.1:1234 \
--tlscacert .certs/client/ca.pem \
--tlscert .certs/client/cert.pem \
@@ -58,10 +70,10 @@ $ buildctl \
## `StatefulSet`
`StatefulSet` is useful for consistent hash mode.
```console
$ kubectl apply -f statefulset.rootless.yaml
$ kubectl scale --replicas=10 statefulset/buildkitd
$ buildctl \
```bash
kubectl apply -f statefulset.rootless.yaml
kubectl scale --replicas=10 statefulset/buildkitd
buildctl \
--addr kube-pod://buildkitd-4 \
build --frontend dockerfile.v0 --local context=/path/to/dir --local dockerfile=/path/to/dir
```
@@ -70,8 +82,8 @@ See [`./consistenthash`](./consistenthash) for how to use consistent hashing.
## `Job`
```console
$ kubectl apply -f job.rootless.yaml
```bash
kubectl apply -f job.rootless.yaml
```
To push the image to the registry, you also need to mount `~/.docker/config.json`


@@ -13,8 +13,6 @@ spec:
metadata:
labels:
app: buildkitd
annotations:
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
containers:
@@ -54,6 +52,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000


@@ -0,0 +1,77 @@
# Depends on feature gate UserNamespacesSupport
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: buildkitd
name: buildkitd
spec:
replicas: 1
selector:
matchLabels:
app: buildkitd
template:
metadata:
labels:
app: buildkitd
spec:
hostUsers: false
containers:
- name: buildkitd
image: moby/buildkit:master
args:
- --addr
- unix:///run/buildkit/buildkitd.sock
- --addr
- tcp://0.0.0.0:1234
- --tlscacert
- /certs/ca.pem
- --tlscert
- /certs/cert.pem
- --tlskey
- /certs/key.pem
# the probe below will only work after Release v0.6.3
readinessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
# the probe below will only work after Release v0.6.3
livenessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
securityContext:
# Not really privileged
privileged: true
ports:
- containerPort: 1234
volumeMounts:
- name: certs
readOnly: true
mountPath: /certs
volumes:
# buildkit-daemon-certs must contain ca.pem, cert.pem, and key.pem
- name: certs
secret:
secretName: buildkit-daemon-certs
---
apiVersion: v1
kind: Service
metadata:
labels:
app: buildkitd
name: buildkitd
spec:
ports:
- port: 1234
protocol: TCP
selector:
app: buildkitd


@@ -8,11 +8,11 @@ spec:
restartPolicy: Never
initContainers:
- name: prepare
image: alpine:3.10
image: busybox
command:
- sh
- -c
- "echo FROM hello-world > /workspace/Dockerfile"
- "echo -e 'FROM alpine\nRUN apk add gcc\n' > /workspace/Dockerfile"
volumeMounts:
- name: workspace
mountPath: /workspace


@@ -4,19 +4,16 @@ metadata:
name: buildkit
spec:
template:
metadata:
annotations:
container.apparmor.security.beta.kubernetes.io/buildkit: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
restartPolicy: Never
initContainers:
- name: prepare
image: alpine:3.10
image: busybox
command:
- sh
- -c
- "echo FROM hello-world > /workspace/Dockerfile"
- "echo -e 'FROM alpine\nRUN apk add gcc\n' > /workspace/Dockerfile"
securityContext:
runAsUser: 1000
runAsGroup: 1000
@@ -45,6 +42,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000


@@ -0,0 +1,47 @@
# Depends on feature gate UserNamespacesSupport
apiVersion: batch/v1
kind: Job
metadata:
name: buildkit
spec:
template:
spec:
hostUsers: false
restartPolicy: Never
initContainers:
- name: prepare
image: busybox
command:
- sh
- -c
- "echo -e 'FROM alpine\nRUN apk add gcc\n' > /workspace/Dockerfile"
volumeMounts:
- name: workspace
mountPath: /workspace
containers:
- name: buildkit
image: moby/buildkit:master
command:
- buildctl-daemonless.sh
args:
- build
- --frontend
- dockerfile.v0
- --local
- context=/workspace
- --local
- dockerfile=/workspace
# To push the image to a registry, add
# `--output type=image,name=docker.io/username/image,push=true`
securityContext:
# Not really privileged
privileged: true
volumeMounts:
- name: workspace
readOnly: true
mountPath: /workspace
# To push the image, you also need to create `~/.docker/config.json` secret
# and set $DOCKER_CONFIG to `/path/to/.docker` directory.
volumes:
- name: workspace
emptyDir: {}
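
The comments in the manifest above about pushing can be sketched as follows. This is a hypothetical fragment and not part of the commit: the secret name `docker-config`, the mount path `/docker-config`, and the image name are placeholders, and the secret is assumed to hold a single key named `config.json`. The `workspace` volume and the `securityContext` from the manifest above are omitted for brevity.

```yaml
# Hypothetical fragment of the Job's pod spec (spec.template.spec); all names are placeholders.
containers:
  - name: buildkit
    image: moby/buildkit:master
    command:
      - buildctl-daemonless.sh
    args:
      - build
      - --frontend
      - dockerfile.v0
      - --local
      - context=/workspace
      - --local
      - dockerfile=/workspace
      - --output
      - type=image,name=docker.io/username/image,push=true
    env:
      # Directory that contains config.json
      - name: DOCKER_CONFIG
        value: /docker-config
    volumeMounts:
      - name: docker-config
        readOnly: true
        mountPath: /docker-config
volumes:
  - name: docker-config
    secret:
      # Assumed to be created beforehand from an existing ~/.docker/config.json
      secretName: docker-config
```

Under these assumptions, the secret could be created with `kubectl create secret generic docker-config --from-file=config.json=$HOME/.docker/config.json` before applying the Job.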


@@ -2,8 +2,6 @@ apiVersion: v1
kind: Pod
metadata:
name: buildkitd
annotations:
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
containers:
@@ -31,6 +29,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000


@@ -0,0 +1,29 @@
# Depends on feature gate UserNamespacesSupport
apiVersion: v1
kind: Pod
metadata:
name: buildkitd
spec:
hostUsers: false
containers:
- name: buildkitd
image: moby/buildkit:master
readinessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
livenessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
securityContext:
# Not really privileged
privileged: true


@@ -15,8 +15,6 @@ spec:
metadata:
labels:
app: buildkitd
annotations:
container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
# see buildkit/docs/rootless.md for caveats of rootless mode
spec:
containers:
@@ -44,6 +42,9 @@ spec:
# Needs Kubernetes >= 1.19
seccompProfile:
type: Unconfined
# Needs Kubernetes >= 1.30
appArmorProfile:
type: Unconfined
# To change UID/GID, you need to rebuild the image
runAsUser: 1000
runAsGroup: 1000


@@ -0,0 +1,42 @@
# Depends on feature gate UserNamespacesSupport
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: buildkitd
name: buildkitd
spec:
serviceName: buildkitd
podManagementPolicy: Parallel
replicas: 1
selector:
matchLabels:
app: buildkitd
template:
metadata:
labels:
app: buildkitd
spec:
hostUsers: false
containers:
- name: buildkitd
image: moby/buildkit:master
readinessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
livenessProbe:
exec:
command:
- buildctl
- debug
- workers
initialDelaySeconds: 5
periodSeconds: 30
securityContext:
# Not really privileged
privileged: true