To deepen my understanding of how containers really work, I experimented with running and configuring rootless containers directly using crun.
> [!NOTE]
> If the config sets terminal=true, crun expects a POSIX console socket passed via --console-socket so it knows where to send the terminal output.
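A minimal sketch of that case, assuming the spec keeps terminal=true and that a receiver such as runc's recvtty helper is available on the host (the socket path is arbitrary):

# Sketch only: recvtty and the socket path are assumptions, not part of the tasks.
# Any process that accepts the pseudo-terminal fd over the socket (SCM_RIGHTS) will do.
recvtty /tmp/console.sock &
crun run --console-socket /tmp/console.sock task1-alpine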
# Export the image's filesystem to a tarball
crane export docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 alpine-rootfs.tar
# Extract the filesystem into the rootfs directory
mkdir -p rootfs
tar -C rootfs -xf alpine-rootfs.tar
# Create the container config.json (for a rootless container)
crun spec --rootless
# Create the container
crun create <name/id>
# Create and run the container
crun run <name/id>
# Start the container
crun start <name/id>
# Kill the container
crun kill <name/id> [SIGNAL]
# Delete the container
crun delete <name/id>
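Putting these commands together, a minimal end-to-end run of the Hello-World task could look like the sketch below. It assumes jq is available and that the AppArmor adjustment described further down has already been applied; the directory layout and the echo command are my choices, not requirements.

mkdir -p task1-alpine/rootfs && cd task1-alpine
crane export docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 - | tar -C rootfs -xf -
crun spec --rootless                          # writes config.json next to rootfs/
# The generated spec usually sets terminal=true; flip it (or pass --console-socket, see the note above)
jq '.process.terminal = false | .process.args = ["echo", "Hello World"]' config.json > config.json.tmp \
  && mv config.json.tmp config.json
crun run task1-alpine                         # create + start in one step, prints "Hello World"
crun delete task1-alpine 2>/dev/null || true  # clean up leftover state, if any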
To run a rootless container in our setup, we first need to disable the AppArmor restriction on unprivileged user namespaces. In a production environment you would of course make a much narrower, more targeted adjustment instead of switching the restriction off globally.
# Check current AppArmor setting for unprivileged user namespaces
# 0=disabled, 1=enabled
cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
# disable it
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
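sysctl -w only changes the running kernel. If the lab machine should keep the setting across reboots, a drop-in file does the job (a sketch; the file name 99-userns.conf is arbitrary):

echo 'kernel.apparmor_restrict_unprivileged_userns=0' | sudo tee /etc/sysctl.d/99-userns.conf
sudo sysctl --system   # reload all sysctl configuration files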
#Linux prevents you from directly writing to `/sys/fs/cgroup/`
# because those files are owned by `root` and protected by the kernel’s cgroup delegation rules.
# Systemd can delegate a private cgroup subtree to your user session:
# Delegate=yes
systemd-run --user --scope -p Delegate=yes -- bash
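Inside that scope you can check which controllers systemd actually handed over (a sketch, assuming a pure cgroup v2 unified hierarchy):

# On cgroup v2, /proc/self/cgroup contains a single "0::<path>" line
cat "/sys/fs/cgroup$(cut -d: -f3- /proc/self/cgroup)/cgroup.controllers"
# Delegated controllers such as memory, pids (and usually cpu) should be listed here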
# To run the container in the background and keep it managed as a service, I start it as a transient systemd unit
systemd-run --user -p Delegate=yes --unit=my-envoy crun start task1-envoy
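The transient unit can then be inspected and stopped with the usual systemd user tooling (my-envoy is just the unit name chosen above):

systemctl --user status my-envoy    # unit state and its cgroup
journalctl --user -u my-envoy       # output captured from the unit
systemctl --user stop my-envoy      # stop the unit when done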
Hello World: Run a simple container using Alpine Linux with the following requirements:

- Name: task1-alpine
- Image: docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412

Container networking: Run an Envoy Proxy container with the following requirements:

- Name: task1-envoy
- Image: docker.io/envoyproxy/envoy:v1.36.2@sha256:4972515dd9a069b44beb43cebba7851596e72a8c61cd7a7c33d8f48efc5280ba
- Command: envoy -c /etc/envoy/envoy.yaml
- Environment: ENVOY_LOG_LEVEL=debug
- IP address: 172.20.0.200/24 (create a bridge interface named container0 with the IP address 172.20.0.1/24 on the host)

{
  "ociVersion": "1.0.0",
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"envoy",
"-c",
"/etc/envoy/envoy.yaml"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm",
"ENVOY_LOG_LEVEL=debug"
],
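Since the args above point Envoy at /etc/envoy/envoy.yaml, that file has to exist inside the extracted rootfs. The exported image usually ships a default config at that path; using your own is just a copy into the rootfs (sketch; my-envoy.yaml is a placeholder name):

# Sketch: drop a custom Envoy configuration into the container's root filesystem
cp my-envoy.yaml rootfs/etc/envoy/envoy.yaml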
There are two ways to handle this setup:

1. Configure config.json as needed and then run crun create task1-envoy. Crun automatically creates a network namespace for the container; afterwards we attach our veth interface to the container PID that crun reports.
   Advantage: the container’s lifecycle is directly linked to the namespace, so when the container is removed, the namespace disappears as well.
2. Manually create the network namespace, add all required interfaces and devices, and then reference this namespace in config.json (see the sketch after this list).
   Advantage: the entire network setup can be prepared before the container exists.
   Disadvantage: the lifecycles are not coupled, so if the container is removed the namespace remains (and vice versa).
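For completeness, option two would look roughly like this (a sketch; the namespace name envoy-net is an example, and the JSON fragment shows the standard OCI way of pointing a namespace entry at an existing path):

# Option two (not used here): pre-create a named network namespace
sudo ip netns add envoy-net
# config.json would then reference it under "linux"."namespaces", e.g.:
#   { "type": "network", "path": "/run/netns/envoy-net" }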
I will go with option one in this case. I already created the container, and crun reported PID 2209.
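In a script the PID could also be read from crun's state output instead of being hard-coded as below (a sketch, assuming jq is installed):

# crun state prints the OCI state JSON, which includes the container's init PID
PID_ENVOY=$(crun state task1-envoy | jq -r .pid)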
PID_ENVOY=2209
# Bridge container0
sudo ip link add name container0 type bridge
sudo ip addr add 172.20.0.1/24 dev container0
sudo ip link set container0 up
# veth pair
sudo ip link delete veth-host 2>/dev/null || true
sudo ip link add veth-host type veth peer name veth-envoy
sudo ip link set veth-host master container0
sudo ip link set veth-host up
sudo ip link set veth-envoy netns $PID_ENVOY
# Configure the interface inside the container's network namespace
sudo nsenter -t $PID_ENVOY -n ip addr add 172.20.0.200/24 dev veth-envoy
sudo nsenter -t $PID_ENVOY -n ip link set veth-envoy up
sudo nsenter -t $PID_ENVOY -n ip link set lo up
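A quick sanity check after wiring everything up (nothing here is required by the task, it just confirms the setup):

# From the host: the bridge should reach the container's address
ping -c 1 172.20.0.200
# Inside the container's network namespace: interface and address should be up
sudo nsenter -t $PID_ENVOY -n ip addr show veth-envoy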
> [!NOTE]
> veth (virtual Ethernet device): behaves like a direct cable between its two peers.
Task: Run a load generator container with the following requirements:
- Name: task1-cgroups
- Image: stress-ng.tar (provided in this repository)
- Command: stress-ng --cpu 4 --vm 2 --vm-bytes 2g --vm-keep --fork 4
- Resource limits: set via the cgroups path, not in config.json

> [!NOTE]
> Linux prevents you from directly writing to `/sys/fs/cgroup/` (and its subdirectories) because those files are owned by `root` and protected by the kernel’s cgroup delegation rules. Systemd, however, can delegate a private cgroup subtree to your user session using the command `systemd-run --user --scope -p Delegate=yes -- <cmd>`.
Create the container, set the limits via its cgroup files, then start it:
systemd-run --user --scope -p Delegate=yes -- crun create task1-cgroups
echo $((128*1024*1024)) > /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups/memory.max
echo "100000 100000" > /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups/cpu.max
crun start task1-cgroups
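Once stress-ng is running, the effect of the limits can be observed in the same cgroup directory (a sketch; the path simply mirrors the one used above):

CG=/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups
cat $CG/memory.max $CG/cpu.max   # the limits written above
cat $CG/memory.current           # current memory usage, capped at 128 MiB
cat $CG/memory.events            # the oom_kill counter increases when workers hit the limit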