To deepen my understanding of how containers really work, I experimented with running and configuring rootless containers directly using crun.
> [!NOTE]
> If the config sets terminal=true, crun expects a POSIX console socket passed via --console-socket so it knows where to send the terminal output.
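A minimal sketch of that case, assuming the spec keeps terminal=true and that a receiver such as runc's recvtty helper is available on the host (the socket path is arbitrary):

# Sketch only: recvtty and the socket path are assumptions, not part of the tasks.
# Any process that accepts the pseudo-terminal fd over the socket (SCM_RIGHTS) will do.
recvtty /tmp/console.sock &
crun run --console-socket /tmp/console.sock task1-alpine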
# Export the image's filesystem to a tarball
crane export docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 alpine-rootfs.tar
# Extract the filesystem into the rootfs directory
mkdir -p rootfs
tar -C rootfs -xf alpine-rootfs.tar
# Create the container config.json (for a rootless container)
crun spec --rootless
# Create the container
crun create <name/id>
# Create and run the container
crun run <name/id>
# Start the container
crun start <name/id>
# Kill the container
crun kill <name/id> [SIGNAL]
# Delete the container
crun delete <name/id>
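Putting these commands together, a minimal end-to-end run of the Hello-World task could look like the sketch below. It assumes jq is available and that the AppArmor adjustment described further down has already been applied; the directory layout and the echo command are my choices, not requirements.

mkdir -p task1-alpine/rootfs && cd task1-alpine
crane export docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 - | tar -C rootfs -xf -
crun spec --rootless                          # writes config.json next to rootfs/
# The generated spec usually sets terminal=true; flip it (or pass --console-socket, see the note above)
jq '.process.terminal = false | .process.args = ["echo", "Hello World"]' config.json > config.json.tmp \
  && mv config.json.tmp config.json
crun run task1-alpine                         # create + start in one step, prints "Hello World"
crun delete task1-alpine 2>/dev/null || true  # clean up leftover state, if any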
To run a rootless container in our setup, we first need to disable the AppArmor restriction on unprivileged user namespaces. In a production environment you would of course make a much narrower, more targeted adjustment instead of switching the restriction off globally.
# Check current AppArmor setting for unprivileged user namespaces
# 0=disabled, 1=enabled
cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
# disable it
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
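sysctl -w only changes the running kernel. If the lab machine should keep the setting across reboots, a drop-in file does the job (a sketch; the file name 99-userns.conf is arbitrary):

echo 'kernel.apparmor_restrict_unprivileged_userns=0' | sudo tee /etc/sysctl.d/99-userns.conf
sudo sysctl --system   # reload all sysctl configuration files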
#Linux prevents you from directly writing to `/sys/fs/cgroup/`
# because those files are owned by `root` and protected by the kernel’s cgroup delegation rules.
# Systemd can delegate a private cgroup subtree to your user session:
# Delegate=yes
systemd-run --user --scope -p Delegate=yes -- bash
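Inside that scope you can check which controllers systemd actually handed over (a sketch, assuming a pure cgroup v2 unified hierarchy):

# On cgroup v2, /proc/self/cgroup contains a single "0::<path>" line
cat "/sys/fs/cgroup$(cut -d: -f3- /proc/self/cgroup)/cgroup.controllers"
# Delegated controllers such as memory, pids (and usually cpu) should be listed here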
# To run the container in the background and keep it managed as a service, I start it as a transient systemd unit
systemd-run --user -p Delegate=yes --unit=my-envoy crun start task1-envoy
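The transient unit can then be inspected and stopped with the usual systemd user tooling (my-envoy is just the unit name chosen above):

systemctl --user status my-envoy    # unit state and its cgroup
journalctl --user -u my-envoy       # output captured from the unit
systemctl --user stop my-envoy      # stop the unit when done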
Hello World: Run a simple container using Alpine Linux with the following requirements:

- Name: task1-alpine
- Image: docker.io/library/alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412

Container networking: Run an Envoy Proxy container with the following requirements:

- Name: task1-envoy
- Image: docker.io/envoyproxy/envoy:v1.36.2@sha256:4972515dd9a069b44beb43cebba7851596e72a8c61cd7a7c33d8f48efc5280ba
- Command: envoy -c /etc/envoy/envoy.yaml
- Environment: ENVOY_LOG_LEVEL=debug
- IP address: 172.20.0.200/24 (create a bridge interface named container0 with the IP address 172.20.0.1/24 on the host)

{
  "ociVersion": "1.0.0",
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"envoy",
"-c",
"/etc/envoy/envoy.yaml"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm",
"ENVOY_LOG_LEVEL=debug"
],
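Since the args above point Envoy at /etc/envoy/envoy.yaml, that file has to exist inside the extracted rootfs. The exported image usually ships a default config at that path; using your own is just a copy into the rootfs (sketch; my-envoy.yaml is a placeholder name):

# Sketch: drop a custom Envoy configuration into the container's root filesystem
cp my-envoy.yaml rootfs/etc/envoy/envoy.yaml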
There are two ways to handle this setup:

1. Configure config.json as needed and then run crun create task1-envoy. Crun automatically creates a network namespace for the container; afterwards we attach our veth interface to the container PID that crun reports.
   Advantage: the container’s lifecycle is directly linked to the namespace, so when the container is removed, the namespace disappears as well.
2. Manually create the network namespace, add all required interfaces and devices, and then reference this namespace in config.json (see the sketch after this list).
   Advantage: the entire network setup can be prepared before the container exists.
   Disadvantage: the lifecycles are not coupled, so if the container is removed the namespace remains (and vice versa).
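For completeness, option two would look roughly like this (a sketch; the namespace name envoy-net is an example, and the JSON fragment shows the standard OCI way of pointing a namespace entry at an existing path):

# Option two (not used here): pre-create a named network namespace
sudo ip netns add envoy-net
# config.json would then reference it under "linux"."namespaces", e.g.:
#   { "type": "network", "path": "/run/netns/envoy-net" }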
I will go with option one in this case. I already created the container, and crun reported PID 2209.
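In a script the PID could also be read from crun's state output instead of being hard-coded as below (a sketch, assuming jq is installed):

# crun state prints the OCI state JSON, which includes the container's init PID
PID_ENVOY=$(crun state task1-envoy | jq -r .pid)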
PID_ENVOY=2209
# Bridge container0
sudo ip link add name container0 type bridge
sudo ip addr add 172.20.0.1/24 dev container0
sudo ip link set container0 up
# veth pair
sudo ip link delete veth-host 2>/dev/null || true
sudo ip link add veth-host type veth peer name veth-envoy
sudo ip link set veth-host master container0
sudo ip link set veth-host up
sudo ip link set veth-envoy netns $PID_ENVOY
# Configure the interface inside the container's network namespace
sudo nsenter -t $PID_ENVOY -n ip addr add 172.20.0.200/24 dev veth-envoy
sudo nsenter -t $PID_ENVOY -n ip link set veth-envoy up
sudo nsenter -t $PID_ENVOY -n ip link set lo up
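A quick sanity check after wiring everything up (nothing here is required by the task, it just confirms the setup):

# From the host: the bridge should reach the container's address
ping -c 1 172.20.0.200
# Inside the container's network namespace: interface and address should be up
sudo nsenter -t $PID_ENVOY -n ip addr show veth-envoy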
> [!NOTE]
> veth (virtual Ethernet device): behaves like a direct cable between its two peers.
Task: Run a load generator container with the following requirements:
- Name: task1-cgroups
- Image: stress-ng.tar (provided in this repository)
- Command: stress-ng --cpu 4 --vm 2 --vm-bytes 2g --vm-keep --fork 4
- Resource limits: set via the cgroups path, not in config.json

> [!NOTE]
> Linux prevents you from directly writing to `/sys/fs/cgroup/` (and its subdirectories) because those files are owned by `root` and protected by the kernel’s cgroup delegation rules. Systemd, however, can delegate a private cgroup subtree to your user session using the command `systemd-run --user --scope -p Delegate=yes -- <cmd>`.
Create the container, set the limits via its cgroup files, then start it:
systemd-run --user --scope -p Delegate=yes -- crun create task1-cgroups
echo $((128*1024*1024)) > /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups/memory.max
echo "100000 100000" > /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups/cpu.max
crun start task1-cgroups
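Once stress-ng is running, the effect of the limits can be observed in the same cgroup directory (a sketch; the path simply mirrors the one used above):

CG=/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/task1-cgroups
cat $CG/memory.max $CG/cpu.max   # the limits written above
cat $CG/memory.current           # current memory usage, capped at 128 MiB
cat $CG/memory.events            # the oom_kill counter increases when workers hit the limit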