runc idea - part 2
This continues the container idea using runc started in part 1, where the idea was to turn container images into applications, or more likely services. Last time, the OCI image was turned into an OCI bundle that was runnable with runc. The problem was that the networking wasn’t configured to allow the host to talk to the service running in the container.
The starting point is to consider the Container Network Interface (CNI), which has the tag line of “networking for Linux containers”. Another idea is to check iptables or nftables (the latter being the intended replacement for the former). The question I would like answered is whether iptables or nftables can forward ports from the localhost interface within the container to the host.
Re-starting
When starting this one, I reran the commands from part 1 on a fresh install of Alpine Linux, except this time I was a regular user instead of root.
This proved to be problematic: while /opt/containers was owned by the regular user, umoci would fail to unpack there with the following error:
• umoci encountered a permission error: maybe --rootless will help?
x create runtime bundle: unpack rootfs: chown rootfs: lchown /opt/containers/valkey-9.0.0/rootfs: operation not permitted
Adding the --rootless argument to umoci stops it from trying to change the owner to root (as recorded in the layer tarballs), which allows the bundle to be created.
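As a rough sketch, the unpack step then becomes something like the following (the image layout and tag names are assumptions carried over from part 1; the bundle path matches the error above):
# --rootless skips the chown-to-root step during unpacking.
umoci unpack --rootless --image valkey:9.0.0 /opt/containers/valkey-9.0.0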
The documentation for umoci’s rootless option mentions:
umoci also supports the user.rootlesscontainers specification, which allows for further emulation of things like chown(2) inside rootless containers using tools like PRoot.
That sounds as if it may be a solution to the problem where the valkey
container wants to change the owner (run chown) of everything to valkey, so
that needs to be looked into later.
When running the container, the chown step still failed, and using an existing user ID, such as 5 for sync, also failed:
ERRO[0000] runc run failed: unable to start container process: error during container init: unable to setup user: cannot setgid to unmapped gid 5 in user namespace
Possibly newuidmap or newgidmap need to be run, but I’m unsure. In the end I set the uid and gid back to 0 and simply modified rootfs/usr/local/bin/docker-entrypoint.sh to skip the chown, and it ran.
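As a sketch, resetting the uid and gid in the bundle’s config.json could be done with a jq edit like this (assuming jq is available; the field names are from the OCI runtime spec):
# Reset the container process to run as uid/gid 0 again.
jq '.process.user.uid = 0 | .process.user.gid = 0' config.json > config.json.new && mv config.json.new config.json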
This time the networking worked; however, before testing connecting to the server, a problem was discovered. Starting a second container failed because it said the port was already in use, which suggests the networking isn’t in its own namespace. Running valkey-cli worked; at first I thought it was because it was installed on the host rather than using the chroot approach done last time, but both that and the chroot worked.
I then went back to the original environment, and sure enough under that I could run two valkey-server instances on the same port via runc run vk-3 and runc run vk-1; the second run didn’t conflict with the port in the first.
The difference was that the original system was Wolfi whereas the second system was Alpine 3.22. Comparing the generated OCI runtime configuration for the two systems, the Wolfi one has a network namespace listed but the Alpine one doesn’t; instead it has one of type user, likely because of --rootless.
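The namespace list can be inspected in each bundle’s config.json, for example with jq:
jq '.linux.namespaces' config.json
# On Wolfi this included an entry with "type": "network"; on Alpine it had
# a "type": "user" entry instead, with no network entry.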
Back to Root
Re-starting, but as root this time, brought back the network namespace, and thus three Valkey containers could be started.
Testing used nsenter to run valkey-cli within the container. The first step is to find the PID of the valkey-server via ps, then enter the namespace of that process with nsenter -t <pid> valkey-cli. This raised an error: Could not connect to Valkey at 127.0.0.1:6379: Connection refused. Trying again, this time adding the -n argument to nsenter, which causes it to enter the network namespace, works. This experiment provides a new idea: use socat to essentially forward data into the namespace.
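For reference, the nsenter test as commands (pidof is used here to find the PID, which assumes a single valkey-server is running):
# Find the PID of valkey-server on the host.
PID=$(pidof valkey-server)
# Without -n this fails with "Connection refused"; with -n it joins the
# network namespace of that process and the connection succeeds.
nsenter -t "$PID" -n valkey-cli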
Now that this has brought us back to a base where networking is namespaced, we can look at the networking options.
Networking
socat
For this, we need the port that the service is running on, as socat only forwards a single port. In the example with Valkey 9.0.0 this is port 6379.
In this case, nsenter is used to run the command in the same network namespace as the Valkey process.
nsenter -t $(runc state vk-2 | jq ".pid") -n socat UNIX-LISTEN:/tmp/socket TCP:127.0.0.1:6379
The reason runc exec can’t be used here is that it would require socat to exist within the root file system of the container. The following command therefore won’t work unless socat was added to the rootfs directory, in a directory included in $PATH.
runc exec vk-3 socat UNIX-LISTEN:/tmp/socket TCP:127.0.0.1:6379
Then run the following command, which listens on the same port on the host and connects to the socket, which is in turn connected to the port in the network namespace.
socat TCP-LISTEN:6379 UNIX:/tmp/socket
Now valkey-cli on the host will connect to that service, as it is using the default host and port.
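For example, a quick check from the host:
valkey-cli ping
# PONG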
- Pros
- Specific single port - only the individual service you want is exposed.
- The port could come from the org.opencontainers.image.exposedPorts annotation.
- Cons
- Do need to know the port.
This approach does seem easy to automate, especially with hooks.
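As a rough sketch of that automation for one container (the socket naming here is an assumption, and the port is hard-coded; it could potentially be read from the exposedPorts annotation instead):
CTR=vk-2
PORT=6379
PID=$(runc state "$CTR" | jq .pid)
# Inside the container's network namespace: expose the port as a Unix socket on the host filesystem.
nsenter -t "$PID" -n socat UNIX-LISTEN:/tmp/"$CTR".sock,fork TCP:127.0.0.1:"$PORT" &
# On the host: listen on the same port and relay into that socket.
socat TCP-LISTEN:"$PORT",fork,reuseaddr UNIX:/tmp/"$CTR".sock &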
iproute2
The first thing to account for is that the ip command from BusyBox does not have the netns subcommand, so iproute2 is needed to pursue this approach.
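On Alpine that presumably means installing the full iproute2 package:
apk add iproute2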
Special thanks to Murat Kilic’s article about network setup with runc and a scripted approach by svv on Serverfault, which I found shortly after following Murat’s post. This creates a virtual ethernet device with one end in the network namespace for the container and the other end outside (i.e. on the host). The downside to this approach is that creating this device requires root privileges.
ip netns add ctr_valkey_9.0.0
ip link add name veth-host type veth peer name veth-ctr-valkey
ip link set veth-ctr-valkey netns ctr_valkey_9.0.0
ip netns exec ctr_valkey_9.0.0 ip addr add 192.168.10.1/24 dev veth-ctr-valkey
ip netns exec ctr_valkey_9.0.0 ip link set veth-ctr-valkey up
ip netns exec ctr_valkey_9.0.0 ip link set lo up
ip link set veth-host up
ip route add 192.168.10.1/32 dev veth-host
ip netns exec ctr_valkey_9.0.0 ip route add default via 192.168.10.1 dev veth-ctr-valkey
- Create a new named network namespace - here it is named after the container.
- Create a virtual ethernet adapter - one end represents the host, the other the container.
- Move the container end to the network namespace.
- Set up the IP address for the container side of the ethernet adapter.
- Bring up the network interface for the container (network namespace).
- Bring up the network interface for the host side.
- Route traffic.
The ip netns exec commands run the command that follows in a network namespace; it could probably be done with nsenter instead, except ip netns exec takes the network namespace name rather than a PID. However, I suspect this set-up can become part of a container hook.
Check that it’s working by trying to ping the IP of the container from the host, which in this case is 192.168.10.1.
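For example:
ping -c 1 192.168.10.1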
You can also run ip -netns ctr_valkey_9.0.0 link ls and ip -netns ctr_valkey_9.0.0 addr to help troubleshoot.
The link appears as:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: veth-ctr-valkey@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether de:b0:21:43:e3:80 brd ff:ff:ff:ff:ff:ff link-netnsid 0
The address is:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host proto kernel_lo
valid_lft forever preferred_lft forever
3: veth-ctr-valkey@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether de:b0:21:43:e3:80 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.10.1/24 scope global veth-ctr-valkey
valid_lft forever preferred_lft forever
inet6 fe80::dcb0:21ff:fe43:e380/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
Using it
In the config.json, go to linux.namespaces, find the namespace with a type of network, and add the path property to it, i.e.
{
"type": "network",
"path": "/var/run/netns/ctr_valkey_9.0.0"
}
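As a sketch, the same edit could be scripted with jq (this prints the updated config to stdout; the netns name matches the one created above):
jq '(.linux.namespaces[] | select(.type == "network")) += {"path": "/var/run/netns/ctr_valkey_9.0.0"}' config.json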
Re-run the container and now valkey-cli -h 192.168.10.1 from the host works.
Of course, now that the namespace has a particular path, it means you can’t run the container more than once.
Automating
I didn’t get time for this avenue.
Unexplored avenues
I’m not sure if iptables or nftables could be used to set up the port forwarding between namespaces.
For non-network avenues, explore the user.rootlesscontainers support and PRoot mentioned by umoci’s documentation about rootless.
The other thing that may have been useful but wasn’t needed in today’s session was enabling IP forwarding:
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
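If they had been needed, they could presumably have been applied with sysctl:
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1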