runc idea - part 1

The basic premise is to explore the idea of turning container images into applications or, more likely, services. This will hopefully be part of a series of posts about the idea.

The idea is to work out what it would take to create a system whereby the containers are organised in a form similar to:

/opt/containers/<name>-<version>
| rootfs
| data (i.e. the volume)
\ configuration

Possibly with another volume directory for data that belongs to the application but is not its main data, e.g. access logs rather than the postgres data directory.

The result can then be run via runc, and possibly either a systemd unit or an OpenRC service.
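As a rough sketch of the layout above, the skeleton for one container could be created like this. The function name make_container_dir and the logs directory name are placeholders of mine, and the base directory is a parameter so it can be tried outside /opt/containers. It only illustrates the shape of the tree; whether umoci is happy to unpack into a directory that already exists is not something this checks.

```shell
#!/bin/sh
# Sketch: create <base>/<name>-<version>/{rootfs,data,configuration,logs}.
# make_container_dir and the "logs" name are hypothetical, not part of any tool.
make_container_dir() {
    base=$1
    name=$2
    version=$3
    dir="$base/$name-$version"
    mkdir -p "$dir/rootfs" "$dir/data" "$dir/configuration" "$dir/logs" || return 1
    # Print the resulting directory so callers can capture it.
    printf '%s\n' "$dir"
}

# Example: make_container_dir /opt/containers valkey 9.0.0
```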

The features this system would be counteracting are:

layers - the rootfs would be the final, already flattened one.
overlayfs - this is still being considered in case a writable rootfs is needed, with somewhere to put the changes (and have them discarded afterwards).
reusing layers - the base layers aren’t shared between the applications that will be run, so this isn’t needed.

Idea

I started with skopeo. The verdict is that it can copy an image from a registry and store it as an OCI directory, but it can’t create a bundle. The next day I discovered umoci, which can take an OCI directory and create an OCI bundle from it, but can’t do the fetching itself.

Practical

Setup

Download the tools required.

curl -LO https://github.com/opencontainers/umoci/releases/download/v0.6.0/umoci.linux.amd64
curl -LO https://github.com/lework/skopeo-binary/releases/download/v1.20.0/skopeo-linux-amd64
chmod +x umoci.linux.amd64 skopeo-linux-amd64

It is a bit of a shame one uses dot and the other uses a minus sign.

First Attempt

./skopeo-linux-amd64 copy docker://docker.io/valkey/valkey:9.0.0 oci:valkey:9.0.0
./umoci.linux.amd64 unpack --image valkey:9.0.0 bundle_valkey_9.0.0
# Or aiming for the desired layout above.
./umoci.linux.amd64 unpack --image valkey:9.0.0 /opt/containers/valkey-9.0.0

The first command produces a directory containing an index.json and a sub-directory with the blobs in it (well, a blobs/sha256 directory).
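To see how those pieces fit together, the manifest for a tag can be located from index.json. This is a small sketch, assuming jq is installed and that skopeo recorded the tag in the org.opencontainers.image.ref.name annotation, as the OCI image layout spec describes. manifest_blob is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the path of the manifest blob for a tag in an OCI layout directory.
# manifest_blob is a hypothetical helper; requires jq.
manifest_blob() {
    layout=$1
    tag=$2
    jq -r --arg layout "$layout" --arg tag "$tag" '
        .manifests[]
        | select(.annotations["org.opencontainers.image.ref.name"] == $tag)
        # A digest such as "sha256:abc" is stored at blobs/sha256/abc.
        | $layout + "/blobs/" + (.digest | sub(":"; "/"))
    ' "$layout/index.json"
}

# Example: manifest_blob valkey 9.0.0
```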

The second command produces a rootfs sub-directory, which is exactly what runc can work with, with minimal effort. As an added bonus it creates the config.json, which is the OCI runtime specification configuration. This populates the process.env property with the values from the OCI image’s configuration, so in this case it defines the VALKEY_VERSION environment variable as it’s set in the image, and it sets the cwd and the args as well. The annotations are copied too. The root is not, however, set to read-only.

It sets up /data as a tmpfs mount, which is something I will want to change.
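Changing that mount could look something like the following, assuming jq is available on the host. The helper name bind_data and the choice of bind options are assumptions of mine; the idea is to replace the /data entry in mounts with a bind mount of the data directory from the layout.

```shell
#!/bin/sh
# Sketch: rewrite the /data mount in an OCI runtime config.json from tmpfs to a
# bind mount of a host directory. bind_data is a hypothetical helper; requires jq.
bind_data() {
    config=$1
    host_dir=$2
    tmp_config=$(mktemp) || return 1
    jq --arg src "$host_dir" '
        .mounts |= map(
            if .destination == "/data"
            then {destination: "/data", type: "bind", source: $src, options: ["rbind", "rw"]}
            else .
            end)
    ' "$config" > "$tmp_config" && mv "$tmp_config" "$config"
}

# Example:
#   bind_data /opt/containers/valkey-9.0.0/config.json /opt/containers/valkey-9.0.0/data
```

All other mounts are passed through untouched, so the function can be run more than once without changing the result.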

Is it runnable? No, it fails with chown: changing ownership of '.': Operation not permitted

Taking a quick step back to how I ran it: I changed to the directory containing the config.json and ran runc run vk-2

Differences in settings between the config.json generated by umoci and the one I already had, other than the ones mentioned above:

/proc/asound was a read-only path for umoci but was otherwise a masked path.
/proc/acpi and /proc/keys weren’t masked in the umoci version.
Extra capabilities - the same ones that are permitted are also set as inheritable and ambient.

I tried checking whether it was to do with the current owner of the files with:

find . -exec stat -c "%U:%G %n" {} \; | grep -v root:root

In the end it seemed like the error was coming from /usr/local/bin/docker-entrypoint.sh. Sure enough, it contains the line find . \! -user valkey -exec chown valkey '{}' + which would fail on a read-only filesystem.

Create the overlay, so there is a workdir.

# This would ideally be randomised or be a UUID.
CONTAINER_NAME=goofy_davinci
BASE=/opt/containers/valkey-9.0.0
mkdir "$BASE/$CONTAINER_NAME-work" "$BASE/$CONTAINER_NAME-merged" "$BASE/$CONTAINER_NAME"
# lowerdir is the unpacked image rootfs; upperdir receives every write.
mount -t overlay overlay \
  -o "lowerdir=$BASE/rootfs,upperdir=$BASE/$CONTAINER_NAME,workdir=$BASE/$CONTAINER_NAME-work" \
  "$BASE/$CONTAINER_NAME-merged"

Now the rootfs (root.path in config.json) needs to be set to the merged one (i.e. goofy_davinci-merged). This means that anything written under merged ends up in the upperdir.
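Since the point of the overlay is that the writes can be thrown away, here is a minimal sketch of the matching teardown. discard_overlay is a hypothetical helper that assumes the three directory names used above. It unmounts the merged directory only if /proc/mounts lists it, then removes the upper, work and merged directories.

```shell
#!/bin/sh
# Sketch: discard a container's writable overlay created with the naming above.
# discard_overlay is a hypothetical helper; the umount step needs root.
discard_overlay() {
    bundle=$1
    container=$2
    # Refuse empty arguments so the rm below can never target "/".
    [ -n "$bundle" ] && [ -n "$container" ] || return 1
    merged="$bundle/$container-merged"
    # Only unmount if the merged directory is currently listed as a mount point.
    if grep -qsF " $merged " /proc/mounts; then
        umount "$merged" || return 1
    fi
    # The upperdir holds every change the container made; removing it discards them.
    rm -rf "$bundle/$container" "$bundle/$container-work" "$merged"
}

# Example: discard_overlay /opt/containers/valkey-9.0.0 goofy_davinci
```

The rootfs used as lowerdir is left alone, so the next container can start from the same clean image.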

Turns out it still failed. Modifying the uid and gid did make it start up, as that causes it to skip the chown part. I set it to 5, which was sync on my system, then later tried 65534, which was nobody. Sure enough, running ps aux outside the container showed that valkey-server was being run as nobody (and sync before it).

15503 nobody    0:00 valkey-server *:6379
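The uid and gid change amounts to editing process.user in config.json. A minimal sketch, assuming jq is installed; set_container_user is a made-up helper name.

```shell
#!/bin/sh
# Sketch: set the uid and gid the container process runs as in config.json.
# set_container_user is a hypothetical helper; requires jq.
set_container_user() {
    config=$1
    new_uid=$2
    new_gid=$3
    tmp_config=$(mktemp) || return 1
    # --argjson passes the ids as JSON numbers rather than strings.
    jq --argjson uid "$new_uid" --argjson gid "$new_gid" \
        '.process.user.uid = $uid | .process.user.gid = $gid' \
        "$config" > "$tmp_config" && mv "$tmp_config" "$config"
}

# Example: set_container_user /opt/containers/valkey-9.0.0/config.json 65534 65534
```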

I need to look into that more, but for now I checked that the server works. Since I didn’t have the client installed on the host, I decided to use the one in the container and start it in the same namespaces.

$ runc exec vk-2 valkey-cli -u valkey://default:PASSWORD@localhost:6379/0
> set answer 42
OK
> get answer
42

However, I really need to be able to access it from the host (or a different namespace). In this case I simply ran chroot rootfs bash and then:

$ valkey-cli
Could not connect to Valkey at 127.0.0.1:6379: Connection refused
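One likely explanation, not verified here, is that the generated config.json asks for a separate network namespace, so the server’s 127.0.0.1 is not the host’s, while chroot only changes the filesystem root. A sketch for testing that guess is to enter just the container’s network namespace from the host with nsenter. container_pid is a hypothetical helper that reads the pid field from the JSON that runc state prints, and it assumes jq is installed.

```shell
#!/bin/sh
# Sketch: find a runc container's init pid so that only its network namespace
# can be entered. container_pid is a hypothetical helper; requires jq.
container_pid() {
    # Reads the JSON printed by `runc state <id>` on stdin and prints its "pid".
    jq -r '.pid'
}

# Example (as root, from the bundle directory):
#   pid=$(runc state vk-2 | container_pid)
#   nsenter --target "$pid" --net chroot rootfs valkey-cli
```

If the client connects this way, the refusal was down to the namespace rather than the server.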

It was at this point that I ended up on the second session of investigating this idea. I wanted to look at networking next; however, I first need to redo the above without the overlay.

Returning Threads

These are other threads that I want to return to.

User ID

Look into how the uid mapping works in the container, how it is meant to work, and what the implications are.

Overlays

Is there a way to set up the overlay on create, such that it is named after the container ID chosen during runc run? The OCI runtime spec mentions that there are life cycle hooks. The life cycle of a container corresponds to create, start, kill and delete, and the hooks of interest are createRuntime and startContainer.

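It does seem unlikely that a startContainer hook can be used for setting up the overlay, as the script runs within the container namespace and so has already lost the ability to mount. The spec says createRuntime hooks are executed in the runtime namespace instead, so that is the one to try first.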