runc idea - part 1
The basic premise is to explore the idea of turning container images into applications, or more likely services. This will hopefully be part of a series of posts about the idea.
The idea is: what would it take to create a system where the containers are organised in a form similar to:
/opt/containers/<name>-<version>
| rootfs
| data (i.e. the volume)
\ configuration
Possibly there would be another directory for a volume holding data specific to running the application: not, say, the postgres data directory, but access logs, etc.
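Below is a minimal sketch of filling in that layout. It assumes bash, and that it runs after the image has been unpacked, since the unpack step (umoci, below) creates rootfs and config.json itself. `logs` is a hypothetical name for the optional extra directory.

```shell
#!/usr/bin/env bash
# Create the extra directories of <base>/<name>-<version>.
# rootfs and config.json are left to the unpack step; "logs" is a
# hypothetical name for the optional per-application volume.
make_layout() {
    local base=$1 name=$2 version=$3
    local dir="$base/$name-$version"
    mkdir -p "$dir/data" "$dir/configuration" "$dir/logs" || return 1
    # Print the bundle path so callers can capture it.
    printf '%s\n' "$dir"
}

# Example (after unpacking): make_layout /opt/containers valkey 9.0.0
```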
The result should be runnable via runc, and possibly via either a systemd unit or an OpenRC service.
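For the systemd option, a unit could be generated per bundle. This is only a sketch, and several parts of it are assumptions: the runc path (/usr/bin/runc), using the directory name as the container id, and the restart policy. The OpenRC equivalent is not covered.

```shell
#!/usr/bin/env bash
# Emit a systemd unit that runs an unpacked bundle in the foreground with runc.
# Assumptions: runc lives at /usr/bin/runc and the container id is the
# <name>-<version> directory name of the bundle.
make_unit() {
    local bundle=$1
    local id
    id=$(basename "$bundle")
    cat <<EOF
[Unit]
Description=runc container $id
After=network.target

[Service]
ExecStart=/usr/bin/runc run --bundle $bundle $id
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example: make_unit /opt/containers/valkey-9.0.0 > /etc/systemd/system/valkey-9.0.0.service
```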
Features this system would be counteracting are:
- layers - as rootfs would be the final, flattened result
- overlayfs - this is still being considered, in case it is needed to set up a writable rootfs with somewhere for the writes to go (and have them discarded).
- reusing layers - the basic idea here is that the base layers generally aren’t shared between the applications that will be run, so that isn’t needed.
Idea
I started with skopeo. The verdict is that it can copy an image from a registry and
store it as an OCI directory, but it can’t create a bundle. The next day I
discovered umoci, which can take an OCI directory and create an OCI bundle from it,
but it can’t do the fetching itself.
Practical
Setup
Download the tools required.
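curl -LO https://github.com/opencontainers/umoci/releases/download/v0.6.0/umoci.linux.amd64
curl -LO https://github.com/lework/skopeo-binary/releases/download/v1.20.0/skopeo-linux-amd64
# Make them executable so they can be run as ./<name> below.
chmod +x umoci.linux.amd64 skopeo-linux-amd64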
It is a bit of a shame that one uses dots and the other uses hyphens as separators.
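One way to paper over the naming difference is a pair of symlinks with uniform names. The short names skopeo and umoci are just a local convention assumed here.

```shell
#!/usr/bin/env bash
# Make the downloaded binaries executable and give them uniform names
# (./skopeo and ./umoci) via relative symlinks in the same directory.
link_tools() {
    local dir=${1:-.}
    chmod +x "$dir/skopeo-linux-amd64" "$dir/umoci.linux.amd64" || return 1
    ln -sf skopeo-linux-amd64 "$dir/skopeo"
    ln -sf umoci.linux.amd64 "$dir/umoci"
}

# Example: link_tools . && ./skopeo --version
```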
First Attempt
./skopeo-linux-amd64 copy docker://docker.io/valkey/valkey:9.0.0 oci:valkey:9.0.0
./umoci.linux.amd64 unpack --image valkey:9.0.0 bundle_valkey_9.0.0
# Or aiming for the desired layout above.
./umoci.linux.amd64 unpack --image valkey:9.0.0 /opt/containers/valkey-9.0.0
The first command produces a directory (valkey) containing an index.json
and a blobs sub-directory holding the image data (more precisely, blobs/sha256).
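To see what the first command stored, the tags can be read out of index.json. Each tag is a manifest entry carrying the org.opencontainers.image.ref.name annotation. Below is a small sketch, assuming jq is installed.

```shell
#!/usr/bin/env bash
# List the tags in an OCI image layout directory. Each tag is an entry in
# index.json's "manifests" array, named by the
# org.opencontainers.image.ref.name annotation. Requires jq.
list_tags() {
    local dir=$1
    [ -f "$dir/oci-layout" ] && [ -d "$dir/blobs/sha256" ] || {
        echo "$dir does not look like an OCI image layout" >&2
        return 1
    }
    jq -r '.manifests[].annotations["org.opencontainers.image.ref.name"] // empty' "$dir/index.json"
}

# Example: list_tags valkey   (after the skopeo copy above, 9.0.0 is expected)
```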
The second command results in a rootfs sub-directory, which is exactly
what runc can work with, with minimal effort.
The added bonus is that it creates the config.json, the runtime configuration
defined by the OCI runtime specification. This populates the process.env property
with the values from the OCI image’s configuration, so in this case it
defines the VALKEY_VERSION environment variable as it is set in the image, and it
sets process.cwd and process.args as well. The annotations are copied too.
However, the root is not set to read-only (root.readonly).
It also sets up /data as a tmpfs mount, which is something I will want to change.
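Both changes can be made to config.json with jq. This is a sketch with a few assumptions: jq is installed, and the bundle’s own data directory should back /data. The rbind/rw mount options are also an assumption rather than anything umoci generates.

```shell
#!/usr/bin/env bash
# Make the root filesystem read-only and replace umoci's /data tmpfs mount
# with a bind mount of the bundle's data directory. Requires jq.
# The "rbind"/"rw" options are an assumption, not what umoci generates.
adjust_config() {
    local bundle=$1 cfg="$1/config.json" tmp
    tmp=$(mktemp "$cfg.XXXXXX") || return 1
    # Write to a temporary file first so a jq failure never truncates config.json.
    if jq --arg src "$bundle/data" '
        .root.readonly = true
        | .mounts = ([(.mounts // [])[] | select(.destination != "/data")]
                     + [{"destination": "/data", "type": "bind",
                         "source": $src, "options": ["rbind", "rw"]}])
    ' "$cfg" > "$tmp"; then
        mv "$tmp" "$cfg"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: adjust_config /opt/containers/valkey-9.0.0
```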
Is it runnable? No
chown: changing ownership of '.': Operation not permitted
A quick step back on how I ran it: I changed to the directory with the config.json
and ran runc run vk-2.
Differences in settings between the config.json generated by umoci and the
one I had, other than the ones mentioned above:
- /proc/asound was a read-only path for umoci, but was otherwise a masked path.
- /proc/acpi and /proc/keys weren’t masked for the umoci version.
- Extra capabilities - the same ones that are permitted are also set as inheritable and ambient.
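Differences like these are easier to spot if the two configs are normalised with jq (sorted keys) and then diffed. Below is a sketch, assuming bash and jq. A reference config could come from runc spec, which writes runc’s default config.json into the current directory.

```shell
#!/usr/bin/env bash
# Compare two runtime configs with their keys sorted, so only real
# differences show up. Requires bash (process substitution) and jq.
# Exit status follows diff: 0 = same, 1 = different.
diff_configs() {
    diff -u <(jq -S . "$1") <(jq -S . "$2")
}

# Example:
#   (mkdir -p /tmp/default && cd /tmp/default && runc spec)
#   diff_configs /tmp/default/config.json /opt/containers/valkey-9.0.0/config.json
```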
I tried checking whether it had to do with the current owner of a file with:
find . -exec stat -c '%U:%G %n' {} + | grep -v '^root:root '
In the end it seemed like the error was coming from
/usr/local/bin/docker-entrypoint.sh. Sure enough, it contains this line, which will fail with a read-only root:
find . \! -user valkey -exec chown valkey '{}' +
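The chown only runs under a condition. Official entrypoint scripts of this style typically run it only when started as root, before dropping to the service user. The sketch below is a simplified, hypothetical version of that pattern, not a copy of the valkey script. The real check is in rootfs/usr/local/bin/docker-entrypoint.sh.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified version of the entrypoint pattern: ownership is
# only fixed when the script starts as root. This is NOT the real valkey
# entrypoint, just an illustration of why a non-root uid skips the chown.
should_fix_ownership() {
    local uid=${1:-$(id -u)}
    [ "$uid" = 0 ]
}

if should_fix_ownership; then
    echo "uid 0: an entrypoint like this would run the chown"
else
    echo "uid $(id -u): the chown would be skipped"
fi
```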
Create the overlay, so the container gets a writable root. Besides the merged mount point, overlayfs needs an upper directory and a workdir.
# This would ideally be randomised or be a UUID.
CONTAINER_NAME=goofy_davinci
BUNDLE=/opt/containers/valkey-9.0.0
mkdir -p "$BUNDLE/$CONTAINER_NAME" "$BUNDLE/$CONTAINER_NAME-work" "$BUNDLE/$CONTAINER_NAME-merged"
# lowerdir stays untouched, writes land in upperdir, workdir is overlayfs scratch space.
mount -t overlay overlay \
  -o "lowerdir=$BUNDLE/rootfs,upperdir=$BUNDLE/$CONTAINER_NAME,workdir=$BUNDLE/$CONTAINER_NAME-work" \
  "$BUNDLE/$CONTAINER_NAME-merged"
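As the comment in that set-up suggests, the name could be random. On Linux, the kernel returns a fresh UUID on every read of /proc/sys/kernel/random/uuid. The sketch falls back to uuidgen, assuming util-linux is installed.

```shell
#!/usr/bin/env bash
# Random container name: the kernel returns a new UUID on every read of
# this file (Linux only). Fall back to uuidgen if it is unavailable.
CONTAINER_NAME=$(cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen)
echo "$CONTAINER_NAME"
```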
Now root.path in config.json needs to point at the merged directory (i.e. goofy_davinci-merged). Anything written in merged then ends up in the upperdir (goofy_davinci), leaving rootfs untouched.
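Pointing config.json at the merged directory can be scripted with jq. This sketch assumes jq is installed. root.path may be relative to the bundle, so the merged directory name alone is enough. Because this edits the single shared config.json, it only suits one running instance at a time.

```shell
#!/usr/bin/env bash
# Set root.path in the bundle's config.json (relative paths are resolved
# against the bundle directory). Requires jq.
set_root_path() {
    local bundle=$1 root=$2 tmp
    tmp=$(mktemp "$bundle/config.json.XXXXXX") || return 1
    if jq --arg root "$root" '.root.path = $root' "$bundle/config.json" > "$tmp"; then
        mv "$tmp" "$bundle/config.json"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: set_root_path /opt/containers/valkey-9.0.0 "$CONTAINER_NAME-merged"
```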
It turns out it still failed. Changing the uid and gid did make it start up, as that
causes it to skip the chown part. I set them to 5, which was sync on my system,
and later tried 65534, which was nobody. Sure enough, running ps aux outside
the container showed that valkey-server was being run as nobody (and as sync
before that):
15503 nobody 0:00 valkey-server *:6379
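Setting the uid and gid works the same way, by editing process.user in config.json. This sketch assumes jq is installed. 65534 maps to nobody on many distributions, but that is worth confirming in /etc/passwd.

```shell
#!/usr/bin/env bash
# Run the container process as the given uid (and gid, defaulting to the
# uid) by editing process.user in config.json. Requires jq.
set_process_user() {
    local bundle=$1 uid=$2 gid=${3:-$2} tmp
    tmp=$(mktemp "$bundle/config.json.XXXXXX") || return 1
    if jq --argjson uid "$uid" --argjson gid "$gid" \
        '.process.user.uid = $uid | .process.user.gid = $gid' \
        "$bundle/config.json" > "$tmp"; then
        mv "$tmp" "$bundle/config.json"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: set_process_user /opt/containers/valkey-9.0.0 65534
```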
I need to look more into that, but for now I checked that the server works. Since I didn’t have the client installed on the host, I decided to use the one in the container, running it in the container’s namespaces:
$ runc exec vk-2 valkey-cli -u valkey://default:PASSWORD@localhost:6379/0
> set answer 42
OK
> get answer
42
However, I really need to be able to access it from the host (or a different
namespace). To try that, I simply ran chroot rootfs bash and then:
$ valkey-cli
Could not connect to Valkey at 127.0.0.1:6379: Connection refused
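A likely explanation is namespaces. chroot only changes the root directory and stays in the host’s network namespace. The umoci-generated config.json, by contrast, most likely requests a new network namespace for the container (jq '.linux.namespaces' config.json will show it). If so, 127.0.0.1 inside the chroot is not the container’s loopback. The sketch below checks whether a process shares the caller’s network namespace. It assumes Linux and permission to read /proc/<pid>/ns, which means root for other users’ processes.

```shell
#!/usr/bin/env bash
# Succeed if the given process is in the same network namespace as the
# caller. A chroot keeps the host's network namespace, so a server in a
# different one is not reachable on the chroot's 127.0.0.1.
same_netns() {
    local pid=$1
    [ "$(readlink "/proc/$pid/ns/net")" = "$(readlink /proc/self/ns/net)" ]
}

# Example (as root, assuming a single valkey-server process):
#   same_netns "$(pidof valkey-server)" || echo "separate network namespace"
```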
At this point I was on my second session of investigating this idea. I wanted to look at networking next, but first I need to redo the above without the overlay.
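Undoing the overlay before redoing the steps without it could look like the sketch below. It assumes the container has already been stopped and removed with runc delete, and it uses the same bundle and name layout as the overlay set-up. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Tear down one container's overlay: unmount the merged view, then discard
# the upper (writable) layer and the overlay workdir. Assumes the container
# has already been stopped and removed with `runc delete`.
# With DRY_RUN=1 the commands are printed instead of executed.
teardown_overlay() {
    local bundle=$1 name=$2 prefix=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        prefix=echo
    fi
    # Stop if the unmount fails, so nothing is deleted through a live mount.
    $prefix umount "$bundle/$name-merged" || return 1
    $prefix rm -rf "$bundle/$name" "$bundle/$name-work" "$bundle/$name-merged"
}

# Example (as root): teardown_overlay /opt/containers/valkey-9.0.0 goofy_davinci
```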
Returning Threads
These are other threads that I want to return to.
User ID
Look into how the uid mapping works in the container, how it is meant to work, and its implications.
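There are two places to start. Each process exposes its mappings in /proc/<pid>/uid_map and gid_map, one range per line as "inside outside length". config.json describes the desired mappings under linux.uidMappings and linux.gidMappings (containerID, hostID, size). A small helper to print a process’s mappings:

```shell
#!/usr/bin/env bash
# Print the uid/gid mappings of a process (default: the calling process).
# Each line is "<first id inside> <first id outside> <length>";
# "0 0 4294967295" is the identity mapping of the initial user namespace.
show_idmap() {
    local pid=${1:-self}
    echo "uid_map:"
    cat "/proc/$pid/uid_map"
    echo "gid_map:"
    cat "/proc/$pid/gid_map"
}

# Example (on the host): show_idmap "$(pidof valkey-server)"
```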
Overlays
Is there a way to set up the overlay on create, such that its directories are named after
the container-id chosen during runc run? The OCI runtime spec defines
lifecycle hooks. The lifecycle of a container corresponds to
create, start, kill and delete, and the hooks of interest are createRuntime
and startContainer.
It does seem unlikely the hooks can be used for setting up the overlay, as the startContainer hook runs within the container namespace, so it has already lost the ability to mount.
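Whichever hook is tried, the runtime passes it the container state as JSON on stdin. That state includes the id given to runc run and the bundle path, so a hook can at least discover the container-id. A hook is registered in config.json under hooks.createRuntime or hooks.startContainer as an object with a path, plus optional args, env and timeout. Below is a hypothetical skeleton that only reads those two fields, assuming jq is installed.

```shell
#!/usr/bin/env bash
# Skeleton hook: the runtime writes the container state JSON to stdin.
# Only "id" and "bundle" are read here; what the hook then does (e.g.
# deriving overlay directory names from the id) is left open. Requires jq.
read_hook_state() {
    local state id bundle
    state=$(cat)
    id=$(jq -r '.id' <<<"$state")
    bundle=$(jq -r '.bundle' <<<"$state")
    echo "container $id in bundle $bundle"
}

# As a hook script, the body would simply call: read_hook_state
```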