Container Runtimes and Container Engines

6 minute read

Photo by Donald Cook

In my previous post on the technology behind containers we introduced the concepts of Container Runtime and Container Image. In this post I will continue with a brief overview of the features typically offered by a Container Runtime and of the implementations currently available.

What is a Container Runtime

At the beginning, when Docker was released on top of LXC, it solved several problems in a very “monolithic” way:

  • A container image format
  • A method for building container images (Dockerfile / docker build)
  • A way to manage container images (docker images, docker rmi, etc.)
  • A way to manage container instances (docker ps, docker rm, etc.)
  • A way to share container images (docker push/pull)
  • Network interfaces and their management
  • A way to run and supervise containers (docker run)
  • Local storage management

As we can see, it mixed several functions that are not inherently connected: the Docker CLI, the engine, the Docker daemon, and so on. Because of that, Docker, Google, CoreOS, and other vendors created the Open Container Initiative (OCI). Docker then broke out its code for running containers as a tool and library called runc and donated it to the OCI as the reference implementation of the OCI Runtime Specification. As a result, some of the aforementioned “features” of what used to be called the “runtime”, such as the image format or the build tooling, were not part of the runtime anymore; only the functions specified in the OCI Runtime Specification remained:

  • The filesystem bundle: how the container’s image, once unpacked, is laid out on disk (a rootfs directory plus a config.json)
  • Runtime and lifecycle: the operations (create, start, kill, delete, state) that abstract the use of the kernel’s features

So, current container runtimes that focus on just running containers, such as the reference implementation runc, are usually referred to as “low-level container runtimes”. Runtimes that support higher-level features, like image management and gRPC/Web APIs, are usually referred to as “high-level container tools”, “high-level container runtimes” or often just “container runtimes” (e.g. containerd, Docker, CRI-O), and they usually build on top of a low-level runtime.

Summarizing, a low-level container runtime is a piece of software that takes as input a folder containing a root filesystem (rootfs) and a configuration file describing the container’s parameters (resource limits, mount points, the process to start, etc.), and as a result starts a workload, which might be a container or any other “isolated and restricted process”.
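
To make that more concrete, here is a minimal sketch in Go, using the runtime-spec types from github.com/opencontainers/runtime-spec/specs-go, of the configuration half of an OCI filesystem bundle. It writes a bare-bones config.json meant to sit next to an already unpacked rootfs directory; the command, hostname and paths are illustrative placeholders, and a real config, like the one runc spec generates, also carries Linux namespace, mount and cgroup settings.

```go
package main

import (
    "encoding/json"
    "log"
    "os"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
    // A minimal runtime configuration: run /bin/sh from the bundle's
    // unpacked rootfs directory. Values here are illustrative only.
    spec := specs.Spec{
        Version: specs.Version, // OCI runtime-spec version the config targets
        Root:    &specs.Root{Path: "rootfs", Readonly: true},
        Process: &specs.Process{
            Terminal: true,
            User:     specs.User{UID: 0, GID: 0},
            Args:     []string{"/bin/sh"},
            Env:      []string{"PATH=/usr/sbin:/usr/bin:/sbin:/bin"},
            Cwd:      "/",
        },
        Hostname: "demo",
    }

    data, err := json.MarshalIndent(&spec, "", "  ")
    if err != nil {
        log.Fatal(err)
    }
    // config.json sits at the top of the bundle, next to the rootfs/ folder.
    if err := os.WriteFile("config.json", data, 0o644); err != nil {
        log.Fatal(err)
    }
}
```

Pointed at a bundle laid out this way (and completed with the namespace, mount and cgroup sections that runc spec would generate), a low-level runtime only has to turn that file into kernel primitives and start the process, e.g. with runc run.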

Low-level Runtimes

There are basically two types of low-level runtimes: native runtimes and virtualized or sandboxed runtimes.

Native runtimes

Native runtimes are the most prevalent type of runtime: they are the ones that interact directly with the host’s kernel.

  • runc: it is the reference implementation of the OCI Runtime spec. It is written in Go and maintained under Docker’s open source Moby project.

  • crun: it is written in C and can be used both as an executable and as a library. It originated at Red Hat, and other Red Hat projects such as buildah and podman tend to use crun instead of runc.

  • Sysbox: historically, Docker advocated a “single container, single process” model and, despite constant demand to bring containers somewhat closer to traditional virtual machines (with systemd or a similar manager inside), that has never been officially supported. Sysbox is an open-source container runtime, originally developed by Nestybox, that enables Docker containers to act as virtual servers capable of running software such as systemd, Docker, and Kubernetes inside them, easily and with proper isolation. This allows you to use containers in new ways, and provides a faster, more efficient, and more portable alternative to virtual machines in many scenarios.

Sandboxed and virtualized runtimes

In addition to native runtimes, which run the containerized process on the same host kernel, there are sandboxed and virtualized implementers of the OCI spec, which provide further isolation of the host from the containerized process. Since containers are essentially just isolated and restricted Linux processes, we are one kernel bug away from the containerized code gaining access to the host system, and depending on the environment this can become a significant concern [1].

Instead of sharing the host kernel, the containerized process runs on a unikernel or kernel proxy layer, which then interacts with the host kernel on the container’s behalf. Because of this increased isolation, these runtimes have a reduced attack surface and make it less likely that a containerized process can have a harmful effect on the host.

High-level Runtimes and the Container Runtime Interface (CRI)

Launching containers with the runc command line is feasible, but what happens when we need to automate the process? Think of launching tens of containers while keeping track of their statuses: some of them need to be restarted on failure, resources need to be released on termination, images have to be pulled from registries, inter-container networks need to be configured, and so on. These are the tasks performed by high-level runtimes such as containerd or CRI-O.
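
As a rough illustration of what that automation looks like one level up, the sketch below uses containerd’s Go client (the 1.x client API) to pull an image, create a container backed by the default low-level runtime, and run it as a supervised task, loosely following containerd’s getting-started example. The socket path, namespace, container ID and image reference are assumptions made for the example.

```go
package main

import (
    "context"
    "log"

    "github.com/containerd/containerd"
    "github.com/containerd/containerd/cio"
    "github.com/containerd/containerd/namespaces"
    "github.com/containerd/containerd/oci"
)

func main() {
    // Connect to containerd's default socket (path is an assumption).
    client, err := containerd.New("/run/containerd/containerd.sock")
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Namespaces keep different clients' containers apart; "example" is arbitrary.
    ctx := namespaces.WithNamespace(context.Background(), "example")

    // Image distribution: pull and unpack the image from a registry.
    image, err := client.Pull(ctx, "docker.io/library/alpine:latest", containerd.WithPullUnpack)
    if err != nil {
        log.Fatal(err)
    }

    // Create the container: snapshot the image and generate an OCI spec
    // that the low-level runtime will consume.
    container, err := client.NewContainer(ctx, "demo",
        containerd.WithImage(image),
        containerd.WithNewSnapshot("demo-snapshot", image),
        containerd.WithNewSpec(oci.WithImageConfig(image), oci.WithProcessArgs("echo", "hello")),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer container.Delete(ctx, containerd.WithSnapshotCleanup)

    // Run it as a task (an actual process on the host) and wait for it to exit.
    task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
    if err != nil {
        log.Fatal(err)
    }
    defer task.Delete(ctx)

    exitCh, err := task.Wait(ctx)
    if err != nil {
        log.Fatal(err)
    }
    if err := task.Start(ctx); err != nil {
        log.Fatal(err)
    }

    status := <-exitCh
    code, _, _ := status.Result()
    log.Printf("container exited with status %d", code)
}
```

Everything around the Pull call (snapshot creation, OCI spec generation, task supervision, cleanup) is exactly the bookkeeping that a low-level runtime leaves to you.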

When the Kubernetes container orchestrator was introduced, the Docker runtime was hardcoded into its machine daemon, the Kubelet. However, as Kubernetes rapidly became popular, the community began to need support for alternative runtimes. To solve this, Hyper, CoreOS, Google, and other Kubernetes sponsors collaborated on a high-level spec describing a container runtime from a container-orchestration perspective: the Container Runtime Interface (CRI).

The CRI has additional concerns beyond an OCI runtime, including image management and distribution, storage, snapshotting, networking (distinct from the CNI), and more. A CRI implementation has the functionality required to leverage containers in dynamic cloud environments, unlike OCI runtimes, which are tightly focused on creating containers on a machine; in fact, CRI implementations usually delegate to an OCI runtime for the actual container execution. By introducing the CRI, the Kubernetes authors effectively decoupled the Kubelet from the underlying container runtime in an extensible way. CRI implementations interoperate with all the OCI runtimes, either natively or via plugins/shims, including the sandboxed and virtualized ones (a minimal CRI client sketch follows the list of implementations below).

  • dockershim: the first CRI implementation was the dockershim, which provided the agreed-upon layer of abstraction in front of the Docker engine.

  • containerd: it is Docker’s high-level runtime, managed and developed out in the open under the Moby project. By default it uses runc under the hood. Like the rest of the container tools that originated from Docker, it is the current de facto standard CRI implementation. It provides all the core functionality of a CRI and more, and it has a plugin design: cri-containerd implements the CRI, and various shims exist to integrate containerd with low-level runtimes such as Kata Containers.

  • cri-o: it is the reference implementation of the Kubernetes CRI, built to enable the use of OCI-compatible runtimes. It is a lightweight alternative to using Docker as the runtime for Kubernetes: it allows Kubernetes to use any OCI-compliant runtime as the container runtime for running pods. It supports runc and Kata Containers as container runtimes, but any OCI-conformant runtime can be plugged in.
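
Because containerd (through its CRI plugin) and CRI-O expose the same gRPC interface, a CRI client does not care which of them sits on the other end of the socket. As a minimal sketch using the published Go bindings (k8s.io/cri-api), the snippet below issues the simplest CRI call, Version, against an assumed containerd endpoint; the crictl tool performs essentially the same call with crictl version.

```go
package main

import (
    "context"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    runtimev1 "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
    // Assumed endpoint: containerd's CRI plugin. CRI-O would typically be
    // reached at unix:///var/run/crio/crio.sock instead.
    conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := runtimev1.NewRuntimeServiceClient(conn)

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // Version is the simplest CRI call; the same RuntimeService also exposes
    // RunPodSandbox, CreateContainer, StartContainer, and so on.
    resp, err := client.Version(ctx, &runtimev1.VersionRequest{})
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("runtime: %s %s (CRI API %s)", resp.RuntimeName, resp.RuntimeVersion, resp.RuntimeApiVersion)
}
```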

Container Engines

You may notice from the above that Docker is neither a CRI nor an OCI implementation, but uses both (via containerd and runc). In fact, it has additional features, like image building and signing, that are out of scope of both the CRI and OCI specs.

Docker calls its product the “Docker Engine”, and generically these full container tool suites may be referred to as Container Engines. No one except Docker provides such a full-featured single executable, but we can piece a comparable suite of tools together from the Container Tools project.

The Container Tools project follows the UNIX philosophy of small tools which do one thing well: