Demystifying Kubernetes Networking — Episode 2
In Episode 2, we look at Kubernetes’ secret recipe: the Pause container!
In the last episode of Demystifying Kubernetes Networking, we learnt about the Linux network namespace and its relevance to Kubernetes.
It’s the Linux network namespace that gives Kubernetes pods the network isolation they need and, at the same time, lets the containers inside a pod talk to each other!
But how does Kubernetes implement this? After all, a Pod is really just a logical construct, so how does it work on the machine? That’s the question we’re going to answer in today’s episode!
Video edition of this article —
https://sanjimoh.medium.com/s1e02-kubernetes-networking-series-dd97573e236f
Let’s start with a small exercise.
Log in to one of the nodes of a Kubernetes cluster (one that uses Docker as its container runtime), run the docker ps -a command and take a look at the output. Did you see containers with the “/pause” command in the output (something similar to the screenshot below)?
What the heck are these secret pause commands? Did you ever notice them before? Do you know what they are for?
From the last episode of Demystifying Kubernetes Networking, we learnt that the containers in a pod need to share the same network namespace to be able to reach each other through their loopback address. The way Kubernetes implements this is through Pause containers (also called Infra containers).
The Pause/Infra container uses a very small image, about 700KB (gcr.io/google_containers/pause). Its program is written in C and is always in the “pause” state. It really does nothing useful: the image basically contains a simple binary that goes to sleep and, apart from reacting to a few signals, never wakes up!
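To make the idea concrete, here is a minimal Python sketch of a process that does nothing but sleep until it is told to stop. It is only an illustration under stated assumptions: the real pause binary is written in C, the function name pause_forever is made up for this example, and signal.pause() is only available on Unix-like systems.

```python
import signal
import sys


def _handle_shutdown(signum, frame):
    # Exit cleanly when asked to stop (for example by the container runtime).
    sys.exit(0)


def pause_forever():
    """Sleep until SIGINT or SIGTERM arrives (hypothetical helper)."""
    signal.signal(signal.SIGINT, _handle_shutdown)
    signal.signal(signal.SIGTERM, _handle_shutdown)
    while True:
        # Suspends the process until any signal is delivered; uses no CPU.
        signal.pause()
```

Calling pause_forever() blocks until the process receives SIGINT or SIGTERM, which is roughly how the container sits idle while the other containers in the Pod do the real work.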
Kubernetes creates this additional Infra container as the first thing in each Pod on our behalf and all other containers are subsequently added to its network namespace.
The pause container is Kubernetes’ secret recipe: it holds the network namespace for the pod!
In a Linux environment, when you create a process, it inherits its namespaces from its parent process unless it is explicitly started in new ones. The Pause container is started in namespaces of its own and, once it is running, other processes can be added to those namespaces to form a Pod. A process joins an existing namespace through the setns system call. The Kubernetes Pod infrastructure automates this entire process for us, so in reality we never have to deal with it ourselves.
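As a rough sketch of what joining a namespace looks like, the snippet below opens the network namespace file of an already running process and calls setns on it. It assumes Linux and Python 3.12 or newer (where os.setns and os.CLONE_NEWNET are available), and the call normally needs root privileges (CAP_SYS_ADMIN). The helper name join_network_namespace and its proc_root parameter are hypothetical; this is not how the kubelet or a container runtime actually does it.

```python
import os


def join_network_namespace(pid, proc_root="/proc"):
    """Move the calling thread into the network namespace of `pid`."""
    # Every process exposes its namespaces as files under /proc/<pid>/ns/.
    ns_path = os.path.join(proc_root, str(pid), "ns", "net")
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        # Thin wrapper around the setns(2) system call (Python 3.12+).
        os.setns(fd, os.CLONE_NEWNET)
    finally:
        os.close(fd)
```

After a successful call, network operations made by the calling thread see the interfaces and the loopback address of the target process, which is the effect a container gets when it is added to the Pause container’s network namespace.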
When the containers of a Pod share a PID namespace, the pause container acts as the root process (PID 1) and reaps any zombie processes by calling the wait system call. This way, zombie processes do not pile up in the PID namespace of a Kubernetes pod.
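The reaping described above can be sketched in a few lines of Python. This is an illustration of the general technique (a non-blocking wait loop), not the pause container’s actual C code, and the function name reap_zombies is made up for this example. It assumes a Unix-like system and Python 3.9 or newer.

```python
import os


def reap_zombies():
    """Collect every child that has already exited, without blocking.

    Returns a dict mapping each reaped PID to its exit code.
    """
    reaped = {}
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)
    return reaped
```

A process acting as PID 1 would typically run a loop like this whenever it receives SIGCHLD, so that exited children are removed from the process table instead of lingering as zombies.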
Kubernetes pause container functionalities
- PID namespace: Containers in a Pod can see each other’s processes and process IDs (when PID namespace sharing is enabled for the Pod).
- Network namespace: Containers in a Pod share the same IP address and port range.
- IPC namespace: Containers in a Pod can communicate with each other using SystemV IPC or POSIX message queues.
- UTS namespace: Containers in a Pod share a host name. (Storage volumes can also be shared between the containers of a Pod.)
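One way to check which of these namespaces two processes actually share is to compare the links under /proc/<pid>/ns/, since two processes are in the same namespace exactly when the links point to the same identifier. The sketch below assumes Linux; the names POD_NAMESPACES, namespace_ids and shared_namespaces, as well as the proc_root parameter, are hypothetical helpers for this example. Reading these links for another user’s processes usually requires root.

```python
import os

# The four namespace kinds discussed in the list above.
POD_NAMESPACES = ("pid", "net", "ipc", "uts")


def namespace_ids(pid, proc_root="/proc"):
    """Return {kind: identifier}, e.g. {'net': 'net:[4026531992]', ...}."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {
        kind: os.readlink(os.path.join(ns_dir, kind))
        for kind in POD_NAMESPACES
    }


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """List the namespace kinds that both processes have in common."""
    ids_a = namespace_ids(pid_a, proc_root)
    ids_b = namespace_ids(pid_b, proc_root)
    return [kind for kind in POD_NAMESPACES if ids_a[kind] == ids_b[kind]]
```

On a node, passing the PID of a pause process and the PID of an application process from the same Pod should report at least the network namespace as shared.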
What happens if the Pause container gets deleted?
When the Pause container gets deleted, Kubernetes considers the entire Pod unavailable, even though the other business containers in the Pod are up and running. Because of this, Kubernetes recreates the Pod around a new Pause container, with a completely new IP address. But if only a business container is stopped, Kubernetes won’t re-create the pod: only that business container is re-created, without any change in IP.
Future of Pause container
In the CRI-O community, there have been discussions about whether Pause containers continue to make sense or whether they could be replaced with alternatives (pinNS).
Some of the factors driving these discussions:
- Space taken up by the Pause containers on a node, especially when the node runs many Pods.
- Time taken to create, mount and start a Pause container.
- Process management overhead of the Pause container: no process, no management!
In the next episode of the Demystifying Kubernetes Networking series, we’ll get a bit more involved and learn about the goals of the Kubernetes networking model and how they have been implemented.
[Update] Link to Episode 3: https://sanjimoh.medium.com/demystifying-kubernetes-networking-episode-3-d00a260c389f