INCIDENT REPORT #9902-B: THE DEATH OF OUR REGISTRY
Date: October 14, 2023
Duration: 48 Hours (and my sanity)
Status: Post-Mortem / Educational Intervention
Lead SRE: [REDACTED]
1. Summary of the Carnage
At 03:14 UTC, the monitoring stack for our production cluster started screaming. Node prod-worker-04, running Docker Engine v25.0.3, reported a total disk exhaustion event on the /var/lib/docker partition. Within twelve minutes, the failure cascaded. The registry, hosted on a dedicated S3-backed instance, began throwing 504 Gateway Timeouts. The reason? A junior developer—who I will not name but who is currently banned from touching the CI/CD pipeline—pushed a 4.2GB image tagged as latest.
This wasn’t just a “large file.” This was a technical crime. The image was based on a bloated Debian Bookworm environment, into which someone had decided to COPY . . their entire local development environment, including a 2GB .git folder, a node_modules directory the size of a small moon, and several unoptimized machine learning model weights.
When the scheduler tried to pull this monstrosity across twenty nodes simultaneously, the internal network saturated. The Docker Engine, using API version 1.44, attempted to decompress these layers. The resulting IOPS spike killed the underlying EBS volumes. We didn’t just have a slow deployment; we had a total infrastructure seizure. I’ve spent the last two days manually cleaning up orphaned layers and fixing corrupted overlay2 backing stores. My coffee is cold, my eyes are vibrating, and if I see another “latest” tag, I’m quitting to become a carpenter.
2. The Layered Lie: Dissecting the Manifest
To understand what is actually sitting on your disk, you have to stop thinking of a Docker image as a single file or a virtual machine disk. It isn’t. An image is a collection of read-only layers, orchestrated by a JSON manifest. When you run docker pull, you aren’t downloading a “program”; you are fetching a series of tarballs and a set of instructions on how to stack them.
Let’s look at the manifest of the disaster in question. This is the flattened manifest.json you get from a docker save archive; the registry’s wire format carries the same information as media-typed descriptors:
```json
[
  {
    "Config": "8f3a2b1c...json",
    "RepoTags": ["our-app:latest"],
    "Layers": [
      "sha256:a3ed95caeb02...",
      "sha256:5f70bf18a086...",
      "sha256:7c3d1f2b4e5a..."
    ]
  }
]
```
Each of those SHA-256 hashes represents a layer. The hash is generated by taking the content of the layer’s tarball and running it through the SHA-256 algorithm. This is “content-addressable storage.” If the content changes by even one bit—say, you added a single space to a YAML file—the hash changes completely.
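Content addressing is trivial to demonstrate; nothing here is Docker-specific, just sha256sum on two files that differ by a single character (the file names are mine, for illustration):

```shell
# One changed byte yields a completely different content address.
printf 'replicas: 3\n'  > a.yaml
printf 'replicas: 33\n' > b.yaml        # a one-character edit
hash_a=$(sha256sum a.yaml | cut -d' ' -f1)
hash_b=$(sha256sum b.yaml | cut -d' ' -f1)
echo "$hash_a"
echo "$hash_b"
```

The two digests share nothing. That avalanche property is exactly what lets the registry deduplicate identical layers and distrust everything else.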
The junior dev’s mistake was thinking that by “overwriting” the latest tag, they were replacing the image. No. They were just adding more garbage to the stack. The registry now has to store every single version of those 4GB layers because the “latest” tag is just a mutable pointer. It’s a lie. It’s a shortcut that leads directly to production outages.
3. Deconstructing the Manifest: Content-Addressable Chaos
When we talk about image identity, we are talking about the DiffID and the ChainID. This is where people get lost in the hash soup of their own making.
The Docker Engine v25.0.3 uses these hashes to ensure integrity. When a layer is downloaded, Docker calculates the SHA-256 of the compressed artifact (the Distribution Hash). Then, it decompresses it and calculates the hash of the uncompressed content (the DiffID).
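You can reproduce the two-hash situation without a registry. A minimal sketch, using a throwaway file as a stand-in for a layer tarball:

```shell
# The registry addresses the *compressed* blob; the engine's DiffID is the
# hash of the *uncompressed* tar. Same content, two different digests.
printf 'pretend this is a layer tarball\n' > layer.tar
diff_id=$(sha256sum layer.tar | cut -d' ' -f1)       # DiffID (uncompressed)
gzip -n -k -f layer.tar                              # -n: reproducible gzip output
dist_hash=$(sha256sum layer.tar.gz | cut -d' ' -f1)  # distribution hash
echo "DiffID:       sha256:$diff_id"
echo "Distribution: sha256:$dist_hash"
```

This is why the digest you see in docker pull output never matches the DiffIDs in docker inspect: they hash different byte streams.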
If you run docker inspect on the bloated image, you see the horror:
```text
$ docker inspect our-app:latest
[
    {
        "Id": "sha256:8f3a2b1c...",
        "RepoTags": [
            "our-app:latest"
        ],
        "RepoDigests": [
            "our-app@sha256:d4e5f6..."
        ],
        "Parent": "",
        "Comment": "buildkit.dockerfile.v0",
        "Created": "2023-10-14T03:00:00Z",
        "Container": "...",
        "DockerVersion": "25.0.3",
        "Author": "",
        "Config": {
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "python",
                "app.py"
            ],
            "Image": "sha256:8f3a2b1c..."
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 4294967296,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/layer_a/diff:/var/lib/docker/overlay2/layer_b/diff",
                "MergedDir": "/var/lib/docker/overlay2/layer_c/merged",
                "UpperDir": "/var/lib/docker/overlay2/layer_c/diff",
                "WorkDir": "/var/lib/docker/overlay2/layer_c/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:a3ed95caeb02...",
                "sha256:5f70bf18a086...",
                "sha256:7c3d1f2b4e5a..."
            ]
        },
        "Metadata": {
            "LastTagTime": "2023-10-14T03:05:00Z"
        }
    }
]
```
Look at that Size field. 4.2 billion bytes. Most of that is dead weight. Because the developer used Debian Bookworm as a base instead of something sane like Alpine 3.19, they started with a 100MB+ footprint before they even wrote a line of code. Then they piled on the layers.
The RootFS.Layers array shows the stack. Each hash here is a DiffID. If you want to know why your builds are slow, it’s because every layer from the first changed instruction onward has to be rebuilt, re-tarred, and re-hashed on every rebuild; the build cache only saves you for the layers above the change. It’s a massive CPU tax paid for the privilege of laziness.
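For the record, the ChainID mentioned earlier is derived from these DiffIDs. Per the OCI image spec, the bottom layer’s ChainID equals its DiffID, and each subsequent ChainID is the SHA-256 of the previous ChainID, a space, and the next DiffID. A sketch with made-up DiffID values (the real ones are 64 hex digits of entropy, not repeated digits):

```shell
# ChainID derivation per the OCI image spec (DiffIDs here are fake, for illustration).
diffid0="sha256:1111111111111111111111111111111111111111111111111111111111111111"
diffid1="sha256:2222222222222222222222222222222222222222222222222222222222222222"
chainid0=$diffid0   # the bottom layer's ChainID is just its DiffID
# ChainID(n) = sha256("<ChainID(n-1)> <DiffID(n)>"), no trailing newline
chainid1="sha256:$(printf '%s %s' "$chainid0" "$diffid1" | sha256sum | cut -d' ' -f1)"
echo "$chainid1"
```

The point of the chaining: a layer’s on-disk identity depends on everything underneath it, so the same tarball stacked on different parents gets a different ChainID.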
4. The Overlay2 Autopsy: Where the Files Actually Live
To understand what is happening on the filesystem, we have to look at /var/lib/docker/overlay2/. This is the “Graph Driver.” It’s where the magic—and the nightmares—happen.
When you pull an image, Docker creates a directory for each layer. It uses the overlay2 storage driver to mount these layers on top of each other. This isn’t a simple copy. It uses the mount syscall with the overlay type.
The kernel takes the LowerDir (the read-only layers), the UpperDir (the changes made in the current layer), and presents them as a single, unified filesystem in the MergedDir.
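You can build one of these unions by hand to see exactly what the engine is doing. A hedged sketch (requires root on a kernel with overlayfs; the directory names are mine, not Docker’s):

```text
# A minimal overlayfs mount by hand
$ mkdir lower upper work merged
$ echo "from the image" > lower/base.txt
$ echo "from the layer" > upper/change.txt
$ mount -t overlay overlay \
    -o lowerdir=lower,upperdir=upper,workdir=work merged
$ ls merged/
base.txt  change.txt
```

Writes into merged land in upper; lower is never touched. That is the entire copy-on-write trick, minus Docker’s bookkeeping.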
During the incident, I had to manually inspect the diff directories to find out where the 4GB was coming from.
```text
# Navigating the graveyard of disk space
$ cd /var/lib/docker/overlay2/
$ du -sh * | sort -hr | head -n 5
4.2G    l/6W7Q...
1.2G    l/2B9R...
...
```
The l directory contains short symbolic links to the actual layer directories; overlay2 uses them to keep the lowerdir= option string under the kernel’s page-size limit on mount(2) data. If you look inside one of these diff folders, you see the actual files. In our case, I found a /root/.cache/pip directory that was 1.5GB. Why? Because the dev didn’t use --no-cache-dir in their Dockerfile.
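The fix is boring. A sketch of a saner Dockerfile (base image, file names, and paths are illustrative, not our actual build):

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Copy only the dependency list first so this layer caches across code edits
COPY requirements.txt .

# --no-cache-dir keeps /root/.cache/pip out of the layer entirely
RUN pip install --no-cache-dir -r requirements.txt

# Copy only the source tree, not the whole working directory
COPY src/ ./src/

CMD ["python", "src/app.py"]
```

Pair it with a .dockerignore listing .git, node_modules, and model weights, and the COPY . . class of accident stops making it into the image at all.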
This is the reality of Docker: it’s just a clever way of lying to the process about where its files are. But the disk doesn’t lie. The disk just fills up until the kernel starts killing processes to save itself.
5. The “Latest” Sin and the Immutable Myth
We need to talk about the “latest” tag. In a sane world, we would use immutable digests. A digest looks like this: our-app@sha256:d4e5f6.... This is a permanent, unchangeable reference to a specific manifest.
The “latest” tag is a pointer. It’s like a DNS record that someone can change at any moment. When the junior dev pushed their 4GB image with the latest tag, every node in the cluster running a workload with imagePullPolicy: Always (another mistake) immediately tried to pull it.
Because the tag was the same, but the underlying SHA-256 digest had changed, Docker Engine v25.0.3 realized its local cache was invalid. It began the “Death Pull.”
The OCI (Open Container Initiative) Image Specification defines how these tags and digests work. A tag is just a reference in the registry’s key-value store. It points to a manifest. The manifest points to the layers. By using latest, you are essentially saying, “I want whatever random garbage was most recently uploaded to this name.”
In production, this is suicide. We use specific version tags or, better yet, the SHA-256 digest of the image. If we had been using digests, the new 4GB image would have sat harmlessly in the registry until we explicitly updated the deployment manifest. Instead, it was a forced injection of bloat.
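Pinning costs one command. Resolve the tag to its digest once, then reference the digest everywhere (the digest shown is the truncated one from the inspect output earlier; use the full 64-hex value in practice):

```text
$ docker inspect --format '{{index .RepoDigests 0}}' our-app:latest
our-app@sha256:d4e5f6...
$ docker pull our-app@sha256:d4e5f6...
```

A pull by digest does not care where latest has been repointed; it either fetches exactly that manifest or fails loudly.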
6. The 4GB Bloat: A Forensic History
I ran docker history --no-trunc on the offending image. This command is the only way to see the “why” behind the “what.” It shows the commands that created each layer.
```text
IMAGE          CREATED              CREATED BY
```