# Docker Image Guide: How to Build, Run, and Manage Images

```text
[root@prod-node-04 ~]# docker pull registry.internal.corp/analytics/data-cruncher:latest
latest: Pulling from analytics/data-cruncher
d5a1f291072d: Already exists
f23a467d5e21: Pull complete
4f4fb5514a3d: Pull complete
7d23456789ab: Extracting [==================================================>] 2.1GB/2.1GB
8e34567890cd: Extracting [========================>                          ] 1.2GB/2.4GB
failed to register layer: Error processing tar file(exit status 1): write /usr/lib/x86_64-linux-gnu/libLLVM-15.so.1: no space left on device
[root@prod-node-04 ~]# df -h /var/lib/docker
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p3  100G   96G     0 100% /var/lib/docker
[root@prod-node-04 ~]# date
Sun Oct 20 03:14:22 UTC 2024
```

The smell of a data center at 3:00 AM is a specific cocktail of ionized air, floor wax, and the metallic tang of overheating copper. I’m sitting on a cold raised-floor tile in Row 4, my laptop balanced on a crash cart, staring at the terminal output above. This is the "modern" infrastructure we were promised. This is the "efficiency" of the container revolution. 

A junior developer, likely someone who thinks a "socket" is something you plug a lamp into, pushed a 4.5GB image to the registry. They didn't bother to check the base image. They didn't bother to clean up the build dependencies. They just kept stacking layers like a toddler playing with sticky blocks until the underlying filesystem—a perfectly tuned XFS partition on a high-speed NVMe—choked to death on the sheer volume of redundant garbage.

We used to ship binaries. We used to ship 15MB ELF files that did one thing and did it well. Now, we ship entire operating systems, half-baked Python environments, and three different versions of the LLVM compiler just to run a microservice that calculates a sales tax. It is a disgrace.

## The Bloat is the Point

The "docker image" is not a revolutionary technology. It is a glorified tarball with an identity crisis. At its core, it is a collection of filesystem changes wrapped in a JSON manifest that tells the Docker Engine (currently v25.0.3 in this godforsaken cluster) how to stack them using a union filesystem. 
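If you doubt the tarball claim, you can fake up the shape of a `docker save` archive by hand—no Docker daemon required. A toy sketch (paths and names are illustrative, and the manifest is deliberately minimal):

```shell
# A container image is just layers of tar plus a JSON manifest.
# Build a toy one by hand to make the point.
set -e
mkdir -p /tmp/toy-image/layer1
echo "hello" > /tmp/toy-image/layer1/app.txt

# Each layer is itself a tarball of filesystem changes.
tar -C /tmp/toy-image/layer1 -cf /tmp/toy-image/layer1.tar app.txt

# The manifest tells the engine how to stack the layers.
cat > /tmp/toy-image/manifest.json <<'EOF'
[{"Config": "config.json", "RepoTags": ["toy:latest"], "Layers": ["layer1.tar"]}]
EOF

# The "image" archive: a tarball of tarballs plus JSON.
tar -C /tmp/toy-image -cf /tmp/toy.tar manifest.json layer1.tar
tar -tf /tmp/toy.tar
```

Run `tar -tf` against a real `docker save` output sometime; the structure is the same, just with more ceremony.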

The industry has moved away from the discipline of the "bare metal" era, where every byte of RAM was a precious resource, toward a culture of "storage is cheap, so let's be lazy." But storage isn't cheap when you're paying for the IOPS to pull a 4GB image across a saturated 10GbE link every time a pod restarts.

When you pull `debian:bookworm`, you aren't just getting the tools you need. You're getting a whole graveyard of legacy utilities, man pages, and shared libraries that your application will never touch. But that’s just the start. Developers treat these images like virtual machines. They `apt-get install` everything from `vim` to `net-tools` "just in case" they need to debug. 

The result? A bloated, sluggish monster. The "image" becomes a black box. Because it’s so easy to just `docker build`, nobody asks what’s actually inside. They don't see the 400MB of `.pyc` files or the `.git` directory accidentally copied into the image because someone forgot a `.dockerignore` file. The bloat isn't an accident; it's the inevitable outcome of a system that rewards convenience over competence.
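The fix for the accidental `.git` and `venv` copies is a `.dockerignore` at the root of the build context. A minimal sketch—the entries are illustrative, so tailor them to your repo:

```text
# .dockerignore — keep the build context lean
.git
venv/
__pycache__/
*.pyc
*.csv
.env
```

Anything matched here never even reaches the Docker daemon, so it can't end up in a layer by accident.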

## Layer Caching: The False Promise of Speed

Docker's marketing team loves to talk about layer caching. They tell you that if you don't change a layer, you don't have to rebuild it or pull it. It sounds like magic. In reality, it’s a leaky abstraction that creates more problems than it solves.

Docker's default storage driver is `overlay2`. It works by creating a series of directories on the host—usually under `/var/lib/docker/overlay2`—and merging them into a single unified view using the kernel's OverlayFS union mount. When you run a command in a Dockerfile, Docker creates a new layer. If you delete a file in that layer, it isn't actually gone. It’s just hidden by a "whiteout" file in the upper layer.

Look at this `docker history` output from the failed image:

```bash
[root@prod-node-04 ~]# docker history --no-trunc analytics/data-cruncher:latest
IMAGE          CREATED        CREATED BY                                                                                          SIZE      COMMENT
<missing>      2 hours ago    RUN /bin/sh -c apt-get update && apt-get install -y build-essential python3-dev libllvm15 && rm -rf /var/lib/apt/lists/* # buildkit   1.85GB    
<missing>      2 hours ago    COPY . /app # buildkit                                                                              2.1GB     
<missing>      3 hours ago    RUN /bin/sh -c pip install -r requirements.txt # buildkit                                           450MB     
<missing>      4 hours ago    FROM debian:bookworm                                                                                117MB     
```

Notice the 2.1GB `COPY . /app` layer? The developer copied their entire local environment, including a `venv` folder and several large CSV test files, into the image. Then, in the next layer, they installed `build-essential`. Even if they had tried to clean up the build tools in a later `RUN` command, the 1.85GB of compilers and headers would still exist in the previous layer, forever haunting the disk space of every node in the cluster.

The “cache” is only as good as the developer’s understanding of the build order. Change one line at the top of your Dockerfile, and every subsequent layer is invalidated. You’re back to pulling gigabytes of data. It’s a fragile system that encourages “clever” Dockerfile hacks rather than actual optimization.
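The only sane response is to order the Dockerfile from least- to most-frequently changed. For the Python image in the history output, a hedged sketch (the `requirements.txt` filename is assumed):

```dockerfile
# Copy only the dependency manifest first, so editing application code
# does not invalidate the (expensive) dependency layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes only invalidate layers from here down.
COPY . .
```

This is the one place where layer caching actually pays off: the dependency layer survives ordinary code churn.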

## Anatomy of a Disastrous Dockerfile

A “disastrous” Dockerfile is the norm, not the exception. I’ve seen things that would make a VAX-11 systems administrator weep. The most common sin is the failure to understand the execution context of the `RUN` command.

Every `RUN` instruction creates a new commit in the underlying storage driver. If you do this:

```dockerfile
RUN apt-get update
RUN apt-get install -y heavy-package
RUN rm -rf /var/lib/apt/lists/*
```

You have already failed. You have created three layers. The first layer contains the package indices. The second contains the package. The third contains only whiteout entries hiding the indices—the index files themselves are still sitting in the first layer’s filesystem blob, and every node still pulls them. You must chain these commands with `&&` (and backslash line continuations) inside a single `RUN` instruction to keep them in a single layer. But even then, you’re still fighting a losing battle against copy-on-write (CoW) overhead.
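The same three steps collapsed into one layer—a sketch, with `heavy-package` standing in for whatever you actually install:

```dockerfile
# One layer: install, use, and clean up before the layer is committed.
# The apt indices never survive into the image at all.
RUN apt-get update && \
    apt-get install -y --no-install-recommends heavy-package && \
    rm -rf /var/lib/apt/lists/*
```

The `--no-install-recommends` flag also stops apt from dragging in the "recommended" half of the distribution.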

Then there’s the `ADD` vs `COPY` debate. `ADD` is a security nightmare that can fetch remote URLs and unpack tarballs automatically. It’s non-deterministic and dangerous. `COPY` is better, but still, people use it to dump their entire build context into the image.

And don’t get me started on environment variables. I’ve found database passwords, API keys, and private SSH keys baked directly into image layers via `ENV` instructions. Because developers think the image is “private,” they treat it like a secure vault. They don’t realize that anyone with `docker pull` access can run `docker history` or `docker inspect` and see every single secret ever injected into a layer.
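If a build genuinely needs a credential, BuildKit’s secret mounts keep it out of the layers entirely. A sketch—the id `api_key` and the script name are placeholders:

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm
# The secret is mounted at /run/secrets/<id> for this RUN step only;
# it is never committed to a layer and never visible in docker history.
RUN --mount=type=secret,id=api_key \
    API_KEY="$(cat /run/secrets/api_key)" ./fetch-private-deps.sh
```

Build with `docker build --secret id=api_key,src=./api_key.txt .` — the file is handed to that one `RUN` step and never enters the image.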

## The “Latest” Tag is a Suicide Note

In the world of bare metal, we had version control. We had specific builds. In the world of Docker, people rely on the `:latest` tag.

The `:latest` tag is not a version. It is a floating pointer, a ghost that haunts your production environment. When you pull `:latest`, you are playing Russian Roulette with your uptime. You have no guarantee that the image you tested in staging is the same one being pulled in production.

If the upstream maintainer of `python:latest` (currently pointing to `3.12.2-bookworm`) decides to update a shared library that breaks your specific C-extension, your deployment will fail. Or worse, it will start, but it will behave non-deterministically.

```text
[root@prod-node-04 ~]# docker inspect --format='{{.RepoDigests}}' analytics/data-cruncher:latest
[registry.internal.corp/analytics/data-cruncher@sha256:a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6a1b2c3d4e5f6]
```

If you aren’t pinning your images to a specific SHA256 digest, you aren’t running a stable system. You’re running a collection of hopes and prayers. The “latest” tag is a suicide note written in YAML. It bypasses the entire concept of immutable infrastructure. True immutability requires a cryptographic hash of the content, not a friendly name that can be overwritten by any CI/CD pipeline with a grudge.
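Pinning by digest works in both `FROM` lines and pull commands. A sketch—the digest is a placeholder, so substitute the real one from `docker inspect --format='{{.RepoDigests}}'`:

```dockerfile
# Tags float; digests do not. Anyone can overwrite python:3.12,
# but nobody can overwrite what this digest resolves to.
FROM python@sha256:<pinned-digest>
```

The same syntax works on the command line: `docker pull python@sha256:<pinned-digest>` will fail loudly rather than silently hand you different content.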

## Stripping the Fat: A Manual for the Paranoid

If you must use containers—and it seems I am forced to by the powers that be—then do it with some dignity. The only acceptable way to build a Docker image is the multi-stage build. This is the only feature added to Docker in the last five years that isn’t total bloatware.

A multi-stage build allows you to use a “heavy” image (like `golang:1.22.1-bookworm`) to compile your code and then copy the resulting static binary into a `scratch` or “distroless” image.

```dockerfile
# Stage 1: The Build
FROM golang:1.22.1-bookworm AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 forces a fully static binary; distroless/static has no libc.
RUN CGO_ENABLED=0 go build -o /app/server main.go

# Stage 2: The Final Product
# (In production, pin this by digest rather than trusting a floating tag.)
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
```

This takes your image from 800MB down to 20MB. It removes the shell, the package manager, and every other tool an attacker could use once they find a vulnerability in your code.

But even “distroless” isn’t enough for the truly paranoid. You should be using `ldd` to trace every shared library dependency. You should be using `readelf` to check for stack canaries and NX bits. You should be statically linking everything. If your binary requires `libc.so.6`, you are still relying on the host’s kernel interface and the image’s library version matching up perfectly.
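Taken to its logical conclusion, the final stage is `scratch`: an image that is exactly one file. A sketch, assuming a builder stage that produced a fully static `/app/server`:

```dockerfile
# Nothing but the binary: no shell, no libc, no package manager.
# Requires a genuinely static binary (Go: CGO_ENABLED=0; C: gcc -static);
# a dynamically linked binary will die instantly with no libc to load.
FROM scratch
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]
```

If `docker exec` into the running container fails because there is no shell, that is the feature working as intended.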

The goal should be an image that contains exactly one file: your binary. Anything else is just an invitation for entropy to take hold.

## Why Your “Docker Image” is Actually a Security Nightmare

Let’s talk about the kernel. Developers think a container is a sandbox. It isn’t. A container is a process (or group of processes) running on the same kernel as the host, constrained by cgroups and namespaces.

When you ship a 4GB image filled with outdated libraries, you are shipping a massive attack surface. Every `.so` file in that image is a potential target. Even if your application doesn’t use `libssl.so.1.1`, if it’s sitting in the `/usr/lib` directory of your image, an attacker who gains remote code execution can use it.

Furthermore, the default configuration of Docker is a security disaster. Most images run as root. This means that if an attacker escapes the container—and container escapes happen with alarming frequency due to the complexity of the Linux kernel’s syscall interface—they are root on your host.

```text
[root@prod-node-04 ~]# docker inspect analytics/data-cruncher:latest | grep -i "user"
            "User": "",
```

An empty `User` field means the process is running as UID 0. In 2024, this is inexcusable. You are one `io_uring` vulnerability away from losing the entire rack.
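On a Debian-based image the fix is two lines at the end of the Dockerfile—a sketch, with the user name `app` as an arbitrary choice:

```dockerfile
# Create an unprivileged system user and drop root before the entrypoint runs.
RUN groupadd --system app && \
    useradd --system --gid app --no-create-home app
USER app:app
```

After this, `docker inspect` shows a non-empty `User` field, and a container escape lands the attacker in an unprivileged account instead of root.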

And then there’s the issue of the shared kernel. A container can make syscalls directly to the host kernel. We use seccomp profiles and AppArmor to try and restrict this, but the sheer number of syscalls (over 300 in modern kernels) makes it nearly impossible to secure perfectly. Every time you add a layer to your image, you’re adding more complexity, more potential for misconfiguration, and more “layers of lies” between you and the hardware.

The “docker image” has become a crutch for bad engineering. It allows people to ignore the fundamentals of systems programming, memory management, and security. They think that because it’s “containerized,” it’s safe and efficient. They are wrong.

I’m looking at the clock. It’s 4:00 AM now. I’ve cleared enough space in `/var/lib/docker` by nuking the build cache and deleting old images of “failed experiments” from other teams. The deployment is finally pulling.

```text
[root@prod-node-04 ~]# docker system prune -a --volumes
...
Total reclaimed space: 42.6GB
[root@prod-node-04 ~]# docker pull registry.internal.corp/analytics/data-cruncher:latest
...
Status: Downloaded newer image for registry.internal.corp/analytics/data-cruncher:latest
```

It works. For now. But tomorrow, another developer will push another 5GB monstrosity. They’ll add another layer of abstraction, another “easy” tool that hides the reality of the hardware. And I’ll be back here, in the cold, listening to the fans scream, cleaning up the mess left by a generation that forgot how to talk to the metal.

The “docker image” isn’t the future. It’s just a very heavy, very complicated way to avoid learning how a computer actually works. And until we stop treating infrastructure like a disposable toy, we will continue to suffer the consequences of our own bloat.

Now, if you’ll excuse me, I have to go check why the OOM Killer just took out the monitoring agent on node 07. I suspect someone tried to run a Java app in a container with a 256MB limit. Some people never learn.
