Docker Best Practices: Build Efficient, Secure Containers

I just ran docker images on the “Hello World” container you pushed to the registry.

REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
hello-world-app     latest    f3a2b1c0d9e8   2 minutes ago    2.14GB

Two. Gigabytes. For a Python script that prints “Hello World” and exits. In 1994, I was managing entire SunOS clusters on less disk space than you’ve wasted on a single container layer. You’ve managed to package an entire operating system, a compiler toolchain, three different versions of the Python interpreter, and probably the developer’s local Downloads folder into a single OCI image.

This isn’t “modern development.” This is a digital crime scene. You’ve treated Docker like a virtual machine, and in doing so, you’ve created a slow, insecure, and unmaintainable nightmare that I now have to fix before it touches my production nodes.

THE CRIME SCENE: A 2GB HELLO WORLD MANIFESTO

Here is the Dockerfile you handed me. I’ve printed it out just so I could physically throw it in the trash, but for the sake of this post-mortem, let’s look at the wreckage:

# THE CRIME SCENE
FROM ubuntu:latest

# Running update without cleaning up the cache
RUN apt-get update
RUN apt-get install -y python3 python3-pip build-essential

# Using ADD instead of COPY for no reason
ADD . /app
WORKDIR /app

# Installing dependencies as root
RUN pip3 install -r requirements.txt

# No user defined, running as root
CMD ["python3", "hello.py"]

THE RECKONING: DISSECTING YOUR INCOMPETENCE

Let’s go through this line by line, because apparently, the concept of “efficiency” has been lost to your generation of “move fast and break things” developers.

1. FROM ubuntu:latest
You used the latest tag. Never use the latest tag. It is a non-deterministic pointer to whatever the maintainer felt like pushing five minutes ago. When this build fails in six months because ubuntu:latest moved from 22.04 to 24.04 and broke a dependency, I’m not the one who’s going to stay up until 3 AM fixing it. You are. Furthermore, ubuntu is a general-purpose distribution. It contains binaries for things your app will never touch—utilities for hardware management, internationalization files, and man pages. You don’t need sed, awk, or grep inside a container that only runs a Python script.

2. Multiple RUN commands
Every RUN instruction in a Dockerfile creates a new layer in the filesystem. You ran apt-get update in one layer and apt-get install in another. Do you know what that does? It stores the entire package index in the first layer. Even if you tried to delete it in a later layer, it stays in the image history forever. It’s like painting a wall black, then painting it white, and wondering why the coat of paint is so thick.

3. build-essential in a production image
Why are gcc, g++, and make in my production environment? Unless your app is compiling C extensions at runtime (which it shouldn’t be), this is a massive security risk. If an attacker gains shell access to your container, you’ve kindly provided them with all the tools they need to compile a rootkit or a crypto-miner right on the spot.

4. The ADD instruction
You used ADD . /app. The ADD instruction is a relic. It has “magic” features like auto-extracting tarballs and fetching remote URLs. Unless you specifically need that magic, use COPY. It’s explicit. It’s predictable. And because you didn’t include a .dockerignore file, you just copied your .git directory, your __pycache__, and your local .env files into the image. That’s where that 2GB is coming from.

5. Running as root
You didn’t define a USER. By default, the process inside a container runs as root. If there is a container escape vulnerability in the kernel (and there will be), the attacker doesn’t just have your app; they have the host. Running as root in a container is a firing offense in my book.
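The five complaints above are mechanical enough to check with a script. The following is a minimal sketch, not a real Dockerfile parser: the function name lint_dockerfile and the warning texts are made up for illustration, and the checks are rough string heuristics that will miss plenty of edge cases.

```python
import re


def lint_dockerfile(text: str) -> list[str]:
    """Flag the five problems from the list above. Heuristic, not a real parser."""
    # Join backslash-continued lines so each instruction is one logical line.
    text = re.sub(r"\\[ \t]*\n", " ", text)
    warnings = []
    stage_names = set()
    runs_as_root = True    # state of the current stage; the last stage is what ships
    has_compiler = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        args = parts[1] if len(parts) > 1 else ""
        if instruction == "FROM":
            tokens = [t for t in args.split() if not t.startswith("--")]
            image = tokens[0] if tokens else ""
            tail = image.rsplit("/", 1)[-1]  # drop registry host:port and path
            pinned = "@" in image or (":" in tail and not tail.endswith(":latest"))
            if image != "scratch" and image not in stage_names and not pinned:
                warnings.append(f"FROM {image}: pin a specific tag or digest")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stage_names.add(tokens[2])
            # A new stage starts from the base image again.
            runs_as_root = True
            has_compiler = False
        elif instruction == "RUN":
            if re.search(r"apt-get\s+update", args) and "rm -rf /var/lib/apt/lists" not in args:
                warnings.append("RUN apt-get update without cleanup in the same layer")
            if "build-essential" in args:
                has_compiler = True
        elif instruction == "ADD":
            warnings.append("ADD used: prefer COPY unless you need its extras")
        elif instruction == "USER":
            runs_as_root = args.split(":")[0].strip() in ("root", "0")
    if has_compiler:
        warnings.append("build-essential is installed in the final stage")
    if runs_as_root:
        warnings.append("final stage never switches to a non-root USER")
    return warnings


if __name__ == "__main__":
    sample = "FROM ubuntu:latest\nRUN apt-get update\nADD . /app\n"
    for warning in lint_dockerfile(sample):
        print(warning)
```

Run against the crime scene Dockerfile, a check like this should complain five times; run against a pinned, multi-stage, non-root Dockerfile, it should stay quiet.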

THE REFACTORING LOG: STRIPPING THE BLOAT

We are going to rebuild this using a multi-stage build and a specific, slim base image. We are going to use python:3.11.5-slim-bookworm. Why? Because it’s based on Debian, it’s predictable, and it doesn’t include the kitchen sink.

# STAGE 1: Builder
FROM python:3.11.5-slim-bookworm AS builder

# Set environment variables to keep Python from being annoying
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

WORKDIR /build

# Install build dependencies only here
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# STAGE 2: Final Runtime
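FROM python:3.11.5-slim-bookworm

# ENV does not carry over between stages, so set the runtime variables again
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1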

# Create a non-privileged user
RUN groupadd -g 10001 appuser && \
    useradd -u 10001 -g appuser -s /bin/sh appuser

WORKDIR /app

# Copy only the installed site-packages from the builder
COPY --from=builder /install /usr/local
COPY --chown=appuser:appuser hello.py .

USER appuser

ENTRYPOINT ["python", "hello.py"]

Look at the difference. We use a “Builder” stage to compile anything that needs compiling, then we copy only the resulting artifacts into a fresh, clean “Runtime” stage. The build-essential garbage never makes it to the final image. We use --no-install-recommends to avoid installing “recommended” packages that we don’t need. We clean up /var/lib/apt/lists/* in the same RUN command to ensure the cache isn’t persisted in the layer.

THE LAYER CAKE OF LIES: UNDERSTANDING OVERLAYFS

To understand why your 2GB image is a failure, you need to understand how the storage driver works. Most modern Docker installations use overlay2. It’s a union filesystem. It stacks directories on top of each other and presents them as a single unified view.

When you run a command like RUN apt-get update, Docker creates a new directory (a layer). Any files modified or created during that command are written to this new directory. If you run RUN rm -rf /var/lib/apt/lists/* in a subsequent line, Docker doesn’t actually delete those files from the underlying storage. It creates a “whiteout” file in the new layer that tells the union filesystem to hide the file from the merged view. The bits are still there, taking up space, being pushed to the registry, and being pulled by the production servers.
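To make the whiteout behavior concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not of overlay2 itself: the WHITEOUT marker, the image_sizes function, and the file paths are invented for illustration, and the byte counts simply reuse round numbers in the spirit of this post.

```python
WHITEOUT = object()  # stands in for an overlay whiteout entry


def image_sizes(layers):
    """Return (bytes stored across all layers, bytes visible in the merged view).

    Each layer is a dict mapping a path to its size in bytes, or to WHITEOUT
    when that layer deletes the path.
    """
    stored = 0
    merged = {}
    for layer in layers:
        for path, value in layer.items():
            if value is WHITEOUT:
                # The file disappears from the merged view only.
                # The bytes written by the earlier layer are still stored.
                merged.pop(path, None)
            else:
                stored += value
                merged[path] = value
    return stored, sum(merged.values())


if __name__ == "__main__":
    index = "/var/lib/apt/lists/index"
    python3 = "/usr/bin/python3"

    # update, install, and cleanup as three separate RUN instructions
    separate = [{index: 32_000_000}, {python3: 450_000_000}, {index: WHITEOUT}]
    print(image_sizes(separate))  # (482000000, 450000000)

    # everything chained in one RUN: the index never reaches a committed layer
    chained = [{python3: 450_000_000}]
    print(image_sizes(chained))   # (450000000, 450000000)
```

Both images look identical from inside the container. Only the first one ships 32MB of package index that nobody can see.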

Let’s look at the docker history of your original disaster:

$ docker history --human --format "{{.CreatedBy}}: {{.Size}}" hello-world-app:latest

/bin/sh -c #(nop)  CMD ["python3" "hello.py"]: 0B
/bin/sh -c pip3 install -r requirements.txt: 850MB
/bin/sh -c #(nop) WORKDIR /app: 0B
/bin/sh -c ADD . /app: 1.1GB
/bin/sh -c apt-get install -y python3 python3-pip...: 450MB
/bin/sh -c apt-get update: 32MB
/bin/sh -c #(nop)  FROM ubuntu:latest: 77.8MB

Look at that ADD . /app line. 1.1GB. You copied your local venv folder and your .git history into the image. Then look at the pip install line. 850MB. Because you didn’t use --no-cache-dir, pip saved a copy of every wheel it downloaded in ~/.cache/pip.
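If you would rather not eyeball the history output, a few lines of Python can rank the layers for you. This is a rough sketch that assumes the "CreatedBy: Size" format used in the command above and decimal units (kB, MB, GB); parse_size and rank_layers are illustrative names, not part of any Docker tooling.

```python
import re

UNITS = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3}

HISTORY = """\
/bin/sh -c #(nop)  CMD ["python3" "hello.py"]: 0B
/bin/sh -c pip3 install -r requirements.txt: 850MB
/bin/sh -c #(nop) WORKDIR /app: 0B
/bin/sh -c ADD . /app: 1.1GB
/bin/sh -c apt-get install -y python3 python3-pip...: 450MB
/bin/sh -c apt-get update: 32MB
/bin/sh -c #(nop)  FROM ubuntu:latest: 77.8MB
"""


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '1.1GB' into bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(B|kB|MB|GB)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return round(float(match.group(1)) * UNITS[match.group(2)])


def rank_layers(history: str):
    """Return (size_in_bytes, created_by) tuples, largest layer first."""
    rows = []
    for line in history.splitlines():
        if not line.strip():
            continue
        # Split on the last ': ' because the command itself may contain colons.
        created_by, _, size = line.rpartition(": ")
        rows.append((parse_size(size), created_by.strip()))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    for size, created_by in rank_layers(HISTORY)[:3]:
        print(f"{size / 1000**2:8.1f} MB  {created_by}")
```

The top of that list is where the cleanup effort should go first.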

In the refactored version, we chain commands using && and clean up in the same layer. This ensures that the temporary files never get committed to the image’s read-only layers. This is a fundamental Docker best practice that separates the engineers from the script kiddies.

THE SECURITY HARDENING: ROOT IS A FIRING OFFENSE

I mentioned running as root earlier. Let’s get into the weeds. When you run a process as root inside a container, it is, by default, the same UID 0 as the root user on the host. While Docker uses Linux Namespaces to isolate the process, namespaces are not a perfect sandbox. There have been numerous “container escape” vulnerabilities (like Dirty Pipe or various runc exploits) where a process inside the container can break out.

If that process is running as root, it has root privileges on your host kernel. By creating a specific appuser with a high UID (like 10001) and using the USER instruction, we ensure that even if a breakout occurs, the attacker is trapped in a low-privilege account.
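The USER instruction is the real fix, but an application can also refuse to start as UID 0 as a second line of defense. A minimal sketch, assuming a hypothetical check_not_root helper called at the top of hello.py:

```python
import os


def check_not_root(euid: int) -> None:
    """Raise PermissionError when the given effective UID is root (0)."""
    if euid == 0:
        raise PermissionError("refusing to run as root; set USER in the Dockerfile")


# Intended use at startup, before doing any real work:
#     check_not_root(os.geteuid())
#     print("Hello World")
```

The check takes the UID as a parameter so it can be exercised without actually being root.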

Furthermore, we should be talking about Linux Capabilities. A standard process doesn’t need the ability to change the system clock (CAP_SYS_TIME), modify kernel modules (CAP_SYS_MODULE), or bypass file permissions (CAP_DAC_OVERRIDE). When I deploy your container, I’m going to use the following flags:

docker run --cap-drop ALL --cap-add NET_BIND_SERVICE --security-opt no-new-privileges:true hello-world-app:v1.0.0

This drops every single kernel capability and adds back only NET_BIND_SERVICE, which a process needs to bind to a privileged port (below 1024). A script that just prints “Hello World” doesn’t even need that one. The no-new-privileges flag prevents the process from gaining new privileges via setuid or setgid binaries. If you had written your Dockerfile correctly, you would have known this.
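To see what those flags actually leave you with, you can decode the capability mask the kernel reports in /proc/self/status. The sketch below maps bit positions to the standard Linux capability names for bits 0 through 31; decode_caps and effective_caps are illustrative helper names, and the code assumes a Linux procfs.

```python
# Index in this list = capability bit number (Linux capabilities 0..31).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
]


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex capability mask (as shown in /proc/<pid>/status) into names."""
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits above 31 exist on newer kernels; report them by number.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"bit {bit}")
        mask >>= 1
        bit += 1
    return names


def effective_caps(status_path: str = "/proc/self/status") -> list[str]:
    """Read the CapEff line for the current process and decode it."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    print(decode_caps("0000000000000400"))  # ['CAP_NET_BIND_SERVICE']
```

Running effective_caps() inside the container is a quick sanity check that the flags took effect. Note that a non-root process normally reports an empty effective set regardless of what was added at the container level.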

THE NUANCES OF COPY VS ADD

I see juniors get this wrong constantly. They think ADD is just COPY with a shorter name. It isn’t.

COPY is the preferred instruction. It does exactly what it says: it copies files or directories from your build context into the container. It is transparent and easy to audit.

ADD, on the other hand, has two “features” that are actually security vulnerabilities in disguise:
1. Remote URL Support: If you use ADD https://example.com/big-file.tar.gz /app/, Docker will download that file. Historically the instruction offered no checksum verification at all, and the --checksum flag added in newer BuildKit Dockerfile syntax only helps if you remember to use it. Without a checksum, you have no idea if that file was tampered with at the source or in transit. A portable approach is to use RUN curl ... && sha256sum ....
2. Auto-Extraction: If you ADD a local .tar.gz file, Docker will automatically extract it into the destination directory. This sounds convenient until you realize it can lead to “Zip Slip” style attacks or simply unexpected filesystem layouts.

By using COPY, we maintain a strict audit trail of what is entering the image. We also leverage the layer cache more effectively. Docker’s build cache works by comparing the checksum of the files being copied. If you use ADD with a remote URL, the cache behavior becomes unpredictable.
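The verification step itself is nothing exotic. In a Dockerfile you would normally reach for sha256sum, but the same check is sketched here in Python so the logic is visible; sha256_of and verify are illustrative names, and the expected digest is something you would obtain out of band from the publisher.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True only when the file's SHA-256 matches the pinned digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A build step would fail hard when verify returns False instead of carrying on with whatever was downloaded.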

SYSCALL FILTERING AND SECCOMP

Since you’re so fond of using ubuntu:latest, I assume you haven’t given a single thought to the system calls your application makes. Every time your Python script interacts with the outside world—opening a file, sending a network packet, checking the time—it makes a syscall to the host kernel.

A standard Docker container has access to about 300 syscalls. Your “Hello World” app probably needs about 15. By leaving the other 285 syscalls available, you are increasing the attack surface of the kernel.

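In a real environment, I would trace the application process itself. Wrapping docker run in strace only traces the Docker CLI client, because the container process is started by the daemon, not by the client. For a script this short-lived, the simplest option is to run it under strace directly, for example in a throwaway debug build of the image that has strace installed:

strace -c -f python hello.py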

Then, I would generate a Seccomp (Secure Computing Mode) profile that explicitly allows only those syscalls and denies everything else. If your app suddenly tries to call mount() or ptrace(), the kernel will reject the call or kill the process outright, depending on the action the profile specifies. This is how we run production systems. We don’t just “hope” the code is safe; we enforce safety at the syscall level.
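Turning a trace into a profile can be scripted. The sketch below parses the summary table that strace -c prints and emits JSON in the shape Docker accepts for --security-opt seccomp=profile.json. The sample summary is invented for illustration, as are the function names; a real Python process makes far more syscalls than these four, and the container runtime needs some of its own to start the process, so treat the output as a starting point to test rather than a finished allowlist.

```python
import json

# Hypothetical strace -c output, trimmed to four syscalls for readability.
SAMPLE_SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 40.00    0.000020           4         5           read
 30.00    0.000015           3         5         1 openat
 20.00    0.000010          10         1           write
 10.00    0.000005           5         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.000050                    12         1 total
"""


def syscalls_from_strace_summary(summary: str) -> list[str]:
    """Extract the syscall names from a strace -c summary table."""
    names = set()
    for line in summary.splitlines():
        fields = line.split()
        # Skip blank lines, short rows, and the final 'total' row.
        if len(fields) < 4 or fields[-1] == "total":
            continue
        # Data rows start with a percentage; the header and rulers do not.
        try:
            float(fields[0])
        except ValueError:
            continue
        names.add(fields[-1])
    return sorted(names)


def seccomp_profile(allowed: list[str]) -> dict:
    """Build a deny-by-default profile that allows only the given syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",       # everything else fails with an error
        "architectures": ["SCMP_ARCH_X86_64"],   # assumption: x86-64 hosts only
        "syscalls": [{"names": allowed, "action": "SCMP_ACT_ALLOW"}],
    }


if __name__ == "__main__":
    allowed = syscalls_from_strace_summary(SAMPLE_SUMMARY)
    print(json.dumps(seccomp_profile(allowed), indent=2))
```

The default action here returns an error to the caller; a stricter profile could use a kill action instead, at the cost of much harder debugging.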

THE FINAL VERDICT: ADHERING TO DOCKER BEST PRACTICES OR QUITTING

The industry has become lazy. Because compute is cheap and disk space is plentiful, people think it’s okay to ship 2GB containers for trivial tasks. It isn’t. Large images take longer to pull, which slows down auto-scaling during traffic spikes. They take longer to scan for vulnerabilities. They consume more expensive NVMe storage on our build servers.

Adhering to Docker best practices isn’t about being a pedant; it’s about professional engineering. It’s about knowing that when the pager goes off at 4 AM, the container is as slim, secure, and predictable as possible.

Here is your homework. You will take that 2GB monstrosity and you will refactor it until it is under 100MB. You will use a multi-stage build. You will use a non-root user. You will use a specific version tag for the image in every FROM line. And if I see apt-get update without a corresponding cleanup in the same layer again, I will personally revoke your access to the production cluster and move your desk next to the server room intake fans.

Docker is a tool for isolation and packaging, not a dumping ground for your messy development environment. Treat the filesystem with respect, treat the kernel with caution, and for the love of Ken Thompson, stop using latest.

Now, get out of my office and fix this. I have a Perl script from 1998 that’s more efficient than your entire career.
