Docker Best Practices: Build Production-Ready Containers

The Grave We Dug with Default Dockerfiles

A Technical Post-Mortem and Remediation Guide

$ docker build -t api-service:latest .
[+] Building 412.4s (9/12)
 => [internal] load build definition from Dockerfile                       0.1s
 => [internal] load .dockerignore                                          0.1s
 => [internal] load metadata for docker.io/library/node:latest             1.2s
 => [1/8] FROM docker.io/library/node:latest                               0.0s
 => [2/8] WORKDIR /app                                                     0.5s
 => [3/8] COPY . .                                                        84.2s
 => [4/8] RUN npm install                                                210.4s
 => [5/8] RUN npm run build                                               45.1s
 => [6/8] EXPOSE 3000                                                      0.0s
 => [7/8] CMD ["npm", "start"]                                             0.0s
 => exporting to image                                                    70.2s
 => => exporting layers                                                   70.1s
 => => writing image sha256:4f52...                                        0.1s
 => => naming to docker.io/library/api-service:latest                      0.0s

$ docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
REPOSITORY          TAG                 SIZE
api-service         latest              2.84GB

$ docker inspect api-service:latest | grep -i "user"
            "User": "",

$ curl -I http://production-api-01:3000/health
HTTP/1.1 502 Bad Gateway
# Incident Log: 03:14 AM. Node memory exhaustion. 
# Overlay2 disk pressure at 98%. 
# Deployment rolled back. SRE team (me) awake for 48 hours.

I am staring at a 2.84GB container image. It contains a simple Node.js API that should, by all rights, occupy no more than 150MB. This isn’t just “inefficiency.” This is professional negligence. This is the result of “tutorial-driven development,” where someone copied a Dockerfile from a 2017 Medium post and called it a day.

The production failure tonight wasn’t a code bug. It was a failure of the infrastructure to breathe under the weight of its own bloat. When the overlay2 storage driver hit the disk limit because of layer accumulation, the kubelet started evicting pods. Because the images were nearly 3GB, the pull time for the replacement pods exceeded the liveness probe thresholds. We entered a death spiral.

We are going to fix this. We are going to implement Docker best practices, or I am going to start deleting repositories without warning.


1. The Sin of the “Latest” Tag and the Moving Target

The first thing I saw in the logs was FROM node:latest. This is a suicide note. When you use latest, you are telling the build daemon, “I don’t care what version of the OS or the runtime I use. Just give me whatever was pushed five minutes ago.”

Last night, latest updated. It brought in a new version of Debian with a different GLIBC version that conflicted with one of our native C++ bindings. The build passed. The tests passed in a stale CI environment. Production died.

To follow Docker best practices, you must pin your versions. Not just the language runtime, but the underlying OS distribution.

The Sin

FROM node:latest
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "index.js"]

The Penance

# Pinning to a specific LTS version on a stable, slim OS release
FROM node:20.11.0-alpine3.19

# Set environment to production to ensure devDependencies are ignored
ENV NODE_ENV=production

WORKDIR /app

# We will address the rest of this disaster in the following sections.

By using node:20.11.0-alpine3.19, we get a predictable, reproducible base. (Strictly speaking, a tag can still be re-pushed; pin the digest if you need true immutability, as shown below.) Alpine Linux uses musl instead of glibc, which reduces the attack surface and the footprint. If you need glibc, use node:20.11.0-bookworm-slim. Never, under any circumstances, allow a tag to be a moving target in a production pipeline.
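
A minimal sketch of digest pinning, assuming the image has already been pulled locally (the sha256 shown is a placeholder, not a real digest):

$ docker pull node:20.11.0-alpine3.19
$ docker inspect --format '{{index .RepoDigests 0}}' node:20.11.0-alpine3.19
node@sha256:<digest>

# Then reference it explicitly in the Dockerfile:
# FROM node:20.11.0-alpine3.19@sha256:<digest>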


2. Layer Bloat and the Multi-Stage Salvation

I ran docker history api-service:latest and I nearly threw my monitor out the window. Every RUN command, every COPY, every ADD creates a new layer on the filesystem. In the “Sin” example above, the npm install layer was 800MB because it included all the devDependencies, build tools, and the npm cache. Even if you delete them in a later RUN command, they stay in the previous layer. They are ghosts that haunt your disk space forever.

The Docker best practice for this is the multi-stage build. You use one heavy image to build your assets and a second, microscopic image to run them.

The Sin

FROM node:20.11.0
WORKDIR /app
COPY . .
RUN npm install && npm run build
# The node_modules and source code are now baked into this layer forever.
CMD ["node", "dist/main.js"]

The Penance

# Stage 1: The Builder
FROM node:20.11.0-alpine3.19 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci # Clean install for reproducible builds
COPY . .
RUN npm run build
# Drop devDependencies after the build so only production modules reach the runner
RUN npm prune --omit=dev

# Stage 2: The Runner
FROM node:20.11.0-alpine3.19
WORKDIR /app
ENV NODE_ENV=production

# Only copy the compiled assets and production dependencies
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./

# Look at that. No build tools. No source code. No junk.
CMD ["node", "dist/main.js"]

Using dive on the new image shows a reduction from 2.84GB to 162MB. The overlay2 driver no longer has to manage 15 layers of garbage. We are copying only what is necessary for execution. This is not a suggestion; it is a requirement for survival.
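
Don't take my word for it; audit the layers yourself. A quick check, assuming the rebuilt image is tagged api-service:slim:

$ docker history --format "table {{.Size}}\t{{.CreatedBy}}" api-service:slim
$ docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" api-service
$ dive api-service:slim    # interactive, layer-by-layer inspection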


3. The Security Theater of Root Containers

Every single container in our cluster was running as root. If an attacker exploits a vulnerability in the application, they aren’t just a user; they are the superuser of that container’s namespace. If they find a way to escape the container, they are root on the host.

A Docker best practice that is ignored 90% of the time is the principle of least privilege. Many official images, including the Node Alpine image, already ship with a non-privileged user (for Node it is named node, UID 1000). Use it.

The Sin

FROM node:20.11.0-alpine3.19
WORKDIR /app
COPY . .
# Running as root by default. 
# If someone escapes the app, they own the container.
CMD ["node", "index.js"]

The Penance

FROM node:20.11.0-alpine3.19
# If the base image doesn't provide a non-privileged user, create one:
#   RUN addgroup -S app && adduser -S app -G app
# The Node Alpine image already ships a 'node' user (UID 1000), so we use it below.

WORKDIR /app

# Copy the artifacts from the builder stage in Section 2, owned by our non-root user
COPY --chown=node:node --from=builder /app/dist ./dist
COPY --chown=node:node --from=builder /app/node_modules ./node_modules

USER node

# Now, even if the app is compromised, the attacker is trapped in a 
# low-privilege shell with no ability to install packages or modify system files.
CMD ["node", "dist/main.js"]

I shouldn’t have to explain why sudo doesn’t belong in a container. If your application needs to modify the host’s network stack or mount filesystems, you aren’t building a container; you’re building a security nightmare.
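
Verify it before you push. Two quick checks, again assuming the image is tagged api-service:slim:

$ docker inspect --format '{{.Config.User}}' api-service:slim    # must print a user, e.g. node
$ docker run --rm --entrypoint id api-service:slim               # should report uid=1000(node)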


4. Cache Invalidation and the Order of Operations

The build logs showed COPY . . happening before npm install. This is why the build took 7 minutes every time someone changed a single comment in a README file. Docker’s layer caching is based on the checksum of the files being copied. If you copy the entire directory, any change to any file invalidates the cache for that layer and every subsequent layer.

To adhere to Docker best practices, you must copy your dependency manifests first, install your dependencies, and then copy your source code.

The Sin

FROM node:20.11.0-alpine3.19
WORKDIR /app
COPY . . 
RUN npm install # This runs EVERY time any file changes.

The Penance

FROM node:20.11.0-alpine3.19
WORKDIR /app

# Copy only the files that define dependencies
COPY package*.json ./

# This layer is cached unless package.json or package-lock.json changes
RUN npm ci --omit=dev

# Now copy the rest of the source. 
# Changes here won't trigger a re-install of node_modules.
COPY . . 

This simple reordering reduced our CI build time from 7 minutes to 45 seconds. We are no longer wasting CPU cycles on the build farm re-downloading the internet for every minor commit.
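
If you want proof that the cache is earning its keep, touch a source file and rebuild; only the layers after the final COPY should be rebuilt. A rough check, assuming a source file at src/index.js:

$ touch src/index.js
$ docker build --progress=plain -t api-service:dev . 2>&1 | grep CACHED
# The package copy and npm ci layers should appear in this list.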


5. The .dockerignore Void

Why was the image 2.84GB? Because the build context included the .git folder, the local node_modules, the dist folder from the developer’s machine, and a 1GB test-data.log file that someone forgot to delete.

When you run docker build, the first thing the CLI does is “Sending build context to Docker daemon.” If you don’t have a .dockerignore file, you are sending every piece of junk in your project folder over the socket.

The Sin

(No .dockerignore file exists. The daemon receives 2GB of data before the build even starts.)

The Penance

Create a .dockerignore file. It is as important as the Dockerfile itself.

# .dockerignore
.git
node_modules
npm-debug.log
dist
Dockerfile
.dockerignore
.env
.aws
tests/
docs/
*.md

By excluding these, the build context drops from gigabytes to kilobytes. The best approach is subtractive: if the container doesn't need it to run, it doesn't belong in the context.
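
You don't have to guess at the effect: BuildKit reports how much context it ships to the daemon. A rough before-and-after check:

$ docker build --progress=plain -t api-service:dev . 2>&1 | grep "transferring context"
# With the .dockerignore in place, this should report kilobytes, not gigabytes.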


6. PID 1 and the Zombie Process Apocalypse

In Linux, PID 1 is the init process. It has two jobs: reaping orphaned child processes and handling signals like SIGTERM and SIGINT. Node.js, Python, and Java were not designed to run as PID 1. They don’t reap zombies, and they often ignore signals.

When I tried to stop the failing containers last night, they took 30 seconds to die. Why? Because they ignored SIGTERM, and Kubernetes eventually had to SIGKILL them. This leads to data corruption and unclean shutdowns.

The Sin

# Node runs as PID 1. It doesn't know how to handle signals properly.
CMD ["node", "index.js"]

The Penance

Use a minimal init system like tini. It is included in many base images or can be added easily. It handles the signal forwarding and zombie reaping so your app doesn’t have to.

# Install tini
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]

# Now node runs as a child of tini
CMD ["node", "dist/main.js"]

Alternatively, use the --init flag at runtime, but for a production-grade image, bake it in or use a base image that handles it.
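
A crude but effective smoke test: start the container, stop it, and time it. Without a proper PID 1, docker stop hangs for the full grace period (10 seconds by default) before falling back to SIGKILL. A sketch, assuming the image is tagged api-service:slim:

$ docker run -d --name sigtest api-service:slim
$ time docker stop sigtest    # should return well under a second with tini in place
$ docker rm sigtest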


7. The Filesystem Truth: ADD vs COPY

I saw ADD being used to move local files. Stop it. ADD is a complex command that can fetch remote URLs and extract tarballs automatically. It is unpredictable and carries security risks (like Zip Slip vulnerabilities).

Unless you are specifically pulling a remote tarball and want it extracted in one go, use COPY. It is transparent. It is simple. It is the Docker best practice for a reason.

The Sin

ADD my-app.tar.gz /app/
ADD https://internal-tool.com/binary /usr/local/bin/tool

The Penance

# Be explicit.
COPY my-app.tar.gz /tmp/
RUN tar -xzf /tmp/my-app.tar.gz -C /app/ && rm /tmp/my-app.tar.gz

# Use curl for remote files so you can verify checksums and handle errors
# (the sha256 below is a placeholder; pin the real digest of the binary)
RUN apk add --no-cache curl \
    && curl -sSL https://internal-tool.com/binary -o /usr/local/bin/tool \
    && echo "<expected-sha256>  /usr/local/bin/tool" | sha256sum -c - \
    && chmod +x /usr/local/bin/tool \
    && apk del curl

The Anatomy of the Overlay2 Failure

Let’s talk about why the disk filled up. Docker uses a storage driver, usually overlay2, to manage the layers. Each layer is a directory on the host’s filesystem (usually under /var/lib/docker/overlay2). When you write a file in a container, it uses a “copy-on-write” strategy.

If you have a 1GB log file in your image and you run RUN rm /app/log.file, the file is “deleted” in the new layer, but it still exists in the previous layer. The total disk usage is still 1GB plus the metadata for the deletion.

This is why the “Sin” Dockerfile was so lethal.
1. COPY . . brought in 1GB of junk. (Layer 1: +1GB)
2. RUN npm install added 800MB of modules. (Layer 2: +800MB)
3. RUN npm run build added 200MB of assets. (Layer 3: +200MB)

Even if the final stage only needed 200MB, the host was storing 2GB for every single version of that image. Multiply that by 10 microservices and 5 previous versions kept for rollbacks, and you have a disk-space catastrophe.
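
If you suspect the same failure mode on your hosts, you don't need to spelunk through /var/lib/docker to confirm it. These commands show where the space went and how to claw some of it back:

$ docker system df -v                    # per-image, per-container, per-volume disk usage
$ docker history api-service:latest      # size attributed to each layer of one image
$ docker image prune -a --filter "until=168h"    # remove unused images older than a week
$ docker builder prune                   # clear dangling build cache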


Checklist for the Uninitiated

If you are going to push code to my cluster, you will verify the following. I will not ask twice.

  1. Is the base image pinned? No latest. No bare major-version tags. Use node:20.11.0-alpine3.19.
  2. Is it multi-stage? If your final image contains a compiler, a git client, or a package manager cache, you failed.
  3. Is it non-root? Check docker inspect on your image. If "User": "" or "User": "root", fix it (a quick check follows this list).
  4. Is there a .dockerignore? If you are sending your .git folder to the daemon, you are doing it wrong.
  5. Are the layers ordered for caching? Manifests first, then install, then code.
  6. Is PID 1 handled? Use tini or a similar init process.
  7. Is the image size reasonable? If a microservice is over 200MB, you better have a damn good reason (like a machine learning model).
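
A minimal pre-push check for items 3 and 7, assuming your image is tagged api-service:slim:

$ docker inspect --format '{{.Config.User}}' api-service:slim    # must not be empty or root
$ docker images --format '{{.Size}}' api-service:slim            # should be well under 200MB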

I am going to sleep for four hours. When I wake up, I expect to see the image sizes in the registry trending downward. If I see another 2GB image, I am revoking your docker push permissions and you can go back to deploying via FTP like it’s 1999.

The grave is deep enough. Stop digging.
