POST-MORTEM REPORT: THE DAY THE LAYERS COLLAPSED
DATE: October 14, 2023
AUDITOR: Lead Infrastructure Engineer (Hardened Systems Division)
STATUS: CRITICAL / FORENSIC COMPLETE
INCIDENT REF: #882-ALPHA-FAILURE
I’ve spent the last 72 hours staring at hex dumps and cleaning up the radioactive sludge left behind by a “standard” deployment. My eyes are bloodshot, my caffeine intake has reached toxic levels, and I have lost all faith in the industry’s ability to read a man page. We didn’t just have a breach; we had a total structural collapse. The stack didn’t fail—the foundation was built on sand and “convenience.”
This report is an autopsy of a dead system. If you’re looking for a “seamless” guide to containerization, look elsewhere. This is about the grit, the syscalls, and the entropy of a compromised environment. We are currently running Docker Engine v25.0.3, and yet, we are still making mistakes that were solved in 2014.
1. TIMELINE OF FAILURE
03:00 AM: Prometheus alerts fire. High CPU utilization on the prod-api-04 node. Initial triage suggests a runaway worker process.
03:15 AM: Automated trivy scans on the running container environment (triggered by the spike) return a sea of red. The runtime environment is identified as compromised.
03:45 AM: Lateral movement detected. The attacker has escaped the container. They are now polling the metadata service of the cloud provider.
04:10 AM: Forensic analysis of docker history on the compromised image reveals a hardcoded GITHUB_TOKEN in layer 4.
04:30 AM: The attacker exploits CVE-2024-21626. Because the container was running as root and the runc version was unpatched on the host, they gained a file descriptor to the host’s /proc/self/cwd.
05:00 AM: Database credentials exfiltrated. The attacker uses the leaked token to push a malicious commit to the main branch, poisoning the CI/CD pipeline.
06:00 AM: Kill switch engaged. Entire VPC isolated. Production is dark. The cleanup begins.
2. THE ANATOMY OF A DISASTER: THE COMPROMISED DOCKERFILE
The following Dockerfile was found in the repository. It is a masterclass in how to ignore docker best practices. It is bloated, insecure, and fundamentally broken.
# THE "BEFORE" DOCKERFILE - ARCHITECTURAL SUICIDE
FROM ubuntu:latest
# Running as root is the default. This is a death sentence.
WORKDIR /app
# Bloated layer: installing everything including the kitchen sink
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    curl \
    vim \
    git \
    wget \
    build-essential
# Leaking secrets into the image metadata
ENV DB_PASSWORD="super-secret-password-123"
ENV API_KEY="AKIA_FAKE_LEAKED_KEY"
COPY . .
# Installing dependencies as root
RUN pip3 install -r requirements.txt
# Exposing a privileged port
EXPOSE 80
# No healthcheck, no signal handling
CMD ["python3", "app.py"]
3. THE ROOT USER SIN
The most egregious failure in this audit was the persistence of the root user. By default, Docker containers run as UID 0. In our environment, this allowed the attacker to exploit CVE-2024-21626. This vulnerability in runc allows an attacker to leak file descriptors. If the process is running as root, that leaked descriptor can be used to traverse the host filesystem.
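A quick host-side sanity check tells you whether you are exposed; CVE-2024-21626 was fixed in runc 1.1.12. A sketch (output format varies by distro):

```shell
# Print the runc build on the host; anything below 1.1.12 is vulnerable
# to the CVE-2024-21626 fd leak described above.
runc --version

# Confirm which runtime the Docker daemon actually uses (usually runc)
docker info --format '{{.DefaultRuntime}}'
```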
When I ran docker inspect on the compromised container, the Config.User field was empty.
Raw Terminal Output: docker inspect
[
{
"Id": "sha256:a1b2c3d4...",
"Config": {
"User": "",
"ExposedPorts": {
"80/tcp": {}
},
"Env": [
"DB_PASSWORD=super-secret-password-123",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
}
}
]
An empty string means root. It means the attacker has the same privileges inside the container as the kernel’s administrative user. When the attacker triggered the runc exploit, they didn’t have to fight for escalation. We handed them the keys. A hardened infrastructure requires a non-privileged user. Period.
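The remediation is mechanical. A minimal sketch for a Debian-based image (the appuser name and UID 10001 are illustrative, not taken from our stack):

```dockerfile
FROM python:3.11-slim-bookworm
# Create an unprivileged user and group; UID/GID 10001 are arbitrary
# non-zero values outside the typical system range
RUN groupadd --gid 10001 appuser \
    && useradd --uid 10001 --gid appuser \
       --shell /usr/sbin/nologin --create-home appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
# Every instruction and the runtime process from here on is UID 10001
USER appuser
CMD ["python3", "app.py"]
```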
4. THE BLOATED IMAGE VULNERABILITY
The “Before” image used ubuntu:latest. The compressed pull is roughly 29MB, and it unpacks to about 77MB on disk before a single dependency is installed. It includes apt, bash, and a plethora of binaries that have no business being in a production runtime.
Every binary is a potential gadget for an attacker. By including curl and wget, we provided the attacker with the tools to download their secondary payloads. By including git, we allowed them to clone private repositories once they found the leaked token.
Raw Terminal Output: trivy image (The Horror Show)
Total: 584 (UNKNOWN: 5, LOW: 142, MEDIUM: 280, HIGH: 132, CRITICAL: 25)
+--------------+------------------+----------+-------------------+---------------+---------------------------------------+
| LIBRARY | VULNERABILITY | SEVERITY | INSTALLED VERSION | FIXED VERSION | TITLE |
+--------------+------------------+----------+-------------------+---------------+---------------------------------------+
| bash | CVE-2022-3715 | HIGH | 5.1-6ubuntu1 | 5.1-6ubuntu1.1| bash: a heap-based buffer overflow |
| coreutils | CVE-2016-2781 | LOW | 8.32-4.1ubuntu1 | | coreutils: Non-privileged session can |
| libc6        | CVE-2023-4911    | CRITICAL | 2.35-0ubuntu3.1   | 2.35-0ubuntu3.4| glibc: Looney Tunables buffer overflow|
| libssl3 | CVE-2024-0727 | HIGH | 3.0.2-0ubuntu1.10 | 3.0.2-0ubuntu1.15| openssl: NULL pointer dereference |
+--------------+------------------+----------+-------------------+---------------+---------------------------------------+
Compare this to a Distroless image or a static Alpine build. A Distroless image contains only your application and its runtime dependencies. No shell. No package manager. The attack surface drops from 584 vulnerabilities to near zero. The byte-count difference is not just about disk space; it’s about the reduction of the “Exploit Entropy.”
5. THE SECRET LEAKAGE DISASTER
The developer thought that by putting ENV DB_PASSWORD in the Dockerfile, it was “internal” to the container. They were wrong. Docker layers are additive and permanent. Even if you unset the variable in a later layer, the secret remains in the image’s history.
Raw Terminal Output: docker history
IMAGE CREATED CREATED BY SIZE COMMENT
a1b2c3d4e5f6 2 hours ago CMD ["python3" "app.py"] 0B
<missing> 2 hours ago RUN /bin/sh -c pip3 install -r requirements.txt 45MB
<missing> 2 hours ago COPY . . 1.2MB
<missing> 2 hours ago ENV API_KEY=AKIA_FAKE_LEAKED_KEY 0B
<missing> 2 hours ago ENV DB_PASSWORD=super-secret-password-123 0B
<missing> 2 hours ago RUN /bin/sh -c apt-get update && apt-get inst… 150MB
<missing> 2 hours ago WORKDIR /app 0B
<missing> 3 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
Anyone with docker pull access to that image can run docker history --no-trunc and scrape every secret ever defined in an ENV or ARG instruction. This is how our GitHub token was leaked. Secrets must be injected at runtime via a vault or secrets manager, never baked into the image layers.
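When a credential is genuinely needed during docker build, BuildKit’s secret mounts are the escape hatch. A sketch using a hypothetical private-index token (the pip_token id and the registry URL are illustrative): the secret is mounted as a file for a single RUN step and never lands in a layer.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim-bookworm
WORKDIR /app
COPY requirements.txt .
# /run/secrets/pip_token exists only for the duration of this RUN;
# it is not committed to any layer and never appears in docker history
RUN --mount=type=secret,id=pip_token \
    PIP_INDEX_URL="https://ci:$(cat /run/secrets/pip_token)@pypi.internal.example/simple" \
    pip install --no-cache-dir -r requirements.txt
```

The build is then invoked with docker build --secret id=pip_token,src=./pip_token.txt . and the token file itself stays outside the image entirely.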
6. OVERLAYFS AND LAYER POLLUTION
The RUN apt-get update && apt-get install command in the “Before” Dockerfile is a crime against OverlayFS. When you run a command in a Dockerfile, it creates a new layer. If you install 150MB of packages and then delete the cache in a subsequent RUN command, the 150MB still exists in the previous layer.
OverlayFS works by stacking these layers. The “upper” layer only records the deletion as a “whiteout” file. The bytes remain on the disk. This leads to bloated images that take longer to pull, increasing the window of vulnerability during scaling events.
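This is why cleanup must happen in the same RUN instruction that created the garbage. A sketch of the difference:

```dockerfile
# WRONG: layer 1 commits the apt cache; layer 2 only writes an
# OverlayFS whiteout on top of it. The bytes still ship.
RUN apt-get update && apt-get install -y --no-install-recommends python3
RUN rm -rf /var/lib/apt/lists/*

# RIGHT: install and purge inside one layer, so the cache is never
# committed to the image in the first place.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
```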
Furthermore, the lack of a .dockerignore file meant the .git directory was copied into the image. This leaked the entire commit history, including deleted files that contained legacy credentials. The COPY . . command is a blunt instrument that should be replaced with specific file copies.
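At minimum, a .dockerignore along these lines (entries are illustrative for a typical Python repository) keeps the worst offenders out of the build context:

```
# .dockerignore
.git
.gitignore
.env
*.md
tests/
__pycache__/
Dockerfile
docker-compose.yml
```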
7. THE PRIVILEGED FLAG AND SYSCALL EXPOSURE
During the investigation, I found that the container was started with the --privileged flag in the docker-compose.yml. This is the equivalent of giving the container a “get out of jail free” card for the Linux kernel’s security boundaries.
A privileged container has access to all devices on the host. It can mount the host’s /dev filesystem. It bypasses AppArmor and Seccomp profiles.
The Syscall Problem:
A standard container is restricted by a Seccomp profile that filters out dangerous syscalls like mount(), reboot(), and kexec_load(). By using --privileged, we allowed the attacker to use mount() to re-mount the host’s root partition as read-write inside the container.
We need to move toward a “Least Privilege” model where we only add the specific capabilities required, such as CAP_NET_BIND_SERVICE. We don’t need CAP_SYS_ADMIN to run a Python API.
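In docker run terms, that model looks like this sketch (the image name and port mapping are illustrative):

```shell
# Drop every capability, then add back only what the service needs.
docker run --rm \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt no-new-privileges:true \
  -p 8080:8080 \
  hardened-api:latest
```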
8. NETWORKING: THE DEFAULT BRIDGE TRAP
The compromised service was running on the default docker0 bridge. This is a flat network where every container can talk to every other container on the same host. There is no internal segmentation.
Once the attacker gained a shell, they used nmap (which they installed because we left apt in the image) to scan the internal 172.17.0.0/16 range. They found an unauthenticated Redis instance used for caching and flushed all session data, forcing a mass logout of our users.
Hardened Networking Requirement:
We must use custom bridge networks or overlay networks with encrypted data planes. We must use the --internal flag for containers that do not require outbound internet access. The default bridge is a playground for lateral movement.
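A docker-compose sketch of that segmentation (service and network names are illustrative): the cache network is marked internal, so it has no route to the outside world, and redis is reachable only by services explicitly attached to it.

```yaml
services:
  api:
    image: hardened-api:latest
    networks: [frontend, cache]
  redis:
    image: redis:7.2-alpine
    networks: [cache]   # no frontend membership, no lateral path

networks:
  frontend:
  cache:
    internal: true      # no outbound internet access
```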
9. THE “AFTER” DOCKERFILE: HARDENED INFRASTRUCTURE
This is what a professional, hardened Dockerfile looks like. It uses multi-stage builds to ensure the final runtime image is as lean as possible. It uses a non-root user. It cleans up after itself. It follows docker best practices to the letter.
# STAGE 1: BUILDER
FROM python:3.11-slim-bookworm AS builder
# Set environment variables for Python
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
WORKDIR /install
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
gcc \
&& rm -rf /var/lib/apt/lists/*
# Install requirements to a local directory
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# STAGE 2: RUNTIME
# Using Google's Distroless for maximum hardening
# No shell, no package manager, no root user
FROM gcr.io/distroless/python3-debian12:nonroot
WORKDIR /app
# Copy only the necessary artifacts from the builder
COPY --from=builder /install /usr/local
COPY --from=builder /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
# Copy application code
# Ensure the .dockerignore excludes .git, tests, and env files
COPY . .
# Distroless images run as a non-root user by default (uid 65532)
# But we explicitly set it for clarity
USER nonroot
# Use a healthcheck to ensure the container is viable
HEALTHCHECK --interval=30s --timeout=3s \
CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
# Expose non-privileged port
EXPOSE 8080
# Run the application
ENTRYPOINT ["python", "app.py"]
10. VERIFICATION: THE SCANNING LOGS
After implementing the hardened Dockerfile, I ran a new scan. The results are the only reason I’m not resigning today.
Raw Terminal Output: trivy image --severity CRITICAL,HIGH hardened-api:latest
hardened-api:latest (debian 12.5)
=================================
Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
Raw Terminal Output: docker history hardened-api:latest
IMAGE CREATED CREATED BY SIZE COMMENT
f1e2d3c4b5a6 10 mins ago ENTRYPOINT ["python" "app.py"] 0B
<missing> 10 mins ago COPY . . 450KB
<missing> 10 mins ago COPY --from=builder /usr/lib/x86_64-linux-gnu… 12MB
<missing> 10 mins ago COPY --from=builder /install /usr/local 28MB
<missing> 10 mins ago USER nonroot 0B
<missing> 2 weeks ago ... (Distroless Base Layers) 18MB
The image size dropped from 245MB to 58MB. The vulnerability count dropped from 584 to 0. The secrets are gone. The shell is gone. The root user is gone.
11. MANDATORY HARDENING CHECKLIST
Effective immediately, no container will be deployed to production unless it passes the following forensic audit requirements. Failure to comply will be treated as a deliberate security bypass.
- NO ROOT USERS: Every Dockerfile must contain a USER instruction with a non-zero UID. If the application requires a privileged port, use setcap or map the port at the runtime level (e.g., 8080:80).
- MULTI-STAGE BUILDS ONLY: Compilers, build tools, and header files are forbidden in the final runtime image. Use multi-stage builds to copy only the necessary artifacts.
- DISTROLESS OR MINIMAL ALPINE: Use gcr.io/distroless for languages like Python, Node, and Java. Use scratch for Go binaries. If you must use a general-purpose OS, use Alpine Linux and keep it updated.
- NO SECRETS IN LAYERS: Use docker-compose secrets, Kubernetes Secrets, or a cloud-native vault. Any image found with an ENV variable containing a credential will be purged and the developer’s access revoked.
- IMMUTABLE TAGS: latest is not a version. It is a rolling disaster. All FROM instructions must use specific version tags or, preferably, SHA256 digests.
- READ-ONLY FILESYSTEMS: Containers should be run with --read-only. Any required writable areas (like /tmp) must be mounted as tmpfs volumes.
- RESOURCE CONSTRAINTS: Every container must have cpus and memory limits defined. This prevents a single compromised container from performing a DoS attack on the entire host via resource exhaustion.
- NO PRIVILEGED CONTAINERS: The --privileged flag is banned. If you think you need it, you are wrong. Use specific --cap-add flags for the minimum required capabilities.
- SCANNING INTEGRATION: trivy or snyk must block the CI/CD pipeline if any HIGH or CRITICAL vulnerabilities are found.
- NETWORK ISOLATION: Production containers must run on a dedicated overlay network with com.docker.network.bridge.enable_icc: "false" to prevent inter-container communication unless explicitly allowed.
The layers collapsed because we were lazy. We prioritized the speed of “it works on my machine” over the stability of “it is secure in production.” This ends now. The infrastructure is being rebuilt. The entropy is being purged.
Audit Signed:
The Hardened Infrastructure Auditor
06:45 AM – Post-Breach Recovery Day 3