POST-INCIDENT AUDIT: REPORT #88-B (CRITICAL SYSTEM COMPROMISE)
DATE: 2024-10-14
AUDITOR: Senior Infrastructure Architect (Security/Hardening)
SUBJECT: The systematic failure of docker compose deployments in the “Alpha-Omega” environment (host prod-srv-01).
1. The Incident Log
The following is a raw dump from the host prod-srv-01 during the initial breach detection. The developer responsible claimed the setup was “standard.” I claim it was an invitation to a funeral.
# journalctl -u docker.service --since "2024-10-14 02:00:00"
Oct 14 02:10:15 prod-srv-01 dockerd[1102]: container_id=a4f2e... exec "curl http://169.254.169.254/latest/meta-data/iam/security-credentials/"
Oct 14 02:12:44 prod-srv-01 kernel: [10422.12] audit: type=1400 audit(1728871964.123:45): apparmor="DENIED" operation="mount" info="failed flags check" error=-13 profile="docker-default" name="/proc/" pid=14202 comm="python3"
Oct 14 02:15:01 prod-srv-01 dockerd[1102]: container_id=a4f2e... OOM kill detected. Memory limit exceeded.
# docker stats --no-stream
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   NET I/O          BLOCK I/O   PIDS
a4f2e8b1c0d9   web_app   185.20%   1.99GiB / 2GiB      12.4GB / 8.2GB   14MB / 0B   842
# iptables -L DOCKER -n -t nat
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:6379 to:172.18.0.3:6379
The post-mortem revealed a classic disaster. A developer used a standard docker compose file. They exposed Redis to the world. They didn’t set resource limits. They didn’t drop capabilities. An attacker hit the Redis port, used a known Lua script injection to gain shell access, and immediately started probing the cloud provider’s metadata service. The only reason we caught it was a poorly written cryptominer that tripped the OOM killer. We didn’t “win.” We got lucky.
2. The Teardown: Why Your YAML Is a Liability
The default behavior of docker compose is built for speed, not for survival. When you run docker compose up, you are handing the keys of your kernel to a set of binaries you likely haven’t audited.
First, the ports directive. Most developers think 6379:6379 means “open this port on the firewall.” No. It means “bypass the host’s ufw or firewalld and inject a DNAT rule directly into the iptables DOCKER chain.” Your host-level firewall is now irrelevant. If that container is running as root—which it is, by default—you have effectively bridged your internal memory store to the public internet with zero filtering.
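Here is the failure mode in miniature. A sketch with a hypothetical service name; the published port produces exactly the DNAT rule from the incident log:

services:
  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"   # shorthand for 0.0.0.0:6379 — a DNAT rule lands in the DOCKER chain, ahead of ufw

Bring it up, then compare ufw status against iptables -t nat -L DOCKER -n. The firewall says the port is closed; the NAT table says otherwise. The NAT table wins.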
Second, the networking. The default bridge network allows every container to talk to every other container. Why does the frontend need to reach the database’s management port? It doesn’t. But in a default docker compose stack, the blast radius is the entire subnet.
Third, capabilities. Linux kernels use capabilities to break down the “root” privilege into smaller pieces. By default, Docker grants containers things like NET_RAW (perfect for ARP spoofing) and MKNOD (creation of special files). Most applications need none of these. Leaving them active is negligence.
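You can watch the default grants yourself. A quick check; the hex mask below is the typical effective set for an unhardened container and may vary by Docker version:

docker run --rm alpine grep CapEff /proc/self/status
CapEff: 00000000a80425fb
capsh --decode=00000000a80425fb   # run on the host: cap_net_raw, cap_mknod, and a dozen more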
3. The Reconstruction: Hardening the Stack
We are going to rebuild this using docker compose v2.27.0. We are going to treat the host like a hostile environment and the containers like untrusted actors. We will not use “magic.” We will use explicit, restrictive configurations.
The Fallacy of Default Bridge Networking
The first step is to kill the default network. We will define multiple networks with internal: true to ensure that back-end services cannot reach the outside world, even if the container is compromised.
networks:
  frontend_net:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: br-frontend
  backend_net:
    internal: true
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: br-backend
By setting internal: true, Docker configures iptables to drop any packet leaving that bridge that isn’t destined for another container on the same bridge. This prevents exfiltration. If an attacker gains a shell on your database container, they can’t curl their command-and-control server. They are trapped in a dark room.
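Don't take my word for it. A verification sketch, assuming the networks above and a db service attached to backend_net (the /health endpoint is hypothetical):

docker compose exec db wget -qO- -T 3 http://example.com        # fails: no route off an internal bridge
docker compose exec db wget -qO- -T 3 http://api:8080/health    # peers on the same bridge remain reachable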
Capability Leaks and the Root User Trap
Every container must run as a non-privileged user. No exceptions. But even then, we must strip the kernel privileges. In docker compose, we use cap_drop and security_opt.
services:
  app:
    image: our-hardened-python:3.12-slim
    user: "1000:1000"
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    security_opt:
      - no-new-privileges:true
      # add a seccomp entry only to load a stricter custom profile,
      # e.g. seccomp:./profiles/app.json — never "unconfined", which
      # disables syscall filtering entirely. Omit it to keep the default profile.
cap_drop: [ALL] is the baseline. If your app needs to bind to port 80, you add back NET_BIND_SERVICE. Nothing else. (One subtlety: for a non-root user, an added capability only becomes effective if the binary also carries the matching file capability, set with setcap at image build time.) The no-new-privileges:true flag is the most important line in the file. It prevents processes from gaining new privileges via setuid or setgid binaries. It stops a compromised low-privilege user from escalating to root within the container namespace.
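Verify the flag actually landed; the kernel reports it per process. A sketch against the app service above:

docker compose exec app grep NoNewPrivs /proc/self/status
NoNewPrivs:     1     # 1 means setuid/setgid binaries can no longer raise privileges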
The iptables Treachery and Port Binding
Stop binding to 0.0.0.0. If you must expose a port, bind it to a specific internal IP or 127.0.0.1 if you are running a local proxy like Nginx or HAProxy.
ports:
  - "127.0.0.1:8080:8080"
This ensures the port is only accessible to the local host. If you need public access, you handle it at the edge, not at the container level. The interaction between docker compose and the host’s routing table is too opaque to trust with public-facing services.
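Confirm where the listener actually sits. Output abridged; the pid will differ on your host:

ss -tlnp | grep 8080
LISTEN 0 4096 127.0.0.1:8080 0.0.0.0:* users:(("docker-proxy",pid=2214,fd=4))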
Resource Exhaustion as a Denial of Service
A container without limits is a time bomb. An attacker doesn’t need to steal data to win; they can just consume every CPU cycle or every byte of RAM, crashing the host. We use deploy configurations even in non-swarm mode (Compose V2 respects these).
deploy:
  resources:
    limits:
      cpus: '0.50'
      memory: 512M
    reservations:
      cpus: '0.25'
      memory: 256M
ulimits:
  nproc: 65535
  nofile:
    soft: 20000
    hard: 40000
Setting ulimits is vital. A fork bomb in a container can exhaust the host’s process table. By limiting nproc, we contain the explosion. We also cap memory (deploy.resources.limits.memory above) to prevent the OOM killer from reaping critical host processes like sshd because a leaky Node.js app decided to eat 16GB of RAM.
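Check that the limits took, from inside the container. A sketch assuming a cgroup v2 host and the app service above:

docker compose exec app cat /sys/fs/cgroup/memory.max    # 536870912 bytes == 512M
docker compose exec app sh -c 'ulimit -u'                # 65535: the ceiling a fork bomb hits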
Orchestration Logic and Healthcheck Rigidity
Most developers use depends_on as a simple list. This is useless. It only checks that the container has started, not that the service inside it is functional. We need the long-form depends_on with service_healthy conditions. That prevents the crash loop where the app starts, fails to connect to the database, dies, restarts, and fills the logs while burning CPU.
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U user -d db"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
Then, in the application service:
depends_on:
  db:
    condition: service_healthy
This forces docker compose to respect the actual state of the infrastructure. It ensures that the application doesn’t even attempt to start until the database is ready to accept connections.
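You can watch the gate hold during startup; docker compose up prints the Waiting/Healthy transitions for gated dependencies, and you can poll the state directly (container name from the spec in section 4):

docker inspect --format '{{.State.Health.Status}}' hardened_db
healthy

One caveat: service_healthy gates startup ordering only. Once running, the application still needs its own reconnect logic, because the database can disappear at any time.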
Filesystem Integrity and the Read-Only Mandate
A container’s root filesystem should be immutable. If an attacker gains access, they shouldn’t be able to install a rootkit, modify /etc/shadow, or drop a persistence script.
read_only: true
tmpfs:
  - /tmp
  - /run
  - /var/cache/nginx
By setting read_only: true, we make the container’s root filesystem immutable at runtime. Any attempt to write to it returns an error. For the few paths that genuinely need write access (like /tmp or pid files), we use tmpfs. This keeps the writes in memory, and they vanish the moment the container restarts. No persistence. No footprint.
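The test is satisfyingly blunt. Any path outside the tmpfs mounts will do:

docker compose exec app touch /usr/bin/backdoor
touch: /usr/bin/backdoor: Read-only file system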
4. The Hardened Spec
This is the final, audited docker-compose.yaml. It is not “easy” to use. It will break your “hot-reloading” developer workflows. It will require you to actually understand your application’s requirements. That is the point.
# Compose v2 ignores the top-level "version" key; it is obsolete and omitted here.
services:
  db:
    image: postgres:16-alpine
    container_name: hardened_db
    # "postgres" is uid/gid 70 in the alpine image. With cap_drop: ALL the
    # entrypoint cannot su-exec down from root, so we start as postgres
    # directly; the bind-mounted data directory must be owned by 70:70.
    user: "70:70"
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    networks:
      - backend_net
    volumes:
      - db_data:/var/lib/postgresql/data:rw
    secrets:
      - db_password
    deploy:
      resources:
        limits:
          memory: 1G
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /run/postgresql
      - /tmp

  api:
    image: our-registry/api-service:v1.4.2
    container_name: hardened_api
    user: "1001:1001"
    depends_on:
      db:
        condition: service_healthy
    networks:
      - backend_net
      - frontend_net
    environment:
      DB_HOST: db
      DB_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
      - /run
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 2G
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  proxy:
    image: nginx:alpine
    container_name: hardened_proxy
    ports:
      - "127.0.0.1:80:80"
      - "127.0.0.1:443:443"
    networks:
      - frontend_net
    depends_on:
      api:
        condition: service_started
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
      - CHOWN
      - SETGID
      - SETUID
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /var/cache/nginx
      - /var/run
      - /tmp
    logging:
      driver: "syslog"
      options:
        syslog-address: "udp://127.0.0.1:514"
        tag: "nginx"

networks:
  frontend_net:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: br-frontend
  backend_net:
    internal: true
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: br-backend

volumes:
  db_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/secure_storage/postgres_data

secrets:
  db_password:
    file: ./secrets/db_password.txt
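Before deploying, make the tooling check your work. docker compose config validates the file and prints the fully resolved model; --quiet suppresses the output and leaves only the exit code:

docker compose config --quiet && echo "spec valid"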
Analysis of the Hardened Spec
- Isolation: The segmentation is the networks themselves. The proxy reaches the api over frontend_net; the api reaches the db over backend_net; no other path exists, and backend_net is internal, so the database tier has no route to the outside world. This is the “Zero Trust” model applied to the bridge. Note that I deliberately did not set com.docker.network.bridge.enable_icc: "false": on a user-defined bridge that option drops all inter-container traffic, which would also sever the legitimate proxy-to-api and api-to-db paths.
- Secrets Management: We are not using environment variables for passwords. POSTGRES_PASSWORD is a security hole; it shows up in docker inspect and /proc/1/environ. We use secrets, which mounts the password as a file under /run/secrets/.
- Logging: The proxy logs to syslog on the host, which relays to the remote logging server. If an attacker wipes the container logs, the evidence has already left the box. The API uses the json-file driver with strict rotation to prevent disk exhaustion.
- User Namespacing: Although not shown in the YAML (it is a daemon.json setting; see the sketch below), this configuration assumes the host has userns-remap enabled. “Root” in the container is then an unprivileged high-range UID on the host.
- Volume Hardening: The database volume is a bind mount to a specific, encrypted partition (/mnt/secure_storage). We don’t trust Docker’s default volume management to handle data persistence.
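For completeness, the daemon-side setting referenced in the User Namespacing bullet. A minimal sketch of /etc/docker/daemon.json, assuming the default dockremap user; the daemon must be restarted after the change:

{
  "userns-remap": "default",
  "no-new-privileges": true
}

With "default", Docker allocates a subordinate UID range to a dockremap user; root inside every container maps to an unprivileged UID in that range on the host.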
5. The Warning
I have spent the last decade watching developers treat docker compose like a toy. They copy-paste YAML snippets from Stack Overflow and wonder why their infrastructure is a sieve. They prioritize “developer experience” and “velocity” while I am the one who has to explain to the board why our customer data is being sold on a Telegram channel.
The future of container orchestration is not looking better. We are moving toward more abstraction, more “serverless” layers that hide the underlying insecurity. People think that moving to the cloud solves these problems. It doesn’t. It just moves the iptables rules to a different API.
If you use this hardened spec, your developers will complain. They will say it’s “too hard” to debug. They will say they can’t “just exec in and fix things.” Good. They shouldn’t be “fixing things” in production. They should be building artifacts that are secure by design.
Every open port is an insult. Every default configuration is a back door. If you aren’t paranoid, you aren’t doing your job. You have been warned.
AUDIT COMPLETE.
STATUS: FAIL (Remediation Required)
SIGNATURE: [REDACTED]