Mastering Docker Compose: Simplify Multi-Container Apps

The smell of ozone and stale coffee is the only thing that stays the same.

It was 3:14 AM on a Tuesday in 2014. I was standing in a data center in Secaucus, the floor tiles vibrating under my boots from the sheer CFM of the cooling fans. We had a cluster of HP ProLiant DL380s that had been running a monolithic financial settlement engine for six years. I knew those machines. I knew their IRQ timings. I knew which ones had slightly wonky RAID controllers that required a specific firmware revision just to keep the write-cache from flaking out.

Then the “drift” happened.

A junior dev—bless his heart and his misplaced enthusiasm—had manually updated libssl on Node 04 to patch a CVE he’d read about on Hacker News. He didn’t use the configuration management scripts. He didn’t document it. He just apt-get installed his way into a nightmare. Three hours later, the binary linked against the old library started throwing segmentation faults that looked like a digital stroke.

[72432.124005] settlement_eng[14202]: segfault at 0 ip 00007f8eac321a10 sp 00007ffed8c1a210 error 4 in libssl.so.1.0.0[7f8eac2e1000+60000]
[72432.124012] Code: 48 89 45 f0 48 8b 45 f0 48 8b 00 48 85 c0 74 1b 48 8b 45 f0 48 8b 00 48 8b 40 28 48 85 c0 74 0a 48 8b 45 f0 48 8b 00 ff 50 28 <48> 8b 45 f0 48 8b 00 48 8b 40 30 48 85 c0 74 0a 48 8b 45 f0 48 8b
$ ldd /usr/local/bin/settlement_eng
    linux-vdso.so.1 =>  (0x00007ffed8dfb000)
    libssl.so.1.0.0 => not found
    libcrypto.so.1.0.0 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8eabf10000)

The system was hemorrhaging $40,000 a minute. I spent four hours manually symlinking libraries and rebuilding the environment from a 500-line Bash script that I wrote in 2011, which—of course—failed because the Debian mirrors had moved. That was the old way. The “pure” way. The way that nearly gave me a coronary.

The Shell Script Delusion

For twenty years, I believed that if you couldn’t build a server from a blank TTY with nothing but a shell script and a prayer, you weren’t a real engineer. I treated my servers like prize-winning orchids. I hand-tuned /etc/sysctl.conf like I was playing a Stradivarius.

When Docker first showed up, I laughed. “A wrapper for LXC? I can write my own cgroups, thanks.” But then the microservices era hit like a freight train. Suddenly, I wasn’t managing one monolith; I was managing twelve different services, three different databases, and a message broker.

I tried to keep the old faith. I wrote a master deployment script. It was a masterpiece of if-else blocks, sed commands, and grep hacks. It would SSH into five different boxes, pull the latest code, and run a series of docker run commands.

# The old way - A fragment of my descent into madness
docker run -d --name pg-db -e POSTGRES_PASSWORD=password123 -v /mnt/data/pg:/var/lib/postgresql/data postgres:15-alpine
docker run -d --name redis-cache redis:7.0-bullseye
docker run -d --name api-srv --link pg-db:db --link redis-cache:redis -p 8080:8080 my-api:latest
# Wait, I forgot the network. And the restart policy. And the log rotation.
# And if the API starts before the DB is ready, it crashes.
# So I add a 'sleep 30'. God help me.

The “AHA” moment didn’t come during a keynote speech. It came when I had to explain to a new hire how to set up the local dev environment. It took him three days. He had the wrong version of Redis. His Postgres container couldn’t see the API container because he’d messed up the bridge network naming. I looked at my 500-line Bash script, then I looked at his broken terminal, and I realized I was the problem. I was holding onto a “bare-metal” ego while the world was moving toward declarative state.

I realized that docker run is a command, but docker-compose.yml is a contract.

YAML: The Necessary Evil of Declarative State

I hate YAML. I hate the whitespace sensitivity. I hate that it feels like a configuration language designed by people who think tabs are a sin. But I hate downtime more.

When I finally swallowed my pride and moved to Docker Compose (specifically the V2 specification), the first thing I noticed was the sanity of the depends_on and healthcheck parameters. No more sleep 30 in my Bash scripts. No more hoping the database was ready to accept connections before the application layer tried to connect to it.

Here is the anatomy of the beast that replaced my 500-line script. It’s not “pretty,” but it’s deterministic. It’s idempotent. If I run it ten times, I get the same result ten times. That is a luxury I never had with bare metal.

version: '3.8' # Obsolete under the Compose Spec; kept only for legacy tooling

services:
  db:
    image: postgres:15-alpine
    container_name: production_db
    restart: always
    environment:
      POSTGRES_USER: ${DB_USER:-admin}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: settlement_prod
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - backend

  cache:
    image: redis:7.0-bullseye
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redisdata:/data
    networks:
      - backend

  api:
    build:
      context: ./services/api
      dockerfile: Dockerfile
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    env_file: .env
    ports:
      - "8080:8080"
    networks:
      - backend
      - frontend

networks:
  frontend:
  backend:
    internal: true

volumes:
  pgdata:
  redisdata:

Look at that healthcheck. In the old days, I’d have a cron job or a Nagios check that would alert me after the service failed. Here, the api service won’t even attempt to start until the db service reports it’s actually ready to handle queries. That’s not just a feature; it’s a preventive measure against the 3 AM wake-up call.
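One detail in that file deserves a closer look: the ${DB_USER:-admin} interpolation isn’t Compose magic, it’s borrowed straight from POSIX shell parameter expansion, and you can verify the semantics at any prompt:

```shell
# Compose-style default interpolation mirrors shell parameter expansion:
# ':-' substitutes the default when the variable is unset or empty.
unset DB_USER
echo "user=${DB_USER:-admin}"   # falls back to the default

DB_USER=alice
echo "user=${DB_USER:-admin}"   # the explicit value wins

# ':?' aborts with an error instead -- handy for required secrets:
# echo "${DB_PASSWORD:?DB_PASSWORD must be set}"
```

If you want to see the file with all interpolation already applied, run docker compose config; it prints the fully rendered configuration before anything starts.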

The Networking Lies We Tell Ourselves

On bare metal, networking was simple. You had an IP. You had a port. You had a firewall. If you wanted two services to talk, you pointed them at each other’s static IPs.

In the container world, networking is a hall of mirrors. I spent weeks fighting with the Docker bridge. I’d see errors like this in the logs:

api-srv  | 2023/10/12 14:22:11 Error connecting to database: dial tcp 172.18.0.2:5432: connect: connection refused
api-srv  | 2023/10/12 14:22:11 Retrying in 5 seconds...
api-srv  | 2023/10/12 14:22:16 Error connecting to database: dial tcp: lookup db on 127.0.0.11:53: no such host

The 127.0.0.11:53 address is the Docker embedded DNS server. It’s a fickle beast. If you don’t define your networks properly in Compose, your services are just shouting into the void.

The “Veteran” in me wanted to use network_mode: host for everything. “Just give me the raw throughput!” I’d scream at the screen. But host mode is a trap. It destroys the isolation that makes containers useful. It leads to port collisions that remind me of the dark days of trying to run two instances of Apache on the same box.

Docker Compose forces you to define your topology. In the YAML above, the db and cache are on the backend network, which is marked as internal: true. This means they have no route to the outside world. They are invisible to the internet. Only the api service, which sits on both frontend and backend, can talk to them. This is the kind of security posture that used to take me hours to configure with iptables and VLANs. Now, it’s four lines of YAML.

Volumes: Where Data Goes to Die

If networking is a hall of mirrors, volumes are a minefield. I’ve seen more data lost to improper volume mounting than to actual hardware failure.

The biggest mistake people make—and I made it too—is treating container storage like a regular filesystem. You think, “I’ll just mount /var/lib/mysql to a local folder.” Then you realize that the UID/GID of the user inside the container (usually 999 for Postgres) doesn’t match the user on your host machine. You end up with a permission denied error that kills the service on startup.

production_db | 2023-10-12 14:30:05.123 UTC [1] FATAL:  could not open directory "pg_tblspc": Permission denied
production_db | 2023-10-12 14:30:05.123 UTC [1] LOG:  database system is shut down

I learned the hard way: use named volumes. Let Docker manage the abstraction. If you need to back it up, you back up the volume, not the directory. And for the love of all that is holy, don’t use bind mounts for production databases unless you have a very specific reason to handle the I/O overhead and permission mapping yourself.

The bare-metal guy in me still winces at the thought of not knowing exactly which sector on the disk my data lives on. But the systems engineer in me recognizes that pgdata:/var/lib/postgresql/data is a much more stable way to handle state across container restarts than a hardcoded path in a Bash script that might not exist on the next server I rack.
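To make the contrast concrete, here is a sketch of the two mounting styles side by side. The ./pgdata-on-host path is hypothetical; the named-volume form is what the stack above actually uses.

```yaml
services:
  db:
    image: postgres:15-alpine
    volumes:
      # Bind mount: you own the path, the UID/GID mapping, and the backups.
      # Fails with "Permission denied" if host ownership doesn't match
      # the container's postgres user (UID 999 in the official image).
      # - ./pgdata-on-host:/var/lib/postgresql/data

      # Named volume: Docker owns the storage and the permissions.
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

Backing up then becomes a matter of archiving the named volume rather than guessing which host directory is the live one.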

Secrets, Environment Variables, and the Art of Not Getting Fired

In the old days, secrets were kept in a file called config.php or settings.py, usually with permissions set to 600, owned by root. We’d use rsync to push these files around. It was primitive, but it worked—until someone accidentally committed the config file to Git.

Docker Compose handles environment variables with a .env file, but it’s a double-edged sword.

# .env file - The place where security goes to hide
DB_PASSWORD=super_secret_password_that_i_will_forget_to_change
API_KEY=sk_live_51Mz...
DEBUG=false

The problem is that people forget that .env is just a local convenience. In a real production environment, you shouldn’t be using .env files; you should be injecting secrets through your CI/CD pipeline or a proper secret manager. But for local development, where the goal is mirroring production on a laptop, Compose is king.
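Compose does offer a middle ground between plaintext environment variables and a full secret manager: file-based secrets, mounted into the container at /run/secrets/. A sketch, assuming a local ./secrets/db_password.txt file kept out of Git:

```yaml
services:
  db:
    image: postgres:15-alpine
    environment:
      # The official postgres image reads the password from this file
      # at startup instead of taking it from the environment.
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    # Hypothetical path; chmod 600 and add it to .gitignore.
    file: ./secrets/db_password.txt
```

The password never appears in docker inspect output or in the process environment, which is already a big step up from a .env file.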

I can hand a developer a docker-compose.yml and a .env.example. They copy the example, fill in their local keys, and run docker compose up -d.

$ docker compose up -d
[+] Running 4/4
 ⠿ Network settlement_frontend  Created                                  0.1s
 ⠿ Network settlement_backend   Created                                  0.1s
 ⠿ Container production_db      Started                                  0.5s
 ⠿ Container redis-cache        Started                                  0.4s
 ⠿ Container api-srv            Started                                  0.8s

$ docker compose ps
NAME                IMAGE                COMMAND                  SERVICE             CREATED             STATUS                    PORTS
api-srv             settlement-api       "./main"                 api                 10 seconds ago      Up 9 seconds              0.0.0.0:8080->8080/tcp
production_db       postgres:15-alpine   "docker-entrypoint.s…"   db                  10 seconds ago      Up 9 seconds (healthy)    5432/tcp
redis-cache         redis:7.0-bullseye   "docker-entrypoint.s…"   cache               10 seconds ago      Up 9 seconds              6379/tcp

That (healthy) tag next to the database? That’s the sound of me sleeping through the night. It means the container isn’t just “running” (which is a useless metric); it means it’s actually responding to queries.

The Idempotency Myth and the Reality of docker compose up

We talk about “idempotency” like it’s a religious commandment. The idea is that you can run the same command over and over and the state remains the same. Bare-metal scripts are almost never idempotent. If you run mkdir /data twice, the second time it throws an error unless you add -p. If you run apt-get install twice, it might upgrade a package you didn’t want to touch.
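The mkdir case is trivial to demonstrate at a prompt (the scratch path is arbitrary):

```shell
# Start from a clean slate (the path is just a scratch directory).
rm -rf /tmp/demo-data

# Non-idempotent: the second call fails because the directory already exists.
mkdir /tmp/demo-data && echo "first run: ok"
mkdir /tmp/demo-data 2>/dev/null || echo "second run: fails"

# Idempotent: -p turns repeat runs into a no-op.
mkdir -p /tmp/demo-data && echo "with -p: ok"
rmdir /tmp/demo-data
```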

Docker Compose is the closest I’ve ever come to true idempotency in a deployment workflow. If I change a single environment variable in the YAML and run docker compose up -d, it doesn’t tear down the whole stack. It calculates the diff. It sees that the db and cache haven’t changed, so it leaves them alone. It sees the api needs a new environment variable, so it recreates only that container.

$ docker compose up -d
[+] Running 3/3
 ⠿ Container production_db      Running                                  0.0s
 ⠿ Container redis-cache        Running                                  0.0s
 ⠿ Container api-srv            Recreated                                0.3s

This “Recreated” status is the magic. It’s the difference between a 5-minute outage and a 500-millisecond blip.

But let’s talk about the failures, because that’s where the grit is. What happens when the build fails? What happens when the .dockerignore is missing and you accidentally send your 2GB node_modules folder to the Docker daemon as part of the build context?

$ docker compose build
[+] Building 124.2s (7/11)
 => [api internal] load build definition from Dockerfile                 0.1s
 => [api internal] load .dockerignore                                    0.1s
 => [api internal] load metadata for docker.io/library/golang:1.21-alpine  0.5s
 => [api internal] load build context                                  120.1s
 => => transferring context: 1.80GB

I’ve seen senior engineers sit there for ten minutes wondering why their build is slow, not realizing they’re uploading their entire local history to the daemon. The .dockerignore is as important as the Dockerfile itself. It’s the “don’t touch this” sign on the server rack.

# .dockerignore
.git
node_modules
*.log
tmp/

The Ghost of Bare-Metal Past

I still miss the physical reality of servers. I miss knowing that eth0 is the top port on the NIC and eth1 is the bottom one. I miss the tactile click of a drive tray locking into place.

But I don’t miss the configuration drift. I don’t miss the “it works on my machine” excuses from developers who have a different version of libpq installed on their MacBook than we have on the Debian stable servers.

Docker Compose is the bridge. It allows me to codify my twenty years of infrastructure knowledge into a format that a 22-year-old intern can understand. It’s a way to ensure that the “manual configuration” disaster of 2014 never happens again.

When I look at a docker-compose.yml file, I don’t see “shiny” new tech. I see a hardened, version-controlled blueprint of a system. I see the end of the 500-line Bash script. I see a world where I can actually take a vacation without worrying that a library update will trigger a cascade of segmentation faults.

If you’re still writing manual scripts to manage your containers, you’re not being “hardcore.” You’re being a liability. You’re building a snowflake in a world that demands an ice factory.

The transition wasn’t easy. I fought it every step of the way. I grumbled about the overhead of the Docker daemon. I complained about the complexity of overlay networks. But then I realized that the complexity was always there—I was just hiding it in my head and in my brittle shell scripts. Docker Compose just forces that complexity out into the open where it can be versioned, tested, and managed.

I’m still a bare-metal veteran. I still care about kernel parameters and I/O schedulers. But now, I set those parameters in the sysctls section of my Compose file. I tune my limits in the ulimits block. I’m still racking servers; I’m just doing it in code now. And the coffee still tastes like battery acid at 3 AM, but at least the servers are staying up.
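For the record, here is what those blocks look like in practice. The values are illustrative, not a tuning recommendation:

```yaml
services:
  api:
    build: ./services/api
    sysctls:
      # Per-container, namespaced kernel knobs -- the same ones
      # I used to set globally in /etc/sysctl.conf.
      net.ipv4.ip_local_port_range: "1024 65000"
      net.ipv4.tcp_tw_reuse: 1
    ulimits:
      # Raise the open-file-descriptor cap for a busy network service.
      nofile:
        soft: 65536
        hard: 65536
```

Note that only namespaced sysctls (net.* and a handful of others) can be set per container; host-wide parameters still belong to the host.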

$ docker compose logs -f api
api-srv  | 2023-10-12 15:00:01 INFO: Starting settlement engine v2.4.1
api-srv  | 2023-10-12 15:00:01 INFO: Connected to database: settlement_prod
api-srv  | 2023-10-12 15:00:01 INFO: Connected to cache: redis-7.0
api-srv  | 2023-10-12 15:00:01 INFO: Listening on :8080

That’s the only log I want to see. No segfaults. No missing libraries. Just a clean, boring, stable start. That’s the dream. And Docker Compose, for all its YAML-induced headaches, is the only thing that actually delivered it.
