What is a Docker Container? A Complete Guide for Beginners

Sit down, kid. We’re going to talk about why your “magic box” is just a fancy wrapper for a process.

Stop looking at your watch. It’s 3:00 AM, the AC in this rack room has been dead since Tuesday, and that blinking amber light on the SAN is the only thing keeping me awake. You’ve spent the last three hours talking about “microservices” and “cloud-native paradigms” like you’re reading from a marketing brochure. You think that docker container you just deployed is some kind of sovereign territory, a little digital island isolated from the world.

You’re wrong. It’s a lie. It’s a ghost. There is no container. There is only the kernel, and the kernel is a cold, hard master that doesn’t care about your YAML files. You’re running Docker Engine v26.1.1 and containerd v1.7.15, and you think that makes you a wizard? I was partitioning drives before you were a glint in a venture capitalist’s eye.

Grab that terminal. We’re going to peel back the skin of this beast until you see the gears grinding.

The Illusion of Isolation: Why Your Container is Just a Process with Issues

You keep calling it a “lightweight VM.” If I hear that one more time, I’m going to toss your MacBook into the shredder. A Virtual Machine has a hardware abstraction layer. It has its own kernel. It has a BIOS. It has a soul, kid. A docker container is just a process that’s been told a very elaborate series of lies by the Linux kernel.

Back in 1979, we had chroot. It was simple. It changed the root directory for a process. That was the beginning of the lie. But chroot was leaky. A clever process could break out faster than you can say “dependency hell.” What you’re looking at now is just chroot on steroids, wrapped in a shiny API.
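
Here is the whole 1979 trick, by hand. A minimal sketch, assuming a statically linked busybox sitting at /bin/busybox on the host; the jail path is mine, not gospel:

# mkdir -p /tmp/jail/bin && cp /bin/busybox /tmp/jail/bin/
# chroot /tmp/jail /bin/busybox sh
/ # ls /
bin

That shell now believes /tmp/jail is the root of the universe. That is the ancestor of your "container," and it fit in three lines.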

When you run a docker container, you aren’t booting anything. You’re calling clone() with a bunch of flags. You’re telling the kernel, “Hey, take this process and make it think it’s alone in the universe.” But it’s not. It’s sharing the same memory, the same CPU cycles, and the same kernel bugs as everything else on this host.
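
You don't need Docker to pull that stunt. The unshare(1) tool wraps the same clone flags; here's a sketch, run as root on the host, with the timestamps and terminal names being illustrative:

# unshare --pid --fork --mount-proc /bin/bash
# echo $$
1
# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 03:12 pts/0    00:00:00 /bin/bash
root           8       1  0 03:12 pts/0    00:00:00 ps -ef

A brand-new bash, convinced it's PID 1, with a parent of "0" because its real parent lives outside the namespace. That's your "boot sequence."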

Look at this. I’m running ps aux on the host. See that? That’s your “isolated” Nginx process. It has a PID on the host. I can kill it from here. I can strace it from here. I can see everything it’s doing. Your “container” is just a folder and a set of constraints. It’s a bird in a cage, and you’re trying to tell me the bird lives in a different dimension.
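
Watch. Assuming your container is named web (the name is illustrative; the PID lines up with the inspect output further down):

# docker inspect -f '{{.State.Pid}}' web
3452
# ps -o pid,user,comm -p 3452
    PID USER     COMMAND
   3452 root     nginx
# cat /proc/3452/cgroup
0::/system.slice/docker-7a8f...scope

One process, one host PID, one cgroup. No dimension-hopping involved.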

Namespaces: The Cardboard Walls

You want to know how the lie is maintained? Namespaces. That’s the magic trick. The kernel uses namespaces to partition kernel resources so that one set of processes sees one set of resources while another set sees something else. It’s like putting blinkers on a horse.

There are seven namespaces you're leaning on: mnt, uts, ipc, pid, net, user, and cgroup (the kernel has an eighth, time, that Docker mostly leaves alone). When Docker Engine v26.1.1 starts a process, it uses the unshare or clone syscalls with flags like CLONE_NEWPID and CLONE_NEWNET.

Let’s look at the truth. Run this:

# lsns -t mnt
        NS TYPE   NPROCS   PID USER             COMMAND
4026531840 mnt       124     1 root             /sbin/init
4026532258 mnt         2  3452 root             nginx: master process nginx -g daemon off;

See that? That 4026532258 is the mount namespace for your docker container. It’s just an ID in a kernel table. If you want to see the lies the process believes, look at /proc:

# readlink /proc/3452/ns/mnt
mnt:[4026532258]

The process thinks it has its own filesystem because the kernel handed it a private copy of the mount table and pivoted its root into that overlay directory. It thinks it's the only process in the world because CLONE_NEWPID told the kernel to start numbering its PIDs from 1 again inside that namespace. But it's a facade. It's cardboard walls painted to look like a fortress. If the kernel gets confused (and it does), those walls vanish.

And don’t get me started on CLONE_NEWNET. You think you have a private network? You have a veth pair. One end is in the container’s namespace, the other is plugged into a bridge called docker0 on the host. It’s just a virtual patch cable. I can sniff every packet your “secure” app sends by just sitting on the host bridge with tcpdump. There is no privacy in the basement, kid.
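
See for yourself. A sketch from the host side; the interface names, ifindex numbers, and the 172.17.0.2 address are illustrative, so check your own bridge:

# ip link show master docker0
5: veth2a7b1c4@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ... master docker0 state UP
# nsenter -t 3452 -n ip addr show eth0
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
# tcpdump -ni docker0 port 80
02:41:07.113452 IP 172.17.0.1.51734 > 172.17.0.2.80: Flags [S], ...

Your container's eth0 is one end of a veth pair; the other end is enslaved to docker0, where I'm standing with tcpdump.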

Cgroups: The Resource Straitjacket

If Namespaces are the lies we tell the process about what it can see, Control Groups (cgroups) are the lies we tell it about what it can have. You set a memory limit of 512MB in your compose file and think you're being responsible. You're just putting a straitjacket on a psychotic patient.

In the old days, if a process went rogue, it took the whole system down. Now, we use /sys/fs/cgroup/. This is where the kernel keeps the accounting books. Docker Engine v26.1.1 uses cgroup v2 by default on modern kernels.
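
Check which ledger your host is actually keeping. These two commands assume a systemd host booted with the unified hierarchy:

# stat -fc %T /sys/fs/cgroup
cgroup2fs
# docker info 2>/dev/null | grep -i 'cgroup'
 Cgroup Driver: systemd
 Cgroup Version: 2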

Let’s look at the resource usage for your “magic box”:

# systemd-cgtop
Control Group                            Tasks   %CPU   Memory  Input/s Output/s
/system.slice/docker-7a8f...scope            2    0.4   120.5M        -        -

Go into /sys/fs/cgroup/system.slice/docker-<id>.scope/. Look at memory.max. That's the hard ceiling. If your app pushes past it and the kernel can't reclaim enough pages to squeeze back under, the OOM (Out Of Memory) killer doesn't care about your "graceful degradation." It just sends a SIGKILL.
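
Here's what your 512MB really is, assuming the container was started with --memory=512m (or the equivalent limit in your compose file); swap <id> for the real scope name:

# cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.max
536870912
# cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0

The exact fields shift a little between kernel versions. The one you care about is oom_kill. When it ticks past zero, that was the reaper.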

The kernel is constantly watching. It’s counting every clock cycle, every page fault. You call it “resource orchestration.” I call it a digital debtor’s prison. You’re not managing a service; you’re managing a set of constraints. And when those constraints get tight, the kernel starts reaping processes like a farmer in a bad harvest.

The Layered Filesystem: A Stack of Lies

This is where you really get scammed. You love Docker because of the “layers.” You think it’s efficient. You think it’s clever. In reality, it’s a nightmare of pointers and Copy-on-Write (CoW) overhead that would make a mainframe engineer weep.

Docker Engine v26.1.1 uses the overlay2 storage driver. It’s a union filesystem. It takes a bunch of directories and mashes them together so they look like one.

Let’s look at /var/lib/docker/overlay2. This is the graveyard of your bad decisions.

# ls -l /var/lib/docker/overlay2
drwx------ 4 root root 4096 May 20 02:10 1a2b3c4d5e6f...
drwx------ 4 root root 4096 May 20 02:11 7a8b9c0d1e2f...
drwx------ 2 root root 4096 May 20 02:12 l

Inside each of those hex-named folders, you have a diff directory, a link file, and sometimes a lower file. The lowerdir is your base image—read-only. The upperdir is where your changes go. The merged directory is what the docker container actually sees.
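
You can build the same illusion with one mount command. A hand-rolled sketch; the /tmp paths are mine, not Docker's:

# mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
# echo "shipped in the image" > /tmp/lower/base.conf
# mount -t overlay overlay -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/merged
# echo "your change" > /tmp/merged/new.conf
# ls /tmp/lower
base.conf
# ls /tmp/upper
new.conf

Docker's merged directory under /var/lib/docker/overlay2/<id>/merged is exactly this, just with more layers stacked into lowerdir.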

When you, in your infinite wisdom, run apt-get update in a Dockerfile and don’t clean up, you’re creating a permanent layer of junk. Even if you delete the files in a later layer, they’re still there, taking up space in the lowerdir. They’re just hidden by a “whiteout” file. It’s like painting over a moldy wall. The mold is still there; you just can’t see it until the whole structure collapses.
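
Don't believe me? Here's a hypothetical throwaway Dockerfile that does nothing but create and delete 100MB of junk:

FROM ubuntu:24.04
RUN dd if=/dev/zero of=/junk bs=1M count=100
RUN rm /junk

Build it and read the ledger (output trimmed; the BuildKit suffix and exact sizes will vary a hair on your machine):

# docker build -q -t mold-demo .
sha256:...
# docker history mold-demo --format '{{.Size}}\t{{.CreatedBy}}'
0B      RUN /bin/sh -c rm /junk # buildkit
105MB   RUN /bin/sh -c dd if=/dev/zero of=/junk bs=1M count=100 # buildkit
...

The file is "gone," and the image is still a hundred megabytes heavier. That's the mold under the paint.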

And the performance? Every time your app wants to write to a file that existed in the base image, the kernel has to copy that entire file from the lowerdir to the upperdir before it can modify a single bit. That’s the “Copy-on-Write” tax. You’re paying it every second, and you don’t even know it. You’re trading disk I/O for the convenience of not having to learn how to use tar and rsync.
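
Reusing the hand-rolled /tmp overlay from the sketch above: append one line to a lower-layer file through the merged view and watch the whole file get copied up (sizes and dates are illustrative):

# ls -l /tmp/upper
total 4
-rw-r--r-- 1 root root 12 May 20 02:30 new.conf
# echo "one more line" >> /tmp/merged/base.conf
# ls -l /tmp/upper
total 8
-rw-r--r-- 1 root root 35 May 20 02:31 base.conf
-rw-r--r-- 1 root root 12 May 20 02:30 new.conf

One appended line, one full copy-up. Now imagine it's a 2GB log file that shipped in the image instead of a one-line config.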

The Runtime: What Happens When the Daemon Dies?

You think the Docker Daemon is this all-powerful god. You think if dockerd stops, the world ends. That shows how little you know about the plumbing. Docker is just a middleman. It’s a glorified API wrapper for containerd v1.7.15, which in turn is just a manager for runc.

runc is the actual worker. It’s the OCI (Open Container Initiative) runtime. When you tell Docker to start a docker container, it sends a request to containerd. containerd creates a “shim” process. That shim calls runc. runc does the heavy lifting of setting up the namespaces and cgroups, starts your process, and then—this is the important part—runc exits.

The shim stays alive. Why? To keep the stdio pipes open. To catch the exit code when your process finally dies. To let containerd and dockerd restart or crash without dragging your process down with them.
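
Look at the family tree. Same PID as before; the shim's PID here is illustrative:

# ps -o pid,ppid,comm -p 3452
    PID    PPID COMMAND
   3452    3430 nginx
# ps -o args= -p 3430
/usr/bin/containerd-shim-runc-v2 -namespace moby -id 7a8f... -address /run/containerd/containerd.sock
# ps -o ppid= -p 3430
1

Notice who's missing: runc. It set the table and left. And notice the shim's parent is PID 1, not containerd; that's deliberate, so containerd can die and restart without anyone noticing.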

Let’s look at a docker inspect for your running process. I’ll strip out the fluff:

[
    {
        "Id": "7a8f...",
        "State": {
            "Status": "running",
            "Pid": 3452
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/...",
                "UpperDir": "/var/lib/docker/overlay2/.../diff",
                "MergedDir": "/var/lib/docker/overlay2/.../merged"
            },
            "Name": "overlay2"
        }
    }
]

You see that Pid: 3452? That’s the only thing that matters. If I kill the Docker daemon right now, your process keeps running. Why? Because the kernel doesn’t know what a “Docker” is. It only knows about PID 3452 and the namespaces attached to it.
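
Try it, though not on a box you love. A sketch; a graceful systemctl stop would take your containers down with it unless live-restore is enabled, and systemd may resurrect dockerd behind your back, so this is strictly about the moment the daemon dies:

# kill -9 $(pidof dockerd)
# docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
# ps -fp 3452
UID          PID    PPID  C STIME TTY          TIME CMD
root        3452    3430  0 02:10 ?        00:00:01 nginx: master process nginx -g daemon off;
# curl -s -o /dev/null -w '%{http_code}\n' http://172.17.0.2
200

The daemon is dead, the CLI is blind, and nginx is still serving traffic, because the shim and the kernel never needed dockerd in the first place.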

You’ve built this massive stack of abstractions—Docker, Containerd, Shims, Runc—just to run a binary. It’s like building a skyscraper just to hold up a lemonade stand. We used to just run binaries. We used init scripts. We used systemd. But no, you needed a “runtime.” You needed “orchestration.” You’ve added ten layers of failure points and called it “progress.”

Security: The False Sense of Security

This is the part that keeps me up at night. You think that because your app is in a docker container, it’s secure. You think you’ve “isolated” it from the host.

Kid, you’re running as root.

Unless you’ve gone through the pain of setting up User Namespaces—which I know you haven’t, because it breaks half your precious “official” images—the root user inside your container is the same root user (UID 0) as the one on this host. The only thing stopping that process from reaching out and wiping my boot sector is a thin veneer of “Capabilities” and “Seccomp” filters.
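
Same process, two viewpoints. The container name is the same illustrative web from earlier:

# docker exec web id
uid=0(root) gid=0(root) groups=0(root)
# grep Uid /proc/3452/status
Uid:    0       0       0       0

UID 0 inside, UID 0 outside, same number in the same kernel. If you won't set up userns-remap, at least pass --user and give the thing a UID that means nothing on my host.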

Docker Engine v26.1.1 applies a default Seccomp profile. It blocks about 44 syscalls out of 300+. You think that's enough? One bug in the kernel's io_uring implementation, one flaw in the overlayfs driver, and your process has escaped its "container" and is running rampant on my bare metal.
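
You can see the muzzle from inside. In /proc/<pid>/status, Seccomp: 2 means a filter is loaded, 0 means you took it off; the second command deliberately runs unconfined so you can see the difference:

# docker exec web grep Seccomp: /proc/1/status
Seccomp:        2
# docker run --rm --security-opt seccomp=unconfined ubuntu:24.04 grep Seccomp: /proc/1/status
Seccomp:        0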

And don’t get me started on “privileged” containers. You run a container with --privileged because you couldn’t figure out the permissions for a device node? You might as well just give the keys to the server to every script kiddie on the internet. A privileged docker container isn’t a container at all; it’s a suicide note.
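
Compare the effective capability masks yourself; the exact hex can drift between runtime and kernel versions, but the shape of the problem won't:

# docker run --rm ubuntu:24.04 grep CapEff /proc/1/status
CapEff: 00000000a80425fb
# docker run --rm --privileged ubuntu:24.04 grep CapEff /proc/1/status
CapEff: 000001ffffffffff

The first mask is Docker's default handful of capabilities. The second is every capability this kernel knows about. Run capsh --decode on them if you want to lose more sleep.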

We use AppArmor. We use SELinux. We use seccomp. We layer these things because we know the “container” is a lie. We know the isolation is a polite suggestion. You treat security like a checkbox in your CI/CD pipeline. I treat it like a war. Every time you pull a random image from the “vibrant” community on Docker Hub, you’re inviting a stranger into our basement. You don’t know what’s in those layers. You don’t know who built them. You just trust the “magic box.”
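
If you're going to keep inviting strangers in, at least chain them to the radiator. A sketch of the flags I make people type; the capability list here is roughly what a stock nginx image needs, so adjust it for whatever your app actually does:

# docker run -d --name web-hardened \
    --read-only --tmpfs /var/cache/nginx --tmpfs /var/run \
    --cap-drop ALL --cap-add CHOWN --cap-add SETGID --cap-add SETUID --cap-add NET_BIND_SERVICE \
    --security-opt no-new-privileges \
    --pids-limit 256 --memory 512m \
    nginx

None of that makes the cardboard walls real. It just means the stranger has to work for it.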

The “Ghost in the Machine” isn’t some digital spirit, kid. It’s the fact that your entire infrastructure is built on a foundation of kernel hacks and marketing terminology. You’re not a “DevOps Evangelist.” You’re a tenant in a house of cards, and the wind is starting to blow.

Now, take this flashlight and go check the drive bays in Rack 4. If I see one more “container” error on my console, I’m making you rewrite the entire stack in assembly.

Move it. The sun will be up in two hours, and we still haven’t fixed the actual problem. Your “magic box” didn’t save us, did it? It just gave us more logs to read. Welcome to the real world. It’s dark, it’s hot, and the kernel always wins.
