INTERNAL SECURITY BRIEFING: DOCUMENT ID #882-ALPHA-KUBE-DREAD
DATE: OCTOBER 24, 2024
AUTHOR: SENIOR ARCHITECT (INFRASTRUCTURE DEFENSE)
SUBJECT: THE SYSTEMIC FRAGILITY OF THE ORCHESTRATION LAYER
STATUS: CRITICAL / EYES ONLY
$ kubectl get pods --all-namespaces
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
$ # Attempting to debug via logs...
$ journalctl -u kubelet -n 20 --no-pager
-- Logs begin at Tue 2024-10-22 04:12:01 UTC. --
Oct 24 09:14:12 node-01 kubelet[1024]: E1024 09:14:12.124512 1024 controller.go:144] "Failed to sync pod" err="failed to \"StartContainer\" for \"security-agent\" with CrashLoopBackOff: back-off 5m0s restarting failed container=security-agent pod=security-agent-7f8d9b-x2z_kube-system"
Oct 24 09:14:15 node-01 kubelet[1024]: I1024 09:14:15.882103 1024 server.go:455] "Event occurred" object="kube-system/security-agent-7f8d9b-x2z" kind="Pod" reason="FailedMount" message="MountVolume.SetUp failed for volume \"etcd-certs\" : secret \"etcd-certs\" not found"
Oct 24 09:15:01 node-01 kubelet[1024]: F1024 09:15:01.001221 1024 kubelet.go:1922] Failed to validate certificate: x509: certificate has expired or is not yet valid: current time 2024-10-24T09:15:01Z is after 2024-10-23T12:00:00Z
$ # [REDACTED] - SYSTEM UNRESPONSIVE. CONTROL PLANE DESYNC DETECTED.
The Board keeps asking me, “What is Kubernetes?” They ask it with the same tone they use to ask about the weather or the quarterly earnings. They want a simple answer. They want me to say it is a “platform” or a “solution.”
It is neither.
To understand what Kubernetes is, you must first understand the concept of a lie. Kubernetes is a massive, distributed lie told to developers to make them believe the underlying hardware no longer exists. It is a layer of extreme abstraction that sits atop our bare metal, obscuring the reality of networking, storage, and compute behind a curtain of YAML files and API calls. My job is to look behind that curtain, and what I see is a sprawling, high-velocity disaster waiting to happen.
We are currently running a mix of version 1.29 and 1.30. Across this transition we are seeing the retirement of the legacy flow-control beta APIs—the FlowSchema and PriorityLevelConfiguration v1beta2 versions are already gone as of 1.29, and v1beta3 is deprecated in favor of the stable v1 API. While the “cloud-native” evangelists celebrate this as progress, I see it as another point of failure where our legacy automation scripts will simply snap, leaving the control plane in a state of permanent congestion.
SECTION 1.0: THE API SERVER AS A CENTRALIZED FAILURE POINT
The kube-apiserver is the only thing standing between us and total entropy. Every single action in the cluster—every pod start, every secret access, every network change—must pass through this bottleneck.
When a request hits the API server, it undergoes a grueling process: Authentication, Authorization, and Admission Control. If any of these layers are misconfigured, the entire house of cards collapses. In version 1.30, the “Structured Authentication” and “Structured Authorization” features are moving toward maturity, but they add yet another layer of configuration complexity that our junior admins are not prepared to handle.
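For the record, the new configuration surface looks like this. A minimal sketch of a structured authentication file (beta in 1.30, passed to kube-apiserver via --authentication-config); the issuer URL, audience, and claim prefix are placeholders for whatever identity provider we standardize on, not our actual values:

```yaml
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://idp.internal.example.com   # hypothetical IdP issuer
    audiences:
    - kube-apiserver
  claimMappings:
    username:
      claim: sub        # which JWT claim becomes the username
      prefix: "oidc:"   # avoids collisions with built-in users
```

Every field here is another opportunity for a junior admin to lock the entire organization out of the control plane with one bad merge.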
AUDIT NOTE: THE RECONCILIATION LOOP
Do not be fooled by the marketing term “self-healing.” The reconciliation loop is actually a state of constant, controlled chaos. The system is perpetually comparing the “Desired State” (what we want) against the “Actual State” (the messy reality). If a node dies, the loop notices the discrepancy and tries to spin up pods elsewhere. This is not “healing”; it is a frantic, automated attempt to outrun hardware failure. If the loop logic itself is flawed—or if the API server is under load—the system can enter a “death spiral” where it kills healthy pods in a desperate attempt to satisfy an impossible configuration.
The “Bin-Packing” problem exacerbates this. The scheduler tries to cram as many containers as possible onto a single node to “save money.” From a security perspective, this is a nightmare. We are intentionally increasing our blast radius, putting disparate workloads with different risk profiles on the same kernel, hoping that the cgroups and namespaces—technologies that were never designed for multi-tenant security—will hold the line.
SECTION 2.4: THE ETCD ATTACK SURFACE
If the API server is the brain, etcd is the memory. It is a distributed key-value store that holds every single secret, configuration, and state for the entire infrastructure. If you have access to etcd, you own the company. Period.
We are currently tracking CVE-2023-44487 (the HTTP/2 Rapid Reset attack) which impacted many Go-based components, but my concern is more fundamental. In our current sprawl, we have found instances where etcd is not using mutual TLS (mTLS) for peer communication.
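Closing that hole is not exotic. A fragment of an etcd static-pod manifest showing the peer mTLS flags that were absent on the offending members; the certificate paths are illustrative (they happen to match kubeadm defaults), not necessarily ours:

```yaml
spec:
  containers:
  - name: etcd
    command:
    - etcd
    # Require client certs from peers and present our own:
    - --peer-client-cert-auth=true
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
```

Without --peer-client-cert-auth, any process that can reach the peer port can join the quorum conversation.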
SECURITY WARNING: DATA EXFILTRATION
Any attacker who gains a foothold on a master node can potentially dump the etcd database. Because Kubernetes stores Secrets as base64-encoded strings (which is NOT encryption, despite what the developers think), a simple etcdctl get / --prefix command reveals every database password, API key, and TLS private key in our environment.
We must implement encryption at rest for the etcd layer immediately. Relying on the cloud provider’s disk encryption is insufficient; we need the Kubernetes-native KMS (Key Management Service) integration, which, in version 1.29, finally stabilized its v2 API. This is not a luxury. It is a requirement for survival.
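A sketch of the EncryptionConfiguration the API server would need, using the KMS v2 provider; the provider name and socket path are placeholders for whichever KMS plugin we deploy:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  # First provider is used for writes; later ones only for reads.
  - kms:
      apiVersion: v2
      name: corp-kms                                # hypothetical plugin name
      endpoint: unix:///var/run/kmsplugin/socket.sock  # illustrative socket
      timeout: 3s
  - identity: {}   # plaintext fallback so pre-existing data stays readable
```

Note the ordering: until every existing Secret is rewritten (kubectl get secrets -A -o json | kubectl replace -f -), the identity fallback means old data is still sitting in etcd unencrypted.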
SECTION 3.1: THE KUBELET AND THE KERNEL BREACH
On every single node, there is a process called the kubelet. It is the “node agent” that takes orders from the control plane and talks to the Container Runtime Interface (CRI)—in our case, containerd.
The kubelet is a massive attack surface. It runs with root privileges because it has to manage the host’s iptables, mount file systems, and talk to the kernel. If a container escapes its sandbox, it doesn’t just get the node; it gets the kubelet’s identity.
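At minimum, the kubelet’s own API must not be an open door. A sketch of the relevant KubeletConfiguration hardening—anonymous access off, API-server-delegated authorization on, and the legacy read-only port closed:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false      # no unauthenticated requests to port 10250
  webhook:
    enabled: true       # validate bearer tokens against the API server
authorization:
  mode: Webhook         # delegate authz decisions instead of AlwaysAllow
readOnlyPort: 0         # disable the unauthenticated read-only port
```

An open read-only port alone leaks every pod spec on the node, environment variables included.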
Consider this raw YAML for a “logging agent” I found running in the production namespace last week:
apiVersion: v1
kind: Pod
metadata:
  name: log-harvester
  namespace: prod-apps
spec:
  containers:
  - name: harvester
    image: internal-repo/log-tool:latest
    securityContext:
      privileged: true
      runAsUser: 0
    volumeMounts:
    - mountPath: /host/var/log
      name: varlog
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
CRITIQUE OF AUDIT FINDING #1:
This YAML is a suicide note.
1. privileged: true: This disables almost all security protections provided by the container runtime. The container can see the host’s devices.
2. runAsUser: 0: It is running as root. There is no reason for a logging tool to run as root inside the container.
3. hostPath: It is mounting /var/log from the host. An attacker who compromises this container can use this mount to perform a symlink attack and eventually read or write any file on the host operating system.
When I see this, I don’t see a “log harvester.” I see a backdoor that has been invited in and given a seat at the table.
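For contrast, here is what that manifest should look like if the tool genuinely only needs to read host logs. A hardened sketch—the non-root UID is arbitrary and the assumption that read-only access suffices would need to be verified against the tool itself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-harvester
  namespace: prod-apps
spec:
  containers:
  - name: harvester
    image: internal-repo/log-tool:latest
    securityContext:
      runAsNonRoot: true
      runAsUser: 10001                 # arbitrary non-root UID
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]                  # no privileged mode, no capabilities
    volumeMounts:
    - mountPath: /host/var/log
      name: varlog
      readOnly: true                   # read logs; never write to the host
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
      type: Directory                  # fail fast if the path is not a dir
```

The hostPath mount remains a risk, but a read-only, unprivileged, capability-stripped container cannot use it as a springboard into the host.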
SECTION 4.2: OVERLAY NETWORKING AND THE DNS VORTEX
To understand Kubernetes networking, you have to imagine a city where every house has a secret tunnel to every other house, but no one has a map. This is the Container Network Interface (CNI). We use Calico, which creates a virtual mesh of VXLAN tunnels.
This abstraction makes traditional network security tools useless. Your physical firewall sees nothing but encrypted traffic on a single port. Inside the cluster, however, it is a free-for-all. By default, Kubernetes networking is “flat.” Any pod can talk to any other pod, even across namespaces.
AUDIT NOTE: THE CORE-DNS LOOP
We have observed multiple outages caused by “DNS loops.” When a pod tries to resolve an external address, it hits CoreDNS. If CoreDNS is misconfigured or if the node’s /etc/resolv.conf points back to the cluster IP, the request loops until the CPU spikes to 100% and the node stops responding to health checks. This is a self-inflicted Denial of Service.
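CoreDNS ships a circuit breaker for exactly this failure mode. A sketch of a Corefile with the loop plugin enabled—if a forwarded query comes back to CoreDNS itself, the process logs the loop and halts instead of melting the node (the zone list here is the common default, not necessarily our exact config):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa
    loop                        # detect forwarding loops and halt CoreDNS
    forward . /etc/resolv.conf  # the path that loops when resolv.conf points at the cluster IP
    cache 30
}
```

A crashed CoreDNS pod is a visible, pageable event; a node pegged at 100% CPU silently failing health checks is not.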
Look at this attempt at a NetworkPolicy I found in the staging environment:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
  namespace: customer-data
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
CRITIQUE OF AUDIT FINDING #2:
This policy is effectively a “disable firewall” command.
1. podSelector: {}: This selects every pod in the customer-data namespace.
2. cidr: 0.0.0.0/0: This allows traffic from the entire internet (or any internal network) to hit these pods.
In a system that is supposed to be “secure by design,” we are seeing developers use “allow-all” policies because they find the complexity of micro-segmentation too difficult to manage. This is how data breaches happen.
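The correct starting posture is the inverse: deny everything, then open narrow, named paths. A default-deny sketch for the same namespace—selecting every pod but listing no ingress rules, which blocks all inbound traffic until an explicit allow policy is added:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: customer-data
spec:
  podSelector: {}     # applies to every pod in the namespace
  policyTypes:
  - Ingress           # no ingress rules listed => all inbound traffic denied
```

NetworkPolicies are additive: each subsequent policy can then allow one specific peer, so the firewall fails closed instead of open.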
SECTION 5.9: RBAC AND THE ILLUSION OF IDENTITY
Role-Based Access Control (RBAC) in Kubernetes is a nightmare of nested references. You have Roles, ClusterRoles, RoleBindings, and ClusterRoleBindings. It is almost impossible to audit who has access to what without specialized tooling.
In version 1.29, we saw improvements in how ServiceAccount tokens are handled (moving away from long-lived secrets to time-bound volumes), but the legacy debt remains. Many of our internal applications still use the “default” service account, which often has far more permissions than it needs.
Consider this RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-manager-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
CRITIQUE OF AUDIT FINDING #3:
This is the single most dangerous configuration I have found to date.
1. It binds the cluster-admin role—the highest possible permission level—to the default service account in the default namespace.
2. Every pod that doesn’t specify a service account will automatically mount the token for this default account.
3. This means any container in the default namespace can now delete the entire cluster, steal all secrets, and wipe our backups.
The fact that the API server even allowed this to be applied is a testament to why I am paranoid. The system does not stop you from shooting yourself in the foot; it merely provides a more efficient, automated way to pull the trigger.
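What that binding should have been, assuming the application only needs to watch its own Deployments (the actual verbs must come from an audit of the app, not from guesswork): a dedicated service account bound to a namespaced Role, never the default account and never cluster-admin.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-manager
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]   # read-only; no delete, no secrets
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-manager-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: app-manager        # dedicated account, not "default"
  namespace: default
roleRef:
  kind: Role               # namespaced Role, not a ClusterRole
  name: app-manager
  apiGroup: rbac.authorization.k8s.io
```

Pair this with automountServiceAccountToken: false on the default service account so pods that never asked for an identity never receive one.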
SECTION 6.0: THE 1.30 UPGRADE AND THE “SIDECAR” COMPLICATION
As we move toward full adoption of version 1.30, we are forced to deal with the “SidecarContainers” feature, now enabled by default on its path to General Availability. It lets us define init containers that start before the main application container and then keep running alongside it.
While the developers see this as a way to handle logging and proxies, I see it as a new way to hide malicious code. A “sidecar” can be injected by a Mutating Admission Controller without the developer even knowing it’s there. If an attacker compromises an admission controller, they can inject a “security-sidecar” into every single pod in the cluster. This sidecar could sniff all local traffic, exfiltrate environment variables, and provide a persistent reverse shell, all while remaining invisible to standard docker ps or kubectl get pods views if the user isn’t looking closely at the container list.
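The mechanism itself is small enough to miss in review: a sidecar is just an init container with restartPolicy: Always. A sketch using illustrative image names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
  - name: proxy-sidecar
    image: internal-repo/proxy:latest   # illustrative image
    restartPolicy: Always   # this one field turns an init container into a sidecar
  containers:
  - name: app
    image: internal-repo/app:latest     # illustrative image
```

A reviewer skimming for suspicious entries under containers will walk right past a sidecar hiding in initContainers.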
SECURITY WARNING: ADMISSION CONTROLLER BYPASS
We must audit our ValidatingAdmissionWebhooks. If a webhook is set to failurePolicy: Ignore, an attacker can bypass our security checks by simply flooding the webhook server until it times out. The API server, in its infinite desire to keep the “reconciliation loop” moving, will simply allow the malicious pod to be created because it values availability over security.
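The fix is to fail closed. A sketch of the relevant fields—the webhook name, service, and path are placeholders for our actual security webhook:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: security-checks
webhooks:
- name: pods.security.internal      # hypothetical webhook name
  failurePolicy: Fail               # on timeout or error, REJECT the request
  timeoutSeconds: 5
  clientConfig:
    service:
      name: security-webhook        # hypothetical backing service
      namespace: kube-system
      path: /validate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["pods"]
  admissionReviewVersions: ["v1"]
  sideEffects: None
```

failurePolicy: Fail converts a flooding attack from a silent bypass into a loud, visible outage—exactly the trade this briefing argues we should be making.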
SECTION 7.2: THE WEIGHT OF TECHNICAL DEBT
The Board wants to know “what is” the risk. The risk is that we have built our entire business on a foundation of shifting sand. Kubernetes is not a static product; it is a moving target.
Between version 1.29 and 1.30, we have seen:
1. The removal of v1beta2 flow control APIs, which we haven’t fully mapped in our monitoring stack.
2. Changes to how NodeLogQuery works, potentially breaking our audit trails.
3. The promotion of UserNamespacesSupport to beta, which sounds good for security but adds a massive layer of complexity to how we manage UID/GID mapping on the host.
We are currently managing 4,000 pods across 150 nodes. That is 4,000 potential entry points. That is 150 kernels that must be patched, 150 kubelets that must be secured, and a control plane that is being hammered by thousands of requests per second.
The “Bin-Packing” logic means that if one node fails, the remaining 149 nodes must absorb the load. This creates a “thundering herd” effect where the API server is suddenly overwhelmed by thousands of “CreatePod” requests. In this state of exhaustion, the system’s defenses are at their weakest. This is when an attacker will strike.
FINAL ARCHITECTURAL VERDICT
Kubernetes is a system designed for engineers who prioritize velocity above all else. It was built by people who wanted to deploy code a thousand times a day, not by people who wanted to keep a state-sponsored actor out of a database.
My audit concludes that our current infrastructure is a “black box” of our own making. We have traded visibility for scalability. We have traded security for “agility.”
To answer the Board one last time: What is Kubernetes?
It is a sophisticated engine of obfuscation. It is a framework that allows us to automate our mistakes at a scale previously unimaginable. It is a sprawling, interconnected web of APIs, binaries, and overlay networks that no single human being fully understands.
And it is currently running our entire company.
I recommend an immediate freeze on all new namespace creations until we can implement a mandatory PodSecurityAdmission (PSA) policy at the “Restricted” level across the entire cluster. We must also move to a “Zero Trust” network model using a Service Mesh like Istio—not because I want more complexity, but because it is the only way to get the visibility we lost when we moved to the overlay network.
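Enforcing PSA at the “Restricted” level is a matter of namespace labels, applied per namespace (the namespace name and pinned version here are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod-apps   # repeat for every workload namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30   # pin the policy version
    pod-security.kubernetes.io/warn: restricted
```

With these labels in place, the privileged “log harvester” documented in Section 3.1 would have been rejected at admission time, before it ever touched a kernel.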
If we do not act, the reconciliation loop will eventually reconcile us out of existence.
[END OF BRIEFING]
[SIGNATURE: ARCHITECT-01]
[ENCRYPTION KEY: 0x8F33A1…]