The Kubernetes Illusion of Simplicity
Kubernetes promises to make container orchestration simple. In many ways it delivers — deployment rollouts, service discovery, self-healing. But the operational reality of running Kubernetes in production, especially on self-managed EC2, has plenty of sharp edges.
Here are the pitfalls that bit us and how we addressed them.
1. OOMKilled: The Silent Killer
Containers with no memory limits will consume as much memory as they can, potentially starving neighbouring pods on the same node. When the Linux kernel's OOM killer fires, the container is killed without warning — no graceful shutdown, no drain.
Fix: Set both requests and limits for every container. Start conservative (requests = expected normal usage, limits = 2x requests), then tune based on observed usage.
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Monitor OOMKilled events with: `kubectl get events --field-selector reason=OOMKilling`
2. Node Pressure and Eviction
When a node is under memory or disk pressure, Kubernetes evicts pods. The eviction order is based on QoS class (BestEffort → Burstable → Guaranteed). Pods with no resource requests are BestEffort — first to be evicted.
We had a monitoring pod running without resource requests that kept getting evicted during node pressure events. Setting resource requests elevated it to Burstable QoS and solved the evictions.
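The change is small: adding requests alone (no limits required) is enough to lift a pod out of BestEffort. A minimal sketch, with illustrative values rather than the ones we actually used:

```yaml
# Requests without limits put the pod in the Burstable QoS class,
# so it is no longer first in line for eviction under node pressure.
# Size these from observed usage, not from these placeholder values.
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
```

You can confirm the resulting class with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`.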
3. IMDS and IAM Role Conflicts
On EC2, the Instance Metadata Service (IMDS) provides IAM credential access. Without kube2iam or IRSA, every pod on a node inherits the node's IAM role. This is a security and permissions nightmare — your high-privilege CI runner pod and your low-privilege web server pod have the same AWS permissions.
We ran kube2iam pre-IRSA. It intercepts calls to the IMDS endpoint and returns pod-specific role credentials based on a pod annotation. Not perfect (there's a brief window during pod startup where it can be bypassed), but significantly better than node-level role inheritance.
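With kube2iam the role mapping lives on the pod itself, via the `iam.amazonaws.com/role` annotation. A sketch with hypothetical names (pod, role, and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: billing-worker                      # hypothetical pod name
  annotations:
    # kube2iam intercepts this pod's IMDS calls and assumes this role
    iam.amazonaws.com/role: billing-worker-role   # hypothetical role name
spec:
  containers:
    - name: worker
      image: example/billing-worker:latest  # hypothetical image
```

Pods without the annotation fall back to kube2iam's configured default role, which you should keep as close to zero-privilege as possible.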
4. kubectl Drain and PodDisruptionBudgets
Draining a node (kubectl drain) evicts all pods from that node. Without PodDisruptionBudgets, Kubernetes can evict all replicas of a deployment simultaneously — causing a service outage during routine maintenance.
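For reference, a typical drain invocation on a recent kubectl (node name is a placeholder):

```shell
# Cordon the node and evict its pods; DaemonSet pods are skipped,
# and pods using emptyDir volumes are evicted despite losing that data.
kubectl drain ip-10-0-1-23.ec2.internal \
  --ignore-daemonsets \
  --delete-emptydir-data
```

With PDBs in place, the eviction API refuses to evict a pod whose removal would violate the budget, so the drain waits instead of taking the service down.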
PodDisruptionBudgets prevent this:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api
```

This guarantees at least 1 replica of api is always available during voluntary disruptions. Set PDBs for every deployment that serves traffic.
5. Stale DNS and kube-dns Overload
kube-dns (or CoreDNS) is the in-cluster DNS resolver. Under high query rates, it can become a bottleneck. Symptoms: intermittent connection failures with "name resolution failure" errors, especially from Go services (which make separate A and AAAA DNS queries for every hostname lookup).
Fixes: Enable NodeLocal DNSCache (a DaemonSet that caches DNS on each node, reducing kube-dns load), and lower the default ndots value of 5 via the pod's dnsConfig if you control the pod specs, so fully qualified external names don't trigger a round of search-domain lookups first.
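The ndots change is a per-pod setting. A minimal sketch, assuming a value of 2 is acceptable for your in-cluster short names (the value is illustrative):

```yaml
# With ndots: 2, names with two or more dots (e.g. api.example.com)
# are resolved as absolute first, skipping cluster search-domain expansion.
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```

Note that lowering ndots too far can break short-name lookups like `service.namespace`, so test against the names your workloads actually resolve.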
These aren't exotic failure modes — they're the things that happen in any sufficiently busy Kubernetes cluster. Knowing them before they happen to you in production is worth the reading time.