Why Self-Managed Kubernetes
In 2017, EKS didn't exist. GKE was mature, AKS was emerging, but AWS's managed Kubernetes offering was still in development. If you wanted Kubernetes on AWS, you ran it yourself on EC2.
We used kops (Kubernetes Operations) to provision and manage our cluster. kops is a mature tool — it handles the complexity of bootstrapping etcd, the control plane, and worker nodes on EC2, and manages upgrades. Without it, self-managed Kubernetes would have been significantly harder.
The Control Plane Is Not Your Friend
In managed Kubernetes (EKS, GKE), the control plane — API server, etcd, scheduler, controller manager — is AWS/Google's problem. In self-managed, it's yours.
etcd is the soul of your cluster. Everything in Kubernetes — all configuration, all state — is stored in etcd. We learned this the hard way when an etcd volume filled up during a cluster upgrade. The API server became read-only, deployments stopped, and we spent 2 hours diagnosing what turned out to be a disk space issue that etcd's compaction wasn't handling correctly.
etcd backup is non-negotiable. We added regular etcd snapshots to S3 within our first month.
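A backup job of this shape is simple to sketch. The following is a minimal illustration, assuming `etcdctl` (v3 API) and the AWS CLI are available on a control-plane node; the bucket name, certificate paths, and backup directory are placeholders, not our original configuration:

```python
"""Sketch of an etcd snapshot-to-S3 job. Certificate paths and the bucket
name are illustrative assumptions, not values from the original cluster."""
import datetime

# Assumed kubeadm-style cert layout; adjust to wherever kops places etcd certs.
ETCD_CERTS = {
    "--cacert": "/etc/kubernetes/pki/etcd/ca.crt",
    "--cert": "/etc/kubernetes/pki/etcd/server.crt",
    "--key": "/etc/kubernetes/pki/etcd/server.key",
}

def snapshot_commands(bucket: str, endpoint: str = "https://127.0.0.1:2379"):
    """Build the two commands: snapshot locally, then copy the file to S3."""
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
    local_path = f"/var/backups/etcd-{stamp}.db"
    save = ["etcdctl", "--endpoints", endpoint]
    for flag, path in ETCD_CERTS.items():
        save += [flag, path]
    save += ["snapshot", "save", local_path]
    upload = ["aws", "s3", "cp", local_path, f"s3://{bucket}/etcd/{stamp}.db"]
    return save, upload
```

In practice you would run both commands with `subprocess.run(cmd, check=True)` from a cron job or a Kubernetes CronJob on the master, and alert if either exits non-zero.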
Control plane upgrades require a maintenance window. Upgrading the API server or etcd means brief control plane unavailability (seconds to minutes depending on approach). Plan these carefully.
Networking Challenges
Kubernetes networking on EC2 is more complex than in managed offerings. We used Flannel (VXLAN mode) for pod networking. The operational questions you don't face in EKS:
- How do pods communicate across EC2 instances? (VXLAN overlay network)
- How does DNS resolution work for services? (kube-dns/CoreDNS in-cluster)
- How do you expose services externally? (Manual ELB creation and Kubernetes service annotation)
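For the last question, the in-tree AWS cloud provider could create an ELB from a `Service` of type `LoadBalancer`, steered by annotations. A minimal sketch of such a manifest follows; the names are illustrative, `kubectl apply -f` accepts JSON as well as YAML, and the exact annotation value for internal ELBs has varied across Kubernetes versions:

```python
"""Sketch of a Service manifest asking the AWS cloud provider for an ELB.
Names are illustrative; the internal-ELB annotation value varies by version."""
import json

def loadbalancer_service(name: str, app: str, port: int) -> dict:
    """Build a Service of type LoadBalancer as a plain dict (valid kubectl JSON)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Request an internal (VPC-only) ELB from the AWS provider.
                "service.beta.kubernetes.io/aws-load-balancer-internal": "true",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app},
            "ports": [{"port": port, "targetPort": port}],
        },
    }

manifest = json.dumps(loadbalancer_service("web", "web", 80), indent=2)
```

Piping `manifest` to `kubectl apply -f -` would create both the Service and, via the cloud provider, the ELB itself.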
CNI plugins (Container Network Interface) have different performance characteristics. Flannel is simple and reliable; Calico offers network policy capabilities; AWS VPC CNI (which EKS uses) gives pods real VPC IPs. Understanding these tradeoffs matters when you're debugging networking issues at 2am.
What EKS Users Take for Granted
After EKS launched in 2018 and we migrated, the differences were immediate:
- Control plane managed: No etcd disk capacity alerts at 2am
- AWS IAM integration: Native IAM roles for service accounts (IRSA) — previously we had to run kube2iam as a DaemonSet
- EBS CSI driver: Persistent volume provisioning just works
- Upgrade simplicity: EKS handles control plane upgrades; you upgrade node groups separately
If you're starting fresh today, use EKS (or GKE, or AKS). Self-managed Kubernetes is an advanced topic. But understanding what managed services abstract is valuable — it makes you a better operator when things go wrong.
The Lasting Lessons
1. Monitor your etcd — disk, memory, latency — as carefully as your application
2. Pod disruption budgets matter in a real cluster — set them before you do your first rolling upgrade
3. Resource requests and limits are not optional — unconstrained pods will OOM-kill their neighbours
4. Kubernetes is a great tool that assumes you know what you're doing. Invest in understanding it before you run production workloads.
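Lessons 2 and 3 reduce to two small pieces of configuration. Here is a sketch of both as manifest builders; all names and sizes are illustrative defaults, and the PodDisruptionBudget API group is `policy/v1` on current clusters (it was `policy/v1beta1` in that era):

```python
"""Sketches of the manifests behind lessons 2 and 3. Names, images, and
resource sizes are illustrative, not the original production values."""

def pod_disruption_budget(name: str, app: str, min_available: int) -> dict:
    """Keep at least `min_available` pods running during voluntary
    disruptions such as node drains and rolling upgrades."""
    return {
        "apiVersion": "policy/v1",  # policy/v1beta1 on pre-1.21 clusters
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }

def constrained_container(name: str, image: str) -> dict:
    """A container spec with explicit requests (what the scheduler reserves)
    and limits (the ceiling past which the container is OOM-killed)."""
    return {
        "name": name,
        "image": image,
        "resources": {
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"},
        },
    }
```

Without the requests, the scheduler packs pods blindly; without the limits, one runaway process can starve every other pod on the node — exactly the neighbour-killing failure mode lesson 3 warns about.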