Why Your Kubernetes Cluster Is Overprovisioned (And How to Fix It)
The 20% Utilization Problem
We audit a lot of Kubernetes clusters. The average CPU utilization we see is 20-35%. That means 65-80% of your compute spend is waste. This isn't because teams are careless. It's because Kubernetes resource management is genuinely hard to get right, and the defaults encourage overprovisioning. Most engineers set resource requests high ('just to be safe'), never revisit them, and the cluster auto-scaler dutifully provisions nodes to satisfy those inflated requests.
Resource Requests: The Root Cause
The most common misconfiguration is resource requests that don't match actual usage. A pod requests 2 CPU cores and 4GB memory because that's what someone guessed during initial deployment. Actual usage is 0.3 CPU and 800MB. The Kubernetes scheduler reserves the requested resources, so even though the pod only uses 15% of what it asked for, nothing else can use the remaining 85%. Multiply this by 50-100 pods and you're running 3x more nodes than you need. The fix is straightforward: use tools like Kubernetes Metrics Server, Goldilocks, or Kubecost to measure actual resource usage over 2-4 weeks, then adjust requests to match P95 utilization with a 20% buffer.
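As a concrete sketch of that adjustment, here's a Deployment manifest right-sized from the hypothetical example above, assuming a measured P95 of roughly 0.3 CPU and 800MB plus a ~20% buffer; the workload name and image are placeholders.

```yaml
# Hypothetical Deployment with requests right-sized to measured P95 + ~20% buffer.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api                       # placeholder workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: api
          image: example.registry/api:1.0 # placeholder image
          resources:
            requests:
              cpu: "400m"    # was 2000m; measured P95 ~300m, plus ~20% buffer
              memory: "1Gi"  # was 4Gi; measured P95 ~800Mi, plus buffer
```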
Node Sizing and Instance Selection
Another common issue is using the wrong instance types. Teams default to m5.xlarge (4 vCPU, 16GB) because it's a safe general-purpose choice, but if your workloads are memory-heavy you end up wasting CPU, and if they're CPU-heavy you waste memory. We use Karpenter, AWS's open-source node provisioning tool for EKS, to automatically select the optimal instance type based on pending pod requirements. Karpenter can choose from a pool of instance types and sizes, including spot instances, to minimize cost while meeting resource needs. Switching from static node groups to Karpenter typically saves 25-40% on compute.
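For illustration, a minimal Karpenter NodePool along these lines might look like the following. Field names follow Karpenter's v1 NodePool API (older releases use v1beta1 NodePools or Provisioners with slightly different fields), and the pool name, CPU limit, and EC2NodeClass reference are assumptions to adapt to your cluster.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose                   # hypothetical pool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumes an EC2NodeClass named "default"
      requirements:
        # Give Karpenter a broad menu of instance families and sizes to pick from.
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "200"                            # cap total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # consolidate underutilized nodes promptly
```

Given the aggregate requests of pending pods, Karpenter then launches the cheapest instance types that fit and consolidates nodes away as load drops.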
Spot Instances for Non-Critical Workloads
Spot instances on EKS offer 60-70% savings over on-demand pricing. They work well for stateless workloads that can tolerate interruption: development environments, batch processing, CI/CD runners, and non-user-facing services. The key is separating workloads into critical (on-demand or reserved) and non-critical (spot-eligible) using node affinity and taints. We typically run 40-60% of a cluster's workloads on spot instances. With proper pod disruption budgets and multi-AZ spot diversification, interruption rates are under 5% and handled gracefully by Kubernetes rescheduling.
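Here's a rough sketch of that separation for a non-critical workload, assuming spot nodes carry a karpenter.sh/capacity-type=spot label and a cluster-specific "spot" taint; both the label/taint keys and the workload names are assumptions to adapt to how your nodes are actually labeled and tainted.

```yaml
# Hypothetical spot-tolerant Deployment: tolerate the spot taint and require spot capacity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                      # hypothetical non-critical workload
spec:
  replicas: 6
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
        - key: "spot"                     # assumed taint applied to spot nodes
          operator: "Exists"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]
      containers:
        - name: worker
          image: example.registry/worker:1.0   # placeholder image
---
# Pod disruption budget so spot interruptions drain gracefully.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-worker-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: batch-worker
```

The PodDisruptionBudget keeps a floor of replicas running while Kubernetes drains and reschedules pods off reclaimed spot nodes.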
The Optimization Playbook
Our standard EKS cost optimization engagement follows this sequence:

1. Install metrics collection (Prometheus + Kubecost) and measure for 2 weeks.
2. Right-size resource requests based on observed P95 usage.
3. Deploy Karpenter to replace static node groups.
4. Implement spot instances for eligible workloads.
5. Set up Goldilocks for ongoing resource recommendations (sketched below).
6. Configure Cluster Autoscaler (or Karpenter) scaling policies to be more aggressive about scaling down.

On a recent engagement, a 40-node EKS cluster dropped to 18 nodes with identical workload performance. Monthly compute cost went from $14K to $5,800.
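As one small example from that list, Goldilocks is typically enabled per namespace with a label. The sketch below assumes the label key documented by the Goldilocks project and a placeholder namespace name; verify both against the version you install.

```yaml
# Hypothetical namespace opt-in for Goldilocks resource recommendations.
apiVersion: v1
kind: Namespace
metadata:
  name: payments                               # placeholder namespace
  labels:
    goldilocks.fairwinds.com/enabled: "true"   # tells Goldilocks to generate recommendations here
```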