Overview
A B2B SaaS company running on AWS EKS had seen Kubernetes costs grow steadily as the team added services without corresponding resource governance. Cluster costs were the single largest line item in their infrastructure budget.
Root Cause Analysis
KubeAce ran a 30-day resource profiling exercise before making any changes:
- Resource requests vs. actual usage: average pod requested 4× its actual CPU consumption
- Memory limits: set arbitrarily; 70% of pods had never reached 40% of their limit
- Node utilisation: average 18% CPU, 24% memory across all nodes
- No autoscaling: fixed node counts regardless of time-of-day load patterns
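A request-vs-usage comparison like the one above can be profiled with Prometheus recording rules. This is a minimal sketch assuming kube-state-metrics and cAdvisor metrics are available; the rule names and the `monitoring` namespace are illustrative, not from the engagement:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-profiling
  namespace: monitoring
spec:
  groups:
    - name: request-vs-usage
      rules:
        # Step 1: record each container's short-window CPU usage rate.
        - record: container:cpu_usage:rate5m
          expr: |
            sum by (namespace, pod, container) (
              rate(container_cpu_usage_seconds_total{container!=""}[5m])
            )
        # Step 2: requested CPU divided by the P95 of recorded usage over
        # the 30-day profiling window; a ratio well above 1 means the
        # container is over-requesting (the case study found ~4x on average).
        - record: container:cpu_request_to_p95_usage:ratio
          expr: |
            sum by (namespace, pod, container) (
              kube_pod_container_resource_requests{resource="cpu"}
            )
            /
            quantile_over_time(0.95, container:cpu_usage:rate5m[30d])
```

The two-step form is needed because `quantile_over_time` operates on a range of recorded samples, so the usage rate must be materialised as its own series first.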
Optimisation Actions
Node-Level
- Replaced managed node groups with Karpenter NodePools
- Configured consolidation policies: underutilised nodes are drained and removed, allowing pools to scale to zero during off-peak windows
- Mixed Spot/On-Demand: 70% Spot for stateless workloads, On-Demand only for stateful services
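The node-level changes above map onto a Karpenter NodePool roughly like this. A minimal sketch for the stateless pool, assuming a pre-existing `EC2NodeClass` named `default`; the pool name and `consolidateAfter` window are illustrative:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: stateless
spec:
  template:
    spec:
      requirements:
        # Allow both capacity types; Karpenter favours Spot when permitted.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Drain and remove nodes that are empty or underutilised,
    # enabling scale-to-zero during off-peak windows.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
```

The 70/30 Spot/On-Demand split is not a single setting: Karpenter prefers Spot when both types are allowed, and stateful services would be steered to a separate On-Demand-only pool via node selectors or taints.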
Workload-Level
- Right-sized resource requests for 63 deployments based on P95 usage metrics
- Enabled VPA in recommendation mode for ongoing request tuning
- Set LimitRange policies per namespace to prevent unbounded resource requests
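The last two workload-level actions can be expressed as manifests. A sketch assuming a hypothetical `team-a` namespace and an `api` Deployment; the default and max values are illustrative, not the engagement's actual figures:

```yaml
# Namespace-level guardrails: defaults for containers that omit
# requests/limits, and a ceiling to prevent unbounded requests.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
      max:
        cpu: "2"
        memory: 2Gi
---
# VPA in recommendation mode: publishes suggested requests
# without evicting or resizing pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-recommender
  namespace: team-a
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"
```

With `updateMode: "Off"`, the recommendations surface in the VPA status and can feed periodic right-sizing reviews rather than automatic changes.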
Governance
- Kubecost deployed with per-team showback dashboards
- ResourceQuota applied to all namespaces
- Resource efficiency gates added to CI/CD pipeline
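The per-namespace quota from the list above might look like the following. The namespace and all hard limits are illustrative assumptions, sized per team in practice:

```yaml
# Caps a team's aggregate footprint so one namespace cannot
# absorb the cluster's capacity.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
```

Note that once a ResourceQuota covers CPU or memory, pods in that namespace must declare requests for those resources, which is why pairing it with a LimitRange that supplies defaults avoids breaking existing workloads.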
Results
The 58% cost reduction came primarily from Karpenter consolidation and right-sizing, which contributed roughly 40 and 18 percentage points of the reduction respectively. Pod stability also improved, as containers no longer competed for over-committed node resources.