Overview
A fintech company running payment and lending workloads on Azure had seen its cloud costs double over 18 months. The engineering team had no visibility into which services or teams were driving spend, which made optimisation difficult to prioritise.
What We Found
Initial audit identified four primary cost drivers:
- Over-provisioned VMs — average CPU utilisation at 8%, memory at 22%
- Orphaned resources — 34 disks, 12 public IPs, and 6 load balancers with no associated active workloads
- Unoptimised storage — hot tier used for archive data; no lifecycle policies
- AKS node pools — fixed node counts with no autoscaling; running at full capacity 24/7
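As an illustration of how the orphaned-resource finding could be reproduced, the sketch below filters a resource inventory for items with no associated workload. It is a minimal example, not the audit tooling that was actually used: the Resource record, its attached_to field, and the sample inventory are all hypothetical stand-ins for whatever an Azure inventory export provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    kind: str                   # e.g. "disk", "public_ip", "load_balancer"
    attached_to: Optional[str]  # ID of the consuming workload, or None/"" if unattached


def find_orphans(resources):
    """Return the resources that have no associated workload."""
    return [r for r in resources if not r.attached_to]


def count_by_kind(resources):
    """Count resources per kind, preserving first-seen order."""
    counts = {}
    for r in resources:
        counts[r.kind] = counts.get(r.kind, 0) + 1
    return counts


# Made-up sample data for illustration only.
inventory = [
    Resource("disk-01", "disk", None),
    Resource("disk-02", "disk", "vm-payments-01"),
    Resource("pip-01", "public_ip", None),
    Resource("lb-01", "load_balancer", ""),
]

orphans = find_orphans(inventory)
print(count_by_kind(orphans))  # {'disk': 1, 'public_ip': 1, 'load_balancer': 1}
```

In practice the orphan list would be reviewed before deletion, since an unattached disk may still hold data someone needs.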
Actions Taken
- Right-sized all AKS node pools based on 30-day P95 utilisation; enabled cluster autoscaler
- Migrated 60% of stateless workloads to Spot instances (Azure Spot VMs with fallback)
- Implemented Azure Blob Storage lifecycle policies (hot → cool → archive tiering)
- Cleaned up orphaned resources (immediate 8% savings)
- Tagged all resources for team-level cost attribution
- Set up Kubecost for per-namespace, per-team cost dashboards
- Established monthly FinOps review with automated budget alerts at 80% and 95% thresholds
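The P95-based right-sizing step can be sketched roughly as follows. This is one plausible way to turn utilisation samples into a node count, not the team's actual method: the nearest-rank percentile, the 20% headroom, the 4-core node size, and the sample values are all assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommended_nodes(cpu_cores_used, cores_per_node, headroom=0.2, min_nodes=1):
    """Node count that covers P95 CPU demand plus headroom (headroom value is an assumption)."""
    p95 = percentile(cpu_cores_used, 95)
    needed = p95 * (1 + headroom) / cores_per_node
    return max(min_nodes, math.ceil(needed))


# Made-up CPU demand samples (cores in use) with one short spike to 14 cores.
samples = [4.0] * 10 + [6.0] * 8 + [8.0, 14.0]

print(percentile(samples, 95))                       # 8.0
print(recommended_nodes(samples, cores_per_node=4))  # 3
```

Sizing to the P95 rather than the peak keeps the baseline small; short spikes above it are left for the cluster autoscaler to absorb.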
Results
Cloud costs fell by 42% within 90 days, with no production incidents and no performance regression. The monthly FinOps review and automated budget alerts now guard against uncontrolled cost growth.
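The 80% and 95% budget alert thresholds listed under Actions Taken amount to a simple comparison of month-to-date spend against budget. The sketch below shows that logic in isolation; the function name and the sample figures are hypothetical, and in a real setup the cloud provider's budgeting service would perform this check rather than custom code.

```python
ALERT_THRESHOLDS_PCT = (80, 95)  # thresholds from the case study, as whole percentages


def triggered_alerts(spend, budget, thresholds_pct=ALERT_THRESHOLDS_PCT):
    """Return the thresholds (in percent) that current spend has reached or passed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare as spend * 100 >= pct * budget to avoid floating-point ratio issues.
    return [pct for pct in sorted(thresholds_pct) if spend * 100 >= pct * budget]


# Made-up figures: a 10,000 monthly budget at three points in the month.
print(triggered_alerts(7_000, 10_000))  # []
print(triggered_alerts(8_500, 10_000))  # [80]
print(triggered_alerts(9_600, 10_000))  # [80, 95]
```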