
Building a Scalable Video Conferencing Platform with LiveKit on Kubernetes

Deploy a production LiveKit WebRTC platform on Kubernetes for 50,000 concurrent users — global SFU nodes, recording egress, and observability.


Head of Engineering, KubeAce


Building a real-time video platform that works reliably at scale is genuinely hard. WebRTC’s peer-to-peer model breaks down beyond 4-5 participants. LiveKit’s Selective Forwarding Unit (SFU) architecture solves this — but deploying it correctly on Kubernetes requires understanding both systems deeply.

This guide distills what we’ve learned deploying LiveKit for platforms ranging from 100 to 50,000 concurrent users.

Understanding LiveKit’s Architecture

LiveKit is an open-source WebRTC SFU. Instead of participants sending video to each other directly (mesh) or through a centralized transcoder (MCU), each participant sends one stream to the SFU, which selectively forwards it to other participants.

This means:

  • Bandwidth scales with publishers, not participants: A 100-person meeting with one active speaker uses nearly the same bandwidth as a 10-person meeting
  • CPU is dominated by packet routing, not transcoding: LiveKit is CPU-efficient and horizontally scalable
  • Each room lives on a single server: Rooms don’t span servers (with the default config)

Infrastructure Design: Single Region

For a single-region deployment serving up to 5,000 concurrent users:

# livekit-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: livekit
  namespace: livekit
spec:
  replicas: 4
  selector:
    matchLabels:
      app: livekit
  template:
    metadata:
      labels:
        app: livekit
    spec:
      containers:
      - name: livekit
        image: livekit/livekit-server:v1.7.2
        ports:
        - containerPort: 7880  # HTTP/gRPC
        - containerPort: 7881  # RTC TCP
        - containerPort: 7882  # Prometheus metrics
        # UDP RTC range 50000-60000 is exposed via hostNetwork below;
        # containerPort accepts a single integer and cannot express a range
        env:
        - name: LIVEKIT_CONFIG
          valueFrom:
            secretKeyRef:
              name: livekit-config
              key: config.yaml
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            memory: "8Gi"  # No CPU limit — causes severe WebRTC quality issues
      hostNetwork: true  # Critical for UDP RTC performance
      dnsPolicy: ClusterFirstWithHostNet

Critical: LiveKit requires hostNetwork: true for optimal UDP performance. Without it, each UDP packet traverses kube-proxy’s NAT, adding 5-15ms latency and causing packet reordering.

The UDP Challenge

This is the biggest Kubernetes-specific challenge with LiveKit. WebRTC uses UDP for media transport. In Kubernetes:

  • NodePort services don’t efficiently handle high-throughput UDP
  • hostNetwork: true bypasses kube-proxy for UDP but means one LiveKit pod per node
  • Karpenter or Cluster Autoscaler must provision appropriate node types (compute-optimised, not memory-optimised)

For AWS, we use c6i.2xlarge instances (8 vCPU, 16 GB) — one LiveKit pod per node.
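Relying on hostNetwork port conflicts alone to get one pod per node is fragile; a podAntiAffinity rule makes the constraint explicit. A sketch to merge into the Deployment's pod template — the `workload: livekit` node label is an assumption about how your node provisioning labels LiveKit nodes:

```yaml
# Sketch only: enforce one LiveKit pod per node and pin pods to
# compute-optimised nodes. "workload: livekit" is an assumed label
# applied by your Karpenter / autoscaling group configuration.
spec:
  template:
    spec:
      nodeSelector:
        workload: livekit
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: livekit
            topologyKey: kubernetes.io/hostname
```

With this in place, a fifth replica simply stays Pending until the autoscaler provisions a fifth node, rather than failing with host-port conflicts.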

Redis for Room State

LiveKit uses Redis for room state when running multiple replicas:

apiVersion: v1
kind: ConfigMap
metadata:
  name: livekit-config
data:
  config.yaml: |
    port: 7880
    rtc:
      tcp_port: 7881
      port_range_start: 50000
      port_range_end: 60000
      use_external_ip: true
    redis:
      address: redis-master.redis.svc.cluster.local:6379
    turn:
      enabled: true
      domain: turn.yourdomain.com
      tls_port: 5349
    keys:
      your-api-key: your-api-secret

Global Multi-Region Deployment

For 10,000+ concurrent users across geographies, you need a multi-region approach:

Architecture Overview

┌──────────────────────────────────────────────────────────┐
│                     Global Load Balancer                  │
│              (Cloudflare or Route53 Latency)             │
└──────┬───────────────────────────┬───────────────────────┘
       │                           │
┌──────▼──────┐             ┌──────▼──────┐
│ Mumbai AZ   │             │ Singapore   │
│ LiveKit x4  │             │ LiveKit x4  │
│ Redis       │             │ Redis       │
└─────────────┘             └─────────────┘

Key design decisions:

  1. Latency-based DNS routing: Users connect to the nearest LiveKit cluster automatically
  2. No cross-region room state: Rooms are anchored to a single region. If you need cross-region rooms, use LiveKit’s distributed config with a central Redis.
  3. TURN servers co-located: Deploy Coturn alongside LiveKit in each region for NAT traversal
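A minimal sketch of the co-located Coturn deployment — the image tag, realm, replica count, and shared-secret wiring are all illustrative assumptions, and in practice you would also front ports 3478/5349 with a LoadBalancer or rely on the host network:

```yaml
# Sketch only: co-located Coturn per region, on host networking for the
# same UDP reasons as LiveKit. Realm and secret names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coturn
  namespace: livekit
spec:
  replicas: 2
  selector:
    matchLabels:
      app: coturn
  template:
    metadata:
      labels:
        app: coturn
    spec:
      hostNetwork: true
      containers:
      - name: coturn
        image: coturn/coturn:4.6-alpine       # assumed tag
        args:
        - -n                                  # run without a config file
        - --realm=turn.yourdomain.com
        - --use-auth-secret                   # HMAC time-limited credentials
        - --static-auth-secret=$(TURN_SECRET)
        env:
        - name: TURN_SECRET
          valueFrom:
            secretKeyRef:
              name: coturn-secret
              key: secret
```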

Ingress for WebSocket

LiveKit’s signaling uses WebSocket (a long-lived HTTP Upgrade connection). Configure your ingress carefully:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: livekit-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"  # Sticky sessions
spec:
  rules:
  - host: livekit.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: livekit
            port:
              number: 7880

Important: Set aggressive proxy timeouts. LiveKit WebSocket connections last for the duration of the room session.

Observability for WebRTC

Standard infrastructure metrics aren’t enough for WebRTC. You need media quality metrics:

Key Metrics to Monitor

# Prometheus scrape config
- job_name: 'livekit'
  static_configs:
  - targets: ['livekit:7882']
  metrics_path: '/metrics'

Critical metrics from LiveKit’s Prometheus endpoint:

| Metric | Alert Threshold | Meaning |
|---|---|---|
| livekit_room_total | > 80% capacity | Approaching room limits |
| livekit_participant_total | > 80% capacity | Approaching participant limits |
| livekit_packet_loss_* | > 2% | Network quality degrading |
| livekit_nack_total | Rising | Packet retransmissions needed |
| livekit_jitter_* | > 30ms | Jitter buffer stress |
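These thresholds translate directly into alerting rules. A sketch using the Prometheus Operator's PrometheusRule CRD — the exact metric names and labels vary by LiveKit version, so treat the expression as illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: livekit-quality
  namespace: livekit
spec:
  groups:
  - name: livekit-media-quality
    rules:
    - alert: LiveKitPacketLossHigh
      # Illustrative expression — check which livekit_packet_loss_*
      # metric your LiveKit version actually exports.
      expr: avg(livekit_packet_loss) > 0.02
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "LiveKit packet loss above 2% for 5 minutes"
```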

Grafana Dashboard

We’ve open-sourced our LiveKit Grafana dashboard — it includes all WebRTC quality metrics, room lifecycle visualisations, and per-participant quality scoring. Contact us to request access.

Load Testing LiveKit

Never deploy at scale without load testing. We use a combination of:

  1. livekit-load-tester (official tool) — simulates publish-only rooms
  2. Custom k6 scripts — for realistic join/leave patterns
  3. Chaos engineering — simulate node failures during active rooms

Run these in staging with your target concurrent user count × 1.5 for headroom.
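Running the official load tester from inside the cluster exercises the same network path as real nodes. A Kubernetes Job sketch — the image tag, URL, credentials, and flag names are assumptions to verify against the livekit-cli version you use:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: livekit-load-test
  namespace: livekit
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: load-tester
        image: livekit/livekit-cli:latest     # assumed image tag
        args:                                 # flags illustrative — check your CLI version
        - lk
        - load-test
        - --url=wss://livekit.yourdomain.com
        - --api-key=your-api-key
        - --api-secret=your-api-secret
        - --video-publishers=50
        - --subscribers=500
        - --duration=10m
```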

Results: What to Expect

With the architecture above on 4× c6i.2xlarge nodes:

  • Concurrent participants: 15,000-20,000 per region
  • Average latency: 60-90ms within region
  • Packet loss (good network): < 0.1%
  • CPU utilisation: ~60% at full load (room for spikes)
  • Cost: ~$800-1,200/month per region on AWS Spot

At 50,000+ concurrent users, we add Karpenter for dynamic node provisioning and geographic expansion to 4+ regions.
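For the Karpenter stage, a NodePool restricted to compute-optimised instance types keeps autoscaled capacity suitable for LiveKit. A sketch assuming Karpenter v1 CRDs and an existing EC2NodeClass named `livekit`:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: livekit
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: livekit                          # assumed existing node class
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["c6i.2xlarge", "c6i.4xlarge"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]          # Spot keeps cost in the range above
```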

Getting Started

LiveKit is genuinely one of the best open-source WebRTC stacks available. The complexity is in the deployment — particularly the UDP networking on Kubernetes.

If you want KubeAce to architect and deploy a LiveKit platform for your use case — whether it’s education, telehealth, gaming, or enterprise conferencing — schedule a free architecture review. We typically scope and quote in one session.