Why 9 regions?
When your users span from Tokyo to São Paulo, every millisecond of cache latency compounds. Our AI chat platform serves users globally, and the difference between a local Redis read (sub-1ms) and a cross-region one (50-200ms) is the difference between a snappy UX and a sluggish one.
We started with 3 regions, scaled to 6, and eventually landed on 9. Each expansion taught us something about the limits of distributed caching.
Replication topology
We use a hub-and-spoke replication model rather than mesh replication. Each region has a local Redis cluster that handles reads. Writes flow through a primary region and replicate outward.
```
                ┌──────────────┐
                │   EU-West    │
                │  (Primary)   │
                └──────┬───────┘
                       │
       ┌───────────────┼───────────────┐
       │               │               │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│   US-East   │ │  AP-South   │ │ EU-Central  │
│  (Replica)  │ │  (Replica)  │ │  (Replica)  │
└──────┬──────┘ └──────┬──────┘ └─────────────┘
       │               │
┌──────┴──────┐ ┌──────┴──────┐
│   US-West   │ │   AP-East   │
│  (Replica)  │ │  (Replica)  │
└─────────────┘ └─────────────┘
```
The consistency problem
With asynchronous replication, you get eventual consistency. For a chat application, this creates a specific problem: a user sends a message in one region, but if they're load-balanced to a different region on the next request, they might not see their own message.
Our solution: read-your-writes consistency via a sticky session layer.
```typescript
interface SessionContext {
  userId: string;
  primaryRegion: string;
  lastWriteTimestamp: number;
  writeRegion: string;
}

function resolveReadRegion(ctx: SessionContext, localRegion: string): string {
  const timeSinceWrite = Date.now() - ctx.lastWriteTimestamp;
  const replicationLag = getEstimatedLag(ctx.writeRegion, localRegion);
  // If the write hasn't had time to replicate, read from the write region
  if (timeSinceWrite < replicationLag * 1.5) {
    return ctx.writeRegion;
  }
  return localRegion;
}
```
This gives us the performance benefits of local reads in the common case (reads far outnumber writes) while guaranteeing users always see their own writes.
Memory management at scale
Redis is an in-memory store, and memory is expensive at scale. Across 9 regions, every byte stored is multiplied by 9. We use several strategies to keep memory in check:
Key expiration policies
```typescript
const CACHE_POLICIES = {
  // Hot conversation data: keep for 24h
  conversation: { ttl: 86400, maxMemory: "2gb" },
  // User session data: keep for 1h
  session: { ttl: 3600, maxMemory: "512mb" },
  // AI response cache: keep for 6h
  aiCache: { ttl: 21600, maxMemory: "4gb" },
  // Rate limiting counters: keep for 1 minute
  rateLimit: { ttl: 60, maxMemory: "256mb" },
} as const;
```
Compression
For larger values (conversation histories, cached AI responses), we compress with LZ4 before storing. The CPU overhead is negligible compared to the memory savings:
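The write path is shown below; the read side has to honor the same `z:` prefix convention and undo the compression. A self-contained sketch of that read path (the `getCompressed` helper is hypothetical; a `Map` stands in for Redis and Node's built-in zlib for LZ4 so the example runs without external dependencies):

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Stand-in for the Redis client so this sketch is self-contained.
type Store = Map<string, Buffer>;

// Hypothetical read-side counterpart to setCompressed: check for the
// z:-prefixed (compressed) key first, then fall back to the plain key
// used for small values stored uncompressed.
function getCompressed(store: Store, key: string): string | null {
  const compressed = store.get(`z:${key}`);
  if (compressed !== undefined) {
    return gunzipSync(compressed).toString();
  }
  const plain = store.get(key);
  return plain === undefined ? null : plain.toString();
}
```

The important part is the key convention: the prefix tells the reader which branch to take without a second round trip to check metadata.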
```typescript
import { compress, decompress } from "lz4-napi";

async function setCompressed(
  redis: Redis,
  key: string,
  value: string,
  ttl: number
): Promise<void> {
  const buffer = Buffer.from(value);
  // Only compress values large enough to benefit; the matching read
  // path (not shown) calls decompress on keys carrying the z: prefix.
  if (buffer.byteLength > 1024) {
    const compressed = await compress(buffer);
    await redis.setex(`z:${key}`, ttl, compressed);
  } else {
    await redis.setex(key, ttl, value);
  }
}
```
Monitoring and alerting
With 9 regions, observability becomes critical. We monitor three key metrics per region:
- Replication lag — how far behind each replica is from the primary
- Memory utilization — approaching max triggers eviction warnings
- Command latency — p99 latency per command type
```typescript
interface RegionHealth {
  region: string;
  replicationLagMs: number;
  memoryUsagePercent: number;
  commandLatencyP99Ms: number;
  evictionsPerSecond: number;
  status: "healthy" | "degraded" | "critical";
}

function evaluateHealth(metrics: RegionHealth): void {
  if (metrics.replicationLagMs > 5000) {
    alert(`High replication lag in ${metrics.region}: ${metrics.replicationLagMs}ms`);
  }
  if (metrics.evictionsPerSecond > 100) {
    alert(`High eviction rate in ${metrics.region}: ${metrics.evictionsPerSecond}/s`);
  }
}
```
Operational patterns
Rolling deployments
Never upgrade all 9 regions simultaneously. We deploy to a canary region first, monitor for 30 minutes, then roll out to remaining regions in batches of 2:
- Canary (EU-West staging) — 30 min soak
- EU-West, EU-Central — 15 min soak
- US-East, US-West — 15 min soak
- AP regions — 15 min soak
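The batching above is mechanical enough to encode. A sketch of how the rollout waves could be computed (the `planRollout` helper is hypothetical and the region list illustrative):

```typescript
// Hypothetical helper: turn an ordered region list into staged rollout
// waves — the canary alone first, then the rest in batches of 2.
function planRollout(regions: string[], canary: string, batchSize = 2): string[][] {
  const rest = regions.filter((r) => r !== canary);
  const waves: string[][] = [[canary]];
  for (let i = 0; i < rest.length; i += batchSize) {
    waves.push(rest.slice(i, i + batchSize));
  }
  return waves;
}

// Example: canary first, then three batches covering the other regions.
const waves = planRollout(
  ["eu-west", "eu-central", "us-east", "us-west", "ap-south", "ap-east"],
  "eu-west"
);
// waves: [["eu-west"], ["eu-central", "us-east"], ["us-west", "ap-south"], ["ap-east"]]
```

The deploy orchestrator then iterates the waves, soaking between each before proceeding.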
Disaster recovery
Each region can operate independently if the primary goes down. We maintain a promotion runbook that can elevate any replica to primary in under 60 seconds. We drill this quarterly.
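The runbook's first decision is which replica to promote. A sketch of that selection step (the `pickPromotionCandidate` helper and the 1000 ms lag ceiling are illustrative assumptions, not our actual runbook):

```typescript
interface ReplicaStatus {
  region: string;
  replicationLagMs: number;
  reachable: boolean;
}

// Hypothetical promotion preflight: among reachable replicas within
// the lag ceiling, pick the least-lagged one (it loses the fewest
// writes); return null if no replica is safe to promote.
function pickPromotionCandidate(
  replicas: ReplicaStatus[],
  maxLagMs = 1000
): string | null {
  const eligible = replicas
    .filter((r) => r.reachable && r.replicationLagMs <= maxLagMs)
    .sort((a, b) => a.replicationLagMs - b.replicationLagMs);
  return eligible.length > 0 ? eligible[0].region : null;
}
```

Automating the candidate choice is what makes a sub-60-second promotion realistic; the human in the loop confirms rather than computes.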
Results
After optimizing over 18 months:
| Metric | Before (3 regions) | After (9 regions) |
|---|---|---|
| Global p50 read latency | 23ms | 0.8ms |
| Global p95 read latency | 89ms | 4.2ms |
| Cache hit rate | 78% | 94% |
| Monthly Redis cost | $12,400 | $8,900 |
The cost actually went down because better cache locality meant fewer cache misses, which meant fewer expensive origin reads.
Key takeaways
- Start with hub-and-spoke. Mesh replication doesn't scale operationally beyond 3-4 nodes.
- Read-your-writes consistency is non-negotiable for user-facing applications.
- Compress early. The memory savings at 9x replication are massive.
- Monitor evictions, not just memory. Evictions are the canary in the coal mine.
- Drill your failover. A disaster recovery plan you haven't tested is a disaster recovery wish.
The distributed caching layer is one of those infrastructure investments that's invisible when it works well and catastrophic when it doesn't. Take the time to get it right.