Chapter 4: Design a Rate Limiter

volume1 rate-limiter system-design throttling

Status: 🟩 Interview ready - Common question!
Difficulty: Medium
Time to complete: 45 min read + practice


Overview

A rate limiter controls the rate of traffic sent by a client or service. It limits the number of requests a client can make within a time window (e.g., 100 requests per minute).

Why this matters:

  • Very common interview question (appears at Google, Amazon, Microsoft, etc.)
  • Fundamental building block in distributed systems
  • Teaches throttling, distributed systems, and trade-offs

Problem Statement

Design a rate limiter that:

  • Limits requests per user/IP within a time window
  • Returns HTTP 429 (Too Many Requests) when limit exceeded
  • Works in distributed systems (multiple servers)
  • Low latency overhead (< 1ms)
  • Accurate (no false positives/negatives)

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

  • Client-side or server-side rate limiter? β†’ Server-side (client-side can be bypassed)
  • What do we rate limit on? β†’ User ID, IP address, API key, or combination
  • Scale? β†’ Handle large number of requests (millions per second)
  • Distributed environment? β†’ Yes, multiple servers
  • Response when throttled? β†’ HTTP 429 with retry-after header
  • Do users get notified? β†’ Return headers showing rate limit status

Scope:

  • Limit requests per user or IP
  • Support different rate limit rules (per endpoint, per user tier)
  • Inform users about their rate limit status
  • Works across distributed servers

Non-Functional Requirements

  • Low latency: < 1ms overhead (shouldn’t slow down API)
  • Accurate: No false positives (legitimate requests blocked)
  • Scalability: Handle millions of users
  • Fault tolerance: Failures shouldn’t break system
  • Memory efficient: Don’t store too much state

Step 2: High-Level Design (10 min)

Where to Implement Rate Limiting

Option 1: Client-side ❌

  • Can be bypassed by malicious users
  • Only use as courtesy, not security

Option 2: Server-side (Application code) βœ…

  • Full control
  • Con: Adds logic to every service

Option 3: Middleware/API Gateway βœ…βœ… (Best!)

  • Centralized rate limiting
  • Don’t repeat logic in every service
  • Common in production (AWS API Gateway, Kong, NGINX)

Recommendation: Use API Gateway for rate limiting

Basic Architecture

[Client] β†’ [API Gateway with Rate Limiter] β†’ [Backend Services]
                     ↓
              [Redis/Memcached]
           (Store counters/state)

Why Redis/Memcached?

  • In-memory β†’ Fast (< 1ms)
  • Support atomic operations (INCR)
  • Support expiration (TTL)
  • Shared across servers (distributed)

API Design

Request with rate limiting:

GET /api/v1/user/123

Response Headers:
X-RateLimit-Limit: 100        # Max requests per window
X-RateLimit-Remaining: 42     # Requests left
X-RateLimit-Reset: 1640000000 # Unix timestamp when limit resets

When rate limited:

HTTP 429 Too Many Requests

Response Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000
Retry-After: 60               # Seconds until can retry

Body:
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Try again in 60 seconds."
}

Rate Limiting Rules

Example rules:

Rule 1: 10 requests per second per user
Rule 2: 100 requests per minute per IP address
Rule 3: Free tier: 1000 requests/day, Premium: 10000 requests/day
Rule 4: /login endpoint: 5 requests per minute (prevent brute force)
Rule 5: /api/* endpoints: 100 requests per minute

Rules stored in configuration (database or config file):

{
  "userId:123": {
    "limit": 100,
    "window": "1m"
  },
  "ip:192.168.1.1": {
    "limit": 1000,
    "window": "1h"
  },
  "endpoint:/api/search": {
    "limit": 10,
    "window": "1s"
  }
}

Step 3: Deep Dive - Rate Limiting Algorithms (20 min)

How it works:

  • Bucket has a capacity (max tokens)
  • Tokens refill at a constant rate
  • Each request consumes 1 token
  • If no tokens β†’ request rejected

Analogy: Water bucket with a hole. Water drips in at constant rate. Request = take water out.

Parameters:

  • Bucket capacity: Max tokens (e.g., 100)
  • Refill rate: Tokens per second (e.g., 10/sec)

Example:

Bucket capacity: 4 tokens
Refill rate: 1 token/second

t=0:  [πŸͺ™πŸͺ™πŸͺ™πŸͺ™] (4 tokens) β†’ Request β†’ [πŸͺ™πŸͺ™πŸͺ™] (3 left)
t=1:  [πŸͺ™πŸͺ™πŸͺ™πŸͺ™] (refilled to 4) β†’ Request β†’ [πŸͺ™πŸͺ™πŸͺ™]
t=1:  [πŸͺ™πŸͺ™πŸͺ™] β†’ Request β†’ [πŸͺ™πŸͺ™]
t=1:  [πŸͺ™πŸͺ™] β†’ Request β†’ [πŸͺ™]
t=1:  [πŸͺ™] β†’ Request β†’ [] (empty!)
t=1:  [] β†’ Request β†’ ❌ REJECTED (no tokens)
t=2:  [πŸͺ™] (refilled) β†’ Request β†’ [] (OK)

Pros:

  • βœ… Allows burst traffic (use all tokens at once)
  • βœ… Memory efficient (only store bucket state)
  • βœ… Simple to understand and implement
  • βœ… Used by AWS, Stripe (industry standard)

Cons:

  • ❌ Tuning parameters tricky (capacity vs refill rate)
  • ❌ May allow more requests than limit (burst)

When to use: Most cases! Default choice for rate limiting.

Redis implementation:

def allow_request(user_id):
    key = f"rate_limit:{user_id}"
    bucket = redis.get(key)
 
    if bucket is None:
        # Initialize bucket
        bucket = {"tokens": 100, "last_refill": time.now()}
        redis.set(key, bucket, ex=3600)  # Expire in 1 hour
 
    # Refill tokens based on time elapsed
    now = time.now()
    elapsed = now - bucket["last_refill"]
    refill = elapsed * 10  # 10 tokens per second
    bucket["tokens"] = min(100, bucket["tokens"] + refill)
    bucket["last_refill"] = now
 
    # Try to consume 1 token
    if bucket["tokens"] >= 1:
        bucket["tokens"] -= 1
        redis.set(key, bucket, ex=3600)
        return True  # Allowed
    else:
        return False  # Rate limited

Algorithm 2: Leaky Bucket πŸ’§

How it works:

  • Requests enter queue (bucket)
  • Process requests at constant rate (leak)
  • If queue full β†’ request rejected

Analogy: Bucket with hole at bottom. Water leaks out at constant rate.

Parameters:

  • Queue size: Max queued requests (e.g., 100)
  • Outflow rate: Requests per second (e.g., 10/sec)

Example:

Queue size: 3
Outflow: 1 request/second

t=0:  [___] (empty) β†’ 3 requests arrive β†’ [RRR] (queue full)
t=1:  [RRR] β†’ 1 processed (leaked) β†’ [RR_] β†’ 1 new arrives β†’ [RRR]
t=2:  [RRR] β†’ 1 leaked β†’ [RR_]
t=2:  [RR_] β†’ 2 new arrive β†’ [RRR] β†’ 1 more arrives β†’ ❌ REJECTED

Pros:

  • βœ… Smooth traffic (constant outflow rate)
  • βœ… Good for preventing downstream overload
  • βœ… Memory efficient

Cons:

  • ❌ Burst traffic gets queued (higher latency)
  • ❌ Old requests in queue may be stale
  • ❌ Queue can fill up during spikes

When to use: Need strict constant rate (avoid spikes downstream)

Leaky Bucket vs Token Bucket:

AspectToken BucketLeaky Bucket
Burst trafficAllowedQueued/rejected
OutflowVariableConstant
ComplexitySimplerMore complex (queue management)
Use caseAPI rate limitingTraffic shaping

Most systems use Token Bucket because it’s simpler and allows bursts.


Algorithm 3: Fixed Window Counter

How it works:

  • Divide time into fixed windows (e.g., each minute)
  • Count requests in current window
  • Reset counter at window boundary

Parameters:

  • Window size: Duration (e.g., 1 minute)
  • Request limit: Max per window (e.g., 100)

Example:

Limit: 3 requests per minute
Window: 00:00:00 - 00:00:59

00:00:10 β†’ Request β†’ Counter: 1 βœ…
00:00:20 β†’ Request β†’ Counter: 2 βœ…
00:00:30 β†’ Request β†’ Counter: 3 βœ…
00:00:40 β†’ Request β†’ Counter: 3 β†’ ❌ REJECTED
00:00:50 β†’ Request β†’ Counter: 3 β†’ ❌ REJECTED
00:01:00 β†’ Counter resets to 0
00:01:05 β†’ Request β†’ Counter: 1 βœ…

Pros:

  • βœ… Very simple to implement
  • βœ… Memory efficient (one counter per window)
  • βœ… Easy to understand

Cons:

  • ❌ Burst at window boundary (major issue!)
    • At 00:00:59 β†’ 100 requests
    • At 00:01:00 β†’ 100 requests
    • Total: 200 requests in 2 seconds! (Double the limit)

Burst problem visualization:

Window 1: [______________________|********************]
          00:00                  00:00:59  (100 req in last 1 sec)

Window 2: [********************|______________________]
          00:01:00              00:01:59  (100 req in first 1 sec)

Total: 200 requests in 2 seconds! Should be 100 max.

When to use: Quick prototype, not production (use sliding window instead)

Redis implementation (simple):

def allow_request(user_id):
    key = f"rate_limit:{user_id}:{current_minute()}"
    count = redis.incr(key)
    redis.expire(key, 60)  # Expire after 1 minute
 
    if count <= 100:
        return True
    else:
        return False
 
def current_minute():
    return int(time.now() / 60)  # Window ID

Algorithm 4: Sliding Window Log πŸ“Š

How it works:

  • Store timestamp of each request
  • Count requests in sliding window (last N seconds)
  • Remove old timestamps outside window

Parameters:

  • Window size: Duration (e.g., 1 minute)
  • Request limit: Max per window (e.g., 100)

Example:

Limit: 3 requests per 60 seconds
Current time: 00:01:30

Request log:
00:00:35 β†’ Remove (too old, > 60 sec ago)
00:01:00 β†’ Keep (within 60 sec)
00:01:15 β†’ Keep
00:01:20 β†’ Keep
Count: 3 requests in last 60 seconds

00:01:30 β†’ New request β†’ Count: 4 β†’ ❌ REJECTED
00:02:00 β†’ New request β†’ Count: 2 (00:01:00 expired) β†’ βœ… ALLOWED

Pros:

  • βœ… Very accurate (no boundary burst problem)
  • βœ… True sliding window

Cons:

  • ❌ Memory expensive (store all timestamps)
    • 1M users Γ— 100 requests/min Γ— 8 bytes/timestamp = 800 MB
  • ❌ Slow (need to iterate all timestamps to count)

When to use: Need high accuracy, low request volume

Redis implementation:

def allow_request(user_id):
    key = f"rate_limit:{user_id}"
    now = time.now()
    window_start = now - 60  # 60 second window
 
    # Remove old timestamps
    redis.zremrangebyscore(key, 0, window_start)
 
    # Count requests in window
    count = redis.zcard(key)
 
    if count < 100:
        # Add current timestamp
        redis.zadd(key, {now: now})
        redis.expire(key, 60)
        return True
    else:
        return False

Algorithm 5: Sliding Window Counter 🎯 (Best Balance!)

How it works:

  • Hybrid of fixed window + sliding window
  • Approximate sliding window using two fixed windows
  • Weighted average based on time in current window

Formula:

Requests in sliding window =
  Requests in current window +
  Requests in previous window Γ— Overlap percentage

Example:

Limit: 100 requests per minute
Current time: 00:01:30 (30 seconds into current window)

Previous window (00:00 - 00:01): 80 requests
Current window (00:01 - 00:02): 40 requests

Overlap: 30 seconds into current minute
Overlap percentage: 30/60 = 50%

Estimated requests in sliding window:
= 40 (current) + 80 (previous) Γ— 50%
= 40 + 40
= 80 requests

New request β†’ 81 ≀ 100 β†’ βœ… ALLOWED

Visual:

Previous window:  [β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ80 reqβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ]
Current window:               [β–ˆβ–ˆβ–ˆβ–ˆ40 reqβ–ˆβ–ˆβ–ˆβ–ˆ|________]
                              00:01:00        00:01:30

Sliding window:               [←————— 60 sec β€”β€”β€”β€”β†’]
                              Uses 50% of prev + 100% of current

Pros:

  • βœ… Good accuracy (better than fixed window)
  • βœ… Memory efficient (only 2 counters)
  • βœ… No burst problem
  • βœ… Used by Cloudflare, Azure

Cons:

  • ❌ Approximation (not perfectly accurate)
  • ❌ Assumes even distribution in previous window

When to use: Production systems! Best balance of accuracy and efficiency.

Redis implementation:

def allow_request(user_id):
    now = time.now()
    current_window = int(now / 60)
    previous_window = current_window - 1
    elapsed_time_in_current_window = now % 60
 
    current_key = f"rate_limit:{user_id}:{current_window}"
    previous_key = f"rate_limit:{user_id}:{previous_window}"
 
    current_count = redis.get(current_key) or 0
    previous_count = redis.get(previous_key) or 0
 
    # Calculate weighted count
    overlap_percentage = (60 - elapsed_time_in_current_window) / 60
    estimated_count = current_count + previous_count * overlap_percentage
 
    if estimated_count < 100:
        redis.incr(current_key)
        redis.expire(current_key, 120)  # Keep for 2 windows
        return True
    else:
        return False

Algorithm Comparison

AlgorithmProsConsMemoryAccuracyUse Case
Token BucketSimple, allows burst, industry standardTuning trickyLowGoodDefault choice
Leaky BucketSmooth trafficQueues requests, complexLowGoodTraffic shaping
Fixed WindowVery simpleBurst at boundaryVery LowPoorPrototypes only
Sliding LogVery accurateMemory expensive, slowHighExcellentLow volume, need accuracy
Sliding WindowGood accuracy, efficientApproximationLowVery GoodProduction systems

Recommendations:

  • Start with: Token Bucket (simple, industry standard)
  • For production: Sliding Window Counter (best balance)
  • Avoid: Fixed Window (burst problem)
  • For low volume: Sliding Log (if need perfect accuracy)

Step 4: Deep Dive - Distributed Rate Limiting

Challenge: Race Conditions

Problem: Multiple servers incrementing counter simultaneously

Server 1 reads count: 99
Server 2 reads count: 99
Server 1 increments: 100 βœ…
Server 2 increments: 100 βœ…
Actual count: 101 (should be rate limited!)

Solutions:

1. Redis INCR (Atomic):

count = redis.incr(key)  # Atomic operation
if count <= 100:
    return True
  • βœ… Atomic, no race condition
  • βœ… Fast

2. Lua Script in Redis (for complex logic):

-- Lua script runs atomically in Redis
local current = redis.call('get', KEYS[1])
if current and tonumber(current) >= 100 then
    return 0  -- Rate limited
else
    redis.call('incr', KEYS[1])
    redis.call('expire', KEYS[1], 60)
    return 1  -- Allowed
end

3. Locks (avoid if possible):

with redis.lock(f"lock:{user_id}"):
    count = redis.get(key)
    if count < 100:
        redis.incr(key)
        return True
  • ❌ Slow (lock contention)
  • ❌ Deadlock risk

Recommendation: Use Redis atomic operations (INCR, Lua scripts)


Challenge: Synchronization Across Regions

Problem: Multi-region deployment, users routed to different regions

US Region: User has 50 requests
EU Region: User has 50 requests
Total: 100 requests (limit reached)
But each region thinks user has 50 (under limit)

Solutions:

1. Centralized Redis Cluster (Strict consistency):

  • All regions talk to same Redis cluster
  • ❌ Higher latency (cross-region)
  • βœ… Accurate

2. Eventual Consistency (Relaxed):

  • Each region has own Redis
  • Periodically sync counts
  • βœ… Low latency
  • ❌ Less accurate (OK for most cases)

3. Sticky Sessions (Route same user to same region):

  • GeoDNS routes user to nearest region
  • βœ… Simple
  • ❌ Imbalanced load

Recommendation:

  • For strict enforcement: Centralized Redis (financial APIs)
  • For general APIs: Eventual consistency (good enough)

Challenge: Performance & Scalability

Bottleneck: Redis can become bottleneck at very high scale

Solutions:

1. Redis Cluster (Sharding):

  • Partition users across Redis shards
  • Hash(user_id) β†’ Redis shard
  • βœ… Scales horizontally

2. Local Rate Limiting + Global Rate Limiting:

# Fast local check (in-memory)
if local_rate_limiter.allow(user_id):
    # Slower global check (Redis)
    if global_rate_limiter.allow(user_id):
        return True
  • βœ… Very fast (local cache)
  • ⚠️ Less accurate (OK for high throughput)

3. Distributed Rate Limiting Libraries:

  • Use existing solutions (DoorKeeper, Resilience4j)
  • Handle complexities for you

Rate Limiting Rules & Configuration

Rule Storage

Option 1: Configuration File (Simple):

rules:
  - user_id: "123"
    limit: 1000
    window: "1d"
  - tier: "free"
    limit: 100
    window: "1h"
  - tier: "premium"
    limit: 10000
    window: "1h"
  - endpoint: "/api/search"
    limit: 10
    window: "1s"

Option 2: Database (Dynamic):

CREATE TABLE rate_limit_rules (
    rule_id INT PRIMARY KEY,
    entity_type VARCHAR(50),  -- 'user', 'ip', 'api_key'
    entity_id VARCHAR(255),
    endpoint VARCHAR(255),
    limit INT,
    window_seconds INT,
    created_at TIMESTAMP
);

Option 3: Rules Engine (Complex):

  • If user.tier == β€œpremium” β†’ 10000 req/hour
  • If endpoint == β€œ/login” β†’ 5 req/min
  • If ip.country == β€œsuspicious” β†’ 10 req/hour

Rule Priority

Multiple rules can apply to same request:

User ID rule: 1000 req/hour
IP rule: 100 req/hour
Endpoint rule: 10 req/sec

Which to apply? β†’ Most restrictive (10 req/sec)

Combining rules:

  • AND: Must satisfy all rules
  • OR: Must satisfy at least one
  • Most restrictive: Apply strictest limit

Handling Rate Limit Exceeded

Response Format

HTTP 429 Response:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1640000000
Retry-After: 60

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit. Please try again in 60 seconds."
}

Client Behavior

Best practices:

  1. Exponential backoff: Wait 1s, 2s, 4s, 8s…
  2. Respect Retry-After header: Wait specified time
  3. Monitor X-RateLimit-Remaining: Stop before hitting limit
  4. Queue non-urgent requests: Don’t spam API

Advanced Topics

1. Distributed Rate Limiting with Redis Cluster

Architecture:

API Gateway 1 ──┐
API Gateway 2 ──┼──→ Redis Cluster (Sharded)
API Gateway 3 β”€β”€β”˜     - Shard 1: Users A-H
                       - Shard 2: Users I-P
                       - Shard 3: Users Q-Z

Consistent hashing: Route same user to same shard

2. Rate Limiting by Multiple Dimensions

Example: Rate limit by (user + endpoint):

User 123 β†’ /api/search: 10 req/sec
User 123 β†’ /api/users: 100 req/min

Redis key design:

rate_limit:{user_id}:{endpoint}:{window}
rate_limit:123:/api/search:1640000000

3. Dynamic Rate Limiting

Adjust limits based on:

  • Time of day (higher during business hours)
  • User behavior (increase limit for good actors)
  • System load (decrease when overloaded)

Example:

def get_rate_limit(user_id, system_load):
    base_limit = 100
    if system_load > 0.9:  # 90% capacity
        return base_limit * 0.5  # Reduce by 50%
    return base_limit

4. Exemptions & Allowlists

Bypass rate limiting for:

  • Internal services
  • Health checks (monitoring)
  • Admin users
  • Trusted partners
if user.role == "admin":
    return True  # Bypass rate limiting

Design Summary

Final Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   API Gateway                       β”‚
β”‚                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚         Rate Limiter Middleware             β”‚    β”‚
β”‚  β”‚                                             β”‚    β”‚
β”‚  β”‚  1. Extract identifier (user/IP/API key)    β”‚    β”‚
β”‚  β”‚  2. Check Redis for current count           β”‚    β”‚
β”‚  β”‚  3. Apply rate limit algorithm              β”‚    β”‚
β”‚  β”‚  4. If allowed β†’ Forward to backend         β”‚    β”‚
β”‚  β”‚  5. If blocked β†’ Return HTTP 429            β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                       ↓                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        ↓
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚  Redis Cluster   β”‚
              β”‚  (Rate Counters) β”‚
              β”‚                  β”‚
              β”‚  Sliding Window  β”‚
              β”‚  or Token Bucket β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Decisions Summary

DecisionChoiceReasoning
AlgorithmToken Bucket or Sliding Window CounterBest balance of accuracy and efficiency
StorageRedisFast, atomic operations, TTL support
WhereAPI GatewayCentralized, don’t repeat in services
DistributedRedis Cluster + Atomic operationsHandle race conditions
RulesDatabase or Config fileFlexible, can update without deploy

Interview Questions & Answers

Q: How would you handle rate limiting for anonymous users?
A: Rate limit by IP address. But IP can be shared (NAT, VPN), so may need to be more lenient. Alternative: Use fingerprinting (browser, device) or require login for API access.

Q: What if Redis goes down?
A:

  • Fail open: Allow all requests (risk of abuse, but system stays up)
  • Fail closed: Reject all requests (secure, but bad user experience)
  • Fallback: Use local cache with eventual consistency
  • Best: Redis cluster with replication (high availability)

Q: How to prevent users from circumventing rate limits?
A:

  • Rate limit by multiple dimensions (user + IP + device)
  • Detect suspicious patterns (many IPs from one user)
  • Require authentication (can’t create infinite accounts)
  • Use CAPTCHA for anonymous endpoints
  • Monitor for abuse patterns

Q: Token Bucket vs Leaky Bucket?
A: Token Bucket allows burst traffic (use all tokens at once), Leaky Bucket enforces constant rate (smooth traffic). Token Bucket is more common because APIs naturally have bursty traffic.

Q: How to test rate limiter?
A:

  • Unit tests: Test algorithm logic
  • Integration tests: Test with Redis
  • Load tests: Simulate high traffic (millions of requests)
  • Chaos tests: Kill Redis, network partitions
  • Monitor: Track false positives/negatives in production

Key Takeaways

  1. Token Bucket is the most popular algorithm (AWS, Stripe use it)
  2. Sliding Window Counter gives best balance of accuracy and efficiency
  3. Implement at API Gateway for centralized rate limiting
  4. Use Redis for fast, atomic operations
  5. Handle race conditions with atomic operations (INCR, Lua scripts)
  6. Always return HTTP 429 with Retry-After header
  7. Monitor false positives and adjust limits


Practice this design! Very common interview question. Be ready to:

  1. Explain different algorithms and trade-offs
  2. Draw architecture diagram
  3. Handle distributed system challenges (race conditions, multi-region)
  4. Discuss Redis data structures and operations

Last Updated: 2026-04-09
Status: Common interview question - Must know!