Chapter 4 Flashcards - Rate Limiter

flashcards volume1 rate-limiter throttling

What is a rate limiter and why do we need it?
?
A rate limiter controls the rate of traffic sent by a client or service, limiting requests within a time window (e.g., 100 req/min). Need it to: Prevent DoS attacks, prevent resource exhaustion, reduce costs (third-party API charges), prevent abuse (spam, brute force attacks).

Where should you implement rate limiting: client-side, server-side, or API Gateway?
?
API Gateway (best!): Centralized, don’t repeat logic in every service, standard in production (AWS API Gateway, Kong). Server-side: Full control but adds logic to each service. Client-side: Can be bypassed, only as courtesy not security. Use API Gateway for rate limiting!

What are the 5 main rate limiting algorithms?
?

  1. Token Bucket (most popular), 2. Leaky Bucket (smooth traffic), 3. Fixed Window Counter (simple but flawed), 4. Sliding Window Log (accurate but expensive), 5. Sliding Window Counter (best balance). Know when to use each!

Explain Token Bucket algorithm.
?
Bucket has capacity (max tokens), tokens refill at constant rate, each request consumes 1 token, if no tokens then reject. Parameters: Capacity (e.g., 100) and refill rate (e.g., 10/sec). Allows burst traffic (use all tokens at once). Used by AWS, Stripe. Best for most cases!

What are the pros and cons of Token Bucket?
?
Pros: Allows burst traffic (natural for APIs), memory efficient (only store bucket state), simple to implement, industry standard (AWS, Stripe). Cons: Tuning parameters tricky (capacity vs refill rate), may allow more requests than limit during burst. Still the default choice!

Explain Leaky Bucket algorithm.
?
Requests enter queue (bucket), process at constant rate (leak), if queue full then reject. Parameters: Queue size and outflow rate. Smooths traffic to constant rate. Good for preventing downstream overload but queues burst traffic causing higher latency.

Token Bucket vs Leaky Bucket - key difference?
?
Token Bucket: Allows burst traffic, variable outflow, simpler. Leaky Bucket: Queues/rejects burst, constant outflow, more complex with queue management. Use case: Token Bucket for API rate limiting (most common), Leaky Bucket for traffic shaping (need constant rate).

Explain Fixed Window Counter algorithm.
?
Divide time into fixed windows, count requests in current window, reset counter at boundary. Simple but has MAJOR FLAW: Burst at window boundary! Example: 100 req at 00:00:59 + 100 req at 00:01:00 = 200 req in 2 seconds (double the limit!). Don’t use in production!

What is the burst problem in Fixed Window Counter?
?
At window boundary, can get double the limit! If limit is 100 req/min: Get 100 requests at end of minute 1 (00:00:59), get 100 requests at start of minute 2 (00:01:00), total 200 requests in 2 seconds when limit should be 100. This is why Fixed Window is not used in production.

Explain Sliding Window Log algorithm.
?
Store timestamp of each request, count requests in last N seconds (sliding window), remove old timestamps. Very accurate, true sliding window, no burst problem. But memory expensive (store all timestamps: 1M users × 100 req/min × 8 bytes = 800 MB) and slow (iterate all timestamps).

Explain Sliding Window Counter algorithm.
?
Hybrid of fixed and sliding window. Use two fixed windows with weighted average based on time in current window. Formula: Current window requests + Previous window requests × Overlap percentage. Example: 40 current + 80 previous × 50% = 80 total. Memory efficient (2 counters), good accuracy, used by Cloudflare!

Which rate limiting algorithm should you use in production?
?
Start with Token Bucket (simple, industry standard, allows burst). For production at scale: Sliding Window Counter (best balance of accuracy and efficiency, used by Cloudflare/Azure). Avoid Fixed Window (burst problem). Use Sliding Log only if low volume and need perfect accuracy.

Compare the 5 algorithms by memory usage and accuracy.
?
Token Bucket: Low memory, good accuracy. Leaky Bucket: Low memory, good accuracy. Fixed Window: Very low memory, poor accuracy (burst problem). Sliding Log: High memory (stores all timestamps), excellent accuracy. Sliding Window Counter: Low memory (2 counters), very good accuracy. Best: Sliding Window Counter!

Why use Redis for rate limiting?
?
In-memory so fast (< 1ms), supports atomic operations (INCR prevents race conditions), supports TTL (automatic expiration), shared across all servers (distributed), industry standard. Alternatives: Memcached (no TTL support), local cache (not distributed).

What is the race condition problem in distributed rate limiting?
?
Multiple servers read count simultaneously. Server 1 reads 99, Server 2 reads 99, both increment to 100, actual count becomes 101 (over limit!). Solution: Use Redis atomic operations (INCR is atomic), or Lua scripts (run atomically in Redis), avoid locks (slow, deadlock risk).

How do you handle rate limiting across multiple regions?
?
Option 1 - Centralized Redis: All regions use same Redis cluster. Pro: Accurate. Con: Higher latency (cross-region). Option 2 - Eventual Consistency: Each region has own Redis, periodically sync. Pro: Low latency. Con: Less accurate (OK for most cases). Option 3 - Sticky Sessions: Route same user to same region. Choose based on accuracy vs latency needs.

What HTTP status code and headers should you return when rate limited?
?
Status: HTTP 429 Too Many Requests. Headers: X-RateLimit-Limit (max allowed), X-RateLimit-Remaining (requests left), X-RateLimit-Reset (when limit resets, Unix timestamp), Retry-After (seconds until can retry). Body should explain error and when to retry.

How do you store and manage rate limit rules?
?
Option 1 - Config file: Simple YAML/JSON. Option 2 - Database: Dynamic, can update without deploy. Store: entity type (user/IP/API key), entity ID, endpoint, limit, window duration. Support multiple dimensions (user + endpoint), rule priority (most restrictive wins), exemptions (admin users bypass).

What are the key decisions in rate limiter design?
?
Algorithm: Token Bucket or Sliding Window Counter (best balance). Storage: Redis (fast, atomic, TTL). Location: API Gateway (centralized). Distributed: Redis Cluster with atomic operations (handle race conditions). Rules: Database or config file (flexible). Failover: Fail open (allow requests) vs fail closed (reject).

How would you rate limit anonymous users?
?
Rate limit by IP address. But IP can be shared (NAT, VPN, offices) so may need to be more lenient. Alternatives: Device fingerprinting (browser, OS, screen size), require authentication for API access, use CAPTCHA for suspicious traffic, combine multiple signals (IP + user agent + fingerprint).

What if Redis goes down (failover strategy)?
?
Fail open: Allow all requests (risk of abuse but system stays up). Fail closed: Reject all requests (secure but bad UX). Fallback: Use local cache with eventual consistency. Best solution: Redis cluster with replication (high availability), monitor Redis health, automatic failover to standby.

How do you prevent users from circumventing rate limits?
?
Rate limit by multiple dimensions (user + IP + device fingerprint), detect suspicious patterns (many IPs from one user = distributed attack), require authentication (can’t create infinite free accounts), use CAPTCHA for anonymous endpoints, monitor and block abuse patterns, implement exponential backoff for repeated violations.

How do you implement Token Bucket in Redis?
?
Store bucket state (tokens, last_refill_time) in Redis. On each request: 1) Get bucket from Redis, 2) Calculate tokens to refill based on elapsed time (elapsed × refill_rate), 3) Add refilled tokens (capped at capacity), 4) Try to consume 1 token, 5) If token available, decrement and allow request, 6) Save updated bucket to Redis. Use Redis INCR for atomic operations.

How do you implement Sliding Window Counter in Redis?
?
Store counter for current and previous window. On request: 1) Calculate current window ID (timestamp / window_size), 2) Get counts for current and previous windows from Redis, 3) Calculate overlap percentage (time into current window / window_size), 4) Estimate total = current + previous × overlap%, 5) If under limit, increment current counter, 6) Use Redis INCR (atomic) and set TTL for auto-cleanup.

What is the formula for Sliding Window Counter?
?
Estimated requests = Requests in current window + Requests in previous window × Overlap percentage. Overlap percentage = (Time elapsed in current window) / (Window size). Example: 30 seconds into 60-second window = 50% overlap. If current=40, previous=80: Estimate = 40 + 80×0.5 = 80 requests.

How do you rate limit by multiple dimensions?
?
Create composite keys in Redis. Examples: rate_limit:user:123 (per user), rate_limit:ip:192.168.1.1 (per IP), rate_limit:user:123:endpoint:/api/search (per user per endpoint). Apply multiple rules, use most restrictive. Can also combine AND/OR logic: Must satisfy all rules (AND) or at least one rule (OR).

What is dynamic rate limiting?
?
Adjust rate limits based on conditions. Examples: Increase limit during business hours, decrease during high system load (90% capacity → reduce by 50%), increase for good actors (users with no violations), decrease for suspicious behavior. Implementation: Calculate limit dynamically based on current conditions before checking Redis.

How should clients handle rate limiting (best practices)?
?

  1. Exponential backoff: Wait 1s, 2s, 4s, 8s between retries. 2. Respect Retry-After header (don’t spam). 3. Monitor X-RateLimit-Remaining (stop before hitting limit). 4. Queue non-urgent requests (don’t send all at once). 5. Implement circuit breaker (stop after repeated 429s). 6. Handle 429 gracefully (show user-friendly message).

What are exemptions and allowlists in rate limiting?
?
Bypass rate limiting for certain users/services. Examples: Internal services (service-to-service calls), health checks (monitoring tools), admin users, trusted partners/integrations, verified users with premium accounts. Implementation: Check user role before rate limit check, return early if exempt, still log for auditing.

How do you test a rate limiter?
?
Unit tests: Test algorithm logic (token bucket refill, window calculations). Integration tests: Test with Redis, verify atomic operations. Load tests: Simulate millions of requests, verify accuracy. Chaos tests: Kill Redis, network partitions, verify failover. Monitor in production: Track false positives (legitimate requests blocked), false negatives (abusive requests allowed), adjust thresholds.

What are common use cases for rate limiting?
?
Security: Prevent brute force attacks (5 login attempts/min), prevent DoS/DDoS. Cost control: Limit calls to paid third-party APIs. Resource protection: Prevent database overload, prevent cache stampede. Fairness: Ensure all users get fair share of resources. Tiered service: Free tier (100 req/hour), premium tier (10K req/hour).

Total Cards: 30
Review Time: 20-25 minutes
Priority: HIGH - Very common interview question!
Last Updated: 2026-04-09