Chapter 8: Design a URL Shortener

volume1 url-shortener hashing system-design

Status: 🟩 Interview ready - Very common question!
Difficulty: Easy-Medium
Time to complete: 40 min read + practice

Overview

A URL shortener converts a long URL into a short alias (e.g., https://www.example.com/very/long/path?query=value → https://tinyurl.com/abc1234). When users visit the short URL, they are redirected to the original long URL.

Real-world examples: TinyURL, bit.ly, t.co (Twitter), goo.gl (Google, deprecated)

Why this matters:

Very common interview question (Easy-Medium difficulty, great signal on depth)
Covers hashing, base encoding, database design, caching
Clean example of read-heavy system with write amplification concerns

Problem Statement

Design a URL shortener service that:

Accepts a long URL and returns a short URL (POST)
Redirects a short URL to the original long URL (GET)
Short URLs are short as possible, unique, and URL-safe
Handles high read traffic (read-heavy system)

API Endpoints:

POST /api/v1/data/shorten
  Request:  { "longUrl": "https://www.example.com/very/long/path?query=value" }
  Response: { "shortUrl": "https://tinyurl.com/abc1234" }

GET /{shortURL}
  Response: 301 or 302 redirect to longUrl

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

What is the expected scale? → 100 million URLs generated per day
How short should URLs be? → As short as possible (7 characters)
What characters are allowed? → [0-9, a-z, A-Z] = 62 characters
Can users update or delete URLs? → No (simplification)
What is the read:write ratio? → ~100:1 (read-heavy)

Scope:

Shorten long URLs → unique 7-character short URLs
Redirect short URLs to the original long URL
High availability and low latency for redirects

Non-Functional Requirements

High availability: Redirects must always work (core function)
Low latency: Redirect should be fast (< 50ms)
Scalability: 100M new URLs/day, 10 billion redirects/day
Durability: Short URLs must persist (not expire unless configured)
Uniqueness: No two long URLs share the same short URL

Back-of-the-Envelope Estimation

Write operations:

100 million URLs / day
= 100,000,000 / 86,400 seconds
= ~1,160 writes/second

Read operations (100:1 ratio):

1,160 writes/sec × 100 = ~116,000 reads/second
≈ 11,600 reads/second (rounding conservatively)

Storage:

100 million URLs/day × 365 days × 10 years
= 365 billion records over 10 years

Avg record size: ~500 bytes (URL up to 2048 chars, metadata)
365 billion × 500 bytes ≈ 182 TB over 10 years

URL character math:

Characters: [0-9] + [a-z] + [A-Z]
= 10 + 26 + 26 = 62 characters

7-character short URL:
62^7 = 3,521,614,606,208 ≈ 3.5 trillion unique URLs

At 100M/day: 3.5 trillion / 100M = 35,000 days ≈ ~95 years
More than enough!

Step 2: High-Level Design (10 min)

API Endpoints

URL Shortening (Write):

POST /api/v1/data/shorten
Content-Type: application/json

{
  "longUrl": "https://www.example.com/very/long/path?query=value"
}

Response 201 Created:
{
  "shortUrl": "https://tinyurl.com/abc1234"
}

URL Redirecting (Read):

GET /abc1234

Response: 301 Moved Permanently
Location: https://www.example.com/very/long/path?query=value

301 vs 302 Redirect

This is a key design decision that interviewers love to ask about!

Aspect	301 Permanent	302 Temporary
Browser caching	Yes (caches redirect, no future requests)	No (browser always asks server)
Server load	Lower (after first visit, browser goes direct)	Higher (every visit hits server)
Analytics	Cannot track repeat visits	Can track every click
Use case	Reduce server load	Track analytics

Which to use?

Use 301 if reducing server load is priority (most URL shorteners)
Use 302 if tracking every click matters (marketing analytics, A/B testing)

Visual:

301 Permanent Redirect:
  Client → Short URL Server → "301, go to longURL"
  Client → Long URL (direct, browser cached, no more short URL server!)

302 Temporary Redirect:
  Client → Short URL Server → "302, go to longURL"
  Client → Long URL
  Client → Short URL Server → "302, go to longURL" (again, every time!)
  Client → Long URL

Data Model

URL table:

CREATE TABLE url_mapping (
    id          BIGINT       PRIMARY KEY AUTO_INCREMENT,
    short_url   VARCHAR(7)   NOT NULL UNIQUE,
    long_url    VARCHAR(2048) NOT NULL,
    created_at  TIMESTAMP    DEFAULT CURRENT_TIMESTAMP,
    user_id     BIGINT       -- nullable, for logged-in users
);
 
-- Index for fast short_url lookup (redirect path)
CREATE INDEX idx_short_url ON url_mapping(short_url);

Why relational DB?:

Data is structured (fixed schema)
Need strong consistency (no two records with same short URL)
Supports ACID transactions
MySQL/PostgreSQL at scale with read replicas

Hash Function Overview

Goal: Map long URL → 7-character hash value

hash(longURL) = hashValue  (7-character alphanumeric string)

Then look up in DB:

GET /{shortURL}:
  1. shortURL → DB lookup → longURL
  2. Return redirect to longURL

Basic Architecture

                          ┌───────────────────────────┐
                          │        Load Balancer       │
                          └───────────┬───────────────┘
                                      │
                    ┌─────────────────┼─────────────────┐
                    ↓                 ↓                  ↓
             ┌──────────┐     ┌──────────┐      ┌──────────┐
             │ API Srv 1│     │ API Srv 2│      │ API Srv 3│
             └────┬─────┘     └────┬─────┘      └────┬─────┘
                  │               │                   │
             ┌────↓───────────────↓───────────────────↓────┐
             │                  Cache                       │
             │              (Redis Cluster)                 │
             │         shortURL → longURL mapping           │
             └────────────────────┬───────────────────────┘
                                  │
                     ┌────────────↓────────────┐
                     │        Database         │
                     │    (MySQL/PostgreSQL)    │
                     │   Primary + Read Replicas│
                     └─────────────────────────┘

Step 3: Deep Dive (20 min)

Approach 1: Hash + Collision Detection

Algorithm:

Compute hash = MD5(longURL) or SHA-256(longURL)
Take first 7 characters of the hash hex string
Check if this 7-char string already exists in DB
If collision (same short URL exists for different long URL) → append a salt and rehash
Repeat until unique

MD5 example:

Input:  "https://www.example.com/very/long/path?query=value"
MD5:    "21232f297a57a5a743894a0e4a801fc3"
Take 7: "21232f2"  ← shortURL

Collision handling:

def generate_short_url(long_url):
    hash_input = long_url
    while True:
        short = md5(hash_input)[:7]
        if not db.exists(short):
            return short
        # Collision! Append suffix and retry
        hash_input = long_url + str(random.random())

Problems with this approach:

❌ Multiple DB lookups per write (to check collision)
❌ Collision check adds latency
❌ Same long URL hashed twice → two different short URLs (if salt used)

Bloom filter optimization:

Use a Bloom filter to check if short URL exists before DB query.
- Bloom filter: Fast probabilistic data structure (false positives OK, no false negatives)
- Check Bloom filter first → only query DB if possibly exists
- Reduces DB load significantly

Approach 2: Base 62 Encoding of Auto-Increment ID

Algorithm:

Insert the long URL into DB → get auto-increment ID (e.g., 2009215674938)
Convert ID to base-62 string
Pad or trim to 7 characters
That 7-character string is the short URL

Base-62 alphabet:

0-9  → digits 0-9
a-z  → digits 10-35
A-Z  → digits 36-61

Base-62 conversion example:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
 
def to_base62(num):
    if num == 0:
        return ALPHABET[0]
    result = []
    while num > 0:
        result.append(ALPHABET[num % 62])
        num //= 62
    return ''.join(reversed(result))
 
# Example:
to_base62(11157) → "2TX"
to_base62(2009215674938) → "0xtVTT" (6-7 chars for large IDs)

Flow:

POST /api/v1/data/shorten
  1. Insert longURL into DB → id = 11157
  2. base62(11157) = "2TX"
  3. Short URL = "https://tinyurl.com/2TX"
  4. Update DB record with shortURL = "2TX"
  5. Return short URL to client

ID uniqueness guarantee:

Auto-increment IDs are globally unique
No collision check needed!
But: exposes sequential pattern (ID 1, 2, 3… → a, b, c…)

Security concern: Sequential IDs are guessable. Solutions:

Start auto-increment at a large random number
Add a random salt to the ID before encoding
Use a separate ID generator service (Snowflake-style IDs)

Hash vs Base62 Comparison

Aspect	Hash + Collision Detection	Base62 Encoding
URL length	Fixed 7 chars	Variable (grows with ID)
Uniqueness	Must check for collisions	Guaranteed (auto-increment)
Predictability	Not predictable	Sequential (guessable)
Single server	Works fine	OK but ID conflicts at scale
Distributed	Easy (no shared state)	Needs coordination (global ID)
Same long URL	Same short URL (deterministic)	Different short URL each time
Collision risk	Yes (need handling logic)	No collisions
DB writes	Multiple (collision check)	One write per URL
Recommended	Good for distributed	Good for single-region

Which to choose?

Base62 is simpler and collision-free → preferred for interviews
Hash works well in fully distributed scenarios (no coordination needed)

URL Redirecting Flow (Read Path)

Client                Load Balancer       API Server        Cache         DB
  |                        |                  |               |            |
  |  GET /abc1234          |                  |               |            |
  |─────────────────────→  |                  |               |            |
  |                        |  GET /abc1234    |               |            |
  |                        |─────────────────→|               |            |
  |                        |                  | Cache hit?    |            |
  |                        |                  |──────────────→|            |
  |                        |                  | longURL       |            |
  |                        |                  |←──────────────|            |
  |                        |                  |   (if miss)   |            |
  |                        |                  |               | DB lookup  |
  |                        |                  |───────────────────────────→|
  |                        |                  | longURL       |            |
  |                        |                  |←───────────────────────────|
  |                        |                  | write to cache|            |
  |                        |                  |──────────────→|            |
  |  301/302 + Location    |                  |               |            |
  |←───────────────────────────────────────── |               |            |

Cache strategy:

Cache: Redis
Key:   shortURL (e.g., "abc1234")
Value: longURL  (e.g., "https://www.example.com/...")
TTL:   24 hours (or no TTL if URLs never expire)
Eviction: LRU (Least Recently Used)
Size:  Cache top 20% of URLs (handles 80% of traffic)

Cache sizing:

11,600 reads/sec × 86,400 seconds = ~1 billion reads/day
Top 20% of URLs handle 80% of traffic (Pareto principle)
1B reads × 20% = 200M unique short URLs to cache
200M × 100 bytes/entry = 20 GB cache (manageable!)

URL Shortening Flow (Write Path)

With Base62 approach:

1. Client sends POST /api/v1/data/shorten { longUrl: "..." }
2. API server checks if longUrl already in DB (dedup)
   - If yes: return existing shortUrl (same long URL → same short URL)
   - If no: proceed
3. Generate new ID (auto-increment or Snowflake)
4. Convert ID to base62 → shortUrl
5. Save { id, shortUrl, longUrl, createdAt } to DB
6. Write shortUrl → longUrl to cache
7. Return shortUrl to client

Deduplication:

-- Check if longUrl already exists
SELECT short_url FROM url_mapping WHERE long_url = ?;
 
-- If not found, insert new
INSERT INTO url_mapping (short_url, long_url) VALUES (?, ?);

Rate Limiting to Prevent Abuse

Without rate limiting, a single client could generate millions of URLs:

Rate Limit Rules:
- Unauthenticated: 10 URL shortenings per hour per IP
- Free tier: 100 URL shortenings per day per user
- Premium tier: 10,000 URL shortenings per day per user
- Redirect reads: 10,000 per minute per IP (prevent scraping)

Response when exceeded:
HTTP 429 Too Many Requests
Retry-After: 3600

Analytics (Optional Feature)

Track clicks per short URL for marketing use cases:

CREATE TABLE click_events (
    id          BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_url   VARCHAR(7) NOT NULL,
    clicked_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    ip_address  VARCHAR(45),
    country     VARCHAR(2),
    user_agent  VARCHAR(512),
    referrer    VARCHAR(2048)
);

Write path with analytics:

302 redirect (not 301) so every click goes through server
→ Async: write click event to Kafka/message queue
→ Analytics service consumes from queue → aggregates → stores in data warehouse
→ Dashboard shows: total clicks, geographic breakdown, referrer sources

Design Summary

Final Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Clients (Browser / App)                  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                    ┌────────────↓────────────┐
                    │       Load Balancer      │
                    │  (Route by path prefix)  │
                    └────────────┬────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              ↓                  ↓                   ↓
        ┌───────────┐     ┌───────────┐      ┌───────────┐
        │ API Srv 1 │     │ API Srv 2 │      │ API Srv 3 │
        │(stateless)│     │(stateless)│      │(stateless)│
        └─────┬─────┘     └─────┬─────┘      └─────┬─────┘
              │                 │                   │
              └─────────────────┼───────────────────┘
                                │
                   ┌────────────↓────────────┐
                   │       Redis Cache        │
                   │  shortURL → longURL      │
                   │   (20 GB, LRU, 24h TTL) │
                   └────────────┬────────────┘
                                │ cache miss
                   ┌────────────↓────────────┐
                   │         MySQL            │
                   │  Primary (Write)         │
                   │  Read Replicas (Read)    │
                   │  ~182 TB over 10 years  │
                   └─────────────────────────┘

Key Decisions Summary

Decision	Choice	Reasoning
Short URL length	7 characters	62^7 = 3.5 trillion, enough for 95+ years
Hash algorithm	Base62 of auto-increment ID	No collision, simple, deterministic
Redirect type	301 (default) / 302 (analytics)	301 reduces server load; 302 tracks clicks
Cache	Redis with LRU	Read-heavy: 100:1 ratio, cache top URLs
Database	MySQL with read replicas	Structured data, needs ACID, high read scale
Deduplication	Check longUrl before insert	Same long URL returns same short URL
Rate limiting	Per-user tier limits	Prevent abuse, API gateway enforcement

Interview Questions & Answers

Q: Why 301 over 302 redirect for a URL shortener?
A: 301 (Permanent) tells browsers to cache the redirect, so subsequent visits to the same short URL go directly to the destination without hitting our server. This dramatically reduces server load for popular links. Use 302 (Temporary) only when you need to track every single click (e.g., marketing analytics), since 302 forces the browser to ask our server every time.

Q: How do you handle hash collisions in the hash-based approach?
A: On each hash, check if the short URL already exists in the DB (or Bloom filter for speed). If it exists and maps to a different long URL, append a random salt to the long URL and rehash. Repeat until unique. To avoid multiple DB queries, use a Bloom filter: fast probabilistic lookup, false positives are OK (just do a DB check), no false negatives.

Q: What if two users submit the same long URL at the same time?
A: In the base62 approach, they’ll get two different short URLs (different auto-increment IDs). To deduplicate, first query the DB for the long URL. If found, return the existing short URL. Use a database unique index on long_url and handle insert conflicts. Alternatively, accept two short URLs mapping to the same long URL (simpler, extra storage is minor).

Q: How do you scale the write path at 1,160 writes/second?
A: Single MySQL primary handles ~5,000-10,000 writes/sec comfortably. At 1,160 writes/sec, one primary is sufficient. For future scale: shard by user ID or first character of shortURL. Use connection pooling. Buffer writes with a write-through cache. If auto-increment ID coordination becomes a bottleneck, use a distributed ID generator (Snowflake, UUID-to-base62).

Q: How do you protect against URL shortener abuse (spam, malware)?
A: (1) Rate limiting per IP and user account. (2) URL validation: check that the long URL is reachable (optional). (3) Blacklist known malicious domains. (4) Integrate Google Safe Browsing API to scan submitted URLs. (5) CAPTCHA for anonymous submissions. (6) Flag and review reported URLs.

Key Takeaways

62^7 = 3.5 trillion — 7 characters with base-62 encoding supports ~95 years of URLs at 100M/day
Base62 encoding of auto-increment IDs is simpler and collision-free vs hash + collision detection
301 vs 302 is a key trade-off: 301 reduces server load (browser caches), 302 enables analytics (every click tracked)
Cache is critical — read-heavy (100:1 ratio) makes Redis cache the most impactful optimization; top 20% URLs serve 80% of traffic
Deduplication — always check if long URL already exists before creating a new short URL
Read replicas — scale the read path (11,600 reads/sec) without overloading primary DB
Rate limiting — essential to prevent abuse; tier-based limits (free vs premium) are industry standard

distributed-system-components > Cache - Redis caching strategy
key-patterns > Hashing - Base62 encoding and hash functions
ch04-rate-limiter - Rate limiting to prevent abuse
ch03-framework-for-system-design - Apply framework to this problem

Practice this design! Very common interview question. Be ready to:

Walk through the base62 encoding step-by-step
Explain 301 vs 302 trade-offs clearly
Draw the full architecture with cache and read replicas
Discuss how to scale beyond a single server

Last Updated: 2026-04-13
Status: Very common interview question - Must know!

Study Notes by Niladri & AI

Explorer

ch08-url-shortener

Chapter 8: Design a URL Shortener

Overview

Problem Statement

Step 1: Requirements & Scope (5 min)

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Step 2: High-Level Design (10 min)

API Endpoints

301 vs 302 Redirect

Data Model

Hash Function Overview

Basic Architecture

Step 3: Deep Dive (20 min)

Approach 1: Hash + Collision Detection

Approach 2: Base 62 Encoding of Auto-Increment ID

Hash vs Base62 Comparison

URL Redirecting Flow (Read Path)

URL Shortening Flow (Write Path)

Rate Limiting to Prevent Abuse

Analytics (Optional Feature)

Design Summary

Final Architecture

Key Decisions Summary

Interview Questions & Answers

Key Takeaways

Graph View

Table of Contents

Backlinks

Study Notes by Niladri & AI

Explorer

ch08-url-shortener

Chapter 8: Design a URL Shortener

Overview

Problem Statement

Step 1: Requirements & Scope (5 min)

Functional Requirements

Non-Functional Requirements

Back-of-the-Envelope Estimation

Step 2: High-Level Design (10 min)

API Endpoints

301 vs 302 Redirect

Data Model

Hash Function Overview

Basic Architecture

Step 3: Deep Dive (20 min)

Approach 1: Hash + Collision Detection

Approach 2: Base 62 Encoding of Auto-Increment ID

Hash vs Base62 Comparison

URL Redirecting Flow (Read Path)

URL Shortening Flow (Write Path)

Rate Limiting to Prevent Abuse

Analytics (Optional Feature)

Design Summary

Final Architecture

Key Decisions Summary

Interview Questions & Answers

Key Takeaways

Related Resources

Graph View

Table of Contents

Backlinks