Chapter 8: Design a URL Shortener
volume1 url-shortener hashing system-design
Status: π© Interview ready - Very common question!
Difficulty: Easy-Medium
Time to complete: 40 min read + practice
Overview
A URL shortener converts a long URL into a short alias (e.g., https://www.example.com/very/long/path?query=value β https://tinyurl.com/abc1234). When users visit the short URL, they are redirected to the original long URL.
Real-world examples: TinyURL, bit.ly, t.co (Twitter), goo.gl (Google, deprecated)
Why this matters:
- Very common interview question (Easy-Medium difficulty, great signal on depth)
- Covers hashing, base encoding, database design, caching
- Clean example of read-heavy system with write amplification concerns
Problem Statement
Design a URL shortener service that:
- Accepts a long URL and returns a short URL (POST)
- Redirects a short URL to the original long URL (GET)
- Short URLs are short as possible, unique, and URL-safe
- Handles high read traffic (read-heavy system)
API Endpoints:
POST /api/v1/data/shorten
Request: { "longUrl": "https://www.example.com/very/long/path?query=value" }
Response: { "shortUrl": "https://tinyurl.com/abc1234" }
GET /{shortURL}
Response: 301 or 302 redirect to longUrl
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- What is the expected scale? β 100 million URLs generated per day
- How short should URLs be? β As short as possible (7 characters)
- What characters are allowed? β [0-9, a-z, A-Z] = 62 characters
- Can users update or delete URLs? β No (simplification)
- What is the read:write ratio? β ~100:1 (read-heavy)
Scope:
- Shorten long URLs β unique 7-character short URLs
- Redirect short URLs to the original long URL
- High availability and low latency for redirects
Non-Functional Requirements
- High availability: Redirects must always work (core function)
- Low latency: Redirect should be fast (< 50ms)
- Scalability: 100M new URLs/day, 10 billion redirects/day
- Durability: Short URLs must persist (not expire unless configured)
- Uniqueness: No two long URLs share the same short URL
Back-of-the-Envelope Estimation
Write operations:
100 million URLs / day
= 100,000,000 / 86,400 seconds
= ~1,160 writes/second
Read operations (100:1 ratio):
1,160 writes/sec Γ 100 = ~116,000 reads/second
β 11,600 reads/second (rounding conservatively)
Storage:
100 million URLs/day Γ 365 days Γ 10 years
= 365 billion records over 10 years
Avg record size: ~500 bytes (URL up to 2048 chars, metadata)
365 billion Γ 500 bytes β 182 TB over 10 years
URL character math:
Characters: [0-9] + [a-z] + [A-Z]
= 10 + 26 + 26 = 62 characters
7-character short URL:
62^7 = 3,521,614,606,208 β 3.5 trillion unique URLs
At 100M/day: 3.5 trillion / 100M = 35,000 days β ~95 years
More than enough!
Step 2: High-Level Design (10 min)
API Endpoints
URL Shortening (Write):
POST /api/v1/data/shorten
Content-Type: application/json
{
"longUrl": "https://www.example.com/very/long/path?query=value"
}
Response 201 Created:
{
"shortUrl": "https://tinyurl.com/abc1234"
}
URL Redirecting (Read):
GET /abc1234
Response: 301 Moved Permanently
Location: https://www.example.com/very/long/path?query=value
301 vs 302 Redirect
This is a key design decision that interviewers love to ask about!
| Aspect | 301 Permanent | 302 Temporary |
|---|---|---|
| Browser caching | Yes (caches redirect, no future requests) | No (browser always asks server) |
| Server load | Lower (after first visit, browser goes direct) | Higher (every visit hits server) |
| Analytics | Cannot track repeat visits | Can track every click |
| Use case | Reduce server load | Track analytics |
Which to use?
- Use 301 if reducing server load is priority (most URL shorteners)
- Use 302 if tracking every click matters (marketing analytics, A/B testing)
Visual:
301 Permanent Redirect:
Client β Short URL Server β "301, go to longURL"
Client β Long URL (direct, browser cached, no more short URL server!)
302 Temporary Redirect:
Client β Short URL Server β "302, go to longURL"
Client β Long URL
Client β Short URL Server β "302, go to longURL" (again, every time!)
Client β Long URL
Data Model
URL table:
CREATE TABLE url_mapping (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
short_url VARCHAR(7) NOT NULL UNIQUE,
long_url VARCHAR(2048) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
user_id BIGINT -- nullable, for logged-in users
);
-- Index for fast short_url lookup (redirect path)
CREATE INDEX idx_short_url ON url_mapping(short_url);Why relational DB?:
- Data is structured (fixed schema)
- Need strong consistency (no two records with same short URL)
- Supports ACID transactions
- MySQL/PostgreSQL at scale with read replicas
Hash Function Overview
Goal: Map long URL β 7-character hash value
hash(longURL) = hashValue (7-character alphanumeric string)
Then look up in DB:
GET /{shortURL}:
1. shortURL β DB lookup β longURL
2. Return redirect to longURL
Basic Architecture
βββββββββββββββββββββββββββββ
β Load Balancer β
βββββββββββββ¬ββββββββββββββββ
β
βββββββββββββββββββΌββββββββββββββββββ
β β β
ββββββββββββ ββββββββββββ ββββββββββββ
β API Srv 1β β API Srv 2β β API Srv 3β
ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ
β β β
βββββββββββββββββββββββββββββββββββββββββββββββ
β Cache β
β (Redis Cluster) β
β shortURL β longURL mapping β
ββββββββββββββββββββββ¬ββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββ
β Database β
β (MySQL/PostgreSQL) β
β Primary + Read Replicasβ
βββββββββββββββββββββββββββ
Step 3: Deep Dive (20 min)
Approach 1: Hash + Collision Detection
Algorithm:
- Compute
hash = MD5(longURL)orSHA-256(longURL) - Take first 7 characters of the hash hex string
- Check if this 7-char string already exists in DB
- If collision (same short URL exists for different long URL) β append a salt and rehash
- Repeat until unique
MD5 example:
Input: "https://www.example.com/very/long/path?query=value"
MD5: "21232f297a57a5a743894a0e4a801fc3"
Take 7: "21232f2" β shortURL
Collision handling:
def generate_short_url(long_url):
hash_input = long_url
while True:
short = md5(hash_input)[:7]
if not db.exists(short):
return short
# Collision! Append suffix and retry
hash_input = long_url + str(random.random())Problems with this approach:
- β Multiple DB lookups per write (to check collision)
- β Collision check adds latency
- β Same long URL hashed twice β two different short URLs (if salt used)
Bloom filter optimization:
Use a Bloom filter to check if short URL exists before DB query.
- Bloom filter: Fast probabilistic data structure (false positives OK, no false negatives)
- Check Bloom filter first β only query DB if possibly exists
- Reduces DB load significantly
Approach 2: Base 62 Encoding of Auto-Increment ID
Algorithm:
- Insert the long URL into DB β get auto-increment ID (e.g.,
2009215674938) - Convert ID to base-62 string
- Pad or trim to 7 characters
- That 7-character string is the short URL
Base-62 alphabet:
0-9 β digits 0-9
a-z β digits 10-35
A-Z β digits 36-61
Base-62 conversion example:
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base62(num):
if num == 0:
return ALPHABET[0]
result = []
while num > 0:
result.append(ALPHABET[num % 62])
num //= 62
return ''.join(reversed(result))
# Example:
to_base62(11157) β "2TX"
to_base62(2009215674938) β "0xtVTT" (6-7 chars for large IDs)Flow:
POST /api/v1/data/shorten
1. Insert longURL into DB β id = 11157
2. base62(11157) = "2TX"
3. Short URL = "https://tinyurl.com/2TX"
4. Update DB record with shortURL = "2TX"
5. Return short URL to client
ID uniqueness guarantee:
- Auto-increment IDs are globally unique
- No collision check needed!
- But: exposes sequential pattern (ID 1, 2, 3β¦ β a, b, cβ¦)
Security concern: Sequential IDs are guessable. Solutions:
- Start auto-increment at a large random number
- Add a random salt to the ID before encoding
- Use a separate ID generator service (Snowflake-style IDs)
Hash vs Base62 Comparison
| Aspect | Hash + Collision Detection | Base62 Encoding |
|---|---|---|
| URL length | Fixed 7 chars | Variable (grows with ID) |
| Uniqueness | Must check for collisions | Guaranteed (auto-increment) |
| Predictability | Not predictable | Sequential (guessable) |
| Single server | Works fine | OK but ID conflicts at scale |
| Distributed | Easy (no shared state) | Needs coordination (global ID) |
| Same long URL | Same short URL (deterministic) | Different short URL each time |
| Collision risk | Yes (need handling logic) | No collisions |
| DB writes | Multiple (collision check) | One write per URL |
| Recommended | Good for distributed | Good for single-region |
Which to choose?
- Base62 is simpler and collision-free β preferred for interviews
- Hash works well in fully distributed scenarios (no coordination needed)
URL Redirecting Flow (Read Path)
Client Load Balancer API Server Cache DB
| | | | |
| GET /abc1234 | | | |
|ββββββββββββββββββββββ | | | |
| | GET /abc1234 | | |
| |ββββββββββββββββββ| | |
| | | Cache hit? | |
| | |βββββββββββββββ| |
| | | longURL | |
| | |βββββββββββββββ| |
| | | (if miss) | |
| | | | DB lookup |
| | |ββββββββββββββββββββββββββββ|
| | | longURL | |
| | |ββββββββββββββββββββββββββββ|
| | | write to cache| |
| | |βββββββββββββββ| |
| 301/302 + Location | | | |
|ββββββββββββββββββββββββββββββββββββββββββ | | |
Cache strategy:
Cache: Redis
Key: shortURL (e.g., "abc1234")
Value: longURL (e.g., "https://www.example.com/...")
TTL: 24 hours (or no TTL if URLs never expire)
Eviction: LRU (Least Recently Used)
Size: Cache top 20% of URLs (handles 80% of traffic)
Cache sizing:
11,600 reads/sec Γ 86,400 seconds = ~1 billion reads/day
Top 20% of URLs handle 80% of traffic (Pareto principle)
1B reads Γ 20% = 200M unique short URLs to cache
200M Γ 100 bytes/entry = 20 GB cache (manageable!)
URL Shortening Flow (Write Path)
With Base62 approach:
1. Client sends POST /api/v1/data/shorten { longUrl: "..." }
2. API server checks if longUrl already in DB (dedup)
- If yes: return existing shortUrl (same long URL β same short URL)
- If no: proceed
3. Generate new ID (auto-increment or Snowflake)
4. Convert ID to base62 β shortUrl
5. Save { id, shortUrl, longUrl, createdAt } to DB
6. Write shortUrl β longUrl to cache
7. Return shortUrl to client
Deduplication:
-- Check if longUrl already exists
SELECT short_url FROM url_mapping WHERE long_url = ?;
-- If not found, insert new
INSERT INTO url_mapping (short_url, long_url) VALUES (?, ?);Rate Limiting to Prevent Abuse
Without rate limiting, a single client could generate millions of URLs:
Rate Limit Rules:
- Unauthenticated: 10 URL shortenings per hour per IP
- Free tier: 100 URL shortenings per day per user
- Premium tier: 10,000 URL shortenings per day per user
- Redirect reads: 10,000 per minute per IP (prevent scraping)
Response when exceeded:
HTTP 429 Too Many Requests
Retry-After: 3600
Analytics (Optional Feature)
Track clicks per short URL for marketing use cases:
CREATE TABLE click_events (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
short_url VARCHAR(7) NOT NULL,
clicked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ip_address VARCHAR(45),
country VARCHAR(2),
user_agent VARCHAR(512),
referrer VARCHAR(2048)
);Write path with analytics:
302 redirect (not 301) so every click goes through server
β Async: write click event to Kafka/message queue
β Analytics service consumes from queue β aggregates β stores in data warehouse
β Dashboard shows: total clicks, geographic breakdown, referrer sources
Design Summary
Final Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Clients (Browser / App) β
ββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββ
β Load Balancer β
β (Route by path prefix) β
ββββββββββββββ¬βββββββββββββ
β
ββββββββββββββββββββΌβββββββββββββββββββ
β β β
βββββββββββββ βββββββββββββ βββββββββββββ
β API Srv 1 β β API Srv 2 β β API Srv 3 β
β(stateless)β β(stateless)β β(stateless)β
βββββββ¬ββββββ βββββββ¬ββββββ βββββββ¬ββββββ
β β β
βββββββββββββββββββΌββββββββββββββββββββ
β
βββββββββββββββββββββββββββ
β Redis Cache β
β shortURL β longURL β
β (20 GB, LRU, 24h TTL) β
ββββββββββββββ¬βββββββββββββ
β cache miss
βββββββββββββββββββββββββββ
β MySQL β
β Primary (Write) β
β Read Replicas (Read) β
β ~182 TB over 10 years β
βββββββββββββββββββββββββββ
Key Decisions Summary
| Decision | Choice | Reasoning |
|---|---|---|
| Short URL length | 7 characters | 62^7 = 3.5 trillion, enough for 95+ years |
| Hash algorithm | Base62 of auto-increment ID | No collision, simple, deterministic |
| Redirect type | 301 (default) / 302 (analytics) | 301 reduces server load; 302 tracks clicks |
| Cache | Redis with LRU | Read-heavy: 100:1 ratio, cache top URLs |
| Database | MySQL with read replicas | Structured data, needs ACID, high read scale |
| Deduplication | Check longUrl before insert | Same long URL returns same short URL |
| Rate limiting | Per-user tier limits | Prevent abuse, API gateway enforcement |
Interview Questions & Answers
Q: Why 301 over 302 redirect for a URL shortener?
A: 301 (Permanent) tells browsers to cache the redirect, so subsequent visits to the same short URL go directly to the destination without hitting our server. This dramatically reduces server load for popular links. Use 302 (Temporary) only when you need to track every single click (e.g., marketing analytics), since 302 forces the browser to ask our server every time.
Q: How do you handle hash collisions in the hash-based approach?
A: On each hash, check if the short URL already exists in the DB (or Bloom filter for speed). If it exists and maps to a different long URL, append a random salt to the long URL and rehash. Repeat until unique. To avoid multiple DB queries, use a Bloom filter: fast probabilistic lookup, false positives are OK (just do a DB check), no false negatives.
Q: What if two users submit the same long URL at the same time?
A: In the base62 approach, theyβll get two different short URLs (different auto-increment IDs). To deduplicate, first query the DB for the long URL. If found, return the existing short URL. Use a database unique index on long_url and handle insert conflicts. Alternatively, accept two short URLs mapping to the same long URL (simpler, extra storage is minor).
Q: How do you scale the write path at 1,160 writes/second?
A: Single MySQL primary handles ~5,000-10,000 writes/sec comfortably. At 1,160 writes/sec, one primary is sufficient. For future scale: shard by user ID or first character of shortURL. Use connection pooling. Buffer writes with a write-through cache. If auto-increment ID coordination becomes a bottleneck, use a distributed ID generator (Snowflake, UUID-to-base62).
Q: How do you protect against URL shortener abuse (spam, malware)?
A: (1) Rate limiting per IP and user account. (2) URL validation: check that the long URL is reachable (optional). (3) Blacklist known malicious domains. (4) Integrate Google Safe Browsing API to scan submitted URLs. (5) CAPTCHA for anonymous submissions. (6) Flag and review reported URLs.
Key Takeaways
- 62^7 = 3.5 trillion β 7 characters with base-62 encoding supports ~95 years of URLs at 100M/day
- Base62 encoding of auto-increment IDs is simpler and collision-free vs hash + collision detection
- 301 vs 302 is a key trade-off: 301 reduces server load (browser caches), 302 enables analytics (every click tracked)
- Cache is critical β read-heavy (100:1 ratio) makes Redis cache the most impactful optimization; top 20% URLs serve 80% of traffic
- Deduplication β always check if long URL already exists before creating a new short URL
- Read replicas β scale the read path (11,600 reads/sec) without overloading primary DB
- Rate limiting β essential to prevent abuse; tier-based limits (free vs premium) are industry standard
Related Resources
- distributed-system-components > Cache - Redis caching strategy
- key-patterns > Hashing - Base62 encoding and hash functions
- ch04-rate-limiter - Rate limiting to prevent abuse
- ch03-framework-for-system-design - Apply framework to this problem
Practice this design! Very common interview question. Be ready to:
- Walk through the base62 encoding step-by-step
- Explain 301 vs 302 trade-offs clearly
- Draw the full architecture with cache and read replicas
- Discuss how to scale beyond a single server
Last Updated: 2026-04-13
Status: Very common interview question - Must know!