Chapter 14: Design YouTube (Video Streaming Service)
volume1 youtube video-streaming cdn storage
Status: Interview ready - Very common question!
Difficulty: Hard
Time to complete: 45 min read + practice
Overview
YouTube is one of the most visited websites in the world. Designing a video streaming service means solving hard problems: uploading large files reliably, transcoding video into many formats efficiently, and delivering it to millions of concurrent viewers with low latency.
Why this matters:
- One of the most common hard system design questions
- Covers CDN, blob storage, message queues, transcoding pipelines
- Real-world: YouTube, Netflix, TikTok, Twitch, Hulu
Problem Statement
Design a video streaming service that:
- Allows users to upload videos
- Streams videos to users (mobile, web, smart TV)
- Supports search and recommendations
- Handles multi-resolution playback (360p to 4K)
- Available internationally with low latency
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- Upload and stream only, or also search/recommendations? → Yes to all
- What devices? → Mobile, web, smart TV
- Video resolution support? → Multiple: 360p, 480p, 720p, 1080p, 4K
- International users? → Yes, globally available
- Max upload size? → 1 GB max per video
- Live streaming? → No, focus on pre-recorded (on-demand)
Scope:
- Upload videos with progress tracking
- Transcode videos into multiple resolutions/formats
- Stream videos globally with low buffering
- Search video library
- View count, likes, comments (basic metadata)
Non-Functional Requirements
- Availability: 99.99% uptime (video streaming is loss-tolerant, not mission critical)
- Reliability: No data loss for uploaded videos
- Scalability: 5M DAU, 10M concurrent viewers at peak
- Low latency streaming: Fast startup, smooth playback
- Durability: Multiple redundant copies of video storage
Scale Estimation
Users:
5M DAU
10% upload videos: 500K uploaders/day
Upload rate: 300 hours of video per minute
Storage (uploads):
300 hours/min × 60 min/hr × 24 hr/day = 432,000 hours/day
1 hour video at 1080p ≈ 4 GB compressed
432,000 hours × 4 GB = ~1.7 PB/day raw storage
Transcoding:
Each video → 5 formats (360p, 480p, 720p, 1080p, 4K) → 5× storage
1.7 PB × 5 = ~8.5 PB/day total transcoded storage
Streaming bandwidth:
5M DAU × 5 min avg watch = 25M min/day
= 25M × 60 sec × 4 Mbps (avg 720p) / 8 bits per byte = 750 TB/day CDN traffic
CDN bandwidth:
750 TB / 86,400 sec ≈ 70 Gbps average egress
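The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same back-of-envelope estimate: the constant names are made up for illustration, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
# Back-of-envelope capacity numbers, using the inputs from the estimate above.
UPLOAD_HOURS_PER_MIN = 300
GB_PER_HOUR_1080P = 4
RENDITIONS = 5
DAU = 5_000_000
WATCH_MIN_PER_USER = 5
AVG_BITRATE_MBPS = 4          # assumed average, roughly 720p
SECONDS_PER_DAY = 86_400

# Storage
upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
raw_pb_per_day = upload_hours_per_day * GB_PER_HOUR_1080P / 1_000_000
transcoded_pb_per_day = raw_pb_per_day * RENDITIONS

# Streaming egress
watch_seconds_per_day = DAU * WATCH_MIN_PER_USER * 60
egress_megabits = watch_seconds_per_day * AVG_BITRATE_MBPS
egress_tb_per_day = egress_megabits / 8 / 1_000_000       # Mb -> MB -> TB
avg_egress_gbps = egress_megabits / 1_000 / SECONDS_PER_DAY

print(f"uploads: {upload_hours_per_day:,} hours/day")
print(f"raw storage: {raw_pb_per_day:.2f} PB/day")
print(f"transcoded storage: {transcoded_pb_per_day:.2f} PB/day")
print(f"egress: {egress_tb_per_day:.0f} TB/day, {avg_egress_gbps:.1f} Gbps average")
```

Peak egress will be well above the daily average, so an interview answer would normally multiply the average by a peak factor of its own choosing.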
Step 2: High-Level Design (10 min)
Two Core Flows
Flow 1: Video Upload
Client → Load Balancer → API Servers → Metadata DB (MySQL)
↓
Original Storage (S3)
↓
Transcoding Pipeline
↓
Transcoded Storage (S3)
↓
CDN Distribution
Flow 2: Video Streaming
Client → CDN Edge Server (cache hit) → Video Content
Client → CDN Edge Server (cache miss) → Origin (S3) → CDN → Client
Component Overview
| Component | Purpose |
|---|---|
| Load Balancer | Distribute incoming traffic |
| API Servers | Handle upload requests, metadata CRUD |
| Original Storage (S3) | Store raw uploaded videos |
| Transcoding Service | Convert video to multiple formats (CPU-heavy) |
| Transcoded Storage (S3) | Store output for each resolution |
| CDN | Deliver video content globally, low latency |
| Metadata DB (MySQL) | Video info, user info, view counts, likes |
| Message Queue | Decouple upload from transcoding |
| Completion Queue | Signal when transcoding done, trigger CDN push |
High-Level Architecture Diagram
┌─────────────┐     ┌──────────┐     ┌─────────────────┐
│  Uploaders  │────▶│   Load   │────▶│   API Servers   │
└─────────────┘     │ Balancer │     └────────┬────────┘
                    └──────────┘              │
                                 ┌────────────┼──────────┐
                                 ▼            ▼          ▼
                            ┌──────────┐  ┌───────┐  ┌────────┐
                            │ Metadata │  │  S3   │  │   MQ   │
                            │    DB    │  │ (raw) │  │ (jobs) │
                            └──────────┘  └───────┘  └───┬────┘
                                                         │
                                                ┌────────▼────────┐
                                                │   Transcoding   │
                                                │  Workers (DAG)  │
                                                └────────┬────────┘
                                                         │
                                                ┌────────▼────────┐
                                                │  S3 Transcoded  │
                                                └────────┬────────┘
                                                         │
                                                ┌────────▼────────┐
                                                │       CDN       │
                                                │ (Edge Servers)  │
                                                └────────┬────────┘
                                                         │
┌─────────────┐     ┌──────────┐                         │
│   Viewers   │◀────│   CDN    │◀────────────────────────┘
└─────────────┘     │   Edge   │
                    └──────────┘
Step 3: Deep Dive (20 min)
Video Transcoding Pipeline
Why Transcode?
Problem: A video uploaded at 4K, H.264, MOV format:
- Won't play on older mobile devices (no 4K support)
- Kills mobile data (4K = ~25 Mbps vs 360p = ~0.5 Mbps)
- Browsers such as Chrome do not reliably play MOV containers (MP4 or WebM is the safe choice)
- Smart TVs may need HLS format specifically
Solution: Transcode once into ALL needed formats
1080p MP4 (H.264) → for web/desktop
720p MP4 (H.264) → for HD mobile
480p MP4 (H.264) → for standard mobile
360p MP4 (H.264) → for low-bandwidth mobile
4K MP4 (H.265) → for premium devices
HLS/DASH manifests → for adaptive streaming
DAG Model for Transcoding Pipeline
Instead of one sequential pipeline, use a Directed Acyclic Graph (DAG) to parallelize independent steps:
                ┌──────────────┐
                │ Original S3  │
                └──────┬───────┘
                       │
                ┌──────▼───────┐
                │ Preprocessor │  (split video into segments, validate)
                └──────┬───────┘
       ┌───────────────┼──────────────────┐
       ▼               ▼                  ▼
┌─────────────┐  ┌───────────┐   ┌─────────────────┐
│    Video    │  │   Audio   │   │    Thumbnail    │
│   Encoder   │  │  Encoder  │   │    Generator    │
└──────┬──────┘  └─────┬─────┘   └────────┬────────┘
       │               │                  │
┌──────▼──────┐  ┌─────▼─────┐   ┌────────▼────────┐
│ 360p/480p/  │  │  AAC/MP3  │   │  Thumbnail S3   │
│ 720p/1080p  │  │  output   │   └─────────────────┘
└──────┬──────┘  └─────┬─────┘
       │               │
       └───────┬───────┘
               ▼
       ┌───────────────┐
       │  Watermarker  │  (add logo/DRM)
       └───────┬───────┘
               ▼
       ┌───────────────┐
       │   Manifest    │  (generate HLS/DASH .m3u8 / .mpd)
       │   Generator   │
       └───────┬───────┘
               ▼
       ┌───────────────┐
       │ Transcoded S3 │
       └───────────────┘
Key insight: Video encoding, audio encoding, and thumbnail generation are independent, so they can run in parallel for maximum throughput.
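One way to schedule such a DAG is a topological sort that releases every stage whose dependencies are finished. The sketch below uses Python's standard `graphlib` (Python 3.9+) and a thread pool; the stage names mirror the diagram, while `run_stage` is a hypothetical placeholder for the real FFmpeg work.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# stage -> set of stages it depends on (mirrors the diagram above)
PIPELINE = {
    "preprocess": set(),
    "video_encode": {"preprocess"},
    "audio_encode": {"preprocess"},
    "thumbnails": {"preprocess"},
    "watermark": {"video_encode", "audio_encode"},
    "manifest": {"watermark"},
}


def run_stage(name: str) -> str:
    # Placeholder: a real worker would invoke FFmpeg or a similar tool here.
    return name


def run_pipeline(graph):
    """Run stages wave by wave; every stage in a wave has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())        # stages runnable right now
            done = list(pool.map(run_stage, ready))   # run the wave in parallel
            sorter.done(*done)                        # unlock dependent stages
            waves.append(done)
    return waves


waves = run_pipeline(PIPELINE)
for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

Running wave by wave keeps the example short. A production scheduler would more likely mark each stage done as soon as it finishes, so a slow thumbnail job does not hold back the watermark stage.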
Pipeline Stages Explained
| Stage | What it does | Tool |
|---|---|---|
| Preprocessor | Split video into 2-minute segments (GOP-aligned), validate format, extract metadata | FFmpeg |
| Video Encoder | Re-encode at each target resolution/bitrate | FFmpeg (libx264, libx265) |
| Audio Encoder | Normalize audio, convert to AAC/MP3 | FFmpeg |
| Thumbnail Generator | Extract frames at key timestamps for preview | FFmpeg |
| Watermarker | Overlay platform logo, embed DRM info | FFmpeg + DRM SDK |
| Manifest Generator | Create HLS .m3u8 or MPEG-DASH .mpd files | Custom script |
FFmpeg command example:
# Transcode to 720p H.264 MP4
ffmpeg -i input.mov \
-c:v libx264 -vf scale=1280:720 \
-b:v 2500k \
-c:a aac -b:a 128k \
output_720p.mp4
Adaptive Bitrate Streaming (HLS / DASH)
The problem: Network speed changes mid-stream (Wi-Fi → cellular → congested).
Solution: Adaptive Bitrate Streaming (ABR)
How HLS works:
1. Video split into small segments (2-10 sec each)
2. Same content encoded at multiple bitrates
3. Master playlist (.m3u8) lists all quality levels
4. Player starts at medium quality, monitors download speed
5. If download fast → switch UP (higher quality)
6. If buffering → switch DOWN (lower quality)
7. Seamless quality switching with no interruption
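The switching decision in steps 4 to 6 can be sketched as a simple throughput rule. This is an illustrative simplification, not how any particular player implements ABR: the ladder values are taken from the example manifest in this chapter, and the `safety` margin is an assumed tuning parameter.

```python
# Rendition ladder: (name, bitrate in bits/sec), lowest first.
LADDER = [("360p", 400_000), ("720p", 1_500_000), ("1080p", 6_000_000)]


def pick_rendition(throughput_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits inside a safety
    margin of the measured download throughput."""
    budget = throughput_bps * safety
    choice = LADDER[0][0]          # always fall back to the lowest tier
    for name, bitrate in LADDER:
        if bitrate <= budget:
            choice = name
    return choice


for measured in (10_000_000, 2_000_000, 1_000_000, 100_000):
    print(measured, "->", pick_rendition(measured))
```

Real players usually combine throughput with buffer occupancy, and add hysteresis so quality does not flip on every segment.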
HLS Manifest example:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
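A manifest generator for a master playlist like the one above can be very small. The sketch below emits only the two tags used in this example; the function name and the tuple layout are assumptions made for illustration, and real playlists usually carry more attributes (for example `CODECS`).

```python
def master_playlist(renditions):
    """Build an HLS master playlist.

    renditions: iterable of (name, bandwidth_bps, (width, height)).
    Each variant points at "<name>/playlist.m3u8".
    """
    lines = ["#EXTM3U"]
    for name, bandwidth, (width, height) in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


playlist = master_playlist([
    ("360p", 400_000, (640, 360)),
    ("720p", 1_500_000, (1280, 720)),
    ("1080p", 6_000_000, (1920, 1080)),
])
print(playlist)
```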
HLS vs DASH comparison:
| Aspect | HLS (Apple) | MPEG-DASH (open standard) |
|---|---|---|
| Developed by | Apple | MPEG group |
| Apple devices | Native support | No native support; needs an MSE-based JavaScript player |
| Android | Supported | Native support |
| Segment format | MPEG-TS or fMP4 | fMP4 |
| Codec flexibility | Lower | Higher (any codec) |
| Adoption | Apple ecosystem apps, Netflix iOS | YouTube, Netflix Android, Disney+ |
| Recommendation | iOS/Safari | Cross-platform |
In practice: Support both. Detect device, serve appropriate manifest.
Video Storage Architecture
Upload: Pre-signed URLs
Problem: Videos are up to 1 GB. API servers should not be the transfer bottleneck.
Solution: Pre-signed URLs β client uploads directly to S3.
1. Client sends POST /upload/init to API server
2. API server generates pre-signed S3 URL (valid 1 hour)
3. API server returns URL to client
4. Client uploads directly to S3 using pre-signed URL
5. S3 notifies completion event → triggers transcoding queue
Client → API Server: "I want to upload video.mp4"
API Server → S3: generate_presigned_url(bucket, key, ttl=3600)
API Server → Client: { upload_url: "https://s3.amazonaws.com/...?sig=xxx" }
Client → S3: PUT video.mp4 (direct, large transfer bypasses API servers)
S3 → SNS/SQS: { event: "ObjectCreated", key: "raw/video123.mp4" }
SQS → Transcoding Workers: pick up job
Why pre-signed URLs?
- API servers stay out of the data path for large file transfers
- S3 supports parallel multipart upload natively (each part is uploaded with its own pre-signed URL)
- Client tracks upload progress itself from the bytes it has sent to S3
- More secure: URL is scoped to one object, expires quickly
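The idea behind a pre-signed URL can be illustrated with a plain HMAC signature. This is a simplified stand-in, not the AWS Signature Version 4 scheme that S3 actually uses: the secret, the host `storage.example.com`, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"   # hypothetical; held by the server, never sent to clients


def _sign(method: str, key: str, expires: int) -> str:
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign(method: str, key: str, ttl: int = 3600, now=None) -> str:
    """API-server side: issue a URL valid for one method, one object, ttl seconds."""
    expires = int(now if now is not None else time.time()) + ttl
    query = urlencode({"expires": expires, "sig": _sign(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Storage side: recompute the signature and check the expiry."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    expires = int(params["expires"][0])
    current = now if now is not None else time.time()
    if current > expires:
        return False
    return hmac.compare_digest(params["sig"][0], _sign(method, key, expires))


url = presign("PUT", "raw/video123.mp4", ttl=3600, now=1_000)
print(verify("PUT", url, now=2_000))   # within the hour, same method and key
print(verify("PUT", url, now=5_000))   # past the expiry
```

Because the method, object key, and expiry are all covered by the signature, a client cannot reuse the URL for a different object or extend its lifetime without invalidating it.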
Resumable Chunked Upload
Problem: 1 GB upload on mobile can fail halfway. Should not restart from scratch.
Solution: Multipart upload, split into 5 MB chunks.
Upload flow:
1. Client initiates multipart upload → gets upload_id
2. Client splits file into chunks (e.g., 5 MB each)
3. Client uploads chunks IN PARALLEL (e.g., 4 at a time)
4. Each chunk returns an ETag
5. On failure: only re-upload failed chunks (not whole file)
6. Client sends CompleteMultipartUpload with all ETags
Resuming:
- Client stores chunk completion status locally
- On resume: ListParts → see which ETags were received
- Only upload missing chunks
File: video.mp4 (200 MB)
Chunks: Part1(5MB) Part2(5MB) ... Part40(5MB)
Thread 1: Part1 ✓  Part5 ✓  Part9 ✓  ...
Thread 2: Part2 ✓  Part6 ✗  retry Part6 ✓  ...
Thread 3: Part3 ✓  Part7 ✓  Part11 ✓  ...
Thread 4: Part4 ✓  Part8 ✓  Part12 ✓  ...
CompleteMultipartUpload([ETag1, ETag2, ..., ETag40])
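The client-side bookkeeping for chunking and resuming might look like the sketch below. It only plans byte ranges and works out which parts are still missing; the actual network calls are omitted. The helper names are hypothetical, and the part size is assumed to be 5 MiB, so a 200 MiB file yields the 40 parts shown above.

```python
import math

PART_SIZE = 5 * 1024 * 1024   # 5 MiB (assumed part size)


def plan_parts(file_size: int, part_size: int = PART_SIZE) -> dict:
    """Return {part_number: (offset, length)} with 1-based part numbers.
    Only the last part may be shorter than part_size."""
    count = math.ceil(file_size / part_size)
    return {
        n: ((n - 1) * part_size, min(part_size, file_size - (n - 1) * part_size))
        for n in range(1, count + 1)
    }


def missing_parts(plan: dict, uploaded: dict) -> list:
    """Part numbers still to send, given {part_number: etag} already confirmed
    (for example from local state or a ListParts response)."""
    return sorted(set(plan) - set(uploaded))


plan = plan_parts(200 * 1024 * 1024)
uploaded = {n: f"etag-{n}" for n in plan if n != 6}   # part 6 failed mid-upload
print(len(plan), "parts planned; still missing:", missing_parts(plan, uploaded))
```

On resume, only the parts returned by `missing_parts` are uploaded again before the final completion call.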
CDN Strategy
How CDN works for video:
Edge Server (CDN PoP - e.g., Singapore):
- Popular videos cached here (80/20 rule: 20% videos = 80% traffic)
- Client → CDN edge → instant serve from cache
Cache miss flow:
- CDN edge → Origin (S3) → fetch + cache → serve client
- First viewer in region = slow, all others = fast
Geographic distribution:
- AWS CloudFront: 400+ PoPs worldwide
- Popular video in India: cached at Mumbai, Chennai, Hyderabad PoPs
- No need to fetch from US-East S3 on every request
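The hit and miss paths can be modelled with a small LRU cache standing in for one edge PoP. This is a toy sketch: real CDNs use more elaborate admission and eviction policies, and `fetch_from_origin` is a hypothetical callback representing the S3 fetch.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy model of one CDN edge: serve from cache, else pull from origin."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.store = OrderedDict()   # key -> segment bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.fetch_from_origin(key)   # slow path: go back to origin
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used
        return data


origin_calls = []


def origin(key):
    origin_calls.append(key)
    return f"bytes:{key}"


edge = EdgeCache(capacity=2, fetch_from_origin=origin)
for segment in ["a", "a", "b", "c", "a"]:
    edge.get(segment)
print("hits:", edge.hits, "misses:", edge.misses, "origin fetches:", origin_calls)
```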
CDN Cost Optimization (critical topic in interviews):
Problem: CDN egress is VERY expensive (~$0.08/GB)
750 TB/day × $0.08/GB = $60,000/day just for CDN!
Strategy 1: Only cache popular videos on CDN
- Top 20% of videos = 80% of traffic (long-tail distribution)
- Long-tail (old/niche videos) → serve from S3 directly
- The CDN then holds ~20% of the catalog while still serving ~80% of traffic
Strategy 2: Move unpopular videos to cold storage
- Videos not watched in 30 days → S3 Glacier Instant Retrieval
- Cost: $0.004/GB/month vs $0.023/GB/month (standard S3)
- ~83% cost reduction for archived videos
Strategy 3: Regional CDN caching
- Pre-warm CDN with trending videos before viral spike
- ML model predicts which videos will trend
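The cost figures can be reproduced with a short calculation. The prices are the ones quoted in this section and are treated here as assumptions, since actual pricing varies by region, volume tier, and provider; the function name is hypothetical.

```python
EGRESS_TB_PER_DAY = 750
CDN_USD_PER_GB = 0.08                   # assumed list price quoted above
S3_STANDARD_USD_PER_GB_MONTH = 0.023
GLACIER_IR_USD_PER_GB_MONTH = 0.004

# Cost of sending all egress through the CDN (1 TB = 1,000 GB).
daily_cdn_cost = EGRESS_TB_PER_DAY * 1_000 * CDN_USD_PER_GB

# Storage saving from moving cold videos to Glacier Instant Retrieval.
storage_saving = 1 - GLACIER_IR_USD_PER_GB_MONTH / S3_STANDARD_USD_PER_GB_MONTH


def split_egress(total_tb: float, popular_traffic_share: float):
    """Split daily egress between CDN (popular videos) and origin (long tail)."""
    cdn_tb = total_tb * popular_traffic_share
    return cdn_tb, total_tb - cdn_tb


cdn_tb, origin_tb = split_egress(EGRESS_TB_PER_DAY, 0.8)
print(f"all-CDN egress cost: ${daily_cdn_cost:,.0f}/day")
print(f"cold-storage saving: {storage_saving:.0%}")
print(f"80/20 split: {cdn_tb:.0f} TB/day via CDN, {origin_tb:.0f} TB/day from origin")
```

Under the 80/20 assumption the CDN still carries most of the bytes, so the size of the saving depends on how cheaply the long-tail 20% of traffic can be served from origin.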
Metadata DB Design
Database choice per concern (MySQL and Cassandra, plus Elasticsearch for search):
| Concern | Database | Why |
|---|---|---|
| Video metadata (title, description, tags, owner) | MySQL (relational) | Structured, joins with users table |
| User data (profile, subscriptions, playlists) | MySQL | ACID transactions |
| Video analytics (views, likes, trending scores) | Cassandra (wide-column) | High write throughput, time-series friendly |
| Search index (full-text search on title/description) | Elasticsearch | Full-text search with ranking |
Core tables (MySQL):
-- Video metadata
CREATE TABLE videos (
video_id VARCHAR(36) PRIMARY KEY,
user_id VARCHAR(36) NOT NULL,
title VARCHAR(500) NOT NULL,
description TEXT,
status ENUM('processing', 'ready', 'failed'),
duration_sec INT,
size_bytes BIGINT,
created_at TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at)
);
-- Transcoded output per format
CREATE TABLE video_formats (
format_id INT AUTO_INCREMENT PRIMARY KEY,
video_id VARCHAR(36) NOT NULL,
resolution VARCHAR(10), -- '360p', '720p', '1080p'
format VARCHAR(10), -- 'mp4', 'hls', 'dash'
s3_key VARCHAR(500),
size_bytes BIGINT,
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
Analytics with Cassandra:
-- View counts (high write throughput)
CREATE TABLE video_views (
video_id UUID,
view_date DATE,
view_count COUNTER,
PRIMARY KEY (video_id, view_date)
);
-- Query: views last 30 days
SELECT SUM(view_count) FROM video_views
WHERE video_id = ? AND view_date >= '2026-03-13';
Error Handling & Reliability
Message Queue for Resilience
Without MQ (fragile):
API Server → Transcoding Worker (direct call)
If transcoding worker crashes → job lost ✗
With MQ (resilient):
API Server → SQS/Kafka → Transcoding Workers
If worker crashes → job remains in queue ✓
Worker restarts → picks up job again ✓
Dead Letter Queue (DLQ) ← jobs that fail 3+ times
Retry strategy:
Attempt 1: Immediate retry (worker crash, not user error)
Attempt 2: Retry after 30 seconds
Attempt 3: Retry after 5 minutes
Failed: Move to Dead Letter Queue
Alert: Notify engineering team via PagerDuty
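The retry schedule can be sketched as a loop over fixed delays, with a list standing in for the dead letter queue. The reading assumed here is one initial attempt followed by the three retries listed above; `handler` and `dead_letter` are hypothetical, and the injectable `sleep` keeps the example fast to run.

```python
import time

RETRY_DELAYS_SEC = [0, 30, 300]   # immediate, 30 seconds, 5 minutes


def run_job(job, handler, dead_letter, sleep=time.sleep):
    """Try the job once, then retry per the schedule.
    Returns (status, delays_waited), where status is "done" or "dlq"."""
    delays_waited = []
    for delay in [None] + RETRY_DELAYS_SEC:   # None marks the first attempt
        if delay is not None:
            sleep(delay)
            delays_waited.append(delay)
        try:
            handler(job)
            return "done", delays_waited
        except Exception:
            # Sketch only: real code would separate retryable errors
            # (worker crash, OOM) from permanent ones (corrupt input).
            continue
    dead_letter.append(job)                   # retries exhausted
    return "dlq", delays_waited


def always_fails(job):
    raise RuntimeError("transcode failed")


dlq = []
print(run_job("video123", always_fails, dlq, sleep=lambda s: None), dlq)
```

An alerting hook (for example a PagerDuty notification) would typically fire at the point where the job is appended to the dead letter queue.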
Video upload error handling:
Upload errors:
Network timeout → resumable upload, retry from last chunk
File format invalid → return 400, prompt user to convert
File too large (>1GB) → return 413 Request Entity Too Large
Storage full → return 507, trigger auto-scaling alert
Transcoding errors:
Corrupt input file → mark video as FAILED, notify uploader
Worker OOM → restart worker, retry job
All retries exhausted → DLQ, human review
Safety: DRM and Watermarking
Digital Rights Management (DRM):
Problem: Premium content (movies, shows) must not be downloadable/copied
DRM solutions:
FairPlay (Apple devices): AES-128 encrypted segments
Widevine (Google/Android): Encrypted key exchange
PlayReady (Microsoft): Windows/Xbox DRM
Flow:
1. Encrypted video stored in CDN (no plaintext)
2. Player sends license request to DRM License Server
3. License server validates user subscription
4. License server returns decryption key (short-lived)
5. Player decrypts and plays segment-by-segment
6. Key expires after session
Watermarking:
Visible watermark: Platform logo overlay (top-right corner)
Invisible watermark: Embed user ID in video (forensic)
- If video leaks, can trace which account downloaded it
- Used by Netflix for screener copies
- Imperceptible to human eye, detectable by algorithm
Design Summary
Final Architecture
┌───────────────────────────────────────────────────────────────────┐
│                            UPLOAD PATH                            │
│                                                                   │
│  Client ──▶ LB ──▶ API Server ──▶ MySQL (metadata)                │
│                        │                                          │
│                        ├──▶ S3 (pre-signed URL) ◀── raw video     │
│                        │         │                                │
│                        │         └──▶ SQS (transcode job)         │
│                        │                              │           │
│                        │                     ┌────────▼────────┐  │
│                        │                     │   Transcoding   │  │
│                        │                     │  Workers (EC2   │  │
│                        │                     │ GPU instances)  │  │
│                        │                     └────────┬────────┘  │
│                        │                              │           │
│                        │                     ┌────────▼────────┐  │
│                        │                     │  S3 Transcoded  │  │
│                        │                     │  (per format)   │  │
│                        │                     └────────┬────────┘  │
│                        └──▶ Cassandra (view counts)   │           │
└───────────────────────────────────────────────────────┬───────────┘
                                                        │
                                             ┌──────────▼──────────┐
                                             │  CDN (CloudFront)   │
                                             │  400+ PoPs global   │
                                             └──────────┬──────────┘
                                                        │
┌───────────────────────────────────────────────────────▼───────────┐
│                            STREAM PATH                            │
│                                                                   │
│  Viewer ──▶ DNS (GeoDNS) ──▶ Nearest CDN Edge                     │
│             Route 53                │                             │
│                              Cache HIT?  ──▶ Stream video         │
│                              Cache MISS? ──▶ S3 → Cache → Stream  │
│                                                                   │
│  Player: HLS/DASH adaptive bitrate streaming                      │
│    - Monitor download speed every 2 sec                           │
│    - Switch quality seamlessly                                    │
└───────────────────────────────────────────────────────────────────┘
Key Decisions Summary
| Decision | Choice | Reasoning |
|---|---|---|
| Video upload | Pre-signed S3 URLs + multipart | API servers not bottleneck, resumable |
| Transcoding | DAG pipeline + message queue | Parallel stages, resilient to worker crashes |
| Transcoding format | HLS + DASH + MP4 | Cover all devices and adaptive bitrate |
| Video storage | S3 blob storage | Cost-effective, durable, CDN-compatible |
| CDN | CloudFront / Akamai | Global PoPs, low latency streaming |
| Metadata DB | MySQL + Cassandra | Relational for structure, wide-column for analytics |
| Error handling | SQS + DLQ + retry | No job loss, automatic recovery |
| Cost control | Only popular content on CDN | ~20% of the catalog serves ~80% of traffic; long tail served from origin |
Interview Questions & Answers
Q: Why use a DAG model for the transcoding pipeline?
A: DAG allows parallel execution of independent stages. Video encoding at different resolutions, audio encoding, and thumbnail generation are all independent, so they can run simultaneously on separate workers. A sequential pipeline can be 3-5× slower. DAG also makes it easy to add new stages (e.g., content moderation) without rewriting the whole pipeline.
Q: What is adaptive bitrate streaming and why is it important?
A: ABR (HLS/DASH) splits video into small segments (2-10s) encoded at multiple bitrates. The player monitors download speed and switches quality tier up or down seamlessly. This is critical for mobile users whose bandwidth changes constantly (Wi-Fi → LTE → congested network). Without ABR, users would either buffer constantly (if quality is too high) or get unnecessarily low quality (if quality is too low).
Q: How do you handle the CDN cost problem?
A: CDN egress is expensive (~$0.08/GB). The key insight is that video traffic follows a power-law distribution: 20% of videos drive 80% of views. Strategy: Only cache popular/trending videos on CDN (determined by view count threshold), serve long-tail content directly from S3. Also move videos not watched in 30+ days to S3 Glacier to reduce storage cost. The CDN then holds only about 20% of the catalog while still absorbing about 80% of requests; the actual saving depends on how CDN and origin egress are priced.
Q: How do pre-signed URLs work for video upload?
A: The API server generates a time-limited (e.g., 1 hour) signed URL that allows the client to PUT one specific file directly to S3. The signature is computed using AWS credentials the client does not have. This means: (1) large file transfer bypasses API servers entirely, (2) the URL expires and can only be used for that one object, (3) S3 handles parallel multipart upload natively. After upload, S3 triggers an event (SNS/SQS) to kick off the transcoding pipeline.
Q: How would you handle a video that goes viral (sudden 1000Γ traffic spike)?
A: CDN handles most of the spike automatically, since the video would already be cached at edge servers. If it's a new video not yet popular: CDN cache miss goes to S3 (first few requests per PoP), then cached. For upload spikes: auto-scaling transcoding workers (EC2 spot instances). For API layer: horizontal scaling behind load balancer. Pre-warm CDN for anticipated events (sports finals, product launches) by pre-fetching segments to edge PoPs.
Q: How do you ensure no video is lost if a transcoding worker crashes?
A: Message queue (SQS) provides durability. The transcoding job sits in SQS with a visibility timeout. When a worker picks up the job, SQS makes it invisible to other workers. If the worker crashes before sending acknowledgment, the visibility timeout expires, and the job reappears in the queue for another worker to pick up. After N failed attempts, the job moves to a Dead Letter Queue (DLQ) for human inspection.
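The visibility-timeout behaviour can be modelled in memory to see why a crashed worker does not lose the job. This is a toy simulation with a caller-supplied clock, not the SQS API: the class, its methods, and the `max_receives` parameter are hypothetical names chosen for illustration.

```python
class VisibilityQueue:
    """Toy queue: a received job is hidden for visibility_timeout seconds and
    reappears unless acknowledged. After max_receives deliveries it is moved
    to the dead letter list instead of being delivered again."""

    def __init__(self, visibility_timeout, max_receives):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.jobs = {}            # job_id -> [invisible_until, receive_count]
        self.dead_letter = []

    def send(self, job_id):
        self.jobs[job_id] = [0, 0]

    def receive(self, now):
        for job_id, state in list(self.jobs.items()):
            invisible_until, receives = state
            if now < invisible_until:
                continue                          # held by another worker
            if receives >= self.max_receives:
                del self.jobs[job_id]
                self.dead_letter.append(job_id)   # too many failed attempts
                continue
            state[0] = now + self.visibility_timeout
            state[1] = receives + 1
            return job_id
        return None

    def ack(self, job_id):
        self.jobs.pop(job_id, None)               # worker finished: delete job


q = VisibilityQueue(visibility_timeout=30, max_receives=2)
q.send("video123")
print(q.receive(now=0))    # worker A takes the job, then crashes without ack
print(q.receive(now=10))   # still invisible to other workers
print(q.receive(now=30))   # timeout expired: the job is delivered again
```

A worker that finishes successfully calls `ack`, which removes the job so it is never redelivered.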
Key Takeaways
- Separate upload and stream paths: different scaling characteristics, different optimizations
- DAG transcoding pipeline enables parallel processing, which is critical for throughput at scale
- Pre-signed URLs for large file upload: keep API servers out of the data path
- Chunked/multipart upload for resumability: never restart a 1 GB upload from scratch
- Adaptive bitrate streaming (HLS/DASH) is standard for all video delivery today
- CDN is the core of video delivery: 80-90% of video requests should be cache hits
- CDN cost optimization with long-tail strategy: only cache popular videos on CDN
- Message queue + DLQ for transcoding reliability: never lose a job to a worker crash
Related Resources
- distributed-system-components - CDN, blob storage, message queues
- key-patterns - CDN caching patterns, message queue patterns
- ch04-rate-limiter - Rate limiting upload API
- ch05-consistent-hashing - Load balancing CDN traffic
Practice this design! Very common hard interview question. Be ready to:
- Draw the full upload and stream path separately
- Explain the transcoding DAG and why it's parallel
- Discuss adaptive bitrate streaming (HLS/DASH)
- Talk through CDN cost optimization and long-tail strategy
- Handle failure scenarios (worker crash, upload failure, viral spike)
Last Updated: 2026-04-13
Status: Very common hard interview question - Must know!