Chapter 8: Design a Distributed Email Service
volume2 email smtp imap distributed-systems
Status: π© Interview ready
Difficulty: Hard
Time to complete: 50 min read + practice
Overview
A distributed email service lets billions of users send, receive, store, and search emails at massive scale. Gmail and Outlook are the canonical examples β each handling hundreds of billions of emails per year.
Why this matters:
- Complex distributed storage problem (not just a simple CRUD app)
- Teaches email protocols (SMTP, IMAP, POP3) interviewers expect you to know
- Deep dive into storage systems: when to use Cassandra vs object store vs Elasticsearch
- Common at Google, Microsoft, Yahoo interviews
Problem Statement
Design a distributed email service that:
- Handles 1 billion users, 10 billion emails per day
- Reliably delivers emails (no loss, at-least-once)
- Stores emails with metadata (sender, subject, date, labels/folders)
- Supports full-text search within email body and subject
- Supports email threading (conversation view)
- Handles attachments up to 25MB
- Sends push notifications for new emails
- Provides mobile sync protocol
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- Scale? β 1 billion users, 10 billion emails/day
- Is delivery guarantee required? β At-least-once with deduplication
- Search? β Full-text search on subject, sender, body
- Attachments? β Yes, up to 25MB
- Threading? β Yes, conversation grouping
- Real-time notifications? β Yes, push notifications for new emails
- Mobile sync? β Yes
- Spam filtering? β Yes
Scope:
- Send emails (to users, to external SMTP servers)
- Receive emails (from external SMTP servers)
- Read, delete, search emails
- Organize with labels/folders
- Thread emails into conversations
- Push notifications for new email
Non-Functional Requirements
- Reliability: No email loss (at-least-once delivery)
- Scale: 1B users, 10B emails/day
- Availability: 99.99% (emails are mission-critical)
- Search performance: Search results < 500ms
- Storage durability: 99.9999999% (nine nines for email data)
- Low latency delivery: Email delivered within seconds
- Attachment deduplication: Same file shared between emails not stored twice
Scale Estimation
Users: 1,000,000,000 (1B)
Emails/day: 10,000,000,000 (10B)
Emails/second: 10B / 86,400 = ~115,740 emails/sec
Peak: ~3Γ average = ~350,000 emails/sec
Storage per email:
- Metadata (headers, labels, status): ~1 KB
- Body (average HTML email): ~50 KB
- Attachment (some emails): ~500 KB avg if attached
Daily storage (metadata only):
10B emails Γ 1 KB = 10 TB/day metadata
Daily storage (body):
10B emails Γ 50 KB = 500 TB/day
1-year storage for bodies:
500 TB/day Γ 365 = ~182 PB/year (with compression and dedup: much less)
Attachments (assume 20% of emails have attachments):
2B emails Γ 500 KB = 1 PB/day
Step 2: High-Level Design (10 min)
Email Protocols
SMTP (Simple Mail Transfer Protocol):
- Used for sending email
- Port 587 (submission) or 465 (SSL)
- Sender β Senderβs SMTP server β Recipientβs SMTP server
- Text-based, stateless
IMAP (Internet Message Access Protocol):
- Used for receiving/accessing email
- Port 993 (SSL)
- Emails stay on server, synced to client
- Supports multiple devices, folders, read/unread state
- Modern clients (Gmail app, Outlook) use IMAP or a custom sync protocol
POP3 (Post Office Protocol 3):
- Used for downloading email
- Port 995 (SSL)
- Downloads and deletes from server (or leaves copy)
- No sync β not suitable for multi-device use
- Legacy protocol, largely replaced by IMAP
SMTP vs IMAP vs POP3:
| Protocol | Direction | Stores on server? | Multi-device? | Use Today |
|---|---|---|---|---|
| SMTP | Sending | No (transport) | N/A | Yes (sending) |
| IMAP | Receiving | Yes | Yes | Yes (receiving) |
| POP3 | Receiving | No (downloads) | No | Legacy only |
Sending Email Flow
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Sending an Email β
β β
β User App β
β β β
β βΌ HTTPS / SMTP submission β
β API Server / SMTP Submission Server β
β β β
β βΌ async β
β Message Queue (Kafka) β
β β β
β βΌ β
β Outgoing SMTP Worker β
β ββ Validate sender domain (SPF, DKIM) β
β ββ Spam check β
β ββ Resolve recipient's MX record (DNS) β
β ββ SMTP connect β Recipient's Mail Server β
β βΌ β
β External Recipient (e.g., Yahoo, Outlook) β
β OR β
β Internal user (route to receiving pipeline below) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Receiving Email Flow
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Receiving an Email β
β β
β External Sender's SMTP Server β
β β SMTP (port 25) β
β βΌ β
β Incoming SMTP Server (our system) β
β ββ Validate: SPF, DKIM, DMARC β
β ββ Rate limit incoming β
β ββ Push to Message Queue β
β β β
β βΌ Kafka β
β Mail Processing Workers β
β ββ Spam filtering (ML + rules) β
β ββ Virus/attachment scanning β
β ββ Attachment extraction β Object Store (S3) β
β ββ Assign labels (Inbox, Spam, etc.) β
β ββ Assign conversation_id (threading) β
β ββ Store email metadata + body β
β β β
β βΌ β
β User's Mailbox Storage β
β ββ Metadata β Cassandra (fast read, multi-device sync) β
β ββ Body β Object Store (S3, immutable) β
β ββ Search Index β Elasticsearch (async, eventual) β
β β β
β βΌ β
β Push Notification Service β
β ββ Notify user's devices (iOS, Android, web) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Data Models
Email Metadata (Cassandra):
Table: email_metadata
user_id UUID -- partition key (owner of mailbox)
message_id UUID -- clustering key (sort descending by time)
conversation_id UUID -- for threading
subject TEXT
sender TEXT
recipients LIST<TEXT>
date TIMESTAMP
read BOOLEAN
labels SET<TEXT> -- ["INBOX", "STARRED", "WORK"]
body_s3_key TEXT -- reference to body in S3
attachment_ids LIST<UUID> -- references to attachment metadata
size_bytes INT
has_attachment BOOLEAN
Email Body (S3):
Key: emails/{year}/{month}/{day}/{message_id}.eml
Content: Full RFC 2822 email body (text or HTML or multipart)
ACL: Private (accessed via signed URLs)
Attachment (S3 + metadata table):
Table: attachment_metadata
attachment_id UUID (PK)
message_id UUID
content_hash TEXT (SHA-256 of file content)
s3_key TEXT (reference to S3 object)
filename TEXT
mime_type TEXT
size_bytes INT
S3 Key: attachments/{content_hash} (dedup by hash!)
Step 3: Deep Dive (25 min)
Email Storage at Scale β Why Not Standard Databases?
Option 1: Relational DB (MySQL, PostgreSQL)
Problems at 10B emails/day:
- Write throughput: 115,740 inserts/sec β MySQL max ~10K TPS per node
- Full-text search: LIKE '%query%' doesn't scale at billions of rows
- JOINs: email + metadata + labels + attachments = expensive JOINs
- Sharding: email data doesn't fit on one server
- Column size: storing email body as BLOB is inefficient
Option 2: Standard NoSQL (MongoDB, DynamoDB)
Problems:
- No efficient full-text search built in
- MongoDB struggles with 182 PB of data at Gmail scale
- Limited support for complex label/conversation queries
- Still need a separate search solution
Option 3: Custom Distributed Storage (Gmailβs approach)
Gmail uses Bigtable (wide-column store) for email storage.
Microsoft Exchange uses its own custom storage engine.
Key insight: Email has a natural schema:
- Keyed by (user_id, timestamp) β wide-column fits perfectly
- Read patterns: "give me all emails for user X sorted by time"
- Append-heavy (new emails added, rarely modified)
Recommended Architecture for the Interview:
| Data | Storage | Why |
|---|---|---|
| Email metadata (headers, labels, read status) | Cassandra | High write throughput, partition by user_id, sorted by time |
| Email body (HTML/text content) | S3 (object store) | Immutable large blobs, cheap, durable |
| Attachments | S3 (deduplicated by content hash) | Same file across multiple emails stored once |
| Search index | Elasticsearch | Full-text search, near-real-time |
| Unread counts, session state | Redis | Fast counters, ephemeral state |
Cassandra Data Model for Email Metadata
Why Cassandra?
- Partition by
user_idβ all of a userβs emails on same node(s) - Cluster by
message_id(time-based UUID) β sorted by time automatically - High write throughput (10B emails/day = 115K writes/sec β Cassandra handles this)
- No single point of failure (masterless, multi-datacenter replication)
- TTL support for automatic deletion of old emails
-- CQL Schema
CREATE TABLE email_metadata (
user_id UUID,
message_id TIMEUUID, -- time-based UUID, sorts chronologically
conversation_id UUID,
subject TEXT,
sender TEXT,
read BOOLEAN,
labels SET<TEXT>,
body_s3_key TEXT,
has_attachment BOOLEAN,
size_bytes INT,
PRIMARY KEY (user_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC); -- newest first
-- Query: get latest 50 emails for user X
SELECT * FROM email_metadata
WHERE user_id = X
LIMIT 50;
-- Query: get emails in specific label
SELECT * FROM email_metadata
WHERE user_id = X AND labels CONTAINS 'INBOX';
Separate table for label index (Cassandra secondary indices are slow for high cardinality β use materialized views or separate tables):
CREATE TABLE email_by_label (
user_id UUID,
label TEXT,
message_id TIMEUUID,
PRIMARY KEY ((user_id, label), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
Object Storage for Email Body
Why separate body from metadata?
Metadata query: "show me inbox" β needs subject, sender, date, labels
β does NOT need full body
Body fetch: user clicks email β fetch body from S3 by key
Splitting means:
- Inbox load = fast Cassandra read (no large blobs)
- Body fetch = on-demand S3 read (only when opened)
- 80% of emails are never opened again after first read
β body stays in S3 (cold storage after 30 days, glacier after 1 year)
Storage lifecycle:
Day 0-30: S3 Standard (fast access, $0.023/GB)
Day 30-365: S3 Infrequent Access ($0.0125/GB)
Day 365+: S3 Glacier (archive, $0.004/GB)
Attachment Deduplication with Content Hash
Problem: The same 5MB PDF attached to 1M emails = 5 PB wasted storage.
Solution: Content-addressed storage using SHA-256 hash.
Sending attachment:
1. Client sends attachment
2. Server computes SHA-256 hash = "abc123..."
3. Check if S3 key "attachments/abc123..." exists
4. If exists: skip upload, just reference existing S3 object
5. If not exists: upload to S3 at key "attachments/abc123..."
6. Store attachment_metadata record with content_hash = "abc123..."
Result: Same file stored exactly once regardless of how many emails reference it.
def store_attachment(file_bytes):
content_hash = hashlib.sha256(file_bytes).hexdigest()
s3_key = f"attachments/{content_hash}"
# Check if already stored (dedup)
if not s3.exists(s3_key):
s3.put(s3_key, file_bytes)
return content_hash # Store in email metadata, not the file itselfEmail Threading with conversation_id
Problem: Group related emails into a conversation (Gmailβs threaded view).
How email threading works:
RFC 2822 email headers:
Message-ID: <abc123@gmail.com> -- unique ID of this email
In-Reply-To: <xyz789@gmail.com> -- ID of email being replied to
References: <xyz789@gmail.com> <def456@gmail.com> -- full thread chain
Algorithm:
def assign_conversation_id(email):
if email.in_reply_to:
# Find the parent email
parent = find_email_by_message_id(email.in_reply_to)
if parent:
# Join the same conversation
return parent.conversation_id
# No parent found or it's a new email β start new conversation
return generate_new_uuid()Storage: conversation_id stored in Cassandra email_metadata. To load a thread: query WHERE conversation_id = X (needs a secondary index or materialized view in Cassandra).
Full-Text Search with Elasticsearch
Why Elasticsearch?
- SQL
LIKE '%keyword%'doesnβt scale to billions of emails - Cassandra has no efficient full-text search
- Elasticsearch: inverted index, sub-second search across billions of documents
Email stored in DB β Async indexer β Elasticsearch
(near-real-time: indexed within seconds of receipt)
Search query: "subject:invoice from:vendor@company.com"
Elasticsearch query:
{
"query": {
"bool": {
"must": [
{ "match": { "subject": "invoice" } },
{ "term": { "sender": "vendor@company.com" } }
],
"filter": [
{ "term": { "user_id": "user-uuid-123" } }
]
}
}
}
Elasticsearch index per user vs shared index:
- Per-user index: Strong isolation, but millions of tiny indices are expensive
- Shared index with user_id filter: Better resource utilization
- Tenant-based sharding: Group users into index shards (balance between isolation and efficiency)
What to index:
- subject (analyzed: tokenized for full-text)
- sender / recipients (keyword: exact match)
- body snippet (analyzed, truncated to ~200 chars for performance)
- labels (keyword)
- date (date range queries)
What NOT to index in ES:
- Full email body (index text snippets + reference S3 key)
- Attachments (index filenames/MIME types, not content)
Delivery Guarantees
At-least-once delivery with deduplication:
Problem: "Exactly once" is impossible in distributed systems.
At-most-once = may lose emails (unacceptable).
At-least-once + dedup = practical guarantee.
How:
1. Incoming SMTP server assigns a unique message_id to every email
2. Email pushed to Kafka topic with message_id as key
3. Processing worker consumes email
4. Before storing: check if message_id already exists in Cassandra
5. If exists: skip (already processed = idempotent consumer)
6. If not: store email
The message_id (from email headers or assigned by SMTP server) acts
as the deduplication key.
Message queue durability:
Kafka topic for incoming emails:
- Replication factor: 3 (survives 2 broker failures)
- Retention: 7 days (replay if processing worker fails)
- Consumer group offset committed after successful storage
- If worker crashes: re-read from last committed offset
Dead Letter Queue for Undeliverable Emails
Processing pipeline:
Kafka β Mail Worker β [process email]
β
ββββββββββββ΄βββββββββββ
β β
Success Failure (after 3 retries)
β β
Store in DB β Dead Letter Queue (DLQ)
β
Operations team reviews
(spam, virus, corrupt data)
β
Manual retry or discard
Reasons email lands in DLQ:
- Corrupt email format (canβt parse)
- Virus detected in attachment (quarantine)
- Storage failure (rare)
- User storage quota exceeded
Spam Filtering
Multi-layer approach:
Layer 1: IP reputation (before SMTP connection accepted)
- Check sender IP against known spam blacklists (Spamhaus, etc.)
- Block at network level β cheapest check
Layer 2: Email authentication (SMTP handshake)
- SPF: Verify sender's IP is authorized for their domain
- DKIM: Verify email signature (wasn't tampered with)
- DMARC: Policy for how to handle SPF/DKIM failures
Layer 3: Content filtering (during processing)
- Rule-based: Known spam phrases, suspicious links, attachment types
- ML-based: Trained on billions of spam/ham examples
- Collaborative filtering: If many users mark same sender as spam β block
Layer 4: User feedback
- User clicks "Report spam" β trains the ML model
- "Not spam" (false positive) β whitelists sender for that user
Label System vs Folder System
IMAP Folders (traditional email):
- Email belongs to exactly one folder
- Moving email = physical move between folders
- Simple but inflexible
Gmail Labels (modern approach):
- Email can have multiple labels: [βINBOXβ, βSTARREDβ, βWORKβ]
- Labels are metadata, not physical locations
- βMove to folderβ = add label + remove INBOX label
- Archive = remove INBOX label (email still accessible with All Mail label)
Storage: Labels stored as SET<TEXT> in Cassandra email_metadata. Label index table enables efficient βshow all emails with INBOX labelβ queries.
Mobile Sync Protocol
Challenge: Mobile clients can be offline for hours. When they reconnect, how do they efficiently sync?
Sync Approaches:
1. Full sync: Download everything β too slow/expensive
2. Delta sync: Download only changes since last sync
Delta sync with sequence numbers:
- Each user has a global sequence number (monotonic counter)
- Every change (new email, read, delete, label change) increments it
- Client stores last_sync_sequence = 12345
- On reconnect: GET /sync?since=12345
- Server returns: all changes with sequence > 12345
- Client applies changes, updates last_sync_sequence
Push vs pull:
- New email arrives β push notification via APNS (iOS) or FCM (Android)
- Notification wakes app β app pulls delta sync from API
- Silent push notifications donβt wake screen but sync background
Final Architecture
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Clients β
β (Web, iOS, Android) β HTTPS + WebSocket/SSE β
ββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β API Gateway β
β Auth | Rate Limiting | Routing β
ββββββββ¬βββββββββββββββββ¬βββββββββββββββββββββ¬ββββββββββββββββββββ
β β β
βΌ βΌ βΌ
ββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββ
β Webmail β β SMTP Submit β β Notification Service β
β API β β Server β β (APNs/FCM/WebSocket) β
β (read/ β β (outgoing) β ββββββββββββββββββββββββ
β search/ β ββββββββ¬ββββββββ
β label) β β Kafka
ββββββββββββββ βΌ
β ββββββββββββββββ
β β Outgoing β
β β SMTP Workers β
β ββββββββββββββββ
β
ββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Storage Layer β
β β
β ββββββββββββββββββ ββββββββββββ βββββββββββββββββββ β
β β Cassandra β β S3 β β Elasticsearch β β
β β (email meta β β (bodies, β β (full-text β β
β β + labels) β β attachm.)β β search index) β β
β ββββββββββββββββββ ββββββββββββ βββββββββββββββββββ β
β β
β ββββββββββββββββββββββββββββββββ β
β β Redis (unread counts, β β
β β session, rate limits) β β
β ββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Incoming SMTP (port 25):
External Sender β Incoming SMTP Server β Kafka β Mail Processing Workers
β [Spam filter | Attachment scan | Thread assign] β Cassandra + S3 + ES
Design Summary β Full Email Receive Flow
1. External sender sends email to user@ourdomain.com
2. Incoming SMTP server (port 25):
ββ Validate SPF, DKIM, DMARC
ββ Rate limit by sender IP
ββ Assign unique message_id (if not present in headers)
ββ Push email to Kafka topic "incoming_emails"
3. Mail processing worker (Kafka consumer):
ββ Check idempotency: message_id already in Cassandra? β skip
ββ Spam filter layers 1-4
ββ Virus scan attachments
ββ Extract attachments β compute SHA-256 β store in S3 (dedup)
ββ Store email body in S3: "emails/2026/04/13/{message_id}.eml"
ββ Assign conversation_id (check In-Reply-To header)
ββ Assign labels: ["INBOX"] (or ["SPAM"] if spam)
4. Write to Cassandra:
ββ INSERT INTO email_metadata (user_id, message_id, subject, sender,
read=false, labels={"INBOX"}, body_s3_key, conversation_id)
ββ INSERT INTO email_by_label (user_id, "INBOX", message_id)
5. Async index in Elasticsearch:
ββ Index: {user_id, message_id, subject, sender, body_snippet, labels}
6. Push notification:
ββ Notification Service β APNs (iOS) / FCM (Android) / WebSocket (web)
ββ "New email from Alice: Re: Invoice Q1 2026"
7. User opens email:
ββ Read metadata from Cassandra (fast)
ββ Fetch body from S3 via signed URL
ββ UPDATE email_metadata SET read = true WHERE ...
Interview Questions & Answers
Q: Why not use a relational database for email storage?
A: Three problems at Gmail scale: (1) Write throughput β 115K inserts/sec exceeds single-node MySQL capacity, requiring complex sharding. (2) Full-text search β SQL LIKE queries donβt scale to billions of rows; need a dedicated search engine. (3) Storage size β 500 TB/day of email bodies stored as BLOBs in a relational DB would be extremely inefficient compared to object storage at a fraction of the cost. Instead: Cassandra for metadata (write-optimized, partitioned by user), S3 for bodies (cheap, durable), Elasticsearch for search.
Q: How do you ensure no emails are lost (delivery guarantee)?
A: At-least-once delivery with deduplication: (1) Kafka for the message queue β replicated across 3 brokers, 7-day retention. Consumer commits offset only after successful DB write. If worker crashes, re-reads from last committed offset. (2) Each email has a unique message_id. Before storing, check if message_id already exists (idempotency). (3) Failed emails after 3 retries go to Dead Letter Queue for manual review. The combination of durable queue + idempotent consumer + DLQ ensures practical exactly-once semantics.
Q: How does email threading work?
A: RFC 2822 email headers contain Message-ID (unique ID of the email) and In-Reply-To (ID of the email being replied to). On receipt, we check In-Reply-To β if we find the parent email in our storage, we assign the same conversation_id. If no parent found, we assign a new conversation_id. All emails in the same thread share the same conversation_id, enabling efficient thread loading with a single query.
Q: How would you implement full-text email search?
A: Elasticsearch with an inverted index. After storing an email in Cassandra + S3, we asynchronously push index documents to Elasticsearch: {user_id, message_id, subject (analyzed), sender (keyword), body snippet (analyzed, ~200 chars), labels (keyword), date}. Search query includes user_id as a mandatory filter for security. Near-real-time indexing means search is available within ~1 second of email receipt. For privacy/multi-tenancy, we shard the index by user cohorts.
Q: How do you handle a user who is over their storage quota?
A: (1) Quota tracked in real-time: Redis counter per user, incremented on email receive, decremented on delete. (2) On new email: check quota before storing. If over quota: email rejected at SMTP level with 452 error code (βinsufficient storageβ), sender receives bounce notification. (3) Warning notifications: push notification and email when at 80%, 90%, 95% of quota. (4) Graceful degradation: read-only mode (can read and delete existing emails but cannot receive new ones). (5) Quota updates propagated via quota service, not inline to keep the critical path fast.
Key Takeaways
- Never store email bodies in relational DB at scale: Use object storage (S3) for bodies/attachments, wide-column store (Cassandra) for metadata
- Attachment deduplication with SHA-256: Same content stored once regardless of how many emails reference it β critical at PB scale
- At-least-once + idempotent consumer = practical exactly-once: message_id deduplication before DB write handles duplicates from queue retries
- Elasticsearch for search, Cassandra for storage: Each tool does one thing well β donβt try to make Cassandra searchable or Elasticsearch the system of record
- Email threading via In-Reply-To header: conversation_id assigned at processing time based on RFC 2822 headers β no complex graph traversal needed at query time
- Dead letter queue: Unprocessable emails need a holding area for manual review β never silently drop
- Two-protocol reality: SMTP sends outgoing mail; IMAP/custom sync protocol handles client-side access. Understanding this split is expected in interviews
Related Resources
- ch07-hotel-reservation - Idempotency patterns (same pattern used for email dedup)
- distributed-system-components > Message Queues - Kafka usage in email pipeline
- key-patterns > At-Least-Once Delivery - Delivery guarantees
- distributed-system-components > Object Storage - S3 for email bodies
Last Updated: 2026-04-13
Status: Hard but very high-signal interview β tests storage system design, protocol knowledge, and distributed systems patterns