Chapter 12 Flashcards - Chat System

flashcards volume1 chat websocket real-time messaging

What are the three client-server communication options for a chat system and which is best?
?
Polling: Client repeatedly asks server for messages every N seconds. Bad — wastes resources, not real-time. Long Polling: Client holds connection open until server has data. Better — but client must re-open after every message, hard to scale across servers. WebSocket: Persistent bidirectional connection, server pushes messages instantly. Best — low latency, no reconnect overhead, supports both send and receive. Use WebSocket for chat!

Why is WebSocket better than long polling for a chat system?
?
WebSocket provides a persistent bidirectional connection: the server pushes messages to the client instantly without waiting for a client request. Long polling requires the client to re-open an HTTP connection after every response (overhead, latency). WebSocket also handles typing indicators, presence updates, and delivery receipts cheaply on the same connection. Widely used in practice: Slack and Discord run real-time messaging over WebSocket, as does WhatsApp Web.

What should use WebSocket vs HTTP in a chat system?
?
WebSocket (stateful, real-time): Sending/receiving messages, online presence updates, typing indicators, message delivery receipts. HTTP/REST (stateless): Login/signup/auth, user profile management, group creation/management, media file upload (to CDN), push notification configuration. Rule: WebSocket for everything that needs real-time push; HTTP for everything else.

What are the three core service types in a chat system?
?
Chat Servers: Handle WebSocket connections, route messages between users, fan-out group messages, detect online/offline state. Presence Servers: Receive heartbeats from clients, track online/offline status, propagate status changes to friends. API Servers: Stateless HTTP servers for login, user/group management, media upload. Each service type scales independently.

What storage do you use for chat messages vs user/group data and why?
?
Chat messages → Key-Value store (HBase/Cassandra): Write-heavy append-only workload, billions of rows, need horizontal scaling, ordered range scan by time (conversation_id + time). User/group data → Relational DB (MySQL): Structured data needing joins (user-friend, user-group), low volume, ACID guarantees needed. Never use relational DB for chat history — indexes degrade at billions of rows.

Why is HBase or Cassandra chosen for chat message storage?
?
Chat history is write-heavy (append-only), read by time range (get messages in last hour), and grows to hundreds of TB. HBase: Fast writes, ordered range scans by row key (use composite key: conversation_id + timestamp), horizontal scaling. Cassandra: Same benefits, better read throughput, used by Discord. Both support auto-sharding and replication. Relational DBs fail here because B-tree indexes degrade at billion+ rows.

How does message ID design work for ordering within a conversation?
?
Use a local sequence number per conversation (not global Snowflake, not timestamp). Implementation: Redis INCR conversation:{id}:seq gives next message_id atomically. Message_id is a monotonically increasing integer within each conversation. Clients sort messages by message_id (not timestamp) for display. Why not timestamp? Clocks on different servers drift, so timestamps can put messages in the wrong order, and two messages can land in the same millisecond with nothing to break the tie.
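
A minimal sketch of the per-conversation counter described above. `SequenceStore` is a hypothetical in-memory stand-in for Redis, used so the example runs without a server; it is single-threaded and does not reproduce Redis's atomicity guarantees.

```python
from collections import defaultdict


class SequenceStore:
    """In-memory stand-in for Redis INCR (assumption: not a real Redis client)."""

    def __init__(self):
        self._counters = defaultdict(int)

    def incr(self, key: str) -> int:
        # Like Redis INCR: a missing key starts at 0; the new value is returned.
        self._counters[key] += 1
        return self._counters[key]


def next_message_id(store: SequenceStore, conversation_id: str) -> int:
    # Key layout follows the card: conversation:{id}:seq
    return store.incr(f"conversation:{conversation_id}:seq")


store = SequenceStore()
ids_a = [next_message_id(store, "a") for _ in range(3)]
ids_b = [next_message_id(store, "b") for _ in range(2)]
```

Each conversation gets its own gap-free sequence, so sorting by message_id never depends on server clocks.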

How does 1-on-1 message delivery work step by step?
?
1. User A sends message via WebSocket to Chat Server 1. 2. Chat Server 1 generates message_id (Redis INCR). 3. Chat Server 1 stores message in HBase. 4. Chat Server 1 sends ACK to User A. 5a. If User B ONLINE: look up User B’s chat server (Redis), route message to Chat Server 2, Chat Server 2 pushes to User B via WebSocket. 5b. If User B OFFLINE: publish to User B’s Kafka inbox queue, Push Notification Service sends APNs/FCM notification.
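
The delivery steps can be sketched as one function. Everything here is an assumption for illustration: `ChatBackend` and its fields are in-memory stand-ins for Redis, HBase, Kafka, and the push service, and the function's return value plays the role of the ACK from step 4.

```python
from dataclasses import dataclass, field


@dataclass
class ChatBackend:
    # In-memory stand-ins (hypothetical), not real clients.
    seq: dict = field(default_factory=dict)             # Redis sequence counters
    sessions: dict = field(default_factory=dict)        # Redis: user_id -> chat_server_id
    message_store: list = field(default_factory=list)   # HBase
    pushed: list = field(default_factory=list)          # WebSocket pushes
    inbox: dict = field(default_factory=dict)           # Kafka inbox queues
    notifications: list = field(default_factory=list)   # APNs/FCM requests

    def send(self, sender: str, recipient: str, text: str) -> dict:
        conv = "-".join(sorted([sender, recipient]))
        # Step 2: per-conversation message_id.
        self.seq[conv] = self.seq.get(conv, 0) + 1
        msg = {"conversation_id": conv, "message_id": self.seq[conv],
               "sender": sender, "text": text}
        # Step 3: persist before attempting delivery.
        self.message_store.append(msg)
        # Step 5a / 5b: route by the recipient's session, if any.
        server = self.sessions.get(recipient)
        if server is not None:
            self.pushed.append((server, recipient, msg))
        else:
            self.inbox.setdefault(recipient, []).append(msg)
            self.notifications.append((recipient, "new message"))
        # Step 4: the ACK returned to the sender.
        return {"ack": True, "message_id": msg["message_id"]}


backend = ChatBackend()
backend.sessions["bob"] = "chat-server-2"
ack_online = backend.send("alice", "bob", "hi")
ack_offline = backend.send("alice", "carol", "hello")
```

Because the message is stored before routing, a failed push or a missed notification never loses it.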

How does service discovery work for assigning users to chat servers?
?
Zookeeper (or etcd) acts as service registry. When user logs in: 1. Check Zookeeper for all available chat servers and their load. 2. Assign user to least-loaded server. 3. Store mapping user_id → chat_server_id in Redis for fast lookup. 4. User opens WebSocket to assigned server. On server failure: Zookeeper detects it, affected clients reconnect, Zookeeper assigns new server. This enables horizontal scaling of chat servers.
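
A rough sketch of the assignment logic, assuming plain dictionaries in place of Zookeeper (server load) and Redis (routing table). The function names and the tie-break by server name are illustrative choices, not part of any real API.

```python
def assign_chat_server(server_load: dict[str, int], routing: dict[str, str],
                       user_id: str) -> str:
    """Pick the least-loaded server and record user_id -> chat_server_id."""
    if not server_load:
        raise RuntimeError("no chat servers registered")
    # Ties are broken by server name so the result is deterministic.
    server = min(server_load, key=lambda s: (server_load[s], s))
    server_load[server] += 1
    routing[user_id] = server
    return server


def handle_server_failure(server_load: dict[str, int], routing: dict[str, str],
                          failed: str) -> list[str]:
    """Drop a failed server and reassign the users that were connected to it."""
    server_load.pop(failed, None)
    displaced = sorted(u for u, s in routing.items() if s == failed)
    for user_id in displaced:
        assign_chat_server(server_load, routing, user_id)
    return displaced


load = {"cs-1": 2, "cs-2": 0, "cs-3": 1}
routing = {}
first = assign_chat_server(load, routing, "u1")
second = assign_chat_server(load, routing, "u2")
displaced = handle_server_failure(load, routing, "cs-2")
```

In a real deployment the reassignment happens when each client reconnects, not in one server-side loop; the loop only keeps the sketch short.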

How does group chat message fanout work for small groups (≤ 100 members)?
?
Use fanout on write (push model): 1. User A sends to Group (100 members). 2. Chat Server stores message once in HBase. 3. Chat Server fans out to each member’s Kafka inbox queue (100 queue writes). 4. Each online member receives message via their chat server’s WebSocket push. 5. Each offline member gets push notification. Works well at ≤ 100 members — 100 queue writes per message is acceptable.

What is the difference between fanout on write (push) vs fanout on read (pull) for group chat?
?
Fanout on write (push): Message written to each member’s inbox queue at send time. Pros: Low read latency, real-time push. Cons: High write amplification (N members = N queue writes). Use for small groups ≤ 100. Fanout on read (pull): Message written once, clients fetch on demand. Pros: Low write cost. Cons: Higher read latency, more complex client logic. Use for large groups (Discord channels with thousands of members).
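
To make the write-amplification trade-off concrete, here is a toy model that switches strategy on group size. The `GroupChat` class, its 100-member threshold parameter, and the in-memory inboxes are assumptions for illustration only.

```python
from collections import defaultdict


class GroupChat:
    """Toy group that picks a fanout strategy by member count."""

    def __init__(self, members, push_threshold=100):
        self.members = list(members)
        self.push_threshold = push_threshold
        self.log = []                     # message stored once, in either mode
        self.inboxes = defaultdict(list)  # per-member inbox queues (push mode)
        self.queue_writes = 0

    def send(self, sender, text):
        message = (len(self.log) + 1, sender, text)
        self.log.append(message)
        if len(self.members) <= self.push_threshold:
            # Fanout on write: N members = N queue writes.
            for member in self.members:
                self.inboxes[member].append(message)
                self.queue_writes += 1
        return message[0]

    def fetch(self, after_message_id):
        # Fanout on read: clients pull from the shared log on demand.
        return [m for m in self.log if m[0] > after_message_id]


small = GroupChat(["a", "b", "c"])
small.send("a", "hi")
large = GroupChat([f"u{i}" for i in range(1000)])
large.send("u0", "hello")
```

The small group pays three queue writes for one message; the large group pays none at send time and shifts the cost to each reader's `fetch` call.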

How does the online presence heartbeat mechanism work?
?
Client sends heartbeat to Presence Server every 5 seconds. Presence Server updates Redis: presence:{user_id} = { status: online, last_seen: now() } with a TTL of 30 seconds, refreshed on every heartbeat. If no heartbeat arrives for 30 seconds, the Redis key expires and the user is marked offline. Why heartbeat vs connect/disconnect events? Network flicker causes false offline status with connect/disconnect. The 30s grace period absorbs temporary drops without bothering friends with status changes.
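
A small sketch of the expiry behaviour, using the 30-second offline window from the card. `PresenceStore` is a hypothetical in-memory stand-in for a Redis key with a TTL; time is passed in explicitly so the example is deterministic.

```python
class PresenceStore:
    """Stand-in for presence:{user_id} keys with a TTL (assumption: not real Redis)."""

    def __init__(self, ttl_seconds: float = 30):
        self.ttl = ttl_seconds
        self._expires_at = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Each heartbeat rewrites the key and restarts its TTL.
        self._expires_at[user_id] = now + self.ttl

    def is_online(self, user_id: str, now: float) -> bool:
        # The user is online only while the key has not expired.
        return self._expires_at.get(user_id, float("-inf")) > now


presence = PresenceStore()
presence.heartbeat("bob", now=0)
online_at_29 = presence.is_online("bob", now=29)
online_at_30 = presence.is_online("bob", now=30)
presence.heartbeat("bob", now=25)
online_at_50 = presence.is_online("bob", now=50)
```

With a 5-second heartbeat, a client has to miss about six beats in a row before friends see it go offline.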

How is online status propagated to friends after a status change?
?
1. Presence Server detects User B goes ONLINE (heartbeat received after offline period). 2. Publishes status-change event to Kafka topic presence-updates:{user_b_id}. 3. Friends subscribed to this topic receive the event via their chat servers. 4. Chat servers push presence update to online friends via existing WebSocket. Optimization: Only notify online friends (no point notifying offline users). Rate-limit to avoid status flicker (debounce rapid on/off changes).

How does cross-device synchronization work in a chat system?
?
Each device maintains its own cursor: the last_seen_message_id it has received. When device reconnects or comes online: calls sync API with its cursor. Server returns all messages after that cursor. Example: Phone has cursor=3, Laptop has cursor=5. Phone calls GET /messages/sync?after=3 and gets messages 4 and 5. This is cursor-based pagination of the message log. Each device independently tracks its sync position — no central coordination needed.

What is the push notification flow for offline users?
?
1. Message arrives in User B’s Kafka inbox queue. 2. Consumer detects User B has no active WebSocket (checked via Redis session store). 3. Routes message to Push Notification Service. 4. Push Service selects provider: APNs for iOS, FCM for Android, Web Push for browser. 5. Sends notification with minimal content (sender name and “new message”, not the full text, for privacy). 6. On user tap: app opens, fetches full messages from server using sync API. Message stored in DB first, so push failure doesn’t lose the message.
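
The routing decision in steps 2 to 5 might look like the sketch below. The function name, the message fields, and the `devices` mapping are hypothetical; no real APNs or FCM client is called.

```python
PROVIDER_BY_PLATFORM = {"ios": "APNs", "android": "FCM", "web": "Web Push"}


def route_inbox_message(message: dict, sessions: dict, devices: dict) -> dict:
    """Decide between a WebSocket push and push notifications for one message."""
    recipient = message["recipient"]
    if recipient in sessions:
        # Active WebSocket: hand the message to the recipient's chat server.
        return {"action": "websocket_push", "server": sessions[recipient]}
    # No active session: one notification per registered device platform.
    notifications = [
        {
            "provider": PROVIDER_BY_PLATFORM[platform],
            "title": message["sender_name"],
            "body": "new message",  # minimal content; the message text is omitted
        }
        for platform in devices.get(recipient, [])
    ]
    return {"action": "push_notification", "notifications": notifications}


msg = {"recipient": "bob", "sender_name": "Alice", "text": "secret plans"}
offline = route_inbox_message(msg, sessions={}, devices={"bob": ["ios", "web"]})
online = route_inbox_message(msg, sessions={"bob": "cs-2"}, devices={})
```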

How do you ensure message delivery reliability (at-least-once with deduplication)?
?
1. Client generates a UUID for each message before sending. 2. Chat Server stores message in DB and returns ACK. If no ACK: client retries with same UUID. 3. Server deduplicates by checking UUID before storing (idempotent write). 4. Kafka ensures at-least-once delivery to inbox queues. 5. Client tracks last_seen_message_id and ignores duplicates. Result: effectively exactly-once visible delivery despite network retries.
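
A sketch of both deduplication points, under the assumption that the server keys stored messages by the client-generated UUID. `MessageStore` and `apply_delivery` are illustrative names, and the store is in memory.

```python
import uuid


class MessageStore:
    """Idempotent write: the same client UUID always maps to the same record."""

    def __init__(self):
        self._by_client_id = {}
        self._next_id = 0

    def store(self, client_uuid: str, text: str) -> dict:
        existing = self._by_client_id.get(client_uuid)
        if existing is not None:
            return existing  # retry: re-send the ACK, store nothing new
        self._next_id += 1
        record = {"message_id": self._next_id, "text": text}
        self._by_client_id[client_uuid] = record
        return record


def apply_delivery(state: dict, message: dict) -> bool:
    """Receiver side: ignore anything at or below last_seen_message_id."""
    if message["message_id"] <= state["last_seen_message_id"]:
        return False
    state["last_seen_message_id"] = message["message_id"]
    state["visible"].append(message["text"])
    return True


store = MessageStore()
client_uuid = str(uuid.uuid4())
first = store.store(client_uuid, "hello")
retry = store.store(client_uuid, "hello")   # simulated retry after a lost ACK
other = store.store(str(uuid.uuid4()), "again")

state = {"last_seen_message_id": 0, "visible": []}
shown = [apply_delivery(state, m) for m in (first, retry, other)]
```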

How do you scale to 50M concurrent WebSocket connections?
?
Each WebSocket connection uses ~50KB RAM → 50M × 50KB = 2.5TB total. Scale horizontally: multiple chat servers each handling 50K-100K connections. Zookeeper tracks server load, assigns new users to least-loaded server. Redis stores user_id → server_id mapping for routing. When server fails: clients detect disconnect and reconnect, Zookeeper assigns new server. Each new server node adds ~50K-100K connection capacity.
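
The arithmetic above, worked through with the card's own figures (decimal units assumed, so 50KB = 50,000 bytes). The `fleet` helper is a hypothetical name.

```python
import math

CONNECTIONS = 50_000_000
BYTES_PER_CONNECTION = 50_000  # ~50KB per connection

# Total connection memory across the fleet, in terabytes.
total_tb = CONNECTIONS * BYTES_PER_CONNECTION / 1e12


def fleet(connections_per_server: int) -> tuple[int, float]:
    """Return (servers needed, connection memory per server in GB)."""
    servers = math.ceil(CONNECTIONS / connections_per_server)
    gb_per_server = connections_per_server * BYTES_PER_CONNECTION / 1e9
    return servers, gb_per_server


low_density = fleet(50_000)
high_density = fleet(100_000)
```

At 50K connections per server the fleet is 1,000 servers with about 2.5GB of connection state each; at 100K it is 500 servers with about 5GB each. Real fleets need headroom on top of this for failover.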

What is the message sync API design for cross-device sync?
?
GET /messages/sync?conversation_id={id}&after_message_id={cursor}&limit=50 Returns: list of messages after the cursor, ordered by message_id, and has_more flag. Client calls this on: app start, reconnect after offline period, switching devices, background sync. Cursor-based (not offset-based) because message_ids are stable — new messages don’t shift positions like offset-based pagination would.
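
A minimal server-side sketch of this endpoint plus a client catch-up loop. The message log is an in-memory list, and `sync_messages` and `catch_up` are hypothetical names that mirror the query parameters above.

```python
def sync_messages(log: list, after_message_id: int, limit: int = 50) -> dict:
    """Return up to `limit` messages after the cursor, plus a has_more flag."""
    newer = sorted(
        (m for m in log if m["message_id"] > after_message_id),
        key=lambda m: m["message_id"],
    )
    return {"messages": newer[:limit], "has_more": len(newer) > limit}


def catch_up(log: list, cursor: int, limit: int = 50):
    """Client loop: page until has_more is false, advancing the cursor each time."""
    received = []
    while True:
        page = sync_messages(log, cursor, limit)
        received.extend(page["messages"])
        if page["messages"]:
            cursor = page["messages"][-1]["message_id"]
        if not page["has_more"]:
            return cursor, received


log = [{"message_id": i, "text": f"msg {i}"} for i in range(1, 6)]
phone_cursor, phone_new = catch_up(log, cursor=3)
laptop_cursor, laptop_new = catch_up(log, cursor=0, limit=2)
```

Each device advances only its own cursor, so a slow device never affects what another device has already seen.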

How does media (image/video) delivery work in a chat system?
?
Media is too large to send over the chat WebSocket. Separate flow: 1. Sender uploads image to Object Storage (S3) via API Server using HTTP multipart upload. 2. Object storage keeps the original; a server-side job generates thumbnails. 3. CDN caches and serves the image at edge. 4. Sender sends chat message containing CDN URL (not the binary). 5. Receiving client downloads image from CDN on demand (lazy load). Optimization: Client compresses image before upload; thumbnails shown first, full image loads on tap.

What data does Redis store in a chat system?
?
Active WebSocket sessions: user_id → chat_server_id (for routing messages to correct server). Conversation sequence counters: conversation:{id}:seq for message_id generation (INCR). Online presence: presence:{user_id} with TTL (auto-expires if no heartbeat). Recent message cache: last N messages per conversation (reduce DB reads). Distributed locks: for any operations needing coordination. Redis is the glue connecting all stateful components.

What is the Kafka topic/queue design for inbox delivery?
?
Each user has a dedicated inbox topic (or partition): inbox:{user_id}. Chat server publishes message events to recipient’s inbox topic. A consumer group per chat server subscribes to inboxes of users connected to that server. When user connects, their chat server subscribes to their inbox. When user disconnects, chat server unsubscribes. Offline messages accumulate in Kafka and are consumed when user reconnects. For groups: fanout means publishing to each member’s inbox topic.

What is the HBase/Cassandra data model for storing chat messages?
?
Row key (HBase): {conversation_id}_{reversed_timestamp} — reversed timestamp ensures newest messages are first in scan. Cassandra partition key: conversation_id, clustering key: message_id DESC. Columns: message_id, sender_id, content_type (text/image/video), content, timestamp, status. Access patterns: Get last N messages for a conversation (range scan), get messages around a specific message_id (pagination), write new message (append). Hot partitions avoided by sharding large conversations.
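
To show why a reversed timestamp puts the newest messages first, here is a sketch of the composite row key. The maximum value, the 19-digit zero padding, and the `row_key` helper are assumptions; a real schema would also need conversation IDs that sort safely as prefixes.

```python
MAX_TS_MS = 2**63 - 1  # assumed upper bound for a millisecond timestamp


def row_key(conversation_id: str, timestamp_ms: int) -> str:
    """Build {conversation_id}_{reversed_timestamp} as a sortable string."""
    reversed_ts = MAX_TS_MS - timestamp_ms
    # Zero-pad to a fixed width so byte-wise order matches numeric order.
    return f"{conversation_id}_{reversed_ts:019d}"


timestamps = [1000, 3000, 2000]
# A store that keeps keys sorted ascending would scan them in this order.
keys = sorted(row_key("conv42", ts) for ts in timestamps)
scan_order = [MAX_TS_MS - int(k.rsplit("_", 1)[1]) for k in keys]
```

Scanning from the start of the conversation prefix returns the latest message first, so "get last N messages" is a short forward scan.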

How does end-to-end encryption work at a high level in a chat system?
?
Signal Protocol (used by WhatsApp): 1. Each user generates public/private key pairs on their device. 2. Public keys uploaded to server; private keys NEVER leave the device. 3. Sender fetches the recipient’s public keys, derives a shared session key from them, and encrypts the message on the sender’s device. 4. Encrypted blob sent to server; the server cannot read content. 5. Recipient derives the same session key using their private keys and decrypts on their device. Architecture impact: Server can’t search message content, stores only encrypted bytes. Key exchange uses the server as a public-key directory; private keys never reach it.

What are the key differences between designing for WhatsApp (1-on-1 focus) vs Slack (channels focus)?
?
WhatsApp style: 1-on-1 and small groups (≤ 256). Message delivery to personal inboxes. End-to-end encryption. Push notifications critical (mobile-first). Read receipts (sent/delivered/read ticks). Slack style: Large channels (thousands of members). Fanout on read for channels. Message threading. Search across message history (requires plaintext storage). Bot integrations and webhooks. Workspace/team scoping. For SDI, identify which type you’re designing — different fanout strategies and storage patterns.

What are the 8 key components of a complete chat system architecture?
?
1. Load Balancer: L4 for WebSocket (TCP), L7 for HTTP. 2. Chat Servers: Stateful WebSocket handlers, message routing. 3. Presence Servers: Heartbeat processing, status tracking. 4. API Servers: Stateless HTTP for login/groups/media. 5. Kafka: Inbox queues, fanout, offline buffering. 6. HBase/Cassandra: Persistent message storage. 7. Redis: Sessions, presence, sequence IDs, cache. 8. Push Notification Service: APNs/FCM for offline users. Plus Zookeeper for service discovery.

Total Cards: 25
Review Time: 20-25 minutes
Priority: HIGH - Very common Hard interview question!
Last Updated: 2026-04-13

Study Notes by Niladri & AI
