Messaging Systems: Cross-Chapter Comparison Reference
comparisons kafka message-queue pubsub
Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch10, Ch11, Ch14 · Vol 2 Ch04, Ch05, Ch06, Ch08, Ch13
Last Updated: 2026-04-13
1. Messaging Patterns
Three fundamental patterns underlie all messaging systems.
| Pattern | Flow | Consumers | Message fate after delivery | Typical technology | SDI use case |
|---|---|---|---|---|---|
| Message Queue (point-to-point) | Producer → Queue → Single Consumer | One consumer per message | Deleted after ACK | RabbitMQ, AWS SQS | Task offloading (email send, video encode) |
| Pub/Sub (fan-out) | Publisher → Topic → Multiple Subscribers | All subscribers receive every message | Retained until each subscription ACKs (Google Pub/Sub); not retained at all by Redis Pub/Sub | Google Pub/Sub, AWS SNS, Redis Pub/Sub | Notification fan-out, event broadcasting |
| Event Streaming (Kafka-style) | Producer → Partitioned Log → Consumer Groups | Multiple consumer groups, each independent | Retained for configurable period (days/weeks) | Apache Kafka, AWS Kinesis | Activity tracking, analytics, audit log, replay |
Key distinction: A message queue is about work distribution (one consumer does the job). Pub/Sub is about event broadcasting (everyone hears it). Event streaming is about ordered, replayable, durable logs (you can re-read the past).
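To make the distinction concrete, here is a minimal in-memory sketch of the three patterns. The class names (`WorkQueue`, `Topic`, `Log`) and their methods are hypothetical illustrations, not the API of any real broker; persistence, networking and ACK timeouts are deliberately left out.

```python
from collections import deque


class WorkQueue:
    """Point-to-point: each message is handed to one consumer, then removed."""

    def __init__(self):
        self._pending = deque()

    def send(self, msg):
        self._pending.append(msg)

    def receive_and_ack(self):
        # Removing the message models "deleted after ACK".
        return self._pending.popleft() if self._pending else None


class Topic:
    """Pub/Sub: every subscriber attached at publish time gets a copy."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, msg):
        for callback in self._subscribers:
            callback(msg)


class Log:
    """Event streaming: append-only log, one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def append(self, msg):
        self._records.append(msg)

    def poll(self, group):
        start = self._offsets.get(group, 0)
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch

    def seek(self, group, offset):
        # Moving the offset backward is what makes replay possible.
        self._offsets[group] = offset
```

Only `Log` keeps records after they are read, which is why two groups can each read the full stream and a group can seek back to offset 0.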
2. System Comparison Table
| Feature | Apache Kafka | RabbitMQ | AWS SQS | Redis Pub/Sub | Google Pub/Sub |
|---|---|---|---|---|---|
| Ordering | Per-partition (strict) | Per-queue (strict) | Best-effort (FIFO queues available) | No guarantee | Per-message ordering key |
| Retention | Configurable (days to forever) | Until ACK (or TTL) | 14 days max | No persistence | 7 days default |
| Throughput | Millions msg/sec | ~50K msg/sec | Standard: nearly unlimited; FIFO: ~300 msg/sec (~3K msg/sec with batching) | ~1M msg/sec | ~1M msg/sec |
| Delivery semantics | At-least-once (exactly-once with transactions) | At-least-once | At-least-once (FIFO = exactly-once) | At-most-once (fire and forget) | At-least-once |
| Replay | Yes (seek to offset) | No | No | No | Yes (snapshot + replay) |
| Push or Pull | Pull (consumers poll) | Push and Pull | Pull | Push | Push and Pull |
| Consumer isolation | Consumer groups (each group independent) | Competing consumers per queue | Competing consumers | All subscribers get all messages | Subscriptions per topic |
| Protocol | Custom binary (TCP) | AMQP, MQTT, STOMP | HTTP/HTTPS (AWS SDK) | RESP (Redis protocol) | gRPC / HTTP |
| Managed | Self-hosted (or Confluent Cloud) | Self-hosted (or CloudAMQP) | Fully managed | Self-hosted | Fully managed |
| Complexity | High | Medium | Low | Very low | Low |
| Best for | Event streaming, analytics, audit, replay | Task queues, RPC-style async | Simple async tasks, serverless triggers | Real-time ephemeral pub/sub, chat presence | Large-scale event fan-out on GCP |
3. Delivery Semantics Deep Dive
At-Most-Once
What it means: Message is sent once. If it’s lost, it’s never retried.
How to achieve: Send message, do not wait for ACK. Do not retry on failure.
When it’s acceptable:
- Metrics / telemetry (losing 0.01% of data points is fine)
- Log aggregation (occasional gap is tolerable)
- Real-time dashboards (stale is OK, missing one point is OK)
Redis Pub/Sub is at-most-once by design — if a subscriber is offline, it misses the message.
SDI example: Vol 2 Ch05 (Metrics Monitoring): metric data points are sent to Kafka with at-least-once delivery, but the ingestion edge tolerates occasionally dropping individual points, so that hop behaves as at-most-once without meaningful impact.
At-Least-Once
What it means: Message will be delivered one or more times. Duplicates are possible.
How to achieve: Producer retries until it receives ACK. Consumer must be idempotent.
Idempotency requirement: Consumer must handle duplicate messages safely — common approach is deduplication by message ID or idempotent operation design.
When to use:
- Most production systems (default choice)
- Notification delivery (duplicate notification is annoying but not catastrophic)
- Order processing (if consumer is idempotent: check if order already processed)
SDI examples:
- Vol 1 Ch10 (Notification System): Messages retried on delivery failure; push notification providers (APNs, FCM) have idempotency keys.
- Vol 2 Ch06 (Ad Click Aggregation): Click events delivered at-least-once; deduplication at aggregation layer.
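A minimal sketch of the idempotency requirement, assuming every message carries a unique ID. `IdempotentConsumer` is a hypothetical name, and the in-memory set stands in for whatever durable store (database table, Redis set) a real system would use.

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: an in-memory set. A real consumer needs a durable store,
        # otherwise the dedup history is lost on restart.
        self._seen_ids = set()

    def on_message(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen_ids:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed attempt
        # can still be retried by the broker.
        self._seen_ids.add(message_id)
        return True
```

If the handler and the ID write are not atomic, a crash between them still produces a duplicate, so the handler itself should ideally be an idempotent operation as well.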
Exactly-Once
What it means: Message is delivered and processed exactly one time. No loss, no duplicates.
How to achieve:
- Kafka transactions: Producer wraps writes in a transaction (begin, write to multiple topics, commit). Consumers configured with isolation.level=read_committed see only committed messages.
- Idempotent producer + transactional consumer: Kafka producer ID + sequence number prevent broker-side duplicates.
- Two-phase commit: Coordinator protocol across producer, broker, and consumer (expensive and complex).
Cost: Higher latency, lower throughput (~30–50% overhead for transactions).
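The producer ID + sequence number idea can be illustrated with a simplified model. This is a sketch of the concept only: `PartitionLog` is a hypothetical class, and it ignores the batching and other bookkeeping a real broker performs.

```python
class PartitionLog:
    """Toy partition that rejects retried writes using (producer_id, seq)."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of something already written: acknowledge, do not append.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier write is missing, so refuse this one.
            return "out_of_order"
        self.records.append(value)
        self._last_seq[producer_id] = seq
        return "appended"
```

A producer that times out waiting for an ACK can resend with the same sequence number and the log still holds the record once.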
When it’s required:
- Financial transactions (payment deducted exactly once)
- Billing systems (each ad click billed exactly once)
- Inventory deduction (item quantity decremented exactly once)
SDI examples:
- Vol 2 Ch06 (Ad Click Aggregation): Billing requires exactly-once; achieved via Kafka transactions + idempotent aggregation.
- Vol 2 Ch11 (Payment System): Exactly-once via idempotency key at application layer; message queue delivers at-least-once, but application deduplicates.
- Vol 2 Ch13 (Stock Exchange): Event sourcing with sequenced events — sequence numbers ensure exactly-once processing.
4. Kafka-Specific Concepts Quick Reference
| Concept | Definition | Why it matters |
|---|---|---|
| Topic | Named channel for a category of messages (e.g., user-clicks, order-placed) | Logical separation of event streams; consumers subscribe per topic |
| Partition | Ordered, immutable log that is a subdivision of a topic; unit of parallelism | More partitions = more consumer parallelism; messages in same partition are strictly ordered |
| Consumer Group | A set of consumers that collectively read all partitions of a topic | Enables load-balanced consumption; each partition assigned to exactly one consumer in a group |
| Offset | Monotonically increasing integer identifying a message’s position within a partition | Consumers track their read position; can seek backward for replay |
| ISR (In-Sync Replicas) | Set of replicas that are fully caught up with the partition leader | acks=all waits for every ISR replica to confirm; with min.insync.replicas set to 2 or more, an acknowledged write survives a leader failure |
| Acks | Producer acknowledgment setting: 0 (fire-and-forget), 1 (leader only), all (all ISR) | Controls durability vs. throughput trade-off; acks=all is safest, slowest |
| Retention | How long Kafka keeps messages (time-based or size-based) | Enables replay; set to 7 days typical, longer for audit |
| Compaction | Log compaction keeps only the latest value per key | Useful for event sourcing / changelog topics where only latest state matters |
Partition count decision: Rule of thumb — target ~10 MB/s throughput per partition. For 100 MB/s, use ~10 partitions. More partitions = more consumers = more parallelism, but more overhead.
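The rule of thumb is simple arithmetic, sketched below. The function name is hypothetical and the 10 MB/s default is just the figure from the rule above, not a measured value; real per-partition throughput depends on message size, hardware and replication settings.

```python
import math


def partitions_needed(target_mb_per_s, per_partition_mb_per_s=10.0):
    """Round up so that no partition has to exceed the per-partition budget."""
    if target_mb_per_s <= 0 or per_partition_mb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_mb_per_s / per_partition_mb_per_s)
```

For the 100 MB/s case in the text this yields 10 partitions; anything slightly above 100 MB/s rounds up to 11.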
5. When to Use What — Decision Guide
| Requirement | Recommended | Why |
|---|---|---|
| Simple async task queue (email, image resize) | AWS SQS or RabbitMQ | Simple, managed, no replay needed |
| Fan-out to multiple services | Google Pub/Sub or Kafka topics | Multiple consumer groups / subscriptions each get full copy |
| Strict ordering per entity | Kafka (partition by entity ID) | Same user’s events always go to same partition → ordered |
| Replay / audit / reprocessing | Kafka | Configurable retention; seek to any offset |
| Real-time ephemeral notifications | Redis Pub/Sub | Ultra-low latency; OK to lose if subscriber offline |
| High throughput (>100K msg/sec) | Kafka or Google Pub/Sub | Designed for horizontal scale at millions of msg/sec |
| Exactly-once billing / financial | Kafka transactions | Only system with native transactional exactly-once |
| Simple fire-and-forget telemetry | Redis Pub/Sub or at-most-once Kafka | Loss is acceptable; don’t over-engineer |
| Cross-region fan-out | Google Pub/Sub or Kafka MirrorMaker | Built-in multi-region replication |
| Serverless / pay-per-use | AWS SQS + Lambda | Scales to zero; no idle cost |
6. Which SDI/Vol2 Chapters Use Messaging and Why
| Chapter | Technology used | Why messaging is needed |
|---|---|---|
| Vol 1 Ch10 — Notification System | Message queue (RabbitMQ / SQS per channel) | Decouple trigger from delivery; different queues per channel (iOS, Android, email, SMS) so one slow channel doesn’t block others |
| Vol 1 Ch11 — News Feed | Kafka (fanout service) | Publishing a post fans out to potentially 500 followers’ caches asynchronously; message queue absorbs write burst |
| Vol 1 Ch12 — Chat System | Message queue / Kafka | Decouples message receipt from delivery; ensures messages survive server restart; supports offline delivery |
| Vol 1 Ch14 — YouTube | Kafka (upload events → transcoding workers) | Video uploaded → event triggers parallel transcoding into 5 resolutions; queue absorbs upload spikes |
| Vol 1 Ch15 — Google Drive | Message queue | File upload complete → triggers sync notifications to all user’s other devices; delta notification fan-out |
| Vol 2 Ch04 — Distributed Message Queue | Kafka (the chapter IS about message queues) | Core topic — covers message persistence, partition, consumer groups, delivery guarantees |
| Vol 2 Ch05 — Metrics Monitoring | Kafka | High-volume metric ingestion (10K writes/sec); acts as buffer between collectors and time-series DB; enables replay for backfill |
| Vol 2 Ch06 — Ad Click Aggregation | Kafka | Ingest 50K clicks/sec; stream processing (Flink) reads from Kafka partitions; batch layer also reads Kafka for reconciliation |
| Vol 2 Ch08 — Distributed Email | Message queue | Email delivery is async; queue per priority (transactional vs marketing); retry failed deliveries without blocking sender |
| Vol 2 Ch13 — Stock Exchange | Kafka (market data, trade events) | Broadcast trade executions to all market participants; pub/sub for market data feeds; event sourcing via Kafka topic as immutable log |
7. Anti-Patterns — When NOT to Use a Message Queue
- Synchronous request-response: If the caller must wait for the result (e.g., login, payment confirmation), a message queue adds latency and complexity. Use direct RPC or REST instead.
- Simple in-process async: If producer and consumer are in the same process, use a thread pool or async/await — not a distributed queue.
- When ordering is critical across partitions: Kafka guarantees order within a partition. If you must order across all events globally and can’t partition effectively, a queue won’t save you — you need a sequencer (like stock exchange’s sequence server).
- When you need strong transactional consistency with a database in one atomic step: Message queues break atomicity. Solution is the outbox pattern (write to DB + outbox table in one transaction; CDC or poller publishes to queue).
- Tiny traffic, no burst: For < 10 req/sec with predictable load, a message queue is over-engineering. A simple background job or cron suffices.
- When you need synchronous fan-out with confirmation: Pub/Sub is fire-and-forget. If you need to know all subscribers processed the event before proceeding, use distributed transactions or sagas — not a simple queue.
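The outbox pattern named in the list above can be sketched with SQLite from the standard library. Table names, the `order-placed` topic and the `publish` callback are all assumptions for illustration; a real deployment would use the service's own database and a CDC tool or poller as the relay.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload TEXT, "
        "published INTEGER DEFAULT 0)"
    )
    return conn


def place_order(conn, order_id, amount):
    # One local transaction covers both rows: either the order and its
    # outbox event are both stored, or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-placed", order_id),
        )


def relay_outbox(conn, publish):
    """Publish unpublished events in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        # A crash between publish and this update causes a re-publish later,
        # so the relay is at-least-once and consumers must be idempotent.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The broker is never touched inside the database transaction, which is what removes the dual-write problem.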
8. Kafka Architecture Quick Reference
Understanding Kafka’s internal layout is tested directly in Vol 2 Ch04 and referenced in Ch06, Ch13.
Producers                    Kafka Cluster                  Consumers
                   ┌──────────────────────────────┐
[Click events] ──→ │ Topic: "ad-clicks"           │ ──→ [Flink stream processor]
[Page views]   ──→ │ Partition 0: [0,1,2,3...]    │ ──→ [ClickHouse batch loader]
[User events]  ──→ │ Partition 1: [0,1,2,3...]    │
                   │ Partition 2: [0,1,2,3...]    │
                   │                              │
                   │ Replication: each partition  │
                   │ replicated to 3 brokers      │
                   └──────────────────────────────┘
How partitioning works:
- Default: Round-robin across partitions (balanced load, no ordering across partitions)
- With key: hash(key) % num_partitions → same key always goes to same partition
- Use case: Partition by user_id to get per-user ordering; partition by ad_id for ad click aggregation
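A small sketch of keyed partitioning. `zlib.crc32` is used here only because it is a stable hash available in the standard library; it is an assumption for illustration and not the hash function Kafka clients actually use. Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Because the result depends on `num_partitions`, adding partitions later can move an existing key to a different partition, which breaks per-key ordering across the change.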
How consumer groups work:
Consumer Group A (billing):
  Consumer 1 → reads Partition 0
  Consumer 2 → reads Partition 1
  Consumer 3 → reads Partition 2
Consumer Group B (analytics):
  Consumer 1 → reads all 3 partitions (slower, single consumer)
- Each partition is consumed by exactly one consumer within a group
- Multiple groups each get a full, independent copy of the stream
- Adding consumers to a group beyond partition count = idle consumers (no work to do)
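The three rules above follow from how partitions are handed out within a group. The sketch below uses plain round-robin assignment as a simplifying assumption; Kafka ships several assignment strategies and this hypothetical function is not one of them verbatim.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With three partitions and four consumers, the fourth consumer ends up with an empty list, which is the idle-consumer case.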
Producer acks and durability:
| acks setting | Durability | Throughput | Data loss risk |
|---|---|---|---|
| acks=0 | None (fire-and-forget) | Highest | High |
| acks=1 | Leader confirms | High | If leader fails before replication |
| acks=all (or -1) | All ISR replicas confirm | Lower | Minimal |
Rule: Use acks=all for billing/financial data (Vol 2 Ch06, Ch13). Use acks=1 for metrics/logs where occasional loss is acceptable (Vol 2 Ch05).
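As a sketch, the rule can be written as a small settings lookup. The keys `acks` and `enable.idempotence` are Kafka producer configuration names; the function and the `financial` / `telemetry` labels are hypothetical, and the exact value types a given client library expects may differ.

```python
def producer_settings(data_class: str) -> dict:
    """Return producer settings for a hypothetical data-class label."""
    if data_class == "financial":
        # Wait for every in-sync replica and let the broker drop retried
        # duplicates.
        return {"acks": "all", "enable.idempotence": True}
    if data_class == "telemetry":
        # Leader-only acknowledgment: faster, but a write can be lost if the
        # leader fails before followers copy it.
        return {"acks": "1", "enable.idempotence": False}
    raise ValueError(f"unknown data class: {data_class!r}")
```

Keeping the choice in one place makes it harder for a billing producer to ship with a telemetry-grade setting by accident.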
9. Messaging System Interview Decision Flowchart
START: Do you need async messaging?
│
├── No → Use synchronous REST/gRPC call
│
└── Yes
    │
    ▼
    Do you need multiple independent consumers (fan-out)?
    │
    ├── No (one consumer per message = work queue)
    │   │
    │   ▼
    │   Do you need replay / retention?
    │   ├── No → AWS SQS or RabbitMQ
    │   └── Yes → Kafka (configure retention)
    │
    └── Yes (pub/sub or streaming)
        │
        ▼
        Do you need replay / event sourcing?
        ├── No (ephemeral events) → Redis Pub/Sub or Google Pub/Sub
        └── Yes
            │
            ▼
            Do you need >100K messages/sec or partitioned ordering?
            ├── No → Google Pub/Sub (managed, simpler)
            └── Yes → Apache Kafka
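The flowchart translates directly into a function, which can help when rehearsing the branches. The function and parameter names are hypothetical; the returned strings are the leaf labels of the flowchart.

```python
def recommend(needs_async, needs_fanout=False, needs_replay=False,
              high_throughput_or_partitioned_ordering=False):
    """Walk the decision flowchart and return the recommended technology."""
    if not needs_async:
        return "Synchronous REST/gRPC call"
    if not needs_fanout:
        # Work queue branch: one consumer per message.
        if needs_replay:
            return "Kafka (configure retention)"
        return "AWS SQS or RabbitMQ"
    # Fan-out branch: pub/sub or streaming.
    if not needs_replay:
        return "Redis Pub/Sub or Google Pub/Sub"
    if high_throughput_or_partitioned_ordering:
        return "Apache Kafka"
    return "Google Pub/Sub (managed, simpler)"
```

For example, `recommend(True, needs_fanout=True, needs_replay=True)` lands on the managed Google Pub/Sub leaf unless the throughput or ordering flag is also set.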
Quick rule for SDI interviews:
- “Async task” → SQS / RabbitMQ
- “Fan-out to many services” → Kafka topics with multiple consumer groups
- “Event sourcing / audit log” → Kafka with long retention
- “Real-time presence / ephemeral” → Redis Pub/Sub
- “Billing / exactly-once” → Kafka with transactions + idempotent consumers
- Always mention: dead-letter queue (DLQ) for failed messages, backpressure handling, and monitoring queue depth
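A dead-letter queue is easy to describe with a short sketch: retry a bounded number of times, then park the message instead of blocking the queue. The function name, the `max_attempts` default and the in-memory list are assumptions; managed brokers typically provide this as configuration rather than application code.

```python
def process_with_dlq(messages, handler, max_attempts=3):
    """Run handler on each message; return messages that exhausted retries."""
    dead_letters = []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # success, move on to the next message
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the failure reason alongside the message so it can
                    # be inspected or replayed later.
                    dead_letters.append((message, repr(exc)))
    return dead_letters
```

A poison message therefore costs at most `max_attempts` handler calls, and the messages behind it still get processed.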
See also:
- key-patterns > 3. At-Most-Once vs At-Least-Once vs Exactly-Once
- distributed-system-components > 9. Message Queue — Role and when to use
- Pub/Sub — Fan-out pattern
- distributed-system-components > 11. Apache Kafka — Key concepts
- ch04-distributed-message-queue — Full chapter deep dive
- ch06-ad-click-aggregation — Kafka + Lambda architecture in action