Messaging Systems: Cross-Chapter Comparison Reference

comparisons kafka message-queue pubsub

Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch10, Ch11, Ch14 · Vol 2 Ch04, Ch05, Ch06, Ch08, Ch13
Last Updated: 2026-04-13

1. Messaging Patterns

Three fundamental patterns underlie all messaging systems.

Pattern	Flow	Consumers	Message fate after delivery	Typical technology	SDI use case
Message Queue (point-to-point)	Producer → Queue → Single Consumer	One consumer per message	Deleted after ACK	RabbitMQ, AWS SQS	Task offloading (email send, video encode)
Pub/Sub (fan-out)	Publisher → Topic → Multiple Subscribers	All subscribers receive every message	Varies by system: kept per subscription until ACK (Google Pub/Sub) or not stored at all (Redis Pub/Sub)	Google Pub/Sub, AWS SNS, Redis Pub/Sub	Notification fan-out, event broadcasting
Event Streaming (Kafka-style)	Producer → Partitioned Log → Consumer Groups	Multiple consumer groups, each independent	Retained for configurable period (days/weeks)	Apache Kafka, AWS Kinesis	Activity tracking, analytics, audit log, replay

Key distinction: A message queue is about work distribution (one consumer does the job). Pub/Sub is about event broadcasting (everyone hears it). Event streaming is about ordered, replayable, durable logs (you can re-read the past).
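
The three behaviors in the key distinction can be made concrete with a toy in-memory model. This is a minimal sketch, not how any real broker is built; the class names `WorkQueue`, `Topic`, and `Log` are hypothetical and exist only for illustration.

```python
from collections import deque


class WorkQueue:
    """Point-to-point: each message goes to one consumer and is then gone."""

    def __init__(self):
        self._pending = deque()

    def send(self, msg):
        self._pending.append(msg)

    def receive(self):
        # Popping stands in for "deliver, then delete after ACK".
        return self._pending.popleft() if self._pending else None


class Topic:
    """Pub/Sub: every current subscriber gets its own copy of each message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, msg):
        for inbox in self._inboxes.values():
            inbox.append(msg)


class Log:
    """Event streaming: an append-only log; readers choose where to start."""

    def __init__(self):
        self._records = []

    def append(self, msg):
        self._records.append(msg)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading never deletes, so any reader can re-read the past.
        return self._records[offset:]
```

Calling `receive()` twice on the queue yields two different messages, publishing to the topic fills every subscriber's inbox, and `read_from(0)` on the log returns the full history no matter how often it is called.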

2. System Comparison Table

Feature	Apache Kafka	RabbitMQ	AWS SQS	Redis Pub/Sub	Google Pub/Sub
Ordering	Per-partition (strict)	Per-queue (strict)	Best-effort (FIFO queues available)	No guarantee	Per-message ordering key
Retention	Configurable (days to forever)	Until ACK (or TTL)	14 days max	No persistence	7 days default
Throughput	Millions msg/sec	~50K msg/sec	Nearly unlimited (standard); ~3K msg/sec with batching (FIFO)	~1M msg/sec	~1M msg/sec
Delivery semantics	At-least-once (exactly-once with transactions)	At-least-once	At-least-once (FIFO = exactly-once)	At-most-once (fire and forget)	At-least-once
Replay	Yes (seek to offset)	No	No	No	Yes (snapshot + replay)
Push or Pull	Pull (consumers poll)	Push and Pull	Pull	Push	Push and Pull
Consumer isolation	Consumer groups (each group independent)	Competing consumers per queue	Competing consumers	All subscribers get all messages	Subscriptions per topic
Protocol	Custom binary (TCP)	AMQP, MQTT, STOMP	HTTP/HTTPS (AWS SDK)	RESP (Redis protocol)	gRPC / HTTP
Managed	Self-hosted (or Confluent Cloud)	Self-hosted (or CloudAMQP)	Fully managed	Self-hosted	Fully managed
Complexity	High	Medium	Low	Very low	Low
Best for	Event streaming, analytics, audit, replay	Task queues, RPC-style async	Simple async tasks, serverless triggers	Real-time ephemeral pub/sub, chat presence	Large-scale event fan-out on GCP

3. Delivery Semantics Deep Dive

At-Most-Once

What it means: Message is sent once. If it’s lost, it’s never retried.

How to achieve: Send message, do not wait for ACK. Do not retry on failure.

When it’s acceptable:

Metrics / telemetry (losing 0.01% of data points is fine)
Log aggregation (occasional gap is tolerable)
Real-time dashboards (stale is OK, missing one point is OK)

Redis Pub/Sub is at-most-once by design — if a subscriber is offline, it misses the message.

SDI example: Vol 2 Ch05 (Metrics Monitoring): metric data points flow through Kafka at-least-once, but the ingestion edge can run at-most-once, because dropping an occasional point there has no meaningful impact.

At-Least-Once

What it means: Message will be delivered one or more times. Duplicates are possible.

How to achieve: Producer retries until it receives ACK. Consumer must be idempotent.

Idempotency requirement: Consumer must handle duplicate messages safely — common approach is deduplication by message ID or idempotent operation design.
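
Deduplication by message ID can be sketched as follows. This is a minimal illustration under stated assumptions: `IdempotentConsumer`, `handle`, and the in-memory `seen_ids` set are hypothetical names, and a real system would keep the seen-ID record in durable storage, ideally updated in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()  # assumption: in production this is a durable store
        self.balance = 0

    def handle(self, message_id, amount):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: skip the side effect
        self.balance += amount
        self.seen_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", 10)
consumer.handle("msg-1", 10)  # redelivery after a lost ACK
```

After the redelivery the balance is still 10, which is what makes at-least-once delivery safe for this consumer.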

When to use:

Most production systems (default choice)
Notification delivery (duplicate notification is annoying but not catastrophic)
Order processing (if consumer is idempotent: check if order already processed)

SDI examples:

Vol 1 Ch10 (Notification System): Messages retried on delivery failure; push notification providers (APNs, FCM) have idempotency keys.
Vol 2 Ch06 (Ad Click Aggregation): Click events delivered at-least-once; deduplication at aggregation layer.

Exactly-Once

What it means: Message is delivered and processed exactly one time. No loss, no duplicates.

How to achieve:

Kafka transactions: Producer wraps writes in a transaction (begin, write to multiple topics, commit). Consumers configured with `isolation.level=read_committed` see only messages from committed transactions.
Idempotent producer: The broker uses the Kafka producer ID plus a per-partition sequence number to discard duplicate writes caused by producer retries.
Two-phase commit: Coordinator protocol across producer, broker, and consumer (expensive and complex).
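
The producer ID plus sequence number idea can be illustrated with a toy broker-side check. This is a simplified sketch, not Kafka's actual implementation: `PartitionLog` and its fields are hypothetical, and real brokers also enforce in-order sequence numbers and handle producer epochs.

```python
class PartitionLog:
    """Toy partition that drops writes whose sequence number was already accepted."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, value):
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # a retry of something already written; ignore it
        self.last_seq[producer_id] = seq
        self.records.append(value)
        return True


log = PartitionLog()
log.append("producer-A", 0, "click-1")
log.append("producer-A", 0, "click-1")  # retry after a timed-out ACK
log.append("producer-A", 1, "click-2")
```

The retried write is rejected, so the log holds `click-1` and `click-2` once each even though the producer sent three requests.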

Cost: Higher latency and lower throughput. The ~30–50% overhead for transactions is a rough estimate; the actual cost depends on batch size and how often transactions commit.

When it’s required:

Financial transactions (payment deducted exactly once)
Billing systems (each ad click billed exactly once)
Inventory deduction (item quantity decremented exactly once)

SDI examples:

Vol 2 Ch06 (Ad Click Aggregation): Billing requires exactly-once; achieved via Kafka transactions + idempotent aggregation.
Vol 2 Ch11 (Payment System): Exactly-once via idempotency key at application layer; message queue delivers at-least-once, but application deduplicates.
Vol 2 Ch13 (Stock Exchange): Event sourcing with sequenced events — sequence numbers ensure exactly-once processing.

4. Kafka-Specific Concepts Quick Reference

Concept	Definition	Why it matters
Topic	Named channel for a category of messages (e.g., `user-clicks`, `order-placed`)	Logical separation of event streams; consumers subscribe per topic
Partition	Ordered, immutable log that is a subdivision of a topic; unit of parallelism	More partitions = more consumer parallelism; messages in same partition are strictly ordered
Consumer Group	A set of consumers that collectively read all partitions of a topic	Enables load-balanced consumption; each partition assigned to exactly one consumer in a group
Offset	Monotonically increasing integer identifying a message’s position within a partition	Consumers track their read position; can seek backward for replay
ISR (In-Sync Replicas)	Set of replicas that are fully caught up with the partition leader	`acks=all` waits for all ISR replicas to confirm, so an acknowledged write survives a leader failure as long as another in-sync replica remains (see `min.insync.replicas`)
Acks	Producer acknowledgment setting: `0` (fire-and-forget), `1` (leader only), `all` (all ISR)	Controls durability vs. throughput trade-off; `acks=all` is safest, slowest
Retention	How long Kafka keeps messages (time-based or size-based)	Enables replay; set to 7 days typical, longer for audit
Compaction	Log compaction keeps only the latest value per key	Useful for event sourcing / changelog topics where only latest state matters
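
As an illustration of the compaction row above, the sketch below keeps only the latest value per key while preserving the relative order of the surviving records. The function name `compact` and the list-of-tuples log are assumptions made for this example; Kafka compacts segment files in the background rather than whole logs at once.

```python
def compact(log):
    """Keep only the most recent record for each key, preserving log order."""
    latest_offset = {key: offset for offset, (key, _) in enumerate(log)}
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]


changelog = [("user-1", "bronze"), ("user-2", "silver"), ("user-1", "gold")]
compacted = compact(changelog)
```

Here the older `user-1` record is dropped and the two remaining records stay in their original order.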

Partition count decision: Rule of thumb — target ~10 MB/s throughput per partition. For 100 MB/s, use ~10 partitions. More partitions = more consumers = more parallelism, but more overhead.
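
The rule of thumb is simple arithmetic, sketched here with a hypothetical helper `partitions_needed`. The 10 MB/s default is the figure from these notes, not a measured limit, so treat the result as a starting point.

```python
import math


def partitions_needed(target_mb_per_s, per_partition_mb_per_s=10.0):
    """Round up so that total capacity covers the target throughput."""
    return max(1, math.ceil(target_mb_per_s / per_partition_mb_per_s))
```

For example, `partitions_needed(100)` gives 10, and `partitions_needed(105)` rounds up to 11.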

5. When to Use What — Decision Guide

Requirement	Recommended	Why
Simple async task queue (email, image resize)	AWS SQS or RabbitMQ	Simple, managed, no replay needed
Fan-out to multiple services	Google Pub/Sub or Kafka topics	Multiple consumer groups / subscriptions each get full copy
Strict ordering per entity	Kafka (partition by entity ID)	Same user’s events always go to same partition → ordered
Replay / audit / reprocessing	Kafka	Configurable retention; seek to any offset
Real-time ephemeral notifications	Redis Pub/Sub	Ultra-low latency; OK to lose if subscriber offline
High throughput (>100K msg/sec)	Kafka or Google Pub/Sub	Designed for horizontal scale at millions of msg/sec
Exactly-once billing / financial	Kafka transactions	Only system in this comparison with native transactional exactly-once
Simple fire-and-forget telemetry	Redis Pub/Sub or at-most-once Kafka	Loss is acceptable; don’t over-engineer
Cross-region fan-out	Google Pub/Sub or Kafka MirrorMaker	Built-in multi-region replication
Serverless / pay-per-use	AWS SQS + Lambda	Scales to zero; no idle cost

6. Which SDI/Vol2 Chapters Use Messaging and Why

Chapter	Technology used	Why messaging is needed
Vol 1 Ch10 — Notification System	Message queue (RabbitMQ / SQS per channel)	Decouple trigger from delivery; different queues per channel (iOS, Android, email, SMS) so one slow channel doesn’t block others
Vol 1 Ch11 — News Feed	Kafka (fanout service)	Publishing a post fans out to potentially 500 followers’ caches asynchronously; message queue absorbs write burst
Vol 1 Ch12 — Chat System	Message queue / Kafka	Decouples message receipt from delivery; ensures messages survive server restart; supports offline delivery
Vol 1 Ch14 — YouTube	Kafka (upload events → transcoding workers)	Video uploaded → event triggers parallel transcoding into 5 resolutions; queue absorbs upload spikes
Vol 1 Ch15 — Google Drive	Message queue	File upload complete → triggers sync notifications to all user’s other devices; delta notification fan-out
Vol 2 Ch04 — Distributed Message Queue	Kafka (the chapter IS about message queues)	Core topic — covers message persistence, partition, consumer groups, delivery guarantees
Vol 2 Ch05 — Metrics Monitoring	Kafka	High-volume metric ingestion (10K writes/sec); acts as buffer between collectors and time-series DB; enables replay for backfill
Vol 2 Ch06 — Ad Click Aggregation	Kafka	Ingest 50K clicks/sec; stream processing (Flink) reads from Kafka partitions; batch layer also reads Kafka for reconciliation
Vol 2 Ch08 — Distributed Email	Message queue	Email delivery is async; queue per priority (transactional vs marketing); retry failed deliveries without blocking sender
Vol 2 Ch13 — Stock Exchange	Kafka (market data, trade events)	Broadcast trade executions to all market participants; pub/sub for market data feeds; event sourcing via Kafka topic as immutable log

7. Anti-Patterns — When NOT to Use a Message Queue

Synchronous request-response: If the caller must wait for the result (e.g., login, payment confirmation), a message queue adds latency and complexity. Use direct RPC or REST instead.
Simple in-process async: If producer and consumer are in the same process, use a thread pool or async/await — not a distributed queue.
When ordering is critical across partitions: Kafka guarantees order within a partition. If you must order across all events globally and can’t partition effectively, a queue won’t save you — you need a sequencer (like stock exchange’s sequence server).
When you need strong transactional consistency with a database in one atomic step: Message queues break atomicity. Solution is the outbox pattern (write to DB + outbox table in one transaction; CDC or poller publishes to queue).
Tiny traffic, no burst: For < 10 req/sec with predictable load, a message queue is over-engineering. A simple background job or cron suffices.
When you need synchronous fan-out with confirmation: Pub/Sub is fire-and-forget. If you need to know all subscribers processed the event before proceeding, use distributed transactions or sagas — not a simple queue.
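
The outbox pattern mentioned in the list above can be sketched with SQLite from the standard library. The table names, `place_order`, and `publish_pending` are hypothetical, and the `publish` callback stands in for whatever broker client is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id, item):
    # One transaction: the order row and its outbox event commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (f"order-placed:{order_id}",)
        )


def publish_pending(publish):
    # Poller: hand each unpublished event to the broker client, then mark it sent.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the event is published before it is marked as sent, a crash between the two steps causes a republish on the next poll. The pattern therefore gives at-least-once delivery, and consumers still need to be idempotent.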

8. Kafka Architecture Quick Reference

Kafka’s internal layout is covered directly in Vol 2 Ch04 and referenced in Ch06, Ch13.

Producers                     Kafka Cluster                      Consumers
                     ┌──────────────────────────────┐
[Click events]  ──→  │  Topic: "ad-clicks"           │  ──→  [Flink stream processor]
[Page views]    ──→  │    Partition 0: [0,1,2,3...]  │  ──→  [ClickHouse batch loader]
[User events]   ──→  │    Partition 1: [0,1,2,3...]  │
                     │    Partition 2: [0,1,2,3...]  │
                     │                               │
                     │  Replication: each partition  │
                     │  replicated to 3 brokers      │
                     └──────────────────────────────┘

How partitioning works:

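No key (default): Records are spread across partitions for balanced load, with no ordering across partitions (round-robin in older clients; since Kafka 2.4 the Java client fills a batch for one partition at a time before moving on)
With key: hash(key) % num_partitions → same key always goes to same partition, as long as the partition count stays the same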
Use case: Partition by user_id to get per-user ordering; partition by ad_id for ad click aggregation
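
Key-based partitioning can be sketched in a few lines. The helper `partition_for` is hypothetical, and `zlib.crc32` is used only as a convenient deterministic hash; Kafka's Java client uses murmur2, so the partition numbers here will not match a real cluster.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition; the same key always maps to the same partition."""
    # Assumption: crc32 is a stand-in hash for illustration, not Kafka's murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


first = partition_for("user-42", 3)
second = partition_for("user-42", 3)
```

Both calls return the same partition, which is what gives per-user ordering. Changing `num_partitions` can move a key to a different partition, so ordering guarantees hold only while the partition count is fixed.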

How consumer groups work:

Consumer Group A (billing):
  Consumer 1 → reads Partition 0
  Consumer 2 → reads Partition 1
  Consumer 3 → reads Partition 2

Consumer Group B (analytics):
  Consumer 1 → reads all 3 partitions (slower, single consumer)

Each partition is consumed by exactly one consumer within a group
Multiple groups each get a full, independent copy of the stream
Adding consumers to a group beyond partition count = idle consumers (no work to do)
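
The three rules above fall out of a simple assignment loop. This is a simplified round-robin sketch with a hypothetical function `assign_partitions`; Kafka ships several assignment strategies (range, round-robin, sticky) that differ in detail but respect the same one-owner-per-partition rule.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


billing = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
analytics = assign_partitions([0, 1, 2], ["c1"])
oversized = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

The billing group gets one partition per consumer, the single analytics consumer reads all three, and in the oversized group `c4` ends up with an empty list, which is the idle-consumer case.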

Producer acks and durability:

`acks` setting	Durability	Throughput	Data loss risk
`acks=0`	None (fire-and-forget)	Highest	High
`acks=1`	Leader confirms	High	If leader fails before replication
`acks=all` (or `-1`)	All ISR replicas confirm	Lower	Minimal

Rule: Use acks=all for billing/financial data (Vol 2 Ch06, Ch13). Use acks=1 for metrics/logs where occasional loss is acceptable (Vol 2 Ch05).
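
One way to encode this rule is a small lookup of producer settings per data class. The profile names and the `producer_settings` helper are hypothetical; `acks` and `enable.idempotence` are standard Kafka producer configuration keys, shown here as a plain dictionary rather than tied to a specific client library.

```python
# Hypothetical profiles; the keys are standard Kafka producer configuration names.
PRODUCER_PROFILES = {
    "billing": {"acks": "all", "enable.idempotence": True},
    "metrics": {"acks": "1", "enable.idempotence": False},
    "best_effort": {"acks": "0", "enable.idempotence": False},
}


def producer_settings(data_class):
    """Return producer settings for a data class, defaulting to the safest profile."""
    return dict(PRODUCER_PROFILES.get(data_class, PRODUCER_PROFILES["billing"]))
```

Defaulting unknown data classes to the `billing` profile is a deliberate choice in this sketch: an unclassified stream pays some throughput rather than risking silent loss.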

9. Messaging System Interview Decision Flowchart

START: Do you need async messaging?
  │
  ├── No → Use synchronous REST/gRPC call
  │
  └── Yes
       │
       ▼
    Do you need multiple independent consumers (fan-out)?
       │
       ├── No (one consumer per message = work queue)
       │     │
       │     ▼
       │   Do you need replay / retention?
       │     ├── No → AWS SQS or RabbitMQ
       │     └── Yes → Kafka (configure retention)
       │
       └── Yes (pub/sub or streaming)
             │
             ▼
           Do you need replay / event sourcing?
             ├── No (ephemeral events) → Redis Pub/Sub or Google Pub/Sub
             └── Yes
                   │
                   ▼
                 Do you need >100K messages/sec or partitioned ordering?
                   ├── No → Google Pub/Sub (managed, simpler)
                   └── Yes → Apache Kafka
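
The flowchart translates directly into a function, which can help when rehearsing the decision path. The function and parameter names are hypothetical; the returned strings are the flowchart's own leaves.

```python
def recommend(needs_async, fan_out=False, replay=False, high_scale_or_ordering=False):
    """Walk the flowchart top to bottom and return the leaf it lands on."""
    if not needs_async:
        return "Synchronous REST/gRPC call"
    if not fan_out:
        # Work queue branch: one consumer per message.
        return "Kafka (configure retention)" if replay else "AWS SQS or RabbitMQ"
    if not replay:
        # Fan-out of ephemeral events.
        return "Redis Pub/Sub or Google Pub/Sub"
    if high_scale_or_ordering:
        # More than 100K messages/sec, or partitioned ordering is required.
        return "Apache Kafka"
    return "Google Pub/Sub (managed, simpler)"
```

For instance, `recommend(True, fan_out=True, replay=True, high_scale_or_ordering=True)` lands on Apache Kafka, while `recommend(True)` lands on AWS SQS or RabbitMQ.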

Quick rule for SDI interviews:

“Async task” → SQS / RabbitMQ
“Fan-out to many services” → Kafka topics with multiple consumer groups
“Event sourcing / audit log” → Kafka with long retention
“Real-time presence / ephemeral” → Redis Pub/Sub
“Billing / exactly-once” → Kafka with transactions + idempotent consumers
Always mention: dead-letter queue (DLQ) for failed messages, backpressure handling, and monitoring queue depth
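
The dead-letter queue point can be illustrated with a bounded retry loop. This is a minimal sketch: `consume_with_dlq` and `max_attempts` are hypothetical names, the DLQ is just a list, and a real consumer would also add backoff between attempts.

```python
def consume_with_dlq(messages, handler, max_attempts=3):
    """Try each message up to max_attempts times; park persistent failures in a DLQ."""
    dead_letters = []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                break  # processed successfully, move to the next message
            except Exception:
                if attempt == max_attempts:
                    dead_letters.append(msg)  # give up and keep it for inspection
    return dead_letters
```

A message that keeps failing is set aside after the final attempt instead of blocking the messages behind it, and the returned list is what an operator would inspect or replay later.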

See also:

key-patterns > 3. At-Most-Once vs At-Least-Once vs Exactly-Once
distributed-system-components > 9. Message Queue — Role and when to use
Pub/Sub — Fan-out pattern
distributed-system-components > 11. Apache Kafka — Key concepts
ch04-distributed-message-queue — Full chapter deep dive
ch06-ad-click-aggregation — Kafka + Lambda architecture in action

Study Notes by Niladri & AI
