Messaging Systems: Cross-Chapter Comparison Reference

Tags: comparisons, kafka, message-queue, pubsub

Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch10, Ch11, Ch12, Ch14, Ch15 · Vol 2 Ch04, Ch05, Ch06, Ch08, Ch11, Ch13
Last Updated: 2026-04-13


1. Messaging Patterns

Three fundamental patterns underlie all messaging systems.

| Pattern | Flow | Consumers | Message fate after delivery | Typical technology | SDI use case |
|---|---|---|---|---|---|
| Message Queue (point-to-point) | Producer → Queue → Single Consumer | One consumer per message | Deleted after ACK | RabbitMQ, AWS SQS | Task offloading (email send, video encode) |
| Pub/Sub (fan-out) | Publisher → Topic → Multiple Subscribers | All subscribers receive every message | Retained until all subscribers ACK | Google Pub/Sub, AWS SNS, Redis Pub/Sub | Notification fan-out, event broadcasting |
| Event Streaming (Kafka-style) | Producer → Partitioned Log → Consumer Groups | Multiple consumer groups, each independent | Retained for configurable period (days/weeks) | Apache Kafka, AWS Kinesis | Activity tracking, analytics, audit log, replay |

Key distinction: A message queue is about work distribution (one consumer does the job). Pub/Sub is about event broadcasting (everyone hears it). Event streaming is about ordered, replayable, durable logs (you can re-read the past).
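
A minimal in-memory sketch of the three patterns, assuming nothing about any real broker. The class names `WorkQueue`, `PubSubTopic`, and `EventLog` are hypothetical and exist only to make the delivery differences concrete.

```python
from collections import deque


class WorkQueue:
    """Point-to-point: each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, msg):
        self._messages.append(msg)

    def receive(self):
        # Removing the message stands in for "deleted after ACK".
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Fan-out: every current subscriber gets its own copy."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, msg):
        for inbox in self._inboxes.values():
            inbox.append(msg)


class EventLog:
    """Streaming: append-only log; each reader tracks its own offset."""

    def __init__(self):
        self._records = []

    def append(self, msg):
        self._records.append(msg)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading never deletes, so any reader can replay from an old offset.
        return self._records[offset:]


# Work queue: the second worker finds nothing left to do.
queue = WorkQueue()
queue.send("encode-video-1")
first_worker = queue.receive()
second_worker = queue.receive()

# Pub/sub: both subscribers receive the same event.
topic = PubSubTopic()
email = topic.subscribe("email")
sms = topic.subscribe("sms")
topic.publish("order-placed")

# Event log: the full history is still readable after it was written.
log = EventLog()
log.append("click-1")
log.append("click-2")
replay = log.read_from(0)
```

The only structural difference between the three is what happens to a message after it is read: removed, copied per subscriber, or left in place.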


2. System Comparison Table

| Feature | Apache Kafka | RabbitMQ | AWS SQS | Redis Pub/Sub | Google Pub/Sub |
|---|---|---|---|---|---|
| Ordering | Per-partition (strict) | Per-queue (strict) | Best-effort (FIFO queues available) | No guarantee | Per ordering key |
| Retention | Configurable (days to forever) | Until ACK (or TTL) | 14 days max | No persistence | 7 days default |
| Throughput | Millions msg/sec | ~50K msg/sec | Nearly unlimited (standard queues); ~3K msg/sec with batching (FIFO queues) | ~1M msg/sec | ~1M msg/sec |
| Delivery semantics | At-least-once (exactly-once with transactions) | At-least-once | At-least-once (FIFO = exactly-once) | At-most-once (fire and forget) | At-least-once |
| Replay | Yes (seek to offset) | No | No | No | Yes (seek to snapshot or timestamp) |
| Push or Pull | Pull (consumers poll) | Push and Pull | Pull | Push | Push and Pull |
| Consumer isolation | Consumer groups (each group independent) | Competing consumers per queue | Competing consumers | All subscribers get all messages | Subscriptions per topic |
| Protocol | Custom binary (TCP) | AMQP, MQTT, STOMP | HTTP/HTTPS (AWS SDK) | RESP (Redis protocol) | gRPC / HTTP |
| Managed | Self-hosted (or Confluent Cloud) | Self-hosted (or CloudAMQP) | Fully managed | Self-hosted | Fully managed |
| Complexity | High | Medium | Low | Very low | Low |
| Best for | Event streaming, analytics, audit, replay | Task queues, RPC-style async | Simple async tasks, serverless triggers | Real-time ephemeral pub/sub, chat presence | Large-scale event fan-out on GCP |

3. Delivery Semantics Deep Dive

At-Most-Once

What it means: Message is sent once. If it’s lost, it’s never retried.

How to achieve: Send message, do not wait for ACK. Do not retry on failure.

When it’s acceptable:

  • Metrics / telemetry (losing 0.01% of data points is fine)
  • Log aggregation (occasional gap is tolerable)
  • Real-time dashboards (stale is OK, missing one point is OK)

Redis Pub/Sub is at-most-once by design — if a subscriber is offline, it misses the message.

SDI example: Vol 2 Ch05 (Metrics Monitoring). Metric data points travel through Kafka at-least-once, but the ingestion edge in front of Kafka behaves at-most-once: an individual point can occasionally be dropped there without meaningful impact.


At-Least-Once

What it means: Message will be delivered one or more times. Duplicates are possible.

How to achieve: Producer retries until it receives ACK. Consumer must be idempotent.

Idempotency requirement: Consumer must handle duplicate messages safely — common approach is deduplication by message ID or idempotent operation design.
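
A minimal sketch of retry-until-ACK paired with deduplication by message ID. `FlakyChannel`, `IdempotentConsumer`, and `send_at_least_once` are hypothetical names; the channel deliberately loses the first ACK to force a redelivery.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, even if it is redelivered."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def handle(self, msg_id, payload):
        if msg_id in self.seen_ids:
            return  # duplicate delivery: safely ignored
        self.seen_ids.add(msg_id)
        self.processed.append(payload)


class FlakyChannel:
    """Always delivers the message but loses the first ACK on the way back."""

    def __init__(self, consumer, acks_to_drop=1):
        self.consumer = consumer
        self.acks_to_drop = acks_to_drop

    def send(self, msg_id, payload):
        self.consumer.handle(msg_id, payload)
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            return False  # the producer never sees this ACK
        return True


def send_at_least_once(channel, msg_id, payload, max_attempts=5):
    """Retry until an ACK arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(msg_id, payload):
            return attempt
    raise RuntimeError(f"no ACK for {msg_id} after {max_attempts} attempts")


consumer = IdempotentConsumer()
channel = FlakyChannel(consumer)
attempts = send_at_least_once(channel, "order-42", "charge $10")
# The message was delivered twice, but the charge was applied once.
```

In a real system the set of seen IDs would live in durable storage and be updated in the same transaction as the side effect; the in-memory set here is only for illustration.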

When to use:

  • Most production systems (default choice)
  • Notification delivery (duplicate notification is annoying but not catastrophic)
  • Order processing (if consumer is idempotent: check if order already processed)

SDI examples:

  • Vol 1 Ch10 (Notification System): Messages retried on delivery failure; push notification providers (APNs, FCM) have idempotency keys.
  • Vol 2 Ch06 (Ad Click Aggregation): Click events delivered at-least-once; deduplication at aggregation layer.

Exactly-Once

What it means: Message is delivered and processed exactly one time. No loss, no duplicates.

How to achieve:

  • Kafka transactions: Producer wraps writes in a transaction (begin, write to multiple topics, commit). Consumers configured with isolation.level=read_committed see only messages from committed transactions.
  • Idempotent producer + transactional consumer: Kafka producer ID + sequence number prevent broker-side duplicates.
  • Two-phase commit: Coordinator protocol across producer, broker, and consumer (expensive and complex).
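
The producer ID + sequence number idea can be sketched as follows. This is a simplified model, not the broker's actual bookkeeping: `PartitionLog` and its return strings are hypothetical, and details such as producer epochs are left out.

```python
class PartitionLog:
    """Toy partition that rejects retried writes by (producer_id, sequence)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"      # a retry of something already written
        if seq != last + 1:
            return "out_of_order"   # a gap means an earlier write is missing
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


partition = PartitionLog()
first = partition.append("producer-1", 0, "a")
retry = partition.append("producer-1", 0, "a")   # same sequence number resent
second = partition.append("producer-1", 1, "b")
gap = partition.append("producer-1", 3, "d")     # sequence 2 never arrived
```

Because the retry carries the same sequence number, the log ends up with one copy of each record even though the producer sent one of them twice.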

Cost: Higher latency and lower throughput. The overhead depends heavily on transaction size and commit interval; these notes assume roughly 30–50% for transactions.

When it’s required:

  • Financial transactions (payment deducted exactly once)
  • Billing systems (each ad click billed exactly once)
  • Inventory deduction (item quantity decremented exactly once)

SDI examples:

  • Vol 2 Ch06 (Ad Click Aggregation): Billing requires exactly-once; achieved via Kafka transactions + idempotent aggregation.
  • Vol 2 Ch11 (Payment System): Exactly-once via idempotency key at application layer; message queue delivers at-least-once, but application deduplicates.
  • Vol 2 Ch13 (Stock Exchange): Event sourcing with sequenced events — sequence numbers ensure exactly-once processing.

4. Kafka-Specific Concepts Quick Reference

| Concept | Definition | Why it matters |
|---|---|---|
| Topic | Named channel for a category of messages (e.g., user-clicks, order-placed) | Logical separation of event streams; consumers subscribe per topic |
| Partition | Ordered, immutable log that is a subdivision of a topic; unit of parallelism | More partitions = more consumer parallelism; messages in the same partition are strictly ordered |
| Consumer Group | A set of consumers that collectively read all partitions of a topic | Enables load-balanced consumption; each partition is assigned to exactly one consumer in a group |
| Offset | Monotonically increasing integer identifying a message’s position within a partition | Consumers track their read position; can seek backward for replay |
| ISR (In-Sync Replicas) | Set of replicas that are fully caught up with the partition leader | acks=all waits for every in-sync replica to confirm, so an acknowledged write survives a leader failure as long as another in-sync replica remains (min.insync.replicas sets the floor) |
| Acks | Producer acknowledgment setting: 0 (fire-and-forget), 1 (leader only), all (all ISR) | Controls durability vs. throughput trade-off; acks=all is safest, slowest |
| Retention | How long Kafka keeps messages (time-based or size-based) | Enables replay; 7 days is typical (and Kafka’s default), longer for audit |
| Compaction | Log compaction keeps only the latest value per key | Useful for event sourcing / changelog topics where only latest state matters |

Partition count decision: Rule of thumb — target ~10 MB/s throughput per partition. For 100 MB/s, use ~10 partitions. More partitions = more consumers = more parallelism, but more overhead.
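
The rule of thumb above is simple division rounded up. A small sketch, assuming the ~10 MB/s per-partition figure from these notes as the default; the function name `partitions_needed` is hypothetical.

```python
import math


def partitions_needed(target_mb_per_s, per_partition_mb_per_s=10.0):
    """Round up so the last partially filled partition is still counted."""
    if target_mb_per_s <= 0 or per_partition_mb_per_s <= 0:
        raise ValueError("throughput values must be positive")
    return max(1, math.ceil(target_mb_per_s / per_partition_mb_per_s))


for target in (3, 100, 105):
    print(f"{target} MB/s -> {partitions_needed(target)} partitions")
```

In practice the result is a lower bound: the partition count also caps how many consumers in one group can work in parallel, so it is often rounded up further to match the planned consumer count.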


5. When to Use What — Decision Guide

| Requirement | Recommended | Why |
|---|---|---|
| Simple async task queue (email, image resize) | AWS SQS or RabbitMQ | Simple, managed, no replay needed |
| Fan-out to multiple services | Google Pub/Sub or Kafka topics | Multiple consumer groups / subscriptions each get full copy |
| Strict ordering per entity | Kafka (partition by entity ID) | Same user’s events always go to same partition → ordered |
| Replay / audit / reprocessing | Kafka | Configurable retention; seek to any offset |
| Real-time ephemeral notifications | Redis Pub/Sub | Ultra-low latency; OK to lose if subscriber offline |
| High throughput (>100K msg/sec) | Kafka or Google Pub/Sub | Designed for horizontal scale at millions of msg/sec |
| Exactly-once billing / financial | Kafka transactions | Only system in this comparison with native transactional exactly-once |
| Simple fire-and-forget telemetry | Redis Pub/Sub or at-most-once Kafka | Loss is acceptable; don’t over-engineer |
| Cross-region fan-out | Google Pub/Sub or Kafka MirrorMaker | Pub/Sub is a global service; MirrorMaker replicates topics between Kafka clusters |
| Serverless / pay-per-use | AWS SQS + Lambda | Scales to zero; no idle cost |

6. Which SDI/Vol2 Chapters Use Messaging and Why

| Chapter | Technology used | Why messaging is needed |
|---|---|---|
| Vol 1 Ch10: Notification System | Message queue (RabbitMQ / SQS per channel) | Decouple trigger from delivery; different queues per channel (iOS, Android, email, SMS) so one slow channel doesn’t block others |
| Vol 1 Ch11: News Feed | Kafka (fanout service) | Publishing a post fans out to potentially 500 followers’ caches asynchronously; message queue absorbs write burst |
| Vol 1 Ch12: Chat System | Message queue / Kafka | Decouples message receipt from delivery; ensures messages survive server restart; supports offline delivery |
| Vol 1 Ch14: YouTube | Kafka (upload events → transcoding workers) | Video uploaded → event triggers parallel transcoding into 5 resolutions; queue absorbs upload spikes |
| Vol 1 Ch15: Google Drive | Message queue | File upload complete → triggers sync notifications to all user’s other devices; delta notification fan-out |
| Vol 2 Ch04: Distributed Message Queue | Kafka (the chapter IS about message queues) | Core topic: covers message persistence, partition, consumer groups, delivery guarantees |
| Vol 2 Ch05: Metrics Monitoring | Kafka | High-volume metric ingestion (10K writes/sec); acts as buffer between collectors and time-series DB; enables replay for backfill |
| Vol 2 Ch06: Ad Click Aggregation | Kafka | Ingest 50K clicks/sec; stream processing (Flink) reads from Kafka partitions; batch layer also reads Kafka for reconciliation |
| Vol 2 Ch08: Distributed Email | Message queue | Email delivery is async; queue per priority (transactional vs marketing); retry failed deliveries without blocking sender |
| Vol 2 Ch13: Stock Exchange | Kafka (market data, trade events) | Broadcast trade executions to all market participants; pub/sub for market data feeds; event sourcing via Kafka topic as immutable log |

7. Anti-Patterns — When NOT to Use a Message Queue

  • Synchronous request-response: If the caller must wait for the result (e.g., login, payment confirmation), a message queue adds latency and complexity. Use direct RPC or REST instead.
  • Simple in-process async: If producer and consumer are in the same process, use a thread pool or async/await — not a distributed queue.
  • When ordering is critical across partitions: Kafka guarantees order within a partition. If you must order across all events globally and can’t partition effectively, a queue won’t save you — you need a sequencer (like stock exchange’s sequence server).
  • When you need strong transactional consistency with a database in one atomic step: Message queues break atomicity. Solution is the outbox pattern (write to DB + outbox table in one transaction; CDC or poller publishes to queue).
  • Tiny traffic, no burst: For < 10 req/sec with predictable load, a message queue is over-engineering. A simple background job or cron suffices.
  • When you need synchronous fan-out with confirmation: Pub/Sub is fire-and-forget. If you need to know all subscribers processed the event before proceeding, use distributed transactions or sagas — not a simple queue.
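
The outbox pattern mentioned in the list above can be sketched with SQLite standing in for the service database. The table layout and the names `create_schema`, `place_order`, and `relay_outbox` are assumptions made for illustration, and `publish` is any callable that hands a message to the real queue.

```python
import sqlite3


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            topic TEXT, payload TEXT, published INTEGER DEFAULT 0
        );
    """)


def place_order(conn, order_id, amount):
    # One local transaction: both rows commit together or neither does.
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)",
                     (order_id, amount))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order-placed", order_id))


def relay_outbox(conn, publish):
    """Poller: publish pending rows in order, then mark each as published."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox "
        "WHERE published = 0 ORDER BY id").fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                         (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
place_order(conn, "order-1", 100)

sent = []
relayed = relay_outbox(conn, lambda topic, payload: sent.append((topic, payload)))
```

If the poller crashes after `publish` but before the `UPDATE`, the row is published again on the next run, so this gives at-least-once delivery and downstream consumers still need to be idempotent.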

8. Kafka Architecture Quick Reference

Understanding Kafka’s internal layout is tested directly in Vol 2 Ch04 and referenced in Ch06, Ch13.

Producers                     Kafka Cluster                      Consumers
                     ┌──────────────────────────────┐
[Click events]  ──→  │  Topic: "ad-clicks"           │  ──→  [Flink stream processor]
[Page views]    ──→  │    Partition 0: [0,1,2,3...]  │  ──→  [ClickHouse batch loader]
[User events]   ──→  │    Partition 1: [0,1,2,3...]  │
                     │    Partition 2: [0,1,2,3...]  │
                     │                               │
                     │  Replication: each partition  │
                     │  replicated to 3 brokers      │
                     └──────────────────────────────┘

How partitioning works:

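  • Without a key: Records are spread across partitions for balanced load, with no ordering across partitions (round-robin in older clients; since Kafka 2.4 the default is a sticky partitioner that fills a batch for one partition before moving on)
  • With key: hash(key) % num_partitions → same key always goes to same partition, as long as the partition count does not change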
  • Use case: Partition by user_id to get per-user ordering; partition by ad_id for ad click aggregation
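
A minimal sketch of key-based partitioning. Kafka's default partitioner uses a murmur2 hash; this example swaps in CRC32 from the standard library as a stand-in, because Python's built-in `hash()` is randomized per process for strings. The function name `partition_for` is hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition; the same key always maps to the same one."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click")]
partitions = {p: [] for p in range(3)}
for user_id, action in events:
    partitions[partition_for(user_id, 3)].append((user_id, action))
# Both "user-1" events sit in one partition, in the order they were sent.
```

Changing `num_partitions` changes the modulus, so existing keys can land on a different partition afterwards. That is why adding partitions to a keyed topic breaks per-key ordering across the change.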

How consumer groups work:

Consumer Group A (billing):
  Consumer 1 → reads Partition 0
  Consumer 2 → reads Partition 1
  Consumer 3 → reads Partition 2

Consumer Group B (analytics):
  Consumer 1 → reads all 3 partitions (slower, single consumer)

  • Each partition is consumed by exactly one consumer within a group
  • Multiple groups each get a full, independent copy of the stream
  • Adding consumers to a group beyond partition count = idle consumers (no work to do)
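
The three rules above can be checked with a toy assignment function. This is a plain round-robin sketch, not one of Kafka's actual assignors, and `assign_partitions` is a hypothetical name.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


billing = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
analytics = assign_partitions([0, 1, 2], ["c1"])
oversized = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
# In the oversized group, "c4" ends up with an empty list: an idle consumer.
```

Each group is assigned independently, which is what lets the billing and analytics groups read the same three partitions without affecting each other.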

Producer acks and durability:

| acks setting | Durability | Throughput | Data loss risk |
|---|---|---|---|
| acks=0 | None (fire-and-forget) | Highest | High |
| acks=1 | Leader confirms | High | If leader fails before replication |
| acks=all (or -1) | All ISR replicas confirm | Lower | Minimal |

Rule: Use acks=all for billing/financial data (Vol 2 Ch06, Ch13). Use acks=1 for metrics/logs where occasional loss is acceptable (Vol 2 Ch05).
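
The rule above could be captured in a small helper that picks producer settings by data class. `acks` and `enable.idempotence` are Kafka producer configuration names; the `data_class` labels and the function `producer_settings` are hypothetical, and the dictionaries are a sketch rather than a complete production configuration.

```python
def producer_settings(data_class):
    """Return a producer config fragment for the given kind of data."""
    if data_class in {"billing", "financial"}:
        # Wait for all in-sync replicas; idempotence avoids duplicate writes
        # when the producer retries.
        return {"acks": "all", "enable.idempotence": True}
    if data_class in {"metrics", "logs"}:
        # Leader-only ACK: faster, but a write can be lost if the leader
        # fails before followers copy it. Idempotence requires acks=all,
        # so it stays off here.
        return {"acks": "1", "enable.idempotence": False}
    raise ValueError(f"unknown data class: {data_class}")


billing_config = producer_settings("billing")
metrics_config = producer_settings("metrics")
```

Keeping the choice in one place makes the durability trade-off explicit per stream instead of leaving it to each producer's defaults.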


9. Messaging System Interview Decision Flowchart

START: Do you need async messaging?
  │
  ├── No → Use synchronous REST/gRPC call
  │
  └── Yes
       │
       ▼
    Do you need multiple independent consumers (fan-out)?
       │
       ├── No (one consumer per message = work queue)
       │     │
       │     ▼
       │   Do you need replay / retention?
       │     ├── No → AWS SQS or RabbitMQ
       │     └── Yes → Kafka (configure retention)
       │
       └── Yes (pub/sub or streaming)
             │
             ▼
           Do you need replay / event sourcing?
             ├── No (ephemeral events) → Redis Pub/Sub or Google Pub/Sub
             └── Yes
                   │
                   ▼
                 Do you need >100K messages/sec or partitioned ordering?
                   ├── No → Google Pub/Sub (managed, simpler)
                   └── Yes → Apache Kafka
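
The flowchart above translates directly into a function, which can be handy for checking that every branch has an answer. The function and parameter names are hypothetical; the returned strings are the flowchart's own leaves.

```python
def recommend(needs_async, needs_fanout, needs_replay,
              high_throughput_or_partition_ordering=False):
    """Walk the decision flowchart top to bottom and return its leaf."""
    if not needs_async:
        return "Synchronous REST/gRPC call"
    if not needs_fanout:
        # Work queue: one consumer per message.
        if needs_replay:
            return "Kafka (configure retention)"
        return "AWS SQS or RabbitMQ"
    # Pub/sub or streaming: multiple independent consumers.
    if not needs_replay:
        return "Redis Pub/Sub or Google Pub/Sub"
    if high_throughput_or_partition_ordering:
        return "Apache Kafka"
    return "Google Pub/Sub (managed, simpler)"


print(recommend(needs_async=True, needs_fanout=False, needs_replay=False))
print(recommend(needs_async=True, needs_fanout=True, needs_replay=True,
                high_throughput_or_partition_ordering=True))
```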

Quick rule for SDI interviews:

  • “Async task” → SQS / RabbitMQ
  • “Fan-out to many services” → Kafka topics with multiple consumer groups
  • “Event sourcing / audit log” → Kafka with long retention
  • “Real-time presence / ephemeral” → Redis Pub/Sub
  • “Billing / exactly-once” → Kafka with transactions + idempotent consumers
  • Always mention: dead-letter queue (DLQ) for failed messages, backpressure handling, and monitoring queue depth
