Chapter 5 Cheat Sheet — Encoding and Evolution

One-Line Summaries

Concept                       One-Liner
────────────────────────────  ──────────────────────────────────────────────────────
Encoding                      Converting in-memory structures to bytes for storage or transmission
Backward compatibility        Newer code can read data written by older code
Forward compatibility         Older code can read data written by newer code
Field tag (Protobuf)          Numeric field identity in wire format; the canonical, permanent identifier
Avro schema resolution        Writer/reader schemas compared at decode time; fields matched by name
Schema registry               Central versioned schema store; enforces compatibility before bad data reaches consumers
Durable execution             Workflow engine that persists execution state to DB; survives process crashes transparently
Event notification            Minimal event; consumer calls back to the source for full data
Event-carried state transfer  Event carries full entity state; consumer is self-sufficient
Event sourcing                State = replay of immutable event log

Encoding Format Comparison

Format       Human-Readable  Schema    Binary  Evolution  Field Identity                Size vs JSON  Primary Use
───────────  ──────────────  ────────  ──────  ─────────  ────────────────────────────  ────────────  ───────────────────
JSON         Yes             Optional  No      Manual     Field name                    100%          REST APIs, config
XML          Yes             Optional  No      Manual     Tag name                      ~130%         Legacy enterprise
CSV          Yes             None      No      Poor       Column position               ~70%          Bulk data export
MessagePack  No              None      Yes     Manual     Field name                    ~80%          Redis, compact JSON
Protobuf     No              Required  Yes     Good       Field tag number              ~40%          gRPC, internal RPC
Thrift       No              Required  Yes     Good       Field tag number              ~42-70%       Internal RPC
Avro         No              Required  Yes     Excellent  Field name (by schema match)  ~38%          Kafka, Hadoop
FlatBuffers  No              Required  Yes     Good       Field offset                  ~50%          Games, HFT

Compatibility Rules Quick Reference

PROTOCOL BUFFERS / THRIFT:
  Add optional field (new tag)  → SAFE    (backward + forward)
  Remove optional field          → SAFE    (reserve the tag; never reuse it)
  Rename field                   → SAFE    (name not in wire)
  Change tag number              → UNSAFE  (old data misinterpreted)
  Add required field             → UNSAFE  (old data missing it)
  Reuse deleted tag              → UNSAFE  (old data re-interpreted)
  Change field type              → UNSAFE  (usually; widening is sometimes OK)

AVRO:
  Add field with default value   → SAFE    (missing in old data → reader uses default)
  Remove field with default      → SAFE    (old data still has value → ignored)
  Remove field without default   → UNSAFE  (old readers get new data with no value and no default)
  Rename field                   → UNSAFE  (fields match by name; a reader alias restores backward compat only)
  Add field without default      → UNSAFE  (old data has no value, no default → error)
  Change field type              → UNSAFE  (unless promotion rules allow it)

JSON:
  Everything                     → MANUAL  (no enforcement; discipline required)

Avro Schema Resolution

Writer's Schema (v1)              Reader's Schema (v2)
─────────────────────             ────────────────────────
field: userName  string  ──────→  field: userName  string
field: age       int     ──────→  field: age       int, default=0
                                  field: email     string, default=""  ← new (gets default)

Rules:
  Writer has field, Reader doesn't      → value IGNORED
  Reader has field, Writer doesn't      → reader uses DEFAULT value
  Both have field, same name            → value USED (type promotion if needed)
  Reader field has no default, Writer   → ERROR (cannot decode)
  lacks the field
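
The rules above can be sketched in plain Python. This is a simplified model, not the Avro library: the writer's schema is reduced to a set of field names, the reader's schema to a dict of field name → default, and records are already-decoded dicts. The names `NO_DEFAULT` and `resolve` are illustrative, and type promotion is left out.

```python
# Simplified model of Avro schema resolution (not the real avro library).
NO_DEFAULT = object()  # sentinel: the reader field declares no default


def resolve(writer_record, writer_fields, reader_fields):
    """Translate a record written with writer_fields into the reader's view.

    writer_fields: set of field names in the writer's schema
    reader_fields: dict of field name -> default value (or NO_DEFAULT)
    """
    result = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            # Both schemas have the field: use the written value.
            result[name] = writer_record[name]
        elif default is not NO_DEFAULT:
            # Reader-only field: fall back to the reader's default.
            result[name] = default
        else:
            # Reader-only field with no default: cannot decode.
            raise ValueError(f"no value and no default for field {name!r}")
    # Writer-only fields are never copied, so they are ignored.
    return result


writer_v1 = {"userName", "age"}
reader_v2 = {"userName": NO_DEFAULT, "age": 0, "email": ""}
decoded = resolve({"userName": "Martin", "age": 42}, writer_v1, reader_v2)
print(decoded)  # {'userName': 'Martin', 'age': 42, 'email': ''}
```

Matching happens by name only, which is why a rename looks to the reader like one field removed and another added.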

Protobuf Wire Format

JSON (81 bytes without whitespace):     Protobuf (33 bytes):
{                                       [tag=1, type=string][len][M][a][r][t][i][n]
  "userName": "Martin",                 [tag=2, type=varint][1337 as varint]
  "favoriteNumber": 1337,               [tag=3, type=string][len][d][a][y][d][r]...
  "interests": ["daydreaming",          [tag=3, type=string][len][h][a][c][k][i][n][g]
                "hacking"]
}

Key insight: tag number = field identity. Name is only in the .proto file.
Changing name: SAFE. Changing tag: CORRUPTS existing data.
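
To make the byte counts concrete, here is a minimal hand-rolled encoder for the record with interests "daydreaming" and "hacking". It is a sketch, not the protobuf library: it assumes userName is a string with tag 1, favoriteNumber a non-negative integer with tag 2, and interests a repeated string with tag 3. The helper names are illustrative.

```python
import json


def varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


VARINT, LEN = 0, 2  # Protobuf wire types used here


def key(tag: int, wire_type: int) -> bytes:
    """Field key = (tag number << 3) | wire type. No field name anywhere."""
    return varint((tag << 3) | wire_type)


def string_field(tag: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return key(tag, LEN) + varint(len(data)) + data


def int_field(tag: int, value: int) -> bytes:
    return key(tag, VARINT) + varint(value)


message = (
    string_field(1, "Martin")
    + int_field(2, 1337)
    + string_field(3, "daydreaming")  # repeated field: same tag, once per item
    + string_field(3, "hacking")
)
as_json = json.dumps(
    {"userName": "Martin", "favoriteNumber": 1337,
     "interests": ["daydreaming", "hacking"]},
    separators=(",", ":"),
)
print(len(message), len(as_json))  # 33 81
print(message[:8].hex())           # 0a064d617274696e
```

The first byte, 0x0a, is tag 1 with wire type 2. Renaming userName in the .proto file leaves these bytes unchanged, while changing its tag number changes how every stored message is read.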

Four Modes of Dataflow

1. THROUGH DATABASES
   ┌────────┐  encode  ┌──────┐  decode  ┌────────┐
   │ App v1 │ ───────→ │  DB  │ ───────→ │ App v2 │
   └────────┘          └──────┘          └────────┘
   ⚠ Unknown field preservation: when older code rewrites a record, it must keep fields it doesn't understand
   ⚠ Rolling upgrades: v1 and v2 run simultaneously, both access same DB

2. THROUGH SERVICES (REST/RPC)
   ┌──────────┐  request   ┌──────────┐
   │  Client  │ ─────────→ │  Server  │
   │  (v1)    │ ←───────── │  (v2)    │
   └──────────┘  response  └──────────┘
   ✓ API versioning: /v2/users or Accept header
   ✓ Clients and servers deploy independently

3. THROUGH MESSAGE BROKERS (Kafka/RabbitMQ)
   ┌──────────┐  publish  ┌────────┐  consume  ┌──────────┐
   │ Producer │ ────────→ │ Kafka  │ ────────→ │ Consumer │
   └──────────┘           └────────┘           └──────────┘
   ⚠ Messages may be consumed days after production — must stay decodable
   ⚠ Schema registry strongly recommended for binary formats (Avro readers need the writer's schema)

4. THROUGH DURABLE EXECUTION (Temporal/Step Functions)
   ┌──────────────────────────────────────────────────────────┐
   │  Temporal Server (event log in DB)                        │
   │   workflow_started → activity_completed → timer_fired ... │
   └──────────────────────────────────────────────────────────┘
   Worker polls, replays event history, resumes from last checkpoint
   ⚠ Workflow code must be DETERMINISTIC — no time.now(), no random()
   ⚠ Activity inputs/outputs must be backward-compatible across versions
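
The unknown-field warning under dataflow through databases is the easiest of these to get wrong in practice. Below is a minimal sketch of the failure and the fix, with rows modeled as plain dicts. The field names and both update functions are hypothetical, and a real ORM or Protobuf/Avro library would have its own mechanism for carrying unknown fields.

```python
# App v1 only knows about these columns; App v2 has since added "email".
KNOWN_V1_FIELDS = {"id", "name"}


def lossy_update(row: dict, new_name: str) -> dict:
    """Anti-pattern: rebuild the row from only the fields v1 understands."""
    model = {k: v for k, v in row.items() if k in KNOWN_V1_FIELDS}
    model["name"] = new_name
    return model  # writing this back silently deletes v2's data


def preserving_update(row: dict, new_name: str) -> dict:
    """Copy the whole row, then change only the field v1 owns."""
    updated = dict(row)
    updated["name"] = new_name
    return updated


stored_by_v2 = {"id": 7, "name": "Ada", "email": "ada@example.com"}
print(lossy_update(stored_by_v2, "Ada L."))
# {'id': 7, 'name': 'Ada L.'}
print(preserving_update(stored_by_v2, "Ada L."))
# {'id': 7, 'name': 'Ada L.', 'email': 'ada@example.com'}
```

During a rolling upgrade both versions write to the same database, so the lossy path can erase data that newer instances wrote moments earlier.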

Durable Execution Deep Dive

PROBLEM:
  Step 1: Charge card
  Step 2: Send email
  Step 3: Update inventory
  → Process crashes after Step 1 — what happens to Steps 2 and 3?

WITHOUT durable execution:
  - Track state in DB manually (complex, error-prone)
  - Use saga pattern with compensating transactions (complex)
  - Leave steps partially done (data inconsistency)

WITH Temporal:
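  from datetime import timedelta
  from temporalio import workflow
  @workflow.defn
  class OrderWorkflow:
      @workflow.run
      async def run(self, order: Order) -> None:
          t = timedelta(seconds=30)  # every activity call needs a timeout
          await workflow.execute_activity(charge_card, order, start_to_close_timeout=t)       # persisted
          await workflow.execute_activity(send_email, order, start_to_close_timeout=t)        # persisted
          await workflow.execute_activity(update_inventory, order, start_to_close_timeout=t)  # persisted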
  
  On crash: replay event history → resume at Step 2 (Step 1 already persisted)
  Guarantee: each activity executes at-least-once (so activities should be idempotent); workflow is effectively-once

DETERMINISM RULE:
  ❌ time.now()              → use workflow.now()
  ❌ random.random()         → use workflow.random()
  ❌ direct HTTP call        → wrap in execute_activity()
  ❌ global mutable state    → each replay starts fresh
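
Why determinism matters becomes clearer with a toy model of replay. The sketch below is not the Temporal API: `Replayer` is a hypothetical stand-in that keeps completed activity results in a list and hands them back on the next run instead of re-executing. The crash is simulated by an exception inside the second activity.

```python
class Replayer:
    """Toy stand-in for a durable execution engine (not the Temporal API)."""

    def __init__(self, history):
        self.history = history  # persisted results of completed activities
        self.position = 0       # index of the next activity call in this run

    def execute_activity(self, fn, *args):
        if self.position < len(self.history):
            result = self.history[self.position]  # replay: reuse recorded result
        else:
            result = fn(*args)                    # first time: actually run it
            self.history.append(result)           # "persist" before moving on
        self.position += 1
        return result


calls = []             # records which activities really executed
crash_pending = [True]  # makes send_email fail exactly once


def charge_card(order):
    calls.append("charge")
    return "charged"


def send_email(order):
    if crash_pending:
        crash_pending.pop()
        raise RuntimeError("simulated process crash")
    calls.append("email")
    return "sent"


def update_inventory(order):
    calls.append("inventory")
    return "updated"


def order_workflow(engine, order):
    engine.execute_activity(charge_card, order)
    engine.execute_activity(send_email, order)
    engine.execute_activity(update_inventory, order)


history = []  # stands in for the event log in the engine's database
try:
    order_workflow(Replayer(history), "order-123")
except RuntimeError:
    pass  # the worker died after Step 1 was recorded

order_workflow(Replayer(history), "order-123")  # a new worker replays
print(calls)    # ['charge', 'email', 'inventory']
print(history)  # ['charged', 'sent', 'updated']
```

The card is charged once even though the workflow function ran twice. Results are matched to calls purely by position, so if the workflow code branched on wall-clock time or a random number, the second run could issue its calls in a different order and pick up the wrong recorded results.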

Event-Driven Architecture Patterns

1. EVENT NOTIFICATION
   ┌──────────┐  {orderId: "123"}  ┌──────────┐
   │  Order   │ ─────────────────→ │  Email   │
   │  Service │                    │  Service │──→ GET /orders/123
   └──────────┘                    └──────────┘
   + Tiny events, low coupling
   - Consumer still depends on source API

2. EVENT-CARRIED STATE TRANSFER
   ┌──────────┐  {orderId:"123", items:[...], total:49.99}  ┌───────────┐
   │  Order   │ ──────────────────────────────────────────→ │ Analytics │
   │  Service │                                              │  Service  │
   └──────────┘                                             └───────────┘
   + Consumer is autonomous (no callback needed)
   - Larger events, data duplication, evolution complexity

3. EVENT SOURCING
   Append-only event log:
     [order.created] [item.removed] [payment.received] [order.shipped]
                              ↓ fold/replay
                       Current Order State
   
   + Complete audit trail, time travel, multiple projections
   + CQRS: separate write model (events) from read models (projections)
   - Eventual consistency of projections
   - Schema evolution needs upcasters for old event formats
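
The fold/replay step and the upcaster idea in event sourcing fit in a short sketch. The event shapes here are assumptions for illustration: an `item.added` event type is added to the log, and a hypothetical v1 format that lacked a `qty` field is upgraded on read.

```python
from functools import reduce


def upcast(event: dict) -> dict:
    """Bring an old-format event up to the current shape (hypothetical v1 -> v2)."""
    if event.get("version", 1) == 1 and event["type"] == "item.added":
        # Assumed change: v1 had no quantity, so v2 treats it as qty=1.
        return {**event, "version": 2, "qty": 1}
    return event


def apply_event(state: dict, event: dict) -> dict:
    """Pure function: previous state + one event -> next state."""
    kind = event["type"]
    if kind == "order.created":
        return {"status": "open", "items": {}}
    if kind == "item.added":
        items = dict(state["items"])
        items[event["sku"]] = items.get(event["sku"], 0) + event["qty"]
        return {**state, "items": items}
    if kind == "item.removed":
        items = {k: v for k, v in state["items"].items() if k != event["sku"]}
        return {**state, "items": items}
    if kind == "payment.received":
        return {**state, "status": "paid"}
    if kind == "order.shipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types leave the state unchanged


log = [
    {"type": "order.created"},
    {"type": "item.added", "sku": "A1"},  # old v1 event, no qty
    {"type": "item.added", "sku": "B2", "qty": 3, "version": 2},
    {"type": "item.removed", "sku": "A1"},
    {"type": "payment.received"},
    {"type": "order.shipped"},
]

# Current state is a left fold over the upcasted log.
state = reduce(apply_event, map(upcast, log), {})
print(state)  # {'status': 'shipped', 'items': {'B2': 3}}
```

Stored events are never rewritten; the upcaster runs at read time, so `apply_event` only ever has to understand the newest format.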

REST vs gRPC Decision Tree

Is the API consumed by external clients (browsers, mobile apps, third parties)?
├─ YES → REST + JSON + OpenAPI  (human-readable, universally accessible)
└─ NO (internal microservice-to-microservice) → What matters most?
          ├─ Streaming  → gRPC (bidirectional streaming)
          ├─ Efficiency → gRPC (2-3x smaller payloads than JSON, typed contracts)
          └─ Simplicity → REST (easier debugging, no protoc setup)

Schema Registry Flow (Kafka + Avro)

PRODUCER:                           CONSUMER:
  schema = load("person.avsc")        msg = kafka.read()
  id = registry.register(schema)      schema_id = msg[:4]  # first 4 bytes
  payload = avro.encode(data,         schema = registry.fetch(schema_id)
                        schema)       data = avro.decode(msg[4:],
  kafka.send([id_bytes][payload])               writer_schema=schema,
                                                reader_schema=MY_SCHEMA)
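
The framing in the flow above (schema id first, payload after) can be sketched without Kafka or Avro installed. Everything here is an assumption for illustration: `InMemoryRegistry` is a hypothetical stand-in for a registry service, and JSON bytes stand in for the Avro-encoded payload. Real registries may frame differently; Confluent's wire format, for example, puts a one-byte magic value before the 4-byte id.

```python
import json
import struct


class InMemoryRegistry:
    """Hypothetical stand-in for a schema registry: schema text <-> integer id."""

    def __init__(self):
        self._ids = {}    # schema text -> id
        self._by_id = {}  # id -> schema text

    def register(self, schema: str) -> int:
        if schema not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[schema] = new_id
            self._by_id[new_id] = schema
        return self._ids[schema]

    def fetch(self, schema_id: int) -> str:
        return self._by_id[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the schema id as a 4-byte big-endian integer."""
    return struct.pack(">I", schema_id) + payload


def unframe(message: bytes):
    """Split a framed message back into (schema id, payload)."""
    (schema_id,) = struct.unpack(">I", message[:4])
    return schema_id, message[4:]


registry = InMemoryRegistry()
schema = '{"type":"record","name":"Person","fields":[{"name":"userName","type":"string"}]}'

# Producer side
schema_id = registry.register(schema)
payload = json.dumps({"userName": "Martin"}).encode()  # placeholder for Avro bytes
message = frame(schema_id, payload)

# Consumer side
got_id, got_payload = unframe(message)
writer_schema = registry.fetch(got_id)
print(got_id, writer_schema == schema, json.loads(got_payload))
# 1 True {'userName': 'Martin'}
```

Sending a 4-byte id instead of the full schema keeps each message small, and the consumer can cache fetched schemas by id.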

Key Trade-offs Summary

Choose             When
─────────────────  ──────────────────────────────────────────────────────
JSON               Public API, external clients, human debugging needed
Protobuf           Internal microservices, gRPC, high-performance, typed contracts
Avro               Kafka pipelines, Hadoop, schema registry governance required
Durable Execution  Multi-step workflow, long-running, human-in-loop, crash safety critical
Message Queue      Short async tasks, work distribution, competing consumers
Event Sourcing     Full audit trail needed, time-travel queries, CQRS, multiple projections

Red Flags

  • Using Java Serializable / Python pickle for inter-service communication
  • Changing a Protobuf field tag number in an existing schema
  • Adding a required field to a Protobuf schema with existing data
  • Avro field renamed without also adding an alias and updating all consumers
  • Workflow code calling time.now() directly instead of the framework timer API
  • Not using a schema registry with Avro-encoded Kafka topics
  • ORM that silently drops unknown fields on update (destroys forward-compat fields)

Green Flags

  • All new Protobuf/Avro fields are optional with sensible default values
  • Schema compatibility checked in CI (Buf breaking change detection)
  • Schema registry with BACKWARD or FULL compatibility mode
  • Durable execution used for workflows with >1 external call
  • Event sourcing with upcaster pipeline for schema-evolved events
  • API versioning via /v2/ URL prefix for breaking changes

Quick Comparison: Durable Execution vs Alternatives

                    Cron + DB         Message Queue         Durable Execution
──────────────────  ────────────────  ────────────────────  ──────────────────
Crash recovery      Manual (poll DB)  Re-delivery           Automatic replay
State management    You write it      You write it          Framework handles
Multi-step          Complex sagas     Complex choreography  Natural code flow
Long timers (days)  Cron jobs         Not designed for      Built-in timers
Visibility          Custom logging    Queue metrics         Full event history
Overhead            Low               Low                   Medium (framework)

Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-05-29