Chapter 12 Flashcards - Digital Wallet
flashcards volume2 digital-wallet fintech consistency balance
What is the core distributed transaction challenge in a digital wallet?
?
User A’s wallet and User B’s wallet may live on different database nodes (due to sharding). Transferring money requires atomically debiting A and crediting B. If Step 1 (debit A) succeeds and Step 2 (credit B) fails, A loses money that never arrives at B. Three approaches to solve this: 2-Phase Commit (2PC), Saga Pattern, or Event Sourcing. The challenge is making two separate DB writes look like one atomic operation.
What are the two phases of 2-Phase Commit (2PC)?
?
Phase 1 (Prepare): Coordinator asks all DB nodes “Can you commit this transaction?” Each node acquires locks, writes to WAL, responds YES or NO. Phase 2 (Commit/Abort): If ALL nodes said YES → coordinator sends COMMIT to all, nodes apply changes and release locks. If ANY node said NO → coordinator sends ABORT to all, nodes rollback and release locks. Result: either all nodes commit or none do.
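The two phases can be sketched with in-memory stand-ins for the DB nodes. This is a minimal illustration, not a production protocol: `Participant`, `two_phase_commit`, and the "pending" field standing in for locks and the WAL are all hypothetical names, and coordinator crashes are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Participant:
    """Hypothetical in-memory stand-in for one DB node holding one wallet."""
    name: str
    balance: int
    pending: Optional[int] = None  # staged change; non-None means "locked"

    def prepare(self, delta: int) -> bool:
        # Vote NO if the row is already locked or the debit would overdraw.
        if self.pending is not None or self.balance + delta < 0:
            return False
        self.pending = delta  # a real node would also write this to its WAL
        return True

    def commit(self) -> None:
        self.balance += self.pending or 0
        self.pending = None

    def abort(self) -> None:
        self.pending = None


def two_phase_commit(work: List[Tuple[Participant, int]]) -> bool:
    # Phase 1 (Prepare): ask every node to vote.
    votes = [node.prepare(delta) for node, delta in work]
    if all(votes):
        # Phase 2 (Commit): unanimous YES, so apply everywhere.
        for node, _ in work:
            node.commit()
        return True
    # Phase 2 (Abort): release only the locks this transaction acquired.
    for (node, _), voted_yes in zip(work, votes):
        if voted_yes:
            node.abort()
    return False
```

Calling `two_phase_commit([(a, -50), (b, 50)])` either changes both balances or neither, which is the all-or-nothing result described above.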
Why is 2PC called a “blocking” protocol and why is this a problem?
?
After Phase 1, all DB nodes hold locks and wait for the coordinator’s Phase 2 decision. If the coordinator crashes after Phase 1 (before sending Commit/Abort), the prepared nodes stay blocked until it recovers: they can neither commit nor roll back on their own without knowing its decision. In a high-scale wallet, every concurrent transfer that touches the locked rows stalls behind them. Because of this blocking, 2PC is a poor fit for high-availability, high-throughput systems. Use it only for small-scale, same-datacenter deployments.
What is the Saga pattern and how does it avoid distributed locks?
?
Saga breaks a distributed transaction into a sequence of LOCAL transactions. Each local transaction completes and publishes an event/message. If a step fails, compensating transactions undo previous completed steps. Avoids distributed locks because each step is an independent local DB transaction — no node holds locks waiting for others. The trade-off: brief window of inconsistency during the saga’s execution (eventual consistency). Two flavors: choreography (events drive next steps) and orchestration (central orchestrator calls each step).
What is the difference between choreography-based and orchestration-based Saga?
?
Choreography: No central coordinator. Each service listens for events and reacts. Example: Debit succeeds → publishes MoneyDebited event → Credit service listens and credits. Decoupled and simple, but the overall saga state is hard to track, error paths get complex, and debugging is hard. Orchestration: A central Saga Orchestrator calls each service step by step, holds the saga state, and issues compensating calls on failure. Easier to debug and monitor, with a single place to see saga progress. Downside: the orchestrator concentrates the workflow logic in one component (it is not a single point of failure as long as its state is stored durably).
What is a compensating transaction in the Saga pattern?
?
A compensating transaction is the “undo” operation for a completed Saga step. Unlike a DB rollback (which discards uncommitted changes), a compensating transaction is a new operation that reverses the effect of a committed step. Example: Debit A of $50 succeeded → Credit B fails → the compensation credits $50 back to A (new ledger entry). Rules: (1) it must eventually succeed (a failed compensation is retried until it goes through, never abandoned), (2) it must be idempotent (safe to retry), (3) it is a NEW record in the ledger, not a modification of the original. This preserves the audit trail.
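An orchestrated saga with one compensating step might look like the sketch below. The ledger and balances are plain in-memory structures, `apply_entry` and `transfer_saga` are hypothetical names, and the "unknown receiver" failure is an assumed trigger chosen only to exercise the compensation path. Retries and idempotency of the compensation are left out for brevity.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, int, str]  # (user, kind, amount, saga_id)


def apply_entry(ledger: List[Entry], balances: Dict[str, int], entry: Entry) -> None:
    """One 'local transaction': update a balance and append a ledger entry."""
    user, kind, amount, _ = entry
    if kind == "DEBITED" and balances[user] < amount:
        raise ValueError("insufficient funds")
    balances[user] += amount if kind == "CREDITED" else -amount
    ledger.append(entry)


def transfer_saga(ledger: List[Entry], balances: Dict[str, int],
                  sender: str, receiver: str, amount: int, saga_id: str) -> str:
    # Step 1: debit the sender (committed on its own).
    apply_entry(ledger, balances, (sender, "DEBITED", amount, saga_id))
    # Step 2: credit the receiver; assumed to fail if the wallet is unknown.
    if receiver in balances:
        apply_entry(ledger, balances, (receiver, "CREDITED", amount, saga_id))
        return "COMPLETED"
    # Compensation: a NEW credit entry for the sender, not an edit of step 1.
    apply_entry(ledger, balances, (sender, "CREDITED", amount, saga_id + ":comp"))
    return "COMPENSATED"
```

After a compensated saga the sender's balance is restored, and the ledger still shows both the original debit and the reversing credit.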
What is event sourcing and how does it compute wallet balance?
?
Event sourcing stores every state change as an immutable event (append-only) rather than storing current state. The current balance is computed by replaying all events: Balance = SUM(CREDITED amounts) - SUM(DEBITED amounts) for that user. Example events: {user_A, DEBITED, 50, txn_1}, {user_A, CREDITED, 200, txn_2}. Balance for A = 200 - 50 = 150. Never UPDATE or DELETE event rows. Reversals create new compensating events. This gives a complete, immutable audit trail of every cent.
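The replay can be written as a short fold over the event list. This sketch reuses the two example events from the card; the tuple layout and the name `balance_of` are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Event = Tuple[str, str, int, str]  # (user_id, type, amount, txn_id)


def balance_of(user_id: str, events: Iterable[Event]) -> int:
    """Replay events in order; credits add, debits subtract."""
    total = 0
    for uid, kind, amount, _txn in events:
        if uid != user_id:
            continue
        if kind == "CREDITED":
            total += amount
        elif kind == "DEBITED":
            total -= amount
    return total


events = [
    ("user_A", "DEBITED", 50, "txn_1"),
    ("user_A", "CREDITED", 200, "txn_2"),
]
print(balance_of("user_A", events))  # 150
```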
Why is an append-only event store ideal for financial systems?
?
(1) Complete audit trail: every cent movement is permanently recorded, required by financial regulations. (2) Time travel: reconstruct balance at any past point by replaying events up to that timestamp — critical for dispute resolution. (3) No destructive updates: regulators require financial records cannot be silently altered. (4) Bug recovery: if the balance view gets corrupted, replay all events to rebuild it. (5) Analytics: derive any aggregate (daily volume, monthly avg balance) by processing the event stream. Trade-off: computing balance requires O(n) event replay without optimization (use snapshots).
What is the Snapshot pattern in event sourcing and why is it needed?
?
Problem: As a user’s event history grows to thousands of events, replaying all of them to compute the balance is O(n) and becomes slow. Solution: Every N events (e.g., 1,000), save a snapshot of the materialized balance at that point. Computing balance: snapshot_balance + SUM(events after snapshot). Example: the snapshot at event 5,000 shows balance=$150, and there are 200 events after it. Replay only those 200 events, not all 5,200. Snapshot table stores: user_id, currency, balance, last_event_id, created_at. Older snapshots can be deleted once newer ones exist, because the events remain the source of truth.
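A rough sketch of snapshot-plus-tail replay follows. `Event`, `Snapshot`, `take_snapshot`, and `current_balance` are hypothetical names, and the in-memory list is scanned in full here; a real store would query only rows with `event_id` greater than the snapshot's `last_event_id`.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    event_id: int
    user_id: str
    kind: str      # "CREDITED" or "DEBITED"
    amount: int


class Snapshot(NamedTuple):
    user_id: str
    balance: int
    last_event_id: int


def current_balance(user_id: str, events: List[Event],
                    snapshot: Optional[Snapshot] = None) -> int:
    # Start from the snapshot (if any) and apply only the events after it.
    balance = snapshot.balance if snapshot else 0
    start = snapshot.last_event_id if snapshot else 0
    for e in events:
        if e.user_id == user_id and e.event_id > start:
            balance += e.amount if e.kind == "CREDITED" else -e.amount
    return balance


def take_snapshot(user_id: str, events: List[Event]) -> Snapshot:
    last_id = max((e.event_id for e in events if e.user_id == user_id), default=0)
    return Snapshot(user_id, current_balance(user_id, events), last_id)
```

The balance computed from a snapshot plus the tail should always equal a full replay; comparing the two is a cheap consistency test.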
What is CQRS and how does it pair with event sourcing?
?
CQRS (Command Query Responsibility Segregation) separates write model (commands) from read model (queries). With event sourcing: Command side appends events to the event store (write path). An async Event Processor reads new events and updates a Materialized Balance View (read model). Query side reads from the pre-computed materialized view for O(1) balance lookups. Benefits: (1) write and read scale independently, (2) event store not burdened with ad-hoc read queries, (3) multiple read models can be derived from same events (balance, history, analytics), (4) read model can be rebuilt from events if corrupted.
How does event sourcing avoid a distributed transaction for wallet transfers?
?
Key insight: write BOTH the debit event and the credit event in a SINGLE local DB transaction, keyed by transfer_id. If both events go to the same event store database (even if user A and user B are conceptually “on different shards,” their events can be co-located by transfer_id), this is one atomic local transaction — no 2PC needed. The trick: the event store is a single append-only DB; you shard it by transfer_id, not by user_id. Both entries for a transfer land on the same shard, making it a local transaction.
What is optimistic locking and when should you use it in a digital wallet?
?
Optimistic locking uses a version number on each record. Read: SELECT balance, version FROM wallets WHERE user_id='A'. Update: UPDATE wallets SET balance=X, version=version+1 WHERE user_id='A' AND version=5 (the version just read). If 0 rows updated: the version changed (concurrent modification), so re-read and retry. No locks are held during processing, so there is no blocking. Use it when conflicts are rare (most users rarely have many transfers in flight at once) and you want to avoid the Redis overhead of distributed locking. Use pessimistic locking (SELECT FOR UPDATE) when conflicts are frequent and retries would be expensive.
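The read, conditional update, and retry loop can be sketched against SQLite from the Python standard library. The `wallets` columns follow the card; the function name `debit_optimistic` and the retry limit are assumptions for this illustration.

```python
import sqlite3


def debit_optimistic(conn: sqlite3.Connection, user_id: str,
                     amount: int, max_retries: int = 3) -> bool:
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT balance, version FROM wallets WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if row is None or row[0] < amount:
            return False  # unknown wallet or insufficient funds
        balance, version = row
        cur = conn.execute(
            "UPDATE wallets SET balance = ?, version = version + 1 "
            "WHERE user_id = ? AND version = ?",
            (balance - amount, user_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our version was still current
        # 0 rows updated: another writer bumped the version; re-read and retry.
    return False
```

The `WHERE ... AND version = ?` clause is what makes the update conditional: a writer holding a stale version matches no rows and changes nothing.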
How does distributed locking with Redis work for concurrent wallet transfers?
?
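SET lock:wallet:{user_id} {random_value} NX EX 5. NX: only set if the key does not exist (atomic set-if-absent). EX 5: auto-expire in 5 seconds (prevents deadlock if the holder crashes). If SET returns OK: lock acquired, proceed. If SET returns nil: lock held by another process, so retry with backoff. Release lock: delete only if the stored value still matches (prevents releasing another process’s lock): IF redis.get(key) == random_value THEN redis.del(key). The compare and delete must run atomically (for example as a Lua script via EVAL); a separate GET followed by DEL leaves a race window. Use random_value (not “1”) so a process whose lock has already expired cannot delete a lock now held by someone else. This provides mutual exclusion across services as long as the critical section finishes before the 5-second expiry.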
What is the read-your-writes consistency guarantee and how do you implement it for wallets?
?
Read-your-writes: after a user performs a write (transfer), their subsequent reads should reflect that write immediately. Problem: in a replicated DB, writes go to primary and reads go to replicas. If replica lag is 1 second, user sees old balance after transferring money — confusing UX. Solutions: (1) route same user’s balance queries to the primary DB after any write, (2) include last_write_timestamp in requests and reject reads from replicas with lag > threshold, (3) sticky reads: always route a session’s reads to same replica, (4) with CQRS, use a consistency token that the event processor acknowledges before showing balance.
What is monotonic reads and why does it matter for wallet balance?
?
Monotonic reads guarantee that if a user reads a value, their subsequent reads never return an older value. Problem without it: User B receives $50. Request 1 → replica 1: balance shows $150. Request 2 → replica 2 (lagging): balance shows $100. The balance appears to decrease, which is alarming and confusing. Solutions: (1) route all reads for a session to the same replica (sticky reads), (2) use the primary for balance queries, (3) include a version/sequence token in responses and reject reads from replicas older than that version. Financial systems should prefer primary reads for balances.
What is the difference between a digital wallet and a payment system?
?
Payment System: Moves money between buyer and external bank/card account via PSP (Stripe/Adyen) and card networks (Visa/Mastercard). Money comes from EXTERNAL accounts. Core concerns: PSP integration, fraud, PCI compliance, idempotency key for PSP calls. Digital Wallet: Stores money WITHIN the platform and transfers between INTERNAL accounts (user-to-user). No external card network involved for internal transfers. Core concerns: atomic internal transfers, distributed transactions, event sourcing, CQRS, consistent balance. A full product like PayPal has BOTH: payment system (pay-in from card / pay-out to bank) + digital wallet (internal balance transfers).
How does idempotency work for wallet transfers?
?
Client generates UUID per transfer intent, reuses on retry. Server: (1) check idempotency cache (Redis) for the key — if found, return cached result, (2) if not found, process transfer, store result in cache with TTL (e.g., 24 hours), (3) return result. With event sourcing: idempotency is enforced by checking if a transfer_id already has events in the event store — if yes, skip re-execution. Two layers: (1) idempotency cache for fast deduplication, (2) transfer_id uniqueness constraint in event store as safety net. Always generate new UUID for new transfer, reuse for retries.
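The two layers can be sketched with a dict standing in for the Redis cache and a SQLite uniqueness constraint standing in for the event store's safety net. The names `make_store` and `transfer`, the table layout, and the result string are assumptions; TTL handling is omitted.

```python
import sqlite3
from typing import Dict


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (transfer_id TEXT, user_id TEXT, kind TEXT, "
        "amount INTEGER, UNIQUE (transfer_id, user_id, kind))"
    )
    return conn


def transfer(conn: sqlite3.Connection, cache: Dict[str, str], key: str,
             sender: str, receiver: str, amount: int) -> str:
    if key in cache:                      # layer 1: fast deduplication
        return cache[key]
    try:
        with conn:                        # commit on success, roll back on error
            conn.executemany(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                [(key, sender, "DEBITED", amount),
                 (key, receiver, "CREDITED", amount)],
            )
    except sqlite3.IntegrityError:
        pass                              # layer 2: events for this key already exist
    result = "transfer " + key + " completed"
    cache[key] = result
    return result
```

Even if the cache entry is lost (for example after expiry), a retry with the same key hits the uniqueness constraint and writes no second pair of events.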
How does the event processor in CQRS maintain the materialized balance view?
?
Event processor is an async consumer (like a Kafka consumer or DB polling loop) that reads new events from the event store and updates the balance view table. For each DEBITED event: UPDATE balances SET balance = balance - amount WHERE user_id = X AND currency = Y. For each CREDITED event: UPDATE balances SET balance = balance + amount WHERE user_id = X AND currency = Y. Key requirement: idempotent processing — use event_id to track which events have been applied (prevent double-apply on replay). If balance view is corrupted: DELETE all rows and replay ALL events from the event store to rebuild.
How should you shard a digital wallet’s event store for scale?
?
Shard by transfer_id (not user_id). Why: both the debit event and credit event for a transfer must be in the same shard to allow a local (non-distributed) transaction. If sharded by user_id, A’s debit goes to shard A-M and B’s credit goes to shard N-Z — requiring a distributed transaction. With transfer_id sharding: HASH(transfer_id) → same shard for both events → single local transaction. The materialized balance view can be sharded by user_id separately (read-only, rebuilt from events). Snapshot table sharded by user_id. Each shard is independently consistent.
What happens if the event processor crashes mid-processing in CQRS?
?
The event processor tracks its progress using a cursor (last_processed_event_id). On restart: resume from last_processed_event_id + 1. Delivery is at-least-once, so an event can be seen again after a crash; processing must therefore be idempotent, deduplicating by event_id so that each event’s effect is applied exactly once. Steps: (1) read event from store, (2) check processed_events table for event_id and skip if already processed, (3) apply to balance view + insert event_id into processed_events (atomic), (4) update cursor. This ensures no event is missed and no event is double-applied even across crashes.
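The four steps can be sketched with SQLite. The table names `processed_events` and `balances` come from the cards; `cursor_state`, `SCHEMA`, and `process_new_events` are hypothetical. For simplicity the cursor update shares the transaction with the apply step here; the dedup table is what protects against a cursor that lags behind.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id TEXT, kind TEXT, amount INTEGER);
CREATE TABLE balances (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE processed_events (event_id INTEGER PRIMARY KEY);
CREATE TABLE cursor_state (id INTEGER PRIMARY KEY CHECK (id = 1), last_event_id INTEGER);
INSERT INTO cursor_state VALUES (1, 0);
"""


def process_new_events(conn: sqlite3.Connection) -> int:
    """Apply unseen events to the balance view; return how many were applied."""
    (last_id,) = conn.execute("SELECT last_event_id FROM cursor_state").fetchone()
    rows = conn.execute(
        "SELECT event_id, user_id, kind, amount FROM events "
        "WHERE event_id > ? ORDER BY event_id", (last_id,)
    ).fetchall()                                               # step 1
    applied = 0
    for event_id, user_id, kind, amount in rows:
        with conn:  # one transaction per event
            seen = conn.execute(
                "SELECT 1 FROM processed_events WHERE event_id = ?", (event_id,)
            ).fetchone()                                       # step 2
            if seen is None:
                delta = amount if kind == "CREDITED" else -amount
                conn.execute("INSERT OR IGNORE INTO balances VALUES (?, 0)", (user_id,))
                conn.execute("UPDATE balances SET balance = balance + ? WHERE user_id = ?",
                             (delta, user_id))
                conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))  # step 3
                applied += 1
            conn.execute("UPDATE cursor_state SET last_event_id = ?", (event_id,))    # step 4
    return applied
```

Rewinding the cursor and running the processor again applies nothing, because every event_id is already in `processed_events`.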
Compare 2PC, Saga, and Event Sourcing on consistency, availability, and use case.
?
2PC: Consistency=Strong, Availability=Low (blocking), Use case=Small scale, same DC, when strong consistency is non-negotiable (e.g., relational DB with distributed transaction support like CockroachDB/Spanner). Saga: Consistency=Eventual (brief window), Availability=High (no locks), Use case=Microservices on separate DBs, can tolerate brief inconsistency, need independent service scaling. Event Sourcing+CQRS: Consistency=Strong per-user, eventual for materialized views, Availability=High, Use case=Financial wallets requiring audit trail, time travel, full history. Recommendation for wallets at scale: Event Sourcing + CQRS.
What is the “local transaction trick” that makes event sourcing avoid distributed transactions?
?
When User A pays User B, write BOTH events (A’s debit AND B’s credit) in a SINGLE database transaction on the event store, using transfer_id as the shard key. Both events land on the same shard (same DB node) because they share the same transfer_id. This makes it a LOCAL transaction — no coordination across nodes needed, no 2PC, no Saga compensation. The events are: {user_A, DEBITED, 50, transfer_id=xyz} and {user_B, CREDITED, 50, transfer_id=xyz}. Atomically committed together. Either both persist or neither does.
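A small sketch of the routing and the single local transaction, using one in-memory SQLite database per shard. The shard count, the SHA-256 based `shard_for`, and the helper names are assumptions; the card only requires that the hash of transfer_id pick the shard.

```python
import hashlib
import sqlite3
from typing import List


def make_shards(n: int) -> List[sqlite3.Connection]:
    shards = []
    for _ in range(n):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE events "
                   "(user_id TEXT, kind TEXT, amount INTEGER, transfer_id TEXT)")
        shards.append(db)
    return shards


def shard_for(transfer_id: str, n: int) -> int:
    # Stable hash, so the same transfer_id always maps to the same shard.
    digest = hashlib.sha256(transfer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def record_transfer(shards: List[sqlite3.Connection], transfer_id: str,
                    sender: str, receiver: str, amount: int) -> int:
    idx = shard_for(transfer_id, len(shards))
    db = shards[idx]
    with db:  # both rows commit together or not at all
        db.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            [(sender, "DEBITED", amount, transfer_id),
             (receiver, "CREDITED", amount, transfer_id)],
        )
    return idx
```

Because both rows go through one connection inside one `with db:` block, no cross-node coordination is involved.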
How do you handle multi-currency transfers in a digital wallet?
?
Store all amounts in minor units (cents, paise, etc.) as integers with a currency code. For cross-currency transfers: (1) lock in exchange rate at time of transfer (store in the event), (2) create two events: sender’s event in source currency, receiver’s event in destination currency, (3) store exchange_rate used in the event record (immutable — rate cannot change retroactively), (4) the wallet service maintains separate balance per currency per user, (5) never auto-convert stored balances — only convert on explicit transfer. Rate changes only apply to future transactions, never historical ones.
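The conversion in integer minor units might be sketched as below. The exponent table covers only three currencies, and the rounding mode (banker's rounding), the event field names, and the example rates are assumptions for illustration, not quoted market values.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Dict, Tuple

# Number of decimal places in each currency's minor unit.
MINOR_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0}


def convert_minor(amount_minor: int, src: str, dst: str, rate: Decimal) -> int:
    """Convert src minor units to dst minor units at `rate` (dst per 1 src)."""
    major = Decimal(amount_minor) / (Decimal(10) ** MINOR_EXPONENT[src])
    converted = major * rate * (Decimal(10) ** MINOR_EXPONENT[dst])
    return int(converted.quantize(Decimal(1), rounding=ROUND_HALF_EVEN))


def cross_currency_events(transfer_id: str, sender: str, receiver: str,
                          amount_minor: int, src: str, dst: str,
                          rate: Decimal) -> Tuple[Dict, Dict]:
    # The rate is stored on both events so it can never change retroactively.
    debit = {"transfer_id": transfer_id, "user_id": sender, "type": "DEBITED",
             "amount_minor": amount_minor, "currency": src,
             "exchange_rate": str(rate)}
    credit = {"transfer_id": transfer_id, "user_id": receiver, "type": "CREDITED",
              "amount_minor": convert_minor(amount_minor, src, dst, rate),
              "currency": dst, "exchange_rate": str(rate)}
    return debit, credit
```

Using `Decimal` for the rate and integers for stored amounts avoids binary floating-point rounding in the ledger.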
What monitoring and alerting should a digital wallet have?
?
Business metrics: Transfer success rate (alert if < 99.9%), average transfer latency (alert if > 500ms), failed transfers per minute (spike = systemic issue). Consistency checks: Run a periodic integrity check over internal transfers: SUM of all CREDITED events - SUM of all DEBITED events = 0 (double-entry check). Compare materialized balance views to event replay at random intervals (detects event processor bugs). Financial checks: Nightly reconciliation of internal ledger vs external funding source (pay-in amounts). Infrastructure: Event store lag, event processor consumer lag (alert if > 10 seconds), Redis lock wait times, DB connection pool saturation.
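The two consistency checks can be sketched as pure functions over an event list. The tuple layout and the names `net_of_transfers` and `view_drift` are hypothetical, and the sketch assumes the list contains only internal transfer events.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, int, str]  # (user_id, type, amount, transfer_id)


def net_of_transfers(events: Iterable[Event]) -> int:
    """Double-entry check: credits minus debits; expected to be 0."""
    return sum(amount if kind == "CREDITED" else -amount
               for _user, kind, amount, _tid in events)


def view_drift(events: Iterable[Event],
               view: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {user: (view_balance, replayed_balance)} where the two disagree."""
    replayed: Dict[str, int] = defaultdict(int)
    for user, kind, amount, _tid in events:
        replayed[user] += amount if kind == "CREDITED" else -amount
    users = set(replayed) | set(view)
    return {u: (view.get(u, 0), replayed.get(u, 0))
            for u in users if view.get(u, 0) != replayed.get(u, 0)}
```

A non-zero `net_of_transfers` or a non-empty `view_drift` result would be the condition to alert on.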
What are the guarantees a digital wallet must provide at the API level?
?
(1) Exactly-once transfer: same idempotency_key always returns same result, never charges twice. (2) Atomicity: either both debit and credit happen or neither — no partial transfers visible. (3) Durability: once a 200 response is returned, the transfer is permanently recorded. (4) Read-your-writes: after a successful transfer, the sender’s balance query reflects the deduction. (5) Consistency: at no point should the sum of all balances change (money is conserved — total transferred out = total transferred in). (6) Auditability: every balance change has a corresponding immutable event with timestamp, transfer_id, and counterparty.
Total Cards: 25
Review Time: 20-25 minutes
Priority: HIGH - Very common in fintech interviews!
Last Updated: 2026-04-13