Chapter 11 Flashcards - Payment System

flashcards volume2 payments stripe fintech transactions

What is an idempotency key in payment systems and why is it essential?
?
A unique UUID the client generates per payment intent and reuses on every retry. The server stores the result of the first execution keyed by this UUID. On any retry the server returns the stored result without re-executing the payment. Essential because: network timeouts and crashes force retries, and without idempotency every retry risks a double charge. PSPs like Stripe support it natively via the Idempotency-Key HTTP header.

What happens when a payment request times out and the client retries?
?
Client MUST retry with the SAME idempotency key as the original request. Server checks the idempotency table: if original already executed (even if response was lost), return cached result — no re-execution. If original never executed, process normally. This means: “retry freely, but always carry the same idempotency key.” The key is useless if the client generates a new UUID per retry.
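
The retry behaviour can be sketched with a toy in-memory server. `PaymentServer`, its `results` dict, and the `charges` counter are hypothetical names for illustration; a real service would keep the results in a durable idempotency table.

```python
import uuid


class PaymentServer:
    """Toy server: stores the first result per idempotency key (illustrative only)."""

    def __init__(self):
        self.results = {}   # idempotency_key -> stored response
        self.charges = 0    # how many times a charge actually executed

    def pay(self, idempotency_key: str, amount_cents: int) -> dict:
        # Retry path: the first execution already stored a result, so replay it.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        # First execution: charge once and remember the response.
        self.charges += 1
        response = {"status": "SUCCESS", "amount_cents": amount_cents}
        self.results[idempotency_key] = response
        return response


server = PaymentServer()
key = str(uuid.uuid4())        # generated once per payment, reused on every retry
first = server.pay(key, 999)   # imagine this response is lost to a timeout
retry = server.pay(key, 999)   # same key, so the stored result is returned
print(server.charges)          # 1
```

Generating a fresh UUID for the retry would take the first-execution path again and charge twice.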

What is double-entry bookkeeping and why do payment systems use it?
?
Every financial transaction creates exactly TWO ledger entries: a debit (money leaves) on one account and a credit (money arrives) on another. The net sum of all entries always equals zero. Payment systems use it because: (1) complete audit trail: every dollar always has a location, (2) non-zero sum signals a data bug, (3) required by financial regulators, (4) reversals create new offsetting entries instead of modifying history. Described in print by Luca Pacioli in 1494, still industry standard.

Give a concrete double-entry example: buyer pays $100 to merchant.
?
Two ledger entries are created atomically: (1) Buyer account: DEBIT $100 (money leaves buyer), (2) Merchant account: CREDIT $100 (money arrives at merchant). Net: $100 debit + $100 credit = $0. If ledger sum ever ≠ 0, there is a bug. The ledger table is APPEND-ONLY: rows are never updated or deleted. Reversals (refunds) create two new offsetting entries.
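
A minimal sketch of an append-only double-entry ledger, assuming amounts in integer cents and a sign convention chosen here (negative = debit, positive = credit). `Ledger` and `LedgerEntry` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    account: str
    amount_cents: int   # negative = debit, positive = credit (convention of this sketch)


class Ledger:
    """Append-only: entries are added, never updated or deleted."""

    def __init__(self):
        self._entries = []

    def transfer(self, from_account: str, to_account: str, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        # Both entries are appended together so the ledger always sums to zero.
        self._entries.extend([
            LedgerEntry(from_account, -amount_cents),
            LedgerEntry(to_account, amount_cents),
        ])

    def total(self) -> int:
        return sum(e.amount_cents for e in self._entries)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self._entries if e.account == account)


ledger = Ledger()
ledger.transfer("buyer", "merchant", 10_000)   # $100 payment
ledger.transfer("merchant", "buyer", 10_000)   # refund: two new offsetting entries
print(ledger.total())                          # 0
```

The refund leaves four rows in the ledger rather than erasing the original two, which is what preserves the audit trail.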

What is a PSP (Payment Service Provider) and why should you NOT build your own?
?
A PSP (Stripe, Adyen, Braintree, Square) is an intermediary that connects your system to card networks (Visa, Mastercard) and banks. Reason NOT to build your own: (1) requires banking licenses in every country, (2) enormous compliance burden (PCI-DSS, SOC2, ISO 27001), (3) years of integration work per card network, (4) PSPs already handle fraud tools, chargebacks, and global currency conversion. Use a PSP — focus on your product, not banking infrastructure.

What are the states in a payment state machine and why must state be persisted BEFORE calling PSP?
?
States: PENDING (created, not sent) → EXECUTING (sent to PSP, awaiting response) → SUCCESS / FAILED / TIMED_OUT, plus SUCCESS → REFUNDED. State must be persisted BEFORE calling the external service because if the service crashes between calling the PSP and saving the result, you still need a record of what was attempted. DB-first pattern: Save EXECUTING → Call PSP → Save SUCCESS/FAILED. If a crash happens after saving EXECUTING, on restart you can query the PSP for every payment still in EXECUTING and reconcile.
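
The transitions and the DB-first ordering can be sketched as follows. The `ALLOWED` table, `transition`, and `execute_payment` are hypothetical helpers; a plain dict stands in for the database, and TIMED_OUT is treated as terminal here, which is an assumption.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    EXECUTING = "EXECUTING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    REFUNDED = "REFUNDED"


# Legal transitions; states missing from this table have no outgoing edges.
ALLOWED = {
    State.PENDING: {State.EXECUTING},
    State.EXECUTING: {State.SUCCESS, State.FAILED, State.TIMED_OUT},
    State.SUCCESS: {State.REFUNDED},
}


def transition(current: State, new: State) -> State:
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def execute_payment(db: dict, payment_id: str, call_psp) -> State:
    # DB-first: record EXECUTING before the external call is made.
    db[payment_id] = transition(db[payment_id], State.EXECUTING)
    ok = call_psp(payment_id)
    db[payment_id] = transition(db[payment_id], State.SUCCESS if ok else State.FAILED)
    return db[payment_id]
```

If the process dies inside `call_psp`, the stored state is EXECUTING, which is exactly what a recovery job looks for.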

What is a PSP webhook and why use it instead of just polling for payment status?
?
A webhook is an HTTP callback the PSP sends to your server when a payment completes (e.g., POST /v1/webhooks/psp with event: payment.succeeded). Webhooks are preferred because: (1) push-based — near-instant notification without polling overhead, (2) PSP retries on failure, (3) lower latency for buyer confirmation. BUT: webhook can be lost (server down), so always use polling as a fallback for payments stuck in EXECUTING state. Best: webhooks as fast path + polling as recovery mechanism.

How must webhook processing be idempotent?
?
PSPs retry webhooks on failure, so your webhook handler can receive the same event multiple times. Make it idempotent by: (1) storing the PSP’s unique event_id in a processed_events table, (2) on receipt, check if event_id already processed — if yes, return 200 and skip, (3) if no, process and record event_id atomically. Without this, a retried webhook can double-credit the merchant account or write duplicate ledger entries.
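
One way to get the "check and record atomically" step is to let a primary key reject duplicates inside a single transaction. This sketch uses Python's built-in `sqlite3`; the `merchant_credits` table and the event field names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE merchant_credits (event_id TEXT, amount_cents INTEGER)")


def handle_webhook(event: dict) -> int:
    """Process a PSP event at most once; return the HTTP status for the PSP."""
    try:
        with conn:  # one transaction: commits on success, rolls back on exception
            # The PRIMARY KEY makes a repeated event_id raise IntegrityError.
            conn.execute("INSERT INTO processed_events VALUES (?)",
                         (event["event_id"],))
            conn.execute("INSERT INTO merchant_credits VALUES (?, ?)",
                         (event["event_id"], event["amount_cents"]))
    except sqlite3.IntegrityError:
        pass  # already processed: acknowledge and skip
    return 200


evt = {"event_id": "evt_1", "amount_cents": 10_000}
handle_webhook(evt)
handle_webhook(evt)  # PSP retry of the same event
print(conn.execute("SELECT COUNT(*) FROM merchant_credits").fetchone()[0])  # 1
```

Returning 200 for a duplicate matters: a non-2xx response would make the PSP keep retrying an event that has already been handled.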

What is the nightly reconciliation process and what does it catch?
?
Reconciliation compares your internal ledger against the PSP’s transaction statement (CSV/API). It runs nightly (end of day). Catches: (1) payments stuck in EXECUTING that PSP marked SUCCESS (missed webhook) — auto-fix by updating status and writing ledger, (2) payments you marked SUCCESS that PSP marked FAILED — human review (refund buyer?), (3) amount mismatches — human review. Categories: Match (no action), auto-fixable (obvious discrepancy), human-review (ambiguous). Reconciliation is the safety net even after all real-time safeguards.
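
The three categories can be sketched as a pure classification function. The input shape (payment_id mapped to a status and an amount in cents) and the label strings are assumptions; a real job would read the PSP's settlement file or API.

```python
def reconcile(internal: dict, psp: dict) -> dict:
    """Classify each internal payment as 'match', 'auto_fix', or 'human_review'.

    Both inputs map payment_id -> (status, amount_cents).
    """
    report = {}
    for payment_id, (our_status, our_amount) in internal.items():
        psp_record = psp.get(payment_id)
        if psp_record is None:
            report[payment_id] = "human_review"   # PSP has no such payment
            continue
        psp_status, psp_amount = psp_record
        if our_amount != psp_amount:
            report[payment_id] = "human_review"   # amount mismatch
        elif our_status == psp_status:
            report[payment_id] = "match"
        elif our_status == "EXECUTING" and psp_status == "SUCCESS":
            report[payment_id] = "auto_fix"       # missed webhook
        else:
            report[payment_id] = "human_review"   # e.g. we say SUCCESS, PSP says FAILED
    return report


internal = {"a": ("SUCCESS", 100), "b": ("EXECUTING", 200), "c": ("SUCCESS", 300)}
psp = {"a": ("SUCCESS", 100), "b": ("SUCCESS", 200), "c": ("FAILED", 300)}
print(reconcile(internal, psp))
# {'a': 'match', 'b': 'auto_fix', 'c': 'human_review'}
```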

Why store money amounts as integers (cents) instead of floating-point numbers?
?
Floating-point (float/double) cannot represent all decimal values exactly in IEEE 754 binary format. Example: 0.1 + 0.2 = 0.30000000000000004. In financial systems, these micro-errors accumulate across billions of transactions creating real money discrepancies. Solution: store amounts in the smallest currency unit as integers — cents for USD ($9.99 = 999), paise for INR, yen for JPY (already integer). All arithmetic stays exact. Always convert to integer before any storage or calculation.
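
A short demonstration of the difference, assuming a two-decimal currency such as USD. `to_cents` is a hypothetical helper that parses a decimal string exactly instead of going through a float.

```python
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Parse a decimal string such as '9.99' into integer cents."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has more than two decimal places")
    return int(cents)


float_total = 0.1 + 0.2                           # binary floating point
int_total = to_cents("0.10") + to_cents("0.20")   # exact integer arithmetic

print(float_total == 0.3)               # False
print(int_total == to_cents("0.30"))    # True
print(to_cents("9.99"))                 # 999
```

Parsing from a string matters: converting the float `9.99` to cents by multiplying by 100 would reintroduce the rounding problem before the integer is ever stored.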

What is PCI-DSS and how does tokenization achieve compliance?
?
PCI-DSS (Payment Card Industry Data Security Standard) requires that any system storing, processing, or transmitting raw card numbers meet strict security controls. Tokenization: Stripe.js runs in the buyer’s browser, captures the card number, and sends it directly to Stripe’s servers; Stripe returns a single-use token (e.g., tok_visa_4242), which the browser forwards to your server. Your server NEVER sees the real card number, only the token. If your DB is breached, attackers cannot use tokens to charge cards. This dramatically reduces your PCI-DSS audit scope.

What is 3D Secure (3DS) and when is it triggered?
?
3DS is an additional authentication layer where the buyer’s bank verifies their identity (usually via SMS OTP). 3DS 1.0 used disruptive pop-up redirects. 3DS 2.0 is mostly frictionless (bank approves silently) with a challenge only for high-risk transactions. Triggered when: ML fraud score is high (e.g., > 0.5), large transaction amount, new device, unusual geography. Key benefit: when 3DS succeeds, liability for fraud shifts from the merchant to the issuing bank. Use 3DS 2.0 for high-risk transactions.

How does rule-based fraud detection differ from ML-based fraud detection?
?
Rule-based: Deterministic rules (block if > 5 failed attempts in 10 min, block if card used in 3+ countries in 1 hour, block if amount > $10K without 3DS). Pros: < 1ms, explainable, easy to update. Cons: fraudsters learn the rules, misses novel attack patterns. ML-based: Real-time model scores each transaction using features (transaction amount vs user’s average, device fingerprint, merchant category, geo-velocity). Output: 0.0–1.0 risk score. Pros: catches novel patterns, adapts to new fraud. Cons: black box, needs labeled training data, latency ~50-100ms. Use BOTH in production: rules first for obvious cases, ML for the rest.
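
A sketch of the "rules first, ML for the rest" pipeline, using the three example rules from this card. The `Txn` fields, the decision strings, and the `ml_score` callable are assumptions; a real model would sit behind `ml_score`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Txn:
    amount_cents: int
    failed_attempts_10min: int
    countries_last_hour: int
    passed_3ds: bool


def rule_check(t: Txn) -> Optional[str]:
    """Return a block reason, or None if no deterministic rule fires."""
    if t.failed_attempts_10min > 5:
        return "too_many_failed_attempts"
    if t.countries_last_hour >= 3:
        return "card_used_in_3plus_countries"
    if t.amount_cents > 1_000_000 and not t.passed_3ds:   # more than $10K
        return "large_amount_without_3ds"
    return None


def decide(t: Txn, ml_score: Callable[[Txn], float], threshold: float = 0.5) -> str:
    reason = rule_check(t)          # cheap, explainable rules run first
    if reason is not None:
        return f"BLOCK:{reason}"
    score = ml_score(t)             # 0.0 to 1.0 risk score from the model
    return "CHALLENGE_3DS" if score > threshold else "ALLOW"


print(decide(Txn(5_000, 0, 1, False), lambda t: 0.1))   # ALLOW
```

Running the rules first means the slower model call is skipped entirely for transactions that are obviously bad.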

What features does an ML fraud model use for real-time scoring?
?
Transaction features: amount (vs user’s historical average), time of day, day of week, merchant category. User features: account age, historical decline rate, number of cards used. Device features: device fingerprint (new or known?), IP address reputation, geo-velocity (distance from last transaction / time). Card features: issuing bank country vs transaction country, card type. Output latency requirement: < 100ms — must not slow down the checkout experience. Feature store needed for fast feature lookup at scoring time.

What is the difference between a pay-in and a pay-out flow?
?
Pay-in: Buyer pays money TO the platform. Flow: Buyer → Payment Service → PSP → Card Network → Issuing Bank. Result: money lands in merchant’s platform balance. Pay-out: Merchant withdraws money FROM platform to their bank account. Flow: Payment Service → PSP → ACH/Wire → Merchant’s bank. Key difference: pay-in uses card rails (Visa/Mastercard) with near-real-time processing; pay-out uses bank rails (ACH) which can take 1-3 business days. Both require idempotency and double-entry bookkeeping.

How does the payment service handle a crash in the EXECUTING state?
?
On restart (or via a background recovery job): (1) query DB for all payments in EXECUTING state older than timeout threshold (e.g., > 30 seconds), (2) for each, query PSP using the stored psp_reference (PSP’s transaction ID), (3) if PSP says SUCCESS → update DB to SUCCESS, write ledger entries, (4) if PSP says FAILED → update DB to FAILED, (5) if PSP has no record → payment never reached PSP, safe to retry. This ensures no payment stays stuck in EXECUTING forever.
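
The recovery steps can be sketched as a single pass over stuck payments. The record shape and the `query_psp` callable (returning "SUCCESS", "FAILED", or None when the PSP has no record) are assumptions standing in for the database and the PSP's status API.

```python
def recover_stuck_payments(payments: dict, query_psp, now: float,
                           timeout_s: float = 30) -> list:
    """Resolve payments stuck in EXECUTING; return ids that are safe to retry.

    payments maps id -> {"state", "updated_at", "psp_reference"} and is
    updated in place.
    """
    safe_to_retry = []
    for payment_id, p in payments.items():
        if p["state"] != "EXECUTING" or now - p["updated_at"] <= timeout_s:
            continue                       # not stuck (yet)
        psp_status = query_psp(p["psp_reference"])
        if psp_status == "SUCCESS":
            p["state"] = "SUCCESS"         # ledger entries would be written here too
        elif psp_status == "FAILED":
            p["state"] = "FAILED"
        else:
            # PSP never saw the payment: retry with the same idempotency key.
            safe_to_retry.append(payment_id)
    return safe_to_retry


payments = {
    "p1": {"state": "EXECUTING", "updated_at": 0, "psp_reference": "r1"},
    "p2": {"state": "EXECUTING", "updated_at": 0, "psp_reference": "r2"},
}
psp_view = {"r1": "SUCCESS"}               # the PSP knows r1 only
print(recover_stuck_payments(payments, psp_view.get, now=100))   # ['p2']
```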

What retry strategy should you use for failed PSP calls?
?
Exponential backoff with a maximum retry count: Attempt 1 immediately, then wait 1s, 2s, 4s, 8s… Max 5 retries. ALWAYS use the same idempotency key across all retries (prevents duplicate charges). After max retries: mark payment as FAILED, alert the operations team, notify the buyer. Jitter (random offset on wait times) is recommended at scale to prevent retry storms from multiple concurrent payments all retrying simultaneously.
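
A minimal sketch of the schedule, assuming the PSP client signals a transient failure by raising `ConnectionError`. `backoff_delays` and `call_with_retries` are hypothetical helpers, and the jitter width of 0.5s is an arbitrary choice.

```python
import random
import time


def backoff_delays(max_retries: int = 5, base_s: float = 1.0,
                   jitter_s: float = 0.5, rng=random.random) -> list:
    """Waits before each retry: 1s, 2s, 4s, ... plus a random offset in [0, jitter_s)."""
    return [base_s * 2 ** i + rng() * jitter_s for i in range(max_retries)]


def call_with_retries(call_psp, idempotency_key: str,
                      max_retries: int = 5, sleep=time.sleep):
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):       # one initial attempt plus the retries
        try:
            return call_psp(idempotency_key)     # same key on every attempt
        except ConnectionError:
            if attempt == max_retries:
                raise                            # caller marks FAILED and alerts ops
            sleep(delays[attempt])


print(backoff_delays(rng=lambda: 0.0))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing `sleep` and `rng` as parameters keeps the function testable without real waiting.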

What data does the idempotency table store and how long are keys retained?
?
Idempotency table stores: idempotency_key (PK, UUID), payment_id (FK to payment order), response_code (HTTP status), response_body (JSON, the full response), created_at, expires_at. Retention: typically 24 hours (configurable). After expiry the key is cleaned up, so a later request carrying it is treated as new; a new payment should use a new UUID. Do not extend the TTL, so that stale results are never replayed. The response_body is stored so that a retry receives the identical response, not just the same status code.
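
The columns above map onto a table like the following, sketched with Python's built-in `sqlite3`. Column types, the `save_response` and `lookup` helpers, and the use of Unix timestamps are assumptions; the foreign key on payment_id is omitted to keep the sketch short.

```python
import json
import sqlite3
import time

TTL_SECONDS = 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        idempotency_key TEXT PRIMARY KEY,
        payment_id      TEXT NOT NULL,
        response_code   INTEGER NOT NULL,
        response_body   TEXT NOT NULL,   -- full JSON response
        created_at      REAL NOT NULL,
        expires_at      REAL NOT NULL
    )
""")


def save_response(key, payment_id, code, body, now=None):
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?, ?, ?)",
            (key, payment_id, code, json.dumps(body), now, now + TTL_SECONDS),
        )


def lookup(key, now=None):
    """Return (code, body) for an unexpired key, else None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT response_code, response_body FROM idempotency_keys "
        "WHERE idempotency_key = ? AND expires_at > ?",
        (key, now),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))
```

A separate cleanup job would delete rows whose `expires_at` has passed; the `expires_at > ?` filter only hides them from lookups.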

Why does the payment service use a relational database (not NoSQL)?
?
Financial data requires: (1) ACID transactions — writing payment status + ledger entries + wallet update atomically, (2) strong consistency — balance queries must always be accurate, not eventually consistent, (3) foreign key constraints — prevent orphaned ledger entries, (4) complex queries — reconciliation queries joining payments + ledger + PSP references. PostgreSQL or MySQL are standard choices. NoSQL (Cassandra, DynamoDB) trades consistency for availability/scale — acceptable for social feeds, not for money. Use NoSQL only for non-financial data (e.g., audit logs, analytics).
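
The atomicity point can be illustrated with `sqlite3` from the Python standard library standing in for PostgreSQL or MySQL. The table layout, the `settle` function, and its `fail_midway` switch are assumptions made to simulate a crash between the two ledger writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE ledger (
        payment_id   TEXT NOT NULL REFERENCES payments(id),
        account      TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO payments VALUES ('p1', 'EXECUTING');
""")


def settle(payment_id, buyer, merchant, amount_cents, fail_midway=False):
    with conn:  # single transaction: all three writes commit, or none do
        conn.execute("UPDATE payments SET status = 'SUCCESS' WHERE id = ?",
                     (payment_id,))
        conn.execute("INSERT INTO ledger VALUES (?, ?, ?)",
                     (payment_id, buyer, -amount_cents))
        if fail_midway:
            raise RuntimeError("simulated crash between the two ledger writes")
        conn.execute("INSERT INTO ledger VALUES (?, ?, ?)",
                     (payment_id, merchant, amount_cents))


try:
    settle("p1", "buyer", "merchant", 10_000, fail_midway=True)
except RuntimeError:
    pass
# The rollback undid the status update and the lone debit row.
print(conn.execute("SELECT status FROM payments").fetchone()[0])   # EXECUTING
```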

What is the wallet service responsible for and how does it update balances?
?
Wallet service tracks real-time balances for all users and merchants. Responsibilities: (1) balance storage (current amount per user per currency), (2) balance updates (debit buyer, credit merchant on payment success), (3) balance queries (for checkout — check if sufficient funds in platform wallet), (4) balance history (via ledger). Updates must be: atomic with ledger entry creation (DB transaction), consistent (never show intermediate state), idempotent (driven by payment_id to prevent double-credit). Balance = sum of all ledger credits minus debits, but materialized for query performance.
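
A toy version of the idempotent balance update, keyed by payment_id. The `Wallet` class and its in-memory dicts are hypothetical; in a real service the balance change, the ledger rows, and the "applied" marker would be written in one database transaction.

```python
class Wallet:
    """Materialized balances with payment_id-based deduplication (illustrative)."""

    def __init__(self):
        self.balances = {}     # (account, currency) -> balance in minor units
        self.applied = set()   # payment_ids that have already been applied

    def apply_payment(self, payment_id, buyer, merchant, amount, currency) -> bool:
        if payment_id in self.applied:
            return False       # duplicate delivery: balances stay unchanged
        buyer_key, merchant_key = (buyer, currency), (merchant, currency)
        if self.balances.get(buyer_key, 0) < amount:
            raise ValueError("insufficient funds")
        self.balances[buyer_key] = self.balances.get(buyer_key, 0) - amount
        self.balances[merchant_key] = self.balances.get(merchant_key, 0) + amount
        self.applied.add(payment_id)
        return True


wallet = Wallet()
wallet.balances[("buyer", "USD")] = 10_000
wallet.apply_payment("pay_1", "buyer", "merchant", 2_500, "USD")
wallet.apply_payment("pay_1", "buyer", "merchant", 2_500, "USD")   # ignored
print(wallet.balances[("merchant", "USD")])                        # 2500
```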

How does the API Gateway interact with the payment service?
?
API Gateway sits in front of the Payment Service and handles: (1) authentication (validate API key or JWT), (2) TLS termination (HTTPS), (3) rate limiting (prevent payment API abuse — e.g., max 100 requests/min per merchant), (4) request routing to correct service version, (5) logging and tracing (distributed trace ID per payment). The payment service itself handles business logic. API Gateway is the entry point for both merchant-facing API calls and PSP webhooks (usually on a separate secured endpoint).

How do you design the payment system database schema for multi-currency support?
?
Key rules: (1) store amount as BIGINT in minor units (cents, paise, yen — depends on currency), (2) store currency as CHAR(3) ISO 4217 code (USD, EUR, INR), (3) never mix currencies in one arithmetic operation without explicit conversion, (4) store exchange_rate and converted_amount separately at time of transaction (rates change!), (5) ledger entries always in the transaction’s original currency — never auto-convert in the ledger. Settlement currency (what the merchant receives) is a separate field. Example: amount=9999, currency=“USD” means $99.99.
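
Rules (1) to (3) can be enforced in code with a small value type. `Money`, `format_money`, and the `MINOR_UNIT_DIGITS` table (covering only four currencies) are hypothetical names for this sketch, and formatting assumes non-negative amounts.

```python
from dataclasses import dataclass

# Decimal places of the minor unit per ISO 4217 (a small subset).
MINOR_UNIT_DIGITS = {"USD": 2, "EUR": 2, "INR": 2, "JPY": 0}


@dataclass(frozen=True)
class Money:
    amount: int     # minor units: cents, paise, yen
    currency: str   # ISO 4217 code

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            # Mixing currencies requires an explicit conversion step first.
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)


def format_money(m: Money) -> str:
    digits = MINOR_UNIT_DIGITS[m.currency]
    if digits == 0:
        return f"{m.amount} {m.currency}"
    major, minor = divmod(m.amount, 10 ** digits)
    return f"{major}.{minor:0{digits}d} {m.currency}"


print(format_money(Money(9999, "USD")))   # 99.99 USD
print(format_money(Money(500, "JPY")))    # 500 JPY
```

Because the currency travels with the amount, an accidental `USD + EUR` fails loudly instead of producing a plausible-looking number.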

What is the relationship between the payment service and the ledger service?
?
Payment Service is the orchestrator — it manages the payment lifecycle and calls other services. Ledger Service is called ONLY after a payment reaches a terminal state (SUCCESS or REFUNDED). The Payment Service calls Ledger Service to write the double-entry bookkeeping records. The Ledger Service is append-only and has no update/delete APIs — only INSERT. This separation keeps bookkeeping concerns isolated and ensures the ledger cannot be modified through the payment service code path.

How do you scale the payment system to handle 10x growth (1M → 10M TPS)?
?
Database: Shard payment_orders by buyer_id or payment_id (consistent hashing). Ledger: Shard by account_id. Caching: Cache idempotency key lookups in Redis (fast duplicate check before DB hit). PSP: Use multiple PSP connections (load balance across Stripe accounts or use multiple PSPs by geography). Async processing: Move non-critical steps (notifications, analytics) to message queue (Kafka). Read replicas: Route balance queries and reconciliation reports to read replicas. Regional deployment: Deploy in multiple regions, route buyers to nearest region.

What are the key differences between a payment system and a digital wallet?
?
Payment System: Moves money between buyer and merchant via external card networks and banks. Money comes from and goes to EXTERNAL accounts. Relies on PSP for card network access. Core concern: reliability and fraud prevention. Key pattern: idempotency + PSP integration. Digital Wallet: Stores money WITHIN the platform and transfers between internal accounts (user to user). All transfers are internal ledger operations. Core concern: consistency of internal balances. Key pattern: atomic ledger updates, event sourcing. A full product like PayPal has BOTH: a payment system (pay-in/pay-out) and a digital wallet (internal balance management).

Total Cards: 25
Review Time: 20-25 minutes
Priority: HIGH - Very common in fintech interviews!
Last Updated: 2026-04-13

Study Notes by Niladri & AI
