Chapter 11: Design a Payment System
Status: Interview ready - Very common for fintech interviews!
Difficulty: Very Hard
Time to complete: 60 min read + practice
Overview
Payment systems are the backbone of e-commerce, fintech, and SaaS platforms. Think PayPal, Stripe, Square: they process billions of dollars every day. A payment system must never lose money, never double-charge, and recover gracefully from every kind of failure.
Why this matters:
- Very common interview question at fintech companies (Stripe, Square, PayPal, Robinhood, Chime)
- Also appears at FAANG-level interviews when the candidate has a fintech background
- Teaches distributed transactions, idempotency, consistency, and event-driven design
- The hardest non-negotiable constraint: exactly-once processing (no double charges)
Problem Statement
Design a payment system that:
- Processes payments reliably (no money lost, no double charges)
- Handles failures and retries safely
- Integrates with external banks and card networks
- Provides an audit trail of every transaction
- Scales to millions of transactions per day
Step 1: Requirements & Scope (5 min)
Functional Requirements
Clarifying questions:
- What type of payments? → Credit/debit card, bank transfer (ACH), digital wallets
- Scale? → 1 million transactions per day (~12 TPS average, ~100 TPS peak)
- Who are our users? → Merchants (businesses) collecting payments from buyers
- Handle refunds? → Yes, full and partial refunds
- Multi-currency? → Yes, with currency conversion
- Store card numbers? → No, for PCI-DSS compliance (use tokenization via the PSP)
Scope:
- Process pay-in (buyer pays merchant)
- Process pay-out (merchant withdraws to bank account)
- Maintain ledger (bookkeeping of all transactions)
- Reconciliation with PSP at end of day
- Retry failed payments safely with idempotency
Non-Functional Requirements
- Exactly-once processing: No double charges under any failure scenario
- Strong consistency: Financial data must always be accurate
- High availability: 99.99% (payments must not go down)
- Low latency: < 1 second for payment confirmation
- PCI-DSS compliance: Never store raw card numbers
- Auditability: Every state change immutably logged
- Reconciliation: Daily balance check against PSP
Scale Estimates
Transactions per day: 1,000,000
Average TPS: 1,000,000 / 86,400 ≈ 12 TPS
Peak TPS: ~100 TPS (10x headroom)
Storage per transaction: ~1 KB
Daily storage: 1M × 1 KB = 1 GB/day
Yearly storage: ~365 GB/year (manageable with sharding)
Ledger entries per transaction: 2 (double-entry: debit + credit)
Ledger rows per day: 2,000,000
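The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `estimate` is made up for illustration, and it assumes 1 KB = 1,000 bytes and exactly two ledger rows per transaction.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(tx_per_day: int, bytes_per_tx: int = 1_000, ledger_rows_per_tx: int = 2) -> dict:
    """Rough capacity numbers for a given daily transaction volume."""
    daily_gb = tx_per_day * bytes_per_tx / 1e9
    return {
        "avg_tps": round(tx_per_day / SECONDS_PER_DAY, 1),
        "daily_gb": daily_gb,
        "yearly_gb": daily_gb * 365,
        "ledger_rows_per_day": tx_per_day * ledger_rows_per_tx,
    }


numbers = estimate(1_000_000)
```

Changing `tx_per_day` is a quick way to see how the estimates move if the interviewer changes the scale.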
Step 2: High-Level Design (10 min)
Core Entities
Payment Order: Intent to pay (created before sending to bank)
Transaction: Completed or attempted payment
Ledger Entry: Immutable accounting record (double-entry)
Wallet: User/merchant balance
Key External Actor: PSP (Payment Service Provider)
Why NOT build our own connection to Visa/Mastercard/banks?
- Requires banking licenses in every country
- Enormously complex compliance (PCI-DSS, SOC2, ISO 27001)
- Years of integration work
- Instead: Use Stripe, Adyen, or Braintree; they handle it for you
PSPs we integrate with:
| PSP | Speciality |
|---|---|
| Stripe | Developer-friendly, global cards |
| Adyen | Enterprise, multi-currency, global |
| Braintree (PayPal) | PayPal integration, US-focused |
| Square | In-person point-of-sale |
Payment Flow (Pay-in)
1. Checkout: Buyer → Payment Service (Orchestrator)
2. Payment Service creates the payment order with an idempotency key
3. Payment Service saves the order as PENDING to the Orders DB
4. Payment Service forwards the payment to the PSP Gateway (Stripe/Adyen)
5. PSP sends the payment to the Card Network (Visa/Mastercard)
6. The Issuing Bank (buyer's bank) approves or declines
7. PSP webhook → Payment Service → Update PENDING → SUCCESS
8. Payment Service → Ledger Service (double-entry booking)
Core Services
The Payment Platform consists of:
- Payment Service (Orchestrator): drives each payment and calls the Ledger Service and the Wallet Service
- Ledger Service (double-entry books): immutable entries
- Wallet Service (user balances): balance queries and balance updates
- PSP Adaptor (Stripe/Adyen)
- Reconciliation Service (nightly EOD checks)
API Design
Create payment order:
POST /v1/payments
Request:
{
"buyer_id": "user_123",
"seller_id": "merchant_456",
"amount": 9999, // In cents (avoid float rounding)
"currency": "USD",
"payment_method": "card",
"idempotency_key": "550e8400-e29b-41d4-a716-446655440000"
}
Response (202 Accepted):
{
"payment_id": "pay_abc123",
"status": "PENDING",
"checkout_url": "https://stripe.com/pay/abc123"
}
Query payment status:
GET /v1/payments/{payment_id}
Response:
{
"payment_id": "pay_abc123",
"status": "SUCCESS",
"amount": 9999,
"currency": "USD",
"created_at": "2026-04-13T10:00:00Z",
"completed_at": "2026-04-13T10:00:03Z"
}
PSP Webhook (incoming):
POST /v1/webhooks/psp
{
"event": "payment.succeeded",
"payment_id": "pay_abc123",
"psp_reference": "stripe_ch_abc123",
"amount": 9999,
"currency": "USD",
"timestamp": "2026-04-13T10:00:03Z"
}
Step 3: Deep Dive (25 min)
Deep Dive 1: Exactly-Once Payment (Idempotency)
The problem: Networks fail. Servers crash. Clients retry. Without idempotency, every retry is a potential double charge.
Scenario: Double charge without idempotency
t=0s: Client sends payment request to Payment Service
t=1s: Payment Service forwards to PSP (Stripe)
t=2s: PSP charges bank → SUCCESS
t=2s: Network drops → Client never gets response
t=3s: Client times out → Retries
t=3s: Payment Service sends SECOND request to PSP
t=4s: PSP charges bank AGAIN → DOUBLE CHARGE!
Solution: Idempotency Key
Every payment request includes a unique idempotency_key (UUID).
The server stores the result of the first execution.
On retry with same key → return stored result, skip re-execution.
Implementation (idempotency layer):
1. Client generates a UUID: "550e8400-e29b-41d4-a716..."
2. Payment Service receives the request and checks the DB: has this idempotency_key been processed?
- If YES → Return cached response (no re-execution)
- If NO → Process payment, save result with key
3. On any retry with the same key: Check → Found → Return same response. Never charge twice.
Idempotency table in DB:
CREATE TABLE idempotency_keys (
idempotency_key VARCHAR(255) PRIMARY KEY,
payment_id VARCHAR(255) NOT NULL,
response_code INT,
response_body JSON,
created_at TIMESTAMP DEFAULT NOW(),
expires_at TIMESTAMP -- TTL: clean up after 24 hours
);
Key rules:
- Client MUST generate a new UUID per unique payment intent
- Client MUST reuse the same UUID on retries of the SAME payment
- Server stores result with TTL (24 hours is typical)
- PSPs (Stripe, Adyen) support idempotency natively via the Idempotency-Key header
Stripe example (illustrative; Stripe's API takes form-encoded parameters, shown here as JSON for readability):
POST https://api.stripe.com/v1/charges
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
{
"amount": 9999,
"currency": "usd",
"source": "tok_visa"
}
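The check-then-execute flow can be sketched in a few lines. This is a minimal single-threaded illustration, not a production design: `IdempotencyStore` is an in-memory stand-in for the idempotency_keys table, and `process_payment` and the injected `charge` callable are hypothetical names. A real service would also need the table's PRIMARY KEY constraint (or a lock) so that two concurrent requests with the same key cannot both reach the PSP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class IdempotencyStore:
    """In-memory stand-in for the idempotency_keys table."""
    _results: Dict[str, dict] = field(default_factory=dict)

    def get(self, key: str) -> Optional[dict]:
        return self._results.get(key)

    def save(self, key: str, response: dict) -> None:
        self._results[key] = response


def process_payment(store: IdempotencyStore, key: str, amount_cents: int,
                    charge: Callable[[int], dict]) -> dict:
    """Return the stored response for a known key; otherwise charge once and store it."""
    cached = store.get(key)
    if cached is not None:
        return cached              # retry: no second charge
    response = charge(amount_cents)
    store.save(key, response)      # remember the outcome for later retries
    return response
```

A retry with the same key returns the stored response without calling `charge` again, while a new key is treated as a new payment.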
Deep Dive 2: Double-Entry Bookkeeping
The rule: Every financial transaction produces exactly TWO ledger entries, and the sum of all ledger entries always equals zero:
- A debit on one account (money leaves)
- A credit on another account (money arrives)
Why this matters:
- Catch bugs: If sum ≠ 0, something is wrong in the system
- Audit trail: Every dollar is always accounted for somewhere
- Standard in accounting since the 1400s and an industry requirement
Example: Buyer pays $100 to Merchant:
| Account | Debit ($) | Credit ($) | Balance ($) |
|---|---|---|---|
| Buyer Account | 100.00 | | -100.00 |
| Merchant Account | | 100.00 | +100.00 |
| Net balance | 100.00 | 100.00 | 0.00 |
Ledger table (append-only, NEVER update):
CREATE TABLE ledger_entries (
entry_id BIGINT PRIMARY KEY AUTO_INCREMENT,
transaction_id VARCHAR(255) NOT NULL,
account_id VARCHAR(255) NOT NULL,
entry_type ENUM('DEBIT', 'CREDIT') NOT NULL,
amount BIGINT NOT NULL, -- In cents, always positive
currency CHAR(3) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
-- No updated_at column: rows are immutable
-- NEVER UPDATE or DELETE rows
);
Why immutable (append-only):
- Financial regulations require full audit history
- The past cannot be altered; corrections are made with new correcting entries (reversals)
- Protects against bugs silently modifying history
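To make the invariant concrete, here is a small sketch of an append-only ledger in memory. The names `LedgerEntry`, `record_transfer`, `net_sum`, and `balance` are illustrative only; the sign convention (debit negative, credit positive) follows the example table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class LedgerEntry:
    transaction_id: str
    account_id: str
    entry_type: str  # "DEBIT" or "CREDIT"
    amount: int      # cents, always positive


def record_transfer(ledger: List[LedgerEntry], transaction_id: str,
                    from_account: str, to_account: str, amount_cents: int) -> None:
    """Append one debit and one credit for the same amount."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    ledger.append(LedgerEntry(transaction_id, from_account, "DEBIT", amount_cents))
    ledger.append(LedgerEntry(transaction_id, to_account, "CREDIT", amount_cents))


def signed(entry: LedgerEntry) -> int:
    return entry.amount if entry.entry_type == "CREDIT" else -entry.amount


def net_sum(ledger: List[LedgerEntry]) -> int:
    """Should always be zero; anything else signals a bug."""
    return sum(signed(e) for e in ledger)


def balance(ledger: List[LedgerEntry], account_id: str) -> int:
    return sum(signed(e) for e in ledger if e.account_id == account_id)
```

Balances are derived by summing entries rather than stored and overwritten, which is what keeps the history auditable.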
Deep Dive 3: Payment State Machine
Every payment goes through defined states. Persisting state in DB before acting on it ensures we can always recover.
- PENDING: Created, not yet sent to PSP
- PENDING → EXECUTING: Sent to PSP, awaiting response
- EXECUTING → SUCCESS, FAILED, or TIMED_OUT
- SUCCESS → REFUNDED: Merchant initiates refund
State transitions stored in DB:
CREATE TABLE payment_orders (
payment_id VARCHAR(255) PRIMARY KEY,
buyer_id VARCHAR(255) NOT NULL,
seller_id VARCHAR(255) NOT NULL,
amount BIGINT NOT NULL,
currency CHAR(3) NOT NULL,
status ENUM('PENDING','EXECUTING','SUCCESS','FAILED',
'TIMED_OUT','REFUNDED') NOT NULL,
idempotency_key VARCHAR(255) UNIQUE NOT NULL,
psp_reference VARCHAR(255), -- PSP's internal ID for reconciliation
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP
);
Rule: Always write state to DB BEFORE taking action. If the service crashes, the state is preserved and we can resume.
WRONG (data loss on crash):
1. Call PSP (crash here: we don't know if the PSP was called)
2. Save to DB
CORRECT (safe recovery):
1. Save EXECUTING to DB (crash here: recovery resumes from EXECUTING)
2. Call PSP
3. Save SUCCESS/FAILED to DB
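The state machine and the DB-first rule can be sketched together. This is an illustration under assumptions: a plain dict stands in for the payment_orders table, `call_psp` is a hypothetical callable returning True on approval, and FAILED, TIMED_OUT, and REFUNDED are treated as terminal because the state list shows no transitions out of them.

```python
from typing import Callable, Dict

ALLOWED_TRANSITIONS = {
    "PENDING": {"EXECUTING"},
    "EXECUTING": {"SUCCESS", "FAILED", "TIMED_OUT"},
    "SUCCESS": {"REFUNDED"},
    "FAILED": set(),
    "TIMED_OUT": set(),
    "REFUNDED": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def execute_payment(db: Dict[str, str], payment_id: str,
                    call_psp: Callable[[str], bool]) -> str:
    # 1. Persist EXECUTING before the external call (DB-first rule).
    db[payment_id] = transition(db[payment_id], "EXECUTING")
    # 2. Call the PSP; a crash here leaves a recoverable EXECUTING row.
    approved = call_psp(payment_id)
    # 3. Persist the outcome.
    db[payment_id] = transition(db[payment_id], "SUCCESS" if approved else "FAILED")
    return db[payment_id]
```

Rejecting illegal transitions in one place also protects against late or duplicate events moving a finished payment backwards.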
Deep Dive 4: Handling Failures
Failure type 1: Payment service crashes mid-transaction
Problem: Service crashes after saving EXECUTING but before getting PSP response
Solution:
- On restart, find all EXECUTING payments older than timeout threshold
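- Query PSP for their status (using the PSP reference stored in DB, or our own payment_id / idempotency key if the crash happened before a reference was saved)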
- Update DB with PSP's response
- This is called "payment reconciliation on startup"
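A recovery sweep along these lines might look as follows. It is a sketch with assumed names: `orders` is a dict standing in for the payment_orders table, timestamps are plain seconds, and `query_psp` is a hypothetical lookup that returns "SUCCESS", "FAILED", or something else when the PSP has no final answer yet.

```python
from typing import Callable, Dict, List


def recover_stuck_payments(orders: Dict[str, dict], now: float, timeout_seconds: float,
                           query_psp: Callable[[str], str]) -> List[str]:
    """Resolve EXECUTING payments older than the timeout by asking the PSP."""
    resolved = []
    for payment_id, order in orders.items():
        is_stuck = (order["status"] == "EXECUTING"
                    and now - order["updated_at"] > timeout_seconds)
        if not is_stuck:
            continue
        psp_status = query_psp(payment_id)
        if psp_status in ("SUCCESS", "FAILED"):  # only apply final answers
            order["status"] = psp_status
            order["updated_at"] = now
            resolved.append(payment_id)
    return resolved
```

Payments that are still within the timeout are left alone, since their webhook may simply not have arrived yet.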
Failure type 2: Network timeout (client doesn't know if payment succeeded)
Problem: Client sends payment → times out → is the payment done?
Timeline:
t=0: Client → Payment Service (request sent)
t=5: Network timeout → Client doesn't know result
t=6: Client retries with SAME idempotency_key
Solution:
- If payment was already executed → idempotency key lookup returns SUCCESS
- If payment was not executed → safe to retry (idempotency prevents duplicate)
- Always retry with same idempotency_key
Failure type 3: PSP webhook not received
Problem: PSP sends webhook → Payment service is down → Webhook lost
Solutions:
a) Retry mechanism: PSP retries webhook with exponential backoff
b) Polling: Payment service polls PSP for status of EXECUTING payments
c) Both: Use webhook as fast path, polling as fallback
Retry strategy (exponential backoff):
Attempt 1: Immediate
Attempt 2: Wait 1 second
Attempt 3: Wait 2 seconds
Attempt 4: Wait 4 seconds
Attempt 5: Wait 8 seconds
Max retries: 5 (then move to FAILED, alert operations team)
Always use same idempotency_key across all retries
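The schedule above (immediate, then 1, 2, 4, 8 seconds) can be sketched like this. The function names are illustrative, `operation` is a hypothetical callable that takes the idempotency key, and only `ConnectionError` is treated as retryable here; a real client would choose its own retryable error set and would usually add random jitter.

```python
import time
from typing import Callable, List


def backoff_delays(max_attempts: int = 5, base_seconds: float = 1.0) -> List[float]:
    """Delay before each attempt: 0, then base * 1, 2, 4, ..."""
    return [0.0] + [base_seconds * 2 ** i for i in range(max_attempts - 1)]


def call_with_retries(operation: Callable[[str], dict], idempotency_key: str,
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry on connection errors, always passing the SAME idempotency key."""
    last_error = None
    for delay in backoff_delays(max_attempts):
        if delay:
            sleep(delay)
        try:
            return operation(idempotency_key)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("retries exhausted") from last_error
```

Passing `sleep` as a parameter keeps the function testable without real waiting.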
Webhook processing (idempotent):
POST /v1/webhooks/psp
{
"event_id": "evt_abc123", // PSP's unique event ID
"event": "payment.succeeded",
"payment_id": "pay_abc123"
}
Server:
1. Check if event_id already processed → skip if yes (idempotent)
2. Update payment_orders status → SUCCESS
3. Write ledger entries (double-entry)
4. Update wallet balances
5. Notify merchant
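The first three server steps can be sketched as a handler that deduplicates on event_id. Everything here is an in-memory stand-in with assumed names (`handle_webhook`, the `orders` dict, the `ledger` list); wallet updates and merchant notification are omitted, and in a real system the steps would run inside one database transaction.

```python
from typing import Dict, List, Set, Tuple


def handle_webhook(event: dict, processed_event_ids: Set[str],
                   orders: Dict[str, dict], ledger: List[Tuple[str, str, str, int]]) -> str:
    """Apply a PSP event at most once, keyed by the PSP's event_id."""
    if event["event_id"] in processed_event_ids:
        return "duplicate"  # already applied: do nothing

    payment_id = event["payment_id"]
    order = orders[payment_id]
    if event["event"] == "payment.succeeded":
        order["status"] = "SUCCESS"
        # Double-entry: one debit and one credit for the same amount.
        ledger.append((payment_id, order["buyer_id"], "DEBIT", order["amount"]))
        ledger.append((payment_id, order["seller_id"], "CREDIT", order["amount"]))

    processed_event_ids.add(event["event_id"])
    return "processed"
```

Because PSPs retry webhooks, delivering the same event twice is normal, and the second delivery must not write a second pair of ledger entries.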
Deep Dive 5: Reconciliation
Why reconciliation? Even with idempotency and retries, distributed systems can have discrepancies. Reconciliation is the safety net.
Nightly reconciliation flow:
1. 11:59 PM: Fetch PSP statement (CSV/API)
2. Compare with internal ledger:
| Payment | PSP says | Our DB | Result | Action |
|---|---|---|---|---|
| pay_001 | $100 SUCCESS | $100 SUCCESS | Match | None |
| pay_002 | $50 SUCCESS | EXECUTING | Mismatch (we missed the webhook) | Auto-fix: update to SUCCESS, write ledger |
| pay_003 | $200 FAILED | SUCCESS | Mismatch (PSP failed but we marked success) | Alert: human review needed (refund buyer?) |
3. 12:30 AM: Reconciliation report generated
- Matched: 999,850 / 1,000,000 (99.985%)
- Auto-fixed: 148
- Human review: 2
Reconciliation categories:
| Category | Description | Action |
|---|---|---|
| Match | Both agree on amount + status | None needed |
| Missing in internal | PSP has it, we don't | Insert from PSP data |
| Missing in PSP | We have it, PSP doesn't | Investigate (fraud?) |
| Amount mismatch | Same ID, different amount | Human review |
| Status mismatch | Same ID, different status | Auto-fix if our side is still EXECUTING (missed webhook); otherwise human review |
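The categories in the table map directly onto a comparison function. The sketch below assumes both sides are already loaded into dicts keyed by payment ID with `amount` and `status` fields; the function name and category labels are illustrative.

```python
from typing import Dict


def reconcile(psp_records: Dict[str, dict], internal_records: Dict[str, dict]) -> Dict[str, str]:
    """Classify every payment ID seen on either side."""
    report = {}
    for payment_id in sorted(psp_records.keys() | internal_records.keys()):
        psp = psp_records.get(payment_id)
        ours = internal_records.get(payment_id)
        if ours is None:
            report[payment_id] = "MISSING_IN_INTERNAL"
        elif psp is None:
            report[payment_id] = "MISSING_IN_PSP"
        elif psp["amount"] != ours["amount"]:
            report[payment_id] = "AMOUNT_MISMATCH"
        elif psp["status"] != ours["status"]:
            report[payment_id] = "STATUS_MISMATCH"
        else:
            report[payment_id] = "MATCH"
    return report
```

The function only classifies; deciding which mismatches are auto-fixed and which go to human review is a separate policy step.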
Deep Dive 6: Fraud Detection
Rule-based (fast, deterministic):
Block if:
- Same card: > 5 failed attempts in 10 minutes
- Transaction amount > $10,000 (trigger enhanced verification)
- Card used in 3+ countries in 1 hour (impossible travel)
- IP address in high-risk block list
- Velocity check: > 20 transactions/hour from same device
Pros: Fast (<1ms), explainable, easy to update rules
Cons: Can be reverse-engineered by fraudsters, misses novel patterns
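The rules listed above can be expressed as a small deterministic check. This is a sketch: the `TxContext` fields are assumed to be precomputed counters (for example from a cache), and the rule names are made up for illustration. The thresholds are the ones from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxContext:
    amount_cents: int
    failed_attempts_10min: int   # failed attempts on this card in the last 10 minutes
    countries_last_hour: int     # distinct countries for this card in the last hour
    ip_blocklisted: bool
    device_tx_last_hour: int     # transactions from this device in the last hour


def triggered_rules(ctx: TxContext) -> List[str]:
    """Return the names of all rules this transaction trips (empty list = clean)."""
    rules = []
    if ctx.failed_attempts_10min > 5:
        rules.append("TOO_MANY_FAILURES")
    if ctx.amount_cents > 1_000_000:          # more than $10,000
        rules.append("HIGH_AMOUNT")
    if ctx.countries_last_hour >= 3:
        rules.append("IMPOSSIBLE_TRAVEL")
    if ctx.ip_blocklisted:
        rules.append("BLOCKLISTED_IP")
    if ctx.device_tx_last_hour > 20:
        rules.append("DEVICE_VELOCITY")
    return rules
```

Returning the rule names rather than a bare boolean keeps the decision explainable, which is one of the listed advantages of rule-based checks.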
ML-based (powerful, adaptive):
Real-time scoring pipeline:
Transaction event → Feature extraction → ML model → Risk score
Features:
- Transaction amount (vs user's average)
- Time of day and day of week
- Geographic velocity (how far from last transaction?)
- Device fingerprint (new device vs known device?)
- Merchant category (unusual for this user?)
- Historical decline rate for this card
Output:
- Risk score: 0.0 (safe) to 1.0 (fraud)
- Score < 0.3: Allow
- Score 0.3-0.7: 3DS challenge (bank OTP)
- Score > 0.7: Block
Latency requirement: < 100ms (must not slow down checkout)
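Mapping the score to an action is a simple threshold function. In this sketch the function name and the action labels are illustrative, and treating exactly 0.3 and exactly 0.7 as "challenge" is an assumption, since the ranges above do not say which side the boundaries fall on.

```python
def decide(risk_score: float) -> str:
    """Map a model risk score in [0.0, 1.0] to a checkout action."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk score must be between 0.0 and 1.0")
    if risk_score < 0.3:
        return "ALLOW"
    if risk_score <= 0.7:
        return "CHALLENGE_3DS"
    return "BLOCK"
```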
3DS (3D Secure): Bank OTP
For high-risk transactions:
1. Payment service detects high risk score
2. Redirect buyer to bank's authentication page
3. Buyer enters OTP sent to their phone
4. Bank confirms identity → Proceed
5. Liability shifts from merchant to bank (important!)
3DS 1.0: Pop-up redirect (bad UX)
3DS 2.0: Frictionless in most cases, challenge only when needed
PCI-DSS Compliance (Tokenization):
NEVER store raw card numbers (PCI-DSS requirement)
Flow:
1. Buyer enters card → Stripe.js captures it in the browser
2. Stripe stores the card and returns a TOKEN (e.g., "tok_visa_4242")
3. Your server only ever sees the token
4. The token is of little use to attackers (it cannot be turned back into a card number, and it cannot be charged without your PSP credentials)
Benefits:
- Your servers NEVER touch raw card data
- PCI-DSS scope dramatically reduced
- Breach of your DB doesn't expose card numbers
Design Summary
Complete System Architecture
Request path through the Payment Platform:
1. Buyer Browser → API Gateway / Load Balancer → Payment Service
2. Payment Service: creates payment orders, deduplicates by idempotency key, manages the state machine, integrates with the PSP
3. Payment Service → Orders DB (PostgreSQL), Ledger Service (append-only DB), Wallet Service (balances)
4. Payment Service → PSP Adaptor: Stripe API calls, webhook receiver, retry with idempotency key
5. PSP Adaptor → Stripe / Adyen (external PSP): talks to card networks, handles card storage, sends webhooks
Key Decisions Summary
| Decision | Choice | Reasoning |
|---|---|---|
| Exactly-once | Idempotency key (UUID) | Prevents double charges on retry |
| Bookkeeping | Double-entry ledger | Every dollar always accounted for |
| Card storage | PSP tokenization | PCI-DSS, never touch raw card numbers |
| PSP | Stripe / Adyen | Don't build banking connections in-house |
| Consistency | Strong (RDBMS) | Financial data must be accurate |
| Failure recovery | Retry + reconciliation | Idempotency for retry, reconciliation as safety net |
| Fraud | Rule-based + ML | Rules for speed, ML for novel patterns |
| Currency | Store in cents (integer) | Avoid floating-point rounding errors |
Interview Questions & Answers
Q: What is an idempotency key and why is it critical in payments?
A: An idempotency key is a unique UUID the client generates per payment intent and reuses on retries. The server uses it to detect duplicate requests: if a key was already processed, return the cached result without re-executing. This is critical because network timeouts and service crashes cause retries, and without idempotency every retry risks a double charge. PSPs like Stripe support this natively via the Idempotency-Key HTTP header.
Q: Explain double-entry bookkeeping and why payment systems use it.
A: Double-entry bookkeeping means every transaction creates exactly two ledger entries (a debit on one account and a credit on another), so the net sum is always zero. Payment systems use it because: (1) it creates a complete audit trail where every dollar is always somewhere, (2) a non-zero sum signals a bug, (3) it is required by financial regulators, and (4) reversals are handled cleanly by creating new offsetting entries rather than modifying history.
Q: What is reconciliation and when does it trigger?
A: Reconciliation is a nightly process comparing our internal ledger against the PSP's transaction statement. It catches discrepancies that slipped through despite idempotency, e.g., a missed webhook leaving a payment in EXECUTING state while the PSP considers it SUCCESS. Obvious mismatches (like a missing webhook) are auto-fixed; ambiguous mismatches (status conflicts, amount differences) are flagged for human review. Reconciliation is the system's final safety net.
Q: How do you handle a scenario where a payment times out and the client retries?
A: The client must retry with the exact same idempotency key as the original request. The server checks the idempotency table: if the original request was already processed (even if the response was lost in transit), the server returns the stored result. If it was not processed, the retry is safe to execute as a first attempt. This pattern means "retry freely, but always carry the same key."
Q: Why store money amounts as integers (cents) rather than floats?
A: Floating-point numbers cannot represent all decimal values exactly. For example, 0.1 + 0.2 = 0.30000000000000004 in IEEE 754. In financial systems, these rounding errors accumulate across billions of transactions and can result in cents going missing or appearing from nowhere. Storing amounts in the smallest currency unit (cents for USD, paise for INR) as integers avoids this entirely.
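A short demonstration of the difference, as a sketch: `to_cents` is a hypothetical helper that parses a decimal string with Python's `decimal` module, and half-up rounding is an assumption rather than a rule stated above.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Convert a decimal string like "99.99" to integer cents without using floats."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


float_total = 0.1 + 0.2                         # binary floats: not exactly 0.3
int_total = to_cents("0.10") + to_cents("0.20")  # integer cents: exactly 30
```

Parsing from a string matters here: converting a float such as 0.1 would carry the rounding error into the result.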
Q: How does PCI-DSS tokenization work in your architecture?
A: The card number never touches our servers. Stripe.js runs in the buyer's browser and directly captures the card number, sends it to Stripe's servers, and returns a single-use token (e.g., tok_visa_4242). Our payment service receives only the token. Stripe maps the token back to the real card when processing. If our database is breached, attackers get only tokens, which cannot be turned back into card numbers and cannot be charged without our PSP credentials. This keeps our PCI-DSS audit scope very small.
Key Takeaways
- Idempotency key (UUID per payment) is the single most important mechanism: it prevents double charges on retries.
- Double-entry bookkeeping with an append-only ledger is the financial industry standard: every entry is permanent, and reversals use offsetting entries.
- Never build your own PSP. Use Stripe/Adyen; they handle card networks, compliance, and fraud tools.
- PCI-DSS tokenization: Card numbers never touch your servers; PSP returns a token that is worthless to attackers.
- State machine with DB-first: Always persist state before calling external services so you can always recover from crashes.
- Reconciliation is the safety net: Even with idempotency, do nightly reconciliation against PSP statement to catch anything that slipped through.
- Amounts in integers (cents): Never use floating-point for money; use integer cents to avoid rounding errors.
- Fraud layered defense: Fast rule-based checks + ML scoring + 3DS challenge for high-risk transactions.
Related Resources
- ch12-digital-wallet - Wallet service, distributed transfers, event sourcing
- distributed-system-components - Databases, queues, and caches used in payment systems
- key-patterns > Idempotency - Idempotency pattern deep dive
- ch04-rate-limiter - Rate limiting to prevent payment API abuse
Practice this design! Extremely common in fintech interviews. Be ready to:
- Draw the complete pay-in flow including PSP integration
- Explain idempotency key mechanics step by step
- Describe double-entry bookkeeping with a concrete example
- Walk through how reconciliation catches discrepancies
- Justify every design decision (especially why PSP, why integer cents, why append-only ledger)
Last Updated: 2026-04-13
Status: Very common in fintech interviews. Must know!