System Design Interview Evaluation Rubric

A detailed rubric mirroring what interviewers at top tech companies actually evaluate. Use this for self-assessment after every practice session and for structured peer feedback in mock interviews.


Section 1: Overall Evaluation Dimensions (with Weight)

Dimension | Weight | What It Measures
Problem Scoping & Clarification | 20% | Can you define the right problem before solving it?
High-Level Design Quality | 20% | Can you draw a coherent, complete architecture?
Deep Dive Depth & Correctness | 25% | Do you have real technical depth in the right areas?
Trade-Off Discussion | 20% | Can you reason about choices, not just make them?
Communication & Thought Process | 15% | Is your reasoning transparent and structured?

Overall decision thresholds (rough guideline):

  • Strong Hire: 85%+ weighted score, at least “Hire” in every dimension
  • Hire: 70%+ weighted score, no “No hire” in any dimension
  • No Decision: 55% to just under 70% weighted score, or mixed signals
  • No Hire: Below 55%, or “No hire” in Deep Dive or Trade-Offs
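
The thresholds above don't spell out how per-dimension scores become a weighted percentage. A minimal sketch, assuming each 1-4 score counts as score/4 of its Section 1 weight, that a 1/4 score is the "No hire" level and a 3/4 score the "Hire" level from Section 2, and that the hard "No Hire" triggers are checked before anything else. The dictionary keys and function names are hypothetical; "mixed signals" is a judgment call the code can only approximate.

```python
# Section 1 weights, in percent. The keys are hypothetical short labels.
WEIGHTS = {
    "scoping": 20,
    "high_level_design": 20,
    "deep_dive": 25,
    "trade_offs": 20,
    "communication": 15,
}


def weighted_percent(scores):
    """Weighted score in percent, assuming each 1-4 score counts as score/4."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("need exactly one score per dimension")
    if any(s not in (1, 2, 3, 4) for s in scores.values()):
        raise ValueError("scores must be 1, 2, 3, or 4")
    # Integer weights keep the arithmetic exact (division by 4 is exact).
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / 4


def decision(scores):
    """Apply the rough thresholds (1/4 = "No hire", 3/4 = "Hire" per Section 2)."""
    pct = weighted_percent(scores)
    lowest = min(scores.values())
    # Hard "No Hire" triggers are checked first.
    if pct < 55 or scores["deep_dive"] == 1 or scores["trade_offs"] == 1:
        return "No Hire"
    if pct >= 85 and lowest >= 3:
        return "Strong Hire"
    if pct >= 70 and lowest >= 2:
        return "Hire"
    # 55-70%, or a high total undercut by a "No hire" elsewhere (mixed signals).
    return "No Decision"
```

For example, scores of 3/3/3/2/3 (with the 2 in Trade-Offs) land exactly on 70.0%, which this sketch reads as "Hire"; a perfect run with a single 1/4 in Scoping scores 85% but falls to "No Decision" because of the "No hire" dimension.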

Section 2: Detailed Rubric Per Dimension

Dimension 1: Problem Scoping & Clarification (20%)

Level | Score | What It Looks Like | Example Behaviors
Strong Hire | 4/4 | Asks exactly the right questions, eliminates ambiguity fast, scopes precisely | “Before I start, I want to confirm: are we optimizing for read latency or write throughput? And is this a global service or single-region?”
Hire | 3/4 | Asks good questions, covers functional + non-functional requirements, states assumptions | Asks about scale, consistency, latency. Writes down requirements. States 2-3 explicit assumptions.
No Decision | 2/4 | Asks some questions but misses key ones, or over-asks trivial things | Asks about UI design (irrelevant) but doesn’t ask about scale. Or asks 10 questions with no prioritization.
No Hire | 1/4 | Skips clarification entirely, or gets requirements badly wrong | Immediately starts drawing. Or designs for 1K users when the problem implied 1B.

Specific signals to watch for:

  • Asks about read/write ratio: +signal
  • Asks about availability/consistency requirement: +signal
  • States explicit assumptions before designing: +signal
  • Asks about authentication/authorization (rarely relevant at this level): neutral/minor waste
  • Never writes down requirements: -signal
  • Still asking clarifying questions at minute 8: -signal

Dimension 2: High-Level Design Quality (20%)

Level | Score | What It Looks Like | Example Behaviors
Strong Hire | 4/4 | Clean, complete architecture that naturally evolves from requirements; APIs and data model are precise | Defines 3 APIs with request/response, draws data model with key entities and relationships, draws system with all critical components labeled
Hire | 3/4 | Solid architecture covering main components; minor gaps or imprecision | Has LB, app servers, cache, DB. APIs are sketched. Data model has main entities. Might miss CDN or message queue.
No Decision | 2/4 | Architecture covers basics but has notable gaps or unclear data flow | Has a DB and a server. Can’t clearly explain data flow. API is vague or missing.
No Hire | 1/4 | Architecture is missing critical components or fundamentally incorrect | No caching when clearly needed. No load balancing for a high-scale system. Or single-point-of-failure design with no awareness.

Specific signals to watch for:

  • Draws system first, then explains: +signal (visual thinking)
  • Explains “why” for each component chosen: +signal
  • Evolves design from simple to complex: +signal
  • Names components correctly (not just “a database”): +signal
  • All arrows are labeled with protocols or data types: +signal
  • Puts everything on one server with no scaling plan: -signal
  • Draws system without explaining anything: -signal

Dimension 3: Deep Dive Depth & Correctness (25%)

Level | Score | What It Looks Like | Example Behaviors
Strong Hire | 4/4 | Goes genuinely deep into 2+ areas; correctness is high; raises non-obvious challenges proactively | “The sharding key matters here because [X], and if we pick [Y] instead, we’ll get hotspots when [Z]. I’d use [W] and handle rebalancing by…”
Hire | 3/4 | Good depth in at least 1 area; mostly correct; identifies the main technical challenges | Can explain consistent hashing with virtual nodes. Can articulate why fan-out on write is better for low-follower users. Minor technical gaps.
No Decision | 2/4 | Shallow depth; correct at surface level but can’t go deeper under probing | Knows Kafka is used but can’t explain partitioning or consumer group offsets. Says “we’d shard” but can’t explain the shard key logic.
No Hire | 1/4 | Incorrect answers when probed; can’t go deeper than naming technologies | States wrong facts under pressure. Can’t explain the difference between hash partitioning and range partitioning.
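
The "Hire" example behavior mentions consistent hashing with virtual nodes. A minimal self-study sketch of that technique, for practicing the whiteboard explanation: each physical node is hashed onto the ring many times, and a key belongs to the first virtual node clockwise from its own hash. The class name, the `#vn{i}` point-naming scheme, the choice of MD5, and the default of 100 virtual nodes are all assumptions for illustration, not a production design.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring where each physical node owns many virtual-node points."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only for its even spread, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Many points per node smooth out load and spread rebalancing.
        for i in range(self.vnodes):
            point = self._hash(f"{node}#vn{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping to 0.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property worth narrating in an interview: removing a node only remaps the keys that node owned; every other key keeps its owner, unlike `hash(key) % N`, where changing N reshuffles almost everything.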

Specific signals to watch for:

  • Proactively identifies scaling bottleneck without being asked: ++signal
  • Quantifies estimates during deep dive (e.g., “at 10K QPS, one DB can handle this, but at 100K we’d need…”): +signal
  • Knows specific numbers (Redis ~100K ops/sec, Kafka throughput): +signal
  • Correctly uses terms like “idempotency,” “quorum,” “write amplification”: +signal
  • Can whiteboard an algorithm step-by-step: +signal
  • Misuses a term (e.g., says “eventual consistency” when describing a bank transaction): -signal
  • Drops a technology name but can’t explain its properties: -signal
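
Quantifying during a deep dive usually means two steps: turn users into requests per second, then requests per second into machine counts. A minimal sketch of that arithmetic; the workload (100M DAU, 10 reads per user per day, peak at 2x average), the 70% target utilization, and the 5K QPS per database replica are hypothetical numbers, while the ~100K ops/sec Redis figure is the rough rule of thumb quoted above.

```python
import math

SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users, requests_per_user_per_day):
    """Average requests per second, spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def nodes_needed(peak_qps, per_node_qps, utilization=0.7):
    """Nodes needed if each node runs at `utilization` of its rated throughput."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return math.ceil(peak_qps / (per_node_qps * utilization))


# Hypothetical workload: 100M DAU, 10 reads each per day, peak = 2x average.
avg = average_qps(100_000_000, 10)         # ~11.6K QPS
peak = 2 * avg                             # ~23.1K QPS
cache_nodes = nodes_needed(peak, 100_000)  # Redis at ~100K ops/sec -> 1
db_nodes = nodes_needed(peak, 5_000)       # hypothetical 5K QPS/replica -> 7
```

The interesting output is the contrast: one cache node absorbs the peak that would need roughly seven database replicas, which is the kind of "at 10K QPS one DB is fine, but at 100K we'd need…" statement the rubric rewards.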

Dimension 4: Trade-Off Discussion (20%)

Level | Score | What It Looks Like | Example Behaviors
Strong Hire | 4/4 | Every design decision is accompanied by trade-off analysis; acknowledges costs of chosen approach; considers alternatives explicitly | “I’m choosing fan-out on write here, which gives faster reads but more write amplification. For celebrity accounts (users with >1M followers), I’d switch to fan-out on read to avoid the write storm.”
Hire | 3/4 | Most major decisions include a trade-off; can explain pros and cons when asked | When asked “why Redis?”, gives a coherent answer about speed + TTL + data structures. Knows when to use SQL vs NoSQL.
No Decision | 2/4 | Makes decisions without trade-off analysis; gives trade-offs only when explicitly probed | Says “I’ll use Kafka” but doesn’t explain why or what the alternative is. Only acknowledges trade-offs when the interviewer pushes.
No Hire | 1/4 | No trade-off discussion at all; or gives wrong trade-offs (incorrect technical facts) | “I’ll use NoSQL because it’s faster” without any qualification. Or contradicts themselves on consistency requirements.
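
The "Strong Hire" example describes a hybrid feed: fan-out on write for ordinary authors, fan-out on read for celebrities. A minimal in-memory sketch of that idea, where dictionaries stand in for the timeline cache and post store; the class, method names, and logical clock are hypothetical, and the default cut-off is the ">1M followers" figure from the example. It deliberately ignores authors who cross the threshold in either direction after posting, which a real design would have to handle.

```python
import heapq
import itertools
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # ">1M followers" cut-off from the example


class HybridFeed:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)   # author -> followers
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> pushed (ts, post_id)
        self.outbox = defaultdict(list)     # celebrity -> own (ts, post_id)
        self._clock = itertools.count()     # logical timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.threshold

    def post(self, author, post_id):
        entry = (next(self._clock), post_id)
        if self.is_celebrity(author):
            # Fan-out on read: one write; followers pull it at read time.
            self.outbox[author].append(entry)
        else:
            # Fan-out on write: one timeline write per follower.
            for follower in self.followers[author]:
                self.timelines[follower].append(entry)

    def read(self, user, limit=10):
        # Merge the precomputed timeline with posts pulled from celebrities.
        pulled = [
            entry
            for author in self.following[user]
            if self.is_celebrity(author)
            for entry in self.outbox[author]
        ]
        newest = heapq.nlargest(limit, self.timelines[user] + pulled)
        return [post_id for _, post_id in newest]
```

The trade-off from the quote is visible in the code: ordinary posts cost one write per follower but make reads cheap, while celebrity posts cost a single write and push the merge work onto every read.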

Specific signals to watch for:

  • Voluntarily says “the trade-off here is…”: ++signal
  • Mentions the CAP theorem correctly in context: +signal
  • Compares two specific options head-to-head: +signal
  • Acknowledges what their design gives up: +signal
  • Uses “it depends” correctly (with context): +signal
  • Uses “it depends” as a dodge without following up: -signal
  • Claims a technology is strictly better with no caveats: -signal

Dimension 5: Communication & Thought Process (15%)

Level | Score | What It Looks Like | Example Behaviors
Strong Hire | 4/4 | Consistently thinks out loud; structured narrative; responds well to hints; engages interviewer | “I’m considering two approaches here. Option 1 is X, which has advantages A and B. Option 2 is Y, which solves C but at cost D. I’ll go with X because of our read-heavy workload.”
Hire | 3/4 | Generally thinks out loud; mostly structured; takes hints well; diagram is clear | Talks through reasoning most of the time. Clear whiteboard. Picks up on interviewer nudges after 1-2 hints.
No Decision | 2/4 | Occasional silence; reasoning not always clear; slow to take hints | 30-60 seconds of silence when stuck. Continues in wrong direction after subtle hint. Diagram is unclear.
No Hire | 1/4 | Long silences; reasoning is opaque; ignores or doesn’t notice interviewer hints | 2+ minutes of silence. Interviewer has to give very direct corrective feedback. Doesn’t engage.

Specific signals to watch for:

  • Narrates their thinking in real time: +signal
  • Pauses and says “let me think about this for a second” then continues quickly: +signal (shows metacognition)
  • Uses structured transitions: “Now that I’ve covered X, let me move to Y”: +signal
  • Engages the interviewer: “Does this approach make sense to you?” or “Would you like me to go deeper here?”: +signal
  • Reacts gracefully to pushback: “That’s a good point — I hadn’t considered Z. Let me revise…”: ++signal
  • Long silent pause without verbalization: -signal
  • Defensive reaction to interviewer questions: -signal
  • “I’ll come back to that” and never does: -signal

Section 3: Common Mistakes That Immediately Signal “No Hire”

Mistake | Why It’s Disqualifying
Skipping requirements entirely, jumping straight to a solution | Suggests you would solve the wrong problem on the job
Claiming a system has “infinite scalability” or “zero downtime” | Shows you don’t understand fundamental engineering trade-offs
Using a term incorrectly and insisting you’re right when corrected | Signals poor technical foundation and inability to learn
Designing a payment system without mentioning idempotency | Critical knowledge gap for financial systems
Ignoring the interviewer’s hints three or more times | Poor collaboration; will be difficult to mentor or work with
Complete silence for more than 2 minutes with no output | Cannot handle ambiguity under pressure
Saying “I would just use microservices” as a solution to scaling | Buzzword response with no substance
Designing a system with a single database and no replication for a 99.99% SLA | Doesn’t know what high availability requires
Changing your entire design when the interviewer pushes back, with no defense | Lack of conviction; suggests design is not grounded in reasoning
Misstating the CAP theorem (claiming all three properties can be guaranteed simultaneously) | Fundamental misconception of distributed systems
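
Idempotency is the payment-system gap this table calls out. A minimal sketch of the common idempotency-key pattern: the client sends a unique key with each logical charge, and a retry with the same key replays the stored response instead of moving money twice. The class, method, and field names are hypothetical; the in-memory dict stands in for what would need to be a durable store with an atomic insert (for example, a unique constraint on the key), since this version is not safe under concurrent retries.

```python
import uuid


class PaymentService:
    """In-memory sketch of idempotency-key handling for a charge endpoint."""

    def __init__(self):
        # idempotency key -> (request fingerprint, stored response)
        self._seen = {}
        self.ledger = []  # stands in for the real money movement

    def charge(self, idempotency_key, account_id, amount_cents):
        fingerprint = (account_id, amount_cents)
        if idempotency_key in self._seen:
            seen_fingerprint, response = self._seen[idempotency_key]
            if seen_fingerprint != fingerprint:
                # Same key, different request: refuse rather than guess.
                raise ValueError("idempotency key reused with a different request")
            return response  # retry: replay the original result, no new charge
        charge_id = str(uuid.uuid4())
        self.ledger.append((charge_id, account_id, amount_cents))
        response = {"charge_id": charge_id, "status": "succeeded"}
        self._seen[idempotency_key] = (fingerprint, response)
        return response
```

A client that times out and retries with the same key gets the same `charge_id` back, and the ledger still holds one entry.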

Section 4: Signals That Immediately Impress Interviewers (Green Flags)

Signal | Why It Impresses
Doing back-of-envelope math proactively before designing | Shows quantitative thinking; grounds design in reality
Identifying a non-obvious bottleneck before the interviewer has to prompt for it | Demonstrates experience with production systems
Saying “here are three approaches; I’ll use X for these reasons, but Y is better if…” | Shows depth, alternatives awareness, and decision-making
Knowing specific system characteristics (Redis ops/sec, Kafka message size limits) | Signals real-world experience, not just textbook knowledge
Drawing a clean diagram that evolves through versions (V1 simple → V2 scaled) | Mirrors how real systems are built; shows architectural thinking
Proactively asking “what level of consistency does this business case require?” | Shows understanding that technical choices have business implications
Catching your own mistake mid-design and correcting it: “Wait, this doesn’t work because…” | Shows strong self-review and intellectual honesty
Mentioning observability (metrics, tracing, alerting) without being asked | Signals production-readiness mindset
Discussing data model in terms of access patterns, not just normalization | Shows understanding of NoSQL vs SQL trade-offs at depth
Explicitly mentioning idempotency for any write operation | Signals awareness of distributed systems failure modes

Section 5: Self-Evaluation Checklist

After every practice session, answer these 20 questions honestly (yes / partial / no).

Requirements & Scoping

  • Did I ask about scale (DAU, QPS)?
  • Did I ask about consistency vs. availability requirements?
  • Did I explicitly state my assumptions before designing?
  • Did I keep requirements under 5 minutes?

High-Level Design

  • Did I define at least 2 core API endpoints?
  • Did I draw a system diagram with labeled components?
  • Did I describe the data model with key entities?
  • Did I explain the purpose of each major component?

Deep Dive

  • Did I go genuinely deep into at least one component?
  • Did I identify and address the main scaling bottleneck?
  • Were my technical statements correct (no wrong facts)?
  • Did I quantify at least one thing with numbers?

Trade-Offs

  • Did I explain why I chose each major technology?
  • Did I name an alternative for at least one choice?
  • Did I acknowledge what my design gives up?
  • Did I use trade-off framing (“X is better for Y, but costs Z”)?

Communication

  • Did I think out loud consistently?
  • Did I take any interviewer hints gracefully?
  • Did I manage my time well (covered all 4 phases)?
  • Did I end with a wrap-up (bottlenecks, monitoring, improvements)?

Scoring guide:

  • 18-20 yes: Strong hire performance
  • 14-17 yes: Hire performance — identify the partial/no answers as focus areas
  • 10-13 yes: No decision — significant work needed in multiple areas
  • Below 10: No hire — return to fundamentals; practice framework adherence
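
If you log your answers after each session, the scoring guide is easy to apply mechanically. A minimal sketch, assuming answers are recorded per checklist area as four "yes"/"partial"/"no" strings, and that only "yes" counts toward the total (the guide counts yes answers); areas with any partial/no answer are returned as focus areas. The function names and data shape are hypothetical.

```python
AREAS = ("Requirements & Scoping", "High-Level Design", "Deep Dive",
         "Trade-Offs", "Communication")
ALLOWED = {"yes", "partial", "no"}


def band(yes_count):
    """Map a yes count (0-20) to the scoring-guide band."""
    if not 0 <= yes_count <= 20:
        raise ValueError("yes_count must be between 0 and 20")
    if yes_count >= 18:
        return "Strong hire"
    if yes_count >= 14:
        return "Hire"
    if yes_count >= 10:
        return "No decision"
    return "No hire"


def score_checklist(answers):
    """answers: area -> list of four 'yes'/'partial'/'no' strings."""
    if set(answers) != set(AREAS):
        raise ValueError("need answers for exactly the five checklist areas")
    for area in AREAS:
        if len(answers[area]) != 4 or not set(answers[area]) <= ALLOWED:
            raise ValueError(f"{area}: need four yes/partial/no answers")
    yes = sum(a == "yes" for area in AREAS for a in answers[area])
    # Any partial or no answer marks the area as a focus for next time.
    focus = [area for area in AREAS if any(a != "yes" for a in answers[area])]
    return yes, band(yes), focus
```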

Section 6: How to Give Feedback in Peer Mock Interviews

Before the Session

  • Agree on the problem in advance (or the interviewer picks it without telling the candidate).
  • Use this rubric as your evaluation sheet.
  • Take notes during the session on specific behaviors, not just impressions.
  • Apart from giving hints, don’t interrupt during the 45-minute session; note your questions for the feedback phase.

During the Session (Interviewer Role)

  • Use this rubric to track signals in real time.
  • Note timestamps: when did requirements end? When did deep dive start?
  • Write down exact quotes that were strong or weak.
  • Give hints using these prompts (graduated from subtle to direct):
    1. Subtle: “Interesting — what happens at scale?”
    2. Moderate: “How would this handle the case where there are millions of followers?”
    3. Direct: “I’m thinking specifically about how you’d handle write amplification here.”
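
The timestamps you note can be turned into phase durations and time-management flags for the debrief. A minimal sketch, assuming you record the minute each phase started plus the end minute; the four phase names are hypothetical labels (the rubric refers to "all 4 phases" without naming them here), while the 5-minute requirements budget and 45-minute session length come from this rubric.

```python
SESSION_LENGTH_MIN = 45      # session length used in this section
REQUIREMENTS_BUDGET_MIN = 5  # "ideally that's 5 min" for requirements
# Hypothetical names for the four phases; rename to match your framework.
EXPECTED_PHASES = ("requirements", "high-level design", "deep dive", "wrap-up")


def phase_report(phase_starts, end_minute):
    """phase_starts: (phase_name, start_minute) pairs in the order they happened."""
    boundaries = [start for _, start in phase_starts] + [end_minute]
    # Each phase lasts until the next one starts (or the session ends).
    durations = {
        name: boundaries[i + 1] - start
        for i, (name, start) in enumerate(phase_starts)
    }
    flags = []
    missing = [p for p in EXPECTED_PHASES if p not in durations]
    if missing:
        flags.append(f"phases not reached: {', '.join(missing)}")
    if durations.get("requirements", 0) > REQUIREMENTS_BUDGET_MIN:
        flags.append(f"requirements took {durations['requirements']} min "
                     f"(budget {REQUIREMENTS_BUDGET_MIN})")
    if end_minute > SESSION_LENGTH_MIN:
        flags.append(f"ran {end_minute - SESSION_LENGTH_MIN} min over")
    return durations, flags
```

A session that started high-level design at minute 12 produces the "12 minutes on requirements" observation directly, which gives the debrief a concrete number instead of "you were too slow."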

After the Session (Feedback Structure)

Use this structure for the 15-minute debrief:

1. Self-assessment first (2 min)
Ask the candidate: “How do you think that went? What would you do differently?”

2. Strengths (3 min)
Call out 2-3 specific strong moments with exact examples.
“When you proactively raised the celebrity problem without being asked — that was a strong signal.”

3. Development areas (5 min)
Identify the 2-3 most impactful areas to improve.
“You didn’t discuss trade-offs for the DB choice. When you said ‘I’ll use PostgreSQL’, the follow-up is always: ‘because of X, and the alternative would be Y in a different scenario.’”

4. Dimension scores (3 min)
Walk through the 5 dimensions with scores.
“On trade-offs I’d give you a 2/4 today. Here’s specifically what would make it a 3 or 4…”

5. Next action (2 min)
One concrete thing to focus on in the next session.
“For next time: after every technology choice, force yourself to add ‘because X, and the trade-off is Y.’”

Feedback Anti-Patterns to Avoid

Anti-Pattern | Better Approach
“That was pretty good overall” | Give specific scores on each dimension
“You should have mentioned Kafka” | “The design had an async processing step; what options did you consider there?”
Focusing only on technical gaps | Balance technical and communication feedback
“You were too slow” | “You spent 12 minutes on requirements; ideally that’s 5 minutes. Here’s how to tighten it…”
Piling on 10 improvement areas | Pick the top 2-3 most impactful; the rest can wait

Section 7: Level-Specific Expectations

Mid-Level to Junior Senior (L4–L5 equivalent)

Expected depth: Can design systems correctly using the textbook approach. Knows when to use caching, sharding, and message queues. Can articulate the main trade-offs when prompted.

Expected breadth: Comfortable with 10-15 common systems. Can handle any problem from Vol1 and the simpler Vol2 chapters.

What to demonstrate:

  • Clear framework application (all 4 steps, in order)
  • Correct use of standard patterns (cache-aside, fan-out, consistent hashing)
  • Trade-offs when prompted by interviewer
  • Back-of-envelope estimation (even if rough)

What is NOT expected at L4:

  • Proactively raising non-obvious bottlenecks
  • Deep knowledge of distributed consensus algorithms
  • Experience-based anecdotes from production systems

Common failure modes at this level:

  • Skipping estimation
  • Not drawing a diagram
  • Knowing names of technologies without knowing their properties

Senior (L5–L6 equivalent)

Expected depth: Can go deep on 2+ areas per problem. Identifies trade-offs before being asked. Has real opinions backed by reasoning. Knows the “why” behind patterns.

Expected breadth: Comfortable with all Vol1 and most Vol2 chapters. Can handle novel problem types by applying patterns they know.

What to demonstrate:

  • Proactively identifying the hardest scaling challenges
  • Comparing alternatives head-to-head with specifics
  • Quantifying trade-offs (“if we fan-out to 100M followers, that’s 100M writes per post”)
  • Knowing specific numbers and when they matter

What is expected at L5 that wasn’t at L4:

  • Raising the celebrity / hotspot / thundering herd problem without prompting
  • Knowing the difference between at-least-once and exactly-once delivery and when each matters
  • Understanding consistency models (eventual, strong, monotonic reads) and choosing appropriately

Common failure modes at this level:

  • Good breadth but shallow depth in every area (wide but not deep)
  • Trade-offs only when prompted — not proactively
  • Missing the most important deep-dive area (e.g., spending 20 min on API design for a storage system instead of the storage layer)

Staff+ (L6+ equivalent)

Expected depth: Can design novel systems outside the book’s examples. Identifies second-order effects and emergent problems at scale. Challenges assumptions productively.

Expected breadth: Fluent across all system types. Can cross-reference patterns across domains (e.g., “this is similar to the problem Kafka solves, applied to X”).

What to demonstrate:

  • System-level judgment: not just “what” but “when this design breaks and why”
  • Operational concerns: observability, deployment, rollout strategy
  • Cross-cutting concerns: security, compliance, cost, team topology
  • Can scope the problem themselves: “Given that we have 45 min, I want to focus on X because it’s the most novel part — happy to skip Y since that’s standard.”

What is expected at L6+ that wasn’t at L5:

  • Proactively scoping the interview: “What aspect is most interesting to you — the storage layer or the real-time delivery path?”
  • Knowing when the standard solution is wrong and proposing a better one
  • Discussing failure scenarios at the systems level (not just component level): “If this entire AZ goes down during a write, how does the system recover with no data loss?”
  • Mentioning organizational / team concerns: “This design implies two teams need to coordinate on schema changes — we’d want to version the API here.”

Common failure modes at this level:

  • Being too academic (correct theory, missing operational pragmatism)
  • Spending all 45 min on one deep dive (even a great one); needs to demonstrate breadth too
  • Over-engineering: jumping to a distributed consensus protocol when a simpler solution suffices

Last Updated: 2026-04-13