System Design Interview Evaluation Rubric

A detailed rubric mirroring what interviewers at top tech companies actually evaluate. Use this for self-assessment after every practice session and for structured peer feedback in mock interviews.

Section 1: Overall Evaluation Dimensions (with Weight)

Dimension	Weight	What It Measures
Problem Scoping & Clarification	20%	Can you define the right problem before solving it?
High-Level Design Quality	20%	Can you draw a coherent, complete architecture?
Deep Dive Depth & Correctness	25%	Do you have real technical depth in the right areas?
Trade-Off Discussion	20%	Can you reason about choices, not just make them?
Communication & Thought Process	15%	Is your reasoning transparent and structured?

Overall decision thresholds (rough guideline):

Strong Hire: 85%+ weighted score, at least “Hire” in every dimension
Hire: 70%+ weighted score, no “No hire” in any dimension
No Decision: 55–69% weighted score, or mixed signals
No Hire: Below 55%, or “No hire” in Deep Dive or Trade-Offs
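
The thresholds above can be sketched as a small calculation. This is only an illustration: the rubric does not say how a 1–4 dimension score becomes a percentage, so the sketch assumes each score is normalized as score/4, and it treats any leftover combination as "No Decision" (mixed signals). The names `WEIGHTS`, `weighted_percent`, and `decision` are hypothetical.

```python
# Weights from Section 1; dimension keys are illustrative names.
WEIGHTS = {
    "scoping": 0.20,
    "high_level_design": 0.20,
    "deep_dive": 0.25,
    "trade_offs": 0.20,
    "communication": 0.15,
}


def weighted_percent(scores):
    """Weighted score in percent, assuming each 1-4 score counts as score/4."""
    return 100 * sum(WEIGHTS[d] * scores[d] / 4 for d in WEIGHTS)


def decision(scores):
    """Map per-dimension scores (1-4) to an overall outcome."""
    pct = weighted_percent(scores)
    lowest = min(scores[d] for d in WEIGHTS)
    # No Hire: below 55%, or a "No hire" (1/4) in Deep Dive or Trade-Offs.
    if pct < 55 or scores["deep_dive"] == 1 or scores["trade_offs"] == 1:
        return "No Hire"
    # Strong Hire: 85%+ and at least "Hire" (3/4) in every dimension.
    if pct >= 85 and lowest >= 3:
        return "Strong Hire"
    # Hire: 70%+ and no "No hire" (1/4) in any dimension.
    if pct >= 70 and lowest >= 2:
        return "Hire"
    # Everything else is treated as mixed signals (an assumption).
    return "No Decision"


example = {
    "scoping": 3,
    "high_level_design": 3,
    "deep_dive": 4,
    "trade_offs": 3,
    "communication": 3,
}
print(round(weighted_percent(example), 2), decision(example))
```

Under this normalization, a candidate who scores 3/4 everywhere lands at 75%, which matches the "Hire" band.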

Section 2: Detailed Rubric Per Dimension

Dimension 1: Problem Scoping & Clarification (20%)

Level	Score	What It Looks Like	Example Behaviors
Strong Hire	4/4	Asks exactly the right questions, eliminates ambiguity fast, scopes precisely	“Before I start, I want to confirm: are we optimizing for read latency or write throughput? And is this a global service or single-region?”
Hire	3/4	Asks good questions, covers functional + non-functional requirements, states assumptions	Asks about scale, consistency, latency. Writes down requirements. States 2-3 explicit assumptions.
No Decision	2/4	Asks some questions but misses key ones, or over-asks trivial things	Asks about UI design (irrelevant) but doesn’t ask about scale. Or asks 10 questions with no prioritization.
No Hire	1/4	Skips clarification entirely, or gets requirements badly wrong	Immediately starts drawing. Or designs for 1K users when the problem implied 1B.

Specific signals to watch for:

Asks about read/write ratio: +signal
Asks about availability/consistency requirement: +signal
States explicit assumptions before designing: +signal
Asks about authentication/authorization (rarely relevant at this level): neutral, or a minor waste of time
Never writes down requirements: -signal
Still asking clarifying questions at minute 8: -signal

Dimension 2: High-Level Design Quality (20%)

Level	Score	What It Looks Like	Example Behaviors
Strong Hire	4/4	Clean, complete architecture that naturally evolves from requirements; APIs and data model are precise	Defines 3 APIs with request/response, draws data model with key entities and relationships, draws system with all critical components labeled
Hire	3/4	Solid architecture covering main components; minor gaps or imprecision	Has LB, app servers, cache, DB. APIs are sketched. Data model has main entities. Might miss CDN or message queue.
No Decision	2/4	Architecture covers basics but has notable gaps or unclear data flow	Has a DB and a server. Can’t clearly explain data flow. API is vague or missing.
No Hire	1/4	Architecture is missing critical components or fundamentally incorrect	No caching when clearly needed. No load balancing for a high-scale system. Or single-point-of-failure design with no awareness.

Specific signals to watch for:

Draws system first, then explains: +signal (visual thinking)
Explains “why” for each component chosen: +signal
Evolves design from simple to complex: +signal
Names components correctly (not just “a database”): +signal
All arrows are labeled with protocols or data types: +signal
Puts everything on one server with no scaling plan: -signal
Draws system without explaining anything: -signal

Dimension 3: Deep Dive Depth & Correctness (25%)

Level	Score	What It Looks Like	Example Behaviors
Strong Hire	4/4	Goes genuinely deep into 2+ areas; correctness is high; raises non-obvious challenges proactively	“The sharding key matters here because [X], and if we pick [Y] instead, we’ll get hotspots when [Z]. I’d use [W] and handle re-balancing by…”
Hire	3/4	Good depth in at least 1 area; mostly correct; identifies the main technical challenges	Can explain consistent hashing with virtual nodes. Can articulate why fan-out on write is better for low-follower users. Minor technical gaps.
No Decision	2/4	Shallow depth; correct at surface level but can’t go deeper under probing	Knows Kafka is used but can’t explain partitioning or consumer group offset. Says “we’d shard” but can’t explain the shard key logic.
No Hire	1/4	Incorrect answers when probed; can’t go deeper than naming technologies	States wrong facts under pressure. Can’t explain the difference between a hash partition and range partition.
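
As a reference point for the "consistent hashing with virtual nodes" example in the table above, here is a minimal sketch. It assumes MD5 as the ring hash and 100 virtual nodes per server purely for illustration; the class name `HashRing` and the server names are hypothetical, and real systems differ in hash choice and replication.

```python
import bisect
import hashlib


def ring_hash(key):
    """Position of a string on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring where each node owns several virtual positions."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each virtual node is the hash of "<node>#<index>".
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("server-d")
after = {k: ring.get_node(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding one node")
```

The point worth articulating in an interview is visible in the last lines: adding a node moves only the keys that the new node takes over, rather than reshuffling every key as `hash(key) % N` would.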

Specific signals to watch for:

Proactively identifies scaling bottleneck without being asked: ++signal
Quantifies estimates during deep dive (e.g., “at 10K QPS, one DB can handle this, but at 100K we’d need…”): +signal
Knows specific numbers (Redis ~100K ops/sec, Kafka throughput): +signal
Correctly uses terms like “idempotency,” “quorum,” “write amplification”: +signal
Can whiteboard an algorithm step-by-step: +signal
Misuses a term (e.g., says “eventual consistency” when describing a bank transaction): -signal
Drops a technology name but can’t explain its properties: -signal
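
The "quantifies estimates" signal above amounts to simple arithmetic, sketched here. All numbers are placeholders: the per-node capacity of 20,000 QPS and the 50% target utilization are assumptions chosen to make the example readable, not measured figures, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users, requests_per_user_per_day):
    """Average requests per second implied by daily usage."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def nodes_needed(peak_qps, per_node_qps, target_utilization=0.5):
    """Nodes required so that each runs at or below the target utilization."""
    return math.ceil(peak_qps / (per_node_qps * target_utilization))


# Assumed capacity of one database node (hypothetical figure).
PER_NODE_QPS = 20_000

for peak in (10_000, 100_000):
    print(f"{peak} QPS -> {nodes_needed(peak, PER_NODE_QPS)} node(s)")
```

Stating the assumed per-node capacity out loud matters more than the exact figure, because the interviewer can then challenge the input rather than the arithmetic.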

Dimension 4: Trade-Off Discussion (20%)

Level	Score	What It Looks Like	Example Behaviors
Strong Hire	4/4	Every design decision is accompanied by trade-off analysis; acknowledges costs of chosen approach; considers alternatives explicitly	“I’m choosing fan-out on write here, which gives faster reads but more write amplification. For celebrity accounts (users with >1M followers), I’d switch to fan-out on read to avoid the write storm.”
Hire	3/4	Most major decisions include a trade-off; can explain pros and cons when asked	When asked “why Redis?”, gives a coherent answer about speed + TTL + data structures. Knows when to use SQL vs NoSQL.
No Decision	2/4	Makes decisions without trade-off analysis; gives trade-offs only when explicitly probed	Says “I’ll use Kafka” but doesn’t explain why or what the alternative is. Only acknowledges trade-offs when the interviewer pushes.
No Hire	1/4	No trade-off discussion at all; or gives wrong trade-offs (incorrect technical facts)	“I’ll use NoSQL because it’s faster” without any qualification. Or contradicts themselves on consistency requirements.
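
The hybrid fan-out decision quoted in the Strong Hire row can be sketched as a simple policy. The 1M-follower threshold comes from that example; the function names and the idea of counting one timeline write per follower are illustrative assumptions.

```python
# Threshold taken from the ">1M followers" example above.
CELEBRITY_THRESHOLD = 1_000_000


def fanout_strategy(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Pick a delivery strategy for one author's posts."""
    if follower_count > threshold:
        return "fan-out on read"
    return "fan-out on write"


def timeline_writes_per_post(follower_count, threshold=CELEBRITY_THRESHOLD):
    """Timeline writes triggered by one post under the hybrid policy.

    Assumes fan-out on write costs one write per follower and
    fan-out on read costs none at post time.
    """
    if fanout_strategy(follower_count, threshold) == "fan-out on write":
        return follower_count
    return 0


for followers in (500, 1_000_000, 100_000_000):
    print(followers, fanout_strategy(followers), timeline_writes_per_post(followers))
```

The cost that this policy shifts, rather than removes, is read latency: followers of a celebrity account pay for the merge when they load their timeline.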

Specific signals to watch for:

Voluntarily says “the trade-off here is…”: ++signal
Mentions the CAP theorem correctly in context: +signal
Compares two specific options head-to-head: +signal
Acknowledges what their design gives up: +signal
Uses “it depends” correctly (with context): +signal
Uses “it depends” as a dodge without following up: -signal
Claims a technology is strictly better with no caveats: -signal

Dimension 5: Communication & Thought Process (15%)

Level	Score	What It Looks Like	Example Behaviors
Strong Hire	4/4	Consistently thinks out loud; structured narrative; responds well to hints; engages interviewer	“I’m considering two approaches here. Option 1 is X, which has advantages A and B. Option 2 is Y, which solves C but at cost D. I’ll go with X because of our read-heavy workload.”
Hire	3/4	Generally thinks out loud; mostly structured; takes hints well; diagram is clear	Talks through reasoning most of the time. Clear whiteboard. Picks up on interviewer nudges after 1-2 hints.
No Decision	2/4	Occasional silence; reasoning not always clear; slow to take hints	30-60 seconds of silence when stuck. Continues in wrong direction after subtle hint. Diagram is unclear.
No Hire	1/4	Long silences; reasoning is opaque; ignores or doesn’t notice interviewer hints	2+ minutes of silence. Interviewer has to give very direct corrective feedback. Doesn’t engage.

Specific signals to watch for:

Narrates their thinking in real time: +signal
Pauses and says “let me think about this for a second” then continues quickly: +signal (shows metacognition)
Uses structured transitions: “Now that I’ve covered X, let me move to Y”: +signal
Engages the interviewer: “Does this approach make sense to you?” or “Would you like me to go deeper here?”: +signal
Reacts gracefully to pushback: “That’s a good point — I hadn’t considered Z. Let me revise…”: ++signal
Long silent pause without verbalization: -signal
Defensive reaction to interviewer questions: -signal
“I’ll come back to that” and never does: -signal

Section 3: Common Mistakes That Immediately Signal “No Hire”

Mistake	Why It’s Disqualifying
Skipping requirements entirely, jumping straight to solution	Suggests you solve the wrong problem in real life
Claiming a system has “infinite scalability” or “zero downtime”	Shows you don’t understand fundamental engineering trade-offs
Using a term incorrectly and insisting you’re right when corrected	Signals poor technical foundation and inability to learn
Designing a payment system without mentioning idempotency	Critical knowledge gap for financial systems
Ignoring the interviewer’s hints three or more times	Poor collaboration; will be difficult to mentor or work with
Complete silence for >2 minutes with no output	Cannot handle ambiguity under pressure
Saying “I would just use microservices” as a solution to scaling	Buzzword response with no substance
Designing a system with a single database and no replication for a 99.99% SLA	Doesn’t know what high availability requires
Changing your entire design when the interviewer pushes back, with no defense	Lack of conviction; suggests design is not grounded in reasoning
Misstating the CAP theorem (claiming consistency, availability, and partition tolerance can all be guaranteed during a network partition)	Fundamental misconception about distributed systems

Section 4: Signals That Immediately Impress Interviewers (Green Flags)

Signal	Why It Impresses
Doing back-of-envelope math proactively before designing	Shows quantitative thinking; grounds design in reality
Identifying a non-obvious bottleneck that the interviewer expected to prompt	Demonstrates experience with production systems
Saying “here are three approaches; I’ll use X for these reasons, but Y is better if…”	Shows depth, alternatives awareness, and decision-making
Knowing specific system characteristics (Redis ops/sec, Kafka message size limits)	Signals real-world experience, not just textbook knowledge
Drawing a clean diagram that evolves through versions (V1 simple → V2 scaled)	Mirrors how real systems are built; shows architectural thinking
Proactively asking “what level of consistency does this business case require?”	Shows understanding that technical choices have business implications
Catching your own mistake mid-design and correcting it: “Wait, this doesn’t work because…”	Shows strong self-review and intellectual honesty
Mentioning observability (metrics, tracing, alerting) without being asked	Signals production-readiness mindset
Discussing data model in terms of access patterns, not just normalization	Shows understanding of NoSQL vs SQL trade-offs at depth
Explicitly mentioning idempotency for any write operation	Signals awareness of distributed systems failure modes
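
Since idempotency appears in both the red-flag and green-flag tables, a minimal sketch of the idea may help. It assumes a client-supplied idempotency key and an in-memory store; `PaymentService` and its fields are hypothetical, and a real service would persist the key atomically with the charge.

```python
class PaymentService:
    """Toy payment service that deduplicates retries by idempotency key."""

    def __init__(self):
        self.balances = {}   # account -> balance in cents
        self._results = {}   # idempotency key -> stored response

    def charge(self, idempotency_key, account, amount_cents):
        # A retry with a known key returns the stored response
        # instead of charging the account again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        self.balances[account] = self.balances.get(account, 0) - amount_cents
        result = {"account": account, "charged_cents": amount_cents, "status": "ok"}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
service.balances["alice"] = 10_000

first = service.charge("order-42", "alice", 2_500)
retry = service.charge("order-42", "alice", 2_500)  # e.g. client timed out and retried

print(service.balances["alice"], first == retry)
```

Without the key check, a network timeout followed by a client retry would charge the account twice, which is the failure mode interviewers expect candidates to name.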

Section 5: Self-Evaluation Checklist

After every practice session, answer these 20 questions honestly (yes / partial / no).

Requirements & Scoping

Did I ask about scale (DAU, QPS)?
Did I ask about consistency vs. availability requirements?
Did I explicitly state my assumptions before designing?
Did I keep requirements under 5 minutes?

High-Level Design

Did I define at least 2 core API endpoints?
Did I draw a system diagram with labeled components?
Did I describe the data model with key entities?
Did I explain the purpose of each major component?

Deep Dive

Did I go genuinely deep into at least one component?
Did I identify and address the main scaling bottleneck?
Were my technical statements correct (no wrong facts)?
Did I quantify at least one thing with numbers?

Trade-Offs

Did I explain why I chose each major technology?
Did I name an alternative for at least one choice?
Did I acknowledge what my design gives up?
Did I use trade-off framing (“X is better for Y, but costs Z”)?

Communication

Did I think out loud consistently?
Did I take any interviewer hints gracefully?
Did I manage my time well (covered all 4 phases)?
Did I end with a wrap-up (bottlenecks, monitoring, improvements)?

Scoring guide:

18-20 yes: Strong hire performance
14-17 yes: Hire performance — identify the partial/no answers as focus areas
10-13 yes: No decision — significant work needed in multiple areas
Below 10: No hire — return to fundamentals; practice framework adherence
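
The scoring guide above can be tallied with a short helper. The guide counts only "yes" answers, so this sketch assumes "partial" does not count toward the total; the function names are hypothetical.

```python
def checklist_band(answers):
    """Map 20 answers ('yes' / 'partial' / 'no') to a scoring-guide band."""
    if len(answers) != 20:
        raise ValueError("expected exactly 20 answers")
    # Assumption: only a full 'yes' counts; 'partial' is treated like 'no'.
    yes_count = sum(1 for a in answers if a == "yes")
    if yes_count >= 18:
        return "Strong hire performance"
    if yes_count >= 14:
        return "Hire performance"
    if yes_count >= 10:
        return "No decision"
    return "No hire"


def focus_areas(answers):
    """1-based question numbers answered 'partial' or 'no'."""
    return [i for i, a in enumerate(answers, start=1) if a != "yes"]


session = ["yes"] * 15 + ["partial"] * 3 + ["no"] * 2
print(checklist_band(session), focus_areas(session))
```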

Section 6: How to Give Feedback in Peer Mock Interviews

Before the Session

Agree on the problem in advance (or interviewer picks it without telling the candidate).
Use this rubric as your evaluation sheet.
Take notes during the session on specific behaviors, not just impressions.
Don’t interrupt with feedback during the 45-minute session; note questions for the feedback phase.

During the Session (Interviewer Role)

Use this rubric to track signals in real time.
Note timestamps: when did requirements end? When did deep dive start?
Write down exact quotes that were strong or weak.
Give hints using these prompts (graduated from subtle to direct):
1. Subtle: “Interesting — what happens at scale?”
2. Moderate: “How would this handle the case where there are millions of followers?”
3. Direct: “I’m thinking specifically about how you’d handle write amplification here.”

After the Session (Feedback Structure)

Use this structure for the 15-minute debrief:

1. Self-assessment first (2 min)
Ask the candidate: “How do you think that went? What would you do differently?”

2. Strengths (3 min)
Call out 2-3 specific strong moments with exact examples.
“When you proactively raised the celebrity problem without being asked — that was a strong signal.”

3. Development areas (5 min)
Identify the 2-3 most impactful areas to improve.
“You didn’t discuss trade-offs for the DB choice. When you said ‘I’ll use PostgreSQL’, the follow-up is always: ‘because of X, and the alternative would be Y in a different scenario.’”

4. Dimension scores (3 min)
Walk through the 5 dimensions with scores.
“On trade-offs I’d give you a 2/4 today. Here’s specifically what would make it a 3 or 4…”

5. Next action (2 min)
One concrete thing to focus on in the next session.
“For next time: after every technology choice, force yourself to add ‘because X, and the trade-off is Y.’”

Feedback Anti-Patterns to Avoid

Anti-Pattern	Better Approach
“That was pretty good overall”	Give specific scores on each dimension
“You should have mentioned Kafka”	“The design had an async processing step. What options did you consider there?”
Focusing only on technical gaps	Balance technical and communication feedback
“You were too slow”	“You spent 12 minutes on requirements; ideally that’s 5 min. Here’s how to tighten it…”
Piling on 10 improvement areas	Pick top 2-3 most impactful; the rest can wait

Section 7: Level-Specific Expectations

Mid-Level / Junior Senior (L4–L5 equivalent)

Expected depth: Can design systems correctly using the textbook approach. Knows when to use caching, sharding, and message queues. Can articulate the main trade-offs when prompted.

Expected breadth: Comfortable with 10-15 common systems. Can handle any problem from Vol1 and the simpler Vol2 chapters.

What to demonstrate:

Clear framework application (all 4 steps, in order)
Correct use of standard patterns (cache-aside, fan-out, consistent hashing)
Trade-offs when prompted by interviewer
Back-of-envelope estimation (even if rough)

What is NOT expected at L4:

Proactively raising non-obvious bottlenecks
Deep knowledge of distributed consensus algorithms
Experience-based anecdotes from production systems

Common failure modes at this level:

Skipping estimation
Not drawing a diagram
Knowing names of technologies without knowing their properties

Senior (L5–L6 equivalent)

Expected depth: Can go deep on 2+ areas per problem. Identifies trade-offs before being asked. Has real opinions backed by reasoning. Knows the “why” behind patterns.

Expected breadth: Comfortable with all Vol1 and most Vol2 chapters. Can handle novel problem types by applying patterns they know.

What to demonstrate:

Proactively identifying the hardest scaling challenges
Comparing alternatives head-to-head with specifics
Quantifying trade-offs (“if we fan-out to 100M followers, that’s 100M writes per post”)
Knowing specific numbers and when they matter

What is expected at L5 that wasn’t at L4:

Raising the celebrity / hotspot / thundering herd problem without prompting
Knowing the difference between at-least-once and exactly-once delivery and when each matters
Understanding consistency models (eventual, strong, monotonic reads) and choosing appropriately

Common failure modes at this level:

Good breadth but shallow depth in every area (wide but not deep)
Trade-offs only when prompted — not proactively
Missing the most important deep-dive area (e.g., spending 20 min on API design for a storage system instead of the storage layer)

Staff+ (L6+ equivalent)

Expected depth: Can design novel systems outside the book’s examples. Identifies second-order effects and emergent problems at scale. Challenges assumptions productively.

Expected breadth: Fluent across all system types. Can cross-reference patterns across domains (e.g., “this is similar to the problem Kafka solves, applied to X”).

What to demonstrate:

System-level judgment: not just “what” but “when this design breaks and why”
Operational concerns: observability, deployment, rollout strategy
Cross-cutting concerns: security, compliance, cost, team topology
Can scope the problem themselves: “Given that we have 45 min, I want to focus on X because it’s the most novel part — happy to skip Y since that’s standard.”

What is expected at L6+ that wasn’t at L5:

Proactively scoping the interview: “What aspect is most interesting to you — the storage layer or the real-time delivery path?”
Knowing when the standard solution is wrong and proposing a better one
Discussing failure scenarios at the systems level (not just component level): “If this entire AZ goes down during a write, how does the system recover with no data loss?”
Mentioning organizational / team concerns: “This design implies two teams need to coordinate on schema changes — we’d want to version the API here.”

Common failure modes at this level:

Being too academic (correct theory, missing operational pragmatism)
Spending all 45 min on one deep dive (even a great one); needs to demonstrate breadth too
Over-engineering: jumping to a distributed consensus protocol when a simpler solution suffices

Last Updated: 2026-04-13