System Design Interview Evaluation Rubric
A detailed rubric mirroring what interviewers at top tech companies actually evaluate. Use this for self-assessment after every practice session and for structured peer feedback in mock interviews.
Section 1: Overall Evaluation Dimensions (with Weight)
| Dimension | Weight | What It Measures |
|---|---|---|
| Problem Scoping & Clarification | 20% | Can you define the right problem before solving it? |
| High-Level Design Quality | 20% | Can you draw a coherent, complete architecture? |
| Deep Dive Depth & Correctness | 25% | Do you have real technical depth in the right areas? |
| Trade-Off Discussion | 20% | Can you reason about choices, not just make them? |
| Communication & Thought Process | 15% | Is your reasoning transparent and structured? |
Overall decision thresholds (rough guideline):
- Strong Hire: 85%+ weighted score, at least “Hire” in every dimension
- Hire: 70%+ weighted score, no “No Hire” in any dimension
- No Decision: 55–69% weighted score, or mixed signals
- No Hire: Below 55%, or “No Hire” in Deep Dive or Trade-Offs
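The weights and thresholds above can be turned into a small scoring helper. This is a minimal sketch, not an official formula: the rubric does not say how a 1–4 dimension score maps to a percentage, so the code assumes each dimension contributes `(score / 4) * weight`. The dictionary keys and function names are illustrative.

```python
from typing import Dict

# Weights from the dimension table (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "scoping": 0.20,
    "high_level_design": 0.20,
    "deep_dive": 0.25,
    "trade_offs": 0.20,
    "communication": 0.15,
}


def weighted_percent(scores: Dict[str, int]) -> float:
    """Assumed mapping: a 1-4 score contributes (score / 4) * weight."""
    return 100 * sum(WEIGHTS[dim] * scores[dim] / 4 for dim in WEIGHTS)


def decision(scores: Dict[str, int]) -> str:
    """Apply the rough decision thresholds to a set of 1-4 dimension scores."""
    pct = weighted_percent(scores)
    lowest = min(scores.values())
    # A 1/4 in Deep Dive or Trade-Offs is a No Hire regardless of the total.
    if pct < 55 or scores["deep_dive"] == 1 or scores["trade_offs"] == 1:
        return "No Hire"
    # Strong Hire needs at least a 3/4 ("Hire") in every dimension.
    if pct >= 85 and lowest >= 3:
        return "Strong Hire"
    # Hire tolerates a 2/4 but no 1/4 anywhere.
    if pct >= 70 and lowest >= 2:
        return "Hire"
    return "No Decision"
```

Under this assumed mapping, straight 3/4 scores come out at 75% (Hire), while straight 2/4 scores come out at 50%, which falls below the No Decision band. Treat the percentage as a rough guide, as the thresholds themselves suggest.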
Section 2: Detailed Rubric Per Dimension
Dimension 1: Problem Scoping & Clarification (20%)
| Level | Score | What It Looks Like | Example Behaviors |
|---|---|---|---|
| Strong Hire | 4/4 | Asks exactly the right questions, eliminates ambiguity fast, scopes precisely | “Before I start, I want to confirm: are we optimizing for read latency or write throughput? And is this a global service or single-region?” |
| Hire | 3/4 | Asks good questions, covers functional + non-functional requirements, states assumptions | Asks about scale, consistency, latency. Writes down requirements. States 2-3 explicit assumptions. |
| No Decision | 2/4 | Asks some questions but misses key ones, or over-asks trivial things | Asks about UI design (irrelevant) but doesn’t ask about scale. Or asks 10 questions with no prioritization. |
| No Hire | 1/4 | Skips clarification entirely, or gets requirements badly wrong | Immediately starts drawing. Or designs for 1K users when the problem implied 1B. |
Specific signals to watch for:
- Asks about read/write ratio: +signal
- Asks about availability/consistency requirement: +signal
- States explicit assumptions before designing: +signal
- Asks about authentication/authorization (rarely relevant at this level): neutral, or a minor waste of time
- Never writes down requirements: -signal
- Still asking clarifying questions at minute 8: -signal
Dimension 2: High-Level Design Quality (20%)
| Level | Score | What It Looks Like | Example Behaviors |
|---|---|---|---|
| Strong Hire | 4/4 | Clean, complete architecture that naturally evolves from requirements; APIs and data model are precise | Defines 3 APIs with request/response, draws data model with key entities and relationships, draws system with all critical components labeled |
| Hire | 3/4 | Solid architecture covering main components; minor gaps or imprecision | Has LB, app servers, cache, DB. APIs are sketched. Data model has main entities. Might miss CDN or message queue. |
| No Decision | 2/4 | Architecture covers basics but has notable gaps or unclear data flow | Has a DB and a server. Can’t clearly explain data flow. API is vague or missing. |
| No Hire | 1/4 | Architecture is missing critical components or fundamentally incorrect | No caching when clearly needed. No load balancing for a high-scale system. Or single-point-of-failure design with no awareness. |
Specific signals to watch for:
- Draws system first, then explains: +signal (visual thinking)
- Explains “why” for each component chosen: +signal
- Evolves design from simple to complex: +signal
- Names components correctly (not just “a database”): +signal
- All arrows are labeled with protocols or data types: +signal
- Puts everything on one server with no scaling plan: -signal
- Draws system without explaining anything: -signal
Dimension 3: Deep Dive Depth & Correctness (25%)
| Level | Score | What It Looks Like | Example Behaviors |
|---|---|---|---|
| Strong Hire | 4/4 | Goes genuinely deep into 2+ areas; correctness is high; raises non-obvious challenges proactively | “The sharding key matters here because [X], and if we pick [Y] instead, we’ll get hotspots when [Z]. I’d use [W] and handle re-balancing by…” |
| Hire | 3/4 | Good depth in at least 1 area; mostly correct; identifies the main technical challenges | Can explain consistent hashing with virtual nodes. Can articulate why fan-out on write is better for low-follower users. Minor technical gaps. |
| No Decision | 2/4 | Shallow depth; correct at surface level but can’t go deeper under probing | Knows Kafka is used but can’t explain partitioning or consumer group offsets. Says “we’d shard” but can’t explain the shard key logic. |
| No Hire | 1/4 | Incorrect answers when probed; can’t go deeper than naming technologies | States wrong facts under pressure. Can’t explain the difference between a hash partition and range partition. |
Specific signals to watch for:
- Proactively identifies scaling bottleneck without being asked: ++signal
- Quantifies estimates during deep dive (e.g., “at 10K QPS, one DB can handle this, but at 100K we’d need…”): +signal
- Knows specific numbers (Redis ~100K ops/sec, Kafka throughput): +signal
- Correctly uses terms like “idempotency,” “quorum,” “write amplification”: +signal
- Can whiteboard an algorithm step-by-step: +signal
- Misuses a term (e.g., says “eventual consistency” when describing a bank transaction): -signal
- Drops a technology name but can’t explain its properties: -signal
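To make the "consistent hashing with virtual nodes" example from the Hire row concrete, here is a minimal sketch of the technique. The class name `HashRing`, the `vnodes` default of 100, and the use of MD5 as the ring hash are all illustrative assumptions, not choices prescribed by this rubric.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # MD5 is used only for its even distribution, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring where each physical node owns many virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._owner: Dict[int, str] = {}   # ring position -> physical node
        self._positions: List[int] = []    # sorted ring positions
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place `vnodes` points for this node around the ring.
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            del self._owner[pos]
            self._positions.remove(pos)

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]
```

The property worth stating out loud in an interview: when a node is added, the only keys that move are the ones that now belong to the new node. Virtual nodes spread each physical node's share around the ring, so the keys it takes over come from all existing nodes rather than from a single neighbor.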
Dimension 4: Trade-Off Discussion (20%)
| Level | Score | What It Looks Like | Example Behaviors |
|---|---|---|---|
| Strong Hire | 4/4 | Every design decision is accompanied by trade-off analysis; acknowledges costs of chosen approach; considers alternatives explicitly | “I’m choosing fan-out on write here, which gives faster reads but more write amplification. For celebrity accounts (users with >1M followers), I’d switch to fan-out on read to avoid the write storm.” |
| Hire | 3/4 | Most major decisions include a trade-off; can explain pros and cons when asked | When asked “why Redis?”, gives a coherent answer about speed + TTL + data structures. Knows when to use SQL vs NoSQL. |
| No Decision | 2/4 | Makes decisions without trade-off analysis; gives trade-offs only when explicitly probed | Says “I’ll use Kafka” but doesn’t explain why or what the alternative is. Only acknowledges trade-offs when the interviewer pushes. |
| No Hire | 1/4 | No trade-off discussion at all; or gives wrong trade-offs (incorrect technical facts) | “I’ll use NoSQL because it’s faster” without any qualification. Or contradicts themselves on consistency requirements. |
Specific signals to watch for:
- Voluntarily says “the trade-off here is…”: ++signal
- Mentions the CAP theorem correctly in context: +signal
- Compares two specific options head-to-head: +signal
- Acknowledges what their design gives up: +signal
- Uses “it depends” correctly (with context): +signal
- Uses “it depends” as a dodge without following up: -signal
- Claims a technology is strictly better with no caveats: -signal
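The fan-out trade-off in the Strong Hire example can be sketched in code. This is a toy, in-memory illustration under stated assumptions: the class `Feed`, its method names, and the logical clock are hypothetical, and only the 1M-follower cutoff comes from the example itself.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str, str]  # (logical timestamp, author, text)
CELEBRITY_THRESHOLD = 1_000_000  # follower cutoff from the Strong Hire example


class Feed:
    """Hybrid feed: fan-out on write for most authors, fan-out on read for celebrities."""

    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)  # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)  # user -> authors
        self.inbox: Dict[str, List[Entry]] = defaultdict(list)  # precomputed timelines
        self.posts: Dict[str, List[Entry]] = defaultdict(list)  # author -> own posts
        self._clock = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author: str, text: str) -> int:
        """Store a post and return how many inbox writes it caused."""
        self._clock += 1
        entry = (self._clock, author, text)
        self.posts[author].append(entry)
        if len(self.followers[author]) > self.threshold:
            return 0  # celebrity: skip the write storm, readers pull instead
        for follower in self.followers[author]:
            self.inbox[follower].append(entry)  # cost: one write per follower
        return len(self.followers[author])

    def timeline(self, user: str, limit: int = 10) -> List[Tuple[str, str]]:
        # A set avoids duplicates if an author crossed the threshold after posting.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if len(self.followers[author]) > self.threshold:
                entries.update(self.posts[author])  # cost: extra reads per celebrity
        return [(a, t) for _, a, t in sorted(entries, reverse=True)[:limit]]
```

The return value of `post` makes the cost visible: ordinary authors pay one write per follower at post time, while celebrity posts cost nothing at write time and shift the work to every `timeline` call.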
Dimension 5: Communication & Thought Process (15%)
| Level | Score | What It Looks Like | Example Behaviors |
|---|---|---|---|
| Strong Hire | 4/4 | Consistently thinks out loud; structured narrative; responds well to hints; engages interviewer | “I’m considering two approaches here. Option 1 is X, which has advantages A and B. Option 2 is Y, which solves C but at cost D. I’ll go with X because of our read-heavy workload.” |
| Hire | 3/4 | Generally thinks out loud; mostly structured; takes hints well; diagram is clear | Talks through reasoning most of the time. Clear whiteboard. Picks up on interviewer nudges after 1-2 hints. |
| No Decision | 2/4 | Occasional silence; reasoning not always clear; slow to take hints | 30-60 seconds of silence when stuck. Continues in wrong direction after subtle hint. Diagram is unclear. |
| No Hire | 1/4 | Long silences; reasoning is opaque; ignores or doesn’t notice interviewer hints | 2+ minutes of silence. Interviewer has to give very direct corrective feedback. Doesn’t engage. |
Specific signals to watch for:
- Narrates their thinking in real time: +signal
- Pauses and says “let me think about this for a second” then continues quickly: +signal (shows metacognition)
- Uses structured transitions: “Now that I’ve covered X, let me move to Y”: +signal
- Engages the interviewer: “Does this approach make sense to you?” or “Would you like me to go deeper here?”: +signal
- Reacts gracefully to pushback: “That’s a good point — I hadn’t considered Z. Let me revise…”: ++signal
- Long silent pause without verbalization: -signal
- Defensive reaction to interviewer questions: -signal
- “I’ll come back to that” and never does: -signal
Section 3: Common Mistakes That Immediately Signal “No Hire”
| Mistake | Why It’s Disqualifying |
|---|---|
| Skipping requirements entirely, jumping straight to solution | Suggests you solve the wrong problem in real life |
| Claiming a system has “infinite scalability” or “zero downtime” | Shows you don’t understand fundamental engineering trade-offs |
| Using a term incorrectly and insisting you’re right when corrected | Signals poor technical foundation and inability to learn |
| Designing a payment system without mentioning idempotency | Critical knowledge gap for financial systems |
| Ignoring the interviewer’s hints three or more times | Poor collaboration; will be difficult to mentor or work with |
| Complete silence for >2 minutes with no output | Cannot handle ambiguity under pressure |
| Saying “I would just use microservices” as a solution to scaling | Buzzword response with no substance |
| Designing a system with a single database and no replication for a 99.99% SLA | Doesn’t know what high availability requires |
| Changing your entire design when the interviewer pushes back, with no defense | Lack of conviction; suggests design is not grounded in reasoning |
| Misstating the CAP theorem (claiming consistency, availability, and partition tolerance are all achievable during a network partition) | Fundamental misconception of distributed systems |
Section 4: Signals That Immediately Impress Interviewers (Green Flags)
| Signal | Why It Impresses |
|---|---|
| Doing back-of-envelope math proactively before designing | Shows quantitative thinking; grounds design in reality |
| Identifying a non-obvious bottleneck that the interviewer expected to prompt | Demonstrates experience with production systems |
| Saying “here are three approaches; I’ll use X for these reasons, but Y is better if…” | Shows depth, alternatives awareness, and decision-making |
| Knowing specific system characteristics (Redis ops/sec, Kafka message size limits) | Signals real-world experience, not just textbook knowledge |
| Drawing a clean diagram that evolves through versions (V1 simple → V2 scaled) | Mirrors how real systems are built; shows architectural thinking |
| Proactively asking “what level of consistency does this business case require?” | Shows understanding that technical choices have business implications |
| Catching your own mistake mid-design and correcting it: “Wait, this doesn’t work because…” | Shows strong self-review and intellectual honesty |
| Mentioning observability (metrics, tracing, alerting) without being asked | Signals production-readiness mindset |
| Discussing data model in terms of access patterns, not just normalization | Shows understanding of NoSQL vs SQL trade-offs at depth |
| Explicitly mentioning idempotency for any write operation | Signals awareness of distributed systems failure modes |
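Idempotency appears in both the red-flag and green-flag tables, so it helps to have a concrete picture of it. The following is a minimal in-memory sketch of an idempotency-key check for a payment write. `PaymentService`, `ChargeResult`, and the key format are hypothetical names for illustration; a production version would need a durable store with an atomic check-and-set rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChargeResult:
    charge_id: int
    amount_cents: int


class PaymentService:
    """Toy payment endpoint that deduplicates retries by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, ChargeResult] = {}  # idempotency key -> stored result
        self._next_id = 1
        self.charges_executed = 0  # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> ChargeResult:
        previous = self._results.get(idempotency_key)
        if previous is not None:
            # Same key with a different payload is a client bug, not a retry.
            if previous.amount_cents != amount_cents:
                raise ValueError("idempotency key reused with a different amount")
            return previous  # retry: return the stored result, charge nothing
        result = ChargeResult(self._next_id, amount_cents)
        self._next_id += 1
        self.charges_executed += 1
        self._results[idempotency_key] = result
        return result
```

A client that times out and retries with the same key gets the original `ChargeResult` back, and `charges_executed` stays at 1. That is the behavior an interviewer is listening for when a network failure makes it unclear whether the first attempt succeeded.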
Section 5: Self-Evaluation Checklist
After every practice session, answer these 20 questions honestly (yes / partial / no).
Requirements & Scoping
- Did I ask about scale (DAU, QPS)?
- Did I ask about consistency vs. availability requirements?
- Did I explicitly state my assumptions before designing?
- Did I keep requirements under 5 minutes?
High-Level Design
- Did I define at least 2 core API endpoints?
- Did I draw a system diagram with labeled components?
- Did I describe the data model with key entities?
- Did I explain the purpose of each major component?
Deep Dive
- Did I go genuinely deep into at least one component?
- Did I identify and address the main scaling bottleneck?
- Were my technical statements correct (no wrong facts)?
- Did I quantify at least one thing with numbers?
Trade-Offs
- Did I explain why I chose each major technology?
- Did I name an alternative for at least one choice?
- Did I acknowledge what my design gives up?
- Did I use trade-off framing (“X is better for Y, but costs Z”)?
Communication
- Did I think out loud consistently?
- Did I take any interviewer hints gracefully?
- Did I manage my time well (covered all 4 phases)?
- Did I end with a wrap-up (bottlenecks, monitoring, improvements)?
Scoring guide:
- 18-20 yes: Strong hire performance
- 14-17 yes: Hire performance — identify the partial/no answers as focus areas
- 10-13 yes: No decision — significant work needed in multiple areas
- Below 10: No hire — return to fundamentals; practice framework adherence
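If you track sessions in a spreadsheet or script, the scoring guide is easy to automate. The helper below is a small sketch; the function name and the choice to return 1-based question numbers as focus areas are illustrative. Following the guide, only a full "yes" counts toward the band.

```python
from typing import List, Tuple

VALID_ANSWERS = ("yes", "partial", "no")


def score_checklist(answers: List[str]) -> Tuple[str, List[int]]:
    """Return (band, question numbers to focus on) for the 20 checklist answers."""
    if len(answers) != 20:
        raise ValueError("expected exactly 20 answers")
    if any(a not in VALID_ANSWERS for a in answers):
        raise ValueError("answers must be 'yes', 'partial', or 'no'")
    yes_count = answers.count("yes")
    # Every partial/no answer becomes a focus area (1-based question number).
    focus = [i + 1 for i, a in enumerate(answers) if a != "yes"]
    if yes_count >= 18:
        band = "Strong hire"
    elif yes_count >= 14:
        band = "Hire"
    elif yes_count >= 10:
        band = "No decision"
    else:
        band = "No hire"
    return band, focus
```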
Section 6: How to Give Feedback in Peer Mock Interviews
Before the Session
- Agree on the problem in advance (or interviewer picks it without telling the candidate).
- Use this rubric as your evaluation sheet.
- Take notes during the session on specific behaviors, not just impressions.
- Don’t interrupt during the 45-minute session — note questions for feedback phase.
During the Session (Interviewer Role)
- Use this rubric to track signals in real time.
- Note timestamps: when did requirements end? When did deep dive start?
- Write down exact quotes that were strong or weak.
- Give hints using these prompts (graduated from subtle to direct):
  - Subtle: “Interesting — what happens at scale?”
  - Moderate: “How would this handle the case where there are millions of followers?”
  - Direct: “I’m thinking specifically about how you’d handle write amplification here.”
After the Session (Feedback Structure)
Use this structure for the 15-minute debrief:
1. Self-assessment first (2 min)
Ask the candidate: “How do you think that went? What would you do differently?”
2. Strengths (3 min)
Call out 2-3 specific strong moments with exact examples.
“When you proactively raised the celebrity problem without being asked — that was a strong signal.”
3. Development areas (5 min)
Identify the 2-3 most impactful areas to improve.
“You didn’t discuss trade-offs for the DB choice. When you said ‘I’ll use PostgreSQL’, the follow-up is always: ‘because of X, and the alternative would be Y in a different scenario.’”
4. Dimension scores (3 min)
Walk through the 5 dimensions with scores.
“On trade-offs I’d give you a 2/4 today. Here’s specifically what would make it a 3 or 4…”
5. Next action (2 min)
One concrete thing to focus on in the next session.
“For next time: after every technology choice, force yourself to add ‘because X, and the trade-off is Y.’”
Feedback Anti-Patterns to Avoid
| Anti-Pattern | Better Approach |
|---|---|
| “That was pretty good overall” | Give specific scores on each dimension |
| “You should have mentioned Kafka” | “The design had an async processing step. What options did you consider there?” |
| Focusing only on technical gaps | Balance technical and communication feedback |
| “You were too slow” | “You spent 12 minutes on requirements; ideally that’s 5 min. Here’s how to tighten it…” |
| Piling on 10 improvement areas | Pick top 2-3 most impactful; the rest can wait |
Section 7: Level-Specific Expectations
Mid-Level to Early Senior (L4–L5 equivalent)
Expected depth: Can design systems correctly using the textbook approach. Knows when to use caching, sharding, and message queues. Can articulate the main trade-offs when prompted.
Expected breadth: Comfortable with 10-15 common systems. Can handle any problem from Vol1 and the simpler Vol2 chapters.
What to demonstrate:
- Clear framework application (all 4 steps, in order)
- Correct use of standard patterns (cache-aside, fan-out, consistent hashing)
- Trade-offs when prompted by interviewer
- Back-of-envelope estimation (even if rough)
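As a reference point for "correct use of standard patterns," here is a minimal sketch of cache-aside. The class name `CacheAside`, the injected `load` and `store` callables, and the 60-second TTL are assumptions made for illustration; in practice the cache would be an external store such as Redis rather than a dict.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Application-managed cache: check the cache, fall back to the database on a miss."""

    def __init__(
        self,
        load: Callable[[str], str],          # reads a value from the database
        store: Callable[[str, str], None],   # writes a value to the database
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                   # cache hit
        value = self._load(key)               # miss or expired: go to the database
        self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def put(self, key: str, value: str) -> None:
        self._store(key, value)               # write the source of truth first
        self._cache.pop(key, None)            # then invalidate; the next read repopulates
```

The design choice to invalidate on write, rather than update the cached value, is itself a trade-off worth naming: it keeps the write path simple at the cost of one extra database read after each update.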
What is NOT expected at L4:
- Proactively raising non-obvious bottlenecks
- Deep knowledge of distributed consensus algorithms
- Experience-based anecdotes from production systems
Common failure modes at this level:
- Skipping estimation
- Not drawing a diagram
- Knowing names of technologies without knowing their properties
Senior (L5–L6 equivalent)
Expected depth: Can go deep on 2+ areas per problem. Identifies trade-offs before being asked. Has real opinions backed by reasoning. Knows the “why” behind patterns.
Expected breadth: Comfortable with all Vol1 and most Vol2 chapters. Can handle novel problem types by applying patterns they know.
What to demonstrate:
- Proactively identifying the hardest scaling challenges
- Comparing alternatives head-to-head with specifics
- Quantifying trade-offs (“if we fan-out to 100M followers, that’s 100M writes per post”)
- Knowing specific numbers and when they matter
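The fan-out quantification above can be carried one step further with a few lines of arithmetic. This sketch reuses the ~100K ops/sec Redis figure quoted earlier in the rubric as a per-node write rate; the 5-second delivery target and the function names are assumptions chosen purely for illustration.

```python
import math


def drain_seconds(writes: int, writes_per_sec: int) -> float:
    """Time for one node to absorb a burst of writes at a steady rate."""
    return writes / writes_per_sec


def nodes_needed(writes: int, per_node_writes_per_sec: int, target_seconds: float) -> int:
    """Nodes required to absorb the burst within the target, assuming even spread."""
    return math.ceil(writes / (per_node_writes_per_sec * target_seconds))


followers = 100_000_000      # one inbox write per follower for a single post
per_node_rate = 100_000      # ~100K ops/sec, the Redis figure cited in this rubric

single_node = drain_seconds(followers, per_node_rate)
fleet = nodes_needed(followers, per_node_rate, target_seconds=5)
```

With these inputs a single node needs roughly 1,000 seconds (about 17 minutes) to deliver one post, and meeting the assumed 5-second target takes on the order of 200 nodes for that post alone. Numbers like these are what turn "fan-out on write is expensive" into a defensible argument for switching celebrities to fan-out on read.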
What is expected at L5 that wasn’t at L4:
- Raising the celebrity / hotspot / thundering herd problem without prompting
- Knowing the difference between at-least-once and exactly-once delivery and when each matters
- Understanding consistency models (eventual, strong, monotonic reads) and choosing appropriately
Common failure modes at this level:
- Good breadth but shallow depth in every area (wide but not deep)
- Trade-offs only when prompted — not proactively
- Missing the most important deep-dive area (e.g., spending 20 min on API design for a storage system instead of the storage layer)
Staff+ (L6+ equivalent)
Expected depth: Can design novel systems outside the book’s examples. Identifies second-order effects and emergent problems at scale. Challenges assumptions productively.
Expected breadth: Fluent across all system types. Can cross-reference patterns across domains (e.g., “this is similar to the problem Kafka solves, applied to X”).
What to demonstrate:
- System-level judgment: not just “what” but “when this design breaks and why”
- Operational concerns: observability, deployment, rollout strategy
- Cross-cutting concerns: security, compliance, cost, team topology
- Can scope the problem themselves: “Given that we have 45 min, I want to focus on X because it’s the most novel part — happy to skip Y since that’s standard.”
What is expected at L6+ that wasn’t at L5:
- Proactively scoping the interview: “What aspect is most interesting to you — the storage layer or the real-time delivery path?”
- Knowing when the standard solution is wrong and proposing a better one
- Discussing failure scenarios at the systems level (not just component level): “If this entire AZ goes down during a write, how does the system recover with no data loss?”
- Mentioning organizational / team concerns: “This design implies two teams need to coordinate on schema changes — we’d want to version the API here.”
Common failure modes at this level:
- Being too academic (correct theory, missing operational pragmatism)
- Spending all 45 min on one deep dive (even a great one); needs to demonstrate breadth too
- Over-engineering: jumping to a distributed consensus protocol when a simpler solution suffices
Last Updated: 2026-04-13