Chapter 22 Flashcards — Analyzing Architecture Risk

flashcards fsa architecture-risk risk-storming


Risk Matrix


Q: What are the two dimensions of an architectural risk matrix?
A: (1) Likelihood — the probability that the risk event actually occurs (Low/Medium/High), and (2) Impact — the severity of the consequence if it occurs (Low/Medium/High).


Q: State the risk prioritization formula.
A: Risk priority = Likelihood × Impact. High likelihood combined with high impact produces a Critical risk; low likelihood combined with low impact produces an Acceptable risk.
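
As a minimal sketch of the formula above, assuming each rating maps to an ordinal score from 1 to 3. The numeric scale and the names `Rating` and `risk_score` are illustrative assumptions, not part of the chapter.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Ordinal scale used for both dimensions (assumed 1-3 mapping)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def risk_score(likelihood: Rating, impact: Rating) -> int:
    """Risk priority = Likelihood x Impact."""
    return int(likelihood) * int(impact)


print(risk_score(Rating.HIGH, Rating.HIGH))  # 9: the Critical cell
print(risk_score(Rating.LOW, Rating.LOW))    # 1: the Acceptable cell
```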


Q: In a 3×3 risk matrix, which cell is Critical and which is Acceptable?
A: Critical = High Likelihood × High Impact (the top-right cell when both axes run from Low to High). Acceptable = Low Likelihood × Low Impact (the bottom-left cell).


Q: What action should be taken for a Critical risk?
A: Immediate mitigation is required. A Critical risk is acceptable only temporarily, with a documented mitigation plan, an assigned owner, and a target completion date.


Q: In a 3×3 risk matrix, where do Medium Priority risks fall?
A: The medium-priority band runs along the diagonal between the Critical and Acceptable corners: High Likelihood × Low Impact, Medium Likelihood × Medium Impact, and Low Likelihood × High Impact. Monitor these risks and mitigate them if resources allow.
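
One way to picture the bands is to score each cell and print the grid. This is a sketch under assumptions: the 1 to 3 scoring and the "High" and "Low" labels for the score-6 and score-2 cells are not stated on the cards above and are chosen here for illustration.

```python
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def band(likelihood: str, impact: str) -> str:
    """Map a matrix cell to a priority band via its score (1-9)."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score == 9:
        return "Critical"
    if score == 6:
        return "High"        # assumed label for Medium x High cells
    if score in (3, 4):
        return "Medium"
    if score == 2:
        return "Low"         # assumed label for Low x Medium cells
    return "Acceptable"      # score == 1


# Print the grid with High likelihood on the top row and impact
# increasing left to right, so Critical lands in the top-right cell.
for likelihood in ("High", "Medium", "Low"):
    row = [band(likelihood, impact) for impact in ("Low", "Medium", "High")]
    print(f"{likelihood:>6} | " + " | ".join(f"{cell:^10}" for cell in row))
```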


Q: Why must Likelihood and Impact ratings be assigned independently?
A: Because a highly likely but low-impact event (monitor only) calls for a different response than a low-likelihood but catastrophic-impact event (design for resilience). Conflating the two produces incorrect prioritization.


Risk Assessments


Q: What are the five steps of conducting an architectural risk assessment?
A: (1) Enumerate components and services, (2) identify risk events for each component, (3) rate likelihood and impact for each risk, (4) plot risks on the risk matrix, (5) prioritize and document with owners and a review schedule.
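
The five steps can be sketched as a small risk register. Everything here is a hypothetical illustration: the `Risk` class, the component names, the owners, and the 1 to 3 scoring are assumptions rather than anything prescribed by the chapter.

```python
from dataclasses import dataclass

SCALE = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Risk:
    component: str              # step 1: the component or service
    event: str                  # step 2: what could go wrong
    likelihood: str             # step 3: Low / Medium / High
    impact: str                 # step 3: Low / Medium / High
    owner: str = "unassigned"   # step 5: who is accountable

    @property
    def score(self) -> int:
        # step 4: position on the matrix, expressed as a 1-9 score
        return SCALE[self.likelihood] * SCALE[self.impact]


# Hypothetical entries for illustration only.
register = [
    Risk("orders-db", "primary node fails", "Low", "High", "data team"),
    Risk("auth-service", "single instance crashes", "High", "High", "platform team"),
    Risk("email-worker", "queue backlog delays mail", "Low", "Medium", "messaging team"),
]

# Step 5: prioritize, highest score first.
prioritized = sorted(register, key=lambda r: r.score, reverse=True)
for risk in prioritized:
    print(f"{risk.score}  {risk.component}: {risk.event} (owner: {risk.owner})")
```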


Q: What is the key output of a risk assessment?
A: A prioritized list of risks with likelihood, impact, and priority ratings; assigned owners for each risk; proposed mitigations for high-priority risks; and a review schedule for keeping the assessment current.


Q: Why must a risk assessment be treated as a living document?
A: As architecture evolves, new risks emerge and existing risks are mitigated or change in severity. A one-time risk assessment goes stale quickly and provides a false sense of security.
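
A review schedule can be enforced mechanically. The sketch below flags risks whose last review is older than a chosen interval; the 90-day default, the function name, and the sample entries are assumptions made for illustration.

```python
from datetime import date, timedelta


def overdue_reviews(last_reviewed: dict[str, date], today: date,
                    interval_days: int = 90) -> list[str]:
    """Return the risks whose last review is older than the interval."""
    cutoff = today - timedelta(days=interval_days)
    return sorted(name for name, reviewed in last_reviewed.items()
                  if reviewed < cutoff)


# Hypothetical risk names and review dates.
reviews = {
    "auth-service has no redundancy": date(2025, 1, 10),
    "orders-db has no read replica": date(2025, 5, 2),
}
print(overdue_reviews(reviews, today=date(2025, 6, 1)))
```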


Risk Storming — Overview


Q: What is risk storming?
A: A collaborative, facilitated risk identification technique performed against an architecture diagram with multiple stakeholders, structured as three sequential phases: Identification, Consensus, and Mitigation.


Q: Why is risk storming performed with multiple stakeholders rather than a solo architect?
A: Different roles (developers, operations, security, product) see different risks. A solo architect systematically misses risks outside their own mental model. A multi-participant session surfaces a broader and more accurate risk picture.


Q: When should risk storming be used?
A: When evaluating a new architecture, after significant architectural changes, during major incident retrospectives, or on a scheduled periodic basis to keep the risk picture current.


Phase 1: Identification


Q: What is the goal of Phase 1 of risk storming?
A: To surface as many risks as possible without social filtering — ensuring every participant’s risk perceptions are captured independently before group dynamics can suppress them.


Q: Describe how Phase 1 is conducted.
A: All participants independently write risks on sticky notes (one risk per note) and place each note on the component or connection in the architecture diagram that the risk affects. No discussion is allowed during this phase.


Q: Why is the “no discussion” rule critical in Phase 1?
A: Discussion during identification causes participants to self-censor, anchor on others’ ideas, and defer to authority figures. Independent work ensures every perspective is captured on equal footing before social dynamics take over.


Q: What does a useful output of Phase 1 look like?
A: An architecture diagram covered in sticky notes, with natural clusters forming around the most-flagged (and likely highest-risk) components. Because no discussion has taken place yet, the clusters show which parts of the architecture independently concern the most participants.
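
If the notes are transcribed after the session, the clusters can be found by counting notes per component. This is a sketch with hypothetical participants, components, and risks.

```python
from collections import Counter

# Each tuple is one sticky note: (participant, component, risk).
# All names are hypothetical.
notes = [
    ("dev", "auth-service", "no redundancy"),
    ("ops", "auth-service", "restart takes minutes"),
    ("security", "auth-service", "tokens never expire"),
    ("ops", "orders-db", "no read replica"),
    ("dev", "orders-db", "slow queries block reads"),
    ("product", "email-worker", "delayed notifications"),
]

notes_per_component = Counter(component for _, component, _ in notes)

# Components with the most notes are the clusters to review first.
for component, count in notes_per_component.most_common():
    print(f"{component}: {count} notes")
```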


Phase 2: Consensus


Q: What is the goal of Phase 2 of risk storming?
A: To review all identified risks, assign agreed severity ratings (likelihood and impact), merge duplicates, and resolve disagreements through facilitated discussion.


Q: How are contested risks handled in Phase 2?
A: Participants discuss the disagreement until consensus is reached. If consensus cannot be reached, a designated tie-breaker (e.g., the lead architect) makes the final call.


Q: What is the deeper value of disagreements during Phase 2, beyond reaching a rating?
A: Disagreements reveal knowledge gaps — a security engineer rating a risk “High Impact” where a developer rates it “Low Impact” shows that one party has information the other lacks. Surfacing this gap is itself a valuable outcome.
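
A facilitator could flag contested risks by measuring how far apart the individual ratings are. The sketch below assumes a 1 to 3 scale and treats a gap of more than one level as worth discussing; both the threshold and the `contested` helper are illustrative assumptions.

```python
SCALE = {"Low": 1, "Medium": 2, "High": 3}


def contested(ratings: dict[str, str], max_spread: int = 1) -> bool:
    """True when participants' ratings differ by more than max_spread levels."""
    scores = [SCALE[r] for r in ratings.values()]
    return max(scores) - min(scores) > max_spread


# Hypothetical impact ratings for one risk, keyed by participant role.
impact_votes = {"security": "High", "developer": "Low", "ops": "Medium"}
print(contested(impact_votes))  # True: High vs Low is a two-level gap
```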


Q: What is the output of Phase 2?
A: A fully populated risk matrix with agreed severity ratings for all identified risks, ready to drive prioritization and mitigation planning.


Phase 3: Risk Mitigation


Q: What is the goal of Phase 3 of risk storming?
A: For the highest-priority risks (Critical and High Priority), define concrete mitigation strategies, assign an owner to each, and set a target completion date.


Q: Name four categories of risk mitigation strategy.
A: (1) Design changes (eliminate the risk architecturally, e.g., remove a single point of failure, or SPOF), (2) monitoring and alerting (detect the risk event early), (3) redundancy and fallback (tolerate a component failure without a full outage), (4) process controls (operational procedures that reduce likelihood).


Q: What makes a mitigation plan actionable vs. incomplete?
A: Each mitigation must have a named owner (a specific person or team) and a target completion date. Mitigations with no owner and no deadline tend to go unimplemented.
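
The owner-and-date rule is easy to check in code. A minimal sketch, assuming a hypothetical `Mitigation` record; the field names and sample values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Mitigation:
    risk: str
    strategy: str
    owner: Optional[str] = None
    target_date: Optional[date] = None

    def is_actionable(self) -> bool:
        """Actionable only with both a named owner and a target date."""
        return bool(self.owner) and self.target_date is not None


# Hypothetical examples.
complete = Mitigation("auth-service SPOF", "add a second instance",
                      owner="platform team", target_date=date(2026, 7, 1))
incomplete = Mitigation("orders-db has no read replica", "add a replica")

print(complete.is_actionable(), incomplete.is_actionable())  # True False
```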


User-Story Risk Analysis


Q: What is user-story risk analysis?
A: The practice of evaluating architectural risk at the level of individual user stories: assessing which stories touch high-risk components, introduce new risk, or carry a high likelihood or impact of failure.


Q: How should high-risk user stories be handled in sprint planning?
A: They should be front-loaded in the sprint (implemented early so problems surface with time to address them), assigned to more experienced engineers, and given more explicit acceptance criteria and test coverage.
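
Front-loading can be sketched as a sort over the backlog. The story IDs, component names, and the weighting rule (count of high-risk components touched) are hypothetical; a real team would likely use a richer score.

```python
# Hypothetical set of components already rated high-risk.
HIGH_RISK_COMPONENTS = {"auth-service", "orders-db"}

# Hypothetical backlog: each story lists the components it touches.
stories = [
    {"id": "US-1", "title": "Update footer links", "touches": {"web-ui"}},
    {"id": "US-2", "title": "Add SSO login", "touches": {"auth-service", "web-ui"}},
    {"id": "US-3", "title": "Archive old orders", "touches": {"orders-db"}},
]


def risk_weight(story: dict) -> int:
    """Count how many high-risk components a story touches."""
    return len(story["touches"] & HIGH_RISK_COMPONENTS)


# sorted() is stable, so stories with equal weight keep backlog order.
sprint_order = sorted(stories, key=risk_weight, reverse=True)
print([s["id"] for s in sprint_order])
```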


Q: What is the advantage of applying risk analysis at the user-story level?
A: It connects architecture risk to where work actually happens — making risk visible to developers at the point of delivery rather than in a separate document that few people read.


Use Cases: Availability, Elasticity, Security


Q: What is the focus question for availability risk storming?
A: “Where are the single points of failure, and what happens when any given component goes down?” Participants look for components with no redundancy, synchronous dependency chains, and missing fallback strategies.


Q: Name two common high-severity availability risks identified in risk storming.
A: (1) An authentication service with no redundancy — its failure blocks all user logins. (2) A database with no read replica — a slow query blocks all reads.
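
The availability focus question can be explored against a simple model of the architecture. The topology below (instance counts and synchronous dependencies) is entirely hypothetical; the sketch lists components with no redundancy and what each would take down with it.

```python
# Hypothetical topology: instance counts and synchronous dependencies.
instances = {"web-ui": 3, "auth-service": 1, "orders-api": 2, "orders-db": 1}
depends_on = {
    "web-ui": ["auth-service", "orders-api"],
    "orders-api": ["orders-db"],
    "auth-service": [],
    "orders-db": [],
}


def single_points_of_failure() -> list[str]:
    """Components running exactly one instance (no redundancy)."""
    return sorted(name for name, count in instances.items() if count == 1)


def impacted_by(failed: str) -> list[str]:
    """Components that depend on `failed`, directly or transitively."""
    hit: set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in hit and any(d == failed or d in hit for d in deps):
                hit.add(name)
                changed = True
    return sorted(hit)


for spof in single_points_of_failure():
    print(f"{spof} down -> affects {impacted_by(spof)}")
```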


Q: What is the focus question for elasticity risk storming?
A: “What can’t scale, and where are the bottlenecks when load increases?” Participants look for services requiring manual scaling, in-process state that prevents horizontal scaling, and third-party API rate limits.


Q: Why does in-process state create an elasticity risk?
A: If a service stores session or user state in memory (in-process), adding more instances of that service doesn’t help — each new instance lacks the state held by the original, causing session loss or inconsistent behavior.
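
The effect is easy to reproduce with a toy model. In this sketch a plain dictionary stands in for session storage; the `Instance` class and the shared store are illustrative assumptions, not a real session framework.

```python
class Instance:
    """A service instance; `store` is where it keeps session state."""

    def __init__(self, store: dict):
        self.store = store

    def login(self, session_id: str, user: str) -> None:
        self.store[session_id] = user

    def whoami(self, session_id: str):
        return self.store.get(session_id)  # None means the session is unknown


# In-process state: each instance has its own private dict.
a, b = Instance({}), Instance({})
a.login("s1", "alice")
print(b.whoami("s1"))  # None: the second instance never saw the login

# Externalized state: both instances share one store
# (a plain dict stands in for something like a shared cache).
shared: dict = {}
c, d = Instance(shared), Instance(shared)
c.login("s1", "alice")
print(d.whoami("s1"))  # alice
```

Moving the state out of the process is what lets a load balancer route any request to any instance.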


Q: What is the focus question for security risk storming?
A: “What are the attack surfaces, where is sensitive data vulnerable, and where are authentication/authorization weaknesses?” Participants look for unencrypted data in transit, missing auth on internal APIs, and overly permissive access controls.


Q: Name two common high-severity security risks identified in risk storming.
A: (1) Internal service-to-service calls over HTTP (not HTTPS) — lateral movement after compromise is trivial. (2) An admin API with no rate limiting — vulnerable to credential stuffing attacks.
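
The first risk can be spotted by scanning a list of service-to-service endpoints for plain HTTP. The connection list and hostnames below are hypothetical; `urlparse` is from the Python standard library.

```python
from urllib.parse import urlparse

# Hypothetical internal service-to-service connections.
connections = [
    ("orders-api", "http://inventory.internal/stock"),
    ("orders-api", "https://payments.internal/charge"),
    ("web-ui", "http://auth.internal/login"),
]


def unencrypted(conns):
    """Return (caller, url) pairs whose URL scheme is plain HTTP."""
    return [(caller, url) for caller, url in conns
            if urlparse(url).scheme == "http"]


for caller, url in unencrypted(connections):
    print(f"{caller} -> {url} is not encrypted in transit")
```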



Field           Value
Total Cards     32
Review Time     ~30 minutes
Priority        MEDIUM
Last Updated    2026-05-29