Module 06: Multi-Agent Systems

Multi-agent systems are the architecture you reach for when a single agent is no longer sufficient. This module covers when to decompose tasks, how to coordinate agents, the protocols for inter-agent communication, failure handling, and how the major frameworks compare.

1. Why Multi-Agent Systems

Single Agent Limitations

A single LLM agent hits practical walls in several dimensions:

Context window limits. Even with 200K token windows, complex tasks generate more intermediate state than can fit in one context. A research task that involves reading 20 documents, running code, checking APIs, and synthesizing results will overflow any practical context budget if handled in a single agent loop.

Specialization. A generalist agent tends to be adequate at many things and excellent at none. A coding-specialized agent (configured with a code-specific system prompt, an appropriate tool set, and a narrow scope) typically outperforms a generalist on code tasks. Multi-agent systems let you route each subtask to the agent best suited for it.

Speed. A single agent is serial: it does step 1, then step 2, then step 3. If steps 2 and 3 are independent, a multi-agent system can run them in parallel — reducing wall-clock time dramatically.

Reliability. If a single agent’s context becomes corrupted (a tool returned garbage, a loop went off the rails), the entire task fails. Multi-agent systems can retry at the subagent level without restarting the whole pipeline.

When to Decompose

Add agents when you have:

Tasks larger than a context window — research tasks, large codebase changes, multi-document synthesis.
Parallelizable subtasks — summarize 10 articles simultaneously, check 5 APIs concurrently.
Subtasks requiring specialization — different system prompts, different tool sets, different models (use GPT-4 for reasoning, Haiku for classification).
Long-running workflows — if a task takes 30+ minutes, checkpoint-ability via separate agents is valuable.

Coordination Overhead Is Real

Adding agents adds complexity. Every agent boundary introduces:

Latency: one more API round-trip
Cost: one more set of input tokens (the task description, context, tools)
Failure surface: one more thing that can error out
Debugging complexity: understanding failures across multiple agents is harder than within one

Do not decompose a task that a single agent can handle cleanly. The default should be: use one agent. Add a second agent only when there is a clear, concrete reason.

2. Orchestrator–Subagent Pattern

The orchestrator–subagent pattern is the most common multi-agent architecture. It has a clear division of responsibility and is easy to reason about.

Roles

Orchestrator:

Receives the high-level task
Breaks it into scoped subtasks
Delegates each subtask to an appropriate subagent
Collects and validates results
Assembles the final output
Owns the overall state of the task

Subagent:

Receives a single, well-defined task from the orchestrator
Has no knowledge of the larger task or other subagents
Returns a structured result in the expected format
May use its own tools and context, but does not call other agents (unless you have nested delegation)

Communication Protocol

The orchestrator-to-subagent message should include:

Task description: what to do, stated precisely and completely
Context: exactly the information the subagent needs (no more, no less)
Output format: the exact structure of the expected result
Constraints: time budget, length limits, tools allowed

The subagent-to-orchestrator result should include:

Result: the actual output
Status: success / partial / failed
Confidence (optional but useful): how certain the subagent is
Errors (if any): what went wrong
Metadata: token usage, time taken, etc.

# Orchestrator delegates a research subtask
subagent_task = {
    "task": "Summarize the key arguments in the provided text",
    "context": "PAPER_TEXT_HERE",
    "output_format": {
        "summary": "string, 3-5 sentences",
        "key_claims": "list of 3-5 strings",
        "methodology": "string, 1-2 sentences or null"
    },
    "constraints": {
        "max_tokens": 400,
        "language": "English"
    }
}
 
# Subagent returns a structured result
subagent_result = {
    "status": "success",
    "result": {
        "summary": "...",
        "key_claims": ["...", "...", "..."],
        "methodology": "..."
    },
    "metadata": {
        "tokens_used": 312,
        "model": "claude-haiku-4-5-20251001"
    }
}

State Management

The orchestrator owns all shared state. Subagents are stateless workers. The orchestrator:

Tracks which subtasks are complete, in-flight, or failed
Stores intermediate results until assembly
Decides what to do when a subagent fails
Has the final say on task completion

This “orchestrator owns state, subagents are stateless” separation is critical. If subagents start sharing state with each other directly, you’ve created a distributed system with all its attendant consistency problems.
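
A minimal sketch of the bookkeeping described above, assuming subtasks are identified by string IDs. The `OrchestratorState` class, the `SubtaskStatus` enum, and their method names are hypothetical and not taken from any framework; a real orchestrator would likely also persist this state.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubtaskStatus(Enum):
    PENDING = "pending"
    IN_FLIGHT = "in_flight"
    COMPLETE = "complete"
    FAILED = "failed"


@dataclass
class OrchestratorState:
    """Hypothetical container for everything the orchestrator tracks."""
    status: dict[str, SubtaskStatus] = field(default_factory=dict)
    results: dict[str, dict] = field(default_factory=dict)

    def register(self, task_id: str) -> None:
        self.status[task_id] = SubtaskStatus.PENDING

    def mark_in_flight(self, task_id: str) -> None:
        self.status[task_id] = SubtaskStatus.IN_FLIGHT

    def record_result(self, task_id: str, result: dict) -> None:
        # Store the intermediate result until final assembly.
        self.results[task_id] = result
        self.status[task_id] = SubtaskStatus.COMPLETE

    def record_failure(self, task_id: str) -> None:
        self.status[task_id] = SubtaskStatus.FAILED

    def is_done(self) -> bool:
        # Done only when no subtask is still pending or running.
        return all(
            s in (SubtaskStatus.COMPLETE, SubtaskStatus.FAILED)
            for s in self.status.values()
        )


state = OrchestratorState()
for tid in ("A", "B"):
    state.register(tid)
state.mark_in_flight("A")
state.record_result("A", {"summary": "..."})
state.record_failure("B")
```

Subagents never touch this object; they only receive a task and return a result, which the orchestrator records.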

Full Example Walkthrough

Task: Research question — “What are the current limitations of transformer-based language models?”

Step 1 — Orchestrator decomposes:

Subtask A: Summarize limitations from a given paper abstract [assigned to Research Agent 1]
Subtask B: Identify limitations mentioned in a given blog post [assigned to Research Agent 2]
Subtask C: List technical limitations from a given technical blog [assigned to Research Agent 3]

Step 2 — Subagents execute in parallel:

Agent 1 returns: {claims: ["quadratic attention complexity", "fixed context length", ...]}
Agent 2 returns: {claims: ["hallucination", "world knowledge cutoff", ...]}
Agent 3 returns: {claims: ["inference cost", "energy consumption", ...]}

Step 3 — Orchestrator synthesizes:

Collects all three result sets
Deduplicates overlapping claims
Structures the final answer with citations to each source
Returns the assembled result
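
The synthesis step could be sketched as follows. This is an illustration only: `synthesize_claims` is a hypothetical helper, the sample data is made up (including an artificial overlap so deduplication has something to do), and deduplication is a naive case-insensitive exact match, where a real system would probably need fuzzy or semantic matching.

```python
def synthesize_claims(agent_outputs: dict[str, list[str]]) -> list[dict]:
    """Merge claims from several sources and record which sources made each claim.

    agent_outputs maps a source label to the list of claims it returned.
    """
    merged: dict[str, dict] = {}
    for source, claims in agent_outputs.items():
        for claim in claims:
            key = claim.strip().lower()  # naive dedup key
            entry = merged.setdefault(key, {"claim": claim.strip(), "sources": []})
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())


# Made-up sample outputs from three research subagents.
outputs = {
    "paper_abstract": ["quadratic attention complexity", "fixed context length"],
    "blog_post": ["hallucination", "Fixed context length"],
    "technical_blog": ["inference cost"],
}
report = synthesize_claims(outputs)
```

Each entry in `report` carries its list of sources, which is what lets the final answer cite where a claim came from.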

3. DAG-Based Task Decomposition

What Is a Task DAG?

A Directed Acyclic Graph (DAG) models a task as a set of nodes (subtasks) connected by directed edges (dependencies). “Acyclic” means there are no circular dependencies — every path eventually terminates.

For task planning, the DAG tells you:

Which tasks can run in parallel (no dependency between them)
Which tasks must wait for others (explicit dependency edge)
The overall execution order (topological sort)

Identifying Parallelizable vs Sequential Tasks

Run in parallel if:

Task B does not use the output of Task A
Tasks A and B access different resources with no lock contention
Tasks A and B are logically independent parts of the same whole

Must run sequentially if:

Task B consumes the output of Task A (data dependency)
Task B is a quality check on Task A (functional dependency)
Task A sets up resources that Task B will use (resource dependency)

Topological Sort = Execution Order

Given a task DAG, topological sort gives you a valid execution order where every node comes after all of its dependencies.

from collections import deque
 
def topological_sort(tasks: dict[str, list[str]]) -> list[str]:
    """
    tasks: {task_id: [list of task_ids this task depends on]}
    Returns tasks in execution order (dependencies first).
    """
    # Build in-degree counts
    in_degree = {t: 0 for t in tasks}
    dependents = {t: [] for t in tasks}
 
    for task, deps in tasks.items():
        for dep in deps:
            in_degree[task] += 1
            dependents[dep].append(task)
 
    # Start with tasks that have no dependencies
    queue = deque([t for t, deg in in_degree.items() if deg == 0])
    order = []
 
    while queue:
        task = queue.popleft()
        order.append(task)
        for dependent in dependents[task]:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.append(dependent)
 
    if len(order) != len(tasks):
        raise ValueError("Cycle detected in task DAG")
 
    return order

ASCII DAG: Research Task Example

                  ┌─────────────────────┐
                  │   ORCHESTRATOR      │
                  │   (receives task)   │
                  └─────────┬───────────┘
                            │ decomposes
              ┌─────────────┼─────────────┐
              │             │             │
              ▼             ▼             ▼
        ┌─────────┐   ┌─────────┐   ┌─────────┐
        │ Fetch   │   │ Fetch   │   │ Fetch   │
        │ Source  │   │ Source  │   │ Source  │
        │   A     │   │   B     │   │   C     │
        └────┬────┘   └────┬────┘   └────┬────┘
             │             │             │
             ▼             ▼             ▼
        ┌─────────┐   ┌─────────┐   ┌─────────┐
        │Summarize│   │Summarize│   │Summarize│
        │   A     │   │   B     │   │   C     │
        └────┬────┘   └────┬────┘   └────┬────┘
             │             │             │
             └─────────────┼─────────────┘
                           │ fan-in
                           ▼
                  ┌─────────────────────┐
                  │   SYNTHESIZER       │
                  │  (combines results) │
                  └─────────────────────┘

Fetch A, B, C run in parallel (no dependencies between them)
Each Summarize depends on its corresponding Fetch (sequential within lane)
Summarize A, B, C can run in parallel (independent)
Synthesizer depends on all three summarize steps (sequential after fan-in)

Topological sort gives one valid order: [FetchA, FetchB, FetchC, SumA, SumB, SumC, Synthesize]
But in practice you run [FetchA||FetchB||FetchC] → [SumA||SumB||SumC] → [Synthesize]
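
The grouping into parallel stages can be computed directly from the dependency map. The sketch below is one possible approach, assuming the same `{task_id: [dependencies]}` shape used by the topological sort and that every dependency also appears as a key; `execution_levels` is a hypothetical helper name.

```python
def execution_levels(tasks: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into levels; each level depends only on earlier levels."""
    remaining = {t: set(deps) for t, deps in tasks.items()}
    levels = []
    while remaining:
        # Every task with no unmet dependencies can run in this level.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("Cycle detected in task DAG")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


research_dag = {
    "FetchA": [], "FetchB": [], "FetchC": [],
    "SumA": ["FetchA"], "SumB": ["FetchB"], "SumC": ["FetchC"],
    "Synthesize": ["SumA", "SumB", "SumC"],
}
levels = execution_levels(research_dag)
```

Each inner list is a batch that can be fanned out together, with a fan-in before the next batch starts.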

Failure Handling in a DAG

When a node fails:

Option 1 — Fail fast: Cancel all downstream nodes, return error to orchestrator. Use when partial results are useless.

Option 2 — Continue with partial results: Mark the failed node as failed, skip all nodes that depend on it, continue executing independent nodes, inform the synthesizer that some inputs are missing. Use when partial results are better than nothing.

Option 3 — Retry: Re-queue the failed node (up to N times) before declaring failure. Use for transient failures (rate limits, timeouts).

Option 4 — Substitute: If a node fails, run a fallback (a simpler model, a different tool, a cached result). Use when you have a viable alternative.
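
For Option 2, the orchestrator needs to know which nodes sit downstream of a failure. A minimal sketch, assuming the same `{task_id: [dependencies]}` map as above; `downstream_of` is a hypothetical helper.

```python
from collections import deque


def downstream_of(tasks: dict[str, list[str]], failed: str) -> set[str]:
    """Return every task that depends, directly or transitively, on `failed`."""
    dependents: dict[str, list[str]] = {t: [] for t in tasks}
    for task, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(task)

    skipped: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in skipped:
                skipped.add(child)
                queue.append(child)
    return skipped


research_dag = {
    "FetchA": [], "FetchB": [], "FetchC": [],
    "SumA": ["FetchA"], "SumB": ["FetchB"], "SumC": ["FetchC"],
    "Synthesize": ["SumA", "SumB", "SumC"],
}
skipped = downstream_of(research_dag, "FetchB")
still_runnable = set(research_dag) - skipped - {"FetchB"}
```

Note that a strict traversal also marks the synthesizer as skipped. If the synthesizer can tolerate missing inputs, you would exempt such fan-in nodes from the skip set and pass them a list of what is missing instead.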

4. Parallel Agent Execution (Fan-Out / Fan-In)

Fan-Out

Fan-out means distributing work: send the same prompt structure (with different inputs) to N agents simultaneously.

# Fan-out: 3 agents receive the same task structure, different inputs
tasks = [
    {"topic": "renewable energy storage", "agent_id": "agent_1"},
    {"topic": "grid infrastructure modernization", "agent_id": "agent_2"},
    {"topic": "policy frameworks for energy transition", "agent_id": "agent_3"},
]
# All three are dispatched simultaneously

Fan-In

Fan-in means collecting and synthesizing N results into one.

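# Illustrative: assumes an async run_agent(task) and a synthesize() helper,
# and that this code runs inside a coroutine.
results = await asyncio.gather(
    run_agent(tasks[0]),
    run_agent(tasks[1]),
    run_agent(tasks[2]),
)
# gather returns once all three have finished, with results in input order
synthesized = synthesize(results)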

Python Implementation with asyncio

import asyncio
import anthropic
import time
 
async def run_agent(client, topic: str, agent_id: str) -> dict:
    """Single async agent call."""
    start = time.monotonic()
    response = await asyncio.to_thread(
        client.messages.create,
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"Write a 3-sentence summary of: {topic}"
        }]
    )
    elapsed = time.monotonic() - start
    return {
        "agent_id": agent_id,
        "topic": topic,
        "result": response.content[0].text,
        "elapsed_seconds": elapsed,
    }
 
async def parallel_research(topics: list[str]) -> list[dict]:
    """Fan-out: run all agents in parallel. Fan-in: collect all results."""
    client = anthropic.Anthropic()
    tasks = [
        run_agent(client, topic, f"agent_{i}")
        for i, topic in enumerate(topics)
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]
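
Filtering out exceptions keeps the happy path simple, but it also discards the information about which agent failed. One alternative, sketched here with a stand-in `flaky_agent` (hypothetical, no network calls), is to return successes and failures separately so the orchestrator can decide what to do with each.

```python
import asyncio


async def flaky_agent(agent_id: str, should_fail: bool) -> dict:
    """Stand-in for a real agent call; no network access involved."""
    await asyncio.sleep(0)
    if should_fail:
        raise RuntimeError(f"{agent_id} failed")
    return {"agent_id": agent_id, "result": "ok"}


async def gather_with_failures(specs: list[tuple[str, bool]]):
    """Run all agents and return (successes, failures) instead of dropping errors."""
    outcomes = await asyncio.gather(
        *(flaky_agent(agent_id, fail) for agent_id, fail in specs),
        return_exceptions=True,
    )
    successes, failures = [], []
    # gather preserves input order, so zip pairs each outcome with its spec.
    for (agent_id, _), outcome in zip(specs, outcomes):
        if isinstance(outcome, Exception):
            failures.append({"agent_id": agent_id, "error": str(outcome)})
        else:
            successes.append(outcome)
    return successes, failures


successes, failures = asyncio.run(
    gather_with_failures([("agent_0", False), ("agent_1", True), ("agent_2", False)])
)
```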

Rate Limiting and Cost Considerations

Running N agents in parallel multiplies your API usage by N. Key considerations:

Anthropic rate limits: Requests per minute (RPM) and tokens per minute (TPM) are shared limits, applied at the organization level rather than per agent or per request. If you fan out to 10 agents simultaneously, all 10 requests consume from the same RPM bucket. Add a semaphore to cap concurrency.

# Limit to 5 concurrent agent calls at once
semaphore = asyncio.Semaphore(5)
 
async def rate_limited_agent(client, topic, agent_id):
    async with semaphore:
        return await run_agent(client, topic, agent_id)

Cost estimation before fanning out: Each agent call costs money. For 10 agents with 1,000 input tokens each and 200 output tokens each, using claude-haiku-4-5, you’re paying ~$0.001 per fan-out batch. At 1,000 batches/day, that’s $1/day just for that one workflow. Estimate before building.
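
A small estimator makes this arithmetic repeatable. The sketch below is illustrative: `estimate_batch_cost` is a hypothetical helper, and the per-million-token prices passed in are placeholders, not quoted rates. Look up the current pricing for your model before relying on the output.

```python
def estimate_batch_cost(
    n_agents: int,
    input_tokens_per_agent: int,
    output_tokens_per_agent: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimated dollar cost of one fan-out batch."""
    input_cost = n_agents * input_tokens_per_agent * input_price_per_mtok / 1_000_000
    output_cost = n_agents * output_tokens_per_agent * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# Placeholder prices for illustration only.
per_batch = estimate_batch_cost(
    n_agents=10,
    input_tokens_per_agent=1_000,
    output_tokens_per_agent=200,
    input_price_per_mtok=1.0,
    output_price_per_mtok=5.0,
)
per_day = per_batch * 1_000  # 1,000 batches per day
```

Cost scales linearly with the number of agents and with batch volume, so a workflow that looks cheap per call can still add up.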

Parallel vs sequential timing: With N=3 parallel agents taking 2s each: parallel = 2s total vs sequential = 6s total. The speedup is approximately min(N, available_concurrency), and the parallel wall-clock time is set by the slowest agent.
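
The difference is easy to measure with simulated agents. This sketch uses `asyncio.sleep` as a stand-in for real agent latency (`fake_agent` is hypothetical), with short delays so it runs quickly.

```python
import asyncio
import time


async def fake_agent(delay: float) -> float:
    """Stand-in for an agent call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def compare(delays: list[float]) -> tuple[float, float]:
    """Return (sequential_seconds, parallel_seconds) for the same workload."""
    start = time.monotonic()
    for d in delays:
        await fake_agent(d)
    sequential = time.monotonic() - start

    start = time.monotonic()
    await asyncio.gather(*(fake_agent(d) for d in delays))
    parallel = time.monotonic() - start
    return sequential, parallel


sequential_s, parallel_s = asyncio.run(compare([0.05, 0.05, 0.1]))
print(f"sequential={sequential_s:.2f}s parallel={parallel_s:.2f}s")
```

Sequential time is roughly the sum of the delays, while parallel time is roughly the longest single delay.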

5. Handoff Protocols

Structured Output as the Contract

The interface between agents should be a strongly-typed schema, not a freeform string. When one agent’s output is another agent’s input, both must agree on the format. Using structured output (via tool_use or response_format) enforces this.

Pydantic Models for Inter-Agent Communication

from pydantic import BaseModel, Field
from typing import Optional, Literal
from datetime import datetime, timezone
 
class AgentResult(BaseModel):
    """Canonical structure for any agent's output."""
    task_id: str
    agent_id: str
    status: Literal["success", "partial", "failed"]
    result: Optional[dict] = None
    confidence: Optional[float] = Field(None, ge=0.0, le=1.0)
    errors: list[str] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)
    # Timezone-aware UTC timestamp (datetime.utcnow is deprecated since Python 3.12)
    completed_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
 
class ResearchResult(AgentResult):
    """Specialized result for research subagents."""
    class Config:
        extra = "allow"
 
    # result will contain:
    # {
    #   "summary": str,
    #   "key_claims": list[str],
    #   "sources_used": list[str],
    # }

Versioning Handoff Schemas

As your system evolves, the schema between agents will change. Without versioning, deploying a new orchestrator that emits v2 schemas to subagents still running v1 parsers causes silent failures.

Best practices:

Include a schema_version field in every handoff message.
Subagents should validate the schema version before processing.
Support N-1 compatibility: new subagents should handle both current and previous schema versions.
Use semantic versioning: minor bumps are backwards-compatible additions; major bumps are breaking changes.
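
A version check along these lines could run at the top of every subagent. This is a sketch under stated assumptions: versions are simplified to a `major.minor` string, `SUPPORTED_MAJOR_VERSIONS` and `check_handoff` are hypothetical names, and the supported set reflects the N-1 rule (current major plus the previous one).

```python
SUPPORTED_MAJOR_VERSIONS = {1, 2}  # hypothetical: current major (2) and previous (1)


def parse_version(version: str) -> tuple[int, int]:
    """Parse a simplified 'major.minor' string into a tuple of ints."""
    major, minor = version.split(".")
    return int(major), int(minor)


def check_handoff(message: dict) -> tuple[int, int]:
    """Reject messages whose schema_version is missing or unsupported."""
    raw = message.get("schema_version")
    if raw is None:
        raise ValueError("handoff message has no schema_version")
    major, minor = parse_version(raw)
    if major not in SUPPORTED_MAJOR_VERSIONS:
        raise ValueError(f"unsupported schema major version: {major}")
    # Minor bumps are backwards-compatible additions, so any minor is accepted.
    return major, minor


accepted = check_handoff({"schema_version": "2.3", "task": "summarize"})

try:
    check_handoff({"schema_version": "3.0", "task": "summarize"})
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly on an unknown major version turns a silent parsing failure into an explicit error the orchestrator can act on.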

What to Include in a Handoff

Field	Required	Purpose
`task_id`	Yes	Correlate requests and responses for tracing
`status`	Yes	Let orchestrator know if it got what it needed
`result`	Conditional	The actual output (null if status=failed)
`errors`	On failure	Structured error info for orchestrator retry logic
`confidence`	Recommended	Let orchestrator weight or re-verify low-confidence results
`metadata`	Recommended	Tokens used, model version, latency — for monitoring
`schema_version`	Recommended	Future-proof the interface

6. Failure Handling in Multi-Agent Systems

Multi-agent systems have more failure modes than single-agent systems. Plan for failure explicitly.

Retry at the Agent Level

The simplest and most important failure handler: when an agent fails, try again.

import asyncio
 
async def run_with_retry(agent_fn, *args, max_retries: int = 3, backoff: float = 1.0):
    """Exponential backoff retry for a single agent call."""
    for attempt in range(max_retries):
        try:
            return await agent_fn(*args)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = backoff * (2 ** attempt)
            print(f"[Retry] Attempt {attempt+1} failed: {e}. Retrying in {wait}s.")
            await asyncio.sleep(wait)

Only retry on transient errors (rate limits, timeouts, 5xx). Do not retry on permanent errors (invalid input, schema mismatch) — they will fail every time.
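
One way to encode this distinction is to retry only on a specific exception type. In the sketch below, `TransientError` and `PermanentError` are hypothetical marker classes; in practice you would map your SDK's rate-limit, timeout, and 5xx exceptions onto the transient category.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical marker for rate limits, timeouts, and 5xx responses."""


class PermanentError(Exception):
    """Hypothetical marker for invalid input or schema mismatches."""


async def retry_transient(agent_fn, *args, max_retries: int = 3, backoff: float = 0.01):
    """Retry only on TransientError; any other exception propagates immediately."""
    for attempt in range(max_retries):
        try:
            return await agent_fn(*args)
        except TransientError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(backoff * (2 ** attempt))


calls = {"flaky": 0, "broken": 0}


async def flaky_agent():
    """Fails twice with a transient error, then succeeds."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TransientError("rate limited")
    return "ok"


async def broken_agent():
    """Always fails with a permanent error."""
    calls["broken"] += 1
    raise PermanentError("schema mismatch")


async def demo():
    flaky_result = await retry_transient(flaky_agent)
    try:
        await retry_transient(broken_agent)
        permanent_raised = False
    except PermanentError:
        permanent_raised = True
    return flaky_result, permanent_raised


flaky_result, permanent_raised = asyncio.run(demo())
```

The permanent failure is attempted exactly once, so no time or tokens are spent on retries that cannot succeed.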

Fallback Agents

When a specialist fails, fall back to a generalist.

async def run_with_fallback(primary_fn, fallback_fn, *args):
    """Try primary agent; if it fails, use fallback."""
    try:
        result = await primary_fn(*args)
        if result.status != "failed":
            return result
    except Exception:
        pass
    print("[Fallback] Primary agent failed. Using fallback.")
    return await fallback_fn(*args)
 
# Example: specialist code review agent → fallback to generalist
result = await run_with_fallback(
    specialist_code_reviewer,
    generalist_reviewer,
    code_to_review
)

Circuit Breakers

A circuit breaker stops routing tasks to an agent that is consistently failing, protecting the overall system.

import time
 
class CircuitBreaker:
    """
    Three states:
      CLOSED — normal operation, requests pass through
      OPEN   — agent is failing, reject requests immediately
      HALF   — test mode, allow one request to see if agent recovered
    """
    def __init__(self, failure_threshold: int = 5, timeout: float = 60.0):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.state = "CLOSED"
        self.last_failure_time = 0.0
 
    def call_allowed(self) -> bool:
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN":
            if time.time() - self.last_failure_time > self.timeout:
                self.state = "HALF"
                return True  # Allow the test call
            return False
        return True  # HALF — allow one call
 
    def on_success(self):
        self.failure_count = 0
        self.state = "CLOSED"
 
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = "OPEN"
            print(f"[CircuitBreaker] OPEN — too many failures")

Partial Results Assembly

When some subagents succeed and some fail, the orchestrator must decide how to assemble a useful output from incomplete inputs.

def assemble_partial_results(
    results: list[AgentResult],
    required_tasks: set[str],
    optional_tasks: set[str]
) -> dict:
    """
    Assemble final output from a mix of successful and failed subagents.
    """
    succeeded = {r.task_id: r for r in results if r.status == "success"}
    failed = {r.task_id: r for r in results if r.status == "failed"}
 
    # All required tasks must succeed
    missing_required = required_tasks - set(succeeded.keys())
    if missing_required:
        return {
            "status": "failed",
            "reason": f"Required tasks failed: {missing_required}",
            "partial_data": succeeded
        }
 
    # Optional tasks: include what we have, note what's missing
    missing_optional = optional_tasks - set(succeeded.keys())
    return {
        "status": "partial" if missing_optional else "success",
        "results": {tid: r.result for tid, r in succeeded.items()},
        "missing_optional": list(missing_optional),
        "errors": {tid: r.errors for tid, r in failed.items()},
    }

7. Multi-Agent Frameworks Comparison

LangGraph

Model: Graph-based. Nodes are Python functions (or agents). Edges are transitions, which can be conditional. State flows through the graph as a typed dictionary.

Strengths:

Explicit, auditable control flow — you can see exactly what transitions are possible
First-class support for cycles (loops until convergence)
Built-in persistence and checkpointing (resume interrupted workflows)
Rich ecosystem (LangChain tools, integrations)
Human-in-the-loop is a first-class concept

Weaknesses:

Steeper learning curve than simpler frameworks
The graph abstraction adds boilerplate for straightforward linear workflows
Tight coupling to LangChain conventions

Best for: Complex workflows with conditional branching, loops, and human checkpoints. Long-running processes where persistence matters.

from langgraph.graph import StateGraph, START, END
 
def orchestrator(state): ...
def researcher(state): ...
def synthesizer(state): ...
 
graph = StateGraph(dict)
graph.add_node("orchestrator", orchestrator)
graph.add_node("researcher", researcher)
graph.add_node("synthesizer", synthesizer)
graph.add_edge(START, "orchestrator")   # entry point
graph.add_edge("orchestrator", "researcher")
graph.add_edge("researcher", "synthesizer")
graph.add_edge("synthesizer", END)      # exit point
app = graph.compile()                   # compile before invoking

AutoGen

Model: Conversation-based. Agents are objects that can converse with each other. Multi-agent collaboration is framed as a group chat or pairwise conversation.

Strengths:

Intuitive for conversation-native workflows
Easy to set up basic multi-agent dialogues
Good support for code execution in sandboxes
Microsoft-backed, large community

Weaknesses:

Less control over execution order than LangGraph
The conversation model can be hard to predict for strict pipelines
State management is less explicit

Best for: Conversational multi-agent systems, debate patterns (agent A argues, agent B critiques), code generation with auto-execution and error correction.

CrewAI

Model: Role-based. Define “agents” with roles and goals, group them into “crews”, assign “tasks”. Higher-level abstraction than LangGraph or AutoGen.

Strengths:

Fastest time-to-prototype
Intuitive role/crew metaphor
Good for non-engineers to understand the architecture at a glance

Weaknesses:

Less control over internals
Harder to debug when the crew doesn’t behave as expected
Limited customization vs LangGraph

Best for: Rapid prototyping, non-technical teams, simple sequential crew pipelines.

from crewai import Agent, Task, Crew
 
researcher = Agent(role="Researcher", goal="Find relevant information", backstory="...")
writer = Agent(role="Writer", goal="Write a clear summary", backstory="...")
 
research_task = Task(description="Research X", expected_output="Bullet-point findings", agent=researcher)
write_task = Task(description="Write a report about X", expected_output="A short report", agent=writer)
 
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()

Claude Code’s Agent Tool

Model: Built-in subagent delegation within Claude Code. The Agent tool spins up a subagent in an isolated worktree, runs it to completion, and returns the result.

Strengths:

Zero setup — no framework to install
Worktree isolation prevents file system contamination between agents
Inherits all Claude Code tools (Read, Write, Bash, Grep, etc.)
Natural for code-focused tasks

Weaknesses:

Only works within Claude Code (not a standalone framework)
Limited visibility into subagent’s internal steps from the orchestrator
No built-in parallel fan-out

Best for: Software development tasks where you want to delegate a complete coding subtask to a subagent with full tool access.

Decision Guide

Scenario	Recommended
Complex conditional workflows, needs checkpointing	LangGraph
Conversational agents, code generation with auto-execution	AutoGen
Quick prototype, simple crew pipelines	CrewAI
Code tasks within Claude Code	Claude Code Agent tool
Full control, no framework overhead	Raw SDK (as in this module’s examples)

The raw SDK approach (what this module’s examples use) is always a valid choice for production systems where you need full control, easy debugging, and no framework abstractions in your way.

8. Interview Flashcards

Q1: What is the orchestrator-subagent pattern?

A: The orchestrator-subagent pattern separates task coordination from task execution.

The orchestrator receives the high-level goal, decomposes it into scoped subtasks, delegates each to a subagent, and assembles the final result. It owns all shared state.

The subagent receives one well-defined task, executes it with its own tools and context, and returns a structured result. It has no knowledge of the broader task or other agents.

Key benefits: separation of concerns, clear failure isolation (a subagent failure doesn’t corrupt the orchestrator’s state), and natural support for parallelism (the orchestrator can delegate multiple subtasks simultaneously).

In interviews: draw the pattern as a diagram — orchestrator at the top, arrows going down to N subagents, arrows coming back up with results, synthesis at the bottom.

Q2: When should you decompose a task into multiple agents?

A: Decompose when:

The task generates more intermediate state than fits in one context window
Subtasks are logically independent and can run in parallel (reducing latency)
Different subtasks benefit from different specializations (different system prompts, models, or tool sets)
The overall workflow is too long-running for a single agent loop

Do not decompose when:

A single agent can handle it cleanly (coordination overhead outweighs benefits)
Subtasks are tightly coupled (sharing state via the orchestrator adds complexity without parallelism gains)
The task is time-sensitive and the latency of inter-agent communication is unacceptable

The heuristic: if you cannot write a clean, typed interface between what the orchestrator sends and what the subagent returns, the decomposition is probably wrong.

Q3: How do you handle failure when one agent in a pipeline fails?

A: The right strategy depends on the role of the failing agent:

Retry with backoff: For transient failures (rate limits, timeouts). Use exponential backoff with a cap. Retry up to 3 times before escalating.
Fallback agent: Replace the failing specialist with a generalist. Accept lower quality output rather than total failure.
Partial results: If the failing agent’s output is optional (not on the critical path), continue with the remaining results. Annotate the final output to indicate what’s missing.
Circuit breaker: If an agent fails repeatedly, stop routing to it. This protects the overall system from cascading failures and prevents wasted retries.
Fail fast: If the failing agent produces a required output that the synthesizer cannot proceed without, fail the whole pipeline immediately. This is cleaner than a synthesizer that silently produces wrong output with missing inputs.

Always distinguish between required and optional tasks in your orchestrator’s assembly logic.

Q4: What is fan-out/fan-in in multi-agent context?

A: Fan-out and fan-in are the parallel execution pattern:

Fan-out: The orchestrator sends the same task type (with different inputs) to N agents simultaneously. Example: summarize 5 documents by running 5 summarization agents in parallel.

Fan-in: After all N agents complete, collect all N results and pass them to a synthesis step. asyncio.gather() in Python is the typical fan-in mechanism.

The pattern looks like:

Orchestrator
    │
    ├── Agent A (topic 1) ─┐
    ├── Agent B (topic 2) ─┤ (all run in parallel)
    └── Agent C (topic 3) ─┘
                           │
                     Synthesizer (fan-in)

Key considerations:

Concurrency control (semaphore) to avoid hitting API rate limits
Handle exceptions per-agent so one failure doesn’t cancel all others (use return_exceptions=True with asyncio.gather())
The synthesizer must handle partial results if some agents fail

Q5: How do agents communicate with each other?

A: In practice, agents communicate through the orchestrator, not directly with each other (in a standard orchestrator-subagent architecture).

The communication protocol:

Orchestrator → Subagent: A structured message containing the task description, required context, expected output format, and constraints. This is typically a messages array passed to an LLM call.
Subagent → Orchestrator: A structured result object (validated with Pydantic) containing status, result, confidence, errors, and metadata.

The key principle is typed interfaces: the schema between orchestrator and subagent should be explicit and validated. Never pass freeform strings between agents if you can avoid it — parse failures are hard to debug in a multi-agent pipeline.

In advanced architectures (agent meshes, multi-party conversation with AutoGen), agents can communicate directly. But this increases complexity and makes failure analysis harder. Start with the hub-and-spoke (orchestrator-centric) model.

Q6: What is a DAG and how does it apply to agent task planning?

A: A DAG (Directed Acyclic Graph) is a graph where edges are one-directional and there are no cycles. In task planning, nodes represent subtasks and edges represent “must complete before” dependencies.

Applied to agents: given a complex task, model it as a DAG where:

Each node is a subtask that one agent will execute
Directed edge A → B means “A must complete before B can start”
Nodes with no incoming edges can run immediately (they have no prerequisites)
Nodes at the same level of the topological sort can run in parallel

Practical use:

Draw the dependency graph for your task
Run topological sort to get a valid execution sequence
Within each “generation” of the topological sort (nodes that can all run after the same set of predecessors), execute in parallel via fan-out
Fan-in after each generation before proceeding to the next

Failure handling in a DAG: when a node fails, mark it and all nodes downstream of it as skipped. Nodes in independent branches continue running.

Q7: Compare LangGraph vs AutoGen vs CrewAI for a complex research workflow

A: Given a workflow: “search 5 sources, summarize each, critique each summary, synthesize into a report, have a human review before publishing”:

LangGraph is the best fit here:

The conditional human-review step is a first-class concept in LangGraph (interrupt-and-resume)
The DAG structure maps directly to LangGraph nodes and edges
Checkpointing means you can resume if the process is interrupted mid-run
State is explicitly typed throughout

AutoGen could work but:

The human-review step requires a human proxy agent, which is non-trivial to configure correctly
Managing 5 parallel summarization agents in AutoGen’s conversation model is awkward
Better for the “summarize → critique → revise” loop part of the workflow than for the full DAG

CrewAI is least suited:

Designed for sequential crew pipelines, not complex DAGs
Human-in-the-loop is not a first-class concept
Limited control over parallel execution

Verdict: LangGraph for complex workflows with conditionals, human-in-the-loop, and persistence needs. AutoGen for the debate/critique loop within a workflow. CrewAI for simple sequential pipelines only.

Q8: How do you debug a multi-agent system?

A: Debugging multi-agent systems requires more structure than debugging single-agent systems. Key approaches:

1. Structured logging with correlation IDs. Every agent call should log its task_id, agent_id, inputs (truncated), output (truncated), status, and duration. Use the task_id to trace a request across all agents involved.

2. Trace the DAG execution. Log each node as it starts, completes, or fails. Reconstruct the execution graph post-mortem.

3. Save all intermediate results. Don’t just keep the final output. Store each subagent’s raw output to file or a database. This lets you replay individual steps with modified inputs without re-running the whole pipeline.

4. Isolation testing. Test each agent independently before testing the full pipeline. If Agent B fails, first verify that Agent A’s output is what Agent B expects. Often the bug is in the handoff schema, not in either agent.

5. Prompt-level debugging. When an agent returns a wrong answer, add it to a test suite. Run the exact messages array that agent received and iterate on the prompt or output format.

6. Confidence thresholds. Build in confidence scoring. When confidence is low, log extensively and optionally trigger human review. This surfaces agents that are technically “succeeding” but producing low-quality output.

7. Replay infrastructure. The ability to replay a failed run from any checkpoint is invaluable. LangGraph has this built in. In a custom system, persist the state after each major step.
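
The first two approaches can be sketched with the standard library alone. The helper names (`log_agent_call`, `trace`, `truncate`) and the record fields are illustrative assumptions; a production system would more likely route these records through a logging or tracing backend.

```python
import json
import time


def truncate(value, limit: int = 200) -> str:
    """Shorten long payloads so log lines stay readable."""
    text = str(value)
    return text if len(text) <= limit else text[:limit] + "..."


def log_agent_call(task_id: str, agent_id: str, inputs, output,
                   status: str, duration_s: float) -> str:
    """Emit one JSON log line per agent call and return it."""
    record = {
        "ts": time.time(),
        "task_id": task_id,  # correlation ID shared by every agent on this request
        "agent_id": agent_id,
        "inputs": truncate(inputs),
        "output": truncate(output),
        "status": status,
        "duration_s": round(duration_s, 3),
    }
    line = json.dumps(record)
    print(line)
    return line


def trace(lines: list[str], task_id: str) -> list[dict]:
    """Reconstruct the sequence of agent calls for one task_id."""
    records = [json.loads(line) for line in lines]
    return [r for r in records if r["task_id"] == task_id]


log_lines = [
    log_agent_call("task-42", "researcher", "x" * 500, "summary...", "success", 1.2345),
    log_agent_call("task-42", "synthesizer", {"n": 3}, "report...", "failed", 0.8),
    log_agent_call("task-99", "researcher", "other", "other", "success", 0.5),
]
task_trace = trace(log_lines, "task-42")
```

Filtering by `task_id` gives the ordered list of agent calls for one request, which is usually the first thing you want when a pipeline fails.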

What’s Next

Work through examples/orchestrator_subagent.py to see the orchestrator-subagent pattern implemented end-to-end with the raw Anthropic SDK.
Work through examples/parallel_agents.py to see fan-out/fan-in with asyncio timing comparisons.
Complete the exercises in exercises/README.md, especially the engineering report system design.
See references.md for LangGraph tutorials, the AutoGen paper, and production case studies.

Study Notes by Niladri & AI
