Learning Path¶
An opinionated 5-week curriculum for engineers who want to go from zero to production-ready multi-agent systems. Each week has a concrete milestone. Do the work — don't just read.
Resources Consulted¶
The following learning resources were reviewed to identify what the field covers well and where gaps remain. This curriculum is designed to fill those gaps.
| Resource | What It Covers | Strengths | Gaps |
|---|---|---|---|
| DeepLearning.AI — Multi AI Agent Systems with crewAI (2h41m, taught by crewAI creator João Moura) | Role-based agents, task delegation, hierarchical process, CrewAI internals | Authoritative source; covers CrewAI deeply | Framework-specific only; no framework-agnostic thinking |
| DeepLearning.AI — AI Agents in LangGraph (1h32m, taught by Harrison Chase) | StateGraph, typed state, conditional routing, checkpointing | Concise and practical; taught by the LangGraph author | Minimal coverage of production observability |
| DeepLearning.AI — AI Agentic Design Patterns with AutoGen (1h25m, taught by AutoGen creators) | Conversational patterns, group chat, turn-taking strategies | Covers AutoGen patterns not documented elsewhere | Outdated post-AutoGen 0.4 merge with Semantic Kernel |
| DeepLearning.AI — Building Agentic RAG with LlamaIndex (44m) | RAG pipelines, document agents, query routing | Fast path to RAG agents | Very short; no multi-agent coordination |
| HuggingFace Learn AI Agents Course (free) | smolagents, LangGraph, LlamaIndex, competitive leaderboard | Free; covers three frameworks; community competition keeps it current | Lighter on production patterns |
| Microsoft AI Agents for Beginners (54K GitHub stars, 18 lessons) | Agent fundamentals, MCP protocol, A2A protocol, multi-framework overview | Broadest protocol coverage; MCP/A2A sections are unique | Beginner-oriented; thin on architecture depth |
| UC Berkeley LLM Agents MOOC (Fall 2024 + Fall 2025 + Spring 2025 advanced; guest speakers from OpenAI, DeepMind, Meta) | Theoretical foundations, reasoning, planning, emerging research | Best academic depth; frontier research from practitioner speakers | Not hands-on; requires strong ML background |
| Maven Agentic AI Engineering Bootcamp ($1,200, 6 weeks) | End-to-end production agent pipelines, LangGraph, AutoGen, LlamaIndex, deployment | Most production-complete paid course; includes 3 projects and live Q&A | Cost; cohort-only format |
| Anthropic — "Building Effective Agents" (guide) | Workflows vs. autonomous agents, failure modes, prompt engineering for agents | The canonical practitioner guide; concise and opinionated | No code examples |
| Anthropic — "How We Built Our Multi-Agent Research System" (engineering post) | Real production architecture, orchestration decisions, cost management | Rare production case study from a frontier lab | Single system; may not generalize |
| Lilian Weng — "LLM Powered Autonomous Agents" (blog post) | ReAct loop, tool use, memory taxonomy, planning taxonomy | Canonical conceptual reference; well-cited in the literature | Pre-dates most modern frameworks |
Key gaps across all reviewed resources:
- No framework-agnostic architecture thinking — almost every course teaches one framework in isolation
- No practical evaluation curriculum — how to actually measure whether your agents are working
- Minimal production observability — tracing, cost attribution, failure diagnosis
- No cost optimization content — multi-agent systems can be expensive; nobody teaches you how to manage this
This curriculum addresses all four gaps explicitly.
Week 0: Production Systems — What You're Actually Using¶
Goal: Before studying frameworks, understand what production agentic systems look like from the outside in.
Most curricula throw you into framework tutorials before you've seen what a finished product actually does. This week reverses that. You'll study two shipping production systems — one single-agent, one multi-agent — observe their behavior hands-on, and map what you see to the architectural concepts you'll build from scratch in Weeks 1–4.
Day 1–2: Claude Code — Single-Agent Architecture¶
- Set up Claude Code (or observe demo videos/screenshots if no API key)
- Observe and document: What tools does it use? How does it read and edit files? When does it ask for confirmation?
- Map what you see to the agent loop from Internals
- Reference the Claude Code production systems page
- Exercise: Write down the 5 most common patterns you observe in Claude Code's behavior
Claude Code is a terminal-based CLI that runs a single agent loop against your codebase. It uses a suite of file system tools (read_file, write_file, bash, glob, grep) and asks for explicit confirmation before applying writes or running shell commands. Watch how it manages the context window across a long session — it prunes history aggressively to stay within limits. This is the same loop you'll build in Week 1, implemented at production quality.
Day 3–4: Perplexity Computer — Multi-Agent Architecture¶
- Use Perplexity Computer (free tier available) for a complex research task
- Observe: When does it spawn subagents? How does it handle parallel work? How does it cite sources?
- Map what you see to the handoff patterns from Internals
- Reference the Perplexity Computer production systems page
- Exercise: Identify which subagent types were used and what each contributed
Perplexity Computer is a cloud-based multi-agent system that decomposes complex tasks into parallel subtasks, routes each to a specialized subagent (web search, browser, file operations, code execution), and synthesizes results through an orchestrator. Compare this to Claude Code's single-agent design: the same agent loop, but now there are many of them running in parallel with an orchestrator managing handoffs.
Day 5–6: Compare the Two Architectures¶
- Side-by-side analysis: single-agent (Claude Code) vs. multi-agent (Perplexity Computer)
- When does single-agent win? (Focused coding tasks, file system work)
- When does multi-agent win? (Research, parallel information gathering, complex multi-domain tasks)
- Reference Internals § 6 — The Orchestration Tax for the cost/latency tradeoffs
- Exercise: For 5 real tasks you do regularly, which architecture would you choose and why?
The core tradeoff: single-agent systems are simpler, cheaper, and easier to debug. Multi-agent systems pay an orchestration tax — extra latency and cost for each agent hop — but unlock parallel execution and specialization. Neither architecture is universally superior; the right choice depends on the task.
Day 7: The DIY Question¶
- Review the DIY Replication Path sections of both production system pages
- Survey the open-source landscape: What models and frameworks would you need?
- Exercise: Pick ONE component (e.g., "a coding agent with file tools") and sketch how you'd build it with open-source tools
- This sets the stage for Week 1, where you'll build the raw mechanics yourself
Milestone: A written comparison of two production architectures, mapped to the internals concepts, with a plan for what to build in Weeks 1–4.
Week 1: Foundations — The Raw Mechanics¶
Goal: Understand what's happening under the hood before touching any framework.
If you start with CrewAI or LangGraph on day one, you'll use magic you don't understand. When it breaks in production, you won't know why. Spend this week building the primitives by hand.
Day 1–2: The Agent Loop¶
Read the Internals page. Every framework — CrewAI, LangGraph, AutoGen — is a wrapper around the same 20-line loop:
def agent_loop(system_prompt, tools, max_iterations=10):
messages = [{"role": "system", "content": system_prompt}]
for _ in range(max_iterations):
response = llm.chat(messages, tools=tools)
if response.finish_reason == "stop":
return response.content
# Execute tool call, append result to message history
tool_result = execute_tool(response.tool_calls[0])
messages.append({"role": "tool", "content": tool_result})
raise MaxIterationsError()
Build this yourself. Understand that the entire conversation history is replayed on every LLM call — this is why context window management matters. Read Lilian Weng's "LLM Powered Autonomous Agents" for the foundational taxonomy of agent components: planning, memory, and tool use.
Day 3–4: Tool Calling Deep Dive¶
Build a single agent with web search + file read/write using the raw OpenAI function calling API — no wrappers. This forces you to understand how tool schemas work, how the model decides when to call a tool, and how to handle multi-step tool chains.
Reference DeepLearning.AI's "Functions, Tools and Agents with LangChain" for the function calling mechanics, then implement the same pattern without LangChain to see what the framework is hiding.
Key things to understand: - Tool schemas are JSON Schema — the model reads your docstrings and parameter types - Parallel tool calling: the model can emit multiple tool calls in a single response - Tool errors must be handled gracefully — return structured error messages, not exceptions
Day 5–6: Memory and State¶
Set up ChromaDB and build a basic RAG pipeline from scratch:
Understand the three memory types you'll use in production:
| Memory Type | Storage | Use Case |
|---|---|---|
| Short-term | Context window | Current task context |
| Long-term | Vector DB (ChromaDB, Pinecone) | Facts, past research, user preferences |
| Procedural | Structured store | Reusable task execution patterns (LEGOMem) |
Reference the Internals page section on state serialization patterns. The key insight: when an agent "remembers" something, it's either in the context window (ephemeral) or written to an external store (persistent). There is no magic.
Day 7: When NOT to Use Agents¶
Read Anthropic's "Building Effective Agents" guide. The most important thing you'll learn this week is when a simple workflow beats an autonomous agent. Anthropic's framing: prefer deterministic code over agent autonomy when the task is well-defined. Reserve autonomy for tasks where the solution space is too large to enumerate.
Milestone: A single agent with tools, memory, and RAG — built without any framework.
Week 2: Multi-Agent Pipelines — CrewAI¶
Goal: Build multi-agent systems with role separation.
CrewAI is the right first multi-agent framework because its abstractions map to how you naturally think about teams. Agents have roles, goals, and backstories. Tasks have expected outputs. The framework handles the orchestration.
Day 1–2: CrewAI Fundamentals¶
Reference the DeepLearning.AI "Multi AI Agent Systems with crewAI" course and the CrewAI deep dive. Build a researcher + writer crew:
from crewai import Agent, Task, Crew, Process
researcher = Agent(
role="Senior Research Analyst",
goal="Find accurate, up-to-date information on the given topic",
backstory="Expert at synthesizing technical papers and web sources",
verbose=True,
)
writer = Agent(
role="Technical Writer",
goal="Produce clear, well-structured documentation",
backstory="Skilled at making complex topics accessible to engineers",
)
research_task = Task(
description="Research the top multi-agent frameworks in 2025-2026",
expected_output="Structured summary with pros/cons for each framework",
agent=researcher,
)
writing_task = Task(
description="Write a getting-started guide from the research",
expected_output="Markdown document with code examples, 500-800 words",
agent=writer,
context=[research_task],
)
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential,
verbose=True,
)
result = crew.kickoff()
Understand: roles, goals, backstories, sequential vs. hierarchical process types.
Day 3–4: Critique Loops and Hierarchical Process¶
Add a reviewer agent that evaluates the writer's output against defined criteria. Use hierarchical process mode with a Manager agent that delegates and reviews:
manager = Agent(
role="Editorial Manager",
goal="Ensure all output meets quality and accuracy standards",
backstory="Experienced editor with deep technical knowledge",
)
crew = Crew(
agents=[researcher, writer, reviewer],
tasks=[...],
process=Process.hierarchical,
manager_agent=manager,
)
The critique loop is one of the most valuable patterns in multi-agent systems — it dramatically improves output quality without increasing task complexity.
Day 5–6: Real-World Pipeline¶
Build a 4-agent content pipeline with parallel task execution:
Reference the Full Workflow page for the research workflow pattern. Add async_execution=True to tasks that can run in parallel. Measure wall-clock time improvement vs. sequential execution.
Day 7: Evaluation¶
Read the Evaluation page. Build a basic eval pipeline for your research crew:
- Deterministic checks: Does the output contain required sections? Is it within the expected length range? Does it cite sources?
- LLM-as-judge faithfulness check: Does the final document contain claims that aren't supported by the researcher's output?
def faithfulness_check(research: str, final_output: str) -> float:
prompt = f"""Rate 0-1 whether every claim in the output is supported by the research.
Research: {research}
Output: {final_output}
Score:"""
return float(llm.complete(prompt))
Milestone: A 4-agent pipeline with a basic eval suite.
Week 3: Graph-Based Orchestration — LangGraph¶
Goal: Master the most flexible orchestration framework.
LangGraph is harder to learn than CrewAI but more powerful. It's the right choice for production systems with complex branching logic, human-in-the-loop requirements, or fault recovery needs. See the LangGraph deep dive for full architectural details.
Day 1–2: LangGraph Fundamentals¶
Reference the DeepLearning.AI "AI Agents in LangGraph" course. Core concepts: StateGraph, typed state with TypedDict, conditional edges, graph compilation.
from langgraph.graph import StateGraph, MessagesState
from typing import TypedDict, Annotated
import operator
class ResearchState(TypedDict):
query: str
research: str
draft: str
feedback: str
iteration: int
graph = StateGraph(ResearchState)
graph.add_node("researcher", researcher_node)
graph.add_node("writer", writer_node)
graph.add_node("reviewer", reviewer_node)
graph.add_edge("researcher", "writer")
graph.add_conditional_edges(
"reviewer",
lambda state: "writer" if state["iteration"] < 3 else "end",
{"writer": "writer", "end": "__end__"},
)
graph.set_entry_point("researcher")
app = graph.compile()
Build a simple 3-node graph. Understand that the graph is compiled — this is what enables checkpointing and replay.
Day 3–4: Advanced Patterns¶
Implement the patterns that make LangGraph worth learning:
- Conditional routing: Branch based on output content (e.g., route to different analysts based on topic classification)
- Critique loops: Writer → Reviewer → Writer with an iteration counter to prevent infinite loops
- Parallel scatter-gather: Fan out to multiple researcher nodes, merge results with a reducer
# Parallel execution with reducers
class State(TypedDict):
results: Annotated[list, operator.add] # Reducer: append, don't overwrite
Day 5–6: Production Features¶
Three features that separate LangGraph prototypes from production systems:
Checkpointing — fault recovery without rerunning the entire graph:
from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)
# Resume from any node after failure
result = app.invoke(input, config={"configurable": {"thread_id": "run-123"}})
Human-in-the-loop — pause at decision points for human review:
app = graph.compile(interrupt_before=["publish_node"])
# Graph pauses; human reviews state; then resume
app.invoke(None, config={"configurable": {"thread_id": "run-123"}})
LangSmith tracing — full execution visibility:
Reference workflow.md for the human review gate pattern used in the full research workflow.
Day 7: Framework Comparison¶
Take the 4-agent pipeline you built in Week 2 and implement the same workflow in LangGraph. Document the tradeoffs:
| Dimension | CrewAI | LangGraph |
|---|---|---|
| Setup time | Low | Medium |
| Flexibility | Medium | High |
| Debugging | Role-based logs | Full graph traces |
| Human-in-the-loop | Limited | First-class |
| Checkpointing | No | Yes |
| Learning curve | Low | Medium |
Reference the Internals page section on framework philosophy tradeoffs.
Milestone: A LangGraph pipeline with conditional routing, checkpointing, and human-in-the-loop.
Week 4: Software Engineering Agents & Advanced Topics¶
Goal: Apply multi-agent patterns to software engineering and explore the frontier.
Day 1–2: Coding Agents with OpenHands¶
Set up OpenHands locally — the most capable open-source software engineering agent:
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.20-nikolaik
docker run -it --rm \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.20-nikolaik \
-v /var/run/docker.sock:/var/run/docker.sock \
-p 3000:3000 \
ghcr.io/all-hands-ai/openhands:0.20
Review the SWE-agent ACI pattern — the Agent-Computer Interface is the key architectural innovation that makes software agents effective. Reference workflow.md for the full SWE workflow. Try fixing a bug in a test repository and observe how the agent navigates the codebase.
Also see the OpenHands deep dive for a comparison of its CodeAct architecture against SWE-agent.
Day 3–4: AutoGen for Conversational Multi-Agent Systems¶
Reference the DeepLearning.AI "AI Agentic Design Patterns with AutoGen" course and the AutoGen deep dive. Build a group chat with coder + reviewer + tester agents. Experiment with different turn-taking strategies: round-robin, selector (LLM-based routing), and custom speaker selection.
AutoGen's strength is conversational workflows where agents need to debate or negotiate — code review, architecture decisions, multi-perspective analysis.
Day 5–6: Multi-Model Tiering¶
One of the highest-leverage production techniques: route different subtasks to models matched to their complexity and cost profile.
| Tier | Model | Use Case | Typical Cost |
|---|---|---|---|
| Fast/cheap | GPT-4o-mini, Haiku | Triage, classification, simple extraction | ~$0.15/M tokens |
| Capable | GPT-4o, Sonnet | Reasoning, drafting, code generation | ~$3/M tokens |
| Local | Llama 3.2 via Ollama | Sensitive data, high-volume tasks | $0 |
Implement tiering in LangGraph using conditional routing based on task classification:
def route_by_complexity(state):
if state["task_type"] == "classification":
return "cheap_model_node"
elif state["task_type"] == "reasoning":
return "capable_model_node"
else:
return "local_model_node"
Measure cost and latency differences across tiers. Reference the Internals page orchestration tax section — every agent hop adds latency and cost.
Day 7: Explore the Frontier — Agentic Security Research¶
Multi-agent systems are rapidly expanding into security research, a domain with natural parallels to the plan-act-observe loop that agents excel at. Explore these three frontiers:
Vulnerability Discovery with SWE-agent
SWE-agent's Agent-Computer Interface (deep-dives/swe-agent.md) was designed for bug fixing, but the same architecture applies to offensive security. The repo explicitly lists cybersecurity as a supported use case. Experiment with pointing SWE-agent at a deliberately vulnerable codebase (e.g., OWASP Juice Shop or DVWA) and observe how the ACI navigates code to locate weaknesses.
SWE-smith (NeurIPS 2025) provides a pipeline for synthesizing bugs in real codebases — the inverse of this process is automated vulnerability generation.
Multi-Agent Security Operations
Read "Multi-Agent LLM Orchestration Achieves Deterministic, High-Quality Decision Support for Incident Response" (Drammeh, 2025). In 348 controlled trials, multi-agent orchestration achieved a 100% actionable recommendation rate vs. 1.7% for single-agent approaches — an 80x improvement in action specificity with zero quality variance.
Security Data Pipeline Platforms are evolving into agentic systems where autonomous agents act as data engineers — generating parsing rules for unseen log formats, executing Sigma detection rules within the pipeline layer, and orchestrating extraction-transformation-loading alongside threat hunting.
Hardware and Firmware Analysis Agents — Emerging Frontier
Multi-agent LLM frameworks are beginning to appear in hardware design space exploration. A 2025 paper demonstrates specialized LLM agents for autonomous driving system DSE.
The same multi-agent patterns apply to firmware analysis: one agent for binary disassembly and function identification, another for control flow analysis, a third for vulnerability pattern matching, and an orchestrator to synthesize findings.
Cross-domain network orchestration using multi-agent workflows has been demonstrated across IP, optical, and robotic domains (arXiv:2410.10831).
Recommended reading:
- Live-SWE-agent — self-evolving agents that modify their own scaffold at runtime
- LEGOMem — procedural memory for multi-agent systems
- SagaLLM — transactional guarantees for multi-agent workflows
Milestone: Hands-on experience with 4+ frameworks, understanding of production patterns, and a clear direction for further specialization.
Summary Timeline¶
| Week | Focus | Framework | Key Deliverable |
|---|---|---|---|
| 0 | Production systems — observe before you build | None (observation) | Written comparison of Claude Code vs. Perplexity Computer, mapped to internals concepts |
| 1 | Foundations — raw mechanics | None (raw API) | Single agent with tools, memory, RAG |
| 2 | Multi-agent pipelines | CrewAI | 4-agent pipeline with eval suite |
| 3 | Graph-based orchestration | LangGraph | Pipeline with routing, checkpointing, HITL |
| 4 | SWE agents + advanced topics | OpenHands, AutoGen | Hands-on with 4+ frameworks |
Recommended Resources¶
Courses¶
| Course | Provider | Length | Cost | Link |
|---|---|---|---|---|
| Multi AI Agent Systems with crewAI | DeepLearning.AI | 2h41m | Free | learn.deeplearning.ai |
| AI Agents in LangGraph | DeepLearning.AI | 1h32m | Free | learn.deeplearning.ai |
| AI Agentic Design Patterns with AutoGen | DeepLearning.AI | 1h25m | Free | learn.deeplearning.ai |
| Building Agentic RAG with LlamaIndex | DeepLearning.AI | 44m | Free | learn.deeplearning.ai |
| AI Agents Course | HuggingFace Learn | Self-paced | Free | huggingface.co/learn/agents-course |
| AI Agents for Beginners | Microsoft (GitHub) | 18 lessons | Free | github.com/microsoft/ai-agents-for-beginners |
| LLM Agents MOOC (Fall 2024) | UC Berkeley | ~12 lectures | Free | llmagents-learning.org |
| LLM Agents MOOC (Fall 2025) | UC Berkeley | ~12 lectures | Free | llmagents-learning.org |
| Agentic AI Engineering Bootcamp | Maven | 6 weeks | $1,200 | maven.com/stemplicity/become-an-agentic-ai-engineer |
Guides¶
- Lilian Weng — "LLM Powered Autonomous Agents" — canonical conceptual reference; read this first
- Anthropic — "Building Effective Agents" — the practitioner's guide to when and how to use agents
- Anthropic — "How We Built Our Multi-Agent Research System" — rare production case study from a frontier lab
- OpenAI — "A Practical Guide to Building Agents" — covers the Agents SDK and orchestration patterns
Community¶
- r/AI_Agents — active community, good for finding new frameworks and production war stories
- r/LLMDevs — broader LLM engineering discussions
- LangChain Discord — LangGraph support and announcements
- CrewAI Discord — CrewAI community and framework updates
Papers¶
See meta.md for the full list of 20+ academic papers referenced in this research, with DOIs and links to code repositories.