Changelog & Sources
Changelog
v1.3.0 — March 23, 2026
- New section: Production Systems — reverse-engineers two closed-source agentic systems using a five-layer methodology (Observable Behavior → Inferred Architecture → Published Information → OSS Analog Mapping → DIY Replication Path).
- New page: Claude Code (`production-systems/claude-code.md`) — deep dive into Anthropic's agentic coding assistant. Covers the single-agent + tool loop architecture, 17 exposed tools, extended thinking, context compaction, CLAUDE.md convention, SWE-bench benchmarks, and DIY replication with open-source models (Qwen2.5-Coder, DeepSeek, Devstral).
- New page: Perplexity Computer (`production-systems/perplexity-computer.md`) — deep dive into Perplexity's multi-agent research and task assistant. Covers the orchestrator + subagent architecture, Firecracker VM isolation, integrated search pipeline (200B+ URLs, Vespa AI), skill system, connector ecosystem, and DIY replication with LangGraph + open-source search.
- New page: Production Systems Overview (`production-systems/index.md`) — methodology explanation and system comparison table.
- Updated Learning Path (`learning-path.md`) — added Week 0 ("Production Systems — What You're Actually Using") before existing weeks. Curriculum is now 5 weeks. Existing Weeks 1–4 unchanged.
- Updated navigation: added Production Systems section between Internals and Getting Started.
- Updated Executive Overview in `index.md` with reference to Production Systems.
- Sources: 50+ new web sources, engineering blogs, system prompt analyses, and benchmark data referenced.
v1.0.0 — March 23, 2026
- Initial research compilation covering 10 frameworks and tools across research/orchestration and software development agents.
- Deep dives for: LangGraph, AutoGen, CrewAI, OpenAI Agents SDK, Agency Swarm, SWE-agent, OpenHands, Aider, Patchwork, Devin.
- Comparison tables, decision matrix, and maturity assessment.
- 4-week learning path for engineers.
- Sources: 15+ academic papers, 10+ GitHub repositories, industry reports, and framework documentation.
v1.1.0 — March 23, 2026
- Replaced OpenAI Swarm deep dive with OpenAI Agents SDK deep dive (Swarm's production successor).
- Added `site_url` to `mkdocs.yml` for canonical URLs.
- Created root `README.md` with project description and live-docs link.
- Expanded Day 7 of the learning path with agentic security research use cases: vulnerability discovery (SWE-agent), multi-agent SOC operations (arXiv:2511.15755), and hardware/firmware analysis agents.
- Updated comparison tables and decision matrix to reflect Agents SDK.
v1.2.0 — March 23, 2026
- New page: Internals (`internals.md`) — raw agent loop, tool calling wire format, state serialization patterns (full replay vs. typed state vs. role-scoped memory), handoff mechanics, framework philosophy tradeoffs, and the orchestration tax with empirical data.
- New page: Evaluation (`evals.md`) — why evals are hard, what to evaluate (5 dimensions), evaluation approaches (LLM-as-judge, deterministic checks, trajectory eval, human eval), benchmark landscape (SWE-bench, GAIA, HumanEval, AgentBench, WebArena, ToolBench, BFCL, TAU-bench), eval tools comparison, and a minimal 3-layer eval setup with code.
- New page: Full Workflow (`workflow.md`) — end-to-end annotated walkthroughs for research and software development workflows with concrete data payloads, failure tables, and Mermaid comparison diagram.
- Rewrote Learning Path (`learning-path.md`) — surveyed 11 external resources (DeepLearning.AI, HuggingFace, Microsoft GitHub, UC Berkeley, Maven, Anthropic, Lilian Weng), restructured Week 1 to start with raw internals before frameworks, added eval day (Week 2 Day 7), added framework comparison day (Week 3 Day 7), preserved v1.1.0 Day 7 security frontier content.
- Updated navigation: added Internals, Full Workflow, and Evaluation to `mkdocs.yml` nav.
- Updated Executive Overview in `index.md` with references to new Internals and Evaluation pages.
- Sources: 40+ new web sources, 5+ new academic papers referenced.
Academic Sources
| # | Paper | Authors | Venue / Year | DOI / Link | Code Available |
|---|---|---|---|---|---|
| 1 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | Yang et al. | NeurIPS 2024 | arXiv:2405.15793 | Yes — GitHub |
| 2 | Agentless: Demystifying LLM-based Software Engineering Agents | Xia et al. | ACM 2025 | DOI:10.1145/3715754 | Yes — GitHub |
| 3 | Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? | Xia et al. | arXiv 2025 | arXiv:2511.13646 | Yes — GitHub |
| 4 | HALO: Hierarchical Autonomous Logic-Oriented Orchestration | Hou et al. | arXiv 2025 | arXiv:2505.13516 | Yes — GitHub |
| 5 | SagaLLM: Context Management, Validation, and Transaction Guarantees | Chang & Geng | VLDB 2025 | DOI:10.14778/3750601.3750611 | No |
| 6 | LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems | Han et al. | arXiv 2025 | arXiv:2510.04851 | No |
| 7 | OSC: Cognitive Orchestration through Dynamic Knowledge Alignment | Zhang et al. | arXiv 2025 | arXiv:2509.04876 | No |
| 8 | MASAI: Modular Architecture for Software-engineering AI Agents | Arora et al. | arXiv 2024 | arXiv:2406.11638 | No |
| 9 | ChatDev: Communicative Agents for Software Development | Qian et al. | ACL 2024 | arXiv:2307.07924 | Yes — GitHub |
| 10 | HyperAgent: Generalist Software Engineering Agents | Phan et al. | arXiv 2024 | arXiv:2409.16299 | No |
| 11 | The OpenHands Software Agent SDK | OpenHands Team | arXiv 2025 | arXiv:2511.03690 | Yes — GitHub |
| 12 | SWE-smith: Scaling Data for Software Engineering Agents | Yang et al. | NeurIPS 2025 | NeurIPS Poster | Yes — swesmith.com |
| 13 | SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution | Li et al. | arXiv 2025 | arXiv:2507.23348 | No |
| 14 | SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | Du et al. | arXiv 2025 | arXiv:2505.16975 | Yes — GitHub |
| 15 | Large Language Model-Based Agents for Software Engineering: A Survey | Liu et al. | arXiv 2024 | arXiv:2409.02977 | No |
| 16 | Multi-Agent Real-time Chat Orchestration (MARCO) | Shrimal et al. | arXiv 2024 | arXiv:2410.21784 | No |
| 17 | Multi-Agent LLM Orchestration Achieves Deterministic Decision Support for Incident Response | Drammeh | arXiv 2025 | arXiv:2511.15755 | Yes |
| 18 | OmniNova: A General Multimodal Agent Framework | Du | arXiv 2025 | arXiv:2503.20028 | No |
| 19 | A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving | Shih et al. | arXiv 2025 | arXiv:2512.08476 | No |
| 20 | VoCare AI: A Multi-Agent LLM Workflow for Improved Clinic Operational Efficiency | Han et al. | IEEE TENCON 2025 | IEEE:11375163 | Yes — GitHub |
GitHub Repositories Referenced
| Repository | Stars | Language | License | Last Active |
|---|---|---|---|---|
| langchain-ai/langgraph | 27,241 | Python | MIT | 2026-03-23 |
| microsoft/autogen | 56,069 | Python | CC-BY-4.0 | 2026-03-21 |
| crewAIInc/crewAI | 46,965 | Python | MIT | 2026-03-23 |
| openai/swarm | 21,211 | Python | MIT | 2025-03-11 |
| VRSEN/agency-swarm | 4,108 | Python | MIT | 2026-03-23 |
| SWE-agent/SWE-agent | 18,821 | Python | MIT | 2026-03-23 |
| All-Hands-AI/OpenHands | 69,606 | Python | Custom | 2026-03-23 |
| Aider-AI/aider | 42,286 | Python | Apache-2.0 | 2026-03-17 |
| openai/openai-agents-python | 20,220 | Python | MIT | 2026-03-23 |
| patched-codes/patchwork | 1,548 | Python | AGPL-3.0 | 2025-04-18 |
| OpenAutoCoder/Agentless | 2,022 | Python | MIT | 2024-12-22 |
| OpenAutoCoder/live-swe-agent | 339 | Python | MIT | 2026-01-19 |
Web Sources
- AutoGen — Microsoft Research
- LangGraph: Agent Orchestration Framework
- CrewAI: The Leading Multi-Agent Platform
- Devin's 2025 Performance Review — Cognition Labs
- Goldman Sachs Deploys Devin — IBM Think
- OpenHands: The Open Platform for Cloud Coding Agents
- Aider: AI Pair Programming in Your Terminal
- Patched.codes: Production-ready Agentic Workflows
- CrewAI vs LangGraph vs AutoGen — DataCamp
- Best Multi-Agent Frameworks in 2026 — GuruSup
- Best AI Agent Frameworks 2025 — Maxim AI
- CodeAct Agent Framework — Emergent Mind
- OpenAI Agents SDK Documentation
- OpenAI Agents SDK — Mem0 Review
- OpenAI Agents SDK — 0DeepResearch Analysis
- OpenAI Agents SDK — Fast.io Comprehensive Guide
- OpenAI Swarm Announcement — Campus Technology
- How to Learn Agentic AI in 2025 — The Hustling Engineer
- The Ultimate AI Agents Roadmap 2025 — The AI Corner
- OpenAI Function Calling Documentation
- Build an AI Agent Loop in 50 Lines of Python — DEV Community
- The Agent Execution Loop — Victor Dibia
- ReAct Agent with OpenAI Function Calling — Peter Roelants
- Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Systems — arXiv:2603.04474
- Why Your Multi-Agent System is Failing — Towards Data Science
- Why Bad Agentic AI Latency Costs You — Parloa
- OpenAI Agents SDK Handoffs Documentation
- OpenAI Agents SDK Sessions Documentation
- CrewAI Memory Documentation
- LangGraph Persistence Documentation
- Comparing Open-Source AI Agent Frameworks — Langfuse
- Why We No Longer Evaluate SWE-bench Verified — OpenAI
- AI Agent Evaluation Benchmarks vs Production — Chanl AI
- Benchmark Reliability for Enterprise Agents — simmering.dev
- GAIA Leaderboard — HAL Princeton
- LLM-as-Judge Best Practices — Monte Carlo Data
- Deterministic Assertion Library — Promptfoo
- Trajectory Evaluation — LangChain
- Hallucination Detection with LLM Judge — Datadog
- Inspect AI — UK AI Security Institute
- Inside the Architecture of a Deep Research Agent — Egnyte
- VMAO: Verified Multi-Agent Orchestration — arXiv:2603.11445
- SWE-agent ACI Documentation
- AI Agent Design Patterns — Azure Architecture Center
- Human-in-the-Loop Agents with interrupt — LangChain Blog
- LLM Powered Autonomous Agents — Lilian Weng
- Building Effective Agents — Anthropic
- How We Built Our Multi-Agent Research System — Anthropic
- Multi AI Agent Systems with crewAI — DeepLearning.AI
- AI Agents in LangGraph — DeepLearning.AI
- AI Agentic Design Patterns with AutoGen — DeepLearning.AI
- Building Agentic RAG with LlamaIndex — DeepLearning.AI
- HuggingFace AI Agents Course
- Microsoft AI Agents for Beginners — GitHub
- UC Berkeley LLM Agents MOOC (Fall 2024)
- UC Berkeley Agentic AI (Fall 2025)
- Agentic AI Engineering Bootcamp — Maven
All star counts and last-active dates were collected on March 23, 2026 via the GitHub API.
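For readers who want to refresh the repository table, the collection step noted above can be sketched roughly as follows. This is a minimal sketch, not the script used for this page: it assumes the public GitHub REST endpoint `GET /repos/{owner}/{repo}`, the helper names `summarize_repo` and `fetch_repo` are hypothetical, and mapping the "Last Active" column to the `pushed_at` field is an assumption.

```python
import json
import urllib.request
from typing import Optional

# Public GitHub REST endpoint for a single repository.
GITHUB_API = "https://api.github.com/repos/{full_name}"


def summarize_repo(payload: dict) -> dict:
    """Reduce a repository payload to the columns used in the table (hypothetical helper)."""
    license_info = payload.get("license") or {}  # the API returns null when no license is detected
    pushed_at = payload.get("pushed_at") or ""
    return {
        "repository": payload["full_name"],
        "stars": payload["stargazers_count"],
        "language": payload.get("language"),
        "license": license_info.get("spdx_id"),
        # Assumption: "Last Active" is the date part of the ISO 8601 `pushed_at` timestamp.
        "last_active": pushed_at[:10],
    }


def fetch_repo(full_name: str, token: Optional[str] = None) -> dict:
    """Fetch one repository, e.g. "owner/name", and summarize it (hypothetical helper)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "changelog-sources-refresh",  # placeholder client name
    }
    if token:
        # Optional token; unauthenticated requests are rate-limited more strictly.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(GITHUB_API.format(full_name=full_name), headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_repo(json.load(response))
```

Calling `fetch_repo("langchain-ai/langgraph")` should return one row-shaped dict per repository. Counts will differ from the table above, since they reflect the time of the call.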