PRD-102 — Coordinator Architecture
Version: 1.0
Type: Research + Design
Status: Complete — Ready for Peer Review
Priority: P0
Dependencies: PRD-100 (Research Master), PRD-101 (Mission Schema)
Blocks: PRD-103 (Verification), PRD-104 (Ephemeral Agents), PRD-107 (Context Interface)
Author: Gerard Kavanagh + Claude
Date: 2026-03-15
1. Problem Statement
1.1 The Gap
Automatos has no coordination layer. The closest existing component is heartbeat_service.py:_orchestrator_tick_llm() (line ~382), which runs a 5-iteration tool loop with an 8,000-token budget and dispatcher_only tools — it does health checks and reporting, not goal decomposition or agent dispatch.
The platform can execute single-agent tasks beautifully. What it cannot do is take a complex goal — "Research EU AI Act compliance for our product" — and decompose it into subtasks, assign agents, execute with dependency ordering, verify outputs, handle failures, and track everything on the board.
1.2 What Exists vs What's Missing
| Exists | Missing |
|---|---|
| `_orchestrator_tick_llm()` — LLM tool loop for workspace health checks | Goal decomposition: breaking complex goals into 3-20 subtasks with dependency edges |
| `AgentFactory.execute_with_prompt()` — per-agent execution with 10-iteration tool loop | Parallel dispatch: running independent subtasks concurrently via asyncio.gather |
| `AgentCommunicationProtocol` — Redis pub/sub messaging (built, not wired to heartbeat) | Cross-task data flow: passing Task 1's output as input to Task 2 |
| `BoardTask` with assigned_agent_id — manual task assignment | Automatic agent selection: matching task requirements to agent capabilities |
| `SharedContextManager` — in-process shared state with Redis backing (2h TTL) | Mission state machine: tracking plan → execute → verify → review lifecycle |
| `TaskReconciler` — stall detection for recipe_executions only | Mission-scoped stall detection, dependency-aware retry, escalation on failure |
| `ContextMode.HEARTBEAT_ORCHESTRATOR` — 8k tokens, 5 sections, dispatcher tools | `ContextMode.COORDINATOR` — full tools, mission context section, no token cap |
1.3 What This PRD Delivers
The architecture for a CoordinatorService that:
Takes a natural language goal + autonomy settings
Decomposes it into a dependency graph of 3-20 tasks (using PRD-101's orchestration_tasks schema)
Assigns each task to a roster agent or contractor agent
Dispatches tasks respecting dependency ordering
Monitors execution, handles failures (continuation vs retry)
Triggers verification (PRD-103) and human review gates
Detects mission completion and offers "save as routine"
1.4 What This PRD Does NOT Cover
| Out of scope | Covered by |
|---|---|
| How verification/scoring works | PRD-103 (Verification & Quality) |
| Ephemeral "contractor" agent lifecycle | PRD-104 (Ephemeral Agents & Model Selection) |
| Budget enforcement and approval gates | PRD-105 (Budget & Governance) |
| Outcome telemetry queries and learning | PRD-106 (Outcome Telemetry) |
| Context interface abstraction for Phase 3 | PRD-107 (Context Interface Abstraction) |
| Neural field prototype | PRD-108 (Memory Field Prototype) |
| SQL DDL and Alembic migrations | PRD-101 (already delivered) and PRD-82A (implementation) |
1.5 Design Philosophy
Four principles guided every decision:
Stateless coordinator, DB-authoritative. The coordinator holds no in-process state. Every tick reads from orchestration_runs/orchestration_tasks and writes back. Any coordinator instance can take over after a crash. This is the Airflow scheduling pattern validated at massive scale.

Two-phase tick (Symphony pattern). Every coordinator cycle runs dispatch (find ready tasks, assign agents) then reconcile (check running tasks for stalls, completions, failures). Clean separation, predictable behavior.
HTN-inspired hybrid planning. Template library for known mission types + LLM for novel goals + structural validation for all plans. Never pure LLM (non-deterministic), never pure rules (brittle).
BDI intention commitment. Once committed to a plan, the coordinator does not replan on every tick. Replanning triggers are explicit: task failure after max retries, user sends new instructions, budget warning. This prevents thrashing.
2. Prior Art: Coordination Patterns
2.1 Overview
Seven systems and architectural patterns were studied to inform the coordinator design. Each addresses a different facet of the coordination problem: how to plan, how to track state, how to handle failure, how to involve humans.
2.2 Comparison Table
| System | Coordination model | State management | Planning approach | Failure handling | Human review |
|---|---|---|---|---|---|
| Blackboard | Shared state + event-driven knowledge source activation | Blackboard data structure (shared, hierarchical) | Opportunistic — no predetermined path | Knowledge sources produce competing hypotheses; control resolves conflicts | Not built-in |
| HTN | Hierarchical decomposition of compound tasks into primitives | World state updated at each primitive step | Method library for known decompositions; backtracking for alternatives | Backtrack and try alternative method | Not built-in |
| BDI | Belief-Desire-Intention deliberation cycle | Belief base (agent's model of world) | Plan library indexed by triggering events; LLM can generate plans dynamically | Plan failure propagation with alternative plan selection; bold/cautious reconsideration | Not built-in (agent is autonomous) |
| Symphony | Reconciliation loop (dispatch + reconcile) with policy-as-code | External tracker (Linear) + workspace filesystem | No planning — work comes from external tracker | Continuation (1s) vs retry (exponential backoff); workspace preserved | PR review is the human gate; no mid-execution review |
| CrewAI | Sequential or hierarchical (LLM-as-manager) process | In-memory crew state; Flows add SQLite persistence | LLM-as-manager in hierarchical mode; AgentPlanner pre-generates steps | Guardrail retry loop (max 3); soft failure — proceeds with bad output | human_input=True per task; @human_feedback in Flows |
| AutoGen | Turn-based group chat with LLM speaker selection | In-memory message list (ephemeral) | No planning — conversation-driven emergence | No built-in failure handling | human_input_mode on UserProxyAgent |
| LangGraph | Typed state graph with deterministic conditional edges | Typed schema + pluggable checkpointers (Postgres, SQLite) | Graph defined at compile time; conditional routing for branching | Checkpoint enables resume from last successful step | interrupt() pauses execution; resume with human input |
2.3 System-by-System Analysis
Blackboard Architecture (Nii 1986; LbMAS, arxiv:2507.01701, 2025)
The blackboard pattern coordinates multiple "knowledge sources" (KS) through a shared workspace. Each KS has activation preconditions — it fires when data it can process appears on the blackboard. A control component resolves conflicts when multiple KS are eligible.
LbMAS (2025) modernized this for LLM multi-agent systems and demonstrated a 5% improvement over static agent configurations. The key insight: event-driven activation (agent fires when its dependencies appear on the shared state) outperforms polling.
What we adopt: The mission state object (orchestration_runs + orchestration_tasks from PRD-101) acts as a blackboard. Agents write results to it; the coordinator reads it to decide next actions. Task activation is dependency-driven — a task becomes queued when all its parent dependencies reach terminal success state.
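The dependency-driven activation rule above can be sketched in a few lines. This is an illustrative in-memory model, not the actual Automatos query against orchestration_tasks; the function name and state strings are assumptions.

```python
def find_ready_tasks(tasks: dict[str, str], deps: dict[str, set[str]]) -> list[str]:
    """Return pending tasks whose parent dependencies have all succeeded.

    tasks maps task_id -> state; deps maps task_id -> set of parent task ids.
    A task activates (becomes queued) only when every parent is terminal-success.
    """
    return [
        task_id
        for task_id, state in tasks.items()
        if state == "pending"
        and all(tasks.get(parent) == "succeeded" for parent in deps.get(task_id, set()))
    ]

tasks = {"t1": "succeeded", "t2": "pending", "t3": "pending"}
deps = {"t2": {"t1"}, "t3": {"t2"}}
# t2 is unblocked (t1 succeeded); t3 still waits on t2.
assert find_ready_tasks(tasks, deps) == ["t2"]
```

In the real system the same predicate would be a SQL query over orchestration_tasks joined with the dependency table, evaluated on each tick.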
What we reject: The BB1 control blackboard (a second blackboard to manage the first — overkill for 3-20 tasks). Distributed blackboard partitioning (premature for our scale).
HTN Planning (Nau et al. JAIR 2003; ChatHTN, arxiv:2505.11814, 2025; Hsiao et al., arxiv:2511.07568, 2025)
Hierarchical Task Network planning decomposes compound tasks into primitive actions using a library of decomposition methods. SHOP2 (Nau et al.) proved this formally correct for forward-search decomposition.
ChatHTN (2025) proved that a hybrid approach — symbolic HTN structure with LLM filling in the gaps — is provably sound. The LLM generates decomposition candidates; the HTN validator ensures structural correctness (no cycles, valid dependencies, feasible agent assignments).
Hsiao et al. (2025) showed that hand-coded HTN structures enable 20-70B parameter models to outperform 120B baselines. Structure improves LLM planning quality. This means our decomposition templates aren't just efficiency shortcuts — they make planning better.
What we adopt: Template library for known mission types (the "methods" in HTN terminology). LLM generates decomposition for novel goals. All plans validated structurally before execution — DAG check, agent availability, budget estimate. This is the ChatHTN hybrid.
What we reject: Full formal HTN domain models (too rigid for natural language goals). Requiring hand-authored methods for every decomposition (LLM handles novel cases).
BDI Agents (Rao & Georgeff, ICMAS 1995; ChatBDI, AAMAS 2025)
Belief-Desire-Intention architecture models rational agent behavior. The critical insight for coordinators: intention commitment. Once an agent commits to an intention (plan), it should not reconsider on every deliberation cycle. Kinny & Georgeff proved that bold agents (reconsider rarely) outperform cautious agents (reconsider constantly) in stable environments.
ChatBDI (2025) adapted BDI for LLM agents, showing that the intention stack prevents the "thrashing" problem where agents constantly replan instead of executing.
What we adopt: The bold/cautious spectrum maps directly to the autonomy toggle. approve mode = cautious (human gates at plan approval and result review). autonomous mode = bolder (replan only on failure). In both cases, the coordinator commits to a plan and does not replan on every tick — only on explicit triggers (Section 5.5).
What we reject: The full BDI deliberation cycle (belief revision, desire filtering, plan selection). Our coordinator is simpler — it has one goal (the mission), one plan (the decomposition), and reconsiders only when reality diverges from the plan.
Symphony (OpenAI)
Symphony's defining contribution is the two-phase reconciliation tick:
Dispatch phase: Find tasks whose dependencies are met, claim them, assign agents
Reconcile phase: Check running tasks for stalls, completions, external state changes
This separation is cleaner than a single monolithic loop because dispatch decisions don't interleave with reconciliation decisions. Each phase has a clear contract: dispatch reads pending tasks, reconcile reads running tasks.
Symphony's continuation vs retry distinction (Section 3.5 of PRD-101) is adopted wholesale. A clean agent exit → continuation (1s delay, same workspace). A failure → retry (exponential backoff). This prevents backoff on normal multi-turn agent work while protecting against failure loops.
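The continuation-vs-retry distinction reduces to a small delay function. The constants below are illustrative assumptions, not values taken from the Automatos or Symphony code:

```python
# Assumed constants for illustration only.
CONTINUATION_DELAY_S = 1.0   # clean exit: resume promptly, same workspace
BASE_RETRY_DELAY_S = 5.0     # failure: exponential backoff base
MAX_RETRY_DELAY_S = 300.0    # backoff ceiling

def next_delay(clean_exit: bool, attempt: int) -> float:
    """Delay before relaunching a task after its agent exits.

    A clean exit is normal multi-turn work and gets a fixed short delay;
    a failure gets exponential backoff capped at a maximum.
    """
    if clean_exit:
        return CONTINUATION_DELAY_S
    return min(BASE_RETRY_DELAY_S * (2 ** attempt), MAX_RETRY_DELAY_S)
```

This is why normal agent turn-taking never accumulates backoff while genuine failure loops slow down quickly.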
What we adopt: Two-phase tick. Continuation vs retry. WORKFLOW.md-style state-specific coordinator instructions (the coordinator prompt changes based on mission state). Stall detection via elapsed time since last event.
What we reject: Linear-as-coordinator (we have our own board). In-memory-only state (we need persistent mission history). Single-agent-per-task constraint (we support contractor fan-out within PRD-104).
CrewAI
CrewAI's context=[task_a, task_b] dependency declaration maps directly to PRD-101's orchestration_task_dependencies join table. The explicit, declarative, queryable dependency model is what we need.
The guardrail validation pattern — a function that checks output before accepting it — is a simplified version of what PRD-103 (Verification) delivers.
What we adopt: Explicit dependency declarations. The async_execution + join pattern for parallel tasks.
What we reject: LLM-as-manager for agent selection (non-deterministic, untestable). Soft guardrail failure mode (bad output proceeds — unacceptable for missions).
AutoGen
AutoGen's Swarm handoff pattern defines priority ordering for task transitions: tool-returned agent → OnCondition → AFTER_WORK fallback. The context_variables dict as shared mutable state across agents maps to our mission-scoped context.
What we adopt: Priority ordering for coordinator task transitions (dependency-resolved tasks first, then stalled task recovery, then budget checks). Shared mutable context per mission (via SharedContextManager in Phase 2, neural field in Phase 3).
What we reject: LLM-based speaker selection per turn (expensive, non-deterministic). Magic-string termination conditions. Ephemeral state.
LangGraph
LangGraph's typed state schema with checkpoint-per-step is the closest to our DB-authoritative model. The interrupt() mechanism for human review maps to our awaiting_approval and awaiting_human states.
What we adopt: Typed state schema (our orchestration_runs/orchestration_tasks tables). Checkpoint per state transition (our dual-write to event log). interrupt() for human review (our awaiting_human task state). Send API for dynamic parallelism (our coordinator dispatching multiple tasks concurrently).
What we reject: Full boilerplate burden of graph compilation. Static graph definition at compile time (our plans are generated per-mission). LangSmith vendor lock-in.
2.4 Architectural Decisions Summary
| Decision | Choice | Source | Rationale |
|---|---|---|---|
| Tick structure | Two-phase: dispatch + reconcile | Symphony | Clean separation; each phase has a clear contract |
| Planning | HTN-inspired hybrid: templates + LLM + validation | ChatHTN, Hsiao et al. | Templates improve quality; LLM handles novel goals; validation catches structural errors |
| State authority | DB-authoritative, stateless coordinator | Airflow, LangGraph | Crash-safe; any instance can take over |
| Replanning policy | BDI intention commitment — replan on explicit triggers only | Rao & Georgeff, ChatBDI | Prevents thrashing; matches autonomy toggle |
| Mission state | Blackboard pattern — shared state with event-driven activation | Nii, LbMAS | Tasks activate when dependencies met; coordinator reads blackboard each tick |
| Dependencies | Explicit join table, queryable both directions | CrewAI, Airflow | Declarative, queryable, validates DAG structure |
| Failure handling | Continuation vs retry + infrastructure/quality failure classification | Symphony, Prefect | Different strategies for different failure types |
| Human review | Interrupt-based: plan approval + result review | LangGraph, Symphony | Two human gates; configurable per autonomy level |
| Agent selection | Deterministic scoring, not LLM-based | (Anti-pattern from CrewAI) | Reproducible, testable, debuggable |
3. CoordinatorService Architecture
3.1 Module Hierarchy
Rationale: coordinator_service.py lives in services/ alongside heartbeat_service.py and task_reconciler.py — it's a service that registers its tick on the shared scheduler. Supporting classes live in modules/coordination/ because they encapsulate domain logic (planning, dispatching, reconciling) that doesn't belong in the service entry point.
3.2 Class Diagram
3.3 Public Interface
4. Coordinator Tick Algorithm
4.1 Overview
The coordinator tick runs on a configurable interval (default: 5 seconds, matching Symphony's default). Each tick processes ALL active missions in the workspace, not just one.
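The overall tick shape can be sketched as follows. This is a structural skeleton only: the `store` object stands in for the DB-backed queries against orchestration_runs/orchestration_tasks, and all method names on it are assumptions.

```python
class CoordinatorTick:
    """Two-phase tick over all active missions (Symphony pattern, sketched)."""

    def __init__(self, store):
        self.store = store  # any object exposing the query/launch methods below

    def run(self) -> None:
        for mission in self.store.active_missions():
            self.dispatch(mission)   # Phase A: claim ready tasks, assign agents
            self.reconcile(mission)  # Phase B: stalls, completions, failures

    def dispatch(self, mission) -> None:
        # Dispatch reads only pending tasks whose dependencies are met.
        for task in self.store.ready_tasks(mission):
            agent = self.store.assign_agent(task)
            self.store.launch(task, agent)

    def reconcile(self, mission) -> None:
        # Reconcile reads only running tasks.
        for task in self.store.running_tasks(mission):
            self.store.check_progress(task)
```

The phases never share a read set, which is the "clear contract" property adopted from Symphony.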
4.2 Phase A: Dispatch
4.3 Phase B: Reconcile
4.4 Dependency Resolution
When a task completes, the coordinator must check whether downstream tasks are now unblocked. This is event-driven, not polling-based (blackboard pattern).
4.5 Stall Detection
4.6 Concurrency Safety
Multiple coordinator ticks could overlap if a tick takes longer than the interval. Two tasks could complete simultaneously, both triggering dependency resolution for the same downstream task.
Solution: Optimistic locking with version column.
PRD-101 defines version on orchestration_tasks. Every state transition includes WHERE version = :expected_version. If the version changed (another process already transitioned the task), the UPDATE affects 0 rows and the transition is skipped.
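The mechanics can be demonstrated end-to-end with SQLite standing in for the real database (the actual orchestration_tasks table per PRD-101 lives elsewhere; this is purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orchestration_tasks (id TEXT PRIMARY KEY, state TEXT, version INTEGER)"
)
conn.execute("INSERT INTO orchestration_tasks VALUES ('t1', 'queued', 1)")

def transition(conn, task_id: str, new_state: str, expected_version: int) -> bool:
    """Attempt a state transition; False means another process won the race."""
    cur = conn.execute(
        "UPDATE orchestration_tasks SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_state, task_id, expected_version),
    )
    return cur.rowcount == 1

# First writer succeeds and bumps the version; a stale second writer
# matches zero rows and its transition is skipped.
assert transition(conn, "t1", "running", expected_version=1) is True
assert transition(conn, "t1", "running", expected_version=1) is False
```

Because the version check and the increment happen in one UPDATE, no explicit row lock is needed; at most one concurrent transition can succeed.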
5. Plan Decomposition
5.1 Decomposition Pipeline
5.2 MissionPlanner Interface
5.3 Decomposition Templates
Templates are Python dataclasses registered in a template library. They provide structural scaffolding that the LLM customizes with mission-specific details.
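A minimal sketch of what such a template might look like. Field names and the example template are hypothetical; the authoritative TaskSpec schema is defined in PRD-101.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskTemplate:
    name: str
    task_type: str                    # e.g. "research", "draft", "review"
    depends_on: tuple[str, ...] = ()  # names of upstream tasks in the template

@dataclass(frozen=True)
class DecompositionTemplate:
    mission_type: str
    tasks: tuple[TaskTemplate, ...]

# Hypothetical registered template: the structural scaffold the LLM then
# customizes with mission-specific instructions and success criteria.
RESEARCH_REPORT = DecompositionTemplate(
    mission_type="research_report",
    tasks=(
        TaskTemplate("gather_sources", "research"),
        TaskTemplate("synthesize", "draft", depends_on=("gather_sources",)),
        TaskTemplate("review", "review", depends_on=("synthesize",)),
    ),
)
```

Frozen dataclasses keep templates immutable, so a mission plan is always a copy, never a mutation of the library entry.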
5.4 LLM Decomposition Prompt
When no template matches, the coordinator calls an LLM to generate the decomposition. The prompt is structured to produce valid JSON matching the TaskSpec schema.
Rules
Tasks MUST form a valid DAG (no circular dependencies)
Task 1 should have no dependencies (the starting point)
Every task needs at least one success criterion with must_pass=true
Use task_type to guide model selection (research=mid-tier, review=different-family)
Keep task count proportional to goal complexity (simple goal = 3-4 tasks)
Independent tasks CAN run in parallel (no dependency edge between them)
Estimated costs must sum to less than the budget constraint
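The "valid DAG" rule is enforced structurally before any plan is accepted, using the stdlib TopologicalSorter (the same check the risk register relies on). A sketch, with the other validation steps (agent availability, budget) omitted:

```python
from graphlib import CycleError, TopologicalSorter

def validate_dag(deps: dict[str, set[str]]) -> bool:
    """deps maps task_id -> set of parent task ids. True iff the graph is acyclic."""
    try:
        # static_order() raises CycleError if any cycle exists.
        tuple(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False

# A linear chain passes; a two-task cycle is rejected.
assert validate_dag({"a": set(), "b": {"a"}, "c": {"b"}}) is True
assert validate_dag({"a": {"b"}, "b": {"a"}}) is False
```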
6. Agent Assignment
6.1 Assignment Strategy
For each task in the plan, the coordinator assigns an agent using a deterministic scoring algorithm — not LLM-based selection (CrewAI's approach, which is non-deterministic and untestable).
| Strategy | When used |
|---|---|
| Roster match | Task requirements match a roster agent's skills/tools. Preferred — agent has memory, personality, history. |
| Contractor spawn | No roster agent scores above threshold, or task needs a specialist model not available on roster. Ephemeral — mission-scoped lifecycle (PRD-104). |
| User override | In approve mode, user can reassign agents before execution starts. |
6.2 Scoring Algorithm
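A minimal sketch of what a deterministic score could look like. The weights, fields, and availability rule here are assumptions for illustration, not the actual AgentMatcher implementation:

```python
def score_agent(task_skills: set[str], task_tools: set[str],
                agent_skills: set[str], agent_tools: set[str],
                available: bool) -> float:
    """Higher is better. Pure function of its inputs: identical inputs
    always yield identical scores, so assignment is reproducible and testable."""
    if not available:
        return 0.0
    # Assumed weighting: skill overlap matters more than tool overlap.
    skill_overlap = len(task_skills & agent_skills) / max(len(task_skills), 1)
    tool_overlap = len(task_tools & agent_tools) / max(len(task_tools), 1)
    return 0.7 * skill_overlap + 0.3 * tool_overlap
```

The key property is determinism: the same roster and task always produce the same ranking, which LLM-based selection cannot guarantee.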
6.3 Dispatch Mechanism
The coordinator dispatches tasks directly via AgentFactory.execute_with_prompt(). It does NOT create a BoardTask and wait for the agent's heartbeat tick to pick it up. Direct dispatch gives the coordinator control over timing, retry, and result collection.
A BoardTask is created for visibility (kanban tracking) but is NOT the dispatch mechanism.
7. ContextMode.COORDINATOR
7.1 New Context Mode Definition
The coordinator needs its own context mode to get mission-aware context when making planning and monitoring decisions.
7.2 New Sections
MissionContextSection
AgentRosterSection
7.3 Files That Must Be Modified
orchestrator/modules/context/modes.py
Add COORDINATOR and VERIFIER to ContextMode enum and MODE_CONFIGS
orchestrator/modules/context/service.py
Register MissionContextSection and AgentRosterSection section renderers
orchestrator/modules/context/sections/
New files: mission_context.py, agent_roster.py
8. Failure Handling
8.1 Decision Tree
8.2 Retry-with-Feedback Protocol
When verification fails but retries remain, the verifier's reasoning is fed back to the executing agent. This is a continuation with guidance, not a blind retry.
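The feedback injection might be as simple as prepending the verifier's reasoning to the continuation prompt. The wording and function shape below are illustrative assumptions:

```python
def build_retry_prompt(original_instructions: str, verifier_feedback: str,
                       attempt: int, max_attempts: int) -> str:
    """Assemble a continuation prompt that carries the verifier's reasoning
    forward, so the retry is guided rather than blind."""
    return (
        f"Retry {attempt}/{max_attempts}. Your previous output was rejected "
        f"by verification for the following reason:\n\n{verifier_feedback}\n\n"
        f"Address the feedback, then redo the task:\n\n{original_instructions}"
    )
```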
8.3 Escalation Strategy
When a task fails after max retries:
| Step | Action | When |
|---|---|---|
| 1. Different agent | Reassign to next-best-scoring agent | Default |
| 2. Different model | Keep same agent, switch to higher-tier model | If agent-specific issue unlikely |
| 3. Coordinator replanning | Remove failed task, find alternative path | If task is on critical path |
| 4. Human escalation | Flag for human review with full context | All automated options exhausted |
| 5. Mission failure | Mark run as failed, cancel remaining tasks | Human rejects or no alternatives |
9. Replanning Specification
9.1 Triggers
| Trigger | Replan action | Constraint |
|---|---|---|
| Task fails after max retries + all escalations | Find alternative path or substitute task | Completed tasks immutable |
| User sends new instructions mid-mission | Incorporate new requirements | Completed tasks immutable |
| Budget warning (>80% spent) | Cut optional tasks, use cheaper models | Running tasks continue |
| Verification rejects task + coordinator determines task design is wrong | Redesign the task, not just retry | Only pending/queued tasks modified |
| Agent discovers new information requiring additional work | Add tasks dynamically | New tasks get new task_order values |
9.2 Replanning Constraints
Completed tasks are immutable. Their outputs are already consumed by downstream tasks. Removing them would invalidate the dependency graph.
Running tasks continue. Only cancel running tasks if explicitly directed by human or if budget is exhausted.
Plan version increments. Every replan bumps orchestration_runs.plan_version for audit trail.
New tasks get the next available task_order. No renumbering of existing tasks.
Dependency graph must remain a valid DAG. Validated after every replan.
9.3 Replanning LLM Prompt
10. API Endpoints
10.1 Mission CRUD
10.2 Mission Lifecycle
10.3 Task Operations
10.4 Request/Response Examples
Create Mission
Plan Ready (webhook or poll)
11. Sequence Diagrams
11.1 Happy Path: 3-Task Sequential Mission
11.2 Mission with Task Failure and Retry
11.3 Mission with Human Review Rejection
12. Integration Points
12.1 Existing Components Used
| Component | Role in coordination | Changes needed |
|---|---|---|
| `AgentFactory.execute_with_prompt()` | Dispatches each task to its assigned agent | None — accepts AgentRuntime already |
| `ContextService.build_context()` | Coordinator builds its own context with ContextMode.COORDINATOR | Add new mode + 2 new sections |
| `get_tools_for_agent()` (tool_router.py:~140) | Resolves tools for task agents | None for roster agents; PRD-104 adds explicit_tools param for contractors |
| `UnifiedToolExecutor.execute_tool()` | Coordinator's own tool loop for mission management | None |
| `BoardTask` model (core/models/board.py) | Creates board tasks with source_type='orchestration' for kanban visibility | None — existing model supports this |
| `TaskReconciler` (services/task_reconciler.py) | Extended to cover orchestration_tasks alongside recipe_executions | Add mission task query to `_tick()` |
| `SharedContextManager` (inter_agent.py) | Stores mission-scoped shared context for cross-task data flow | None — used via SharedContextPort (PRD-107) |
| `UnifiedScheduler` | Registers coordinator tick alongside heartbeat tick | None — additive registration |
| `workflow_recipes` table | "Save as routine" converts mission structure to recipe | Conversion function (new) |
12.2 New Components Introduced
| Component | Purpose | Location |
|---|---|---|
| `CoordinatorService` | Main service: tick loop, plan generation, dispatch, reconciliation | orchestrator/services/coordinator_service.py |
| `MissionPlanner` | LLM-powered decomposition: goal → task graph | orchestrator/modules/coordination/planner.py |
| `MissionDispatcher` | Resolves ready tasks, assigns agents, launches execution | orchestrator/modules/coordination/dispatcher.py |
| `MissionReconciler` | Stall detection, completion handling, failure escalation | orchestrator/modules/coordination/reconciler.py |
| `AgentMatcher` | Deterministic agent-to-task scoring | orchestrator/modules/coordination/agent_matcher.py |
| `MissionContextSection` | New context section: mission state for coordinator | orchestrator/modules/context/sections/mission_context.py |
| `AgentRosterSection` | New context section: available agents for coordinator | orchestrator/modules/context/sections/agent_roster.py |
| `platform_create_mission` | Platform tool: create mission from chat | platform_actions.py + platform_executor.py |
| `platform_approve_plan` | Platform tool: approve plan from chat | platform_actions.py + platform_executor.py |
| `platform_mission_status` | Platform tool: check mission progress from chat | platform_actions.py + platform_executor.py |
| API router | REST endpoints for mission CRUD + lifecycle | orchestrator/api/missions.py |
12.3 Board Task Bridge
12.4 Save as Routine Conversion
13. Acceptance Criteria
Must Have
Should Have
Nice to Have
14. Risk Register
| # | Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|---|
| 1 | Coordinator complexity — too many responsibilities in one service | High | Medium | Split into focused classes: Planner, Dispatcher, Reconciler. Coordinator is the orchestrator, not the doer. |
| 2 | LLM planning reliability — decomposition quality varies by model and prompt | High | High | Template library for common patterns (ChatHTN hybrid). Validate all plans structurally. Benchmark decomposition quality across models. |
| 3 | Cost of coordination calls — coordinator LLM calls add overhead per mission | Medium | Medium | Use cheap models for coordination (Haiku-class). Template matching avoids LLM call entirely for known patterns. |
| 4 | Tick frequency tradeoff — too fast = wasted cycles, too slow = delayed dispatch | Medium | Medium | Start with 5s (Symphony default). Make configurable. Event-driven trigger for task completion → immediate dependent dispatch. |
| 5 | Parallel dispatch race conditions — two tasks complete simultaneously, both trigger same dependent | Medium | Medium | Optimistic locking with version column on orchestration_tasks. Only one transition succeeds. |
| 6 | Replanning destroys progress — bad replan discards valid completed work | High | Low | Immutable completed tasks. Replanning only modifies pending/scheduled tasks. plan_version increments for audit. |
| 7 | Agent unavailability — assigned agent offline or overloaded | Medium | Medium | Check availability before dispatch. Fallback: reassign or spawn contractor. Stall detection catches unresponsive agents. |
| 8 | Circular dependencies in task graph — LLM generates impossible plan | Low | Low | Validate DAG structure via TopologicalSorter before accepting any plan. Reject plans with cycles. |
| 9 | Coordinator single point of failure | Medium | Low | Stateless design (DB-driven) means any instance can take over. No in-process state to lose. |
| 10 | Over-engineering v1 | High | High | PRD-100 Risk #3: "Start sequential-only. Get lifecycle right first." Implementation phases: sequential (82A/B) → parallel + replanning (82C). |
15. Dependencies
| Dependency | Relationship | Notes |
|---|---|---|
| PRD-101 (Mission Schema) | Blocked by 101 | Coordinator reads/writes orchestration_runs, orchestration_tasks, orchestration_events. Schema must exist. |
| PRD-103 (Verification) | Blocks 103 | Coordinator triggers verification phase. Verification PRD needs coordinator's handoff interface (defined in Section 8). |
| PRD-104 (Ephemeral Agents) | Blocks 104 | Coordinator spawns contractor agents. Contractor PRD needs coordinator's spawn interface (Section 6). |
| PRD-105 (Budget) | Uses 105 | Coordinator calls budget admission gate before dispatch. Can start with simple checks, enhance later. |
| PRD-106 (Telemetry) | Feeds 106 | Coordinator emits orchestration_events that telemetry queries. Event schema supports aggregation. |
| PRD-107 (Context Interface) | Blocks 107 | Context interface must abstract how coordinator gets/sets context. Coordinator is the primary consumer. |
| HeartbeatService | Integration | Coordinator registers its tick alongside heartbeat. Must not conflict with heartbeat scheduling. |
| AgentFactory | Integration | Coordinator dispatches via execute_with_prompt(). No changes needed to AgentFactory. |
| TaskReconciler | Extension | Must extend to cover mission tasks. New MissionReconciler or extension of existing class. |
| ContextService | Extension | Must add COORDINATOR mode and 2 new sections. Non-breaking — adds new mode, doesn't modify existing. |
Appendix A: Coordinator Model Selection
The coordinator LLM call (for planning and replanning) should use a cheap-but-capable model. Planning requires good reasoning but produces relatively short structured output.
| Coordinator activity | Model | Rationale |
|---|---|---|
| Template matching | No LLM needed | Embedding similarity check |
| Novel decomposition | Mid-tier (Sonnet 4.6, GPT-4o) | Good reasoning for task decomposition |
| Plan validation | No LLM needed | Structural checks only |
| Replanning | Mid-tier | Same reasoning as decomposition |
| Stall detection | No LLM needed | Time-based threshold check |
| Dependency resolution | No LLM needed | DAG traversal |
Estimated coordinator overhead per mission: 1-2 LLM calls for planning (template miss), 0 for execution (all structural). At ~$0.05-0.10 per planning call, coordinator overhead is <5% of mission cost.
Appendix B: Research Sources
| Source | Key takeaway |
|---|---|
| Nii 1986, "Blackboard Systems" (AI Magazine) | Shared state coordination, event-driven knowledge source activation |
| LbMAS 2025 (arxiv:2507.01701) | Modern blackboard for LLMs, 5% improvement over static multi-agent |
| Nau et al. JAIR 2003 (SHOP2) | HTN formal correctness, forward-search decomposition |
| ChatHTN 2025 (arxiv:2505.11814) | Hybrid HTN + LLM, provably sound decomposition |
| Hsiao et al. 2025 (arxiv:2511.07568) | HTN structure enables smaller models to outperform larger baselines |
| Rao & Georgeff, ICMAS 1995 | BDI intention commitment, bold vs cautious agent spectrum |
| ChatBDI, AAMAS 2025 | BDI for LLM agents, intention stack prevents thrashing |
| OpenAI Symphony (SPEC.md) | Two-phase tick, continuation vs retry, WORKFLOW.md policy-as-code |
| CrewAI (crewAIInc/crewAI) | context=[] dependency declarations, guardrail validation, async_execution |
| AutoGen (microsoft/autogen) | Swarm handoff priority, context_variables shared state |
| LangGraph (langchain-ai/langgraph) | Typed state + checkpoint, interrupt() for human review, Send API |
| Automatos codebase | heartbeat_service.py, agent_factory.py, task_reconciler.py, context/service.py, inter_agent.py, tool_router.py |