PRD-102 — Coordinator Architecture

Version: 1.0
Type: Research + Design
Status: Complete — Ready for Peer Review
Priority: P0
Dependencies: PRD-100 (Research Master), PRD-101 (Mission Schema)
Blocks: PRD-103 (Verification), PRD-104 (Ephemeral Agents), PRD-107 (Context Interface)
Author: Gerard Kavanagh + Claude
Date: 2026-03-15


1. Problem Statement

1.1 The Gap

Automatos has no coordination layer. The closest existing component is heartbeat_service.py:_orchestrator_tick_llm() (line ~382), which runs a 5-iteration tool loop with an 8,000-token budget and dispatcher_only tools — it does health checks and reporting, not goal decomposition or agent dispatch.

The platform can execute single-agent tasks beautifully. What it cannot do is take a complex goal — "Research EU AI Act compliance for our product" — and decompose it into subtasks, assign agents, execute with dependency ordering, verify outputs, handle failures, and track everything on the board.

1.2 What Exists vs What's Missing

| What Exists | What's Missing |
| --- | --- |
| _orchestrator_tick_llm() — LLM tool loop for workspace health checks | Goal decomposition: breaking complex goals into 3-20 subtasks with dependency edges |
| AgentFactory.execute_with_prompt() — per-agent execution with 10-iteration tool loop | Parallel dispatch: running independent subtasks concurrently via asyncio.gather |
| AgentCommunicationProtocol — Redis pub/sub messaging (built, not wired to heartbeat) | Cross-task data flow: passing Task 1's output as input to Task 2 |
| BoardTask with assigned_agent_id — manual task assignment | Automatic agent selection: matching task requirements to agent capabilities |
| SharedContextManager — in-process shared state with Redis backing (2h TTL) | Mission state machine: tracking plan → execute → verify → review lifecycle |
| TaskReconciler — stall detection for recipe_executions only | Mission-scoped stall detection, dependency-aware retry, escalation on failure |
| ContextMode.HEARTBEAT_ORCHESTRATOR — 8k tokens, 5 sections, dispatcher tools | ContextMode.COORDINATOR — full tools, mission context section, no token cap |

1.3 What This PRD Delivers

The architecture for a CoordinatorService that:

  1. Takes a natural language goal + autonomy settings

  2. Decomposes it into a dependency graph of 3-20 tasks (using PRD-101's orchestration_tasks schema)

  3. Assigns each task to a roster agent or contractor agent

  4. Dispatches tasks respecting dependency ordering

  5. Monitors execution, handles failures (continuation vs retry)

  6. Triggers verification (PRD-103) and human review gates

  7. Detects mission completion and offers "save as routine"

1.4 What This PRD Does NOT Cover

| Out of Scope | Covered By |
| --- | --- |
| How verification/scoring works | PRD-103 (Verification & Quality) |
| Ephemeral "contractor" agent lifecycle | PRD-104 (Ephemeral Agents & Model Selection) |
| Budget enforcement and approval gates | PRD-105 (Budget & Governance) |
| Outcome telemetry queries and learning | PRD-106 (Outcome Telemetry) |
| Context interface abstraction for Phase 3 | PRD-107 (Context Interface Abstraction) |
| Neural field prototype | PRD-108 (Memory Field Prototype) |
| SQL DDL and Alembic migrations | PRD-101 (already delivered) and PRD-82A (implementation) |

1.5 Design Philosophy

Four principles guided every decision:

  1. Stateless coordinator, DB-authoritative. The coordinator holds no in-process state. Every tick reads from orchestration_runs / orchestration_tasks and writes back. Any coordinator instance can take over after a crash. This is the Airflow scheduling pattern validated at massive scale.

  2. Two-phase tick (Symphony pattern). Every coordinator cycle runs dispatch (find ready tasks, assign agents) then reconcile (check running tasks for stalls, completions, failures). Clean separation, predictable behavior.

  3. HTN-inspired hybrid planning. Template library for known mission types + LLM for novel goals + structural validation for all plans. Never pure LLM (non-deterministic), never pure rules (brittle).

  4. BDI intention commitment. Once committed to a plan, the coordinator does not replan on every tick. Replanning triggers are explicit: task failure after max retries, user sends new instructions, budget warning. This prevents thrashing.
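The intention-commitment principle reduces to a small gate in code. A minimal sketch, assuming a hypothetical `should_replan` helper and event names that are illustrative, not the PRD's final identifiers:

```python
# Explicit replanning triggers (BDI intention commitment): the coordinator
# never replans on an ordinary tick, only on these named events.
# Event names here are illustrative assumptions.
REPLAN_TRIGGERS = {
    "task_failed_max_retries",
    "new_user_instructions",
    "budget_warning",
}

def should_replan(event: str) -> bool:
    """Replan only on an explicit trigger; everything else keeps the current plan."""
    return event in REPLAN_TRIGGERS
```

The point of the gate is that an ordinary tick (`should_replan("tick") == False`) never touches the plan, which is what prevents thrashing.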


2. Prior Art: Coordination Patterns

2.1 Overview

Seven systems and architectural patterns were studied to inform the coordinator design. Each addresses a different facet of the coordination problem: how to plan, how to track state, how to handle failure, how to involve humans.

2.2 Comparison Table

| Aspect | Blackboard (Nii 1986, LbMAS 2025) | HTN Planning (ChatHTN 2025, Hsiao 2025) | BDI Agents (Rao & Georgeff 1995, ChatBDI 2025) | Symphony (OpenAI) | CrewAI | AutoGen | LangGraph |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Coordination model | Shared state + event-driven knowledge source activation | Hierarchical decomposition of compound tasks into primitives | Belief-Desire-Intention deliberation cycle | Reconciliation loop (dispatch + reconcile) with policy-as-code | Sequential or hierarchical (LLM-as-manager) process | Turn-based group chat with LLM speaker selection | Typed state graph with deterministic conditional edges |
| State management | Blackboard data structure (shared, hierarchical) | World state updated at each primitive step | Belief base (agent's model of world) | External tracker (Linear) + workspace filesystem | In-memory crew state; Flows add SQLite persistence | In-memory message list (ephemeral) | Typed schema + pluggable checkpointers (Postgres, SQLite) |
| Planning approach | Opportunistic — no predetermined path | Method library for known decompositions; backtracking for alternatives | Plan library indexed by triggering events; LLM can generate plans dynamically | No planning — work comes from external tracker | LLM-as-manager in hierarchical mode; AgentPlanner pre-generates steps | No planning — conversation-driven emergence | Graph defined at compile time; conditional routing for branching |
| Failure handling | Knowledge sources produce competing hypotheses; control resolves conflicts | Backtrack and try alternative method | Plan failure propagation with alternative plan selection; bold/cautious reconsideration | Continuation (1s) vs retry (exponential backoff); workspace preserved | Guardrail retry loop (max 3); soft failure — proceeds with bad output | No built-in failure handling | Checkpoint enables resume from last successful step |
| Human review | Not built-in | Not built-in | Not built-in (agent is autonomous) | PR review is the human gate; no mid-execution review | human_input=True per task; @human_feedback in Flows | human_input_mode on UserProxyAgent | interrupt() pauses execution; resume with human input |

2.3 System-by-System Analysis

Blackboard Architecture (Nii 1986; LbMAS, arxiv:2507.01701, 2025)

The blackboard pattern coordinates multiple "knowledge sources" (KS) through a shared workspace. Each KS has activation preconditions — it fires when data it can process appears on the blackboard. A control component resolves conflicts when multiple KS are eligible.

LbMAS (2025) modernized this for LLM multi-agent systems and demonstrated a 5% improvement over static agent configurations. The key insight: event-driven activation (agent fires when its dependencies appear on the shared state) outperforms polling.

What we adopt: The mission state object (orchestration_runs + orchestration_tasks from PRD-101) acts as a blackboard. Agents write results to it; the coordinator reads it to decide next actions. Task activation is dependency-driven — a task becomes queued when all its parent dependencies reach terminal success state.

What we reject: The BB1 control blackboard (a second blackboard to manage the first — overkill for 3-20 tasks). Distributed blackboard partitioning (premature for our scale).

HTN Planning (Nau et al. JAIR 2003; ChatHTN, arxiv:2505.11814, 2025; Hsiao et al., arxiv:2511.07568, 2025)

Hierarchical Task Network planning decomposes compound tasks into primitive actions using a library of decomposition methods. SHOP2 (Nau et al.) proved this formally correct for forward-search decomposition.

ChatHTN (2025) proved that a hybrid approach — symbolic HTN structure with LLM filling in the gaps — is provably sound. The LLM generates decomposition candidates; the HTN validator ensures structural correctness (no cycles, valid dependencies, feasible agent assignments).

Hsiao et al. (2025) showed that hand-coded HTN structures enable 20-70B parameter models to outperform 120B baselines. Structure improves LLM planning quality. This means our decomposition templates aren't just efficiency shortcuts — they make planning better.

What we adopt: Template library for known mission types (the "methods" in HTN terminology). LLM generates decomposition for novel goals. All plans validated structurally before execution — DAG check, agent availability, budget estimate. This is the ChatHTN hybrid.

What we reject: Full formal HTN domain models (too rigid for natural language goals). Requiring hand-authored methods for every decomposition (LLM handles novel cases).

BDI Agents (Rao & Georgeff, ICMAS 1995; ChatBDI, AAMAS 2025)

Belief-Desire-Intention architecture models rational agent behavior. The critical insight for coordinators: intention commitment. Once an agent commits to an intention (plan), it should not reconsider on every deliberation cycle. Kinny & Georgeff proved that bold agents (reconsider rarely) outperform cautious agents (reconsider constantly) in stable environments.

ChatBDI (2025) adapted BDI for LLM agents, showing that the intention stack prevents the "thrashing" problem where agents constantly replan instead of executing.

What we adopt: The bold/cautious spectrum maps directly to the autonomy toggle. approve mode = cautious (human gates at plan approval and result review). autonomous mode = bolder (replan only on failure). In both cases, the coordinator commits to a plan and does not replan on every tick — only on explicit triggers (Section 5.5).

What we reject: The full BDI deliberation cycle (belief revision, desire filtering, plan selection). Our coordinator is simpler — it has one goal (the mission), one plan (the decomposition), and reconsiders only when reality diverges from the plan.

Symphony (OpenAI)

Symphony's defining contribution is the two-phase reconciliation tick:

  1. Dispatch phase: Find tasks whose dependencies are met, claim them, assign agents

  2. Reconcile phase: Check running tasks for stalls, completions, external state changes

This separation is cleaner than a single monolithic loop because dispatch decisions don't interleave with reconciliation decisions. Each phase has a clear contract: dispatch reads pending tasks, reconcile reads running tasks.

Symphony's continuation vs retry distinction (Section 3.5 of PRD-101) is adopted wholesale. A clean agent exit → continuation (1s delay, same workspace). A failure → retry (exponential backoff). This prevents backoff on normal multi-turn agent work while protecting against failure loops.

What we adopt: Two-phase tick. Continuation vs retry. WORKFLOW.md-style state-specific coordinator instructions (the coordinator prompt changes based on mission state). Stall detection via elapsed time since last event.

What we reject: Linear-as-coordinator (we have our own board). In-memory-only state (we need persistent mission history). Single-agent-per-task constraint (we support contractor fan-out within PRD-104).

CrewAI

CrewAI's context=[task_a, task_b] dependency declaration maps directly to PRD-101's orchestration_task_dependencies join table. The explicit, declarative, queryable dependency model is what we need.

The guardrail validation pattern — a function that checks output before accepting it — is a simplified version of what PRD-103 (Verification) delivers.

What we adopt: Explicit dependency declarations. The async_execution + join pattern for parallel tasks.

What we reject: LLM-as-manager for agent selection (non-deterministic, untestable). Soft guardrail failure mode (bad output proceeds — unacceptable for missions).

AutoGen

AutoGen's Swarm handoff pattern defines priority ordering for task transitions: tool-returned agent → OnCondition → AFTER_WORK fallback. The context_variables dict as shared mutable state across agents maps to our mission-scoped context.

What we adopt: Priority ordering for coordinator task transitions (dependency-resolved tasks first, then stalled task recovery, then budget checks). Shared mutable context per mission (via SharedContextManager in Phase 2, neural field in Phase 3).

What we reject: LLM-based speaker selection per turn (expensive, non-deterministic). Magic-string termination conditions. Ephemeral state.

LangGraph

LangGraph's typed state schema with checkpoint-per-step is the closest to our DB-authoritative model. The interrupt() mechanism for human review maps to our awaiting_approval and awaiting_human states.

What we adopt: Typed state schema (our orchestration_runs/orchestration_tasks tables). Checkpoint per state transition (our dual-write to event log). interrupt() for human review (our awaiting_human task state). Send API for dynamic parallelism (our coordinator dispatching multiple tasks concurrently).

What we reject: Full boilerplate burden of graph compilation. Static graph definition at compile time (our plans are generated per-mission). LangSmith vendor lock-in.

2.4 Architectural Decisions Summary

| Decision | Pattern | Source | Rationale |
| --- | --- | --- | --- |
| Tick structure | Two-phase: dispatch + reconcile | Symphony | Clean separation; each phase has a clear contract |
| Planning | HTN-inspired hybrid: templates + LLM + validation | ChatHTN, Hsiao et al. | Templates improve quality; LLM handles novel goals; validation catches structural errors |
| State authority | DB-authoritative, stateless coordinator | Airflow, LangGraph | Crash-safe; any instance can take over |
| Replanning policy | BDI intention commitment — replan on explicit triggers only | Rao & Georgeff, ChatBDI | Prevents thrashing; matches autonomy toggle |
| Mission state | Blackboard pattern — shared state with event-driven activation | Nii, LbMAS | Tasks activate when dependencies met; coordinator reads blackboard each tick |
| Dependencies | Explicit join table, queryable both directions | CrewAI, Airflow | Declarative, queryable, validates DAG structure |
| Failure handling | Continuation vs retry + infrastructure/quality failure classification | Symphony, Prefect | Different strategies for different failure types |
| Human review | Interrupt-based: plan approval + result review | LangGraph, Symphony | Two human gates; configurable per autonomy level |
| Agent selection | Deterministic scoring, not LLM-based | (Anti-pattern from CrewAI) | Reproducible, testable, debuggable |


3. CoordinatorService Architecture

3.1 Module Hierarchy

Rationale: coordinator_service.py lives in services/ alongside heartbeat_service.py and task_reconciler.py — it's a service that registers its tick on the shared scheduler. Supporting classes live in modules/coordination/ because they encapsulate domain logic (planning, dispatching, reconciling) that doesn't belong in the service entry point.

3.2 Class Diagram

3.3 Public Interface


4. Coordinator Tick Algorithm

4.1 Overview

The coordinator tick runs on a configurable interval (default: 5 seconds, matching Symphony's default). Each tick processes ALL active missions in the workspace, not just one.
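The two-phase shape of the tick can be sketched in a few lines. This is a hedged, in-memory illustration: the `Task` dataclass stands in for `orchestration_tasks` rows (the real coordinator is DB-authoritative), and reconcile is stubbed to complete tasks instantly:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # Stand-in for an orchestration_tasks row; the real coordinator
    # reads and writes these through the database each tick.
    name: str
    deps: list[str] = field(default_factory=list)
    state: str = "pending"  # pending -> queued -> succeeded (simplified)

def dispatch_phase(tasks: dict[str, Task]) -> list[str]:
    """Phase A: queue every pending task whose parents have all succeeded."""
    ready = [
        t.name for t in tasks.values()
        if t.state == "pending"
        and all(tasks[d].state == "succeeded" for d in t.deps)
    ]
    for name in ready:
        tasks[name].state = "queued"
    return ready

def reconcile_phase(tasks: dict[str, Task]) -> list[str]:
    """Phase B: check in-flight tasks; here they complete instantly (stub)."""
    done = [t.name for t in tasks.values() if t.state == "queued"]
    for name in done:
        tasks[name].state = "succeeded"
    return done

def tick(tasks: dict[str, Task]) -> None:
    """One coordinator cycle: dispatch, then reconcile."""
    dispatch_phase(tasks)
    reconcile_phase(tasks)
```

Because dispatch reads only pending tasks and reconcile reads only in-flight tasks, the two phases never interleave decisions, which is the contract the Symphony pattern calls for.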

4.2 Phase A: Dispatch

4.3 Phase B: Reconcile

4.4 Dependency Resolution

When a task completes, the coordinator must check whether downstream tasks are now unblocked. This is event-driven, not polling-based (blackboard pattern).
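The event-driven check can be expressed as a single set computation. A minimal sketch, assuming dependencies are held as a task → parent-set mapping (in production this is a query against the orchestration_task_dependencies join table):

```python
def newly_unblocked(
    completed: str,
    deps: dict[str, set[str]],
    done: set[str],
) -> set[str]:
    """On completion of `completed`, return the downstream tasks whose
    parents have now all reached terminal success."""
    done = done | {completed}
    return {
        task for task, parents in deps.items()
        if completed in parents and parents <= done
    }
```

Only tasks that actually depend on the completed task are examined, so no polling of the full task set is needed.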

4.5 Stall Detection
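Stall detection is a pure time-threshold check on the last emitted event (no LLM call, per Appendix A). A hedged sketch; the 300-second default is an illustrative assumption, not a value the PRD fixes:

```python
def is_stalled(
    last_event_ts: float,
    now: float,
    threshold_s: float = 300.0,  # illustrative default; configurable in practice
) -> bool:
    """A running task is stalled when no orchestration event has been
    recorded for it within the threshold window."""
    return (now - last_event_ts) > threshold_s
```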

4.6 Concurrency Safety

Two races are possible: coordinator ticks can overlap if a tick takes longer than the interval, and two tasks can complete simultaneously, each triggering dependency resolution for the same downstream task.

Solution: Optimistic locking with version column.

PRD-101 defines version on orchestration_tasks. Every state transition includes WHERE version = :expected_version. If the version changed (another process already transitioned the task), the UPDATE affects 0 rows and the transition is skipped.
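The pattern can be demonstrated end to end with an in-memory SQLite table (a stand-in for the Postgres table PRD-101 defines; column set is reduced for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orchestration_tasks "
    "(id INTEGER PRIMARY KEY, state TEXT, version INTEGER)"
)
conn.execute("INSERT INTO orchestration_tasks VALUES (1, 'queued', 0)")

def transition(db: sqlite3.Connection, task_id: int,
               new_state: str, expected_version: int) -> bool:
    """Version-checked state transition. Returns True iff this caller won
    the race; a stale expected_version matches 0 rows and is skipped."""
    cur = db.execute(
        "UPDATE orchestration_tasks "
        "SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_state, task_id, expected_version),
    )
    db.commit()
    return cur.rowcount == 1
```

If two processes both read version 0 and attempt the transition, only the first UPDATE matches; the second sees `rowcount == 0` and abandons its transition.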


5. Plan Decomposition

5.1 Decomposition Pipeline

5.2 MissionPlanner Interface

5.3 Decomposition Templates

Templates are Python dataclasses registered in a template library. They provide structural scaffolding that the LLM customizes with mission-specific details.
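A minimal sketch of the dataclass-plus-registry shape; the field names (`mission_type`, `task_type`, `depends_on`) and the example template are illustrative assumptions, not the PRD's final schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskSpec:
    name: str
    task_type: str                      # guides model selection downstream
    depends_on: tuple[str, ...] = ()    # names of parent tasks

@dataclass(frozen=True)
class DecompositionTemplate:
    mission_type: str
    tasks: tuple[TaskSpec, ...]

TEMPLATE_LIBRARY: dict[str, DecompositionTemplate] = {}

def register(template: DecompositionTemplate) -> None:
    TEMPLATE_LIBRARY[template.mission_type] = template

# Example registration: structural scaffolding the LLM later customizes
# with mission-specific details.
register(DecompositionTemplate(
    mission_type="research_report",
    tasks=(
        TaskSpec("gather_sources", "research"),
        TaskSpec("synthesize", "writing", depends_on=("gather_sources",)),
        TaskSpec("review", "review", depends_on=("synthesize",)),
    ),
))
```

A template hit skips the LLM call entirely (Appendix A); the library lookup is an exact or similarity match on mission type.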

5.4 LLM Decomposition Prompt

When no template matches, the coordinator calls an LLM to generate the decomposition. The prompt is structured to produce valid JSON matching the TaskSpec schema.

Rules

  1. Tasks MUST form a valid DAG (no circular dependencies)

  2. Task 1 should have no dependencies (the starting point)

  3. Every task needs at least one success criterion with must_pass=true

  4. Use task_type to guide model selection (research=mid-tier, review=different-family)

  5. Keep task count proportional to goal complexity (simple goal = 3-4 tasks)

  6. Independent tasks CAN run in parallel (no dependency edge between them)

  7. Estimated costs must sum to less than the budget constraint
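Rule 1 (and risk #8 in the register) is enforced structurally with the standard-library topological sorter, as a sketch:

```python
from graphlib import TopologicalSorter, CycleError

def validate_dag(deps: dict[str, set[str]]) -> bool:
    """Accept a plan only if its dependency graph is a valid DAG;
    plans with cycles are rejected before any task is dispatched."""
    try:
        # static_order() raises CycleError if the graph contains a cycle.
        tuple(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False
```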


6. Agent Assignment

6.1 Assignment Strategy

For each task in the plan, the coordinator assigns an agent using a deterministic scoring algorithm — not LLM-based selection (CrewAI's approach, which is non-deterministic and untestable).

| Strategy | When Used |
| --- | --- |
| Roster match | Task requirements match a roster agent's skills/tools. Preferred — agent has memory, personality, history. |
| Contractor spawn | No roster agent scores above threshold, or task needs a specialist model not available on roster. Ephemeral — mission-scoped lifecycle (PRD-104). |
| User override | In approve mode, user can reassign agents before execution starts. |

6.2 Scoring Algorithm
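The concrete scoring function is an implementation detail; as a hedged sketch of what a deterministic score could look like (the weights, inputs, and load penalty are illustrative assumptions, not the PRD's final values):

```python
def score_agent(
    required_skills: set[str],
    required_tools: set[str],
    agent_skills: set[str],
    agent_tools: set[str],
    busy_tasks: int,
) -> float:
    """Deterministic agent-to-task score: skill coverage weighted above
    tool coverage, with a small penalty per in-flight task.
    Weights (0.6 / 0.3 / 0.1) are illustrative, not final."""
    skill_cov = len(required_skills & agent_skills) / max(len(required_skills), 1)
    tool_cov = len(required_tools & agent_tools) / max(len(required_tools), 1)
    return 0.6 * skill_cov + 0.3 * tool_cov - 0.1 * busy_tasks
```

Because the score is a pure function of queryable inputs, the same task and roster always produce the same assignment, which is what makes it reproducible and testable where LLM-based selection is not.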

6.3 Dispatch Mechanism

The coordinator dispatches tasks directly via AgentFactory.execute_with_prompt(). It does NOT create a BoardTask and wait for the agent's heartbeat tick to pick it up. Direct dispatch gives the coordinator control over timing, retry, and result collection.

A BoardTask is created for visibility (kanban tracking) but is NOT the dispatch mechanism.
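Independent ready tasks are launched concurrently with asyncio.gather. A minimal sketch; `execute_with_prompt` here is a stub standing in for the real AgentFactory call, whose signature is assumed:

```python
import asyncio

async def execute_with_prompt(agent_id: str, prompt: str) -> str:
    """Stub for AgentFactory.execute_with_prompt (assumed shape):
    runs one agent against one task prompt and returns its result."""
    await asyncio.sleep(0)  # yield control, as a real dispatch would
    return f"{agent_id}:done"

async def dispatch_ready(assignments: list[tuple[str, str]]) -> list[str]:
    """Launch every dependency-free (agent, prompt) pair concurrently and
    collect results in assignment order."""
    return await asyncio.gather(
        *(execute_with_prompt(agent, prompt) for agent, prompt in assignments)
    )
```

Direct dispatch keeps timing, retry, and result collection in the coordinator's hands; the BoardTask created alongside is purely for kanban visibility.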


7. ContextMode.COORDINATOR

7.1 New Context Mode Definition

The coordinator needs its own context mode to get mission-aware context when making planning and monitoring decisions.

7.2 New Sections

MissionContextSection

AgentRosterSection

7.3 Files That Must Be Modified

| File | Change |
| --- | --- |
| orchestrator/modules/context/modes.py | Add COORDINATOR and VERIFIER to ContextMode enum and MODE_CONFIGS |
| orchestrator/modules/context/service.py | Register MissionContextSection and AgentRosterSection section renderers |
| orchestrator/modules/context/sections/ | New files: mission_context.py, agent_roster.py |


8. Failure Handling

8.1 Decision Tree

8.2 Retry-with-Feedback Protocol

When verification fails but retries remain, the verifier's reasoning is fed back to the executing agent. This is a continuation with guidance, not a blind retry.
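The feedback fold can be sketched as prompt construction; the exact wording and field names below are illustrative assumptions:

```python
def continuation_prompt(
    original_prompt: str,
    previous_output: str,
    verifier_reasoning: str,
) -> str:
    """Build a retry-with-feedback prompt: the agent sees its rejected
    output and the verifier's reasoning, rather than retrying blind."""
    return (
        f"{original_prompt}\n\n"
        "Your previous attempt was rejected by verification.\n"
        f"Previous output:\n{previous_output}\n\n"
        f"Verifier feedback:\n{verifier_reasoning}\n\n"
        "Revise your output to address the feedback."
    )
```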

8.3 Escalation Strategy

When a task fails after max retries:

| Escalation Level | Action | When |
| --- | --- | --- |
| 1. Different agent | Reassign to next-best-scoring agent | Default |
| 2. Different model | Keep same agent, switch to higher-tier model | If agent-specific issue unlikely |
| 3. Coordinator replanning | Remove failed task, find alternative path | If task is on critical path |
| 4. Human escalation | Flag for human review with full context | All automated options exhausted |
| 5. Mission failure | Mark run as failed, cancel remaining tasks | Human rejects or no alternatives |


9. Replanning Specification

9.1 Triggers

| Trigger | Action | Constraint |
| --- | --- | --- |
| Task fails after max retries + all escalations | Replan: find alternative path or substitute task | Completed tasks immutable |
| User sends new instructions mid-mission | Replan: incorporate new requirements | Completed tasks immutable |
| Budget warning (>80% spent) | Replan: cut optional tasks, use cheaper models | Running tasks continue |
| Verification rejects task + coordinator determines task design is wrong | Replan: redesign the task, not just retry | Only pending/queued tasks modified |
| Agent discovers new information requiring additional work | Replan: add tasks dynamically | New tasks get new task_order values |

9.2 Replanning Constraints

  1. Completed tasks are immutable. Their outputs are already consumed by downstream tasks. Removing them would invalidate the dependency graph.

  2. Running tasks continue. Only cancel running tasks if explicitly directed by human or if budget is exhausted.

  3. Plan version increments. Every replan bumps orchestration_runs.plan_version for audit trail.

  4. New tasks get the next available task_order. No renumbering of existing tasks.

  5. Dependency graph must remain a valid DAG. Validated after every replan.
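Constraints 1-3 can be enforced in the merge step itself. A hedged sketch over a simplified name → state mapping (the real implementation operates on orchestration_tasks rows):

```python
def apply_replan(
    tasks: dict[str, str],          # task name -> state
    new_pending: dict[str, str],    # replacement/additional pending tasks
    plan_version: int,
) -> tuple[dict[str, str], int]:
    """Merge a replan: completed and running tasks are kept untouched,
    pending tasks may be replaced, and plan_version is bumped for audit."""
    merged = {n: s for n, s in tasks.items() if s in ("succeeded", "running")}
    for name, state in new_pending.items():
        if name in merged:
            raise ValueError(f"replan may not modify {name} (state={merged[name]})")
        merged[name] = state
    return merged, plan_version + 1
```

Constraint 5 then re-runs the same DAG validation used at initial planning before the merged plan is committed.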

9.3 Replanning LLM Prompt


10. API Endpoints

10.1 Mission CRUD

10.2 Mission Lifecycle

10.3 Task Operations

10.4 Request/Response Examples

Create Mission

Plan Ready (webhook or poll)


11. Sequence Diagrams

11.1 Happy Path: 3-Task Sequential Mission

11.2 Mission with Task Failure and Retry

11.3 Mission with Human Review Rejection


12. Integration Points

12.1 Existing Components Used

| Component | How Coordinator Uses It | Changes Required |
| --- | --- | --- |
| AgentFactory.execute_with_prompt() | Dispatches each task to its assigned agent | None — accepts AgentRuntime already |
| ContextService.build_context() | Coordinator builds its own context with ContextMode.COORDINATOR | Add new mode + 2 new sections |
| get_tools_for_agent() (tool_router.py:~140) | Resolves tools for task agents | None for roster agents; PRD-104 adds explicit_tools param for contractors |
| UnifiedToolExecutor.execute_tool() | Coordinator's own tool loop for mission management | None |
| BoardTask model (core/models/board.py) | Creates board tasks with source_type='orchestration' for kanban visibility | None — existing model supports this |
| TaskReconciler (services/task_reconciler.py) | Extended to cover orchestration_tasks alongside recipe_executions | Add mission task query to _tick() |
| SharedContextManager (inter_agent.py) | Stores mission-scoped shared context for cross-task data flow | None — used via SharedContextPort (PRD-107) |
| UnifiedScheduler | Registers coordinator tick alongside heartbeat tick | None — additive registration |
| workflow_recipes table | "Save as routine" converts mission structure to recipe | Conversion function (new) |

12.2 New Components Introduced

| Component | Purpose | Location |
| --- | --- | --- |
| CoordinatorService | Main service: tick loop, plan generation, dispatch, reconciliation | orchestrator/services/coordinator_service.py |
| MissionPlanner | LLM-powered decomposition: goal → task graph | orchestrator/modules/coordination/planner.py |
| MissionDispatcher | Resolves ready tasks, assigns agents, launches execution | orchestrator/modules/coordination/dispatcher.py |
| MissionReconciler | Stall detection, completion handling, failure escalation | orchestrator/modules/coordination/reconciler.py |
| AgentMatcher | Deterministic agent-to-task scoring | orchestrator/modules/coordination/agent_matcher.py |
| MissionContextSection | New context section: mission state for coordinator | orchestrator/modules/context/sections/mission_context.py |
| AgentRosterSection | New context section: available agents for coordinator | orchestrator/modules/context/sections/agent_roster.py |
| platform_create_mission | Platform tool: create mission from chat | platform_actions.py + platform_executor.py |
| platform_approve_plan | Platform tool: approve plan from chat | platform_actions.py + platform_executor.py |
| platform_mission_status | Platform tool: check mission progress from chat | platform_actions.py + platform_executor.py |
| API router | REST endpoints for mission CRUD + lifecycle | orchestrator/api/missions.py |

12.3 Board Task Bridge

12.4 Save as Routine Conversion


13. Acceptance Criteria

Must Have

Should Have

Nice to Have


14. Risk Register

| # | Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- | --- |
| 1 | Coordinator complexity — too many responsibilities in one service | High | Medium | Split into focused classes: Planner, Dispatcher, Reconciler. Coordinator is the orchestrator, not the doer. |
| 2 | LLM planning reliability — decomposition quality varies by model and prompt | High | High | Template library for common patterns (ChatHTN hybrid). Validate all plans structurally. Benchmark decomposition quality across models. |
| 3 | Cost of coordination calls — coordinator LLM calls add overhead per mission | Medium | Medium | Use cheap models for coordination (Haiku-class). Template matching avoids LLM call entirely for known patterns. |
| 4 | Tick frequency tradeoff — too fast = wasted cycles, too slow = delayed dispatch | Medium | Medium | Start with 5s (Symphony default). Make configurable. Event-driven trigger for task completion → immediate dependent dispatch. |
| 5 | Parallel dispatch race conditions — two tasks complete simultaneously, both trigger same dependent | Medium | Medium | Optimistic locking with version column on orchestration_tasks. Only one transition succeeds. |
| 6 | Replanning destroys progress — bad replan discards valid completed work | High | Low | Immutable completed tasks. Replanning only modifies pending/scheduled tasks. plan_version increments for audit. |
| 7 | Agent unavailability — assigned agent offline or overloaded | Medium | Medium | Check availability before dispatch. Fallback: reassign or spawn contractor. Stall detection catches unresponsive agents. |
| 8 | Circular dependencies in task graph — LLM generates impossible plan | Low | Low | Validate DAG structure via TopologicalSorter before accepting any plan. Reject plans with cycles. |
| 9 | Coordinator single point of failure | Medium | Low | Stateless design (DB-driven) means any instance can take over. No in-process state to lose. |
| 10 | Over-engineering v1 | High | High | PRD-100 Risk #3: "Start sequential-only. Get lifecycle right first." Implementation phases: sequential (82A/B) → parallel + replanning (82C). |


15. Dependencies

| Dependency | Direction | Notes |
| --- | --- | --- |
| PRD-101 (Mission Schema) | Blocked by 101 | Coordinator reads/writes orchestration_runs, orchestration_tasks, orchestration_events. Schema must exist. |
| PRD-103 (Verification) | Blocks 103 | Coordinator triggers verification phase. Verification PRD needs coordinator's handoff interface (defined in Section 8). |
| PRD-104 (Ephemeral Agents) | Blocks 104 | Coordinator spawns contractor agents. Contractor PRD needs coordinator's spawn interface (Section 6). |
| PRD-105 (Budget) | Uses 105 | Coordinator calls budget admission gate before dispatch. Can start with simple checks, enhance later. |
| PRD-106 (Telemetry) | Feeds 106 | Coordinator emits orchestration_events that telemetry queries. Event schema supports aggregation. |
| PRD-107 (Context Interface) | Blocks 107 | Context interface must abstract how coordinator gets/sets context. Coordinator is the primary consumer. |
| HeartbeatService | Integration | Coordinator registers its tick alongside heartbeat. Must not conflict with heartbeat scheduling. |
| AgentFactory | Integration | Coordinator dispatches via execute_with_prompt(). No changes needed to AgentFactory. |
| TaskReconciler | Extension | Must extend to cover mission tasks. New MissionReconciler or extension of existing class. |
| ContextService | Extension | Must add COORDINATOR mode and 2 new sections. Non-breaking — adds new mode, doesn't modify existing. |


Appendix A: Coordinator Model Selection

The coordinator LLM call (for planning and replanning) should use a cheap-but-capable model. Planning requires good reasoning but produces relatively short structured output.

| Coordinator Operation | Recommended Model Tier | Rationale |
| --- | --- | --- |
| Template matching | No LLM needed | Embedding similarity check |
| Novel decomposition | Mid-tier (Sonnet 4.6, GPT-4o) | Good reasoning for task decomposition |
| Plan validation | No LLM needed | Structural checks only |
| Replanning | Mid-tier | Same reasoning as decomposition |
| Stall detection | No LLM needed | Time-based threshold check |
| Dependency resolution | No LLM needed | DAG traversal |

Estimated coordinator overhead per mission: 1-2 LLM calls for planning (template miss), 0 for execution (all structural). At ~$0.05-0.10 per planning call, coordinator overhead is <5% of mission cost.

Appendix B: Research Sources

| Source | What It Informed |
| --- | --- |
| Nii 1986, "Blackboard Systems" (AI Magazine) | Shared state coordination, event-driven knowledge source activation |
| LbMAS 2025 (arxiv:2507.01701) | Modern blackboard for LLMs, 5% improvement over static multi-agent |
| Nau et al. JAIR 2003 (SHOP2) | HTN formal correctness, forward-search decomposition |
| ChatHTN 2025 (arxiv:2505.11814) | Hybrid HTN + LLM, provably sound decomposition |
| Hsiao et al. 2025 (arxiv:2511.07568) | HTN structure enables smaller models to outperform larger baselines |
| Rao & Georgeff, ICMAS 1995 | BDI intention commitment, bold vs cautious agent spectrum |
| ChatBDI, AAMAS 2025 | BDI for LLM agents, intention stack prevents thrashing |
| OpenAI Symphony (SPEC.md) | Two-phase tick, continuation vs retry, WORKFLOW.md policy-as-code |
| CrewAI (crewAIInc/crewAI) | context=[] dependency declarations, guardrail validation, async_execution |
| AutoGen (microsoft/autogen) | Swarm handoff priority, context_variables shared state |
| LangGraph (langchain-ai/langgraph) | Typed state + checkpoint, interrupt() for human review, Send API |
| Automatos codebase | heartbeat_service.py, agent_factory.py, task_reconciler.py, context/service.py, inter_agent.py, tool_router.py |

Last updated