PRD-82C: Parallel Execution, Intelligent Decomposition & Budget Governance
Status: Draft Date: 2026-03-24 Authors: Gerard Kavanagh + Claude Dependencies: PRD-82A (built), PRD-82B (partial), PRD-102, PRD-104, PRD-105, PRD-106 Branch: TBD
TL;DR
Missions today are sequential playbooks with an LLM-generated plan. PRD-82C makes them what they were designed to be: parallel, budget-aware, intelligently decomposed multi-agent operations. This PRD consolidates all unfinished work from 82A/B and unbuilt specs from PRDs 102-106 into one deliverable.
1. Problem Statement
1.1 What's broken
A user asked for a 4,000-word research paper. The planner generated a single "write the paper" task. The agent hit max_tokens, output was truncated mid-sentence, verification failed. The user had to start over from scratch.
This failure pattern is systemic, not incidental:
| Failure | Root cause | Impact |
|---|---|---|
| Single massive tasks | Planner prompt says "sequential tasks" with no decomposition guidance | Truncation, verification failure, wasted tokens |
| No parallel execution | `has_active_task()` hard-blocks dispatch regardless of `max_concurrent` | 3x slower than necessary for parallelizable work |
| No budget enforcement | `can_afford()` exists but never called; soft warning at 150% only | 402 errors mid-mission when credits run out |
| No synthesis step | `TaskType.SYNTHESIS` enum defined, zero executor logic | Parallel outputs can't be merged |
| Templates are sequential | All 4 templates chain `depends_on = [previous_task]` | Even template-matched missions run serially |
| No task sizing intelligence | Budget estimate = `num_tasks * 2000` tokens flat | Large tasks get same budget as small ones |
1.2 What works (don't break it)
These components are wired and tested — 82C builds ON them:
State machine & transitions (orchestration_enums.py) — 11 task states, 10 run states, strict transition graph
Dependency DAG (OrchestrationTaskDependency table + DependencyResolver) — validates acyclicity, resolves ready tasks
Agent matcher (agent_matcher.py) — 5-factor weighted scoring, 0.4 threshold
Verification service (verification.py) — deterministic checks + cross-model LLM judge + caching
Cross-task consistency (verification.py) — ConsistencyResult/Issue, runs on mission finalization
Stall detection (reconciler.py) — 60s/300s timeouts, recovery to QUEUED
Shared context field (PRD-108) — per-mission Qdrant collection, inject/query/decay/reinforce
Run Trace UI (mission-dag-canvas.tsx) — DAG visualization, activity feed, status badges
Optimistic locking — version_id on all state transitions prevents double-dispatch
Event sourcing — append-only orchestration_events for audit trail
1.3 What this PRD delivers
When 82C ships, a "write a research paper" mission will:
Decompose into parallel research tasks + sequential drafting + synthesis merge + review
Dispatch up to 3 tasks concurrently (configurable per mission)
Enforce budget — refuse to dispatch if budget would be exceeded
Auto-generate synthesis — merge parallel outputs into unified document
Size tasks intelligently — no single task exceeds token limits
2. User Stories
US-001: Parallel Task Dispatch
As a mission coordinator, I want independent tasks to execute concurrently so that missions complete faster and agents aren't idle.
Acceptance criteria:
Dispatcher respects `max_concurrent` field on OrchestrationRun (default: 3)
Independent tasks (no shared dependencies) dispatch simultaneously
Tasks with unmet dependencies remain PENDING until upstream VERIFIED
DAG visualization updates in real-time showing parallel branches
No regression: sequential missions (max_concurrent=1) still work
US-002: Intelligent Decomposition with Parallel Groups
As a planner, I want to generate task DAGs with parallel branches so that independent work happens simultaneously.
Acceptance criteria:
Planner system prompt instructs LLM to identify parallelizable subtasks
LLM output includes `parallel_group` field on tasks (tasks in the same group have no interdependencies)
Templates generate parallel groups (e.g., 3 research tasks in parallel, all feeding into 1 synthesis)
Validation rejects plans where parallel-grouped tasks have dependencies on each other
Tasks within a parallel group get the same `sequence_number`
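The validation rule above can be sketched as a small check over the plan's task list. This is a minimal illustration with assumed field names (`title`, `parallel_group`, `depends_on`), not the real `_validate_plan()` implementation:

```python
from collections import defaultdict

def validate_parallel_groups(tasks: list[dict]) -> list[str]:
    """Return violation messages; empty list means the plan is valid.

    Rule: within a parallel group, no task may depend on another
    task in the same group.
    """
    # Index task titles by parallel group
    groups: dict[str, set[str]] = defaultdict(set)
    for t in tasks:
        if t.get("parallel_group"):
            groups[t["parallel_group"]].add(t["title"])

    errors = []
    for t in tasks:
        group = t.get("parallel_group")
        if not group:
            continue
        siblings = groups[group] - {t["title"]}
        bad = set(t.get("depends_on", [])) & siblings
        if bad:
            errors.append(
                f"{t['title']} depends on same-group tasks: {sorted(bad)}"
            )
    return errors
```

A plan that fails this check would either be rejected or fall back to sequential chaining.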
US-003: Synthesis Task Auto-Generation
As a coordinator, I want synthesis tasks to be automatically created when parallel branches converge so that parallel outputs are merged coherently.
Acceptance criteria:
When 2+ tasks share a downstream dependent, a SYNTHESIS task is auto-inserted
Synthesis task receives all upstream outputs in `input_context`
Synthesis prompt instructs agent to merge, reconcile contradictions, and produce unified output
`TaskType.SYNTHESIS` tasks use verification criteria: coherence check + completeness check
If planner explicitly includes a synthesis task, auto-generation is skipped
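The convergence detection that triggers auto-insertion can be sketched as a pass over the dependency edges. Names here are illustrative, not the actual planner code:

```python
from collections import defaultdict

def find_convergence_points(dependencies: list[tuple[str, str]]) -> list[str]:
    """Given (upstream, downstream) edges, return downstream tasks fed by
    two or more upstream tasks -- candidates for synthesis auto-insertion."""
    parents: dict[str, set[str]] = defaultdict(set)
    for upstream, downstream in dependencies:
        parents[downstream].add(upstream)
    return sorted(d for d, ups in parents.items() if len(ups) >= 2)
```

For each convergence point (unless the plan already contains an explicit synthesis task there), a SYNTHESIS task would be inserted between the parallel upstreams and the dependent.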
US-004: Budget Admission Gate
As a platform operator, I want missions to stop dispatching when budget is exhausted so that users don't get surprise 402 errors.
Acceptance criteria:
Pre-dispatch check: `can_afford(task_estimated_tokens)` must pass before dispatch
Graduated response per PRD-105:
HEALTHY (<50%): dispatch normally
WARNING (50-80%): dispatch with reduced max_tokens
CRITICAL (80-100%): dispatch only must-complete tasks (synthesis, review)
EXCEEDED (>100%): pause mission, notify user, await resume or cancel
Budget displayed on mission detail page with visual indicator
Pre-mission estimate shown at plan approval: "Estimated cost: ~X tokens across Y tasks"
US-005: Task Sizing & Complexity-Aware Decomposition
As a planner, I want to size tasks based on complexity so that no single task exceeds model token limits.
Acceptance criteria:
Planner prompt includes guidance: "No task should require more than 4,000 words of output"
For content tasks: sections decomposed individually (e.g., "Write Section 3: Prior Art" not "Write the paper")
Token budget per task estimated by complexity tier:
LIGHT (search, lookup): 1,000 tokens
MEDIUM (analysis, short draft): 4,000 tokens
HEAVY (long-form writing, code generation): 8,000 tokens
SYNTHESIS (merge parallel outputs): 6,000 tokens
Task max_tokens set from complexity tier, not global default
If estimated output > model max_tokens, planner must split into subtasks
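The tier-to-budget mapping above can be expressed as a single constant replacing the flat 2,000-token estimate. The constant name is an assumption (the PRD references `COMPLEXITY_TOKEN_BUDGET` in section 3.5):

```python
# Assumed constant; replaces the flat TOKENS_PER_TASK_ESTIMATE = 2000
COMPLEXITY_TOKEN_BUDGET = {
    "light": 1_000,      # search, lookup
    "medium": 4_000,     # analysis, short draft
    "heavy": 8_000,      # long-form writing, code generation
    "synthesis": 6_000,  # merge parallel outputs
}

def estimate_mission_tokens(complexities: list[str]) -> int:
    """Sum per-task budgets for the pre-approval estimate."""
    return sum(COMPLEXITY_TOKEN_BUDGET[c] for c in complexities)
```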
US-006: Template Parallel Groups
As a template author, I want templates to define parallel task groups so that common mission types exploit concurrency.
Acceptance criteria:
`TaskTemplate` gains `parallel_group: Optional[str]` field
Tasks in the same `parallel_group` share no dependencies and dispatch concurrently
Updated templates:
content_pipeline: Research + Source Gathering (parallel) -> Outline -> Section drafts (parallel) -> Synthesis -> Edit -> Review
research_and_report: Topic research tasks (parallel) -> Analysis -> Synthesis -> Draft -> Review
competitive_analysis: Per-competitor research (parallel) -> Synthesis -> Report -> Review
data_investigation: Data gathering tasks (parallel) -> Analysis -> Report
All templates include at least one SYNTHESIS task after parallel convergence
US-007: Mission Budget Display
As a user, I want to see token usage and budget status on the mission detail page so that I know how much a mission is costing.
Acceptance criteria:
Budget bar: used / estimated tokens with color coding (green/amber/red/exceeded)
Per-task token breakdown visible in task detail
Budget warning banner when WARNING threshold crossed
Pre-approval: estimated token cost shown alongside plan
Post-completion: total tokens used, cost estimate in USD (if pricing available)
US-008: Complexity Detection for Decomposition Strategy
As a planner, I want to detect mission complexity so that simple goals get simple plans and complex goals get properly decomposed.
Acceptance criteria:
Goal analysis classifies into complexity tiers: SIMPLE (3-5 tasks), MODERATE (5-10 tasks), COMPLEX (10-20 tasks)
Complexity signals: word count of goal, number of deliverables mentioned, domain breadth, attachment count
SIMPLE missions: max_concurrent=1 (sequential is fine)
MODERATE missions: max_concurrent=2
COMPLEX missions: max_concurrent=3
User can override max_concurrent at plan approval
3. Architecture
3.1 Parallel Dispatch (dispatcher.py)
Current: has_active_task() → if ANY task active, skip dispatch.
New: count_active_tasks() → if active_count >= run.max_concurrent, skip dispatch. Otherwise dispatch up to (max_concurrent - active_count) ready tasks per tick.
Key change: dispatch_next() becomes dispatch_ready() and can return multiple DispatchResults.
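The slot arithmetic behind this change can be sketched as follows. This is an illustrative model of the contract, not the real dispatcher.py code (which reads active counts from the DB and performs actual dispatch):

```python
from dataclasses import dataclass

@dataclass
class DispatchResult:
    task_id: str

def available_slots(active_count: int, max_concurrent: int) -> int:
    """How many new tasks may be dispatched this tick."""
    return max(0, max_concurrent - active_count)

def dispatch_ready(ready_tasks: list[str], active_count: int,
                   max_concurrent: int) -> list[DispatchResult]:
    """Dispatch up to the available slot count; may return multiple results.

    Replaces the old dispatch_next(), which took ready_tasks[0] only.
    """
    slots = available_slots(active_count, max_concurrent)
    return [DispatchResult(task_id=t) for t in ready_tasks[:slots]]
```

With `max_concurrent=1` this degrades exactly to the current sequential behavior, which is what makes the change non-breaking.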
3.2 Coordinator Tick Update (coordinator_service.py)
Current: _process_run() dispatches one task, then reconciles.
New: _process_run() dispatches up to max_concurrent tasks, executes them concurrently via asyncio.gather(), then reconciles.
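The concurrent-execution step can be illustrated with a minimal `asyncio.gather()` sketch. The real `_process_run()` drives agent execution and reconciliation; the names below are stand-ins:

```python
import asyncio

async def execute_task(task_id: str) -> str:
    # Stand-in for dispatching to an agent and awaiting its result
    await asyncio.sleep(0)
    return f"{task_id}:done"

async def process_run(dispatched: list[str]) -> list[str]:
    # Execute all dispatched tasks concurrently; gather preserves
    # input order, so results align with the dispatched list.
    results = await asyncio.gather(*(execute_task(t) for t in dispatched))
    return list(results)

results = asyncio.run(process_run(["research-1", "research-2"]))
```

Reconciliation then runs once per tick, after the gather completes, exactly as in the sequential flow.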
3.3 Planner System Prompt Update (planner.py)
Current prompt says: "Tasks execute sequentially (one at a time)."
New prompt:
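The actual prompt text is not reproduced here; the following is an assumed sketch of the guidance it would add, derived from the acceptance criteria in US-002 and US-005 (the `PLANNER_GUIDANCE` name is hypothetical):

```python
# Hypothetical sketch only -- the real _SYSTEM_PROMPT lives in planner.py
PLANNER_GUIDANCE = """\
Tasks may execute in parallel. Assign a parallel_group to tasks that have
no interdependencies; they will be dispatched concurrently.
Give every task a complexity tier: light, medium, heavy, or synthesis.
No task should require more than 4,000 words of output; split larger work
into per-section subtasks.
When parallel tasks feed one downstream task, add a synthesis task to
merge, reconcile contradictions, and produce a unified output.
"""
```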
3.4 Updated JSON Schema
New fields:
`complexity`: "light" | "medium" | "heavy" | "synthesis" — drives token budget per task
`parallel_group`: Optional string — tasks sharing a group have no interdependencies
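A plan task carrying the two new fields might look like this (field names beyond the two defined above are illustrative):

```python
# Hypothetical plan fragment; the exact schema is the planner's output contract
plan_task = {
    "title": "Research prior art",
    "complexity": "medium",        # drives token budget: medium -> 4000
    "parallel_group": "research",  # no interdependencies within the group
    "depends_on": [],
}
```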
3.5 Budget Admission Gate (dispatcher.py)
Integrates TokenBudgetManager.can_afford() into dispatch flow:
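A minimal sketch of the graduated gate, using the HEALTHY/WARNING/CRITICAL/EXCEEDED bands from PRD-105. Function names are illustrative; the real check wraps `TokenBudgetManager.can_afford()`:

```python
MUST_COMPLETE = {"synthesis", "review"}  # task types allowed under CRITICAL

def budget_state(used: int, estimated: int) -> str:
    """Classify budget consumption into the PRD-105 bands."""
    ratio = used / estimated if estimated else 0.0
    if ratio >= 1.0:
        return "EXCEEDED"
    if ratio >= 0.8:
        return "CRITICAL"
    if ratio >= 0.5:
        return "WARNING"
    return "HEALTHY"

def admit(task_type: str, used: int, estimated: int) -> bool:
    """Pre-dispatch admission decision."""
    state = budget_state(used, estimated)
    if state == "EXCEEDED":
        return False                       # pause mission, await user decision
    if state == "CRITICAL":
        return task_type in MUST_COMPLETE  # only must-complete tasks
    return True  # HEALTHY/WARNING dispatch (WARNING reduces max_tokens)
```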
Budget lifecycle:
Plan approval — show estimated tokens: `sum(COMPLEXITY_TOKEN_BUDGET[t.complexity] for t in tasks)`
Pre-dispatch — `can_afford()` check with graduated response
Post-execution — reconcile actual vs estimated, update `run.tokens_used`
Warning at 80% — emit event, show banner on UI
Hard stop at 100% — pause mission, user decides: add budget, cancel, or force-continue
3.6 Synthesis Task Executor
New logic in _execute_task() for SYNTHESIS tasks:
Synthesis prompt template:
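The prompt builder can be sketched as follows. The exact wording is illustrative; the real `_build_synthesis_prompt()` would also carry verification criteria (coherence + completeness):

```python
def build_synthesis_prompt(upstream_outputs: dict[str, str], goal: str) -> str:
    """Merge all upstream task outputs into one synthesis prompt.

    upstream_outputs maps task title -> task output text (from input_context).
    """
    sections = "\n\n".join(
        f"--- Output from task '{title}' ---\n{text}"
        for title, text in upstream_outputs.items()
    )
    return (
        f"You are merging the outputs of {len(upstream_outputs)} parallel "
        f"tasks for the goal: {goal}\n\n"
        f"{sections}\n\n"
        "Merge these into one unified document. Reconcile any "
        "contradictions, remove duplication, and preserve all unique "
        "findings."
    )
```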
3.7 Template Updates (templates.py)
Add parallel_group and complexity to TaskTemplate:
Revised content_pipeline template:
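An abridged sketch of the shape, assuming a dataclass-style `TaskTemplate` (the real one lives in templates.py and carries more fields):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskTemplate:
    title: str
    complexity: str = "medium"
    parallel_group: Optional[str] = None        # new field
    depends_on: list[str] = field(default_factory=list)

# Abridged content_pipeline: parallel research feeding an outline step;
# the full template continues with section drafts, synthesis, edit, review.
content_pipeline = [
    TaskTemplate("Research topic", "medium", parallel_group="research"),
    TaskTemplate("Gather sources", "light", parallel_group="research"),
    TaskTemplate("Outline", "medium",
                 depends_on=["Research topic", "Gather sources"]),
]
```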
This transforms a "write a research paper" mission from:
Before: 1 agent, 1 task, truncated output, failed verification
After: 2 parallel researchers -> outline synthesis -> 3 parallel drafters -> document synthesis -> review
3.8 Complexity Detection (planner.py)
New function called before decomposition:
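A heuristic sketch of `_detect_complexity` using the signals named in US-008 (goal word count, deliverable count, attachments). The scoring weights and thresholds below are assumptions for illustration:

```python
def detect_complexity(goal: str, deliverables: int = 1,
                      attachments: int = 0) -> tuple[str, int]:
    """Classify a mission goal and return (tier, max_concurrent).

    SIMPLE -> max_concurrent=1, MODERATE -> 2, COMPLEX -> 3,
    matching the US-008 acceptance criteria.
    """
    words = len(goal.split())
    # Weighted signal score; weights are illustrative guesses
    score = words / 50 + deliverables + attachments
    if score >= 6:
        return ("COMPLEX", 3)
    if score >= 3:
        return ("MODERATE", 2)
    return ("SIMPLE", 1)
```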
max_concurrent set during planning, overridable at plan approval.
4. 82A/B Gap Closure
These items were scaffolded in 82A/B but never wired. 82C closes them:
4.1 Wire max_concurrent (82A gap)
Current: Field on OrchestrationRun, `server_default=1`, never read
Fix: Dispatcher reads `run.max_concurrent` in dispatch gate
Set by: Planner sets based on complexity detection; user can override at approval
4.2 Wire TaskType.SYNTHESIS (82B gap)
Current: Enum value exists, never generated or handled
Fix: Planner generates SYNTHESIS tasks; coordinator has synthesis-specific prompt builder; templates include synthesis tasks after parallel convergence
4.3 Wire TokenBudgetManager.can_afford() (82B gap)
Current: Method exists in token_budget_manager.py, never called
Fix: Dispatcher calls `can_afford()` before every dispatch; graduated response (allow/defer/block)
4.4 Wire complexity-aware budget estimation (82B gap)
Current: `TOKENS_PER_TASK_ESTIMATE = 2000` flat for all tasks
Fix: Per-task estimate from `complexity` field: light=1000, medium=4000, heavy=8000, synthesis=6000
4.5 Update templates for parallel groups (82B gap)
Current: All 4 templates chain `depends_on = [previous_task]`
Fix: Templates use `parallel_group` and explicit `depends_on` for DAG structure
4.6 Dispatcher picks all ready tasks (82B gap)
Current: `DependencyResolver.get_ready_tasks()` returns multiple, dispatcher takes `[0]` only
Fix: Dispatcher iterates ready tasks up to available dispatch slots
5. What's NOT in 82C (deferred to 82D)
Ephemeral/contractor agents (PRD-104) — significant new subsystem, decouple from parallel dispatch
Cross-mission knowledge transfer — requires persistent knowledge graph design
Outcome telemetry dashboards (PRD-106) — metadata columns can be added but dashboards are 82D
Model routing optimization — static role->model mapping is sufficient for 82C
Prompt coaching / guidance engine — learning layer, not execution layer
Tool policy layering (PRD-105 Section 4) — workspace > mission > task > agent narrowing
6. Implementation Plan
Phase 1: Parallel Dispatch (Core)
Files: dispatcher.py, coordinator_service.py, orchestration_enums.py
Replace `has_active_task()` with `count_active_tasks()` in dispatcher
`dispatch_next()` → `dispatch_ready()` returning `List[DispatchResult]`
Coordinator `_process_run()` executes dispatched tasks via `asyncio.gather()`
Wire `run.max_concurrent` into dispatch gate (read from DB, default 3)
Add RUNNING → RUNNING transition guard (multiple tasks running is valid)
Test: Create mission with 2 independent tasks, verify both dispatch on same tick.
Phase 2: Intelligent Decomposition
Files: planner.py, templates.py
Update `_SYSTEM_PROMPT` — remove "sequential" language, add parallel group guidance
Add `complexity` and `parallel_group` to output schema
Add `_detect_complexity()` function — set `max_concurrent` during planning
Update `_validate_plan()` — verify parallel_group tasks have no cross-dependencies
Update all 4 templates with parallel groups and synthesis tasks
Add `render_template()` support for `parallel_group` and explicit `depends_on`
Synthesis task auto-insertion: if a parallel group converges without explicit synthesis, inject one
Test: Submit "write a research paper" goal, verify plan has parallel research + synthesis.
Phase 3: Budget Governance
Files: dispatcher.py, coordinator_service.py, token_budget_manager.py
Add `_pre_dispatch_budget_check()` to dispatcher
Wire `TokenBudgetManager.can_afford()` into dispatch flow
Add graduated response: HEALTHY/WARNING/CRITICAL/EXCEEDED
Pause mission on EXCEEDED — emit event, set run state to PAUSED
Add complexity-aware token estimates replacing flat 2000/task
Pre-approval budget display: estimated tokens shown in plan response
Test: Create mission with low budget, verify it pauses at threshold instead of 402.
Phase 4: Synthesis Executor
Files: coordinator_service.py, dispatcher.py
Add `_build_synthesis_prompt()` — merges upstream outputs with reconciliation instructions
Synthesis-specific verification criteria: coherence + completeness
`_execute_task()` detects `TaskType.SYNTHESIS` and uses synthesis prompt builder
Auto-synthesis injection in planner: detect parallel convergence without explicit synthesis
Test: 2 parallel research tasks → synthesis task merges both outputs coherently.
Phase 5: Frontend Updates
Files: mission-detail-page.tsx, use-missions-api.ts, mission-dag-canvas.tsx
Budget bar component: used/estimated tokens, color-coded
Budget warning banner when WARNING threshold crossed
Pre-approval: show estimated cost alongside plan
DAG canvas: parallel tasks rendered side-by-side (not just linear chain)
`max_concurrent` override control at plan approval
Per-task token usage in task detail panel
Phase 6: Template Expansion
Files: templates.py
Rewrite all 4 templates with parallel groups
Add 2 new templates:
coding_task: Spec -> Implement + Tests (parallel) -> Review -> Deploy
multi_document: Per-document analysis (parallel) -> Synthesis -> Report
Template selection uses complexity tier to choose task count range
7. Validation Criteria
7.1 The Research Paper Test (must pass)
Submit the exact mission from log.md: "Write a technical research paper titled 'Shared Semantic Fields for Multi-Agent Coordination'..."
Expected behavior:
Planner detects COMPLEX (long goal, multiple sections, multiple deliverables)
Plan contains 8-12 tasks with parallel groups:
Group "research": 2-3 parallel research tasks (prior art, experiment data, competitive landscape)
Synthesis: merge research into brief
Group "drafting": 3-4 parallel section drafts (each < 4000 words)
Synthesis: merge sections into complete paper
Review: edit pass
max_concurrent = 3 (auto-detected from COMPLEX tier)
Budget estimate shown at approval (~50,000 tokens)
Research tasks dispatch simultaneously on first tick
No truncation — each task produces < 4000 words
Synthesis tasks merge parallel outputs
Final output: complete 3,000-4,000 word paper
Budget tracked throughout, no 402 surprises
7.2 Regression Tests
Simple goal ("summarize this document") → 3-5 sequential tasks, max_concurrent=1
Template match ("write a blog post about X") → content_pipeline template with parallel groups
Budget exceeded → mission pauses, user notified, can resume or cancel
Agent failure mid-parallel → failed task retries, siblings continue unblocked
Replan after failure → generates replacement subtree only, preserves completed work
7.3 Performance Targets
| Metric | Target |
|---|---|
| Parallel mission speedup vs sequential | >= 2x for missions with 2+ parallel groups |
| Budget estimation accuracy | Within 50% of actual (improves with telemetry in 82D) |
| No single task > 4,000 words output | 100% for content missions |
| Plan generation time | < 15s including complexity detection |
| Dispatch latency per tick | < 2s for up to 3 concurrent dispatches |
8. Data Model Changes
8.1 OrchestrationTask additions
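The column additions implied by sections 3.4-3.5 can be summarized as follows. This is an assumed sketch of the new fields only (the real columns are SQLAlchemy fields on OrchestrationTask; `estimated_tokens` is an inferred name):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrchestrationTaskAdditions:
    """New fields only; existing columns unchanged."""
    complexity: str = "medium"            # light | medium | heavy | synthesis
    parallel_group: Optional[str] = None  # tasks in one group dispatch concurrently
    estimated_tokens: int = 4000          # from complexity tier, not flat 2000
```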
8.2 OrchestrationRun additions
8.3 New enum values
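One likely addition is an enum for the graduated budget states from section 3.5; the exact name and placement in orchestration_enums.py are assumptions:

```python
from enum import Enum

class BudgetState(str, Enum):
    """Graduated budget bands per PRD-105 (assumed enum)."""
    HEALTHY = "healthy"    # < 50% of estimate used
    WARNING = "warning"    # 50-80%: dispatch with reduced max_tokens
    CRITICAL = "critical"  # 80-100%: must-complete tasks only
    EXCEEDED = "exceeded"  # > 100%: pause mission
```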
9. API Changes
9.1 Plan approval response (existing endpoint, enriched response)
9.2 Mission detail response (existing endpoint, enriched)
9.3 Approve with overrides (existing endpoint, new body fields)
10. Risk & Mitigation
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Parallel tasks write conflicting outputs to shared field | Medium | Medium | Field dedup by content_hash already exists; synthesis prompt handles contradictions |
| LLM planner ignores parallel_group instructions | Medium | Low | Validation catches; fallback to sequential if no parallel groups |
| Budget estimation wildly inaccurate | High | Medium | Conservative defaults (2x actual); user override; soft-then-hard enforcement |
| Agent contention (same agent assigned to 2 parallel tasks) | Medium | Low | Agent matcher checks availability (busy=0.5 score); prefer different agents for parallel tasks |
| Synthesis quality poor (just concatenates) | Medium | High | Verification checks coherence; retry with feedback; cross-model judge |
| Parallel execution increases DB contention | Low | Medium | Optimistic locking already handles; version_id prevents double-state |
11. Migration Path
This is NOT a breaking change. Existing missions continue working:
Default `max_concurrent` changes from 1 → 3, but existing running missions keep their stored value
Old-format plans (no parallel_group, no complexity) are treated as sequential with "medium" complexity
Templates with no parallel_group fall back to sequential chaining
Budget gate defaults to HEALTHY if no estimate exists (no blocking)
Synthesis auto-insertion only triggers when parallel_group detected
Rollback: Set max_concurrent=1 on all runs to revert to sequential behavior. Budget gate can be disabled via config flag BUDGET_HARD_ENFORCEMENT_ENABLED=false.
12. Verification & Review Gates
12.1 The 82A/B Problem
In 82A/B, Ralph built scaffolding (models, enums, classes) that passed code review because the code existed syntactically. But the code was never called. max_concurrent was a column nobody read. TaskType.SYNTHESIS was an enum nobody generated. can_afford() was a method nobody invoked.
Root cause: Review checked "does the code exist?" not "is the code reachable from the execution path?"
12.2 Wiring Verification Tests (MANDATORY per phase)
Every phase must include wiring tests — integration tests that prove the new code is called during actual mission execution. These are not unit tests of isolated functions. They trace the full call path.
Phase 1: Parallel Dispatch — Wiring Tests
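An example of the shape such a test takes: assert the observable contract (two independent tasks dispatch on one tick) against the coordinator path. The harness below is a stand-in; the real test would seed a mission and drive `coordinator_service._process_run()`:

```python
def fake_tick(ready_tasks: list[str], active_count: int,
              max_concurrent: int) -> list[str]:
    # Stand-in for one coordinator tick; in the real wiring test this
    # call path goes through dispatcher.dispatch_ready().
    slots = max(0, max_concurrent - active_count)
    return ready_tasks[:slots]

def test_two_independent_tasks_dispatch_same_tick():
    dispatched = fake_tick(["research-a", "research-b"],
                           active_count=0, max_concurrent=3)
    assert len(dispatched) == 2  # both dispatch, not just [0]

def test_sequential_regression():
    dispatched = fake_tick(["a", "b"], active_count=0, max_concurrent=1)
    assert dispatched == ["a"]  # max_concurrent=1 still works

test_two_independent_tasks_dispatch_same_tick()
test_sequential_regression()
```

The key property of a wiring test, as opposed to a unit test, is that the assertion only passes if the new code is actually reached from the execution path.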
Phase 2: Intelligent Decomposition — Wiring Tests
Phase 3: Budget Governance — Wiring Tests
Phase 4: Synthesis Executor — Wiring Tests
12.3 Review Checklist (per phase, before merge)
Every phase PR must include this checklist. Reviewer must verify each item:
12.4 Phase Gate Reviews
Each phase has a gate before the next phase starts:
| Gate | Exit criteria | Approver |
|---|---|---|
| Phase 1 → 2 | Parallel dispatch wiring tests green. Manual test: 2 tasks dispatch on same tick. | Human (Gerard) |
| Phase 2 → 3 | "Write a paper" goal decomposes into parallel groups + synthesis. Plan has max_concurrent > 1. | Human (Gerard) |
| Phase 3 → 4 | Low-budget mission pauses at threshold. can_afford() provably called (test + log). | Human (Gerard) |
| Phase 4 → 5 | Synthesis task merges 2+ upstream outputs. Verification passes on merged output. | Human (Gerard) |
| Phase 5 → 6 | Budget bar renders. DAG shows parallel branches. Override controls work. | Human (Gerard) |
| Phase 6 → UAT | All templates render parallel groups. New templates match expected goals. | Human (Gerard) |
12.5 User Acceptance Tests (after all phases)
These are the final "does it actually work" tests run by Gerard:
Test 1: Research Paper Mission
Input: The exact PRD-108 paper prompt from log.md
Expected: Parallel research → synthesis → parallel drafting → synthesis → review → complete paper
Pass criteria: Paper is 3,000-4,000 words, no truncation, all sections present, budget tracked
Test 2: Simple Mission
Input: "Summarize this PDF" with attachment
Expected: 3-4 sequential tasks, max_concurrent=1, completes quickly
Pass criteria: Regression — simple missions don't over-decompose
Test 3: Budget Limit Mission
Input: Complex goal with intentionally low budget override
Expected: Mission pauses at budget threshold, user can resume or cancel
Pass criteria: No 402 errors, clear budget UI, graceful pause
Test 4: App Building Mission (stretch)
Input: "Build a simple todo app with React frontend and Express backend"
Expected: Parallel spec + research → architecture → parallel implementation → synthesis → review
Pass criteria: Outputs contain workable code, synthesis merges frontend + backend coherently
13. References
PRD-82A: Sequential Mission Coordinator (built) — docs/PRDS/82A-SEQUENTIAL-MISSION-COORDINATOR.md
PRD-82 Research: Orchestration Readiness — docs/PRDS/82-RESEARCH-ORCHESTRATION-READINESS.md
PRD-102: Coordinator Architecture — docs/PRDS/102-COORDINATOR-ARCHITECTURE.md
PRD-104: Ephemeral Agents & Model Selection — docs/PRDS/104-EPHEMERAL-AGENTS-MODEL-SELECTION.md
PRD-105: Budget & Governance — docs/PRDS/105-BUDGET-GOVERNANCE.md
PRD-106: Outcome Telemetry — docs/PRDS/106-OUTCOME-TELEMETRY.md
PRD-108: Memory Field Prototype — docs/PRDS/108-MEMORY-FIELD-PROTOTYPE.md
Last updated

