PRD-123 — Harness Pattern Adoption
Version: 1.2 | Type: Implementation | Status: Draft | Priority: P1
Author: Gerard Kavanagh + Claude | Date: 2026-03-31
Extends: PRD-82A (Coordinator), PRD-79 (Memory), PRD-35 (Tools), PRD-55 (Channels), PRD-58 (Prompt Management)
Touches: PRD-77 (Scheduled Tasks), PRD-06 (Dashboard), PRD-37 (API Keys), PRD-29 (FutureAGI)
Research Base: Claude Code harness pattern analysis (instructkr/claude-code Python port)
1. Goal
Adopt 12 architectural patterns, 3 prompt engineering patterns, and 2 system-awareness features identified from studying the Claude Code agent harness and Automatos onboarding gaps. The architectural patterns address runtime gaps (silent permission failures, ambiguous stop reasons, monolithic startup). The prompt patterns address intelligence gaps — the LLM makes poor memory decisions, gets minimal tool-use guidance, and has no anti-pattern awareness for business operations. The system-awareness features address a critical UX gap — Auto doesn't know what users can accomplish with the platform and has no guided onboarding for new workspaces.
This PRD hardens platform internals, improves agent intelligence via prompt content, and introduces a goal-oriented platform awareness prompt plus a dynamic Mission Zero onboarding flow — all delivered through the existing ContextService section architecture and PromptRegistry (PRD-58).
2. Research Context
The patterns were extracted from instructkr/claude-code, a Python clean-room port of the Claude Code TypeScript harness. The analysis compared each pattern against the current Automatos codebase to identify gaps, adaptation strategies, and expected impact.
Full research saved at: claude-code/ repo memory (research_claude_code_patterns.md).
Key insight: Claude Code is a single-user CLI tool. Automatos is a multi-tenant platform with orchestration. The patterns that matter most are the ones that scale — structured denials, named stop reasons, trust gates, and typed events all become more valuable in multi-agent, multi-tenant contexts.
3. What Ships
| # | Pattern | Phase | Effort | Depends On |
|---|---------|-------|--------|------------|
| 5 | Permission Denial as First-Class Data | Quick Win | 4-6 | None |
| 6 | Named Stop Reasons | Quick Win | 3-5 | None |
| 2 | Trust-Gated Initialization | Quick Win | 1-2 | None |
| 10 | Bootstrap Named Stages | Quick Win | 2-3 | None |
| 4 | Tool Tier Stratification | Next Sprint | 5-8 | #5 |
| 11 | Streaming Typed Events | Next Sprint | 8-12 | None |
| 1 | Frozen State Transition Models | Next Sprint | 10-15 | #6 |
| 7 | Proactive Transcript Compaction | Next Sprint | 2-4 | None |
| 8 | Session Checkpointing | Backlog | 4-6 | #1 |
| 3 | Tool Manifest Snapshots | Backlog | 2-3 | #4 |
| 9 | PRD Parity Audit | Backlog | 3-5 | None |
| 12 | Tool Execution Cost Tracking | Backlog | 2-3 | None |
| E | Memory Decision Framework | Quick Win | 2-3 | None |
| B | Business Tool Behavioral Contracts | Quick Win | 3-5 | None |
| F | Section-Level Anti-Patterns | Quick Win | 1-2 | E, B |
| H | Platform Awareness Prompt | Quick Win | 2-3 | None |
| I | Mission Zero Onboarding | Next Sprint | 5-8 | H |
> [!IMPORTANT]
> Patterns A (Sectioned Prompts) and D (Skills as Prompt Expansion) were identified in research but already exist in Automatos. `ContextService` + `SECTION_REGISTRY` implements sectioned assembly with 14 sections, 9 modes, parallel rendering, and priority-based token trimming. `SkillsSection` (Priority 4) already injects SKILL.md content into the system prompt. Pattern G (Prompt-Level Cost Awareness) is deferred — users configure agent LLMs at design time; on-the-fly model switching is a separate feature. Pattern C (Conditional Context Injection) is deferred — the section system handles this adequately for now.
4. What Does NOT Ship (Deferred)
| Deferred Item | Rationale |
|---|---|
| Full event-sourcing migration | Too large; frozen models + events are sufficient for now |
| CQRS read/write separation | Premature; current DB load doesn't warrant it |
| Tool sandboxing / runtime isolation | Separate security PRD needed |
| Frontend UI changes for new events | Frontend PRD follows after backend ships |
| Pattern A: Sectioned Prompt Assembly | Already built. `ContextService` + `SECTION_REGISTRY` with 14 sections, 9 modes |
| Pattern D: Skills as Prompt Expansion | Already built. `SkillsSection` (P4) injects SKILL.md into system prompt |
| Pattern G: On-the-fly LLM switching | Users configure agent LLMs at design time; different feature scope |
| Pattern C: Mid-conversation context injection | Current section system handles this; revisit when needed |
5. Phase 1 — Quick Wins
5.1 Pattern #5: Permission Denial as First-Class Data
Problem
When the coordinator assigns a task to an agent and that agent lacks access to a required tool, the failure is silent. The tool simply doesn't appear in the agent's available tools. The coordinator sees "task failed" with no explanation of why. Debugging requires manually cross-referencing agent_tool_assignments against the task requirements.
What Claude Code Does
Every permission check produces a PermissionDenial(tool_name, reason) frozen dataclass. These denials flow through TurnResult and are surfaced in the query engine output. The system always knows what was blocked and why.
What Automatos Does Today
- `agent_tool_assignments` controls tool access, but denial is silent (tool excluded from list)
- Workspace role checks return HTTP 403 with a generic message
- `OrchestrationEvent` tracks task lifecycle but not permission failures
- No structured denial data in the orchestration flow
Design
New dataclass:
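The dataclass body did not survive this draft; a minimal sketch of the assumed shape (only `tool_name` and `reason` are named in the research above, the other fields are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class PermissionDenial:
    """Structured record of a blocked tool access (illustrative sketch)."""
    tool_name: str
    agent_id: str
    reason: str        # e.g. 'not_assigned', 'tier_policy'
    context: str = ""  # where the denial occurred: dispatch, chat, resolution
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```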
Emission points:
- Tool resolution (`function_registry.py`) — when an agent requests a tool not in its assignments
- Coordinator dispatch (`coordinator_service.py`) — when a task requires a tool the matched agent can't access
- Chat execution (`chat_service.py`) — when the user-facing agent hits a tool boundary
Storage:
Emit as OrchestrationEvent with event_type='permission_denied':
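An illustrative event payload; only `event_type='permission_denied'` is specified above, the payload keys are assumptions mirroring the denial fields:

```python
# Hypothetical row content for the emitted OrchestrationEvent
event = {
    "event_type": "permission_denied",
    "payload": {
        "tool_name": "salesforce_sync",
        "agent_id": "agent-123",
        "reason": "not_assigned",
        "context": "coordinator_dispatch",
    },
}
```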
Self-healing in coordinator:
When a denial is recorded during dispatch, the coordinator should attempt reassignment:
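A sketch of the reassignment loop under assumed task/agent shapes; the point is that denials become data (recorded via a callback) instead of being swallowed:

```python
def dispatch_with_reassignment(task, agents, record_denial):
    """Try candidate agents in order; when one lacks a required tool,
    record a structured denial and fall through to the next candidate."""
    for agent in agents:
        missing = [t for t in task["required_tools"] if t not in agent["tools"]]
        if not missing:
            return agent                      # first fully capable agent wins
        for tool in missing:
            record_denial(agent["id"], tool)  # denial is data, not a silent skip
    return None  # no capable agent: caller fails the task with full context
```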
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/models/orchestration.py` | Add `PermissionDenial` dataclass |
| `orchestrator/core/llm/function_registry.py` | Emit denial when tool not found in assignments |
| `orchestrator/core/services/coordinator_service.py` | Check tool access before dispatch, attempt reassignment |
| `orchestrator/core/services/state_machine_service.py` | Add `permission_denied` event type |
| `orchestrator/api/missions.py` | Include denials in mission status response |
Acceptance Criteria
- When an agent is denied a tool, a `PermissionDenial` record is created (not just logged)
- `OrchestrationEvent` with `event_type='permission_denied'` appears in the mission event trail
- Coordinator attempts reassignment before failing the task
- `GET /api/missions/{id}` response includes a `permission_denials` array
- No existing tests break
5.2 Pattern #6: Named Stop Reasons
Problem
When an OrchestrationRun ends, the state is either completed or failed. There's no distinction between "ran out of budget," "hit max retries," "human cancelled," or "all tasks succeeded." Users see "failed" and have to dig through events to understand why. The token_budget_estimate field exists but isn't enforced as a stop condition.
What Claude Code Does
QueryEngineConfig defines max_turns and max_budget_tokens. When the engine stops, it returns a named stop_reason: completed, max_turns_reached, or max_budget_reached. Every exit is explicit.
What Automatos Does Today
- `RunState` has `completed`, `failed`, `cancelled` as terminal states
- `token_budget_estimate` and `tokens_used` exist on `OrchestrationRun`, but budget is not enforced
- `max_retries` exists, but exhaustion just sets state to `failed`
- No `stop_reason` field — you have to infer from events
Design
New enum:
Schema change:
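The enum and columns were not captured in this draft; a sketch covering the four stop reasons named in this section (column definitions shown as pseudocode comments):

```python
import enum

class StopReason(str, enum.Enum):
    """Named stop reasons; the four cases come from this section's text."""
    COMPLETED = "completed"                # all tasks succeeded
    BUDGET_EXHAUSTED = "budget_exhausted"  # tokens_used >= token_budget_estimate
    MAX_RETRIES_EXCEEDED = "max_retries_exceeded"
    HUMAN_CANCELLED = "human_cancelled"    # POST /api/missions/{id}/cancel

# Assumed columns on OrchestrationRun (SQLAlchemy-style pseudocode):
#   stop_reason = Column(String, nullable=True)  # null until the run terminates
#   stop_detail = Column(Text, nullable=True)    # human-readable context
```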
Enforcement points:
API response enrichment:
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/models/orchestration.py` | Add `StopReason` enum, `stop_reason` + `stop_detail` columns |
| `orchestrator/core/services/coordinator_service.py` | Enforce budget, set `stop_reason` at every exit point |
| `orchestrator/core/services/state_machine_service.py` | Accept `stop_reason` in `transition_run()` |
| `orchestrator/api/missions.py` | Include `stop_reason` and `stop_detail` in responses |
| `orchestrator/core/database/init_database.py` | Migration: add columns |
Acceptance Criteria
- Every terminal `OrchestrationRun` has a non-null `stop_reason`
- Budget enforcement: run stops with `BUDGET_EXHAUSTED` when `tokens_used >= token_budget_estimate`
- `GET /api/missions/{id}` includes `stop_reason` and `stop_detail`
- Retry exhaustion produces `MAX_RETRIES_EXCEEDED` (not generic `failed`)
- Human cancel via `POST /api/missions/{id}/cancel` sets `HUMAN_CANCELLED`
5.3 Pattern #2: Trust-Gated Initialization
Problem
Automatos boots in a single phase. DB schema, seeding, scheduler, channels, Composio sync, Git-backed skill loading — all run in one lifespan block. If a third-party Git skill repo is compromised, its code runs at startup alongside core database initialization. If a channel OAuth token is expired, the error can delay or break the entire startup sequence.
What Claude Code Does
Boot is split into phases with an explicit trust gate between core initialization and extension loading. Plugins, skills, MCP servers, and hooks only load after the trust gate passes. This is tracked via DeferredInitResult(plugins_loaded, skills_loaded, mcp_connected, hooks_registered).
What Automatos Does Today
- `main.py` lifespan runs everything linearly
- No separation between core (DB, config) and extensions (skills, channels, Composio)
- If `start_all_channels()` throws, it can crash the startup
- Skill source seeding (`seed_skill_sources`) runs before health checks
Design
Two-phase boot with trust gate:
New dataclass:
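A sketch of the two-phase boot and the `DeferredInitResult` record; the field names are adapted from Claude Code's version to Automatos' extensions and are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeferredInitResult:
    """Outcome of phase-2 extension loading (fields assumed)."""
    skills_loaded: bool = False
    channels_started: bool = False
    composio_synced: bool = False
    errors: tuple = ()

def boot(phase_1_core, trust_gate, phase_2_extensions):
    """Two-phase boot: core failures are fatal, extension failures degrade."""
    phase_1_core()  # DB schema, config: must succeed or startup aborts
    if not trust_gate():
        # No third-party code (skills, channels, Composio) runs past this point
        return DeferredInitResult(errors=("trust gate failed; extensions skipped",))
    return phase_2_extensions()  # best-effort; failures land in .errors
```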
Health endpoint update:
Files to Change
| File | Change |
|---|---|
| `main.py` | Split lifespan into `boot_phase_1_core`, `trust_gate`, `boot_phase_2_extensions` |
| `orchestrator/core/models/` (new file) | Add `DeferredInitResult` dataclass |
| `orchestrator/api/health.py` or `main.py` | Update `/health` to include extension status |
Acceptance Criteria
- If Composio sync fails, the platform still starts (degraded mode)
- If channel OAuth is expired, core API endpoints still work
- `/health` reports which extensions loaded and which failed
- No third-party code executes before `trust_gate` passes
- Startup logs clearly show Phase 1 / Trust Gate / Phase 2 boundaries
5.4 Pattern #10: Bootstrap Named Stages
Problem
Startup is a linear block in main.py. If seeding fails at line 47 of a 120-line function, the error says "startup failed" with a traceback. In production (Railway), diagnosing which phase failed requires scrolling logs. There's no way to ask "did the scheduler start?" without checking log text.
What Claude Code Does
BootstrapGraph defines ordered named stages with descriptions. Each stage is individually reportable. The system can render a full bootstrap report showing what completed and what didn't.
What Automatos Does Today
- Monolithic `lifespan` function in `main.py`
- Stages exist implicitly but aren't named, timed, or reportable
- No `/health/bootstrap` endpoint
Design
Named stages enum + timing:
Execution wrapper:
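The wrapper code was elided from this draft; a minimal sketch of `run_stage()` (the `StageResult` fields are assumptions matching the acceptance criteria below):

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class StageResult:
    """One named bootstrap stage outcome (fields assumed)."""
    name: str
    ok: bool
    duration_ms: int
    error: str = ""

def run_stage(name, fn):
    """Run a stage, time it, and capture failure instead of crashing lifespan."""
    start = time.perf_counter()
    try:
        fn()
        return StageResult(name, True, int((time.perf_counter() - start) * 1000))
    except Exception as exc:
        return StageResult(name, False, int((time.perf_counter() - start) * 1000), str(exc))
```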
Endpoint:
Files to Change
| File | Change |
|---|---|
| `main.py` | Wrap each startup task in `run_stage()`, store `BootstrapReport` on `app.state` |
| `orchestrator/core/models/` (new file or extend) | Add `BootstrapStage`, `StageResult`, `BootstrapReport` |
| `main.py` or `orchestrator/api/health.py` | Add `/health/bootstrap` endpoint |
Acceptance Criteria
- Every startup phase is named and timed in logs: `Bootstrap [database_init] completed in 340ms`
- `GET /health/bootstrap` returns a full stage report with durations
- Failed stages show error messages in the report
- Total startup time visible in one API call
5.5 Pattern #E: Memory Decision Framework (Prompt Content)
Problem
The LLM gets 3 lines of memory guidance in get_self_learning_instruction():
The result: the model stores garbage. Real examples from production Mem0:
- "Appreciates clean, modern, minimal aesthetic in design" — vague, no context about when/where this applies
- "Need a blog image for a post titled 'The Top 5 AI Agent Frameworks in 2025'" — ephemeral task, not a memory
- "On the left: 3 abstract agent nodes in sequence (A → B → C) passing fragmented..." — raw artifact content, not a fact
The backend auto-saves ~95% of memories via SmartMemoryManager.store_conversation(). But platform_store_memory is the LLM's only way to intentionally store high-quality, curated facts. It needs a real decision framework, not a 3-bullet nudge.
What Claude Code Does
The memory system prompt is ~2000 words. It defines 4 named memory types (user, feedback, project, reference), each with:
Description of what belongs in this type
When to save — specific trigger conditions
When to use — retrieval conditions
Examples — input → memory action pairs
Anti-patterns — what NOT to save
Body structure — how to format the memory content
What Automatos Does Today
- `get_self_learning_instruction()` in `personality.py` (lines 300-313): 3 bullets, ~60 words
- `platform_store_memory` tool description: "Store a piece of information in the workspace memory system." — 10 words
- No memory type taxonomy, no examples, no anti-patterns, no format guidance
- Backend `SmartMemoryManager` classifies as "global" or "agent-specific" — but the LLM doesn't know these categories exist
Design
Replace get_self_learning_instruction() with a full memory decision framework. Delivered as a new PromptRegistry version for the relevant chatbot personality slugs.
New prompt content (~800 tokens, replaces ~60):
Tool description upgrade for platform_store_memory:
Current (in actions_workspace.py):
New:
Delivery Mechanism
- Replace `get_self_learning_instruction()` in `personality.py` with the new framework text
- Create a new PromptRegistry version for `chatbot-friendly`, `chatbot-professional`, `chatbot-technical` that includes the memory framework
- Update the `platform_store_memory` tool description in `actions_workspace.py`
- Use FutureAGI (PRD-29) to A/B test: run `assess` on conversations before/after the change to measure memory quality improvement
Files to Change
| File | Change |
|---|---|
| `orchestrator/consumers/chatbot/personality.py` | Replace `get_self_learning_instruction()` (~60 words → ~800 words) |
| `orchestrator/modules/tools/discovery/actions_workspace.py` | Upgrade `platform_store_memory` description (10 words → 50 words) |
| `orchestrator/core/seeds/seed_system_prompts.py` | Update `PROMPT_MANIFEST` entries for `chatbot-*` slugs to include memory framework |
> [!NOTE]
> The backend `SmartMemoryManager` auto-store logic is unchanged. This pattern only improves what the LLM intentionally stores via `platform_store_memory`. Over time, as the LLM stores better memories, the auto-stored conversation data becomes less important relative to curated facts.
Acceptance Criteria
- `get_self_learning_instruction()` returns the full memory decision framework (~800 tokens)
- `platform_store_memory` tool description includes type guidance and anti-patterns
- New PromptRegistry versions created for all three chatbot personality slugs
- FutureAGI `assess` run on 10 sample conversations shows improvement in `is_helpful` score
- After 1 week of production use, sample 50 new Mem0 entries — fewer than 20% should be ephemeral task artifacts (baseline: ~60%)
5.6 Pattern #B: Business Tool Behavioral Contracts (Prompt Content)
Problem
Tool descriptions are minimal. platform_execute has a 1-line description. composio_execute lists available actions but gives no behavioral guidance. The LLM doesn't know:
When to use a tool vs. just answering from knowledge
How to handle tool failures gracefully
What information to include in tool calls (e.g., always include workspace context for Slack)
What NOT to do with tools (e.g., don't search the knowledge base for "good morning")
Claude Code's Bash tool description is ~1500 words of behavioral rules. Automatos is not a coding tool, but the same principle applies — rich tool descriptions produce expert tool usage.
What Automatos Does Today
- `get_tool_guidance_prompt()` in `personality.py` (lines 245-269): 3 bullets, ~40 words
- `PlatformActionsSection` (Priority 5): renders `ActionRegistry.build_prompt_summary()` — a markdown catalog of action names and 1-line descriptions
- `ComposioSection`: lists available Composio actions by name
- No behavioral rules, no anti-patterns, no workflow guidance for any tool
Design
Upgrade get_tool_guidance_prompt() with business-focused behavioral contracts. This is not coding-specific — it's about how an AI assistant should use tools when running a business.
New prompt content (~600 tokens, replaces ~40):
Upgrade PlatformActionsSection rendering:
Currently ActionRegistry.build_prompt_summary() returns a flat list. Add a preamble:
Delivery Mechanism
- Replace `get_tool_guidance_prompt()` in `personality.py` with the behavioral contract
- Add a preamble to `PlatformActionsSection._build()` before the action catalog
- Create new PromptRegistry versions via seed update
- FutureAGI assessment before/after
Files to Change
| File | Change |
|---|---|
| `orchestrator/consumers/chatbot/personality.py` | Replace `get_tool_guidance_prompt()` (~40 words → ~600 words) |
| `orchestrator/modules/context/sections/platform_actions.py` | Add behavioral preamble before action catalog |
| `orchestrator/core/seeds/seed_system_prompts.py` | Update `PROMPT_MANIFEST` for `chatbot-*` slugs |
Acceptance Criteria
- `get_tool_guidance_prompt()` includes when-to-use, how-to-use, and never-do sections
- `PlatformActionsSection` includes a behavioral preamble
- Chatbot stops calling knowledge search for greetings (testable: send "good morning", verify no tool call)
- Tool failures produce plain-language explanations (not raw error JSON)
- FutureAGI `assess` shows improvement in `is_helpful` and `is_concise` scores
5.7 Pattern #F: Section-Level Anti-Patterns (Prompt Content)
Problem
System prompts tell agents what TO do. They rarely say what NOT to do. The Response Rules section in get_base_system_prompt() has one anti-pattern ("NEVER show code"), but there's no systematic anti-pattern documentation for business operations.
Without negative examples, the LLM defaults to training biases: over-explaining, unsolicited suggestions, redundant tool calls, and verbose responses.
What Claude Code Does
Explicit anti-patterns throughout the system prompt:
"Do NOT use Bash to run commands when a dedicated tool is provided"
"NEVER create documentation files unless explicitly requested"
"Do NOT propose changes to code you haven't read"
"Avoid over-engineering"
Design
Add anti-pattern blocks to the Identity section and Memory section. These are business-focused, not coding-focused.
New anti-pattern block for Identity section (~200 tokens):
Integration into personality.py:
Add as a new static method get_anti_patterns() called from IdentitySection._build_chatbot_identity(), appended after get_action_response_style().
Delivery Mechanism
- Add `get_anti_patterns()` to the `AutomatosPersonality` class
- Call from `IdentitySection._build_chatbot_identity()` in the parts list
- Seed new PromptRegistry versions
- FutureAGI assessment on the `is_concise` metric
Files to Change
| File | Change |
|---|---|
| `orchestrator/consumers/chatbot/personality.py` | Add `get_anti_patterns()` static method (~200 tokens) |
| `orchestrator/modules/context/sections/identity.py` | Add `get_anti_patterns()` call in `_build_chatbot_identity()` parts list |
| `orchestrator/core/seeds/seed_system_prompts.py` | Update `PROMPT_MANIFEST` |
Acceptance Criteria
- Anti-pattern block is injected in the chatbot identity section
- Chatbot responds to "create an agent named X" with action + confirmation, not a preamble about what it's about to do
- Chatbot doesn't call tools for simple greetings (testable)
- FutureAGI `is_concise` score improves by >0.1 on sample conversations
- No regression in `is_helpful` score
5.8 Pattern #H: Platform Awareness Prompt (Prompt Content)
Problem
get_platform_skill() in personality.py tells Auto what tools exist — "Agent management", "Skills & plugins", "Knowledge base" — but not what users can accomplish. A new user who says "help me run my business" gets a generic response because Auto's self-knowledge is tool-centric, not goal-centric.
Current prompt (~400 tokens):
This reads like a feature list, not an assistant offering help. The user doesn't care about "agent management" — they care about "set up my team to handle customer emails automatically."
What Should Change
Rewrite get_platform_skill() as a goal-oriented capability map. Organized by what users want to do, not what API endpoints exist. Include the full breadth of 100+ platform actions grouped into achievable outcomes.
Design
New get_platform_skill() content (~600 tokens, replaces ~400):
Key differences from current:
| Dimension | Current | New |
|---|---|---|
| Organization | By tool category | By user intent |
| Language | Technical ("agent management") | Goal-oriented ("set up your business") |
| Scope | 7 bullet points | 5 goal sections with specifics |
| Integrations | "Email, Slack, GitHub, Calendar" | "100+ integrations" with named examples |
| Missions | Not mentioned | Prominently featured |
| Onboarding | Not mentioned | Mission Zero teased for new users |
| Analytics | "Usage stats, costs" | "Real-time analytics: costs, token usage, success rates, efficiency scores" |
| Marketplace | "Browse the marketplace" | "Browse and install — agents, skills, plugins ready to use" |
Delivery Mechanism
- Replace `get_platform_skill()` in `personality.py` with the goal-oriented version
- Create new PromptRegistry versions for `chatbot-friendly`, `chatbot-professional`, `chatbot-technical`
- FutureAGI (PRD-29) A/B test: measure whether new users engage more features in their first 5 conversations
Files to Change
| File | Change |
|---|---|
| `orchestrator/consumers/chatbot/personality.py` | Rewrite `get_platform_skill()` (~400 tokens → ~600 tokens, restructured) |
| `orchestrator/core/seeds/seed_system_prompts.py` | Update `PROMPT_MANIFEST` entries for `chatbot-*` slugs |
Acceptance Criteria
- `get_platform_skill()` is organized by user intent, not tool category
- Missions are explicitly mentioned as a capability
- Mission Zero is teased for new workspaces ("Just say 'set up my workspace'")
- Composio integrations mention 100+ apps with specific named examples
- Analytics capabilities include success rates, efficiency scores, predictive alerts (not just "usage stats")
- New PromptRegistry versions created for all chatbot personality modes
- FutureAGI assessment: users in the test group use 2+ more platform features in their first 5 conversations vs. control
5.9 Pattern #I: Mission Zero Onboarding (Next Sprint — documented here for context)
> [!NOTE]
> Mission Zero is listed here for completeness but is a Next Sprint item (Section 6.5). It depends on Pattern H (Platform Awareness Prompt) being live so Auto knows to offer it.
See Section 6.5 for the full design.
6. Phase 2 — Next Sprint
6.1 Pattern #4: Tool Tier Stratification
Problem
All tools are flat in agent_tool_assignments. A system health-check tool sits alongside a third-party Salesforce connector with no trust distinction. There's no way to enforce "system tools always available" or "marketplace tools require explicit approval." The owner_type field on agents distinguishes workspace vs marketplace, but tools have no equivalent.
What Claude Code Does
CommandGraph stratifies into builtins / plugin-like / skill-like. Each tier is independently togglable via flags (include_plugin_commands, include_skill_commands).
Design
New enum on tools:
Schema change:
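The enum and schema snippet were not captured in this draft; a sketch of the assumed shape, with a tier-aware resolver illustrating the enforcement rules (tool/assignment shapes are hypothetical):

```python
import enum

class ToolTier(str, enum.Enum):
    SYSTEM = "system"            # core health/introspection tools
    PLATFORM = "platform"        # first-party business tools
    MARKETPLACE = "marketplace"  # third-party, explicit opt-in
    CUSTOM = "custom"            # workspace-authored tools

def resolve_tools(tools, assignments):
    """Tier-aware filtering during tool resolution (illustrative)."""
    resolved = []
    for tool in tools:
        if tool["tier"] in (ToolTier.SYSTEM, ToolTier.PLATFORM):
            resolved.append(tool)  # system always on; platform on by default
        elif tool["name"] in assignments:
            resolved.append(tool)  # marketplace/custom need explicit assignment
    return resolved
```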
Enforcement rules:
| Tier | Requires Assignment | Approval | Workspace Can Disable | Rate Limit |
|---|---|---|---|---|
| system | No (always available) | No | No | None |
| platform | No (default on) | Per-tool | Yes (workspace setting) | Standard |
| marketplace | Yes (explicit) | Yes | Yes | Lower (2x less) |
| custom | Yes (owner approval) | Per-tool | Yes | Standard |
Integration with Pattern #5:
When a tool is blocked due to tier policy, emit a PermissionDenial with reason tier_policy:
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/models/core.py` | Add `ToolTier` enum, `tier` column to tools table |
| `orchestrator/core/llm/function_registry.py` | Filter by tier during tool resolution |
| `orchestrator/core/services/tool_service.py` | Tier-based policy enforcement |
| `orchestrator/core/database/init_database.py` | Migration + backfill |
| `orchestrator/api/tools.py` | Include tier in API responses |
| `orchestrator/core/services/coordinator_service.py` | Tier-aware agent matching |
Acceptance Criteria
- System tools always appear in agent tool lists regardless of assignments
- Marketplace tools require an explicit `agent_tool_assignments` entry
- Tier is visible in the `GET /api/tools` response
- Tier enforcement produces `PermissionDenial` events (Pattern #5)
- Backfill migration correctly classifies existing tools
6.2 Pattern #11: Streaming Typed Events
Problem
SSE streaming currently emits 4 event types: token, thinking, tool_call, done. The frontend activity board (PRD-06) polls for updates. Users can't see memory operations, permission denials, context compaction, or budget warnings in real-time.
What Claude Code Does
stream_submit_message() yields fine-grained typed events: message_start, command_match, tool_match, permission_denial, message_delta, message_stop.
Design
Expanded event vocabulary:
Event schema:
Emission example:
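A sketch of the typed event model; the event names come from this section and its acceptance criteria, while the dataclass shape and `to_sse()` framing are assumptions:

```python
import enum
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

class StreamEventType(str, enum.Enum):
    # existing vocabulary (unchanged, backward compatible)
    TOKEN = "token"
    THINKING = "thinking"
    TOOL_CALL = "tool_call"
    DONE = "done"
    # additions named in this section
    AGENT_ASSIGNED = "agent_assigned"
    MEMORY_INJECTED = "memory_injected"
    TOOL_PERMISSION_DENIED = "tool_permission_denied"
    CONTEXT_COMPACTED = "context_compacted"

@dataclass(frozen=True)
class StreamEvent:
    type: StreamEventType
    data: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_sse(self) -> str:
        """Serialize as a Server-Sent Events frame."""
        return f"event: {self.type.value}\ndata: {json.dumps(self.data)}\n\n"
```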
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/models/` (new) | `StreamEventType` enum, `StreamEvent` dataclass |
| `orchestrator/core/services/chat_service.py` | Emit new events during execution |
| `orchestrator/core/llm/function_registry.py` | Emit `tool_resolved`, `tool_permission_denied` |
| `orchestrator/core/services/memory_service.py` | Emit `memory_injected`, `memory_stored` |
| `orchestrator/core/services/coordinator_service.py` | Emit `task_state_change`, `mission_stop` |
| `orchestrator/core/context/context_guard.py` | Emit `context_compacted`, `budget_warning` |
| `orchestrator/api/chat.py` | Update SSE generator to use `StreamEvent.to_sse()` |
> [!NOTE]
> Frontend changes to consume new events are deferred to a separate frontend PRD. The backend ships the events; the frontend can adopt incrementally.
Acceptance Criteria
- SSE stream includes an `agent_assigned` event before the first token
- Memory injection is visible as a `memory_injected` event with layer info
- Tool permission denials appear as `tool_permission_denied` events in the stream
- All events include a `timestamp` for frontend ordering
- Existing `token`, `thinking`, `tool_call`, `done` events unchanged (backward compatible)
6.3 Pattern #1: Frozen State Transition Models
Problem
OrchestrationTask and OrchestrationRun state fields are mutated in-place. In the coordinator tick loop, if two workers (unlikely with fcntl lock, but possible in future scaling) process the same run, they can race on state transitions. Even without races, in-place mutation makes it impossible to reconstruct "what state was the task in when the coordinator made this decision?"
What Claude Code Does
Every dataclass is @dataclass(frozen=True). State changes produce new objects. Combined with event-sourcing, every state is a snapshot.
Design
Introduce frozen transition records alongside mutable ORM models:
Usage in state machine service:
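A sketch of a frozen transition record and the single sanctioned mutation point; `TaskTransition` appears in the Files to Change list, but the field names and function body are illustrative (the task is shown as a dict standing in for the ORM row):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class TaskTransition:
    """Immutable snapshot of one task state change (fields assumed)."""
    task_id: str
    from_state: str
    to_state: str
    occurred_at: datetime
    detail: str = ""

def transition_task(task, new_state, detail=""):
    """Mutate the ORM row in one place, but hand callers a frozen record
    so the coordinator loop never reasons over mutable objects."""
    record = TaskTransition(
        task_id=task["id"],
        from_state=task["state"],
        to_state=new_state,
        occurred_at=datetime.now(timezone.utc),
        detail=detail,
    )
    task["state"] = new_state  # the only sanctioned mutation point
    return record
```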
Key principle: The ORM model is still mutable (SQLAlchemy requires it), but every mutation goes through transition_*() which produces a frozen Transition record. The coordinator loop works with frozen records, not mutable ORM objects.
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/models/orchestration.py` | Add `TaskTransition`, `RunTransition` frozen dataclasses |
| `orchestrator/core/services/state_machine_service.py` | Return `Transition` from all state changes |
| `orchestrator/core/services/coordinator_service.py` | Work with `Transition` records in tick loop |
| `orchestrator/core/services/mission_dispatcher.py` | Accept frozen transitions |
Acceptance Criteria
- Every `transition_task()` and `transition_run()` returns a frozen `Transition` dataclass
- No direct `task.state = X` mutations outside the state machine service
- All transitions are recorded as `OrchestrationEvent` entries
- Coordinator tick loop receives and logs `Transition` records
6.4 Pattern #7: Proactive Transcript Compaction
Problem
ContextGuard compacts at 80% context window usage — reactive, not proactive. For a long session (30+ turns), all turns stay in context until the panic threshold fires. By then, context quality has already degraded (LLMs perform worse at high context utilization). The L1→L2 consolidation runs hourly — too slow for active sessions.
What Claude Code Does
TranscriptStore.compact() proactively keeps only last N entries. The query engine auto-compacts after a configurable turn count, before context limits are reached.
Design
Proactive compaction after N turns:
Integration point — in chat handler, before LLM call:
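A sketch of the compaction check, using the thresholds stated in the acceptance criteria below (trigger after 8 turns, keep the last 4 verbatim); the `summarize` callable and turn shape are stand-ins:

```python
KEEP_VERBATIM = 4   # most recent turns never summarized (assumed config name)
COMPACT_AFTER = 8   # proactive threshold, well below the 80% panic check

def maybe_compact_session(turns, summarize):
    """If the session has grown past COMPACT_AFTER turns, replace the
    oldest turns with one summary turn and keep the tail verbatim."""
    if len(turns) <= COMPACT_AFTER:
        return turns
    old, recent = turns[:-KEEP_VERBATIM], turns[-KEEP_VERBATIM:]
    return [{"role": "system", "content": summarize(old)}] + recent
```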
Files to Change
| File | Change |
|---|---|
| `orchestrator/core/context/` (new or extend) | `maybe_compact_session()` function |
| `orchestrator/core/services/chat_service.py` | Call compaction before LLM, emit event |
| `orchestrator/core/session_queue.py` | Add `turn_count` to session state |
| `config.py` | Add compaction configuration constants |
Acceptance Criteria
- After 8 turns, the oldest turns are summarized and replaced
- Last 4 turns are always kept verbatim (no loss of recent context)
- `ContextGuard` 80% check still exists as a safety net
- `context_compacted` SSE event emitted when compaction occurs
- Token usage per session decreases for long conversations (measurable)
6.5 Pattern #I: Mission Zero Onboarding
Problem
When a new user signs up, they land in an empty workspace. Auto greets them but has no structured way to:
Learn about their business and goals
Research the marketplace for matching tools
Propose a workspace setup (agents, integrations, skills, playbooks)
Let the user iterate on the proposal ("I use Google Drive, not Dropbox")
Execute the approved plan as a mission
The is_new_workspace signal already exists (GET /api/workspaces returns is_new_workspace: true when agent_count == 0), but nothing acts on it. The frontend triggers a basic onboarding UI, but Auto itself has no onboarding intelligence.
The Mission Zero research (docs/PRDS/Research/MISSION-ZERO/) demonstrated this concept with a 14-agent roster for Automatos' own workspace — proving the pattern works. This pattern generalizes it for any user.
What Mission Zero Does
Mission Zero is Auto as both planner and executor for initial workspace setup. It's a coordinator-mode mission where Auto:
Detects a new or unconfigured workspace
Discovers the user's business through structured questions
Researches the marketplace dynamically to match needs
Proposes a complete setup in plan mode for user review
Iterates based on user feedback ("swap Jira for Linear", "skip the blog agent")
Executes the approved plan as a standard mission via existing orchestration infrastructure
Design
Phase 1: Detection & Trigger
Mission Zero activates when:
- `is_new_workspace == true` (no agents created yet), OR
- User explicitly says "set up my workspace", "help me get started", or "mission zero"
Add a new prompt block to get_platform_skill() (Pattern H) that tells Auto about Mission Zero. Add detection logic in the chatbot consumer to inject the Mission Zero prompt when the workspace is empty.
The proposal Auto presents follows this template:

Here's what I'd set up for your [business type]:

Agents:
- [Agent Name] ([Model]) — [What it does for them]
- [Agent Name] ([Model]) — [What it does for them]
- ...

Integrations:
- [App] ✓ (you mentioned this)
- [App] ✓ (matched to your [need])
- ...

Skills & Plugins:
- [Skill] → assigned to [Agent]
- [Plugin] → workspace-wide
- ...

Playbooks:
- [Workflow name] — [trigger] → [steps]
- ...

Estimated monthly cost: ~$XX at typical usage

Would you like to adjust anything, or shall I set this up?
Phase 2: Plan Mode Integration
The Mission Zero proposal is presented in a structured format that maps directly to executable actions. When the user says "approve" or "set this up", Auto calls platform_create_mission with the full plan as the goal.
The mission coordinator already handles decomposition — it will break "create 5 agents, connect 3 integrations, install 4 skills" into individual tasks assigned to Auto itself (as the coordinator agent).
Phase 3: Post-Setup
After Mission Zero completes:
- Auto stores key facts about the business via `platform_store_memory` (Pattern E)
- The `OnboardingSection` stops injecting (`agent_count > 0`)
- Auto's regular Platform Awareness prompt (Pattern H) takes over for ongoing assistance
- Auto offers: "Your workspace is ready. Want me to run a quick tour of what each agent does?"
Marketplace Research Tools (Already Exist)
| Tool | Purpose |
|---|---|
| `platform_browse_marketplace_agents` | Find agent templates by business type (e.g., "marketing", "support") |
| `platform_browse_marketplace_skills` | Find skills for priority automations (e.g., "SEO", "email triage") |
| `platform_browse_marketplace_plugins` | Find plugins matching workflow needs |
| `platform_list_connected_apps` | Check which Composio integrations match the user's stated tools |
| `platform_list_llms` | Recommend models based on budget sensitivity |
| `platform_create_agent` | Create agents from the approved plan |
| `platform_install_skill` | Install marketplace skills |
| `platform_assign_skill_to_agent` | Wire skills to agents |
| `platform_create_playbook` | Set up automation workflows |
| `platform_configure_agent_heartbeat` | Set up autonomous agent monitoring |
| `platform_create_mission` | Execute the full setup as a coordinated mission |
All 11 tools already exist. No new infrastructure needed.
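A minimal sketch of the research pass Auto might run with the tools listed above, assuming a generic `call_tool(name, args)` dispatch. The dispatch function and the query arguments are illustrative — in practice the queries come from the user's discovery answers:

```python
# Sketch of a Mission Zero research pass over the marketplace tools listed
# above. call_tool is a stand-in for the platform's tool dispatch; query
# arguments are illustrative placeholders.

RESEARCH_SEQUENCE = [
    ("platform_list_connected_apps", {}),                          # match user's stated tools
    ("platform_browse_marketplace_agents", {"query": "support"}),
    ("platform_browse_marketplace_skills", {"query": "email triage"}),
    ("platform_browse_marketplace_plugins", {"query": "reporting"}),
    ("platform_list_llms", {}),                                    # budget-sensitive model picks
]

def run_research(call_tool) -> dict:
    """Collect marketplace results keyed by tool name. Research only —
    nothing is created until the user approves the proposal."""
    return {name: call_tool(name, args) for name, args in RESEARCH_SEQUENCE}
```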
Files to Change
orchestrator/modules/context/sections/onboarding.py (new)
OnboardingSection — injects Mission Zero prompt when agent_count == 0
orchestrator/modules/context/sections/__init__.py
Register OnboardingSection in SECTION_REGISTRY
orchestrator/modules/context/modes.py
Add OnboardingSection to CHATBOT mode's section list
orchestrator/consumers/chatbot/personality.py
Add Mission Zero reference in get_platform_skill() (done in Pattern H)
orchestrator/core/seeds/seed_system_prompts.py
Update PROMPT_MANIFEST with onboarding prompt content
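A minimal sketch of the new `OnboardingSection`, assuming the section interface is roughly `render(context) -> str`. The base class, the `SECTION_REGISTRY` hook, and the full prompt text (seeded via `PROMPT_MANIFEST`) are not shown; names not in this PRD are assumptions:

```python
# Sketch of OnboardingSection, assuming a render(context) -> str interface.
# The real prompt content is seeded via PROMPT_MANIFEST (PRD-58); this
# abbreviated text is a placeholder.

MISSION_ZERO_PROMPT = (
    "This workspace has no agents yet. Proactively offer to set it up: "
    "ask 4-6 discovery questions, research the marketplace, and present "
    "a structured proposal before creating anything."
)

class OnboardingSection:
    name = "onboarding"
    max_tokens = 800  # matches the budget stated in the Risks section

    def render(self, context: dict) -> str:
        # Inject only for empty workspaces; explicit "set up my workspace"
        # requests are handled by the Platform Awareness prompt instead.
        if context.get("agent_count", 0) == 0:
            return MISSION_ZERO_PROMPT
        return ""  # zero prompt cost for configured workspaces
```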
Acceptance Criteria
When a new user sends their first message, Auto proactively offers to set up the workspace (doesn't wait to be asked)
Auto asks 4-6 discovery questions conversationally (not a numbered form)
Auto uses `platform_browse_marketplace_*` tools to research matching agents, skills, and plugins
Auto presents a structured proposal with agents, integrations, skills, playbooks, and cost estimate
User can modify the proposal ("swap X for Y", "remove Z") and Auto re-presents without re-asking all questions
On user approval, Auto creates a mission that executes the setup
After mission completes, Auto stores business context in memory (Pattern E) and stops showing the onboarding prompt
`OnboardingSection` returns empty string for workspaces with agents (no prompt bloat for existing users)
User can trigger Mission Zero manually by saying "set up my workspace" even in a non-empty workspace (for re-configuration)
Research Context
The Mission Zero concept was validated in docs/PRDS/Research/MISSION-ZERO/:
- `MISSION-0.1-PROMPT.md` — 14-agent roster tested for Automatos' own workspace
- `MISSION-ZERO-RESULTS.md` — Full operating model with KPIs, review cadence, channel matrix
- `MISSION-ZERO-REVIEW.md` — Governance assessment with 48 acceptance criteria
- `PLATFORM-CAPABILITIES-DEFINITIVE.md` — 98 platform tools across 18 domains confirmed operational
- `PLATFORM-READINESS-REPORT.md` — 3-phase platform build confirmed complete
The key difference: the research was a hardcoded plan for one specific workspace. Pattern I generalizes this into a dynamic, marketplace-driven flow that works for any business type.
7. Phase 3 — Backlog
7.1 Pattern #8: Session Checkpointing
Problem
If the coordinator crashes mid-mission, task state is preserved in PostgreSQL but the conversation context for each agent is lost (L1 Redis may have expired). Long-running missions (10+ tasks, 30+ minutes) are vulnerable to context loss on restart.
Design
At key milestones (task completion, plan change, tool result), write a checkpoint to S3:
Resume from checkpoint: POST /api/missions/{id}/resume?from_checkpoint=latest
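The checkpoint record could be sketched as below, assuming checkpoints are JSON blobs keyed by mission id and a monotonically increasing sequence number. The dataclass fields, key scheme, and trigger names are assumptions:

```python
# Sketch of SessionCheckpoint and its S3 key scheme. Field names, triggers,
# and the key prefix are illustrative, not the final schema.

import json
from dataclasses import dataclass, field, asdict

@dataclass(frozen=True)
class SessionCheckpoint:
    mission_id: str
    sequence: int                    # increments per checkpoint
    trigger: str                     # e.g. "task_completed", "plan_changed", "tool_result"
    conversation: list = field(default_factory=list)  # per-agent message history
    l1_memory: dict = field(default_factory=dict)     # snapshot of L1 Redis state

    def s3_key(self) -> str:
        # Zero-padded sequence keeps S3 listing in checkpoint order.
        return f"checkpoints/{self.mission_id}/{self.sequence:06d}.json"

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```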
Files to Change
orchestrator/core/models/orchestration.py
SessionCheckpoint dataclass, checkpoint_count on run
orchestrator/core/services/coordinator_service.py
Write checkpoint after each task completion
orchestrator/core/services/checkpoint_service.py (new)
S3 read/write, resume logic
orchestrator/api/missions.py
Add resume endpoint
Acceptance Criteria
Checkpoint written to S3 after each verified task
Resume from checkpoint restores conversation context
Checkpoint includes L1 memory snapshot
`GET /api/missions/{id}/checkpoints` lists available checkpoints
7.2 Pattern #3: Tool Manifest Snapshots
Problem
Tool definitions come from Composio sync and the Adapter catalog. These change over time (Composio updates tool schemas, Adapter adds new tools). If an agent behaved differently yesterday, there's no way to know if the tool definitions changed.
Design
After each Composio sync or Adapter catalog refresh, write a versioned manifest:
Diff endpoint: GET /api/tools/manifest/diff?from=2026-03-30&to=2026-03-31
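The diff the endpoint returns could be computed as below, assuming a manifest is a mapping of tool name to schema dict (the manifest shape is an assumption):

```python
# Sketch of the manifest diff behind GET /api/tools/manifest/diff: tools
# added, removed, or schema-changed between two snapshots. The manifest
# shape (name -> schema dict) is assumed.

def diff_manifests(old: dict, new: dict) -> dict:
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(
            name for name in set(old) & set(new) if old[name] != new[name]
        ),
    }

old = {"slack_send": {"params": ["channel", "text"]},
       "gh_issue": {"params": ["title"]}}
new = {"slack_send": {"params": ["channel", "text", "thread_ts"]},
       "sf_query": {"params": ["soql"]}}
```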
Files to Change
orchestrator/core/services/tool_service.py
snapshot_tool_manifest() after sync
orchestrator/api/tools.py
Manifest list + diff endpoints
7.3 Pattern #9: PRD Parity Audit
Problem
PRDs define what should exist. There's no automated check of feature completeness. Tracking is manual.
Design
A CLI script that parses PRD files and checks for implementation markers:
Run as: python scripts/prd_parity.py docs/PRDS/ --output parity_report.json
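The core of the checker could look like this sketch, under the assumption that "implementation markers" are the file paths a PRD's Files to Change sections mention. The regex and helper names are illustrative:

```python
# Sketch of scripts/prd_parity.py's core: pull file paths out of a PRD and
# check they exist on disk. The path regex and marker convention are
# assumptions about what counts as an implementation marker.

import re
from pathlib import Path

FILE_PATTERN = re.compile(r"(orchestrator/[\w/]+\.py|scripts/[\w/]+\.py)")

def extract_paths(prd_text: str) -> list:
    """Unique, sorted file paths mentioned in a PRD."""
    return sorted(set(FILE_PATTERN.findall(prd_text)))

def check_prd(prd_text: str, repo_root: Path) -> dict:
    """Map each mentioned path to whether it exists under repo_root."""
    return {p: (repo_root / p).exists() for p in extract_paths(prd_text)}
```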
Files to Change
scripts/prd_parity.py (new)
Parser + checker
CI pipeline
Add parity check step (optional)
7.4 Pattern #12: Tool Execution Cost Tracking
Problem
LLMUsage tracks LLM call costs perfectly. But tool executions (Composio API calls, Adapter REST calls) have no cost attribution. A mission might cost $2.40 in LLM but trigger 50 Salesforce API calls with their own rate-limit implications.
Design
Extend tool_usage_logs with cost data:
Map costs from provider pricing (Composio reports usage per tool).
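The extended log row and a per-mission rollup could be sketched as follows; the field names are illustrative stand-ins for the new `tool_usage_logs` columns:

```python
# Sketch of the cost fields added to tool_usage_logs plus a per-mission
# aggregation. Field names are illustrative; cost_usd comes from provider
# pricing (Composio reports usage per tool) and is 0.0 when unknown.

from dataclasses import dataclass

@dataclass
class ToolUsageLog:
    mission_id: str
    tool_name: str
    provider: str          # "composio" or "adapter"
    cost_usd: float
    latency_ms: int

def mission_tool_cost(logs: list, mission_id: str) -> float:
    """Total tool-execution cost for one mission, to complement LLMUsage."""
    return round(sum(l.cost_usd for l in logs if l.mission_id == mission_id), 6)
```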
Files to Change
orchestrator/core/models/core.py
Add columns to tool_usage_logs
orchestrator/core/services/tool_service.py
Record cost + latency after execution
orchestrator/api/analytics.py
Include tool costs in cost aggregation
8. Implementation Phases Summary
> [!NOTE]
> Prompt patterns (E, B, F) are the fastest wins — they're content changes to existing files, not architectural changes. They ship as new PromptRegistry versions (PRD-58) and can be A/B tested via FutureAGI (PRD-29) before activation. Roll back with one click if metrics regress.
9. Testing Strategy
Unit Tests
| Pattern | Tests |
| --- | --- |
| #H | `test_platform_skill_organized_by_user_intent()`, `test_platform_skill_mentions_missions()`, `test_platform_skill_mentions_mission_zero()` |
| #E | `test_memory_framework_in_system_prompt()`, `test_platform_store_memory_description_includes_anti_patterns()` |
| #B | `test_tool_guidance_includes_behavioral_contracts()`, `test_platform_actions_has_preamble()` |
| #F | `test_anti_patterns_in_chatbot_identity()`, `test_anti_patterns_not_in_task_execution()` |
| #I | `test_onboarding_section_injected_for_new_workspace()`, `test_onboarding_section_empty_for_existing_workspace()`, `test_mission_zero_prompt_includes_discovery_questions()` |
| #5 | `test_permission_denial_created_on_missing_tool()`, `test_coordinator_reassigns_on_denial()` |
| #6 | `test_budget_exhausted_stop_reason()`, `test_max_retries_stop_reason()`, `test_completed_stop_reason()` |
| #2 | `test_core_boots_without_extensions()`, `test_failed_extension_doesnt_crash_startup()` |
| #10 | `test_bootstrap_report_captures_all_stages()`, `test_failed_stage_recorded()` |
| #4 | `test_system_tools_always_available()`, `test_marketplace_requires_assignment()` |
| #11 | `test_agent_assigned_event_emitted()`, `test_memory_injected_event_emitted()` |
| #1 | `test_transition_returns_frozen_record()`, `test_invalid_transition_raises()` |
| #7 | `test_compaction_after_n_turns()`, `test_recent_turns_preserved()` |
Integration Tests
| Scenario | Patterns |
| --- | --- |
| Full mission lifecycle with budget limit | #5, #6, #1 |
| Startup with failed Composio sync | #2, #10 |
| 20-turn conversation with compaction | #7, #11 |
| Agent tool resolution with tier enforcement | #4, #5 |
| New workspace first message triggers Mission Zero flow | #H, #I |
| Mission Zero marketplace research + plan generation | #I |
| Mission Zero plan approval creates executable mission | #I |
Prompt Quality Tests (via FutureAGI)
| Test | Patterns | Method |
| --- | --- | --- |
| Memory storage quality: sample 50 new Mem0 entries, <20% ephemeral artifacts | #E | FutureAGI assess on live traffic after 1 week |
| No tool calls for greetings: send 10 greetings, verify 0 tool calls | #B, #F | Manual test + FutureAGI safety check |
| Response conciseness: compare avg response length before/after | #F | FutureAGI `is_concise` metric |
| Tool failure handling: trigger 5 known tool errors, verify plain-language response | #B | Manual test |
| New user feature engagement: test group uses 2+ more features in first 5 convos | #H | FutureAGI assess on new user cohort |
| Mission Zero completion: new workspace → approved plan → executed mission | #I | Manual E2E test with fresh workspace |
| Mission Zero marketplace matching: user says "I use Slack" → Slack integration proposed | #I | Manual test + assertion on marketplace tool calls |
10. Success Criteria
Smarter memory — Mem0 entries shift from ephemeral artifacts to curated facts; <20% garbage after 1 week (baseline: ~60%)
Better tool usage — no tool calls for greetings; tool failures produce plain-language responses; fewer redundant tool calls
Concise responses — measurable improvement in FutureAGI `is_concise` metric across chatbot conversations
Platform-aware Auto — Auto describes capabilities in terms of user goals, not API endpoints; new users understand what they can do in the first conversation
Zero-to-operational onboarding — a new user can go from empty workspace to fully configured (agents, integrations, skills, playbooks) in one Mission Zero conversation
Dynamic marketplace matching — Mission Zero proposals are built from live marketplace data, not hardcoded templates; user modifications ("swap X for Y") are handled without re-asking discovery questions
Zero silent failures — every permission denial, budget hit, and retry exhaustion is recorded as structured data
Startup resilience — platform runs in degraded mode if any extension fails; `/health/bootstrap` shows exactly what's broken
Debuggable orchestration — any mission failure can be explained from `stop_reason` + `permission_denials` without log diving
Real-time transparency — SSE stream includes 10+ event types covering the full agent execution lifecycle
Context quality — long conversations (20+ turns) maintain response quality via proactive compaction
No regressions — all existing tests pass, all existing SSE consumers remain backward compatible, FutureAGI `is_helpful` score does not decrease
11. Risks
Memory framework prompt is too long (800 tokens)
The identity section budget is 600 tokens, and chatbot mode has no limit. The memory framework goes in `self_learning`, which sits outside the `max_tokens` cap. Monitor total prompt size.
Anti-patterns make the agent too passive
Keep the list tight (6 items). Focus on "don't over-do" not "don't do." FutureAGI is_helpful metric catches regression.
Tool behavioral contracts conflict with specific skill instructions
Skills (Priority 4) override general tool guidance. Behavioral contracts are defaults; skill-specific instructions take precedence.
Prompt changes break existing personality modes
Deliver as new PromptRegistry versions. Old versions remain archived. One-click rollback via admin UI.
Frozen dataclasses add verbosity
Only freeze state transitions, not ORM models. Minimal overhead.
Proactive compaction loses important context
Keep last 4 turns verbatim. Summary preserves key facts. 80% guard remains as safety net.
Too many SSE events overwhelm frontend
Frontend consumes incrementally. New events are additive. Existing consumers unchanged.
Trust gate causes "degraded mode" confusion
Clear /health reporting. Log warnings. Dashboard shows extension status.
Backfill of tool tiers misclassifies tools
Manual review of backfill SQL. Default to marketplace (most restrictive).
Mission Zero prompt too large for context
OnboardingSection has max_tokens=800. Prompt is ~700 tokens. Only injected for empty workspaces — zero cost for existing users.
Mission Zero marketplace results change between proposal and execution
Cache marketplace results for the session. If items are removed between proposal and execution, Auto reports what couldn't be installed and suggests alternatives.
User abandons Mission Zero mid-flow
No harm — nothing is created until explicit approval. Auto stores partial business context in memory for future attempts.
Mission Zero over-provisions (too many agents for a solo user)
Discovery question about team size calibrates the proposal. Solo users get 3-5 agents; teams get more. Budget estimate is shown upfront.
12. Open Questions
Should `PermissionDenial` records be persisted in their own table or only as `OrchestrationEvent` entries?
What's the right `PROACTIVE_COMPACT_AFTER_TURNS` value? 8 is a starting point — needs tuning per model.
Should tool manifest snapshots be per-workspace or global?
Do we need a "resume from degraded mode" mechanism when a failed extension recovers?
Should the memory decision framework differ per personality mode? (e.g., `professional` mode might store more formal decisions, `friendly` might store more personal context)
Should anti-patterns also apply to `TASK_EXECUTION` mode agents, or only `CHATBOT`? Task agents might need different anti-patterns (e.g., "don't skip verification steps").
How long before we measure memory quality improvement? 1 week proposed, but may need 2 weeks for statistical significance.
Should Mission Zero be re-runnable? ("I want to add a support department" on an existing workspace) — currently designed to also trigger on explicit "set up my workspace" command.
Should Mission Zero proposals be saved as blueprints so users can share their setup templates with others?
Should the OnboardingSection inject for workspaces with agents but no integrations (partially configured)? Could use a readiness score instead of a simple agent_count check.
Should Mission Zero have a "quick start" mode (skip questions, use sensible defaults for common business types) vs the full discovery flow?
13. References
Research: Claude Code harness pattern analysis (`research_claude_code_patterns.md` in project memory)
Source repo: `instructkr/claude-code` (Python port of the Claude Code harness)
Mission Zero Research: `docs/PRDS/Research/MISSION-ZERO/` — 8 files documenting the Mission Zero concept, 14-agent roster, governance assessment, platform capabilities inventory, and readiness verdicts
Related PRDs: 82A (Coordinator), 79 (Memory), 35 (Tools), 55 (Channels), 06 (Dashboard), 77 (Scheduled Tasks)
Prompt Management: PRD-58 (PromptRegistry, versioning, seeding), PRD-29 (FutureAGI observability)
Existing Architecture: `ContextService` (modules/context/service.py), `SECTION_REGISTRY` (modules/context/sections/__init__.py), `AutomatosPersonality` (consumers/chatbot/personality.py)
Workspace Onboarding Signal: api/workspaces.py — `is_new_workspace: true` when `agent_count == 0`
Marketplace Tools: actions_marketplace.py — `platform_browse_marketplace_agents`, `platform_browse_marketplace_skills`, `platform_browse_marketplace_plugins`