Complexity Assessment (AutoBrain)

Purpose & Scope

AutoBrain is the progressive complexity assessor that receives every incoming chat message and determines the computational depth required to respond. It implements PRD-68's Progressive Complexity Routing, classifying requests on a five-level scale from simple greetings (ATOM) to enterprise-scale multi-agent pipelines (ORGANISM).

The assessor's output determines three critical downstream behaviors:

  1. Routing decision — whether to respond directly, delegate to a specialized agent, or trigger a workflow

  2. Tool availability — which tools to load (if any) to avoid overwhelming the LLM

  3. Memory retrieval — whether to fetch conversation context from Mem0

For universal routing logic (agent selection after AutoBrain delegates), see Universal Router. For memory integration details, see Memory Integration.

Sources: orchestrator/consumers/chatbot/auto.py:1-394


Complexity Levels

AutoBrain classifies requests into five discrete complexity levels, each representing an order-of-magnitude increase in computational requirements:

| Level | Name | Description | Token Budget | Example |
|-------|------|-------------|--------------|---------|
| ATOM | Simple | Greetings, chitchat, factual queries | <200 tokens | "hi", "thanks", "what can you do" |
| MOLECULE | Single Tool | Needs one tool or specific agent skill | ~1K tokens | "send email", "check Jira", "search docs" |
| CELL | Memory + Tools | Requires conversation context + tools + reasoning | ~3K tokens | "reply to that email we discussed" |
| ORGAN | Multi-Agent | Needs coordination between multiple agents | ~6K tokens | "research bug, plan fix, open PR" |
| ORGANISM | Enterprise Pipeline | Full Neural Swarm with learning + feedback | ~12K tokens | "refactor auth across all services" |

The Complexity enum defines these levels:
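The enum itself is not reproduced in this page; a minimal sketch consistent with the five levels above (member values are assumptions, not taken from auto.py) would be:

```python
from enum import Enum

class Complexity(str, Enum):
    """Five discrete complexity levels, in increasing order of cost."""
    ATOM = "atom"            # greetings, chitchat
    MOLECULE = "molecule"    # single tool / agent skill
    CELL = "cell"            # memory + tools + reasoning
    ORGAN = "organ"          # multi-agent coordination
    ORGANISM = "organism"    # full Neural Swarm pipeline
```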

Sources: orchestrator/consumers/chatbot/auto.py:41-48


Three-Tier Assessment Strategy

AutoBrain uses a three-tier cascade with strict latency and cost targets:

Diagram: AutoBrain's three-tier assessment cascade optimizes for latency and cost

Sources: orchestrator/consumers/chatbot/auto.py:152-199


Tier 1: Redis Cache Lookup

The first tier performs a cache lookup using the SHA-256 hash of the normalized message text. This provides instant (<5ms) responses for repeated queries at zero LLM cost.

Cache entries include the full ComplexityAssessment structure with TTL configured by COMPLEXITY_CACHE_TTL_HOURS (default: 24 hours).

Key characteristics:

  • Latency: <5ms (Redis GET)

  • Cost: $0

  • Hit rate: ~40-60% after warm-up (varies by workspace)

  • Isolation: Workspace-scoped keys prevent cross-tenant leakage
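Under those constraints, the key derivation can be sketched as follows (the `complexity:` key prefix and the exact normalization are assumptions, not taken from auto.py):

```python
import hashlib

def cache_key(workspace_id: str, message: str) -> str:
    # Normalize whitespace and case so trivial variations hit the same entry.
    normalized = " ".join(message.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # Workspace-scoped prefix keeps tenants isolated (key layout assumed).
    return f"complexity:{workspace_id}:{digest}"
```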

Sources: orchestrator/consumers/chatbot/auto.py:335-370


Tier 2: Regex Fast Paths

When the cache misses, Tier 2 applies hand-coded regex patterns for common message types. These patterns are deliberately strict: they must match the entire message (with optional punctuation) to prevent false positives like "hello can you create an image" matching the greeting pattern.

ATOM Pattern Matching

The ATOM detector uses whole-message anchoring to identify pure chitchat:
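As an illustration of whole-message anchoring (the phrase list here is hypothetical, not the production pattern):

```python
import re

# The ^...$ anchors force the entire message to match, so a message that
# merely starts with a greeting falls through to the next tier.
ATOM_PATTERN = re.compile(
    r"^(hi|hello|hey|thanks|thank you|bye|goodbye)[!.? ]*$",
    re.IGNORECASE,
)

def is_atom(message: str) -> bool:
    return ATOM_PATTERN.match(message.strip()) is not None
```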

Accepts: "hello", "thanks!", "bye"

Rejects: "hello can you help me", "thanks for the report" (continues to Tier 3)

Platform Query Detection

Platform self-awareness queries (PRD-64) are detected via keyword matching:
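A minimal sketch of keyword-based detection; apart from platform_list_agents (which appears later in this page), the keyword-to-tool map is hypothetical:

```python
# Hypothetical keyword -> tool-hint map for platform self-awareness queries.
PLATFORM_KEYWORDS = {
    "agents": "platform_list_agents",
    "workflows": "platform_list_workflows",
    "tools": "platform_list_tools",
}

def detect_platform_query(message: str) -> list[str]:
    text = message.lower()
    return [tool for kw, tool in PLATFORM_KEYWORDS.items() if kw in text]
```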

These return MOLECULE complexity with matched tool hints.

Memory Recall Detection

Explicit memory references trigger CELL complexity with needs_memory=True:
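A sketch of the kind of search pattern involved (the phrases are illustrative, not the production list):

```python
import re

# Unlike the ATOM detector, this searches anywhere in the message,
# since memory references are usually embedded in a larger request.
MEMORY_PATTERN = re.compile(
    r"\b(we (discussed|talked about)|remember when|as (i|we) mentioned|earlier you said)\b",
    re.IGNORECASE,
)

def needs_memory_recall(message: str) -> bool:
    return MEMORY_PATTERN.search(message) is not None
```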

Key characteristics:

  • Latency: <5ms (compiled regex matching)

  • Cost: $0

  • Coverage: ~30-40% of messages after cache misses

  • Precision: Very high (strict anchoring prevents false positives)

Sources: orchestrator/consumers/chatbot/auto.py:86-139, orchestrator/consumers/chatbot/auto.py:205-235


Tier 3: LLM Classification

When both cache and regex patterns fail, AutoBrain invokes an LLM classifier to assess complexity. This tier is the most expensive (~$0.001 per call) but handles the long tail of nuanced requests.

The classifier prompt includes:

  • Available agents in the workspace (for context)

  • Conversation turn count

  • Five complexity levels with clear definitions

  • Expected JSON response structure
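The prompt text itself is not reproduced here; a plausible shape for the JSON the classifier is asked to return (field names assumed from the ComplexityAssessment fields documented later on this page) is:

```python
import json

# Illustrative classifier reply; field names are assumptions.
example_response = json.loads("""
{
  "complexity": "CELL",
  "action": "DELEGATE",
  "tool_hints": ["email"],
  "needs_memory": true,
  "reasoning": "References a prior email, so conversation context is required."
}
""")
```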

The LLM is configured via the complexity_assessor service name, allowing workspace-level model overrides through the SystemSetting table.

Key characteristics:

  • Latency: ~200ms (model-dependent)

  • Cost: ~$0.001/call (GPT-4-mini equivalent)

  • Fallback: On LLM failure, returns MOLECULE / DELEGATE (current behavior)

  • Results cached: Tier 3 results are written to Redis for future Tier 1 hits

Sources: orchestrator/consumers/chatbot/auto.py:241-308


Action Types

AutoBrain maps complexity levels to three action types that control downstream execution:

Diagram: Complexity levels map to action types that control execution flow

RESPOND Action

Used for ATOM complexity (greetings, chitchat). AutoBrain responds directly using:

  • Orchestrator's LLM configuration (not the agent's model)

  • No Composio tool loading (skip_composio=True)

  • No Universal Router invocation

  • Optional memory retrieval (usually skipped)

This bypasses the entire routing and tool discovery pipeline for maximum performance.

DELEGATE Action

Used for MOLECULE and CELL complexity (single-agent tasks). Invokes:

  • Universal Router for agent selection (unless explicit agentId provided)

  • Full tool discovery (Composio apps, platform tools, workspace tools)

  • Memory retrieval when needs_memory=True

  • Standard tool calling loop (up to 10 iterations)

WORKFLOW Action

Used for ORGAN and ORGANISM complexity (multi-agent coordination). Triggers:

  • Transient workflow creation from the user message

  • Full PRD-59 Neural Swarm pipeline (PLAN → PREPARE → EXECUTE → EVALUATE → LEARN)

  • Stage-by-stage streaming back to chat via AI SDK format

  • Workflow results saved as assistant message

Sources: orchestrator/consumers/chatbot/auto.py:50-55, orchestrator/api/chat.py:448-566


Integration with Chat Flow

AutoBrain executes before routing in the chat pipeline, providing guidance to all downstream components:

Diagram: AutoBrain assessment occurs before routing and influences all downstream decisions

The assessment result is attached to response headers for observability:
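The exact header names are not shown in this page; a hypothetical example of what such headers could look like (names assumed, not taken from chat.py):

```python
# Hypothetical observability headers attached to the chat response.
assessment_headers = {
    "X-Complexity-Level": "molecule",   # ATOM..ORGANISM
    "X-Complexity-Action": "delegate",  # respond | delegate | workflow
    "X-Complexity-Tier": "regex",       # cache | regex | llm
}
```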

Sources: orchestrator/api/chat.py:438-526


ComplexityAssessment Data Structure

The ComplexityAssessment dataclass carries assessment results through the chat pipeline:
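The dataclass is not reproduced here; a sketch consistent with the fields documented below (types and defaults are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class ComplexityAssessment:
    complexity: str                                    # one of ATOM..ORGANISM
    action: str                                        # RESPOND | DELEGATE | WORKFLOW
    tool_hints: list[str] = field(default_factory=list)     # domain keywords for SmartToolRouter
    needs_memory: bool = False                               # controls Mem0 retrieval
    matched_tools: list[str] = field(default_factory=list)   # from Tier 2 platform detection
```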

Key Fields

tool_hints: Short domain keywords (e.g., ["email", "github", "code"]) passed to SmartToolRouter to pre-filter tools. This replaces the old regex-based intent classification with LLM-driven hints.

needs_memory: Boolean flag controlling Mem0 retrieval. Set to False for ATOM/MOLECULE to avoid unnecessary latency.

matched_tools: Populated by Tier 2 platform query detection (e.g., ["platform_list_agents"]). These are surfaced directly as tool hints.

Sources: orchestrator/consumers/chatbot/auto.py:58-82


Integration with SmartOrchestrator

The ComplexityAssessment overrides intent-based decisions in the orchestrator: needs_memory forces memory retrieval regardless of the intent classifier's verdict, and tool_hints enable tools even when intent classification says no.
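The combined override logic can be sketched as follows (function and field names are assumptions):

```python
from types import SimpleNamespace

def merge_decisions(assessment, intent: dict) -> dict:
    # The AutoBrain assessment can force memory retrieval and tool loading
    # even when the intent classifier inside SmartOrchestrator declined both.
    return {
        "use_memory": assessment.needs_memory or intent.get("use_memory", False),
        "use_tools": bool(assessment.tool_hints) or intent.get("use_tools", False),
    }
```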

This two-stage classification (AutoBrain → IntentClassifier) provides defense-in-depth:

  • AutoBrain determines whether to delegate at all

  • IntentClassifier (inside SmartOrchestrator) refines tool/memory selection

Sources: orchestrator/consumers/chatbot/smart_orchestrator.py:159-198


Configuration & Settings

AutoBrain's behavior is controlled via environment variables and system settings:

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| COMPLEXITY_CACHE_TTL_HOURS | int | 24 | Redis cache TTL for assessment results |
| LLM_MODEL (service: complexity_assessor) | str | gpt-4o-mini | Model for Tier 3 classification |
| LLM_TEMPERATURE | float | 0.3 | Temperature for Tier 3 (low for consistency) |

The LLM configuration uses the complexity_assessor service name, allowing per-workspace overrides via the SystemSetting table (category: COMPLEXITY_ASSESSOR).

Sources: orchestrator/consumers/chatbot/auto.py:362-364, orchestrator/core/models/system_settings.py:34


Performance Characteristics

AutoBrain's tiered architecture delivers sub-10ms latency for 70-80% of requests:

Diagram: AutoBrain's performance profile across the three tiers

Latency Targets

  • P50 (cache hit): <5ms

  • P95 (regex fallback): <10ms

  • P99 (LLM fallback): <300ms

Cost Analysis

Assuming 10,000 chat messages/day:

  • Tier 1 (5,000 cache hits): $0

  • Tier 2 (2,500 regex matches): $0

  • Tier 3 (2,500 LLM calls): ~$2.50/day

Total cost: ~$75/month for complexity assessment at 10K msg/day.
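The arithmetic behind this estimate:

```python
# Sanity check of the daily and monthly cost figures quoted above.
llm_calls_per_day = 2500      # Tier 3 calls after cache and regex hits
cost_per_call = 0.001         # ~$0.001 per GPT-4-mini-class call
daily_cost = llm_calls_per_day * cost_per_call   # $2.50/day
monthly_cost = daily_cost * 30                    # ~$75/month
```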

Cache Warm-up

The cache requires ~100-200 messages per workspace to reach steady-state hit rates. Cold-start workspaces see higher LLM usage initially, then stabilize as common patterns are cached.

Sources: orchestrator/consumers/chatbot/auto.py:152-199


Workflow Bridge (ORGAN/ORGANISM)

When AutoBrain detects ORGAN or ORGANISM complexity, the chat endpoint invokes the Workflow Bridge to execute multi-agent coordination:

Diagram: Workflow Bridge converts ORGAN/ORGANISM chat messages into PRD-59 Neural Swarm pipelines

The transient workflow is tagged chat_generated so users can find and re-run it from the workflows UI.

Sources: orchestrator/api/chat.py:70-197


Observability & Debugging

AutoBrain emits structured logs at each tier for debugging.

Assessment results are also included in HTTP response headers, as described under Integration with Chat Flow.

For live traffic analysis, query the routing_decisions table (which also includes AutoBrain assessments when available).

Sources: orchestrator/api/chat.py:519-526

