Routing Architecture
Purpose and Scope
This document describes the Universal Router system that intelligently routes incoming requests to appropriate agents or workflows within the Automatos AI platform. The router implements a four-tier cascading strategy to minimize latency and LLM costs while maximizing routing accuracy.
For information about agent execution after routing, see Agent Lifecycle & Status. For workflow execution, see Recipe Execution. For the chat interface that uses routing, see Streaming Chat Service.
Architecture Overview
The Universal Router (UniversalRouter) is a core orchestration component that resolves a RequestEnvelope into a RoutingDecision using a four-tier cascading strategy (with Tier 2 split into sub-tiers). Each tier attempts to route the request using progressively more expensive but more flexible methods.
Design Philosophy:
Performance optimization: Minimize LLM calls through aggressive caching and pre-filtering
Cost optimization: 95%+ of requests never reach the LLM tier
Latency optimization: Redis cache provides <1ms lookups, semantic similarity <20ms
Accuracy: Semantic matching + LLM fallback ensures high-quality routing
Transparency: Every decision is logged with reasoning and confidence
Routing Tiers:
| Tier | Method | Latency | Cost | Confidence |
|------|--------|---------|------|------------|
| Tier 0 | User overrides | <1ms | Zero | 1.0 |
| Tier 1 | Cache lookup | 1-5ms | Zero | Varies |
| Tier 2a | Routing rules (source pattern) | 5-20ms | Zero | 0.9 |
| Tier 2b | Trigger subscriptions | 10-30ms | Zero | 0.95 |
| Tier 2.5 | Semantic similarity (agent embeddings) | 10-50ms | Zero | 0.0-1.0 |
| Tier 2c | Intent keyword matching | 5-15ms | Zero | 0.4-0.8 |
| Tier 3 | LLM classification | 500ms-3s | API cost | 0.0-1.0 |
Tier 2.5 Semantic Routing (PRD-64): The semantic tier uses pre-computed agent embeddings to find the best match via cosine similarity. High-confidence matches (≥0.85) route directly; ambiguous results are passed as candidate hints to Tier 3, dramatically reducing the search space for the LLM.
Confidence-Based Orchestration: When LLM confidence is below the threshold (ROUTING_LLM_CONFIDENCE_THRESHOLD, default 0.5), the router returns route_type="orchestrate" instead of direct agent routing, triggering full workflow decomposition for complex requests.
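The cascade described above can be outlined as follows. This is an illustrative sketch, not the engine.py implementation: only three tiers are stubbed in, and the rule, trigger, semantic, and intent tiers are omitted for brevity.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoutingDecision:
    route_type: str                 # "agent" | "workflow" | "orchestrate"
    agent_id: Optional[int] = None
    confidence: float = 0.0
    reasoning: str = ""

class CascadeSketch:
    """Illustrative tier cascade; the real UniversalRouter also runs
    routing rules, trigger subscriptions, and semantic/intent tiers
    between the cache and the LLM."""

    def _tier0_override(self, env: dict) -> Optional[RoutingDecision]:
        # Tier 0: an explicit override wins immediately with confidence 1.0.
        if env.get("override_agent_id"):
            return RoutingDecision("agent", env["override_agent_id"],
                                   1.0, "User override")
        return None

    async def _tier1_cache(self, env: dict) -> Optional[RoutingDecision]:
        # Tier 1 stub: always a cache miss; the real tier queries Redis.
        return None

    async def _classify_with_llm(self, env: dict) -> RoutingDecision:
        # Tier 3 stub: below-threshold confidence flips to "orchestrate".
        confidence = 0.3
        route_type = "agent" if confidence >= 0.5 else "orchestrate"
        return RoutingDecision(route_type, agent_id=7, confidence=confidence,
                               reasoning="LLM classification (stub)")

    async def route(self, env: dict) -> RoutingDecision:
        if decision := self._tier0_override(env):
            return decision
        if decision := await self._tier1_cache(env):
            return decision
        return await self._classify_with_llm(env)

router = CascadeSketch()
direct = asyncio.run(router.route({"override_agent_id": 42}))
fallback = asyncio.run(router.route({"content": "ambiguous request"}))
```

Each tier either returns a decision (short-circuiting the cascade) or None, letting control fall through to the next, more expensive tier.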
Sources: orchestrator/core/routing/engine.py:1-50, orchestrator/core/routing/engine.py:358-447
Core Components
The routing system consists of several key classes and data structures:
RequestEnvelope (core.models.routing.RequestEnvelope):
Input structure containing request content, source channel, workspace context
Fields: id, workspace_id, content, source, metadata, override_agent_id, override_workflow_id
Immutable after creation
RoutingDecision (core.models.routing.RoutingDecision):
Output structure containing routing target and confidence
Fields: route_type (agent/workflow/orchestrate), agent_id, workflow_id, confidence, reasoning, cached, intent_category
route_type="orchestrate" signals low confidence → full decomposition needed
UniversalRouter (core.routing.engine.UniversalRouter):
Main routing engine class
Constructor: __init__(db: Session, cache: Optional[RoutingCache])
Primary method: async route(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Tier methods: _tier0_override(), _tier1_cache(), _tier2a_rules(), _tier2b_trigger_subscription(), _tier2_5_semantic(), _tier2c_intent_classifier(), _classify_with_llm()
RoutingCache (core.routing.cache.RoutingCache):
Redis-backed cache for routing decisions
Cache key format: routing:{workspace_id}:{content_hash}:{source}
TTL configured via ROUTING_CACHE_TTL_HOURS (default 24 hours)
Supports correction tracking: record_correction() for user feedback
IntentClassifier (core.services.intent_classifier.IntentClassifier):
Keyword-based intent classification without LLM
Returns IntentClassification(category, confidence, matched_keywords)
Used by Tier 2c for lightweight classification
SemanticIndexer (core.routing.semantic_indexer):
Generates and stores agent embeddings in the Agent.semantic_embedding column
Functions: embed_workspace_agents(), find_similar_agents()
Thresholds: SIMILARITY_DIRECT_ROUTE=0.85, MAX_LLM_CANDIDATES=5
Sources: orchestrator/core/routing/engine.py:16-73, orchestrator/core/models/routing.py, orchestrator/core/routing/semantic_indexer.py
Tier 0: User Overrides
The first and fastest tier checks for explicit routing instructions from the user. This allows callers to bypass automatic routing when they know exactly which agent or workflow should handle the request.
Implementation:
Method: _tier0_override(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Logic:
Check envelope.override_agent_id. If set: return RoutingDecision(route_type="agent", agent_id=..., confidence=1.0, reasoning="User override")
Check envelope.override_workflow_id. If set: return RoutingDecision(route_type="workflow", workflow_id=..., confidence=1.0, reasoning="User override")
If neither is set: return None (proceed to Tier 1)
Use Cases:
API callers that specify target agent explicitly
UI components with agent selection dropdowns
Debugging and testing specific agents
Forced routing in workflow steps
Characteristics:
Latency: <1ms (simple attribute checks)
Confidence: Always 1.0 (explicit user intent)
Cost: Zero (no external calls)
Sources: orchestrator/core/routing/engine.py:148-165
Tier 1: Cache Lookup
When no override is specified, the router checks the RoutingCache for a previously computed decision. This dramatically reduces latency and LLM costs for repeated requests.
Method: _tier1_cache(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Cache Key Format: routing:{workspace_id}:{content_hash}:{source}
Content Normalization: The router normalizes content before hashing to improve cache hit rates:
Lowercasing
Whitespace compression
Special character removal
Implemented in _normalize_content() (referenced by RoutingCache)
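The normalization and key construction can be sketched as follows. The key format is documented above; the specific hash algorithm and truncation length here are assumptions, not the actual _normalize_content() implementation.

```python
import hashlib
import re

def normalize_content(content: str) -> str:
    # Lowercase, strip special characters, and collapse whitespace so
    # near-identical queries hash to the same cache key.
    text = content.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def cache_key(workspace_id: str, content: str, source: str) -> str:
    # Documented format: routing:{workspace_id}:{content_hash}:{source}
    content_hash = hashlib.sha256(
        normalize_content(content).encode()).hexdigest()[:16]
    return f"routing:{workspace_id}:{content_hash}:{source}"

k1 = cache_key("ws-1", "Fix the  LOGIN bug!", "web_form")
k2 = cache_key("ws-1", "fix the login bug", "web_form")
```

Because both inputs normalize to the same text, k1 and k2 are identical, which is precisely what raises the cache hit rate.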
TTL Configuration: Cache entries expire after ROUTING_CACHE_TTL_HOURS (default: 24 hours). This balances:
Freshness: Agent capabilities may change
Hit rate: Most queries repeat within 24h
Memory: Prevents unbounded Redis growth
Population: Cache is populated in Tier 3 after successful LLM classification. Even low-confidence results are cached to avoid repeated LLM calls for the same request.
Characteristics:
Latency: 1-5ms (Redis roundtrip)
Hit rate: Varies by workload (typically 60-80% for stable workspaces)
Cost: Zero after initial classification
Sources: orchestrator/core/routing/engine.py:168-177, orchestrator/core/routing/cache.py, orchestrator/config.py:143
Tier 2: Rule-Based Routing
Tier 2 consists of four sub-tiers that check database tables, perform semantic similarity matching, and use lightweight classification without calling LLMs. This provides fast, deterministic routing for configured patterns.
Tier 2a: Routing Rules
The routing_rules table allows administrators to configure explicit routing patterns based on source channel and workspace.
Method: _tier2a_rules(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Database Schema:
Matching Logic:
If rule.source_pattern is NULL or empty → matches any source
Else: exact match against envelope.source.value (e.g., "jira_trigger", "web_form", "slack")
Priority: Rules are evaluated in descending priority order. First match wins. This allows broad catch-all rules (low priority) with specific overrides (high priority).
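The matching and priority logic above can be sketched as follows; the Rule dataclass is a simplified stand-in for the routing_rules columns, not the actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    priority: int
    source_pattern: Optional[str]   # None/empty matches any source
    agent_id: int

def match_rule(rules: list[Rule], source: str) -> Optional[Rule]:
    # Evaluate in descending priority order; first match wins, so a
    # specific high-priority rule overrides a broad catch-all.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if not rule.source_pattern or rule.source_pattern == source:
            return rule
    return None

rules = [
    Rule(priority=0, source_pattern=None, agent_id=1),             # catch-all
    Rule(priority=10, source_pattern="jira_trigger", agent_id=2),  # specific
]
```

With this list, a "jira_trigger" request hits the specific rule while every other source falls through to the catch-all.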
Characteristics:
Latency: 5-20ms (database query)
Confidence: 0.9 (high confidence in configured rules)
Flexibility: Admin-configurable without code changes
Sources: orchestrator/core/routing/engine.py:178-214, orchestrator/core/models/routing.py
Tier 2b: Trigger Subscriptions
For requests originating from Composio triggers (e.g., Jira webhooks), the router checks the trigger_subscriptions table for predefined agent/workflow assignments.
Method: _tier2b_trigger_subscription(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Database Flow:
Resolve workspace_id → ComposioEntity (Composio's per-workspace entity ID)
Query TriggerSubscription for that entity
Optionally match trigger_name from envelope.metadata for specific trigger routing
Use Case: When a Jira issue is created/updated, Composio webhook delivers it to Automatos. The trigger subscription maps it to a specific triage agent or bug workflow.
Characteristics:
Latency: 10-30ms (two database queries)
Confidence: 0.95 (explicit configuration)
Source-specific: Only applies to ChannelSource.JIRA_TRIGGER
Sources: orchestrator/core/routing/engine.py:216-278, orchestrator/core/models/composio.py
Tier 2c: Intent Classification
The IntentClassifier performs keyword-based intent detection and matches the detected intent against routing_rules.intent_keywords for fast classification without LLMs.
Method: _tier2c_intent_classifier(envelope: RequestEnvelope) -> Optional[RoutingDecision]
IntentClassifier Algorithm: The classifier scans request content for predefined keyword patterns:
"bug", "error", "crash" → category="bug_report"
"feature", "enhancement", "improvement" → category="feature_request"
"question", "how do I", "help" → category="support_question"
Confidence based on keyword count and strength
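A minimal sketch of this keyword-scoring approach is shown below. The keyword map mirrors the categories listed above, but the confidence formula (0.4 base, +0.2 per extra keyword, capped at 0.8) is an assumption chosen to match the documented 0.4-0.8 range, not the actual IntentClassifier implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword map mirroring the documented categories.
KEYWORDS = {
    "bug_report": ["bug", "error", "crash"],
    "feature_request": ["feature", "enhancement", "improvement"],
    "support_question": ["question", "how do i", "help"],
}

@dataclass
class IntentClassification:
    category: str
    confidence: float
    matched_keywords: list = field(default_factory=list)

def classify(content: str) -> IntentClassification:
    text = content.lower()
    best = IntentClassification("unknown", 0.0)
    for category, words in KEYWORDS.items():
        hits = [w for w in words if re.search(rf"\b{re.escape(w)}\b", text)]
        # Assumed formula: confidence grows with keyword count, capped at 0.8.
        confidence = round(min(0.4 + 0.2 * (len(hits) - 1), 0.8), 2) if hits else 0.0
        if confidence > best.confidence:
            best = IntentClassification(category, confidence, hits)
    return best

result = classify("I hit a bug and then an error appeared")
```

Two matched keywords push the confidence to 0.6 for bug_report; content with no matches stays at the "unknown" fallback.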
Routing Rule Integration: Admin configures routing_rules with intent_keywords array:
When IntentClassifier returns category="bug_report", this rule matches and routes to the Bug Triage Agent.
Characteristics:
Latency: 5-15ms (regex matching + DB query)
Confidence: Variable (from IntentClassifier, typically 0.5-0.8)
Cost: Zero (no LLM)
Accuracy: Good for common patterns, misses nuance
Sources: orchestrator/core/routing/engine.py:311-354, orchestrator/core/services/intent_classifier.py
Tier 2.5: Semantic Similarity Routing
Tier 2.5 uses pre-computed agent embeddings to find the best match via cosine similarity. This tier runs before Tier 2c (intent keywords) because semantic matching understands agent capabilities, while keyword matching is coarse and can be hijacked by overly broad rules.
Method: async _tier2_5_semantic(envelope: RequestEnvelope) -> tuple[Optional[RoutingDecision], list]
Algorithm:
1. Query Agents with Embeddings
If no agents have embeddings, the system logs a warning with diagnostic counts (total agents vs embedded agents) and returns (None, []).
2. Generate Query Embedding
3. Calculate Similarity Scores
For each agent, compute cosine similarity and apply boosts:
4. Evaluate Top Match
The highest-scoring agent is compared against SIMILARITY_DIRECT_ROUTE threshold (default: 0.85):
High Confidence (≥0.85):
Route directly to the top agent
Cache the decision for future Tier 1 hits
Return (decision, [])
Low Confidence (<0.85):
Return top N candidates (default: 5) as hints for Tier 3
The LLM will see a narrowed list, improving accuracy and reducing cost
Return (None, candidates)
5. Tier Integration
The router uses the candidates to optimize Tier 3:
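Steps 3 and 4 above can be sketched as follows. The thresholds are the documented defaults; the toy three-dimensional vectors and the omission of the boost logic are simplifications for illustration.

```python
import math

SIMILARITY_DIRECT_ROUTE = 0.85   # documented default
MAX_LLM_CANDIDATES = 5           # documented default

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def evaluate(query_vec: list[float], agent_vecs: dict):
    # Score every embedded agent, then either route directly or return
    # the top-N candidates as hints for the Tier 3 LLM.
    scored = sorted(((cosine(query_vec, vec), agent_id)
                     for agent_id, vec in agent_vecs.items()), reverse=True)
    top_score, top_agent = scored[0]
    if top_score >= SIMILARITY_DIRECT_ROUTE:
        return top_agent, []                      # high confidence: direct route
    return None, [a for _, a in scored[:MAX_LLM_CANDIDATES]]

agents = {
    "triage": [0.9, 0.1, 0.0],
    "docs":   [0.1, 0.9, 0.0],
}
decision, hints = evaluate([0.88, 0.12, 0.0], agents)      # near "triage"
ambiguous, candidates = evaluate([0.5, 0.5, 0.7], agents)  # no clear winner
```

The first query is close enough to the triage agent to route directly; the second falls below the threshold, so both agents are handed to Tier 3 as a narrowed candidate list.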
Characteristics:
Latency: 10-50ms (embedding generation + vector operations)
Confidence: 0.0-1.0 (from cosine similarity)
Cost: Zero (uses cached embeddings)
Accuracy: High for agents with detailed descriptions
Embedding Generation: Agent embeddings are generated/updated by the semantic indexer:
Triggered on agent create/update
Admin endpoint: POST /api/routing/semantic/reindex
Text hash stored to detect changes: Agent.semantic_text_hash
Embedding Source Text:
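The exact composition of the embedded text is not shown here; a plausible, purely hypothetical sketch is to concatenate the fields that describe the agent's capabilities so capability-oriented queries land near it in vector space.

```python
def build_embedding_text(name: str, description: str, apps: list[str]) -> str:
    # Hypothetical composition: name, description, and assigned apps
    # joined into one text that the indexer embeds and hashes.
    parts = [name, description]
    if apps:
        parts.append("Tools: " + ", ".join(apps))
    return "\n".join(p for p in parts if p)

text = build_embedding_text(
    "Bug Triage Agent",
    "Triages incoming bug reports by severity.",
    ["jira", "slack"],
)
```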
Sources: orchestrator/core/routing/engine.py:358-447, orchestrator/core/routing/semantic_indexer.py, orchestrator/core/math/vector_operations.py
Tier 3: LLM Classification
When all rule-based tiers fail to route the request, the router falls back to LLM-based classification. This tier uses the workspace's configured LLM to analyze the request and select the best agent based on agent descriptions and assigned tools.
Method: async _classify_with_llm(envelope: RequestEnvelope) -> Optional[RoutingDecision]
Step-by-Step Process:
1. Agent Query
Fetch all active agents in the workspace:
2. Build Agent Descriptions
For each agent, construct a description including:
agent_id (used in the LLM response)
name and description
apps (from the agent_app_assignments table) — crucial for tool-based routing
Example output:
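The original example block was lost in extraction; a plausible shape for the description entries, with hypothetical field values, is:

```python
# Hypothetical output of _build_agent_descriptions for two agents;
# the exact field names in the real payload may differ.
agent_descriptions = [
    {
        "agent_id": 12,
        "name": "Bug Triage Agent",
        "description": "Triages incoming bug reports by severity and component.",
        "apps": ["jira", "slack"],
    },
    {
        "agent_id": 15,
        "name": "Docs Agent",
        "description": "Answers questions from the product documentation.",
        "apps": ["notion"],
    },
]
```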
Method: _build_agent_descriptions(agents: List[Agent]) -> List[Dict]
3. Build LLM Prompt
The router constructs a classification prompt optimized for semantic candidates:
Semantic Hints: When Tier 2.5 provides candidate agents, they are injected into the prompt as hints:
This dramatically improves LLM accuracy by pre-filtering the search space.
Prompt Registry Integration: The prompt can be customized by admins via PromptRegistry (see page 11.1):
Method: _build_classification_prompt(content: str, agent_descriptions: List[Dict], semantic_candidates: Optional[List]) -> str
4. Call LLM
Use the workspace's configured LLM provider:
The LLM model is determined by system settings (see LLM Manager).
5. Parse Response
Extract agent_id and confidence from JSON response:
Validates that agent_id is in the workspace's active agent list.
Method: _parse_llm_routing_response(response_text: str, agents: List[Agent]) -> tuple[Optional[int], float]
6. Confidence Threshold Check
Compare confidence against ROUTING_LLM_CONFIDENCE_THRESHOLD (default: 0.5):
High Confidence (≥ 0.5):
Return RoutingDecision(route_type="agent", agent_id=..., confidence=...)
Direct routing to the selected agent
Low Confidence (< 0.5):
Return RoutingDecision(route_type="orchestrate", agent_id=..., confidence=...)
Triggers full workflow decomposition (see Confidence-Based Orchestration)
7. Cache Result
Both high and low confidence results are cached to avoid repeated LLM calls:
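Steps 5 and 6 can be sketched as follows. The function names mirror the documented methods and the threshold is the documented default, but the JSON parsing and validation details are assumptions.

```python
import json
from typing import Optional

ROUTING_LLM_CONFIDENCE_THRESHOLD = 0.5   # documented default

def parse_llm_routing_response(response_text: str,
                               valid_agent_ids: set[int]) -> tuple[Optional[int], float]:
    # Extract agent_id and confidence; reject malformed JSON and any
    # agent that is not active in the workspace.
    try:
        payload = json.loads(response_text)
        agent_id = int(payload["agent_id"])
        confidence = float(payload.get("confidence", 0.0))
    except (ValueError, KeyError, TypeError):
        return None, 0.0
    if agent_id not in valid_agent_ids:
        return None, 0.0
    return agent_id, confidence

def decide(agent_id: Optional[int], confidence: float) -> str:
    # Below the threshold the router orchestrates instead of routing directly.
    if agent_id is not None and confidence >= ROUTING_LLM_CONFIDENCE_THRESHOLD:
        return "agent"
    return "orchestrate"

aid, conf = parse_llm_routing_response('{"agent_id": 12, "confidence": 0.82}', {12, 15})
```

A well-formed response naming an active agent with confidence 0.82 routes directly; an unknown agent_id or unparseable response degrades to orchestration rather than failing.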
Characteristics:
Latency: 500ms - 3s (depends on LLM provider and model)
Confidence: Variable (0.0 - 1.0)
Cost: LLM API call (~500-2000 tokens)
Accuracy: High, especially when agent descriptions are detailed
Sources: orchestrator/core/routing/engine.py:328-433, orchestrator/config.py:144
Confidence-Based Orchestration
The router's confidence threshold (ROUTING_LLM_CONFIDENCE_THRESHOLD) determines whether a request should be routed directly to an agent or decomposed into a full orchestrated workflow.
Rationale: When the LLM is uncertain about which agent to route to (e.g., ambiguous request, multiple possible agents), forcing direct routing may lead to poor outcomes. Instead, the system triggers a multi-agent workflow that:
Clarifies the request with the user
Decomposes it into subtasks
Routes each subtask to specialized agents
Consolidates the results
Configuration: Set ROUTING_LLM_CONFIDENCE_THRESHOLD in environment variables:
Downstream Handling: The caller (e.g., StreamingChatService, RecipeExecutor) checks decision.route_type:
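A caller-side dispatch on the three documented route_type values might look like this; the handler strings are placeholders for the real execution paths.

```python
def handle(decision: dict) -> str:
    # Hypothetical caller-side dispatch on the documented route_type values.
    route_type = decision["route_type"]
    if route_type == "agent":
        return f"execute agent {decision['agent_id']}"
    if route_type == "workflow":
        return f"run workflow {decision['workflow_id']}"
    if route_type == "orchestrate":
        return "decompose into a multi-agent workflow"
    raise ValueError(f"unknown route_type: {route_type}")

r1 = handle({"route_type": "agent", "agent_id": 12})
r2 = handle({"route_type": "orchestrate"})
```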
Metrics: Track orchestration rate in analytics:
% of requests routed directly
% of requests orchestrated
Average confidence by source channel
Sources: orchestrator/core/routing/engine.py:386-410, orchestrator/config.py:144
Request Flow
This diagram shows the complete routing flow from request ingestion to decision logging, mapping each step to specific code entities.
Request Headers: The router extracts workspace context from request headers:
X-Workspace-ID: Workspace UUID
Authorization: Clerk JWT or API key
Handled by get_request_context_hybrid dependency (see Authentication Flow).
Response Headers: The router injects routing metadata into response headers for debugging:
X-Routing-Agent-ID: Selected agent ID
X-Routing-Confidence: Confidence score (0.0-1.0)
X-Routing-Type: "agent", "workflow", or "orchestrate"
X-Routing-Reasoning: Human-readable routing explanation
X-Routing-Request-ID: Request envelope ID
Exposed via CORS configuration in orchestrator/main.py:444.
Sources: orchestrator/core/routing/engine.py:78-144, orchestrator/main.py:444
Decision Logging and Analytics
Every routing decision is persisted to the routing_decisions table for analytics, debugging, and optimization.
Database Schema
Unrouted Events
When all routing tiers fail, the request is stored in unrouted_events for later analysis:
Admin dashboards can query this table to identify:
Common request patterns that fail to route
Gaps in agent coverage
Misconfigured routing rules
Analytics Queries
Routing effectiveness:
Cache hit rate:
Agent routing distribution:
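The original queries were lost in extraction; two illustrative Postgres-style examples are sketched below. The column names (cached, created_at, route_type, confidence) are assumptions, since the actual routing_decisions schema lives in orchestrator/core/models/routing.py.

```sql
-- Cache hit rate over the last 24 hours (hypothetical column names)
SELECT AVG(CASE WHEN cached THEN 1.0 ELSE 0.0 END) AS cache_hit_rate
FROM routing_decisions
WHERE created_at > NOW() - INTERVAL '24 hours';

-- Agent routing distribution (hypothetical column names)
SELECT agent_id, COUNT(*) AS requests, AVG(confidence) AS avg_confidence
FROM routing_decisions
WHERE route_type = 'agent'
GROUP BY agent_id
ORDER BY requests DESC;
```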
Sources: orchestrator/core/routing/engine.py:536-585, orchestrator/core/models/routing.py
Configuration
The routing system is configured via environment variables and system settings.
Environment Variables
ROUTING_CACHE_TTL_HOURS
24
Cache entry TTL in hours. Balances freshness vs hit rate.
ROUTING_LLM_CONFIDENCE_THRESHOLD
0.5
Minimum confidence for direct routing. Lower = more orchestration.
COMPOSIO_WEBHOOK_SECRET
(required)
Secret for validating Composio trigger webhooks.
Example .env configuration:
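An illustrative configuration using the documented variable names and defaults (the secret value is a placeholder):

```
ROUTING_CACHE_TTL_HOURS=24
ROUTING_LLM_CONFIDENCE_THRESHOLD=0.5
COMPOSIO_WEBHOOK_SECRET=your-webhook-secret
```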
System Settings
The LLM used for Tier 3 classification is configured via system settings (see LLM Manager):
Provider: system_settings.orchestrator_llm.provider (e.g., "openai", "anthropic")
Model: system_settings.orchestrator_llm.model (e.g., "gpt-4-turbo-preview")
Example:
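A plausible shape for these settings, built from the field paths and example values listed above (the surrounding JSON structure is an assumption):

```json
{
  "orchestrator_llm": {
    "provider": "openai",
    "model": "gpt-4-turbo-preview"
  }
}
```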
Redis Configuration
Routing cache requires Redis (optional but recommended):
If Redis is unavailable:
Tier 1 (cache) is skipped
All requests fall through to Tier 2/3
Performance degrades but system remains functional
Workspace-Level Overrides
Workspaces cannot currently override global routing configuration. All routing parameters are platform-wide.
Future Enhancement: Add workspace_settings table to allow per-workspace:
Custom confidence thresholds
Preferred LLM models for routing
Cache TTL overrides
Sources: orchestrator/config.py:140-149, orchestrator/.env.example:37-41
Summary
The Universal Router provides intelligent, cost-effective request routing through a four-tier cascading strategy:
Tier 0 (User Overrides): <1ms, confidence 1.0, zero cost
Tier 1 (Cache): 1-5ms, varies, zero cost after initial classification
Tier 2 (Rules/Triggers/Semantic/Intent): 5-50ms, confidence 0.0-0.95, zero cost
Tier 3 (LLM): 500ms-3s, confidence 0.0-1.0, LLM API cost
Key Benefits:
Cost Optimization: 60-80% cache hit rate eliminates LLM costs
Latency Optimization: Most requests routed in <10ms
Accuracy: LLM fallback ensures all requests can be routed
Transparency: Every decision logged with reasoning
Related Systems:
Agent Lifecycle & Status - Agent execution after routing
Recipe Execution - Workflow execution for orchestrated requests
Streaming Chat Service - Chat interface that uses routing
LLM Manager - LLM configuration for Tier 3
Sources: orchestrator/core/routing/engine.py:1-586, orchestrator/main.py:79