Routing Architecture


Purpose and Scope

This document describes the Universal Router system that intelligently routes incoming requests to appropriate agents or workflows within the Automatos AI platform. The router implements a four-tier cascading strategy to minimize latency and LLM costs while maximizing routing accuracy.

For information about agent execution after routing, see Agent Lifecycle & Status. For workflow execution, see Recipe Execution. For the chat interface that uses routing, see Streaming Chat Service.


Architecture Overview

The Universal Router (UniversalRouter) is a core orchestration component that resolves a RequestEnvelope into a RoutingDecision using a four-tier cascading strategy, where Tier 2 is subdivided into several sub-tiers. Each tier attempts to route the request using progressively more expensive but more flexible methods.

Design Philosophy:

  • Performance optimization: Minimize LLM calls through aggressive caching and pre-filtering

  • Cost optimization: 95%+ of requests never reach the LLM tier

  • Latency optimization: Redis cache provides <1ms lookups, semantic similarity <20ms

  • Accuracy: Semantic matching + LLM fallback ensures high-quality routing

  • Transparency: Every decision is logged with reasoning and confidence

Routing Tiers:

| Tier | Method | Latency | Cost | Confidence |
|------|--------|---------|------|------------|
| Tier 0 | User overrides | <1ms | Zero | 1.0 |
| Tier 1 | Cache lookup | 1-5ms | Zero | Varies |
| Tier 2a | Routing rules (source pattern) | 5-20ms | Zero | 0.9 |
| Tier 2b | Trigger subscriptions | 10-30ms | Zero | 0.95 |
| Tier 2.5 | Semantic similarity (agent embeddings) | 10-50ms | Zero | 0.0-1.0 |
| Tier 2c | Intent keyword matching | 5-15ms | Zero | 0.4-0.8 |
| Tier 3 | LLM classification | 500ms-3s | API cost | 0.0-1.0 |

Tier 2.5 Semantic Routing (PRD-64): The semantic tier uses pre-computed agent embeddings to find the best match via cosine similarity. High-confidence matches (≥0.85) route directly; ambiguous results are passed as candidate hints to Tier 3, dramatically reducing the search space for the LLM.

Confidence-Based Orchestration: When LLM confidence is below the threshold (ROUTING_LLM_CONFIDENCE_THRESHOLD, default 0.5), the router returns route_type="orchestrate" instead of direct agent routing, triggering full workflow decomposition for complex requests.

Sources: orchestrator/core/routing/engine.py:1-50, orchestrator/core/routing/engine.py:358-447


Core Components

The routing system consists of several key classes and data structures:


RequestEnvelope (core.models.routing.RequestEnvelope):

  • Input structure containing request content, source channel, workspace context

  • Fields: id, workspace_id, content, source, metadata, override_agent_id, override_workflow_id

  • Immutable after creation

RoutingDecision (core.models.routing.RoutingDecision):

  • Output structure containing routing target and confidence

  • Fields: route_type (agent/workflow/orchestrate), agent_id, workflow_id, confidence, reasoning, cached, intent_category

  • route_type="orchestrate" signals low confidence → full decomposition needed

UniversalRouter (core.routing.engine.UniversalRouter):

  • Main routing engine class

  • Constructor: __init__(db: Session, cache: Optional[RoutingCache])

  • Primary method: async route(envelope: RequestEnvelope) -> Optional[RoutingDecision]

  • Tier methods: _tier0_override(), _tier1_cache(), _tier2a_rules(), _tier2b_trigger_subscription(), _tier2_5_semantic(), _tier2c_intent_classifier(), _classify_with_llm()

RoutingCache (core.routing.cache.RoutingCache):

  • Redis-backed cache for routing decisions

  • Cache key format: routing:{workspace_id}:{content_hash}:{source}

  • TTL configured via ROUTING_CACHE_TTL_HOURS (default 24 hours)

  • Supports correction tracking: record_correction() for user feedback

IntentClassifier (core.services.intent_classifier.IntentClassifier):

  • Keyword-based intent classification without LLM

  • Returns IntentClassification(category, confidence, matched_keywords)

  • Used by Tier 2c for lightweight classification

SemanticIndexer (core.routing.semantic_indexer):

  • Generates and stores agent embeddings in Agent.semantic_embedding column

  • Functions: embed_workspace_agents(), find_similar_agents()

  • Thresholds: SIMILARITY_DIRECT_ROUTE=0.85, MAX_LLM_CANDIDATES=5

Sources: orchestrator/core/routing/engine.py:16-73, orchestrator/core/models/routing.py, orchestrator/core/routing/semantic_indexer.py


Tier 0: User Overrides

The first and fastest tier checks for explicit routing instructions from the user. This allows callers to bypass automatic routing when they know exactly which agent or workflow should handle the request.

Implementation:


Method: _tier0_override(envelope: RequestEnvelope) -> Optional[RoutingDecision]

Logic:

  1. Check envelope.override_agent_id

    • If set: return RoutingDecision(route_type="agent", agent_id=..., confidence=1.0, reasoning="User override")

  2. Check envelope.override_workflow_id

    • If set: return RoutingDecision(route_type="workflow", workflow_id=..., confidence=1.0, reasoning="User override")

  3. If neither set: return None (proceed to Tier 1)
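
The override check above can be sketched as follows. This is a minimal illustration, not the actual implementation: the dataclass is simplified from core.models.routing.RoutingDecision, and the envelope is treated as any object with the two override attributes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoutingDecision:
    route_type: str
    confidence: float
    reasoning: str
    agent_id: Optional[int] = None
    workflow_id: Optional[int] = None

def tier0_override(envelope) -> Optional[RoutingDecision]:
    # Explicit agent override wins first
    if getattr(envelope, "override_agent_id", None):
        return RoutingDecision("agent", 1.0, "User override",
                               agent_id=envelope.override_agent_id)
    # Then an explicit workflow override
    if getattr(envelope, "override_workflow_id", None):
        return RoutingDecision("workflow", 1.0, "User override",
                               workflow_id=envelope.override_workflow_id)
    return None  # fall through to Tier 1
```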

Use Cases:

  • API callers that specify target agent explicitly

  • UI components with agent selection dropdowns

  • Debugging and testing specific agents

  • Forced routing in workflow steps

Characteristics:

  • Latency: <1ms (simple attribute checks)

  • Confidence: Always 1.0 (explicit user intent)

  • Cost: Zero (no external calls)

Sources: orchestrator/core/routing/engine.py:148-165


Tier 1: Cache Lookup

When no override is specified, the router checks the RoutingCache for a previously computed decision. This dramatically reduces latency and LLM costs for repeated requests.


Method: _tier1_cache(envelope: RequestEnvelope) -> Optional[RoutingDecision]

Cache Key Format: routing:{workspace_id}:{content_hash}:{source}

Content Normalization: The router normalizes content before hashing to improve cache hit rates:

  • Lowercasing

  • Whitespace compression

  • Special character removal

Implemented in _normalize_content() (referenced by RoutingCache)
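
The normalization and key-construction steps can be sketched like this. The hash algorithm and truncation length are assumptions for illustration; the real _normalize_content() and key builder live in RoutingCache.

```python
import hashlib
import re

def normalize_content(text: str) -> str:
    # Lowercase, strip special characters, compress whitespace
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def cache_key(workspace_id: str, content: str, source: str) -> str:
    # Hash the normalized content so trivially different phrasings share a key
    content_hash = hashlib.sha256(
        normalize_content(content).encode()
    ).hexdigest()[:16]
    return f"routing:{workspace_id}:{content_hash}:{source}"
```

Because normalization runs before hashing, "Fix the BUG!" and "fix   the bug" produce the same cache key, which is what drives the hit rate up for repeated queries.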

TTL Configuration: Cache entries expire after ROUTING_CACHE_TTL_HOURS (default: 24 hours). This balances:

  • Freshness: Agent capabilities may change

  • Hit rate: Most queries repeat within 24h

  • Memory: Prevents unbounded Redis growth

Population: Cache is populated in Tier 3 after successful LLM classification. Even low-confidence results are cached to avoid repeated LLM calls for the same request.

Characteristics:

  • Latency: 1-5ms (Redis roundtrip)

  • Hit rate: Varies by workload (typically 60-80% for stable workspaces)

  • Cost: Zero after initial classification

Sources: orchestrator/core/routing/engine.py:168-177, orchestrator/core/routing/cache.py, orchestrator/config.py:143


Tier 2: Rule-Based Routing

Tier 2 consists of four sub-tiers that check database tables, perform semantic similarity matching, and use lightweight classification without calling LLMs. This provides fast, deterministic routing for configured patterns.

Tier 2a: Routing Rules

The routing_rules table allows administrators to configure explicit routing patterns based on source channel and workspace.


Method: _tier2a_rules(envelope: RequestEnvelope) -> Optional[RoutingDecision]

Database Schema:

Matching Logic:

  1. If rule.source_pattern is NULL or empty → matches any source

  2. Else: exact match against envelope.source.value (e.g., "jira_trigger", "web_form", "slack")

Priority: Rules are evaluated in descending priority order. First match wins. This allows broad catch-all rules (low priority) with specific overrides (high priority).
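
The priority-ordered matching logic can be sketched with in-memory rules (the real implementation queries the routing_rules table; dict fields here are simplified stand-ins for its columns):

```python
from typing import Optional

def match_rule(rules, source: str) -> Optional[dict]:
    # Rules are evaluated in descending priority order; first match wins.
    for rule in sorted(rules, key=lambda r: r["priority"], reverse=True):
        pattern = rule.get("source_pattern")
        # NULL/empty source_pattern matches any source
        if not pattern or pattern == source:
            return rule
    return None

rules = [
    {"priority": 10, "source_pattern": None, "agent_id": 1},             # broad catch-all
    {"priority": 100, "source_pattern": "jira_trigger", "agent_id": 2},  # specific override
]
```

A "jira_trigger" request hits the high-priority specific rule, while any other source falls through to the catch-all.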

Characteristics:

  • Latency: 5-20ms (database query)

  • Confidence: 0.9 (high confidence in configured rules)

  • Flexibility: Admin-configurable without code changes

Sources: orchestrator/core/routing/engine.py:178-214, orchestrator/core/models/routing.py


Tier 2b: Trigger Subscriptions

For requests originating from Composio triggers (e.g., Jira webhooks), the router checks the trigger_subscriptions table for predefined agent/workflow assignments.


Method: _tier2b_trigger_subscription(envelope: RequestEnvelope) -> Optional[RoutingDecision]

Database Flow:

  1. Resolve workspace_id → ComposioEntity (Composio's per-workspace entity ID)

  2. Query TriggerSubscription for that entity

  3. Optionally match trigger_name from envelope.metadata for specific trigger routing

Use Case: When a Jira issue is created/updated, Composio webhook delivers it to Automatos. The trigger subscription maps it to a specific triage agent or bug workflow.
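
The two-lookup flow above can be sketched with dicts standing in for the two database queries (the real code queries ComposioEntity and TriggerSubscription via SQLAlchemy; field names here are illustrative):

```python
def route_trigger(workspace_id, trigger_name, entities, subscriptions):
    # 1. Resolve workspace -> Composio entity ID
    entity_id = entities.get(workspace_id)
    if entity_id is None:
        return None
    # 2. Find a subscription for that entity, optionally narrowed by trigger name
    for sub in subscriptions.get(entity_id, []):
        if sub.get("trigger_name") in (None, trigger_name):
            return {"route_type": sub["route_type"],
                    "target_id": sub["target_id"],
                    "confidence": 0.95}
    return None
```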

Characteristics:

  • Latency: 10-30ms (two database queries)

  • Confidence: 0.95 (explicit configuration)

  • Source-specific: Only applies to ChannelSource.JIRA_TRIGGER

Sources: orchestrator/core/routing/engine.py:216-278, orchestrator/core/models/composio.py


Tier 2c: Intent Classification

The IntentClassifier performs keyword-based intent detection and matches the detected intent against routing_rules.intent_keywords for fast classification without LLMs.


Method: _tier2c_intent_classifier(envelope: RequestEnvelope) -> Optional[RoutingDecision]

IntentClassifier Algorithm: The classifier scans request content for predefined keyword patterns:

  • "bug", "error", "crash" → category="bug_report"

  • "feature", "enhancement", "improvement" → category="feature_request"

  • "question", "how do I", "help" → category="support_question"

  • Confidence based on keyword count and strength
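
A minimal sketch of this keyword scan, assuming a simple count-based confidence formula (the real IntentClassifier's scoring weights are not shown in this document):

```python
KEYWORD_CATEGORIES = {
    "bug_report": ["bug", "error", "crash"],
    "feature_request": ["feature", "enhancement", "improvement"],
    "support_question": ["question", "how do i", "help"],
}

def classify_intent(content: str):
    text = content.lower()
    best = (None, 0.0, [])
    for category, keywords in KEYWORD_CATEGORIES.items():
        matched = [k for k in keywords if k in text]
        if matched:
            # Confidence grows with keyword count, capped at the documented 0.8
            confidence = min(0.4 + 0.2 * (len(matched) - 1), 0.8)
            if confidence > best[1]:
                best = (category, confidence, matched)
    return best  # (category, confidence, matched_keywords)
```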

Routing Rule Integration: Admin configures routing_rules with intent_keywords array:

When IntentClassifier returns category="bug_report", this rule matches and routes to the Bug Triage Agent.

Characteristics:

  • Latency: 5-15ms (regex matching + DB query)

  • Confidence: Variable (from IntentClassifier, typically 0.5-0.8)

  • Cost: Zero (no LLM)

  • Accuracy: Good for common patterns, misses nuance

Sources: orchestrator/core/routing/engine.py:311-354, orchestrator/core/services/intent_classifier.py


Tier 2.5: Semantic Similarity Routing

Tier 2.5 uses pre-computed agent embeddings to find the best match via cosine similarity. This tier runs before Tier 2c (intent keywords) because semantic matching understands agent capabilities, while keyword matching is coarse and can be hijacked by overly broad rules.


Method: async _tier2_5_semantic(envelope: RequestEnvelope) -> tuple[Optional[RoutingDecision], list]

Algorithm:

1. Query Agents with Embeddings

If no agents have embeddings, the system logs a warning with diagnostic counts (total agents vs embedded agents) and returns (None, []).

2. Generate Query Embedding

3. Calculate Similarity Scores

For each agent, compute cosine similarity and apply boosts:

4. Evaluate Top Match

The highest-scoring agent is compared against SIMILARITY_DIRECT_ROUTE threshold (default: 0.85):

High Confidence (≥0.85):

  • Route directly to the top agent

  • Cache the decision for future Tier 1 hits

  • Return (decision, [])

Low Confidence (<0.85):

  • Return top N candidates (default: 5) as hints for Tier 3

  • The LLM will see a narrowed list, improving accuracy and reducing cost

  • Return (None, candidates)
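
The score-and-threshold step can be sketched as follows, using plain-Python cosine similarity (the production path uses orchestrator/core/math/vector_operations.py, and the boost logic is omitted here):

```python
import math

SIMILARITY_DIRECT_ROUTE = 0.85
MAX_LLM_CANDIDATES = 5

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def evaluate_matches(query_vec, agents):
    # agents: list of (agent_id, embedding) pairs with pre-computed embeddings
    scored = sorted(((aid, cosine(query_vec, emb)) for aid, emb in agents),
                    key=lambda t: t[1], reverse=True)
    top_id, top_score = scored[0]
    if top_score >= SIMILARITY_DIRECT_ROUTE:
        # High confidence: route directly, no candidates needed
        return {"route_type": "agent", "agent_id": top_id,
                "confidence": top_score}, []
    # Low confidence: hand the top N candidates to Tier 3 as hints
    return None, scored[:MAX_LLM_CANDIDATES]
```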

5. Tier Integration

The router uses the candidates to optimize Tier 3:

Characteristics:

  • Latency: 10-50ms (embedding generation + vector operations)

  • Confidence: 0.0-1.0 (from cosine similarity)

  • Cost: Zero (uses cached embeddings)

  • Accuracy: High for agents with detailed descriptions

Embedding Generation: Agent embeddings are generated/updated by the semantic indexer:

  • Triggered on agent create/update

  • Admin endpoint: POST /api/routing/semantic/reindex

  • Text hash stored to detect changes: Agent.semantic_text_hash

Embedding Source Text:

Sources: orchestrator/core/routing/engine.py:358-447, orchestrator/core/routing/semantic_indexer.py, orchestrator/core/math/vector_operations.py


Tier 3: LLM Classification

When all rule-based tiers fail to route the request, the router falls back to LLM-based classification. This tier uses the workspace's configured LLM to analyze the request and select the best agent based on agent descriptions and assigned tools.


Method: async _classify_with_llm(envelope: RequestEnvelope) -> Optional[RoutingDecision]

Step-by-Step Process:

1. Agent Query

Fetch all active agents in the workspace:

2. Build Agent Descriptions

For each agent, construct a description including:

  • agent_id (used in LLM response)

  • name and description

  • apps (from agent_app_assignments table) — crucial for tool-based routing

Example output:

Method: _build_agent_descriptions(agents: List[Agent]) -> List[Dict]
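
A sketch of what _build_agent_descriptions might produce, with dicts standing in for Agent ORM rows (field names are simplified assumptions, not the actual model attributes):

```python
def build_agent_descriptions(agents):
    # Each entry mirrors what the classification prompt needs:
    # a stable id, a human-readable description, and the agent's apps.
    return [
        {
            "agent_id": a["id"],
            "name": a["name"],
            "description": a.get("description", ""),
            "apps": sorted(a.get("apps", [])),  # e.g. from agent_app_assignments
        }
        for a in agents
        if a.get("status") == "active"  # only active agents are routable
    ]
```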

3. Build LLM Prompt

The router constructs a classification prompt optimized for semantic candidates:

Semantic Hints: When Tier 2.5 provides candidate agents, they are injected into the prompt as hints:

This dramatically improves LLM accuracy by pre-filtering the search space.

Prompt Registry Integration: The prompt can be customized by admins via PromptRegistry (see page 11.1):

Method: _build_classification_prompt(content: str, agent_descriptions: List[Dict], semantic_candidates: Optional[List]) -> str

4. Call LLM

Use the workspace's configured LLM provider:

The LLM model is determined by system settings (see LLM Manager).

5. Parse Response

Extract agent_id and confidence from JSON response:

Validates that agent_id is in the workspace's active agent list.

Method: _parse_llm_routing_response(response_text: str, agents: List[Agent]) -> tuple[Optional[int], float]

6. Confidence Threshold Check

Compare confidence against ROUTING_LLM_CONFIDENCE_THRESHOLD (default: 0.5):

  • High Confidence (≥ 0.5):

    • Return RoutingDecision(route_type="agent", agent_id=..., confidence=...)

    • Direct routing to the selected agent

  • Low Confidence (< 0.5):

    • Return RoutingDecision(route_type="orchestrate", agent_id=..., confidence=...)

    • Triggers full workflow decomposition (see Confidence-Based Orchestration)
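
Steps 5 and 6 together can be sketched as one parse-and-decide function. This is an illustrative combination, assuming the LLM returns a JSON object with agent_id and confidence fields as described above:

```python
import json

ROUTING_LLM_CONFIDENCE_THRESHOLD = 0.5

def parse_and_decide(response_text: str, valid_agent_ids):
    try:
        payload = json.loads(response_text)
        agent_id = payload.get("agent_id")
        confidence = float(payload.get("confidence", 0.0))
    except (ValueError, TypeError):
        return None  # malformed LLM output
    if agent_id not in valid_agent_ids:
        return None  # LLM named an agent outside the workspace's active set
    # Below the threshold, hand off to full workflow decomposition
    route_type = ("agent" if confidence >= ROUTING_LLM_CONFIDENCE_THRESHOLD
                  else "orchestrate")
    return {"route_type": route_type, "agent_id": agent_id,
            "confidence": confidence}
```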

7. Cache Result

Both high and low confidence results are cached to avoid repeated LLM calls:

Characteristics:

  • Latency: 500ms - 3s (depends on LLM provider and model)

  • Confidence: Variable (0.0 - 1.0)

  • Cost: LLM API call (~500-2000 tokens)

  • Accuracy: High, especially when agent descriptions are detailed

Sources: orchestrator/core/routing/engine.py:328-433, orchestrator/config.py:144


Confidence-Based Orchestration

The router's confidence threshold (ROUTING_LLM_CONFIDENCE_THRESHOLD) determines whether a request should be routed directly to an agent or decomposed into a full orchestrated workflow.


Rationale: When the LLM is uncertain about which agent to route to (e.g., ambiguous request, multiple possible agents), forcing direct routing may lead to poor outcomes. Instead, the system triggers a multi-agent workflow that:

  1. Clarifies the request with the user

  2. Decomposes it into subtasks

  3. Routes each subtask to specialized agents

  4. Consolidates the results

Configuration: Set ROUTING_LLM_CONFIDENCE_THRESHOLD in environment variables:

Downstream Handling: The caller (e.g., StreamingChatService, RecipeExecutor) checks decision.route_type:
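
A hedged sketch of that caller-side dispatch; the return strings are placeholders for the real execution paths in StreamingChatService and RecipeExecutor:

```python
def dispatch(decision):
    # Hypothetical caller-side branch on the routing decision
    if decision["route_type"] == "agent":
        return f"execute agent {decision['agent_id']}"
    if decision["route_type"] == "workflow":
        return f"execute workflow {decision['workflow_id']}"
    if decision["route_type"] == "orchestrate":
        return "decompose into multi-agent workflow"
    raise ValueError(f"unknown route_type: {decision['route_type']}")
```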

Metrics: Track orchestration rate in analytics:

  • % of requests routed directly

  • % of requests orchestrated

  • Average confidence by source channel

Sources: orchestrator/core/routing/engine.py:386-410, orchestrator/config.py:144


Request Flow

This diagram shows the complete routing flow from request ingestion to decision logging, mapping each step to specific code entities.


Request Headers: The router extracts workspace context from request headers:

  • X-Workspace-ID: Workspace UUID

  • Authorization: Clerk JWT or API key

Handled by get_request_context_hybrid dependency (see Authentication Flow).

Response Headers: The router injects routing metadata into response headers for debugging:

  • X-Routing-Agent-ID: Selected agent ID

  • X-Routing-Confidence: Confidence score (0.0-1.0)

  • X-Routing-Type: "agent", "workflow", or "orchestrate"

  • X-Routing-Reasoning: Human-readable routing explanation

  • X-Routing-Request-ID: Request envelope ID

Exposed via CORS configuration in orchestrator/main.py:444.

Sources: orchestrator/core/routing/engine.py:78-144, orchestrator/main.py:444


Decision Logging and Analytics

Every routing decision is persisted to the routing_decisions table for analytics, debugging, and optimization.

Database Schema

Unrouted Events

When all routing tiers fail, the request is stored in unrouted_events for later analysis:

Admin dashboards can query this table to identify:

  • Common request patterns that fail to route

  • Gaps in agent coverage

  • Misconfigured routing rules

Analytics Queries

Routing effectiveness:

Cache hit rate:

Agent routing distribution:

Sources: orchestrator/core/routing/engine.py:536-585, orchestrator/core/models/routing.py


Configuration

The routing system is configured via environment variables and system settings.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| ROUTING_CACHE_TTL_HOURS | 24 | Cache entry TTL in hours. Balances freshness vs hit rate. |
| ROUTING_LLM_CONFIDENCE_THRESHOLD | 0.5 | Minimum confidence for direct routing. Lower = more orchestration. |
| COMPOSIO_WEBHOOK_SECRET | (required) | Secret for validating Composio trigger webhooks. |

Example .env configuration:
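
A hedged example of these variables; the numeric values are the documented defaults, and the secret is a placeholder:

```shell
# Routing configuration (illustrative values)
ROUTING_CACHE_TTL_HOURS=24
ROUTING_LLM_CONFIDENCE_THRESHOLD=0.5
COMPOSIO_WEBHOOK_SECRET=replace-with-your-webhook-secret
```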

System Settings

The LLM used for Tier 3 classification is configured via system settings (see LLM Manager):

  • Provider: system_settings.orchestrator_llm.provider (e.g., "openai", "anthropic")

  • Model: system_settings.orchestrator_llm.model (e.g., "gpt-4-turbo-preview")

Example:

Redis Configuration

Routing cache requires Redis (optional but recommended):

If Redis is unavailable:

  • Tier 1 (cache) is skipped

  • All requests fall through to Tier 2/3

  • Performance degrades but system remains functional

Workspace-Level Overrides

Workspaces cannot currently override global routing configuration. All routing parameters are platform-wide.

Future Enhancement: Add workspace_settings table to allow per-workspace:

  • Custom confidence thresholds

  • Preferred LLM models for routing

  • Cache TTL overrides

Sources: orchestrator/config.py:140-149, orchestrator/.env.example:37-41


Summary

The Universal Router provides intelligent, cost-effective request routing through a four-tier cascading strategy:

  1. Tier 0 (User Overrides): <1ms, confidence 1.0, zero cost

  2. Tier 1 (Cache): 1-5ms, varies, zero cost after initial classification

  3. Tier 2 (Rules/Triggers/Semantic/Intent): 5-50ms, confidence 0.0-0.95, zero cost

  4. Tier 3 (LLM): 500ms-3s, confidence 0.0-1.0, LLM API cost

Key Benefits:

  • Cost Optimization: 60-80% cache hit rate eliminates LLM costs

  • Latency Optimization: Most requests routed in <10ms

  • Accuracy: LLM fallback ensures all requests can be routed

  • Transparency: Every decision logged with reasoning

Related Systems:

Sources: orchestrator/core/routing/engine.py:1-586, orchestrator/main.py:79

