Tier 1: Cache Lookup
Purpose and Scope
Tier 1: Cache Lookup is the second tier in the Universal Router's decision-making pipeline, immediately after Tier 0: User Overrides. When a request has no explicit override, the router first checks a Redis-backed routing cache to see if an identical request has been routed recently. This tier provides sub-millisecond routing decisions for repeated requests, dramatically reducing LLM API costs and latency compared to Tier 3: LLM Classification.
The cache stores complete RoutingDecision objects keyed by workspace, content, and source. Cache hits return immediately with high confidence; cache misses fall through to Tier 2: Rule-Based Routing or Tier 3: LLM Classification, which then populate the cache for future requests.
Related Pages:
For the overall routing architecture, see Routing Architecture
For rule-based routing that follows cache misses, see Tier 2: Rule-Based Routing
For the final LLM fallback tier, see Tier 3: LLM Classification
Cache Lookup Flow
Sources: orchestrator/core/routing/engine.py:171-176
The cache lookup implementation is minimal and fast.
If the RoutingCache instance is not available (Redis unavailable or not configured), Tier 1 immediately returns None and routing proceeds to Tier 2. Otherwise, it queries the cache with three parameters:
| Parameter | Description | Example |
|---|---|---|
| `workspace_id` | UUID of the workspace making the request | `550e8400-e29b-41d4-a716-446655440000` |
| `content` | Normalized request content/message text | `"Create a new GitHub PR for bug fix"` |
| `source` | Channel source enum value | `ChannelSource.WEB_CHAT` |
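The lookup described above can be sketched as follows. This is illustrative only; the real implementation lives in orchestrator/core/routing/engine.py:171-176 and may differ in detail (for example, it is likely async).

```python
# Sketch of the Tier 1 lookup: return early when no cache is configured,
# otherwise query with the three parameters from the table above.
from typing import Any, Optional


def tier1_cache_lookup(cache: Optional[Any], workspace_id: str,
                       content: str, source: str) -> Optional[dict]:
    """Return a cached RoutingDecision dict, or None to fall through to Tier 2."""
    if cache is None:  # Redis unavailable or not configured
        return None
    # Query the cache with workspace, normalized content, and channel source.
    return cache.get(workspace_id, content, source)
```

A `None` result here is indistinguishable from a cache miss by design: either way, routing proceeds to Tier 2.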
Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/models/routing.py
Cache Key Generation
Sources: orchestrator/core/routing/engine.py:54, orchestrator/core/routing/cache.py
The cache key is constructed by:
1. Normalizing content - The `_normalize_content()` function strips whitespace, converts to lowercase, and removes punctuation variations to ensure similar requests match
2. Hashing - A SHA-256 hash is computed from `normalized_content + "|" + source.value` to create a deterministic, compact key
3. Workspace scoping - The workspace ID is prepended to ensure complete tenant isolation
This ensures that:
Identical requests from the same workspace always hit the same cache entry
Minor formatting differences (e.g., trailing spaces) don't create cache misses
Different workspaces never share routing decisions, even for identical text
Different channels (web chat vs Slack vs email) maintain separate cache entries
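The three steps above can be sketched like this. The normalization shown is simplified (the real `_normalize_content()` also smooths punctuation variations), and the `routing:{workspace}:{content_hash}` key layout follows the key structure noted later in this page.

```python
import hashlib


def _normalize_content(content: str) -> str:
    # Simplified normalization: collapse whitespace and lowercase, so that
    # minor formatting differences map to the same cache entry.
    return " ".join(content.lower().split())


def make_cache_key(workspace_id: str, content: str, source_value: str) -> str:
    normalized = _normalize_content(content)
    # SHA-256 over "normalized_content|source" gives a deterministic, compact key.
    digest = hashlib.sha256(f"{normalized}|{source_value}".encode()).hexdigest()
    # Prepending the workspace ID guarantees tenant isolation.
    return f"routing:{workspace_id}:{digest}"
```

Note how the properties listed above fall out of the construction: normalization absorbs formatting differences, the source value in the hash separates channels, and the workspace prefix separates tenants.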
Sources: orchestrator/core/routing/engine.py:51-54, orchestrator/core/routing/cache.py
RoutingCache Implementation
Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143
The RoutingCache class provides a Redis-backed storage layer for routing decisions. Key characteristics:
Redis Connection
- Uses the centralized Redis client from `core.redis.client.get_redis_client()`
- Lazy initialization - connection established on first cache access
- Graceful degradation - if Redis is unavailable, cache operations return `None` and routing continues
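The lazy-initialization and graceful-degradation behavior can be sketched as below. This is a simplified reading of orchestrator/core/routing/cache.py, not the actual class; method signatures and error handling in the real code may differ.

```python
import json
from typing import Optional


class RoutingCache:
    """Illustrative sketch of a Redis-backed routing cache."""

    def __init__(self, ttl_hours: int = 24):
        self._ttl_seconds = ttl_hours * 3600
        self._redis = None  # lazy: no connection until first access

    def _client(self):
        if self._redis is None:
            # The real code obtains this from core.redis.client.get_redis_client()
            from core.redis.client import get_redis_client
            self._redis = get_redis_client()
        return self._redis

    def get(self, key: str) -> Optional[dict]:
        try:
            raw = self._client().get(key)
            return json.loads(raw) if raw else None
        except Exception:
            return None  # graceful degradation: any Redis failure is a miss

    def set(self, key: str, decision: dict) -> None:
        try:
            # setex stores the value with a TTL in one call
            self._client().setex(key, self._ttl_seconds, json.dumps(decision))
        except Exception:
            pass  # cache write failures must never block routing
```

Because every failure path degrades to a miss or a no-op, a Redis outage only costs the latency savings; routing correctness is unaffected.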
Data Structure
The cached value is the complete RoutingDecision object, serialized to JSON.
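The exact schema is defined in orchestrator/core/models/routing.py and is not reproduced here; a cached entry might look roughly like this (field names and values are hypothetical):

```json
{
  "target_type": "agent",
  "target_id": "github-pr-agent",
  "confidence": 0.92,
  "tier": "llm_classification",
  "cached_at": "2025-01-15T10:30:00Z"
}
```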
TTL Configuration
Cache entries expire after ROUTING_CACHE_TTL_HOURS (default: 24 hours) to ensure:
Routing logic changes eventually propagate to all requests
Stale decisions don't persist indefinitely if agents are deleted or modified
Redis memory usage remains bounded
Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143, orchestrator/core/models/routing.py
Cache Population
Sources: orchestrator/core/routing/engine.py:404-409, orchestrator/core/routing/engine.py:421-427
Cache population occurs in two scenarios:
1. After Tier 3 LLM Classification (High Confidence)
When the LLM classifies a request with confidence ≥ ROUTING_LLM_CONFIDENCE_THRESHOLD (default: 0.5), the router immediately caches the decision:
Location: orchestrator/core/routing/engine.py:413-428
2. After Tier 3 LLM Classification (Low Confidence)
Even when confidence is below threshold (triggering orchestrated workflow execution), the router still caches the decision to avoid re-invoking the LLM:
Location: orchestrator/core/routing/engine.py:396-410
This ensures that even uncertain routing decisions are cached, preventing repeated LLM invocations for the same ambiguous request.
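Both population paths can be summarized in one sketch. The branch structure mirrors the description above; the function and variable names are illustrative, not taken from the source (the real logic is in orchestrator/core/routing/engine.py:390-428).

```python
CONFIDENCE_THRESHOLD = 0.5  # ROUTING_LLM_CONFIDENCE_THRESHOLD default


def handle_llm_classification(cache, key: str, decision: dict) -> str:
    """Cache the LLM's decision in BOTH branches, then pick an execution path."""
    # Caching happens regardless of confidence, so the LLM is never
    # re-invoked for the same normalized request within the TTL window.
    cache.set(key, decision)
    if decision["confidence"] >= CONFIDENCE_THRESHOLD:
        return "direct_route"            # high confidence: route directly
    return "orchestrated_workflow"       # low confidence: orchestrated execution
```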
Sources: orchestrator/core/routing/engine.py:390-428
Cache Configuration
Environment Variables
The routing cache is controlled by the following configuration settings:

| Variable | Default | Description |
|---|---|---|
| `ROUTING_CACHE_TTL_HOURS` | `24` | Cache entry lifetime in hours |
| `REDIS_HOST` | (required) | Redis server hostname or IP |
| `REDIS_PORT` | `6379` | Redis server port |
| `REDIS_PASSWORD` | (optional) | Redis authentication password |
| `REDIS_URL` | (optional) | Complete Redis URL (overrides individual params) |
Sources: orchestrator/config.py:47-62, orchestrator/config.py:143
Cache TTL Strategy
The 24-hour default TTL balances routing freshness (stale decisions expire within a day, so agent and rule changes propagate) against cost savings (repeated requests stay cached long enough to absorb day-to-day traffic patterns).
Sources: orchestrator/config.py:143
Redis Configuration Fallback
The configuration system supports multiple Redis configuration patterns:
1. Complete URL - `REDIS_URL=redis://:password@host:port/0` (highest priority)
2. Component-based - Individual `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD` variables
3. No Redis - Cache gracefully degrades, all requests skip Tier 1
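The priority order above can be expressed as a small resolver. This is a sketch of the fallback behavior, not the actual code in orchestrator/config.py:51-62; the URL format assumes the standard `redis://[:password@]host:port/db` scheme.

```python
from typing import Mapping, Optional


def resolve_redis_url(env: Mapping[str, str]) -> Optional[str]:
    """Resolve a Redis connection URL per the three-level fallback above."""
    url = env.get("REDIS_URL")
    if url:                       # 1. Complete URL has highest priority
        return url
    host = env.get("REDIS_HOST")
    if not host:                  # 3. No Redis configured: Tier 1 is skipped
        return None
    port = env.get("REDIS_PORT", "6379")
    password = env.get("REDIS_PASSWORD")
    auth = f":{password}@" if password else ""
    return f"redis://{auth}{host}:{port}/0"   # 2. Component-based fallback
```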
Sources: orchestrator/config.py:51-62
Cache Lifecycle and Invalidation
Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143
Automatic Expiration
Cache entries are automatically removed by Redis after ROUTING_CACHE_TTL_HOURS. When this happens:
The next identical request becomes a cache miss at Tier 1
Routing falls through to Tier 2 (rules/subscriptions/intent matching)
If Tier 2 still produces no match, Tier 3 LLM re-classifies the request
The new LLM decision (which may differ from the expired one) is cached again
This ensures that routing logic naturally adapts to:
Agent description changes
New agent additions
Routing rule modifications
Model provider updates
No Explicit Invalidation
The current implementation does not provide manual cache invalidation APIs. When agents or routing rules are modified, cached decisions persist until their TTL expires. This is a deliberate design choice to:
Keep the cache implementation simple and stateless
Avoid complex invalidation logic tracking which cache entries are affected by which agent changes
Rely on the 24-hour TTL to provide "eventual freshness" within a reasonable timeframe
For immediate routing changes, administrators can manually flush Redis keys or restart the Redis instance (which clears all cached routing decisions).
Sources: orchestrator/core/routing/cache.py, orchestrator/core/routing/engine.py:390-428
Performance Impact
Cache Hit Metrics
Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/routing/engine.py:332-433
For a workspace with 1000 routing requests per day and a 70% cache hit rate:
| Metric | Without Cache | With Cache | Improvement |
|---|---|---|---|
| LLM API Calls | 1000/day | 300/day | 70% reduction |
| LLM Cost (at $0.005/call) | $5.00/day | $1.50/day | $3.50/day saved |
| Avg Response Time | 800ms | 310ms | 61% faster |
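The figures above follow from simple arithmetic. The 100ms end-to-end latency on a cache hit is an assumption (the lookup itself is sub-millisecond, but request overhead remains):

```python
requests_per_day = 1000
hit_rate = 0.70
cost_per_llm_call = 0.005   # USD

# Only misses reach Tier 3, so LLM calls drop with the hit rate.
llm_calls = round(requests_per_day * (1 - hit_rate))                   # 300/day
daily_savings = (requests_per_day - llm_calls) * cost_per_llm_call     # $3.50/day

hit_latency_ms = 100    # assumed end-to-end latency on a hit
miss_latency_ms = 800
avg_ms = hit_rate * hit_latency_ms + (1 - hit_rate) * miss_latency_ms  # 310.0 ms
```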
The cache hit rate typically improves over time as:
Common user requests establish stable routing patterns
Repeated questions from the same channels populate the cache
High-frequency workflows (e.g., "create JIRA ticket") become instant-route
Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/routing/engine.py:332-433
Relationship to Plugin Cache
The routing cache shares architectural patterns with the plugin content cache, but serves a fundamentally different purpose:
| Aspect | Routing Cache | Plugin Cache |
|---|---|---|
| Purpose | Cache routing decisions (agent/workflow selection) | Cache marketplace plugin files from S3 |
| Key Structure | `routing:{workspace}:{content_hash}` | `plugin_content:{slug}:{version}` |
| Value Type | JSON RoutingDecision object | JSON Dict[filepath, content] |
| TTL | 24 hours (routing freshness) | 1 hour (S3 read reduction) |
| Population | After Tier 3 LLM classification | On-demand when plugins loaded |
| Backend | Redis only | Redis (cache) + S3 (source of truth) |
Both caches use similar Redis interaction patterns (lazy initialization, graceful degradation, TTL-based expiration) but operate at different layers of the system.
Sources: orchestrator/core/routing/cache.py, orchestrator/core/services/plugin_cache.py:1-263