Tier 1: Cache Lookup


Purpose and Scope

Tier 1: Cache Lookup is the second tier in the Universal Router's decision-making pipeline, immediately after Tier 0: User Overrides. When a request has no explicit override, the router first checks a Redis-backed routing cache to see if an identical request has been routed recently. This tier provides sub-millisecond routing decisions for repeated requests, dramatically reducing LLM API costs and latency compared to Tier 3: LLM Classification.

The cache stores complete RoutingDecision objects keyed by workspace, content, and source. Cache hits return immediately with high confidence; cache misses fall through to Tier 2: Rule-Based Routing or Tier 3: LLM Classification, which then populate the cache for future requests.


Cache Lookup Flow


Sources: orchestrator/core/routing/engine.py:171-176

The cache lookup implementation is minimal and fast.

If the RoutingCache instance is not available (Redis unavailable or not configured), Tier 1 immediately returns None and routing proceeds to Tier 2. Otherwise, it queries the cache with three parameters:

| Parameter | Description | Example |
|---|---|---|
| `workspace_id` | UUID of the workspace making the request | `550e8400-e29b-41d4-a716-446655440000` |
| `content` | Normalized request content/message text | `"Create a new GitHub PR for bug fix"` |
| `source` | Channel source enum value | `ChannelSource.WEB_CHAT` |
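
The lookup described above can be sketched as a thin async wrapper around the cache, with a stand-in cache so it runs without Redis. The function and class names here (`tier1_cache_lookup`, `FakeCache`) are illustrative assumptions, not the exact implementation in `engine.py`:

```python
import asyncio
from typing import Any, Optional

async def tier1_cache_lookup(
    cache: Optional[Any],
    workspace_id: str,
    content: str,
    source: str,
) -> Optional[dict]:
    """Tier 1: return a cached routing decision, or None to fall through."""
    # Graceful degradation: if Redis is unavailable or not configured,
    # skip Tier 1 entirely and let Tier 2 / Tier 3 handle the request.
    if cache is None:
        return None
    # A hit returns the stored decision immediately; a miss returns None.
    return await cache.get(workspace_id=workspace_id, content=content, source=source)

class FakeCache:
    """Stand-in for RoutingCache, keyed by the same three parameters."""
    def __init__(self):
        self._store = {}
    async def get(self, workspace_id, content, source):
        return self._store.get((workspace_id, content, source))
    async def set(self, workspace_id, content, source, decision):
        self._store[(workspace_id, content, source)] = decision

cache = FakeCache()
miss = asyncio.run(tier1_cache_lookup(cache, "ws-1", "create a pr", "web_chat"))
asyncio.run(cache.set("ws-1", "create a pr", "web_chat", {"target": "github-agent"}))
hit = asyncio.run(tier1_cache_lookup(cache, "ws-1", "create a pr", "web_chat"))
```

The first call is a miss (falls through to Tier 2/3); once the decision is stored, the identical request hits without touching the LLM.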

Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/models/routing.py


Cache Key Generation


Sources: orchestrator/core/routing/engine.py:54, orchestrator/core/routing/cache.py

The cache key is constructed by:

  1. Normalizing content - The _normalize_content() function strips whitespace, lowercases the text, and removes punctuation so that near-identical requests match

  2. Hashing - A SHA-256 hash is computed from normalized_content + "|" + source.value to create a deterministic, compact key

  3. Workspace scoping - The workspace ID is prepended to ensure complete tenant isolation
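
The three steps above can be sketched as follows. The exact normalization rules are an assumption; the key layout follows the `routing:{workspace}:{content_hash}` pattern documented later on this page:

```python
import hashlib
import re

def _normalize_content(content: str) -> str:
    # Strip surrounding whitespace, lowercase, drop punctuation, and
    # collapse whitespace runs so near-identical requests match.
    text = content.strip().lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text)

def make_cache_key(workspace_id: str, content: str, source_value: str) -> str:
    normalized = _normalize_content(content)
    # Deterministic, compact key from "normalized_content|source".
    digest = hashlib.sha256(f"{normalized}|{source_value}".encode()).hexdigest()
    # Workspace-scoped prefix: tenants never share entries.
    return f"routing:{workspace_id}:{digest}"

# Minor formatting differences map to the same key...
k1 = make_cache_key("ws-1", "Create a new GitHub PR for bug fix  ", "web_chat")
k2 = make_cache_key("ws-1", "create a new github pr for bug fix", "web_chat")
# ...but a different workspace or channel gets its own entry.
k3 = make_cache_key("ws-2", "create a new github pr for bug fix", "web_chat")
k4 = make_cache_key("ws-1", "create a new github pr for bug fix", "slack")
```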

This ensures that:

  • Identical requests from the same workspace always hit the same cache entry

  • Minor formatting differences (e.g., trailing spaces) don't create cache misses

  • Different workspaces never share routing decisions, even for identical text

  • Different channels (web chat vs Slack vs email) maintain separate cache entries

Sources: orchestrator/core/routing/engine.py:51-54, orchestrator/core/routing/cache.py


RoutingCache Implementation


Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143

The RoutingCache class provides a Redis-backed storage layer for routing decisions. Key characteristics:

Redis Connection

  • Uses the centralized Redis client from core.redis.client.get_redis_client()

  • Lazy initialization - connection established on first cache access

  • Graceful degradation - if Redis is unavailable, cache operations return None and routing continues

Data Structure

The cache value is the complete RoutingDecision object, serialized as JSON.
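
The authoritative fields live in orchestrator/core/models/routing.py. A plausible shape, inferred from this page (target selection, confidence, and the tier that produced the decision; the field names here are assumptions):

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class RoutingDecision:
    # Illustrative fields only; see core/models/routing.py for the real model.
    target_type: str      # e.g. "agent" or "workflow"
    target_id: str        # identifier of the selected agent/workflow
    confidence: float     # classifier confidence, 0.0-1.0
    tier: int             # which tier produced the decision
    reasoning: str = ""   # optional explanation from the LLM

decision = RoutingDecision("agent", "github-agent", 0.92, 3, "GitHub PR intent")
payload = json.dumps(asdict(decision))            # JSON string stored in Redis
restored = RoutingDecision(**json.loads(payload)) # round-trips on a cache hit
```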

TTL Configuration

Cache entries expire after ROUTING_CACHE_TTL_HOURS (default: 24 hours) to ensure:

  • Routing logic changes eventually propagate to all requests

  • Stale decisions don't persist indefinitely if agents are deleted or modified

  • Redis memory usage remains bounded

Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143, orchestrator/core/models/routing.py


Cache Population


Sources: orchestrator/core/routing/engine.py:404-409, orchestrator/core/routing/engine.py:421-427

Cache population occurs in two scenarios:

1. After Tier 3 LLM Classification (High Confidence)

When the LLM classifies a request with confidence ≥ ROUTING_LLM_CONFIDENCE_THRESHOLD (default: 0.5), the router immediately caches the decision:

Location: orchestrator/core/routing/engine.py:413-428

2. After Tier 3 LLM Classification (Low Confidence)

Even when confidence is below threshold (triggering orchestrated workflow execution), the router still caches the decision to avoid re-invoking the LLM:

Location: orchestrator/core/routing/engine.py:396-410

This ensures that even uncertain routing decisions are cached, preventing repeated LLM invocations for the same ambiguous request.
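
Both branches reduce to the same write. A minimal sketch, assuming the cache serializes the decision and applies the configured TTL via Redis SETEX (`cache_decision` and `FakeRedis` are illustrative names, not the orchestrator's API):

```python
import asyncio
import json
from datetime import timedelta

ROUTING_CACHE_TTL_HOURS = 24  # config default

async def cache_decision(redis, key: str, decision: dict) -> None:
    """Store the decision regardless of confidence, so the same request
    (confident or ambiguous) never re-invokes the LLM within the TTL."""
    await redis.setex(
        key,
        timedelta(hours=ROUTING_CACHE_TTL_HOURS),
        json.dumps(decision),
    )

class FakeRedis:
    """In-memory stand-in recording (ttl, value) pairs for inspection."""
    def __init__(self):
        self.store = {}
    async def setex(self, key, ttl, value):
        self.store[key] = (ttl, value)

r = FakeRedis()
asyncio.run(cache_decision(
    r, "routing:ws-1:abc", {"target_id": "github-agent", "confidence": 0.3}
))
```

Note the low-confidence decision (0.3) is cached just like a high-confidence one.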

Sources: orchestrator/core/routing/engine.py:390-428


Cache Configuration

Environment Variables

The routing cache is controlled by the following configuration settings:

| Variable | Default | Description |
|---|---|---|
| `ROUTING_CACHE_TTL_HOURS` | `24` | Cache entry lifetime in hours |
| `REDIS_HOST` | (required) | Redis server hostname or IP |
| `REDIS_PORT` | `6379` | Redis server port |
| `REDIS_PASSWORD` | (optional) | Redis authentication password |
| `REDIS_URL` | (optional) | Complete Redis URL (overrides individual params) |

Sources: orchestrator/config.py:47-62, orchestrator/config.py:143

Cache TTL Strategy

The 24-hour default TTL balances competing concerns: long enough that repeated requests stay cached and LLM costs stay low, short enough that routing changes propagate within a day and Redis memory stays bounded.


Sources: orchestrator/config.py:143

Redis Configuration Fallback

The configuration system supports multiple Redis configuration patterns:

  1. Complete URL - REDIS_URL=redis://:password@host:port/0 (highest priority)

  2. Component-based - Individual REDIS_HOST, REDIS_PORT, REDIS_PASSWORD variables

  3. No Redis - Cache gracefully degrades, all requests skip Tier 1
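
The priority order above can be sketched as a small resolution function. This is not the actual get_redis_client() implementation, just the documented fallback logic under stated assumptions:

```python
from typing import Optional

def resolve_redis_url(env: dict) -> Optional[str]:
    """Resolve a Redis connection URL with the documented priority:
    REDIS_URL > component variables > None (Tier 1 disabled)."""
    # 1. A complete URL wins outright.
    if env.get("REDIS_URL"):
        return env["REDIS_URL"]
    # 2. Component-based configuration.
    host = env.get("REDIS_HOST")
    if host:
        port = env.get("REDIS_PORT", "6379")
        password = env.get("REDIS_PASSWORD")
        auth = f":{password}@" if password else ""
        return f"redis://{auth}{host}:{port}/0"
    # 3. No Redis configured: cache degrades gracefully, Tier 1 is skipped.
    return None
```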

Sources: orchestrator/config.py:51-62


Cache Lifecycle and Invalidation


Sources: orchestrator/core/routing/cache.py, orchestrator/config.py:143

Automatic Expiration

Cache entries are automatically removed by Redis after ROUTING_CACHE_TTL_HOURS. When this happens:

  1. The next identical request becomes a cache miss at Tier 1

  2. Routing falls through to Tier 2 (rules/subscriptions/intent matching)

  3. If Tier 2 still produces no match, Tier 3 LLM re-classifies the request

  4. The new LLM decision (which may differ from the expired one) is cached again

This ensures that routing logic naturally adapts to:

  • Agent description changes

  • New agent additions

  • Routing rule modifications

  • Model provider updates

No Explicit Invalidation

The current implementation does not provide manual cache invalidation APIs. When agents or routing rules are modified, cached decisions persist until their TTL expires. This is a deliberate design choice to:

  • Keep the cache implementation simple and stateless

  • Avoid complex invalidation logic tracking which cache entries are affected by which agent changes

  • Rely on the 24-hour TTL to provide "eventual freshness" within a reasonable timeframe

For immediate routing changes, administrators can manually flush Redis keys or restart the Redis instance (which clears all cached routing decisions).
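
Because keys follow the `routing:{workspace}:{content_hash}` pattern, a one-off flush can target a single workspace without touching other Redis data. A hedged sketch (not part of the orchestrator codebase), written against the redis-py client interface but shown here with an in-memory stand-in so it runs without a server:

```python
import fnmatch

def flush_routing_cache(client, workspace_id: str) -> int:
    """Delete cached routing decisions for one workspace. Returns count.
    `client` is duck-typed to redis-py's scan_iter/delete interface."""
    deleted = 0
    # SCAN (not KEYS) so large keyspaces don't block Redis.
    for key in client.scan_iter(match=f"routing:{workspace_id}:*", count=500):
        deleted += client.delete(key)
    return deleted

class _FakeRedis:
    """Minimal stand-in for redis.Redis used only for this demonstration."""
    def __init__(self, keys):
        self.keys = set(keys)
    def scan_iter(self, match, count=10):
        return [k for k in list(self.keys) if fnmatch.fnmatch(k, match)]
    def delete(self, key):
        self.keys.discard(key)
        return 1

client = _FakeRedis({
    "routing:ws-1:aaa", "routing:ws-1:bbb",
    "routing:ws-2:ccc", "plugin_content:x:1",
})
removed = flush_routing_cache(client, "ws-1")
```

Only ws-1's routing entries are removed; other workspaces and the plugin cache are untouched.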

Sources: orchestrator/core/routing/cache.py, orchestrator/core/routing/engine.py:390-428


Performance Impact

Cache Hit Metrics


Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/routing/engine.py:332-433

For a workspace with 1000 routing requests per day and a 70% cache hit rate:

| Metric | Without Cache | With Cache (70% hit) | Improvement |
|---|---|---|---|
| LLM API Calls | 1000/day | 300/day | 70% reduction |
| LLM Cost (at $0.005/call) | $5.00/day | $1.50/day | $3.50/day saved |
| Avg Response Time | 800ms | 310ms | 61% faster |
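
The table's figures follow directly from the hit rate. The 310 ms average implies a ~100 ms end-to-end latency on the cached path, which is an assumption of this worked example rather than a measured number:

```python
requests_per_day = 1000
hit_rate = 0.70
cost_per_llm_call = 0.005        # USD
llm_path_latency_ms = 800
cached_path_latency_ms = 100     # implied by the table's 310 ms average

# Only cache misses reach the LLM.
llm_calls = round(requests_per_day * (1 - hit_rate))
daily_cost = llm_calls * cost_per_llm_call

# Weighted average over hits and misses.
avg_latency = (hit_rate * cached_path_latency_ms
               + (1 - hit_rate) * llm_path_latency_ms)
speedup_pct = round(100 * (llm_path_latency_ms - avg_latency)
                    / llm_path_latency_ms)
```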

The cache hit rate typically improves over time as:

  • Common user requests establish stable routing patterns

  • Repeated questions from the same channels populate the cache

  • High-frequency workflows (e.g., "create JIRA ticket") become instant-route

Sources: orchestrator/core/routing/engine.py:171-176, orchestrator/core/routing/engine.py:332-433


Relationship to Plugin Cache

The routing cache shares architectural patterns with the plugin content cache, but serves a fundamentally different purpose:

| Aspect | RoutingCache | PluginContentCache |
|---|---|---|
| Purpose | Cache routing decisions (agent/workflow selection) | Cache marketplace plugin files from S3 |
| Key Structure | `routing:{workspace}:{content_hash}` | `plugin_content:{slug}:{version}` |
| Value Type | JSON RoutingDecision object | JSON Dict[filepath, content] |
| TTL | 24 hours (routing freshness) | 1 hour (S3 read reduction) |
| Population | After Tier 3 LLM classification | On-demand when plugins loaded |
| Backend | Redis only | Redis (cache) + S3 (source of truth) |

Both caches use similar Redis interaction patterns (lazy initialization, graceful degradation, TTL-based expiration) but operate at different layers of the system.

Sources: orchestrator/core/routing/cache.py, orchestrator/core/services/plugin_cache.py:1-263

