PRD: Universal Orchestrator Router (PRD-50)

Introduction

Transform the Automatos Orchestrator from a workflow-only executor into a universal request router that receives input from any channel — Chatbot, Jira triggers, Slack, WhatsApp, external APIs — normalizes it into a standard envelope, and intelligently routes it to the right pre-configured agent or recipe/workflow.

Today, the three consumers (Workflows, Chatbot, APIs) operate as separate systems with separate routing logic. The chatbot requires the user to select an agent. Webhooks are received but not dispatched (TODO at api/composio.py:509). This PRD unifies all input into a single routing pipeline, where the orchestrator makes a lightweight LLM classification, caches the decision, and hands off to a purpose-built agent that already has its tools assigned — minimizing cost by avoiding redundant reasoning at every step.

Phase 1 scope: Chatbot + Jira triggers. Slack, WhatsApp, and other external channels follow in Phase 2.

Problem Statement

  1. Chatbot requires manual agent selection — users must know which agent handles what, or it falls back to a generic default agent

  2. Webhook events are received but discarded: POST /api/composio/webhook logs and returns "received" without triggering any agent or workflow

  3. No event-driven agent dispatch — agents only execute via explicit workflow runs or chatbot conversations

  4. Three separate routing paths — chatbot, workflows, and APIs each have independent logic, leading to duplicated intent classification

Current Infrastructure (already built)

| Component | Location | Status |
|---|---|---|
| Composio webhook endpoint | api/composio.py:467-516 | Receives + validates, no dispatch |
| TriggerSubscription model | core/models/composio.py:88-108 | Schema exists, unused |
| Trigger subscribe/unsubscribe API | api/composio.py:523-631 | Endpoints exist, no downstream effect |
| IntentClassifier (rule-based) | core/services/intent_classifier.py | 11 categories, regex patterns |
| ToolRouterService (LLM-based) | services/tool_router_service.py | Category-to-app mapping, caching |
| ActionClassifier (heuristic + LLM) | modules/tools/capabilities/classifier.py | Heuristic first, LLM fallback |
| LLMAgentSelector | modules/orchestrator/llm/llm_agent_selector.py | Function-calling agent selection |
| Chat streaming endpoint | api/chat.py:150-293 | Agent-based streaming, uses agentId or default |
| Workflow execution pipeline | api/workflows.py:923+ | 9-stage async execution |
| ComposioToolExecutor | core/composio/tool_executor.py | Execute Composio actions with validation |
| CodeGraph (repo indexing) | modules/codegraph/ | Clone, parse, embed, search code |
| GitHub webhook handler | api/github_webhooks.py | PR events → workflow trigger (working pattern) |


Goals

  • Unify all input channels through a single routing pipeline with a standard request envelope

  • Replace manual agent selection in chatbot with intelligent auto-routing (with manual override)

  • Complete the webhook → agent/workflow dispatch (fill the TODO at api/composio.py:509)

  • Enable event-driven agent execution via Composio triggers (starting with Jira)

  • Reduce LLM cost per request by routing to pre-configured agents (one classification call, not full orchestration)

  • Cache routing decisions so repeated patterns (e.g., "check my emails") resolve instantly

  • Build the Jira → Coder Agent autonomous pipeline: ticket created → agent reads ticket → clones code → opens PR → moves ticket to "In Review"


Architecture

Request Envelope (Standard Format)

Every input, regardless of source, is normalized to:
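
The envelope schema is not reproduced in this section. As a minimal sketch only: raw_payload, workspace_id, and source are referenced elsewhere in this PRD, while every other field name below is a hypothetical illustration. A stdlib dataclass is used here so the example runs without dependencies; the planned model in core/models/routing.py is described as Pydantic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional
import uuid


@dataclass
class RequestEnvelope:
    """Illustrative envelope; most field names are assumptions, not the final schema."""

    source: str                       # e.g. "chatbot" or "jira_trigger"
    workspace_id: str                 # routing and caching are workspace-scoped
    content: str                      # normalized text the router classifies
    user_id: Optional[str] = None     # may be absent for webhook-triggered requests
    override_agent_id: Optional[str] = None  # set when the user picks an agent (Tier 0)
    raw_payload: dict[str, Any] = field(default_factory=dict)  # sanitized original input
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# A chatbot message with no explicit agent selection
envelope = RequestEnvelope(source="chatbot", workspace_id="ws-1", content="check my emails")
```

Keeping the user override on the envelope is one possible design: it lets the router apply Tier 0 without knowing which channel produced the request.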

Routing Decision Output
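
The decision schema is not reproduced in this section. The sketch below is an assumption about what such an output could carry: the tier that resolved the request, exactly one target (an agent or a recipe/workflow), and a short reason for the audit log. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RoutingTier(IntEnum):
    USER_OVERRIDE = 0
    CACHE = 1
    RULE = 2
    LLM = 3


@dataclass(frozen=True)
class RoutingDecision:
    """Illustrative router output; field names are assumptions."""

    tier: RoutingTier
    agent_id: Optional[str] = None
    workflow_id: Optional[str] = None
    confidence: float = 1.0
    reason: str = ""

    def __post_init__(self) -> None:
        # A decision points at exactly one target: an agent or a workflow
        if (self.agent_id is None) == (self.workflow_id is None):
            raise ValueError("set exactly one of agent_id or workflow_id")


decision = RoutingDecision(
    tier=RoutingTier.RULE,
    workflow_id="bug-triage",
    reason="source=jira_trigger",
)
```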

Routing Tiers (evaluated in order)

| Tier | Method | Latency | Cost | When Used |
|---|---|---|---|---|
| 0 | User override | 0ms | Free | User selected an agent or recipe explicitly |
| 1 | Cache hit (Redis) | <5ms | Free | Same intent pattern seen before, cached in Redis |
| 2 | Rule-based | <10ms | Free | Trigger source rules (jira_trigger → Bug Triage), keyword patterns |
| 3 | LLM classification | 200-500ms | ~$0.001 | Ambiguous requests, first-time patterns |

After Tier 3 resolves, the decision is cached in Redis (keyed by routing:decision:{workspace_id}:{content_hash}) so subsequent identical requests hit Tier 1. Cache persists across process restarts and is shared across workers.
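
The tier order and the write-back after Tier 3 can be sketched as follows. This is a simplified illustration, not the planned UniversalRouter: a dict stands in for Redis, targets are plain strings, Tier 2 covers only trigger-source rules (no keyword patterns), and the function and parameter names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical stand-in: a dict replaces the Redis-backed cache
Cache = dict[str, str]


def route(
    content: str,
    source: str,
    override: Optional[str],
    cache: Cache,
    cache_key: str,
    rules: dict[str, str],
    llm_classify: Callable[[str], str],
) -> tuple[int, str]:
    """Return (tier, target), stopping at the first tier that matches."""
    if override is not None:           # Tier 0: explicit user choice
        return 0, override
    if cache_key in cache:             # Tier 1: previously cached decision
        return 1, cache[cache_key]
    if source in rules:                # Tier 2: trigger-source rule
        return 2, rules[source]
    target = llm_classify(content)     # Tier 3: single LLM classification
    cache[cache_key] = target          # later identical requests hit Tier 1
    return 3, target


cache: Cache = {}
rules = {"jira_trigger": "bug-triage"}


def fake_llm(text: str) -> str:
    # Placeholder for the real LLM call
    return "email-agent"


first = route("check my emails", "chatbot", None, cache, "k1", rules, fake_llm)
second = route("check my emails", "chatbot", None, cache, "k1", rules, fake_llm)
```

In this sketch the first call resolves at Tier 3 and the second, identical call resolves at Tier 1, which is the behavior the paragraph above describes.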


User Stories

US-001: Create RequestEnvelope and RoutingDecision models

Description: As a developer, I need standard data models for the universal request envelope and routing decision so that all channels produce consistent input and the router produces consistent output.

Acceptance Criteria:


US-002: Build the Routing Engine

Description: As the orchestrator, I need a routing engine that takes a RequestEnvelope and returns a RoutingDecision using tiered evaluation (override → cache → rules → LLM).

Acceptance Criteria:


US-003: Create routing cache with feedback learning

Description: As the system, I need to cache routing decisions and incorporate user corrections so routing accuracy improves over time.

Acceptance Criteria:


US-004: Build channel ingestors — Chatbot

Description: As a chatbot user, when I send a message, the orchestrator should auto-route it to the best agent instead of requiring me to select one from a dropdown.

Acceptance Criteria:


US-005: Build channel ingestors — Jira Trigger

Description: As the system, when a Jira trigger webhook fires (new issue created), I need to normalize the event into a RequestEnvelope and dispatch it through the router.

Acceptance Criteria:


US-006: Complete webhook → router dispatch

Description: As a developer, I need to replace the TODO at api/composio.py:509 with actual routing logic that dispatches webhook events through the universal router.

Acceptance Criteria:


US-007: Set up Jira trigger subscription

Description: As an admin, I need to register the JIRA_NEW_ISSUE_TRIGGER for the PILOT project so that new Jira tickets automatically fire webhook events to the orchestrator.

Acceptance Criteria:


US-008: Build Jira Bug Triage autonomous workflow

Description: As the system, when a Jira bug ticket is created in PILOT, I need to autonomously: read the ticket, analyze the codebase, plan a fix, clone the repo, apply changes, open a PR, and move the Jira ticket to "In Review".

Acceptance Criteria:


US-009: Add routing configuration API

Description: As an admin, I need API endpoints to manage routing rules, view routing decisions, and configure agent-to-intent mappings.

Acceptance Criteria:


US-010: Update chatbot UI for auto-routing

Description: As a chatbot user, I want to see which agent the orchestrator selected for my message, with the option to override.

Acceptance Criteria:


Functional Requirements

  • FR-1: All input channels MUST normalize to RequestEnvelope before routing

  • FR-2: Router MUST evaluate tiers in order: override → cache → rules → LLM. Stop at first match.

  • FR-3: LLM routing decisions MUST be cached in Redis via the existing RedisClient infrastructure (core.redis.client). Cache key: routing:decision:{workspace_id}:{sha256(normalized_content + source)}. Follows DatabaseCacheService patterns (key prefixes, setex(), incr() stats).

  • FR-4: User override MUST always take precedence over auto-routing at any point in a conversation

  • FR-5: Webhook dispatch MUST be async — return 200 to Composio immediately, process in background

  • FR-6: Every routing decision MUST be logged to database with full audit trail

  • FR-7: User corrections MUST feed back into cache to improve future routing

  • FR-8: Jira Bug Triage workflow MUST be fully autonomous: read → analyze → plan → fix → PR → update ticket

  • FR-9: If Bug Triage fails at any step, it MUST post a failure comment on the Jira ticket and halt cleanly

  • FR-10: Chatbot conversation history MUST be maintained across agent switches (if router selects a different agent mid-conversation)
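
FR-3 and FR-7 can be illustrated together. The sketch below derives the cache key in the format FR-3 specifies and shows a user correction overwriting a cached decision. It is an assumption-laden example: the normalization step (lowercase, collapsed whitespace) is a guess at what "normalized_content" means, the class is a dict-backed stand-in for the Redis-backed RoutingCache, and TTL handling via setex() is not modeled.

```python
import hashlib
from typing import Optional


def routing_cache_key(workspace_id: str, content: str, source: str) -> str:
    # Normalization (lowercase, collapse whitespace) is an assumption
    normalized = " ".join(content.lower().split())
    digest = hashlib.sha256((normalized + source).encode("utf-8")).hexdigest()
    return f"routing:decision:{workspace_id}:{digest}"


class InMemoryRoutingCache:
    """Dict-backed stand-in for the Redis-backed RoutingCache (hypothetical API)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, target: str) -> None:
        self._store[key] = target

    def correct(self, key: str, corrected_target: str) -> None:
        # FR-7: a user correction replaces the cached decision
        self._store[key] = corrected_target


k1 = routing_cache_key("ws-1", "Check  my emails", "chatbot")
k2 = routing_cache_key("ws-1", "check my emails ", "chatbot")
k_other = routing_cache_key("ws-2", "check my emails", "chatbot")

routing_cache = InMemoryRoutingCache()
routing_cache.set(k1, "agent-a")
routing_cache.correct(k1, "agent-b")
```

Because the workspace ID is part of the key, identical content in two workspaces never shares a cache entry. Overwriting on correction is only one possible policy; whether a corrected pattern should always win is still listed under Open Questions.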


Security Requirements

| Channel | Auth Method | Validation |
|---|---|---|
| Chatbot | Clerk JWT (Authorization: Bearer <token>) | get_request_context_hybrid extracts user + workspace |
| Jira Trigger | Composio webhook signature | HMAC SHA256 via COMPOSIO_WEBHOOK_SECRET |
| Slack (Phase 2) | Slack signing secret | HMAC SHA256 via SLACK_SIGNING_SECRET |
| WhatsApp (Phase 2) | Composio webhook signature | Same as Jira trigger path |
| External API | API key (X-API-Key header) | Existing require_api_key middleware |

  • SR-1: Webhook endpoints MUST validate signatures before processing. Reject with 401 on mismatch.

  • SR-2: All routing endpoints MUST require authentication via get_request_context_hybrid

  • SR-3: Agents executing via trigger dispatch MUST operate within the workspace context of the trigger subscription (not a global/admin context)

  • SR-4: Rate limiting per channel: Chatbot — 60 req/min per user. Webhooks — 120 req/min per workspace. API — configurable per API key.

  • SR-5: Routing decisions MUST NOT leak data across workspaces. Cache is workspace-scoped.

  • SR-6: RequestEnvelope.raw_payload MUST be sanitized — strip any auth tokens or secrets from stored payloads

  • SR-7: GitHub operations (clone, branch, PR) MUST use workspace-scoped credentials, not global tokens
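
The signature check in SR-1 can be sketched with the Python standard library. This assumes the signature is the hex-encoded HMAC SHA256 of the raw request body; the actual header name, encoding, and signed content used by Composio are not stated in this PRD and should be confirmed against its documentation. The function name is hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare the received signature against an HMAC SHA256 of the raw body."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )


secret = "test-secret"  # would come from COMPOSIO_WEBHOOK_SECRET in practice
body = b'{"event": "issue_created"}'
good_signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
```

A handler following SR-1 would call a check like this on the raw body before parsing it, and respond with 401 when it returns False.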


Non-Goals (Out of Scope)

  • Phase 2 channels — Slack, WhatsApp, Telegram ingestors are not in this PRD. Only the ingestor interface is defined; implementations come later.

  • UI for routing rule management — Phase 1 is API-only. A visual rule builder in the frontend is a future story.

  • Multi-workspace routing — Each workspace has its own routing context. Cross-workspace routing is not supported.

  • Real-time streaming from webhook-triggered agents — Webhook-triggered executions run async. Results are stored, not streamed. (Chatbot channel retains streaming.)

  • Custom LLM model for routing — Uses the workspace's configured LLM. No fine-tuned routing model.

  • Automatic agent creation — Router only selects from existing agents. It does not create new agents on-the-fly.


Technical Considerations

New Files

| File | Purpose |
|---|---|
| orchestrator/core/models/routing.py | RequestEnvelope, RoutingDecision, RoutingRule Pydantic + ORM models |
| orchestrator/core/routing/engine.py | UniversalRouter class: tiered routing logic |
| orchestrator/core/routing/cache.py | RoutingCache: Redis-backed caching via existing RedisClient infrastructure, with get_routing_cache() singleton |
| orchestrator/core/routing/ingestors/base.py | BaseIngestor abstract class |
| orchestrator/core/routing/ingestors/chatbot.py | ChatbotIngestor: ChatRequest → RequestEnvelope |
| orchestrator/core/routing/ingestors/jira_trigger.py | JiraTriggerIngestor: Composio webhook → RequestEnvelope |
| orchestrator/api/routing.py | Routing management API endpoints |
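
As a rough illustration of the ingestor files listed above, the sketch below shows an abstract base class and a Jira ingestor that normalizes a webhook payload. The method name to_envelope, the returned dict (used here in place of the real RequestEnvelope model), and the payload keys "summary" and "description" are all hypothetical; the real Composio trigger payload may be shaped differently.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseIngestor(ABC):
    """Each channel implements one ingestor that produces envelope fields."""

    source: str

    @abstractmethod
    def to_envelope(self, payload: dict[str, Any], workspace_id: str) -> dict[str, Any]:
        """Normalize a channel-specific payload into envelope fields."""


class JiraTriggerIngestor(BaseIngestor):
    source = "jira_trigger"

    def to_envelope(self, payload: dict[str, Any], workspace_id: str) -> dict[str, Any]:
        # Payload keys are assumptions about the trigger event shape
        summary = payload.get("summary", "")
        description = payload.get("description", "")
        return {
            "source": self.source,
            "workspace_id": workspace_id,
            "content": f"{summary}\n\n{description}".strip(),
            "raw_payload": payload,
        }


event = {"issue_key": "PILOT-1", "summary": "Login fails", "description": "500 on submit"}
envelope_fields = JiraTriggerIngestor().to_envelope(event, "ws-1")
```

Defining only the interface up front matches the Non-Goals: Phase 2 channels can add their own subclasses later without changes to the router.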

Modified Files

| File | Change |
|---|---|
| orchestrator/api/composio.py | Replace TODO at line 509 with router dispatch |
| orchestrator/api/chat.py | Integrate router before agent selection (line 253) |
| orchestrator/main.py | Register routing router |
| orchestrator/config.py | Add COMPOSIO_WEBHOOK_SECRET, routing config vars |
| orchestrator/.env.example | Document new env vars |

Database Migrations

  • routing_decisions table — audit log of all routing decisions

  • routing_rules table — user-defined routing rules (source pattern, intent keywords, target agent/workflow)

  • unrouted_events table — events that couldn't be routed (for manual review)

Dependencies

  • Existing RedisClient (core/redis/client.py) + get_redis_client() is used for routing cache — same infrastructure as DatabaseCacheService

  • Existing IntentClassifier (rule-based) is reused in Tier 2

  • Existing LLMAgentSelector logic is adapted for Tier 3 (lighter-weight prompt)

  • Existing ComposioToolExecutor handles all Composio action execution

  • Existing TriggerSubscription model is used for trigger → agent/workflow mapping

Performance

  • Tier 0-2 routing: < 50ms (no LLM call)

  • Tier 3 routing: < 500ms (single LLM call with small prompt)

  • Cache hit rate target: > 70% after 1 week of usage

  • Webhook processing: return 200 within 100ms, dispatch async
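
The "acknowledge first, process later" behavior in the last item can be sketched with asyncio. This is a framework-free illustration with hypothetical names; the real handler would sit in the existing webhook endpoint and might use that framework's background-task mechanism or a queue instead.

```python
import asyncio

processed: list[str] = []
background_tasks: set[asyncio.Task] = set()


async def dispatch(event_id: str) -> None:
    await asyncio.sleep(0.05)  # stands in for routing plus agent execution
    processed.append(event_id)


async def handle_webhook(event_id: str) -> dict:
    task = asyncio.create_task(dispatch(event_id))
    background_tasks.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background_tasks.discard)
    return {"status": "received"}  # returned before dispatch finishes


async def main() -> tuple[dict, bool]:
    ack = await handle_webhook("evt-1")
    done_at_ack = "evt-1" in processed
    await asyncio.gather(*background_tasks)  # only so this demo waits for completion
    return ack, done_at_ack


ack, done_at_ack = asyncio.run(main())
```

An in-process task like this is lost if the worker restarts mid-dispatch, which is why the retry question under Open Questions matters for the "zero webhook events lost" metric.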

Cost Model

| Scenario | LLM Calls | Estimated Cost |
|---|---|---|
| Cached route (Tier 1) | 0 | $0.00 |
| Rule-matched route (Tier 2) | 0 | $0.00 |
| LLM classification (Tier 3) | 1 (small prompt ~500 tokens) | ~$0.001 |
| Agent execution | 1-3 (agent's own reasoning) | $0.01-0.05 |
| Full orchestration | 5-9 (decompose, select, execute) | $0.05-0.20 |

Compared to routing everything through full orchestration: ~90% cost reduction for routine requests.


Dependency Order


Success Metrics

  • Chatbot users send messages without selecting an agent and get correct routing > 85% of the time

  • Routing latency < 50ms for 70%+ of requests (cache/rule hits)

  • Jira ticket created in PILOT → PR opened autonomously within 5 minutes

  • Zero webhook events lost — all received events either routed or stored in unrouted_events

  • Routing cache hit rate > 70% after 1 week of production usage

  • User corrections decrease over time as cache learns (week-over-week improvement)


Open Questions

  1. Coder Agent scope — Should the Bug Triage workflow be limited to the Automatos repo, or should it support multiple repos configured per workspace?

  2. Conflict resolution — If the LLM routes to Agent A but the user corrects to Agent B, and later the same pattern appears, should it always use Agent B or re-evaluate?

  3. Conversation handoff — If the router switches agents mid-conversation (e.g., user asks about emails then asks about code), should the new agent see the full conversation history from the previous agent?

  4. Webhook retry — If async dispatch fails (agent error, DB down), should the webhook handler store the event for retry? What's the retry policy?

  5. Trigger management UI — Should there be a frontend page to manage trigger subscriptions (enable/disable Jira trigger, set up Slack triggers), or is API-only sufficient for Phase 1?
