PRD-54: Multi-Provider LLM Marketplace & Cost Optimization
Executive Summary
Transform Automatos AI from a single-provider (OpenAI) platform into a multi-tier LLM marketplace where users browse, compare, and deploy 200+ models from direct providers and aggregators. Users select models per-agent, per-workspace, and per-task — enabling cost optimization where cheap models handle 80% of work and premium models handle the 20% that matters.
Three-tier architecture:
Tier 1: Direct provider integrations (OpenAI, Anthropic, Google) — best pricing, highest reliability
Tier 2: OpenRouter aggregator — 200+ open-source and commercial models via single API
Tier 3: BYOK (Bring Your Own Key) — users plug in their own provider keys
Current State
What Exists
7 LLM client implementations: OpenAI, Anthropic, Google Gemini, Azure OpenAI, HuggingFace, AWS Bedrock, Grok/xAI
ModelRegistry with rich metadata (context_window, costs, capabilities, recommended_for)
ModelInfo dataclass with cost tracking fields
/api/models/ endpoints: list, recommendations, compare, cost-estimate
Per-agent model_config JSON field with provider/model/temperature/max_tokens
Per-service system settings (orchestrator_llm, chatbot, codegraph, rag, etc.)
Marketplace with llm item type defined but not yet built; marketplace-llms-tab.tsx exists as placeholder
Analytics with token usage and cost tracking in WorkflowExecution.models_used
Credential resolution: settings → name patterns → type-based → env vars
What's Missing
Only OpenAI actively used; Anthropic connected but underutilized
No OpenRouter or aggregator integration
No BYOK key management for end users
No LLM marketplace UI (browse, compare, install models to workspace)
No model-level analytics (cost per model, tokens per model, success rate per model)
No workspace-level model access control (plan-based model gating)
No usage quotas or billing integration
Model costs in registry are hardcoded, not dynamically updated
No model health monitoring or automatic fallback
Target State
Architecture
Implementation Details
Part 1: OpenRouter Integration (Tier 2)
1A. New LLM Client — openrouter_client.py
File: orchestrator/core/llm/openrouter_client.py
OpenRouter uses the OpenAI-compatible API format. The client extends the existing OpenAI client pattern with:
Key differences from OpenAI client:
Different base_url
Extra headers (HTTP-Referer, X-Title) for OpenRouter's leaderboard
Model IDs use provider prefix format: anthropic/claude-3.5-sonnet, meta-llama/llama-3.1-405b
Rate-limit info returned in response headers (x-ratelimit-*)
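Because OpenRouter speaks the OpenAI wire format, the client mostly differs in base URL, headers, and model-ID format. A minimal sketch of the request construction (function names are illustrative, not the Automatos client; the attribution headers follow OpenRouter's documented convention):

```python
# Sketch of an OpenRouter chat-completion call using only the stdlib.
# build_request/chat are illustrative names, not the real client API.
import json
import urllib.request

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

def build_request(api_key: str, model: str, messages: list,
                  app_url: str, app_name: str):
    """Return (url, headers, payload) for an OpenRouter chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Optional attribution headers used by OpenRouter's leaderboard
        "HTTP-Referer": app_url,
        "X-Title": app_name,
    }
    # Model IDs use the provider-prefix format, e.g.
    # "anthropic/claude-3.5-sonnet" or "meta-llama/llama-3.1-405b".
    payload = {"model": model, "messages": messages}
    return f"{OPENROUTER_BASE_URL}/chat/completions", headers, payload

def chat(api_key: str, model: str, messages: list,
         app_url: str, app_name: str) -> str:
    url, headers, payload = build_request(api_key, model, messages,
                                          app_url, app_name)
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

In practice the client would reuse the existing OpenAI client with a swapped base_url and default headers; the sketch shows the wire-level shape.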
1B. Register Provider in LLM Manager
File: orchestrator/core/llm/manager.py
Add openrouter to the provider resolution chain. When system settings or agent config specifies provider: openrouter, route to the new client.
File: orchestrator/core/llm/base.py
Add OPENROUTER = "openrouter" to LLMProvider enum.
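The two wiring changes can be sketched as follows (the other enum members and the string client names are placeholders for whatever base.py and manager.py already define):

```python
# Illustrative sketch of extending LLMProvider and the manager's
# provider-to-client routing. Only OPENROUTER is the actual addition;
# everything else stands in for existing code.
from enum import Enum

class LLMProvider(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"
    OPENROUTER = "openrouter"   # new member in base.py

def resolve_client(provider: LLMProvider) -> str:
    # manager.py-style resolution: when settings or agent config says
    # provider: openrouter, route to the new client.
    clients = {
        LLMProvider.OPENAI: "OpenAIClient",
        LLMProvider.ANTHROPIC: "AnthropicClient",
        LLMProvider.GOOGLE: "GoogleClient",
        LLMProvider.OPENROUTER: "OpenRouterClient",  # new route
    }
    return clients[provider]
```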
1C. Seed OpenRouter Models in Registry
File: orchestrator/core/llm/model_registry.py
Add 30-50 most popular OpenRouter models with full metadata. Group by category:
Fast & Cheap (for simple agent tasks):
meta-llama/llama-3.1-8b-instruct — $0.055/1M, 128K context
mistralai/mistral-7b-instruct — $0.055/1M, 32K context
google/gemini-flash-1.5 — $0.075/1M, 1M context
qwen/qwen-2.5-7b-instruct — $0.054/1M, 128K context
Balanced (for most agent work):
meta-llama/llama-3.1-70b-instruct — $0.52/1M, 128K context
mistralai/mixtral-8x22b-instruct — $0.90/1M, 65K context
deepseek/deepseek-chat — $0.14/1M, 128K context
anthropic/claude-3.5-sonnet (via OpenRouter) — $3.00/1M, 200K context
Premium (for complex reasoning):
anthropic/claude-3-opus — $15.00/1M, 200K context
openai/gpt-4o — $2.50/1M, 128K context
google/gemini-pro-1.5 — $1.25/1M, 2M context
Coding Specialists:
deepseek/deepseek-coder — $0.14/1M, 128K context
qwen/qwen-2.5-coder-32b — $0.18/1M, 128K context
Each model entry includes full ModelInfo:
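A hypothetical entry, using the metadata fields the registry is described as holding (context_window, costs, capabilities, recommended_for); the real ModelInfo signature in model_registry.py may differ:

```python
# Illustrative ModelInfo shape and one seeded entry. Field names follow
# the metadata described in Current State; the actual dataclass in
# model_registry.py is authoritative.
from dataclasses import dataclass, field

@dataclass
class ModelInfo:
    model_id: str
    provider: str
    context_window: int
    input_cost_per_1m: float    # USD per 1M input tokens
    output_cost_per_1m: float   # USD per 1M output tokens
    capabilities: list = field(default_factory=list)
    recommended_for: list = field(default_factory=list)

LLAMA_31_8B = ModelInfo(
    model_id="meta-llama/llama-3.1-8b-instruct",
    provider="openrouter",
    context_window=128_000,
    input_cost_per_1m=0.055,
    output_cost_per_1m=0.055,
    capabilities=["function_calling", "streaming"],
    recommended_for=["simple_tasks", "routing"],
)
```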
1D. Dynamic Model Sync from OpenRouter API
File: New orchestrator/core/llm/openrouter_sync.py
OpenRouter provides GET https://openrouter.ai/api/v1/models — returns all available models with pricing. Build a sync service that:
Fetches model list periodically (daily cron or on-demand)
Updates ModelInfo entries with current pricing
Flags deprecated models
Adds new models automatically
Stores in database table (not just in-memory registry)
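The sync loop above reduces to a fetch plus a set diff against what the database already knows. A sketch, with the upsert and deprecation helpers standing in for the real persistence layer:

```python
# Sketch of the daily sync against GET https://openrouter.ai/api/v1/models.
# fetch_openrouter_models, upsert, and flag_deprecated are placeholders
# for the real HTTP and database code.
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"

def fetch_openrouter_models() -> list[dict]:
    """Fetch the live model catalog (ids plus pricing)."""
    with urllib.request.urlopen(MODELS_URL, timeout=30) as resp:
        return json.load(resp)["data"]

def diff_models(live: list[dict], known_ids: set[str]):
    """Return (new_ids, deprecated_ids) relative to the DB catalog."""
    live_ids = {m["id"] for m in live}
    return live_ids - known_ids, known_ids - live_ids

def sync(live: list[dict], known_ids: set[str], upsert, flag_deprecated):
    new_ids, deprecated_ids = diff_models(live, known_ids)
    for model in live:
        upsert(model)              # refresh pricing; inserts new models too
    for model_id in deprecated_ids:
        flag_deprecated(model_id)  # no longer offered upstream
```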
Part 2: Activate Anthropic & Google (Tier 1)
2A. Verify Anthropic Client
File: orchestrator/core/llm/anthropic_client.py
Already implemented. Verify:
Tool/function calling works (Anthropic uses different format)
Streaming works
All Claude 3.x and 3.5 models registered in ModelRegistry
Cost tracking accurate
Models to ensure are registered:
claude-3-5-sonnet-20241022 — $3.00 in / $15.00 out per 1M, 200K context — Best balance of capability and cost
claude-3-5-haiku-20241022 — $0.80 in / $4.00 out per 1M, 200K context — Fast, cheap, good for simple tasks
claude-3-opus-20240229 — $15.00 in / $75.00 out per 1M, 200K context — Most capable, complex reasoning
claude-sonnet-4-5-20250929 — $3.00 in / $15.00 out per 1M, 200K context — Latest Sonnet
claude-haiku-4-5-20251001 — $0.80 in / $4.00 out per 1M, 200K context — Latest Haiku
2B. Activate Google Gemini
File: orchestrator/core/llm/google_client.py
Already implemented. Verify and add to registry:
gemini-2.0-flash — $0.075 in / $0.30 out per 1M, 1M context — Extremely cheap, fast, huge context
gemini-2.0-flash-lite — $0.0375 in / $0.15 out per 1M, 1M context — Even cheaper
gemini-1.5-pro — $1.25 in / $5.00 out per 1M, 2M context — Largest context window available
2C. System Settings — Default Model per Service
File: orchestrator/api/system_settings.py
Ensure each service category has configurable defaults:
orchestrator_llm — gpt-4o-mini — Routing decisions are simple
chatbot — gpt-4o or claude-3-5-sonnet — User-facing, needs quality
codegraph — deepseek-coder or gpt-4o — Code-specific
document_processing — gemini-2.0-flash — Cheap, huge context for docs
rag — gpt-4o-mini — Retrieval augmentation is simple
embeddings — BAAI/bge-large-en-v1.5 — Keep existing (good quality, local)
Part 3: BYOK — Bring Your Own Key (Tier 3)
3A. User API Key Storage
File: orchestrator/core/models/core.py — new UserApiKey model
Uses existing CREDENTIAL_ENCRYPTION_KEY for encryption at rest.
3B. API Keys CRUD Endpoint
File: New orchestrator/api/user_api_keys.py
3C. Key Resolution in LLM Manager
File: orchestrator/core/llm/manager.py
Update credential resolution order:
Agent-level model_config (explicit key — admin only)
User API key for the provider (BYOK — new)
System settings credential mapping (platform key)
Environment variable fallback
When a BYOK key exists and the workspace is on a BYOK-eligible plan, use their key. Track usage separately for billing (platform fee only, no token markup).
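The four-step resolution order can be sketched as a single function; the lookup dicts stand in for the real settings and credential services:

```python
# Sketch of the updated key resolution chain in manager.py. The dict
# arguments are placeholders for agent config, the user_api_keys table,
# and system settings respectively.
import os

def resolve_api_key(provider: str, agent_config: dict,
                    user_keys: dict, system_settings: dict):
    # 1. Agent-level explicit key in model_config (admin only)
    if key := agent_config.get("api_key"):
        return key
    # 2. BYOK: the user's own key for this provider (new step)
    if key := user_keys.get(provider):
        return key
    # 3. Platform credential from system settings
    if key := system_settings.get(f"{provider}_api_key"):
        return key
    # 4. Environment variable fallback, e.g. OPENAI_API_KEY
    return os.environ.get(f"{provider.upper()}_API_KEY")
```

Usage tracking would then branch on which step matched, so BYOK calls are billed as platform fee only.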
3D. Frontend — API Keys Management
File: frontend/components/settings/ApiKeysSettingsTab.tsx (new)
Add "API Keys" tab in Settings:
List connected providers with status (active/inactive)
Add key with provider dropdown and paste field
Test key button (validates with a small API call)
Usage stats per key
Remove key
File: frontend/components/settings/SettingsPanel.tsx
Add new tab (6-column grid):
Part 4: LLM Marketplace UI
4A. Database — Model Catalog Table
File: New alembic migration
Create llm_models table for persistent model catalog (currently in-memory registry):
4B. Seed Default Models
File: New seed script or migration
Pre-populate llm_models with all Tier 1 and key Tier 2 models. Mark Tier 1 models as is_default = true (auto-installed in all workspaces).
Default models (included in all plans):
gpt-4o-mini (OpenAI)
gemini-2.0-flash (Google)
claude-3-5-haiku (Anthropic)
Pro plan models (available on Pro+):
gpt-4o (OpenAI)
claude-3-5-sonnet (Anthropic)
gemini-1.5-pro (Google)
All OpenRouter models
Enterprise models:
gpt-4-turbo (OpenAI)
claude-3-opus (Anthropic)
All BYOK models
4C. Marketplace API — LLM Endpoints
File: orchestrator/api/marketplace.py (extend existing)
4D. Frontend — LLM Marketplace Tab
File: frontend/components/marketplace/marketplace-llms-tab.tsx (replace placeholder)
Full marketplace experience:
Browse View:
Filter bar: Provider | Category | Price Range | Context Size | Capabilities
Sort: Most Popular | Cheapest | Largest Context | Newest
Grid of model cards (see 4E below)
Category sections: "Fast & Cheap", "Balanced", "Premium", "Coding", "Creative"
Model Card Component:
Detail Modal (click on card):
Full description and changelog
Performance benchmarks
Cost calculator (estimate monthly cost based on usage)
Usage in your workspace (if installed)
Similar models recommendations
Compare View:
Side-by-side table of 2-4 models
Capability radar chart
Cost comparison for sample workloads
Context window comparison
4E. Frontend — Model Card Component
File: frontend/components/marketplace/llm-model-card.tsx (new)
Props:
4F. Frontend — Model Detail Modal
File: frontend/components/marketplace/llm-model-detail-modal.tsx (new)
Sections:
Overview: Name, provider, description, tier badge
Specifications: Context window, max output, cost per 1K/1M tokens
Capabilities: Visual bars (reasoning, coding, analysis, creativity, speed)
Features: Function calling, streaming, vision, JSON mode
Recommended For: Task type badges
Cost Calculator: Input estimated tokens/month → monthly cost
Usage in Workspace: Token usage, cost to date (if installed)
Similar Models: Other models in same category/price range
Part 5: Agent Model Selection UI
5A. Agent Create/Edit — Model Picker
File: frontend/components/agents/ (modify existing agent form)
Replace the current simple model dropdown with a rich model picker:
Quick Select: Dropdown grouped by tier (Recommended → Direct → OpenRouter → BYOK)
Browse Models: Opens marketplace LLM tab filtered to workspace-installed models
Model Info Inline: Show cost, context, capabilities for selected model
Fallback Model: Secondary model if primary fails or is rate-limited
Agent model config structure (extends existing):
5B. Smart Model Recommendation
When creating an agent, suggest models based on:
Agent type (code_architect → coding models, communication → general models)
Task complexity (simple routing → cheap models, complex reasoning → premium)
Workspace plan (starter → default models only)
Budget preference (cost-optimized vs quality-optimized)
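These rules are simple enough to start as a heuristic table; a sketch, with category lists and plan gating that are illustrative rather than the final recommendation engine:

```python
# Heuristic sketch of the recommendation rules above. Model lists and
# plan gates are illustrative placeholders.
def recommend_models(agent_type: str, complexity: str, plan: str) -> list[str]:
    coding = ["deepseek/deepseek-coder", "qwen/qwen-2.5-coder-32b"]
    cheap = ["gpt-4o-mini", "gemini-2.0-flash", "claude-3-5-haiku"]
    premium = ["claude-3-5-sonnet", "gpt-4o"]

    if agent_type == "code_architect":
        picks = coding                  # code-specific models first
    elif complexity == "simple":
        picks = cheap                   # simple routing -> cheap models
    else:
        picks = premium                 # complex reasoning -> premium

    if plan == "starter":
        # Starter plan is gated to default models only
        defaults = set(cheap)
        picks = [m for m in picks if m in defaults] or cheap
    return picks
```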
Part 6: Analytics & Usage Tracking
6A. Database — Usage Tracking Table
File: New alembic migration
6B. Usage Tracking Middleware
File: orchestrator/core/llm/usage_tracker.py (new)
Wrap every LLM call to automatically record usage:
Integrate into each LLM client's generate() method via decorator or post-hook.
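A minimal sketch of the decorator approach, assuming each client's generate() returns token counts alongside the content (the record_usage sink stands in for the llm_usage writer):

```python
# Sketch of a usage-tracking post-hook for LLM client generate() methods.
# record_usage is a placeholder for the real llm_usage persistence call.
import functools
import time

def track_usage(record_usage):
    def decorator(generate):
        @functools.wraps(generate)
        def wrapper(self, model, messages, **kwargs):
            start = time.monotonic()
            result = generate(self, model, messages, **kwargs)
            record_usage({
                "model": model,
                "latency_ms": int((time.monotonic() - start) * 1000),
                # Assumes the client returns token counts in its result
                "input_tokens": result.get("input_tokens", 0),
                "output_tokens": result.get("output_tokens", 0),
            })
            return result
        return wrapper
    return decorator
```

Error paths would also be recorded (for the health monitor in Part 7), which the sketch omits.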
6C. Analytics API — Model Usage Endpoints
File: orchestrator/api/analytics.py (extend)
6D. Frontend — Analytics Dashboard Updates
File: frontend/components/analytics/ (extend existing)
Add "LLM Usage" section to analytics dashboard:
Cost Overview Card: Total spend this period, trend %, top 3 models by cost
Usage Chart: Stacked area chart — tokens over time, colored by model/provider
Model Breakdown Table: Model | Requests | Tokens | Cost | Avg Latency | Error Rate
Cost Optimization Tips: AI-generated suggestions based on usage patterns
Per-Agent Cost: Which agents cost the most, model recommendations
6E. Quota & Billing Integration
File: orchestrator/core/services/quota_service.py (new)
Plan-based quotas:
Starter (Free) — 100K tokens/month — Default models only — BYOK: No
Pro — 5M tokens/month — All models — BYOK: Yes
Enterprise — Unlimited — All models — BYOK: Yes
Quota enforcement:
Before each LLM call, check remaining quota (Redis counter)
If exceeded, return 429 with upgrade prompt
BYOK calls don't count against quota (user pays directly)
Track overage for billing (Pro plan: $X per 1M tokens over quota)
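The enforcement steps above reduce to a pre-call check-and-increment. A sketch where a plain dict stands in for the Redis counter (production would use INCRBY on a per-workspace, per-period key):

```python
# Sketch of pre-call quota enforcement. `counter` stands in for a Redis
# counter keyed by workspace and billing period; None means unlimited.
PLAN_LIMITS = {"starter": 100_000, "pro": 5_000_000, "enterprise": None}

class QuotaExceeded(Exception):
    """Mapped to an HTTP 429 response with an upgrade prompt."""

def check_quota(counter: dict, workspace: str, plan: str,
                estimated_tokens: int, byok: bool) -> None:
    if byok:
        return  # BYOK calls don't count against platform quota
    limit = PLAN_LIMITS.get(plan)
    if limit is None:
        return  # unlimited plan
    used = counter.get(workspace, 0)
    if used + estimated_tokens > limit:
        raise QuotaExceeded(f"{workspace} exceeded {limit} tokens this period")
    counter[workspace] = used + estimated_tokens
```

Overage tracking for Pro billing would hook in where the exception is raised, recording the excess instead of (or before) rejecting.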
Part 7: Model Health & Fallback
7A. Model Health Monitor
File: orchestrator/core/llm/health_monitor.py (new)
Track per-model health metrics in Redis:
Success rate (last 100 requests)
Average latency (last 100 requests)
Error rate and error types
Rate limit status
If a model's error rate exceeds 20% in a 5-minute window:
Mark as degraded
Auto-fallback to configured fallback model
Alert in dashboard
Retry original model after cooldown
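The degradation check is a sliding-window error rate. A sketch using a last-100-requests window (in production the window would live in Redis, and the 5-minute time bound would be layered on top):

```python
# Sketch of the per-model degradation check over a sliding window of
# request outcomes. A deque stands in for the Redis-backed window.
from collections import deque

WINDOW = 100          # last N requests tracked
ERROR_THRESHOLD = 0.20  # >20% errors -> degraded

class ModelHealth:
    def __init__(self):
        self.outcomes = deque(maxlen=WINDOW)  # True = success

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    @property
    def degraded(self) -> bool:
        if not self.outcomes:
            return False
        errors = self.outcomes.count(False)
        return errors / len(self.outcomes) > ERROR_THRESHOLD
```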
7B. Automatic Fallback Chain
Agent model config supports fallback:
LLM Manager tries each model in order until one succeeds.
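The fallback loop is straightforward; a sketch where call_model stands in for the manager's dispatch to the provider client:

```python
# Sketch of the try-in-order fallback chain. call_model is a placeholder
# for the LLM Manager's dispatch; real code would catch narrower
# exceptions (rate limit, timeout, provider error) than bare Exception.
def generate_with_fallback(call_model, primary: str,
                           fallbacks: list, prompt: str):
    last_error = None
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:
            last_error = exc  # try the next model in the chain
    raise RuntimeError(f"All models failed; last error: {last_error}")
```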
Database Changes Summary
llm_models — NEW — Model catalog with pricing, capabilities, marketplace metadata
workspace_models — NEW — Which models a workspace has installed
user_api_keys — NEW — BYOK encrypted API keys per workspace
llm_usage — NEW — Per-request usage tracking for analytics and billing
system_settings — Add default model settings for each service category
API Changes Summary
GET /api/marketplace/items?type=llm — Browse LLM models
POST /api/marketplace/items/{id}/install — Install model to workspace
GET /api/marketplace/llm/compare — Compare models side-by-side
GET /api/marketplace/llm/recommend — Smart recommendations
CRUD /api/keys — Manage BYOK API keys
POST /api/keys/{id}/test — Validate API key
GET /api/analytics/llm/usage — Usage analytics
GET /api/analytics/llm/costs — Cost analytics
GET /api/analytics/llm/summary — Dashboard summary
GET /api/analytics/llm/recommendations — Cost optimization tips
Frontend Changes Summary
marketplace-llms-tab.tsx — REWRITE — Full LLM marketplace browse/install
llm-model-card.tsx — NEW — Rich model card component
llm-model-detail-modal.tsx — NEW — Model detail view
llm-compare-view.tsx — NEW — Side-by-side model comparison
ApiKeysSettingsTab.tsx — NEW — BYOK key management
SettingsPanel.tsx — Add API Keys tab
Agent create/edit form — Add rich model picker
Analytics dashboard — Add LLM usage section
Implementation Phases
Phase 1: Foundation (Tier 1 Activation)
Verify Anthropic client works with function calling and streaming
Verify Google Gemini client works
Seed all Tier 1 models in registry with accurate pricing
Update system settings defaults (cheap models for simple services)
Ensure analytics tracks all LLM calls with token/cost data
Phase 2: Model Catalog & Marketplace UI
Create llm_models and workspace_models tables
Seed model catalog
Build LLM marketplace tab (browse, filter, compare, install)
Build model card and detail modal components
Update agent create/edit with rich model picker
Phase 3: OpenRouter Integration (Tier 2)
Build OpenRouter client
Add OpenRouter models to catalog
Build model sync service (daily pricing updates)
Test function calling across OpenRouter models
Phase 4: BYOK & Billing (Tier 3)
Create user_api_keys table
Build BYOK key management API and UI
Update LLM Manager key resolution for BYOK
Build llm_usage tracking table and middleware
Implement quota service and plan limits
Phase 5: Analytics & Optimization
Build LLM analytics endpoints
Update analytics dashboard with cost tracking
Build cost optimization recommendations engine
Build model health monitor and auto-fallback
Verification
Tier 1: Create agent with Claude Sonnet → chat → verify response streams correctly
Tier 1: Create agent with Gemini Flash → chat → verify function calling works
Marketplace: Browse LLMs → filter by "coding" → install DeepSeek Coder → assign to agent → test
Tier 2: Create agent with Llama 3.1 70B (via OpenRouter) → complex prompt → verify quality
BYOK: Add personal OpenAI key → create agent → verify uses personal key → verify no platform quota deducted
Analytics: Run 10 chat messages across 3 models → verify dashboard shows per-model costs
Fallback: Block a model (rate limit) → verify auto-fallback to secondary model
Quota: Set low quota on starter plan → exceed it → verify 429 response with upgrade message