LLM Usage Tracking
Purpose and Scope
This document describes the LLM usage tracking system that records every LLM API call for cost calculation, analytics, and optimization. The system captures token counts, latency, model information, and calculates costs based on a model pricing registry. Usage data is workspace-scoped and powers the analytics dashboard.
The tracking system integrates with multiple LLM providers (OpenAI, Anthropic, OpenRouter, Google, Azure OpenAI, xAI, Cohere) and supports both platform-provided keys and user-provided BYOK (Bring Your Own Key) credentials. All tracked usage is attributed to workspaces and optionally to specific agents or workflow executions.
Key Capabilities:
Per-request token and cost tracking for all LLM providers
Workspace-scoped analytics with admin override for platform-wide views
Cost projections and optimization recommendations
BYOK vs platform key usage differentiation
Error rate and latency monitoring
Real-time and cached aggregate statistics
Sources: orchestrator/core/llm/usage_tracker.py:1-150, orchestrator/api/llm_analytics.py:1-50, orchestrator/config.py:103-137
LLM Provider Configuration
Before usage can be tracked, LLM providers must be configured with API keys. The system supports multiple credential resolution strategies:
Configuration Hierarchy
Supported Providers

| Provider | Config Attribute | Environment Variable(s) | Notes |
|---|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_API_KEY | Primary provider, supports GPT models |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_API_KEY | Claude models |
| OpenRouter | OPENROUTER_API_KEY | OPENROUTER_API_KEY | Multi-model router |
| Google | GOOGLE_API_KEY | GOOGLE_API_KEY or GEMINI_API_KEY | Gemini models |
| Azure OpenAI | AZURE_OPENAI_API_KEY | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | Enterprise deployments |
| xAI | XAI_API_KEY | XAI_API_KEY | Grok models |
| Cohere | COHERE_API_KEY | COHERE_API_KEY | Reranking and embeddings |
Configuration Class Structure
The Config class provides centralized access to all LLM provider credentials:
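As an illustrative sketch of that class (attribute names here are assumptions; the env var names match the provider table above, and the real implementation lives in orchestrator/config.py):

```python
import os

class Config:
    """Centralized LLM provider credentials, loaded from environment
    variables at startup (illustrative sketch, not the real class)."""

    def __init__(self) -> None:
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        self.anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
        self.openrouter_api_key = os.getenv("OPENROUTER_API_KEY")
        # Google accepts either variable name
        self.google_api_key = os.getenv("GOOGLE_API_KEY") or os.getenv("GEMINI_API_KEY")
        self.azure_openai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
        self.azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
        self.xai_api_key = os.getenv("XAI_API_KEY")
        self.cohere_api_key = os.getenv("COHERE_API_KEY")
        # No hardcoded defaults: provider/model must be configured explicitly,
        # and can later be overridden via the system_settings table
        self.llm_provider = os.getenv("LLM_PROVIDER")
        self.llm_model = os.getenv("LLM_MODEL")

# Singleton instance imported throughout the application
config = Config()
```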
Key Points:
Configuration values are loaded from environment variables at startup
LLM provider and model can be overridden in the database system_settings table
No hardcoded defaults for provider/model — must be explicitly configured
config is a singleton instance available throughout the application
Sources: orchestrator/config.py:103-137, orchestrator/.env.example:18-26
Database Schema
The usage tracking system uses three primary database constructs:
| Construct | Purpose | Key Columns |
|---|---|---|
| llm_usage | Records individual LLM API calls | workspace_id, model_id, provider, input_tokens, output_tokens, input_cost, output_cost, total_cost, latency_ms, agent_id, execution_id, request_type, tier, is_byok, status, created_at |
| llm_models | Model pricing registry | model_id, provider, input_cost_per_1k_tokens, output_cost_per_1k_tokens, context_window, tier, capabilities |
| agents.model_usage_stats | Cached per-agent aggregates (JSONB) | total_tokens, total_cost, total_requests, avg_tokens_per_request, last_used_at, input_tokens, output_tokens |
LLMUsage Table Structure
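The shape of one llm_usage row can be sketched as a plain dataclass (the field names come from the schema above; the types are assumptions, and the real model is a SQLAlchemy class in orchestrator/core/models/core.py):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class LLMUsageRow:
    """Illustrative shape of a single llm_usage record."""
    workspace_id: str
    model_id: str
    provider: str
    input_tokens: int
    output_tokens: int
    input_cost: float
    output_cost: float
    total_cost: float
    latency_ms: Optional[int] = None          # performance monitoring
    agent_id: Optional[int] = None            # attribution to an agent
    execution_id: Optional[str] = None        # workflow/recipe traceability
    request_type: str = "chat"
    tier: Optional[str] = None
    is_byok: bool = False                     # user-provided key vs platform key
    status: str = "success"                   # "success" | "error"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```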
Key Relationships:
LLMUsage.workspace_id → Workspace.id: Multi-tenant isolation
LLMUsage.agent_id → Agent.id: Attribution to specific agents
LLMUsage.model_id → LLMModel.model_id: Cost lookup in the pricing registry
LLMUsage.is_byok = true → UserApiKey: Indicates workspace-provided credentials were used
Agent.model_usage_stats: Cached JSONB aggregate of usage per agent
Sources: orchestrator/core/models/core.py:200-250, orchestrator/api/llm_analytics.py:30-60, orchestrator/api/agents.py:230-240
Usage Tracking Flow
Recording a Usage Event
Every LLM API call is tracked through the UsageTracker.track() static method. The tracker runs in a separate database session to ensure tracking failures never break the parent transaction.
Critical Design Decisions:
Separate session: UsageTracker creates its own SessionLocal() to isolate tracking from the parent transaction
Never throws: All exceptions are caught and logged; tracking failures never break agent execution
BYOK flag: Captures whether user-provided (BYOK) or platform credentials were used
Dual aggregation: Writes a granular llm_usage row and updates the cached agent.model_usage_stats JSONB
Sources: orchestrator/core/llm/usage_tracker.py:17-95
Cost Calculation
Pricing Model
Costs are calculated using a per-1000-token pricing model stored in the llm_models registry:
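The arithmetic can be sketched as follows (the rates shown in the usage example are hypothetical, not actual registry prices):

```python
def calculate_cost(input_tokens: int, output_tokens: int,
                   input_cost_per_1k: float, output_cost_per_1k: float):
    """Per-1000-token pricing, as stored in the llm_models registry."""
    input_cost = (input_tokens / 1000) * input_cost_per_1k
    output_cost = (output_tokens / 1000) * output_cost_per_1k
    return input_cost, output_cost, input_cost + output_cost

# Example with made-up rates of $0.0025 / 1K input and $0.010 / 1K output tokens:
in_cost, out_cost, total = calculate_cost(1200, 400, 0.0025, 0.010)
```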
UsageTracker.track() Method
The tracking method accepts these parameters:
| Parameter | Type | Description |
|---|---|---|
| workspace_id | UUID | Multi-tenant scoping |
| model_id | str | Model identifier (e.g., "gpt-4o", "claude-3-5-sonnet-20241022") |
| provider | str | Provider name ("openai", "anthropic", "openrouter", "google", "azure", "xai", "cohere") |
| input_tokens | int | Prompt token count |
| output_tokens | int | Completion token count |
| agent_id | int? | Optional agent attribution |
| execution_id | str? | Optional workflow/recipe execution ID for traceability |
| request_type | str | "chat", "completion", "embedding" (default: "chat") |
| latency_ms | int? | Response time in milliseconds (for performance monitoring) |
| status | str | "success" or "error" (default: "success") |
| is_byok | bool | Whether a user-provided API key was used (default: False) |
| tier | str | Model tier ("premium", "standard", "fast") or routing tier ("direct", "tier1", "tier2") |
| error_message | str? | Error details if status="error" |
Implementation:
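A minimal sketch of the flow under stated assumptions: SessionLocal, the ORM model, and the pricing lookup are stand-ins (here a session factory and per-1K rates are passed in, and the row is a plain dict):

```python
import logging

logger = logging.getLogger(__name__)

class UsageTracker:
    """Sketch of UsageTracker.track(); not the real implementation."""

    @staticmethod
    def track(session_factory, workspace_id, model_id, provider,
              input_tokens, output_tokens, *, agent_id=None, execution_id=None,
              request_type="chat", latency_ms=None, status="success",
              is_byok=False, tier=None, error_message=None, pricing=None):
        try:
            # Separate session: tracking never shares the caller's transaction
            session = session_factory()
            try:
                in_rate, out_rate = pricing or (0.0, 0.0)  # per-1K-token rates
                input_cost = input_tokens / 1000 * in_rate
                output_cost = output_tokens / 1000 * out_rate
                row = {
                    "workspace_id": workspace_id, "model_id": model_id,
                    "provider": provider, "input_tokens": input_tokens,
                    "output_tokens": output_tokens, "input_cost": input_cost,
                    "output_cost": output_cost,
                    "total_cost": input_cost + output_cost,
                    "agent_id": agent_id, "execution_id": execution_id,
                    "request_type": request_type, "latency_ms": latency_ms,
                    "status": status, "is_byok": is_byok, "tier": tier,
                    "error_message": error_message,
                }
                session.add(row)  # granular llm_usage row (ORM model stand-in)
                session.commit()
                return row
            finally:
                session.close()
        except Exception:
            # Never throws: tracking failures must not break agent execution
            logger.exception("usage tracking failed")
            return None
```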
Sources: orchestrator/core/llm/usage_tracker.py:20-95
API Endpoints
Query Endpoints
The /api/analytics/llm router provides endpoints for querying usage data:
Endpoint Details
GET /api/analytics/llm/usage
Groups usage data by dimension (model, provider, agent, tier, is_byok, request_type).
Query Parameters:
period: "1h" | "24h" | "7d" | "30d" | "90d"
group_by: "model" | "provider" | "agent" | "tier" | "is_byok" | "request_type"
Response: List[UsageGroup]
GET /api/analytics/llm/summary
Dashboard summary with totals, top models, and daily cost trend.
Response: UsageSummary
GET /api/analytics/llm/recommendations
AI-generated cost optimization suggestions based on usage patterns.
Logic:
Identifies agents using premium models (gpt-4o, claude-3-opus) for simple tasks (avg output < 200 tokens)
Suggests switching to cheaper models (gpt-4o-mini, claude-haiku)
Calculates potential savings (~85% reduction)
Response: List[Recommendation]
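The recommendation logic above can be sketched like this (the premium-to-cheaper mapping, input shape, and threshold constant are illustrative; the real logic lives in orchestrator/api/llm_analytics.py):

```python
# Hypothetical mapping from premium models to cheaper alternatives
PREMIUM_TO_CHEAPER = {
    "gpt-4o": "gpt-4o-mini",
    "claude-3-opus": "claude-haiku",
}
SIMPLE_TASK_OUTPUT_TOKENS = 200   # avg output below this suggests a simple task
SAVINGS_FACTOR = 0.85             # ~85% cost reduction when switching

def recommend(agent_stats):
    """agent_stats: dicts with agent_id, model_id, avg_output_tokens, total_cost."""
    recs = []
    for s in agent_stats:
        cheaper = PREMIUM_TO_CHEAPER.get(s["model_id"])
        if cheaper and s["avg_output_tokens"] < SIMPLE_TASK_OUTPUT_TOKENS:
            recs.append({
                "agent_id": s["agent_id"],
                "switch_to": cheaper,
                "estimated_savings": round(s["total_cost"] * SAVINGS_FACTOR, 2),
            })
    return recs
```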
Sources: orchestrator/api/llm_analytics.py:87-320
Workspace Scoping and Admin Override
Standard Workspace Filtering
All analytics queries are automatically filtered by workspace_id from the RequestContext:
Admin Override Mechanism
Admins can view platform-wide analytics using the __all__ workspace sentinel:
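The scoping rule can be sketched as follows (the real code filters a SQLAlchemy query rather than an in-memory list; the helper name is illustrative, the __all__ sentinel is from the source):

```python
ALL_WORKSPACES = "__all__"  # admin sentinel for platform-wide views

def apply_workspace_scope(rows, workspace_id, is_admin=False):
    """Filter analytics rows to one workspace, unless an admin
    requests the platform-wide view via the __all__ sentinel."""
    if is_admin and workspace_id == ALL_WORKSPACES:
        return rows  # platform-wide: no filtering
    return [r for r in rows if r["workspace_id"] == workspace_id]
```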
Frontend Implementation:
The AdminWorkspaceSwitcher component allows admins to switch between workspaces:
Backend Implementation:
Admin Endpoints:
Sources: orchestrator/core/auth/hybrid.py:310-354, orchestrator/api/llm_analytics.py:739-820, frontend/components/analytics/admin-workspace-switcher.tsx:1-64
Frontend Integration
React Query Hooks
The use-unified-analytics.ts hook provides typed access to LLM usage data:
Cache Key Strategy
Query keys include workspace scope to prevent data leakage when admin switches workspaces:
Cost Analytics UI Component
The AnalyticsCosts component displays token usage, costs, and trends:
Key Features:
Hero stats (total cost, tokens, cost per request, top spender)
Multi-line stacked area chart showing daily cost by model
Cost projections with monthly estimates
Model comparison radar chart
Per-agent cost breakdown table
Data Flow:
Sources: frontend/hooks/use-unified-analytics.ts:288-396, frontend/components/analytics/analytics-costs.tsx:1-700
Dual Aggregation Strategy
The system maintains usage data in two locations for different query patterns:
Real-Time Tracking (Primary)
Table: llm_usage
Granular per-request records
Queryable by date range, model, agent, workspace
Supports time-series analytics and cost projections
Ground truth for billing
Cached Aggregates (Secondary)
Field: agents.model_usage_stats (JSONB)
Pre-aggregated per-agent totals
Updated on every tracked request
Fast for agent list queries
Used as fallback when
llm_usagehas no data
Update Logic:
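A sketch of folding one tracked request into the cached JSONB aggregate (field names come from the schema above; the function signature is illustrative):

```python
def update_model_usage_stats(stats, input_tokens, output_tokens, cost, now):
    """Update agents.model_usage_stats on every tracked request."""
    stats = dict(stats or {})  # JSONB may start out null
    stats["total_requests"] = stats.get("total_requests", 0) + 1
    stats["input_tokens"] = stats.get("input_tokens", 0) + input_tokens
    stats["output_tokens"] = stats.get("output_tokens", 0) + output_tokens
    stats["total_tokens"] = stats.get("total_tokens", 0) + input_tokens + output_tokens
    stats["total_cost"] = round(stats.get("total_cost", 0.0) + cost, 6)
    stats["avg_tokens_per_request"] = stats["total_tokens"] / stats["total_requests"]
    stats["last_used_at"] = now
    return stats
```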
Frontend Fallback:
Sources: orchestrator/core/llm/usage_tracker.py:67-95, frontend/hooks/use-unified-analytics.ts:288-396, orchestrator/api/agents.py:230-240
Cost Projection Algorithm
Projection Calculation
Monthly cost projections account for sparse usage data by counting actual days with activity:
Why This Matters:
Using days_in_period instead of days_with_data would underestimate costs:
If a workspace only used LLM on 5 days out of a 30-day period
current_cost / 30 × 30 = current_cost (incorrect)
current_cost / 5 × 30 = 6 × current_cost (correct)
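The calculation can be sketched as (function name is illustrative; 30 is used as the projected month length, as in the example above):

```python
def project_monthly_cost(current_cost: float, days_with_data: int) -> float:
    """Project a monthly cost from sparse usage data by dividing by the
    number of days that actually had activity, not the calendar length
    of the reporting period."""
    if days_with_data == 0:
        return 0.0
    daily_rate = current_cost / days_with_data
    return daily_rate * 30
```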
Projection Response
Sources: orchestrator/api/llm_analytics.py:490-602
OpenRouter Integration
Specialized Endpoints
The system includes dedicated endpoints for OpenRouter analytics:
GET /api/analytics/llm/openrouter/credits
Fetch account credits balance
GET /api/analytics/llm/openrouter/key-info
Query key limits, daily/weekly/monthly usage
POST /api/analytics/llm/openrouter/sync
Sync activity data into llm_usage table (BYOK only)
Key Resolution Strategy
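A sketch of the resolution order, assuming a workspace's BYOK key takes precedence over the platform-level OPENROUTER_API_KEY (the helper name and return shape are illustrative):

```python
def resolve_openrouter_key(workspace_byok_key, platform_key):
    """Return (api_key, is_byok): prefer the workspace's own BYOK key,
    fall back to the platform OPENROUTER_API_KEY."""
    if workspace_byok_key:
        return workspace_byok_key, True
    return platform_key, False
```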
Sync Activity:
The sync endpoint prevents cross-workspace data duplication by requiring BYOK keys:
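The guard can be sketched like this (function and exception names are illustrative; the BYOK requirement is from the source):

```python
class SyncRejected(Exception):
    """Raised when sync is attempted with a non-BYOK (platform) key."""

def sync_openrouter_activity(workspace_id, api_key, is_byok):
    """Guard on POST /api/analytics/llm/openrouter/sync: only a workspace's
    own BYOK key may be synced. Syncing the shared platform key would import
    the same OpenRouter activity into every workspace that calls sync."""
    if not is_byok:
        raise SyncRejected("OpenRouter sync requires a BYOK key for this workspace")
    # ...fetch activity from OpenRouter and upsert rows into llm_usage...
    return {"workspace_id": workspace_id, "synced": True}
```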
Sources: orchestrator/api/llm_analytics.py:604-737
Error Handling and Isolation
Separate Session Strategy
The UsageTracker uses a dedicated database session to ensure tracking failures never break the parent transaction:
Status Field
The status field distinguishes successful vs. failed LLM calls:
status='success': LLM call completed, tokens counted
status='error': LLM call failed, tracked for error rate metrics
Error Rate Calculation:
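A sketch of the metric over llm_usage rows (the real query aggregates in SQL; this in-memory version shows the same ratio):

```python
def error_rate(rows) -> float:
    """Fraction of LLM calls with status='error' out of all tracked calls."""
    if not rows:
        return 0.0
    errors = sum(1 for r in rows if r["status"] == "error")
    return errors / len(rows)
```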
Sources: orchestrator/core/llm/usage_tracker.py:36-95, orchestrator/api/llm_analytics.py:210-260