LLM Usage Tracking


Purpose and Scope

This document describes the LLM usage tracking system that records every LLM API call for cost calculation, analytics, and optimization. The system captures token counts, latency, model information, and calculates costs based on a model pricing registry. Usage data is workspace-scoped and powers the analytics dashboard.

The tracking system integrates with multiple LLM providers (OpenAI, Anthropic, OpenRouter, Google, Azure OpenAI, xAI, Cohere) and supports both platform-provided keys and user-provided BYOK (Bring Your Own Key) credentials. All tracked usage is attributed to workspaces and optionally to specific agents or workflow executions.

Key Capabilities:

  • Per-request token and cost tracking for all LLM providers

  • Workspace-scoped analytics with admin override for platform-wide views

  • Cost projections and optimization recommendations

  • BYOK vs platform key usage differentiation

  • Error rate and latency monitoring

  • Real-time and cached aggregate statistics

Sources: orchestrator/core/llm/usage_tracker.py:1-150, orchestrator/api/llm_analytics.py:1-50, orchestrator/config.py:103-137


LLM Provider Configuration

Before usage can be tracked, LLM providers must be configured with API keys. The system supports multiple credential resolution strategies:

Configuration Hierarchy

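The resolution order can be sketched as follows. Function and variable names here are illustrative, not the actual orchestrator API; the BYOK-first ordering is an assumption consistent with the BYOK support described in this document.

```python
import os

# Hypothetical sketch of the credential resolution hierarchy: a workspace's
# BYOK key wins over the platform key loaded from the environment.
def resolve_api_key(provider, workspace_byok_key=None):
    """Return (api_key, is_byok) for the given provider."""
    if workspace_byok_key:
        return workspace_byok_key, True           # 1. user-provided (BYOK) key
    platform_key = os.getenv(f"{provider.upper()}_API_KEY")
    if platform_key:
        return platform_key, False                # 2. platform key from env
    raise RuntimeError(f"No credential configured for provider {provider!r}")
```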

Supported Providers

| Provider | Config Key | Environment Variable | Notes |
|----------|------------|----------------------|-------|
| OpenAI | OPENAI_API_KEY | OPENAI_API_KEY | Primary provider, supports GPT models |
| Anthropic | ANTHROPIC_API_KEY | ANTHROPIC_API_KEY | Claude models |
| OpenRouter | OPENROUTER_API_KEY | OPENROUTER_API_KEY | Multi-model router |
| Google | GOOGLE_API_KEY | GOOGLE_API_KEY or GEMINI_API_KEY | Gemini models |
| Azure OpenAI | AZURE_OPENAI_API_KEY | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | Enterprise deployments |
| xAI | XAI_API_KEY | XAI_API_KEY | Grok models |
| Cohere | COHERE_API_KEY | COHERE_API_KEY | Reranking and embeddings |

Configuration Class Structure

The Config class provides centralized access to all LLM provider credentials:

Key Points:

  • Configuration values are loaded from environment variables at startup

  • LLM provider and model can be overridden in database system_settings table

  • No hardcoded defaults for provider/model — must be explicitly configured

  • config is a singleton instance available throughout the application
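A minimal sketch of the pattern described by these points (attribute names beyond the documented API keys are illustrative; the real class lives in orchestrator/config.py:103-137):

```python
import os

class Config:
    """Centralized access to LLM provider credentials."""
    def __init__(self):
        # Loaded from environment variables at startup
        self.OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        self.ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
        self.OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
        # No hardcoded defaults: provider/model must be explicitly configured
        # (and can later be overridden via the system_settings table)
        self.LLM_PROVIDER = os.getenv("LLM_PROVIDER")
        self.LLM_MODEL = os.getenv("LLM_MODEL")

config = Config()  # singleton instance imported throughout the application
```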

Sources: orchestrator/config.py:103-137, orchestrator/.env.example:18-26


Database Schema

The usage tracking system uses three primary database constructs:

| Table/Field | Purpose | Key Columns |
|-------------|---------|-------------|
| llm_usage | Records individual LLM API calls | workspace_id, model_id, provider, input_tokens, output_tokens, input_cost, output_cost, total_cost, latency_ms, agent_id, execution_id, request_type, tier, is_byok, status, created_at |
| llm_models | Model pricing registry | model_id, provider, input_cost_per_1k_tokens, output_cost_per_1k_tokens, context_window, tier, capabilities |
| agents.model_usage_stats | Cached per-agent aggregates (JSONB) | total_tokens, total_cost, total_requests, avg_tokens_per_request, last_used_at, input_tokens, output_tokens |

LLMUsage Table Structure

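A dataclass sketch of one llm_usage row, built from the columns listed above (the field types are assumptions; the real SQLAlchemy model lives in orchestrator/core/models/core.py):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID

@dataclass
class LLMUsageRow:
    """Illustrative shape of one llm_usage record (not the actual ORM model)."""
    workspace_id: UUID                  # multi-tenant isolation
    model_id: str                       # cost lookup in the pricing registry
    provider: str
    input_tokens: int
    output_tokens: int
    input_cost: float
    output_cost: float
    total_cost: float
    latency_ms: Optional[int] = None
    agent_id: Optional[int] = None      # attribution to a specific agent
    execution_id: Optional[str] = None
    request_type: str = "chat"
    tier: Optional[str] = None
    is_byok: bool = False               # workspace-provided credentials used
    status: str = "success"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```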

Key Relationships:

  • LLMUsage.workspace_id → Workspace.id: Multi-tenant isolation

  • LLMUsage.agent_id → Agent.id: Attribution to specific agents

  • LLMUsage.model_id → LLMModel.model_id: Cost lookup in pricing registry

  • LLMUsage.is_byok = true → UserApiKey: Indicates workspace-provided credentials were used

  • Agent.model_usage_stats: Cached JSONB aggregate of usage per agent

Sources: orchestrator/core/models/core.py:200-250, orchestrator/api/llm_analytics.py:30-60, orchestrator/api/agents.py:230-240


Usage Tracking Flow

Recording a Usage Event

Every LLM API call is tracked through the UsageTracker.track() static method. The tracker runs in a separate database session to ensure tracking failures never break the parent transaction.


Critical Design Decisions:

  1. Separate session: UsageTracker creates its own SessionLocal() to isolate tracking from parent transaction

  2. Never throws: All exceptions are caught and logged; tracking failures never break agent execution

  3. BYOK flag: Captures whether user-provided (BYOK) or platform credentials were used

  4. Dual aggregation: Writes granular llm_usage row + updates cached agent.model_usage_stats JSONB

Sources: orchestrator/core/llm/usage_tracker.py:17-95


Cost Calculation

Pricing Model

Costs are calculated using a per-1000-token pricing model stored in the llm_models registry:
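The arithmetic can be sketched as follows (the real code reads the per-1k rates from the llm_models row for the given model_id):

```python
def calculate_cost(input_tokens, output_tokens,
                   input_cost_per_1k_tokens, output_cost_per_1k_tokens):
    """Per-1000-token pricing, as stored in the llm_models registry."""
    input_cost = (input_tokens / 1000) * input_cost_per_1k_tokens
    output_cost = (output_tokens / 1000) * output_cost_per_1k_tokens
    return input_cost, output_cost, input_cost + output_cost
```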

UsageTracker.track() Method

The tracking method accepts these parameters:

| Parameter | Type | Purpose |
|-----------|------|---------|
| workspace_id | UUID | Multi-tenant scoping |
| model_id | str | Model identifier (e.g., "gpt-4o", "claude-3-5-sonnet-20241022") |
| provider | str | Provider name ("openai", "anthropic", "openrouter", "google", "azure", "xai", "cohere") |
| input_tokens | int | Prompt token count |
| output_tokens | int | Completion token count |
| agent_id | int? | Optional agent attribution |
| execution_id | str? | Optional workflow/recipe execution ID for traceability |
| request_type | str | "chat", "completion", "embedding" (default: "chat") |
| latency_ms | int? | Response time in milliseconds (for performance monitoring) |
| status | str | "success" or "error" (default: "success") |
| is_byok | bool | Whether user-provided API key was used (default: False) |
| tier | str | Model tier ("premium", "standard", "fast") or routing tier ("direct", "tier1", "tier2") |
| error_message | str? | Error details if status="error" |

Implementation:
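A simplified, self-contained sketch of the method's shape. The real implementation opens its own SessionLocal(), looks up rates in llm_models, writes an llm_usage row, and updates agent.model_usage_stats; the in-memory list and pricing dict below are stand-ins for those database structures.

```python
import logging

logger = logging.getLogger(__name__)

class UsageTracker:
    PRICING = {"gpt-4o": (0.0025, 0.01)}  # stand-in: (input, output) per 1k tokens
    records = []                          # stand-in for the llm_usage table

    @staticmethod
    def track(workspace_id, model_id, provider, input_tokens, output_tokens,
              agent_id=None, execution_id=None, request_type="chat",
              latency_ms=None, status="success", is_byok=False,
              tier=None, error_message=None):
        try:
            in_rate, out_rate = UsageTracker.PRICING.get(model_id, (0.0, 0.0))
            input_cost = input_tokens / 1000 * in_rate
            output_cost = output_tokens / 1000 * out_rate
            UsageTracker.records.append({
                "workspace_id": workspace_id, "model_id": model_id,
                "provider": provider, "input_tokens": input_tokens,
                "output_tokens": output_tokens, "input_cost": input_cost,
                "output_cost": output_cost, "total_cost": input_cost + output_cost,
                "latency_ms": latency_ms, "agent_id": agent_id,
                "execution_id": execution_id, "request_type": request_type,
                "tier": tier, "is_byok": is_byok, "status": status,
                "error_message": error_message,
            })
        except Exception:
            # Never throws: tracking failures must not break agent execution
            logger.exception("LLM usage tracking failed")
```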

Sources: orchestrator/core/llm/usage_tracker.py:20-95


API Endpoints

Query Endpoints

The /api/analytics/llm router provides endpoints for querying usage data.

Endpoint Details

GET /api/analytics/llm/usage

Groups usage data by dimension (model, provider, agent, tier, is_byok).

Query Parameters:

  • period: "1h" | "24h" | "7d" | "30d" | "90d"

  • group_by: "model" | "provider" | "agent" | "tier" | "is_byok" | "request_type"

Response: List[UsageGroup]

GET /api/analytics/llm/summary

Dashboard summary with totals, top models, and daily cost trend.

Response: UsageSummary

GET /api/analytics/llm/recommendations

AI-generated cost optimization suggestions based on usage patterns.

Logic:

  • Identifies agents using premium models (gpt-4o, claude-3-opus) for simple tasks (avg output < 200 tokens)

  • Suggests switching to cheaper models (gpt-4o-mini, claude-haiku)

  • Calculates potential savings (~85% reduction)

Response: List[Recommendation]
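The heuristic above can be sketched as follows; the premium-to-cheaper model pairs come from this document, while the input shape is an assumption:

```python
PREMIUM_TO_CHEAPER = {"gpt-4o": "gpt-4o-mini", "claude-3-opus": "claude-haiku"}

def recommend(agent_stats):
    """agent_stats: dicts with agent_id, model, avg_output_tokens, cost."""
    recommendations = []
    for stats in agent_stats:
        cheaper = PREMIUM_TO_CHEAPER.get(stats["model"])
        # A premium model doing simple tasks (short completions) is a candidate
        if cheaper and stats["avg_output_tokens"] < 200:
            recommendations.append({
                "agent_id": stats["agent_id"],
                "switch_to": cheaper,
                "estimated_savings": stats["cost"] * 0.85,  # ~85% reduction
            })
    return recommendations
```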

Sources: orchestrator/api/llm_analytics.py:87-320


Workspace Scoping and Admin Override

Standard Workspace Filtering

All analytics queries are automatically filtered by workspace_id from the RequestContext.

Admin Override Mechanism

Admins can view platform-wide analytics using the __all__ workspace sentinel:

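A sketch of the sentinel handling (the real check lives in orchestrator/core/auth/hybrid.py; rejecting non-admins is an assumption consistent with the admin-only override described here):

```python
ALL_WORKSPACES = "__all__"  # admin sentinel for platform-wide analytics

def resolve_workspace_filter(requested_workspace, is_admin):
    """Return the workspace_id to filter by, or None for a platform-wide view."""
    if requested_workspace == ALL_WORKSPACES:
        if not is_admin:
            raise PermissionError("platform-wide analytics are admin-only")
        return None  # no workspace filter applied
    return requested_workspace
```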

Frontend Implementation:

The AdminWorkspaceSwitcher component allows admins to switch between workspaces.


Sources: orchestrator/core/auth/hybrid.py:310-354, orchestrator/api/llm_analytics.py:739-820, frontend/components/analytics/admin-workspace-switcher.tsx:1-64


Frontend Integration

React Query Hooks

The use-unified-analytics.ts hook provides typed access to LLM usage data.

Cache Key Strategy

Query keys include the workspace scope to prevent data leakage when an admin switches workspaces.

Cost Analytics UI Component

The AnalyticsCosts component displays token usage, costs, and trends.

Key Features:

  • Hero stats (total cost, tokens, cost per request, top spender)

  • Multi-line stacked area chart showing daily cost by model

  • Cost projections with monthly estimates

  • Model comparison radar chart

  • Per-agent cost breakdown table


Sources: frontend/hooks/use-unified-analytics.ts:288-396, frontend/components/analytics/analytics-costs.tsx:1-700


Dual Aggregation Strategy

The system maintains usage data in two locations for different query patterns:

Real-Time Tracking (Primary)

Table: llm_usage

  • Granular per-request records

  • Queryable by date range, model, agent, workspace

  • Supports time-series analytics and cost projections

  • Ground truth for billing

Cached Aggregates (Secondary)

Field: agents.model_usage_stats (JSONB)

  • Pre-aggregated per-agent totals

  • Updated on every tracked request

  • Fast for agent list queries

  • Used as fallback when llm_usage has no data

Update Logic:
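A sketch of the per-request aggregate update (the field names follow the JSONB keys listed above; the merge-into-existing-stats behavior is an assumption):

```python
def update_agent_stats(stats, input_tokens, output_tokens, cost, now):
    """Merge one tracked request into the cached model_usage_stats JSONB."""
    stats = dict(stats or {})
    stats["total_requests"] = stats.get("total_requests", 0) + 1
    stats["input_tokens"] = stats.get("input_tokens", 0) + input_tokens
    stats["output_tokens"] = stats.get("output_tokens", 0) + output_tokens
    stats["total_tokens"] = stats.get("total_tokens", 0) + input_tokens + output_tokens
    stats["total_cost"] = stats.get("total_cost", 0.0) + cost
    stats["avg_tokens_per_request"] = stats["total_tokens"] / stats["total_requests"]
    stats["last_used_at"] = now
    return stats
```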

Frontend Fallback: when the llm_usage query returns no data for an agent, the UI falls back to the cached agents.model_usage_stats aggregates.

Sources: orchestrator/core/llm/usage_tracker.py:67-95, frontend/hooks/use-unified-analytics.ts:288-396, orchestrator/api/agents.py:230-240


Cost Projection Algorithm

Projection Calculation

Monthly cost projections account for sparse usage data by counting actual days with activity:
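A sketch of the calculation (a 30-day month is assumed):

```python
def project_monthly_cost(current_cost, days_with_data):
    """Divide by days that actually had usage, not by the period length."""
    if days_with_data == 0:
        return 0.0
    daily_average = current_cost / days_with_data
    return daily_average * 30
```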

Why This Matters:

Using days_in_period instead of days_with_data would underestimate costs:

  • If a workspace only used LLM on 5 days out of a 30-day period

  • current_cost / 30 × 30 = current_cost (incorrect)

  • current_cost / 5 × 30 = 6× current_cost (correct)

Projection Response
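The response shape is roughly as follows; the field names here are illustrative, not the exact schema:

```python
# Hypothetical projection response for a workspace with 5 active days
projection = {
    "current_cost": 42.10,              # spend observed in the period
    "days_with_data": 5,                # days that actually had usage
    "projected_monthly_cost": 252.60,   # current_cost / days_with_data * 30
}
```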

Sources: orchestrator/api/llm_analytics.py:490-602


OpenRouter Integration

Specialized Endpoints

The system includes dedicated endpoints for OpenRouter analytics:

| Endpoint | Purpose |
|----------|---------|
| GET /api/analytics/llm/openrouter/credits | Fetch account credits balance |
| GET /api/analytics/llm/openrouter/key-info | Query key limits, daily/weekly/monthly usage |
| POST /api/analytics/llm/openrouter/sync | Sync activity data into llm_usage table (BYOK only) |

Key Resolution Strategy

Sync Activity:

The sync endpoint prevents cross-workspace data duplication by requiring BYOK keys:
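A sketch of the guard (the names and the exact error raised are assumptions):

```python
def assert_syncable(openrouter_key):
    """Allow sync only for workspace BYOK keys, so shared platform-key
    activity is never duplicated into multiple workspaces."""
    if openrouter_key is None or not getattr(openrouter_key, "is_byok", False):
        raise PermissionError("OpenRouter sync requires a workspace BYOK key")
```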

Sources: orchestrator/api/llm_analytics.py:604-737


Error Handling and Isolation

Separate Session Strategy

The UsageTracker uses a dedicated database session to ensure tracking failures never break the parent transaction:
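The isolation pattern can be sketched like this, with session_factory standing in for SessionLocal:

```python
def track_isolated(write_row, session_factory):
    """Run a tracking write in its own session; never raise to the caller."""
    session = session_factory()
    try:
        write_row(session)
        session.commit()
    except Exception:
        session.rollback()  # swallow: the parent transaction is untouched
    finally:
        session.close()
```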

Status Field

The status field distinguishes successful vs. failed LLM calls:

  • status='success': LLM call completed, tokens counted

  • status='error': LLM call failed, tracked for error rate metrics

Error Rate Calculation:
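A sketch of the metric over tracked rows:

```python
def error_rate(usage_rows):
    """Fraction of tracked LLM calls with status='error'."""
    if not usage_rows:
        return 0.0
    errors = sum(1 for row in usage_rows if row["status"] == "error")
    return errors / len(usage_rows)
```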

Sources: orchestrator/core/llm/usage_tracker.py:36-95, orchestrator/api/llm_analytics.py:210-260

