PRD-105 — Budget & Governance

Version: 1.0
Type: Research + Design
Status: Complete, ready for peer review
Priority: P0
Dependencies: PRD-101 (Mission Schema), PRD-100 (Research Master)
Feeds Into: PRD-82C (Parallel Execution + Budget + Contractors)
Author: Gerard Kavanagh + Claude
Date: 2026-03-15


1. Problem Statement

1.1 The Gap

Automatos has no per-mission budget enforcement. Cost data flows from LLM responses into the llm_usage table (via UsageTracker), and analytics endpoints surface spending trends, but nothing blocks a mission from spending beyond any limit. The platform records what was spent — it never prevents overspending.

| Gap | Impact |
|---|---|
| No pre-call budget check | A runaway mission can exhaust workspace LLM credits in minutes |
| No per-mission cost cap | Coordinator-spawned tasks have no aggregate spending boundary |
| No tool policy layering | Every agent gets every assigned tool; there are no mission-scoped restrictions |
| No approval gates beyond chat | Complex missions auto-execute with no human checkpoint |
| Workspace.plan_limits JSONB exists but is never read | The schema hook for enforcement is present but unwired |
| Two TokenBudgetManager classes serve different purposes | modules/context/budget.py (context-window packing) vs stages/token_budget_manager.py (workflow tokens, latent bugs) |

1.2 What This PRD Delivers

  1. Budget admission gate: a synchronous pre-call check that blocks LLM calls when the budget is exhausted

  2. Budget data model: budget_config and budget_spent JSONB schemas on orchestration_runs

  3. Graduated thresholds: warn at 50%, throttle at 80%, stop at 100%

  4. Tool policy layering: a 4-tier, monotonically narrowing model (workspace → mission → task → agent)

  5. Pre-estimation algorithm: predict mission cost before execution, for human approval

  6. Post-call reconciliation: record actual cost with mission/task attribution

  7. Workspace.plan_limits activation: wire the existing but unused JSONB field


2. Prior Art: Budget & Governance Patterns

2.1 System-by-System Analysis

OpenClaw 8-Stage Tool Policy Chain

OpenClaw implements an 8-stage, monotonically narrowing tool policy chain. Each stage can only narrow the tool set, never expand it, and deny always wins over allow. Enforcement happens at tool-set construction (the tools passed to the LLM's tools= parameter) rather than through post-hoc interception.

What we adopt: Monotonic narrowing invariant (workspace → mission → task → agent). Tool group shorthand for policy configuration. Enforcement at tool-set construction (already how get_tools_for_agent() works).

What we reject: the full 8-stage chain (overkill). What it lacks: OpenClaw has no temporal/budget dimension and no per-mission scoping.

K8s ResourceQuota & Admission Control

K8s enforces resource limits through synchronous admission control — the API server returns HTTP 403 before the resource is created. Two layers: ResourceQuota (namespace aggregate) + LimitRange (per-pod defaults/maximums). Quota does not retroactively evict running workloads.

Direct translation:

| K8s Concept | Mission Equivalent |
|---|---|
| Namespace | Mission (isolated budget boundary) |
| ResourceQuota spec.hard | max_cost_usd, max_tokens, max_wall_time_s |
| LimitRange default + max | Per-agent defaults and ceilings within a mission |
| Validating admission | Pre-call check: current_spend + estimated_cost ≤ ceiling |
| HTTP 403 | Raise BudgetExceededError before the LLM call |
| Quota scopes | Priority sub-budgets (coordinator/verifier vs worker agents) |

What we adopt: Synchronous admission gate. Hard rejection (not queuing). Two-layer limits (mission aggregate + per-task). In-flight work completes; next admission rejected.

AWS Budgets

AWS implements budget enforcement through soft caps with graduated automated actions. Up to 5 thresholds per budget. AUTOMATIC or MANUAL approval model per action. Critical lesson: AWS has no true hard cap — billing data updates every 8-12 hours. For LLM missions that exhaust budgets in seconds, this lag is fatal.

What we adopt: Graduated thresholds (warn → throttle → stop). Separate action thresholds from notification thresholds. Dual COST + USAGE tracking.

What we reject: Post-hoc billing approach. We need synchronous pre-call checks.

Token Bucket (Anthropic/Stripe Pattern)

For mission budgeting, use a cost-denominated token bucket with no refill:

  • Bucket capacity = mission budget in dollars

  • Each LLM call consumes estimated_cost from the bucket

  • After call, reconcile estimated vs actual cost

  • Refill disabled (missions have fixed, non-replenishing budgets)
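Here is a minimal sketch of that bucket. It assumes amounts are held as Decimal so dollar arithmetic does not drift. CostBucket, try_consume, and reconcile are illustrative names, not an existing Automatos API.

```python
from decimal import Decimal


class CostBucket:
    """Cost-denominated token bucket with refill disabled (hypothetical sketch)."""

    def __init__(self, capacity_usd: Decimal) -> None:
        self.capacity = capacity_usd   # mission budget in dollars
        self.remaining = capacity_usd  # never replenished

    def try_consume(self, estimated_usd: Decimal) -> bool:
        """Reserve the estimated cost before an LLM call; refuse if it does not fit."""
        if estimated_usd > self.remaining:
            return False
        self.remaining -= estimated_usd
        return True

    def reconcile(self, estimated_usd: Decimal, actual_usd: Decimal) -> None:
        """After the call, swap the reservation for the actual cost.

        remaining may go negative when the call cost more than estimated,
        so the overage stays visible instead of being clamped away.
        """
        self.remaining += estimated_usd - actual_usd

    @property
    def overage(self) -> Decimal:
        return max(Decimal("0"), -self.remaining)
```

Reconciliation keeps the bucket honest. If a call comes in cheaper than estimated, the difference flows back into the bucket. If it comes in higher, the excess is recorded as overage.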

LiteLLM BudgetManager

Two-phase pattern: projected_cost(model, messages, user) pre-call, update_cost(completion_response, user) post-call. This is the closest existing implementation to what we need.

What we adopt: The two-phase pattern (estimate → check → execute → reconcile).

2.2 Architectural Decisions Summary

| Decision | Choice | Source | Rationale |
|---|---|---|---|
| Enforcement model | Synchronous pre-call admission gate | K8s admission control | Prevents overspend before it happens |
| Budget type | Hard cap on USD, soft cap on tokens | AWS Budgets + K8s | Dollars matter to users; tokens are internal |
| Threshold model | Graduated: warn 50%, throttle 80%, stop 100% | AWS Budgets | Progressive response prevents surprise stops |
| In-flight handling | Complete current call, reject next | K8s quota | Interrupting LLM calls wastes tokens already consumed |
| Running total | Redis-based for sub-ms reads | Rate limiter pattern | Can't add a DB round-trip to every LLM call |
| Tool policy | 4-tier monotonic narrowing | OpenClaw | Each scope can only restrict, never expand |
| Pre-estimation | Input tokens (exact) + output estimate (max output tokens × 0.7) | LiteLLM | Conservative but not worst-case |


3. Budget Admission Gate

3.1 Architecture

3.2 MissionBudgetManager Interface
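The sketch below shows one possible shape for the MissionBudgetManager interface. It combines the K8s-style admission check (current_spend + estimated_cost ≤ ceiling) with the 50/80/100% thresholds. The method names, the ThresholdAction enum, and the in-memory dictionaries are assumptions. The design itself calls for the running total to live in Redis.

```python
from dataclasses import dataclass, field
from enum import Enum


class BudgetExceededError(Exception):
    """Raised before an LLM call that would push spend past the hard cap."""


class ThresholdAction(Enum):
    OK = "ok"
    WARN = "warn"          # spent >= 50% of max_cost_usd
    THROTTLE = "throttle"  # spent >= 80%; the caller decides how to slow down
    STOP = "stop"          # spent >= 100%; any further non-zero estimate is rejected


@dataclass
class MissionBudgetManager:
    """Hypothetical interface. Totals live in memory here; the design keeps them in Redis."""

    limits_usd: dict = field(default_factory=dict)  # mission_id -> max_cost_usd
    spent_usd: dict = field(default_factory=dict)   # mission_id -> reconciled spend

    def set_budget(self, mission_id: str, max_cost_usd: float) -> None:
        self.limits_usd[mission_id] = max_cost_usd
        self.spent_usd.setdefault(mission_id, 0.0)

    def check(self, mission_id: str, estimated_cost_usd: float) -> ThresholdAction:
        """Admission gate: reject when current_spend + estimated_cost > ceiling."""
        ceiling = self.limits_usd[mission_id]
        spent = self.spent_usd[mission_id]
        if spent + estimated_cost_usd > ceiling:
            raise BudgetExceededError(
                f"{mission_id}: {spent:.4f} + {estimated_cost_usd:.4f} > {ceiling:.4f} USD"
            )
        return self.threshold(mission_id)

    def record(self, mission_id: str, actual_cost_usd: float) -> ThresholdAction:
        """Post-call reconciliation with the provider-reported cost (may exceed the cap)."""
        self.spent_usd[mission_id] += actual_cost_usd
        return self.threshold(mission_id)

    def threshold(self, mission_id: str) -> ThresholdAction:
        ratio = self.spent_usd[mission_id] / self.limits_usd[mission_id]
        if ratio >= 1.0:
            return ThresholdAction.STOP
        if ratio >= 0.8:
            return ThresholdAction.THROTTLE
        if ratio >= 0.5:
            return ThresholdAction.WARN
        return ThresholdAction.OK
```

This single-process sketch is not safe under concurrency. A Redis-backed version would need the check and the increment to happen atomically, for example inside one Lua script run via EVAL. Otherwise two parallel calls could both pass the check.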

3.3 Integration with LLMManager

The budget check is inserted into the existing LLM call path, immediately before _call_provider in LLMManager.
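This sketch wraps a provider call in the estimate → check → execute → reconcile sequence. LLMManager and _call_provider are named in this PRD's dependency list. Several pieces are illustrative assumptions, not the actual call-path code:

- the wrapper function and the BudgetGate protocol;
- the response shape (a dict carrying cost_usd);
- the choice to let calls without a mission bypass the gate.

```python
from typing import Callable, Optional, Protocol


class BudgetGate(Protocol):
    """The two calls the LLM path needs from MissionBudgetManager (assumed surface)."""

    def check(self, mission_id: str, estimated_cost_usd: float) -> object: ...

    def record(self, mission_id: str, actual_cost_usd: float) -> object: ...


def call_with_budget(
    gate: BudgetGate,
    mission_id: Optional[str],
    estimated_cost_usd: float,
    call_provider: Callable[[], dict],
) -> dict:
    """Hypothetical wrapper placed in front of LLMManager._call_provider."""
    if mission_id is None:
        # Non-mission traffic (chatbot, heartbeat, routing) is not gated.
        return call_provider()
    # Admission check with the precomputed estimate; raises before any tokens are spent.
    gate.check(mission_id, estimated_cost_usd)
    # Execute. Once started, the call is never interrupted.
    response = call_provider()
    # Reconcile with the actual cost reported for this call.
    gate.record(mission_id, response["cost_usd"])
    return response
```

Letting non-mission calls through untouched mirrors the nullable mission_task_id decision in section 10.1. Those calls have no mission budget to charge.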


4. Budget Data Model

4.1 Budget Config (on orchestration_runs)
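One possible shape for budget_config uses the three ResourceQuota-style limits from the K8s mapping in section 2.1 (max_cost_usd, max_tokens, max_wall_time_s) plus the graduated thresholds. The threshold field names, the defaults, the validation rules, and the reading of "soft cap" as warn-only are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetConfig:
    """Illustrative schema for orchestration_runs.budget_config (JSONB)."""

    max_cost_usd: float                  # hard cap, enforced by the admission gate
    max_tokens: Optional[int] = None     # soft cap (assumed: tracked and warned on, not blocking)
    max_wall_time_s: Optional[int] = None
    warn_at: float = 0.5                 # thresholds as fractions of max_cost_usd
    throttle_at: float = 0.8
    stop_at: float = 1.0

    def __post_init__(self) -> None:
        if self.max_cost_usd <= 0:
            raise ValueError("max_cost_usd must be positive")
        if not (0 < self.warn_at < self.throttle_at < self.stop_at <= 1.0):
            raise ValueError("thresholds must satisfy 0 < warn < throttle < stop <= 1")

    def to_jsonb(self) -> dict:
        """Plain dict suitable for a JSONB column."""
        return asdict(self)
```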

4.2 Budget Spent (on orchestration_runs)
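budget_spent can hold the running totals that post-call reconciliation writes back. This sketch reuses the per-role breakdown from section 10.2 (coordination, verification, and task cost) and tracks overage as the risk register suggests. The field names and the add helper are hypothetical.

```python
from dataclasses import asdict, dataclass

# request_type values from section 10.2 -> the per-role total each one feeds.
ROLE_FIELDS = {
    "coordinator": "coordination_cost_usd",
    "verifier": "verification_cost_usd",
    "mission_task": "task_cost_usd",
}


@dataclass
class BudgetSpent:
    """Illustrative schema for orchestration_runs.budget_spent (JSONB)."""

    cost_usd: float = 0.0
    tokens: int = 0
    coordination_cost_usd: float = 0.0
    verification_cost_usd: float = 0.0
    task_cost_usd: float = 0.0
    overage_usd: float = 0.0  # spend past the cap from calls already in flight

    def add(self, request_type: str, cost_usd: float, tokens: int,
            max_cost_usd: float) -> None:
        """Fold one reconciled LLM call into the running totals."""
        role_field = ROLE_FIELDS[request_type]  # KeyError flags an unattributed call
        setattr(self, role_field, getattr(self, role_field) + cost_usd)
        self.cost_usd += cost_usd
        self.tokens += tokens
        self.overage_usd = max(0.0, self.cost_usd - max_cost_usd)

    def to_jsonb(self) -> dict:
        return asdict(self)
```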

4.3 Budget Status (on orchestration_runs)


5. Pre-Estimation Algorithm

5.1 Per-Call Estimate
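Section 2.2 states the per-call rule: exact input tokens plus 0.7 × the maximum output tokens, each priced at the model's per-token rate. In symbols, estimated_cost = input_tokens × p_in + 0.7 × max_output_tokens × p_out. The sketch below makes two assumptions. First, input_tokens has already been counted with the model's tokenizer. Second, the example prices are placeholders, not real model rates.

```python
from dataclasses import dataclass

OUTPUT_ESTIMATE_FACTOR = 0.7  # conservative but not worst case (section 2.2)


@dataclass(frozen=True)
class ModelPricing:
    """Per-token USD prices; in practice synced periodically (e.g. from OpenRouter)."""

    input_per_token: float
    output_per_token: float


def estimate_call_cost(input_tokens: int, max_output_tokens: int,
                       pricing: ModelPricing) -> float:
    """Exact input cost plus 70% of the worst-case output cost."""
    input_cost = input_tokens * pricing.input_per_token
    output_cost = max_output_tokens * OUTPUT_ESTIMATE_FACTOR * pricing.output_per_token
    return input_cost + output_cost
```

Take placeholder prices of $3 and $15 per million input and output tokens, respectively. A call with 10,000 input tokens and a 2,000-token output limit is then estimated at 0.03 + 0.021 = $0.051.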

5.2 Per-Mission Estimate

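The mission estimate is computed at plan approval time and shown to the user. It is the sum of the per-call estimates for all planned work, multiplied by 2.

The 2x buffer accounts for retries, replanning, and estimation error.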
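The mission-level roll-up can be sketched as: sum the per-call estimates for every planned call, then apply the 2x buffer. The PlannedCall shape is an assumption about how an approved plan might be represented. It has one entry per expected LLM call, including coordinator and verifier calls.

```python
from dataclasses import dataclass
from typing import Sequence

MISSION_BUFFER = 2.0  # retries, replanning, estimation error


@dataclass(frozen=True)
class PlannedCall:
    """One anticipated LLM call in an approved plan (hypothetical shape)."""

    input_tokens: int
    max_output_tokens: int
    input_price: float   # USD per input token
    output_price: float  # USD per output token


def estimate_mission_cost(calls: Sequence[PlannedCall]) -> float:
    """Per-call estimates (input exact + 0.7 x max output), summed, times the buffer."""
    base = sum(
        c.input_tokens * c.input_price + 0.7 * c.max_output_tokens * c.output_price
        for c in calls
    )
    return base * MISSION_BUFFER
```

Consider two planned calls of 10,000 input tokens and 2,000 max output tokens each, priced at $3 and $15 per million tokens. That gives a base of $0.102 and a displayed estimate of $0.204.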


6. Budget Exceeded Handling

6.1 Coordinator Response to BudgetExceededError
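Here is a sketch of one plausible coordinator response. It is consistent with the K8s rule adopted in section 2.1 (in-flight work completes; the next admission is rejected) and with the 6.2 options that resume the mission. On rejection, the coordinator stops dispatching and lets running tasks finish. It also keeps the rejected task queued so the mission can resume after a human decision. The status string, the Mission shape, and the admit callback are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class BudgetExceededError(Exception):
    """Raised by the budget admission gate."""


@dataclass
class Mission:
    """Hypothetical coordinator-side view of a mission."""

    status: str = "running"
    pending: list = field(default_factory=list)  # task ids not yet started
    in_flight: set = field(default_factory=set)  # task ids currently executing


def dispatch_next(mission: Mission, admit: Callable[[str], None]) -> Optional[str]:
    """Start the next pending task unless the budget gate rejects it."""
    if mission.status != "running" or not mission.pending:
        return None
    task_id = mission.pending[0]
    try:
        admit(task_id)  # budget check before spawning the task
    except BudgetExceededError:
        # Park the mission: in-flight tasks keep running, nothing new starts,
        # and the task stays queued so a budget increase can resume it.
        mission.status = "paused_budget_exceeded"
        return None
    mission.pending.pop(0)
    mission.in_flight.add(task_id)
    return task_id
```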

6.2 Human Resolution Options

| Action | API Endpoint | Effect |
|---|---|---|
| Increase budget | POST /missions/{id}/budget | Updates budget_config.max_cost_usd, resumes mission |
| Cancel mission | DELETE /missions/{id} | Cancels all pending tasks; mission → cancelled |
| Downgrade models | POST /missions/{id}/budget with downgrade=true | Switches remaining tasks to BUDGET_MODELS, resumes |
| Complete current + stop | POST /missions/{id}/budget with complete_current=true | Finishes in-flight tasks, cancels remaining |
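Three of the four actions share POST /missions/{id}/budget. The sketch below shows how a handler might apply those flags to a paused mission. DELETE is a plain cancellation and is omitted. Several elements are placeholders: the dict-based mission state, the resolve_budget name, and BUDGET_MODELS, which stands in for the list in config.py.

```python
from typing import Optional

BUDGET_MODELS = ["budget-model-placeholder"]  # stands in for config.BUDGET_MODELS


def resolve_budget(mission: dict, max_cost_usd: Optional[float] = None,
                   downgrade: bool = False, complete_current: bool = False) -> dict:
    """Apply a POST /missions/{id}/budget decision to a paused mission (hypothetical)."""
    if complete_current:
        # Let in-flight tasks finish; cancel everything that has not started.
        mission["cancelled"].extend(mission["pending"])
        mission["pending"] = []
        mission["status"] = "completing"
        return mission
    if max_cost_usd is None and not downgrade:
        raise ValueError("nothing to apply: pass max_cost_usd and/or downgrade=True")
    if max_cost_usd is not None:
        if max_cost_usd <= mission["budget_config"]["max_cost_usd"]:
            raise ValueError("an increase must exceed the current max_cost_usd")
        mission["budget_config"]["max_cost_usd"] = max_cost_usd
    if downgrade:
        # Only tasks that have not started switch models; in-flight ones are untouched.
        for task_id in mission["pending"]:
            mission["model_overrides"][task_id] = BUDGET_MODELS[0]
    mission["status"] = "running"  # resume
    return mission
```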


7. Tool Policy Layering

7.1 Four-Tier Narrowing Model

Invariant: each tier can only narrow, never expand. The effective tool set is the intersection of all tiers.
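Because each tier may only narrow, the effective set can be computed as a running intersection. Denies from every tier are subtracted at the end, so a deny anywhere beats an allow anywhere (the OpenClaw rule adopted in section 2.1). In this sketch, a tier with allow=None inherits its parent's set unchanged. The TierPolicy shape is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierPolicy:
    """Hypothetical per-tier policy. allow=None means 'inherit, do not narrow'."""

    allow: Optional[frozenset] = None
    deny: frozenset = frozenset()


def effective_tools(assigned: set, tiers: list) -> set:
    """Intersect workspace -> mission -> task -> agent; a deny at any tier always wins."""
    tools = set(assigned)
    denied = set()
    for tier in tiers:
        if tier.allow is not None:
            tools &= tier.allow  # intersection can only shrink the set
        denied |= tier.deny
    return tools - denied
```

A later tier cannot re-add a tool that an earlier tier removed. Intersecting with a larger allow list leaves the set unchanged or smaller.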

7.2 Enforcement Point
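Enforcement happens when the tool list is built, so the model never sees a tool it may not call. This is the construction-time approach get_tools_for_agent() already uses. Below is a sketch of the final filtering step. It assumes tool definitions are dicts in the common function-calling shape, with the name under function.name. The helper name is hypothetical, and the real get_tools_for_agent() signature may differ.

```python
def filter_tool_definitions(tool_defs: list, allowed: set) -> list:
    """Keep only definitions whose name is in the effective tool set.

    The returned list is what would be passed as the LLM's tools= parameter,
    so a disallowed tool is absent from the request instead of being
    intercepted after the model tries to call it.
    """
    return [d for d in tool_defs if d["function"]["name"] in allowed]
```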

7.3 ToolPolicy Schema
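Here is a possible ToolPolicy schema with the tool-group shorthand adopted from OpenClaw. Several elements are illustrative assumptions: the group: prefix, the group names and members, and the validation rule that a child's allow list must stay within its parent's.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical group shorthand; real names and members would come from config.
TOOL_GROUPS = {
    "group:web": {"web_search", "web_fetch"},
    "group:files": {"read_file", "write_file"},
}


def expand(entries: Iterable[str]) -> set:
    """Replace group shorthands with their member tools; plain names pass through."""
    out = set()
    for entry in entries:
        out |= TOOL_GROUPS.get(entry, {entry})
    return out


@dataclass
class ToolPolicy:
    """Illustrative JSONB shape for a policy at any tier."""

    allow: Optional[list] = None  # None = inherit the parent's set unchanged
    deny: Optional[list] = None   # deny entries only narrow, so they need no check

    def validate_against(self, parent: "ToolPolicy") -> None:
        """Reject a policy whose allow list reaches outside its parent's allow list."""
        if self.allow is None or parent.allow is None:
            # Parent inherits: a fuller version would check the nearest ancestor
            # that does set an allow list (not shown).
            return
        widened = expand(self.allow) - expand(parent.allow)
        if widened:
            raise ValueError(f"policy would expand parent scope: {sorted(widened)}")
```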


8. Workspace Plan Limits Activation

8.1 Current State

Workspace.plan_limits JSONB exists on the workspaces table but is never read by any code path.

8.2 Wiring Design
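One way to wire plan_limits is to treat it as the LimitRange-style ceiling from section 2.1. When a mission budget is set, it is checked against the workspace limit and rejected unless the caller holds budget:override (section 9.1). This PRD does not define the field's keys, so max_mission_cost_usd and default_mission_cost_usd below are hypothetical.

```python
from typing import Optional


def resolve_mission_budget(plan_limits: Optional[dict], requested_usd: Optional[float],
                           can_override: bool) -> float:
    """Pick the effective max_cost_usd for a new mission from Workspace.plan_limits."""
    limits = plan_limits or {}
    ceiling = limits.get("max_mission_cost_usd")      # hypothetical key
    default = limits.get("default_mission_cost_usd")  # hypothetical key
    if requested_usd is None:
        if default is None:
            raise ValueError("no budget requested and no workspace default configured")
        requested_usd = default
    if ceiling is not None and requested_usd > ceiling and not can_override:
        raise PermissionError("exceeds workspace plan limit; requires budget:override")
    return requested_usd
```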


9. RBAC Extensions

9.1 New Permissions

| Permission | Who Can | Description |
|---|---|---|
| mission:create | EDITOR, ADMIN, OWNER | Create a new mission |
| mission:approve | ADMIN, OWNER | Approve/reject mission plans |
| mission:review | EDITOR, ADMIN, OWNER | Review mission results |
| budget:view | VIEWER, EDITOR, ADMIN, OWNER | See budget status |
| budget:set | ADMIN, OWNER | Set mission budget |
| budget:override | OWNER | Increase budget beyond workspace limits |
| tool_policy:set | ADMIN, OWNER | Configure workspace tool policy |

9.2 Integration with Existing RBAC

The existing permissions.py uses role-based access (OWNER/ADMIN/EDITOR/VIEWER). The new permissions map onto these existing roles. No new permission system is needed, only new permission checks in the mission API endpoints.
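Because the new permissions map onto existing roles, they can be expressed as a static table checked in the mission endpoints. The mapping below is transcribed from the table in section 9.1. The Role enum and the has_permission helper are assumptions about how permissions.py might expose it.

```python
from enum import Enum


class Role(str, Enum):
    OWNER = "OWNER"
    ADMIN = "ADMIN"
    EDITOR = "EDITOR"
    VIEWER = "VIEWER"


_ALL = {Role.OWNER, Role.ADMIN, Role.EDITOR, Role.VIEWER}
_EDITORS_UP = {Role.OWNER, Role.ADMIN, Role.EDITOR}
_ADMINS_UP = {Role.OWNER, Role.ADMIN}

# Transcribed from the permission table in section 9.1.
MISSION_PERMISSIONS = {
    "mission:create": _EDITORS_UP,
    "mission:approve": _ADMINS_UP,
    "mission:review": _EDITORS_UP,
    "budget:view": _ALL,
    "budget:set": _ADMINS_UP,
    "budget:override": {Role.OWNER},
    "tool_policy:set": _ADMINS_UP,
}


def has_permission(role: Role, permission: str) -> bool:
    """Unknown permissions are denied by default."""
    return role in MISSION_PERMISSIONS.get(permission, set())
```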


10. Cost Attribution

10.1 Linking llm_usage to Missions

Decision: Add nullable mission_task_id FK to llm_usage.

Why nullable: llm_usage already holds thousands of rows from non-mission calls (chatbot, heartbeat, routing). These have no mission context and never will, so a non-nullable column would force a backfill.
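The following is a runnable illustration of the nullable foreign key. The JSONB columns elsewhere imply PostgreSQL, but SQLite from the Python standard library is used here only so the example runs standalone. Existing rows read NULL without a backfill, new mission calls carry a task id, and dangling ids are rejected. The mission_tasks table name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE mission_tasks (id INTEGER PRIMARY KEY);
    CREATE TABLE llm_usage (
        id INTEGER PRIMARY KEY,
        request_type TEXT NOT NULL,
        cost REAL NOT NULL
    );
    -- Pre-existing, non-mission traffic.
    INSERT INTO llm_usage (request_type, cost) VALUES ('chatbot', 0.01), ('heartbeat', 0.001);
""")

# The migration: a nullable FK needs no backfill; old rows simply read NULL.
conn.execute(
    "ALTER TABLE llm_usage ADD COLUMN mission_task_id INTEGER "
    "REFERENCES mission_tasks(id) DEFAULT NULL"
)

conn.execute("INSERT INTO mission_tasks (id) VALUES (7)")
conn.execute(
    "INSERT INTO llm_usage (request_type, cost, mission_task_id) "
    "VALUES ('mission_task', 0.25, 7)"
)

untagged = conn.execute(
    "SELECT COUNT(*) FROM llm_usage WHERE mission_task_id IS NULL"
).fetchone()[0]
task_cost = conn.execute(
    "SELECT SUM(cost) FROM llm_usage WHERE mission_task_id = 7"
).fetchone()[0]
```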

10.2 Request Type Extension

Adding 'coordinator', 'verifier', and 'mission_task' as request_type values on llm_usage enables computing:

  • coordination_cost_usd = SUM(cost WHERE request_type = 'coordinator')

  • verification_cost_usd = SUM(cost WHERE request_type = 'verifier')

  • task_cost_usd = SUM(cost WHERE request_type = 'mission_task')
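The three totals can come from one grouped query instead of three separate scans. This SQLite sketch (standard library, for a standalone run) scopes the query to one mission by joining through mission_tasks. It assumes two things: a mission_id column on mission_tasks, and that coordinator and verifier calls are also tagged with a mission_task_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mission_tasks (id INTEGER PRIMARY KEY, mission_id TEXT NOT NULL);
    CREATE TABLE llm_usage (
        id INTEGER PRIMARY KEY,
        request_type TEXT NOT NULL,
        cost REAL NOT NULL,
        mission_task_id INTEGER REFERENCES mission_tasks(id)
    );
    INSERT INTO mission_tasks VALUES (1, 'm1'), (2, 'm1'), (3, 'm2');
    INSERT INTO llm_usage (request_type, cost, mission_task_id) VALUES
        ('coordinator', 0.5, 1), ('mission_task', 1.25, 1),
        ('mission_task', 0.75, 2), ('verifier', 0.25, 2),
        ('mission_task', 9.0, 3), ('chatbot', 0.125, NULL);
""")

# One pass, grouped by request_type, scoped to a single mission.
rows = conn.execute("""
    SELECT u.request_type, SUM(u.cost)
    FROM llm_usage AS u
    JOIN mission_tasks AS t ON t.id = u.mission_task_id
    WHERE t.mission_id = ?
    GROUP BY u.request_type
""", ("m1",)).fetchall()

LABELS = {
    "coordinator": "coordination_cost_usd",
    "verifier": "verification_cost_usd",
    "mission_task": "task_cost_usd",
}
breakdown = {LABELS[rt]: total for rt, total in rows if rt in LABELS}
```

The inner join drops untagged rows such as the chatbot call, so non-mission spend never leaks into a mission's breakdown.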


11. Acceptance Criteria

Must Have

Should Have

Nice to Have


12. Risk Register

| # | Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|---|
| 1 | Pre-estimation inaccuracy: over-estimates block work, under-estimates allow overspend | Medium | High | 0.7 median estimate (not worst case). Reconcile actual cost after each call. 10% overage buffer. |
| 2 | Stale model pricing | Medium | Medium | Periodic sync from OpenRouter API. Timestamp pricing. Alert on age > 7 days. |
| 3 | Budget check latency | Low | Low | Redis-based running total (sub-ms). No DB query per call. |
| 4 | Governance overhead / user friction | High | Medium | Minimal defaults: plan approval ON, everything else OFF. Progressive disclosure. |
| 5 | Context-budget vs cost-budget confusion | Low | High | Clear naming: ContextBudgetManager vs MissionBudgetManager. Document the distinction. |
| 6 | In-flight work when budget exceeded | Medium | Low | K8s pattern: in-flight completes, next rejected. Track overage. |


13. Dependencies

| Dependency | Direction | Notes |
|---|---|---|
| PRD-101 (Mission Schema) | Uses | budget_config, budget_spent JSONB on orchestration_runs |
| PRD-102 (Coordinator) | Uses | Coordinator calls budget gate before spawning tasks |
| PRD-103 (Verification) | Uses | Verification cost tracked separately within budget |
| PRD-104 (Contractors) | Uses | Contractors inherit mission budget constraints |
| PRD-106 (Telemetry) | Feeds | Budget utilization is a telemetry dimension |
| UsageTracker | Extension | Add mission_task_id to tracking path |
| LLMManager | Extension | Insert budget check before _call_provider |
| tool_router.py | Extension | Add policy intersection to get_tools_for_agent() |


Appendix: Research Sources

| Source | What It Informed |
|---|---|
| OpenClaw 8-stage tool policy (docs.openclaw.ai) | Monotonic narrowing, enforcement at tool-set construction |
| K8s ResourceQuota + admission control (kubernetes.io) | Synchronous hard rejection, two-layer limits, quota scopes |
| AWS Budgets API | Graduated thresholds, AUTOMATIC/MANUAL actions, cost allocation |
| Token bucket (Anthropic/Stripe) | Cost-denominated bucket with no refill for fixed budgets |
| LiteLLM BudgetManager | projected_cost() + update_cost() two-phase pattern |
| RouteLLM (ICLR 2025) | 75% cost reduction validates model downgrade strategy |
| Automatos UsageTracker | Existing cost recording path to extend |
| Automatos config.py | PREMIUM_MODELS, BUDGET_MODELS for downgrade targets |
| Automatos Workspace.plan_limits | Existing unwired JSONB field to activate |
