Workspace Execution


Purpose and Scope

This document covers the Workspace Execution subsystem, which provides sandboxed code execution, file operations, and repository management in isolated workspace environments. Each workspace gets its own persistent directory on a shared volume where agents can clone repos, run commands, read/write files, and execute development workflows.


The workspace execution system consists of three main components:

  1. Workspace Worker Service — Long-running process that consumes tasks from Redis queues and executes them in isolated directories

  2. Orchestrator APIs — REST endpoints for task submission, file browsing, and GitHub integration

  3. WorkspaceClient — HTTP client library that proxies orchestrator requests to the worker


System Architecture

The workspace execution system follows a task queue pattern with strict orchestrator-worker separation: the orchestrator accepts and enqueues tasks, and the worker consumes and executes them.

Sources: orchestrator/api/tasks.py:1-404, services/workspace-worker/main.py:1-832, orchestrator/core/workspace_client.py:1-185


Workspace Worker Service

Main Loop

The WorkspaceWorker class implements an ARQ-style queue consumer without the ARQ library dependency. It polls Redis priority queues in strict order (critical → high → normal → low) and executes tasks concurrently up to WORKER_CONCURRENCY (default: 3).


Sources: services/workspace-worker/main.py:60-226
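The strict priority order described above can be illustrated with a minimal in-memory sketch. It uses `collections.deque` objects as stand-ins for the Redis queues, and the function name `pop_next_task` and the queue keys are hypothetical, not taken from the worker source.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Strict order from the text: critical -> high -> normal -> low.
PRIORITY_ORDER = ("critical", "high", "normal", "low")


def pop_next_task(queues: Dict[str, Deque[str]]) -> Optional[Tuple[str, str]]:
    """Return (priority, payload) from the highest-priority non-empty queue.

    `queues` is an in-memory stand-in for the Redis lists (assumption).
    A lower-priority queue is only read when every higher one is empty.
    """
    for priority in PRIORITY_ORDER:
        queue = queues.get(priority)
        if queue:
            return priority, queue.popleft()
    return None  # nothing queued; a real consumer would sleep and poll again
```

With strict ordering, a steady stream of critical tasks can starve the low queue; that trade-off is inherent to the pattern rather than to this sketch.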

Concurrency Control

The worker uses an asyncio.Semaphore to limit concurrent task execution. Each task execution:

  1. Acquires the semaphore (blocks if at limit)

  2. Spawns an asyncio.Task for execution

  3. Releases the semaphore in the finally block

This prevents resource exhaustion when many tasks are queued simultaneously.

Sources: services/workspace-worker/main.py:71-180
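A minimal sketch of the acquire / spawn / release-in-finally pattern follows. The function names and the `stats` bookkeeping are illustrative assumptions added so the limit can be observed; only `WORKER_CONCURRENCY` and its default of 3 come from this page.

```python
import asyncio
from typing import Dict, List

WORKER_CONCURRENCY = 3  # documented default


async def run_task(task_id: int, semaphore: asyncio.Semaphore, stats: Dict[str, int]) -> None:
    """Run one task. The caller has already acquired a semaphore slot."""
    try:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for real task work
    finally:
        stats["active"] -= 1
        semaphore.release()  # always free the slot, even if the task raised


async def consume(task_ids: List[int], limit: int = WORKER_CONCURRENCY) -> Dict[str, int]:
    """Spawn one asyncio.Task per id, never exceeding `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    running = []
    for task_id in task_ids:
        await semaphore.acquire()  # blocks while `limit` tasks are running
        running.append(asyncio.create_task(run_task(task_id, semaphore, stats)))
    await asyncio.gather(*running)
    return stats
```

Acquiring before spawning means the consumer loop itself stalls at the limit, so it stops pulling new work from the queue instead of buffering it in memory.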

Health Check Server

The worker runs an aiohttp HTTP server on port 8081 (configurable via WORKER_HEALTH_PORT) that serves both the health probe and the internal workspace endpoints:

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /health | GET | Kubernetes liveness probe, returns {"status": "healthy", "active_tasks": N} |
| /workspaces/{id}/files | GET | Directory listing for code viewer widget (proxied from orchestrator) |
| /workspaces/{id}/files/content | GET | File content for code viewer (max 2MB) |
| /workspaces/{id}/exec | POST | Direct command execution (bypasses queue) |
| /workspaces/{id}/files/write | POST | Direct file write (bypasses queue) |
| /workspaces/{id}/files/grep | GET | Search file contents via grep |
| /workspaces/{id}/git | POST | Git operations (status, diff, commit, push) |

The workspace endpoints authenticate callers with WORKER_INTERNAL_TOKEN, sent in the X-Internal-Token header; /health is left open for probes. This keeps the worker API unreachable from the public internet while still allowing orchestrator-to-worker communication.

Sources: services/workspace-worker/main.py:461-818


Workspace Filesystem

Directory Layout

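Each workspace gets a persistent directory tree on the worker volume, rooted at {workspace_id}/ under WORKSPACE_VOLUME_PATH. Entries referenced elsewhere on this page include repos/ (cached clones), tasks/task_{id}/ (ephemeral task directories), .workspace_meta.json, .ssh, .gitconfig, and .task_env_* files.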

Sources: services/workspace-worker/workspace_manager.py:10-18

WorkspaceManager API

The WorkspaceManager class provides the core filesystem abstraction:

| Method | Purpose |
| --- | --- |
| ensure_workspace_exists() | Create directory tree + metadata on first use |
| get_usage_bytes() | Calculate current disk usage recursively |
| check_quota() | Enforce DEFAULT_QUOTA_GB storage limit |
| create_task_dir(task_id) | Create ephemeral tasks/task_{id}/ directory |
| cleanup_task(task_id) | Remove ephemeral dir + task-specific credentials |
| inject_credentials(task_id, creds) | Write SSH keys, git config, env vars |
| resolve_safe_path(rel_path) | Validate path stays within workspace (security boundary) |
| get_repo_path(repo_name) | Return path for a cached repo |
| list_repos() | List all cached repos in repos/ |

Sources: services/workspace-worker/workspace_manager.py:36-303

Path Safety

All path operations go through resolve_safe_path(), which enforces these security rules:

  1. No null bytes: a \x00 in the path raises SecurityError

  2. No absolute paths — Paths starting with / are rejected

  3. No directory traversal — Resolved path must stay within {workspace_id}/ (checked via Path.relative_to())

  4. Symlinks resolved — Uses Path.resolve() to follow symlinks and verify containment

Sources: services/workspace-worker/workspace_manager.py:228-253
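The four rules can be sketched as a standalone function. This is an approximation written from the description above, not the actual WorkspaceManager method: the free-function signature and the `SecurityError` definition are assumptions.

```python
from pathlib import Path


class SecurityError(Exception):
    """Raised when a path would escape the workspace (name taken from the text)."""


def resolve_safe_path(workspace_root: Path, rel_path: str) -> Path:
    """Resolve rel_path under workspace_root or raise SecurityError."""
    if "\x00" in rel_path:
        raise SecurityError("null byte in path")
    if rel_path.startswith("/"):
        raise SecurityError("absolute paths are not allowed")

    root = workspace_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real target location.
    candidate = (root / rel_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise SecurityError("path escapes workspace") from None
    return candidate
```

Checking containment after resolution, rather than scanning the string for "..", is what lets the same check also catch symlinks that point outside the workspace.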

Storage Quotas

The worker enforces per-workspace storage quotas to prevent disk exhaustion:

  1. Default quota: DEFAULT_QUOTA_GB (default: 5GB, configurable via env var)

  2. Quota check before each task execution — task fails immediately if over quota

  3. Usage calculation via recursive rglob("*") (cached in _current_usage)

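Quota enforcement happens in the task execution flow: the worker checks usage before running a task and fails the task immediately if the workspace is over quota.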

Sources: services/workspace-worker/main.py:251-264, services/workspace-worker/workspace_manager.py:83-114
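A rough sketch of the usage calculation and quota check, assuming free functions instead of WorkspaceManager methods and assuming 1 GB means 1024**3 bytes (the page does not say which convention the worker uses). Caching in `_current_usage` is omitted.

```python
from pathlib import Path

DEFAULT_QUOTA_GB = 5  # documented default


def get_usage_bytes(workspace_root: Path) -> int:
    """Sum the sizes of all regular files under the workspace, recursively."""
    return sum(f.stat().st_size for f in workspace_root.rglob("*") if f.is_file())


def check_quota(workspace_root: Path, quota_gb: float = DEFAULT_QUOTA_GB) -> bool:
    """Return True while usage is within quota (1 GB = 1024**3 bytes, assumed)."""
    return get_usage_bytes(workspace_root) <= quota_gb * 1024**3
```

A full recursive walk is linear in the number of files, which is presumably why the text mentions caching the result.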


Task Lifecycle

State Machine

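Tasks move through a small set of states. The ones named on this page are queued, running, and failed, and a queued or running task can be cancelled.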

Submission Flow

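Tasks are submitted via POST /api/tasks/submit using a two-phase protocol:

Phase 1: Database Insert. The orchestrator writes the task row first.

Phase 2: Redis Enqueue (only if the DB insert succeeds). The task payload is pushed onto the queue.

The two phases are not a single atomic transaction; consistency comes from compensation. If Phase 2 fails, the DB row is immediately marked as failed with error_message = "Redis enqueue failed: {error}".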

Sources: orchestrator/api/tasks.py:62-173
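The insert-then-enqueue flow with compensation can be sketched with in-memory stand-ins. Here `db` is a plain dict replacing the database table and `enqueue` is any callable replacing the Redis push; both, along with the function name, the "queued" initial status, and the payload fields, are assumptions for illustration.

```python
import json
import uuid
from typing import Callable, Dict


def submit_task(db: Dict[str, dict], enqueue: Callable[[str], None], payload: dict) -> dict:
    """Two-phase submit: record the task first, then enqueue it."""
    task_id = str(uuid.uuid4())

    # Phase 1: database insert (dict stand-in for the task table).
    db[task_id] = {"status": "queued", "error_message": None}

    # Phase 2: enqueue, attempted only after Phase 1 succeeded.
    try:
        enqueue(json.dumps({"task_id": task_id, **payload}))
    except Exception as exc:
        # Compensation: never leave a row that claims to be queued
        # when nothing was actually placed on the queue.
        db[task_id]["status"] = "failed"
        db[task_id]["error_message"] = f"Redis enqueue failed: {exc}"

    return {"task_id": task_id, **db[task_id]}
```

Writing the row first means a crash between the phases leaves a visible record to reconcile, whereas enqueueing first could let a worker pick up a task that has no database row.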

Task Payload Structure

The JSON payload enqueued to Redis contains all execution metadata:

Sources: orchestrator/api/tasks.py:89-99

Event Streaming

Tasks publish real-time events via Redis pub/sub to channel workspace:task:{task_id}:events. The orchestrator exposes this as Server-Sent Events (SSE) via GET /api/tasks/{task_id}/events:

| Event Type | Data | When |
| --- | --- | --- |
| status_changed | {"status": "running"} | Transitions between states |
| progress_update | {"step": 2, "total_steps": 5, "description": "..."} | Before each step execution |
| error | {"error": "..."} | Step failure or exception |


Sources: orchestrator/api/tasks.py:352-403, services/workspace-worker/main.py:422-434
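For orientation, the sketch below builds the channel name and frames one event in the standard Server-Sent Events wire format. The helper names are hypothetical, and whether the orchestrator sets the `event:` field or sends only `data:` lines is an assumption.

```python
import json


def events_channel(task_id: str) -> str:
    """Redis pub/sub channel for a task, following the pattern in the text."""
    return f"workspace:task:{task_id}:events"


def format_sse(event_type: str, data: dict) -> str:
    """Frame one event as SSE: named fields followed by a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
```

The trailing blank line is what tells an SSE client that the event is complete.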


Command Execution

Security Model

The WorkspaceToolExecutor enforces a five-layer security model for command execution. The subsections below describe the individual checks.

Sources: services/workspace-worker/executor.py:1-537

Command Whitelist

Only these binaries are allowed (exact name matching, no path components):

| Category | Commands |
| --- | --- |
| Shell | sh, bash, cd, pwd, export, source, test |
| VCS | git |
| Python | python, python3, pip, pip3, uv, pytest, ruff, black, mypy, isort, flake8, coverage, tox |
| Node | node, npm, npx, pnpm, yarn, vitest, jest, tsc, eslint, prettier |
| General | ls, cat, grep, find, tree, wc, sort, diff, jq, sed, awk, curl, wget, make, cmake, tar, gzip, zip, touch, mkdir, cp, mv, rm, chmod, echo, env, which |
| Other Languages | cargo, go, ruby, java, javac, mvn, gradle, rustc, gcc, g++ |

Commands with path separators (e.g., /usr/bin/python, ./malicious, ../escape) are rejected to prevent binary injection via relative/absolute paths.

Sources: services/workspace-worker/executor.py:35-73

Blocked Patterns

These regex patterns are always blocked, even if the binary is whitelisted:

| Pattern | Why Blocked |
| --- | --- |
| rm\s+-rf\s+/\s*$ | Deletes entire filesystem |
| rm\s+-rf\s+/[^w] | Recursive delete of an absolute path outside /workspaces/ (any path whose first character after / is not w) |
| \bsudo\b | Privilege escalation |
| \bsu\s | User switching |
| \bchmod\s+777\b | Dangerous permissions |
| \bkubectl\b | Kubernetes cluster access |
| >\s*/dev/ | Device file access |
| \bmkfs\b | Filesystem formatting |
| \bdd\s+if= | Raw disk operations |
| \biptables\b | Firewall manipulation |
| \bsystemctl\b | Service management |
| \bpasswd\b | Password changes |
| \buseradd\b, \buserdel\b | User management |
| \bmount\b, \bumount\b | Filesystem mounting |
| ` | Backtick command substitution |
| \n | Embedded newlines |

Sources: services/workspace-worker/executor.py:76-98

Validation Algorithm

Commands are validated by splitting on shell operators (&&, ||, ;, |) and checking each segment:

Sources: services/workspace-worker/executor.py:448-500
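The split-and-check approach can be sketched as follows. This is a simplified reconstruction from the description, not the executor's code: `ALLOWED_COMMANDS` and `BLOCKED_PATTERNS` are short excerpts of the tables above, and the function name and return shape are assumptions.

```python
import re
import shlex
from typing import Tuple

# Excerpts only; the full lists are in the tables above.
ALLOWED_COMMANDS = {"git", "python", "pytest", "ls", "cat", "echo", "npm", "rm"}
BLOCKED_PATTERNS = [
    re.compile(p) for p in (r"\bsudo\b", r"rm\s+-rf\s+/\s*$", r"`", r"\n")
]
# Two-character operators come first so "||" is not read as two pipes.
SEGMENT_SPLIT = re.compile(r"&&|\|\||;|\|")


def validate_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a shell command string."""
    if not command.strip():
        return False, "empty command"
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return False, f"blocked pattern: {pattern.pattern}"
    for segment in SEGMENT_SPLIT.split(command):
        if not segment.strip():
            continue  # e.g. a trailing ";"
        try:
            tokens = shlex.split(segment)
        except ValueError as exc:
            # The naive split can cut a quoted string; reject rather than guess.
            return False, f"unparseable segment: {exc}"
        binary = tokens[0]
        if "/" in binary:
            return False, f"path-based binary rejected: {binary}"
        if binary not in ALLOWED_COMMANDS:
            return False, f"binary not whitelisted: {binary}"
    return True, "ok"
```

Because the split ignores quoting, a command such as `echo "a | b"` is rejected in this sketch. That errs on the side of refusing valid input instead of admitting something that was not fully checked.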

Sandboxed Environment

Subprocesses run with a stripped environment that prevents host variable leakage:

Host environment variables (e.g., AWS_ACCESS_KEY_ID, DATABASE_URL) are not inherited.

Sources: services/workspace-worker/executor.py:506-536
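One way to get this behavior is to build the child environment from scratch instead of copying and filtering `os.environ`. The sketch below uses the PATH, HOME, and GIT_CONFIG_GLOBAL values listed under Layer 4 later on this page; the function names are hypothetical.

```python
import subprocess
from typing import Dict, List


def build_sandbox_env(workspace_root: str) -> Dict[str, str]:
    """Build a fresh environment; nothing is copied from os.environ."""
    return {
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "HOME": workspace_root,
        "GIT_CONFIG_GLOBAL": f"{workspace_root}/.gitconfig",
    }


def run_sandboxed(argv: List[str], workspace_root: str, timeout: float = 60.0):
    """Run argv inside the workspace with only the sandbox variables set."""
    return subprocess.run(
        argv,
        cwd=workspace_root,
        env=build_sandbox_env(workspace_root),  # replaces the inherited env
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

An allowlist built from an empty dict fails closed: a secret added to the host later stays invisible to subprocesses without anyone having to remember to filter it.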

Output Limits

Command output is capped to prevent memory exhaustion:

  • STDOUT: 100KB (MAX_STDOUT_BYTES)

  • STDERR: 50KB (MAX_STDERR_BYTES)

Truncated output sets truncated: true in the result dict.

Sources: services/workspace-worker/executor.py:100-106
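A small sketch of the capping step, assuming 1 KB means 1024 bytes and that output is decoded as UTF-8 after truncation. The function name and the exact result keys other than `truncated` are assumptions.

```python
MAX_STDOUT_BYTES = 100 * 1024  # 100KB, assuming 1 KB = 1024 bytes
MAX_STDERR_BYTES = 50 * 1024   # 50KB


def cap_output(stdout: bytes, stderr: bytes) -> dict:
    """Trim both streams to their limits and flag whether anything was cut."""
    truncated = len(stdout) > MAX_STDOUT_BYTES or len(stderr) > MAX_STDERR_BYTES
    return {
        # errors="replace" tolerates a multi-byte character cut at the limit.
        "stdout": stdout[:MAX_STDOUT_BYTES].decode("utf-8", errors="replace"),
        "stderr": stderr[:MAX_STDERR_BYTES].decode("utf-8", errors="replace"),
        "truncated": truncated,
    }
```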


File Operations

The WorkspaceToolExecutor provides sandboxed file operations that agents call via workspace_* tools:

Read File

Sources: services/workspace-worker/executor.py:230-252

Write File

Sources: services/workspace-worker/executor.py:254-270

List Directory

Sources: services/workspace-worker/executor.py:272-300

HTTP Endpoints for File Browsing

The worker's HTTP server exposes file operations for the code viewer widget:

| Endpoint | Method | Purpose | Limit | Filter |
| --- | --- | --- | --- | --- |
| /workspaces/{id}/files | GET | Directory listing | 500 entries | Hides .ssh, .gitconfig, .aws, .task_env_* |
| /workspaces/{id}/files/content | GET | File content | 2 MB | Hides sensitive paths |
| /workspaces/{id}/files/grep | GET | Search via grep -rn | 200 matches | User-provided pattern + include glob |
| /workspaces/{id}/files/write | POST | Direct file write | No limit | Path safety enforced |

Sensitive paths (.ssh, .gitconfig, .aws, .workspace_meta.json, .task_env_*) are blocked from file browsing to prevent credential leakage.

Sources: services/workspace-worker/main.py:525-759


GitHub Integration

OAuth-Authenticated Cloning

The GitHub integration uses Composio to retrieve OAuth tokens and inject them into clone URLs.

Sources: orchestrator/api/workspace_github.py:167-293, services/workspace-worker/executor.py:368-419

URL Validation (PRD-66)

Clone URLs are validated with strict rules to prevent injection attacks:

Sources: orchestrator/api/workspace_github.py:69-91

Git Clone Command Construction (PRD-70 FIX-01)

The worker uses -- separator to prevent argument injection:

This prevents attacks like --upload-pack=malicious-script being injected via branch names.

Sources: services/workspace-worker/executor.py:368-419
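As an illustration of the `--` separator, the sketch below assembles the argument list for a shallow clone. The function name and parameters are hypothetical; `--depth 1` comes from the Repository Caching section, and the branch is assumed to have already passed the branch-name validation described under Layer 1.

```python
from typing import List, Optional


def build_clone_argv(clone_url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build a git clone argv as a list (no shell string interpolation)."""
    argv = ["git", "clone", "--depth", "1"]
    if branch:
        # Assumed to be pre-validated (no leading dash, "..", or "@{").
        argv += ["--branch", branch]
    # Everything after "--" is treated as a positional argument,
    # so a URL that starts with "-" cannot be parsed as an option.
    argv += ["--", clone_url, dest]
    return argv
```

Passing a list to `subprocess.run` instead of a formatted string also keeps shell metacharacters in the URL from being interpreted.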

Repository Caching

Cloned repos are cached in repos/ for the lifetime of the workspace:

  1. First clone: git clone --depth 1 (shallow clone for speed)

  2. Subsequent access: git pull (updates existing clone)

  3. Metadata tracking: Workspace .workspace_meta.json stores repos_cached: ["repo-name", ...]

The worker checks WorkspaceManager.repo_exists(repo_name) before cloning to decide between clone vs pull.

Sources: services/workspace-worker/executor.py:393-398


Agent Tools

Agents interact with workspaces via platform tools registered in the ActionRegistry. These tools route through the orchestrator's WorkspaceClient to the worker:

Tool Definitions

| Tool Name | Action | Permission | Parameters |
| --- | --- | --- | --- |
| workspace_read_file | Read file content | read | path (relative to workspace root) |
| workspace_write_file | Write/create file | write | path, content |
| workspace_list_dir | Directory listing | read | path (default: .) |
| workspace_grep | Search file contents | read | pattern, path, include (glob), max_results |
| workspace_exec | Run shell command | write | command, cwd, timeout |
| workspace_git | Git operations | write | operation (enum), cwd, args |

Sources: orchestrator/modules/tools/discovery/workspace_actions.py:15-248

Tool Routing Flow


Sources: orchestrator/modules/tools/tool_router.py:1-575, orchestrator/core/workspace_client.py:56-185

WorkspaceClient Methods

The WorkspaceClient is a thin HTTP wrapper around the worker's endpoints:

All methods return {"success": False, "error": "..."} on connection errors.

Sources: orchestrator/core/workspace_client.py:56-185


Security & Sandboxing

The workspace execution system implements defense-in-depth with five security layers:

Layer 1: URL Validation (PRD-66)

GitHub clone URLs are validated before task submission:

  • Scheme check: Only https:// is allowed (no git://, ssh://, file://)

  • Host allowlist: Only github.com, gitlab.com, bitbucket.org

  • No embedded credentials: URL must not contain username:password@

  • Branch name validation: No .., @{, leading dashes (prevents --upload-pack injection)

Sources: orchestrator/api/workspace_github.py:69-91
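The rules above translate fairly directly into code. The following is a sketch based only on the listed checks; the function names are hypothetical and the real validator may enforce more than this.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def validate_clone_url(url: str) -> bool:
    """Accept only credential-free https:// URLs on an allowlisted host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https":
        return False
    if parts.username is not None or parts.password is not None:
        return False  # embedded credentials
    return parts.hostname in ALLOWED_HOSTS  # exact match, no suffix matching


def validate_branch(branch: str) -> bool:
    """Reject empty names, leading dashes, '..' and '@{' sequences."""
    if not branch or branch.startswith("-"):
        return False
    return ".." not in branch and "@{" not in branch
```

Matching the hostname exactly, and not with a suffix or substring test, is what keeps look-alike hosts such as github.com.evil.example out.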

Layer 2: Path Safety

All file/directory operations go through WorkspaceManager.resolve_safe_path():

This prevents attacks like:

  • ../../../etc/passwd (directory traversal)

  • /etc/passwd (absolute path)

  • symlink-to-root (symlink escape)

Sources: services/workspace-worker/workspace_manager.py:228-253

Layer 3: Command Whitelist

Only approved binaries from ALLOWED_COMMANDS can execute. Path-based binaries are rejected:

Sources: services/workspace-worker/executor.py:35-73, services/workspace-worker/executor.py:448-500

Layer 4: Environment Sandboxing

Subprocesses run with a stripped environment:

  • PATH: Limited to /usr/local/bin:/usr/bin:/bin (no /sbin, no user paths)

  • HOME: Set to workspace root (not host user's home)

  • No host variables: AWS_*, DATABASE_URL, SECRET_KEY are not inherited

  • Git isolation: GIT_CONFIG_GLOBAL points to workspace .gitconfig

Sources: services/workspace-worker/executor.py:506-536

Layer 5: Storage Quotas

Per-workspace disk usage is enforced to prevent exhaustion attacks:

  1. Default quota: 5GB (DEFAULT_QUOTA_GB)

  2. Checked before each task execution

  3. Task fails immediately if over quota (no execution)

  4. Usage calculation via rglob("*") + sum(f.stat().st_size)

Sources: services/workspace-worker/workspace_manager.py:83-114

Sensitive Path Filtering

The worker's HTTP server blocks access to sensitive paths (.ssh, .gitconfig, .aws, .workspace_meta.json, .task_env_*).

File browsing endpoints return 403 Forbidden if a path component matches.

Sources: services/workspace-worker/main.py:473-481, services/workspace-worker/main.py:546-548, services/workspace-worker/main.py:609-611
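A component-wise check of this kind might look like the sketch below. The pattern list is the one given in the File Operations section; the function name and the use of glob-style matching for `.task_env_*` are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

SENSITIVE_PATTERNS = (".ssh", ".gitconfig", ".aws", ".workspace_meta.json", ".task_env_*")


def is_sensitive(rel_path: str) -> bool:
    """True if any component of the path matches a sensitive pattern."""
    return any(
        fnmatchcase(part, pattern)
        for part in PurePosixPath(rel_path).parts
        for pattern in SENSITIVE_PATTERNS
    )
```

A handler would return 403 Forbidden when `is_sensitive(path)` is true. Checking every component, not only the final one, also covers files nested inside a blocked directory such as .ssh/id_rsa.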


API Reference

Task Management API

Base URL: /api/tasks

| Endpoint | Method | Purpose | Auth |
| --- | --- | --- | --- |
| /submit | POST | Submit workspace task | Workspace JWT |
| (base URL) | GET | List recent tasks | Workspace JWT |
| /{task_id} | GET | Get task detail + result | Workspace JWT |
| /{task_id}/cancel | POST | Cancel queued/running task | Workspace JWT |
| /{task_id}/events | GET | SSE stream of task events | Workspace JWT |

Request: POST /api/tasks/submit

Response: POST /api/tasks/submit

Sources: orchestrator/api/tasks.py:1-404

Workspace Files API

Base URL: /api/workspaces/{workspace_id}

| Endpoint | Method | Purpose | Auth |
| --- | --- | --- | --- |
| /files | GET | Directory listing | Workspace JWT |
| /files/content | GET | Read file content | Workspace JWT |
| /exec | POST | Run shell command | Workspace JWT |

Request: GET /files?path=repos/my-app

Request: GET /files/content?path=repos/my-app/src/main.py

Sources: orchestrator/api/workspace_files.py:1-108

GitHub Integration API

Base URL: /api/workspaces/{workspace_id}/github

| Endpoint | Method | Purpose | Auth |
| --- | --- | --- | --- |
| /repos | GET | List user's GitHub repos | Workspace JWT |
| /clone | POST | Clone repo into workspace | Workspace JWT |

Request: POST /clone

Response: POST /clone

Sources: orchestrator/api/workspace_github.py:1-294

Worker HTTP API (Internal)

Base URL: http://workspace-worker:8081 (internal only, requires X-Internal-Token)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /health | GET | Health check (public) |
| /workspaces/{id}/files | GET | Directory listing |
| /workspaces/{id}/files/content | GET | File content |
| /workspaces/{id}/files/write | POST | Write file |
| /workspaces/{id}/files/grep | GET | Search files |
| /workspaces/{id}/exec | POST | Run command |
| /workspaces/{id}/git | POST | Git operation |

Sources: services/workspace-worker/main.py:798-804


Configuration

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| REDIS_URL | redis://localhost:6379/0 | Redis connection for queues + pub/sub |
| DATABASE_URL | (required) | PostgreSQL connection for task_executions table |
| WORKSPACE_VOLUME_PATH | /workspaces | Persistent volume mount path |
| WORKSPACE_DEFAULT_QUOTA_GB | 5 | Default storage quota per workspace |
| WORKER_CONCURRENCY | 3 | Max concurrent task executions |
| WORKER_HEALTH_PORT | 8081 | HTTP server port for health checks + file API |
| WORKER_BIND_HOST | 0.0.0.0 | HTTP server bind address |
| WORKER_INTERNAL_TOKEN | (optional) | Shared token for worker API auth (sent in the X-Internal-Token header) |
| WORKER_INTERNAL_URL | http://workspace-worker:8081 | Worker HTTP URL (orchestrator config) |

Sources: services/workspace-worker/main.py:14-19, services/workspace-worker/workspace_manager.py:32-33, orchestrator/core/workspace_client.py:20-44

Docker Compose

The workspace worker runs as a separate service defined in docker-compose.yml.

Sources: docker-compose.yml:1-282

