Workspace Execution
Purpose and Scope
This document covers the Workspace Execution subsystem, which provides sandboxed code execution, file operations, and repository management in isolated workspace environments. Each workspace gets its own persistent directory on a shared volume where agents can clone repos, run commands, read/write files, and execute development workflows.
Related pages:
For agent-driven workflow execution, see Recipe Execution Engine
For task submission and management APIs, see Task Management
For security policies and multi-tenancy, see Data Isolation
The workspace execution system consists of three main components:
Workspace Worker Service — Long-running process that consumes tasks from Redis queues and executes them in isolated directories
Orchestrator APIs — REST endpoints for task submission, file browsing, and GitHub integration
WorkspaceClient — HTTP client library that proxies orchestrator requests to the worker
System Architecture
The workspace execution system follows a task queue pattern with strict orchestrator-worker separation:
Sources: orchestrator/api/tasks.py:1-404, services/workspace-worker/main.py:1-832, orchestrator/core/workspace_client.py:1-185
Workspace Worker Service
Main Loop
The WorkspaceWorker class implements an ARQ-style queue consumer without the ARQ library dependency. It polls Redis priority queues in strict order (critical → high → normal → low) and executes tasks concurrently up to WORKER_CONCURRENCY (default: 3).
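The strict priority order can be sketched with in-memory deques standing in for the Redis queues. The next_task helper and the task dict shape are hypothetical; only the priority order comes from the description above.

```python
from collections import deque
from typing import Optional

# Priority order from the text; the deques stand in for Redis lists.
PRIORITIES = ("critical", "high", "normal", "low")


def next_task(queues: dict) -> Optional[dict]:
    """Return the next task, always draining higher priorities first."""
    for priority in PRIORITIES:
        queue = queues.get(priority)
        if queue:
            return queue.popleft()
    return None  # every queue is empty; a real worker would sleep and poll again


queues = {priority: deque() for priority in PRIORITIES}
queues["low"].append({"task_id": "t-low"})
queues["critical"].append({"task_id": "t-critical"})
first = next_task(queues)
second = next_task(queues)
```

Because the loop restarts from critical on every call, a low-priority task is only picked up when all higher queues are empty.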
Sources: services/workspace-worker/main.py:60-226
Concurrency Control
The worker uses an asyncio.Semaphore to limit concurrent task execution. Each task execution:
1. Acquires the semaphore (blocks if at limit)
2. Spawns an asyncio.Task for execution
3. Releases the semaphore in the finally block
This prevents resource exhaustion when many tasks are queued simultaneously.
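A minimal sketch of this acquire, spawn, release pattern, assuming a dummy task body. The run_all and execute names are illustrative and not taken from the worker source.

```python
import asyncio

WORKER_CONCURRENCY = 3  # default from the text


async def run_all(task_count: int) -> int:
    """Run task_count dummy tasks and return the peak number running at once."""
    semaphore = asyncio.Semaphore(WORKER_CONCURRENCY)
    running = 0
    peak = 0

    async def execute(task_id: int) -> None:
        nonlocal running, peak
        try:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real task work
        finally:
            running -= 1
            semaphore.release()  # always free the slot, even on failure

    spawned = []
    for task_id in range(task_count):
        await semaphore.acquire()  # blocks the consumer loop while at the limit
        spawned.append(asyncio.create_task(execute(task_id)))
    await asyncio.gather(*spawned)
    return peak


peak_concurrency = asyncio.run(run_all(10))
```

Acquiring in the consumer loop, rather than inside the task, means the loop stops pulling new work while all slots are busy.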
Sources: services/workspace-worker/main.py:71-180
Health Check Server
The worker runs an aiohttp HTTP server on port 8081 (configurable via WORKER_HEALTH_PORT) that serves the health check and the internal workspace endpoints:
| Endpoint | Method | Purpose |
| --- | --- | --- |
| /health | GET | Kubernetes liveness probe, returns {"status": "healthy", "active_tasks": N} |
| /workspaces/{id}/files | GET | Directory listing for code viewer widget (proxied from orchestrator) |
| /workspaces/{id}/files/content | GET | File content for code viewer (max 2MB) |
| /workspaces/{id}/exec | POST | Direct command execution (bypasses queue) |
| /workspaces/{id}/files/write | POST | Direct file write (bypasses queue) |
| /workspaces/{id}/files/grep | GET | Search file contents via grep |
| /workspaces/{id}/git | POST | Git operations (status, diff, commit, push) |
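The workspace endpoints authenticate requests with WORKER_INTERNAL_TOKEN, sent in the X-Internal-Token header; /health stays public for the liveness probe. Callers without the token are rejected, which keeps the endpoints closed to everything except orchestrator-to-worker traffic.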
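A token check of this kind might look like the sketch below. The is_authorized helper is hypothetical, and refusing all requests when no token is configured is an assumption of this sketch, not documented worker behavior.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Check the X-Internal-Token header against the configured token."""
    if not expected_token:
        # Assumption: with no token configured, this sketch refuses everything.
        return False
    supplied = headers.get("X-Internal-Token", "")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

Usage: is_authorized({"X-Internal-Token": "s3cret"}, "s3cret") returns True, while a missing or different header value returns False.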
Sources: services/workspace-worker/main.py:461-818
Workspace Filesystem
Directory Layout
Each workspace gets a persistent directory tree on the worker volume:
Sources: services/workspace-worker/workspace_manager.py:10-18
WorkspaceManager API
The WorkspaceManager class provides the core filesystem abstraction:
| Method | Description |
| --- | --- |
| ensure_workspace_exists() | Create directory tree + metadata on first use |
| get_usage_bytes() | Calculate current disk usage recursively |
| check_quota() | Enforce DEFAULT_QUOTA_GB storage limit |
| create_task_dir(task_id) | Create ephemeral tasks/task_{id}/ directory |
| cleanup_task(task_id) | Remove ephemeral dir + task-specific credentials |
| inject_credentials(task_id, creds) | Write SSH keys, git config, env vars |
| resolve_safe_path(rel_path) | Validate path stays within workspace (security boundary) |
| get_repo_path(repo_name) | Return path for a cached repo |
| list_repos() | List all cached repos in repos/ |
Sources: services/workspace-worker/workspace_manager.py:36-303
Path Safety
All path operations go through resolve_safe_path(), which enforces these security rules:
- No null bytes: \x00 in path raises SecurityError
- No absolute paths: paths starting with / are rejected
- No directory traversal: resolved path must stay within {workspace_id}/ (checked via Path.relative_to())
- Symlinks resolved: uses Path.resolve() to follow symlinks and verify containment
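The four rules can be sketched as follows. This is an illustration built from the rules above, not the actual WorkspaceManager code; the standalone function signature, the is_safe helper and the error messages are assumptions.

```python
import tempfile
from pathlib import Path


class SecurityError(Exception):
    """Raised when a path would escape the workspace."""


def resolve_safe_path(workspace_root: Path, rel_path: str) -> Path:
    if "\x00" in rel_path:
        raise SecurityError("null byte in path")
    if rel_path.startswith("/"):
        raise SecurityError("absolute paths are not allowed")
    root = workspace_root.resolve()
    candidate = (root / rel_path).resolve()  # follows symlinks, collapses ..
    try:
        candidate.relative_to(root)  # raises ValueError when outside root
    except ValueError:
        raise SecurityError(f"path escapes workspace: {rel_path}") from None
    return candidate


def is_safe(workspace_root: Path, rel_path: str) -> bool:
    try:
        resolve_safe_path(workspace_root, rel_path)
        return True
    except SecurityError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    candidates = ("repos/app/main.py", "../../etc/passwd", "/etc/passwd", "bad\x00name")
    results = {path: is_safe(root, path) for path in candidates}
```

Resolving before the containment check is what catches symlinks that point outside the workspace.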
Sources: services/workspace-worker/workspace_manager.py:228-253
Storage Quotas
The worker enforces per-workspace storage quotas to prevent disk exhaustion:
- Default quota: DEFAULT_QUOTA_GB (default: 5GB, configurable via the WORKSPACE_DEFAULT_QUOTA_GB env var)
- Quota check before each task execution: the task fails immediately if the workspace is over quota
- Usage calculation via recursive rglob("*") (cached in _current_usage)
Quota enforcement happens in the task execution flow:
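A minimal sketch of the usage calculation and quota check, assuming 1 GB is counted as 1024**3 bytes. The standalone functions mirror the method names in the WorkspaceManager API but are not the real implementation, and the caching in _current_usage is omitted.

```python
import tempfile
from pathlib import Path

DEFAULT_QUOTA_GB = 5  # default from the text


def get_usage_bytes(workspace_root: Path) -> int:
    """Sum the sizes of all regular files under the workspace."""
    return sum(f.stat().st_size for f in workspace_root.rglob("*") if f.is_file())


def check_quota(workspace_root: Path, quota_gb: float = DEFAULT_QUOTA_GB) -> bool:
    """Return True while usage is within the quota."""
    # Assumption: 1 GB is counted as 1024**3 bytes in this sketch.
    return get_usage_bytes(workspace_root) <= quota_gb * 1024**3


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "repos").mkdir()
    (root / "repos" / "data.bin").write_bytes(b"x" * 2048)
    usage = get_usage_bytes(root)
    within_default = check_quota(root)
    within_tiny = check_quota(root, quota_gb=1e-9)  # roughly one byte
```

A worker following the flow above would call check_quota before starting a task and fail the task without running it when the result is False.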
Sources: services/workspace-worker/main.py:251-264, services/workspace-worker/workspace_manager.py:83-114
Task Lifecycle
State Machine
Tasks flow through the following states:
Submission Flow
Tasks are submitted via POST /api/tasks/submit using a two-phase protocol in which the queue write happens only after the database write succeeds:
Phase 1: Database Insert
Phase 2: Redis Enqueue (only if DB insert succeeds)
If Phase 2 fails, the DB row is immediately marked as failed with error_message = "Redis enqueue failed: {error}".
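The two phases and the failure path can be sketched with a dict in place of the task_executions table and a callable in place of the Redis client. The submit_task function, the payload fields and the "queued" initial status are assumptions for illustration.

```python
import json
import uuid
from typing import Callable


def submit_task(db: dict, enqueue: Callable[[str], None], payload: dict) -> dict:
    """Insert the task row first, then enqueue; mark the row failed on enqueue error."""
    task_id = str(uuid.uuid4())
    # Phase 1: database insert (a dict stands in for the task_executions table).
    db[task_id] = {"status": "queued", "error_message": None}
    try:
        # Phase 2: enqueue, reached only because the insert succeeded.
        enqueue(json.dumps({"task_id": task_id, **payload}))
    except Exception as error:
        db[task_id].update(status="failed", error_message=f"Redis enqueue failed: {error}")
    return {"task_id": task_id, **db[task_id]}


def broken_enqueue(message: str) -> None:
    raise ConnectionError("connection refused")


db = {}
queue = []
ok = submit_task(db, queue.append, {"command": "pytest"})
bad = submit_task(db, broken_enqueue, {"command": "pytest"})
```

Writing the row first means a task can never sit in the queue without a matching database record; the reverse failure leaves a row that is visibly marked failed.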
Sources: orchestrator/api/tasks.py:62-173
Task Payload Structure
The JSON payload enqueued to Redis contains all execution metadata:
Sources: orchestrator/api/tasks.py:89-99
Event Streaming
Tasks publish real-time events via Redis pub/sub to channel workspace:task:{task_id}:events. The orchestrator exposes this as Server-Sent Events (SSE) via GET /api/tasks/{task_id}/events:
| Event | Payload | Emitted when |
| --- | --- | --- |
| status_changed | {"status": "running"} | Transitions between states |
| progress_update | {"step": 2, "total_steps": 5, "description": "..."} | Before each step execution |
| error | {"error": "..."} | Step failure or exception |
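As an illustration, the channel name and an SSE frame for one event could be built like this. The helper names are hypothetical, and whether the orchestrator sets the SSE event field (rather than sending data only) is an assumption.

```python
import json


def events_channel(task_id: str) -> str:
    """Redis pub/sub channel name for a task, as given in the text."""
    return f"workspace:task:{task_id}:events"


def to_sse(event_type: str, data: dict) -> str:
    """Format one event as a Server-Sent Events frame (blank line terminates it)."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


channel = events_channel("42")
frame = to_sse("status_changed", {"status": "running"})
```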
Sources: orchestrator/api/tasks.py:352-403, services/workspace-worker/main.py:422-434
Command Execution
Security Model
The WorkspaceToolExecutor enforces a five-layer security model for command execution:
Sources: services/workspace-worker/executor.py:1-537
Command Whitelist
Only these binaries are allowed (exact name matching, no path components):
| Category | Commands |
| --- | --- |
| Shell | sh, bash, cd, pwd, export, source, test |
| VCS | git |
| Python | python, python3, pip, pip3, uv, pytest, ruff, black, mypy, isort, flake8, coverage, tox |
| Node | node, npm, npx, pnpm, yarn, vitest, jest, tsc, eslint, prettier |
| General | ls, cat, grep, find, tree, wc, sort, diff, jq, sed, awk, curl, wget, make, cmake, tar, gzip, zip, touch, mkdir, cp, mv, rm, chmod, echo, env, which |
| Other Languages | cargo, go, ruby, java, javac, mvn, gradle, rustc, gcc, g++ |
Commands with path separators (e.g., /usr/bin/python, ./malicious, ../escape) are rejected to prevent binary injection via relative/absolute paths.
Sources: services/workspace-worker/executor.py:35-73
Blocked Patterns
These regex patterns are always blocked, even if the binary is whitelisted:
| Pattern | Reason |
| --- | --- |
| rm\s+-rf\s+/\s*$ | Deletes entire filesystem |
| rm\s+-rf\s+/[^w] | Recursive delete of an absolute path outside /workspaces/ (any path whose first character after / is not w) |
| \bsudo\b | Privilege escalation |
| \bsu\s | User switching |
| \bchmod\s+777\b | Dangerous permissions |
| \bkubectl\b | Kubernetes cluster access |
| >\s*/dev/ | Device file access |
| \bmkfs\b | Filesystem formatting |
| \bdd\s+if= | Raw disk operations |
| \biptables\b | Firewall manipulation |
| \bsystemctl\b | Service management |
| \bpasswd\b | Password changes |
| \buseradd\b, \buserdel\b | User management |
| \bmount\b, \bumount\b | Filesystem mounting |
| ` | Backtick command substitution |
| \n | Embedded newlines |
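A sketch of how such a blocklist might be applied, using a subset of the patterns listed above. The find_blocked_pattern helper is hypothetical and the executor's real matching logic may differ.

```python
import re
from typing import Optional

# A subset of the documented patterns, compiled once at import time.
BLOCKED_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/\s*$"),
    re.compile(r"\bsudo\b"),
    re.compile(r"\bchmod\s+777\b"),
    re.compile(r">\s*/dev/"),
    re.compile(r"`"),
    re.compile(r"\n"),
]


def find_blocked_pattern(command: str) -> Optional[str]:
    """Return the first blocked pattern found in the command, or None."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return pattern.pattern
    return None
```

Note that the device-file pattern also matches redirects such as > /dev/null, so commands that discard output this way would be rejected under these rules.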
Sources: services/workspace-worker/executor.py:76-98
Validation Algorithm
Commands are validated by splitting on shell operators (&&, ||, ;, |) and checking each segment:
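The split-and-check step can be sketched as below, using a small subset of the whitelist. This is a simplified illustration: the regex split does not understand quoting, so an operator inside a quoted string produces an unparseable segment and is rejected. The validate_command name and the messages are assumptions.

```python
import re
import shlex
from typing import Tuple

# Subset of the documented whitelist, for illustration only.
ALLOWED_COMMANDS = {"git", "python", "pytest", "ls", "cat", "grep", "echo", "cd"}

# Shell operators from the text: &&, ||, ; and |
OPERATOR_SPLIT = re.compile(r"&&|\|\||;|\|")


def validate_command(command: str) -> Tuple[bool, str]:
    """Check the first token of every operator-separated segment."""
    for segment in OPERATOR_SPLIT.split(command):
        segment = segment.strip()
        if not segment:
            continue
        try:
            tokens = shlex.split(segment)
        except ValueError as error:
            return False, f"unparseable segment: {error}"
        if not tokens:
            continue
        binary = tokens[0]
        if "/" in binary:
            return False, f"path-based binary rejected: {binary}"
        if binary not in ALLOWED_COMMANDS:
            return False, f"command not allowed: {binary}"
    return True, "ok"
```

Checking every segment matters because a whitelisted first command could otherwise be chained to an arbitrary second one.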
Sources: services/workspace-worker/executor.py:448-500
Sandboxed Environment
Subprocesses run with a stripped environment that prevents host variable leakage:
Host environment variables (e.g., AWS_ACCESS_KEY_ID, DATABASE_URL) are not inherited.
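A stripped environment of this kind could be built from scratch rather than copied from os.environ. The values below follow the Environment Sandboxing layer described later in this document; the build_sandbox_env helper is hypothetical and the real executor may set additional variables.

```python
import os
from pathlib import Path
from typing import Dict


def build_sandbox_env(workspace_root: Path) -> Dict[str, str]:
    """Build the subprocess environment without inheriting host variables."""
    return {
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "HOME": str(workspace_root),
        "GIT_CONFIG_GLOBAL": str(workspace_root / ".gitconfig"),
    }


os.environ["AWS_ACCESS_KEY_ID"] = "host-secret"  # simulate a host credential
sandbox_env = build_sandbox_env(Path("/workspaces/ws-1"))
```

Passing the result as env= to subprocess.run or asyncio.create_subprocess_exec replaces the inherited environment entirely, so host secrets never reach the child process.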
Sources: services/workspace-worker/executor.py:506-536
Output Limits
Command output is capped to prevent memory exhaustion:
- STDOUT: 100KB (MAX_STDOUT_BYTES)
- STDERR: 50KB (MAX_STDERR_BYTES)
Truncated output sets truncated: true in the result dict.
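A minimal sketch of the capping step, assuming 1KB is 1024 bytes and that the result dict uses stdout, stderr and truncated keys. The cap_output helper is illustrative only.

```python
# Assumption: the limits are counted in units of 1024 bytes.
MAX_STDOUT_BYTES = 100 * 1024
MAX_STDERR_BYTES = 50 * 1024


def cap_output(stdout: bytes, stderr: bytes) -> dict:
    """Trim both streams to their limits and flag whether anything was cut."""
    truncated = len(stdout) > MAX_STDOUT_BYTES or len(stderr) > MAX_STDERR_BYTES
    return {
        # errors="replace" tolerates a multi-byte character cut at the limit.
        "stdout": stdout[:MAX_STDOUT_BYTES].decode("utf-8", errors="replace"),
        "stderr": stderr[:MAX_STDERR_BYTES].decode("utf-8", errors="replace"),
        "truncated": truncated,
    }


small = cap_output(b"ok\n", b"")
large = cap_output(b"x" * (MAX_STDOUT_BYTES + 1), b"")
```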
Sources: services/workspace-worker/executor.py:100-106
File Operations
The WorkspaceToolExecutor provides sandboxed file operations that agents call via workspace_* tools:
Read File
Sources: services/workspace-worker/executor.py:230-252
Write File
Sources: services/workspace-worker/executor.py:254-270
List Directory
Sources: services/workspace-worker/executor.py:272-300
HTTP Endpoints for File Browsing
The worker's HTTP server exposes file operations for the code viewer widget:
| Endpoint | Method | Purpose | Limit | Notes |
| --- | --- | --- | --- | --- |
| /workspaces/{id}/files | GET | Directory listing | 500 entries | Hides .ssh, .gitconfig, .aws, .task_env_* |
| /workspaces/{id}/files/content | GET | File content | 2 MB | Hides sensitive paths |
| /workspaces/{id}/files/grep | GET | Search via grep -rn | 200 matches | User-provided pattern + include glob |
| /workspaces/{id}/files/write | POST | Direct file write | No limit | Path safety enforced |
Sensitive paths (.ssh, .gitconfig, .aws, .workspace_meta.json, .task_env_*) are blocked from file browsing to prevent credential leakage.
Sources: services/workspace-worker/main.py:525-759
GitHub Integration
OAuth-Authenticated Cloning
The GitHub integration uses Composio to retrieve OAuth tokens and inject them into clone URLs:
Sources: orchestrator/api/workspace_github.py:167-293, services/workspace-worker/executor.py:368-419
URL Validation (PRD-66)
Clone URLs are validated with strict rules to prevent injection attacks:
Sources: orchestrator/api/workspace_github.py:69-91
Git Clone Command Construction (PRD-70 FIX-01)
The worker uses -- separator to prevent argument injection:
This prevents attacks like --upload-pack=malicious-script being injected via branch names.
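The argument list might be assembled as in the sketch below. The build_clone_argv helper is hypothetical; the flags themselves (--depth, --branch and the -- separator) are standard git clone options.

```python
from typing import List, Optional


def build_clone_argv(url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build a git clone argument list with -- before the positional arguments."""
    argv = ["git", "clone", "--depth", "1"]
    if branch:
        argv += ["--branch", branch]
    # Everything after -- is treated as positional, never as an option.
    argv += ["--", url, dest]
    return argv


normal = build_clone_argv("https://github.com/org/repo.git", "repos/repo", "main")
hostile = build_clone_argv("--upload-pack=evil", "repos/repo")
```

Passing the list directly to a subprocess call, without a shell, keeps each element as a single argument regardless of spaces or metacharacters.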
Sources: services/workspace-worker/executor.py:368-419
Repository Caching
Cloned repos are cached in repos/ for the lifetime of the workspace:
- First clone: git clone --depth 1 (shallow clone for speed)
- Subsequent access: git pull (updates existing clone)
- Metadata tracking: Workspace .workspace_meta.json stores repos_cached: ["repo-name", ...]
The worker checks WorkspaceManager.repo_exists(repo_name) before cloning to decide between clone vs pull.
Sources: services/workspace-worker/executor.py:393-398
Agent Tools
Agents interact with workspaces via platform tools registered in the ActionRegistry. These tools route through the orchestrator's WorkspaceClient to the worker:
Tool Definitions
| Tool | Description | Access | Parameters |
| --- | --- | --- | --- |
| workspace_read_file | Read file content | read | path (relative to workspace root) |
| workspace_write_file | Write/create file | write | path, content |
| workspace_list_dir | Directory listing | read | path (default: .) |
| workspace_grep | Search file contents | read | pattern, path, include (glob), max_results |
| workspace_exec | Run shell command | write | command, cwd, timeout |
| workspace_git | Git operations | write | operation (enum), cwd, args |
Sources: orchestrator/modules/tools/discovery/workspace_actions.py:15-248
Tool Routing Flow
Sources: orchestrator/modules/tools/tool_router.py:1-575, orchestrator/core/workspace_client.py:56-185
WorkspaceClient Methods
The WorkspaceClient is a thin HTTP wrapper around the worker's endpoints:
All methods return {"success": False, "error": "..."} on connection errors.
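The error-wrapping behaviour can be illustrated with an injected transport function in place of a real HTTP client. The call_worker helper, the send callable and the error wording are assumptions; only the {"success": False, "error": ...} shape comes from the text.

```python
from typing import Callable, Optional


def call_worker(
    send: Callable[[str, str, dict], dict],
    method: str,
    path: str,
    body: Optional[dict] = None,
) -> dict:
    """Forward a request and convert connection failures into an error dict."""
    try:
        return send(method, path, body or {})
    except OSError as error:  # covers ConnectionError and TimeoutError
        return {"success": False, "error": f"workspace worker unreachable: {error}"}


def unreachable(method: str, path: str, body: dict) -> dict:
    raise ConnectionRefusedError("connection refused")


def healthy(method: str, path: str, body: dict) -> dict:
    return {"success": True, "path": path}


down = call_worker(unreachable, "GET", "/workspaces/ws-1/files")
up = call_worker(healthy, "GET", "/workspaces/ws-1/files")
```

Returning a dict instead of raising lets tool callers treat an unreachable worker like any other failed tool call.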
Sources: orchestrator/core/workspace_client.py:56-185
Security & Sandboxing
The workspace execution system implements defense-in-depth with five security layers:
Layer 1: URL Validation (PRD-66)
GitHub clone URLs are validated before task submission:
- Scheme check: Only https:// is allowed (no git://, ssh://, file://)
- Host allowlist: Only github.com, gitlab.com, bitbucket.org
- No embedded credentials: URL must not contain username:password@
- Branch name validation: No .., @{, leading dashes (prevents --upload-pack injection)
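These checks can be sketched with urllib.parse. The validate_clone_request function and its error messages are hypothetical, and this sketch rejects any userinfo in the URL, which is slightly stricter than the username:password@ rule above.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def validate_clone_request(url: str, branch: str = "") -> None:
    """Raise ValueError if the URL or branch breaks one of the rules above."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https:// clone URLs are allowed")
    if parts.username is not None or parts.password is not None:
        raise ValueError("clone URL must not embed credentials")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {parts.hostname}")
    if branch and (".." in branch or "@{" in branch or branch.startswith("-")):
        raise ValueError(f"invalid branch name: {branch}")


def is_valid(url: str, branch: str = "") -> bool:
    try:
        validate_clone_request(url, branch)
        return True
    except ValueError:
        return False
```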
Sources: orchestrator/api/workspace_github.py:69-91
Layer 2: Path Safety
All file/directory operations go through WorkspaceManager.resolve_safe_path():
This prevents attacks like:
- ../../../etc/passwd (directory traversal)
- /etc/passwd (absolute path)
- symlink-to-root (symlink escape)
Sources: services/workspace-worker/workspace_manager.py:228-253
Layer 3: Command Whitelist
Only approved binaries from ALLOWED_COMMANDS can execute. Path-based binaries are rejected:
Sources: services/workspace-worker/executor.py:35-73, services/workspace-worker/executor.py:448-500
Layer 4: Environment Sandboxing
Subprocesses run with a stripped environment:
- PATH: Limited to /usr/local/bin:/usr/bin:/bin (no /sbin, no user paths)
- HOME: Set to workspace root (not host user's home)
- No host variables: AWS_*, DATABASE_URL, SECRET_KEY are not inherited
- Git isolation: GIT_CONFIG_GLOBAL points to workspace .gitconfig
Sources: services/workspace-worker/executor.py:506-536
Layer 5: Storage Quotas
Per-workspace disk usage is enforced to prevent exhaustion attacks:
- Default quota: 5GB (DEFAULT_QUOTA_GB)
- Checked before each task execution
- Task fails immediately if over quota (no execution)
- Usage calculation via rglob("*") + sum(f.stat().st_size)
Sources: services/workspace-worker/workspace_manager.py:83-114
Sensitive Path Filtering
The worker's HTTP server blocks access to sensitive paths:
File browsing endpoints return 403 Forbidden if any path component matches one of the sensitive names (.ssh, .gitconfig, .aws, .workspace_meta.json, .task_env_*).
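A component-wise check of this kind might look like the following sketch. The names come from the sensitive path list in the file browsing section; the is_sensitive_path helper and the constant names are hypothetical.

```python
from pathlib import PurePosixPath

SENSITIVE_NAMES = {".ssh", ".gitconfig", ".aws", ".workspace_meta.json"}
SENSITIVE_PREFIXES = (".task_env_",)


def is_sensitive_path(rel_path: str) -> bool:
    """Return True if any component of the path is a sensitive name."""
    return any(
        part in SENSITIVE_NAMES or part.startswith(SENSITIVE_PREFIXES)
        for part in PurePosixPath(rel_path).parts
    )
```

Checking every component, not just the last one, also blocks files nested inside a sensitive directory such as .ssh/id_rsa.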
Sources: services/workspace-worker/main.py:473-481, services/workspace-worker/main.py:546-548, services/workspace-worker/main.py:609-611
API Reference
Task Management API
Base URL: /api/tasks
| Endpoint | Method | Description | Auth |
| --- | --- | --- | --- |
| /submit | POST | Submit workspace task | Workspace JWT |
| (base URL) | GET | List recent tasks | Workspace JWT |
| /{task_id} | GET | Get task detail + result | Workspace JWT |
| /{task_id}/cancel | POST | Cancel queued/running task | Workspace JWT |
| /{task_id}/events | GET | SSE stream of task events | Workspace JWT |
Request: POST /api/tasks/submit
Response: POST /api/tasks/submit
Sources: orchestrator/api/tasks.py:1-404
Workspace Files API
Base URL: /api/workspaces/{workspace_id}
| Endpoint | Method | Description | Auth |
| --- | --- | --- | --- |
| /files | GET | Directory listing | Workspace JWT |
| /files/content | GET | Read file content | Workspace JWT |
| /exec | POST | Run shell command | Workspace JWT |
Request: GET /files?path=repos/my-app
Request: GET /files/content?path=repos/my-app/src/main.py
Sources: orchestrator/api/workspace_files.py:1-108
GitHub Integration API
Base URL: /api/workspaces/{workspace_id}/github
| Endpoint | Method | Description | Auth |
| --- | --- | --- | --- |
| /repos | GET | List user's GitHub repos | Workspace JWT |
| /clone | POST | Clone repo into workspace | Workspace JWT |
Request: POST /clone
Response: POST /clone
Sources: orchestrator/api/workspace_github.py:1-294
Worker HTTP API (Internal)
Base URL: http://workspace-worker:8081 (internal only, requires X-Internal-Token)
| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check (public) |
| /workspaces/{id}/files | GET | Directory listing |
| /workspaces/{id}/files/content | GET | File content |
| /workspaces/{id}/files/write | POST | Write file |
| /workspaces/{id}/files/grep | GET | Search files |
| /workspaces/{id}/exec | POST | Run command |
| /workspaces/{id}/git | POST | Git operation |
Sources: services/workspace-worker/main.py:798-804
Configuration
Environment Variables
| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | redis://localhost:6379/0 | Redis connection for queues + pub/sub |
| DATABASE_URL | (required) | PostgreSQL connection for task_executions table |
| WORKSPACE_VOLUME_PATH | /workspaces | Persistent volume mount path |
| WORKSPACE_DEFAULT_QUOTA_GB | 5 | Default storage quota per workspace |
| WORKER_CONCURRENCY | 3 | Max concurrent task executions |
| WORKER_HEALTH_PORT | 8081 | HTTP server port for health checks + file API |
| WORKER_BIND_HOST | 0.0.0.0 | HTTP server bind address |
| WORKER_INTERNAL_TOKEN | (optional) | Shared token for worker API auth, sent in the X-Internal-Token header |
| WORKER_INTERNAL_URL | http://workspace-worker:8081 | Worker HTTP URL (orchestrator config) |
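Reading these settings with their documented defaults could look like the sketch below. The WorkerConfig dataclass and load_config function are hypothetical, and treating a missing DATABASE_URL as a KeyError is an assumption about how "required" is enforced.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerConfig:
    redis_url: str
    database_url: str
    volume_path: str
    quota_gb: float
    concurrency: int
    health_port: int
    bind_host: str
    internal_token: str


def load_config(env: Mapping[str, str] = os.environ) -> WorkerConfig:
    """Read worker settings, falling back to the documented defaults."""
    return WorkerConfig(
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        database_url=env["DATABASE_URL"],  # required: KeyError if missing
        volume_path=env.get("WORKSPACE_VOLUME_PATH", "/workspaces"),
        quota_gb=float(env.get("WORKSPACE_DEFAULT_QUOTA_GB", "5")),
        concurrency=int(env.get("WORKER_CONCURRENCY", "3")),
        health_port=int(env.get("WORKER_HEALTH_PORT", "8081")),
        bind_host=env.get("WORKER_BIND_HOST", "0.0.0.0"),
        internal_token=env.get("WORKER_INTERNAL_TOKEN", ""),
    )


defaults = load_config({"DATABASE_URL": "postgresql://localhost/app"})
tuned = load_config({"DATABASE_URL": "postgresql://localhost/app", "WORKER_CONCURRENCY": "8"})
```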
Sources: services/workspace-worker/main.py:14-19, services/workspace-worker/workspace_manager.py:32-33, orchestrator/core/workspace_client.py:20-44
Docker Compose
The workspace worker runs as a separate service in the docker-compose.yml:
Sources: docker-compose.yml:1-282

