Workspace Worker Architecture
The Workspace Worker is a standalone service that executes agent tasks in isolated filesystem environments. It operates independently from the orchestrator, consuming tasks from Redis queues and exposing an HTTP API for file operations. Each workspace gets its own persistent directory on a mounted volume, with sandboxed command execution, path safety validation, and storage quota enforcement.
For API endpoints that submit tasks to the worker, see Task Management. For GitHub repository integration via the worker, see GitHub Integration. For security policies and sandboxing details, see Security & Sandboxing.
Architecture Overview
The workspace worker implements a two-interface architecture: a Redis queue consumer for long-running background tasks, and an HTTP server for synchronous file operations.
Dual-Interface Design
Sources: services/workspace-worker/main.py:1-832, orchestrator/api/workspace_files.py:1-108, orchestrator/api/workspace_github.py:1-294
WorkspaceWorker Main Loop
The WorkspaceWorker class implements a priority-based task consumer with graceful shutdown, health reporting, and concurrent execution limits.
Task Consumer Architecture
Sources: services/workspace-worker/main.py:60-143, services/workspace-worker/main.py:149-204
Key Configuration
| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | redis://localhost:6379/0 | Redis connection string for task queues |
| WORKER_CONCURRENCY | 3 | Maximum concurrent task executions |
| WORKER_HEALTH_PORT | 8081 | HTTP server port for health checks and file API |
| WORKSPACE_VOLUME_PATH | /workspaces | Mount path for persistent workspace volume |
| WORKSPACE_DEFAULT_QUOTA_GB | 5 | Default storage quota per workspace |
| WORKER_INTERNAL_TOKEN | (empty) | Auth token for internal API calls (optional) |
| WORKER_BIND_HOST | 0.0.0.0 | HTTP server bind address |
Sources: services/workspace-worker/main.py:16-19, services/workspace-worker/workspace_manager.py:32-33
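A minimal sketch of how these settings might be read at startup, assuming they are plain environment variables. The WorkerConfig dataclass and load_config() helper are hypothetical names used for illustration; only the variable names and defaults come from the configuration listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    """Hypothetical container for the documented settings."""
    redis_url: str
    concurrency: int
    health_port: int
    volume_path: str
    default_quota_gb: float
    internal_token: str
    bind_host: str


def load_config(env=None) -> WorkerConfig:
    """Read settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return WorkerConfig(
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        concurrency=int(env.get("WORKER_CONCURRENCY", "3")),
        health_port=int(env.get("WORKER_HEALTH_PORT", "8081")),
        volume_path=env.get("WORKSPACE_VOLUME_PATH", "/workspaces"),
        default_quota_gb=float(env.get("WORKSPACE_DEFAULT_QUOTA_GB", "5")),
        # An empty token means internal authentication is disabled.
        internal_token=env.get("WORKER_INTERNAL_TOKEN", ""),
        bind_host=env.get("WORKER_BIND_HOST", "0.0.0.0"),
    )
```

Passing an explicit mapping, for example load_config({"WORKER_CONCURRENCY": "8"}), keeps the helper easy to test without touching the process environment.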
Task Execution Lifecycle
Tasks move through a state machine from queued → running → completed/failed/timed_out/cancelled. The worker updates both Redis (for real-time tracking) and PostgreSQL (for persistent history) at each transition.
Task Execution Flow
Sources: services/workspace-worker/main.py:205-358, services/workspace-worker/main.py:363-416
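The state machine can be sketched as an enum plus a transition table. This is an illustration only: the TaskState and transition() names are hypothetical, and the table models just the documented path (queued to running to a terminal state). Whether the worker permits other transitions, such as cancelling a task that is still queued, is not stated here.

```python
from enum import Enum


class TaskState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


# Assumed transition table covering only the documented lifecycle.
ALLOWED_TRANSITIONS = {
    TaskState.QUEUED: {TaskState.RUNNING},
    TaskState.RUNNING: {
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.TIMED_OUT,
        TaskState.CANCELLED,
    },
}


def transition(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, or raise if the move is not in the table."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

In a real worker, the point where transition() succeeds is a natural place to write the new status to both Redis and PostgreSQL.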
WorkspaceManager: Filesystem Isolation
The WorkspaceManager class provides workspace provisioning, path safety, credential injection, and quota enforcement. Each workspace is a self-contained directory tree on the persistent volume.
Workspace Directory Structure
Sources: services/workspace-worker/workspace_manager.py:10-18
WorkspaceManager Methods
| Method | Purpose | Details |
| --- | --- | --- |
| ensure_workspace_exists() | Create directory tree on first use | Creates repos/, tasks/, artifacts/ + .workspace_meta.json |
| check_quota() | Verify storage under WORKSPACE_DEFAULT_QUOTA_GB | Walks entire tree, returns False if over quota |
| create_task_dir(task_id) | Create ephemeral dir under tasks/ | Returns tasks/task_{task_id}/ path |
| cleanup_task(task_id) | Remove ephemeral task dir + credentials | shutil.rmtree() task dir + .task_env_{task_id} |
| inject_credentials(task_id, creds) | Write SSH key, git config, env vars | .ssh/id_ed25519 with chmod 600 |
| resolve_safe_path(relative) | Validate path stays within workspace | Blocks ../, symlinks, absolute paths, null bytes |
| get_repo_path(name) | Get path for a cached repo | Returns repos/{sanitized_name}/ |
| list_repos() | List all cloned repos | Returns directory names from repos/ |
Sources: services/workspace-worker/workspace_manager.py:36-303
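To make the provisioning and quota behaviour concrete, here is a rough sketch written as free functions rather than methods. The directory names and the walk-the-tree approach follow the method descriptions above; the metadata file contents and the function signatures are assumptions made for the example.

```python
import json
import os
from pathlib import Path

GIB = 1024 ** 3


def ensure_workspace_exists(root: Path) -> None:
    """Create the workspace tree on first use; safe to call repeatedly."""
    for name in ("repos", "tasks", "artifacts"):
        (root / name).mkdir(parents=True, exist_ok=True)
    meta = root / ".workspace_meta.json"
    if not meta.exists():
        # The metadata fields are hypothetical; the source only names the file.
        meta.write_text(json.dumps({"version": 1}))


def workspace_size_bytes(root: Path) -> int:
    """Sum regular file sizes under root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def check_quota(root: Path, quota_gb: float = 5.0) -> bool:
    """Return False when the workspace exceeds its quota."""
    return workspace_size_bytes(root) <= quota_gb * GIB
```

Walking the whole tree is linear in the number of files, so a production implementation may want to cache the result or check it only at task boundaries.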
Path Safety Validation
The resolve_safe_path() method is the security boundary for all file operations. It prevents directory traversal attacks by ensuring resolved paths stay within the workspace root.
Sources: services/workspace-worker/workspace_manager.py:228-253
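One common way to implement this kind of check is to resolve the candidate path and confirm that the workspace root is one of its ancestors. The sketch below takes that approach; it is not the worker's actual code, and UnsafePathError and the two-argument signature are hypothetical.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a requested path would leave the workspace."""


def resolve_safe_path(workspace_root: Path, relative: str) -> Path:
    if "\x00" in relative:
        raise UnsafePathError("null byte in path")
    if Path(relative).is_absolute():
        raise UnsafePathError("absolute paths are not allowed")
    root = workspace_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so a link
    # that points outside the workspace is caught by the ancestry check.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise UnsafePathError(f"path escapes workspace: {relative!r}")
    return candidate
```

Comparing resolved paths rather than raw strings avoids prefix mistakes such as treating /workspaces/ws1-evil as a child of /workspaces/ws1.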
WorkspaceToolExecutor: Sandboxed Execution
The WorkspaceToolExecutor class enforces command whitelisting, output limits, timeouts, and environment sandboxing for all shell commands executed in the workspace.
Command Execution Flow
Sources: services/workspace-worker/executor.py:122-224, services/workspace-worker/executor.py:448-500
Command Whitelist
The ALLOWED_COMMANDS set contains only approved binaries. Any command not in this set is rejected before execution.
Selected Whitelisted Commands:
| Category | Commands |
| --- | --- |
| Version Control | git |
| Python | python, python3, pip, pip3, uv, pytest, ruff, black, mypy, isort, flake8, coverage |
| Node.js | node, npm, npx, pnpm, yarn, vitest, jest, tsc, eslint, prettier |
| File Operations | ls, cat, grep, find, tree, wc, sort, head, tail, diff, jq, sed, awk |
| System Tools | curl, wget, make, cmake, tar, gzip, zip, touch, mkdir, cp, mv, rm |
| Other Runtimes | cargo, go, ruby, java, mvn, gradle, rustc, gcc |
Sources: services/workspace-worker/executor.py:35-73
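A whitelist check of this kind usually tokenizes the command and tests the first token against the allowed set. The following is a sketch under that assumption; the set shown is a small subset of the whitelisted commands, and is_command_allowed() is a hypothetical helper name.

```python
import shlex

# Subset of the documented whitelist, kept short for the example.
ALLOWED_COMMANDS = frozenset(
    {"git", "python", "python3", "pip", "pytest", "node", "npm", "ls", "cat", "grep"}
)


def is_command_allowed(command: str) -> bool:
    """Accept a command only if its first token is a whitelisted binary name."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are treated as rejections.
        return False
    if not argv:
        return False
    # Compare the token as written: "/tmp/git" is rejected rather than being
    # reduced to its basename, so a planted binary cannot borrow a trusted name.
    return argv[0] in ALLOWED_COMMANDS
```

Rejecting on parse failure keeps the check fail-closed: anything the tokenizer cannot understand never reaches a shell.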
Blocked Patterns
Even when a binary is whitelisted, any command that matches one of the blocked regex patterns is rejected.
Sources: services/workspace-worker/executor.py:76-95
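The pattern list itself is not reproduced in this section, so the sketch below uses a few hypothetical patterns purely to show the mechanism: each pattern is compiled once and a command is rejected on the first match. The names BLOCKED_PATTERNS and first_blocked_pattern() are illustrative, and the worker's real patterns will differ.

```python
import re
from typing import Optional

# Hypothetical examples only; not the worker's actual blocked patterns.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),  # recursive delete of the filesystem root
    re.compile(r"\|\s*(sh|bash)\b"),      # piping output straight into a shell
    re.compile(r"\bsudo\b"),              # privilege escalation
]


def first_blocked_pattern(command: str) -> Optional[str]:
    """Return the source of the first matching pattern, or None if none match."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return pattern.pattern
    return None
```

Returning the matched pattern rather than a bare boolean makes the rejection reason easy to log.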
Sandboxed Environment
The _build_sandboxed_env() method strips the host environment and passes only essential variables to the executed command.
Sources: services/workspace-worker/executor.py:506-536
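The idea can be illustrated by building the child environment from scratch instead of copying os.environ. Which variables the worker actually passes through is not listed here, so PATH, HOME, and LANG below are assumptions, as are the function names and the default timeout.

```python
import subprocess
from typing import Dict, List, Optional


def build_sandboxed_env(workspace_root: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from an empty environment so host variables never leak through."""
    env = {
        "PATH": "/usr/local/bin:/usr/bin:/bin",  # assumed minimal search path
        "HOME": workspace_root,                  # keep dotfiles inside the workspace
        "LANG": "C.UTF-8",
    }
    if extra:
        env.update(extra)
    return env


def run_sandboxed(argv: List[str], workspace_root: str, timeout_s: float = 60.0):
    """Run argv without a shell, inside the workspace, with the stripped environment."""
    return subprocess.run(
        argv,
        cwd=workspace_root,
        env=build_sandboxed_env(workspace_root),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
    )
```

Passing an argument list rather than a shell string also avoids shell interpolation of user-supplied input.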
HTTP API for File Operations
The worker exposes an HTTP server on port 8081 (configurable via WORKER_HEALTH_PORT) for synchronous file operations. The orchestrator proxies requests to this API via the WorkspaceClient.
HTTP Endpoints
Sources: services/workspace-worker/main.py:461-818
Endpoint Details
| Endpoint | Method | Description | Response fields |
| --- | --- | --- | --- |
| /health | GET | Health check | {status, worker_id, active_tasks, concurrency, volume_path} |
| /workspaces/{id}/files | GET | List directory (?path=.) | {path, entries: [{name, type, size, modified_at}], truncated} |
| /workspaces/{id}/files/content | GET | Read file content (?path=file.py) | {path, name, content, size, language, mime_type} |
| /workspaces/{id}/exec | POST | Execute command | {exit_code, stdout, stderr, duration_ms, truncated} |
| /workspaces/{id}/files/write | POST | Write file | {written, path, size_bytes} |
| /workspaces/{id}/files/grep | GET | Search pattern (?pattern=TODO) | {matches: [{file, line, content}], total, truncated} |
| /workspaces/{id}/git | POST | Git operation | {exit_code, stdout, stderr, duration_ms} |
Sources: services/workspace-worker/main.py:516-818
Internal Authentication
If WORKER_INTERNAL_TOKEN is set, the HTTP server requires an X-Internal-Token header on all non-health requests. The orchestrator includes this token via WorkspaceClient._get_client().
Sources: services/workspace-worker/main.py:501-512, orchestrator/core/workspace_client.py:28-39
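The rule described above can be expressed as a small predicate. This is a sketch, not the server's handler: is_authorized() is a hypothetical name, the headers mapping is assumed to use the canonical X-Internal-Token spelling, and the constant-time comparison is a suggested hardening rather than something the source confirms.

```python
import hmac
from typing import Mapping


def is_authorized(path: str, headers: Mapping[str, str], expected_token: str) -> bool:
    """Decide whether a request may proceed under the internal-token rule."""
    if not expected_token:
        return True  # WORKER_INTERNAL_TOKEN unset: authentication is disabled
    if path == "/health":
        return True  # health checks stay open for probes
    supplied = headers.get("X-Internal-Token", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```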
Integration with Orchestrator
The orchestrator interacts with the workspace worker through two channels: task submission (Redis) and file operations (HTTP).
Orchestrator-Worker Communication
Sources: orchestrator/core/workspace_client.py:56-176, orchestrator/api/tasks.py:62-174
WorkspaceClient Methods
The WorkspaceClient class provides an async HTTP client interface to the worker's file operations.
Sources: orchestrator/core/workspace_client.py:56-185
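To show the shape of the calls such a client makes, the sketch below builds requests for two of the worker endpoints using only the standard library. It is deliberately synchronous and stops short of sending anything, so it is a stand-in for illustration rather than the async WorkspaceClient itself. The WorkerRequestBuilder class, the default base URL, and the "command" field in the exec request body are all assumptions.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


class WorkerRequestBuilder:
    """Hypothetical helper that prepares requests for the worker HTTP API."""

    def __init__(self, base_url: str = "http://localhost:8081", token: str = ""):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _headers(self) -> dict:
        headers = {"Accept": "application/json"}
        if self.token:
            headers["X-Internal-Token"] = self.token
        return headers

    def _workspace_url(self, workspace_id: str, suffix: str) -> str:
        return f"{self.base_url}/workspaces/{quote(workspace_id, safe='')}/{suffix}"

    def list_files(self, workspace_id: str, path: str = ".") -> Request:
        url = self._workspace_url(workspace_id, "files") + "?" + urlencode({"path": path})
        return Request(url, headers=self._headers(), method="GET")

    def exec_command(self, workspace_id: str, command: str) -> Request:
        # The request body shape is assumed; only response fields are documented.
        body = json.dumps({"command": command}).encode()
        headers = {**self._headers(), "Content-Type": "application/json"}
        return Request(self._workspace_url(workspace_id, "exec"), data=body,
                       headers=headers, method="POST")
```

A prepared request can be sent with urllib.request.urlopen(); an async client would issue the same URLs and headers through its own transport.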
Task Submission Flow
Tasks can be submitted via two routes: direct API submission (/api/tasks/submit) or agent-initiated operations (GitHub clone, recipe step execution).
Task Submission Sequence
Sources: orchestrator/api/tasks.py:62-174, orchestrator/api/workspace_github.py:167-293
Redis Key Patterns
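The worker uses Redis for task coordination, status tracking, and real-time events. Keys that hold task or worker state carry a TTL to prevent unbounded growth. The exceptions are the events pub/sub channel, which stores nothing, and the per-workspace active_tasks set, which has no expiry.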
Task-Related Keys
| Key pattern | Type | TTL | Purpose |
| --- | --- | --- | --- |
| workspace:task:{task_id}:status | Hash | 7200s | Task status, worker_id, timestamps |
| workspace:task:{task_id}:result | String (JSON) | 3600s | Final execution result |
| workspace:task:{task_id}:events | Pub/Sub | N/A | Real-time event stream (SSE) |
| workspace:ws:{workspace_id}:active_tasks | Set | ∞ | Currently running tasks for workspace |
| workspace:worker:{worker_id}:heartbeat | String | 60s | Worker health timestamp |
| workspace:worker:{worker_id}:tasks | Set | 60s | Task IDs this worker is executing |
Queue Keys
| Queue key | Type | Priority | Description |
| --- | --- | --- | --- |
| workspace:tasks:critical | List | 1 (highest) | Critical-priority tasks |
| workspace:tasks:high | List | 2 | High-priority tasks |
| workspace:tasks:normal | List | 3 | Normal-priority tasks (default) |
| workspace:tasks:low | List | 4 (lowest) | Low-priority tasks |
Sources: services/workspace-worker/main.py:44-58
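Priority consumption over several Redis lists is often done with a single BLPOP that names the queues in priority order, because Redis checks the keys in the order given and pops from the first non-empty one. Whether the worker uses BLPOP or another command is an assumption here. The InMemoryQueues class is a stand-in that mimics only that ordering so the example runs without a Redis server, and tasks are assumed to be JSON payloads.

```python
import json

QUEUES_BY_PRIORITY = [
    "workspace:tasks:critical",
    "workspace:tasks:high",
    "workspace:tasks:normal",
    "workspace:tasks:low",
]


class InMemoryQueues:
    """Test double exposing the two list operations the example needs."""

    def __init__(self):
        self.lists = {key: [] for key in QUEUES_BY_PRIORITY}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)

    def blpop(self, keys, timeout=0):
        # Like Redis, scan keys in the order given; unlike Redis, never block.
        for key in keys:
            if self.lists.get(key):
                return key, self.lists[key].pop(0)
        return None


def next_task(client, timeout_s: int = 5):
    """Pop the highest-priority pending task, or return None when all queues are empty."""
    item = client.blpop(QUEUES_BY_PRIORITY, timeout=timeout_s)
    if item is None:
        return None
    _queue, payload = item
    return json.loads(payload)
```

With a real client such as redis-py, the same next_task() call works unchanged, since its blpop also accepts a list of keys and returns a (key, value) pair or None on timeout.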
Health Check and Heartbeat
The worker reports health via two mechanisms: an HTTP /health endpoint and periodic heartbeat updates to Redis.
Health Server
Sources: services/workspace-worker/main.py:516-523
Heartbeat Loop
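The worker updates Redis every 30 seconds with its health status and active task IDs. Both heartbeat keys expire after 60 seconds, so a worker that stops updating disappears from Redis within a minute.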
Sources: services/workspace-worker/main.py:440-459
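A heartbeat of this shape can be sketched as one coroutine that writes both keys and another that repeats it on a timer. The key names, the 30-second interval, and the 60-second TTL come from this page; the function names, the timestamp format, and the delete-then-add strategy for the task set are assumptions. The client is expected to offer awaitable set, delete, sadd, and expire methods, as an asyncio Redis client does.

```python
import asyncio
import time

HEARTBEAT_INTERVAL_S = 30
HEARTBEAT_TTL_S = 60


async def send_heartbeat(client, worker_id: str, task_ids) -> None:
    """Refresh the worker's heartbeat timestamp and its set of running task IDs."""
    heartbeat_key = f"workspace:worker:{worker_id}:heartbeat"
    tasks_key = f"workspace:worker:{worker_id}:tasks"
    await client.set(heartbeat_key, str(time.time()), ex=HEARTBEAT_TTL_S)
    # Rebuild the set on every beat so finished tasks do not linger.
    await client.delete(tasks_key)
    if task_ids:
        await client.sadd(tasks_key, *task_ids)
        await client.expire(tasks_key, HEARTBEAT_TTL_S)


async def heartbeat_loop(client, worker_id: str, get_task_ids, is_running) -> None:
    """Beat until is_running() returns False."""
    while is_running():
        await send_heartbeat(client, worker_id, get_task_ids())
        await asyncio.sleep(HEARTBEAT_INTERVAL_S)
```

Setting the TTL to twice the interval tolerates one delayed beat before the worker is considered gone.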
Graceful Shutdown
The worker handles SIGTERM and SIGINT by setting the _running flag to False, which stops the consume loop. It then waits for all active tasks to complete before closing connections.
Sources: services/workspace-worker/main.py:144-142
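The shutdown sequence described above (flip a flag on signal, stop consuming, wait for in-flight tasks) can be sketched with asyncio as follows. ShutdownController and its method names are hypothetical; only the _running flag and the SIGTERM/SIGINT handling come from the description.

```python
import asyncio
import signal


class ShutdownController:
    """Tracks in-flight tasks and a running flag that signal handlers can clear."""

    def __init__(self):
        self._running = True
        self.active = set()

    @property
    def running(self) -> bool:
        return self._running

    def request_stop(self) -> None:
        self._running = False

    def install_signal_handlers(self, loop: asyncio.AbstractEventLoop) -> None:
        # add_signal_handler is available on Unix event loops only.
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, self.request_stop)

    def track(self, coro) -> asyncio.Task:
        """Schedule a task and remember it until it finishes."""
        task = asyncio.ensure_future(coro)
        self.active.add(task)
        task.add_done_callback(self.active.discard)
        return task

    async def drain(self) -> None:
        """Wait for every tracked task; failures are collected, not raised."""
        if self.active:
            await asyncio.gather(*self.active, return_exceptions=True)
```

A consume loop would check controller.running before each dequeue, then call await controller.drain() and close its Redis and database connections once the loop exits.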
Sensitive Path Filtering
The HTTP server blocks access to sensitive files and directories that should never be exposed via file browsing or reading endpoints.
When listing directories or reading files, the server checks each component of the requested path and rejects the request if any component is sensitive.
Sources: services/workspace-worker/main.py:473-481, services/workspace-worker/main.py:545-548, services/workspace-worker/main.py:609-611
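A component-wise check like this can be sketched in a few lines. The set of sensitive names is not listed in this section, so the entries below are assumptions: .ssh and the .task_env_ prefix are chosen because credential injection writes to those locations, and the other names are typical examples. is_sensitive_path() is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Assumed names; the worker's actual list is not reproduced here.
SENSITIVE_NAMES = frozenset({".ssh", ".git-credentials", ".env"})
SENSITIVE_PREFIXES = (".task_env_",)


def is_sensitive_path(relative: str) -> bool:
    """Return True if any component of the path is a sensitive name."""
    for part in PurePosixPath(relative).parts:
        if part in SENSITIVE_NAMES or part.startswith(SENSITIVE_PREFIXES):
            return True
    return False
```

Checking every component, not just the final one, means a file nested under a sensitive directory is blocked even when its own name looks harmless.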
Docker Compose Integration
The workspace worker is orchestrated via docker-compose.yml with the workers profile. It mounts the workspace_data volume and exposes port 8081 for internal HTTP API access.
Sources: docker-compose.yml:1-282 (workspace-worker service definition)
Summary Table: Component Responsibilities
| Component | Responsibility |
| --- | --- |
| WorkspaceWorker | Main consumer loop, task orchestration, HTTP server |
| WorkspaceManager | Filesystem provisioning, path safety, quota enforcement |
| WorkspaceToolExecutor | Command execution, validation, sandboxing |
| WorkspaceClient | Async HTTP client for orchestrator → worker calls |
Sources: All files listed in table