Workspace Worker Architecture


The Workspace Worker is a standalone service that executes agent tasks in isolated filesystem environments. It operates independently from the orchestrator, consuming tasks from Redis queues and exposing an HTTP API for file operations. Each workspace gets its own persistent directory on a mounted volume, with sandboxed command execution, path safety validation, and storage quota enforcement.

For API endpoints that submit tasks to the worker, see Task Management. For GitHub repository integration via the worker, see GitHub Integration. For security policies and sandboxing details, see Security & Sandboxing.


Architecture Overview

The workspace worker implements a two-interface architecture: a Redis queue consumer for long-running background tasks, and an HTTP server for synchronous file operations.

Dual-Interface Design

Sources: services/workspace-worker/main.py:1-832, orchestrator/api/workspace_files.py:1-108, orchestrator/api/workspace_github.py:1-294


WorkspaceWorker Main Loop

The WorkspaceWorker class implements a priority-based task consumer with graceful shutdown, health reporting, and concurrent execution limits.
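
The concurrent execution limit can be illustrated with a semaphore. This is a minimal sketch, not the worker's actual code: `run_with_limit` and `handler` are hypothetical names, and only the default of 3 comes from the configuration table.

```python
import asyncio

WORKER_CONCURRENCY = 3  # default taken from the configuration table


async def run_with_limit(task_ids, handler, limit=WORKER_CONCURRENCY):
    """Run handler(task_id) for every task, never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(task_id):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await handler(task_id)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(guarded(t) for t in task_ids))
```

A semaphore keeps the limit independent of how tasks are fetched, so the same guard would work whether tasks come from Redis or from a test fixture.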

Task Consumer Architecture

Sources: services/workspace-worker/main.py:60-143, services/workspace-worker/main.py:149-204

Key Configuration

| Environment Variable | Default | Purpose |
|---|---|---|
| REDIS_URL | redis://localhost:6379/0 | Redis connection string for task queues |
| WORKER_CONCURRENCY | 3 | Maximum concurrent task executions |
| WORKER_HEALTH_PORT | 8081 | HTTP server port for health checks and file API |
| WORKSPACE_VOLUME_PATH | //workspaces | Mount path for persistent workspace volume |
| WORKSPACE_DEFAULT_QUOTA_GB | 5 | Default storage quota per workspace (GB) |
| WORKER_INTERNAL_TOKEN | (empty) | Auth token for internal API calls (optional) |
| WORKER_BIND_HOST | 0.0.0.0 | HTTP server bind address |
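
One way to read these settings is sketched below. The variable names and defaults are copied from the table; the `WorkerConfig` dataclass and `load_config` function are hypothetical and may not match how the worker actually parses its environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:  # hypothetical container, not from the source
    redis_url: str
    concurrency: int
    health_port: int
    volume_path: str
    default_quota_gb: float
    internal_token: str
    bind_host: str


def load_config(env=None):
    """Build a config from environment variables, falling back to documented defaults."""
    env = os.environ if env is None else env
    return WorkerConfig(
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        concurrency=int(env.get("WORKER_CONCURRENCY", "3")),
        health_port=int(env.get("WORKER_HEALTH_PORT", "8081")),
        volume_path=env.get("WORKSPACE_VOLUME_PATH", "//workspaces"),
        default_quota_gb=float(env.get("WORKSPACE_DEFAULT_QUOTA_GB", "5")),
        internal_token=env.get("WORKER_INTERNAL_TOKEN", ""),
        bind_host=env.get("WORKER_BIND_HOST", "0.0.0.0"),
    )
```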

Sources: services/workspace-worker/main.py:16-19, services/workspace-worker/workspace_manager.py:32-33


Task Execution Lifecycle

Tasks move through a state machine: queued → running → completed / failed / timed_out / cancelled. The worker updates both Redis (for real-time tracking) and PostgreSQL (for persistent history) at each transition.
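
The lifecycle can be expressed as a small transition table. The status values are the ones named above; the `TaskStatus` enum, the `ALLOWED` map, and `transition` are illustrative names, and the sketch assumes that the four final states are terminal.

```python
from enum import Enum


class TaskStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    CANCELLED = "cancelled"


TERMINAL = {
    TaskStatus.COMPLETED,
    TaskStatus.FAILED,
    TaskStatus.TIMED_OUT,
    TaskStatus.CANCELLED,
}

# Assumption: terminal states have no outgoing transitions.
ALLOWED = {
    TaskStatus.QUEUED: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: TERMINAL,
}


def transition(current, new):
    """Return `new` if the move is legal, otherwise raise ValueError."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```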

Task Execution Flow

Sources: services/workspace-worker/main.py:205-358, services/workspace-worker/main.py:363-416


WorkspaceManager: Filesystem Isolation

The WorkspaceManager class provides workspace provisioning, path safety, credential injection, and quota enforcement. Each workspace is a self-contained directory tree on the persistent volume.

Workspace Directory Structure
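
A minimal provisioning sketch follows. The directory names `repos/`, `tasks/`, `artifacts/` and the `.workspace_meta.json` file are taken from the methods table; placing each workspace at `<volume>/<workspace_id>` and the metadata contents are assumptions made for illustration.

```python
import json
from pathlib import Path

SUBDIRS = ("repos", "tasks", "artifacts")
META_FILE = ".workspace_meta.json"


def ensure_workspace_exists(volume_root, workspace_id):
    """Create the workspace tree if missing and return its root path."""
    # Assumption: one directory per workspace directly under the volume.
    root = Path(volume_root) / workspace_id
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    meta = root / META_FILE
    if not meta.exists():
        # Hypothetical metadata payload; the real schema is not shown here.
        meta.write_text(json.dumps({"workspace_id": workspace_id}))
    return root
```

Because every step is idempotent, calling the function on each task start is safe and only does work on first use.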

Sources: services/workspace-worker/workspace_manager.py:10-18

WorkspaceManager Methods

| Method | Purpose | Implementation / security notes |
|---|---|---|
| ensure_workspace_exists() | Create directory tree on first use | Creates repos/, tasks/, artifacts/ + .workspace_meta.json |
| check_quota() | Verify storage under WORKSPACE_DEFAULT_QUOTA_GB | Walks entire tree, returns False if over quota |
| create_task_dir(task_id) | Create ephemeral dir under tasks/ | Returns tasks/task_{task_id}/ path |
| cleanup_task(task_id) | Remove ephemeral task dir + credentials | shutil.rmtree() task dir + .task_env_{task_id} |
| inject_credentials(task_id, creds) | Write SSH key, git config, env vars | .ssh/id_ed25519 with chmod 600 |
| resolve_safe_path(relative) | Validate path stays within workspace | Blocks ../, symlinks, absolute paths, null bytes |
| get_repo_path(name) | Get path for a cached repo | Returns repos/{sanitized_name}/ |
| list_repos() | List all cloned repos | Returns directory names from repos/ |
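
The quota check described in the table (walk the tree, return False when over quota) might look like the sketch below. It assumes 1 GB means 1024³ bytes and that symlinks are not followed; neither detail is confirmed by the source.

```python
import os


def directory_size_bytes(root):
    """Sum file sizes under root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat() measures a symlink itself rather than its target.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def check_quota(root, quota_gb=5):
    """Return False if the workspace exceeds its quota, True otherwise."""
    return directory_size_bytes(root) <= quota_gb * 1024 ** 3
```

Walking the whole tree costs time proportional to the number of files, so a real implementation may want to cache the result between checks.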

Sources: services/workspace-worker/workspace_manager.py:36-303

Path Safety Validation

The resolve_safe_path() method is the security boundary for all file operations. It prevents directory traversal attacks by ensuring resolved paths stay within the workspace root.
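
The containment check can be sketched with `pathlib`. This is an illustration of the technique rather than the method's actual body: `UnsafePathError` is a hypothetical exception name, and the function takes the workspace root as an explicit argument for self-containment.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a requested path would leave the workspace (hypothetical name)."""


def resolve_safe_path(workspace_root, relative):
    if "\x00" in relative:
        raise UnsafePathError("null byte in path")
    if Path(relative).is_absolute():
        raise UnsafePathError("absolute paths are not allowed")
    root = Path(workspace_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees the real destination of the path.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise UnsafePathError("path escapes the workspace root")
    return candidate
```

Resolving before comparing is the important step: a purely textual check for `../` would miss a symlink inside the workspace that points outside it.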

Sources: services/workspace-worker/workspace_manager.py:228-253


WorkspaceToolExecutor: Sandboxed Execution

The WorkspaceToolExecutor class enforces command whitelisting, output limits, timeouts, and environment sandboxing for all shell commands executed in the workspace.
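
Timeouts and output limits can be sketched with `subprocess.run`. The names `run_limited` and `MAX_OUTPUT_BYTES`, the 64 KiB cap, and the `-1` exit code on timeout are all assumptions; the result keys mirror the fields listed for the `/exec` endpoint.

```python
import subprocess
import sys
import time

MAX_OUTPUT_BYTES = 64 * 1024  # hypothetical cap, not from the source


def run_limited(argv, cwd, timeout_s=30.0, max_output=MAX_OUTPUT_BYTES, env=None):
    """Run argv without a shell, enforcing a timeout and an output cap."""
    started = time.monotonic()
    try:
        proc = subprocess.run(
            argv, cwd=cwd, env=env, capture_output=True, timeout=timeout_s, check=False
        )
        exit_code, out, err = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; keep whatever output was captured.
        exit_code, out, err = -1, exc.stdout or b"", exc.stderr or b""
    return {
        "exit_code": exit_code,
        "stdout": out[:max_output].decode("utf-8", "replace"),
        "stderr": err[:max_output].decode("utf-8", "replace"),
        "duration_ms": int((time.monotonic() - started) * 1000),
        "truncated": len(out) > max_output or len(err) > max_output,
    }
```

Note that this sketch truncates after capture, so it bounds the response size but not the memory used while the command runs; a streaming reader would be needed for that.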

Command Execution Flow

Sources: services/workspace-worker/executor.py:122-224, services/workspace-worker/executor.py:448-500

Command Whitelist

The ALLOWED_COMMANDS set contains only approved binaries. Any command not in this set is rejected before execution.

Selected Whitelisted Commands:

| Category | Commands |
|---|---|
| Version Control | git |
| Python | python, python3, pip, pip3, uv, pytest, ruff, black, mypy, isort, flake8, coverage |
| Node.js | node, npm, npx, pnpm, yarn, vitest, jest, tsc, eslint, prettier |
| File Operations | ls, cat, grep, find, tree, wc, sort, head, tail, diff, jq, sed, awk |
| System Tools | curl, wget, make, cmake, tar, gzip, zip, touch, mkdir, cp, mv, rm |
| Other Runtimes | cargo, go, ruby, java, mvn, gradle, rustc, gcc |
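
A whitelist check of this kind could be sketched as follows. The set below is only a subset of the commands in the table, and `is_whitelisted` is a hypothetical helper; how the real executor tokenizes commands is not shown in the source.

```python
import shlex

# Subset of the documented whitelist, for illustration only.
ALLOWED_COMMANDS = {"git", "python", "python3", "pip", "pytest", "node", "npm", "ls", "cat", "grep"}


def is_whitelisted(command):
    """Return True only if the first token is an approved binary name."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not tokens:
        return False
    # Exact match on the bare name: "/tmp/git" is not treated as "git".
    return tokens[0] in ALLOWED_COMMANDS
```

Matching the bare name rather than a basename avoids accepting a look-alike binary placed at an attacker-chosen path.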

Sources: services/workspace-worker/executor.py:35-73

Blocked Patterns

Even if a binary is whitelisted, any command that matches one of the executor's blocked regex patterns is always rejected.
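
The second validation layer can be sketched with compiled regular expressions. The three patterns below are hypothetical examples chosen for illustration and are not the executor's actual blocked list.

```python
import re

# Hypothetical patterns; the real list lives in executor.py.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-\w*[rR]\w*\s+/(?:\s|$)"),  # recursive delete of the filesystem root
    re.compile(r"\|\s*(?:sh|bash)\b"),              # piping output into a shell
    re.compile(r"\bsudo\b"),                        # privilege escalation
]


def find_blocked_pattern(command):
    """Return the first matching pattern string, or None if the command is clean."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return pattern.pattern
    return None
```

Returning the matched pattern rather than a boolean makes rejection messages easier to explain to the caller.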

Sources: services/workspace-worker/executor.py:76-95

Sandboxed Environment

The _build_sandboxed_env() method strips the host environment and passes only essential variables to the child process.
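
The idea is to build the child environment from scratch instead of copying `os.environ`. In the sketch below the chosen variables (`PATH`, `HOME`, `LANG`) and their values are assumptions; the source does not list which variables the method keeps.

```python
def build_sandboxed_env(workspace_root, extra=None):
    """Return a minimal environment that inherits nothing from the host."""
    env = {
        # Hypothetical allow-list of variables and values.
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        "HOME": str(workspace_root),
        "LANG": "C.UTF-8",
    }
    if extra:
        # Per-task variables, e.g. injected credentials, are added explicitly.
        env.update(extra)
    return env
```

Passing the result as `env=` to `subprocess.run` means host secrets such as database URLs or API keys never reach the executed command.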

Sources: services/workspace-worker/executor.py:506-536


HTTP API for File Operations

The worker exposes an HTTP server on port 8081 (configurable via WORKER_HEALTH_PORT) for synchronous file operations. The orchestrator proxies requests to this API via the WorkspaceClient.

HTTP Endpoints

Sources: services/workspace-worker/main.py:461-818

Endpoint Details

| Endpoint | Method | Purpose | Returns |
|---|---|---|---|
| /health | GET | Health check | {status, worker_id, active_tasks, concurrency, volume_path} |
| /workspaces/{id}/files | GET | List directory (?path=.) | {path, entries: [{name, type, size, modified_at}], truncated} |
| /workspaces/{id}/files/content | GET | Read file content (?path=file.py) | {path, name, content, size, language, mime_type} |
| /workspaces/{id}/exec | POST | Execute command | {exit_code, stdout, stderr, duration_ms, truncated} |
| /workspaces/{id}/files/write | POST | Write file | {written, path, size_bytes} |
| /workspaces/{id}/files/grep | GET | Search pattern (?pattern=TODO) | {matches: [{file, line, content}], total, truncated} |
| /workspaces/{id}/git | POST | Git operation | {exit_code, stdout, stderr, duration_ms} |

Sources: services/workspace-worker/main.py:516-818

Internal Authentication

If WORKER_INTERNAL_TOKEN is set, the HTTP server requires an X-Internal-Token header on all non-health requests. The orchestrator includes this token via WorkspaceClient._get_client().
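
The rule above can be captured in a small predicate. `is_authorized` is a hypothetical helper, and the constant-time comparison via `hmac.compare_digest` is a suggested hardening rather than something the source confirms.

```python
import hmac


def is_authorized(path, headers, expected_token):
    """Decide whether a request may proceed under the internal-token rule."""
    if path == "/health":
        return True  # health checks are always open
    if not expected_token:
        return True  # WORKER_INTERNAL_TOKEN unset: authentication disabled
    supplied = headers.get("X-Internal-Token", "")
    # Constant-time comparison avoids leaking token prefixes through timing.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```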

Sources: services/workspace-worker/main.py:501-512, orchestrator/core/workspace_client.py:28-39


Integration with Orchestrator

The orchestrator interacts with the workspace worker through two channels: task submission (Redis) and file operations (HTTP).

Orchestrator-Worker Communication

Sources: orchestrator/core/workspace_client.py:56-176, orchestrator/api/tasks.py:62-174

WorkspaceClient Methods

The WorkspaceClient class provides an async HTTP client interface to the worker's file operations.

Sources: orchestrator/core/workspace_client.py:56-185


Task Submission Flow

Tasks can be submitted via two routes: direct API submission (/api/tasks/submit) or agent-initiated operations (GitHub clone, recipe step execution).

Task Submission Sequence

Sources: orchestrator/api/tasks.py:62-174, orchestrator/api/workspace_github.py:167-293


Redis Key Patterns

The worker uses Redis for task coordination, status tracking, and real-time events. Stored keys are given a TTL to prevent unbounded growth; Pub/Sub channels hold no data, so no TTL applies to them.

| Key Pattern | Type | TTL | Purpose |
|---|---|---|---|
| workspace:task:{task_id}:status | Hash | 7200s | Task status, worker_id, timestamps |
| workspace:task:{task_id}:result | String (JSON) | 3600s | Final execution result |
| workspace:task:{task_id}:events | Pub/Sub | N/A | Real-time event stream (SSE) |
| workspace:ws:{workspace_id}:active_tasks | Set | | Currently running tasks for workspace |
| workspace:worker:{worker_id}:heartbeat | String | 60s | Worker health timestamp |
| workspace:worker:{worker_id}:tasks | Set | 60s | Task IDs this worker is executing |

Queue Keys

| Key | Type | Priority | Purpose |
|---|---|---|---|
| workspace:tasks:critical | List | 1 (highest) | Critical-priority tasks |
| workspace:tasks:high | List | 2 | High-priority tasks |
| workspace:tasks:normal | List | 3 | Normal-priority tasks (default) |
| workspace:tasks:low | List | 4 (lowest) | Low-priority tasks |
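
The priority order can be sketched with in-memory queues standing in for the Redis lists. The key names are the documented ones; `enqueue`, `pop_next`, and the JSON payload shape are hypothetical, and the sketch does not claim to reproduce the Redis commands the worker uses.

```python
import json
from collections import deque

# Documented queue keys, highest priority first.
QUEUE_KEYS = [
    "workspace:tasks:critical",
    "workspace:tasks:high",
    "workspace:tasks:normal",
    "workspace:tasks:low",
]


def queue_key_for(priority="normal"):
    key = f"workspace:tasks:{priority}"
    if key not in QUEUE_KEYS:
        raise ValueError(f"unknown priority: {priority}")
    return key


def enqueue(queues, task, priority="normal"):
    """Append a JSON-encoded task to the queue for its priority."""
    queues.setdefault(queue_key_for(priority), deque()).append(json.dumps(task))


def pop_next(queues):
    """Take the oldest task from the highest-priority non-empty queue."""
    for key in QUEUE_KEYS:
        queue = queues.get(key)
        if queue:
            return key, json.loads(queue.popleft())
    return None
```

Scanning the keys in a fixed order means a steady stream of critical tasks can starve lower queues, which is the usual trade-off of strict priority scheduling.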

Sources: services/workspace-worker/main.py:44-58


Health Check and Heartbeat

The worker reports health via two mechanisms: an HTTP /health endpoint and periodic heartbeat updates to Redis.

Health Server

Sources: services/workspace-worker/main.py:516-523

Heartbeat Loop

The worker updates Redis every 30 seconds with its health status and active task IDs. Both heartbeat keys carry a 60-second TTL, so they lapse if the worker stops refreshing them.
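
A heartbeat loop of this shape can be sketched with asyncio. The key names, the 30-second interval, and the 60-second TTL come from this page; the `write(key, value, ttl)` callback is a hypothetical stand-in for the Redis client, and `heartbeat_loop` is not the worker's actual function name.

```python
import asyncio
import time

HEARTBEAT_INTERVAL_S = 30.0
HEARTBEAT_TTL_S = 60


async def heartbeat_loop(write, worker_id, get_active, stop, interval_s=HEARTBEAT_INTERVAL_S):
    """Publish a timestamp and the active task IDs until `stop` is set."""
    while not stop.is_set():
        write(f"workspace:worker:{worker_id}:heartbeat", str(time.time()), HEARTBEAT_TTL_S)
        write(f"workspace:worker:{worker_id}:tasks", sorted(get_active()), HEARTBEAT_TTL_S)
        try:
            # Sleep for one interval, but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Waiting on the stop event instead of a plain sleep lets shutdown finish without a delay of up to one full interval.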

Sources: services/workspace-worker/main.py:440-459


Graceful Shutdown

The worker handles SIGTERM and SIGINT by setting the _running flag to False, which stops the consume loop. It then waits for all active tasks to complete before closing connections.
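
The shutdown sequence can be sketched as follows. Only the `_running` flag is named in the text above; the `ShutdownController` class and its methods are hypothetical, and `loop.add_signal_handler` is available on Unix event loops only.

```python
import asyncio
import signal


class ShutdownController:  # hypothetical helper, not the worker's class
    def __init__(self):
        self._running = True
        self.active = set()  # asyncio.Task objects for in-flight work

    @property
    def running(self):
        return self._running

    def request_stop(self, *_args):
        # The consume loop checks this flag before fetching another task.
        self._running = False

    def install(self, loop):
        """Register SIGTERM and SIGINT handlers (Unix event loops only)."""
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, self.request_stop)

    async def drain(self):
        """Wait for every in-flight task before connections are closed."""
        if self.active:
            await asyncio.gather(*self.active, return_exceptions=True)
```

Using `return_exceptions=True` ensures one failing task does not cut the drain short and leave others abandoned.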

Sources: services/workspace-worker/main.py:144-142


Sensitive Path Filtering

The HTTP server blocks access to sensitive files and directories that should never be exposed via file browsing or reading endpoints.

When listing directories or reading files, the server checks each component of the requested path and rejects the request if any component is sensitive.
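
A component-wise check might look like the sketch below. The `.ssh` directory and the `.task_env_` prefix appear in the WorkspaceManager methods table; the other entries and the `is_sensitive` helper are assumptions, since the server's actual list is not reproduced here.

```python
from pathlib import PurePosixPath

# ".ssh" and ".task_env_" come from this page; the rest are hypothetical.
SENSITIVE_NAMES = {".ssh", ".git-credentials", ".env"}
SENSITIVE_PREFIXES = (".task_env_",)


def is_sensitive(relative_path):
    """Return True if any component of the path is on the sensitive list."""
    for part in PurePosixPath(relative_path).parts:
        if part in SENSITIVE_NAMES or part.startswith(SENSITIVE_PREFIXES):
            return True
    return False
```

Checking every component, not just the final name, blocks requests such as `.ssh/id_ed25519` where the sensitive part is a parent directory.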

Sources: services/workspace-worker/main.py:473-481, services/workspace-worker/main.py:545-548, services/workspace-worker/main.py:609-611


Docker Compose Integration

The workspace worker is defined as a service in docker-compose.yml under the workers profile. It mounts the workspace_data volume and exposes port 8081 for internal HTTP API access.

Sources: docker-compose.yml:1-282 (workspace-worker service definition)


Summary Table: Component Responsibilities

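| Component | File | Responsibility |
|---|---|---|
| WorkspaceWorker | services/workspace-worker/main.py | Main consumer loop, task orchestration, HTTP server |
| WorkspaceManager | services/workspace-worker/workspace_manager.py | Filesystem provisioning, path safety, quota enforcement |
| WorkspaceToolExecutor | services/workspace-worker/executor.py | Command execution, validation, sandboxing |
| WorkspaceClient | orchestrator/core/workspace_client.py | Async HTTP client for orchestrator → worker calls |
| Task API | orchestrator/api/tasks.py | Task submission, listing, cancellation, SSE streaming |
| Files API | orchestrator/api/workspace_files.py | Proxy endpoints for file browsing from frontend |
| GitHub API | orchestrator/api/workspace_github.py | Repository listing and cloning via Composio |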

Sources: All files listed in table

