GitHub Integration


Purpose and Scope

The GitHub Integration subsystem lets agents and users browse GitHub repositories and clone them directly into workspace directories, authenticating through Composio's OAuth connection. Agents can then read source code, run tests, and modify repositories inside the sandboxed workspace environment.

This document covers repository listing, authenticated cloning, and security validation. For general workspace file operations once a repository is cloned, see File Operations. For executing commands within cloned repositories, see Command Execution.


Architecture Overview

GitHub integration follows a three-tier architecture: the frontend initiates requests, the orchestrator validates and queues them, and the workspace worker executes the actual git operations with injected OAuth credentials.

System Flow Diagram


Sources: orchestrator/api/workspace_github.py:1-294, services/workspace-worker/main.py:227-358, services/workspace-worker/executor.py:368-419


Repository Browsing

Repository listing is proxied through Composio's GitHub integration, which returns the repositories the authenticated user can access under the granted OAuth scopes.

Composio Entity Resolution

Before making any GitHub API calls, the system resolves the workspace's Composio entity_id via the EntityManager.

The _get_entity_id helper function (orchestrator/api/workspace_github.py:47-58) raises HTTP 404 if the workspace has not connected GitHub via Composio, so the request fails early with the message "No Composio entity found for this workspace. Connect GitHub first."
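The early-failure behaviour can be sketched as follows. This is a minimal illustration, not the project's code: the `HTTPError` class stands in for whatever exception the web framework provides, and the in-memory `EntityManager` with its `get_entity_id` method is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class HTTPError(Exception):
    """Hypothetical stand-in for a web framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class EntityManager:
    """Hypothetical in-memory lookup; the real EntityManager is not shown here."""

    entities: dict

    def get_entity_id(self, workspace_id: str) -> Optional[str]:
        return self.entities.get(workspace_id)


def get_entity_id(manager: EntityManager, workspace_id: str) -> str:
    """Return the workspace's Composio entity_id, or fail early with HTTP 404."""
    entity_id = manager.get_entity_id(workspace_id)
    if not entity_id:
        raise HTTPError(
            404,
            "No Composio entity found for this workspace. Connect GitHub first.",
        )
    return entity_id
```

Resolving the entity before any outbound call keeps the "GitHub not connected" case separate from upstream GitHub API errors.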

Sources: orchestrator/api/workspace_github.py:47-58, orchestrator/core/composio/entity_manager.py

List Repositories Endpoint

Endpoint: GET /api/workspaces/{workspace_id}/github/repos

Query Parameters:

  • page (default: 1, min: 1)

  • per_page (default: 30, min: 1, max: 100)
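A client can apply the same bounds before sending a request. The sketch below builds the listing URL with the standard library; the function name and the `base_url` value are illustrative assumptions, and only the path and parameter limits come from this page.

```python
from urllib.parse import urlencode


def build_list_repos_url(base_url: str, workspace_id: str,
                         page: int = 1, per_page: int = 30) -> str:
    """Build the repository listing URL, enforcing the documented bounds."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    query = urlencode({"page": page, "per_page": per_page})
    return f"{base_url}/api/workspaces/{workspace_id}/github/repos?{query}"
```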

Response Structure:

| Field | Type | Description |
|---|---|---|
| repos | array | List of repository objects |
| page | int | Current page number |
| per_page | int | Items per page |

Repository Object Fields:

| Field | Type | Description |
|---|---|---|
| name | string | Repository short name |
| full_name | string | Owner/repo format |
| url | string | Clone URL (HTTPS) |
| description | string | Repository description |
| default_branch | string | Default branch (e.g., "main") |
| private | boolean | Private repository flag |
| language | string | Primary language |
| updated_at | string | Last update timestamp (ISO 8601) |

The implementation (orchestrator/api/workspace_github.py:97-161) handles Composio's nested response structure: it extracts the repository list from result.data.data.repositories and normalizes it into a flat array.
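A minimal sketch of that normalization step is shown below. The output keys follow the Repository Object Fields table. The input key names, in particular `clone_url` as the source of `url`, are assumptions based on GitHub's REST repository objects, not something this page confirms.

```python
from typing import Any


def normalize_repos(result: dict) -> list:
    """Flatten a nested Composio-style response into documented repo objects."""
    # Walk result.data.data.repositories, tolerating missing levels.
    inner: dict = (result.get("data") or {}).get("data") or {}
    raw_repos: list = inner.get("repositories") or []
    normalized: list = []
    for repo in raw_repos:
        entry: dict = {
            "name": repo.get("name"),
            "full_name": repo.get("full_name"),
            # Assumption: the HTTPS clone URL arrives under GitHub's `clone_url` key.
            "url": repo.get("clone_url"),
            "description": repo.get("description"),
            "default_branch": repo.get("default_branch"),
            "private": repo.get("private", False),
            "language": repo.get("language"),
            "updated_at": repo.get("updated_at"),
        }
        normalized.append(entry)
    return normalized
```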

Sources: orchestrator/api/workspace_github.py:97-161


Repository Cloning

Repository cloning is implemented as an asynchronous task executed by the workspace worker with OAuth token injection for private repository access.

Clone Request Validation

The CloneRequest model (orchestrator/api/workspace_github.py:65-92) validates the repository URL and the branch name via Pydantic validators.

URL Validation

Allowed Hosts: github.com, gitlab.com, bitbucket.org

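The validate_repo_url validator (orchestrator/api/workspace_github.py:69-79) enforces three rules: the URL must use HTTPS, its host must be in the allowlist, and it must not contain embedded credentials.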
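The URL checks can be approximated with the standard library alone. This is a sketch under assumptions: the real validator is a Pydantic validator whose body is not reproduced here, and the error messages are illustrative. Only the `_ALLOWED_CLONE_HOSTS` name and the host list come from this page.

```python
from urllib.parse import urlparse

_ALLOWED_CLONE_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def validate_repo_url(url: str) -> str:
    """Accept only credential-free HTTPS URLs on an allowlisted host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("repo_url must use HTTPS")
    # user:password@host forms are rejected so secrets never arrive in requests.
    if parsed.username is not None or parsed.password is not None:
        raise ValueError("repo_url must not contain embedded credentials")
    if parsed.hostname not in _ALLOWED_CLONE_HOSTS:
        raise ValueError(f"host not allowed: {parsed.hostname}")
    return url
```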

Branch Name Validation

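The validate_branch method (orchestrator/api/workspace_github.py:81-91) prevents injection attacks via branch names. A branch name may contain only characters matching [A-Za-z0-9._/\-]+ and must not contain .. or @{.

This blocks values such as --upload-pack=/tmp/malicious from reaching git as command-line options.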
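The branch rules can be sketched as below, following the pattern and the two substring checks listed under Branch Name Security. The function body and error messages are assumptions. Whether the real validator applies further rules, such as rejecting a leading hyphen, is not stated on this page, so the sketch follows the documented pattern literally.

```python
import re

# Character class as documented: alphanumerics plus ".", "_", "/" and "-".
_BRANCH_PATTERN = re.compile(r"[A-Za-z0-9._/\-]+")


def validate_branch(branch: str) -> str:
    """Reject branch names that could act as git options or escape paths."""
    if not _BRANCH_PATTERN.fullmatch(branch):
        raise ValueError("branch contains disallowed characters")
    if ".." in branch:
        raise ValueError("branch must not contain '..'")
    # Already excluded by the character class; kept to mirror the documented checks.
    if "@{" in branch:
        raise ValueError("branch must not contain '@{'")
    return branch
```

In this sketch, --upload-pack=/tmp/malicious is rejected because `=` falls outside the allowed character class.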

Sources: orchestrator/api/workspace_github.py:65-92

OAuth Token Injection

For private repositories, the clone endpoint attempts to retrieve the user's GitHub OAuth token from Composio and inject it into the HTTPS clone URL.

The token injection code (orchestrator/api/workspace_github.py:193-211) modifies the clone URL before task submission.

If token retrieval fails (e.g., OAuth not connected), the system logs a warning and proceeds with unauthenticated cloning, which works for public repositories.
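The inject-or-fall-back behaviour might look like the sketch below. The `fetch_token` callable is a hypothetical stand-in for the Composio lookup, and the `x-access-token:<token>@host` form is an assumption: it is one form commonly used for token-authenticated HTTPS git access, but the exact URL shape the orchestrator produces is not shown on this page.

```python
import logging
from typing import Callable
from urllib.parse import urlsplit, urlunsplit

logger = logging.getLogger(__name__)


def inject_token(repo_url: str, fetch_token: Callable[[], str]) -> str:
    """Return repo_url with the OAuth token embedded, or unchanged on failure."""
    try:
        token = fetch_token()
    except Exception as exc:  # e.g. GitHub not connected via Composio
        logger.warning("Token retrieval failed, cloning unauthenticated: %s", exc)
        return repo_url
    if not token:
        return repo_url
    parts = urlsplit(repo_url)
    # Assumption: token carried as the password of the `x-access-token` user.
    # The port is dropped, which is fine for the allowlisted hosts.
    netloc = f"x-access-token:{token}@{parts.hostname}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Because the resulting URL carries a secret, it should be kept out of logs and error messages.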

Sources: orchestrator/api/workspace_github.py:193-211

Clone Endpoint Implementation

Endpoint: POST /api/workspaces/{workspace_id}/github/clone

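Request Body: a JSON object with repo_url (required) and branch (optional); validation rules are listed under API Reference.

Response: a JSON object that includes the task_id of the queued clone task.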

The clone operation is not executed synchronously. Instead, it creates a background task and returns immediately, allowing the frontend to poll or stream events via SSE.
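From a client's point of view, submitting a clone amounts to one POST. The sketch below only constructs the request object and does not send it. The function name, the `base_url` value, and the `Authorization: Bearer` header form are assumptions; the endpoint path, body fields, and X-Workspace-ID header come from this page.

```python
import json
import urllib.request
from typing import Optional


def build_clone_request(base_url: str, workspace_id: str, jwt: str,
                        repo_url: str,
                        branch: Optional[str] = None) -> urllib.request.Request:
    """Construct (without sending) the POST request that queues a clone task."""
    body = {"repo_url": repo_url}
    if branch is not None:
        body["branch"] = branch
    return urllib.request.Request(
        url=f"{base_url}/api/workspaces/{workspace_id}/github/clone",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the JWT is sent as a bearer token.
            "Authorization": f"Bearer {jwt}",
            "X-Workspace-ID": workspace_id,
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The caller then reads task_id from the JSON response and polls or streams the task rather than waiting on this call.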

Sources: orchestrator/api/workspace_github.py:167-293


Task Submission Flow

Clone requests follow an atomic two-phase submission pattern to prevent race conditions between the database and Redis queue.

Atomic Task Submission Sequence


Critical Ordering: The database row is inserted before the Redis enqueue (orchestrator/api/workspace_github.py:238-256). This prevents a race condition where a worker picks up a task that has no corresponding database record.

If the Redis enqueue fails after the database commit, the endpoint marks the task as failed in the database and returns HTTP 503 (orchestrator/api/workspace_github.py:275-282).
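The ordering and the compensation step can be illustrated with in-memory stand-ins. In this sketch a plain dict plays the role of the task table and any callable plays the role of the Redis enqueue. The payload field names and the "queued"/"failed" status strings are hypothetical; only the git_clone action name, the 503 status, and the error message come from this page.

```python
import uuid
from typing import Callable, Optional


class HTTPError(Exception):
    """Hypothetical stand-in for a web framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def submit_clone_task(db: dict, enqueue: Callable[[dict], None],
                      workspace_id: str, repo_url: str,
                      branch: Optional[str] = None) -> str:
    """Record the task first, then enqueue it; mark it failed if enqueue breaks."""
    task_id = str(uuid.uuid4())
    payload = {  # hypothetical field names
        "task_id": task_id,
        "workspace_id": workspace_id,
        "action": "git_clone",
        "params": {"repo_url": repo_url, "branch": branch},
    }
    # Phase 1: the database row exists before any worker can see the task.
    db[task_id] = {"status": "queued", "error_message": None}
    # Phase 2: hand the task to the queue.
    try:
        enqueue(payload)
    except Exception as exc:
        # Compensate: the row must not stay "queued" forever.
        db[task_id]["status"] = "failed"
        db[task_id]["error_message"] = str(exc)
        raise HTTPError(503, "Failed to enqueue task to worker") from exc
    return task_id
```

With the reverse order, a fast worker could dequeue a task whose row has not been committed yet, which is the race the ordering avoids.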

Sources: orchestrator/api/workspace_github.py:236-283

Task Payload Structure

The clone task payload (orchestrator/api/workspace_github.py:216-234) follows the standard workspace task format.

The OAuth token (if retrieved) is embedded directly in the repo URL, not in a separate credentials field. This allows the worker to clone the repository without additional Composio API calls.

Sources: orchestrator/api/workspace_github.py:216-234


Worker-Side Execution

The workspace worker executes the git_clone action, re-validating the branch name and reusing previously cloned repositories where possible.

Git Clone Implementation

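The _git_clone method (services/workspace-worker/executor.py:368-419) validates the branch name, checks whether the repository already exists in the workspace, and then runs git pull on the existing copy or git clone for a new one.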

Sources: services/workspace-worker/executor.py:368-419

PRD-70 Security Fix: Argument Injection Prevention

The clone implementation includes a critical security fix documented in PRD-70 FIX-01. The vulnerability was that unvalidated branch names could be used to inject git arguments like --upload-pack=/tmp/malicious.

Defense-in-Depth Strategy:

  1. Orchestrator-side validation (orchestrator/api/workspace_github.py:81-91): rejects invalid branch names before task submission

  2. Worker-side validation (services/workspace-worker/executor.py:380-386): repeats the check in case orchestrator validation is bypassed

  3. Git separator -- (services/workspace-worker/executor.py:406): marks the end of options, so git treats all subsequent arguments as positional
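Layers 2 and 3 can be sketched together as a function that builds the git argument vector. The function name and argument layout are assumptions; the actual flags passed by _git_clone are not listed on this page. The sketch only shows where the redundant check and the `--` separator sit.

```python
import re
from typing import List, Optional

_BRANCH_PATTERN = re.compile(r"[A-Za-z0-9._/\-]+")


def build_clone_argv(repo_url: str, dest: str,
                     branch: Optional[str] = None) -> List[str]:
    """Build a `git clone` argv with a re-validated branch and a `--` separator."""
    argv = ["git", "clone"]
    if branch is not None:
        # Worker-side re-check: never trust that the orchestrator validated it.
        if (not _BRANCH_PATTERN.fullmatch(branch)
                or ".." in branch or "@{" in branch):
            raise ValueError(f"invalid branch name: {branch!r}")
        argv += ["--branch", branch]
    # `--` ends option parsing, so the URL and destination stay positional.
    argv += ["--", repo_url, dest]
    return argv
```

Passing the result as a list to `subprocess.run` (no shell) also avoids shell interpretation of any argument.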

Sources: services/workspace-worker/executor.py:368-419

Repository Caching Strategy

Cloned repositories are stored persistently in the workspace's repos/ directory (services/workspace-worker/workspace_manager.py:259-262).

If a repository with the same name already exists, the worker automatically switches from git clone to git pull (services/workspace-worker/executor.py:395-398), updating the cached repository instead of re-cloning. This optimization reduces bandwidth and speeds up repeated executions.
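The clone-or-pull decision can be sketched as below. Deriving the directory name from the last URL path segment (minus `.git`) is an assumption, as is the use of `git -C <dir> pull`. The page only states that an existing repository of the same name is updated with git pull.

```python
from pathlib import Path, PurePosixPath
from typing import List
from urllib.parse import urlparse


def plan_git_action(repos_dir: Path, repo_url: str) -> List[str]:
    """Return a pull argv if the repo is already cached, else a clone argv."""
    # Assumption: directory name is the last URL path segment without ".git".
    name = PurePosixPath(urlparse(repo_url).path).name
    if name.endswith(".git"):
        name = name[: -len(".git")]
    dest = repos_dir / name
    if dest.is_dir():
        # Cached copy exists: update it in place.
        return ["git", "-C", str(dest), "pull"]
    return ["git", "clone", "--", repo_url, str(dest)]
```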

Sources: services/workspace-worker/executor.py:395-398, services/workspace-worker/workspace_manager.py:259-274


Frontend Integration

The RepoSelector component provides a modal dialog for browsing and cloning GitHub repositories.

Component Architecture


Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:1-182

Repository Display

Repositories are displayed with the following visual indicators (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:144-176):

  • Lock icon (Lock): Private repository

  • Globe icon (Globe): Public repository

  • Language badge: Primary programming language

  • Loader spinner: Indicates cloning in progress

The filter input (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:90-94) searches both the full_name and description fields case-insensitively.
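The matching rule is simple enough to state as code. The component itself is TypeScript; the sketch below expresses the same logic in Python to stay consistent with the other examples on this page, and the function name is illustrative.

```python
from typing import List


def filter_repos(repos: List[dict], query: str) -> List[dict]:
    """Keep repos whose full_name or description contains query, ignoring case."""
    needle = query.lower()
    return [
        repo for repo in repos
        # `or ""` guards against a missing or null description.
        if needle in (repo.get("full_name") or "").lower()
        or needle in (repo.get("description") or "").lower()
    ]
```

An empty query matches every repository, since the empty string is a substring of any string.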

Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:90-176

Clone Callback Pattern

When a clone is initiated, the component calls the onCloneStarted callback with the task ID (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:69-88).

The parent component (typically CodingCanvasWidget) is responsible for:

  1. Subscribing to task events via SSE

  2. Updating UI with clone progress

  3. Refreshing the file browser when clone completes

Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:69-88


Security Considerations

The GitHub integration implements multiple layers of security validation:

URL Security

| Validation | Location | Purpose |
|---|---|---|
| HTTPS-only | CloneRequest.validate_repo_url | Prevents protocol downgrade attacks |
| Host allowlist | _ALLOWED_CLONE_HOSTS | Restricts to known safe hosts |
| No embedded credentials | URL parser check | Prevents credential leakage in logs |

Sources: orchestrator/api/workspace_github.py:36-79

Branch Name Security

| Validation | Pattern | Blocked Examples |
|---|---|---|
| Alphanumeric + ./_- only | [A-Za-z0-9._/\-]+ | --upload-pack=/tmp/malicious |
| No path traversal | .. detection | ../escape, ../../etc/passwd |
| No git reflog syntax | @{ detection | @{-1}, @{push} |

Sources: orchestrator/api/workspace_github.py:39-91, services/workspace-worker/executor.py:366-386

Worker Environment Sandboxing

The workspace worker runs with a stripped environment (services/workspace-worker/executor.py:506-536):

  • Restricted PATH: Only standard system paths

  • Isolated HOME: Set to workspace root

  • Custom SSH config: Points to workspace .ssh/ directory

  • No host environment leakage: Parent process env vars are not inherited

Git operations use the sandboxed environment, preventing access to host credentials or configurations.
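A stripped environment of this kind can be built as an explicit dict rather than a copy of `os.environ`. The sketch below is an assumption about how the four properties could be realised: the PATH value is illustrative, and `GIT_SSH_COMMAND` with `ssh -F` is one way to point git at a workspace-local SSH config. The variables the worker actually sets are not listed on this page.

```python
import shlex
from pathlib import Path
from typing import Dict


def build_sandbox_env(workspace_root: Path) -> Dict[str, str]:
    """Build a minimal environment that inherits nothing from the host."""
    ssh_config = workspace_root / ".ssh" / "config"
    return {
        # Restricted PATH: standard system locations only (illustrative value).
        "PATH": "/usr/local/bin:/usr/bin:/bin",
        # Isolated HOME: git reads ~/.gitconfig from the workspace, not the host.
        "HOME": str(workspace_root),
        # Assumption: SSH config redirected via git's GIT_SSH_COMMAND variable.
        "GIT_SSH_COMMAND": f"ssh -F {shlex.quote(str(ssh_config))}",
    }
```

Passing this dict as `env=` to `subprocess.run` replaces the child's environment entirely, so host variables such as credentials or proxy settings are not inherited.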

Sources: services/workspace-worker/executor.py:506-536


Error Handling

Orchestrator Error Cases

| Error Condition | HTTP Status | Error Message |
|---|---|---|
| Workspace access denied | 403 | "Workspace access denied" |
| No Composio entity | 404 | "No Composio entity found for this workspace. Connect GitHub first." |
| Composio SDK not installed | 501 | "Composio SDK not installed" |
| GitHub API error | 502 | "GitHub API error: {error_msg}" |
| Redis enqueue failure | 503 | "Failed to enqueue task to worker" |

Sources: orchestrator/api/workspace_github.py:97-293

Worker Error Cases

| Error Condition | Behavior | Exit Code |
|---|---|---|
| Invalid branch name | Return error immediately | 1 |
| Git clone fails | Return stderr output | Non-zero |
| Timeout (300s) | Kill process, return timeout error | -1 |
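The failure and timeout rows can be sketched with `subprocess.run`, which kills the child process when its timeout expires before raising `TimeoutExpired`. The function name and the result dict keys are hypothetical; only the 300-second limit and the exit-code conventions come from the table.

```python
import subprocess
from typing import List


def run_with_timeout(argv: List[str], timeout: float = 300.0) -> dict:
    """Run a command and map failure and timeout to the documented exit codes."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return {"exit_code": -1, "error": f"timed out after {timeout:g}s"}
    if proc.returncode != 0:
        # Command failed: surface stderr with the process's own exit code.
        return {"exit_code": proc.returncode, "error": proc.stderr}
    return {"exit_code": 0, "output": proc.stdout}
```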

Worker errors are stored in both Redis (workspace:task:{id}:result) and PostgreSQL (task_executions.error_message) for reliable retrieval.

Sources: services/workspace-worker/main.py:342-353, services/workspace-worker/executor.py:368-419


API Reference

List GitHub Repositories

Endpoint: GET /api/workspaces/{workspace_id}/github/repos

Authentication: Requires valid workspace context (JWT + X-Workspace-ID header)

Query Parameters:

| Parameter | Type | Default | Constraints | Description |
|---|---|---|---|---|
| page | integer | 1 | ≥ 1 | Page number for pagination |
| per_page | integer | 30 | 1-100 | Items per page |

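Success Response (200 OK): a JSON object with the repos, page, and per_page fields described under List Repositories Endpoint.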

Error Responses:

  • 403 Forbidden: Workspace access denied

  • 404 Not Found: No Composio entity configured (GitHub not connected)

  • 501 Not Implemented: Composio SDK not available

  • 502 Bad Gateway: GitHub API returned an error

Sources: orchestrator/api/workspace_github.py:97-161

Clone GitHub Repository

Endpoint: POST /api/workspaces/{workspace_id}/github/clone

Authentication: Requires valid workspace context + queued task runner backend

Request Body Fields:

| Field | Type | Required | Validation |
|---|---|---|---|
| repo_url | string | Yes | HTTPS only, allowed hosts, no embedded credentials |
| branch | string | No | Alphanumeric + ./_-, no .. or @{ |

Success Response (200 OK): the response includes a task_id that can be used to:

  • Poll task status: GET /api/tasks/{task_id}

  • Stream live updates: GET /api/tasks/{task_id}/events (SSE)

  • Cancel task: POST /api/tasks/{task_id}/cancel

Error Responses:

  • 400 Bad Request: Invalid URL or branch name, or wrong task runner backend

  • 403 Forbidden: Workspace access denied

  • 404 Not Found: No Composio entity configured

  • 503 Service Unavailable: Redis queue unavailable

Sources: orchestrator/api/workspace_github.py:167-293

Task Event Stream Format

Once a clone task is queued, subscribe to its event stream:

Endpoint: GET /api/tasks/{task_id}/events

Response Type: text/event-stream
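A client consuming this stream needs to split it into events. The sketch below parses the standard text/event-stream framing (`event:` and `data:` fields, blank line as delimiter). It is not tied to this API: the event names used by the task stream are not listed here, so any names used with it are placeholders.

```python
from typing import List


def parse_sse(stream_text: str) -> List[dict]:
    """Split a text/event-stream body into {'event', 'data'} dicts."""
    events: List[dict] = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # Blank line dispatches the event collected so far.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Only a single leading space is stripped from the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An unterminated trailing event is discarded.
    return events
```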

Event Types:

For complete task management API documentation, see Workspace API Reference.

Sources: orchestrator/api/tasks.py:349-403

