GitHub Integration
Purpose and Scope
The GitHub Integration subsystem enables agents and users to browse and clone GitHub repositories directly into workspace directories via Composio's OAuth authentication. This provides agents with the ability to read source code, execute tests, and make modifications to repositories within the sandboxed workspace environment.
This document covers repository listing, authenticated cloning, and security validation. For general workspace file operations once a repository is cloned, see File Operations. For executing commands within cloned repositories, see Command Execution.
Architecture Overview
GitHub integration follows a three-tier architecture: the frontend initiates requests, the orchestrator validates and queues them, and the workspace worker executes the actual git operations with injected OAuth credentials.
System Flow Diagram
Sources: orchestrator/api/workspace_github.py:1-294, services/workspace-worker/main.py:227-358, services/workspace-worker/executor.py:368-419
Repository Browsing
Repository listing is proxied through Composio's GitHub integration to retrieve the authenticated user's accessible repositories with OAuth scope enforcement.
Composio Entity Resolution
Before making any GitHub API calls, the system resolves the workspace's Composio entity_id via the EntityManager.
The _get_entity_id helper (orchestrator/api/workspace_github.py:47-58) raises HTTP 404 if the workspace has not connected GitHub via Composio, so requests fail early with a clear error message.
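A minimal sketch of this fail-fast lookup follows. It assumes a hypothetical entity manager exposing get_entity_id(workspace_id) that returns None when no Composio connection exists. The real EntityManager interface may differ. HTTPException here is a local stand-in for the web framework's exception class, and DictEntityManager is a fake used only to exercise the sketch.

```python
from typing import Dict, Optional, Protocol


class HTTPException(Exception):
    """Local stand-in for a web framework's HTTPException (e.g. FastAPI's)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


class EntityLookup(Protocol):
    """Hypothetical interface; the real EntityManager API may differ."""

    def get_entity_id(self, workspace_id: str) -> Optional[str]: ...


def resolve_entity_id(manager: EntityLookup, workspace_id: str) -> str:
    """Return the workspace's Composio entity_id or fail fast with HTTP 404."""
    entity_id = manager.get_entity_id(workspace_id)
    if not entity_id:
        raise HTTPException(
            404, "No Composio entity found for this workspace. Connect GitHub first."
        )
    return entity_id


class DictEntityManager:
    """In-memory fake mapping workspace IDs to entity IDs."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = mapping

    def get_entity_id(self, workspace_id: str) -> Optional[str]:
        return self._mapping.get(workspace_id)
```

Raising before any Composio or GitHub call means a missing connection surfaces as a 404 with an actionable message. Otherwise it would appear as an opaque upstream failure.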
Sources: orchestrator/api/workspace_github.py:47-58, orchestrator/core/composio/entity_manager.py
List Repositories Endpoint
Endpoint: GET /api/workspaces/{workspace_id}/github/repos
Query Parameters:
page: default 1, minimum 1
per_page: default 30, minimum 1, maximum 100
Response Structure:
repos (array): List of repository objects
page (int): Current page number
per_page (int): Items per page
Repository Object Fields:
name (string): Repository short name
full_name (string): Owner/repo format
url (string): Clone URL (HTTPS)
description (string): Repository description
default_branch (string): Default branch (e.g., "main")
private (boolean): Private repository flag
language (string): Primary language
updated_at (string): Last update timestamp (ISO 8601)
The implementation (orchestrator/api/workspace_github.py:97-161) unwraps Composio's nested response, extracting the repository list from result.data.data.repositories and normalizing it into a flat array.
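The unwrapping step can be sketched as follows. The sketch makes two assumptions about the payload that this page does not confirm. First, the Composio result is treated as a plain dict shaped like {"data": {"data": {"repositories": [...]}}}. Second, each entry is assumed to use GitHub REST field names, such as clone_url for the HTTPS clone URL.

```python
from typing import Any, Dict, List


def normalize_repos(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a nested Composio result into the repository objects above."""
    # Unwrap result["data"]["data"]["repositories"], tolerating missing levels.
    inner = (result.get("data") or {}).get("data") or {}
    raw_repos = inner.get("repositories") or []
    return [
        {
            "name": raw.get("name"),
            "full_name": raw.get("full_name"),
            # Assumption: the HTTPS clone URL arrives as GitHub's "clone_url".
            "url": raw.get("clone_url"),
            "description": raw.get("description") or "",
            "default_branch": raw.get("default_branch"),
            "private": bool(raw.get("private", False)),
            "language": raw.get("language"),
            "updated_at": raw.get("updated_at"),
        }
        for raw in raw_repos
    ]
```

Treating each missing level as empty returns an empty list when the shape is unexpected. A stricter implementation might instead surface this as a 502.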
Sources: orchestrator/api/workspace_github.py:97-161
Repository Cloning
Repository cloning is implemented as an asynchronous task executed by the workspace worker with OAuth token injection for private repository access.
Clone Request Validation
The CloneRequest model (orchestrator/api/workspace_github.py:65-92) enforces strict validation rules through Pydantic validators.
URL Validation
Allowed Hosts: github.com, gitlab.com, bitbucket.org
validate_repo_url (orchestrator/api/workspace_github.py:69-79) enforces three rules: the URL must use HTTPS, its host must be on the allowlist, and it must not embed credentials.
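The URL checks (HTTPS scheme, host allowlist, no embedded credentials) can be expressed with the standard library's urllib.parse, as in the sketch below. The real check is a Pydantic field validator on CloneRequest. This standalone function and its error messages are illustrative.

```python
from urllib.parse import urlparse

_ALLOWED_CLONE_HOSTS = frozenset({"github.com", "gitlab.com", "bitbucket.org"})


def validate_repo_url(url: str) -> str:
    """Return url unchanged if it passes every check, else raise ValueError."""
    parsed = urlparse(url)
    # HTTPS only: rejects http://, git://, ssh:// and scp-style "git@host:path".
    if parsed.scheme != "https":
        raise ValueError("Only https:// clone URLs are allowed")
    # No "user@" or "user:password@" prefix in the authority part.
    if parsed.username is not None or parsed.password is not None:
        raise ValueError("Clone URL must not contain embedded credentials")
    # Exact host match, so "github.com.evil.example" is rejected.
    # urlparse already lower-cases .hostname.
    if parsed.hostname not in _ALLOWED_CLONE_HOSTS:
        raise ValueError(f"Host {parsed.hostname!r} is not in the allowlist")
    return url
```

Comparing the parsed hostname for exact equality, rather than using substring or suffix matching, is what defeats look-alike hosts.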
Branch Name Validation
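The validate_branch method (orchestrator/api/workspace_github.py:81-91) blocks injection attacks through branch names: a name may contain only letters, digits, and the characters . _ / -, and must not contain .. or @{.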
This prevents attacks like --upload-pack=/tmp/malicious being passed to git commands.
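The branch rules (character allowlist, no .., no @{) can be sketched as a standalone function. The real method is a Pydantic validator and may differ in details. Treating None as "use the default branch" is an assumption based on branch being optional in the request.

```python
import re
from typing import Optional

# Character allowlist documented for branch names.
_BRANCH_RE = re.compile(r"[A-Za-z0-9._/\-]+")


def validate_branch(branch: Optional[str]) -> Optional[str]:
    """Return branch if it is safe; None means "use the default branch"."""
    if branch is None:
        return None
    # fullmatch requires every character to be allowlisted, so the "=" in
    # "--upload-pack=/tmp/malicious" is enough to reject that input.
    if not _BRANCH_RE.fullmatch(branch):
        raise ValueError("Branch name contains disallowed characters")
    # Path traversal such as "../../etc/passwd".
    if ".." in branch:
        raise ValueError("Branch name must not contain '..'")
    # Reflog syntax such as "@{-1}" (the regex already excludes it; kept explicit).
    if "@{" in branch:
        raise ValueError("Branch name must not contain '@{'")
    return branch
```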
Sources: orchestrator/api/workspace_github.py:65-92
OAuth Token Injection
For private repositories, the clone endpoint attempts to retrieve the user's GitHub OAuth token from Composio and inject it into the HTTPS clone URL. The token injection code (orchestrator/api/workspace_github.py:193-211) rewrites the clone URL before the task is submitted.
If token retrieval fails (e.g., OAuth not connected), the system logs a warning and proceeds with unauthenticated cloning, which works for public repositories.
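The inject-or-fall-back behavior can be sketched as follows. Several details are assumptions rather than confirmed behavior. fetch_token is a hypothetical callable standing in for the Composio lookup. The x-access-token:<token>@ userinfo form is one way GitHub accepts tokens over HTTPS; the exact form the endpoint uses is not documented here. Restricting injection to github.com URLs is an added safeguard, since a GitHub token should not be sent to other hosts.

```python
import logging
from typing import Callable, Optional
from urllib.parse import quote, urlsplit, urlunsplit

logger = logging.getLogger(__name__)


def build_clone_url(repo_url: str, fetch_token: Callable[[], Optional[str]]) -> str:
    """Return repo_url with a GitHub token injected, or unchanged on failure."""
    parts = urlsplit(repo_url)
    # Assumption: only GitHub tokens exist, so only github.com URLs receive one.
    if parts.hostname != "github.com":
        return repo_url
    try:
        token = fetch_token()
    except Exception as exc:  # e.g. OAuth not connected
        logger.warning("GitHub token unavailable, cloning unauthenticated: %s", exc)
        return repo_url
    if not token:
        return repo_url
    # Percent-encode the token so characters such as "@" or ":" cannot break the URL.
    netloc = f"x-access-token:{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Injection happens after validate_repo_url has run, so the "no embedded credentials" rule still applies to user input while the server adds the credential itself.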
Sources: orchestrator/api/workspace_github.py:193-211
Clone Endpoint Implementation
Endpoint: POST /api/workspaces/{workspace_id}/github/clone
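Request Body: a JSON object with repo_url and an optional branch, validated by CloneRequest as described above.
Response: an immediate acknowledgement containing the task_id of the queued clone task.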
The clone operation is not executed synchronously. Instead, it creates a background task and returns immediately, allowing the frontend to poll or stream events via SSE.
Sources: orchestrator/api/workspace_github.py:167-293
Task Submission Flow
Clone requests follow a two-phase submission pattern (database insert first, Redis enqueue second) that prevents race conditions between the database and the Redis queue.
Atomic Task Submission Sequence
Critical Ordering: The database row is inserted before the Redis enqueue (orchestrator/api/workspace_github.py:238-256). This prevents a race condition where a worker picks up a task that has no corresponding database record.
If the Redis enqueue fails after the database commit, the endpoint marks the task as failed in the database and returns HTTP 503 (orchestrator/api/workspace_github.py:275-282).
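The ordering and its compensation step can be sketched with two stand-ins: sqlite3 plays the role of PostgreSQL, and a small queue interface plays the role of Redis. The status values 'queued' and 'failed', the column layout, and the queue interface are assumptions. Only the task_executions table and its error_message column are named elsewhere on this page. With redis-py, push would wrap a call such as Redis.lpush.

```python
import json
import sqlite3
import uuid
from typing import List, Protocol


class HTTPException(Exception):
    """Local stand-in for the web framework's HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


class QueueUnavailable(Exception):
    """Raised by the (hypothetical) queue client when Redis is unreachable."""


class TaskQueue(Protocol):
    def push(self, data: str) -> None: ...


def submit_task(db: sqlite3.Connection, queue: TaskQueue, payload: dict) -> str:
    task_id = str(uuid.uuid4())
    # Phase 1: insert and commit first, so any worker that dequeues the task
    # is guaranteed to find its database record.
    db.execute(
        "INSERT INTO task_executions (id, status, error_message) VALUES (?, 'queued', NULL)",
        (task_id,),
    )
    db.commit()
    # Phase 2: enqueue; on failure, compensate by marking the row failed.
    try:
        queue.push(json.dumps({**payload, "task_id": task_id}))
    except QueueUnavailable as exc:
        db.execute(
            "UPDATE task_executions SET status = 'failed', error_message = ? WHERE id = ?",
            (str(exc), task_id),
        )
        db.commit()
        raise HTTPException(503, "Failed to enqueue task to worker") from exc
    return task_id


class ListQueue:
    """In-memory queue for trying the sketch."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def push(self, data: str) -> None:
        self.items.append(data)


class DownQueue:
    """Queue that always fails, simulating an unreachable Redis."""

    def push(self, data: str) -> None:
        raise QueueUnavailable("connection refused")
```

The reverse order (enqueue, then insert) would let a fast worker dequeue a task whose row does not exist yet. This ordering can only leave behind a row already marked failed.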
Sources: orchestrator/api/workspace_github.py:236-283
Task Payload Structure
The clone task payload (orchestrator/api/workspace_github.py:216-234) follows the standard workspace task format.
The OAuth token (if retrieved) is embedded directly in the repo URL, not in a separate credentials field. This allows the worker to clone the repository without additional Composio API calls.
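The payload's key property is that the credential travels inside repo_url rather than in a separate field. The sketch below shows this, but it is entirely hypothetical. The field names task_id, workspace_id, action and params are illustrative and not the documented wire format. Only the git_clone action name comes from this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CloneTaskPayload:
    """Hypothetical payload shape; field names are illustrative only."""

    task_id: str
    workspace_id: str
    action: str = "git_clone"
    params: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def for_clone(cls, task_id: str, workspace_id: str,
                  repo_url: str, branch: Optional[str] = None) -> "CloneTaskPayload":
        # repo_url may already carry the injected token, so no separate
        # credentials field is needed.
        params: Dict[str, Any] = {"repo_url": repo_url}
        if branch is not None:
            params["branch"] = branch
        return cls(task_id=task_id, workspace_id=workspace_id, params=params)

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```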
Sources: orchestrator/api/workspace_github.py:216-234
Worker-Side Execution
The workspace worker executes the git_clone action with additional security validation and reuses previously cloned repositories (see Repository Caching Strategy).
Git Clone Implementation
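The _git_clone method (services/workspace-worker/executor.py:368-419) re-validates the branch name, resolves the target directory under the workspace's repos/ directory, runs git pull if that directory already exists and git clone otherwise, and fails with a timeout error if git runs longer than 300 seconds.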
Sources: services/workspace-worker/executor.py:368-419
PRD-70 Security Fix: Argument Injection Prevention
The clone implementation includes a critical security fix documented in PRD-70 FIX-01. The vulnerability was that unvalidated branch names could be used to inject git arguments like --upload-pack=/tmp/malicious.
Defense-in-Depth Strategy:
Orchestrator-side validation (orchestrator/api/workspace_github.py:81-91): Reject invalid branch names before task submission
Worker-side validation (services/workspace-worker/executor.py:380-386): Redundant check in case orchestrator validation is bypassed
Git separator -- (services/workspace-worker/executor.py:406): Marks the end of options, so all subsequent arguments are treated as positional
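The worker-side layers can be sketched as a command builder: it repeats the branch check and places -- before the positional arguments. The exact argument order and the use of --branch are assumptions about executor.py. git clone does accept -- to end option parsing before the repository and directory arguments.

```python
import re
from typing import List, Optional

_BRANCH_RE = re.compile(r"[A-Za-z0-9._/\-]+")


def _check_branch(branch: str) -> None:
    # Worker-side repeat of the orchestrator's rules (defense in depth).
    if not _BRANCH_RE.fullmatch(branch) or ".." in branch or "@{" in branch:
        raise ValueError(f"Invalid branch name: {branch!r}")


def build_clone_argv(repo_url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    argv = ["git", "clone"]
    if branch is not None:
        _check_branch(branch)
        argv += ["--branch", branch]
    # "--" ends option parsing: repo_url and dest are always positional,
    # even if one of them were to start with "-".
    argv += ["--", repo_url, dest]
    return argv
```

Passing an argument list (not a shell string) to subprocess avoids shell interpretation. The separator additionally stops git itself from reading a positional argument as an option.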
Sources: services/workspace-worker/executor.py:368-419
Repository Caching Strategy
Cloned repositories are stored persistently in the workspace's repos/ directory (services/workspace-worker/workspace_manager.py:259-262).
If a repository with the same name already exists, the worker switches from git clone to git pull (services/workspace-worker/executor.py:395-398), updating the cached copy instead of re-cloning it. This reduces bandwidth and speeds up repeated executions.
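The clone-or-pull decision can be sketched as follows. Three details are assumptions: deriving the directory name from the last URL path segment minus .git, the git -C <dir> pull form, and rejecting names that would resolve outside repos/. The branch is assumed to have been validated already.

```python
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlsplit


def repo_dir_name(repo_url: str) -> str:
    """Derive a cache directory name from the last URL path segment."""
    last = urlsplit(repo_url).path.rstrip("/").rsplit("/", 1)[-1]
    name = last[:-4] if last.endswith(".git") else last
    # Refuse names that would resolve outside repos/.
    if name in ("", ".", ".."):
        raise ValueError(f"Cannot derive a repository name from {repo_url!r}")
    return name


def plan_git_command(workspace_root: Path, repo_url: str,
                     branch: Optional[str] = None) -> List[str]:
    """Return a git pull command for a cached repo, else a git clone command."""
    target = workspace_root / "repos" / repo_dir_name(repo_url)
    if target.is_dir():
        # Cached: update in place instead of re-cloning.
        return ["git", "-C", str(target), "pull"]
    argv = ["git", "clone"]
    if branch:
        argv += ["--branch", branch]
    return argv + ["--", repo_url, str(target)]
```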
Sources: services/workspace-worker/executor.py:395-398, services/workspace-worker/workspace_manager.py:259-274
Frontend Integration
The RepoSelector component provides a modal dialog for browsing and cloning GitHub repositories.
Component Architecture
Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:1-182
Repository Display
Repositories are displayed with the following visual indicators (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:144-176):
Lock icon (Lock): Private repository
Globe icon (Globe): Public repository
Language badge: Primary programming language
Loader spinner: Indicates cloning in progress
The filter input (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:90-94) matches the query against both the full_name and description fields, case-insensitively.
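The actual filter is TypeScript in RepoSelector.tsx. The matching rule can be rendered in Python for reference, assuming a plain substring match against the lower-cased query.

```python
from typing import Dict, List, Optional

Repo = Dict[str, Optional[str]]


def filter_repos(repos: List[Repo], query: str) -> List[Repo]:
    """Keep repos whose full_name or description contains query, ignoring case."""
    needle = query.lower()
    # An empty needle is a substring of every string, so "" keeps everything.
    return [
        r for r in repos
        if needle in (r.get("full_name") or "").lower()
        or needle in (r.get("description") or "").lower()
    ]
```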
Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:90-176
Clone Callback Pattern
When a clone is initiated, the component calls the onCloneStarted callback with the task ID (frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:69-88).
The parent component (typically CodingCanvasWidget) is responsible for:
Subscribing to task events via SSE
Updating UI with clone progress
Refreshing the file browser when clone completes
Sources: frontend/components/widgets/CodingCanvasWidget/RepoSelector.tsx:69-88
Security Considerations
The GitHub integration implements multiple layers of security validation:
URL Security
HTTPS-only (CloneRequest.validate_repo_url): Prevents protocol downgrade attacks
Host allowlist (_ALLOWED_CLONE_HOSTS): Restricts cloning to known safe hosts
No embedded credentials (URL parser check): Prevents credential leakage in logs
Sources: orchestrator/api/workspace_github.py:36-79
Branch Name Security
Alphanumeric + ./_- only: pattern [A-Za-z0-9._/\-]+; blocks --upload-pack, ../escape
No path traversal: .. detection; blocks ../../etc/passwd
No git reflog syntax: @{ detection; blocks @{-1}, @{push}
Sources: orchestrator/api/workspace_github.py:39-91, services/workspace-worker/executor.py:366-386
Worker Environment Sandboxing
The workspace worker executes commands with a stripped environment (services/workspace-worker/executor.py:506-536):
Restricted PATH: Only standard system paths
Isolated HOME: Set to the workspace root
Custom SSH config: Points to the workspace .ssh/ directory
No host environment leakage: Parent process environment variables are not inherited
Git operations use the sandboxed environment, preventing access to host credentials or configurations.
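A sandboxed environment of this kind can be sketched as a freshly built dict rather than a filtered copy of os.environ. Two details are assumptions. The concrete PATH value stands in for "standard system paths". GIT_SSH_COMMAND with ssh -F is one real mechanism for pointing git's SSH at a workspace config, but it is not confirmed as the one executor.py uses.

```python
import shlex
from pathlib import Path
from typing import Dict

# Assumption: what "standard system paths" might look like.
_SAFE_PATH = "/usr/local/bin:/usr/bin:/bin"


def build_sandbox_env(workspace_root: Path) -> Dict[str, str]:
    """Build a fresh environment; nothing is copied from os.environ."""
    ssh_config = workspace_root / ".ssh" / "config"
    return {
        "PATH": _SAFE_PATH,
        "HOME": str(workspace_root),
        # Assumption: git honours GIT_SSH_COMMAND, run through a shell, so quote the path.
        "GIT_SSH_COMMAND": f"ssh -F {shlex.quote(str(ssh_config))}",
    }
```

Passing this dict as env= to subprocess.run replaces the parent environment entirely rather than extending it. That is what keeps host variables, such as cloud credentials, out of git's view.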
Sources: services/workspace-worker/executor.py:506-536
Error Handling
Orchestrator Error Cases
Workspace access denied: HTTP 403, "Workspace access denied"
No Composio entity: HTTP 404, "No Composio entity found for this workspace. Connect GitHub first."
Composio SDK not installed: HTTP 501, "Composio SDK not installed"
GitHub API error: HTTP 502, "GitHub API error: {error_msg}"
Redis enqueue failure: HTTP 503, "Failed to enqueue task to worker"
Sources: orchestrator/api/workspace_github.py:97-293
Worker Error Cases
Invalid branch name: Return error immediately (exit code 1)
Git clone fails: Return stderr output (non-zero exit code)
Timeout (300s): Kill process, return timeout error (exit code -1)
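The git-failure and timeout rows of the worker error cases can be sketched with subprocess.run, which kills the child process before re-raising TimeoutExpired. The invalid-branch case (exit code 1) is assumed to be handled before git is ever launched. The tuple return shape and message text are illustrative.

```python
import subprocess
from typing import List, Tuple

GIT_TIMEOUT_SECONDS = 300


def run_git(argv: List[str], cwd: str, timeout: float = GIT_TIMEOUT_SECONDS) -> Tuple[int, str]:
    """Run a command and return (exit_code, message)."""
    try:
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return -1, f"git timed out after {timeout} seconds"
    if proc.returncode != 0:
        # Surface git's own diagnostics to the caller.
        return proc.returncode, proc.stderr.strip()
    return 0, proc.stdout.strip()
```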
Worker errors are stored in both Redis (workspace:task:{id}:result) and PostgreSQL (task_executions.error_message) for reliable retrieval.
Sources: services/workspace-worker/main.py:342-353, services/workspace-worker/executor.py:368-419
API Reference
List GitHub Repositories
Endpoint: GET /api/workspaces/{workspace_id}/github/repos
Authentication: Requires valid workspace context (JWT + X-Workspace-ID header)
Query Parameters:
page (integer, default 1, ≥ 1): Page number for pagination
per_page (integer, default 30, 1-100): Items per page
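Success Response (200 OK): a JSON object with repos, page, and per_page, using the response structure and repository object fields described under List Repositories Endpoint.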
Error Responses:
403 Forbidden: Workspace access denied
404 Not Found: No Composio entity configured (GitHub not connected)
501 Not Implemented: Composio SDK not available
502 Bad Gateway: GitHub API returned an error
Sources: orchestrator/api/workspace_github.py:97-161
Clone GitHub Repository
Endpoint: POST /api/workspaces/{workspace_id}/github/clone
Authentication: Requires valid workspace context; the server must also be configured with the queued task runner backend (otherwise the endpoint returns 400 Bad Request).
Request Body Fields:
repo_url (string, required): HTTPS only, allowed hosts, no embedded credentials
branch (string, optional): Alphanumeric + ./_-, no .. or @{
Success Response (200 OK): the response includes a task_id that can be used to:
Poll task status: GET /api/tasks/{task_id}
Stream live updates: GET /api/tasks/{task_id}/events (SSE)
Cancel task: POST /api/tasks/{task_id}/cancel
Error Responses:
400 Bad Request: Invalid URL or branch name, or wrong task runner backend
403 Forbidden: Workspace access denied
404 Not Found: No Composio entity configured
503 Service Unavailable: Redis queue unavailable
Sources: orchestrator/api/workspace_github.py:167-293
Task Event Stream Format
Once a clone task is queued, subscribe to its event stream:
Endpoint: GET /api/tasks/{task_id}/events
Response Type: text/event-stream
Event Types:
For complete task management API documentation, see Workspace API Reference.
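The specific event names are not listed in this section, so a client-side sketch has to stay generic. The parser below follows a simplified form of the Server-Sent Events parsing rules in the WHATWG HTML specification. It handles event and data fields, comment lines, and dispatch on a blank line, and it ignores id and retry. The "status" event name used in the test is hypothetical. The parser can be fed lines from any streaming HTTP response.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} dicts from text/event-stream lines."""
    event = "message"  # default event type per the SSE spec
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if it has data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip one optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

An event still buffered when the stream ends without a trailing blank line is discarded, which matches the specification's behavior.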
Sources: orchestrator/api/tasks.py:349-403