Production Deployment

This page covers production deployment strategies for Automatos AI, with specific guidance for Railway deployment and general best practices applicable to any cloud platform. For local development setup, see Getting Started. For Docker containerization details, see Docker Containerization. For environment variable reference, see Environment Variables.

Deployment Overview

Automatos AI supports production deployment on cloud platforms using Docker containers. The system uses multi-stage Docker builds with optimized production images, centralized configuration management, and support for Railway's environment variable conventions.

Key Production Components:

Backend: FastAPI application on port 8000 (configurable via PORT env var)
Frontend: Next.js application on port 3000
PostgreSQL: Database with pgvector extension
Redis: Optional caching and pub/sub service
S3: Plugin marketplace storage

Sources: orchestrator/config.py:1-285, README.md:1-115

Railway Deployment

Railway is the primary supported platform, with automatic detection of Railway-specific environment variables and deployment patterns.

Railway Project Structure

Railway-Specific Configuration:

| Service | Build Command | Start Command | PORT Detection |
| --- | --- | --- | --- |
| Frontend | npm install --legacy-peer-deps && npm run build | npm start | Fixed: 3000 |
| Backend | Auto (Dockerfile) | uvicorn main:app --host 0.0.0.0 --port $PORT --workers 4 | Railway sets $PORT |
| PostgreSQL | Railway Plugin | | Railway provides DATABASE_URL |
| Redis | Railway Plugin | | Railway provides REDIS_URL |

Sources: orchestrator/Dockerfile:73-116, frontend/Dockerfile:83-119, orchestrator/config.py:73-79

Deployment Steps

1. Create Railway Project

# Install Railway CLI
npm install -g @railway/cli

# Login to Railway
railway login

# Create new project
railway init

2. Add Database Plugins

Add the PostgreSQL and Redis plugins in the Railway dashboard:

PostgreSQL: Version 16 with pgvector support
Redis: Version 7 with persistence

Railway automatically injects DATABASE_URL and REDIS_URL environment variables.

3. Configure Environment Variables

Set these variables in the Railway dashboard for the backend service:

Required:

# LLM Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Clerk Authentication
CLERK_SECRET_KEY=sk_live_...
CLERK_JWKS_URL=https://your-clerk-domain.clerk.accounts.dev/.well-known/jwks.json

# AWS S3 (Plugin Marketplace)
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
MARKETPLACE_S3_BUCKET=automatos-marketplace

# Security
API_KEY=your_secure_api_key_here

# Environment
ENVIRONMENT=production

Optional:

# CORS (Frontend Domain)
CORS_ALLOW_ORIGINS=https://your-frontend.up.railway.app,https://yourdomain.com

# Redis Configuration (if using external Redis)
REDIS_URL=redis://:password@host:6379/0

# Composio Integration
COMPOSIO_API_KEY=...
COMPOSIO_WEBHOOK_SECRET=...

# Feature Flags
ENABLE_BATCH_API=false
S3_VECTORS_ENABLED=false

For the frontend service:

# Backend URL (Railway internal or public)
NEXT_PUBLIC_API_URL=https://your-backend.up.railway.app

# Clerk Public Keys
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_...
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

Sources: orchestrator/.env.example:1-64, orchestrator/config.py:28-185
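A startup check can surface missing variables before the first request fails. The sketch below is illustrative only and is not taken from orchestrator/config.py: the helper name missing_required_vars is hypothetical, and the list simply mirrors the required backend variables above plus DATABASE_URL, which Railway injects.

```python
import os
from typing import Mapping, Sequence

# Names mirror the "Required" block above; DATABASE_URL is injected by Railway.
REQUIRED_BACKEND_VARS: Sequence[str] = (
    "DATABASE_URL",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "CLERK_SECRET_KEY",
    "CLERK_JWKS_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "MARKETPLACE_S3_BUCKET",
    "API_KEY",
    "ENVIRONMENT",
)


def missing_required_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variable names that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_BACKEND_VARS if not env.get(name, "").strip()]
```

Calling missing_required_vars() at process start and exiting with a clear message when the list is non-empty makes a misconfigured deploy fail fast in the Railway logs.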

4. Deploy Services

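# Deploy backend (run from the orchestrator/ directory with the backend service linked)
railway up

# Deploy frontend (separate service; run from the frontend/ directory with the frontend service linked)
railway up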

Railway automatically builds Docker images using multi-stage Dockerfiles and deploys to production infrastructure.

Production Docker Configuration

Multi-Stage Build Architecture

Both frontend and backend use multi-stage Docker builds with separate development and production targets.

Backend Production Stage

Key Production Optimizations:

| Optimization | Implementation | File Reference |
| --- | --- | --- |
| Minimal base image | python:3.11-slim | orchestrator/Dockerfile:13 |
| Dependency caching | pip install --no-cache-dir | orchestrator/Dockerfile:33-35 |
| Non-root user | useradd automatos | orchestrator/Dockerfile:98-99 |
| Clean temp files | Remove __pycache__, .pyc | orchestrator/Dockerfile:94-95 |
| Multi-worker | --workers 4 | orchestrator/Dockerfile:115 |
| Health check | curl -f http://localhost:$PORT/health | orchestrator/Dockerfile:107-108 |

Production Command:

CMD sh -c "uvicorn main:app --host 0.0.0.0 --port ${PORT:-8000} --workers 4"

This command:

Uses $PORT environment variable (Railway requirement)
Falls back to port 8000 if PORT not set
Runs 4 worker processes for concurrency
Binds to all interfaces (0.0.0.0)
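The same port fallback can be expressed in Python, for example in a launcher script. This is a minimal sketch under stated assumptions: the function name uvicorn_command is hypothetical and the project itself starts the server through the Dockerfile CMD shown above.

```python
import os
from typing import Mapping


def uvicorn_command(env: Mapping[str, str] = os.environ, workers: int = 4) -> list[str]:
    """Build the argv for the production server, mirroring the CMD above."""
    # Equivalent of the shell expansion ${PORT:-8000}: use PORT unless unset or empty.
    port = env.get("PORT") or "8000"
    if not port.isdigit():
        raise ValueError(f"PORT must be a number, got {port!r}")
    return [
        "uvicorn", "main:app",
        "--host", "0.0.0.0",   # bind to all interfaces
        "--port", port,
        "--workers", str(workers),
    ]
```

A launcher could pass the result to os.execvp(cmd[0], cmd) so that the server replaces the launcher process and receives signals directly.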

Sources: orchestrator/Dockerfile:73-116

Frontend Production Stage

Key Frontend Optimizations:

| Optimization | Implementation | File Reference |
| --- | --- | --- |
| Build-time env vars | ARG NEXT_PUBLIC_* | frontend/Dockerfile:55-63 |
| Production dependencies only | npm ci --only=production | frontend/Dockerfile:94 |
| Copy built artifacts | .next, public, next.config.js | frontend/Dockerfile:96-100 |
| Non-root user | adduser nextjs | frontend/Dockerfile:103-105 |
| Static optimization | Next.js SSG/ISR | Built into Next.js |

Important: NEXT_PUBLIC_* environment variables are baked into the client bundle at build time and are visible to every browser, so they must not contain secrets. Server-side API keys must be handled via backend API routes.

Sources: frontend/Dockerfile:51-119

Security Hardening

Production images implement multiple security layers.

Security Checklist:

✅ Non-root user for both frontend and backend
✅ Secrets injected at runtime (never in Dockerfile or code)
✅ Development dependencies removed in production
✅ Health checks for automatic restart on failure
✅ Encrypted credential storage (AES-256-GCM)
✅ API key authentication for backend
✅ Clerk JWT verification for user endpoints

Sources: orchestrator/Dockerfile:98-115, frontend/Dockerfile:103-118, orchestrator/core/credentials/service.py:1-850

Database Configuration

PostgreSQL Production Setup

Connection Pooling

The system uses SQLAlchemy with connection pooling configured for production workloads:

# Database URL format (Railway provides this)
DATABASE_URL=postgresql://user:password@host:port/database

# PostgreSQL server settings (configured in docker-compose for reference)
max_connections=200
shared_buffers=256MB
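When debugging connection problems it helps to inspect DATABASE_URL without printing the password. The following is a minimal sketch, not project code: describe_database_url is a hypothetical helper, it accepts only the plain postgresql scheme, and the postgres:// rewrite is a defensive step because SQLAlchemy 1.4 and later reject that legacy scheme name.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts, masking the password for logging."""
    # Normalize the legacy "postgres://" scheme defensively.
    if url.startswith("postgres://"):
        url = "postgresql://" + url[len("postgres://"):]
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"Unexpected scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": "***" if parts.password else None,  # never log the real value
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL default port
        "database": parts.path.lstrip("/"),
    }
```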

Recommended Pool Configuration:

| Parameter | Development | Production | Notes |
| --- | --- | --- | --- |
| pool_size | | | Base connection pool |
| max_overflow | | | Additional connections under load |
| pool_timeout | | | Seconds to wait for connection |
| pool_recycle | 3600 | 1800 | Recycle connections (seconds) |

Sources: docker-compose.yml:21-42, orchestrator/config.py:36-42

pgvector Extension

Production PostgreSQL must have the pgvector extension enabled for embedding storage:

-- Verify pgvector is available
SELECT * FROM pg_available_extensions WHERE name = 'vector';

-- Enable extension (auto-initialized in Railway via init script)
CREATE EXTENSION IF NOT EXISTS vector;

The Railway PostgreSQL plugin includes pgvector by default when it uses the pgvector/pgvector:pg16 image.

Backup Strategy

Automated Backups:

Railway provides automatic daily backups (retained 7 days)
Configure additional backup retention via Railway dashboard

Manual Backups:

# Export database dump
railway run pg_dump -Fc orchestrator_db > backup.dump

# Restore from dump
railway run pg_restore -d orchestrator_db backup.dump

Sources: docker-compose.yml:21-42

Redis Production Setup

Connection Configuration

Redis is optional but recommended for production to enable:

Plugin content caching
Real-time workflow updates (pub/sub)
Composio app/action caching

Railway REDIS_URL Format:

# Railway provides this automatically
REDIS_URL=redis://:password@host:port/0

The RedisClient class automatically parses REDIS_URL, falling back to individual variables when it is not set.
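The URL-first resolution described here might look roughly like the sketch below. It is not the RedisClient implementation: redis_settings is a hypothetical function, and the fallback variable names (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, REDIS_DB) are assumptions that the source does not confirm.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit


def redis_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve Redis settings from REDIS_URL, else from individual variables."""
    url = env.get("REDIS_URL")
    if url:
        parts = urlsplit(url)
        db = parts.path.lstrip("/")
        return {
            "host": parts.hostname or "localhost",
            "port": parts.port or 6379,
            "password": parts.password,
            "db": int(db) if db else 0,
        }
    # Assumed fallback variable names; check orchestrator/config.py for the real ones.
    return {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "password": env.get("REDIS_PASSWORD") or None,
        "db": int(env.get("REDIS_DB", "0")),
    }
```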

Sources: orchestrator/core/redis/client.py:149-198, orchestrator/config.py:46-62

Cache Policies

Production Redis is configured with LRU eviction for memory management:

# Redis configuration (set via Railway or docker-compose)
maxmemory 256mb
maxmemory-policy allkeys-lru

Cache TTL Values:

| Cache Type | TTL (seconds) | Config Variable |
| --- | --- | --- |
| Plugin content | 3600 (1 hour) | PLUGIN_CACHE_TTL_SECONDS |
| Composio apps | 86400 (24 hours) | ROUTING_CACHE_TTL_HOURS |
| Session data | Varies | Configured in application |

Sources: docker-compose.yml:48-63, orchestrator/core/services/plugin_cache.py:38-47

Pub/Sub Channels

Redis pub/sub is used for real-time workflow execution updates:

Channel Naming Convention:

workflow:{workflow_id}:execution:{execution_id}

Message Format:

{
  "type": "execution_started | subtask_execution_update | execution_completed",
  "data": {
    "execution_id": 123,
    "workflow_id": 456,
    ...
  }
}

The RedisClient.publish_workflow_event() method handles message formatting and publishing.
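To make the channel convention and message envelope concrete, here is a small sketch that builds both. The helper names workflow_channel and workflow_message are hypothetical and are not the internals of publish_workflow_event(); only the channel pattern, the three event types, and the two data fields come from the text above.

```python
import json

# Event types listed in the message format above.
EVENT_TYPES = {"execution_started", "subtask_execution_update", "execution_completed"}


def workflow_channel(workflow_id: int, execution_id: int) -> str:
    """Build the channel name workflow:{workflow_id}:execution:{execution_id}."""
    return f"workflow:{workflow_id}:execution:{execution_id}"


def workflow_message(event_type: str, workflow_id: int, execution_id: int, **extra) -> str:
    """Serialize an event into the JSON envelope shown above."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"Unknown event type: {event_type!r}")
    payload = {
        "type": event_type,
        "data": {"execution_id": execution_id, "workflow_id": workflow_id, **extra},
    }
    return json.dumps(payload)
```

With a redis-py client, the pair would be sent as client.publish(workflow_channel(456, 123), workflow_message("execution_started", 456, 123)).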

Sources: orchestrator/core/redis/client.py:91-119

Secrets Management

Credential Encryption

Production credentials are encrypted using AES-256-GCM with workspace-scoped encryption keys.

Encryption Architecture

Encryption Key Setup:

# Generate encryption key (run once)
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Set as environment variable (Railway secret)
ENCRYPTION_KEY=your_generated_key_here
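A quick format check can catch a truncated or placeholder key before deployment. The sketch below assumes the key has the shape the command above produces, namely URL-safe base64 that decodes to 32 bytes; whether orchestrator/core/credentials/service.py enforces exactly this format is an assumption, and is_valid_encryption_key is a hypothetical helper.

```python
import base64
import binascii


def is_valid_encryption_key(key: str) -> bool:
    """Return True if key is URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (binascii.Error, ValueError):
        # Covers bad padding and non-ASCII input.
        return False
    return len(raw) == 32
```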

Security Features:

AES-256-GCM encryption
Workspace-scoped credential isolation
Audit logging for all access
Automatic key rotation support
Expiration date enforcement

Sources: orchestrator/core/credentials/service.py:42-182, orchestrator/core/models/credentials.py:60-103

API Key Rotation

Production API keys should be rotated periodically:

# Generate new API key
openssl rand -hex 32

# Update Railway environment variable
railway variables set API_KEY=new_key_here

# Restart backend service (Railway auto-restarts on env change)

The get_request_context_hybrid authentication middleware supports API key validation:

# Headers for API key authentication
X-API-Key: your_api_key_here
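For illustration, key generation and the header comparison can be sketched in a few lines of standard-library Python. This is not the get_request_context_hybrid implementation; generate_api_key and api_key_matches are hypothetical names, and the constant-time comparison is a general recommendation rather than something the source confirms.

```python
import hmac
import secrets
from typing import Optional


def generate_api_key() -> str:
    """Return 64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def api_key_matches(presented: Optional[str], expected: str) -> bool:
    """Compare the X-API-Key header value to the configured key in constant time."""
    if not presented or not expected:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Using hmac.compare_digest instead of == avoids leaking how many leading characters of a guess were correct through response timing.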

Sources: orchestrator/config.py:66-68

CORS Configuration

Production CORS must explicitly allow frontend domains:

# Single domain
CORS_ALLOW_ORIGINS=https://app.yourdomain.com

# Multiple domains (comma-separated)
CORS_ALLOW_ORIGINS=https://app.yourdomain.com,https://staging.yourdomain.com

# Railway default
CORS_ALLOW_ORIGINS=https://automotas-ai-frontend-production.up.railway.app

The Config class normalizes the CORS origins string:

# Split on commas, strip whitespace, drop empty entries, then re-join into a comma-separated string
_cors_origins = os.getenv("CORS_ALLOW_ORIGINS", "...")
CORS_ALLOW_ORIGINS: str = ",".join([origin.strip() for origin in _cors_origins.split(",") if origin.strip()])
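Code that consumes this setting usually needs a list of origins rather than a string. The following sketch shows one way to split and sanity-check the value; parse_cors_origins is a hypothetical helper, and the scheme and host check is an added assumption that goes beyond what the Config snippet does.

```python
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origins string and reject malformed entries."""
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    for origin in origins:
        parts = urlsplit(origin)
        # An origin is scheme + host (+ optional port), for example https://app.example.com
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Invalid CORS origin: {origin!r}")
    return origins
```

The resulting list is the shape that FastAPI's CORSMiddleware expects for its allow_origins argument.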

Sources: orchestrator/config.py:71-79

Health Checks & Monitoring

Health Endpoints

Both frontend and backend include health check endpoints for monitoring and auto-restart.

Backend Health Check

# Health endpoint
GET /health

# Response
{
  "status": "healthy",
  "timestamp": "2025-01-15T10:30:00Z",
  "database": "connected",
  "redis": "connected"
}

Docker Health Check Configuration:

HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:${PORT:-8000}/health || exit 1
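The same probe can be run from Python, which is handy for smoke tests after a deploy or for images that do not ship curl. This is a sketch under stated assumptions: health_url and is_healthy are hypothetical names, and the behavior only approximates curl -f (any 4xx or 5xx response, timeout, or connection error counts as unhealthy).

```python
import http.client
import os
import urllib.error
import urllib.request
from typing import Mapping


def health_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the health URL with the same ${PORT:-8000} fallback as the Dockerfile."""
    port = env.get("PORT") or "8000"
    return f"http://localhost:{port}/health"


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True only when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # HTTPError (4xx/5xx), refused connections, and timeouts all land here.
        return False
```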

Sources: orchestrator/Dockerfile:106-108

Frontend Health Check

# Root endpoint returns Next.js HTML
GET /

# Docker health check
curl -f http://localhost:3000 || exit 1

Health Check Configuration:

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:3000 || exit 1

Sources: frontend/Dockerfile:112-114

Railway Monitoring

Railway provides built-in monitoring:

Metrics: CPU, memory, network usage
Logs: Aggregated stdout/stderr from all services
Deployments: Automatic rollback on health check failures
Alerts: Configure alerts for downtime or errors

Access Logs:

# View live logs
railway logs

# View logs for specific service
railway logs --service backend

Scaling Considerations

Horizontal Scaling

Backend Worker Processes

Production backend runs multiple Uvicorn workers:

CMD sh -c "uvicorn main:app --host 0.0.0.0 --port ${PORT:-8000} --workers 4"

Worker Count Recommendations:

| Deployment Size | Workers | Memory per Worker | Total Memory |
| --- | --- | --- | --- |
| Small (Railway Hobby) | | 256 MB | 512 MB |
| Medium (Railway Pro) | | 512 MB | 2 GB |
| Large (Railway Enterprise) | | 1 GB | 8 GB |

Formula: workers = (2 * num_cpu_cores) + 1
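The formula is easy to evaluate at startup. The sketch below is illustrative; recommended_workers is a hypothetical helper, and it falls back to os.cpu_count(), which reports the host's cores and may exceed the vCPUs actually allotted to a container.

```python
import os
from typing import Optional


def recommended_workers(cpu_cores: Optional[int] = None) -> int:
    """Apply workers = (2 * num_cpu_cores) + 1."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cores + 1
```

For example, recommended_workers(2) evaluates to 5. Treat the result as a starting point: the memory available per worker can force a lower count than the CPU formula suggests.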

Sources: orchestrator/Dockerfile:115

Database Connection Pooling

Each worker maintains its own connection pool. Ensure PostgreSQL max_connections accommodates all workers:

max_connections = (num_workers * pool_size * num_backend_instances) + overhead

Example for 2 backend instances with 4 workers each:

max_connections = (4 * 20 * 2) + 20 = 180

Set max_connections=200 in PostgreSQL configuration for safety margin.
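The sizing arithmetic can be captured in a small helper so it is re-run whenever the worker or instance count changes. This is a sketch, not project code: required_max_connections is a hypothetical name, and the max_overflow parameter is an addition to the formula above, included because a SQLAlchemy pool can open up to pool_size + max_overflow connections under load.

```python
def required_max_connections(
    num_workers: int,
    pool_size: int,
    num_backend_instances: int,
    overhead: int = 20,
    max_overflow: int = 0,
) -> int:
    """Worst-case PostgreSQL connections needed by all backend workers."""
    # Each worker process owns its own pool, so pools multiply across workers and instances.
    per_worker = pool_size + max_overflow
    return num_workers * per_worker * num_backend_instances + overhead
```

With the numbers from the example, required_max_connections(4, 20, 2) gives 180, which fits under max_connections=200. Allowing an overflow of 10 per pool raises the worst case to 260, which would not fit.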

Sources: docker-compose.yml:29

Redis Connection Management

Redis uses connection pooling to handle concurrent requests:

# Connection pool configuration
self.pool = redis.ConnectionPool(
    host=host,
    port=port,
    password=password,
    db=db,
    decode_responses=True,
    max_connections=50  # Adjust based on worker count
)

For multiple backend instances, increase max_connections:

max_connections = num_workers * num_instances * 10

Sources: orchestrator/core/redis/client.py:22-29

Vertical Scaling

Railway allows easy vertical scaling through the dashboard:

Resource Limits:

| Tier | CPU | Memory | Disk |
| --- | --- | --- | --- |
| Hobby | Shared | 512 MB - 8 GB | 100 GB |
| Pro | Shared | 512 MB - 32 GB | 100 GB |
| Enterprise | Dedicated | Custom | |

Scaling Guidelines:

Monitor resource usage via Railway dashboard
Scale memory first if OOM errors occur
Scale CPU if high latency with low memory usage
Increase workers if CPU utilization < 70% but response time is slow

Environment-Specific Configuration

Production vs Development

The Config class provides environment detection:

@property
def IS_PRODUCTION(self) -> bool:
    return self.ENVIRONMENT.lower() == "production"

@property
def IS_DEVELOPMENT(self) -> bool:
    return self.ENVIRONMENT.lower() == "development"

Key Differences:

| Setting | Development | Production |
| --- | --- | --- |
| ENVIRONMENT | development | production |
| LOG_LEVEL | DEBUG | INFO or WARNING |
| DEBUG | true | false |
| Docker target | development | production |
| Hot reload | Enabled | Disabled |
| Source mounts | Yes | |
| Optimization | None | Multi-stage, cleaned |

Sources: orchestrator/config.py:114-123

Feature Flags

Production deployments can enable/disable features via environment variables:

# Batch API (experimental)
ENABLE_BATCH_API=false

# S3 Vector Storage
S3_VECTORS_ENABLED=false

# Jira Bug Reports (Pilot feature)
JIRA_BUG_REPORTS_ENABLED=true

Feature flags are centralized in the Config class:

ENABLE_BATCH_API: bool = os.getenv("ENABLE_BATCH_API", "false").lower() == "true"
S3_VECTORS_ENABLED: bool = os.getenv("S3_VECTORS_ENABLED", "false").lower() == "true"
JIRA_BUG_REPORTS_ENABLED: bool = os.getenv("JIRA_BUG_REPORTS_ENABLED", "true").lower() == "true"
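One consequence of this parsing style is that only the literal string true (in any letter case) enables a flag; values such as 1 or yes leave it off. The sketch below reproduces the pattern in a reusable form to show that behavior. The helper name env_flag is hypothetical and is not part of the Config class.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool = False, env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean flag the same way as the Config lines above."""
    fallback = "true" if default else "false"
    # Only "true" (case-insensitive) counts as enabled; everything else is False.
    return env.get(name, fallback).lower() == "true"
```

An empty value also disables a flag whose default is true, because the variable is set and its value is not "true".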

Sources: orchestrator/config.py:154-174

Troubleshooting Production Issues

Common Issues

1. Backend Fails to Start

Symptoms:

Health check failures
Crash loop in Railway logs

Diagnosis:

# Check logs
railway logs --service backend

# Common errors:
# - "Database connection failed"
# - "Redis connection failed"
# - "Failed to decrypt credentials"

Solutions:

| Error | Solution |
| --- | --- |
| Database connection failed | Verify DATABASE_URL is set and PostgreSQL plugin is running |
| Redis connection failed | Redis is optional; set REDIS_URL or remove Redis dependencies |
| Failed to decrypt credentials | Set ENCRYPTION_KEY environment variable |
| Import errors | Ensure all dependencies in requirements.txt are installed |

2. Frontend Cannot Connect to Backend

Symptoms:

"Failed to fetch" errors in browser console
CORS errors

Solutions:

Verify NEXT_PUBLIC_API_URL points to backend service URL
Check CORS configuration includes frontend domain
Ensure backend service is healthy (check /health endpoint)

3. Slow Database Queries

Diagnosis:

-- Check active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';

-- Check total connections (compare against max_connections to spot pool exhaustion)
SELECT count(*) FROM pg_stat_activity;

Solutions:

Increase max_connections in PostgreSQL
Increase connection pool size in application
Add database indexes for slow queries
Enable query logging to identify bottlenecks

4. Redis Connection Timeouts

Symptoms:

Workflow updates not streaming
Plugin content cache misses

Solutions:

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Check memory usage
redis-cli -u $REDIS_URL INFO memory

# Clear cache if memory full
redis-cli -u $REDIS_URL FLUSHDB

Sources: orchestrator/core/redis/client.py:121-134, docs/LOCAL_SETUP_GUIDE.md:145-175

Deployment Checklist

Before deploying to production:

Pre-Deployment

Generate and set ENCRYPTION_KEY for credential encryption
Set all required environment variables (see Environment Variables)
Configure CORS with production frontend domains
Set ENVIRONMENT=production
Set LOG_LEVEL=INFO or WARNING
Verify LLM API keys are valid and have sufficient credits
Create S3 bucket for plugin marketplace
Configure Clerk authentication with production domain
Set strong API_KEY for backend API authentication

Post-Deployment

Monitoring

Monitor CPU/memory usage in Railway dashboard
Check application logs for errors
Monitor database connection pool utilization
Track API response times and error rates
Set up alerts for service downtime
Configure log retention and aggregation

Sources: orchestrator/config.py:225-247, orchestrator/.env.example:1-64


Last updated 23 days ago
