Database Setup
Purpose and Scope
This document covers the PostgreSQL database configuration, initialization, and management for Automatos AI. It details the Docker-based deployment, pgvector extension setup, schema initialization, connection pooling, and multi-tenancy architecture.
For application-level data models and ORM patterns, see Database Models. For Redis configuration, see Redis Configuration. For complete environment variable reference, see Environment Variables.
PostgreSQL with pgvector
Automatos AI uses PostgreSQL 16 with the pgvector extension for vector similarity search in the RAG system. The pgvector/pgvector:pg16 Docker image provides both the database engine and native vector operations without requiring separate installation steps.
Key Features
| Feature | Description | Detail |
|---|---|---|
| pgvector Extension | Vector embeddings for RAG | Automatic initialization via init script |
| Connection Pooling | Max 200 concurrent connections | `max_connections=200` in `POSTGRES_INITDB_ARGS` |
| Shared Buffers | Memory optimization | `shared_buffers=256MB` for query performance |
| SQLAlchemy ORM | Python database abstraction | `sqlalchemy==2.0.23` with async support |
| Alembic Migrations | Schema versioning | `alembic==1.12.1` for incremental updates |
Sources: docker-compose.yml:22-43, orchestrator/requirements.txt:6-13
Docker Container Configuration
Container Initialization Flow
Sources: docker-compose.yml:22-43
Docker Compose Service Definition
The PostgreSQL service is defined with strict health checks to ensure dependent services (backend, workspace-worker) only start when the database is fully ready:
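A sketch of the service definition, consistent with the settings documented on this page (the actual `docker-compose.yml` may differ line for line):

```yaml
postgres:
  image: pgvector/pgvector:pg16
  environment:
    POSTGRES_DB: ${POSTGRES_DB:-orchestrator_db}
    POSTGRES_USER: ${POSTGRES_USER:-postgres}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set}
    POSTGRES_INITDB_ARGS: "-c max_connections=200 -c shared_buffers=256MB"
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./init_complete_schema.sql:/docker-entrypoint-initdb.d/init_complete_schema.sql:ro
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DB:-orchestrator_db}"]
    interval: 10s
    timeout: 5s
    retries: 5
```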
The `${POSTGRES_PASSWORD:?}` syntax enforces that the variable be set in the .env file, preventing accidental deployment with default credentials.
Sources: docker-compose.yml:22-43
Environment Variables
Required Configuration
| Variable | Default | Description |
|---|---|---|
| `POSTGRES_DB` | `orchestrator_db` | Database name |
| `POSTGRES_USER` | `postgres` | Superuser account |
| `POSTGRES_PASSWORD` | (required) | Master password; no default |
| `POSTGRES_HOST` | `postgres` | Hostname (container name on the Docker network) |
| `POSTGRES_PORT` | `5432` | Standard PostgreSQL port |
| `DATABASE_URL` | (computed) | Full connection string for SQLAlchemy |
Connection String Construction
The backend service constructs the DATABASE_URL from individual components:
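In Python terms, the assembly looks roughly like this (variable names match the table above; the function name and defaults are illustrative, and the backend's exact code may differ):

```python
import os
from urllib.parse import quote_plus


def build_database_url() -> str:
    """Assemble DATABASE_URL from the individual POSTGRES_* variables."""
    user = os.getenv("POSTGRES_USER", "postgres")
    # Percent-encode the password so special characters survive in the URL
    password = quote_plus(os.getenv("POSTGRES_PASSWORD", ""))
    host = os.getenv("POSTGRES_HOST", "postgres")
    port = os.getenv("POSTGRES_PORT", "5432")
    db = os.getenv("POSTGRES_DB", "orchestrator_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```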
This format is compatible with SQLAlchemy's create_engine() and Alembic's configuration.
Sources: docker-compose.yml:91-97
Schema Initialization
Automatic Schema Creation
The database schema is initialized once, on the container's first start, via Docker's docker-entrypoint-initdb.d mechanism. The init_complete_schema.sql script is mounted as a read-only volume and executed before PostgreSQL begins accepting connections.
Sources: docker-compose.yml:35
Key Schema Components
The complete schema includes 40+ tables organized by functional domain:
| Domain | Key Tables | Purpose |
|---|---|---|
| Identity | `workspaces`, `users`, `team_members` | Multi-tenancy and authentication |
| Agents | `agents`, `agent_categories`, `personas` | Agent definitions and configuration |
| Tools | `composio_apps`, `composio_actions`, `agent_app_assignments` | Tool metadata and permissions |
| RAG | `documents`, `chunks`, `knowledge_graphs`, `entities` | Document storage and retrieval |
| Workflows | `workflows`, `workflow_recipes`, `recipe_executions` | Multi-step automation |
| Memory | `conversation_contexts`, `memory_entries` | Chat history and agent memory |
| System | `credentials`, `settings`, `audit_logs` | Configuration and compliance |
Every major table includes:
- `workspace_id`: foreign key for tenant isolation (indexed)
- `created_at` / `updated_at`: timestamp tracking with auto-update triggers
- Soft-delete support: `is_deleted` + `deleted_at` columns where appropriate
Sources: docker-compose.yml:35
Connection Pooling and Session Management
SQLAlchemy Engine Configuration
The backend uses SQLAlchemy 2.0 with a connection pool configured for high concurrency:
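A minimal sketch of such an engine factory; the pool numbers here are illustrative, and the real configuration may differ:

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine
from sqlalchemy.pool import QueuePool


def make_engine(url: str) -> Engine:
    """Create a pooled SQLAlchemy engine (illustrative pool settings)."""
    return create_engine(
        url,
        poolclass=QueuePool,
        pool_size=20,        # persistent connections held open
        max_overflow=10,     # extra connections allowed under burst load
        pool_pre_ping=True,  # validate a connection before each checkout
        pool_recycle=1800,   # recycle connections older than 30 minutes
    )

# Usage: engine = make_engine(os.environ["DATABASE_URL"])
```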
Session Lifecycle
Each API request follows this pattern:
1. Request arrives → router invokes the endpoint
2. Dependency injection → FastAPI calls `get_db()`
3. Session creation → new SQLAlchemy `Session` from the engine
4. Query execution → ORM operations within a transaction
5. Auto-commit/rollback → `yield` returns control; the exception handler commits or rolls back
6. Session close → connection returned to the pool
This ensures:
Automatic transaction management (commit on success, rollback on exception)
Connection reuse (pooling prevents connection exhaustion)
Thread safety (each request gets isolated session)
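The lifecycle above can be sketched as a FastAPI dependency; this is a hypothetical wiring (the real module and engine names may differ), with a SQLite stand-in engine for illustration:

```python
from collections.abc import Generator

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

# Stand-in engine; the real app binds SessionLocal to the Postgres engine.
engine = create_engine("sqlite://")
SessionLocal = sessionmaker(bind=engine, autoflush=False)


def get_db() -> Generator[Session, None, None]:
    """FastAPI dependency: one session per request, closed afterwards."""
    db = SessionLocal()
    try:
        yield db
        db.commit()    # commit if the endpoint body succeeded
    except Exception:
        db.rollback()  # roll back on any error raised inside the endpoint
        raise
    finally:
        db.close()     # hand the connection back to the pool
```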
Sources: orchestrator/requirements.txt:6-13
Database Dependencies
Python Package Stack
Sources: orchestrator/requirements.txt:6-13
System-Level Dependencies
The Docker image includes PostgreSQL client tools for debugging and migrations:
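An illustrative Dockerfile excerpt (package names assumed; the actual layer layout may differ):

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
        postgresql-client \
    && rm -rf /var/lib/apt/lists/*
```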
Sources: orchestrator/Dockerfile:18-33
Migrations with Alembic
Migration Directory Structure
Migration Commands
For development environments, use Alembic to evolve the schema:
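A typical Alembic workflow looks like the following (run from the directory containing `alembic.ini`; paths assumed):

```shell
alembic revision --autogenerate -m "add new column"   # generate a migration from model changes
alembic upgrade head                                  # apply all pending migrations
alembic downgrade -1                                  # roll back the most recent migration
alembic history --verbose                             # inspect the migration history
```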
Production Deployment Strategy
In production (Railway, AWS), the database is pre-initialized with the complete schema via the init script. Alembic is used only for incremental updates after initial deployment:
1. First deploy: Docker runs `init_complete_schema.sql` → full schema created
2. Schema changes: developer creates an Alembic migration
3. Deploy update: CI/CD runs `alembic upgrade head` before starting the backend
4. Rollback: use `alembic downgrade` if issues are detected
Sources: orchestrator/requirements.txt:8, docker-compose.yml:35
Multi-Tenancy Data Isolation
Database-Level Isolation
Every workspace-scoped table includes a workspace_id column with an index and foreign key constraint:
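Illustrative DDL for one such table (column types and constraint names are assumptions, not the actual schema):

```sql
CREATE TABLE agents (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    workspace_id UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
    name         TEXT NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_agents_workspace_id ON agents (workspace_id);
```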
Query Filtering Pattern
All ORM queries automatically filter by workspace context:
Sources: docker-compose.yml:22-43
Marketplace vs Workspace Items
The owner_type field distinguishes shared marketplace items from private workspace items:
| owner_type | Visibility | Query Filter |
|---|---|---|
| `marketplace` | Global (all workspaces) | No `workspace_id` filter |
| `workspace` | Private (single workspace) | Filter by `workspace_id` |
When a user installs a marketplace item, a copy is created with:
- `owner_type = 'workspace'`
- `workspace_id = <user's workspace>`
- `cloned_from_id = <marketplace item ID>`
This enables centralized updates while maintaining tenant isolation.
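The install step can be sketched as a simple copy-and-rewrite over the item's fields (field names follow the list above; the helper itself is hypothetical):

```python
from copy import deepcopy
from typing import Any


def install_marketplace_item(item: dict[str, Any], workspace_id: str) -> dict[str, Any]:
    """Clone a marketplace item into a workspace, preserving provenance."""
    clone = deepcopy(item)
    clone["owner_type"] = "workspace"        # now a private workspace item
    clone["workspace_id"] = workspace_id     # owned by the installing workspace
    clone["cloned_from_id"] = item["id"]     # provenance link for updates
    clone.pop("id", None)                    # the database assigns a new primary key
    return clone
```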
Sources: docker-compose.yml:22-43
Performance Optimization
Indexing Strategy
Critical indexes for query performance:
| Table | Index | Purpose |
|---|---|---|
| `agents` | `(workspace_id, created_at DESC)` | List agents by workspace (common query) |
| `documents` | `(workspace_id, created_at DESC)` | Document timeline view |
| `chunks` | `(document_id, chunk_index)` | Ordered chunk retrieval for RAG |
| `composio_actions` | `(app_name, name)` | Tool resolution in ToolRouter |
| `workflow_recipes` | `(workspace_id, is_active)` | Active recipe lookup |
| `conversation_contexts` | `(workspace_id, agent_id, created_at DESC)` | Chat history pagination |
Vector Search Indexes
For pgvector columns, HNSW indexes provide approximate nearest neighbor search:
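An example HNSW index definition (the table, column, and operator class here are assumptions; the parameters match the values described below):

```sql
CREATE INDEX idx_chunks_embedding ON chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Per-session recall/speed tradeoff at query time:
SET hnsw.ef_search = 100;
```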
- `m=16`: number of bidirectional links per layer (higher = better recall, more memory)
- `ef_construction=64`: candidate pool size during index build (higher = better quality, slower build)
At query time, ef_search controls the tradeoff between speed and recall (set dynamically per query).
Sources: orchestrator/requirements.txt:11
Connection Pool Tuning
Default pool configuration:
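A sketch of plausible defaults consistent with the capacity described below (the real values live in the engine configuration and may differ):

```python
# Illustrative pool settings.
POOL_SETTINGS = {
    "pool_size": 20,     # persistent connections held open
    "max_overflow": 10,  # extra connections allowed under burst load
    "pool_timeout": 30,  # seconds a request waits for a free connection
}

# Hard ceiling on simultaneously checked-out connections:
max_concurrent = POOL_SETTINGS["pool_size"] + POOL_SETTINGS["max_overflow"]
```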
This supports up to 30 concurrently checked-out connections (pool_size plus max_overflow) under normal load, with graceful degradation (connection queueing) beyond that.
Sources: docker-compose.yml:30
Health Checks and Monitoring
Container Health Check
The PostgreSQL container defines a health check based on pg_isready:
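A health-check sketch matching the behavior described below (the exact command line in docker-compose.yml may differ):

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DB:-orchestrator_db}"]
  interval: 10s
  timeout: 5s
  retries: 5
```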
This command:
Succeeds (exit 0) if PostgreSQL is accepting connections
Fails (exit 1) if the server is unreachable or initializing
Retries 5 times with 10-second intervals before marking unhealthy
Backend Database Connection Test
The FastAPI backend includes a /health endpoint that verifies database connectivity:
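The body of such an endpoint might look like the following sketch; the framework wiring is omitted, and the function name and response shape are assumptions:

```python
from sqlalchemy import text
from sqlalchemy.engine import Engine


def health_check(engine: Engine) -> dict[str, str]:
    """A trivial query proves the database is reachable and responsive."""
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        return {"status": "healthy", "database": "connected"}
    except Exception as exc:
        return {"status": "unhealthy", "database": str(exc)}
```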
This is called by Docker health checks and load balancers to determine service readiness.
Sources: docker-compose.yml:36-41
Volume Persistence
The postgres_data named volume ensures data survives container restarts:
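A minimal sketch of the relevant compose declarations:

```yaml
services:
  postgres:
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
```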
- Location: `/var/lib/docker/volumes/automatos_postgres_data/_data`
- Lifecycle: persists across `docker-compose down`; destroyed only with `docker-compose down -v`
- Backup: `docker run --rm -v automatos_postgres_data:/data -v $(pwd):/backup ubuntu tar czf /backup/postgres.tar.gz -C /data .`
Sources: docker-compose.yml:259-261
Database Access Patterns
Direct SQL Execution
For debugging or manual operations, connect to PostgreSQL directly:
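For example (service and database names follow the configuration described above):

```shell
# Open an interactive psql shell inside the running container
docker-compose exec postgres psql -U postgres -d orchestrator_db

# Or run a one-off query from the host
docker-compose exec postgres psql -U postgres -d orchestrator_db \
  -c "SELECT count(*) FROM agents;"
```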
Adminer Database UI
The adminer service (profile: all) provides a web-based database administration interface:
| Setting | Value |
|---|---|
| URL | http://localhost:8080 |
| Server | `postgres` (Docker network hostname) |
| Login | Uses `POSTGRES_USER` and `POSTGRES_PASSWORD` from `.env` |
| Capabilities | Query editor, schema browser, table editor, import/export |
Sources: docker-compose.yml:223-236
Production Deployment Considerations
Railway Managed PostgreSQL
On Railway, use the managed PostgreSQL addon instead of self-hosted container:
1. Add PostgreSQL addon → Railway automatically injects `DATABASE_URL`
2. Remove the postgres service from `docker-compose.yml` (not needed)
3. Run migrations via the Railway build command: `alembic upgrade head`
4. Enable the pgvector extension manually:
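Enabling pgvector on a managed instance is a single statement, run once per database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```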
Connection Limit Scaling
For high-traffic production:
- Managed PostgreSQL: increase the connection limit to 500+ via the provider dashboard
- Self-hosted: tune `max_connections` and `shared_buffers` in postgresql.conf
- Connection pooling: use PgBouncer for connection multiplexing (100+ backend connections → 1000+ client connections)
Backup Strategy
| Frequency | Method | Retention |
|---|---|---|
| Hourly | WAL archiving to S3 | 7 days |
| Daily | Full dump via `pg_dump` | 30 days |
| Weekly | Snapshot of volume/RDS instance | 90 days |
Restore procedure:
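A restore sketch from a pg_dump SQL archive; the backup file name is an assumption, and managed providers have their own restore tooling:

```shell
docker-compose stop backend workspace-worker        # quiesce writers first
docker-compose exec -T postgres psql -U postgres \
  -c "DROP DATABASE IF EXISTS orchestrator_db;" \
  -c "CREATE DATABASE orchestrator_db;"
docker-compose exec -T postgres psql -U postgres -d orchestrator_db \
  < backups/orchestrator_db.sql
docker-compose start backend workspace-worker
```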
Sources: docker-compose.yml:22-43, orchestrator/Dockerfile:23
Troubleshooting
Common Issues
| Error | Cause | Fix |
|---|---|---|
| `FATAL: password authentication failed` | Wrong `POSTGRES_PASSWORD` in `.env` | Update `.env` and restart: `docker-compose restart postgres` |
| `FATAL: database does not exist` | `POSTGRES_DB` mismatch | Check `DATABASE_URL` matches `POSTGRES_DB` |
| `connection refused` | Container not healthy | Check logs: `docker-compose logs postgres` |
| `too many connections` | Exceeds `max_connections=200` | Increase pool limit or reduce concurrent requests |
| `extension "vector" does not exist` | pgvector not installed | Verify the image is `pgvector/pgvector:pg16` |
Diagnostic Commands
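A few commands that cover the common checks (service name follows the compose file described above):

```shell
# Container status and recent logs
docker-compose ps postgres
docker-compose logs --tail=100 postgres

# Liveness and connection-count checks from inside the container
docker-compose exec postgres pg_isready -U postgres
docker-compose exec postgres psql -U postgres -d orchestrator_db \
  -c "SELECT count(*) FROM pg_stat_activity;"
```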
Reset Database
To completely reset the database (⚠️ destroys all data):
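Because the schema lives in the init script, a full reset is two commands:

```shell
docker-compose down -v           # ⚠️ removes the postgres_data volume and all data
docker-compose up -d postgres    # first start re-runs init_complete_schema.sql
```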
Sources: docker-compose.yml:22-43, orchestrator/Dockerfile:18-33