PRD-73: Agent Monitoring Integration Guide
What's Running
Service | Internal Address | Purpose
1. How Logs Flow (Agent-Readable)
Any Service (Python logger)
↓ POST /push (JSON)
log-relay.railway.internal:8080
↓ batches + transforms
loki.railway.internal:3100
↓ queryable via
Grafana Logs Explorer dashboard
Sending Logs from Any Service
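The flow above (Python logger, then POST /push as JSON to log-relay.railway.internal:8080) could be wired up roughly as follows. This is a minimal sketch, not the relay's documented client: the payload field names (`service`, `level`, `logger`, `message`, `timestamp`), the `http://` scheme, and the `LogRelayHandler` class name are all assumptions made for illustration. Check them against the Log-Relay Push Format before relying on them.

```python
import json
import logging
import urllib.request
from datetime import datetime, timezone

# Host, port and path come from the flow above; the http:// scheme is assumed.
LOG_RELAY_URL = "http://log-relay.railway.internal:8080/push"


def build_push_payload(record: logging.LogRecord, service: str) -> dict:
    """Turn a LogRecord into a JSON-serialisable dict.

    The field names are hypothetical, not the relay's confirmed schema.
    """
    created = datetime.fromtimestamp(record.created, tz=timezone.utc)
    return {
        "service": service,
        "level": record.levelname,
        "logger": record.name,
        "message": record.getMessage(),
        "timestamp": created.isoformat(),
    }


class LogRelayHandler(logging.Handler):
    """Hypothetical handler that POSTs each record to the log relay."""

    def __init__(self, service: str, url: str = LOG_RELAY_URL, timeout: float = 2.0):
        super().__init__()
        self.service = service
        self.url = url
        self.timeout = timeout

    def emit(self, record: logging.LogRecord) -> None:
        try:
            body = json.dumps(build_push_payload(record, self.service)).encode("utf-8")
            request = urllib.request.Request(
                self.url,
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            with urllib.request.urlopen(request, timeout=self.timeout):
                pass  # response body is ignored in this sketch
        except Exception:
            # Never let a logging failure crash the calling service.
            self.handleError(record)
```

A service would attach it once at startup, for example `logging.getLogger().addHandler(LogRelayHandler(service="my-service"))`. The handler sends one request per record and blocks the caller for up to `timeout` seconds; since the relay already batches downstream, that keeps the client simple, but a queue-backed handler may suit high-volume services better.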
Log-Relay Push Format
Querying Logs (for Agents)
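Because the relay forwards to Loki at loki.railway.internal:3100, an agent can in principle read logs back through Loki's standard HTTP API (`/loki/api/v1/query_range`). The sketch below builds such a request and flattens the response. The `service` label used in the usage example and the `http://` scheme are assumptions; the labels actually attached by the relay may differ.

```python
import json
import urllib.parse
import urllib.request
import time

# Host and port come from the log flow; the http:// scheme is assumed.
LOKI_URL = "http://loki.railway.internal:3100"


def build_loki_query_url(logql, minutes=15, limit=100, base_url=LOKI_URL, now=None):
    """Build a query_range URL covering the last `minutes` minutes."""
    end_s = int(time.time() if now is None else now)
    start_s = end_s - minutes * 60
    params = {
        "query": logql,
        # Loki accepts start/end as Unix timestamps in nanoseconds.
        "start": str(start_s * 10**9),
        "end": str(end_s * 10**9),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url}/loki/api/v1/query_range?{urllib.parse.urlencode(params)}"


def extract_lines(response: dict) -> list:
    """Flatten a 'streams' result into (timestamp_ns, line) tuples."""
    lines = []
    for stream in response.get("data", {}).get("result", []):
        for ts_ns, line in stream.get("values", []):
            lines.append((int(ts_ns), line))
    # Sort newest first across all streams.
    return sorted(lines, reverse=True)


def query_logs(logql, minutes=15, limit=100):
    """Run the query against Loki and return (timestamp_ns, line) tuples."""
    url = build_loki_query_url(logql, minutes=minutes, limit=limit)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return extract_lines(json.load(resp))
```

For example, `query_logs('{service="automatos-ai-api"} |= "error"', minutes=30)` would return recent lines containing "error", assuming the relay sets a `service` label with that value.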
2. How Metrics Flow
Prometheus Scrape Targets
Target | Address | Metrics Available
Custom Application Metrics (in code)
Querying Metrics (for Agents)
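An agent can read scraped metrics through Prometheus's standard instant-query endpoint (`/api/v1/query`). The sketch below is illustrative only: no Prometheus address is given here, so `base_url` is a required argument and the address in the usage note is a placeholder. The only metric referenced is `up`, which Prometheus records for every scrape target (1 when the last scrape succeeded, 0 otherwise).

```python
import json
import urllib.parse
import urllib.request


def build_prometheus_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(response: dict) -> list:
    """Turn an instant-vector response into (labels, value) pairs."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    samples = []
    for item in response["data"]["result"]:
        _timestamp, raw_value = item["value"]  # value arrives as a string
        samples.append((item["metric"], float(raw_value)))
    return samples


def down_targets(base_url: str) -> list:
    """Return the label sets of scrape targets whose last scrape failed."""
    url = build_prometheus_query_url(base_url, "up == 0")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return [labels for labels, _value in parse_instant_vector(json.load(resp))]
```

Usage would look like `down_targets("http://<prometheus-internal-address>")`, with the placeholder replaced by the address of the Prometheus service in this deployment.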
3. How Alerts Flow
Active Alert Rules
Infrastructure (fire immediately)
Alert | Severity | Condition | Meaning
PostgreSQL
Alert | Severity | Condition
Redis
Alert | Severity | Condition
Application (requires /metrics — now deployed)
Alert | Severity | Condition
Querying Alerts (for Agents)
4. SENTINEL Pattern — Agent Investigation (Read-Only v1)
Investigation Playbook per Alert
Alert | Agent Action | Tools to Use
Agent Response Format
Triggering Investigation
5. Grafana Dashboards
Available Dashboards
Dashboard | Status | What It Shows
Embedding Dashboards
6. Wiring Checklist for New Services
1. Logging
2. Metrics (if Python/FastAPI)
3. Add to Prometheus Scrape Config
4. Add Alert Rules (if needed)
5. Add Grafana Dashboard (if needed)
7. Environment Variables Reference
On Backend (automatos-ai-api)
Variable | Value | Purpose
On Agent Worker
Variable | Value | Purpose
On Monitoring Services
Service | Key Variables
8. Repository Structure (automatos-monitoring)
TL;DR for Auto
Last updated

