PRD-78: Unified Memory & Context Architecture

Status: DRAFT
Author: Gerard Kavanagh + Claude
Date: 2026-03-12
Priority: P0 — Foundation for agent intelligence
Supersedes: PRD-05 (Memory & Knowledge), PRD-39 (Mem0 Migration)
Extends: PRD-77 (Memory Dashboard), PRD-69 (Agent Intelligence)
Touches: PRD-08 (RAG v2), PRD-03 (Context Engineering), PRD-21 (Database Knowledge)


1. Problem Statement

1.1 The Core Issue

Memory in Automatos is fragmented across 6 consumers, 5 Mem0Client instances, 5 user_id formats, and 2 competing architectural approaches (PRD-05 hierarchical vs PRD-39 Mem0 flat). The result:

  • Agents lose all conversation context when a session ends

  • No graduated importance — a passing "I like dark mode" gets the same treatment as a critical business decision

  • Agents don't know what they don't know — no awareness of "I can look this up in documents" vs "I should remember this"

  • Every request hits Mem0 cold — no local cache hierarchy

  • At 10K users, one Mem0 instance doing embedding searches per request will choke

1.2 Current Fragmentation (Evidence)

| Consumer | Mem0Client | user_id Format | Stores? | Retrieves? |
|---|---|---|---|---|
| Chatbot | Own lazy instance | `ws_{id}`, `ws_{id}_agent_{id}`, `ws_{id}_daily` | Yes | Yes |
| Recipes | Own instance in `__init__` | `ws_{id}_recipe_{rid}`, `ws_{id}_recipe_{rid}_agent_{aid}` | Yes | Yes |
| Widget | Own lazy instance | `ws_{id}` only | Yes | Yes |
| Platform tools | New per-call | `ws_{id}`, `ws_{id}_agent_{id}` | Yes | Yes |
| Memory stats | Own lazy getter | All tiers dynamically | No | Yes |
| Heartbeat | Via SmartMemoryManager | Inherits chatbot tiers | Yes (daily) | Yes (daily) |

Circuit breaker state is NOT shared — one consumer can have breaker open while another keeps hammering a dead Mem0.

1.3 What Gets Lost Today

  • Conversation continuity: Chat history dies when session ends. Only extracted facts survive.

  • Recipe learnings: Stored in recipe-scoped Mem0 keys — invisible to chatbot agents.

  • Cross-agent knowledge: Agent A learns something useful for Agent B — no transfer mechanism.

  • Temporal context: "What did we discuss last week?" — no retrieval path exists.

  • Operational context: Heartbeat daily logs stored but rarely injected into chat context.


2. Vision: The Human Brain Analogy

When you hire someone, they:

  1. Focus on what's in front of them (context window)

  2. Scribble notes during the meeting (working memory)

  3. Remember this week's discussions without effort (short-term)

  4. Know your preferences after months of working together (long-term)

  5. Look things up in Confluence/CRM when they need specifics (organizational knowledge)

They don't memorize the entire CRM. They know it EXISTS and when to look. That's the system we need.


3. Architecture: 5-Layer Memory Stack

Layer 0: FOCUS (Context Window)

| Property | Value |
|---|---|
| Storage | In-memory, request-scoped |
| What | Current conversation, tool results, system prompt |
| Capacity | Model context window (128K tokens) |
| TTL | Request lifetime |
| Analogy | Your desk right now |
| New work | None — this already works |

Layer 1: WORKING MEMORY (Session Cache)

| Property | Value |
|---|---|
| Storage | Redis (per-session key) |
| What | Conversation summaries, session decisions, temp notes, tool results |
| Capacity | ~50 items per session |
| TTL | 24 hours after last activity (configurable) |
| Analogy | Your notepad from today's meeting |
| New work | NEW — doesn't exist today |

Key design:

  • Key format: mem:session:{workspace_id}:{conversation_id}

  • Stores conversation summary (rolling, updated every 5 messages)

  • Stores key decisions and action items extracted per exchange

  • On session resume: hydrate L0 from L1 summary instead of replaying full history

  • On session end + 1hr: consolidation job promotes important items to L2

Why Redis:

  • Sub-millisecond reads (vs 200-500ms Mem0)

  • Natural TTL support

  • Already deployed on Railway

  • Handles 100K+ concurrent sessions trivially
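A minimal sketch of this layer as a thin wrapper over a Redis-like client. The key format and 24-hour TTL follow the design above; the class and method names are illustrative assumptions, and any object with redis-py's get/setex semantics will do:

```python
import json
import time

class SessionMemory:
    """L1 working-memory sketch. `client` is anything with redis-py's
    get/setex semantics; class and method names are assumptions."""

    TTL_SECONDS = 24 * 3600  # 24h after last activity (configurable)

    def __init__(self, client):
        self.client = client

    def _key(self, workspace_id: str, conversation_id: str) -> str:
        # Key format from the design above
        return f"mem:session:{workspace_id}:{conversation_id}"

    def store(self, workspace_id, conversation_id, summary, decisions):
        payload = json.dumps({
            "summary": summary,      # rolling conversation summary
            "decisions": decisions,  # key decisions / action items
            "updated_at": time.time(),
        })
        # setex refreshes the 24h TTL on every write
        self.client.setex(self._key(workspace_id, conversation_id),
                          self.TTL_SECONDS, payload)

    def hydrate(self, workspace_id, conversation_id):
        # On session resume: seed L0 from this instead of full history
        raw = self.client.get(self._key(workspace_id, conversation_id))
        return json.loads(raw) if raw else None
```

On resume, `hydrate()` returns the summary used to seed the context window; a miss simply means a fresh session.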

Layer 2: SHORT-TERM MEMORY (Recent Context)

| Property | Value |
|---|---|
| Storage | Postgres (structured rows) + Mem0 (semantic search) |
| What | Last 7-30 days of interactions, recent preferences, project context |
| Capacity | ~1,000 items per user |
| TTL | 7-30 days (Ebbinghaus decay curve) |
| Analogy | What you discussed this week with a colleague |
| New work | Wire existing HierarchicalMemoryManager decay logic |

Key design:

  • Postgres table: memory_short_term (workspace_id, agent_id, content, importance, decay_score, access_count, created_at, last_accessed_at)

  • Mem0 stores the semantic version (for vector search)

  • Consolidation job: hourly, promotes high-importance items to L3

  • Importance scoring: base score + access_frequency_boost + recency_boost + content_richness_boost

  • Decay: retention = exp(-0.1 * hours_elapsed) boosted by importance and access count

  • Items below 0.3 retention score are archived or deleted
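The decay rule above can be written directly. The exp(-0.1 * hours) base and the 0.3 archive threshold come from this design; the exact boost weights are illustrative assumptions:

```python
import math

def retention_score(hours_elapsed: float, importance: float,
                    access_count: int) -> float:
    """Ebbinghaus-style decay: base exp(-0.1 * hours), boosted by
    importance (0..1) and access count. Boost weights are assumptions."""
    base = math.exp(-0.1 * hours_elapsed)
    boost = 1.0 + 0.5 * importance + 0.1 * math.log1p(access_count)
    return min(1.0, base * boost)

def should_archive(score: float) -> bool:
    # Items below 0.3 retention are archived or deleted
    return score < 0.3
```

A frequently accessed, high-importance item stays above the 0.3 threshold well past the point where an untouched one decays out.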

What goes here:

  • All conversation exchanges (stored with infer=False — raw, not extracted)

  • Recipe execution summaries

  • Heartbeat daily logs

  • Tool call results that had useful outcomes

Layer 3: LONG-TERM MEMORY (Learned Knowledge)

| Property | Value |
|---|---|
| Storage | Mem0 with infer=True (fact extraction + deduplication) |
| What | Stable facts, preferences, patterns, relationships |
| Capacity | ~10K items per user |
| TTL | Permanent (refreshed on contradiction) |
| Analogy | What you know about a colleague after months |
| New work | Promotion pipeline from L2, consolidation |

Key design:

  • Mem0 handles deduplication automatically ("user likes coffee" stored twice → merged)

  • LLM fact extraction pulls out key facts, not verbatim storage

  • Custom categories via Mem0: personal, workflow, preference, decision, learning

  • Contradiction handling: latest truth wins (Mem0 native)

  • Consolidation: daily job merges related memories, flags stale ones (PRD-77 Phase 3)
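Promotion from L2 into this layer reduces to a single Mem0 write with inference enabled. A hedged sketch: `mem0_client` stands in for the shared client, and the metadata fields are assumptions, not the shipped schema.

```python
def promote_to_long_term(mem0_client, workspace_id: str, item: dict,
                         category: str = "learning") -> dict:
    """Promote one high-importance L2 item to L3. infer=True asks Mem0
    to extract facts and deduplicate rather than store verbatim."""
    return mem0_client.add(
        messages=[{"role": "user", "content": item["content"]}],
        user_id=f"ws_{workspace_id}",
        infer=True,  # fact extraction + dedup on the Mem0 side
        metadata={"category": category, "source": "l2_promotion"},
    )
```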

What goes here:

  • User facts (name, role, company, timezone)

  • Stable preferences (communication style, tool preferences)

  • Learned patterns (how user likes reports formatted, which Slack channels matter)

  • Business decisions (pricing strategy, target markets)

  • Agent instincts (PRD-69 promoted patterns)

Layer 4: ORGANIZATIONAL KNOWLEDGE (Look-up)

| Property | Value |
|---|---|
| Storage | S3 Vectors (RAG), Postgres (NL2SQL), External APIs |
| What | Company docs, databases, CRM, Jira, Confluence |
| Capacity | Unlimited |
| TTL | Permanent (updated by sync) |
| Analogy | Confluence, the CRM, the company wiki |
| New work | NL2SQL integration, Context Router awareness |

Key design:

  • NOT pre-fetched — agent decides to search via tools

  • BUT: Context Router gives agent awareness that these sources exist

  • Three sub-channels:

    • RAG (documents): search_knowledge tool → S3 Vectors → chunks

    • NL2SQL (live data): query_data tool → SQL generation → Postgres → results

    • APIs (external): Composio tools → Jira, Slack, GitHub, etc.


4. The Context Router (Core Innovation)

The Context Router is not a tool — it's a pre-LLM context assembly layer that decides what context to inject BEFORE the agent sees the prompt.

4.1 How It Works
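A condensed sketch of the router's decision flow. The regex-based temporal check keeps this step in-process and fast, matching the latency budget in section 6.1; all names and signatures are illustrative assumptions:

```python
import re

# Cheap temporal-signal detection: a regex, not an NLP model
TEMPORAL = re.compile(
    r"\b(last week|last month|yesterday|previously|earlier|ago)\b", re.I)

def assemble_context(message: str, session_summary: str,
                     long_term_hits: list) -> list:
    """Pre-LLM assembly: always inject L1 + top L3 memories; fetch L2
    temporal results only when the message shows a temporal signal."""
    parts = []
    if session_summary:                        # L1, highest priority
        parts.append(f"[session] {session_summary}")
    for mem in long_term_hits[:5]:             # L3, top 5 under budget
        parts.append(f"[memory] {mem}")
    if TEMPORAL.search(message):               # L2, fetched on demand
        parts.append("[temporal] <L2 search results injected here>")
    return parts
```

The assembled parts are then trimmed to the token budget in 4.2 before the prompt is built.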

4.2 Context Budget Allocation

Total budget: configurable per model, default 4,000 tokens for context injection.

| Source | Budget | Priority | Pre-fetched? |
|---|---|---|---|
| L1 Session summary | 500 tokens | Highest | Yes (Redis, <5ms) |
| L3 Long-term memories | 800 tokens (top 5) | High | Yes (Mem0, cached in Redis) |
| L2 Temporal results | 600 tokens | Medium | Only if temporal signal detected |
| Daily activity logs | 400 tokens | Low | Only if relevant |
| Knowledge awareness | 200 tokens | Always | Static text injection |
| Reserved for tools | Remainder | N/A | Tool results fill this |
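The allocation above, as a function: the fixed numbers mirror the table, and whatever remains of the total falls through to tool results.

```python
def allocate_budget(total: int = 4000) -> dict:
    """Context budget split per the allocation table (a sketch)."""
    fixed = {
        "l1_session_summary": 500,
        "l3_long_term": 800,
        "l2_temporal": 600,
        "daily_logs": 400,
        "knowledge_awareness": 200,
    }
    fixed["tool_results"] = total - sum(fixed.values())  # remainder
    return fixed
```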

4.3 Knowledge Awareness Injection

Instead of pre-fetching all knowledge, inject a dynamic capability map into the system prompt. For example, if a user has connected their Postgres metrics database:
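A hedged sketch of what that injection could look like; the builder function and prompt wording are illustrative assumptions, not the shipped text:

```python
def knowledge_awareness_block(sources: dict) -> str:
    """Render the capability map injected into the system prompt."""
    lines = ["You have access to these knowledge sources (search on demand):"]
    for tool_name, hint in sources.items():
        lines.append(f"- {tool_name}: {hint}")
    return "\n".join(lines)

# e.g. with a connected Postgres metrics database:
block = knowledge_awareness_block({
    "query_data": "live metrics DB (read-only SQL over permitted tables)",
    "search_knowledge": "company documents via semantic search",
})
```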

This is ~100 tokens but gives the agent the "I know Confluence exists" awareness without pre-fetching.


5. Unified Memory Service (Consolidation)

5.1 Single Entry Point

Replace the 5 scattered Mem0Client instances with ONE service:
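A minimal singleton sketch: the class name comes from this PRD, while the internals (lock, injected clients, shared breaker flag) are assumptions.

```python
import threading

class UnifiedMemoryService:
    """ONE service: one Mem0 client, one Redis client, one shared
    circuit-breaker state for every consumer."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self, mem0_client=None, redis_client=None):
        self.mem0 = mem0_client
        self.redis = redis_client
        self.breaker_open = False  # shared; no per-consumer breakers

    @classmethod
    def get(cls, **kwargs):
        # Create once under a lock; every consumer gets the same object
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(**kwargs)
        return cls._instance
```

Consumers call `UnifiedMemoryService.get()` instead of instantiating `Mem0Client` themselves, so circuit-breaker state is shared by construction.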

5.2 User ID Strategy (Unified)

ONE format, namespace prefixes:

[!WARNING] Mem0 Namespace Isolation: Wrap the Mem0 client so it strictly requires a typed WorkspaceID object, forcing developers to provide the scope at type-check time rather than relying on string concatenation, where a single missing prefix could leak workspace A's long-term memory to workspace B.

All consumers use these formats via UnifiedMemoryService methods — never construct user_ids directly.
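One way to enforce the warning above in code is a frozen dataclass whose methods are the only way to mint user_ids. The formats mirror section 1.2; the method names are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkspaceID:
    """Typed workspace scope. Consumers never concatenate strings;
    they call these methods, so a missing prefix cannot happen."""
    value: str

    def base(self) -> str:
        return f"ws_{self.value}"

    def agent(self, agent_id: str) -> str:
        return f"ws_{self.value}_agent_{agent_id}"

    def recipe(self, recipe_id: str) -> str:
        return f"ws_{self.value}_recipe_{recipe_id}"

    def daily(self) -> str:
        return f"ws_{self.value}_daily"
```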

5.3 Migration from Current State

| Current | New | Migration |
|---|---|---|
| SmartMemoryManager | UnifiedMemoryService | Wrapper, delegates internally |
| RecipeMemoryService | `UnifiedMemoryService.store_short_term()` + tags | Recipe-specific methods become thin wrappers |
| Widget `widget_memory.py` | UnifiedMemoryService with widget scope | Remove standalone API, use shared service |
| Platform executor search | `UnifiedMemoryService.search_long_term()` | Replace inline Mem0Client creation |
| Memory stats browse | `UnifiedMemoryService.get_all()` | Replace lazy getter |
| MemoryInjector (deprecated) | DELETE | Already dead code |


6. Performance at Scale

6.1 Request Path Latency

| Step | 1 user | 10K users | Strategy |
|---|---|---|---|
| L1 Redis session lookup | <5ms | <10ms | Redis scales horizontally, key-partitioned |
| L3 cached in Redis | <5ms | <10ms | Cache L3 results for 5min in Redis |
| L3 Mem0 cold fetch | 200-500ms | 200-500ms | Only on cache miss (~10% of requests) |
| L2 Postgres search | 50ms | 100ms | B-tree index on (workspace_id, created_at) |
| Context Router logic | <10ms | <10ms | In-process, no I/O |
| Total pre-LLM | ~60ms (cached) | ~120ms (cached) | Current: 500-1000ms (always cold) |

6.2 Caching Strategy

Cache invalidation: On store_exchange(), invalidate the cache key for that workspace+agent.
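The invalidation rule is a one-line delete on write. A sketch with assumed key names:

```python
def l3_cache_key(workspace_id: str, agent_id: str) -> str:
    # The 5-minute L3 result cache lives under this key
    return f"mem:l3cache:{workspace_id}:{agent_id}"

def store_exchange(redis_client, workspace_id: str, agent_id: str,
                   exchange: str) -> None:
    """Persist an exchange and drop the L3 cache for that
    workspace+agent so the next read refetches fresh memories."""
    # ... write the exchange to L2 (omitted in this sketch) ...
    redis_client.delete(l3_cache_key(workspace_id, agent_id))
```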

6.3 Storage Projections

Per workspace (active, 1 user):

  • L1: ~10 session keys × 2KB = 20KB Redis

  • L2: ~200 rows/month × 1KB = 200KB Postgres

  • L3: ~50 facts (Mem0 deduplicates) × 500B = 25KB vector store

At 10K workspaces:

  • L1: 200MB Redis (trivial)

  • L2: 2GB Postgres/month (partition by workspace_id, archive after 30 days)

  • L3: 250MB Mem0 (pgvector handles this easily)


7. Background Jobs

7.1 Session Consolidation (Hourly)

7.2 Decay & Promotion (Daily)

7.3 Consolidation (Weekly)

[!IMPORTANT] Scale constraint (10K Users): Scanning a monolithic memory_short_term table for 10K workspaces in a single loop will eventually timeout. Partition the table by workspace_id and dispatch parallel or batched queue tasks (consolidate_workspace(id)) to background workers instead of looping sequentially.

Uses the existing consolidation.py engine (705 lines, already built for PRD-05).
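The batched dispatch from the scale note above can be sketched as follows; `enqueue` stands in for the Celery/ARQ enqueue call and is an assumption:

```python
def dispatch_consolidation(workspace_ids: list, enqueue,
                           batch_size: int = 100) -> int:
    """Split workspaces into batches and enqueue one background task
    per batch, instead of looping 10K workspaces in a single job."""
    batches = [workspace_ids[i:i + batch_size]
               for i in range(0, len(workspace_ids), batch_size)]
    for batch in batches:
        enqueue("consolidate_workspaces", batch)
    return len(batches)
```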


8. NL2SQL Integration (New Capability)

8.1 Why

Agents need to answer questions about users' external, connected databases ("What's our current MRR?" or "How many users signed up last week?") without pre-loading all business data. This operates on user-supplied databases, not the Automatos core system DB.

8.2 Design

When a user connects a database, Automatos syncs and indexes its schema, caching the metadata heavily in Redis; we cannot afford to introspect their schema on every request. The Context Router dynamically injects awareness of these specific external databases (as outlined in 4.3).

Extend PRD-21's safe SQL execution with natural language:

Safety:

  • Read-only (SQLValidator enforces SELECT only)

  • Schema allowlist (only expose permitted tables)

  • Query audit trail (existing database_query_audit table)

  • Per-query timeout (5 seconds)

  • Row limit (1000 rows max)
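A regex-level sketch of those checks. A production validator (the existing SQLValidator) should parse the SQL properly, e.g. with sqlglot; this only illustrates the SELECT-only rule, table allowlist, and row cap:

```python
import re

MAX_ROWS = 1000  # row limit from the safety list above

def validate_sql(sql: str, allowed_tables: set) -> str:
    """Reject non-SELECT statements and unlisted tables; append a row
    cap when none is present. Illustrative only; parse, don't regex,
    in production."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^\s*select\b", stripped):
        raise ValueError("Only SELECT statements are allowed")
    for table in re.findall(r"(?i)\b(?:from|join)\s+([a-zA-Z_][\w.]*)",
                            stripped):
        if table.lower() not in allowed_tables:
            raise ValueError(f"Table not in allowlist: {table}")
    if not re.search(r"(?i)\blimit\s+\d+", stripped):
        stripped += f" LIMIT {MAX_ROWS}"
    return stripped
```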

8.3 Tool Definition
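A hypothetical shape for the query_data tool, written in OpenAI-style function-calling format; the field names and wording are assumptions, not the shipped definition:

```python
# Hypothetical tool schema; an assumption, not the shipped definition
QUERY_DATA_TOOL = {
    "name": "query_data",
    "description": ("Answer questions from the user's connected database. "
                    "Translates a natural-language question into read-only "
                    "SQL, validated and row-limited before execution."),
    "parameters": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The data question in plain English",
            },
            "database_id": {
                "type": "string",
                "description": "Which connected database to query",
            },
        },
        "required": ["question", "database_id"],
    },
}
```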


9. Phased Rollout

Phase 1: Foundation (Week 1-2)

Goal: Single memory service, Redis session layer, fix fragmentation

Outcome: All consumers use ONE service. Session continuity works. 5x faster repeated queries.

Phase 2: Context Router (Week 3)

Goal: Intelligent pre-fetching, knowledge awareness

Outcome: Agents know what they can look up. Context is assembled intelligently.

Phase 3: Layered Storage (Week 4-5)

Goal: Graduated importance, decay, promotion

Outcome: Save everything short-term, promote what matters to long-term.

Phase 4: NL2SQL + Knowledge Graph (Week 6-7)

Goal: Agents can query live data

Outcome: Agents answer data questions from live database.

Phase 5: Scale & Optimize (Week 8+)

Goal: Production-ready for 10K users

Outcome: Proven at scale with measurable quality metrics.


10. What Gets Deleted / Deprecated

| File | Action | Reason |
|---|---|---|
| `modules/memory/operations/injection.py` | DELETE | Dead code, replaced by SmartChatOrchestrator |
| `modules/memory/service.py` (HierarchicalMemoryManager) | ABSORB | Decay/promotion logic moves into UnifiedMemoryService |
| `modules/memory/types/memory_types.py` | ABSORB | MemoryLevel enum → L0-L4 constants |
| `api/widget_memory.py` standalone client | REFACTOR | Use UnifiedMemoryService instead of own Mem0Client |
| SmartMemoryManager class | REFACTOR → thin wrapper | Delegates to UnifiedMemoryService |
| RecipeMemoryService class | REFACTOR → thin wrapper | Delegates to UnifiedMemoryService |
| All inline `Mem0Client()` instantiation | DELETE | Use singleton from UnifiedMemoryService |
No code gets orphaned. Every deletion is replaced by the unified service.


11. Conflicts Resolved

| Conflict | Resolution |
|---|---|
| PRD-05 (hierarchical) vs PRD-39 (Mem0 flat) | Both. L2 uses Postgres (PRD-05 decay logic), L3 uses Mem0 (PRD-39 fact extraction). Not competing — complementary layers. |
| PRD-03 (knapsack) vs PRD-69 (iterative retrieval) | Context Router Phase 2. Single-pass by default, iterative for MULTI_STEP intents only. |
| PRD-08 (cognitive formatting) vs PRD-69 (phase-aware compaction) | Both. PRD-08 formats retrieved chunks. PRD-69 D.2 compacts conversation history. Different concerns. |
| 5 user_id formats | ONE format with namespace prefixes via UnifiedMemoryService. |
| 5 Mem0Client instances | ONE singleton with shared circuit breaker and connection pool. |
| Double memory injection | Eliminated. Context Router is the single injection point. |


12. Success Metrics

| Metric | Current | Target (Phase 1) | Target (Phase 5) |
|---|---|---|---|
| Context assembly latency (p50) | 500ms | 60ms | 30ms |
| Context assembly latency (p95) | 1000ms | 200ms | 100ms |
| Session continuity | 0% (lost on close) | 100% (24hr) | 100% (configurable) |
| Memory retrieval relevance | Unknown | Baseline measured | >0.7 cosine similarity |
| Mem0 requests per chat message | 2-3 (cold) | 0.2 (cached) | 0.1 |
| Concurrent users supported | ~100 | ~1,000 | ~10,000 |
| Memory consumers using shared service | 0/6 | 6/6 | 6/6 |
| Cross-session context accuracy | 0% | 70% | 90% |


13. Dependencies

| Dependency | Status | Blocker? |
|---|---|---|
| Redis on Railway | Deployed | No |
| Mem0 on Railway | Deployed | No |
| Postgres on Railway | Deployed | No |
| PRD-21 (Database Knowledge) | MVP built | No (Phase 4 only) |
| PRD-08 (RAG v2) | Complete | No |
| PRD-77 Phase 4 (memory bugs) | Partial | Yes — fix before Phase 1 |
| PRD-69 (instincts) | Design only | No (Phase 3+ integration) |
| Consolidation engine | Built (705 lines) | No — ready to wire |
| HierarchicalMemoryManager decay | Built (401 lines) | No — ready to absorb |


14. Risk Register

| Risk | Impact | Mitigation |
|---|---|---|
| Redis becomes SPOF for all memory | High | Graceful degradation: if Redis is down, skip L1/cache, hit Mem0 directly (current behavior) |
| Mem0 fact extraction quality varies | Medium | Store raw in L2 always; L3 extraction is a bonus, not the sole source |
| NL2SQL generates unsafe queries | High | SQLValidator + schema allowlist + audit trail + timeouts |
| Singleton DB session leak | Critical | Ensure UnifiedMemoryService acquires DB sessions per-request from an async pool, never holding a single session globally |
| Synchronous L2/L3 writes | High | Push L2/L3 storage operations to a background queue (Celery/ARQ) so TTFT (Time To First Token) doesn't suffer during the main chat request cycle |
| Temporal detection latency | Medium | Keep Context Router temporal checks fast and regex-driven to stay under the 10ms budget; avoid heavy NLP models here |
| Migration breaks existing memory | High | Phase 1 is additive — UnifiedMemoryService wraps existing code first, replaces later |
| Over-caching stale context | Medium | Cache invalidation on write + short TTLs (5min L3 cache) |
| Consolidation job runs too long at scale | Medium | Partition by workspace, process in batches, configurable concurrency |


15. Open Questions

  1. Should L2 short-term also use Mem0? Or is Postgres + time-based queries sufficient without vector search?

  2. Memory export/import — should users be able to download their agent's memories? GDPR compliance?

  3. Cross-workspace memory — should an enterprise org share certain memories across workspaces?

  4. Memory quotas — at what point do we limit storage per workspace/plan tier?

  5. Agent-to-agent memory transfer — when Agent B is created from Agent A's template, copy memories?
