PRD-108 — Memory Field Prototype

Version: 1.0
Type: Research + Design
Status: Complete — Ready for Peer Review
Priority: P1
Dependencies: PRD-107 (Context Interface Abstraction), PRD-100 (Master Research)
Author: Gerard Kavanagh + Claude
Date: 2026-03-15


1. Problem Statement

1.1 The Hypothesis

PRD-100 Risk #6: "Context Engineering theory doesn't translate to code — PRD-108 is the prototype gate. If field prototype doesn't outperform message passing, reassess Phase 3."

Hypothesis: Agents sharing a continuous semantic field — where information resonates, decays, and forms attractors — produce higher-quality collaborative output than agents passing discrete messages.

1.2 The Telephone Game Problem

Message-Passing (Today):
Agent A → "Here are my research findings: ..." → Agent B
Agent B → "Based on your findings, I conclude..." → Agent C
Agent C sees A's work through B's interpretation.
If A's finding #7 is relevant to C but B didn't mention it → lost.

Shared Field (Proposed):
┌───────────── SHARED FIELD ─────────────┐
│  Agent A injects 20 findings           │
│  Agent B injects 15 analyses           │
│  Finding #7 resonates with Analysis #3 │
│    → both amplified automatically      │
│  Finding #12 unreferenced              │
│    → decays naturally over time        │
│  Agent C queries field                 │
│    → sees amplified #7+#3 first        │
│    → #12 still retrievable but faded   │
└────────────────────────────────────────┘

No telephone game. Agent C accesses the full field, with relevance surfaced by resonance rather than filtered by intermediate agents.

1.3 What This PRD Delivers

A controlled experiment: same task, same agents, same models — one run with message-passing (PRD-107 RedisSharedContext), one with a shared vector field (VectorFieldSharedContext). Measured comparison on context quality, task accuracy, token efficiency, and latency. Results determine whether Phase 3 (PRDs 110-116) proceeds.


2. Prior Art Analysis

2.1 Vector Store Backend Evaluation

| Backend | Key Strength | Key Weakness | Verdict |
| --- | --- | --- | --- |
| Qdrant | Native payload filtering (datetime, numeric, keyword); Recommendations API = resonance discovery; Docker single-command deploy; :memory: mode for tests | No built-in TTL on points; docs sparse for recommend API | PRIMARY — deploy as Railway Docker service |
| FAISS | Fastest CPU search at small scale; trivial persistence (write_index/read_index); zero infrastructure | No metadata filtering (external table needed); no thread-safe writes; no payload storage | BENCHMARK ONLY — in-process comparison baseline |
| Redis Vector Search | Would have been zero-infra (Redis already deployed); native TTL = automatic decay | NOT AVAILABLE — Railway deploys vanilla Redis, not Redis Stack. No RediSearch, no RedisJSON modules | ELIMINATED |
| Pinecone | Managed service, serverless | External dependency; network latency; cost per query | REJECTED — adds dependency for a self-contained prototype |
| S3 Vectors (existing) | Already configured at automatos-vector-index, 2048-dim cosine | Document-oriented (designed for RAG, not live fields); no TTL; no real-time metadata queries | WRONG ABSTRACTION — keep for documents, not fields |

2.2 Temporal Decay Research

| Approach | Source | What We Adopt | What We Reject |
| --- | --- | --- | --- |
| Exponential decay S(t) = S₀ × e^(-λt) | Ebbinghaus (1885), standard IR | Core decay formula with λ=0.1 (7h half-life). Already implemented at memory_types.py:65 | — (adopted as-is) |
| Elasticsearch decay functions | ES function_score | Score-time application (no deletion) — patterns persist, decay computed at query time | Three decay profiles (linear, exp, gauss) — exponential is sufficient for prototype |
| LRU/LFU/ARC cache eviction | Standard CS literature | Access count as reinforcement signal — frequently accessed patterns resist decay | ARC's adaptive tuning — over-complex for prototype |
| Hebbian reinforcement | Hebb (1949) | "Neurons that fire together wire together" — co-accessed patterns boost each other. +5% per access, cap at 2× | Continuous Hebbian learning rates — fixed increment is simpler |
| Spaced repetition | Kornell & Bjork (2008) | Re-access resets decay clock (uses last_accessed, not created_at) | SRS scheduling algorithm — agents access on-demand, not on schedule |

2.3 Context Engineering Theory (Chapters 08-11)

| Concept | What We Adopt for Prototype | What's Deferred to Phase 3 |
| --- | --- | --- |
| 8 core operations (Ch. 08) | 5 of 8: inject, query (≈resonate), decay, reinforce (≈amplify), measure_stability | 3 deferred: attenuate, tune, collapse |
| Boundary permeability (Ch. 08) | effective_strength = strength × boundary_permeability — configurable per field | Dynamic boundary adjustment — fixed permeability per field in prototype |
| Resonance formula (Ch. 09) | R(A,B) = cos(θ)² × \|A\| × \|B\| | |
| Attractor protection (Ch. 09) | effective_decay = decay_rate × (1 - attractor_protection) where protection = Σ(resonance × 0.5), cap 0.5 | Full attractor dynamics (formation, classification, basin mapping) |
| Multi-field operations (Ch. 10) | Not in prototype — single field per mission | Superposition, interference, coupling — Phase 3 (PRD-110) |
| Attractor detection (Ch. 11) | Simple stability metric: avg_strength × 0.6 + organization × 0.4 | Gradient convergence, bifurcation detection — Phase 3 (PRD-112) |

2.4 Existing Infrastructure Reuse

| Component | Location | How We Reuse It |
| --- | --- | --- |
| EmbeddingManager | core/llm/embedding_manager.py | generate_embeddings_batch() with qwen3-embedding-8b (2048-dim) via OpenRouter. Zero new embedding infrastructure |
| SharedContextManager | inter_agent.py:400-649 | The Phase 2 baseline. Wrapped by RedisSharedContext (PRD-107) as the control condition |
| MEMORY_DECAY_RATE | config.py | λ=0.1 — same decay rate for field patterns. Consistent with L2 memory behavior |
| MEMORY_DECAY_ARCHIVE_THRESHOLD | config.py | 0.3 — filter threshold for decayed patterns. We use 0.05 (stricter) for field queries |
| MemoryNamespace pattern | unified_memory_service.py:39-117 | Extend: mem:{workspace_id}:field:{field_id} for field-scoped Qdrant collections |
| ContextProvider / SharedContextPort | PRD-107 core/ports/context.py | The interface the field adapter implements. Validates PRD-107's design |


3. Architecture

3.1 Qdrant Deployment

Config extension:
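The config block itself wasn't captured in this export. A minimal sketch of what it might look like, following the existing config.py convention of environment-driven constants (the names QDRANT_URL, QDRANT_API_KEY, FIELD_DECAY_RATE, FIELD_QUERY_THRESHOLD, and FIELD_REINFORCE_CAP are illustrative assumptions, not confirmed settings):

```python
import os

# Hypothetical Qdrant connection settings (names assumed).
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY", "")

# Field parameters reuse the existing decay constants where possible.
FIELD_DECAY_RATE = float(os.getenv("FIELD_DECAY_RATE", "0.1"))             # λ, ~7h half-life
FIELD_QUERY_THRESHOLD = float(os.getenv("FIELD_QUERY_THRESHOLD", "0.05"))  # stricter than L2's 0.3
FIELD_REINFORCE_CAP = float(os.getenv("FIELD_REINFORCE_CAP", "2.0"))       # max strength multiplier
FIELD_EMBEDDING_DIM = 2048                                                 # qwen3-embedding-8b
```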

3.2 Field Data Model

Each mission field is a Qdrant collection. Each pattern is a point:
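The point schema was not captured in this export. A plausible shape, consistent with the decay and reinforcement fields described in Sections 2.2 and 4 (the exact payload field names are assumptions):

```python
import hashlib
import time
import uuid

def make_field_point(content: str, embedding: list[float], agent_id: str,
                     strength: float = 1.0) -> dict:
    """Build a Qdrant-style point dict for one field pattern (sketch)."""
    now = time.time()
    return {
        "id": str(uuid.uuid4()),
        "vector": embedding,  # 2048-dim, cosine distance
        "payload": {
            "content": content,
            "content_hash": hashlib.sha256(content.encode()).hexdigest(),
            "agent_id": agent_id,
            "strength": strength,      # base strength at injection
            "access_count": 0,         # Hebbian reinforcement signal
            "created_at": now,
            "last_accessed": now,      # decay clock, reset on re-access
        },
    }
```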

3.3 System Diagram


4. Core Operations

4.1 Operation 1: inject(pattern, strength)

Add an embedding to the shared field with metadata.

Helper: _find_by_hash()

4.2 Operation 2: query(embedding, top_k)

Retrieve resonant patterns by cosine similarity, weighted by decay + reinforcement.
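The query implementation is not shown in this export. A self-contained sketch of the ranking logic, with the decay computation inlined so the example runs on its own (the cos² × decayed-strength score follows the rationale in Section 7):

```python
import math
import time

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def decayed_strength(p: dict, lam: float = 0.1) -> float:
    """S(t) = S₀ × e^(-λt), t in hours since last access, with Hebbian boost."""
    hours = max(0.0, time.time() - p["last_accessed"]) / 3600.0
    boost = min(1.0 + 0.05 * p["access_count"], 2.0)  # +5%/access, 2× cap
    return p["strength"] * math.exp(-lam * hours) * boost

def query(points: dict, q_emb: list[float], top_k: int = 5,
          threshold: float = 0.05) -> list[tuple[str, float]]:
    """Rank patterns by cos²(θ) × decayed strength; skip faded ones."""
    scored = []
    for pid, p in points.items():
        s = decayed_strength(p)
        if s < threshold:  # stricter 0.05 cutoff (Section 2.4)
            continue
        scored.append((pid, cosine(q_emb, p["embedding"]) ** 2 * s))
    return sorted(scored, key=lambda x: -x[1])[:top_k]
```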

4.3 Operation 3: decay() — Score-Time

Decay is NOT a periodic job. It's computed at query time within _compute_decayed_strength():
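The body of _compute_decayed_strength() isn't included in this export. A sketch consistent with the formulas adopted in Sections 2.2 and 2.3, folding in attractor protection (parameter names are assumed):

```python
import math

def compute_decayed_strength(base_strength: float, hours_since_access: float,
                             access_count: int, resonance_sum: float = 0.0,
                             lam: float = 0.1) -> float:
    """Score-time decay: nothing is deleted; strength is recomputed per query.

    S(t) = S₀ × e^(-λ_eff × t), where λ_eff is reduced by attractor
    protection and t is hours since *last access* (spaced-repetition reset).
    """
    protection = min(resonance_sum * 0.5, 0.5)      # Σ(resonance × 0.5), cap 0.5
    effective_lam = lam * (1.0 - protection)
    decayed = base_strength * math.exp(-effective_lam * hours_since_access)
    boost = min(1.0 + 0.05 * access_count, 2.0)     # Hebbian: +5%/access, 2× cap
    return decayed * boost
```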

Decay calibration:

| λ Value | Half-Life | Use Case |
| --- | --- | --- |
| 0.05 | ~14 hours | Long-running missions (multi-day research) |
| 0.1 | ~7 hours | Default — standard mission duration |
| 0.2 | ~3.5 hours | Fast-turnaround tasks |

Start with λ=0.1. Run sensitivity analysis across {0.05, 0.1, 0.2} during the experiment.
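The half-lives in the calibration table follow directly from t½ = ln 2 / λ; a quick check:

```python
import math

# Half-life of exponential decay S(t) = S₀ × e^(-λt) is ln(2)/λ.
for lam in (0.05, 0.1, 0.2):
    half_life_hours = math.log(2) / lam
    print(f"λ={lam}: half-life ≈ {half_life_hours:.1f}h")
# λ=0.05 → ≈13.9h, λ=0.1 → ≈6.9h, λ=0.2 → ≈3.5h
```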

4.4 Operation 4: reinforce(pattern_id) — Hebbian

When a pattern is accessed via query():
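The reinforcement code is not reproduced here. A sketch of one plausible implementation, combining the +5% access bonus (Section 2.2), the +2% co-access bonus and 2× cap (Section 7.3), and the spaced-repetition clock reset; the exact bookkeeping fields are assumptions:

```python
import time

def reinforce(points: dict, pattern_ids: list[str],
              base_bonus: float = 0.05, co_access_bonus: float = 0.02,
              cap: float = 2.0) -> None:
    """Hebbian reinforcement for the patterns returned by one query() call.

    Each access adds +5%; patterns accessed *together* get an extra +2%
    co-access bonus. Strength never exceeds cap × initial strength
    (anti-domination safeguard), and the decay clock resets on access.
    """
    co_accessed = len(pattern_ids) > 1
    now = time.time()
    for pid in pattern_ids:
        p = points[pid]
        p["access_count"] += 1
        multiplier = 1.0 + base_bonus + (co_access_bonus if co_accessed else 0.0)
        p["strength"] = min(p["strength"] * multiplier,
                            p["initial_strength"] * cap)
        p["last_accessed"] = now  # spaced repetition: re-access resets decay
```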

4.5 Operation 5: measure_stability()

Quantify field convergence — used for telemetry and experiment analysis.
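The metric's implementation isn't shown in this export. A sketch of the avg_strength × 0.6 + organization × 0.4 formula from Section 2.3; note that "organization" is not defined in this excerpt, so it is assumed here to be the fraction of patterns still above the active threshold:

```python
def measure_stability(strengths: list[float], threshold: float = 0.05) -> float:
    """Stability = avg_strength × 0.6 + organization × 0.4 (Section 2.3).

    'organization' is an assumption: the fraction of patterns whose
    decayed strength is still above the active threshold.
    """
    if not strengths:
        return 0.0
    avg_strength = sum(strengths) / len(strengths)
    organization = sum(1 for s in strengths if s >= threshold) / len(strengths)
    return avg_strength * 0.6 + organization * 0.4
```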


5. VectorFieldSharedContext — Full Adapter

Implements PRD-107's SharedContextPort interface:
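The full adapter is not reproduced in this export. A skeleton of how it might line up with the PRD-107 port, with method names inferred from their usage elsewhere in this PRD (create_context, destroy_context, inject, query); the actual PRD-107 signatures may differ:

```python
from abc import ABC, abstractmethod
from typing import Any

class SharedContextPort(ABC):
    """Stand-in for the PRD-107 interface (actual signatures may differ)."""

    @abstractmethod
    async def create_context(self, mission_id: str) -> str: ...
    @abstractmethod
    async def inject(self, context_id: str, key: str, value: Any,
                     agent_id: str) -> None: ...
    @abstractmethod
    async def query(self, context_id: str, query_text: str,
                    top_k: int = 5) -> list[dict]: ...
    @abstractmethod
    async def destroy_context(self, context_id: str) -> dict: ...

class VectorFieldSharedContext(SharedContextPort):
    """Field-backed adapter: one Qdrant collection per mission field."""

    async def create_context(self, mission_id: str) -> str:
        # Create collection mem:{workspace_id}:field:{field_id}; emit field.created.
        raise NotImplementedError

    async def inject(self, context_id, key, value, agent_id):
        # Embed value, upsert point with strength/hash payload; emit field.injected.
        raise NotImplementedError

    async def query(self, context_id, query_text, top_k=5):
        # Embed query, rank by cos² × decayed strength, reinforce hits; emit field.queried.
        raise NotImplementedError

    async def destroy_context(self, context_id):
        # Record final stability and counts; drop collection; emit field.destroyed.
        raise NotImplementedError
```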


6. Experiment Design

6.1 Task Selection

The experiment task must be:

  • Multi-agent — requires at least 2 agents to collaborate

  • Context-dependent — later agents benefit from earlier agents' full context

  • Measurable — output quality can be objectively scored

  • Repeatable — same task can run multiple times for statistical significance

Task: "Research a topic and produce an analysis report"

| Role | Agent | Actions |
| --- | --- | --- |
| Researcher | Agent A | Web search + document analysis → inject findings into shared context |
| Analyst | Agent B | Query context for findings → produce structured analysis → inject analysis |
| Writer | Agent C | Query context for resonant patterns → produce final report |

6.2 Topic Selection (5 Topics)

| # | Topic | Why This Topic |
| --- | --- | --- |
| 1 | EU AI Act compliance requirements for SaaS platforms | Complex regulation, multi-faceted, requires synthesis |
| 2 | Comparison of vector database architectures for production ML | Technical depth, multiple dimensions to compare |
| 3 | Impact of remote work policies on software team productivity | Mix of qualitative and quantitative data |
| 4 | State of AI agent frameworks: LangGraph vs CrewAI vs AutoGen | Directly relevant, verifiable claims |
| 5 | Carbon footprint reduction strategies for cloud infrastructure | Cross-domain (engineering + sustainability) |

6.3 Experimental Conditions

| Variable | Control (Message-Passing) | Treatment (Shared Field) |
| --- | --- | --- |
| Context mechanism | RedisSharedContext (PRD-107) — key-value dict, no semantic ranking | VectorFieldSharedContext — Qdrant vectors, resonance scoring, decay |
| LLM model | Sonnet 4.6 for all roles | Sonnet 4.6 for all roles |
| Task description | Identical per topic | Identical per topic |
| Tool access | Same per role | Same per role |
| Token budget | Same per role | Same per role |
| Agent system prompts | Same per role | Same per role |

6.4 Metrics

| Metric | How Measured | Primary/Secondary |
| --- | --- | --- |
| Information Retention | Count of Agent A's findings that appear in final output (manual + LLM-assisted) | Primary — core hypothesis test |
| Context Quality | Blind human eval (1-5 Likert): "Does the report reflect all relevant source findings?" | Primary |
| Task Accuracy | LLM-as-judge (PRD-103 rubric-scored) against reference answer | Primary |
| Token Efficiency | Total tokens consumed across all agents (from llm_usage) | Secondary |
| Latency | Wall-clock time from mission start to final output | Secondary |
| Cross-Agent Resonance | Count of field patterns accessed by >1 agent | Secondary (field condition only) |
| Field Stability | Convergence score at mission end via measure_stability() | Secondary (field condition only) |
| Embedding Cost | Number of inject() + query() API calls × embedding cost | Secondary |

6.5 Success Criteria (Phase 3 Gate)

| Criterion | Threshold | Rationale |
| --- | --- | --- |
| Information Retention | Field retains ≥20% more of Agent A's findings | Core hypothesis: field fixes the telephone game |
| Context Quality | Human eval ≥0.5 points higher (on 5-point scale) | Perceptible quality improvement |
| Token Efficiency | Field uses ≤120% of message-passing tokens | Small overhead acceptable; >20% overhead = too expensive |
| Latency | Field completes in ≤150% of message-passing time | Embedding overhead must be bounded |

Decision rules:

  • ALL four pass: Phase 3 validated. Proceed to PRDs 110-116.

  • Information retention fails: Core hypothesis is wrong. Reassess Phase 3 entirely.

  • Only token/latency fail: Field works but costs too much. Optimize embeddings before proceeding.

  • Only quality fails: Field preserves more context but doesn't improve output. Investigate prompt engineering.

6.6 Statistical Methodology

  • 5 topics × 2 conditions × 3 repetitions = 30 total runs

  • Paired comparison: Same topic in both conditions reduces topic-variance noise

  • Wilcoxon signed-rank test for non-parametric paired comparison (small sample)

  • Blind evaluation: Human raters don't know which condition produced which output

  • LLM-as-judge as secondary metric (PRD-103 rubric format) — cross-validated against human eval
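Effect size for the paired design can be computed without any statistics library; a sketch of Cohen's d on paired differences (the Wilcoxon test itself would come from scipy.stats.wilcoxon):

```python
import math

def paired_cohens_d(control: list[float], treatment: list[float]) -> float:
    """Cohen's d for a paired design: mean(diff) / sd(diff).

    Positive d means the treatment (shared field) scored higher.
    """
    diffs = [t - c for c, t in zip(control, treatment)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var) if var > 0 else float("inf")
```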


7. Resonance Scoring

7.1 Core Formula

From Context Engineering Chapter 09:

R(A, B) = cos(θ)² × |A| × |B|

Where:

  • cos(θ) = cosine similarity between embedding vectors A and B

  • |A|, |B| = decayed strength values of patterns A and B

Why squared cosine: Amplifies high-similarity pairs and suppresses noise.

| Cosine Similarity | Resonance Factor (cos²) | Effect |
| --- | --- | --- |
| 0.95 | 0.90 | Strong resonance — these patterns amplify each other |
| 0.80 | 0.64 | Moderate resonance |
| 0.60 | 0.36 | Weak resonance — barely above noise |
| 0.40 | 0.16 | Negligible — effectively filtered |

7.2 Query-Time Resonance Scoring

For a query Q against field pattern P:

score(Q, P) = cos(θ_QP)² × decayed_strength(P)

This is what query() uses to rank results. The resonance formula between two field patterns (for attractor detection) is the pairwise form from 7.1:

R(A, B) = cos(θ_AB)² × |A| × |B|

7.3 Anti-Domination Safeguard

One strong pattern could dominate the field, drowning out everything else.

Mitigation: Cap resonance amplification:

  • Maximum strength after reinforcement: initial_strength × reinforce_cap (default 2.0×)

  • Co-access bonus capped at +2% per co-access event

  • Monitor max_strength / min_strength ratio per field — if >10×, log a warning


8. Telemetry & Experiment Data

8.1 Per-Experiment Telemetry

Every experiment run produces structured data via PRD-106 mission_events:

| Event Type | Data Captured |
| --- | --- |
| field.created | {field_id, team_size, initial_data_count} |
| field.injected | {field_id, agent_id, key, strength, content_hash} |
| field.queried | {field_id, agent_id, query_preview, results_count, top_score} |
| field.reinforced | {field_id, pattern_ids, access_counts} |
| field.stability | {field_id, stability, pattern_count, active, decayed} |
| field.destroyed | {field_id, final_pattern_count, total_queries, total_injects} |

8.2 Experiment Results Table

8.3 Analysis Queries


9. Cost Analysis

9.1 Embedding Costs

| Operation | Calls per Mission | Cost per Call | Total |
| --- | --- | --- | --- |
| inject() — Researcher (20 findings) | 20 | ~$0.001 | $0.020 |
| inject() — Analyst (15 analyses) | 15 | ~$0.001 | $0.015 |
| query() — Analyst (5 queries) | 5 | ~$0.001 | $0.005 |
| query() — Writer (5 queries) | 5 | ~$0.001 | $0.005 |
| query() — Coordinator (3 status checks) | 3 | ~$0.001 | $0.003 |
| Total embedding overhead per mission | 48 | | $0.048 |

9.2 Comparison with LLM Costs

A typical 3-agent mission with Sonnet 4.6 costs ~$0.15-0.50 in LLM calls. The field's embedding overhead ($0.048) is 10-30% on top — within the ≤120% token efficiency threshold.

9.3 Qdrant Storage

At prototype scale (30 experiment runs × 50 patterns × 2048 floats × 4 bytes = ~12MB), storage is negligible. Qdrant's free tier handles millions of vectors.


10. Cross-PRD Integration

| PRD | Integration | Notes |
| --- | --- | --- |
| PRD-107 | VectorFieldSharedContext implements SharedContextPort. Validates PRD-107's interface design | If the interface doesn't feel right during implementation, update PRD-107 before Phase 3 |
| PRD-102 | Coordinator manages field lifecycle: create_context() at mission start, destroy_context() at end, query() for progress assessment | Coordinator code is identical for both conditions — only the adapter differs |
| PRD-103 | Experiment's LLM-as-judge scoring uses PRD-103's rubric format for task_accuracy_score | Consistent quality measurement methodology |
| PRD-106 | Field events (field.created, field.injected, etc.) flow into mission_events telemetry | First real data for the telemetry pipeline |
| PRD-105 | Embedding costs counted against mission budget | inject() and query() embedding calls tracked in llm_usage with request_type='embedding' |
| Phase 3 (110-116) | Experiment results are the go/no-go gate. Pass/fail criteria determine the entire Phase 3 roadmap | This is the most important deliverable of PRD-108 |


11. Risk Register

| # | Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- | --- |
| 1 | Prototype is just "RAG with extra steps" — resonance adds no value beyond cosine similarity | High | Medium | Include decay + reinforcement. If results match plain RAG, the unique mechanisms aren't contributing — honest negative result |
| 2 | Over-engineering — building attractor dynamics, bifurcation, multi-field coupling | Medium | Medium | Hard scope: 5 operations only. No attractors, no coupling, no emergence. Test the minimum hypothesis |
| 3 | Wrong experiment task — doesn't exercise the field's advantages | High | Medium | Choose tasks where context preservation is critical. Validate with a dry run. 5 topics provide diversity |
| 4 | Confirmation bias — desire for Phase 3 biases evaluation | Medium | High | Blind human evaluation. LLM-as-judge as secondary cross-validation. Pre-registered pass/fail thresholds |
| 5 | Embedding quality bottleneck — qwen3 produces poor domain embeddings | High | Low | Sanity check first: known-similar texts should have similarity >0.8. Switch to OpenAI text-embedding-3-large if needed |
| 6 | Qdrant deployment complexity on Railway | Medium | Low | Docker single-command deploy. Persistent volume for data. :memory: mode for local dev. Fallback to FAISS if Railway deployment fails |
| 7 | Decay rate miscalibration — λ=0.1 too fast or too slow | Medium | Medium | Sensitivity analysis across λ ∈ {0.05, 0.1, 0.2}. Pick the λ that maximizes information retention |
| 8 | Uncontrolled resonance amplification — one pattern dominates | Medium | Medium | Reinforcement cap (2.0×), co-access bonus cap (+2%), strength ratio monitoring |
| 9 | "Neural field resonance" is just cosine similarity rebranded | High | Medium | The novelty is: (a) decay removes stale info, (b) reinforcement amplifies co-accessed patterns. If (a)+(b) don't improve results, accept the result honestly |
| 10 | Small sample size (30 runs) lacks statistical power | Medium | Medium | Wilcoxon signed-rank is designed for small paired samples. Effect size (d>0.5) matters more than p-value. If results are ambiguous, run 30 more |


12. Acceptance Criteria

Must Have

Should Have

Nice to Have


Appendix A: Research Sources

| Source | What It Informed |
| --- | --- |
| Qdrant (qdrant/qdrant) | Payload filtering, Recommendations API, Docker deploy, :memory: testing mode |
| FAISS (facebookresearch/faiss) | IndexFlatL2 exact search baseline, thread-safety limitations |
| Redis Vector Search (Redis Stack) | Eliminated — Railway Redis is vanilla, no Stack modules |
| Ebbinghaus Forgetting Curve (1885) | Exponential decay formula S(t) = S₀ × e^(-λt) |
| Hebb, Organization of Behavior (1949) | Co-access reinforcement pattern — "fire together, wire together" |
| Elasticsearch decay functions | Score-time decay application (no deletion) |
| Kornell & Bjork (2008) | Spaced repetition — re-access resets decay clock |
| Context Engineering, Ch. 08 | 8 core field operations, boundary permeability |
| Context Engineering, Ch. 09 | Resonance formula, decay formula, attractor protection |
| Context Engineering, Ch. 10 | Multi-field operations (deferred to Phase 3) |
| Context Engineering, Ch. 11 | Stability measurement, gradient convergence (simplified for prototype) |
| Automatos memory_types.py:65 | Existing exponential decay with access_count boost |
| Automatos inter_agent.py:400-649 | SharedContextManager — Phase 2 baseline |
| Automatos embedding_manager.py | qwen3-embedding-8b, 2048-dim, batch support |
| Automatos config.py | MEMORY_DECAY_RATE=0.1, consistent decay parameters |
| Railway community (station.railway.com) | Confirmed vanilla Redis — no Redis Stack support |
