PRD-108 A/B Evidence
Purpose
This document repackages the current PRD-108 evidence into an investor-safe empirical summary.
It is intentionally narrower than a paper. It should be read as:
evidence that the mechanism is real
evidence that the mechanism improves retrieval in a controlled scenario
not proof of universal superiority across all missions
Evidence Sources
Primary sources:
automatos-ai/docs/PRD-108-ALGORITHMS.md
automatos-ai/docs/PRD-108-IMPLEMENTATION.md
automatos-ai/docs/PRD-108-TECHNICAL-DISCLOSURE.md
Executable references:
automatos-ai/orchestrator/tests/test_vector_field.py
automatos-ai/orchestrator/tests/demo_field_stress.py
automatos-ai/orchestrator/tests/demo_ab_comparison.py
Evidence Layer 1: Implementation Verification
The core mechanism is documented as implemented and traceable to code:
SharedContextPort defines the common interface
VectorFieldSharedContext implements the PRD-108 field
RedisSharedContext provides the message-passing baseline
the mission coordinator creates, seeds, uses, and destroys the field
field tools are exposed to agents during execution
This matters because the A/B comparison is not just conceptual. It is tied to a real backend swap against a message-passing baseline behind a common interface.
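The backend-swap pattern above can be sketched as a single interface with two interchangeable backends. Only the three class names come from the source docs; the method names, signatures, and ranking logic below are illustrative assumptions, not the actual PRD-108 API.

```python
# Sketch of the common-interface pattern: one port, two swappable backends.
# Method names and bodies are assumptions for illustration only.
from abc import ABC, abstractmethod


class SharedContextPort(ABC):
    """Common interface both backends implement, enabling direct A/B swaps."""

    @abstractmethod
    def inject(self, agent_id: str, content: str) -> None: ...

    @abstractmethod
    def query(self, agent_id: str, question: str, top_k: int = 5) -> list[str]: ...


class VectorFieldSharedContext(SharedContextPort):
    """Field backend: every injected pattern stays globally queryable."""

    def __init__(self):
        self.patterns: list[tuple[str, str]] = []

    def inject(self, agent_id, content):
        self.patterns.append((agent_id, content))

    def query(self, agent_id, question, top_k=5):
        # The real backend ranks by embedding resonance; word overlap
        # is a stand-in here.
        scored = sorted(
            self.patterns,
            key=lambda p: len(set(question.split()) & set(p[1].split())),
            reverse=True,
        )
        return [content for _, content in scored[:top_k]]


class RedisSharedContext(SharedContextPort):
    """Baseline backend: agents see only messages addressed to them."""

    def __init__(self):
        self.inbox: dict[str, list[str]] = {}

    def inject(self, agent_id, content):
        # Message passing: content reaches only the named recipient.
        self.inbox.setdefault(agent_id, []).append(content)

    def query(self, agent_id, question, top_k=5):
        return self.inbox.get(agent_id, [])[:top_k]
```

Because both backends satisfy the same port, the mission coordinator (and any A/B harness) can swap them without touching agent code.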
Evidence Layer 2: Unit and Stress Results
Unit tests
The source docs report 57 passing unit tests covering:
decay math
create/destroy lifecycle
inject with deduplication
query ranking
Hebbian reinforcement
stability measurement
helper behavior
Stress and integration-style assertions
The source docs report 16 passing assertions, including:
resonance ranking
24h decay from 1.0000 to 0.0907
reinforcement advantage for frequently accessed patterns
archival threshold behavior
stability changes under mixed ages
150-pattern / 50-agent scenario
controlled cross-agent visibility scenario
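The reported 24h decay figure is consistent with a simple exponential curve. The sketch below reproduces 1.0000 → 0.0907 under an assumed decay rate of 0.1 per hour; the actual PRD-108 decay constant and formula are not confirmed by this summary.

```python
import math

# Assumed decay constant: 0.1 per hour reproduces the reported
# 24h figure of 1.0000 -> 0.0907 (exp(-0.1 * 24) = exp(-2.4)).
DECAY_RATE = 0.1


def decayed_strength(initial: float, hours: float) -> float:
    """Exponential decay of a pattern's strength over time."""
    return initial * math.exp(-DECAY_RATE * hours)


print(round(decayed_strength(1.0, 24), 4))  # 0.0907
```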
Evidence Layer 3: Controlled A/B Demonstration
The strongest current comparative evidence is the script:
automatos-ai/orchestrator/tests/demo_ab_comparison.py
The script runs the same scenario against two backends:
run_vector_field(...)
run_redis_baseline(...)
Experimental setup
3-agent mission scenario
10 research findings
3 analyses
7 downstream queries from Agent C
same inputs for both backends
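The setup above can be sketched as a toy harness. The function names `run_vector_field` / `run_redis_baseline` and the scenario counts come from the source; the scoring and forwarding logic are assumptions. This toy happens to reproduce the reported visibility counts (13 vs 6) and the baseline's 43% coverage, but not the real demo's 86% field coverage, which depends on its actual retrieval step.

```python
# Toy reconstruction of the A/B scenario shape: same inputs, two backends.
FINDINGS = [f"finding-{i}" for i in range(10)]       # 10 research findings
ANALYSES = [f"analysis-{i}" for i in range(3)]       # 3 analyses
QUERIES = [f"needs finding-{i}" for i in range(7)]   # 7 queries from Agent C


def score_run(visible: list[str]) -> dict:
    """Coverage = fraction of Agent C's queries answerable from what it sees."""
    answered = sum(1 for q in QUERIES if q.split()[-1] in visible)
    return {"coverage": answered / len(QUERIES), "visible": len(visible)}


def run_vector_field() -> dict:
    # Field backend: every injected pattern remains globally queryable.
    return score_run(FINDINGS + ANALYSES)


def run_redis_baseline() -> dict:
    # Baseline: Agent C sees only what Agent B chose to forward (assumed here).
    forwarded = FINDINGS[:3] + ANALYSES
    return score_run(forwarded)
```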
Important limitation:
the script uses fake_embed(...) with DIM = 128 and word-overlap semantics for the demo
this means the A/B is a controlled mechanism test, not a production-grade benchmark with full embedding-stack realism
That is acceptable for early proof, but it should be disclosed clearly.
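For intuition, a deterministic `fake_embed` with word-overlap semantics can be as simple as a hashed bag-of-words vector: texts that share words share dimensions, so their cosine similarity rises with overlap. This is a guess at the demo's approach, not its actual code; only the name `fake_embed` and `DIM = 128` come from the source.

```python
import hashlib
import math

DIM = 128  # matches the DIM = 128 reported for the demo


def fake_embed(text: str) -> list[float]:
    """Deterministic bag-of-words embedding: shared words -> shared dims.
    An illustrative stand-in, not the demo's implementation."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Such an embedding makes the A/B deterministic and cheap, which is exactly what a controlled mechanism test wants, at the cost of semantic realism.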
Reported Comparative Results
From PRD-108-ALGORITHMS.md as documented in the March 21, 2026 PRD-108 evidence set:
Metric                       | Vector field | Redis baseline
Context coverage             | 86% (6/7)    | 43% (3/7)
Information loss             | 1 finding    | 4 findings
Patterns visible to Agent C  | 13           | 6
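The coverage percentages follow directly from the 7 downstream queries, as a quick check shows:

```python
# Sanity check: the reported percentages are answered / 7 queries.
def coverage(answered: int, total: int = 7) -> str:
    return f"{answered / total:.0%} ({answered}/{total})"


print(coverage(6))  # 86% (6/7)  - vector field
print(coverage(3))  # 43% (3/7)  - Redis baseline
```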
Interpretation:
the vector field preserved broader downstream visibility
the baseline exposed only what Agent B forwarded
the vector field answered more of Agent C's downstream needs
What This Evidence Supports
The current A/B evidence supports these claims:
A shared semantic field can reduce information loss caused by sequential forwarding.
Backend swapping through a common interface enables direct comparison with a control.
The field design improves downstream retrieval in the tested scenario.
What This Evidence Does Not Yet Prove
The current evidence does not yet prove:
superiority across all mission types
superiority across all embedding models
production-grade latency/performance in realistic networked deployment
end-to-end business outcome improvements across many workflows
Investor-Safe Summary
The right way to describe the A/B result is:
In a controlled three-agent comparison using the same scenario and a shared interface, the PRD-108 vector-field backend recovered materially more downstream-relevant information than the message-passing baseline.
The wrong way to describe it is:
PRD-108 has conclusively proven that message passing is obsolete.
Strongest Current Evidence Sentence
PRD-108 already has early comparative evidence that its coordination model reduces downstream information loss relative to a simpler message-passing baseline in a controlled scenario.
Next Validation Step
The next credibility upgrade should be a broader A/B package with:
more mission types
real embedding model configuration
explicit scoring rubric
raw outputs
reproducible command log
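The "explicit scoring rubric" item above could take the shape of a small, machine-checkable weighted score rather than ad-hoc judgment. The metric names and weights below are purely illustrative assumptions.

```python
# Hypothetical rubric: weighted sum over normalized metrics in [0, 1].
# Names and weights are assumptions, not a PRD-108 specification.
RUBRIC = {
    "coverage": 0.5,  # fraction of downstream queries answered
    "retention": 0.3, # fraction of findings NOT lost
    "latency": 0.2,   # normalized latency score (1.0 = best)
}


def rubric_score(metrics: dict) -> float:
    """Weighted sum over the rubric's metrics; higher is better."""
    return sum(RUBRIC[name] * metrics[name] for name in RUBRIC)


# Example with the reported vector-field numbers (latency assumed):
print(round(rubric_score({"coverage": 6 / 7, "retention": 9 / 10, "latency": 0.8}), 3))
```

Publishing the rubric alongside raw outputs would let an advisor recompute every score from the command log.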
That would turn this from promising internal evidence into advisor-grade comparative validation.
Last updated

