PRD-108 A/B Evidence
Purpose
This document repackages the current PRD-108 evidence into an investor-safe empirical summary.
It is intentionally narrower than a paper. It should be read as:
evidence that the mechanism is real
evidence that the mechanism improves retrieval in a controlled scenario
not proof of universal superiority across all missions
Evidence Sources
Primary sources:
automatos-ai/docs/PRD-108-ALGORITHMS.md
automatos-ai/docs/PRD-108-IMPLEMENTATION.md
automatos-ai/docs/PRD-108-TECHNICAL-DISCLOSURE.md
Executable references:
automatos-ai/orchestrator/tests/test_vector_field.py
automatos-ai/orchestrator/tests/demo_field_stress.py
automatos-ai/orchestrator/tests/demo_ab_comparison.py
Evidence Layer 1: Implementation Verification
The core mechanism is documented as implemented and traceable to code:
SharedContextPort defines the common interface
VectorFieldSharedContext implements the PRD-108 field
RedisSharedContext provides the message-passing baseline
the mission coordinator creates, seeds, uses, and destroys the field
field tools are exposed to agents during execution
This matters because the A/B comparison is not just conceptual. It is tied to a real backend swap against a message-passing baseline behind a common interface.
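The backend-swap pattern above can be sketched as a single interface with two interchangeable backends. Only the three class names come from the source docs; the method names, signatures, and ranking logic below are illustrative assumptions, not the actual PRD-108 API.

```python
# Sketch of the common-interface pattern: one port, two swappable backends.
# Method names and bodies are assumptions for illustration only.
from abc import ABC, abstractmethod


class SharedContextPort(ABC):
    """Common interface both backends implement, enabling direct A/B swaps."""

    @abstractmethod
    def inject(self, agent_id: str, content: str) -> None: ...

    @abstractmethod
    def query(self, agent_id: str, question: str, top_k: int = 5) -> list[str]: ...


class VectorFieldSharedContext(SharedContextPort):
    """Field backend: every injected pattern stays globally queryable."""

    def __init__(self):
        self.patterns: list[tuple[str, str]] = []

    def inject(self, agent_id, content):
        self.patterns.append((agent_id, content))

    def query(self, agent_id, question, top_k=5):
        # The real backend ranks by embedding resonance; word overlap
        # is a stand-in here.
        scored = sorted(
            self.patterns,
            key=lambda p: len(set(question.split()) & set(p[1].split())),
            reverse=True,
        )
        return [content for _, content in scored[:top_k]]


class RedisSharedContext(SharedContextPort):
    """Baseline backend: agents see only messages addressed to them."""

    def __init__(self):
        self.inbox: dict[str, list[str]] = {}

    def inject(self, agent_id, content):
        # Message passing: content reaches only the named recipient.
        self.inbox.setdefault(agent_id, []).append(content)

    def query(self, agent_id, question, top_k=5):
        return self.inbox.get(agent_id, [])[:top_k]
```

Because both backends satisfy the same port, the mission coordinator (and any A/B harness) can swap them without touching agent code.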
Evidence Layer 2: Unit and Stress Results
Unit tests
The source docs report 57 passing unit tests covering:
decay math
create/destroy lifecycle
inject with deduplication
query ranking
Hebbian reinforcement
stability measurement
helper behavior
Stress and integration-style assertions
The source docs report 16 passing assertions, including:
resonance ranking
24h decay from 1.0000 to 0.0907
reinforcement advantage for frequently accessed patterns
archival threshold behavior
stability changes under mixed ages
150-pattern / 50-agent scenario
controlled cross-agent visibility scenario
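The reported 24h decay figure is consistent with a simple exponential curve. The sketch below reproduces 1.0000 → 0.0907 under an assumed decay rate of 0.1 per hour; the actual PRD-108 decay constant and formula are not confirmed by this summary.

```python
import math

# Assumed decay constant: 0.1 per hour reproduces the reported
# 24h figure of 1.0000 -> 0.0907 (exp(-0.1 * 24) = exp(-2.4)).
DECAY_RATE = 0.1


def decayed_strength(initial: float, hours: float) -> float:
    """Exponential decay of a pattern's strength over time."""
    return initial * math.exp(-DECAY_RATE * hours)


print(round(decayed_strength(1.0, 24), 4))  # 0.0907
```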
Evidence Layer 3: Controlled A/B Demonstration
The strongest current comparative evidence is the script:
automatos-ai/orchestrator/tests/demo_ab_comparison.py
The script runs the same scenario against two backends:
run_vector_field(...)
run_redis_baseline(...)
Experimental setup
3-agent mission scenario
10 research findings
3 analyses
7 downstream queries from Agent C
same inputs for both backends
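The setup above can be sketched as a toy harness. The function names `run_vector_field` / `run_redis_baseline` and the scenario counts come from the source; the scoring and forwarding logic are assumptions. This toy happens to reproduce the reported visibility counts (13 vs 6) and the baseline's 43% coverage, but not the real demo's 86% field coverage, which depends on its actual retrieval step.

```python
# Toy reconstruction of the A/B scenario shape: same inputs, two backends.
FINDINGS = [f"finding-{i}" for i in range(10)]       # 10 research findings
ANALYSES = [f"analysis-{i}" for i in range(3)]       # 3 analyses
QUERIES = [f"needs finding-{i}" for i in range(7)]   # 7 queries from Agent C


def score_run(visible: list[str]) -> dict:
    """Coverage = fraction of Agent C's queries answerable from what it sees."""
    answered = sum(1 for q in QUERIES if q.split()[-1] in visible)
    return {"coverage": answered / len(QUERIES), "visible": len(visible)}


def run_vector_field() -> dict:
    # Field backend: every injected pattern remains globally queryable.
    return score_run(FINDINGS + ANALYSES)


def run_redis_baseline() -> dict:
    # Baseline: Agent C sees only what Agent B chose to forward (assumed here).
    forwarded = FINDINGS[:3] + ANALYSES
    return score_run(forwarded)
```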
Important limitation:
the script uses fake_embed(...) with DIM = 128 and word-overlap semantics for the demo
this means the A/B is a controlled mechanism test, not a production-grade benchmark with full embedding-stack realism
That is acceptable for early proof, but it should be disclosed clearly.
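For intuition, a deterministic `fake_embed` with word-overlap semantics can be as simple as a hashed bag-of-words vector: texts that share words share dimensions, so their cosine similarity rises with overlap. This is a guess at the demo's approach, not its actual code; only the name `fake_embed` and `DIM = 128` come from the source.

```python
import hashlib
import math

DIM = 128  # matches the DIM = 128 reported for the demo


def fake_embed(text: str) -> list[float]:
    """Deterministic bag-of-words embedding: shared words -> shared dims.
    An illustrative stand-in, not the demo's implementation."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Such an embedding makes the A/B deterministic and cheap, which is exactly what a controlled mechanism test wants, at the cost of semantic realism.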
Reported Comparative Results
From PRD-108-ALGORITHMS.md as documented in the March 21, 2026 PRD-108 evidence set:
Metric                       | Vector field | Redis baseline
Context coverage             | 86% (6/7)    | 43% (3/7)
Information loss             | 1 finding    | 4 findings
Patterns visible to Agent C  | 13           | 6
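The coverage percentages follow directly from the 7 downstream queries, as a quick check shows:

```python
# Sanity check: the reported percentages are answered / 7 queries.
def coverage(answered: int, total: int = 7) -> str:
    return f"{answered / total:.0%} ({answered}/{total})"


print(coverage(6))  # 86% (6/7)  - vector field
print(coverage(3))  # 43% (3/7)  - Redis baseline
```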
Interpretation:
the vector field preserved broader downstream visibility
the baseline exposed only what Agent B forwarded
the vector field answered more of Agent C's downstream needs
What This Evidence Supports
The current A/B evidence supports these claims:
A shared semantic field can reduce information loss caused by sequential forwarding.
Backend swapping through a common interface enables direct comparison with a control.
The field design improves downstream retrieval in the tested scenario.
What This Evidence Does Not Yet Prove
The current evidence does not yet prove:
superiority across all mission types
superiority across all embedding models
production-grade latency/performance in realistic networked deployment
end-to-end business outcome improvements across many workflows
Investor-Safe Summary
The right way to describe the A/B result is:
In a controlled three-agent comparison using the same scenario and a shared interface, the PRD-108 vector-field backend recovered materially more downstream-relevant information than the message-passing baseline.
The wrong way to describe it is:
PRD-108 has conclusively proven that message passing is obsolete.
Strongest Current Evidence Sentence
PRD-108 already has early comparative evidence that its coordination model reduces downstream information loss relative to a simpler message-passing baseline in a controlled scenario.
Next Validation Step
The next credibility upgrade should be a broader A/B package with:
more mission types
real embedding model configuration
explicit scoring rubric
raw outputs
reproducible command log
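The "explicit scoring rubric" item above could take the shape of a small, machine-checkable weighted score rather than ad-hoc judgment. The metric names and weights below are purely illustrative assumptions.

```python
# Hypothetical rubric: weighted sum over normalized metrics in [0, 1].
# Names and weights are assumptions, not a PRD-108 specification.
RUBRIC = {
    "coverage": 0.5,  # fraction of downstream queries answered
    "retention": 0.3, # fraction of findings NOT lost
    "latency": 0.2,   # normalized latency score (1.0 = best)
}


def rubric_score(metrics: dict) -> float:
    """Weighted sum over the rubric's metrics; higher is better."""
    return sum(RUBRIC[name] * metrics[name] for name in RUBRIC)


# Example with the reported vector-field numbers (latency assumed):
print(round(rubric_score({"coverage": 6 / 7, "retention": 9 / 10, "latency": 0.8}), 3))
```

Publishing the rubric alongside raw outputs would let an advisor recompute every score from the command log.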
That would turn this from promising internal evidence into advisor-grade comparative validation.
Last updated

