PRD-108 A/B Evidence

Purpose

This document repackages the current PRD-108 evidence into an investor-safe empirical summary.

It is intentionally narrower than a paper. It should be read as:

  • evidence that the mechanism exists as working, tested code

  • evidence that the mechanism improves retrieval in a controlled scenario

  • not proof of universal superiority across all missions

Evidence Sources

Primary sources:

  • automatos-ai/docs/PRD-108-ALGORITHMS.md

  • automatos-ai/docs/PRD-108-IMPLEMENTATION.md

  • automatos-ai/docs/PRD-108-TECHNICAL-DISCLOSURE.md

Executable references:

  • automatos-ai/orchestrator/tests/test_vector_field.py

  • automatos-ai/orchestrator/tests/demo_field_stress.py

  • automatos-ai/orchestrator/tests/demo_ab_comparison.py

Evidence Layer 1: Implementation Verification

The core mechanism is documented as implemented and traceable to code:

  • SharedContextPort defines the common interface

  • VectorFieldSharedContext implements the PRD-108 field

  • RedisSharedContext provides the message-passing baseline

  • the mission coordinator creates, seeds, uses, and destroys the field

  • field tools are exposed to agents during execution

This matters because the A/B comparison is not just conceptual. Both arms run through the same SharedContextPort interface, so the only thing that changes between them is the backend.
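The backend swap described above can be illustrated with a minimal sketch. Only the name SharedContextPort comes from the source; the method names (publish, visible_to), the two stand-in backends, the routing table, and the sample strings are all hypothetical and chosen purely for illustration. This is not the actual PRD-108 code.

```python
from typing import Protocol


class SharedContextPort(Protocol):
    # Hypothetical method names; the real interface may differ.
    def publish(self, author: str, text: str) -> None: ...
    def visible_to(self, reader: str) -> list[str]: ...


class FieldBackend:
    """Stand-in for a shared field: every published item is visible to every agent."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def publish(self, author: str, text: str) -> None:
        if text not in self._items:  # skip exact duplicates
            self._items.append(text)

    def visible_to(self, reader: str) -> list[str]:
        return list(self._items)


class MailboxBackend:
    """Stand-in for message passing: an item reaches only the author's configured recipient."""

    def __init__(self, routes: dict[str, str]) -> None:
        self._routes = routes
        self._inbox: dict[str, list[str]] = {}

    def publish(self, author: str, text: str) -> None:
        recipient = self._routes.get(author)
        if recipient is not None:
            self._inbox.setdefault(recipient, []).append(text)

    def visible_to(self, reader: str) -> list[str]:
        return list(self._inbox.get(reader, []))


def run_scenario(backend: SharedContextPort) -> list[str]:
    """Feed identical inputs to whichever backend is passed in."""
    backend.publish("A", "finding: churn rose in Q3")  # made-up sample data
    backend.publish("A", "finding: pricing page converts poorly")
    backend.publish("B", "analysis: churn is driven by onboarding")
    return backend.visible_to("C")


field_view = run_scenario(FieldBackend())
mailbox_view = run_scenario(MailboxBackend({"A": "B", "B": "C"}))
print(len(field_view), len(mailbox_view))
```

Because run_scenario depends only on the interface, any difference in what Agent C can see is attributable to the backend rather than to the inputs.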

Evidence Layer 2: Unit and Stress Results

Unit tests

The source docs report 57 passing unit tests covering:

  • decay math

  • create/destroy lifecycle

  • inject with deduplication

  • query ranking

  • Hebbian reinforcement

  • stability measurement

  • helper behavior

Stress and integration-style assertions

The source docs report 16 passing assertions, including:

  • resonance ranking

  • 24h decay from 1.0000 to 0.0907

  • reinforcement advantage for frequently accessed patterns

  • archival threshold behavior

  • stability changes under mixed ages

  • 150-pattern / 50-agent scenario

  • controlled cross-agent visibility scenario
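The 24h decay figure in the list above is consistent with plain exponential decay at a rate of 0.1 per hour. Both the functional form and the rate are inferred here from the reported numbers, not quoted from the source docs, so treat the sketch below as a sanity check on the arithmetic rather than as the implemented formula.

```python
import math

# Assumption: rate inferred from the reported 1.0000 -> 0.0907 figure over 24 hours.
ASSUMED_DECAY_RATE_PER_HOUR = 0.1


def decayed_strength(initial: float, hours: float,
                     rate: float = ASSUMED_DECAY_RATE_PER_HOUR) -> float:
    """Exponential decay: initial * exp(-rate * hours)."""
    return initial * math.exp(-rate * hours)


print(f"{decayed_strength(1.0, 24):.4f}")  # 0.0907
```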

Evidence Layer 3: Controlled A/B Demonstration

The strongest current comparative evidence is the script:

  • automatos-ai/orchestrator/tests/demo_ab_comparison.py

The script runs the same scenario against two backends:

  1. run_vector_field(...)

  2. run_redis_baseline(...)

Experimental setup

  • 3-agent mission scenario

  • 10 research findings

  • 3 analyses

  • 7 downstream queries from Agent C

  • same inputs for both backends

Important limitation:

  • the script uses fake_embed(...) with DIM = 128 and word-overlap semantics for the demo

  • this means the A/B is a controlled mechanism test, not a production-grade benchmark run against a real embedding stack

That is acceptable for early-stage evidence, but it should be disclosed clearly.
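To make the limitation concrete, here is a minimal sketch of what a word-overlap embedding with DIM = 128 can look like. It is not the script's fake_embed(...): the function name toy_embed, the hashing scheme, and the sample sentences are assumptions for illustration only.

```python
import hashlib
import math

DIM = 128  # same dimension as named in the text


def toy_embed(text: str) -> list[float]:
    """Hash each word into one of DIM buckets, then L2-normalize the counts."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


query = toy_embed("customer churn analysis")
overlapping = toy_embed("customer churn report")
unrelated = toy_embed("weather forecast tomorrow")
print(cosine(query, overlapping) > cosine(query, unrelated))
```

An embedding of this kind scores texts only by shared words, so it cannot capture paraphrase or synonymy. That is why results obtained with it speak to the mechanism and not to retrieval quality under a real embedding model.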

Reported Comparative Results

As reported in PRD-108-ALGORITHMS.md, part of the March 21, 2026 PRD-108 evidence set:

Metric | Vector Field | Message Passing
Context coverage | 86% (6/7) | 43% (3/7)
Information loss | 1 finding | 4 findings
Patterns visible to Agent C | 13 | 6

Interpretation:

  • the vector field preserved broader downstream visibility

  • the baseline exposed only what Agent B forwarded

  • the vector field answered more of Agent C's downstream needs
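The coverage percentages reported above follow directly from the answered-query counts over the 7 downstream queries. The sketch below reproduces that arithmetic; the function name is hypothetical, and treating coverage as answered queries divided by total queries is an assumption about how the script scores it.

```python
def coverage_percent(answered: int, total: int) -> int:
    """Share of queries answered, rounded to the nearest whole percent."""
    return round(100 * answered / total)


TOTAL_QUERIES = 7  # downstream queries from Agent C, per the experimental setup
reported_answered = {"Vector Field": 6, "Message Passing": 3}

for backend, answered in reported_answered.items():
    print(f"{backend}: {coverage_percent(answered, TOTAL_QUERIES)}% ({answered}/{TOTAL_QUERIES})")
```

With only 7 queries, each query moves coverage by roughly 14 percentage points, which is one reason the result should be read as a controlled demonstration and not as a statistically robust benchmark.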

What This Evidence Supports

The current A/B evidence supports these claims:

  1. A shared semantic field can reduce information loss caused by sequential forwarding.

  2. Backend swapping through a common interface enables direct comparison with a control.

  3. The field design improves downstream retrieval in the tested scenario.

What This Evidence Does Not Yet Prove

The current evidence does not yet prove:

  1. superiority across all mission types

  2. superiority across all embedding models

  3. production-grade latency/performance in realistic networked deployment

  4. end-to-end business outcome improvements across many workflows

Investor-Safe Summary

The right way to describe the A/B result is:

In a controlled three-agent comparison using the same scenario and a shared interface, the PRD-108 vector-field backend recovered materially more downstream-relevant information than the message-passing baseline.

The wrong way to describe it is:

PRD-108 has conclusively proven that message passing is obsolete.

Strongest Current Evidence Sentence

PRD-108 already has early comparative evidence that its coordination model reduces downstream information loss relative to a simpler message-passing baseline in a controlled scenario.

Next Validation Step

The next credibility upgrade should be a broader A/B package with:

  • more mission types

  • real embedding model configuration

  • explicit scoring rubric

  • raw outputs

  • reproducible command log

That would turn this from promising internal evidence into advisor-grade comparative validation.

Last updated