PRD 08: Universal RAG & Semantic Search System

Version 2.0 - Supercharged with Kimai Context Engineering + LangChain

1. Overview

Purpose

Transform the RAG system from basic vector-only search into a production-grade, universal retrieval system using:

  • David Kimai's Context Engineering principles (hierarchical chunking, cognitive formatting)

  • LangChain's advanced retrievers (hybrid search, reranking)

  • IBM Zurich Cognitive Tools research (structured context scaffolding)

What Was Wrong (v1.0)

| Problem | Impact | Root Cause |
|---|---|---|
| Empty header chunks ranked high | Useless results like "### 3.1 Agent Flow" | Fixed-size chunking broke semantic units |
| No keyword matching | Missed exact term matches | Vector-only search |
| No quality filtering | Garbage results returned | No reranking stage |
| Context without structure | Hard for the LLM to reason | Chunks dumped without formatting |
| Single retrieval method | Limited coverage | No hybrid approach |

What We're Building (v2.0)

┌─────────────────────────────────────────────────────────────────┐
│            UNIVERSAL RAG SERVICE v2.0                           │
│    (Chatbot, Agents, Search, Context Engineering, Workflows)    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌──────────────┐    ┌─────────────────┐   │
│  │   INGEST    │    │   RETRIEVE   │    │    FORMAT       │   │
│  ├─────────────┤    ├──────────────┤    ├─────────────────┤   │
│  │ 1. Markdown │    │ 1. Query     │    │ 1. Cognitive    │   │
│  │    Header   │    │    Transform │    │    Structure    │   │
│  │    Split    │    │    (expand)  │    │                 │   │
│  │             │    │              │    │ 2. Source       │   │
│  │ 2. Parent/  │    │ 2. Hybrid    │    │    Citations    │   │
│  │    Child    │    │    Search    │    │                 │   │
│  │    Storage  │    │    (V+BM25)  │    │ 3. Token        │   │
│  │             │    │              │    │    Budget       │   │
│  │ 3. Quality  │    │ 3. Rerank    │    │                 │   │
│  │    Filter   │    │    (Cross-   │    │                 │   │
│  │             │    │     Encoder) │    │                 │   │
│  └─────────────┘    └──────────────┘    └─────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

2. Research Foundation

2.1 David Kimai's Context Engineering

From the Context Engineering repository:

"Context is not just the single prompt users send to an LLM. Context is the complete information payload provided at inference time."

Key Principles Applied:

  1. Hierarchical Chunking - Semantic boundaries with parent/child relationships

  2. Hybrid Search - Vector similarity + keyword matching (BM25)

  3. Cognitive Tools - Structured prompts that scaffold reasoning

  4. Query Transformation - Expand and reformulate queries
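
As an illustration of principle 4, a minimal query-transformation sketch in pure Python. A production version would ask an LLM to reformulate the query; here a synonym map stands in for that step, and all names are illustrative:

```python
def expand_query(query, synonyms=None):
    """Produce query variants for retrieval: the original query plus
    keyword-substituted reformulations (a stand-in for LLM-based expansion)."""
    synonyms = synonyms or {}
    variants = [query]
    for term, alternatives in synonyms.items():
        if term in query.lower():
            for alt in alternatives:
                variants.append(query.lower().replace(term, alt))
    # De-duplicate while preserving order
    seen, out = set(), []
    for v in variants:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```

Each variant is then sent through retrieval, and the result sets are merged downstream, which improves recall for queries that use different vocabulary than the documents.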

2.2 IBM Zurich Cognitive Tools Research

From Eliciting Reasoning in Language Models:

"Cognitive tools break down the problem by identifying main concepts, extracting relevant information, and highlighting meaningful properties."

Applied to RAG:

  • Format retrieved context with structure (headers, sources, relevance)

  • Don't just dump chunks - scaffold reasoning

  • Use structured formats (Markdown, JSON) for better LLM parsing

2.3 LangChain Components

| Component | Purpose |
|---|---|
| MarkdownHeaderTextSplitter | Keep headers WITH their content |
| ParentDocumentRetriever | Store small chunks, return parent context |
| EnsembleRetriever | Combine vector + BM25 search |
| BM25Retriever | Keyword/exact-match search |
| FlashrankRerank | Cross-encoder reranking (free) |
| ContextualCompressionRetriever | Quality filtering after retrieval |


3. Technical Architecture

3.1 Chunking Pipeline (SmartChunker)

Problem Solved: Empty headers, broken semantic units, no context
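
The production chunker builds on LangChain's MarkdownHeaderTextSplitter; the core idea, keeping each header attached to its body and dropping body-less sections (the v1.0 "empty header chunk" bug), can be sketched in pure Python:

```python
import re

def split_by_headers(markdown):
    """Split markdown on headers, keeping each header WITH its body.
    Sections with an empty body are dropped. Text before the first
    header is ignored in this sketch."""
    chunks = []
    current_header, current_body = None, []

    def flush():
        if current_header is not None and any(l.strip() for l in current_body):
            chunks.append(current_header + "\n" + "\n".join(current_body).strip())

    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):
            flush()
            current_header, current_body = line.strip(), []
        else:
            current_body.append(line)
    flush()
    return chunks
```

A header like "### 3.1 Agent Flow" with no body never becomes a chunk, so it can never rank high.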

3.2 Hybrid Retriever

Problem Solved: Vector search misses exact keyword matches
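
In LangChain this is an EnsembleRetriever over a vector retriever and a BM25Retriever. The merging step it performs, reciprocal rank fusion (RRF), can be sketched as:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked result lists (e.g. vector hits and BM25 hits) into one ranking.
    Each document scores 1 / (k + rank) per list; scores sum across lists,
    so documents found by BOTH retrievers rise to the top."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked #3 by vector search and #1 by BM25 beats one ranked #1 by vector search alone, which is exactly the coverage the hybrid approach buys.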

3.3 Reranker

Problem Solved: Initial retrieval returns garbage that matches keywords but doesn't answer the query
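
The real stage uses FlashrankRerank (a local cross-encoder) inside a ContextualCompressionRetriever. The control flow, re-score every candidate against the query and keep only those above a threshold, looks like this (the toy `overlap_score` stands in for the cross-encoder):

```python
def rerank(query, candidates, score_fn, top_n=3, min_score=0.2):
    """Re-score retrieved candidates with a (query, text) scorer and keep the best.
    score_fn stands in for a cross-encoder model such as Flashrank."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_n] if score >= min_score]

def overlap_score(query, text):
    """Toy scorer: fraction of query words that appear in the candidate."""
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split())) / len(query_words)
```

Candidates that merely mention a keyword score low against the full query and are filtered out instead of being handed to the LLM.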

3.4 Cognitive Context Formatter

Problem Solved: Dumping chunks without structure makes it hard for the LLM to reason
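
A minimal formatter in this spirit, rendering chunks as structured Markdown with numbered sources and relevance scores under a rough token budget. The field names and word-count token approximation are illustrative, not the actual implementation:

```python
def format_context(chunks, max_tokens=1000):
    """Render retrieved chunks as structured Markdown the LLM can cite.
    chunks: list of dicts with 'text', 'source', 'score'.
    Tokens are approximated as whitespace-separated words."""
    lines, used = ["## Retrieved Context"], 0
    for i, chunk in enumerate(chunks, start=1):
        cost = len(chunk["text"].split())
        if used + cost > max_tokens:
            break  # enforce the token budget
        lines.append(f"\n### [{i}] {chunk['source']} (relevance: {chunk['score']:.2f})")
        lines.append(chunk["text"])
        used += cost
    return "\n".join(lines)
```

The numbered headers give the LLM citation anchors ("according to [1]…") instead of an undifferentiated wall of text.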

3.5 Universal RAG Service

The main service used by ALL components:
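
A sketch of the shape such a service could take: a thin pipeline wiring retrieve → rerank → format behind one method, so chatbot, agents, search, and workflows all call the same entry point. All names here are illustrative assumptions:

```python
class UniversalRAGService:
    """One entry point for all consumers: chatbot, agents, search, workflows.
    Illustrative sketch; the real service wires LangChain components."""

    def __init__(self, retriever, reranker, formatter):
        self.retriever = retriever    # hybrid vector + BM25 retrieval
        self.reranker = reranker      # cross-encoder quality filter
        self.formatter = formatter    # cognitive context formatting

    def retrieve(self, query, top_k=5):
        candidates = self.retriever(query)
        best = self.reranker(query, candidates)[:top_k]
        return self.formatter(best)
```

Because each stage is injected, a consumer can swap the formatter (e.g. raw chunks for agents, Markdown for the chatbot) without touching retrieval.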


4. Database Schema Updates

4.1 Enhanced document_chunks Table

4.2 RAG Configuration Table


5. API Endpoints

5.1 Universal RAG Retrieve

Request:

Response:
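
The exact wire format is not pinned down here; a plausible shape for the retrieve endpoint, shown as Python dicts, with every field name an assumption:

```python
# Hypothetical request payload for the universal retrieve endpoint
# (all field names are assumptions, not the final API contract)
request = {
    "query": "How does AgentFactory work?",
    "top_k": 5,
    "format": "markdown",  # cognitive formatting vs raw chunks
}

# Hypothetical response: reranked chunks plus the formatted context block
response = {
    "context": "## Retrieved Context\n...",
    "chunks": [
        {"text": "...", "source": "agents.md", "score": 0.91},
    ],
    "tokens_used": 412,
}
```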

5.2 Document Re-indexing

Triggers full re-chunking and re-embedding with new SmartChunker.


6. Implementation Phases

Phase 1: Smart Chunking (3 hours)

Phase 2: Hybrid Search (2 hours)

Phase 3: Reranking (1 hour)

Phase 4: Cognitive Formatting (1.5 hours)

Phase 5: Universal Service (2 hours)

Phase 6: Re-index & Test (1.5 hours)

Total: ~11 hours


7. Success Criteria

7.1 Chunking Quality

7.2 Retrieval Quality

7.3 Performance

7.4 Universal Usage


8. Testing Queries

After implementation, test with:

| Query | Expected Top Result |
|---|---|
| "How does AgentFactory work?" | Actual AgentFactory code with the create_agent() method |
| "Show me agent creation flow" | Diagram + explanation from AGENT_FLOW_GUIDE.md |
| "database schema" | SQL schema definitions, not just mentions |
| "workflow execution" | WorkflowExecutor class code |
| "RAG retrieval" | This PRD or the RAG service code |


9. Migration from v1.0

Steps:

  1. Stop backend

  2. Run database migrations (add new columns)

  3. Delete all document_chunks (will re-create)

  4. Deploy new code

  5. Start backend

  6. Trigger a re-index via the API, or re-upload all documents

Rollback:

If issues occur, revert to v1.0 by:

  1. Dropping new columns

  2. Re-deploying old code

  3. Re-importing documents with old chunker


10. Future Enhancements

v2.1 (Next iteration)

v2.2 (Future)


This PRD transforms RAG from "barely working" to "production-grade" using proven research and battle-tested LangChain components.
