PRD-11: CodeGraph Implementation & Integration

Status: Draft
Priority: High
Effort: 12-16 hours
Dependencies: None (standalone feature)


1. Overview

1.1 Purpose

CodeGraph transforms client codebases into AI-readable knowledge graphs, so agents can pull precise code context instead of entire repositories. It is built for client and user codebases, not for Automatos's own code (although Automatos itself can be indexed for internal, meta purposes).

1.2 Vision Alignment

Following the Context Engineering paradigm:

  • Atoms: Individual code symbols (functions, classes)

  • Molecules: Symbol relationships (calls, imports, dependencies)

  • Cells: Contextual code chunks with embeddings

  • Organs: Multi-file analysis with call graphs

  • Organisms: Complete codebase understanding

1.3 Value Proposition

"Turn any codebase into an AI-readable knowledge graph. Instead of dumping 10,000 files into context, agents get laser-focused, relevant code snippets."

Target Impact:

  • 3 weeks → 30 minutes: Developer onboarding time

  • 2-3 days → 2 minutes: Security review automation

  • 4 weeks → 3 days: Legacy migration projects


2. Problem Statement

2.1 Current State

What Exists:

  • Basic UI in CodeGraphPanel.tsx (69 lines)

  • Backend API stubs exist but are incomplete

  • Database schema is only partially implemented

  • No multi-project support

  • No GitHub integration

  • No analytics

What's Missing:

  • Multi-source indexing (GitHub, GitLab, local)

  • Background re-indexing

  • Call graph analysis

  • Complexity heatmaps

  • Query analytics

  • Workflow integration

  • Chatbot integration

2.2 Business Impact

Without CodeGraph:

  • Agents can't access client code intelligently

  • Manual code review takes days

  • No automated legacy migration

  • No intelligent refactoring

  • No code-aware chatbot assistance


3. Success Criteria

3.1 Functional Requirements

3.2 Performance Requirements

3.3 Quality Requirements


4. Functional Requirements

4.1 Code Indexing

4.1.1 Multi-Source Support

Local Directory:

GitHub Repository:

GitLab Repository:
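The three sources differ mainly in which fields an index request needs. A minimal sketch of a request model for POST /api/code-graph/index, assuming hypothetical field names (`source_type`, `path`, `repo_url`, `branch`, `credential_id`, `languages`) that are not defined elsewhere in this PRD:

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

# Hypothetical source types mirroring section 4.1.1.
SourceType = Literal["local", "github", "gitlab"]


@dataclass
class IndexRequest:
    """Hypothetical body for POST /api/code-graph/index (field names are assumptions)."""

    project_name: str
    source_type: SourceType
    path: Optional[str] = None           # directory path for "local" sources
    repo_url: Optional[str] = None       # clone URL for "github" / "gitlab" sources
    branch: str = "main"                 # branch to clone; ignored for local sources
    credential_id: Optional[str] = None  # reference to a stored token for private repos
    languages: list[str] = field(default_factory=list)  # empty list = auto-detect

    def validate(self) -> None:
        """Raise ValueError if the fields required by the chosen source are missing."""
        if self.source_type == "local":
            if not self.path:
                raise ValueError("local sources require 'path'")
        elif self.source_type in ("github", "gitlab"):
            if not self.repo_url:
                raise ValueError(f"{self.source_type} sources require 'repo_url'")
        else:
            raise ValueError(f"unsupported source_type: {self.source_type!r}")


# Example: a GitHub source with a clone URL passes validation.
example = IndexRequest("billing-service", "github",
                       repo_url="https://github.com/acme/billing.git")
example.validate()
```

Keeping one request shape with a `source_type` discriminator lets Bitbucket (listed as out of scope) be added later as another branch in `validate` without a new endpoint.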

4.1.2 Language Support

| Language | Parser | Status | Features |
|---|---|---|---|
| Python | tree-sitter-python | ✅ Ready | Classes, functions, imports, decorators |
| TypeScript | tree-sitter-typescript | ✅ Ready | Interfaces, types, classes, exports |
| JavaScript | tree-sitter-javascript | ✅ Ready | Functions, classes, modules |
| Go | tree-sitter-go | ⚠️ Partial | Functions, structs, interfaces |
| Rust | tree-sitter-rust | ⚠️ Partial | Functions, structs, traits, impls |
| Java | tree-sitter-java | ⚠️ Partial | Classes, methods, interfaces |
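Before parsing, each discovered file has to be routed to a grammar. A minimal sketch of that routing, assuming extension-based detection (the `LANGUAGE_BY_EXTENSION` table and `detect_language` helper are hypothetical names):

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension -> (language, tree-sitter grammar package) table.
# Note: tree-sitter-typescript ships separate "typescript" and "tsx" grammars.
LANGUAGE_BY_EXTENSION = {
    ".py": ("python", "tree-sitter-python"),
    ".ts": ("typescript", "tree-sitter-typescript"),
    ".tsx": ("typescript", "tree-sitter-typescript"),
    ".js": ("javascript", "tree-sitter-javascript"),
    ".jsx": ("javascript", "tree-sitter-javascript"),
    ".go": ("go", "tree-sitter-go"),
    ".rs": ("rust", "tree-sitter-rust"),
    ".java": ("java", "tree-sitter-java"),
}

# Languages marked "Partial" in the table above.
PARTIAL_SUPPORT = {"go", "rust", "java"}


def detect_language(path: str) -> Optional[str]:
    """Return the language for a file path, or None if it is not indexed."""
    entry = LANGUAGE_BY_EXTENSION.get(Path(path).suffix.lower())
    return entry[0] if entry else None
```

Files that map to `None` (Markdown, images, lockfiles) are simply skipped rather than treated as parse errors.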

4.1.3 Symbol Extraction

Extract and index:

  • Functions/Methods: Name, parameters, return type, docstring

  • Classes/Structs: Name, inheritance, methods, properties

  • Imports: Dependencies, source modules

  • Variables: Global/class-level constants

  • Types: Interfaces, enums, type aliases

  • Comments: Documentation strings

4.1.4 Relationship Mapping

Build graphs of:

  • Call Graph: Function A calls Function B

  • Import Graph: Module A imports Module B

  • Inheritance Graph: Class A extends Class B

  • Dependency Graph: Package A depends on Package B

4.2 Code Search

4.2.1 Symbol Search

Response:

4.2.2 Semantic Search

Response: Returns semantically relevant code chunks with embeddings.
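Semantic search ranks stored code chunks by similarity between the query embedding and each chunk embedding. A minimal in-memory sketch using cosine similarity; the `Chunk` record, `semantic_search` function, and `min_score` cut-off are hypothetical, and the embedding model itself is out of scope here:

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    file_path: str
    start_line: int
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding: list[float], chunks: list[Chunk],
                    top_k: int = 5, min_score: float = 0.0) -> list[tuple[float, Chunk]]:
    """Return up to top_k (score, chunk) pairs, best first."""
    scored = [(cosine(query_embedding, c.embedding), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With pgvector the same ranking would run inside PostgreSQL, ordering by the cosine-distance operator `<=>` instead of scanning chunks in Python.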

4.2.3 Call Graph Queries

Response:
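A call graph query typically starts at one symbol and expands outward a bounded number of hops, either to what it calls or to what calls it. A minimal sketch as a breadth-first search over a caller-to-callees map; the `depth` and `direction` parameters are assumptions about how GET /api/code-graph/call-graph might be parameterized:

```python
from collections import defaultdict, deque


def call_graph_query(calls: dict[str, set[str]], root: str, depth: int = 2,
                     direction: str = "callees") -> dict[str, int]:
    """Breadth-first walk of the call graph; returns {symbol: hops from root}."""
    if direction == "callees":
        adjacency = calls
    elif direction == "callers":
        # Invert the caller -> callees map so the walk goes upwards.
        adjacency = defaultdict(set)
        for caller, callees in calls.items():
            for callee in callees:
                adjacency[callee].add(caller)
    else:
        raise ValueError("direction must be 'callees' or 'callers'")

    distances = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distances[node] >= depth:
            continue  # do not expand beyond the requested depth
        for neighbor in sorted(adjacency.get(node, set())):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances
```

Bounding the depth keeps responses small on large codebases, which matters for the "Call graph generation: <1s" target in section 11.1.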

4.3 Project Management

4.3.1 List Projects

Response:

4.3.2 Delete Project

Removes all indexed data permanently.

4.3.3 Re-index Project
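Re-indexing does not have to reprocess every file; section 12.1 lists incremental indexing as the mitigation for large repositories. A minimal sketch of planning an incremental re-index by comparing stored content hashes with the current files; `plan_reindex` and `ReindexPlan` are hypothetical names, and SHA-256 is an assumed hash choice:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ReindexPlan:
    added: list[str]
    changed: list[str]
    removed: list[str]
    unchanged: list[str]


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_reindex(previous: dict[str, str], current: dict[str, bytes]) -> ReindexPlan:
    """previous: path -> stored hash; current: path -> file bytes on disk now."""
    current_hashes = {path: content_hash(data) for path, data in current.items()}
    added = sorted(p for p in current_hashes if p not in previous)
    removed = sorted(p for p in previous if p not in current_hashes)
    changed = sorted(p for p in current_hashes
                     if p in previous and previous[p] != current_hashes[p])
    unchanged = sorted(p for p in current_hashes
                       if p in previous and previous[p] == current_hashes[p])
    return ReindexPlan(added, changed, removed, unchanged)
```

Only `added` and `changed` files need parsing and embedding; symbols and chunks for `removed` files are deleted, and `unchanged` files are left as-is.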

4.4 Analytics

4.4.1 Query Analytics

Track:

  • Most queried files

  • Popular search terms

  • Average query latency

  • Search success rate
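The four tracked metrics can be computed from a simple query log. A minimal sketch, assuming a hypothetical `QueryEvent` record; how "success" is defined (for example, the agent used at least one result) is an assumption this PRD leaves open:

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryEvent:
    query: str
    latency_ms: float
    result_files: list[str]
    success: bool  # hypothetical definition, e.g. at least one result was used


def summarize(events: list[QueryEvent], top_n: int = 3) -> dict:
    """Most queried files, popular terms, average latency, and success rate."""
    if not events:
        return {"top_files": [], "top_terms": [], "avg_latency_ms": 0.0, "success_rate": 0.0}
    files = Counter(f for e in events for f in e.result_files)
    terms = Counter(t for e in events for t in e.query.lower().split())
    return {
        "top_files": files.most_common(top_n),
        "top_terms": terms.most_common(top_n),
        "avg_latency_ms": mean(e.latency_ms for e in events),
        "success_rate": sum(e.success for e in events) / len(events),
    }
```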

4.4.2 Complexity Metrics

Response:


5. Technical Architecture

5.1 System Architecture

5.2 Database Schema
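The schema content for this section is not included here. As a rough illustration only, here is a sketch of the tables implied by sections 4.1 to 4.3 (projects, files, symbols, relationship edges, and embedded chunks), using SQLite from the standard library as a stand-in for PostgreSQL. All table and column names are hypothetical. In PostgreSQL, the `embedding` column would be a pgvector `vector(n)` column rather than a BLOB.

```python
import sqlite3

# Hypothetical schema; SQLite stand-in for PostgreSQL + pgvector.
SCHEMA = """
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    source_type TEXT NOT NULL CHECK (source_type IN ('local', 'github', 'gitlab')),
    source_uri TEXT NOT NULL,
    indexed_at TEXT
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    path TEXT NOT NULL,
    language TEXT,
    content_hash TEXT NOT NULL,
    UNIQUE (project_id, path)
);
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    kind TEXT NOT NULL,
    name TEXT NOT NULL,
    start_line INTEGER NOT NULL,
    docstring TEXT
);
CREATE INDEX idx_symbols_name ON symbols(name);
CREATE TABLE edges (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    kind TEXT NOT NULL CHECK (kind IN ('calls', 'imports', 'inherits', 'depends_on')),
    source TEXT NOT NULL,
    target TEXT NOT NULL
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL,
    content TEXT NOT NULL,
    embedding BLOB  -- pgvector: embedding vector(n)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite enforces foreign keys (and ON DELETE CASCADE) only when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

The `ON DELETE CASCADE` chain is one way to implement 4.3.2: deleting a project row removes its files, symbols, edges, and chunks.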


6. Implementation Details

6.1 Indexing Pipeline
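The pipeline combines file discovery, parsing, and storage, and should degrade gracefully on files it cannot parse (section 12.1). A minimal sketch for Python files only, with `ast` standing in for tree-sitter; `SKIP_DIRS`, `IndexResult`, and `index_directory` are hypothetical names, and results are kept in memory instead of being written to the database:

```python
import ast
import os
from dataclasses import dataclass, field
from pathlib import Path

# Directories that rarely contain first-party code (assumed list).
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "dist", "build"}


@dataclass
class IndexResult:
    files_indexed: int = 0
    symbols: dict[str, list[str]] = field(default_factory=dict)  # path -> top-level names
    errors: dict[str, str] = field(default_factory=dict)         # path -> error type


def discover_files(root: str, suffixes: tuple[str, ...] = (".py",)):
    """Yield matching files under root, pruning SKIP_DIRS in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(suffixes):
                yield Path(dirpath) / name


def index_directory(root: str) -> IndexResult:
    result = IndexResult()
    for path in sorted(discover_files(root)):
        rel = path.relative_to(root).as_posix()
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError) as exc:
            result.errors[rel] = type(exc).__name__  # record and skip, do not abort
            continue
        result.symbols[rel] = [n.name for n in tree.body
                               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
        result.files_indexed += 1
    return result
```

Recording errors per file, rather than failing the whole job, lets the project status report which files were skipped.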

6.2 Search Implementation
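Symbol search (GET /api/code-graph/search) matches names rather than meaning. A minimal sketch of one possible ranking: exact match, then prefix, then substring, then a fuzzy fallback using `difflib.SequenceMatcher`. The score values and the 0.6 fuzzy cut-off are arbitrary assumptions. In production this could move into the database, for example with a trigram index from PostgreSQL's pg_trgm extension.

```python
import difflib


def search_symbols(query: str, names: list[str], limit: int = 10) -> list[tuple[str, float]]:
    """Rank symbol names against a query; returns (name, score) pairs, best first."""
    q = query.lower()
    scored: list[tuple[str, float]] = []
    for name in names:
        n = name.lower()
        if n == q:
            score = 1.0          # exact match
        elif n.startswith(q):
            score = 0.9          # prefix match
        elif q in n:
            score = 0.7          # substring match
        else:
            ratio = difflib.SequenceMatcher(None, q, n).ratio()
            if ratio < 0.6:
                continue         # too dissimilar to show
            score = 0.5 * ratio  # fuzzy matches always rank below substring hits
        scored.append((name, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]
```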


7. Workflow Integration

7.1 Automatic Context Injection

When a workflow includes codegraph_project in its context, agents automatically get relevant code:
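Injected context has to fit the agent's prompt budget. A minimal sketch of packing the highest-scoring snippets into a fixed token budget when `codegraph_project` is set; `CodeSnippet`, `build_code_context`, and the 4-characters-per-token estimate are all assumptions (a real implementation would use the target model's tokenizer):

```python
from dataclasses import dataclass


@dataclass
class CodeSnippet:
    file_path: str
    start_line: int
    text: str
    score: float  # relevance from symbol or semantic search


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token.
    return max(1, len(text) // 4)


def build_code_context(snippets: list[CodeSnippet], token_budget: int = 2000) -> str:
    """Greedily add the most relevant snippets that still fit in the budget."""
    parts: list[str] = []
    used = 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        block = f"# {s.file_path}:{s.start_line}\n{s.text}\n"
        cost = estimate_tokens(block)
        if used + cost > token_budget:
            continue  # skip oversized snippets, keep trying smaller ones
        parts.append(block)
        used += cost
    return "\n".join(parts)
```

Each snippet keeps its `file:line` header so the agent can cite where the code came from.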

7.2 Workflow Example

The agent automatically receives:

  • Existing authentication patterns

  • Database query examples

  • Security middleware code

  • Related test cases


8. Chatbot Integration

8.1 Chat Interface with Code Context

Response:


9. API Endpoints Summary

| Endpoint | Method | Purpose |
|---|---|---|
| /api/code-graph/index | POST | Index new project |
| /api/code-graph/projects | GET | List all projects |
| /api/code-graph/projects/{id} | GET | Get project details |
| /api/code-graph/projects/{id} | DELETE | Delete project |
| /api/code-graph/projects/{id}/reindex | POST | Re-index project |
| /api/code-graph/search | GET | Symbol search |
| /api/code-graph/search | POST | Semantic search |
| /api/code-graph/call-graph | GET | Generate call graph |
| /api/code-graph/analytics/queries | GET | Query analytics |
| /api/code-graph/analytics/complexity | GET | Code complexity metrics |
| /api/code-graph/health | GET | System health check |


10. Implementation Timeline

Phase 1: Core Indexing (Week 1)

Day 1-2: Database schema + basic indexer

  • Create tables

  • Implement file discovery

  • Basic tree-sitter integration

Day 3-4: Symbol extraction

  • Python parser

  • TypeScript parser

  • Symbol storage

Day 5: Testing

  • Index test projects

  • Verify symbol extraction

  • Performance testing

Phase 2: Search & Relationships (Week 2)

Day 1-2: Search implementation

  • Symbol search

  • Semantic search

  • Query API

Day 3-4: Relationship mapping

  • Call graph builder

  • Dependency tracking

  • Graph queries

Day 5: Analytics

  • Query tracking

  • Complexity metrics

  • Dashboard integration

Phase 3: Integration (Week 3)

Day 1-2: Workflow integration

  • Context injection

  • Agent access

  • Testing with workflows

Day 3-4: Multi-source support

  • GitHub cloning

  • GitLab support

  • Credential management

Day 5: UI enhancement

  • Advanced UI components

  • Project management

  • Analytics visualization


11. Success Metrics

11.1 Performance

  • Index 10K lines: <10s

  • Symbol search: <100ms

  • Semantic search: <500ms

  • Call graph generation: <1s

11.2 Quality

  • Symbol extraction accuracy: >95%

  • Search relevance: >85%

  • Relationship accuracy: >90%

11.3 Adoption

  • Projects indexed per user: >3

  • Queries per day: >50

  • Agent usage rate: >70%


12. Risk Mitigation

12.1 Technical Risks

  • Large repos: Implement incremental indexing

  • Parse errors: Graceful degradation, skip unparseable files

  • Storage costs: Cleanup old/unused projects automatically

  • Performance: Cache queries, optimize indexes

12.2 Quality Risks

  • Search relevance: User feedback loops, tuning

  • Symbol accuracy: Language-specific testing

  • Relationship mapping: Validate with known codebases


13. Dependencies

  • tree-sitter: v0.20+ (symbol parsing)

  • tree-sitter-languages: Language grammar support

  • networkx: v3.0+ (graph analysis)

  • pgvector: PostgreSQL extension (vector search)

  • GitPython: v3.1+ (Git repository handling)


14. Out of Scope (Future Enhancements)

  • Bitbucket support (add later)

  • Real-time file watching (webhook-based)

  • Code diff analysis

  • Historical code search (git blame integration)

  • Multi-language projects in a single query

  • Custom language parsers

  • AI-powered code suggestions

  • Automated refactoring suggestions


15. Acceptance Criteria

15.1 Functional

15.2 Non-Functional

15.3 Quality


Total Effort: 12-16 hours (2 weeks)
Priority: High (enables code-aware agents)
Expected ROI: onboarding cut from 3 weeks to 30 min; reviews cut from 2-3 days to 2 min

This PRD enables the complete CodeGraph system from indexing to workflow integration, transforming how AI agents interact with code.
