PRD-28: Vercel AI SDK Migration
1. Executive Summary
This document outlines the plan to migrate the Automatos AI Platform's workflow streaming architecture from a custom Server-Sent Events (SSE) implementation to the Vercel AI SDK. This migration aims to resolve persistent UI latency issues ("chunking/lag"), standardize the streaming protocol, and enable advanced UX features like smooth token streaming, automatic reconnections, and rich UI interactions.
2. Problem Statement
The current workflow streaming implementation relies on a custom SSE setup (WorkflowStageTracker -> Redis/Memory -> SSE Endpoint -> EventSource). While functional, it suffers from:
Latency & Jitter: Events often arrive in bursts due to buffering at various network layers (proxies, Nginx, browser), causing a "laggy" feel.
Complexity: Maintaining custom connection management, heartbeats, and error recovery logic is error-prone.
Limited UX: Implementing "typing effects" or smooth token updates requires significant custom frontend logic.
Synchronization Issues: Disconnects between backend state and frontend UI (e.g., "stuck" stages) due to missing events or race conditions.
3. Proposed Solution: Vercel AI SDK
We will adopt the Vercel AI SDK (specifically the Data Stream Protocol) as the standard for all real-time communication between the Orchestrator and the Frontend.
Key Benefits
Standardized Protocol: Uses a robust, text-based protocol for streaming text, data, and tool calls.
Optimized Streaming: Designed specifically to minimize latency and handle token-by-token updates smoothly.
Resilience: Built-in automatic reconnection and error handling.
Developer Experience: Simple hooks (useChat, useCompletion) replace complex EventSource management.
4. Architecture Design
4.1 Data Stream Protocol
The backend will emit events in the AI SDK's Data Stream Protocol format. Each chunk is a line of text:
0: Text delta (for LLM tokens)
d: Data payload (for workflow stage updates, logs, JSON objects)
e: Error information
4.2 Backend Architecture (FastAPI)
We will create a generic StreamAdapter that converts our internal workflow events into the AI SDK format.
Current Flow: WorkflowStageTracker -> WorkflowStreamManager -> SSE Generator (Custom JSON)
New Flow: WorkflowStageTracker -> AISDKStreamAdapter -> StreamingResponse (AI SDK Protocol)
Code Example (Adapter):
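A minimal sketch of the adapter, assuming internal events arrive as dicts with a `type` field and using the prefix convention from §4.1. The event shapes and the comment's endpoint wiring are illustrative assumptions, not the actual WorkflowStageTracker API:

```python
# Sketch of AISDKStreamAdapter: converts internal workflow events into
# AI SDK Data Stream Protocol lines (0: text, d: data, e: error).
# Event dict shapes are assumptions for illustration.
import json
from typing import AsyncIterator


class AISDKStreamAdapter:
    """Wraps an internal event stream and yields protocol-formatted chunks."""

    async def adapt(self, events: AsyncIterator[dict]) -> AsyncIterator[str]:
        async for event in events:
            kind = event.get("type")
            if kind == "token":
                # 0: text delta (LLM tokens), body is a JSON-encoded string
                yield f'0:{json.dumps(event["text"])}\n'
            elif kind == "error":
                # e: error information
                yield f'e:{json.dumps({"message": event["message"]})}\n'
            else:
                # d: data payload (stage updates, logs, arbitrary JSON)
                yield f'd:{json.dumps(event)}\n'


# In the FastAPI endpoint this generator would be returned roughly as:
#   StreamingResponse(adapter.adapt(tracker.events(workflow_id)),
#                     media_type="text/plain; charset=utf-8")
```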
4.3 Frontend Architecture (Next.js)
We will replace the custom useWorkflowExecution hook with the AI SDK's useChat or useCompletion hooks.
Integration:
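The useChat hook handles protocol parsing, reconnection, and state updates internally. Purely for illustration of what the hook replaces, a minimal hand-rolled parser for the line format described in §4.1 could look like this (the event type names are assumptions, not AI SDK types):

```typescript
// Minimal parser for the stream format from §4.1.
// Each line is `<prefix>:<json>` — 0 = text delta, d = data, e = error.
// These union members are illustrative; useChat normally does this for you.
type StreamEvent =
  | { kind: "text"; text: string }
  | { kind: "data"; payload: unknown }
  | { kind: "error"; error: unknown };

function parseChunk(chunk: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const line of chunk.split("\n")) {
    if (!line) continue; // skip trailing blank line
    const sep = line.indexOf(":");
    const prefix = line.slice(0, sep);
    const body = JSON.parse(line.slice(sep + 1));
    if (prefix === "0") events.push({ kind: "text", text: body });
    else if (prefix === "e") events.push({ kind: "error", error: body });
    else events.push({ kind: "data", payload: body });
  }
  return events;
}
```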
5. Implementation Plan
Phase 1: Backend Adapter (1-2 Days)
Install the ai package (if a Python SDK exists) or implement the protocol manually (simple JSON wrapping).
Create an AISDKStreamAdapter class in services/workflow_streaming_service.py.
Create a new endpoint /api/workflows/{id}/stream/aisdk that uses this adapter.
Ensure both the old SSE and new AI SDK endpoints work in parallel (for a safe migration).
Phase 2: Frontend Integration (1-2 Days)
Install the ai package: npm install ai.
Create a Next.js Route Handler (app/api/chat/route.ts) to proxy requests to the FastAPI backend (this avoids CORS issues and handles streaming headers correctly).
Create a new component, WorkflowStreamViewer, using useChat.
Map the data stream to the existing ExecutionTheater state (stages, logs).
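The last step, mapping data parts onto ExecutionTheater state, could be a small reducer like the following sketch. The state shape and event fields are assumptions for illustration, not the actual ExecutionTheater types:

```typescript
// Sketch: fold `d:` data payloads into ExecutionTheater-style state.
// State shape and event fields are assumptions for illustration.
interface TheaterState {
  stages: Record<string, string>; // stage name -> status
  logs: string[];
}

function applyDataEvent(state: TheaterState, event: any): TheaterState {
  switch (event.type) {
    case "stage_update":
      return {
        ...state,
        stages: { ...state.stages, [event.stage]: event.status },
      };
    case "log":
      return { ...state, logs: [...state.logs, event.message] };
    default:
      return state; // ignore unknown payload types
  }
}
```

Inside WorkflowStreamViewer, the data parts exposed by useChat would be reduced with this function as they arrive.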
Phase 3: Verification & Switchover (1 Day)
Run side-by-side comparison of old vs. new streaming.
Verify latency improvement (measure time-to-first-token and stage update delay).
Deprecate old SSE endpoint.
Remove legacy WorkflowStreamManager code.
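Time-to-first-token for the side-by-side comparison can be measured with a small harness over any async chunk iterator (e.g. an httpx response stream pointed at either endpoint). This is a sketch; the stream source is left abstract:

```python
# Sketch: measure time-to-first-chunk over any async iterator of stream
# chunks, usable against both the old SSE and new AI SDK endpoints.
import time
from typing import AsyncIterator, Optional


async def time_to_first_chunk(stream: AsyncIterator[str]) -> Optional[float]:
    """Return seconds until the first chunk arrives, or None if the stream is empty."""
    start = time.monotonic()
    async for _ in stream:
        return time.monotonic() - start
    return None
```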
6. Migration Strategy
Dual-Stack: We will keep the current SSE implementation running while building the AI SDK implementation.
Feature Flag: Use a feature flag or a separate URL route to toggle between the two streaming methods in the UI.
Rollback: If issues arise, we can instantly revert to the SSE implementation.
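The flag-driven toggle can be as simple as selecting the endpoint URL at runtime. A sketch, assuming the old SSE endpoint lives at `/stream` (the AI SDK path comes from Phase 1; the legacy path is an assumption):

```typescript
// Sketch: pick the streaming endpoint from a feature flag.
// `/stream` for the legacy SSE path is an assumption; `/stream/aisdk`
// mirrors the new endpoint defined in Phase 1.
function streamUrl(workflowId: string, useAiSdk: boolean): string {
  return useAiSdk
    ? `/api/workflows/${workflowId}/stream/aisdk`
    : `/api/workflows/${workflowId}/stream`;
}
```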
7. Success Metrics
Latency: under 50 ms for token updates.
Reliability: Zero "stuck" stages due to missed events.
Code Quality: Reduction in custom streaming code (backend & frontend).