PRD-27: Multi-Provider LLM Integration (AWS Bedrock & HuggingFace)

Status: ✅ Implementation Complete
Created: 2025-11-08
Priority: High
Estimated Effort: 8 hours
Actual Effort: 6 hours


📋 Overview

Purpose

Integrate AWS Bedrock and HuggingFace as additional LLM providers to reduce costs and provide more model options for agents, while maintaining the existing OpenAI, Anthropic, and Google provider support.

Business Value

  • Cost Reduction: 80-95% cost savings using AWS Bedrock models

  • Model Diversity: Access to 16+ models across 6 providers

  • Flexibility: Per-agent model configuration for optimal cost/performance balance

  • Scalability: Hybrid strategy (premium for orchestration, cost-effective for subtasks)

Key Results (KRs)

  • ✅ KR1: AWS Bedrock provider fully integrated with 4 models

  • ✅ KR2: HuggingFace provider integrated with 3 models

  • ✅ KR3: Frontend UI updated to display all providers

  • ✅ KR4: Database seeding updated with 16 total models

  • ⏳ KR5: Cost tracking per provider implemented

  • ⏳ KR6: Testing and validation complete


🎯 Goals

Primary Goals

  1. ✅ Add AWS Bedrock as a new LLM provider

  2. ✅ Expand HuggingFace model offerings

  3. ✅ Update frontend to support new providers

  4. ✅ Maintain backward compatibility with existing agents

  5. ⏳ Enable cost-effective model selection for agents

Non-Goals

  • Removing support for existing providers (OpenAI, Anthropic, Google)

  • Auto-migration of existing agents to new providers

  • Custom model hosting infrastructure


🏗️ Technical Architecture

System Components Modified

1. Backend LLM Provider System

Files:

  • orchestrator/services/llm_provider/clients/base.py - Added AWS_BEDROCK enum

  • orchestrator/services/llm_provider/clients/bedrock_client.py - NEW Bedrock implementation

  • orchestrator/services/llm_provider/clients/__init__.py - Exported BedrockProvider

  • orchestrator/services/llm_provider/manager.py - Registered Bedrock routing
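The wiring described above can be sketched as follows. This is a hypothetical reconstruction: beyond the `AWS_BEDROCK` enum value and the `BedrockProvider` class named in the file list, the class and registry names are assumptions, not the actual codebase.

```python
from enum import Enum

# Hypothetical sketch of the enum + registry wiring listed above; only
# AWS_BEDROCK and BedrockProvider are names confirmed by this PRD.
class LLMProviderType(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"
    HUGGINGFACE = "huggingface"
    AWS_BEDROCK = "aws_bedrock"  # added in base.py by this PRD

class BedrockProvider:
    """Thin wrapper around the bedrock-runtime client (bedrock_client.py)."""
    provider_type = LLMProviderType.AWS_BEDROCK

# manager.py routes each provider type to its client class
PROVIDER_REGISTRY = {
    LLMProviderType.AWS_BEDROCK: BedrockProvider,
}

def get_provider(provider_type: LLMProviderType):
    """Instantiate the client registered for a provider type."""
    return PROVIDER_REGISTRY[provider_type]()
```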

2. Database Models

Files:

  • orchestrator/seeds/seed_models.py - Added 7 new models

New Models:

3. Frontend UI Components

Files:

  • frontend/components/agents/model-selector.tsx - Provider color mappings

  • frontend/components/agents/agent-roster.tsx - Provider badges

  • All agent management components updated


🔧 Implementation Details

AWS Bedrock Provider

Model Family Support

Key Features

  • Unified API: Single client for multiple model families

  • Cost Optimization: Model ID mappings for easy switching

  • Function Calling: Native support for Claude, prompt-based for others

  • Error Handling: Retry logic and graceful fallbacks

  • Token Tracking: Placeholder for usage tracking (TODO: implement)
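As one illustration of the "unified API" point, a minimal sketch of how the client might format a request body for Anthropic models behind Bedrock's `invoke_model` call (the `anthropic_version` value follows the AWS documentation for the Messages API; the function name is illustrative and the actual network call is omitted):

```python
import json

def build_claude_bedrock_body(prompt: str, max_tokens: int = 512) -> str:
    """Build the JSON request body for an Anthropic model behind
    bedrock-runtime invoke_model. Hypothetical helper, not from the codebase."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # per AWS Bedrock docs
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })
```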

HuggingFace Provider

Current Implementation

  • ✅ Already integrated via huggingface_client.py

  • ✅ Supports Inference API

  • ⚠️ Limitation: No native function calling

  • 🔄 Workaround: Prompt-based tool calling (to be implemented)

Enhanced Model Support

Added 3 popular models to database:

  1. Mistral 7B Instruct - General purpose, fast

  2. Llama 2 70B Chat - High quality conversations

  3. Zephyr 7B Beta - Optimized for chat


📝 Configuration Guide

Step 1: Add AWS Credentials

Option B: Environment Variables
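For the environment-variable option, the standard AWS SDK variable names apply (set the region to wherever your Bedrock models are enabled):

```shell
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"
```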

Step 2: Install Dependencies
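boto3 is the only new backend dependency introduced for Bedrock support:

```shell
pip install boto3
```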

Step 3: Seed New Models
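A plausible invocation of the seed script listed earlier (`orchestrator/seeds/seed_models.py`); the exact entry point may differ in your checkout:

```shell
# Hypothetical invocation -- confirm the actual entry point for seed_models.py
python -m orchestrator.seeds.seed_models
```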

Step 4: Verify Installation

Step 5: Create Agent with Bedrock Model

Via UI:

  1. Navigate to Agents > Create Agent

  2. In the Model Configuration tab:

    • Select provider: AWS Bedrock

    • Select model: Claude 3 Haiku (Bedrock) or Llama 3.1 8B (Bedrock)

    • Adjust temperature, max_tokens as needed

  3. Save and activate agent

Via API:
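A sketch of the request payload for creating an agent with a Bedrock model. The endpoint path and field names here are assumptions for illustration; check the orchestrator's agent API for the real schema:

```json
{
  "name": "summarizer-agent",
  "model_config": {
    "provider": "aws_bedrock",
    "model": "claude-3-haiku",
    "temperature": 0.3,
    "max_tokens": 1024
  }
}
```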


🧪 Testing Procedures

Test Plan Overview

  1. Unit Tests: Provider initialization and API calls

  2. Integration Tests: End-to-end workflow with Bedrock models

  3. Cost Validation: Verify token tracking and cost calculations

  4. Performance Tests: Response time comparison across providers

  5. UI Tests: Frontend model selection and display

Manual Testing Checklist

✅ Phase 1: Provider Registration

⏳ Phase 2: Model Availability

⏳ Phase 3: API Endpoint Tests

⏳ Phase 4: Frontend UI Tests

⏳ Phase 5: Cost Tracking


📊 Cost Comparison

Current vs. New Pricing

| Model | Provider | Input ($/M) | Output ($/M) | Use Case | Savings |
|---|---|---|---|---|---|
| GPT-4 | OpenAI | $30.00 | $60.00 | Premium orchestration | Baseline |
| Claude 3.5 Sonnet (Bedrock) | AWS | $3.00 | $15.00 | Orchestration | 80-90% |
| Claude 3 Haiku (Bedrock) | AWS | $0.25 | $1.25 | High-volume tasks | 95-99% |
| Llama 3.1 70B (Bedrock) | AWS | $0.99 | $0.99 | Balanced tasks | 90-97% |
| Llama 3.1 8B (Bedrock) | AWS | $0.22 | $0.22 | Simple tasks | 99%+ |
| Mistral 7B | HuggingFace | FREE | FREE | Development/testing | 100% |

Hybrid Strategy Example
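The hybrid strategy can be sketched as a small routing table: a premium Bedrock model for orchestration, cheap models for subtasks. The task-type keys and model slugs below are illustrative, chosen to mirror the pricing table above.

```python
# Hypothetical routing table for the hybrid strategy described in this PRD.
ROUTING = {
    "orchestration": ("aws_bedrock", "claude-3-5-sonnet"),
    "subtask": ("aws_bedrock", "claude-3-haiku"),
    "simple": ("aws_bedrock", "llama-3-1-8b"),
    "dev": ("huggingface", "mistral-7b-instruct"),
}

def pick_model(task_type: str) -> tuple:
    """Route a task to (provider, model); unknown types fall back to the
    cheap subtask tier rather than the premium model."""
    return ROUTING.get(task_type, ROUTING["subtask"])
```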


🚀 Deployment Checklist

Pre-Deployment

Deployment Steps

Post-Deployment


🔍 Monitoring & Observability

Key Metrics to Track

  1. Provider Usage

    • Requests per provider

    • Token consumption per provider

    • Cost per provider

  2. Performance

    • Response time by provider

    • Error rate by provider

    • Timeout rate

  3. Cost Efficiency

    • Daily/monthly spend by provider

    • Cost per workflow execution

    • Savings vs. baseline (GPT-4)

Logging
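One structured record per LLM call is enough to aggregate all the metrics above (usage, cost, errors) per provider. A minimal sketch, with an assumed field layout:

```python
import json
import logging

logger = logging.getLogger("llm_provider")

def log_llm_call(provider: str, model: str, input_tokens: int,
                 output_tokens: int, cost_usd: float) -> dict:
    """Emit one structured log record per LLM call. Field names are
    illustrative, not the codebase's actual schema."""
    record = {
        "provider": provider,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost_usd, 6),
    }
    logger.info(json.dumps(record))
    return record
```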

Alerts

  • Cost exceeds $100/day

  • Error rate > 5% for any provider

  • Response time > 10s average
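The three alert thresholds above reduce to a simple check (a sketch; the real alerting would live in the monitoring stack, not application code):

```python
def check_alerts(daily_cost: float, error_rate: float, avg_latency_s: float) -> list:
    """Return the names of any thresholds breached, per the alert list above."""
    alerts = []
    if daily_cost > 100:        # cost exceeds $100/day
        alerts.append("cost")
    if error_rate > 0.05:       # error rate > 5%
        alerts.append("error_rate")
    if avg_latency_s > 10:      # response time > 10s average
        alerts.append("latency")
    return alerts
```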


📚 API Reference

New Endpoints

None - existing endpoints now support new providers

Updated Models

AgentModelConfig
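A hypothetical shape for `AgentModelConfig` after this change; the real model lives in the orchestrator codebase and may use Pydantic or SQLAlchemy rather than a plain dataclass:

```python
from dataclasses import dataclass

@dataclass
class AgentModelConfig:
    """Illustrative sketch only; field names/defaults are assumptions."""
    provider: str            # now also accepts "aws_bedrock" and "huggingface"
    model: str
    temperature: float = 0.7
    max_tokens: int = 1024
```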

Model ID Mappings (Bedrock)
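The four seeded Bedrock models map to AWS model IDs roughly as follows. These are the IDs AWS published at the time of writing; verify them against the Bedrock model catalog, since version suffixes and date stamps change over time.

```python
# Friendly-name -> Bedrock model ID. Verify against the Bedrock catalog;
# the "-v1:0" suffixes and date stamps are version-specific.
BEDROCK_MODEL_IDS = {
    "claude-3-5-sonnet": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "llama-3-1-70b": "meta.llama3-1-70b-instruct-v1:0",
    "llama-3-1-8b": "meta.llama3-1-8b-instruct-v1:0",
}
```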


🐛 Known Issues & Limitations

Limitations

  1. Token Tracking: Bedrock token counts are placeholders (TODO: implement)

  2. Function Calling: Only Claude models on Bedrock have native support

  3. HuggingFace: No native function calling support

  4. Rate Limits: AWS Bedrock has model-specific rate limits

  5. Availability: Some models (e.g., Llama 3.1 405B) are in preview

Workarounds

  1. Token Tracking: Estimate based on character count until API returns counts

  2. Function Calling: Use prompt engineering for non-Claude models

  3. HuggingFace: Use for non-tool tasks or implement prompt-based calling

  4. Rate Limits: Implement exponential backoff (already in boto3 config)
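Workaround 1 (character-based token estimation) is a one-liner. The ~4 characters-per-token ratio is a common rule of thumb for English text, not an exact figure:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 chars/token for English), used as a
    stopgap until real Bedrock usage counts are wired in."""
    return max(1, len(text) // 4)
```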


🎯 Success Criteria

Must Have (MVP)

Should Have

Nice to Have


📅 Timeline

| Phase | Duration | Status | Completion Date |
|---|---|---|---|
| Phase 1: Backend Integration | 3 hours | ✅ Complete | 2025-11-08 |
| Phase 2: Database & Models | 1 hour | ✅ Complete | 2025-11-08 |
| Phase 3: Frontend UI | 2 hours | ✅ Complete | 2025-11-08 |
| Phase 4: Testing | 2 hours | ⏳ In Progress | - |
| Phase 5: Documentation | 1 hour | ⏳ In Progress | - |
| Total | 9 hours | 67% Complete | - |


🔗 Related PRDs

  • PRD-15: Multi-Model Agent (foundation for this work)

  • PRD-18: Credential Management (AWS creds storage)

  • PRD-20: MCP Credential (tool credential management)

  • PRD-16: LLM-Driven Orchestrator (uses these providers)


📞 Support & Contact

Questions?

  • Technical Issues: Check logs in /root/automatos-ai/orchestrator/logs/

  • Configuration Help: Review MULTI_PROVIDER_STRATEGY.md

  • Cost Questions: Review BEDROCK_VS_DIRECT_COMPARISON.md

Resources

  • AWS Bedrock Docs: https://docs.aws.amazon.com/bedrock/

  • HuggingFace API: https://huggingface.co/docs/api-inference/

  • boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html


📝 Change Log

v1.0.0 - 2025-11-08

  • ✅ Initial implementation complete

  • ✅ AWS Bedrock provider added

  • ✅ 4 Bedrock models seeded

  • ✅ 3 HuggingFace models seeded

  • ✅ Frontend UI updated

  • ⏳ Testing in progress

v1.1.0 - TBD


Last Updated: 2025-11-08 Document Owner: Platform Team Reviewers: Backend Team, Frontend Team, DevOps
