AI API – Production-Ready LLM Backend (RAG + Agents + Observability)

Overview

This project is a production-ready AI backend system built using FastAPI, LangChain, and OpenAI. It supports Retrieval-Augmented Generation (RAG), agent-based orchestration, and LLM observability using LangSmith.

The system is designed to simulate real-world AI backend architecture used in enterprise applications. FrontEnd: https://ai-by0z7njes-manibala-sinhas-projects-273c5a77.vercel.app/ Backend: https://ai-api-6.onrender.com/

Key Features

FastAPI-based scalable backend
RAG pipeline using vector database (Chroma)
LangChain-powered agent orchestration
Function/tool calling support
Prompt engineering with hallucination control
Observability & tracing via LangSmith
Structured logging (latency, responses)
Modular and production-ready architecture
Unit testing support

Architecture

User Query
   ↓
FastAPI Endpoint (/ask)
   ↓
Agent Layer (decision making)
   ↓
 ┌───────────────┬───────────────┐
 ↓               ↓               ↓
RAG Pipeline   LLM Call     Tool/API
 ↓
Vector DB (Chroma)
 ↓
Context + Prompt
 ↓
LLM Response
 ↓
Logging + Tracing (LangSmith)
 ↓
API Response

Project Structure

app/
 ├── main.py
 ├── routes/
 │   └── ask.py
 ├── services/
 │   ├── rag_service.py
 │   ├── agent_service.py
 │   ├── llm_service.py
 ├── models/
 │   └── schemas.py
 ├── utils/
 │   ├── logger.py
 │   ├── config.py
tests/
 └── test_api.py

Tech Stack

Backend: FastAPI
LLM Framework: LangChain
LLM Provider: OpenAI
Vector DB: Chroma
Observability: LangSmith
Testing: Pytest

How It Works

1. RAG Pipeline

Documents are split into chunks
Converted into embeddings
Stored in vector DB
Top-K relevant documents retrieved
Injected into prompt for answer generation

2. Agent System

Determines whether to:
- Use RAG
- Call LLM directly
- Use tools/APIs
Built using LangChain agent executor

3. Prompt Engineering

Strict system prompts used
Prevent hallucination
Enforce structured responses

4. Observability

LangSmith traces:
- LLM calls
- Agent decisions
- Retrieval steps
Logs include:
- Latency
- Input/output
- Token usage

API Usage

Endpoint

POST /ask

Request

{
  "query": "What is RAG?"
}

Response

{
  "answer": "RAG stands for Retrieval Augmented Generation...",
  "latency": 0.45
}

Running Locally

1. Clone repo

git clone <your-repo-url>
cd ai-api

2. Install dependencies

pip install -r requirements.txt

3. Set environment variables

export OPENAI_API_KEY=your_key
export LANGCHAIN_API_KEY=your_langsmith_key
export LANGCHAIN_TRACING_V2=true

4. Run server

uvicorn app.main:app --reload

Testing

pytest

Observability (LangSmith)

Tracks full LLM lifecycle
Debug prompts, responses, failures
Monitor latency and token usage

Design Decisions

Chroma DB used for simplicity and local persistence
LangChain agents for flexible orchestration
Modular services for scalability
Structured logging for production readiness

Future Improvements

Add Pinecone / scalable vector DB
Multi-agent workflows
Caching layer (Redis)
Streaming responses
Authentication & rate limiting

Author

Manibala Sinha Senior Backend Engineer | Python | FastAPI | AI Systems

Summary

End-to-end LLM backend design
Production-level architecture
Real-world AI engineering practices

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.github/workflows		.github/workflows
__pycache__		__pycache__
ai-api		ai-api
app		app
auth		auth
image-caption-summarizer		image-caption-summarizer
models		models
templates		templates
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
index.htm		index.htm
index.html		index.html
main.py		main.py
render.yaml		render.yaml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI API – Production-Ready LLM Backend (RAG + Agents + Observability)

Overview

The system is designed to simulate real-world AI backend architecture used in enterprise applications. FrontEnd: https://ai-by0z7njes-manibala-sinhas-projects-273c5a77.vercel.app/ Backend: https://ai-api-6.onrender.com/

Key Features

Architecture

Project Structure

Tech Stack

How It Works

1. RAG Pipeline

2. Agent System

3. Prompt Engineering

4. Observability

API Usage

Endpoint

Request

Response

Running Locally

1. Clone repo

2. Install dependencies

3. Set environment variables

4. Run server

Testing

Observability (LangSmith)

Design Decisions

Future Improvements

Author

Summary

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI API – Production-Ready LLM Backend (RAG + Agents + Observability)

Overview

The system is designed to simulate real-world AI backend architecture used in enterprise applications. FrontEnd: https://ai-by0z7njes-manibala-sinhas-projects-273c5a77.vercel.app/ Backend: https://ai-api-6.onrender.com/

Key Features

Architecture

Project Structure

Tech Stack

How It Works

1. RAG Pipeline

2. Agent System

3. Prompt Engineering

4. Observability

API Usage

Endpoint

Request

Response

Running Locally

1. Clone repo

2. Install dependencies

3. Set environment variables

4. Run server

Testing

Observability (LangSmith)

Design Decisions

Future Improvements

Author

Summary

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages