🐢 Open-Source Evaluation & Testing library for LLM Agents
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
RAG evaluation without the need for "golden answers"
A Python red-teaming framework for testing chatbots and GenAI systems.
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query-enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo session management, a Celery ingestion pipeline, a Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs); sketches of the fusion and metric ideas follow this list.
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Open source framework for evaluating AI Agents
A visual tool to convert PDFs to Markdown and create, inspect, and refine document chunks for RAG pipelines.
smallevals — CPU-fast, GPU-blazing offline retrieval evaluation for RAG systems with tiny QA models.
Evaluation Framework for LLM applications in Java and Kotlin
Compares different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Learn Retrieval-Augmented Generation (RAG) from scratch using LLMs from Hugging Face, with LangChain or plain Python.
RAG Chatbot for Financial Analysis
EntRAG - Enterprise RAG Benchmark
A modular, multi-model AI assistant UI built on .NET 9, featuring RAG, extensible tools, and deep code + database knowledge through semantic search.
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics
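Several entries above combine BM25 with dense retrieval. One common way to fuse the two ranked lists is reciprocal rank fusion (RRF); the sketch below is a minimal illustration of that idea under that assumption, not the scheme any listed repo necessarily uses, and every name in it is hypothetical.

```python
# Minimal sketch of hybrid-search fusion via reciprocal rank fusion (RRF).
# Assumption: the repos above may combine BM25 and dense scores differently.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids into one list, ordered by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # k damps the dominance of top ranks
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["d3", "d1", "d5"]   # lexical (BM25) results, best first
dense_ranking = ["d1", "d2", "d3"]  # dense-embedding results, best first
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# ['d1', 'd3', 'd2', 'd5']: d1 and d3 rise because both retrievers rank them highly
```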
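Hit-Rate and MRR, the retrieval metrics named in the entries above, reduce to a few lines over ranked result lists. A minimal sketch, assuming each query has a single gold document id; all function names are illustrative, not any listed library's API.

```python
# Minimal sketch of Hit-Rate@k and Mean Reciprocal Rank (MRR) for RAG retrieval.
# Assumption: one gold document id per query; names here are hypothetical.

def hit_rate_at_k(ranked_ids: list[list[str]], gold_ids: list[str], k: int = 5) -> float:
    """Fraction of queries whose gold document appears in the top-k results."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)


def mean_reciprocal_rank(ranked_ids: list[list[str]], gold_ids: list[str]) -> float:
    """Average of 1/rank of the gold document; contributes 0 when not retrieved."""
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(gold_ids)


# Example: two queries with gold docs "d1" and "d7".
ranked = [["d1", "d3", "d9"], ["d2", "d7", "d4"]]
gold = ["d1", "d7"]
print(hit_rate_at_k(ranked, gold, k=3))    # 1.0 (both gold docs in the top-3)
print(mean_reciprocal_rank(ranked, gold))  # (1/1 + 1/2) / 2 = 0.75
```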