References — RAG Module

Core papers, documentation, and guides for deep RAG study.

Primary Documentation

Anthropic

Anthropic RAG Guide
https://docs.anthropic.com/en/docs/build-with-claude/retrieval-augmented-generation
Official Anthropic documentation on building RAG systems with Claude. Covers tool use
for retrieval, prompt construction, and production patterns.

LangChain

LangChain RAG Concepts
https://python.langchain.com/docs/concepts/rag
Comprehensive conceptual overview of RAG components in the LangChain ecosystem:
document loaders, text splitters, vector stores, retrievers, and chains.

LlamaIndex

LlamaIndex RAG Guide
https://docs.llamaindex.ai/en/stable/understanding/rag
LlamaIndex’s guide to building production RAG pipelines. Strong on advanced patterns
like sub-question query engines, recursive retrieval, and knowledge graphs.

Foundational Papers

Original RAG Paper

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al., Facebook AI Research (2020)
https://arxiv.org/abs/2005.11401
The paper that introduced the RAG framework. Proposes the RAG-Sequence and RAG-Token
variants and fine-tunes the retriever's query encoder and the generator jointly.
Essential reading for understanding the original formulation before it became an
engineering pattern.

HyDE Paper

Precise Zero-Shot Dense Retrieval without Relevance Labels
Gao et al. (2022)
https://arxiv.org/abs/2212.10496
Introduces Hypothetical Document Embeddings (HyDE): an LLM generates a hypothetical
answer document for the query, and the embedding of that document, rather than of the
raw query, is used for dense retrieval. Reports gains over an unsupervised dense
retriever baseline, including on BEIR datasets, without relevance labels or fine-tuning.

Evaluation

RAGAS

RAGAS Documentation
https://docs.ragas.io
A widely used open-source framework for RAG evaluation. Covers Faithfulness, Answer Relevancy,
Context Precision, Context Recall, and more. Includes guides for building evaluation
datasets and integrating with LangChain / LlamaIndex.

Vector Databases

Chroma

Chroma Documentation
https://www.trychroma.com/docs
Official docs for the Chroma open-source vector database. Covers collections,
embedding functions, metadata filtering, persistent vs in-memory modes, and the
Python/JavaScript clients.

Pinecone

Pinecone Documentation
https://docs.pinecone.io
Managed vector database docs. Covers indexes, namespaces, serverless vs pod-based
architecture, metadata filtering, hybrid search, and production best practices.

Additional Resources

Surveys and Deep Dives

BEIR: A Heterogeneous Benchmark for Zero-Shot Evaluation of Information Retrieval Models
Thakur et al. (2021)
https://arxiv.org/abs/2104.08663
A standard benchmark for comparing retrieval models across heterogeneous datasets. If
you want to evaluate embedding models or retrieval strategies, BEIR is the reference.

Lost in the Middle: How Language Models Use Long Contexts
Liu et al. (2023)
https://arxiv.org/abs/2307.03172
Demonstrates empirically that LLMs underutilize information in the middle of long
contexts. Directly relevant to how you order retrieved chunks in the prompt.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Asai et al. (2023)
https://arxiv.org/abs/2310.11511
Proposes training a model to decide when to retrieve and to critique its own outputs.
Related to agentic RAG patterns.

Active Retrieval Augmented Generation (FLARE)
Jiang et al. (2023)
https://arxiv.org/abs/2305.06983
Forward-looking active retrieval: the model triggers retrieval mid-generation when
uncertain. Relevant to the FLARE section in the README.
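
One way to act on the Lost in the Middle finding is to reorder retrieved chunks so that the strongest ones sit at the start and end of the prompt. The sketch below is a common heuristic, not a procedure prescribed by the paper. The function name `reorder_for_long_context` is hypothetical, and it assumes the input list is already sorted from most to least relevant.

```python
def reorder_for_long_context(chunks_by_relevance):
    """Interleave chunks so the most relevant land at the edges of the prompt.

    Assumes chunks_by_relevance is sorted most relevant first.
    Even-ranked chunks fill the front, odd-ranked chunks fill the back
    (reversed), which pushes the least relevant chunks toward the middle.
    """
    front, back = [], []
    for position, chunk in enumerate(chunks_by_relevance):
        if position % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


# Toy usage: "a" is the most relevant chunk, "e" the least.
ordered = reorder_for_long_context(["a", "b", "c", "d", "e"])
print(ordered)  # ['a', 'c', 'e', 'd', 'b']
```

Whether this helps depends on the model and the context length, so it is worth measuring on your own evaluation set.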

BM25 and Hybrid Retrieval

The Probabilistic Relevance Framework: BM25 and Beyond
Robertson & Zaragoza (2009)
https://www.nowpublishers.com/article/Details/INR-019
The definitive paper on BM25. Explains the probabilistic model behind the formula.

Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods
Cormack et al. (2009)
https://dl.acm.org/doi/10.1145/1571941.1572114
Original RRF paper, the basis for most hybrid-search fusion. Very short (2 pages),
shows empirically that RRF with k=60 outperforms Condorcet fusion, CombMNZ score
combination, and individual learning-to-rank methods.
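
The RRF entry above refers to a very small formula: each document's fused score is the sum of 1 / (k + rank) over every ranking in which it appears. A minimal sketch, assuming each retriever returns a list of document IDs ordered best first. The function name and the toy IDs are hypothetical, and breaking ties by document ID is an added assumption for deterministic output.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best first.
    k: smoothing constant; 60 is the value used in the original paper.
    Returns document IDs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document ID (an assumption, for determinism).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy usage: fuse a lexical ranking with a dense ranking.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores against cosine similarities, which is the main reason it is a popular default for hybrid search.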

ColBERT and Reranking

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction
Khattab & Zaharia (2020)
https://arxiv.org/abs/2004.12832
Introduces late interaction: queries and documents are encoded separately into
per-token embeddings, and relevance is the sum, over query tokens, of each token's
maximum similarity to any document token (MaxSim). Approaches cross-encoder
effectiveness at much lower query-time cost.

RAGatouille
https://github.com/bclavie/RAGatouille
Python library making ColBERT practical for RAG systems. Wraps training, indexing,
and retrieval in a simple API.
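
The MaxSim scoring used by ColBERT can be illustrated without any model. The sketch below is a toy, pure-Python version and not RAGatouille's or ColBERT's actual API. It assumes the per-token embeddings have already been computed and normalized, so a dot product stands in for cosine similarity. The function name `maxsim_score` and the two-dimensional vectors are hypothetical.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction relevance score in the style of ColBERT's MaxSim.

    query_vecs, doc_vecs: lists of equal-length token embedding vectors,
    assumed to be normalized already. For each query token, take the
    highest dot product against any document token, then sum the maxima.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy usage: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token
print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Since document token embeddings do not depend on the query, they can be computed and indexed offline, which is where the efficiency over a cross-encoder comes from.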

Chunking

Evaluating Chunking Strategies for Retrieval
Pinecone blog post — https://www.pinecone.io/learn/chunking-strategies/
Practical guide with benchmarks comparing fixed-size, sentence, and semantic chunking.

Tools Referenced in This Module

Tool	Purpose	Link
chromadb	Local/embedded vector DB	https://www.trychroma.com
rank-bm25	BM25 implementation in Python	https://github.com/dorianbrown/rank_bm25
sentence-transformers	Local embedding models	https://www.sbert.net
BAAI/bge-m3	Open-source multilingual embedding model (dense, sparse, and multi-vector)	https://huggingface.co/BAAI/bge-m3
RAGAS	RAG evaluation framework	https://docs.ragas.io
Cohere Rerank	Reranking API	https://docs.cohere.com/reference/rerank
LangChain	RAG orchestration framework	https://python.langchain.com
LlamaIndex	RAG orchestration framework	https://docs.llamaindex.ai
Qdrant	High-performance vector DB	https://qdrant.tech/documentation
pgvector	PostgreSQL vector extension	https://github.com/pgvector/pgvector
FAISS	Meta's similarity search (ANN) library	https://faiss.ai

Study Notes by Niladri & AI
