References — RAG Module
Core papers, documentation, and guides for deep RAG study.
Primary Documentation
Anthropic
- Anthropic RAG Guide
https://docs.anthropic.com/en/docs/build-with-claude/retrieval-augmented-generation
Official Anthropic documentation on building RAG systems with Claude. Covers tool use
for retrieval, prompt construction, and production patterns.
LangChain
- LangChain RAG Concepts
https://python.langchain.com/docs/concepts/rag
Comprehensive conceptual overview of RAG components in the LangChain ecosystem:
document loaders, text splitters, vector stores, retrievers, and chains.
LlamaIndex
- LlamaIndex RAG Guide
https://docs.llamaindex.ai/en/stable/understanding/rag
LlamaIndex’s guide to building production RAG pipelines. Strong on advanced patterns
like sub-question query engines, recursive retrieval, and knowledge graphs.
Foundational Papers
Original RAG Paper
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al., Facebook AI Research (2020)
https://arxiv.org/abs/2005.11401
The paper that introduced the RAG framework. Proposes the RAG-Sequence and RAG-Token
variants and fine-tunes the query encoder and generator jointly while keeping the
document index fixed. Essential reading for understanding the original formulation
before it became an engineering pattern.
HyDE Paper
- Precise Zero-Shot Dense Retrieval without Relevance Labels
Gao et al. (2022)
https://arxiv.org/abs/2212.10496
Introduces Hypothetical Document Embeddings (HyDE): an LLM generates a hypothetical
answer to the query, and the embedding of that answer, rather than of the raw query,
is used for dense retrieval. Reports improvements over the unsupervised Contriever
baseline on BEIR datasets, with no relevance labels or fine-tuning.
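The HyDE idea above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real dense encoder, `fake_llm` is a hypothetical placeholder for an LLM call, and a single hypothetical answer is used where the paper averages over several samples.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(
    query: str,
    corpus: List[str],
    generate: Callable[[str], str],
    top_k: int = 1,
) -> List[str]:
    """Retrieve with the embedding of a generated answer instead of the query."""
    hypothetical = generate(query)      # step 1: write a plausible answer
    query_vec = embed(hypothetical)     # step 2: embed the answer, not the query
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def fake_llm(query: str) -> str:
    """Hypothetical placeholder for an LLM call; returns a canned answer."""
    return "Mitochondria produce ATP, which supplies energy to cells."


corpus = [
    "Mitochondria generate ATP through oxidative phosphorylation.",
    "The French Revolution began in 1789.",
]
query = "what powers a cell?"
results = hyde_search(query, corpus, fake_llm)
```

In this toy setup the raw query shares no terms with the relevant document, while the hypothetical answer does, which is the gap HyDE is meant to close.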
Evaluation
RAGAS
- RAGAS Documentation
https://docs.ragas.io
A widely used framework for RAG evaluation. Covers Faithfulness, Answer Relevancy,
Context Precision, Context Recall, and more. Includes guides for building evaluation
datasets and integrating with LangChain / LlamaIndex.
Vector Databases
Chroma
- Chroma Documentation
https://www.trychroma.com/docs
Official docs for the Chroma open-source vector database. Covers collections,
embedding functions, metadata filtering, persistent vs in-memory modes, and the
Python/JavaScript clients.
Pinecone
- Pinecone Documentation
https://docs.pinecone.io
Managed vector database docs. Covers indexes, namespaces, serverless vs pod-based
architecture, metadata filtering, hybrid search, and production best practices.
Additional Resources
Surveys and Deep Dives
- BEIR: A Heterogeneous Benchmark for Zero-Shot Evaluation of Information Retrieval Models
Thakur et al. (2021)
https://arxiv.org/abs/2104.08663
A standard benchmark for zero-shot comparison of retrieval models across diverse
tasks and domains. If you want to evaluate embedding models or retrieval strategies,
BEIR is the reference.
- Lost in the Middle: How Language Models Use Long Contexts
Liu et al. (2023)
https://arxiv.org/abs/2307.03172
Demonstrates empirically that LLMs underutilize information in the middle of long
contexts. Directly relevant to how you order retrieved chunks in the prompt.
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Asai et al. (2023)
https://arxiv.org/abs/2310.11511
Proposes training a model to decide when to retrieve and to critique its own outputs.
Related to agentic RAG patterns.
- Active Retrieval Augmented Generation (FLARE)
Jiang et al. (2023)
https://arxiv.org/abs/2305.06983
Forward-looking active retrieval: the model triggers retrieval mid-generation when
uncertain. Relevant to the FLARE section in the README.
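The chunk-ordering point from the Lost in the Middle entry can be sketched as follows. This is one common heuristic, not a method proposed in the paper: given chunks sorted from most to least relevant, place the strongest ones at the start and end of the prompt and let the weakest fall in the middle. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(chunks_by_relevance: Sequence[T]) -> List[T]:
    """Put the most relevant chunks at the edges, the least relevant in the middle.

    Input must be sorted most relevant first.
    """
    front: List[T] = []
    back: List[T] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: even ranks fill the front, odd ranks fill the back.
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so the second-best chunk ends up last.
    return front + back[::-1]


ordered = reorder_for_long_context(["c1", "c2", "c3", "c4", "c5"])
```

With five chunks ranked `c1` (best) to `c5` (worst), the result starts with `c1`, ends with `c2`, and places `c5` in the middle.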
BM25 and Hybrid Retrieval
- The Probabilistic Relevance Framework: BM25 and Beyond
Robertson & Zaragoza (2009)
https://www.nowpublishers.com/article/Details/INR-019
The definitive reference on BM25. Explains the probabilistic model behind the formula.
- Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods
Cormack et al. (2009)
https://dl.acm.org/doi/10.1145/1571941.1572114
Original RRF paper. Very short (2 pages), shows empirically that RRF with k=60
outperforms Condorcet fusion, CombMNZ, and the individual rankers it combines.
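For orientation, RRF as described in the entry above scores each document by summing `1 / (k + rank)` over every ranking in which it appears. The sketch below is a minimal version under stated assumptions: the document IDs and the two input rankings are made up, and ties are broken by ID purely to keep the output deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list using RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a ranking contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["d1", "d2", "d3"]    # hypothetical lexical results
dense_ranking = ["d3", "d1", "d4"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because only ranks are used, the lexical and dense scores never need to be normalized onto a common scale, which is much of the method's appeal for hybrid search.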
ColBERT and Reranking
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction
Khattab & Zaharia (2020)
https://arxiv.org/abs/2004.12832
Introduces late interaction: query and document tokens are encoded independently and
scored with the MaxSim operator, approaching cross-encoder quality at much lower
query-time cost.
- RAGatouille
https://github.com/bclavie/RAGatouille
Python library making ColBERT practical for RAG systems. Wraps training, indexing,
and retrieval in a simple API.
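The MaxSim operator mentioned in the ColBERT entry can be sketched in a few lines: for each query token embedding, take its maximum similarity with any document token embedding, then sum over query tokens. The vectors below are hand-picked unit vectors for illustration, not output from a real ColBERT model, and the function is not the RAGatouille API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best document-token match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Two query tokens and two candidate documents (toy 2-D token embeddings).
query: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]
doc_a: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_b: List[Vector] = [[1.0, 0.0], [0.8, 0.6]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` matches both query tokens exactly and so scores higher than `doc_b`, which has no close match for the second query token. Document token embeddings can be precomputed and indexed offline, which is what makes the scheme cheaper than a cross-encoder at query time.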
Chunking
- Evaluating Chunking Strategies for Retrieval
Pinecone blog post
https://www.pinecone.io/learn/chunking-strategies/
Practical guide with benchmarks comparing fixed-size, sentence, and semantic chunking.
Tools Referenced in This Module
| Tool | Purpose | Link |
|---|---|---|
| chromadb | Local/embedded vector DB | https://www.trychroma.com |
| rank-bm25 | BM25 implementation in Python | https://github.com/dorianbrown/rank_bm25 |
| sentence-transformers | Local embedding models | https://www.sbert.net |
| BAAI/bge-m3 | Open-source multilingual embedding model | https://huggingface.co/BAAI/bge-m3 |
| RAGAS | RAG evaluation framework | https://docs.ragas.io |
| Cohere Rerank | Reranking API | https://docs.cohere.com/reference/rerank |
| LangChain | RAG orchestration framework | https://python.langchain.com |
| LlamaIndex | RAG orchestration framework | https://docs.llamaindex.ai |
| Qdrant | High-performance vector DB | https://qdrant.tech/documentation |
| pgvector | PostgreSQL vector extension | https://github.com/pgvector/pgvector |
| FAISS | Meta’s ANN library | https://faiss.ai |