Implementation Guide — RAG Examples
This directory originally contained Python example scripts that were removed from this archive.
An AI assistant can recreate them by following the instructions below.
basic_rag.py
Purpose: Implement a minimal Retrieval-Augmented Generation pipeline.
What to implement:
- Ingestion: Load a small set of plain-text documents (can be hardcoded strings).
- Chunking: Split each document into ~300-token chunks with 50-token overlap.
- Embedding: Embed each chunk using sentence-transformers (all-MiniLM-L6-v2) or a hosted embedding API such as OpenAI's.
- Storage: Store chunk text + embedding in an in-memory list (or numpy array).
- Retrieval: On a user query, embed the query, compute cosine similarity, return top-3 chunks.
- Generation: Pass retrieved chunks as context into a Claude prompt and return the answer.
- Demo: Run 2–3 sample questions and print retrieved context + final answer.
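The chunking step above can be sketched as follows. This is a minimal illustration, not the removed script: it assumes whitespace-separated words are an acceptable stand-in for tokens, and the helper names chunk_tokens and chunk_text are hypothetical. A real implementation might count tokens with the embedding model's own tokenizer instead.

```python
def chunk_tokens(tokens, chunk_size=300, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def chunk_text(text, chunk_size=300, overlap=50):
    # Assumption: whitespace-separated words approximate tokens.
    words = text.split()
    return [" ".join(c) for c in chunk_tokens(words, chunk_size, overlap)]
```

With the defaults, a 700-word document yields three chunks (300, 300, and 200 words), and the last 50 words of each chunk reappear at the start of the next.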
How to run: python basic_rag.py
Dependencies: anthropic, sentence-transformers, numpy
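The retrieval step for basic_rag.py might look like the sketch below. It is dependency-free for illustration only: the toy 3-dimensional vectors stand in for real embeddings, and the names cosine_similarity, top_k, and store are hypothetical. In the actual script the same computation would more likely be vectorized with numpy over embeddings produced by the chosen model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """store is a list of (chunk_text, embedding) pairs.

    Returns up to k (score, chunk_text) pairs, highest score first.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors (an assumption for the demo; real embeddings have hundreds of dimensions).
store = [
    ("chunk about cats", [1.0, 0.0, 0.0]),
    ("chunk about dogs", [0.0, 1.0, 0.0]),
    ("chunk about cats and dogs", [1.0, 1.0, 0.0]),
    ("chunk about cars", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.1, 0.0], store, k=3)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The retrieved chunk texts can then be joined into the context portion of the prompt for the generation step.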
hybrid_retrieval.py
Purpose: Combine dense (vector) and sparse (BM25) retrieval for better recall.
What to implement:
- Reuse the chunking/document setup from basic_rag.py.
- Dense retrieval: Same embedding + cosine similarity approach.
- Sparse retrieval: Use rank_bm25 (BM25Okapi) for keyword-based scoring.
- Fusion: Implement Reciprocal Rank Fusion (RRF) to merge both ranked lists:
  score(d) = Σ_i 1 / (k + rank_i(d)), where k = 60.
- Pass the top-3 fused results to Claude.
- Compare outputs: print which chunks were selected by each method alone vs. hybrid.
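The fusion step can be sketched as below. This is one possible reading of the RRF formula, assuming ranks start at 1 and that a chunk missing from a list simply contributes nothing for that list. The function name reciprocal_rank_fusion and the chunk ids c0 to c4 are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids (best first) with RRF.

    Returns (chunk_id, fused_score) pairs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Assumption: ranks are 1-based, so the top item adds 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative rankings, as if produced by the dense and sparse retrievers.
dense_ranking = ["c2", "c0", "c1", "c3"]
sparse_ranking = ["c0", "c3", "c2", "c4"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
top_3 = [chunk_id for chunk_id, _ in fused[:3]]
print(top_3)
```

Because RRF uses only rank positions, the cosine similarities and BM25 scores never need to be normalized onto a common scale before merging.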
How to run: python hybrid_retrieval.py
Dependencies: anthropic, sentence-transformers, numpy, rank-bm25