Implementation Guide — RAG Examples

This directory originally contained Python example scripts that were removed from this archive.
An AI assistant can recreate them by following the instructions below.


basic_rag.py

Purpose: Implement a minimal Retrieval-Augmented Generation pipeline.

What to implement:

  1. Ingestion: Load a small set of plain-text documents (can be hardcoded strings).
  2. Chunking: Split each document into ~300-token chunks with 50-token overlap.
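  3. Embedding: Embed each chunk using sentence-transformers (all-MiniLM-L6-v2) or a hosted embeddings API such as OpenAI's. (Anthropic does not offer its own embeddings endpoint; its documentation points to Voyage AI.)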
  4. Storage: Store chunk text + embedding in an in-memory list (or numpy array).
  5. Retrieval: On a user query, embed the query, compute cosine similarity against every stored chunk, and return the top-3 chunks.
  6. Generation: Pass retrieved chunks as context into a Claude prompt and return the answer.
  7. Demo: Run 2–3 sample questions and print retrieved context + final answer.
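
The chunking step (step 2) might look roughly like the sketch below. It is an illustration under stated assumptions, not the removed script: the function names chunk_tokens and chunk_text are hypothetical, and whitespace-separated words stand in for tokens, so the counts only approximate what a real tokenizer would produce.

```python
def chunk_tokens(tokens, chunk_size=300, overlap=50):
    """Split a token list into windows of `chunk_size` tokens.

    Consecutive windows share `overlap` tokens; the last window may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input,
        # otherwise a trailing chunk made only of overlap would be emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def chunk_text(text, chunk_size=300, overlap=50):
    """Chunk raw text. ASSUMPTION: whitespace words approximate tokens."""
    words = text.split()
    return [" ".join(c) for c in chunk_tokens(words, chunk_size, overlap)]


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(700))
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk.split()))
```

With the defaults the window advances 250 tokens at a time, so a 700-token document yields three chunks. Swapping the whitespace split for the embedding model's own tokenizer would make the 300-token budget exact.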

How to run: python basic_rag.py
Dependencies: anthropic, sentence-transformers, numpy
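
The retrieval step (step 5) can be sketched with the standard library alone. This is a minimal illustration rather than the original script: the helper names cosine_similarity and top_k, the (text, embedding) pair layout, and the toy 2-dimensional vectors are all assumptions. Real all-MiniLM-L6-v2 embeddings have 384 dimensions, and a numpy version would vectorize the scoring loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k best (score, chunk_text) pairs, highest score first.

    `store` is the in-memory list from step 4: (chunk_text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (hypothetical data).
    store = [
        ("cats", [1.0, 0.0]),
        ("dogs", [0.0, 1.0]),
        ("pets", [1.0, 1.0]),
        ("cars", [-1.0, 0.0]),
    ]
    for score, text in top_k([1.0, 0.2], store):
        print(f"{score:.3f}  {text}")
```

If the embeddings are L2-normalized at storage time, the cosine reduces to a plain dot product, which is a common simplification.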


hybrid_retrieval.py

Purpose: Combine dense (vector) and sparse (BM25) retrieval for better recall.

What to implement:

  1. Reuse the chunking/document setup from basic_rag.py.
  2. Dense retrieval: Same embedding + cosine similarity approach as in basic_rag.py.
  3. Sparse retrieval: Use rank_bm25 (BM25Okapi) for keyword-based scoring.
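  4. Fusion: Implement Reciprocal Rank Fusion (RRF) to merge both ranked lists:
    • score(d) = Σ_i 1 / (k + rank_i(d)), where rank_i(d) is the 1-based rank of chunk d in list i, the sum runs over the lists that contain d, and k = 60.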
  5. Pass the top-3 fused results to Claude.
  6. Compare outputs: print which chunks were selected by each method alone vs. hybrid.
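
The fusion and comparison steps (4 and 6) could be sketched as follows. This is a hedged illustration, not the removed script: the function name reciprocal_rank_fusion, the chunk ids such as "c0", and the two example rankings are hypothetical. It assumes ranks are 1-based and that a chunk missing from a list contributes nothing for that list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=3):
    """Merge ranked lists of chunk ids with Reciprocal Rank Fusion.

    `rankings` is a list of lists, each ordered best first.
    Returns the top_n (chunk_id, fused_score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # 1-based ranks
            scores[chunk_id] += 1.0 / (k + rank)
    # sorted() is stable, so tied chunks keep first-seen order.
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]


# Hypothetical rankings, as the dense and BM25 retrievers might return them.
dense_ranking = ["c2", "c0", "c5", "c1"]
sparse_ranking = ["c0", "c7", "c2", "c3"]

hybrid = reciprocal_rank_fusion([dense_ranking, sparse_ranking])

# Step 6: show what each method would pick on its own vs. the fused result.
print("dense top-3: ", dense_ranking[:3])
print("sparse top-3:", sparse_ranking[:3])
print("hybrid top-3:", [chunk_id for chunk_id, _ in hybrid])
```

Because RRF uses only ranks, the cosine and BM25 scores never need to be put on a common scale. In this example c0 ends up first: it is ranked 2nd and 1st, which slightly beats c2's 1st and 3rd.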

How to run: python hybrid_retrieval.py
Dependencies: anthropic, sentence-transformers, numpy, rank-bm25