Project 1: Personal Knowledge Assistant

A command-line RAG system that indexes your personal notes/docs folder and lets you query it conversationally using Claude.

What It Does

ingest.py walks a directory, chunks .txt/.md/.pdf files, embeds them, and stores in a local Chroma vector database. Idempotent: files are hashed and skipped if already indexed.
query.py starts an interactive REPL. You type a question, the system retrieves the 5 most relevant chunks, and streams a Claude-powered answer. Conversation history (last 10 turns) is maintained for follow-up questions.

Architecture

Your docs folder
      |
      v
  ingest.py
  |-- Walk directory (.txt / .md / .pdf)
  |-- Chunk text (~500 tokens, 50-token overlap)
  |-- Hash check -> skip if already indexed
  |-- Embed with OpenAI text-embedding-3-small (or sentence-transformers fallback)
  `-- Store in Chroma DB (./chroma_db)

  query.py  <--  you type a question
  |-- Embed query
  |-- Retrieve top-5 chunks from Chroma
  |-- Build prompt (system + context + conversation history + query)
  |-- Stream response from Claude (claude-haiku-4-5-20251001)
  `-- Append turn to conversation history -> loop

Skills Covered

Module	Concept
02 — RAG	Document chunking, embedding, vector retrieval, context injection
03 — Agents	Conversational REPL loop
05 — Memory	Rolling conversation history (last 10 turns)
09 — Production	Idempotent ingestion, env-var config, graceful error handling

Setup

1. Install dependencies

cd projects/01-personal-knowledge-assistant
pip install -r requirements.txt

2. Create a `.env` file

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...      # Optional: falls back to sentence-transformers if omitted
DOCS_DIR=./docs            # Directory to index  (default: ./docs)
CHROMA_DIR=./chroma_db     # Vector DB location  (default: ./chroma_db)
COLLECTION_NAME=knowledge  # Chroma collection   (default: knowledge)

If OPENAI_API_KEY is not set the system falls back to sentence-transformers/all-MiniLM-L6-v2 (local, no API key needed).

3. Add your documents

mkdir docs
cp ~/Documents/notes/*.md docs/
cp ~/Downloads/some-paper.pdf docs/

Nested folder structures are fine — ingest.py walks recursively.

Usage

Index your documents

python ingest.py

Using OpenAI embeddings (text-embedding-3-small)
Scanning: ./docs
  [ 1/12] notes/project-alpha.md       ->  8 chunks
  [ 2/12] notes/meeting-2025-03.md     ->  0 chunks  (skipped: already indexed)
  [12/12] papers/attention-is-all.pdf  -> 24 chunks
Indexed 47 new chunks from 10 documents (2 skipped, already up to date)

Query interactively

python query.py

Knowledge Assistant ready. Type 'quit' to exit.
Loaded 47 chunks from collection 'knowledge'.

You: What did we decide in the March meeting about the API design?
Assistant: Based on your notes from the March 14 meeting, the team decided to...

You: What were the action items?
Assistant: Following up on the API design decision, the action items were...

You: quit
Goodbye!

Extension Ideas

Idea	Effort	Description
Web UI	Medium	Wrap `query.py` in FastAPI + a Streamlit or React frontend
Slack bot	Medium	Replace the CLI REPL with a Slack Bolt app — each DM is a query
Scheduled re-indexing	Small	Cron job or GitHub Action that re-runs `ingest.py` when your notes repo changes
Multi-user	Medium	Store user IDs in Chroma metadata; filter by user on retrieval
Source highlighting	Small	Return the source filename and page/line alongside each answer
Hybrid search	Medium	Combine vector similarity with BM25 keyword search for better recall

Interview Demo Script

Run ingest.py on your own notes folder — show real output.
Ask a question whose answer spans two different documents.
Ask a follow-up question that only makes sense given the previous answer (demonstrates history).
Re-run ingest.py and show “0 new chunks” because hashes match (demonstrates idempotency).
Explain the architecture diagram — why chunking, why overlap, why embeddings.

File Reference

File	Purpose
`ingest.py`	Index documents into Chroma
`query.py`	Interactive RAG REPL
`requirements.txt`	Python dependencies
`.env`	API keys and config (never commit this)
`./docs/`	Put your documents here
`./chroma_db/`	Auto-created; stores the vector index

Study Notes by Niladri & AI

Explorer

README

Project 1: Personal Knowledge Assistant

What It Does

Architecture

Skills Covered

Setup

1. Install dependencies

2. Create a `.env` file

3. Add your documents

Usage

Index your documents

Query interactively

Extension Ideas

Interview Demo Script

File Reference

Graph View

Table of Contents

Study Notes by Niladri & AI

Explorer

README

Project 1: Personal Knowledge Assistant

What It Does

Architecture

Skills Covered

Setup

1. Install dependencies

2. Create a .env file

3. Add your documents

Usage

Index your documents

Query interactively

Extension Ideas

Interview Demo Script

File Reference

Graph View

Table of Contents

2. Create a `.env` file