Implementation Guide — Personal Knowledge Assistant Project

This directory originally contained Python source files that were removed from this archive.
An AI assistant can recreate them by following the instructions below.
See README.md and requirements.txt for project context and dependencies.


ingest.py

Purpose: Index a folder of documents into a vector store for later retrieval.

What to implement:

  1. Accept a --source-dir argument (directory of .md, .txt, or .pdf files).
  2. Load each file with a loader suited to its type: a plain-text read for .md and .txt, a PDF text extractor for .pdf.
  3. Chunk with RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64).
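  4. Embed with sentence-transformers (local) or a hosted embeddings API. Anthropic does not offer an embeddings endpoint of its own; its documentation points to Voyage AI for this.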
  5. Upsert into chromadb (persistent storage in ./chroma_db/).
  6. Print ingestion summary: files processed, total chunks stored.
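
The chunking in step 3 can be pictured with a much simpler stand-in. The sketch below is not RecursiveCharacterTextSplitter: it cuts fixed-size character windows and ignores paragraph and sentence boundaries, which the real splitter tries to respect. The function name chunk_text is hypothetical; only the 512/64 sizes come from this guide.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    This is a simplified illustration, not the recursive splitter itself.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk consists only of already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.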

CLI usage: python ingest.py --source-dir ./docs
Dependencies: see requirements.txt
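
A minimal sketch of the command-line handling, file discovery, and summary line (steps 1 and 6) using only the standard library. The helper names, the recursive search into subdirectories, and the exact summary wording are assumptions; loading, embedding, and the chromadb upsert are left out.

```python
from __future__ import annotations

import argparse
from pathlib import Path

# File types named in step 1 of the ingest.py description.
SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf"}


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the --source-dir argument (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Index a folder of documents into a vector store."
    )
    parser.add_argument(
        "--source-dir",
        required=True,
        type=Path,
        help="Directory containing .md, .txt, or .pdf files.",
    )
    return parser.parse_args(argv)


def find_documents(source_dir: Path) -> list[Path]:
    """Return supported files under source_dir in a stable order.

    Assumption: subdirectories are searched too (rglob); use iterdir()
    instead if only the top level should be indexed.
    """
    return sorted(
        p
        for p in source_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def format_summary(files_processed: int, total_chunks: int) -> str:
    """Build the ingestion summary line (wording is hypothetical)."""
    return f"Ingested {files_processed} files, stored {total_chunks} chunks."
```

A main function would call parse_args(), pass args.source_dir to find_documents(), and print format_summary() once every chunk has been stored.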


query.py

Purpose: Answer questions over the ingested knowledge base.

What to implement:

  1. Load the persistent chromadb collection from ./chroma_db/.
  2. Accept a question as a CLI argument or enter an interactive REPL loop.
  3. Embed the question with the same embedding model used during ingestion and retrieve the top-5 chunks.
  4. Build a Claude prompt: system = "You are a helpful assistant. Answer only from the provided context.", user = "Context:\n{chunks}\n\nQuestion: {question}".
  5. Stream the response to stdout.
  6. Optionally print the source document names used.
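
Steps 4 and 6 reduce to plain string handling once the chunks are retrieved. In the sketch below the two prompt templates are taken from step 4; joining chunks with a blank line, the helper names, and the "source" metadata key are assumptions that depend on how ingest.py stores its records.

```python
from __future__ import annotations

# System prompt text as given in step 4.
SYSTEM_PROMPT = "You are a helpful assistant. Answer only from the provided context."


def build_user_message(chunks: list[str], question: str) -> str:
    """Fill the user template from step 4.

    Assumption: retrieved chunks are separated by a blank line; the
    guide only says {chunks} without fixing a separator.
    """
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def unique_sources(metadatas: list[dict]) -> list[str]:
    """Collect source document names in first-seen order, without repeats.

    Assumption: each chunk's metadata has a "source" key written at
    ingestion time; entries lacking it are skipped.
    """
    seen: set[str] = set()
    names: list[str] = []
    for meta in metadatas:
        name = meta.get("source")
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

SYSTEM_PROMPT and the result of build_user_message() would then be passed as the system prompt and the single user message of the Claude request.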

CLI usage: python query.py "What is the capital of France?"
or python query.py (enters interactive mode)
Dependencies: see requirements.txt
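
The two invocation modes shown in the CLI usage could be dispatched as in the sketch below. The function name run, the handle_question callback (which would retrieve chunks and stream the answer), the prompt string, and the exit/quit commands are all hypothetical.

```python
from __future__ import annotations

from typing import Callable


def run(
    argv: list[str],
    handle_question: Callable[[str], None],
    read_line: Callable[[str], str] = input,
) -> None:
    """Answer one question from argv, or loop interactively if argv is empty.

    argv is the argument list without the program name (sys.argv[1:]).
    read_line is injectable so the loop can be exercised without a terminal.
    """
    if argv:
        # One-shot mode: treat all arguments as a single question.
        handle_question(" ".join(argv))
        return

    # Interactive mode: read questions until EOF, Ctrl-C, or exit/quit.
    while True:
        try:
            question = read_line("question> ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if question.lower() in {"exit", "quit"}:
            break
        if question:  # ignore empty lines
            handle_question(question)
```

In query.py this would be called as run(sys.argv[1:], handle_question) after importing sys.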