Implementation Guide — Personal Knowledge Assistant Project
This directory originally contained Python source files that were removed from this archive.
An AI assistant can recreate them by following the instructions below.
See README.md and requirements.txt for project context and dependencies.
ingest.py
Purpose: Index a folder of documents into a vector store for later retrieval.
What to implement:
- Accept a
--source-dirargument (directory of.md,.txt, or.pdffiles). - Load each file with appropriate loaders.
- Chunk with
RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64). - Embed with
sentence-transformersor the Anthropic embeddings API. - Upsert into
chromadb(persistent storage in./chroma_db/). - Print ingestion summary: files processed, total chunks stored.
CLI usage: python ingest.py --source-dir ./docs
Dependencies: see requirements.txt
query.py
Purpose: Answer questions over the ingested knowledge base.
What to implement:
- Load the persistent
chromadbcollection from./chroma_db/. - Accept a question as a CLI argument or enter an interactive REPL loop.
- Embed the question and retrieve top-5 chunks.
- Build a Claude prompt: system = “You are a helpful assistant. Answer only from the provided context.”, user = “Context:\n{chunks}\n\nQuestion: {question}”.
- Stream the response to stdout.
- Optionally print the source document names used.
CLI usage: python query.py "What is the capital of France?"
or python query.py (enters interactive mode)
Dependencies: see requirements.txt