Module 05: Memory — References

Primary References

Anthropic Memory Guide

URL: https://docs.anthropic.com/en/docs/build-with-claude/memory
What it covers: Anthropic’s official documentation on memory types for Claude agents. Describes in-context, external, and episodic memory patterns. Includes guidance on when to use each type and how to implement them with the Anthropic SDK.
Best for: Getting the canonical framing Anthropic uses when discussing memory in agent systems. Useful for understanding what the SDK is designed to support natively.

LangGraph Memory Concepts

URL: https://langchain-ai.github.io/langgraph/concepts/memory
What it covers: LangGraph’s conceptual guide to memory in graph-based agents. Covers short-term (in-thread) vs. long-term (cross-thread) memory, memory stores, memory schemas, and when to use each memory type in a LangGraph workflow.
Best for: Understanding how a production framework implements memory. LangGraph’s separation of “in-thread state” vs. “cross-thread memory store” is a clean model worth studying even if you do not use LangGraph.
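The in-thread vs. cross-thread split can be modeled in a few lines of plain Python. This is a sketch of the concept only, not LangGraph's API. The `AgentMemory` class, its method names, and the `("user-42", "preferences")` namespace tuple are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class AgentMemory:
    """Hypothetical model of short-term (per-thread) vs. long-term (cross-thread) memory."""

    def __init__(self) -> None:
        # Short-term: message history scoped to one conversation thread.
        self.threads: dict[str, list[str]] = defaultdict(list)
        # Long-term: key-value items scoped to a namespace (e.g. one user), visible from every thread.
        self.store: dict[tuple[str, ...], dict[str, str]] = defaultdict(dict)

    def add_message(self, thread_id: str, message: str) -> None:
        self.threads[thread_id].append(message)

    def history(self, thread_id: str) -> list[str]:
        return list(self.threads.get(thread_id, []))

    def put(self, namespace: tuple[str, ...], key: str, value: str) -> None:
        self.store[namespace][key] = value

    def get(self, namespace: tuple[str, ...], key: str) -> str | None:
        return self.store.get(namespace, {}).get(key)


mem = AgentMemory()
mem.add_message("thread-1", "user: I prefer metric units")
mem.put(("user-42", "preferences"), "units", "metric")

# A new thread starts with an empty history but can still read the cross-thread store.
print(mem.history("thread-2"))                       # []
print(mem.get(("user-42", "preferences"), "units"))  # metric
```

The design point is the key each layer uses: short-term state is looked up by thread, long-term memory by a namespace that outlives any single thread.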

MemGPT: Towards LLMs as Operating Systems

URL: https://arxiv.org/abs/2310.08560
Authors: Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez
What it covers: The foundational paper on treating the LLM context window as a paged memory system, analogous to how an OS manages RAM and disk. Introduces a "main context" (fast, in-window) and an "external context" (large, searchable, out-of-window; split into recall storage for conversation history and archival storage for arbitrary data), with explicit memory-management functions that the LLM calls to move information between them.
Best for: Deep understanding of hierarchical memory management for agents. This paper's vocabulary and architecture are referenced throughout the industry. Read sections 1, 2, and 4 for the core ideas.
Key insight: Memory management should be a first-class operation that the agent itself can invoke, not just a side effect of the application layer. For example, the agent should be able to call something like memory.recall("user preferences") as a tool.
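A minimal sketch of the two ideas above: a bounded main context that pages its oldest messages out to external storage, plus memory functions the agent is allowed to call as tools. The names `PagedMemory`, `archival_insert`, `archival_search`, and `dispatch` are hypothetical. Token counting is approximated by word count and search is naive keyword matching; a real system would use a tokenizer and embedding search. The tool descriptions follow the `name` / `description` / `input_schema` shape of Anthropic's tool definitions, but are illustrative only.

```python
from __future__ import annotations


class PagedMemory:
    """Hypothetical MemGPT-style memory: a small main context backed by external storage."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.main_context: list[str] = []  # in-window messages the LLM sees
        self.archive: list[str] = []       # out-of-window storage, searchable on demand

    @staticmethod
    def _tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
        return len(text.split())

    def append(self, message: str) -> None:
        self.main_context.append(message)
        # Page out the oldest messages until the main context fits the budget again.
        while (sum(self._tokens(m) for m in self.main_context) > self.max_tokens
               and len(self.main_context) > 1):
            self.archive.append(self.main_context.pop(0))

    # The two methods below are the ones exposed to the agent as tools.
    def archival_insert(self, fact: str) -> str:
        self.archive.append(fact)
        return "ok"

    def archival_search(self, query: str, limit: int = 3) -> list[str]:
        words = set(query.lower().split())
        hits = [m for m in self.archive if words & set(m.lower().split())]
        return hits[:limit]


MEMORY_TOOLS = [
    {"name": "archival_insert", "description": "Save a fact to long-term memory.",
     "input_schema": {"type": "object", "properties": {"fact": {"type": "string"}},
                      "required": ["fact"]}},
    {"name": "archival_search", "description": "Search long-term memory.",
     "input_schema": {"type": "object", "properties": {"query": {"type": "string"}},
                      "required": ["query"]}},
]


def dispatch(mem: PagedMemory, tool_name: str, tool_input: dict) -> object:
    # Route a model-issued tool call to the matching memory method.
    if tool_name not in {t["name"] for t in MEMORY_TOOLS}:
        raise ValueError(f"unknown tool: {tool_name}")
    return getattr(mem, tool_name)(**tool_input)


mem = PagedMemory(max_tokens=8)
mem.append("user likes dark roast coffee")  # 5 tokens
mem.append("user lives in Lisbon")          # 4 more: 9 > 8, so the oldest message is paged out
print(mem.main_context)                                                # ['user lives in Lisbon']
print(dispatch(mem, "archival_search", {"query": "coffee preferences"}))  # ['user likes dark roast coffee']
```

Nothing is deleted when the window overflows; it moves to storage the agent can search later, which is the point of the OS paging analogy.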

LlamaIndex Memory Module

URL: https://www.llamaindex.ai/blog/introducing-llamaindex-memory
What it covers: LlamaIndex’s memory abstraction, including ChatMemoryBuffer, VectorMemory, and SimpleComposableMemory. Shows how to compose multiple memory types into a single system that prioritizes in-context recency while falling back to vector search.
Best for: Practical implementation patterns in a widely used framework. The SimpleComposableMemory pattern (combine vector memory + buffer memory) is a good starting point for production systems.
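The buffer-plus-vector pattern can be sketched without any framework: the most recent turns are always returned verbatim, and older turns are retrieved by similarity to the query. This is not LlamaIndex's `SimpleComposableMemory` API. The `ComposedMemory` class is hypothetical, and a bag-of-words cosine similarity stands in for real embeddings.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real system would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ComposedMemory:
    """Hypothetical composition: a recency buffer plus similarity search over older turns."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self.messages: list[str] = []

    def put(self, message: str) -> None:
        self.messages.append(message)

    def get(self, query: str, top_k: int = 1) -> list[str]:
        split = max(len(self.messages) - self.buffer_size, 0)
        recent, older = self.messages[split:], self.messages[:split]
        q = embed(query)
        ranked = sorted(older, key=lambda m: cosine(q, embed(m)), reverse=True)
        retrieved = [m for m in ranked[:top_k] if cosine(q, embed(m)) > 0]
        # Retrieved long-term items first, then the verbatim recent window.
        return retrieved + recent


mem = ComposedMemory(buffer_size=2)
for m in ["my dog is named Rex", "I work as a nurse", "what's the weather", "plan my weekend"]:
    mem.put(m)
print(mem.get("what is my dog called"))
# ['my dog is named Rex', "what's the weather", 'plan my weekend']
```

Recency is guaranteed by the buffer; relevance is only best-effort from the search, which is why older items are filtered out when nothing matches.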

Supplementary Reading

Cognitive Architectures for Language Agents (CoALA)

URL: https://arxiv.org/abs/2309.02427
What it covers: A comprehensive taxonomy of memory types in cognitive science and how they map to LLM agent implementations. Covers working memory, long-term memory (episodic, semantic, procedural), and their computational analogs.
Best for: The deepest theoretical grounding for the memory taxonomy. If you want to ace the “episodic vs. semantic memory” interview question, read sections 3 and 4.
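One way to make the CoALA taxonomy concrete in an implementation is to tag every stored item with its memory type, so each type can get its own write and retrieval policy. The four categories follow the list above; the `MemoryType` enum, the `MemoryItem` record, and the example items are hypothetical and not something CoALA prescribes in code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    WORKING = "working"        # the current context: what the model sees this step
    EPISODIC = "episodic"      # specific past events ("yesterday the deploy failed")
    SEMANTIC = "semantic"      # general facts ("the user prefers morning meetings")
    PROCEDURAL = "procedural"  # how to act (e.g. prompts, code, skills)


@dataclass
class MemoryItem:
    kind: MemoryType
    content: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def by_kind(items: list[MemoryItem], kind: MemoryType) -> list[str]:
    return [i.content for i in items if i.kind is kind]


items = [
    MemoryItem(MemoryType.EPISODIC, "user asked to reschedule the demo yesterday"),
    MemoryItem(MemoryType.SEMANTIC, "user prefers morning meetings"),
    MemoryItem(MemoryType.PROCEDURAL, "to reschedule: check calendar, propose two slots"),
]
print(by_kind(items, MemoryType.SEMANTIC))  # ['user prefers morning meetings']
```

The timestamp matters most for episodic items (they are events in time), while semantic items are usually updated in place when a fact changes.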

Generative Agents: Interactive Simulacra of Human Behavior

URL: https://arxiv.org/abs/2304.03442
What it covers: The Stanford/Google generative agents paper demonstrating a complete memory architecture in a simulated environment. Uses importance scoring, recency weighting, and reflection to manage memories across many time steps.
Best for: Seeing all memory concepts applied together in a working research system. The “memory stream + retrieval function + reflection” architecture is directly applicable to production agents.
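The retrieval function can be sketched as a weighted sum of three normalized scores. The paper combines recency, importance, and relevance in this way; the specifics here (exponential decay of 0.995 per hour since last access, importance on a 1 to 10 scale, equal weights, min-max normalization, and word-overlap cosine in place of embedding similarity) are assumptions of this sketch, and the `Memory` record and `retrieve` function are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float          # e.g. 1-10, rated by an LLM when the memory is stored
    hours_since_access: float


def cosine(a: str, b: str) -> float:
    # Word-count cosine similarity; a stand-in for embedding similarity.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def min_max(xs: list[float]) -> list[float]:
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def retrieve(memories: list[Memory], query: str, k: int = 2, decay: float = 0.995,
             w_rec: float = 1.0, w_imp: float = 1.0, w_rel: float = 1.0) -> list[Memory]:
    if not memories:
        return []
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(query, m.text) for m in memories])
    scores = [w_rec * r + w_imp * i + w_rel * v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)
    return [memories[j] for j in ranked[:k]]


memories = [
    Memory("ate breakfast at the cafe", importance=2, hours_since_access=1),
    Memory("had a fight with a close friend", importance=8, hours_since_access=30),
    Memory("friend recommended a new cafe", importance=4, hours_since_access=5),
]
top = retrieve(memories, "what should I say to my friend", k=2)
print([m.text for m in top])
# ['friend recommended a new cafe', 'had a fight with a close friend']
```

The most recent memory loses here because it is neither important nor relevant; normalizing each signal before summing keeps any one of them from dominating just because of its scale.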

Zep: A Long-Term Memory Store for LLM Applications

URL: https://docs.getzep.com/concepts/
What it covers: Documentation for Zep, an open-source long-term memory service. Covers automatic summarization, fact extraction, knowledge graphs, and temporal context management.
Best for: Seeing what a dedicated memory-as-a-service architecture looks like. Useful reference if evaluating build-vs-buy for memory infrastructure.

Pinecone: What is a Vector Database?

URL: https://www.pinecone.io/learn/vector-database/
What it covers: Primer on how vector databases work internally — embedding generation, indexing (HNSW, IVF), approximate nearest-neighbor search, and metadata filtering.
Best for: Understanding the mechanics of semantic memory retrieval before building with it.
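Before reaching for HNSW or IVF, it helps to see the exact computation those indexes approximate: score every stored vector against the query and keep the top k, optionally restricted by a metadata filter. This brute-force sketch costs O(n) per query, which is the cost approximate indexes trade a little recall to avoid. The record layout, metadata fields, and `knn_search` function are illustrative, not any vector database's API.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_search(index: list[dict], query: list[float], k: int,
               where: dict | None = None) -> list[str]:
    """Exact k-nearest-neighbour search with an equality metadata filter (pre-filtering)."""
    where = where or {}
    candidates = [r for r in index if all(r["metadata"].get(f) == v for f, v in where.items())]
    candidates.sort(key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


index = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"user": "u1"}},
    {"id": "b", "vector": [0.7, 0.3], "metadata": {"user": "u2"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"user": "u1"}},
]
print(knn_search(index, [1.0, 0.05], k=2))                        # ['a', 'b']
print(knn_search(index, [1.0, 0.05], k=2, where={"user": "u1"}))  # ['a', 'c']
```

The second query shows why metadata filtering matters for agent memory: it keeps one user's memories from surfacing in another user's session, even when the other user's vector is closer.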

Key Papers at a Glance

Paper	Core Contribution
MemGPT (2023)	Hierarchical memory with OS-style paging for LLMs
Generative Agents (2023)	Importance + recency + relevance scoring for memory retrieval
CoALA (2023)	Comprehensive taxonomy mapping cognitive memory to LLM agents
Voyager (2023)	Procedural memory as a skill library for open-ended agent tasks
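Procedural memory as a skill library can be sketched as a map from a natural-language description to executable code, retrieved by similarity to the current task. In Voyager the skills are generated programs indexed by descriptions of what they do; here the `SkillLibrary` class is hypothetical, skills are plain Python callables, and Jaccard word overlap stands in for embedding similarity.

```python
from __future__ import annotations

from typing import Callable


class SkillLibrary:
    """Hypothetical procedural memory: reusable skills keyed by a description."""

    def __init__(self) -> None:
        self.skills: dict[str, tuple[str, Callable[..., object]]] = {}

    def add(self, name: str, description: str, fn: Callable[..., object]) -> None:
        self.skills[name] = (description, fn)

    def retrieve(self, task: str, k: int = 1) -> list[str]:
        task_words = set(task.lower().split())

        def score(name: str) -> float:
            # Jaccard overlap between the task and the skill's description.
            desc_words = set(self.skills[name][0].lower().split())
            union = task_words | desc_words
            return len(task_words & desc_words) / len(union) if union else 0.0

        ranked = sorted(self.skills, key=score, reverse=True)
        return [n for n in ranked[:k] if score(n) > 0]

    def run(self, name: str, *args: object) -> object:
        return self.skills[name][1](*args)


lib = SkillLibrary()
lib.add("convert_c_to_f", "convert a temperature from celsius to fahrenheit", lambda c: c * 9 / 5 + 32)
lib.add("word_count", "count the words in a text", lambda s: len(s.split()))

best = lib.retrieve("how many words are in this text")
print(best, lib.run(best[0], "skills compound over time"))  # ['word_count'] 4
```

Unlike episodic or semantic memory, what is retrieved here is behavior rather than text, so the agent's capabilities grow as the library does.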

Tools and Libraries

Tool	Purpose	URL
Chroma	Open-source vector DB, easy local setup	https://docs.trychroma.com
Qdrant	High-performance vector DB with filtering	https://qdrant.tech/documentation
Pinecone	Managed vector DB, production-grade	https://docs.pinecone.io
pgvector	Vector search as a Postgres extension	https://github.com/pgvector/pgvector
Mem0	Memory layer for AI apps (managed service)	https://docs.mem0.ai
LangChain Memory	Collection of memory implementations for LangChain	https://python.langchain.com/docs/modules/memory

Study Notes by Niladri & AI