References: Prompt Engineering

A curated reading list organized by category. Start with the Official Docs and Papers sections, then explore the courses and tools as needed.

Official Documentation

Anthropic Prompt Engineering Guide

URL: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
What it covers: Anthropic’s first-party recommendations for prompting Claude — XML tags, system prompts, prefilling, avoiding hallucinations, handling long documents
Why read it: These are the canonical best practices for the model you are most likely working with. Anthropic documents Claude-specific behaviors that differ from other models.
Best sections: “Be clear and direct”, “Use XML tags”, “Long context tips”, “Extended thinking”

Foundational Papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

URL: https://arxiv.org/abs/2201.11903
Authors: Wei et al. (Google Brain), 2022
What it covers: Introduces chain-of-thought prompting — showing that providing intermediate reasoning steps as few-shot examples dramatically improves performance on arithmetic, commonsense, and symbolic reasoning tasks
Key result: Few-shot CoT on GPT-3 (540B) achieved 57% accuracy on GSM8K math problems, vs 17% for standard few-shot — a 3x improvement
Why read it: The foundational paper for the single most impactful prompting technique. Every engineer using LLMs for multi-step tasks should understand this.

Self-Consistency Improves Chain of Thought Reasoning in Language Models

URL: https://arxiv.org/abs/2203.11171
Authors: Wang et al. (Google Brain), 2022
What it covers: Proposes sampling multiple diverse reasoning paths and taking a majority vote on the final answer, instead of using greedy decoding on a single CoT path
Key result: Self-consistency with 40 sampled paths improved CoT performance by 17.9% on GSM8K
Why read it: Critical for production systems where single-pass accuracy is insufficient. Also introduces the concept of reasoning path diversity as a signal of answer confidence.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

URL: https://arxiv.org/abs/2305.10601
Authors: Yao et al. (Princeton/Google DeepMind), 2023
What it covers: Frames problem solving as search over a tree of partial solutions, where the model can evaluate and backtrack — a generalization of CoT that enables non-linear reasoning
Key result: ToT solved 74% of “Game of 24” problems vs 4% for standard CoT
Why read it: Foundational for understanding agent architectures and structured reasoning. Conceptually underpins modern agent planning frameworks.

ReAct: Synergizing Reasoning and Acting in Language Models

URL: https://arxiv.org/abs/2210.03629
Authors: Yao et al. (Princeton), 2022
What it covers: Interleaves reasoning traces and task-specific actions (e.g., search queries, API calls), allowing models to dynamically adjust plans based on observations
Key result: ReAct outperforms pure reasoning (CoT) and pure acting approaches on knowledge-intensive tasks like HotpotQA and FEVER
Why read it: The direct precursor to modern tool-using agent patterns. Understanding ReAct is essential for the 04-agents module.

Large Language Models Are Zero-Shot Reasoners

URL: https://arxiv.org/abs/2205.11916
Authors: Kojima et al. (University of Tokyo), 2022
What it covers: Discovers that “Let’s think step by step” as a zero-shot prompt suffix elicits CoT reasoning without any examples — the famous “zero-shot CoT” result
Key result: Zero-shot CoT with “Let’s think step by step” improved MultiArith accuracy from 17.7% to 78.7%
Why read it: Explains the mechanism behind the single most useful prompting phrase. Short, readable paper — highly recommended.

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

URL: https://arxiv.org/abs/2205.10625
Authors: Zhou et al. (Google Research), 2022
What it covers: A two-stage prompting strategy: first decompose a complex problem into sub-problems, then solve them in sequence using previous answers as context
Why read it: Addresses a key limitation of standard CoT — handling problems that require solving simpler prerequisites first. Directly applicable to multi-step software engineering tasks.

Comprehensive Guides & Courses

Prompting Guide (promptingguide.ai)

URL: https://www.promptingguide.ai
What it covers: Comprehensive reference for all major prompting techniques with examples across multiple models. Covers zero/few-shot, CoT, ToT, ReAct, self-consistency, and more.
Why use it: Best single reference for looking up a specific technique quickly. Well-maintained with citations to original papers.
Best sections: Techniques overview, Model-specific guides, Applications

Learn Prompting

URL: https://learnprompting.org
What it covers: Interactive, structured course on prompt engineering from basics to advanced topics. Covers defensive prompting, image prompting, and agent design.
Why use it: More approachable than reading papers. Good for filling in gaps after you’ve read the core papers.
Recommended path: Start with “Basics” → “Intermediate” → “Prompt Hacking” section

Security & Adversarial Prompting

Prompt Injection Resources (Brex)

URL: https://github.com/brexhq/prompt-security
What it covers: Practical guide from Brex’s engineering team on prompt injection attacks and defenses in production LLM systems. Includes attack taxonomy, real-world examples, and mitigation strategies.
Why read it: One of the most practically useful security resources for engineers building LLM-powered products. Written by practitioners, not just researchers.

Prompt Injection Attacks Against GPT-3 (Simon Willison)

URL: https://simonwillison.net/2022/Sep/12/prompt-injection/
What it covers: The blog post that popularized the term “prompt injection” — explains the attack with clear examples and analogies to SQL injection
Why read it: Clear, concise introduction to the attack surface. Good for explaining the concept to non-ML engineers.

Tools & Infrastructure

Anthropic Console

URL: https://console.anthropic.com
What it is: Anthropic’s web IDE for prompt development. Features: prompt editor with version history, side-by-side model comparison, token counting, test case library.
Best for: Rapid prototyping, sharing prompts with team, exploring model differences

LangSmith

URL: https://smith.langchain.com
What it is: LLM observability platform. Features: run tracing, prompt versioning, dataset management, evaluation pipelines, A/B testing.
Best for: Production systems requiring full observability and systematic prompt evaluation

OpenAI Evals (framework, not model-specific)

URL: https://github.com/openai/evals
What it is: Open-source evaluation framework. Useful for building automated prompt evaluation suites even if you’re not using OpenAI models.
Best for: Building systematic evals for prompt regression testing

Study Notes by Niladri & AI

Explorer

references

References: Prompt Engineering

Official Documentation

Anthropic Prompt Engineering Guide

Foundational Papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

ReAct: Synergizing Reasoning and Acting in Language Models

Large Language Models Are Zero-Shot Reasoners

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Comprehensive Guides & Courses

Prompting Guide (promptingguide.ai)

Learn Prompting

Security & Adversarial Prompting

Prompt Injection Resources (Brex)

Prompt Injection Attacks Against GPT-3 (Simon Willison)

Tools & Infrastructure

Anthropic Console

LangSmith

OpenAI Evals (framework, not model-specific)

Recommended Reading Order

Graph View

Table of Contents