Implementation Guide — Production Examples

This directory originally contained Python example scripts that were removed from this archive.
An AI assistant can recreate them by following the instructions below.


prompt_caching_demo.py

Purpose: Demonstrate Anthropic’s prompt caching feature to reduce latency and cost.

What to implement:

  1. Use the Anthropic Python SDK with cache_control blocks.
  2. Load a large static document (e.g. paste the full text of a public-domain book chapter, or a large codebase file) as the system prompt.
  3. Mark it with {"type": "ephemeral"} cache control on the last content block.
  4. Make 3 sequential calls with different questions about the document.
  5. Print usage.cache_creation_input_tokens and usage.cache_read_input_tokens from the response to show the first call creates the cache and subsequent calls hit it.
  6. Include a timing comparison showing latency improvement.

How to run: python prompt_caching_demo.py
Dependencies: anthropic>=0.27.0


streaming_demo.py

Purpose: Use Claude’s streaming API to print tokens as they are generated.

What to implement:

  1. Use client.messages.stream() context manager.
  2. Print each text delta as it arrives (on_text callback or iterating stream.text_stream).
  3. Capture the final message object at the end to print total usage stats.
  4. Demonstrate two use-cases:
    • Simple streaming chat response.
    • Streaming a tool-use response and detecting when a tool call block is complete.
  5. Show how to handle stream.get_final_message() for post-stream processing.

How to run: python streaming_demo.py
Dependencies: anthropic