Implementation Guide — Production Examples
This directory originally contained Python example scripts that were removed from this archive.
An AI assistant can recreate them by following the instructions below.
prompt_caching_demo.py
Purpose: Demonstrate Anthropic’s prompt caching feature to reduce latency and cost.
What to implement:
- Use the Anthropic Python SDK with
cache_controlblocks. - Load a large static document (e.g. paste the full text of a public-domain book chapter, or a large codebase file) as the system prompt.
- Mark it with
{"type": "ephemeral"}cache control on the last content block. - Make 3 sequential calls with different questions about the document.
- Print
usage.cache_creation_input_tokensandusage.cache_read_input_tokensfrom the response to show the first call creates the cache and subsequent calls hit it. - Include a timing comparison showing latency improvement.
How to run: python prompt_caching_demo.py
Dependencies: anthropic>=0.27.0
streaming_demo.py
Purpose: Use Claude’s streaming API to print tokens as they are generated.
What to implement:
- Use
client.messages.stream()context manager. - Print each text delta as it arrives (
on_textcallback or iteratingstream.text_stream). - Capture the final
messageobject at the end to print totalusagestats. - Demonstrate two use-cases:
- Simple streaming chat response.
- Streaming a tool-use response and detecting when a tool call block is complete.
- Show how to handle
stream.get_final_message()for post-stream processing.
How to run: python streaming_demo.py
Dependencies: anthropic