Implementation Guide — Production Examples

This directory originally contained Python example scripts that were removed from this archive.
An AI assistant can recreate them by following the instructions below.

`prompt_caching_demo.py`

Purpose: Demonstrate Anthropic’s prompt caching feature to reduce latency and cost.

What to implement:

Use the Anthropic Python SDK with cache_control blocks.
Load a large static document (e.g. paste the full text of a public-domain book chapter, or a large codebase file) as the system prompt.
Mark it with {"type": "ephemeral"} cache control on the last content block.
Make 3 sequential calls with different questions about the document.
Print usage.cache_creation_input_tokens and usage.cache_read_input_tokens from the response to show the first call creates the cache and subsequent calls hit it.
Include a timing comparison showing latency improvement.

How to run: python prompt_caching_demo.py
Dependencies: anthropic>=0.27.0

`streaming_demo.py`

Purpose: Use Claude’s streaming API to print tokens as they are generated.

What to implement:

Use client.messages.stream() context manager.
Print each text delta as it arrives (on_text callback or iterating stream.text_stream).
Capture the final message object at the end to print total usage stats.
Demonstrate two use-cases:
- Simple streaming chat response.
- Streaming a tool-use response and detecting when a tool call block is complete.
Show how to handle stream.get_final_message() for post-stream processing.

How to run: python streaming_demo.py
Dependencies: anthropic

Study Notes by Niladri & AI

Explorer

IMPLEMENTATION_GUIDE

Implementation Guide — Production Examples

`prompt_caching_demo.py`

`streaming_demo.py`

Graph View

Table of Contents