Lower cost and latency on Claude with multi-block prompt caching

Agno now supports Anthropic's multi-block prompt caching for Claude models, giving teams granular control over what gets cached and for how long. You can define multiple system prompt blocks, each with its own cache setting and TTL of either 5 minutes or 1 hour, and opt in to caching tool definitions so the tool prefix is reused across requests. Tool serialization is also deterministic across Anthropic, OpenAI, Gemini, and Bedrock, so request prefixes stay stable from run to run and cached tokens actually hit.

Why it matters: Production agents with long system prompts or large tool catalogs can see meaningful reductions in inference cost and time-to-first-token without changing application logic. Caching is opt-in and configured at the model level, so existing agents are unaffected until you turn it on.

See Cookbook for reference.