v2.3.12
December 13, 2025
Predictable costs and faster responses with cross-provider token counting and smart compression
A new unified token counting utility provides consistent, accurate token estimates across OpenAI, Anthropic, AWS Bedrock, Google Gemini, and LiteLLM. We’ve also integrated token-based compression into Compression Manager to automatically fit content within model limits. Together, these changes simplify multi-model operations and help teams proactively control cost, latency, and throughput.
Details
- Single API for cross-provider token accounting improves planning and governance
- Token-aware compression prioritizes relevant context to meet target budgets
- Reduces prompt overruns and tail latency caused by context overflow
- Backward compatible; no action required to upgrade
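To illustrate the idea of a single cross-provider counting entry point, here is a minimal sketch. The function name, provider strings, and the ~4-characters-per-token heuristic are illustrative assumptions, not the product's actual API; a real implementation would dispatch to each provider's tokenizer or counting endpoint.

```python
from math import ceil

# Rough baseline heuristic: ~4 characters per token for English text.
# Real implementations would call provider tokenizers instead.
_CHARS_PER_TOKEN = 4.0

_KNOWN_PROVIDERS = {"openai", "anthropic", "bedrock", "gemini", "litellm"}

def count_tokens(text: str, provider: str = "openai") -> int:
    """Estimate the token count of `text` for the given provider.

    Hypothetical sketch: validates the provider name, then falls back
    to a character-based estimate common to all providers.
    """
    if provider not in _KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return max(1, ceil(len(text) / _CHARS_PER_TOKEN))
```

One call shape for every provider is what makes planning and governance simpler: callers budget in tokens without branching on which model family will serve the request.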
Who this is for: Platform teams orchestrating multi-model workloads, cost-sensitive deployments, and applications that must meet strict SLAs.
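Token-aware compression can be sketched as a greedy fit against a token budget: keep the highest-priority context segments until the budget is exhausted. The function name, the `(priority, text)` segment shape, and the default counter are assumptions for illustration only.

```python
def compress_to_budget(segments, budget, count=lambda s: max(1, len(s) // 4)):
    """Keep the highest-priority segments that fit within `budget` tokens.

    Hypothetical sketch: `segments` is a list of (priority, text) pairs,
    higher priority meaning more relevant context. Retained texts are
    returned in their original order so the prompt stays coherent.
    """
    # Visit segments from most to least relevant.
    order = sorted(range(len(segments)), key=lambda i: -segments[i][0])
    kept, used = set(), 0
    for i in order:
        cost = count(segments[i][1])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [segments[i][1] for i in sorted(kept)]
```

Dropping low-priority segments before the request is sent is what prevents context overflow at the provider, which in turn avoids the retries and truncation that drive prompt overruns and tail latency.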
