v2.3.12
December 13, 2025
Predictable costs and faster responses with cross-provider token counting and smart compression
A new unified token counting utility provides consistent, accurate token estimates across OpenAI, Anthropic, AWS Bedrock, Google Gemini, and LiteLLM. We’ve also integrated token-based compression into Compression Manager to automatically fit content within model limits. Together, these changes simplify multi-model operations and help teams proactively control cost, latency, and throughput.
Details
- Single API for cross-provider token accounting improves planning and governance
- Token-aware compression prioritizes relevant context to meet target budgets
- Reduces prompt overruns and tail latency caused by context overflow
- Backward compatible; no action required to upgrade
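To illustrate the idea of a single cross-provider counting entry point, here is a minimal sketch. The function name, provider strings, and the ~4-characters-per-token heuristic are illustrative assumptions, not the product's actual API; a real implementation would dispatch to each provider's tokenizer or counting endpoint.

```python
from math import ceil

# Rough baseline heuristic: ~4 characters per token for English text.
# Real implementations would call provider tokenizers instead.
_CHARS_PER_TOKEN = 4.0

_KNOWN_PROVIDERS = {"openai", "anthropic", "bedrock", "gemini", "litellm"}

def count_tokens(text: str, provider: str = "openai") -> int:
    """Estimate the token count of `text` for the given provider.

    Hypothetical sketch: validates the provider name, then falls back
    to a character-based estimate common to all providers.
    """
    if provider not in _KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return max(1, ceil(len(text) / _CHARS_PER_TOKEN))
```

One call shape for every provider is what makes planning and governance simpler: callers budget in tokens without branching on which model family will serve the request.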
Who this is for: Platform teams orchestrating multi-model workloads, cost-sensitive deployments, and applications that must meet strict SLAs.
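Token-aware compression can be sketched as a greedy fit against a token budget: keep the highest-priority context segments until the budget is exhausted. The function name, the `(priority, text)` segment shape, and the default counter are assumptions for illustration only.

```python
def compress_to_budget(segments, budget, count=lambda s: max(1, len(s) // 4)):
    """Keep the highest-priority segments that fit within `budget` tokens.

    Hypothetical sketch: `segments` is a list of (priority, text) pairs,
    higher priority meaning more relevant context. Retained texts are
    returned in their original order so the prompt stays coherent.
    """
    # Visit segments from most to least relevant.
    order = sorted(range(len(segments)), key=lambda i: -segments[i][0])
    kept, used = set(), 0
    for i in order:
        cost = count(segments[i][1])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [segments[i][1] for i in sorted(kept)]
```

Dropping low-priority segments before the request is sent is what prevents context overflow at the provider, which in turn avoids the retries and truncation that drive prompt overruns and tail latency.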
