v2.3.12

December 13, 2025

Predictable costs and faster responses with cross-provider token counting and smart compression

A new unified token counting utility provides consistent, accurate token estimates across OpenAI, Anthropic, AWS Bedrock, Google Gemini, and LiteLLM. We’ve also integrated token-based compression into Compression Manager to automatically fit content within model limits. Together, these changes simplify multi-model operations and help teams proactively control cost, latency, and throughput.
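The notes do not show the utility's actual API, so the following is only an illustrative sketch of what a unified, cross-provider token estimator could look like. The function name, provider keys, and characters-per-token ratios are all assumptions, not the real implementation:

```python
# Hypothetical sketch of a cross-provider token estimator.
# The ratios below are illustrative heuristics, not the utility's real values;
# a production counter would use each provider's own tokenizer.
def estimate_tokens(text: str, provider: str) -> int:
    # Approximate characters-per-token ratio per provider (assumed values).
    ratios = {
        "openai": 4.0,
        "anthropic": 3.8,
        "bedrock": 4.0,
        "gemini": 4.0,
        "litellm": 4.0,
    }
    # Fall back to a generic ratio for unknown providers; never report 0 tokens.
    return max(1, int(len(text) / ratios.get(provider, 4.0)))
```

A single entry point like this lets planning and budgeting code treat all providers uniformly, even though each backend tokenizes differently under the hood.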

Details

  • Single API for cross-provider token accounting improves planning and governance
  • Token-aware compression prioritizes relevant context to meet target budgets
  • Reduces prompt overruns and tail latency caused by context overflow
  • Backward compatible; no action required to upgrade
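The token-aware compression described above can be sketched as a greedy selection: keep the most relevant context chunks until the token budget is exhausted, then restore original order. This is a minimal illustration under assumed data shapes, not the Compression Manager's actual algorithm:

```python
from typing import Callable, Dict, List

def compress_to_budget(
    chunks: List[Dict],
    budget: int,
    count_tokens: Callable[[str], int],
) -> List[Dict]:
    """Greedily keep the most relevant chunks that fit the token budget.

    Assumes each chunk is a dict with "text" and "relevance" keys
    (hypothetical shape for this sketch).
    """
    # Rank by relevance (descending), remembering each chunk's original position.
    ranked = sorted(enumerate(chunks), key=lambda pair: pair[1]["relevance"], reverse=True)
    kept, used = [], 0
    for pos, chunk in ranked:
        cost = count_tokens(chunk["text"])
        if used + cost <= budget:
            kept.append((pos, chunk))
            used += cost
    # Restore original document order so the compressed prompt reads coherently.
    return [chunk for _, chunk in sorted(kept)]
```

Selecting by relevance rather than truncating from the end is what avoids the context-overflow failures the bullet list mentions: low-value material is dropped first, and the surviving chunks always fit the model's limit.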

Who this is for: Platform teams orchestrating multi-model workloads, cost-sensitive deployments, and applications that must meet strict SLAs.