v2.2.7

November 5, 2025

High-throughput embeddings with vLLM, local or remote

We added a vLLM embedder with batching support, enabling high-throughput, cost-controlled embeddings on your infrastructure or via remote endpoints. This gives you more deployment flexibility without changing your application code.

Details

  • Batch encoding for better throughput and lower cost per token
  • Works with local GPU deployments or managed vLLM services
  • Additive capability: opt in by selecting vLLM as your embedder; no other changes required
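To illustrate the batching behavior described above, here is a minimal sketch of client-side batched embedding against a vLLM server's OpenAI-compatible `/v1/embeddings` endpoint. The endpoint URL, model name, batch size, and the `embed_remote` helper are illustrative assumptions, not this product's actual API:

```python
import json
import urllib.request

def batched(texts, batch_size):
    """Split a list of texts into fixed-size batches."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

def embed_remote(texts,
                 endpoint="http://localhost:8000/v1/embeddings",  # assumed local vLLM server
                 model="BAAI/bge-small-en-v1.5",                  # illustrative model choice
                 batch_size=64):
    """Embed texts in batches via an OpenAI-compatible embeddings endpoint.

    Sending fixed-size batches (rather than one request per text) is what
    drives the throughput and cost-per-token gains noted above.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        payload = json.dumps({"model": model, "input": batch}).encode()
        req = urllib.request.Request(
            endpoint,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        # The endpoint returns one embedding per input, in order.
        vectors.extend(item["embedding"] for item in data["data"])
    return vectors
```

The same client code works whether the endpoint points at a local GPU deployment or a managed vLLM service, which is the deployment flexibility the release note describes.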

Who this is for: Teams requiring performance, data residency, or cost control for embeddings at scale.