v2.2.7
November 5, 2025
High-throughput embeddings with vLLM, local or remote
We added a vLLM embedder with batching support, enabling high-throughput, cost-controlled embeddings on your infrastructure or via remote endpoints. This gives you more deployment flexibility without changing your application code.
Details
- Batch encoding for better throughput and lower cost per token
- Works with local GPU deployments or managed vLLM services
- Additive capability: select vLLM as your embedder, with no other changes needed
Who this is for: Teams requiring performance, data residency, or cost control for embeddings at scale.
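
As a minimal sketch of the batching pattern described above: vLLM's OpenAI-compatible server exposes a `/v1/embeddings` endpoint when serving an embedding model, so a client can chunk documents into batches and post each batch in one request. The helper names (`chunked`, `embed_batch`), the batch size, the base URL, and the model name below are illustrative assumptions, not this release's actual API.

```python
import json
import urllib.request
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_batch(texts: list[str], base_url: str, model: str) -> list[list[float]]:
    """POST one batch of texts to an OpenAI-compatible /v1/embeddings
    endpoint, such as a local or remote vLLM server, and return the
    embedding vectors in input order."""
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]


# Usage against a local `vllm serve <embedding-model>` instance
# (endpoint and model name are placeholders):
# vectors: list[list[float]] = []
# for batch in chunked(docs, 64):
#     vectors.extend(embed_batch(batch, "http://localhost:8000", "my-embed-model"))
```

Batching amortizes per-request overhead, which is where the throughput and cost-per-token gains come from.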
