v2.2.7
November 5, 2025
High-throughput embeddings with vLLM, local or remote
We added a vLLM embedder with batching support, enabling high-throughput, cost-controlled embeddings on your infrastructure or via remote endpoints. This gives you more deployment flexibility without changing your application code.
Details
- Batch encoding for better throughput and lower cost per token
- Works with local GPU deployments or managed vLLM services
- Additive capability: select vLLM as your embedder, with no other changes needed
Who this is for: Teams requiring performance, data residency, or cost control for embeddings at scale.
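
As a minimal sketch of the batching pattern described above: vLLM's OpenAI-compatible server exposes a `/v1/embeddings` endpoint when serving an embedding model, so a client can chunk documents into batches and post each batch in one request. The helper names (`chunked`, `embed_batch`), the batch size, the base URL, and the model name below are illustrative assumptions, not this release's actual API.

```python
import json
import urllib.request
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_batch(texts: list[str], base_url: str, model: str) -> list[list[float]]:
    """POST one batch of texts to an OpenAI-compatible /v1/embeddings
    endpoint, such as a local or remote vLLM server, and return the
    embedding vectors in input order."""
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]


# Usage against a local `vllm serve <embedding-model>` instance
# (endpoint and model name are placeholders):
# vectors: list[list[float]] = []
# for batch in chunked(docs, 64):
#     vectors.extend(embed_batch(batch, "http://localhost:8000", "my-embed-model"))
```

Batching amortizes per-request overhead, which is where the throughput and cost-per-token gains come from.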
