v2.2.0

October 23, 2025

Agents now support media-only inputs

Agents can now process inputs that contain only media, with no accompanying text. This enables use cases such as camera-to-answer, voice-only prompts, and file-first interactions. Users no longer have to type a text prompt alongside the media, which removes friction from multimodal experiences.

Details

Accepts images, audio, video, or files without requiring text
Simplifies mobile and voice-first integrations

Who this is for: Teams shipping multimodal or mobile-first experiences where media, rather than text, is the primary input.