v2.3.9
December 9, 2025
Operationalize model quality with Agent-as-Judge evaluations
A new built-in evaluation system lets you automate LLM quality checks with binary and numeric scoring, background execution, post-hooks, and customizable evaluator agents. This makes it easier to standardize evals, gate releases, and compare models, all without bolting on external systems.
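At its core, an evaluator wraps a judge call that returns a binary verdict plus a numeric score. The sketch below is illustrative only: `EvalResult`, `make_evaluator`, and `call_judge` are hypothetical names, not the shipped API.

```python
# Minimal sketch of a custom evaluator agent with binary + numeric scoring.
# Names and shapes here are placeholders; the built-in system's API may differ.
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    passed: bool     # binary verdict, e.g. for release gating
    score: float     # numeric score in [0, 1]
    rationale: str   # the evaluator agent's explanation


def make_evaluator(
    judge_prompt: str,
    call_judge: Callable[[str], dict],
) -> Callable[[str, str], EvalResult]:
    """Wrap a judge-LLM call into an evaluator that returns binary and numeric scores."""
    def evaluate(prompt: str, response: str) -> EvalResult:
        verdict = call_judge(
            f"{judge_prompt}\n\nPrompt: {prompt}\nResponse: {response}\n"
            "Return JSON with fields: passed (bool), score (float in [0, 1]), rationale (str)."
        )
        return EvalResult(
            passed=bool(verdict["passed"]),
            score=float(verdict["score"]),
            rationale=str(verdict["rationale"]),
        )
    return evaluate
```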
Details
- Run evaluations in the background to keep pipelines responsive (see the sketch after this list)
- Use post-hooks to persist metrics, trigger alerts, or update dashboards
- Create custom evaluator agents to encode domain-specific criteria
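Background execution and post-hooks fit together along these lines. The thread-pool wiring and `persist_metrics` hook below are a hand-rolled approximation of what the built-in system handles for you, not its actual interface.

```python
# Sketch of background execution plus a post-hook, using a plain thread pool.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable

_executor = ThreadPoolExecutor(max_workers=2)


def run_eval_in_background(
    evaluate: Callable,
    cases: Iterable[dict],
    post_hook: Callable,
) -> Future:
    """Score cases off the main pipeline thread, then fire the post-hook with the results."""
    def _job():
        results = [evaluate(case["prompt"], case["response"]) for case in cases]
        post_hook(results)  # e.g. persist metrics, trigger alerts, update dashboards
        return results
    return _executor.submit(_job)  # pipeline keeps running while scoring happens


def persist_metrics(results) -> None:
    """Example post-hook: aggregate a pass rate and mean score for a dashboard."""
    pass_rate = sum(r.passed for r in results) / len(results)
    mean_score = sum(r.score for r in results) / len(results)
    print(f"pass_rate={pass_rate:.2%} mean_score={mean_score:.3f}")
```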
Who this is for: AI platform teams, ML engineers, and QA leads who need consistent, auditable evaluation workflows at scale.
