v2.3.21
December 23, 2025
Full visibility and control of Agent-as-Judge evaluations in AgentOS
Agent-as-Judge evaluation runs are now returned on GET endpoints, so they can be viewed and managed in the AgentOS UI. Teams get end-to-end observability of evaluation pipelines, auditable results for governance, and faster triage when diagnosing model or agent behavior.
Details
- Retrieve status, scores, and metadata for evaluation runs via read APIs
- Monitor, filter, and drill into evaluations directly in the AgentOS UI
- Backward-compatible; no workflow changes required to start seeing results
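As a rough illustration of the read path described above, the sketch below fetches evaluation runs over HTTP and keeps only the Agent-as-Judge ones. This note does not spell out the API, so the endpoint path (`/eval-runs`), the optional `{"data": [...]}` response envelope, and the `eval_type` field with the value `agent_as_judge` are all assumptions for illustration; check the AgentOS API reference for the actual route and schema.

```python
import json
from urllib.request import urlopen

# ASSUMPTION: the route below is hypothetical; this release note does not name it.
EVAL_RUNS_PATH = "/eval-runs"


def fetch_eval_runs(base_url, timeout=10.0):
    """GET evaluation runs from an AgentOS instance and return them as a list of dicts."""
    with urlopen(base_url.rstrip("/") + EVAL_RUNS_PATH, timeout=timeout) as resp:
        payload = json.load(resp)
    # ASSUMPTION: the body is either a bare list or a {"data": [...]} envelope.
    return payload if isinstance(payload, list) else payload.get("data", [])


def select_runs(runs, eval_type="agent_as_judge"):
    """Keep only runs whose (assumed) `eval_type` field matches the given type."""
    return [run for run in runs if run.get("eval_type") == eval_type]
```

A caller would pass the base URL of its own AgentOS deployment to `fetch_eval_runs` and feed the result to `select_runs`. Keeping the filter separate from the HTTP call makes it easy to adjust the field name once the real schema is known.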
Who this is for: Platform, MLOps, and QA teams validating agent behavior and benchmarking models at scale.
