v2.3.21

December 23, 2025

Full visibility and control of Agent-as-Judge evaluations in AgentOS

Agent-as-Judge evaluation runs are now returned by the GET endpoints, so they can be viewed and managed in the AgentOS UI. Teams get end-to-end observability of their evaluation pipelines, auditable results for governance, and faster triage when diagnosing model or agent behavior.

Details

  • Retrieve status, scores, and metadata for evaluation runs via read APIs
  • Monitor, filter, and drill into evaluations directly in the AgentOS UI
  • Backward compatible: existing workflows need no changes to start seeing results
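
As a rough illustration of reading evaluation runs over HTTP, here is a minimal Python sketch. This note does not spell out the endpoint path, query parameters, or response shape, so the base URL, the `/eval-runs` path, the `eval_type` and `limit` parameters, and the `id` / `eval_type` / `name` fields are all assumptions made for the example; check the AgentOS API reference for the actual names.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not confirmed by this release note): the base URL, the
# "/eval-runs" path, the query parameter names, and the response fields.
BASE_URL = "http://localhost:7777"  # hypothetical AgentOS address
EVAL_RUNS_PATH = "/eval-runs"       # hypothetical read endpoint


def build_eval_runs_url(base_url, eval_type=None, limit=20):
    """Build the GET URL for listing evaluation runs."""
    params = {"limit": limit}
    if eval_type is not None:
        params["eval_type"] = eval_type  # hypothetical filter parameter
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}{EVAL_RUNS_PATH}?{query}"


def fetch_eval_runs(url, timeout=10.0):
    """Send the GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_runs(payload):
    """Reduce a response to (id, eval_type, name) tuples.

    Accepts either a bare list of runs or an object wrapping them
    under a "data" key; both shapes are assumptions.
    """
    runs = payload.get("data", []) if isinstance(payload, dict) else payload
    return [(run.get("id"), run.get("eval_type"), run.get("name")) for run in runs]


# Example usage against a running instance (performs a network call):
#   url = build_eval_runs_url(BASE_URL, eval_type="agent_as_judge", limit=5)
#   for row in summarize_runs(fetch_eval_runs(url)):
#       print(row)
```

The URL builder and the summarizer are kept separate from the network call so the request shape can be adjusted or checked without a live AgentOS instance.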

Who this is for: Platform, MLOps, and QA teams validating agent behavior and benchmarking models at scale.