Full visibility and control of Agent-as-Judge evaluations in AgentOS

Agent-as-Judge evaluation runs are now returned by the GET endpoints, so they can be viewed and managed in the AgentOS UI. This gives teams end-to-end observability of evaluation pipelines, improves governance with auditable results, and reduces time-to-triage when diagnosing model or agent behavior.

Details:

Retrieve status, scores, and metadata for evaluation runs via read APIs
Monitor, filter, and drill into evaluations directly in the AgentOS UI
Backward-compatible; no workflow changes required to start seeing results
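
As an illustration of the read path described above, here is a minimal Python sketch that lists evaluation runs over HTTP and keeps only the Agent-as-Judge ones. The base URL, the /eval-runs path, the "data" envelope, the "eval_type" field, and the "agent_as_judge" value are all assumptions made for this example and are not confirmed by this note; check the API reference for your AgentOS version before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed values, for illustration only: adjust to your deployment.
BASE_URL = "http://localhost:7777"
EVAL_RUNS_PATH = "/eval-runs"  # hypothetical read endpoint path


def build_eval_runs_url(base_url, **params):
    """Build the GET URL, dropping query parameters whose value is None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + EVAL_RUNS_PATH
    return f"{url}?{query}" if query else url


def fetch_json(url, token=None, timeout=10.0):
    """Issue a GET request and decode the JSON response body."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(url, headers=headers), timeout=timeout) as resp:
        return json.load(resp)


def select_judge_runs(payload, eval_type="agent_as_judge"):
    """Keep runs of the given type.

    Accepts either a bare list of runs or an object with a "data" list;
    both shapes and the "eval_type" field name are assumptions.
    """
    runs = payload.get("data", []) if isinstance(payload, dict) else payload
    return [run for run in runs if run.get("eval_type") == eval_type]
```

A call such as select_judge_runs(fetch_json(build_eval_runs_url(BASE_URL, limit=20))) would then return the matching runs, provided the server is reachable and the assumed field names match. Filtering on the client keeps the sketch independent of whichever server-side filter parameters the API actually supports.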

Who this is for: Platform, MLOps, and QA teams validating agent behavior and benchmarking models at scale.