Changelog

Agent-as-Judge evaluation runs are now returned on GET endpoints, making them fully visible and manageable in the AgentOS UI. This gives teams end-to-end observability of evaluation pipelines, improves governance with auditable results, and reduces time-to-triage when diagnosing model or agent behavior.

Details

  • Retrieve status, scores, and metadata for evaluation runs via read APIs
  • Monitor, filter, and drill into evaluations directly in the AgentOS UI
  • Backward-compatible; no workflow changes required to start seeing results

Who this is for: Platform, MLOps, and QA teams validating agent behavior and benchmarking models at scale.

We overhauled the getting-started cookbook with structured examples, ready-to-use configs, and clear requirements. New projects reach first value faster, with fewer setup errors and better alignment to the latest APIs and patterns.

Details

  • End-to-end templates that demonstrate common agent, tool, and workflow scenarios.
  • Copy-paste configurations for typical environments reduce integration time.
  • Up-to-date guidance minimizes rework and accelerates team ramp-up.

Who this is for: New adopters, solution engineers, and teams scaling Agno across multiple projects.

Agno’s LiteLLM integration now extracts and surfaces reasoning_content for supported models, enabling richer, audit-ready reasoning traces. Teams gain better visibility into model behavior for debugging, evaluation, and governance — without changing application logic.

Details

  • Structured reasoning signals are available through standard responses when using the LiteLLM gateway.
  • Enhances experiment design, incident analysis, and compliance reviews with traceable model steps.
  • Works across compatible reasoning models supported by LiteLLM.

Who this is for: Teams standardizing on LiteLLM who need stronger tracing for reliability engineering, model evaluation, and oversight.

We introduced an async-capable cancellation manager with in-memory and Redis-backed options. This lets you reliably stop long-running or runaway work across distributed workers, improving cost control and adherence to SLAs without adding orchestration complexity.

Details

  • Redis-backed manager coordinates cancellation across multiple nodes; in-memory remains available for local and single-node use.
  • Bring your own implementation via the new public API to standardize cancellation with your existing infrastructure.
  • Non-disruptive adoption; defaults remain unchanged.

Who this is for: Platform and SRE teams running distributed agents/workflows who need predictable termination, cost containment, and safer rollback scenarios.
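
The shape of such a cancellation manager can be sketched in plain Python; the class and method names below are illustrative, not Agno's actual API. A Redis-backed variant would keep the same flags in a shared key space so every worker node observes them.

```python
import threading

class InMemoryCancellationManager:
    """Illustrative sketch: track cancellation flags per run ID.
    Workers poll the flag between units of work and stop cleanly."""

    def __init__(self):
        self._cancelled = set()
        self._lock = threading.Lock()

    def cancel(self, run_id: str) -> None:
        with self._lock:
            self._cancelled.add(run_id)

    def is_cancelled(self, run_id: str) -> bool:
        with self._lock:
            return run_id in self._cancelled

    def clear(self, run_id: str) -> None:
        with self._lock:
            self._cancelled.discard(run_id)

manager = InMemoryCancellationManager()
manager.cancel("run-42")
print(manager.is_cancelled("run-42"))  # True
```

A distributed implementation swaps the in-process set for shared storage while keeping the same `cancel`/`is_cancelled` surface, which is what makes a bring-your-own backend practical.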

You can now pass Google OAuth2 service account credentials directly when configuring Vertex AI models. This removes reliance on ambient credentials and gives platform teams precise control over how agents authenticate to Google Cloud, improving security posture and simplifying deployments across environments.

Details

  • Accepts google.oauth2.service_account.Credentials for direct Vertex AI authentication
  • Enables per-environment and per-agent credential isolation for stronger governance
  • Streamlines CI/CD, serverless, and multi-project setups without additional scaffolding
  • Additive change with no breaking impact or migration required

Who this is for: Platform, security, and MLOps teams standardizing on service accounts, especially in regulated or multi-tenant environments.

SemanticChunking now works with all Agno embedders (e.g., Azure OpenAI, Mistral) and custom chonkie BaseEmbeddings via a wrapper, with new parameters for finer control. This expands model choice, helps optimize cost/latency, and reduces vendor lock-in without refactoring pipelines.

Details

  • Plug in your preferred embedding provider with minimal configuration
  • Tune chunk sizes and thresholds to match corpus and performance goals
  • Maintain consistent chunking strategies across environments

Who this is for: RAG builders and platform teams optimizing retrieval quality and TCO.
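
The wrapper idea can be sketched as a small adapter; the class name, `embed` surface, and toy embedding function below are hypothetical stand-ins, not the real chonkie/Agno interface.

```python
class EmbedderAdapter:
    """Illustrative wrapper: adapt any provider's embedding callable to a
    single `embed(text) -> list[float]` surface a chunker can consume."""

    def __init__(self, embed_fn, dimension: int):
        self._embed_fn = embed_fn
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        vector = self._embed_fn(text)
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(vector)}")
        return vector

# Any provider call can be plugged in; here a toy deterministic stand-in:
def toy_embed(text: str) -> list[float]:
    return [float(ord(c) % 7) for c in text[:4].ljust(4)]

adapter = EmbedderAdapter(toy_embed, dimension=4)
print(adapter.embed("data"))
```

In practice `embed_fn` would be a call into Azure OpenAI, Mistral, or any other provider client, with the adapter enforcing the dimension contract the chunker relies on.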

Workflow event streams now support robust reconnection, catch-up, and replay. Clients automatically resume from the last known event after transient network issues, preventing gaps in dashboards, human-in-the-loop experiences, and downstream automations.

Details

  • Event buffering and replay ensure continuity without manual intervention
  • Backoff and resubscribe logic reduce dropped events and duplicate handling
  • No changes required to existing workflows

Who this is for: Teams running long-lived or interactive workflows that require consistent real-time updates.
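
The catch-up pattern behind this can be sketched in plain Python. `connect`, the `(event_id, payload)` shape, and the checkpoint logic are illustrative, not the actual client API.

```python
import time

def consume_with_resume(connect, max_retries=5):
    """Illustrative resume loop: reconnect with capped exponential backoff
    and ask the server to replay from the last event ID we saw.
    `connect(last_event_id)` yields (event_id, payload) pairs."""
    last_event_id = None
    received = []
    retries = 0
    while retries <= max_retries:
        try:
            for event_id, payload in connect(last_event_id):
                last_event_id = event_id          # checkpoint for catch-up
                received.append(payload)
            return received                        # stream ended cleanly
        except ConnectionError:
            retries += 1
            time.sleep(min(2 ** retries * 0.01, 1.0))
    raise RuntimeError("gave up after repeated disconnects")

# Fake stream that drops once mid-way, then resumes after the checkpoint.
# Event IDs double as offsets here so replay-from-last-ID is trivial.
events = [(1, "a"), (2, "b"), (3, "c")]
state = {"failed": False}

def flaky_connect(last_event_id):
    start = 0 if last_event_id is None else last_event_id
    for event_id, payload in events[start:]:
        if event_id == 2 and not state["failed"]:
            state["failed"] = True
            raise ConnectionError("transient drop")
        yield event_id, payload

print(consume_with_resume(flaky_connect))  # ['a', 'b', 'c'] with no gaps
```

The key property is that the consumer never re-delivers or skips events across a drop: the checkpoint plus server-side replay closes the gap.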

AgentOSClient is a first-class client for connecting to and operating a remote AgentOS. It standardizes how you authenticate, manage agents/teams/workflows, and stream events, reducing integration effort and operational risk while accelerating time-to-value.

Details

  • Production-ready patterns with examples and tests to speed adoption
  • Consistent error handling and simplified remote operations
  • Fits CI/CD and service-to-service integrations without bespoke tooling

Who this is for: Platform owners and integrators who need a reliable, supported way to manage AgentOS remotely.

Introducing RemoteAgent, RemoteTeam, and RemoteWorkflow to execute orchestration on a remote AgentOS. This decouples runtime from application code so you can centralize governance and observability, isolate workloads for security and compliance, and scale horizontally without increasing client complexity.

Details

  • Maintain the same agent and workflow definitions; no migration needed
  • Run close to data for lower latency and better utilization
  • Standard APIs for consistent operations across environments

Who this is for: Platform and infrastructure teams operating multi-tenant or regulated environments, or deploying across cloud and on-prem.

Hybrid search combines dense semantic similarity with keyword matching using reciprocal rank fusion (RRF) for Chroma-backed knowledge bases. This delivers more relevant results across diverse content, especially for queries with rare terms, acronyms, or exact phrases, improving answer quality and reducing false negatives in production RAG systems.

Details

  • Works with existing Chroma stores; no schema or migration required
  • Balances lexical and semantic signals for robust top-k retrieval
  • Improves consistency across varied content types and edge-case queries

Who this is for: Teams running RAG, internal search, or support automation that need dependable retrieval quality at scale.
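
Reciprocal rank fusion itself is a small, well-known algorithm; the sketch below shows the mechanics independent of any vector store (the constant k=60 is the commonly used damping value, not a Chroma-specific setting).

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Each ranking contributes 1 / (k + rank) per document; summed
    scores decide the fused order, rewarding cross-list consistency."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense   = ["doc3", "doc1", "doc2"]   # semantic similarity order
lexical = ["doc1", "doc4", "doc3"]   # keyword match order
fused = reciprocal_rank_fusion([dense, lexical])
print(fused)  # ['doc1', 'doc3', 'doc4', 'doc2']
```

Note how `doc1` wins: it is not first in either list, but it ranks highly in both, which is exactly the behavior that helps with rare terms and exact phrases.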

We resolved issues in read and async_read across multiple readers (CSV, field‑labeled CSV, JSON, Markdown, PDF, DOCX, PPTX, S3, Text, and Web Search). Pipelines now ingest documents and data consistently in both synchronous and asynchronous modes, reducing failures, retries, and operational noise.

Details

  • Restores parity between read and async_read for predictable behavior and outputs
  • Stabilizes ingestion from popular file formats, S3, and web sources used in production
  • No code changes required; upgrade to benefit immediately

Who this is for: Teams building knowledge bases, ETL/ingestion pipelines, and retrieval workflows that rely on diverse document sources or high‑throughput async processing.

When both JWT and security key authentication are enabled, JWT now takes precedence. This standardizes behavior, reduces ambiguity for clients, and aligns with common enterprise security practices.

Details

  • No change if only one method is in use
  • For deployments using both, ensure clients present a valid JWT after upgrade
  • Improves governance and reduces authorization edge cases

Who this is for: Security and platform administrators, API consumers, and teams operating shared gateways.
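
The precedence rule can be sketched as a resolver; header names below are placeholders, not AgentOS's actual ones.

```python
def resolve_auth(headers):
    """Illustrative precedence: when both credentials are present the JWT
    wins; the security key is only consulted as a fallback."""
    bearer = headers.get("Authorization", "")
    if bearer.startswith("Bearer "):
        return ("jwt", bearer.removeprefix("Bearer "))
    if "X-Security-Key" in headers:
        return ("security_key", headers["X-Security-Key"])
    return (None, None)

# Both present: JWT takes precedence.
print(resolve_auth({"Authorization": "Bearer abc", "X-Security-Key": "k1"}))
```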

AgentOS now exposes an API endpoint to migrate all managed databases in one operation. This reduces operational overhead in multi-tenant or multi-environment deployments and ensures consistent schema versions during upgrades.

Details

  • Orchestrates migrations across all databases, reducing error risk and manual work
  • Fits CI/CD workflows for faster, safer rollouts
  • Action recommended after upgrade: invoke the endpoint to ensure all schemas are current

Who this is for: Platform and SRE teams operating multiple agents, tenants, or environments.

We added a cost field to Metrics for OpenRouter-backed activity. This provides a reliable, standardized view of model spend without manual spreadsheets or custom aggregations, improving financial governance across environments.

Details

  • Capture per-run and aggregate cost to support budget tracking, reporting, and chargeback
  • Enables cost dashboards and alerts for proactive spend management
  • No configuration changes required; cost appears automatically wherever Metrics are used

Who this is for: Platform and FinOps teams managing LLM spend across providers.

A2A protocol endpoints have been updated to follow standardized URL conventions, and related payloads were aligned to the protocol. Clients must migrate to the new paths to remain compatible with future releases.

Details

  • Update client base paths and payload shapes to the new conventions
  • Use the new Agent Card retrieval endpoint where applicable
  • Plan a staged rollout to minimize downtime and validate behavior

Who this is for: Integration teams and platform owners maintaining A2A clients and cross-system agent orchestration.

JWTMiddleware now enforces token presence on every request. validate=False no longer permits requests without a token. This improves baseline security and reduces the risk of accidental unauthenticated access.

Details

  • Action: propagate JWTs across all clients and internal services
  • Validate non-verified paths still include tokens and pass as expected
  • Monitor auth metrics to verify parity post‑migration

Who this is for: Operators and integration teams managing authentication across services and environments.

AgentOS now blocks initialization/resync if duplicate IDs are detected across Agents, Teams, or Workflows. This ensures unambiguous references and prevents hard-to-debug behavior at runtime.

Details

  • Breaking change: initialization will fail on duplicate IDs
  • Action: audit and ensure unique IDs before upgrading

Who this is for: Platform owners and multi-team deployments managing large catalogs of agents and workflows.
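
A pre-upgrade audit can be as simple as the check below, which mirrors the new startup behavior (the function name and error shape are illustrative).

```python
from collections import Counter

def assert_unique_ids(agent_ids, team_ids, workflow_ids):
    """Fail fast if any ID is reused across agents, teams, or workflows."""
    all_ids = [*agent_ids, *team_ids, *workflow_ids]
    duplicates = [i for i, n in Counter(all_ids).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate IDs: {duplicates}")
    return True

print(assert_unique_ids(["billing"], ["support"], ["nightly-sync"]))  # True
```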

output_schema now accepts provider-specific JSON schemas and passes them directly to model APIs (OpenAI, Claude, and OpenAI‑like). This removes mapping layers, reduces boilerplate, and enables faster adoption of the latest vendor features.

Details

  • Send provider-native JSON schema objects directly to models
  • Less custom translation code and fewer maintenance points
  • Backward-compatible; existing usages continue to work

Who this is for: Teams standardizing structured outputs across multiple model providers.

Milvus search and async_search now support radius, range_filter, and async search_parameters. These controls help teams tune recall vs. precision and reduce tail latency in high-throughput workloads.

Details

  • Radius and range_filter for precise vector similarity windows
  • Optional async execution for lower latency and higher throughput
  • Backward-compatible; defaults unchanged

Who this is for: Teams running RAG and vector search on Milvus that need predictable performance and relevance.
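
As a rough intuition for what `radius` and `range_filter` do: under a distance metric like L2 (smaller is better), they bound a similarity window, keeping hits whose distance falls between the inner bound (`range_filter`) and the outer bound (`radius`). The sketch below post-filters results the same way; it illustrates the semantics only and is not how Milvus applies them internally.

```python
def similarity_window(hits, radius, range_filter=0.0):
    """Illustrative L2-style window: keep hits with
    range_filter <= distance < radius."""
    return [h for h in hits if range_filter <= h["distance"] < radius]

hits = [
    {"id": 1, "distance": 0.2},
    {"id": 2, "distance": 0.6},
    {"id": 3, "distance": 1.4},
]
print(similarity_window(hits, radius=1.0, range_filter=0.5))
```

For similarity metrics where larger is better (IP, cosine), the bounds invert, so check the metric in use before tuning these values.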

We introduced conventional A2A endpoints — including Agent Card retrieval — and aligned run endpoints and payloads to the updated protocol. This reduces custom handling across clients, improves cross-system compatibility, and clarifies long-term API boundaries.

Details

  • New Agent Card retrieval endpoint
  • Protocol-aligned run endpoints and payloads
  • Requires client updates to adopt the new endpoints and schema

Who this is for: Platform teams integrating agents across services and organizations standardizing on A2A interfaces.

You can now stream reasoning chunks whenever a reasoning model is used. A new ReasoningManager coordinates streaming and lifecycle, giving teams earlier visibility into model thinking, faster debugging, and better auditability — with minimal changes to existing workflows.

Details

  • Real-time streaming of reasoning traces for supported models
  • Centralized control and error handling via ReasoningManager
  • Backward-compatible; enable by providing a reasoning model

Who this is for: Teams building evaluators, regulated or safety-critical applications, and leaders who need transparent reasoning for review and governance.

We’ve added role-based access control (RBAC) to AgentOS via JWT middleware with per-endpoint authorization and per-resource scopes. This brings consistent, least-privilege enforcement across Agents, Teams, and Workflows, reducing custom policy code and operational risk. Standardized scopes help security and platform teams implement clear policies, simplify reviews, and support multi-tenant deployments with confidence. The release includes predefined scopes, enforcement, tests, and examples to speed adoption.

Details

  • Per-endpoint authorization with scoped access to individual resources
  • Clear, reusable scopes reduce policy drift and review overhead
  • Backward compatible; adopting RBAC requires configuration

Who this is for: Platform and security teams, enterprise deployments, and organizations needing strong governance and least-privilege controls.

A new unified token counting utility provides consistent, accurate token estimates across OpenAI, Anthropic, AWS Bedrock, Google Gemini, and LiteLLM. We’ve also integrated token-based compression into Compression Manager to automatically fit content within model limits. Together, these changes simplify multi-model operations and help teams proactively control cost, latency, and throughput.

Details

  • Single API for cross-provider token accounting improves planning and governance
  • Token-aware compression prioritizes relevant context to meet target budgets
  • Reduces prompt overruns and tail latency caused by context overflow
  • Backward-compatible; no required action to upgrade

Who this is for: Platform teams orchestrating multi-model workloads, cost-sensitive deployments, and applications that must meet strict SLAs.
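
The budget-fitting idea can be sketched independently of any provider; `count_tokens` below is a crude whitespace stand-in for the unified counting utility, and real compression would summarize rather than drop.

```python
def fit_to_budget(messages, count_tokens, budget):
    """Illustrative token-aware trimming: keep the most recent messages
    whole and drop the oldest until the running total fits the budget."""
    kept, total = [], 0
    for message in reversed(messages):
        tokens = count_tokens(message)
        if total + tokens > budget:
            break
        kept.append(message)
        total += tokens
    return list(reversed(kept))

count = lambda text: len(text.split())   # toy tokenizer
history = ["first long message here", "short one", "final reply"]
print(fit_to_budget(history, count, budget=5))  # ['short one', 'final reply']
```

The point of a single cross-provider counter is that the same budget logic works whether the target model is OpenAI, Anthropic, Bedrock, Gemini, or routed via LiteLLM.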

We now populate provider metadata for OpenAI Chat responses and surface it across key response and event objects. Completion ID, system fingerprint, and other model-specific fields are included on ModelResponse/Message and emitted in RunOutput and RunCompletedEvent. This gives teams reliable identifiers to correlate with provider logs and invoices, streamlining debugging, cost analysis, and auditability — without disrupting existing workflows.

Details

  • Access completion_id, system_fingerprint, and model_extra from response.provider_data or event payloads
  • Available in ModelResponse/Message, RunOutput, and RunCompletedEvent
  • Backward compatible and additive; no migration required

Who this is for: Platform, MLOps, and application teams that need faster root-cause analysis, precise cost attribution, and improved observability in production.

To make runs more predictable, the stream and stream_events flags no longer persist across run/arun calls. This eliminates hidden state between invocations and ensures teams explicitly control streaming behavior per execution, improving reproducibility in development and production.

Details

  • Set streaming flags on each run/arun to opt in per execution
  • Reduces surprises and aligns runs across services and environments

Who this is for: Platform owners and teams standardizing run behavior in production pipelines and multi-service deployments.

Streaming experiences using Gemini now accept URL context and web_search_queries, enabling real-time retrieval and reasoning over live web content. This removes prior limitations in streaming flows, improving answer quality for research, summarization, and monitoring scenarios — without requiring any migration.

Details

  • Provide URLs and suggested search queries during streaming for richer, in-flow context
  • Improve response relevance in assistants that reason over current web data

Who this is for: Teams building real-time assistants, research tools, or monitoring workflows on Google Gemini.

We introduced a Shopify toolkit that lets agents analyze store data such as sales, customers, and products without custom integration work. This reduces time-to-value for commerce analytics and reporting, and provides a clear path from prototype to production via a cookbook example.

Details

  • Standardized Tools interface to authenticate and query Shopify data
  • Plug into agents and workflows for automated reporting, alerts, and insights
  • Cookbook example to go from zero to actionable analytics quickly

Who this is for: Shopify developers, data teams, and commerce platforms building analytics, automation, or customer operations on Shopify.

Knowledge add_content_ methods now support true synchronous execution. This removes the async-only limitation, making it straightforward to integrate content ingestion into synchronous services and batch jobs without event loop management or architectural workarounds.

Details

  • Synchronous parity with existing async methods for consistent behavior
  • Drop-in for frameworks and environments that don’t use async
  • No migration steps required

Who this is for: Backend teams building on synchronous frameworks and data pipelines that need reliable, easy-to-use Knowledge ingestion.

Agno now supports reasoning messages from OpenRouter, enabling you to capture and act on models’ reasoning outputs where available. This provides greater transparency for debugging, evaluation, and governance, and expands the set of model capabilities you can use without changing your integration approach.

Details

  • Ingest reasoning messages alongside standard outputs for improved traceability
  • Works with existing routing, logging, and evaluation workflows
  • No migration required; enable where OpenRouter models support reasoning

Who this is for: Teams adopting OpenRouter models that expose reasoning signals and need better observability and evaluation fidelity.

A new built-in evaluation system lets you automate LLM quality checks with binary and numeric scoring, background execution, post-hooks, and customizable evaluator agents. This makes it easier to standardize evals, gate releases, and compare models — without bolting on external systems.

Details

  • Run evaluations in the background to keep pipelines responsive
  • Use post-hooks to persist metrics, trigger alerts, or update dashboards
  • Create custom evaluator agents to encode domain-specific criteria

Who this is for: AI platform teams, ML engineers, and QA leads who need consistent, auditable evaluation workflows at scale.
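
The scoring-plus-post-hook loop can be sketched as follows; `judge` stands in for a customizable evaluator agent and all names are hypothetical, not the built-in system's API.

```python
def run_eval(outputs, judge, post_hooks=()):
    """Illustrative evaluator loop: score each output (binary or numeric),
    then invoke post-hooks with the collected results."""
    results = [{"output": o, "score": judge(o)} for o in outputs]
    for hook in post_hooks:
        hook(results)          # e.g. persist metrics, trigger alerts
    return results

# Binary judge: pass if the answer mentions a required term.
judge = lambda text: 1 if "refund" in text else 0
collected = []
results = run_eval(
    ["We issued a refund.", "Please wait."],
    judge,
    post_hooks=[lambda r: collected.append(sum(x["score"] for x in r))],
)
print(results)
```

Swapping `judge` for a model-backed evaluator and the hook for a dashboard writer gives the release-gating shape the feature describes.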

We’ve added AsyncMySQLDb with native compatibility for the asyncmy driver, enabling fully asynchronous MySQL operations. This unlocks higher concurrency, better throughput, and lower latency for agent and workflow backends that depend on MySQL. Built-in tracing support and cookbook examples reduce integration time and improve observability from day one.

Details

  • Non-blocking I/O with asyncmy for scalable, event-driven architectures
  • Integrated tracing hooks for end-to-end visibility and troubleshooting
  • Cookbook examples to shorten time-to-value and standardize adoption

Who this is for: Teams running high-throughput agents, streaming pipelines, or workflow services that need async database performance and robust observability.

MemoriTools has been removed in favor of Memori SDK v3’s built-in auto-recording. This consolidates functionality in the SDK, reduces integration complexity, and lowers maintenance overhead. To avoid breakage, remove MemoriTools from your code and rely on the SDK for conversation recording.

Details

  • MemoriTools is no longer supported; SDK v3 provides automatic recording
  • Action required: remove MemoriTools imports/usages and update your flows to SDK v3
  • Outcome: a simpler, more reliable integration path with fewer components to manage

Who this is for: Teams adopting or maintaining Memori-based conversation storage who want a supported, lower-friction integration.

Agno now ships with Memori SDK v3.0.5, enabling automatic recording of agent conversations without a separate tool. This simplifies integration, reduces setup time, and ensures a consistent audit trail out of the box. If you previously used MemoriTools, you can remove it — SDK v3 handles recording automatically.

Details

  • Zero-config conversation capture for Agno agents
  • Fewer moving parts and dependencies to maintain
  • Action recommended: upgrade to SDK v3.0.5 to adopt auto-recording

Who this is for: Teams standardizing on conversation archiving for compliance, analytics, or customer support quality.

We’ve moved retry logic from Agents/Teams to the Model layer. When you set retries on a model, Agno now retries at the model execution level, which is more effective for handling provider throttling and transient errors. Agent/Team retries now apply only to run-level exceptions. This change reduces wasted cycles, makes behavior more predictable, and improves throughput under rate limits.

Details

  • Configure retries on the Model to handle LLM/provider errors directly
  • Agent/Team retries now cover orchestration-level failures only
  • Action required: move any Agent/Team retry settings to the associated Model

Who this is for: Teams running production workloads at scale who need consistent behavior and better resilience under variable provider limits.
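
The difference between run-level and model-level retries is easiest to see in a sketch: the wrapper below retries only the provider call, with exponential backoff, rather than re-running the whole agent. `TimeoutError` stands in for provider throttling; this is an illustration of the pattern, not Agno's retry implementation.

```python
import time

def with_model_retries(call, retries=3, base_delay=0.01):
    """Retry the model call itself on transient errors, backing off
    exponentially; re-raise once the budget is exhausted."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)

state = {"calls": 0}
def flaky_model():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("throttled")
    return "completion"

print(with_model_retries(flaky_model))  # "completion" on the third attempt
```

Retrying at this level means orchestration state (tool results, memory updates) is not redone on a throttled request, which is where the throughput gain comes from.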

AgentOS evaluation endpoints now work with asynchronous database backends. Teams using async DB classes can run evaluations without changing their stack, removing a key limitation for modern, event-driven deployments.

Details

  • Evals run as expected with async database drivers, improving parity across environments
  • No configuration changes or migration steps required

Who this is for: Engineering teams standardizing on asynchronous databases who need reliable, automated evaluation workflows.

RunRequirement simplifies how agents request and manage human input. Requirements now surface directly in agent responses or as RunPaused events in streaming flows, providing a consistent pattern for approvals, confirmations, and other human checkpoints. This reduces implementation effort today and lays the groundwork for richer triggers and orchestration in the future.

Details

  • Unified model for HITL across synchronous and streaming executions
  • Less glue code and fewer edge cases to handle in application logic
  • No action required to benefit from the new model

Who this is for: Teams implementing approvals, compliance gates, or manual reviews in agent-driven workflows.

We introduced RedshiftTools, giving agents first-class access to Amazon Redshift without custom glue. Teams can explore schemas, describe tables, inspect and run queries, and export data directly through a consistent tool interface. The toolkit supports both standard credential-based auth and IAM-based authentication (via explicit credentials or AWS profiles), aligning with enterprise security practices.

Details

  • Speed up prototyping and operations by eliminating one-off scripts and SDK wiring
  • Standardize Redshift access patterns across agents and workflows
  • Reduce integration risk with built-in support for IAM authentication

Who this is for: Data and platform teams building agents or workflows that need secure, governed access to Redshift.

We’ve introduced a Spotify toolkit and example agent to manage and interact with Spotify, including library management. This addition reduces custom API work and speeds up delivery of music features in assistants, automations, and internal tools. Teams can quickly prototype, then productionize common Spotify workflows without building from scratch.

Details

  • Prebuilt capabilities for common library operations to minimize integration effort
  • Example agent demonstrates end-to-end usage for faster adoption
  • Compatible with existing Agno agents and workflows; no migration required

Who this is for: Product and platform teams building Spotify-powered assistants, content curation tools, or media automations.

To prevent CreateTable validation errors and ensure reliable, time-ordered queries, the DynamoDB schema for the user Memory table now requires a global secondary index (GSI) on created_at. Deployments using DynamoDB must add this index or recreate the table using the updated schema.

Details

  • Action required for DynamoDB users: add the created_at GSI or reprovision the table.
  • Eliminates schema validation failures and improves query performance.
  • No changes required for other storage backends.

Who this is for: Teams running Agno with AWS DynamoDB for Memory storage.
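
For teams provisioning the table themselves, the required index looks roughly like the definition below. The table, attribute, and index names here are assumptions for illustration; confirm the exact schema Agno expects before reprovisioning.

```python
# Hedged sketch of a create_table argument dict with the created_at GSI,
# in the shape boto3's DynamoDB client accepts.
memory_table = {
    "TableName": "agno_memory",
    "BillingMode": "PAY_PER_REQUEST",
    "AttributeDefinitions": [
        {"AttributeName": "memory_id", "AttributeType": "S"},
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "N"},
    ],
    "KeySchema": [{"AttributeName": "memory_id", "KeyType": "HASH"}],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "created_at-index",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}
# Applied as: boto3.client("dynamodb").create_table(**memory_table)
```

Keying the index range on `created_at` is what makes time-ordered queries efficient without a full table scan.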

We introduced native tracing with OpenTelemetry, including first-class spans and new endpoints to inspect traces. Spans are stored in your configured database, giving you immediate, consistent observability without additional instrumentation. This improves debugging, performance analysis, and compliance with reliability objectives.

Details

  • New endpoints: /traces, /traces/<trace_id>, /traces/<trace_id>?<span_id>
  • Works with Agno-supported storage backends; optionally configure exporters to forward data to your observability stack.
  • Speeds up root-cause analysis and shortens time-to-resolution.

Who this is for: Platform, SRE, and ops teams that need standardized, low-friction tracing across agents and tools.

Agent and Team pre- and post-hooks now run as background tasks in AgentOS, so they no longer block the main operation. This reduces end-to-end latency and increases throughput, especially under concurrent load. Teams should ensure hooks are idempotent and not dependent on synchronous completion.

Details

  • Hooks execute concurrently and may complete after the primary request returns.
  • Move any required synchronous logic into the main flow; treat hooks as asynchronous side effects.
  • Expect reduced wait times and better parallelism in high-throughput environments.

Who this is for: Teams scaling agent workloads, multi-tenant platforms, and latency-sensitive use cases.

Runs now support an optional citations field across single, team, and workflow executions. This lets you store and surface model-provided source citations directly in your run metadata, improving traceability, auditability, and user trust without changing existing integrations. The field is non-breaking and can be adopted incrementally to power features like “show your work,” compliance review, and knowledge attribution.

Details

  • Available in RunSchema, TeamRunSchema, and WorkflowRunSchema responses.
  • Optional and backward-compatible; no migrations required.

Who this is for: Teams building user-facing experiences that require explainability, or organizations in regulated environments that need evidence of sources and decision trails.

We added a complete Gemini 3 demo, including example agents, configuration, and generated assets. This makes it faster to evaluate and roll out Gemini 3 within Agno by providing opinionated, runnable patterns you can copy, adapt, and deploy. Teams can stand up proofs of concept in minutes and standardize on a repeatable setup, reducing integration effort and risk.

Details

  • Preconfigured agents and sample assets showcase best practices for orchestration and evaluation.
  • Works out of the box; no changes required to existing projects.

Who this is for: Platform teams and developers evaluating Gemini 3 or scaling multi-model strategies with minimal setup time.

MongoDB clients now support Motor and PyMongo async libraries with improved error handling and typing. This enables non-blocking storage operations, better concurrency, and lower latency in async-first applications.

Details

  • Drop-in async clients for faster, more scalable data operations
  • Enhanced typing and error handling improve reliability and observability
  • Reduces custom glue code for Python async stacks

Who this is for: Teams running high-QPS, async Python services that depend on MongoDB.

We’ve added an optional API key path for AWS Bedrock Claude in addition to IAM. This reduces setup friction in environments where IAM is not feasible while preserving IAM as the default for production.

Details

  • Use AWS_BEDROCK_API_KEY as an alternative authentication method
  • No changes required for existing IAM-based configurations
  • Simplifies local development, cross-account, and restricted-policy scenarios

Who this is for: Enterprises with constrained IAM policies or teams needing rapid prototyping paths.

You can now override output_schema at runtime for both Agent and Team (streaming and non-streaming), with automatic restoration after the run. This enables per-request structured output variations without cloning agents or adding conditional boilerplate.

Details

  • Change the expected output format for a single run; state is restored automatically
  • Works for streaming and batch runs to support diverse downstream consumers
  • Simplifies A/B testing, multi-tenant formats, and evolving contract needs

Who this is for: Platform teams orchestrating varied integrations and output contracts across services.
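
The restore-after-run behavior follows the familiar temporary-override pattern, sketched below with a generic context manager and a stand-in `Agent` class (not Agno's actual mechanism).

```python
from contextlib import contextmanager

@contextmanager
def override_attr(obj, name, value):
    """Swap an attribute for the duration of one call, then restore the
    original even if the call raises."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, original)

class Agent:
    output_schema = {"type": "string"}

agent = Agent()
with override_attr(agent, "output_schema", {"type": "object"}):
    assert agent.output_schema == {"type": "object"}   # per-run override
print(agent.output_schema)  # restored: {'type': 'string'}
```

The `finally` clause is what guarantees the agent's configured schema survives failed runs, so concurrent callers never observe a stale override after completion.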

Agno now offers full support for Google Gemini File Search, including store and document management, uploads/imports, metadata filters, citation extraction, and async APIs. This enables high-quality retrieval workflows with traceability and scale on the Gemini platform.

Details

  • Manage file stores and documents, including bulk uploads and async ingestion
  • Filter by metadata and extract citations for auditability and explainability
  • First-class integration to accelerate RAG and knowledge-heavy agents

Who this is for: Teams standardizing on Google’s AI stack and building retrieval-rich applications with compliance needs.

A new MemoryOptimizationStrategy framework and APIs allow you to summarize and optimize memories outside of agent runs. By decoupling memory maintenance from inference, you can keep context high-signal while reducing runtime tokens and improving decision quality at scale.

Details

  • Schedule or trigger memory compaction and summarization independently of agent runs
  • Keep knowledge current and concise to improve downstream model performance
  • Works without changes to agent logic; designed for scale and governance

Who this is for: Production teams with large or fast-growing memory stores seeking lower costs and tighter control.

Automatically compress and summarize tool call results to keep agent context safely within model token windows. This change reduces context overflow errors, stabilizes long-running workflows, and lowers token spend without requiring any application changes.

Details

  • Summarizes large tool outputs before adding them to conversation history
  • Improves reliability for tool-heavy agents and extended sessions
  • Reduces token usage while preserving relevant signal for downstream reasoning

Who this is for: Teams operating long-running or tool-intensive agents where reliability and cost control are priorities.
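
The guard logic can be sketched as a pass-through-or-summarize decision; `summarize` below is a toy stand-in for a model-backed summarizer, and the threshold would be token-based rather than character-based in practice.

```python
def compress_tool_result(result, limit, summarize):
    """Pass small tool outputs through untouched; summarize anything that
    would crowd out the rest of the context window."""
    if len(result) <= limit:
        return result
    return summarize(result)

summarize = lambda text: text[:20] + f"... [{len(text)} chars summarized]"
short = compress_tool_result("ok", limit=100, summarize=summarize)
long_ = compress_tool_result("x" * 500, limit=100, summarize=summarize)
print(short)
print(long_)
```

Because only oversized results are touched, small tool outputs keep full fidelity while the worst offenders no longer trigger context-overflow errors.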

We fixed an issue where filtering memories by topic could return incorrect results when using SQLite or AsyncSQLite backends. Topic-based queries now behave predictably, improving the accuracy of agents and workflows that rely on segmented memory retrieval. No action is required — existing implementations will benefit immediately after upgrading.

Details

  • Accurate topic filters for both SQLite and AsyncSQLite memory backends
  • Reduces debugging and unexpected agent responses caused by misclassified results
  • Improves determinism for evaluations, automation, and knowledge reuse

Who this is for: Teams using local SQLite storage for Memory, especially those organizing knowledge by topic for agents, offline/edge deployments, or deterministic test environments.

We added first-class support for Anthropic’s structured outputs, including schema enforcement, strict tool calling, and robust response parsing across synchronous, asynchronous, and streaming APIs. This delivers predictable, typed responses, cuts custom parsing and boilerplate, and improves reliability and governance for production workloads using Claude.

Details

  • Enforce JSON/object schemas to keep outputs consistent and machine-readable
  • Strict tools and end-to-end parsing reduce failure modes and post-processing
  • Streaming support preserves low latency while maintaining structure
  • Backward-compatible and additive; adopt incrementally

Who this is for: Teams standardizing on Claude, building workflow automations, or requiring dependable, typed outputs.

We introduced NanoBananaTools, a turnkey toolkit to generate images with Google’s Nano Banana model. It includes built-in parameter validation and a cookbook example, enabling faster adoption and fewer integration errors. Standardizing how you invoke the model within Agno reduces glue code and makes image features easier to operate and maintain.

Details

  • Ready-to-use tool wrappers with input validations to prevent malformed requests
  • Cookbook example accelerates first run and team onboarding
  • Designed to plug into your existing toolchain to minimize integration effort

Who this is for: Product and platform teams adding image generation or evaluating Google’s vision models.

We removed deprecated AgentOS parameters to standardize on stable naming: os_id -> id, fastapi_app -> base_app, enable_mcp -> enable_mcp_server, replace_routes -> on_route_conflict.

Details

  • Action required: rename parameters to the stable forms
  • Reduces ambiguity and future migration effort

Who this is for: Platform teams embedding AgentOS into services and APIs.
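
The renames above are mechanical, so a small helper can rewrite a keyword-argument dict in one pass. The mapping comes from this notice; the helper itself is a hypothetical convenience, not something shipped with Agno:

```python
# Old -> new AgentOS parameter names, per the deprecation notice above.
RENAMES = {
    "os_id": "id",
    "fastapi_app": "base_app",
    "enable_mcp": "enable_mcp_server",
    "replace_routes": "on_route_conflict",
}

def migrate_agentos_kwargs(kwargs: dict) -> dict:
    """Return a copy of AgentOS keyword arguments with deprecated names rewritten."""
    return {RENAMES.get(key, key): value for key, value in kwargs.items()}

old = {"os_id": "prod", "enable_mcp": True, "telemetry": False}
new = migrate_agentos_kwargs(old)
# → {"id": "prod", "enable_mcp_server": True, "telemetry": False}
```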

We removed get_messages_for_session and get_messages_from_last_n_runs in favor of get_messages, get_session_messages, and get_chat_history. This unifies patterns and reduces mental overhead.

Details

  • Action required: migrate to the new method names
  • Clearer contracts for history retrieval and observability

Who this is for: Teams managing conversation history, logging, or analytics.

When using knowledge_filters, you must configure contents_db. This ensures deterministic, stateless filtering aligned with AgentOS and prevents silent mismatches.

Details

  • Action required: provide a contents_db for any knowledge base that uses filters
  • Improves reliability and reproducibility of retrieval and filtering

Who this is for: Teams building RAG and knowledge-aware agents at scale.

We removed the stream_events parameter from print_response/aprint_response and the CLI. Streaming now works correctly by default, reducing configuration and edge cases.

Details

  • Action required: remove the parameter from calls
  • For fine-grained control, use run()/arun() instead of print helpers

Who this is for: Teams embedding CLIs or console output in developer workflows and demos.

GoogleSearchTools has been removed. Please migrate to DuckDuckGoTools for web search capabilities. This streamlines support and ensures predictable results across environments.

Details

  • Action required: replace imports and configuration with DuckDuckGoTools
  • No major changes to usage patterns or developer workflows

Who this is for: Applications that rely on web search during agent reasoning.

We renamed the Team parameter delegate_task_to_all_members to delegate_to_all_members. This clarifies intent and standardizes naming across the API.

Details

  • Action required: update parameter names in your code
  • No behavior change — only the parameter name has changed

Who this is for: Teams orchestrating multi-agent collaboration in production.

The default Nebius model endpoint is now api.tokenfactory.nebius.com. This aligns with Nebius' updated platform and helps you avoid legacy endpoints that may degrade or be deprecated.

Details

  • Default endpoint updated for new deployments
  • If you still use AI Studio, set base_url explicitly to maintain continuity

Who this is for: Teams using Nebius-hosted models across environments.

Agno now supports the thought signatures required by Gemini 3.0 Pro, ensuring compatibility and unlocking the latest model capabilities without custom integration work.

Details

  • Use Gemini 3.0 Pro models out of the box
  • No code changes beyond selecting supported models

Who this is for: Teams standardizing on Gemini for advanced reasoning or multimodal workloads.

RedisDb now accepts RedisCluster clients, enabling high availability and horizontal scalability for agent state. Teams running clustered Redis can adopt Agno storage without architectural workarounds.

Details

  • Plug in an existing Redis Cluster client; no schema changes required
  • Opt-in support for HA and shared deployments

Who this is for: Organizations with high-throughput or HA requirements for agent storage.

We introduced a MigrationManager and the first migrations for sessions and memories tables. This provides a controlled, repeatable path for schema evolution, reducing upgrade risk and operational overhead.

Details

  • Apply schema changes consistently across environments
  • Improves reliability for session and memory data at scale
  • Action required: run the migrations as part of your upgrade process

Who this is for: Platform teams managing self-hosted storage and lifecycle operations.

You can now generate WAV sound effects directly through ModelsLabTools. This adds an audio SFX modality to Agno, enabling teams to build sound-driven experiences — alerts, games, product interactions — without bespoke pipelines.

Details

  • Handles SFX-specific payloads and responses end-to-end
  • No migration required; select an SFX-capable model and generate WAV outputs

Who this is for: Teams building multimodal apps, interactive experiences, or automated notification systems.

We restored reliable live event streaming across Agents and Teams workflows, ensuring events from custom executor steps are delivered consistently. Functions that yield BaseModel are now handled correctly, and redundant completion checks were removed. The result is accurate, ordered streaming to UIs, logs, and monitors during long-running steps — without any changes required on your end.

Details

  • Streaming from custom executor steps now propagates to clients in real time
  • Support for generators that yield BaseModel ensures consistent serialization and display
  • Removes duplicate completion events to prevent premature stream termination and UI flapping

Who this is for: Teams building Agent/Team workflows that power interactive UIs, long-running automations, or observability pipelines.

The Slack interface now replies only when mentioned, reducing channel noise and improving operator control by default. This change helps teams run bots in busy channels without overwhelming users. If your workflows rely on the previous behavior (reply to all messages), you can retain it by setting the new parameter reply_to_mentions_only to False.

Details

  • New default: replies only to @mentions
  • Configurable with reply_to_mentions_only (set False to preserve prior behavior)
  • Reduces noise and improves signal in high-traffic channels

Who this is for: Teams running Slack-based assistants in large or shared channels who want stronger control over bot interactions and a cleaner user experience.
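
The new default boils down to a mention check before replying. This sketch assumes Slack's standard `<@USERID>` encoding for mentions in message text; the function is illustrative, not the interface's internals:

```python
import re

def should_reply(text: str, bot_user_id: str, reply_to_mentions_only: bool = True) -> bool:
    """Mirror the new Slack default: answer only when the bot is @mentioned.

    Slack encodes mentions as <@USERID> inside message text.
    """
    if not reply_to_mentions_only:
        return True
    return bool(re.search(rf"<@{re.escape(bot_user_id)}>", text))

should_reply("hey <@U123> can you summarize this?", "U123")            # → True
should_reply("general chatter in the channel", "U123")                 # → False
should_reply("general chatter", "U123", reply_to_mentions_only=False)  # → True
```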

We introduced a metadata-based filter DSL for Knowledge searches, enabling precise, policy-aligned retrieval at scale. You can now combine filters using EQ, IN, GT/LT, NOT, AND, and OR to target exactly the content you need. Initial support is available for PGVector-backed Knowledge bases. This reduces noise, improves result relevance, and helps enforce governance (for example, by segment, tenant, or timeframe) without complex post-processing.

Details

  • Compose boolean and relational filters to narrow results by metadata
  • Improves accuracy and reduces retrieval overhead in large Knowledge stores
  • Available with PGVector; pass knowledge_filters to opt in

Who this is for: Platform teams and builders managing large or multi-tenant Knowledge bases who need precise, metadata-aware retrieval and stronger guardrails.
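
The operators compose like a small boolean expression tree. A minimal evaluator over plain dicts shows the semantics; the tuple encoding is illustrative and differs from Agno's actual filter objects:

```python
def matches(metadata: dict, node) -> bool:
    """Evaluate an (op, *args) filter tree against a metadata dict.

    Operators mirror the DSL described above: EQ, IN, GT/LT, NOT, AND, OR.
    """
    op, *args = node
    if op == "EQ":
        key, value = args
        return metadata.get(key) == value
    if op == "IN":
        key, values = args
        return metadata.get(key) in values
    if op == "GT":
        key, value = args
        return metadata.get(key, float("-inf")) > value
    if op == "LT":
        key, value = args
        return metadata.get(key, float("inf")) < value
    if op == "NOT":
        return not matches(metadata, args[0])
    if op == "AND":
        return all(matches(metadata, child) for child in args)
    if op == "OR":
        return any(matches(metadata, child) for child in args)
    raise ValueError(f"unknown operator: {op}")

doc = {"tenant": "acme", "year": 2024, "tier": "gold"}
query = ("AND", ("EQ", "tenant", "acme"),
                ("OR", ("GT", "year", 2023), ("IN", "tier", ["gold", "platinum"])))
matches(doc, query)  # → True
```

Combining EQ on tenant with a boolean subtree, as above, is exactly the kind of policy-aligned narrowing (by segment, tenant, or timeframe) the DSL is meant for.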

Agents can now list, create, apply/remove, and delete custom Gmail labels. This enables end-to-end email triage, routing, and compliance workflows without external glue code, speeding deployment and simplifying ongoing maintenance.

Details

  • Full label lifecycle management through the agent interface
  • Consistent behavior across workflows for reliable automation
  • Additive and opt-in; no changes to existing email integrations

Who this is for: Operations, support, and sales teams automating inbox triage, SLA routing, or compliance tagging at scale.

You can now enable Anthropic beta features by passing the betas parameter. When present, Agno automatically uses the appropriate beta client, making it easier to evaluate new capabilities without changing application code or deployment architecture.

Details

  • Configuration-based opt-in for Anthropic betas
  • Streamlines experimentation and staged rollouts across environments
  • No impact on existing integrations when not used

Who this is for: Teams standardizing on Anthropic who want early access to new features with controlled, low-risk adoption.

Opt-in support for Claude’s context editing automatically removes stale tool results from long conversations. This keeps prompts lean, cuts token usage, and reduces context drift — improving cost, latency, and answer quality over extended sessions.

Details

  • Auto-clears outdated tool outputs to keep conversation history focused
  • Lowers token counts without manual prompt engineering or ad hoc pruning
  • Improves responsiveness and maintains accuracy in long-running chats
  • Optional and backward-compatible

Who this is for: Teams running long-lived assistants (support, ops, internal copilots) where conversation length and cost control are critical.
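
The effect can be approximated by pruning old tool messages from a history list. This sketch mirrors the intent of context editing — it is not Anthropic's mechanism, and the message shape is illustrative:

```python
def prune_stale_tool_results(messages: list[dict], keep_last: int = 1) -> list[dict]:
    """Drop all but the most recent tool results from a message history.

    Messages are dicts with a "role" key; only "tool" messages are pruned,
    so user and assistant turns are preserved.
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [m for i, m in enumerate(messages) if i not in stale]

history = [
    {"role": "user", "content": "look up A"},
    {"role": "tool", "content": "huge result A"},
    {"role": "user", "content": "now look up B"},
    {"role": "tool", "content": "huge result B"},
]
trimmed = prune_stale_tool_results(history, keep_last=1)
# keeps both user turns and only "huge result B"
```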

We introduced ParallelTools with Search and Extract APIs to deliver LLM-ready excerpts and robust markdown from the open web and PDFs — including JS-heavy pages. This removes the need for custom scrapers, accelerates onboarding, and improves the reliability of downstream RAG and automation workflows.

Details

  • Handles JavaScript-rendered sites and PDFs, returning clean, structured markdown
  • Produces concise, citation-friendly excerpts optimized for LLM consumption
  • Reduces integration and maintenance risk by eliminating bespoke scraping infrastructure
  • Fully opt-in with no changes required to existing workflows

Who this is for: Teams building research agents, RAG pipelines, due diligence, or monitoring workflows that depend on high-quality external content.

Custom executors can now yield native objects — no need to wrap every output as an Agno event. This fix removes a key limitation, making it easier to integrate existing business logic and libraries without extra boilerplate.

Details

  • Non-event objects are accepted and persisted correctly during workflow upserts
  • Reduces friction for custom runtime behaviors and accelerates prototyping
  • Expands compatibility with existing pipelines and data models

Who this is for: Teams extending workflows with custom executors or integrating external systems with minimal code changes.

We introduced yield_run_output for Agent and Team runs, replacing yield_run_response (slated for deprecation). This clarifies how outputs are streamed or collected, aligning behavior across orchestration layers and reducing ambiguity in run handling.

Details

  • Drop-in replacement in most cases; update code to use yield_run_output
  • Aligns Agent and Team APIs for consistent output control
  • Future-proofs integrations as we consolidate run output handling

Who this is for: Platform teams standardizing run behavior and developers building interactive agents or workers.

Run context now propagates automatically through every workflow step — including parallel branches. This delivers predictable, shared state across complex flows, reducing manual plumbing and avoiding subtle state drift. The result is simpler orchestration, fewer edge cases, and more reliable executions.

Details

  • First-class run_context is available in each step and executor, including parallelized paths
  • Simplifies branch/merge logic and improves auditability of multi-step runs
  • Recommended to update workflows to rely on run_context instead of ad-hoc state passing

Who this is for: Teams orchestrating multi-step or fan-out/fan-in automations that need consistent, observable state across steps.

Child agents now inherit only the primary model from their parent. Auxiliary models for output, parsing, or reasoning are no longer inherited, ensuring predictable configurations and reducing hidden coupling across teams of agents. If you relied on the previous behavior, explicitly set auxiliary models on the child.

Details

  • Primary model inheritance remains; auxiliary model inheritance is removed
  • Improves correctness and clarity in multi-agent team configurations
  • Review configs if you previously depended on implicit auxiliary inheritance

Who this is for: Teams composing agent hierarchies or reusable team templates that need reliable, isolated model configurations.

AgentOS now supports multiple Agno tables of the same type within a single database. This enables clean tenant or namespace isolation without multiplying databases, lowering operational overhead while improving data segmentation and governance.

Details

  • Create parallel tables (e.g., per tenant, region, or environment) under one DB id
  • No migration required for existing setups; opt in as needed
  • Improves isolation and compliance without additional infrastructure

Who this is for: Platform teams building multi-tenant or compliance-segmented deployments seeking better isolation with lower cost and complexity.

AG-UI request state is now passed and mapped into session_state, ensuring agents receive the right UI context with no extra plumbing. This reduces boilerplate, keeps state in sync across turns, and speeds up UI integration for agent-driven experiences.

Details

  • UI request context flows automatically into agent session state
  • Eliminates manual mapping code and reduces integration risk
  • Works with existing AG-UI integrations; no action required

Who this is for: Product and platform teams shipping agent UIs, assistants, and interactive experiences that depend on consistent state across steps.

We introduced a strict_output setting that enforces exact adherence to your output_schema by default across supported models. This delivers consistent, machine-parseable responses without extra validation layers, reducing integration effort and downstream failures. You can relax enforcement when needed by switching to guided mode.

Details

  • Strict mode is on by default to return outputs that match your schema exactly
  • Guided mode offers more flexibility when strictness isn’t required
  • Fewer retries, simpler validators, and more reliable pipelines

Who this is for: Teams building structured workflows, data pipelines, or API integrations that require deterministic, schema-compliant outputs.
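
The strict-versus-guided distinction can be sketched with a toy validator over `{field: type}` schemas. All names here are illustrative — Agno's strict_output operates on real output schemas across supported models:

```python
def validate_output(data: dict, schema: dict, strict: bool = True) -> dict:
    """Check a model response against a {field: type} schema.

    strict=True rejects missing or extra fields (exact adherence);
    strict=False keeps the guided spirit: known fields pass, extras are dropped.
    """
    if strict:
        if set(data) != set(schema):
            raise ValueError(f"fields {set(data) ^ set(schema)} do not match schema")
        checked = data
    else:
        checked = {k: v for k, v in data.items() if k in schema}
    for field, expected in schema.items():
        if field in checked and not isinstance(checked[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return checked

schema = {"title": str, "score": int}
validate_output({"title": "ok", "score": 7}, schema)                            # passes
validate_output({"title": "ok", "score": 7, "extra": 1}, schema, strict=False)  # drops "extra"
```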

We removed Redis and redisvl as dependencies of the base VectorDb class. VectorDb can now be used without installing or configuring Redis, reducing setup time and expanding where Agno can run (for example, serverless, constrained, or air-gapped environments). Teams adopt only the vector store they need, improving portability and lowering maintenance.

Details

  • Redis support remains optional; add it only when your use case requires it.
  • No breaking changes expected for existing Redis users.
  • Smaller install footprint and fewer transitive dependencies.

Who this is for: Platform teams standardizing on non-Redis vector stores, organizations with strict dependency policies, and developers optimizing CI/CD and container images.

To comply with Exa API v2.0.0, ExaTools has removed support for the highlights parameter. Calls that include it will fail. Update your integrations to avoid errors and maintain service compatibility.

Details

  • Remove the highlights argument wherever ExaTools is invoked
  • No other behavior changes; this is an upstream alignment
  • Ensures stable, future-proof use of Exa-powered search

Who this is for: Engineering teams integrating ExaTools in production or CI paths.

We introduced a migration script to move existing VectorDB data to the v2 format. This protects compatibility, unlocks improvements in the new version, and reduces risk during upgrades.

Details

  • One-time migration for existing datasets
  • Designed to preserve data integrity and minimize downtime
  • Action required for existing stores before adopting v2 features

Who this is for: Any team with existing Knowledge/VectorDB data preparing to upgrade.

MCPTools now supports a tool_name_prefix to avoid name collisions when sourcing tools from multiple MCP servers. This ensures predictable resolution, safer consolidation, and cleaner governance in larger deployments.

Details

  • Per-server namespacing to guarantee unique tool identities
  • Backward-compatible; no change needed for single-server setups
  • Recommended to set tool_name_prefix when using multiple servers

Who this is for: Teams orchestrating tools from multiple MCP servers or integrating third-party MCP sources.
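
Prefixing is a straightforward namespacing step. A sketch of the idea, with illustrative names rather than MCPTools internals:

```python
def namespace_tools(tools_by_server: dict[str, list[str]]) -> dict[str, str]:
    """Prefix each tool name with its server's prefix so identical names can't collide.

    Returns {prefixed_name: original_name}; raises if a collision survives anyway.
    """
    resolved: dict[str, str] = {}
    for prefix, tools in tools_by_server.items():
        for name in tools:
            qualified = f"{prefix}_{name}"
            if qualified in resolved:
                raise ValueError(f"duplicate tool name: {qualified}")
            resolved[qualified] = name
    return resolved

servers = {"github": ["search", "create_issue"], "jira": ["search"]}
namespace_tools(servers)
# → {"github_search": "search", "github_create_issue": "create_issue", "jira_search": "search"}
```

Both servers expose a `search` tool above, yet resolution stays unambiguous — the scenario tool_name_prefix is designed for.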

We added a vLLM embedder with batching support, enabling high-throughput, cost-controlled embeddings on your infrastructure or via remote endpoints. This gives you more deployment flexibility without changing your application code.

Details

  • Batch encoding for better throughput and lower cost per token
  • Works with local GPU deployments or managed vLLM services
  • Additive capability — simply select vLLM as your embedder

Who this is for: Teams requiring performance, data residency, or cost control for embeddings at scale.
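
Batching amounts to chunking the input before each encode call. A sketch with a dummy embedder standing in for a real vLLM endpoint (names here are illustrative):

```python
def embed_in_batches(texts, embed_batch, batch_size=32):
    """Encode texts in fixed-size batches, as a batched embedder would.

    `embed_batch` maps a list of strings to a list of vectors; batching keeps
    per-request payloads bounded and lets the backend amortize work.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[start:start + batch_size]))
    return vectors

dummy_embed = lambda batch: [[float(len(t))] for t in batch]  # stand-in "model"
embed_in_batches(["a", "bb", "ccc"], dummy_embed, batch_size=2)
# → [[1.0], [2.0], [3.0]]
```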

Knowledge now supports a Redis VectorDB backend. Teams standardized on Redis can consolidate infrastructure, simplify operations, and reduce latency by keeping vector search close to existing caches and data services.

Details

  • Additive option — no changes required unless you choose Redis
  • Includes examples and tests to accelerate adoption
  • Enables consistent ops and cost control in Redis-centric stacks

Who this is for: Organizations running Redis at scale that want to unify data services and minimize operational overhead.

We introduced RunContext to carry session state, dependencies, metadata, and knowledge filters through every step of a run. This creates a single, predictable way to pass context across tools, hooks, and functions — reducing boilerplate, cutting integration mistakes, and improving consistency across workflows.

Details

  • One parameter to propagate context end-to-end, simplifying code and reviews
  • Fewer wiring errors and easier debugging with a unified mental model
  • Partial adoption required: add a run_context parameter where applicable

Who this is for: Platform teams and developers standardizing orchestration patterns across complex agent workflows.

FileTools now supports delete, chunked read, and partial replace operations, with new size limits and base_dir disclosure controls. This lets teams handle large files efficiently while enforcing stricter boundaries to reduce data leakage risks.

Details

  • Chunked reads improve performance and memory usage for large files
  • Partial replace enables targeted edits without full rewrites
  • Configurable size limits and directory scoping enforce policy and reduce risk
  • Additive update; existing workflows continue to work unchanged

Who this is for: Platform and operations teams processing files within agents and workflows at scale.
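
Chunked reading keeps memory bounded regardless of file size. A plain-Python sketch of the pattern (not FileTools' actual interface):

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=65536):
    """Yield a file's contents in fixed-size chunks to bound memory usage."""
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("abcdefghij")
chunks = list(read_in_chunks(tmp.name, chunk_size=4))
os.unlink(tmp.name)
# chunks == ["abcd", "efgh", "ij"]
```

The same bounded-window idea underlies partial replace: operate on a targeted span instead of rewriting the whole file.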

A new Notion toolkit and cookbook make it easy to connect agents to Notion. Teams can read, create, and update Notion pages and databases programmatically, reducing integration effort and speeding up delivery of content and knowledge workflows.

Details

  • Prebuilt actions for common Notion operations
  • Cookbook examples to ship production workflows faster
  • Streamlined setup to reduce integration overhead

Who this is for: Content operations, customer support, and internal knowledge management teams working in Notion.

Agno now validates input schemas for Agents and Teams, enforcing strong contracts at the edge. This catches issues earlier, improves reliability in production, and shortens debugging cycles. Teams get clearer error messages and safer deployments without changing how they build.

Details

  • Define input schemas once; validation runs on every invocation
  • Clear, actionable errors accelerate troubleshooting
  • Backward compatible with existing agents and workflows

Who this is for: Platform owners standardizing multi-agent systems and teams operating governed workflows at scale.

We introduced WorkflowAgent, which powers chat-like workflows that decide when to answer from conversation history and when to execute workflow steps. This reduces latency and cost for routine questions while still enabling full automation when needed. Teams can deliver more natural conversational experiences without restructuring existing workflows.

Details

  • Automatically chooses between responding from history and orchestrating workflow steps
  • Preserves context across turns for consistent answers
  • Works with existing tools and workflows; no migration required

Who this is for: Product and platform teams building assistants, support workflows, and operations bots.

Updating AgentOS now automatically refreshes all API routers — including custom ones — so your routes stay aligned with the new OS state without manual intervention. This removes a prior limitation where only built-in routers were refreshed, reducing operational risk and preventing routing drift after upgrades.

Details

  • No code changes or restarts required after an AgentOS resync; custom routes rebind automatically.
  • Prevents stale endpoints and routing errors, improving reliability and time-to-recover during OS updates.
  • Consistent behavior across environments streamlines release processes and rollbacks.

Who this is for: Platform teams and developers who extend Agno with custom API routes or manage controlled OS upgrades across environments.

AsyncPostgresDb now accepts id as the primary identifier, with db_id deprecated. This aligns naming with the broader platform and eliminates parsing edge cases, improving reliability and reducing integration friction.

Details

  • Use id going forward; db_id remains temporarily supported for backward compatibility
  • Update configuration at your next maintenance window to avoid future breakage
  • Reduces ambiguity in connection and resource identification

Who this is for: Teams using Agno’s async Postgres storage for state, memory, or workflow data.

Agent-facing memory managers now include the deletion tool by default. This reduces setup friction and makes it easier to enforce data retention policies, remove outdated information, and keep long-running agents accurate and compliant.

Details

  • Deletion is available out of the box; no extra configuration required
  • Supports better governance for PII and sensitive content
  • If you previously relied on stricter retention by omission, review your defaults

Who this is for: Teams prioritizing data governance, compliance, and the accuracy of long-running agent workloads.

You can now update a live AgentOS instance — adding Agents, Teams, and Workflows — inside FastAPI lifespan functions. This enables safe, zero-downtime configuration changes at application startup/shutdown boundaries, improving deployment velocity and reducing operational risk.

Details

  • Apply structural changes to your multi-agent system without full restarts or manual orchestration
  • Standardizes change management in a predictable, framework-native lifecycle block
  • Lowers maintenance effort for services that evolve frequently

Who this is for: Teams embedding AgentOS in FastAPI services that need to scale or iterate agent topologies without service disruption.

We introduced AsyncMongoDb to provide fully asynchronous MongoDB access end to end. Teams can process more requests concurrently, reduce I/O bottlenecks, and improve responsiveness in agent interactions and workflow runs. This removes the prior sync limitation while remaining fully backward compatible.

Details

  • End-to-end async reads and writes to MongoDB with non-blocking I/O for higher throughput under load
  • Works across agents, teams, and workflows; adopt incrementally alongside existing sync integrations
  • No migration required; opt in where your services are async to simplify architecture and resource usage

Who this is for: Organizations running high-concurrency agent systems, event-driven workflows, or chat automation; engineering teams standardizing on async Python stacks.

New async APIs (aget_session_summary()) let Agents and Teams return session summaries without blocking. This aligns with event-driven and streaming architectures, reducing UI latency and improving responsiveness.

Details

  • Async parity with existing summary APIs
  • Safe to call during long-running operations and streams
  • No migration required

Who this is for: Teams building async services, dashboards, or interactive experiences that require live summaries.

Agents and Teams can now share and reuse the same session to preserve context and history across handoffs. This unlocks smoother multi-agent coordination, better auditability, and less custom plumbing for continuity.

Details

  • Share session state across agents and teams for seamless handoffs
  • Improves traceability for compliance and postmortems
  • Backwards compatible and easy to adopt

Who this is for: Orchestration teams building multi-agent workflows and escalations.

We’ve integrated TavilyReader for knowledge base ingestion and enhanced Tavily tools with URL content extraction (sync and async). This streamlines building and maintaining ingestion pipelines for retrieval-augmented generation.

Details

  • Ingest web pages into Knowledge bases with fewer moving parts
  • Supports sync and async workflows for flexible throughput
  • Reduces custom parsing and ETL glue code

Who this is for: Teams building RAG systems, knowledge bases, and automated research pipelines.

Agents can now leverage Claude’s native skills for documents, spreadsheets, and presentations. This expands what agents can execute natively — from editing and analysis to content generation — without custom tooling.

Details

  • Invoke skills to edit docs, analyze spreadsheets, and build presentations
  • Enhances agent capability coverage with no breaking changes
  • Complements existing toolchains and governance controls

Who this is for: Teams automating knowledge work across operations, research, and product workflows.

Introducing Async SqliteDb for end-to-end async database access. This enables higher throughput and lower tail latency in asyncio-based services by removing blocking I/O, while reducing boilerplate and integration effort.

Details

  • Full async reads, writes, and transactions for SQLite
  • Drop-in for async apps and serverless runtimes
  • Examples included; no migration required for existing sync usage

Who this is for: Teams running agents in async web services, workers, or streaming pipelines.

We’ve added native caching for model responses across sync, async, and streaming APIs. You can configure TTL and cache storage to accelerate repeated prompts and reduce token spend — without building your own cache layer. This is opt-in and works out of the box across the platform.

Details

  • Works across all API modes (sync/async/streaming)
  • Configurable TTL and cache directory for control and portability
  • Reduces latency for repeated prompts in production and evaluation pipelines

Who this is for: Teams optimizing cost, responsiveness, and throughput for high-volume or repetitive LLM workloads.
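
A TTL cache keyed by prompt captures the shape of the feature. This sketch is illustrative only — Agno's cache is configured on the model, with its own storage options:

```python
import time

class ResponseCache:
    """A minimal TTL cache keyed by prompt text."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, prompt: str):
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[prompt]  # expired: evict and miss
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (time.monotonic(), response)

cache = ResponseCache(ttl_seconds=60)
cache.put("What is Agno?", "An agent framework.")
cache.get("What is Agno?")   # → "An agent framework." (until the TTL expires)
cache.get("unseen prompt")   # → None
```

Repeated prompts within the TTL skip the model entirely, which is where the latency and token savings come from.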