Changelog_

Agno adds first-party integrations for four more providers, widening the set of models you can run agents on without leaving the framework. Inception Labs, Xiaomi's MiMo, and MiniMax (M2.7) join as direct model providers, and Cloudflare AI Gateway lands as a provider too, so you can route requests through your gateway and pick up its caching and observability instead of calling each model endpoint directly.

View the docs for more:

A new YouTools toolkit wires up the You.com Search API, so an agent can run web searches through You.com with no custom client to build. Drop the toolkit onto an agent and it gains search the same way it picks up any other Agno toolkit.

See the YouTools docs for setup.

Agno now supports DOCX file generation, so an agent can produce a finished .docx as output rather than handing back raw text for someone to format. Document-producing workflows can deliver the file your users actually want.

Context providers can now stream the events from their sub-agents, so a provider that runs its own agent surfaces that work as it happens instead of going quiet until the final result. You get visibility into intermediate steps for live progress and debugging.

The registry now supports knowledge and managers alongside the components it already tracks, so you can register and reuse those pieces through the same mechanism rather than wiring them up by hand each time.

See Registry docs for more.

Agents, teams, and workflows now persist cancelled runs properly, so a run that gets cancelled is recorded in the database instead of vanishing. Your run history and downstream tooling see the cancellation rather than a gap.

The RunCompleted event now carries a files field, so anything listening for run completion can grab the files a run produced directly off the event instead of fetching them separately.

The model string parser now recognizes the google-interactions provider, so you can select GeminiInteractions through a model string rather than importing and constructing the class yourself. It lines up with the shorthand the other providers already support.

Updated DeepSeek V4's thinking mode and default settings so agents on DeepSeek run against current, sensible defaults out of the box.

Post-hooks and observability integrations can now read the complete resolved approval record, including resolved_by and resolved_at, through run_response.metadata["approval"]. Earlier only status and resolution_data were exposed, so audit and notification logic had no way to see who resolved a run or when. Keeping the record in metadata means it reads the same way across RunOutput, TeamRunOutput, and the future WorkflowRunOutput.

PgVector(prefix_match=True) used to be a silent no-op: it appended a * and then routed through websearch_to_tsquery, which ignores wildcards. It now routes through to_tsquery with proper tokenization, so a partial query like "ani" full-text matches "animal" the way the docs always described. A new cookbook walks through the help-center typeahead use case it unlocks.

On the agent path for Antigravity and Deep Research, the autonomous loop runs its tools inside Google's server-managed sandbox. Agno used to surface those steps as local tool calls, which triggered Function <name> not found errors and follow-up 400 invalid_request failures. Those server-side steps are now skipped on the agent path, so managed agents run cleanly. The model path with your own declared tools is unchanged.

Claude on Anthropic, AWS, and VertexAI used to silently drop an explicit 0 for temperature, top_p, or top_k, since a bare truthiness check treated 0.0 as unset and fell back to the API default near 1.0. Agno now checks is not None, so setting any of these to 0 produces the deterministic output you asked for instead of quietly reverting to a random one.

Managed Deep Research and Antigravity

You can now run Google's two most capable managed agents, autonomous research and a code-running sandbox, without leaving the Gemini setup you already have.

GeminiInteractions reaches both of them through the same model class. You switch one on by setting the managed agent in place of a model ID, so there's no new client and no separate integration to wire up.

Agent 1: hand off the research, get back a cited report. Deep Research plans the task, browses the web, and returns a researched report with citations. It runs in the background, with streaming you can reconnect and resume after a dropped connection, so a long job survives a flaky network instead of starting over.

from agno.agent import Agent

from agno.models.google import GeminiInteractions

research = Agent(

model=GeminiInteractions(id="deep-research-preview-04-2026"),

)

research.print_response("Research solid-state battery commercialization.")

Agent 2: hand off the work, get back the artifacts. Antigravity is a general-purpose agent that plans, runs code, and produces artifacts inside a managed Linux sandbox, so it does the work rather than just describing it.

sandbox = Agent(

model=GeminiInteractions(id="antigravity-preview-05-2026"),

)

sandbox.print_response("Summarize the top 5 Hacker News stories as Markdown.")

Same model class, same interface you already use for Gemini. Swap the ID for a managed agent and you're running.

Managed-agent setup, including the config knobs for each, is in the Agno docs: https://docs.agno.com/models/providers/native/google/gemini-interactions

You can now give your agents a full code-running, web-browsing, file-editing sandbox without building or operating any of it.

Agno ships first-party support for Google's Antigravity agents in two shapes, so you get that power on your existing stack. Which one you pick comes down to a single question: Do you want Antigravity to be the agent, or to work for one?

Shape 1: run it as a full agent, on the surface you already operate.

AntigravityAgent serves straight through AgentOS, so a Google-managed sandbox agent gets native sessions, streaming, and UI without a separate stack to stand up or maintain.

from agno.agents.antigravity import AntigravityAgent
from agno.db.sqlite import SqliteDb
from agno.os import AgentOS

agent = AntigravityAgent(name="Antigravity")

agent_os = AgentOS(
    agents=[agent],
    db=SqliteDb(db_file="tmp/agentos.db"),
)

app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="antigravity_agent:app", reload=True)

Add a db and you get persistent workspaces for free. Each interaction provisions a sandbox and returns an environment ID, and AgentOS stores it so later turns in the same session land on the same sandbox. Files, installed packages, and state carry over, instead of every turn starting from scratch.

Shape 2: keep your agent, give it sandbox superpowers.

AntigravityTools lets any existing Agno agent delegate sandboxed work to Antigravity without being rebuilt as one. Your agent keeps its own model and offloads only the part that needs a sandbox.

from agno.agent import Agent 
from agno.models.google import Gemini
from agno.tools.antigravity import AntigravityTools

agent = Agent(
    name="Research Assistant",
    model=Gemini(id="gemini-2.5-pro"),
    tools=[AntigravityTools()],
    markdown=True,
    instructions=[
        "When a task benefits from a sandboxed Linux environment with web search "
        "and code execution, delegate it via run_antigravity_task.",
        "Otherwise answer directly.",
    ],
)

agent.print_response(
    "Use the sandbox to find the latest stable Python release and summarize what changed."
)

Now it can delegate code, search, and file work to a managed sandbox.

Same underlying API, two ways to get the benefit: one for when Antigravity is the worker, one for when it is the helper.

Docs for the agent: https://docs.agno.com/agent-os/multi-framework/antigravity

Docs for the toolkit: https://docs.agno.com/tools/toolkits/others/antigravity

ParallelMCPBackend now sends a User-Agent: agno/<version> header on every request, so Parallel can attribute the traffic your agents generate. Behavior is unchanged; the requests just carry clear provenance now.

EvalsDomainConfig drops its unused available_models field, leaving the top-level AgentOSConfig.available_models as the only supported source for the model dropdown in the Evals UI. With one place to set the list, the two can no longer drift out of sync.

Updated the Chonkie dependency pin as a follow-up to #7869, so installs resolve to the version Agno expects.

Renamed gemini-3-flash-preview to gemini-3.5-flash across the Gemini Interactions cookbooks, so copied examples run against the current model ID instead of the preview name.

A new GeminiInteractions model class builds on Google's stateful Interactions API, so agents can talk to the interactions endpoint directly instead of Gemini's generateContent. Rather than resending the full conversation on every turn, it stores prior turns server-side and references them by ID, so only the new message goes over the wire. That pulls down token cost and latency through implicit caching, and it opens the door to background execution for long-running work like Deep Research. The Agent class tracks the interaction ID for you, so multi-turn conversations just work.

Action required: Install google-genai>=2.0 to use it. The Interactions API is experimental and may change in future versions.

See the Gemini Interactions docs for the full capability set.

AgentOS now offers an opt-in per-user data isolation layer for authenticated endpoints, so one deployment keeps each user's data separated rather than pooling it together. Teams running multi-tenant setups can serve many users from a single AgentOS without standing up separate instances just to keep data apart. It stays off until you enable it, so existing deployments carry on unchanged.

See the RBAC docs for how it fits alongside scopes and authorization. View the Per-User Data Isolation docs for more on authorization.

URL-fetching knowledge readers now take an allowed_hosts parameter, so a reader pulls only from hosts you trust and rejects everything else. This closes the same SSRF and data-exfiltration surface during knowledge ingestion that any link-following fetcher opens in production.

See cookbook for more reference.

Qdrant's async_insert no longer calls the sparse encoder twice. Hybrid inserts now encode once, cutting wasted compute on every write to a Qdrant collection.

A child agent's spans no longer overwrite the parent trace's session_id, agent_id, or team_id when both share a trace_id. The fix shows up most clearly when a Team uses a background post-hook such as @hook(run_in_background=True), where the parent trace now holds onto its own identifiers instead of inheriting the child's.

The workflow HITL continue path now calls the async acleanup_run when it runs in an async context, rather than the synchronous version. Resuming a paused workflow behaves correctly under async execution instead of mismatching the running event loop.

Reviewers no longer have to chase pending approvals one at a time. The Slack interface now supports multi-row approvals with all pause types covered, so a reviewer can resolve several pending approvals in one place without leaving the channel. Confirmations, user-input prompts, structured feedback, and external-execution steps all surface as interactive TaskCards, and each rejection still collects a reason where one applies and passes it back to the agent so it can adjust.

Wiring it up is mostly one decorator and a db (paused runs persist and resume by run_id):

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS
from agno.os.interfaces.slack import Slack
from agno.tools import tool


@tool(requires_confirmation=True)
def deploy_service(name: str) -> str:
    """Deploy a service. The run pauses for approval before this runs."""
    return f"Deployed {name}."


# A paused run persists to the database and resumes by run_id once the
# reviewer acts, so the Slack interface needs a db.
db = SqliteDb(db_file="tmp/approvals.db", session_table="agent_sessions")

agent = Agent(
    name="Ops Agent",
    id="ops-agent",
    model=OpenAIResponses(id="gpt-5.4"),
    tools=[deploy_service],
    db=db,
)

agent_os = AgentOS(
    agents=[agent],
    interfaces=[Slack(agent=agent, reply_to_mentions_only=True)],
)
app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="approvals:app", port=7777)

See the Slack HITL docs for the full set of pause types and cookbooks.

WikiContextProvider now supports a NotionDatabaseBackend source, so you can point an agent's wiki at a Notion database and have it query that content directly rather than maintaining a separate store. Teams already keeping knowledge in Notion can put it to work as agent context without copying it anywhere first.

Registering two tools under the same name on an agent or team used to fail quietly, with one definition shadowing the other and no signal as to why a tool misbehaved. Agno now warns you the moment a duplicate tool name is registered, so you spot the collision while you are wiring things up instead of debugging it in production.

aget_last_run_output no longer returns None when agent.id is auto-generated during arun(). You get the run output back whether or not you set an explicit agent ID, so code that reads the last result behaves consistently across configured and auto-generated agents.

The /continue endpoint now forwards dependencies and metadata through get_request_kwargs, so a resumed run sees the same context as the original call. Continued runs behave like the runs they pick up from instead of dropping configuration partway through.

LearningMachine now injects its context into the Team system prompt, not just the agent path. Teams get the same learned context that individual agents already received, so their behavior reflects it as expected.

Agno now supports multimodal inputs in the Gemini File Search API, so agents can index and semantically search across images alongside text rather than text alone. This is available with google-genai≥1.75.0; older versions remain text-only.

Action required: Bump google-genai to 1.75.0 or later to use image inputs with Gemini file search. Existing text-only setups keep working without changes.

See the cookbook example for an image-upload walkthrough.

Email and calendar are two of the most-requested grounding sources, and wiring them up usually means custom API clients and token plumbing. Two new context providers, GmailContextProvider and CalendarContextProvider, remove that work: both follow Agno's existing provider patterns and expose each source as natural-language query tools, so an agent reads mail and schedule context through the same consistent interface as every other source, with no bespoke retrieval layer to maintain. Alongside them, GDriveContextProvider now supports OAuth in addition to service-account auth, so an agent can connect to a user's own Drive without a service account.

Learn more about the Gmail provider and Calendar provider, or see the Context Providers overview.

Fetch tools that follow links are an SSRF and data-exfiltration risk in production. LLMsTxtTools now takes an allowed_hosts parameter that closes that surface: an agent only fetches from hosts you explicitly trust, and requests to anything outside the list are rejected, so agents can use llms.txt indexes without being able to reach arbitrary URLs.

File handling is powerful but not always wanted, so SlackContextProvider now puts it behind an enable_media_tools flag that defaults to False. Existing Slack integrations are unaffected until you opt in; when you do, download_file is added to the read tools and upload_file to the write tools, keeping file capabilities split cleanly along the existing read/write boundary.

The AgentOS scheduler now supports Mongo and AsyncMongo as backing stores, so teams already running on Mongo can schedule recurring runs of agents, teams, and workflows without standing up a separate database or external job scheduler. Cron-based background work stays on the infrastructure you already operate.

Conditional branches often wrap fragile work like external calls or tool execution, and until now a failure inside one would propagate unhandled and halt the run. The Condition workflow step now takes an on_error parameter that gives you explicit control over what happens when a sub-step fails, so a workflow can recover or continue instead of failing outright.

Agno introduces WikiContextProvider, a context provider built specifically for wiki and knowledge-base content. It supports filesystem and git backends, so wikis can be loaded straight from a directory or pulled from a git-versioned source, and it can ingest content from the web as well. Read/write flags give teams explicit control over whether an agent is allowed to add new pages or only consume existing ones.

Why it matters: Internal wikis are usually the highest-signal source of organizational knowledge, including runbooks, architecture decisions, and onboarding guides, and they live in inconsistent places, from a docs folder to a git repo to a public site. WikiContextProvider gives agents a single, consistent way to read that content without bespoke ingestion code, and the read/write controls keep human-curated knowledge bases from being modified by accident. For research, support, and engineering agents, this collapses what used to be a custom integration into one configuration step.

Learn more:

A round of fixes makes Slack-backed agents more predictable in production. The interface now gracefully falls back to public channels when the groups:read scope is missing, rather than failing outright, so agents continue to operate under reduced OAuth permissions. Read-instruction overrides have been restored and briefing guidance has been tightened, giving teams more consistent agent behavior across read and write modes. Update operations are now exposed correctly in agent mode, with agent mode returning query-only results where appropriate.

Why it matters: Slack deployments often involve carefully scoped OAuth permissions, custom briefing instructions, and a mix of agent and assistant modes. Each of these fixes removes a sharp edge that previously required workarounds, so teams running Slack agents at scale get more reliable behavior without changing their configuration.

SlackContextProvider has been simplified to a single, self-documenting configuration surface. The for_bot_read(), for_assistant_search(), and for_write() factory methods have been removed in favor of explicit flags on the provider, and SlackTools construction is now inlined so the underlying tool exposes its own capabilities directly. A new opt-in enable_workspace_search parameter is also available for agents that need to search across the workspace.

Action required: If you previously instantiated SlackContextProvider through one of the factory methods, replace those calls with direct construction using the relevant flags. Code that already constructs the provider directly is unaffected.

Why it matters: Factory methods made it harder to see what an agent could actually do with Slack at a glance, and forced runtime mode-switching when configurations needed to combine read, search, and write. Explicit flags make capability composition obvious in the agent definition, simplify reasoning about least-privilege access, and remove a layer of indirection that was easy to misconfigure.

Check out the cookbook for more.

Agno introduces WorkspaceContextProvider, a context provider purpose-built for agents that operate inside a repository root. It's backed by the read-only Workspace toolkit rather than generic file tools, so reading a repository and acting on it stay cleanly separated by default. Exclusion patterns for noise like .context, .venvs, dependency caches, and build artifacts are centralized across FileTools and Workspace, and FilesystemContextProvider now accepts an exclude_patterns parameter for teams that want to opt out or customize the defaults explicitly.

Why it matters: Pointing an agent at a repository is one of the most common patterns in agentic software, and one of the most token-expensive when it pulls in lockfiles, virtualenvs, and build output. Out-of-the-box noise filtering reduces context size, cost, and latency without forcing teams to maintain their own ignore lists. The read-only backing also means the provider is safe to attach to research and analysis agents that should never modify the repo they're reading.

Learn more in the Context engineering docs or the Cookbook.

A new Workspace toolkit gives agents structured access to a configurable root directory, with operations grouped by capability and destructive actions gated by human-in-the-loop confirmation by default. Read, list, and search run freely. Write, edit, move, delete, and shell pause for explicit approval before they execute. The toolkit is scoped to the directory you pass at construction, bounding an agent's blast radius to the path you specify.

Filesystem and shell access unlock the most useful "agent-as-coworker" patterns, including code generation, document editing, and operational scripts that touch real systems. Shipping these capabilities behind HITL by default makes it safe to put an agent in front of real work and expand write privileges progressively as confidence grows. The confirmation policy is configurable per action, so teams can tighten or loosen oversight without rewriting the agent.

Here's a minimal example. Reads run silently; writes pause for approval:

from agno.agent import Agent
from agno.tools.workspace import Workspace

agent = Agent(
    model=...,
    tools=[Workspace("/path/to/workspace")],
)

run = agent.run("Read draft.md and fix the typo on the line about typos.")

# Reads execute immediately. The edit pauses for confirmation.
while run.is_paused:
    for requirement in run.active_requirements:
        if requirement.needs_confirmation:
            # Inspect requirement.tool_execution, then confirm or reject.
            requirement.confirm()
    run = agent.continue_run(run_id=run.run_id, requirements=run.requirements)


In AgentOS, pauses surface as approval cards in the run timeline. In a plain script, you drive the confirmation loop yourself, as shown above.

See the full cookbook example for the complete pattern, including how to wire up an interactive prompt.

Agno has updated the default model id used by several model providers to newer, actively supported versions. Agents that don't pin a specific model will now run on more current models, helping teams avoid upcoming provider deprecations, get more consistent performance, and in many cases lower inference cost.

Action required: If your application depends on specific model behavior, pin the version explicitly on the model class rather than relying on the default. For example, prefer OpenAIResponses(id="...") or Claude(id="...") over leaving id unset, so future default updates don't change your agent's behavior unexpectedly.

Why it matters: Provider model lifecycles move on the provider's timeline, not yours. Tracking defaults to actively supported models keeps existing Agno applications running smoothly through deprecations and avoids the operational scramble of a forced migration when an older model retires.

Agno now supports Anthropic's multi-block prompt caching for Claude models, giving teams granular control over what gets cached and for how long. You can define multiple system prompt blocks, each with its own cache setting and TTL of either 5 minutes or 1 hour, and opt in to caching tool definitions so the tool prefix is reused across requests. Tool serialization is also deterministic across Anthropic, OpenAI, Gemini, and Bedrock, so request prefixes stay stable from run to run and cached tokens actually hit.

Why it matters: Production agents with long system prompts or large tool catalogs can see meaningful reductions in inference cost and time-to-first-token without changing application logic. Caching is opt-in and configured at the model level, so existing agents are unaffected until you turn it on.

See Cookbook for reference.

WebContextProvider now ships with a Parallel backend, giving agents access to high-quality web search and page fetch through Parallel's hosted research service. The backend exposes both web_search and web_fetch with compressed markdown output, runs keyless by default for fast experimentation, and supports Bearer authentication or OAuth for higher rate limits and production workloads. Default timeouts are tuned for fetching larger pages, so long-running research calls complete reliably out of the box.

Why it matters: Adding web context to an agent traditionally meant wiring up a search API, building a fetcher, and managing rate limits and timeouts. The Parallel backend collapses that work into a single configuration option, so teams can stand up a production-ready research pipeline in minutes and focus on agent behavior instead of infrastructure.

Learn more in our Parallel MCP agent docs.

The openai: model prefix now resolves to OpenAIResponses rather than the legacy Chat Completions surface. New agents written as Agent(model="openai:...") automatically route through OpenAI's Responses API, giving teams access to its richer feature set, including built-in tools and improved streaming behavior, without code changes.

Action required: If your application depends on Chat Completions semantics, switch to the new openai-chat: prefix, for example Agent(model="openai-chat:gpt-4.1"). No action is needed for teams that already instantiate OpenAIChat or OpenAIResponses directly.

Why it matters: Aligning the default with the Responses API moves new agents onto OpenAI's actively developed surface, reduces friction when adopting newer capabilities, and makes the most capable behavior the path of least resistance for everyone building on Agno.

AgentOS now runs agents built with the Claude Agent SDK, LangGraph, and DSPy alongside native Agno agents, all through a unified AgentProtocol interface. Teams can standardize on one runtime, control plane, and observability layer without rewriting agents that already exist in other frameworks.

This turns AgentOS into a framework-agnostic platform. Engineering organizations can adopt Agno incrementally, bringing existing agent investments under a single production environment for sessions, tracing, scheduling, and role-based access control. It also reduces lock-in for teams evaluating multiple agent frameworks in parallel.

Available in beta. Native Agno agents remain fully supported with no changes required.

Learn more about Multi-Framework Support in our docs.

The new agno.context API lets agents reach into filesystems, web sources, SQL databases, Slack, Google Drive, and MCP servers as natural-language tools. What used to require custom integrations, retrieval pipelines, or bespoke tool wrappers now works through one first-party interface.

Context providers turn live data sources into queryable context for any agent, without forcing teams to build and maintain their own retrieval layer. Agents stay grounded in the actual systems your organization already runs, and platform owners get a consistent integration surface to govern and observe.

This shortens time-to-value for retrieval-heavy use cases and removes a recurring source of glue code from production agent stacks.

Browse all built-in context providers in our docs.

AgentFactory, TeamFactory, and WorkflowFactory let you create agents, teams, and workflows dynamically at runtime instead of defining them statically at startup. Each request can spin up its own configuration, drawing on per-tenant settings, runtime context, or user-specific permissions.

For platform teams running shared infrastructure across customers, departments, or business units, this removes a structural limitation. You no longer need a separate process or deployment to isolate configurations between tenants. One AgentOS instance can serve many distinct contexts with appropriate boundaries.

The factory pattern also makes A/B testing, gradual rollouts, and per-environment customization straightforward, since the agent definition is decided when the request arrives rather than baked into the deployment.

Learn more about Dynamic Agents in our docs.

Human-in-the-loop is now available for Teams, with full support in the AgentOS chat interface and a dedicated API layer. Operators can review, intervene in, and steer team-level decisions the same way they already can with single agents.

Multi-agent teams often produce more consequential output than individual agents, since they coordinate across roles to complete higher-stakes tasks. Adding HITL at the team level closes a governance gap for organizations deploying teams in customer-facing or regulated workflows.

This gives platform owners a consistent oversight model across single agents and teams, so review processes, escalation paths, and compliance controls work the same way regardless of how an agent system is structured.

Learn more about HITL for Teams in our docs.

Teams now support approval flows through both the API and the AgentOS chat interface. Sensitive actions can be paused for explicit human sign-off before they execute, giving operators a clear control point for high-impact operations.

Approvals work the same way they already do for single agents, so teams managing both can apply consistent governance policies across them. Engineering and compliance leaders can require human authorization for actions like financial transactions, data writes, customer communications, or any step that needs accountability before it ships.

This makes multi-agent teams safer to deploy in production environments where every action needs an audit trail and a responsible decision-maker on record.

Learn more in the Approvals docs.

Background runs streamed over Server-Sent Events can now reconnect and resume after a disconnection or page refresh. Operators rejoin the run exactly where they left off, with full context preserved.

Long-running agents and teams are common in production, particularly for research, analysis, and multi-step automation. Until now, a transient network drop or browser refresh meant losing the run or restarting from the beginning. The new behavior eliminates that failure mode, making AgentOS more reliable for the workflows users actually run on it.

For operators monitoring live agent activity, this also means fewer interrupted sessions and less wasted compute spent regenerating progress that was already complete.

Learn more in the Background Execution docs.

The /sessions endpoint returns agent, team, and workflow sessions in a single response by default. This gives a complete view of session activity in one call, which is the most common use case for operations dashboards, audit views, and platform monitoring.

To filter for a specific session type, pass ?type=agent, ?type=team, or ?type=workflow as a query parameter.

This is a breaking change. Integrations that previously depended on the endpoint returning only one session type should add the corresponding type filter to preserve their existing behavior. Update any custom dashboards, monitoring scripts, or downstream services that consume this endpoint before upgrading to v2.6.0.

We fixed an issue where custom db table names set on components were being overwritten with defaults when those components were loaded back from configuration. Custom table names are now preserved correctly through the full save and load cycle.

GitHubConfig now accepts a repository override at the request level, allowing agents that work across multiple repositories to specify the target repo per call rather than being locked to a single repo at initialization time.

See cookbook

A new option lets you turn off file citations in Claude responses. This is useful when citations add noise to the output, for example in conversational flows, summarization tasks, or any context where surfacing source references per response is unwanted.

We fixed an issue where headers supplied by header_provider were not being applied during MCP session initialization, only during subsequent requests. Sessions now open with the correct headers from the start, preventing authentication and routing failures on first contact.

We fixed an issue where knowledge databases were not being built live during configuration API calls, causing agents to operate without their knowledge base until a separate build step was triggered. Knowledge databases are now constructed inline as part of the configuration flow.

We fixed an issue where events emitted by inner workflows could lose their identity or be misattributed when bubbling up through outer workflows. Events now carry a nested_depth field on agent and team events, and inner workflow event identity is preserved throughout, making it straightforward to trace exactly where in a nested pipeline any event originated.

We fixed an issue where a shared HTTP/2 client was being injected across all model providers, causing connection conflicts and transient failures under concurrent load. Each provider now maintains its own client, eliminating the source of these errors across all providers simultaneously.

We fixed an issue where cancellation of a client connection during streaming could surface as an unhandled error rather than being handled quietly. CancelledError is now caught explicitly in all router streaming generators, so cancelled connections close gracefully without producing noise in logs or error handlers.

We fixed an issue where JSON cleaning was stripping or corrupting code blocks embedded in string values before the parse was even attempted. The parser now tries a raw JSON parse first and only falls back to cleaning if that fails, preserving code blocks and other structured content in the output as intended.

We fixed an issue where parameters automatically injected by the framework, such as agent, team, and run_context, were appearing in user_input_schema, presenting users with fields they should never need to fill in. These parameters are now excluded, so only genuinely user facing fields appear in the schema.

We fixed an issue where the memory pipeline gate check did not account for extra_messages, causing memory summarization to be skipped in runs where additional context messages were provided alongside the main conversation. The gate now correctly evaluates the full message set, including extra_messages, before deciding whether to run the memory pipeline.

LLMsTxtTools and LLMsTxtReader add native support for the llms.txt standard — a Markdown-based file that websites publish at /llms.txt to provide LLMs with a concise, structured index of their documentation, free of navigation elements, JavaScript, and other noise that wastes context. Agents can now fetch, read, and work with llms.txt files directly, making it straightforward to build agents that are grounded in up-to-date third-party documentation without manual content pipelines.

Details:

  • LLMsTxtReader ingests any llms.txt file into a knowledge base for retrieval and RAG
  • LLMsTxtTools lets agents fetch and query llms.txt indexes directly as a tool call
  • Compatible with any site publishing the standard, including https://docs.agno.com/llms.txt
  • No preprocessing required — llms.txt files are already structured for LLM consumption

See cookbook for reference

SalesforceTools gives agents native access to Salesforce CRM data, making it straightforward to build agents that query records, surface pipeline information, triage support cases, or answer questions about account state — without custom API wrappers or manual data exports.

View the Salesforce docs to learn more.

We fixed an issue where knowledge_table was being read from agent.db instead of contents_db, causing knowledge lookups to fail or return incorrect results when the two databases were configured separately. Knowledge retrieval now correctly targets the intended storage backend.

We fixed two issues in the AG-UI interface: reasoning events are now correctly emitted as they occur so users can follow the model's thinking in real time, and input_content now stores the current user input rather than the full message history, ensuring the correct value is surfaced per turn.

We fixed an issue where workflow steps that included file path images were not being converted correctly, causing those images to be dropped or mishandled when passed between steps. File path images now flow through step conversion as intended.

We fixed handling of response.reasoning_summary_text.delta events in OpenAIResponses so that reasoning content is streamed incrementally as it is generated rather than being dropped or buffered. Users now see the model's reasoning surface in real time alongside the response.

We fixed TeamSession.from_dict() so it no longer mutates the input mapping it receives. Previously, loading a team session from a dictionary could silently modify the original data structure, causing hard-to-trace state issues in workflows that reused or inspected the source mapping after loading.

A new Azure AI Foundry Claude model provider gives teams a first-class way to run Claude models through Microsoft's Azure AI infrastructure, with the same configuration patterns used across other Agno model providers. This is particularly useful for organizations that require Azure-hosted deployments for compliance, data residency, or enterprise procurement reasons.

View the Azure AI Foundry Claude docs to learn more.

OpenAIResponses now supports background mode for the OpenAI Responses API, allowing long-running agent tasks to execute asynchronously without holding an open connection. This is useful for tasks that exceed typical request timeouts or that need to be dispatched and polled rather than streamed directly.

Workflows can now pause after a step completes and wait for a human to inspect the output before it flows to the next step. Configured via HumanReview(requires_output_review=True) on a Step, Router, or Loop, the run pauses with the full step output available in req.step_output. Reviewers can approve, reject with optional feedback to trigger a retry, or edit the output directly — giving teams a structured, auditable post-execution review gate at any point in a pipeline without custom orchestration code.

Details:

  • requires_output_review accepts a bool or a callable predicate that receives the StepOutput at runtime — enabling conditional review (e.g., only pause for outputs over 200 characters, or outputs containing sensitive keywords)
  • Four reviewer actions: confirm() to approve as-is, reject() to reject, reject(feedback="...") to pass correction instructions back to the agent on retry, and edit("new output") to accept with inline modifications
  • on_reject controls rejection behavior: skip, cancel, retry, or else_branch; when on_reject=OnReject.retry, the step re-executes with feedback injected into the agent's next message
  • max_retries (default 3) caps the number of retry attempts before the step is treated as a final rejection
  • Supported on Step, Router, and Loop (via requires_iteration_review on HumanReview for per-iteration review in loops)
  • Flat parameter requires_output_review=True on Step is still supported for backward compatibility

See the Output Review docs for more.

A Workflow can now be used directly as a step inside another workflow, enabling modular composition of reusable sub-pipelines. The inner workflow runs as a single step in the outer workflow, with its output chained to the next step via the standard StepInput/StepOutput interface. Complex orchestrations can be broken into smaller, independently testable units and assembled without duplicating logic — the same sub-workflow can be reused across multiple parent workflows.

Details:

  • Pass a Workflow instance to a Step via Step(name="...", workflow=inner_workflow), or use the shorthand auto-wrap by placing the workflow directly in the steps list (uses the workflow's name as the step name)
  • Inner workflows support the full set of primitives — Condition, Loop, Router, Parallel, agents, and custom executors — mixed in any combination
  • Session state is deep-copied into the inner workflow before execution and merged back into the outer workflow after, keeping state consistent across levels
  • Workflows can be nested multiple levels deep; streaming events bubble up with a nested_depth field so outer and inner events can be distinguished by depth, workflow_id, and workflow_name
  • Enables modular workflow design: build reusable research, processing, or review sub-pipelines once and compose them into larger orchestrations

See the Nested Workflow docs to learn more.

Skills—reusable, instruction-based capability modules—can now be attached to Teams directly via the skills parameter, giving the team leader access to domain expertise without delegating to a member agent. The leader receives skill summaries injected into its system prompt and three skill tools (get_skill_instructions, get_skill_reference, get_skill_script) that let it discover and use skills on demand during a run.

Details:

  • Attach skills to a Team via skills=Skills(loaders=[LocalSkills(...)]), using any SkillLoader such as LocalSkills
  • Skills are surfaced to the team leader only — member agents retain their own independent skill configurations
  • Use team-level skills when the leader needs domain expertise to coordinate (e.g., review standards, routing rules); attach skills to individual member agents when specialists need expertise to execute their own work; both can coexist
  • Skills follow the same pattern as knowledge, memory, and tools: get_tools() adds skill tools to the leader's tool list and get_system_prompt_snippet() injects skill metadata into the leader's system prompt
  • Shared skill directories can be reused across agents and teams without duplication

See Team Skills docs for reference

A new AGNO_LOG_TRACEBACKS environment variable (opt-in) enables full Python tracebacks in log_error and log_warning calls. By default, tracebacks are suppressed to keep logs clean in production; setting this variable surfaces the complete stack trace for faster local debugging and error diagnosis.

Details:

  • Set AGNO_LOG_TRACEBACKS=true to enable full traceback output in log_error and log_warning
  • Off by default; no change in behavior for existing deployments
  • Useful for development environments and debugging sessions where full stack context is needed

SessionSummaryManager now exposes last_n_runs and conversation_limit parameters, giving precise control over how much of the conversation history is fed into summary generation. Teams running long sessions or high-frequency agents can use these to keep summaries focused and cost-efficient by limiting the input window rather than always summarizing the full history.

Details:

  • last_n_runs limits summary generation to the most recent N runs in the session
  • conversation_limit caps the number of conversation turns included in the summary input
  • Both parameters work independently and can be combined
  • No changes required for existing SessionSummaryManager configurations; defaults preserve current behavior

See cookbook for reference.

Resolved an issue where a shared HTTP/2 client was being injected across concurrent OpenAI and Azure OpenAI requests, causing transient 400 errors under load. Each request now uses its own client, eliminating the conflict.

audio_total_tokens is now correctly computed and included in run metrics for OpenAI, Perplexity, and LiteLLM. Audio token usage is now visible alongside text tokens for accurate cost tracking and monitoring.

Resolved a bug where TeamSession.get_messages could return the same message more than once, causing downstream logic that relies on message history to process duplicates.

Resolved a crash in GitHubTools where get_pull_requests would raise an IndexError if the repository contained fewer pull requests than the specified limit. The tool now returns however many PRs are available.

Resolved an unhandled DisambiguationError that caused WikipediaTools to crash when a search term matched multiple Wikipedia articles. A new auto_suggest parameter also lets you control whether Wikipedia's suggestion engine is applied to queries.

Resolved an issue where .msg, .xlsx, and .xls files were not recognized on upload due to missing MIME type mappings. These file types now upload correctly without requiring manual workarounds.

Agents and teams can now be configured with fallback models that activate automatically when the primary model fails, whether from rate limits, outages, context window overflows, or other retryable errors. Fallbacks are tried in order after the primary model’s retry loop is fully exhausted, and each fallback model runs its own independent retry cycle before the next one is attempted. Both simple lists and error-specific routing are supported, giving teams full control over how failures are handled.

Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)


If gpt-4o fails after exhausting its own retries, Claude is tried automatically.

Model strings work too:

from agno.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)


See Fallback Models docs for more.

AzureBlobConfig now supports Shared Access Signature (SAS) token authentication as an alternative to connection strings and service principal credentials. This makes it easier to grant time-scoped, permission-limited access to Azure Blob Storage without exposing full account credentials, which is useful for automated pipelines, temporary access grants, and least-privilege storage configurations.

# Clone and setup repo
git clone https://github.com/agno-agi/agno.git
cd agno/cookbook/07_knowledge/cloud

# Create and activate virtual environment
./scripts/demo_setup.sh
source .venvs/demo/bin/activate

# Optiona: Run PgVector (needs docker)
./cookbook/scripts/run_pgvector.sh

python azure_blob.py

Details:

  • Pass a SAS token directly to AzureBlobConfig alongside the account URL
  • Complements existing authentication paths; no migration required for configurations already using connection strings or service principal auth

See the Azure Blob Storage Content Source for Knowledge docs for more.

SlackTools now includes a workspace search tool, letting agents query messages, files, and content across channels directly from a tool call. This makes it straightforward to build agents that surface relevant Slack history as part of a broader workflow, without requiring manual channel navigation or separate integrations.

Details:

  • New search_workspace tool queries Slack's search API and returns matching messages and files
  • Works alongside existing SlackTools capabilities for reading channels, posting messages, and managing threads
  • Requires a Slack token with the appropriate search:read scope

View the Slack Tools docs for more.

Claude 4.6 and later models do not support assistant message prefill, which previously caused silent failures or malformed requests when conversations ended with an assistant turn. Agno now automatically injects a trailing user message in these cases, with centralized detection logic shared across all Claude deployment paths, including Anthropic, AWS Bedrock, Vertex AI, and LiteLLM, so the fix applies consistently regardless of how Claude is served.

Details:

  • Trailing user message injection is applied automatically when the last message in a conversation is an assistant message and the model does not support prefill
  • Prefill support detection is centralized and version-aware, covering Claude 4.6+ across Anthropic, Bedrock, Vertex AI, and LiteLLM providers
  • No configuration changes required; existing agents and teams using Claude are unaffected

ReliabilityEval has been extended with more precise evaluation capabilities: expected tool calls can now be matched as a subset of actual calls rather than requiring an exact full match, argument values are validated against expected parameters, and missing tool calls are explicitly tracked and surfaced in results. Multi-round tool call collection has also been fixed so all rounds are gathered correctly, along with a mutation bug that was modifying original RunOutput.messages in place and an arun() issue using the wrong ID when saving evaluation files.

The /sessions list endpoint now includes a significantly expanded set of fields per session, giving dashboards, monitoring tools, and integrations a more complete picture of each session without requiring separate follow-up requests.

Details:

  • Additional fields returned per session: user_id, agent_id, team_id, workflow_id, session_summary, metrics, total_tokens, and metadata
  • No changes required to existing integrations; new fields are additive
  • Enables richer session filtering, reporting, and analytics directly from the list response

A new /info API endpoint returns a lightweight count of agents, teams, and workflows registered in the AgentOS instance. The endpoint is intentionally unauthenticated, making it suitable as a health or readiness signal for infrastructure tooling, status pages, and deployment pipelines that need a fast, low-cost way to verify instance state.

Details:

  • Returns agent, team, and workflow counts for the current AgentOS instance
  • Unauthenticated by design — no credentials required for lightweight infrastructure checks
  • Useful for readiness probes, status dashboards, and deployment verification scripts

We’ve made ChromaDB operations more reliable by automatically splitting large upsert and query requests into smaller batches at runtime. This prevents failures that used to happen when requests exceeded ChromaDB’s per-request limits.

You can continue calling upsert and query operations the same way as before. The system now handles batching behind the scenes, so large payloads process smoothly without extra work.

import asyncio

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.chroma import ChromaDb

# Create Knowledge Instance with ChromaDB
knowledge = Knowledge(
    name="Basic SDK Knowledge Base",
    description="Agno 2.0 Knowledge Implementation with ChromaDB",
    vector_db=ChromaDb(
        collection="vectors", path="tmp/chromadb", persistent_client=True
    ),
)

asyncio.run(
    knowledge.ainsert(
        name="Recipes",
        url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        metadata={"doc_type": "recipe_book"},
    )
)

# Create and use the agent
agent = Agent(knowledge=knowledge)
agent.print_response("List down the ingredients to make Massaman Gai", markdown=True)

# Delete operations examples
vector_db = knowledge.vector_db
vector_db.delete_by_name("Recipes")
# or
vector_db.delete_by_metadata({"user_tag": "Recipes from website"})


View the ChromaDB Vector Database docs for more.

Reader classes now correctly propagate the chunk_size parameter to the default chunking strategy they apply when no explicit chunking configuration is provided. Previously, chunk_size set on a reader was silently ignored when falling back to the default chunker, producing chunks of unexpected size.

Details:

  • Fixes chunk_size being ignored in default chunking strategies used by reader classes
  • Ensures consistent chunk sizing across both explicit and default chunking configurations
  • No changes required; the fix applies automatically to all reader classes

Two improvements have been made to the Slack interface to give teams better visibility and more robust handling of long agent responses. A new show_member_tool_calls parameter controls whether tool calls from team members are shown inline in the Slack thread, and automatic card overflow rotation ensures that responses exceeding Slack's message size limit are continued in a new message rather than being truncated or failing silently.

SchedulerTools Gives agents programmatic control over the AgentOS Scheduler, allowing them to create, list, update, enable, disable, trigger, and delete cron schedules as part of a run. This makes it possible to build agents that autonomously manage their own recurring tasks, such as scheduling a report, adjusting a polling interval, or cleaning up stale jobs, without requiring a separate orchestration layer.

Details:

  • Full schedule lifecycle management: create, list, update, enable, disable, trigger, and delete operations available as agent tools
  • Schedules target AgentOS endpoints (e.g., /agents/reporter/runs) with configurable cron expressions, timezones, payloads, retry counts, and timeouts
  • Run history is stored per schedule, giving agents visibility into past execution status, timings, and errors
  • Requires agno[scheduler] and an AgentOS instance with scheduler=True enabled

Learn more in the SchedulerTools docs.

Resolved an unhandled msg_too_long error in the Slack streaming path that caused the agent to fail silently or crash when a streamed response exceeded Slack's message length limit. Long responses are now handled gracefully rather than surfacing an error to the user.

Resolved a collection of bugs affecting agents deployed with Coda, including issues in CodingTools, Slack interface behavior, team streaming output, and the learning pipeline. These fixes restore correct end-to-end behavior for Coda-integrated agents across all affected surfaces.

Resolved an issue where server-side tool blocks in Claude conversations were not being preserved when building subsequent request messages. This caused Claude to lose track of tool interactions mid-conversation, breaking multi-turn flows that relied on server tool results being visible in history.

DoclingTools gives agents the ability to convert documents on demand using the Docling library — accepting PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and video files as input and exporting to Markdown, plain text, HTML, JSON, YAML, DocTags, and VTT. Each output format is a separately togglable tool, so agents only expose the conversions they actually need. Advanced PDF handling is also available, with configurable OCR engines, language settings, table structure recognition, picture classification, and per-document timeouts for scanned or complex documents.

Example: The following agent converts a PDF to Markdown

from agno.agent import Agent
from agno.tools.docling import DoclingTools

agent = Agent(
    tools=[DoclingTools(all=True)],
    description="You are an agent that converts documents from all Docling parsers and exports to all supported output formats.",
)

agent.print_response(
    "Convert to Markdown: cookbook/07_knowledge/testing_resources/cv_1.pdf",
    markdown=True,
)

See the DoclingTools docs for more.

We’ve introduced GoogleSlidesTools to give agents full control over Google Slides. With it, you can create presentations, build out slides, and manage content end to end, all directly from your agent.

Agents can add and reorder slides, insert text boxes, tables, images, and videos, and read existing slide content to stay context-aware. Whether you are building decks from scratch or modifying existing ones, everything happens programmatically in a single workflow.

We support both OAuth and service account authentication, so you can use the toolkit in interactive setups or deploy it in server-side, multi-user environments.

from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.google.slides import GoogleSlidesTools

agent = Agent(
    model=Gemini(id="gemini-2.0-flash"),
    tools=[
        GoogleSlidesTools(
            oauth_port=8080,
        )
    ],
    instructions=[
        "You are a Google Slides assistant that helps users create and manage presentations.",
        "Always call get_presentation_metadata before modifying slides to get current slide IDs.",
        "Use slide_id values returned by the API -- never guess them.",
        "Return the presentation ID and URL after creating a presentation.",
    ],
    add_datetime_to_context=True,
    markdown=True,
)

agent.print_response(
    "Create a new Google Slides presentation titled 'Quarterly Business Review'. "
    "Then add the following slides: "
    "1. A TITLE slide with title 'Q3 2025 Business Review' and subtitle 'Prepared by the Strategy Team'. "
    "2. A TITLE_AND_BODY slide with title 'Agenda' and body listing: Revenue Overview, Key Metrics, Product Roadmap, Q4 Goals.",
    stream=True,
)


See the Google Slides docs for more.