Changelog_

Agno now supports multimodal inputs in the Gemini File Search API, so agents can index and semantically search across images alongside text rather than text alone. This is available with google-genai≥1.75.0; older versions remain text-only.

Action required: Bump google-genai to 1.75.0 or later to use image inputs with Gemini file search. Existing text-only setups keep working without changes.

See the cookbook example for an image-upload walkthrough.

Email and calendar are two of the most-requested grounding sources, and wiring them up usually means custom API clients and token plumbing. Two new context providers, GmailContextProvider and CalendarContextProvider, remove that work: both follow Agno's existing provider patterns and expose each source as natural-language query tools, so an agent reads mail and schedule context through the same consistent interface as every other source, with no bespoke retrieval layer to maintain. Alongside them, GDriveContextProvider now supports OAuth in addition to service-account auth, so an agent can connect to a user's own Drive without a service account.

Learn more about the Gmail provider and Calendar provider, or see the Context Providers overview.

Fetch tools that follow links are an SSRF and data-exfiltration risk in production. LLMsTxtTools now takes an allowed_hosts parameter that closes that surface: an agent only fetches from hosts you explicitly trust, and requests to anything outside the list are rejected, so agents can use llms.txt indexes without being able to reach arbitrary URLs.

File handling is powerful but not always wanted, so SlackContextProvider now puts it behind an enable_media_tools flag that defaults to False. Existing Slack integrations are unaffected until you opt in; when you do, download_file is added to the read tools and upload_file to the write tools, keeping file capabilities split cleanly along the existing read/write boundary.

The AgentOS scheduler now supports Mongo and AsyncMongo as backing stores, so teams already running on Mongo can schedule recurring runs of agents, teams, and workflows without standing up a separate database or external job scheduler. Cron-based background work stays on the infrastructure you already operate.

Conditional branches often wrap fragile work like external calls or tool execution, and until now a failure inside one would propagate unhandled and halt the run. The Condition workflow step now takes an on_error parameter that gives you explicit control over what happens when a sub-step fails, so a workflow can recover or continue instead of failing outright.

Agno introduces WikiContextProvider, a context provider built specifically for wiki and knowledge-base content. It supports filesystem and git backends, so wikis can be loaded straight from a directory or pulled from a git-versioned source, and it can ingest content from the web as well. Read/write flags give teams explicit control over whether an agent is allowed to add new pages or only consume existing ones.

Why it matters: Internal wikis are usually the highest-signal source of organizational knowledge, including runbooks, architecture decisions, and onboarding guides, and they live in inconsistent places, from a docs folder to a git repo to a public site. WikiContextProvider gives agents a single, consistent way to read that content without bespoke ingestion code, and the read/write controls keep human-curated knowledge bases from being modified by accident. For research, support, and engineering agents, this collapses what used to be a custom integration into one configuration step.

Learn more:

A round of fixes makes Slack-backed agents more predictable in production. The interface now gracefully falls back to public channels when the groups:read scope is missing, rather than failing outright, so agents continue to operate under reduced OAuth permissions. Read-instruction overrides have been restored and briefing guidance has been tightened, giving teams more consistent agent behavior across read and write modes. Update operations are now exposed correctly in agent mode, with agent mode returning query-only results where appropriate.

Why it matters: Slack deployments often involve carefully scoped OAuth permissions, custom briefing instructions, and a mix of agent and assistant modes. Each of these fixes removes a sharp edge that previously required workarounds, so teams running Slack agents at scale get more reliable behavior without changing their configuration.

SlackContextProvider has been simplified to a single, self-documenting configuration surface. The for_bot_read(), for_assistant_search(), and for_write() factory methods have been removed in favor of explicit flags on the provider, and SlackTools construction is now inlined so the underlying tool exposes its own capabilities directly. A new opt-in enable_workspace_search parameter is also available for agents that need to search across the workspace.

Action required: If you previously instantiated SlackContextProvider through one of the factory methods, replace those calls with direct construction using the relevant flags. Code that already constructs the provider directly is unaffected.

Why it matters: Factory methods made it harder to see what an agent could actually do with Slack at a glance, and forced runtime mode-switching when configurations needed to combine read, search, and write. Explicit flags make capability composition obvious in the agent definition, simplify reasoning about least-privilege access, and remove a layer of indirection that was easy to misconfigure.

Check out the cookbook for more.

Agno introduces WorkspaceContextProvider, a context provider purpose-built for agents that operate inside a repository root. It's backed by the read-only Workspace toolkit rather than generic file tools, so reading a repository and acting on it stay cleanly separated by default. Exclusion patterns for noise like .context, .venvs, dependency caches, and build artifacts are centralized across FileTools and Workspace, and FilesystemContextProvider now accepts an exclude_patterns parameter for teams that want to opt out or customize the defaults explicitly.

Why it matters: Pointing an agent at a repository is one of the most common patterns in agentic software, and one of the most token-expensive when it pulls in lockfiles, virtualenvs, and build output. Out-of-the-box noise filtering reduces context size, cost, and latency without forcing teams to maintain their own ignore lists. The read-only backing also means the provider is safe to attach to research and analysis agents that should never modify the repo they're reading.

Learn more in the Context engineering docs or the Cookbook.

A new Workspace toolkit gives agents structured access to a configurable root directory, with operations grouped by capability and destructive actions gated by human-in-the-loop confirmation by default. Read, list, and search run freely. Write, edit, move, delete, and shell pause for explicit approval before they execute. The toolkit is scoped to the directory you pass at construction, bounding an agent's blast radius to the path you specify.

Filesystem and shell access unlock the most useful "agent-as-coworker" patterns, including code generation, document editing, and operational scripts that touch real systems. Shipping these capabilities behind HITL by default makes it safe to put an agent in front of real work and expand write privileges progressively as confidence grows. The confirmation policy is configurable per action, so teams can tighten or loosen oversight without rewriting the agent.

Here's a minimal example. Reads run silently; writes pause for approval:

from agno.agent import Agent
from agno.tools.workspace import Workspace

agent = Agent(
    model=...,
    tools=[Workspace("/path/to/workspace")],
)

run = agent.run("Read draft.md and fix the typo on the line about typos.")

# Reads execute immediately. The edit pauses for confirmation.
while run.is_paused:
    for requirement in run.active_requirements:
        if requirement.needs_confirmation:
            # Inspect requirement.tool_execution, then confirm or reject.
            requirement.confirm()
    run = agent.continue_run(run_id=run.run_id, requirements=run.requirements)


In AgentOS, pauses surface as approval cards in the run timeline. In a plain script, you drive the confirmation loop yourself, as shown above.

See the full cookbook example for the complete pattern, including how to wire up an interactive prompt.

Agno has updated the default model id used by several model providers to newer, actively supported versions. Agents that don't pin a specific model will now run on more current models, helping teams avoid upcoming provider deprecations, get more consistent performance, and in many cases lower inference cost.

Action required: If your application depends on specific model behavior, pin the version explicitly on the model class rather than relying on the default. For example, prefer OpenAIResponses(id="...") or Claude(id="...") over leaving id unset, so future default updates don't change your agent's behavior unexpectedly.

Why it matters: Provider model lifecycles move on the provider's timeline, not yours. Tracking defaults to actively supported models keeps existing Agno applications running smoothly through deprecations and avoids the operational scramble of a forced migration when an older model retires.

Agno now supports Anthropic's multi-block prompt caching for Claude models, giving teams granular control over what gets cached and for how long. You can define multiple system prompt blocks, each with its own cache setting and TTL of either 5 minutes or 1 hour, and opt in to caching tool definitions so the tool prefix is reused across requests. Tool serialization is also deterministic across Anthropic, OpenAI, Gemini, and Bedrock, so request prefixes stay stable from run to run and cached tokens actually hit.

Why it matters: Production agents with long system prompts or large tool catalogs can see meaningful reductions in inference cost and time-to-first-token without changing application logic. Caching is opt-in and configured at the model level, so existing agents are unaffected until you turn it on.

See Cookbook for reference.

WebContextProvider now ships with a Parallel backend, giving agents access to high-quality web search and page fetch through Parallel's hosted research service. The backend exposes both web_search and web_fetch with compressed markdown output, runs keyless by default for fast experimentation, and supports Bearer authentication or OAuth for higher rate limits and production workloads. Default timeouts are tuned for fetching larger pages, so long-running research calls complete reliably out of the box.

Why it matters: Adding web context to an agent traditionally meant wiring up a search API, building a fetcher, and managing rate limits and timeouts. The Parallel backend collapses that work into a single configuration option, so teams can stand up a production-ready research pipeline in minutes and focus on agent behavior instead of infrastructure.

Learn more in our Parallel MCP agent docs.

The openai: model prefix now resolves to OpenAIResponses rather than the legacy Chat Completions surface. New agents written as Agent(model="openai:...") automatically route through OpenAI's Responses API, giving teams access to its richer feature set, including built-in tools and improved streaming behavior, without code changes.

Action required: If your application depends on Chat Completions semantics, switch to the new openai-chat: prefix, for example Agent(model="openai-chat:gpt-4.1"). No action is needed for teams that already instantiate OpenAIChat or OpenAIResponses directly.

Why it matters: Aligning the default with the Responses API moves new agents onto OpenAI's actively developed surface, reduces friction when adopting newer capabilities, and makes the most capable behavior the path of least resistance for everyone building on Agno.

AgentOS now runs agents built with the Claude Agent SDK, LangGraph, and DSPy alongside native Agno agents, all through a unified AgentProtocol interface. Teams can standardize on one runtime, control plane, and observability layer without rewriting agents that already exist in other frameworks.

This turns AgentOS into a framework-agnostic platform. Engineering organizations can adopt Agno incrementally, bringing existing agent investments under a single production environment for sessions, tracing, scheduling, and role-based access control. It also reduces lock-in for teams evaluating multiple agent frameworks in parallel.

Available in beta. Native Agno agents remain fully supported with no changes required.

Learn more about Multi-Framework Support in our docs.

The new agno.context API lets agents reach into filesystems, web sources, SQL databases, Slack, Google Drive, and MCP servers as natural-language tools. What used to require custom integrations, retrieval pipelines, or bespoke tool wrappers now works through one first-party interface.

Context providers turn live data sources into queryable context for any agent, without forcing teams to build and maintain their own retrieval layer. Agents stay grounded in the actual systems your organization already runs, and platform owners get a consistent integration surface to govern and observe.

This shortens time-to-value for retrieval-heavy use cases and removes a recurring source of glue code from production agent stacks.

Browse all built-in context providers in our docs.

AgentFactory, TeamFactory, and WorkflowFactory let you create agents, teams, and workflows dynamically at runtime instead of defining them statically at startup. Each request can spin up its own configuration, drawing on per-tenant settings, runtime context, or user-specific permissions.

For platform teams running shared infrastructure across customers, departments, or business units, this removes a structural limitation. You no longer need a separate process or deployment to isolate configurations between tenants. One AgentOS instance can serve many distinct contexts with appropriate boundaries.

The factory pattern also makes A/B testing, gradual rollouts, and per-environment customization straightforward, since the agent definition is decided when the request arrives rather than baked into the deployment.

Learn more about Dynamic Agents in our docs.

Human-in-the-loop is now available for Teams, with full support in the AgentOS chat interface and a dedicated API layer. Operators can review, intervene in, and steer team-level decisions the same way they already can with single agents.

Multi-agent teams often produce more consequential output than individual agents, since they coordinate across roles to complete higher-stakes tasks. Adding HITL at the team level closes a governance gap for organizations deploying teams in customer-facing or regulated workflows.

This gives platform owners a consistent oversight model across single agents and teams, so review processes, escalation paths, and compliance controls work the same way regardless of how an agent system is structured.

Learn more about HITL for Teams in our docs.

Teams now support approval flows through both the API and the AgentOS chat interface. Sensitive actions can be paused for explicit human sign-off before they execute, giving operators a clear control point for high-impact operations.

Approvals work the same way they already do for single agents, so teams managing both can apply consistent governance policies across them. Engineering and compliance leaders can require human authorization for actions like financial transactions, data writes, customer communications, or any step that needs accountability before it ships.

This makes multi-agent teams safer to deploy in production environments where every action needs an audit trail and a responsible decision-maker on record.

Learn more in the Approvals docs.

Background runs streamed over Server-Sent Events can now reconnect and resume after a disconnection or page refresh. Operators rejoin the run exactly where they left off, with full context preserved.

Long-running agents and teams are common in production, particularly for research, analysis, and multi-step automation. Until now, a transient network drop or browser refresh meant losing the run or restarting from the beginning. The new behavior eliminates that failure mode, making AgentOS more reliable for the workflows users actually run on it.

For operators monitoring live agent activity, this also means fewer interrupted sessions and less wasted compute spent regenerating progress that was already complete.

Learn more in the Background Execution docs.

The /sessions endpoint returns agent, team, and workflow sessions in a single response by default. This gives a complete view of session activity in one call, which is the most common use case for operations dashboards, audit views, and platform monitoring.

To filter for a specific session type, pass ?type=agent, ?type=team, or ?type=workflow as a query parameter.

This is a breaking change. Integrations that previously depended on the endpoint returning only one session type should add the corresponding type filter to preserve their existing behavior. Update any custom dashboards, monitoring scripts, or downstream services that consume this endpoint before upgrading to v2.6.0.

We fixed an issue where custom db table names set on components were being overwritten with defaults when those components were loaded back from configuration. Custom table names are now preserved correctly through the full save and load cycle.

GitHubConfig now accepts a repository override at the request level, allowing agents that work across multiple repositories to specify the target repo per call rather than being locked to a single repo at initialization time.

See cookbook

A new option lets you turn off file citations in Claude responses. This is useful when citations add noise to the output, for example in conversational flows, summarization tasks, or any context where surfacing source references per response is unwanted.

We fixed an issue where headers supplied by header_provider were not being applied during MCP session initialization, only during subsequent requests. Sessions now open with the correct headers from the start, preventing authentication and routing failures on first contact.

We fixed an issue where knowledge databases were not being built live during configuration API calls, causing agents to operate without their knowledge base until a separate build step was triggered. Knowledge databases are now constructed inline as part of the configuration flow.

We fixed an issue where events emitted by inner workflows could lose their identity or be misattributed when bubbling up through outer workflows. Events now carry a nested_depth field on agent and team events, and inner workflow event identity is preserved throughout, making it straightforward to trace exactly where in a nested pipeline any event originated.

We fixed an issue where a shared HTTP/2 client was being injected across all model providers, causing connection conflicts and transient failures under concurrent load. Each provider now maintains its own client, eliminating the source of these errors across all providers simultaneously.

We fixed an issue where cancellation of a client connection during streaming could surface as an unhandled error rather than being handled quietly. CancelledError is now caught explicitly in all router streaming generators, so cancelled connections close gracefully without producing noise in logs or error handlers.

We fixed an issue where JSON cleaning was stripping or corrupting code blocks embedded in string values before the parse was even attempted. The parser now tries a raw JSON parse first and only falls back to cleaning if that fails, preserving code blocks and other structured content in the output as intended.

We fixed an issue where parameters automatically injected by the framework, such as agent, team, and run_context, were appearing in user_input_schema, presenting users with fields they should never need to fill in. These parameters are now excluded, so only genuinely user facing fields appear in the schema.

We fixed an issue where the memory pipeline gate check did not account for extra_messages, causing memory summarization to be skipped in runs where additional context messages were provided alongside the main conversation. The gate now correctly evaluates the full message set, including extra_messages, before deciding whether to run the memory pipeline.

LLMsTxtTools and LLMsTxtReader add native support for the llms.txt standard — a Markdown-based file that websites publish at /llms.txt to provide LLMs with a concise, structured index of their documentation, free of navigation elements, JavaScript, and other noise that wastes context. Agents can now fetch, read, and work with llms.txt files directly, making it straightforward to build agents that are grounded in up-to-date third-party documentation without manual content pipelines.

Details:

  • LLMsTxtReader ingests any llms.txt file into a knowledge base for retrieval and RAG
  • LLMsTxtTools lets agents fetch and query llms.txt indexes directly as a tool call
  • Compatible with any site publishing the standard, including https://docs.agno.com/llms.txt
  • No preprocessing required — llms.txt files are already structured for LLM consumption

See cookbook for reference

SalesforceTools gives agents native access to Salesforce CRM data, making it straightforward to build agents that query records, surface pipeline information, triage support cases, or answer questions about account state — without custom API wrappers or manual data exports.

View the Salesforce docs to learn more.

We fixed an issue where knowledge_table was being read from agent.db instead of contents_db, causing knowledge lookups to fail or return incorrect results when the two databases were configured separately. Knowledge retrieval now correctly targets the intended storage backend.

We fixed two issues in the AG-UI interface: reasoning events are now correctly emitted as they occur so users can follow the model's thinking in real time, and input_content now stores the current user input rather than the full message history, ensuring the correct value is surfaced per turn.

We fixed an issue where workflow steps that included file path images were not being converted correctly, causing those images to be dropped or mishandled when passed between steps. File path images now flow through step conversion as intended.

We fixed handling of response.reasoning_summary_text.delta events in OpenAIResponses so that reasoning content is streamed incrementally as it is generated rather than being dropped or buffered. Users now see the model's reasoning surface in real time alongside the response.

We fixed TeamSession.from_dict() so it no longer mutates the input mapping it receives. Previously, loading a team session from a dictionary could silently modify the original data structure, causing hard-to-trace state issues in workflows that reused or inspected the source mapping after loading.

A new Azure AI Foundry Claude model provider gives teams a first-class way to run Claude models through Microsoft's Azure AI infrastructure, with the same configuration patterns used across other Agno model providers. This is particularly useful for organizations that require Azure-hosted deployments for compliance, data residency, or enterprise procurement reasons.

View the Azure AI Foundry Claude docs to learn more.

OpenAIResponses now supports background mode for the OpenAI Responses API, allowing long-running agent tasks to execute asynchronously without holding an open connection. This is useful for tasks that exceed typical request timeouts or that need to be dispatched and polled rather than streamed directly.

Workflows can now pause after a step completes and wait for a human to inspect the output before it flows to the next step. Configured via HumanReview(requires_output_review=True) on a Step, Router, or Loop, the run pauses with the full step output available in req.step_output. Reviewers can approve, reject with optional feedback to trigger a retry, or edit the output directly — giving teams a structured, auditable post-execution review gate at any point in a pipeline without custom orchestration code.

Details:

  • requires_output_review accepts a bool or a callable predicate that receives the StepOutput at runtime — enabling conditional review (e.g., only pause for outputs over 200 characters, or outputs containing sensitive keywords)
  • Four reviewer actions: confirm() to approve as-is, reject() to reject, reject(feedback="...") to pass correction instructions back to the agent on retry, and edit("new output") to accept with inline modifications
  • on_reject controls rejection behavior: skip, cancel, retry, or else_branch; when on_reject=OnReject.retry, the step re-executes with feedback injected into the agent's next message
  • max_retries (default 3) caps the number of retry attempts before the step is treated as a final rejection
  • Supported on Step, Router, and Loop (via requires_iteration_review on HumanReview for per-iteration review in loops)
  • Flat parameter requires_output_review=True on Step is still supported for backward compatibility

See the Output Review docs for more.

A Workflow can now be used directly as a step inside another workflow, enabling modular composition of reusable sub-pipelines. The inner workflow runs as a single step in the outer workflow, with its output chained to the next step via the standard StepInput/StepOutput interface. Complex orchestrations can be broken into smaller, independently testable units and assembled without duplicating logic — the same sub-workflow can be reused across multiple parent workflows.

Details:

  • Pass a Workflow instance to a Step via Step(name="...", workflow=inner_workflow), or use the shorthand auto-wrap by placing the workflow directly in the steps list (uses the workflow's name as the step name)
  • Inner workflows support the full set of primitives — Condition, Loop, Router, Parallel, agents, and custom executors — mixed in any combination
  • Session state is deep-copied into the inner workflow before execution and merged back into the outer workflow after, keeping state consistent across levels
  • Workflows can be nested multiple levels deep; streaming events bubble up with a nested_depth field so outer and inner events can be distinguished by depth, workflow_id, and workflow_name
  • Enables modular workflow design: build reusable research, processing, or review sub-pipelines once and compose them into larger orchestrations

See the Nested Workflow docs to learn more.

Skills—reusable, instruction-based capability modules—can now be attached to Teams directly via the skills parameter, giving the team leader access to domain expertise without delegating to a member agent. The leader receives skill summaries injected into its system prompt and three skill tools (get_skill_instructions, get_skill_reference, get_skill_script) that let it discover and use skills on demand during a run.

Details:

  • Attach skills to a Team via skills=Skills(loaders=[LocalSkills(...)]), using any SkillLoader such as LocalSkills
  • Skills are surfaced to the team leader only — member agents retain their own independent skill configurations
  • Use team-level skills when the leader needs domain expertise to coordinate (e.g., review standards, routing rules); attach skills to individual member agents when specialists need expertise to execute their own work; both can coexist
  • Skills follow the same pattern as knowledge, memory, and tools: get_tools() adds skill tools to the leader's tool list and get_system_prompt_snippet() injects skill metadata into the leader's system prompt
  • Shared skill directories can be reused across agents and teams without duplication

See Team Skills docs for reference

A new AGNO_LOG_TRACEBACKS environment variable (opt-in) enables full Python tracebacks in log_error and log_warning calls. By default, tracebacks are suppressed to keep logs clean in production; setting this variable surfaces the complete stack trace for faster local debugging and error diagnosis.

Details:

  • Set AGNO_LOG_TRACEBACKS=true to enable full traceback output in log_error and log_warning
  • Off by default; no change in behavior for existing deployments
  • Useful for development environments and debugging sessions where full stack context is needed

SessionSummaryManager now exposes last_n_runs and conversation_limit parameters, giving precise control over how much of the conversation history is fed into summary generation. Teams running long sessions or high-frequency agents can use these to keep summaries focused and cost-efficient by limiting the input window rather than always summarizing the full history.

Details:

  • last_n_runs limits summary generation to the most recent N runs in the session
  • conversation_limit caps the number of conversation turns included in the summary input
  • Both parameters work independently and can be combined
  • No changes required for existing SessionSummaryManager configurations; defaults preserve current behavior

See cookbook for reference.

Resolved an issue where a shared HTTP/2 client was being injected across concurrent OpenAI and Azure OpenAI requests, causing transient 400 errors under load. Each request now uses its own client, eliminating the conflict.

audio_total_tokens is now correctly computed and included in run metrics for OpenAI, Perplexity, and LiteLLM. Audio token usage is now visible alongside text tokens for accurate cost tracking and monitoring.

Resolved a bug where TeamSession.get_messages could return the same message more than once, causing downstream logic that relies on message history to process duplicates.

Resolved a crash in GitHubTools where get_pull_requests would raise an IndexError if the repository contained fewer pull requests than the specified limit. The tool now returns however many PRs are available.

Resolved an unhandled DisambiguationError that caused WikipediaTools to crash when a search term matched multiple Wikipedia articles. A new auto_suggest parameter also lets you control whether Wikipedia's suggestion engine is applied to queries.

Resolved an issue where .msg, .xlsx, and .xls files were not recognized on upload due to missing MIME type mappings. These file types now upload correctly without requiring manual workarounds.

Agents and teams can now be configured with fallback models that activate automatically when the primary model fails, whether from rate limits, outages, context window overflows, or other retryable errors. Fallbacks are tried in order after the primary model’s retry loop is fully exhausted, and each fallback model runs its own independent retry cycle before the next one is attempted. Both simple lists and error-specific routing are supported, giving teams full control over how failures are handled.

Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)


If gpt-4o fails after exhausting its own retries, Claude is tried automatically.

Model strings work too:

from agno.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)


See Fallback Models docs for more.

AzureBlobConfig now supports Shared Access Signature (SAS) token authentication as an alternative to connection strings and service principal credentials. This makes it easier to grant time-scoped, permission-limited access to Azure Blob Storage without exposing full account credentials, which is useful for automated pipelines, temporary access grants, and least-privilege storage configurations.

# Clone and setup repo
git clone https://github.com/agno-agi/agno.git
cd agno/cookbook/07_knowledge/cloud

# Create and activate virtual environment
./scripts/demo_setup.sh
source .venvs/demo/bin/activate

# Optiona: Run PgVector (needs docker)
./cookbook/scripts/run_pgvector.sh

python azure_blob.py

Details:

  • Pass a SAS token directly to AzureBlobConfig alongside the account URL
  • Complements existing authentication paths; no migration required for configurations already using connection strings or service principal auth

See the Azure Blob Storage Content Source for Knowledge docs for more.

SlackTools now includes a workspace search tool, letting agents query messages, files, and content across channels directly from a tool call. This makes it straightforward to build agents that surface relevant Slack history as part of a broader workflow, without requiring manual channel navigation or separate integrations.

Details:

  • New search_workspace tool queries Slack's search API and returns matching messages and files
  • Works alongside existing SlackTools capabilities for reading channels, posting messages, and managing threads
  • Requires a Slack token with the appropriate search:read scope

View the Slack Tools docs for more.

Claude 4.6 and later models do not support assistant message prefill, which previously caused silent failures or malformed requests when conversations ended with an assistant turn. Agno now automatically injects a trailing user message in these cases, with centralized detection logic shared across all Claude deployment paths, including Anthropic, AWS Bedrock, Vertex AI, and LiteLLM, so the fix applies consistently regardless of how Claude is served.

Details:

  • Trailing user message injection is applied automatically when the last message in a conversation is an assistant message and the model does not support prefill
  • Prefill support detection is centralized and version-aware, covering Claude 4.6+ across Anthropic, Bedrock, Vertex AI, and LiteLLM providers
  • No configuration changes required; existing agents and teams using Claude are unaffected

ReliabilityEval has been extended with more precise evaluation capabilities: expected tool calls can now be matched as a subset of actual calls rather than requiring an exact full match, argument values are validated against expected parameters, and missing tool calls are explicitly tracked and surfaced in results. Multi-round tool call collection has also been fixed so all rounds are gathered correctly, along with a mutation bug that was modifying original RunOutput.messages in place and an arun() issue using the wrong ID when saving evaluation files.

The /sessions list endpoint now includes a significantly expanded set of fields per session, giving dashboards, monitoring tools, and integrations a more complete picture of each session without requiring separate follow-up requests.

Details:

  • Additional fields returned per session: user_id, agent_id, team_id, workflow_id, session_summary, metrics, total_tokens, and metadata
  • No changes required to existing integrations; new fields are additive
  • Enables richer session filtering, reporting, and analytics directly from the list response

A new /info API endpoint returns a lightweight count of agents, teams, and workflows registered in the AgentOS instance. The endpoint is intentionally unauthenticated, making it suitable as a health or readiness signal for infrastructure tooling, status pages, and deployment pipelines that need a fast, low-cost way to verify instance state.

Details:

  • Returns agent, team, and workflow counts for the current AgentOS instance
  • Unauthenticated by design — no credentials required for lightweight infrastructure checks
  • Useful for readiness probes, status dashboards, and deployment verification scripts

We’ve made ChromaDB operations more reliable by automatically splitting large upsert and query requests into smaller batches at runtime. This prevents failures that used to happen when requests exceeded ChromaDB’s per-request limits.

You can continue calling upsert and query operations the same way as before. The system now handles batching behind the scenes, so large payloads process smoothly without extra work.

import asyncio

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.chroma import ChromaDb

# Create Knowledge Instance with ChromaDB
knowledge = Knowledge(
    name="Basic SDK Knowledge Base",
    description="Agno 2.0 Knowledge Implementation with ChromaDB",
    vector_db=ChromaDb(
        collection="vectors", path="tmp/chromadb", persistent_client=True
    ),
)

asyncio.run(
    knowledge.ainsert(
        name="Recipes",
        url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
        metadata={"doc_type": "recipe_book"},
    )
)

# Create and use the agent
agent = Agent(knowledge=knowledge)
agent.print_response("List down the ingredients to make Massaman Gai", markdown=True)

# Delete operations examples
vector_db = knowledge.vector_db
vector_db.delete_by_name("Recipes")
# or
vector_db.delete_by_metadata({"user_tag": "Recipes from website"})


View the ChromaDB Vector Database docs for more.

Reader classes now correctly propagate the chunk_size parameter to the default chunking strategy they apply when no explicit chunking configuration is provided. Previously, chunk_size set on a reader was silently ignored when falling back to the default chunker, producing chunks of unexpected size.

Details:

  • Fixes chunk_size being ignored in default chunking strategies used by reader classes
  • Ensures consistent chunk sizing across both explicit and default chunking configurations
  • No changes required; the fix applies automatically to all reader classes

Two improvements have been made to the Slack interface to give teams better visibility and more robust handling of long agent responses. A new show_member_tool_calls parameter controls whether tool calls from team members are shown inline in the Slack thread, and automatic card overflow rotation ensures that responses exceeding Slack's message size limit are continued in a new message rather than being truncated or failing silently.

SchedulerTools Gives agents programmatic control over the AgentOS Scheduler, allowing them to create, list, update, enable, disable, trigger, and delete cron schedules as part of a run. This makes it possible to build agents that autonomously manage their own recurring tasks, such as scheduling a report, adjusting a polling interval, or cleaning up stale jobs, without requiring a separate orchestration layer.

Details:

  • Full schedule lifecycle management: create, list, update, enable, disable, trigger, and delete operations available as agent tools
  • Schedules target AgentOS endpoints (e.g., /agents/reporter/runs) with configurable cron expressions, timezones, payloads, retry counts, and timeouts
  • Run history is stored per schedule, giving agents visibility into past execution status, timings, and errors
  • Requires agno[scheduler] and an AgentOS instance with scheduler=True enabled

Learn more in the SchedulerTools docs.

Resolved an unhandled msg_too_long error in the Slack streaming path that caused the agent to fail silently or crash when a streamed response exceeded Slack's message length limit. Long responses are now handled gracefully rather than surfacing an error to the user.

Resolved a collection of bugs affecting agents deployed with Coda, including issues in CodingTools, Slack interface behavior, team streaming output, and the learning pipeline. These fixes restore correct end-to-end behavior for Coda-integrated agents across all affected surfaces.

Resolved an issue where server-side tool blocks in Claude conversations were not being preserved when building subsequent request messages. This caused Claude to lose track of tool interactions mid-conversation, breaking multi-turn flows that relied on server tool results being visible in history.

DoclingTools gives agents the ability to convert documents on demand using the Docling library — accepting PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and video files as input and exporting to Markdown, plain text, HTML, JSON, YAML, DocTags, and VTT. Each output format is a separately togglable tool, so agents only expose the conversions they actually need. Advanced PDF handling is also available, with configurable OCR engines, language settings, table structure recognition, picture classification, and per-document timeouts for scanned or complex documents.

Example: The following agent converts a PDF to Markdown

from agno.agent import Agent
from agno.tools.docling import DoclingTools

agent = Agent(
    tools=[DoclingTools(all=True)],
    description="You are an agent that converts documents from all Docling parsers and exports to all supported output formats.",
)

agent.print_response(
    "Convert to Markdown: cookbook/07_knowledge/testing_resources/cv_1.pdf",
    markdown=True,
)

See the DoclingTools docs for more.

We’ve introduced GoogleSlidesTools to give agents full control over Google Slides. With it, you can create presentations, build out slides, and manage content end to end, all directly from your agent.

Agents can add and reorder slides, insert text boxes, tables, images, and videos, and read existing slide content to stay context-aware. Whether you are building decks from scratch or modifying existing ones, everything happens programmatically in a single workflow.

We support both OAuth and service account authentication, so you can use the toolkit in interactive setups or deploy it in server-side, multi-user environments.

from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.google.slides import GoogleSlidesTools

agent = Agent(
    model=Gemini(id="gemini-2.0-flash"),
    tools=[
        GoogleSlidesTools(
            oauth_port=8080,
        )
    ],
    instructions=[
        "You are a Google Slides assistant that helps users create and manage presentations.",
        "Always call get_presentation_metadata before modifying slides to get current slide IDs.",
        "Use slide_id values returned by the API -- never guess them.",
        "Return the presentation ID and URL after creating a presentation.",
    ],
    add_datetime_to_context=True,
    markdown=True,
)

agent.print_response(
    "Create a new Google Slides presentation titled 'Quarterly Business Review'. "
    "Then add the following slides: "
    "1. A TITLE slide with title 'Q3 2025 Business Review' and subtitle 'Prepared by the Strategy Team'. "
    "2. A TITLE_AND_BODY slide with title 'Agenda' and body listing: Revenue Overview, Key Metrics, Product Roadmap, Q4 Goals.",
    stream=True,
)


See the Google Slides docs for more.

Tool call schemas are now normalized across model providers, so switching an agent from one model to another no longer requires adjusting how tools are defined or how their outputs are parsed. This removes a common source of friction when benchmarking models, migrating providers, or running the same agent across multiple backends.

Details:

  • Tool call inputs and outputs are translated into a consistent internal format regardless of the originating model provider
  • Eliminates provider-specific edge cases in tool schema generation and response parsing
  • Enables drop-in model swapping without changes to tool definitions or agent logic

See Fallback Models docs for more.

A new PerplexitySearch toolkit gives agents access to the Perplexity Search API, returning ranked web results with titles, URLs, snippets, and publication dates in a single tool call. Built-in filtering by recency and domain makes it straightforward to build agents that need up-to-date, source-controlled retrieval without additional post-processing.

Check out this example of basic search:

from agno.agent import Agent
from agno.tools.perplexity import PerplexitySearch

agent = Agent(tools=[PerplexitySearch()], markdown=True)
agent.print_response("What are the latest developments in AI?")


Details:

  • search and asearch (async) functions return a JSON array of results with URL, title, snippet, and date per result
  • search_recency_filter restricts results to content from the past day, week, month, or year
  • search_domain_filter limits results to a specific list of domains (e.g., reuters.com, bloomberg.com)
  • search_language_filter accepts ISO language codes for language-scoped retrieval
  • max_results (default 5) and max_tokens_per_page (default 2048) give fine-grained control over result volume and content length
  • Requires a PERPLEXITY_API_KEY environment variable; no other configuration needed

See the Perplexity docs for reference.

AgenticChunking now accepts a custom_prompt parameter, letting you override the default model-driven chunking instructions with domain-specific logic. Rather than relying solely on the built-in heuristics for finding semantic breakpoints, you can now describe exactly how the model should segment your documents — for example, splitting at major section boundaries, preserving clause integrity, or separating structured metadata from body content — making it straightforward to tune retrieval quality for specialized corpora.

Details:

  • Pass any string to custom_prompt to override the default chunking behavior; custom prompts are prioritized over built-in instructions
  • The default output format constraints are still enforced automatically — custom_prompt only needs to describe the chunking logic itself
  • Always pair custom_prompt with max_chunk_size to bound output length; the default max_chunk_size is 5000 characters
  • The model parameter accepts any Agno-compatible model, allowing you to route chunking to a smaller or cheaper model independently of your agent

See the Custom Prompts docs for more.

Resolved an issue where LanceDB's search() could return the same document multiple times when hybrid search retrieved it via both vector similarity and full-text search. Results are now deduplicated before being returned, ensuring each document appears only once regardless of which search path surfaced it.

Details:

  • Fixes duplicate results in hybrid search caused by the same document matching both the vector and FTS indices
  • Deduplication is applied automatically; no configuration changes required
  • Improves result quality and reduces noise for agents and workflows using LanceDB hybrid search

The Seltz toolkit has been updated to align with the breaking changes introduced in the Seltz SDK 0.2.0 release, replacing the previous 0.1.x integration. Teams using the Seltz toolkit should update their Seltz SDK dependency to 0.2.0 alongside this release.

Details:

  • Updates the Seltz toolkit integration from SDK 0.1.x to 0.2.0
  • Ensures compatibility with the latest Seltz SDK API surface
  • Upgrade the seltz package to 0.2.0 to avoid integration errors

We resolved an issue where tools from async toolkits were not included in the tool name list injected into the team system message, leaving the team unaware of those tools at the prompt level.

We resolved an additional case where hybrid search could surface the same document more than once when it matched across multiple search indices.

We fixed output_config not being applied correctly on Claude model wrappers, $defs being stripped from tool schemas, and file_ids and container information not being surfaced during streaming for skills.

We resolved a bug where streamed tool call data was overwriting accumulated state instead of appending to it, causing incomplete or incorrect tool calls to be dispatched.

We resolved an issue where empty string values in streamed LiteLLM responses could overwrite previously accumulated tool names, resulting in tool calls with missing identifiers.

We added an early error when AWS_BEDROCK_API_KEY is set for Claude models on AWS Bedrock, which is not a supported authentication path, rather than failing silently later in the request lifecycle.

We overrode deepcopy behavior on the Azure OpenAI model class to preserve live client references, preventing connection failures that occurred when the model object was copied during agent or team setup.

We resolved an issue where empty reasoning blocks returned by OpenRouter for non-reasoning models were being processed unnecessarily, causing noise in parsed responses.

We resolved a failure in cache key generation when the input contained types that are not directly JSON-serializable, ensuring caching works reliably across a broader range of agent inputs.

Resolved an incorrect import of the pymongo async modules that could cause runtime failures when using MongoDB with async agents or workflows. The import now correctly references the async-compatible pymongo interfaces.

Details:

  • Fixes a broken import path for pymongo async modules in the MongoDB database backend
  • Resolves runtime errors encountered when running async agents or workflows with MongoDB storage
  • No configuration changes required; upgrading applies the fix automatically

Resolved a bug in parse_tool_calls where shared dictionary references across parsed tool calls would cause the same tool to be executed multiple times during streaming. Each tool call is now constructed from an independent copy, eliminating the duplication.

Details:

  • Fixes duplicate tool executions that occurred in streaming mode when multiple tool calls were parsed in the same pass
  • Caused by a mutable shared dict reference being reused across tool call objects in parse_tool_calls
  • No configuration changes required; the fix applies automatically to all streaming tool call workflows

Resolved an issue where structured output support was not correctly detected for certain Claude models, causing agents to fall back to less reliable output parsing strategies even when the model fully supports structured output. Affected models now use the correct path automatically.

Details:

  • Fixes structured output capability detection across supported Claude model variants
  • Improves reliability and consistency of structured output for agents using response schemas
  • No configuration changes required; the fix applies automatically

Resolved a race condition in MCPTools where parallel tool calls using a header_provider would each independently spin up their own MCP session instead of sharing one, leaving the agent in a stuck state. Session creation is now correctly coordinated so that concurrent tool calls share a single session as intended.

Details:

  • Fixes duplicate session creation when multiple MCP tool calls execute in parallel with header_provider configured
  • Eliminates the agent hang caused by conflicting concurrent sessions
  • No configuration changes required; the fix applies automatically to all MCPTools setups using header_provider

The Gemini model class now accepts a timeout parameter, giving teams explicit control over how long a request is allowed to run before being cancelled. This is particularly useful for production deployments where unbounded request durations can affect reliability and resource utilization.

Details:

  • Set timeout (in seconds) directly on the Gemini model instance
  • Applies to all request types made through the Gemini model class
  • Falls back to the existing default behavior when not set; no migration required

See reference in docs.

The Mistral model provider now supports the mistralai v2 SDK while continuing to work with v1. Teams can upgrade their SDK dependency and take advantage of v2 improvements without any changes to their agent or model configuration.

Details:

  • Full support for mistralai v2 SDK alongside continued v1 compatibility
  • No migration required; existing configurations work without modification
  • Enables access to v2 SDK features and performance improvements for teams ready to upgrade

The GET /workflows/{id} endpoint now accepts a version query parameter, allowing callers to fetch a specific version of a workflow rather than always receiving the latest. Workflows also now support run-level parameters — metadata, dependencies, add_dependencies_to_context, and add_session_state_to_context — bringing them to parity with agents and teams for consistent configuration across all execution types.

Details:

  • Pass ?version=<version> to GET /workflows/{id} to retrieve a specific workflow version
  • metadata, dependencies, add_dependencies_to_context, and add_session_state_to_context are now available at the run level on workflows
  • Aligns the workflow runtime configuration surface with agents and teams
  • No breaking changes; existing workflow definitions and API calls are unaffected

AgentTools now includes ToolParallelAiSearch, a native integration with Vertex AI's Parallel AI Search that allows agents to issue multiple search queries concurrently and aggregate results. This brings Vertex AI search into the same parallel retrieval pattern as other search tools, reducing latency for knowledge-intensive tasks that benefit from broad, simultaneous retrieval.

Details:

  • ToolParallelAiSearch integrates directly with Vertex AI's native parallel search API
  • Enables concurrent multi-query search within a single tool call, reducing round-trip latency
  • Consistent with existing parallel search patterns in the toolkit; no special agent configuration required
  • Suitable for RAG workflows, research agents, and any use case requiring broad, fast retrieval from Vertex AI


View the cookbook.

The WhatsApp interface has been significantly extended in V2, adding support for rich media, interactive message types, teams, and workflows. Agents can now send and receive images, video, audio, and documents, and respond with structured interactive elements like reply buttons, list menus, location shares, and message reactions, moving beyond plain text into a full conversational interface.

Create an agent, expose it with the Whatsapp interface, and serve via AgentOS:

from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS
from agno.os.interfaces.whatsapp import Whatsapp

image_agent = Agent(
    model=OpenAIResponses(id="gpt-5.2"), # Ensure OPENAI_API_KEY is set
    tools=[OpenAITools(image_model="gpt-image-1")],
    markdown=True,
    add_history_to_context=True,
)

agent_os = AgentOS(
    agents=[image_agent],
    interfaces=[Whatsapp(agent=image_agent)],
)
app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="basic:app", port=8000, reload=True)

View the Whatsapp docs for more.

The new Telegram interface mounts webhook endpoints directly on AgentOS, turning any agent, team, or workflow into a fully functional Telegram bot. Inbound messages — text, photos, voice notes, audio, video, documents, stickers, and animations — are handled natively and passed to the agent as structured inputs. Responses stream back in real time with live message edits, throttled to stay within Telegram's rate limits, so users see output as it is generated rather than waiting for a complete reply.

Create an agent, expose it with the Telegram interface, and serve via AgentOS:

from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.google import Gemini
from agno.os.app import AgentOS
from agno.os.interfaces.telegram import Telegram

agent_db = SqliteDb(session_table="telegram_sessions", db_file="tmp/telegram_basic.db")

telegram_agent = Agent(
    name="Telegram Bot",
    model=Gemini(id="gemini-2.5-pro"),
    db=agent_db,
    instructions=[
        "You are a helpful assistant on Telegram.",
        "Keep responses concise and friendly.",
    ],
    add_history_to_context=True,
    num_history_runs=3,
    add_datetime_to_context=True,
    markdown=True,
)

agent_os = AgentOS(
    agents=[telegram_agent],
    interfaces=[Telegram(agent=telegram_agent)],
)
app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="basic:app", port=7777, reload=True)


See the Telegram docs for more.

The DoclingReader provides a single, unified interface for processing the full range of document formats an AI agent encounters — PDFs, Word files, PowerPoint decks, Excel spreadsheets, images, and even audio and video files — all through the same reader, without format-specific ingestion logic or a sprawling set of dependencies. Built on IBM Research's open-source Docling library, it preserves document structure (headings, tables, hierarchies, formulas, and layout) during extraction, so context is not lost in translation before content reaches your vector store.

Details:

  • Supports PDFs, .docx, .pptx, .xlsx, markup files, images (JPEG, PNG), and audio/video (MP4 and others via FFmpeg and Whisper)
  • Structure-preserving extraction keeps tables, headings, and hierarchies intact for higher-quality RAG retrieval
  • Outputs flow directly into Agno's chunking pipeline with no additional preprocessing required
  • Configurable output_format supports Markdown (default), plain text, JSON, HTML, DocTags, and VTT for audio/video transcripts
  • Load from local paths or directly from URLs with the same interface

See Docling Reader docs.

Production agent systems demand visibility. Agno now integrates with MLflow to deliver complete, end-to-end trace observability across every model call, tool invocation, and agent step—without custom instrumentation or additional configuration overhead.

With a single call to mlflow.agno.autolog() at startup, all agent activity is automatically captured and surfaced in the MLflow UI. This applies to both individual agents and full AgentOS deployments.

Details:

  • Full trace capture across model calls, tool use, and agent steps — out of the box
  • Works with self-hosted and managed MLflow servers (AWS, Azure, GCP)
  • Supports AgentOS applications with no additional setup beyond the single autolog call
  • Traces are OpenTelemetry-native, making them compatible with existing observability pipelines

View the MLflow docs for more.

LearningMode.PROPOSE now automatically enables chat history for the session, ensuring that the multi-turn confirmation flow — where the agent proposes a learned fact and waits for user approval — has full conversational context available across rounds. Previously, history was not retained between turns, causing the agent to lose track of pending proposals mid-confirmation.

Details:

  • Chat history is enabled automatically when LearningMode.PROPOSE is active; no manual configuration needed
  • Ensures proposed facts and user responses remain in context throughout the confirmation loop
  • Fully backward-compatible; no changes required for existing learning configurations

Updated the default base_url for the Siliconflow model provider from .com to .cn to match Siliconflow's actual API endpoint. Requests were previously routed to an incorrect domain, causing connection failures for users relying on the default configuration.

Details:

  • Corrects the default base_url to siliconflow.cn
  • Users who had already overridden base_url explicitly are unaffected
  • No other configuration changes required

Fixed a formatting issue where tool parameter descriptions were incorrectly prefixed with (None) when no type annotation was present. Parameter descriptions now render cleanly in all contexts — tool schemas, AgentOS views, and model prompts — without extraneous noise that could confuse the model or degrade tool call accuracy.

Details:

  • Removes the (None) prefix from parameter descriptions that lack explicit type annotations
  • Improves the quality and readability of generated tool schemas
  • No changes required; the fix applies automatically to all tools

Resolved a bug where add_history_to_context was not correctly applied during Human-in-the-Loop runs that involved multiple conversation rounds. Agents paused for human review and subsequently resumed now have access to the full conversation history in context, preventing gaps in reasoning across approval boundaries.

Details:

  • Fixes history injection for HITL workflows using add_history_to_context across multiple rounds
  • Ensures agents resuming after a pause have full conversational context available
  • No configuration changes required; the fix applies automatically to existing HITL setups

A new datetime_format parameter on Agent and Team lets you control exactly how the current datetime is presented in the agent's context using any valid strftime format string. This removes the need to manually inject formatted timestamps through instructions and ensures consistent datetime representation across different locales, regions, and output requirements.

Details:

  • Pass any strftimecompatible format string (e.g., "%Y-%m-%dT%H:%M:%S" for ISO-8601, "%Y-%m-%d" for date-only, or locale-specific patterns)
  • Applies wherever datetime context is injected, including add_datetime_to_context=True
  • Defaults to existing behavior when not set; no migration required

See cookbook.