Changelog
We fixed an issue where custom db table names set on components were being overwritten with defaults when those components were loaded back from configuration. Custom table names are now preserved correctly through the full save and load cycle.
GitHubConfig now accepts a repository override at the request level, allowing agents that work across multiple repositories to specify the target repo per call rather than being locked to a single repo at initialization time.
See cookbook
A new option lets you turn off file citations in Claude responses. This is useful when citations add noise to the output, for example in conversational flows, summarization tasks, or any context where surfacing source references per response is unwanted.
We fixed an issue where headers supplied by header_provider were not being applied during MCP session initialization, only during subsequent requests. Sessions now open with the correct headers from the start, preventing authentication and routing failures on first contact.
We fixed an issue where knowledge databases were not being built live during configuration API calls, causing agents to operate without their knowledge base until a separate build step was triggered. Knowledge databases are now constructed inline as part of the configuration flow.
We fixed an issue where events emitted by inner workflows could lose their identity or be misattributed when bubbling up through outer workflows. Events now carry a nested_depth field on agent and team events, and inner workflow event identity is preserved throughout, making it straightforward to trace exactly where in a nested pipeline any event originated.
We fixed an issue where a shared HTTP/2 client was being injected across all model providers, causing connection conflicts and transient failures under concurrent load. Each provider now maintains its own client, eliminating the source of these errors across all providers simultaneously.
We fixed an issue where cancellation of a client connection during streaming could surface as an unhandled error rather than being handled quietly. CancelledError is now caught explicitly in all router streaming generators, so cancelled connections close gracefully without producing noise in logs or error handlers.
We fixed an issue where JSON cleaning was stripping or corrupting code blocks embedded in string values before the parse was even attempted. The parser now tries a raw JSON parse first and only falls back to cleaning if that fails, preserving code blocks and other structured content in the output as intended.
We fixed an issue where parameters automatically injected by the framework, such as agent, team, and run_context, were appearing in user_input_schema, presenting users with fields they should never need to fill in. These parameters are now excluded, so only genuinely user facing fields appear in the schema.
We fixed an issue where the memory pipeline gate check did not account for extra_messages, causing memory summarization to be skipped in runs where additional context messages were provided alongside the main conversation. The gate now correctly evaluates the full message set, including extra_messages, before deciding whether to run the memory pipeline.
LLMsTxtTools and LLMsTxtReader add native support for the llms.txt standard — a Markdown-based file that websites publish at /llms.txt to provide LLMs with a concise, structured index of their documentation, free of navigation elements, JavaScript, and other noise that wastes context. Agents can now fetch, read, and work with llms.txt files directly, making it straightforward to build agents that are grounded in up-to-date third-party documentation without manual content pipelines.
Details:
- LLMsTxtReader ingests any llms.txt file into a knowledge base for retrieval and RAG
- LLMsTxtTools lets agents fetch and query llms.txt indexes directly as a tool call
- Compatible with any site publishing the standard, including https://docs.agno.com/llms.txt
- No preprocessing required — llms.txt files are already structured for LLM consumption
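As a rough sketch, an agent grounded in a published llms.txt index might look like the following (the import path and the url parameter are assumptions for illustration; the cookbook has the canonical setup):
from agno.agent import Agent
from agno.tools.llms_txt import LLMsTxtTools  # import path assumed for illustration
# Point the agent at any published llms.txt index (url parameter assumed)
agent = Agent(tools=[LLMsTxtTools(url="https://docs.agno.com/llms.txt")], markdown=True)
agent.print_response("What topics does the Agno documentation cover?")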
See cookbook for reference
SalesforceTools gives agents native access to Salesforce CRM data, making it straightforward to build agents that query records, surface pipeline information, triage support cases, or answer questions about account state — without custom API wrappers or manual data exports.
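A minimal sketch of the setup (the import path is an assumption; see the docs linked below for credential configuration):
from agno.agent import Agent
from agno.tools.salesforce import SalesforceTools  # import path assumed for illustration
# Credentials are typically read from environment variables or constructor arguments
agent = Agent(tools=[SalesforceTools()], markdown=True)
agent.print_response("Summarize the five largest open opportunities in the pipeline.")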
View the Salesforce docs to learn more.
We fixed an issue where knowledge_table was being read from agent.db instead of contents_db, causing knowledge lookups to fail or return incorrect results when the two databases were configured separately. Knowledge retrieval now correctly targets the intended storage backend.
We fixed two issues in the AG-UI interface: reasoning events are now correctly emitted as they occur so users can follow the model's thinking in real time, and input_content now stores the current user input rather than the full message history, ensuring the correct value is surfaced per turn.
We fixed an issue where workflow steps that included file path images were not being converted correctly, causing those images to be dropped or mishandled when passed between steps. File path images now flow through step conversion as intended.
We fixed handling of response.reasoning_summary_text.delta events in OpenAIResponses so that reasoning content is streamed incrementally as it is generated rather than being dropped or buffered. Users now see the model's reasoning surface in real time alongside the response.
We fixed TeamSession.from_dict() so it no longer mutates the input mapping it receives. Previously, loading a team session from a dictionary could silently modify the original data structure, causing hard-to-trace state issues in workflows that reused or inspected the source mapping after loading.
A new Azure AI Foundry Claude model provider gives teams a first-class way to run Claude models through Microsoft's Azure AI infrastructure, with the same configuration patterns used across other Agno model providers. This is particularly useful for organizations that require Azure-hosted deployments for compliance, data residency, or enterprise procurement reasons.
View the Azure AI Foundry Claude docs to learn more.
OpenAIResponses now supports background mode for the OpenAI Responses API, allowing long-running agent tasks to execute asynchronously without holding an open connection. This is useful for tasks that exceed typical request timeouts or that need to be dispatched and polled rather than streamed directly.
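A sketch of what enabling background mode might look like (the flag name below is illustrative, not confirmed; consult the OpenAIResponses reference for the exact option):
from agno.agent import Agent
from agno.models.openai import OpenAIResponses
# background=True is an assumed flag name for illustration only
agent = Agent(model=OpenAIResponses(id="gpt-4o", background=True))
agent.print_response("Compile a long-form research report on battery chemistry.")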
Workflows can now pause after a step completes and wait for a human to inspect the output before it flows to the next step. Configured via HumanReview(requires_output_review=True) on a Step, Router, or Loop, the run pauses with the full step output available in req.step_output. Reviewers can approve, reject with optional feedback to trigger a retry, or edit the output directly — giving teams a structured, auditable post-execution review gate at any point in a pipeline without custom orchestration code.
Details:
- requires_output_review accepts a bool or a callable predicate that receives the StepOutput at runtime — enabling conditional review (e.g., only pause for outputs over 200 characters, or outputs containing sensitive keywords)
- Four reviewer actions: confirm() to approve as-is, reject() to reject, reject(feedback="...") to pass correction instructions back to the agent on retry, and edit("new output") to accept with inline modifications
- on_reject controls rejection behavior: skip, cancel, retry, or else_branch; when on_reject=OnReject.retry, the step re-executes with feedback injected into the agent's next message
- max_retries (default 3) caps the number of retry attempts before the step is treated as a final rejection
- Supported on Step, Router, and Loop (via requires_iteration_review on HumanReview for per-iteration review in loops)
- Flat parameter requires_output_review=True on Step is still supported for backward compatibility
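A minimal sketch using the flat compatibility form from the last bullet above (the agno.workflow import path is an assumption; attaching a full HumanReview config adds on_reject and max_retries control):
from agno.agent import Agent
from agno.workflow import Step, Workflow  # import path assumed for illustration
writer = Agent(name="Writer", instructions=["Draft a short report."])
# Pause after this step completes and wait for a reviewer
workflow = Workflow(
    name="report_pipeline",
    steps=[Step(name="draft_report", agent=writer, requires_output_review=True)],
)
# On pause, the reviewer inspects req.step_output and calls req.confirm(),
# req.reject(feedback="..."), or req.edit("new output")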
See the Output Review docs for more.
A Workflow can now be used directly as a step inside another workflow, enabling modular composition of reusable sub-pipelines. The inner workflow runs as a single step in the outer workflow, with its output chained to the next step via the standard StepInput/StepOutput interface. Complex orchestrations can be broken into smaller, independently testable units and assembled without duplicating logic — the same sub-workflow can be reused across multiple parent workflows.
Details:
- Pass a Workflow instance to a Step via Step(name="...", workflow=inner_workflow), or use the shorthand auto-wrap by placing the workflow directly in the steps list (uses the workflow's name as the step name)
- Inner workflows support the full set of primitives — Condition, Loop, Router, Parallel, agents, and custom executors — mixed in any combination
- Session state is deep-copied into the inner workflow before execution and merged back into the outer workflow after, keeping state consistent across levels
- Workflows can be nested multiple levels deep; streaming events bubble up with a nested_depth field so outer and inner events can be distinguished by depth, workflow_id, and workflow_name
- Enables modular workflow design: build reusable research, processing, or review sub-pipelines once and compose them into larger orchestrations
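For example, a reusable research sub-pipeline composed into a larger workflow might look like this sketch (the agno.workflow import path is an assumption):
from agno.agent import Agent
from agno.workflow import Step, Workflow  # import path assumed for illustration
research = Workflow(
    name="research",
    steps=[Agent(name="Researcher", instructions=["Gather sources."])],
)
# The inner workflow runs as a single step; its output feeds the next step
outer = Workflow(
    name="publish_pipeline",
    steps=[
        Step(name="research", workflow=research),  # or place research directly in steps
        Agent(name="Editor", instructions=["Polish the draft."]),
    ],
)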
See the Nested Workflow docs to learn more.
Skills—reusable, instruction-based capability modules—can now be attached to Teams directly via the skills parameter, giving the team leader access to domain expertise without delegating to a member agent. The leader receives skill summaries injected into its system prompt and three skill tools (get_skill_instructions, get_skill_reference, get_skill_script) that let it discover and use skills on demand during a run.
Details:
- Attach skills to a Team via skills=Skills(loaders=[LocalSkills(...)]), using any SkillLoader such as LocalSkills
- Skills are surfaced to the team leader only — member agents retain their own independent skill configurations
- Use team-level skills when the leader needs domain expertise to coordinate (e.g., review standards, routing rules); attach skills to individual member agents when specialists need expertise to execute their own work; both can coexist
- Skills follow the same pattern as knowledge, memory, and tools: get_tools() adds skill tools to the leader's tool list and get_system_prompt_snippet() injects skill metadata into the leader's system prompt
- Shared skill directories can be reused across agents and teams without duplication
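A sketch of the attachment pattern (import paths and the LocalSkills argument are assumptions; the docs below have the exact form):
from agno.agent import Agent
from agno.skills import LocalSkills, Skills  # import paths assumed for illustration
from agno.team import Team
team = Team(
    members=[Agent(name="Analyst")],
    # Surfaced to the team leader only; members keep their own skill configs
    skills=Skills(loaders=[LocalSkills("./skills")]),  # loader argument assumed
)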
See Team Skills docs for reference
A new AGNO_LOG_TRACEBACKS environment variable (opt-in) enables full Python tracebacks in log_error and log_warning calls. By default, tracebacks are suppressed to keep logs clean in production; setting this variable surfaces the complete stack trace for faster local debugging and error diagnosis.
Details:
- Set AGNO_LOG_TRACEBACKS=true to enable full traceback output in log_error and log_warning
- Off by default; no change in behavior for existing deployments
- Useful for development environments and debugging sessions where full stack context is needed
SessionSummaryManager now exposes last_n_runs and conversation_limit parameters, giving precise control over how much of the conversation history is fed into summary generation. Teams running long sessions or high-frequency agents can use these to keep summaries focused and cost-efficient by limiting the input window rather than always summarizing the full history.
Details:
- last_n_runs limits summary generation to the most recent N runs in the session
- conversation_limit caps the number of conversation turns included in the summary input
- Both parameters work independently and can be combined
- No changes required for existing SessionSummaryManager configurations; defaults preserve current behavior
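A minimal sketch (the import path is an assumption; the cookbook below shows how to wire the manager into an agent):
from agno.memory import SessionSummaryManager  # import path assumed for illustration
# Summarize only the five most recent runs, capped at 20 conversation turns
summary_manager = SessionSummaryManager(
    last_n_runs=5,
    conversation_limit=20,
)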
See cookbook for reference.
Resolved an issue where a shared HTTP/2 client was being injected across concurrent OpenAI and Azure OpenAI requests, causing transient 400 errors under load. Each request now uses its own client, eliminating the conflict.
audio_total_tokens is now correctly computed and included in run metrics for OpenAI, Perplexity, and LiteLLM. Audio token usage is now visible alongside text tokens for accurate cost tracking and monitoring.
Resolved a bug where TeamSession.get_messages could return the same message more than once, causing downstream logic that relies on message history to process duplicates.
Resolved a crash in GitHubTools where get_pull_requests would raise an IndexError if the repository contained fewer pull requests than the specified limit. The tool now returns however many PRs are available.
Resolved an unhandled DisambiguationError that caused WikipediaTools to crash when a search term matched multiple Wikipedia articles. A new auto_suggest parameter also lets you control whether Wikipedia's suggestion engine is applied to queries.
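For example, disabling the suggestion engine so exact article titles are not rewritten:
from agno.agent import Agent
from agno.tools.wikipedia import WikipediaTools
# auto_suggest controls whether Wikipedia's suggestion engine rewrites queries
agent = Agent(tools=[WikipediaTools(auto_suggest=False)])
agent.print_response("Summarize the article 'Mercury (planet)'.")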
Resolved an issue where .msg, .xlsx, and .xls files were not recognized on upload due to missing MIME type mappings. These file types now upload correctly without requiring manual workarounds.
Agents and teams can now be configured with fallback models that activate automatically when the primary model fails, whether from rate limits, outages, context window overflows, or other retryable errors. Fallbacks are tried in order after the primary model’s retry loop is fully exhausted, and each fallback model runs its own independent retry cycle before the next one is attempted. Both simple lists and error-specific routing are supported, giving teams full control over how failures are handled.
Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
agent = Agent(
model=OpenAIChat(id="gpt-4o"),
fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)
If gpt-4o fails after exhausting its own retries, Claude is tried automatically.
Model strings work too:
from agno.agent import Agent
agent = Agent(
model="openai:gpt-4o",
fallback_models=["anthropic:claude-sonnet-4-20250514"],
)
See Fallback Models docs for more.
AzureBlobConfig now supports Shared Access Signature (SAS) token authentication as an alternative to connection strings and service principal credentials. This makes it easier to grant time-scoped, permission-limited access to Azure Blob Storage without exposing full account credentials, which is useful for automated pipelines, temporary access grants, and least-privilege storage configurations.
# Clone and setup repo
git clone https://github.com/agno-agi/agno.git
cd agno/cookbook/07_knowledge/cloud
# Create and activate virtual environment
./scripts/demo_setup.sh
source .venvs/demo/bin/activate
# Optional: Run PgVector (needs Docker)
./cookbook/scripts/run_pgvector.sh
python azure_blob.py
Details:
- Pass a SAS token directly to AzureBlobConfig alongside the account URL
- Complements existing authentication paths; no migration required for configurations already using connection strings or service principal auth
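A sketch of the SAS path (the import path and parameter names are assumptions; the docs below have the exact names):
from agno.knowledge.remote_content import AzureBlobConfig  # import path assumed
config = AzureBlobConfig(
    account_url="https://myaccount.blob.core.windows.net",  # parameter names assumed
    sas_token="sv=2024-01-01&ss=b&sig=...",  # time-scoped, permission-limited token
    container_name="documents",
)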
See the Azure Blob Storage Content Source for Knowledge docs for more.
SlackTools now includes a workspace search tool, letting agents query messages, files, and content across channels directly from a tool call. This makes it straightforward to build agents that surface relevant Slack history as part of a broader workflow, without requiring manual channel navigation or separate integrations.
Details:
- New search_workspace tool queries Slack's search API and returns matching messages and files
- Works alongside existing SlackTools capabilities for reading channels, posting messages, and managing threads
- Requires a Slack token with the appropriate search:read scope
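For example, a search-capable Slack agent can be as small as this sketch (token configuration is environment-specific; see the docs below):
from agno.agent import Agent
from agno.tools.slack import SlackTools
# Requires a Slack token with the search:read scope
agent = Agent(tools=[SlackTools()], markdown=True)
agent.print_response("Find recent messages about the Q3 launch and summarize them.")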
View the Slack Tools docs for more.
Claude 4.6 and later models do not support assistant message prefill, which previously caused silent failures or malformed requests when conversations ended with an assistant turn. Agno now automatically injects a trailing user message in these cases, with centralized detection logic shared across all Claude deployment paths, including Anthropic, AWS Bedrock, Vertex AI, and LiteLLM, so the fix applies consistently regardless of how Claude is served.
Details:
- Trailing user message injection is applied automatically when the last message in a conversation is an assistant message and the model does not support prefill
- Prefill support detection is centralized and version-aware, covering Claude 4.6+ across Anthropic, Bedrock, Vertex AI, and LiteLLM providers
- No configuration changes required; existing agents and teams using Claude are unaffected
ReliabilityEval has been extended with more precise evaluation capabilities: expected tool calls can now be matched as a subset of actual calls rather than requiring an exact full match, argument values are validated against expected parameters, and missing tool calls are explicitly tracked and surfaced in results. Multi-round tool call collection has also been fixed so all rounds are gathered correctly, a mutation bug that modified the original RunOutput.messages in place has been resolved, and arun() no longer uses the wrong ID when saving evaluation files.
The /sessions list endpoint now includes a significantly expanded set of fields per session, giving dashboards, monitoring tools, and integrations a more complete picture of each session without requiring separate follow-up requests.
Details:
- Additional fields returned per session: user_id, agent_id, team_id, workflow_id, session_summary, metrics, total_tokens, and metadata
- No changes required to existing integrations; new fields are additive
- Enables richer session filtering, reporting, and analytics directly from the list response
A new /info API endpoint returns a lightweight count of agents, teams, and workflows registered in the AgentOS instance. The endpoint is intentionally unauthenticated, making it suitable as a health or readiness signal for infrastructure tooling, status pages, and deployment pipelines that need a fast, low-cost way to verify instance state.
Details:
- Returns agent, team, and workflow counts for the current AgentOS instance
- Unauthenticated by design — no credentials required for lightweight infrastructure checks
- Useful for readiness probes, status dashboards, and deployment verification scripts
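A readiness probe against the endpoint can be as small as this sketch (the port and the exact response shape are illustrative):
import httpx
# Unauthenticated check against a running AgentOS instance
info = httpx.get("http://localhost:7777/info").json()
print(info)  # e.g. counts of agents, teams, and workflows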
We’ve made ChromaDB operations more reliable by automatically splitting large upsert and query requests into smaller batches at runtime. This prevents failures that used to happen when requests exceeded ChromaDB’s per-request limits.
You can continue calling upsert and query operations the same way as before. The system now handles batching behind the scenes, so large payloads process smoothly without extra work.
import asyncio
from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.chroma import ChromaDb
# Create Knowledge Instance with ChromaDB
knowledge = Knowledge(
name="Basic SDK Knowledge Base",
description="Agno 2.0 Knowledge Implementation with ChromaDB",
vector_db=ChromaDb(
collection="vectors", path="tmp/chromadb", persistent_client=True
),
)
asyncio.run(
knowledge.ainsert(
name="Recipes",
url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
metadata={"doc_type": "recipe_book"},
)
)
# Create and use the agent
agent = Agent(knowledge=knowledge)
agent.print_response("List down the ingredients to make Massaman Gai", markdown=True)
# Delete operations examples
vector_db = knowledge.vector_db
vector_db.delete_by_name("Recipes")
# or
vector_db.delete_by_metadata({"user_tag": "Recipes from website"})
View the ChromaDB Vector Database docs for more.
Reader classes now correctly propagate the chunk_size parameter to the default chunking strategy they apply when no explicit chunking configuration is provided. Previously, chunk_size set on a reader was silently ignored when falling back to the default chunker, producing chunks of unexpected size.
Details:
- Fixes chunk_size being ignored in default chunking strategies used by reader classes
- Ensures consistent chunk sizing across both explicit and default chunking configurations
- No changes required; the fix applies automatically to all reader classes
Two improvements have been made to the Slack interface to give teams better visibility and more robust handling of long agent responses. A new show_member_tool_calls parameter controls whether tool calls from team members are shown inline in the Slack thread, and automatic card overflow rotation ensures that responses exceeding Slack's message size limit are continued in a new message rather than being truncated or failing silently.
SchedulerTools gives agents programmatic control over the AgentOS Scheduler, allowing them to create, list, update, enable, disable, trigger, and delete cron schedules as part of a run. This makes it possible to build agents that autonomously manage their own recurring tasks, such as scheduling a report, adjusting a polling interval, or cleaning up stale jobs, without requiring a separate orchestration layer.
Details:
- Full schedule lifecycle management: create, list, update, enable, disable, trigger, and delete operations available as agent tools
- Schedules target AgentOS endpoints (e.g., /agents/reporter/runs) with configurable cron expressions, timezones, payloads, retry counts, and timeouts
- Run history is stored per schedule, giving agents visibility into past execution status, timings, and errors
- Requires agno[scheduler] and an AgentOS instance with scheduler=True enabled
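A minimal sketch (the import path is an assumption; the docs below have the canonical setup):
from agno.agent import Agent
from agno.tools.scheduler import SchedulerTools  # import path assumed for illustration
# Requires agno[scheduler] and an AgentOS started with scheduler=True
agent = Agent(tools=[SchedulerTools()], markdown=True)
agent.print_response("Create a schedule that runs /agents/reporter/runs every weekday at 09:00 UTC.")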
Learn more in the SchedulerTools docs.
Resolved an unhandled msg_too_long error in the Slack streaming path that caused the agent to fail silently or crash when a streamed response exceeded Slack's message length limit. Long responses are now handled gracefully rather than surfacing an error to the user.
Resolved a collection of bugs affecting agents deployed with Coda, including issues in CodingTools, Slack interface behavior, team streaming output, and the learning pipeline. These fixes restore correct end-to-end behavior for Coda-integrated agents across all affected surfaces.
Resolved an issue where server-side tool blocks in Claude conversations were not being preserved when building subsequent request messages. This caused Claude to lose track of tool interactions mid-conversation, breaking multi-turn flows that relied on server tool results being visible in history.
DoclingTools gives agents the ability to convert documents on demand using the Docling library — accepting PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and video files as input and exporting to Markdown, plain text, HTML, JSON, YAML, DocTags, and VTT. Each output format is a separately togglable tool, so agents only expose the conversions they actually need. Advanced PDF handling is also available, with configurable OCR engines, language settings, table structure recognition, picture classification, and per-document timeouts for scanned or complex documents.
Example: The following agent converts a PDF to Markdown
from agno.agent import Agent
from agno.tools.docling import DoclingTools
agent = Agent(
tools=[DoclingTools(all=True)],
description="You are an agent that converts documents from all Docling parsers and exports to all supported output formats.",
)
agent.print_response(
"Convert to Markdown: cookbook/07_knowledge/testing_resources/cv_1.pdf",
markdown=True,
)
See the DoclingTools docs for more.
We’ve introduced GoogleSlidesTools to give agents full control over Google Slides. With it, you can create presentations, build out slides, and manage content end to end, all directly from your agent.
Agents can add and reorder slides, insert text boxes, tables, images, and videos, and read existing slide content to stay context-aware. Whether you are building decks from scratch or modifying existing ones, everything happens programmatically in a single workflow.
We support both OAuth and service account authentication, so you can use the toolkit in interactive setups or deploy it in server-side, multi-user environments.
from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.google.slides import GoogleSlidesTools
agent = Agent(
model=Gemini(id="gemini-2.0-flash"),
tools=[
GoogleSlidesTools(
oauth_port=8080,
)
],
instructions=[
"You are a Google Slides assistant that helps users create and manage presentations.",
"Always call get_presentation_metadata before modifying slides to get current slide IDs.",
"Use slide_id values returned by the API -- never guess them.",
"Return the presentation ID and URL after creating a presentation.",
],
add_datetime_to_context=True,
markdown=True,
)
agent.print_response(
"Create a new Google Slides presentation titled 'Quarterly Business Review'. "
"Then add the following slides: "
"1. A TITLE slide with title 'Q3 2025 Business Review' and subtitle 'Prepared by the Strategy Team'. "
"2. A TITLE_AND_BODY slide with title 'Agenda' and body listing: Revenue Overview, Key Metrics, Product Roadmap, Q4 Goals.",
stream=True,
)
See the Google Slides docs for more.
Tool call schemas are now normalized across model providers, so switching an agent from one model to another no longer requires adjusting how tools are defined or how their outputs are parsed. This removes a common source of friction when benchmarking models, migrating providers, or running the same agent across multiple backends.
Details:
- Tool call inputs and outputs are translated into a consistent internal format regardless of the originating model provider
- Eliminates provider-specific edge cases in tool schema generation and response parsing
- Enables drop-in model swapping without changes to tool definitions or agent logic
See the docs for more.
A new PerplexitySearch toolkit gives agents access to the Perplexity Search API, returning ranked web results with titles, URLs, snippets, and publication dates in a single tool call. Built-in filtering by recency and domain makes it straightforward to build agents that need up-to-date, source-controlled retrieval without additional post-processing.
Check out this example of basic search:
from agno.agent import Agent
from agno.tools.perplexity import PerplexitySearch
agent = Agent(tools=[PerplexitySearch()], markdown=True)
agent.print_response("What are the latest developments in AI?")
Details:
- search and asearch (async) functions return a JSON array of results with URL, title, snippet, and date per result
- search_recency_filter restricts results to content from the past day, week, month, or year
- search_domain_filter limits results to a specific list of domains (e.g., reuters.com, bloomberg.com)
- search_language_filter accepts ISO language codes for language-scoped retrieval
- max_results (default 5) and max_tokens_per_page (default 2048) give fine-grained control over result volume and content length
- Requires a PERPLEXITY_API_KEY environment variable; no other configuration needed
See the Perplexity docs for reference.
AgenticChunking now accepts a custom_prompt parameter, letting you override the default model-driven chunking instructions with domain-specific logic. Rather than relying solely on the built-in heuristics for finding semantic breakpoints, you can now describe exactly how the model should segment your documents — for example, splitting at major section boundaries, preserving clause integrity, or separating structured metadata from body content — making it straightforward to tune retrieval quality for specialized corpora.
Details:
- Pass any string to custom_prompt to override the default chunking behavior; custom prompts are prioritized over built-in instructions
- The default output format constraints are still enforced automatically — custom_prompt only needs to describe the chunking logic itself
- Always pair custom_prompt with max_chunk_size to bound output length; the default max_chunk_size is 5000 characters
- The model parameter accepts any Agno-compatible model, allowing you to route chunking to a smaller or cheaper model independently of your agent
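A short sketch of a domain-specific chunker (the import path is an assumption; the docs below show the exact configuration surface):
from agno.knowledge.chunking.agentic import AgenticChunking  # import path assumed
# Domain-specific segmentation rules replace the built-in heuristics
chunking = AgenticChunking(
    custom_prompt=(
        "Split at major section boundaries and never inside a numbered clause."
    ),
    max_chunk_size=3000,  # always bound output length alongside a custom prompt
)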
See the Custom Prompts docs for more.
Resolved an issue where LanceDB's search() could return the same document multiple times when hybrid search retrieved it via both vector similarity and full-text search. Results are now deduplicated before being returned, ensuring each document appears only once regardless of which search path surfaced it.
Details:
- Fixes duplicate results in hybrid search caused by the same document matching both the vector and FTS indices
- Deduplication is applied automatically; no configuration changes required
- Improves result quality and reduces noise for agents and workflows using LanceDB hybrid search
The Seltz toolkit has been updated to align with the breaking changes introduced in the Seltz SDK 0.2.0 release, replacing the previous 0.1.x integration. Teams using the Seltz toolkit should update their Seltz SDK dependency to 0.2.0 alongside this release.
Details:
- Updates the Seltz toolkit integration from SDK 0.1.x to 0.2.0
- Ensures compatibility with the latest Seltz SDK API surface
- Upgrade the seltz package to 0.2.0 to avoid integration errors
We resolved an issue where tools from async toolkits were not included in the tool name list injected into the team system message, leaving the team unaware of those tools at the prompt level.
We resolved an additional case where hybrid search could surface the same document more than once when it matched across multiple search indices.
We fixed output_config not being applied correctly on Claude model wrappers, $defs being stripped from tool schemas, and file_ids and container information not being surfaced during streaming for skills.
We resolved a bug where streamed tool call data was overwriting accumulated state instead of appending to it, causing incomplete or incorrect tool calls to be dispatched.
We resolved an issue where empty string values in streamed LiteLLM responses could overwrite previously accumulated tool names, resulting in tool calls with missing identifiers.
We added an early error when AWS_BEDROCK_API_KEY is set for Claude models on AWS Bedrock, which is not a supported authentication path, rather than failing silently later in the request lifecycle.
We overrode deepcopy behavior on the Azure OpenAI model class to preserve live client references, preventing connection failures that occurred when the model object was copied during agent or team setup.
We resolved an issue where empty reasoning blocks returned by OpenRouter for non-reasoning models were being processed unnecessarily, causing noise in parsed responses.
We resolved a failure in cache key generation when the input contained types that are not directly JSON-serializable, ensuring caching works reliably across a broader range of agent inputs.
Resolved an incorrect import of the pymongo async modules that could cause runtime failures when using MongoDB with async agents or workflows. The import now correctly references the async-compatible pymongo interfaces.
Details:
- Fixes a broken import path for pymongo async modules in the MongoDB database backend
- Resolves runtime errors encountered when running async agents or workflows with MongoDB storage
- No configuration changes required; upgrading applies the fix automatically
Resolved a bug in parse_tool_calls where shared dictionary references across parsed tool calls would cause the same tool to be executed multiple times during streaming. Each tool call is now constructed from an independent copy, eliminating the duplication.
Details:
- Fixes duplicate tool executions that occurred in streaming mode when multiple tool calls were parsed in the same pass
- Caused by a mutable shared dict reference being reused across tool call objects in parse_tool_calls
- No configuration changes required; the fix applies automatically to all streaming tool call workflows
Resolved an issue where structured output support was not correctly detected for certain Claude models, causing agents to fall back to less reliable output parsing strategies even when the model fully supports structured output. Affected models now use the correct path automatically.
Details:
- Fixes structured output capability detection across supported Claude model variants
- Improves reliability and consistency of structured output for agents using response schemas
- No configuration changes required; the fix applies automatically
Resolved a race condition in MCPTools where parallel tool calls using a header_provider would each independently spin up their own MCP session instead of sharing one, leaving the agent in a stuck state. Session creation is now correctly coordinated so that concurrent tool calls share a single session as intended.
Details:
- Fixes duplicate session creation when multiple MCP tool calls execute in parallel with header_provider configured
- Eliminates the agent hang caused by conflicting concurrent sessions
- No configuration changes required; the fix applies automatically to all MCPTools setups using header_provider
The Gemini model class now accepts a timeout parameter, giving teams explicit control over how long a request is allowed to run before being cancelled. This is particularly useful for production deployments where unbounded request durations can affect reliability and resource utilization.
Details:
- Set timeout (in seconds) directly on the Gemini model instance
- Applies to all request types made through the Gemini model class
- Falls back to the existing default behavior when not set; no migration required
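For example:
from agno.agent import Agent
from agno.models.google import Gemini
# Cancel any request still running after 30 seconds
agent = Agent(model=Gemini(id="gemini-2.5-pro", timeout=30))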
See the reference docs for more.
The Mistral model provider now supports the mistralai v2 SDK while continuing to work with v1. Teams can upgrade their SDK dependency and take advantage of v2 improvements without any changes to their agent or model configuration.
Details:
- Full support for the mistralai v2 SDK alongside continued v1 compatibility
- No migration required; existing configurations work without modification
- Enables access to v2 SDK features and performance improvements for teams ready to upgrade
The GET /workflows/{id} endpoint now accepts a version query parameter, allowing callers to fetch a specific version of a workflow rather than always receiving the latest. Workflows also now support run-level parameters — metadata, dependencies, add_dependencies_to_context, and add_session_state_to_context — bringing them to parity with agents and teams for consistent configuration across all execution types.
Details:
- Pass ?version=<version> to GET /workflows/{id} to retrieve a specific workflow version
- metadata, dependencies, add_dependencies_to_context, and add_session_state_to_context are now available at the run level on workflows
- Aligns the workflow runtime configuration surface with agents and teams
- No breaking changes; existing workflow definitions and API calls are unaffected
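For example, fetching a pinned workflow version (the workflow id and port below are illustrative):
import httpx
# Fetch version 2 of a workflow instead of the latest
resp = httpx.get(
    "http://localhost:7777/workflows/research-workflow",
    params={"version": 2},
)
print(resp.json())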
AgentTools now includes ToolParallelAiSearch, a native integration with Vertex AI's Parallel AI Search that allows agents to issue multiple search queries concurrently and aggregate results. This brings Vertex AI search into the same parallel retrieval pattern as other search tools, reducing latency for knowledge-intensive tasks that benefit from broad, simultaneous retrieval.
Details:
- ToolParallelAiSearch integrates directly with Vertex AI's native parallel search API
- Enables concurrent multi-query search within a single tool call, reducing round-trip latency
- Consistent with existing parallel search patterns in the toolkit; no special agent configuration required
- Suitable for RAG workflows, research agents, and any use case requiring broad, fast retrieval from Vertex AI
View the cookbook.
The WhatsApp interface has been significantly extended in V2, adding support for rich media, interactive message types, teams, and workflows. Agents can now send and receive images, video, audio, and documents, and respond with structured interactive elements like reply buttons, list menus, location shares, and message reactions, moving beyond plain text into a full conversational interface.
Create an agent, expose it with the WhatsApp interface, and serve via AgentOS:
from agno.agent import Agent
from agno.models.openai import OpenAIResponses
from agno.os import AgentOS
from agno.os.interfaces.whatsapp import Whatsapp
from agno.tools.openai import OpenAITools
image_agent = Agent(
model=OpenAIResponses(id="gpt-5.2"), # Ensure OPENAI_API_KEY is set
tools=[OpenAITools(image_model="gpt-image-1")],
markdown=True,
add_history_to_context=True,
)
agent_os = AgentOS(
agents=[image_agent],
interfaces=[Whatsapp(agent=image_agent)],
)
app = agent_os.get_app()
if __name__ == "__main__":
agent_os.serve(app="basic:app", port=8000, reload=True)
View the WhatsApp docs for more.
The new Telegram interface mounts webhook endpoints directly on AgentOS, turning any agent, team, or workflow into a fully functional Telegram bot. Inbound messages — text, photos, voice notes, audio, video, documents, stickers, and animations — are handled natively and passed to the agent as structured inputs. Responses stream back in real time with live message edits, throttled to stay within Telegram's rate limits, so users see output as it is generated rather than waiting for a complete reply.
Create an agent, expose it with the Telegram interface, and serve via AgentOS:
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.google import Gemini
from agno.os.app import AgentOS
from agno.os.interfaces.telegram import Telegram
agent_db = SqliteDb(session_table="telegram_sessions", db_file="tmp/telegram_basic.db")
telegram_agent = Agent(
name="Telegram Bot",
model=Gemini(id="gemini-2.5-pro"),
db=agent_db,
instructions=[
"You are a helpful assistant on Telegram.",
"Keep responses concise and friendly.",
],
add_history_to_context=True,
num_history_runs=3,
add_datetime_to_context=True,
markdown=True,
)
agent_os = AgentOS(
agents=[telegram_agent],
interfaces=[Telegram(agent=telegram_agent)],
)
app = agent_os.get_app()
if __name__ == "__main__":
agent_os.serve(app="basic:app", port=7777, reload=True)
See the Telegram docs for more.
The DoclingReader provides a single, unified interface for processing the full range of document formats an AI agent encounters — PDFs, Word files, PowerPoint decks, Excel spreadsheets, images, and even audio and video files — all through the same reader, without format-specific ingestion logic or a sprawling set of dependencies. Built on IBM Research's open-source Docling library, it preserves document structure (headings, tables, hierarchies, formulas, and layout) during extraction, so context is not lost in translation before content reaches your vector store.
Details:
- Supports PDFs, .docx, .pptx, .xlsx, markup files, images (JPEG, PNG), and audio/video (MP4 and others via FFmpeg and Whisper)
- Structure-preserving extraction keeps tables, headings, and hierarchies intact for higher-quality RAG retrieval
- Outputs flow directly into Agno's chunking pipeline with no additional preprocessing required
- Configurable output_format supports Markdown (default), plain text, JSON, HTML, DocTags, and VTT for audio/video transcripts
- Load from local paths or directly from URLs with the same interface
Production agent systems demand visibility. Agno now integrates with MLflow to deliver complete, end-to-end trace observability across every model call, tool invocation, and agent step—without custom instrumentation or additional configuration overhead.
With a single call to mlflow.agno.autolog() at startup, all agent activity is automatically captured and surfaced in the MLflow UI. This applies to both individual agents and full AgentOS deployments.
Details:
- Full trace capture across model calls, tool use, and agent steps — out of the box
- Works with self-hosted and managed MLflow servers (AWS, Azure, GCP)
- Supports AgentOS applications with no additional setup beyond the single autolog call
- Traces are OpenTelemetry-native, making them compatible with existing observability pipelines
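The setup is a single call at startup (the agent below is illustrative; a running MLflow tracking server is assumed):
import mlflow
from agno.agent import Agent
mlflow.agno.autolog()  # one call at startup enables full trace capture
agent = Agent(name="Traced Agent", markdown=True)
agent.print_response("What is MLflow tracing?")  # the run appears as a trace in the MLflow UI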
View the MLflow docs for more.
LearningMode.PROPOSE now automatically enables chat history for the session, ensuring that the multi-turn confirmation flow — where the agent proposes a learned fact and waits for user approval — has full conversational context available across rounds. Previously, history was not retained between turns, causing the agent to lose track of pending proposals mid-confirmation.
Details:
- Chat history is enabled automatically when LearningMode.PROPOSE is active; no manual configuration needed
- Ensures proposed facts and user responses remain in context throughout the confirmation loop
- Fully backward-compatible; no changes required for existing learning configurations
Updated the default base_url for the Siliconflow model provider from .com to .cn to match Siliconflow's actual API endpoint. Requests were previously routed to an incorrect domain, causing connection failures for users relying on the default configuration.
Details:
- Corrects the default base_url to siliconflow.cn
- Users who had already overridden base_url explicitly are unaffected
- No other configuration changes required
Fixed a formatting issue where tool parameter descriptions were incorrectly prefixed with (None) when no type annotation was present. Parameter descriptions now render cleanly in all contexts — tool schemas, AgentOS views, and model prompts — without extraneous noise that could confuse the model or degrade tool call accuracy.
Details:
- Removes the (None) prefix from parameter descriptions that lack explicit type annotations
- Improves the quality and readability of generated tool schemas
- No changes required; the fix applies automatically to all tools
Resolved a bug where add_history_to_context was not correctly applied during Human-in-the-Loop runs that involved multiple conversation rounds. Agents paused for human review and subsequently resumed now have access to the full conversation history in context, preventing gaps in reasoning across approval boundaries.
Details:
- Fixes history injection for HITL workflows using add_history_to_context across multiple rounds
- Ensures agents resuming after a pause have full conversational context available
- No configuration changes required; the fix applies automatically to existing HITL setups
A new datetime_format parameter on Agent and Team lets you control exactly how the current datetime is presented in the agent's context using any valid strftime format string. This removes the need to manually inject formatted timestamps through instructions and ensures consistent datetime representation across different locales, regions, and output requirements.
Details:
- Pass any strftime-compatible format string (e.g., "%Y-%m-%dT%H:%M:%S" for ISO-8601, "%Y-%m-%d" for date-only, or locale-specific patterns)
- Applies wherever datetime context is injected, including add_datetime_to_context=True
- Defaults to existing behavior when not set; no migration required
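For example:
from agno.agent import Agent
# Inject the current time as ISO-8601 instead of the default rendering
agent = Agent(
    add_datetime_to_context=True,
    datetime_format="%Y-%m-%dT%H:%M:%S",
)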
See cookbook.
Tool pre- and post-hooks, as well as agent-level tool_hooks, can now read the current run's complete message history via run_context.messages. This makes it straightforward to build hooks that inspect prior conversation turns for auditing, conditional logic, prompt injection detection, or logging without needing to pass history through separate channels.
Details:
- run_context.messages is available in both pre- and post-hooks at the tool and agent level
- Enables hooks to make decisions based on the full conversation up to the current tool call
- No changes required to existing hooks that don't need message history; fully backward-compatible
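As a sketch, an auditing hook might look like the following (the hook signature shown is an assumption, with arguments resolved by name; the cookbook below has the canonical form):
from agno.agent import Agent
# run_context is requested alongside the standard hook arguments (signature assumed)
def audit_hook(function_name, function_call, arguments, run_context):
    print(f"{function_name} called with {len(run_context.messages)} messages in history")
    return function_call(**arguments)  # execute the wrapped tool call
agent = Agent(tool_hooks=[audit_hook])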
See cookbook for reference.
GoogleCalendarTools has been extended with additional tools, a service account authentication path, and new cookbooks to help teams get started quickly. Agents can now handle a broader set of calendar workflows, from listing and creating events to managing multi-user deployments, without requiring per-user OAuth flows.
This agent will use GoogleCalendarTools to find today’s events:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.google.calendar import GoogleCalendarTools
agent = Agent(
model=OpenAIChat(id="gpt-4o"),
tools=[GoogleCalendarTools()],
add_datetime_to_context=True,
markdown=True,
)
agent.print_response("What meetings do I have tomorrow?", stream=True)
Details:
- New tools extend coverage beyond list_events and create_event for richer calendar management
- Service account authentication enables server-side and multi-tenant deployments without personal OAuth credentials
- allow_update flag (default False) gates write operations, providing a safe default for read-heavy workflows
- calendar_id, oauth_port, token_path, and access_token parameters give fine-grained control over auth and calendar targeting
- Existing OAuth credential flows continue to work; no migration required
See the Google Calendar docs for more.
Agents and teams can now automatically generate actionable followup prompts after each response by setting followups=True.
A second model call produces a configurable number of short, context-aware suggestions based on the conversation, giving users a clear path to continue their work without having to formulate the next question themselves.
Details:
- Enable with followups=True; control the number of suggestions with num_followups (default 3)
- Use followup_model to route followup generation to a smaller, cheaper model independently of the main agent model
- Suggestions are available on response.followups for non-streaming runs
- Streaming surfaces suggestions via the FollowupsCompleted event, emitted after the main response finishes
- Works for both agents and teams with no additional configuration
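For example:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    followups=True,
    num_followups=3,
    followup_model=OpenAIChat(id="gpt-4o-mini"),  # cheaper model for suggestions
)
response = agent.run("Summarize our Q3 revenue drivers.")
print(response.followups)  # short, context-aware next-step prompts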
See agent with Followup Suggestions docs for more.
Resolved a bug where each iteration of a Loop always received the original input rather than the output of the previous iteration, causing loops to repeat work instead of building on it. A new forward_iteration_output flag lets you explicitly opt in to passing each iteration's output forward as the next iteration's input.
Details:
- Fixes the default behavior where loop iterations were incorrectly re-receiving the original input
- Set forward_iteration_output=True to chain iteration outputs sequentially through the loop
- Default behavior remains unchanged for workflows that do not set the flag, preserving backward compatibility
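A sketch of a loop that builds on its own output each pass (the agno.workflow import path and the max_iterations parameter name are assumptions):
from agno.agent import Agent
from agno.workflow import Loop, Step  # import path assumed for illustration
editor = Agent(name="Editor", instructions=["Improve the draft you are given."])
# Each iteration now receives the previous iteration's output as its input
refine = Loop(
    name="refine",
    steps=[Step(name="edit_pass", agent=editor)],
    max_iterations=3,  # parameter name assumed
    forward_iteration_output=True,
)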
See docs
A json_serializer is now passed during MySQL engine creation, ensuring that JSON fields are correctly serialized when reading from and writing to MySQL-backed databases. This resolves silent data corruption and type errors that could occur when storing structured agent state or session data.
Details:
- json_serializer is applied automatically at engine creation time
- Fixes incorrect handling of JSON columns in MySQL storage backends
- No application-level changes required
Resolved a bug in OpenAIResponses where combining external_execution tools with standard tools caused incorrect dispatch behavior. Mixed tool configurations now route correctly, allowing both tool types to coexist in the same agent without unexpected failures or skipped calls.
Details:
- Fixes tool call handling for agents using both external_execution and regular tools simultaneously
- No configuration changes required; the fix applies automatically
- Improves reliability for agents with hybrid tool setups using the OpenAI Responses API
serve() now reads AGENT_OS_HOST and AGENT_OS_PORT environment variables as fallbacks when explicit values are not passed. This removes the need to hardcode host and port configuration at the call site, making containerized and orchestrated deployments cleaner to manage.
Details:
- Set AGENT_OS_HOST and AGENT_OS_PORT in your environment to configure serve() without code changes
- Explicit arguments passed to serve() continue to take precedence
- No changes required for existing deployments that pass host and port directly
See the environment variable fallback docs for more.
Images and audio generated during a run are now consistently included in run output regardless of the store_media setting. Media is scrubbed before being written to the database, keeping storage lean while ensuring callers always receive the full output they expect.
Details:
- Generated media (images, audio) is present in run output in all cases
- Media is stripped from the payload prior to database storage, decoupling output completeness from persistence behavior
- store_media continues to control persistence; output delivery is no longer tied to it
Agents and teams are now assigned human-readable IDs (e.g., brave-falcon-7x3k) instead of raw UUIDs.
This makes it significantly easier to identify and track specific runs at a glance across logs, traces, and monitoring dashboards without needing to cross-reference opaque identifier strings.
Details:
- Human-readable IDs are generated automatically for all agents and teams
- Existing workflows referencing explicit session or agent IDs are unaffected
- Improves legibility across AgentOS trace views, logs, and debugging output
GmailTools has been extended with a broader set of email management functions and a new service account authentication path. Teams can now handle more of the Gmail workflow, including reading, drafting, sending, replying, labeling, and searching, directly from an agent, while platform teams gain a non-personal, service account auth option suited for server-side and multi-user deployments.
from agno.agent import Agent
from agno.tools.google.gmail import GmailTools
agent = Agent(tools=[GmailTools()])
agent.print_response("Show me my latest 5 unread emails", markdown=True)
Details:
- New tools cover the full email lifecycle: get_emails_by_date, get_emails_by_thread, send_email_reply, create_draft_email, mark_email_as_read, mark_email_as_unread, list_custom_labels, apply_label, remove_label, and delete_custom_label
- Service account authentication provides a governed, non-personal credential path for automated and multi-tenant workflows
- Existing OAuth credential flows (creds, credentials_path, token_path) continue to work with no migration required
- Use include_tools or exclude_tools to expose only the subset of tools your agent needs
See Gmail docs for reference.
Engineering and platform teams using GitLab can now connect agents directly to their repositories. GitlabTools brings read-focused GitLab access to Agno agents, covering projects, merge requests, and issues, with async support and granular control over which tools are exposed.
This makes it straightforward to build agents that monitor repository activity, triage open issues, summarize merge request pipelines, or answer questions about project state, without custom API wrappers or manual data fetching.
Details:
- Covers five read operations out of the box: list and inspect projects, merge requests, and issues
- Supports both GitLab.com and self-hosted GitLab instances via a configurable base URL
- Each tool can be toggled on or off individually using enable_* parameters, giving teams precise control over what the agent can access
- Async support ensures GitLab operations don't block agent execution in concurrent or high-throughput deployments
- Authentication via a GitLab access token set through environment variables — no code changes needed to rotate credentials
from agno.agent import Agent
from agno.tools.gitlab import GitlabTools
agent = Agent(
instructions=[
"Use GitLab tools to answer repository questions.",
"Use read-only operations unless explicitly asked to modify data.",
],
tools=[GitlabTools()],
)
agent.print_response(
"List open merge requests for project 'gitlab-org/gitlab' and summarize the top 5 by recency.",
markdown=True,
)
The built-in session search tool has been upgraded from a single-pass lookup to a two-step process: the agent first calls search_past_sessions() to retrieve lightweight previews of recent sessions, then selectively fetches the full conversation for a specific session with read_past_session(session_id).
This reduces unnecessary data loading and gives the agent a clearer, more structured path to locating relevant history.
Details:
- search_past_sessions() returns per-run previews across recent sessions without loading full message histories
- read_past_session(session_id) fetches the complete conversation for a targeted session on demand
- Control scope with num_past_sessions_to_search (default 20) and num_past_session_runs_in_search (default 3) to tune preview depth
- Session history is scoped per user — agents cannot surface another user's sessions
- Enable with search_past_sessions=True on the Agent; no other changes required
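A minimal sketch of enabling the two-step search (a database is assumed, since session history must be persisted somewhere):
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
agent = Agent(
    db=SqliteDb(db_file="tmp/agent.db"),  # session history needs a database
    search_past_sessions=True,
    num_past_sessions_to_search=10,
)
# The agent can now call search_past_sessions() for previews, then
# read_past_session(session_id) for one full conversation.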
See cookbook for reference.
JSON schema generation now handles Literal types, ensuring that agents and tools using constrained value sets produce valid, complete schemas. This closes a gap that could cause schema generation to fail or produce incomplete type definitions for structured outputs.
Details:
Literaltypes are now correctly represented in generated JSON schemas- Improves reliability of structured output validation and tool definitions
- No migration required
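For example, a response schema with a constrained value set now generates a complete schema (assuming the Agno 2.x output_schema parameter for structured output):
from typing import Literal
from pydantic import BaseModel
from agno.agent import Agent
class Ticket(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]  # constrained value set
# The generated schema now carries the enum constraint for priority
agent = Agent(output_schema=Ticket)  # parameter name per Agno 2.x, assumed here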
OpenAIResponses now supports input_file, letting you pass files directly into OpenAI Responses API calls. This simplifies document-aware workflows by removing the need to pre-process or separately upload files before invoking a model.
Details:
- Pass files directly as input alongside text prompts
- Reduces pipeline complexity for document analysis, extraction, and summarization tasks
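For example, passing a local PDF alongside the prompt (the file path is hypothetical):
from agno.agent import Agent
from agno.media import File
from agno.models.openai import OpenAIResponses
agent = Agent(model=OpenAIResponses(id="gpt-4o"))
agent.print_response(
    "Summarize the key findings in this report.",
    files=[File(filepath="reports/q3_findings.pdf")],  # hypothetical path
)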
See cookbook for reference
We resolved a race condition in OpenAI Responses where file_search could silently return empty results due to eventual consistency in OpenAI's vector store file listing API. Polling now correctly waits for file readiness, ensuring that retrieval queries return complete, accurate results from the start.
Details:
- Eliminates silent empty results caused by premature polling
- No application changes required; the fix applies automatically
- Improves reliability for RAG and document-grounded workflows using OpenAI file search
Google tools have been restructured into a dedicated agno.tools.google sub-package (e.g., from agno.tools.google import GmailTools). This organizes a growing set of Google integrations under a single, predictable namespace. Existing import paths continue to work, so no migration is required.
Details:
- All Google tools consolidated under agno.tools.google
- Backward-compatible; legacy imports remain functional
- Establishes a consistent pattern as Google tool coverage expands
File upload endpoints now accept image/heic and image/heif formats, removing the need to convert Apple-native image formats before ingestion. This reduces friction for teams processing user-submitted or mobile-captured content and ensures broader device compatibility out of the box.
Details:
- Native support for HEIC/HEIF alongside existing image formats
- No configuration changes required
- Improves throughput for field, support, and on-site capture flows
A new approval status endpoint lets you query where a paused run stands in the approval process, and admin-gated enforcement ensures that only authorized users can continue execution. Together, these changes give teams auditable, policy-driven control over Human-in-the-Loop workflows, closing gaps between initiating a pause and resuming work.
Details:
- Query approval status programmatically to build dashboards, alerts, or integration triggers
- Admin-gated continue-run enforcement prevents unauthorized resumption of paused executions
- Strengthens governance for high-stakes or compliance-sensitive workflows
See docs for reference
AgentOS now supports an advanced filtering DSL for traces, letting you construct precise, composable queries to isolate specific runs, models, components, or behaviors. This replaces broad, manual trace inspection with targeted retrieval, accelerating debugging, audit workflows, and performance analysis.
Details:
- Composable filter expressions for fine-grained trace queries
- Reduces time-to-resolution when diagnosing issues across complex agent and workflow executions
- Available through AgentOS trace endpoints
See cookbook for reference or view the docs
Knowledge sources now support GitHub App authentication (app_id, installation_id, private_key) in addition to personal access tokens. This gives platform and security teams a more governed authentication path, with scoped permissions, no personal credentials in the pipeline, and thread-safe token caching that handles expiration automatically. Both sync and async variants are supported.
Details:
- Authenticate as a GitHub App for fine-grained, org-managed access to private repositories
- Thread-safe token caching eliminates redundant auth requests and simplifies concurrent workloads
- Personal access tokens continue to work; no migration required
See cookbook for reference
ModelsLabTools now supports text-to-image generation with PNG/JPG outputs, an image fetch endpoint, and sizing options. Teams can add visual generation to agents and workflows without custom integrations, standardize on common file types, and speed prototyping. This reduces integration effort and helps deliver richer end-user experiences with minimal setup.
Details:
- Generate images from text prompts with configurable dimensions
- PNG and JPG outputs for compatibility with existing storage and delivery pipelines
- Unified tool interface works across agents and workflows
See docs for reference
