Docling Reader: How to ingest any document format into your AI agent's knowledge base

Building a knowledge base for an AI agent sounds simple at first: load your documents, embed them, and retrieve what you need. That simplicity quickly breaks down once you start dealing with real-world data. Documents arrive in many forms—PDFs from legal, slide decks from product, spreadsheets from finance, images of invoices, even recordings of meetings—and each format typically demands its own tools, configurations, and ingestion workflows. What begins as a straightforward setup can quietly turn into a complex, fragmented document processing system.

Agno’s Docling Reader is designed to remove that complexity. Built on IBM Research’s open-source Docling library, it provides a single, unified interface for handling the wide range of document formats that AI agents encounter. Instead of stitching together multiple libraries and pipelines, you can rely on one reader to process everything consistently.

What file formats does the Docling Reader support?

The Docling Reader supports documents across all major categories:

Documents: PDFs, Word files (.docx), and markup files such as HTML and Markdown
Presentations: PowerPoint files (.pptx)
Spreadsheets: Excel files (.xlsx)
Images: JPEG, PNG, and other image formats
Audio and video: MP4 and other media files (via FFmpeg and Whisper)

That last category is worth calling out. Tasks like transcribing meetings, indexing product demos, or processing lectures are often treated as entirely separate workflows, but here they are handled within the same system using FFmpeg and OpenAI Whisper behind the scenes.

How the Docling Reader works

Every document that passes through the Docling Reader moves through the same pipeline, no matter what format it starts in. Agno exposes this pipeline through the DoclingReader interface so it integrates directly with your knowledge base. That consistency is what makes it possible to treat a wide range of inputs as part of a single, unified system.

1. Format detection and parsing. When a file is passed in, Docling automatically determines what kind of document it is and applies the appropriate parsing logic. There is no need to specify whether you are working with a PDF, a spreadsheet, or a presentation. The reader handles that decision for you.

2. Structure preservation. As the content is extracted, the process goes beyond pulling out raw text. The structure of the document is preserved alongside it, including elements like headings, tables, hierarchies, formulas, and layout. This ensures that the meaning carried by the organization of the document is not lost during processing.

3. Unified conversion. Once parsed, everything is converted into a standardized internal representation. This step is what allows very different formats, like PDFs, slide decks, and spreadsheets, to be handled in the same way. Regardless of their original structure, they are brought into a common format before moving forward.

4. Flexible export. The content is exported into a usable output format, with Markdown as the default. This is the version that enters your knowledge base and becomes available for retrieval.

5. Chunking integration. The output flows directly into Agno’s chunking pipeline, where it is split and prepared for vector storage. Because the data is already structured and standardized, no additional preprocessing is needed, and the transition into retrieval is seamless.
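
To make the five steps concrete, here is a toy pipeline that mirrors them. It is a simplified illustration, not Docling's or Agno's implementation. It detects format by file extension only (Docling also inspects the content itself), supports just two formats (Markdown and CSV), and uses a hypothetical `Element` type as the "standardized internal representation". The point is the shape: a parser per format, one shared representation, one export, one chunker.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Element:
    """Hypothetical common representation: one typed block of content."""
    kind: str                 # "heading", "paragraph", or "table"
    text: str = ""
    level: int = 0            # heading depth, headings only
    rows: list | None = None  # table rows; the first row is the header


# Step 1: format detection (by file extension only, for illustration).
FORMATS = {".md": "markdown", ".csv": "csv"}


def detect_format(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in FORMATS:
        raise ValueError(f"unsupported format: {ext!r}")
    return FORMATS[ext]


# Steps 2-3: each parser keeps structure and emits the same Element list.
def parse_markdown(raw: str) -> list[Element]:
    elements = []
    for block in raw.strip().split("\n\n"):
        if block.startswith("#"):
            hashes, _, title = block.partition(" ")
            elements.append(Element("heading", title.strip(), level=len(hashes)))
        else:
            elements.append(Element("paragraph", block.strip()))
    return elements


def parse_csv(raw: str) -> list[Element]:
    rows = list(csv.reader(io.StringIO(raw.strip())))
    return [Element("table", rows=rows)]


PARSERS = {"markdown": parse_markdown, "csv": parse_csv}


# Step 4: export the common representation to Markdown.
def to_markdown(elements: list[Element]) -> str:
    out = []
    for el in elements:
        if el.kind == "heading":
            out.append("#" * el.level + " " + el.text)
        elif el.kind == "table":
            header, *body = el.rows
            lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
            lines += ["| " + " | ".join(row) + " |" for row in body]
            out.append("\n".join(lines))
        else:
            out.append(el.text)
    return "\n\n".join(out)


# Step 5: split the Markdown into chunks that each start at a heading.
def chunk_by_heading(markdown: str) -> list[str]:
    chunks, current = [], []
    for block in markdown.split("\n\n"):
        if block.startswith("#") and current:
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def ingest(path: str, raw: str) -> list[str]:
    """Run all five steps; the caller never branches on file type."""
    elements = PARSERS[detect_format(path)](raw)
    return chunk_by_heading(to_markdown(elements))


print(ingest("notes.md", "# Intro\n\nHello.\n\n# Details\n\nMore."))
print(ingest("revenue.csv", "region,revenue\nEU,10\nUS,12"))
```

Note that `ingest` is the only function a caller touches. Adding a format means adding a parser to `PARSERS`, not another branch in application code.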

How Docling preserves document structure for better RAG

What sets the Docling Reader apart is not just the range of formats it supports, but how it processes them. Because structure is preserved rather than flattened into plain text, retrieval quality improves. Context is not lost in translation: a financial table keeps its column headers, and a key insight stays tied to the section it came from. Downstream systems can see not just what the content says, but how it is organized.
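
Headers matter most when a long table is split across several chunks: a chunk holding only middle rows is hard to interpret without them. The sketch below shows one common way to keep them. It is not necessarily how Agno's chunker handles tables. A hypothetical `chunk_table` helper splits a Markdown table into row groups and repeats the header row in each one. The figures are made up for illustration.

```python
def chunk_table(header: list[str], rows: list[list[str]], rows_per_chunk: int = 2) -> list[str]:
    """Split a table into Markdown chunks that each repeat the header row."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    head = "| " + " | ".join(header) + " |\n|" + "---|" * len(header)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        body = "\n".join("| " + " | ".join(row) + " |" for row in group)
        chunks.append(head + "\n" + body)
    return chunks


# Made-up figures, for illustration only.
header = ["Quarter", "Revenue", "Margin"]
rows = [["Q1", "1.2M", "18%"], ["Q2", "1.4M", "21%"], ["Q3", "1.1M", "15%"]]

for chunk in chunk_table(header, rows):
    print(chunk, end="\n\n")
```

A retrieved chunk containing only the Q3 row still says which column is revenue and which is margin, because the header travels with it.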

The output from Docling also integrates directly with Agno's chunking pipeline, so structured data flows into your vector store without extra preprocessing. The result is a simpler, more maintainable foundation for retrieval: one ingestion path, with document structure still intact when content reaches the vector store.

How the Docling Reader reduces complexity for RAG developers

The retrieval quality improvements are the most visible benefit, but there are two quieter ones that matter just as much in practice.

The first is development speed. When every format goes through the same reader, you stop writing format-specific ingestion logic. There is no PDF branch, no separate spreadsheet handler, no one-off script for the slide deck someone sent over. That reduction in surface area means faster iteration and fewer places for things to go wrong.

The second is dependency overhead. A typical multi-format document pipeline pulls in a different library for each format, each with its own versioning, authentication, and maintenance requirements. The Docling Reader consolidates that into a single dependency. Less to install, less to update, and less to debug when something changes upstream.

How to install the Docling Reader

Install the dependencies:

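pip install -U docling agno openai

# For the PgVector vector database used in the examples
pip install -U sqlalchemy "psycopg[binary]" pgvector

# For audio/video processing
pip install -U openai-whisper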


For audio and video, you'll also need FFmpeg installed on your system:

macOS: brew install ffmpeg
Ubuntu: sudo apt-get install ffmpeg
Windows: Download from ffmpeg.org
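
Before ingesting media, a quick preflight check can turn a confusing failure into a clear message. This is a minimal standard-library sketch, not part of Agno or Docling. It assumes FFmpeg is exposed as an executable named `ffmpeg` on your PATH and that the `openai-whisper` package provides an importable `whisper` module. `find_spec` only locates the module; it does not import it.

```python
import importlib.util
import shutil


def media_prerequisites() -> dict[str, bool]:
    """Report whether the tools used for audio/video ingestion are available."""
    return {
        # FFmpeg must be on PATH as an executable named "ffmpeg".
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # The openai-whisper package installs a top-level module named "whisper".
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    missing = [name for name, ok in media_prerequisites().items() if not ok]
    if missing:
        print("Missing for audio/video ingestion:", ", ".join(missing))
    else:
        print("FFmpeg and Whisper found.")
```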

How to ingest multiple document formats with one reader

The core workflow is to create a DoclingReader instance and pass it directly to knowledge.insert(). The same reader handles every supported format; audio and video files only need a different output format.

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.docling_reader import DoclingReader
from agno.vectordb.pgvector import PgVector

knowledge = Knowledge(
    vector_db=PgVector(
        table_name="docling_documents",
        db_url="postgresql+psycopg://ai:ai@localhost:5532/ai",
    )
)

reader = DoclingReader()

# Each of these uses the same reader
knowledge.insert(path="documents/proposal.docx", reader=reader)
knowledge.insert(path="documents/presentation.pptx", reader=reader)
knowledge.insert(path="documents/financials.xlsx", reader=reader)
knowledge.insert(path="documents/invoice.jpeg", reader=reader)

# Audio/video uses the vtt output format
knowledge.insert(
    path="documents/meeting_recording.mp4",
    reader=DoclingReader(output_format="vtt"),
)

agent = Agent(knowledge=knowledge, search_knowledge=True)

agent.print_response(
    "Summarize the key information across all of these documents",
    markdown=True,
)
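
If your documents live in one folder, the calls above extend naturally to a loop. The following is a minimal sketch with a hypothetical `ingest_directory` helper. The extension sets are assumptions to adjust to the formats you actually ingest. `insert` is expected to behave like `knowledge.insert(path=..., reader=...)`, and `make_reader` builds a reader for a given output format. Injecting both keeps the helper testable without a database.

```python
from pathlib import Path
from typing import Any, Callable

# Hypothetical extension sets; adjust to the formats you actually ingest.
MEDIA_EXTENSIONS = {".mp4", ".mp3", ".wav"}
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".jpeg", ".jpg", ".png", ".md", ".html"}


def ingest_directory(
    root: str,
    insert: Callable[..., Any],
    make_reader: Callable[[str], Any],
) -> list[str]:
    """Insert every supported file under `root`, using VTT output for media files."""
    # Build one reader per output format and reuse it for every file.
    readers = {"markdown": make_reader("markdown"), "vtt": make_reader("vtt")}
    ingested = []
    for path in sorted(Path(root).rglob("*")):
        ext = path.suffix.lower()
        if not path.is_file() or ext not in MEDIA_EXTENSIONS | DOCUMENT_EXTENSIONS:
            continue  # skip directories and unsupported files
        fmt = "vtt" if ext in MEDIA_EXTENSIONS else "markdown"
        insert(path=str(path), reader=readers[fmt])
        ingested.append(str(path))
    return ingested


# Usage with the objects from the example above (not executed here):
# from agno.knowledge.reader.docling_reader import DoclingReader
# ingest_directory("documents", knowledge.insert,
#                  lambda fmt: DoclingReader(output_format=fmt))
```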


You can also load directly from a URL:

knowledge.insert(
    path="https://arxiv.org/pdf/2408.09869",
    reader=DoclingReader(),
)

Docling Reader output formats explained

The Docling Reader defaults to Markdown output, which works well for most RAG use cases. For situations that require a different representation, the output_format parameter accepts several options.

FORMAT	USE CASES
markdown	Default. Clean structured text for most RAG applications
text	Plain text, no formatting
json	Structured JSON representation
html	HTML output
html_split_page	HTML with per-page splits
doctags	DocTags format for fine-tuning
vtt	WebVTT captions — used for audio and video files


For most teams building knowledge bases, the default is the right starting point. The flexibility is there when you need it.

What you can build with the Docling Reader

A unified research assistant. Instead of juggling separate pipelines for arXiv papers, supplementary PDFs, LaTeX files, and datasets in spreadsheets, you can bring all of them into one shared knowledge base. An agent can then answer questions that draw across these sources naturally, connecting ideas from a paper to its supporting data without any extra orchestration on your part.

An enterprise document system. Most companies already have information scattered across formats: legal contracts stored as PDFs, product plans living in slide decks, and financial models sitting in spreadsheets. With a unified reader, all of that material can be ingested through a single interface, making it possible to build an enterprise knowledge system that reflects how information actually exists, rather than forcing everything into one format before it becomes usable.

A meeting intelligence agent. You can ingest recorded meetings as video files alongside the slides that were presented and any documents shared afterward. From there, an agent can answer questions about what was discussed, what decisions were made, and how those decisions were supported, all within the same context instead of across disconnected tools.

An image-aware knowledge agent. Invoices, receipts, scanned forms, and diagrams can be processed and indexed alongside traditional text documents. That means they are no longer isolated or manually handled artifacts, but fully searchable components of your knowledge base, accessible through the same retrieval experience as everything else.

Docling Reader documentation and examples

from agno.knowledge.reader.docling_reader import DoclingReader

reader = DoclingReader()

# Pass to any knowledge.insert() call; `knowledge` is a Knowledge instance like the one created earlier
knowledge.insert(path="your_document.pdf", reader=reader)


Getting started is straightforward, and the Docling Reader documentation walks you through everything you might need. It covers the full set of parameters, explains the available output formats, and shows how to use the reader in asynchronous workflows. If you prefer to learn by example, there are ready-to-run samples that demonstrate common patterns like ingesting multiple document types, loading content from URLs, and working with async variants.
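
For async workflows, the general pattern is to ingest many documents concurrently while capping how many are parsed at once, since parsing large PDFs or transcribing video is resource-heavy. The sketch below shows that pattern with plain asyncio. It is a hypothetical helper. `insert_async` stands in for whichever async insert method your Agno version exposes (check the documentation for its exact name), and `fake_insert` is a stub used only to make the example runnable.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def insert_all(
    paths: list[str],
    insert_async: Callable[..., Awaitable[Any]],
    reader: Any,
    max_concurrency: int = 4,
) -> None:
    """Insert many documents concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def insert_one(path: str) -> None:
        async with semaphore:  # waits while max_concurrency inserts are running
            await insert_async(path=path, reader=reader)

    # gather raises the first exception; pass return_exceptions=True to collect them instead.
    await asyncio.gather(*(insert_one(p) for p in paths))


async def fake_insert(path: str, reader: Any) -> None:
    await asyncio.sleep(0.01)  # stand-in for parsing and embedding
    print("inserted", path)


if __name__ == "__main__":
    asyncio.run(insert_all(["a.pdf", "b.docx", "c.pptx"], fake_insert, reader=None))
```

A semaphore is used rather than batching so that a slow file does not hold up an entire batch; a new insert starts as soon as any slot frees up.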

One reader. Every format. Better retrieval.