Agents that coordinate, not collide
How Datai turned a fragile multi-agent prototype into a production system
“We stopped writing plumbing code and started writing business logic. That's the real win.”
- Igor Lessio, Chief Technology Officer at Datai Network
Challenge
- No real orchestration layer; agents wired together manually
- Every workflow change triggered cascading rewrites
- Response times averaged 2.5 minutes with a 5–10% failure rate
- Manual Redis session management caused mid-conversation context loss
Solution
- Rebuilt on Agno with a coordinated multi-agent workflow
- Replaced manual session tracking with Agno's PostgreSQL-backed storage
- Enabled streaming with a single stream=True switch
- Added grounding safeguards to prevent hallucinated numbers
Results
- Response latency: 2.5 minutes → ~25 seconds (~80% faster)
- Development time per feature: ~3 weeks → 2–3 days
- API cost per complex query: ~40–50% lower
- Workflow failure rate: 5–10% → under 1%
- Orchestration code: ~2,000 → ~500 lines
About
Datai Network and Crunchie
Datai Network is a decentralized data and analytics platform that builds structured blockchain intelligence for multiple verticals, including DeFi, NFTs, and real-world assets. Their proprietary enrichment process surfaces data that is usable, queryable, and safe to build products on.
Crunchie, Datai’s “AI-powered on-chain companion”, is a multi-agent system designed to democratize market intelligence, bringing institutional-level analysis into a format anyone can access and learn from.
Challenge
Orchestration chaos and the state management trap
Datai launched the Alpha version of Crunchie in Q2 2025. It was built on Datai's enterprise-grade infrastructure, the same data streams and analytics backbone used by funds, protocols, and analytics firms. The system was organized as a coordinated team of five specialist agents: a Yield Analyzer that scanned liquidity pools for sustainable APYs, a Market Pulse agent that detected sentiment and buying pressure, a News Monitor that tracked DeFi headlines, a Risk Analyzer that evaluated liquidity and project credibility, and a Smart Synthesizer that combined everything into a readable output.
It showed real promise. The concept was sound, but the execution had limits.
Crunchie Alpha was built in Python and Go, using a data lake solution, and the system had no real orchestration layer underneath it. Agents weren't coordinated—they were wired together manually, with no shared state management and no structured way to handle parallel execution.
Every workflow change triggered cascading rewrites. Add an agent, tweak a step, adjust routing logic, and suddenly half the orchestration layer needed to be rebuilt from scratch. Response times ran around 2.5 minutes per query, and there was no reliable way to trace what had gone wrong when something failed.
After some changes in tech leadership, the Datai team assessed the situation quickly. The data foundation was strong, but the agent architecture needed to be rebuilt from the ground up.
Decision
Choosing to rebuild Crunchie with Agno
First, they tried LangChain but found it frustrating. “The libraries were too difficult to navigate. Too many of them belonged to the community. And it was not easy to keep pace with the changelog.” They also wrote their own pure Python orchestration loop, which worked for simple use cases but didn’t scale.
Finally, they determined that what Crunchie needed was a framework that could handle real multi-agent coordination without becoming its own maintenance burden. So, they decided to rebuild Crunchie with Agno.
"What Agno provides is a reliable foundation for autonomous, production-grade agent workflows. Especially when you want to do an MVP that is fast."
Four things specifically drew Datai’s tech team to Agno:
1. Flexibility
Simplicity and real flexibility rarely coexist. They found both in Agno.
“It's so flexible. Any model, any tool sets, any MCP server, it just works.”
Because Agno is plain Python, they can extend memory systems, add custom output formatting, or swap models without fighting the framework. “We can expand tools in five, six seconds. We can expand the memory system by ourselves.”
2. Reliability and logging
“We can trust it. It has one of the best logging systems we’ve ever seen. If something is wrong, we know immediately.”
They contrast this directly with other frameworks. “With CrewAI the logging system was not good for our use case. NVIDIA did not even provide one.” When an LLM provider has an issue, such as a rate limit, a hallucination, or a context overflow, Agno surfaces it immediately. “In seconds you can find out.”
3. Speed
Speed to prototype matters because the field moves too fast for anything else.
"With Agno, you go from zero to an MVP in hours, not days, not weeks. We were not able to do that with the other frameworks we tested.”
4. Team responsiveness
In a field that changes weekly, direct access to the team is a meaningful technical advantage. “The first time we interacted with your CEO, we had a question. We couldn’t fix something, and he literally told us how to do it on Twitter. That’s priceless.”
Solution
Replacing glue code with a production-grade foundation
Agno didn't just speed things up. It removed entire categories of work that don't differentiate a product but still determine whether it survives in production: session management, streaming infrastructure, workflow orchestration, and error handling. What had been a prototype held together by glue code became a system the team could iterate on quickly and ship with confidence.
What changed under the hood
Previously, conversation state was manually tracked with Redis key management and session timeouts. With Agno, PostgreSQL-backed workflow storage persists state automatically. Datai passes a session_id, and the workflow stays consistent across runs.
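The pattern is simple to sketch in plain Python. Here a stdlib sqlite3 table stands in for Agno's PostgreSQL-backed storage; the class and method names are illustrative, not Agno's API:

```python
import json
import sqlite3

class SessionStore:
    """Minimal stand-in for framework-managed session storage.
    State is keyed by session_id and survives across runs."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)"
        )

    def load(self, session_id):
        row = self.conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id, state):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self.conn.commit()

store = SessionStore()
state = store.load("user-42")            # empty dict on first run
state["turns"] = state.get("turns", 0) + 1
store.save("user-42", state)             # persists across subsequent runs
```

The point of the pattern is that the caller only ever supplies a session_id; where and how state is persisted is the framework's problem, not the application's.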
Streaming, which once required weeks of custom SSE work, became a single switch: stream=True.
Parallel execution, which previously required custom async coordination logic, was replaced by clean Step objects and Parallel blocks.
Error handling and retries, which used to be a tangle of try/catch logic, are handled gracefully by default.
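As a rough illustration of the glue code the framework absorbs, a hand-rolled retry wrapper for a flaky step looks something like this (plain asyncio, not Agno code; the step function is hypothetical):

```python
import asyncio

async def with_retries(step, attempts=3, base_delay=0.1):
    """Retry a flaky async step with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await step()
        except Exception:
            if attempt == attempts - 1:
                raise                       # out of retries: surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

async def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")  # simulate a transient provider error
    return "ok"

result = asyncio.run(with_retries(flaky_step))
```

Multiply this by every step, every failure mode, and every workflow change, and the appeal of having it handled by default becomes obvious.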
The new architecture
Igor, Datai Network's CTO, describes the new setup as a professional kitchen, and the contrast with Alpha is the point.
"Before Agno, chefs were working in isolation, passing notes through runners and trying to coordinate timing manually. With Agno, there's a head chef—the workflow—coordinating specialists and making sure the final dish arrives coherent and on time."
The pipeline works in three stages:
A Prompt Expert opens every query, cleaning up ambiguous inputs before they reach any specialist agent. For example, “BNB” becomes “Binance Coin on Binance Smart Chain.” It also filters low-quality requests early. This step alone cut API costs by around 30 percent.
Then three agents run in parallel: an APY Expert identifying yield opportunities, a Pool Expert analyzing liquidity and pool health, and a News Expert gathering market sentiment and relevant context.
Finally, a Coordinator synthesizes all three outputs into a single ranked, structured, and readable response, complete with direct links to the relevant pools and platforms.
Agno handles everything between those steps, including timing, coordination, state persistence, and streaming responses as they are generated. Response time dropped from 2.5 minutes to around 25 seconds.
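Stripped of the LLM calls, the shape of that pipeline can be sketched in plain Python. The agent functions and the alias table are stand-ins for illustration, not Datai's implementation:

```python
import asyncio

ALIASES = {"bnb": "Binance Coin on Binance Smart Chain"}  # illustrative only

def expand_query(query):
    """Stage 1, Prompt Expert: expand shorthand, reject thin requests early."""
    if len(query.split()) < 2:
        return None                          # filtered before any paid API call
    return " ".join(ALIASES.get(w.lower(), w) for w in query.split())

async def apy_expert(q):  return {"apy": "12.4% on pool X"}
async def pool_expert(q): return {"health": "TVL stable"}
async def news_expert(q): return {"news": "no major headlines"}

async def pipeline(query):
    expanded = expand_query(query)
    if expanded is None:
        return "Please ask a more specific question."
    # Stage 2: the three specialists run in parallel.
    apy, pool, news = await asyncio.gather(
        apy_expert(expanded), pool_expert(expanded), news_expert(expanded)
    )
    # Stage 3: the Coordinator synthesizes only from the specialists' outputs.
    return f"{expanded}: {apy['apy']}; {pool['health']}; {news['news']}"

answer = asyncio.run(pipeline("BNB yields"))
filtered = asyncio.run(pipeline("hi"))
```

In the real system each function is an agent with its own model, tools, and prompt, and the framework supplies the scheduling, state, and streaming around them.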
Keeping responses grounded
Crunchie needed to stay anchored in real data, especially for APYs, pool risk, and market context. Datai layered four safeguards to ensure the agents never invented numbers.
A vector knowledge base using LanceDB is queried first. DeFi protocols, token data, and documentation are embedded as a reference library that the agent consults before answering. Tool-based live data retrieval through Datai’s APIs ensures that if the agent needs current figures, it calls tools instead of generating them. Prompt constraints require the Coordinator to synthesize only from verified agent outputs. A Redis conversation cache preserves consistency across turns.
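One way to enforce the no-invented-numbers rule mechanically, shown here as an illustrative post-hoc check rather than Datai's actual implementation, is to reject any synthesized answer containing a figure that no tool or database output produced:

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def is_grounded(answer, tool_outputs):
    """True only if every number in the answer appears in some tool output."""
    allowed = set()
    for out in tool_outputs:
        allowed.update(NUMBER.findall(out))
    return all(n in allowed for n in NUMBER.findall(answer))

outputs = ["Pool A APY: 12.4", "TVL: 3200000 USD"]
ok = is_grounded("Pool A yields 12.4 with TVL 3200000", outputs)
bad = is_grounded("Pool A yields 99.9", outputs)
```

Prompt constraints do the same job upstream; a check like this is a cheap backstop when the stakes are real money.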
“He cannot lie to me because he has to present numbers that come from our database,” Igor says. “So there’s no lie there.”
The parallel design also solves a subtler problem: context poisoning. Rather than allowing a long sequential chain to accumulate tokens, each agent formats and cleans its own output before passing it forward.
“We control the output. We format it, we clean it. Then we pass it to the last agent. They read everything in a context that is clean first and that never goes over a certain number of tokens.”
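A minimal version of that per-agent hygiene step, with whitespace splitting as a crude stand-in for real model tokenization:

```python
def clean_for_handoff(text, max_tokens=800):
    """Normalize whitespace and hard-cap length before the Coordinator sees it."""
    tokens = text.split()        # crude tokenizer; real systems count model tokens
    return " ".join(tokens[:max_tokens])

raw = "APY:   12.4%\n\n  Pool  health:  stable  " + "noise " * 2000
handoff = clean_for_handoff(raw)
```

Because every agent caps and cleans its own output, the Coordinator's context stays bounded no matter how verbose any single specialist gets.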
Igor had seen what happens when context becomes overloaded. In one case, data ingested in Mandarin caused the agent to fail entirely. The architecture is designed to prevent that from happening.
Reliability: fixing the session state ghost
The worst production failure Datai encountered was Crunchie forgetting context in the middle of a conversation. A thread would be flowing, and then suddenly the agent behaved like it had amnesia.
The cause was subtle. Datai was manually managing session state in Redis while Agno’s PostgreSQL-backed workflow storage maintained its own session layer. Those systems were not synchronized, so state drifted over time.
The fix was to stop fighting the framework and let Agno own workflow state entirely. PostgreSQL-backed storage became the single source of truth, while Redis was limited to conversation history caching. Session state issues dropped to near zero, shifting from a weekly debugging event to something the team barely has to think about.
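The resulting division of labor, with the database as the single writer and Redis demoted to a read-through cache, can be sketched like this (a dict stands in for each backing service):

```python
class ConversationState:
    """Durable store owns the truth; the cache only accelerates reads."""

    def __init__(self):
        self.db = {}       # stand-in for PostgreSQL-backed workflow storage
        self.cache = {}    # stand-in for the Redis conversation-history cache

    def write(self, session_id, state):
        self.db[session_id] = state         # writes go only to the source of truth
        self.cache.pop(session_id, None)    # invalidate, never dual-write

    def read(self, session_id):
        if session_id not in self.cache:
            self.cache[session_id] = self.db.get(session_id, {})
        return self.cache[session_id]

cs = ConversationState()
cs.write("s1", {"turn": 1})
first = cs.read("s1")
cs.write("s1", {"turn": 2})    # invalidation keeps cache and db consistent
second = cs.read("s1")
```

The failure Datai hit came from having two writers; with one writer and cache invalidation, the drift cannot happen.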
What’s next
Crunchie was first made available to the public through Datai's marketing campaign, allowing users to try the first version for free and help improve it. Free access has now ended, and Crunchie may relaunch as a full on-chain companion.
Results
Measurable gains across speed, cost, and reliability
Datai measured clear wins across speed, reliability, and cost.
Note: metrics are based on Datai’s internal production measurements and vary by chain, query complexity, market conditions, and workload.
These numbers don't just look good in a dashboard—they change what's possible. When you can iterate in days, ship safer workflows, and respond in around 20 seconds, you can experiment and improve without constantly paying an infrastructure tax.
The "wow" moment: DeFi Yield Hunter in ~20 seconds
A user asked: "Find me high-yield opportunities for BNB on BSC right now, including risks and recent news." Around 20 seconds later, they had a structured response: ranked opportunities by risk-adjusted yield, current pool numbers, risk context, and relevant news—with direct links to act on it.
The user's reaction: "How did you get all this data so fast? This would've taken me hours."
What surprised Datai in production
A few unexpected wins stood out after shipping on Agno:
- Streaming worked out of the box, and output arrived progressively without custom SSE headaches.
- Session management became almost invisible, with migrations, persistence, and recovery handled automatically.
- Prompt filtering drove more savings than expected: catching and discarding malformed queries early reduced API costs by around 30 percent.
- Step-level tracing transformed debugging. Instead of reconstructing what went wrong from scattered logs, the team could pinpoint the exact step that misfired.

