Gemini 3.1 Pro in practice: 12 cookbooks and a multi-agent investment team

Cosette Cressler
February 24, 2026
6 min read

We got early access to Gemini 3.1 Pro, so we've been putting it through its paces.

Rather than just benchmarking it, we wanted to answer a more practical question: what can you actually build with it right now? So we put together 12 guided cookbooks that take you from a blank slate to production-ready agents. Then we stress-tested the whole stack by building something real.

Gemini 3.1 Pro

What makes Gemini 3.1 Pro notable is the significant boost to core reasoning and problem-solving: it more than doubles performance on advanced logic benchmarks compared to Gemini 3 Pro, and it handles complex, multi-step tasks and multimodal inputs (text, code, and images) more reliably.

It’s essentially a smarter, more capable baseline for challenging workflows and agentic applications, with broader rollout across Google’s AI tools and APIs.

Agent cookbooks: Zero to production

The 12 cookbooks take you from a single tool-using agent to multi-agent teams and step-based workflows. They are designed to be worked through in sequence, each one layering on a new capability.

The setup is intentionally minimal: you need Python and an API key. That's it. Each cookbook is a clean, runnable example that focuses on one core capability, can be run independently, and is loaded with detailed comments so you understand what's happening under the hood, not just what to copy and paste.

All examples use Gemini 3.1 Pro for its strong multi-step reasoning and reliable tool execution. If you want to move faster or reduce costs during development, Gemini 3 Flash is excellent at tool calling and swaps in with a one-line change. In fact, any supported model works; the abstractions are designed so the model is never a lock-in decision.

You start with tools. A Finance Agent wired with YFinanceTools can pull real-time market data and produce an investment brief on NVIDIA in a single prompt. From there, structured output and typed I/O lock down the response format. Instead of parsing freeform text, you receive clean Pydantic models with full type safety on both input and output.
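To make the typed I/O idea concrete, here is a minimal stdlib-only sketch. The cookbooks use Pydantic models; this stand-in uses a dataclass, and the `StockBrief` schema and its fields are hypothetical examples, not the cookbook's actual types.

```python
import json
from dataclasses import dataclass

# Hypothetical response schema: the shape the agent is asked to fill in,
# instead of returning freeform text for you to parse.
@dataclass
class StockBrief:
    ticker: str
    current_price: float
    recommendation: str  # e.g. "buy" | "hold" | "sell"

def parse_brief(raw: str) -> StockBrief:
    """Validate a model's JSON reply against the schema."""
    data = json.loads(raw)
    return StockBrief(
        ticker=str(data["ticker"]),
        current_price=float(data["current_price"]),
        recommendation=str(data["recommendation"]),
    )

# A well-formed reply deserializes into a typed object; a malformed one
# fails loudly at the boundary instead of corrupting downstream logic.
reply = '{"ticker": "NVDA", "current_price": 890.5, "recommendation": "hold"}'
brief = parse_brief(reply)
print(brief.ticker, brief.recommendation)
```

The payoff is the same as in the cookbook: downstream code works with attributes and types, never with string parsing.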

Next comes persistence. The storage cookbook adds a SQLite database so your agent remembers conversations across sessions. You can ask it about NVDA, close the script, come back later, and it will pick up right where you left off. Memory takes this further by extracting and retaining user preferences over time. State management shows how to maintain items such as a personal stock watchlist that persists across interactions.
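The state-management idea reduces to a small pattern: keyed rows in SQLite that outlive the process. This is an illustrative sketch of a persistent watchlist (table name and helpers are made up, not the cookbook's code), using an in-memory database so it runs anywhere; the cookbook points the connection at a file so state survives restarts.

```python
import sqlite3

def add_to_watchlist(conn: sqlite3.Connection, ticker: str) -> None:
    """Persist a ticker; duplicates are silently ignored."""
    conn.execute("CREATE TABLE IF NOT EXISTS watchlist (ticker TEXT PRIMARY KEY)")
    conn.execute("INSERT OR IGNORE INTO watchlist VALUES (?)", (ticker,))
    conn.commit()

def load_watchlist(conn: sqlite3.Connection) -> list[str]:
    """Reload state at the start of a new session."""
    return [row[0] for row in conn.execute("SELECT ticker FROM watchlist ORDER BY ticker")]

# ":memory:" keeps the example self-contained; swap in a file path
# (e.g. "agent_state.db") to persist across script runs.
conn = sqlite3.connect(":memory:")
add_to_watchlist(conn, "NVDA")
add_to_watchlist(conn, "AMD")
print(load_watchlist(conn))  # → ['AMD', 'NVDA']
```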

The knowledge cookbook is where things start to feel real. You load documents into a vector store with hybrid search, which gives your agent the ability to search over your own data instead of relying only on its training set. The custom tools section shows you how to write your own toolkits and extend what the agent can do beyond the 100 built-in options.
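Hybrid search combines lexical matching with embedding similarity. The toy sketch below implements only the keyword half over an in-memory "knowledge base" (the documents and scoring are illustrative, not the cookbook's vector store, which adds embedding similarity on top):

```python
from collections import Counter

# Tiny in-memory "knowledge base"; contents are made up for illustration.
DOCS = {
    "nvda_profile": "NVIDIA designs GPUs and AI accelerators for data centers",
    "semis_sector": "The semiconductor sector is cyclical and capital intensive",
}

def keyword_score(query: str, text: str) -> int:
    """Count overlapping terms; a real hybrid store blends this with
    cosine similarity over embeddings."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())

def search(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by score and return the best matches."""
    ranked = sorted(DOCS, key=lambda d: keyword_score(query, DOCS[d]), reverse=True)
    return ranked[:top_k]

print(search("data centers"))  # → ['nvda_profile']
```

The agent-facing contract is the same either way: the retriever takes a query and returns the most relevant chunks of your own data.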

The final cookbooks address production concerns. Guardrails add input validation to block PII and prevent prompt injection. Human-in-the-loop requires explicit user confirmation before the agent executes sensitive operations. The agent pauses during execution, asks for approval, and proceeds only after you confirm.
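Both production concerns boil down to gates around the agent loop. Here is a minimal sketch of each, assuming regex-based input validation and a callback-style confirmation hook; the patterns and function names are illustrative, not the cookbook's actual guardrail implementation.

```python
import re

# Toy PII patterns; real guardrails cover far more (emails, phones, etc.).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN shape
    re.compile(r"\b\d{16}\b"),             # bare card number
]

def validate_input(prompt: str) -> str:
    """Input guardrail: reject prompts containing likely PII."""
    for pattern in PII_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("Input blocked: possible PII detected")
    return prompt

def execute_trade(order: str, confirm) -> str:
    """Human-in-the-loop: pause and require explicit approval before a
    sensitive operation runs."""
    if not confirm(f"Execute sensitive operation: {order}?"):
        return "cancelled"
    return "executed"

validate_input("What is the outlook for NVDA?")                   # passes
print(execute_trade("SELL 100 NVDA", confirm=lambda msg: True))   # → executed
```

In the real cookbook the confirmation comes from the user mid-run rather than a lambda, but the control flow is the same: validate on the way in, pause before anything irreversible.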

Then comes the payoff: multi-agent teams and workflows. The teams cookbook builds a Bull vs. Bear stock analysis system. A Bull Analyst argues the case for a stock, a Bear Analyst argues against it, and a Lead Analyst synthesizes both perspectives into a balanced recommendation. The system is adversarial by design and produces richer analysis than any single agent could. The workflows cookbook chains these components into a sequential research pipeline with predictable, step-by-step execution.
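The orchestration pattern can be sketched without any LLM at all. Below, the three analysts are stubbed as plain functions (the canned strings are placeholders for model output); what matters is the shape of the pipeline, where each step's output feeds the next:

```python
# Stub "agents": in the cookbook each is an LLM-backed agent; here they are
# plain functions so the team/workflow structure stands on its own.
def bull_analyst(ticker: str) -> str:
    return f"{ticker}: strong AI demand supports upside"

def bear_analyst(ticker: str) -> str:
    return f"{ticker}: valuation and cyclicality are risks"

def lead_analyst(bull_case: str, bear_case: str) -> str:
    """Synthesize the adversarial views into one recommendation."""
    return f"Balanced view. Bull: {bull_case} Bear: {bear_case}"

def research_workflow(ticker: str) -> str:
    """Sequential, step-by-step execution: each output feeds the next step."""
    bull = bull_analyst(ticker)
    bear = bear_analyst(ticker)
    return lead_analyst(bull, bear)

report = research_workflow("NVDA")
print(report)
```

Swapping each stub for an agent backed by Gemini 3.1 Pro turns this skeleton into the cookbook's actual Bull vs. Bear team.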

Here’s the quickstart.

The real test: A multi-agent investment team

To push Gemini 3.1 Pro beyond simple cookbook exercises, we built a full multi-agent investment committee: seven specialized agents managing a simulated $10M fund under real institutional constraints.

Each agent has a distinct role: a Market Analyst tracking macro trends and sector rotation, a Financial Analyst running fundamentals and valuation, a Technical Analyst reading price action and momentum, and a Risk Officer stress-testing downside scenarios and portfolio exposure. A Knowledge Agent manages the research library and memo archive, a Memo Writer produces formal investment write-ups, and a Committee Chair makes final allocation decisions.

This isn’t a toy: the fund must respect large-cap-only exposure, sector concentration limits, position sizing rules, and correlation caps.
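Constraints like these are mechanical checks the committee can run before any allocation is approved. The sketch below shows the idea; the 10% position cap and 30% sector cap are illustrative numbers, not the repo's actual limits.

```python
# Illustrative limits (assumptions, not the repo's configured values).
MAX_POSITION_PCT = 0.10  # single position ≤ 10% of fund
MAX_SECTOR_PCT = 0.30    # sector concentration cap

def check_allocation(portfolio, ticker, sector, amount, fund_size=10_000_000):
    """Reject allocations that break position-size or sector limits."""
    if amount / fund_size > MAX_POSITION_PCT:
        return False, "position size limit exceeded"
    sector_total = sum(p["amount"] for p in portfolio if p["sector"] == sector)
    if (sector_total + amount) / fund_size > MAX_SECTOR_PCT:
        return False, "sector concentration limit exceeded"
    return True, "ok"

# A $900k NVDA buy is within the position cap, but pushes semiconductor
# exposure to 34% of the $10M fund, over the 30% sector cap.
portfolio = [{"ticker": "AMD", "sector": "semis", "amount": 2_500_000}]
print(check_allocation(portfolio, "NVDA", "semis", 900_000))
# → (False, 'sector concentration limit exceeded')
```

In the real system the Risk Officer applies checks like these, and the Committee Chair only allocates within what passes.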

View the code for the multi-agent investment team here.

What makes the architecture interesting

The architecture has three layers of knowledge working together. The fund's mandate and rules are baked directly into the prompt as static constraints. On top of that, a RAG pipeline backed by PgVector lets agents search across company profiles and sector analyses loaded into the knowledge base. As the system operates, a file-based memo archive accumulates past decisions, which gives the committee a form of institutional memory that compounds over time.
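The memo-archive layer is the simplest of the three and easy to sketch: append-only decision records on disk that later sessions read back. This is an illustrative JSONL version (file layout and field names are assumptions, not the repo's format):

```python
import json
import pathlib
import tempfile

def archive_memo(archive_dir: str, ticker: str, decision: dict) -> None:
    """Append a decision memo; later sessions read these back as memory."""
    path = pathlib.Path(archive_dir) / f"{ticker}.jsonl"
    with path.open("a") as f:
        f.write(json.dumps(decision) + "\n")

def past_decisions(archive_dir: str, ticker: str) -> list[dict]:
    """Load the full decision history for a ticker, oldest first."""
    path = pathlib.Path(archive_dir) / f"{ticker}.jsonl"
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines()]

# Two sessions' worth of decisions accumulate into institutional memory.
with tempfile.TemporaryDirectory() as d:
    archive_memo(d, "NVDA", {"action": "buy", "size_pct": 0.05})
    archive_memo(d, "NVDA", {"action": "hold", "size_pct": 0.05})
    history = past_decisions(d, "NVDA")
print(len(history), history[-1]["action"])  # → 2 hold
```

Feeding `history` back into the agents' context at the start of a run is what makes past reasoning compound instead of evaporating between sessions.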

What makes the orchestration interesting is that there is not one fixed way the agents talk to each other. The system selects from five multi-agent architectures—Coordinate, Route, Broadcast, Task, and Workflow—depending on what the query requires. A broad question about semiconductor trends is broadcast to multiple analysts in parallel. A targeted stock lookup is routed directly to the appropriate specialist. A formal investment decision triggers the full committee workflow: Market Assessment → Deep Dive (where the Financial and Technical Analysts run in parallel) → Risk Assessment → Investment Memo → Committee Decision. Each step's output feeds into the next, producing a complete investment memo with a final allocation call.
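A toy dispatcher makes the architecture-selection step concrete. The keyword heuristics below are purely illustrative assumptions; in the real system the model itself decides which mode a query needs.

```python
# Five modes: Coordinate, Route, Broadcast, Task, Workflow.
# (Task is omitted from this toy's heuristics for brevity.)
def select_architecture(query: str) -> str:
    q = query.lower()
    if "decision" in q or "should we invest" in q:
        return "Workflow"   # formal decision → full committee pipeline
    if "trends" in q or "sector" in q:
        return "Broadcast"  # broad question → fan out to analysts in parallel
    if any(t in q for t in ("nvda", "amd", "price")):
        return "Route"      # targeted lookup → one specialist
    return "Coordinate"     # default: chair coordinates the team

print(select_architecture("What are the semiconductor sector trends?"))  # → Broadcast
print(select_architecture("Should we invest in NVDA?"))                  # → Workflow
```

The point of the pattern is that orchestration cost scales with the query: a price lookup never pays for a five-step committee run.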

The part that surprised us most was the institutional learning. Across runs, the system begins to discover its own patterns by surfacing corrections, refining its assumptions, and building on insights from previous sessions. It stops feeling like a question and answer tool and starts behaving more like a team that develops a shared and evolving view of the portfolio and its market environment.

Observability from day one

One thing we did not want to compromise on was visibility. Every cookbook (and the investment team) comes wired with AgentOS for tracing, monitoring, and scheduling right out of the box. You can inspect agent runs step by step, track sessions over time, and schedule recurring tasks without adding extra infrastructure.

Just start the server and connect at os.agno.com. You will see every agent, every tool call, and every decision rendered in a visual interface where you can chat with your agents, explore sessions, and drill into detailed traces.

This matters because most agent projects die in the gap between "cool demo" and "thing I can actually debug at 2am." AgentOS closes that gap.

How Gemini 3.1 Pro actually performs

Here's where it gets honest.

The good: Gemini 3.1 Pro is impressively strong at capturing and retaining learnings. The investment team is where this really became clear. The model synthesized information across agents and sessions in a way that made the memory and knowledge layers feel genuinely useful instead of gimmicky.

The tradeoff: Latency is noticeably high right now. For the investment team use case, where most queries are async research tasks, this is manageable. But if you're building real-time, user-facing agents, it's something to plan around.

Check out the agent quickstart cookbooks and the multi-agent investment team.