Protect your AI agents from PII leaks, prompt injections, and NSFW content using Agno's guardrails.
Guardrails are security checkpoints that validate inputs before they reach your language model, protecting against PII leaks, prompt injection, jailbreaks, and inappropriate content.
Overview
Protect your agents from:
- PII Leaks: SSNs, credit cards, emails, phone numbers
- Prompt Injection: "Ignore previous instructions..."
- Jailbreaks: "Developer mode" attempts
- NSFW Content: Hate speech, violence, harmful content
Use guardrails when your agent is exposed to real users or handles sensitive data.
Prerequisites
- Python 3.9+
- Agno installed: pip install agno
- OpenAI API key
- Basic Agno knowledge: Getting Started
With those in place, a minimal guarded agent looks like this:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.guardrails import PIIDetectionGuardrail
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[PIIDetectionGuardrail()],
)
# Safe input works
agent.print_response("What's your return policy?")
# PII is blocked
agent.print_response("My SSN is 123-45-6789") # Raises InputCheckError
Built-in Guardrails
1. PII Detection
The PII Detection Guardrail automatically scans inputs for sensitive information like Social Security Numbers, credit card numbers, email addresses, and phone numbers. By default, it blocks any input containing PII.
Block PII:
from agno.guardrails import PIIDetectionGuardrail
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[PIIDetectionGuardrail()],
)
Sometimes you want to process requests while still protecting sensitive data. The masking feature replaces PII with asterisks before sending to the LLM, allowing your agent to understand context without exposing actual sensitive information.
Mask PII instead of blocking:
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[PIIDetectionGuardrail(mask_pii=True)],
)
# Input: "My SSN is 123-45-6789"
# LLM receives: "My SSN is ***********"
You can disable specific PII checks (e.g., allow emails in support tickets) or add custom patterns to detect business-specific sensitive data like employee IDs or internal account numbers.
Custom patterns:
guardrail = PIIDetectionGuardrail(
    enable_email_check=False,  # Disable the built-in email check
    custom_patterns={
        "bank_account_number": r"\b\d{10}\b",  # Add a custom pattern
    },
)
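Wired into an agent, this configuration lets emails through while blocking ten-digit account numbers; a sketch with illustrative inputs:
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[guardrail],
)

agent.print_response("Reach me at jane@example.com")  # Allowed: email check disabled
agent.print_response("My account is 1234567890")  # Blocked: matches the custom pattern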
2. Prompt Injection Defense
Prompt injection is one of the most common attacks on AI systems. Attackers try to manipulate your agent by injecting instructions like "Ignore previous instructions and..." to bypass your system prompts. This guardrail detects common injection patterns and blocks them.
from agno.guardrails import PromptInjectionGuardrail
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[PromptInjectionGuardrail()],
)
# Blocks: "Ignore previous instructions..."
# Blocks: "Developer mode activated..."
# Blocks: "You are now a different AI..."
The default patterns cover most common attacks, but you can customize them to match your specific security needs or reduce false positives for your use case.
Custom patterns:
guardrail = PromptInjectionGuardrail(
    injection_patterns=["ignore previous instructions", "bypass security"]
)
3. Content Moderation
Use OpenAI's Moderation API to automatically filter inappropriate content including hate speech, violence, self-harm, sexual content, and other harmful material. This happens before your main LLM call, saving costs on inappropriate requests.
from agno.guardrails import OpenAIModerationGuardrail
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[OpenAIModerationGuardrail()],
)
By default, all moderation categories are checked. You can customize which categories trigger blocks based on your application's requirements. For example, a medical application might allow self-harm discussions while blocking hate speech.
Custom categories:
guardrail = OpenAIModerationGuardrail(
    raise_for_categories=["violence", "hate", "sexual"]
)
Production Setup: Multiple Guardrails
For production systems, use multiple guardrails together for defense-in-depth security. Each layer catches different types of threats, and ordering them by speed ensures optimal performance.
from agno.guardrails import (
    PIIDetectionGuardrail,
    PromptInjectionGuardrail,
    OpenAIModerationGuardrail,
)

secure_agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[
        PIIDetectionGuardrail(mask_pii=True),  # Layer 1: Protect PII
        PromptInjectionGuardrail(),            # Layer 2: Stop attacks
        OpenAIModerationGuardrail(),           # Layer 3: Filter content
    ],
)
Performance tip: when every guardrail simply blocks input, order them by speed so cheap checks fail fast (the example above runs PII masking first so that later layers and the LLM only see masked input):
- PromptInjectionGuardrail (fast regex checks)
- PIIDetectionGuardrail (regex + validation)
- OpenAIModerationGuardrail (external API call)
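A blocking-only setup ordered this way might look like the following sketch (PII detection here blocks rather than masks):
fast_first_agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[
        PromptInjectionGuardrail(),   # Fast regex checks run first
        PIIDetectionGuardrail(),      # Regex plus validation
        OpenAIModerationGuardrail(),  # External API call runs last
    ],
)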
Custom Guardrails
Create custom guardrails for business-specific rules by extending BaseGuardrail. This example blocks any input containing URLs, useful for applications where you don't want users sharing external links.
import re

from agno.exceptions import CheckTrigger, InputCheckError
from agno.guardrails import BaseGuardrail
from agno.run.agent import RunInput


class URLGuardrail(BaseGuardrail):
    """Block inputs containing URLs."""

    def check(self, run_input: RunInput) -> None:
        if isinstance(run_input.input_content, str):
            url_pattern = r"https?://[^\s]+|www\.[^\s]+"
            if re.search(url_pattern, run_input.input_content):
                raise InputCheckError(
                    "URLs are not allowed.",
                    check_trigger=CheckTrigger.INPUT_NOT_ALLOWED,
                )

    async def async_check(self, run_input: RunInput) -> None:
        self.check(run_input)  # Reuse the sync logic for async runs
You must implement both check() (sync) and async_check() (async) methods. Agno automatically uses the right one based on whether you call .run() or .arun().
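Attach a custom guardrail exactly like the built-in ones; a minimal sketch reusing the setup from the quick-start example:
agent = Agent(
    model=OpenAIChat(id="gpt-5-mini"),
    pre_hooks=[URLGuardrail()],
)

agent.print_response("Summarize https://example.com")  # Raises InputCheckError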
Learn more: BaseGuardrail Reference
Result
Your agent now has:
- PII Protection: Detect and mask sensitive information
- Injection Defense: Block malicious prompts automatically
- Content Filtering: Stop NSFW and harmful content
- Custom Rules: Enforce business-specific validation
Next Steps
- Guardrails for Teams - Multi-agent security
- Guardrails Overview - Detailed concepts
- Cookbook - More patterns
- Deploy with AgentOS - Production deployment

