What Are AI Guardrails?

The complete guide to constraining AI behavior — from content filtering to execution boundaries.

AI guardrails are safety mechanisms designed to constrain AI system behavior within acceptable boundaries. The term is used broadly to describe very different technical approaches — from input filtering to execution governance. Understanding the distinctions is critical because different guardrail types solve different problems.

Four Types of AI Guardrails

1. Input Guardrails — filter malicious or inappropriate inputs before they reach the model: detect prompt injection, block PII, enforce topic boundaries.
2. Output Guardrails — filter model outputs for harmful, inaccurate, or policy-violating content: catch hallucinations, block toxic content, enforce formatting.
3. Behavioral Guardrails — constrain what the model can say or how it responds: system prompts, Constitutional AI, RLHF.
4. Execution Guardrails — control what agents can do: validate tool calls, enforce action policies, verify system state before execution.

Most "guardrail" tools focus on types 1-3. Type 4 is the missing layer.
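To make type 1 concrete, here is a minimal input-guardrail sketch. It is illustrative only: the function name `check_input`, the injection phrases, and the PII pattern are all assumptions, not part of any real guardrail library, and production systems use far more robust detectors.

```python
import re

# Hypothetical input guardrail: reject obvious prompt-injection phrases
# and mask email addresses before the prompt reaches the model.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard your system prompt",
]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def check_input(prompt: str) -> tuple[bool, str]:
    """Return (allowed, sanitized_prompt)."""
    lowered = prompt.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return False, prompt              # reject the prompt outright
    sanitized = EMAIL_RE.sub("[EMAIL]", prompt)  # mask PII instead of rejecting
    return True, sanitized
```

Note the two distinct policies: injection attempts are rejected, while PII is masked so the request can still proceed.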

The Guardrails Landscape

Guardrails AI focuses on output validation — checking model responses against predefined validators. NeMo Guardrails controls dialog flow — managing what the model can say in a conversation. Lakera detects prompt injection and PII in inputs. Rebuff provides prompt injection detection. These tools address important problems, but none operate at the execution boundary — the point where agent actions become real-world state changes.

Why Execution Guardrails Are Different

Content guardrails (types 1-3) are about what the model says. Execution guardrails (type 4) are about what the model does. A model can produce perfectly safe text and still execute a destructive function call. The function call is syntactically correct, passes schema validation, and follows all output formatting rules — but it deletes a production database. This is why execution governance is a separate discipline from content moderation.
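The distinction can be sketched in a few lines. The example below is a hypothetical execution guardrail, not a real library API: a tool call that is syntactically valid and passes schema checks is still validated against an action policy before anything runs. The tool names and deny rules are assumptions for illustration.

```python
# Hypothetical execution guardrail: validate the *action*, not the text.
# A well-formed tool call can still be destructive, so it is checked
# against a policy at the execution boundary.
DENY_SQL = ("drop ", "truncate ", "delete from ")

def allow_tool_call(tool: str, args: dict) -> bool:
    """Return True only if the agent's tool call satisfies the action policy."""
    if tool == "run_sql":
        sql = args.get("query", "").lower()
        if any(keyword in sql for keyword in DENY_SQL):
            return False                      # destructive SQL is blocked
    if tool == "http_request" and args.get("method", "GET") != "GET":
        return False                          # state-changing API calls need review
    return True
```

A content filter never sees a problem here: the text around the call can be perfectly polite while `{"tool": "run_sql", "query": "DROP TABLE users"}` destroys production data.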

Building a Complete Guardrail Stack

Defense in depth requires all four types: input guardrails to block malicious inputs, output guardrails to catch harmful content, behavioral guardrails to shape model intent, and execution guardrails to govern agent actions. Exogram provides the execution layer — the final gate between agent reasoning and tool execution. It works alongside, not instead of, content-level guardrails.
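The layering above can be sketched as a single agent turn in which each layer can veto. Everything here is a toy stand-in: the guard functions, the `model` callable returning a reply plus an optional tool call, and the tool name `delete_db` are all hypothetical, chosen only to show where each of the four layers sits.

```python
# Toy defense-in-depth pipeline. Each guard is a stand-in for a real
# guardrail system; any layer can stop the turn.
def input_guard(prompt: str) -> bool:
    return "ignore previous instructions" not in prompt.lower()

def output_guard(reply: str) -> bool:
    return "password" not in reply.lower()

def execution_guard(call: dict) -> bool:
    return call.get("tool") != "delete_db"

def guarded_turn(prompt, model, executor):
    """Run one agent turn through all four guardrail layers.
    Layer 3 (behavioral) lives inside the model's system prompt,
    so it has no explicit check here."""
    if not input_guard(prompt):                       # layer 1: input
        return "blocked: input"
    reply, tool_call = model(prompt)                  # layer 3 shapes the model
    if not output_guard(reply):                       # layer 2: output
        return "blocked: output"
    if tool_call is not None:
        if not execution_guard(tool_call):            # layer 4: execution
            return "blocked: execution"
        executor(tool_call)                           # only now does state change
    return reply
```

The key design point is ordering: the execution check runs last, after the model has committed to an action, because it is the only layer that sees the concrete tool call rather than text.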

Frequently Asked Questions

Are guardrails enough to make AI safe?

Content guardrails alone are not sufficient for agents with tool-use capabilities. You also need execution governance — controlling what agents can do, not just what they say.

What is the difference between Guardrails AI and Exogram?

Guardrails AI validates model outputs (text content). Exogram validates agent actions (tool calls, database writes, API requests). Different layers, different problems. They are complementary.

Can I use multiple guardrail systems together?

Yes, and you should. Defense in depth means layering input filtering, output validation, behavioral constraints, and execution governance. Exogram operates at the execution layer and integrates with any model or framework.