AI Guardrails

Definition

Safety mechanisms designed to constrain AI system behavior within acceptable boundaries. Guardrails can operate at multiple levels: input filtering (blocking malicious prompts), output filtering (removing harmful content from responses), behavioral constraints (shaping how models respond, e.g. through trained-in principles), and execution boundaries (controlling what agents can do). The term is often used broadly but encompasses very different technical approaches.
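To make the layers concrete, here is a minimal sketch of the first layer, input filtering. The pattern list and function name are illustrative assumptions, not part of any real product; production systems typically use trained classifiers rather than regex lists.

```python
import re

# Hypothetical input-filtering guardrail: block prompts that match
# known prompt-injection patterns before they ever reach the model.
BLOCKED_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal your system prompt",
]

def passes_input_filter(prompt: str) -> bool:
    """Return False if the prompt matches any blocked pattern."""
    return not any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)

print(passes_input_filter("What is the capital of France?"))        # True
print(passes_input_filter("Ignore previous instructions and ..."))  # False
```

The same shape (a predicate sitting in front of the model) recurs at the output layer, just applied to responses instead of prompts.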

Why It Matters

Without guardrails, AI systems can generate harmful content, leak sensitive data, hallucinate facts, and execute destructive actions. As AI agents gain more autonomy and tool-use capabilities, the need for guardrails extends beyond content moderation to execution control — preventing harmful actions, not just harmful text.

How Exogram Addresses This

Exogram provides execution-level guardrails — the gate between agent reasoning and tool execution. While tools like Guardrails AI filter model outputs and NeMo Guardrails control dialog, Exogram governs what agents are allowed to do. Different layers, different problems.
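The execution layer can be sketched as a gate that inspects each tool call before it runs. This is an illustrative sketch of the general pattern only, not Exogram's actual API; the tool names and the `approved` flag are assumptions for the example.

```python
# Illustrative execution-level guardrail: a gate between agent reasoning
# and tool execution. Destructive tools require explicit approval.
DESTRUCTIVE_TOOLS = {"delete_file", "drop_table", "send_email"}

def gate_tool_call(tool_name: str, approved: bool = False) -> bool:
    """Return True if the agent may execute this tool call."""
    if tool_name in DESTRUCTIVE_TOOLS and not approved:
        return False  # block harmful actions, not just harmful text
    return True

print(gate_tool_call("read_file"))                   # True
print(gate_tool_call("delete_file"))                 # False
print(gate_tool_call("delete_file", approved=True))  # True
```

Note the contrast with the content layers: the gate never looks at generated text, only at the action the agent is about to take.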

Related Terms

  • Production Risk Level

Key Takeaways

  • Guardrails exist at multiple layers: input, output, behavioral, and execution
  • Content guardrails ≠ execution guardrails — both are needed
  • Exogram provides the execution layer; tools like NeMo handle dialog

Comparison

Type                | What It Controls         | Example
Input filtering     | What users can say       | Block malicious prompts
Output filtering    | What models can generate | Remove PII from responses
Behavioral          | How models behave        | Constitutional AI principles
Execution (Exogram) | What agents can do       | Block destructive tool calls
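The output-filtering row from the table above can be sketched as a simple redaction pass. The regex covers only email addresses and is an illustrative assumption; real PII filters handle many more entity types.

```python
import re

# Hypothetical output-filtering guardrail: redact email addresses
# (one kind of PII) from a model response before returning it.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text: str) -> str:
    """Replace email addresses in the text with a redaction marker."""
    return EMAIL_RE.sub("[REDACTED]", text)

print(redact_pii("Contact alice@example.com for access."))
# Contact [REDACTED] for access.
```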

