AI Guardrails
Definition
Safety mechanisms designed to constrain AI system behavior within acceptable boundaries. Guardrails operate at multiple levels: input filtering (blocking malicious prompts), output filtering (removing harmful content from responses), behavioral constraints (shaping how models respond, e.g. through trained-in principles), and execution boundaries (controlling what actions agents can take). The term is used broadly but encompasses very different technical approaches.
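The layers above can be sketched as checks wrapped around a single model call. This is a minimal illustration, assuming a generic `model` callable; the pattern lists, function names, and tool allowlist are hypothetical, not from any specific library.

```python
# Illustrative guardrails at three layers around one model call.
# All patterns and names here are toy examples.

BLOCKED_INPUT_PATTERNS = ["ignore previous instructions"]   # input filtering
SENSITIVE_PATTERNS = ["ssn:", "credit card"]                # output filtering
ALLOWED_TOOLS = {"search", "read_file"}                     # execution boundary

def guarded_call(prompt, model, requested_tool=None):
    # 1. Input filtering: reject prompts matching known attack patterns
    if any(p in prompt.lower() for p in BLOCKED_INPUT_PATTERNS):
        return "[blocked: suspicious input]"
    # 2. Execution boundary: deny tool requests outside the allowlist
    if requested_tool is not None and requested_tool not in ALLOWED_TOOLS:
        return f"[blocked: tool '{requested_tool}' not permitted]"
    response = model(prompt)
    # 3. Output filtering: redact responses containing sensitive markers
    if any(p in response.lower() for p in SENSITIVE_PATTERNS):
        return "[redacted: sensitive content]"
    return response
```

Note that each check fails independently: a prompt can pass the input filter yet still have its tool call or output blocked, which is why the layers are complementary rather than redundant.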
Why It Matters
Without guardrails, AI systems can generate harmful content, leak sensitive data, present hallucinations as fact, and execute destructive actions. As AI agents gain more autonomy and tool-use capabilities, the need for guardrails extends beyond content moderation to execution control — preventing harmful actions, not just harmful text.
How Exogram Addresses This
Exogram provides execution-level guardrails — the gate between agent reasoning and tool execution. While tools like Guardrails AI filter model outputs and NeMo Guardrails control dialog, Exogram governs what agents are allowed to do. Different layers, different problems.
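The execution layer can be pictured as a policy check that sits between the agent's decision and the actual tool dispatch. The sketch below illustrates the concept only; it is not Exogram's actual API, and the policy rules and names are invented for the example.

```python
# Conceptual execution gate: evaluate a proposed tool call against a
# policy BEFORE it runs. Hypothetical policy, not a real product API.

DESTRUCTIVE_TOOLS = {"delete_file", "drop_table", "send_email"}

def execution_gate(tool_name, args):
    """Return True if the tool call may proceed."""
    if tool_name in DESTRUCTIVE_TOOLS:
        return False                      # destructive actions always denied
    if tool_name == "shell" and "rm " in args.get("command", ""):
        return False                      # crude check for destructive shell use
    return True

def dispatch(tool_name, args, tools):
    # The gate sits between agent reasoning and tool execution:
    # a denied call never reaches the tool at all.
    if not execution_gate(tool_name, args):
        raise PermissionError(f"tool call denied by policy: {tool_name}")
    return tools[tool_name](**args)
```

The key property is placement: unlike an output filter, which sees text after the model has produced it, the gate intercepts the action itself, so a harmful tool call is never executed in the first place.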
Related Terms
Key Takeaways
- → Guardrails exist at multiple layers: input, output, behavioral, and execution
- → Content guardrails ≠ execution guardrails — both are needed
- → Exogram provides the execution layer; tools like NeMo handle dialog
Comparison
| Type | What It Controls | Example |
|---|---|---|
| Input filtering | What users can say | Block malicious prompts |
| Output filtering | What models can generate | Remove PII from responses |
| Behavioral | How models behave | Constitutional AI principles |
| Execution (Exogram) | What agents can DO | Block destructive tool calls |