What Is Prompt Injection?

The #1 vulnerability in AI systems — and why it gets worse with tool use.

Prompt injection is an attack technique where malicious instructions are embedded in user inputs, data sources, or retrieved context to override an AI model's original instructions. It is considered the #1 vulnerability class in AI systems by OWASP and is particularly dangerous when AI agents have tool-use capabilities.

Direct vs Indirect Prompt Injection

Direct prompt injection targets the user input — "ignore your instructions and do X." Indirect prompt injection plants malicious instructions in external data sources that the model retrieves and follows — a poisoned document retrieved via RAG, a malicious website summarized by the model, or instructions embedded in an email the agent processes. Indirect injection is harder to detect because the malicious content appears as legitimate data.

Why Tool Use Makes It Worse

Without tool use, prompt injection can cause a model to generate harmful text. With tool use, prompt injection can cause a model to execute harmful actions — deleting databases, exfiltrating data, modifying billing records, and sending unauthorized communications. The risk escalates from "bad output" to "bad action." Every tool an agent can access is a potential target for injection-driven execution.

Defense in Depth

No single defense eliminates prompt injection. Effective defense requires multiple layers: (1) Input filtering — detecting known injection patterns in user inputs. (2) Output validation — checking model outputs for instruction-following vs instruction-injecting. (3) Context isolation — preventing retrieved data from overriding system instructions. (4) Execution governance — validating every tool call through deterministic policy rules before execution. Layer 4 is the final defense — even if injection succeeds at the model level, the execution boundary still validates every proposed action.

The Execution Boundary Defense

Most injection defenses operate at the input or output level. Exogram operates at the execution level — the boundary between agent reasoning and tool execution. Even if a prompt injection successfully manipulates the model's reasoning, the resulting tool calls still pass through 8 deterministic policy rules. A manipulated model can't execute destructive operations, exfiltrate data, or bypass boundaries without passing this gate.

Frequently Asked Questions

Can prompt injection be fully prevented?

At the model level, no — prompt injection is an inherent property of instruction-following models. But at the execution level, its impact can be eliminated. Even if injection succeeds and the model is manipulated, the execution boundary blocks unauthorized actions.

What is the difference between prompt injection and jailbreaking?

Jailbreaking bypasses safety training to generate restricted content. Prompt injection hijacks the model's behavior to follow attacker instructions. Jailbreaking is about what the model says; prompt injection is about what the model does.

How does Exogram defend against prompt injection?

Exogram detects injection patterns in tool call payloads as one of its 8 policy rules. But more importantly, even if injection succeeds at the model level, every proposed action is validated through deterministic policy evaluation before execution. A manipulated model still can't bypass the execution boundary.