Prompt Injection

Definition

An attack technique where malicious instructions are embedded in user inputs, data sources, or retrieved context to override an AI model's original instructions. Direct prompt injection targets the user's input. Indirect prompt injection plants malicious instructions in external data sources (websites, documents, emails) that the model retrieves and follows.

Why It Matters

Prompt injection can cause AI agents to: bypass safety filters, exfiltrate sensitive data, execute unauthorized tool calls, modify system configurations, and ignore established constraints. As AI agents gain tool-use capabilities, prompt injection becomes a direct attack vector for code execution, database manipulation, and unauthorized API access.

How Exogram Addresses This

Exogram detects prompt injection patterns in tool call payloads as one of its 8 policy rules. But more importantly, even if injection succeeds at the model level, the execution boundary still validates every proposed action — a manipulated model still can't execute destructive operations without passing deterministic policy evaluation.

critical severityProduction Risk Level

Key Takeaways

→ Prompt injection is the #1 AI vulnerability — and it gets worse with tool use
→ Filtering inputs reduces risk but cannot eliminate it
→ Execution governance (Exogram) is the last line of defense when injection succeeds
→ Indirect injection through RAG/data sources is the emerging threat vector

Governance Checklist

0/6 — Vulnerable

Agent tool calls are validated independently of model intentSystem prompts are isolated from user-controlled dataRetrieved documents are treated as untrusted inputDestructive operations require additional verification beyond model reasoningInjection detection is layered (input filtering + execution governance)Red team testing includes indirect injection via RAG pipelines

Quick Assessment

1. Which type of prompt injection is harder to detect?

Frequently Asked Questions

Try the Proving Ground 2-Minute Quickstart →