Prompt Injection
Definition
An attack technique where malicious instructions are embedded in user inputs, data sources, or retrieved context to override an AI model's original instructions. Direct prompt injection targets the user's input. Indirect prompt injection plants malicious instructions in external data sources (websites, documents, emails) that the model retrieves and follows.
Why It Matters
Prompt injection can cause AI agents to: bypass safety filters, exfiltrate sensitive data, execute unauthorized tool calls, modify system configurations, and ignore established constraints. As AI agents gain tool-use capabilities, prompt injection becomes a direct attack vector for code execution, database manipulation, and unauthorized API access.
How Exogram Addresses This
Exogram detects prompt injection patterns in tool call payloads as one of its 8 policy rules. But more importantly, even if injection succeeds at the model level, the execution boundary still validates every proposed action — a manipulated model still can't execute destructive operations without passing deterministic policy evaluation.
Related Terms
Key Takeaways
- → Prompt injection is the #1 AI vulnerability — and it gets worse with tool use
- → Filtering inputs reduces risk but cannot eliminate it
- → Execution governance (Exogram) is the last line of defense when injection succeeds
- → Indirect injection through RAG/data sources is the emerging threat vector
Governance Checklist
0/6 — VulnerableQuick Assessment
1. Which type of prompt injection is harder to detect?