Constitutional AI

Definition

A training methodology developed by Anthropic in which AI models are trained to follow a set of principles (a "constitution") that guides their behavior. During training, the model critiques and revises its own outputs according to these principles. Constitutional AI shapes model behavior through training-time alignment: it reduces the probability of harmful outputs but does not eliminate them.
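
The critique-and-revise step can be sketched roughly as follows. This is a simplified illustration, not Anthropic's training code: `Model` is a hypothetical stand-in for any prompt-to-text call, the prompt wording is invented for the example, and the published method also includes a reinforcement learning stage that this sketch omits.

```python
from typing import Callable

# Hypothetical stand-in for a language model call: prompt in, text out.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: list[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        # Ask the model to judge its own draft against one principle.
        critique = model(
            f"Critique this response against the principle: {principle}\n\n{response}"
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = model(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\n\nResponse: {response}"
        )
    # In training, revised responses like this one would serve as fine-tuning data.
    return response
```

Each principle costs two extra model calls, and nothing in the loop verifies that the final revision actually satisfies the principle, which is one way to see why the result is a reduced probability of harm rather than a guarantee.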

Why It Matters

Constitutional AI is a significant advance in AI safety, but its effect is probabilistic, not guaranteed. The constitution shapes intent, but it cannot enforce boundaries. A constitutionally trained model can still hallucinate schemas, forget constraints, and propose destructive mutations. Training-time alignment is necessary but not sufficient for production safety.

How Exogram Addresses This

Constitutional AI shapes intent. Exogram enforces boundaries. One is training. The other is infrastructure. They are complementary: use Constitutional AI to reduce harmful intent, and use Exogram to prevent harmful actions. Intent shaping + execution governance = defense in depth.
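
To make the distinction concrete, here is a minimal sketch of a deterministic check that runs on a model-proposed action before it executes. It is not Exogram's API: `Action`, `ALLOWED_OPERATIONS`, `KNOWN_TABLES`, and `check` are hypothetical names chosen for illustration, and a real policy would be far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A model-proposed operation (hypothetical shape, for illustration)."""
    operation: str  # e.g. "select", "update", "drop"
    table: str


# Assumed policy: only these operations on these tables may run.
ALLOWED_OPERATIONS = {"select", "insert", "update"}
KNOWN_TABLES = {"users", "orders"}


def check(action: Action) -> tuple[bool, str]:
    """Return (allowed, reason); the same action always gets the same verdict."""
    if action.table not in KNOWN_TABLES:
        # Catches a hallucinated schema reference.
        return False, f"unknown table: {action.table}"
    if action.operation not in ALLOWED_OPERATIONS:
        # Catches a destructive mutation such as "drop".
        return False, f"operation not allowed: {action.operation}"
    return True, "ok"
```

For example, `check(Action("drop", "users"))` is rejected however the model was trained, because the verdict depends only on the action and the policy, not on the model's intent.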

Production Risk Level: medium severity

Key Takeaways

  • Constitutional AI is a training-time layer within the broader AI governance landscape
  • Production AI requires multiple layers of protection
  • Deterministic enforcement returns the same verdict for the same action every time, unlike probabilistic training-time alignment

Frequently Asked Questions