LLM-as-Judge
Definition
A pattern where one language model is used to evaluate the output of another language model. The "judge" model checks for quality, accuracy, safety, or policy compliance. While convenient, LLM-as-judge has fundamental limitations: the judge model can itself hallucinate, has inherent error rates, is susceptible to adversarial inputs, and provides probabilistic rather than deterministic decisions.
Why It Matters
Using a probabilistic system to validate another probabilistic system creates compound uncertainty. If the producer model errs 5% of the time and the judge errs 5% of the time, the chance that at least one stage errs rises to nearly 10% even when they fail independently — and in practice the two models often share training data and failure modes, so the judge is least reliable on exactly the outputs where the producer fails, and bad outputs slip through at rates well above the naive product. For safety-critical applications — database writes, billing modifications, compliance actions — probabilistic validation is insufficient.
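The compounding can be sketched with a few lines of arithmetic. The 5% rates below are illustrative, and the independence assumption is a best case; correlated failures only make the numbers worse.

```python
# Illustrative error rates (assumptions, not measurements of any real system).
producer_error = 0.05   # producer emits a bad output 5% of the time
judge_error = 0.05      # judge misjudges any given output 5% of the time

# Probability that at least one stage errs, assuming independence:
any_stage_errs = 1 - (1 - producer_error) * (1 - judge_error)
print(f"{any_stage_errs:.4f}")  # 0.0975 -- nearly double either rate alone

# Probability a bad output is also approved (independent best case);
# correlated failure modes push this figure back toward producer_error:
undetected_bad = producer_error * judge_error
print(f"{undetected_bad:.4f}")  # 0.0025
```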
How Exogram Addresses This
Exogram replaces LLM-as-judge with deterministic, code-based policy evaluation: no probability, no error rate, no hallucination risk. The policy engine runs Python logic gates, not model inference, and evaluates in roughly 0.07 ms versus 50-200 ms for LLM-based validation. LLM-as-judge is using a slot machine to guard a bank vault; Exogram is the vault door.
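A minimal sketch of what a deterministic policy gate looks like — plain Python predicates over the proposed action. All names, rules, and thresholds here are hypothetical illustrations, not Exogram's actual API:

```python
# Hypothetical policy gate: pure Python logic, no model inference.
# Same input always yields the same decision -- zero error rate by construction.
ALLOWED_TABLES = {"orders", "inventory"}   # illustrative allow-list
MAX_REFUND_USD = 500.00                    # illustrative threshold

def check_db_write(table: str, amount_usd: float) -> bool:
    """Approve a database write only if every rule passes."""
    if table not in ALLOWED_TABLES:
        return False                       # table is not allow-listed
    if amount_usd > MAX_REFUND_USD:
        return False                       # amount exceeds the policy cap
    return True

print(check_db_write("orders", 120.0))     # True
print(check_db_write("billing", 120.0))    # False: table not allow-listed
print(check_db_write("orders", 900.0))     # False: amount over cap
```

Because the gate is ordinary code, its decisions are auditable, testable, and identical on every run — the properties a judge model cannot guarantee.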
Key Takeaways
- This concept is part of the broader AI governance landscape
- Production AI requires multiple layers of protection
- Deterministic enforcement provides zero-error-rate guarantees