Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
LLM Hallucination refers to the phenomenon where large language models generate content that is plausible-sounding but factually incorrect, internally inconsistent, or unfaithful to provided context. The comprehensive survey by Huang et al. (2023) establishes a systematic taxonomy of hallucination types, causes, detection methods, and mitigation strategies across the full LLM development lifecycle.
As LLMs are deployed in high-stakes applications (medicine, law, finance), hallucination represents a critical reliability challenge. Unlike traditional NLP errors that are often obviously wrong, LLM hallucinations are fluent and confident, making them particularly dangerous. The survey provides a unified framework for understanding and addressing this problem.
The survey distinguishes two primary categories:

- **Factuality hallucination**: output that contradicts, or cannot be verified against, real-world facts.
- **Faithfulness hallucination**: output that is inconsistent with the user's instructions or the provided input context.

These can be further classified as intrinsic (contradicting knowledge the model was exposed to) or extrinsic (claims about facts that cannot be verified from that knowledge at all).
The survey identifies causes across three levels of the LLM development cycle: data-related causes (flawed, biased, or outdated training corpora and knowledge gaps), training-related causes (architectural limitations, exposure bias, and misaligned training objectives), and inference-related causes (imperfect decoding strategies such as sampling randomness).
Hallucination can be formally characterized as a divergence between generated text $y$ and a reference knowledge set $K$:
$$\text{Hallucination}(y) = \{c \in \text{Claims}(y) : c \notin K \lor \neg\text{Verify}(c, K)\}$$
For faithfulness, the reference is the input context $x$:
$$\text{Faithfulness}(y, x) = \frac{|\{c \in \text{Claims}(y) : \text{Entailed}(c, x)\}|}{|\text{Claims}(y)|}$$
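In code, the faithfulness ratio reduces to counting entailed claims. The sketch below assumes claims have already been extracted, and substitutes a deliberately naive substring check for a real entailment model; `faithfulness` and `naive_entails` are illustrative names, not from the survey.

```python
def faithfulness(claims, context, entails):
    """Fraction of extracted claims entailed by the input context.

    `claims` is a list of claim strings, `context` the source text, and
    `entails(claim, context)` any entailment predicate (in practice an
    NLI model; here it can be any callable).
    """
    if not claims:
        return 1.0  # vacuously faithful: nothing was claimed
    supported = sum(1 for c in claims if entails(c, context))
    return supported / len(claims)

# Toy entailment check: a claim is "entailed" if it appears verbatim.
naive_entails = lambda claim, ctx: claim.lower() in ctx.lower()

ctx = "Paris is the capital of France. It lies on the Seine."
claims = ["paris is the capital of france", "paris has 10 million residents"]
print(faithfulness(claims, ctx, naive_entails))  # 0.5
```

A score of 1.0 means every claim is supported; anything below flags potential faithfulness hallucination in the terms of the equation above.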
External knowledge sources verify model outputs against factual databases. The model's claims are extracted, relevant documents retrieved, and each claim checked for support. Limitations include incomplete knowledge bases and retrieval errors.
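A minimal sketch of this verification loop follows, with a word-overlap heuristic standing in for both the retriever and the support check; the function name and threshold are illustrative assumptions, and a production system would use dense retrieval plus an NLI model instead.

```python
def verify_against_kb(claims, kb_docs, overlap_threshold=0.8):
    """Check each extracted claim against reference documents.

    `kb_docs` stands in for documents returned by a retriever; a claim is
    marked supported when at least `overlap_threshold` of its words occur
    in a single document. This word-overlap check is a crude placeholder
    for a proper entailment model.
    """
    results = {}
    for claim in claims:
        words = set(claim.lower().split())
        best = max(
            len(words & set(doc.lower().split())) / len(words)
            for doc in kb_docs
        )
        results[claim] = best >= overlap_threshold
    return results

kb = ["The Eiffel Tower was completed in 1889 and stands in Paris."]
claims = [
    "the eiffel tower was completed in 1889",
    "the eiffel tower is in london",
]
print(verify_against_kb(claims, kb))
```

The limitations noted above show up directly here: a claim absent from the knowledge base is flagged as unsupported even when true.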
Expert annotation remains the gold standard but is expensive and does not scale; it is used primarily for benchmark creation and method validation.
```python
def detect_hallucination_self_consistency(query, client, n_samples=5):
    """Detect potential hallucinations via self-consistency checking.

    Samples several responses to the same query, extracts factual claims
    from each, then asks the model to flag claims that do not appear
    consistently across samples.
    """
    # Sample multiple responses at non-zero temperature.
    responses = []
    for _ in range(n_samples):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": query}],
            temperature=0.7,
        )
        responses.append(resp.choices[0].message.content)

    # Extract claims from each response.
    claims_per_response = []
    for resp in responses:
        extract_prompt = (
            f"Extract all factual claims from this text as a numbered list:\n{resp}"
        )
        claims_raw = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": extract_prompt}],
            temperature=0,
        ).choices[0].message.content
        claims_per_response.append(claims_raw)

    # Check consistency across samples.
    check_prompt = (
        "Given these claim sets from multiple responses to the same query, "
        "identify claims that appear inconsistently (potential hallucinations):\n\n"
        + "\n---\n".join(claims_per_response)
    )
    inconsistencies = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": check_prompt}],
        temperature=0,
    ).choices[0].message.content

    return {"responses": responses, "inconsistencies": inconsistencies}
```
Training reward models to penalize hallucinated content. Effective but can introduce its own biases – models may learn to hedge rather than be accurate.
Grounding model outputs in retrieved documents reduces factual hallucination by providing explicit evidence. However, RAG alone is insufficient when retrieval quality is poor or the question requires reasoning beyond retrieved facts.
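As a rough illustration of the grounding step, the sketch below assembles a retrieval-augmented prompt: word overlap serves as a stand-in retriever, and the template includes an explicit abstention instruction. The function name and prompt wording are assumptions for this example, not from the survey.

```python
def build_grounded_prompt(question, documents, top_k=2):
    """Assemble a retrieval-augmented prompt that grounds the answer.

    Ranks documents by word overlap with the question (a placeholder for
    a real sparse/dense retriever) and instructs the model to answer only
    from the retrieved passages, abstaining otherwise.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    context = "\n\n".join(f"[{i+1}] {d}" for i, d in enumerate(ranked[:top_k]))
    return (
        "Answer using ONLY the passages below. If they do not contain "
        'the answer, say "I don\'t know."\n\n'
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    "The Louvre is a museum in Paris.",
    "The Eiffel Tower was completed in 1889.",
    "Paris is the capital of France.",
]
prompt = build_grounded_prompt("When was the Eiffel Tower completed?", docs)
print(prompt)
```

The abstention instruction matters as much as the retrieved context: it gives the model a sanctioned alternative to fabricating an answer when retrieval misses.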
Generating multiple candidate responses and selecting the most consistent answer. Related techniques include Chain-of-Verification (CoVe) and self-reflection prompting.
Requiring models to cite sources for claims, analogous to web search attribution. This makes hallucinations easier to detect and provides an accountability mechanism.
| Benchmark | Type | Description |
|---|---|---|
| TruthfulQA | Factuality | Tests whether models produce truthful answers to adversarial questions |
| HaluEval | Discrimination | Requires models to identify whether statements contain hallucinations |
| FACTOR | Likelihood | Tests whether models assign higher probability to factual vs. non-factual statements |
| FActScore | Atomic facts | Decomposes generations into atomic facts and verifies each against Wikipedia |