Differences

This shows you the differences between two versions of the page.

--- why_is_my_agent_hallucinating [2026/03/25 15:33] – Create troubleshooting guide: agent hallucination causes, diagnostics, and fixes agent
+++ why_is_my_agent_hallucinating [2026/03/30 22:39] (current) – Restructure: footnotes as references agent
@@ Line 5: / Line 5: @@
 ===== Understanding Agent Hallucination =====
-Unlike simple LLM hallucination, **agent hallucination** compounds across tool calls, planning steps, and multi-turn interactions. A 2024 study from the Chinese Academy of Sciences cataloged agent-specific hallucination taxonomies, finding that agents suffer from unique failure modes beyond base model confabulation.
+Unlike simple LLM hallucination, **agent hallucination** compounds across tool calls, planning steps, and multi-turn interactions. A 2024 study from the Chinese Academy of Sciences cataloged agent-specific hallucination taxonomies, finding that agents suffer from unique failure modes beyond base model confabulation.(([[https://arxiv.org/html/2509.18970v1|Lin et al., "LLM-based Agents Suffer from Hallucinations: A Survey," arXiv 2025]]))
 **Key statistics:**
-  * Base LLMs hallucinate at least 20% on rare facts (OpenAI, 2025)
+  * Base LLMs hallucinate at least 20% on rare facts(([[https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf|OpenAI, "Why Language Models Hallucinate," 2025]]))
   * Clinical QA systems showed 63% hallucination rate without grounding, dropping to 1.7% with ontology grounding (Votek, 2025)
   * ~50% of hallucinations recur on repeated prompts; 60% resurface within 10 retries (Trends Research, 2024)
@@ Line 17: / Line 17: @@
 ==== 1. Tool Result Misinterpretation ====
-Agents parse tool outputs incorrectly, fabricating details from ambiguous or noisy data. A Stanford study on legal RAG tools found agents frequently hallucinate by being unfaithful to retrieved data.
+Agents parse tool outputs incorrectly, fabricating details from ambiguous or noisy data. A Stanford study on legal RAG tools found agents frequently hallucinate by being unfaithful to retrieved data.(([[https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf|Stanford Digital Humanities, "Legal RAG Hallucinations," 2024]]))
 **Symptoms:** Agent cites specific numbers or facts that don't appear in tool output. Confident answers that contradict the data returned.
@@ Line 41: / Line 41: @@
 ==== 5. Exposure Bias (Snowball Effect) ====
-Autoregressive generation means early errors cascade — each wrong token increases the probability of subsequent wrong tokens.
+Autoregressive generation means early errors cascade — each wrong token increases the probability of subsequent wrong tokens.(([[https://www.ox.ac.uk/news/2024-06-20-major-research-hallucinating-generative-models-advances-reliability-artificial|Oxford University, "Major Research on Hallucinating Generative Models," 2024]]))
 **Symptoms:** Responses start correctly but drift into fabrication. Longer outputs are less accurate than shorter ones.
@@ Line 110: / Line 110: @@
 ==== Fix 2: Chain-of-Verification (CoVe) ====
-The model drafts a response, generates verification questions, answers them independently, then produces a final verified response. Published at ACL 2024 by Meta AI and ETH Zurich (Dhuliawala et al.).
+The model drafts a response, generates verification questions, answers them independently, then produces a final verified response. Published at ACL 2024 by Meta AI and ETH Zurich (Dhuliawala et al.).(([[https://aclanthology.org/2024.findings-acl.212.pdf|Dhuliawala et al., "Chain-of-Verification Reduces Hallucination in Large Language Models," ACL Findings 2024]]))
 <code python>
@@ Line 270: / Line 270: @@
 print(f"Hallucination rate: {result['hallucination_rate']:.0%}")
 </code>
-===== References =====
-  * Dhuliawala et al., "Chain-of-Verification Reduces Hallucination in Large Language Models," ACL Findings 2024 — [[https://aclanthology.org/2024.findings-acl.212.pdf]]
-  * Lin et al., "LLM-based Agents Suffer from Hallucinations: A Survey," arXiv 2025 — [[https://arxiv.org/html/2509.18970v1]]
-  * OpenAI, "Why Language Models Hallucinate," 2025 — [[https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf]]
-  * Oxford University, "Major Research on Hallucinating Generative Models," 2024 — [[https://www.ox.ac.uk/news/2024-06-20-major-research-hallucinating-generative-models-advances-reliability-artificial]]
-  * Stanford Digital Humanities, "Legal RAG Hallucinations," 2024 — [[https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Hallucinations.pdf]]
 ===== See Also =====
@@ Line 284: / Line 276: @@
   * [[common_agent_failure_modes|Common Agent Failure Modes]]
   * [[how_to_handle_rate_limits|How to Handle Rate Limits]]
+===== References =====

AI Agent Knowledge Base

User Tools

Site Tools

Differences

Page Tools