Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
The Lethal Trifecta is a security framework coined by Simon Willison in June 2025 that identifies the three conditions under which AI agent systems become critically vulnerable to prompt injection attacks. When all three conditions are present simultaneously, an attacker can trivially steal private data through the agent itself.
The lethal trifecta describes the convergence of three agent capabilities1):
As Willison explains: “If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.”2)
The fundamental problem is that LLMs follow instructions embedded in content. They cannot reliably distinguish between legitimate operator instructions and malicious instructions injected by an attacker. As Willison puts it: “The model does not inherently know the difference between a text comment and an actionable command.”3)
This creates a “confused deputy” scenario where the AI agent, acting with high privileges, faithfully executes injected instructions as though they came from a trusted source. The root cause is the same as SQL injection, XSS, and command injection: systems built through string concatenation that glue together trusted instructions and untrusted input4).
A proof-of-concept demonstrated the trifecta in action using the Supabase MCP (Model Context Protocol) integration with Cursor, a Claude-based IDE5):
SELECT every row from the private integration_tokens table and INSERT them back into the support threadservice_role key, executed the SQL queries bypassing Row-Level Security (RLS)
All three trifecta elements were present: private database access (service_role key), untrusted input (customer tickets), and external communication (writing to tickets). As the analysis noted: “Astonishingly, no database permissions were violated: the agent just followed a prompt it should never have trusted.”6)
The Cline AI coding tool was compromised through a prompt injection in a GitHub issue title that exploited an AI-powered triage bot. The attack chain pivoted from the triage workflow to steal NPM publishing credentials, resulting in a malicious package release7). This demonstrated the trifecta: the agent had access to CI/CD secrets (private data), processed user-submitted issue titles (untrusted content), and could execute arbitrary commands (external communication).
Meta AI security researchers formalized the trifecta concept into an “Agents Rule of Two,” proposing that agents must satisfy no more than two of the three properties within a session. If all three are required, the agent must not operate autonomously and requires human-in-the-loop approval or another reliable means of validation8).
service_role