Table of Contents

The Lethal Trifecta: When AI Agents Become Dangerous

The Lethal Trifecta is a security framework coined by Simon Willison in June 2025 that identifies the three conditions under which AI agent systems become critically vulnerable to prompt injection attacks. When all three conditions are present simultaneously, an attacker can trivially steal private data through the agent itself.

The Three Conditions

The lethal trifecta describes the convergence of three agent capabilities1):

  1. Access to private data – the most common purpose of agent tools in the first place
  2. Exposure to untrusted content – any mechanism by which text or images controlled by a malicious attacker could reach the LLM
  3. The ability to communicate externally – any channel that could be used to exfiltrate data to an attacker

As Willison explains: “If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.”2)

Why It Works

The fundamental problem is that LLMs follow instructions embedded in content. They cannot reliably distinguish between legitimate operator instructions and malicious instructions injected by an attacker. As Willison puts it: “The model does not inherently know the difference between a text comment and an actionable command.”3)

This creates a “confused deputy” scenario where the AI agent, acting with high privileges, faithfully executes injected instructions as though they came from a trusted source. The root cause is the same as SQL injection, XSS, and command injection: systems built through string concatenation that glue together trusted instructions and untrusted input4).

Real-World Examples

Supabase MCP / Cursor Incident

A proof-of-concept demonstrated the trifecta in action using the Supabase MCP (Model Context Protocol) integration with Cursor, a Claude-based IDE5):

All three trifecta elements were present: private database access (service_role key), untrusted input (customer tickets), and external communication (writing to tickets). As the analysis noted: “Astonishingly, no database permissions were violated: the agent just followed a prompt it should never have trusted.”6)

Cline / Clinejection

The Cline AI coding tool was compromised through a prompt injection in a GitHub issue title that exploited an AI-powered triage bot. The attack chain pivoted from the triage workflow to steal NPM publishing credentials, resulting in a malicious package release7). This demonstrated the trifecta: the agent had access to CI/CD secrets (private data), processed user-submitted issue titles (untrusted content), and could execute arbitrary commands (external communication).

Meta's Rule of Two

Meta AI security researchers formalized the trifecta concept into an “Agents Rule of Two,” proposing that agents must satisfy no more than two of the three properties within a session. If all three are required, the agent must not operate autonomously and requires human-in-the-loop approval or another reliable means of validation8).

Mitigations

See Also

References