====== Agent Threat Modeling ======
**Agent threat modeling** is the systematic analysis of security vulnerabilities in LLM-based [[autonomous_agents|autonomous agents]].(("Agent Threat Modeling for LLM Systems." [[https://arxiv.org/abs/2603.11619|arXiv:2603.11619]])) As agents gain capabilities to execute code, access tools, and interact with external systems, they introduce novel attack surfaces that extend far beyond traditional prompt injection. The OWASP Top 10 for Agentic Applications (2026) and research by Schneier et al. frame these as multi-stage "Promptware Kill Chains" that hijack planning, tools, and propagation across systems.

===== Prompt Injection Chains =====
In agentic systems, prompt injections evolve from isolated manipulations into coordinated multi-tool, multi-step attacks:

  * **Direct injection** — Malicious instructions embedded in user inputs that subvert agent behavior
  * **Indirect injection** — Commands hidden in external data sources (documents, API responses, emails, web pages) that agents process without adequate validation
  * **Multi-chain injection** — "Russian doll" attacks where nested injections propagate across multiple LLM chains in a workflow, each injection activating the next
  * **Memory poisoning** — Injections that persist in agent memory or conversation history, affecting future interactions
  * **Recency bias exploitation** — Adversarial instructions placed late in context windows to override earlier legitimate instructions

The **Promptware Kill Chain** (Schneier et al., 2026) models five stages of agentic prompt injection attacks:(("Prompt Injection: Agentic Amplification." [[https://christian-schneider.net/blog/prompt-injection-agentic-amplification/|christian-schneider.net]]))

  - **Initial Access** — Injection via user input, poisoned RAG data, emails, or web content
  - **Privilege Escalation** — Exploiting agent tool permissions to gain broader system access
  - **Execution** — Triggering unintended tool calls, code execution, or data modifications
  - **Persistence** — Embedding malicious instructions in agent memory or external stores
  - **Propagation** — Spreading compromised instructions to other agents or downstream systems

===== Tool Misuse =====
Agents inherit user privileges for tools, creating dangerous attack vectors:(("Security Analysis of Agentic AI." [[https://arxiv.org/abs/2603.12230|arXiv:2603.12230]]))

  * **Excessive agency** — Over-privileged agents with more tool access than tasks require, enabling injected prompts to trigger actions like remote code execution, SSRF, or SQL injection
  * **Tool chaining attacks** — Attackers exploit the agent's planning capability to sequence legitimate tool calls in harmful ways (e.g., read credentials then exfiltrate via HTTP)
  * **Plugin vulnerabilities** — Third-party tool integrations ([[langchain|LangChain]] plugins, API connectors) that lack input validation or have their own security flaws
  * **Iterative refinement** — Agents that retry and adjust tool calls may be manipulated into gradually escalating harmful behavior across multiple turns

===== Data Exfiltration =====
Compromised agents can leak sensitive data through multiple channels:(("AI Agent Prompt Injection." [[https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/|Palo Alto Unit 42]]))

  * **Direct exfiltration** — Using network tools to send data to attacker-controlled endpoints
  * **Steganographic leakage** — Encoding secrets into seemingly innocuous agent outputs (e.g., markdown images with data in URLs)
  * **Inter-agent propagation** — In [[multi_agent_systems|multi-agent systems]], compromised agents passing sensitive data to other agents that have external communication capabilities
  * **Side-channel leakage** — Information leaking through timing, error messages, or behavioral patterns

===== Supply-Chain Attacks =====
Agent supply chains introduce multiple points of compromise:(("Securing LLM Systems Against Prompt Injection." [[https://developer.[[nvidia|nvidia]].com/blog/securing-llm-systems-against-prompt-injection/|NVIDIA Developer Blog]]))

  * **Poisoned tool descriptions** — Malicious instructions embedded in tool/function documentation that agents read during planning
  * **Compromised RAG corpora** — Adversarial content injected into retrieval databases that agents consult for knowledge
  * **Malicious API responses** — Third-party APIs returning crafted responses designed to hijack agent behavior
  * **Model supply chain** — Backdoored fine-tuned models or adapters that activate under specific conditions
  * **Inter-agent message poisoning** — Compromised agents in [[multi_agent_systems|multi-agent systems]] sending malicious instructions disguised as legitimate coordination

===== Mitigations =====
Defense-in-depth strategies for securing LLM agents:

**Input/Output Validation:**
  * Sanitize all input sources: prompts, RAG results, tool outputs, API responses, inter-agent messages
  * Deploy classifiers (e.g., LLM Guard) to detect injection attempts
  * Apply semantic checks for instruction-like content in data fields
  * Enforce length, format, and content-type constraints

**Tool Sandboxing and Privilege Minimization:**
  * Grant least-privilege access — agents should only access tools needed for the current task
  * Validate all tool calls before execution against an allowlist of permitted operations
  * Implement resource quotas and rate limiting on tool usage

**Goal-Lock and [[human_in_the_loop|Human-in-the-Loop]]:**
  * Enforce immutable task goals that cannot be overridden by injected instructions
  * Require human approval for high-impact actions (financial transactions, data deletion, credential access)
  * Implement "break glass" kill switches for emergency agent termination

**Monitoring and Detection:**
  * Continuous behavioral monitoring for anomalous tool usage patterns
  * Multi-chain injection detectors that identify coordinated attack sequences
  * Provenance tracking for all data flowing through agent systems

<code python>
# Example: Agent threat detection middleware
class AgentSecurityMiddleware:
    def __init__(self, policy):
        self.policy = policy
        self.injection_detector = InjectionClassifier()
        self.anomaly_detector = BehaviorAnomalyDetector()

    def validate_tool_call(self, agent_id, tool_name, arguments):
        """Validate a tool call before execution."""
        # Check tool is in agent's allowlist
        if tool_name not in self.policy.allowed_tools(agent_id):
            raise SecurityViolation(f"Unauthorized tool: {tool_name}")

        # Scan arguments for injection attempts
        if self.injection_detector.scan(str(arguments)):
            raise SecurityViolation("Potential injection in tool args")

        # Check for anomalous behavior patterns
        if self.anomaly_detector.is_anomalous(agent_id, tool_name):
            self.escalate_to_human(agent_id, tool_name, arguments)

        return True  # Allow execution
</code> (([[https://arxiv.org/abs/2603.11619|Agent Threat Modeling for LLM Systems (arXiv:2603.11619]])) (([[https://arxiv.org/abs/2603.12230|Security Analysis of Agentic AI (arXiv:2603.12230]])) (([[https://christian-schneider.net/blog/prompt-injection-agentic-amplification/|Prompt Injection: Agentic Amplification — Schneier et al.]])) (([[https://developer.[[nvidia|nvidia]])).com/blog/securing-llm-systems-against-prompt-injection/|Securing LLM Systems Against Prompt Injection — NVIDIA]])) (([[https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/|AI Agent Prompt Injection — Palo Alto Unit 42]]))

===== See Also =====
  * [[agent_resource_management|Agent Resource Management: AgentRM]]
  * [[agent_red_teaming|Agent Red Teaming]]
  * [[ai_agent_security|AI Agent Security]]
  * [[taskflow_agent|Taskflow Agent]]
  * [[agentless|Agentless]]

===== References =====