Table of Contents

Agent Threat Modeling

Agent threat modeling is the systematic analysis of security vulnerabilities in LLM-based autonomous agents. As agents gain capabilities to execute code, access tools, and interact with external systems, they introduce novel attack surfaces that extend far beyond traditional prompt injection. The OWASP Top 10 for Agentic Applications (2026) and research by Schneier et al. frame these as multi-stage “Promptware Kill Chains” that hijack planning, tools, and propagation across systems.

Prompt Injection Chains

In agentic systems, prompt injections evolve from isolated manipulations into coordinated multi-tool, multi-step attacks:

The Promptware Kill Chain (Schneier et al., 2026) models five stages of agentic prompt injection attacks:

  1. Initial Access — Injection via user input, poisoned RAG data, emails, or web content
  2. Privilege Escalation — Exploiting agent tool permissions to gain broader system access
  3. Execution — Triggering unintended tool calls, code execution, or data modifications
  4. Persistence — Embedding malicious instructions in agent memory or external stores
  5. Propagation — Spreading compromised instructions to other agents or downstream systems

Tool Misuse

Agents inherit user privileges for tools, creating dangerous attack vectors:

Data Exfiltration

Compromised agents can leak sensitive data through multiple channels:

Supply-Chain Attacks

Agent supply chains introduce multiple points of compromise:

Mitigations

Defense-in-depth strategies for securing LLM agents:

Input/Output Validation:

Tool Sandboxing and Privilege Minimization:

Goal-Lock and Human-in-the-Loop:

Monitoring and Detection:

# Example: Agent threat detection middleware
class AgentSecurityMiddleware:
    def __init__(self, policy):
        self.policy = policy
        self.injection_detector = InjectionClassifier()
        self.anomaly_detector = BehaviorAnomalyDetector()
 
    def validate_tool_call(self, agent_id, tool_name, arguments):
        """Validate a tool call before execution."""
        # Check tool is in agent's allowlist
        if tool_name not in self.policy.allowed_tools(agent_id):
            raise SecurityViolation(f"Unauthorized tool: {tool_name}")
 
        # Scan arguments for injection attempts
        if self.injection_detector.scan(str(arguments)):
            raise SecurityViolation("Potential injection in tool args")
 
        # Check for anomalous behavior patterns
        if self.anomaly_detector.is_anomalous(agent_id, tool_name):
            self.escalate_to_human(agent_id, tool_name, arguments)
 
        return True  # Allow execution

References

See Also