STAC (Sequential Tool Attack Chaining) is a multi-turn attack framework targeting tool-enabled LLM agents, where sequences of individually benign tool calls combine to produce harmful operations that only become apparent at the final step. With 483 systematically generated attack cases and an average attack success rate exceeding 90%, STAC reveals a fundamental security blind spot in current agent architectures: per-call safety checks cannot detect threats that emerge from the cumulative effect of tool sequences.
As LLM agents gain access to real-world tools (file systems, APIs, databases, code execution), security research has focused primarily on prompt injection and single-turn jailbreaks. STAC identifies a fundamentally different threat class: distributed malicious intent across multiple tool calls, where each individual call passes safety filters but the sequence achieves harmful outcomes.
Unlike text-based jailbreaks, STAC attacks cause real environmental changes – file deletions, unauthorized access, data exfiltration – making impacts severe and difficult to reverse.
The attack operates under a realistic threat model:
The key insight is that malicious intent is factored across time steps:
$$\text{Intent}(a_1, a_2, \ldots, a_n) \neq \sum_{i=1}^{n} \text{Intent}(a_i)$$
where each action $a_i$ has benign individual intent, but the composed sequence achieves a harmful goal. This is analogous to emergent behavior in complex systems – the whole exceeds the sum of its parts.
STAC uses a closed-loop automated pipeline with three stages:
Generate sequences of 2-6 tool calls where:
Execute the generated chains in target environments:
Reverse-engineer natural-sounding multi-turn prompts:
The 483 attack cases span diverse harmful outcomes:
The attack success rate (ASR) can be modeled as:
$$\text{ASR} = P\left(\bigcap_{i=1}^{n} \text{pass}(a_i)\right) \cdot P\left(\text{harmful}(a_1, \ldots, a_n) | \bigcap_{i=1}^{n} \text{pass}(a_i)\right)$$
Since each $P(\text{pass}(a_i)) \approx 1$ for benign-appearing calls, and the conditional probability of harm given passage is high by construction, the overall ASR remains above 90%.
The defense gap can be quantified. A per-call safety filter with false negative rate $\epsilon$ per call has cumulative detection probability:
$$P(\text{detect sequence}) = 1 - \prod_{i=1}^{n} (1 - \epsilon_i) \approx 0$$
when $\epsilon_i \approx 0$ for each benign-appearing call.
from dataclasses import dataclass @dataclass class ToolCall: name: str args: dict appears_benign: bool = True def detect_stac_pattern(tool_history): # Detect potential STAC attacks by analyzing tool call sequences. # Returns risk score [0, 1] based on sequential pattern analysis. risk_score = 0.0 info_names = {"read_file", "list_dir", "get_permissions"} action_names = {"write_file", "delete", "http_request"} info_calls = [t for t in tool_history if t.name in info_names] action_calls = [t for t in tool_history if t.name in action_names] if info_calls and action_calls: # Escalation pattern: info gathering followed by destructive action info_idx = max(tool_history.index(t) for t in info_calls) action_idx = min(tool_history.index(t) for t in action_calls) if info_idx < action_idx: risk_score += 0.4 # Check for data flow between calls for i in range(1, len(tool_history)): prev_out = tool_history[i - 1].args.get("output_path", "") curr_in = tool_history[i].args.get("input_path", "") if prev_out and prev_out == curr_in: risk_score += 0.2 return min(risk_score, 1.0)
The authors identify the need for sequence-level safety analysis: