Sequential Tool Attack Chaining

STAC (Sequential Tool Attack Chaining) is a multi-turn attack framework targeting tool-enabled LLM agents, where sequences of individually benign tool calls combine to produce harmful operations that only become apparent at the final step. With 483 systematically generated attack cases and an average attack success rate exceeding 90%, STAC reveals a fundamental security blind spot in current agent architectures: per-call safety checks cannot detect threats that emerge from the cumulative effect of tool sequences.

graph TD
    B1[Benign Call 1] --> B2[Benign Call 2]
    B2 --> B3[Benign Call 3]
    B3 --> COMBINE[Combined Effect]
    COMBINE --> HARM[Harmful Action]
    style B1 fill:#90EE90
    style B2 fill:#90EE90
    style B3 fill:#90EE90
    style HARM fill:#FF6B6B

Background

As LLM agents gain access to real-world tools (file systems, APIs, databases, code execution), security research has focused primarily on prompt injection and single-turn jailbreaks. STAC identifies a fundamentally different threat class: distributed malicious intent across multiple tool calls, where each individual call passes safety filters but the sequence achieves harmful outcomes.

Unlike text-based jailbreaks, STAC attacks cause real environmental changes – file deletions, unauthorized access, data exfiltration – making impacts severe and difficult to reverse.

Threat Model

The attack operates under a realistic threat model: the adversary interacts with the agent only through natural-language requests, and every individual tool call must pass the agent's per-call safety checks.

The key insight is that malicious intent is factored across time steps:

$$\text{Intent}(a_1, a_2, \ldots, a_n) \neq \sum_{i=1}^{n} \text{Intent}(a_i)$$

where each action $a_i$ has benign individual intent, but the composed sequence achieves a harmful goal. This is analogous to emergent behavior in complex systems – the whole exceeds the sum of its parts.
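A toy illustration of this non-additivity (call names and the checking rules are hypothetical, chosen only to make the point concrete):

```python
# Each call passes a per-call filter, but the sequence composes into
# exfiltration. Call names here are hypothetical examples.

BLOCKED_CALLS = {"send_credentials"}  # what a per-call filter looks for

def per_call_check(call: str) -> bool:
    """Return True if the single call looks benign on its own."""
    return call.split(":")[0] not in BLOCKED_CALLS

sequence = ["read_file:secrets.txt", "encode_base64", "http_request:example.net"]

# Every call passes in isolation ...
assert all(per_call_check(c) for c in sequence)

# ... but a sequence-level view sees read -> encode -> outbound request,
# the classic exfiltration shape.
def sequence_check(calls) -> bool:
    reads = any(c.startswith("read_file") for c in calls)
    sends = any(c.startswith("http_request") for c in calls)
    return not (reads and sends)

assert sequence_check(sequence) is False  # flagged only at the sequence level
```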

Attack Pipeline

STAC uses a closed-loop automated pipeline with three stages:

1. Tool Chain Generation

Generate sequences of 2-6 tool calls in which every call appears benign in isolation but the composed sequence produces a harmful outcome.

2. Verification

Execute the generated chains in target environments to confirm that each call succeeds and that the harmful end state is actually reached; failed chains are fed back for regeneration, which is what makes the pipeline closed-loop.

3. Prompt Engineering

Reverse-engineer natural-sounding multi-turn user prompts that lead the target agent to issue the verified tool chain on its own.
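The closed loop over the three stages can be sketched as follows. All function bodies here are stubs of my own; the actual pipeline drives each stage with an LLM:

```python
def generate_chain():
    # Stage 1 stub: propose a chain of 2-6 individually benign calls.
    return ["list_dir", "read_file", "write_file"]

def verify_chain(chain):
    # Stage 2 stub: execute in a sandboxed environment and confirm
    # the harmful end state is actually reached.
    return 2 <= len(chain) <= 6

def engineer_prompts(chain):
    # Stage 3 stub: wrap each verified call in a natural-sounding turn.
    return [f"Could you run {call} for me?" for call in chain]

def stac_pipeline(max_attempts=5):
    for _ in range(max_attempts):
        chain = generate_chain()
        if verify_chain(chain):       # closed loop: retry until verified
            return engineer_prompts(chain)
    return None

prompts = stac_pipeline()
assert prompts is not None and len(prompts) == 3
```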

Attack Categories

The 483 attack cases span diverse harmful outcomes, including file deletion, unauthorized access, and data exfiltration.

Formal Analysis

The attack success rate (ASR) can be modeled as:

$$\text{ASR} = P\left(\bigcap_{i=1}^{n} \text{pass}(a_i)\right) \cdot P\left(\text{harmful}(a_1, \ldots, a_n) | \bigcap_{i=1}^{n} \text{pass}(a_i)\right)$$

Since each $P(\text{pass}(a_i)) \approx 1$ for benign-appearing calls, and the conditional probability of harm given passage is high by construction, the overall ASR remains above 90%.

The defense gap can be quantified. If a per-call safety filter flags call $a_i$ with probability $\epsilon_i$ (its per-call detection rate), the probability of detecting the sequence is:

$$P(\text{detect sequence}) = 1 - \prod_{i=1}^{n} (1 - \epsilon_i) \approx 0$$

when $\epsilon_i \approx 0$ for each benign-appearing call.
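Plugging in illustrative numbers (my own, not from the paper) makes the gap concrete:

```python
import math

# Hypothetical per-call detection probabilities for benign-appearing calls
eps = [0.01, 0.02, 0.01, 0.015]

p_pass_all = math.prod(1 - e for e in eps)  # ≈ 0.946: all calls slip through
p_detect = 1 - p_pass_all                   # ≈ 0.054: sequence almost never caught

# Assume harm given passage is near-certain by construction (hypothetical 0.97)
p_harm_given_pass = 0.97
asr = p_pass_all * p_harm_given_pass        # ≈ 0.92, consistent with ASR > 90%
```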

Code Example

from dataclasses import dataclass
 
@dataclass
class ToolCall:
    name: str
    args: dict
    appears_benign: bool = True
 
def detect_stac_pattern(tool_history):
    """Detect potential STAC attacks by analyzing tool-call sequences.

    Returns a risk score in [0, 1] based on sequential pattern analysis.
    """
    risk_score = 0.0
    info_names = {"read_file", "list_dir", "get_permissions"}
    action_names = {"write_file", "delete", "http_request"}

    # Work with positions, not objects, so duplicate calls are handled correctly
    info_indices = [i for i, t in enumerate(tool_history) if t.name in info_names]
    action_indices = [i for i, t in enumerate(tool_history) if t.name in action_names]

    # Escalation pattern: info gathering strictly precedes destructive action
    if info_indices and action_indices and max(info_indices) < min(action_indices):
        risk_score += 0.4

    # Data flow between consecutive calls: one call's output feeds the next
    for i in range(1, len(tool_history)):
        prev_out = tool_history[i - 1].args.get("output_path", "")
        curr_in = tool_history[i].args.get("input_path", "")
        if prev_out and prev_out == curr_in:
            risk_score += 0.2

    return min(risk_score, 1.0)

# Example: reconnaissance followed by a destructive call scores 0.4
history = [ToolCall("read_file", {"path": "/etc/passwd"}),
           ToolCall("delete", {"path": "/etc/passwd"})]
assert detect_stac_pattern(history) == 0.4

Defense Recommendations

The authors identify the need for sequence-level safety analysis: safety checks that reason over the cumulative effect of the entire tool-call history rather than evaluating each call in isolation.
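One concrete direction this points toward is taint-style tracking across the whole call history. A minimal sketch of that idea, with a hypothetical tool taxonomy (a real deployment would derive source/sink sets from the agent's actual tool schemas):

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: tools that read sensitive data vs. send data out
SENSITIVE_SOURCES = {"read_file", "get_permissions"}
EXTERNAL_SINKS = {"http_request", "send_email"}

@dataclass
class SequenceGuard:
    """Stateful, sequence-level check: flags a call only when the
    cumulative history makes it dangerous, not on its own merits."""
    tainted: bool = False              # has sensitive data entered the session?
    history: list = field(default_factory=list)

    def allow(self, tool_name: str) -> bool:
        self.history.append(tool_name)
        if tool_name in SENSITIVE_SOURCES:
            self.tainted = True
            return True                # reading alone is benign
        if tool_name in EXTERNAL_SINKS and self.tainted:
            return False               # source -> sink across turns: block
        return True

guard = SequenceGuard()
assert guard.allow("list_dir")
assert guard.allow("read_file")        # benign in isolation
assert not guard.allow("http_request") # blocked only because of the history
```

The same outbound call would be allowed in a fresh session with no prior sensitive reads, which is exactly the per-call-vs-sequence distinction STAC exploits.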

Results

Across the 483 generated attack cases, the pipeline achieves an average attack success rate above 90%, confirming that per-call safety filtering fails against sequentially distributed intent.
