Table of Contents

Agent Red Teaming

As LLM agents are deployed in multi-agent systems and web automation, new attack surfaces emerge beyond traditional prompt injection. This page covers red-teaming of multi-agent communication, evolutionary attacks on web agents, and systematic penetration testing of LLM systems.

The Multi-Agent Attack Surface

Multi-agent LLM systems introduce vulnerabilities absent from single-agent deployments:

Agent-in-the-Middle (AiTM): Communication Attacks

Red-Teaming LLM Multi-Agent Systems via Communication Attacks (He et al., 2025, arXiv:2502.14847) introduces the Agent-in-the-Middle (AiTM) attack, which targets the communication layer of LLM-based multi-agent systems.

Attack Model

AiTM intercepts and manipulates inter-agent messages without directly compromising individual agents. This mirrors network man-in-the-middle attacks but operates at the semantic level:

LLM-Powered Adversary

The adversarial agent uses:

Topologies Tested

Key Findings

Genesis: Evolutionary Attacks on Web Agents

Genesis (Zhang et al., 2025, arXiv:2510.18314) proposes an evolutionary framework for discovering and evolving attack strategies against LLM web agents.

Three-Module Architecture

Evolutionary Attack Process

  1. Initial population of attack strategies is seeded
  2. Attacker generates adversarial web content using current strategies
  3. Scorer evaluates whether the target agent was successfully misled
  4. Strategist analyzes successful attacks to extract generalizable patterns
  5. New strategies are added to the library and deployed in subsequent generations
  6. The attack evolves continuously, discovering novel strategies that static methods miss

Key Results

LLM Penetration Testing: Excalibur

What Makes a Good LLM Agent for Real-world Penetration Testing? (Deng et al., 2026, arXiv:2602.17622) analyzes 28 LLM-based pentesting systems and identifies fundamental failure modes.

Two Failure Modes

Root Cause: Missing Difficulty Estimation

Type B failures share a common root cause: agents cannot estimate task difficulty in real-time. Consequences:

Excalibur Architecture

Results

Code Example

# Genesis-style evolutionary red teaming (simplified)
import random
 
class EvolutionaryRedTeam:
    def __init__(self, attacker_llm, scorer_llm, strategist_llm):
        self.attacker = attacker_llm
        self.scorer = scorer_llm
        self.strategist = strategist_llm
        self.strategy_library = []
 
    def evolve_attacks(self, target_agent, web_task, generations=20, pop_size=10):
        # Initialize population with seed strategies
        population = self._seed_strategies(web_task, pop_size)
 
        for gen in range(generations):
            # Generate adversarial injections
            attacks = [self.attacker.generate(s, web_task) for s in population]
 
            # Score attacks against target
            scores = [self.scorer.evaluate(target_agent, a, web_task) for a in attacks]
 
            # Extract successful patterns
            successful = [(a, s) for a, s in zip(attacks, scores) if s > 0.5]
            if successful:
                new_strategies = self.strategist.extract_patterns(successful)
                self.strategy_library.extend(new_strategies)
 
            # Evolutionary selection + mutation
            population = self._evolve(population, scores)
 
        return self.strategy_library
 
    def _evolve(self, population, scores):
        # Tournament selection + crossover + mutation
        sorted_pop = sorted(zip(population, scores), key=lambda x: -x[1])
        elite = [p for p, _ in sorted_pop[:len(population)//2]]
        offspring = [self._mutate(random.choice(elite)) for _ in range(len(population)//2)]
        return elite + offspring
 
    def _mutate(self, strategy):
        return self.attacker.mutate(strategy, self.strategy_library)

Attack Taxonomy

Attack Type Target Method Defender Awareness
AiTM Communication Multi-agent messages Semantic interception Low (messages appear normal)
Genesis Web Injection Web agent actions Evolutionary adversarial content Adaptive (evolves past defenses)
Excalibur Pentesting System vulnerabilities Difficulty-aware tree search N/A (offensive tool)

References

See Also