As LLM agents are deployed in multi-agent systems and web automation, new attack surfaces emerge beyond traditional prompt injection. This page covers red-teaming of multi-agent communication, evolutionary attacks on web agents, and systematic penetration testing of LLM systems.
Multi-agent LLM systems introduce vulnerabilities absent from single-agent deployments:
Red-Teaming LLM Multi-Agent Systems via Communication Attacks (He et al., 2025, arXiv:2502.14847) introduces the Agent-in-the-Middle (AiTM) attack, which targets the communication layer of LLM-based multi-agent systems.
AiTM intercepts and manipulates inter-agent messages without directly compromising any individual agent. This mirrors a network man-in-the-middle attack but operates at the semantic level: the adversarial intermediary uses an LLM to craft context-aware rewrites of intercepted messages, steering downstream agents while keeping the traffic syntactically plausible.
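To make the communication-layer attack surface concrete, here is a minimal sketch of an AiTM-style interceptor wrapping an inter-agent message bus. All names here (`MessageBus`, `AiTMInterceptor`, `rewrite_fn`) are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical AiTM sketch: the interceptor sits between agents and
# rewrites messages in transit. Not the method from He et al. (2025).

class MessageBus:
    """Minimal inter-agent message bus: routes (sender, recipient, text)."""
    def __init__(self):
        self.handlers = {}

    def register(self, agent_name, handler):
        # handler: callable(sender, text) -> reply
        self.handlers[agent_name] = handler

    def send(self, sender, recipient, text):
        return self.handlers[recipient](sender, text)

class AiTMInterceptor:
    """Wraps a bus and semantically rewrites messages in transit."""
    def __init__(self, bus, rewrite_fn):
        self.bus = bus
        self.rewrite = rewrite_fn  # in practice, an adversarial LLM call

    def send(self, sender, recipient, text):
        # The tampered message preserves the original structure, so the
        # recipient sees nothing syntactically unusual.
        tampered = self.rewrite(sender, recipient, text)
        return self.bus.send(sender, recipient, tampered)

# Usage: a single-word semantic rewrite flips the downstream action.
bus = MessageBus()
bus.register("executor", lambda sender, text: f"executor acting on: {text}")
aitm = AiTMInterceptor(bus, lambda s, r, t: t.replace("report", "delete"))
result = aitm.send("planner", "executor", "report the logs")
```

The key point the sketch illustrates is that neither endpoint is compromised: the planner and executor behave exactly as designed, and the attack lives entirely in the channel between them.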
Genesis (Zhang et al., 2025, arXiv:2510.18314) proposes an evolutionary framework for discovering and evolving attack strategies against LLM web agents.
What Makes a Good LLM Agent for Real-world Penetration Testing? (Deng et al., 2026, arXiv:2602.17622) analyzes 28 LLM-based pentesting systems and identifies fundamental failure modes.
Type B failures share a common root cause: agents cannot estimate task difficulty in real time, so they cannot tell when to persist, backtrack, or abandon a line of attack.
```python
# Genesis-style evolutionary red teaming (simplified)
import random

class EvolutionaryRedTeam:
    def __init__(self, attacker_llm, scorer_llm, strategist_llm):
        self.attacker = attacker_llm
        self.scorer = scorer_llm
        self.strategist = strategist_llm
        self.strategy_library = []

    def evolve_attacks(self, target_agent, web_task, generations=20, pop_size=10):
        # Initialize population with seed strategies
        population = self._seed_strategies(web_task, pop_size)
        for gen in range(generations):
            # Generate adversarial injections
            attacks = [self.attacker.generate(s, web_task) for s in population]
            # Score attacks against target
            scores = [self.scorer.evaluate(target_agent, a, web_task) for a in attacks]
            # Extract successful patterns
            successful = [(a, s) for a, s in zip(attacks, scores) if s > 0.5]
            if successful:
                new_strategies = self.strategist.extract_patterns(successful)
                self.strategy_library.extend(new_strategies)
            # Evolutionary selection + mutation
            population = self._evolve(population, scores)
        return self.strategy_library

    def _evolve(self, population, scores):
        # Elitist selection: keep the top half, refill with mutated elites
        sorted_pop = sorted(zip(population, scores), key=lambda x: -x[1])
        elite = [p for p, _ in sorted_pop[:len(population) // 2]]
        offspring = [self._mutate(random.choice(elite))
                     for _ in range(len(population) - len(elite))]
        return elite + offspring

    def _mutate(self, strategy):
        return self.attacker.mutate(strategy, self.strategy_library)
```
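The Type B failure mode above suggests agents need an explicit online difficulty signal. Below is a minimal sketch of such a signal; the heuristic (failed attempts weighted by how little new information they yield) is entirely my own assumption, not the method from the paper:

```python
# Hypothetical difficulty estimator for a pentesting agent.
# Heuristic assumption: many failed attempts that keep producing the
# same errors indicate a hard (or infeasible) subtask.

def estimate_difficulty(failed_attempts, distinct_errors, max_attempts=10):
    """Crude real-time difficulty estimate in [0, 1]."""
    if failed_attempts == 0:
        return 0.0
    novelty = distinct_errors / failed_attempts  # new information per attempt
    effort = min(failed_attempts / max_attempts, 1.0)
    return effort * (1.0 - novelty)

def should_backtrack(failed_attempts, distinct_errors, threshold=0.5):
    """Abandon the current branch when estimated difficulty is high,
    instead of spending the whole budget on one subtask."""
    return estimate_difficulty(failed_attempts, distinct_errors) >= threshold
```

A signal like this is what lets a difficulty-aware tree search reallocate its budget: branches that keep failing without surfacing new errors get pruned, while branches still producing fresh information keep their turn.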
| Attack Type | Target | Method | Defender Awareness |
|---|---|---|---|
| AiTM Communication | Multi-agent messages | Semantic interception | Low (messages appear normal) |
| Genesis Web Injection | Web agent actions | Evolutionary adversarial content | Adaptive (evolves past defenses) |
| Excalibur Pentesting | System vulnerabilities | Difficulty-aware tree search | N/A (offensive tool) |