====== Software Testing Agents ======
LLM-powered agents for automated software test generation represent a paradigm shift from prompt-based code assistants to fully autonomous testing workflows. These agents interact directly with repositories to create, modify, and execute tests without manual developer intervention.
===== Overview =====
Agent-based coding tools have transformed how tests are written in real-world software projects. Unlike traditional prompt-based approaches that require developers to manually integrate generated code, agentic testing tools autonomously interact with repositories to handle the full test lifecycle: creation, modification, execution, and coverage analysis. Two key research threads have emerged: empirical studies of how AI agents generate tests in practice, and structural testing methodologies that adapt established software engineering practices to LLM-based agent architectures.
===== Agent-Driven Test Generation =====
The empirical study by Yoshimoto et al. (2026) analyzed 2,232 commits from the AIDev dataset containing test-related changes.((([[https://arxiv.org/abs/2603.13724|Yoshimoto et al. "Testing with AI Agents: An Empirical Study." arXiv:2603.13724, 2026.]]))) Key findings include:
* **AI authorship rate**: AI agents authored 16.4% of all commits that added tests in real-world repositories
* **Structural patterns**: AI-generated test methods are longer, have higher assertion density, and maintain lower cyclomatic complexity through linear logic
* **Coverage parity**: AI-generated tests achieve code coverage comparable to human-written tests, frequently producing positive coverage gains
The assertion density metric can be expressed as:
$D_a = \frac{N_{assert}}{L_{method}}$
where $N_{assert}$ is the number of assertions and $L_{method}$ is the lines of code in the test method. AI-generated tests consistently show higher $D_a$ values while maintaining lower cyclomatic complexity:
$CC = E - N + 2P$
where $E$ is edges, $N$ is nodes in the control flow graph, and $P$ is the number of connected components.
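Under the definitions above, both metrics can be computed mechanically from a test method's source. The sketch below uses only Python's standard ''ast'' module; the helper names are illustrative and not taken from the cited study, and the complexity function is a common approximation (1 plus the number of decision points), not a full control-flow-graph construction:

```python
import ast
import textwrap

def assertion_density(method_source: str) -> float:
    """D_a = N_assert / L_method for a single test method."""
    func = ast.parse(textwrap.dedent(method_source)).body[0]
    # Count bare `assert` statements plus unittest-style self.assert* calls.
    n_assert = sum(isinstance(node, ast.Assert) for node in ast.walk(func))
    n_assert += sum(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr.startswith("assert")
        for node in ast.walk(func)
    )
    l_method = func.end_lineno - func.lineno + 1
    return n_assert / l_method

def cyclomatic_complexity(method_source: str) -> int:
    """Approximate CC as 1 + decision points; fully linear code scores 1."""
    tree = ast.parse(textwrap.dedent(method_source))
    decisions = (ast.If, ast.For, ast.While, ast.IfExp,
                 ast.ExceptHandler, ast.BoolOp)
    return 1 + sum(isinstance(n, decisions) for n in ast.walk(tree))

src = """
def test_add():
    result = 1 + 1
    assert result == 2
    assert result > 0
"""
print(assertion_density(src))      # 2 assertions over a 4-line method -> 0.5
print(cyclomatic_complexity(src))  # linear logic -> 1
```

A "longer method, higher density, lower complexity" profile would show up here as a larger $D_a$ alongside a $CC$ near 1.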
===== Structural Testing of LLM-Based Agents =====
The structural testing framework leverages three core technical components for deeper, automated evaluation:((([[https://arxiv.org/abs/2601.18827|"Automated Structural Testing of LLM-Based Agents." arXiv:2601.18827, 2025.]])))
* **Traces (OpenTelemetry-based)**: Capture agent execution trajectories to record detailed paths through the system
* **Mocking**: Enforce reproducible LLM behavior for deterministic, repeatable tests
* **Assertions**: Automate test verification without manual evaluation
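As a minimal illustration of the mocking component, the sketch below pins an LLM client's output with Python's standard ''unittest.mock'' so the agent's behavior is deterministic across runs. The ''SummarizerAgent'' class and its ''complete'' call are hypothetical, not part of the cited framework:

```python
from unittest.mock import MagicMock

# Hypothetical agent under test: delegates all generation to an LLM client.
class SummarizerAgent:
    def __init__(self, llm):
        self.llm = llm

    def run(self, text: str) -> str:
        return self.llm.complete(f"Summarize: {text}").strip()

# Mocking: pin the LLM's response so the test is reproducible.
llm = MagicMock()
llm.complete.return_value = "  A fixed summary.  "
agent = SummarizerAgent(llm)

# Assertions verify both the output and the exact prompt that was sent.
assert agent.run("long document") == "A fixed summary."
llm.complete.assert_called_once_with("Summarize: long document")
print("deterministic test passed")
```

With the model call frozen, any change in the assertion outcome is attributable to the agent's own logic rather than LLM nondeterminism.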
This approach adapts established software engineering practices to the agentic context:
* Test automation pyramid applied to agent hierarchies
* Regression testing across agent versions
* Test-driven development for agent behaviors
* Multi-language testing support
===== Code Example =====
<code python>
from opentelemetry import trace
from agent_test_framework import AgentTestCase, assert_coverage

class TestCodeGenAgent(AgentTestCase):
    def setUp(self):
        self.tracer = trace.get_tracer("agent-test")
        # CodeGenAgent is the agent under test, assumed importable in scope
        self.agent = CodeGenAgent(model="gpt-4o")

    def test_unit_test_generation(self):
        # Trace the generation run so the full trajectory is recorded
        with self.tracer.start_as_current_span("test-gen"):
            result = self.agent.generate_tests(
                repo_path="/src/module.py",
                strategy="branch-coverage",
            )
            self.assertGreater(result.assertion_density, 0.15)
            self.assertLess(result.cyclomatic_complexity, 5)
            assert_coverage(result, min_branch=0.80)

    def test_regression_detection(self):
        baseline = self.agent.generate_tests(repo_path="/src/api.py")
        modified = self.agent.generate_tests(
            repo_path="/src/api.py",
            context="refactored error handling",
        )
        self.assertGreaterEqual(
            modified.coverage, baseline.coverage,
            "Regression: coverage decreased after refactor",
        )
</code>
===== Architecture =====
<code>
graph TD
    A[Developer Commit] --> B[Agent Controller]
    B --> C[Test Planner Agent]
    C --> D[Code Analyzer]
    C --> E[Coverage Analyzer]
    D --> F[Test Generator Agent]
    E --> F
    F --> G[Test Executor]
    G --> H{Tests Pass?}
    H -->|Yes| I[Coverage Report]
    H -->|No| J[Repair Agent]
    J --> F
    I --> K[Commit Tests to Repo]
    K --> L[Regression Monitor]
    L -->|Drift Detected| C
</code>
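The generate/execute/repair cycle at the heart of the diagram can be sketched as a bounded retry loop. All function names, the dict-shaped report, and the repair budget below are illustrative assumptions, not part of any published framework:

```python
# Sketch of the Test Generator -> Test Executor -> Repair Agent loop:
# failed suites are fed back to the repair step up to a fixed budget.
def run_pipeline(generate, execute, repair, max_repairs=3):
    tests = generate()
    for _ in range(max_repairs + 1):
        report = execute(tests)
        if report["passed"]:
            return report               # -> coverage report, commit to repo
        tests = repair(tests, report)   # Repair Agent feeds back into the loop
    raise RuntimeError("repair budget exhausted")

# Toy stubs: the suite "passes" once a single repair has been applied.
state = {"repairs": 0}
report = run_pipeline(
    generate=lambda: ["test_a"],
    execute=lambda t: {"passed": state["repairs"] >= 1, "failures": t},
    repair=lambda t, r: (state.__setitem__("repairs", state["repairs"] + 1) or t),
)
print(report)  # {'passed': True, 'failures': ['test_a']}
```

Bounding the repair attempts keeps a misbehaving Repair Agent from looping indefinitely, mirroring the fixed back-edge from the Repair Agent to the Test Generator in the diagram.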
===== Key Metrics =====
^ Metric ^ AI-Generated ^ Human-Written ^
| Test method length | Longer | Shorter |
| Assertion density | Higher ($D_a > 0.15$) | Lower |
| Cyclomatic complexity | Lower (linear logic) | Higher |
| Branch coverage gain | Comparable | Comparable |
| Commit frequency | 16.4% of test commits | 83.6% |
===== See Also =====
* [[competitive_programming_agents|Competitive Programming Agents]]
* [[devops_incident_agents|DevOps Incident Agents]]
* [[multi_hop_qa_agents|Multi-Hop QA Agents]]
===== References =====