====== Software Testing Agents ======

LLM-powered agents for automated software test generation represent a paradigm shift from prompt-based code assistants to fully autonomous testing workflows. These agents interact directly with repositories to create, modify, and execute tests without manual developer intervention.

===== Overview =====

Agent-based coding tools have transformed how tests are written in real-world software projects. Unlike traditional prompt-based approaches that require developers to manually integrate generated code, agentic testing tools autonomously interact with repositories to handle the full test lifecycle: creation, modification, execution, and coverage analysis.

Two key research threads have emerged: empirical studies of how AI agents generate tests in practice, and structural testing methodologies that adapt established software engineering practices to LLM-based agent architectures.

===== Agent-Driven Test Generation =====

The empirical study by Yoshimoto et al. (2026) analyzed 2,232 commits from the AIDev dataset containing test-related changes.((([[https://arxiv.org/abs/2603.13724|Yoshimoto et al. "Testing with AI Agents: An Empirical Study." arXiv:2603.13724, 2026.]])))

Key findings include:

  * **AI authorship rate**: AI agents authored 16.4% of all commits that added tests in real-world repositories
  * **Structural patterns**: AI-generated test methods are longer, have higher assertion density, and maintain lower cyclomatic complexity through linear logic
  * **Coverage parity**: AI-generated tests achieve code coverage comparable to human-written tests, frequently producing positive coverage gains

The assertion density metric can be expressed as:

$$D_a = \frac{N_{assert}}{L_{method}}$$

where $N_{assert}$ is the number of assertions and $L_{method}$ is the lines of code in the test method.
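As an illustration, assertion density can be computed directly from a test method's source. The sketch below uses a hypothetical helper ''assertion_density'' (not from the cited study) and assumes assertions are either bare ''assert'' statements or unittest-style ''self.assert*'' calls:

<code python>
import ast
import textwrap

def assertion_density(test_source: str) -> float:
    """Compute D_a = N_assert / L_method for a single test method.

    Illustrative sketch: counts bare `assert` statements and
    unittest-style `self.assert*` calls; L_method is the number of
    source lines spanned by the method.
    """
    tree = ast.parse(textwrap.dedent(test_source))
    func = tree.body[0]  # assume the snippet contains one function
    n_assert = 0
    for node in ast.walk(func):
        if isinstance(node, ast.Assert):
            n_assert += 1
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Attribute)
              and node.func.attr.startswith("assert")):
            n_assert += 1
    n_lines = func.end_lineno - func.lineno + 1
    return n_assert / n_lines

src = """
def test_add():
    result = add(2, 3)
    assert result == 5
    assert isinstance(result, int)
"""
print(assertion_density(src))  # 2 assertions over 4 lines -> 0.5
</code>

Because the helper only parses the source, the method under inspection never has to be executed; the same walk could be extended to count framework-specific assertion helpers.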
AI-generated tests consistently show higher $D_a$ values while maintaining lower cyclomatic complexity:

$$CC = E - N + 2P$$

where $E$ is the number of edges and $N$ the number of nodes in the control flow graph, and $P$ is the number of connected components.

===== Structural Testing of LLM-Based Agents =====

The structural testing framework leverages three core technical components for deeper, automated evaluation:((([[https://arxiv.org/abs/2601.18827|"Automated Structural Testing of LLM-Based Agents." arXiv:2601.18827, 2025.]])))

  * **Traces (OpenTelemetry-based)**: capture agent execution trajectories to record detailed paths through the system
  * **Mocking**: enforces reproducible LLM behavior for deterministic, repeatable tests
  * **Assertions**: automate test verification without manual evaluation

This approach adapts established software engineering practices to the agentic context:

  * Test automation pyramid applied to agent hierarchies
  * Regression testing across agent versions
  * Test-driven development for agent behaviors
  * Multi-language testing support

===== Code Example =====

<code python>
from opentelemetry import trace

from agent_test_framework import AgentTestCase, CodeGenAgent, assert_coverage


class TestCodeGenAgent(AgentTestCase):
    def setUp(self):
        self.tracer = trace.get_tracer("agent-test")
        self.agent = CodeGenAgent(model="gpt-4o")

    def test_unit_test_generation(self):
        with self.tracer.start_as_current_span("test-gen"):
            result = self.agent.generate_tests(
                repo_path="/src/module.py",
                strategy="branch-coverage",
            )
        self.assertGreater(result.assertion_density, 0.15)
        self.assertLess(result.cyclomatic_complexity, 5)
        assert_coverage(result, min_branch=0.80)

    def test_regression_detection(self):
        baseline = self.agent.generate_tests(repo_path="/src/api.py")
        modified = self.agent.generate_tests(
            repo_path="/src/api.py",
            context="refactored error handling",
        )
        self.assertGreaterEqual(
            modified.coverage,
            baseline.coverage,
            "Regression: coverage decreased after refactor",
        )
</code>

===== Architecture =====

<code>
graph TD
    A[Developer Commit] --> B[Agent Controller]
    B --> C[Test Planner Agent]
    C --> D[Code Analyzer]
    C --> E[Coverage Analyzer]
    D --> F[Test Generator Agent]
    E --> F
    F --> G[Test Executor]
    G --> H{Tests Pass?}
    H -->|Yes| I[Coverage Report]
    H -->|No| J[Repair Agent]
    J --> F
    I --> K[Commit Tests to Repo]
    K --> L[Regression Monitor]
    L -->|Drift Detected| C
</code>

===== Key Metrics =====

^ Metric ^ AI-Generated ^ Human-Written ^
| Test method length | Longer | Shorter |
| Assertion density | Higher ($D_a > 0.15$) | Lower |
| Cyclomatic complexity | Lower (linear logic) | Higher |
| Branch coverage gain | Comparable | Comparable |
| Commit frequency | 16.4% of test commits | 83.6% |

===== See Also =====

  * [[competitive_programming_agents|Competitive Programming Agents]]
  * [[devops_incident_agents|DevOps Incident Agents]]
  * [[multi_hop_qa_agents|Multi-Hop QA Agents]]

===== References =====