Automated Program Repair

LLM-powered agents are extending automated program repair (APR) beyond static code rewriting to interactive debugging, context-aware patch refinement, and specification-centric repair. This page covers InspectCoder's debugger collaboration, REFINE's patch refinement framework, and VibeRepair's specification-centric approach.

The APR Challenge

Automated program repair aims to automatically fix bugs in software given issue descriptions, codebases, and test suites. LLM-based APR faces several challenges:

Limited code context: Models struggle to understand large repository structures
Test suite overfitting: Patches may pass tests without actually fixing the underlying bug
Draft patches: Current methods frequently produce partially correct fixes that incompletely address bugs
Hallucinated fixes: Code-centric approaches risk generating behaviorally inconsistent patches
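Test suite overfitting is easiest to see on a toy case. The sketch below is illustrative only: the function names, the test cases, and the held-out cases are all assumptions made up for this example, not taken from any benchmark. It shows a patch that special-cases the single failing input and therefore passes the visible tests while leaving the underlying bug in place.

```python
# Toy illustration of test suite overfitting (all names and cases are hypothetical).
# Intended behavior: return the absolute value of x.

def overfitting_patch(x):
    # Special-cases the one failing input seen in the visible test suite.
    if x == -2:
        return 2
    return x  # still wrong for every other negative number

def correct_patch(x):
    # Fixes the underlying bug for all inputs.
    return -x if x < 0 else x

# (input, expected) pairs: the visible suite and a held-out suite.
TEST_SUITE = [(3, 3), (-2, 2), (0, 0)]
HELD_OUT = [(-5, 5), (-1, 1)]

def passes(candidate, cases):
    """Return True if the candidate matches the expected output on every case."""
    return all(candidate(arg) == expected for arg, expected in cases)

print("overfitting patch, visible tests:", passes(overfitting_patch, TEST_SUITE))
print("overfitting patch, held-out tests:", passes(overfitting_patch, HELD_OUT))
print("correct patch, held-out tests:", passes(correct_patch, HELD_OUT))
```

Both patches are indistinguishable on the visible suite; only the held-out cases separate them, which is why passing tests alone is a weak signal of correctness.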

InspectCoder: Dynamic Analysis with Debugger Collaboration

InspectCoder (arXiv:2510.18327) is the first agentic program repair system that enables LLMs to actively conduct dynamic analysis through interactive debugger control.¹⁾

Dual-Agent Architecture

Program Inspector: Controls program execution via InspectWare middleware, strategically places breakpoints, inspects runtime state, and runs incremental runtime experiments
Patch Coder: Leverages diagnostic insights from the Inspector to generate and verify code patches

Key Capabilities

Strategic breakpoint placement: The Inspector agent decides where to pause execution based on the bug hypothesis
Targeted state inspection: Examines variable values, stack traces, and memory state at breakpoints
Temporal, reversible perturbations: Modifies intermediate program states temporarily to test root cause hypotheses, providing immediate process reward signals
Adaptive inspection: Responds to runtime behavior dynamically rather than following fixed log collection procedures
Iterative patch verification: After patch generation, failing tests trigger return to the Inspector for further dynamic analysis
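To make breakpoint placement and state inspection concrete, here is a minimal sketch using Python's standard `sys.settrace` hook. It is not InspectCoder's InspectWare middleware: `buggy_mean`, `run_with_breakpoints`, and the choice of line offset are assumptions invented for this example, and the hypothesis check is done offline against the captured snapshot rather than by perturbing live program state.

```python
import sys

def buggy_mean(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # bug: divisor is off by one

def run_with_breakpoints(func, args, line_offsets):
    """Run func(*args) and snapshot its locals just before each chosen line executes.

    line_offsets are counted from the function's `def` line (offset 0).
    """
    code = func.__code__
    targets = {code.co_firstlineno + off for off in line_offsets}
    snapshots = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line" and frame.f_lineno in targets:
            snapshots.append(dict(frame.f_locals))  # copy the variable bindings
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, snapshots

# "Breakpoint" on the return statement, four lines below the def line.
result, snapshots = run_with_breakpoints(buggy_mean, ([2, 4, 6],), line_offsets=[4])
state = snapshots[0]

# Hypothesis: the divisor should be len(values). Check it against the snapshot.
hypothesis_value = state["total"] / len(state["values"])
print("observed result:", result)
print("state at breakpoint:", state)
print("result under hypothesis:", hypothesis_value)
```

The snapshot shows that `total` is already correct when the return statement is reached, which localizes the fault to the divisor. An agent could use this kind of evidence to confirm or reject a root-cause hypothesis before writing a patch.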

Results

BigCodeBench-R and LiveCodeBench-R: 5.10% to 60.37% relative improvement in repair accuracy over the strongest baselines
Bug-fix efficiency: 1.67x to 2.24x higher than baselines

REFINE: Context-Aware Patch Refinement

REFINE (Pabba et al., 2025) transforms partially correct “Draft Patches” into correct ones through a systematic refinement framework.²⁾

Three Key Challenges Addressed

Context disambiguation: Resolves vague issue descriptions and unclear code context by enriching the repair prompt with structured context
Candidate diversification: Uses test-time scaling to generate diverse patch candidates, increasing the probability of including a correct fix
Partial fix aggregation: An LLM-powered code review process combines insights from multiple partial fixes into a complete solution
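The diversification and aggregation steps above can be sketched on a toy problem. This is a heavily simplified stand-in, not REFINE's implementation: patches are modeled as lists of string transformations, the candidate pool is hard-coded instead of sampled from a model, and the greedy merge in `aggregate` replaces the LLM-powered code review. All names are assumptions for illustration.

```python
from typing import Callable, List, Set, Tuple

Step = Callable[[str], str]
Patch = List[Step]  # a patch is modeled as an ordered list of transformations

# Toy target behavior: strip surrounding whitespace and lowercase the text.
TESTS: List[Tuple[str, str]] = [("  abc ", "abc"), ("ABC", "abc"), (" AbC ", "abc")]

def apply_patch(patch: Patch, text: str) -> str:
    for step in patch:
        text = step(text)
    return text

def passed_tests(patch: Patch) -> Set[int]:
    """Indices of the tests that this patch passes."""
    return {i for i, (inp, exp) in enumerate(TESTS) if apply_patch(patch, inp) == exp}

def aggregate(candidates: List[Patch]) -> Patch:
    """Greedy stand-in for review-based aggregation: merge steps from every
    candidate that fixes at least one test not yet covered."""
    merged: Patch = []
    covered: Set[int] = set()
    ranked = sorted(candidates, key=lambda p: len(passed_tests(p)), reverse=True)
    for patch in ranked:
        gain = passed_tests(patch) - covered
        if gain:
            merged.extend(step for step in patch if step not in merged)
            covered |= gain
    return merged

# Diverse draft patches: each one is only partially correct.
candidates: List[Patch] = [[str.strip], [str.lower], []]
merged = aggregate(candidates)

for patch in candidates:
    print("draft passes tests:", sorted(passed_tests(patch)))
print("merged passes tests:", sorted(passed_tests(merged)))
```

No single draft passes the third test, but the merged patch does, which is the intuition behind combining partial fixes instead of picking one winner.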

Integration Architecture

REFINE is designed as a general refinement module that plugs into existing APR systems:

Works with open-agent-based systems (e.g., SWE-Agent)
Works with workflow-based systems (e.g., AutoCodeRover)
Adds refinement as a post-processing step to any base APR approach
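A post-processing design like this can be expressed as a thin wrapper around any base system. The sketch below is an assumption about how such a plug-in boundary might look, not REFINE's actual interface: `with_refinement`, the two stub base systems, and `toy_refiner` are hypothetical names, and the stubs return fixed strings instead of calling SWE-Agent or AutoCodeRover.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces: a base APR system maps an issue to draft patches,
# and a refiner turns the drafts into at most one final patch.
BaseRepairer = Callable[[str], List[str]]
Refiner = Callable[[str, List[str]], Optional[str]]

def with_refinement(base: BaseRepairer, refine: Refiner) -> Callable[[str], Optional[str]]:
    """Wrap any base system so refinement runs as a post-processing step."""
    def pipeline(issue: str) -> Optional[str]:
        drafts = base(issue)
        return refine(issue, drafts)
    return pipeline

# Stub base systems standing in for an agent-based and a workflow-based tool.
def agent_style_base(issue: str) -> List[str]:
    return [f"agent draft A for {issue}", f"agent draft B for {issue}"]

def workflow_style_base(issue: str) -> List[str]:
    return [f"workflow draft for {issue}"]

def toy_refiner(issue: str, drafts: List[str]) -> Optional[str]:
    # Stand-in for context enrichment, candidate sampling, and review.
    if not drafts:
        return None
    return "refined(" + " + ".join(drafts) + ")"

agent_pipeline = with_refinement(agent_style_base, toy_refiner)
workflow_pipeline = with_refinement(workflow_style_base, toy_refiner)

print(agent_pipeline("issue-1"))
print(workflow_pipeline("issue-1"))
```

Because the wrapper only depends on the two callable signatures, the same refiner can be reused across base systems without modifying them.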

Results

SWE-Bench Lite: Boosts AutoCodeRover by 14.67%, achieving 51.67% (SOTA among workflow-based approaches)
SWE-Bench Verified: 12.2% improvement in resolution rate
Comes close to the best-known performance across all APR categories

Specification Vibing (VibeRepair)

VibeRepair (Zhu et al., 2026) introduces a paradigm shift from code-centric repair to specification-centric repair, treating bug fixing as behavior-specification alignment rather than ad-hoc code editing.³⁾

The Specification-Centric Approach

Code to Specification: Translates buggy code into a structured behavior specification capturing intended runtime behavior
Specification Repair: Infers and repairs misalignments in the specification (not the code)
Specification to Code: Synthesizes corrected code strictly guided by the repaired behavior specification
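The three stages can be sketched end to end on a toy `clamp` function. This is a minimal illustration under strong assumptions: the `BehaviorSpec` format (ordered condition/result rules), the hard-coded output of `code_to_spec`, and the `intended` mapping passed to `repair_spec` are all invented here and stand in for model calls. The paper's actual specification format is not reproduced.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

@dataclass(frozen=True)
class BehaviorSpec:
    """Hypothetical spec format: ordered (condition, result) rules over x, lo, hi."""
    name: str
    rules: Tuple[Tuple[str, str], ...]

def code_to_spec() -> BehaviorSpec:
    # Stage 1 stand-in: a model would summarize the buggy code's behavior.
    # The buggy clamp returns lo when x exceeds hi.
    return BehaviorSpec("clamp", (("x < lo", "lo"), ("x > hi", "lo"), ("True", "x")))

def repair_spec(spec: BehaviorSpec, intended: Dict[str, str]) -> BehaviorSpec:
    # Stage 2 stand-in: realign rules that contradict the inferred intent.
    fixed = tuple((cond, intended.get(cond, result)) for cond, result in spec.rules)
    return replace(spec, rules=fixed)

def spec_to_code(spec: BehaviorSpec) -> str:
    # Stage 3 stand-in: synthesize code strictly from the repaired spec.
    lines = [f"def {spec.name}(x, lo, hi):"]
    for cond, result in spec.rules:
        lines.append(f"    if {cond}:")
        lines.append(f"        return {result}")
    return "\n".join(lines)

buggy_spec = code_to_spec()
repaired_spec = repair_spec(buggy_spec, intended={"x > hi": "hi"})
source = spec_to_code(repaired_spec)

namespace: dict = {}
exec(source, namespace)  # acceptable here because the source is generated locally
clamp = namespace["clamp"]

print(source)
print(clamp(15, 0, 10), clamp(-3, 0, 10), clamp(5, 0, 10))
```

The edit happens on a single rule of the specification, and the code is regenerated from it, which illustrates why the search space can be smaller than editing code directly.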

On-Demand Reasoning

For difficult cases, an enrichment component provides:

Program analysis insights
Historical bug-fix evidence from similar patterns
Cost-controlled reasoning that activates only when needed
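One way to read "activates only when needed" is as an escalation loop: try the cheap path first and pay for enrichment only when validation fails. The sketch below follows that reading. The function names, the `budget` parameter, and the stubbed behavior of the repair and enrichment calls are assumptions for illustration, not VibeRepair's API.

```python
from typing import Callable, Optional, Tuple

def repair_with_escalation(
    bug: str,
    cheap_repair: Callable[[str, Optional[dict]], str],
    enrich: Callable[[str], dict],
    validate: Callable[[str], bool],
    budget: int = 1,
) -> Tuple[str, int]:
    """Attempt a repair without extra context; enrich only after a failed validation."""
    patch = cheap_repair(bug, None)
    enrich_calls = 0
    while not validate(patch) and enrich_calls < budget:
        context = enrich(bug)  # the expensive step runs on demand
        enrich_calls += 1
        patch = cheap_repair(bug, context)
    return patch, enrich_calls

# Stubs standing in for model and analysis calls.
def cheap_repair(bug: str, context: Optional[dict]) -> str:
    if bug == "easy" or context is not None:
        return f"fix({bug})"
    return f"draft({bug})"

def enrich(bug: str) -> dict:
    return {"analysis": f"data flow for {bug}", "history": ["similar past fix"]}

def validate(patch: str) -> bool:
    return patch.startswith("fix(")

print(repair_with_escalation("easy", cheap_repair, enrich, validate))
print(repair_with_escalation("hard", cheap_repair, enrich, validate))
```

The easy case finishes with zero enrichment calls, while the hard case spends exactly one, so the extra cost is only incurred where the cheap attempt falls short.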

Results

Defects4J v1.2: 174 bugs repaired (19% improvement over strongest baseline, +28 bugs)
Defects4J v2.0: 178 bugs repaired (23% improvement, +33 bugs)
Significantly smaller patch space than code-centric approaches
Generalizes to real-world benchmarks collected after the LLM training period
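The percentage and bug-count figures above can be cross-checked with simple arithmetic. The snippet assumes that the "+N bugs" gain and the percentage both refer to the same strongest baseline, so the baseline count is the repaired count minus the gain. That reading is an assumption. The baseline counts themselves are derived here, not quoted from the paper.

```python
def relative_improvement(repaired: int, gained: int):
    """Derive the implied baseline count and the relative improvement in percent."""
    baseline = repaired - gained
    return baseline, 100 * gained / baseline

for label, repaired, gained in [("Defects4J v1.2", 174, 28), ("Defects4J v2.0", 178, 33)]:
    baseline, pct = relative_improvement(repaired, gained)
    print(f"{label}: implied baseline {baseline} bugs, improvement {pct:.1f}%")
```

Under that assumption the implied improvements round to the 19% and 23% reported above.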

Code Example

# InspectCoder-style debugger-driven APR (simplified sketch; the inspector,
# coder, debugger, and test-suite interfaces are illustrative placeholders)
class InspectCoderAgent:
    # Below this confidence, the Inspector runs a perturbation experiment.
    CONFIDENCE_THRESHOLD = 0.7

    def __init__(self, inspector_llm, coder_llm, debugger):
        self.inspector = inspector_llm  # Program Inspector: drives the debugger
        self.coder = coder_llm          # Patch Coder: writes candidate patches
        self.debugger = debugger

    def repair(self, buggy_code, test_suite, max_iterations=5):
        current_code = buggy_code
        for _ in range(max_iterations):
            # Phase 1: Dynamic analysis via Program Inspector
            diagnosis = self.inspect(current_code, test_suite)

            # Phase 2: Patch generation via Patch Coder
            patch = self.coder.generate_patch(current_code, diagnosis)

            # Phase 3: Verification
            results = test_suite.run(patch)
            if results.all_pass:
                return patch
            # Tests still fail: the next iteration re-inspects the latest patch
            current_code = patch

        return None  # Could not repair within budget

    def inspect(self, code, test_suite):
        failing_test = test_suite.get_first_failing()
        # Inspector decides breakpoint strategy
        breakpoints = self.inspector.plan_breakpoints(code, failing_test)
        self.debugger.set_breakpoints(breakpoints)

        # Run under debugger and collect state at each breakpoint
        states = self.debugger.run(code, failing_test)

        # Inspector analyzes runtime state
        root_cause = self.inspector.analyze(states, code, failing_test)

        # Optional: perturbation experiment when the diagnosis is uncertain
        if root_cause.confidence < self.CONFIDENCE_THRESHOLD:
            perturbed = self.inspector.perturb_state(states, root_cause.hypothesis)
            root_cause = self.inspector.refine_hypothesis(perturbed)

        return root_cause

Comparison of APR Approaches

Method	Paradigm	Benchmark	Key Result	Innovation
InspectCoder	Dynamic analysis	BigCodeBench-R, LiveCodeBench-R	Up to 60.37% relative improvement	Debugger collaboration
REFINE	Patch refinement	SWE-Bench Lite	51.67% resolution	Draft patch aggregation
VibeRepair	Specification-centric	Defects4J v1.2	174 bugs (+19%)	Behavior specification repair

References

¹⁾ InspectCoder: Dynamic Analysis-Enabled LLM Self-Repair (arXiv:2510.18327). https://arxiv.org/abs/2510.18327

²⁾ REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement (arXiv:2510.03588). https://arxiv.org/abs/2510.03588

³⁾ Specification Vibing for Automated Program Repair (arXiv:2602.08263). https://arxiv.org/abs/2602.08263
