Automated Program Repair
LLM-powered agents are revolutionizing automated program repair (APR) by moving beyond static code rewriting to interactive debugging, context-aware refinement, and specification-centric repair. This page covers InspectCoder's debugger collaboration, REFINE's patch refinement framework, and VibeRepair's specification-centric approach.
The APR Challenge
Automated program repair aims to automatically fix bugs in software given issue descriptions, codebases, and test suites. LLM-based APR faces several challenges:
Limited code context: Models struggle to understand large repository structures
Test suite overfitting: Patches may pass tests without actually fixing the underlying bug
Draft patches: Current methods frequently produce partially correct fixes that incompletely address bugs
Hallucinated fixes: Code-centric approaches risk generating behaviorally inconsistent patches
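Test-suite overfitting in particular is easy to demonstrate. The toy functions below (all names are illustrative, not from any of the papers) show a "patch" that passes the only visible test by hardcoding its answer while leaving the underlying off-by-one bug in place:

```python
# Toy illustration of test-suite overfitting: the overfit "patch" passes
# the visible test without fixing the underlying off-by-one bug.
def buggy_sum_to_n(n):
    return sum(range(n))        # Bug: should be range(n + 1)

def overfit_patch_sum_to_n(n):
    if n == 3:                  # Hardcodes the only tested input
        return 6
    return sum(range(n))        # Underlying bug remains

def correct_patch_sum_to_n(n):
    return sum(range(n + 1))    # Actually fixes the root cause

# Visible test: both patches pass
assert overfit_patch_sum_to_n(3) == 6
assert correct_patch_sum_to_n(3) == 6
# A held-out input exposes the overfit patch
assert overfit_patch_sum_to_n(5) == 10   # wrong: expected 15
assert correct_patch_sum_to_n(5) == 15
```

This is why resolution rates on held-out or post-training-cutoff benchmarks matter more than raw pass rates on the tests a system saw during repair.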
InspectCoder: Dynamic Analysis with Debugger Collaboration
InspectCoder (arXiv:2510.18327) is the first agentic program repair system that enables LLMs to actively conduct dynamic analysis through interactive debugger control.
Dual-Agent Architecture
Program Inspector: Controls program execution via InspectWare middleware – strategically places breakpoints, inspects runtime state, and runs incremental runtime experiments
Patch Coder: Leverages diagnostic insights from the Inspector to generate and verify code patches
Key Capabilities
Strategic breakpoint placement: The Inspector agent decides where to pause execution based on the bug hypothesis
Targeted state inspection: Examines variable values, stack traces, and memory state at breakpoints
Temporary, reversible perturbations: Modifies intermediate program states temporarily to test root-cause hypotheses, providing immediate process reward signals
Adaptive inspection: Responds to runtime behavior dynamically rather than following fixed log collection procedures
Iterative patch verification: After patch generation, failing tests trigger return to the Inspector for further dynamic analysis
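To make "targeted state inspection" concrete, here is a minimal sketch of InspectWare-style state capture using Python's standard `sys.settrace` hook. The function name `capture_states` and its interface are assumptions for illustration, not the paper's middleware API: it pauses at chosen line numbers and snapshots local variables, giving an Inspector agent runtime facts instead of static guesses.

```python
import sys

# Minimal sketch of breakpoint-style state capture (illustrative API):
# pause at the given line numbers and snapshot each frame's locals.
def capture_states(func, args, breakpoint_lines):
    snapshots = []

    def tracer(frame, event, arg):
        # Line events fire just before each traced line executes
        if event == "line" and frame.f_lineno in breakpoint_lines:
            snapshots.append({"line": frame.f_lineno,
                              "locals": dict(frame.f_locals)})
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)   # Always detach the tracer
    return snapshots

def _demo(x):
    y = x * 2
    return y

# Pause just before the `return` line and snapshot locals
stop_line = _demo.__code__.co_firstlineno + 2
states = capture_states(_demo, (3,), {stop_line})
# states[0]["locals"] -> {'x': 3, 'y': 6}
```

A real system would let the LLM choose `breakpoint_lines` from the bug hypothesis and feed the snapshots back into its next reasoning step.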
Results
BigCodeBench-R: +60% relative improvement (see the comparison table below)
REFINE: Context-Aware Patch Refinement
REFINE (Pabba et al., 2025) transforms partially correct “Draft Patches” into correct ones through a systematic refinement framework.
Three Key Challenges Addressed
Context disambiguation: Resolves vague issue descriptions and unclear code context by enriching the repair prompt with structured context
Candidate diversification: Uses test-time scaling to generate diverse patch candidates, increasing the probability of including a correct fix
Partial fix aggregation: An LLM-powered code review process combines insights from multiple partial fixes into a complete solution
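The diversification and aggregation steps above can be sketched as a single loop. All callables here (`generate_candidate`, `run_tests`, `review_and_merge`) are hypothetical stand-ins for LLM and test-harness calls, not REFINE's actual interfaces:

```python
# Sketch of REFINE-style test-time scaling plus partial-fix aggregation.
def refine_draft(base_patch, generate_candidate, run_tests,
                 review_and_merge, n=8):
    # Candidate diversification: sample n variants at varied temperatures
    candidates = [generate_candidate(base_patch, temperature=0.80 + 0.05 * i)
                  for i in range(n)]
    # Score each candidate by the fraction of tests it passes
    scored = [(run_tests(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    if best_score == 1.0:
        return best   # A fully correct candidate already exists
    # Partial fix aggregation: LLM code review merges the top partial fixes
    top = [c for _, c in sorted(scored, key=lambda sc: sc[0], reverse=True)[:3]]
    return review_and_merge(top)
```

The key design choice is that aggregation only triggers when no single candidate is fully correct, so test-time compute is spent where draft patches actually fall short.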
Integration Architecture
REFINE is designed as a general refinement module that plugs into existing APR systems:
Works with open-agent-based systems (e.g., SWE-Agent)
Works with workflow-based systems (e.g., AutoCodeRover)
Adds refinement as a post-processing step to any base APR approach
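The plug-in structure amounts to a simple wrapper: any base system that exposes a repair entry point can be composed with a refinement module. The class and method names below are illustrative, not the paper's interfaces:

```python
# Sketch of REFINE as a drop-in post-processing stage around a base
# APR system (agent-based or workflow-based). Names are illustrative.
class RefinedAPR:
    def __init__(self, base_system, refiner):
        self.base = base_system   # e.g. an SWE-Agent- or AutoCodeRover-style system
        self.refiner = refiner    # REFINE-style refinement module

    def repair(self, issue):
        draft = self.base.repair(issue)           # Possibly a Draft Patch
        return self.refiner.refine(issue, draft)  # Refine toward a full fix
```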
Results
SWE-Bench Lite: Boosts AutoCodeRover by 14.67%, achieving 51.67% (SOTA among workflow-based approaches)
SWE-Bench Verified: 12.2% improvement in resolution rate
Approaches best-known performance across all APR categories
Specification Vibing (VibeRepair)
VibeRepair (Zhu et al., 2026) introduces a paradigm shift from code-centric repair to specification-centric repair, treating bug fixing as behavior-specification alignment rather than ad-hoc code editing.
The Specification-Centric Approach
Code to Specification: Translates buggy code into a structured behavior specification capturing intended runtime behavior
Specification Repair: Infers and repairs misalignments in the specification (not the code)
Specification to Code: Synthesizes corrected code strictly guided by the repaired behavior specification
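The three stages above can be sketched as a pipeline over a toy behavior specification. Everything here is a hypothetical stand-in for the LLM-backed stages (the spec is just a dict of input-to-expected-output examples, and the "synthesizer" is a lookup table), meant only to show that the repair happens on the specification, not the code:

```python
# Minimal sketch of the three-stage specification-centric loop.
def vibe_repair(buggy_code, intended_examples, to_spec, fix_spec, to_code):
    spec = to_spec(buggy_code)                # Stage 1: code -> behavior spec
    spec = fix_spec(spec, intended_examples)  # Stage 2: repair the spec itself
    return to_code(spec)                      # Stage 3: synthesize from the spec

# Toy stages: the spec maps inputs to expected outputs
def to_spec(code):
    # The buggy sum(range(n)) yields 3 for n=3: the spec inherits the bug
    return {"desc": "sum of 1..n", "examples": {3: 3, 4: 6}}

def fix_spec(spec, intended):
    spec["examples"] = dict(intended)   # Realign examples with intent
    return spec

def to_code(spec):
    # Trivial "synthesizer": a lookup over the repaired examples
    return spec["examples"].__getitem__

fixed = vibe_repair("def f(n): return sum(range(n))",
                    {3: 6, 4: 10}, to_spec, fix_spec, to_code)
# fixed(3) -> 6
```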
On-Demand Reasoning
For difficult cases, an enrichment component provides:
Program analysis insights
Historical bug-fix evidence from similar patterns
Cost-controlled reasoning that activates only when needed
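The cost-control idea reduces to a confidence gate: try the cheap repair first, and pay for enrichment only on low-confidence cases. The threshold value and function signatures below are illustrative assumptions, not VibeRepair's API:

```python
# Sketch of cost-controlled on-demand reasoning: the expensive enrichment
# step (program-analysis insights plus retrieved historical bug fixes)
# fires only when the cheap repair attempt reports low confidence.
def repair_on_demand(spec, repair_fn, enrich_fn, threshold=0.7):
    patch, confidence = repair_fn(spec, context=None)   # Cheap first pass
    if confidence >= threshold:
        return patch                                    # No enrichment cost paid
    context = enrich_fn(spec)        # Expensive: analysis + retrieval
    patch, _ = repair_fn(spec, context=context)         # Enriched retry
    return patch
```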
Results
Defects4J v1.2: 174 bugs repaired (19% improvement over strongest baseline, +28 bugs)
Defects4J v2.0: 178 bugs repaired (23% improvement, +33 bugs)
Significantly smaller patch space than code-centric approaches
Generalizes to real-world benchmarks collected after the LLM training period
Code Example
```python
# InspectCoder-style debugger-driven APR (simplified)
class InspectCoderAgent:
    def __init__(self, inspector_llm, coder_llm, debugger):
        self.inspector = inspector_llm
        self.coder = coder_llm
        self.debugger = debugger

    def repair(self, buggy_code, test_suite, max_iterations=5):
        for _ in range(max_iterations):
            # Phase 1: Dynamic analysis via Program Inspector
            diagnosis = self.inspect(buggy_code, test_suite)
            # Phase 2: Patch generation via Patch Coder
            patch = self.coder.generate_patch(buggy_code, diagnosis)
            # Phase 3: Verification
            results = test_suite.run(patch)
            if results.all_pass:
                return patch
            # Feed the failing patch back to the Inspector
            buggy_code = patch  # Refine from current best
        return None  # Could not repair within budget

    def inspect(self, code, test_suite):
        failing_test = test_suite.get_first_failing()
        # Inspector decides breakpoint strategy
        breakpoints = self.inspector.plan_breakpoints(code, failing_test)
        self.debugger.set_breakpoints(breakpoints)
        # Run under debugger and collect state
        states = self.debugger.run(code, failing_test)
        # Inspector analyzes runtime state
        root_cause = self.inspector.analyze(states, code, failing_test)
        # Optional: perturbation experiment when confidence is low
        if root_cause.confidence < 0.7:
            perturbed = self.inspector.perturb_state(states, root_cause.hypothesis)
            root_cause = self.inspector.refine_hypothesis(perturbed)
        return root_cause
```
Comparison of APR Approaches
| Method | Paradigm | Benchmark | Key Result | Innovation |
|---|---|---|---|---|
| InspectCoder | Dynamic analysis | BigCodeBench-R | +60% relative improvement | Debugger collaboration |
| REFINE | Patch refinement | SWE-Bench Lite | 51.67% resolution | Draft patch aggregation |
| VibeRepair | Specification-centric | Defects4J v1.2 | 174 bugs (+19%) | Behavior specification repair |