Automated Program Repair
LLM-powered agents are revolutionizing automated program repair (APR) by moving beyond static code rewriting to interactive debugging, context-aware refinement, and specification-centric repair. This page covers InspectCoder's debugger collaboration, REFINE's patch refinement framework, and VibeRepair's specification-centric approach.
The APR Challenge
Automated program repair aims to automatically fix bugs in software given issue descriptions, codebases, and test suites. LLM-based APR faces several challenges:
Limited code context: Models struggle to understand large repository structures
Test suite overfitting: Patches may pass tests without actually fixing the underlying bug
Draft patches: Current methods frequently produce partially correct fixes that incompletely address bugs
Hallucinated fixes: Code-centric approaches risk generating behaviorally inconsistent patches
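Test-suite overfitting in particular is easy to demonstrate. The toy functions below (all names are illustrative, not from any of the papers) show a "patch" that passes the only visible test by hardcoding its answer while leaving the underlying off-by-one bug in place:

```python
# Toy illustration of test-suite overfitting: the overfit "patch" passes
# the visible test without fixing the underlying off-by-one bug.
def buggy_sum_to_n(n):
    return sum(range(n))        # Bug: should be range(n + 1)

def overfit_patch_sum_to_n(n):
    if n == 3:                  # Hardcodes the only tested input
        return 6
    return sum(range(n))        # Underlying bug remains

def correct_patch_sum_to_n(n):
    return sum(range(n + 1))    # Actually fixes the root cause

# Visible test: both patches pass
assert overfit_patch_sum_to_n(3) == 6
assert correct_patch_sum_to_n(3) == 6
# A held-out input exposes the overfit patch
assert overfit_patch_sum_to_n(5) == 10   # wrong: expected 15
assert correct_patch_sum_to_n(5) == 15
```

This is why resolution rates on held-out or post-training-cutoff benchmarks matter more than raw pass rates on the tests a system saw during repair.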
InspectCoder: Dynamic Analysis with Debugger Collaboration
InspectCoder (arXiv:2510.18327) is the first agentic program repair system that enables LLMs to actively conduct dynamic analysis through interactive debugger control.
Dual-Agent Architecture
Program Inspector: Controls program execution via InspectWare middleware – strategically places breakpoints, inspects runtime state, and runs incremental runtime experiments
Patch Coder: Leverages diagnostic insights from the Inspector to generate and verify code patches
Key Capabilities
Strategic breakpoint placement: The Inspector agent decides where to pause execution based on the bug hypothesis
Targeted state inspection: Examines variable values, stack traces, and memory state at breakpoints
Temporary, reversible perturbations: Modifies intermediate program states temporarily to test root-cause hypotheses, providing immediate process reward signals
Adaptive inspection: Responds to runtime behavior dynamically rather than following fixed log collection procedures
Iterative patch verification: After patch generation, failing tests trigger return to the Inspector for further dynamic analysis
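To make "targeted state inspection" concrete, here is a minimal sketch of InspectWare-style state capture using Python's standard `sys.settrace` hook. The function name `capture_states` and its interface are assumptions for illustration, not the paper's middleware API: it pauses at chosen line numbers and snapshots local variables, giving an Inspector agent runtime facts instead of static guesses.

```python
import sys

# Minimal sketch of breakpoint-style state capture (illustrative API):
# pause at the given line numbers and snapshot each frame's locals.
def capture_states(func, args, breakpoint_lines):
    snapshots = []

    def tracer(frame, event, arg):
        # Line events fire just before each traced line executes
        if event == "line" and frame.f_lineno in breakpoint_lines:
            snapshots.append({"line": frame.f_lineno,
                              "locals": dict(frame.f_locals)})
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)   # Always detach the tracer
    return snapshots

def _demo(x):
    y = x * 2
    return y

# Pause just before the `return` line and snapshot locals
stop_line = _demo.__code__.co_firstlineno + 2
states = capture_states(_demo, (3,), {stop_line})
# states[0]["locals"] -> {'x': 3, 'y': 6}
```

A real system would let the LLM choose `breakpoint_lines` from the bug hypothesis and feed the snapshots back into its next reasoning step.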
Results
BigCodeBench-R: +60% relative improvement (see the comparison table below)
REFINE: Context-Aware Patch Refinement
REFINE (Pabba et al., 2025) transforms partially correct “Draft Patches” into correct ones through a systematic refinement framework.
Three Key Challenges Addressed
Context disambiguation: Resolves vague issue descriptions and unclear code context by enriching the repair prompt with structured context
Candidate diversification: Uses test-time scaling to generate diverse patch candidates, increasing the probability of including a correct fix
Partial fix aggregation: An LLM-powered code review process combines insights from multiple partial fixes into a complete solution
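The diversification and aggregation steps above can be sketched as a single loop. All callables here (`generate_candidate`, `run_tests`, `review_and_merge`) are hypothetical stand-ins for LLM and test-harness calls, not REFINE's actual interfaces:

```python
# Sketch of REFINE-style test-time scaling plus partial-fix aggregation.
def refine_draft(base_patch, generate_candidate, run_tests,
                 review_and_merge, n=8):
    # Candidate diversification: sample n variants at varied temperatures
    candidates = [generate_candidate(base_patch, temperature=0.80 + 0.05 * i)
                  for i in range(n)]
    # Score each candidate by the fraction of tests it passes
    scored = [(run_tests(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    if best_score == 1.0:
        return best   # A fully correct candidate already exists
    # Partial fix aggregation: LLM code review merges the top partial fixes
    top = [c for _, c in sorted(scored, key=lambda sc: sc[0], reverse=True)[:3]]
    return review_and_merge(top)
```

The key design choice is that aggregation only triggers when no single candidate is fully correct, so test-time compute is spent where draft patches actually fall short.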
Integration Architecture
REFINE is designed as a general refinement module that plugs into existing APR systems:
Works with open-agent-based systems (e.g., SWE-Agent)
Works with workflow-based systems (e.g., AutoCodeRover)
Adds refinement as a post-processing step to any base APR approach
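The plug-in structure amounts to a simple wrapper: any base system that exposes a repair entry point can be composed with a refinement module. The class and method names below are illustrative, not the paper's interfaces:

```python
# Sketch of REFINE as a drop-in post-processing stage around a base
# APR system (agent-based or workflow-based). Names are illustrative.
class RefinedAPR:
    def __init__(self, base_system, refiner):
        self.base = base_system   # e.g. an SWE-Agent- or AutoCodeRover-style system
        self.refiner = refiner    # REFINE-style refinement module

    def repair(self, issue):
        draft = self.base.repair(issue)           # Possibly a Draft Patch
        return self.refiner.refine(issue, draft)  # Refine toward a full fix
```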
Results
SWE-Bench Lite: Boosts AutoCodeRover by 14.67%, achieving 51.67% (SOTA among workflow-based approaches)
SWE-Bench Verified: 12.2% improvement in resolution rate
Approaches best-known performance across all APR categories
Specification Vibing (VibeRepair)
VibeRepair (Zhu et al., 2026) introduces a paradigm shift from code-centric repair to specification-centric repair, treating bug fixing as behavior-specification alignment rather than ad-hoc code editing.
The Specification-Centric Approach
Code to Specification: Translates buggy code into a structured behavior specification capturing intended runtime behavior
Specification Repair: Infers and repairs misalignments in the specification (not the code)
Specification to Code: Synthesizes corrected code strictly guided by the repaired behavior specification
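The three stages above can be sketched as a pipeline over a toy behavior specification. Everything here is a hypothetical stand-in for the LLM-backed stages (the spec is just a dict of input-to-expected-output examples, and the "synthesizer" is a lookup table), meant only to show that the repair happens on the specification, not the code:

```python
# Minimal sketch of the three-stage specification-centric loop.
def vibe_repair(buggy_code, intended_examples, to_spec, fix_spec, to_code):
    spec = to_spec(buggy_code)                # Stage 1: code -> behavior spec
    spec = fix_spec(spec, intended_examples)  # Stage 2: repair the spec itself
    return to_code(spec)                      # Stage 3: synthesize from the spec

# Toy stages: the spec maps inputs to expected outputs
def to_spec(code):
    # The buggy sum(range(n)) yields 3 for n=3: the spec inherits the bug
    return {"desc": "sum of 1..n", "examples": {3: 3, 4: 6}}

def fix_spec(spec, intended):
    spec["examples"] = dict(intended)   # Realign examples with intent
    return spec

def to_code(spec):
    # Trivial "synthesizer": a lookup over the repaired examples
    return spec["examples"].__getitem__

fixed = vibe_repair("def f(n): return sum(range(n))",
                    {3: 6, 4: 10}, to_spec, fix_spec, to_code)
# fixed(3) -> 6
```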
On-Demand Reasoning
For difficult cases, an enrichment component provides:
Program analysis insights
Historical bug-fix evidence from similar patterns
Cost-controlled reasoning that activates only when needed
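The cost-control idea reduces to a confidence gate: try the cheap repair first, and pay for enrichment only on low-confidence cases. The threshold value and function signatures below are illustrative assumptions, not VibeRepair's API:

```python
# Sketch of cost-controlled on-demand reasoning: the expensive enrichment
# step (program-analysis insights plus retrieved historical bug fixes)
# fires only when the cheap repair attempt reports low confidence.
def repair_on_demand(spec, repair_fn, enrich_fn, threshold=0.7):
    patch, confidence = repair_fn(spec, context=None)   # Cheap first pass
    if confidence >= threshold:
        return patch                                    # No enrichment cost paid
    context = enrich_fn(spec)        # Expensive: analysis + retrieval
    patch, _ = repair_fn(spec, context=context)         # Enriched retry
    return patch
```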
Results
Defects4J v1.2: 174 bugs repaired (19% improvement over strongest baseline, +28 bugs)
Defects4J v2.0: 178 bugs repaired (23% improvement, +33 bugs)
Significantly smaller patch space than code-centric approaches
Generalizes to real-world benchmarks collected after the LLM training period
Code Example
```python
# InspectCoder-style debugger-driven APR (simplified)
class InspectCoderAgent:
    def __init__(self, inspector_llm, coder_llm, debugger):
        self.inspector = inspector_llm
        self.coder = coder_llm
        self.debugger = debugger

    def repair(self, buggy_code, test_suite, max_iterations=5):
        for _ in range(max_iterations):
            # Phase 1: Dynamic analysis via Program Inspector
            diagnosis = self.inspect(buggy_code, test_suite)
            # Phase 2: Patch generation via Patch Coder
            patch = self.coder.generate_patch(buggy_code, diagnosis)
            # Phase 3: Verification
            results = test_suite.run(patch)
            if results.all_pass:
                return patch
            # Feed the failing patch back to the Inspector
            buggy_code = patch  # Refine from current best
        return None  # Could not repair within budget

    def inspect(self, code, test_suite):
        failing_test = test_suite.get_first_failing()
        # Inspector decides breakpoint strategy
        breakpoints = self.inspector.plan_breakpoints(code, failing_test)
        self.debugger.set_breakpoints(breakpoints)
        # Run under debugger and collect state
        states = self.debugger.run(code, failing_test)
        # Inspector analyzes runtime state
        root_cause = self.inspector.analyze(states, code, failing_test)
        # Optional: perturbation experiment when confidence is low
        if root_cause.confidence < 0.7:
            perturbed = self.inspector.perturb_state(states, root_cause.hypothesis)
            root_cause = self.inspector.refine_hypothesis(perturbed)
        return root_cause
```
Comparison of APR Approaches
| Method | Paradigm | Benchmark | Key Result | Innovation |
|---|---|---|---|---|
| InspectCoder | Dynamic analysis | BigCodeBench-R | +60% relative improvement | Debugger collaboration |
| REFINE | Patch refinement | SWE-Bench Lite | 51.67% resolution | Draft patch aggregation |
| VibeRepair | Specification-centric | Defects4J v1.2 | 174 bugs (+19%) | Behavior specification repair |