====== Automated Program Repair ======

LLM-powered agents are revolutionizing automated program repair (APR) by moving beyond static code rewriting to interactive debugging, context-aware refinement, and specification-centric repair. This page covers InspectCoder's debugger collaboration, REFINE's patch refinement framework, and VibeRepair's specification-centric approach.

===== The APR Challenge =====

Automated program repair aims to automatically fix bugs in software given issue descriptions, codebases, and test suites. LLM-based APR faces several challenges:

  * **Limited code context**: Models struggle to understand large repository structures
  * **Test suite overfitting**: Patches may pass tests without actually fixing the underlying bug
  * **Draft patches**: Current methods frequently produce partially correct fixes that incompletely address bugs
  * **Hallucinated fixes**: Code-centric approaches risk generating behaviorally inconsistent patches

===== InspectCoder: Dynamic Analysis with Debugger Collaboration =====

**InspectCoder** (arXiv:2510.18327) is the first agentic program repair system that enables LLMs to actively conduct dynamic analysis through interactive debugger control.
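To make "dynamic analysis" concrete, the following minimal sketch uses Python's standard ''sys.settrace'' hook to snapshot local variables whenever a chosen line is about to run, which is roughly what a breakpoint plus state inspection provides. It is not InspectCoder's InspectWare middleware; the helper ''capture_locals_at'' and the buggy function ''average'' are hypothetical names introduced only for illustration.

```python
import sys


def capture_locals_at(func, line_offset, *args, **kwargs):
    """Run func and snapshot its locals each time one line is about to execute.

    line_offset is counted from the `def` line of func (0 is the def line).
    Hypothetical helper for illustration, not part of any APR tool.
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset
    snapshots = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames of other functions
        if event == "line" and frame.f_lineno == target_line:
            snapshots.append(dict(frame.f_locals))  # copy the current state
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, snapshots


def average(values):
    total = 0
    for v in values[1:]:  # bug: the first element is skipped
        total += v
    return total / len(values)


# "Breakpoint" on the `total += v` line (offset 3 from the def line).
result, snaps = capture_locals_at(average, 3, [2, 4, 6])
for snap in snaps:
    print(snap["v"], snap["total"])
```

The snapshots never show ''v'' equal to the first list element, which points at the slice in the loop header as the likely root cause. An agent driving a real debugger would gather the same kind of evidence, but would choose the breakpoints itself.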

=== Dual-Agent Architecture ===

  * **Program Inspector**: Controls program execution via the InspectWare middleware -- strategically places breakpoints, inspects runtime state, and runs incremental runtime experiments
  * **Patch Coder**: Uses diagnostic insights from the Inspector to generate and verify code patches

=== Key Capabilities ===

  * **Strategic breakpoint placement**: The Inspector agent decides where to pause execution based on the bug hypothesis
  * **Targeted state inspection**: Examines variable values, stack traces, and memory state at breakpoints
  * **Temporary, reversible perturbations**: Modifies intermediate program states temporarily to test root-cause hypotheses, providing immediate process reward signals
  * **Adaptive inspection**: Responds to runtime behavior dynamically rather than following fixed log collection procedures
  * **Iterative patch verification**: After patch generation, failing tests trigger a return to the Inspector for further dynamic analysis

=== Results ===

  * **BigCodeBench-R and LiveCodeBench-R**: 5.10% to 60.37% relative improvement in repair accuracy over the strongest baselines
  * **Bug-fix efficiency**: 1.67x to 2.24x higher than the baselines

===== REFINE: Context-Aware Patch Refinement =====

**REFINE** (Pabba et al., 2025) transforms partially correct "Draft Patches" into correct ones through a systematic refinement framework.
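One way to picture a draft patch is through its per-test outcomes: it passes some tests but not all of them. The sketch below classifies candidate patches on that basis and looks for pairs of drafts whose passing tests together cover the whole suite, which is the situation where combining partial fixes looks promising. This is an illustrative simplification, not REFINE's implementation; ''PatchResult'', ''classify'', and ''complementary_pairs'' are hypothetical names, and the test outcomes are made-up example data.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PatchResult:
    name: str
    passed: frozenset  # names of the tests this candidate patch passes


def classify(result, all_tests):
    """Label a candidate by how much of the test suite it passes."""
    if result.passed >= all_tests:
        return "plausible"  # passes every test, but may still overfit
    if result.passed:
        return "draft"      # partially correct
    return "failed"


def complementary_pairs(results, all_tests):
    """Return name pairs of drafts whose passing tests jointly cover the suite."""
    drafts = [r for r in results if classify(r, all_tests) == "draft"]
    return [
        (a.name, b.name)
        for a, b in combinations(drafts, 2)
        if a.passed | b.passed >= all_tests
    ]


# Made-up outcomes for four candidate patches on a three-test suite.
ALL_TESTS = frozenset({"t1", "t2", "t3"})
candidates = [
    PatchResult("p1", frozenset({"t1", "t2"})),
    PatchResult("p2", frozenset({"t3"})),
    PatchResult("p3", frozenset()),
    PatchResult("p4", frozenset({"t1"})),
]

print([classify(c, ALL_TESTS) for c in candidates])
print(complementary_pairs(candidates, ALL_TESTS))
```

Covering the suite jointly does not guarantee that two drafts can be merged into one correct patch; it only flags candidates worth reviewing together.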

=== Three Key Challenges Addressed ===

  - **Context disambiguation**: Resolves vague issue descriptions and unclear code context by enriching the repair prompt with structured context
  - **Candidate diversification**: Uses test-time scaling to generate diverse patch candidates, increasing the probability of including a correct fix
  - **Partial fix aggregation**: An LLM-powered code review process combines insights from multiple partial fixes into a complete solution

=== Integration Architecture ===

REFINE is designed as a general refinement module that plugs into existing APR systems:

  * Works with open-agent-based systems (e.g., SWE-Agent)
  * Works with workflow-based systems (e.g., AutoCodeRover)
  * Adds refinement as a post-processing step to any base APR approach

=== Results ===

  * **SWE-Bench Lite**: Boosts AutoCodeRover by 14.67%, achieving 51.67% (SOTA among workflow-based approaches)
  * **SWE-Bench Verified**: 12.2% improvement in resolution rate
  * Approaches best-known performance across all APR categories

===== Specification Vibing (VibeRepair) =====

**VibeRepair** (Zhu et al., 2026) introduces a paradigm shift from code-centric repair to specification-centric repair, treating bug fixing as behavior-specification alignment rather than ad-hoc code editing.
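As a rough illustration of behavior-specification alignment, the sketch below writes a behavior specification as a list of explicit clauses and reports which clauses an implementation violates. The specification format used by VibeRepair is not described on this page, so the ''Clause'' structure, the ''misaligned'' helper, and the ''clamp'' example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Clause:
    description: str        # human-readable statement of intended behavior
    args: Tuple[Any, ...]   # concrete inputs that exercise the clause
    expected: Any           # output the specification requires


def misaligned(func: Callable, spec: List[Clause]) -> List[Clause]:
    """Return the clauses whose expected output the function does not produce."""
    violations = []
    for clause in spec:
        try:
            actual = func(*clause.args)
        except Exception as exc:  # a crash also counts as a misalignment
            actual = exc
        if actual != clause.expected:
            violations.append(clause)
    return violations


def clamp_buggy(x, lo, hi):
    return max(lo, min(x, lo))  # bug: the upper bound should be hi


def clamp_fixed(x, lo, hi):
    return max(lo, min(x, hi))


SPEC = [
    Clause("values below the range map to lo", (-5, 0, 10), 0),
    Clause("values inside the range are unchanged", (7, 0, 10), 7),
    Clause("values above the range map to hi", (15, 0, 10), 10),
]

for clause in misaligned(clamp_buggy, SPEC):
    print("violated:", clause.description)
print("fixed version violations:", len(misaligned(clamp_fixed, SPEC)))
```

Working at the level of clauses narrows the repair target to the two violated behaviors, independent of how the code happens to be written.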

=== The Specification-Centric Approach ===

  - **Code to Specification**: Translates buggy code into a structured behavior specification capturing intended runtime behavior
  - **Specification Repair**: Infers and repairs misalignments in the specification (not the code)
  - **Specification to Code**: Synthesizes corrected code strictly guided by the repaired behavior specification

=== On-Demand Reasoning ===

For difficult cases, an enrichment component provides:

  * Program analysis insights
  * Historical bug-fix evidence from similar patterns
  * Cost-controlled reasoning that activates only when needed

=== Results ===

  * **Defects4J v1.2**: 174 bugs repaired (19% improvement over strongest baseline, +28 bugs)
  * **Defects4J v2.0**: 178 bugs repaired (23% improvement, +33 bugs)
  * Significantly smaller patch space than code-centric approaches
  * Generalizes to real-world benchmarks collected after the LLM training period

===== Code Example =====

<code python>
# InspectCoder-style debugger-driven APR (simplified sketch).
# The inspector, coder, debugger, and test-suite objects are assumed
# interfaces used for illustration, not a published API.
class InspectCoderAgent:
    def __init__(self, inspector_llm, coder_llm, debugger):
        self.inspector = inspector_llm
        self.coder = coder_llm
        self.debugger = debugger

    def repair(self, buggy_code, test_suite, max_iterations=5):
        for _ in range(max_iterations):
            # Phase 1: Dynamic analysis via Program Inspector
            diagnosis = self.inspect(buggy_code, test_suite)

            # Phase 2: Patch generation via Patch Coder
            patch = self.coder.generate_patch(buggy_code, diagnosis)

            # Phase 3: Verification
            results = test_suite.run(patch)
            if results.all_pass:
                return patch

            # Tests still fail: go back to the Inspector, starting
            # from the latest patch
            buggy_code = patch

        return None  # Could not repair within budget

    def inspect(self, code, test_suite):
        failing_test = test_suite.get_first_failing()

        # Inspector decides breakpoint strategy
        breakpoints = self.inspector.plan_breakpoints(code, failing_test)
        self.debugger.set_breakpoints(breakpoints)

        # Run under debugger and collect state
        states = self.debugger.run(code, failing_test)

        # Inspector analyzes runtime state
        root_cause = self.inspector.analyze(states, code, failing_test)

        # Optional: perturbation experiment when confidence is low
        if root_cause.confidence < 0.7:
            perturbed = self.inspector.perturb_state(states, root_cause.hypothesis)
            root_cause = self.inspector.refine_hypothesis(perturbed)

        return root_cause
</code>

===== Comparison of APR Approaches =====

^ Method ^ Paradigm ^ Benchmark ^ Key Result ^ Innovation ^
| InspectCoder | Dynamic analysis | BigCodeBench-R, LiveCodeBench-R | 5.10% to 60.37% relative improvement | Debugger collaboration |
| REFINE | Patch refinement | SWE-Bench Lite | 51.67% resolution | Draft patch aggregation |
| VibeRepair | Specification-centric | Defects4J v1.2 | 174 bugs (+19%) | Behavior specification repair |

===== References =====

  * [[https://arxiv.org/abs/2510.18327|InspectCoder: Dynamic Analysis-Enabled LLM Self-Repair (arXiv:2510.18327)]]
  * [[https://arxiv.org/abs/2510.03588|REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement (arXiv:2510.03588)]]
  * [[https://arxiv.org/abs/2602.08263|Specification Vibing for Automated Program Repair (arXiv:2602.08263)]]

===== See Also =====

  * [[text_to_sql_agents|Agentic Text-to-SQL]] -- related agentic code generation paradigm
  * [[llm_theorem_proving|LLM Theorem Proving]] -- formal verification for correctness
  * [[agentic_uncertainty|Agentic Uncertainty]] -- confidence in generated patches