====== Automated Program Repair ======
LLM-powered agents are extending automated program repair (APR) beyond static code rewriting to interactive debugging, context-aware patch refinement, and specification-centric repair. This page covers InspectCoder's debugger collaboration, REFINE's patch refinement framework, and VibeRepair's specification-centric approach.

===== The APR Challenge =====
Automated program repair aims to automatically fix bugs in software given issue descriptions, codebases, and test suites. LLM-based APR faces several challenges:

  * **Limited code context**: Models struggle to understand large repository structures
  * **Test suite overfitting**: Patches may pass tests without actually fixing the underlying bug
  * **Draft patches**: Current methods frequently produce partially correct fixes that address only part of a bug
  * **Hallucinated fixes**: Code-centric approaches risk generating behaviorally inconsistent patches
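
Test suite overfitting is easiest to see on a toy case. The sketch below is illustrative only: the two candidate patches for an absolute-value function, and both sets of test inputs, are hypothetical and not taken from any benchmark. One patch special-cases the failing visible input and so passes the visible tests, yet it fails on held-out inputs.

```python
def passes(candidate, cases):
    """Return True if candidate(x) == expected for every (x, expected) pair."""
    return all(candidate(x) == expected for x, expected in cases)


# Visible tests shipped with the bug report (hypothetical).
visible_tests = [(-3, 3), (4, 4)]
# Held-out tests the repair tool never saw (hypothetical).
held_out_tests = [(-7, 7), (0, 0), (-1, 1)]


def overfitted_patch(x):
    # Special-cases the one failing visible input instead of fixing the logic.
    return 3 if x == -3 else x


def correct_patch(x):
    # Fixes the underlying sign handling.
    return -x if x < 0 else x


for name, patch in [("overfitted", overfitted_patch), ("correct", correct_patch)]:
    print(name, passes(patch, visible_tests), passes(patch, held_out_tests))
# overfitted True False
# correct True True
```

Both patches are indistinguishable on the visible tests, which is why a passing test suite alone is weak evidence that the underlying bug is fixed.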

===== InspectCoder: Dynamic Analysis with Debugger Collaboration =====
**InspectCoder** (arXiv:2510.18327) is presented by its authors as the first agentic program repair system that lets LLMs actively conduct dynamic analysis through interactive debugger control.((https://arxiv.org/abs/2510.18327|InspectCoder: Dynamic Analysis-Enabled LLM Self-Repair (arXiv:2510.18327)))

=== Dual-Agent Architecture ===
  * **Program Inspector**: Controls program execution via InspectWare middleware, strategically places breakpoints, inspects runtime state, and runs incremental runtime experiments
  * **Patch Coder**: Leverages diagnostic insights from the Inspector to generate and verify code patches

=== Key Capabilities ===
  * **Strategic breakpoint placement**: The Inspector agent decides where to pause execution based on the bug hypothesis
  * **Targeted state inspection**: Examines variable values, stack traces, and memory state at breakpoints
  * **Temporary, reversible perturbations**: Modifies intermediate program states temporarily to test root cause hypotheses, providing immediate process reward signals
  * **Adaptive inspection**: Responds to runtime behavior dynamically rather than following fixed log collection procedures
  * **Iterative patch verification**: After patch generation, failing tests trigger return to the Inspector for further dynamic analysis
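
To make targeted state inspection concrete, here is a minimal sketch that records local variables while a function runs, using Python's standard ''sys.settrace'' hook. It is not InspectWare or InspectCoder's debugger interface. The helper ''trace_locals'' and the buggy function ''buggy_mean'' are hypothetical names introduced for illustration.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record (relative_line, locals) before each line executes."""
    snapshots = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            relative_line = frame.f_lineno - code.co_firstlineno
            snapshots.append((relative_line, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, snapshots


def buggy_mean(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # bug: off-by-one divisor


result, snaps = trace_locals(buggy_mean, [1, 2, 3])
line, state = snaps[-1]  # state just before the return statement runs
print(result, line, state["total"], len(state["values"]))
```

The last snapshot shows a correct ''total'' of 6 for three values, while the returned result is 3.0 rather than 2.0. This points the diagnosis at the divisor rather than the accumulation loop. An agent would choose which lines to snapshot instead of recording all of them.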

=== Results ===
  * **BigCodeBench-R and LiveCodeBench-R**: 5.10% to 60.37% relative improvement in repair accuracy over the strongest baselines
  * **Bug-fix efficiency**: 1.67x to 2.24x better than baselines

===== REFINE: Context-Aware Patch Refinement =====
**REFINE** (Pabba et al., 2025) transforms partially correct "Draft Patches" into correct ones through a systematic refinement framework.((https://arxiv.org/abs/2510.03588|REFINE: Enhancing Program Repair Agents through Context-Aware Patch Refinement (arXiv:2510.03588)))

=== Three Key Challenges Addressed ===
  - **Context disambiguation**: Resolves vague issue descriptions and unclear code context by enriching the repair prompt with structured context
  - **Candidate diversification**: Uses test-time scaling to generate diverse patch candidates, increasing the probability of including a correct fix
  - **Partial fix aggregation**: An LLM-powered code review process combines insights from multiple partial fixes into a complete solution
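
The three steps above can be sketched as a single function. This is an illustration under stated assumptions, not REFINE's actual implementation: ''generate'', ''score'', and ''aggregate'' are hypothetical callables standing in for the LLM sampler, the test runner, and the LLM code-review step. The prompt layout is also invented for the example.

```python
from typing import Callable, Sequence

Patch = str  # a patch is plain text in this sketch


def refine(
    generate: Callable[[str, int], Patch],          # hypothetical: (prompt, seed) -> patch
    score: Callable[[Patch], float],                # hypothetical: fraction of tests passed
    aggregate: Callable[[Sequence[Patch]], Patch],  # hypothetical: review-style merge
    issue: str,
    context: str,
    n_candidates: int = 4,
) -> Patch:
    # 1. Context disambiguation: attach structured context to the issue text.
    prompt = f"ISSUE:\n{issue}\n\nCONTEXT:\n{context}"
    # 2. Candidate diversification: sample several patches and drop duplicates.
    candidates = list(dict.fromkeys(generate(prompt, seed) for seed in range(n_candidates)))
    ranked = sorted(candidates, key=score, reverse=True)
    if score(ranked[0]) == 1.0:
        return ranked[0]  # a candidate already passes every test
    # 3. Partial fix aggregation: merge the candidates that pass at least some tests.
    partial = [p for p in ranked if score(p) > 0.0]
    return aggregate(partial) if partial else ranked[0]


# Usage with stub components (all values are made up for the example).
patches = ["fix_a", "fix_b", "fix_a", "noop"]
scores = {"fix_a": 0.5, "fix_b": 0.5, "noop": 0.0}
result = refine(
    generate=lambda prompt, seed: patches[seed],
    score=scores.get,
    aggregate=lambda parts: "+".join(parts),
    issue="crash on empty input",
    context="def parse(text): ...",
)
print(result)  # fix_a+fix_b
```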

=== Integration Architecture ===
REFINE is designed as a general refinement module that plugs into existing APR systems:

  * Works with open-agent-based systems (e.g., SWE-Agent)
  * Works with workflow-based systems (e.g., [[autocoderover|AutoCodeRover]])
  * Adds refinement as a post-processing step to any base APR approach
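
One way to picture refinement as a post-processing step is a wrapper around an arbitrary base repair function. The sketch below assumes a hypothetical signature in which a base system maps issue text to patch text. Real systems such as SWE-Agent or AutoCodeRover expose richer interfaces that are not modeled here.

```python
from typing import Callable

RepairFn = Callable[[str], str]  # hypothetical signature: issue text -> patch text


def with_refinement(base_repair: RepairFn, refine_patch: Callable[[str, str], str]) -> RepairFn:
    """Wrap any base APR function so its draft patch is refined afterwards."""

    def repair(issue: str) -> str:
        draft = base_repair(issue)         # the base system runs unchanged
        return refine_patch(issue, draft)  # refinement only sees the issue and the draft

    return repair


# Stub base systems and refiner, for illustration only.
def agent_based(issue: str) -> str:
    return f"draft-from-agent({issue})"


def workflow_based(issue: str) -> str:
    return f"draft-from-workflow({issue})"


def toy_refiner(issue: str, draft: str) -> str:
    return f"refined[{draft}]"


for base in (agent_based, workflow_based):
    print(with_refinement(base, toy_refiner)("issue-42"))
```

Because the wrapper depends only on the draft patch, the same refiner can be attached to either style of base system without modifying it.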

=== Results ===
  * **[[swe_bench|SWE-Bench]] Lite**: Boosts [[autocoderover|AutoCodeRover]] by 14.67%, achieving 51.67% (state of the art among workflow-based approaches)
  * **[[swe_bench|SWE-Bench]] Verified**: 12.2% improvement in resolution rate
  * Approaches best-known performance across all APR categories

===== Specification Vibing (VibeRepair) =====
**VibeRepair** (Zhu et al., 2026) introduces a paradigm shift from code-centric repair to specification-centric repair, treating bug fixing as behavior-specification alignment rather than ad-hoc code editing.((https://arxiv.org/abs/2602.08263|Specification Vibing for Automated Program Repair (arXiv:2602.08263)))

=== The Specification-Centric Approach ===
  - **Code to Specification**: Translates buggy code into a structured behavior specification capturing intended runtime behavior
  - **Specification Repair**: Infers and repairs misalignments in the specification (not the code)
  - **Specification to Code**: Synthesizes corrected code strictly guided by the repaired behavior specification
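
The three stages can be sketched as a small pipeline. Everything here is a hypothetical stand-in: the ''BehaviorSpec'' structure, the clause wording, and the three stub functions replace what would be LLM calls, and none of it reflects VibeRepair's real specification format.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class BehaviorSpec:
    """Hypothetical structured specification: one clause per observed behavior."""
    clauses: Tuple[str, ...]


def spec_centric_repair(buggy_code, to_spec, repair_spec, to_code):
    spec = to_spec(buggy_code)              # 1. code -> specification
    fixed_spec = repair_spec(spec)          # 2. repair the specification, not the code
    return to_code(fixed_spec), fixed_spec  # 3. specification -> code


def to_spec(code):
    # Stand-in for a model call that summarises what the code currently does.
    return BehaviorSpec(("returns a / b", "b == 0 -> unhandled ZeroDivisionError"))


def repair_spec(spec):
    # Stand-in for the spec-repair step: replace the misaligned clause.
    fixed = tuple(
        "b == 0 -> raise ValueError" if "unhandled" in clause else clause
        for clause in spec.clauses
    )
    return replace(spec, clauses=fixed)


def to_code(spec):
    # Stand-in for synthesis guided only by the repaired specification.
    lines = ["def divide(a, b):"]
    if any("raise ValueError" in clause for clause in spec.clauses):
        lines += ["    if b == 0:", "        raise ValueError('b must be non-zero')"]
    lines.append("    return a / b")
    return "\n".join(lines) + "\n"


buggy = "def divide(a, b):\n    return a / b\n"
patched_code, fixed_spec = spec_centric_repair(buggy, to_spec, repair_spec, to_code)
print(fixed_spec.clauses)
print(patched_code)
```

The edit happens on the clause list, and the code is regenerated from it, which is the sense in which the repair is specification-centric rather than code-centric.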

=== On-Demand Reasoning ===
For difficult cases, an enrichment component provides:
  * Program analysis insights
  * Historical bug-fix evidence from similar patterns
  * Cost-controlled reasoning that activates only when needed
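
A minimal sketch of cost-controlled activation follows: the costly enrichment step runs only if a plain repair attempt fails validation. The function ''repair_on_demand'' and its ''repair'', ''enrich'', and ''validate'' parameters are hypothetical names. The stubs below stand in for the model, the analysis tooling, and the test runner.

```python
def repair_on_demand(bug, repair, enrich, validate, max_enrichments=1):
    """Try a plain repair first; call the costly enrichment step only on failure."""
    patch = repair(bug, None)
    used = 0
    while not validate(patch) and used < max_enrichments:
        hints = enrich(bug)  # e.g. program-analysis facts or similar past fixes
        used += 1
        patch = repair(bug, hints)
    return (patch if validate(patch) else None), used


# Stubs for illustration: "hard" bugs are only fixed when hints are supplied.
def stub_repair(bug, hints):
    return "good-patch" if bug == "easy" or hints else "bad-patch"


def stub_enrich(bug):
    return ["similar fix: add null check"]


def stub_validate(patch):
    return patch == "good-patch"


print(repair_on_demand("easy", stub_repair, stub_enrich, stub_validate))
print(repair_on_demand("hard", stub_repair, stub_enrich, stub_validate))
# ('good-patch', 0)
# ('good-patch', 1)
```

The second element of the result counts enrichment calls, so the easy case incurs no extra cost.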

=== Results ===
  * **Defects4J v1.2**: 174 bugs repaired (19% improvement over strongest baseline, +28 bugs)
  * **Defects4J v2.0**: 178 bugs repaired (23% improvement, +33 bugs)
  * Significantly smaller patch space than code-centric approaches
  * Generalizes to real-world benchmarks collected after the LLM training period
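
The bug counts and percentages above can be cross-checked with simple arithmetic. The check assumes that "+N bugs" is the difference from the strongest baseline and that the percentage is measured relative to that baseline's count.

```python
def baseline_and_relative_gain(repaired, extra):
    """Return the implied baseline count and the relative gain in percent."""
    baseline = repaired - extra
    return baseline, 100 * extra / baseline


for version, repaired, extra in [("v1.2", 174, 28), ("v2.0", 178, 33)]:
    baseline, gain = baseline_and_relative_gain(repaired, extra)
    print(f"Defects4J {version}: baseline {baseline}, relative gain {gain:.1f}%")
# Defects4J v1.2: baseline 146, relative gain 19.2%
# Defects4J v2.0: baseline 145, relative gain 22.8%
```

Under that reading, the implied baselines are 146 and 145 bugs, and the gains round to the 19% and 23% quoted above.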

===== Code Example =====
<code python>
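# InspectCoder-style debugger-driven APR (simplified sketch).
# The inspector_llm, coder_llm, debugger, and test_suite objects are
# hypothetical interfaces for illustration, not InspectCoder's actual API.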
class InspectCoderAgent:
    def __init__(self, inspector_llm, coder_llm, debugger):
        self.inspector = inspector_llm
        self.coder = coder_llm
        self.debugger = debugger

    def repair(self, buggy_code, test_suite, max_iterations=5):
        for iteration in range(max_iterations):
            # Phase 1: Dynamic analysis via Program Inspector
            diagnosis = self.inspect(buggy_code, test_suite)

            # Phase 2: Patch generation via Patch Coder
            patch = self.coder.generate_patch(buggy_code, diagnosis)

            # Phase 3: Verification
            results = test_suite.run(patch)
            if results.all_pass:
                return patch
            # Tests still fail: the next iteration re-inspects the patched code
            buggy_code = patch  # Continue refining from the latest patch

        return None  # Could not repair within budget

    def inspect(self, code, test_suite):
        failing_test = test_suite.get_first_failing()
        # Inspector decides breakpoint strategy
        breakpoints = self.inspector.plan_breakpoints(code, failing_test)
        self.debugger.set_breakpoints(breakpoints)

        # Run under debugger and collect state
        states = self.debugger.run(code, failing_test)

        # Inspector analyzes runtime state
        root_cause = self.inspector.analyze(states, code, failing_test)

        # Optional: perturbation experiment when confidence is low
        # (0.7 is an illustrative threshold, not a value from the paper)
        if root_cause.confidence < 0.7:
            perturbed = self.inspector.perturb_state(states, root_cause.hypothesis)
            root_cause = self.inspector.refine_hypothesis(perturbed)

        return root_cause
</code>

===== Comparison of APR Approaches =====
^ Method ^ Paradigm ^ Benchmark ^ Key Result ^ Innovation ^
| InspectCoder | Dynamic analysis | BigCodeBench-R, LiveCodeBench-R | Up to 60.37% relative improvement | Debugger collaboration |
| REFINE | Patch refinement | [[swe_bench|SWE-Bench]] Lite | 51.67% resolution | Draft patch aggregation |
| VibeRepair | Specification-centric | Defects4J v1.2 | 174 bugs (+19%) | Behavior specification repair |

===== See Also =====
  * [[software_testing_agents|Software Testing Agents]]
  * [[code_generation_agents|Code Generation Agents]]
  * [[how_to_build_a_coding_agent|How to Build a Coding Agent]]
  * [[autocoderover|AutoCodeRover]]
  * [[swe_agent|SWE-agent: Agent-Computer Interface for Software Engineering]]

===== References =====