AutoCodeRover

AutoCodeRover is an autonomous program improvement agent that resolves issues in software repositories without human intervention. Developed by researchers at the National University of Singapore, it combines LLM-based reasoning with program structure-aware code search to generate patches for real-world GitHub issues. The project has over 3,100 GitHub stars and reports resolving 37.3% of tasks on SWE-bench Lite and 46.2% on SWE-bench Verified, at under $0.70 per task.¹⁾

GitHub: AutoCodeRoverSG/auto-code-rover

Key Features

Structure-Aware Code Search — Navigates project structure using AST-level understanding rather than simple text search
Autonomous Issue Resolution — Takes a GitHub issue description and produces a working patch without human intervention
Cost Efficient — Resolves tasks at less than $0.70 per issue on average
SWE-bench Performance — 37.3% pass@1 on SWE-bench Lite, 46.2% on SWE-bench Verified²⁾
Multi-Stage Pipeline — Systematic approach: context retrieval, fault localization, patch generation, and validation
Docker-Based Execution — Runs in containerized environments for reproducibility and safety
Multiple LLM Support — Works with GPT-4, Claude, and other large language models

Architecture

AutoCodeRover is built in Python with a pipeline architecture:

Issue Parser — Extracts intent, error descriptions, and expected behavior from GitHub issue text
Code Search Engine — AST-aware search across classes, methods, and code blocks using program structure
Context Collector — Gathers relevant code snippets, test files, and documentation
Patch Generator — LLM-based reasoning to produce minimal, correct patches
Validation Engine — Runs existing test suites to verify patches don't introduce regressions
SWE-bench Integration — Direct support for SWE-bench evaluation framework via Docker
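
The difference between structure-aware and text-based search can be illustrated with Python's standard `ast` module. This is a minimal sketch, not AutoCodeRover's implementation: the function name `search_method_in_class` and the sample `SOURCE` are assumptions made for illustration. A plain text search for `has_key` would match both definitions below, while the AST walk returns only the method that belongs to the requested class.

```python
import ast
import textwrap

# Hypothetical sample code: two definitions share the name `has_key`.
SOURCE = textwrap.dedent('''
    class Cache:
        def get(self, key):
            return self._data[key]

        def has_key(self, key):
            return key in self._data

    def has_key(key):
        return False
''')


def search_method_in_class(source: str, class_name: str, method_name: str) -> list[str]:
    """Return the source text of each `method_name` defined directly in `class_name`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # Only descend into the class we were asked about.
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_func = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_func and item.name == method_name:
                    hits.append(ast.get_source_segment(source, item))
    return hits


snippets = search_method_in_class(SOURCE, "Cache", "has_key")
print(snippets[0])
```

Returning the exact source segment of one method, rather than a whole file or a grep hit list, keeps the context handed to the LLM small and relevant.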

Usage Example

# Clone the repository
git clone https://github.com/AutoCodeRoverSG/auto-code-rover.git
cd auto-code-rover
 
# Build the Docker image
docker build -t acr .
 
# Run on a specific SWE-bench instance
python3 ACR.py --task django__django-16379 \
    --model gpt-4 \
    --output results/
 
# Run on a custom GitHub issue
python3 ACR.py --repo https://github.com/user/project \
    --issue 42 \
    --model claude-3.5-sonnet

How It Works

graph TD
    A[GitHub Issue] --> B[Issue Parser]
    B --> C[Extract Intent & Error Info]
    C --> D[Code Search Engine]
    D --> E[AST-Level Structure Analysis]
    E --> F[Relevant Classes & Methods]
    F --> G[Context Collection]
    G --> H[Code Snippets + Tests + Docs]
    H --> I[LLM Reasoning]
    I --> J[Fault Localization]
    J --> K[Patch Generation]
    K --> L[Minimal Code Patch]
    L --> M[Test Validation]
    M --> N{Tests Pass?}
    N -->|Yes| O[Output Patch]
    N -->|No| P[Refine with Feedback]
    P --> I
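
The feedback loop at the end of the graph (generate a patch, validate it, refine on failure) can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's code: `repair_loop`, `ValidationResult`, the `max_rounds` limit, and the two stub callables are all hypothetical names, and the stubs stand in for the LLM call and the test runner.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    passed: bool
    log: str


def repair_loop(
    issue: str,
    propose_patch: Callable[[str, str], str],
    run_tests: Callable[[str], ValidationResult],
    max_rounds: int = 3,  # assumed retry budget, not a documented default
) -> Optional[str]:
    """Propose patches until one passes validation or the budget runs out."""
    feedback = ""
    for _ in range(max_rounds):
        patch = propose_patch(issue, feedback)
        result = run_tests(patch)
        if result.passed:
            return patch
        # Feed the failure log back into the next proposal.
        feedback = result.log
    return None


# Stubs standing in for the LLM and the test suite.
def fake_propose(issue: str, feedback: str) -> str:
    return "patch-v1" if not feedback else "patch-v2"


def fake_run_tests(patch: str) -> ValidationResult:
    ok = patch == "patch-v2"
    return ValidationResult(passed=ok, log="" if ok else "test_cache failed")


accepted = repair_loop("has_key raises FileNotFoundError", fake_propose, fake_run_tests)
print(accepted)
```

Bounding the number of rounds keeps the per-task cost predictable, since each round implies another LLM call and another test run.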

Research

AutoCodeRover was introduced in the paper “AutoCodeRover: Autonomous Program Improvement” (arXiv:2404.05427) by Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. Key contributions:³⁾

Program Structure Awareness — Demonstrated that AST-level code navigation significantly outperforms text-based search for bug localization
Spectrum-Based Fault Localization — Combines traditional SE techniques with LLM reasoning
Iterative Refinement — Multi-round patch generation with test feedback loops
Cost Analysis — Showed autonomous patching is economically viable at scale
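
Spectrum-based fault localization ranks program elements by how strongly their execution correlates with failing tests. The sketch below uses the Ochiai formula, one widely used suspiciousness metric; treating it as the metric here is an assumption for illustration, and the coverage data and the names `ochiai` and `rank_methods` are hypothetical.

```python
import math


def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_methods(coverage: dict[str, set[str]], failing: set[str]) -> list[str]:
    """Order methods from most to least suspicious."""
    scores = {}
    for method, tests in coverage.items():
        ef = len(tests & failing)   # failing tests that execute the method
        ep = len(tests - failing)   # passing tests that execute the method
        scores[method] = ochiai(ef, ep, len(failing))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical coverage: which tests execute which method.
coverage = {
    "parse": {"t_fail", "t_pass1", "t_pass2"},
    "validate": {"t_fail"},
    "render": {"t_pass1"},
}
ranking = rank_methods(coverage, failing={"t_fail"})
print(ranking)
```

A method executed only by failing tests scores highest, so a ranking like this can direct the LLM's attention to a short list of candidate locations before it proposes a patch.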

The team later developed the Sonar Foundation Agent, scoring 79.2% on SWE-bench Verified.

References

¹⁾ ²⁾ ³⁾ AutoCodeRover: Autonomous Program Improvement (arXiv:2404.05427). https://arxiv.org/abs/2404.05427
