SWE-agent is a language model agent system by Yang et al. (Princeton, 2024) that autonomously resolves real-world GitHub issues through a purpose-built Agent-Computer Interface (ACI). Rather than giving the LLM raw terminal access, SWE-agent provides a small set of custom shell commands for searching, viewing, and editing code. This interface design improves the agent's ability to navigate large codebases and produce correct patches. The paper was accepted at NeurIPS 2024.
The ACI is SWE-agent's central contribution: the insight that how an agent interacts with a computer matters as much as the underlying model's capability. The interface provides commands for searching the repository, viewing files in pages, and editing specific line ranges.
The deliberate simplicity of the interface reduces hallucination risk — fewer, well-defined commands produce more reliable agent behavior than unrestricted shell access.
SWE-agent uses three core tool categories:
`search <regex> [--filename <regex>]`
Finds files or code matching patterns across the repository. Supports regex for both content and filename filtering.
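A search command of this kind can be sketched in a few lines of Python. This is a minimal illustration rather than SWE-agent's actual implementation: the function name `search_repo`, its parameters, and the `max_results` cap are assumptions introduced here.

```python
import os
import re
from typing import List, Optional, Tuple


def search_repo(
    root: str,
    content_pattern: str,
    filename_pattern: Optional[str] = None,
    max_results: int = 50,
) -> List[Tuple[str, int, str]]:
    """Return (relative_path, line_number, line_text) for matching lines.

    Hypothetical helper: mirrors `search <regex> [--filename <regex>]`.
    """
    content_re = re.compile(content_pattern)
    name_re = re.compile(filename_pattern) if filename_pattern else None
    results: List[Tuple[str, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata; sort for deterministic output.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            if name_re and not name_re.search(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if content_re.search(line):
                            rel = os.path.relpath(path, root)
                            results.append((rel, lineno, line.rstrip("\n")))
                            # Cap the output so the observation stays short.
                            if len(results) >= max_results:
                                return results
            except (UnicodeDecodeError, OSError):
                continue  # Binary or unreadable file: skip it.
    return results
```

Capping the number of results is one way to keep the observation short enough for the model's context; the specific limit of 50 is arbitrary.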
`open <filename> [<line_number>]`, `scroll_up` / `scroll_down`, `goto <line_number>`
Paginated file navigation with line numbers and context awareness. The viewer maintains state across commands, showing the agent's current position in the file.
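The stateful viewer described above can be approximated with a small class that tracks the first visible line. This is a sketch under stated assumptions: the class name `FileViewer`, the default window of 100 lines, and the header format are illustrative choices, not taken from the SWE-agent codebase.

```python
from typing import List


class FileViewer:
    """Stateful, windowed view over a file's lines (1-indexed)."""

    def __init__(self, lines: List[str], window: int = 100):
        self.lines = lines
        self.window = window  # number of lines shown per page (assumed value)
        self.top = 1  # line number of the first visible line

    def _clamp(self, top: int) -> int:
        # Keep the window inside the file: never above line 1 and never
        # starting so late that the page would run past the last line.
        last_top = max(1, len(self.lines) - self.window + 1)
        return min(max(1, top), last_top)

    def goto(self, line_number: int) -> str:
        self.top = self._clamp(line_number)
        return self.render()

    def scroll_down(self) -> str:
        self.top = self._clamp(self.top + self.window)
        return self.render()

    def scroll_up(self) -> str:
        self.top = self._clamp(self.top - self.window)
        return self.render()

    def render(self) -> str:
        # The header tells the agent where it is in the file.
        end = min(len(self.lines), self.top + self.window - 1)
        header = f"[lines {self.top}-{end} of {len(self.lines)}]"
        body = [f"{n}:{self.lines[n - 1]}" for n in range(self.top, end + 1)]
        return "\n".join([header] + body)
```

Because every call returns the rendered window with its position header, each observation reminds the agent of its current location without requiring it to remember earlier commands.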
`edit <filename> <line_start> <line_end> <new_content> end_of_edit`
Replaces specific line ranges with new content. This precise, line-addressed editing avoids the ambiguity of natural language edit instructions.
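Line-addressed replacement is simple to express on a list of lines. The function below is a hedged sketch: the name `apply_edit` and the choice to treat the range as 1-indexed and inclusive are assumptions made for illustration, and it operates on an in-memory list rather than a file.

```python
from typing import List


def apply_edit(
    lines: List[str], line_start: int, line_end: int, new_content: str
) -> List[str]:
    """Return a copy of `lines` with the 1-indexed inclusive range replaced."""
    if not (1 <= line_start <= line_end <= len(lines)):
        raise ValueError(
            f"invalid range {line_start}-{line_end} "
            f"for a file of {len(lines)} lines"
        )
    # An empty replacement deletes the range; otherwise the new content
    # may span any number of lines.
    replacement = new_content.splitlines()
    return lines[: line_start - 1] + replacement + lines[line_end:]


if __name__ == "__main__":
    source = ["def add(a, b):", "    return a - b", "", "print(add(1, 2))"]
    for line in apply_edit(source, 2, 2, "    return a + b"):
        print(line)
```

Rejecting out-of-range edits with an explicit error gives the agent a concrete message to react to instead of silently corrupting the file.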
Standard Unix utilities (ls, grep, git) remain available for auxiliary tasks.
```python
# SWE-agent style ACI interaction loop (illustrative sketch).
# Helper methods such as setup_environment, search_codebase, open_file_viewer,
# apply_edit, run_shell, and generate_patch are not implemented here.
class SWEAgentLoop:
    def __init__(self, model, repo_path, issue_description):
        self.model = model
        self.repo = repo_path
        self.issue = issue_description
        self.history = []

    def run(self, max_steps: int = 30) -> str:
        observation = self.setup_environment()
        for _ in range(max_steps):
            # Model reasons about current state and selects action
            action = self.model.generate_action(
                issue=self.issue,
                observation=observation,
                history=self.history,
            )
            # A submit action ends the episode; it is not run as a command
            if action.startswith("submit"):
                self.history.append((action, ""))
                break
            # Execute command through ACI
            observation = self.execute_aci_command(action)
            self.history.append((action, observation))
        # Return the patch on submit or when the step budget is exhausted
        return self.generate_patch()

    def execute_aci_command(self, command: str) -> str:
        if command.startswith("search"):
            return self.search_codebase(command)
        elif command.startswith(("open", "scroll_up", "scroll_down", "goto")):
            # All viewer commands share the same stateful file viewer
            return self.open_file_viewer(command)
        elif command.startswith("edit"):
            return self.apply_edit(command)
        else:
            # Anything else falls through to the standard shell
            return self.run_shell(command)
```
SWE-bench is a benchmark of 2,294 real GitHub issues from 12 popular Python repositories, requiring full repository-level bug fixing and feature implementation.
SWE-agent was among the first agent systems to demonstrate strong autonomous performance on SWE-bench. The benchmark has since become the standard evaluation for coding agents:
| System | Resolve rate (SWE-bench Verified, 500 tasks, unless noted) |
|---|---|
| SWE-agent (GPT-4) | ~18% on SWE-bench Lite (early 2024, before SWE-bench Verified was released) |
| SWE-agent + Claude 3.5 Sonnet | ~33% (late 2024) |
| Current SOTA (2026) | ~79% (with advanced scaffolding) |
SWE-agent's key contribution is not just the benchmark scores but the demonstration that ACI design is a first-class research problem — the same underlying LLM performs significantly better with well-designed tool interfaces.
| Approach | Key Difference |
|---|---|
| Agentless | Three-phase pipeline (localize, repair, validate) — no agentic loop |
| OpenHands | Broader action space including web browsing and code writing |
| HyperAgent | Multi-agent architecture for multi-language tasks |
| SWE-agent | Minimal ACI focused on search/view/edit reliability |
The ACI design embodies several principles for effective agent-tool interaction:
<latex>P(\text{correct patch} | \text{ACI}) > P(\text{correct patch} | \text{raw shell})</latex>