====== SWE-agent: Agent-Computer Interface for Software Engineering ======
SWE-agent is a language model agent system by Yang et al. (Princeton, 2024) that resolves real-world GitHub issues autonomously through a carefully designed **Agent-Computer Interface (ACI)**. Rather than giving the LLM raw terminal access, SWE-agent provides a minimal set of custom shell commands for searching, viewing, and editing code. The paper reports that this interface design substantially improves the agent's ability to navigate large codebases and produce correct patches compared with a plain shell. The paper was accepted at NeurIPS 2024.
<mermaid>
graph TD
    ISS[GitHub Issue] --> SEARCH[Search Codebase]
    SEARCH --> VIEW[View File]
    VIEW --> EDIT[Edit Code]
    EDIT --> TEST[Run Tests]
    TEST --> CHECK{Tests Pass?}
    CHECK -->|No| SEARCH
    CHECK -->|Yes| SUBMIT[Submit Patch]
</mermaid>
===== Agent-Computer Interface (ACI) Design =====
The ACI is SWE-agent's central contribution — the insight that **how** an agent interacts with a computer matters as much as the underlying model capability. The interface provides:
* **Structured observations**: File contents displayed with line numbers, surrounding context indicators (e.g., "400 lines above, 2684 lines below"), and syntax highlighting
* **Paginated viewing**: Prevents token overflow by showing manageable chunks of code rather than entire files
* **Action-observation loop**: Agent issues a command, receives structured output, reasons about next steps, and iterates
* **Repository isolation**: Each task runs in a cloned GitHub repo preserving the exact pre-fix state
The deliberate simplicity of the interface reduces hallucination risk — fewer, well-defined commands produce more reliable agent behavior than unrestricted shell access.
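The structured, paginated observation described above can be sketched as a small formatting function. This is a minimal illustration, not SWE-agent's actual implementation: the name ''render_window'', the ''window'' parameter, its default of 100 lines, and the exact wording of the context indicators are all assumptions made for this example.

```python
def render_window(lines, start, window=100):
    """Format one page of a file as a line-numbered observation.

    `lines` is the file content as a list of strings, `start` is the
    1-indexed first line to show, and `window` is the page size
    (hypothetical default, chosen for this sketch).
    """
    total = len(lines)
    # Clamp the requested start so it always falls inside the file.
    start = max(1, min(start, max(total, 1)))
    end = min(total, start + window - 1)

    above = start - 1      # lines hidden before the window
    below = total - end    # lines hidden after the window

    out = []
    if above:
        out.append(f"({above} lines above)")
    for number in range(start, end + 1):
        out.append(f"{number}:{lines[number - 1]}")
    if below:
        out.append(f"({below} lines below)")
    return "\n".join(out)
```

Because the output always states how much of the file lies outside the window, the model can decide whether to scroll without ever receiving the whole file.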
===== Custom Command Set =====
SWE-agent uses three core tool categories:
=== Search ===
<code>
search <pattern> [--filename <filename_pattern>]
</code>
Finds files or code matching patterns across the repository. Supports regex for both content and filename filtering.
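A repository search of this kind might look roughly like the following sketch. The function name ''search_repo'', its parameters, and the ''max_hits'' cap are hypothetical and chosen only for illustration; the cap reflects the general idea of keeping observations short and is not taken from the SWE-agent source.

```python
import os
import re


def search_repo(root, pattern, filename_pattern=None, max_hits=50):
    """Return (relative_path, line_number, line) tuples matching `pattern`.

    `pattern` and `filename_pattern` are regular expressions. `max_hits`
    is an assumed cap that keeps the observation small.
    """
    content_re = re.compile(pattern)
    name_re = re.compile(filename_pattern) if filename_pattern else None
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata and make traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            if name_re and not name_re.search(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if content_re.search(line):
                            rel = os.path.relpath(path, root)
                            hits.append((rel, number, line.rstrip("\n")))
                            if len(hits) >= max_hits:
                                return hits
            except (UnicodeDecodeError, OSError):
                # Binary or unreadable files are skipped rather than reported.
                continue
    return hits
```

Returning the path together with the line number lets the agent jump straight to a match with the file viewer instead of re-reading the file from the top.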
=== File Viewer ===
<code>
open <path> [<line_number>]
scroll_up / scroll_down
goto <line_number>
</code>
Paginated file navigation with line numbers and context awareness. The viewer maintains state across commands, showing the agent's current position in the file.
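The stateful part of the viewer can be illustrated with a small class that remembers the current position between commands. This is a sketch under stated assumptions: the class name ''FileViewer'', the method ''open_file'', the window size, and the choice to place the target line at the top of the window on ''goto'' are illustrative and may differ from SWE-agent's behavior.

```python
class FileViewer:
    """Keeps the open file and the current window position across commands."""

    def __init__(self, window=100):
        self.window = window  # page size (assumed default)
        self.path = None
        self.lines = []
        self.top = 1          # 1-indexed first visible line

    def open_file(self, path, line=1):
        with open(path, encoding="utf-8") as handle:
            self.lines = handle.read().splitlines()
        self.path = path
        return self.goto(line)

    def goto(self, line):
        # Clamp so the window never runs past either end of the file.
        last_top = max(1, len(self.lines) - self.window + 1)
        self.top = max(1, min(line, last_top))
        return self.visible()

    def scroll_down(self):
        return self.goto(self.top + self.window)

    def scroll_up(self):
        return self.goto(self.top - self.window)

    def visible(self):
        """Return the current window as (line_number, text) pairs."""
        end = min(len(self.lines), self.top + self.window - 1)
        return [(n, self.lines[n - 1]) for n in range(self.top, end + 1)]
```

Since ''top'' persists between calls, a sequence such as ''open_file'', ''scroll_down'', ''scroll_down'' walks through the file page by page without the agent having to track line offsets itself.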
=== Edit ===
<code>
edit <start_line>:<end_line>
<replacement_text>
end_of_edit
</code>
Replaces specific line ranges with new content. This precise, line-addressed editing avoids the ambiguity of natural language edit instructions.
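Line-addressed editing can be sketched as two small functions: one that parses the command and one that applies the replacement. The sketch assumes the command takes the form ''edit <start_line>:<end_line>'', followed by the replacement text and a closing ''end_of_edit'' line; the helper names ''parse_edit_command'' and ''apply_edit'' are hypothetical.

```python
def parse_edit_command(command):
    """Split an edit command into (start, end, replacement_text).

    Assumed format: 'edit <start>:<end>', then the replacement text,
    then a final 'end_of_edit' line.
    """
    header, _, body = command.partition("\n")
    keyword, _, span = header.partition(" ")
    if keyword != "edit":
        raise ValueError("not an edit command")
    start_text, _, end_text = span.partition(":")
    text, terminator, _ = body.rpartition("end_of_edit")
    if not terminator:
        raise ValueError("missing end_of_edit terminator")
    # Drop only the newline that separates the text from the terminator.
    if text.endswith("\n"):
        text = text[:-1]
    return int(start_text), int(end_text), text


def apply_edit(lines, start, end, replacement):
    """Return a copy of `lines` with 1-indexed lines start..end replaced.

    An empty `replacement` deletes the range.
    """
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f"invalid line range {start}:{end}")
    return lines[: start - 1] + replacement.splitlines() + lines[end:]
```

For example, the command ''edit 2:2'' with the body ''return a + b'' replaces exactly the second line and leaves every other line untouched, so the model never has to reproduce the surrounding code.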
Standard Unix utilities (''ls'', ''grep'', ''git'') remain available for auxiliary tasks.
===== Code Example =====
<code python>
# SWE-agent style ACI interaction loop (illustrative sketch)
class SWEAgentLoop:
    def __init__(self, model, repo_path, issue_description):
        self.model = model
        self.repo = repo_path
        self.issue = issue_description
        self.history = []  # (action, observation) pairs fed back to the model

    def run(self, max_steps: int = 30) -> str:
        observation = self.setup_environment()
        for _ in range(max_steps):
            # Model reasons about current state and selects action
            action = self.model.generate_action(
                issue=self.issue,
                observation=observation,
                history=self.history,
            )
            # "submit" ends the episode, so check it before dispatching
            if action.startswith("submit"):
                break
            # Execute command through ACI
            observation = self.execute_aci_command(action)
            self.history.append((action, observation))
        # Return the current diff on submit or when the step budget runs out
        return self.generate_patch()

    def execute_aci_command(self, command: str) -> str:
        if command.startswith("search"):
            return self.search_codebase(command)
        elif command.startswith("open"):
            return self.open_file_viewer(command)
        elif command.startswith("edit"):
            return self.apply_edit(command)
        else:
            return self.run_shell(command)

    # setup_environment, generate_patch, search_codebase, open_file_viewer,
    # apply_edit and run_shell are environment-specific hooks that a concrete
    # subclass must provide; they are not defined in this sketch.
</code>
===== SWE-bench Results =====
SWE-bench is a benchmark of 2,294 real GitHub issues from 12 popular Python repositories, requiring full repository-level bug fixing and feature implementation.
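SWE-agent was among the first agent systems to demonstrate strong autonomous performance on SWE-bench. The benchmark has since become the standard evaluation for coding agents. The figures below come from different splits and are not directly comparable: SWE-bench Lite is a 300-task subset, and SWE-bench Verified is a 500-task human-validated subset released in August 2024.
^ System ^ Benchmark split ^ Resolved ^
| SWE-agent (GPT-4) | SWE-bench Lite (300 tasks) | ~18% (early 2024) |
| SWE-agent + Claude 3.5 Sonnet | SWE-bench Verified (500 tasks) | ~33% (late 2024) |
| Current SOTA (2026) | SWE-bench Verified (500 tasks) | ~79% (with advanced scaffolding) |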
SWE-agent's key contribution is not just the benchmark scores but the demonstration that **ACI design is a first-class research problem** — the same underlying LLM performs significantly better with well-designed tool interfaces.
===== Comparison with Other Approaches =====
^ Approach ^ Key Difference ^
| **Agentless** | Three-phase pipeline (localize, repair, validate) — no agentic loop |
| **OpenHands** | Broader action space, including web browsing and arbitrary code execution in a sandbox |
| **HyperAgent** | Multi-agent architecture for multi-language tasks |
| **SWE-agent** | Minimal ACI focused on search/view/edit reliability |
===== Design Principles =====
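The ACI design rests on the hypothesis that, for a fixed underlying model, a purpose-built interface raises the probability of producing a correct patch relative to raw shell access:

$$ P(\text{correct patch} \mid \text{ACI}) > P(\text{correct patch} \mid \text{raw shell}) $$

It embodies several principles for effective agent-tool interaction: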
* **Minimal action space**: Fewer, well-defined commands reduce action selection errors
* **Structured observations**: Formatted output with line numbers provides unambiguous context
* **Stateful navigation**: The file viewer remembers position, reducing redundant exploration
* **Precise editing**: Line-addressed edits eliminate ambiguity in code modification
===== References =====
* [[https://arxiv.org/abs/2405.15793|Yang et al. "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" (arXiv:2405.15793)]]
* [[https://github.com/swe-agent/swe-agent|SWE-agent GitHub Repository]]
* [[https://www.swebench.com/|SWE-bench Leaderboard]]
* [[https://arxiv.org/abs/2310.06770|Jimenez et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?"]]
===== See Also =====
* [[metagpt|MetaGPT — Multi-agent framework for software development]]
* [[gorilla|Gorilla — LLM trained for accurate tool/API calling]]
* [[agent_distillation|Agent Distillation — Compressing agent capabilities into smaller models]]