====== SWE-agent: Agent-Computer Interface for Software Engineering ======

SWE-agent is a language model agent system by Yang et al. (Princeton, 2024) that autonomously resolves real-world GitHub issues through a carefully designed **Agent-Computer Interface (ACI)**. Rather than giving the LLM raw terminal access, SWE-agent provides a small set of custom shell commands for searching, viewing, and editing code. This interface design substantially improves the agent's ability to navigate large codebases and produce correct patches. The paper was accepted at NeurIPS 2024.

<mermaid>
graph TD
    ISS[GitHub Issue] --> SEARCH[Search Codebase]
    SEARCH --> VIEW[View File]
    VIEW --> EDIT[Edit Code]
    EDIT --> TEST[Run Tests]
    TEST --> CHECK{Tests Pass?}
    CHECK -->|No| SEARCH
    CHECK -->|Yes| SUBMIT[Submit Patch]
</mermaid>

===== Agent-Computer Interface (ACI) Design =====

The ACI is SWE-agent's central contribution. Its core insight is that **how** an agent interacts with a computer matters as much as the underlying model's capability. The interface provides:

  * **Structured observations**: file contents are shown with line numbers, context indicators (e.g., "400 lines above, 2684 lines below"), and syntax highlighting
  * **Paginated viewing**: the agent sees a manageable chunk of code at a time rather than an entire file, which prevents token overflow
  * **Action-observation loop**: the agent issues a command, receives structured output, reasons about its next step, and iterates
  * **Repository isolation**: each task runs in a cloned GitHub repository checked out at the exact pre-fix state

The interface is deliberately simple. A few well-defined commands produce more reliable agent behavior than unrestricted shell access and reduce the risk of hallucinated actions.

===== Custom Command Set =====

SWE-agent's commands fall into three core categories:

=== Search ===

<code>
search <pattern> [--filename <file_pattern>]
</code>

Finds files or code matching a pattern across the repository. Regular expressions are supported for both content and filename filtering.
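The Search behavior described above can be sketched in a few lines of Python. This is a hypothetical illustration, not SWE-agent's actual implementation: the names `search`, `format_observation`, and `Hit`, the pruning of `.git`, and the output wording are all assumptions. The sketch matches a regex against file contents and, like the `--filename` option, can restrict matches to files whose names match a second regex.

```python
import os
import re
from typing import Optional

Hit = tuple[str, int, str]  # (relative path, 1-based line number, line text)


def search(repo_root: str, pattern: str,
           filename_pattern: Optional[str] = None) -> list[Hit]:
    """Hypothetical ACI-style search: regex over file contents,
    optionally restricted to files whose names match a second regex."""
    content_re = re.compile(pattern)
    name_re = re.compile(filename_pattern) if filename_pattern else None
    hits: list[Hit] = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune .git in place and sort so traversal order is deterministic
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for fname in sorted(filenames):
            if name_re is not None and not name_re.search(fname):
                continue
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if content_re.search(line):
                            rel = os.path.relpath(path, repo_root)
                            hits.append((rel, lineno, line.rstrip("\n")))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits


def format_observation(pattern: str, hits: list[Hit]) -> str:
    """Render hits as a compact, line-numbered observation for the agent."""
    if not hits:
        return f'No matches found for "{pattern}".'
    noun = "match" if len(hits) == 1 else "matches"
    lines = [f'Found {len(hits)} {noun} for "{pattern}":']
    lines += [f"{path}:{lineno}: {text}" for path, lineno, text in hits]
    return "\n".join(lines)
```

Keeping the raw `(path, line, text)` tuples separate from their rendering mirrors the ACI's structured observations: the agent receives a short, line-numbered summary whose locations it can pass straight to `open`.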
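
=== File Viewer ===

<code>
open <path> [<line_number>]
scroll_up / scroll_down
goto <line_number>
</code>

Paginated file navigation with line numbers and context indicators. The viewer keeps its state across commands, so each observation shows the agent's current position in the file.

=== Edit ===

<code>
edit <start_line>:<end_line>
<replacement_text>
end_of_edit
</code>

Replaces the specified line range with new content. Addressing edits by line number avoids the ambiguity of natural-language edit instructions.

Standard Unix utilities (''ls'', ''grep'', ''git'') remain available for auxiliary tasks.

===== Code Example =====

A simplified sketch of the ACI interaction loop; the environment-specific handlers are left abstract.

<code python>
# SWE-agent style ACI interaction loop (simplified sketch)
class SWEAgentLoop:
    def __init__(self, model, repo_path: str, issue_description: str):
        self.model = model  # any object exposing generate_action(...) -> str
        self.repo = repo_path
        self.issue = issue_description
        self.history: list[tuple[str, str]] = []  # (action, observation) pairs

    def run(self, max_steps: int = 30) -> str:
        observation = self.setup_environment()
        for _ in range(max_steps):
            # Model reasons about the current state and selects one action
            action = self.model.generate_action(
                issue=self.issue,
                observation=observation,
                history=self.history,
            )
            # Check for submission first so "submit" never reaches the shell
            if action.strip().startswith("submit"):
                break
            # Execute the command through the ACI and record the turn
            observation = self.execute_aci_command(action)
            self.history.append((action, observation))
        # Either the agent submitted or the step budget ran out
        return self.generate_patch()

    def execute_aci_command(self, command: str) -> str:
        # Dispatch on the command name (first token); multi-line edits start with "edit"
        parts = command.split(maxsplit=1)
        name = parts[0] if parts else ""
        if name == "search":
            return self.search_codebase(command)
        if name in ("open", "goto", "scroll_up", "scroll_down"):
            return self.open_file_viewer(command)
        if name == "edit":
            return self.apply_edit(command)
        return self.run_shell(command)  # ls, grep, git, ...

    # Environment-specific handlers, left abstract in this sketch
    def setup_environment(self) -> str:
        raise NotImplementedError

    def search_codebase(self, command: str) -> str:
        raise NotImplementedError

    def open_file_viewer(self, command: str) -> str:
        raise NotImplementedError

    def apply_edit(self, command: str) -> str:
        raise NotImplementedError

    def run_shell(self, command: str) -> str:
        raise NotImplementedError

    def generate_patch(self) -> str:
        raise NotImplementedError
</code>

===== SWE-bench Results =====

SWE-bench is a benchmark of 2,294 real GitHub issues from 12 popular Python repositories, requiring repository-level bug fixing and feature implementation. SWE-agent was among the first agent systems to demonstrate strong autonomous performance on SWE-bench.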
The benchmark has since become the standard evaluation for coding agents:

^ System ^ Benchmark split ^ Resolve rate ^
| SWE-agent (GPT-4) | SWE-bench Lite (300 tasks) | ~18% (early 2024) |
| SWE-agent + Claude 3.5 Sonnet | SWE-bench Verified (500 tasks) | ~33% (late 2024) |
| Current SOTA (2026) | SWE-bench Verified (500 tasks) | ~79% (with advanced scaffolding) |

SWE-bench Verified, a human-validated subset, was introduced in August 2024, after SWE-agent's original evaluation.

SWE-agent's key contribution is not the benchmark scores alone but the demonstration that **ACI design is a first-class research problem**: the same underlying LLM performs significantly better with well-designed tool interfaces.

===== Comparison with Other Approaches =====

^ Approach ^ Key Difference ^
| **Agentless** | Three-phase pipeline (localize, repair, validate) with no agentic loop |
| **OpenHands** | Broader action space, including web browsing and arbitrary code execution |
| **HyperAgent** | Multi-agent architecture for multi-language tasks |
| **SWE-agent** | Minimal ACI focused on reliable search, view, and edit |

===== Design Principles =====

The ACI design embodies several principles for effective agent-tool interaction. Informally, the hypothesis is that for a fixed model,

$$P(\text{correct patch} \mid \text{ACI}) > P(\text{correct patch} \mid \text{raw shell})$$

  * **Minimal action space**: fewer, well-defined commands reduce action-selection errors
  * **Structured observations**: formatted output with line numbers provides unambiguous context
  * **Stateful navigation**: the file viewer remembers its position, reducing redundant exploration
  * **Precise editing**: line-addressed edits eliminate ambiguity in code modification

===== References =====

  * [[https://arxiv.org/abs/2405.15793|Yang et al. "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" (arXiv:2405.15793)]]
  * [[https://github.com/swe-agent/swe-agent|SWE-agent GitHub Repository]]
  * [[https://www.swebench.com/|SWE-bench Leaderboard]]
  * [[https://arxiv.org/abs/2310.06770|Jimenez et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" (arXiv:2310.06770)]]

===== See Also =====

  * [[metagpt|MetaGPT: Multi-agent framework for software development]]
  * [[gorilla|Gorilla: LLM trained for accurate tool/API calling]]
  * [[agent_distillation|Agent Distillation: Compressing agent capabilities into smaller models]]