SWE-agent: Agent-Computer Interface for Software Engineering

SWE-agent is a language model agent system by Yang et al. (Princeton, 2024) that autonomously resolves real-world GitHub issues through a carefully designed Agent-Computer Interface (ACI).¹⁾ Rather than giving the LLM raw terminal access, SWE-agent provides a minimal set of custom shell commands for searching, viewing, and editing code. This interface design substantially improves the agent's ability to navigate large codebases and produce correct patches. The paper was accepted at NeurIPS 2024.

graph TD
    ISS[GitHub Issue] --> SEARCH[Search Codebase]
    SEARCH --> VIEW[View File]
    VIEW --> EDIT[Edit Code]
    EDIT --> TEST[Run Tests]
    TEST --> CHECK{Tests Pass?}
    CHECK -->|No| SEARCH
    CHECK -->|Yes| SUBMIT[Submit Patch]

Agent-Computer Interface (ACI) Design

The ACI is SWE-agent's central contribution — the insight that how an agent interacts with a computer matters as much as the underlying model capability. The interface provides:

Structured observations: File contents displayed with line numbers, surrounding context indicators (e.g., “400 lines above, 2684 lines below”), and syntax highlighting
Paginated viewing: Prevents token overflow by showing manageable chunks of code rather than entire files
Action-observation loop: Agent issues a command, receives structured output, reasons about next steps, and iterates
Repository isolation: Each task runs in a cloned GitHub repo preserving the exact pre-fix state

The deliberate simplicity of the interface reduces hallucination risk — fewer, well-defined commands produce more reliable agent behavior than unrestricted shell access.
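As a rough illustration of the structured, paginated observations described above, the sketch below renders one window of a file with line numbers and "lines above/below" indicators. The function name `render_window`, the default window of 100 lines, and the exact output layout are assumptions made for this example, not SWE-agent's actual implementation.

```python
def render_window(lines: list[str], first_line: int, window: int = 100) -> str:
    """Render `window` lines starting at the 1-indexed `first_line`.

    Hypothetical helper: the layout only imitates the style of observation
    described in the text.
    """
    total = len(lines)
    # Clamp the window so it always stays inside the file.
    start = max(1, min(first_line, max(1, total - window + 1)))
    end = min(total, start + window - 1)

    out = []
    if start > 1:
        out.append(f"({start - 1} lines above)")
    for number in range(start, end + 1):
        out.append(f"{number}:{lines[number - 1]}")
    if end < total:
        out.append(f"({total - end} lines below)")
    return "\n".join(out)


if __name__ == "__main__":
    demo = [f"x = {i}" for i in range(1, 11)]  # a fake 10-line file
    print(render_window(demo, first_line=4, window=3))
```

Because the window is bounded, the size of each observation stays roughly constant no matter how large the file is, which is the point of paginated viewing.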

Custom Command Set

SWE-agent uses three core tool categories:

Search

search <regex> [--filename <regex>]

Finds files or code matching patterns across the repository. Supports regex for both content and filename filtering.
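A minimal sketch of how such a search could work, assuming a plain directory walk and Python's `re` module. The function name `search_repo`, its parameters, the hit cap, and the `(path, line number, line)` result shape are hypothetical and not taken from SWE-agent.

```python
import os
import re
import tempfile
from typing import Optional


def search_repo(root: str, pattern: str, filename: Optional[str] = None,
                max_hits: int = 50) -> list[tuple[str, int, str]]:
    """Return up to `max_hits` (relative path, line number, line) matches."""
    content_re = re.compile(pattern)
    name_re = re.compile(filename) if filename else None
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control data and keep traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            if name_re and not name_re.search(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if content_re.search(line):
                            rel = os.path.relpath(path, root)
                            hits.append((rel, number, line.rstrip("\n")))
                            if len(hits) >= max_hits:
                                return hits  # cap the size of the observation
            except (UnicodeDecodeError, OSError):
                continue  # ignore binary or unreadable files
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, "app.py"), "w", encoding="utf-8") as f:
            f.write("def load():\n    return 1\n")
        with open(os.path.join(repo, "notes.txt"), "w", encoding="utf-8") as f:
            f.write("def load is mentioned here\n")
        for hit in search_repo(repo, r"def load", filename=r"\.py$"):
            print(hit)
```

Capping the number of hits is a design choice in this sketch: an unbounded result list could flood the model's context just as an unpaginated file would.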

File Viewer

open <filename> [<line_number>]
scroll_up / scroll_down
goto <line_number>

Paginated file navigation with line numbers and context awareness. The viewer maintains state across commands, showing the agent's current position in the file.
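The stateful behaviour described above can be sketched as a small class that remembers the open file and the top of the visible window. The class name `FileViewer`, the window size, and the choice to centre `goto` on the requested line are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


class FileViewer:
    """Hypothetical stateful viewer: keeps the open file and window position."""

    def __init__(self, window: int = 100):
        self.window = window
        self.lines: list[str] = []
        self.top = 1  # 1-indexed number of the first visible line

    def _clamp(self, top: int) -> int:
        # The last valid top keeps a full window on screen where possible.
        last_top = max(1, len(self.lines) - self.window + 1)
        return max(1, min(top, last_top))

    def open(self, path: str, line_number: int = 1) -> str:
        self.lines = Path(path).read_text(encoding="utf-8").splitlines()
        return self.goto(line_number)

    def goto(self, line_number: int) -> str:
        # Centre the window on the requested line where possible.
        self.top = self._clamp(line_number - self.window // 2)
        return self.render()

    def scroll_down(self) -> str:
        self.top = self._clamp(self.top + self.window)
        return self.render()

    def scroll_up(self) -> str:
        self.top = self._clamp(self.top - self.window)
        return self.render()

    def render(self) -> str:
        end = min(len(self.lines), self.top + self.window - 1)
        return "\n".join(
            f"{n}:{self.lines[n - 1]}" for n in range(self.top, end + 1)
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.txt"
        path.write_text("\n".join(f"line {i}" for i in range(1, 11)), encoding="utf-8")
        viewer = FileViewer(window=4)
        print(viewer.open(str(path)))  # shows lines 1-4
        print(viewer.scroll_down())    # shows lines 5-8
        print(viewer.goto(9))          # shows lines 7-10
```

Keeping `top` on the viewer means `scroll_up` and `scroll_down` need no arguments, so the agent never has to restate where it is in the file.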

Edit

edit <filename> <line_start> <line_end>
<new_content>
end_of_edit

Replaces specific line ranges with new content. This precise, line-addressed editing avoids the ambiguity of natural language edit instructions.
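A line-addressed edit of this kind reduces to a list slice replacement. The sketch below assumes 1-indexed, inclusive line numbers; the function name `apply_edit` and its error handling are hypothetical, not SWE-agent's code.

```python
def apply_edit(lines: list[str], line_start: int, line_end: int,
               new_content: str) -> list[str]:
    """Replace 1-indexed, inclusive lines line_start..line_end with new_content."""
    if not 1 <= line_start <= line_end <= len(lines):
        raise ValueError(
            f"invalid range {line_start}-{line_end} for a {len(lines)}-line file"
        )
    replacement = new_content.splitlines()
    # Everything outside the addressed range is carried over untouched.
    return lines[: line_start - 1] + replacement + lines[line_end:]


if __name__ == "__main__":
    source = ["def add(a, b):", "    return a - b", "", "print(add(1, 2))"]
    fixed = apply_edit(source, 2, 2, "    return a + b")
    print("\n".join(fixed))
```

The replacement may have more or fewer lines than the range it replaces, so later line numbers can shift; an agent would normally re-view the file before issuing a further edit.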

Standard Unix utilities (ls, grep, git) remain available for auxiliary tasks.

Code Example

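# SWE-agent style ACI interaction loop (illustrative sketch)
class SWEAgentLoop:
    def __init__(self, model, repo_path, issue_description):
        self.model = model
        self.repo = repo_path
        self.issue = issue_description
        self.history = []  # (action, observation) pairs fed back to the model

    def run(self, max_steps: int = 30) -> str:
        observation = self.setup_environment()
        for _ in range(max_steps):
            # Model reasons about current state and selects action
            action = self.model.generate_action(
                issue=self.issue,
                observation=observation,
                history=self.history,
            )
            # "submit" ends the episode, so check it before executing anything
            if action.split(maxsplit=1)[:1] == ["submit"]:
                break
            # Execute command through ACI
            observation = self.execute_aci_command(action)
            self.history.append((action, observation))
        # Reached on submit or when the step budget is exhausted
        return self.generate_patch()

    def execute_aci_command(self, command: str) -> str:
        # Dispatch on the command name so that e.g. "openssl" is not taken for "open"
        parts = command.split(maxsplit=1)
        name = parts[0] if parts else ""
        if name == "search":
            return self.search_codebase(command)
        elif name in ("open", "goto", "scroll_up", "scroll_down"):
            return self.open_file_viewer(command)
        elif name == "edit":
            return self.apply_edit(command)
        else:
            return self.run_shell(command)

    # Environment-specific hooks, left unimplemented in this sketch:
    # setup_environment(), generate_patch(), search_codebase(),
    # open_file_viewer(), apply_edit(), run_shell()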

SWE-bench Results

SWE-bench is a benchmark of 2,294 real GitHub issues from 12 popular Python repositories, requiring full repository-level bug fixing and feature implementation.²⁾

SWE-agent was among the first agent systems to demonstrate strong autonomous performance on SWE-bench.³⁾ The benchmark has since become the standard evaluation for coding agents. The figures below are for SWE-bench Verified, a human-validated 500-task subset:

System	SWE-bench Verified (500 tasks)
SWE-agent (GPT-4)	~18% (early 2024)
SWE-agent + Claude 3.5 Sonnet	~33% (late 2024)
Current SOTA (2026)	~79% (with advanced scaffolding)

SWE-agent's key contribution is not just the benchmark scores but the demonstration that ACI design is a first-class research problem — the same underlying LLM performs significantly better with well-designed tool interfaces.

Comparison with Other Approaches

Approach	Key Difference
Agentless	Three-phase pipeline (localize, repair, validate) — no agentic loop
OpenHands	Broader action space including web browsing and code writing
HyperAgent	Multi-agent architecture for multi-language tasks
SWE-agent	Minimal ACI focused on search/view/edit reliability

Design Principles

The ACI design embodies several principles for effective agent-tool interaction:

<latex>P(\text{correct patch} | \text{ACI}) > P(\text{correct patch} | \text{raw shell})</latex>

Minimal action space: Fewer, well-defined commands reduce action selection errors
Structured observations: Formatted output with line numbers provides unambiguous context
Stateful navigation: The file viewer remembers position, reducing redundant exploration
Precise editing: Line-addressed edits eliminate ambiguity in code modification
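To make the "minimal action space" principle concrete, the sketch below validates a model response against a small, fixed command table before anything is executed. The table, the `Action` type, and `parse_action` are hypothetical; they mirror the custom commands listed in this article and check only argument counts. In the design described earlier, anything that is not a custom command would fall through to the shell rather than be rejected.

```python
import shlex
from dataclasses import dataclass

# Hypothetical command table: name -> (min args, max args).
# Only arity is checked; flags such as --filename are not validated.
COMMANDS = {
    "search": (1, 3),
    "open": (1, 2),
    "goto": (1, 1),
    "scroll_up": (0, 0),
    "scroll_down": (0, 0),
    "edit": (3, 3),
    "submit": (0, 0),
}


@dataclass
class Action:
    name: str
    args: list[str]


def parse_action(text: str) -> Action:
    """Parse the first line of a model response into a validated Action."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    tokens = shlex.split(first_line)
    if not tokens:
        raise ValueError("empty command")
    name, args = tokens[0], tokens[1:]
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    low, high = COMMANDS[name]
    if not low <= len(args) <= high:
        raise ValueError(f"{name} takes {low}-{high} arguments, got {len(args)}")
    return Action(name, args)


if __name__ == "__main__":
    print(parse_action("open src/app.py 42"))
    try:
        parse_action("goto")
    except ValueError as error:
        print(f"rejected: {error}")
```

With so few commands, each malformed action can be answered with a short, specific error message, which gives the model a clear signal for its next step.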

References

¹⁾

Yang et al. "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" (arXiv:2405.15793)

²⁾

Jimenez et al. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?"

³⁾

SWE-agent GitHub Repository
