====== How to Build a Coding Agent ======

A coding agent is an AI system that can autonomously read, edit, test, and debug code. Systems like Claude Code, Aider, and SWE-agent have demonstrated that LLM-powered agents can resolve real GitHub issues, scaffold applications, and refactor codebases with minimal human intervention. This guide covers the architecture, tool design, and implementation patterns for building your own.(([[https://sidbharath.com/blog/build-a-coding-agent-python-tutorial/|Build a Coding Agent from Scratch - Sid Bharath]]))(([[https://developers.openai.com/cookbook/examples/build_a_coding_agent_with_gpt-5.1/|Build a Coding Agent - OpenAI Cookbook]]))(([[https://arxiv.org/html/2603.00601v4|Architectural Beliefs in Coding Agents - arXiv]]))

===== Architecture Overview =====

Every coding agent follows a core loop: **Observe -> Reason -> Act -> Verify**. The agent reads files to understand context, reasons about what changes are needed, applies edits, then runs tests to verify correctness. On failure, it loops back with error context.

<code>
graph TD
    A[User Task] --> B[Plan: Decompose Task]
    B --> C[Read: Load Relevant Files]
    C --> D[Reason: Analyze Code Context]
    D --> E[Edit: Apply Code Changes]
    E --> F[Test: Run Tests / Linter]
    F -->|Pass| G[Commit and Report]
    F -->|Fail| H[Error Analysis]
    H --> I{Retry Count < Max?}
    I -->|Yes| C
    I -->|No| J[Report Failure to User]
    G --> K[Done]
</code>

===== Core Components =====

==== 1. Tool System ====

The tool system is the interface between the LLM and the filesystem. Well-designed tool schemas are critical for agent performance.

**Tool schemas (OpenAI function-calling format):**

<code python>
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a file. Use for understanding code before editing.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Relative file path"},
                    "start_line": {"type": "integer", "description": "Starting line (1-indexed)"},
                    "end_line": {"type": "integer", "description": "Ending line (optional)"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "edit_file",
            "description": "Replace exact text in a file. old_text must match exactly.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to edit"},
                    "old_text": {"type": "string", "description": "Exact text to find"},
                    "new_text": {"type": "string", "description": "Replacement text"},
                },
                "required": ["path", "old_text", "new_text"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Execute a shell command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "Shell command to execute"},
                    "timeout": {"type": "integer", "description": "Timeout in seconds", "default": 30},
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List files matching a glob pattern in the project.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {"type": "string", "description": "Glob pattern, e.g. **/*.py"},
                    "path": {"type": "string", "description": "Base directory", "default": "."},
                },
                "required": ["pattern"],
            },
        },
    },
]
</code>

==== 2. The Agent Loop ====

The core loop sends messages to the LLM, dispatches tool calls, appends results, and repeats until the model stops calling tools. Approach 1 below shows a complete implementation.

==== 3. Error Recovery ====

When tests fail, the agent re-reads the failing file, analyzes the traceback, and applies a targeted fix; a retry budget (typically 3-5 attempts) prevents infinite loops. Production agents also externalize their belief state after each action cycle, summarizing what they currently believe about the codebase (see Best Practices below).
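A minimal sketch of this retry pattern, assuming a ''pytest'' test suite; ''run_tests'' and ''fix_loop'' are illustrative helpers, and ''run_agent'' refers to the loop defined in Approach 1 below:

<code python>
import subprocess

MAX_FIX_ATTEMPTS = 3  # retry budget: give up after this many failed fixes


def run_tests() -> tuple[bool, str]:
    """Run the test suite; return (passed, combined output)."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-x", "--tb=short"],
        capture_output=True, text=True, timeout=120,
    )
    return result.returncode == 0, result.stdout + result.stderr


def fix_loop(task: str) -> str:
    """Ask the agent to fix failing tests, feeding the traceback back each retry."""
    for attempt in range(1, MAX_FIX_ATTEMPTS + 1):
        passed, output = run_tests()
        if passed:
            return "All tests pass."
        # Hand the truncated traceback back to the agent so the next edit is targeted.
        run_agent(  # run_agent is the loop from Approach 1 below
            f"{task}\n\nAttempt {attempt}/{MAX_FIX_ATTEMPTS}. "
            f"Tests are failing with this output:\n{output[:3000]}"
        )
    return "Retry budget exhausted; reporting failure to the user."
</code>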
==== 4. Git Integration ====

Agents create branches, commit after successful edits, and can roll back via ''git reset'' on failure. This provides a safety net for destructive operations (a workflow sketch appears after the Best Practices list below).

===== Approach 1: Pure Python from Scratch =====

This approach requires only the ''openai'' (or ''anthropic'') SDK. No frameworks needed.

<code python>
import glob
import json
import os
import subprocess

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"
MAX_ITERATIONS = 20  # hard cap on LLM round-trips to prevent runaway costs

SYSTEM_PROMPT = (
    "You are a coding agent. You can read files, edit files, "
    "list directory contents, and run shell commands. "
    "Always read a file before editing it. Run tests after making changes. "
    "If tests fail, analyze the error and fix the code. "
    "Work methodically: understand first, then change, then verify."
)

# --- Tool implementations ---

def read_file(path, start_line=None, end_line=None):
    with open(path, "r") as f:
        lines = f.readlines()
    if start_line and end_line:
        lines = lines[start_line - 1 : end_line]
    return "".join(lines)

def edit_file(path, old_text, new_text):
    with open(path) as f:
        content = f.read()
    if old_text not in content:
        return f"ERROR: old_text not found in {path}"
    updated = content.replace(old_text, new_text, 1)
    with open(path, "w") as f:
        f.write(updated)
    return f"Successfully edited {path}"

def run_command(command, timeout=30):
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    output = result.stdout + result.stderr
    return output[:5000]  # Truncate for context window

def list_files(pattern, path="."):
    matches = glob.glob(os.path.join(path, pattern), recursive=True)
    return "\n".join(sorted(matches))

TOOL_MAP = {
    "read_file": lambda args: read_file(**args),
    "edit_file": lambda args: edit_file(**args),
    "run_command": lambda args: run_command(**args),
    "list_files": lambda args: list_files(**args),
}

# --- Agent loop ---

def run_agent(task: str):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # Agent is done
        for tool_call in msg.tool_calls:
            name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)
            print(f"  [{name}] {args}")
            result = TOOL_MAP[name](args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
    return "Max iterations reached."

if __name__ == "__main__":
    result = run_agent("Fix the failing test in tests/test_auth.py")
    print(result)
</code>

**Key design decisions:**

  * ''edit_file'' uses exact string matching (not line numbers) -- this is how Claude Code works, and it is more reliable since line numbers shift after edits
  * Tool output is truncated to prevent context window overflow
  * The loop has a hard iteration limit to prevent runaway costs; note that each iteration is one LLM round-trip, so the cap must be generous enough to cover read-edit-test cycles

===== Approach 2: Using the OpenAI Agents SDK =====

The OpenAI Agents SDK provides built-in primitives for agents, tools, handoffs, and guardrails.(([[https://agentincome.io/blog/openai-agents-sdk-tutorial-2026/|OpenAI Agents SDK Tutorial 2026]]))

<code python>
import glob
import subprocess

from agents import Agent, Runner, function_tool

@function_tool
def read_file(path: str, start_line: int = 0, end_line: int = 0) -> str:
    """Read file contents. Optionally specify line range."""
    with open(path) as f:
        lines = f.readlines()
    if start_line and end_line:
        lines = lines[start_line - 1 : end_line]
    return "".join(lines)

@function_tool
def edit_file(path: str, old_text: str, new_text: str) -> str:
    """Replace exact text in a file."""
    with open(path) as f:
        content = f.read()
    if old_text not in content:
        return f"ERROR: text not found in {path}"
    with open(path, "w") as f:
        f.write(content.replace(old_text, new_text, 1))
    return f"Edited {path}"

@function_tool
def run_command(command: str, timeout: int = 30) -> str:
    """Execute a shell command."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr)[:5000]

@function_tool
def list_files(pattern: str) -> str:
    """List files matching a glob pattern."""
    return "\n".join(glob.glob(pattern, recursive=True))

coding_agent = Agent(
    name="CodingAgent",
    instructions=(
        "You are a coding agent. Read files before editing. "
        "Run tests after changes. Fix failures by analyzing errors. "
        "Work step-by-step: understand, change, verify."
    ),
    tools=[read_file, edit_file, run_command, list_files],
)

# Run the agent
result = Runner.run_sync(
    coding_agent,
    "Add input validation to the create_user endpoint in app/routes.py",
)
print(result.final_output)
</code>

**Advantages of the SDK approach:**

  * ''@function_tool'' auto-generates JSON schemas from type hints
  * Built-in tracing for debugging agent behavior
  * Handoffs enable multi-agent orchestration (e.g., a planning agent delegates to a coding agent)
  * Guardrails can validate outputs before they reach the user

===== Comparison: Pure Python vs SDK =====

^ Criteria ^ Pure Python ^ OpenAI Agents SDK ^
| Setup complexity | Minimal (just the ''openai'' package) | Low (''pip install openai-agents'') |
| Tool definition | Manual JSON schema | Auto-generated from type hints |
| Multi-agent | Must build from scratch | Built-in handoffs |
| Tracing/debugging | Manual logging | Built-in tracing dashboard |
| Vendor lock-in | Swap any LLM provider | OpenAI-specific |
| State persistence | In-memory or custom | Checkpoint support |
| Production readiness | You own everything | Battle-tested primitives |
| Best for | Learning, custom needs | Rapid development, production |

===== Best Practices =====

  * **Read before edit**: Always load file contents before attempting modifications. Blind edits invite hallucinated code.
  * **Exact-match edits**: String replacement is more robust than line-number edits because line numbers shift as the file changes.
  * **Truncate outputs**: Cap tool output at 3000-5000 characters to preserve context window for reasoning.
  * **Git branch per task**: Create a branch before starting work. Commit on success, reset on failure (see the git sketch after this list).
  * **Belief externalization**: After every few actions, have the agent summarize its understanding of the codebase state. Research shows this boosts task success by 14+ percentage points.(([[https://www.oreilly.com/radar/how-to-build-a-general-purpose-ai-agent-in-131-lines-of-python/|How to Build an AI Agent in 131 Lines of Python - O'Reilly]]))
  * **Sandbox execution**: Run commands in Docker containers or restricted environments to prevent destructive operations (see the sandbox sketch after this list).
  * **Test-driven verification**: Always run the test suite after edits. Parse failure output to guide the next fix.
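A minimal sketch of the branch-per-task workflow, wrapping the agent run in a disposable branch. The ''agent/'' branch-naming scheme and the plain ''git'' subprocess calls are illustrative assumptions, and ''run_agent'' is the loop from Approach 1:

<code python>
import subprocess
import uuid

def git(*args: str) -> str:
    """Run a git command and return its output, raising on failure."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def run_task_on_branch(task: str) -> str:
    """Run the agent on a throwaway branch; keep the commit only on success."""
    branch = f"agent/{uuid.uuid4().hex[:8]}"  # hypothetical naming scheme
    git("checkout", "-b", branch)
    try:
        outcome = run_agent(task)  # run_agent from Approach 1
        git("add", "-A")
        git("commit", "-m", f"agent: {task[:50]}")
        return outcome
    except Exception:
        # Roll back partial edits, return to the previous branch, drop the branch.
        git("reset", "--hard")
        git("checkout", "-")
        git("branch", "-D", branch)
        raise
</code>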
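And a sketch of sandboxed execution, replacing the bare ''run_command'' tool with one that runs inside a locked-down Docker container. The base image, resource limits, and mount settings are illustrative choices, not fixed requirements:

<code python>
import os
import subprocess

def run_command_sandboxed(command: str, timeout: int = 30) -> str:
    """Execute a shell command inside a disposable, network-isolated container."""
    docker_cmd = [
        "docker", "run", "--rm",            # discard the container afterwards
        "--network", "none",                # no network access
        "--memory", "512m", "--cpus", "1",  # resource limits
        "-v", f"{os.getcwd()}:/workspace",  # mount the project (append ":ro" to forbid writes)
        "-w", "/workspace",
        "python:3.12-slim",                 # illustrative base image
        "sh", "-c", command,
    ]
    result = subprocess.run(
        docker_cmd, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr)[:5000]

# Example: tests run in the sandbox; a stray `rm -rf /` cannot touch the host system
print(run_command_sandboxed("python -m pytest -x"))
</code>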
===== Benchmarks =====

^ System ^ SWE-bench Verified ^ Architecture ^
| Claude Code | ~72% | Single agent + sub-agents, 200K context |
| SWE-agent | ~23% | ReAct loop with custom tools |
| Aider | ~26% | Edit-focused with git integration |
| OpenAI Codex | ~70% | Multi-agent with code execution |
| Devin | ~55% | Full IDE agent with browser |

//Benchmarks as of late 2025. SWE-bench Verified is the gold standard for coding agent evaluation.//

===== See Also =====

  * [[how_to_build_a_research_agent|How to Build a Research Agent]]
  * [[how_to_build_a_multi_agent_system|How to Build a Multi-Agent System]]
  * [[how_to_build_a_data_analysis_agent|How to Build a Data Analysis Agent]]

===== References =====