====== How to Build a Coding Agent ======

A coding agent is an AI system that can autonomously read, edit, test, and debug code. Systems like Claude Code, Aider, and SWE-agent have demonstrated that LLM-powered agents can resolve real GitHub issues, scaffold applications, and refactor codebases with minimal human intervention. This guide covers the architecture, tool design, and implementation patterns for building your own.(([[https://sidbharath.com/blog/build-a-coding-agent-python-tutorial/|Build a Coding Agent from Scratch - Sid Bharath]]))(([[https://developers.openai.com/cookbook/examples/build_a_coding_agent_with_gpt-5.1/|Build a Coding Agent - OpenAI Cookbook]]))(([[https://arxiv.org/html/2603.00601v4|Architectural Beliefs in Coding Agents - arXiv]]))

===== Architecture Overview =====

Every coding agent follows a core loop: **Observe -> Reason -> Act -> Verify**. The agent reads files to understand context, reasons about what changes are needed, applies edits, then runs tests to verify correctness. On failure, it loops back with error context.

<code>
graph TD
    A[User Task] --> B[Plan: Decompose Task]
    B --> C[Read: Load Relevant Files]
    C --> D[Reason: Analyze Code Context]
    D --> E[Edit: Apply Code Changes]
    E --> F[Test: Run Tests / Linter]
    F -->|Pass| G[Commit and Report]
    F -->|Fail| H[Error Analysis]
    H --> I{Retry Count < Max?}
    I -->|Yes| C
    I -->|No| J[Report Failure to User]
    G --> K[Done]
</code>

===== Core Components =====

==== 1. Tool System ====

The tool system is the interface between the LLM and the filesystem. Well-designed tool schemas are critical for agent performance.

**Tool schemas (OpenAI function-calling format):**

<code python>
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a file. Use for understanding code before editing.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Relative file path"},
                    "start_line": {"type": "integer", "description": "Starting line (1-indexed)"},
                    "end_line": {"type": "integer", "description": "Ending line (optional)"},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "edit_file",
            "description": "Replace exact text in a file. old_text must match exactly.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to edit"},
                    "old_text": {"type": "string", "description": "Exact text to find"},
                    "new_text": {"type": "string", "description": "Replacement text"},
                },
                "required": ["path", "old_text", "new_text"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Execute a shell command and return stdout/stderr.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "Shell command to execute"},
                    "timeout": {"type": "integer", "description": "Timeout in seconds", "default": 30},
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List files matching a glob pattern in the project.",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {"type": "string", "description": "Glob pattern, e.g. **/*.py"},
                    "path": {"type": "string", "description": "Base directory", "default": "."},
                },
                "required": ["pattern"],
            },
        },
    },
]
</code>

==== 2. The Agent Loop ====

The core loop sends messages to the LLM, dispatches tool calls, appends results, and repeats until the model stops calling tools. Approach 1 below shows a complete implementation.

==== 3. Error Recovery ====

When tests fail, the agent re-reads the failing file, analyzes the traceback, and applies a targeted fix; a retry budget (typically 3-5 attempts) prevents infinite loops. Production agents also externalize their belief state after each action cycle, summarizing what they currently believe about the codebase (see Best Practices below).
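A minimal sketch of this retry pattern, assuming a ''pytest'' test suite; ''run_tests'' and ''fix_loop'' are illustrative helpers, and ''run_agent'' refers to the loop defined in Approach 1 below:

<code python>
import subprocess

MAX_FIX_ATTEMPTS = 3  # retry budget: give up after this many failed fixes


def run_tests() -> tuple[bool, str]:
    """Run the test suite; return (passed, combined output)."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-x", "--tb=short"],
        capture_output=True, text=True, timeout=120,
    )
    return result.returncode == 0, result.stdout + result.stderr


def fix_loop(task: str) -> str:
    """Ask the agent to fix failing tests, feeding the traceback back each retry."""
    for attempt in range(1, MAX_FIX_ATTEMPTS + 1):
        passed, output = run_tests()
        if passed:
            return "All tests pass."
        # Hand the truncated traceback back to the agent so the next edit is targeted.
        run_agent(  # run_agent is the loop from Approach 1 below
            f"{task}\n\nAttempt {attempt}/{MAX_FIX_ATTEMPTS}. "
            f"Tests are failing with this output:\n{output[:3000]}"
        )
    return "Retry budget exhausted; reporting failure to the user."
</code>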
==== 4. Git Integration ====

Agents create branches, commit after successful edits, and can roll back via ''git reset'' on failure. This provides a safety net for destructive operations (a workflow sketch appears after the Best Practices list below).

===== Approach 1: Pure Python from Scratch =====

This approach requires only the ''openai'' (or ''anthropic'') SDK. No frameworks needed.

<code python>
import glob
import json
import os
import subprocess

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"
MAX_ITERATIONS = 20  # hard cap on LLM round-trips to prevent runaway costs

SYSTEM_PROMPT = (
    "You are a coding agent. You can read files, edit files, "
    "list directory contents, and run shell commands. "
    "Always read a file before editing it. Run tests after making changes. "
    "If tests fail, analyze the error and fix the code. "
    "Work methodically: understand first, then change, then verify."
)

# --- Tool implementations ---

def read_file(path, start_line=None, end_line=None):
    with open(path, "r") as f:
        lines = f.readlines()
    if start_line and end_line:
        lines = lines[start_line - 1 : end_line]
    return "".join(lines)

def edit_file(path, old_text, new_text):
    with open(path) as f:
        content = f.read()
    if old_text not in content:
        return f"ERROR: old_text not found in {path}"
    updated = content.replace(old_text, new_text, 1)
    with open(path, "w") as f:
        f.write(updated)
    return f"Successfully edited {path}"

def run_command(command, timeout=30):
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    output = result.stdout + result.stderr
    return output[:5000]  # Truncate for context window

def list_files(pattern, path="."):
    matches = glob.glob(os.path.join(path, pattern), recursive=True)
    return "\n".join(sorted(matches))

TOOL_MAP = {
    "read_file": lambda args: read_file(**args),
    "edit_file": lambda args: edit_file(**args),
    "run_command": lambda args: run_command(**args),
    "list_files": lambda args: list_files(**args),
}

# --- Agent loop ---

def run_agent(task: str):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_ITERATIONS):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # Agent is done
        for tool_call in msg.tool_calls:
            name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)
            print(f"  [{name}] {args}")
            result = TOOL_MAP[name](args)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result),
            })
    return "Max iterations reached."

if __name__ == "__main__":
    result = run_agent("Fix the failing test in tests/test_auth.py")
    print(result)
</code>

**Key design decisions:**

  * ''edit_file'' uses exact string matching (not line numbers) -- this is how Claude Code works, and it is more reliable since line numbers shift after edits
  * Tool output is truncated to prevent context window overflow
  * The loop has a hard iteration limit to prevent runaway costs; note that each iteration is one LLM round-trip, so the cap must be generous enough to cover read-edit-test cycles

===== Approach 2: Using the OpenAI Agents SDK =====

The OpenAI Agents SDK provides built-in primitives for agents, tools, handoffs, and guardrails.(([[https://agentincome.io/blog/openai-agents-sdk-tutorial-2026/|OpenAI Agents SDK Tutorial 2026]]))

<code python>
import glob
import subprocess

from agents import Agent, Runner, function_tool

@function_tool
def read_file(path: str, start_line: int = 0, end_line: int = 0) -> str:
    """Read file contents. Optionally specify line range."""
    with open(path) as f:
        lines = f.readlines()
    if start_line and end_line:
        lines = lines[start_line - 1 : end_line]
    return "".join(lines)

@function_tool
def edit_file(path: str, old_text: str, new_text: str) -> str:
    """Replace exact text in a file."""
    with open(path) as f:
        content = f.read()
    if old_text not in content:
        return f"ERROR: text not found in {path}"
    with open(path, "w") as f:
        f.write(content.replace(old_text, new_text, 1))
    return f"Edited {path}"

@function_tool
def run_command(command: str, timeout: int = 30) -> str:
    """Execute a shell command."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr)[:5000]

@function_tool
def list_files(pattern: str) -> str:
    """List files matching a glob pattern."""
    return "\n".join(glob.glob(pattern, recursive=True))

coding_agent = Agent(
    name="CodingAgent",
    instructions=(
        "You are a coding agent. Read files before editing. "
        "Run tests after changes. Fix failures by analyzing errors. "
        "Work step-by-step: understand, change, verify."
    ),
    tools=[read_file, edit_file, run_command, list_files],
)

# Run the agent
result = Runner.run_sync(
    coding_agent,
    "Add input validation to the create_user endpoint in app/routes.py",
)
print(result.final_output)
</code>

**Advantages of the SDK approach:**

  * ''@function_tool'' auto-generates JSON schemas from type hints
  * Built-in tracing for debugging agent behavior
  * Handoffs enable multi-agent orchestration (e.g., a planning agent delegates to a coding agent)
  * Guardrails can validate outputs before they reach the user

===== Comparison: Pure Python vs SDK =====

^ Criteria ^ Pure Python ^ OpenAI Agents SDK ^
| Setup complexity | Minimal (just the ''openai'' package) | Low (''pip install openai-agents'') |
| Tool definition | Manual JSON schema | Auto-generated from type hints |
| Multi-agent | Must build from scratch | Built-in handoffs |
| Tracing/debugging | Manual logging | Built-in tracing dashboard |
| Vendor lock-in | Swap any LLM provider | OpenAI-specific |
| State persistence | In-memory or custom | Checkpoint support |
| Production readiness | You own everything | Battle-tested primitives |
| Best for | Learning, custom needs | Rapid development, production |

===== Best Practices =====

  * **Read before edit**: Always load file contents before attempting modifications. Blind edits invite hallucinated code.
  * **Exact-match edits**: String replacement is more robust than line-number edits because line numbers shift as the file changes.
  * **Truncate outputs**: Cap tool output at 3000-5000 characters to preserve context window for reasoning.
  * **Git branch per task**: Create a branch before starting work. Commit on success, reset on failure (see the git sketch after this list).
  * **Belief externalization**: After every few actions, have the agent summarize its understanding of the codebase state. Research shows this boosts task success by 14+ percentage points.(([[https://www.oreilly.com/radar/how-to-build-a-general-purpose-ai-agent-in-131-lines-of-python/|How to Build an AI Agent in 131 Lines of Python - O'Reilly]]))
  * **Sandbox execution**: Run commands in Docker containers or restricted environments to prevent destructive operations (see the sandbox sketch after this list).
  * **Test-driven verification**: Always run the test suite after edits. Parse failure output to guide the next fix.
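A minimal sketch of the branch-per-task workflow, wrapping the agent run in a disposable branch. The ''agent/'' branch-naming scheme and the plain ''git'' subprocess calls are illustrative assumptions, and ''run_agent'' is the loop from Approach 1:

<code python>
import subprocess
import uuid

def git(*args: str) -> str:
    """Run a git command and return its output, raising on failure."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

def run_task_on_branch(task: str) -> str:
    """Run the agent on a throwaway branch; keep the commit only on success."""
    branch = f"agent/{uuid.uuid4().hex[:8]}"  # hypothetical naming scheme
    git("checkout", "-b", branch)
    try:
        outcome = run_agent(task)  # run_agent from Approach 1
        git("add", "-A")
        git("commit", "-m", f"agent: {task[:50]}")
        return outcome
    except Exception:
        # Roll back partial edits, return to the previous branch, drop the branch.
        git("reset", "--hard")
        git("checkout", "-")
        git("branch", "-D", branch)
        raise
</code>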
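And a sketch of sandboxed execution, replacing the bare ''run_command'' tool with one that runs inside a locked-down Docker container. The base image, resource limits, and mount settings are illustrative choices, not fixed requirements:

<code python>
import os
import subprocess

def run_command_sandboxed(command: str, timeout: int = 30) -> str:
    """Execute a shell command inside a disposable, network-isolated container."""
    docker_cmd = [
        "docker", "run", "--rm",            # discard the container afterwards
        "--network", "none",                # no network access
        "--memory", "512m", "--cpus", "1",  # resource limits
        "-v", f"{os.getcwd()}:/workspace",  # mount the project (append ":ro" to forbid writes)
        "-w", "/workspace",
        "python:3.12-slim",                 # illustrative base image
        "sh", "-c", command,
    ]
    result = subprocess.run(
        docker_cmd, capture_output=True, text=True, timeout=timeout
    )
    return (result.stdout + result.stderr)[:5000]

# Example: tests run in the sandbox; a stray `rm -rf /` cannot touch the host system
print(run_command_sandboxed("python -m pytest -x"))
</code>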
===== Benchmarks =====

^ System ^ SWE-bench Verified ^ Architecture ^
| Claude Code | ~72% | Single agent + sub-agents, 200K context |
| SWE-agent | ~23% | ReAct loop with custom tools |
| Aider | ~26% | Edit-focused with git integration |
| OpenAI Codex | ~70% | Multi-agent with code execution |
| Devin | ~55% | Full IDE agent with browser |

//Benchmarks as of late 2025. SWE-bench Verified is the gold standard for coding agent evaluation.//

===== See Also =====

  * [[how_to_build_a_research_agent|How to Build a Research Agent]]
  * [[how_to_build_a_multi_agent_system|How to Build a Multi-Agent System]]
  * [[how_to_build_a_data_analysis_agent|How to Build a Data Analysis Agent]]

===== References =====