====== API Tool Generation: Doc2Agent and LRASGen ======

Automating the creation of tools that LLM agents can use is a critical bottleneck in building agentic systems. **Doc2Agent** generates executable Python tools from unstructured REST API documentation, while **LRASGen** generates OpenAPI specifications directly from source code. Together, they cover the full API lifecycle -- from code to spec to agent-usable tool.

===== Doc2Agent: From API Docs to Agent Tools =====

Doc2Agent (2025) tackles the challenge of converting messy, incomplete API documentation into validated, executable Python functions that agents can invoke. The pipeline is fully automated, combining LLM-driven generation with live validation.

**Pipeline Stages:**

- **Document Parsing:** Ingests unstructured REST API documentation (HTML, Markdown, plain text) and extracts endpoint definitions, parameters, authentication requirements, and response schemas
- **Tool Generation:** An LLM generates Python functions that wrap HTTP calls, including typed parameters, docstrings, and error handling
- **Live Validation:** Generated tools are tested against real API endpoints to verify correctness; risky methods (DELETE, PUT) are restricted during testing
- **Code Agent Refinement:** Failed tools are iteratively repaired by a code agent that diagnoses errors from API responses and adjusts the implementation
- **Deployment:** Validated tools are packaged as AI-ready functions with concise signatures for agent frameworks

===== LRASGen: From Source Code to OpenAPI Specs =====

LRASGen (LLM-based RESTful API Specification Generation, 2025) addresses the upstream problem: many APIs lack proper specifications entirely. It uses LLMs to analyze source code and generate OpenAPI Specification (OAS) documents.
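To make the target output concrete, the sketch below constructs the kind of minimal OpenAPI 3.0 document such a generator aims to recover from source code. The endpoint, parameter names, and schemas here are invented for illustration, not taken from either paper.

```python
import json

# Hypothetical target output: a minimal OpenAPI 3.0 document describing a
# single GET route. All names and schemas are illustrative.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example User API", "version": "1.0.0"},
    "paths": {
        "/users/{id}": {
            "get": {
                "summary": "Fetch a user by ID",
                "parameters": [
                    {
                        "name": "id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "integer"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The requested user",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "id": {"type": "integer"},
                                        "name": {"type": "string"},
                                    },
                                }
                            }
                        },
                    },
                    "404": {"description": "User not found"},
                },
            }
        }
    },
}

# Serialize to the JSON form that downstream tooling would consume.
oas_json = json.dumps(spec, indent=2)
```

A spec of this shape is exactly what a downstream doc-to-tool generator can then turn into a callable Python function, since the path template, parameter types, and response schema are all machine-readable.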
**Key Capabilities:**

* Works even with **incomplete implementations** -- partial code, missing annotations, or absent comments
* Combines LLM code understanding with text generation to produce formal endpoint descriptions
* Generates path definitions, parameter schemas, request/response models, and authentication specs
* First approach to use LLMs and API source code together for OAS generation

===== The Tool Creation Pipeline =====

When combined, these approaches form an end-to-end automated pipeline:

$$\text{Source Code} \xrightarrow{\text{LRASGen}} \text{OpenAPI Spec} \xrightarrow{\text{Doc2Agent}} \text{Agent Tools}$$

This eliminates both manual specification writing and manual tool coding, enabling agents to interact with any API given only its codebase.

===== Code Example: Automated Tool Generation =====

<code python>
class APIToolGenerator:
    def __init__(self, llm, test_client):
        self.llm = llm
        self.test_client = test_client
        self.max_retries = 3

    def generate_from_docs(self, api_docs: str) -> list:
        """Extract endpoints from docs and emit validated tool functions."""
        endpoints = self.llm.extract_endpoints(api_docs)
        tools = []
        for endpoint in endpoints:
            tool_code = self.llm.generate_tool_function(endpoint)
            validated = self.validate_and_refine(tool_code, endpoint)
            if validated:
                tools.append(validated)
        return tools

    def validate_and_refine(self, tool_code, endpoint):
        """Test a tool against the live API, repairing it on each failure."""
        for attempt in range(self.max_retries):
            result = self.test_client.execute(tool_code, endpoint.test_params)
            if result.status_code in (200, 201):
                return tool_code
            diagnosis = self.llm.diagnose_failure(tool_code, result)
            tool_code = self.llm.refine_tool(tool_code, diagnosis)
        return None

    def generate_from_code(self, source_code: str) -> list:
        """LRASGen-style path: source code -> OAS -> docs -> tools."""
        spec = self.llm.generate_openapi_spec(source_code)
        docs = self.render_spec_as_docs(spec)
        return self.generate_from_docs(docs)
</code>

===== Doc2Agent Results =====

* Generated **443 validated tools** from real-world APIs including GitLab, OpenStreetMap, and research APIs
* Handles documentation inconsistencies and incomplete specifications
* Simpler APIs (Wiki, Map) achieve near-perfect generation success rates
* Most failures stem from offline services rather than generation errors
* Outperforms manual tool creation in coverage and consistency

===== Comparison of Approaches =====

^ Aspect ^ Doc2Agent ^ LRASGen ^
| **Input** | Unstructured API docs | Source code |
| **Output** | Python agent tools | OpenAPI (JSON/YAML) specs |
| **Key Technique** | LLM generation + code agent refinement | LLM code understanding + text generation |
| **Validation** | Live API calls | Schema conformance checking |
| **Handles Incomplete Input** | Yes (messy docs) | Yes (partial code, missing annotations) |

===== Pipeline Diagram =====

flowchart LR
  A[Source Code] --> B[LRASGen]
  B --> C[OpenAPI Spec]
  C --> D[Doc2Agent]
  E[API Documentation] --> D
  D --> F[LLM Tool Generation]
  F --> G[Live API Validation]
  G -->|Pass| H[Agent-Ready Tool]
  G -->|Fail| I[Code Agent Refinement]
  I --> F
  H --> J[Agent Framework Deployment]

===== Implications for Agent Ecosystems =====

These approaches fundamentally change how agent tool ecosystems scale:

* **No manual tooling:** Agents can autonomously expand their capabilities by discovering and wrapping new APIs
* **Self-healing tools:** Live validation and iterative refinement produce robust tools that handle real-world API quirks
* **Specification recovery:** LRASGen recovers formal specs from legacy codebases that were never properly documented
* **Composability:** Generated tools follow consistent interfaces, enabling agents to chain API calls across services

===== References =====

* [[https://arxiv.org/abs/2506.19998|Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation (arXiv:2506.19998)]]
* [[https://arxiv.org/abs/2504.16833|LRASGen: LLM-based RESTful API Specification Generation (arXiv:2504.16833)]]

===== See Also =====

* [[data_science_agents|Data Science Agents: DatawiseAgent]]
* [[agent_resource_management|Agent Resource Management: AgentRM]]
* [[recommendation_agents|Recommendation Agents: AgentRecBench and ARAG]]