====== API Tool Generation: Doc2Agent and LRASGen ======
Automating the creation of tools that LLM agents can use is a critical bottleneck in building agentic systems. **Doc2Agent** generates executable Python tools from unstructured REST API documentation, while **LRASGen** generates OpenAPI specifications directly from source code. Together, they address the full API lifecycle -- from code to spec to agent-usable tool.
===== Doc2Agent: From API Docs to Agent Tools =====
Doc2Agent (2025) tackles the challenge of converting messy, incomplete API documentation into validated, executable Python functions that agents can invoke. The pipeline is fully automated with LLM-driven generation and live validation.
**Pipeline Stages:**
- **Document Parsing:** Ingests unstructured REST API documentation (HTML, markdown, plain text) and extracts endpoint definitions, parameters, authentication requirements, and response schemas
- **Tool Generation:** An LLM generates Python functions that wrap HTTP calls, including typed parameters, docstrings, and error handling
- **Live Validation:** Generated tools are tested against real API endpoints to verify correctness. Risky methods (DELETE, PUT) are restricted during testing
- **Code Agent Refinement:** Failed tools are iteratively repaired by a code agent that diagnoses errors from API responses and adjusts the implementation
- **Deployment:** Validated tools are packaged as AI-ready functions with concise signatures for agent frameworks
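The deployment stage's output can be pictured as a plain Python wrapper with a typed signature and docstring. The following is a minimal sketch for a hypothetical ''GET /users/{id}'' endpoint; the endpoint, function name, and parameters are illustrative, not actual Doc2Agent output:

<code python>
from urllib.parse import urlencode
from urllib.request import Request

def build_get_user_request(base_url: str, user_id: int,
                           fields: list | None = None) -> Request:
    """Prepare a GET /users/{id} request (hypothetical endpoint).

    Returns a urllib Request so the agent framework can inspect or
    execute it; auth headers and error handling are omitted here.
    """
    query = "?" + urlencode({"fields": ",".join(fields)}) if fields else ""
    return Request(f"{base_url}/users/{user_id}{query}", method="GET")
</code>

A wrapper of this shape gives the agent a concise signature to reason over while the HTTP details stay encapsulated.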
===== LRASGen: From Source Code to OpenAPI Specs =====
LRASGen (LLM-based RESTful API Specification Generation, 2025) addresses the upstream problem: many APIs lack proper specifications entirely. It uses LLMs to analyze source code and generate OpenAPI Specification (OAS) documents.
**Key Capabilities:**
* Works even with **incomplete implementations** -- partial code, missing annotations, or absent comments
* Combines LLM code understanding with text generation to produce formal endpoint descriptions
* Generates path definitions, parameter schemas, request/response models, and authentication specs
* First approach to use LLMs and API source code together for OAS generation
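To make the output format concrete, the sketch below builds the minimal OpenAPI 3.0 skeleton that this kind of generation targets for a single inferred route. The route and parameter names are hypothetical, and a real system would emit far richer schemas:

<code python>
def route_to_oas(path: str, method: str, params: dict) -> dict:
    """Map one inferred route to a minimal OpenAPI 3.0 document."""
    return {
        "openapi": "3.0.3",
        "info": {"title": "Recovered API", "version": "0.1.0"},
        "paths": {
            path: {
                method.lower(): {
                    # One query parameter entry per inferred (name, type) pair
                    "parameters": [
                        {"name": name, "in": "query", "required": True,
                         "schema": {"type": oas_type}}
                        for name, oas_type in params.items()
                    ],
                    "responses": {"200": {"description": "OK"}},
                }
            }
        },
    }

# Example: a route recovered from an undocumented handler
spec = route_to_oas("/users/{id}", "GET", {"verbose": "boolean"})
</code>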
===== The Tool Creation Pipeline =====
When combined, these approaches form an end-to-end automated pipeline:
$$\text{Source Code} \xrightarrow{\text{LRASGen}} \text{OpenAPI Spec} \xrightarrow{\text{Doc2Agent}} \text{Agent Tools}$$
This eliminates manual specification writing and manual tool coding, enabling agents to interact with any API given only its codebase.
===== Code Example: Automated Tool Generation =====
<code python>
class APIToolGenerator:
    """Illustrative sketch of the combined LRASGen + Doc2Agent pipeline."""

    RISKY_METHODS = {"DELETE", "PUT"}  # restricted during live validation

    def __init__(self, llm, test_client):
        self.llm = llm
        self.test_client = test_client
        self.max_retries = 3

    def generate_from_docs(self, api_docs: str) -> list:
        """Doc2Agent path: unstructured docs -> validated Python tools."""
        endpoints = self.llm.extract_endpoints(api_docs)
        tools = []
        for endpoint in endpoints:
            tool_code = self.llm.generate_tool_function(endpoint)
            validated = self.validate_and_refine(tool_code, endpoint)
            if validated is not None:
                tools.append(validated)
        return tools

    def validate_and_refine(self, tool_code, endpoint):
        """Test a tool against the live API and repair failures iteratively."""
        if endpoint.method in self.RISKY_METHODS:
            return tool_code  # never execute destructive calls during testing
        for _ in range(self.max_retries):
            result = self.test_client.execute(tool_code, endpoint.test_params)
            if 200 <= result.status_code < 300:  # accept any 2xx response
                return tool_code
            diagnosis = self.llm.diagnose_failure(tool_code, result)
            tool_code = self.llm.refine_tool(tool_code, diagnosis)
        return None  # retries exhausted; discard the tool

    def generate_from_code(self, source_code: str) -> list:
        """LRASGen path: source code -> OpenAPI spec -> agent tools."""
        spec = self.llm.generate_openapi_spec(source_code)
        docs = self.render_spec_as_docs(spec)
        return self.generate_from_docs(docs)
</code>
===== Doc2Agent Results =====
* Generated **443 validated tools** from real-world APIs including GitLab, OpenStreetMap, and research APIs
* Handles documentation inconsistencies and incomplete specifications
* Simpler APIs (Wiki, Map) achieve near-perfect generation success rates
* Most failures stem from offline services rather than generation errors
* Outperforms manual tool creation in coverage and consistency
===== Comparison of Approaches =====
^ Aspect ^ Doc2Agent ^ LRASGen ^
| **Input** | Unstructured API docs | Source code |
| **Output** | Python agent tools | OpenAPI (JSON/YAML) specs |
| **Key Technique** | LLM generation + code agent refinement | LLM code understanding + text generation |
| **Validation** | Live API calls | Schema conformance checking |
| **Handles Incomplete Input** | Yes (messy docs) | Yes (partial code, missing annotations) |
===== Pipeline Diagram =====
<code>
flowchart LR
    A[Source Code] --> B[LRASGen]
    B --> C[OpenAPI Spec]
    C --> D[Doc2Agent]
    E[API Documentation] --> D
    D --> F[LLM Tool Generation]
    F --> G[Live API Validation]
    G -->|Pass| H[Agent-Ready Tool]
    G -->|Fail| I[Code Agent Refinement]
    I --> F
    H --> J[Agent Framework Deployment]
</code>
===== Implications for Agent Ecosystems =====
These approaches fundamentally change how agent tool ecosystems scale:
* **No manual tooling:** Agents can autonomously expand their capabilities by discovering and wrapping new APIs
* **Self-healing tools:** Live validation and iterative refinement produce robust tools that handle real-world API quirks
* **Specification recovery:** LRASGen recovers formal specs from legacy codebases that were never properly documented
* **Composability:** Generated tools follow consistent interfaces, enabling agents to chain API calls across services
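As a toy illustration of that composability, two generated wrappers sharing a dict-in/dict-out convention can be chained by an agent without glue code. Both functions here are offline stubs, not real generated tools:

<code python>
def search_places(query: str) -> dict:
    """Stubbed stand-in for a generated map-search wrapper."""
    return {"results": [{"name": query.title(), "lat": 48.85, "lon": 2.35}]}

def get_weather(lat: float, lon: float) -> dict:
    """Stubbed stand-in for a generated weather wrapper."""
    return {"lat": lat, "lon": lon, "forecast": "clear"}

# Consistent interfaces let the output of one tool feed the next directly.
place = search_places("paris")["results"][0]
report = get_weather(place["lat"], place["lon"])
</code>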
===== References =====
* [[https://arxiv.org/abs/2506.19998|Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation (arXiv:2506.19998)]]
* [[https://arxiv.org/abs/2504.16833|LRASGen: LLM-based RESTful API Specification Generation (arXiv:2504.16833)]]
===== See Also =====
* [[data_science_agents|Data Science Agents: DatawiseAgent]]
* [[agent_resource_management|Agent Resource Management: AgentRM]]
* [[recommendation_agents|Recommendation Agents: AgentRecBench and ARAG]]