API Tool Generation: Doc2Agent and LRASGen

Creating the tools that LLM agents call is a major bottleneck in building agentic systems, and two recent systems automate different parts of that work. Doc2Agent generates executable Python tools from unstructured REST API documentation, while LRASGen generates OpenAPI specifications directly from source code. Together, they cover the path from code to spec to agent-usable tool.

Doc2Agent: From API Docs to Agent Tools

Doc2Agent (2025) tackles the challenge of converting messy, incomplete API documentation into validated, executable Python functions that agents can invoke. The pipeline is fully automated with LLM-driven generation and live validation.

Pipeline Stages:

Document Parsing: Ingests unstructured REST API documentation (HTML, markdown, plain text) and extracts endpoint definitions, parameters, authentication requirements, and response schemas
Tool Generation: An LLM generates Python functions that wrap HTTP calls, including typed parameters, docstrings, and error handling
Live Validation: Generated tools are tested against real API endpoints to verify correctness. Risky methods (DELETE, PUT) are restricted during testing
Code Agent Refinement: Failed tools are iteratively repaired by a code agent that diagnoses errors from API responses and adjusts the implementation
Deployment: Validated tools are packaged as AI-ready functions with concise signatures for agent frameworks
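
To make the Tool Generation and Live Validation stages concrete, the following is a minimal sketch of what one generated tool might look like. It is not taken from Doc2Agent: the endpoint, the base URL `https://api.example.com`, the names `call_endpoint` and `get_project`, the injectable `transport` parameter, and the exact set of methods treated as safe are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumption: only read-only methods are exercised during live validation.
SAFE_TEST_METHODS = {"GET", "HEAD", "OPTIONS"}

Transport = Callable[[urllib.request.Request], bytes]


def _default_transport(request: urllib.request.Request) -> bytes:
    """Send the request over the network and return the raw response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def call_endpoint(method: str, url: str, params: Optional[dict] = None,
                  validating: bool = False,
                  transport: Transport = _default_transport) -> dict:
    """Hypothetical shared helper: one HTTP call, one uniform result dict."""
    if validating and method.upper() not in SAFE_TEST_METHODS:
        # Risky methods are refused while the tool is under test.
        return {"ok": False, "error": f"{method} is restricted during validation"}
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, method=method.upper())
    try:
        return {"ok": True, "data": json.loads(transport(request))}
    except (OSError, ValueError) as exc:
        # OSError covers network and HTTP errors; ValueError covers bad JSON.
        return {"ok": False, "error": str(exc)}


def get_project(project_id: int,
                transport: Transport = _default_transport) -> dict:
    """Fetch one project by numeric ID (hypothetical endpoint)."""
    url = f"https://api.example.com/projects/{project_id}"
    return call_endpoint("GET", url, transport=transport)
```

Passing the transport as a parameter is a design choice of this sketch: it lets the same tool be exercised with a stub during development and with real HTTP during live validation.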

LRASGen: From Source Code to OpenAPI Specs

LRASGen (LLM-based RESTful API Specification Generation, 2025) addresses the upstream problem: many APIs lack proper specifications entirely. It uses LLMs to analyze source code and generate OpenAPI Specification (OAS) documents.

Key Capabilities:

Works even with incomplete implementations – partial code, missing annotations, or absent comments
Combines LLM code understanding with text generation to produce formal endpoint descriptions
Generates path definitions, parameter schemas, request/response models, and authentication specs
Described as the first approach to use LLMs and API source code together for OAS generation
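
For readers unfamiliar with the target format, here is a small sketch of the kind of document such a generator has to produce, written as a Python dict with a minimal structural check. The API described (`/projects/{project_id}`) and the helper `missing_fields` are hypothetical; the check covers only a few fields that OpenAPI 3.0 requires and is not a substitute for a full validator, nor a description of how LRASGen checks its output.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
REQUIRED_TOP_LEVEL = ("openapi", "info", "paths")  # required by OpenAPI 3.0

# Hypothetical spec covering paths, parameters, responses, and authentication.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example Project API", "version": "1.0.0"},
    "paths": {
        "/projects/{project_id}": {
            "get": {
                "summary": "Fetch one project",
                "parameters": [{
                    "name": "project_id",
                    "in": "path",
                    "required": True,
                    "schema": {"type": "integer"},
                }],
                "responses": {
                    "200": {
                        "description": "The requested project",
                        "content": {"application/json": {
                            "schema": {"$ref": "#/components/schemas/Project"}}},
                    },
                    "404": {"description": "Project not found"},
                },
                "security": [{"bearerAuth": []}],
            }
        }
    },
    "components": {
        "schemas": {"Project": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        }},
        "securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}},
    },
}


def missing_fields(document: dict) -> list:
    """Return a list of required fields that are absent from the document."""
    problems = [key for key in REQUIRED_TOP_LEVEL if key not in document]
    for path, item in document.get("paths", {}).items():
        for method, operation in item.items():
            # Every operation must declare its responses.
            if method in HTTP_METHODS and "responses" not in operation:
                problems.append(f"{method.upper()} {path}: responses")
    return problems
```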

The Tool Creation Pipeline

When combined, these approaches form an end-to-end automated pipeline:

$$\text{Source Code} \xrightarrow{\text{LRASGen}} \text{OpenAPI Spec} \xrightarrow{\text{Doc2Agent}} \text{Agent Tools}$$

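This removes manual specification writing and manual tool coding from the loop. In principle, an agent could then interact with an API given only its codebase, provided the generated spec is rendered into documentation that Doc2Agent can parse and the API is reachable for live validation.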

Code Example: Automated Tool Generation

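import json
from typing import Any, List, Optional


class APIToolGenerator:
    """Sketch of the generate-validate-refine loop. `llm` and `test_client`
    are placeholder objects; their methods are illustrative, not a real API."""

    def __init__(self, llm: Any, test_client: Any, max_retries: int = 3):
        self.llm = llm
        self.test_client = test_client
        self.max_retries = max_retries

    def generate_from_docs(self, api_docs: str) -> List[str]:
        """Doc2Agent-style path: docs -> endpoints -> validated tool code."""
        endpoints = self.llm.extract_endpoints(api_docs)
        tools = []
        for endpoint in endpoints:
            tool_code = self.llm.generate_tool_function(endpoint)
            validated = self.validate_and_refine(tool_code, endpoint)
            if validated is not None:
                tools.append(validated)
        return tools

    def validate_and_refine(self, tool_code: str, endpoint: Any) -> Optional[str]:
        """Test a tool against the live API, repairing it between attempts."""
        for attempt in range(self.max_retries):
            result = self.test_client.execute(tool_code, endpoint.test_params)
            if 200 <= result.status_code < 300:  # any 2xx response counts as success
                return tool_code
            if attempt < self.max_retries - 1:  # refine only if another test follows
                diagnosis = self.llm.diagnose_failure(tool_code, result)
                tool_code = self.llm.refine_tool(tool_code, diagnosis)
        return None  # never passed validation; the caller drops it

    def render_spec_as_docs(self, spec: Any) -> str:
        """Serialize a spec to text so the docs pipeline can ingest it."""
        return spec if isinstance(spec, str) else json.dumps(spec, indent=2)

    def generate_from_code(self, source_code: str) -> List[str]:
        """LRASGen-style path: source code -> OpenAPI spec -> tools."""
        spec = self.llm.generate_openapi_spec(source_code)
        return self.generate_from_docs(self.render_spec_as_docs(spec))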

Doc2Agent Results

Generated 443 validated tools from real-world APIs including GitLab, OpenStreetMap, and research APIs
Handles documentation inconsistencies and incomplete specifications
Simpler APIs (Wiki, Map) achieve near-perfect generation success rates
Most failures stem from offline services rather than generation errors
Reported to outperform manual tool creation in coverage and consistency

Comparison of Approaches

Aspect	Doc2Agent	LRASGen
Input	Unstructured API docs	Source code
Output	Python agent tools	OpenAPI (JSON/YAML) specs
Key Technique	LLM generation + code agent refinement	LLM code understanding + text generation
Validation	Live API calls	Schema conformance checking
Handles Incomplete Input	Yes (messy docs)	Yes (partial code, missing annotations)

Pipeline Diagram

flowchart LR
    A[Source Code] --> B[LRASGen]
    B --> C[OpenAPI Spec]
    C --> D[Doc2Agent]
    E[API Documentation] --> D
    D --> F[LLM Tool Generation]
    F --> G[Live API Validation]
    G -->|Pass| H[Agent-Ready Tool]
    G -->|Fail| I[Code Agent Refinement]
    I --> F
    H --> J[Agent Framework Deployment]

Implications for Agent Ecosystems

These approaches change how agent tool ecosystems can scale:

No manual tooling: Agents can autonomously expand their capabilities by discovering and wrapping new APIs
Self-healing tools: Live validation and iterative refinement produce robust tools that handle real-world API quirks
Specification recovery: LRASGen recovers formal specs from legacy codebases that were never properly documented
Composability: Generated tools follow consistent interfaces, enabling agents to chain API calls across services
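
The composability point can be illustrated with a short sketch. It assumes, purely for illustration, that every generated tool takes a dict of arguments and returns a dict with an `ok` flag plus either `data` or `error`; the two tools below are local stubs standing in for generated map and wiki tools, and `chain` is a hypothetical helper.

```python
from typing import Callable, List

# Assumed uniform interface: dict in, {"ok": ..., "data" | "error": ...} out.
Tool = Callable[[dict], dict]


def geocode(args: dict) -> dict:
    """Stub standing in for a generated map-API tool."""
    known = {"Berlin": {"lat": 52.52, "lon": 13.4}}
    place = args.get("place")
    if place not in known:
        return {"ok": False, "error": f"unknown place: {place}"}
    return {"ok": True, "data": known[place]}


def nearby_articles(args: dict) -> dict:
    """Stub standing in for a generated wiki-API tool."""
    return {"ok": True, "data": {"query": f"{args['lat']:.2f},{args['lon']:.2f}"}}


def chain(tools: List[Tool], args: dict) -> dict:
    """Feed each tool's data into the next one; stop at the first failure."""
    result = {"ok": True, "data": args}
    for tool in tools:
        result = tool(result["data"])
        if not result["ok"]:
            break
    return result
```

Because both stubs share one result shape, `chain([geocode, nearby_articles], {"place": "Berlin"})` needs no per-tool glue code, and a failure in the first tool is returned unchanged instead of being passed downstream.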
