Tool Result Parsing

Reliably handling tool outputs in AI agents, including JSON parsing, error handling, type coercion, retry on malformed output, and structured extraction patterns.

Overview

A significant portion of AI agent failures in production stems not from flawed reasoning but from “tool argument rot” – the generation of malformed JSON, missing fields, or incorrect data types when calling tools. OpenAI reported that enforcing strict JSON schema validation increased output compliance from under 40% to 100%. Teams have seen 7x improvements in multi-step workflow accuracy by adopting schema validation.

The core principle: shift from best-effort parsing to deterministic validation. The agent either produces a perfectly formatted tool call or it fails fast, eliminating ambiguous “close enough” attempts that introduce instability.

Error Taxonomy

Tool output failures fall into distinct categories:

  1. Malformed syntax – invalid JSON: single quotes, trailing commas, unquoted keys, or stray markdown fences and control tokens
  2. Schema violations – valid JSON that is missing required fields or contains unexpected ones
  3. Type mismatches – the right fields with the wrong types, e.g. a numeric score delivered as a string
  4. Out-of-range values – well-typed fields that violate business rules, e.g. a relevance score outside 0.0–1.0
  5. Wrong output mode – the model replies with free text instead of a tool call
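These buckets can be made operational with a small classifier. A minimal sketch (the required fields mirror the `ToolOutput` schema shown later on this page):

```python
import json

REQUIRED_FIELDS = {"tool_name", "success"}  # mirrors the ToolOutput schema below


def classify_failure(raw: str) -> str:
    """Bucket a raw tool output into a failure category (or "ok")."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed_json"          # invalid syntax
    if not isinstance(data, dict):
        return "wrong_shape"             # e.g. a bare list or string
    if REQUIRED_FIELDS - data.keys():
        return "missing_fields"          # schema violation
    if not isinstance(data["success"], bool):
        return "type_mismatch"           # right field, wrong type
    return "ok"
```

Tagging each failure with its category also feeds directly into the monitoring metrics discussed at the end of this page.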

Schema Validation

Define strict schemas for every tool call and validate responses against them. Use Pydantic (Python) or Zod (TypeScript) for runtime validation with type coercion.

from pydantic import BaseModel, ValidationError, field_validator
from typing import Optional
import json
 
 
class SearchResult(BaseModel):
    title: str
    url: str
    relevance_score: float
    snippet: Optional[str] = None
 
    @field_validator("relevance_score")
    @classmethod
    def score_in_range(cls, v):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"relevance_score must be 0.0-1.0, got {v}")
        return v
 
 
class ToolOutput(BaseModel):
    tool_name: str
    success: bool
    results: list[SearchResult] = []
    error: Optional[str] = None
 
 
def parse_tool_output(raw: str) -> ToolOutput:
    """Parse and validate tool output with progressive fallback."""
    # Step 1: Clean LLM artifacts
    cleaned = raw.strip()
    for artifact in ["<|call|>", "<|endoftext|>", "```json", "```"]:
        cleaned = cleaned.replace(artifact, "")
 
    # Step 2: Attempt JSON parse
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Step 3: Try to repair common issues
        data = attempt_json_repair(cleaned)
 
    # Step 4: Validate against schema
    return ToolOutput.model_validate(data)
 
 
def attempt_json_repair(raw: str) -> dict:
    """Attempt to fix common JSON malformations."""
    import re
    text = raw.strip()
    # Fix single quotes -> double quotes (naive: also rewrites apostrophes inside string values)
    text = text.replace("'", '"')
    # Fix trailing commas before closing brackets
    text = re.sub(r",\s*([}\]])", r"\1", text)
    # Fix unquoted keys
    text = re.sub(r"(\{|,)\s*(\w+)\s*:", r'\1 "\2":', text)
    return json.loads(text)
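As a standalone illustration, the repair helper handles the typical malformations in one pass (restated here with its imports so it runs on its own):

```python
import json
import re


def attempt_json_repair(raw: str) -> dict:
    """Fix common JSON malformations: single quotes, trailing commas, unquoted keys."""
    text = raw.strip()
    text = text.replace("'", '"')                            # single -> double quotes
    text = re.sub(r",\s*([}\]])", r"\1", text)               # trailing commas
    text = re.sub(r"(\{|,)\s*(\w+)\s*:", r'\1 "\2":', text)  # unquoted keys
    return json.loads(text)


# A typical LLM slip: single quotes, an unquoted key, and a trailing comma.
attempt_json_repair("{'tool_name': 'search', success: true, results: [],}")
# -> {'tool_name': 'search', 'success': True, 'results': []}
```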

Self-Recovering Structured Output

When parsing fails, feed the error back to the LLM so it can self-correct. This is more effective than blind retries because the model receives specific feedback about what went wrong.

from dataclasses import dataclass
 
 
@dataclass
class ParseAttempt:
    success: bool
    result: Optional[ToolOutput] = None
    error: Optional[str] = None
 
 
async def parse_with_self_correction(
    llm_client,
    messages: list[dict],
    tool_schema: dict,
    max_retries: int = 3,
) -> ToolOutput:
    """Parse tool output with LLM self-correction on failure."""
    for attempt in range(max_retries):
        response = await llm_client.chat(messages, tools=[tool_schema])
 
        if not response.tool_calls:
            # Model gave a text response instead of tool call
            messages.append({"role": "assistant", "content": response.text})
            messages.append({
                "role": "user",
                "content": "Please use the tool to provide a structured response.",
            })
            continue
 
        for tool_call in response.tool_calls:
            try:
                result = parse_tool_output(
                    json.dumps(tool_call.arguments)
                )
                return result
            except (json.JSONDecodeError, ValidationError) as e:
                # Feed the error back for self-correction. The assistant
                # turn carrying the tool call must precede the tool-role
                # error message in the transcript.
                messages.append({
                    "role": "assistant",
                    "tool_calls": response.tool_calls,
                })
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": f"Parse error: {e}. Fix the JSON and retry.",
                })
                break  # retry with the error in context
 
    raise RuntimeError(f"Failed to parse tool output after {max_retries} attempts")

Structured Extraction Patterns

Use XML tags or JSON schemas in prompts to guide LLMs toward parseable outputs.

Best practices:

  1. Show the exact schema in the prompt, not just a prose description of the fields
  2. Ask for output inside a dedicated delimiter – an XML tag or a fenced JSON block – so extraction is unambiguous
  3. Keep schemas flat and small; deeply nested structures raise malformation rates
  4. Validate immediately after extraction and route failures into the self-correction loop described above
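For the XML-tag variant, a minimal extraction sketch (the `answer` tag name is illustrative):

```python
import re
from typing import Optional


def extract_tagged(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag>...</tag> block, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


reply = 'Let me think step by step... <answer>{"status": "ok"}</answer>'
extract_tagged(reply, "answer")
# -> '{"status": "ok"}'
```

Keeping the reasoning outside the tag and the payload inside it lets the model "think out loud" without contaminating the parseable region.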

Multi-Layer Defense

Production tool parsing should implement three layers:

  1. Auto-repair – Fix common JSON issues (trailing tokens, unquoted keys, single quotes, trailing commas)
  2. Error pipeline – When repair fails, send a descriptive error back to the model as a tool result so it can self-correct
  3. Custom repair hook – Optional application-specific repair logic for known edge cases

This approach is used by the Mastra framework, which added JSON repair for malformed tool call arguments with these three layers.
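A condensed sketch of the three layers (the `repair_hook` parameter is a hypothetical stand-in for application-specific logic; this is not Mastra's implementation):

```python
import json
import re
from typing import Callable, Optional, Tuple


def parse_with_layers(
    raw: str,
    repair_hook: Optional[Callable[[str], str]] = None,
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success, or (None, error) to send back to the model."""
    try:
        return json.loads(raw), None                    # happy path
    except json.JSONDecodeError:
        pass
    # Layer 1: auto-repair common issues (single quotes, trailing commas)
    fixed = re.sub(r",\s*([}\]])", r"\1", raw.strip().replace("'", '"'))
    try:
        return json.loads(fixed), None
    except json.JSONDecodeError as e:
        # Layer 3: optional custom repair hook for known edge cases
        if repair_hook is not None:
            try:
                return json.loads(repair_hook(raw)), None
            except json.JSONDecodeError:
                pass
        # Layer 2: descriptive error routed back to the model as a tool result
        return None, f"Parse error: {e}. Fix the JSON and retry."
```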

Type Coercion

Handle common type mismatches gracefully:

  1. Numbers as strings – "42" where an integer is expected
  2. Booleans as strings – "true"/"false" instead of JSON booleans
  3. Single values where arrays are expected – a bare object instead of a one-element list
  4. null versus missing – a field set to null when the schema expects it to be absent, or vice versa

Pydantic and Zod both support configurable coercion modes. Use strict=False during initial parsing, then validate the coerced result against business rules.
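A stdlib sketch of the lax-then-fail idea – coerce the common slips, raise when conversion is impossible (pydantic's non-strict mode applies similar rules):

```python
def coerce(value, target: type):
    """Coerce common LLM type slips; raise ValueError when impossible."""
    if target is bool:
        if isinstance(value, bool):
            return value
        if isinstance(value, str) and value.strip().lower() in ("true", "false"):
            return value.strip().lower() == "true"
        raise ValueError(f"cannot coerce {value!r} to bool")
    if isinstance(value, target):
        return value
    if target in (int, float) and isinstance(value, str):
        return target(value.strip())       # "42" -> 42, "0.93" -> 0.93
    raise ValueError(f"cannot coerce {value!r} to {target.__name__}")


coerce("42", int)      # -> 42
coerce("true", bool)   # -> True
```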

Monitoring and Metrics

Track these metrics to measure parsing reliability:

  1. Parse failure rate – share of tool calls that fail initial JSON parsing
  2. Repair success rate – share of failures recovered by auto-repair or retry
  3. Self-correction retries – distribution of retry counts per call
  4. Terminal failures – calls that exhaust retries and surface an error
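A minimal counter sketch for these metrics (outcome names are illustrative):

```python
from collections import Counter


class ParseMetrics:
    """Track parse outcomes: 'ok', 'repaired', 'retried', 'failed'."""

    def __init__(self) -> None:
        self.outcomes: Counter = Counter()

    def record(self, outcome: str) -> None:
        self.outcomes[outcome] += 1

    def failure_rate(self) -> float:
        total = sum(self.outcomes.values())
        return self.outcomes["failed"] / total if total else 0.0

    def repair_rate(self) -> float:
        recovered = self.outcomes["repaired"] + self.outcomes["retried"]
        attempted = recovered + self.outcomes["failed"]
        return recovered / attempted if attempted else 0.0
```

In production these counters would typically be exported to your metrics backend rather than held in memory.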
