Reliably handling tool outputs in AI agents, including JSON parsing, error handling, type coercion, retry on malformed output, and structured extraction patterns.
A significant portion of AI agent failures in production stems not from flawed reasoning but from “tool argument rot”: malformed JSON, missing fields, or incorrect data types when calling tools. OpenAI reported that enforcing strict JSON schema validation increased output compliance from under 40% to 100%, and teams have seen 7x improvements in multi-step workflow accuracy by adopting schema validation.
The core principle: shift from best-effort parsing to deterministic validation. The agent either produces a perfectly formatted tool call or it fails fast, eliminating ambiguous “close enough” attempts that introduce instability.
Tool output failures fall into distinct categories:

- Malformed JSON: single quotes, trailing commas, unquoted keys
- Missing required fields
- Incorrect data types (e.g., a numeric score returned as a string)
- LLM artifacts such as <|call|> or <|endoftext|> appended to JSON

Define strict schemas for every tool call and validate responses against them. Use Pydantic (Python) or Zod (TypeScript) for runtime validation with type coercion.
````python
from pydantic import BaseModel, ValidationError, field_validator
from typing import Optional
import json
import re


class SearchResult(BaseModel):
    title: str
    url: str
    relevance_score: float
    snippet: Optional[str] = None

    @field_validator("relevance_score")
    @classmethod
    def score_in_range(cls, v):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"relevance_score must be 0.0-1.0, got {v}")
        return v


class ToolOutput(BaseModel):
    tool_name: str
    success: bool
    results: list[SearchResult] = []
    error: Optional[str] = None


def parse_tool_output(raw: str) -> ToolOutput:
    """Parse and validate tool output with progressive fallback."""
    # Step 1: Clean LLM artifacts
    cleaned = raw.strip()
    for artifact in ["<|call|>", "<|endoftext|>", "```json", "```"]:
        cleaned = cleaned.replace(artifact, "")

    # Step 2: Attempt JSON parse
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Step 3: Try to repair common issues
        data = attempt_json_repair(cleaned)

    # Step 4: Validate against schema
    return ToolOutput.model_validate(data)


def attempt_json_repair(raw: str) -> dict:
    """Attempt to fix common JSON malformations."""
    text = raw.strip()
    # Fix single quotes -> double quotes
    text = text.replace("'", '"')
    # Fix trailing commas before closing brackets
    text = re.sub(r",\s*([}\]])", r"\1", text)
    # Fix unquoted keys
    text = re.sub(r"(\{|,)\s*(\w+)\s*:", r'\1 "\2":', text)
    return json.loads(text)
````
When parsing fails, feed the error back to the LLM so it can self-correct. This is more effective than blind retries because the model receives specific feedback about what went wrong.
```python
import json
from dataclasses import dataclass
from typing import Optional

from pydantic import ValidationError


@dataclass
class ParseAttempt:
    success: bool
    result: Optional[ToolOutput] = None
    error: Optional[str] = None


async def parse_with_self_correction(
    llm_client,
    messages: list[dict],
    tool_schema: dict,
    max_retries: int = 3,
) -> ToolOutput:
    """Parse tool output with LLM self-correction on failure."""
    for attempt in range(max_retries):
        response = await llm_client.chat(messages, tools=[tool_schema])

        if not response.tool_calls:
            # Model gave a text response instead of a tool call
            messages.append({"role": "assistant", "content": response.text})
            messages.append({
                "role": "user",
                "content": "Please use the tool to provide a structured response.",
            })
            continue

        for tool_call in response.tool_calls:
            try:
                return parse_tool_output(json.dumps(tool_call.arguments))
            except (json.JSONDecodeError, ValidationError) as e:
                # Feed the specific error back to the model for self-correction
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": f"Parse error: {e}. Fix the JSON and retry.",
                })

    raise RuntimeError(f"Failed to parse tool output after {max_retries} attempts")
```
Use XML tags or JSON schemas in prompts to guide LLMs toward parseable outputs.
Best practices:

- Include the tool's JSON schema directly in the prompt so the model knows the exact expected shape
- Wrap structured sections in XML tags the parser can extract reliably
- Use explicit delimiters (e.g., —BEGIN OUTPUT—) to clearly mark structured sections

Production tool parsing should implement three layers: strict schema validation, automated repair of common malformations, and LLM self-correction driven by error feedback.
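One way to combine schema and delimiters in a prompt is sketched below; the exact wording and the `build_structured_prompt` helper are assumptions, not a fixed convention:

```python
import json

# Hypothetical schema for a search tool (mirrors the ToolOutput model above)
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "tool_name": {"type": "string"},
        "success": {"type": "boolean"},
        "results": {"type": "array"},
    },
    "required": ["tool_name", "success"],
}


def build_structured_prompt(task: str, schema: dict) -> str:
    """Build a prompt that pins the model to a schema and explicit delimiters."""
    return (
        f"{task}\n\n"
        "Respond with JSON only, placed between the delimiters below.\n"
        f"The JSON must match this schema:\n{json.dumps(schema, indent=2)}\n\n"
        "—BEGIN OUTPUT—\n"
        "<your JSON here>\n"
        "—END OUTPUT—"
    )


prompt = build_structured_prompt("Search for Pydantic docs", SEARCH_SCHEMA)
```

The delimiters also make extraction trivial: the parser can slice everything between the markers before attempting `json.loads`.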
This layered approach is used by the Mastra framework, which added JSON repair for malformed tool call arguments on top of these three layers.
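A dependency-free sketch of how the three layers compose is shown below; the function names and the `regenerate` callback are assumptions for illustration, and Mastra's internals differ:

```python
import json
import re


def repair(text: str) -> str:
    """Layer 2: fix common malformations (single quotes, trailing commas)."""
    text = text.strip().replace("'", '"')
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_layered(raw: str, regenerate, required: set, max_retries: int = 2) -> dict:
    """Layer 1: strict parse + validate; Layer 2: repair; Layer 3: retry with feedback."""
    attempt = raw
    for _ in range(max_retries + 1):
        for candidate in (attempt, repair(attempt)):
            try:
                data = json.loads(candidate)
                if not isinstance(data, dict):
                    raise ValueError("expected a JSON object")
                missing = required - data.keys()
                if missing:
                    raise ValueError(f"missing fields: {missing}")
                return data
            except (json.JSONDecodeError, ValueError) as e:
                error = str(e)
        # Layer 3: hand the error back so the model can fix its own output
        attempt = regenerate(attempt, error)
    raise RuntimeError("all parse layers exhausted")
```

In practice `regenerate` would be a call back into the LLM with the error message appended, as in the self-correction loop above.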
Handle common type mismatches gracefully:

- Numeric values returned as strings ("0.85" instead of 0.85)
- Booleans returned as strings ("true"/"false")
- A single object where a list is expected
- null for fields that are optional
Pydantic and Zod both support configurable coercion modes. Use strict=False during initial parsing, then validate the coerced result against business rules.
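For intuition, a minimal stdlib sketch of what lax-mode coercion does (the set of accepted boolean strings is an assumption; Pydantic's actual rules are richer):

```python
def coerce(value, target):
    """Best-effort coercion mirroring lax-mode validators."""
    if isinstance(value, target):
        return value
    # Numeric strings and ints widen to float
    if target is float and isinstance(value, (int, str)):
        return float(value)
    # Digit-only strings (with optional sign) narrow to int
    if target is int and isinstance(value, str) and value.strip().lstrip("-").isdigit():
        return int(value)
    # Common truthy/falsy strings map to bool
    if target is bool and isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
    raise TypeError(f"cannot coerce {value!r} to {target.__name__}")
```

Anything outside these safe conversions raises rather than silently guessing, consistent with the fail-fast principle above.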
Track these metrics to measure parsing reliability:

- First-attempt parse success rate
- Repair invocation rate (how often the JSON repair layer runs)
- Self-correction retries per tool call
- Validation failures broken down by tool
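A sketch of a counter for these signals follows; the metric names and the `ParseMetrics` class are assumptions, not a standard API:

```python
from dataclasses import dataclass, field


@dataclass
class ParseMetrics:
    """Counters for tool-output parsing reliability."""
    attempts: int = 0
    first_try_successes: int = 0
    repairs: int = 0          # times the repair layer ran
    retries: int = 0          # self-correction round trips
    failures_by_tool: dict = field(default_factory=dict)

    def record_failure(self, tool_name: str) -> None:
        self.failures_by_tool[tool_name] = self.failures_by_tool.get(tool_name, 0) + 1

    @property
    def first_try_rate(self) -> float:
        return self.first_try_successes / self.attempts if self.attempts else 0.0
```

A falling first-try rate with a rising repair rate usually points at prompt drift rather than parser bugs, which is why the two are worth tracking separately.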