Agent context files such as AGENTS.md and CLAUDE.md are repository-level instruction documents designed to guide AI coding agents. Gloaguen et al. (2026) present the first rigorous empirical investigation of their effectiveness, finding that these files tend to reduce task success rates while increasing inference costs by over 20%. This counterintuitive result challenges the widespread adoption of context files and suggests that minimal, carefully curated instructions outperform verbose guidance.
As AI coding agents (Claude Code, Codex, Qwen Code) become standard development tools, practitioners have adopted the convention of placing instruction files in repository roots:
- CLAUDE.md – Instructions for Anthropic's Claude-based agents
- AGENTS.md – OpenAI's recommended format for coding agents
- COPILOT.md – GitHub Copilot workspace instructions
- .cursorrules – Cursor editor agent instructions

These files typically contain coding conventions, architectural guidelines, testing requirements, and tool usage preferences. Despite widespread adoption, no prior work had rigorously measured whether they actually improve agent performance.
The study evaluated coding agents under three conditions:
Two evaluation settings were used:
Agent traces were analyzed by categorizing tool calls (Edit/sed, Read/cat) and intents (install dependencies, run tests, explore files) via LLM-based classification.
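A rough sketch of this kind of trace analysis is shown below. The paper uses LLM-based classification; the hand-written keyword mapping here is a simplified stand-in, and the tool names beyond Edit/sed and Read/cat are illustrative:

```python
# Simplified trace analysis: map raw tool calls to coarse categories.
# The study classifies calls with an LLM; this keyword table is a stand-in.
TOOL_CATEGORIES = {
    "Edit": "edit", "sed": "edit",
    "Read": "read", "cat": "read",
    "Bash": "shell",  # illustrative extra tool name
}

def categorize_trace(tool_calls):
    """Count tool calls per category for one agent trace."""
    counts = {}
    for call in tool_calls:
        category = TOOL_CATEGORIES.get(call, "other")
        counts[category] = counts.get(category, 0) + 1
    return counts

trace = ["Read", "cat", "Edit", "sed", "Bash", "Grep"]
print(categorize_trace(trace))  # {'read': 2, 'edit': 2, 'shell': 1, 'other': 1}
```

Comparing these per-category counts between the context-file and baseline conditions is what surfaces the behavioral shifts discussed below.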
Repository statistics: ~3337 files per codebase, 75% test coverage, PRs editing ~2.5 files and ~118.9 lines on average.
The findings challenge common assumptions about context file utility:
On AGENTbench:
However, broader analysis across conditions showed that context files lowered success rates overall while incurring higher computational costs (more tool calls, longer traces).
Context files induced measurable behavioral changes:
No consistent advantage from model-matched prompts:
The core problem is that context files introduce unnecessary requirements that constrain agent behavior suboptimally. A formal model of the effect:
$$P(\text{success} \mid \text{context}) = P(\text{success} \mid \text{no context}) \cdot \frac{P(\text{helpful})}{P(\text{helpful}) + P(\text{harmful})}$$
When the ratio of helpful to harmful instructions falls below 1, the multiplier drops below one half and context files sharply degrade performance. The study suggests this ratio is frequently unfavorable for verbose context files.
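As a worked example of this simple model (the probabilities below are illustrative, not values measured in the study):

```python
def context_multiplier(p_helpful, p_harmful):
    """Multiplier applied to the baseline success rate in the simple model:
    P(helpful) / (P(helpful) + P(harmful))."""
    return p_helpful / (p_helpful + p_harmful)

# Mostly helpful instructions: mild penalty.
print(context_multiplier(0.75, 0.25))  # 0.75
# Helpful-to-harmful ratio below 1: baseline success rate is more than halved.
print(context_multiplier(0.25, 0.75))  # 0.25
```

Note that any nonzero probability of harmful constraints pulls the multiplier below one, which is why even well-intentioned verbose files can cost more than they help.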
The inference cost overhead is significant:
$$\Delta C = C_{\text{context}} - C_{\text{baseline}} \approx 0.2 \cdot C_{\text{baseline}}$$
representing an increase of more than 20% in API costs with no corresponding performance gain.
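Translating this overhead into concrete numbers (the per-task cost and task count below are made-up figures for illustration; only the ~20% overhead comes from the study):

```python
OVERHEAD = 0.20  # ~20% extra inference cost reported for context files

def projected_cost(baseline_cost_per_task, n_tasks, overhead=OVERHEAD):
    """Total API cost without and with a context file under the simple model."""
    without = baseline_cost_per_task * n_tasks
    with_context = without + without * overhead
    return without, with_context

# Hypothetical workload: $0.50 per task across 1,000 tasks.
base, ctx = projected_cost(0.50, 1000)
print(f"baseline: ${base:.2f}, with context file: ${ctx:.2f}")
```

At scale, the overhead compounds: teams running thousands of agent tasks pay the premium on every run, regardless of whether the context file helped on any given task.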
```python
from pathlib import Path

CONTEXT_FILES = ["CLAUDE.md", "AGENTS.md", ".cursorrules", "COPILOT.md"]
MAX_RECOMMENDED_WORDS = 300


def load_agent_context(repo_root):
    """Load an agent context file with a minimal-first strategy."""
    for filename in CONTEXT_FILES:
        path = Path(repo_root) / filename
        if path.exists():
            content = path.read_text()
            word_count = len(content.split())
            if word_count > MAX_RECOMMENDED_WORDS:
                print(f"Warning: {filename} has {word_count} words, "
                      f"exceeds recommended {MAX_RECOMMENDED_WORDS}")
            return content
    return None


def create_minimal_context():
    """Generate a minimal context following the study's recommendations:
    focus only on non-obvious, repo-specific conventions."""
    return "\n".join([
        "# Project Context",
        "- Language: Python 3.12",
        "- Test runner: pytest",
        "- Style: ruff format",
        "- Do not modify generated files in src/generated/",
    ])
```
Based on the findings, the authors recommend: