Agent Context Files

Agent context files such as AGENTS.md and CLAUDE.md are repository-level instruction documents designed to guide AI coding agents. Gloaguen et al. (2026) present the first rigorous empirical investigation of their effectiveness, finding that these files tend to reduce task success rates while increasing inference costs by over 20%. This counterintuitive result challenges the widespread adoption of context files and suggests that minimal, carefully curated instructions outperform verbose guidance.

Background

As AI coding agents (Claude Code, Codex, Qwen Code) become standard development tools, practitioners have adopted the convention of placing instruction files such as AGENTS.md and CLAUDE.md in repository roots.

These files typically contain coding conventions, architectural guidelines, testing requirements, and tool usage preferences. Despite widespread adoption, no prior work had rigorously measured whether they actually improve agent performance.

Methodology

The study evaluated coding agents under three conditions:

  1. No context – Agent operates without any context file
  2. LLM-generated context – Context files generated by an LLM following official prompts from OpenAI and Anthropic (averaging ~641 words, 9.7 sections)
  3. Human-written context – Developer-committed context files from real repositories
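For readers replicating this kind of ablation, the three conditions can be sketched as a small dispatch. The `ContextCondition` enum and `context_for_condition` helper below are illustrative scaffolding, not the study's actual harness:

```python
from enum import Enum

class ContextCondition(Enum):
    # The three experimental conditions compared in the study.
    NO_CONTEXT = "no_context"
    LLM_GENERATED = "llm_generated"
    HUMAN_WRITTEN = "human_written"

def context_for_condition(condition, repo):
    # Hypothetical dispatcher: returns the context text an agent
    # would receive under each condition (repo is a plain dict here).
    if condition is ContextCondition.NO_CONTEXT:
        return None
    if condition is ContextCondition.LLM_GENERATED:
        return repo.get("generated_context")
    return repo.get("committed_context")

repo = {
    "generated_context": "LLM-written AGENTS.md text",
    "committed_context": "developer-committed CLAUDE.md text",
}
print(context_for_condition(ContextCondition.NO_CONTEXT, repo))    # None
print(context_for_condition(ContextCondition.HUMAN_WRITTEN, repo))
```

Keeping the condition explicit like this makes it easy to run the same task set once per condition and diff success rates.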

Benchmarks

Two evaluation settings were used:

Agent traces were analyzed by classifying each tool call into categories (e.g., Edit/sed, Read/cat) and intents (e.g., install dependencies, run tests, explore files) using LLM-based classification.
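The study used an LLM for this classification; a simplified rule-based stand-in illustrates the scheme. The keyword lists below are assumptions for demonstration, not the paper's taxonomy:

```python
# Tool-name categories, mirroring the Edit/sed and Read/cat groupings.
TOOL_CATEGORIES = {
    "edit": {"Edit", "sed"},
    "read": {"Read", "cat"},
}

# Intent keywords (illustrative, not the study's classifier).
INTENT_KEYWORDS = {
    "install dependencies": ("pip install", "npm install", "apt-get"),
    "run tests": ("pytest", "npm test", "unittest"),
    "explore files": ("ls ", "find ", "grep "),
}

def classify_tool_call(tool_name, command=""):
    # Map a tool call to a (category, intent) pair; unknowns fall
    # through to "other" so every call in a trace gets a label.
    category = next(
        (c for c, names in TOOL_CATEGORIES.items() if tool_name in names),
        "other",
    )
    intent = next(
        (i for i, kws in INTENT_KEYWORDS.items()
         if any(k in command for k in kws)),
        "other",
    )
    return category, intent

print(classify_tool_call("sed"))                # ('edit', 'other')
print(classify_tool_call("Bash", "pytest -q"))  # ('other', 'run tests')
```

Aggregating these labels over a trace yields the per-condition behavioral profiles the study compares.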

Repository statistics: codebases averaged ~3337 files with 75% test coverage; evaluated PRs edited ~2.5 files and ~118.9 lines on average.

Key Results

The findings challenge common assumptions about context file utility:

Performance Impact

On AGENTbench:

However, broader analysis across conditions showed that context files lowered success rates overall while raising computational costs (more tool calls, longer traces).

Behavioral Shifts

Context files induced measurable behavioral changes:

Prompt Insensitivity

Matching the context-generation prompt to the agent's provider (OpenAI vs. Anthropic) yielded no consistent advantage:

Analysis

The core problem is that context files introduce unnecessary requirements that constrain agent behavior suboptimally. A simple multiplicative model captures the effect:

$$P(\text{success} \mid \text{context}) = P(\text{success} \mid \text{no context}) \cdot \frac{P(\text{helpful})}{P(\text{helpful}) + P(\text{harmful})}$$

Whenever harmful constraints are present, this factor falls below 1 and context files degrade expected performance; the degradation becomes severe once harmful instructions outnumber helpful ones. The study suggests this balance is frequently unfavorable for verbose context files.
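The model is easy to sanity-check numerically. The probabilities below are illustrative placeholders, not figures from the study:

```python
def success_with_context(p_base, p_helpful, p_harmful):
    # P(success | context) under the multiplicative model: the baseline
    # success rate scaled by the helpful/(helpful + harmful) factor.
    return p_base * p_helpful / (p_helpful + p_harmful)

# Illustrative 40% baseline success rate:
print(success_with_context(0.40, 0.5, 0.5))  # balanced instructions halve it
print(success_with_context(0.40, 0.2, 0.8))  # mostly harmful: worse still
```

Even a balanced mix of helpful and harmful instructions halves the predicted success rate under this model, which is why trimming context files aggressively matters.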

The inference cost overhead is significant:

$$\Delta C = C_{\text{context}} - C_{\text{baseline}} \approx 0.2 \cdot C_{\text{baseline}}$$

representing a 20%+ increase in API costs with no corresponding performance gain.
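A quick arithmetic check of the overhead formula; the $100 baseline budget is illustrative:

```python
def context_cost_overhead(baseline_cost, overhead_fraction=0.2):
    # Delta C = C_context - C_baseline, with C_context ≈ 1.2 * C_baseline,
    # so the overhead is simply overhead_fraction * C_baseline.
    return baseline_cost * overhead_fraction

# Illustrative: a $100 baseline API spend implies roughly $20 of extra
# cost attributable to the context file, with no measured success gain.
print(context_cost_overhead(100.0))
```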

Code Example

from pathlib import Path
 
CONTEXT_FILES = ["CLAUDE.md", "AGENTS.md", ".cursorrules", "COPILOT.md"]
MAX_RECOMMENDED_WORDS = 300
 
def load_agent_context(repo_root):
    # Load agent context file with minimal-first strategy
    for filename in CONTEXT_FILES:
        path = Path(repo_root) / filename
        if path.exists():
            content = path.read_text()
            word_count = len(content.split())
            if word_count > MAX_RECOMMENDED_WORDS:
                print(f"Warning: {filename} has {word_count} words, "
                      f"exceeds recommended {MAX_RECOMMENDED_WORDS}")
            return content
    return None
 
def create_minimal_context():
    # Generate minimal context following study recommendations
    # Focus only on non-obvious, repo-specific conventions
    return "\n".join([
        "# Project Context",
        "- Language: Python 3.12",
        "- Test runner: pytest",
        "- Style: ruff format",
        "- Do not modify generated files in src/generated/",
    ])

Recommendations

Based on the findings, the authors recommend:

References

See Also