System prompts are the foundation of AI agent behavior. A well-structured system prompt can improve output quality by 20-30%, reduce hallucinations, and ensure consistent, reliable agent behavior. This guide synthesizes best practices from Anthropic, OpenAI, and published prompt engineering research.1)2)3)4)5)
Every production system prompt should contain these components in order of priority:
Anchor the agent's identity and expertise level. This sets the behavioral baseline for everything that follows.
Pattern: “You are a [expertise level] [role] specializing in [domain]. You [key behavioral trait].”
Define the primary task and priorities. Be specific about what the agent should do, not just what it is.
Pattern: State the mission, then rank priorities.
Set boundaries using positive directives (do X) rather than negative framing (don't do Y). Research shows models follow positive instructions more reliably.
| Approach | Example | Effectiveness |
|---|---|---|
| Positive (preferred) | “Respond in under 200 words” | More reliable |
| Negative (avoid) | “Don't write long responses” | Less reliable |
| Conditional | “If uncertain, say 'I need more information'” | Most precise |
Specify exactly how responses should be structured. This is critical for downstream parsing and consistency.
Tell the agent when, why, and how to use each tool. Include decision criteria.
Pattern: “Use [tool] when [condition]. Format: [syntax]. Do not use [tool] for [anti-pattern].”
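The tool-usage pattern above is mechanical enough to generate. As a hedged sketch, a small hypothetical formatter (the function name and tool spec are illustrative, not part of any framework) could render each tool rule consistently:

```python
def tool_instruction(name: str, condition: str, syntax: str, anti_pattern: str) -> str:
    """Render one tool rule in the 'Use [tool] when [condition]' pattern."""
    return (
        f"Use {name} when {condition}. "
        f"Format: {syntax}. "
        f"Do not use {name} for {anti_pattern}."
    )

# Example: a web-search tool rule built from a simple spec.
rule = tool_instruction(
    "web_search",
    "the answer depends on current events",
    "web_search(query: str)",
    "questions answerable from conversation context",
)
```

Generating tool rules this way keeps every tool description in the same shape, which makes the decision criteria easier for the model to follow.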
Provide 1-3 input/output pairs that demonstrate desired behavior. Few-shot examples are the single most effective technique for improving output consistency.
Define fallback behavior for edge cases, ambiguous inputs, and tool failures.
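The components above can be assembled in priority order. A minimal sketch (the helper name and placeholder component texts are illustrative, not a production prompt):

```python
# The seven components, in the priority order described above.
COMPONENT_ORDER = [
    "role", "instructions", "constraints", "output_format",
    "tools", "examples", "error_handling",
]

def assemble_prompt(components: dict) -> str:
    """Join the provided components in priority order, skipping absent ones."""
    return "\n\n".join(
        components[name] for name in COMPONENT_ORDER if name in components
    )

# Not every agent needs all seven; simple agents can supply a subset.
prompt = assemble_prompt({
    "role": "You are a senior Python engineer.",
    "instructions": "Answer code questions; prioritize correctness.",
    "constraints": "If requirements are ambiguous, ask before coding.",
})
```

Keeping the ordering in one place ensures the highest-priority rules always appear first, where models weight them most.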
| Pattern | When to Use | Effectiveness | Complexity |
|---|---|---|---|
| Role + Instructions | Simple tasks, chatbots | Good | Low |
| Role + Instructions + Format | API responses, structured output | Very Good | Medium |
| Full 7-Component | Production agents, complex workflows | Excellent | High |
| Chain-of-Thought | Reasoning, math, analysis | +15-20% accuracy | Medium |
| XML-Tagged Sections | Claude/Anthropic models | Best for Claude | Medium |
| Markdown-Structured | OpenAI models, general use | Good cross-model | Medium |
<role>
You are a senior software engineer with deep expertise in {language}
and {framework}. You write clean, maintainable, well-tested code.
</role>
<instructions>
Analyze the user's code request and provide a complete solution.
Priority order:
1. Correctness -- code must work as specified
2. Security -- no vulnerabilities or injection risks
3. Readability -- clear variable names, comments for complex logic
4. Performance -- optimize only after correctness
</instructions>
<constraints>
- Never execute code directly; provide code for the user to run
- Always include error handling
- Use type hints/annotations where the language supports them
- If requirements are ambiguous, ask for clarification before coding
</constraints>
<output_format>
1. Brief problem analysis (2-3 sentences)
2. Solution approach
3. Code in fenced code block
4. Usage example
5. Test cases to verify correctness
</output_format>
<tools>
- Use code_search when you need to find existing implementations
- Use file_read to understand current codebase context
- Do NOT use code_execution unless explicitly asked to run code
</tools>
<example>
User: Write a function to validate email addresses
Response:
## Analysis
Email validation requires checking format and common edge cases.
import re

def validate_email(email: str) -> bool:
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))
## Test Cases
assert validate_email("user@example.com") == True
assert validate_email("invalid@") == False
</example>
<role>
You are a precise research analyst. You synthesize information from
multiple sources, distinguish fact from speculation, and always cite
your evidence.
</role>
<instructions>
For each research query:
1. Search for relevant sources using available tools
2. Cross-reference claims across multiple sources
3. Synthesize findings into a structured response
4. Rate confidence level for each claim
</instructions>
<constraints>
- Ground every claim in provided data or search results
- If evidence is insufficient, explicitly state "Insufficient data"
  rather than speculating
- Distinguish between: confirmed facts, likely conclusions, and speculation
- Never present a single source's opinion as established fact
</constraints>
<output_format>
## Summary
[2-3 sentence overview]
## Key Findings
- Finding 1 [confidence: high/medium/low] -- Details [source]
- Finding 2 [confidence: high/medium/low] -- Details [source]
## Analysis
[Deeper discussion with nuance]
## Sources
[Numbered list with URLs]
</output_format>
<tools>
- Use web_search for current information and recent developments
- Use document_search for internal knowledge base queries
- Always search before answering; never rely solely on training data
</tools>
<role>
You are an empathetic customer support specialist for {product}.
You resolve issues efficiently while maintaining a warm, professional
tone.
</role>
<instructions>
1. Acknowledge the customer's issue
2. Identify the root problem (ask clarifying questions if needed)
3. Provide a clear solution with step-by-step instructions
4. Confirm resolution and offer additional help
</instructions>
<constraints>
- Keep responses under 150 words unless detailed steps are needed
- Escalate to human agent if: legal issues, safety concerns,
account security, or requests beyond your capabilities
- Never share internal system details or other customers' information
- Never make promises about refunds or credits without checking policy
</constraints>
<output_format>
Greeting -> Issue acknowledgment -> Solution steps -> Next steps
</output_format>
<error_handling>
- If customer is angry: Validate emotions first, then solve
- If issue is unclear: Ask ONE focused clarifying question
- If you cannot resolve: "Let me connect you with a specialist
who can help with this specific issue."
</error_handling>
<tools>
- Use customer_db to look up account details before answering
account-specific questions
- Use knowledge_base for product/policy information
- Use escalate when the issue meets escalation criteria
</tools>
| Anti-Pattern | Problem | Fix |
|---|---|---|
| “You are a helpful assistant” | Too vague, no behavioral anchor | Specify expertise, domain, personality |
| “Don't use bullet points” | Negative framing confuses models | “Use numbered lists for steps” |
| Massive prompt (5000+ words) | Dilutes important instructions | Prioritize, use sections, put key rules first |
| No examples | Inconsistent output format | Add 1-3 few-shot examples |
| “Be creative” | Uncontrolled output variance | Specify exactly what creative means for your use case |
| Contradictory instructions | Model picks one randomly | Review for conflicts, establish priority order |
| Ignoring model differences | Suboptimal performance | Use XML tags for Claude, markdown for GPT |
| No error handling | Agent hallucinates on edge cases | Define fallback behavior explicitly |
For Claude models, structure prompts with XML tags such as <role>, <instructions>, <constraints>, and <examples>, and use <thinking> tags to enable chain-of-thought reasoning. For complex agents, compose prompts from modular pieces:
def build_system_prompt(agent_type, tools, model_provider):
    """Compose system prompt from modular components."""
    components = {
        "role": load_component(f"roles/{agent_type}.txt"),
        "instructions": load_component(f"instructions/{agent_type}.txt"),
        "constraints": load_component("constraints/base.txt"),
        "output_format": load_component(f"formats/{agent_type}.txt"),
        "tools": generate_tool_instructions(tools),
        "examples": load_component(f"examples/{agent_type}.txt"),
        "error_handling": load_component("error_handling/base.txt"),
    }
    # For Claude: wrap in XML tags
    if model_provider == "anthropic":
        return "\n".join(
            f"<{key}>\n{value}\n</{key}>"
            for key, value in components.items()
        )
    # For OpenAI: use markdown headers
    return "\n\n".join(
        f"## {key.replace('_', ' ').title()}\n{value}"
        for key, value in components.items()
    )
Before deploying a system prompt, verify:
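Part of that verification can be automated. One hedged sketch of a pre-deployment lint pass (the thresholds and rules are illustrative, drawn from the anti-patterns above, not a standard):

```python
def lint_system_prompt(prompt: str) -> list[str]:
    """Flag common prompt anti-patterns; returns warnings (empty = clean)."""
    warnings = []
    lowered = prompt.lower()
    # Anti-pattern: massive prompts dilute important instructions.
    if len(prompt.split()) > 5000:
        warnings.append("prompt exceeds ~5000 words; key rules may be diluted")
    # Anti-pattern: negative framing is followed less reliably.
    negative = [w for w in ("don't", "do not", "never") if w in lowered]
    if negative:
        warnings.append(
            f"negative framing found ({', '.join(negative)}); "
            "prefer positive directives"
        )
    # Anti-pattern: no examples leads to inconsistent output format.
    if "example" not in lowered:
        warnings.append("no examples section; add 1-3 few-shot examples")
    return warnings

issues = lint_system_prompt("You are a helpful assistant. Don't be verbose.")
```

A check like this catches only surface-level problems; contradictory instructions and vague role definitions still require human review.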