====== How to Write and Structure System Prompts ======
System prompts are the foundation of AI agent behavior. Writing them well, and understanding how they are composed, can substantially improve output quality, reduce hallucinations, and ensure consistent, reliable agent behavior. This guide synthesizes best practices from Anthropic, OpenAI, and published prompt engineering research.(([[https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api|OpenAI - Best Practices for Prompt Engineering]]))(([[https://www.claude.com/blog/best-practices-for-prompt-engineering|Anthropic - Prompt Engineering Best Practices]]))(([[https://www.promptingguide.ai|DAIR.AI - Prompt Engineering Guide]]))(([[https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock|AWS - Prompt Engineering with Claude on Bedrock]]))(([[https://www.digitalocean.com/resources/articles/prompt-engineering-best-practices|DigitalOcean - Prompt Engineering Best Practices]]))
===== Anatomy of an Effective System Prompt =====
Every production system prompt should contain these components in order of priority:
<code>
graph TD
    A[System Prompt Structure] --> B[1. Role Definition]
    A --> C[2. Core Instructions]
    A --> D[3. Constraints and Guardrails]
    A --> E[4. Output Format]
    A --> F[5. Tool Instructions]
    A --> G[6. Few-Shot Examples]
    A --> H[7. Error Handling]
    B --> B1[Identity, expertise, persona]
    C --> C1[Primary task, goals, priorities]
    D --> D1[Boundaries, limitations, safety]
    E --> E1[Structure, format, length]
    F --> F1[When and how to use tools]
    G --> G1[Input-output pairs]
    H --> H1[Fallback behavior]
    style A fill:#2196F3,color:#fff
    style B fill:#4CAF50,color:#fff
    style C fill:#4CAF50,color:#fff
    style D fill:#FF9800,color:#fff
    style E fill:#FF9800,color:#fff
    style F fill:#9C27B0,color:#fff
    style G fill:#9C27B0,color:#fff
    style H fill:#9C27B0,color:#fff
</code>
===== The Seven Components =====
=== 1. Role Definition ===
Anchor the agent's identity and expertise level. This sets the behavioral baseline for everything that follows.
**Pattern**: "You are a [expertise level] [role] specializing in [domain]. You [key behavioral trait]."
* Good: "You are a senior backend engineer specializing in distributed systems. You prioritize correctness over cleverness."
* Bad: "You are a helpful assistant." (too vague, no expertise anchor)
=== 2. Core Instructions ===
Define the primary task and priorities. Be specific about what the agent should do, not just what it is.
**Pattern**: State the mission, then rank priorities.
* Good: "Your task is to review pull requests for security vulnerabilities. Priority order: (1) security flaws, (2) correctness bugs, (3) performance issues, (4) style."
* Bad: "Help with code review." (no specifics, no priority ordering)
=== 3. Constraints and Guardrails ===
Set boundaries using **positive directives** (do X) rather than **negative framing** (don't do Y). Models tend to follow positive instructions more reliably than prohibitions.
^ Approach ^ Example ^ Effectiveness ^
| Positive (preferred) | "Respond in under 200 words" | More reliable |
| Negative (avoid) | "Don't write long responses" | Less reliable |
| Conditional | "If uncertain, say 'I need more information'" | Most precise |
=== 4. Output Format ===
Specify exactly how responses should be structured. This is critical for downstream parsing and consistency.
* Use explicit format markers: JSON schema, markdown headers, numbered steps
* Include field names and types when expecting structured data
* Show the expected shape, not just "return JSON"
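One way to make "show the expected shape" concrete is to spell out the JSON structure in the prompt and validate responses against it downstream. A minimal sketch, assuming a hypothetical three-field schema (''summary'', ''risk_level'', ''findings'') chosen for illustration:

```python
import json

# Hypothetical output-format section: shows the exact shape, not just "return JSON".
FORMAT_SECTION = """Return ONLY a JSON object with this exact shape:
{
  "summary": "<string, 2-3 sentences>",
  "risk_level": "<one of: low, medium, high>",
  "findings": ["<string>", ...]
}"""

REQUIRED_FIELDS = {"summary": str, "risk_level": str, "findings": list}

def validate_response(raw: str) -> dict:
    """Parse a model response and check it matches the declared shape."""
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["risk_level"] not in {"low", "medium", "high"}:
        raise ValueError("risk_level out of range")
    return data

reply = '{"summary": "Looks safe.", "risk_level": "low", "findings": []}'
print(validate_response(reply)["risk_level"])  # low
```

Keeping the prompt's declared shape and the parser's checks in one place makes drift between them easy to spot.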
=== 5. Tool Instructions ===
Tell the agent when, why, and how to use each tool. Include decision criteria.
**Pattern**: "Use [tool] when [condition]. Format: [syntax]. Do not use [tool] for [anti-pattern]."
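The pattern above can be generated mechanically from a tool registry. A sketch, assuming a made-up spec format with ''when''/''syntax''/''avoid'' fields (the tool names and conditions are illustrative):

```python
# Hypothetical tool specs; the field names (when, syntax, avoid) are assumptions.
TOOLS = [
    {"name": "web_search", "when": "the question involves current events",
     "syntax": "web_search(query: str)", "avoid": "questions answerable from context"},
    {"name": "calculator", "when": "exact arithmetic is required",
     "syntax": "calculator(expression: str)", "avoid": "rough estimates"},
]

def render_tool_instructions(tools: list) -> str:
    """Expand each spec into the when/how/anti-pattern sentence pattern."""
    lines = []
    for t in tools:
        lines.append(
            f"Use {t['name']} when {t['when']}. "
            f"Format: {t['syntax']}. "
            f"Do not use {t['name']} for {t['avoid']}."
        )
    return "\n".join(lines)

print(render_tool_instructions(TOOLS))
```

Generating this section from the same registry the agent runtime uses keeps prompt and tool definitions from drifting apart.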
=== 6. Few-Shot Examples ===
Provide 1-3 input/output pairs that demonstrate desired behavior. This is often the single highest-impact technique for improving consistency.
=== 7. Error Handling ===
Define fallback behavior for edge cases, ambiguous inputs, and tool failures.
===== Comparison: Prompt Patterns =====
^ Pattern ^ When to Use ^ Effectiveness ^ Complexity ^
| **Role + Instructions** | Simple tasks, chatbots | Good | Low |
| **Role + Instructions + Format** | API responses, structured output | Very Good | Medium |
| **Full 7-Component** | Production agents, complex workflows | Excellent | High |
| **Chain-of-Thought** | Reasoning, math, analysis | +15-20% accuracy | Medium |
| **XML-Tagged Sections** | Claude/Anthropic models | Best for Claude | Medium |
| **Markdown-Structured** | OpenAI models, general use | Good cross-model | Medium |
===== Templates by Agent Type =====
=== Coding Agent ===
<code>
# Role
You are a senior software engineer with deep expertise in {language}
and {framework}. You write clean, maintainable, well-tested code.

# Instructions
Analyze the user's code request and provide a complete solution.
Priority order:
1. Correctness -- code must work as specified
2. Security -- no vulnerabilities or injection risks
3. Readability -- clear variable names, comments for complex logic
4. Performance -- optimize only after correctness

# Constraints
- Never execute code directly; provide code for the user to run
- Always include error handling
- Use type hints/annotations where the language supports them
- If requirements are ambiguous, ask for clarification before coding

# Output Format
1. Brief problem analysis (2-3 sentences)
2. Solution approach
3. Code in fenced code block
4. Usage example
5. Test cases to verify correctness

# Tools
- Use code_search when you need to find existing implementations
- Use file_read to understand current codebase context
- Do NOT use code_execution unless explicitly asked to run code

# Example
User: Write a function to validate email addresses
Response:
## Analysis
Email validation requires checking format and common edge cases.

def validate_email(email: str) -> bool:
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

## Test Cases
assert validate_email("user@example.com") == True
assert validate_email("invalid@") == False
</code>
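The regex in the template's few-shot example runs as-is. A quick check (note the pattern is a deliberate simplification and rejects some addresses that are valid under RFC 5322):

```python
import re

def validate_email(email: str) -> bool:
    # Simple format check; intentionally stricter than full RFC 5322.
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

assert validate_email("user@example.com") is True
assert validate_email("invalid@") is False
assert validate_email("no-at-sign.com") is False
```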
=== Research Agent ===
<code>
# Role
You are a precise research analyst. You synthesize information from
multiple sources, distinguish fact from speculation, and always cite
your evidence.

# Instructions
For each research query:
1. Search for relevant sources using available tools
2. Cross-reference claims across multiple sources
3. Synthesize findings into a structured response
4. Rate confidence level for each claim

# Constraints
- Ground every claim in provided data or search results
- If evidence is insufficient, explicitly state "Insufficient data"
  rather than speculating
- Distinguish between: confirmed facts, likely conclusions, and
  speculation
- Never present a single source's opinion as established fact

# Output Format
## Summary
[2-3 sentence overview]
## Key Findings
- Finding 1 [confidence: high/medium/low] -- Details [source]
- Finding 2 [confidence: high/medium/low] -- Details [source]
## Analysis
[Deeper discussion with nuance]
## Sources
[Numbered list with URLs]

# Tools
- Use web_search for current information and recent developments
- Use document_search for internal knowledge base queries
- Always search before answering; never rely solely on training data
</code>
=== Customer Service Agent ===
<code>
# Role
You are an empathetic customer support specialist for {product}.
You resolve issues efficiently while maintaining a warm, professional
tone.

# Instructions
1. Acknowledge the customer's issue
2. Identify the root problem (ask clarifying questions if needed)
3. Provide a clear solution with step-by-step instructions
4. Confirm resolution and offer additional help

# Constraints
- Keep responses under 150 words unless detailed steps are needed
- Escalate to human agent if: legal issues, safety concerns,
  account security, or requests beyond your capabilities
- Never share internal system details or other customers' information
- Never make promises about refunds or credits without checking policy

# Output Format
Greeting -> Issue acknowledgment -> Solution steps -> Next steps

# Error Handling
- If customer is angry: Validate emotions first, then solve
- If issue is unclear: Ask ONE focused clarifying question
- If you cannot resolve: "Let me connect you with a specialist
  who can help with this specific issue."

# Tools
- Use customer_db to look up account details before answering
  account-specific questions
- Use knowledge_base for product/policy information
- Use escalate when the issue meets escalation criteria
</code>
===== Anti-Patterns to Avoid =====
^ Anti-Pattern ^ Problem ^ Fix ^
| "You are a helpful assistant" | Too vague, no behavioral anchor | Specify expertise, domain, personality |
| "Don't use bullet points" | Negative framing confuses models | "Use numbered lists for steps" |
| Massive prompt (5000+ words) | Dilutes important instructions | Prioritize, use sections, put key rules first |
| No examples | Inconsistent output format | Add 1-3 few-shot examples |
| "Be creative" | Uncontrolled output variance | Specify exactly what creative means for your use case |
| Contradictory instructions | Model picks one randomly | Review for conflicts, establish priority order |
| Ignoring model differences | Suboptimal performance | Use XML tags for Claude, markdown for GPT |
| No error handling | Agent hallucinates on edge cases | Define fallback behavior explicitly |
===== Provider-Specific Tips =====
=== Anthropic (Claude) ===
* Use **XML tags** to structure sections: ''<role>'', ''<instructions>'', ''<context>'', ''<examples>''
* Place data/context **before** instructions in long prompts (Claude attends well to beginning and end)
* Use ''<thinking>'' tags to enable chain-of-thought reasoning
* Claude mirrors your formatting style — write the way you want responses to look
* Set temperature to 0 for deterministic/factual tasks
=== OpenAI (GPT-4o) ===
* Use **markdown** structure (headers, bold, lists)
* Place critical instructions at the **beginning and end** (primacy/recency effect)
* Use explicit chain-of-thought: "First analyze X, then determine Y, finally output Z"
* Specify JSON schema explicitly when expecting structured output
* Use system message for persistent behavior, user message for per-query instructions
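The system-vs-user split above can be sketched as plain payload construction; the message shape matches the Chat Completions format, but the model-agnostic helper, prompt text, and query are illustrative:

```python
# Persistent behavior lives in the system message; per-query detail in the user message.
SYSTEM_PROMPT = (
    "You are a senior backend engineer specializing in distributed systems.\n"
    "## Output Format\n"
    "Respond with a numbered list of steps."
)

def build_messages(user_query: str) -> list:
    """Build a Chat Completions-style messages list with a stable system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("How do I add idempotency to this endpoint?")
print(messages[0]["role"])  # system
```

Keeping the system message constant across turns is what makes behavior persistent; only the user messages change per query.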
===== Prompt Composition Pattern =====
For complex agents, compose prompts from modular pieces:
<code python>
def build_system_prompt(agent_type, tools, model_provider):
    """Compose system prompt from modular components."""
    components = {
        "role": load_component(f"roles/{agent_type}.txt"),
        "instructions": load_component(f"instructions/{agent_type}.txt"),
        "constraints": load_component("constraints/base.txt"),
        "output_format": load_component(f"formats/{agent_type}.txt"),
        "tools": generate_tool_instructions(tools),
        "examples": load_component(f"examples/{agent_type}.txt"),
        "error_handling": load_component("error_handling/base.txt"),
    }
    # For Claude: wrap in XML tags
    if model_provider == "anthropic":
        return "\n".join(
            f"<{key}>\n{value}\n</{key}>"
            for key, value in components.items()
        )
    # For OpenAI: use markdown headers
    return "\n\n".join(
        f"## {key.replace('_', ' ').title()}\n{value}"
        for key, value in components.items()
    )
</code>
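A self-contained variant of the same composition idea, with in-memory components instead of file loading (the section text is illustrative), shows both output styles side by side:

```python
def compose(components: dict, model_provider: str) -> str:
    """Join named components: XML tags for Claude, markdown headers for OpenAI."""
    if model_provider == "anthropic":
        return "\n".join(f"<{k}>\n{v}\n</{k}>" for k, v in components.items())
    return "\n\n".join(
        f"## {k.replace('_', ' ').title()}\n{v}" for k, v in components.items()
    )

parts = {
    "role": "You are a precise research analyst.",
    "output_format": "Answer with Summary, Key Findings, Sources.",
}
print(compose(parts, "anthropic").splitlines()[0])  # <role>
print(compose(parts, "openai").splitlines()[0])     # ## Role
```

Because the components dict is the single source of truth, adding or reordering a section changes both providers' prompts consistently.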
===== Evaluation Checklist =====
Before deploying a system prompt, verify:
* Role is specific with expertise level and domain
* Instructions are prioritized (numbered or ordered)
* Constraints use positive framing
* Output format is explicit with structure shown
* Tool usage criteria are defined (when/why/how)
* At least 1 few-shot example is included
* Error handling covers: ambiguity, tool failure, out-of-scope
* Tested on 10+ edge cases
* No contradictory instructions
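A few of the checklist items above can be approximated by a script. A heuristic lint sketch; the patterns are illustrative and no substitute for testing the prompt on real edge cases:

```python
import re

def lint_prompt(prompt: str) -> list:
    """Return warnings for checklist items a script can roughly approximate."""
    warnings = []
    if re.search(r"\b(don't|do not|never)\b", prompt, re.IGNORECASE):
        warnings.append("negative framing found; prefer positive directives")
    if "helpful assistant" in prompt.lower():
        warnings.append("role is generic; specify expertise and domain")
    if "example" not in prompt.lower():
        warnings.append("no few-shot example detected")
    if len(prompt.split()) > 5000:
        warnings.append("very long prompt; key rules may be diluted")
    return warnings

print(lint_prompt("You are a helpful assistant. Don't be verbose."))
```

A lint like this catches surface-level issues only; contradictory instructions and missing error handling still need human review and edge-case testing.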
===== Key Takeaways =====
- **Structure matters more than length**. A well-organized 200-word prompt beats a rambling 2000-word one.
- **Use the 7-component framework**: Role, Instructions, Constraints, Format, Tools, Examples, Error Handling.
- **Positive directives over negative framing**. Tell the model what to do, not what to avoid.
- **Few-shot examples** are the single highest-impact technique for consistency.
- **Match format to model**: XML tags for Claude, markdown for GPT.
- **Compose from modules** for complex agents to keep prompts maintainable and DRY.
===== See Also =====
* [[when_to_use_rag_vs_fine_tuning|When to Use RAG vs Fine-Tuning]] — Choose knowledge approach for your agent
* [[single_vs_multi_agent|Single vs Multi-Agent Architectures]] — Agent design patterns
* [[how_to_choose_chunk_size|How to Choose Chunk Size]] — Optimize RAG for your prompts
===== References =====