====== How to Write and Structure System Prompts ======
System prompts are the foundation of AI agent behavior. Writing them well, and understanding how they are composed, can substantially improve output quality, reduce hallucinations, and ensure consistent, reliable agent behavior. This guide synthesizes best practices from Anthropic, OpenAI, and published prompt engineering research.(([[https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api|OpenAI - Best Practices for Prompt Engineering]]))(([[https://www.claude.com/blog/best-practices-for-prompt-engineering|Anthropic - Prompt Engineering Best Practices]]))(([[https://www.promptingguide.ai|DAIR.AI - Prompt Engineering Guide]]))(([[https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock|AWS - Prompt Engineering with Claude on Bedrock]]))(([[https://www.digitalocean.com/resources/articles/prompt-engineering-best-practices|DigitalOcean - Prompt Engineering Best Practices]]))
===== Anatomy of an Effective System Prompt =====
Every production system prompt should contain these components in order of priority:
<code>
graph TD
    A[System Prompt Structure] --> B[1. Role Definition]
    A --> C[2. Core Instructions]
    A --> D[3. Constraints and Guardrails]
    A --> E[4. Output Format]
    A --> F[5. Tool Instructions]
    A --> G[6. Few-Shot Examples]
    A --> H[7. Error Handling]
    B --> B1[Identity, expertise, persona]
    C --> C1[Primary task, goals, priorities]
    D --> D1[Boundaries, limitations, safety]
    E --> E1[Structure, format, length]
    F --> F1[When and how to use tools]
    G --> G1[Input-output pairs]
    H --> H1[Fallback behavior]
    style A fill:#2196F3,color:#fff
    style B fill:#4CAF50,color:#fff
    style C fill:#4CAF50,color:#fff
    style D fill:#FF9800,color:#fff
    style E fill:#FF9800,color:#fff
    style F fill:#9C27B0,color:#fff
    style G fill:#9C27B0,color:#fff
    style H fill:#9C27B0,color:#fff
</code>
===== The Seven Components =====
=== 1. Role Definition ===
Anchor the agent's identity and expertise level. This sets the behavioral baseline for everything that follows.
**Pattern**: "You are a [expertise level] [role] specializing in [domain]. You [key behavioral trait]."
* Good: "You are a senior backend engineer specializing in distributed systems. You prioritize correctness over cleverness."
* Bad: "You are a helpful assistant." (too vague, no expertise anchor)
=== 2. Core Instructions ===
Define the primary task and priorities. Be specific about what the agent should do, not just what it is.
**Pattern**: State the mission, then rank priorities.
* Good: "Your task is to review pull requests for security vulnerabilities. Priority order: (1) security flaws, (2) correctness bugs, (3) performance issues, (4) style."
* Bad: "Help with code review." (no specifics, no priority ordering)
=== 3. Constraints and Guardrails ===
Set boundaries using **positive directives** (do X) rather than **negative framing** (don't do Y). Models tend to follow positive instructions more reliably than prohibitions.
^ Approach ^ Example ^ Effectiveness ^
| Positive (preferred) | "Respond in under 200 words" | More reliable |
| Negative (avoid) | "Don't write long responses" | Less reliable |
| Conditional | "If uncertain, say 'I need more information'" | Most precise |
=== 4. Output Format ===
Specify exactly how responses should be structured. This is critical for downstream parsing and consistency.
* Use explicit format markers: JSON schema, markdown headers, numbered steps
* Include field names and types when expecting structured data
* Show the expected shape, not just "return JSON"
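One way to make "show the expected shape" concrete is to spell out the JSON structure in the prompt and validate responses against it downstream. A minimal sketch, assuming a hypothetical three-field schema (''summary'', ''risk_level'', ''findings'') chosen for illustration:

```python
import json

# Hypothetical output-format section: shows the exact shape, not just "return JSON".
FORMAT_SECTION = """Return ONLY a JSON object with this exact shape:
{
  "summary": "<string, 2-3 sentences>",
  "risk_level": "<one of: low, medium, high>",
  "findings": ["<string>", ...]
}"""

REQUIRED_FIELDS = {"summary": str, "risk_level": str, "findings": list}

def validate_response(raw: str) -> dict:
    """Parse a model response and check it matches the declared shape."""
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["risk_level"] not in {"low", "medium", "high"}:
        raise ValueError("risk_level out of range")
    return data

reply = '{"summary": "Looks safe.", "risk_level": "low", "findings": []}'
print(validate_response(reply)["risk_level"])  # low
```

Keeping the prompt's declared shape and the parser's checks in one place makes drift between them easy to spot.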
=== 5. Tool Instructions ===
Tell the agent when, why, and how to use each tool. Include decision criteria.
**Pattern**: "Use [tool] when [condition]. Format: [syntax]. Do not use [tool] for [anti-pattern]."
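The pattern above can be generated mechanically from a tool registry. A sketch, assuming a made-up spec format with ''when''/''syntax''/''avoid'' fields (the tool names and conditions are illustrative):

```python
# Hypothetical tool specs; the field names (when, syntax, avoid) are assumptions.
TOOLS = [
    {"name": "web_search", "when": "the question involves current events",
     "syntax": "web_search(query: str)", "avoid": "questions answerable from context"},
    {"name": "calculator", "when": "exact arithmetic is required",
     "syntax": "calculator(expression: str)", "avoid": "rough estimates"},
]

def render_tool_instructions(tools: list) -> str:
    """Expand each spec into the when/how/anti-pattern sentence pattern."""
    lines = []
    for t in tools:
        lines.append(
            f"Use {t['name']} when {t['when']}. "
            f"Format: {t['syntax']}. "
            f"Do not use {t['name']} for {t['avoid']}."
        )
    return "\n".join(lines)

print(render_tool_instructions(TOOLS))
```

Generating this section from the same registry the agent runtime uses keeps prompt and tool definitions from drifting apart.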
=== 6. Few-Shot Examples ===
Provide 1-3 input/output pairs that demonstrate desired behavior. This is often the single highest-impact technique for improving consistency.
=== 7. Error Handling ===
Define fallback behavior for edge cases, ambiguous inputs, and tool failures.
===== Comparison: Prompt Patterns =====
^ Pattern ^ When to Use ^ Effectiveness ^ Complexity ^
| **Role + Instructions** | Simple tasks, chatbots | Good | Low |
| **Role + Instructions + Format** | API responses, structured output | Very Good | Medium |
| **Full 7-Component** | Production agents, complex workflows | Excellent | High |
| **Chain-of-Thought** | Reasoning, math, analysis | +15-20% accuracy | Medium |
| **XML-Tagged Sections** | Claude/Anthropic models | Best for Claude | Medium |
| **Markdown-Structured** | OpenAI models, general use | Good cross-model | Medium |
===== Templates by Agent Type =====
=== Coding Agent ===
<code>
# Role
You are a senior software engineer with deep expertise in {language}
and {framework}. You write clean, maintainable, well-tested code.

# Instructions
Analyze the user's code request and provide a complete solution.
Priority order:
1. Correctness -- code must work as specified
2. Security -- no vulnerabilities or injection risks
3. Readability -- clear variable names, comments for complex logic
4. Performance -- optimize only after correctness

# Constraints
- Never execute code directly; provide code for the user to run
- Always include error handling
- Use type hints/annotations where the language supports them
- If requirements are ambiguous, ask for clarification before coding

# Output Format
1. Brief problem analysis (2-3 sentences)
2. Solution approach
3. Code in fenced code block
4. Usage example
5. Test cases to verify correctness

# Tools
- Use code_search when you need to find existing implementations
- Use file_read to understand current codebase context
- Do NOT use code_execution unless explicitly asked to run code

# Example
User: Write a function to validate email addresses
Response:
## Analysis
Email validation requires checking format and common edge cases.

def validate_email(email: str) -> bool:
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

## Test Cases
assert validate_email("user@example.com") == True
assert validate_email("invalid@") == False
</code>
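The regex in the template's few-shot example runs as-is. A quick check (note the pattern is a deliberate simplification and rejects some addresses that are valid under RFC 5322):

```python
import re

def validate_email(email: str) -> bool:
    # Simple format check; intentionally stricter than full RFC 5322.
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

assert validate_email("user@example.com") is True
assert validate_email("invalid@") is False
assert validate_email("no-at-sign.com") is False
```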
=== Research Agent ===
<code>
# Role
You are a precise research analyst. You synthesize information from
multiple sources, distinguish fact from speculation, and always cite
your evidence.

# Instructions
For each research query:
1. Search for relevant sources using available tools
2. Cross-reference claims across multiple sources
3. Synthesize findings into a structured response
4. Rate confidence level for each claim

# Constraints
- Ground every claim in provided data or search results
- If evidence is insufficient, explicitly state "Insufficient data"
  rather than speculating
- Distinguish between: confirmed facts, likely conclusions, and
  speculation
- Never present a single source's opinion as established fact

# Output Format
## Summary
[2-3 sentence overview]
## Key Findings
- Finding 1 [confidence: high/medium/low] -- Details [source]
- Finding 2 [confidence: high/medium/low] -- Details [source]
## Analysis
[Deeper discussion with nuance]
## Sources
[Numbered list with URLs]

# Tools
- Use web_search for current information and recent developments
- Use document_search for internal knowledge base queries
- Always search before answering; never rely solely on training data
</code>
=== Customer Service Agent ===
<code>
# Role
You are an empathetic customer support specialist for {product}.
You resolve issues efficiently while maintaining a warm, professional
tone.

# Instructions
1. Acknowledge the customer's issue
2. Identify the root problem (ask clarifying questions if needed)
3. Provide a clear solution with step-by-step instructions
4. Confirm resolution and offer additional help

# Constraints
- Keep responses under 150 words unless detailed steps are needed
- Escalate to human agent if: legal issues, safety concerns,
  account security, or requests beyond your capabilities
- Never share internal system details or other customers' information
- Never make promises about refunds or credits without checking policy

# Output Format
Greeting -> Issue acknowledgment -> Solution steps -> Next steps

# Error Handling
- If customer is angry: Validate emotions first, then solve
- If issue is unclear: Ask ONE focused clarifying question
- If you cannot resolve: "Let me connect you with a specialist
  who can help with this specific issue."

# Tools
- Use customer_db to look up account details before answering
  account-specific questions
- Use knowledge_base for product/policy information
- Use escalate when the issue meets escalation criteria
</code>
===== Anti-Patterns to Avoid =====
^ Anti-Pattern ^ Problem ^ Fix ^
| "You are a helpful assistant" | Too vague, no behavioral anchor | Specify expertise, domain, personality |
| "Don't use bullet points" | Negative framing confuses models | "Use numbered lists for steps" |
| Massive prompt (5000+ words) | Dilutes important instructions | Prioritize, use sections, put key rules first |
| No examples | Inconsistent output format | Add 1-3 few-shot examples |
| "Be creative" | Uncontrolled output variance | Specify exactly what creative means for your use case |
| Contradictory instructions | Model picks one randomly | Review for conflicts, establish priority order |
| Ignoring model differences | Suboptimal performance | Use XML tags for Claude, markdown for GPT |
| No error handling | Agent hallucinates on edge cases | Define fallback behavior explicitly |
===== Provider-Specific Tips =====
=== Anthropic (Claude) ===
* Use **XML tags** to structure sections: ''<role>'', ''<instructions>'', ''<context>'', ''<examples>''
* Place data/context **before** instructions in long prompts (Claude attends well to beginning and end)
* Use ''<thinking>'' tags to enable chain-of-thought reasoning
* Claude mirrors your formatting style — write the way you want responses to look
* Set temperature to 0 for deterministic/factual tasks
=== OpenAI (GPT-4o) ===
* Use **markdown** structure (headers, bold, lists)
* Place critical instructions at the **beginning and end** (primacy/recency effect)
* Use explicit chain-of-thought: "First analyze X, then determine Y, finally output Z"
* Specify JSON schema explicitly when expecting structured output
* Use system message for persistent behavior, user message for per-query instructions
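The system-vs-user split above can be sketched as plain payload construction; the message shape matches the Chat Completions format, but the model-agnostic helper, prompt text, and query are illustrative:

```python
# Persistent behavior lives in the system message; per-query detail in the user message.
SYSTEM_PROMPT = (
    "You are a senior backend engineer specializing in distributed systems.\n"
    "## Output Format\n"
    "Respond with a numbered list of steps."
)

def build_messages(user_query: str) -> list:
    """Build a Chat Completions-style messages list with a stable system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("How do I add idempotency to this endpoint?")
print(messages[0]["role"])  # system
```

Keeping the system message constant across turns is what makes behavior persistent; only the user messages change per query.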
===== Prompt Composition Pattern =====
For complex agents, compose prompts from modular pieces:
<code python>
def build_system_prompt(agent_type, tools, model_provider):
    """Compose system prompt from modular components."""
    components = {
        "role": load_component(f"roles/{agent_type}.txt"),
        "instructions": load_component(f"instructions/{agent_type}.txt"),
        "constraints": load_component("constraints/base.txt"),
        "output_format": load_component(f"formats/{agent_type}.txt"),
        "tools": generate_tool_instructions(tools),
        "examples": load_component(f"examples/{agent_type}.txt"),
        "error_handling": load_component("error_handling/base.txt"),
    }
    # For Claude: wrap in XML tags
    if model_provider == "anthropic":
        return "\n".join(
            f"<{key}>\n{value}\n</{key}>"
            for key, value in components.items()
        )
    # For OpenAI: use markdown headers
    return "\n\n".join(
        f"## {key.replace('_', ' ').title()}\n{value}"
        for key, value in components.items()
    )
</code>
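A self-contained variant of the same composition idea, with in-memory components instead of file loading (the section text is illustrative), shows both output styles side by side:

```python
def compose(components: dict, model_provider: str) -> str:
    """Join named components: XML tags for Claude, markdown headers for OpenAI."""
    if model_provider == "anthropic":
        return "\n".join(f"<{k}>\n{v}\n</{k}>" for k, v in components.items())
    return "\n\n".join(
        f"## {k.replace('_', ' ').title()}\n{v}" for k, v in components.items()
    )

parts = {
    "role": "You are a precise research analyst.",
    "output_format": "Answer with Summary, Key Findings, Sources.",
}
print(compose(parts, "anthropic").splitlines()[0])  # <role>
print(compose(parts, "openai").splitlines()[0])     # ## Role
```

Because the components dict is the single source of truth, adding or reordering a section changes both providers' prompts consistently.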
===== Evaluation Checklist =====
Before deploying a system prompt, verify:
* Role is specific with expertise level and domain
* Instructions are prioritized (numbered or ordered)
* Constraints use positive framing
* Output format is explicit with structure shown
* Tool usage criteria are defined (when/why/how)
* At least 1 few-shot example is included
* Error handling covers: ambiguity, tool failure, out-of-scope
* Tested on 10+ edge cases
* No contradictory instructions
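A few of the checklist items above can be approximated by a script. A heuristic lint sketch; the patterns are illustrative and no substitute for testing the prompt on real edge cases:

```python
import re

def lint_prompt(prompt: str) -> list:
    """Return warnings for checklist items a script can roughly approximate."""
    warnings = []
    if re.search(r"\b(don't|do not|never)\b", prompt, re.IGNORECASE):
        warnings.append("negative framing found; prefer positive directives")
    if "helpful assistant" in prompt.lower():
        warnings.append("role is generic; specify expertise and domain")
    if "example" not in prompt.lower():
        warnings.append("no few-shot example detected")
    if len(prompt.split()) > 5000:
        warnings.append("very long prompt; key rules may be diluted")
    return warnings

print(lint_prompt("You are a helpful assistant. Don't be verbose."))
```

A lint like this catches surface-level issues only; contradictory instructions and missing error handling still need human review and edge-case testing.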
===== Key Takeaways =====
- **Structure matters more than length**. A well-organized 200-word prompt beats a rambling 2000-word one.
- **Use the 7-component framework**: Role, Instructions, Constraints, Format, Tools, Examples, Error Handling.
- **Positive directives over negative framing**. Tell the model what to do, not what to avoid.
- **Few-shot examples** are the single highest-impact technique for consistency.
- **Match format to model**: XML tags for Claude, markdown for GPT.
- **Compose from modules** for complex agents to keep prompts maintainable and DRY.
===== See Also =====
* [[when_to_use_rag_vs_fine_tuning|When to Use RAG vs Fine-Tuning]] — Choose knowledge approach for your agent
* [[single_vs_multi_agent|Single vs Multi-Agent Architectures]] — Agent design patterns
* [[how_to_choose_chunk_size|How to Choose Chunk Size]] — Optimize RAG for your prompts
===== References =====