====== How to Write and Structure System Prompts ======

System prompts are the foundation of AI agent behavior. Knowing how to write system prompts effectively — and understanding system prompt composition — can improve output quality by 20-30%, reduce hallucinations, and ensure consistent, reliable agent behavior. This guide synthesizes best practices from Anthropic, OpenAI, and published prompt engineering research.(([[https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api|OpenAI - Best Practices for Prompt Engineering]]))(([[https://www.claude.com/blog/best-practices-for-prompt-engineering|Anthropic - Prompt Engineering Best Practices]]))(([[https://www.promptingguide.ai|DAIR.AI - Prompt Engineering Guide]]))(([[https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock|AWS - Prompt Engineering with Claude on Bedrock]]))(([[https://www.digitalocean.com/resources/articles/prompt-engineering-best-practices|DigitalOcean - Prompt Engineering Best Practices]]))

===== Anatomy of an Effective System Prompt =====

Every production system prompt should contain these components in order of priority:

<code>
graph TD
    A[System Prompt Structure] --> B[1. Role Definition]
    A --> C[2. Core Instructions]
    A --> D[3. Constraints and Guardrails]
    A --> E[4. Output Format]
    A --> F[5. Tool Instructions]
    A --> G[6. Few-Shot Examples]
    A --> H[7. Error Handling]
    B --> B1[Identity, expertise, persona]
    C --> C1[Primary task, goals, priorities]
    D --> D1[Boundaries, limitations, safety]
    E --> E1[Structure, format, length]
    F --> F1[When and how to use tools]
    G --> G1[Input-output pairs]
    H --> H1[Fallback behavior]
    style A fill:#2196F3,color:#fff
    style B fill:#4CAF50,color:#fff
    style C fill:#4CAF50,color:#fff
    style D fill:#FF9800,color:#fff
    style E fill:#FF9800,color:#fff
    style F fill:#9C27B0,color:#fff
    style G fill:#9C27B0,color:#fff
    style H fill:#9C27B0,color:#fff
</code>

===== The Seven Components =====

=== 1. Role Definition ===

Anchor the agent's identity and expertise level. This sets the behavioral baseline for everything that follows.

**Pattern**: "You are a [expertise level] [role] specializing in [domain]. You [key behavioral trait]."

  * Good: "You are a senior backend engineer specializing in distributed systems. You prioritize correctness over cleverness."
  * Bad: "You are a helpful assistant." (too vague, no expertise anchor)

=== 2. Core Instructions ===

Define the primary task and priorities. Be specific about what the agent should do, not just what it is.

**Pattern**: State the mission, then rank priorities.

  * Good: "Your task is to review pull requests for security vulnerabilities. Priority order: (1) security flaws, (2) correctness bugs, (3) performance issues, (4) style."
  * Bad: "Help with code review." (no specifics, no priority ordering)

=== 3. Constraints and Guardrails ===

Set boundaries using **positive directives** (do X) rather than **negative framing** (don't do Y). Research shows models follow positive instructions more reliably.

^ Approach ^ Example ^ Effectiveness ^
| Positive (preferred) | "Respond in under 200 words" | More reliable |
| Negative (avoid) | "Don't write long responses" | Less reliable |
| Conditional | "If uncertain, say 'I need more information'" | Most precise |

=== 4. Output Format ===

Specify exactly how responses should be structured.
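For example, rather than just saying "return JSON", a format section can spell out the literal shape the agent must produce. A minimal sketch — the schema and field names here are illustrative placeholders, not from any cited provider guide:

```python
import json

# Hypothetical output-format section: the prompt shows the exact JSON shape,
# with field names and types, instead of a vague "return JSON".
OUTPUT_FORMAT = """Return ONLY a JSON object with this exact shape:
{
  "summary": "<string, 2-3 sentences>",
  "issues": [{"severity": "high|medium|low", "description": "<string>"}],
  "confidence": <float between 0.0 and 1.0>
}"""

# A response that follows the spec parses cleanly downstream:
sample_response = '{"summary": "No blocking issues.", "issues": [], "confidence": 0.9}'
parsed = json.loads(sample_response)
assert isinstance(parsed["issues"], list)
assert 0.0 <= parsed["confidence"] <= 1.0
```

Showing the shape verbatim gives the model a template to imitate and gives your parser a contract to validate against.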
This is critical for downstream parsing and consistency.

  * Use explicit format markers: JSON schema, markdown headers, numbered steps
  * Include field names and types when expecting structured data
  * Show the expected shape, not just "return JSON"

=== 5. Tool Instructions ===

Tell the agent when, why, and how to use each tool. Include decision criteria.

**Pattern**: "Use [tool] when [condition]. Format: [syntax]. Do not use [tool] for [anti-pattern]."

=== 6. Few-Shot Examples ===

Provide 1-3 input/output pairs that demonstrate desired behavior. They are the single most effective technique for improving output consistency.

=== 7. Error Handling ===

Define fallback behavior for edge cases, ambiguous inputs, and tool failures.

===== Comparison: Prompt Patterns =====

^ Pattern ^ When to Use ^ Effectiveness ^ Complexity ^
| **Role + Instructions** | Simple tasks, chatbots | Good | Low |
| **Role + Instructions + Format** | API responses, structured output | Very Good | Medium |
| **Full 7-Component** | Production agents, complex workflows | Excellent | High |
| **Chain-of-Thought** | Reasoning, math, analysis | +15-20% accuracy | Medium |
| **XML-Tagged Sections** | Claude/Anthropic models | Best for Claude | Medium |
| **Markdown-Structured** | OpenAI models, general use | Good cross-model | Medium |

===== Templates by Agent Type =====

=== Coding Agent ===

<code>
You are a senior software engineer with deep expertise in {language} and {framework}.
You write clean, maintainable, well-tested code.

Analyze the user's code request and provide a complete solution. Priority order:
1. Correctness -- code must work as specified
2. Security -- no vulnerabilities or injection risks
3. Readability -- clear variable names, comments for complex logic
4. Performance -- optimize only after correctness

- Never execute code directly; provide code for the user to run
- Always include error handling
- Use type hints/annotations where the language supports them
- If requirements are ambiguous, ask for clarification before coding

1. Brief problem analysis (2-3 sentences)
2. Solution approach
3. Code in fenced code block
4. Usage example
5. Test cases to verify correctness

- Use code_search when you need to find existing implementations
- Use file_read to understand current codebase context
- Do NOT use code_execution unless explicitly asked to run code

User: Write a function to validate email addresses
Response:
## Analysis
Email validation requires checking format and common edge cases.

def validate_email(email: str) -> bool:
    import re
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

## Test Cases
assert validate_email("user@example.com") == True
assert validate_email("invalid@") == False
</code>

=== Research Agent ===

<code>
You are a precise research analyst. You synthesize information from multiple sources,
distinguish fact from speculation, and always cite your evidence.

For each research query:
1. Search for relevant sources using available tools
2. Cross-reference claims across multiple sources
3. Synthesize findings into a structured response
4. Rate confidence level for each claim

- Ground every claim in provided data or search results
- If evidence is insufficient, explicitly state "Insufficient data" rather than speculating
- Distinguish between: confirmed facts, likely conclusions, and speculation
- Never present a single source's opinion as established fact

## Summary
[2-3 sentence overview]

## Key Findings
- Finding 1 [confidence: high/medium/low] -- Details [source]
- Finding 2 [confidence: high/medium/low] -- Details [source]

## Analysis
[Deeper discussion with nuance]

## Sources
[Numbered list with URLs]

- Use web_search for current information and recent developments
- Use document_search for internal knowledge base queries
- Always search before answering; never rely solely on training data
</code>

=== Customer Service Agent ===

<code>
You are an empathetic customer support specialist for {product}. You resolve issues
efficiently while maintaining a warm, professional tone.

1. Acknowledge the customer's issue
2. Identify the root problem (ask clarifying questions if needed)
3. Provide a clear solution with step-by-step instructions
4. Confirm resolution and offer additional help

- Keep responses under 150 words unless detailed steps are needed
- Escalate to a human agent if: legal issues, safety concerns, account security,
  or requests beyond your capabilities
- Never share internal system details or other customers' information
- Never make promises about refunds or credits without checking policy

Greeting -> Issue acknowledgment -> Solution steps -> Next steps

- If customer is angry: Validate emotions first, then solve
- If issue is unclear: Ask ONE focused clarifying question
- If you cannot resolve: "Let me connect you with a specialist who can help with
  this specific issue."

- Use customer_db to look up account details before answering account-specific questions
- Use knowledge_base for product/policy information
- Use escalate when the issue meets escalation criteria
</code>

===== Anti-Patterns to Avoid =====

^ Anti-Pattern ^ Problem ^ Fix ^
| "You are a helpful assistant" | Too vague, no behavioral anchor | Specify expertise, domain, personality |
| "Don't use bullet points" | Negative framing confuses models | "Use numbered lists for steps" |
| Massive prompt (5000+ words) | Dilutes important instructions | Prioritize, use sections, put key rules first |
| No examples | Inconsistent output format | Add 1-3 few-shot examples |
| "Be creative" | Uncontrolled output variance | Specify exactly what creative means for your use case |
| Contradictory instructions | Model picks one randomly | Review for conflicts, establish priority order |
| Ignoring model differences | Suboptimal performance | Use XML tags for Claude, markdown for GPT |
| No error handling | Agent hallucinates on edge cases | Define fallback behavior explicitly |

===== Provider-Specific Tips =====

=== Anthropic (Claude) ===

  * Use **XML tags** to structure sections (e.g. ''%%<instructions>%%'', ''%%<context>%%'', ''%%<example>%%'', ''%%<document>%%'')
  * Place data/context **before** instructions in long prompts (Claude attends well to the beginning and end)
  * Use ''%%<thinking>%%'' tags to enable chain-of-thought reasoning
  * Claude mirrors your formatting style — write the way you want responses to look
  * Set temperature to 0 for deterministic/factual tasks

=== OpenAI (GPT-4o) ===

  * Use **markdown** structure (headers, bold, lists)
  * Place critical instructions at the **beginning and end** (primacy/recency effect)
  * Use explicit chain-of-thought: "First analyze X, then determine Y, finally output Z"
  * Specify the JSON schema explicitly when expecting structured output
  * Use the system message for persistent behavior, the user message for per-query instructions

===== Prompt Composition Pattern =====

For complex agents, compose prompts from modular pieces:

<code python>
def build_system_prompt(agent_type, tools, model_provider):
    """Compose system prompt from modular components."""
    components = {
        "role": load_component(f"roles/{agent_type}.txt"),
        "instructions": load_component(f"instructions/{agent_type}.txt"),
        "constraints": load_component("constraints/base.txt"),
        "output_format": load_component(f"formats/{agent_type}.txt"),
        "tools": generate_tool_instructions(tools),
        "examples": load_component(f"examples/{agent_type}.txt"),
        "error_handling": load_component("error_handling/base.txt"),
    }

    # For Claude: wrap each component in matching XML tags
    if model_provider == "anthropic":
        return "\n".join(
            f"<{key}>\n{value}\n</{key}>"
            for key, value in components.items()
        )

    # For OpenAI: use markdown headers
    return "\n\n".join(
        f"## {key.replace('_', ' ').title()}\n{value}"
        for key, value in components.items()
    )
</code>

===== Evaluation Checklist =====

Before deploying a system prompt, verify:

  * Role is specific, with expertise level and domain
  * Instructions are prioritized (numbered or ordered)
  * Constraints use positive framing
  * Output format is explicit, with the structure shown
  * Tool usage criteria are defined (when/why/how)
  * At least one few-shot example is included
  * Error handling covers ambiguity, tool failure, and out-of-scope requests
  * Tested on 10+ edge cases
  * No contradictory instructions

===== Key Takeaways =====

  - **Structure matters more than length.** A well-organized 200-word prompt beats a rambling 2000-word one.
  - **Use the 7-component framework**: Role, Instructions, Constraints, Format, Tools, Examples, Error Handling.
  - **Positive directives over negative framing.** Tell the model what to do, not what to avoid.
  - **Few-shot examples** are the single highest-impact technique for consistency.
  - **Match format to model**: XML tags for Claude, markdown for GPT.
  - **Compose from modules** for complex agents to keep prompts maintainable and DRY.
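To make the composition takeaway concrete: the ''build_system_prompt'' pattern shown earlier depends on file-based ''load_component'' helpers. The same Claude-vs-OpenAI split can be exercised in a fully self-contained sketch (the function name, component keys, and component text below are illustrative, not part of any cited API):

```python
def compose_prompt(components: dict, model_provider: str) -> str:
    """Join named prompt components, formatted per provider convention."""
    if model_provider == "anthropic":
        # Claude: wrap each section in matching XML open/close tags.
        return "\n".join(
            f"<{key}>\n{value}\n</{key}>" for key, value in components.items()
        )
    # OpenAI: one markdown header per section.
    return "\n\n".join(
        f"## {key.replace('_', ' ').title()}\n{value}"
        for key, value in components.items()
    )

# Illustrative inline components instead of files on disk:
components = {
    "role": "You are a senior backend engineer.",
    "output_format": "Respond with a fenced code block.",
}

claude_prompt = compose_prompt(components, "anthropic")
gpt_prompt = compose_prompt(components, "openai")

assert claude_prompt.startswith("<role>")
assert "</output_format>" in claude_prompt
assert gpt_prompt.startswith("## Role")
assert "## Output Format" in gpt_prompt
```

Because the provider branch is isolated in one function, the same component set can be A/B-tested across Claude and GPT without duplicating any prompt text.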
===== See Also =====

  * [[when_to_use_rag_vs_fine_tuning|When to Use RAG vs Fine-Tuning]] — Choose a knowledge approach for your agent
  * [[single_vs_multi_agent|Single vs Multi-Agent Architectures]] — Agent design patterns
  * [[how_to_choose_chunk_size|How to Choose Chunk Size]] — Optimize RAG for your prompts

===== References =====