Promptfoo is an open-source CLI tool and library for testing, evaluating, and red teaming LLM applications.1) With over 18,000 stars on GitHub, it is used by organizations including OpenAI and Anthropic to systematically validate prompt quality, detect regressions, and scan for security vulnerabilities like prompt injection and PII exposure.
Promptfoo brings software testing rigor to LLM development with YAML-based configuration, side-by-side model comparisons, 100+ red teaming attack plugins, and native GitHub Actions integration for CI/CD pipelines.2)
Promptfoo uses a declarative configuration approach. You define providers (LLM endpoints), prompts (templates with variables), and tests (input/output assertions) in a YAML config file. The tool runs each prompt through each provider with all test cases, applies assertions to score outputs pass/fail, and generates comparison reports.3)
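The run matrix described above can be sketched in a few lines of Python. This is a conceptual illustration of the cross-product execution (every prompt against every provider with every test case), not Promptfoo's actual implementation; the `render`, `evaluate`, and `call_provider` names are invented for this sketch.

```python
# Conceptual sketch of the evaluation matrix: each prompt x provider x
# test-case combination is rendered, executed, and scored pass/fail.

def render(template, variables):
    # Minimal {{var}} substitution, standing in for real templating.
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

def evaluate(providers, prompts, tests, call_provider):
    results = []
    for provider in providers:
        for prompt in prompts:
            for test in tests:
                rendered = render(prompt, test["vars"])
                output = call_provider(provider, rendered)
                # A test passes only if every assertion holds.
                passed = all(check(output) for check in test["asserts"])
                results.append({"provider": provider, "prompt": rendered,
                                "output": output, "pass": passed})
    return results

# Toy run: one fake provider that echoes the prompt, one template,
# one "contains"-style assertion.
results = evaluate(
    providers=["fake-model"],
    prompts=["Summarize: {{text}}"],
    tests=[{"vars": {"text": "the quick brown fox"},
            "asserts": [lambda out: "fox" in out]}],
    call_provider=lambda provider, prompt: prompt,
)
print(results[0]["pass"])  # → True
```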
For red teaming, Promptfoo ships with 100+ attack plugins that probe for vulnerabilities like prompt injection, PII exposure, excessive agency, and more — integrating directly into GitHub Actions to scan PRs automatically.4)
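As a rough illustration, red teaming is configured through a `redteam` section in the same config file. The plugin and strategy identifiers below are illustrative assumptions, so check the Promptfoo docs for the current names:

```yaml
# Hypothetical redteam section in promptfooconfig.yaml.
# Plugin/strategy ids are illustrative, not authoritative.
redteam:
  purpose: "Customer support chatbot for a retail site"
  plugins:
    - pii
    - excessive-agency
  strategies:
    - jailbreak
    - prompt-injection
```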
# Install Promptfoo
npm install -g promptfoo
# or use npx:
npx promptfoo@latest init

# promptfooconfig.yaml - Example evaluation config
providers:
  - openai:gpt-4o
  - anthropic:messages:claude-3-5-sonnet-20241022

prompts:
  - "Summarize this text in {{style}} style: {{text}}"
  - "Write a {{style}} summary of: {{text}}"

tests:
  - vars:
      text: "The quick brown fox jumps over the lazy dog"
      style: "professional"
    assert:
      - type: contains
        value: "fox"
      - type: llm-rubric
        value: "Response should be professional in tone"
      - type: javascript
        value: "output.length < 200"
  - vars:
      text: "Machine learning is transforming healthcare"
      style: "casual"
    assert:
      - type: similar
        value: "ML is changing medicine"
        threshold: 0.7

# Run evaluation
npx promptfoo@latest eval

# Run red teaming scan
npx promptfoo@latest redteam run -c config-id -t target-id -o results.json

# custom_provider.py - Python provider example for custom logic
def call_api(prompt, options, context):
    # Your custom LLM logic here; your_llm_call is a placeholder.
    config = options.get("config", {})
    response = your_llm_call(prompt, **config)
    return {
        "output": response.text,
        "tokenUsage": {
            "prompt": response.prompt_tokens,
            "completion": response.completion_tokens,
            "total": response.total_tokens,
        },
    }
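To wire a Python provider script like the one above into an evaluation, it can be referenced from the providers list. Promptfoo supports a `file://` syntax for loading local provider scripts (verify the exact form against the current docs):

```yaml
# Reference the custom Python provider from promptfooconfig.yaml
providers:
  - file://custom_provider.py
```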
%%{init: {'theme': 'dark'}}%%
graph TB
Dev([Developer]) -->|YAML Config| Config[promptfooconfig.yaml]
Config -->|Providers| Providers{LLM Providers}
Config -->|Prompts| Prompts[Prompt Templates]
Config -->|Tests| Tests[Test Cases + Assertions]
Providers -->|API Calls| OpenAI[OpenAI]
Providers -->|API Calls| Anthropic[Anthropic]
Providers -->|API Calls| Local[Local Models]
Providers -->|API Calls| Custom[Custom Providers]
EvalEngine[Evaluation Engine] -->|Run| Providers
EvalEngine -->|Substitute| Prompts
EvalEngine -->|Assert| Tests
Tests -->|Scoring| Results[Results Report]
Results -->|Web UI| WebUI[Interactive Dashboard]
Results -->|Export| Export[CSV / JSON / HTML]
RedTeam[Red Team Engine] -->|100+ Plugins| Attacks[Attack Scenarios]
Attacks -->|Scan| Providers
Attacks -->|Results| Security[Security Report]
GHA[GitHub Actions] -->|Trigger| EvalEngine
GHA -->|Trigger| RedTeam
GHA -->|PR Comments| PR[Pull Request]
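The GitHub Actions path in the diagram can be sketched as a minimal workflow. The step layout below is an assumption rather than a snippet from Promptfoo's docs (Promptfoo also publishes a dedicated GitHub Action); only the `eval -c` invocation mirrors the CLI shown earlier:

```yaml
# .github/workflows/promptfoo.yml - illustrative workflow sketch
name: LLM eval
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run Promptfoo evaluation
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```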