====== Promptfoo ======
**Promptfoo** is an open-source CLI tool and library for testing, evaluating, and red teaming LLM applications. With over **18,000 stars** on GitHub, it is used by organizations including OpenAI and Anthropic to systematically validate prompt quality and detect regressions.
Promptfoo brings software testing rigor to LLM development with YAML-based configuration, assertion-based test cases, and CI/CD integration.
===== How It Works =====
Promptfoo uses a declarative configuration approach. You define **providers** (LLM endpoints), **prompts** (templates with variables), and **tests** (input/output expectations with assertions). Promptfoo then runs each prompt against each provider and test case and scores the results.
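A minimal ''promptfooconfig.yaml'' illustrating the three building blocks. This is a sketch: the provider id, prompt text, and assertion values below are placeholder examples, not prescribed settings.

```yaml
# promptfooconfig.yaml — minimal sketch; provider and model names are examples
providers:
  - openai:gpt-4o-mini

prompts:
  - "Summarize the following text in one sentence: {{text}}"

tests:
  - vars:
      text: "Promptfoo is a tool for testing LLM applications."
    assert:
      - type: contains
        value: "testing"
```

Running ''promptfoo eval'' in the directory containing this file executes every prompt × provider × test combination and reports pass/fail per assertion.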
For red teaming, Promptfoo ships with 100+ attack plugins that probe for vulnerabilities like prompt injection, PII exposure, excessive agency, and more, and it integrates directly into GitHub Actions to scan pull requests automatically.
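Red teaming is driven by the same YAML configuration. A sketch, assuming illustrative plugin ids and an illustrative ''purpose'' string (consult the plugin catalog for the exact plugin names your version supports):

```yaml
# redteam configuration sketch — plugin ids shown here are illustrative
redteam:
  purpose: "Customer support assistant for a retail store"
  plugins:
    - pii               # probe for leaks of personally identifiable information
    - excessive-agency  # probe for actions beyond the assistant's mandate
    - prompt-injection  # probe for instruction-override attacks
```

The generated attack cases are then evaluated like ordinary tests, so red-team findings surface in the same reports and CI checks as regression results.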
===== Key Features =====
  * **Automated evaluation** — Define test cases with assertions, thresholds, and scoring
  * **Side-by-side comparison** — Compare prompt and model versions to catch regressions
  * **100+ red teaming plugins** — Prompt injection, PII exposure, excessive agency scanning
  * **Multi-provider support** — OpenAI, Anthropic, Azure, Google, local models, custom APIs
  * **CI/CD integration** — GitHub Actions for automated PR evaluation and security scanning
  * **Output formats** — CLI, web UI, CSV, JSON, YAML, HTML exports
  * **Caching** — Reusable LLM request cache for speed and cost savings
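The assertion model behind automated evaluation can be sketched conceptually in Python. This is not promptfoo's actual API, only an illustration of how declarative assertion dictionaries map onto checks against model output; the ''contains'', ''icontains'', and ''regex'' type names mirror promptfoo's built-in assertion types.

```python
import re

# Conceptual sketch of declarative output assertions; not promptfoo's real API.
def check_assertion(output: str, assertion: dict) -> bool:
    """Return True if the model output satisfies one declarative assertion."""
    kind = assertion["type"]
    if kind == "contains":
        return assertion["value"] in output
    if kind == "icontains":
        return assertion["value"].lower() in output.lower()
    if kind == "regex":
        return re.search(assertion["value"], output) is not None
    raise ValueError(f"unknown assertion type: {kind}")

def run_test(output: str, assertions: list[dict]) -> bool:
    """A test case passes only when every one of its assertions passes."""
    return all(check_assertion(output, a) for a in assertions)

result = run_test(
    "Promptfoo brings testing rigor to LLM apps.",
    [
        {"type": "contains", "value": "testing"},
        {"type": "regex", "value": r"LLM\s+apps"},
    ],
)
print(result)  # → True
```

In the real tool these checks run per provider and per test case, which is what makes side-by-side comparison of prompt and model versions possible.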
  * **Toxicity** — Harmful content generation testing
  * **Jailbreaking** — Safety bypass attempts
| - | |||
| - | ===== References ===== | ||
| - | |||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
===== See Also =====
  * [[outlines|Outlines — Structured Output via Constrained Decoding]]
  * [[arize_phoenix|Arize Phoenix — AI Observability]]
| + | |||
| + | ===== References ===== | ||