====== DeepEval ======
**DeepEval** is an open-source LLM evaluation framework by **Confident AI** that brings unit-test style testing to AI applications. With over **14,000 stars** on GitHub, it integrates with Pytest to let developers write test cases for LLM outputs — catching regressions before they reach production.
DeepEval treats LLM interactions as testable units, mirroring the rigor of software engineering testing practices. Each interaction becomes a test case with inputs, outputs, and measurable assertions — enabling teams to ship AI features with the same confidence they ship traditional code.
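The test-case idea above can be sketched in plain Python. Everything here (the ''LLMTestCase'' dataclass and the ''run_test'' helper) is an illustrative stand-in for the pattern, not DeepEval's actual API:

```python
from dataclasses import dataclass

@dataclass
class LLMTestCase:
    """One LLM interaction as a unit test: an input, the model's
    output, and something measurable to assert against."""
    input: str
    actual_output: str
    expected_keywords: list

def run_test(case: LLMTestCase) -> bool:
    """Pass if every expected keyword appears in the model's output."""
    text = case.actual_output.lower()
    return all(kw.lower() in text for kw in case.expected_keywords)

case = LLMTestCase(
    input="What is DeepEval?",
    actual_output="DeepEval is an open-source LLM evaluation framework.",
    expected_keywords=["open-source", "evaluation"],
)
print(run_test(case))  # True
```

A real metric would replace the keyword check with an LLM-scored assertion, but the shape (test case in, boolean pass/fail out) is the same.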
===== Key Features =====
  * **Pytest integration** — Write LLM tests with the familiar ''pytest'' workflow
  * **Pre-built metrics** — Faithfulness, answer relevancy, contextual precision and recall, and more
  * **Custom metrics (G-Eval)** — LLM-as-judge with custom criteria and evaluation steps
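A G-Eval-style custom metric can be approximated in a few lines. In this sketch the ''judge'' callable is a stub standing in for a real LLM call, and the class name and fields are hypothetical rather than DeepEval's ''GEval'' signature:

```python
from dataclasses import dataclass

@dataclass
class CustomJudgeMetric:
    """Sketch of an LLM-as-judge metric: criteria and evaluation steps
    are folded into a prompt, and the judge returns a score in [0, 1]."""
    criteria: str
    evaluation_steps: tuple
    threshold: float = 0.5
    score: float = 0.0

    def measure(self, actual_output: str, judge) -> float:
        prompt = (
            f"Criteria: {self.criteria}\n"
            + "\n".join(f"- {step}" for step in self.evaluation_steps)
            + f"\nOutput to grade: {actual_output}"
        )
        self.score = judge(prompt)  # a real metric would call an LLM here
        return self.score

    def is_successful(self) -> bool:
        return self.score >= self.threshold

metric = CustomJudgeMetric(
    criteria="Is the answer coherent and on-topic?",
    evaluation_steps=("Read the output", "Check it addresses the input", "Score 0 to 1"),
    threshold=0.7,
)
score = metric.measure("DeepEval brings unit testing to LLM apps.", judge=lambda prompt: 0.9)
print(score, metric.is_successful())  # 0.9 True
```

The design point is that criteria and steps are data, not code: swapping the evaluation rubric means editing two strings, not rewriting the metric.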
| Safety | Hallucination, Bias, Toxicity | Detects fabricated or unsafe responses |
| Custom | G-Eval, RAGAS | LLM-as-judge with custom criteria |
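The metric families above all reduce to scoring an output against a threshold. As a deliberately crude illustration of a faithfulness-style RAG check, the function below (a hypothetical helper, not a DeepEval metric) uses word overlap with the retrieval context; real faithfulness metrics verify individual claims with an LLM judge:

```python
def faithfulness_overlap(actual_output: str, retrieval_context: list) -> float:
    """Crude faithfulness proxy: fraction of output words that also
    appear somewhere in the retrieval context (0.0 to 1.0)."""
    context_words = set(" ".join(retrieval_context).lower().split())
    output_words = actual_output.lower().split()
    if not output_words:
        return 1.0  # an empty answer makes no unsupported claims
    grounded = sum(1 for word in output_words if word in context_words)
    return grounded / len(output_words)

context = ["deepeval is an open-source llm evaluation framework"]
print(faithfulness_overlap("deepeval is open-source", context))  # 1.0
```

Plugging this score into a threshold check gives the same pass/fail contract the table's built-in metrics expose.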
| - | |||
| - | ===== References ===== | ||
| - | |||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
| - | * [[https:// | ||
===== See Also =====
  * [[guidance|Guidance — Structured Generation Language]]