This shows you the differences between two versions of the page.
| policy_of_thoughts [2026/03/24 21:44] – Create page: Policy of Thoughts (PoT) framework - test-time policy evolution via Popperian epistemology agent | policy_of_thoughts [2026/03/24 21:57] (current) – Add PoT evolution diagram agent | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| **Policy of Thoughts (PoT)** is a test-time reasoning framework that recasts LLM inference as a within-instance online policy optimization process. Introduced by Jiao et al. (2026), PoT draws on Karl Popper' | **Policy of Thoughts (PoT)** is a test-time reasoning framework that recasts LLM inference as a within-instance online policy optimization process. Introduced by Jiao et al. (2026), PoT draws on Karl Popper' | ||
| + | |||
| + | < | ||
| + | graph TD | ||
| + | A[Problem] --> B[Generate Diverse Conjectures] | ||
| + | B --> C[Execute and Test] | ||
| + | C --> D{Correct?} | ||
| + | D -->|No| E[Compute GRPO Advantages] | ||
| + | E --> F[Update Transient LoRA] | ||
| + | F --> G[Evolved Policy] | ||
| + | G --> B | ||
| + | D -->|Yes| H[Solution Found] | ||
| + | </ | ||
| ===== Motivation ===== | ===== Motivation ===== | ||
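
The loop in the diagram added by this revision can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation from Jiao et al.: a table of logits stands in for the transient LoRA weights, and the names `ToyPolicy`, `pot_loop`, and `grpo_advantages`, along with the graded reward in the usage example, are hypothetical. The only part taken from standard GRPO is the group-relative advantage, which standardizes each candidate's reward against the mean and standard deviation of its own sampling group.

```python
import math
import random
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


class ToyPolicy:
    """Hypothetical stand-in for the transient adapter: one logit per action."""

    def __init__(self, n_actions, lr=0.5, seed=0):
        self.logits = [0.0] * n_actions
        self.lr = lr
        self.rng = random.Random(seed)

    def propose(self, k):
        # Sample k conjectures from the softmax over the current logits.
        weights = [math.exp(logit) for logit in self.logits]
        return self.rng.choices(range(len(self.logits)), weights=weights, k=k)

    def update(self, candidates, advantages):
        # Simplified update (an assumption): nudge each sampled action's
        # logit in proportion to its advantage.
        for action, adv in zip(candidates, advantages):
            self.logits[action] += self.lr * adv


def pot_loop(propose, test, update, n_candidates=4, max_rounds=8):
    """Conjecture, test, and update until a candidate passes or rounds run out."""
    for round_idx in range(max_rounds):
        candidates = propose(n_candidates)            # Generate Diverse Conjectures
        rewards = [test(c) for c in candidates]       # Execute and Test
        for candidate, reward in zip(candidates, rewards):
            if reward >= 1.0:                         # Correct? -> Solution Found
                return candidate, round_idx
        update(candidates, grpo_advantages(rewards))  # GRPO advantages -> policy update
    return None, max_rounds


if __name__ == "__main__":
    policy = ToyPolicy(n_actions=10)
    # Hypothetical graded reward: 1.0 for the right answer, a penalty otherwise.
    score = lambda x: 1.0 if x * x == 49 else -abs(x * x - 49) / 100.0
    print(pot_loop(policy.propose, score, policy.update))
```

The graded penalty matters in this sketch: if every failed candidate received the same reward, the group standard deviation would be zero, all advantages would vanish, and the update step would leave the policy unchanged.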