Active-Prompt is a method that enhances large language model reasoning by combining chain-of-thought (CoT) prompting with active learning principles. Rather than using fixed, human-annotated exemplars for all tasks, Active-Prompt dynamically selects the most uncertain examples from a pool for human annotation, creating task-adapted CoT demonstrations.1)
Traditional CoT prompting relies on a fixed set of hand-crafted exemplars applied uniformly across all tasks.2) This approach has a fundamental limitation: fixed exemplars may not be well-suited for every task or question type. Active-Prompt addresses this by adaptively choosing which questions to annotate, focusing annotation effort where it matters most.
Active-Prompt follows a four-step iterative process:
Given a pool of unlabeled questions, the model generates multiple responses per question through repeated sampling. Uncertainty is then measured over the sampled answers using one of three metrics:

- **Disagreement** — the fraction of unique final answers among the sampled responses.
- **Entropy** — the Shannon entropy of the empirical distribution over final answers.
- **Variance** — the variance of the predicted (numeric) answers, mainly applicable to arithmetic tasks.

Disagreement and entropy proved most effective in experiments.3)
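The two best-performing metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming each sampled response has already been reduced to a final answer string; the function names are hypothetical, not from the paper's code.

```python
from collections import Counter
import math

def disagreement(answers):
    """Fraction of unique final answers among the k sampled responses.
    Higher means the model is less consistent, i.e. more uncertain."""
    return len(set(answers)) / len(answers)

def entropy(answers):
    """Shannon entropy of the empirical answer distribution."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: 5 sampled final answers for a single pool question
samples = ["42", "42", "38", "42", "17"]
u_dis = disagreement(samples)  # 3 unique answers / 5 samples = 0.6
u_ent = entropy(samples)       # > 0, since the answers disagree
```

A question whose five samples all agree would score 0 on both metrics, so annotation effort flows to questions with scattered answers.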
The top-k most uncertain questions are selected from the pool. These are the questions where the model struggles most and would benefit most from human-annotated demonstrations.
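Selection then reduces to scoring every pool question and keeping the top k. A minimal sketch, assuming hypothetical helpers `sample_fn` (returns a list of sampled final answers for a question) and `uncertainty_fn` (one of the metrics above):

```python
def select_most_uncertain(questions, uncertainty_fn, sample_fn, k=8):
    """Score each pool question by uncertainty over its sampled answers,
    then return the k highest-scoring questions for human annotation."""
    scored = [(uncertainty_fn(sample_fn(q)), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:k]]
```

In practice k matches the number of few-shot exemplars the prompt will hold (e.g. 4–8), so every annotated chain is actually used.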
Human annotators write chain-of-thought rationales for the selected uncertain questions. This creates task-specific exemplars that target the model's actual weaknesses.
The annotated exemplars are used as few-shot demonstrations in prompts for new test questions. The process can be repeated iteratively for further refinement.
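Assembling the final prompt is plain string formatting. A minimal sketch, assuming exemplars are stored as (question, rationale, answer) triples; the function name and the "The answer is" template are illustrative conventions, not prescribed by the paper.

```python
def build_prompt(exemplars, test_question):
    """Format annotated (question, chain-of-thought, answer) triples as
    few-shot demonstrations, followed by the new test question."""
    parts = []
    for question, rationale, answer in exemplars:
        parts.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {test_question}\nA:")  # model completes from here
    return "\n\n".join(parts)
```

The same prompt is reused for every test question in the task, so the one-time annotation cost is amortized across the whole test set.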
This forms an adaptive prompting loop where prompts evolve based on high-value, uncertainty-driven annotations rather than static example sets.
Active-Prompt was evaluated on eight complex reasoning tasks across arithmetic, commonsense, and symbolic reasoning:4)
| Category | Example Tasks | Performance |
| --- | --- | --- |
| Arithmetic | MultiArith, GSM8K | Outperforms CoT and Self-Consistency |
| Commonsense | CommonsenseQA, StrategyQA | Best results across benchmarks |
| Symbolic | Coin Flip, Letter Concatenation | Superior on uncertain examples |
Active-Prompt achieved state-of-the-art results on the tested tasks, consistently surpassing standard CoT, Self-Consistency, and Auto-CoT methods.
| Method | Approach | Active-Prompt Advantage |
| --- | --- | --- |
| Standard CoT | Fixed hand-crafted exemplars | Task-adapted, uncertainty-driven selection |
| Self-Consistency | Multiple decoding paths, majority vote | Better exemplar quality through active selection |
| Auto-CoT | Automatic diverse sampling | Leverages human annotation on hardest examples |
| Zero-Shot-CoT | “Let's think step by step” trigger | Task-specific demonstrations outperform generic triggers |
The key insight is that not all examples are equally valuable for prompting. By focusing annotation effort on questions where the model is most uncertain, Active-Prompt maximizes the impact of human annotation.