====== Active-Prompt ======

Active-Prompt is a method that enhances large language model reasoning by combining chain-of-thought (CoT) prompting with active learning principles. Rather than using fixed, human-annotated exemplars for all tasks, Active-Prompt dynamically selects the most uncertain questions from a pool for human annotation, creating task-adapted CoT demonstrations.((Diao et al. 2023, [[https://arxiv.org/abs/2302.12246|Active Prompting with Chain-of-Thought for Large Language Models]]))

===== Motivation =====

Traditional CoT prompting relies on a fixed set of hand-crafted exemplars applied uniformly across all tasks.((Wei et al. 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models)) This approach has a fundamental limitation: fixed exemplars may not suit every task or question type. Active-Prompt addresses this by adaptively choosing which questions to annotate, focusing annotation effort where it matters most.

===== How It Works: The Four Steps =====

Active-Prompt follows a four-step iterative process:

==== Step 1: Uncertainty Estimation ====

Given a pool of unlabeled questions, the model generates multiple answers per question through repeated sampling. Uncertainty is then measured using one of three metrics:

  * **Disagreement**: the fraction of distinct answers among the sampled responses to the same question.
  * **Entropy**: the entropy of the empirical distribution over sampled answers.
  * **Variance**: the spread of the sampled (numerical) answers.

Disagreement and entropy proved most effective in experiments.((Diao et al. 2023, Section 3.2))

==== Step 2: Selection ====

The top-k most uncertain questions are selected from the pool. These are the questions where the model struggles most and would benefit most from human-annotated demonstrations.

==== Step 3: Annotation ====

Human annotators provide chain-of-thought reasoning chains for the selected uncertain questions.
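The uncertainty estimation and selection steps (1 and 2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: ''sample_fn'' stands in for a temperature-sampled model call, and the function names and default parameters are invented for this sketch.

```python
from collections import Counter
import math


def disagreement(answers):
    """Disagreement metric: fraction of distinct answers among the k samples."""
    return len(set(answers)) / len(answers)


def entropy(answers):
    """Entropy of the empirical answer distribution; higher means more uncertain."""
    counts = Counter(answers)
    k = len(answers)
    return -sum((c / k) * math.log(c / k) for c in counts.values())


def select_uncertain(pool, sample_fn, k=5, top_n=3, metric=entropy):
    """Steps 1-2: sample k answers per question, rank by uncertainty, keep top-n."""
    scored = []
    for question in pool:
        answers = [sample_fn(question) for _ in range(k)]
        scored.append((metric(answers), question))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:top_n]]
```

The questions returned by ''select_uncertain'' are the ones handed to human annotators in Step 3; the resulting (question, rationale, answer) triples become the few-shot demonstrations used at inference time.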
This creates task-specific exemplars that address the model's actual weaknesses.

==== Step 4: Inference ====

The annotated exemplars are used as few-shot demonstrations in prompts for new test questions. The process can be repeated iteratively for further refinement. This forms an **adaptive prompting loop** in which prompts evolve based on high-value, uncertainty-driven annotations rather than a static example set.

===== Benchmark Results =====

Active-Prompt was evaluated on eight complex reasoning tasks spanning arithmetic, commonsense, and symbolic reasoning:((Diao et al. 2023, Table 1))

^ Category ^ Example Tasks ^ Performance ^
| Arithmetic | MultiArith, GSM8K | Outperforms CoT and Self-Consistency |
| Commonsense | CommonsenseQA, StrategyQA | Best results across benchmarks |
| Symbolic | Coin Flip, Letter Concatenation | Superior on uncertain examples |

Active-Prompt achieved state-of-the-art results on the tested tasks, consistently surpassing standard CoT, Self-Consistency, and Auto-CoT.

===== Comparison to Other CoT Methods =====

^ Method ^ Approach ^ Active-Prompt Advantage ^
| Standard CoT | Fixed hand-crafted exemplars | Task-adapted, uncertainty-driven selection |
| Self-Consistency | Multiple decoding paths, majority vote | Better exemplar quality through active selection |
| Auto-CoT | Automatic diverse sampling | Leverages human annotation on the hardest examples |
| Zero-Shot-CoT | "Let's think step by step" trigger | Task-specific demonstrations outperform generic triggers |

The key insight is that **not all examples are equally valuable** for prompting. By focusing annotation effort on the questions where the model is most uncertain, Active-Prompt maximizes the impact of human annotation.

===== Limitations =====

  * **Requires human annotation**: Unlike fully automated methods, Active-Prompt needs human effort to annotate selected examples.
  * **Computational overhead**: Multiple model queries per question are needed for uncertainty estimation.
  * **Pool size sensitivity**: Performance varies with the size of the initial question pool.
  * **Task specificity**: Selected exemplars may not transfer well across different task types.

===== See Also =====

  * [[prompt_engineering]]
  * [[chain_of_thought_prompting]]
  * [[few_shot_prompting]]
  * [[zero_shot_prompting]]
  * [[automatic_prompt_engineer]]

===== References =====