====== Active-Prompt ======

Active-Prompt is a method that enhances large language model reasoning by combining chain-of-thought (CoT) prompting with active learning principles. Rather than using fixed, human-annotated exemplars for all tasks, Active-Prompt dynamically selects the most uncertain questions from a pool for human annotation, creating task-adapted CoT demonstrations.((Diao et al. 2023, [[https://arxiv.org/abs/2302.12246|Active Prompting with Chain-of-Thought for Large Language Models]]))

===== Motivation =====

Traditional CoT prompting relies on a fixed set of hand-crafted exemplars applied uniformly across all tasks.((Wei et al. 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models)) This approach has a fundamental limitation: fixed exemplars may not suit every task or question type. Active-Prompt addresses this by adaptively choosing which questions to annotate, focusing annotation effort where it matters most.

===== How It Works: The Four Steps =====

Active-Prompt follows a four-step iterative process:

==== Step 1: Uncertainty Estimation ====

Given a pool of unlabeled questions, the model generates multiple answers per question through repeated sampling. Uncertainty is then measured using one of three metrics:

  * **Disagreement**: the fraction of distinct answers among the sampled responses to the same question.
  * **Entropy**: the entropy of the empirical distribution over sampled answers.
  * **Variance**: the spread of the sampled (numerical) answers.

Disagreement and entropy proved most effective in experiments.((Diao et al. 2023, Section 3.2))

==== Step 2: Selection ====

The top-k most uncertain questions are selected from the pool. These are the questions where the model struggles most and would benefit most from human-annotated demonstrations.

==== Step 3: Annotation ====

Human annotators provide chain-of-thought reasoning chains for the selected uncertain questions.
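The uncertainty estimation and selection steps (1 and 2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: ''sample_fn'' stands in for a temperature-sampled model call, and the function names and default parameters are invented for this sketch.

```python
from collections import Counter
import math


def disagreement(answers):
    """Disagreement metric: fraction of distinct answers among the k samples."""
    return len(set(answers)) / len(answers)


def entropy(answers):
    """Entropy of the empirical answer distribution; higher means more uncertain."""
    counts = Counter(answers)
    k = len(answers)
    return -sum((c / k) * math.log(c / k) for c in counts.values())


def select_uncertain(pool, sample_fn, k=5, top_n=3, metric=entropy):
    """Steps 1-2: sample k answers per question, rank by uncertainty, keep top-n."""
    scored = []
    for question in pool:
        answers = [sample_fn(question) for _ in range(k)]
        scored.append((metric(answers), question))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:top_n]]
```

The questions returned by ''select_uncertain'' are the ones handed to human annotators in Step 3; the resulting (question, rationale, answer) triples become the few-shot demonstrations used at inference time.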
This creates task-specific exemplars that address the model's actual weaknesses.

==== Step 4: Inference ====

The annotated exemplars are used as few-shot demonstrations in prompts for new test questions. The process can be repeated iteratively for further refinement. This forms an **adaptive prompting loop** in which prompts evolve based on high-value, uncertainty-driven annotations rather than a static example set.

===== Benchmark Results =====

Active-Prompt was evaluated on eight complex reasoning tasks spanning arithmetic, commonsense, and symbolic reasoning:((Diao et al. 2023, Table 1))

^ Category ^ Example Tasks ^ Performance ^
| Arithmetic | MultiArith, GSM8K | Outperforms CoT and Self-Consistency |
| Commonsense | CommonsenseQA, StrategyQA | Best results across benchmarks |
| Symbolic | Coin Flip, Letter Concatenation | Superior on uncertain examples |

Active-Prompt achieved state-of-the-art results on the tested tasks, consistently surpassing standard CoT, Self-Consistency, and Auto-CoT.

===== Comparison to Other CoT Methods =====

^ Method ^ Approach ^ Active-Prompt Advantage ^
| Standard CoT | Fixed hand-crafted exemplars | Task-adapted, uncertainty-driven selection |
| Self-Consistency | Multiple decoding paths, majority vote | Better exemplar quality through active selection |
| Auto-CoT | Automatic diverse sampling | Leverages human annotation on the hardest examples |
| Zero-Shot-CoT | "Let's think step by step" trigger | Task-specific demonstrations outperform generic triggers |

The key insight is that **not all examples are equally valuable** for prompting. By focusing annotation effort on the questions where the model is most uncertain, Active-Prompt maximizes the impact of human annotation.

===== Limitations =====

  * **Requires human annotation**: Unlike fully automated methods, Active-Prompt needs human effort to annotate selected examples.
  * **Computational overhead**: Multiple model queries per question are needed for uncertainty estimation.
  * **Pool size sensitivity**: Performance varies with the size of the initial question pool.
  * **Task specificity**: Selected exemplars may not transfer well across different task types.

===== See Also =====

  * [[prompt_engineering]]
  * [[chain_of_thought_prompting]]
  * [[few_shot_prompting]]
  * [[zero_shot_prompting]]
  * [[automatic_prompt_engineer]]

===== References =====