Active-Prompt is a method that enhances large language model reasoning by combining chain-of-thought (CoT) prompting with active learning principles. Rather than using fixed, human-annotated exemplars for all tasks, Active-Prompt dynamically selects the most uncertain examples from a pool for human annotation, creating task-adapted CoT demonstrations.1)
Traditional CoT prompting relies on a fixed set of hand-crafted exemplars applied uniformly across all tasks.2) This approach has a fundamental limitation: fixed exemplars may not be well-suited for every task or question type. Active-Prompt addresses this by adaptively choosing which questions to annotate, focusing annotation effort where it matters most.
Active-Prompt follows a four-step iterative process:
Given a pool of unlabeled questions, the model generates multiple responses per question through repeated sampling. Uncertainty is then measured over the sampled answers using one of three metrics:

- **Disagreement** — the fraction of unique final answers among the sampled responses.
- **Entropy** — the Shannon entropy of the empirical distribution over final answers.
- **Variance** — the variance of the predicted (numeric) answers, mainly applicable to arithmetic tasks.

Disagreement and entropy proved most effective in experiments.3)
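The two best-performing metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming each sampled response has already been reduced to a final answer string; the function names are hypothetical, not from the paper's code.

```python
from collections import Counter
import math

def disagreement(answers):
    """Fraction of unique final answers among the k sampled responses.
    Higher means the model is less consistent, i.e. more uncertain."""
    return len(set(answers)) / len(answers)

def entropy(answers):
    """Shannon entropy of the empirical answer distribution."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: 5 sampled final answers for a single pool question
samples = ["42", "42", "38", "42", "17"]
u_dis = disagreement(samples)  # 3 unique answers / 5 samples = 0.6
u_ent = entropy(samples)       # > 0, since the answers disagree
```

A question whose five samples all agree would score 0 on both metrics, so annotation effort flows to questions with scattered answers.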
The top-k most uncertain questions are selected from the pool. These are the questions where the model struggles most and would benefit most from human-annotated demonstrations.
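Selection then reduces to scoring every pool question and keeping the top k. A minimal sketch, assuming hypothetical helpers `sample_fn` (returns a list of sampled final answers for a question) and `uncertainty_fn` (one of the metrics above):

```python
def select_most_uncertain(questions, uncertainty_fn, sample_fn, k=8):
    """Score each pool question by uncertainty over its sampled answers,
    then return the k highest-scoring questions for human annotation."""
    scored = [(uncertainty_fn(sample_fn(q)), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:k]]
```

In practice k matches the number of few-shot exemplars the prompt will hold (e.g. 4–8), so every annotated chain is actually used.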
Human annotators write chain-of-thought rationales for the selected uncertain questions. This creates task-specific exemplars that target the model's actual weaknesses.
The annotated exemplars are used as few-shot demonstrations in prompts for new test questions. The process can be repeated iteratively for further refinement.
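Assembling the final prompt is plain string formatting. A minimal sketch, assuming exemplars are stored as (question, rationale, answer) triples; the function name and the "The answer is" template are illustrative conventions, not prescribed by the paper.

```python
def build_prompt(exemplars, test_question):
    """Format annotated (question, chain-of-thought, answer) triples as
    few-shot demonstrations, followed by the new test question."""
    parts = []
    for question, rationale, answer in exemplars:
        parts.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {test_question}\nA:")  # model completes from here
    return "\n\n".join(parts)
```

The same prompt is reused for every test question in the task, so the one-time annotation cost is amortized across the whole test set.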
This forms an adaptive prompting loop where prompts evolve based on high-value, uncertainty-driven annotations rather than static example sets.
Active-Prompt was evaluated on eight complex reasoning tasks across arithmetic, commonsense, and symbolic reasoning:4)
| Category | Example Tasks | Performance |
| --- | --- | --- |
| Arithmetic | MultiArith, GSM8K | Outperforms CoT and Self-Consistency |
| Commonsense | CommonsenseQA, StrategyQA | Best results across benchmarks |
| Symbolic | Coin Flip, Letter Concatenation | Superior on uncertain examples |
Active-Prompt achieved state-of-the-art results on the tested tasks, consistently surpassing standard CoT, Self-Consistency, and Auto-CoT methods.
| Method | Approach | Active-Prompt Advantage |
| --- | --- | --- |
| Standard CoT | Fixed hand-crafted exemplars | Task-adapted, uncertainty-driven selection |
| Self-Consistency | Multiple decoding paths, majority vote | Better exemplar quality through active selection |
| Auto-CoT | Automatic diverse sampling | Leverages human annotation on hardest examples |
| Zero-Shot-CoT | “Let's think step by step” trigger | Task-specific demonstrations outperform generic triggers |
The key insight is that not all examples are equally valuable for prompting. By focusing annotation effort on questions where the model is most uncertain, Active-Prompt maximizes the impact of human annotation.