Active-Prompt

Active-Prompt is a method that enhances large language model reasoning by combining chain-of-thought (CoT) prompting with active learning principles. Rather than using fixed, human-annotated exemplars for all tasks, Active-Prompt dynamically selects the most uncertain examples from a pool for human annotation, creating task-adapted CoT demonstrations.1)

Motivation

Traditional CoT prompting relies on a fixed set of hand-crafted exemplars applied uniformly across all tasks.2) This approach has a fundamental limitation: fixed exemplars may not be well-suited for every task or question type. Active-Prompt addresses this by adaptively choosing which questions to annotate, focusing annotation effort where it matters most.

How It Works: The Four Steps

Active-Prompt follows a four-step iterative process:

Step 1: Uncertainty Estimation

Given a pool of unlabeled questions, the model generates multiple responses per question through repeated sampling. Uncertainty is measured using one of three metrics:

- Disagreement: the fraction of unique answers among the sampled responses
- Entropy: the entropy of the empirical answer distribution
- Variance: the spread of the predicted (numerical) answers

Disagreement and entropy proved most effective in experiments.3)
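The disagreement and entropy metrics can be sketched in a few lines of Python; the sampled answers below are hypothetical stand-ins for repeated model outputs, not real model data.

```python
import math
from collections import Counter

def disagreement(answers):
    """Fraction of unique answers among the sampled responses (u / k)."""
    return len(set(answers)) / len(answers)

def entropy(answers):
    """Shannon entropy of the empirical answer distribution."""
    total = len(answers)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(answers).values())

# Five hypothetical sampled answers to one pool question
samples = ["7", "7", "12", "7", "9"]
print(disagreement(samples))  # 3 unique answers / 5 samples = 0.6
print(entropy(samples))
```

Both metrics rise as the model's sampled answers scatter, so either can serve as the uncertainty score in Step 1.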

Step 2: Selection

The top-k most uncertain questions are selected from the pool. These are the questions where the model struggles most and would benefit most from human-annotated demonstrations.
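Selection is then a straightforward ranking. A minimal sketch, using the disagreement score over invented sample data:

```python
def disagreement(answers):
    """Fraction of unique answers among the sampled responses."""
    return len(set(answers)) / len(answers)

# question -> answers sampled from the model (hypothetical data)
sampled = {
    "q1": ["4", "4", "4", "4"],   # consistent answers -> low uncertainty
    "q2": ["8", "3", "5", "8"],   # heavy disagreement -> high uncertainty
    "q3": ["2", "2", "7", "2"],
}

# Step 2: keep the k questions with the highest uncertainty score
k = 2
top_k = sorted(sampled, key=lambda q: disagreement(sampled[q]), reverse=True)[:k]
print(top_k)  # ['q2', 'q3'] -- these go to human annotators
```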

Step 3: Annotation

Human annotators provide chain-of-thought reasoning chains for the selected uncertain questions. This creates task-specific exemplars that address the model's actual weaknesses.

Step 4: Inference

The annotated exemplars are used as few-shot demonstrations in prompts for new test questions. The process can be repeated iteratively for further refinement.

This forms an adaptive prompting loop where prompts evolve based on high-value, uncertainty-driven annotations rather than static example sets.
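The four steps above can be condensed into one orchestration function. Here `sample_fn` (queries the model for one answer) and `annotate_fn` (collects a human-written rationale) are assumed interfaces introduced for illustration, not part of any published implementation.

```python
def active_prompt(sample_fn, annotate_fn, pool, k=8, num_samples=10):
    """Sketch of the four-step Active-Prompt loop.

    sample_fn(prompt) -> one sampled answer string (assumed model interface)
    annotate_fn(question) -> human-written chain-of-thought (assumed)
    """
    # Step 1: uncertainty estimation via repeated sampling (disagreement metric)
    def uncertainty(question):
        answers = {sample_fn(question) for _ in range(num_samples)}
        return len(answers) / num_samples

    # Step 2: select the k most uncertain pool questions
    selected = sorted(pool, key=uncertainty, reverse=True)[:k]

    # Step 3: humans write reasoning chains for the selected questions
    exemplars = [(q, annotate_fn(q)) for q in selected]

    # Step 4: prepend the annotated exemplars as few-shot demonstrations
    def answer(test_question):
        demos = "\n\n".join(f"Q: {q}\nA: {cot}" for q, cot in exemplars)
        return sample_fn(f"{demos}\n\nQ: {test_question}\nA:")

    return answer
```

Calling the returned `answer` function runs inference with the new exemplars; running the whole loop again with those exemplars in place gives the iterative refinement described above.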

Benchmark Results

Active-Prompt was evaluated on eight complex reasoning tasks across arithmetic, commonsense, and symbolic reasoning:4)

Category    | Example Tasks                    | Performance
----------- | -------------------------------- | -------------------------------------
Arithmetic  | MultiArith, GSM8K                | Outperforms CoT and Self-Consistency
Commonsense | CommonsenseQA, StrategyQA        | Best results across benchmarks
Symbolic    | Coin Flip, Letter Concatenation  | Superior on uncertain examples

Active-Prompt achieved state-of-the-art results on the tested tasks, consistently surpassing standard CoT, Self-Consistency, and Auto-CoT methods.

Comparison to Other CoT Methods

Method           | Approach                                | Active-Prompt Advantage
---------------- | --------------------------------------- | --------------------------------------------------------
Standard CoT     | Fixed hand-crafted exemplars            | Task-adapted, uncertainty-driven selection
Self-Consistency | Multiple decoding paths, majority vote  | Better exemplar quality through active selection
Auto-CoT         | Automatic diverse sampling              | Leverages human annotation on hardest examples
Zero-Shot-CoT    | “Let's think step by step” trigger      | Task-specific demonstrations outperform generic triggers

The key insight is that not all examples are equally valuable for prompting. By focusing annotation effort on questions where the model is most uncertain, Active-Prompt maximizes the impact of human annotation.

Limitations

Active-Prompt still requires human annotators to write reasoning chains for the selected questions, so annotation effort is reduced but not eliminated. The uncertainty-estimation step also needs repeated sampling (multiple model calls per pool question), which adds inference cost, and the method's gains depend on how well the chosen uncertainty metric reflects genuine model weakness.

See Also

Chain-of-Thought (CoT) Prompting
Self-Consistency
Auto-CoT
Zero-Shot-CoT

References

1)
Diao et al. 2023, Active Prompting with Chain-of-Thought for Large Language Models
2)
Wei et al. 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
3)
Diao et al. 2023, Section 3.2
4)
Diao et al. 2023, Table 1