Least-to-Most Prompting (LtM) is a prompting paradigm introduced by Zhou et al. at Google Research in 2022 that enables LLMs to solve complex problems by decomposing them into an ordered sequence of simpler sub-problems, solved from easiest to hardest. Each sub-problem's solution is passed as context to the next, enabling compositional generalization far beyond what standard few-shot prompting achieves.
Standard few-shot prompting and even Chain-of-Thought (CoT) struggle when test problems are harder or longer than the provided examples. LtM overcomes this limitation by explicitly separating the planning of how to decompose a problem from the execution of solving each piece. This separation allows models to generalize to problems significantly more complex than those in the prompt exemplars.
LtM operates in two distinct stages: first *decomposition*, in which the model breaks the original problem into simpler sub-problems, and then *sequential solving*, in which each sub-problem is answered in order with the prior answers in context.

Decomposition yields an ordered set of sub-problems:
$$Q \rightarrow \{Q_1, Q_2, \ldots, Q_n\} \quad \text{where } \text{complexity}(Q_i) \leq \text{complexity}(Q_{i+1})$$
Each sub-problem is solved with full prior context:
$$A_i = \text{LLM}(Q_i \mid \{(Q_1, A_1), (Q_2, A_2), \ldots, (Q_{i-1}, A_{i-1})\})$$
The final answer $A_n$ addresses the original problem $Q$. This sequential conditioning creates an auditable chain of reasoning in which each step builds directly on the answers to its prerequisites.
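The growing-context pattern is easiest to see on the last-letter concatenation task from the LtM paper: each sub-problem is a longer prefix of the word list, and each step only adds one letter to the previous answer. A minimal deterministic sketch, with a stub solver standing in for the LLM:

```python
# Toy illustration of LtM's growing context on last-letter concatenation.
# The "solver" here is a deterministic stub, not a model call, so the
# chain of (sub-problem, answer) pairs is easy to inspect.

def decompose(words):
    # Least-to-most ordering: 1-word prefix, 2-word prefix, ..., full list.
    return [words[:i + 1] for i in range(len(words))]

def solve_step(subproblem_words, prior):
    # Each step reuses the previous answer (the concatenation so far)
    # and only does the incremental work: append one more last letter.
    prev = prior[-1][1] if prior else ""
    return prev + subproblem_words[-1][-1]

def least_to_most_toy(words):
    context = []  # [(sub-problem, answer), ...] grows with every step
    for sub in decompose(words):
        context.append((sub, solve_step(sub, context)))
    return context[-1][1]  # answer to the full problem
```

For example, `least_to_most_toy(["think", "machine", "learning"])` builds up `"k"`, then `"ke"`, then `"keg"`; a 10-word input works identically even though the decomposition exemplars might only show 3-word cases, which is exactly the length generalization LtM targets.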
| Aspect | Chain-of-Thought | Least-to-Most |
|---|---|---|
| Structure | Monolithic: interleaved reasoning and answers in a single prompt | Separated: decomposition phase distinct from solving phase |
| Ordering | Implicit in exemplar patterns | Explicit least-to-most ordering |
| Context | Fixed exemplar context | Growing context: each step sees all prior solutions |
| Generalization | Struggles with OOD harder-than-seen problems | Robust to problems harder than exemplars |
| Dependency | No explicit dependency graph | Explicit dependency chain between sub-problems |
A reference implementation of the two stages against the OpenAI chat completions API:

```python
def least_to_most(question, exemplars, client):
    """Two-stage Least-to-Most prompting.

    `client` is an OpenAI client (e.g. openai.OpenAI()); `exemplars`
    holds few-shot examples for the decomposition and solving stages.
    """
    # Stage 1: decompose into sub-problems, ordered easiest to hardest.
    decompose_prompt = (
        "Break the following problem into simpler sub-problems, "
        "ordered from easiest to hardest.\n\n"
    )
    for ex in exemplars["decomposition"]:
        decompose_prompt += f"Q: {ex['question']}\nSub-problems: {ex['subproblems']}\n\n"
    decompose_prompt += f"Q: {question}\nSub-problems:"

    decomposition = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": decompose_prompt}],
        temperature=0,
    ).choices[0].message.content
    subproblems = [s.strip() for s in decomposition.split("\n") if s.strip()]

    # Stage 2: solve sequentially, passing all prior (Q, A) pairs forward.
    solved_context = []
    for subproblem in subproblems:
        solve_prompt = "Solve each sub-problem using prior answers.\n\n"
        for ex in exemplars["solving"]:
            solve_prompt += f"{ex}\n\n"
        for sq, sa in solved_context:
            solve_prompt += f"Q: {sq}\nA: {sa}\n\n"
        solve_prompt += f"Q: {subproblem}\nA:"

        answer = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": solve_prompt}],
            temperature=0,
        ).choices[0].message.content
        solved_context.append((subproblem, answer))

    return solved_context[-1][1]  # answer to the hardest (final) sub-problem
```
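For offline experimentation, `least_to_most` only relies on one method of the client, `chat.completions.create`, returning an object exposing `.choices[0].message.content`. A stub that replays canned responses through that same interface (the question and replies below are illustrative, not real model output):

```python
from types import SimpleNamespace

class StubClient:
    """Replays canned responses through the attribute path the OpenAI
    client exposes: client.chat.completions.create(...)."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **_kwargs):
        # Pop the next canned reply and wrap it in the response shape
        # that least_to_most unpacks: .choices[0].message.content
        content = self._replies.pop(0)
        message = SimpleNamespace(content=content)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

# One reply for the decomposition call, then one per sub-problem.
client = StubClient([
    "How many apples does Anna start with?\n"
    "How many apples remain after she gives 3 away?",
    "Anna starts with 5 apples.",
    "5 - 3 = 2, so 2 apples remain.",
])
```

Passing this stub as `client` to `least_to_most` (with empty exemplar lists) walks both stages deterministically: the first reply is split into two sub-problems, the next two replies answer them in order, and the last reply is returned as the final answer.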
Evaluated on PaLM-540B with 8-shot exemplars and temperature 0:
| Benchmark | Standard Prompting | CoT | Least-to-Most | Key Insight |
|---|---|---|---|---|
| SCAN (length-4 OOD) | 14.3% | – | 99.7% | ~7x gain on compositional generalization |
| DROP | – | 69.1% | 78.2% | +9.1 points via entity extraction then arithmetic |
| GSM8K | – | 58.0% | 74.4% | Solves 7-step problems from 3-step demos |
The most striking result is on SCAN, where LtM achieves near-perfect accuracy on compositional commands that are far longer and more complex than any seen in the exemplars. This demonstrates the power of explicit decomposition for out-of-distribution generalization.