Step-Back Prompting is a reasoning technique introduced by Zheng et al. at Google DeepMind in 2023 that improves LLM performance on complex tasks by first abstracting a problem to its high-level principles before attempting detailed reasoning.1) The method draws inspiration from how human experts approach difficult problems: stepping back to identify the relevant concepts before diving into specifics.
Standard prompting and even Chain-of-Thought (CoT) methods can fail on complex reasoning tasks because they attempt to reason directly over low-level details, leading to compounding errors in intermediate steps. Step-Back Prompting addresses this by inserting an abstraction step that identifies the relevant principles, concepts, or frameworks before the model reasons toward a solution.
The technique operates in two phases:

- **Abstraction** — the model is prompted to pose, and then answer, a more generic "step-back question" that surfaces the high-level principles or concepts underlying the original question.
- **Reasoning** — the model answers the original question, grounding its reasoning in the principles retrieved during abstraction.
This two-phase approach grounds the reasoning chain in verified principles, reducing the likelihood of hallucination or faulty intermediate steps.
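As a minimal sketch of this control flow, the two phases can be simulated with a stand-in model. The `toy_llm` below and its canned responses are purely illustrative (not from the paper); any chat-model callable could be dropped in:

```python
def step_back(question, llm):
    # Phase 1a: abstract the question to a more general step-back question
    sb_question = llm(f"What general principle underlies: {question}?")
    # Phase 1b: derive the relevant principle by answering it
    principle = llm(sb_question)
    # Phase 2: answer the original question, grounded in the principle
    return llm(f"Using the principle '{principle}', answer: {question}")

def toy_llm(prompt):
    # Canned responses standing in for a real model, for demonstration only
    if "general principle" in prompt:
        return "What law relates pressure, volume, and temperature of an ideal gas?"
    if "ideal gas" in prompt and "answer:" not in prompt:
        return "PV = nRT (ideal gas law)"
    return "Apply PV = nRT: halving V at constant T doubles P."

print(step_back("If temperature is constant and volume halves, "
                "what happens to pressure?", toy_llm))
```

Because the answer is conditioned on an explicitly stated principle, a faulty detail early in the chain is less likely to propagate.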
Given an original question $q$, the process is:
$$q_{\text{sb}} = \text{StepBack}(q)$$
$$p = \text{LLM}(q_{\text{sb}})$$
$$a = \text{LLM}(q \mid q_{\text{sb}}, p)$$
where $q_{\text{sb}}$ is the step-back question, $p$ is the derived principle or concept, and $a$ is the final answer conditioned on both the abstraction and the original question.
The abstraction function can be viewed as a mapping from a specific instance to a general class:
$$\text{StepBack}: \mathcal{Q}_{\text{specific}} \rightarrow \mathcal{Q}_{\text{abstract}}$$
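Concretely, the mapping pairs a detail-laden question with a more generic one. The pairs below paraphrase the kind of examples used in the paper's physics and TimeQA domains (wording is illustrative, not verbatim):

```python
# Specific question -> plausible step-back question (illustrative pairings)
step_back_examples = {
    "What happens to the pressure P of an ideal gas if the temperature "
    "doubles at constant volume?":
        "What are the physics principles behind this question?",
    "Which school did Estella Leopold attend between Aug 1954 and Nov 1954?":
        "What is Estella Leopold's education history?",
}

for specific, abstract in step_back_examples.items():
    print(f"{specific}\n  -> {abstract}")
```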
```python
from openai import OpenAI

def step_back_prompting(question: str, client: OpenAI) -> str:
    # Phase 1a: generate the step-back (abstract) question
    sb_prompt = (
        "You are an expert at abstracting problems to their core principles.\n"
        f"Given this question: {question}\n"
        "What is a more general step-back question that identifies "
        "the underlying principles needed to solve this?"
    )
    step_back_q = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sb_prompt}],
    ).choices[0].message.content

    # Phase 1b: answer the step-back question to retrieve the principles
    principles = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": step_back_q}],
    ).choices[0].message.content

    # Phase 2: reason over the original question, grounded in the principles
    reason_prompt = (
        f"Step-back question: {step_back_q}\n"
        f"Relevant principles: {principles}\n\n"
        "Now answer the original question using the above principles:\n"
        f"{question}"
    )
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": reason_prompt}],
    ).choices[0].message.content
    return answer
```
Evaluated primarily on PaLM-2L (340B parameters):
| Benchmark | Baseline (CoT) | Step-Back | Improvement |
|---|---|---|---|
| MMLU Physics | 66.4% | 73.4% | +7.0% |
| MMLU Chemistry | 70.9% | 81.9% | +11.0% |
| TimeQA | 41.5% | 68.5% | +27.0% |
| MuSiQue | 35.5% | 42.5% | +7.0% |
Error analysis on MMLU Physics shows Step-Back corrects approximately 20.5% of baseline errors while introducing only 11.9% new errors. Most residual errors stem from the LLM's intrinsic reasoning limits rather than abstraction failures.
Results generalize to GPT-4 and LLaMA2-70B, suggesting the technique is largely model-agnostic.
Step-Back Prompting outperforms CoT by up to 36% on select tasks.2) The key difference is structural: CoT decomposes a problem linearly into intermediate steps, which risks compounding errors when the model immerses itself in low-level details too early. Step-Back preempts this by first establishing a correct conceptual framework. It also outperforms variants such as zero-shot CoT and "take-a-deep-breath" prompting.
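The structural difference is visible in the prompt templates themselves. A hedged sketch (these templates are illustrative, not the paper's exact wording):

```python
question = ("A car accelerates uniformly from rest to 20 m/s in 10 s. "
            "How far does it travel?")

# Chain-of-Thought: dive straight into stepwise detail
cot_prompt = f"{question}\nLet's think step by step."

# Step-Back: establish the governing principle first, then reason from it
step_back_prompt = (
    "First state the general kinematics principle that governs this "
    f"situation, then use it to answer:\n{question}"
)

print(cot_prompt)
print(step_back_prompt)
```

A CoT prompt commits the model to detail-level reasoning immediately, whereas the step-back template forces a principle (here, uniform-acceleration kinematics) to be made explicit before any computation.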