====== Step-Back Prompting ======

**Step-Back Prompting** is a reasoning technique introduced by Zheng et al. at Google DeepMind in 2023 that improves LLM performance on complex tasks by first abstracting the problem to high-level principles before attempting detailed reasoning. The method draws inspiration from how human experts approach difficult problems -- by stepping back to identify the relevant concepts before diving into specifics.

===== Overview =====

Standard prompting and even Chain-of-Thought (CoT) methods can fail on complex reasoning tasks because they attempt to reason directly over low-level details, leading to compounding errors in intermediate steps. Step-Back Prompting addresses this by inserting an **abstraction step** that identifies the relevant principles, concepts, or frameworks before the model reasons toward a solution.

===== Method =====

The technique operates in two phases:

  - **Abstraction Phase**: Given the original question, the LLM generates a higher-level "step-back question" that targets the underlying principles. For example:
    * Original: "What happens to the pressure of an ideal gas if temperature increases by factor 2 and volume increases by factor 8?"
    * Step-back question: "What is the Ideal Gas Law and its key relationships?"
  - **Reasoning Phase**: The LLM answers the original question by explicitly referencing the derived high-level concepts. The full prompt concatenates the step-back question, its answer, and the original question.

This two-phase approach grounds the reasoning chain in verified principles, reducing the likelihood of hallucination or faulty intermediate steps.
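To make the reasoning phase concrete, consider the ideal-gas example above. Assuming the step-back answer supplies the Ideal Gas Law $PV = nRT$ (with the amount of gas $n$ and the gas constant $R$ held fixed), the original question reduces to a one-line calculation:

$$P = \frac{nRT}{V} \quad\Rightarrow\quad \frac{P_2}{P_1} = \frac{T_2/T_1}{V_2/V_1} = \frac{2}{8} = \frac{1}{4}$$

so the pressure drops to one quarter of its initial value. The abstraction step contributes exactly the formula that makes this final step trivial.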
===== Formal Description =====

Given an original question $q$, the process is:

$$q_{\text{sb}} = \text{StepBack}(q)$$
$$p = \text{LLM}(q_{\text{sb}})$$
$$a = \text{LLM}(q \mid q_{\text{sb}}, p)$$

where $q_{\text{sb}}$ is the step-back question, $p$ is the derived principle or concept, and $a$ is the final answer conditioned on both the abstraction and the original question. The abstraction function can be viewed as a mapping from a specific instance to a general class:

$$\text{StepBack}: \mathcal{Q}_{\text{specific}} \rightarrow \mathcal{Q}_{\text{abstract}}$$

===== Code Example =====

<code python>
import openai  # requires openai >= 1.0

def step_back_prompting(question, client):
    # Phase 1: Generate the step-back question
    sb_prompt = (
        "You are an expert at abstracting problems to their core principles.\n"
        f"Given this question: {question}\n"
        "What is a more general step-back question that identifies "
        "the underlying principles needed to solve this?"
    )
    step_back_q = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sb_prompt}],
    ).choices[0].message.content

    # Phase 1b: Answer the step-back question to obtain the principles
    principles = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": step_back_q}],
    ).choices[0].message.content

    # Phase 2: Answer the original question, grounded in the principles
    reason_prompt = (
        f"Step-back question: {step_back_q}\n"
        f"Relevant principles: {principles}\n\n"
        "Now answer the original question using the above principles:\n"
        f"{question}"
    )
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": reason_prompt}],
    ).choices[0].message.content
    return answer
</code>

===== Experimental Results =====

Evaluated primarily on PaLM-2L (340B parameters):

^ Benchmark ^ Baseline (CoT) ^ Step-Back ^ Improvement (pp) ^
| MMLU Physics | 66.4% | 73.4% | +7.0 |
| MMLU Chemistry | 70.9% | 81.9% | +11.0 |
| TimeQA | 41.5% | 68.5% | +27.0 |
| MuSiQue | 35.5% | 42.5% | +7.0 |

Error analysis on MMLU Physics shows that Step-Back corrects approximately 20.5% of baseline errors while introducing only 11.9% new errors. Most residual errors stem from the LLM's intrinsic reasoning limits rather than abstraction failures. Results generalize to GPT-4 and LLaMA2-70B, indicating the technique is not specific to a single model.

===== Comparison with Chain-of-Thought =====

Step-Back Prompting outperforms CoT by up to 36% on select tasks. The key difference is that CoT decomposes problems linearly via intermediate steps, risking compounding errors from early immersion in details. Step-Back preempts this by establishing a correct conceptual framework first. It also outperforms variants such as zero-shot CoT and "take-a-deep-breath" prompting.

===== References =====

  * [[https://arxiv.org/abs/2310.06117|Zheng et al., "Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models", arXiv:2310.06117 (2023)]]
  * [[https://deepmind.google/research/publications/50274/|Google DeepMind publication page]]

===== See Also =====

  * [[chain_of_verification|Chain-of-Verification (CoVe)]]
  * [[least_to_most_prompting|Least-to-Most Prompting]]
  * [[skeleton_of_thought|Skeleton-of-Thought]]