====== Step-Back Prompting ======

**Step-Back Prompting** is a reasoning technique introduced by Zheng et al. at Google DeepMind in 2023 that improves LLM performance on complex tasks by first abstracting the problem to high-level principles before attempting detailed reasoning. The method draws inspiration from how human experts approach difficult problems -- by stepping back to identify the relevant concepts before diving into specifics.

===== Overview =====

Standard prompting and even Chain-of-Thought (CoT) methods can fail on complex reasoning tasks because they attempt to reason directly over low-level details, leading to compounding errors in intermediate steps. Step-Back Prompting addresses this by inserting an **abstraction step** that identifies the relevant principles, concepts, or frameworks before the model reasons toward a solution.

===== Method =====

The technique operates in two phases:

  - **Abstraction Phase**: Given the original question, the LLM generates a higher-level "step-back question" that targets the underlying principles. For example:
    * Original: "What happens to the pressure of an ideal gas if temperature increases by factor 2 and volume increases by factor 8?"
    * Step-back question: "What is the Ideal Gas Law and its key relationships?"
  - **Reasoning Phase**: The LLM answers the original question by explicitly referencing the derived high-level concepts. The full prompt concatenates the step-back question, its answer, and the original question.

This two-phase approach grounds the reasoning chain in verified principles, reducing the likelihood of hallucination or faulty intermediate steps.
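To make the reasoning phase concrete, consider the ideal-gas example above. Assuming the step-back answer supplies the Ideal Gas Law $PV = nRT$ (with the amount of gas $n$ and the gas constant $R$ held fixed), the original question reduces to a one-line calculation:

$$P = \frac{nRT}{V} \quad\Rightarrow\quad \frac{P_2}{P_1} = \frac{T_2/T_1}{V_2/V_1} = \frac{2}{8} = \frac{1}{4}$$

so the pressure drops to one quarter of its initial value. The abstraction step contributes exactly the formula that makes this final step trivial.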
===== Formal Description =====

Given an original question $q$, the process is:

$$q_{\text{sb}} = \text{StepBack}(q)$$
$$p = \text{LLM}(q_{\text{sb}})$$
$$a = \text{LLM}(q \mid q_{\text{sb}}, p)$$

where $q_{\text{sb}}$ is the step-back question, $p$ is the derived principle or concept, and $a$ is the final answer conditioned on both the abstraction and the original question. The abstraction function can be viewed as a mapping from a specific instance to a general class:

$$\text{StepBack}: \mathcal{Q}_{\text{specific}} \rightarrow \mathcal{Q}_{\text{abstract}}$$

===== Code Example =====

<code python>
import openai  # requires openai >= 1.0

def step_back_prompting(question, client):
    # Phase 1: Generate the step-back question
    sb_prompt = (
        "You are an expert at abstracting problems to their core principles.\n"
        f"Given this question: {question}\n"
        "What is a more general step-back question that identifies "
        "the underlying principles needed to solve this?"
    )
    step_back_q = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": sb_prompt}],
    ).choices[0].message.content

    # Phase 1b: Answer the step-back question to obtain the principles
    principles = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": step_back_q}],
    ).choices[0].message.content

    # Phase 2: Answer the original question, grounded in the principles
    reason_prompt = (
        f"Step-back question: {step_back_q}\n"
        f"Relevant principles: {principles}\n\n"
        "Now answer the original question using the above principles:\n"
        f"{question}"
    )
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": reason_prompt}],
    ).choices[0].message.content
    return answer
</code>

===== Experimental Results =====

Evaluated primarily on PaLM-2L (340B parameters):

^ Benchmark ^ Baseline (CoT) ^ Step-Back ^ Improvement (pp) ^
| MMLU Physics | 66.4% | 73.4% | +7.0 |
| MMLU Chemistry | 70.9% | 81.9% | +11.0 |
| TimeQA | 41.5% | 68.5% | +27.0 |
| MuSiQue | 35.5% | 42.5% | +7.0 |

Error analysis on MMLU Physics shows that Step-Back corrects approximately 20.5% of baseline errors while introducing only 11.9% new errors. Most residual errors stem from the LLM's intrinsic reasoning limits rather than abstraction failures. Results generalize to GPT-4 and LLaMA2-70B, indicating the technique is not specific to a single model.

===== Comparison with Chain-of-Thought =====

Step-Back Prompting outperforms CoT by up to 36% on select tasks. The key difference is that CoT decomposes problems linearly via intermediate steps, risking compounding errors from early immersion in details. Step-Back preempts this by establishing a correct conceptual framework first. It also outperforms variants such as zero-shot CoT and "take-a-deep-breath" prompting.

===== References =====

  * [[https://arxiv.org/abs/2310.06117|Zheng et al., "Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models", arXiv:2310.06117 (2023)]]
  * [[https://deepmind.google/research/publications/50274/|Google DeepMind publication page]]

===== See Also =====

  * [[chain_of_verification|Chain-of-Verification (CoVe)]]
  * [[least_to_most_prompting|Least-to-Most Prompting]]
  * [[skeleton_of_thought|Skeleton-of-Thought]]