Self-Refine

Self-Refine, introduced by Madaan et al. (2023), is a framework that improves LLM outputs through iterative self-feedback: the same model generates an initial output, critiques it for weaknesses, and refines it based on that feedback. No external training, reinforcement learning, or additional models are required — the approach works purely at inference time with a single LLM.

Motivation

Humans rarely produce perfect first drafts — we revise through iterative critique and improvement. Self-Refine brings this natural revision process to LLMs. While a model's initial output may contain errors, the same model can often identify those issues when explicitly prompted to critique, and fix them when prompted to revise. This leverages an asymmetry: evaluation is easier than generation.

The Generate-Critique-Refine Loop

Self-Refine operates in three steps, iterated until convergence:

  1. Generate: The LLM produces an initial output $y_0$ given a task prompt $x$
  2. Critique: The LLM provides multi-aspect feedback $f_t$ on the current output $y_t$, identifying specific weaknesses
  3. Refine: The LLM generates an improved output $y_{t+1}$ conditioned on $x$, $y_t$, and $f_t$

$$y_0 = \text{LLM}(x), \quad f_t = \text{LLM}_{\text{fb}}(x, y_t), \quad y_{t+1} = \text{LLM}_{\text{refine}}(x, y_t, f_t)$$

The loop repeats for $T$ iterations or until the model indicates no further improvements are possible.

def self_refine(model, task_prompt, max_iters=3):
    """Generate, critique, and refine an output with a single model."""
    # Step 1: initial generation
    output = model.generate(task_prompt)

    for _ in range(max_iters):
        # Step 2: multi-aspect feedback on the current output
        critique = model.generate(
            f"Review and identify specific issues:\n"
            f"Task: {task_prompt}\nOutput: {output}"
        )
        # Stop early if the feedback signals convergence
        if "no improvements needed" in critique.lower():
            break
        # Step 3: refine conditioned on task, current output, and feedback
        output = model.generate(
            f"Improve based on feedback:\n"
            f"Task: {task_prompt}\nOutput: {output}\n"
            f"Feedback: {critique}"
        )
    return output
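Assuming the `model.generate` interface above, a scripted stand-in can demonstrate one full pass of the loop. `MockModel` and its canned responses are purely illustrative; in practice `generate` would wrap a real LLM API call. The refinement logic is repeated here so the sketch runs standalone:

```python
class MockModel:
    """Replays canned responses: draft -> critique -> refinement -> stop."""

    def __init__(self):
        self._responses = iter([
            "def add(a, b): return a+b",                   # initial draft
            "Missing docstring and type hints.",           # critique 1
            'def add(a: int, b: int) -> int:\n'
            '    """Return the sum of a and b."""\n'
            '    return a + b',                            # refinement 1
            "No improvements needed.",                     # critique 2 -> stop
        ])

    def generate(self, prompt: str) -> str:
        return next(self._responses)


def self_refine(model, task_prompt, max_iters=3):
    # Same logic as the loop above, repeated so this snippet is self-contained.
    output = model.generate(task_prompt)
    for _ in range(max_iters):
        critique = model.generate(
            f"Review and identify specific issues:\n"
            f"Task: {task_prompt}\nOutput: {output}"
        )
        if "no improvements needed" in critique.lower():
            break
        output = model.generate(
            f"Improve based on feedback:\n"
            f"Task: {task_prompt}\nOutput: {output}\nFeedback: {critique}"
        )
    return output


result = self_refine(MockModel(), "Write an add function")
print(result)
```

The mock terminates after one refinement round because the second critique contains the stop phrase, exercising both the refine step and the early-exit path.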

Key Design Principles

Stopping Criteria

Two mechanisms determine when to stop:

  1. Iteration budget: a fixed cap of $T$ refinement rounds bounds inference cost (the `max_iters` parameter in the loop above)
  2. Stop signal: the feedback step is prompted to emit a marker phrase such as "no improvements needed" when it finds nothing left to fix, which ends the loop early
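As a sketch, the two stopping criteria can be combined into a single predicate. The `should_stop` helper and the exact stop phrase are illustrative conventions, not part of the published method:

```python
# Hypothetical helper combining both stopping checks.
STOP_PHRASE = "no improvements needed"

def should_stop(critique: str, iteration: int, max_iters: int = 3) -> bool:
    """True once the iteration budget T is spent or the feedback
    signals that nothing is left to fix."""
    if iteration >= max_iters:              # fixed budget of T rounds
        return True
    return STOP_PHRASE in critique.lower()  # model-emitted stop signal

print(should_stop("Looks correct. No improvements needed.", iteration=1))  # True
print(should_stop("Variable name shadows a builtin.", iteration=1))        # False
print(should_stop("Could still be faster.", iteration=3))                  # True
```

Checking the stop phrase case-insensitively keeps the signal robust to how the model phrases its feedback.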

Experimental Results

Self-Refine was evaluated on 7 diverse tasks with improvements of 5-40% over direct generation:

| Task                | Improvement                |
|---------------------|----------------------------|
| Code optimization   | ~31% relative gain         |
| Sentiment reversal  | ~9% absolute gain          |
| Math reasoning      | Consistent improvement     |
| Dialogue response   | Quality gains in coherence |
| Code readability    | Multi-iteration gains      |
| Acronym generation  | Strong human preference    |
| Review rewriting    | ~20% average improvement   |

Results hold across GPT-3.5 and GPT-4, with most of the quality gain arriving in the first 2-3 refinement rounds. Human evaluators consistently preferred Self-Refine outputs to direct single-pass generation.

Limitations

Significance

Self-Refine demonstrates that significant quality improvements are achievable without any additional training — purely through structured inference-time computation. This suggests that frontier model capabilities are underutilized by single-pass generation, and iterative refinement is a general-purpose method for extracting better performance.

References

Madaan, A., Tandon, N., Gupta, P., et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. Advances in Neural Information Processing Systems (NeurIPS 2023). arXiv:2303.17651.
