Chain of Draft

Chain of Draft (CoD) is a prompting technique for large language models (LLMs) that produces concise, minimalistic intermediate reasoning steps instead of verbose explanations. Introduced in February 2025 by researchers at Zoom Communications, CoD matches or surpasses the accuracy of Chain of Thought (CoT) prompting while using as little as 7.6% of the reasoning tokens.1)

Background

Chain of Thought prompting, introduced by Wei et al. in 2022, revolutionized LLM reasoning by instructing models to “think step by step,” producing detailed intermediate reasoning chains.2) While effective at boosting accuracy on arithmetic, commonsense, and symbolic reasoning tasks, CoT generates verbose outputs that increase token usage, inference latency, and cost. The authors of Chain of Draft observed that this verbosity contrasts with how humans actually solve problems: by jotting down only the essential pieces of information needed to advance toward a solution.

How It Works

CoD modifies the standard CoT approach with a single key constraint: each intermediate reasoning step must be kept to roughly five words or fewer. The technique uses few-shot prompting with manually crafted examples that demonstrate this concise style.

The core instruction appended to the prompt is:

Think step by step, but only keep a minimum draft for each thinking step,
with 5 words at most. Return the answer at the end of the response after
a separator ####.

Rather than writing full sentences of explanation, the model outputs only the critical calculation or transformation at each step, similar to how a human might scribble shorthand notes on a scratch pad.
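The few-shot setup described above can be sketched as a simple prompt builder. This is an illustrative sketch, not code from the paper: the helper name `build_cod_prompt` and the Q:/A: formatting are assumptions; only the instruction text and the lollipop exemplar come from the article.

```python
# Sketch of assembling a Chain of Draft prompt.
# build_cod_prompt and the Q:/A: layout are hypothetical; the instruction
# string and the few-shot example are taken from the article above.

COD_INSTRUCTION = (
    "Think step by step, but only keep a minimum draft for each thinking step, "
    "with 5 words at most. Return the answer at the end of the response after "
    "a separator ####."
)

# Manually crafted few-shot examples demonstrating the concise drafting style.
FEW_SHOT = [
    {
        "question": (
            "Jason had 20 lollipops. He gave Denny some lollipops. "
            "Now Jason has 12. How many lollipops did Jason give to Denny?"
        ),
        "answer": "20 - 12 = 8\n#### 8",
    },
]

def build_cod_prompt(question: str) -> str:
    """Combine the CoD instruction, few-shot examples, and the new question."""
    parts = [COD_INSTRUCTION, ""]
    for ex in FEW_SHOT:
        parts.append(f"Q: {ex['question']}")
        parts.append(f"A: {ex['answer']}")
        parts.append("")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)

prompt = build_cod_prompt(
    "A store sold 45 apples in the morning and 30 in the afternoon. "
    "How many apples were sold in total?"
)
print(prompt)
```

The resulting string would be sent to the model as-is; the few-shot exemplar, not just the instruction, is what anchors the model to the terse drafting style.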

Example Comparison

Consider the problem: “Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12. How many lollipops did Jason give to Denny?”

Chain of Thought output:

Jason started with 20 lollipops. He gave some to Denny, and now he has 12.
To find how many he gave away, we subtract: 20 - 12 = 8.
So Jason gave Denny 8 lollipops.
#### 8

Chain of Draft output:

20 - 12 = 8
#### 8

The CoD response conveys the same reasoning path in a fraction of the tokens.
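Because both output formats place the final answer after the "####" separator, downstream code can extract it the same way for CoT and CoD. A minimal sketch (the function name `extract_answer` is an assumption, not from the paper):

```python
def extract_answer(response: str) -> str:
    """Return the final answer following the '####' separator.

    The draft (or full reasoning) precedes the separator; if no separator
    is present, fall back to the whole stripped response.
    """
    sep = "####"
    if sep in response:
        # rsplit guards against a stray '####' appearing inside the reasoning.
        return response.rsplit(sep, 1)[1].strip()
    return response.strip()

cod_response = "20 - 12 = 8\n#### 8"
print(extract_answer(cod_response))  # -> 8
```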

Benchmarks and Results

The paper evaluated CoD against standard (direct answer) prompting and CoT prompting on four reasoning benchmarks using GPT-4o and Claude 3.5 Sonnet.3)

Arithmetic Reasoning: GSM8k

Model             | Standard | CoT   | CoD   | CoT Tokens | CoD Tokens
GPT-4o            | 53.3%    | 95.4% | 91.1% | 205.1      | 43.9
Claude 3.5 Sonnet | 64.6%    | 95.8% | 91.4% | 190.0      | 39.8

CoD reduced token usage by approximately 79% while maintaining accuracy within 4-5 percentage points of CoT.

Commonsense Reasoning: Date Understanding

Model             | Standard | CoT   | CoD   | CoT Tokens | CoD Tokens
GPT-4o            | 72.6%    | 90.2% | 88.1% | 75.7       | 30.2
Claude 3.5 Sonnet | 84.3%    | 87.0% | 89.7% | 172.5      | 31.3

On Date Understanding, Claude 3.5 Sonnet with CoD actually surpassed CoT accuracy (89.7% vs. 87.0%) while using only 18.2% of the tokens.

Commonsense Reasoning: Sports Understanding

Model             | Standard | CoT   | CoD   | CoT Tokens | CoD Tokens
GPT-4o            | 90.0%    | 95.9% | 98.3% | 28.7       | 15.0
Claude 3.5 Sonnet | 90.6%    | 93.2% | 97.3% | 189.4      | 14.3

Sports Understanding produced the most dramatic results: CoD outperformed CoT on both models while Claude used only 7.6% of CoT's tokens – the headline figure cited in the paper.

Symbolic Reasoning: Coin Flip

Model             | Standard | CoT   | CoD   | CoT Tokens | CoD Tokens
GPT-4o            | 73.2%    | 100.0% | 100.0% | 52.4     | 16.8
Claude 3.5 Sonnet | 85.2%    | 100.0% | 100.0% | 135.3    | 18.9

Both methods achieved perfect accuracy on the Coin Flip task, but CoD used 68-86% fewer tokens.
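The token ratios quoted throughout this section can be recomputed directly from the per-benchmark averages in the tables above, as a quick sanity check (figures for Claude 3.5 Sonnet):

```python
# Recompute CoD-vs-CoT token ratios from the average token counts
# reported in the benchmark tables (Claude 3.5 Sonnet column).
results = {
    "GSM8k":     (190.0, 39.8),
    "Date":      (172.5, 31.3),
    "Sports":    (189.4, 14.3),
    "Coin Flip": (135.3, 18.9),
}

for name, (cot_tokens, cod_tokens) in results.items():
    ratio = cod_tokens / cot_tokens
    print(f"{name}: CoD uses {ratio:.1%} of CoT tokens "
          f"({1 - ratio:.1%} reduction)")
```

The Sports Understanding ratio works out to 7.6%, matching the headline figure, and GSM8k to about 21% (a roughly 79% reduction).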

Limitations

The paper identifies two main limitations: CoD is substantially less effective in zero-shot settings, where no few-shot exemplars demonstrate the drafting style, and its accuracy drops markedly on small models (roughly 3B parameters or fewer), which the authors attribute to the scarcity of CoD-style reasoning patterns in training data.

Relationship to Other Techniques

CoD occupies a specific niche in the landscape of prompting strategies: it sits between direct-answer prompting (minimal tokens, lower accuracy) and Chain of Thought (high accuracy, verbose output), retaining CoT's explicit intermediate steps while compressing each one to a terse draft.

Practical Recommendations

See Also

References

1)
Xu S, Xie W, Zhao L, He P. “Chain of Draft: Thinking Faster by Writing Less.” arXiv:2502.18600, February 2025. arxiv.org
2)
Wei J et al. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS 2022. arxiv.org
3)
Xu S et al. “Chain of Draft: Thinking Faster by Writing Less.” arxiv.org