Reasoning-on-Tap

Reasoning-on-Tap is the concept that advanced reasoning capabilities — chain-of-thought processing, multi-step deduction, extended thinking — can be toggled on or off depending on the task, rather than being always active. It treats reasoning as a dial, not a switch: users pay for deep thinking only when the problem demands it. ¹⁾

The Two-System Analogy

The concept mirrors Daniel Kahneman's System 1 / System 2 framework from cognitive psychology:

System 1 (fast): Pattern matching, quick responses, low compute — suitable for simple questions, summarization, and chat
System 2 (slow): Deliberate step-by-step reasoning, high compute — necessary for math proofs, complex code, scientific analysis

Traditional LLMs operate primarily in System 1 mode. Reasoning-on-Tap adds the ability to engage System 2 when needed, then disengage it to save cost and latency. ²⁾

How Models Implement It

Different providers offer reasoning as a toggleable capability:

Anthropic Claude: Offers an extended thinking mode that can be activated per request. When enabled, the model generates internal reasoning traces before producing its final answer. When disabled, it responds with standard speed and cost. ³⁾
OpenAI o1/o3: Purpose-built reasoning models that inherently allocate extra inference-time compute for chain-of-thought generation, self-verification, and step-by-step refinement via reinforcement learning. These are separate model endpoints from GPT-4o. ⁴⁾
DeepSeek R1: An open-weights reasoning model trained with RL, demonstrating that reasoning capabilities can be added to existing base models through post-training.

Cost Implications

Reasoning-on-Tap fundamentally changes the economics of AI inference:

Standard mode: Fast, cheap — cents per million tokens for simple tasks
Reasoning mode: Slow, expensive — the model generates additional “thinking tokens” that consume compute but are not shown to the user

The key insight is that most queries do not need deep reasoning. A customer service chatbot answering FAQs should not pay the cost of a model solving differential equations. Reasoning-on-Tap allows organizations to match compute cost to task complexity. ⁵⁾

Impact on Model Selection

Reasoning-on-Tap shifts the model selection landscape from choosing a single model to routing between modes:

Task Type	Appropriate Mode	Cost Profile
Simple Q&A, chat	Standard (System 1)	Low cost, low latency
Summarization, translation	Standard (System 1)	Low cost, low latency
Math, logic, proofs	Reasoning (System 2)	Higher cost, higher latency
Complex code generation	Reasoning (System 2)	Higher cost, higher latency
Multi-step analysis	Reasoning (System 2)	Higher cost, higher latency

Intelligent routing systems can automatically detect when a query warrants reasoning mode, optimizing cost without user intervention.

Relationship to Inference-Time Compute

Reasoning-on-Tap is closely tied to the broader shift toward inference-time compute scaling described in Post-Training RL vs Model Scaling. Instead of spending more on pre-training, reasoning models spend more at inference — generating multiple candidate solutions, verifying each one, and selecting the best. Harder problems get more thinking time; easy problems get less. ⁶⁾

Limitations

Hidden reasoning: Some providers (notably OpenAI with o1) do not expose the full reasoning trace to users, limiting transparency and debuggability
Domain constraints: Reasoning mode excels in domains with verifiable answers (math, code) but provides less clear benefit for subjective or creative tasks
Latency: Reasoning mode can take seconds to minutes, making it unsuitable for real-time interactive applications
Cost unpredictability: The number of thinking tokens varies per query, making costs harder to forecast

References

¹⁾ , ⁴⁾ , ⁵⁾

Source: Ultralytics - Reasoning Models

²⁾ , ⁶⁾

Source: Wikipedia - Reasoning Model

³⁾

Source: Hiflylabs - Reasoning Models

Table of Contents