Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Reasoning-on-Tap is the concept that advanced reasoning capabilities — chain-of-thought processing, multi-step deduction, extended thinking — can be engaged or disengaged per task, rather than being always active. It treats reasoning as a dial, not a switch: users pay for deep thinking only when the problem demands it. 1)
The concept mirrors Daniel Kahneman's System 1 / System 2 framework from cognitive psychology:
  * System 1: fast, automatic, intuitive — always running, cheap
  * System 2: slow, deliberate, effortful — engaged only when a problem requires it
Traditional LLMs operate primarily in System 1 mode. Reasoning-on-Tap adds the ability to engage System 2 when needed, then disengage it to save cost and latency. 2)
Different providers offer reasoning as a toggleable, per-request capability.
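As a minimal sketch of what a per-request toggle looks like — all names here (`ChatRequest`, `build_request`, `thinking_budget_tokens`) are hypothetical, not any vendor's actual API:

```python
# Hypothetical per-request reasoning toggle; not a real vendor SDK.
from dataclasses import dataclass

@dataclass
class ChatRequest:
    prompt: str
    reasoning: bool = False          # off by default: System 1 behavior
    thinking_budget_tokens: int = 0  # extra tokens the model may spend thinking

def build_request(prompt: str, *, reasoning: bool = False,
                  budget: int = 4096) -> ChatRequest:
    """Return a request with reasoning enabled only when asked for."""
    if reasoning:
        return ChatRequest(prompt, reasoning=True, thinking_budget_tokens=budget)
    return ChatRequest(prompt)

# Simple FAQ lookup: no reasoning, no extra cost.
faq = build_request("What are your opening hours?")
# Proof-style question: dial reasoning up for this one call.
proof = build_request("Prove that sqrt(2) is irrational.", reasoning=True)
```

Real provider APIs differ in the details (named effort levels vs. token budgets), but the shape is the same: the default request is cheap, and deep thinking is opted into per call.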
Reasoning-on-Tap fundamentally changes the economics of AI inference.
The key insight is that most queries do not need deep reasoning. A customer service chatbot answering FAQs should not pay the cost of a model solving differential equations. Reasoning-on-Tap allows organizations to match compute cost to task complexity. 5)
Reasoning-on-Tap shifts the model selection landscape from choosing a single model to routing between modes:
| Task Type | Appropriate Mode | Cost Profile |
| --- | --- | --- |
| Simple Q&A, chat | Standard (System 1) | Low cost, low latency |
| Summarization, translation | Standard (System 1) | Low cost, low latency |
| Math, logic, proofs | Reasoning (System 2) | Higher cost, higher latency |
| Complex code generation | Reasoning (System 2) | Higher cost, higher latency |
| Multi-step analysis | Reasoning (System 2) | Higher cost, higher latency |
Intelligent routing systems can automatically detect when a query warrants reasoning mode, optimizing cost without user intervention.
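A toy version of such a router can be a keyword-and-length heuristic; production systems typically use a trained classifier instead. Everything below (`REASONING_HINTS`, `choose_mode`, the word-count threshold) is illustrative:

```python
import re

# Toy routing heuristic: route to reasoning mode when the query contains
# proof/derivation-style keywords or is unusually long. Illustrative only;
# real routers usually use a learned classifier.
REASONING_HINTS = re.compile(
    r"\b(prove|derive|optimi[sz]e|debug|step[- ]by[- ]step|why does)\b",
    re.IGNORECASE,
)

def choose_mode(query: str) -> str:
    """Return 'reasoning' (System 2) or 'standard' (System 1) for a query."""
    if REASONING_HINTS.search(query) or len(query.split()) > 80:
        return "reasoning"
    return "standard"

print(choose_mode("What time do you open?"))          # standard
print(choose_mode("Prove the triangle inequality."))  # reasoning
```

The design choice mirrors the table above: the router errs toward the cheap path by default and escalates only on signals that the task needs multi-step work.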
Reasoning-on-Tap is closely tied to the broader shift toward inference-time compute scaling described in Post-Training RL vs Model Scaling. Instead of spending more on pre-training, reasoning models spend more at inference — generating multiple candidate solutions, verifying each one, and selecting the best. Harder problems get more thinking time; easy problems get less. 6)
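The generate-verify-select loop can be sketched as best-of-N sampling. Here `generate` and `verify` are stand-ins for a model call and a verifier (this toy version "solves" a problem whose answer is known to be 55); only the control flow is the point:

```python
import random

# Toy best-of-N sketch of inference-time compute scaling: sample several
# candidate answers, score each with a verifier, keep the best. `generate`
# and `verify` are stand-ins, not real model or verifier calls.

def generate(problem: str, rng: random.Random) -> int:
    # Stand-in "model": noisy guess near the true answer (55 for 1+...+10).
    return 55 + rng.randint(-3, 3)

def verify(problem: str, answer: int) -> float:
    # Stand-in verifier: higher score = closer to the known answer.
    return -abs(answer - 55)

def best_of_n(problem: str, n: int, seed: int = 0) -> int:
    """A larger n buys more 'thinking' — spend it on harder problems."""
    rng = random.Random(seed)
    candidates = [generate(problem, rng) for _ in range(n)]
    return max(candidates, key=lambda a: verify(problem, a))

# Easy problem: a few samples. Hard problem: dial n up for more compute.
print(best_of_n("sum of 1..10", n=4))
print(best_of_n("sum of 1..10", n=32))
```

Because the candidates drawn at n=4 are a prefix of those drawn at n=32 (same seed), the selected answer can only improve as n grows — the mechanism by which harder problems justify more inference-time spend.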