====== Reasoning-on-Tap ======

Reasoning-on-Tap is the concept that advanced reasoning capabilities — chain-of-thought processing, multi-step deduction, extended thinking — can be **toggled on or off** depending on the task, rather than being always active. It treats reasoning as a resource drawn on demand: users pay for deep thinking only when the problem requires it. ((Source: [[https://www.ultralytics.com/glossary/reasoning-models|Ultralytics - Reasoning Models]]))

===== The Two-System Analogy =====

The concept mirrors Daniel Kahneman's **System 1 / System 2** framework from cognitive psychology:

  * **System 1** (fast): Pattern matching, quick responses, low compute — suitable for simple questions, summarization, and chat
  * **System 2** (slow): Deliberate step-by-step reasoning, high compute — necessary for math proofs, complex code, scientific analysis

Traditional LLMs operate primarily in System 1 mode. Reasoning-on-Tap adds the ability to engage System 2 when needed, then disengage it to save cost and latency. ((Source: [[https://en.wikipedia.org/wiki/Reasoning_model|Wikipedia - Reasoning Model]]))

===== How Models Implement It =====

Different providers offer reasoning as a toggleable capability:

  * **Anthropic Claude**: Offers an **extended thinking mode** that can be activated per request. When enabled, the model generates internal reasoning traces before producing its final answer. When disabled, it responds with standard speed and cost. ((Source: [[https://hiflylabs.com/blog/2025/4/3/reasoning-models|Hiflylabs - Reasoning Models]]))
  * **OpenAI o1/o3**: Purpose-built reasoning models that inherently allocate extra inference-time compute for chain-of-thought generation, self-verification, and step-by-step refinement via reinforcement learning. These are separate model endpoints from GPT-4o. ((Source: [[https://www.ultralytics.com/glossary/reasoning-models|Ultralytics - Reasoning Models]]))
  * **DeepSeek R1**: An open-weights reasoning model trained with RL, demonstrating that reasoning capabilities can be added to existing base models through post-training.

===== Cost Implications =====

Reasoning-on-Tap fundamentally changes the economics of AI inference:

  * **Standard mode**: Fast and cheap — cents per million tokens for simple tasks
  * **Reasoning mode**: Slow and expensive — the model generates additional "thinking tokens" that consume compute but are not shown to the user

The key insight is that most queries do not need deep reasoning. A customer service chatbot answering FAQs should not pay the cost of a model solving differential equations. Reasoning-on-Tap allows organizations to **match compute cost to task complexity**. ((Source: [[https://www.ultralytics.com/glossary/reasoning-models|Ultralytics - Reasoning Models]]))

===== Impact on Model Selection =====

Reasoning-on-Tap shifts model selection from choosing a single model to **routing between modes**:

^ Task Type ^ Appropriate Mode ^ Cost Profile ^
| Simple Q&A, chat | Standard (System 1) | Low cost, low latency |
| Summarization, translation | Standard (System 1) | Low cost, low latency |
| Math, logic, proofs | Reasoning (System 2) | Higher cost, higher latency |
| Complex code generation | Reasoning (System 2) | Higher cost, higher latency |
| Multi-step analysis | Reasoning (System 2) | Higher cost, higher latency |

Intelligent routing systems can automatically detect when a query warrants reasoning mode, optimizing cost without user intervention.

===== Relationship to Inference-Time Compute =====

Reasoning-on-Tap is closely tied to the broader shift toward **inference-time compute scaling** described in [[post_training_rl_vs_scaling|Post-Training RL vs Model Scaling]].
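One common form of inference-time compute scaling — sample several candidates, verify each, keep the best — can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual API: ''generate'' and ''verify'' are hypothetical stand-ins for a model's sampler and an answer checker (e.g. running unit tests or checking a proof), demonstrated here on a toy arithmetic task.

```python
import random

def generate(query: str) -> str:
    """Hypothetical stand-in sampler: returns a noisy candidate answer
    for a toy arithmetic task (sometimes off by one)."""
    return str(eval(query) + random.choice([-1, 0, 0, 1]))

def verify(query: str, candidate: str) -> float:
    """Hypothetical stand-in verifier: scores 1.0 iff the candidate
    is exactly correct, 0.0 otherwise."""
    return 1.0 if int(candidate) == eval(query) else 0.0

def best_of_n(query: str, n: int) -> str:
    """Sample n candidates and return the highest-scoring one.
    Larger n means more inference-time compute spent per query, and a
    better chance that at least one candidate passes verification."""
    candidates = [generate(query) for _ in range(n)]
    return max(candidates, key=lambda c: verify(query, c))

answer = best_of_n("17 * 23", n=8)
```

The dial lives in the single parameter ''n'': an easy query can be served with ''n = 1'' (standard mode), while a hard one gets a larger sampling budget, which is one way "harder problems get more thinking time" can be realized in practice.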
Instead of spending more on pre-training, reasoning models spend more at inference — generating multiple candidate solutions, verifying each one, and selecting the best. Harder problems get more thinking time; easy problems get less. ((Source: [[https://en.wikipedia.org/wiki/Reasoning_model|Wikipedia - Reasoning Model]]))

===== Limitations =====

  * **Hidden reasoning**: Some providers (notably OpenAI with o1) do not expose the full reasoning trace to users, limiting transparency and debuggability
  * **Domain constraints**: Reasoning mode excels in domains with verifiable answers (math, code) but provides less clear benefit for subjective or creative tasks
  * **Latency**: Reasoning mode can take seconds to minutes, making it unsuitable for real-time interactive applications
  * **Cost unpredictability**: The number of thinking tokens varies per query, making costs harder to forecast

===== See Also =====

  * [[post_training_rl_vs_scaling|Post-Training RL vs Model Scaling]]
  * [[ai_self_verification|AI Self-Verification]]
  * [[inference_economics|Inference Economics]]

===== References =====