====== GPT-5 Computer Use ======

**GPT-5.4**, released by OpenAI on March 5, 2026, is the first general-purpose AI model with **native computer-use capabilities**: it can see a screen, move a cursor, click, type, and execute multi-step workflows without wrapper code or external tools. It scored **75.0% on OSWorld-Verified**, surpassing the human expert baseline of 72.4% and making it the strongest computer-operating AI model available. ((Source: [[https://www.digitalapplied.com/blog/gpt-5-4-computer-use-tool-search-benchmarks-pricing|Digital Applied — GPT-5.4 Benchmarks and Pricing]]))

===== Background =====

GPT-5 was first released on August 7, 2025, as OpenAI's flagship multimodal model, excelling in coding, math, writing, and visual perception. ((Source: [[https://openai.com/index/introducing-gpt-5/|OpenAI — Introducing GPT-5]])) The computer-use capability arrived with the **GPT-5.4 update** on March 5, 2026, representing what many analysts consider OpenAI's most significant product advancement since the original ChatGPT launch.

===== How Computer Use Works =====

Unlike earlier browser-automation approaches that relied on brittle scripts or browser-specific APIs, GPT-5.4 operates at the **visual level**:

  - The model receives screenshots of the current screen state
  - It decides what to click, type, or interact with
  - It executes the action via mouse/keyboard commands
  - It observes the result and continues the loop until the task is complete

The key innovation is that GPT-5.4 was **trained end-to-end** for computer-use tasks: the capability is built into the model weights, not added as a plugin or post-training layer. The model supports both **Playwright code generation** for browser automation and **direct mouse/keyboard commands** from screenshots.
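The observe-decide-act loop described above can be sketched as a short harness. Note that ''capture'', ''decide'', and ''execute'' are hypothetical callbacks standing in for the screen grabber, the model, and the input driver; this is an illustration of the loop's control flow only, not OpenAI's actual API.

```python
# Minimal sketch of a screenshot -> decide -> act agent loop.
# The three callbacks are placeholders, not a real vendor API.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0         # cursor target for clicks
    y: int = 0
    text: str = ""     # payload for typing actions

def run_agent(capture: Callable[[], bytes],
              decide: Callable[[bytes], Action],
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    """Loop: observe screen, ask the model for an action, perform it,
    and repeat until the model signals completion or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = capture()            # 1. screenshot of current screen state
        action = decide(shot)       # 2. model chooses the next action
        history.append(action)
        if action.kind == "done":   # 4. stop once the task is complete
            break
        execute(action)             # 3. issue the mouse/keyboard command
    return history
```

The ''max_steps'' cap is a practical safeguard so a confused agent cannot loop forever on an unchanging screen.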
((Source: [[https://vpn07.com/en/blog/2026-gpt54-computer-use-ai-agent-automates-pc.html|VPN07 — GPT-5.4 Computer Use Guide]]))

===== Performance =====

  * **OSWorld-Verified:** 75.0% (human baseline: 72.4%), a 27.7 percentage-point improvement over GPT-5.2 ((Source: [[https://exzilcalanza.info/gpt-5-4-openai-computer-use-autonomous-agents-architecture-2026/|Exzil Calanza — GPT-5.4 Architecture Deep Dive]]))
  * **GDPval score:** 83.0%, matching or exceeding industry professionals across 44 occupations in the top 9 U.S. GDP-contributing industries ((Source: [[https://www.digitalapplied.com/blog/gpt-5-4-computer-use-tool-search-benchmarks-pricing|Digital Applied — GPT-5.4]]))
  * **Context window:** 1 million tokens
  * **Hallucination reduction:** 33% fewer false claims vs. GPT-5.2
  * **Tool Search:** new API feature that reduces token usage by 47% with no accuracy loss

===== Comparison with Anthropic Computer Use =====

Anthropic introduced computer use for Claude in **October 2024**, establishing the category.
Key differences:

  * **Anthropic's approach:** screenshot-based cursor movement, clicking, and typing via API, initially in beta with Claude 3.5 Sonnet
  * **OpenAI's approach:** end-to-end trained capability baked into the model, supporting both visual (screenshot) and programmatic (Playwright) modes
  * **Performance gap:** GPT-5.4's 75.0% OSWorld score surpasses all prior models, including Claude's computer-use implementations

===== Model Variants =====

GPT-5.4 is available in three configurations:

  * **GPT-5.4 Standard:** available in ChatGPT for Plus, Team, and Pro subscribers
  * **GPT-5.4 Thinking:** extended-reasoning variant replacing GPT-5.2 Thinking
  * **GPT-5.4 Pro:** maximum-performance variant for ChatGPT Pro ($200/month) and API use

===== Pricing =====

  * **Input:** $2.50 per million tokens
  * **Context over 272K tokens:** billed at 2x the normal rate
  * Available via ChatGPT, the OpenAI API (model ID: gpt-5.4), and Codex ((Source: [[https://www.digitalapplied.com/blog/gpt-5-4-computer-use-tool-search-benchmarks-pricing|Digital Applied — GPT-5.4]]))

===== See Also =====

  * [[openai|OpenAI]]
  * [[computer_use|Computer Use]]
  * [[ai_agents|AI Agents]]

===== References =====