GPT-5 Computer Use

GPT-5.4, released by OpenAI on March 5, 2026, is OpenAI's first general-purpose model with native computer-use capabilities. It can see a screen, move a cursor, click, type, and carry out multi-step workflows without wrapper code or external tools. It scored 75.0% on OSWorld-Verified, surpassing the human expert baseline of 72.4% and giving it the highest score on that benchmark reported at release. ¹⁾

Background

GPT-5 was first released on August 7, 2025, as OpenAI's flagship multimodal model, with strong results in coding, math, writing, and visual perception. ²⁾ Computer use arrived with the GPT-5.4 update on March 5, 2026. Some analysts have described that release as OpenAI's most significant product advancement since the original ChatGPT launch.

How Computer Use Works

Unlike previous browser-automation approaches that relied on brittle scripts or browser-specific APIs, GPT-5.4 operates at the visual level:

The model receives screenshots of the current screen state
It decides what to click, type, or interact with
It executes the action via mouse/keyboard commands
It observes the result and continues the loop until the task is complete
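The four-step loop above can be sketched in Python. This is a minimal, runnable toy, not OpenAI's implementation. `FakeScreen`, `toy_policy`, and the `Action` vocabulary are hypothetical stand-ins: a text summary replaces the real screenshot, and a hard-coded rule replaces the model's decision.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # hypothetical vocabulary: "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Toy stand-in for a real display (hypothetical, for illustration only)."""
    typed: str = ""
    focused: bool = False

    def screenshot(self) -> str:
        # A real harness would return image bytes; a text summary keeps this runnable.
        return f"focused={self.focused} typed={self.typed!r}"

    def execute(self, action: Action) -> None:
        # Clicking focuses the (single) text field; typing only works once focused.
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.typed += action.text


def toy_policy(screenshot: str, goal: str) -> Action:
    """Hypothetical stand-in for the model: maps an observation to the next action."""
    if "focused=False" in screenshot:
        return Action("click", x=100, y=200)
    if f"typed={goal!r}" in screenshot:
        return Action("done")
    return Action("type", text=goal)


def run_agent(screen: FakeScreen, policy, goal: str, max_steps: int = 10) -> list:
    """Observe, decide, act, then observe again until the policy reports completion."""
    history = []
    for _ in range(max_steps):
        shot = screen.screenshot()      # 1. receive a screenshot of the current state
        action = policy(shot, goal)     # 2. decide what to click or type
        history.append(action.kind)
        if action.kind == "done":
            break
        screen.execute(action)          # 3. execute the mouse/keyboard action
        # 4. the next iteration observes the result of this action
    return history


screen = FakeScreen()
steps = run_agent(screen, toy_policy, "hello")
print(steps)         # ['click', 'type', 'done']
print(screen.typed)  # hello
```

The `max_steps` cap is a common safeguard in agent harnesses, so a policy that never reports completion cannot loop forever.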

The key innovation is that GPT-5.4 was trained end-to-end for computer-use tasks. The capability is built into the model weights rather than added through a plugin or a separate wrapper layer. The model supports both Playwright code generation for browser automation and direct mouse/keyboard commands issued from screenshots. ³⁾
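To illustrate the two execution modes, the sketch below takes screen-level actions and renders them as Playwright code instead of executing them directly. The action dictionary format is a hypothetical schema chosen for this example. It is not the format GPT-5.4 or the OpenAI API actually emits. The generated lines target Playwright's Python sync API (`page.mouse.click`, `page.mouse.move`, `page.mouse.wheel`, `page.keyboard.type`, `page.keyboard.press`), with coordinates in viewport pixels.

```python
def to_playwright(action: dict) -> str:
    """Render one action (hypothetical schema) as a line of Playwright Python code."""
    kind = action["type"]
    if kind == "click":
        return f"page.mouse.click({action['x']}, {action['y']})"
    if kind == "move":
        return f"page.mouse.move({action['x']}, {action['y']})"
    if kind == "scroll":
        # Playwright's wheel() takes horizontal then vertical deltas.
        return f"page.mouse.wheel({action.get('dx', 0)}, {action['dy']})"
    if kind == "type":
        return f"page.keyboard.type({action['text']!r})"
    if kind == "keypress":
        return f"page.keyboard.press({action['key']!r})"
    raise ValueError(f"unsupported action type: {kind!r}")


actions = [
    {"type": "click", "x": 412, "y": 96},
    {"type": "type", "text": "quarterly report"},
    {"type": "keypress", "key": "Enter"},
]
script = "\n".join(to_playwright(a) for a in actions)
print(script)
```

The same action list could instead be dispatched straight to an input driver. Emitting code trades that immediacy for a replayable, reviewable script.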

Performance

OSWorld-Verified: 75.0% (human baseline: 72.4%) — a 27.7 percentage-point improvement over GPT-5.2 ⁴⁾
GDPval Score: 83.0%, meaning the model's work was judged as good as or better than that of industry professionals in 83.0% of comparisons, spanning 44 occupations in the top 9 U.S. GDP-contributing industries ⁵⁾
Context window: 1 million tokens
Hallucination reduction: 33% fewer false claims than GPT-5.2
Tool Search: a new API feature reported to reduce token usage by 47% with no loss of accuracy
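The list above mixes absolute percentage-point differences with relative reductions, which are easy to conflate. The quick calculation below uses only the figures listed. The 27.7-point gain implies that GPT-5.2 scored about 47.3% on OSWorld-Verified, and GPT-5.4's margin over the human baseline is 2.6 points. The 100,000-token workload is a hypothetical example, and the calculation assumes the 47% Tool Search reduction applies uniformly, which real workloads may not match.

```python
# Figures as listed above: OSWorld scores in %, the gain in percentage points.
osworld_gpt54 = 75.0
human_baseline = 72.4
gain_over_gpt52_pp = 27.7

implied_gpt52 = osworld_gpt54 - gain_over_gpt52_pp   # absolute difference, not relative
margin_over_human = osworld_gpt54 - human_baseline


def tool_search_tokens(baseline_tokens: int, reduction: float = 0.47) -> int:
    """Tokens remaining after a relative reduction (assumed uniform across the workload)."""
    return round(baseline_tokens * (1 - reduction))


print(f"implied GPT-5.2 OSWorld-Verified: {implied_gpt52:.1f}%")   # 47.3%
print(f"margin over human baseline: {margin_over_human:.1f} pp")    # 2.6 pp
print(tool_search_tokens(100_000))                                  # 53000 (hypothetical workload)
```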

Comparison with Anthropic Computer Use

Anthropic introduced computer use for Claude in October 2024, establishing the category. Key differences:

Anthropic's approach: Screenshot-based cursor movement, clicking, and typing via API, initially in beta with Claude 3.5 Sonnet
OpenAI's approach: End-to-end trained capability baked into the model, supporting both visual (screenshot) and programmatic (Playwright) modes
Performance gap: at release, GPT-5.4's 75.0% OSWorld-Verified score surpassed all prior models, including Claude's computer-use implementations

Model Variants

GPT-5.4 is available in three configurations:

GPT-5.4 Standard — available in ChatGPT for Plus, Team, and Pro subscribers
GPT-5.4 Thinking — extended reasoning variant replacing GPT-5.2 Thinking
GPT-5.4 Pro — maximum performance variant for ChatGPT Pro ($200/month) and API use

Pricing

Input: $2.50 per million tokens
Context over 272K tokens: billed at 2x the normal rate
Available via ChatGPT, OpenAI API (model ID: gpt-5.4), and Codex ⁶⁾
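A rough input-cost estimator based on the pricing above is sketched below. It assumes that "272K" means 272,000 tokens. It also assumes the 2x rate applies to the entire request's input once it crosses that threshold. The source does not say whether only the excess is surcharged, and output-token pricing is not listed here, so the sketch omits it.

```python
INPUT_PRICE_PER_M = 2.50          # USD per million input tokens, as listed above
LONG_CONTEXT_THRESHOLD = 272_000  # assumption: "272K" read as 272,000 tokens
LONG_CONTEXT_MULTIPLIER = 2.0     # "billed at 2x the normal rate"


def input_cost_usd(input_tokens: int) -> float:
    """Estimate input cost in USD.

    ASSUMPTION: the 2x multiplier applies to all input tokens in a request
    whose context exceeds the threshold, not only to the tokens above it.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    rate = INPUT_PRICE_PER_M
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    return input_tokens / 1_000_000 * rate


print(input_cost_usd(200_000))  # 0.5
print(input_cost_usd(400_000))  # 2.0
```

If the surcharge turns out to apply only to tokens beyond the threshold, the `if` branch would instead split the count into a normal-rate portion and a 2x portion.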

References

¹⁾ Source: Digital Applied, GPT-5.4 Benchmarks and Pricing
²⁾ Source: OpenAI, Introducing GPT-5
³⁾ Source: VPN07, GPT-5.4 Computer Use Guide
⁴⁾ Source: Exzil Calanza, GPT-5.4 Architecture Deep Dive
⁵⁾ ⁶⁾ Source: Digital Applied, GPT-5.4
