Browser-Use

Browser-Use is an open-source Python library that lets AI agents control web browsers autonomously from natural language instructions. Built on top of Playwright and integrated with LangChain, it allows LLMs such as GPT-4o and Claude to navigate websites, fill forms, extract data, and perform complex multi-step web tasks. With over 50,000 GitHub stars, it is one of the most widely adopted frameworks for agent-driven browser automation.

Architecture

Browser-Use follows a modular, agent-based architecture with three core components:

Agent — The central orchestrator that takes a natural language task, an LLM, and a browser session. It autonomously reasons about page content and decides actions (click, type, scroll, navigate).
BrowserSession — Manages browser connections via Chrome DevTools Protocol (CDP). Supports local Playwright browsers or cloud-hosted browsers via Browserless WebSocket endpoints. Configurable via BrowserProfile for headless mode, viewport size, and user-agent.
LLM Integration — Uses LangChain-compatible chat models (ChatOpenAI, ChatAnthropic) for decision-making. The LLM interprets DOM content, screenshots, and page state to determine the next action.

The agent loop works as follows: observe the page state (DOM + optional screenshot) → send it to the LLM → receive an action → execute it via Playwright → repeat until the task is complete.
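The loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Browser-Use's actual implementation: `Action`, `FakePage`, `scripted_llm`, and `run_agent_loop` are hypothetical names, the page is simulated, and the "LLM" is a hard-coded function.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape)."""
    name: str            # e.g. "navigate", "click", "done"
    argument: str = ""   # URL, element label, or final answer


@dataclass
class FakePage:
    """Stand-in for a browser page; records what was executed."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> str:
        # A real agent would serialize the DOM (and maybe a screenshot).
        return f"url={self.url}"

    def execute(self, action: Action) -> None:
        if action.name == "navigate":
            self.url = action.argument
        self.log.append((action.name, action.argument))


def scripted_llm(task: str, observation: str) -> Action:
    """Stand-in for the LLM call: picks an action from the observation."""
    if "example.com" not in observation:
        return Action("navigate", "https://example.com")
    return Action("done", "Reached example.com")


def run_agent_loop(task, page, llm, max_steps=10):
    """Observe -> ask the model -> execute, until 'done' or the step cap."""
    for _ in range(max_steps):
        action = llm(task, page.observe())
        if action.name == "done":
            return action.argument
        page.execute(action)
    return None  # step budget exhausted without finishing


page = FakePage()
answer = run_agent_loop("Open example.com", page, scripted_llm)
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real loop: without it, a model that never emits a terminating action would run indefinitely.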

How It Works with Playwright

Browser-Use relies on Playwright as its browser automation engine. Rather than requiring developers to write Playwright scripts, the library abstracts browser control behind the Agent interface:

Playwright launches Chromium, Firefox, or WebKit browsers
The agent connects via CDP (Chrome DevTools Protocol) for real-time control; CDP connections apply to Chromium-based browsers
Actions like clicking, typing, scrolling, and navigation are executed through Playwright's async API
Screenshots are captured for vision-capable LLMs to analyze
DOM extraction provides structured page content for text-based reasoning
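To make the DOM extraction step concrete, the sketch below reduces a page to an indexed list of interactive elements that a text-only model could refer to by number. It uses only the standard library's `html.parser`; the `InteractiveElementIndexer` class, the tag set, and the output format are illustrative assumptions, not the library's own extraction logic.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" in this sketch (an assumption).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementIndexer(HTMLParser):
    """Collects interactive elements and assigns each a numeric index."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            # Prefer a human-meaningful label; fall back to an empty string.
            label = attr_map.get("name") or attr_map.get("href") or attr_map.get("id") or ""
            self.elements.append((len(self.elements), tag, label))


def summarize(html: str) -> str:
    """Returns one line per interactive element: '[index] <tag> label'."""
    indexer = InteractiveElementIndexer()
    indexer.feed(html)
    return "\n".join(f"[{i}] <{tag}> {label}".rstrip() for i, tag, label in indexer.elements)


html = '<form><input name="q"><button id="go">Search</button></form><p>text</p><a href="/about">About</a>'
summary = summarize(html)
print(summary)
```

An LLM given this summary can answer with an index such as `1`, which the agent then maps back to the matching element.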

For cloud deployments, Browser-Use connects to Browserless or similar services via WebSocket CDP URLs, avoiding the need for local browser installations.
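A remote connection of this kind can be sketched with Playwright's own `connect_over_cdp` call, which attaches to an already running Chromium instance. The host `browserless.example.com`, the token-as-query-parameter scheme, and the helper names `build_cdp_url` and `open_remote_page` are assumptions for illustration; check your provider's documentation for the real endpoint format.

```python
import asyncio
from urllib.parse import urlencode


def build_cdp_url(host: str, token: str) -> str:
    """Builds a WebSocket URL with the API token as a query parameter (assumed scheme)."""
    return f"wss://{host}?{urlencode({'token': token})}"


async def open_remote_page(cdp_url: str, target: str) -> str:
    """Attaches to a remote Chromium over CDP and returns the page title."""
    # Imported lazily so the URL helper works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url)
        page = await browser.new_page()
        await page.goto(target)
        title = await page.title()
        await browser.close()
        return title


cdp_url = build_cdp_url("browserless.example.com", "YOUR_TOKEN")
print(cdp_url)
# To actually connect (requires Playwright and a reachable endpoint):
# asyncio.run(open_remote_page(cdp_url, "https://example.com"))
```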

Key Features

Multi-Tab Browsing — Agents can open and manage multiple browser tabs simultaneously for tasks like comparison shopping
Vision Capabilities — GPT-4o and other vision models analyze screenshots for visual reasoning alongside DOM text
DOM Extraction — Full DOM tree parsing with intelligent element selection for LLM consumption
Custom Actions — Define custom action handlers for domain-specific interactions
Structured Output — Pydantic schema support for typed, validated extraction results
Parallel Agents — Run multiple agents concurrently for cross-site tasks
Async/Streaming — Real-time step-by-step visibility into agent actions
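As an illustration of the Parallel Agents feature, the sketch below runs several agents concurrently with `asyncio.gather`. The coroutine `run_stub_agent` is a hypothetical stand-in for awaiting a real agent's `run()`; it only sleeps, so the example shows the concurrency pattern rather than actual browsing.

```python
import asyncio


async def run_stub_agent(site: str, delay: float) -> dict:
    """Hypothetical stand-in for awaiting one agent's run against one site."""
    await asyncio.sleep(delay)  # simulates browsing time
    return {"site": site, "status": "ok"}


async def run_all(sites):
    # gather() schedules every coroutine at once and returns results
    # in the same order as the inputs, regardless of finish order.
    tasks = [run_stub_agent(site, delay) for site, delay in sites]
    return await asyncio.gather(*tasks)


results = asyncio.run(run_all([("shop-a.example", 0.02), ("shop-b.example", 0.01)]))
print([r["site"] for r in results])
```

Because results come back in input order, each one can be paired with its task without extra bookkeeping, even when the second agent finishes first.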

Integration with LangChain and OpenAI

Browser-Use is designed as a LangChain-native tool:

Uses langchain_openai.ChatOpenAI or langchain_anthropic.ChatAnthropic as the reasoning engine
Compatible with any LangChain-compatible LLM provider
Agents can be embedded into larger LangChain chains and workflows
Supports OpenAI function calling for structured tool use
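The function-calling point can be illustrated with a small sketch: a tool is described with a JSON schema, the model returns the tool's name plus its arguments as a JSON string, and the caller dispatches to a matching handler. The tool name `click_element`, the `HANDLERS` table, and the hand-written tool call are hypothetical; this shows the general shape of OpenAI-style tool use, not Browser-Use's internal action registry.

```python
import json

# Tool definition in the JSON-schema shape used by OpenAI function calling.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click_element",
        "description": "Click the element with the given index.",
        "parameters": {
            "type": "object",
            "properties": {"index": {"type": "integer"}},
            "required": ["index"],
        },
    },
}


def click_element(index: int) -> str:
    """Hypothetical handler; a real one would drive the browser."""
    return f"clicked element {index}"


HANDLERS = {"click_element": click_element}


def dispatch(tool_call: dict) -> str:
    """Runs the handler named in a tool call; arguments arrive as a JSON string."""
    name = tool_call["function"]["name"]
    arguments = json.loads(tool_call["function"]["arguments"])
    return HANDLERS[name](**arguments)


# A tool call shaped like the ones a model returns (hand-written here).
example_call = {"function": {"name": "click_element", "arguments": '{"index": 3}'}}
outcome = dispatch(example_call)
print(outcome)
```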

Code Example

import asyncio
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession, BrowserProfile
 
load_dotenv()
 
async def main():
    # Configure browser session
    session = BrowserSession(
        browser_profile=BrowserProfile(headless=True)
    )
 
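    # Create agent with GPT-4o
    agent = Agent(
        task="Go to Hacker News, find the top post, and return its title and URL.",
        llm=ChatOpenAI(model="gpt-4o"),
        browser_session=session,  # the session is passed via browser_session
    )

    # Run the agent; run() returns the history of steps taken
    history = await agent.run()
    # final_result() is the text the agent reported when it finished
    print(f"Result: {history.final_result()}")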
 
asyncio.run(main())

Architecture Diagram

                    ┌─────────────┐
                    │  User Task  │
                    │ (natural    │
                    │  language)  │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │    Agent    │
                    │  (reasoning │
                    │    loop)    │
                    └──┬──────┬───┘
                       │      │
              ┌────────▼┐  ┌──▼────────┐
              │   LLM   │  │  Browser  │
              │ (GPT-4o │  │  Session  │
              │  Claude)│  │(Playwright│
              └─────────┘  │   CDP)    │
                           └─────┬─────┘
                                 │
                          ┌──────▼──────┐
                          │   Browser   │
                          │ (Chromium/  │
                          │  Firefox)   │
                          └─────────────┘
