====== Browser-Use ======
**Browser-Use** is an open-source Python library that lets AI agents control web browsers from natural language instructions. It is built on top of Playwright and integrates with LangChain. LLMs such as GPT-4o and Claude can use it to navigate websites, fill in forms, extract data, and carry out multi-step web tasks. With over 50,000 GitHub stars, it is one of the most widely adopted frameworks for agent-driven browser automation.
===== Architecture =====
Browser-Use follows a modular, agent-based architecture with three core components:
* **Agent** — The central orchestrator that takes a natural language task, an LLM, and a browser session. It autonomously reasons about page content and decides actions (click, type, scroll, navigate).
* **BrowserSession** — Manages browser connections via Chrome DevTools Protocol (CDP). Supports local Playwright browsers or cloud-hosted browsers via Browserless WebSocket endpoints. Configurable via ''BrowserProfile'' for headless mode, viewport size, and user-agent.
* **LLM Integration** — Uses LangChain-compatible chat models (''ChatOpenAI'', ''ChatAnthropic'') for decision-making. The LLM interprets DOM content, screenshots, and page state to determine the next action.
The agent loop works as follows. It observes the page state (DOM + optional screenshot) → sends it to the LLM → receives the next action → executes it via Playwright → repeats until the LLM reports the task as done or a step limit is reached.
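The loop can be sketched in plain Python. This is a toy illustration, not Browser-Use's implementation. ''FakeBrowser'' and ''scripted_llm'' are hypothetical stand-ins for a real browser session and an LLM call. The ''navigate''/''done'' action names are also invented for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical toy browser: its whole 'page state' is the current URL."""
    url: str = "about:blank"
    history: list = field(default_factory=list)

    def observe(self) -> str:
        # A real agent would serialize DOM text (and optionally a screenshot).
        return json.dumps({"url": self.url})

    def execute(self, action: dict) -> None:
        if action["name"] == "navigate":
            self.history.append(self.url)
            self.url = action["url"]


def scripted_llm(state: str) -> dict:
    """Hypothetical policy standing in for an LLM: navigate once, then finish."""
    page = json.loads(state)
    if page["url"] == "about:blank":
        return {"name": "navigate", "url": "https://news.ycombinator.com"}
    return {"name": "done", "result": f"Visited {page['url']}"}


def run_agent(browser, llm, max_steps: int = 10):
    """Observe -> decide -> act, stopping on a 'done' action or the step limit."""
    for step in range(max_steps):
        state = browser.observe()      # 1. observe page state
        action = llm(state)            # 2. ask the model for the next action
        if action["name"] == "done":   # 3. model signals completion
            return action["result"], step + 1
        browser.execute(action)        # 4. execute the action and loop again
    return None, max_steps


result, steps = run_agent(FakeBrowser(), scripted_llm)
print(result, steps)
```

Running it prints ''Visited https://news.ycombinator.com 2''. The first iteration navigates, and the second returns ''done''. The ''max_steps'' cap keeps a confused model from looping forever.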
===== How It Works with Playwright =====
Browser-Use relies on **Playwright** as its browser automation engine. Rather than requiring developers to write Playwright scripts, the library abstracts browser control behind the Agent interface:
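* Playwright launches a Chromium-based browser. Playwright can also drive Firefox and WebKit, but CDP is Chromium-specific, so Chromium is the practical target.
* The agent connects to that browser over CDP (Chrome DevTools Protocol) for real-time control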
* Actions like clicking, typing, scrolling, and navigation are executed through Playwright's async API
* Screenshots are captured for vision-capable LLMs to analyze
* DOM extraction provides structured page content for text-based reasoning
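To make the DOM-extraction step concrete, the sketch below flattens HTML into a numbered list of interactive elements. A model can then answer with "click element 2" instead of writing a CSS selector. This follows the general idea of indexing interactive elements for the LLM. It is a simplified stand-in built on Python's standard ''html.parser'', not Browser-Use's extractor. The tag set, the ''[index] <tag> label'' output format, and the names ''InteractiveElementIndexer'' and ''index_interactive_elements'' are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumed set of tags treated as "interactive" (a simplification).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementIndexer(HTMLParser):
    """Collect interactive elements and their visible text in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []   # each entry: {"tag", "attrs", "parts"}
        self._open = None    # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs), "parts": []})
            # <input> is a void element with no text content
            self._open = None if tag == "input" else len(self.elements) - 1

    def handle_endtag(self, tag):
        if self._open is not None and self.elements[self._open]["tag"] == tag:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            self.elements[self._open]["parts"].append(text)


def index_interactive_elements(html: str) -> str:
    """Return one line per interactive element: '[i] <tag> label (href=...)'."""
    parser = InteractiveElementIndexer()
    parser.feed(html)
    parser.close()
    lines = []
    for i, el in enumerate(parser.elements):
        attrs = el["attrs"]
        # Prefer visible text, then fall back to placeholder or value attributes
        label = " ".join(el["parts"]) or attrs.get("placeholder") or attrs.get("value") or ""
        line = f"[{i}] <{el['tag']}> {label}"
        if attrs.get("href"):
            line += f" (href={attrs['href']})"
        lines.append(line)
    return "\n".join(lines)


page = """<div><a href="/news">Top <b>stories</b></a>
<p>Not interactive</p>
<input placeholder="Search">
<button>Sign in</button></div>"""

listing = index_interactive_elements(page)
print(listing)
```

The paragraph is skipped, and the link, input, and button become indices 0, 1, and 2. A production extractor would also have to handle visibility, iframes, and shadow DOM, which this sketch ignores.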
For cloud deployments, Browser-Use connects to Browserless or similar services via WebSocket CDP URLs, avoiding the need for local browser installations.
===== Key Features =====
* **Multi-Tab Browsing** — Agents can open and manage multiple browser tabs simultaneously for tasks like comparison shopping
* **Vision Capabilities** — GPT-4o and other vision models analyze screenshots for visual reasoning alongside DOM text
* **DOM Extraction** — Parses the page's DOM and extracts its interactive elements (links, buttons, inputs) into a compact, indexed list the LLM can reference when choosing actions
* **Custom Actions** — Define custom action handlers for domain-specific interactions
* **Structured Output** — Pydantic schema support for typed, validated extraction results
* **Parallel Agents** — Run multiple agents concurrently for cross-site tasks
* **Async/Streaming** — Real-time step-by-step visibility into agent actions
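Because ''Agent.run()'' is a coroutine, running several agents at once comes down to ordinary ''asyncio'' concurrency. The sketch below uses a hypothetical ''fake_agent'' coroutine in place of a real ''Agent'', so it runs without a browser or API key. ''asyncio.gather'' then runs three tasks concurrently. With real agents, each one would presumably need its own browser session so that they do not act on the same page.

```python
import asyncio
import time


async def fake_agent(task: str, latency: float = 0.1) -> str:
    """Hypothetical stand-in for awaiting a real Agent(task=task, llm=...).run()."""
    await asyncio.sleep(latency)  # simulates LLM and page-load latency
    return f"done: {task}"


async def run_parallel(tasks: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_agent(t) for t in tasks))


tasks = ["price on site A", "price on site B", "price on site C"]
start = time.perf_counter()
results = asyncio.run(run_parallel(tasks))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")  # close to one latency, not three
```

The three simulated waits overlap, so total wall time stays close to a single 0.1 s delay rather than 0.3 s.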
===== Integration with LangChain and OpenAI =====
Browser-Use's reasoning layer is built around LangChain chat models:
* Uses ''langchain_openai.ChatOpenAI'' or ''langchain_anthropic.ChatAnthropic'' as the reasoning engine
* Other LangChain chat-model providers can be swapped in, though results depend on how reliably the model produces structured (tool-calling) output
* Agents can be embedded into larger LangChain chains and workflows
* Supports OpenAI function calling for structured tool use
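Function calling means the model receives a JSON-schema description of each available action and replies with a structured call rather than free text. The sketch below builds tool definitions in the OpenAI Chat Completions ''tools'' format and then validates a returned tool call. The action names ''click_element'' and ''input_text'' and the helpers ''tool'' and ''parse_tool_call'' are hypothetical. They do not reproduce Browser-Use's internal action registry.

```python
import json


def tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build one tool definition in the OpenAI Chat Completions `tools` format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical browser actions exposed to the model
TOOLS = [
    tool("click_element", "Click the interactive element with the given index.",
         {"index": {"type": "integer"}}, ["index"]),
    tool("input_text", "Type text into the element with the given index.",
         {"index": {"type": "integer"}, "text": {"type": "string"}}, ["index", "text"]),
]


def parse_tool_call(call: dict) -> tuple[str, dict]:
    """Check a model-returned tool call against TOOLS and decode its arguments."""
    spec = {t["function"]["name"]: t["function"] for t in TOOLS}
    name = call["function"]["name"]
    if name not in spec:
        raise ValueError(f"unknown action: {name}")
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    missing = [k for k in spec[name]["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return name, args


# Shape of one entry in an assistant message's `tool_calls` list
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "input_text", "arguments": '{"index": 1, "text": "browser agents"}'},
}
print(parse_tool_call(example_call))
```

Validating calls before execution means a malformed or hallucinated action fails fast instead of reaching the browser.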
===== Code Example =====
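<code python>
import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession, BrowserProfile

# Load OPENAI_API_KEY (and any other secrets) from a local .env file
load_dotenv()


async def main():
    # Configure a headless browser session
    session = BrowserSession(
        browser_profile=BrowserProfile(headless=True)
    )

    # Create the agent with GPT-4o as the reasoning model.
    # Newer Browser-Use releases ship their own chat-model wrappers instead of
    # LangChain's; check the docs for the installed version.
    agent = Agent(
        task="Go to Hacker News, find the top post, and return its title and URL.",
        llm=ChatOpenAI(model="gpt-4o"),
        browser_session=session,
    )

    # Run the agent; run() returns the full step history
    history = await agent.run()
    # final_result() extracts the agent's final answer from that history
    print(f"Result: {history.final_result()}")


asyncio.run(main())
</code>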
===== Architecture Diagram =====
<mermaid>
graph TD
  A["User Task (natural language)"] --> B["Agent (reasoning loop)"]
  B -->|page state| C["LLM (GPT-4o / Claude)"]
  B --> D["Browser Session (Playwright / CDP)"]
  C -->|next action| B
  D --> E["Browser (Chromium-based)"]
  E -->|DOM + screenshots| B
</mermaid>
===== References =====
* [[https://github.com/browser-use/browser-use|Browser-Use GitHub Repository]]
* [[https://docs.browser-use.com/|Browser-Use Documentation]]
* [[https://docs.browserless.io/ai-integrations/browser-use/python|Browserless Integration Guide]]
* [[https://docs.langchain.com/oss/python/integrations/tools/playwright|LangChain Playwright Integration]]
===== See Also =====
* [[firecrawl|Firecrawl]] — Web scraping API for LLM-ready data
* [[composio|Composio]] — Tool integration platform with browser actions
* [[e2b|E2B]] — Sandboxed execution environments for agents