====== Browser-Use ======

**Browser-Use** is an open-source Python library that lets AI agents control web browsers from natural language instructions. Built on top of Playwright and integrated with LangChain, it allows LLMs such as GPT-4o and Claude to navigate websites, fill in forms, extract data, and carry out multi-step web tasks. With over 50,000 GitHub stars, it is one of the most widely used frameworks for agent-driven browser automation.

===== Architecture =====

Browser-Use follows a modular, agent-based architecture with three core components:

  * **Agent**: The central orchestrator. It takes a natural language task, an LLM, and a browser session, reasons about the page content, and decides which action to take (click, type, scroll, navigate).
  * **BrowserSession**: Manages browser connections over the Chrome DevTools Protocol (CDP). It supports local Playwright browsers as well as cloud-hosted browsers reached through Browserless WebSocket endpoints, and is configured through ''BrowserProfile'' (headless mode, viewport size, user agent).
  * **LLM Integration**: Uses LangChain-compatible chat models (''ChatOpenAI'', ''ChatAnthropic'') for decision-making. The LLM interprets DOM content, screenshots, and page state to determine the next action.

The agent loop works as follows: observe the page state (DOM plus an optional screenshot) → send it to the LLM → receive an action → execute it via Playwright → repeat until the task is complete.

===== How It Works with Playwright =====

Browser-Use relies on **Playwright** as its browser automation engine.
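
The observe, decide, act loop described under Architecture can be illustrated without a real browser or LLM. The sketch below is not Browser-Use code: ''FakeBrowser'', ''Action'', ''run_loop'', and ''scripted_policy'' are hypothetical names invented for this example, and the scripted policy stands in for the LLM call. It only shows the control flow, under the assumption that the loop stops when the decision step returns a "done" action or a step limit is reached.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # e.g. "navigate", "click", "done" (illustrative only)
    target: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser session; records executed actions."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # A real agent would return DOM content and, optionally, a screenshot.
        return {"url": self.url, "steps_taken": len(self.log)}

    def execute(self, action: Action) -> None:
        if action.kind == "navigate":
            self.url = action.target
        self.log.append(action)


def scripted_policy(state: dict) -> Action:
    """Stands in for the LLM: maps the observed state to the next action."""
    if state["url"] == "about:blank":
        return Action("navigate", "https://news.ycombinator.com")
    if state["steps_taken"] < 2:
        return Action("click", "first story link")
    return Action("done")


def run_loop(browser: FakeBrowser, decide, max_steps: int = 10) -> list:
    """Observe, decide, execute; stop on "done" or after max_steps."""
    for _ in range(max_steps):
        state = browser.observe()
        action = decide(state)
        if action.kind == "done":
            break
        browser.execute(action)
    return browser.log


browser = FakeBrowser()
history = run_loop(browser, scripted_policy)
print([a.kind for a in history])
```

The step limit is a design choice in this sketch: a decision function that never reports completion cannot keep the loop running indefinitely.
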
Rather than requiring developers to write Playwright scripts, the library abstracts browser control behind the Agent interface:

  * Playwright launches Chromium, Firefox, or WebKit browsers
  * The agent connects via CDP (Chrome DevTools Protocol) for real-time control (CDP connections are available for Chromium-based browsers only)
  * Actions such as clicking, typing, scrolling, and navigation are executed through Playwright's async API
  * Screenshots are captured for vision-capable LLMs to analyze
  * DOM extraction provides structured page content for text-based reasoning

For cloud deployments, Browser-Use connects to Browserless or similar services via WebSocket CDP URLs, so no local browser installation is needed.

===== Key Features =====

  * **Multi-Tab Browsing**: Agents can open and manage multiple browser tabs at once for tasks such as comparison shopping
  * **Vision Capabilities**: GPT-4o and other vision models analyze screenshots for visual reasoning alongside DOM text
  * **DOM Extraction**: Full DOM tree parsing with element selection tailored for LLM consumption
  * **Custom Actions**: Define custom action handlers for domain-specific interactions
  * **Structured Output**: Pydantic schema support for typed, validated extraction results
  * **Parallel Agents**: Run multiple agents concurrently for cross-site tasks
  * **Async/Streaming**: Real-time, step-by-step visibility into agent actions

===== Integration with LangChain and OpenAI =====

Browser-Use is designed as a LangChain-native tool:

  * Uses ''langchain_openai.ChatOpenAI'' or ''langchain_anthropic.ChatAnthropic'' as the reasoning engine
  * Works with any LangChain-compatible LLM provider
  * Agents can be embedded into larger LangChain chains and workflows
  * Supports OpenAI function calling for structured tool use

===== Code Example =====

<code python>
import asyncio

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

from browser_use import Agent, BrowserProfile, BrowserSession

# Load environment variables (such as the OpenAI API key) from a local .env file
load_dotenv()


async def main():
    # Configure a headless browser session
    session = BrowserSession(
        browser_profile=BrowserProfile(headless=True)
    )

    # Create an agent that uses GPT-4o for reasoning.
    # Note: some releases name the session parameter browser_session.
    agent = Agent(
        task="Go to Hacker News, find the top post, and return its title and URL.",
        llm=ChatOpenAI(model="gpt-4o"),
        browser=session,
    )

    # Run the agent until the task completes, then print what it returns
    result = await agent.run()
    print(f"Result: {result}")


asyncio.run(main())
</code>

===== Architecture Diagram =====

<code mermaid>
graph TD
    A["User Task (natural language)"] --> B["Agent (reasoning loop)"]
    B --> C["LLM (GPT-4o / Claude)"]
    B --> D["Browser Session (Playwright CDP)"]
    C -->|interprets page state| B
    D --> E["Browser (Chromium / Firefox)"]
    E -->|DOM + screenshots| B
</code>

===== References =====

  * [[https://github.com/browser-use/browser-use|Browser-Use GitHub Repository]]
  * [[https://docs.browser-use.com/|Browser-Use Documentation]]
  * [[https://docs.browserless.io/ai-integrations/browser-use/python|Browserless Integration Guide]]
  * [[https://docs.langchain.com/oss/python/integrations/tools/playwright|LangChain Playwright Integration]]

===== See Also =====

  * [[firecrawl|Firecrawl]]: Web scraping API for LLM-ready data
  * [[composio|Composio]]: Tool integration platform with browser actions
  * [[e2b|E2B]]: Sandboxed execution environments for agents
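
===== Sketch: Parallel Agents =====

The Parallel Agents feature listed under Key Features comes down to running several agent coroutines concurrently. The sketch below shows that pattern with ''asyncio.gather'' from the standard library. It is an illustration under assumptions, not Browser-Use code: ''run_stub_agent'' is a hypothetical placeholder for awaiting a real agent's ''run()'' method, and the task strings are made up.

```python
import asyncio


async def run_stub_agent(task: str, delay: float = 0.01) -> str:
    """Hypothetical placeholder for awaiting a real agent's run() method."""
    await asyncio.sleep(delay)  # simulates time spent browsing
    return f"done: {task}"


async def run_all(tasks: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs, regardless of completion order.
    return await asyncio.gather(*(run_stub_agent(t) for t in tasks))


results = asyncio.run(run_all(["price on site A", "price on site B"]))
print(results)
```

With real agents, each coroutine would typically need its own browser session so that the agents do not interfere with each other's tabs; that isolation is an assumption of this sketch rather than something the stub demonstrates.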