Playwright MCP is Microsoft's Model Context Protocol server for browser automation. It reads the browser's accessibility tree instead of taking screenshots, giving AI agents a structured, token-efficient view of any web page. It has 27,900 GitHub stars, works with every major coding agent, and requires no vision model.
How Playwright MCP Works
Browsers maintain an accessibility tree for screen readers. Every button, link, input, heading, and interactive element has a node in this tree with its role, label, and state. Playwright MCP exposes this tree to LLMs through the Model Context Protocol.
The workflow is straightforward:
- The AI agent calls browser_navigate with a URL
- Playwright MCP loads the page and captures the accessibility snapshot
- The snapshot (structured text, not pixels) returns to the LLM
- The LLM reads the snapshot, identifies the target element, and calls the next tool (browser_click, browser_type, etc.)
No computer vision. No pixel coordinates. No guessing where a button is based on a screenshot. The LLM gets a labeled tree of elements and picks the one it needs by reference.
Accessibility snapshot (simplified)
- heading "Sign In" [level=1]
- textbox "Email address" [focused]
- textbox "Password"
- button "Sign in"
- link "Forgot password?"
- separator
- button "Sign in with Google"
- button "Sign in with GitHub"
The LLM sees this instead of a 1280x720 PNG. It knows exactly what elements exist, what they do, and how to interact with them.
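To make "a labeled tree of elements" concrete, here is a minimal Python sketch that parses the simplified snapshot above into records and looks up an element by role and label. The line format is the simplified one shown in this article, not necessarily the server's exact output, and `parse_snapshot` and `find` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches simplified lines such as: - textbox "Email address" [focused]
LINE_RE = re.compile(
    r'^-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?(?:\s+\[(?P<attrs>[^\]]*)\])?\s*$'
)


@dataclass
class Node:
    role: str
    name: Optional[str]   # accessible label, if the element has one
    attrs: Optional[str]  # raw state text, e.g. "focused" or "level=1"


def parse_snapshot(text: str) -> List[Node]:
    """Turn each '- role "name" [attrs]' line into a Node."""
    nodes = []
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            nodes.append(Node(match["role"], match["name"], match["attrs"]))
    return nodes


def find(nodes: List[Node], role: str, name: str) -> Optional[Node]:
    """Return the first node with the given role and exact label."""
    for node in nodes:
        if node.role == role and node.name == name:
            return node
    return None


SNAPSHOT = '''
- heading "Sign In" [level=1]
- textbox "Email address" [focused]
- textbox "Password"
- button "Sign in"
- link "Forgot password?"
- separator
- button "Sign in with Google"
- button "Sign in with GitHub"
'''

nodes = parse_snapshot(SNAPSHOT)
target = find(nodes, "button", "Sign in")
print(target)
```

The exact-label match is what separates "Sign in" from "Sign in with Google". A screenshot-based agent would have to make the same distinction from pixels.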
Accessibility Snapshots vs Screenshots
The choice between snapshots and screenshots defines how the agent "sees" the page. Playwright MCP defaults to snapshots for good reason.
| Factor | Snapshot Mode (default) | Vision Mode (--vision) |
|---|---|---|
| Input format | Structured text (accessibility tree) | Base64-encoded screenshots |
| Token cost per page | ~3,800 tokens for a login form | ~10,000+ tokens for same page |
| Model requirement | Any text LLM | Vision-capable model required |
| Element targeting | Deterministic (reference IDs) | Coordinate-based (can miss) |
| Works on canvas/WebGL | Limited | Yes |
| Custom-rendered UIs | May miss elements | Captures visual layout |
Use snapshot mode for forms, navigation, data extraction, and standard web UIs. Switch to vision mode only for canvas-heavy applications, games, or pages with custom rendering that bypasses the DOM.
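The guidance above reduces to a small decision rule. The sketch below is only an illustration: `PageTraits` and `pick_mode` are hypothetical names, and the token figures are the approximate login-form numbers from the table, not measurements.

```python
from dataclasses import dataclass

# Approximate per-page costs for a login form, taken from the table above.
APPROX_TOKENS = {"snapshot": 3800, "vision": 10000}


@dataclass
class PageTraits:
    uses_canvas_or_webgl: bool = False
    custom_rendered_ui: bool = False  # rendering that bypasses the DOM


def pick_mode(page: PageTraits) -> str:
    """Default to snapshot mode; fall back to vision only when the
    accessibility tree is likely to be incomplete."""
    if page.uses_canvas_or_webgl or page.custom_rendered_ui:
        return "vision"
    return "snapshot"


login_form = PageTraits()
canvas_game = PageTraits(uses_canvas_or_webgl=True)
print(pick_mode(login_form), APPROX_TOKENS[pick_mode(login_form)])
print(pick_mode(canvas_game), APPROX_TOKENS[pick_mode(canvas_game)])
```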
Setup for Every IDE
Playwright MCP ships as a single npm package. No global install needed. Every configuration points to npx @playwright/mcp@latest, which downloads and runs the server on demand. Browser binaries install automatically on first use.
Claude Code
Add Playwright MCP to Claude Code
claude mcp add playwright -- npx @playwright/mcp@latest
This persists in your ~/.claude.json. The server starts automatically when Claude Code needs browser access.
Cursor
Cursor MCP configuration
// Settings → MCP → Add new MCP Server
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
VS Code Copilot
VS Code / Copilot configuration
// .vscode/mcp.json (VS Code uses a top-level "servers" key, not "mcpServers")
{
"servers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
GitHub Copilot Coding Agent has Playwright MCP pre-configured. It can read, interact with, and screenshot web pages on localhost during code generation with no setup.
OpenAI Codex
Codex config.toml
[mcp_servers.playwright]
command = "npx"
args = ["@playwright/mcp@latest"]
Cline
Cline MCP configuration
{
"mcpServers": {
"playwright": {
"type": "stdio",
"command": "npx",
"timeout": 30,
"args": ["-y", "@playwright/mcp@latest"]
}
}
}
Common flags
--headless runs without a visible browser (for CI). --browser firefox switches from Chromium. --vision enables screenshot mode. --device "iPhone 14" emulates mobile viewports.
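Because most clients share the same command and differ only in flags, a config entry can be generated rather than hand-edited. The sketch below uses only the flags named above; `playwright_mcp_config` is a hypothetical helper, and the top-level key is assumed to be "mcpServers" as in the Cursor and Cline examples (adjust it for clients that use a different key).

```python
import json
from typing import Optional


def playwright_mcp_config(
    headless: bool = False,
    browser: Optional[str] = None,
    device: Optional[str] = None,
) -> dict:
    """Build an mcpServers entry that launches Playwright MCP through npx."""
    args = ["@playwright/mcp@latest"]
    if headless:
        args.append("--headless")
    if browser:
        args += ["--browser", browser]
    if device:
        args += ["--device", device]
    return {"mcpServers": {"playwright": {"command": "npx", "args": args}}}


ci_config = playwright_mcp_config(headless=True, browser="firefox")
print(json.dumps(ci_config, indent=2))
```

Passing the arguments as a list avoids shell-quoting problems with values that contain spaces, such as a device name.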
Available MCP Tools
Playwright MCP exposes 15+ tools through the Model Context Protocol. Each tool maps to a specific browser action.
Navigation
browser_navigate, browser_navigate_back, browser_navigate_forward, browser_tab_new, browser_tab_close. Navigate to URLs, manage tabs, move through history.
Interaction
browser_click, browser_type, browser_select_option, browser_hover, browser_press_key, browser_handle_dialog. Click elements, fill forms, select dropdowns, dismiss alerts.
Observation
browser_snapshot, browser_wait_for, browser_pdf_save, browser_close. Capture page state, wait for conditions, export PDFs, clean up resources.
Each tool accepts structured parameters. browser_click takes an element reference from the accessibility snapshot. browser_type takes a reference and the text to enter. The LLM never guesses pixel coordinates. It picks elements from the labeled tree.
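Under the hood, an MCP tool call is a JSON-RPC 2.0 request with the method "tools/call". The sketch below builds such messages in Python. The argument names (ref, text) and the reference values follow this article's examples and are assumptions; the server's actual input schema may require additional fields, which a client can discover through "tools/list".

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Element references come from the most recent accessibility snapshot.
click = tool_call("browser_click", {"ref": "e5"})
type_email = tool_call("browser_type", {"ref": "e3", "text": "user@example.com"})
print(json.dumps(click))
print(json.dumps(type_email))
```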
Example: AI agent filling a login form
// Agent receives accessibility snapshot:
// - textbox "Email address" [ref=e3]
// - textbox "Password" [ref=e4]
// - button "Sign in" [ref=e5]
// Agent calls:
browser_type({ ref: "e3", text: "user@example.com" })
browser_type({ ref: "e4", text: "password123" })
browser_click({ ref: "e5" })
// Each action returns an updated snapshot
// confirming the state change
MCP vs CLI: The Token Efficiency Debate
In February 2026, the Playwright team released Playwright CLI alongside MCP, and the benchmarks raised eyebrows. A typical browser automation task consumed 114,000 tokens with MCP and 27,000 tokens with CLI, roughly a 4x difference.
The architectural difference explains the gap. MCP streams the full accessibility snapshot back to the LLM in the tool response after every action. Navigate to a login page, and 3,800 tokens of snapshot data enter the context window. Navigate to the dashboard, and another 8,000 tokens pile on top. By step 15, the conversation carries 60,000-80,000 tokens of accumulated page state from screens the agent already left behind.
CLI saves snapshots as YAML files on disk. The agent gets a one-line file path in the response and chooses whether to read the file. Most of the time, it doesn't need the full snapshot again. Context stays flat.
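The accumulation effect is easy to model. The following Python sketch compares the two strategies over a 15-step session. The page sizes after the login page and dashboard, the 20-token cost of a file path, and the choice of which snapshots the agent reads are all illustrative assumptions, not benchmark data.

```python
from typing import Iterable, List


def mcp_context_tokens(snapshot_sizes: List[int]) -> List[int]:
    """Cumulative context when every snapshot is inlined in the tool response."""
    total, history = 0, []
    for size in snapshot_sizes:
        total += size
        history.append(total)
    return history


def cli_context_tokens(
    snapshot_sizes: List[int], path_tokens: int = 20, reads: Iterable[int] = ()
) -> List[int]:
    """Cumulative context when each step returns a file path and the agent
    loads the full snapshot only for the step indices listed in `reads`."""
    read_steps = set(reads)
    total, history = 0, []
    for step, size in enumerate(snapshot_sizes):
        total += path_tokens
        if step in read_steps:
            total += size
        history.append(total)
    return history


# Login page (3,800), dashboard (8,000), then 13 hypothetical mid-sized pages.
sizes = [3800, 8000] + [4500] * 13
mcp = mcp_context_tokens(sizes)
cli = cli_context_tokens(sizes, reads=[0, 1])
print(mcp[-1], cli[-1])  # 70300 12100
```

With these assumed sizes the inlined approach reaches about 70,000 tokens by step 15, inside the 60,000-80,000 range quoted above, while the file-based approach grows only when the agent chooses to read a snapshot.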
| Scenario | Use MCP | Use CLI |
|---|---|---|
| Claude Desktop / chat interfaces | Yes (no filesystem) | No |
| Claude Code / Cursor / Copilot | Works but costly | Yes (4x cheaper) |
| Sandboxed agent (no shell) | Yes (only option) | No |
| CI/CD test generation | Possible | Better (headless + flat context) |
| Long multi-step workflows | Degrades after ~15 steps | Stable through 50+ steps |
| Custom MCP tool orchestration | Native MCP protocol | Requires shell wrapping |
The Playwright team's recommendation
If your agent has filesystem access, use CLI. If it's sandboxed, use MCP. For most coding agent workflows (Claude Code, Cursor, Copilot), CLI is the better default because these tools already have shell access.
Stagehand vs Browser Use vs Playwright MCP
Three frameworks dominate AI browser automation in 2026, each with a different philosophy.
| Factor | Playwright MCP | Stagehand | Browser Use |
|---|---|---|---|
| Approach | Structured MCP tools | AI primitives (act/extract/observe) | Full autonomous agent loop |
| Built on | Playwright (Microsoft) | Playwright (Browserbase) | Playwright (open source) |
| LLM calls per action | 1 (tool call) | 1 (with auto-cache skip) | 1+ (re-reasons each step) |
| Caching | None | Auto-caches selectors after first run | None (re-reasons every step) |
| GitHub stars | 27.9K | 12K+ | 50K+ |
| Best for | Coding agent integration | Surgical AI actions in workflows | Fully autonomous browsing |
Playwright MCP is the right choice when you're integrating browser control into an existing coding agent. It speaks MCP natively, so Claude Code, Cursor, and Copilot can use it without wrappers.
Stagehand (by Browserbase) adds intelligence on top of Playwright. Its act() method translates natural language to browser actions, and extract() pulls structured data from pages. The auto-caching is the differentiator: once an action succeeds, Stagehand records the selector and replays it without calling the LLM on subsequent runs.
Browser Use gives the LLM complete control. The agent decides what to click, when to scroll, and when the task is done. No cached selectors, no predefined tools. This makes it the most flexible but also the most expensive per task. 50,000+ GitHub stars make it one of the fastest-growing open-source AI projects.
Production systems increasingly use hybrid approaches: Playwright for the 80% of steps that are deterministic, and Stagehand or Browser Use for the 20% that need AI reasoning.
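The caching idea that separates Stagehand from the other two can be sketched in a few lines. This is a conceptual Python illustration, not Stagehand's implementation or API: `CachedResolver` and `fake_llm` are hypothetical, the returned selector is made up, and invalidating a cached selector when it stops working is not modeled.

```python
from typing import Callable, Dict


class CachedResolver:
    """Ask the model for a selector once, then replay it on later runs."""

    def __init__(self, ask_llm: Callable[[str], str]) -> None:
        self._ask_llm = ask_llm
        self._cache: Dict[str, str] = {}
        self.llm_calls = 0

    def selector_for(self, instruction: str) -> str:
        if instruction not in self._cache:
            self.llm_calls += 1  # only a cache miss costs an LLM call
            self._cache[instruction] = self._ask_llm(instruction)
        return self._cache[instruction]


def fake_llm(instruction: str) -> str:
    # Stand-in for a real model call; returns a made-up selector.
    return "role=button[name='Sign in']"


resolver = CachedResolver(fake_llm)
first = resolver.selector_for("click the sign in button")
second = resolver.selector_for("click the sign in button")
print(first == second, resolver.llm_calls)  # True 1
```

An agent that re-reasons at every step, as Browser Use is described above, corresponds to calling the model on every request with no cache in between.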
Use Cases
Visual Verification for Coding Agents
The most common use case in coding workflows. After generating or editing frontend code, the agent navigates to localhost, takes an accessibility snapshot (or screenshot in vision mode), and verifies the UI matches expectations. This closes the feedback loop between code generation and visual correctness without a human checking the browser.
E2E Test Generation
Checkly documented this workflow: the agent navigates the app, captures snapshots at each step, and generates a complete Playwright test file with proper selectors and assertions. Instead of writing boilerplate test code manually, the agent produces a ready-to-run test suite from a natural language description of the user flow.
Web Scraping with Dynamic Content
Traditional scrapers break on JavaScript-heavy pages. Playwright MCP handles dynamic content natively: clicking "load more" buttons, handling infinite scroll, navigating pagination, and extracting structured data. The accessibility tree approach means the agent understands page structure, not just HTML strings.
Form Automation and RPA
Filling complex multi-step forms, navigating admin panels, and automating repetitive browser tasks. The accessibility tree gives the agent labeled form fields with their types and validation states, making form interaction reliable across different UI frameworks.
Coding Agent Workflow
Edit code, start dev server, navigate to localhost, verify UI, iterate. The agent sees exactly what changed without leaving the IDE.
Testing Workflow
Describe a user flow in natural language. The agent navigates the app, captures snapshots, and generates a complete Playwright test file with assertions.
Performance Optimization
Token consumption is the primary cost driver with Playwright MCP. A few targeted optimizations make a significant difference.
1. Use includeSnapshot: false
The expectation parameter on tool responses controls whether the full accessibility snapshot returns with each action. Setting includeSnapshot: false provides a 70-80% token reduction per action. Only request the snapshot when the agent actually needs to read the page state.
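A rough estimate shows how this adds up over a session. The Python sketch below assumes a 3,800-token full response, a 75% reduction for responses without a snapshot (the midpoint of the range quoted above), and a 10-step run that requests the snapshot only on the first and last step. All of these inputs are illustrative, and `run_tokens` is a hypothetical helper.

```python
from typing import Set


def run_tokens(
    steps: int, full_tokens: int, reduction_pct: int, snapshot_steps: Set[int]
) -> int:
    """Total response tokens when only `snapshot_steps` include the snapshot."""
    lean_tokens = full_tokens * (100 - reduction_pct) // 100
    return sum(
        full_tokens if step in snapshot_steps else lean_tokens
        for step in range(steps)
    )


always = run_tokens(10, 3800, 75, snapshot_steps=set(range(10)))
selective = run_tokens(10, 3800, 75, snapshot_steps={0, 9})
print(always, selective)  # 38000 15200
```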
2. Switch to CLI When Possible
For coding agents with filesystem access, Playwright CLI saves snapshots to disk instead of streaming them into context. 4x fewer tokens. The agent reads snapshot files only when it needs them.
3. Minimize Navigation
Each navigation loads a new page and returns a full snapshot. Plan browser interactions to minimize page transitions. Use browser_wait_for to handle dynamic content on the current page instead of refreshing.
4. Run Headless in CI
The --headless flag skips rendering the browser UI. Faster execution, lower resource usage, same automation capabilities. Standard for CI pipelines and background tasks.
Frequently Asked Questions
What is Playwright MCP?
Playwright MCP is Microsoft's Model Context Protocol server for browser automation. It gives AI agents structured access to web pages through accessibility snapshots instead of screenshots. Ships as @playwright/mcp on npm, works with Claude Code, Cursor, VS Code Copilot, Cline, and Codex.
How do I set up Playwright MCP with Claude Code?
One command: claude mcp add playwright -- npx @playwright/mcp@latest. This persists in your Claude Code configuration. Browser binaries install automatically on first use.
What is the difference between Playwright MCP and Playwright CLI?
MCP streams accessibility snapshots directly into the AI's context window through the Model Context Protocol. CLI saves snapshots as YAML files on disk, and the agent reads them only when needed. CLI uses roughly 4x fewer tokens (27K vs 114K per task) but requires the agent to have filesystem access. The Playwright team recommends CLI for coding agents (Claude Code, Cursor) and MCP for sandboxed environments (Claude Desktop, custom chat UIs).
Does Playwright MCP require a vision model?
No. In snapshot mode (the default), Playwright MCP uses the accessibility tree, which is structured text. Any text-based LLM can control the browser. Vision mode (--vision flag) uses screenshots, but it's only needed for pages with incomplete accessibility trees (canvas apps, custom-rendered UIs).
What browsers does Playwright MCP support?
Chromium (default), Firefox, WebKit, and Microsoft Edge. Specify with --browser firefox or --browser webkit. Runs headed (visible window) by default, or headless with --headless. Device emulation is available with --device "iPhone 14".
How does Playwright MCP compare to Stagehand and Browser Use?
Playwright MCP provides raw structured browser control through MCP tools. Stagehand (Browserbase) wraps Playwright with AI primitives and adds auto-caching for repeated actions. Browser Use gives the LLM full autonomous browser control with re-reasoning at every step. MCP integrates natively with coding agents. Stagehand is best for production workflows needing cached reliability. Browser Use is best for fully autonomous browsing.
How can I reduce token usage with Playwright MCP?
Set includeSnapshot: false in tool responses for 70-80% token savings per action. Switch to Playwright CLI when your agent has filesystem access (4x savings). Minimize unnecessary navigations. Use browser_wait_for instead of re-navigating to check dynamic content.
Can Playwright MCP run in CI/CD pipelines?
Yes. Use --headless to run without a browser UI. Browser binaries install automatically. Works with any CI provider that supports Node.js. Combine with --browser chromium for the most stable CI experience.
Build Smarter Coding Agent Workflows
Playwright MCP gives your agent eyes. Morph's MCP tools give it a brain. WarpGrep MCP provides RL-trained code search that keeps your agent's context clean, and Fast Apply turns diffs into working code changes. Pair them with Playwright MCP for agents that can search, edit, and verify.