Browser Control - EnConvo Documentation

Overview

Browser Control lets EnConvo’s AI interact with your real browser — the same one where you’re logged into all your accounts. Unlike headless browser tools that start fresh with no cookies or sessions, Browser Control works with your existing browser instance, including all your login sessions, cookies, bookmarks, and extensions. This means the AI can:

Navigate to your dashboard without needing your credentials
Fill forms on sites where you’re already authenticated
Take screenshots of exactly what you see
Read page content from any tab
Click buttons, manage tabs, and automate workflows — all in your real browser

Use Browser Control for websites and browser tabs. Use Computer Use for native macOS apps, file pickers, system dialogs, and desktop app workflows.

Real Browser, Real Sessions

Works with your existing login sessions — no re-authentication needed

Multi-Browser Support

Chrome, Edge, Brave, Arc, Vivaldi, Opera, Firefox, and more

86+ Actions

Navigate, click, type, screenshot, snapshot, eval, tab management, cookies, and much more

Background Mode

Automate without bringing the browser to the foreground

How It Works

Browser Control uses a lightweight companion extension installed in your browser. The extension communicates with EnConvo via WebSocket, allowing the AI to send commands and receive results.

Install the Companion Extension

Install the Enconvo Companion extension in your browser

Extension Connects Automatically

The extension connects to EnConvo via WebSocket when your browser is running

AI Sends Commands

When you ask the AI to interact with a webpage, it sends commands through the extension

Extension Executes & Returns Results

The extension performs the action in your browser and returns the result to the AI
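
The four steps above amount to a request/response exchange over the WebSocket. The sketch below models that exchange with plain JSON. The envelope fields (`id`, `action`, `params`, `ok`, `result`, `error`) are assumptions made for illustration, not EnConvo's documented wire format.

```python
import itertools
import json

# Hypothetical envelope: the real message format is not documented here.
_ids = itertools.count(1)

def build_command(action, **params):
    """Serialize one command the way the AI side might send it."""
    return json.dumps({"id": next(_ids), "action": action, "params": params})

def parse_result(raw):
    """Decode a reply and raise if the extension reported a failure."""
    msg = json.loads(raw)
    if not msg.get("ok", False):
        raise RuntimeError(msg.get("error", "unknown error"))
    return msg.get("result")

command = build_command("navigate", url="https://example.com")
reply = json.dumps({"id": 1, "ok": True, "result": {"title": "Example Domain"}})
print(command)
print(parse_result(reply))
```

Pairing each reply with a request `id` is a common way to match results to commands when several are in flight on one socket. Whether EnConvo does this is not stated in this page.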

Installing the Companion Extension

Chrome Web Store (Recommended)

Open the Enconvo Companion extension page in Chrome Web Store
Click Add to Chrome
Click Add extension in the confirmation popup

This also works for Edge, Brave, Vivaldi, Opera, and other Chromium-based browsers; they all support Chrome Web Store extensions.

Manual Installation

If the extension isn’t available on the Chrome Web Store yet, you can install it manually:

Open Extensions Page

Open your browser and navigate to chrome://extensions (or edge://extensions, brave://extensions, etc.)

Enable Developer Mode

Toggle on Developer mode using the switch in the top-right corner

Load the Extension

Click Load unpacked and select the folder:

~/.enconvo/chrome_extension/

Verify

You should see “Enconvo Companion” in your extensions list. The extension icon will appear in your browser toolbar.

EnConvo automatically copies the extension files to ~/.enconvo/chrome_extension/ when the app starts. If the folder doesn’t exist, restart EnConvo first.

Setting Up

Choose Your Default Browser

By default, Browser Control uses your system’s default browser. You can override this:

Open EnConvo Settings
Find the Browser Use extension
Set Default Browser to your preferred browser

Verify Connection

To check if the extension is connected, use the browser-use/status API route:

browser-use/status

This shows:

Which browsers are connected
Which browser is currently active
Whether the extension is running

Core Workflow

The typical pattern for AI browser automation:

Navigate to a page

Tell the AI to open a URL — it uses the navigate action

Take a snapshot

The AI takes a snapshot of the page, getting an accessibility tree with element references like @e1, @e2

Interact with elements

Using the references, the AI can click buttons (@e3), fill inputs (@e5), select dropdowns, etc.

Re-snapshot after changes

After any navigation or DOM change, the AI takes a fresh snapshot to get updated references

Since Browser Control uses your real browser sessions, you don’t need to handle login flows. Just navigate directly to authenticated pages — you’re already logged in.
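
The navigate, snapshot, interact, re-snapshot loop can be sketched with an in-memory stand-in for a tab. `FakePage` and `StaleReference` are hypothetical names invented for this example. The sketch only illustrates why a reference from an older snapshot should not be reused after the page changes.

```python
class StaleReference(Exception):
    """Raised when a reference from an outdated snapshot is used."""

class FakePage:
    """In-memory stand-in for a browser tab (illustrative only)."""

    def __init__(self):
        self.generation = 0       # bumped on every navigation or DOM change
        self.snapshot_gen = None  # generation at which the last snapshot was taken
        self.log = []

    def navigate(self, url):
        self.generation += 1
        self.log.append(f"navigate {url}")

    def snapshot(self):
        self.snapshot_gen = self.generation
        self.log.append("snapshot")
        return ["@e1", "@e2", "@e3"]

    def click(self, ref):
        if self.snapshot_gen != self.generation:
            raise StaleReference(f"{ref} came from an outdated snapshot")
        self.generation += 1      # assume a click may change the DOM
        self.log.append(f"click {ref}")

page = FakePage()
page.navigate("https://example.com")
refs = page.snapshot()
page.click(refs[2])
refs = page.snapshot()            # re-snapshot before the next interaction
page.click(refs[0])
print(page.log)
```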

What Can It Do?

Page Interaction

Action	Description
Navigate	Open URLs, go back/forward in history
Click / Double-click	Click any element by reference, CSS selector, or text
Fill / Type	Enter text into inputs (React/Vue compatible)
Select	Choose dropdown options
Check / Uncheck	Toggle checkboxes and radio buttons
Press	Send keys and keyboard shortcuts such as `Enter`, `Ctrl+A`, `Cmd+C`
Scroll	Scroll page or specific elements into view
Drag	Drag and drop between elements
Focus / Hover	Focus inputs or hover over elements
Submit Form	Submit form elements

Content Reading

Action	Description
Screenshot	Capture the visible tab as an image
Annotated Screenshot	Screenshot with numbered labels on interactive elements
Snapshot	Accessibility tree with `@eN` element references
Get Content	Extract page text and optionally HTML
Get Text / HTML / Value	Read specific element content
Get All Links	Extract all links from the page
Get Table Data	Extract HTML tables as structured JSON
Get Form Data	Read all form field values
Get Meta Tags	Extract SEO and OpenGraph metadata
Get Images	List all images with src, alt, and dimensions
Eval	Execute custom JavaScript in the page

Browser Management

Action	Description
Tab Management	Open, close, switch, and list tabs
Cookie Management	Get, set, remove, and clear cookies
Storage	Read/write localStorage and sessionStorage
State Save/Load	Save and restore browser state (cookies + storage)
Window Management	Open new windows
Zoom	Get, set, zoom in/out, reset zoom
Clear Cache	Clear the browser cache

Inspection & Debugging

Action	Description
Get Element Info	Bounding box, visibility, enabled state, ARIA info
Is Visible / Enabled / Checked	Quick boolean state checks
Is In Viewport	Check if element is visible without scrolling
Highlight	Visually highlight an element with colored outline
Console Messages	Capture console.log output
Page Errors	Capture JavaScript errors
Get Performance	Page load timing and resource metrics
Network Monitor	Track XHR and fetch requests

Advanced

Action	Description
Batch	Execute multiple actions in one request
Inject CSS	Add custom styles to the page
Remove Element	Remove elements from the DOM
Set Attribute	Modify HTML attributes
Toggle Class	Add/remove CSS classes
Emulate Device	Set viewport and user agent for mobile testing
Set Media	Emulate dark/light mode
Block Resources	Block images, scripts, or ads
Frame	Switch between iframes

Examples

Reading the Current Page

Simply ask:

“What’s on the current page in my browser?”

The AI will use get_frontmost_browser_active_tab_content to read whatever page is currently in the foreground, regardless of which browser you’re using.

Filling a Form

“Go to example.com/signup and fill in the form with my name John Doe and email john@example.com”

The AI will:

Navigate to the URL
Snapshot the page to find form fields
Fill in each field using the @eN references
Submit the form

Taking a Screenshot

“Take a screenshot of my GitHub dashboard”

The AI navigates to GitHub (you’re already logged in), waits for the page to load, and captures a screenshot that displays directly in the chat.

Extracting Data

“Get all the links from the Hacker News front page”

The AI navigates to Hacker News and uses get_all_links to extract every link with its text and URL.
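
To make the "text and URL" result shape concrete, the sketch below extracts the same kind of data from a static HTML string using Python's standard `html.parser`. It is a stand-in for illustration, not how the extension implements get_all_links, and the `text` and `url` key names are assumptions.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect a {"text", "url"} record for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "url": self._href})
            self._href = None

def get_links(html):
    """Return all links found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

sample = '<p><a href="https://a.example/">First</a> and <a href="/b">Second <b>item</b></a></p>'
for link in get_links(sample):
    print(link["text"], "->", link["url"])
```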

Background Automation

“In the background, check the price of AAPL on Google Finance”

With background: true, the browser stays minimized while the AI navigates, reads the price, and reports back.

Multi-Browser Support

Browser Control works with any browser that has the companion extension installed:

Browser	Detection Method	Name in Commands
Google Chrome	AppleScript	`google_chrome`
Microsoft Edge	Extension (userAgent)	`edge`
Brave	AppleScript	`brave_browser`
Arc	AppleScript	`arc`
Vivaldi	AppleScript	`vivaldi`
Opera	Extension (userAgent)	`opera`
Firefox	Extension (URL scheme)	`firefox`

Browser Selection Priority

When you don’t specify a browser, EnConvo follows this order:

Your manual setting — if you chose a default in Computer Use preferences
System default browser — your macOS default (skips Safari since it doesn’t support the extension)
Connected browsers — whichever connected browser was used most recently
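
The priority order above can be written as a small function. This is a sketch of the described rules, not EnConvo's code. The function name, its parameters, and the `"safari"` identifier are assumptions, since Safari has no entry in the command-name table.

```python
def choose_browser(manual_setting, system_default, connected_recent_first):
    """Return the browser name to control, or None if nothing is available.

    connected_recent_first: connected browser names, most recently used first.
    """
    # 1. A manual setting always wins.
    if manual_setting:
        return manual_setting
    # 2. The system default, unless it is Safari (no extension support).
    if system_default and system_default != "safari":
        return system_default
    # 3. Otherwise the most recently used connected browser.
    if connected_recent_first:
        return connected_recent_first[0]
    return None

print(choose_browser(None, "safari", ["edge", "firefox"]))
```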

Auto-Launch

If the target browser isn’t running, EnConvo will automatically launch it and wait for the extension to connect (up to 10 seconds).
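
Waiting for the extension with an upper bound is a plain poll-with-deadline loop. The sketch below shows one way to write it. The 10-second limit comes from the text above, while the function name, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions added so the loop can be tested without real delays.

```python
import time

def wait_for_connection(is_connected, timeout=10.0, interval=0.25,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll is_connected() until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if is_connected():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Usage with a stand-in check that succeeds immediately.
print(wait_for_connection(lambda: True))
```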

Reliability Improvements

Browser Control is designed to recover from common real-world page behavior:

Exact tab targeting keeps actions attached to the intended tab.
Fresh page snapshots are taken after navigation, loading, clicking, and DOM changes so element references stay current.
Cross-origin iframe handling improves interaction with embedded forms and dashboards.
Drag-safe clicks and coordinate actions help with canvases, custom UI widgets, and controls that do not expose clean accessibility labels.
Clipboard proxy support preserves formatted text when pasting into rich editors.
Cleaner page-state output makes it easier for agents to choose the right link, button, textbox, or table row.

If a task still fails, ask EnConvo to take a new snapshot or describe the visible page before trying the next action.

Snapshot & Element References

The snapshot is how the AI reads page structure. It produces an accessibility tree of the page:

[page] My Dashboard
  [navigation] Main Nav
    [link @e1] Home
    [link @e2] Projects
    [link @e3] Settings
  [main]
    [heading @e4] Welcome back, John
    [textbox @e5] "Search..." (placeholder)
    [button @e6] New Project
    [list]
      [listitem @e7] Project Alpha
      [listitem @e8] Project Beta

Interactive elements get @eN references that the AI can use directly:

click @e6 → clicks “New Project”
fill @e5 "my search term" → types in the search box
get_text @e4 → reads “Welcome back, John”

Important: Element references (@e1, @e2, etc.) are invalidated when the page changes. The AI always takes a fresh snapshot after navigation or DOM changes.
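
As an illustration of how the `@eN` references map to elements, the sketch below parses the sample tree shown above into a lookup table. It assumes the snapshot text looks exactly like that sample, with one `[role @eN] label` entry per line. The real output format may differ, and `parse_refs` is a hypothetical helper.

```python
import re

SNAPSHOT = '''\
[page] My Dashboard
  [navigation] Main Nav
    [link @e1] Home
    [link @e2] Projects
    [link @e3] Settings
  [main]
    [heading @e4] Welcome back, John
    [textbox @e5] "Search..." (placeholder)
    [button @e6] New Project
    [list]
      [listitem @e7] Project Alpha
      [listitem @e8] Project Beta
'''

# One tree line: optional indent, [role] or [role @eN], then a free-text label.
LINE = re.compile(r'^\s*\[(?P<role>[\w-]+)(?:\s+(?P<ref>@e\d+))?\]\s*(?P<label>.*)$')

def parse_refs(snapshot):
    """Map each @eN reference to its (role, label) pair."""
    refs = {}
    for line in snapshot.splitlines():
        match = LINE.match(line)
        if match and match.group("ref"):
            refs[match.group("ref")] = (match.group("role"), match.group("label").strip())
    return refs

refs = parse_refs(SNAPSHOT)
print(refs["@e6"])
```

Because the table is built from a single snapshot, it should be discarded and rebuilt whenever a new snapshot is taken.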

Troubleshooting

'No browser extension connected' error

The companion extension isn’t installed or the browser isn’t running.

Solutions:

Make sure your browser is running
Install the extension from Chrome Web Store
Or install manually from ~/.enconvo/chrome_extension/ (see Manual Installation above)
Check chrome://extensions and make sure the extension is enabled

Extension installed but not connecting

The extension connects to EnConvo via WebSocket on localhost:11225. If it does not connect, check the following:

EnConvo is running
No firewall is blocking port 11225
Try disabling and re-enabling the extension
Check the extension’s service worker for errors in chrome://extensions
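
A quick way to rule out the first two causes is to test whether anything accepts TCP connections on port 11225. The sketch below uses only Python's standard `socket` module. It checks TCP reachability and does not perform a WebSocket handshake, so a True result means only that something is listening on that port. `port_is_open` is a hypothetical helper name.

```python
import socket

def port_is_open(host="localhost", port=11225, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

print("Port 11225 reachable:", port_is_open())
```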

Actions fail on certain pages

Some pages restrict extension access:

chrome:// and browser internal pages cannot be controlled
Pages with strict Content Security Policy (CSP) may block injected scripts
Closed Shadow DOM elements cannot be accessed

Workaround: Use eval with caution, or try a different approach like keyboard shortcuts (press).

Wrong browser is being controlled

Check which browser is active:

browser-use/status

You can specify a browser explicitly:

browser-use/navigate { "url": "https://example.com", "browser": "edge" }

Or set a default in Computer Use preferences.
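
When building such requests in a script, it can help to validate the browser name first. The sketch below builds the route and JSON body shown above and rejects names that are not in the "Name in Commands" column. `navigate_request` is a hypothetical helper, and how the request is actually sent to EnConvo is outside this sketch.

```python
import json

# "Name in Commands" values from the Multi-Browser Support table.
KNOWN_BROWSERS = {
    "google_chrome", "edge", "brave_browser", "arc",
    "vivaldi", "opera", "firefox",
}

def navigate_request(url, browser=None):
    """Build the route and JSON body for a navigate call."""
    body = {"url": url}
    if browser is not None:
        if browser not in KNOWN_BROWSERS:
            raise ValueError(f"unknown browser name: {browser!r}")
        body["browser"] = browser
    return "browser-use/navigate", json.dumps(body)

route, body = navigate_request("https://example.com", "edge")
print(route, body)
```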

Screenshots are blank or wrong

Screenshots rely on captureVisibleTab, which captures only the visible area of the active tab. Make sure:

The browser window is not minimized
The target tab is the active tab
The page has finished loading (use wait_for first)

Related Features

AI Agents

Use Browser Control as a tool within AI agent workflows

Computer Use

Control native macOS apps, dialogs, and file pickers

Context Awareness

EnConvo can automatically read your current browser tab as context

MCP Servers

Extend browser capabilities with MCP-based web tools

Workflows

Automate multi-step browser tasks with visual workflows
