agent()

See how to use agent() to create autonomous AI agents for multi-step browser workflows

Agent Creation

TypeScript

// Create agent instance
const agent = stagehand.agent(config?: AgentConfig): AgentInstance

AgentConfig Interface:

interface AgentConfig {
  systemPrompt?: string;
  integrations?: (Client | string)[];
  tools?: ToolSet;
  /** @deprecated Use `mode: "cua"` instead */
  cua?: boolean;
  model?: string | AgentModelConfig<string>;
  executionModel?: string | AgentModelConfig<string>;
  stream?: boolean; // Enable streaming mode (experimental)
  mode?: "dom" | "hybrid" | "cua"; // Tool mode
}

// AgentModelConfig for advanced configuration
type AgentModelConfig<TModelName extends string = string> = {
  modelName: TModelName;
} & Record<string, unknown>;

AgentInstance Interface:

interface AgentInstance {
  execute: (instructionOrOptions: string | AgentExecuteOptions) => Promise<AgentResult>;
}

Agent Configuration

systemPrompt

string

Custom system prompt to provide to the agent. Overrides the default system prompt and defines agent behavior.

model

string | AgentModelConfig

The model to use for agent functionality. Can be either:

A string in the format "provider/model" (e.g., "openai/computer-use-preview", "anthropic/claude-sonnet-4-20250514")
An object with modelName and additional provider-specific options

Available CUA Models:

"anthropic/claude-3-7-sonnet-latest"
"anthropic/claude-haiku-4-5-20251001"
"anthropic/claude-sonnet-4-20250514"
"anthropic/claude-sonnet-4-5-20250929"
"anthropic/claude-opus-4-5-20251101"
"anthropic/claude-opus-4-6"
"google/gemini-2.5-computer-use-preview-10-2025"
"google/gemini-3-flash-preview"
"google/gemini-3-pro-preview"
"microsoft/fara-7b"
"openai/computer-use-preview"
"openai/computer-use-preview-2025-03-11"

AgentModelConfig properties:

modelName

string

required

The model name

[key: string]

unknown

Additional provider-specific options (e.g., apiKey, baseURL)
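
To illustrate the two accepted shapes, the sketch below normalizes either a "provider/model" string or an AgentModelConfig object into its parts. parseModel and ParsedModel are hypothetical names introduced for this example and are not part of Stagehand; the type is a local copy of the one documented above.

```typescript
// Local copy of the AgentModelConfig shape documented above.
type AgentModelConfig<TModelName extends string = string> = {
  modelName: TModelName;
} & Record<string, unknown>;

// Hypothetical result shape used only in this example.
interface ParsedModel {
  provider: string;
  model: string;
  options: Record<string, unknown>;
}

// Hypothetical helper (not a Stagehand API): splits "provider/model" at the
// first slash and keeps any extra provider-specific options separate.
function parseModel(input: string | AgentModelConfig): ParsedModel {
  const config: AgentModelConfig =
    typeof input === "string" ? { modelName: input } : input;
  const { modelName, ...options } = config;

  const slash = modelName.indexOf("/");
  if (slash <= 0 || slash === modelName.length - 1) {
    throw new Error(`Expected "provider/model", got "${modelName}"`);
  }

  return {
    provider: modelName.slice(0, slash),
    model: modelName.slice(slash + 1),
    options,
  };
}
```

Calling parseModel("openai/computer-use-preview") would yield the provider "openai" and the model "computer-use-preview", while an object input keeps fields such as apiKey or baseURL in options.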

executionModel

string | AgentModelConfig

The model to use for tool execution (observe/act calls within agent tools). If not specified, inherits from the main model configuration.

Format: "provider/model" (e.g., "openai/gpt-4o-mini", "google/gemini-2.0-flash-exp")

cua

boolean

Deprecated: Use mode: "cua" instead. This option will be removed in a future version.

Indicates whether Computer Use Agent (CUA) mode is enabled. When false, the agent uses standard tool-based operation instead of computer control.

integrations

(Client | string)[]

MCP (Model Context Protocol) integrations for external tools and services.

An array of MCP server URLs (strings) or connected Client objects.

tools

ToolSet

Custom tool definitions to extend agent capabilities using the AI SDK ToolSet format.

stream

boolean

Enable streaming mode for the agent. When true, execute() returns AgentStreamResult with textStream for incremental output. When false (default), execute() returns AgentResult after completion.

Default: false

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".

mode

"dom" | "hybrid" | "cua"

Tool mode for the agent. Determines which set of tools is available to the agent.

Modes:

"dom" (default): Uses DOM-based tools (act, fillForm) for structured page interactions. Works with any model.
"hybrid": Uses both DOM-based and coordinate-based tools (act, click, type, dragAndDrop, clickAndHold, fillForm) for visual/screenshot-based interactions. Requires models with reliable coordinate-based action capabilities.
"cua": Uses Computer Use Agent (CUA) providers like Anthropic Claude, Google Gemini, or OpenAI for screenshot-based automation. This is the preferred way to enable CUA mode (replaces the deprecated cua: true option).

Default: "dom"

Hybrid Mode Model Requirements: Only use hybrid mode with models that can reliably perform coordinate-based actions:

Google: google/gemini-3-flash-preview
Anthropic: anthropic/claude-sonnet-4-20250514, anthropic/claude-sonnet-4-5-20250929, anthropic/claude-haiku-4-5-20251001

Hybrid mode requires experimental: true in the Stagehand constructor.
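
The hybrid-mode requirements above can be checked before creating an agent. The following is a minimal sketch, assuming the model list above is the complete set of recommended hybrid models (the page does not say whether it is exhaustive). checkModeConfig and HYBRID_MODELS are hypothetical names, not Stagehand exports.

```typescript
type AgentMode = "dom" | "hybrid" | "cua";

// Models listed above under "Hybrid Mode Model Requirements".
const HYBRID_MODELS: readonly string[] = [
  "google/gemini-3-flash-preview",
  "anthropic/claude-sonnet-4-20250514",
  "anthropic/claude-sonnet-4-5-20250929",
  "anthropic/claude-haiku-4-5-20251001",
];

// Hypothetical pre-flight check: returns human-readable warnings
// instead of throwing, so callers decide how strict to be.
function checkModeConfig(
  mode: AgentMode,
  model: string,
  experimental: boolean,
): string[] {
  const warnings: string[] = [];
  if (mode !== "hybrid") {
    return warnings; // Only hybrid mode has the requirements checked here.
  }
  if (!experimental) {
    warnings.push("hybrid mode requires experimental: true");
  }
  if (HYBRID_MODELS.indexOf(model) === -1) {
    warnings.push(`${model} is not in the recommended hybrid model list`);
  }
  return warnings;
}
```

Returning warnings rather than throwing is a design choice for this sketch: a model outside the list may still work, so the caller can log the warning and proceed.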

Execute Method

// String instruction
await agent.execute(instruction: string): Promise<AgentResult>

// With options
await agent.execute(options: AgentExecuteOptions): Promise<AgentResult>

AgentExecuteOptions Interface:

interface AgentExecuteOptions {
  instruction: string;
  maxSteps?: number;
  page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;
  highlightCursor?: boolean;
  messages?: ModelMessage[]; // Continue from previous conversation (experimental)
  signal?: AbortSignal; // Cancel execution (experimental)
  excludeTools?: string[]; // Tools to exclude from this execution (experimental)
  output?: ZodObject; // Zod schema for structured output (experimental)
  callbacks?: AgentExecuteCallbacks;
}

interface AgentExecuteCallbacks {
  prepareStep?: PrepareStepFunction<ToolSet>;
  onStepFinish?: GenerateTextOnStepFinishCallback<ToolSet>;
}

// With stream: true in AgentConfig
await agent.execute(options: AgentStreamExecuteOptions): Promise<AgentStreamResult>

AgentStreamExecuteOptions Interface:

interface AgentStreamExecuteOptions {
  instruction: string;
  maxSteps?: number;
  page?: PlaywrightPage | PuppeteerPage | PatchrightPage | Page;
  highlightCursor?: boolean;
  messages?: ModelMessage[]; // Continue from previous conversation (experimental)
  signal?: AbortSignal; // Cancel execution (experimental)
  excludeTools?: string[]; // Tools to exclude from this execution (experimental)
  output?: ZodObject; // Zod schema for structured output (experimental)
  callbacks?: AgentStreamCallbacks;
}

interface AgentStreamCallbacks {
  prepareStep?: PrepareStepFunction<ToolSet>;
  onStepFinish?: StreamTextOnStepFinishCallback<ToolSet>;
  onChunk?: StreamTextOnChunkCallback<ToolSet>;
  onFinish?: StreamTextOnFinishCallback<ToolSet>;
  onError?: StreamTextOnErrorCallback;
  onAbort?: (event: { steps: Array<StepResult<ToolSet>> }) => void | Promise<void>;
}

Execute Parameters

instruction

string

required

High-level task description in natural language.

maxSteps

number

Maximum number of actions the agent can take before stopping.

Default: 20

page

PlaywrightPage | PuppeteerPage | PatchrightPage | Page

Optional: Specify which page to perform the agent execution on. Supports multiple browser automation libraries:

Playwright: Native Playwright Page objects
Puppeteer: Puppeteer Page objects
Patchright: Patchright Page objects
Stagehand Page: Stagehand’s wrapped Page object

If not specified, defaults to the current “active” page in your Stagehand instance.

highlightCursor

boolean

Whether to show a visual cursor on the page during agent execution. Useful for debugging and demonstrations.

Default: false

messages

ModelMessage[]

Previous conversation messages to continue from. Pass the messages from a previous AgentResult to continue that conversation.

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".

signal

AbortSignal

An AbortSignal that can be used to cancel the agent execution. When aborted, the agent will stop and throw an AgentAbortError.

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".

excludeTools

string[]

Tools to exclude from this execution. Pass an array of tool names to prevent the agent from using those tools.

Available tools by mode:

DOM mode (default): act, fillForm, ariaTree, extract, goto, scroll, keys, navback, screenshot, think, wait, search
Hybrid mode: click, type, dragAndDrop, clickAndHold, fillFormVision, act, ariaTree, extract, goto, scroll, keys, navback, screenshot, think, wait, search

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".
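
Because excludeTools takes plain strings, a typo in a tool name is easy to miss. The sketch below checks a list against the tool names given above for each mode. unknownToolNames is a hypothetical helper, and how Stagehand itself treats an unrecognized name is not stated on this page.

```typescript
// Tool names copied from the excludeTools description above.
const DOM_TOOLS: readonly string[] = [
  "act", "fillForm", "ariaTree", "extract", "goto", "scroll",
  "keys", "navback", "screenshot", "think", "wait", "search",
];

const HYBRID_TOOLS: readonly string[] = [
  "click", "type", "dragAndDrop", "clickAndHold", "fillFormVision",
  "act", "ariaTree", "extract", "goto", "scroll",
  "keys", "navback", "screenshot", "think", "wait", "search",
];

// Hypothetical helper: returns the names that are not tools in the given mode.
function unknownToolNames(
  mode: "dom" | "hybrid",
  excludeTools: string[],
): string[] {
  const available = mode === "hybrid" ? HYBRID_TOOLS : DOM_TOOLS;
  return excludeTools.filter((name) => available.indexOf(name) === -1);
}
```

For example, "click" is reported as unknown in DOM mode but not in hybrid mode, since coordinate-based tools are only listed for hybrid mode.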

output

ZodObject

A Zod schema defining structured output data to return when the task completes. The agent will populate this data based on the information it gathered during execution. The result will be available in AgentResult.output.

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".

import { z } from "zod/v3";

const result = await agent.execute({
  instruction: "Find the cheapest flight from NYC to LA",
  output: z.object({
    price: z.string().describe("The price of the flight"),
    airline: z.string().describe("The airline name"),
    departureTime: z.string().describe("Departure time"),
  }),
});

console.log(result.output); // { price: "$199", airline: "Delta", departureTime: "8:00 AM" }
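
AgentResult.output is typed as Record<string, unknown>, so the fields are not typed at the call site. One way to narrow it, sketched here with a hand-written type guard, is shown below. FlightOutput and isFlightOutput are hypothetical names that mirror the schema in the example above.

```typescript
// Mirrors the fields of the Zod schema in the example above.
interface FlightOutput {
  price: string;
  airline: string;
  departureTime: string;
}

// Hypothetical type guard: true only when all three fields are strings.
function isFlightOutput(value: unknown): value is FlightOutput {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return (
    typeof record["price"] === "string" &&
    typeof record["airline"] === "string" &&
    typeof record["departureTime"] === "string"
  );
}
```

After `if (isFlightOutput(result.output))`, TypeScript treats result.output.price as a string inside the branch. If the Zod schema is in scope, parsing the output with it again would serve the same purpose.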

callbacks

AgentExecuteCallbacks | AgentStreamCallbacks

Callbacks to hook into the agent’s execution lifecycle. The available callbacks depend on whether streaming is enabled.

Non-CUA agents only. Requires experimental: true. Not available when mode: "cua".

Non-Streaming Callbacks (AgentExecuteCallbacks):

prepareStep

PrepareStepFunction<ToolSet>

Called before each step to modify settings. You can change the model, tool choices, active tools, system prompt, and input messages for each step.

onStepFinish

GenerateTextOnStepFinishCallback<ToolSet>

Called when each step (LLM call) completes. Provides access to tool calls, reasoning, and step results.

Streaming Callbacks (AgentStreamCallbacks):

prepareStep

PrepareStepFunction<ToolSet>

Called before each step to modify settings.

onStepFinish

StreamTextOnStepFinishCallback<ToolSet>

Called when each step completes during streaming.

onChunk

StreamTextOnChunkCallback<ToolSet>

Called for each chunk of the stream. Stream processing will pause until the callback promise resolves.

onFinish

StreamTextOnFinishCallback<ToolSet>

Called when the stream finishes successfully.

onError

StreamTextOnErrorCallback

Called when an error occurs during streaming.

onAbort

(event: { steps: Array<StepResult<ToolSet>> }) => void | Promise<void>

Called when the stream is aborted via the signal option.

Response

Returns: Promise<AgentResult> (non-streaming) or Promise<AgentStreamResult> (streaming)

AgentResult Interface:

interface AgentResult {
  success: boolean;
  message: string;
  actions: AgentAction[];
  completed: boolean;
  metadata?: Record<string, unknown>;
  messages?: ModelMessage[]; // Conversation history for continuation (experimental)
  output?: Record<string, unknown>; // Structured output data (experimental)
  usage?: {
    input_tokens: number;
    output_tokens: number;
    reasoning_tokens?: number;
    cached_input_tokens?: number;
    inference_time_ms: number;
  };
}

// AgentAction can contain various tool-specific fields
interface AgentAction {
  type: string;
  reasoning?: string;
  taskCompleted?: boolean;
  action?: string;
  timeMs?: number;        // wait tool
  pageText?: string;      // ariaTree tool
  pageUrl?: string;       // ariaTree tool
  instruction?: string;   // various tools
  timestamp?: number;     // Action timestamp
  [key: string]: unknown; // Additional tool-specific fields
}

AgentStreamResult Interface:

interface AgentStreamResult {
  // Async iterable of text chunks for incremental output
  textStream: AsyncIterable<string>;
  
  // Async iterable of all stream events (tool calls, messages, etc.)
  fullStream: AsyncIterable<StreamPart>;
  
  // Promise that resolves to the final AgentResult when streaming completes
  result: Promise<AgentResult>;
  
  // Additional properties from StreamTextResult<ToolSet, never>
  // See Vercel AI SDK documentation for full details
}

success

boolean

Whether the task was completed successfully.

message

string

Description of the execution result and status.

actions

AgentAction[]

Array of individual actions taken during execution. Each action contains tool-specific data.
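
Since each action carries a type field, the actions array can be summarized after a run. The sketch below counts actions per type; countActionsByType is a hypothetical helper, and the interface is a trimmed local copy of AgentAction as documented above.

```typescript
// Trimmed local copy of the AgentAction interface documented above.
interface AgentAction {
  type: string;
  reasoning?: string;
  taskCompleted?: boolean;
  [key: string]: unknown; // Additional tool-specific fields
}

// Hypothetical helper: tallies how many actions of each type were taken.
function countActionsByType(actions: AgentAction[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const action of actions) {
    counts[action.type] = (counts[action.type] ?? 0) + 1;
  }
  return counts;
}
```

Passing result.actions from a finished run gives a quick view of which tools the agent relied on, which can help when tuning maxSteps or excludeTools.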

completed

boolean

Whether the agent believes the task is fully complete.

metadata

Record<string, unknown>

Additional execution metadata and debugging information.

messages

ModelMessage[]

The conversation messages from this execution. Pass these to a subsequent execute() call via the messages option to continue the conversation.

Non-CUA agents only. Requires experimental: true.

output

Record<string, unknown>

Custom structured output data extracted based on the output Zod schema provided in execute options. Only populated if an output schema was provided.

Non-CUA agents only. Requires experimental: true.

usage

object

Token usage and performance metrics.

Usage metrics:

input_tokens

number

Number of input tokens used

output_tokens

number

Number of output tokens generated

reasoning_tokens

number

Number of reasoning tokens (if supported by the model)

cached_input_tokens

number

Number of cached input tokens (if supported by the model)

inference_time_ms

number

Total inference time in milliseconds
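
When several execute() calls are chained, the per-call usage objects can be added up to track total cost. This is a minimal sketch: sumUsage is a hypothetical helper, AgentUsage is a local name for the usage shape documented above, and missing optional fields are treated as zero.

```typescript
// Local name for the `usage` shape of AgentResult documented above.
interface AgentUsage {
  input_tokens: number;
  output_tokens: number;
  reasoning_tokens?: number;
  cached_input_tokens?: number;
  inference_time_ms: number;
}

// Hypothetical helper: sums usage across results, skipping runs where
// `usage` is undefined and counting absent optional fields as 0.
function sumUsage(
  usages: Array<AgentUsage | undefined>,
): Required<AgentUsage> {
  const total: Required<AgentUsage> = {
    input_tokens: 0,
    output_tokens: 0,
    reasoning_tokens: 0,
    cached_input_tokens: 0,
    inference_time_ms: 0,
  };
  for (const usage of usages) {
    if (!usage) continue;
    total.input_tokens += usage.input_tokens;
    total.output_tokens += usage.output_tokens;
    total.reasoning_tokens += usage.reasoning_tokens ?? 0;
    total.cached_input_tokens += usage.cached_input_tokens ?? 0;
    total.inference_time_ms += usage.inference_time_ms;
  }
  return total;
}
```

A typical call would be sumUsage([firstResult.usage, secondResult.usage]), since usage is optional on AgentResult.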

Example Response

{
  "success": true,
  "message": "Task completed successfully",
  "actions": [
    {
      "type": "act",
      "instruction": "click the submit button",
      "reasoning": "User requested to submit the form",
      "taskCompleted": false
    },
    {
      "type": "observe",
      "instruction": "check if submission was successful",
      "taskCompleted": true
    }
  ],
  "completed": true,
  "metadata": {
    "steps_taken": 2
  },
  "output": {
    "price": "$199",
    "airline": "Delta",
    "departureTime": "8:00 AM"
  },
  "usage": {
    "input_tokens": 1250,
    "output_tokens": 340,
    "reasoning_tokens": 42,
    "cached_input_tokens": 0,
    "inference_time_ms": 2500
  }
}

Code Examples

import { Stagehand } from "@browserbasehq/stagehand";

// Initialize with Browserbase (API key and project ID from environment variables)
// Set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID in your environment
const stagehand = new Stagehand({
  env: "BROWSERBASE",
  model: "anthropic/claude-sonnet-4-20250514"
});
await stagehand.init();

const page = stagehand.context.pages()[0];
// Create agent with default configuration
const agent = stagehand.agent();

// Navigate to a page
await page.goto("https://www.google.com");

// Execute a task
const result = await agent.execute("Search for 'Stagehand automation' and click the first result");

console.log(result.message);
console.log(`Completed: ${result.completed}`);
console.log(`Actions taken: ${result.actions.length}`);

// Create agent with custom model and system prompt
const agent = stagehand.agent({
  model: "openai/computer-use-preview",
  systemPrompt: "You are a helpful assistant that can navigate websites efficiently. Always verify actions before proceeding.",
  executionModel: "openai/gpt-4o-mini"  // Use faster model for tool execution
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

const result = await agent.execute({
  instruction: "Fill out the contact form with test data",
  maxSteps: 10,
  highlightCursor: true
});

// Using AgentModelConfig for advanced configuration
const agent = stagehand.agent({
  model: {
    modelName: "anthropic/claude-sonnet-4-20250514",
    apiKey: process.env.ANTHROPIC_API_KEY,
    baseURL: "https://custom-proxy.com/v1"
  }
});

const result = await agent.execute("Complete the checkout process");

const page1 = stagehand.context.pages()[0];
const page2 = await stagehand.context.newPage();

const agent = stagehand.agent();

// Execute on specific page
await page2.goto("https://example.com/dashboard");
const result = await agent.execute({
  instruction: "Export the data table",
  page: page2
});

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // Required for streaming
});
await stagehand.init();

const page = stagehand.context.pages()[0];
await page.goto("https://amazon.com");

// Create a streaming agent
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
  stream: true,
});

const streamResult = await agent.execute({
  instruction: "Search for headphones and find the best deal",
  maxSteps: 20,
});

// Stream text output incrementally
for await (const delta of streamResult.textStream) {
  process.stdout.write(delta);
}

// Get the final result
const finalResult = await streamResult.result;
console.log("Completed:", finalResult.completed);

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true,
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

const result = await agent.execute({
  instruction: "Fill out the contact form",
  maxSteps: 10,
  callbacks: {
    prepareStep: async (stepContext) => {
      console.log(`Starting step ${stepContext.stepNumber}`);
      return stepContext;
    },
    onStepFinish: async (event) => {
      console.log(`Step finished: ${event.finishReason}`);
      if (event.toolCalls) {
        for (const tc of event.toolCalls) {
          console.log(`Tool: ${tc.toolName}`, tc.input);
        }
      }
    },
  },
});

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true,
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

const controller = new AbortController();

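// Abort after 30 seconds
const timeout = setTimeout(() => controller.abort("Timeout exceeded"), 30000);

try {
  const result = await agent.execute({
    instruction: "Complete a complex multi-step workflow",
    maxSteps: 50,
    signal: controller.signal,
  });
  console.log("Completed:", result.completed);
} catch (error) {
  // `error` is `unknown` in strict TypeScript, so narrow it before reading fields
  if (error instanceof Error && error.name === "AgentAbortError") {
    console.log("Task cancelled:", error.message);
  } else {
    throw error; // Surface unexpected failures instead of swallowing them
  }
} finally {
  clearTimeout(timeout); // Stop the timer once execution settles
}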

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true,
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com/shop");

// First execution
const firstResult = await agent.execute({
  instruction: "Search for laptops and list the top 3 options",
  maxSteps: 10,
});

// Continue conversation with context from first run
const secondResult = await agent.execute({
  instruction: "Filter those results by price under $1000",
  maxSteps: 10,
  messages: firstResult.messages, // Pass previous messages
});

// Chain further with accumulated context
const thirdResult = await agent.execute({
  instruction: "Add the best-rated one to cart",
  maxSteps: 10,
  messages: secondResult.messages,
});

console.log("Final:", thirdResult.message);

import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Assumes `mcpClientInstance` is an MCP Client that you have already created and connected elsewhere
declare const mcpClientInstance: Client;

// Create agent with MCP integrations
const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-20250514",
  integrations: [
    "https://mcp-server.example.com",  // MCP server URL
    mcpClientInstance  // Or pre-connected Client object
  ]
});

const result = await agent.execute("Use the external tool to process this data");

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // Required for excludeTools
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com");

// Exclude specific tools from this execution
const result = await agent.execute({
  instruction: "Navigate the page and click buttons",
  maxSteps: 15,
  excludeTools: ["screenshot", "extract", "search"],
});

console.log("Completed:", result.completed);

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // Required for hybrid mode
});
await stagehand.init();

// Create agent with hybrid mode for coordinate-based interactions
const agent = stagehand.agent({
  mode: "hybrid",
  model: "google/gemini-3-flash-preview", // Use a model that supports coordinate-based actions
});

const page = stagehand.context.pages()[0];
await page.goto("https://example.com/form");

const result = await agent.execute({
  instruction: "Fill out the registration form with test data",
  maxSteps: 15,
  highlightCursor: true, // Enabled by default in hybrid mode
});

console.log("Completed:", result.completed);

import { z } from "zod/v3";

const stagehand = new Stagehand({
  env: "LOCAL",
  experimental: true, // Required for structured output
});
await stagehand.init();

const agent = stagehand.agent({
  model: "anthropic/claude-sonnet-4-5-20250929",
});

const page = stagehand.context.pages()[0];
await page.goto("https://www.google.com/flights");

// Define output schema to receive structured data
const result = await agent.execute({
  instruction: "Find the cheapest flight from NYC to LA for next week",
  maxSteps: 20,
  output: z.object({
    price: z.string().describe("The price of the flight"),
    airline: z.string().describe("The airline name"),
    departureTime: z.string().describe("Departure time"),
    arrivalTime: z.string().describe("Arrival time"),
    flightNumber: z.string().optional().describe("Flight number if available"),
  }),
});

// Access the structured output
console.log("Flight found:");
console.log(`  Price: ${result.output?.price}`);
console.log(`  Airline: ${result.output?.airline}`);
console.log(`  Departure: ${result.output?.departureTime}`);
console.log(`  Arrival: ${result.output?.arrivalTime}`);

import { tool } from "ai";
import { z } from "zod/v3";

// Define custom tools using AI SDK format
const customTools = {
  calculateTotal: tool({
    description: "Calculate the total of items in cart",
    parameters: z.object({
      items: z.array(z.object({
        price: z.number(),
        quantity: z.number()
      }))
    }),
    execute: async ({ items }) => {
      const total = items.reduce((sum, item) => sum + (item.price * item.quantity), 0);
      return { total };
    }
  })
};

const agent = stagehand.agent({
  model: "openai/computer-use-preview",
  tools: customTools
});

const result = await agent.execute("Calculate the total cost of items in the shopping cart");

Error Types

The following errors may be thrown by the agent() method:

StagehandError - Base class for all Stagehand-specific errors
StagehandInitError - Agent was not properly initialized
MissingLLMConfigurationError - No LLM API key or client configured
UnsupportedModelError - The specified model is not supported for agent functionality
UnsupportedModelProviderError - The specified model provider is not supported
InvalidAISDKModelFormatError - Model string does not follow the required provider/model format
MCPConnectionError - Failed to connect to MCP server
StagehandDefaultError - General execution error with detailed message
AgentAbortError - Thrown when agent execution is cancelled via an AbortSignal
StreamingCallbacksInNonStreamingModeError - Thrown when streaming-only callbacks (onChunk, onFinish, onError, onAbort) are used without stream: true
ExperimentalNotConfiguredError - Thrown when experimental features (callbacks, signal, messages, streaming) are used without experimental: true in Stagehand constructor
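
A caller can branch on these errors by name, as the abort example does with AgentAbortError. The sketch below groups the error names listed above into broad categories. classifyAgentError and the category labels are hypothetical, and it assumes each error's name property equals its class name, which this page only shows for AgentAbortError.

```typescript
// Hypothetical categories for this example only.
type AgentErrorKind = "aborted" | "configuration" | "connection" | "other";

// Hypothetical helper: maps an unknown thrown value to a broad category
// using the error names listed above.
function classifyAgentError(error: unknown): AgentErrorKind {
  const name = error instanceof Error ? error.name : "";
  switch (name) {
    case "AgentAbortError":
      return "aborted";
    case "MissingLLMConfigurationError":
    case "UnsupportedModelError":
    case "UnsupportedModelProviderError":
    case "InvalidAISDKModelFormatError":
    case "ExperimentalNotConfiguredError":
    case "StreamingCallbacksInNonStreamingModeError":
      return "configuration";
    case "MCPConnectionError":
      return "connection";
    default:
      return "other"; // Includes StagehandDefaultError and non-Error values
  }
}
```

In a catch block, a result of "configuration" usually points at the Stagehand constructor or agent config rather than the task itself, so retrying the same call is unlikely to help.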
