diff --git a/.docs/code-example-guide.md b/.docs/code-example-guide.md index 0d97278..d73dbb4 100644 --- a/.docs/code-example-guide.md +++ b/.docs/code-example-guide.md @@ -360,7 +360,7 @@ Before publishing a code example, verify: ## Reference See these files for examples: -- `browsers/create-a-browser.mdx` - Standard browser creation pattern +- `introduction/create.mdx` - Standard browser creation pattern - `apps/develop.mdx` - App development pattern - `browsers/file-io.mdx` - Complex automation example diff --git a/auth/faq.mdx b/auth/faq.mdx index 4efb13d..cb0c912 100644 --- a/auth/faq.mdx +++ b/auth/faq.mdx @@ -36,7 +36,7 @@ Call `.login()` on the connection to trigger auth immediately. See [Triggering r ## What types of flows does Managed Auth support? -Managed Auth handles login and authentication flows end-to-end: entering credentials, multi-step login forms (e.g. email on one page, password on the next), SSO redirects, MFA challenges, and keeping sessions alive. For post-login browser work like form filling, sign-ups, or other workflows, use [Kernel's browser automation](/browsers/create-a-browser) directly. +Managed Auth handles login and authentication flows end-to-end: entering credentials, multi-step login forms (e.g. email on one page, password on the next), SSO redirects, MFA challenges, and keeping sessions alive. For post-login browser work like form filling, sign-ups, or other workflows, use [Kernel's browser automation](/introduction/control) directly. ## How do I debug a managed auth session? diff --git a/browsers/create-a-browser.mdx b/browsers/create-a-browser.mdx deleted file mode 100644 index 17071db..0000000 --- a/browsers/create-a-browser.mdx +++ /dev/null @@ -1,231 +0,0 @@ ---- -title: "Create a Browser" -description: "on-demand browsers for your agents" ---- - -Kernel browsers were designed to be lightweight and fast. Your agent can quickly create them on-demand and tear them down as soon as it is done using them. 
They can be used as part of the Kernel [app platform](/apps/develop) or connected to from another service with the Chrome DevTools Protocol. - -## 1. Create a Kernel browser - - -First, install the Kernel SDK: -- Typescript/Javascript: `npm install @onkernel/sdk` -- Python: `pip install kernel` - - -Use our SDK to create a browser: - - -```typescript Typescript/Javascript -import Kernel from '@onkernel/sdk'; - -const kernel = new Kernel(); - -const kernelBrowser = await kernel.browsers.create(); -console.log(kernelBrowser.session_id); -``` - -```python Python -from kernel import Kernel - -kernel = Kernel() - -kernel_browser = kernel.browsers.create() -print(kernel_browser.session_id) -``` - - - -## 2. Connect to the browser - -Kernel browsers support three connection methods: CDP for framework-level browser automation, WebDriver BiDi for W3C-standard control, and Computer Controls for OS-level mouse/keyboard input ideal for vision-based LLM loops. - - - - Connect with any Chrome DevTools Protocol framework like [Playwright](https://playwright.dev/) or [Puppeteer](https://pptr.dev/). Use `cdp_ws_url` from the created browser session. - - - ```typescript Typescript/Javascript - import { chromium } from 'playwright'; - - const browser = await chromium.connectOverCDP(kernelBrowser.cdp_ws_url); - const context = browser.contexts()[0]; - const page = context.pages()[0]; - - await page.goto('https://example.com'); - const title = await page.title(); - console.log(title); - ``` - - ```python Python - from playwright.async_api import async_playwright - - async with async_playwright() as playwright: - browser = await playwright.chromium.connect_over_cdp(kernel_browser.cdp_ws_url) - context = browser.contexts[0] - page = context.pages[0] - - await page.goto('https://example.com') - title = await page.title() - print(title) - ``` - - - - Connect with [Vibium](/integrations/vibium) or any WebDriver BiDi-compatible client. Use `webdriver_ws_url` from the created browser session. 
- - - ```typescript Typescript/Javascript - import { browser } from 'vibium'; - - const bro = await browser.start(kernelBrowser.webdriver_ws_url); - const page = await bro.page(); - - await page.goto('https://example.com'); - const title = await page.title(); - console.log(title); - ``` - - ```python Python - from vibium.sync_api import browser - - bro = browser.start(kernel_browser.webdriver_ws_url) - page = bro.page() - - page.goto('https://example.com') - title = page.title() - print(title) - ``` - - - - Control the browser's mouse, keyboard, and screen directly through the Kernel SDK — no CDP or WebDriver connection needed. This is ideal for vision-based LLM loops like [Claude Computer Use](/integrations/computer-use/anthropic). - - - ```typescript Typescript/Javascript - import Kernel from '@onkernel/sdk'; - - const kernel = new Kernel(); - const kernelBrowser = await kernel.browsers.create(); - - // Take a screenshot - const response = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id); - - // Click at coordinates - await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, { - x: 100, - y: 200, - }); - - // Type text - await kernel.browsers.computer.typeText(kernelBrowser.session_id, { - text: 'Hello, World!', - }); - ``` - - ```python Python - from kernel import Kernel - - kernel = Kernel() - kernel_browser = kernel.browsers.create() - - # Take a screenshot - screenshot = kernel.browsers.computer.capture_screenshot(id=kernel_browser.session_id) - - # Click at coordinates - kernel.browsers.computer.click_mouse( - id=kernel_browser.session_id, - x=100, - y=200, - ) - - # Type text - kernel.browsers.computer.type_text( - id=kernel_browser.session_id, - text="Hello, World!", - ) - ``` - - - - -## 3. 
Tear it down - -When you're finished with the browser, you can delete it: - - -```typescript Typescript/Javascript -import Kernel from '@onkernel/sdk'; - -const kernel = new Kernel(); - -await kernel.browsers.deleteByID(kernelBrowser.session_id); -``` - -```python Python -from kernel import Kernel - -kernel = Kernel() -await kernel.browsers.delete_by_id(kernel_browser.session_id) -``` - - -Browsers automatically delete after a timeout (default 60 seconds) if they don't receive a CDP or live view connection. You can [configure this timeout](/browsers/termination#automatic-deletion-via-timeout) when creating the browser. - -## Full example - -Once you've connected to the Kernel browser, you can do anything with it. - - - Kernel browsers launch with a default context and page. Make sure to access - the [existing context and - page](https://playwright.dev/docs/api/class-browsertype#browser-type-connect-over-cdp) - (`contexts()[0]` and `pages()[0]`), rather than trying to create a new one. - - - -```typescript Typescript/Javascript -import Kernel from '@onkernel/sdk'; -import { chromium } from 'playwright'; - -const kernel = new Kernel(); - -const kernelBrowser = await kernel.browsers.create(); -const browser = await chromium.connectOverCDP(kernelBrowser.cdp_ws_url); - -try { - const context = browser.contexts()[0] || (await browser.newContext()); - const page = context.pages()[0] || (await context.newPage()); - await page.goto('https://www.onkernel.com'); - const title = await page.title(); -} catch (error) { - console.error(error); -} finally { - await browser.close(); - await kernel.browsers.deleteByID(kernelBrowser.session_id); -} -``` - -```python Python -from kernel import Kernel -from playwright.async_api import async_playwright - -kernel = Kernel() - -kernel_browser = kernel.browsers.create() - -async with async_playwright() as playwright: - browser = await playwright.chromium.connect_over_cdp(kernel_browser.cdp_ws_url) - - try: - context = browser.contexts[0] 
if browser.contexts else await browser.new_context() - page = context.pages[0] if context.pages else await context.new_page() - await page.goto('https://www.onkernel.com') - title = await page.title() - except Exception as e: - print(e) - finally: - await browser.close() - await kernel.browsers.delete_by_id(kernel_browser.session_id) -``` - diff --git a/browsers/pools/scaling.mdx b/browsers/pools/scaling.mdx index 2d9b486..d293a58 100644 --- a/browsers/pools/scaling.mdx +++ b/browsers/pools/scaling.mdx @@ -6,7 +6,7 @@ description: "Recommended practices for scaling" This guide explains how to architect production-scale browser automation systems using Kernel, how to handle high-concurrency workloads, and best practices for building resilient systems. -After understanding the [basics](/browsers/create-a-browser) of our browsers, you should understand how to create and connect to individual browsers on-demand. This guide builds on that foundation to help you design systems using browser pools that can handle hundreds or thousands of concurrent browser tasks reliably. +After understanding the basics of [creating](/introduction/create) and [controlling](/introduction/control) our browsers, you should understand how to create and connect to individual browsers on-demand. This guide builds on that foundation to help you design systems using browser pools that can handle hundreds or thousands of concurrent browser tasks reliably. ## Understanding your requirements diff --git a/browsers/standby.mdx b/browsers/standby.mdx index afa8f4e..5be44dd 100644 --- a/browsers/standby.mdx +++ b/browsers/standby.mdx @@ -4,7 +4,16 @@ title: "Standby Mode" Kernel browsers enter standby mode during periods of inactivity. When a browser goes into standby mode, the browser's state remains the same but incurs zero usage costs. -Kernel browsers automatically enter standby when no CDP or Live View client is connected for `five seconds`. 
After it enters standby, the browser's [timeout](/browsers/termination#automatic-deletion-via-timeout) countdown begins. +Kernel browsers automatically enter standby after `five seconds` with no activity. After it enters standby, the browser's [timeout](/browsers/termination#automatic-deletion-via-timeout) countdown begins. + +A browser is considered active while any of the following is happening: + +- A CDP client is connected (Playwright, Puppeteer, or a raw CDP client) +- A WebDriver/BiDi client is connected +- A [Live View](/browsers/live-view) client is connected +- A [computer controls](/browsers/computer-controls) API request is in flight (clicks, keypresses, screenshots, etc.) + +Any of the above resets the standby idle timer. As soon as none are active for five seconds, the browser enters standby. See [here](/browsers/termination) to learn about destroying browsers. GPU-accelerated browsers do not support standby mode. diff --git a/browsers/termination.mdx b/browsers/termination.mdx index e32f04a..79e2f22 100644 --- a/browsers/termination.mdx +++ b/browsers/termination.mdx @@ -31,7 +31,7 @@ kernel.browsers.delete_by_id("htzv5orfit78e1m2biiifpbv") ## Automatic deletion via timeout -If you don't manually delete a browser, it will be automatically deleted after a configurable `timeout` (default 60 seconds). The timeout begins when the browser does not see a CDP or live view connection. +If you don't manually delete a browser, it will be automatically deleted after a configurable `timeout` (default 60 seconds). The timeout begins once the browser enters [standby](/browsers/standby) — i.e. when there's no CDP or WebDriver client, no Live View viewer, and no [computer controls](/browsers/computer-controls) request in flight. 
You can set a custom timeout of up to 72 hours when creating a browser: diff --git a/changelog.mdx b/changelog.mdx index 68c8152..bc12827 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -199,7 +199,7 @@ For API library updates, see the [Node SDK](https://github.com/onkernel/kernel-n ## Documentation updates -- Renamed "Scaling in Production" to [Reserved Browsers](/browsers/pools/overview) and added a new [On-Demand Browsers](/browsers/create-a-browser) section for clearer guidance on browser provisioning strategies. +- Renamed "Scaling in Production" to [Reserved Browsers](/browsers/pools/overview) and added a new [On-Demand Browsers](/introduction/create) section for clearer guidance on browser provisioning strategies. - Added [mobile and tablet viewport configurations](/browsers/viewport) with supported screen sizes and usage guidance. - Added [proxy-bypass-hosts](https://www.kernel.sh/docs/proxies/overview#bypass-hosts) documentation for configuring proxy bypass lists on browser pools. - Documented the `--force` flag for [viewport resizing](/browsers/viewport) during active recordings. 
diff --git a/docs.json b/docs.json index ff068a1..763e3d5 100644 --- a/docs.json +++ b/docs.json @@ -11,6 +11,7 @@ { "source": "/auth/agent/programmatic", "destination": "/auth/programmatic" }, { "source": "/auth/agent/faq", "destination": "/auth/faq" }, { "source": "/browsers/hardware-acceleration", "destination": "/browsers/gpu-acceleration" }, + { "source": "/browsers/create-a-browser", "destination": "/introduction/create" }, { "source": "/introduction", "destination": "/" }, { "source": "/quickstart", "destination": "/" }, { "source": "/home", "destination": "/" } @@ -64,9 +65,12 @@ "tab": "Guides", "groups": [ { - "group": "home", + "group": "Introduction", "pages": [ - "index" + "index", + "introduction/create", + "introduction/control", + "introduction/observe" ] }, { @@ -76,7 +80,6 @@ "group": "Basics", "expanded": true, "pages": [ - "browsers/create-a-browser", "browsers/live-view", "browsers/termination", "browsers/standby", diff --git a/index.mdx b/index.mdx index a2d25d3..4769ead 100644 --- a/index.mdx +++ b/index.mdx @@ -8,7 +8,7 @@ mode: "wide" We build crazy fast, open source infra for AI agents to access the internet. Trusted by Cash App, Framer, and 3,000+ teams. - + We spin up cloud browsers in <30ms with GPU acceleration when needed. @@ -17,11 +17,25 @@ We build crazy fast, open source infra for AI agents to access the internet. Tru We solve CAPTCHAs and manage residential proxies to help you see fewer of them. - + You can view sessions live and record them as MP4s for debugging. +## start here + + + + Spin up a browser and pick the shape — headless, stealth, GPU, profiles. + + + Drive it with computer use, playwright execution, CDP, or WebDriver BiDi. + + + Watch it live, record replays, and capture screenshots. + + + import { CopyPromptButton } from '/snippets/copy-prompt-button.jsx';
diff --git a/integrations/notte.mdx b/integrations/notte.mdx index ea1784f..895226e 100644 --- a/integrations/notte.mdx +++ b/integrations/notte.mdx @@ -118,7 +118,7 @@ if __name__ == "__main__": ## Next steps -- Learn about [creating browsers](/browsers/create-a-browser) on Kernel +- Learn about [creating browsers](/introduction/create) on Kernel - Check out [live view](/browsers/live-view) for debugging your automations - Learn about [stealth mode](/browsers/bot-detection/stealth) for avoiding detection - Explore [Profiles](/auth/profiles) for maintaining browser state across sessions diff --git a/integrations/overview.mdx b/integrations/overview.mdx index 6ad6c7b..49b4ff6 100644 --- a/integrations/overview.mdx +++ b/integrations/overview.mdx @@ -46,4 +46,4 @@ Kernel provides detailed guides for popular agent frameworks: ## Custom Integrations -Kernel works with any tool that supports CDP. Check out our [browser creation guide](/browsers/create-a-browser) to learn how to connect any other agent framework. +Kernel works with any tool that supports CDP. Check out our [browser control guide](/introduction/control) to learn how to connect any other agent framework. diff --git a/integrations/vercel/ai-sdk.mdx b/integrations/vercel/ai-sdk.mdx index ebad77c..5f0a464 100644 --- a/integrations/vercel/ai-sdk.mdx +++ b/integrations/vercel/ai-sdk.mdx @@ -181,6 +181,6 @@ So, any code you can run through the SDK can be run via the tool. 
## Related - [Vercel Marketplace Integration](/integrations/vercel/marketplace) -- [Browser Creation](/browsers/create-a-browser) +- [Browser Creation](/introduction/create) - [Stealth Mode](/browsers/bot-detection/stealth) - [Live View](/browsers/live-view) diff --git a/integrations/vercel/marketplace.mdx b/integrations/vercel/marketplace.mdx index ef3a6b3..92dc717 100644 --- a/integrations/vercel/marketplace.mdx +++ b/integrations/vercel/marketplace.mdx @@ -95,7 +95,7 @@ finally: ``` -For more examples and features like profiles, stealth mode, and live view, check out the [Browsers documentation](/browsers/create-a-browser). +For more examples and features like profiles, stealth mode, and live view, check out the [Browsers documentation](/introduction/create). Check out our [integration guides](/integrations/overview) to learn how to use Vercel + Kernel with your preferred browser automation framework. diff --git a/introduction/control.mdx b/introduction/control.mdx new file mode 100644 index 0000000..820efc9 --- /dev/null +++ b/introduction/control.mdx @@ -0,0 +1,202 @@ +--- +title: "Control" +description: "Drive the browser with computer use, playwright execution, CDP, or WebDriver BiDi" +--- + +Kernel browsers expose four ways to drive a session. For agents, we recommend [computer use](/browsers/computer-controls) or [playwright execution](/browsers/playwright-execution) — both run co-located with the browser and avoid the bot-detection surface a direct CDP connection introduces. + + + + Kernel's [Computer Controls](/browsers/computer-controls) API exposes OS-level mouse, keyboard, and screen primitives — the surface a computer-use model already knows how to drive (screenshot, click, type, key, scroll, drag). No CDP or WebDriver connection required, so there's no protocol fingerprint to leak. Ideal for [Claude](/integrations/computer-use/anthropic), [OpenAI](/integrations/computer-use/openai), or [Gemini](/integrations/computer-use/gemini) computer-use loops. 
+ + + ```typescript Typescript/Javascript + import Kernel from '@onkernel/sdk'; + + const kernel = new Kernel(); + const kernelBrowser = await kernel.browsers.create(); + + const screenshot = await kernel.browsers.computer.captureScreenshot(kernelBrowser.session_id); + + await kernel.browsers.computer.clickMouse(kernelBrowser.session_id, { + x: 420, + y: 280, + }); + + await kernel.browsers.computer.typeText(kernelBrowser.session_id, { + text: 'kernel cloud browsers', + }); + ``` + + ```python Python + from kernel import Kernel + + kernel = Kernel() + kernel_browser = kernel.browsers.create() + + screenshot = kernel.browsers.computer.capture_screenshot(id=kernel_browser.session_id) + + kernel.browsers.computer.click_mouse( + id=kernel_browser.session_id, + x=420, + y=280, + ) + + kernel.browsers.computer.type_text( + id=kernel_browser.session_id, + text="kernel cloud browsers", + ) + ``` + + + + Run any Playwright code from anywhere — no local Playwright install, no Chromium download, no CDP connection to manage. Your code executes inside the browser's VM with the full Playwright API in scope and returns structured data back to your agent. Ships with [Patchright](/browsers/bot-detection/stealth) by default. + + + ```typescript Typescript/Javascript + const response = await kernel.browsers.playwright.execute( + kernelBrowser.session_id, + { + code: ` + await page.goto('https://example.com'); + return await page.title(); + `, + }, + ); + + console.log(response.result); + ``` + + ```python Python + response = kernel.browsers.playwright.execute( + id=kernel_browser.session_id, + code=""" + await page.goto('https://example.com') + return await page.title() + """, + ) + + print(response.result) + ``` + + + + Chrome DevTools Protocol — the wire format Playwright, Puppeteer, and most browser frameworks speak. Use `cdp_ws_url` from the created browser session for deterministic, scripted automation driven from your own infra. 
+ + + ```typescript Typescript/Javascript + import { chromium } from 'playwright'; + + const browser = await chromium.connectOverCDP(kernelBrowser.cdp_ws_url); + const context = browser.contexts()[0]; + const page = context.pages()[0]; + + await page.goto('https://example.com'); + const title = await page.title(); + console.log(title); + ``` + + ```python Python + from playwright.async_api import async_playwright + + async with async_playwright() as playwright: + browser = await playwright.chromium.connect_over_cdp(kernel_browser.cdp_ws_url) + context = browser.contexts[0] + page = context.pages[0] + + await page.goto('https://example.com') + title = await page.title() + print(title) + ``` + + + + W3C-standard browser control. Use `webdriver_ws_url` with [Vibium](/integrations/vibium) or any other BiDi client. + + + ```typescript Typescript/Javascript + import { browser } from 'vibium'; + + const bro = await browser.start(kernelBrowser.webdriver_ws_url); + const page = await bro.page(); + + await page.goto('https://example.com'); + const title = await page.title(); + console.log(title); + ``` + + ```python Python + from vibium.sync_api import browser + + bro = browser.start(kernel_browser.webdriver_ws_url) + page = bro.page() + + page.goto('https://example.com') + title = page.title() + print(title) + ``` + + + + +## Why computer use for agents + +Kernel's computer controls are built to match how computer-use models were trained — the same primitives the model emits (screenshot, click at coords, type, key, scroll, drag) map 1:1 onto the API. There's no harness translating model output into framework calls. + +- **Native fit.** Screenshot, click, type, key, scroll, drag — the primitives the model already speaks. +- **Faster screenshots.** Captures bypass CDP, which removes the largest source of latency in a vision loop. +- **Better against bot detection.** No CDP connection means no CDP fingerprint to leak. 
Pairs naturally with [stealth mode](/browsers/bot-detection/stealth) and [residential proxies](/proxies/residential). +- **Human-like input.** OS-level events with Bézier-curve mouse paths, variable typing speed, and configurable mistype rate. +- **Not DOM-limited.** Screenshots capture the full VM, so the agent can see and interact with native dialogs, canvas elements, iframes, and PDFs — not just things you can address with a selector. + +## Why playwright execution over a direct CDP connection + +If you're reaching for Playwright, prefer the execution API over `connectOverCDP`. Same Playwright API you already know, none of the setup. + +- **Run from anywhere.** No `playwright` package to version-pin, no Chromium download, no CDP connection to manage. Send the code, get the result. +- **Co-located with the browser.** Code runs in the same VM as the browser — no network hop between your script and the page, fewer flakes. +- **Patchright by default.** Hardened against bot detection out of the box. +- **Full Playwright API.** `page`, `context`, and `browser` are all in scope. Anything Playwright can do — DOM queries, file uploads, full-page screenshots — works here. +- **Returns values.** `return` from your code and the result comes back in the response. Easy to use as an agent tool. + +## Computer use + playwright execution + +Computer controls drive the browser the way a person would — they don't speak the programmatic API surface. Anything you'd reach for the DOM or Playwright client for (reading text and attributes, `page.goto`, file uploads, cookie or storage access, switching tabs) belongs on the [playwright execution](/browsers/playwright-execution) side. The recommended pattern for agents is computer controls for interaction, playwright execution as a tool the agent can call when it needs structured data or a programmatic action. 
+ + +```typescript Typescript/Javascript +const response = await kernel.browsers.playwright.execute( + kernelBrowser.session_id, + { + code: ` + const rows = await page.$$eval('table tr', (trs) => + trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent)) + ); + return rows; + `, + }, +); + +console.log(response.result); +``` + +```python Python +response = kernel.browsers.playwright.execute( + id=kernel_browser.session_id, + code=""" + const rows = await page.$$eval('table tr', (trs) => + trs.map((tr) => Array.from(tr.querySelectorAll('td')).map((td) => td.textContent)) + ); + return rows; + """, +) + +print(response.result) +``` + + +## Going deeper + +- [Computer Controls reference](/browsers/computer-controls) — every mouse, keyboard, and screen primitive. +- [Playwright Execution reference](/browsers/playwright-execution) — the full execution surface, return values, and timeouts. +- [Computer use integrations](/integrations/computer-use/anthropic) — drop-in examples for Anthropic, Gemini, OpenAI, and more. diff --git a/introduction/create.mdx b/introduction/create.mdx new file mode 100644 index 0000000..78c01fe --- /dev/null +++ b/introduction/create.mdx @@ -0,0 +1,141 @@ +--- +title: "Create" +description: "Spin up a cloud browser for your agent" +--- + +Kernel browsers are sandboxed Chromium instances that boot in under 30ms. Your agent creates them on demand, drives them, and tears them down — no infra to provision, no servers to run. 
+ +## Your first browser + + + Install the Kernel SDK first: + - Typescript/Javascript: `npm install @onkernel/sdk` + - Python: `pip install kernel` + + +```typescript Typescript/Javascript +import Kernel from '@onkernel/sdk'; + +const kernel = new Kernel(); + +const kernelBrowser = await kernel.browsers.create(); +console.log(kernelBrowser.session_id); +``` + +```python Python +from kernel import Kernel + +kernel = Kernel() + +kernel_browser = kernel.browsers.create() +print(kernel_browser.session_id) +``` + +```bash CLI +kernel browsers create + +# or with a starting URL and stealth +kernel browsers create --stealth --start-url https://example.com +``` + + +The response includes everything you need to drive the browser: `session_id`, `cdp_ws_url`, `webdriver_ws_url`, and `browser_live_view_url`. + +## Pick the right shape + +Most of what you'll tune at creation time falls into four buckets: + + + + Headful (default) supports live view, replays, and better stealth — ideal for agent workflows on bot-detected sites. Headless is lighter (1 GB vs 8 GB), good for simple scraping. + + + Turn on stealth mode and route through residential, ISP, or datacenter proxies when you're hitting sites with bot detection. + + + Required for WebGL, video, and canvas-heavy workloads. GPU-accelerated browsers do not support standby. + + + Persist cookies, storage, and logged-in sessions across runs with a [profile](/auth/profiles), or hand auth off to Kernel entirely with [managed auth](/auth/overview). + + + +## Lifecycle + +A browser stays alive as long as something is driving it — a CDP or WebDriver client, a [Live View](/browsers/live-view) viewer, or an in-flight [computer controls](/browsers/computer-controls) request. After five seconds with none of those active, it enters [standby](/browsers/standby) — state is preserved, billing stops. Once in standby, it's deleted after the configurable timeout (60s by default) elapses. 
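The lifecycle described above can be read as a small state model. The sketch below is illustrative only and does not call the Kernel SDK: `BrowserState` and `lifecycle_state` are hypothetical names, and the model assumes the deletion timeout starts counting at the moment the browser enters standby, as the standby and termination pages describe. It also ignores GPU-accelerated browsers, which do not support standby.

```python
from enum import Enum

# Values taken from the lifecycle text: five idle seconds before standby,
# then a configurable deletion timeout that defaults to 60 seconds.
STANDBY_IDLE_SECONDS = 5
DEFAULT_TIMEOUT_SECONDS = 60


class BrowserState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    DELETED = "deleted"


def lifecycle_state(
    idle_seconds: float,
    timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
) -> BrowserState:
    """Hypothetical helper: classify a browser by how long nothing has driven it.

    idle_seconds is the time since the last CDP/WebDriver client, Live View
    viewer, or computer controls request was active.
    """
    if idle_seconds < STANDBY_IDLE_SECONDS:
        return BrowserState.ACTIVE
    # Assumption: the timeout countdown begins when standby begins.
    if idle_seconds < STANDBY_IDLE_SECONDS + timeout_seconds:
        return BrowserState.STANDBY
    return BrowserState.DELETED


if __name__ == "__main__":
    for idle in (0, 5, 30, 65):
        print(idle, lifecycle_state(idle).value)
```

Under this reading, a browser left alone with the default timeout is gone roughly 65 seconds after its last activity, which is one reason explicit deletion is the more predictable teardown path.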
+ +We recommend you delete a browser explicitly when you're done with it: + + +```typescript Typescript/Javascript +await kernel.browsers.deleteByID(kernelBrowser.session_id); +``` + +```python Python +kernel.browsers.delete_by_id(kernel_browser.session_id) +``` + +```bash CLI +kernel browsers delete +``` + + +See [Termination & timeouts](/browsers/termination#automatic-deletion-via-timeout) for the full set of teardown options. + +## Full example + +Putting it together — create a browser, run Playwright code inside it, tear down: + + +```typescript Typescript/Javascript +import Kernel from '@onkernel/sdk'; + +const kernel = new Kernel(); + +const kernelBrowser = await kernel.browsers.create(); + +try { + const response = await kernel.browsers.playwright.execute( + kernelBrowser.session_id, + { + code: ` + await page.goto('https://www.onkernel.com'); + return await page.title(); + `, + }, + ); + console.log(response.result); +} catch (error) { + console.error(error); +} finally { + await kernel.browsers.deleteByID(kernelBrowser.session_id); +} +``` + +```python Python +from kernel import Kernel + +kernel = Kernel() + +kernel_browser = kernel.browsers.create() + +try: + response = kernel.browsers.playwright.execute( + id=kernel_browser.session_id, + code=""" + await page.goto('https://www.onkernel.com') + return await page.title() + """, + ) + print(response.result) +except Exception as e: + print(e) +finally: + kernel.browsers.delete_by_id(kernel_browser.session_id) +``` + + +## What's next + +Once you have a browser, you need to drive it. Head to [Control](/introduction/control) to see the four primitives Kernel exposes — computer use, playwright execution, CDP, and WebDriver BiDi — and when to reach for each. 
diff --git a/introduction/observe.mdx b/introduction/observe.mdx new file mode 100644 index 0000000..fba6a8f --- /dev/null +++ b/introduction/observe.mdx @@ -0,0 +1,132 @@ +--- +title: "Observe" +description: "Watch your agent work, debug what went wrong" +--- + +Browser agents fail in ways that don't show up in logs. Kernel gives you four ways to see what's actually happening — live, after the fact, frame by frame, and line by line. + +## Live view + +Every browser exposes a `browser_live_view_url` you can open in a browser tab or embed in an iframe. Use it to watch an agent run in real time, hand a session off to a human-in-the-loop, or surface the browser as part of your own UI. + + +```typescript Typescript/Javascript +import Kernel from '@onkernel/sdk'; + +const kernel = new Kernel(); + +const kernelBrowser = await kernel.browsers.create(); +console.log(kernelBrowser.browser_live_view_url); +``` + +```python Python +from kernel import Kernel + +kernel = Kernel() + +kernel_browser = kernel.browsers.create() +print(kernel_browser.browser_live_view_url) +``` + +```bash CLI +kernel browsers view +``` + + +Add `?readOnly=true` for a non-interactive view, or enable [kiosk mode](/browsers/live-view#kiosk-mode) at creation for a fullscreen, cinematic experience. Full reference: [Live View](/browsers/live-view). + +## Replays + +Replays are MP4 recordings you start and stop on demand — capture as many clips per session as you need. They're the right tool for post-hoc debugging: a failed run gives you one or more videos to scrub through, share, or attach to a bug report. + +Replays can also be enabled on managed auth sessions, so you can [debug failed logins](https://www.kernel.sh/docs/auth/configuration#record-sessions-for-debugging) the same way. + + +```typescript Typescript/Javascript +const replay = await kernel.browsers.replays.start(kernelBrowser.session_id); + +// ...run the agent... 
+ +await kernel.browsers.replays.stop(replay.replay_id, { + id: kernelBrowser.session_id, +}); +``` + +```python Python +replay = kernel.browsers.replays.start(kernel_browser.session_id) + +# ...run the agent... + +kernel.browsers.replays.stop( + replay.replay_id, + id=kernel_browser.session_id, +) +``` + +```bash CLI +# Start recording +kernel browsers replays start + +# ...run the agent... + +# Stop and download +kernel browsers replays stop +kernel browsers replays download -o replay.mp4 +``` + + +Full reference: [Replays](/browsers/replays). + +## Screenshots + +Pull a frame at any moment with computer controls — useful for snapshotting state at decision points, attaching to traces, or feeding back into a vision model. + + +```typescript Typescript/Javascript +const screenshot = await kernel.browsers.computer.captureScreenshot( + kernelBrowser.session_id, +); +``` + +```python Python +screenshot = kernel.browsers.computer.capture_screenshot( + id=kernel_browser.session_id, +) +``` + +```bash CLI +kernel browsers computer screenshot --to screenshot.png +``` + + +For full-page captures, use [Playwright execution](/browsers/playwright-execution#screenshots) instead. + +## Invocation logs + +If you're running an agent on Kernel's [app platform](/apps/develop), every invocation produces a streaming log feed. Tail it live while the agent runs, or pull it after the fact for debugging. + + +```typescript Typescript/Javascript +const logs = await kernel.invocations.follow(invocationId); + +for await (const event of logs) { + console.log(event); +} +``` + +```python Python +logs = kernel.invocations.follow(invocation_id) + +for event in logs: + print(event) +``` + + +Full reference: [Logs](/apps/logs). + +## Picking the right tool + +- **Building the agent?** Keep a [live view](/browsers/live-view) tab open while you iterate. +- **Debugging a failure?** Capture a [replay](/browsers/replays) for the run, then watch the video. 
+- **Instrumenting the agent itself?** Drop [screenshots](/browsers/computer-controls#take-screenshots) and [logs](/apps/logs) into your traces at the points that matter. +- **Putting a human in the loop?** Embed the [live view](/browsers/live-view#embedding-in-an-iframe) in your own UI.