
Add browsability skill — assess how usable a site is for an AI browser agent #122

Open
shubh24 wants to merge 5 commits into main from shubh24/browsability-skill


shubh24 (Contributor) commented May 30, 2026

What

Adds the browsability skill: a way to assess how usable a website is for an AI browser agent.

It's the sibling of agent-experience (which audits docs/SDK onboarding DX). This one is about operability — can an agent actually perceive and drive the live UI — not discoverability (no SEO/AEO/llms.txt).

The opinion

Browsability is how little help an agent needs to succeed, and how much harder the site is for an agent than for a person. Only agent-specific friction counts — a workflow that's long for humans too isn't a browsability problem; a simple task made hard by unlabeled controls is.
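One way to picture this opinion: friction is the delta between the agent's steps and a person's steps on the same task, not the task's absolute length. A minimal sketch, assuming step counts are available for both; `TaskRun` and `agentSpecificFriction` are hypothetical names, and the numbers are illustrative rather than measurements (the skill itself ships no scripts):

```typescript
// Hypothetical record of one task attempted by a person and by an agent.
interface TaskRun {
  task: string;
  humanSteps: number; // actions a person needs
  agentSteps: number; // actions the agent needed
}

// Only steps the agent pays beyond the human baseline count as friction.
// A workflow that is long for humans too contributes zero.
function agentSpecificFriction(run: TaskRun): number {
  return Math.max(0, run.agentSteps - run.humanSteps);
}

// Illustrative numbers only.
const runs: TaskRun[] = [
  { task: "multi-page checkout", humanSteps: 12, agentSteps: 12 }, // long for everyone
  { task: "submit contact form", humanSteps: 3, agentSteps: 6 }, // unlabeled controls
];

for (const run of runs) {
  console.log(`${run.task}: +${agentSpecificFriction(run)} extra step(s)`);
}
```

Clamping at zero is a design choice: a task the agent finishes in fewer steps than a person earns no credit, because this only measures how much harder the site is for an agent.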

No score, no formulas. The skill is guidance, not a calculator: the agent looks at the site, uses the rubric as a flexible checklist, and decides what matters for that site. It reports findings as two separate tables — what helps, and what hurts (with the fix) — so the agent's judgment drives the result rather than a hard-coded weighting.
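The report shape can be sketched as two independent tables, with the fix column living only in the hurts table so no row implies a relation to a row in the other table. A minimal sketch; the `Help`/`Hurt` types, column names, and section headings are assumptions for illustration, not a format the skill mandates:

```typescript
// Hypothetical finding shapes; only the "two separate tables" idea comes from the skill.
interface Help {
  area: string; // e.g. "Seeing the controls"
  finding: string;
}

interface Hurt {
  area: string;
  finding: string;
  fix: string; // remediation, kept next to the problem it fixes
}

// Escape pipes so a finding cannot break the Markdown table layout.
const cell = (s: string): string => s.replace(/\|/g, "\\|");

function renderTable(headers: string[], rows: string[][]): string {
  const head = `| ${headers.join(" | ")} |`;
  const sep = `| ${headers.map(() => "---").join(" | ")} |`;
  const body = rows.map((r) => `| ${r.map(cell).join(" | ")} |`);
  return [head, sep, ...body].join("\n");
}

// Two standalone tables rather than one side-by-side table.
function renderReport(helps: Help[], hurts: Hurt[]): string {
  return [
    "### What helps",
    renderTable(["Area", "Finding"], helps.map((h) => [h.area, h.finding])),
    "",
    "### What hurts",
    renderTable(["Area", "Finding", "Fix"], hurts.map((h) => [h.area, h.finding, h.fix])),
  ].join("\n");
}

console.log(
  renderReport(
    [{ area: "Seeing the controls", finding: "Buttons are native and labeled" }],
    [{ area: "Recovery", finding: "Consent wall blocks the page", fix: "Let the page load behind a non-blocking banner" }],
  ),
);
```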

What it looks at

  • Getting in — how much infrastructure help (stealth, proxy, captcha-solving) the agent needs. Agents run on remote/cloud browsers, so that is the environment that counts: if a task works on a local browser but is blocked or errors on a remote one, the site is gating automated browsers, which is flagged as a major browsability failure, not a testing caveat. (This includes the diagnostic that a chrome-error://… URL or a bare-domain title indicates a blocked navigation, not an empty render.)
  • Seeing the controls — whether controls survive the accessibility-tree prune (labeled/native vs unlabeled <div>-as-button).
  • Structural traps — cross-origin iframes, shadow DOM, very deep/large DOMs, never-settling pages, virtualized lists.
  • Extra steps — steps the agent pays beyond a human (e.g. a custom dropdown that costs two actions where a native <select> costs one).
  • Recovery — blocking cookie/consent walls, unstable DOM, navigation timeouts.
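The blocked-navigation diagnostic from the Getting in item can be expressed as a small heuristic. A sketch only: `looksLikeBlockedNavigation` is a hypothetical helper, and the bare-domain check can misfire on a site whose real title happens to be its domain:

```typescript
// Heuristic: on a remote browser, an "empty" page is usually a blocked
// navigation, not a page that rendered nothing.
function looksLikeBlockedNavigation(
  requestedUrl: string, // the URL the agent navigated to
  observedUrl: string, // the URL the browser ended up on
  title: string, // the document title the browser reports
): boolean {
  // Chrome serves its own error page under the chrome-error:// scheme.
  if (observedUrl.startsWith("chrome-error://")) return true;

  // A title that is just the bare domain suggests there was no real
  // document to take a <title> from.
  const host = new URL(requestedUrl).hostname; // URL parsing lowercases the host
  const t = title.trim().toLowerCase();
  return t === host || t === host.replace(/^www\./, "");
}
```

With a Playwright-style `page`, the inputs would be the URL the agent asked for, `page.url()`, and `await page.title()`.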

Contents

  • SKILL.md — the workflow: try the site with the browse skill, judge against the rubric, report helps/hurts.
  • references/rubric.md — the full checklist (what helps vs hurts) + a remediation table. Guidance, not a rulebook.

No scripts — the agent uses the existing browse skill to look at and drive the site.

Grounding & scope

  • Grounded in what the open-source Stagehand framework treats as hard; references only public Browserbase session settings.
  • Operability layer only; explicitly not SEO/AEO/discoverability, and not docs/SDK onboarding (that's agent-experience).

🤖 Generated with Claude Code

Adds the `browsability` skill — an operational rubric for how well an AI
*browser* agent can drive a website's UI (the sibling of agent-experience,
which covers docs/SDK onboarding DX).

Scores 0–100 across:
  A  Access Resistance — lowest assistance rung (stealth/proxy/captcha ladder)
                          a task needs to complete
  B1 Reachability      — % of controls that survive the accessibility-tree prune
  B3 Structural traps  — cross-origin iframes, shadow DOM, DOM depth/size
  C  Agent tax         — agent steps OVER the human baseline (delta, not absolute)
  D  Recoverability    — self-heal / site errors / blocking overlays / step ceiling
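The Access Resistance axis (A) amounts to climbing the assistance ladder until the task completes. A minimal sketch, assuming each rung includes the help of the rungs below it and that a caller-supplied `runTask` (hypothetical) maps a rung to real session settings and reports success; the rung names are taken from the ladder above, not from any Browserbase API:

```typescript
// Hypothetical assistance ladder, cheapest help first. Each rung is assumed
// to include the help of the rungs below it.
const LADDER = ["none", "stealth", "proxy", "captcha"] as const;
type Rung = (typeof LADDER)[number];

// Hypothetical runner: attempts the task with the given rung of help and
// resolves to true if it completed. Mapping rungs to session settings is
// left to the caller.
type RunTask = (rung: Rung) => Promise<boolean>;

// Access Resistance: the lowest rung at which the task completes,
// or null if it fails even with every kind of help.
async function lowestRungNeeded(runTask: RunTask): Promise<Rung | null> {
  for (const rung of LADDER) {
    if (await runTask(rung)) return rung;
  }
  return null;
}

// Example with a fake runner that only succeeds once a proxy is added.
lowestRungNeeded(async (rung) => rung === "proxy" || rung === "captcha").then((rung) =>
  console.log(`lowest rung needed: ${rung}`),
);
```

Trying rungs cheapest-first means the returned rung is the least help that works, which is what the axis records.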

The Drivability slice (B1+B3) runs deterministically from one page load via
scripts/friction.ts (no model). The full score adds an agent run across the
assistance ladder via scripts/score.ts. Rubric grounded in what the open-source
Stagehand framework treats as hard; uses only public Browserbase session settings.
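Because the Drivability slice (B1+B3) needs only one page load, it can be computed from a static snapshot. A rough sketch over a hypothetical `SnapNode` tree; the `survivesPrune` rule (native element or interactive ARIA role, plus a non-empty accessible name) and the role list are simplified assumptions, not Stagehand's actual pruning logic:

```typescript
// Hypothetical, simplified node from a single page-load snapshot.
interface SnapNode {
  tag: string; // lowercase tag name, e.g. "button", "div"
  role?: string; // explicit ARIA role, if any
  name?: string; // computed accessible name, if any
  clickable?: boolean; // has a click handler (e.g. a <div> used as a button)
  crossOriginFrame?: boolean; // an <iframe> from another origin
  hasShadowRoot?: boolean; // hosts a shadow DOM
  children: SnapNode[];
}

const NATIVE_CONTROLS = new Set(["a", "button", "input", "select", "textarea"]);
const CONTROL_ROLES = new Set([
  "button", "link", "checkbox", "radio", "combobox", "textbox", "menuitem", "option", "tab",
]);

const hasControlRole = (n: SnapNode): boolean => n.role !== undefined && CONTROL_ROLES.has(n.role);

// Anything a user could operate counts as a control, including a clickable <div>.
const isControl = (n: SnapNode): boolean =>
  n.clickable === true || NATIVE_CONTROLS.has(n.tag) || hasControlRole(n);

// Simplified prune rule: native or role-bearing, with a non-empty accessible name.
const survivesPrune = (n: SnapNode): boolean =>
  (NATIVE_CONTROLS.has(n.tag) || hasControlRole(n)) && (n.name ?? "").trim() !== "";

interface Drivability {
  reachability: number; // B1: fraction of controls that survive the prune
  nodes: number; // B3: DOM size
  maxDepth: number; // B3: DOM depth (root = 0)
  crossOriginIframes: number; // B3
  shadowRoots: number; // B3
}

function drivability(root: SnapNode): Drivability {
  let controls = 0, reachable = 0, nodes = 0, maxDepth = 0, iframes = 0, shadows = 0;
  // Iterative walk so very deep DOMs cannot overflow the call stack.
  const stack: Array<[SnapNode, number]> = [[root, 0]];
  while (stack.length > 0) {
    const [n, depth] = stack.pop()!;
    nodes++;
    maxDepth = Math.max(maxDepth, depth);
    if (n.crossOriginFrame) iframes++;
    if (n.hasShadowRoot) shadows++;
    if (isControl(n)) {
      controls++;
      if (survivesPrune(n)) reachable++;
    }
    for (const c of n.children) stack.push([c, depth + 1]);
  }
  return {
    // No controls means nothing is unreachable.
    reachability: controls === 0 ? 1 : reachable / controls,
    nodes,
    maxDepth,
    crossOriginIframes: iframes,
    shadowRoots: shadows,
  };
}
```

A clickable `<div>` with no role counts as a control but not as reachable, which is the unlabeled `<div>`-as-button case the rubric flags.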

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9c30e80.

Comment thread on skills/browsability/scripts/score.ts (outdated)
Comment thread on skills/browsability/scripts/score.ts (outdated)
shubh24 and others added 4 commits June 1, 2026 12:40
… table

Simplify per review: remove the numeric 0–100 score, weighted axes, and the
friction.ts/score.ts probes. The skill is now pure guidance — the agent looks
at the site with the `browser` skill, uses the rubric as a flexible checklist,
and reports what helps vs what hurts (with fixes), letting the agent judge what
matters for each site rather than applying hard-coded formulas.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
A combined two-column table implied that a Helps row and the Hurts row beside
it were related. Split into two standalone tables (Hurts keeps a Fix column).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ilure

Agents run on remote/cloud browsers, so that's the environment that counts.
If a task works on a local browser but is blocked/errors on a remote one, the
site is gating automated browsers — a major browsability failure, not a
testing caveat. Adds the remote-vs-local test and the chrome-error/bare-domain
diagnostic (an empty remote page is usually a blocked navigation, not a render).
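The remote-vs-local test reduces to letting the remote result decide. A minimal sketch; the `Outcome`/`Verdict` names are hypothetical, and the `fails-everywhere` bucket (broken in both environments) is an addition for completeness, since the commit only defines the works-locally, fails-remotely case:

```typescript
// Hypothetical outcome of trying the same task in one environment.
type Outcome = "completed" | "blocked" | "errored";

type Verdict =
  | "ok" // works where agents actually run
  | "gates-automated-browsers" // works locally, fails remotely: major failure
  | "fails-everywhere"; // broken for reasons not specific to remote browsers

// Remote/cloud browsers are where agents run, so the remote result decides.
function classifyRemoteVsLocal(local: Outcome, remote: Outcome): Verdict {
  if (remote === "completed") return "ok";
  if (local === "completed") return "gates-automated-browsers";
  return "fails-everywhere";
}
```

A remote run that lands on an empty page should be recorded as `blocked` rather than `completed`, per the chrome-error/bare-domain diagnostic.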

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
shubh24 changed the title "Add browsability skill — score how usable a site is for a browser agent" to "Add browsability skill — assess how usable a site is for an AI browser agent" on Jun 1, 2026
shubh24 requested a review from shrey150 on June 2, 2026 00:54