Add browsability skill — assess how usable a site is for an AI browser agent#122
Open
shubh24 wants to merge 5 commits into
Adds the `browsability` skill — an operational rubric for how well an AI
*browser* agent can drive a website's UI (the sibling of agent-experience,
which covers docs/SDK onboarding DX).
Scores 0–100 across:
A Access Resistance — lowest assistance rung (stealth/proxy/captcha ladder)
a task needs to complete
B1 Reachability — % of controls that survive the accessibility-tree prune
B3 Structural traps — cross-origin iframes, shadow DOM, DOM depth/size
C Agent tax — agent steps OVER the human baseline (delta, not absolute)
D Recoverability — self-heal / site errors / blocking overlays / step ceiling
The Drivability slice (B1+B3) runs deterministically from one page load via
scripts/friction.ts (no model). The full score adds an agent run across the
assistance ladder via scripts/score.ts. Rubric grounded in what the open-source
Stagehand framework treats as hard; uses only public Browserbase session settings.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
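The two arithmetic definitions in this commit message (B1 as a percentage, C as a delta over the human baseline) can be sketched as below. This is only an illustration, not the contents of scripts/friction.ts or scripts/score.ts, which the PR page does not show. The function names, the clamp to zero, and the handling of a page with no controls are all assumptions.

```typescript
// Hypothetical helpers illustrating the B1 and C definitions above.
// Names and edge-case handling are assumptions, not the PR's real code.

/** B1 Reachability: percentage of controls that survive the accessibility-tree prune. */
function reachabilityPercent(totalControls: number, reachableControls: number): number {
  // Assumption: a page with no controls has nothing unreachable.
  if (totalControls <= 0) return 100;
  return (reachableControls / totalControls) * 100;
}

/** C Agent tax: agent steps OVER the human baseline (a delta, not an absolute count). */
function agentTax(agentSteps: number, humanBaselineSteps: number): number {
  // Assumption: an agent that beats the baseline pays no tax rather than a negative one.
  return Math.max(0, agentSteps - humanBaselineSteps);
}
```

Measuring the delta means a workflow that is long for humans too does not count against the site; only the extra steps the agent needs do.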
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 9c30e80. Configure here.
… table
Simplify per review: remove the numeric 0–100 score, weighted axes, and the friction.ts/score.ts probes. The skill is now pure guidance: the agent looks at the site with the `browser` skill, uses the rubric as a flexible checklist, and reports what helps vs what hurts (with fixes), letting the agent judge what matters for each site rather than applying hard-coded formulas.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
A combined two-column table implied that a Helps row and the Hurts row beside it were related. Split into two standalone tables (Hurts keeps a Fix column).
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ilure
Agents run on remote/cloud browsers, so that's the environment that counts. If a task works on a local browser but is blocked or errors on a remote one, the site is gating automated browsers: a major browsability failure, not a testing caveat. Adds the remote-vs-local test and the chrome-error/bare-domain diagnostic (an empty remote page is usually a blocked navigation, not a render).
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
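The remote-vs-local test and the chrome-error/bare-domain diagnostic described in this commit could be expressed roughly as follows. This is a sketch under assumptions: the `PageSnapshot` shape, the function names, and the verdict labels are hypothetical, and "bare-domain title" is interpreted here as a page title equal to the requested hostname (with or without `www.`).

```typescript
// Hypothetical sketch of the diagnostic; not code from the PR.
interface PageSnapshot {
  url: string;   // URL the remote browser ended up on
  title: string; // document title reported by the remote browser
}

/** True when an "empty" remote page looks like a blocked navigation rather than a render. */
function looksLikeBlockedNavigation(snapshot: PageSnapshot, requestedUrl: string): boolean {
  if (snapshot.url.startsWith("chrome-error://")) return true;
  // Assumption: a title that is just the hostname means no real document loaded.
  const host = new URL(requestedUrl).hostname; // URL parsing lowercases the hostname
  const bare = host.replace(/^www\./, "");
  const title = snapshot.title.trim().toLowerCase();
  return title === host || title === bare;
}

type Verdict = "works" | "gates-automated-browsers" | "broken-everywhere";

/** Remote is the environment that counts; local only tells us whether the failure is agent-specific. */
function remoteVsLocal(localOk: boolean, remoteOk: boolean): Verdict {
  if (remoteOk) return "works";
  return localOk ? "gates-automated-browsers" : "broken-everywhere";
}
```

A task that fails in both environments is reported separately because it is not friction specific to automated browsers.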

What
Adds the `browsability` skill: a way to assess how usable a website is by an AI browser agent. It's the sibling of `agent-experience` (which audits docs/SDK onboarding DX). This one is about operability (can an agent actually perceive and drive the live UI), not discoverability (no SEO/AEO/llms.txt).
The opinion
Browsability is how little help an agent needs to succeed, and how much harder the site is for an agent than for a person. Only agent-specific friction counts — a workflow that's long for humans too isn't a browsability problem; a simple task made hard by unlabeled controls is.
No score, no formulas. The skill is guidance, not a calculator: the agent looks at the site, uses the rubric as a flexible checklist, and decides what matters for that site. It reports findings as two separate tables — what helps, and what hurts (with the fix) — so the agent's judgment drives the result rather than a hard-coded weighting.
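The report shape described above, two standalone tables with only the Hurts table carrying a Fix column, might be rendered like this. It is a minimal sketch: the skill itself is guidance with no scripts, so the `Help` and `Hurt` types and the `renderReport` function are hypothetical names used only to show the output format.

```typescript
// Hypothetical types and renderer; the skill ships no code for this.
interface Help { finding: string; }
interface Hurt { finding: string; fix: string; }

// Escape pipes so a finding cannot break the Markdown table row.
const cell = (text: string): string => text.replace(/\|/g, "\\|");

/** Render findings as two separate Markdown tables: Helps, then Hurts (with Fix). */
function renderReport(helps: Help[], hurts: Hurt[]): string {
  const helpsTable = ["| Helps |", "| --- |", ...helps.map(h => `| ${cell(h.finding)} |`)];
  const hurtsTable = [
    "| Hurts | Fix |",
    "| --- | --- |",
    ...hurts.map(h => `| ${cell(h.finding)} | ${cell(h.fix)} |`),
  ];
  // A blank line keeps the tables independent, so rows are never read side by side.
  return [...helpsTable, "", ...hurtsTable].join("\n");
}
```

Keeping the tables separate avoids implying that a Helps row and the Hurts row next to it are related.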
What it looks at
`chrome-error://…` URL or bare-domain title = a blocked navigation, not an empty render.)
`<div>`-as-button).
`<select>` costs one).
Contents
- `SKILL.md`: the workflow. Try the site with the `browse` skill, judge against the rubric, report helps/hurts.
- `references/rubric.md`: the full checklist (what helps vs hurts) + a remediation table. Guidance, not a rulebook.

No scripts: the agent uses the existing `browse` skill to look at and drive the site.
Grounding & scope
`agent-experience`).
🤖 Generated with Claude Code