autobrowse: optional vendor-neutral inbox-provider hook by aq17 · Pull Request #119 · browserbase/skills

aq17 · 2026-05-26T21:29:17Z

Summary

Lets an autobrowse loop provision a throwaway inbox so the inner agent can register accounts, log in, and complete email verification — without the user supplying their own email to the agent. Fully self-contained in the autobrowse skill (no browse.sh dependency).

- New scripts/inbox.mjs CLI: create / wait-otp / wait-link / latest / release. It talks directly to AgentMail (api.agentmail.to) using AGENTMAIL_API_KEY from the env; the inner agent only ever sees the inbox address (the key is read by inbox.mjs and never printed, and the execute allowlist permits only browse + inbox.mjs).
- create sweeps stale inboxes carrying the ab- prefix (>1h old) before minting, which self-heals crashed loops without ever touching a non-ab- inbox.
- evaluate.mjs gains --inbox-email, injects an "Agent Inbox" section into the system prompt, and allows the agent to shell out to inbox.mjs.
- SKILL.md documents the opt-in provision step, mandatory release/cleanup, the graduation note (inbox is loop-only; graduated skills expect the end user's own credentials), and the 3-inbox free-tier concurrency cap.

Key source

- Browserbase deployments inject a pooled AGENTMAIL_API_KEY (claimed org) into the skill-runner env, so browse.sh uses the skill with zero browse.sh-specific code.
- Regular users set their own free key from agentmail.to; a clear error fires if it's unset.

Verification coverage

| Flow | Support |
| --- | --- |
| Numeric OTP | `wait-otp` (default 4–8 digits) |
| Alphanumeric code | `wait-otp --regex` |
| Click / magic link | `wait-link [--match]` → `browse open` |
| Raw inspection | `latest` |

Test plan

- CLI guards: missing AGENTMAIL_API_KEY → clear setup error; missing state / unknown command error cleanly
- Live end-to-end against a real AgentMail org: create → wait-otp (extracted 771209) → wait-link (extracted href URL) → latest → release; org returned to baseline inbox count, no leaks
- Sweep safety: a second create left a fresh ab- inbox and the non-ab- primary untouched
- include_spam=true on polling, because verification emails to a fresh inbox often get spam-flagged

Supersedes the earlier two-repo approach (browse.sh#151 closed).

🤖 Generated with Claude Code

Note

Medium Risk
Expands the agent execute allowlist and runs external inbox CLIs with altered timeouts; misconfiguration or weak provider isolation could affect parallel tasks, though workspace/task pinning mitigates cross-task access.

Overview
evaluate.mjs gains optional throwaway-inbox support for signup/login/MFA flows via a pluggable provider (--inbox-cmd / AUTOBROWSE_INBOX_CMD), documented inline with a create / wait-otp / wait-link / latest / release contract and tasks/<task>/.inbox.json.

When configured, the inner agent may run only browse plus node <resolved-inbox-cmd>; provider calls are scoped to the current run’s workspace/task (agent-supplied --workspace/--task stripped), and wait-otp / wait-link get exec timeouts extended past the default 30s cap. Runs resolve the inbox from .inbox.json (with --inbox-email fallback), warn on mismatch, substitute {{inbox_email}} in task.md, and inject an Agent Inbox system-prompt section with provider command examples.

.gitignore now ignores .inbox.json so per-task inbox state is not committed.

Reviewed by Cursor Bugbot for commit 0593523. Bugbot is set up for automated code reviews on this repo.

Lets an autobrowse loop provision a throwaway inbox so the inner agent can register accounts and complete email verification. A new scripts/inbox.mjs CLI (create / wait-otp / wait-link / latest / release) talks to the browse.sh inbox endpoint, which owns the AgentMail key — the agent only ever sees the address. evaluate.mjs gains --inbox-email, injects the inbox into the system prompt, and allows the agent to shell out to inbox.mjs. SKILL.md documents the opt-in provision/release steps, graduation note (inbox is loop-only), and the 3-concurrent-loop free-tier cap. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Consolidates all inbox-provisioning logic into the autobrowse skill so the feature is self-contained with nothing browse.sh-specific. inbox.mjs now calls api.agentmail.to directly using AGENTMAIL_API_KEY from the env (sweep-on-create and the ab- prefix guard move into the CLI). Browserbase deployments inject a pooled key; regular users provide their own (free at agentmail.to) and get a clear setup error if it's unset. The inner agent still only ever sees the inbox address — the key is read by inbox.mjs and never printed. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Hardening found by a live Substack magic-link signup run end-to-end:
- wait-link returned an open-tracking pixel (.gif) because it grabbed the first URL anywhere in the body. Now extract <a href> anchors with a reject-list (unsubscribe/mailto/tel/preferences/.gif), which skips img-src pixels; --match matches the href OR the visible link text so "confirm"/"sign in" finds the CTA even when the href is a tracking redirect (browse open follows it).
- latest only showed list-summary metadata (the list endpoint omits the body). It now fetches the full single message by id so text/html/links are visible.
- partsOf prefers AgentMail's cleaned extracted_text/extracted_html.
- evaluate.mjs killed wait-otp/wait-link at the fixed 30s exec cap (ETIMEDOUT on --within 60/90). exec timeout for inbox wait commands is now --within + 15s.
Verified end-to-end: signup → wait-link returns the real "Confirm your email" CTA → browse open → signed-in Substack home. Sweep still proven to never touch non-ab- inboxes. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
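The exec-timeout change in this commit can be sketched roughly as follows. This is an illustration only, not the code in evaluate.mjs: the name execTimeoutFor and the 30s / +15s figures come from this thread, while the signature, the argument parsing, and the fallback when --within is missing are assumptions.

```javascript
// Hypothetical sketch of the "--within + 15s" rule.
// Assumed: argv is the already-tokenized command the agent asked to run.
const DEFAULT_EXEC_TIMEOUT_MS = 30_000; // the fixed cap mentioned in the PR
const WAIT_GRACE_MS = 15_000;           // headroom added on top of --within

function execTimeoutFor(argv) {
  const isWait = argv.includes("wait-otp") || argv.includes("wait-link");
  if (!isWait) return DEFAULT_EXEC_TIMEOUT_MS;

  const i = argv.indexOf("--within");
  const seconds = i >= 0 ? Number(argv[i + 1]) : NaN;
  // Assumption: without a usable --within value, keep the default cap.
  if (!Number.isFinite(seconds) || seconds <= 0) return DEFAULT_EXEC_TIMEOUT_MS;

  return seconds * 1000 + WAIT_GRACE_MS;
}
```

With this shape, a wait command started with --within 60 gets 75 seconds before the harness kills it, so the provider's own polling deadline expires first and can report a clean timeout.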

… truth)
- create now releases the inbox the task already tracks before minting a new one; a re-create within the 1h sweep window otherwise orphaned a live inbox (leaked AND unreachable by release). (#2)
- evaluate.mjs resolves the inbox address from .inbox.json (what wait-otp/wait-link actually poll); --inbox-email is a fallback and a mismatch now warns instead of silently polling a different inbox. (#4)
- {{inbox_email}} in task.md is now substituted with the resolved address. (#3)
- executeCommand pins inbox.mjs to the run's own --workspace/--task, so a sub-agent can't read or release a sibling task's inbox (parallel runs share a workspace, isolated only by --task). (#5)
The 30s exec-timeout issue (#1) was already fixed by execTimeoutFor in 2d091fc. Verified: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced; an agent passing a foreign --task is overridden back to its own. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
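A minimal sketch of the scope pinning described for executeCommand, assuming the agent's command has already been split into an argument array. The name forceInboxScope appears later in this thread; the signature, the shape of the run object, and the handling of the --flag=value form are assumptions.

```javascript
// Hypothetical sketch: drop any agent-supplied --workspace/--task and
// append the values belonging to the current run.
function forceInboxScope(args, run) {
  const SCOPED = new Set(["--workspace", "--task"]);
  const kept = [];
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    const flag = arg.split("=")[0];
    if (SCOPED.has(flag)) {
      // "--task foo" also consumes the next token; "--task=foo" does not.
      if (!arg.includes("=")) i++;
      continue;
    }
    kept.push(arg);
  }
  return [...kept, "--workspace", run.workspace, "--task", run.task];
}
```

For example, `forceInboxScope(["wait-otp", "--task", "sibling"], { workspace: "/w", task: "mine" })` yields `["wait-otp", "--workspace", "/w", "--task", "mine"]`, so a foreign task name never reaches the provider.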

aq17 · 2026-06-01T21:03:18Z

Addressed the Bugbot findings in 47f740f (lifecycle + single-source-of-truth) — #1 was already handled in 2d091fc:

| Bugbot issue | Resolution |
| --- | --- |
| Inbox wait killed by timeout | Already fixed in `2d091fc`: `execTimeoutFor` gives `wait-otp`/`wait-link` `--within + 15s` instead of the fixed 30s cap. |
| Recreate leaves orphan inboxes | `cmdCreate` now DELETEs the inbox the task already tracks before minting a new one (sweep can't catch a <1h-old inbox, and overwriting `.inbox.json` would orphan it). |
| Task placeholder never substituted | `evaluate.mjs` now substitutes `{{inbox_email}}` (optional inner whitespace) in `task.md` with the resolved address. |
| Prompt email ignores inbox state | `.inbox.json` is now the single source of truth (it's what `wait-*` poll). `--inbox-email` is a fallback; a mismatch logs a `WARNING` and the `.inbox.json` value wins. |
| Inbox CLI args unsandboxed | `executeCommand` pins `inbox.mjs` to the run's own `--workspace`/`--task`, stripping any agent-supplied values, so a sub-agent can no longer read/release a sibling task's inbox. |

Verified each end-to-end against a live AgentMail org: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced (no literal/flag leak); an agent passing a foreign --task is overridden back to its own ([scope] ... overridden).
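The address resolution and placeholder substitution verified here could look roughly like this. Both function names are hypothetical, the warning text is invented for illustration, and the real evaluate.mjs may structure this differently; only the behavior (state file wins, mismatch warns, optional inner whitespace in the placeholder) comes from the thread.

```javascript
// Hypothetical sketch: .inbox.json is the source of truth, --inbox-email is a fallback.
function resolveInboxEmail(stateEmail, flagEmail, warn = console.warn) {
  if (stateEmail && flagEmail && stateEmail !== flagEmail) {
    warn(`WARNING: --inbox-email (${flagEmail}) differs from .inbox.json (${stateEmail}); using .inbox.json`);
  }
  return stateEmail || flagEmail || null;
}

// Replace {{inbox_email}} (inner whitespace allowed) with the resolved address.
function substituteInboxEmail(taskText, email) {
  // A replacer function keeps "$"-sequences in the address from being
  // interpreted as replacement patterns.
  return taskText.replace(/\{\{\s*inbox_email\s*\}\}/g, () => email);
}
```

Resolving once and feeding the same value to both the prompt and task.md is what keeps the address the agent types identical to the one the wait commands poll.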

aq17 · 2026-06-01T22:08:35Z

Validation summary — ready for review

Tested at HEAD 47f740f:

Standalone loop (twice): full Substack magic-link signup → wait-link returned the real CTA → browse open → signed-in state confirmed ({"success": true, "signed_in": true} + screenshot). 12 turns / ~$0.63.
Bugbot fixes (each verified live against AgentMail):
- re-create releases the prior inbox (no orphan)
- {{inbox_email}} substituted in task.md (no literal/flag leak)
- .inbox.json is the single source of truth; divergent --inbox-email warns and the resolved address wins
- executeCommand pins inbox.mjs to the run's own --workspace/--task (a foreign --task is overridden)
- the 30s exec-timeout issue was already handled by execTimeoutFor
Full browse.sh sandbox pipeline (with browserbase/browse.sh#159): a real Vercel Sandbox cloned bb-skills @ this commit, ran inbox.mjs create, and Substack delivered the verification email to the minted inbox — proving the feature works in the actual generation pipeline, not just the standalone harness.
Secret hygiene: the AgentMail key never appears in any trace artifact.

No leaked inboxes after any run. Ready to merge.

shubh24 · 2026-06-02T00:49:47Z

Reviewed this — the core idea is solid and the security spine is genuinely well done: the AgentMail key never reaches the inner agent, only the throwaway address does. Nice.

Three things worth tightening before merge. Framing them simply:

1. Leftover inboxes pile up (the big one).
Releasing the inbox is currently a note in SKILL.md telling the orchestrator to run inbox.mjs release — not something the code guarantees. Robots (LLMs) routinely skip trailing cleanup steps. That's literally the lesson #123 just learned for browser sessions: it moved teardown into code for exactly this reason. On any non-happy exit (inner-loop error, max iterations, a crash, Ctrl-C) the inbox is never deleted. The free tier caps at 3 inboxes, so a few forgotten ones and the next create hard-fails. The 1h sweep only helps if created_at parses (see below), so it's not a reliable backstop.
→ Release the run's own inbox in code on both the success and error paths, right alongside the session teardown — don't rely on the agent remembering.
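One way to make the release a code-level guarantee is a try/finally wrapper plus a SIGINT handler. The sketch below is hypothetical: runWithInbox, the provider object with create/release methods, and the exit status are illustrative names and choices, not code from this PR.

```javascript
// Hypothetical sketch: run the loop and always release the inbox afterwards.
async function runWithInbox(provider, loop) {
  const inbox = await provider.create();
  let released = false;

  const release = async () => {
    if (released) return; // idempotent: several exit paths may call this
    released = true;
    try {
      await provider.release(inbox);
    } catch (err) {
      // Cleanup failures are logged, not thrown, so they never mask the run's own error.
      console.error(`inbox release failed: ${err.message}`);
    }
  };

  // Ctrl-C: release first, then exit with the conventional SIGINT status.
  const onSigint = () => { release().finally(() => process.exit(130)); };
  process.once("SIGINT", onSigint);

  try {
    return await loop(inbox); // success path
  } finally {
    process.removeListener("SIGINT", onSigint);
    await release();          // also runs when loop() throws
  }
}
```

A hard kill (SIGKILL, power loss) still bypasses this, so a provider-side sweep of stale inboxes remains useful as a second line of defense.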

2. It can grab the wrong verification code.
DEFAULT_OTP_RE = \b\d{4,8}\b returns the first 4–8 digit run anywhere in the email, in document order. Verification emails are full of other numbers — a year (2026), a price, a zip — and any of those can come before the real code and get returned instead. The agent is told this "prints just the extracted code," so a wrong number flows straight into the form and the run fails confusingly.
→ Prefer a code that sits next to a keyword, e.g. /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i (capture group 1), falling back to the bare regex.
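The keyword-first extraction suggested here can be sketched as below. The two patterns are copied from this comment; the function name extractOtp and the null return on no match are assumptions.

```javascript
// Keyword-anchored pattern first, bare 4-8 digit run as a fallback.
const KEYWORD_OTP_RE = /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i;
const BARE_OTP_RE = /\b\d{4,8}\b/;

// Hypothetical helper: returns the extracted code as a string, or null.
function extractOtp(text) {
  const keyed = KEYWORD_OTP_RE.exec(text);
  if (keyed) return keyed[1]; // capture group 1 holds the digits
  const bare = BARE_OTP_RE.exec(text);
  return bare ? bare[0] : null;
}
```

On a body such as "© 2026 Acme. Your verification code is 771209." the bare pattern alone would return 2026, while the keyword pattern skips the year and returns 771209.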

3. It'll open links sent by strangers.
A throwaway inbox can receive mail from anyone who learns the address, and the prompt steers the agent to browse open whatever wait-link returns. So an attacker-delivered link gets auto-opened — including internal hosts (http://169.254.169.254/, http://localhost), open-redirects, or phishing. REJECT_LINK_RE only filters unsubscribe/tracking/gif, and plain http:// to internal IPs passes. Related: --from is a substring match, so --from stripe.com also matches stripe.com.evil.com.
→ Restrict to https, reject RFC1918/loopback/link-local hosts, drop the bare-text-URL fallback, and make --from an exact domain-boundary match.
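A sketch of the link and sender checks proposed here, using only the global URL class. The function names are hypothetical, rejecting every IPv6 literal outright is a simplifying assumption, and accepting subdomains of the --from value is one possible reading of "domain-boundary match".

```javascript
// True for loopback, RFC 1918, link-local, and 0.0.0.0/8 IPv4 literals.
function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                           // 0.0.0.0/8
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // link-local, incl. cloud metadata
  );
}

// Hypothetical filter applied before a link is handed to `browse open`.
function isSafeLink(href) {
  let url;
  try { url = new URL(href); } catch { return false; }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // assumption: reject all IPv6 literals
  return !isPrivateIPv4(host);
}

// Hypothetical --from check: exact domain or a subdomain of it, never a substring.
function senderMatchesDomain(fromAddress, domain) {
  const at = fromAddress.lastIndexOf("@");
  if (at < 0) return false;
  const sender = fromAddress.slice(at + 1).trim().replace(/>$/, "").toLowerCase();
  const want = domain.toLowerCase();
  return sender === want || sender.endsWith("." + want);
}
```

This only inspects the literal hostname. A public name that resolves to an internal address, or an https open redirect, would still pass, so a resolver-level or network-level check is needed for full coverage.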

Everything else I found is low/nit (a possibly-dead --inbox-email flag, stripHtml/stripTags duplication, a --within parsed in two places). Happy to expand on any of these.

Removes AgentMail from the public skill entirely and replaces the bundled inbox.mjs with a generic, off-by-default provider contract. autobrowse no longer ships an email provider or names any vendor; it only knows how to *call* one.
- evaluate.mjs: `--inbox-cmd <path>` / AUTOBROWSE_INBOX_CMD configures an optional inbox-provider command. Allowlist, exec-timeout, force-scope, and the (now vendor-neutral) Agent Inbox prompt key off it; all are inert when unset. Documents the provider contract (create/wait-otp/wait-link/latest/release + the .inbox.json {email,inbox_id} schema) as the explicit boundary.
- Deleted scripts/inbox.mjs (AgentMail-specific; moves to the internal caller).
- Scrubbed AGENTMAIL_API_KEY/agentmail.to from .env.example, SKILL.md (silent on the feature), and example-task.md.
Kept generic mechanics: .inbox.json single-source-of-truth, {{inbox_email}} substitution, --workspace/--task force-scoping, wait-command exec timeout. Verified: with a throwaway stub provider the hook injects the section, substitutes the address, and forces scope; with no --inbox-cmd there is no inbox section and the allowlist is browse-only. `git grep -i agentmail` → no matches. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0593523.

cursor · 2026-06-02T01:12:44Z

+function readInboxState(taskDir) {
+  try {
+    const { inbox_id } = JSON.parse(fs.readFileSync(path.join(taskDir, ".inbox.json"), "utf-8"));
+    return inbox_id || null;


readInboxState reads inbox_id instead of email field

High Severity

readInboxState destructures inbox_id from .inbox.json but the return value is used as an email address (inboxEmail). The provider contract on line 178-179 documents the schema as { "email": "...", "inbox_id": "..." } — two distinct fields. The email field contains the actual address (e.g. [email protected]) while inbox_id is the API identifier. The agent receives this ID instead of a valid email address, so it types the wrong value into signup/login forms, breaking the entire inbox feature.

Additional Locations (1)

skills/autobrowse/scripts/evaluate.mjs#L535-L536

aq17 · 2026-06-02T01:18:42Z

Reworked per team feedback — AgentMail is now fully out of this public repo

Keeping AgentMail browse.sh-internal (skills is public), but without forking autobrowse. The inbox capability is now a generic, off-by-default provider hook; the AgentMail implementation + secrets live only in the internal browse.sh repo and are injected into the sandbox at runtime.

This PR (public) now:

- Deletes scripts/inbox.mjs (AgentMail-specific).
- evaluate.mjs gains --inbox-cmd <path> / AUTOBROWSE_INBOX_CMD, an optional, vendor-neutral inbox-provider command. Allowlist, exec-timeout, force-scope, and the (vendor-neutral) Agent Inbox prompt all key off it and are inert when unset. Documents the explicit provider contract.
- Scrubs AgentMail from .env.example, SKILL.md (silent on the feature), example-task.md. git grep -i agentmail → no matches.
- Keeps generic mechanics: .inbox.json SSOT, {{inbox_email}} substitution, --workspace/--task force-scope, wait-command timeout.

Verified: with a throwaway stub provider the hook injects the section + substitutes the address + forces scope; with no --inbox-cmd there's no inbox section and the allowlist is browse-only; the browse.sh provider (separate repo) drives create/release against real AgentMail through --inbox-cmd.

Pairs with the internal browse.sh PR (provider injection). Divergence stays minimal: one shared autobrowse core; browse.sh owns only a swappable provider script + a few prompt lines.

aq17 · 2026-06-02T01:28:20Z

✅ Re-validated end-to-end on the reworked architecture (full browse.sh sandbox pipeline, local): /api/skills/generate → sandbox cloned the public skill @ 0593523 (no AgentMail, --inbox-cmd hook only) → browse.sh injected /vercel/sandbox/inbox-provider.mjs and passed --inbox-cmd → the injected provider minted ab-…@agentmail.to via the edge-injected key → Substack delivered "Create your account on Substack" to it. Every new seam exercised with real external email; the dotenv-in-sandbox bug was caught and fixed. Inbox released after; no leaks.

shubh24 · 2026-06-02T02:14:15Z

Reviewed this alongside the AgentMail provider in browserbase/browse.sh#159. The vendor-neutral split is great — this PR ships zero email-vendor code, just a clean create / wait-otp / wait-link / latest / release contract and a swappable --inbox-cmd. A few small things, nothing blocking.

🟡 Medium

The allowlist lets the inner agent run more than it should. isAllowedCommand only checks that the command is node <the configured provider> — it never looks at the subcommand, so the agent can call create, release, and latest, not just the wait-otp / wait-link it's actually told about. Sibling-task isolation still holds (forceInboxScope), but the agent can shoot itself in the foot: a mid-run release kills its own live inbox, and a create overwrites .inbox.json so the address baked into the prompt no longer matches the one being polled. Consider restricting the allowlist to the read-only subcommands and keeping create / release orchestrator-only.
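The subcommand restriction suggested here might look like the sketch below. It assumes the command is already tokenized and that the provider subcommand is the third token; the real isAllowedCommand may parse a raw string instead, and treating latest as agent-visible is an assumption based on it being read-only.

```javascript
// Hypothetical sketch: the agent may run browse, or the configured provider
// with a read-only subcommand. create/release stay orchestrator-only.
const AGENT_INBOX_SUBCOMMANDS = new Set(["wait-otp", "wait-link", "latest"]);

function isAllowedCommand(argv, inboxCmd) {
  const [bin, script, subcommand] = argv;
  if (bin === "browse") return true;
  if (!inboxCmd) return false; // hook unset: allowlist is browse-only
  return (
    bin === "node" &&
    script === inboxCmd &&
    AGENT_INBOX_SUBCOMMANDS.has(subcommand)
  );
}
```

With this check a mid-run `release` or `create` from the agent is rejected before it can delete the live inbox or overwrite .inbox.json.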

⚪ Low

- The PR description doesn't match the diff. It describes a self-contained inbox.mjs that talks directly to AgentMail, plus SKILL.md changes; none of that is in this PR (that's the browse.sh side). Only the Cursor auto-summary below it is accurate. Worth rewriting so reviewers aren't chasing code that isn't here, and so the merge order (this lands first, then browse.sh#159) stays clear.
- The public --inbox-cmd hook isn't documented. The body mentions SKILL.md docs, but there's no SKILL.md change in the diff. Runtime usage works (the injected prompt section covers it), but the new flag is undocumented for anyone driving autobrowse directly.
- readInboxState returns inbox_id but it's used as the email address. The contract designates email for that. It only works because the browse.sh provider happens to set email === inbox_id; a provider that distinguishes them would break. Safer: return email || inbox_id || null.
- The OTP default text is vendor-specific. buildInboxSection tells the agent the default matches "a 4–8 digit code", but that default actually lives in the provider, not here. Since this file is meant to be vendor-neutral, better to say "the provider's default pattern; pass --regex to override."
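The readInboxState fix suggested in this list can be illustrated on the parsed state alone. The helper names below are hypothetical and file reading is left out on purpose; the `email || inbox_id || null` precedence and the {email, inbox_id} schema come from this thread.

```javascript
// Hypothetical helper: pick the address from a parsed .inbox.json object.
function emailFromInboxState(state) {
  if (!state || typeof state !== "object") return null;
  // Prefer the contract's email field; fall back to inbox_id for providers
  // that happen to use the address as the identifier.
  return state.email || state.inbox_id || null;
}

// Hypothetical wrapper: tolerate a missing or malformed state file body.
function parseInboxState(jsonText) {
  try {
    return emailFromInboxState(JSON.parse(jsonText));
  } catch {
    return null;
  }
}
```

A provider whose inbox_id is an opaque API identifier then still hands the agent a real address, while the current browse.sh provider keeps working unchanged.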

✅ What's done well

The contract design is clean, and the isolation instinct — forceInboxScope stripping any agent-supplied --workspace / --task and pinning the run's real ones — is exactly right for the shared-workspace, parallel-sub-agent setup.

(Review was AI-assisted.)

cursor Bot reviewed May 26, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

Comment thread skills/autobrowse/scripts/inbox.mjs Outdated

Comment thread skills/autobrowse/scripts/evaluate.mjs

aq17 requested review from shrey150 and shubh24 May 27, 2026 21:31

cursor Bot reviewed Jun 1, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

cursor Bot reviewed Jun 1, 2026

Comment thread skills/autobrowse/scripts/evaluate.mjs

aq17 mentioned this pull request Jun 1, 2026 in "autobrowse: tear down self-owned browser session on exit" #123 (Open)

cursor Bot reviewed Jun 2, 2026

aq17 changed the title ~~autobrowse: autonomous email inbox for signup/login/MFA tasks~~ autobrowse: optional vendor-neutral inbox-provider hook Jun 2, 2026

aq17 wants to merge 5 commits into main from autobrowse-agentmail-inbox
