autobrowse: optional vendor-neutral inbox-provider hook #119

Open
aq17 wants to merge 5 commits into main from autobrowse-agentmail-inbox

aq17 (Contributor) commented May 26, 2026

Summary

Lets an autobrowse loop provision a throwaway inbox so the inner agent can register accounts, log in, and complete email verification — without the user supplying their own email to the agent. Fully self-contained in the autobrowse skill (no browse.sh dependency).

  • New scripts/inbox.mjs CLI: create / wait-otp / wait-link / latest / release. It talks directly to AgentMail (api.agentmail.to) using AGENTMAIL_API_KEY from the env; the inner agent only ever sees the inbox address (the key is read by inbox.mjs and never printed, and the execute allowlist permits only browse + inbox.mjs).
  • create sweeps stale ab--prefixed inboxes (>1h) before minting — self-heals crashed loops without ever touching a non-ab- inbox.
  • evaluate.mjs gains --inbox-email, injects an "Agent Inbox" section into the system prompt, and allows the agent to shell out to inbox.mjs.
  • SKILL.md documents the opt-in provision step, mandatory release/cleanup, the graduation note (inbox is loop-only — graduated skills expect the end user's own credentials), and the 3-inbox free-tier concurrency cap.
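
The sweep rule in the second bullet (only `ab-`-prefixed inboxes older than one hour are eligible) can be sketched as a pure filter. This is a minimal illustration, not the code in this PR: the helper name `selectStaleInboxes` and the record shape (`email`, `created_at`) are assumptions, and an unparseable `created_at` is deliberately treated as "not stale" so the sweep never deletes on a guess.

```javascript
const STALE_AFTER_MS = 60 * 60 * 1000; // 1 hour, per the description above
const PREFIX = "ab-";

// Returns the inboxes a sweep may delete: ab- prefixed AND older than 1h.
// Record shape { email, created_at } is an assumption for this sketch.
function selectStaleInboxes(inboxes, now = Date.now()) {
  return inboxes.filter((inbox) => {
    if (typeof inbox.email !== "string" || !inbox.email.startsWith(PREFIX)) return false;
    const created = Date.parse(inbox.created_at);
    if (Number.isNaN(created)) return false; // unparseable timestamp: leave it alone
    return now - created > STALE_AFTER_MS;
  });
}
```

Keeping the prefix check first means a non-`ab-` inbox is excluded before its timestamp is even examined.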

Key source

  • Browserbase deployments inject a pooled AGENTMAIL_API_KEY (claimed org) into the skill-runner env — browse.sh uses the skill with zero browse.sh-specific code.
  • Regular users set their own free key from agentmail.to; a clear error fires if it's unset.

Verification coverage

| Flow | Support |
| --- | --- |
| Numeric OTP | wait-otp (default 4–8 digits) |
| Alphanumeric code | wait-otp --regex |
| Click / magic link | wait-link [--match], then browse open |
| Raw inspection | latest |

Test plan

  • CLI guards: missing AGENTMAIL_API_KEY → clear setup error; missing state / unknown command error cleanly
  • Live end-to-end against a real AgentMail org: create → wait-otp (extracted 771209) → wait-link (extracted href URL) → latest → release; org returned to baseline inbox count, no leaks
  • Sweep safety: a second create left a fresh ab- inbox and the non-ab- primary untouched
  • include_spam=true on polling — verification emails to a fresh inbox often get spam-flagged

Supersedes the earlier two-repo approach (browse.sh#151 closed).

🤖 Generated with Claude Code


Note

Medium Risk
Expands the agent execute allowlist and runs external inbox CLIs with altered timeouts; misconfiguration or weak provider isolation could affect parallel tasks, though workspace/task pinning mitigates cross-task access.

Overview
evaluate.mjs gains optional throwaway-inbox support for signup/login/MFA flows via a pluggable provider (--inbox-cmd / AUTOBROWSE_INBOX_CMD), documented inline with a create / wait-otp / wait-link / latest / release contract and tasks/<task>/.inbox.json.

When configured, the inner agent may run only browse plus node <resolved-inbox-cmd>; provider calls are scoped to the current run’s workspace/task (agent-supplied --workspace/--task stripped), and wait-otp / wait-link get exec timeouts extended past the default 30s cap. Runs resolve the inbox from .inbox.json (with --inbox-email fallback), warn on mismatch, substitute {{inbox_email}} in task.md, and inject an Agent Inbox system-prompt section with provider command examples.
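
The resolution order and placeholder substitution described in this paragraph could look roughly like the sketch below. The function names `resolveInboxEmail` and `substituteInboxEmail` are hypothetical and the warning text is illustrative; only the behaviour (the `.inbox.json` value wins, `--inbox-email` is a fallback, a mismatch warns, `{{inbox_email}}` is replaced) comes from the summary.

```javascript
// Hypothetical helper: stateEmail comes from .inbox.json, flagEmail from --inbox-email.
// The .inbox.json value wins; a mismatch only produces a warning.
function resolveInboxEmail(stateEmail, flagEmail, warn = console.warn) {
  if (stateEmail && flagEmail && stateEmail !== flagEmail) {
    warn(`WARNING: --inbox-email (${flagEmail}) differs from .inbox.json (${stateEmail}); using .inbox.json`);
  }
  return stateEmail || flagEmail || null;
}

// Replace every {{inbox_email}} placeholder (inner whitespace allowed) in task.md text.
// A replacer function is used so "$" sequences in the address are never interpreted.
function substituteInboxEmail(taskText, email) {
  if (!email) return taskText;
  return taskText.replace(/\{\{\s*inbox_email\s*\}\}/g, () => email);
}
```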

.gitignore now ignores .inbox.json so per-task inbox state is not committed.

Reviewed by Cursor Bugbot for commit 0593523. Bugbot is set up for automated code reviews on this repo.

Lets an autobrowse loop provision a throwaway inbox so the inner agent can
register accounts and complete email verification. A new scripts/inbox.mjs CLI
(create / wait-otp / wait-link / latest / release) talks to the browse.sh
inbox endpoint, which owns the AgentMail key — the agent only ever sees the
address. evaluate.mjs gains --inbox-email, injects the inbox into the system
prompt, and allows the agent to shell out to inbox.mjs. SKILL.md documents the
opt-in provision/release steps, graduation note (inbox is loop-only), and the
3-concurrent-loop free-tier cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
aq17 requested review from shrey150 and shubh24 on May 27, 2026 21:31
Consolidates all inbox-provisioning logic into the autobrowse skill so the
feature is self-contained with nothing browse.sh-specific. inbox.mjs now calls
api.agentmail.to directly using AGENTMAIL_API_KEY from the env (sweep-on-create
and the ab- prefix guard move into the CLI). Browserbase deployments inject a
pooled key; regular users provide their own (free at agentmail.to) and get a
clear setup error if it's unset. The inner agent still only ever sees the inbox
address — the key is read by inbox.mjs and never printed.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Hardening found by a live Substack magic-link signup run end-to-end:

- wait-link returned an open-tracking pixel (.gif) because it grabbed the first
  URL anywhere in the body. Now extract <a href> anchors with a reject-list
  (unsubscribe/mailto/tel/preferences/.gif), which skips img-src pixels; --match
  matches the href OR the visible link text so "confirm"/"sign in" finds the CTA
  even when the href is a tracking redirect (browse open follows it).
- latest only showed list-summary metadata (the list endpoint omits the body).
  It now fetches the full single message by id so text/html/links are visible.
- partsOf prefers AgentMail's cleaned extracted_text/extracted_html.
- evaluate.mjs killed wait-otp/wait-link at the fixed 30s exec cap (ETIMEDOUT
  on --within 60/90). exec timeout for inbox wait commands is now --within + 15s.

Verified end-to-end: signup → wait-link returns the real "Confirm your email"
CTA → browse open → signed-in Substack home. Sweep still proven to never touch
non-ab- inboxes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
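
The anchor-only link extraction described in this commit message can be illustrated with a small sketch. It is not the code from inbox.mjs: the regex-based HTML scan, the helper names `extractAnchors` and `pickLink`, and the exact contents of the reject pattern are assumptions built from the list in the message (unsubscribe, mailto, tel, preferences, .gif).

```javascript
// Assumed reject-list, modelled on the commit message: unsubscribe/mailto/tel/preferences/.gif
const REJECT_LINK_RE = /unsubscribe|^mailto:|^tel:|preferences|\.gif(\?|$)/i;

// Collect { href, text } from <a href="..."> anchors only, so <img src> pixels are never candidates.
function extractAnchors(html) {
  const anchors = [];
  const re = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    // Strip nested tags and collapse whitespace to get the visible link text.
    const text = m[2].replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
    anchors.push({ href: m[1], text });
  }
  return anchors;
}

// Return the first acceptable href; `match` is tested against the href OR the visible text.
function pickLink(html, match) {
  const needle = match ? match.toLowerCase() : null;
  for (const { href, text } of extractAnchors(html)) {
    if (REJECT_LINK_RE.test(href)) continue;
    if (needle && !href.toLowerCase().includes(needle) && !text.toLowerCase().includes(needle)) continue;
    return href;
  }
  return null;
}
```

Matching on the visible text is what lets `--match confirm` find a call-to-action whose href is an opaque tracking redirect. A real implementation would likely want a proper HTML parser rather than a regex.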
… truth)

- create now releases the inbox the task already tracks before minting a new
  one — a re-create within the 1h sweep window otherwise orphaned a live inbox
  (leaked AND unreachable by release). (#2)
- evaluate.mjs resolves the inbox address from .inbox.json (what wait-otp/
  wait-link actually poll); --inbox-email is a fallback and a mismatch now warns
  instead of silently polling a different inbox. (#4)
- {{inbox_email}} in task.md is now substituted with the resolved address. (#3)
- executeCommand pins inbox.mjs to the run's own --workspace/--task, so a
  sub-agent can't read or release a sibling task's inbox (parallel runs share a
  workspace, isolated only by --task). (#5)

The 30s exec-timeout issue (#1) was already fixed by execTimeoutFor in 2d091fc.

Verified: re-create deletes the prior inbox (no orphan); a divergent
--inbox-email warns and the resolved address wins; {{inbox_email}} is replaced;
an agent passing a foreign --task is overridden back to its own.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
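
The scope pinning in the last bullet of this commit (#5) amounts to dropping whatever `--workspace`/`--task` the agent passed and appending the run's own values. A minimal sketch follows, assuming an argv-style token array. `forceInboxScope` is the name a reviewer uses later in this thread, but the signature and the handling of the `--flag=value` form here are guesses.

```javascript
const SCOPE_FLAGS = new Set(["--workspace", "--task"]);

// Drop agent-supplied --workspace/--task (both "--task x" and "--task=x" forms),
// then append the run's own values so they always win.
function forceInboxScope(args, scope) {
  const kept = [];
  for (let i = 0; i < args.length; i++) {
    const flag = args[i].split("=")[0];
    if (SCOPE_FLAGS.has(flag)) {
      // Also skip the separate value token, unless the next token is another flag.
      const hasInlineValue = args[i].includes("=");
      if (!hasInlineValue && i + 1 < args.length && !args[i + 1].startsWith("--")) i++;
      continue;
    }
    kept.push(args[i]);
  }
  return [...kept, "--workspace", scope.workspace, "--task", scope.task];
}
```

For example, `forceInboxScope(["wait-otp", "--task", "sibling"], { workspace: "/ws", task: "mine" })` yields arguments that target `mine`, regardless of what the agent asked for.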
aq17 (Contributor, Author) commented Jun 1, 2026

Addressed the Bugbot findings in 47f740f (lifecycle + single-source-of-truth) — #1 was already handled in 2d091fc:

| Bugbot issue | Resolution |
| --- | --- |
| Inbox wait killed by timeout | Already fixed in 2d091fc: execTimeoutFor gives wait-otp/wait-link --within + 15s instead of the fixed 30s cap. |
| Recreate leaves orphan inboxes | cmdCreate now DELETEs the inbox the task already tracks before minting a new one (sweep can't catch a <1h-old inbox, and overwriting .inbox.json would orphan it). |
| Task placeholder never substituted | evaluate.mjs now substitutes {{inbox_email}} (optional inner whitespace) in task.md with the resolved address. |
| Prompt email ignores inbox state | .inbox.json is now the single source of truth (it's what wait-* poll). --inbox-email is a fallback; a mismatch logs a WARNING and the .inbox.json value wins. |
| Inbox CLI args unsandboxed | executeCommand pins inbox.mjs to the run's own --workspace/--task, stripping any agent-supplied values; a sub-agent can no longer read/release a sibling task's inbox. |

Verified each end-to-end against a live AgentMail org: re-create deletes the prior inbox (no orphan); a divergent --inbox-email warns and the resolved address wins; {{inbox_email}} is replaced (no literal/flag leak); an agent passing a foreign --task is overridden back to its own ([scope] ... overridden).
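
For readers who have not seen 2d091fc, the timeout rule in the first row ("--within + 15s instead of the fixed 30s cap") can be sketched as below. This is an illustration under assumptions: `--within` is taken to be in seconds, the argument array is assumed to start at the subcommand, and never returning less than the 30s default is this sketch's choice rather than something the thread states.

```javascript
const DEFAULT_EXEC_TIMEOUT_MS = 30_000; // the fixed cap mentioned above
const WAIT_GRACE_MS = 15_000;           // headroom added on top of --within
const WAIT_SUBCOMMANDS = new Set(["wait-otp", "wait-link"]);

// args: provider arguments after the script path, e.g. ["wait-otp", "--within", "90"].
function execTimeoutFor(args) {
  if (!WAIT_SUBCOMMANDS.has(args[0])) return DEFAULT_EXEC_TIMEOUT_MS;
  const i = args.indexOf("--within");
  const seconds = i >= 0 ? Number(args[i + 1]) : NaN;
  // Missing or invalid --within: fall back to the default cap.
  if (!Number.isFinite(seconds) || seconds <= 0) return DEFAULT_EXEC_TIMEOUT_MS;
  return Math.max(DEFAULT_EXEC_TIMEOUT_MS, seconds * 1000 + WAIT_GRACE_MS);
}
```

The grace period exists so the provider can report its own timeout cleanly before the outer exec kills the process.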

aq17 (Contributor, Author) commented Jun 1, 2026

Validation summary — ready for review

Tested at HEAD 47f740f:

  • Standalone loop (twice): full Substack magic-link signup → wait-link returned the real CTA → browse open → signed-in state confirmed ({"success": true, "signed_in": true} + screenshot). 12 turns / ~$0.63.
  • Bugbot fixes (each verified live against AgentMail):
    • re-create releases the prior inbox (no orphan)
    • {{inbox_email}} substituted in task.md (no literal/flag leak)
    • .inbox.json is the single source of truth; divergent --inbox-email warns and the resolved address wins
    • executeCommand pins inbox.mjs to the run's own --workspace/--task (a foreign --task is overridden)
    • the 30s exec-timeout issue was already handled by execTimeoutFor
  • Full browse.sh sandbox pipeline (with browserbase/browse.sh#159): a real Vercel Sandbox cloned bb-skills @ this commit, ran inbox.mjs create, and Substack delivered the verification email to the minted inbox — proving the feature works in the actual generation pipeline, not just the standalone harness.
  • Secret hygiene: the AgentMail key never appears in any trace artifact.

No leaked inboxes after any run. Ready to merge.

shubh24 (Contributor) commented Jun 2, 2026

Reviewed this — the core idea is solid and the security spine is genuinely well done: the AgentMail key never reaches the inner agent, only the throwaway address does. Nice.

Three things worth tightening before merge. Framing them simply:

1. Leftover inboxes pile up (the big one).
Releasing the inbox is currently a note in SKILL.md telling the orchestrator to run inbox.mjs release — not something the code guarantees. Robots (LLMs) routinely skip trailing cleanup steps. That's literally the lesson #123 just learned for browser sessions: it moved teardown into code for exactly this reason. On any non-happy exit (inner-loop error, max iterations, a crash, Ctrl-C) the inbox is never deleted. The free tier caps at 3 inboxes, so a few forgotten ones and the next create hard-fails. The 1h sweep only helps if created_at parses (see below), so it's not a reliable backstop.
→ Release the run's own inbox in code on both the success and error paths, right alongside the session teardown — don't rely on the agent remembering.

2. It can grab the wrong verification code.
DEFAULT_OTP_RE = \b\d{4,8}\b returns the first 4–8 digit run anywhere in the email, in document order. Verification emails are full of other numbers — a year (2026), a price, a zip — and any of those can come before the real code and get returned instead. The agent is told this "prints just the extracted code," so a wrong number flows straight into the form and the run fails confusingly.
→ Prefer a code that sits next to a keyword, e.g. /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i (capture group 1), falling back to the bare regex.

3. It'll open links sent by strangers.
A throwaway inbox can receive mail from anyone who learns the address, and the prompt steers the agent to browse open whatever wait-link returns. So an attacker-delivered link gets auto-opened — including internal hosts (http://169.254.169.254/, http://localhost), open-redirects, or phishing. REJECT_LINK_RE only filters unsubscribe/tracking/gif, and plain http:// to internal IPs passes. Related: --from is a substring match, so --from stripe.com also matches stripe.com.evil.com.
→ Restrict to https, reject RFC1918/loopback/link-local hosts, drop the bare-text-URL fallback, and make --from an exact domain-boundary match.
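
A rough sketch of the checks suggested in point 3, for illustration only. The helper names `isSafeLink` and `senderMatchesDomain` are hypothetical, IPv6 literals are rejected wholesale for simplicity, and nothing here defends against hostnames that resolve to private addresses or against open redirects on otherwise legitimate hosts.

```javascript
// True for loopback, RFC 1918, link-local, and 0.0.0.0/8 IPv4 literals.
function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254);
}

// Accept only https URLs whose host is not localhost or a private IP literal.
function isSafeLink(raw) {
  let url;
  try { url = new URL(raw); } catch { return false; }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: rejected outright in this sketch
  return !isPrivateIPv4(host);
}

// Domain-boundary match: "stripe.com" matches stripe.com and mail.stripe.com,
// but not stripe.com.evil.com.
function senderMatchesDomain(fromAddress, domain) {
  const at = fromAddress.lastIndexOf("@");
  if (at < 0) return false;
  const host = fromAddress.slice(at + 1).trim().toLowerCase().replace(/>$/, "");
  const want = domain.toLowerCase();
  return host === want || host.endsWith("." + want);
}
```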

Everything else I found is low/nit (a possibly-dead --inbox-email flag, stripHtml/stripTags duplication, a --within parsed in two places). Happy to expand on any of these.
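
To make point 2 concrete, here is a small sketch of the keyword-first extraction with a bare fallback. The two patterns are the ones quoted above. The function name `extractOtp` and the constant name `KEYWORD_OTP_RE` are made up for illustration.

```javascript
// Keyword-anchored pattern from point 2; capture group 1 is the code.
const KEYWORD_OTP_RE = /(?:code|otp|verification|passcode)\D{0,20}(\d{4,8})/i;
// Bare fallback: the first standalone 4-8 digit run in document order.
const DEFAULT_OTP_RE = /\b\d{4,8}\b/;

// Prefer a code that follows a keyword; fall back to the bare pattern.
function extractOtp(text) {
  const keyed = KEYWORD_OTP_RE.exec(text);
  if (keyed) return keyed[1];
  const bare = DEFAULT_OTP_RE.exec(text);
  return bare ? bare[0] : null;
}
```

On an email body such as "© 2026 Example Inc. Your verification code is 771209." the bare pattern alone picks up the year, while the keyword pattern reaches the actual code.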

Removes AgentMail from the public skill entirely and replaces the bundled
inbox.mjs with a generic, off-by-default provider contract. autobrowse no longer
ships an email provider or names any vendor; it only knows how to *call* one.

- evaluate.mjs: `--inbox-cmd <path>` / AUTOBROWSE_INBOX_CMD configures an optional
  inbox-provider command. Allowlist, exec-timeout, force-scope, and the (now
  vendor-neutral) Agent Inbox prompt key off it; all are inert when unset.
  Documents the provider contract (create/wait-otp/wait-link/latest/release +
  the .inbox.json {email,inbox_id} schema) as the explicit boundary.
- Deleted scripts/inbox.mjs (AgentMail-specific — moves to the internal caller).
- Scrubbed AGENTMAIL_API_KEY/agentmail.to from .env.example, SKILL.md (silent on
  the feature), and example-task.md.

Kept generic mechanics: .inbox.json single-source-of-truth, {{inbox_email}}
substitution, --workspace/--task force-scoping, wait-command exec timeout.

Verified: with a throwaway stub provider the hook injects the section,
substitutes the address, and forces scope; with no --inbox-cmd there is no inbox
section and the allowlist is browse-only. `git grep -i agentmail` → no matches.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


function readInboxState(taskDir) {
  try {
    const { inbox_id } = JSON.parse(fs.readFileSync(path.join(taskDir, ".inbox.json"), "utf-8"));
    return inbox_id || null;

readInboxState reads inbox_id instead of email field

High Severity

readInboxState destructures inbox_id from .inbox.json but the return value is used as an email address (inboxEmail). The provider contract on line 178-179 documents the schema as { "email": "...", "inbox_id": "..." } — two distinct fields. The email field contains the actual address (e.g. [email protected]) while inbox_id is the API identifier. The agent receives this ID instead of a valid email address, so it types the wrong value into signup/login forms, breaking the entire inbox feature.
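
One possible shape for the fix is sketched below as a pure function, so the field choice is easy to see and test. `emailFromInboxState` is a hypothetical variant that takes the raw `.inbox.json` text instead of reading from disk. Falling back to `inbox_id` when `email` is absent is an assumption made here to stay compatible with providers that only set one field.

```javascript
// Hypothetical pure variant of readInboxState: parses the raw .inbox.json text
// and prefers the contract's `email` field over `inbox_id`.
function emailFromInboxState(rawJson) {
  try {
    const { email, inbox_id } = JSON.parse(rawJson);
    return email || inbox_id || null;
  } catch {
    return null; // missing or malformed state: treat as "no inbox"
  }
}
```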


aq17 changed the title from "autobrowse: autonomous email inbox for signup/login/MFA tasks" to "autobrowse: optional vendor-neutral inbox-provider hook" on Jun 2, 2026
aq17 (Contributor, Author) commented Jun 2, 2026

Reworked per team feedback — AgentMail is now fully out of this public repo

Keeping AgentMail browse.sh-internal (skills is public), but without forking autobrowse. The inbox capability is now a generic, off-by-default provider hook; the AgentMail implementation + secrets live only in the internal browse.sh repo and are injected into the sandbox at runtime.

This PR (public) now:

  • Deletes scripts/inbox.mjs (AgentMail-specific).
  • evaluate.mjs gains --inbox-cmd <path> / AUTOBROWSE_INBOX_CMD — an optional, vendor-neutral inbox-provider command. Allowlist, exec-timeout, force-scope, and the (vendor-neutral) Agent Inbox prompt all key off it and are inert when unset. Documents the explicit provider contract.
  • Scrubs AgentMail from .env.example, SKILL.md (silent on the feature), example-task.md. git grep -i agentmail → no matches.
  • Keeps generic mechanics: .inbox.json SSOT, {{inbox_email}} substitution, --workspace/--task force-scope, wait-command timeout.

Verified: with a throwaway stub provider the hook injects the section + substitutes the address + forces scope; with no --inbox-cmd there's no inbox section and the allowlist is browse-only; the browse.sh provider (separate repo) drives create/release against real AgentMail through --inbox-cmd.

Pairs with the internal browse.sh PR (provider injection). Divergence stays minimal: one shared autobrowse core; browse.sh owns only a swappable provider script + a few prompt lines.

aq17 (Contributor, Author) commented Jun 2, 2026

Re-validated end-to-end on the reworked architecture (full browse.sh sandbox pipeline, local): /api/skills/generate → sandbox cloned the public skill @ 0593523 (no AgentMail, --inbox-cmd hook only) → browse.sh injected /vercel/sandbox/inbox-provider.mjs and passed --inbox-cmd → the injected provider minted ab-…@agentmail.to via the edge-injected key → Substack delivered "Create your account on Substack" to it. Every new seam exercised with real external email; the dotenv-in-sandbox bug was caught and fixed. Inbox released after; no leaks.

shubh24 (Contributor) commented Jun 2, 2026

Reviewed this alongside the AgentMail provider in browserbase/browse.sh#159. The vendor-neutral split is great — this PR ships zero email-vendor code, just a clean create / wait-otp / wait-link / latest / release contract and a swappable --inbox-cmd. A few small things, nothing blocking.

🟡 Medium

The allowlist lets the inner agent run more than it should. isAllowedCommand only checks that the command is node <the configured provider> — it never looks at the subcommand, so the agent can call create, release, and latest, not just the wait-otp / wait-link it's actually told about. Sibling-task isolation still holds (forceInboxScope), but the agent can shoot itself in the foot: a mid-run release kills its own live inbox, and a create overwrites .inbox.json so the address baked into the prompt no longer matches the one being polled. Consider restricting the allowlist to the read-only subcommands and keeping create / release orchestrator-only.
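
A sketch of what the narrower allowlist might look like, assuming the command has already been split into argv-style tokens. The real `isAllowedCommand` signature is not shown in this thread, so the parameters here are guesses, and treating `latest` as read-only follows the "read-only subcommands" wording above.

```javascript
// Subcommands the inner agent may call; create/release stay orchestrator-only.
const AGENT_SUBCOMMANDS = new Set(["wait-otp", "wait-link", "latest"]);

// tokens: argv-style tokens of the requested command.
// inboxCmd: resolved provider path from --inbox-cmd, or null when the hook is unset.
function isAllowedCommand(tokens, inboxCmd) {
  if (tokens[0] === "browse") return true;
  if (!inboxCmd) return false; // hook unset: browse-only
  return tokens[0] === "node" &&
    tokens[1] === inboxCmd &&
    AGENT_SUBCOMMANDS.has(tokens[2]);
}
```

With this shape, a mid-run `release` or `create` from the agent is rejected before it reaches the provider, so `.inbox.json` can only change from the orchestrator side.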

⚪ Low

  • The PR description doesn't match the diff. It describes a self-contained inbox.mjs that talks directly to AgentMail, plus SKILL.md changes — none of that is in this PR (that's the browse.sh side). Only the Cursor auto-summary below it is accurate. Worth rewriting so reviewers aren't chasing code that isn't here, and so the merge order (this lands first, then browse.sh#159) stays clear.
  • The public --inbox-cmd hook isn't documented. The body mentions SKILL.md docs, but there's no SKILL.md change in the diff. Runtime usage works (the injected prompt section covers it), but the new flag is undocumented for anyone driving autobrowse directly.
  • readInboxState returns inbox_id but it's used as the email address. The contract designates email for that. It only works because the browse.sh provider happens to set email === inbox_id; a provider that distinguishes them would break. Safer: return email || inbox_id || null.
  • The OTP default text is vendor-specific. buildInboxSection tells the agent the default matches "a 4–8 digit code" — but that default actually lives in the provider, not here. Since this file is meant to be vendor-neutral, better to say "the provider's default pattern; pass --regex to override."

✅ What's done well

The contract design is clean, and the isolation instinct — forceInboxScope stripping any agent-supplied --workspace / --task and pinning the run's real ones — is exactly right for the shared-workspace, parallel-sub-agent setup.

(Review was AI-assisted.)
