v0.2+ email-auth enhancement: WebAuthn binding integration + stateless HMAC tokens for multi-broker scale #81

@hanwencheng

Background

PR #75 ships v0.1 of email-link auth: stateful CSPRNG magic-link tokens, single-broker deployment, no HMAC (per architecture.md §3 K-table + §5a.1.M Stage 1). Issue #80 closed the original "broker can't be initialized via CLI" gap.

This issue tracks the two known v0.2+ enhancements that v0.1 deliberately deferred, surfaced during PR #75 design discussion. Both are documented as "Open trade-offs" in hardcoded.md but need a tracked issue so they aren't silently dropped.


Enhancement A — Integrate WebAuthn binding into magic-link Stage 2

Current state (v1c-interim, per architecture.md §5a.1.M + step-1c plan)

The architecture's target for master init is the two-stage ceremony:

  • Stage 1 — Identity ceremony: operator clicks magic-link → broker confirms (email, binding_nonce).
  • Stage 2 — Binding ceremony: WebAuthn enrollment binds D_pub (K10 device key) atomically inside the WebAuthn challenge → broker mints J0 with claims agentkeys_device_pubkey=D_pub + agentkeys_webauthn_cred=K11_id.

Today, v1c-interim ships bespoke per-identity PoP shapes (pop_sig field for email/oauth2; SIWE-payload Device Pubkey commit for evm) instead of WebAuthn at Stage 2 (see the step-1c plan). The wire shapes work but aren't uniform across identity types.

v0.2 target

Collapse all three identity-type binding flows into one WebAuthn ceremony:

1. CLI: agentkeys init --email <email> --broker-url B
2. Broker: POST /v1/auth/email/request → mints magic-link token (TTL 10 min)
3. Broker: SES send → alice's inbox
4. Operator: click → browser opens https://broker/auth/email/landing#t=<token>
5. Browser-side landing page:
   a. Read token from URL fragment (never sent in path/query)
   b. Call navigator.credentials.create() with the token as the publicKey challenge; this triggers Touch ID / Windows Hello / Android StrongBox
   c. POST /v1/auth/email/verify {webauthn_attestation, D_pub, token}
6. Broker:
   a. Verify WebAuthn attestation against the challenge (binds D_pub atomically — attacker can't substitute D_pub without breaking the WebAuthn signature)
   b. Verify token (consume-once, TTL check)
   c. Mint J0: claims include agentkeys_device_pubkey=D_pub, agentkeys_webauthn_cred=K11_id
7. CLI polls /v1/auth/email/status/{request_id} → gets J0
8. CLI proceeds to J0 → J1 bridge per §5a.1.M
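
Step 7 (the CLI polling for J0) can be sketched as a bounded loop. This is a minimal illustration in Python, not the CLI's actual code: the response shape (`state`, `j0`), the state names, and the `fetch_status` callable are all hypothetical, and the 600-second default simply mirrors the 10-minute token TTL from step 2.

```python
import time


def poll_for_j0(fetch_status, request_id, timeout_s=600.0, interval_s=2.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll until the broker reports a minted J0, a terminal failure, or timeout.

    fetch_status(request_id) stands in for GET /v1/auth/email/status/{request_id};
    the dict it returns here ({"state": ..., "j0": ...}) is an assumed shape.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(request_id)
        if status.get("state") == "verified":
            return status["j0"]
        if status.get("state") in ("expired", "denied"):
            return None  # terminal: the operator must request a new link
        if clock() + interval_s > deadline:
            return None  # the next poll would land past the deadline
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.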

What needs to land

  • Browser-side landing page calls navigator.credentials.create (today: shows "Verified — return to your terminal")
  • /v1/auth/email/verify accepts {webauthn_attestation, D_pub, token} instead of just {token}
  • Broker validates the attestation (e.g. via webauthn-rs crate) AND that the challenge byte-equals the token — atomically binds D_pub to this magic-link round-trip
  • J0 minting writes agentkeys_webauthn_cred=K11_id claim
  • CLI polls + receives J0 with both K10 + K11 claims (already supported by agentkeys_device_pubkey claim path; just needs the parallel WebAuthn claim)
  • Same flow for --oauth2-google (single uniform Stage 2 across identity types)
  • Update docs/spec/plans/issue-74-step-1c-device-key-auth.md status: v1c-interim → v0.2-shipped
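
The "challenge byte-equals the token" check from the list above can be sketched as follows. WebAuthn's clientDataJSON carries the challenge base64url-encoded, so the broker decodes it and compares it to the token in constant time. This is only one piece of attestation validation (signature and RP ID checks would remain the job of a library such as webauthn-rs). The function name, the origin parameter, and the assumption that the token's UTF-8 bytes were used as the challenge are all hypothetical.

```python
import base64
import hmac
import json


def challenge_matches_token(client_data_json: bytes, token: str,
                            expected_origin: str) -> bool:
    """Return True only if clientDataJSON echoes this magic-link token as its challenge."""
    try:
        data = json.loads(client_data_json)
        challenge_b64 = data["challenge"]
        # clientDataJSON uses unpadded base64url; restore padding before decoding.
        challenge = base64.urlsafe_b64decode(
            challenge_b64 + "=" * (-len(challenge_b64) % 4)
        )
    except (ValueError, KeyError, TypeError):
        return False
    if data.get("type") != "webauthn.create":
        return False
    if data.get("origin") != expected_origin:
        return False
    # Assumption: the landing page used the token's UTF-8 bytes as the challenge.
    return hmac.compare_digest(challenge, token.encode())
```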

Architectural impact

  • K-table §3: K11 description already covers WebAuthn credential — no schema change.
  • §5a.1.M sequenceDiagram already shows the WebAuthn ceremony as the target — implementation just catches up.
  • Mitigates email-bypass attack: in v1c, an attacker with email access (e.g. shared mailbox) can complete the ceremony without hardware presence. v0.2 requires WebAuthn → biometric/PIN unlock at the bound device.

Enhancement B — Stateless HMAC tokens for multi-broker-replica scale

Current state (v0.1, per PR #75)

Single-broker deployment. Magic-link tokens are stateful: broker stores SHA256(token) in EmailTokenStore SQLite, looks up on click, marks consumed. No HMAC.
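
For reference, the stateful flow described above can be sketched in a few lines of Python with SQLite. The table name, column names, and function names are hypothetical (the real EmailTokenStore schema is not shown in this issue), and the 600-second TTL is an assumption matching the 10-minute magic-link TTL.

```python
import hashlib
import secrets
import sqlite3
import time

TTL_SECONDS = 600  # assumption: the 10-minute magic-link TTL


def open_store(path=":memory:"):
    """Create a hypothetical token table; only the token hash is persisted."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS email_tokens ("
        " token_hash TEXT PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " expires_at REAL NOT NULL,"
        " consumed INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def issue(db, email, now=None):
    """Mint a CSPRNG token and store SHA256(token); the raw token goes in the email."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    db.execute(
        "INSERT INTO email_tokens (token_hash, email, expires_at) VALUES (?, ?, ?)",
        (token_hash, email, now + TTL_SECONDS),
    )
    db.commit()
    return token


def consume(db, token, now=None):
    """Return the email if the token is known, unexpired, and unused; else None."""
    now = time.time() if now is None else now
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    # One UPDATE performs the lookup, TTL check, and mark-consumed step together.
    cur = db.execute(
        "UPDATE email_tokens SET consumed = 1"
        " WHERE token_hash = ? AND consumed = 0 AND expires_at > ?",
        (token_hash, now),
    )
    db.commit()
    if cur.rowcount != 1:
        return None
    row = db.execute(
        "SELECT email FROM email_tokens WHERE token_hash = ?", (token_hash,)
    ).fetchone()
    return row[0]
```

Because every step depends on the row existing in this one database, a click that reaches a different process with its own SQLite file finds nothing.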

Why v0.1 didn't need HMAC

  • One broker process owns the SQLite — no cross-replica coordination needed.
  • Threat model: SQLite is local file under same UID as broker → attacker compromising one likely has the other → HMAC defense-in-depth is theoretical for this deployment.
  • HMAC was previously implemented as a vestigial dead field (loaded + length-validated but never used cryptographically); removed in b8481fe to align with architecture.md §3 K-table.

v0.2+ multi-broker scenarios

When the broker scales horizontally (HA, multi-region, blue-green deploys), v0.1's stateful-only design breaks in three ways:

  1. Replica routing: token issued by broker-A, click hits broker-B → broker-B has no row → 404. Mitigated only by sticky sessions (ALB-level), which don't survive failover.
  2. Failover: broker-A dies between issuance and click → in-flight tokens lost.
  3. Cross-region read latency: if SQLite is replaced with a shared DB (RDS/DynamoDB), every magic-link click costs a cross-region round-trip.

Recommended v0.2+ design: hybrid HMAC + consume-once

Stateless integrity + minimal shared state:

payload = serialize( {request_id, email, expires_at, nonce} )
Token = base64url( payload ) || "." || base64url( HMAC-SHA256(K12, payload) )
  • Issuance: broker generates random nonce, signs (request_id, email, expires_at, nonce) with K12 (shared HMAC key, replicated to all broker replicas).
  • Click: any broker validates HMAC + expires_at locally (no DB lookup). Then a single small write to a shared consume-once store (Redis SETNX, DynamoDB conditional put, or Postgres unique constraint) marks the nonce consumed.
  • Cross-region: HMAC verify is local; consume-once is the only shared-state op (and it's small + can be eventually-consistent within a region).
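
A minimal sketch of the issuance and click paths, using only the Python standard library. Several details are assumptions not fixed by this issue: JSON with sorted keys as the payload serialization, the MAC computed over the serialized bytes before base64url encoding, and a plain in-process set standing in for the shared consume-once store. Function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def b64u(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64u_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(k12: bytes, request_id: str, email: str, ttl=600, now=None) -> str:
    """Build base64url(payload) || "." || base64url(HMAC-SHA256(K12, payload))."""
    now = int(time.time()) if now is None else now
    fields = {"request_id": request_id, "email": email,
              "expires_at": now + ttl, "nonce": secrets.token_hex(16)}
    # Assumption: canonical JSON is the payload serialization.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(k12, payload, hashlib.sha256).digest()
    return b64u(payload) + "." + b64u(tag)


def verify_token(k12: bytes, token: str, consumed_nonces: set, now=None):
    """Return the payload fields if the token is authentic, unexpired, and unused."""
    now = int(time.time()) if now is None else now
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = b64u_decode(payload_b64)
        tag = b64u_decode(tag_b64)
    except ValueError:
        return None
    expected = hmac.new(k12, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted; rejected without any store access
    fields = json.loads(payload)  # parsed only after the MAC checks out
    if fields["expires_at"] <= now:
        return None
    # The only shared-state step; a real deployment would use an atomic
    # insert-if-absent in the shared store instead of this local set.
    if fields["nonce"] in consumed_nonces:
        return None
    consumed_nonces.add(fields["nonce"])
    return fields
```

Checking the MAC and expiry before touching the store means forged or stale links never generate shared-state traffic.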

Architectural impact

  • K-table §3: add K12 — Email-token HMAC key (32 bytes, shared across broker replicas, mounted from secrets manager). Sibling to K8 (broker session keypair).
  • §5a.1.M Stage 1: amend "Broker emails magic link; operator clicks; broker confirms single-use within TTL" → "Broker emails HMAC-signed magic link; operator clicks; ANY broker replica verifies HMAC locally, then consume-once write to shared store within TTL."
  • New env var: BROKER_EMAIL_HMAC_KEY_PATH (re-introduced — but this time documented in K-table, not vestigial).
  • New deployment requirement: shared K12 (e.g. AWS Secrets Manager, mounted via instance role at all broker hosts).
  • New deployment requirement: shared consume-once store (Redis / DynamoDB / Postgres — operator choice).

What needs to land

  • Add K12 to architecture.md §3 K-table.
  • Update §5a.1.M Stage 1 to describe the HMAC + consume-once flow.
  • Re-introduce BROKER_EMAIL_HMAC_KEY_PATH env var (with proper architectural documentation this time).
  • Re-introduce HMAC sign + verify in EmailLinkAuth (commit b8481fe removed it; revert is straightforward).
  • Add a ConsumeOnceStore trait with implementations for SQLite (single-broker, today) + Redis + DynamoDB.
  • setup-broker-host.sh: re-add the email-hmac.key mint step (only when --multi-broker flag is set, otherwise stays stateful-SQLite).
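
The ConsumeOnceStore abstraction could look roughly like this. The sketch is in Python for brevity (the broker would express it as a Rust trait); the method name and signature are hypothetical. It shows an in-memory store and a SQLite store; a Redis or DynamoDB backend would need the same first-writer-wins semantics, for example via Redis SET with NX or a DynamoDB conditional put.

```python
import sqlite3
import threading
from typing import Protocol


class ConsumeOnceStore(Protocol):
    def consume(self, nonce: str, expires_at: int) -> bool:
        """Return True only for the first caller presenting this nonce."""
        ...


class InMemoryStore:
    """Process-local store; suitable only for tests or a single broker."""

    def __init__(self):
        self._seen = {}
        self._lock = threading.Lock()

    def consume(self, nonce, expires_at):
        with self._lock:
            if nonce in self._seen:
                return False
            self._seen[nonce] = expires_at
            return True


class SqliteStore:
    """Single-broker store; the PRIMARY KEY constraint enforces consume-once."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS consumed_nonces ("
            " nonce TEXT PRIMARY KEY, expires_at INTEGER NOT NULL)"
        )

    def consume(self, nonce, expires_at):
        try:
            with self._db:  # commits on success, rolls back on error
                self._db.execute(
                    "INSERT INTO consumed_nonces VALUES (?, ?)", (nonce, expires_at)
                )
            return True
        except sqlite3.IntegrityError:
            return False  # nonce already present: replay
```

Storing expires_at alongside the nonce lets a backend purge entries once the token could no longer verify anyway.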

Why one issue covers both

The two enhancements are technically independent:

  • WebAuthn binding (Enhancement A) is a Stage 2 change that's orthogonal to the token transport mechanism.
  • HMAC stateless tokens (Enhancement B) is a Stage 1 change that doesn't affect Stage 2.

But both touch the same files (crates/agentkeys-broker-server/src/plugins/auth/email_link.rs, boot.rs, setup-broker-host.sh, architecture.md §3 + §5a.1.M), so landing them together avoids two rounds of churn through the same code.

Acceptance criteria for closing this issue

  • Enhancement A: WebAuthn ceremony lands at email_link Stage 2; v1c-interim PoP shapes deprecated; demo doc shows the unified Stage-1 + Stage-2 flow.
  • Enhancement B: K12 lands in architecture K-table; HMAC sign+verify in EmailLinkAuth; consume-once store abstraction; setup-broker-host.sh --multi-broker flag wires it; all single-broker behavior preserved.
  • hardcoded.md "Open trade-offs" section updated: HMAC re-introduction landed, link to this closed issue.
  • PR #75's K10 + K11 claims path (already shipped) is used unchanged by Enhancement A.
