Background
PR #75 ships v0.1 of email-link auth: stateful CSPRNG magic-link tokens, single-broker deployment, no HMAC (per architecture.md §3 K-table + §5a.1.M Stage 1). Issue #80 closed the original "broker can't be initialized via CLI" gap.
This issue tracks the two known v0.2+ enhancements that v0.1 deliberately deferred, surfaced during PR #75 design discussion. Both are documented as "Open trade-offs" in hardcoded.md but need a tracked issue so they don't silently regress.
Enhancement A — Integrate WebAuthn binding into magic-link Stage 2
Current state (v1c-interim, per architecture.md §5a.1.M + step-1c plan)
The architecture's target for master init is the two-stage ceremony:
- Stage 1 (identity ceremony): operator clicks magic-link → broker confirms (email, binding_nonce).
- Stage 2 (binding ceremony): WebAuthn enrollment binds D_pub (K10 device key) atomically inside the WebAuthn challenge → broker mints J0 with claims agentkeys_device_pubkey=D_pub + agentkeys_webauthn_cred=K11_id.
Today, v1c-interim ships bespoke per-identity PoP shapes (pop_sig field for email/oauth2; SIWE-payload Device Pubkey commit for evm) instead of WebAuthn at Stage 2; see the step-1c plan. The wire shapes work but aren't uniform across identity types.
v0.2 target
Collapse all three identity-type binding flows into one WebAuthn ceremony:
1. CLI: agentkeys init --email <alice-email> --broker-url B
2. Broker: POST /v1/auth/email/request → mints magic-link token (TTL 10 min)
3. Broker: SES send → alice's inbox
4. Operator: click → browser opens https://broker/auth/email/landing#t=<token>
5. Browser-side landing page:
   a. Read token from URL fragment (never sent in path/query)
   b. Call navigator.credentials.create(challenge=token), which triggers Touch ID / Windows Hello / Android StrongBox
   c. POST /v1/auth/email/verify {webauthn_attestation, D_pub, token}
6. Broker:
   a. Verify WebAuthn attestation against the challenge (binds D_pub atomically: an attacker can't substitute D_pub without breaking the WebAuthn signature)
   b. Verify token (consume-once, TTL check)
   c. Mint J0: claims include agentkeys_device_pubkey=D_pub, agentkeys_webauthn_cred=K11_id
7. CLI polls /v1/auth/email/status/{request_id} → gets J0
8. CLI proceeds to J0 → J1 bridge per §5a.1.M
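One part of step 6a can be sketched without any WebAuthn library: checking that the challenge the authenticator signed over is the magic-link token. The sketch below is in Python for brevity (the broker itself is a Rust crate) and assumes the landing page passes the token's ASCII bytes as the challenge in step 5b. The function names are hypothetical, and verifying the attestation signature itself is out of scope here. It relies only on the WebAuthn clientDataJSON layout, where `type` is `webauthn.create` for enrollment and `challenge` is the unpadded base64url encoding of the challenge bytes.

```python
import base64
import hmac
import json


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url, the encoding used inside clientDataJSON."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def challenge_matches_token(client_data_json: bytes, token: str) -> bool:
    """Return True only for a create() ceremony whose challenge equals the token.

    Assumption: the landing page used the token's ASCII bytes as the challenge.
    This does NOT verify the attestation signature; it only checks which
    challenge the signed client data commits to.
    """
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.create":
        return False
    challenge = b64url_decode(client_data.get("challenge", ""))
    # Constant-time comparison, so timing does not leak how many bytes matched.
    return hmac.compare_digest(challenge, token.encode("ascii"))
```

A broker would run this check alongside full attestation verification and before the consume-once step in 6b, so that a mismatched challenge never burns a valid token.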
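What needs to land
- Landing page calls navigator.credentials.create (today: shows "Verified — return to your terminal")
- /v1/auth/email/verify accepts {webauthn_attestation, D_pub, token} instead of just {token}
- Broker verifies the WebAuthn attestation (webauthn-rs crate) AND that the challenge byte-equals the token, atomically binding D_pub to this magic-link round-trip
- J0 carries the agentkeys_webauthn_cred=K11_id claim
- […] agentkeys_device_pubkey claim path; just needs the parallel WebAuthn claim)
- […] --oauth2-google (single uniform Stage 2 across identity types)
- docs/spec/plans/issue-74-step-1c-device-key-auth.md status: v1c-interim → v0.2-shipped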
Architectural impact
- K-table §3: K11 description already covers WebAuthn credential — no schema change.
- §5a.1.M sequenceDiagram already shows the WebAuthn ceremony as the target — implementation just catches up.
- Mitigates email-bypass attack: in v1c, an attacker with email access (e.g. shared mailbox) can complete the ceremony without hardware presence. v0.2 requires WebAuthn → biometric/PIN unlock at the bound device.
Enhancement B — Stateless HMAC tokens for multi-broker-replica scale
Current state (v0.1, per PR #75)
Single-broker deployment. Magic-link tokens are stateful: broker stores SHA256(token) in EmailTokenStore SQLite, looks up on click, marks consumed. No HMAC.
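The stateful scheme described above can be sketched roughly as follows. This is an illustration in Python with the standard-library sqlite3 module, not the actual EmailTokenStore code: the table, column, and function names are hypothetical, and the 10-minute TTL is an assumption. Only SHA256(token) is stored, and consumption is a single conditional UPDATE, so a second click on the same link fails.

```python
import hashlib
import secrets
import sqlite3
import time

TOKEN_TTL_SECONDS = 600  # assumed 10-minute TTL


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the token table. The schema is illustrative only."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS email_tokens ("
        " token_hash TEXT PRIMARY KEY,"
        " expires_at INTEGER NOT NULL,"
        " consumed INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def issue_token(db: sqlite3.Connection, now=None) -> str:
    """Mint a CSPRNG token and persist only its SHA-256 hash."""
    now = int(time.time()) if now is None else now
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    with db:  # commits on success
        db.execute(
            "INSERT INTO email_tokens (token_hash, expires_at) VALUES (?, ?)",
            (token_hash, now + TOKEN_TTL_SECONDS),
        )
    return token


def consume_token(db: sqlite3.Connection, token: str, now=None) -> bool:
    """Mark the token consumed; True only on the first click inside the TTL."""
    now = int(time.time()) if now is None else now
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    with db:
        cur = db.execute(
            "UPDATE email_tokens SET consumed = 1"
            " WHERE token_hash = ? AND consumed = 0 AND expires_at > ?",
            (token_hash, now),
        )
    return cur.rowcount == 1
```

Because the lookup and the consume step are one statement, two concurrent clicks cannot both succeed against the same SQLite file.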
Why v0.1 didn't need HMAC
- One broker process owns the SQLite — no cross-replica coordination needed.
- Threat model: SQLite is local file under same UID as broker → attacker compromising one likely has the other → HMAC defense-in-depth is theoretical for this deployment.
- HMAC was previously implemented as a vestigial dead field (loaded + length-validated but never used cryptographically); removed in b8481fe to align with architecture.md §3 K-table.
v0.2+ multi-broker scenarios
When the broker scales horizontally (HA, multi-region, blue-green deploys), v0.1's stateful-only design breaks in three ways:
- Replica routing: token issued by broker-A, click hits broker-B → broker-B has no row → 404. Mitigated only by sticky sessions (ALB-level), which don't survive failover.
- Failover: broker-A dies between issuance and click → in-flight tokens lost.
- Cross-region read latency: if SQLite is replaced with a shared DB (RDS/DynamoDB), every magic-link click costs a cross-region round-trip.
Recommended v0.2+ design: hybrid HMAC + consume-once
Stateless integrity + minimal shared state:
Token = base64url( {request_id, email, expires_at, nonce} ) || "." || base64url( HMAC-SHA256(K12, payload) )
Here payload is the serialized (request_id, email, expires_at, nonce) tuple.
- Issuance: broker generates random nonce, signs (request_id, email, expires_at, nonce) with K12 (shared HMAC key, replicated to all broker replicas).
- Click: any broker validates HMAC + expires_at locally (no DB lookup). Then a single small write to a shared consume-once store (Redis SETNX, DynamoDB conditional put, or Postgres unique constraint) marks the nonce consumed.
- Cross-region: HMAC verify is local; consume-once is the only shared-state op (and it's small + can be eventually-consistent within a region).
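A minimal sketch of the issuance and local-verification halves of this design, in Python with only the standard library. The JSON serialization, the choice to MAC the raw payload bytes rather than their base64url form, and the function names are all assumptions made for illustration; the issue does not fix any of them. The consume-once write that follows a successful verification is deliberately left out.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_token(k12: bytes, request_id: str, email: str, ttl: int = 600, now=None) -> str:
    """Issuance: serialize the claims, MAC them with K12, join with a dot."""
    now = int(time.time()) if now is None else now
    payload = json.dumps(
        {
            "request_id": request_id,
            "email": email,
            "expires_at": now + ttl,
            "nonce": secrets.token_hex(16),
        },
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    tag = hmac.new(k12, payload, hashlib.sha256).digest()
    return b64url(payload) + "." + b64url(tag)


def verify_token(k12: bytes, token: str, now=None):
    """Click: return the claims if MAC and expiry check out, else None.

    Purely local, with no DB lookup. The caller must still mark
    claims["nonce"] consumed in the shared store.
    """
    now = int(time.time()) if now is None else now
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = b64url_decode(payload_b64)
        tag = b64url_decode(tag_b64)
    except ValueError:  # wrong part count or invalid base64
        return None
    expected = hmac.new(k12, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    claims = json.loads(payload)  # parsed only after the MAC verified
    if claims["expires_at"] <= now:
        return None
    return claims
```

Any replica holding K12 can run verify_token on a token minted by another replica, which is what removes the replica-routing and failover problems listed earlier.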
Architectural impact
- K-table §3: add K12 — Email-token HMAC key (32 bytes, shared across broker replicas, mounted from secrets manager). Sibling to K8 (broker session keypair).
- §5a.1.M Stage 1: amend "Broker emails magic link; operator clicks; broker confirms single-use within TTL" → "Broker emails HMAC-signed magic link; operator clicks; ANY broker replica verifies HMAC locally, then consume-once write to shared store within TTL."
- New env var: BROKER_EMAIL_HMAC_KEY_PATH (re-introduced, but this time documented in K-table, not vestigial).
- New deployment requirement: shared K12 (e.g. AWS Secrets Manager, mounted via instance role at all broker hosts).
- New deployment requirement: shared consume-once store (Redis / DynamoDB / Postgres — operator choice).
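The shared consume-once store only needs one atomic operation: record this nonce, and report whether it was already there. The sketch below illustrates the unique-constraint variant, using Python's standard-library sqlite3 as a stand-in for Postgres. The class, table, and method names are hypothetical, and a Redis or DynamoDB backend would expose the same consume call through SETNX or a conditional put instead.

```python
import sqlite3


class SqliteConsumeOnce:
    """Consume-once store backed by a PRIMARY KEY constraint (illustrative)."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS consumed_nonces ("
            " nonce TEXT PRIMARY KEY,"
            " expires_at INTEGER NOT NULL)"
        )

    def consume(self, nonce: str, expires_at: int) -> bool:
        """Return True the first time a nonce is seen, False on any replay."""
        try:
            with self.db:  # one transaction: commit on success, roll back on error
                self.db.execute(
                    "INSERT INTO consumed_nonces (nonce, expires_at) VALUES (?, ?)",
                    (nonce, expires_at),
                )
        except sqlite3.IntegrityError:
            return False  # the uniqueness constraint rejected a replay
        return True

    def purge_expired(self, now: int) -> int:
        """Drop nonces whose tokens can no longer pass the expires_at check."""
        with self.db:
            cur = self.db.execute(
                "DELETE FROM consumed_nonces WHERE expires_at <= ?", (now,)
            )
        return cur.rowcount
```

Storing expires_at next to the nonce keeps the table small: once a token is past its expiry, the HMAC-side check rejects it anyway, so its nonce row can be purged safely.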
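What needs to land
- Add K12 to the architecture.md §3 K-table.
- Re-introduce the BROKER_EMAIL_HMAC_KEY_PATH env var (with proper architectural documentation this time).
- Restore the HMAC field in EmailLinkAuth (commit b8481fe removed it; revert is straightforward).
- ConsumeOnceStore trait with implementations for SQLite (single-broker, today) + Redis + DynamoDB.
- setup-broker-host.sh: re-add the email-hmac.key mint step (only when the --multi-broker flag is set, otherwise stays stateful-SQLite).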
Why one issue covers both
The two enhancements are independent in design but coupled in code:
- WebAuthn binding (Enhancement A) is a Stage 2 change that's orthogonal to the token transport mechanism.
- HMAC stateless tokens (Enhancement B) is a Stage 1 change that doesn't affect Stage 2.
But both touch the same files (crates/agentkeys-broker-server/src/plugins/auth/email_link.rs, boot.rs, setup-broker-host.sh, architecture.md §3 + §5a.1.M), so landing them together avoids two rounds of churn through the same code.
Acceptance criteria for closing this issue
- […] EmailLinkAuth; consume-once store abstraction; setup-broker-host.sh --multi-broker flag wires it; all single-broker behavior preserved.
- hardcoded.md "Open trade-offs" section updated: HMAC re-introduction landed, link to this closed issue.
References
- docs/spec/architecture.md §3 (K-table), §5a.1.M (master binding ceremony)
- docs/spec/plans/issue-74-step-1c-device-key-auth.md (current v1c-interim design)
- hardcoded.md "Open trade-offs": HMAC removal trade-off
- b8481fe: removed vestigial HMAC field (would be re-introduced by Enhancement B)