What an operator running pk-auth in production needs to know: secrets, persistence, observability, rotation, and the most common ways a deployment goes wrong.
pk-auth ships as a JVM library — Spring Boot 4, Dropwizard 5, or Micronaut 4 adapters all consume the same core. A typical production deployment needs:
- JDK 21 (records, sealed types, virtual threads). Earlier JDKs will not compile.
- Postgres 16+ (when using `pk-auth-persistence-jdbi`): Flyway migrations run at startup, no manual schema work.
- DynamoDB (when using `pk-auth-persistence-dynamodb`): single table, schema per item type. See ADR 0008 for the table layout.
- At least one trusted dispatcher for magic links + OTP if you enable those flows. The testkit's `LoggingEmailSender`/`LoggingSmsSender` log secrets to stdout; never use them in production.
| Setting | Min length | Notes |
|---|---|---|
| `pkauth.jwt.secret` (HS256) | 32 bytes | Hard fail at boot if shorter. Rotate by issuing a fresh secret and tolerating a grace window. pk-auth itself does not rotate; the host should run two issuers behind a load balancer (issuing and verifying in parallel) until the old tokens expire. |
| `pkauth.relying-party.id` | n/a | The eTLD+1 (e.g. `example.com`, NOT `auth.example.com`). Cross-subdomain passkeys all bind to this. Once a credential is registered against an RP ID, it cannot be re-registered against a different one without a fresh enrollment. |
| `pkauth.relying-party.origins` | n/a | Strict allow-list of `https://` origins. WebAuthn rejects mismatches; expand the list as you add subdomains. |
| Argon2id pepper | n/a | Per-deployment pepper for OTP hashes only (backup codes use Argon2id without a pepper). Treat as a long-lived secret; rotating it invalidates every existing OTP hash. |
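To make the relationship between the RP ID and the origin allow-list concrete, here is a minimal sketch of the two checks an operator can run against a planned configuration. The class and method names are hypothetical and are not part of pk-auth; the sketch only assumes that an origin must be listed verbatim, use `https`, and have a host equal to the RP ID or a subdomain of it.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for sanity-checking config; not a pk-auth class.
final class RelyingPartyCheck {

    /** True if the host equals the RP ID or is a subdomain of it. */
    static boolean hostMatchesRpId(String host, String rpId) {
        String h = host.toLowerCase(Locale.ROOT);
        String r = rpId.toLowerCase(Locale.ROOT);
        // The leading dot prevents "evil-example.com" from matching "example.com".
        return h.equals(r) || h.endsWith("." + r);
    }

    /** True if the origin is allow-listed, uses https, and falls under the RP ID. */
    static boolean originAllowed(String origin, Set<String> allowedOrigins, String rpId) {
        if (!allowedOrigins.contains(origin)) {
            return false; // strict allow-list: exact string match only
        }
        URI uri = URI.create(origin);
        return "https".equals(uri.getScheme())
                && uri.getHost() != null
                && hostMatchesRpId(uri.getHost(), rpId);
    }
}
```

Running a check like this in a deployment pipeline can catch an origin that is allow-listed but can never succeed because it sits outside the RP ID.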
Recommended: stash secrets in a KMS/Secrets Manager and inject as environment
variables (PKAUTH_JWT_SECRET, PKAUTH_OTP_PEPPER). The adapters bind both.
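The 32-byte minimum can also be checked in the host application before the adapter starts, which gives a clearer error in deployment logs. The following is a sketch only: the class name is hypothetical, and it assumes the secret is supplied as a raw string whose UTF-8 bytes form the key. If your deployment encodes the secret differently (for example base64), decode it first.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-flight check; pk-auth performs its own validation at boot.
final class SecretBootCheck {

    static final int MIN_HS256_SECRET_BYTES = 32;

    /** Returns the key bytes, or throws if the secret is missing or too short. */
    static byte[] requireJwtSecret(String value) {
        if (value == null) {
            throw new IllegalStateException("PKAUTH_JWT_SECRET is not set");
        }
        // Assumption: the secret is used as its UTF-8 byte representation.
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < MIN_HS256_SECRET_BYTES) {
            // Report the length only; never log the secret itself.
            throw new IllegalStateException(
                    "PKAUTH_JWT_SECRET is " + bytes.length + " bytes; need at least "
                            + MIN_HS256_SECRET_BYTES);
        }
        return bytes;
    }
}
```

A typical call would be `SecretBootCheck.requireJwtSecret(System.getenv("PKAUTH_JWT_SECRET"))` early in `main`.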
- Flyway resources live in `pk-auth-persistence-jdbi/src/main/resources/db/migration`.
- Migrations run automatically when the SPI is wired (see ADR 0003).
- The shipped baseline is split across `V1__credentials.sql`, `V2__challenges.sql`, `V3__backup_codes.sql`, `V4__otp_codes.sql`, and `V5__example_users.sql`: five tables (`credentials`, `challenges`, `backup_codes`, `otp_codes`, `users`) with no `pkauth_` prefix. `V6__audit_soft_delete.sql` adds the append-only `pkauth_audit_events` table. `V7__credentials_hard_delete.sql` drops the `revoked_at`/`revoked_reason` columns on `credentials`; credential delete is a hard delete, with the audit record captured as a structured log event (`pkauth.credential.deleted`).
- Magic-link tokens are not persisted: the JWT is the credential, and the consumed-JTI store is in-memory by default (see the `ConsumedJtiStore` SPI for a multi-replica override).
- The unique key on credential ID is byte-array shaped; do not introduce a string-encoded column without a migration.
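To illustrate why the in-memory consumed-JTI default is a single-replica arrangement, here is a minimal sketch of a consume-once store. It does not reproduce the real `ConsumedJtiStore` signature, which this guide does not show; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of consume-once semantics; not the pk-auth SPI.
final class InMemoryConsumedJtis {

    // Maps each consumed JTI to the expiry of the token that carried it.
    private final ConcurrentHashMap<String, Instant> consumed = new ConcurrentHashMap<>();

    /** Returns true only for the first caller that presents a given JTI. */
    boolean consumeOnce(String jti, Instant tokenExpiry) {
        // putIfAbsent is atomic, so two concurrent redemptions cannot both win.
        return consumed.putIfAbsent(jti, tokenExpiry) == null;
    }

    /** Drops entries whose token has expired and can no longer be replayed. */
    void evictExpired(Instant now) {
        consumed.values().removeIf(expiry -> expiry.isBefore(now));
    }
}
```

Because the map lives in one JVM, a second replica would accept the same magic link again. A multi-replica deployment therefore needs a shared store behind the SPI.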
- Single physical table (see ADR 0008). Provision it before the app starts; the adapter does not create it.
- TTL attribute `expiresAt` is honored on `Challenge`/`OneTimePasscode`/`MagicLink` items; enable it on the table.
- 1.1.0 adds `access_tokens` and `refresh_tokens` items on the same table (ADR 0015, 0013). Both set the DynamoDB-native `ttl` attribute to the row's `expiresAt` epoch second so background pruning is automatic. TTL must be enabled on the table for this to work.
- Capacity mode: on-demand is recommended for steady reads but bursty registration; provisioned only makes sense once you have a stable signing/verification baseline.
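DynamoDB TTL expects a Number attribute holding the expiry as Unix epoch seconds; a millisecond value would place the expiry thousands of years in the future, so the item would never be pruned. If you write or inspect items yourself, the conversion can be sketched as follows. The helper is hypothetical and only shows the arithmetic; the adapter sets the attribute for you.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper showing the epoch-second conversion; not a pk-auth class.
final class TtlAttribute {

    /** Expiry as epoch seconds, the unit DynamoDB TTL expects. */
    static long ttlEpochSeconds(Instant issuedAt, Duration lifetime) {
        return issuedAt.plus(lifetime).getEpochSecond();
    }
}
```

For a token issued at epoch second 1700000000 with a five-minute lifetime, the stored value is 1700000300.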
The new stateful access-token store (ADR 0015) and refresh-token store (ADR 0013) keep used/revoked rows around for a configurable retention window so operators have a forensic trail. Schedule a daily cleanup job:
For JDBI / Postgres, call the SPI methods or run the canonical SQL:

```sql
-- Access tokens: drop rows whose exp has passed.
DELETE FROM access_tokens WHERE expires_at < NOW() - INTERVAL '1 day';

-- Refresh tokens: keep used/revoked rows for the configured retention
-- (default 30 days) so a forensic look-back survives.
DELETE FROM refresh_tokens
WHERE expires_at < NOW() - INTERVAL '30 days'
  AND (used_at IS NOT NULL OR revoked_at IS NOT NULL);
```

For DynamoDB, native TTL handles routine expiry asynchronously. If you need synchronous pruning (operator action / test), call `DynamoDbAccessTokenStore.deleteExpiredBefore(Instant)` and `DynamoDbRefreshTokenRepository.deleteExpiredBefore(Instant)`; both walk the primary items and remove anything past the cutoff.
A daily cron is sufficient for both tables; neither row count grows unboundedly because TTL is set at issue time.
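If you prefer an in-process schedule to an external cron, the daily job can be sketched as below. This is an illustration under assumptions: the class is hypothetical, the pruner is passed in as a plain `Consumer<Instant>` because the exact SPI signatures are not shown here, and the retention value is whatever you have configured.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical scheduler wrapper; not part of pk-auth.
final class TokenCleanupJob {

    /** Rows that expired before this instant are eligible for deletion. */
    static Instant cutoff(Clock clock, Duration retention) {
        return clock.instant().minus(retention);
    }

    /** Runs the pruner immediately and then every 24 hours. */
    static ScheduledExecutorService scheduleDaily(
            Clock clock, Duration retention, Consumer<Instant> pruner) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pruner.accept(cutoff(clock, retention));
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report and continue.
                System.err.println("token cleanup failed: " + e);
            }
        }, 0, 24, TimeUnit.HOURS);
        return scheduler;
    }
}
```

Assuming a `refreshTokens` instance that exposes `deleteExpiredBefore(Instant)`, wiring could look like `TokenCleanupJob.scheduleDaily(Clock.systemUTC(), Duration.ofDays(30), refreshTokens::deleteExpiredBefore)`. In a multi-replica deployment, run the job on one node only or accept redundant deletes.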
Every ceremony and admin operation emits structured logs at INFO. Suggested fields to forward into your SIEM:
- `userHandle` (base64url), `challengeId`, `credentialId`
- `ceremony.phase` (start/finish) and `ceremony.step` (registration/authentication)
- `verification.kind` (signature/originPolicy/rpIdPolicy/counterRegression/attestationPolicy)
- `result` (success/denied:<reason>)
Counter regression and origin mismatch both surface as INFO log entries with a
distinct result.denied.reason. Alert on either — they are signals of credential
cloning or a misconfigured RP.
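As a sketch of the alert rule described above, the filter below flags denied events whose `verification.kind` is `counterRegression` or `originPolicy`. It assumes your log pipeline has already parsed each event into a flat map keyed by the field names listed earlier; that shape, and the class itself, are assumptions rather than something pk-auth provides.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical SIEM-side filter; the flat-map event shape is an assumption.
final class CeremonyAlertFilter {

    // Verification kinds that signal possible cloning or a misconfigured RP.
    private static final Set<String> ALERT_KINDS = Set.of("counterRegression", "originPolicy");

    /** True for denied events of a kind that should page or notify an operator. */
    static boolean shouldAlert(Map<String, String> fields) {
        String result = fields.getOrDefault("result", "");
        String kind = fields.getOrDefault("verification.kind", "");
        return result.startsWith("denied") && ALERT_KINDS.contains(kind);
    }
}
```

Because these events are logged at INFO, a level-based alert will not see them; the rule has to match on the fields.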
Recommended dashboards:
- p99 of `registration.finish`/`authentication.finish` (target < 200ms with Postgres on the same VPC).
- 4xx by reason on `/auth/passkeys/*` (origin mismatch is almost always config drift; counter regression is almost always an issue).
- Backup-code redemption and OTP attempt rates per user (the SPIs already rate-limit, but operator-side alerts catch credential-stuffing).
- Passkey rotation: users delete and re-add via `DELETE /auth/admin/credentials/{id}` and a fresh registration ceremony. The "last credential" guard returns 409; that is intentional. Encourage users to add a second passkey before removing the first.
- JWT secret rotation: roll via the dual-issuer pattern in §2.
- RP ID change: a one-way migration. Every existing passkey is invalidated. Plan a re-enrollment campaign with backup codes / magic links as the bridge.
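The verify side of a JWT secret grace window can be pictured as accepting a token signed by either the new or the previous secret. The sketch below is not pk-auth's API (pk-auth does not rotate for you) and is one possible shape for host-side code; the class name is hypothetical. It checks only the HS256 signature and does not validate `exp`, the `alg` header, or any claims, all of which a real verifier must still do.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical host-side helper; signature check only, no claim validation.
final class GraceWindowVerifier {

    /** HMAC-SHA256 over the JWT signing input (header + "." + payload). */
    static byte[] hs256(byte[] secret, String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** True if the signature matches under any of the accepted secrets. */
    static boolean signatureValid(String jwt, List<byte[]> acceptedSecrets) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] presented;
        try {
            presented = Base64.getUrlDecoder().decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false; // signature segment is not valid base64url
        }
        String signingInput = parts[0] + "." + parts[1];
        for (byte[] secret : acceptedSecrets) {
            // MessageDigest.isEqual compares in constant time.
            if (MessageDigest.isEqual(hs256(secret, signingInput), presented)) {
                return true;
            }
        }
        return false;
    }
}
```

Once the longest-lived token issued under the old secret has expired, remove the old secret from the accepted list to close the window.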
| Symptom | Likely cause | First check |
|---|---|---|
| Browser shows "Relying party not registrable" | RP ID doesn't match the page's domain | The `pkauth.relying-party.id` config and the page's actual host |
| 4xx on `authentication.finish` with `counter_regression` | A counter wound back: either a credential clone or a counter-0 (synced) passkey crossing devices | Inspect the credential's `backupEligible` flag; if true, consider switching the policy to warn |
| Challenge expired 4xx | Five-minute default TTL elapsed | Often a slow user; do not extend the TTL, re-issue start |
| DynamoDB `ConditionalCheckFailedException` on `takeOnce` | Two clients tried to consume the same challenge | Expected; only one succeeds. If the rate is high, inspect for double-submit on the client |
| Spring Security 7 chain mounts before the pk-auth filter | Filter order regression | Verify `PkAuthSecurityConfig.pkAuthSecurityFilterChain` has the higher precedence in the host's chain |
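For the `counter_regression` row, it helps to see the comparison rule itself. The sketch below follows the signature-counter step of the WebAuthn specification: when both the stored and the presented counter are zero the authenticator does not use a counter (typical of synced passkeys), and otherwise the presented value must be strictly greater than the stored one. The class and enum names are hypothetical, and how pk-auth maps these outcomes to its deny/warn policy is not shown here.

```java
// Hypothetical illustration of the WebAuthn signature-counter comparison.
final class SignCounterPolicy {

    enum Outcome { ACCEPT, ACCEPT_NO_COUNTER, REGRESSION }

    static Outcome check(long storedCount, long presentedCount) {
        if (storedCount == 0 && presentedCount == 0) {
            // Authenticator does not implement a counter; nothing to compare.
            return Outcome.ACCEPT_NO_COUNTER;
        }
        // A value that fails to increase suggests a cloned credential.
        return presentedCount > storedCount ? Outcome.ACCEPT : Outcome.REGRESSION;
    }
}
```

A passkey that reports a non-zero counter on one device and zero on another lands in `REGRESSION`, which is the cross-device case the table describes.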
See docs/threat-model.md for the formal STRIDE pass.