v2 stage 2 follow-up: K3 rotation eager re-encryption tool (scan + rewrite all blobs from old epoch) #94

@hanwencheng

Context

Stage-2 hardening (#90) shipped:

  • ✅ K3EpochCounter contract on Heima (one-tx-per-rotation, O(1) global epoch counter)
  • ✅ K3 rotation operational runbook (signer-governance multisig procedure)
  • ✅ Lazy on-read re-encryption (blob read → decrypt under old K3 → re-encrypt under new K3 → upload to same S3 path)

Missing: the eager scan-and-rewrite tool. Lazy re-encryption is correct, but a blob that is never read stays in the old K3 epoch indefinitely. Operators who want to drop the old K3 from the signer's keystore (smaller blast radius post-rotation) first need to be sure that every blob under their operator-omni has been re-encrypted.

Why

After a K3 rotation:

  • New writes use K3_v[N+1] (correct, immediate).
  • On-read decrypt path tries K3_v[N+1] first; falls back to K3_v[N] for old blobs (lazy migration).
  • Workers and the signer hold BOTH K3_v[N] + K3_v[N+1] indefinitely — until every old blob is read at least once.
  • Operator can't safely drop K3_v[N] from signer keystore unless they know every blob has been re-encrypted.
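
The fallback order above is why the old key cannot be dropped early. A minimal Python sketch of that read path, assuming a hypothetical `decrypt(key, blob)` callable that raises a hypothetical `DecryptError` when the key does not match (the worker's real decrypt routine and error type are not shown in this issue):

```python
from typing import Callable, Mapping, Tuple


class DecryptError(Exception):
    """Hypothetical error: the key failed to authenticate the blob."""


def decrypt_with_fallback(
    blob: bytes,
    keystore: Mapping[int, bytes],
    new_epoch: int,
    old_epoch: int,
    decrypt: Callable[[bytes, bytes], bytes],
) -> Tuple[int, bytes]:
    """Try K3_v[N+1] first, then K3_v[N]; return (epoch_used, plaintext)."""
    for epoch in (new_epoch, old_epoch):
        key = keystore.get(epoch)
        if key is None:
            continue  # this epoch's key has been dropped from the keystore
        try:
            return epoch, decrypt(key, blob)
        except DecryptError:
            continue  # wrong epoch for this blob; try the next key
    raise DecryptError("no key in the keystore decrypts this blob")
```

If `keystore` no longer holds epoch N, any blob still encrypted under K3_v[N] raises instead of decrypting. That is the failure the eager tool is meant to rule out before eviction.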

The eager tool closes this loop: scan all blobs under bots/<actor_omni>/{credentials,memory,inbound}/*, identify the ones still in old K3 epoch (by version byte in AES-GCM envelope), decrypt-and-re-encrypt under new K3, PUT back to same S3 path.

Scope

Tool shape

  • scripts/heima-k3-reencrypt.sh --operator-omni 0x<64-hex> --old-epoch N --new-epoch N+1 [--dry-run]
  • Reads STS creds via the existing OIDC path (operator's master device wallet mints session JWT → AssumeRoleWithWebIdentity → STS creds with PrincipalTag/agentkeys_actor_omni set to operator's omni)
  • Lists every blob under each actor's prefix (S3 list-objects-v2)
  • For each blob:
    1. GET object
    2. Parse v2 envelope: 1B version || 12B nonce || ciphertext || 16B tag
    3. If version-byte matches old K3 epoch → call worker's /v1/cred/teardown + /v1/cred/store round-trip (or /v1/memory/* for memory) to force re-encrypt under new K3
    4. If version-byte already matches new K3 → skip
  • Logs per-blob outcome (re-encrypted / skipped / failed) to stdout + final summary
  • --dry-run lists what would change without actually rewriting
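
The parse-and-classify steps can be sketched in Python as follows. This is an illustration only: it assumes the version byte carries the K3 epoch number directly (as the wording "version-byte matches old K3 epoch" suggests), and the names `Envelope`, `parse_envelope`, and `classify` are hypothetical, not part of the existing workers.

```python
from dataclasses import dataclass

VERSION_LEN = 1   # 1B version
NONCE_LEN = 12    # 12B nonce
TAG_LEN = 16      # 16B tag
MIN_LEN = VERSION_LEN + NONCE_LEN + TAG_LEN


@dataclass(frozen=True)
class Envelope:
    version: int
    nonce: bytes
    ciphertext: bytes
    tag: bytes


def parse_envelope(blob: bytes) -> Envelope:
    """Split a v2 envelope: version || nonce || ciphertext || tag."""
    if len(blob) < MIN_LEN:
        raise ValueError(f"blob too short for v2 envelope: {len(blob)} bytes")
    body_start = VERSION_LEN + NONCE_LEN
    return Envelope(
        version=blob[0],
        nonce=blob[VERSION_LEN:body_start],
        ciphertext=blob[body_start:len(blob) - TAG_LEN],
        tag=blob[len(blob) - TAG_LEN:],
    )


def classify(blob: bytes, old_epoch: int, new_epoch: int) -> str:
    """Decide what the tool should do with one blob."""
    version = parse_envelope(blob).version
    if version == old_epoch:
        return "reencrypt"  # step 3: force re-encrypt under new K3
    if version == new_epoch:
        return "skip"       # step 4: already migrated
    return "failed"         # unexpected epoch; report rather than guess
```

Classification needs only the first byte and a length check, so a `--dry-run` pass does not have to hold any K3 key material.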

Acceptance

  • After heima-k3-reencrypt.sh --operator-omni 0x..., all blobs under that operator's prefix have envelope-version byte == new K3 epoch
  • Signer's keystore can drop K3_v[N] (operator-confirmed via agentkeys signer drop-old-k3 --epoch N — separate tool, also new)
  • Re-running the tool is idempotent (already-re-encrypted blobs are skipped)
  • Tool fails fast if STS creds expire mid-run (operator re-runs to continue)
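
The idempotency and fail-fast criteria can be illustrated with a small Python sketch. It models the S3 prefix as an in-memory dict and takes the re-encrypt step as an injected callable. The names `run_reencrypt` and `CredentialsExpired` and the `would-re-encrypt` dry-run label are assumptions for illustration, not the tool's actual interface.

```python
from collections import Counter
from typing import Callable, Dict


class CredentialsExpired(Exception):
    """Hypothetical signal that the STS session expired mid-run."""


def run_reencrypt(
    store: Dict[str, bytes],
    old_epoch: int,
    new_epoch: int,
    reencrypt: Callable[[str, bytes], bytes],
    dry_run: bool = False,
) -> Counter:
    """Walk every blob once, log a per-blob outcome, return the summary."""
    summary: Counter = Counter()
    for key in sorted(store):
        blob = store[key]
        version = blob[0] if blob else None
        if version == new_epoch:
            outcome = "skipped"            # already migrated, so re-runs are no-ops
        elif version != old_epoch:
            outcome = "failed"             # empty blob or unexpected epoch
        elif dry_run:
            outcome = "would-re-encrypt"   # report only, leave the blob untouched
        else:
            try:
                store[key] = reencrypt(key, blob)
                outcome = "re-encrypted"
            except CredentialsExpired:
                raise                      # fail fast; operator re-runs to continue
            except Exception:
                outcome = "failed"
        summary[outcome] += 1
        print(f"{outcome}\t{key}")
    return summary
```

Because blobs already at the new epoch are skipped, a re-run after an expired-credentials abort resumes where the previous run stopped without tracking any extra state.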

Out of scope

  • Multi-operator parallel runs (one operator at a time; signer-side coordination is the operator's responsibility)
  • Live-write coordination (operator should pause new writes during the rewrite, or accept the brief window where new writes use new K3 anyway)
  • Heima chain anchor for the rewrite event (CredentialAudit append for visibility is in scope; chain-side proof is out)

Dependencies

  • ✅ K3EpochCounter contract live on Heima
  • ✅ Worker AES-GCM envelope versioning (worker-creds + worker-memory both honor envelope version-byte today)
  • agentkeys signer drop-old-k3 --epoch N CLI command (out of scope of this issue — separate follow-up for signer-keystore eviction)
