TLS Certificate Management for JGDMS/DirtyChai #205

@pfirmstone

The challenge is real: JERI requires both server and client certificates for every connection, clients are heterogeneous (any OS), services are on remote hosts, and the system spans five distinct service identities plus an unbounded number of client identities. Manual keytool management at this scale is operationally untenable. The solution needs to be automatic, auditable, and OS-agnostic.

The Core Problem: Two Distinct Certificate Populations

Before recommending tooling, it is important to recognise that the system has two structurally different certificate populations with different management requirements:

Infrastructure certificates — one per service host (Hosts 1–5), plus one per engine instance in the Host 2 pool. Long-lived relative to clients, operator-controlled, well-known identities. These are machine identities — the services themselves.

Client certificates — one per JVM client, potentially large numbers, spread across heterogeneous OS environments. Short-lived, self-service, dynamic. These are workload identities.

These two populations have fundamentally different lifecycle and trust requirements and should be managed separately.


Recommendation: SPIFFE/SPIRE as the Identity Fabric

SPIFFE is designed to assign a unique, cryptographically verifiable identity to each service within a system, embodied in SPIFFE Verifiable Identity Documents (SVIDs), which are typically X.509 certificates. This enables mutual TLS and allows workloads to establish trust securely and automatically in dynamic and heterogeneous environments without relying on static network information or long-lived credentials.

This maps directly onto the JGDMS problem. SPIRE delivers workload-specific, short-lived, automatically rotated X.509 SVIDs suitable for establishing mTLS directly to workloads via the Workload API, attesting the identity of workloads in a distributed software system at runtime.

X.509-SVIDs are X.509 certificates in which the SPIFFE ID is embedded in the Subject Alternative Name field as a URI, integrating directly with mutual TLS, the standard mechanism for service-to-service authentication. This means JERI's JSSE-based mTLS can carry SVIDs without changes to the TLS layer: from JSSE's perspective they are ordinary X.509 certificates. Any authorisation logic keyed on the certificate's subject DN would, however, need to read the URI SAN instead.

Why SPIFFE/SPIRE fits JGDMS particularly well

By providing short-lived, automatically rotated credentials, SVIDs eliminate the need for hardcoded secrets and manual certificate provisioning, reducing the attack surface and supporting compliance with security frameworks that mandate strong identity assurance and mutual authentication, such as NIST SP 800-207.

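Because SPIFFE SVIDs are short-lived, the migration path to post-quantum cryptography algorithms is significantly simpler: there are no long-lived certificates to track down and reissue, because every credential is replaced within one rotation period anyway. This is a meaningful forward-looking property given DirtyChai's security-forward posture.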

Trust Domain Design for JGDMS

The SPIFFE trust domain should map onto the JGDMS deployment boundary. A suggested scheme:

spiffe://jgdms.example.org/host/lookup          → Host 1
spiffe://jgdms.example.org/host/bae/engine-1    → Host 2, instance 1
spiffe://jgdms.example.org/host/bae/engine-N    → Host 2, instance N
spiffe://jgdms.example.org/host/registry        → Host 3
spiffe://jgdms.example.org/host/downloader      → Host 4
spiffe://jgdms.example.org/host/telemetry       → Host 5
spiffe://jgdms.example.org/client/<identity>    → client JVMs

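Each SPIFFE ID maps to one JERI connection identity. A custom verifier then validates the SPIFFE URI in the peer certificate's SAN against an authorisation policy, which is the kind of identity-based access control JGDMS already performs at the application layer. Note that the stock BasicJeriTrustVerifier verifies trust in JERI proxies and endpoints rather than certificate SANs, so the URI check most likely belongs in a custom TrustVerifier, an X509TrustManager, or the service's authorisation code.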
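A minimal sketch of that SAN check, using only JDK classes. The class name SpiffeIdPolicy and the rule format (a trailing "/" means prefix match, anything else is an exact match) are assumptions made for illustration. They are not part of JERI or of any SPIFFE library.

```java
import java.net.URI;
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: extracts the SPIFFE ID from a certificate and applies a path policy. */
final class SpiffeIdPolicy {
    // GeneralName tag for uniformResourceIdentifier (RFC 5280).
    private static final int SAN_TYPE_URI = 6;

    private final String trustDomain;
    private final List<String> allowedPaths;

    SpiffeIdPolicy(String trustDomain, List<String> allowedPaths) {
        this.trustDomain = trustDomain;
        this.allowedPaths = new ArrayList<>(allowedPaths);
    }

    /** Collects the URI-typed SAN entries of a certificate. */
    static List<URI> uriSans(X509Certificate cert) throws CertificateParsingException {
        List<URI> uris = new ArrayList<>();
        Collection<List<?>> sans = cert.getSubjectAlternativeNames();
        if (sans == null) return uris;
        for (List<?> san : sans) {
            if (((Integer) san.get(0)) == SAN_TYPE_URI) {
                uris.add(URI.create((String) san.get(1)));
            }
        }
        return uris;
    }

    /** An X.509-SVID carries exactly one URI SAN; anything else is rejected. */
    boolean isAuthorizedPeer(X509Certificate cert) throws CertificateParsingException {
        List<URI> uris = uriSans(cert);
        return uris.size() == 1 && isAuthorized(uris.get(0));
    }

    boolean isAuthorized(URI id) {
        if (!"spiffe".equals(id.getScheme()) || !trustDomain.equals(id.getHost())) return false;
        String path = id.getPath();
        if (path == null || path.isEmpty()) return false;
        for (String segment : path.split("/")) {
            // Dot segments are not valid in a SPIFFE ID path.
            if (segment.equals(".") || segment.equals("..")) return false;
        }
        for (String rule : allowedPaths) {
            boolean match = rule.endsWith("/") ? path.startsWith(rule) : path.equals(rule);
            if (match) return true;
        }
        return false;
    }
}
```

A service that should only accept calls from the engine pool and the registry could, for example, be configured with the rules "/host/bae/" and "/host/registry".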

SPIRE Attestation for Heterogeneous Clients

The client population across Windows, macOS, and Linux is where SPIRE's platform-agnostic attestation is most valuable. SPIRE is designed to enable widespread deployment of mTLS between workloads in distributed systems by attesting the identity of workloads at runtime. Each client machine runs a SPIRE Agent which:

  • Attests itself to the SPIRE Server through a node attestor (for example a join token, an existing X.509 certificate, a TPM-backed device identity, or a cloud instance identity document)
  • Attests each local workload from process attributes observed on the host (such as the Unix user and binary path, or the Windows account) and matches them against registration entries
  • Delivers the X.509 SVID directly to the JVM process via the Workload API
  • Automatically rotates the certificate before expiry, with no operator intervention

The SPIRE Server (a separate, highly-available service in your infrastructure, outside the five JGDMS hosts) is the CA for the trust domain. It issues SVIDs with short lifetimes — typically one hour — and agents handle rotation transparently.
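To make the rotation timing concrete, here is a small sketch of how a component holding an SVID could decide that a replacement is due. The class SvidRenewal and the idea of renewing after a fixed fraction of the validity window are illustrative assumptions. The fraction is a parameter here, not a documented SPIRE setting.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: computes when a short-lived certificate should be replaced. */
final class SvidRenewal {
    /** Returns the instant at which the given fraction of the validity window has elapsed. */
    static Instant renewAt(Instant notBefore, Instant notAfter, double fraction) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be strictly between 0 and 1");
        }
        long lifetimeMs = Duration.between(notBefore, notAfter).toMillis();
        return notBefore.plusMillis((long) (lifetimeMs * fraction));
    }

    /** True once the renewal point has been reached. */
    static boolean isDue(Instant notBefore, Instant notAfter, double fraction, Instant now) {
        return !now.isBefore(renewAt(notBefore, notAfter, fraction));
    }
}
```

With a one-hour SVID and a fraction of 0.5, the renewal point falls 30 minutes after issuance, which leaves the remaining half of the lifetime as a safety margin if the SPIRE Server is briefly unreachable.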


Certificate Lifecycle: Service Hosts vs. Clients

Service hosts (Hosts 1–5 and the Host 2 pool) get their SVIDs from the SPIRE Agent running on each host. The agent is a per-host daemon shared by all workloads on that host rather than a per-workload sidecar. On Linux hosts it integrates naturally with systemd. Certificate rotation is automatic. The SELinux policy on Host 2 needs to allow the JVM to connect to the SPIRE Workload API socket; this should be the main additional SELinux rule required for the JVM, although the agent itself may need its own policy.

Engine instance key pairs (Host 2 pool) are a special case. The engine's signing key — used to sign JarAnalysisReport — is distinct from its TLS identity and should be treated differently. The signing key should be longer-lived, stored in a hardware-backed keystore if possible (HSM or TPM), and rotated deliberately rather than automatically. The TLS SVID rotates hourly; the signing key rotates quarterly or on compromise. These are separate concerns and should not be conflated.
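The separation can be illustrated with plain JDK signing code that never touches the TLS credentials. This is a sketch under assumptions: the class name ReportSigner, the choice of ECDSA over P-256, and treating a JarAnalysisReport as an already-serialised byte array are all illustrative. A real deployment would obtain the private key from the HSM or TPM provider instead of generating it in software.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;

/** Hypothetical helper: signs report bytes with a key that is independent of the TLS SVID. */
final class ReportSigner {
    private static final String ALGORITHM = "SHA256withECDSA";

    /** Software key generation for illustration only; production keys would live in hardware. */
    static KeyPair generateSigningKey() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(new ECGenParameterSpec("secp256r1"));
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("EC key generation unavailable", e);
        }
    }

    static byte[] sign(PrivateKey key, byte[] reportBytes) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(key);
            signer.update(reportBytes);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("signing failed", e);
        }
    }

    static boolean verify(PublicKey key, byte[] reportBytes, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(key);
            verifier.update(reportBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            // A malformed signature is treated the same as a failed verification.
            return false;
        }
    }
}
```

Because the verifier only needs the engine's public signing key, hourly SVID rotation does not invalidate previously signed reports.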

Remote clients on Windows and macOS get their SVID from a SPIRE Agent installed alongside the JVM. On Windows this runs as a Windows Service; on macOS as a launchd agent. From the JVM's perspective the Workload API is always a local endpoint: a Unix domain socket on Linux and macOS, and a named pipe on Windows.


JERI Integration

JERI uses JSSE under the hood for its TLS endpoint. The integration point is straightforward:

The SPIRE Workload API delivers the SVID as a standard X509Certificate chain plus PrivateKey. These are loaded into a KeyStore in memory (no filesystem keystore file) and presented to JERI's net.jini.jeri.ssl.SslServerEndpoint and SslEndpoint via a custom KeyManager. A TrustManager backed by the SPIRE-provided trust bundle validates the peer certificate. Both rotate automatically when SPIRE delivers a new SVID.

The practical implementation is a small JGDMS service utility — a SpiffeCredentialManager — that:

  1. Opens the SPIRE Workload API socket
  2. Receives the X.509-SVID and trust bundle via the SPIFFE Workload API protocol
  3. Populates an in-memory KeyStore and TrustStore
  4. Constructs the SSLContext for JERI
  5. Listens for SVID rotation notifications and rebuilds the SSLContext without restarting the JVM

This is a self-contained module, reusable across all five hosts and clients, and is a natural candidate for a JGDMS platform module.
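Steps 3 to 5 can be sketched with standard JSSE classes. This is a minimal outline under assumptions, not the JGDMS implementation: the Workload API client (steps 1 and 2) is omitted and assumed to call onSvidUpdate with the key, chain, and trust bundle it receives. How JERI's endpoints are handed the resulting SSLContext is also left open. The method names are hypothetical.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.PrivateKey;
import java.security.cert.X509Certificate;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical sketch: rebuilds an SSLContext from in-memory stores on every SVID update. */
final class SpiffeCredentialManager {
    // Not a secret: the keystore exists only on the heap and is never written to disk.
    private static final char[] IN_MEMORY_PASSWORD = "in-memory-only".toCharArray();

    private final AtomicReference<SSLContext> current = new AtomicReference<>();

    /** Assumed to be invoked for the initial SVID and for each rotation notification. */
    void onSvidUpdate(PrivateKey key, X509Certificate[] chain, List<X509Certificate> trustBundle)
            throws GeneralSecurityException, IOException {
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null); // start from an empty, memory-only store
        keyStore.setKeyEntry("svid", key, IN_MEMORY_PASSWORD, chain);

        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);
        int index = 0;
        for (X509Certificate authority : trustBundle) {
            trustStore.setCertificateEntry("bundle-" + index++, authority);
        }

        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, IN_MEMORY_PASSWORD);
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        current.set(context); // atomic swap: new connections pick up the rotated SVID
    }

    /** Returns the most recent context, or fails if no SVID has been delivered yet. */
    SSLContext current() {
        SSLContext context = current.get();
        if (context == null) {
            throw new IllegalStateException("no SVID received yet");
        }
        return context;
    }
}
```

The default trust managers only perform chain validation against the bundle. Checking the peer's SPIFFE ID against an authorisation policy is a separate step. Connections established before a rotation keep the credentials they were negotiated with; only new handshakes use the rebuilt context.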


What to Avoid

Self-signed certificates with manual distribution — doesn't scale to a large client population and creates an unmanageable revocation problem.

Long-lived certificates — a compromised client certificate that lasts a year is a year-long credential for an attacker. Short-lived SVIDs (one hour) limit the blast radius of any compromise to the rotation period.

Shared certificates across service instances — Host 2's engine pool must have per-instance certificates so individual instances can be individually revoked from Host 3's trusted key set without taking down the pool.
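As an illustration of per-instance revocation, the sketch below keeps a thread-safe set of trusted engine identities from which a single instance can be removed. The class name TrustedEngineSet and the choice to key the set by SPIFFE ID string are assumptions. Host 3's actual trusted key set may be keyed differently, for example by signing public key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of engine identities that are currently trusted. */
final class TrustedEngineSet {
    private final Set<String> trusted = ConcurrentHashMap.newKeySet();

    void trust(String spiffeId) {
        trusted.add(spiffeId);
    }

    /** Removes one instance; returns false if it was not trusted in the first place. */
    boolean revoke(String spiffeId) {
        return trusted.remove(spiffeId);
    }

    boolean isTrusted(String spiffeId) {
        return trusted.contains(spiffeId);
    }
}
```

Revoking spiffe://jgdms.example.org/host/bae/engine-1 leaves every other engine in the pool trusted, which would not be possible if the instances shared one certificate.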

Filesystem keystores with static passwords — these are inappropriate for the SELinux-isolated Host 2. The in-memory keystore approach eliminates the static password problem entirely.


Summary

Concern | Recommendation
Identity fabric | SPIFFE/SPIRE: OS-agnostic, automatic rotation
Service host certs | SPIRE Agent on each host, X.509-SVID via Workload API
Client certs | SPIRE Agent per client machine, platform-attested
Engine signing keys (Host 2) | Separate from TLS identity: longer-lived, hardware-backed where possible
JERI integration | SpiffeCredentialManager utility module: in-memory KeyStore, rotating SSLContext
Trust domain | Single SPIFFE trust domain per JGDMS deployment, SPIFFE URI validated by a custom JERI-side verifier
Certificate lifetime | SVIDs: ~1 hour. Engine signing keys: deliberate rotation only
Post-quantum readiness | Short-lived SVIDs make algorithm migration simpler
