Kubernetes platform integrity validation operator and CLI.
Fathom ships an OCI-only Helm chart published to GHCR. Install the operator, its CRDs, and RBAC with:

helm install fathom oci://ghcr.io/skaphos/charts/fathom-operator \
  --version X.Y.Z \
  -n fathom-system --create-namespace

Replace X.Y.Z with a released chart version (plain semver, no leading v).
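Because the chart version takes no leading v while image tags such as probeImage.tag do, release scripts may need to convert between the two. The sketch below is one way to do that; `chartVersion` is a hypothetical helper (not part of Fathom), and it assumes release tags look like vX.Y.Z.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverRe matches MAJOR.MINOR.PATCH with optional pre-release and build
// metadata, and no leading "v".
var semverRe = regexp.MustCompile(`^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// chartVersion turns a release tag such as "v1.2.3" into the value expected
// by helm's --version flag ("1.2.3"). Hypothetical helper for illustration.
func chartVersion(tag string) (string, error) {
	v := strings.TrimPrefix(tag, "v")
	if !semverRe.MatchString(v) {
		return "", fmt.Errorf("%q is not a plain semver version", tag)
	}
	return v, nil
}

func main() {
	for _, tag := range []string{"v1.2.3", "1.2.3", "latest"} {
		v, err := chartVersion(tag)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("helm install ... --version %s\n", v)
	}
}
```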
The chart installs the four fathom.skaphos.io CRDs from its native crds/
directory. Helm installs CRDs only on first install and never upgrades or
removes them, so apply new CRDs with kubectl before a breaking helm upgrade.
The probe is not a separate Deployment; the operator launches it as short-lived
pods. Point at a specific probe build with --set probeImage.tag=vX.Y.Z
(defaults to the chart's appVersion). Metrics are served over HTTPS on :8443
using controller-runtime's built-in authn/authz filter; see
deploy/helm/fathom-operator/values.yaml for the full value reference.
The chart sources its CRDs and the manager ClusterRole rules from config/
(kustomize stays the source of truth). Regenerate the derived bits with
go -C tools tool task helm:sync.
The built-in cert-manager adapter supports system_health, issuer_health, and
certificate_health families. system_health checks the core cert-manager
deployments, their matching pods, required cert-manager CRDs, and optionally the
webhook Service plus admission webhook configuration. issuer_health checks
Issuer and ClusterIssuer readiness. certificate_health checks Certificate
readiness, renewal timing, expiry thresholds, issuer references, and secret
linkage. Set the cert-manager name thresholds for distributions that rename the
controller, webhook, cainjector, Service, or webhook configuration objects.
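To make the certificate_health expiry thresholds concrete, here is a minimal sketch of how warnDays and failDays could map remaining lifetime to a status. It assumes thresholds arrive as strings (as in the AddonCheck examples in this README), that the comparison is inclusive, and that the statuses are named "ok", "warn", and "fail". `classifyDaysLeft` is a hypothetical name and the adapter's real logic may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// classifyDaysLeft maps the days remaining before expiry to "fail", "warn",
// or "ok". Thresholds are parsed from strings because the AddonCheck examples
// quote them. The inclusive comparison is an assumption.
func classifyDaysLeft(daysLeft int, warnDays, failDays string) (string, error) {
	warn, err := strconv.Atoi(warnDays)
	if err != nil {
		return "", fmt.Errorf("warnDays: %w", err)
	}
	fail, err := strconv.Atoi(failDays)
	if err != nil {
		return "", fmt.Errorf("failDays: %w", err)
	}
	switch {
	case daysLeft <= fail:
		return "fail", nil
	case daysLeft <= warn:
		return "warn", nil
	default:
		return "ok", nil
	}
}

func main() {
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	for _, days := range []int{90, 20, 3} {
		notAfter := now.AddDate(0, 0, days)
		daysLeft := int(notAfter.Sub(now).Hours() / 24)
		status, err := classifyDaysLeft(daysLeft, "30", "7")
		if err != nil {
			fmt.Println("bad thresholds:", err)
			return
		}
		fmt.Printf("expires in %d days: %s\n", daysLeft, status)
	}
}
```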

apiVersion: fathom.skaphos.io/v1alpha1
kind: AddonCheck
metadata:
  name: cert-manager-system-health
spec:
  addonType: cert-manager
  interval: 5m
  timeout: 30s
  policy:
    system_health:
      enabled: true
      thresholds:
        controllerName: "cert-manager"
        webhookName: "cert-manager-webhook"
        cainjectorName: "cert-manager-cainjector"
        webhookServiceName: "cert-manager-webhook"
        webhookConfigName: "cert-manager-webhook"
        restartWarnCount: "3"
        webhookProbe: "true"
    issuer_health:
      enabled: true
      thresholds:
        kinds: "Issuer,ClusterIssuer"
    certificate_health:
      enabled: true
      thresholds:
        warnDays: "30"
        failDays: "7"

The built-in CoreDNS adapter supports system_health and dns_resolution
families. system_health checks the CoreDNS deployment, matching pods,
kube-dns Service, EndpointSlices, and optionally a node-count autoscaler
deployment. Set deploymentName, serviceName, and autoscalerName for
distributions such as RKE2 that rename CoreDNS objects. dns_resolution
launches a short-lived probe Pod per target in the AddonCheck's namespace
(per ADR-0003, so the resolver topology matches workloads rather than the
operator pod) and records the per-target outcome plus resolver latency.
Override probeImage if the default image tag is not pullable from your
cluster.
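The per-target outcome and latency that dns_resolution records can be sketched in plain Go. This is only an illustration: it resolves every target in one process, whereas Fathom launches one probe Pod per target. It also assumes targets is a comma-separated string, and `splitTargets`, `probeTargets`, and `targetResult` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// lookupFunc resolves a host name; it is injected so a fake can be used in tests.
type lookupFunc func(host string) ([]string, error)

// targetResult holds the outcome and resolver latency for one target.
type targetResult struct {
	Target  string
	Addrs   []string
	Latency time.Duration
	Err     error
}

// splitTargets parses a comma-separated targets value (an assumed format),
// trimming whitespace and dropping empty entries.
func splitTargets(targets string) []string {
	var out []string
	for _, t := range strings.Split(targets, ",") {
		if t = strings.TrimSpace(t); t != "" {
			out = append(out, t)
		}
	}
	return out
}

// probeTargets resolves each target in turn and times the lookup.
func probeTargets(targets string, lookup lookupFunc) []targetResult {
	var results []targetResult
	for _, t := range splitTargets(targets) {
		start := time.Now()
		addrs, err := lookup(t)
		results = append(results, targetResult{Target: t, Addrs: addrs, Latency: time.Since(start), Err: err})
	}
	return results
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	lookup := func(host string) ([]string, error) {
		return net.DefaultResolver.LookupHost(ctx, host)
	}
	for _, r := range probeTargets("kubernetes.default.svc.cluster.local", lookup) {
		if r.Err != nil {
			fmt.Printf("%s: failed after %s: %v\n", r.Target, r.Latency, r.Err)
			continue
		}
		fmt.Printf("%s: %v in %s\n", r.Target, r.Addrs, r.Latency)
	}
}
```

Run outside a cluster, the example target is expected to fail to resolve, which is the same outcome the check would report for a broken resolver path.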

apiVersion: fathom.skaphos.io/v1alpha1
kind: AddonCheck
metadata:
  name: coredns-health
spec:
  addonType: coredns
  interval: 5m
  timeout: 30s
  policy:
    system_health:
      enabled: true
      namespaces:
        - kube-system
      thresholds:
        deploymentName: "coredns"
        serviceName: "kube-dns"
        restartWarnCount: "3"
        autoscalerName: ""
    dns_resolution:
      enabled: true
      thresholds:
        targets: "kubernetes.default.svc.cluster.local"
        probeImage: "ghcr.io/skaphos/fathom-probe:v0.0.2"

The built-in External Secrets Operator adapter supports system_health and
secret_sync families. system_health checks controller deployments, pods, and
required ESO CRDs. secret_sync checks ExternalSecret readiness, stale refresh
state, failure reasons, store references, and target secret linkage.

apiVersion: fathom.skaphos.io/v1alpha1
kind: AddonCheck
metadata:
  name: external-secrets-health
spec:
  addonType: external-secrets
  interval: 5m
  timeout: 30s
  policy:
    system_health:
      enabled: true
      thresholds:
        restartWarnCount: "3"
    secret_sync:
      enabled: true
      thresholds:
        staleMinutes: "60"

Fathom has a shared lightweight probe-pod path under internal/probe plus a
tiny Go probe binary in cmd/probe. The probe image is built from
Dockerfile.probe and runs on scratch as a static binary with no shell or
package manager.
Supported probe modes are currently:
- dns: resolve a DNS name from inside the probe Pod
- tcp-connect: attempt a TCP connection to a target/port
- tcp-listen: run a TCP listener for peer connectivity tests
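For orientation, the three modes correspond roughly to the standard-library calls below. This is a sketch using only Go's net package, not the code in cmd/probe; the function names `runDNS`, `runTCPConnect`, and `startTCPListen` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// runDNS resolves name with the default resolver (dns mode).
func runDNS(ctx context.Context, name string) ([]string, error) {
	return net.DefaultResolver.LookupHost(ctx, name)
}

// runTCPConnect dials addr ("host:port") and closes the connection on
// success (tcp-connect mode).
func runTCPConnect(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// startTCPListen listens on addr and accepts then closes connections in the
// background until the returned listener is closed (tcp-listen mode).
func startTCPListen(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			conn.Close()
		}
	}()
	return ln, nil
}

func main() {
	// Pair tcp-listen with tcp-connect on the loopback interface.
	ln, err := startTCPListen("127.0.0.1:0")
	if err != nil {
		fmt.Println("tcp-listen failed:", err)
		return
	}
	defer ln.Close()
	if err := runTCPConnect(ln.Addr().String(), 2*time.Second); err != nil {
		fmt.Println("tcp-connect failed:", err)
		return
	}
	fmt.Println("tcp-connect ok:", ln.Addr())

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := runDNS(ctx, "localhost")
	fmt.Println("dns:", addrs, err)
}
```

In a cluster the listener and the client would run in separate probe Pods; the loopback pairing here only demonstrates the handshake.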
Build helpers:
go -C tools tool task probe-build
go -C tools tool task probe-docker-build PROBE_IMG=example.com/fathom-probe:latest

The shared pod builder applies the default hardening profile used for probe workloads: no service account token, non-root UID, read-only root filesystem, all capabilities dropped, no privilege escalation, runtime-default seccomp, and small CPU/memory requests. It also supports pod anti-affinity so future network checks can place client/server probe Pods on different nodes.
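The hardening profile above maps onto standard Pod fields. The sketch below builds an equivalent manifest from plain maps so it stays dependency-free; the real builder under internal/probe presumably uses typed Kubernetes API objects. The function name `hardenedProbePod`, the UID 65532, and the resource request values are assumptions chosen for illustration, and anti-affinity is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hardenedProbePod returns a Pod manifest (as plain maps) that applies the
// hardening settings listed above. The UID and request sizes are assumed.
func hardenedProbePod(name, image string) map[string]any {
	return map[string]any{
		"apiVersion": "v1",
		"kind":       "Pod",
		"metadata":   map[string]any{"name": name},
		"spec": map[string]any{
			// No service account token is mounted into the Pod.
			"automountServiceAccountToken": false,
			"securityContext": map[string]any{
				"runAsNonRoot":   true,
				"runAsUser":      65532, // assumed non-root UID
				"seccompProfile": map[string]any{"type": "RuntimeDefault"},
			},
			"containers": []any{
				map[string]any{
					"name":  "probe",
					"image": image,
					"securityContext": map[string]any{
						"readOnlyRootFilesystem":   true,
						"allowPrivilegeEscalation": false,
						"capabilities":             map[string]any{"drop": []string{"ALL"}},
					},
					"resources": map[string]any{
						// Assumed values standing in for "small" requests.
						"requests": map[string]any{"cpu": "10m", "memory": "16Mi"},
					},
				},
			},
		},
	}
}

func main() {
	pod := hardenedProbePod("fathom-probe-example", "example.com/fathom-probe:latest")
	out, err := json.MarshalIndent(pod, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
}
```

The printed JSON is a valid Pod manifest, so it can be compared against what the operator actually creates, for example with kubectl get pod -o json.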
Full documentation lives in docs/:
- Architecture: CRD model, the AddonCheck → HealthCheck → ClusterHealth aggregation chain, reconcilers, adapter contract, probe-pod model.
- API reference: generated CRD reference for fathom.skaphos.io/v1alpha1.
- Configuration reference: every flag, env var, and config-file key.
- Code map: internal package tour for contributors.
- Architecture Decision Records: ADR-0001 … ADR-0004.