Next-Token Prediction Learns Generalisable Representations of Sleep Physiology
June 2026
- Initial release: the pretrained Hypnos model is available on the HuggingFace Hub, together with a minimal inference library for generating sleep embeddings from EDF recordings. Paper: arXiv:2606.09605.
pip install hypnos  # or: uv add hypnos
To work on the library itself, clone the repository and install from source:
uv sync  # or: pip install -e .
Load an EDF, preprocess, and generate embeddings from the pretrained Hypnos model:
from hypnos.embedding import embed_edf
emb = embed_edf("recording.edf")
# emb: dict {modality_name: np.ndarray [n_seconds, embed_dim] float16}
# e.g. emb["eeg_c3"], emb["ecg"], ...: one vector per second for each modality present
Embeddings are returned per modality (z^i_t) at the model's native 1 Hz resolution
(one vector per second). Only modalities present in the recording appear in the dict. The
model defaults to the released weights on the Hub (joncarter/hypnos); pass a repo id or
local path to override.
The pipeline runs: EDF → preprocess (resample / causal filter / normalize) → per-modality
tokenization → RQ-Transformer → 1 Hz per-modality embeddings. For US recordings,
pass notch_freq=60.0 (the default is 50 Hz) so the notch filter matches the 60 Hz powerline frequency.
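The notch_freq setting above selects the mains frequency that preprocessing removes. As a rough illustration of what a causal notch filter does, here is a minimal sketch using the standard second-order (biquad) notch from the RBJ Audio EQ Cookbook, run sample by sample so each output depends only on current and past inputs. This is an assumption for illustration, not Hypnos's actual preprocessing code; causal_notch, notch_coeffs and the quality factor q=30 are hypothetical.

```python
from __future__ import annotations

import math

import numpy as np


def notch_coeffs(f0: float, fs: float, q: float = 30.0) -> tuple[np.ndarray, np.ndarray]:
    """Biquad notch at f0 Hz (RBJ cookbook), normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = np.array([1.0, -2.0 * math.cos(w0), 1.0])
    a = np.array([1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]


def causal_notch(x: np.ndarray, fs: float, notch_freq: float = 50.0, q: float = 30.0) -> np.ndarray:
    """Filter x forward in time only (direct form I), so no future samples are used."""
    if not 0.0 < notch_freq < fs / 2:
        raise ValueError("notch_freq must lie between 0 and the Nyquist frequency")
    b, a = notch_coeffs(notch_freq, fs, q)
    y = np.empty(len(x), dtype=np.float64)
    x1 = x2 = y1 = y2 = 0.0  # filter state: previous two inputs and outputs
    for n, xn in enumerate(np.asarray(x, dtype=np.float64)):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

With notch_freq=60.0, a pure 60 Hz sine is driven to near zero once the short start-up transient has decayed, while components well away from 60 Hz (e.g. 10 Hz) pass almost unchanged. A causal IIR filter does add some phase shift near the notch, which is the usual price of not looking ahead in time.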
Reuse a loaded model across recordings with the step-by-step API:
from hypnos.embedding import load_model, preprocess_edf, tokenize, embed
model, tokenizers, meta = load_model(device="cpu")
signals = preprocess_edf("recording.edf", meta)
tokens, modality_mask, channel_ids = tokenize(tokenizers, meta, signals)
emb = embed(model, tokens, modality_mask, channel_ids, meta)  # {name: [T, D]}
Hypnos maps each EDF signal label onto a canonical channel (C3, C4, E1, E2, Chin,
ECG, ABD, THX). It already recognizes many common naming conventions out of the box
(e.g. EKG and ECG L-ECG R → ECG, C3-M2 → C3), plus AASM contralateral re-referencing
(mastoid equivalents A1/A2 and TP9/TP10 are accepted as M1/M2) and chin-EMG
bipolar derivation. Matching is tolerant of case, whitespace and separator characters (so
c3:m2 resolves like C3-M2). A modality whose channel can't be found is simply skipped.
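Contralateral re-referencing amounts to subtracting the opposite-side mastoid from a referential scalp channel (C3 - M2, C4 - M1), accepting A1/A2 or TP9/TP10 in place of M1/M2 as described above. The sketch below shows that step on a plain dict of equal-length arrays. The tables and derive_contralateral are hypothetical illustrations, not Hypnos's internal alias tables, and only C3 and C4 are covered.

```python
from __future__ import annotations

import numpy as np

# Hypothetical tables: which labels may stand in for each mastoid,
# and which mastoid is contralateral to each central EEG site.
MASTOID_EQUIVALENTS = {"M1": ("M1", "A1", "TP9"), "M2": ("M2", "A2", "TP10")}
CONTRALATERAL = {"C3": "M2", "C4": "M1"}


def derive_contralateral(signals: dict[str, np.ndarray], site: str) -> np.ndarray | None:
    """Return `site` re-referenced to its contralateral mastoid, or None if unavailable."""
    if site not in signals or site not in CONTRALATERAL:
        return None
    for ref in MASTOID_EQUIVALENTS[CONTRALATERAL[site]]:
        if ref in signals:
            return signals[site] - signals[ref]
    return None  # no usable mastoid: the modality would be skipped
```

For example, {"C3": c3, "A2": a2} yields c3 - a2 for "C3", while "C4" returns None because none of M1, A1 or TP9 is present.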
If your recording uses labels Hypnos doesn't recognize, pass channel_aliases — a
{canonical_name: [extra EDF labels]} mapping that's merged with the built-ins (your aliases
take precedence):
emb = embed_edf(
"recording.edf",
channel_aliases={
"ECG": ["MyDeviceEKG"], # canonical "ECG" <- EDF label "MyDeviceEKG"
"C3": ["EEG_C3_custom"],
},
)
channel_aliases is also accepted by preprocess_edf(...) in the step-by-step API. The
built-in alias tables live in hypnos.data.edf (ALT_COLUMNS).
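The matching and precedence rules described above can be sketched as follows: labels are normalised (lower-cased, whitespace removed, ':' treated like '-'), user aliases are placed ahead of the built-in ones, and the first EDF label that matches wins. normalize_label, merge_aliases and resolve_channel are hypothetical helpers written for illustration; Hypnos's own matching lives in hypnos.data.edf and may differ in detail.

```python
from __future__ import annotations

import re


def normalize_label(label: str) -> str:
    """Lower-case, drop all whitespace, and treat ':' as '-' (so 'c3:m2' == 'C3-M2')."""
    return re.sub(r"\s+", "", label).lower().replace(":", "-")


def merge_aliases(builtin: dict[str, list[str]], user: dict[str, list[str]] | None) -> dict[str, list[str]]:
    """Merge alias tables; user-supplied labels are tried before the built-in ones."""
    merged = {name: list(labels) for name, labels in builtin.items()}
    for name, extra in (user or {}).items():
        rest = [label for label in merged.get(name, []) if label not in extra]
        merged[name] = list(extra) + rest
    return merged


def resolve_channel(edf_labels: list[str], canonical: str, aliases: dict[str, list[str]]) -> str | None:
    """Return the EDF label to use for `canonical`, or None so the modality is skipped."""
    by_norm = {normalize_label(label): label for label in edf_labels}
    for candidate in [canonical, *aliases.get(canonical, [])]:
        match = by_norm.get(normalize_label(candidate))
        if match is not None:
            return match
    return None
```

Checking the canonical name first means a channel literally labelled ECG is used whenever it is present; that ordering is itself an assumption of this sketch.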
Hypnos produces embeddings at 1 Hz for each modality. In our experiments, we found that simple pooling over modalities and timescales works well for downstream tasks. For example, to produce a single embedding per 30-second sleep epoch:
import numpy as np
emb = embed_edf("recording.edf")
# Average over modalities -> [n_seconds, embed_dim] (the summary vector z_t)
fused = np.mean(list(emb.values()), axis=0)
# Mean-pool over each 30-second epoch -> [n_epochs, embed_dim]
n_epochs = fused.shape[0] // 30
epochs = fused[: n_epochs * 30].reshape(n_epochs, 30, -1).mean(axis=1)
Hypnos is fully generative and can autoregressively forecast physiological signals conditioned on an input context:
from hypnos.embedding import load_model, synthesize
model, tokenizers, meta = load_model()
print([m.name for m in meta.modalities]) # available modality names
# Jointly generate three modalities from a cold start (no recording needed).
signals = synthesize(model, tokenizers, meta,
modalities=["eeg_c3", "ecg", "resp_thx"], num_steps=30)
# signals: {name: 1-D waveform at the modality's native rate}
# signals["ecg"] → 30 s @ 128 Hz = (3840,); signals["resp_thx"] → 30 s @ 32 Hz = (960,)
Pass prompt_tokens (e.g. from tokenize(...)) to forecast a continuation of a real
recording.
Figure: EEG, ECG and respiration jointly generated by Hypnos from a cold start (30 s).
The whole model (the RQ-Transformer and all 5 tokenizers) ships as a single
safetensors file, hypnos.safetensors. All weights live under namespaced keys
(model/…, tok/<name>/…), and the config (model and tokenizer construction kwargs, modality
layout) is stored as a JSON string in the file's metadata, so loading is fully self-contained and needs
no config framework. Because safetensors is a pure-tensor format, loading involves no arbitrary-code unpickling.
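Because a safetensors file starts with a plain JSON header, the namespaced keys and the embedded config can be inspected without loading any weights. In the safetensors format, the first 8 bytes are a little-endian unsigned 64-bit header length, followed by that many bytes of JSON mapping tensor names to dtype, shape and byte offsets, with string-to-string metadata under the "__metadata__" key. The minimal stdlib reader below relies only on that layout; read_safetensors_header and group_by_namespace are hypothetical helpers, and this README does not name the metadata key that holds Hypnos's config JSON.

```python
from __future__ import annotations

import json
import struct
from collections import defaultdict


def read_safetensors_header(path: str) -> tuple[dict, dict[str, str]]:
    """Return (tensor_index, metadata) from a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    metadata = header.pop("__metadata__", None) or {}
    return header, metadata


def group_by_namespace(tensor_names) -> dict[str, list[str]]:
    """Group keys like 'model/...' and 'tok/<name>/...' by their namespace prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tensor_names:
        parts = name.split("/")
        prefix = "/".join(parts[:2]) if parts[0] == "tok" else parts[0]
        groups[prefix].append(name)
    return dict(groups)
```

Run on the bundle, group_by_namespace(index) should separate the model/… weights from each tok/<name>/… tokenizer, and list(metadata) shows which key actually carries the config JSON.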
load_model / embed_edf default to the released weights on the Hub, and also accept:
- a HuggingFace repo id, e.g. "owner/hypnos" (downloads the bundle file),
- a local path to the .safetensors bundle,
- a local directory containing hypnos.safetensors.
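The three accepted forms can be told apart with a few filesystem checks. resolve_source below is a hypothetical helper showing one way to do it, not the library's own loader. For the Hub case it only reports the repo id; huggingface_hub.hf_hub_download(repo_id=..., filename="hypnos.safetensors") is one way a loader could then fetch the bundle.

```python
from __future__ import annotations

from pathlib import Path

BUNDLE_NAME = "hypnos.safetensors"


def resolve_source(source: str) -> tuple[str, str]:
    """Classify a model source as ('file', local_path) or ('hub', repo_id)."""
    path = Path(source).expanduser()
    if path.is_file():
        if path.suffix != ".safetensors":
            raise ValueError(f"expected a .safetensors bundle, got {path}")
        return "file", str(path)
    if path.is_dir():
        bundle = path / BUNDLE_NAME
        if not bundle.is_file():
            raise FileNotFoundError(f"{bundle} does not exist")
        return "file", str(bundle)
    # Not a local file or directory: treat it as a Hub repo id such as "owner/hypnos".
    # A loader could then call, e.g.:
    #   huggingface_hub.hf_hub_download(repo_id=source, filename=BUNDLE_NAME)
    return "hub", source
```

This sketch checks local paths before falling back to a repo id, so a local directory named owner/hypnos would shadow the Hub repo; whether Hypnos uses the same order is not stated here.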
Devices: CUDA, CPU, and Apple Silicon (MPS) are all supported. On CUDA, windowed attention uses a fused
flex_attention kernel. flex_attention has no Metal kernel, so on MPS (and in eager mode on CPU) the model falls back to a dense-mask SDPA path that materialises a full (chunk, chunk) score matrix per head: peak memory grows roughly quadratically with chunk_tokens (≈8 GB at the default of 2048; ≈19 GB at 4096). Recording length itself does not raise peak memory, because chunks run sequentially, so a full night works on CPU or MPS (a 3 h record takes ~50 s at ~11 GB RAM on CPU). On Apple Silicon this memory is shared with the system, so lower chunk_tokens if memory is constrained.
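To see why the fallback's memory scales with the square of chunk_tokens, here is a NumPy sketch of windowed attention computed the dense way: a full (n, n) score matrix is built, positions outside the window are masked to -inf, and a row-wise softmax is taken. It assumes a causal sliding window of `window` keys per query and a single head; Hypnos's actual window shape, head count and dtype are not given here, so this illustrates the memory pattern only and is not the model's attention code.

```python
import numpy as np


def sliding_window_mask(n: int, window: int) -> np.ndarray:
    """Boolean (n, n) mask: query i may attend to keys i - window + 1 .. i."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i - j < window)


def dense_windowed_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int) -> np.ndarray:
    """Single-head attention that materialises the full (n, n) score matrix."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n): this is the O(n^2) allocation
    scores = np.where(sliding_window_mask(n, window), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax; diagonal is never masked
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def score_matrix_bytes(chunk_tokens: int, bytes_per_element: int = 4) -> int:
    """Size of one (chunk, chunk) score matrix for a single head (float32 by default)."""
    return chunk_tokens * chunk_tokens * bytes_per_element
```

Doubling chunk_tokens quadruples each score matrix: score_matrix_bytes(2048) is 16 MiB per head in float32 and score_matrix_bytes(4096) is 64 MiB. The total peak also includes other costs (weights, activations, several heads), which is consistent with the quoted peaks (≈8 GB vs ≈19 GB) growing by less than exactly 4×.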
