Inspired by Yang Song's *Generative Modeling by Estimating Gradients of the Data Distribution*.
A clean PyTorch implementation of Denoising Diffusion Probabilistic Models (DDPM) with DDIM sampling and Classifier-Free Guidance (CFG), built on top of PyTorch Lightning.
Note: This repository is a personal study project aimed at reproducing and understanding diffusion models from the ground up. The goal is to stay in sync with the literature and build intuition for the moving parts, not to provide a production-ready library.
- ResUNet (hidden_dim=192, depth=4, attention at 16×16 and 8×8), cosine schedule, 1000 timesteps, trained for 105k steps on 64×64 crops.
- ResUNet (hidden_dim=128, depth=4, attention at 16×16 and 4×4), cosine schedule, 1000 timesteps, trained for 400 epochs on 32×32 images.
- ResUNet (hidden_dim=192, depth=4, attention at 16×16 and 8×8), cosine schedule, 1000 timesteps, trained for 120k steps on 64×64 images with CFG (guidance scale=4, class dropout=0.15).
- ResUNet backbone with time-conditioned residual blocks and multi-head self-attention
- DDPM ancestral sampling and DDIM deterministic sampling
- Classifier-Free Guidance (CFG) for class-conditional generation
- EMA weight averaging for stable, high-quality samples
- Cosine and linear noise schedules
- WandB integration with automatic sample logging during training
- Config-driven training via LightningCLI — no code changes needed to swap datasets or architectures
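At sampling time, classifier-free guidance mixes the conditional and unconditional noise predictions. A minimal sketch of that combination step (the function name here is illustrative, not this repo's API):

```python
import torch


def cfg_noise(
    eps_uncond: torch.Tensor, eps_cond: torch.Tensor, guidance_scale: float
) -> torch.Tensor:
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by the guidance scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With `guidance_scale=1` this reduces to the plain conditional prediction; larger scales (this repo trains Imagenette with scale 4) trade sample diversity for fidelity to the class label.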
| Dataset | Resolution | Conditional |
|---|---|---|
| FashionMNIST | 28×28 | No |
| CIFAR-10 | 32×32 | No |
| Imagenette | 64×64 | Yes (10 classes) |
| CelebA | 64×64 | No |
Requires Python 3.14+ and `uv`.
```bash
git clone https://github.com/Soarxyn/diffusion
cd diffusion
uv sync
```

For GPU support (CUDA 13.0):

```bash
uv sync --extra-index-url https://download.pytorch.org/whl/cu130
```

Training is fully config-driven. Pick a preset or supply your own YAML:
```bash
# FashionMNIST (default, quick experiment)
uv run python -m diffusion fit --config config/default.yaml

# CIFAR-10
uv run python -m diffusion fit --config config/cifar10.yaml

# Imagenette with CFG
uv run python -m diffusion fit --config config/imagenette_cond.yaml

# CelebA
uv run python -m diffusion fit --config config/celeba.yaml
```

You can override any parameter directly from the command line:

```bash
uv run python -m diffusion fit --config config/celeba.yaml --model.lr 1e-4 --trainer.max_epochs 200
```

Checkpoints are saved to results/ and samples are logged to WandB every N steps.
The model is a ResUNet with:
- Time embeddings injected into each residual block via scale-shift normalization
- Self-attention at configurable resolution levels
- Pixel-shuffle downsampling / nearest-neighbour upsampling
- Class embeddings added to the time embedding for conditional generation (CFG uses a null token with dropout during training)
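The scale-shift conditioning above can be sketched as a FiLM-style residual block, where the time embedding produces a per-channel (scale, shift) pair applied after normalization. This is a simplified illustration under assumed dimensions, not the repository's exact block:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleShiftBlock(nn.Module):
    """Residual block whose time embedding modulates normalized
    features via a learned per-channel (scale, shift) pair."""

    def __init__(self, channels: int, time_dim: int):
        super().__init__()
        self.norm = nn.GroupNorm(8, channels)
        self.to_scale_shift = nn.Linear(time_dim, 2 * channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # Project the time embedding into one scale and one shift per channel.
        scale, shift = self.to_scale_shift(t_emb).chunk(2, dim=1)
        h = self.norm(x) * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return x + self.conv(F.silu(h))
```

For conditional generation, the class embedding is simply summed into `t_emb` before it reaches each block, so conditioning rides through the same modulation path.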
The diffusion process supports both linear and cosine noise schedules. At inference, both DDPM (full 1000-step) and DDIM (configurable steps, default 50) samplers are available.
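The cosine schedule follows Nichol & Dhariwal's formulation, ᾱ(t) = cos²(((t/T + s)/(1 + s)) · π/2), normalized so ᾱ(0) = 1. A sketch of that computation (the repo's own implementation may differ in detail):

```python
import math

import torch


def cosine_alpha_bar(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cumulative signal level alpha_bar under the cosine schedule,
    evaluated at t = 0, 1, ..., T and normalized so alpha_bar(0) = 1."""
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return f / f[0]
```

Compared with the linear beta schedule, this keeps more signal at early timesteps and decays smoothly to zero at t = T, which is what makes it the default for the trained models above.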
- J. Ho, A. Jain, and P. Abbeel, Denoising Diffusion Probabilistic Models, arXiv, 2020.
- A. Nichol and P. Dhariwal, Improved Denoising Diffusion Probabilistic Models, arXiv, 2021.
- J. Song, C. Meng, and S. Ermon, Denoising Diffusion Implicit Models, arXiv, 2020.
- J. Ho and T. Salimans, Classifier-Free Diffusion Guidance, arXiv, 2022.
- J. Howard, Imagenette: A smaller subset of 10 easily classified classes from Imagenet, GitHub, 2019.
- lucidrains/denoising-diffusion-pytorch — the UNet and sampling implementation are based on this repository (MIT License)
- The Annotated Diffusion Model — reference for the diffusion scheduler implementation




