Experiments for training GPT-style language models with different optimizers. The project uses Hydra for configuration, local sweeps, and output organization, with optional SLURM helpers for larger runs.
Set up the environment:
./setup_env.sh
module load python
source venv/bin/activate

Run a small local smoke test:
python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=default

Enable Weights & Biases only when you want online experiment tracking:
wandb login
python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=wandb

Select configs by naming files under hydra_conf/<group>/:
python run_hydra.py \
model=gpt-medium \
optimizer=adamw \
data=slim_pajama10B \
training=slim_pajama10B \
logging=default

Override any nested field with Hydra dot syntax:
python run_hydra.py \
model=gpt-tiny \
optimizer=adamw \
data=shakespeare \
training=shakespeare \
logging=default \
optimizer.optimizer_params.lr=0.001 \
training.training_params.batch_size=32

By default, single-run outputs go to:
outputs/<model>/<dataset>/<optimizer>/<run_name>/
Each run directory contains a metrics JSON file, task.log, and a .hydra/ snapshot of the resolved config.
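To collect finished runs programmatically, one option is to look for the .hydra/ snapshot that marks each run directory. The sketch below is not part of the repository: `find_run_dirs` is a hypothetical helper, and it assumes only the layout described above (a .hydra/config.yaml file inside every run directory).

```python
from pathlib import Path


def find_run_dirs(root):
    """Return run directories under `root`, sorted by path.

    A directory counts as a run if it contains .hydra/config.yaml.
    Hypothetical helper for illustration, not repository code.
    """
    runs = []
    for cfg in Path(root).rglob("config.yaml"):
        # Keep only config snapshots that sit directly inside a .hydra/ folder.
        if cfg.parent.name == ".hydra":
            runs.append(cfg.parent.parent)
    return sorted(runs)
```

For example, `find_run_dirs("outputs/gpt-tiny")` would list every run recorded for that model, regardless of dataset or optimizer.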
Reusable experiment configs live in hydra_conf/experiment/. Select one with experiment=<name>:
python run_hydra.py experiment=tiny_shakespeare_smoke

Experiment configs can override model, data, training, optimizer, logging, paths, and Hydra output directories in one place. They can also embed a sweep by setting hydra.mode: MULTIRUN and hydra.sweeper.params.
Use Hydra multirun mode (-m) with comma-separated values:
python run_hydra.py -m \
model=gpt-tiny \
data=shakespeare \
training=shakespeare \
logging=default \
optimizer=adamw,muon \
optimizer.optimizer_params.lr=1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1 \
optimizer.optimizer_params.weight_decay=0.0

To run an experiment config that already embeds its sweep, use the config name directly:
python run_hydra.py experiment=tiny_shakespeare_adamw_muon_lr_sweep

That works when the experiment YAML contains:
hydra:
  mode: MULTIRUN
  sweeper:
    params:
      optimizer: adamw,muon
      optimizer.optimizer_params.lr: 1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1

Use command-line -m sweeps when the sweep values are not embedded in the experiment config, or when you want to override them at launch time.
The SLURM experiment task generator also reads hydra.sweeper.params; each generated task forces hydra.mode=RUN so that one SLURM task runs one parameter combination.
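The one-task-per-combination expansion can be pictured as a Cartesian product over the sweep parameters. The following is a minimal sketch of that idea, not the repository's task generator: `expand_sweep` is a hypothetical name, and it assumes every sweep value is a plain comma-separated string (Hydra's sweeper also accepts richer syntax that this sketch ignores).

```python
from itertools import product


def expand_sweep(params):
    """Expand {key: "v1,v2,..."} into one list of overrides per combination."""
    keys = list(params)
    # Split each comma-separated value string into its individual choices.
    choices = [[v.strip() for v in str(params[k]).split(",")] for k in keys]
    tasks = []
    for combo in product(*choices):
        overrides = [f"{k}={v}" for k, v in zip(keys, combo)]
        # Force single-run mode so each task runs exactly one combination.
        overrides.append("hydra.mode=RUN")
        tasks.append(overrides)
    return tasks


# Illustrative sweep values, shortened to two learning rates.
sweep = {
    "optimizer": "adamw,muon",
    "optimizer.optimizer_params.lr": "1e-4,1e-3",
}
for task in expand_sweep(sweep):
    print(" ".join(task))
```

By the same counting, the command-line sweep shown earlier (two optimizers, nine learning rates, one weight decay) expands to 2 × 9 × 1 = 18 runs.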
If you override an experiment to run Muon and it fails inside TorchInductor/Triton compilation, rerun the Muon half with compile disabled:
TORCH_COMPILE_DISABLE=1 python run_hydra.py -m \
experiment=tiny_shakespeare_adamw_muon_lr_sweep \
optimizer=muon \
optimizer.optimizer_params.lr=1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1 \
optimizer.optimizer_params.weight_decay=0.0

Plot outputs by model and dataset:
python plot.py gpt-tiny tiny_shakespeare

This reads:
outputs/gpt-tiny/tiny_shakespeare
Plot a named output folder directly:
python plot.py test_run

This reads:
outputs/test_run
Figures are written under figures/ using the same naming structure.
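The argument-to-path convention described above can be summarized in a few lines. This is a sketch of the convention only, not the code in plot.py: `resolve_plot_paths` is a hypothetical helper, and the exact figure path is an assumption based on "the same naming structure".

```python
from pathlib import Path


def resolve_plot_paths(*names, outputs_root="outputs", figures_root="figures"):
    """Map plot arguments to (input_dir, figure_dir).

    One name selects outputs/<name>; two names select
    outputs/<model>/<dataset>. Figures are assumed to mirror that structure.
    """
    if not 1 <= len(names) <= 2:
        raise ValueError("expected <name> or <model> <dataset>")
    relative = Path(*names)
    return Path(outputs_root) / relative, Path(figures_root) / relative
```

Under these assumptions, `resolve_plot_paths("test_run")` pairs outputs/test_run with figures/test_run.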
Hydra config groups are organized under hydra_conf/:
hydra_conf/
├── config.yaml # default config composition
├── model/ # model architecture configs
├── optimizer/ # optimizer configs
├── training/ # training-loop configs
├── data/ # dataset configs
├── logging/ # default or wandb logging
├── paths/ # run naming helpers
├── hydra/ # Hydra output layout
└── experiment/ # reusable full experiment configs
The default output pattern is controlled by hydra_conf/hydra/default.yaml:
hydra:
  run:
    dir: outputs/${model.name}/${data.dataset.name}/${optimizer.optimizer_params.name}/${paths.run_name}

Experiment configs can override this. For example, a sweep can write to outputs/test_run with:
hydra:
  sweep:
    dir: outputs/test_run
    subdir: ${optimizer.optimizer_params.name}/${paths.run_name}

For a deeper explanation of the Hydra setup, see docs/hydra.md.
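As a rough illustration of how the `${...}` placeholders in these directory patterns turn into concrete paths, the sketch below substitutes each placeholder with a value looked up in a nested dictionary. It is a simplified stand-in for the interpolation Hydra performs, not how Hydra implements it; `resolve` is a hypothetical function and the config values are made up.

```python
import re


def resolve(pattern, config):
    """Replace each ${a.b.c} in `pattern` with config["a"]["b"]["c"]."""

    def lookup(match):
        value = config
        # Walk the nested dictionary one dotted key at a time.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\$\{([^}]+)\}", lookup, pattern)


# Example values are invented for illustration.
config = {
    "model": {"name": "gpt-tiny"},
    "data": {"dataset": {"name": "shakespeare"}},
    "optimizer": {"optimizer_params": {"name": "adamw"}},
    "paths": {"run_name": "lr0.001"},
}
pattern = (
    "outputs/${model.name}/${data.dataset.name}/"
    "${optimizer.optimizer_params.name}/${paths.run_name}"
)
print(resolve(pattern, config))
```

With these example values the pattern resolves to outputs/gpt-tiny/shakespeare/adamw/lr0.001.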
Use the SLURM helpers when you want to distribute parameter sweeps across GPUs or nodes.
./slurm_scripts/submit.sh \
scripts/run_slim_pajama10B_adam_medium.sh \
param_configs/adamw.json \
experiment_name \
8

With a custom partition:
PARTITION=gpu ./slurm_scripts/submit.sh scripts/run_*.sh params.json exp_name 4

Submit a multi-node DDP job:
./slurm_scripts/submit_nodes_ddp.sh \
scripts/run_slim_pajama10B_adam_large.sh \
param_configs/adamw.json \
experiment_name \
4 \
--num_gpus=8 \
--partition=gpu \
--constraint=a100

Options:
- --num_gpus=N: GPUs per node, default 4
- --partition=P: SLURM partition, default gpuxl
- --constraint=C: GPU constraint, such as h100 or a100

Monitor or cancel jobs:
squeue --me
tail -f logs/<experiment_name>/run_info_N/logs/log_0.out
scancel <job_id>
scancel --name=<experiment_name>

Start an interactive GPU shell:
srun --gpus=1 --cpus-per-gpu=8 --time=4:00:00 --partition=gpu --pty bash
module load python
source venv/bin/activate

Repository layout:
GPT-opt/
├── gptopt/ # package code: models, data loading, optimizers, training
├── hydra_conf/ # Hydra config groups and experiment configs
├── scripts/ # training wrapper scripts for SLURM workflows
├── slurm_scripts/ # SLURM submission infrastructure
├── param_configs/ # JSON parameter grids for SLURM sweeps
├── outputs/ # training results and Hydra run snapshots
├── figures/ # generated plots
├── logs/ # SLURM task logs
└── slurm_logs/ # SLURM system logs
Training outputs:
outputs/<model>/<dataset>/<optimizer>/<run_name>/
├── .hydra/config.yaml
├── .hydra/overrides.yaml
├── task.log
└── <optimizer>-*.json
SLURM sweep logs:
logs/<experiment_name>/
└── run_info_N/
├── train.sh
├── params.json
├── tasks
└── logs/
├── log_0.out
├── log_0.err
└── ...
SLURM system logs:
slurm_logs/
├── slurm_job_<id>.out
└── slurm_job_<id>.err
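When a sweep has many tasks, scanning the per-task .err files is a quick way to spot candidates for failure. The sketch below assumes only the log layout shown above; `tasks_with_stderr` is a hypothetical helper, not part of the repository. A non-empty .err file is only a hint, since many tools write warnings to stderr.

```python
from pathlib import Path


def tasks_with_stderr(experiment_dir):
    """Return the log_*.err files that are non-empty, sorted by path."""
    experiment_dir = Path(experiment_dir)
    # Layout assumed from above: <experiment>/run_info_N/logs/log_<i>.err
    err_files = experiment_dir.glob("run_info_*/logs/log_*.err")
    return sorted(p for p in err_files if p.stat().st_size > 0)
```

For example, `tasks_with_stderr("logs/<experiment_name>")` returns the stderr logs worth opening first.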
| Task | Command |
|---|---|
| Local smoke test | python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=default |
| Run experiment config | python run_hydra.py experiment=tiny_shakespeare_smoke |
| Local Hydra sweep | python run_hydra.py -m optimizer=adamw,muon optimizer.optimizer_params.lr=1e-4,1e-3 |
| Plot model/data outputs | python plot.py gpt-tiny tiny_shakespeare |
| Plot named output folder | python plot.py test_run |
| Submit SLURM sweep | ./slurm_scripts/submit.sh scripts/run_*.sh params.json exp_name 8 |
| Submit multi-node DDP | ./slurm_scripts/submit_nodes_ddp.sh scripts/run_*.sh params.json exp_name 4 |
| Check jobs | squeue --me |
| View task log | tail -f logs/<exp>/run_info_N/logs/log_0.out |
Older pre-Hydra entry points may still exist in historical scripts, but new runs should use run_hydra.py.