gowerrobert/GPT-opt

GPT-opt

Experiments for training GPT-style language models with different optimizers. The project uses Hydra for configuration, local sweeps, and output organization, with optional SLURM helpers for larger runs.

Quick Start

Set up the environment:

./setup_env.sh
module load python
source venv/bin/activate

Run a small local smoke test:

python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=default

Enable Weights & Biases only when you want online experiment tracking:

wandb login
python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=wandb

Running Experiments

Basic Hydra Runs

Select one config per group with <group>=<name>, where <name> is a YAML file under hydra_conf/<group>/:

python run_hydra.py \
    model=gpt-medium \
    optimizer=adamw \
    data=slim_pajama10B \
    training=slim_pajama10B \
    logging=default

Override any nested field with Hydra dot syntax:

python run_hydra.py \
    model=gpt-tiny \
    optimizer=adamw \
    data=shakespeare \
    training=shakespeare \
    logging=default \
    optimizer.optimizer_params.lr=0.001 \
    training.training_params.batch_size=32

By default, single-run outputs go to:

outputs/<model>/<dataset>/<optimizer>/<run_name>/

Each run directory contains a metrics JSON file, task.log, and a .hydra/ snapshot of the resolved config.

Experiment Configs

Reusable experiment configs live in hydra_conf/experiment/. Select one with experiment=<name>:

python run_hydra.py experiment=tiny_shakespeare_smoke

Experiment configs can override model, data, training, optimizer, logging, paths, and Hydra output directories in one place. They can also embed a sweep by setting hydra.mode: MULTIRUN and hydra.sweeper.params.

Local Sweeps

Use Hydra multirun mode (-m) with comma-separated values:

python run_hydra.py -m \
    model=gpt-tiny \
    data=shakespeare \
    training=shakespeare \
    logging=default \
    optimizer=adamw,muon \
    optimizer.optimizer_params.lr=1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1 \
    optimizer.optimizer_params.weight_decay=0.0
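
The learning-rate list above follows a 1-3 pattern per decade. As a convenience, the sketch below builds that comma-separated string in Python so it can be pasted into a -m command. log_grid is a hypothetical helper for illustration, not part of this repository.

```python
def log_grid(min_exp: int, max_exp: int) -> str:
    """Return '1e{min},3e{min},...,1e{max}' as a comma-separated sweep string."""
    values = []
    for exp in range(min_exp, max_exp):
        values.append(f"1e{exp}")
        values.append(f"3e{exp}")
    values.append(f"1e{max_exp}")  # stop at the top of the range, no 3e{max_exp}
    return ",".join(values)


if __name__ == "__main__":
    # Prints an override that can be appended to a `python run_hydra.py -m ...` command.
    print("optimizer.optimizer_params.lr=" + log_grid(-5, -1))
```

Because Hydra's basic sweeper runs every combination of the comma-separated values, two optimizers with nine learning rates give 18 runs.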

To run an experiment config that already embeds its sweep, select it by name; no -m flag is needed:

python run_hydra.py experiment=tiny_shakespeare_adamw_muon_lr_sweep

That works when the experiment YAML contains:

hydra:
  mode: MULTIRUN
  sweeper:
    params:
      optimizer: adamw,muon
      optimizer.optimizer_params.lr: 1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1

Use command-line -m sweeps when the sweep values are not embedded in the experiment config, or when you want to override them at launch time.

The SLURM experiment task generator also reads hydra.sweeper.params; each generated task forces hydra.mode=RUN so that one SLURM task runs one parameter combination.
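
To make the one-task-per-combination idea concrete, here is a minimal sketch of how a hydra.sweeper.params mapping could be expanded into per-task override lists. It is not the repository's task generator: expand_tasks and its input format are assumptions for illustration, and it handles only plain comma-separated values, not Hydra's choice() or range() sweep syntax.

```python
from itertools import product


def expand_tasks(sweeper_params: dict) -> list:
    """Expand comma-separated sweep values into one override list per combination."""
    keys = list(sweeper_params)
    choices = [[v.strip() for v in str(sweeper_params[k]).split(",")] for k in keys]
    tasks = []
    for combo in product(*choices):
        overrides = [f"{k}={v}" for k, v in zip(keys, combo)]
        overrides.append("hydra.mode=RUN")  # each task is a single run, not a sweep
        tasks.append(overrides)
    return tasks


if __name__ == "__main__":
    # Hypothetical sweep values, shortened for readability.
    params = {
        "optimizer": "adamw,muon",
        "optimizer.optimizer_params.lr": "1e-4,1e-3",
    }
    for overrides in expand_tasks(params):
        print("python run_hydra.py " + " ".join(overrides))
```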

If the Muon runs of an experiment fail during TorchInductor/Triton compilation, rerun only the Muon half with torch.compile disabled:

TORCH_COMPILE_DISABLE=1 python run_hydra.py -m \
    experiment=tiny_shakespeare_adamw_muon_lr_sweep \
    optimizer=muon \
    optimizer.optimizer_params.lr=1e-5,3e-5,1e-4,3e-4,1e-3,3e-3,1e-2,3e-2,1e-1 \
    optimizer.optimizer_params.weight_decay=0.0

Plotting Results

Plot outputs by model and dataset:

python plot.py gpt-tiny tiny_shakespeare

This reads:

outputs/gpt-tiny/tiny_shakespeare

Plot a named output folder directly:

python plot.py test_run

This reads:

outputs/test_run

Figures are written under figures/ using the same naming structure.
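
The mapping from arguments to directories can be summarized with a short sketch. It only mirrors the two cases described above. resolve_dirs is a hypothetical helper, the figures/ path is an assumption based on "the same naming structure", and plot.py itself may be implemented differently.

```python
from pathlib import Path


def resolve_dirs(*names: str):
    """Map plot arguments to (input dir under outputs/, figure dir under figures/)."""
    if not names:
        raise ValueError("expected a folder name, or a model name and a dataset name")
    return Path("outputs").joinpath(*names), Path("figures").joinpath(*names)


if __name__ == "__main__":
    print(resolve_dirs("gpt-tiny", "tiny_shakespeare"))  # model and dataset
    print(resolve_dirs("test_run"))                      # named output folder
```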

Configuration Layout

Hydra config groups are organized under hydra_conf/:

hydra_conf/
├── config.yaml       # default config composition
├── model/            # model architecture configs
├── optimizer/        # optimizer configs
├── training/         # training-loop configs
├── data/             # dataset configs
├── logging/          # default or wandb logging
├── paths/            # run naming helpers
├── hydra/            # Hydra output layout
└── experiment/       # reusable full experiment configs

The default output pattern is controlled by hydra_conf/hydra/default.yaml:

hydra:
  run:
    dir: outputs/${model.name}/${data.dataset.name}/${optimizer.optimizer_params.name}/${paths.run_name}
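
To see how the ${...} references turn into a concrete path, the sketch below resolves them against a nested dictionary. It is a simplified stand-in for the interpolation that Hydra performs through OmegaConf, and the config values are made-up examples rather than values from this repository.

```python
import re


def lookup(cfg: dict, dotted: str):
    """Follow a dotted key such as 'model.name' through nested dicts."""
    node = cfg
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve(template: str, cfg: dict) -> str:
    """Replace every ${a.b.c} in the template with the matching config value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(lookup(cfg, m.group(1))), template)


if __name__ == "__main__":
    cfg = {  # hypothetical values, not taken from the repository configs
        "model": {"name": "gpt-tiny"},
        "data": {"dataset": {"name": "tiny_shakespeare"}},
        "optimizer": {"optimizer_params": {"name": "adamw"}},
        "paths": {"run_name": "lr0.001"},
    }
    pattern = (
        "outputs/${model.name}/${data.dataset.name}/"
        "${optimizer.optimizer_params.name}/${paths.run_name}"
    )
    print(resolve(pattern, cfg))
```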

Experiment configs can override this. For example, a sweep can write to outputs/test_run with:

hydra:
  sweep:
    dir: outputs/test_run
    subdir: ${optimizer.optimizer_params.name}/${paths.run_name}

For a deeper explanation of the Hydra setup, see docs/hydra.md.

SLURM Runs

Use the SLURM helpers when you want to distribute parameter sweeps across GPUs or nodes.

Standard Parameter Sweep

./slurm_scripts/submit.sh \
    scripts/run_slim_pajama10B_adam_medium.sh \
    param_configs/adamw.json \
    experiment_name \
    8

With a custom partition:

PARTITION=gpu ./slurm_scripts/submit.sh scripts/run_*.sh params.json exp_name 4

Multi-Node DDP

./slurm_scripts/submit_nodes_ddp.sh \
    scripts/run_slim_pajama10B_adam_large.sh \
    param_configs/adamw.json \
    experiment_name \
    4 \
    --num_gpus=8 \
    --partition=gpu \
    --constraint=a100

Options:

  • --num_gpus=N: GPUs per node, default 4
  • --partition=P: SLURM partition, default gpuxl
  • --constraint=C: GPU constraint, such as h100 or a100

Monitoring Jobs

squeue --me
tail -f logs/<experiment_name>/run_info_N/logs/log_0.out
scancel <job_id>
scancel --name=<experiment_name>

Start an interactive GPU shell:

srun --gpus=1 --cpus-per-gpu=8 --time=4:00:00 --partition=gpu --pty bash
module load python
source venv/bin/activate

Repository Structure

GPT-opt/
├── gptopt/              # package code: models, data loading, optimizers, training
├── hydra_conf/          # Hydra config groups and experiment configs
├── scripts/             # training wrapper scripts for SLURM workflows
├── slurm_scripts/       # SLURM submission infrastructure
├── param_configs/       # JSON parameter grids for SLURM sweeps
├── outputs/             # training results and Hydra run snapshots
├── figures/             # generated plots
├── logs/                # SLURM task logs
└── slurm_logs/          # SLURM system logs

Output And Log Locations

Training outputs:

outputs/<model>/<dataset>/<optimizer>/<run_name>/
├── .hydra/config.yaml
├── .hydra/overrides.yaml
├── task.log
└── <optimizer>-*.json
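
Given this layout, results can be gathered with a few lines of Python. The sketch below treats any directory that contains a .hydra/ snapshot as a run directory and loads every JSON file next to it. collect_runs is a hypothetical helper, and nothing is assumed about the keys inside the metrics files.

```python
import json
from pathlib import Path


def collect_runs(root: str = "outputs") -> list:
    """Find run directories (those with a .hydra/ snapshot) and load their JSON files."""
    runs = []
    for hydra_dir in sorted(Path(root).rglob(".hydra")):
        if not hydra_dir.is_dir():
            continue
        run_dir = hydra_dir.parent
        metrics = {}
        for json_path in sorted(run_dir.glob("*.json")):
            with json_path.open() as f:
                metrics[json_path.name] = json.load(f)
        runs.append({"run_dir": run_dir, "metrics": metrics})
    return runs


if __name__ == "__main__":
    for run in collect_runs():
        print(run["run_dir"], list(run["metrics"]))
```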

SLURM sweep logs:

logs/<experiment_name>/
└── run_info_N/
    ├── train.sh
    ├── params.json
    ├── tasks
    └── logs/
        ├── log_0.out
        ├── log_0.err
        └── ...
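
When an experiment has been submitted several times, it helps to locate the most recent run_info_N directory before tailing its logs. The sketch below assumes that N is an integer and that a larger N means a later submission; both are assumptions, and latest_run_info is a hypothetical helper rather than part of the SLURM scripts.

```python
import re
from pathlib import Path
from typing import Optional


def latest_run_info(experiment: str, logs_root: str = "logs") -> Optional[Path]:
    """Return the run_info_N directory with the largest N, or None if none exists."""
    best, best_n = None, -1
    for path in (Path(logs_root) / experiment).glob("run_info_*"):
        match = re.fullmatch(r"run_info_(\d+)", path.name)
        if match and path.is_dir() and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    return best


if __name__ == "__main__":
    run_info = latest_run_info("experiment_name")
    if run_info is not None:
        # List the per-task stdout logs of the latest submission.
        for log in sorted((run_info / "logs").glob("log_*.out")):
            print(log)
```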

SLURM system logs:

slurm_logs/
├── slurm_job_<id>.out
└── slurm_job_<id>.err

Quick Reference

Task                      Command
Local smoke test          python run_hydra.py model=gpt-tiny optimizer=adamw data=shakespeare training=shakespeare logging=default
Run experiment config     python run_hydra.py experiment=tiny_shakespeare_smoke
Local Hydra sweep         python run_hydra.py -m optimizer=adamw,muon optimizer.optimizer_params.lr=1e-4,1e-3
Plot model/data outputs   python plot.py gpt-tiny tiny_shakespeare
Plot named output folder  python plot.py test_run
Submit SLURM sweep        ./slurm_scripts/submit.sh scripts/run_*.sh params.json exp_name 8
Submit multi-node DDP     ./slurm_scripts/submit_nodes_ddp.sh scripts/run_*.sh params.json exp_name 4
Check jobs                squeue --me
View task log             tail -f logs/<exp>/run_info_N/logs/log_0.out

Legacy

Older pre-Hydra entry points may still exist in historical scripts, but new runs should use run_hydra.py.
