Take any nn.Module. Lobotomize it. Make it smaller, faster, cheaper.
Composable model compression for PyTorch. Run pipelines to use compression techniques like quantization, pruning, knowledge distillation, and more with a one-liner, an explicit pipeline, or the CLI.
Over time will try to serve as an easy-to-use collection of popular techniques to help with R&D in the field.
pip install lobotomizerOptional extras:
pip install lobotomizer[all] # everything
pip install lobotomizer[dev] # pytest
pip install lobotomizer[pruning] # torch-pruning
pip install lobotomizer[quantize] # bitsandbytesOr install from source:
git clone https://github.com/usmank13/lobotomizer.git
cd lobotomizer
pip install -e ".[dev]"import lobotomizer as lob
result = lob.compress(model, recipe="balanced")
print(result.summary())
result.save("compressed/")import lobotomizer as lob
result = lob.Pipeline([
lob.Prune(method="l1_unstructured", sparsity=0.4),
lob.Quantize(method="dynamic"),
]).run(model)
print(result.summary())# Compress with a recipe
lobotomize model.pt --recipe balanced --output compressed/
# Compress with explicit options
lobotomize model.pt --prune l1_unstructured --sparsity 0.3 --quantize dynamic -o out/
# Profile only
lobotomize model.pt --profile-only --input-shape "1,3,224,224"
# List available recipes
lobotomize --list-recipesReal results from compressing Whisper-tiny (39M params) with dynamic int8 quantization:
┌───────────────────────┬────────────┬────────────┬────────┐
│ Metric │ Before │ After │ Δ │
├───────────────────────┼────────────┼────────────┼────────┤
│ param_count │ 37,760,640 │ 37,760,640 │ +0.0% │
│ param_count_trainable │ 37,760,640 │ 21,245,568 │ -43.7% │
│ size_mb │ 144.10 │ 97.02 │ -32.7% │
└───────────────────────┴────────────┴────────────┴────────┘
| Method | Description |
|---|---|
l1_unstructured |
Remove weights with smallest L1 magnitude |
random_unstructured |
Remove random weights |
l1_structured |
Remove entire channels by L1 norm (Conv2d) — mask-based |
random_structured |
Remove random channels (Conv2d) — mask-based |
| Method | Description | Extra Deps |
|---|---|---|
dynamic / dynamic_int8 |
Dynamic int8 quantization (Linear layers) | — |
static / static_int8 |
Static int8 quantization (requires calibration data) | — |
int4_weight_only |
INT4 weight-only quantization via torchao | — |
gptq |
GPTQ quantization (HuggingFace models) | auto-gptq |
awq |
AWQ quantization (HuggingFace models) | autoawq |
Methods with optional dependencies will raise a clear ImportError if the dep is missing.
# INT4 weight-only (no extra deps needed)
lob.Quantize(method="int4_weight_only")
# GPTQ with custom settings
lob.Quantize(method="gptq", bits=4, group_size=128)
# AWQ
lob.Quantize(method="awq", w_bit=4, q_group_size=128)Train a compressed student model to mimic the original teacher:
import lobotomizer as lob
# Logit-based distillation (Hinton-style)
result = lob.Pipeline([
lob.StructuredPrune(sparsity=0.3),
lob.Distill(method="logit", temperature=4.0, epochs=5, lr=1e-4),
]).run(model, training_data=train_loader)
# Feature matching — align intermediate representations
result = lob.Pipeline([
lob.Distill(
method="feature",
feature_layers={"fc1": "fc1", "fc2": "fc2"},
epochs=10,
),
]).run(model, training_data=train_loader)
# Both logit + feature distillation
result = lob.Pipeline([
lob.Distill(method="both", alpha=0.7, temperature=4.0, epochs=10),
]).run(model, training_data=train_loader)| Parameter | Description |
|---|---|
method |
"logit", "feature", or "both" |
temperature |
Softmax temperature for logit KD (default: 4.0) |
alpha |
KD loss weight; 1-alpha goes to task loss (default: 1.0) |
feature_layers |
dict[str,str] mapping student→teacher layer names (auto-matched if None) |
teacher |
nn.Module, file path, or None (uses original model) |
epochs |
Training epochs (default: 5) |
YAML recipe:
stages:
- type: structured_prune
sparsity: 0.3
- type: distill
method: logit
temperature: 4.0
epochs: 5Recipes are YAML files that define a sequence of stages:
name: balanced
description: "Structured pruning + dynamic int8 quantization"
stages:
- type: prune
method: l1_unstructured
sparsity: 0.25
- type: quantize
method: dynamic
dtype: qint8Built-in recipes: balanced, aggressive
Use custom recipes: lob.compress(model, recipe="path/to/recipe.yaml")
StructuredPrune physically removes neurons and filters, producing smaller architectures with real speedups — no sparse hardware needed.
| Layer Type | What's Removed | Propagation |
|---|---|---|
nn.Linear |
Output neurons (rows) | Downstream Linear input columns |
nn.Conv2d |
Output filters (dim 0) | Downstream Conv2d input channels + BatchNorm2d |
import lobotomizer as lob
# Prune 30% of neurons/filters from all layers
result = lob.Pipeline([
lob.StructuredPrune(sparsity=0.3, criterion="l1"),
]).run(model)
# Exclude specific layers, protect output interface
result = lob.Pipeline([
lob.StructuredPrune(
sparsity=0.4,
protect_output=True, # default: don't prune output layers
exclude_layers={"layer4"}, # skip specific layers
),
]).run(model)Note: Conv2d chains are only detected within nn.Sequential containers to avoid breaking skip/residual connections. Models with residual blocks (ResNet, etc.) will have their conv layers automatically protected.
Replace weight matrices with low-rank SVD approximations, factoring a single layer into two smaller layers:
import lobotomizer as lob
# Fraction-based: keep 50% of singular values
result = lob.Pipeline([
lob.LowRank(rank_fraction=0.5),
]).run(model)
# Energy-based: keep enough singular values for 90% spectral energy
result = lob.Pipeline([
lob.LowRank(rank_fraction=0.9, criterion="energy"),
]).run(model)| Parameter | Description |
|---|---|
rank_fraction |
Fraction of rank to keep (fraction mode) or energy to preserve (energy mode) |
criterion |
"fraction" (default) or "energy" |
min_rank |
Minimum rank to preserve (default: 1) |
min_compression |
Only decompose if result is ≤ this fraction of original params (default: 0.9) |
exclude_layers |
Named layers to skip |
Works on both nn.Linear and nn.Conv2d layers. Conv2d layers are decomposed into a spatial conv (original kernel) followed by a 1×1 conv.
YAML recipe:
stages:
- type: low_rank
rank_fraction: 0.5
criterion: energyExport compressed models to ONNX for deployment with ONNX Runtime, TensorRT, etc:
import lobotomizer as lob
result = lob.compress(model, recipe="balanced")
# From Result object
result.to_onnx("model.onnx", input_shape=(1, 3, 224, 224))
# Standalone function
lob.to_onnx(model, "model.onnx", input_shape=(1, 3, 224, 224))CLI:
lobotomize model.pt --recipe balanced --export-onnx model.onnx --input-shape "1,3,224,224"Optional: install onnx for verification, onnxsim for graph simplification.
Model → [Stage 1] → [Stage 2] → ... → Result
Prune Quantize
- Pipeline — a list of
Stageobjects run sequentially - Stages — each stage (
Prune,Quantize) transforms the model in-place on a deep copy - Profiler — measures param count, size, and FLOPs before/after each stage
- Result — holds the compressed model, profiles, and stage history; can save and summarize
- Recipes — YAML configs that build pipelines from named stages
The original model is never mutated.
See examples/ for complete, runnable scripts:
| Script | What it does |
|---|---|
resnet50_edge.py |
ResNet50 pruned + quantized for edge deployment |
bert_quantize.py |
BERT-base quantized for faster CPU inference |
whisper_compress.py |
Whisper small compressed for on-device transcription |
yolo_edge.py |
YOLOv8n compressed for real-time edge inference |
mobilevit_compress.py |
MobileViT further compressed for ultra-constrained devices |
Each script is self-contained and falls back to a dummy model if optional dependencies aren't installed. Note: not all of these are fully tested yet.
- Prune, Quantize, Pipeline, profiler, recipes, CLI
- Knowledge distillation (logit, feature)
- Low-rank decomposition (SVD)
- Sparsity techniques
- ONNX export (API, CLI, Result method)
- Hardware profiling and optimization
- Search & automation (sweeps, finding lobotomization pipelines to hit given targets)
Over time: progressively support and wrap more techniques, layer types, tools, etc.
Contributions welcome! Let's grow the lobotomization movement.
- Fork & clone
pip install -e ".[dev]"pytest- PR
MIT