Hugging Face Model integration in Superbench #803
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).
Changes:
- Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
- Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
- Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/benchmarks/micro_benchmarks/test_model_source_config.py | Adds unit tests for ModelSourceConfig validation/defaulting. |
| tests/benchmarks/micro_benchmarks/test_huggingface_loader.py | Adds unit tests for HF loader dtype handling, load flow, and size estimation. |
| tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py | Adds integration tests that download real HF models and validate basic forward pass. |
| superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py | Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark. |
| superbench/benchmarks/model_benchmarks/pytorch_lstm.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_llama.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_gpt2.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_cnn.py | Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path. |
| superbench/benchmarks/model_benchmarks/pytorch_bert.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_base.py | Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier. |
| superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py | Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command. |
| superbench/benchmarks/micro_benchmarks/ort_inference_performance.py | Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling. |
| superbench/benchmarks/micro_benchmarks/model_source_config.py | New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs. |
| superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py | New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks. |
| superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py | Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output. |
| examples/benchmarks/tensorrt_inference_performance.py | Updates example script to show in-house vs HF usage via CLI. |
| examples/benchmarks/pytorch_huggingface_models.py | New example demonstrating HF-backed training benchmarks, incl. distributed option. |
| examples/benchmarks/ort_inference_performance.py | Updates ORT example script to show in-house vs HF usage via CLI. |
Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
…e benchmarks
- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
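The initializer filtering mentioned in the commit message can be illustrated with a small sketch. Some exported ONNX graphs list weights in graph.input as well as in graph.initializer, so a benchmark that feeds every graph input would try to supply weights as runtime tensors. The function name `runtime_input_names` and the stand-in model below are hypothetical, not taken from the PR; the sketch only assumes an object shaped like an ONNX ModelProto, where `graph.input` and `graph.initializer` entries each carry a `.name`.

```python
from types import SimpleNamespace


def runtime_input_names(model):
    """Return the names of graph inputs that are not backed by an initializer.

    `model` only needs `graph.input` and `graph.initializer`, each a sequence
    of objects with a `.name` attribute (the shape of an ONNX ModelProto).
    """
    initializer_names = {init.name for init in model.graph.initializer}
    return [inp.name for inp in model.graph.input if inp.name not in initializer_names]


# Stand-in for a loaded ONNX model; all names here are illustrative only.
fake_model = SimpleNamespace(
    graph=SimpleNamespace(
        input=[SimpleNamespace(name=n) for n in ('input_ids', 'attention_mask', 'encoder.weight')],
        initializer=[SimpleNamespace(name='encoder.weight')],
    )
)

# Only the tensors that must be fed at inference time remain.
print(runtime_input_names(fake_model))
```

With the real onnx package, the object returned by `onnx.load(path)` could be passed in place of the stand-in.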
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Codecov Report
❌ Patch coverage is

| Coverage Diff | main | #803 | +/- |
|---|---|---|---|
| Coverage | 86.02% | 82.54% | -3.48% |
| Files | 103 | 105 | +2 |
| Lines | 7950 | 8498 | +548 |
| Hits | 6839 | 7015 | +176 |
| Misses | 1111 | 1483 | +372 |
# Export directly to final destination to avoid path issues with external data
onnx_path = exporter.export_huggingface_model(
    model=hf_model,
    model_name=model_name_with_precision,
    batch_size=self._args.batch_size,
    seq_length=self._args.seq_length,
    output_dir=str(proc_output_path),
)

loader = HuggingFaceModelLoader(allow_remote_code=allow_remote_code)
hf_model, _, _ = loader.load_model_from_config(model_config, device='cpu')

onnx_path = exporter.export_huggingface_model(
    model=hf_model,
    model_name=model_name,
    batch_size=self._args.batch_size,
    seq_length=self._args.seq_length,
    output_dir=output_dir,
)

from superbench.benchmarks.micro_benchmarks.huggingface_model_loader import HuggingFaceModelLoader    # noqa: E402
from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig    # noqa: E402

Uses prajjwal1/bert-tiny which is a small public BERT model (~17MB).
"""
model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

Uses distilbert/distilgpt2 which is a small public GPT-2 model (~82MB).
"""
model, config, tokenizer = loader.load_model('distilbert/distilgpt2', device='cpu')

"""Test loading model using ModelSourceConfig via load_model_from_config."""
config = ModelSourceConfig(source='huggingface', identifier='prajjwal1/bert-tiny', torch_dtype='float32')

model, hf_config, tokenizer = loader.load_model_from_config(config, device='cpu')

def test_load_model_with_dtype(self, loader):
    """Test loading model and converting dtype after load."""
    model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

@pytest.mark.skipif(not torch.cuda.is_available(), reason='Requires GPU')
def test_load_model_to_gpu(self, loader):
    """Test loading model and moving to GPU."""
    model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

def test_architecture_detection(self, loader):
    """Test that architecture is correctly detected from loaded model."""
    model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

Adds support for loading and benchmarking models from HuggingFace Hub across the inference micro-benchmarks (ORT and TensorRT inference). Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.
SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.
Key Changes
New modules:
- HuggingFaceModelLoader: Downloads, caches, and loads models from HF Hub. Estimates parameter count from model config (few KB) and checks GPU memory before downloading full weights to avoid failed multi-GB downloads.
- ModelSourceConfig: Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
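The config-only memory pre-check can be sketched roughly as follows. This is not the loader's actual code: `TransformerShape`, `estimate_param_count`, `fits_in_memory`, and the 1.2 overhead factor are all hypothetical, and the parameter formula is a coarse approximation for a BERT-style encoder that ignores biases, layer norms, and position embeddings.

```python
from dataclasses import dataclass

# Bytes per parameter for a few common dtypes.
BYTES_PER_PARAM = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'int8': 1}


@dataclass
class TransformerShape:
    """Hypothetical subset of the fields found in a HF config.json."""
    vocab_size: int
    hidden_size: int
    num_hidden_layers: int
    intermediate_size: int


def estimate_param_count(shape):
    """Approximate parameter count from config fields alone (no weights needed)."""
    h = shape.hidden_size
    embedding = shape.vocab_size * h               # token embedding matrix
    attention = 4 * h * h                          # Q, K, V and output projections
    feed_forward = 2 * h * shape.intermediate_size  # up and down projections
    return embedding + shape.num_hidden_layers * (attention + feed_forward)


def fits_in_memory(param_count, dtype, free_bytes, overhead=1.2):
    """Check whether the weights, padded by an assumed overhead factor, fit."""
    required = param_count * BYTES_PER_PARAM[dtype] * overhead
    return required <= free_bytes


bert_base_like = TransformerShape(
    vocab_size=30522, hidden_size=768, num_hidden_layers=12, intermediate_size=3072
)
params = estimate_param_count(bert_base_like)
print(params)                                                    # 108375552
print(fits_in_memory(params, 'float16', free_bytes=1024**3))     # True
```

The point of the design is that the check costs only a config download, so an oversized model can be rejected before any weights are fetched.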
Micro-benchmarks (inference):
- ORT inference: Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs automatically.
- TensorRT inference: Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the exported ONNX graph.
- ONNX exporter: New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models (>2GB).
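One way the vision/NLP auto-detection and dynamic axes could fit together is sketched below. The detection rule (treating a config with image_size or num_channels as a vision model), the function name `build_input_spec`, and the inclusion of attention_mask are assumptions made for illustration, not the PR's logic. The dynamic-axes dictionary follows the shape accepted by the `dynamic_axes` argument of `torch.onnx.export` (input name mapped to axis index mapped to axis label).

```python
def build_input_spec(config, batch_size, seq_length):
    """Return (dummy input shapes, dynamic axes) for an ONNX export call.

    `config` is a plain dict of HF config fields. The vision check below is
    a heuristic assumed for this sketch.
    """
    is_vision = 'image_size' in config or 'num_channels' in config
    if is_vision:
        channels = config.get('num_channels', 3)
        size = config.get('image_size', 224)
        # image_size may be a single int or a (height, width) pair.
        height, width = (size, size) if isinstance(size, int) else tuple(size)
        shapes = {'pixel_values': (batch_size, channels, height, width)}
        dynamic_axes = {'pixel_values': {0: 'batch_size'}}
    else:
        shapes = {
            'input_ids': (batch_size, seq_length),
            'attention_mask': (batch_size, seq_length),  # assumed extra NLP input
        }
        dynamic_axes = {name: {0: 'batch_size', 1: 'seq_length'} for name in shapes}
    return shapes, dynamic_axes


vision_shapes, vision_axes = build_input_spec({'image_size': 224, 'num_channels': 3}, 8, 128)
nlp_shapes, nlp_axes = build_input_spec({'vocab_size': 30522}, 4, 128)
print(vision_shapes)
print(nlp_shapes)
```

Marking the batch (and sequence) dimensions as dynamic lets a single exported graph be reused across benchmark batch sizes.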
Testing
Usage
Training benchmark
ORT inference
python examples/benchmarks/ort_inference_performance.py \
  --model_source huggingface --model_identifier bert-base-uncased
TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py \
  --model_source huggingface --model_identifier microsoft/resnet-50
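For the trtexec engine build step in the TensorRT flow, a command could be assembled along these lines. This is a sketch only: the helper `build_trtexec_command` and the file names are hypothetical, and the PR's actual command construction may differ. It uses the trtexec flags --onnx, --saveEngine, --optShapes, --fp16, and --int8, and assumes the exported graph has dynamic input dimensions (shape flags apply only in that case).

```python
def build_trtexec_command(onnx_path, engine_path, input_shapes, precision='fp16'):
    """Build an argument list for trtexec from detected input shapes.

    `input_shapes` maps input names to dimension tuples, for example
    {'input_ids': (8, 128)}.
    """
    shape_specs = []
    for name, dims in input_shapes.items():
        dim_text = 'x'.join(str(d) for d in dims)  # trtexec separates dims with 'x'
        shape_specs.append(f'{name}:{dim_text}')
    command = [
        'trtexec',
        f'--onnx={onnx_path}',
        f'--saveEngine={engine_path}',
        '--optShapes=' + ','.join(shape_specs),
    ]
    if precision in ('fp16', 'int8'):
        command.append(f'--{precision}')
    return command


cmd = build_trtexec_command('model.onnx', 'model.plan', {'input_ids': (8, 128)})
print(' '.join(cmd))
```

Returning a list rather than a single string makes it straightforward to pass the command to `subprocess.run` without shell quoting concerns.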
Gated models
export HF_TOKEN=hf_xxxxx
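How a loader might pick up the token is sketched below. The helper names `resolve_hf_token` and `mask_token` are hypothetical, and the precedence rule (an explicitly supplied token wins over the environment variable) is an assumption rather than documented behavior of this PR.

```python
import os


def resolve_hf_token(explicit_token=None, env=os.environ):
    """Prefer an explicitly supplied token, else fall back to HF_TOKEN."""
    token = explicit_token or env.get('HF_TOKEN')
    return token or None


def mask_token(token):
    """Hide all but the first three characters so the token is safe to log."""
    if not token:
        return '<none>'
    return token[:3] + '*' * (len(token) - 3)


token = resolve_hf_token(env={'HF_TOKEN': 'hf_xxxxx'})
print(mask_token(token))  # hf_*****
```

Masking before logging matters here because benchmark logs are often shared, and a gated-model token grants account-level access.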