Hugging Face Model integration in Superbench by Aishwarya-Tonpe · Pull Request #803 · microsoft/superbenchmark

Aishwarya-Tonpe · 2026-04-13T17:35:59Z

Adds support for loading and benchmarking models from the HuggingFace Hub in the inference micro-benchmarks (ORT and TensorRT inference). Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.

SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.

Key Changes

New modules:

HuggingFaceModelLoader: Downloads, caches, and loads models from HF Hub. Estimates the parameter count from the model config (a few KB) and checks GPU memory before downloading the full weights, to avoid failed multi-GB downloads.
ModelSourceConfig: Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
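The config-only pre-check described for HuggingFaceModelLoader can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function names, the 12 * layers * hidden^2 + vocab * hidden approximation for transformer models, and the 1.2x overhead factor are all assumptions chosen for the example.

```python
# Hypothetical sketch of a config-only memory pre-check. The names and
# constants below are illustrative assumptions, not the loader's real API.

BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_param_count(config: dict) -> int:
    """Roughly estimate transformer parameters from config fields.

    Uses the common approximation 12 * layers * hidden^2 for the blocks
    (attention plus a 4x MLP) and vocab * hidden for the token embeddings.
    Biases, layer norms, and position embeddings are ignored.
    """
    hidden = config["hidden_size"]
    layers = config["num_hidden_layers"]
    vocab = config["vocab_size"]
    return 12 * layers * hidden * hidden + vocab * hidden


def fits_in_memory(config: dict, dtype: str, free_bytes: int, overhead: float = 1.2) -> bool:
    """Return True if the estimated weights (with headroom) fit in free_bytes."""
    weight_bytes = estimate_param_count(config) * BYTES_PER_DTYPE[dtype]
    return weight_bytes * overhead <= free_bytes


# Config values in the style of a BERT-base sized model.
bert_base_like = {"hidden_size": 768, "num_hidden_layers": 12, "vocab_size": 30522}
params = estimate_param_count(bert_base_like)
print(params)  # 108375552
print(fits_in_memory(bert_base_like, "float32", free_bytes=16 * 1024**3))  # True
```

In a real run, the free-memory figure could come from torch.cuda.mem_get_info(), and the config dictionary from the small config file on the Hub, so the check runs before any weights are fetched.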

Micro-benchmarks (inference):
ORT inference: Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs automatically.
TensorRT inference: Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the exported ONNX graph.
ONNX exporter: New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models (>2GB).
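To make the vision/NLP auto-detection and dynamic axes concrete, here is a hedged sketch of how dummy-input shapes and a dynamic_axes mapping might be chosen before an ONNX export, and how the same shapes could be formatted as trtexec shape flags. The helper names, the is_vision flag, and the 3x224x224 default image size are assumptions for illustration. Only the dynamic_axes dictionary layout (as accepted by torch.onnx.export) and the --minShapes/--optShapes/--maxShapes flag format (as accepted by trtexec) are existing conventions.

```python
# Hypothetical helpers; they do not call torch or trtexec themselves.


def build_export_spec(is_vision, batch_size, seq_length=128, image_size=224):
    """Return (input shapes, dynamic axes) for a vision or NLP model."""
    if is_vision:
        shapes = {"pixel_values": (batch_size, 3, image_size, image_size)}
        # Only the batch dimension is left dynamic for images.
        dynamic_axes = {"pixel_values": {0: "batch_size"}}
    else:
        shapes = {
            "input_ids": (batch_size, seq_length),
            "attention_mask": (batch_size, seq_length),
        }
        # Batch and sequence length are both dynamic for token inputs.
        dynamic_axes = {
            name: {0: "batch_size", 1: "sequence_length"} for name in shapes
        }
    return shapes, dynamic_axes


def trtexec_shape_flag(flag, shapes):
    """Format shapes as e.g. --optShapes=input_ids:8x128,attention_mask:8x128."""
    spec = ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )
    return f"--{flag}={spec}"


nlp_shapes, nlp_axes = build_export_spec(is_vision=False, batch_size=8)
vision_shapes, vision_axes = build_export_spec(is_vision=True, batch_size=4)
print(nlp_axes)
print(trtexec_shape_flag("optShapes", nlp_shapes))
print(trtexec_shape_flag("optShapes", vision_shapes))
```

The dynamic_axes dictionary would be passed to the exporter, while the shape flags would accompany --onnx and --saveEngine on the trtexec command line.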

Testing

test_model_source_config.py — Unit tests for validation, defaults, and edge cases.
test_huggingface_loader.py — Unit tests for dtype conversion, model size calculation, memory estimation, and param count estimation.
test_huggingface_e2e.py — End-to-end integration tests covering micro-benchmarks with real HF models.
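The kind of validation and defaulting that test_model_source_config.py covers might look roughly like the sketch below. The class name, the allowed value lists, and the revision, token, and device_map field names are assumptions. Only source, identifier, and torch_dtype appear as keyword arguments in the PR's test code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed allowed values, for illustration only.
VALID_SOURCES = ("in-house", "huggingface")
VALID_DTYPES = ("float32", "float16", "bfloat16")


@dataclass
class ModelSourceConfigSketch:
    """Hypothetical stand-in for ModelSourceConfig, not the PR's class."""

    source: str = "in-house"
    identifier: Optional[str] = None
    torch_dtype: str = "float32"
    revision: Optional[str] = None    # assumed field name
    token: Optional[str] = None       # assumed field name
    device_map: Optional[str] = None  # assumed field name

    def __post_init__(self):
        # Reject unknown sources early, before any download is attempted.
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown model source: {self.source!r}")
        # A Hub model cannot be resolved without an identifier.
        if self.source == "huggingface" and not self.identifier:
            raise ValueError("identifier is required when source is 'huggingface'")
        if self.torch_dtype not in VALID_DTYPES:
            raise ValueError(f"unsupported dtype: {self.torch_dtype!r}")


cfg = ModelSourceConfigSketch(source="huggingface", identifier="prajjwal1/bert-tiny")
print(cfg.torch_dtype)  # float32
```

Validating in __post_init__ keeps invalid combinations from reaching the loader, which is the behavior the unit tests for validation, defaults, and edge cases would exercise.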

Usage

Training benchmark

ORT inference
python examples/benchmarks/ort_inference_performance.py \
    --model_source huggingface --model_identifier bert-base-uncased

TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py \
    --model_source huggingface --model_identifier microsoft/resnet-50

Gated models
export HF_TOKEN=hf_xxxxx
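As a rough illustration of how a benchmark process could pick up the exported token, the sketch below prefers an explicitly supplied token and otherwise falls back to the HF_TOKEN environment variable. The helper names are hypothetical and this is not necessarily how the PR resolves the token.

```python
import os
from typing import Optional


def resolve_hf_token(explicit_token: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed token, else fall back to HF_TOKEN."""
    if explicit_token:
        return explicit_token
    # An unset or empty variable is treated as "no token".
    return os.environ.get("HF_TOKEN") or None


def redact(token: Optional[str]) -> str:
    """Return a log-safe form of the token that never prints the secret."""
    return "<none>" if not token else token[:3] + "***"


print(redact(resolve_hf_token("hf_xxxxx")))  # hf_***
```

The resolved value could then be handed to the model-loading call. Logging only the redacted form avoids leaking the credential into benchmark output.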

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).

Changes:

Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

File	Description
tests/benchmarks/micro_benchmarks/test_model_source_config.py	Adds unit tests for `ModelSourceConfig` validation/defaulting.
tests/benchmarks/micro_benchmarks/test_huggingface_loader.py	Adds unit tests for HF loader dtype handling, load flow, and size estimation.
tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py	Adds integration tests that download real HF models and validate basic forward pass.
superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py	Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark.
superbench/benchmarks/model_benchmarks/pytorch_lstm.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_llama.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_gpt2.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_cnn.py	Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path.
superbench/benchmarks/model_benchmarks/pytorch_bert.py	Adds HF-loading path + wrapper and refactors in-house model creation.
superbench/benchmarks/model_benchmarks/pytorch_base.py	Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier.
superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py	Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command.
superbench/benchmarks/micro_benchmarks/ort_inference_performance.py	Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling.
superbench/benchmarks/micro_benchmarks/model_source_config.py	New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs.
superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py	New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks.
superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py	Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output.
examples/benchmarks/tensorrt_inference_performance.py	Updates example script to show in-house vs HF usage via CLI.
examples/benchmarks/pytorch_huggingface_models.py	New example demonstrating HF-backed training benchmarks, incl. distributed option.
examples/benchmarks/ort_inference_performance.py	Updates ORT example script to show in-house vs HF usage via CLI.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

…e benchmarks
- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
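The initializer filtering mentioned in this commit message can be illustrated with a small sketch. Some exported ONNX graphs list weight tensors in graph.input as well as in graph.initializer, so feeding every entry of graph.input would wrongly treat weights as runtime inputs. The function name and the stand-in graph below are assumptions. With the onnx package installed, the same function should work on onnx.load(path).graph, whose input and initializer entries each have a name attribute.

```python
from types import SimpleNamespace


def real_input_names(graph):
    """Return names in graph.input that are not also initializers (weights)."""
    initializer_names = {init.name for init in graph.initializer}
    return [inp.name for inp in graph.input if inp.name not in initializer_names]


# Stand-in for a loaded ONNX graph so the sketch runs without onnx installed.
fake_graph = SimpleNamespace(
    input=[
        SimpleNamespace(name="input_ids"),
        SimpleNamespace(name="attention_mask"),
        SimpleNamespace(name="embeddings.weight"),
    ],
    initializer=[SimpleNamespace(name="embeddings.weight")],
)

print(real_input_names(fake_graph))  # ['input_ids', 'attention_mask']
```

Only the remaining names would need dummy tensors or shape specifications when building the inference feed.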

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

codecov · 2026-06-02T19:05:23Z

Codecov Report

❌ Patch coverage is 33.68984% with 372 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.54%. Comparing base (67298ae) to head (621c826).

Files with missing lines	Patch %	Lines
...marks/micro_benchmarks/huggingface_model_loader.py	45.81%	110 Missing ⚠️
...micro_benchmarks/tensorrt_inference_performance.py	19.04%	102 Missing ⚠️
...arks/micro_benchmarks/ort_inference_performance.py	22.52%	86 Missing ⚠️
...nchmarks/micro_benchmarks/_export_torch_to_onnx.py	16.27%	72 Missing ⚠️
...benchmarks/micro_benchmarks/model_source_config.py	94.28%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #803      +/-   ##
==========================================
- Coverage   86.02%   82.54%   -3.48%     
==========================================
  Files         103      105       +2     
  Lines        7950     8498     +548     
==========================================
+ Hits         6839     7015     +176     
- Misses       1111     1483     +372
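The percentages in the coverage diff can be reproduced from the hit and line counts. The sketch below assumes the figures are truncated (not rounded) to two decimals, which happens to match the reported values. The function name is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated to two decimal places."""
    return (hits * 10000 // lines) / 100


base = coverage_percent(6839, 7950)  # main
head = coverage_percent(7015, 8498)  # PR #803
delta = round(head - base, 2)
print(base, head, delta)  # 86.02 82.54 -3.48

# The per-file "Missing" counts add up to the total reported for the patch.
missing_by_file = [110, 102, 86, 72, 2]
print(sum(missing_by_file))  # 372
```

The misses in each column are simply lines minus hits: 7950 - 6839 = 1111 on main and 8498 - 7015 = 1483 on the PR head.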

Flag	Coverage Δ
cpu-python3.10-unit-test	`68.03% <28.08%> (-2.85%)`	⬇️
cpu-python3.7-unit-test	`67.51% <28.21%> (-2.80%)`	⬇️
cuda-unit-test	`80.54% <33.09%> (-3.42%)`	⬇️

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

+        # Export directly to final destination to avoid path issues with external data
+        onnx_path = exporter.export_huggingface_model(
+            model=hf_model,
+            model_name=model_name_with_precision,
+            batch_size=self._args.batch_size,
+            seq_length=self._args.seq_length,
+            output_dir=str(proc_output_path),
+        )


+        loader = HuggingFaceModelLoader(allow_remote_code=allow_remote_code)
+        hf_model, _, _ = loader.load_model_from_config(model_config, device='cpu')


+        onnx_path = exporter.export_huggingface_model(
+            model=hf_model,
+            model_name=model_name,
+            batch_size=self._args.batch_size,
+            seq_length=self._args.seq_length,
+            output_dir=output_dir,
+        )


+from superbench.benchmarks.micro_benchmarks.huggingface_model_loader import HuggingFaceModelLoader    # noqa: E402
+from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig    # noqa: E402


+
+        Uses prajjwal1/bert-tiny which is a small public BERT model (~17MB).
+        """
+        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')


+
+        Uses distilbert/distilgpt2 which is a small public GPT-2 model (~82MB).
+        """
+        model, config, tokenizer = loader.load_model('distilbert/distilgpt2', device='cpu')


+        """Test loading model using ModelSourceConfig via load_model_from_config."""
+        config = ModelSourceConfig(source='huggingface', identifier='prajjwal1/bert-tiny', torch_dtype='float32')
+
+        model, hf_config, tokenizer = loader.load_model_from_config(config, device='cpu')


+
+    def test_load_model_with_dtype(self, loader):
+        """Test loading model and converting dtype after load."""
+        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')


+    @pytest.mark.skipif(not torch.cuda.is_available(), reason='Requires GPU')
+    def test_load_model_to_gpu(self, loader):
+        """Test loading model and moving to GPU."""
+        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')


+
+    def test_architecture_detection(self, loader):
+        """Test that architecture is correctly detected from loaded model."""
+        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')


Aishwarya-Tonpe requested a review from a team as a code owner April 13, 2026 17:36

Copilot AI review requested due to automatic review settings April 13, 2026 17:36

Copilot AI reviewed Apr 13, 2026

Copilot started reviewing on behalf of Aishwarya-Tonpe April 13, 2026 17:48

Aishwarya-Tonpe force-pushed the hf-models-clean branch from f689460 to 2a47dc8 Compare April 13, 2026 19:33

Aishwarya-Tonpe changed the title ~~Hf models clean~~ Hugging Face Model integration in Superbench Apr 14, 2026

Copilot AI review requested due to automatic review settings April 14, 2026 17:30

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 2a47dc8 to a61db26 Compare April 14, 2026 17:30

Copilot AI reviewed Apr 14, 2026

Aishwarya-Tonpe force-pushed the hf-models-clean branch from a61db26 to 6bebb38 Compare April 14, 2026 18:27

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 18:34

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 6bebb38 to 4eec2f7 Compare April 14, 2026 20:05

Copilot AI review requested due to automatic review settings April 14, 2026 20:34

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 4eec2f7 to 2f24e0f Compare April 14, 2026 20:34

Copilot AI reviewed Apr 14, 2026

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 2f24e0f to 18df07b Compare April 14, 2026 20:47

Copilot AI review requested due to automatic review settings April 14, 2026 20:51

Copilot AI reviewed Apr 14, 2026

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 22:44

Copilot started reviewing on behalf of Aishwarya-Tonpe April 14, 2026 23:07

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 12a65ad to 7632427 Compare April 20, 2026 18:28

Copilot AI review requested due to automatic review settings April 23, 2026 22:31

Copilot started reviewing on behalf of Aishwarya-Tonpe April 23, 2026 22:31

Aishwarya-Tonpe force-pushed the hf-models-clean branch 2 times, most recently from dca9515 to 7094628 Compare April 23, 2026 22:37

Copilot AI reviewed Apr 23, 2026

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 7094628 to 2ca3e68 Compare April 29, 2026 23:39

polarG reviewed May 26, 2026

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated

polarG reviewed May 26, 2026

Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Outdated

Copilot AI reviewed Jun 2, 2026

Potential fix for pull request finding

18f13ef

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot AI review requested due to automatic review settings June 2, 2026 17:29

Copilot started reviewing on behalf of Aishwarya-Tonpe June 2, 2026 17:29

Aishwarya-Tonpe and others added 3 commits June 2, 2026 10:29

Potential fix for pull request finding

83a533f

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Potential fix for pull request finding

44da2e1

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Potential fix for pull request finding

864a8e9

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot AI reviewed Jun 2, 2026

Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py

Potential fix for pull request finding

54e4153

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Aishwarya-Tonpe force-pushed the hf-models-clean branch from c33dda1 to 54e4153 Compare June 2, 2026 17:54

Potential fix for pull request finding

a5d845c

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Copilot AI review requested due to automatic review settings June 2, 2026 17:55

Copilot started reviewing on behalf of Aishwarya-Tonpe June 2, 2026 17:56

Copilot AI reviewed Jun 2, 2026

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py

Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated

Copilot AI review requested due to automatic review settings June 2, 2026 19:40

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 4603264 to 40eb599 Compare June 2, 2026 19:40

Copilot started reviewing on behalf of Aishwarya-Tonpe June 2, 2026 19:41

Copilot AI reviewed Jun 2, 2026

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 40eb599 to beebe48 Compare June 2, 2026 19:57

Copilot AI review requested due to automatic review settings June 2, 2026 20:10

Aishwarya-Tonpe force-pushed the hf-models-clean branch from beebe48 to 603ccf6 Compare June 2, 2026 20:10

Copilot started reviewing on behalf of Aishwarya-Tonpe June 2, 2026 20:11

Copilot AI reviewed Jun 2, 2026

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py

Potential fix for pull request finding

09f14b2

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Aishwarya-Tonpe force-pushed the hf-models-clean branch from 603ccf6 to 09f14b2 Compare June 3, 2026 17:45

Merge branch 'main' into hf-models-clean

621c826

Copilot AI review requested due to automatic review settings June 3, 2026 20:40

Copilot started reviewing on behalf of polarG June 3, 2026 20:40

Copilot AI reviewed Jun 3, 2026
