Hugging Face Model integration in Superbench #803

Open
Aishwarya-Tonpe wants to merge 12 commits into main from hf-models-clean

@Aishwarya-Tonpe Aishwarya-Tonpe commented Apr 13, 2026

Adds support for loading and benchmarking models from the HuggingFace Hub in the inference micro-benchmarks (ORT and TensorRT). Users can run any compatible HF-hosted model through the existing benchmark harness using --model_source huggingface --model_identifier <org/model>.

SuperBench previously only supported in-house model definitions with hardcoded architectures. Adding new models required code changes. This PR allows benchmarking any compatible HuggingFace model with a CLI flag change, including gated models via HF_TOKEN.
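
As an illustration of the CLI flag change described above, here is a minimal sketch of how the two flags could be parsed and cross-checked. The flag names come from this description; the `in-house` default, the helper names, and the validation rule are assumptions for illustration, not the PR's actual argument handling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two model-source flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Model source flags sketch")
    parser.add_argument(
        "--model_source",
        choices=["in-house", "huggingface"],
        default="in-house",  # assumed default, not confirmed by the PR
        help="Where the model definition comes from.",
    )
    parser.add_argument(
        "--model_identifier",
        default=None,
        help="HF Hub identifier such as <org/model>.",
    )
    return parser


def parse_model_args(argv):
    """Parse argv and reject a HuggingFace source without an identifier."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.model_source == "huggingface" and not args.model_identifier:
        # parser.error prints usage and raises SystemExit
        parser.error("--model_identifier is required with --model_source huggingface")
    return args
```

Calling `parse_model_args(["--model_source", "huggingface", "--model_identifier", "bert-base-uncased"])` returns a namespace carrying both values, while omitting the identifier exits with a usage error.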

Key Changes

New modules:

  • HuggingFaceModelLoader — Downloads, caches, and loads models from HF Hub. Estimates parameter count from model config (few KB) and checks GPU
    memory before downloading full weights to avoid failed multi-GB downloads.

  • ModelSourceConfig — Dataclass for model source configuration (in-house / huggingface), dtype, revision, auth token, and device mapping.
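
The config-only pre-check described for HuggingFaceModelLoader can be sketched roughly as follows. This is not the loader's implementation: the parameter formula is a coarse transformer approximation (embeddings plus attention and MLP weights, ignoring biases and layer norms), and the function names, the dtype table, and the 1.2 overhead factor are all assumptions.

```python
# Bytes per element for a few common dtypes (assumed table for this sketch).
BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_param_count(config: dict) -> int:
    """Roughly estimate transformer parameters from config fields only."""
    hidden = config["hidden_size"]
    layers = config["num_hidden_layers"]
    vocab = config["vocab_size"]
    intermediate = config.get("intermediate_size", 4 * hidden)

    embedding = vocab * hidden           # token embedding matrix
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 2 * hidden * intermediate      # up and down projections
    return embedding + layers * (attention + mlp)


def fits_in_memory(param_count: int, dtype: str, free_bytes: int,
                   overhead: float = 1.2) -> bool:
    """Return True if the weights, padded by an overhead factor, fit in free_bytes."""
    required = param_count * BYTES_PER_DTYPE[dtype] * overhead
    return required <= free_bytes


# BERT-base-like config values, used only as an example input.
bert_like = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "vocab_size": 30522,
    "intermediate_size": 3072,
}
params = estimate_param_count(bert_like)
```

In a real run the config dictionary would presumably come from something like `AutoConfig.from_pretrained(...).to_dict()` and the free byte count from `torch.cuda.mem_get_info()`, so the check costs a few KB of download rather than the full weights.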

Micro-benchmarks (inference):

  • ORT inference — Downloads HF model → exports to ONNX → runs ORT inference. Handles both vision (pixel_values) and NLP (input_ids) inputs
    automatically.

  • TensorRT inference — Same flow: download → ONNX export → trtexec engine build → inference. Includes dynamic input shape detection from the
    exported ONNX graph.

  • ONNX exporter — New export_huggingface_model() method with vision/NLP auto-detection, dynamic axes, and external data support for large models
    (>2GB).
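
The vision/NLP auto-detection and dynamic axes mentioned above could look something like the sketch below. It is an assumption-laden illustration: the detection heuristic (looking for `num_channels` or `image_size` versus `vocab_size` in the config), the function names, and the default image size are hypothetical and not taken from export_huggingface_model(). The `dynamic_axes` layout follows the dictionary shape that `torch.onnx.export` accepts.

```python
def detect_input_kind(config: dict) -> str:
    """Guess whether a model takes images or token ids from its config keys."""
    if "num_channels" in config or "image_size" in config:
        return "vision"
    if "vocab_size" in config:
        return "nlp"
    raise ValueError("cannot infer input kind from config")


def build_export_spec(config: dict, batch_size: int, seq_length: int) -> dict:
    """Return input names, dummy-input shapes, and dynamic axes for ONNX export."""
    if detect_input_kind(config) == "vision":
        channels = config.get("num_channels", 3)
        size = config.get("image_size", 224)  # assumes a square integer size
        return {
            "input_names": ["pixel_values"],
            "shapes": {"pixel_values": (batch_size, channels, size, size)},
            "dynamic_axes": {"pixel_values": {0: "batch_size"}},
        }
    names = ["input_ids", "attention_mask"]
    return {
        "input_names": names,
        "shapes": {name: (batch_size, seq_length) for name in names},
        # Both batch and sequence dimensions stay dynamic for NLP inputs.
        "dynamic_axes": {name: {0: "batch_size", 1: "sequence_length"} for name in names},
    }
```

A config exposing both image and vocabulary fields (a multimodal model, for example) would be classified as vision by this heuristic, so a real exporter would need a more careful rule.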

Testing

  • test_model_source_config.py — Unit tests for validation, defaults, and edge cases.
  • test_huggingface_loader.py — Unit tests for dtype conversion, model size calculation, memory estimation, and param count estimation.
  • test_huggingface_e2e.py — End-to-end integration tests covering micro-benchmarks with real HF models.
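
To make the "validation, defaults, and edge cases" concrete, here is a hedged sketch of a dataclass with that kind of validation. The field names `source`, `identifier`, and `torch_dtype` appear in the test snippets quoted later in this thread; every other field, the defaults, the accepted dtype list, and the class name `ModelSourceConfigSketch` are assumptions rather than the PR's real ModelSourceConfig.

```python
from dataclasses import dataclass
from typing import Optional

VALID_SOURCES = ("in-house", "huggingface")
VALID_DTYPES = ("float32", "float16", "bfloat16")  # assumed set


@dataclass
class ModelSourceConfigSketch:
    """Hypothetical stand-in for a model source configuration."""

    source: str = "in-house"
    identifier: Optional[str] = None
    torch_dtype: str = "float32"
    revision: Optional[str] = None    # assumed field
    token: Optional[str] = None       # assumed field
    device_map: Optional[str] = None  # assumed field

    def __post_init__(self) -> None:
        # Validate eagerly so a bad config fails before any download starts.
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if self.source == "huggingface" and not self.identifier:
            raise ValueError("identifier is required for source='huggingface'")
        if self.torch_dtype not in VALID_DTYPES:
            raise ValueError(f"unsupported torch_dtype: {self.torch_dtype!r}")
```

Unit tests in the style listed above would then assert the defaults and check that each invalid combination raises `ValueError`.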

Usage

Training benchmark

ORT inference
python examples/benchmarks/ort_inference_performance.py \
  --model_source huggingface --model_identifier bert-base-uncased

TensorRT inference
python examples/benchmarks/tensorrt_inference_performance.py \
  --model_source huggingface --model_identifier microsoft/resnet-50
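
For orientation, the engine-build step behind the TensorRT command above could assemble a trtexec invocation along these lines. This is a sketch under assumptions: the helper names are hypothetical, the choice of batch size 1 for the minimum shape is illustrative, and the real benchmark may pass different options. The flags used (`--onnx`, `--saveEngine`, `--minShapes`, `--optShapes`, `--maxShapes`, `--fp16`) are standard trtexec options.

```python
def shape_arg(shapes: dict) -> str:
    """Format {name: dims} as trtexec's name:AxBxC,name2:... shape syntax."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )


def build_trtexec_command(onnx_path: str, engine_path: str,
                          input_shapes: dict, fp16: bool = True) -> list:
    """Build a trtexec argument list with a dynamic batch dimension."""
    # Minimum profile: same shapes with the batch dimension set to 1 (assumption).
    min_shapes = {name: (1,) + tuple(dims[1:]) for name, dims in input_shapes.items()}
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--minShapes={shape_arg(min_shapes)}",
        f"--optShapes={shape_arg(input_shapes)}",
        f"--maxShapes={shape_arg(input_shapes)}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where trtexec is installed.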

Gated models
export HF_TOKEN=hf_xxxxx
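
A minimal sketch of how a benchmark might pick up the token exported above, assuming an explicit argument takes precedence over the HF_TOKEN environment variable. The function names and the precedence order are assumptions; only the HF_TOKEN variable name comes from this PR.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(explicit_token: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an explicit token if given, else HF_TOKEN from the environment."""
    env = os.environ if env is None else env
    token = explicit_token or env.get("HF_TOKEN")
    if token is None or not token.strip():
        return None  # treat empty or whitespace-only values as "no token"
    return token.strip()


def mask_token(token: Optional[str]) -> str:
    """Produce a log-safe form that never reveals the full token."""
    if not token:
        return "<none>"
    return token[:3] + "***"
```

Logging `mask_token(resolve_hf_token())` rather than the raw value keeps credentials out of benchmark output.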

@Aishwarya-Tonpe Aishwarya-Tonpe requested a review from a team as a code owner April 13, 2026 17:36
Copilot AI review requested due to automatic review settings April 13, 2026 17:36
Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds HuggingFace Hub as a first-class model source across SuperBench training benchmarks and ORT/TensorRT inference micro-benchmarks, enabling users to benchmark arbitrary HF models via CLI flags (including gated models via HF_TOKEN).

Changes:

  • Introduces ModelSourceConfig and HuggingFaceModelLoader for unified HF model configuration/loading and memory-fit checks.
  • Extends PyTorch model benchmarks to optionally load HF backbones and wrap them with task-specific heads.
  • Adds HF→ONNX export support and integrates HF flows into ORT and TensorRT inference micro-benchmarks, plus new tests and examples.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/benchmarks/micro_benchmarks/test_model_source_config.py | Adds unit tests for ModelSourceConfig validation/defaulting. |
| tests/benchmarks/micro_benchmarks/test_huggingface_loader.py | Adds unit tests for HF loader dtype handling, load flow, and size estimation. |
| tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py | Adds integration tests that download real HF models and validate basic forward pass. |
| superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py | Adds HF config customization + wrapper and HF-loading branch for Mixtral benchmark. |
| superbench/benchmarks/model_benchmarks/pytorch_lstm.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_llama.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_gpt2.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_cnn.py | Adds HF-loading path + wrapper for HF vision backbones, keeps in-house torchvision path. |
| superbench/benchmarks/model_benchmarks/pytorch_bert.py | Adds HF-loading path + wrapper and refactors in-house model creation. |
| superbench/benchmarks/model_benchmarks/pytorch_base.py | Adds shared HF model loading flow, memory estimation, and CLI args for model source/identifier. |
| superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py | Adds HF model preprocessing: config-only memory check, HF load, ONNX export, TRT build command. |
| superbench/benchmarks/micro_benchmarks/ort_inference_performance.py | Adds HF preprocessing (config memory check, HF load, ONNX export/quantize) + dynamic input handling. |
| superbench/benchmarks/micro_benchmarks/model_source_config.py | New dataclass encapsulating model source, identifier, dtype, token, and loader kwargs. |
| superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py | New loader for HF Hub with tokenizer support, size/memory estimation utilities, and pre-checks. |
| superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py | Adds HF model ONNX export with vision/NLP detection, dynamic axes, and optional external data output. |
| examples/benchmarks/tensorrt_inference_performance.py | Updates example script to show in-house vs HF usage via CLI. |
| examples/benchmarks/pytorch_huggingface_models.py | New example demonstrating HF-backed training benchmarks, incl. distributed option. |
| examples/benchmarks/ort_inference_performance.py | Updates ORT example script to show in-house vs HF usage via CLI. |

Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread tests/benchmarks/micro_benchmarks/test_model_source_config.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
@Aishwarya-Tonpe Aishwarya-Tonpe changed the title from "Hf models clean" to "Hugging Face Model integration in Superbench" Apr 14, 2026
Copilot AI review requested due to automatic review settings April 14, 2026 17:30
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.


Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/model_benchmarks/pytorch_base.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.


Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/model_source_config.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Outdated
Copilot AI review requested due to automatic review settings April 14, 2026 20:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.


Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/_export_torch_to_onnx.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_loader.py Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.


Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread tests/benchmarks/micro_benchmarks/test_huggingface_e2e.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
…e benchmarks

- Add HuggingFaceModelLoader for downloading and caching models from HF Hub
- Support both NLP (AutoModelForCausalLM) and vision (AutoModelForImageClassification) models
- Add model_source and model_identifier parameters to TensorRT/ORT benchmarks
- Add ONNX export pipeline for HuggingFace models with dynamic axes
- Derive vision input shapes from ONNX graph dims with HF config fallback
- Filter ONNX initializers from graph.input for correct NLP input handling
- Add PyTorch 2.8+ compatibility (external_data vs use_external_data_format)
- Add example script, unit tests, and config schema updates
- Support HF_TOKEN env var for gated model access
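
The initializer-filtering item in the commit message above can be illustrated with a small sketch. Some exported ONNX graphs list weight tensors in `graph.input` as well as in `graph.initializer`, so only names absent from the initializers are true runtime inputs. The function name is hypothetical, and the stand-in graph below is built from `SimpleNamespace` objects purely so the example runs without the onnx package; a real `onnx.GraphProto` exposes the same `input` and `initializer` fields with `name` attributes.

```python
from types import SimpleNamespace


def runtime_input_names(graph) -> list:
    """Return graph input names that are not backed by an initializer (weight)."""
    initializer_names = {init.name for init in graph.initializer}
    return [inp.name for inp in graph.input if inp.name not in initializer_names]


# Stand-in graph: two real inputs plus one weight that also appears in graph.input.
fake_graph = SimpleNamespace(
    input=[
        SimpleNamespace(name="input_ids"),
        SimpleNamespace(name="attention_mask"),
        SimpleNamespace(name="embeddings.weight"),
    ],
    initializer=[SimpleNamespace(name="embeddings.weight")],
)
feed_names = runtime_input_names(fake_graph)
```

Feeding only `feed_names` to the inference session avoids trying to supply dummy data for weights.
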
Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py Outdated
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment thread tests/benchmarks/micro_benchmarks/test_model_source_config.py
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings June 2, 2026 17:29
Aishwarya-Tonpe and others added 3 commits June 2, 2026 10:29
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings June 2, 2026 17:55
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/tensorrt_inference_performance.py Outdated

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 33.68984% with 372 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.54%. Comparing base (67298ae) to head (621c826).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...marks/micro_benchmarks/huggingface_model_loader.py | 45.81% | 110 Missing ⚠️ |
| ...micro_benchmarks/tensorrt_inference_performance.py | 19.04% | 102 Missing ⚠️ |
| ...arks/micro_benchmarks/ort_inference_performance.py | 22.52% | 86 Missing ⚠️ |
| ...nchmarks/micro_benchmarks/_export_torch_to_onnx.py | 16.27% | 72 Missing ⚠️ |
| ...benchmarks/micro_benchmarks/model_source_config.py | 94.28% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #803      +/-   ##
==========================================
- Coverage   86.02%   82.54%   -3.48%     
==========================================
  Files         103      105       +2     
  Lines        7950     8498     +548     
==========================================
+ Hits         6839     7015     +176     
- Misses       1111     1483     +372     
| Flag | Coverage Δ |
| --- | --- |
| cpu-python3.10-unit-test | 68.03% <28.08%> (-2.85%) ⬇️ |
| cpu-python3.7-unit-test | 67.51% <28.21%> (-2.80%) ⬇️ |
| cuda-unit-test | 80.54% <33.09%> (-3.42%) ⬇️ |
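
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the report; the assumption that percentages are truncated (not rounded) to two decimals is an inference from the fact that the quoted values match truncation, and the helper name is hypothetical.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage truncated to two decimals (assumed convention)."""
    return math.floor(hits / lines * 10000) / 100


base = coverage_pct(6839, 7950)   # hits and lines on main
head = coverage_pct(7015, 8498)   # hits and lines on this PR
delta = round(head - base, 2)

# Per-file missing lines from the table should add up to the patch total.
missing_per_file = [110, 102, 86, 72, 2]
total_missing = sum(missing_per_file)

# The growth in misses between base and head should match the same total.
miss_growth = 1483 - 1111
```

Both the per-file sum and the miss growth come to the 372 missing lines quoted in the patch coverage line.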

Flags with carried forward coverage won't be shown. Click here to find out more.

Copilot AI review requested due to automatic review settings June 2, 2026 19:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Comment thread examples/benchmarks/tensorrt_inference_performance.py
Comment thread superbench/benchmarks/micro_benchmarks/ort_inference_performance.py
Copilot AI review requested due to automatic review settings June 2, 2026 20:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comment thread superbench/benchmarks/micro_benchmarks/huggingface_model_loader.py
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
Copilot AI review requested due to automatic review settings June 3, 2026 20:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Comment on lines +309 to +316
# Export directly to final destination to avoid path issues with external data
onnx_path = exporter.export_huggingface_model(
    model=hf_model,
    model_name=model_name_with_precision,
    batch_size=self._args.batch_size,
    seq_length=self._args.seq_length,
    output_dir=str(proc_output_path),
)
Comment on lines +281 to +282
loader = HuggingFaceModelLoader(allow_remote_code=allow_remote_code)
hf_model, _, _ = loader.load_model_from_config(model_config, device='cpu')
Comment on lines +311 to +317
onnx_path = exporter.export_huggingface_model(
    model=hf_model,
    model_name=model_name,
    batch_size=self._args.batch_size,
    seq_length=self._args.seq_length,
    output_dir=output_dir,
)
Comment on lines +21 to +22
from superbench.benchmarks.micro_benchmarks.huggingface_model_loader import HuggingFaceModelLoader # noqa: E402
from superbench.benchmarks.micro_benchmarks.model_source_config import ModelSourceConfig # noqa: E402

        Uses prajjwal1/bert-tiny which is a small public BERT model (~17MB).
        """
        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

        Uses distilbert/distilgpt2 which is a small public GPT-2 model (~82MB).
        """
        model, config, tokenizer = loader.load_model('distilbert/distilgpt2', device='cpu')

        """Test loading model using ModelSourceConfig via load_model_from_config."""
        config = ModelSourceConfig(source='huggingface', identifier='prajjwal1/bert-tiny', torch_dtype='float32')

        model, hf_config, tokenizer = loader.load_model_from_config(config, device='cpu')

    def test_load_model_with_dtype(self, loader):
        """Test loading model and converting dtype after load."""
        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

    @pytest.mark.skipif(not torch.cuda.is_available(), reason='Requires GPU')
    def test_load_model_to_gpu(self, loader):
        """Test loading model and moving to GPU."""
        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')

    def test_architecture_detection(self, loader):
        """Test that architecture is correctly detected from loaded model."""
        model, config, tokenizer = loader.load_model('prajjwal1/bert-tiny', device='cpu')