fix(perf): unify HF and ONNX paths through PerfBenchmark by xieofxie · Pull Request #659 · microsoft/winml-cli

xieofxie · 2026-05-19T03:10:27Z

Summary

Fixes #596.

- `winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export → optimize → quantize → compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, non-comparable numbers, and several flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path.
- Both inputs now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx. The is_onnx branch in _load_model (previously dead code) is now the live entry point, so an .onnx file runs optimize → [quantize] → [compile] like the HF flow minus export.
- Delete _run_onnx_benchmark and the duplicate hardware-monitor / stats-collection logic it carried. The CLI keeps is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into pre-exported ONNX), and feeding --op-tracing the raw input path.
- Refresh docstrings on the perf command, PerfBenchmark._load_model, and the loop helpers to drop stale references; update the CLI test to assert ONNX inputs route through PerfBenchmark.run.

`winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export -> optimize -> quantize -> compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, different code path, non-comparable numbers, and several CLI flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path.

Both paths now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx based on the input. The ONNX branch in _load_model (previously dead code) is now the live entry point, so an .onnx file goes through optimize -> [quantize] -> [compile] just like the HF flow, minus the export stage.

- Delete _run_onnx_benchmark and its private helpers' stale references.
- Drop the is_onnx dispatcher branch in the CLI; keep is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into a pre-exported ONNX), and feeding --op-tracing the raw input.
- Refresh docstrings on the perf command and PerfBenchmark._load_model.
- Update the CLI test to assert ONNX inputs route through PerfBenchmark.run; refresh e2e docstrings.
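
The unified flow described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `is_onnx_input`, `planned_stages`, and `loader_name` are hypothetical helpers, the suffix check is an assumed way of detecting ONNX inputs, and the optional quantize/compile switches are assumed to behave the same for both input kinds.

```python
from pathlib import Path


def is_onnx_input(model: str) -> bool:
    """Assumption: an input counts as ONNX when its suffix is .onnx."""
    return Path(model).suffix.lower() == ".onnx"


def planned_stages(model: str, quantize: bool = True,
                   compile_model: bool = True) -> list[str]:
    """Return the build stages a given input would go through."""
    # HF inputs need an export step; a pre-exported ONNX file skips it.
    stages = [] if is_onnx_input(model) else ["export"]
    stages.append("optimize")
    if quantize:
        stages.append("quantize")
    if compile_model:
        stages.append("compile")
    return stages


def loader_name(model: str) -> str:
    """Name the entry point mentioned in the PR; nothing is called here."""
    return "from_onnx" if is_onnx_input(model) else "from_pretrained"


if __name__ == "__main__":
    for candidate in ("hf/model", "model.onnx"):
        print(candidate, loader_name(candidate), planned_stages(candidate))
```

Because both inputs share one stage list, a switch such as disabling quantization affects the ONNX path as well, which is the behavior the old `_run_onnx_benchmark` path lacked.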

xieofxie · 2026-05-19T03:13:24Z

could wait for perf e2e

xieofxie · 2026-05-25T07:39:58Z

could wait for perf e2e

Done, and tested on QNN.

DingmaomaoBJTU

Clean, well-motivated change. Unifying both paths through PerfBenchmark removes ~100 lines of duplicated benchmark logic and makes latency numbers directly comparable — solid improvement.

A few inline comments below, mostly nits and one suggestion for robustness.

xieofxie · 2026-05-25T09:29:44Z

`--ep cpu` needs a fix.

xieofxie · 2026-05-26T03:00:26Z

WinMLAutoModel.from_onnx has at least two issues:

- When perf runs with --device gpu and no --ep, it analyzes on all EPs but only benchmarks on one EP.
- For the same model path, a first run with --device cpu caches a CPU model (OpenVINO, for example). A later run with --device gpu still loads that CPU cache, which then cannot run on the GPU EP (DML, for example).
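
The second issue reads as if the cache lookup does not take the target device into account. Assuming that is the cause, the sketch below contrasts a key derived from the model path alone with one that also folds in the device and EP. Both functions are hypothetical illustrations and do not reflect how WinMLAutoModel.from_onnx actually builds its cache keys.

```python
import hashlib
from typing import Optional


def cache_key_path_only(model_path: str, device: str) -> str:
    """Key that ignores the device: CPU and GPU runs share one entry."""
    return hashlib.sha256(model_path.encode("utf-8")).hexdigest()[:16]


def cache_key_with_target(model_path: str, device: str,
                          ep: Optional[str] = None) -> str:
    """Key that includes device and EP, so each target gets its own entry."""
    # "auto" is a placeholder for "no EP given"; the name is an assumption.
    raw = "|".join([model_path, device.lower(), (ep or "auto").lower()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    path = "model.onnx"
    # Same key for both devices: the GPU run would pick up the CPU artifact.
    print(cache_key_path_only(path, "cpu") == cache_key_path_only(path, "gpu"))
    # Distinct keys: the GPU run triggers its own build.
    print(cache_key_with_target(path, "cpu") == cache_key_with_target(path, "gpu"))
```

Whether the fix belongs in the key, in a validity check on load, or elsewhere is a design decision for the maintainers; the sketch only shows why a path-only lookup collides.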

xieofxie requested a review from a team as a code owner May 19, 2026 03:10

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

340b6c7

DingmaomaoBJTU reviewed May 25, 2026

Comment thread src/winml/modelkit/commands/perf.py

Comment thread src/winml/modelkit/commands/perf.py

Comment thread tests/unit/commands/test_perf_cli.py

hualxie and others added 2 commits May 25, 2026 17:24

address comments

cf3ecce

Merge branch 'main' into hualxie/unify_perf

fcd83a9

hualxie added 3 commits May 26, 2026 10:29

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

0308b49

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

ed6d790

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

494cf22

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

6335984

xieofxie wants to merge 8 commits into main from hualxie/unify_perf
