fix(perf): unify HF and ONNX paths through PerfBenchmark #659
Open
xieofxie wants to merge 8 commits into
`winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export -> optimize -> quantize -> compile) via `PerfBenchmark`, while `.onnx` files bypassed the pipeline entirely and ran a raw ORT JIT load through `_run_onnx_benchmark`. Same user-facing command, different code path, non-comparable numbers, and several CLI flags (`--no-quantize`, `--rebuild`, `--ignore-cache`, `--precision`) silently no-oped on the ONNX path.

Both paths now flow through `PerfBenchmark`, which dispatches to `WinMLAutoModel.from_pretrained` or `.from_onnx` based on the input. The ONNX branch in `_load_model` (previously dead code) is now the live entry point, so an `.onnx` file goes through optimize -> [quantize] -> [compile] just like the HF flow, minus the export stage.

- Delete `_run_onnx_benchmark` and the stale references to its private helpers.
- Drop the `is_onnx` dispatcher branch in the CLI; keep `is_onnx` only for the file-exists check, the `--shape-config` warning (shapes are baked into a pre-exported ONNX), and feeding `--op-tracing` the raw input.
- Refresh docstrings on the `perf` command and `PerfBenchmark._load_model`.
- Update the CLI test to assert ONNX inputs route through `PerfBenchmark.run`; refresh e2e docstrings.
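The dispatch and stage ordering described above can be sketched as two small pure functions. This is a minimal sketch, not the project's code: `is_onnx`, `plan_stages`, and `pick_loader` are hypothetical names, the `.onnx` suffix check is an assumed way of detecting ONNX input, and the real `_load_model` hands off to `WinMLAutoModel.from_pretrained` / `.from_onnx` instead of returning a name.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical sketch only; the real PerfBenchmark._load_model calls
# WinMLAutoModel.from_pretrained / WinMLAutoModel.from_onnx directly.


def is_onnx(model: str) -> bool:
    """Assumption: any path ending in .onnx is treated as a pre-exported model."""
    return Path(model).suffix.lower() == ".onnx"


def pick_loader(model: str) -> str:
    """Name of the WinMLAutoModel constructor the benchmark would dispatch to."""
    return "from_onnx" if is_onnx(model) else "from_pretrained"


def plan_stages(model: str, quantize: bool = True, compile_model: bool = True) -> list[str]:
    """Build stages run before benchmarking.

    HF ids start with export; ONNX files skip it because the graph already
    exists. Quantize and compile stay optional for both inputs (hypothetical
    flags standing in for the CLI options).
    """
    stages = [] if is_onnx(model) else ["export"]
    stages.append("optimize")
    if quantize:
        stages.append("quantize")
    if compile_model:
        stages.append("compile")
    return stages


print(pick_loader("hf/model"), plan_stages("hf/model"))
# from_pretrained ['export', 'optimize', 'quantize', 'compile']
print(pick_loader("model.onnx"), plan_stages("model.onnx", quantize=False))
# from_onnx ['optimize', 'compile']
```

Keeping the stage list identical after the export step is what makes HF and ONNX latency numbers comparable: the only difference between the two inputs is whether export runs.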
Contributor (Author): Could wait for the perf e2e tests.
Contributor (Author): Done and tested on QNN.
Collaborator DingmaomaoBJTU left a comment:
Clean, well-motivated change. Unifying both paths through PerfBenchmark removes ~100 lines of duplicated benchmark logic and makes latency numbers directly comparable, a solid improvement.
A few inline comments below, mostly nits and one suggestion for robustness.
Contributor (Author): `--ep cpu` needs a fix.
Contributor (Author): `WinMLAutoModel.from_onnx` has at least two issues:
Summary
Fixes #596.
- `winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export → optimize → quantize → compile) via `PerfBenchmark`, while `.onnx` files bypassed the pipeline entirely and ran a raw ORT JIT load through `_run_onnx_benchmark`. Same user-facing command, non-comparable numbers, and several flags (`--no-quantize`, `--rebuild`, `--ignore-cache`, `--precision`) silently no-oped on the ONNX path.
- Both paths now flow through `PerfBenchmark`, which dispatches to `WinMLAutoModel.from_pretrained` or `.from_onnx`. The `is_onnx` branch in `_load_model` (previously dead code) is now the live entry point, so an `.onnx` file runs optimize → [quantize] → [compile] like the HF flow, minus export.
- Delete `_run_onnx_benchmark` and the duplicate hardware-monitor / stats-collection logic it carried. The CLI keeps `is_onnx` only for the file-exists check, the `--shape-config` warning (shapes are baked into pre-exported ONNX), and feeding `--op-tracing` the raw input path.
- Refresh docstrings on the `perf` command, `PerfBenchmark._load_model`, and the loop helpers to drop stale references; update the CLI test to assert ONNX inputs route through `PerfBenchmark.run`.
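A test in the spirit of the updated CLI test might look like the sketch below. It is hedged: `perf_command` and the stub `PerfBenchmark` are hypothetical stand-ins (the real classes are not imported), and the warning text is illustrative. It patches `PerfBenchmark.run` with `unittest.mock` and checks that both an HF id and an `.onnx` file reach the same entry point, while `is_onnx` survives only for the file-exists check and the `--shape-config` warning.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Optional
from unittest import mock


class PerfBenchmark:
    """Hypothetical stand-in for the real PerfBenchmark; only `run` matters here."""

    def __init__(self, model: str, **options):
        self.model = model
        self.options = options

    def run(self):
        raise NotImplementedError("the real benchmark is mocked out in the check below")


def perf_command(model: str, shape_config: Optional[str] = None, **options):
    """Hypothetical CLI body: HF ids and .onnx files share one code path."""
    is_onnx = model.lower().endswith(".onnx")
    if is_onnx:
        # is_onnx is kept only for input validation, never for dispatch.
        if not Path(model).is_file():
            raise FileNotFoundError(model)
        if shape_config is not None:
            warnings.warn(
                "--shape-config is ignored: shapes are baked into a pre-exported ONNX model"
            )
    return PerfBenchmark(model, **options).run()


def check_onnx_routes_through_run() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        onnx_file = Path(tmp) / "model.onnx"
        onnx_file.write_bytes(b"")  # contents don't matter: run() is mocked
        with mock.patch.object(PerfBenchmark, "run", return_value="ok") as run:
            assert perf_command(str(onnx_file)) == "ok"
            assert perf_command("hf/model") == "ok"
        # Both inputs hit the same entry point.
        assert run.call_count == 2
```

Patching `run` rather than the loaders keeps the test independent of which `WinMLAutoModel` constructor is chosen; the dispatch itself would be covered separately.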