Feat/sam3 onnx hub support (issue #324) by AChenreddy24 · Pull Request #582 · microsoft/winml-cli

AChenreddy24 · 2026-05-12T00:28:45Z

Adds SAM 3 support and introduces a generic Hub-hosted ONNX input form (<org>/<repo>/<path>.onnx) that downloads pre-exported ONNX files from HuggingFace via huggingface_hub. SAM 3 is the first consumer.

The standard SAM 2-style export route is blocked: optimum-onnx pins transformers<4.58, but facebook/sam3 requires transformers>=5. This PR uses the pre-exported onnx-community/sam3-tracker-ONNX artifacts via the existing Scenario D pipeline.

Changes

New Hub-ONNX URI resolver (loader/onnx_hub.py)
Wired into wmk config, wmk build, wmk inspect, wmk run, wmk serve, wmk perf, wmk eval
SAM 3 catalog entries (encoder + decoder) in the new schema
Fixed is_quantized_onnx to detect QOperator format (ConvInteger, MatMulInteger, QLinear*) — was QDQ-only
Fixed is_pre_quantized build branch to truly skip the optimize stage (previously crashed on QOperator models)

Tests

4649 unit tests pass (+12 new regression tests)
6 integration tests pass (live HF download)
Built decoder is byte-identical to input (verified numerically)
Ruff lint + format clean

Limitations

SAM 3 vision encoder requires NPU EP (no CPU ConvInteger kernel in onnxruntime-windowsml); decoder runs on CPU + NPU.

SAM 3 (facebook/sam3) requires transformers>=5, but optimum-onnx pins transformers<4.58, so the standard HF + Optimum export route SAM 2 uses is blocked. This change wires SAM 3 in through the existing pre-exported ONNX (Scenario D) pipeline by recognizing path-style Hub references ('org/repo/path/to/file.onnx') and downloading the file once via huggingface_hub. Changes: - New src/winml/modelkit/loader/onnx_hub.py: is_hf_onnx_path, resolve_hf_onnx_path. Mirrors the is_xxx/resolve_xxx pair pattern used by is_compiled_onnx/is_quantized_onnx. - Wire into wmk config, wmk build, and WinMLAutoModel.from_pretrained with the same 2-line 'if is_hf_onnx_path(x): x = str(resolve_hf_onnx_path(x))' pattern. - Add 2 sam3_tracker entries to hub_models.json so 'wmk hub --model-type sam3_tracker' lists them. - Tests: 12 unit tests for the resolver, 2 CLI plumbing tests, and 3 end-to-end integration tests (slow/network/integration). The existing build_onnx_model pipeline runs unchanged on the resolved local path: the int8 ONNX is auto-detected as quantized via is_quantized_onnx, the quantization stage is skipped, and the artifact flows through Optimize -> Analyze<->Optimize -> Compile -> Finalize.

consumer. Also fix two latent bugs in the build pipeline that any QOperator-quantized model would have hit. Background ---------- Issue #324 asks for SAM 2-style native HuggingFace export support for ``facebook/sam3`` (Sam3*IOConfig, Sam3ModelPatcher, etc.). That path is blocked by an upstream constraint: ``optimum-onnx`` pins ``transformers<4.58``, but ``facebook/sam3`` requires ``transformers>=5`` (the ``Sam3Model`` class only exists there). Resolving the pin would need either an upstream optimum-onnx PR or vendoring SAM 3 patcher code that bypasses optimum entirely. Instead, this PR introduces a generic "Hub-hosted ONNX file" input form and lets SAM 3 ride on the existing pre-exported-ONNX (Scenario D) pipeline that already worked for any local ``.onnx`` file. The infrastructure is reusable for any future model with similar version constraints (Whisper / Phi / RWKV / etc. all ship pre-exported ONNX repos on the Hub today). What's added ------------ 1. Hub-hosted ONNX URI resolver - ``loader/onnx_hub.py``: ``is_hf_onnx_path()``, ``resolve_hf_onnx_path()``, ``maybe_resolve_hf_onnx_path()`` - Recognizes inputs of the form ``<org>/<repo>/<path/to/file>.onnx``, downloads via ``huggingface_hub.hf_hub_download``, returns the local cache path. Falls through unchanged for HF model IDs / local paths / ``None``. - Best-effort ``.onnx_data`` sidecar fetch for >2 GiB models. ``EntryNotFoundError`` is expected (inlined weights); ``OSError`` surfaces as a WARNING (disk/permission/network problems should not be silently dropped — the model would later fail to load with a confusing error). 2. CLI wiring (every command that accepts a model identifier) - ``wmk config`` / ``wmk build``: resolve at the top of the command - ``wmk inspect``: friendly "ONNX inspection not yet supported" error for Hub-ONNX refs (matches local .onnx UX) - ``wmk run`` / ``wmk serve``: ``InferenceEngine.load()`` and ``load_schema_only()`` resolve before routing - ``wmk perf``: resolve before the ``Path(model_id).suffix == '.onnx' and exists()`` check (otherwise Hub refs are mistaken for missing local files and rejected with FileNotFoundError) - ``wmk eval``: ``_resolve_model_path`` resolves before the local existence check - ``WinMLAutoModel.from_pretrained``: resolves before HF/ONNX dispatch - Stage-tool commands (``analyze``/``optimize``/``quantize``/ ``compile``/``export``) intentionally NOT wired — they take ``click.Path(exists=True)`` and operate on local files only. 3. SAM 3 catalog entries (``data/hub_models.json``) - Two entries for ``onnx-community/sam3-tracker-ONNX``: the vision encoder and the prompt-encoder + mask-decoder. Note: was already present in the base branch — this PR does not modify it. 4. Integration tests (``tests/integration/test_sam3_e2e.py``) - 4 decoder tests + 2 encoder tests, marked ``@slow @network @integration`` - Asserts: Hub URI resolves, quantization detected, build produces ``model.onnx``, autoconf produces an ``optimization_config``, and for the encoder: pre-quantized round-trip preserves the ``ConvInteger`` / ``MatMulInteger`` ops byte-identically. - Skips narrowed to ``HfHubHTTPError`` / ``OSError`` only — real bugs in the build/analyze pipeline will surface as test failures rather than green skips. Bugs fixed (would affect any QOperator-quantized model, not just SAM 3) --------------------------------------------------------------------- A. ``is_quantized_onnx`` only detected QDQ format (``QuantizeLinear`` / ``DequantizeLinear``). The SAM 3 vision encoder uses ``QuantFormat.QOperator`` (no QDQ pairs, just integer ops: ``ConvInteger``, ``MatMulInteger``, ``QLinear*``). Previously misclassified as not quantized → routed through the optimize + quantize stages → tried to re-quantize an already-int8 model. Fix: ``compiler/utils.py`` adds ``QOPERATOR_OP_TYPES`` and ``QUANTIZATION_OP_TYPES = QDQ ∪ QOperator``. ``onnx/detection.py`` uses the union. B. The ``is_pre_quantized`` branches in ``build_onnx_model``, ``build_hf_model``, and the CLI's ``_build_onnx_pipeline`` logged "skipping optimize" but still invoked ``optimize_onnx`` → ``ort_graph`` → loaded the model into an ORT session. For QOperator models on hosts without a CPU ``ConvInteger`` kernel (e.g. ``onnxruntime-windowsml`` 1.23.x), this crashes the build stage with ``NOT_IMPLEMENTED``. Fix: ``build/common.py::run_optimize_analyze_loop`` gains a real ``skip_optimize: bool`` knob that bypasses ``optimize_onnx`` and the autoconf re-optim loop, just copying the input as the "optimized" artifact. All three pre-quantized branches now pass ``skip_optimize=True``. The downstream behavior (skip quantize + skip compile when configured) is unchanged. Verification ------------ - ``onnx.checker.check_model(full_check=True)`` passes on built artifacts - Built decoder produces NUMERICALLY IDENTICAL outputs to input decoder (``max|built - input| = 0.0`` across all 3 outputs) — pre-quantized round-trip is a true pass-through, not just structurally similar - Encoder runtime feasibility on CPU is identical to input encoder (both fail on CPU because of upstream ORT ``ConvInteger`` kernel gap; encoder requires NPU EP — unchanged from input) - Decoder real inference produces sane SAM-shaped outputs: ``iou_scores ∈ [0, 1]``, ``pred_masks`` logits span both signs, ``object_score_logits`` non-degenerate Test count ---------- - 4518 unit tests pass (+12 new regression tests across: ``test_onnx_hub.py``, ``test_detection.py``, ``test_eval.py``, ``test_perf_cli.py``, ``test_engine.py``) - 6 integration tests pass (live HF download, ~30s) - Ruff check + format clean on all 24 changed files Silent-skip audit (per SAM 2 review feedback) --------------------------------------------- Removed ``except Exception: pytest.skip(...)`` patterns from SAM 3 integration tests — they were swallowing real bugs (including the ``ConvInteger`` regression fixed in this PR). All skips now narrowed to ``HfHubHTTPError`` / ``OSError`` (network) or specific runtime exceptions; ``RuntimeError`` from ``build_onnx_model`` and ``analyze_onnx`` now fails loudly. Removed unnecessary ``pytest.importorskip("huggingface_hub")`` (it's a hard transitive dep). Sidecar download ``OSError`` now logs WARNING instead of DEBUG. Known limitations (not addressed in this PR) -------------------------------------------- - SAM 3 encoder requires NPU EP (QNN / OpenVINO / VitisAI) because ``onnxruntime-windowsml`` ships no CPU kernel for ``ConvInteger(10)``. This is true for both the input and built artifact — our build preserves runtime behavior exactly. Decoder uses ``MatMulInteger`` and runs on either CPU or NPU. - Catalog entries for SAM 3 have ``quantization: null`` so ``wmk perf`` falls back to default random-input shapes that violate the SAM 3 decoder's internal reshape constraints. Populating ``quantization.input_tensors`` with proper shape hints (the pattern every other catalog entry follows) is the recommended fix; out of scope for this PR.

+# Union of all quantization op types (QDQ + QOperator). Use this for
+# "is the model already quantized?" detection regardless of which format
+# the producer used.
+QUANTIZATION_OP_TYPES: frozenset[str] = QDQ_OP_TYPES | QOPERATOR_OP_TYPES


+    """Pre-exported SAM 3 ONNX flows through Scenario D end-to-end."""
+
+    @pytest.fixture(scope="class")
+    def sam3_onnx_path(self) -> Path:


+    """
+
+    @pytest.fixture(scope="class")
+    def encoder_onnx_path(self) -> Path:


+from typing import TYPE_CHECKING
+
+import numpy as np
+import onnx


Copilot

Pull request overview

Adds support for Hub-hosted pre-exported ONNX inputs (<org>/<repo>/<path>.onnx) to enable SAM 3 consumption via Scenario D, and fixes pre-quantized (QDQ + QOperator) detection/routing so already-quantized models skip ORT optimization/quantization stages that can crash on unsupported integer kernels.

Changes:

Introduces loader/onnx_hub.py to recognize and download Hub-hosted ONNX files (plus best-effort .onnx_data sidecars), and wires resolution into multiple entry points (CLI + inference/model loading).
Expands is_quantized_onnx to detect QOperator quantized models and adds shared quant-op constants.
Updates build pipelines to honor “pre-quantized” routing (skip optimize/quantize) and adds regression/unit/integration tests for SAM 3 and Hub-ONNX refs.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/unit/onnx/test_detection.py	New tests for quantized (QDQ + QOperator) and compiled ONNX detection.
tests/unit/loader/test_onnx_hub.py	New unit tests for Hub-ONNX path detection/splitting/downloading behavior.
tests/unit/inference/test_engine.py	Ensures Hub-ONNX refs are resolved before inference routing logic.
tests/unit/commands/test_perf_cli.py	Verifies `winml perf` resolves Hub-ONNX refs before ONNX path validation.
tests/unit/commands/test_hub_onnx_ref.py	New CLI tests for `wmk config`/`wmk build` with Hub-hosted ONNX refs.
tests/unit/commands/test_eval.py	Adds Hub-ONNX resolution coverage in eval model-path resolution.
tests/unit/build/test_onnx.py	Regression tests ensuring pre-quantized ONNX skips optimize/quantize stages.
tests/unit/build/test_hf.py	Regression tests ensuring pre-quantized exported ONNX skips optimize/quantize stages.
tests/integration/test_sam3_e2e.py	End-to-end integration tests for SAM3 decoder + encoder via Hub-hosted ONNX artifacts.
src/winml/modelkit/onnx/detection.py	Updates quantized detection to use unified QDQ+QOperator op set.
src/winml/modelkit/models/auto.py	Resolves Hub-ONNX refs before fast-path ONNX/HF routing in `from_pretrained`.
src/winml/modelkit/loader/onnx_hub.py	New implementation for Hub-hosted ONNX ref resolution via `hf_hub_download`.
src/winml/modelkit/loader/init.py	Re-exports Hub-ONNX helper APIs.
src/winml/modelkit/inference/engine.py	Normalizes Hub-ONNX refs to local paths in `load` and `load_schema_only`.
src/winml/modelkit/data/hub_models.json	Adds SAM3 encoder/decoder catalog entries using Hub-ONNX refs.
src/winml/modelkit/compiler/utils.py	Adds QOperator + union quantization op-type constants.
src/winml/modelkit/compiler/init.py	Exposes new quantization op-type constants from compiler package.
src/winml/modelkit/commands/perf.py	Resolves Hub-ONNX refs prior to ONNX path checks.
src/winml/modelkit/commands/inspect.py	Treats Hub-ONNX refs as ONNX inputs for consistent “not supported” messaging.
src/winml/modelkit/commands/eval.py	Resolves Hub-ONNX refs before validating ONNX file existence.
src/winml/modelkit/commands/config.py	Resolves Hub-ONNX refs before dispatching config generation path.
src/winml/modelkit/commands/build.py	Resolves Hub-ONNX refs and adds skip-optimize plumbing for ONNX pipeline.
src/winml/modelkit/build/onnx.py	Ensures pre-quantized models skip ORT optimize and don’t crash on QOperator ops.
src/winml/modelkit/build/hf.py	Same as above for HF-exported ONNX artifacts that are already quantized.
src/winml/modelkit/build/common.py	Adds `skip_optimize` to the shared optimize/analyze loop.
README.md	Documents Hub-hosted ONNX input form and supported commands.

Comments suppressed due to low confidence (1)

src/winml/modelkit/commands/build.py:1162

--no-optimize sets extra_kwargs["skip_optimize"], but _build_onnx_pipeline() never reads it. As a result, the CLI flag has no effect unless is_quantized_onnx() happens to detect the model as pre-quantized. Consider consuming extra_kwargs.pop("skip_optimize", False) and combining it with is_pre_quantized (and setting max_iters=0 when skipping) so users can force skipping the optimize/autoconf stage for problematic ONNX inputs.

    from ..onnx import copy_onnx_model, is_quantized_onnx

    max_iters: int = extra_kwargs.pop("hack_max_optim_iterations", 3)

    # ── Validate + setup ─────────────────────────────────────────
    if not onnx_path.exists():
        raise FileNotFoundError(f"ONNX file not found: {onnx_path}")
    try:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    # ``ConvInteger``. Skip the optimize pass and the autoconf re-optim
+    # loop; analyze still runs lint-only.


        logger.info(
-            "Pre-quantized model detected (QDQ nodes present). "
+            "Pre-quantized model detected (QDQ or QOperator nodes present). "
            "Skipping optimize + quantize, running analyze-only."
        )
        stages_skipped.append("optimize")
-        # Optimize+analyze only, no autoconf re-optimization
+        # Analyze-only: skip ORT-based graph optimization (no kernel for
+        # QOperator ops like ConvInteger on the host EP), no autoconf loop.
        current_path, _, analyze_iters, analyze_unsupported, analyze_details = (


    if is_pre_quantized:
        logger.info(
-            "Pre-quantized model detected (QDQ nodes present). "
+            "Pre-quantized model detected (QDQ or QOperator nodes present). "
            "Skipping optimize + quantize, running analyze-only."
        )
        stages_skipped.append("optimize")
-        # Optimize+analyze only, no autoconf re-optimization
+        # Analyze-only: skip ORT-based graph optimization (no kernel for
+        # QOperator ops like ConvInteger on the host EP), no autoconf loop.
        current_path, _, analyze_iterations, analyze_unsupported_nodes, analyze_details = (


        ep: Target execution provider for the analyzer.
        device: Target device for the analyzer.
        max_optim_iterations: Maximum autoconf re-optimization rounds.
            0 means optimize+analyze only (no autoconf re-optimization).
+        skip_optimize: When True, skip the initial ``optimize_onnx`` call and
+            just copy the input model to ``optimized_path``. Used for
+            pre-quantized models (QDQ or QOperator format) where ORT-based


DingmaomaoBJTU

(supersedes the earlier review — consolidated with design feedback)

Code-level issues

See inline comments below.

One additional nit not inlined (existing code, not touched by this PR): _run_quantize_stage at line 952 of commands/build.py still says "(QDQ nodes already present)". Since this PR broadens detection to cover QOperator, the message should read "(QDQ or QOperator nodes already present)" for consistency with the log messages updated in build/onnx.py and build/hf.py.

Design feedback

Three structural concerns — the bug fixes (QOperator detection, skip_optimize) are solid, but the way the Hub-ONNX input form is introduced creates maintenance surface that should be addressed before or shortly after merge.

1. "Hub-hosted ONNX" is not a distinct input type — it is a download step

org/repo/path/file.onnx -> hf_hub_download() -> /local/cache/file.onnx

After the download, this is just a local .onnx file. The downstream pipeline does not (and should not) care that it once lived on the Hub. Yet the PR threads detection logic (is_hf_onnx_path / maybe_resolve_hf_onnx_path) through 7+ command entry points, giving it the status of a persistent input type. This inflates both the concept count and the code surface.

A cleaner model is: resolve once at a single entry point, return a local path, downstream is unaware.

2. Model input identification needs a single resolver — `is_hub_model` already exists

hub_utils.py already has is_hub_model() with comprehensive local-path rejection (checks Path.exists(), ./, ../, ~/, Windows drive letters). The new is_hf_onnx_path reimplements only Path.exists(), missing the other cases.

The codebase now has three parallel detection mechanisms with no shared logic:

Input form	Detector	Local-path rejection
HF model ID (`org/model`)	`is_hub_model()`	Full (exists, `./`, `../`, `~/`, Win drive)
Local ONNX file	scattered `path.suffix == ".onnx"` checks	N/A
Hub-hosted ONNX (`org/repo/path.onnx`)	`is_hf_onnx_path()` (new)	Partial (only `Path.exists()`)

Suggested direction: a single resolve_model_input() that classifies and resolves in one place, reusing is_hub_model's rejection logic. Each command calls it once; adding a fourth input form means changing one function, not 7+.

def resolve_model_input(value: str) -> ModelInput:
    # 1. Local-path rejection (reuse is_hub_model logic)
    # 2. If org/repo/path.onnx -> download -> return as local_onnx
    # 3. If org/model -> return as hf_id
    # 4. If local .onnx exists -> return as local_onnx
    # No persistent "hub_onnx" type needed

3. Pre-quantized ONNX handling should be decided once, not scattered across three pipelines

The detect-and-skip logic lives independently in three places with inconsistent behavior:

Location	Detects via	Skips optimize	Skips quantize
`build/onnx.py`	`is_quantized_onnx()`	`skip_optimize=True`	Explicit
`build/hf.py`	`is_quantized_onnx()`	`skip_optimize=True`	Explicit
`commands/build.py`	`is_quantized_onnx()`	`skip_optimize=True`	Relies on user config having `quant: null`

Plus _run_quantize_stage has its own is_quantized_onnx() guard (line 951) — a fourth detection point.

The CLI pipeline is the odd one out: if a user provides a config with quant settings for a pre-quantized model, it will attempt to re-quantize. The library functions protect against this; the CLI does not.

Suggested direction: detect once at pipeline entry, stamp the result onto WinMLBuildConfig (e.g. config.skip_optimize = True; config.quant = None), and have all downstream stages read config. This also eliminates redundant is_quantized_onnx calls on the same model.

4. No discovery mechanism for Hub ONNX files

The current UX requires the user to already know the exact file path inside a Hub repo (onnx/vision_encoder_int8.onnx), the right variant (fp32 vs int8, encoder vs decoder), and the task. Compare with the HF model ID flow where architecture, task, and export are all auto-detected.

The only discovery path is the static hub_models.json catalog, which is manually curated. If this is an intentional V1 scope limitation, it should be documented as such. Otherwise, consider accepting a repo ID (onnx-community/sam3-tracker-ONNX) and listing available .onnx files, or reading repo metadata to auto-configure task and roles.

DingmaomaoBJTU · 2026-05-26T12:36:52Z

+        # is downloaded once and treated as a local .onnx path thereafter.
+        from ..loader import maybe_resolve_hf_onnx_path
+
+        model_path = maybe_resolve_hf_onnx_path(str(model_path)) or str(model_path)


Dead code: or str(model_path) never triggers. maybe_resolve_hf_onnx_path(str(x)) always returns a non-None string when given a non-None string — it only returns None when the input is None. Since str(model_path) is never None, the or branch is unreachable.

Every other call site in this PR uses the simpler pattern:

model_id = maybe_resolve_hf_onnx_path(model_id)

Suggestion: drop the or to match the other sites, or add a comment explaining the defensive intent. Same applies to load_schema_only below (line 383).

DingmaomaoBJTU · 2026-05-26T12:36:52Z

-        **onnx_kwargs,
-        **config.optim,
-    )
+    # 1. Optimize (or skip for pre-quantized models)


skip_optimize=True + max_optim_iterations > 0 is not guarded. All current callers pair skip_optimize=True with max_optim_iterations=0, so this is not a live bug. However, the function contract allows a caller to pass both skip_optimize=True and max_optim_iterations=3, in which case the autoconf loop would discover flags and call optimize_onnx on a pre-quantized model — the exact crash this fix prevents.

Consider making the invariant self-enforcing:

if skip_optimize: max_optim_iterations = 0 # re-optimize would crash on pre-quantized models

DingmaomaoBJTU · 2026-05-26T12:36:52Z

+# Union of all quantization op types (QDQ + QOperator). Use this for
+# "is the model already quantized?" detection regardless of which format
+# the producer used.
+QUANTIZATION_OP_TYPES: frozenset[str] = QDQ_OP_TYPES | QOPERATOR_OP_TYPES


Consider adding DynamicQuantizeLinear (and DynamicQuantizeMatMul). DynamicQuantizeLinear is a standard ONNX op (opset 11+) used in dynamic quantization. Models quantized via onnxruntime.quantization with QuantType.QUInt8 in dynamic mode contain this op instead of static QDQ pairs or QOperator fused ops.

Without it, a dynamically-quantized model would not be detected by is_quantized_onnx and would be routed through optimize + quantize. If this is an intentional exclusion (SAM 3 uses static QOperator), a comment noting the limitation would help.

DingmaomaoBJTU · 2026-05-26T12:36:52Z

+    ``.onnx`` file, and best-effort fetches an optional
+    ``<filename>.onnx_data`` sidecar so the ONNX loader can find external
+    initializers.
+


Case-sensitive .onnx check vs case-insensitive gate in eval.py. This uses model_id.endswith(".onnx") (case-sensitive), while eval.py:_resolve_model_path gates on Path(value).suffix.lower() == ".onnx" (case-insensitive). A Hub ref ending in .ONNX would pass the eval gate but fail here, producing a confusing "ONNX file not found" error.

Probably fine in practice (Hub repos use lowercase), but model_id.lower().endswith(".onnx") would close the gap.

DingmaomaoBJTU · 2026-05-26T12:36:52Z

+        # Hub-hosted ONNX (e.g. ``onnx-community/sam3-tracker-ONNX/onnx/...``)
+        # is downloaded once and treated as a local .onnx path thereafter.
+        from ..loader import is_hf_onnx_path, resolve_hf_onnx_path
+


role=path sub-model entries are not wired to the Hub resolver. The role=path branch above (around line 434) validates Path(path).exists() for each sub-model entry but does not call is_hf_onnx_path / resolve_hf_onnx_path. A Hub ref like image-encoder=onnx-community/sam3-tracker-ONNX/onnx/vision_encoder_int8.onnx would fail with "ONNX file not found".

Not blocking for current SAM 3 usage (single-model), but worth a TODO if multi-role eval on Hub-hosted models is planned.

DingmaomaoBJTU · 2026-05-26T12:57:27Z

Questions on validation and evaluation

The bug fixes are well-tested (synthetic ONNX models, mock-based CLI tests, regression assertions) — nice work on the silent-skip audit and the QOperator coverage. A few questions on the SAM 3 model validation side — would be great if you could share any additional results you have.

What the PR verifies (solid foundation)

Check	What it proves
Unit tests pass	Build pipeline code paths don't crash
Integration tests pass	Download + build produces a file
`onnx.checker.check_model`	ONNX is structurally valid
Decoder byte-identical to input	Pipeline didn't corrupt the model
"sane SAM-shaped outputs" (`iou_scores in [0,1]`)	Output tensors have expected shapes/ranges

These are all great and necessary. A few additional data points would help reviewers feel confident about the end-to-end quality:

Would love to see

Accuracy numbers. SAM 3 isn't in models_with_acc.json or models_curated.json yet. If you've done any offline accuracy evaluation (e.g., mIoU on SA-1B or COCO), it would be helpful to share the numbers — even informal ones — so we can baseline the model quality.
mask-generation eval support. The eval command doesn't seem to have mask-generation metric support (mIoU, dice, etc.) yet. Is this something you've been working on separately, or is it planned as a follow-up? Would be good to note in the PR if so.
Encoder end-to-end results. Since the encoder requires NPU and the CI tests can only verify structural correctness (ConvInteger nodes preserved), did you get a chance to run it on an NPU-equipped machine? If you have any end-to-end inference results, sharing them would strengthen the case.
Perf benchmarks. The PR wires SAM 3 into wmk perf — do you have any latency/throughput numbers to share? The Known Limitations section mentions the shape-hints gap for wmk perf, so curious if you were able to work around it for manual benchmarking.
Comparison with PyTorch reference. The byte-identical check nicely proves the build is a clean pass-through. Since the ONNX comes from a third-party repo (onnx-community), did you happen to compare the ONNX outputs against the original facebook/sam3 PyTorch model? Even a spot-check on a few samples would be reassuring.

Task inconsistency between catalog and tests

Small thing I noticed — the encoder's task differs between hub_models.json and the integration tests:

Component	`hub_models.json`	Integration test
Vision encoder	`mask-generation`	`image-feature-extraction`
Prompt encoder + mask decoder	`mask-generation`	`mask-generation`

The encoder only produces image embeddings, so image-feature-extraction (as the test uses) seems more accurate for its role. But the catalog lists both components as mask-generation. Could you clarify which is intended? If mask-generation is correct for the catalog (e.g., because the catalog describes the overall pipeline rather than individual components), it might be worth adding a comment explaining that.

Suggestion

If some of the above are blocked (NPU availability, no mask-generation metric yet), it would help to note that in the PR description under Known Limitations so future maintainers know what's been validated vs. what still needs follow-up.

AChenreddy24 added 2 commits May 11, 2026 18:10

AChenreddy24 requested a review from a team as a code owner May 12, 2026 00:28

AChenreddy24 requested a review from Copilot May 12, 2026 00:29

Copilot started reviewing on behalf of AChenreddy24 May 12, 2026 00:29 View session

github-advanced-security AI found potential problems May 12, 2026

View reviewed changes

Copilot AI reviewed May 12, 2026

View reviewed changes

This comment was marked as duplicate.

Sign in to view

DingmaomaoBJTU reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/sam3 onnx hub support (issue #324)#582

Feat/sam3 onnx hub support (issue #324)#582
AChenreddy24 wants to merge 2 commits into
mainfrom
feat/sam3-onnx-hub-support

AChenreddy24 commented May 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

This comment was marked as duplicate.

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

DingmaomaoBJTU May 26, 2026

Uh oh!

DingmaomaoBJTU May 26, 2026

Uh oh!

DingmaomaoBJTU May 26, 2026

Uh oh!

DingmaomaoBJTU May 26, 2026

Uh oh!

DingmaomaoBJTU May 26, 2026

Uh oh!

DingmaomaoBJTU commented May 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		# ``ConvInteger``. Skip the optimize pass and the autoconf re-optim
		# loop; analyze still runs lint-only.

Conversation

AChenreddy24 commented May 12, 2026

Changes

Tests

Limitations

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

This comment was marked as duplicate.

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Code-level issues

Design feedback

1. "Hub-hosted ONNX" is not a distinct input type — it is a download step

2. Model input identification needs a single resolver — is_hub_model already exists

3. Pre-quantized ONNX handling should be decided once, not scattered across three pipelines

4. No discovery mechanism for Hub ONNX files

Uh oh!

DingmaomaoBJTU May 26, 2026

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU May 26, 2026

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU May 26, 2026

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU May 26, 2026

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU May 26, 2026

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Questions on validation and evaluation

What the PR verifies (solid foundation)

Would love to see

Task inconsistency between catalog and tests

Suggestion

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

2. Model input identification needs a single resolver — `is_hub_model` already exists

DingmaomaoBJTU commented May 26, 2026 •

edited

Loading