Detect fused experts using _apply_gate as activation marker by Mapika · Pull Request #1703 · NVIDIA/Model-Optimizer

Mapika · 2026-06-12T21:40:22Z

What does this PR do?

Type of change: Bug fix

Overview: _is_fused_experts_module() requires an act_fn attribute, so fused expert modules that implement their gated activation as a method instead are silently left unquantized. One example is transformers' MiniMaxM3VLExperts, whose GPT-OSS-style clamped SwiGLU lives in _apply_gate(). PTQ completes without error, but all experts stay in high precision. This PR accepts _apply_gate as an alternative activation marker. The eager forward of such modules still performs the same two F.linear calls per expert that _QuantFusedExperts intercepts, so detection is the only change needed.
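A minimal sketch of the detection change, for illustration only. looks_like_fused_experts and ToyApplyGateExperts are hypothetical names. The weight-attribute check (stacked 3-D gate_up_proj / down_proj tensors) is an assumption about what a fused-experts container looks like, not a copy of modelopt's _is_fused_experts_module. The sketch shows only the point made above: requiring act_fn rejects a module whose activation is an _apply_gate method, and accepting either marker fixes that.

```python
import torch
from torch import nn


def looks_like_fused_experts(module: nn.Module, accept_apply_gate: bool = True) -> bool:
    """Hypothetical stand-in for modelopt's _is_fused_experts_module (not its real code)."""
    # Assumption: fused HF expert containers hold stacked 3-D weight tensors
    # named gate_up_proj and down_proj.
    has_weights = all(
        isinstance(getattr(module, name, None), torch.Tensor)
        and getattr(module, name).dim() == 3
        for name in ("gate_up_proj", "down_proj")
    )
    # Pre-fix behavior: only an act_fn attribute counts as the activation marker.
    has_activation = hasattr(module, "act_fn")
    # Post-fix behavior: a callable _apply_gate method is accepted as well.
    if accept_apply_gate:
        has_activation = has_activation or callable(getattr(module, "_apply_gate", None))
    return has_weights and has_activation


class ToyApplyGateExperts(nn.Module):
    """Synthetic experts container whose gated activation is a method, not act_fn."""

    def __init__(self, num_experts: int = 4, hidden: int = 8, inter: int = 16):
        super().__init__()
        self.gate_up_proj = nn.Parameter(torch.randn(num_experts, 2 * inter, hidden))
        self.down_proj = nn.Parameter(torch.randn(num_experts, hidden, inter))

    def _apply_gate(self, gate_up: torch.Tensor) -> torch.Tensor:
        # Plain SwiGLU stands in for whatever gated activation a real model uses.
        gate, up = gate_up.chunk(2, dim=-1)
        return nn.functional.silu(gate) * up


experts = ToyApplyGateExperts()
print(looks_like_fused_experts(experts))                           # True
print(looks_like_fused_experts(experts, accept_apply_gate=False))  # False
```

Toggling accept_apply_gate reproduces the before/after behavior: with only the act_fn check, the module is skipped and would stay unquantized.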

Testing

New unit test test_module_with_apply_gate_detected in tests/unit/torch/quantization/plugins/test_fused_experts.py; full TestIsFusedExpertsModule suite passes.
End-to-end: with this fix, NVFP4 PTQ of MiniMax-M3 (854 GB BF16 → ~256 GB) produces a working checkpoint (GSM8K strict 92.6–93.6 across runs vs 93.9 for the official MXFP8 baseline, same engine and sampling). Without it, the exported "quantized" model silently keeps full-precision experts.
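The size figure can be sanity-checked with back-of-envelope arithmetic. This sketch assumes that NVFP4 stores each weight in 4 bits plus one 8-bit (FP8) scale per 16-element block, about 4.5 bits per weight. It ignores per-tensor scales and GB/GiB differences. The PR does not state how the checkpoint splits between NVFP4 and higher-precision weights, so the fraction computed here is only an estimate. The function names are hypothetical.

```python
BF16_BITS = 16
# Assumption: 4-bit FP4 values plus one 8-bit scale per 16-element block.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight


def nvfp4_size_gb(bf16_gb: float, quantized_fraction: float = 1.0) -> float:
    """Estimated size when `quantized_fraction` of the BF16 bytes are converted to NVFP4."""
    quantized = bf16_gb * quantized_fraction * NVFP4_BITS / BF16_BITS
    kept_bf16 = bf16_gb * (1.0 - quantized_fraction)
    return quantized + kept_bf16


def implied_quantized_fraction(bf16_gb: float, target_gb: float) -> float:
    """Solve nvfp4_size_gb(bf16_gb, f) == target_gb for f."""
    shrink = 1.0 - NVFP4_BITS / BF16_BITS  # fraction of bytes saved per quantized BF16 byte
    return (1.0 - target_gb / bf16_gb) / shrink


print(nvfp4_size_gb(854))  # 240.1875: floor if every weight were NVFP4
print(round(implied_quantized_fraction(854, 256), 3))
```

Under these assumptions, ~256 GB corresponds to roughly 97% of the BF16 bytes in NVFP4, with the remainder left in higher precision. A checkpoint whose experts stayed in BF16 would land far above that.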

Additional notes

Docstring of _is_fused_experts_module updated; no new dependencies.

Summary by CodeRabbit

Bug Fixes
- Improved fused-expert detection for quantization: expert modules whose gated activation (for example, a clamped SwiGLU) is implemented as an _apply_gate method, rather than exposed via act_fn, are now recognized, so models such as MiniMax-M3 have their experts quantized instead of silently skipped.
Tests
- Added a unit test verifying that expert modules using _apply_gate instead of act_fn are detected as fused experts.

_is_fused_experts_module() required an act_fn attribute, so fused expert modules that implement their gated activation as a method instead (e.g. MiniMaxM3VLExperts, whose clamped SwiGLU lives in _apply_gate()) were silently left unquantized: PTQ completed, but the experts stayed in high precision. Accept _apply_gate as an alternative activation marker. The eager forward of such modules still performs the same two F.linear calls per expert that _QuantFusedExperts intercepts, so no other change is needed.

Signed-off-by: Mapika <[email protected]>

copy-pr-bot · 2026-06-12T21:40:26Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

coderabbitai · 2026-06-12T21:40:37Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between ddc0a8e and e644ecf.

📒 Files selected for processing (2)

modelopt/torch/quantization/plugins/huggingface.py
tests/unit/torch/quantization/plugins/test_fused_experts.py

📝 Walkthrough

This PR extends the fused HuggingFace MoE expert detection helper to recognize expert containers that implement their gated activation in an _apply_gate method rather than exposing an act_fn attribute. The detection predicate and its docstring are updated, and a new unit test verifies detection for the _apply_gate variant.

Changes

Fused-expert detection expansion

Layer / File(s)	Summary
Detection predicate and documentation (`modelopt/torch/quantization/plugins/huggingface.py`, `tests/unit/torch/quantization/plugins/test_fused_experts.py`)	The `_is_fused_experts_module` docstring now documents both `act_fn` and `_apply_gate` as valid activation markers. The predicate is updated from checking only `act_fn` to accepting either `act_fn` or `_apply_gate`. A new synthetic test module, `_ApplyGateExperts`, verifies that detection returns `True` for `_apply_gate`-based experts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33%, which is below the required threshold of 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main change: extending fused experts module detection to recognize _apply_gate as an activation marker alternative to act_fn.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	In the PR-touched areas (huggingface.py _is_fused_experts_module + new test), no torch.load(weights_only=False), np.load(allow_pickle=True), trust_remote_code=True, eval/exec, or `#nosec` patterns we...

Mapika requested a review from a team as a code owner June 12, 2026 21:40

Mapika requested a review from sugunav14 June 12, 2026 21:40

coderabbitai Bot approved these changes Jun 12, 2026

Mapika wants to merge 1 commit into NVIDIA:main from Mapika:fix/fused-experts-apply-gate
