Detect fused experts using _apply_gate as activation marker #1703

Open
Mapika wants to merge 1 commit into NVIDIA:main from Mapika:fix/fused-experts-apply-gate

@Mapika Mapika commented Jun 12, 2026

What does this PR do?

Type of change: Bug fix

Overview: _is_fused_experts_module() requires an act_fn attribute, so fused expert modules that implement their gated activation as a method instead — e.g. transformers' MiniMaxM3VLExperts, whose GPT-OSS-style clamped swiglu lives in _apply_gate() — are silently left unquantized: PTQ completes without error, but all experts stay high-precision. This PR accepts _apply_gate as an alternative activation marker. The eager forward of such modules still performs the same two F.linear calls per expert that _QuantFusedExperts intercepts, so detection is the only change needed.
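The detection change described above can be sketched in a few lines. This is a minimal illustration, not the ModelOpt implementation: the helper name has_activation_marker and the three stand-in classes are hypothetical, and any other checks that _is_fused_experts_module performs are deliberately not reproduced here.

```python
# Hypothetical sketch of the "either marker" check; not the ModelOpt code.
ACTIVATION_MARKERS = ("act_fn", "_apply_gate")  # the two markers named in this PR


def has_activation_marker(module: object) -> bool:
    """Return True if the module exposes act_fn or _apply_gate."""
    return any(hasattr(module, name) for name in ACTIVATION_MARKERS)


class ActFnExperts:
    """Stand-in for experts that store the activation as an attribute."""

    def __init__(self) -> None:
        self.act_fn = lambda x: max(x, 0.0)  # placeholder activation


class ApplyGateExperts:
    """Stand-in for experts that implement the gated activation as a method."""

    def _apply_gate(self, gate_up):
        return gate_up  # placeholder; a real module would apply its gated activation


class NoMarkerExperts:
    """Stand-in for a module with neither marker."""


for cls in (ActFnExperts, ApplyGateExperts, NoMarkerExperts):
    print(cls.__name__, has_activation_marker(cls()))
```

Because hasattr finds both instance attributes and methods, the same check covers the attribute-style and method-style modules without special-casing either.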

Testing

  • New unit test test_module_with_apply_gate_detected in tests/unit/torch/quantization/plugins/test_fused_experts.py; full TestIsFusedExpertsModule suite passes.
  • End-to-end: with this fix, NVFP4 PTQ of MiniMax-M3 (854 GB BF16 → ~256 GB) produces a working checkpoint (GSM8K strict 92.6–93.6 across runs vs 93.9 for the official MXFP8 baseline, same engine and sampling). Without it, the exported "quantized" model silently keeps full-precision experts.
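Because the failure mode is silent, a rough size check on the exported checkpoint is one way to notice it. The sketch below is a back-of-the-envelope estimate only: it assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-weight block (about 4.5 bits per weight), ignores per-tensor scales and non-weight data, and uses hypothetical helper names that are not part of ModelOpt.

```python
# Back-of-the-envelope estimate; every constant here is an assumption.
BF16_BITS = 16.0
NVFP4_BITS = 4.0 + 8.0 / 16.0  # 4-bit value + one 8-bit scale per 16 weights


def estimated_size_gb(bf16_size_gb: float, quantized_fraction: float) -> float:
    """Checkpoint size if `quantized_fraction` of the weights move to NVFP4."""
    ratio = NVFP4_BITS / BF16_BITS
    return bf16_size_gb * (quantized_fraction * ratio + (1.0 - quantized_fraction))


def implied_quantized_fraction(bf16_size_gb: float, quantized_size_gb: float) -> float:
    """Fraction of weights that must be NVFP4 to reach the observed size."""
    ratio = NVFP4_BITS / BF16_BITS
    return (1.0 - quantized_size_gb / bf16_size_gb) / (1.0 - ratio)


print(f"{estimated_size_gb(854.0, 1.0):.1f} GB if every weight were NVFP4")
print(f"{implied_quantized_fraction(854.0, 256.0):.1%} of weights in NVFP4 to reach 256 GB")
```

Under these assumptions, quantizing every weight would give roughly 240 GB, so the reported ~256 GB is consistent with nearly all weights being quantized. A checkpoint whose experts stayed in BF16 would be far larger, which makes the exported size a cheap first signal that detection worked.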

Additional notes

Docstring of _is_fused_experts_module updated; no new dependencies.

Summary by CodeRabbit

  • Bug Fixes

    • Fused expert module detection for quantization now also recognizes modules that implement their gate in an alternative way (for example, clamped gates), so these Mixture of Experts modules are no longer skipped.
  • Tests

    • Added a unit test verifying that fused expert detection recognizes modules that use an alternative gate implementation.

_is_fused_experts_module() required an act_fn attribute, so fused expert
modules that implement their gated activation as a method instead — e.g.
MiniMaxM3VLExperts, whose clamped swiglu lives in _apply_gate() — were
silently left unquantized (PTQ completes, experts stay high-precision).

Accept _apply_gate as an alternative activation marker. The eager forward
of such modules still performs the same two F.linear calls per expert
that _QuantFusedExperts intercepts, so no other change is needed.

Signed-off-by: Mapika <[email protected]>
@Mapika Mapika requested a review from a team as a code owner June 12, 2026 21:40
@Mapika Mapika requested a review from sugunav14 June 12, 2026 21:40
copy-pr-bot Bot commented Jun 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented Jun 12, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0b11edea-17f9-4518-a3ff-a57c2d7923ce

📥 Commits

Reviewing files that changed from the base of the PR and between ddc0a8e and e644ecf.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/plugins/huggingface.py
  • tests/unit/torch/quantization/plugins/test_fused_experts.py

📝 Walkthrough

This PR extends the detection helper for fused HuggingFace MoE experts so that it recognizes expert containers that implement their gated activation in an _apply_gate method rather than exposing an act_fn attribute. The detection predicate and its docstring are updated, and a new unit test verifies that detection works for the alternative gate implementation.

Changes

Fused-expert detection expansion

| Layer / File(s) | Summary |
| --- | --- |
| Detection predicate and documentation: modelopt/torch/quantization/plugins/huggingface.py, tests/unit/torch/quantization/plugins/test_fused_experts.py | _is_fused_experts_module docstring clarified to document both act_fn and _apply_gate as valid gate indicators. Predicate updated from checking only act_fn to accepting either act_fn or _apply_gate. New synthetic test module _ApplyGateExperts verifies detection returns True for _apply_gate-based experts. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: extending fused experts module detection to recognize _apply_gate as an activation marker alternative to act_fn. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | In the PR-touched areas (huggingface.py _is_fused_experts_module + new test), no torch.load(weights_only=False), np.load(allow_pickle=True), trust_remote_code=True, eval/exec, or #nosec patterns we... |
