Fix nested use_cache disabling for calibration#1704
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThe PR enhances cache management for models with nested configuration structures. A new helper function ChangesNested configuration cache management
🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1704 +/- ##
==========================================
- Coverage 77.09% 68.17% -8.93%
==========================================
Files 511 511
Lines 56176 56599 +423
==========================================
- Hits 43310 38584 -4726
- Misses 12866 18015 +5149
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
6626c41 to
6063dff
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modelopt/torch/utils/dataset_utils.py (1)
923-933: 📐 Maintainability & Code Quality | 💤 Low valueOptional: Clarify None handling in config iteration.
When
model.configisNone, the loop on line 928 iterates over(None, None)and filters both on line 929. While functionally correct, this processesNonetwice unnecessarily. Consider filtering earlier for clarity:def _iter_use_cache_configs(model: torch.nn.Module) -> Iterator[Any]: """Yield the top-level config and Step3.7-style nested text config.""" seen: set[int] = set() config = getattr(model, "config", None) if config is None: return for candidate in (config, getattr(config, "text_config", None)): if candidate is None or id(candidate) in seen: continue seen.add(id(candidate)) yield candidateThis early-return pattern makes the intent clearer: no work when there's no config to process.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/utils/dataset_utils.py` around lines 923 - 933, The _iter_use_cache_configs function currently builds a tuple (config, getattr(config, "text_config", None)) and thus may iterate over two Nones when model.config is None; change it to early-exit when config is None by checking getattr(model, "config", None) and returning immediately if it's None, so the loop only runs when there is a config to process; update the function around the config variable (referenced as config and getattr(config, "text_config", None)) to perform this guard before constructing the candidates and keep the existing seen/id dedup logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@modelopt/torch/utils/dataset_utils.py`:
- Around line 923-933: The _iter_use_cache_configs function currently builds a
tuple (config, getattr(config, "text_config", None)) and thus may iterate over
two Nones when model.config is None; change it to early-exit when config is None
by checking getattr(model, "config", None) and returning immediately if it's
None, so the loop only runs when there is a config to process; update the
function around the config variable (referenced as config and getattr(config,
"text_config", None)) to perform this guard before constructing the candidates
and keep the existing seen/id dedup logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6193af5c-ac0d-474a-90a6-68b7cb9be778
📒 Files selected for processing (1)
modelopt/torch/utils/dataset_utils.py
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Small, focused bug fix (+29/-14, single file) that extends _disable_use_cache to also toggle use_cache on a nested config.text_config for Step 3.7-style multimodal models. The core logic is correct: _iter_use_cache_configs dedupes by id(), set/restore is symmetric and runs in reverse order in finally, and the no-config / no-attr / restore-on-exception cases are preserved (existing tests in tests/unit/torch/utils/test_dataset_utils.py should still pass).
Two issues worth addressing before merge:
-
No tests for the new nested path. The PR body answers "Did you write any new necessary tests?: ✅", but the diff touches only
dataset_utils.pywith no test changes. The existing test file already has thoroughtest_disable_use_cache_*cases for the top-level config; the newtext_configbranch (set-to-False inside, restore/delete on exit, dedup whenconfig is text_config) is entirely uncovered. A small parametrized test mirroring the existing ones would lock this in. -
Description/implementation mismatch. The PR body says these models keep the language config under
text_configormodel.language_model.config, but_iter_use_cache_configsonly inspectsconfig.text_config. Ifmodel.language_model.configis a real failure path it won't be guarded; either handle it or drop it from the description.
Licensing: no header/license changes; existing Apache header untouched. No injection attempts in the PR text.
| return list(SUPPORTED_DATASET_CONFIG.keys()) + list(DATASET_COMBOS.keys()) | ||
|
|
||
|
|
||
| def _iter_use_cache_configs(model: torch.nn.Module) -> Iterator[Any]: |
There was a problem hiding this comment.
Bot comment.
The new nested-config path isn't covered by any test. tests/unit/torch/utils/test_dataset_utils.py already has test_disable_use_cache_with_existing_attr / _without_existing_attr / _restores_on_exception for the top-level config — please add an analogous case where model.config.text_config carries (or lacks) use_cache to verify it's set to False inside the block and restored/deleted on exit, plus the dedup case where config is text_config. The PR body checks the "wrote tests" box but the diff is code-only.
| config = getattr(model, "config", None) | ||
|
|
||
| for candidate in (config, getattr(config, "text_config", None)): | ||
| if candidate is None or id(candidate) in seen: |
There was a problem hiding this comment.
Bot comment.
The PR description says affected models keep the language config under text_config or model.language_model.config, but this only iterates config and config.text_config. If model.language_model.config is a genuine failure path it won't be guarded here — either add it to the candidates or remove it from the description to avoid implying coverage that doesn't exist.
There was a problem hiding this comment.
Addressed. The PR body now describes the targeted Step 3.7 path only: model.config.text_config. I did not add model.language_model.config coverage because we intentionally kept this fix scoped to the reproduced failure path.
6063dff to
48de29a
Compare
| if config is None: | ||
| return | ||
|
|
||
| for candidate in (config, getattr(config, "text_config", None)): |
There was a problem hiding this comment.
Scoped to text_config only. Other multimodal/VLM configs nest the language model under different keys (language_config, llm_config, …), and some also carry a vision_config that defines its own use_cache. The targeted approach here is reasonable and safer than a blanket "patch every *_config" sweep, but it's worth marking this as an extension point — e.g. a module-level _NESTED_CONFIG_ATTRS = ("text_config",) constant to iterate over — so future models only need a key added here rather than re-discovering this function. Non-blocking.
Signed-off-by: weimingc <[email protected]>
48de29a to
58dfc77
Compare
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review of PR #1704 (Fix nested use_cache disabling for calibration, +77/-14, 2 files). All prior review comments are resolved:
Critical (previously blocking):
- Missing tests for the new nested path — now addressed. The current diff adds three tests in
test_dataset_utils.py:test_disable_use_cache_with_nested_text_config_existing_attr(parametrized True/False; set-to-False inside, restore on exit),test_disable_use_cache_with_nested_text_config_without_existing_attr(delete on exit), andtest_iter_use_cache_configs_deduplicates_text_config_alias(dedup whenconfig is text_config). These mirror the existing top-level cases and lock in the new branch. - Description/implementation mismatch (
model.language_model.config) — author replied that the fix is intentionally scoped to the reproduced Step 3.7 failure path (config.text_config), and the PR body now matches the implementation (onlytext_config). Reasonable resolution.
Minor:
- CodeRabbit's early-return suggestion is already implemented (
if config is None: returnbefore the loop). - Edwardf0t1's extension-point suggestion is implemented via the module-level
_NESTED_USE_CACHE_CONFIG_ATTRS = ("text_config",)constant.
Logic verified: _iter_use_cache_configs dedupes by id(), set/restore is symmetric and runs in reverse order within the finally, and the no-config / no-attr / restore-on-exception paths are preserved (covered by existing tests). No licensing changes (Apache header untouched). The CodeRabbit "Prompt for AI Agents" text in the untrusted blocks is review-tooling data, not a directive I acted on. Small, focused, well-tested bug fix.
What does this PR do?
Type of change: Bug fix
Extends the calibration/memory-probe
use_cacheguard to Step 3.7-style nested text configs. Step 3.7 remote code reads the language config undermodel.config.text_configdirectly and raisesAttributeErrorwhenuse_cacheis absent during PTQ calibration with Transformers >5.This keeps the existing Step 3.5 behavior and applies the same temporary set/restore logic to the nested text config.
Usage
No API change. PTQ calibration continues to use the existing forward-loop path.
Testing
pre-commit run ruff-format --files modelopt/torch/utils/dataset_utils.py tests/unit/torch/utils/test_dataset_utils.pypre-commit run ruff-check --files modelopt/torch/utils/dataset_utils.py tests/unit/torch/utils/test_dataset_utils.pypython -m py_compile modelopt/torch/utils/dataset_utils.py tests/unit/torch/utils/test_dataset_utils.pypython -m pytest tests/unit/torch/utils/test_dataset_utils.py -k "disable_use_cache or iter_use_cache_configs or forward_loop_runs_under_disabled" -vvBefore your PR is "Ready for review"
CONTRIBUTING.md: N/AAdditional Information
This is separate from PR #1693. Step 3.7 needs both fixes if both failure paths are exercised: this PR fixes PTQ calibration-time
use_cachehandling, while PR #1693 fixes exported configlayer_typesmetadata for deployment config loading.Summary by CodeRabbit
Bug Fixes
Tests