Declarative Apriel hybrid block layout via ListDispatchConfigConverter#518
Open
jlamypoirier wants to merge 6 commits into
Open
Declarative Apriel hybrid block layout via ListDispatchConfigConverter#518jlamypoirier wants to merge 6 commits into
jlamypoirier wants to merge 6 commits into
Conversation
Adds a ``ListDispatchConfigConverter`` primitive that subsumes Apriel's positional ``hybrid_block_layout`` discriminator (one entry per pipeline position selecting per-block converters whose HF claims flat-merge at the HF root). The ``AprielBlockConverter``/``AprielDecoderConverter`` imperative dispatchers shrink to weight-side helpers plus the dispatch registries the primitive consumes. Co-Authored-By: Claude Opus 4.7 <[email protected]>
Addresses /review-coarse @ 3397144: * ``ConfigSectionConverter._consumed_hf_paths`` no longer special-cases primitive types via an isinstance switch. Each primitive that has dynamic claims (Nested, Dispatch, TypedDictContainer, ListDispatch) overrides ``ConfigConverter._consumed_hf_paths``; the aggregator collapses to a single union loop. The framework knows nothing about which primitives are "special". * ``ListDispatchConfigConverter`` moves out of ``fast_llm/engine/checkpoint/external.py`` and into ``fast_llm/models/gpt/conversion/apriel.py`` next to its only consumer. The engine→layers import (``FixedBlockSequenceConfig`` / ``PatternBlockSequenceConfig``) goes away. * The primitive's ``__init__`` now asserts ``set(layout_names) <= set(registry)`` so the load-bearing subset invariant is encoded in code rather than prose. * ``AprielDecoderConverter`` is deleted; ``AprielBaseModelConverter`` carries the dispatch registry directly (``block_dispatcher_class``, named to avoid colliding with the parent's ``block_converter_class: type[ConfigSectionConverter]``) and inlines the per-block-index weight loop into its own ``get_converters``. Co-Authored-By: Claude Opus 4.7 <[email protected]>
Addresses /review-fine @ 3397144: * ``ListDispatchConfigConverter``, ``AprielBlockConverter``, ``AprielBaseModelConverter`` docstrings rewritten to describe what each class is, not how the refactor arrived at it (no references to parent overrides, downstream consumers, or design intent). * ``ListDispatchConfigConverter.export_to`` drops the ``list(...)`` wrap around ``block_sequence.blocks.values()`` — the values view is iterated once. Co-Authored-By: Claude Opus 4.7 <[email protected]>
…est walker Addresses /review-coarse @ 07d77c2: * ``AprielBaseModelConverter.get_converters`` now dispatches on the decoder type before iterating per-position blocks. The previous code indexed ``decoder.blocks`` / ``decoder.pattern`` unconditionally, which crashed for ``FixedBlockSequenceConfig`` (the shape ``ListDispatchConfigConverter.import_to`` returns for a single-entry layout). This was a latent bug inherited from the deleted ``AprielDecoderConverter.get_converters``. * ``tests/models/test_converters.py::_children`` now picks up any dispatch primitive exposing a ``_registry: dict`` via ``hasattr`` instead of an explicit isinstance tuple reaching across module boundaries. Drops the now-unneeded ``ListDispatchConfigConverter`` import from the apriel module. Co-Authored-By: Claude Opus 4.7 <[email protected]>
Addresses /review-fine @ 07d77c2: * ``ListDispatchConfigConverter.export_to``'s ``NotImplementedError`` now reports the offending type plus the ``fast_llm_path`` (matches ``DispatchConfigConverter``'s pattern). * ``Assert.custom`` lambda parameter renamed ``lns`` → ``names``. * Docstring rewrites the disjoint-vs-shared claim so it describes the actual invariant (per-mixer nested subdicts disjoint; shared top-level scalars cross-validate). * ``AprielBaseModelConverter.get_converters`` uses ``decoder.expanded_pattern`` instead of the manual ``pattern[i % len(pattern)]`` modulo (folded in per the reviewer's note). Co-Authored-By: Claude Opus 4.7 <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ListDispatchConfigConverterto the post-Declarative checkpoint config conversion #508 declarative converter framework: a primitive for HF formats that discriminate per-position block types via a positional list (e.g. Apriel'shybrid_block_layout) and flat-merge each block's HF claims at the HF root.AprielBlockConverter/AprielDecoderConverterfrom imperative dispatchers to weight-side helpers + dispatch registries consumed by the new primitive; deletes ~50 lines of imperativeimport_config/export_config/_consumed_hf_pathsplumbing.+117 / -94acrossexternal.py,apriel.py, andtest_converters.py. Coverage walker,_consumed_hf_paths, and the test fixture's_childrenwalker each get one new branch for the primitive.Test plan
tests/models/test_converters.py + test_checkpoint.py + test_hf_roundtrip.py(cluster): 281 passed, matches post-Declarative checkpoint config conversion #508 baseline.fast_llm_external_models/tests/(cluster, separate invocation): 2109 passed, 42 skipped, matches baseline.AprielBaseModelConverter.export_config↔.import_configround-trips both Fixed (["t"]) and Pattern (["t", "t"]) layouts.