Add Qwen 3.5 (4B/9B, Base/Instruct) to supervised_finetuning recipes #327
Open
dgallitelli wants to merge 1 commit into
Adds 4 notebooks and 8 recipe YAMLs covering Qwen 3.5 in 4B and 9B sizes, each in both Base (pretrained) and Instruct (post-trained) variants. Each notebook supports QLoRA (4-bit) and full fine-tuning via a strategy toggle.

Zero source-code changes. Each notebook overrides sagemaker_code/requirements.txt via %%writefile (with .bak backup, restored by the final cell).

Pin bumps:
- transformers 4.57.0 -> 5.2.0 (qwen3_5 architecture not in 4.x)
- peft 0.17.0 -> 0.18.1 (HybridCache removed in transformers 5.x)
- bitsandbytes 0.46.1 -> 0.49.2 (first version with CUDA 13.0 binary)
- liger-kernel 0.6.1 -> 0.7.0 (HybridCache compat)

trl == 0.21.0 unchanged. No edits to sft.py, merge_adapter_weights.py, or sm_accelerate_train.sh.

Validated end-to-end with two SageMaker training jobs (us-east-1, ml.g5.2xlarge): Qwen3.5-4B and Qwen3.5-4B-Base, both Completed, loss trajectories ~1.6 -> ~1.4. Reference repo at https://github.com/dgallitelli/qwen35-sft-sagemaker carries a full 8-config validation matrix.
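The backup / override / restore cycle described above can be approximated in plain Python. This is only a sketch: the notebooks themselves use a %%writefile cell, and the helper names (override_requirements, restore_requirements) are hypothetical. It also assumes the override replaces only the bumped pins and keeps every other line of the existing file, which may differ from what the actual cell writes.

```python
import re
from pathlib import Path

# Pins as stated in this PR; trl stays at 0.21.0.
QWEN35_PINS = {
    "transformers": "5.2.0",
    "peft": "0.18.1",
    "bitsandbytes": "0.49.2",
    "liger-kernel": "0.7.0",
    "trl": "0.21.0",
}


def _package_name(line: str) -> str:
    """Return the lowercased package name at the start of a requirements line."""
    return re.split(r"[=<>!~\[;\s]", line.strip(), maxsplit=1)[0].lower()


def override_requirements(req_path: Path, pins: dict[str, str]) -> Path:
    """Back up req_path to <name>.bak, then rewrite it with the given pins.

    Assumption: lines for packages not in `pins` are carried over unchanged.
    """
    backup = req_path.with_name(req_path.name + ".bak")
    if not backup.exists():
        # Keep the first backup so re-running does not overwrite the original.
        backup.write_bytes(req_path.read_bytes())
    original_lines = backup.read_text().splitlines()
    kept = [ln for ln in original_lines if _package_name(ln) not in pins]
    pinned = [f"{name}=={version}" for name, version in pins.items()]
    req_path.write_text("\n".join(kept + pinned) + "\n")
    return backup


def restore_requirements(req_path: Path) -> None:
    """Move the .bak copy back over req_path, if a backup exists."""
    backup = req_path.with_name(req_path.name + ".bak")
    if backup.exists():
        backup.replace(req_path)
```

In a notebook, override_requirements(Path("sagemaker_code/requirements.txt"), QWEN35_PINS) would run before job submission and restore_requirements on the same path in the final cell, mirroring the order the PR describes.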
Add Qwen 3.5 (4B / 9B, Base / Instruct) to supervised_finetuning recipes
Summary
Adds four notebooks and eight recipe YAMLs to 0_model_customization_recipes/supervised_finetuning/ for fine-tuning Qwen 3.5, covering both Base and post-trained ("Instruct") variants in 4B and 9B sizes. Each notebook supports both QLoRA (4-bit) and full fine-tuning, selectable via a strategy toggle.

- finetune--Qwen--Qwen3.5-4B-Base.ipynb: ml.g5.2xlarge (1× A10G 24 GB); ml.g7e.2xlarge (1× RTX PRO 6000 96 GB)
- finetune--Qwen--Qwen3.5-4B.ipynb: ml.g5.2xlarge; ml.g7e.2xlarge
- finetune--Qwen--Qwen3.5-9B-Base.ipynb: ml.g5.2xlarge; ml.g7e.12xlarge (4× RTX PRO 6000)
- finetune--Qwen--Qwen3.5-9B.ipynb: ml.g5.2xlarge; ml.g7e.12xlarge

Naming note: On Hugging Face, post-trained variants are published as Qwen/Qwen3.5-{4B,9B} with no -Instruct suffix; the -Base suffix denotes the pretrained checkpoint. Both share the same qwen3_5 architecture, so the same DLC and dependency pins apply; only weights and chat template differ.

What this PR does NOT change
Zero source code modifications. No edits to sagemaker_code/sft.py, sagemaker_code/utils/merge_adapter_weights.py, or sm_accelerate_train.sh. Qwen 3.5 works with the existing shared scaffolding as-is.

What this PR DOES change
- finetune--Qwen--Qwen3.5-*.ipynb notebooks
- sagemaker_code/hf_recipes/Qwen/

Each notebook overrides sagemaker_code/requirements.txt via a %%writefile cell at job-submit time (with .bak backup, restored by the final cell). The override:

- transformers 4.57.0 -> 5.2.0: qwen3_5 architecture not in 4.x
- peft 0.17.0 -> 0.18.1: HybridCache removed in transformers 5.x; 0.17 hardcodes the import
- bitsandbytes 0.46.1 -> 0.49.2: first version with CUDA 13.0 binary
- liger-kernel 0.6.1 -> 0.7.0: same HybridCache compatibility issue as peft

trl == 0.21.0 and the rest of the toolchain are unchanged. The lockstep TRL 1.x bump that Gemma 4 needs is not required here, so existing sibling recipes are unaffected.

Validation
This PR's smoke tests (us-east-1, 2026-05-22)
Two SageMaker training jobs were run from a clean upstream checkout with only this PR's notebooks + YAMLs + the %%writefile-applied requirements.txt:

- finetune--Qwen--Qwen3.5-4B.ipynb: ml.g5.2xlarge
- finetune--Qwen--Qwen3.5-4B-Base.ipynb: ml.g5.2xlarge

Both jobs trained 20 steps over 100 rows of Josephgflowers/Finance-Instruct-500k, saved a PEFT adapter, and ran the upstream merge_adapter_weights.py to completion (the merge step works for Qwen 3.5; no Gemma-4-style skip is needed).

Reference-repo validation matrix
All eight recipe × instance combinations have additionally been validated end-to-end with real SageMaker training jobs in the reference repo. Highlights:

ml.g5.2xlarge, ml.g7e.2xlarge, ml.g7e.2xlarge, ml.g7e.12xlarge, ml.g7e.12xlarge, ml.g5.2xlarge, ml.g6e.2xlarge

QLoRA recipes were also validated as portable across ml.g5.2xlarge / ml.g6.4xlarge / ml.g7e.2xlarge without any recipe changes.

Rationale
Qwen 3.5 has been generally available on Hugging Face for several months and is a strong general-purpose model that customers ask about regularly. The qwen3_5 architecture isn't in transformers 4.57 (the shared requirements.txt pin), and that single bump cascades to peft / bitsandbytes / liger-kernel. None of the cascade reaches trl or the source code, so the change is fully contained in requirements.txt.

Submitting this as a notebook-only PR for now (matching the precedent of the Gemma 4 recipe submitted earlier today) so reviewers can merge it without touching shared code or other recipes. If requirements.txt is bumped repo-wide in a future PR, the %%writefile cell becomes a no-op and the notebook continues to work without change.

Files added