Add Qwen 3.5 (4B/9B, Base/Instruct) to supervised_finetuning recipes #327

Open: dgallitelli wants to merge 1 commit into aws-samples:main from dgallitelli:add-qwen35-recipes

Add Qwen 3.5 (4B / 9B, Base / Instruct) to supervised_finetuning recipes

Summary

Adds four notebooks and eight recipe YAMLs to 0_model_customization_recipes/supervised_finetuning/ for fine-tuning Qwen 3.5 — both Base and post-trained ("Instruct") variants in 4B and 9B sizes. Each notebook supports both QLoRA (4-bit) and full fine-tuning, selectable via a strategy toggle.

| Notebook | Variants covered | QLoRA default | Full default |
|---|---|---|---|
| finetune--Qwen--Qwen3.5-4B-Base.ipynb | Pretrained | ml.g5.2xlarge (1× A10G 24 GB) | ml.g7e.2xlarge (1× RTX PRO 6000 96 GB) |
| finetune--Qwen--Qwen3.5-4B.ipynb | Instruct (post-trained) | ml.g5.2xlarge | ml.g7e.2xlarge |
| finetune--Qwen--Qwen3.5-9B-Base.ipynb | Pretrained | ml.g5.2xlarge | ml.g7e.12xlarge (4× RTX PRO 6000) |
| finetune--Qwen--Qwen3.5-9B.ipynb | Instruct | ml.g5.2xlarge | ml.g7e.12xlarge |

Naming note: On Hugging Face, post-trained variants are published as Qwen/Qwen3.5-{4B,9B} with no -Instruct suffix; the -Base suffix denotes the pretrained checkpoint. Both share the same qwen3_5 architecture, so the same DLC and dependency pins apply; only the weights and chat template differ.
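The naming convention and the default instances can be summarized in a small lookup. This is a minimal sketch, not the notebooks' actual toggle code: `resolve_job`, `JobConfig`, and the strategy keys `"qlora"` / `"full"` are hypothetical names chosen for illustration. The model IDs, recipe file names, and instance types are taken from this PR.

```python
from dataclasses import dataclass

# Default instance types from the table above, keyed by (size, strategy).
# The keys "qlora" and "full" are illustrative labels, not notebook variables.
DEFAULT_INSTANCES = {
    ("4B", "qlora"): "ml.g5.2xlarge",
    ("4B", "full"): "ml.g7e.2xlarge",
    ("9B", "qlora"): "ml.g5.2xlarge",
    ("9B", "full"): "ml.g7e.12xlarge",
}

# Recipe file suffixes as they appear under sagemaker_code/hf_recipes/Qwen/.
RECIPE_SUFFIX = {"qlora": "vanilla-peft-qlora", "full": "vanilla-full"}


@dataclass(frozen=True)
class JobConfig:
    model_id: str
    recipe: str
    instance_type: str


def resolve_job(size: str, base: bool, strategy: str) -> JobConfig:
    """Map (size, variant, strategy) to a model ID, recipe path, and instance."""
    if (size, strategy) not in DEFAULT_INSTANCES:
        raise ValueError(f"unsupported combination: {size!r}, {strategy!r}")
    # Post-trained checkpoints carry no suffix; pretrained ones end in -Base.
    name = f"Qwen3.5-{size}" + ("-Base" if base else "")
    return JobConfig(
        model_id=f"Qwen/{name}",
        recipe=f"sagemaker_code/hf_recipes/Qwen/{name}--{RECIPE_SUFFIX[strategy]}.yaml",
        instance_type=DEFAULT_INSTANCES[(size, strategy)],
    )
```

For example, `resolve_job("9B", base=False, strategy="full")` selects the post-trained 9B checkpoint with the full fine-tuning recipe on ml.g7e.12xlarge.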

What this PR does NOT change

Zero source code modifications. No edits to sagemaker_code/sft.py, sagemaker_code/utils/merge_adapter_weights.py, or sm_accelerate_train.sh. Qwen 3.5 works with the existing shared scaffolding as-is.

What this PR DOES change

  • 4 new finetune--Qwen--Qwen3.5-*.ipynb notebooks
  • 8 new recipe YAMLs under sagemaker_code/hf_recipes/Qwen/

Each notebook overrides sagemaker_code/requirements.txt via a %%writefile cell at job-submit time (with a .bak backup, restored by the final cell). The override changes four pins:

| Package | Shared default | Required | Why |
|---|---|---|---|
| transformers | 4.57.0 | 5.2.0 | qwen3_5 architecture not in 4.x |
| peft | 0.17.0 | 0.18.1 | HybridCache removed in transformers 5.x; 0.17 hardcodes the import |
| bitsandbytes | 0.46.1 | 0.49.2 | First version with a CUDA 13.0 binary (DLC ships CUDA 13) |
| liger-kernel | 0.6.1 | 0.7.0 | Same HybridCache compatibility issue as peft |

trl == 0.21.0 and the rest of the toolchain are unchanged — the lockstep TRL 1.x bump that Gemma 4 needs is not required here, so existing sibling recipes are unaffected.
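The backup, override, and restore cycle can be sketched outside a notebook as follows. This is an illustration only: the notebooks use a %%writefile cell and a separate final cell, whereas this sketch wraps the same idea in a context manager. `apply_pins` and `overridden_requirements` are hypothetical helpers, and the sketch assumes the shared file uses simple `name==version` lines.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

# The four pins this PR bumps; every other line is left untouched.
QWEN35_PINS = {
    "transformers": "5.2.0",
    "peft": "0.18.1",
    "bitsandbytes": "0.49.2",
    "liger-kernel": "0.7.0",
}


def apply_pins(text: str, pins: dict) -> str:
    """Rewrite `name==version` lines for pinned packages; append missing pins."""
    out, seen = [], set()
    for line in text.splitlines():
        name = line.split("==")[0].strip()
        if name in pins:
            out.append(f"{name}=={pins[name]}")
            seen.add(name)
        else:
            out.append(line)
    out.extend(f"{n}=={v}" for n, v in pins.items() if n not in seen)
    return "\n".join(out) + "\n"


@contextmanager
def overridden_requirements(path: Path, pins: dict):
    """Back up `path` to `<name>.bak`, apply the pins, and restore on exit."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    try:
        path.write_text(apply_pins(path.read_text(), pins))
        yield path
    finally:
        # Path.replace overwrites the modified file with the backup.
        backup.replace(path)
```

Submitting the training job inside the `with` block would keep the shared file unchanged for sibling recipes once the block exits, even if submission raises.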

Validation

This PR's smoke tests (us-east-1, 2026-05-22)

Two SageMaker training jobs run from a clean upstream checkout with only this PR's notebooks + YAMLs + the %%writefile-applied requirements.txt:

| # | Notebook | Strategy | Instance | Status | Billable | Loss |
|---|---|---|---|---|---|---|
| 1 | finetune--Qwen--Qwen3.5-4B.ipynb | QLoRA | ml.g5.2xlarge | Completed | 688 s | 1.66 → 1.41 |
| 2 | finetune--Qwen--Qwen3.5-4B-Base.ipynb | QLoRA | ml.g5.2xlarge | Completed | 471 s | 1.57 → 1.41 |

Both jobs trained 20 steps over 100 rows of Josephgflowers/Finance-Instruct-500k, saved a PEFT adapter, and ran the upstream merge_adapter_weights.py to completion (the merge step works for Qwen 3.5 — no Gemma-4-style skip needed).

Reference-repo validation matrix

All eight recipe × instance combinations have additionally been validated end-to-end with real SageMaker training jobs in the reference repo. Highlights:

| # | Variant | Strategy | Instance | GPU(s) | Billable |
|---|---|---|---|---|---|
| T1 | 4B Instruct | QLoRA | ml.g5.2xlarge | 1× A10G 24 GB | ~21 min |
| T2 | 4B Base | Full SFT | ml.g7e.2xlarge | 1× RTX PRO 6000 96 GB | ~29 min |
| T3 | 4B Instruct | Full SFT | ml.g7e.2xlarge | 1× RTX PRO 6000 96 GB | ~30 min |
| T4 | 9B Base | Full SFT | ml.g7e.12xlarge | 4× RTX PRO 6000 (384 GB total) | ~49 min |
| T5 | 9B Instruct | Full SFT | ml.g7e.12xlarge | 4× RTX PRO 6000 (384 GB total) | ~46 min |
| T6 | 9B Instruct | QLoRA | ml.g5.2xlarge | 1× A10G 24 GB | ~28 min |
| T7 | 9B Instruct | QLoRA | ml.g6e.2xlarge | 1× L40S 48 GB | ~22 min |

The QLoRA recipes were also validated as portable across ml.g5.2xlarge / ml.g6.4xlarge / ml.g7e.2xlarge without any recipe changes.
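A back-of-envelope estimate is consistent with the full fine-tuning instance choices above. This is a rough sketch under stated assumptions, not a measurement from this PR: it assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision Adam (2 for weights, 2 for gradients, 12 for FP32 master weights and two optimizer moments), uses nominal parameter counts of 4 and 9 billion, ignores activations, and assumes state is sharded evenly across GPUs. Whether the recipes shard state is not stated here.

```python
# Assumed rule of thumb: weights (2) + gradients (2) + FP32 master weights
# and two Adam moments (12) = 16 bytes per parameter. Activations excluded.
BYTES_PER_PARAM_FULL = 16


def full_ft_state_gb(params_billions: float) -> float:
    """Approximate training-state size in GB for full fine-tuning."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM_FULL


def per_gpu_gb(params_billions: float, num_gpus: int) -> float:
    """Per-GPU share of the state, assuming even sharding (an assumption)."""
    return full_ft_state_gb(params_billions) / num_gpus


for size, gpus in [(4, 1), (9, 1), (9, 4)]:
    share = per_gpu_gb(size, gpus)
    verdict = "fits" if share <= 96 else "does not fit"
    print(f"{size}B on {gpus}x 96 GB: ~{share:.0f} GB per GPU, {verdict}")
```

Under these assumptions the 4B model needs roughly 64 GB and fits a single 96 GB GPU, while the 9B model needs roughly 144 GB, which exceeds one 96 GB GPU but leaves headroom when spread over four.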

Rationale

Qwen 3.5 has been generally available on Hugging Face for several months and is a strong general-purpose model that customers ask about regularly. The qwen3_5 architecture isn't in transformers 4.57 (the shared requirements.txt pin), and that single bump cascades to peft / bitsandbytes / liger-kernel — but none of the cascade reaches trl or the source code, so the change is fully contained in requirements.txt.

Submitting this as a notebook-only PR for now (matching the precedent of the Gemma 4 recipe submitted earlier today) so reviewers can merge it without touching shared code or other recipes. If requirements.txt is bumped repo-wide in a future PR, the %%writefile cell becomes a no-op — the notebook continues to work without change.

Files added

0_model_customization_recipes/supervised_finetuning/
├── finetune--Qwen--Qwen3.5-4B-Base.ipynb
├── finetune--Qwen--Qwen3.5-4B.ipynb
├── finetune--Qwen--Qwen3.5-9B-Base.ipynb
├── finetune--Qwen--Qwen3.5-9B.ipynb
└── sagemaker_code/hf_recipes/Qwen/
    ├── Qwen3.5-4B-Base--vanilla-peft-qlora.yaml
    ├── Qwen3.5-4B-Base--vanilla-full.yaml
    ├── Qwen3.5-4B--vanilla-peft-qlora.yaml
    ├── Qwen3.5-4B--vanilla-full.yaml
    ├── Qwen3.5-9B-Base--vanilla-peft-qlora.yaml
    ├── Qwen3.5-9B-Base--vanilla-full.yaml
    ├── Qwen3.5-9B--vanilla-peft-qlora.yaml
    └── Qwen3.5-9B--vanilla-full.yaml

Adds 4 notebooks and 8 recipe YAMLs covering Qwen 3.5 in 4B and 9B sizes,
each in both Base (pretrained) and Instruct (post-trained) variants. Each
notebook supports QLoRA (4-bit) and full fine-tuning via a strategy toggle.

Zero source-code changes. Each notebook overrides
sagemaker_code/requirements.txt via %%writefile (with .bak backup,
restored by the final cell). Pin bumps:
- transformers 4.57.0 -> 5.2.0  (qwen3_5 architecture not in 4.x)
- peft 0.17.0 -> 0.18.1          (HybridCache removed in transformers 5.x)
- bitsandbytes 0.46.1 -> 0.49.2  (first version with CUDA 13.0 binary)
- liger-kernel 0.6.1 -> 0.7.0    (HybridCache compat)

trl == 0.21.0 unchanged. No edits to sft.py, merge_adapter_weights.py,
or sm_accelerate_train.sh.

Validated end-to-end with two SageMaker training jobs (us-east-1,
ml.g5.2xlarge): Qwen3.5-4B and Qwen3.5-4B-Base, both Completed, loss
trajectories ~1.6 -> ~1.4. Reference repo at
https://github.com/dgallitelli/qwen35-sft-sagemaker carries a full
8-config validation matrix.