Add Qwen 3.5 (4B/9B, Base/Instruct) to supervised_finetuning recipes by dgallitelli · Pull Request #327 · aws-samples/amazon-sagemaker-generativeai

dgallitelli · 2026-05-22T14:41:46Z

Add Qwen 3.5 (4B / 9B, Base / Instruct) to supervised_finetuning recipes

Summary

Adds four notebooks and eight recipe YAMLs to 0_model_customization_recipes/supervised_finetuning/ for fine-tuning Qwen 3.5 — both Base and post-trained ("Instruct") variants in 4B and 9B sizes. Each notebook supports both QLoRA (4-bit) and full fine-tuning, selectable via a strategy toggle.

| Notebook | Variants covered | QLoRA default | Full default |
| --- | --- | --- | --- |
| `finetune--Qwen--Qwen3.5-4B-Base.ipynb` | Pretrained | `ml.g5.2xlarge` (1× A10G 24 GB) | `ml.g7e.2xlarge` (1× RTX PRO 6000 96 GB) |
| `finetune--Qwen--Qwen3.5-4B.ipynb` | Instruct (post-trained) | `ml.g5.2xlarge` | `ml.g7e.2xlarge` |
| `finetune--Qwen--Qwen3.5-9B-Base.ipynb` | Pretrained | `ml.g5.2xlarge` | `ml.g7e.12xlarge` (4× RTX PRO 6000) |
| `finetune--Qwen--Qwen3.5-9B.ipynb` | Instruct | `ml.g5.2xlarge` | `ml.g7e.12xlarge` |
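
As a rough illustration of how a strategy toggle could map onto the defaults in the table above, here is a minimal sketch. The names `DEFAULT_INSTANCE` and `pick_instance` are hypothetical and do not come from the notebooks; only the instance types are taken from the table.

```python
# Hypothetical lookup mirroring the defaults table above.
# The dict and function names are illustrative, not the notebooks' own.
DEFAULT_INSTANCE = {
    ("4B", "qlora"): "ml.g5.2xlarge",
    ("4B", "full"): "ml.g7e.2xlarge",
    ("9B", "qlora"): "ml.g5.2xlarge",
    ("9B", "full"): "ml.g7e.12xlarge",
}


def pick_instance(size: str, strategy: str) -> str:
    """Return the default instance type for a model size and strategy."""
    key = (size, strategy.lower())
    if key not in DEFAULT_INSTANCE:
        raise ValueError(f"unsupported size/strategy combination: {key}")
    return DEFAULT_INSTANCE[key]
```

Base and Instruct variants share the same defaults in the table, so the variant is not part of the lookup key.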

Naming note: On Hugging Face, post-trained variants are published as Qwen/Qwen3.5-{4B,9B} with no -Instruct suffix; the -Base suffix denotes the pretrained checkpoint. Both share the same qwen3_5 architecture, so the same DLC and dependency pins apply; only the weights and chat template differ.
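
The naming convention in the note above can be expressed as a small helper. This is a sketch only: `hf_model_id` is a hypothetical name, and the two sizes are the ones this PR covers.

```python
def hf_model_id(size: str, pretrained: bool) -> str:
    """Build the Hugging Face model ID per the naming note above.

    Post-trained ("Instruct") checkpoints carry no suffix;
    pretrained checkpoints carry "-Base".
    """
    if size not in ("4B", "9B"):
        raise ValueError(f"size not covered by this PR: {size}")
    suffix = "-Base" if pretrained else ""
    return f"Qwen/Qwen3.5-{size}{suffix}"
```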

What this PR does NOT change

Zero source code modifications. No edits to sagemaker_code/sft.py, sagemaker_code/utils/merge_adapter_weights.py, or sm_accelerate_train.sh. Qwen 3.5 works with the existing shared scaffolding as-is.

What this PR DOES change

- 4 new finetune--Qwen--Qwen3.5-*.ipynb notebooks
- 8 new recipe YAMLs under sagemaker_code/hf_recipes/Qwen/

Each notebook overrides sagemaker_code/requirements.txt via a %%writefile cell at job-submit time (with .bak backup, restored by the final cell). The override:

| Package | Shared default | Required | Why |
| --- | --- | --- | --- |
| `transformers` | 4.57.0 | 5.2.0 | `qwen3_5` architecture not in 4.x |
| `peft` | 0.17.0 | 0.18.1 | `HybridCache` removed in transformers 5.x; 0.17 hardcodes the import |
| `bitsandbytes` | 0.46.1 | 0.49.2 | First version with a CUDA 13.0 binary (DLC ships CUDA 13) |
| `liger-kernel` | 0.6.1 | 0.7.0 | Same `HybridCache` compatibility issue as peft |

trl == 0.21.0 and the rest of the toolchain are unchanged — the lockstep TRL 1.x bump that Gemma 4 needs is not required here, so existing sibling recipes are unaffected.
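
The back-up, override, and restore flow described above could be approximated in plain Python as follows. This is a sketch under assumptions, not the notebooks' actual cells: the notebooks use a `%%writefile` cell that writes the whole file, whereas this version rewrites only the four pinned lines and leaves everything else untouched. The names `QWEN35_PINS`, `apply_override`, and `restore` are hypothetical.

```python
import shutil
from pathlib import Path

# Required pins, taken from the table above.
QWEN35_PINS = {
    "transformers": "5.2.0",
    "peft": "0.18.1",
    "bitsandbytes": "0.49.2",
    "liger-kernel": "0.7.0",
}


def apply_override(req_path: Path, pins: dict[str, str]) -> Path:
    """Copy req_path to '<name>.bak', then rewrite the pinned lines in place."""
    backup = req_path.with_name(req_path.name + ".bak")
    shutil.copy2(req_path, backup)
    rewritten = []
    for line in req_path.read_text().splitlines():
        name = line.split("==")[0].strip()
        # Lines for packages outside `pins` (such as trl) pass through unchanged.
        rewritten.append(f"{name}=={pins[name]}" if name in pins else line)
    req_path.write_text("\n".join(rewritten) + "\n")
    return backup


def restore(req_path: Path, backup: Path) -> None:
    """Move the backup over the overridden file, removing the .bak copy."""
    shutil.move(str(backup), str(req_path))
```

Calling `apply_override` before job submission and `restore` in the final cell keeps the shared file unchanged for sibling recipes once the notebook finishes.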

Validation

This PR's smoke tests (us-east-1, 2026-05-22)

Two SageMaker training jobs were run from a clean upstream checkout with only this PR's notebooks, YAMLs, and the %%writefile-applied requirements.txt:

| # | Notebook | Strategy | Instance | Status | Billable time | Loss |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | `finetune--Qwen--Qwen3.5-4B.ipynb` | QLoRA | `ml.g5.2xlarge` | Completed | 688 s | 1.66 → 1.41 |
| 2 | `finetune--Qwen--Qwen3.5-4B-Base.ipynb` | QLoRA | `ml.g5.2xlarge` | Completed | 471 s | 1.57 → 1.41 |

Both jobs trained 20 steps over 100 rows of Josephgflowers/Finance-Instruct-500k, saved a PEFT adapter, and ran the upstream merge_adapter_weights.py to completion (the merge step works for Qwen 3.5 — no Gemma-4-style skip needed).

Reference-repo validation matrix

All eight recipe × instance combinations have additionally been validated end-to-end with real SageMaker training jobs in the reference repo. Highlights:

| # | Variant | Strategy | Instance | GPU(s) | Billable time |
| --- | --- | --- | --- | --- | --- |
| T1 | 4B Instruct | QLoRA | `ml.g5.2xlarge` | 1× A10G 24 GB | ~21 min |
| T2 | 4B Base | Full SFT | `ml.g7e.2xlarge` | 1× RTX PRO 6000 96 GB | ~29 min |
| T3 | 4B Instruct | Full SFT | `ml.g7e.2xlarge` | 1× RTX PRO 6000 96 GB | ~30 min |
| T4 | 9B Base | Full SFT | `ml.g7e.12xlarge` | 4× RTX PRO 6000 (384 GB total) | ~49 min |
| T5 | 9B Instruct | Full SFT | `ml.g7e.12xlarge` | 4× RTX PRO 6000 (384 GB total) | ~46 min |
| T6 | 9B Instruct | QLoRA | `ml.g5.2xlarge` | 1× A10G 24 GB | ~28 min |
| T7 | 9B Instruct | QLoRA | `ml.g6e.2xlarge` | 1× L40S 48 GB | ~22 min |

The QLoRA recipes were also validated as portable across ml.g5.2xlarge / ml.g6.4xlarge / ml.g7e.2xlarge without any recipe changes.
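
A back-of-the-envelope estimate may help explain why full fine-tuning of the 9B model is listed on a multi-GPU instance while the 4B model fits on one GPU. The sketch below assumes a common rule of thumb of about 16 bytes per parameter for mixed-precision Adam (half-precision weights and gradients plus fp32 master weights and two optimizer moments), ignores activations, and uses nominal parameter counts of 4 and 9 billion. None of these figures come from the PR, and the recipes may use a different optimizer or precision.

```python
import math


def full_ft_state_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    """Approximate weight + gradient + optimizer state in GB (activations excluded).

    bytes_per_param=16 is an assumed rule of thumb, not a measured value.
    """
    return params_billion * bytes_per_param


def nf4_weights_gb(params_billion: float) -> float:
    """Approximate size of 4-bit quantized base weights (0.5 bytes per parameter)."""
    return params_billion * 0.5


def min_gpus(state_gb: float, gpu_gb: float) -> int:
    """Smallest GPU count whose combined memory covers the state, ignoring overhead."""
    return math.ceil(state_gb / gpu_gb)


for size in (4, 9):
    state = full_ft_state_gb(size)
    print(f"{size}B full FT: ~{state:.0f} GB state, "
          f">= {min_gpus(state, 96)} x 96 GB GPU(s); "
          f"4-bit weights ~{nf4_weights_gb(size):.1f} GB")
```

Under these assumptions the 4B model needs roughly 64 GB of state, which fits within a single 96 GB GPU, while the 9B model needs roughly 144 GB and so exceeds one GPU. The 4-bit base weights for either size stay well under the 24 GB of an A10G, which is consistent with the QLoRA rows above.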

Rationale

Qwen 3.5 has been generally available on Hugging Face for several months and is a strong general-purpose model that customers ask about regularly. The qwen3_5 architecture isn't in transformers 4.57 (the shared requirements.txt pin), and that single bump cascades to peft / bitsandbytes / liger-kernel — but none of the cascade reaches trl or the source code, so the change is fully contained in requirements.txt.

Submitting this as a notebook-only PR for now (matching the precedent of the Gemma 4 recipe submitted earlier today) so reviewers can merge it without touching shared code or other recipes. If requirements.txt is bumped repo-wide in a future PR, the %%writefile cell becomes a no-op — the notebook continues to work without change.
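
To see whether the override still changes anything after a future repo-wide bump, one could compare the shared file against the required pins. The following is a minimal sketch: `pending_bumps` and `REQUIRED` are hypothetical names, and the parser only understands simple `name==version` lines.

```python
# Required pins from the override table in this PR.
REQUIRED = {
    "transformers": "5.2.0",
    "peft": "0.18.1",
    "bitsandbytes": "0.49.2",
    "liger-kernel": "0.7.0",
}


def pending_bumps(requirements_text: str, required: dict[str, str]) -> dict[str, str]:
    """Return the required pins that the given requirements text does not already satisfy."""
    current = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            current[name.strip().lower()] = version.strip()
    return {pkg: ver for pkg, ver in required.items() if current.get(pkg) != ver}
```

An empty result would indicate that the shared pins already match, so the override has no remaining effect on those four packages.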

Files added

0_model_customization_recipes/supervised_finetuning/
├── finetune--Qwen--Qwen3.5-4B-Base.ipynb
├── finetune--Qwen--Qwen3.5-4B.ipynb
├── finetune--Qwen--Qwen3.5-9B-Base.ipynb
├── finetune--Qwen--Qwen3.5-9B.ipynb
└── sagemaker_code/hf_recipes/Qwen/
    ├── Qwen3.5-4B-Base--vanilla-peft-qlora.yaml
    ├── Qwen3.5-4B-Base--vanilla-full.yaml
    ├── Qwen3.5-4B--vanilla-peft-qlora.yaml
    ├── Qwen3.5-4B--vanilla-full.yaml
    ├── Qwen3.5-9B-Base--vanilla-peft-qlora.yaml
    ├── Qwen3.5-9B-Base--vanilla-full.yaml
    ├── Qwen3.5-9B--vanilla-peft-qlora.yaml
    └── Qwen3.5-9B--vanilla-full.yaml
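
The eight recipe filenames follow a regular size × variant × strategy pattern, which can be enumerated as a quick sanity check. The helper name `recipe_names` is hypothetical; the pattern itself is read off the listing above.

```python
from itertools import product

SIZES = ("4B", "9B")
VARIANT_SUFFIXES = ("-Base", "")  # pretrained, then post-trained
STRATEGIES = ("peft-qlora", "full")


def recipe_names() -> list[str]:
    """Enumerate recipe YAML filenames in the same order as the listing above."""
    return [
        f"Qwen3.5-{size}{suffix}--vanilla-{strategy}.yaml"
        for size, suffix, strategy in product(SIZES, VARIANT_SUFFIXES, STRATEGIES)
    ]
```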

Adds 4 notebooks and 8 recipe YAMLs covering Qwen 3.5 in 4B and 9B sizes, each in both Base (pretrained) and Instruct (post-trained) variants. Each notebook supports QLoRA (4-bit) and full fine-tuning via a strategy toggle.

Zero source-code changes. Each notebook overrides sagemaker_code/requirements.txt via %%writefile (with .bak backup, restored by the final cell).

Pin bumps:
- transformers 4.57.0 -> 5.2.0 (qwen3_5 architecture not in 4.x)
- peft 0.17.0 -> 0.18.1 (HybridCache removed in transformers 5.x)
- bitsandbytes 0.46.1 -> 0.49.2 (first version with CUDA 13.0 binary)
- liger-kernel 0.6.1 -> 0.7.0 (HybridCache compat)

trl == 0.21.0 unchanged. No edits to sft.py, merge_adapter_weights.py, or sm_accelerate_train.sh.

Validated end-to-end with two SageMaker training jobs (us-east-1, ml.g5.2xlarge): Qwen3.5-4B and Qwen3.5-4B-Base, both Completed, loss trajectories ~1.6 -> ~1.4.

Reference repo at https://github.com/dgallitelli/qwen35-sft-sagemaker carries a full 8-config validation matrix.

dgallitelli wants to merge 1 commit into aws-samples:main from dgallitelli:add-qwen35-recipes
