[PyTorch/Common] Remove legacy FP8DS implementation #2959
cyanguwa wants to merge 10 commits into NVIDIA:main
Conversation
Signed-off-by: Charlene Yang <[email protected]>
for more information, see https://pre-commit.ci
Greptile Summary

This PR removes the legacy FP8 Delayed Scaling implementation from TE 1.6.0, which supported only the T3HD layout with max_seq_len <= 512, head_dim = 64, and a padding mask.
Confidence Score: 5/5

Safe to merge: a focused deletion of the v0 T3HD FP8 path with consistent cleanup across the C++, CUDA, and Python layers. All changes are pure removals or renames of code that was gated behind the T3HD layout. The v1 functions are promoted to canonical names, and the call sites, parameter lists, aux-tensor packing, and tests are updated uniformly. The T3HD layout itself remains valid for the F16/BF16 paths; only the FP8 subpath is dropped. No new logic is introduced, and no files require special attention.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["nvte_get_fused_attn_backend()"] --> B{FP8 dtype?}
    B -- No --> C["F16/BF16 backend selection"]
    B -- Yes --> D{cuDNN version & layout checks}
    D -- "cuDNN >= 9.2.1 & BSHD/SBHD/BHSD" --> E["NVTE_FP8 backend"]
    D -- "layout = T3HD\n(or unsupported)" --> F["NVTE_No_Backend\n(T3HD FP8 no longer supported)"]
    E --> G["nvte_fused_attn_fwd / bwd"]
    G --> H{qkv_format?}
    H -- "BSHD / SBHD / BHSD" --> I["fused_attn_fp8_fwd_impl()\nfused_attn_fp8_bwd_impl()\n(formerly _v1, now canonical)"]
    H -- "Other (e.g. THD)" --> J["NVTE_ERROR: unsupported format"]
    I --> K["Aux CTX tensors:\nS (softmax stats)\noptional Max\nrng_state"]
```
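The decision tree above can be sketched in a few lines. This is an illustrative Python sketch, not the actual TE C++ API: the function name `select_backend` and the string return values are hypothetical stand-ins for the real `nvte_get_fused_attn_backend()` enum logic.

```python
# Hypothetical sketch of the backend-selection logic in the flowchart.
# After this PR, FP8 fused attention requires cuDNN >= 9.2.1 and one of
# the BSHD/SBHD/BHSD layouts; T3HD no longer has an FP8 backend.

def select_backend(dtype, qkv_layout, cudnn_version):
    if dtype != "fp8":
        # Non-FP8 paths are unchanged by this PR (T3HD stays valid here).
        return "F16_BF16_BACKEND"
    if cudnn_version >= (9, 2, 1) and qkv_layout in ("BSHD", "SBHD", "BHSD"):
        return "NVTE_FP8"
    # T3HD (or any other unsupported layout) falls through to no backend.
    return "NVTE_No_Backend"
```

For example, `select_backend("fp8", "T3HD", (9, 3, 0))` now resolves to `NVTE_No_Backend`, whereas the non-FP8 T3HD path is untouched.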
Reviews (6): Last reviewed commit: "address review: drop dead 8.9 FP8 guard ..."
/te-ci L0
Description
This PR removes a legacy path of the FP8 Delayed Scaling implementation from TE 1.6.0. That path supported only the T3HD layout with max_seq_len <= 512, head_dim = 64, and a padding mask. cudnn-frontend will remove its pre-FORT hand-written FMHA kernels (MR2829), hence the removal of this FP8 implementation here. General THD support for FP8 will be added in future PRs.
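After the removal, the former `_v1` functions become the only FP8 implementations, and any format outside BSHD/SBHD/BHSD errors out. A minimal Python sketch of that dispatch, with hypothetical names (the real symbols are the C++ `fused_attn_fp8_fwd_impl` and friends, not these stubs):

```python
# Illustrative post-PR dispatch: the former *_v1 functions are promoted
# to canonical names and are the sole FP8 path.

def fused_attn_fp8_fwd_impl(qkv_format):
    # Stand-in for the canonical FP8 forward kernel launcher.
    return f"fp8_fwd:{qkv_format}"

def fused_attn_fp8_fwd(qkv_format):
    # Only BSHD/SBHD/BHSD reach the FP8 kernels; anything else
    # (e.g. THD) raises until general THD support lands in a later PR.
    if qkv_format in ("BSHD", "SBHD", "BHSD"):
        return fused_attn_fp8_fwd_impl(qkv_format)
    raise ValueError(f"unsupported qkv_format for FP8: {qkv_format}")
```

Callers that previously relied on the T3HD FP8 subpath would now hit the error branch instead of silently selecting the deleted v0 kernels.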
Type of change
Changes
See Description.
Checklist: