Fix rawBufferVectorLoad/Store to widen min precision types to 32-bit#8274
Conversation
RawBufferVectorLoad/Store for min precision types (min16int, min16uint, min16float) was emitting i16/f16 vector operations (e.g., v3i16) which causes WARP and potentially other drivers to load/store 2 bytes per element instead of 4. This mismatches the buffer layout when the CPU writes 32-bit values. Pre-SM6.9 RawBufferLoad correctly handles this by loading as i32/f32 and truncating. Apply the same pattern for SM6.9 vector variants: - RawBufferVectorLoad: load as v_i32/v_f32, then trunc to i16/half - RawBufferVectorStore: sext/fpext to i32/f32, then store as v_i32/v_f32 This matches the existing bool widening pattern already in TranslateBufLoad. Co-authored-by: Copilot <[email protected]>
The DXC compiler fix in PR microsoft#8274 handles min precision vector load/store correctly at the DXIL level, so the test-side IO type indirection (MIN_PRECISION, IO_TYPE, IO_OUT_TYPE, BUFFER_TYPE, dispatchMinPrecisionTest, HLK_MIN_PRECISION_TEST) is no longer needed. Min precision tests now use the same HLK_TEST/HLK_WAVEOP_TEST macros and dispatch paths as all other types. Co-authored-by: Copilot <[email protected]>
…in-precision-vector-load
Are we confident this is the right change? Doesn't SM6.9 make 16bit types manditory and IIRC soon we are going to make them always enabled. |
SM6.9 makes 16-bit type support mandatory. But a shader would still require the flag. If we make them always enabled would that require a new SM? |
The min precision widening changes introduce trunc instructions that
cause SSA values to get names like %.i08 instead of numeric %6. Widen
the FileCheck regex from %{{[0-9]+}} to %{{[^ ,]+}} to accept both.
Co-authored-by: Copilot <[email protected]>
Same fix as min16uint_vec — widen regex from %{{[0-9]+}} to %{{[^ ,]+}}
to accept named SSA values introduced by min precision widening.
Co-authored-by: Copilot <[email protected]>
|
Added by Copilot The min precision widening fix introduces |
V-FEXrt
left a comment
There was a problem hiding this comment.
Few more comments this time, though I again would like a review from a min precision person because while it looks like the PR does what it says it does I'm not actually able to say that we want to do what the PR does (specifically forcing 16->32 given that 16 is a mandatory thing moving forward)
V-FEXrt
left a comment
There was a problem hiding this comment.
Few more comments this time, though I again would like a review from a min precision person because while it looks like the PR does what it says it does I'm not actually able to say that we want to do what the PR does (specifically forcing 16->32 given that 16 is a mandatory thing moving forward)
not sure, but even if it did it would easily go into SM6.10. @daymanp can comment but the idea was to default the flag to true instead of false |
- Extract isMinPrecisionType() and widenMinPrecisionType() helpers - Remove unused TargetTy variable - Shorten verbose comments - Simplify store widening using helper function Co-authored-by: Copilot <[email protected]>
Co-authored-by: Alex Sepkowski <[email protected]>
Co-authored-by: Tex Riddell <[email protected]>
- widenMinPrecisionType now takes a single Type* (vector or scalar) and LLVMContext instead of separate EltTy/VecOrScalarTy and IRBuilder. - Load path uses widenMinPrecisionType upfront, eliminating the else-if-isMinPrec branch per tex3d's suggestion. - Store widening now covers both RawBufferStore and RawBufferVectorStore. Without this, scalar min-precision stores crash with a cast type mismatch. - Added scalar RawBufferLoad/Store test cases for all 3 min-precision types. - Added TODO(microsoft#8314) for SExt/ZExt signedness issue. Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
I don't think that defaulting that flag to causes any apparent problems specific to this change. @tex3d to confirm and keep me honest though! |
|
Good point, I'll change that.
…________________________________
From: Tex Riddell ***@***.***>
Sent: Monday, March 30, 2026 5:34 PM
To: microsoft/DirectXShaderCompiler ***@***.***>
Cc: Alex Sepkowski ***@***.***>; Author ***@***.***>
Subject: Re: [microsoft/DirectXShaderCompiler] Fix rawBufferVectorLoad/Store to widen min precision types to 32-bit (PR #8274)
@tex3d commented on this pull request.
________________________________
On tools/clang/test/HLSLFileCheck/hlsl/objects/ByteAddressBuffer/min_precision_vector_load_store.hlsl<#8274?email_source=notifications&email_token=ABK4EW44PBDODVTS2MNMK534TMG75A5CNFSNUABKM5UWIORPF5TWS5BNNB2WEL2QOVWGYUTFOF2WK43UKJSXM2LFO4XTIMBTGM4TOOBVGUY2M4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2K64DSL5ZGK5TJMV3V6Y3MNFRWW#discussion_r3012846219>:
My only nit would be with the filename now, since it covers scalar and vector. Not a big deal.
—
Reply to this email directly, view it on GitHub<#8274?email_source=notifications&email_token=ABK4EW6VB4T6VTGVSE4Q6AT4TMG75A5CNFSNUABKM5UWIORPF5TWS5BNNB2WEL2QOVWGYUTFOF2WK43UKJSXM2LFO4XTIMBTGM4TOOBVGUY2M4TFMFZW63VGMF2XI2DPOKSWK5TFNZ2L24DSL5ZGK5TJMV3V63TPORUWM2LDMF2GS33OONPWG3DJMNVQ#pullrequestreview-4033978551>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABK4EWZLF3L5GTLZAVV3OG34TMG75AVCNFSM6AAAAACWV3OWB2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DAMZTHE3TQNJVGE>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Shorten 3 comments in HLOperationLower.cpp that re-described code behavior. Each now explains the underlying reason in 1 line. The TODO(microsoft#8314) comment was already concise and left unchanged. Co-authored-by: Copilot <[email protected]>
The test covers both vector and scalar paths, so drop 'vector' from the file name to accurately reflect scope. Co-authored-by: Copilot <[email protected]>
Restores SPIRV-Headers and SPIRV-Tools submodule pointers to their previous state and removes setup_agility_sdk.py which was staged unintentionally. Co-authored-by: Copilot <[email protected]>
The store widening for min precision types was applying to both RawBufferStore (scalar) and RawBufferVectorStore (vector). This broke StructuredBuffer stores (struct_buf3.hlsl) because it replaced the correct sext/zext from TranslateMinPrecisionRawBuffer with a blanket sext, losing signedness info for min16uint. Scope the RawBufferStore widening to RawBuffer (ByteAddressBuffer) only. StructuredBuffer scalar stores are correctly handled by the later TranslateMinPrecisionRawBuffer pass in DxilGenerationPass, which has signedness info from struct type annotations. ByteAddressBuffer scalar stores still need widening here because the later pass crashes on non-struct resource types (cast<StructType> on ByteAddressBuffer's i32 inner element). Co-authored-by: Copilot <[email protected]>
Move scalar RawBufferStore widening back to TranslateMinPrecisionRawBuffer in DxilGenerationPass where it belongs, keeping HLOperationLower scoped to RawBufferVectorStore only. Fix crash in ReplaceMinPrecisionRawBufferStoreByType for ByteAddressBuffer: the existing code did cast<StructType> on the inner resource element, which fails for ByteAddressBuffer (inner element is i32, not a struct). Use dyn_cast instead and fall back to sext when no struct annotation is available. Preserve undef args to avoid store mask validation errors. Co-authored-by: Copilot <[email protected]>
Reverts the DxilGenerationPass ByteAddressBuffer scalar store fix from PR microsoft#8274 — it was out of scope and incomplete (sext fallback is wrong for min16uint). Scalar ByteAddressBuffer template store widening should be handled during CodeGen where signedness info is still available. Removes scalar load/store tests from min_precision_raw_load_store.hlsl since scalar ByteAddressBuffer::Store<min16int>() hits a pre-existing crash in TranslateMinPrecisionRawBuffer (cast<StructType> on non-struct resource type). Test now covers vector loads/stores only, which is the scope of the original fix. Co-authored-by: Copilot <[email protected]>
## Summary Reverts the `DxilGenerationPass` ByteAddressBuffer scalar store changes and removes scalar store tests that were included in PR #8274. These changes were out of scope for the vector load/store fix and, on further discussion, were concluded to be incomplete. Follow-up issue for the proper fix: #8322 ## What changed - **DxilGenerationPass.cpp**: Reverted the ByteAddressBuffer fallback path added in `ReplaceMinPrecisionRawBufferStoreByType`. The `sext` fallback was wrong for `min16uint` (loses signedness), and the fix belongs in CodeGen where signedness info is still available. - **min_precision_raw_load_store.hlsl**: Removed scalar load/store tests. Scalar `ByteAddressBuffer::Store<min16int>()` hits a pre-existing crash in `TranslateMinPrecisionRawBuffer` (`cast<StructType>` on ByteAddressBuffer's `i32` inner element). Test now covers vector loads/stores only, which is the scope of the original fix. ## Context The original PR #8274 correctly fixed `RawBufferVectorLoad/Store` to widen min precision types to 32-bit. However, it also added a ByteAddressBuffer scalar store fix in `DxilGenerationPass` that: 1. Crept outside the scope of the vector load/store fix 2. Was incomplete — the `sext` fallback is wrong for unsigned types (`min16uint`) 3. Should instead be handled during Clang CodeGen, where signedness information is available Scalar ByteAddressBuffer template store widening for min precision types is a separate pre-existing issue that needs a proper fix in CodeGen. Co-authored-by: Copilot <[email protected]>
…#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves #7780 All tests require the rawBufferVectorLoad/Store fix from : #8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269 Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…icrosoft#8274) ## Summary Fixes `RawBufferVectorLoad`/`Store` to use 32-bit element types (`i32`/`f32`) for min precision types (`min16int`, `min16uint`, `min16float`) instead of 16-bit (`i16`/`f16`). This matches how pre-SM6.9 `RawBufferLoad` handles min precision. Resolves microsoft#8273 ## Root Cause `TranslateBufLoad` in `HLOperationLower.cpp` creates the vector type directly from the min precision element type (`i16`/`f16`) without widening to `i32`/`f32`. This causes WARP (and potentially other drivers) to load/store 2 bytes per element instead of 4, mismatching the buffer layout. ## Fix Apply the same widening pattern used for bool types: - **Load**: Load as `v_i32`/`v_f32`, then trunc/fptrunc back to `i16`/`half` - **Store**: `sext`/`fpext` to `i32`/`f32`, then store as `v_i32`/`v_f32` ## Testing Added FileCheck test verifying all 3 min precision types produce `i32`/`f32` vector load/store ops. Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Tex Riddell <[email protected]>
…icrosoft#8321) ## Summary Reverts the `DxilGenerationPass` ByteAddressBuffer scalar store changes and removes scalar store tests that were included in PR microsoft#8274. These changes were out of scope for the vector load/store fix and, on further discussion, were concluded to be incomplete. Follow-up issue for the proper fix: microsoft#8322 ## What changed - **DxilGenerationPass.cpp**: Reverted the ByteAddressBuffer fallback path added in `ReplaceMinPrecisionRawBufferStoreByType`. The `sext` fallback was wrong for `min16uint` (loses signedness), and the fix belongs in CodeGen where signedness info is still available. - **min_precision_raw_load_store.hlsl**: Removed scalar load/store tests. Scalar `ByteAddressBuffer::Store<min16int>()` hits a pre-existing crash in `TranslateMinPrecisionRawBuffer` (`cast<StructType>` on ByteAddressBuffer's `i32` inner element). Test now covers vector loads/stores only, which is the scope of the original fix. ## Context The original PR microsoft#8274 correctly fixed `RawBufferVectorLoad/Store` to widen min precision types to 32-bit. However, it also added a ByteAddressBuffer scalar store fix in `DxilGenerationPass` that: 1. Crept outside the scope of the vector load/store fix 2. Was incomplete — the `sext` fallback is wrong for unsigned types (`min16uint`) 3. Should instead be handled during Clang CodeGen, where signedness information is available Scalar ByteAddressBuffer template store widening for min precision types is a separate pre-existing issue that needs a proper fix in CodeGen. Co-authored-by: Copilot <[email protected]>
…microsoft#8260) This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — `D3D12_SHADER_MIN_PRECISION_SUPPORT` only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing `DxilConf_SM69_Vectorized_Core` class alongside other types. Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage. ## Key design decisions **Full-precision buffer I/O:** Min precision types have implementation-defined buffer storage width, so we use full-precision types (`float`/`int`/`uint`) for all Load/Store operations via the `IO_TYPE`/`IO_OUT_TYPE` shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation. **Half-precision tolerances:** Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less. **Test coverage mirrors existing patterns:** - min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store) - min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store) - min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store) **Wave and quad op support:** Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use `#ifdef MIN_PRECISION` guards to store results via `IO_OUT_TYPE` for deterministic buffer layout without changing DXIL for existing non-min-precision tests. **Excluded operations:** - Signed div/mod on min16int: HLSL does not support signed integer division on min precision types - Bit shifting on min16int/min16uint: Not supported for min precision types - FP specials (INF/NaN/denorm): min precision types do not support them Resolves microsoft#7780 All tests require the rawBufferVectorLoad/Store fix from : microsoft#8274 The array accessor and wave/quad op tests for min precision require the optimizer fix from: microsoft#8269 Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…in precision types to 32-bit (#8369) Cherry-pick PR (#8274) and revert of out-of-scope changes PR (#8321) Assisted by gh copilot. SHA [dc4354b](dc4354b) SHA [71aa195](71aa195) --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Tex Riddell <[email protected]>
Summary
Fixes
RawBufferVectorLoad/Storeto use 32-bit element types (i32/f32) for min precision types (min16int,min16uint,min16float) instead of 16-bit (i16/f16). This matches how pre-SM6.9RawBufferLoadhandles min precision.Resolves #8273
Root Cause
TranslateBufLoadinHLOperationLower.cppcreates the vector type directly from the min precision element type (i16/f16) without widening toi32/f32. This causes WARP (and potentially other drivers) to load/store 2 bytes per element instead of 4, mismatching the buffer layout.Fix
Apply the same widening pattern used for bool types:
v_i32/v_f32, then trunc/fptrunc back toi16/halfsext/fpexttoi32/f32, then store asv_i32/v_f32Testing
Added FileCheck test verifying all 3 min precision types produce
i32/f32vector load/store ops.Co-authored-by: Copilot [email protected]