Description
Multiple optimization passes mishandle min precision vector types due to DXC's padded data layout (i16:32, f16:32), where getTypeSizeInBits returns padded sizes for vectors (HLSL change) but primitive sizes for scalars. This causes three related bugs affecting min16float, min16int, and min16uint vector element access ([] operator).
Bug 1: GVN ICE (Internal Compiler Error)
CanCoerceMustAliasedValueToLoad computes an integer type using the padded size (e.g., 96 bits for <3 x half> instead of 48), then CoerceAvailableValueToLoadType attempts a bitcast from the 48-bit LLVM type to i96 — triggering an LLVM assert.
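The width mismatch behind the assert can be illustrated with a small standalone model. This is only a sketch and does not use LLVM: `vectorBits`, `bitcastWidthsMatch`, and `shouldRejectCoercion` are hypothetical helpers. It assumes only that a bitcast requires the source and destination to have the same bit width. The last helper is a guess at the general shape of a "reject when padded" check, not the code from the fix branch.

```cpp
#include <cassert>
#include <cstdint>

// Bit width of an N-element vector when every element occupies elemBits.
uint64_t vectorBits(uint64_t numElems, uint64_t elemBits) {
    assert(numElems > 0 && elemBits > 0);
    return numElems * elemBits;
}

// Hypothetical stand-in for the rule that a bitcast needs equal bit widths.
bool bitcastWidthsMatch(uint64_t srcBits, uint64_t dstBits) {
    return srcBits == dstBits;
}

// Assumed shape of a guard: refuse coercion whenever the size reported by
// the data layout differs from the primitive size of the type.
bool shouldRejectCoercion(uint64_t primitiveBits, uint64_t paddedBits) {
    return primitiveBits != paddedBits;
}
```

With <3 x half>, vectorBits(3, 16) gives the 48-bit primitive width and vectorBits(3, 32) gives the 96-bit padded width, so bitcastWidthsMatch reports a mismatch and a guard of this shape would reject the coercion before the bitcast is built.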
Bug 2: GVN Incorrect Store-to-Load Forwarding (Silent Miscompile)
GVN's processLoad forwards a store <3 x i16> zeroinitializer directly to a later load <3 x i16>, ignoring intermediate partial store i16 writes to individual vector elements. This happens because MemoryDependenceAnalysis uses padded type sizes to determine aliasing.
Bug 3: SROA Element Misindexing (Silent Miscompile)
This is the root cause of the test failures. SROA's getNaturalGEPRecursively uses getTypeSizeInBits (primitive size: 2 bytes for i16) for vector element offset calculations, while GEP offset computation uses getTypeAllocSize (padded size: 4 bytes with i16:32). This mismatch causes byte offset 4 (element 1) to be mapped to vector index 4/2 = 2 instead of 4/4 = 1, leading SROA to misplace or eliminate stores to vector elements.
Result: Only element [0] is correct; elements [1] and [2] are zeroed.
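The index arithmetic can be checked with a few lines of standalone C++. This is a sketch of the division only, not SROA code: `elementIndexForOffset` and the two constants are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Element size SROA derives from getTypeSizeInBits(i16): 16 bits = 2 bytes.
constexpr uint64_t kPrimitiveElemBytes = 2;
// Element stride the GEP offsets use under i16:32: 4 bytes.
constexpr uint64_t kAllocElemBytes = 4;

// Maps a byte offset inside a vector to an element index by plain division.
uint64_t elementIndexForOffset(uint64_t byteOffset, uint64_t elemSizeBytes) {
    assert(elemSizeBytes != 0 && "element size must be non-zero");
    return byteOffset / elemSizeBytes;
}
```

Dividing offset 4 by kPrimitiveElemBytes gives index 2, while dividing by kAllocElemBytes gives the intended index 1. For offset 8 (element 2) the primitive size gives index 4, which is outside a 3-element vector.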
Repro
RWByteAddressBuffer g_In : register(u0);
RWByteAddressBuffer g_Out : register(u1);
[numthreads(1,1,1)]
void main() {
    vector<int, 3> raw = g_In.Load< vector<int, 3> >(0);
    vector<min16int, 3> v = (vector<min16int, 3>)raw;
    vector<min16int, 3> out_v = (min16int)0;
    out_v[0] = v[0];
    out_v[2] = v[2];
    out_v[1] = v[1];
    g_Out.Store< vector<int, 3> >(0, (vector<int, 3>)out_v);
}
Compile with: dxc -T cs_6_9 repro.hlsl
-O0 / -Od: correct results
-O1 (default): Bug 1 (ICE) or Bug 3 (wrong results)
Also reproduces with min16float and min16uint.
Root Cause
DXC's data layout pads min precision types: i16:32 and f16:32. The HLSL change in DataLayout::getTypeSizeInBits (lines 540-543) makes vector sizes use getTypeAllocSizeInBits per element, so getTypeSizeInBits(<3 x i16>) = 96 (3 x 32). For scalars, however, getTypeSizeInBits(i16) = 16, which is the primitive width.
This inconsistency propagates through:
- GVN: Uses padded vector sizes for bitcast width calculations and alias reasoning
- SROA: Uses primitive scalar sizes for vector element offsets but padded alloc sizes for GEP offsets — causing index mismatches
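The size inconsistency described above can be reproduced with a toy model. This is a sketch under stated assumptions, not LLVM's DataLayout API: `ScalarLayout` and the four functions are hypothetical, and the alloc size is modeled simply as the primitive width rounded up to the ABI alignment.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one scalar type under a layout entry such as i16:32.
struct ScalarLayout {
    uint64_t primitiveBits;  // width of the type itself, e.g. 16 for i16
    uint64_t abiAlignBits;   // ABI alignment from the layout string, e.g. 32
};

// Scalar size: the primitive width, with no padding.
uint64_t scalarSizeInBits(const ScalarLayout &s) {
    return s.primitiveBits;
}

// Alloc size: the primitive width rounded up to the ABI alignment.
uint64_t scalarAllocSizeInBits(const ScalarLayout &s) {
    assert(s.abiAlignBits != 0 && "alignment must be non-zero");
    return (s.primitiveBits + s.abiAlignBits - 1) / s.abiAlignBits * s.abiAlignBits;
}

// Vector size as described for the HLSL change: alloc size per element.
uint64_t vectorSizeInBitsPadded(const ScalarLayout &s, uint64_t numElems) {
    return numElems * scalarAllocSizeInBits(s);
}

// Vector size computed from primitive widths only.
uint64_t vectorSizeInBitsPrimitive(const ScalarLayout &s, uint64_t numElems) {
    return numElems * scalarSizeInBits(s);
}
```

For ScalarLayout{16, 32} and three elements, the padded vector size is 96 bits while the scalar size stays at 16 bits and the primitive vector size at 48 bits. Code that mixes the two conventions therefore disagrees about both total width and element stride.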
Fix
Three guards in lib/Transforms/Scalar/GVN.cpp and lib/Transforms/Scalar/SROA.cpp:
- GVN CanCoerceMustAliasedValueToLoad: Reject coercion when type sizes include padding
- GVN processLoad: Skip store-to-load forwarding for padded types
- SROA: Use getTypeAllocSizeInBits for vector element sizes in getNaturalGEPRecursively, isVectorPromotionViable, and AllocaSliceRewriter, matching GEP offset calculations
Fix branch: https://github.com/alsepkow/DirectXShaderCompiler/tree/user/alsepkow/fix-min-precision-opt-bugs
Squashed commit: alsepkow@b34136b9a
Environment
- DXC version: 1.9.0 (main branch, SM 6.9)
- Affects: all min precision types (min16float, min16int, min16uint) with vector element access
- Does NOT affect native 16-bit types (half with -enable-16bit-types)