
Commit d73a9f5

[SM 6.9] Fix OuterProductAccumulate FP32 Accumulator case in ExecTest. (#7482)
The switch that sets SrcEltSize and DestEltSize is missing an FP32 case. As a result, the matrix buffer is not initialized with all 1.0s, and tests fail because the computed result differs from the expected result by -1.0. Verified correctness with NVIDIA internal driver build.
1 parent 7f86d74 commit d73a9f5

1 file changed

Lines changed: 5 additions & 0 deletions

tools/clang/unittests/HLSLExec/ExecutionTest.cpp

@@ -13501,6 +13501,11 @@ float4 ps_main() : SV_Target {
     SrcEltSize = 4; // FP32
     DestEltSize = 2; // FP16
     break;
+  case D3D12_LINEAR_ALGEBRA_DATATYPE_FLOAT32:
+    DestInfo.DestDataType = D3D12_LINEAR_ALGEBRA_DATATYPE_FLOAT32;
+    SrcEltSize = 4; // FP32
+    DestEltSize = 4; // FP32
+    break;
   case D3D12_LINEAR_ALGEBRA_DATATYPE_FLOAT_E4M3:
     DestInfo.DestDataType = D3D12_LINEAR_ALGEBRA_DATATYPE_FLOAT_E4M3;
     SrcEltSize = 4; // FP32
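
The failure mode described in the commit message can be illustrated with a small standalone sketch. This is not the code in ExecutionTest.cpp. The enum `EltType`, the struct `EltSizes`, and the functions `LookupEltSizes`, `InitFp32Accumulator`, and `OuterProductAccumulate` are hypothetical stand-ins, and the assumption that an unhandled case leaves the sizes at zero (so the 1.0 fill is skipped) is a guess at the mechanism, made only to show how a missing switch case could produce a result that is off by -1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the destination data type; not the D3D12 enum.
enum class EltType { Float16, Float32 };

struct EltSizes {
  std::size_t Src = 0;  // bytes per source element
  std::size_t Dest = 0; // bytes per destination (accumulator) element
};

// Maps a destination type to element sizes. HasFp32Case simulates the switch
// before (false) and after (true) the commit; the pre-fix fallthrough to
// zero sizes is an assumption of this sketch.
EltSizes LookupEltSizes(EltType DestType, bool HasFp32Case) {
  EltSizes Sizes;
  switch (DestType) {
  case EltType::Float16:
    Sizes.Src = 4;  // FP32
    Sizes.Dest = 2; // FP16
    break;
  case EltType::Float32:
    if (HasFp32Case) {
      Sizes.Src = 4;  // FP32
      Sizes.Dest = 4; // FP32
    }
    break;
  }
  return Sizes;
}

// Creates an FP32 accumulator and fills it with 1.0f only when the
// destination element size matches FP32; otherwise it stays zeroed.
std::vector<float> InitFp32Accumulator(std::size_t NumElts, EltSizes Sizes) {
  std::vector<float> Buf(NumElts, 0.0f);
  if (Sizes.Dest == sizeof(float))
    std::fill(Buf.begin(), Buf.end(), 1.0f);
  return Buf;
}

// Reference outer-product accumulate: Acc[i][j] += A[i] * B[j].
void OuterProductAccumulate(const std::vector<float> &A,
                            const std::vector<float> &B,
                            std::vector<float> &Acc) {
  assert(Acc.size() == A.size() * B.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    for (std::size_t J = 0; J < B.size(); ++J)
      Acc[I * B.size() + J] += A[I] * B[J];
}
```

In this sketch, running the accumulate step on a buffer initialized with the pre-fix lookup yields values exactly 1.0 lower than with the post-fix lookup, which matches the symptom the commit message reports.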
