Use half-precision ULP for min16float dot product tolerance

alsepkow · Copilot · alsepkow · commit d8cfc9ee5314 · 2026-03-13T22:53:40.000-07:00
The dot product tolerance computation was using float32 ULPs for
HLSLMin16Float_t, but the GPU may compute at float16 precision.
With NUM=256 elements the accumulated error exceeds the float32-based
epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for
min16float, matching the approach already used for HLSLHalf_t.
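
To give a rough sense of the gap between the two tolerances, here is a
minimal standalone sketch. It does not use HLSLHalf_t; halfUlp and floatUlp
are hypothetical helpers introduced only for illustration, and halfUlp
derives the spacing analytically from the binary16 layout instead of
incrementing the float16 bit pattern as HLSLHalf_t::GetULP is described to do.
It assumes finite inputs within float16 range.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: spacing between adjacent float16 values at |A|.
// binary16 has 10 explicit mantissa bits and a minimum normal exponent of -14.
double halfUlp(double A) {
  int Exp = 0;
  std::frexp(std::fabs(A), &Exp); // |A| = m * 2^Exp with m in [0.5, 1)
  int E = Exp - 1;                // floor(log2(|A|)) when A != 0
  if (A == 0.0 || E < -14)
    E = -14;                      // zero and subnormals share the 2^-24 spacing
  return std::ldexp(1.0, E - 10);
}

// Hypothetical helper: float32 ULP computed the same way as the generic
// branch of computeAbsoluteEpsilon, via std::nextafter toward +infinity.
double floatUlp(double A) {
  float F = static_cast<float>(A);
  float Next = std::nextafter(F, std::numeric_limits<float>::infinity());
  return static_cast<double>(Next) - static_cast<double>(F);
}
```

Under these assumptions, halfUlp(1.0) is 2^-10 and floatUlp(1.0) is 2^-23,
so a tolerance expressed in float32 ULPs is 8192 times tighter than the
same ULP count at float16 precision.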

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/tools/clang/unittests/HLSLExec/LongVectors.cpp b/tools/clang/unittests/HLSLExec/LongVectors.cpp
@@ -1359,7 +1359,12 @@ static double computeAbsoluteEpsilon(double A, double ULPTolerance) {
 
   if constexpr (std::is_same_v<T, HLSLHalf_t>)
     ULP = HLSLHalf_t::GetULP(A);
-  else
+  else if constexpr (std::is_same_v<T, HLSLMin16Float_t>) {
+    // Min precision floats may be computed at float16 on the GPU, so use
+    // half-precision ULP for tolerance. Reuse HLSLHalf_t::GetULP which
+    // computes ULP by incrementing the float16 bit representation.
+    ULP = HLSLHalf_t::GetULP(HLSLHalf_t(static_cast<float>(A)));
+  } else
     ULP =
         std::nextafter(static_cast<T>(A), std::numeric_limits<T>::infinity()) -
         static_cast<T>(A);