[SM6.10][Exec][Bugfix] Fix OuterProduct/AccumulateToDescriptor Smoke Tests for Thread Matrices#8387
[SM6.10][Exec][Bugfix] Fix OuterProduct/AccumulateToDescriptor Smoke Tests for Thread Matrices#8387V-FEXrt wants to merge 2 commits intomicrosoft:mainfrom
Conversation
| Device, DxcSupport, std::move(Op), | ||
| [NumElements, Params, FillValue](LPCSTR Name, std::vector<BYTE> &Data, | ||
| st::ShaderOp *) { | ||
| VERIFY_IS_TRUE(fillInputBuffer(Name, Data, Params.CompType, NumElements, |
There was a problem hiding this comment.
If the layout isn't RowMajor, this needs to run a ConvertLinearAlgebraMatrix
There was a problem hiding this comment.
<edit: the comment here was in the wrong place>
|
The title is a little misleading, AccumulateToDescriptor for Thread Scope matrices require OuterProductOptimal layouts, not all thread matrices. |
|
@anupamachandra yep, my bad. I threw the first draft of this together a bit too quickly. I'm working on updating it now. Thanks for pointing that out! |
| SS << " -DUSE=" << static_cast<int>(Params.Use); | ||
| SS << " -DSCOPE=" << static_cast<int>(Params.Scope); | ||
| SS << " -DSTRIDE=" << Params.strideBytes(); | ||
| SS << " -DSTRIDE=" << Params.rowStride(); |
There was a problem hiding this comment.
The stride is a problem for group shared load and store, from spec, the stride of group shared is the count of elements, so it should be N or M for group shared.
it needs to fix:
__builtin_LinAlg_MatrixLoadFromMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);
__builtin_LinAlg_MatrixStoreToMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);
also, group shared offset is set to 0 from test, it's okay here, but I guess the offset for group shared also the count of elements?
There was a problem hiding this comment.
Working on a fix for the stride issue!
IIRC OFFSET is still a proper offset into the array. If your group shared array is larger than a single matrix then it may contain other data in parts of the array before/atter the matrix data. Either way we should clarify that in the spec. I'll make a note
| // flatten the 2D index into a 1D index then scale by element size | ||
| // Always store row-major and work it out in the test runner | ||
| uint coordToByteOffset(uint2 coord) { | ||
| return (coord.y * N_DIM + coord.x) * ELEM_SIZE; |
There was a problem hiding this comment.
This is not related to this PR, but I guess coordToByteOffset should be this?
return (coord.x * N_DIM + coord.y) * ELEM_SIZE;
There was a problem hiding this comment.
It should be coord.y * M_DIM + coord.x) * ELEM_SIZE for a row-major calculation. This just happens to work because all tests have M == N. Good catch.
There was a problem hiding this comment.
Nice catch thanks!
Yes right now all smoke tests are (intentionally) square matrices. I'll update this in a separate PR
There was a problem hiding this comment.
GetCoordinate() returned coord.x is row coordinate and coord.y is column coordinate, (coord.x * N_DIM + coord.y) * ELEM_SIZE for row major while (coord.y * M_DIM + coord.x) * ELEM_SIZE for column major?
There was a problem hiding this comment.
(coord.x * N_DIM + coord.y) * ELEM_SIZE for row major while (coord.y * M_DIM + coord.x) * ELEM_SIZE for column major
Agree to that.
There was a problem hiding this comment.
Current spec says it converts a specified index into row and column coordinates. The valid range of Index is 0, Length()-1
There was a problem hiding this comment.
so spec will change to coord.x is coordinate within the row, and coord.y is coordinate within the column, not row and column index, is it right?
There was a problem hiding this comment.
I'm open to a change either way, the spec just needs to be more specific. I would've expected that an int2 with x and y members would use x and y coordinate addressing, instead of row and col addressing, but I can see an argument to be made for both directions. I filed microsoft/hlsl-specs#859 to close on it.
There was a problem hiding this comment.
Thanks, either way is okay for us, we will follow current coordinate address, and might change it based on microsoft/hlsl-specs#859
Fixes #8386