[SM6.10][Exec][Bugfix] Fix OuterProduct/AccumulateToDescriptor Smoke Tests for Thread Matrices #8387

Open
V-FEXrt wants to merge 2 commits into microsoft:main from V-FEXrt:linalg-outeropt-layout

Conversation

@V-FEXrt
Collaborator

@V-FEXrt V-FEXrt commented Apr 17, 2026

Fixes #8386

Comment thread tools/clang/unittests/HLSLExec/LinAlgTests.cpp Outdated
Device, DxcSupport, std::move(Op),
[NumElements, Params, FillValue](LPCSTR Name, std::vector<BYTE> &Data,
st::ShaderOp *) {
VERIFY_IS_TRUE(fillInputBuffer(Name, Data, Params.CompType, NumElements,
Member

If the layout isn't RowMajor, this needs to run a ConvertLinearAlgebraMatrix

Collaborator Author

@V-FEXrt V-FEXrt Apr 17, 2026

<edit: the comment here was in the wrong place>

Comment thread tools/clang/unittests/HLSLExec/LinAlgTests.cpp
@github-project-automation github-project-automation Bot moved this from New to In progress in HLSL Roadmap Apr 17, 2026
@anupamachandra
Collaborator

The title is a little misleading: it is AccumulateToDescriptor on Thread Scope matrices that requires OuterProductOptimal layouts, not all thread matrices.

@V-FEXrt
Collaborator Author

V-FEXrt commented Apr 17, 2026

@anupamachandra yep, my bad. I threw the first draft of this together a bit too quickly. I'm working on updating it now. Thanks for pointing that out!

@V-FEXrt V-FEXrt changed the title from "[SM6.10][Exec][Bugfix] Thread mats should be OuterProductOptimal layout" to "[SM6.10][Exec][Bugfix] AccumulateToDescriptor requiresx OuterProductOptimal for Thread mats" Apr 17, 2026
@V-FEXrt V-FEXrt changed the title from "[SM6.10][Exec][Bugfix] AccumulateToDescriptor requiresx OuterProductOptimal for Thread mats" to "[SM6.10][Exec][Bugfix] AccumulateToDescriptor requires OuterProductOptimal for Thread Mats" Apr 17, 2026
@V-FEXrt V-FEXrt changed the title from "[SM6.10][Exec][Bugfix] AccumulateToDescriptor requires OuterProductOptimal for Thread Mats" to "[SM6.10][Exec][Bugfix] Fix OuterProduct/AccumulateToDescriptor Smoke Tests for Thread Matrices" Apr 18, 2026
SS << " -DUSE=" << static_cast<int>(Params.Use);
SS << " -DSCOPE=" << static_cast<int>(Params.Scope);
SS << " -DSTRIDE=" << Params.strideBytes();
SS << " -DSTRIDE=" << Params.rowStride();

The stride is a problem for group shared load and store. Per the spec, the stride for group shared memory is a count of elements, so it should be N or M for group shared.

These calls need to be fixed:
__builtin_LinAlg_MatrixLoadFromMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);
__builtin_LinAlg_MatrixStoreToMemory(
Mat, GsData, OFFSET, STRIDE, LAYOUT);

Also, the test sets the group shared offset to 0, which is fine here, but I assume the offset for group shared is also a count of elements?

Collaborator Author

Working on a fix for the stride issue!

IIRC OFFSET is still a proper offset into the array. If your group shared array is larger than a single matrix, then it may contain other data in parts of the array before/after the matrix data. Either way, we should clarify that in the spec. I'll make a note.
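
To make the stride question concrete, here is a minimal C++ sketch of the two units under discussion. It assumes, as the review comment reads the spec, that the group shared stride is a count of elements, while a byte-addressed buffer would use a stride in bytes. The helper names (`strideInElements`, `strideInBytes`, `elementIndex`) are hypothetical and are not the ones in LinAlgTests.cpp, and treating the offset as an element count is only an illustration, since the thread leaves that point open pending a spec clarification.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only (not from LinAlgTests.cpp).
// Assumes a row-major matrix with N tightly packed elements per row.

// Stride as a count of elements, the unit the review comment says
// the spec uses for group shared memory.
constexpr std::size_t strideInElements(std::size_t N) { return N; }

// Stride in bytes, the unit a byte-addressed buffer would use.
constexpr std::size_t strideInBytes(std::size_t N, std::size_t ElemSize) {
  return N * ElemSize;
}

// Index into a typed array when both offset and stride are element counts.
// Assumption: the offset unit is elements; the thread does not settle this.
constexpr std::size_t elementIndex(std::size_t OffsetElems,
                                   std::size_t StrideElems, std::size_t Row,
                                   std::size_t Col) {
  return OffsetElems + Row * StrideElems + Col;
}
```

Under these assumptions, passing a byte stride where an element stride is expected would overshoot each row by a factor of the element size, which would only go unnoticed for 1-byte elements.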

// flatten the 2D index into a 1D index then scale by element size
// Always store row-major and work it out in the test runner
uint coordToByteOffset(uint2 coord) {
return (coord.y * N_DIM + coord.x) * ELEM_SIZE;

This is not related to this PR, but I guess coordToByteOffset should be this?
return (coord.x * N_DIM + coord.y) * ELEM_SIZE;

Member

It should be (coord.y * M_DIM + coord.x) * ELEM_SIZE for a row-major calculation. This just happens to work because all tests have M == N. Good catch.

Collaborator Author

Nice catch thanks!

Yes, right now all smoke tests (intentionally) use square matrices. I'll update this in a separate PR.

If GetCoordinate() returns coord.x as the row coordinate and coord.y as the column coordinate, shouldn't it be (coord.x * N_DIM + coord.y) * ELEM_SIZE for row major and (coord.y * M_DIM + coord.x) * ELEM_SIZE for column major?

(coord.x * N_DIM + coord.y) * ELEM_SIZE for row major while (coord.y * M_DIM + coord.x) * ELEM_SIZE for column major

Agree to that.
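
For reference, the two formulas quoted above can be sketched in C++ as follows. This assumes the (row, col) reading of the coordinate and a matrix of M rows by N columns, which is what the quoted formulas imply; the function names are hypothetical, and the open question of whether coord.x is a row index or an x coordinate is not settled by this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only. Assumes an M-row by N-column matrix and a coordinate
// given as (Row, Col), matching the formulas quoted in the comment above.

// Row major: rows are contiguous, so each row advances by N elements.
constexpr uint32_t rowMajorByteOffset(uint32_t Row, uint32_t Col, uint32_t N,
                                      uint32_t ElemSize) {
  return (Row * N + Col) * ElemSize;
}

// Column major: columns are contiguous, so each column advances by M elements.
constexpr uint32_t colMajorByteOffset(uint32_t Row, uint32_t Col, uint32_t M,
                                      uint32_t ElemSize) {
  return (Col * M + Row) * ElemSize;
}
```

For a 2x3 matrix of 4-byte elements, element (row 1, col 0) lands at byte 12 in row major and byte 4 in column major. With M == N, passing either dimension as the line length yields the same offset, which is consistent with the earlier observation that square-only smoke tests would not catch a swapped dimension.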

Member

Ok yeah the spec is ambiguous. For WARP I took the vector xy result as (x, y) coordinates rather than (row, col). @beanz @tex3d I think we need to add better spec language here.

The current spec says it converts a specified index into row and column coordinates. The valid range of Index is [0, Length()-1].

So the spec will change to say that coord.x is the coordinate within the row and coord.y is the coordinate within the column, rather than the row and column indices. Is that right?

Member

I'm open to a change either way, the spec just needs to be more specific. I would've expected that an int2 with x and y members would use x and y coordinate addressing, instead of row and col addressing, but I can see an argument to be made for both directions. I filed microsoft/hlsl-specs#859 to close on it.

@xiaolin-ji xiaolin-ji Apr 23, 2026

Thanks, either way is okay for us. We will follow the current coordinate addressing and may change it based on microsoft/hlsl-specs#859.

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

[SM6.10] linAlgMatrixAccumulateToDescriptor: Incorrect Matrix definition in LinAlgTests

6 participants