[0029] [Main] For OuterProductAccumulate, matrix layout must be outerproductoptimal and matrix stride must be zero #7417
Merged: damyanp merged 17 commits into microsoft:main from anupamachandra:anupamac/outer-prod-acc-matrix-layout-main (May 13, 2025)
Commits (17; this view shows changes from 13 commits)
0fa5058 Check if MatrixLayout parameter in OuterPrduct Accumulate is OuterPro… (anupamachandra)
7549a90 Clang Format (anupamachandra)
01aecfe Simlify matrix layout check and fix typo in validation message matrix… (anupamachandra)
6e6d542 Remove stray CHECK directive (anupamachandra)
84eac73 Fix test with new requirement for outer product accumulate layout and… (anupamachandra)
15c77e1 Fix more tests (anupamachandra)
aeb3754 Fix Test, OPA stride = 0 (anupamachandra)
390fdf8 Fix typo in test (anupamachandra)
766ecf5 Updates per review feedback (anupamachandra)
8b2170b Remove old file (anupamachandra)
baa4376 Add Check for invalid matrix stride, add source correlation strings (anupamachandra)
b6dde46 Fix path (anupamachandra)
c33d080 Move outer-product test location, fix test cmd (anupamachandra)
f69c97b Update test comments per review feedback (anupamachandra)
f915958 Update error message per review feedback (anupamachandra)
c7aea41 Merge remote-tracking branch 'origin/master' into anupamac/outer-prod… (anupamachandra)
30525f5 Merge from main (anupamachandra)
28 changes: 28 additions & 0 deletions
tools/clang/test/CodeGenDXIL/hlsl/linalg/outer-product-accumulate-matrix-layout.hlsl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| // RUN: %dxc -I %hlsl_headers -T cs_6_9 %s -enable-16bit-types -DML=MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL -DSTRIDE=0 2>&1 | FileCheck %s | ||
| | ||
| //Source file for the IR in \tools\clang\test\LitDXILValidation\outer-product-accumulate-matrix-layout-failing.ll | ||
| //Source file for the IR in \tools\clang\test\LitDXILValidation\outer-product-accumulate-matrix-layout-passing.ll | ||
| | ||
| ByteAddressBuffer input_vector_buffer; | ||
| ByteAddressBuffer input_vector_buffer2; | ||
| RWByteAddressBuffer matrix_buffer; | ||
| | ||
| #include <dx/linalg.h> | ||
| | ||
| // CHECK: call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %{{[^ ]+}}, <8 x half> %{{[^ ]+}}, %dx.types.Handle %{{[^ ]+}}, i32 0, i32 8, i32 3, i32 0) | ||
| using namespace dx::linalg; | ||
| | ||
| [Numthreads(1,1,1)] | ||
| [shader("compute")] | ||
| void main() | ||
| { | ||
| vector<half, 8> input_vector1 = input_vector_buffer.Load<vector<half, 8> >(0); | ||
| vector<half, 8> input_vector2 = input_vector_buffer2.Load<vector<half, 8> >(0); | ||
| | ||
| const uint matrix_interpretation = DATA_TYPE_FLOAT16; | ||
| const uint matrix_layout = ML; | ||
| const uint matrix_offset = 0; | ||
| const uint matrix_stride = STRIDE; | ||
| | ||
| __builtin_OuterProductAccumulate(input_vector1, input_vector2, matrix_buffer, matrix_offset, matrix_interpretation, matrix_layout, matrix_stride); | ||
| } |
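
The CHECK line in this test ends with four integer operands, `i32 0, i32 8, i32 3, i32 0`. The sketch below shows how those operands appear to line up with the builtin's trailing arguments (matrix offset, interpretation, layout, stride). The struct, the constant names, and `FormatOperandTail` are hypothetical and exist only for illustration; the numeric values 8 and 3 are inferred from this test's CHECK line, not taken from the DXC headers.

```cpp
#include <cstdint>
#include <string>

// Trailing integer operands of the outerProductAccumulate call, in the order
// the builtin receives them: offset, interpretation, layout, stride.
// (Hypothetical struct, for illustration only.)
struct OpaTrailingOperands {
  uint32_t matrixOffset;
  uint32_t matrixInterpretation;
  uint32_t matrixLayout;
  uint32_t matrixStride;
};

// Assumption: values inferred from the CHECK line of this test, where
// DATA_TYPE_FLOAT16 lowers to 8 and MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL to 3.
constexpr uint32_t kDataTypeFloat16 = 8;
constexpr uint32_t kLayoutOuterProductOptimal = 3;

// Formats the operands the way they appear at the end of the DXIL call.
std::string FormatOperandTail(const OpaTrailingOperands &ops) {
  return "i32 " + std::to_string(ops.matrixOffset) +
         ", i32 " + std::to_string(ops.matrixInterpretation) +
         ", i32 " + std::to_string(ops.matrixLayout) +
         ", i32 " + std::to_string(ops.matrixStride);
}
```

With offset 0, `kDataTypeFloat16`, `kLayoutOuterProductOptimal`, and stride 0, the helper produces the same operand tail the CHECK line expects.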
84 changes: 84 additions & 0 deletions
tools/clang/test/LitDXILValidation/outer-product-accumulate-matrix-layout-failing.ll
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| ; REQUIRES: dxil-1-9 | ||
| | ||
| ; RUN: not %dxv %s 2>&1 | FileCheck %s | ||
| | ||
| ;Original Source: \tools\clang\test\CodeGenDXIL\hlsl\linalg\outer-product-accumulate-matrix-layout.hlsl | ||
| | ||
| target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64" | ||
| target triple = "dxil-ms-dx" | ||
| | ||
| %dx.types.Handle = type { i8* } | ||
| %dx.types.ResBind = type { i32, i32, i32, i8 } | ||
| %dx.types.ResourceProperties = type { i32, i32 } | ||
| %dx.types.ResRet.v8f16 = type { <8 x half>, i32 } | ||
| %struct.ByteAddressBuffer = type { i32 } | ||
| %struct.RWByteAddressBuffer = type { i32 } | ||
| | ||
| ; As noted in other tests, the validation errors are emitted in | ||
| ; a different order from the calls in the IR. They are listed here in the | ||
| ; order they appear in the output, with comments on each call for correlation. | ||
| | ||
| ;CHECK: error: matrix stride must be zero for optimal layouts | ||
| ;CHECK: error: matrix stride must be zero for optimal layouts | ||
| ;CHECK-NOT: error: matrix layout value 'OuterProductOptimal' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| ;CHECK: error: matrix layout value 'MulOptimal' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| ;CHECK: error: matrix layout value 'ColumnMajor' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| ;CHECK: error: matrix layout value 'RowMajor' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| ; CHECK: Validation failed. | ||
| | ||
| define void @main() { | ||
| %1 = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 0, i32 0, i32 0, i8 1 }, i32 0, i1 false) ; CreateHandleFromBinding(bind,index,nonUniformIndex) | ||
| %2 = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind { i32 1, i32 1, i32 0, i8 0 }, i32 1, i1 false) ; CreateHandleFromBinding(bind,index,nonUniformIndex) | ||
| %3 = call %dx.types.Handle @dx.op.createHandleFromBinding(i32 217, %dx.types.ResBind zeroinitializer, i32 0, i1 false) ; CreateHandleFromBinding(bind,index,nonUniformIndex) | ||
| %4 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %3, %dx.types.ResourceProperties { i32 11, i32 0 }) ; AnnotateHandle(res,props) resource: ByteAddressBuffer | ||
| %5 = call %dx.types.ResRet.v8f16 @dx.op.rawBufferVectorLoad.v8f16(i32 303, %dx.types.Handle %4, i32 0, i32 undef, i32 2) ; RawBufferVectorLoad(buf,index,elementOffset,alignment) | ||
| %6 = extractvalue %dx.types.ResRet.v8f16 %5, 0 | ||
| %7 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %2, %dx.types.ResourceProperties { i32 11, i32 0 }) ; AnnotateHandle(res,props) resource: ByteAddressBuffer | ||
| %8 = call %dx.types.ResRet.v8f16 @dx.op.rawBufferVectorLoad.v8f16(i32 303, %dx.types.Handle %7, i32 0, i32 undef, i32 2) ; RawBufferVectorLoad(buf,index,elementOffset,alignment) | ||
| %9 = extractvalue %dx.types.ResRet.v8f16 %8, 0 | ||
| %10 = call %dx.types.Handle @dx.op.annotateHandle(i32 216, %dx.types.Handle %1, %dx.types.ResourceProperties { i32 4107, i32 0 }) ; AnnotateHandle(res,props) resource: RWByteAddressBuffer | ||
| ; error: matrix layout value 'RowMajor' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %6, <8 x half> %9, %dx.types.Handle %10, i32 0, i32 8, i32 0, i32 0) ; OuterProductAccumulate(inputVector1,inputVector2,matrixBuffer,matrixOffset,matrixIntepretation,matrixLayout,matrixStride) | ||
| ; error: matrix layout value 'ColumnMajor' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %6, <8 x half> %9, %dx.types.Handle %10, i32 0, i32 8, i32 1, i32 0) ; OuterProductAccumulate(inputVector1,inputVector2,matrixBuffer,matrixOffset,matrixIntepretation,matrixLayout,matrixStride) | ||
| ; error: matrix layout value 'MulOptimal' is not valid for outerproductaccumulate, must be 'OuterProductOptimal' | ||
| call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %6, <8 x half> %9, %dx.types.Handle %10, i32 0, i32 8, i32 2, i32 0) ; OuterProductAccumulate(inputVector1,inputVector2,matrixBuffer,matrixOffset,matrixIntepretation,matrixLayout,matrixStride) | ||
| ; error: matrix stride must be zero for optimal layouts | ||
| call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %6, <8 x half> %9, %dx.types.Handle %10, i32 0, i32 8, i32 3, i32 64) ; OuterProductAccumulate(inputVector1,inputVector2,matrixBuffer,matrixOffset,matrixIntepretation,matrixLayout,matrixStride) | ||
| ; error: matrix stride must be zero for optimal layouts | ||
| call void @dx.op.outerProductAccumulate.v8f16.v8f16(i32 307, <8 x half> %6, <8 x half> %9, %dx.types.Handle %10, i32 0, i32 8, i32 3, i32 63) ; OuterProductAccumulate(inputVector1,inputVector2,matrixBuffer,matrixOffset,matrixIntepretation,matrixLayout,matrixStride) | ||
| ret void | ||
| } | ||
| | ||
| ; Function Attrs: nounwind readonly | ||
| declare %dx.types.ResRet.v8f16 @dx.op.rawBufferVectorLoad.v8f16(i32, %dx.types.Handle, i32, i32, i32) #0 | ||
| | ||
| ; Function Attrs: nounwind | ||
| declare void @dx.op.outerProductAccumulate.v8f16.v8f16(i32, <8 x half>, <8 x half>, %dx.types.Handle, i32, i32, i32, i32) #1 | ||
| | ||
| ; Function Attrs: nounwind readnone | ||
| declare %dx.types.Handle @dx.op.annotateHandle(i32, %dx.types.Handle, %dx.types.ResourceProperties) #2 | ||
| | ||
| ; Function Attrs: nounwind readnone | ||
| declare %dx.types.Handle @dx.op.createHandleFromBinding(i32, %dx.types.ResBind, i32, i1) #2 | ||
| | ||
| attributes #0 = { nounwind readonly } | ||
| attributes #1 = { nounwind } | ||
| attributes #2 = { nounwind readnone } | ||
| | ||
| !dx.version = !{!0} | ||
| !dx.valver = !{!0} | ||
| !dx.shaderModel = !{!1} | ||
| !dx.resources = !{!2} | ||
| !dx.entryPoints = !{!8} | ||
| | ||
| !0 = !{i32 1, i32 9} | ||
| !1 = !{!"cs", i32 6, i32 9} | ||
| !2 = !{!3, !6, null, null} | ||
| !3 = !{!4, !5} | ||
| !4 = !{i32 0, %struct.ByteAddressBuffer* undef, !"", i32 0, i32 0, i32 1, i32 11, i32 0, null} | ||
| !5 = !{i32 1, %struct.ByteAddressBuffer* undef, !"", i32 0, i32 1, i32 1, i32 11, i32 0, null} | ||
| !6 = !{!7} | ||
| !7 = !{i32 0, %struct.RWByteAddressBuffer* undef, !"", i32 0, i32 0, i32 1, i32 11, i1 false, i1 false, i1 false, null} | ||
| !8 = !{void ()* @main, !"main", null, !2, !9} | ||
| !9 = !{i32 0, i64 8598323216, i32 4, !10} | ||
| !10 = !{i32 1, i32 1, i32 1} | ||
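
The failing test above exercises two rules: the layout operand must be OuterProductOptimal, and the stride operand must be zero. A minimal sketch of that rule follows. It is not the validator code from this PR. The enum, the `LayoutName` and `CheckOuterProductAccumulate` helpers, and the choice to check stride only when the layout is already OuterProductOptimal are assumptions made for illustration. The layout encodings (0 to 3) and the message strings are taken from the comments and CHECK lines in the test.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Layout encodings inferred from the test's IR and its error comments
// (assumption: the real DXC enum may be named or organized differently).
enum class MatrixLayout : uint32_t {
  RowMajor = 0,
  ColumnMajor = 1,
  MulOptimal = 2,
  OuterProductOptimal = 3,
};

// Hypothetical helper: printable name for a raw layout operand.
std::string LayoutName(uint32_t layout) {
  switch (static_cast<MatrixLayout>(layout)) {
  case MatrixLayout::RowMajor: return "RowMajor";
  case MatrixLayout::ColumnMajor: return "ColumnMajor";
  case MatrixLayout::MulOptimal: return "MulOptimal";
  case MatrixLayout::OuterProductOptimal: return "OuterProductOptimal";
  }
  return "<unknown>";
}

// Hypothetical helper: returns the diagnostics one outerProductAccumulate
// call would produce under the two rules exercised by the test.
std::vector<std::string> CheckOuterProductAccumulate(uint32_t layout,
                                                     uint32_t stride) {
  std::vector<std::string> errors;
  const uint32_t optimal =
      static_cast<uint32_t>(MatrixLayout::OuterProductOptimal);
  if (layout != optimal) {
    errors.push_back("matrix layout value '" + LayoutName(layout) +
                     "' is not valid for outerproductaccumulate, "
                     "must be 'OuterProductOptimal'");
  } else if (stride != 0) {
    // Assumption: stride is only checked once the layout is accepted; the
    // test does not cover a wrong layout combined with a nonzero stride.
    errors.push_back("matrix stride must be zero for optimal layouts");
  }
  return errors;
}
```

Under this sketch, the five calls in the failing test each yield exactly one diagnostic, and a call with layout 3 and stride 0 (the passing case in the HLSL test) yields none.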