[SM6.10] Implement LinAlg groupshared Builtins#8254
Conversation
hekota
left a comment
There was a problem hiding this comment.
Looks good! Just a few comments about the tests.
|
Also the PR title and description do not mention that this is for LinAlg Matrix. |
Implements the Load/Store/Accumulate to memory groupshared builtins following the pattern of the previous builtins
1f308f9 to
e520d0a
Compare
There was a problem hiding this comment.
First, I think we can do this without adding the array overload, by simply accepting an address space 3 pointer to the overload type. This will require a new type code (perhaps $gs_x1), instead of $x1, mapped to a new macro, instead of EXT in GetOpFunc, which creates an address space 3 pointer to the element type.
There are problems with the array overload mechanism, as defined here:
- The array size is not captured to the overload name, yet it impacts the parameter type. This would lead to conflicting DXIL overloads when different array sizes are used in the same module.
- You can't define a set of allowed element types for the array, so any arbitrary type could be allowed (even non-scalar types).
- You don't capture the pointer address space constraint anywhere, so it could easily be created with the wrong address space. This would also create overload conflicts.
- You don't constrain it to a pointer, so you could use a by-value array argument, if that's what's passed in to create the function. This would also create overload conflicts.
If we decide instead that we want to keep the array overload, we should do it differently, more like the vector overload, where the accepted scalar component type set is still defined. This could use the same pattern as vector, using the [ character to denote an array type overload, overloads to the left indicating allowed non-array types, and ones to the right for what's allowed inside an array. Support could be limited at first to requiring [ to be left-most at first (not allowing array and non-array types). This would still require the new type code to indicate the address space 3 pointer to the element type.
My strong preference at this time is to simply use a pointer to the first array element (like an array-to-pointer decay). This would be a relatively simple change, requiring no new broader overload mechanisms.
|
I'm fine not adding the array overload type. It was a means to an end and I can't say I was exactly happy with it. I remember exploring a new |
Yeah, the spec says I can help with adding the new overload type usage in the DXIL op parameter definition (the |
| } else if (TypeSlot == TS_UDT) { | ||
| } | ||
|
|
||
| switch (TypeSlot) { |
There was a problem hiding this comment.
Since this function no longer has any functional changes, this is now a pure refactor. In fact, there are no other changes in this file that are not a result of generated code changes.
That's fine, but I thought it might be good to call that out. Think of this as just a review note.
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in ByteAddressBuffer buf, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromDescriptor(out LinAlgMatrix ret, in RWByteAddressBuffer buf, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, in int GroupSharedMem, in uint offset, in uint stride, in uint layout); | ||
| void [[min_sm=6.10]] __builtin_LinAlg_MatrixLoadFromMemory(out LinAlgMatrix ret, groupshared numeric[] memory, in uint offset, in uint stride, in uint layout); |
There was a problem hiding this comment.
The experimental builtins above use LinAlg for the HLSL component set usable with the LinAlg operations. We should be able to use that here, unless our plan was to remove it for some reason.
The same would apply to the rest of the HLSL component templates in these builtins, so perhaps this change can be made in a separate follow-up PR.
There was a problem hiding this comment.
Yeah I'd prefer the put that into a separate PR. Do you happen to know where LinAlg<> is defined? The acceptable values are (slightly) different between CoopVec and LinAlg so it would need to be updated
There was a problem hiding this comment.
If this is about the component type enumeration, I checked in ComponentType enum into linalg.h with the same values for as we have for cooperative vectors, and filed an issue for Chris to update the spec to match: microsoft/hlsl-specs#779
There was a problem hiding this comment.
This is not about the enum, it's about the template type that defines the set of the component types allowed in HLSL for linalg operations.
There was a problem hiding this comment.
I think this could be a follow-up issue too.
| // REQUIRES: dxil-1-10 | ||
| // RUN: %dxc -T lib_6_10 -E main %s -ast-dump-implicit | FileCheck %s | ||
|
|
||
| // CHECK: FunctionDecl {{.*}} implicit used __builtin_LinAlg_MatrixStoreToMemory 'void (__builtin_LinAlgMatrix {{.*}}, float const __attribute__((address_space(3))) (&)[64], unsigned int, unsigned int, unsigned int)' extern |
There was a problem hiding this comment.
I think the const here reveals a problem from adding AR_QUAL_IN. I'd like to see the expansion of the SharedArr parameter passed to the function. Is there an RValue cast? I think this should be treated more like a reference, and there shouldn't be a const qualifier.
There was a problem hiding this comment.
Tried AR_QUAL_REF but that didn't change the const. Only diff is && became & is that what you expected?
IN: memory 'float const __attribute__((address_space(3))) (&)[64]'
REF: memory 'float const __attribute__((address_space(3))) (&&)[64]'
There was a problem hiding this comment.
Oh, yeah that's strange. Now that I think about it, maybe the const refers to the address as opposed to the memory it references. Also, I think groupshared might be adding the reference elsewhere, and adding REF might double that up, which would be bad. I think that function just needs to be updated to handle groupshared. I think that function is only used for intrinsics, which would be why the earlier change that added groupshared support for user function parameters didn't update it.
There was a problem hiding this comment.
Actually, I think const does apply to the elements. And (&&) is an rvalue reference, so I think that's wrong.
There was a problem hiding this comment.
What would you consider the "right" thing here? Happy to work on the function but I need to know what to work towards lol
There was a problem hiding this comment.
Maybe another question, is this particular issue something we can push off to an immediate follow up? Helena has a couple PRs blocked on this going in and I have some work that would be nice to build on top of Helena's PRs.
Getting this PR in unclogs the system, but I am committed to getting the signature right here.
There was a problem hiding this comment.
Okay here are all the breakdowns for current possible ParameterModifier::Kind values
IN: memory 'float const __attribute__((address_space(3))) (&)[64]'
INOUT: memory 'float const __attribute__((address_space(3))) (&&&__restrict)[64]'
INVALID: memory 'float const __attribute__((address_space(3))) (&)[64]'
OUT: memory 'float const __attribute__((address_space(3))) (&&&__restrict)[64]'
REF: memory 'float const __attribute__((address_space(3))) (&&)[64]'
UNIFORM: memory 'float const __attribute__((address_space(3))) (&)[64]'
None of them result in the const not being added so I'm not sure how to move forward here
There was a problem hiding this comment.
Let's go with IN and file an issue for follow-up.
tex3d
left a comment
There was a problem hiding this comment.
I'll approve this in the current state, given a couple follow-up issues: 1. The AR_QUAL_IN and const in builtin signature issue, and 2. Looking into replacing numeric with LinAlg in gen_intrin_main.txt.
Implements the LinAlg Load/Store/Accumulate to memory groupshared builtins following the pattern of the previous builtins in alignment with the spec. https://github.com/microsoft/hlsl-specs/blob/main/proposals/0035-linalg-matrix.md
Fixes #7903
Fixes #7905
Fixes #7907