Allow native vectors for LLVM operations#7155
Conversation
Disables various forms of scalarization and vector elimination to permit vectors to pass through to final DXIL when used in native LLVM operations and loading/storing. Introduces a few vector manipulation llvm instructions to DXIL allowing for them to appear in output DXIL. Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors. This eliminates the element-by-element loading of the vectors In many cases, this required plumbing the shader model information to passes that didn't have it before. Many changes were needed for the MatrixBitcastLower pass related to linking to avoid converting matrix vectors, but also to perform the conversion if a shader was compiled for 6.9+, but then linked to a earlier target. This now adapts to the linker target to either preserve vectors for 6.9 or arrays for previous versions. This requires running the DynamicIndexing VectorToArray pass during linking since 6_x and 6_9+ will fail to run this in the initial compile, but will still need to lower vectors to arrays. Ternary conditional/select operators were element extracted in codegen. Removing this allows 6.9 to preserve the vectors, but also maintains behavior for previous shader models because the operations get scalarized later anyway. Keep groupshared variables as vectors for 6.9. They are no longer represented as indivual groupshared scalars. Adds extensive tests for these operations using different types and sizes and testing them appropriately. Booleans produce significantly different code, so they get their own test. Fixes microsoft#7123
|
The first two commits are dependencies that belong to a different PR. Only the last is relevant to this: 68a284b |
This got lost somewhere
llvm-beanz
left a comment
There was a problem hiding this comment.
This PR has changes to multiple passes that do not have IR tests. We should have tests that verify correctness of the changes in the passes separately from the Clang-level changes.
| def err_hlsl_out_indices_array_incorrect_access: Error< | ||
| "a vector in out indices array must be accessed as a whole">; | ||
| def err_hlsl_unsupported_long_vector: Error< | ||
| "Vectors of over 4 elements in %0 are not supported">; |
There was a problem hiding this comment.
| "Vectors of over 4 elements in %0 are not supported">; | |
| "vectors of over 4 elements in %0 are not supported">; |
There was a problem hiding this comment.
This change is not part of this PR, but a prerequisite as I tried to explain in this comment: #7155 (comment) I have fixed it in the relevant PR: #7143
| return false; | ||
| } | ||
|
|
||
| bool hlsl::HasLongVecs(const QualType &qt) { |
There was a problem hiding this comment.
There was a problem hiding this comment.
As above, this is not part of this PR, but part of the prerequisite #7143. I renamed it inline with your suggestion on that change, which I'm afraid doesn't match the coding standards you cited. e010223#r1960826249
| if (qt.isNull()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
| if (qt.isNull()) { | |
| return false; | |
| } | |
| if (qt.isNull()) | |
| return false; |
There was a problem hiding this comment.
Not part of this PR. I've responded to it here: 50d1af5
| HLMatrixType MatTy = HLMatrixType::cast(EltTy); | ||
| Value *matSize = Builder.getInt32(MatTy.getNumElements()); | ||
| idxList.back() = Builder.CreateMul(idxList.back(), matSize); | ||
| if (!DM.GetShaderModel()->IsSM69Plus()) { |
There was a problem hiding this comment.
Can we base the low-level decisions on a descriptively named boolean field in the MatrixBitcastLowerPass class, initialized based on the shader model, instead of passing the DxilModule down and checking the shader model each time?
There was a problem hiding this comment.
Maybe even something about scalarizing operations, rather than anything to do with the native DXIL vector support?
There was a problem hiding this comment.
Since the behavior for vec1s should proceed exactly as previously, I've added indications of native vector support which will use that and the vector size to determine what to do in all of these passes.
| Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder); | ||
| Value *Elt = Builder.CreateLoad(GEP); | ||
| NewVec = Builder.CreateInsertElement(NewVec, Elt, i); | ||
| } |
There was a problem hiding this comment.
I noticed that below this point, there's:
} else if (StoreInst *ST = dyn_cast<StoreInst>(U)) {where it still scalarizes the store for the vector.
Did you mean to leave that scalarization in?
There was a problem hiding this comment.
Yeah, I omitted store support in error. Clearly we need some testing for it as it never caused any issues.
|
|
||
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||
| return false; |
There was a problem hiding this comment.
So, this is where we presumably will still want to do something different for vec1, right?
There was a problem hiding this comment.
It's one of the places. However, this has no effect on the operators, but the intrinsics, which will come in another changes. So this particular file's changes are being reverted for now.
| Value *val = stInst->getValueOperand(); | ||
| Value *ldVal = Builder.CreateLoad(V); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); // UGH |
There was a problem hiding this comment.
Did you want to elaborate on the "UGH" comment? Is it on the general operation being performed, or something specific about this line?
There was a problem hiding this comment.
It was just how I identified places that I needed to do something to deal with scalarization. Obviously not meant to be left in. I have now done what I intended to address this.
| // all its users can be transformed, then split up the aggregate into its | ||
| // separate elements. | ||
| if (ShouldAttemptScalarRepl(AI) && isSafeAllocaToScalarRepl(AI)) { | ||
| if (!HLM.GetShaderModel()->IsSM69Plus() && ShouldAttemptScalarRepl(AI) && |
There was a problem hiding this comment.
This pass is so complicated as it is, can we capture a bool for DXIL vector support earlier and use that instead of the individual SM 6.9 checks here as well?
There was a problem hiding this comment.
Yeah. I'll do that throughout.
| bool Scalarizer::runOnFunction(Function &F) { | ||
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||
| return false; |
There was a problem hiding this comment.
Do we actually want to turn off scalarization entirely? Maybe this is one place where we need to preserve it for vec1 only?
There was a problem hiding this comment.
Proper support for vec1 was missing from this change. I've added it now.
| llvm::Value *EltL = Builder.CreateExtractElement(LHS, i); | ||
| llvm::Value *EltR = Builder.CreateExtractElement(RHS, i); | ||
| llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR); | ||
| result = Builder.CreateInsertElement(result, EltSelect, i); |
There was a problem hiding this comment.
I guess nothing depended on this early scalarization of select? I wonder why it was added in the first place. I could imagine subtle regressions by removing this code if some optimization before scalarization depended on it.
There was a problem hiding this comment.
Nothing I could discern. If you have suggestions on how to further verify this, I'm happy to try.
There was a problem hiding this comment.
This looks like it was part of HLSL 2018 and earlier, and might be related to how the non-short-circuting behavior was implemented.
I see you have some HLSL 2018 tests below. I'll look at them more closely to see that they preserve non-short-circuiting.
Code coverage does indicate that this isn't dead code: https://microsoft.github.io/DirectXShaderCompiler/coverage/home/runner/work/DirectXShaderCompiler/DirectXShaderCompiler/tools/clang/lib/CodeGen/CGExprScalar.cpp.html#L3709
There was a problem hiding this comment.
I'm sure it's not dead, but I've found a lot of pre-emptive scalarization that just duplicates the work that later scalarization passes would do anyway. Short-circuiting involvement is a possibility though.
There was a problem hiding this comment.
The aspect of this code that prevented short-circuiting was returning before the logic below could potentially insert conditional jumps. The scalarization was incidental. This is why the scalar cases would skip it and just generate the select operation, which would still prevent short-circuiting without any scalarization as scalars don't short circuit any more than vectors do in HLSL 2018. That select generation is the behavior I maintained for vectors and scalars. In pre-6.9 cases, the vector select will be scalarized functionally equivalently as before.
With this hard-coded scalarization in place, it can't be skipped with the existing changes to scalarization passes. We'd need a special case conditional for 6.9 here.
| std::string profile = m_sema->getLangOpts().HLSLProfile; | ||
| const ShaderModel *SM = hlsl::ShaderModel::GetByName(profile.c_str()); |
There was a problem hiding this comment.
I wonder if we couldn't have a LangOpt for HLSLLongVectorSupport, initialized based on the shader model instead?
There was a problem hiding this comment.
We could. We had a slew of these as part of earlier shader models that eventually got removed and merged into a single SM. I could see the readability benefit of being able to query it individually and also in implementing the hidden flag to force scalarization. It's just that precedent of removing similar flags to minimize test area that gives me pause.
| uRows <= MaxVectorSize, | ||
| CAB((uCols > 0 && uRows > 0 && | ||
| ((uCols <= MaxVectorSize && uRows <= MaxVectorSize) || | ||
| (SM->IsSM69Plus() && uRows == 1))), |
There was a problem hiding this comment.
How is this change relevant to allowing native LLVM vectors? Shouldn't this be part of #7143 instead?
There was a problem hiding this comment.
It shouldn't be part of this change, but it isn't part of the decls change either. This is for argument matching of an HLSL intrinsic, which we aren't doing quite yet either.
|
|
||
| Link(L"main", L"ps_6_9", pLinker, {libName, libName2}, | ||
| {"alloca [2 x <12 x float>]", | ||
| "getelementptr [12 x float], [12 x float]*"}, |
There was a problem hiding this comment.
I'm not sure how to tell if this is right. Shouldn't this be dealing with a vector rather than an array here? Can I see the IR?
There was a problem hiding this comment.
I wrote this in December. I don't remember myself just now. I'll look into it and get back to you.
There was a problem hiding this comment.
Well I was ready to write something here about how this was compiled as 6.3 because that's what the file says, but it's actually 6_x. The 12 element array comes from HLMatrixLower, which I hadn't touched.
There was a problem hiding this comment.
Okay, it's just scratch space used to move the elements of the matrix around. They use vectors too. I don't know why the test cares about it so much, but it doesn't need changing. I'll take a look at leveraging long vectors for matrices later, but this works perfectly fine for now.
|
I have fixed the merge conflicts and responded to some of the feedback locally, but the way the dependencies are set up, if I push those changes before #7143 is merged, it's going to make distinguishing the prerequisite changes and the changes that are the subject of this PR even harder. |
joaosaffran
left a comment
There was a problem hiding this comment.
Code looks good to me. Not an expert though
pow2clk
left a comment
There was a problem hiding this comment.
Updated and added to pending comments I forgot to submit some time ago. A new commit should be coming soon, but there are a lot of IR tests to write.
| HLMatrixType MatTy = HLMatrixType::cast(EltTy); | ||
| Value *matSize = Builder.getInt32(MatTy.getNumElements()); | ||
| idxList.back() = Builder.CreateMul(idxList.back(), matSize); | ||
| if (!DM.GetShaderModel()->IsSM69Plus()) { |
There was a problem hiding this comment.
Since the behavior for vec1s should proceed exactly as previously, I've added indications of native vector support which will use that and the vector size to determine what to do in all of these passes.
| Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder); | ||
| Value *Elt = Builder.CreateLoad(GEP); | ||
| NewVec = Builder.CreateInsertElement(NewVec, Elt, i); | ||
| } |
There was a problem hiding this comment.
Yeah, I omitted store support in error. Clearly we need some testing for it as it never caused any issues.
|
|
||
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||
| return false; |
There was a problem hiding this comment.
It's one of the places. However, this has no effect on the operators, but the intrinsics, which will come in another changes. So this particular file's changes are being reverted for now.
| Value *val = stInst->getValueOperand(); | ||
| Value *ldVal = Builder.CreateLoad(V); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); // UGH |
There was a problem hiding this comment.
It was just how I identified places that I needed to do something to deal with scalarization. Obviously not meant to be left in. I have now done what I intended to address this.
| // all its users can be transformed, then split up the aggregate into its | ||
| // separate elements. | ||
| if (ShouldAttemptScalarRepl(AI) && isSafeAllocaToScalarRepl(AI)) { | ||
| if (!HLM.GetShaderModel()->IsSM69Plus() && ShouldAttemptScalarRepl(AI) && |
There was a problem hiding this comment.
Yeah. I'll do that throughout.
| bool Scalarizer::runOnFunction(Function &F) { | ||
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||
| return false; |
There was a problem hiding this comment.
Proper support for vec1 was missing from this change. I've added it now.
| llvm::Value *EltL = Builder.CreateExtractElement(LHS, i); | ||
| llvm::Value *EltR = Builder.CreateExtractElement(RHS, i); | ||
| llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR); | ||
| result = Builder.CreateInsertElement(result, EltSelect, i); |
There was a problem hiding this comment.
Nothing I could discern. If you have suggestions on how to further verify this, I'm happy to try.
| std::string profile = m_sema->getLangOpts().HLSLProfile; | ||
| const ShaderModel *SM = hlsl::ShaderModel::GetByName(profile.c_str()); |
There was a problem hiding this comment.
We could. We had a slew of these as part of earlier shader models that eventually got removed and merged into a single SM. I could see the readability benefit of being able to query it individually and also in implementing the hidden flag to force scalarization. It's just that precedent of removing similar flags to minimize test area that gives me pause.
| uRows <= MaxVectorSize, | ||
| CAB((uCols > 0 && uRows > 0 && | ||
| ((uCols <= MaxVectorSize && uRows <= MaxVectorSize) || | ||
| (SM->IsSM69Plus() && uRows == 1))), |
There was a problem hiding this comment.
It shouldn't be part of this change, but it isn't part of the decls change either. This is for argument matching of an HLSL intrinsic, which we aren't doing quite yet either.
|
|
||
| Link(L"main", L"ps_6_9", pLinker, {libName, libName2}, | ||
| {"alloca [2 x <12 x float>]", | ||
| "getelementptr [12 x float], [12 x float]*"}, |
There was a problem hiding this comment.
I wrote this in December. I don't remember myself just now. I'll look into it and get back to you.
|
|
||
| // Flat Global vector if no dynamic vector indexing. | ||
| // Flatten Global vector if no dynamic vector indexing. | ||
| bool bFlatVector = !hasDynamicVectorIndexing(GV); |
There was a problem hiding this comment.
So I understand not flattening vectors. does this also mean we won't flatten multi-dimensional arrays of vectors if we have dynamic vector indexing?
There was a problem hiding this comment.
Vectors and their dynamic indexing have no bearing on multi-dimensional array flattening. hasDynamicVectorIndexing strips off all array GEPs and only pays attention to how the vector itself is indexed. The DynamicVectorIndexingVectorToArray pass is run before MultiDimArrayToOneDimArray is. By the time we're worried about flattening arrays, any vector flattening has already been done. In non-lib pre-6.9 cases, that means the vector element looks just like another layer of the multidimensional array and gets flattened with the rest of it. In cases where the vector survives, it becomes the element of the flattened array just like the scalar would. That's why no changes to MultiDimArrayToOneDimArray were required.
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) |
There was a problem hiding this comment.
| if (F.getParent()->HasDxilModule()) | |
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | |
| auto *Parent = F.getParent(); | |
| if (Parent->HasDxilModule()) | |
| if (Parent->GetDxilModule().GetShaderModel()->IsSM69Plus()) |
There was a problem hiding this comment.
Could even simplify this further to just:
SupportsVectors = F.getParent()->HasDxilModule() && F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus();There was a problem hiding this comment.
I don't feel like these comments are pulling in the same direction. The first suggests adding a variable to prevent redundant parent retrieval. The second retains that redundancy, but places it on a single line that clang-format will just break up into multiple lines again. I will merge the conditionals though.
Though the compiler will likely remove any redundancy, I'm better disposed toward the temp variable unless there is some coding guideline that insists on a ?: operator in this case.
My reading of https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable doesn't suggest that auto is best here as it's not clear what type the parent will be without knowledge of what function parents always(?) are. So I'm going to spell out Module
| !3 = !{i32 1, i32 6} | ||
| !4 = !{i32 1, i32 6} | ||
| !5 = !{!"lib", i32 6, i32 6} | ||
| !6 = !{i32 1, float ([3 x <3 x float>]*)* @"\01?bar@@YAMY02V?$vector@M$02@@@Z", !7, float ([3 x <3 x float>]*)* @"\01?foo@@YAMY02V?$vector@M$02@@@Z", !7} |
There was a problem hiding this comment.
what happens if we have a [3 x [3 x <3 x float>]]?
There was a problem hiding this comment.
Nothing different. Multidimensional array flattening only interacts with vector flattening when the vectors look like arrays. Anyway, this test is only affected by this change because it was missing the needed metadata to reconstruct the HLModule when invoked with dxopt. It didn't need it before. Because of the shader model dependency, it does now.
| "OTHER", 53, "VAArg", "VAArgInst", "vaarg instruction", "", [] | ||
| ) | ||
|
|
||
| self.add_llvm_instr( |
There was a problem hiding this comment.
I understand why we are doing this since this change maps back to the change in include/dxc/DXIL/DxilInstructions.h.
I'm curious what it means though. What does it mean to add_llvm_instr in this case? I don't think it means anything about validity because i've seen these instructions emitted by DXC before?
Does this means there is a way to print out all the llvm instructions DXC supports via querying the DirectXShaderCompiler.utils.hct.hctdb_instrhelp package?
I'm curious for my own reasons because I want to build a Set difference of DXC to upstream llvm and know which instructions need to be legalized.
There was a problem hiding this comment.
It does have to do with the validity in non-lib shaders. As I at least partially explained above in DxilInstructions.h, it prevents these from being immediately rejected for non-lib shaders in validation:
.You could construct that query or you could just look at the LLVM operations indicated by a call to add_llvm_instr in hctdb.py, or you could look at the final form of the IsLLVMInstructionAllowed function the DxilValidationImpl.inc file from a build of DXC and see the opcode value range checks. The only exceptions previously between libs and other shaders were these three, so there are no other gotchas where an operation might seem to be disallowed by these metrics, but really is for a specific case.
I don't know why the order of the extractelements varies, but it does even for the same build depending on whether a debugger is attached. It would seem to be an unordered container object, but there is no such. It just generates them as it goes and never touches them again which would suggest the opposite order in a few cases. They are constant expressions, that's probably why they move around, but I can't find where to make them consistent.
| // CHECK-SAME: ([11 x <[[NUM:[0-9][0-9]*]] x [[TYPE:[a-z0-9]*]]>]* | ||
| export void bittwiddlers(inout vector<TYPE, NUM> things[11]) { | ||
| // CHECK: [[adr1:%[0-9]*]] = getelementptr inbounds [11 x <[[NUM]] x [[TYPE]]>], [11 x <[[NUM]] x [[TYPE]]>]* %things, i32 0, i32 1 | ||
| // CHECK: [[vec1:%[0-9]*]] = load <[[NUM]] x [[TYPE]]>, <[[NUM]] x [[TYPE]]>* [[adr1]] |
There was a problem hiding this comment.
would it make sense to make most of these CHECK-NEXT?
There was a problem hiding this comment.
Most of them could be, but not all. Some of them have a few commands I left out because they weren't part of the logic flow of interest. I think because I meticulously track the creation and usage of all the values, the matches must either be the instructions I intend or I suppose theoretically some that are entirely equivalent in function. It isn't necessary to the correct outcome that they be on immediately subsequent lines and the dependency tracking means there's no risk of false matches that CHECK-NEXT might prevent otherwise.
That's why it's not needed. I don't want to do it because rechecking all these for which ones can and can't and which ones may on one platform and not an another is not an exercise that I want to undergo without reason to do so.
| #include "dxc/DXIL/DxilModule.h" | ||
|
|
There was a problem hiding this comment.
Presumedly this isn't needed.
| #include "dxc/DXIL/DxilModule.h" |
There was a problem hiding this comment.
Yeah. Left in by mistake.
| const DataLayout &DL, | ||
| bool SupportsVectors, bool hasPrecise, | ||
| DxilTypeSystem &typeSys, const DataLayout &DL, | ||
| SmallVector<Value *, 32> &DeadInsts, |
There was a problem hiding this comment.
I'm not requesting that you fix this, but the use of a SmallVector as an argument does stand out to me. Generally the convention is to use a SmallVectorImpl instead as function arguments so that the function can be used with SmallVectors of different static allocation size.
| llvm::Value *EltL = Builder.CreateExtractElement(LHS, i); | ||
| llvm::Value *EltR = Builder.CreateExtractElement(RHS, i); | ||
| llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR); | ||
| result = Builder.CreateInsertElement(result, EltSelect, i); |
There was a problem hiding this comment.
This looks like it was part of HLSL 2018 and earlier, and might be related to how the non-short-circuting behavior was implemented.
I see you have some HLSL 2018 tests below. I'll look at them more closely to see that they preserve non-short-circuiting.
Code coverage does indicate that this isn't dead code: https://microsoft.github.io/DirectXShaderCompiler/coverage/home/runner/work/DirectXShaderCompiler/DirectXShaderCompiler/tools/clang/lib/CodeGen/CGExprScalar.cpp.html#L3709
| if (F.getParent()->HasDxilModule()) | ||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) |
There was a problem hiding this comment.
Could even simplify this further to just:
SupportsVectors = F.getParent()->HasDxilModule() && F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus();| } | ||
|
|
||
| // Flat Global vector if no dynamic vector indexing. | ||
| // Flatten Global vector if no dynamic vector indexing. |
There was a problem hiding this comment.
nit: If we're fixing a comment, might as well make it a complete sentence.
| // Flatten Global vector if no dynamic vector indexing. | |
| // Flatten global vector if no dynamic vector indexing occurs. |
There was a problem hiding this comment.
I worded it more in keeping with the name of the utility function.
| // CHECK: [[bvec5:%[0-9]*]] = icmp ne <[[NUM]] x i32> [[vec5]], zeroinitializer | ||
| // MORE STUFF | ||
|
|
||
| res[3] = truth[3] ? truth[4] : truth[5]; |
There was a problem hiding this comment.
Can we have a HLSL 2018 test case something like:
export float4 main(bool4 B, float4 X, float4 Y) {
return B ? ++X : Y;
}
What I want to verify is that the side-effect of the LHS always occurs and no branch is generated under HLSL 2018.
There was a problem hiding this comment.
It's a good suggestion. The native vectors might interfere with that behavior.
Initialize mattype with check for matrixness remove leftover include reword comment refactor SM69 conditional to avoid double parent retrieval Add test that confirms no short-circuiting with native vector logic ops and HLSL 2018. Revise vec1 scalarizer test that was mistakenly generated with HLSL 2021 which included short-circuiting. Add validation check for vector operations in pre-6.9
bogner
left a comment
There was a problem hiding this comment.
Generally looks good to me, though it'd be good to get explicit approval from one of the folks who've done a few rounds of review on it.
Some of the LLVM IR tests have a few bits of unreferenced or unnecessary metadata that it'd be nice to clean up before committing.
| // Since HL module can't be created when linking, check for that first. | ||
| // Otherwise, either retrieve or generate the HL module. |
There was a problem hiding this comment.
Should this say "Since dxmodule can't be created when linking..."?
There was a problem hiding this comment.
No, we actually can create a dxil module when linking. Maybe I'll replace it with getorcreatedxilmodule just in case. We can't create an hlmodule while linking
There was a problem hiding this comment.
I see why it's confusing now though. the "it" was supposed to be a dxil module, but that is not at all clear from the wording.
| } // namespace | ||
|
|
||
| void MatrixBitcastLowerPass::lowerMatrix(Instruction *M, Value *A) { | ||
| void MatrixBitcastLowerPass::lowerMatrix(DxilModule &DM, Instruction *M, |
There was a problem hiding this comment.
I'm not seeing any use of DM, other than the recursive one required by this function signature.
There was a problem hiding this comment.
Yeah, it's a relic from when I was getting the version from it.
| StringRef getPassName() const override { return "Matrix Bitcast lower"; } | ||
| bool runOnFunction(Function &F) override { | ||
| DxilModule &DM = F.getParent()->GetOrCreateDxilModule(); | ||
| SupportsVectors = DM.GetShaderModel()->IsSM69Plus(); |
There was a problem hiding this comment.
If you want to avoid full DxilModule dependency, I think it would be reasonable to have something that can look up the DXIL version in the module without loading full metadata, whether or not it's HLM or DM.
There was a problem hiding this comment.
I like this idea. I know I mentioned it somewhere before. I'd prefer it not block this change though.
| SupportsVectors = M.GetDxilModule().GetShaderModel()->IsSM69Plus(); | ||
| } else { | ||
| HLModule &HLM = M.GetOrCreateHLModule(); | ||
| SupportsVectors = HLM.GetShaderModel()->IsSM69Plus(); |
There was a problem hiding this comment.
I mentioned earlier, it might be better to get the DXIL version from metadata without relying on full HLModule/DxilModule metadata loading. Maybe we can add a dxilutil::ReadDxilVersion function?
But the fixes to the HLModule loading are appreciated!
Just as an informational note, I discovered that the |
Remove some unneeded parts of IR tests. Add some back. I discovered that the pauseresume line is at least sometimes needed to construct the module with the shader model information. I fear some of the tests were passing before becuase they expected no changes, but were operating in non-6.9 mode. So they passed, but for the wrong reason. Remove unused DM param to matrixbitcastlowerpass lower matrix call. reword confusing dynindexingvectortoarray comment
|
|
||
| bool Scalarizer::runOnFunction(Function &F) { | ||
| Module *M = F.getParent(); | ||
| if (M->HasDxilModule() && M->GetDxilModule().GetShaderModel()->IsSM69Plus()) |
There was a problem hiding this comment.
I think the independent lookup of DXIL version would help here, since it won't do something different when there's no created DxilModule already present.
There was a problem hiding this comment.
I could also replace it with getorcreate. I was a bit nervous about that becuase of how it failed for hlmodules, but I've since learned that it's a lot more robust and I've made the hlmodule variant similarly so.
Adds a dxilutil function to retrieve shader model from metadata. To enable this, the metadata helper implementation is split into a few utility functions which will continue to throw the exceptions as they did before, but allows passes to call them without fear of crashes as a result of that. Passes will assume the default shader model where it can't retrieve it for one reason or another. Changes scalarizer to try to retrieve the shader module when no module is found. Changes dyanmicvector to array to use the utility function and never try to recreate the hlmodule. Instead it will try the dxilmodule and if that isn't present, it will use the metadata for shader model only.
With the minimal metadata version requirement, these can be more succinct
Disables various forms of scalarization and vector elimination to permit
vectors to pass through to final DXIL when used in native LLVM
operations and loading/storing.
Introduces a few vector manipulation llvm instructions to DXIL allowing
for them to appear in output DXIL.
Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors.
This eliminates the element-by-element loading of the vectors
In many cases, this required plumbing the shader model information to
passes that didn't have it before.
Many changes were needed for the MatrixBitcastLower pass related to
linking to avoid converting matrix vectors, but also to perform the
conversion if a shader was compiled for 6.9+, but then linked to a
earlier target.
This now adapts to the linker target to either preserve vectors for 6.9 or arrays for previous versions.
This requires running the DynamicIndexing VectorToArray pass during linking since 6_x and 6_9+ will fail to run this in the initial compile, but will still need to lower vectors to arrays.
Ternary conditional/select operators were element extracted in codegen.
Removing this allows 6.9 to preserve the vectors, but also maintains
behavior for previous shader models because the operations get
scalarized later anyway.
Keep groupshared variables as vectors for 6.9. They are no longer represented as indivual groupshared scalars.
Adds extensive tests for these operations using different types and
sizes and testing them appropriately. Booleans produce significantly
different code, so they get their own test.
Fixes #7123