-
Notifications
You must be signed in to change notification settings - Fork 852
Allow native vectors for LLVM operations #7155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
68a284b
556f6e6
654bf8a
248fe80
f1de617
47d42f0
d480caa
c2ee61b
a3a39b8
6f7f9ec
1596d99
411b921
8dc2a3e
46829dd
9002bee
ba831c0
7292edb
96ac2e2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -113,13 +113,13 @@ class MatrixBitcastLowerPass : public FunctionPass { | |
|
|
||
| // Lower matrix first. | ||
| for (BitCastInst *BCI : matCastSet) { | ||
| lowerMatrix(BCI, BCI->getOperand(0)); | ||
| lowerMatrix(DM, BCI, BCI->getOperand(0)); | ||
| } | ||
| return bUpdated; | ||
| } | ||
|
|
||
| private: | ||
| void lowerMatrix(Instruction *M, Value *A); | ||
| void lowerMatrix(DxilModule &DM, Instruction *M, Value *A); | ||
| bool hasCallUser(Instruction *M); | ||
| }; | ||
|
|
||
|
|
@@ -180,7 +180,8 @@ Value *CreateEltGEP(Value *A, unsigned i, Value *zeroIdx, | |
| } | ||
| } // namespace | ||
|
|
||
| void MatrixBitcastLowerPass::lowerMatrix(Instruction *M, Value *A) { | ||
| void MatrixBitcastLowerPass::lowerMatrix(DxilModule &DM, Instruction *M, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not seeing any use of
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it's a relic from when I was getting the version from it. |
||
| Value *A) { | ||
| for (auto it = M->user_begin(); it != M->user_end();) { | ||
| User *U = *(it++); | ||
| if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(U)) { | ||
|
|
@@ -193,31 +194,42 @@ void MatrixBitcastLowerPass::lowerMatrix(Instruction *M, Value *A) { | |
| SmallVector<Value *, 2> idxList(GEP->idx_begin(), GEP->idx_end()); | ||
| DXASSERT(idxList.size() == 2, | ||
| "else not one dim matrix array index to matrix"); | ||
|
|
||
| HLMatrixType MatTy = HLMatrixType::cast(EltTy); | ||
| Value *matSize = Builder.getInt32(MatTy.getNumElements()); | ||
| idxList.back() = Builder.CreateMul(idxList.back(), matSize); | ||
| if (!DM.GetShaderModel()->IsSM69Plus()) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we base the low-level decisions on a descriptively named boolean field in the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe even something about scalarizing operations, rather than anything to do with the native DXIL vector support?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the behavior for vec1s should proceed exactly as previously, I've added indications of native vector support which will use that and the vector size to determine what to do in all of these passes. |
||
| HLMatrixType MatTy = HLMatrixType::cast(EltTy); | ||
| Value *matSize = Builder.getInt32(MatTy.getNumElements()); | ||
| idxList.back() = Builder.CreateMul(idxList.back(), matSize); | ||
| } | ||
| Value *NewGEP = Builder.CreateGEP(A, idxList); | ||
| lowerMatrix(GEP, NewGEP); | ||
| lowerMatrix(DM, GEP, NewGEP); | ||
| DXASSERT(GEP->user_empty(), "else lower matrix fail"); | ||
| GEP->eraseFromParent(); | ||
| } else { | ||
| DXASSERT(0, "invalid GEP for matrix"); | ||
| } | ||
| } else if (BitCastInst *BCI = dyn_cast<BitCastInst>(U)) { | ||
| lowerMatrix(BCI, A); | ||
| lowerMatrix(DM, BCI, A); | ||
| DXASSERT(BCI->user_empty(), "else lower matrix fail"); | ||
| BCI->eraseFromParent(); | ||
| } else if (LoadInst *LI = dyn_cast<LoadInst>(U)) { | ||
| if (VectorType *Ty = dyn_cast<VectorType>(LI->getType())) { | ||
| IRBuilder<> Builder(LI); | ||
| Value *zeroIdx = Builder.getInt32(0); | ||
| unsigned vecSize = Ty->getNumElements(); | ||
| Value *NewVec = UndefValue::get(LI->getType()); | ||
| for (unsigned i = 0; i < vecSize; i++) { | ||
| Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder); | ||
| Value *Elt = Builder.CreateLoad(GEP); | ||
| NewVec = Builder.CreateInsertElement(NewVec, Elt, i); | ||
| Value *NewVec = nullptr; | ||
| if (DM.GetShaderModel()->IsSM69Plus()) { | ||
| // Just create a replacement load using the vector pointer. | ||
| Instruction *NewLI = LI->clone(); | ||
| unsigned VecIdx = NewLI->getNumOperands() - 1; | ||
| NewLI->setOperand(VecIdx, A); | ||
| Builder.Insert(NewLI); | ||
| NewVec = NewLI; | ||
| } else { | ||
| Value *zeroIdx = Builder.getInt32(0); | ||
| unsigned vecSize = Ty->getNumElements(); | ||
| NewVec = UndefValue::get(LI->getType()); | ||
| for (unsigned i = 0; i < vecSize; i++) { | ||
| Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder); | ||
| Value *Elt = Builder.CreateLoad(GEP); | ||
| NewVec = Builder.CreateInsertElement(NewVec, Elt, i); | ||
| } | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I noticed that below this point, there's: } else if (StoreInst *ST = dyn_cast<StoreInst>(U)) {where it still scalarizes the store for the vector. Did you mean to leave that scalarization in?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I omitted store support in error. Clearly we need some testing for it as it never caused any issues. |
||
| } | ||
| LI->replaceAllUsesWith(NewVec); | ||
| LI->eraseFromParent(); | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -10,6 +10,8 @@ | |||
| // // | ||||
| /////////////////////////////////////////////////////////////////////////////// | ||||
|
|
||||
| #include "dxc/DXIL/DxilModule.h" | ||||
|
|
||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Presumedly this isn't needed.
Suggested change
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. Left in by mistake. |
||||
| #include "llvm/IR/Dominators.h" | ||||
| #include "llvm/IR/Instructions.h" | ||||
| #include "llvm/Pass.h" | ||||
|
|
@@ -151,6 +153,10 @@ bool DxilEliminateVector::TryRewriteDebugInfoForVector(InsertElementInst *IE) { | |||
|
|
||||
| bool DxilEliminateVector::runOnFunction(Function &F) { | ||||
|
|
||||
| if (F.getParent()->HasDxilModule()) | ||||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||||
| return false; | ||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, this is where we presumably will still want to do something different for vec1, right?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's one of the places. However, this has no effect on the operators, but the intrinsics, which will come in another changes. So this particular file's changes are being reverted for now. |
||||
|
|
||||
| auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree(); | ||||
| DxilValueCache *DVC = &getAnalysis<DxilValueCache>(); | ||||
|
|
||||
|
|
||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,6 +8,7 @@ | |
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include "dxc/DXIL/DxilConstants.h" | ||
| #include "dxc/DXIL/DxilModule.h" | ||
| #include "dxc/DXIL/DxilOperations.h" | ||
| #include "dxc/DXIL/DxilUtil.h" | ||
| #include "dxc/HLSL/HLModule.h" | ||
|
|
@@ -180,10 +181,12 @@ bool LowerTypePass::runOnModule(Module &M) { | |
| namespace { | ||
| class DynamicIndexingVectorToArray : public LowerTypePass { | ||
| bool ReplaceAllVectors; | ||
| bool SupportsVectors; | ||
|
|
||
| public: | ||
| explicit DynamicIndexingVectorToArray(bool ReplaceAll = false) | ||
| : LowerTypePass(ID), ReplaceAllVectors(ReplaceAll) {} | ||
| : LowerTypePass(ID), ReplaceAllVectors(ReplaceAll), | ||
| SupportsVectors(false) {} | ||
| static char ID; // Pass identification, replacement for typeid | ||
| void applyOptions(PassOptions O) override; | ||
| void dumpConfig(raw_ostream &OS) override; | ||
|
|
@@ -194,6 +197,7 @@ class DynamicIndexingVectorToArray : public LowerTypePass { | |
| Type *lowerType(Type *Ty) override; | ||
| Constant *lowerInitVal(Constant *InitVal, Type *NewTy) override; | ||
| StringRef getGlobalPrefix() override { return ".v"; } | ||
| void initialize(Module &M) override; | ||
|
|
||
| private: | ||
| bool HasVectorDynamicIndexing(Value *V); | ||
|
|
@@ -207,6 +211,11 @@ class DynamicIndexingVectorToArray : public LowerTypePass { | |
| void ReplaceAddrSpaceCast(ConstantExpr *CE, Value *A, IRBuilder<> &Builder); | ||
| }; | ||
|
|
||
| void DynamicIndexingVectorToArray::initialize(Module &M) { | ||
| if (M.HasHLModule()) | ||
| SupportsVectors = M.GetHLModule().GetShaderModel()->IsSM69Plus(); | ||
| } | ||
|
|
||
| void DynamicIndexingVectorToArray::applyOptions(PassOptions O) { | ||
| GetPassOptionBool(O, "ReplaceAllVectors", &ReplaceAllVectors, | ||
| ReplaceAllVectors); | ||
|
|
@@ -286,7 +295,7 @@ void DynamicIndexingVectorToArray::ReplaceStaticIndexingOnVector(Value *V) { | |
| StoreInst *stInst = cast<StoreInst>(GEPUser); | ||
| Value *val = stInst->getValueOperand(); | ||
| Value *ldVal = Builder.CreateLoad(V); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); | ||
| ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); // UGH | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you want to elaborate on the "UGH" comment? Is it on the general operation being performed, or something specific about this line?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was just how I identified places that I needed to do something to deal with scalarization. Obviously not meant to be left in. I have now done what I intended to address this. |
||
| Builder.CreateStore(ldVal, V); | ||
| stInst->eraseFromParent(); | ||
| } | ||
|
|
@@ -306,8 +315,11 @@ void DynamicIndexingVectorToArray::ReplaceStaticIndexingOnVector(Value *V) { | |
| } | ||
|
|
||
| bool DynamicIndexingVectorToArray::needToLower(Value *V) { | ||
| // Only needed where vectors aren't supported. | ||
| if (SupportsVectors) | ||
| return false; | ||
| Type *Ty = V->getType()->getPointerElementType(); | ||
| if (dyn_cast<VectorType>(Ty)) { | ||
| if (isa<VectorType>(Ty)) { | ||
| if (isa<GlobalVariable>(V) || ReplaceAllVectors) { | ||
| return true; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1869,7 +1869,8 @@ bool SROAGlobalAndAllocas(HLModule &HLM, bool bHasDbgInfo) { | |
| // if | ||
| // all its users can be transformed, then split up the aggregate into its | ||
| // separate elements. | ||
| if (ShouldAttemptScalarRepl(AI) && isSafeAllocaToScalarRepl(AI)) { | ||
| if (!HLM.GetShaderModel()->IsSM69Plus() && ShouldAttemptScalarRepl(AI) && | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This pass is so complicated as it is, can we capture a bool for DXIL vector support earlier and use that instead of the individual SM 6.9 checks here as well?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. I'll do that throughout. |
||
| isSafeAllocaToScalarRepl(AI)) { | ||
| std::vector<Value *> Elts; | ||
| IRBuilder<> Builder(dxilutil::FindAllocaInsertionPt(AI)); | ||
| bool hasPrecise = HLModule::HasPreciseAttributeWithMetadata(AI); | ||
|
|
@@ -1945,8 +1946,9 @@ bool SROAGlobalAndAllocas(HLModule &HLM, bool bHasDbgInfo) { | |
| continue; | ||
| } | ||
|
|
||
| // Flat Global vector if no dynamic vector indexing. | ||
| bool bFlatVector = !hasDynamicVectorIndexing(GV); | ||
| // Flat Global vector if no dynamic vector indexing and pre-6.9. | ||
| bool bFlatVector = | ||
| !hasDynamicVectorIndexing(GV) && !HLM.GetShaderModel()->IsSM69Plus(); | ||
|
|
||
| if (bFlatVector) { | ||
| GVDbgOffset &dbgOffset = GVDbgOffsetMap[GV]; | ||
|
|
@@ -1980,10 +1982,12 @@ bool SROAGlobalAndAllocas(HLModule &HLM, bool bHasDbgInfo) { | |
| } else { | ||
| // SROA_Parameter_HLSL has no access to a domtree, if one is needed, | ||
| // it'll be generated | ||
| SROAed = SROA_Helper::DoScalarReplacement( | ||
| GV, Elts, Builder, bFlatVector, | ||
| // TODO: set precise. | ||
| /*hasPrecise*/ false, typeSys, DL, DeadInsts, /*DT*/ nullptr); | ||
| if (!HLM.GetShaderModel()->IsSM69Plus()) { | ||
| SROAed = SROA_Helper::DoScalarReplacement( | ||
| GV, Elts, Builder, bFlatVector, | ||
| // TODO: set precise. | ||
| /*hasPrecise*/ false, typeSys, DL, DeadInsts, /*DT*/ nullptr); | ||
| } | ||
| } | ||
|
|
||
| if (SROAed) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -14,6 +14,8 @@ | |||||||||||
| // | ||||||||||||
| //===----------------------------------------------------------------------===// | ||||||||||||
|
|
||||||||||||
| #include "dxc/DXIL/DxilModule.h" | ||||||||||||
|
|
||||||||||||
| #include "llvm/ADT/STLExtras.h" | ||||||||||||
| #include "llvm/IR/IRBuilder.h" | ||||||||||||
| #include "llvm/IR/InstVisitor.h" | ||||||||||||
|
|
@@ -290,6 +292,10 @@ bool Scalarizer::doInitialization(Module &M) { | |||||||||||
| } | ||||||||||||
|
|
||||||||||||
| bool Scalarizer::runOnFunction(Function &F) { | ||||||||||||
| if (F.getParent()->HasDxilModule()) | ||||||||||||
| if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus()) | ||||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could even simplify this further to just: SupportsVectors = F.getParent()->HasDxilModule() && F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus();
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't feel like these comments are pulling in the same direction. The first suggests adding a variable to prevent redundant parent retrieval. The second retains that redundancy, but places it on a single line that clang-format will just break up into multiple lines again. I will merge the conditionals though. Though the compiler will likely remove any redundancy, I'm better disposed toward the temp variable unless there is some coding guideline that insists on a My reading of https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable doesn't suggest that |
||||||||||||
| return false; | ||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we actually want to turn off scalarization entirely? Maybe this is one place where we need to preserve it for vec1 only?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Proper support for vec1 was missing from this change. I've added it now. |
||||||||||||
|
|
||||||||||||
| for (Function::iterator BBI = F.begin(), BBE = F.end(); BBI != BBE; ++BBI) { | ||||||||||||
| BasicBlock *BB = BBI; | ||||||||||||
| for (BasicBlock::iterator II = BB->begin(), IE = BB->end(); II != IE;) { | ||||||||||||
|
|
||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3713,20 +3713,7 @@ VisitAbstractConditionalOperator(const AbstractConditionalOperator *E) { | |
| llvm::Value *CondV = CGF.EmitScalarExpr(condExpr); | ||
| llvm::Value *LHS = Visit(lhsExpr); | ||
| llvm::Value *RHS = Visit(rhsExpr); | ||
| if (llvm::VectorType *VT = dyn_cast<llvm::VectorType>(CondV->getType())) { | ||
| llvm::VectorType *ResultVT = cast<llvm::VectorType>(LHS->getType()); | ||
| llvm::Value *result = llvm::UndefValue::get(ResultVT); | ||
| for (unsigned i = 0; i < VT->getNumElements(); i++) { | ||
| llvm::Value *EltCond = Builder.CreateExtractElement(CondV, i); | ||
| llvm::Value *EltL = Builder.CreateExtractElement(LHS, i); | ||
| llvm::Value *EltR = Builder.CreateExtractElement(RHS, i); | ||
| llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR); | ||
| result = Builder.CreateInsertElement(result, EltSelect, i); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess nothing depended on this early scalarization of select? I wonder why it was added in the first place. I could imagine subtle regressions by removing this code if some optimization before scalarization depended on it.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing I could discern. If you have suggestions on how to further verify this, I'm happy to try.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks like it was part of HLSL 2018 and earlier, and might be related to how the non-short-circuting behavior was implemented. I see you have some HLSL 2018 tests below. I'll look at them more closely to see that they preserve non-short-circuiting. Code coverage does indicate that this isn't dead code: https://microsoft.github.io/DirectXShaderCompiler/coverage/home/runner/work/DirectXShaderCompiler/DirectXShaderCompiler/tools/clang/lib/CodeGen/CGExprScalar.cpp.html#L3709
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm sure it's not dead, but I've found a lot of pre-emptive scalarization that just duplicates the work that later scalarization passes would do anyway. Short-circuiting involvement is a possibility though.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The aspect of this code that prevented short-circuiting was returning before the logic below could potentially insert conditional jumps. The scalarization was incidental. This is why the scalar cases would skip it and just generate the select operation, which would still prevent short-circuiting without any scalarization as scalars don't short circuit any more than vectors do in HLSL 2018. That select generation is the behavior I maintained for vectors and scalars. In pre-6.9 cases, the vector select will be scalarized functionally equivalently as before. With this hard-coded scalarization in place, it can't be skipped with the existing changes to scalarization passes. We'd need a special case conditional for 6.9 here. |
||
| } | ||
| return result; | ||
| } else { | ||
| return Builder.CreateSelect(CondV, LHS, RHS); | ||
| } | ||
| return Builder.CreateSelect(CondV, LHS, RHS); | ||
| } | ||
| if (hlsl::IsHLSLMatType(E->getType())) { | ||
| llvm::Value *Cond = CGF.EmitScalarExpr(condExpr); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5185,8 +5185,8 @@ class HLSLExternalSource : public ExternalSemaSource { | |
| } | ||
| return false; | ||
| } else if (Template->getTemplatedDecl()->hasAttr<HLSLTessPatchAttr>()) { | ||
| DXASSERT(TemplateArgList.size() == 1, | ||
| "Tessellation patch has more than one template arg"); | ||
| DXASSERT(TemplateArgList.size() > 0, | ||
| "Tessellation patch should have at least one template args"); | ||
| const TemplateArgumentLoc &argLoc = TemplateArgList[0]; | ||
| const TemplateArgument &arg = argLoc.getArgument(); | ||
| DXASSERT(arg.getKind() == TemplateArgument::ArgKind::Type, ""); | ||
|
|
@@ -6487,6 +6487,9 @@ bool HLSLExternalSource::MatchArguments( | |
| } | ||
| } | ||
|
|
||
| std::string profile = m_sema->getLangOpts().HLSLProfile; | ||
| const ShaderModel *SM = hlsl::ShaderModel::GetByName(profile.c_str()); | ||
|
Comment on lines
+6706
to
+6707
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if we couldn't have a LangOpt for HLSLLongVectorSupport, initialized based on the shader model instead?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could. We had a slew of these as part of earlier shader models that eventually got removed and merged into a single SM. I could see the readability benefit of being able to query it individually and also in implementing the hidden flag to force scalarization. It's just that precedent of removing similar flags to minimize test area that gives me pause. |
||
|
|
||
| // Populate argTypes. | ||
| for (size_t i = 0; i <= Args.size(); i++) { | ||
| const HLSL_INTRINSIC_ARGUMENT *pArgument = &pIntrinsic->pArgs[i]; | ||
|
|
@@ -6657,8 +6660,9 @@ bool HLSLExternalSource::MatchArguments( | |
| } | ||
|
|
||
| // Verify that the final results are in bounds. | ||
| CAB(uCols > 0 && uCols <= MaxVectorSize && uRows > 0 && | ||
| uRows <= MaxVectorSize, | ||
| CAB((uCols > 0 && uRows > 0 && | ||
| ((uCols <= MaxVectorSize && uRows <= MaxVectorSize) || | ||
| (SM->IsSM69Plus() && uRows == 1))), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How is this change relevant to allowing native LLVM vectors? Shouldn't this be part of #7143 instead?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It shouldn't be part of this change, but it isn't part of the decls change either. This is for argument matching of an HLSL intrinsic, which we aren't doing quite yet either. |
||
| i); | ||
|
|
||
| // Const | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little suprised to see
ExtractElement`InsertElement\ShuffleVector` weren't already defined since I expected at least the first two as required for scalarization which DXIL already supports.If these are new for vectorization, should
isAllowed()be checkingIsSM69Plus()instead of always returning true?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scalarization took place before DXIL was generated. The output has no native vectors in non-lib shaders and so had no need for these functions. Intermediate steps might have these, but were disallowed for validation of final output. For similar reasons as their inclusion in intermediate steps, library shaders would allow these ops as they preserved native vectors in function interfaces. The check for library shader allowed ops accounts for this in validation.
DirectXShaderCompiler/lib/DxilValidation/DxilValidation.cpp
Line 2652 in b646ad3
The inclusion here adds them to the list of instructions that are approved for general validation by
IsLLVMInstructionAllowed, a generated function that merelyreturns whether the opcode is in the list of approved LLVM instructions for a non-lib. It is generated into DxilValidationImpl.inc and called here:
DirectXShaderCompiler/lib/DxilValidation/DxilValidation.cpp
Line 2694 in b646ad3
Similar to this function, this member is meant to return just true or false if it is allowed at all. I should add a 6.9 check to validation though.