Skip to content

Allow native vectors for LLVM operations#7155

Merged
pow2clk merged 18 commits intomicrosoft:mainfrom
pow2clk:longvec_operators
Mar 26, 2025
Merged

Allow native vectors for LLVM operations#7155
pow2clk merged 18 commits intomicrosoft:mainfrom
pow2clk:longvec_operators

Conversation

@pow2clk
Copy link
Copy Markdown
Collaborator

@pow2clk pow2clk commented Feb 20, 2025

Disables various forms of scalarization and vector elimination to permit
vectors to pass through to final DXIL when used in native LLVM
operations and loading/storing.

Introduces a few vector manipulation llvm instructions to DXIL allowing
for them to appear in output DXIL.

Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors.
This eliminates the element-by-element loading of the vectors
In many cases, this required plumbing the shader model information to
passes that didn't have it before.

Many changes were needed for the MatrixBitcastLower pass related to
linking to avoid converting matrix vectors, but also to perform the
conversion if a shader was compiled for 6.9+, but then linked to a
earlier target.
This now adapts to the linker target to either preserve vectors for 6.9 or arrays for previous versions.
This requires running the DynamicIndexing VectorToArray pass during linking since 6_x and 6_9+ will fail to run this in the initial compile, but will still need to lower vectors to arrays.

Ternary conditional/select operators were element extracted in codegen.
Removing this allows 6.9 to preserve the vectors, but also maintains
behavior for previous shader models because the operations get
scalarized later anyway.

Keep groupshared variables as vectors for 6.9. They are no longer represented as indivual groupshared scalars.

Adds extensive tests for these operations using different types and
sizes and testing them appropriately. Booleans produce significantly
different code, so they get their own test.

Fixes #7123

Disables various forms of scalarization and vector elimination to permit
vectors to pass through to final DXIL when used in native LLVM
operations and loading/storing.

Introduces a few vector manipulation llvm instructions to DXIL allowing
for them to appear in output DXIL.

Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors.
This eliminates the element-by-element loading of the vectors
In many cases, this required plumbing the shader model information to
passes that didn't have it before.

Many changes were needed for the MatrixBitcastLower pass related to
linking to avoid converting matrix vectors, but also to perform the
conversion if a shader was compiled for 6.9+, but then linked to a
earlier target.
This now adapts to the linker target to either preserve vectors for 6.9 or arrays for previous versions.
This requires running the DynamicIndexing VectorToArray pass during linking since 6_x and 6_9+ will fail to run this in the initial compile, but will still need to lower vectors to arrays.

Ternary conditional/select operators were element extracted in codegen.
Removing this allows 6.9 to preserve the vectors, but also maintains
behavior for previous shader models because the operations get
scalarized later anyway.

Keep groupshared variables as vectors for 6.9. They are no longer represented as indivual groupshared scalars.

Adds extensive tests for these operations using different types and
sizes and testing them appropriately. Booleans produce significantly
different code, so they get their own test.

Fixes microsoft#7123
@pow2clk pow2clk requested a review from a team as a code owner February 20, 2025 22:09
@pow2clk
Copy link
Copy Markdown
Collaborator Author

pow2clk commented Feb 20, 2025

The first two commits are dependencies that belong to a different PR. Only the last is relevant to this: 68a284b

Copy link
Copy Markdown
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has changes to multiple passes that do not have IR tests. We should have tests that verify correctness of the changes in the passes separately from the Clang-level changes.

def err_hlsl_out_indices_array_incorrect_access: Error<
"a vector in out indices array must be accessed as a whole">;
def err_hlsl_unsupported_long_vector: Error<
"Vectors of over 4 elements in %0 are not supported">;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Vectors of over 4 elements in %0 are not supported">;
"vectors of over 4 elements in %0 are not supported">;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not part of this PR, but a prerequisite as I tried to explain in this comment: #7155 (comment) I have fixed it in the relevant PR: #7143

Comment thread tools/clang/lib/Sema/SemaHLSL.cpp Outdated
return false;
}

bool hlsl::HasLongVecs(const QualType &qt) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this is not part of this PR, but part of the prerequisite #7143. I renamed it inline with your suggestion on that change, which I'm afraid doesn't match the coding standards you cited. e010223#r1960826249

Comment thread tools/clang/lib/Sema/SemaHLSL.cpp Outdated
Comment on lines +11917 to +11919
if (qt.isNull()) {
return false;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

Suggested change
if (qt.isNull()) {
return false;
}
if (qt.isNull())
return false;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this PR. I've responded to it here: 50d1af5

@damyanp damyanp assigned pow2clk and unassigned llvm-beanz and tex3d Feb 25, 2025
@tex3d tex3d self-assigned this Feb 25, 2025
Comment thread lib/HLSL/HLMatrixBitcastLowerPass.cpp Outdated
HLMatrixType MatTy = HLMatrixType::cast(EltTy);
Value *matSize = Builder.getInt32(MatTy.getNumElements());
idxList.back() = Builder.CreateMul(idxList.back(), matSize);
if (!DM.GetShaderModel()->IsSM69Plus()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we base the low-level decisions on a descriptively named boolean field in the MatrixBitcastLowerPass class, initialized based on the shader model, instead of passing the DxilModule down and checking the shader model each time?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even something about scalarizing operations, rather than anything to do with the native DXIL vector support?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the behavior for vec1s should proceed exactly as previously, I've added indications of native vector support which will use that and the vector size to determine what to do in all of these passes.

Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder);
Value *Elt = Builder.CreateLoad(GEP);
NewVec = Builder.CreateInsertElement(NewVec, Elt, i);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that below this point, there's:

    } else if (StoreInst *ST = dyn_cast<StoreInst>(U)) {

where it still scalarizes the store for the vector.

Did you mean to leave that scalarization in?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I omitted store support in error. Clearly we need some testing for it as it never caused any issues.


if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
return false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is where we presumably will still want to do something different for vec1, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's one of the places. However, this has no effect on the operators, but the intrinsics, which will come in another changes. So this particular file's changes are being reverted for now.

Value *val = stInst->getValueOperand();
Value *ldVal = Builder.CreateLoad(V);
ldVal = Builder.CreateInsertElement(ldVal, val, constIdx);
ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); // UGH
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to elaborate on the "UGH" comment? Is it on the general operation being performed, or something specific about this line?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just how I identified places that I needed to do something to deal with scalarization. Obviously not meant to be left in. I have now done what I intended to address this.

// all its users can be transformed, then split up the aggregate into its
// separate elements.
if (ShouldAttemptScalarRepl(AI) && isSafeAllocaToScalarRepl(AI)) {
if (!HLM.GetShaderModel()->IsSM69Plus() && ShouldAttemptScalarRepl(AI) &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass is so complicated as it is, can we capture a bool for DXIL vector support earlier and use that instead of the individual SM 6.9 checks here as well?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'll do that throughout.

Comment thread lib/Transforms/Scalar/Scalarizer.cpp Outdated
bool Scalarizer::runOnFunction(Function &F) {
if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
return false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want to turn off scalarization entirely? Maybe this is one place where we need to preserve it for vec1 only?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper support for vec1 was missing from this change. I've added it now.

llvm::Value *EltL = Builder.CreateExtractElement(LHS, i);
llvm::Value *EltR = Builder.CreateExtractElement(RHS, i);
llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR);
result = Builder.CreateInsertElement(result, EltSelect, i);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess nothing depended on this early scalarization of select? I wonder why it was added in the first place. I could imagine subtle regressions by removing this code if some optimization before scalarization depended on it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing I could discern. If you have suggestions on how to further verify this, I'm happy to try.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was part of HLSL 2018 and earlier, and might be related to how the non-short-circuting behavior was implemented.

I see you have some HLSL 2018 tests below. I'll look at them more closely to see that they preserve non-short-circuiting.

Code coverage does indicate that this isn't dead code: https://microsoft.github.io/DirectXShaderCompiler/coverage/home/runner/work/DirectXShaderCompiler/DirectXShaderCompiler/tools/clang/lib/CodeGen/CGExprScalar.cpp.html#L3709

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it's not dead, but I've found a lot of pre-emptive scalarization that just duplicates the work that later scalarization passes would do anyway. Short-circuiting involvement is a possibility though.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aspect of this code that prevented short-circuiting was returning before the logic below could potentially insert conditional jumps. The scalarization was incidental. This is why the scalar cases would skip it and just generate the select operation, which would still prevent short-circuiting without any scalarization as scalars don't short circuit any more than vectors do in HLSL 2018. That select generation is the behavior I maintained for vectors and scalars. In pre-6.9 cases, the vector select will be scalarized functionally equivalently as before.

With this hard-coded scalarization in place, it can't be skipped with the existing changes to scalarization passes. We'd need a special case conditional for 6.9 here.

Comment on lines +6490 to +6491
std::string profile = m_sema->getLangOpts().HLSLProfile;
const ShaderModel *SM = hlsl::ShaderModel::GetByName(profile.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we couldn't have a LangOpt for HLSLLongVectorSupport, initialized based on the shader model instead?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. We had a slew of these as part of earlier shader models that eventually got removed and merged into a single SM. I could see the readability benefit of being able to query it individually and also in implementing the hidden flag to force scalarization. It's just that precedent of removing similar flags to minimize test area that gives me pause.

uRows <= MaxVectorSize,
CAB((uCols > 0 && uRows > 0 &&
((uCols <= MaxVectorSize && uRows <= MaxVectorSize) ||
(SM->IsSM69Plus() && uRows == 1))),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this change relevant to allowing native LLVM vectors? Shouldn't this be part of #7143 instead?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be part of this change, but it isn't part of the decls change either. This is for argument matching of an HLSL intrinsic, which we aren't doing quite yet either.


Link(L"main", L"ps_6_9", pLinker, {libName, libName2},
{"alloca [2 x <12 x float>]",
"getelementptr [12 x float], [12 x float]*"},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to tell if this is right. Shouldn't this be dealing with a vector rather than an array here? Can I see the IR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this in December. I don't remember myself just now. I'll look into it and get back to you.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I was ready to write something here about how this was compiled as 6.3 because that's what the file says, but it's actually 6_x. The 12 element array comes from HLMatrixLower, which I hadn't touched.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it's just scratch space used to move the elements of the matrix around. They use vectors too. I don't know why the test cares about it so much, but it doesn't need changing. I'll take a look at leveraging long vectors for matrices later, but this works perfectly fine for now.

@pow2clk
Copy link
Copy Markdown
Collaborator Author

pow2clk commented Mar 10, 2025

I have fixed the merge conflicts and responded to some of the feedback locally, but the way the dependencies are set up, if I push those changes before #7143 is merged, it's going to make distinguishing the prerequisite changes and the changes that are the subject of this PR even harder.

@pow2clk pow2clk removed their assignment Mar 14, 2025
Copy link
Copy Markdown
Collaborator

@joaosaffran joaosaffran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me. Not an expert though

Copy link
Copy Markdown
Collaborator Author

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and added to pending comments I forgot to submit some time ago. A new commit should be coming soon, but there are a lot of IR tests to write.

Comment thread lib/HLSL/HLMatrixBitcastLowerPass.cpp Outdated
HLMatrixType MatTy = HLMatrixType::cast(EltTy);
Value *matSize = Builder.getInt32(MatTy.getNumElements());
idxList.back() = Builder.CreateMul(idxList.back(), matSize);
if (!DM.GetShaderModel()->IsSM69Plus()) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the behavior for vec1s should proceed exactly as previously, I've added indications of native vector support which will use that and the vector size to determine what to do in all of these passes.

Value *GEP = CreateEltGEP(A, i, zeroIdx, Builder);
Value *Elt = Builder.CreateLoad(GEP);
NewVec = Builder.CreateInsertElement(NewVec, Elt, i);
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I omitted store support in error. Clearly we need some testing for it as it never caused any issues.


if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
return false;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's one of the places. However, this has no effect on the operators, but the intrinsics, which will come in another changes. So this particular file's changes are being reverted for now.

Value *val = stInst->getValueOperand();
Value *ldVal = Builder.CreateLoad(V);
ldVal = Builder.CreateInsertElement(ldVal, val, constIdx);
ldVal = Builder.CreateInsertElement(ldVal, val, constIdx); // UGH
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just how I identified places that I needed to do something to deal with scalarization. Obviously not meant to be left in. I have now done what I intended to address this.

// all its users can be transformed, then split up the aggregate into its
// separate elements.
if (ShouldAttemptScalarRepl(AI) && isSafeAllocaToScalarRepl(AI)) {
if (!HLM.GetShaderModel()->IsSM69Plus() && ShouldAttemptScalarRepl(AI) &&
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'll do that throughout.

Comment thread lib/Transforms/Scalar/Scalarizer.cpp Outdated
bool Scalarizer::runOnFunction(Function &F) {
if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
return false;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper support for vec1 was missing from this change. I've added it now.

llvm::Value *EltL = Builder.CreateExtractElement(LHS, i);
llvm::Value *EltR = Builder.CreateExtractElement(RHS, i);
llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR);
result = Builder.CreateInsertElement(result, EltSelect, i);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing I could discern. If you have suggestions on how to further verify this, I'm happy to try.

Comment on lines +6490 to +6491
std::string profile = m_sema->getLangOpts().HLSLProfile;
const ShaderModel *SM = hlsl::ShaderModel::GetByName(profile.c_str());
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. We had a slew of these as part of earlier shader models that eventually got removed and merged into a single SM. I could see the readability benefit of being able to query it individually and also in implementing the hidden flag to force scalarization. It's just that precedent of removing similar flags to minimize test area that gives me pause.

uRows <= MaxVectorSize,
CAB((uCols > 0 && uRows > 0 &&
((uCols <= MaxVectorSize && uRows <= MaxVectorSize) ||
(SM->IsSM69Plus() && uRows == 1))),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be part of this change, but it isn't part of the decls change either. This is for argument matching of an HLSL intrinsic, which we aren't doing quite yet either.


Link(L"main", L"ps_6_9", pLinker, {libName, libName2},
{"alloca [2 x <12 x float>]",
"getelementptr [12 x float], [12 x float]*"},
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this in December. I don't remember myself just now. I'll look into it and get back to you.


// Flat Global vector if no dynamic vector indexing.
// Flatten Global vector if no dynamic vector indexing.
bool bFlatVector = !hasDynamicVectorIndexing(GV);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I understand not flattening vectors. does this also mean we won't flatten multi-dimensional arrays of vectors if we have dynamic vector indexing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vectors and their dynamic indexing have no bearing on multi-dimensional array flattening. hasDynamicVectorIndexing strips off all array GEPs and only pays attention to how the vector itself is indexed. The DynamicVectorIndexingVectorToArray pass is run before MultiDimArrayToOneDimArray is. By the time we're worried about flattening arrays, any vector flattening has already been done. In non-lib pre-6.9 cases, that means the vector element looks just like another layer of the multidimensional array and gets flattened with the rest of it. In cases where the vector survives, it becomes the element of the flattened array just like the scalar would. That's why no changes to MultiDimArrayToOneDimArray were required.

Comment thread lib/Transforms/Scalar/Scalarizer.cpp Outdated
Comment on lines +296 to +297
if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
auto *Parent = F.getParent();
if (Parent->HasDxilModule())
if (Parent->GetDxilModule().GetShaderModel()->IsSM69Plus())

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could even simplify this further to just:

SupportsVectors = F.getParent()->HasDxilModule() && F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus();

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like these comments are pulling in the same direction. The first suggests adding a variable to prevent redundant parent retrieval. The second retains that redundancy, but places it on a single line that clang-format will just break up into multiple lines again. I will merge the conditionals though.

Though the compiler will likely remove any redundancy, I'm better disposed toward the temp variable unless there is some coding guideline that insists on a ?: operator in this case.

My reading of https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable doesn't suggest that auto is best here as it's not clear what type the parent will be without knowledge of what function parents always(?) are. So I'm going to spell out Module

!3 = !{i32 1, i32 6}
!4 = !{i32 1, i32 6}
!5 = !{!"lib", i32 6, i32 6}
!6 = !{i32 1, float ([3 x <3 x float>]*)* @"\01?bar@@YAMY02V?$vector@M$02@@@Z", !7, float ([3 x <3 x float>]*)* @"\01?foo@@YAMY02V?$vector@M$02@@@Z", !7}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if we have a [3 x [3 x <3 x float>]]?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing different. Multidimensional array flattening only interacts with vector flattening when the vectors look like arrays. Anyway, this test is only affected by this change because it was missing the needed metadata to reconstruct the HLModule when invoked with dxopt. It didn't need it before. Because of the shader model dependency, it does now.

Comment thread utils/hct/hctdb.py
"OTHER", 53, "VAArg", "VAArgInst", "vaarg instruction", "", []
)

self.add_llvm_instr(
Copy link
Copy Markdown
Collaborator

@farzonl farzonl Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why we are doing this since this change maps back to the change in include/dxc/DXIL/DxilInstructions.h.

I'm curious what it means though. What does it mean to add_llvm_instr in this case? I don't think it means anything about validity because i've seen these instructions emitted by DXC before?

Does this means there is a way to print out all the llvm instructions DXC supports via querying the DirectXShaderCompiler.utils.hct.hctdb_instrhelp package?

I'm curious for my own reasons because I want to build a Set difference of DXC to upstream llvm and know which instructions need to be legalized.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does have to do with the validity in non-lib shaders. As I at least partially explained above in DxilInstructions.h, it prevents these from being immediately rejected for non-lib shaders in validation:

if (!IsLLVMInstructionAllowed(I)) {
.

You could construct that query or you could just look at the LLVM operations indicated by a call to add_llvm_instr in hctdb.py, or you could look at the final form of the IsLLVMInstructionAllowed function the DxilValidationImpl.inc file from a build of DXC and see the opcode value range checks. The only exceptions previously between libs and other shaders were these three, so there are no other gotchas where an operation might seem to be disallowed by these metrics, but really is for a specific case.

Greg Roth added 2 commits March 21, 2025 15:46
I don't know why the order of the extractelements varies, but it does
even for the same build depending on whether a debugger is attached.
It would seem to be an unordered container object, but there is no such.
It just generates them as it goes and never touches them again which
would suggest the opposite order in a few cases.

They are constant expressions, that's probably why they move around, but
I can't find where to make them consistent.
// CHECK-SAME: ([11 x <[[NUM:[0-9][0-9]*]] x [[TYPE:[a-z0-9]*]]>]*
export void bittwiddlers(inout vector<TYPE, NUM> things[11]) {
// CHECK: [[adr1:%[0-9]*]] = getelementptr inbounds [11 x <[[NUM]] x [[TYPE]]>], [11 x <[[NUM]] x [[TYPE]]>]* %things, i32 0, i32 1
// CHECK: [[vec1:%[0-9]*]] = load <[[NUM]] x [[TYPE]]>, <[[NUM]] x [[TYPE]]>* [[adr1]]
Copy link
Copy Markdown
Collaborator

@farzonl farzonl Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to make most of these CHECK-NEXT?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of them could be, but not all. Some of them have a few commands I left out because they weren't part of the logic flow of interest. I think because I meticulously track the creation and usage of all the values, the matches must either be the instructions I intend or I suppose theoretically some that are entirely equivalent in function. It isn't necessary to the correct outcome that they be on immediately subsequent lines and the dependency tracking means there's no risk of false matches that CHECK-NEXT might prevent otherwise.

That's why it's not needed. I don't want to do it because rechecking all these for which ones can and can't and which ones may on one platform and not an another is not an exercise that I want to undergo without reason to do so.

Comment on lines +13 to +14
#include "dxc/DXIL/DxilModule.h"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumedly this isn't needed.

Suggested change
#include "dxc/DXIL/DxilModule.h"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Left in by mistake.

const DataLayout &DL,
bool SupportsVectors, bool hasPrecise,
DxilTypeSystem &typeSys, const DataLayout &DL,
SmallVector<Value *, 32> &DeadInsts,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not requesting that you fix this, but the use of a SmallVector as an argument does stand out to me. Generally the convention is to use a SmallVectorImpl instead as function arguments so that the function can be used with SmallVectors of different static allocation size.

llvm::Value *EltL = Builder.CreateExtractElement(LHS, i);
llvm::Value *EltR = Builder.CreateExtractElement(RHS, i);
llvm::Value *EltSelect = Builder.CreateSelect(EltCond, EltL, EltR);
result = Builder.CreateInsertElement(result, EltSelect, i);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was part of HLSL 2018 and earlier, and might be related to how the non-short-circuting behavior was implemented.

I see you have some HLSL 2018 tests below. I'll look at them more closely to see that they preserve non-short-circuiting.

Code coverage does indicate that this isn't dead code: https://microsoft.github.io/DirectXShaderCompiler/coverage/home/runner/work/DirectXShaderCompiler/DirectXShaderCompiler/tools/clang/lib/CodeGen/CGExprScalar.cpp.html#L3709

Comment thread lib/Transforms/Scalar/Scalarizer.cpp Outdated
Comment on lines +296 to +297
if (F.getParent()->HasDxilModule())
if (F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could even simplify this further to just:

SupportsVectors = F.getParent()->HasDxilModule() && F.getParent()->GetDxilModule().GetShaderModel()->IsSM69Plus();

}

// Flat Global vector if no dynamic vector indexing.
// Flatten Global vector if no dynamic vector indexing.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If we're fixing a comment, might as well make it a complete sentence.

Suggested change
// Flatten Global vector if no dynamic vector indexing.
// Flatten global vector if no dynamic vector indexing occurs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worded it more in keeping with the name of the utility function.

// CHECK: [[bvec5:%[0-9]*]] = icmp ne <[[NUM]] x i32> [[vec5]], zeroinitializer
// MORE STUFF

res[3] = truth[3] ? truth[4] : truth[5];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a HLSL 2018 test case something like:

export float4 main(bool4 B, float4 X, float4 Y) {
    return B ? ++X : Y;
}

What I want to verify is that the side-effect of the LHS always occurs and no branch is generated under HLSL 2018.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion. The native vectors might interfere with that behavior.

Greg Roth added 4 commits March 24, 2025 04:35
Initialize mattype with check for matrixness

remove leftover include

reword comment

refactor SM69 conditional to avoid double parent retrieval

Add test that confirms no short-circuiting with native vector logic ops and HLSL
2018.

Revise vec1 scalarizer test that was mistakenly generated with HLSL 2021
which included short-circuiting.

Add validation check for vector operations in pre-6.9
Copy link
Copy Markdown
Collaborator

@bogner bogner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, though it'd be good to get explicit approval from one of the folks who've done a few rounds of review on it.

Some of the LLVM IR tests have a few bits of unreferenced or unnecessary metadata that it'd be nice to clean up before committing.

Comment on lines +219 to +220
// Since HL module can't be created when linking, check for that first.
// Otherwise, either retrieve or generate the HL module.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say "Since dxmodule can't be created when linking..."?

Copy link
Copy Markdown
Collaborator Author

@pow2clk pow2clk Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we actually can create a dxil module when linking. Maybe I'll replace it with getorcreatedxilmodule just in case. We can't create an hlmodule while linking

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why it's confusing now though. the "it" was supposed to be a dxil module, but that is not at all clear from the wording.

Comment thread lib/HLSL/HLMatrixBitcastLowerPass.cpp Outdated
} // namespace

void MatrixBitcastLowerPass::lowerMatrix(Instruction *M, Value *A) {
void MatrixBitcastLowerPass::lowerMatrix(DxilModule &DM, Instruction *M,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing any use of DM, other than the recursive one required by this function signature.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a relic from when I was getting the version from it.

StringRef getPassName() const override { return "Matrix Bitcast lower"; }
bool runOnFunction(Function &F) override {
DxilModule &DM = F.getParent()->GetOrCreateDxilModule();
SupportsVectors = DM.GetShaderModel()->IsSM69Plus();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to avoid full DxilModule dependency, I think it would be reasonable to have something that can look up the DXIL version in the module without loading full metadata, whether or not it's HLM or DM.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. I know I mentioned it somewhere before. I'd prefer it not block this change though.

SupportsVectors = M.GetDxilModule().GetShaderModel()->IsSM69Plus();
} else {
HLModule &HLM = M.GetOrCreateHLModule();
SupportsVectors = HLM.GetShaderModel()->IsSM69Plus();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned earlier, it might be better to get the DXIL version from metadata without relying on full HLModule/DxilModule metadata loading. Maybe we can add a dxilutil::ReadDxilVersion function?

But the fixes to the HLModule loading are appreciated!

@pow2clk
Copy link
Copy Markdown
Collaborator Author

pow2clk commented Mar 25, 2025

Generally looks good to me, though it'd be good to get explicit approval from one of the folks who've done a few rounds of review on it.

Some of the LLVM IR tests have a few bits of unreferenced or unnecessary metadata that it'd be nice to clean up before committing.

Just as an informational note, I discovered that the !paseresume line that I think was identified as unneeded at some point actually turns out to be in some places. Without it, we don't get the shader model, so we fall back to default pre-6.9 behavior. That's okay for the vec1 tests because they should be scalarized, but they weren't actually confirming that the scalarization still took place in 6.9, which was their intent.

Remove some unneeded parts of IR tests. Add some back. I discovered that
the pauseresume line is at least sometimes needed to construct the
module with the shader model information. I fear some of the tests were
passing before becuase they expected no changes, but were operating in
non-6.9 mode. So they passed, but for the wrong reason.

Remove unused DM param to matrixbitcastlowerpass lower matrix call.

reword confusing dynindexingvectortoarray comment
Comment thread lib/Transforms/Scalar/Scalarizer.cpp Outdated

bool Scalarizer::runOnFunction(Function &F) {
Module *M = F.getParent();
if (M->HasDxilModule() && M->GetDxilModule().GetShaderModel()->IsSM69Plus())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the independent lookup of DXIL version would help here, since it won't do something different when there's no created DxilModule already present.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also replace it with getorcreate. I was a bit nervous about that becuase of how it failed for hlmodules, but I've since learned that it's a lot more robust and I've made the hlmodule variant similarly so.

Greg Roth added 4 commits March 25, 2025 17:07
Adds a dxilutil function to retrieve shader model from metadata. To
enable this, the metadata helper implementation is split into a few
utility functions which will continue to throw the exceptions as they
did before, but allows passes to call them without fear of crashes as a
result of that. Passes will assume the default shader model where it
can't retrieve it for one reason or another.

Changes scalarizer to try to retrieve the shader module when no module
is found. Changes dyanmicvector to array to use the utility function and
never try to recreate the hlmodule. Instead it will try the dxilmodule
and if that isn't present, it will use the metadata for shader model
only.
With the minimal metadata version requirement, these can be more
succinct
@pow2clk pow2clk merged commit 1eb83c7 into microsoft:main Mar 26, 2025
12 checks passed
@github-project-automation github-project-automation Bot moved this from New to Done in HLSL Roadmap Mar 26, 2025
@pow2clk pow2clk deleted the longvec_operators branch March 26, 2025 11:19
@damyanp damyanp moved this from Needs Review to Closed in HLSL Support Apr 22, 2025
@damyanp damyanp removed this from HLSL Support Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[SM6.9] Generate native vector variants of elementwise operators

7 participants