JIT: Add support for async variant delegate GDV#128273
Draft
jakobbotsch wants to merge 5 commits into
Draft
Conversation
Switch to the async variant if user implemented when doing delegate GDV. The async variant can then further be inlined.
Contributor
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR extends CoreCLR JIT indirect-call transformations so that guarded devirtualization (GDV) of delegate Invoke within async methods can opportunistically switch a guessed target to its user-implemented async variant (when available), enabling additional hoisting/inlining and avoiding the thunk path. It also introduces a small CallArgs helper to standardize insertion of the async-continuation well-known argument.
Changes:
- Enhance
GuardedDevirtualizationTransformerto detectAsyncHelpers.Awaitconsumption of a GDV return value, split the await region, and (for applicable guesses) switch to the async “other variant” method handle. - Add
CallArgs::InsertAsyncContinuationand use it from the importer path forldvirtftnasync calls (and from the new GDV async-variant path). - Refactor/rename some transformer state to
m_members and adjust block weight handling when an await remainder block is introduced.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/coreclr/jit/indirectcalltransformer.cpp | Adds async-variant selection for delegate GDV in async methods, plus await splitting and related refactoring. |
| src/coreclr/jit/importercalls.cpp | Switches async-continuation insertion for ldvirtftn async calls to a shared CallArgs helper. |
| src/coreclr/jit/gentree.h | Declares the new CallArgs::InsertAsyncContinuation helper. |
| src/coreclr/jit/gentree.cpp | Implements CallArgs::InsertAsyncContinuation insertion policy for the async continuation arg. |
Comments suppressed due to low confidence (1)
src/coreclr/jit/indirectcalltransformer.cpp:873
CreateOrGetLocalForRetExprwrites the unused-return result to*isUnused, but the subsequent conditional checksm_returnValueUnusedinstead of*isUnused. This breaks callers that pass a different output flag (e.g.m_awaitReturnValueUnusedinSplitAwait) and can cause the ret-expr to be linked/handled incorrectly (temp allocated vs NOP substitution) depending on stale member state.
unsigned resultLcl = BAD_VAR_NUM;
if (noReturnValue)
{
JITDUMP("Linking GT_RET_EXPR [%06u] for VOID return to NOP\n",
compiler->dspTreeID(inlineInfo->retExpr));
inlineInfo->retExpr->gtSubstExpr = compiler->gtNewNothingNode();
}
else if (m_returnValueUnused)
{
JITDUMP("Linking GT_RET_EXPR [%06u] for UNUSED return to NOP\n",
compiler->dspTreeID(inlineInfo->retExpr));
inlineInfo->retExpr->gtSubstExpr = compiler->gtNewNothingNode();
}
Comment on lines
+309
to
320
| Compiler* compiler = nullptr; | ||
| BasicBlock* currBlock = nullptr; | ||
| BasicBlock* m_remainderBlock = nullptr; | ||
| BasicBlock* m_checkBlock = nullptr; | ||
| BasicBlock* m_thenBlock = nullptr; | ||
| BasicBlock* m_elseBlock = nullptr; | ||
| Statement* stmt = nullptr; | ||
| GenTreeCall* m_origCall = nullptr; | ||
| unsigned m_likelihood = HIGH_PROBABILITY; | ||
|
|
||
| const int HIGH_PROBABILITY = 80; | ||
| }; |
Comment on lines
+384
to
+391
| m_checkBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, currBlock, currBlock); | ||
| GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK); | ||
| GenTree* m_fptrAddressCopy = compiler->gtCloneExpr(m_fptrAddress); | ||
| GenTree* fatPointerAnd = compiler->gtNewOperNode(GT_AND, TYP_I_IMPL, m_fptrAddressCopy, fatPointerMask); | ||
| GenTree* zero = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, 0); | ||
| GenTree* fatPointerCmp = compiler->gtNewOperNode(GT_NE, TYP_INT, fatPointerAnd, zero); | ||
| GenTree* jmpTree = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, fatPointerCmp); | ||
| Statement* jmpStmt = compiler->fgNewStmtFromTree(jmpTree, stmt->GetDebugInfo()); |
Comment on lines
+430
to
+432
| GenTree* m_fptrAddressCopy = compiler->gtCloneExpr(m_fptrAddress); | ||
| GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK); | ||
| return compiler->gtNewOperNode(GT_SUB, m_pointerType, m_fptrAddressCopy, fatPointerMask); |
Comment on lines
845
to
852
| if (!noReturnValue) | ||
| { | ||
| Statement* const nextStmt = stmt->GetNextStmt(); | ||
| if (nextStmt != nullptr) | ||
| { | ||
| Compiler::FindLinkData fld = compiler->gtFindLink(nextStmt, retExprNode); | ||
| GenTree* const parent = fld.parent; | ||
|
|
| // | ||
| // So to figure out the proper divisor, we start with 1.0 and subtract off each | ||
| // preceeding test's likelihood of success. | ||
| // preceeding test's m_likelihood of success. |
Comment on lines
+1797
to
+1804
| CallArg* CallArgs::InsertAsyncContinuation(Compiler* comp, GenTree* node) | ||
| { | ||
| NewCallArg newArg = NewCallArg::Primitive(node).WellKnown(WellKnownArg::AsyncContinuation); | ||
| assert(FindWellKnownArg(WellKnownArg::InstParam) == nullptr); | ||
|
|
||
| if (Target::g_tgtArgOrder == Target::ARG_ORDER_R2L) | ||
| { | ||
| CallArg* retBufferArg = GetRetBufferArg(); |
Open
3 tasks
Comment on lines
+382
to
+389
| m_checkBlock = CreateAndInsertBasicBlock(BBJ_ALWAYS, currBlock, currBlock); | ||
| GenTree* fatPointerMask = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, FAT_POINTER_MASK); | ||
| GenTree* m_fptrAddressCopy = compiler->gtCloneExpr(m_fptrAddress); | ||
| GenTree* fatPointerAnd = compiler->gtNewOperNode(GT_AND, TYP_I_IMPL, m_fptrAddressCopy, fatPointerMask); | ||
| GenTree* zero = new (compiler, GT_CNS_INT) GenTreeIntCon(TYP_I_IMPL, 0); | ||
| GenTree* fatPointerCmp = compiler->gtNewOperNode(GT_NE, TYP_INT, fatPointerAnd, zero); | ||
| GenTree* jmpTree = compiler->gtNewOperNode(GT_JTRUE, TYP_VOID, fatPointerCmp); | ||
| Statement* jmpStmt = compiler->fgNewStmtFromTree(jmpTree, stmt->GetDebugInfo()); |
Comment on lines
+1117
to
+1121
| // No configuration is currently matched for the awaiter. | ||
| callInfo->ContinuationContextHandling = ContinuationContextHandling::ContinueOnCapturedContext; | ||
| call->SetIsAsync(callInfo); | ||
| call->gtArgs.InsertAsyncContinuation(compiler, compiler->gtNewNull()); | ||
|
|
| // | ||
| // So to figure out the proper divisor, we start with 1.0 and subtract off each | ||
| // preceeding test's likelihood of success. | ||
| // preceeding test's m_likelihood of success. |
|
|
||
| // When the current candidate has sufficiently high likelihood, scan | ||
| // When the current candidate has sufficiently high m_likelihood, scan | ||
| // the remainer block looking for another GDV candidate. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Switch to the async variant if user implemented when doing delegate GDV. This avoids calling through the task-returning thunk, and further allows inlining the async variant (under the current restrictions).
Before: Took 1334 ms
After: Took 122 ms
There are some unnecessary context saves/restores in the inner loop, but at least we do GDV, hoisting and inlining now.
Fix #128260