Skip to content

feat: make E2E VM SKU selection restriction-aware#5671

Draft
roivaz wants to merge 1 commit into
Azure:mainfrom
roivaz:feat/e2e-suite-sku-overrides
Draft

feat: make E2E VM SKU selection restriction-aware#5671
roivaz wants to merge 1 commit into
Azure:mainfrom
roivaz:feat/e2e-suite-sku-overrides

Conversation

@roivaz

@roivaz roivaz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

The E2E suite hardcoded VM SKUs (mainly Standard_D8s_v3), which fails when a SKU is quota-restricted (e.g. NotAvailableForSubscription) in a particular subscription/region. This makes onboarding new subscriptions for E2E brittle.

Extend the suite's dynamic SKU discovery to be restriction-aware using the Azure Resource SKUs API: SelectVMSize filters out SKUs restricted in the current subscription/location/zone, applies capability constraints, prefers the historical SKU first (unchanged behaviour when available), then falls back to a deterministic pick. Returns ErrNoUsableVMSize when nothing fits.

Route all general-purpose, arm64 and GPU SKU choices through it. Default node pool params leave VMSize empty and CreateNodePoolFromParam resolves it. The GPU test now uses the same mechanism and Skips when no unrestricted GPU SKU exists.

The E2E suite hardcoded VM SKUs (mainly Standard_D8s_v3), which fails when a
SKU is quota-restricted (e.g. NotAvailableForSubscription) in a particular
subscription/region. This made onboarding new subscriptions for E2E brittle.

Extend the suite's dynamic SKU discovery to be restriction-aware using the
Azure Resource SKUs API: SelectVMSize filters out SKUs restricted in the
current subscription/location/zone, applies capability constraints, prefers the
historical SKU first (unchanged behaviour when available), then falls back to a
deterministic pick. Returns ErrNoUsableVMSize when nothing fits.

Route all general-purpose, arm64 and GPU SKU choices through it. Default node
pool params leave VMSize empty and CreateNodePoolFromParam resolves it. The GPU
test now uses the same mechanism and Skips when no unrestricted GPU SKU exists.

Co-authored-by: Copilot <[email protected]>
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: roivaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@roivaz roivaz added ai-assisted AI/LLM tool was used to help create this MR and removed approved labels Jun 16, 2026
@miquelsi

Copy link
Copy Markdown
Collaborator

We also define default VM size (and disk storage account type) in https://github.com/Azure/ARO-HCP/blob/main/test/util/framework/deployment_params.go

@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mbukatov mbukatov left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general direction is good, but I need to review details later when I have more time.

By("creating node pool version " + matchingNodePoolVersion + " and verifying a simple web app can run")
nodePoolDefaults := defaultNodePoolDefaults
nodePoolDefaults.vmSize, err = tc.SelectVMSize(ctx, framework.DefaultWorkerVMSizeSelector())
Expect(err).NotTo(HaveOccurred(), "failed to resolve the default worker VM size for back-level node pool")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also directly state that such error is configuration/infra issue of the test subscription/region?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-assisted AI/LLM tool was used to help create this MR do-not-merge/work-in-progress needs-rebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants