feat: make E2E VM SKU selection restriction-aware #5671
The E2E suite hardcoded VM SKUs (mainly Standard_D8s_v3), which fails when a SKU is quota-restricted (e.g. NotAvailableForSubscription) in a particular subscription/region. This made onboarding new subscriptions for E2E brittle.
Extend the suite's dynamic SKU discovery to be restriction-aware using the Azure Resource SKUs API: SelectVMSize filters out SKUs restricted in the current subscription/location/zone, applies capability constraints, prefers the historical SKU first (unchanged behaviour when available), then falls back to a deterministic pick. It returns ErrNoUsableVMSize when nothing fits.
Route all general-purpose, arm64 and GPU SKU choices through it. Default node pool params leave VMSize empty and CreateNodePoolFromParam resolves it. The GPU test now uses the same mechanism and Skips when no unrestricted GPU SKU exists.
Co-authored-by: Copilot <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: roivaz
Skipping CI for Draft Pull Request.
We also define the default VM size (and disk storage account type) in https://github.com/Azure/ARO-HCP/blob/main/test/util/framework/deployment_params.go
PR needs rebase.
mbukatov left a comment
The general direction is good, but I need to review details later when I have more time.
By("creating node pool version " + matchingNodePoolVersion + " and verifying a simple web app can run")
nodePoolDefaults := defaultNodePoolDefaults
nodePoolDefaults.vmSize, err = tc.SelectVMSize(ctx, framework.DefaultWorkerVMSizeSelector())
Expect(err).NotTo(HaveOccurred(), "failed to resolve the default worker VM size for back-level node pool")
Could we also state directly that such an error is a configuration/infra issue of the test subscription/region?