Brev 9032/add nebius live capacity #123

Draft
kirtiip20 wants to merge 2 commits into main from BREV-9032/Add-nebius-live-capacity
@kirtiip20

Problem

Nebius GPU instance types showed as available in the Brev UI based on tenant quota alone, even when Nebius had no on-demand capacity in that region. Users could select such a type (e.g. 8× H200, L40s) and the launch would then fail at provisioning time.

Root cause
Availability was computed only from tenant quota allowances (a region-specific compute.instance.gpu.* quota check), with no check against the provider's actual capacity. A tenant can hold quota in a region where Nebius currently has no capacity, so the type was still marked available and then failed on launch.

Fix
Integrated the Nebius Capacity Advisor (ResourceAdvice) API so that availability reflects both real-time on-demand capacity and tenant quota:

  1. Fetched Capacity Advisor data during each instance-type synchronization and built a region:platform:preset availability map (getResourceAdviceMap, buildResourceAdviceMapFromItems).
  2. Updated GPU availability resolution (resolvePresetAvailability) to require both available capacity from Capacity Advisor and remaining tenant quota. DATA_STATE_UNKNOWN and AVAILABILITY_LEVEL_LIMIT_REACHED are treated as unavailable capacity.
  3. If the Capacity Advisor API is fully unavailable, availability degrades gracefully to quota-only (logged as a warning) so the catalog doesn't go blank.
  4. Upgraded github.com/nebius/gosdk to v0.2.22, which includes support for the Capacity Advisor API.
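
For reviewers, the resolution rule in steps 1 to 3 can be sketched roughly as below. This is a simplified illustration, not the code in this PR: `AdviceItem`, `adviceKey`, `buildAdviceMap`, and `resolveAvailability` are hypothetical stand-ins, the enum values are modeled as plain strings rather than the gosdk types, and treating a missing map entry as "no capacity" is an assumption the description does not confirm.

```go
package main

import "strings"

// AdviceItem is a hypothetical, flattened stand-in for one Capacity Advisor
// (ResourceAdvice) entry. The real gosdk types are not reproduced here.
type AdviceItem struct {
	Region, Platform, Preset string
	DataState                string // e.g. "DATA_STATE_UNKNOWN"
	Availability             string // e.g. "AVAILABILITY_LEVEL_LIMIT_REACHED"
}

// adviceKey builds the region:platform:preset key described in step 1.
func adviceKey(region, platform, preset string) string {
	return strings.Join([]string{region, platform, preset}, ":")
}

// buildAdviceMap maps each key to whether on-demand capacity is on offer.
// DATA_STATE_UNKNOWN and AVAILABILITY_LEVEL_LIMIT_REACHED count as no capacity.
func buildAdviceMap(items []AdviceItem) map[string]bool {
	m := make(map[string]bool, len(items))
	for _, it := range items {
		hasCapacity := it.DataState != "DATA_STATE_UNKNOWN" &&
			it.Availability != "AVAILABILITY_LEVEL_LIMIT_REACHED"
		m[adviceKey(it.Region, it.Platform, it.Preset)] = hasCapacity
	}
	return m
}

// resolveAvailability requires remaining quota AND advertised capacity.
// A nil advice map signals that the Capacity Advisor call failed entirely,
// in which case the result falls back to quota-only (step 3).
// Assumption: a key missing from a non-nil map is treated as unavailable.
func resolveAvailability(advice map[string]bool, region, platform, preset string, quotaRemaining int) bool {
	if quotaRemaining <= 0 {
		return false
	}
	if advice == nil {
		return true // degraded mode: quota-only
	}
	return advice[adviceKey(region, platform, preset)]
}
```

In this sketch the caller passes `nil` only when the advisor request itself fails, so an empty but successful response still marks every preset unavailable instead of silently falling back to quota-only.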

@kirtiip20 kirtiip20 self-assigned this Jun 17, 2026