Use runtime host firmware config in upgrade flows#3061
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
Summary by CodeRabbit
WalkthroughThis PR adds host firmware override support to firmware snapshots, then updates API, explorer, update, and preingestion flows to read DB-backed override snapshots. It also adjusts tests and the OpenAPI description. ChangesFirmware Override Snapshot Pipeline
Estimated code review effort: 4 (Complex) | ~60 minutes Sequence Diagram(s)sequenceDiagram
participant Client
participant Api as api::api.rs
participant Handler as handlers::firmware
participant DB as api-db::host_firmware_config
participant FirmwareConfig
Client->>Api: request
Api->>Handler: get_desired_firmware_versions(self, request).await
Handler->>DB: list_configs(...)
DB-->>Handler: host firmware configs
Handler->>FirmwareConfig: create_snapshot_with_overrides(configs)
FirmwareConfig-->>Handler: FirmwareConfigSnapshot
Handler-->>Api: entries
Api-->>Client: response
sequenceDiagram
participant Manager as preingestion-manager
participant DB as api-db::host_firmware_config
participant FirmwareConfig
participant Selection as select_firmware_for_artifact_continuation
Manager->>DB: list_configs(db)
DB-->>Manager: host firmware configs
Manager->>FirmwareConfig: create_snapshot_with_overrides(configs)
FirmwareConfig-->>Manager: FirmwareConfigSnapshot
Manager->>Selection: select by final_version or default
Selection-->>Manager: selected firmware entry
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
crates/api-db/src/host_firmware_config.rs (1)
113-145: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse
impl DbReaderfor these read-only helpers.
list,list_configs, andsummarydo not mutate state, but these&mut PgConnectionsignatures force downstream callers to open and commit transactions just to read. Acceptingimpl DbReaderhere would remove that boilerplate and let pool-backed callers stay out of transactions. As per coding guidelines,crates/api-db/**/*.rs: "Prefer acceptingimpl DbReaderas a connection in read-only database functions to allow callers to pass aPgPooland avoid transaction boilerplate."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-db/src/host_firmware_config.rs` around lines 113 - 145, The read-only helpers in host_firmware_config currently force mutable PgConnection access, which makes callers use unnecessary transactions. Update list, list_configs, and summary to accept impl DbReader instead of &mut PgConnection, and keep their query logic the same so pool-backed callers can invoke them without transaction boilerplate.Source: Coding guidelines
crates/preingestion-manager/src/lib.rs (1)
3113-3141: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse a table-driven test for the selector cases.
These two tests call the same operation with different inputs; consolidate them with
value_scenarios!orcheck_values. As per coding guidelines, “Use table-driven test style when writing tests in Rust.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/preingestion-manager/src/lib.rs` around lines 3113 - 3141, The two tests around select_firmware_for_artifact_continuation are duplicating the same setup and assertion pattern, so consolidate them into a table-driven test. Refactor the cases into a single parameterized test using value_scenarios! or check_values, keeping the existing scenarios for the final-version preference and default fallback while varying only the requested version and expected selected version.Source: Coding guidelines
crates/api-core/src/handlers/firmware.rs (1)
87-98: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winExtract the repeated snapshot-building block into a shared helper.
The begin-txn →
list_configs→ commit →create_snapshot_with_overridessequence ineffective_host_firmware_snapshotis duplicated nearly verbatim incrates/api-core/src/handlers/site_explorer.rs(inline inrefresh_endpoint_report),crates/site-explorer/src/lib.rs(firmware_config_snapshot), andcrates/preingestion-manager/src/lib.rs(PreingestionManagerStatic::firmware_config_snapshot) — four independent copies with three different names. Any future change (error handling, isolation level, filtering by vendor) has to be applied consistently in all four places, which is easy to miss.Consider hoisting this into a single function (e.g. on
FirmwareConfigincarbide-firmware, or a free function inapi-db) that takes aPgPool/impl DbReaderplus the baseFirmwareConfig/Arc<FirmwareConfig>and returns the effective snapshot, then have all four call sites delegate to it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-core/src/handlers/firmware.rs` around lines 87 - 98, The begin-txn → list_configs → commit → create_snapshot_with_overrides flow is duplicated across effective_host_firmware_snapshot, refresh_endpoint_report, firmware_config_snapshot, and PreingestionManagerStatic::firmware_config_snapshot. Extract that snapshot-building logic into one shared helper (for example on FirmwareConfig or as a free function in api-db) that accepts a DB reader/pool plus the base FirmwareConfig/Arc<FirmwareConfig> and returns the effective FirmwareConfigSnapshot, then update all four call sites to delegate to it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/firmware/src/config.rs`:
- Around line 296-299: The runtime override merge in the Firmware config update
is overwriting explicit_start_needed even when the override omits it, which
clears the base catalog value. Update the override handling around the cur_model
assignment in the config merge path to use tri-state semantics (for example,
Option<bool>) so only a present value updates explicit_start_needed and omission
preserves the existing value. Keep the behavior consistent with the other merged
fields in the same block and the override application logic that feeds
cur_model.
In `@crates/preingestion-manager/src/lib.rs`:
- Around line 519-539: The host firmware lookup in find_fw_info_for_host is
bypassing the snapshot’s full model resolution logic by only using
endpoint.report.model(), which can miss matches stored in the explicit
report.model field. Update this method to reuse
FirmwareConfigSnapshot::find_fw_info_for_host (or the same model-selection logic
it uses) after obtaining the snapshot from firmware_config_snapshot, so
vendor/model matching behaves consistently with the snapshot’s lookup semantics.
---
Nitpick comments:
In `@crates/api-core/src/handlers/firmware.rs`:
- Around line 87-98: The begin-txn → list_configs → commit →
create_snapshot_with_overrides flow is duplicated across
effective_host_firmware_snapshot, refresh_endpoint_report,
firmware_config_snapshot, and
PreingestionManagerStatic::firmware_config_snapshot. Extract that
snapshot-building logic into one shared helper (for example on FirmwareConfig or
as a free function in api-db) that accepts a DB reader/pool plus the base
FirmwareConfig/Arc<FirmwareConfig> and returns the effective
FirmwareConfigSnapshot, then update all four call sites to delegate to it.
In `@crates/api-db/src/host_firmware_config.rs`:
- Around line 113-145: The read-only helpers in host_firmware_config currently
force mutable PgConnection access, which makes callers use unnecessary
transactions. Update list, list_configs, and summary to accept impl DbReader
instead of &mut PgConnection, and keep their query logic the same so pool-backed
callers can invoke them without transaction boilerplate.
In `@crates/preingestion-manager/src/lib.rs`:
- Around line 3113-3141: The two tests around
select_firmware_for_artifact_continuation are duplicating the same setup and
assertion pattern, so consolidate them into a table-driven test. Refactor the
cases into a single parameterized test using value_scenarios! or check_values,
keeping the existing scenarios for the final-version preference and default
fallback while varying only the requested version and expected selected version.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 03ddd07b-4db0-45aa-adb3-0e089ae64b1f
📒 Files selected for processing (14)
crates/api-core/src/api.rscrates/api-core/src/handlers/firmware.rscrates/api-core/src/handlers/site_explorer.rscrates/api-core/src/machine_update_manager/host_firmware.rscrates/api-core/src/tests/host_firmware_config.rscrates/api-db/src/host_firmware_config.rscrates/firmware/src/config.rscrates/firmware/src/tests/config.rscrates/machine-controller/src/handler.rscrates/preingestion-manager/src/lib.rscrates/preingestion-manager/tests/integration/host_bmc_firmware.rscrates/site-explorer/src/lib.rsrest-api/docs/index.htmlrest-api/openapi/spec.yaml
💤 Files with no reviewable changes (1)
- rest-api/openapi/spec.yaml
f9f9ffc to
4877f1d
Compare
|
🌿 Preview your docs: https://nvidia-preview-pull-request-3061.docs.buildwithfern.com/infra-controller |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-07-02 10:08:46 UTC | Commit: 4877f1d |
4877f1d to
1755a42
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/api-core/src/handlers/firmware.rs (1)
317-336: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNeedless clone on a dead-write fallback path.
In the
existing.unwrap_or_else(...)fallback (Lines 321-327),vendorandmodel(via.clone()) are set, only to be unconditionally overwritten by the exact samepatch.vendor/patch.modeltwo lines later (Lines 329-330) — regardless of whetherexistingwasSomeorNone. The.clone()on Line 323 is therefore pure waste on the create path.Since
HostFirmwareConfignow derivesDefault(seecrates/api-model/src/firmware.rs), this can be simplified to drop the redundant fields and the clone:♻️ Proposed simplification
- let mut runtime_config = existing.unwrap_or_else(|| HostFirmwareConfig { - vendor: patch.vendor, - model: patch.model.clone(), - components: HashMap::new(), - explicit_start_needed: None, - ordering: Vec::new(), - }); + let mut runtime_config = existing.unwrap_or_default(); runtime_config.vendor = patch.vendor; runtime_config.model = patch.model;This assumes
bmc_vendor::BMCVendorimplementsDefault(flagged for verification on theHostFirmwareConfigstruct definition).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-core/src/handlers/firmware.rs` around lines 317 - 336, The fallback path in merge_host_firmware_config_patch is doing a needless clone and dead write: vendor and model are initialized from patch only to be overwritten immediately after. Simplify the HostFirmwareConfig construction in the existing.unwrap_or_else branch by relying on HostFirmwareConfig::default (or otherwise removing vendor/model from the fallback), then keep the later assignments to runtime_config.vendor and runtime_config.model as the single source of truth. Verify BMCVendor supports Default before using the default-based initialization.Source: Coding guidelines
crates/site-explorer/src/lib.rs (1)
2185-2185: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winConsider tracking latency for the new DB-backed snapshot fetch.
Every other DB-touching phase in
update_explored_endpoints(load, preallocate, interface_load, build_index, plan, delete_stale, probe, persist, remediate) is wrapped withmetrics.record_phase_latency(...). This newfirmware_config_snapshot().await?call is not, leaving a blind spot if the host_firmware_config query becomes slow.♻️ Suggested addition
+ let fw_snapshot_start = Instant::now(); let fw_config_snapshot = Arc::new(self.firmware_config_snapshot().await?); + metrics.record_phase_latency("update_explored_endpoints_fw_snapshot", fw_snapshot_start.elapsed());🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/site-explorer/src/lib.rs` at line 2185, The new firmware config snapshot fetch in update_explored_endpoints is not being timed like the other DB-backed phases, so wrap the firmware_config_snapshot() await in metrics.record_phase_latency just as is done for load, preallocate, interface_load, build_index, plan, delete_stale, probe, persist, and remediate. Use the existing metrics and update_explored_endpoints flow to ensure the fw_config_snapshot step is recorded consistently for latency tracking.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/api-core/src/handlers/firmware.rs`:
- Around line 317-336: The fallback path in merge_host_firmware_config_patch is
doing a needless clone and dead write: vendor and model are initialized from
patch only to be overwritten immediately after. Simplify the HostFirmwareConfig
construction in the existing.unwrap_or_else branch by relying on
HostFirmwareConfig::default (or otherwise removing vendor/model from the
fallback), then keep the later assignments to runtime_config.vendor and
runtime_config.model as the single source of truth. Verify BMCVendor supports
Default before using the default-based initialization.
In `@crates/site-explorer/src/lib.rs`:
- Line 2185: The new firmware config snapshot fetch in update_explored_endpoints
is not being timed like the other DB-backed phases, so wrap the
firmware_config_snapshot() await in metrics.record_phase_latency just as is done
for load, preallocate, interface_load, build_index, plan, delete_stale, probe,
persist, and remediate. Use the existing metrics and update_explored_endpoints
flow to ensure the fw_config_snapshot step is recorded consistently for latency
tracking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bd42e9bb-12cf-460f-8a79-8cdcc31e46ae
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
crates/api-core/src/api.rscrates/api-core/src/handlers/firmware.rscrates/api-core/src/handlers/site_explorer.rscrates/api-core/src/machine_update_manager/host_firmware.rscrates/api-core/src/tests/host_firmware_config.rscrates/api-db/src/host_firmware_config.rscrates/api-model/src/firmware.rscrates/firmware/src/config.rscrates/firmware/src/tests/config.rscrates/machine-controller/src/handler.rscrates/preingestion-manager/Cargo.tomlcrates/preingestion-manager/src/lib.rscrates/preingestion-manager/tests/integration/host_bmc_firmware.rscrates/site-explorer/src/lib.rsrest-api/docs/index.htmlrest-api/openapi/spec.yaml
✅ Files skipped from review due to trivial changes (2)
- rest-api/openapi/spec.yaml
- crates/preingestion-manager/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (9)
- crates/api-core/src/handlers/site_explorer.rs
- crates/api-core/src/api.rs
- crates/preingestion-manager/tests/integration/host_bmc_firmware.rs
- crates/firmware/src/tests/config.rs
- crates/firmware/src/config.rs
- crates/preingestion-manager/src/lib.rs
- crates/api-core/src/machine_update_manager/host_firmware.rs
- crates/api-core/src/tests/host_firmware_config.rs
- crates/machine-controller/src/handler.rs
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
Apply DB-backed host firmware config overrides on top of static and metadata.toml firmware catalogs across firmware listing, desired firmware snapshotting, site explorer, host reprovisioning, machine update handling, and preingestion. Runtime config now participates in version comparison and upgrade selection, while metadata remains the base catalog and DB values take precedence when they overlap.
1755a42 to
ac61712
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/api-core/src/handlers/firmware.rs (1)
1194-1268: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winConsolidate these merge variants into table-driven tests.
The optional-field merge cases exercise the same operation with different inputs/expectations. A
check_casestable would keep the newOption<bool>semantics easier to extend and audit.As per coding guidelines, Rust tests should prefer table-driven style using
carbide-test-supportscenarios or explicitcheck_cases/check_values.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-core/src/handlers/firmware.rs` around lines 1194 - 1268, The new merge tests for merge_host_firmware_config_patch duplicate the same optional-field behavior across separate cases, so consolidate them into a table-driven test. Refactor the three test functions around merge_host_firmware_config_preserves_optional_fields_when_omitted, merge_host_firmware_config_updates_optional_fields_when_present, and merge_host_firmware_config_keeps_omitted_explicit_start_absent_on_create into a single check_cases or check_values style test that enumerates inputs and expected Option<bool> outcomes. Keep the existing helpers test_firmware, test_patch, test_component, and test_entry, but drive them from a scenario table to match the Rust test style guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/api-core/src/handlers/firmware.rs`:
- Line 744: The config response in firmware handling is collapsing “unset” into
false by using unwrap_or(false), which hides whether there is a DB override or
not. Update the response shape in firmware.rs (and the associated config
model/handler path) so explicit_start_needed preserves None for stored config
responses, or add presence metadata and only default to false in effective
inventory views. Make sure any API/schema changes stay compatible with existing
clients and clearly distinguish unset from an explicit false.
- Around line 99-104: `list_host_firmware` still ignores the deserialized
request, so it bypasses the standard core API request logging path. Update the
`list_host_firmware` handler to log the request by calling `log_request_data`
with the incoming `rpc::ListHostFirmwareRequest` before continuing with
`effective_host_firmware_snapshot`, matching the pattern used by other core API
handlers and keeping the request argument available for safe logging.
---
Nitpick comments:
In `@crates/api-core/src/handlers/firmware.rs`:
- Around line 1194-1268: The new merge tests for
merge_host_firmware_config_patch duplicate the same optional-field behavior
across separate cases, so consolidate them into a table-driven test. Refactor
the three test functions around
merge_host_firmware_config_preserves_optional_fields_when_omitted,
merge_host_firmware_config_updates_optional_fields_when_present, and
merge_host_firmware_config_keeps_omitted_explicit_start_absent_on_create into a
single check_cases or check_values style test that enumerates inputs and
expected Option<bool> outcomes. Keep the existing helpers test_firmware,
test_patch, test_component, and test_entry, but drive them from a scenario table
to match the Rust test style guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b9100067-22a7-4801-9be1-e9893e96c4b0
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (16)
crates/api-core/src/api.rscrates/api-core/src/handlers/firmware.rscrates/api-core/src/handlers/site_explorer.rscrates/api-core/src/machine_update_manager/host_firmware.rscrates/api-core/src/tests/host_firmware_config.rscrates/api-db/src/host_firmware_config.rscrates/api-model/src/firmware.rscrates/firmware/src/config.rscrates/firmware/src/tests/config.rscrates/machine-controller/src/handler.rscrates/preingestion-manager/Cargo.tomlcrates/preingestion-manager/src/lib.rscrates/preingestion-manager/tests/integration/host_bmc_firmware.rscrates/site-explorer/src/lib.rsrest-api/docs/index.htmlrest-api/openapi/spec.yaml
💤 Files with no reviewable changes (1)
- rest-api/openapi/spec.yaml
🚧 Files skipped from review as they are similar to previous changes (13)
- crates/preingestion-manager/Cargo.toml
- crates/api-core/src/api.rs
- crates/api-core/src/handlers/site_explorer.rs
- crates/preingestion-manager/tests/integration/host_bmc_firmware.rs
- crates/site-explorer/src/lib.rs
- crates/machine-controller/src/handler.rs
- crates/api-model/src/firmware.rs
- crates/firmware/src/config.rs
- crates/api-db/src/host_firmware_config.rs
- crates/api-core/src/tests/host_firmware_config.rs
- crates/api-core/src/machine_update_manager/host_firmware.rs
- crates/firmware/src/tests/config.rs
- crates/preingestion-manager/src/lib.rs
Apply DB-backed host firmware config overrides on top of static and metadata.toml firmware catalogs across firmware listing, desired firmware snapshotting, site explorer, host reprovisioning, machine update handling, and preingestion.
Runtime config now participates in version comparison and upgrade selection, while metadata remains the base catalog and DB values take precedence when they overlap.
Type of Change
Testing