Skip to content

feat: Dpf: Support for multiple DpuDeployments to handle BF3, BF4 and other deployment strategy #3084

Open
abvarshney-nv wants to merge 3 commits into
NVIDIA:mainfrom
abvarshney-nv:dpf_multi_deployments
Open

feat: Dpf: Support for multiple DpuDeployments to handle BF3, BF4 and other deployment strategy #3084
abvarshney-nv wants to merge 3 commits into
NVIDIA:mainfrom
abvarshney-nv:dpf_multi_deployments

Conversation

@abvarshney-nv

@abvarshney-nv abvarshney-nv commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Adds support for multiple DPUDeployments in DPF, enabling BF4 DPUs to be provisionedalongside BF3 using a separate deployment configuration. BF3 and BF4 DPUs are now automatically routed to their respective deployments based on hardware part number.

Attempting to register a BF4 DPU when no BF4 deployment is configured fails with a clear error.

For the completeness, a BF4Generic deployment is added, but Dpuflavor is not implemented. This will be done in another PR when the information is available.

This PR also restructures the BF3 config into deployments section. The BF3 section is default, thus will be configured even if not present in the toml.

The updated tool looks like:

  [dpf]
  enabled = false

  # Optional: override the pull secret for all mandatory services (except dts and doca_hbn)
  docker_image_pull_secret = "dpf-pull-secret"
  
  # Optional: route DPU containerd traffic through a proxy
  [dpf.proxy]
  https_proxy = "http://proxy.example.com:3128"
  no_proxy = "localhost,127.0.0.1,.cluster.local"

  # BF3 deployment — all fields have defaults; omit the block entirely or override all values (except BFB url has a  default).
  [dpf.deployments.bf3]
  bfb_url        = "https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-
  3.2.2-125_26.02_ubuntu-24.04_64k_prod.bfb"
  flavor_name    = "carbide-dpu-flavor"
  deployment_name = "nico-deployment-v2"
  node_label_key = "carbide.nvidia.com/controlled.node.v2"

  # BF4 deployment — omit the block entirely to disable BF4 support.
  # All fields are required when the block is present (no defaults, except BFB url).
  [dpf.deployments.bf4_generic]
  bfb_url         = "https://..."   # required
  flavor_name     = "..."           # required
  deployment_name = "..."           # required
  node_label_key  = "..."           # required

  # Mandatory Helm services — defaults are built in; only override when using custom 
  chart repos or versions.
  [dpf.services.dpu_agent]
  name                   = "dpu-agent"
  helm_repo_url          = "..."
  helm_chart             = "dpu-agent"
  helm_version           = "x.y.z"
  docker_repo_url        = "..."
  docker_image_tag       = "x.y.z"
  docker_image_pull_secret = "dpf-pull-secret"

  # Same shape for: dpf.services.dts, dpf.services.doca_hbn,
  #                 dpf.services.dhcp_server, dpf.services.fmds, dpf.services.otel

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

abvarshney-nv and others added 2 commits July 1, 2026 16:45
Adds config structure and initialization support for multiple named
DPUDeployments. Each deployment (e.g. bf4_generic) has its own BFB,
DPUFlavor, DPUDeployment CR, node selector label, and mandatory services.

- Add DpfDeploymentConfig + DpfDeploymentsConfig to DpfConfig; the
  existing top-level fields remain as the BF3 (default) deployment
- Add node_labels_for_deployment(name) to ResourceLabeler trait so
  build_deployment looks up per-deployment node selector labels
- Extend CarbideDPFLabeler with a deployment_node_labels registry
  (BTreeMap<deployment_name, labels>) and with_deployment_node_labels()
  builder method
- setup.rs builds the labeler with labels for all configured deployments
  and calls create_initialization_objects for bf4_generic if present

State machine changes (setting deployment_name on DpuNodeInfo/
DpuDeviceInfo at registration time) are a follow-up.

Closes NVIDIA#3056

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@abvarshney-nv abvarshney-nv requested a review from a team as a code owner July 2, 2026 13:07
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 66d5881e-f4fa-4924-8dd2-6e26f5f259a9

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa1b6d and c88a5f1.

📒 Files selected for processing (10)
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/tests/dpf/duplicate_events.rs
  • crates/api-core/src/tests/dpf/happy_path.rs
  • crates/api-core/src/tests/dpf/reprovisioning.rs
  • crates/api-core/src/tests/dpf/stale_labels.rs
  • crates/api-core/src/tests/dpf/waiting_for_ready.rs
  • crates/api-core/src/tests/machine_admin_force_delete.rs
  • crates/machine-controller/src/dpf.rs
  • crates/machine-controller/src/handler/dpf.rs
🚧 Files skipped from review as they are similar to previous changes (10)
  • crates/api-core/src/tests/dpf/happy_path.rs
  • crates/api-core/src/tests/dpf/waiting_for_ready.rs
  • crates/machine-controller/src/handler/dpf.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/tests/machine_admin_force_delete.rs
  • crates/api-core/src/tests/dpf/duplicate_events.rs
  • crates/api-core/src/tests/dpf/stale_labels.rs
  • crates/api-core/src/tests/dpf/reprovisioning.rs
  • crates/machine-controller/src/dpf.rs
  • crates/api-core/src/cfg/file.rs

Summary by CodeRabbit

  • New Features

    • Added support for multiple named deployments in DPF configuration, including deployment-specific settings.
    • Introduced deployment-aware handling for DPU devices and nodes, with support for BF3 and BF4 generic types.
    • Expanded startup and provisioning to select labels and resources based on deployment type.
  • Bug Fixes

    • Improved validation to catch duplicate deployment identifiers and prevent conflicting configuration.
    • Updated node label checks to account for deployment-specific label sets.

Walkthrough

This PR adds deployment-type-aware DPF configuration, SDK initialization, and label verification. DpuDeploymentType is threaded through DPF types, machine-controller flows, API-core startup, and tests. BF3 is the default deployment; BF4Generic is optionally initialized.

Changes

Deployment-type-aware DPF provisioning and labeling

Layer / File(s) Summary
Deployment type contracts
crates/dpf/src/types.rs, crates/api-core/src/cfg/file.rs, crates/api-core/src/dpf_services.rs
Adds DpuDeploymentType, adds deployment_type to DPF resource/config types, and reshapes DPF config to use named deployments with validation and mandatory service construction.
SDK runtime and provisioning flow
crates/dpf/src/lib.rs, crates/dpf/src/flavor.rs, crates/dpf/src/sdk.rs, crates/dpf/src/bin/api_harness.rs
Makes label lookup, flavor creation, deployment construction, and the harness provisioning path deployment-type-aware.
SDK and API-core tests
crates/dpf/src/sdk.rs (tests), crates/dpf/src/test/*, crates/api-core/src/tests/dpf/*, crates/api-core/src/tests/machine_admin_force_delete.rs
Updates fixtures and expectations to supply deployment_type and the new verification signature.
Machine-controller deployment typing
crates/machine-controller/src/dpf.rs, crates/machine-controller/src/handler/dpf.rs
Derives deployment type from hardware, threads it through node registration, and verifies labels against deployment-specific label maps.
API-core startup
crates/api-core/src/setup.rs
Extracts DPF startup into a helper that builds mandatory services, label maps, and BF3/BF4Generic initialization objects.

Estimated code review effort: 4 (Complex) | ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: support for multiple DPU deployments, including BF3 and BF4 routing.
Description check ✅ Passed The description matches the implemented changes, covering multiple deployments, BF3 defaults, BF4 routing, and updated tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
crates/machine-controller/src/handler/dpf.rs (1)

205-232: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use one host-level deployment type consistently.

Line 232 registers the host’s DpuNodeInfo with the primary DPU’s deployment type, but Lines 494-496 verify that same host-level node using whichever dpu_snapshot is currently being reconciled. If a host has mixed or transiently inconsistent DPU classifications, a correctly labeled DPUNode can be marked stale. Resolve and validate a single deployment type for the host, then reuse it for both registration and label verification.

As per path instructions, controller logic should be reviewed for reconciliation correctness, idempotency, timeout/cancellation behavior, state-machine transitions, and safe recovery from partial failures.

Also applies to: 494-496

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/machine-controller/src/handler/dpf.rs` around lines 205 - 232, The
host-level DpuNodeInfo registration and label verification are using different
DPU snapshots, which can cause a correctly labeled node to be treated as stale.
In dpf.rs, resolve a single host deployment type once from the primary DPU (or a
validated host-wide source) in the reconciliation flow around DpuNodeInfo
creation, then reuse that same value both when registering the node and when
verifying labels in the later check path near the stale-label logic. Make the
deployment type selection explicit and consistent across the relevant dpf_sdk
and state reconciliation code paths.

Source: Path instructions

crates/dpf/src/sdk.rs (2)

2282-2303: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

TestLabeler wasn't updated for the new node_labels_for_deployment_type hook — this breaks two tests.

TestLabeler overrides device_labels, node_labels, and node_context_labels, but not the new node_labels_for_deployment_type. It therefore falls back to the trait default, which returns Ok(BTreeMap::new()).

This has two concrete downstream effects:

  1. test_dpu_node_labels (Line 2371-2395): register_dpu_node now sources labels via node_labels_for_deployment_type (empty for TestLabeler) + node_context_labels (also empty), so metadata.labels becomes None. The test's node.metadata.labels.as_ref().unwrap() will panic.
  2. verify_node_labels_against_seeded_node (Line 3155-3292): required_labels is computed via the same empty default, so required_labels.iter().all(...) is vacuously true for every row — the Yields(false) cases ("stale labels", "no labels field at all", "wrong value") no longer exercise the real comparison logic; they'd pass for the wrong reason (or the earlier scenario collides and fails, depending on evaluation order).

Both are fixed by giving TestLabeler a matching override.

🐛 Proposed fix
     impl ResourceLabeler for TestLabeler {
         fn device_labels(&self, info: &DpuDeviceInfo) -> BTreeMap<String, String> {
             BTreeMap::from([
                 ("test/device".to_string(), "true".to_string()),
                 ("test/host-bmc-ip".to_string(), info.host_bmc_ip.to_string()),
                 (
                     "test/dpu-machine-id".to_string(),
                     info.dpu_machine_id.clone(),
                 ),
             ])
         }

         fn node_labels(&self) -> BTreeMap<String, String> {
             BTreeMap::from([("test/node".to_string(), "true".to_string())])
         }

+        fn node_labels_for_deployment_type(
+            &self,
+            _deployment_type: DpuDeploymentType,
+        ) -> Result<BTreeMap<String, String>, DpfError> {
+            Ok(self.node_labels())
+        }
+
         fn node_context_labels(&self, _info: &DpuNodeInfo) -> BTreeMap<String, String> {
             BTreeMap::new()
         }
     }

Also applies to: 2371-2395, 3155-3292

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/dpf/src/sdk.rs` around lines 2282 - 2303, Update TestLabeler to
override node_labels_for_deployment_type with the same test node label it
previously provided via node_labels, so register_dpu_node and the label
verification tests keep seeing non-empty required labels. Keep the existing
device_labels and node_context_labels behavior unchanged, and make the new
override return the deployment-type-independent node label expected by
test_dpu_node_labels and verify_node_labels_against_seeded_node.

1265-1404: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Remove deployment-specific node labels during deletion In crates/dpf/src/sdk.rs, node_label_removal_patch() still clears only node_labels(), while registration now applies node_labels_for_deployment_type(info.deployment_type). For labelers like CarbideDPFLabeler, BF4 nodes can keep their deployment selector label after delete_dpu_node() / force_delete_host(). Thread deployment_type into the delete path, or derive it before patching, so cleanup matches creation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/dpf/src/sdk.rs` around lines 1265 - 1404, The delete path in
DpfSdk::delete_dpu_node is only removing the generic node labels, but
register_dpu_node now applies deployment-specific labels via
node_labels_for_deployment_type(info.deployment_type). Update the cleanup flow
so it knows the DpuDeploymentType before building the patch, and use that when
calling node_label_removal_patch (or equivalent) so labels like the BF4 selector
are removed consistently. Make sure any higher-level delete helper such as
force_delete_host passes the deployment type through to this path.
🧹 Nitpick comments (2)
crates/machine-controller/src/dpf.rs (1)

177-192: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Seed the BF3 labels in new() to keep the labeler hard to misuse.

CarbideDPFLabeler::new(...) now constructs a labeler where node_labels_for_deployment_type(DpuDeploymentType::Bf3) fails unless every caller remembers with_deployment_type_labels(...). Preserve the single-deployment default in the constructor and let the builder override/add configured deployments.

Suggested refactor
     pub fn new(node_label_key: String) -> Self {
+        let bf3_labels = BTreeMap::from([
+            (node_label_key.clone(), "true".to_string()),
+            (
+                "feature.node.kubernetes.io/dpu-enabled".to_string(),
+                "true".to_string(),
+            ),
+        ]);
+
         Self {
             node_label_key,
-            deployment_type_labels: BTreeMap::new(),
+            deployment_type_labels: BTreeMap::from([(DpuDeploymentType::Bf3, bf3_labels)]),
         }
     }

As per coding guidelines, prefer designs that are hard to misuse.

Also applies to: 225-237

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/machine-controller/src/dpf.rs` around lines 177 - 192,
`CarbideDPFLabeler::new` currently leaves `deployment_type_labels` empty, which
makes `node_labels_for_deployment_type(DpuDeploymentType::Bf3)` fail unless
every caller remembers to call `with_deployment_type_labels`. Seed the BF3
default labels in the constructor so the labeler works out of the box, and keep
`with_deployment_type_labels` as an override path that can replace or extend the
configured deployment labels.

Source: Coding guidelines

crates/api-core/src/cfg/file.rs (1)

1166-1189: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Consider validating deployment-specific config to prevent silent misrouting.

DpfDeploymentConfig has no validation ensuring node_label_key (or deployment_name/flavor_name) is distinct from the top-level DpfConfig values or non-empty. Per the referenced build_deployment_type_labels in setup.rs, a bf4_generic.node_label_key equal to the top-level dpf.node_label_key would produce identical BF3/BF4Generic label sets, silently defeating deployment-type routing rather than failing fast at startup.

Consider adding a small validation step (e.g., in config loading or initialize_dpf_sdk) that rejects overlapping label keys/deployment names across configured deployments.

🛡️ Example validation sketch
impl DpfConfig {
    fn validate_deployments(&self) -> Result<(), String> {
        if let Some(bf4) = &self.deployments.bf4_generic {
            if bf4.node_label_key == self.node_label_key {
                return Err(format!(
                    "dpf.deployments.bf4_generic.node_label_key must differ from dpf.node_label_key ({})",
                    self.node_label_key
                ));
            }
        }
        Ok(())
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/cfg/file.rs` around lines 1166 - 1189, Add validation for
deployment-specific DPF config so overlapping values fail fast instead of
silently reusing routing labels. In the DpfConfig/DpfDeploymentsConfig path,
validate DpfDeploymentConfig fields like node_label_key, deployment_name, and
flavor_name against the top-level DpfConfig equivalents and reject empty or
duplicate values. Wire this into the config loading or initialize_dpf_sdk flow,
and use the existing build_deployment_type_labels/setup.rs routing logic as the
place to confirm the BF3/BF4Generic label sets cannot collide.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/machine-controller/src/dpf.rs`:
- Around line 89-90: The DPU deployment resolver currently treats missing or
unknown hardware as BF4 by falling through to DpuDeploymentType::Bf4Generic in
deployment_type_for_dpu, which can misclassify unreported BF3/unknown devices.
Update the resolver around the hardware_info/dpu_info/part_number handling to
return an error or unknown state when the part number is absent or unrecognized,
and only select Bf4Generic after an explicit BF4 check. Make the logic in
deployment_type_for_dpu and its callers fail safely instead of defaulting
missing hardware to BF4.

---

Outside diff comments:
In `@crates/dpf/src/sdk.rs`:
- Around line 2282-2303: Update TestLabeler to override
node_labels_for_deployment_type with the same test node label it previously
provided via node_labels, so register_dpu_node and the label verification tests
keep seeing non-empty required labels. Keep the existing device_labels and
node_context_labels behavior unchanged, and make the new override return the
deployment-type-independent node label expected by test_dpu_node_labels and
verify_node_labels_against_seeded_node.
- Around line 1265-1404: The delete path in DpfSdk::delete_dpu_node is only
removing the generic node labels, but register_dpu_node now applies
deployment-specific labels via
node_labels_for_deployment_type(info.deployment_type). Update the cleanup flow
so it knows the DpuDeploymentType before building the patch, and use that when
calling node_label_removal_patch (or equivalent) so labels like the BF4 selector
are removed consistently. Make sure any higher-level delete helper such as
force_delete_host passes the deployment type through to this path.

In `@crates/machine-controller/src/handler/dpf.rs`:
- Around line 205-232: The host-level DpuNodeInfo registration and label
verification are using different DPU snapshots, which can cause a correctly
labeled node to be treated as stale. In dpf.rs, resolve a single host deployment
type once from the primary DPU (or a validated host-wide source) in the
reconciliation flow around DpuNodeInfo creation, then reuse that same value both
when registering the node and when verifying labels in the later check path near
the stale-label logic. Make the deployment type selection explicit and
consistent across the relevant dpf_sdk and state reconciliation code paths.

---

Nitpick comments:
In `@crates/api-core/src/cfg/file.rs`:
- Around line 1166-1189: Add validation for deployment-specific DPF config so
overlapping values fail fast instead of silently reusing routing labels. In the
DpfConfig/DpfDeploymentsConfig path, validate DpfDeploymentConfig fields like
node_label_key, deployment_name, and flavor_name against the top-level DpfConfig
equivalents and reject empty or duplicate values. Wire this into the config
loading or initialize_dpf_sdk flow, and use the existing
build_deployment_type_labels/setup.rs routing logic as the place to confirm the
BF3/BF4Generic label sets cannot collide.

In `@crates/machine-controller/src/dpf.rs`:
- Around line 177-192: `CarbideDPFLabeler::new` currently leaves
`deployment_type_labels` empty, which makes
`node_labels_for_deployment_type(DpuDeploymentType::Bf3)` fail unless every
caller remembers to call `with_deployment_type_labels`. Seed the BF3 default
labels in the constructor so the labeler works out of the box, and keep
`with_deployment_type_labels` as an override path that can replace or extend the
configured deployment labels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c0230885-2a65-46e7-87af-5c77f06166aa

📥 Commits

Reviewing files that changed from the base of the PR and between 0d5452b and f5d17e7.

📒 Files selected for processing (18)
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/dpf_services.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/tests/dpf/duplicate_events.rs
  • crates/api-core/src/tests/dpf/happy_path.rs
  • crates/api-core/src/tests/dpf/reprovisioning.rs
  • crates/api-core/src/tests/dpf/stale_labels.rs
  • crates/api-core/src/tests/dpf/waiting_for_ready.rs
  • crates/api-core/src/tests/machine_admin_force_delete.rs
  • crates/dpf/src/bin/api_harness.rs
  • crates/dpf/src/flavor.rs
  • crates/dpf/src/lib.rs
  • crates/dpf/src/sdk.rs
  • crates/dpf/src/test/sdk_device_registration.rs
  • crates/dpf/src/test/sdk_reboot_annotation.rs
  • crates/dpf/src/types.rs
  • crates/machine-controller/src/dpf.rs
  • crates/machine-controller/src/handler/dpf.rs

Comment thread crates/machine-controller/src/dpf.rs Outdated
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct DpfDeploymentConfig {
/// URL to the BlueField firmware bundle (BFB) for DPU provisioning.
pub bfb_url: String,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BF4 uses a completely separate BFB?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, for testing or any reason. I just want to make it configurable. but yeah, let me set a sane default value to it.

Comment thread crates/api-core/src/cfg/file.rs
@abvarshney-nv abvarshney-nv linked an issue Jul 2, 2026 that may be closed by this pull request
2 tasks
@abvarshney-nv abvarshney-nv marked this pull request as draft July 2, 2026 13:41
@copy-pr-bot

copy-pr-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment thread crates/dpf/src/flavor.rs
// This should be used to create a separate flavor for BF3, BF4Generic, BF4Astra
_deployment_type: DpuDeploymentType,
) -> Result<DPUFlavor, crate::error::DpfError> {
default_flavor(namespace, proxy)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once the BF4 is implemented, I'm not sure "default flavor" should imply BF3. I don't think there would be a reasonable default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not implemented as I have not received a working DPUflavor from DPF team yet.

Comment thread crates/dpf/src/types.rs
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum DpuDeploymentType {
Bf3,
Bf4Generic,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there be multiple BF4 types?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if so, wouldn't there be multiple BF3 types (or at least a rename of the Bf3 to Bf3Generic)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, BF4Generic and BF4Astra afaik.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will category or configuration like BF4NicMode or BF4DPUMode fall under Bf4Generic?

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 272 5 27 89 7 144
machine-validation-runner 769 25 209 278 36 221
machine_validation 769 25 209 278 36 221
machine_validation-aarch64 769 25 209 278 36 221
nvmetal-carbide 769 25 209 278 36 221
TOTAL 3354 105 863 1207 151 1028

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@abvarshney-nv abvarshney-nv marked this pull request as ready for review July 2, 2026 16:27

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/api-core/src/cfg/file.rs (1)

1152-1167: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Allow BF3 partial overrides to inherit defaults

[dpf.deployments.bf3] is documented as allowing per-field overrides, but once the table is present it still requires all four keys. That breaks the bfb_url-only migration path. Make BF3 merge supplied values onto DpfDeploymentConfig::default(), and add a deserialization test for a partial BF3 block.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-core/src/cfg/file.rs` around lines 1152 - 1167, The BF3 deployment
config currently requires all fields whenever the [dpf.deployments.bf3] table is
present, so partial overrides cannot inherit the default BF3 values. Update
DpfDeploymentConfig deserialization so BF3 values merge onto
DpfDeploymentConfig::default() instead of treating every field as mandatory,
while keeping the existing behavior for other deployment blocks like
bf4_generic. Add a deserialization test around
DpfDeploymentsConfig/DpfDeploymentConfig that covers a partial bf3 block (for
example only bfb_url) and verifies the missing fields come from the default
config.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-core/src/setup.rs`:
- Around line 690-693: The BF4Generic initialization path currently reuses BF3
identifiers, which can create overlapping resources or selectors. In the setup
flow around carbide_config.dpf.deployments.bf4_generic and the
make_init_config/create_initialization_objects call, add a preflight validation
that checks deployment_name, flavor_name, and node_label_key against the BF3
configuration and fails with a clear config error if any values overlap. Ensure
the guard runs before any resource creation so the bf4_generic path never
proceeds with conflicting names or label keys.

---

Outside diff comments:
In `@crates/api-core/src/cfg/file.rs`:
- Around line 1152-1167: The BF3 deployment config currently requires all fields
whenever the [dpf.deployments.bf3] table is present, so partial overrides cannot
inherit the default BF3 values. Update DpfDeploymentConfig deserialization so
BF3 values merge onto DpfDeploymentConfig::default() instead of treating every
field as mandatory, while keeping the existing behavior for other deployment
blocks like bf4_generic. Add a deserialization test around
DpfDeploymentsConfig/DpfDeploymentConfig that covers a partial bf3 block (for
example only bfb_url) and verifies the missing fields come from the default
config.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c45efd5c-da95-40dd-8348-59f621271866

📥 Commits

Reviewing files that changed from the base of the PR and between f5d17e7 and 3fa1b6d.

📒 Files selected for processing (10)
  • crates/api-core/src/cfg/file.rs
  • crates/api-core/src/setup.rs
  • crates/api-core/src/tests/dpf/duplicate_events.rs
  • crates/api-core/src/tests/dpf/happy_path.rs
  • crates/api-core/src/tests/dpf/reprovisioning.rs
  • crates/api-core/src/tests/dpf/stale_labels.rs
  • crates/api-core/src/tests/dpf/waiting_for_ready.rs
  • crates/api-core/src/tests/machine_admin_force_delete.rs
  • crates/machine-controller/src/dpf.rs
  • crates/machine-controller/src/handler/dpf.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/machine-controller/src/handler/dpf.rs
  • crates/api-core/src/tests/machine_admin_force_delete.rs
  • crates/api-core/src/tests/dpf/reprovisioning.rs
  • crates/api-core/src/tests/dpf/duplicate_events.rs

Comment thread crates/api-core/src/setup.rs
@abvarshney-nv abvarshney-nv force-pushed the dpf_multi_deployments branch from 3fa1b6d to c88a5f1 Compare July 2, 2026 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: DPF: support for multiple DPUDeployment

3 participants