Run CPUFJ bursts at the root by aliceb-nv · Pull Request #1179 · NVIDIA/cuopt

aliceb-nv · 2026-05-05T09:01:04Z

Some primal heuristics, like CPUFJ, may benefit in some cases from operating on the problem with root cuts.
This PR adds support for this, by running bursts of a few iterations of CPUFJ per cut pass opportunistically, on threads that would otherwise remain idle; and also once after the root cut passes are completed.

In most cases, this provides no benefits, but some instances like tutaki find their first integer incumbent much faster. Additionally, awhea is now reliably feasibilized and solved to a gap of <5% on 10min runs.

Description

Issue

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

copy-pr-bot · 2026-05-05T09:01:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aliceb-nv · 2026-05-05T09:01:36Z

/ok to test 64ba333

coderabbitai · 2026-05-05T09:10:56Z

📝 Walkthrough

Walkthrough

Introduces CPUFJ (CPU-based feasibility-jump heuristic) task execution into the root-cut processing loop via new structured task types, refactored initialization and solver extensions supporting work-unit limits, a factory API for task creation, and OpenMP task-based orchestration within branch-and-bound with explicit lifecycle management via scope guards.

Changes

CPU Feasibility-Jump Root-Cut Integration

Layer / File(s)	Summary
Type Definitions `cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh`	New `cpu_fj_thread_t<i_t, f_t>` worker with lifecycle hooks (`on_start`, `run_worker`, `on_terminate`), solution-found signaling, configurable time/work-unit limits, and ownership of `fj_cpu_climber_t`. New `fj_cpu_task_t<i_t, f_t>` encapsulates task execution state with preemption flag and custom-deleter `unique_ptr`.
Solver Extensions `cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` (lines 1637–1816)	`cpufj_solve_loop` now accepts `work_unit_limit` parameter; loop breaks when work units exceed limit. `cpu_fj_thread_t::run_worker()` passes limit to solver.
Initialization Refactoring `cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` (lines 1313–1560)	New `finalize_fj_cpu_host_initialization(...)` initializes flip-move caches and problem features. New `init_fj_cpu_from_host_lp(...)` constructs host FJ state from LP problem, rebuilds reverse CSR, and calls finalize.
Task Lifecycle API `cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh`, `cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` (lines 1856–1898)	`make_fj_cpu_task_from_host_lp(...)` constructs tasks from host LP with improvement callback. `run_fj_cpu_task(...)` executes with time and work-unit limits. `stop_fj_cpu_task(...)` signals preemption. Custom deleter manages climber cleanup.
Root-Cut Loop Integration `cpp/src/branch_and_bound/branch_and_bound.cpp` (lines 2229–2556)	Introduces `cut_pass_action_t`/`cut_pass_result_t` for structured control flow. Root-cut CPUFJ task spawned as OpenMP task per cut pass, explicitly awaited, and stopped. Direct `break`/`continue` decisions replaced with structured CONTINUE/BREAK/RETURN actions propagating infeasibility, numerical, and termination outcomes.
Final Root-Cut Phase `cpp/src/branch_and_bound/branch_and_bound.cpp` (lines 2570–2587)	After main cut processing, conditionally builds and runs final CPUFJ root-cut task when cuts exist, using computed remaining time limit in deterministic mode.
Header Reorganization & Wiring `cpp/src/branch_and_bound/branch_and_bound.cpp` (lines 13, 29), `cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh` (lines 19–23), `cpp/src/mip_heuristics/local_search/local_search.cu` (line 128–132)	Added includes for `cpu_fj_thread.cuh` and `scope_guard.hpp`. Refactored `fj_cpu.cuh` to include new thread header and removed local `cpu_fj_thread_t` declaration. Updated improvement callback lambda to capture `this` for member access.
Template Instantiations `cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` (lines 1899–1954)	Explicit template instantiations for `float` and `double` of new task factory, execution, finalization, and helper utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Run CPUFJ bursts at the root' clearly describes the main objective of the pull request: enabling CPUFJ heuristic to run during root cut passes.
Description check	✅ Passed	The description explains the purpose and benefits of running CPUFJ bursts during root cuts, citing specific performance improvements on instances like tutaki and awhea.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (1)

1768-1775: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce work_unit_limit on the actual consumed work.

work_units_elapsed is only accrued and checked every 100 iterations, and the last partial batch is never flushed before exit. For the short root-cut bursts this PR adds, that means a small work_unit_limit can be overshot by a wide margin while still underreporting the final work consumed.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1768 - 1775,
The code only updates fj_cpu.work_units_elapsed (via
fj_cpu.memory_aggregator.collect() and applying fj_cpu.work_unit_bias) every 100
iterations, causing the final partial batch to be unaccounted and allowing
work_unit_limit to be overshot; modify the loop so that you flush and apply the
remaining memory statistics on every iteration boundary that may exit: call auto
[loads, stores] = fj_cpu.memory_aggregator.collect(), compute biased_work =
(loads + stores) * fj_cpu.work_unit_bias / 1e10, add it to
fj_cpu.work_units_elapsed, call fj_cpu.producer_sync->notify_progress() if
non-null, and then check fj_cpu.work_units_elapsed >= work_unit_limit to
break—either by moving this logic out of the 100-iteration guard into the common
path or by ensuring you perform one final collect/update/notify/check right
before breaking/returning so the true consumed work is enforced.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 2231-2237: The callback root_cut_cpufj_improvement_callback is
decoding the host-LP assignment against the live original_lp_ (via
uncrush_primal_solution) which can be concurrently mutated by
add_cuts/remove_cuts; capture and use the exact LP snapshot used to build the
CPUFJ task (or change the task to emit user-space assignments directly) so
uncrush_primal_solution runs against an immutable snapshot, and add
synchronization (or assert/guideline comment) around shared state mutation to
flag the missing concurrency protection for original_problem_/original_lp_
before calling set_new_solution().
- Around line 2254-2256: These return paths call set_solution_at_root(...) and
return immediately, which bypasses the existing async clique cleanup; insert a
call to finish_clique_thread() immediately before each early return so the async
clique task is stopped and joined. Specifically, add finish_clique_thread() just
prior to the set_solution_at_root(solution, cut_info); return
mip_status_t::OPTIMAL; sequence shown, and make the same change at the other
analogous exit that calls set_solution_at_root and returns (the second location
referenced in the comment).

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1881-1886: The task restart fails because run_fj_cpu_task only
clears fj_cpu.halted but not fj_cpu.preemption_flag, so after stop_fj_cpu_task
sets preemption_flag = true subsequent runs immediately exit; update
run_fj_cpu_task (and the duplicate block around the other overload at the
1890–1896 region) to reset fj_cpu.preemption_flag to false before calling
cpufj_solve_loop (i.e., clear the atomic preemption flag on the fj_cpu within
the fj_cpu_task_t<i_t,f_t> instance so the while (!fj_cpu.halted &&
!fj_cpu.preemption_flag.load()) loop can run again).

---

Outside diff comments:
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1768-1775: The code only updates fj_cpu.work_units_elapsed (via
fj_cpu.memory_aggregator.collect() and applying fj_cpu.work_unit_bias) every 100
iterations, causing the final partial batch to be unaccounted and allowing
work_unit_limit to be overshot; modify the loop so that you flush and apply the
remaining memory statistics on every iteration boundary that may exit: call auto
[loads, stores] = fj_cpu.memory_aggregator.collect(), compute biased_work =
(loads + stores) * fj_cpu.work_unit_bias / 1e10, add it to
fj_cpu.work_units_elapsed, call fj_cpu.producer_sync->notify_progress() if
non-null, and then check fj_cpu.work_units_elapsed >= work_unit_limit to
break—either by moving this logic out of the 100-iteration guard into the common
path or by ensuring you perform one final collect/update/notify/check right
before breaking/returning so the true consumed work is enforced.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 86405377-b8f0-4c1e-888b-d909dd15851b

📥 Commits

Reviewing files that changed from the base of the PR and between a4de253 and 64ba333.

📒 Files selected for processing (5)

cpp/src/branch_and_bound/branch_and_bound.cpp
cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
cpp/src/mip_heuristics/local_search/local_search.cu

coderabbitai · 2026-05-05T09:10:59Z

+  auto root_cut_cpufj_improvement_callback =
+    [this](f_t obj, const std::vector<f_t>& assignment, double) {
+      std::vector<f_t> user_assignment;
+      uncrush_primal_solution(original_problem_, original_lp_, assignment, user_assignment);
+      settings_.log.debug("Root cut CPUFJ found solution with objective %.16e\n", obj);
+      set_new_solution(user_assignment);
+    };


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Uncrush against the task’s LP snapshot, not the live root LP.

assignment comes from the host-LP snapshot used to build the CPUFJ task, but this callback decodes it against original_lp_, which the concurrent cut pass is still mutating. That can remap cut/slack columns incorrectly or race with add_cuts/remove_cuts, so a valid CPUFJ incumbent can be corrupted before set_new_solution() sees it. Please either capture the exact LP snapshot used for task creation or make the CPUFJ callback emit user-space assignments directly.

As per coding guidelines, flag missing synchronization for shared state in concurrent code.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 2231 - 2237, The callback root_cut_cpufj_improvement_callback is decoding the host-LP assignment against the live original_lp_ (via uncrush_primal_solution) which can be concurrently mutated by add_cuts/remove_cuts; capture and use the exact LP snapshot used to build the CPUFJ task (or change the task to emit user-space assignments directly) so uncrush_primal_solution runs against an immutable snapshot, and add synchronization (or assert/guideline comment) around shared state mutation to flag the missing concurrency protection for original_problem_/original_lp_ before calling set_new_solution().

coderabbitai · 2026-05-05T09:10:59Z

    if (num_fractional == 0) {
      set_solution_at_root(solution, cut_info);
      return mip_status_t::OPTIMAL;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the existing clique-thread cleanup on these new root-cut exits.

These return paths bypass finish_clique_thread(), unlike the earlier root-level exits in this function. That leaves the async clique task running past solve() on these paths.

Proposed cleanup

for (i_t cut_pass = 0; cut_pass < settings_.max_cut_passes; cut_pass++) { if (num_fractional == 0) { set_solution_at_root(solution, cut_info); + finish_clique_thread(); return mip_status_t::OPTIMAL; } @@ - if (cut_pass_result.action == cut_pass_action_t::RETURN) { return cut_pass_result.status; } + if (cut_pass_result.action == cut_pass_action_t::RETURN) { + finish_clique_thread(); + return cut_pass_result.status; + }

Also applies to: 2540-2541

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 2254 - 2256, These return paths call set_solution_at_root(...) and return immediately, which bypasses the existing async clique cleanup; insert a call to finish_clique_thread() immediately before each early return so the async clique task is stopped and joined. Specifically, add finish_clique_thread() just prior to the set_solution_at_root(solution, cut_info); return mip_status_t::OPTIMAL; sequence shown, and make the same change at the other analogous exit that calls set_solution_at_root and returns (the second location referenced in the comment).

coderabbitai · 2026-05-05T09:10:59Z

+void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit)
+{
+  cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber");
+  auto& fj_cpu  = *task.fj_cpu;
+  fj_cpu.halted = false;
+  cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep fj_cpu_task_t restartable across bursts.

stop_fj_cpu_task() sets preemption_flag = true, but run_fj_cpu_task() only clears halted. If the same task is reused for multiple root-cut bursts, every run after the first stop will hit while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) and return immediately.

Suggested fix

template <typename i_t, typename f_t> void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit) { cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber"); auto& fj_cpu = *task.fj_cpu; + task.preemption_flag.store(false); fj_cpu.halted = false; cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit); } template <typename i_t, typename f_t> void stop_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task) { if (task.fj_cpu) { auto& fj_cpu = *task.fj_cpu; - fj_cpu.preemption_flag = true; fj_cpu.halted = true; } }

Also applies to: 1890-1896

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1881 - 1886, The task restart fails because run_fj_cpu_task only clears fj_cpu.halted but not fj_cpu.preemption_flag, so after stop_fj_cpu_task sets preemption_flag = true subsequent runs immediately exit; update run_fj_cpu_task (and the duplicate block around the other overload at the 1890–1896 region) to reset fj_cpu.preemption_flag to false before calling cpufj_solve_loop (i.e., clear the atomic preemption flag on the fj_cpu within the fj_cpu_task_t<i_t,f_t> instance so the while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) loop can run again).

akifcorduk · 2026-05-05T10:37:44Z

+                                                      "[RootCut CPUFJ] ");
+    settings_.log.debug("Root cut CPUFJ final problem build time: %.6f seconds\n",
+                        toc(root_cut_cpufj_build_start_time));
+    f_t fj_time_limit = settings_.deterministic


Why does undetermistic path get 1s?

akifcorduk · 2026-05-05T10:39:29Z

 namespace cuopt::linear_programming::detail {

+template <typename f_t>
+static f_t clamp_value(f_t value, f_t lower, f_t upper)


I think there is std::clamp available

akifcorduk · 2026-05-05T10:42:07Z

+}
+
+template <typename i_t, typename f_t>
+static void rebuild_reverse_matrix(i_t n_variables,


We have a transpose function available in sparse_matrix.cpp even though the parameters are a bit different, we should reuse it.

akifcorduk · 2026-05-05T10:44:28Z

+  cuopt::scope_guard root_cut_cpufj_guard([&]() { stop_root_cut_cpufj(); });
+
+  enum class cut_pass_action_t { CONTINUE, BREAK, RETURN };
+  struct cut_pass_result_t {


I am not sure if I understand the need/logic for this. Can't we just launch a new cpufj task after each cut pass and wait until the next pass finishes and launch another cpufj task while staying in cut pass loop ?

aliceb-nv added 9 commits April 27, 2026 05:28

run a single burst

9698d00

run cpufj in parallel during cuts

80ee80a

Merge branch 'main' into cpufj_cut_bursts

bb71745

adjust logs, fix time limit

9e464bb

Merge branch 'main' into cpufj_cut_bursts

0053a27

cleanup

e2300a4

Merge branch 'main' into cpufj_cut_bursts

d4f1975

some cleanup

45c8154

AI review comments

64ba333

aliceb-nv added this to the 26.06 milestone May 5, 2026

aliceb-nv added the non-breaking Introduces a non-breaking change label May 5, 2026

aliceb-nv requested a review from a team as a code owner May 5, 2026 09:01

aliceb-nv added the improvement Improves an existing functionality label May 5, 2026

aliceb-nv requested review from akifcorduk and hlinsen May 5, 2026 09:01

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

akifcorduk reviewed May 5, 2026

View reviewed changes

Conversation

aliceb-nv commented May 5, 2026

Description

Issue

Checklist

Uh oh!

copy-pr-bot Bot commented May 5, 2026

Uh oh!

aliceb-nv commented May 5, 2026

Uh oh!

coderabbitai Bot commented May 5, 2026

Walkthrough

Changes

Estimated code review effort

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

akifcorduk May 5, 2026

Choose a reason for hiding this comment

Uh oh!

akifcorduk May 5, 2026

Choose a reason for hiding this comment

Uh oh!

akifcorduk May 5, 2026

Choose a reason for hiding this comment

Uh oh!

akifcorduk May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants