Run CPUFJ bursts at the root #1179

Open
aliceb-nv wants to merge 9 commits into NVIDIA:main from aliceb-nv:cpufj_cut_bursts

Conversation

@aliceb-nv
Contributor

Some primal heuristics, such as CPUFJ, can benefit from operating on the problem with root cuts applied.
This PR adds support for that by opportunistically running short bursts of CPUFJ (a few iterations each) per cut pass, on threads that would otherwise remain idle, and once more after the root cut passes are completed.

In most cases this provides no benefit, but some instances, such as tutaki, find their first integer incumbent much faster. Additionally, awhea is now reliably feasibilized and solved to a gap of <5% on 10-minute runs.

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@aliceb-nv aliceb-nv added this to the 26.06 milestone May 5, 2026
@aliceb-nv aliceb-nv added the non-breaking Introduces a non-breaking change label May 5, 2026
@aliceb-nv aliceb-nv requested a review from a team as a code owner May 5, 2026 09:01
@aliceb-nv aliceb-nv added the improvement Improves an existing functionality label May 5, 2026
@aliceb-nv aliceb-nv requested review from akifcorduk and hlinsen May 5, 2026 09:01
@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aliceb-nv
Contributor Author

/ok to test 64ba333

@coderabbitai

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Introduces CPUFJ (CPU-based feasibility-jump heuristic) task execution into the root-cut processing loop via new structured task types, refactored initialization and solver extensions supporting work-unit limits, a factory API for task creation, and OpenMP task-based orchestration within branch-and-bound with explicit lifecycle management via scope guards.

Changes

CPU Feasibility-Jump Root-Cut Integration

| Layer | File(s) | Summary |
|---|---|---|
| Type Definitions | cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh | New cpu_fj_thread_t<i_t, f_t> worker with lifecycle hooks (on_start, run_worker, on_terminate), solution-found signaling, configurable time/work-unit limits, and ownership of fj_cpu_climber_t. New fj_cpu_task_t<i_t, f_t> encapsulates task execution state with preemption flag and custom-deleter unique_ptr. |
| Solver Extensions | cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (lines 1637–1816) | cpufj_solve_loop now accepts work_unit_limit parameter; loop breaks when work units exceed limit. cpu_fj_thread_t::run_worker() passes limit to solver. |
| Initialization Refactoring | cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (lines 1313–1560) | New finalize_fj_cpu_host_initialization(...) initializes flip-move caches and problem features. New init_fj_cpu_from_host_lp(...) constructs host FJ state from LP problem, rebuilds reverse CSR, and calls finalize. |
| Task Lifecycle API | cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh, cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (lines 1856–1898) | make_fj_cpu_task_from_host_lp(...) constructs tasks from host LP with improvement callback. run_fj_cpu_task(...) executes with time and work-unit limits. stop_fj_cpu_task(...) signals preemption. Custom deleter manages climber cleanup. |
| Root-Cut Loop Integration | cpp/src/branch_and_bound/branch_and_bound.cpp (lines 2229–2556) | Introduces cut_pass_action_t/cut_pass_result_t for structured control flow. Root-cut CPUFJ task spawned as OpenMP task per cut pass, explicitly awaited, and stopped. Direct break/continue decisions replaced with structured CONTINUE/BREAK/RETURN actions propagating infeasibility, numerical, and termination outcomes. |
| Final Root-Cut Phase | cpp/src/branch_and_bound/branch_and_bound.cpp (lines 2570–2587) | After main cut processing, conditionally builds and runs final CPUFJ root-cut task when cuts exist, using computed remaining time limit in deterministic mode. |
| Header Reorganization & Wiring | cpp/src/branch_and_bound/branch_and_bound.cpp (lines 13, 29), cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh (lines 19–23), cpp/src/mip_heuristics/local_search/local_search.cu (lines 128–132) | Added includes for cpu_fj_thread.cuh and scope_guard.hpp. Refactored fj_cpu.cuh to include new thread header and removed local cpu_fj_thread_t declaration. Updated improvement callback lambda to capture this for member access. |
| Template Instantiations | cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (lines 1899–1954) | Explicit template instantiations for float and double of new task factory, execution, finalization, and helper utilities. |
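
For readers unfamiliar with the structured control flow mentioned in the Root-Cut Loop Integration row, the following is a minimal sketch of the pattern, not the PR's code. The enum values and the `action`/`status` field names appear in the diff excerpts on this page; the `int` status type, `run_cut_passes`, `run_pass`, and `default_status` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <functional>

enum class cut_pass_action_t { CONTINUE, BREAK, RETURN };

// The real struct presumably carries a mip_status_t; an int is used here
// as an assumption so the sketch stays self-contained.
struct cut_pass_result_t {
  cut_pass_action_t action = cut_pass_action_t::CONTINUE;
  int status               = 0;  // only meaningful when action == RETURN
};

// Hypothetical driver: each pass reports what the enclosing loop should do
// instead of calling break/continue/return from inside a nested helper.
inline int run_cut_passes(int max_passes,
                          const std::function<cut_pass_result_t(int)>& run_pass,
                          int default_status)
{
  assert(max_passes >= 0);
  for (int pass = 0; pass < max_passes; ++pass) {
    const cut_pass_result_t result = run_pass(pass);
    if (result.action == cut_pass_action_t::RETURN) { return result.status; }
    if (result.action == cut_pass_action_t::BREAK) { break; }
    // CONTINUE: fall through to the next pass
  }
  return default_status;
}
```

Returning an action value lets the per-pass body live in a lambda or helper (for example, one that runs alongside a spawned task) while the loop itself keeps ownership of the exit decisions.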

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Run CPUFJ bursts at the root' clearly describes the main objective of the pull request: enabling CPUFJ heuristic to run during root cut passes. |
| Description check | ✅ Passed | The description explains the purpose and benefits of running CPUFJ bursts during root cuts, citing specific performance improvements on instances like tutaki and awhea. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (1)

1768-1775: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce work_unit_limit on the actual consumed work.

work_units_elapsed is only accrued and checked every 100 iterations, and the last partial batch is never flushed before exit. For the short root-cut bursts this PR adds, that means a small work_unit_limit can be overshot by a wide margin while still underreporting the final work consumed.
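
The accounting issue can be illustrated with a small sketch. This is not the code in fj_cpu.cu: `work_meter_t`, `run_with_work_limit`, and the uniform per-iteration cost are hypothetical simplifications of the batched collect-and-check logic described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the climber's batched work accounting.
struct work_meter_t {
  double pending = 0.0;  // work accrued since the last flush
  double elapsed = 0.0;  // work already published

  void record(double units) { pending += units; }

  // Publish pending work and return the running total.
  double flush()
  {
    elapsed += pending;
    pending = 0.0;
    return elapsed;
  }
};

// Runs up to max_iters iterations, each costing cost_per_iter work units.
// The limit is checked every flush_period iterations, and one final flush
// on exit ensures the last partial batch is included in the reported total.
inline double run_with_work_limit(std::size_t max_iters,
                                  double cost_per_iter,
                                  double work_unit_limit,
                                  std::size_t flush_period)
{
  assert(flush_period > 0);
  work_meter_t meter;
  for (std::size_t iter = 1; iter <= max_iters; ++iter) {
    meter.record(cost_per_iter);
    if (iter % flush_period == 0 && meter.flush() >= work_unit_limit) { break; }
  }
  return meter.flush();  // final flush covers the last partial batch
}
```

With a unit cost and a limit of 150, a flush period of 100 stops at 200 work units while a period of 1 stops at exactly 150, which shows how the check interval bounds the overshoot. The final flush addresses the underreporting half of the finding independently of the interval.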

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1768 - 1775,
The code only updates fj_cpu.work_units_elapsed (via
fj_cpu.memory_aggregator.collect() and applying fj_cpu.work_unit_bias) every 100
iterations, causing the final partial batch to be unaccounted and allowing
work_unit_limit to be overshot; modify the loop so that you flush and apply the
remaining memory statistics on every iteration boundary that may exit: call auto
[loads, stores] = fj_cpu.memory_aggregator.collect(), compute biased_work =
(loads + stores) * fj_cpu.work_unit_bias / 1e10, add it to
fj_cpu.work_units_elapsed, call fj_cpu.producer_sync->notify_progress() if
non-null, and then check fj_cpu.work_units_elapsed >= work_unit_limit to
break—either by moving this logic out of the 100-iteration guard into the common
path or by ensuring you perform one final collect/update/notify/check right
before breaking/returning so the true consumed work is enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 2231-2237: The callback root_cut_cpufj_improvement_callback is
decoding the host-LP assignment against the live original_lp_ (via
uncrush_primal_solution) which can be concurrently mutated by
add_cuts/remove_cuts; capture and use the exact LP snapshot used to build the
CPUFJ task (or change the task to emit user-space assignments directly) so
uncrush_primal_solution runs against an immutable snapshot, and add
synchronization (or assert/guideline comment) around shared state mutation to
flag the missing concurrency protection for original_problem_/original_lp_
before calling set_new_solution().
- Around line 2254-2256: These return paths call set_solution_at_root(...) and
return immediately, which bypasses the existing async clique cleanup; insert a
call to finish_clique_thread() immediately before each early return so the async
clique task is stopped and joined. Specifically, add finish_clique_thread() just
prior to the set_solution_at_root(solution, cut_info); return
mip_status_t::OPTIMAL; sequence shown, and make the same change at the other
analogous exit that calls set_solution_at_root and returns (the second location
referenced in the comment).

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1881-1886: The task restart fails because run_fj_cpu_task only
clears fj_cpu.halted but not fj_cpu.preemption_flag, so after stop_fj_cpu_task
sets preemption_flag = true subsequent runs immediately exit; update
run_fj_cpu_task (and the duplicate block around the other overload at the
1890–1896 region) to reset fj_cpu.preemption_flag to false before calling
cpufj_solve_loop (i.e., clear the atomic preemption flag on the fj_cpu within
the fj_cpu_task_t<i_t,f_t> instance so the while (!fj_cpu.halted &&
!fj_cpu.preemption_flag.load()) loop can run again).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 86405377-b8f0-4c1e-888b-d909dd15851b

📥 Commits

Reviewing files that changed from the base of the PR and between a4de253 and 64ba333.

📒 Files selected for processing (5)
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/mip_heuristics/feasibility_jump/cpu_fj_thread.cuh
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
  • cpp/src/mip_heuristics/local_search/local_search.cu

Comment on lines +2231 to +2237
auto root_cut_cpufj_improvement_callback =
  [this](f_t obj, const std::vector<f_t>& assignment, double) {
    std::vector<f_t> user_assignment;
    uncrush_primal_solution(original_problem_, original_lp_, assignment, user_assignment);
    settings_.log.debug("Root cut CPUFJ found solution with objective %.16e\n", obj);
    set_new_solution(user_assignment);
  };

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Uncrush against the task’s LP snapshot, not the live root LP.

assignment comes from the host-LP snapshot used to build the CPUFJ task, but this callback decodes it against original_lp_, which the concurrent cut pass is still mutating. That can remap cut/slack columns incorrectly or race with add_cuts/remove_cuts, so a valid CPUFJ incumbent can be corrupted before set_new_solution() sees it. Please either capture the exact LP snapshot used for task creation or make the CPUFJ callback emit user-space assignments directly.

As per coding guidelines, flag missing synchronization for shared state in concurrent code.
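
The first suggested remedy, decoding against an immutable snapshot, might look like the sketch below. It is an illustration under simplifying assumptions rather than cuOpt code: `lp_snapshot_t`, `make_snapshot_callback`, and the "first N columns are user variables" decoding rule are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the LP data needed to decode an assignment:
// the first num_user_cols columns are user variables, the rest are slack
// columns added for cuts.
struct lp_snapshot_t {
  std::size_t num_user_cols = 0;
};

using improvement_callback_t =
  std::function<std::vector<double>(const std::vector<double>&)>;

// Builds a callback that decodes against a copy of the LP taken at task
// creation time, so later mutations of the live LP cannot affect it.
inline improvement_callback_t make_snapshot_callback(const lp_snapshot_t& live_lp)
{
  lp_snapshot_t snapshot = live_lp;  // copied by value and owned by the lambda
  return [snapshot](const std::vector<double>& assignment) {
    assert(assignment.size() >= snapshot.num_user_cols);
    return std::vector<double>(
      assignment.begin(),
      assignment.begin() + static_cast<std::ptrdiff_t>(snapshot.num_user_cols));
  };
}
```

Capturing by value trades an extra copy per task for freedom from locking. Whether that copy is affordable for a full LP is a judgment the PR author would need to make.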

Comment on lines 2254 to 2256
if (num_fractional == 0) {
  set_solution_at_root(solution, cut_info);
  return mip_status_t::OPTIMAL;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the existing clique-thread cleanup on these new root-cut exits.

These return paths bypass finish_clique_thread(), unlike the earlier root-level exits in this function. That leaves the async clique task running past solve() on these paths.

Proposed cleanup
   for (i_t cut_pass = 0; cut_pass < settings_.max_cut_passes; cut_pass++) {
     if (num_fractional == 0) {
       set_solution_at_root(solution, cut_info);
+      finish_clique_thread();
       return mip_status_t::OPTIMAL;
     }
@@
-    if (cut_pass_result.action == cut_pass_action_t::RETURN) { return cut_pass_result.status; }
+    if (cut_pass_result.action == cut_pass_action_t::RETURN) {
+      finish_clique_thread();
+      return cut_pass_result.status;
+    }

Also applies to: 2540-2541
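
An alternative to adding a cleanup call before every return is an RAII guard, which the PR already uses for the CPUFJ task via cuopt::scope_guard. The sketch below shows the general idea with a hypothetical `scope_guard_t` and `root_loop`; it does not reproduce cuopt::scope_guard's interface, and whether finish_clique_thread() is safe to run from a guard is an open assumption.

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard (hypothetical; not cuopt::scope_guard).
template <typename F>
class scope_guard_t {
 public:
  explicit scope_guard_t(F f) : f_(std::move(f)) {}
  ~scope_guard_t() { f_(); }
  scope_guard_t(const scope_guard_t&)            = delete;
  scope_guard_t& operator=(const scope_guard_t&) = delete;

 private:
  F f_;
};

// Hypothetical root loop with several exits; cleanup_calls counts how many
// times the cleanup ran across invocations.
inline int root_loop(int exit_at_pass, int max_passes, int& cleanup_calls)
{
  assert(max_passes >= 0);
  auto cleanup = [&cleanup_calls]() { ++cleanup_calls; };
  scope_guard_t<decltype(cleanup)> guard(cleanup);
  for (int pass = 0; pass < max_passes; ++pass) {
    if (pass == exit_at_pass) { return pass; }  // early return: guard still fires
  }
  return -1;  // normal exit: guard fires here too
}
```

The guard runs exactly once per call regardless of which exit is taken, so exits added later cannot bypass the cleanup.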

Comment on lines +1881 to +1886
void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit)
{
  cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber");
  auto& fj_cpu = *task.fj_cpu;
  fj_cpu.halted = false;
  cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep fj_cpu_task_t restartable across bursts.

stop_fj_cpu_task() sets preemption_flag = true, but run_fj_cpu_task() only clears halted. If the same task is reused for multiple root-cut bursts, every run after the first stop will hit while (!fj_cpu.halted && !fj_cpu.preemption_flag.load()) and return immediately.

Suggested fix
 template <typename i_t, typename f_t>
 void run_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task, f_t time_limit, double work_unit_limit)
 {
   cuopt_assert(task.fj_cpu != nullptr, "CPUFJ task has no climber");
   auto& fj_cpu  = *task.fj_cpu;
+  task.preemption_flag.store(false);
   fj_cpu.halted = false;
   cpufj_solve_loop(fj_cpu, time_limit, work_unit_limit);
 }

 template <typename i_t, typename f_t>
 void stop_fj_cpu_task(fj_cpu_task_t<i_t, f_t>& task)
 {
   if (task.fj_cpu) {
     auto& fj_cpu           = *task.fj_cpu;
-    fj_cpu.preemption_flag = true;
     fj_cpu.halted          = true;
   }
 }

Also applies to: 1890-1896
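
The restart behavior at issue can be shown in isolation. The sketch below uses hypothetical names (`burst_task_t`, `run_burst`, `stop_burst`) and a trivial loop body in place of the CPUFJ solve loop; it only demonstrates why the flag must be cleared before each run.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a restartable heuristic task.
struct burst_task_t {
  std::atomic<bool> preemption_flag{false};
  long total_iterations = 0;
};

// Runs one burst of at most max_iters iterations. Clearing the flag first
// is what allows the same task to be reused after an earlier stop.
inline long run_burst(burst_task_t& task, long max_iters)
{
  assert(max_iters >= 0);
  task.preemption_flag.store(false);
  long done = 0;
  while (done < max_iters && !task.preemption_flag.load()) {
    ++done;
  }
  task.total_iterations += done;
  return done;
}

// Requests that the current burst end at its next flag check.
inline void stop_burst(burst_task_t& task) { task.preemption_flag.store(true); }
```

One design caveat: clearing the flag at the start of a run discards any stop request that arrived before the run began, so the caller has to order its stop and run calls accordingly.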

"[RootCut CPUFJ] ");
settings_.log.debug("Root cut CPUFJ final problem build time: %.6f seconds\n",
toc(root_cut_cpufj_build_start_time));
f_t fj_time_limit = settings_.deterministic
Contributor

Why does the non-deterministic path get 1s?

namespace cuopt::linear_programming::detail {

template <typename f_t>
static f_t clamp_value(f_t value, f_t lower, f_t upper)
Contributor

I think there is std::clamp available
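
For reference, std::clamp is declared in <algorithm> since C++17. The wrapper below is a hypothetical illustration of swapping it in for a hand-written helper, and assumes the project builds with C++17 or later.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical wrapper showing std::clamp in place of a custom clamp_value.
template <typename f_t>
f_t clamp_to_bounds(f_t value, f_t lower, f_t upper)
{
  assert(!(upper < lower));  // std::clamp has undefined behavior if lower > upper
  return std::clamp(value, lower, upper);
}
```

The precondition is the main thing to check when replacing a custom helper, since a hand-written version may tolerate inverted bounds where std::clamp does not.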

}

template <typename i_t, typename f_t>
static void rebuild_reverse_matrix(i_t n_variables,
Contributor

We have a transpose function available in sparse_matrix.cpp. Even though its parameters are a bit different, we should reuse it.

cuopt::scope_guard root_cut_cpufj_guard([&]() { stop_root_cut_cpufj(); });

enum class cut_pass_action_t { CONTINUE, BREAK, RETURN };
struct cut_pass_result_t {
Contributor

I am not sure I understand the need/logic for this. Can't we just launch a new CPUFJ task after each cut pass, wait until the next pass finishes, and then launch another CPUFJ task, all while staying in the cut pass loop?
