Skip to content

Hublabel#4870

Merged
adamnovak merged 116 commits into
masterfrom
hublabel
Jun 30, 2026
Merged

Hublabel#4870
adamnovak merged 116 commits into
masterfrom
hublabel

Conversation

@electricEpilith

Copy link
Copy Markdown
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • Add hub labeling to distance index, which allows efficient exact shortest distance queries even in "oversized" snarls
  • Bug fix for minimizer, significant speed improvement

Description

Adds hub labeling functionality to the snarl distance index.

electricEpilith and others added 30 commits November 12, 2025 14:32

@adamnovak adamnovak left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good now. The only thing I think we really need to change is making sure we point at the actual libbdsg commit we want.

Comment thread src/subcommand/bench_dist_query_main.cpp Outdated
Comment thread deps/libbdsg
electricEpilith and others added 13 commits June 3, 2026 23:36
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.8 <[email protected]>
The docs-job ran git submodule update in place on submodules the CI
workspace left with stale origin URLs (a local mirror that is not a valid
repo on the runner). deinit alone did not fix it because it leaves the
cached gitdir under .git/modules/<path>, which update --init then reuses
with its stale file:// origin. Remove both the working tree and the cached
gitdir so each dep is re-cloned fresh from its canonical https URL, and set
protocol.file.allow=never to deny local transport entirely (CVE-2022-39253)
instead of the previous =always. Recurse only into libvgio, the one dep
whose Doxygen INPUT needs a nested submodule tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
I had to reconcile the two approaches to setting up the submodules for
the docs build here. I'm using the synthetic approach, which was faster
whern I compared the runtimes of the CI jobs, but I replaced all the
explanation of how it was supposed to be working (which was unsourced
and so unverifiable) with my own text.

I'm also noting it's not safe to run the publish-docs.sh script if you
have any data in your submodules you need to keep.

@adamnovak adamnovak left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good except that we don't want to add unit tests that depend on on-disk data files. We want the unit tests to pass when all you have is the binary, since we ship them in the binary.

Comment thread .gitlab-ci.yml
Comment on lines +323 to +324
- sudo apt-get -q -y update
- sudo apt-get -q -y install --no-upgrade doxygen graphviz rsync git

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we need these (including git and rsync which we never installed before?)

If we install doxygen here, do we need it in the general before_script anymore?

Comment thread src/unittest/snarl_distance_index.cpp Outdated
Comment on lines +7889 to +7890
cerr << "[overreport] fwd(103fd0->154rev3) = " << fwd
<< " rev(154rev3->103fd0) = " << rev << endl;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to keep debugging prints in the tests if we can avoid it.

Suggested change
cerr << "[overreport] fwd(103fd0->154rev3) = " << fwd
<< " rev(154rev3->103fd0) = " << rev << endl;

Comment thread src/unittest/snarl_distance_index.cpp Outdated
// Dijkstra) but the reverse 154rev3 -> 103fd0 IS reachable. The hub-label
// query was returning the reverse direction's finite distance (264) for the
// unreachable forward query.
unique_ptr<HandleGraph> graph = vg::io::VPKG::load_one<HandleGraph>("hublabel_overreport.vg");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want the unit tests to depend on data files; they should pass if you just download the static binary and run vg test.

This might want to become a CLI test, or we might want to include the offending graph (or, ideally, a minimally-sized version of that graph with as much stuff removed or simplified as possible) as JSON here to load for the test.

@adamnovak adamnovak merged commit 9ebf9bf into master Jun 30, 2026
2 checks passed
@faithokamoto faithokamoto deleted the hublabel branch June 30, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants