[REVIEW] New Dataset API Clarifying Ownership #1846
Open: HowardHuang1 wants to merge 179 commits into NVIDIA:main from HowardHuang1:HH-Dataset-API
Commits
d78f459
get build working
HowardHuang1 4febf8b
add dataset compression test and basic constructor types test
HowardHuang1 b403473
add padded_dataset class along with test cases
HowardHuang1 8d6833a
add support for new padded_dataset classes all the way up to the CAGR…
HowardHuang1 5447a4c
Merge branch 'main' into HH-Dataset-API
aamijar 17ab09d
fix style
aamijar fb556c9
build() now only takes views and not unique ptrs + get rid of distinc…
HowardHuang1 37d28dc
clean up old overloads of build & index functions that take ownership…
HowardHuang1 f30e7ed
fully removed index ownership so that it only takes views + add suppo…
HowardHuang1 26b46a2
fix failing mg tests that do build -> serialize -> deserialize -> search
HowardHuang1 a38fb18
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 393367a
Merge branch 'release/26.04' of https://github.com/rapidsai/cuvs into…
HowardHuang1 70b6b58
fix formatting w/ pre-commit
HowardHuang1 0c3df3e
Merge upstream main into branch
HowardHuang1 dbc47e1
run pre-commit
HowardHuang1 27a6eb3
fix merge issues that cause build to fail
HowardHuang1 355adb3
fix failing test cases for cagra
HowardHuang1 5e5445b
fix cagra test cases to conform with new dataset API by always callin…
HowardHuang1 1d4ca18
fixed failing cagra test cases caused by shift to new padded dataset …
HowardHuang1 187e66e
fixed failing cagra test cases when query dimensions don't match numb…
HowardHuang1 e894477
Fix ace build caller mismatch and vpq keep alive error in cagra test …
HowardHuang1 84573a6
fix error with vpq layout queries.extent(1) != idx.dim()
HowardHuang1 dfb0d4c
Fix failing query vs index dimension check when calling cagra search …
HowardHuang1 7be7329
Fix unsupported data type due to dataset_view not taken by serialize()
HowardHuang1 0c9432d
Fix extend core issue in failing cagra test cases to support device_p…
HowardHuang1 234606e
Fix merge to treat device_padded_dataset_view as valid attached dataset
HowardHuang1 20f7ea7
Fix cagra test case issue where cagra merge failed to preserve 16 byt…
HowardHuang1 82979c8
Fix failing low recall cagra test cases so that padding in padded dat…
HowardHuang1 167cb75
remove is_owning state. Split into high level polymorphic_dataset<> a…
HowardHuang1 07caefa
Remove top level base polymorphic_dataset class and split inheritance…
HowardHuang1 f15317f
change index() and build() to take high level abstract dataset_view i…
HowardHuang1 ee0b5e4
add stride check for update_dataset() + update documentation
HowardHuang1 db16de1
remove unused naming aliases
HowardHuang1 08e5e9d
remove host_padded_dataset/view since it's not used in any control path
HowardHuang1 b0cc113
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 4098a44
fix minor CI issues
HowardHuang1 b130341
fix copyright
HowardHuang1 b897116
Revert "fix copyright"
HowardHuang1 547657a
fix copyright for CI
HowardHuang1 c124b00
remove cmake changes
HowardHuang1 bce0f36
Fix cmakelists for build
HowardHuang1 a7fbf2f
Fix build issue with spectral clustering
HowardHuang1 c15ed1e
Merge remote-tracking branch 'upstream/main' into HH-Dataset-API
8dd7436
fix FAISS cuVS bridge to support new Dataset API. Had update_dataset(…
HowardHuang1 34d499f
Fix ACE CAGRA device dataset padding and C API handle for disk-backed…
HowardHuang1 f41c355
run pre-commit formatting
HowardHuang1 b0e5369
Fix disk-ann type casting after splitting dataset and dataset_view in…
HowardHuang1 81341c9
Fix non-ACE cpu path internal call to make_padded_dataset to call cor…
HowardHuang1 f9e83c5
Increase compressed binary size threshold to avoid CI error
HowardHuang1 5598753
run pre-commit formatting
HowardHuang1 b4892a5
Fix failing build for hnsw examples to support new build_res return type
HowardHuang1 485b820
Fix seed for random make blob test for determinism
HowardHuang1 7b9edda
Fix broken C API handling of out_dataset ownership and lifetime durin…
HowardHuang1 14f6bfe
Fix non-padded dataset from host from_graph path that caused cuda mis…
HowardHuang1 110f9f3
Fix cagra::extend_core which requires a preallocated device buffer + …
HowardHuang1 69d45fe
add support for vpq dataset in C API. Previously C++ API already supp…
HowardHuang1 b7148c3
Disable flaky mg_ivf_flat_extend test case temporarily
HowardHuang1 15a8834
Fix failing java tests where tiered_index fails to keep dataset passe…
HowardHuang1 9543191
Fix failing java tests where tiered_index doesn't use padded dataset …
HowardHuang1 2dee124
Fix CAGRA search query alignment. Serialize search when padded stride…
HowardHuang1 7071656
cuvs_c_verify_install_headers test used to look for cpp project root.…
HowardHuang1 3bbf993
Merge remote-tracking branch 'upstream/main' into HH-Dataset-API
HowardHuang1 c8b5397
Fix RMM integration after breaking RMM API change was merged
HowardHuang1 d81f873
Fix docs CI check and fix linking of cmake crate in rust with CMAKE_C…
HowardHuang1 98b5697
revert rust build.rs config
HowardHuang1 c88b23c
remove leftover stale index wrapper files
HowardHuang1 b32e868
remove old test cases
HowardHuang1 1a88bb8
revert rmm changes
HowardHuang1 1f24980
check for 16-byte alignment in _from_args() from cuvsCagraIndexFromAr…
HowardHuang1 5e5ed60
rename merged_cagra_holder to cuvs_cagra_c_api_lifetime_holder since …
HowardHuang1 5cadbad
refactor out repeated dataset stride matching calculation into global…
HowardHuang1 12272d2
Add explicit stride validation in convert_dataset_view_to_padded_for_…
HowardHuang1 e689b04
Attach the padded dataset view, not the original unvalidated view whe…
HowardHuang1 58ee2e5
In the case that params.compression is set, do not drop VPQ ownership…
HowardHuang1 88cfb37
Fix lifetime issue in tiered_index: Repoint the CAGRA index before re…
HowardHuang1 706f95d
keep ace build result alive in cagra_build_into_index
HowardHuang1 231a9e1
ACE build is the only path that accepts raw mdspan so call make_padde…
HowardHuang1 8aac5bd
check for managed memory case for pinned page-locked host memory. Che…
HowardHuang1 d869208
fix pre-commit styles
HowardHuang1 ad30ca1
Merge remote-tracking branch 'upstream/main' into HH-Dataset-API
HowardHuang1 4c3faef
Merge branch 'main' of github.com:rapidsai/cuvs into HH-Dataset-API
HowardHuang1 6676c3a
bring back build() functions that return indexes and work with datase…
HowardHuang1 d53c4a8
add deprecation warnings to deprecated classes and functions
HowardHuang1 adcb0a2
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 6a91ca4
since build() returning index was brought back, make corresponding fi…
HowardHuang1 7f0ba65
fix FAISS code now that we've brought back build() that returns index…
HowardHuang1 945a249
bring back out of core non-ACE batched IVF host path
HowardHuang1 a2d937a
abide by convention where addr+dtype is used to maintain ownership an…
HowardHuang1 063e439
Use raft::resources mdspan-based API where possible
HowardHuang1 cc9c1f3
merge FAISS diff files and use raft::copy_matrix instead of cudaMemcpy
HowardHuang1 c3edfc7
revert header_check.cmake changes
HowardHuang1 5fa5bf3
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 c5cfcf2
Remove semicolon that was messing FAISS patch
HowardHuang1 58c7bbd
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 2b8801c
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 b12a6c2
refactor using templates for composition in place of inheritance
HowardHuang1 8d99fa7
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 285e5ed
consolidate aliases and get rid of naming inconsistencies
HowardHuang1 93557e4
Merge branch 'main' into HH-Dataset-API
HowardHuang1 e46300f
reapply commit a2efbee remove noexcept to avoid CI error
HowardHuang1 b1c979b
fix clang-format for pre-commit styles
HowardHuang1 c6405c6
Remove noexcept to avoid CI error
HowardHuang1 3ccbe5c
rename case 0 and case 1 in switch statement for clarity
HowardHuang1 4117f23
move overload in comments to a separate line to avoid Doxygen being m…
HowardHuang1 0340038
add back in update_dataset() functions that take in owning dataset fo…
HowardHuang1 0e2691c
mismatched branches tried to stuff template dataT into non-matching t…
HowardHuang1 7dfbf85
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 23b07d3
remove calls to detail namespace from header file cagra.hpp
HowardHuang1 46935d4
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 cebcfd4
Remove detail namespace around finalize_index_from_ace and finalize_i…
HowardHuang1 88da190
move implementation of dispatch in index out into dispatch.hpp file
HowardHuang1 79d0fd8
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 f9adbc5
remove indirect_dataset
HowardHuang1 d1c1dd4
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 5223862
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 8c836ec
fix failing CI by having deserialize_strided() recover a padded datas…
HowardHuang1 11b0c61
index was missing vpq_16_owning check when rebinding dataset during d…
HowardHuang1 1de47f9
index logical element type and vpq codebook type do not need to be th…
HowardHuang1 c6b4901
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 99ab789
pull out nested vpq_dataset creation from build(). Users should now c…
HowardHuang1 3ea5f56
remove deferred_host_dataset from build_result. Deprecate old edge ca…
HowardHuang1 ea4c1d6
remove build_result completely and remove build() overloads that retu…
HowardHuang1 68c3f6b
make_vpq_dataset now takes any_dataset_view instead of just padded_da…
HowardHuang1 f7f607d
remove unused files and includes. Added code example for make_vpq_dat…
HowardHuang1 34d2dc4
Merge branch 'main' into HH-Dataset-API
HowardHuang1 3298366
remove ace_build_result. build_ace() now returns index only. Deprecat…
HowardHuang1 0af9527
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 6459568
remove all instances of build_ace() on public API surface. Now it's o…
HowardHuang1 e3cf3d3
unify two index dataset storage variables into one
HowardHuang1 e819312
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 1c1eb55
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 fa31267
Get rid of merge_result. Pulled out nested merged dataset creation fr…
HowardHuang1 40bc7eb
fix merge conflict in cagra_build_inst.cu.in and merge upstream main
HowardHuang1 fd9d591
fix merge conflicts by adding CUVS_EXPORT after recent upstream chang…
HowardHuang1 186d81e
change make_vpq_dataset() factory to accept template mdspan parameter…
HowardHuang1 06e879d
remove 3 deprecated paths, remove deprecated strided_dataset, remove …
HowardHuang1 d95dd91
add missing update_dataset() calls after build() since build() no lon…
HowardHuang1 a8d27a6
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 44e8888
add update_dataset() at the end of detail::build_from_device_matrix a…
HowardHuang1 6021700
restore raft copy and simplify variant with std::visit
HowardHuang1 34f6460
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 8b3d8f8
template index on DatasetViewT to get rid of variants. Use padded_ind…
HowardHuang1 5411928
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 de2000b
remove update_dataset() call at the end of detail::build_from_device_…
HowardHuang1 c0190fa
fix failing cpp test cases due to missing update_dataset() calls afte…
HowardHuang1 baab6a7
add host counterparts to dataset API so device vs host can be disting…
HowardHuang1 98ef788
Merge remote-tracking branch 'upstream' into HH-Dataset-API
6fdfebf
template build_ace and host build on DatasetViewT and add attach_devi…
HowardHuang1 f356e48
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 53ef0e6
combine host build and device build into one build function in cagra.…
HowardHuang1 428542e
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 6e9b7a5
add templating on extend()
HowardHuang1 178eb3f
replace templates on public API surface with concrete index type over…
HowardHuang1 a09c834
remove deprecated build() overloads that take host/device_matrix_view…
HowardHuang1 c3af8bd
Remove variants from serialize and deserialize. Fix to just device_pa…
HowardHuang1 dfedfb9
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 6d756e5
remove variants completely. Remove variant usage from vamana
HowardHuang1 828c371
add fixes at call sites once variants were removed completely
HowardHuang1 c630451
template serialize/deserialize on DatasetViewT. Users are now expecte…
HowardHuang1 4258dc5
Fix FAISS to use new templated index. Add a default so that we don't …
HowardHuang1 f58c21e
Fix cuvs_cagra_wrapper.h: use int64_t extents and host_padded_dataset…
HowardHuang1 e0fce27
Fix FAISS CuvsCagra::train() to use dataset_view instead of raw mdspan
HowardHuang1 73bb010
Fix Python ACE test failures: use-after-free, missing FD transfers, d…
HowardHuang1 0dfeffe
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 9c024c6
Fix corrupt hunk header in faiss-1.14-cuvs-26.06.diff
HowardHuang1 55736de
bring back attach_dataset_on_build param. It only applies to the devi…
HowardHuang1 af2fd42
Fix BinaryCuvsCagra::train() device path to use make_device_padded_da…
HowardHuang1 598e861
Fix use-after-free in convert_host_to_device_index (MG CAGRA segfault)
HowardHuang1 e795819
minor fix for update_graph overload confusion
HowardHuang1 26b87e5
fix FAISS to call update_dataset() with padded dataset after deprecat…
HowardHuang1 2804440
remove deprecated update_dataset() overload taking in device_matrix_v…
HowardHuang1 48fc8d5
update examples/cpp/CMakeLists.txt to use C++20 to support concepts. …
HowardHuang1 ff8cc82
Revert "update examples/cpp/CMakeLists.txt to use C++20 to support co…
HowardHuang1 4d7d0a6
move FAISS changes for new Dataset API into separate 26.08 patch
HowardHuang1 bb275f1
use raft::copy_matrix instead of cudaMemcpy2DAsync
HowardHuang1 217e5bd
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 8001bf4
fix two examples that still use the old cagra API. Migrated them over…
HowardHuang1 3803e00
Merge remote-tracking branch 'upstream' into HH-Dataset-API
HowardHuang1 c31ea7c
fix merge conflict. Recover missing line
HowardHuang1
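Several commit titles above mention a 16-byte alignment check and a shared stride calculation for padded datasets. The sketch below is not code from this PR. It only illustrates the general arithmetic such a check might rely on, assuming each row must occupy a whole number of 16-byte blocks. The names `padded_stride` and `is_aligned` are hypothetical and do not refer to any cuVS or RAFT function.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of cuVS): smallest row stride, in elements,
// that is >= dim and makes every row a multiple of AlignBytes bytes.
template <typename T, std::size_t AlignBytes = 16>
constexpr std::size_t padded_stride(std::size_t dim)
{
  // Assumption: the element size divides the alignment (true for 1-, 2-, 4-,
  // and 8-byte element types with a 16-byte alignment).
  static_assert(AlignBytes % sizeof(T) == 0, "element size must divide the alignment");
  constexpr std::size_t elems_per_block = AlignBytes / sizeof(T);
  // Round dim up to the next multiple of elems_per_block.
  return (dim + elems_per_block - 1) / elems_per_block * elems_per_block;
}

// Hypothetical helper: true if the pointer address is a multiple of align_bytes.
inline bool is_aligned(const void* ptr, std::size_t align_bytes = 16)
{
  return reinterpret_cast<std::uintptr_t>(ptr) % align_bytes == 0;
}

// Compile-time examples of the rounding behaviour.
static_assert(padded_stride<float>(30) == 32, "30 floats round up to 32");
static_assert(padded_stride<float>(32) == 32, "already a multiple of 4 floats");
static_assert(padded_stride<std::int8_t>(100) == 112, "100 bytes round up to 112");
```

Under these assumptions, a view whose row stride equals `padded_stride<T>(dim)` and whose base pointer passes `is_aligned` starts every row on a 16-byte boundary. Whether the PR's validation works exactly this way is not stated in the commit titles.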