[REVIEW] Add OpenSearch backend to cuvs-bench #2012
jrbourbeau wants to merge 8 commits into rapidsai:main
Signed-off-by: James Bourbeau <[email protected]>
```python
print(
    f"Build failed for {config.index_name}: {build_result.error_message}"
try:
    backend.initialize()
```
Note to reviewers: This is a separate, but related, change to the orchestrator. I noticed the initialize and cleanup methods weren't actually being called like their docstrings said they were supposed to. This is a big diff, but most of it is just indenting the existing code into a try/except block. The functional change is calling initialize + cleanup.
EDIT: FWIW supporting a context manager for backend setup / teardown seems reasonable. But that's also not something I wanted to handle in this PR.
Good catch. The initialize() and cleanup() hooks in BenchmarkBackend are defined in the abstract base class with docstrings describing their intended purpose, but neither method is actually called by the BenchmarkOrchestrator in _run_sweep() or _run_trial(). This was an oversight because the only backend implemented at the time was CppGoogleBenchmarkBackend, whose lifecycle is entirely self-contained within each subprocess call to build()/search(), making initialize()/cleanup() unnecessary for it.
> EDIT: FWIW supporting a context manager for backend setup / teardown seems reasonable. But that's also not something I wanted to handle in this PR.
I agree that context manager support is reasonable. The purpose of the pluggable backend API is precisely to enable these kinds of extensions without modifying the core orchestrator. Since initialize() and cleanup() are already part of the BenchmarkBackend interface, wiring them up in the orchestrator (whether as explicit calls or a context manager) would ensure the plugin system works end-to-end for future backends without requiring further modifications to the core.
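For illustration, the setup/teardown wiring could look roughly like this. BenchmarkBackend is the interface discussed above (stubbed here); the `backend_lifecycle` helper is hypothetical, a sketch of the context-manager idea rather than the PR's actual implementation:

```python
from contextlib import contextmanager


class BenchmarkBackend:
    """Minimal stand-in for the abstract backend interface discussed above."""

    def initialize(self):  # intended to run once before any trials
        pass

    def cleanup(self):  # intended to run once after all trials finish
        pass


@contextmanager
def backend_lifecycle(backend):
    # Hypothetical helper: guarantees cleanup() runs even if a trial raises,
    # mirroring the try/except indentation change described in the PR.
    backend.initialize()
    try:
        yield backend
    finally:
        backend.cleanup()
```

The orchestrator would then run trials inside `with backend_lifecycle(backend) as b: ...`, so no backend author can forget the teardown step.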
Overall, I think we're leaning a little too strongly toward local (individual backend) vs. global (benchmark orchestrator) control, and we should probably prioritize global control for all boilerplate so that we're not having to reimplement it in each backend. I'm okay doing this in follow-ups, but the way we typically handle these types of things is to create a GitHub issue that clearly explains the intended changes for the follow-up (written in a way that assumes someone else might have to pick it up in the future, just in case), and then reference a link to the issue in a TODO comment in the code where the change is intended to be made. This ensures 1) people reading the code are aware of the intentions, and 2) the issue doesn't get orphaned and forgotten.
```toml
[tool.pytest.ini_options]
markers = [
    "integration: tests that require a live OpenSearch node (run with '-m integration')",
```
TODO: Make this language more generic (not just OpenSearch).
```python
ext = os.path.splitext(path)[1].lower()
dtype = _DTYPE_FOR_EXT.get(ext, np.float32)
with open(path, "rb") as f:
```
I mentioned this in the Elasticsearch PR as well: we should standardize/centralize these calls so that we're not having to manually implement them for every backend. The benchmark should be opening the files and passing the vectors (or the file handle) down to the actual backends.
Don't disagree with this comment (or the other ones about consolidating various utilities). IIUC it sounds like these will need deeper changes to the config loader / orchestrator specs. Is that something that's needed in this PR, or is a follow-up refactor fine?
Since both backends (Elasticsearch and OpenSearch) are implementing this separately, a follow-up PR consolidating both might be easier, especially since both PRs are still open. Otherwise I can push a PR consolidating those various general-purpose utilities based on both PRs, which they can pull in after it is merged.
@jrbourbeau
Loading vectors in the orchestrator or config loader isn't the right approach because the C++ backend never loads vectors into Python. It only passes file paths to the C++ subprocess. Loading vectors at the orchestrator level would waste memory for the C++ backend and any future backend that operates the same way, so this doesn't require changes to the config loader or orchestrator specs.
There are two options to centralize this:
Option 1: Shared utility module. Create a backends/utils.py with a single load_vectors() function that handles all supported binary formats (.fbin, .f16bin, .u8bin, .i8bin, .ibin) and supports subset_size. Python-native backends like OpenSearch and Elasticsearch import and call it when they need vectors. The C++ backend, orchestrator, and config loader are completely untouched. This is the simpler approach with no core changes, though it relies on future backend authors knowing to use it.
Option 2: Lazy loading on Dataset. Make base_vectors, query_vectors, and groundtruth_neighbors lazy properties on the Dataset class that load from the corresponding file paths on first access. The C++ backend never accesses these properties (it uses file paths), so nothing loads. Python-native backends just access dataset.base_vectors and get loaded vectors automatically without needing to know about any utility or handle the empty-array fallback pattern. This prevents the duplication from recurring since future backend authors can't accidentally reimplement it, but it requires changing Dataset from a plain dataclass to a regular class.
I'd suggest we go with option 1 for now since it solves the problem without touching core types, and we can move to option 2 later if the pattern keeps getting reimplemented
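To make Option 1 concrete, here is a sketch of what such a `load_vectors()` utility could look like. The extension-to-dtype mapping follows the `_DTYPE_FOR_EXT` table visible in the diff; the header layout (two int32 values for row and column counts) is an assumption inferred from the `reshape(n_rows, n_cols)` pattern shown above, not confirmed by the PR:

```python
import os

import numpy as np

# Assumed mapping, following the _DTYPE_FOR_EXT table in the diff.
_DTYPE_FOR_EXT = {
    ".fbin": np.float32,
    ".f16bin": np.float16,
    ".u8bin": np.uint8,
    ".i8bin": np.int8,
    ".ibin": np.int32,
}


def load_vectors(path, subset_size=None):
    """Load a *.bin vector file: an int32 (n_rows, n_cols) header, then
    row-major data. Sketch of the shared utility proposed as Option 1."""
    ext = os.path.splitext(path)[1].lower()
    dtype = _DTYPE_FOR_EXT.get(ext, np.float32)
    with open(path, "rb") as f:
        n_rows, n_cols = np.fromfile(f, dtype=np.int32, count=2)
        if subset_size is not None:
            # Only read the first subset_size rows from disk.
            n_rows = min(int(n_rows), subset_size)
        data = np.fromfile(f, dtype=dtype, count=int(n_rows) * int(n_cols))
    return data.reshape(n_rows, n_cols)
```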
Went with option 2. The Dataset class is now a regular class where base_vectors, query_vectors, and groundtruth_neighbors are loaded automatically from file paths on first access. Backends just use dataset.base_vectors and get a numpy array without needing to know about file loading. The C++ backend is unaffected since it only uses dataset.base_file.
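A minimal sketch of that lazy-loading Dataset shape, assuming the field names from the PR (base_file, etc.) and a pluggable loader callable standing in for the real file reader:

```python
class Dataset:
    """Sketch of Option 2: vector arrays load lazily on first attribute access.

    The loader argument is a hypothetical stand-in for the real vector
    reader; field names follow those mentioned in the PR discussion.
    """

    def __init__(self, base_file, query_file, groundtruth_file, loader):
        self.base_file = base_file
        self.query_file = query_file
        self.groundtruth_file = groundtruth_file
        self._loader = loader
        self._cache = {}

    def _lazy(self, key, path):
        # Load from disk on first access only; the C++ backend never touches
        # these properties, so it never pays the memory cost.
        if key not in self._cache:
            self._cache[key] = self._loader(path)
        return self._cache[key]

    @property
    def base_vectors(self):
        return self._lazy("base", self.base_file)

    @property
    def query_vectors(self):
        return self._lazy("query", self.query_file)

    @property
    def groundtruth_neighbors(self):
        return self._lazy("gt", self.groundtruth_file)
```

Python-native backends just read `dataset.base_vectors`; the C++ backend keeps using `dataset.base_file` and never triggers a load.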
```python
    return data.reshape(n_rows, n_cols)

def _load_yaml(path: str) -> Any:
```
This and the dataset config should also be loaded by the benchmark itself, not the backend. @jnke2016
```python
def _gather_algo_configs(
    config_path: str, algorithm_configuration: Optional[str]
```
Agreed. Functions like _gather_algo_configs, _load_yaml, and _get_dataset_conf are shared config-loading logic that every backend ends up reimplementing. @afourniernv - The Elasticsearch PR has the same pattern. The C++ loader already has equivalent methods (load_yaml_file, get_dataset_configuration, gather_algorithm_configs) as instance methods. I'll promote those to the base ConfigLoader class so all config loaders inherit them automatically and developers don't need to reimplement them. The backend should only be responsible for its backend-specific concerns (connection parameters, index mappings, API calls), not for loading YAML files or finding datasets by name.
Though the load() method is abstract because each backend has different config-loading logic, helper methods like load_yaml_file and get_dataset_configuration are common operations that every config loader reimplements when implementing load(). gather_algorithm_configs is also shared between the C++ and OpenSearch loaders.
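A sketch of that promotion, with load() kept abstract and the shared helpers on the base class. The method names follow the C++ loader's naming mentioned above; the bodies here are illustrative assumptions, not the PR's actual code:

```python
class ConfigLoader:
    """Sketch: shared config-loading helpers promoted to the base class."""

    def load(self, *args, **kwargs):
        # Still abstract: each backend supplies its own config-loading logic.
        raise NotImplementedError("each backend implements its own load()")

    def load_yaml_file(self, path):
        # PyYAML assumed, matching the _load_yaml helper shown in the diff.
        import yaml

        with open(path) as f:
            return yaml.safe_load(f)

    def get_dataset_configuration(self, name, datasets):
        # Find a dataset entry by name; fail loudly rather than return None,
        # in line with the fail-gracefully feedback elsewhere in this review.
        for conf in datasets:
            if conf.get("name") == name:
                return conf
        raise ValueError(f"dataset {name!r} not found in configuration")
```

Subclasses then call `self.load_yaml_file(...)` and `self.get_dataset_configuration(...)` from their own `load()` instead of reimplementing them.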
```python
benchmark_configs: List[BenchmarkConfig] = []

for algo_file in algo_files:
```
Yeah, this is all boilerplate, which is exactly why it should be centralized. The resulting config dicts should be loaded from the files by the benchmark and passed down to the individual backends. It's important that this is standardized across backends so that specific backends aren't doing things drastically differently than others (that just makes it super complicated for everyone; also, if we find a bug in a specific loader or decide to change how we do things generally, every single backend then needs to be updated individually instead of changing it in one place). @jnke2016
So the loading pipeline (loading the dataset YAML, gathering algorithm files, filtering by algorithm/group names, etc.) is shared logic that should be centralized in the base ConfigLoader instead of each backend implementing it.
```python
    method_params = {"nlist": build_param.get("nlist", 4)}
else:
    raise ValueError(
        f"Unsupported faiss_method {faiss_method!r}. "
```
Yes! This is important. I believe I suggested this in the Elasticsearch PR as well. We need to fail gracefully when the user configures something that's not supported (or loads the wrong file).
cc @navneet1v for thoughts / feedback on the OpenSearch backend for cuvs-bench.
```python
build_spec: Dict[str, List] = group_conf.get("build", {})
search_spec: Dict[str, List] = group_conf.get("search", {})

build_keys = list(build_spec.keys())
```
Cartesian product expansion: both backends (Elasticsearch and OpenSearch) implement the same param-expansion pattern via a Cartesian product. I've added expand_param_grid() in backends/utils.py that both can use instead of reimplementing it. For OpenSearch, the expansion block at lines 261-276 would be replaced with:
```python
from .utils import expand_param_grid

build_combos = expand_param_grid(build_spec)
search_params_list = expand_param_grid(search_spec)
```
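A plausible body for such a helper, sketched with itertools.product (the name expand_param_grid comes from the comment above; this implementation is an assumption, not the actual backends/utils.py code):

```python
from itertools import product


def expand_param_grid(spec):
    """Expand {"param": [v1, v2, ...]} into a list of single-value dicts,
    one per combination (Cartesian product across all parameters).

    An empty spec yields one empty combination, so callers always get at
    least one trial to run.
    """
    if not spec:
        return [{}]
    keys = list(spec.keys())
    return [dict(zip(keys, combo)) for combo in product(*spec.values())]
```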
This is now handled transparently by the framework. Backend authors no longer need to know about or call expand_param_grid() directly.
```python
elapsed = time.perf_counter() - t0
qps = n_queries / elapsed if elapsed > 0 else 0.0

recall = 0.0
```
Recall calculation: I've added a shared compute_recall() in backends/utils.py so that any bug fix or improvement happens in one place and applies to all Python-native backends. The inline recall block at lines 1027-1035 would be replaced with:
```python
from .utils import compute_recall

recall = 0.0
if groundtruth is not None:
    recall = compute_recall(neighbors, groundtruth, k)
```
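For reference, a straightforward recall@k implementation of the kind such a shared helper could contain (a sketch; the actual body in backends/utils.py may differ):

```python
def compute_recall(neighbors, groundtruth, k):
    """recall@k: the mean, over queries, of |retrieved top-k ∩ true top-k| / k.

    Assumes neighbors and groundtruth are (n_queries, >=k) integer ID
    matrices, e.g. numpy arrays of neighbor indices.
    """
    n_queries = neighbors.shape[0]
    hits = 0
    for i in range(n_queries):
        # Set intersection is order-insensitive: only membership in the
        # true top-k matters, not rank.
        hits += len(set(neighbors[i, :k]) & set(groundtruth[i, :k]))
    return hits / (n_queries * k)
```

Keeping it in one place means a bug fix (say, in tie handling) propagates to every Python-native backend at once.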
Running the parameters and calculating recall should be done by the orchestrator not the individual backends. The orchestrator should be controlling this end to end. Backends should not even have the option to do this because the orchestrator should be the one controlling it all.
Please look at how ann-benchmarks is architected. That's what we want to do here (and that's what the original cuVS-bench scripts were doing).
The original cuVS-bench scripts had the framework control everything end-to-end, with recall computed by the C++ subprocess. This is still the case in the new pluggable benchmark API for the C++ backend. Python-native backends (OpenSearch, Elasticsearch) don't have this since they return raw neighbor IDs over HTTP. Moving recall computation to the orchestrator would require handling the case where the C++ backend already returns recall, to avoid overwriting it. One way to distinguish this is by checking neighbors.size > 0, but that relies on the C++ backend returning empty neighbor arrays rather than an explicit signal, and if any other backend also returns empty neighbors for a different reason, the orchestrator would wrongly skip recall computation. We could also hardcode an exception for the C++ backend in the orchestrator, but that goes against the pluggable design since any future backend that also computes recall internally would need another exception.
I looked at how ann-benchmarks handles this. In ann-benchmarks, every algorithm returns only raw neighbor indices; the framework stores them as-is, and recall is computed downstream during analysis, never by the algorithm itself, as you pointed out. In our case, the C++ backend is the exception because its subprocess design doesn't return raw neighbors to Python: it computes recall internally and returns it in the JSON output. Following the ann-benchmarks pattern, the orchestrator would compute recall for any backend that returns actual neighbor arrays (neighbors.size > 0) and preserve the C++ backend's recall when it returns empty neighbors. This was the main reason I initially kept recall in the backends rather than the orchestrator: relying on neighbors.size > 0 to skip the C++ backend is fragile if other backends also return empty neighbors for different reasons. I will have recall computed at the orchestrator level, skipping it for the C++ backend (via neighbors.size > 0), and I am open to suggestions.
recall is now computed by the orchestrator after backend.search() returns. Backends no longer compute it themselves. The C++ backend is naturally skipped since it returns empty neighbor arrays (recall is already computed by the subprocess).
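The orchestrator-side guard described here could be sketched as follows (function and parameter names are hypothetical; `compute_recall` is passed in as a stand-in for the shared utility):

```python
def maybe_compute_recall(neighbors, groundtruth, k, compute_recall):
    """Orchestrator-side recall guard, per the discussion above.

    The C++ backend returns an empty neighbor array because its subprocess
    already computed recall, so neighbors.size > 0 decides whether the
    orchestrator computes recall itself. Returning None means "keep whatever
    recall the backend already reported".
    """
    if groundtruth is None or neighbors.size == 0:
        return None
    return compute_recall(neighbors, groundtruth, k)
```

As noted above, this keys off the empty-array convention rather than an explicit signal, which is the fragile part of the design that may deserve a follow-up.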
Would still like to refine a bit but pushing up for early feedback
cc @singhmanas1 @janakivamaraju @afourniernv
xref #1907 which does something similar but for Elastic
EDIT: Okay, this is now ready for review (cc @jnke2016)