[BUG] KMeansPlusPlus significantly degrades kmeans performance #2266

We are trying to use the new PQ API to speed up DiskANN/AiSAQ PQ computation.
We see a big performance difference between kmeans++ and random initialization.
For example, on SIFT1M with a 256k-vector training set and 12 kmeans iterations, kmeans takes:
17.5 sec with kmeans++
6.5 sec with random initialization
on an A100 GPU.
We still need the kmeans++ method, as it gives a recall improvement of roughly 1-2% or more.
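
For context, here is a minimal NumPy sketch of the two seeding strategies being compared. It is the textbook formulation, not the library's GPU implementation, and the function names `random_init` and `kmeans_plus_plus_init` are hypothetical. It only illustrates one plausible source of the gap: textbook kmeans++ makes one full pass over the training set per chosen center, and those passes are sequential, while random initialization is a single sampling step.

```python
import numpy as np


def random_init(x, k, rng):
    """Pick k distinct training points uniformly at random (one sampling step)."""
    idx = rng.choice(x.shape[0], size=k, replace=False)
    return x[idx].copy()


def kmeans_plus_plus_init(x, k, rng):
    """Textbook k-means++ seeding (hypothetical helper, not a library API)."""
    n = x.shape[0]
    centers = np.empty((k, x.shape[1]), dtype=x.dtype)
    centers[0] = x[rng.integers(n)]
    # Squared distance from every point to its nearest chosen center so far.
    # Accumulate in float64 so the sampling probabilities sum to 1 accurately.
    min_d2 = np.sum((x - centers[0]) ** 2, axis=1, dtype=np.float64)
    for i in range(1, k):
        # Sample the next center with probability proportional to min_d2.
        probs = min_d2 / min_d2.sum()
        centers[i] = x[rng.choice(n, p=probs)]
        # Each new center needs another full pass to update the distances,
        # and pass i cannot start before center i has been chosen.
        d2 = np.sum((x - centers[i]) ** 2, axis=1, dtype=np.float64)
        np.minimum(min_d2, d2, out=min_d2)
    return centers
```

With k centers and n training points of dimension d, this seeding costs on the order of n * k * d operations, roughly the cost of one extra Lloyd iteration, but split into k dependent steps. Whether that is what dominates the timings reported here would need profiling.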


