Skip to content

Fuzzy near-duplicate detection with MinHash #1

Description

@delcenjo

Matching is exact or normalized-exact today, so paraphrased duplicates slip through. A MinHash/LSH pass with a configurable Jaccard threshold would catch fuzzy near-duplicates, kept behind a flag so the default stays fast and deterministic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions