[MRG] Fixes #276: expose SBD as tslearn.metrics.sbd / cdist_sbd #673
jbbqqf wants to merge 1 commit into
Issue tslearn-team#276 asks for a stand-alone Shape-Based Distance (SBD) function, like the existing ``tslearn.metrics.dtw`` and ``soft_dtw``. Until now SBD was only reachable through ``cdist_normalized_cc`` plus a ``1 - cc.max()`` formula buried inside ``KShape._cross_dists``.

This adds ``tslearn.metrics.sbd`` (scalar) and ``cdist_sbd`` (pairwise), both implemented on top of the existing ``cycc`` numba kernel so that their behaviour matches what KShape already does internally. ``cdist_sbd`` deliberately calls ``cdist_normalized_cc`` with ``self_similarity=False``, because the ``self_similarity=True`` branch is tailored to KShape's in-loop usage: it zeros the diagonal of ``cc``, which turns into an incorrect SBD of 1 on the diagonal once ``1 - cc`` is applied. A code comment in ``_sbd.py`` notes that constraint.

A regression test, ``tests/test_metrics.py::test_sbd_public``, verifies identity, the shape-mismatch error, distance-matrix invariants (zero diagonal, symmetry, non-negativity), and parity with the raw ``normalized_cc`` path.
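For reference, the quantity being exposed can be sketched in pure NumPy. This is a minimal illustration, not the PR's code: `normalized_cc_sketch` and `sbd_sketch` are hypothetical names, the sketch uses a direct `numpy.correlate` loop instead of the FFT-based numba kernel, and it assumes the multivariate convention we understand tslearn's `normalized_cc` to follow (per-dimension cross-correlations summed over dimensions, divided by the product of the two series' norms). SBD is then one minus the best correlation over all lags.

```python
import numpy as np


def normalized_cc_sketch(s1, s2):
    """Hypothetical helper: normalized cross-correlation of two series over all lags.

    Assumes tslearn-like conventions: 1-D inputs are treated as (sz, 1),
    per-dimension correlations are summed, and the sum is divided by the
    product of the Frobenius norms of the two series.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    if s1.ndim == 1:
        s1 = s1[:, None]
    if s2.ndim == 1:
        s2 = s2[:, None]
    if s1.shape[1] != s2.shape[1]:
        raise ValueError("s1 and s2 must have the same number of dimensions")
    denom = np.linalg.norm(s1) * np.linalg.norm(s2)
    if denom < 1e-9:
        denom = np.inf  # all-zero series: correlation is 0 at every lag
    # mode="full" yields len(s1) + len(s2) - 1 lags for each dimension.
    cc = sum(
        np.correlate(s1[:, k], s2[:, k], mode="full") for k in range(s1.shape[1])
    )
    return cc / denom


def sbd_sketch(s1, s2):
    """Hypothetical SBD: 1 minus the maximum normalized cross-correlation."""
    return 1.0 - normalized_cc_sketch(s1, s2).max()


x = [1.0, 2.0, 3.0, 2.0, 1.0]
print(sbd_sketch(x, x))                # identical series: ~0.0
print(sbd_sketch(x, np.roll(x, 1)))    # shifted copy: ~0.0526 (= 1/19)
print(sbd_sketch(x, [-v for v in x]))  # negated copy: ~1.0526
```

Because the maximum is taken over every lag, a shifted copy stays close to 0, while the negated copy lands above 1; the value always lies in [0, 2].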
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #673      +/-   ##
==========================================
+ Coverage   93.70%   93.73%   +0.03%
==========================================
  Files          73       74       +1
  Lines        6986     7027      +41
==========================================
+ Hits         6546     6587      +41
  Misses        440      440
charavelg left a comment
Thanks for the PR! A few minor changes are suggested to get this great PR over the goal line.
    "sbd",
    "cdist_sbd",
These need to be added to docs/gen_modules/tslearn.metrics.rst for proper API documentation.
    sbd : Scalar SBD between two time series.
    cdist_normalized_cc : Underlying cross-correlation matrix.
    """
    dataset1 = to_time_series_dataset(dataset1).astype(numpy.float64)
to_time_series_dataset already uses dtype=float, which is numpy.float64 for the numpy backend, and the torch backend won't support astype anyway, so I think astype is redundant.
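The NumPy side of this point can be checked directly (the torch-backend remark is not exercised here). This minimal sketch shows that `dtype=float` already yields `float64` and that `ndarray.astype` copies by default (`copy=True`), so the extra call adds an allocation without changing the data.

```python
import numpy as np

# Python's builtin float maps to NumPy's 64-bit float dtype.
arr = np.asarray([[1, 2, 3], [4, 5, 6]], dtype=float)
print(arr.dtype)  # float64

# astype(np.float64) on an array that is already float64 returns an equal
# but separate copy, because copy=True is the default.
converted = arr.astype(np.float64)
print(np.array_equal(converted, arr), converted is arr)  # True False

# If a cast were genuinely needed, copy=False returns the input array itself
# when the dtype (and order/subok requirements) already match.
same = arr.astype(np.float64, copy=False)
print(same is arr)  # True
```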
    if dataset2 is None:
        dataset2 = dataset1
    else:
        dataset2 = to_time_series_dataset(dataset2).astype(numpy.float64)
    )
    # normalized_cc returns the full lag-correlation vector; SBD is 1 minus
    # its max, matching KShape._cross_dists.
    cc = normalized_cc(s1.astype(numpy.float64), s2.astype(numpy.float64))
to_time_series already uses dtype=float, which is numpy.float64 for the numpy backend, and the torch backend won't support astype anyway, so I think the astype calls are redundant.
    SBD is the distance used inside :class:`tslearn.clustering.KShape`. Until now
    it was only available indirectly via :func:`cdist_normalized_cc`. Issue #276
    asks for a function-level handle, the way :func:`tslearn.metrics.dtw` exposes
    DTW as a stand-alone distance.
Not sure the "current" state and the issue are worth mentioning in the module docstring, even though it won't be included in the API docs.
Summary
Adds `tslearn.metrics.sbd` (scalar) and `tslearn.metrics.cdist_sbd` (pairwise) so that the Shape-Based Distance used inside `KShape` is reachable from user code without re-deriving the `1 - cdist_normalized_cc(...)` formula. Issue #276 asked for exactly this: SBD as a stand-alone distance, the way `dtw` / `soft_dtw` / `lcss` already are.

Fixes #276: SBD distance function.
Context
`KShape._cross_dists` (in `tslearn/clustering/kshape.py:153`) computes its distance as `1 - cdist_normalized_cc(...)`, but `normalized_cc` is private, and `cdist_normalized_cc(self_similarity=True)` does not produce a valid SBD matrix on its own: it zero-fills the diagonal, so `1 - cc` ends up with `1` on the diagonal of the SBD matrix. As a result, users porting KShape-style code either re-implement SBD themselves or end up with a wrong distance matrix.

This PR exposes SBD as a small wrapper around the same numba-jitted kernel KShape already uses, with a comment in the source explaining why we deliberately pass `self_similarity=False` to `cdist_normalized_cc`.

Changes
- `tslearn/metrics/_sbd.py`: new private module with two public helpers:
  - `sbd(s1, s2)`: scalar SBD with `to_time_series` normalisation and a clear `ValueError` when shapes mismatch.
  - `cdist_sbd(dataset1, dataset2=None)`: pairwise SBD that always runs the kernel with `self_similarity=False` (the branch used by KShape produces an SBD matrix with `1` on the diagonal, a footgun for callers expecting a true distance matrix). A 4-line comment in the function explains that constraint.
- `tslearn/metrics/__init__.py`: re-exports `sbd` and `cdist_sbd` (added to `__all__`).
- `tests/test_metrics.py::test_sbd_public`: regression test covering identity, the shape error, distance-matrix invariants (zero diagonal, symmetry, non-negativity), parity with the raw `cdist_normalized_cc` path, and the output shape of the asymmetric `cdist_sbd(X, Y)` call.
- `CHANGELOG.md`: entry under `[Towards v0.9.0] / Added`.

KShape itself is unchanged; this is purely an API addition.
Reproduce BEFORE/AFTER yourself (copy-paste)
What I ran locally
- `pytest tests/test_metrics.py::test_sbd_public -v` → 1 passed.
- `pytest --doctest-modules tslearn/metrics/_sbd.py` → 2 passed (the two examples in the new docstrings).
- `np.allclose(cdist_sbd(X), 1 - cdist_normalized_cc(X, X, ..., self_similarity=False))` is `True` on a random `(4, 6, 1)` dataset (covered by the regression test).

Edge cases tested
| Case | Expected | Covered by |
|---|---|---|
| `sbd([1,2,3], [1,2,3])` | `0.0` | `test_sbd_public` |
| `sbd([1,2,3], [[1,2],[3,4]])` | `ValueError` | `test_sbd_public` (`pytest.raises`) |
| `cdist_sbd(X)` for random `X` of shape `(4,6,1)` | `1 - cdist_normalized_cc(self_similarity=False)` | `test_sbd_public` |
| `cdist_sbd(X, Y)` | shape `(4,2)`; `D2[i,j] == sbd(X[i], Y[j])` | `test_sbd_public` |

Risk / blast radius
Additive only: two new public names in `tslearn.metrics`, a new private module file, and a regression test. KShape and every existing metric stay byte-identical.

Release note
PR drafted with assistance from Claude Code. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim. The implementation was checked against `KShape._cross_dists` (`tslearn/clustering/kshape.py:153`) to ensure parity.