[MRG] Fixes #276: expose SBD as tslearn.metrics.sbd / cdist_sbd #673

Open
jbbqqf wants to merge 1 commit into tslearn-team:main from jbbqqf:feat/276-expose-sbd-distance

@jbbqqf commented May 9, 2026

Summary

Adds tslearn.metrics.sbd (scalar) and tslearn.metrics.cdist_sbd (pairwise) so the Shape-Based Distance used inside KShape is reachable from user code without re-deriving the 1 - cdist_normalized_cc(...).max() formula. Issue #276 asked for exactly this: SBD as a stand-alone distance the way dtw / soft_dtw / lcss already are.

Fixes #276 (SBD distance function).
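For readers who want to see what the formula computes, here is a minimal NumPy sketch of SBD(x, y) = 1 - max over lags of the normalized cross-correlation. It is not the code in _sbd.py, which calls tslearn's numba kernel. sbd_sketch is a hypothetical name, and the sketch assumes equal-shape inputs of shape (sz,) or (sz, d), cross-correlation summed over dimensions, and normalization by the product of the two Frobenius norms; the zero-norm handling is also an assumption.

```python
import numpy as np


def sbd_sketch(s1, s2):
    """Hypothetical NumPy sketch of the Shape-Based Distance.

    SBD(x, y) = 1 - max over lags of the normalized cross-correlation.
    Inputs are array-likes of shape (sz,) or (sz, d) with equal shapes.
    """
    x = np.asarray(s1, dtype=float)
    y = np.asarray(s2, dtype=float)
    # Treat univariate input as a single-dimension series: (sz,) -> (sz, 1)
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    denom = np.linalg.norm(x) * np.linalg.norm(y)  # product of Frobenius norms
    if denom < 1e-9:
        # Assumption: a (near-)zero series has zero correlation, so SBD = 1
        return 1.0
    # Cross-correlation at every lag (mode="full"), summed over dimensions
    cc = sum(np.correlate(x[:, k], y[:, k], mode="full") for k in range(x.shape[1]))
    return float(1.0 - cc.max() / denom)


print(sbd_sketch([1, 2, 3], [1, 2, 3]))        # close to 0.0 (identical series)
print(sbd_sketch([0, 1, 2, 0], [1, 2, 0, 0]))  # close to 0.0 (same shape, shifted)
```

Taking the max over all lags is what makes SBD shift-invariant: the second pair differs only by a one-step shift, so its distance is also (numerically) zero.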

Context

KShape._cross_dists (in tslearn/clustering/kshape.py:153) computes its distance as 1 - cdist_normalized_cc(...), but normalized_cc is private, and cdist_normalized_cc(self_similarity=True) does not yield a valid SBD matrix on its own: it zero-fills the diagonal of the cross-correlation matrix, so 1 - cc has 1 on the diagonal instead of 0. As a result, users porting KShape-style code either re-implement SBD or end up with a wrong distance matrix.
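To make the diagonal problem concrete, the following pure-NumPy sketch builds a pairwise matrix of maximum normalized cross-correlation (NCC) both ways for univariate series. max_ncc and cc_matrix are hypothetical helpers; cc_matrix is a simplified stand-in for the two branches of cdist_normalized_cc that only reproduces the zero-filled diagonal, not the kernel's symmetric fill, multivariate handling or numba parallelism.

```python
import numpy as np


def max_ncc(x, y):
    """Max normalized cross-correlation of two 1-D series over all lags."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.correlate(x, y, mode="full").max() / (np.linalg.norm(x) * np.linalg.norm(y))


def cc_matrix(dataset, self_similarity):
    """Hypothetical stand-in for the kernel's two branches (diagonal only)."""
    n = len(dataset)
    cc = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if self_similarity and i == j:
                cc[i, j] = 0.0  # diagonal left at zero, as in the self_similarity=True branch
            else:
                cc[i, j] = max_ncc(dataset[i], dataset[j])
    return cc


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))

sbd_wrong = 1.0 - cc_matrix(X, self_similarity=True)
sbd_right = 1.0 - cc_matrix(X, self_similarity=False)
print(np.diag(sbd_wrong))                     # [1. 1. 1. 1.]
print(np.allclose(np.diag(sbd_right), 0.0))  # True
```

Off the diagonal both matrices agree; only the zero-filled diagonal turns into a spurious distance of 1 once 1 - cc is applied.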

This PR exposes SBD as a small wrapper around the same numba-jitted kernel KShape already uses, with a comment in the source explaining why we deliberately pass self_similarity=False to cdist_normalized_cc.

Changes

  • tslearn/metrics/_sbd.py: new private module with two public helpers:
    • sbd(s1, s2): scalar SBD with to_time_series normalisation and a clear ValueError when shapes mismatch.
    • cdist_sbd(dataset1, dataset2=None): pairwise SBD that always runs the kernel with self_similarity=False (the self_similarity=True branch, tailored to KShape's internal use, produces an SBD matrix with 1 on the diagonal, which is a footgun for callers expecting a true distance matrix). A 4-line comment in the function explains that constraint.
  • tslearn/metrics/__init__.py: re-exports sbd and cdist_sbd (added to __all__).
  • tests/test_metrics.py::test_sbd_public: regression test covering identity, shape error, distance-matrix invariants (zero diagonal, symmetry, non-negativity), parity with the raw cdist_normalized_cc path, and the shape of asymmetric cdist_sbd(X, Y).
  • CHANGELOG.md: entry under [Towards v0.9.0] / Added.
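A rough NumPy picture of what cdist_sbd computes, assuming datasets of shape (n_ts, sz, d) with matching sz and d. cdist_sbd_sketch and _max_ncc are hypothetical names. The PR describes the real function as a wrapper around the numba kernel rather than a Python double loop, and the review snippets show it calling to_time_series_dataset on its inputs, which this sketch replaces with np.asarray.

```python
import numpy as np


def _max_ncc(x, y):
    # x, y: arrays of shape (sz, d); cross-correlation summed over dimensions
    cc = sum(np.correlate(x[:, k], y[:, k], mode="full") for k in range(x.shape[1]))
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return cc.max() / denom if denom > 0 else 0.0


def cdist_sbd_sketch(dataset1, dataset2=None):
    """Pairwise SBD matrix of shape (n1, n2); hypothetical NumPy stand-in.

    Every pair, including i == j, is computed explicitly, which is what
    running the kernel with self_similarity=False amounts to.
    """
    X = np.asarray(dataset1, dtype=float)
    Y = X if dataset2 is None else np.asarray(dataset2, dtype=float)
    if X.shape[1:] != Y.shape[1:]:
        raise ValueError(f"incompatible shapes: {X.shape} vs {Y.shape}")
    D = np.empty((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            D[i, j] = 1.0 - _max_ncc(X[i], Y[j])
    return D


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6, 1))
Y = rng.normal(size=(2, 6, 1))
print(cdist_sbd_sketch(X).shape)     # (4, 4)
print(cdist_sbd_sketch(X, Y).shape)  # (4, 2)
```

Reusing X when dataset2 is None mirrors the cdist_sbd(dataset1, dataset2=None) signature, and computing the i == j entries explicitly is what keeps the diagonal at (numerically) zero.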

KShape itself is unchanged; this is purely an API addition.

Reproduce BEFORE/AFTER yourself (copy-paste)

```bash
# --- one-time setup ---
git clone https://github.com/tslearn-team/tslearn.git /tmp/repro && cd /tmp/repro
python -m venv .venv && source .venv/bin/activate
pip install -e . pytest >/dev/null

# Pull the regression test into the working tree
git fetch https://github.com/jbbqqf/tslearn.git feat/276-expose-sbd-distance
git checkout FETCH_HEAD -- tests/test_metrics.py

# --- BEFORE (origin/main) ---
git checkout origin/main -- tslearn/metrics/__init__.py
pytest tests/test_metrics.py::test_sbd_public -q
# Expected: FAILED (ImportError: cannot import name 'sbd' from 'tslearn.metrics')

# --- AFTER (this PR) ---
git checkout FETCH_HEAD -- tslearn/metrics/__init__.py tslearn/metrics/_sbd.py
pytest tests/test_metrics.py::test_sbd_public -q
# Expected: 1 passed
```

What I ran locally

  • pytest tests/test_metrics.py::test_sbd_public -v → 1 passed.
  • pytest --doctest-modules tslearn/metrics/_sbd.py → 2 passed (the two examples in the new docstrings).
  • Smoke check: np.allclose(cdist_sbd(X), 1 - cdist_normalized_cc(X, X, ..., self_similarity=False)) is True on a random (4, 6, 1) dataset (covered by the regression test).

Edge cases tested

| # | Scenario | Input | Expected | Verified by |
|---|---|---|---|---|
| 1 | Identical series | sbd([1,2,3], [1,2,3]) | 0.0 | test_sbd_public |
| 2 | Mismatched shapes | sbd([1,2,3], [[1,2],[3,4]]) | ValueError | test_sbd_public (pytest.raises) |
| 3 | Self-pairwise distance matrix | cdist_sbd(X) for random X of shape (4,6,1) | symmetric, zero diagonal, non-negative entries; matches 1 - cdist_normalized_cc(self_similarity=False) | test_sbd_public |
| 4 | Asymmetric pairwise | cdist_sbd(X, Y) | shape (4,2); D2[i,j] == sbd(X[i], Y[j]) (exact match) | test_sbd_public |
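The four rows of the table can be mirrored in a self-contained NumPy sketch. This is not test_sbd_public (its code is not shown in this PR description); sbd_sketch, cdist_sbd_sketch and check_edge_cases are hypothetical names, and the parity check against cdist_normalized_cc is left out because it needs tslearn itself.

```python
import numpy as np


def sbd_sketch(s1, s2):
    # Hypothetical scalar SBD: equal-shape inputs of shape (sz,) or (sz, d)
    x = np.asarray(s1, dtype=float)
    y = np.asarray(s2, dtype=float)
    x, y = x.reshape(x.shape[0], -1), y.reshape(y.shape[0], -1)
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    cc = sum(np.correlate(x[:, k], y[:, k], mode="full") for k in range(x.shape[1]))
    return float(1.0 - cc.max() / (np.linalg.norm(x) * np.linalg.norm(y)))


def cdist_sbd_sketch(X, Y=None):
    # Hypothetical pairwise SBD: every (i, j) pair is computed explicitly
    Y = X if Y is None else Y
    return np.array([[sbd_sketch(a, b) for b in Y] for a in X])


def check_edge_cases(seed=0):
    # Row 1: identical series give 0
    assert abs(sbd_sketch([1, 2, 3], [1, 2, 3])) < 1e-12
    # Row 2: mismatched shapes raise ValueError
    try:
        sbd_sketch([1, 2, 3], [[1, 2], [3, 4]])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(4, 6, 1))
    Y = rng.normal(size=(2, 6, 1))
    # Row 3: symmetric, zero diagonal, non-negative (up to rounding)
    D = cdist_sbd_sketch(X)
    assert np.allclose(D, D.T)
    assert np.allclose(np.diag(D), 0.0)
    assert (D > -1e-12).all()
    # Row 4: asymmetric shape and element-wise agreement with the scalar version
    D2 = cdist_sbd_sketch(X, Y)
    assert D2.shape == (4, 2)
    assert all(D2[i, j] == sbd_sketch(X[i], Y[j]) for i in range(4) for j in range(2))
    return True


print(check_edge_cases())  # True
```

Non-negativity is checked with a small tolerance because 1 - cc.max() / (norm * norm) can land a few ULPs below zero on the diagonal.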

Risk / blast radius

Additive only — two new public names in tslearn.metrics, a new private module file, and a regression test. KShape and every existing metric stay byte-identical.

Release note

metrics: tslearn.metrics now exposes sbd() and cdist_sbd(), the Shape-Based Distance used by KShape (#276).

PR drafted with assistance from Claude Code. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim. The implementation was checked against tslearn/clustering/kshape.py:153 KShape._cross_dists to ensure parity.

Issue tslearn-team#276 asks for a stand-alone Shape-Based Distance (SBD) function,
the way ``tslearn.metrics.dtw`` and ``soft_dtw`` already are. Until now
SBD was only reachable through ``cdist_normalized_cc`` plus a
``1 - cc.max()`` formula buried inside ``KShape._cross_dists``.

This adds ``tslearn.metrics.sbd`` (scalar) and ``cdist_sbd`` (pairwise),
both implemented on top of the existing ``cycc`` numba kernel so the
behaviour matches what KShape already does internally. ``cdist_sbd``
deliberately drives ``cdist_normalized_cc`` with ``self_similarity=False``
because the ``self_similarity=True`` branch is tailored for KShape's
in-loop usage (it zeros the diagonal of ``cc``, which turns into an
incorrect SBD of 1 once we apply ``1 - cc``). A code comment in ``_sbd.py`` notes
that constraint.

A regression test in ``tests/test_metrics.py::test_sbd_public`` verifies
identity, shape error, distance-matrix invariants (zero diagonal,
symmetry, non-negativity), and parity with the raw normalized_cc path.

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.73%. Comparing base (2f2029a) to head (11320a1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #673      +/-   ##
==========================================
+ Coverage   93.70%   93.73%   +0.03%     
==========================================
  Files          73       74       +1     
  Lines        6986     7027      +41     
==========================================
+ Hits         6546     6587      +41     
  Misses        440      440              


@charavelg (Contributor) left a comment

Thanks for the PR! Some minor changes are suggested to help this great PR cross the goal line.

Comment on lines +133 to +134
"sbd",
"cdist_sbd",

These need to be added to docs/gen_modules/tslearn.metrics.rst for proper API documentation.

Comment thread tslearn/metrics/_sbd.py
sbd : Scalar SBD between two time series.
cdist_normalized_cc : Underlying cross-correlation matrix.
"""
dataset1 = to_time_series_dataset(dataset1).astype(numpy.float64)

to_time_series_dataset already uses dtype=float, which is numpy.float64 for the NumPy backend, and the torch backend won't support astype anyway, so I think astype is redundant.
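The reviewer's point about the redundant cast can be checked with plain NumPy. This snippet does not call to_time_series_dataset itself; it only shows that dtype=float already means float64, and that astype copies by default even when the dtype already matches, so the extra call costs a copy. The torch backend is not covered here.

```python
import numpy as np

# dtype=float is an alias that NumPy maps to float64
arr = np.asarray([[1, 2, 3]], dtype=float)
print(arr.dtype)  # float64

# astype copies by default (copy=True), even when the dtype already matches
same = arr.astype(np.float64)
print(same is arr)  # False

# With copy=False, the original array is returned when no conversion is needed
print(arr.astype(np.float64, copy=False) is arr)  # True
```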

Comment thread tslearn/metrics/_sbd.py
if dataset2 is None:
dataset2 = dataset1
else:
dataset2 = to_time_series_dataset(dataset2).astype(numpy.float64)

Same as above

Comment thread tslearn/metrics/_sbd.py
)
# normalized_cc returns the full lag-correlation vector; SBD is 1 minus
# its max, matching KShape._cross_dists.
cc = normalized_cc(s1.astype(numpy.float64), s2.astype(numpy.float64))

to_time_series already uses dtype=float, which is numpy.float64 for the NumPy backend. The torch backend won't support astype anyway, so I think the astype calls are redundant.

Comment thread tslearn/metrics/_sbd.py
Comment on lines +3 to +6
SBD is the distance used inside :class:`tslearn.clustering.KShape`. Until now
it was only available indirectly via :func:`cdist_normalized_cc`. Issue #276
asks for a function-level handle, the way :func:`tslearn.metrics.dtw` exposes
DTW as a stand-alone distance.

Not sure the "current" state and the issue are worth mentioning in the module docstring, even though it won't be included in the API docs.


Development

Successfully merging this pull request may close these issues.

SBD distance function
