Dynamic time warping in native PyTorch, with CPU and CUDA backends.
pip install torchdtw
This package requires PyTorch 2.10 or later. It is developed against the PyTorch 2.10 Stable ABI and compiled for CUDA architectures from Volta to Blackwell. It is available on Linux (with CUDA support) and on macOS and Windows (without CUDA). It was originally written for fastabx, but it can be used in other projects. Only exact DTW is implemented; there are no plans to add variants.
This package provides three functions:
dtw(distances)
Compute the DTW cost of the given 2D distances tensor.
Use +inf to mask forbidden alignments. NaN distances are unsupported: the result is
unspecified and may differ between the CPU and CUDA backends. Integer distances accumulate
the cost in their own dtype and may overflow on long sequences; use a wide enough integer dtype
or a floating dtype.
Parameters:
- distances (Tensor) – A 2D tensor of shape (n, m) representing the pairwise distances between two sequences.
Returns:
Tensor – A scalar tensor with the cost.
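To make the meaning of the returned cost concrete, here is a minimal pure-Python sketch of a DTW cost over a distance matrix. It assumes the classic unnormalized recurrence with three allowed steps (down, right, diagonal); the source does not spell out the step pattern, so treat this as an illustration rather than the package's implementation. The name `dtw_cost` is hypothetical and is not part of torchdtw.

```python
import math


def dtw_cost(distances):
    """Illustrative DTW cost for an (n, m) nested list of pairwise distances.

    Assumption: classic recurrence with steps (i-1, j), (i, j-1), (i-1, j-1).
    """
    n, m = len(distances), len(distances[0])
    # acc[i][j] is the minimal cumulative cost of any warping path ending at (i, j).
    acc = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0  # the path starts at the top-left cell
            else:
                best = min(
                    acc[i - 1][j] if i > 0 else math.inf,
                    acc[i][j - 1] if j > 0 else math.inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            acc[i][j] = distances[i][j] + best
    return acc[n - 1][m - 1]


d = [
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 2.0],
    [3.0, 2.0, 1.0],
]
cost = dtw_cost(d)  # the cheapest path follows the diagonal

# A +inf entry forbids an alignment: the path must route around cell (1, 1).
masked = [
    [1.0, 2.0, 3.0],
    [2.0, math.inf, 2.0],
    [3.0, 2.0, 1.0],
]
masked_cost = dtw_cost(masked)
```

In this sketch the unmasked matrix yields a cost of 3.0 along the diagonal, while masking the center cell with +inf forces a detour and raises the cost to 6.0.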
dtw_batch(distances, sx, sy, *, symmetric)
Compute the batched DTW cost of the 4D distances tensor.
Only the (sx[i], sy[j]) sub-block of each pair is read, so padding beyond the sequence
lengths is ignored. Every sx[i] must be <= s1 and every sy[j] <= s2: the CPU backend
validates this, but the CUDA backend assumes it and reads out of bounds if violated. Use +inf
to mask forbidden alignments. NaN distances are unsupported: the result is unspecified and may
differ between the CPU and CUDA backends. Integer distances accumulate the cost in their own
dtype and may overflow on long sequences; use a wide enough integer dtype or a floating dtype.
Parameters:
- distances (Tensor) – A 4D tensor of shape (n1, n2, s1, s2) representing the pairwise distances between two batches of sequences.
- sx (Tensor) – A 1D tensor of shape (n1,) representing the lengths of the sequences in the first batch.
- sy (Tensor) – A 1D tensor of shape (n2,) representing the lengths of the sequences in the second batch.
- symmetric (bool) – Whether or not the DTW is symmetric (i.e., the two batches are the same).
Returns:
Tensor – A 2D tensor of shape (n1, n2) with the costs.
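The sub-block behaviour described above can be sketched in pure Python: for each pair (i, j), only the top-left (sx[i], sy[j]) corner of the padded matrix contributes to the cost. This is an illustrative model under the same assumed three-step recurrence, not the package's code; `dtw_cost` and `dtw_batch_reference` are hypothetical names, the `symmetric` flag is not modelled, and all lengths are assumed to be at least 1.

```python
import math


def dtw_cost(block):
    """Illustrative DTW cost of one (n, m) block (assumed three-step recurrence)."""
    n, m = len(block), len(block[0])
    # Padded accumulator: row 0 and column 0 act as +inf borders.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = block[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]


def dtw_batch_reference(distances, sx, sy):
    """distances is a nested list of shape (n1, n2, s1, s2); returns (n1, n2) costs."""
    costs = []
    for i, len_x in enumerate(sx):
        row = []
        for j, len_y in enumerate(sy):
            # Keep only the valid (sx[i], sy[j]) corner; the padding is never read.
            block = [r[:len_y] for r in distances[i][j][:len_x]]
            row.append(dtw_cost(block))
        costs.append(row)
    return costs


PAD = 99.0  # arbitrary padding value; it must not influence the result
distances = [
    [
        [[1.0, 5.0], [5.0, 1.0]],  # pair (0, 0): full 2 x 2 block is valid
        [[2.0, PAD], [3.0, PAD]],  # pair (0, 1): only the first column is valid
    ]
]
sx = [2]
sy = [2, 1]
costs = dtw_batch_reference(distances, sx, sy)
```

Here `costs` comes out as `[[2.0, 5.0]]`, and changing `PAD` leaves it unchanged, which is the property the sub-block rule is meant to guarantee.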
dtw_path(distances)
Compute the DTW path of the given 2D distances tensor.
No CUDA variant or batched implementation is provided for now.
Use +inf to mask forbidden alignments. NaN distances are unsupported and give an
unspecified path.
Parameters:
- distances (Tensor) – A 2D tensor of shape (n, m) representing the pairwise distances between two sequences.
Returns:
Tensor – A 2D tensor of shape (*, 2) with the path indices.
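As a rough picture of what a list of path indices looks like, the following pure-Python sketch fills the cumulative cost table and then backtracks from the bottom-right cell to the top-left one. The three-step recurrence and the tie-breaking order (diagonal first) are assumptions made for illustration; the package may resolve ties differently. `dtw_path_reference` is a hypothetical name.

```python
import math


def dtw_path_reference(distances):
    """Illustrative DTW path as a list of (i, j) pairs for an (n, m) nested list."""
    n, m = len(distances), len(distances[0])
    acc = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            prev = [0.0] if i == 0 and j == 0 else []
            if i > 0:
                prev.append(acc[i - 1][j])
            if j > 0:
                prev.append(acc[i][j - 1])
            if i > 0 and j > 0:
                prev.append(acc[i - 1][j - 1])
            acc[i][j] = distances[i][j] + min(prev)

    # Backtrack: repeatedly move to the cheapest allowed predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))  # diagonal first on ties
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i, j))
    path.reverse()
    return path


# Distances |x_i - y_j| for x = [0, 1] and y = [0, 1, 1].
d = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
]
path = dtw_path_reference(d)
```

For this input the sketch returns `[(0, 0), (1, 1), (1, 2)]`: three index pairs, matching the (*, 2) shape described above, where the last element of x is aligned with both trailing elements of y.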
For many DTWs on short sequences, prefer dtw_batch over a Python loop of dtw calls.
A single dtw_batch launches one CUDA kernel (one block per pair) or one parallel CPU
loop, amortizing dispatch, allocation, and launch overhead across the whole batch.
Check this folder for comparisons against reference implementations.
Please cite the fastabx paper if you use this package in your work:
@misc{fastabx,
  title={fastabx: A library for efficient computation of ABX discriminability},
  author={Maxime Poli and Emmanuel Chemla and Emmanuel Dupoux},
  year={2025},
  eprint={2505.02692},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.02692},
}