This is a beat tracker built from a tiny ConvNet (18k parameters) and a particle filter. Trained on an organic handcrafted 90-hour dataset of annotated music, mostly EDM and pop. All the details, benchmarks and videos are in a blog post :)
no_std, no alloc, and it's confirmed to run on the RP2350 Pico 2 at 180MHz.
For western dance-y music from radio pop to rawstyle, this is what you can expect compared to michaelkrzyzaniak/BTT and BTrack:
In demo, run cargo r -r -- FILE to play and visualize a file (best option). --device DEVICE uses a microphone input instead; if no device is given, the default is used. This works on Linux with ALSA at least. Use --latency 50 to compensate for latency in milliseconds if needed,
and --flash to enable flashier colors.
If you have a Raspberry Pi Pico 2 and an I2S microphone lying around, check out the blinky demo for that.
On platforms supported by CMSIS-DSP you should enable the cmsis cargo feature
and set the CMSIS_TARGET environment variable to the target CPU, for example cortex-m33.
Also be sure to add -C target-cpu=cortex-... to RUSTFLAGS for faster code.
let opts: BeattrackOpts = Default::default();
// Decide how to allocate 12175 floats of arena memory.
let arena = vec![0f32; Beattracker::arena_size(&opts)].leak();
// Scratch buffer (3213 floats) is only temporarily borrowed during computation.
let mut scratch = vec![0f32; Beattracker::scratch_size(&opts)];
let mut tracker = Beattracker::new(arena, opts);
// Acquire 16kHz audio samples, ideally in chunks <= 16ms for realtime use
let mut remaining_samples: &[f32] = samples;
while !remaining_samples.is_empty() {
    let result: Option<BeattrackResult>;
    // This needs to run in under 16ms for 256 input samples.
    (result, remaining_samples) = tracker.process_samples(remaining_samples, &mut scratch);
    if let Some(result) = result {
        if result.is_beat {
            do_something();
        }
    }
}
BeattrackOpts::const_default, arena_size and scratch_size are const functions, so you can allocate fixed size buffers at compile time.
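As a rough sketch of what compile-time sizing could look like without an allocator: the two types below are stand-ins written only so the snippet compiles on its own, not the crate's real definitions. Their const functions just return the sizes quoted in the comments above (12175 and 3213 floats for default options). `OPTS`, `ARENA_LEN`, `SCRATCH_LEN` and `Buffers` are hypothetical names introduced for illustration.

```rust
// Stand-ins (assumption): in real code, import BeattrackOpts and Beattracker
// from the crate instead of defining them here.
pub struct BeattrackOpts;

impl BeattrackOpts {
    pub const fn const_default() -> Self {
        BeattrackOpts
    }
}

pub struct Beattracker;

impl Beattracker {
    // Placeholder bodies: the values are the sizes quoted for default options.
    pub const fn arena_size(_opts: &BeattrackOpts) -> usize {
        12175
    }

    pub const fn scratch_size(_opts: &BeattrackOpts) -> usize {
        3213
    }
}

// Because the functions are const, the sizes are known at compile time
// and can be used as array lengths.
const OPTS: BeattrackOpts = BeattrackOpts::const_default();
pub const ARENA_LEN: usize = Beattracker::arena_size(&OPTS);
pub const SCRATCH_LEN: usize = Beattracker::scratch_size(&OPTS);

/// Both buffers as fixed-size arrays, with no heap allocation involved.
pub struct Buffers {
    pub arena: [f32; ARENA_LEN],
    pub scratch: [f32; SCRATCH_LEN],
}

impl Buffers {
    /// Zero-initialized buffers; const, so it can initialize a `static`.
    pub const fn new() -> Self {
        Buffers {
            arena: [0.0; ARENA_LEN],
            scratch: [0.0; SCRATCH_LEN],
        }
    }
}
```

With the real crate, the stand-in definitions would be dropped and only the `const` items and the arrays kept. How the arena is then handed to `Beattracker::new` depends on the lifetime that function requires, which this sketch does not assume.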
- ESP32-S3: too slow for now, takes 27ms for 16ms of input audio, 7ms of which is the FFT. It would need integration with Espressif's DSP library and a general look at codegen on Xtensa cores. No plans for that at the moment, more likely once the new RISC-V versions are out.
- Teensy 4, STM32H7S3: way overkill, no problem
- STM32H523 at 250MHz: works, 10.8ms max for 16ms of input.
- STM32F411 (Blackpill) at 100MHz: nope, takes 18.5ms for 16ms of input audio.
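The numbers above all compare processing time against the duration of the audio being processed: 256 samples at 16kHz last 16ms, so anything slower than that falls behind. A small sketch of that arithmetic, for checking your own measurements on another board. The function names here are made up for illustration and are not part of the crate.

```rust
/// Sample rate the tracker expects, as stated in the usage example.
pub const SAMPLE_RATE_HZ: f32 = 16_000.0;

/// Duration in milliseconds of a chunk of `samples` input samples.
pub fn chunk_budget_ms(samples: usize) -> f32 {
    samples as f32 * 1000.0 / SAMPLE_RATE_HZ
}

/// Fraction of the realtime budget used by one chunk.
/// Values above 1.0 mean the CPU cannot keep up with the input.
pub fn load(processing_ms: f32, samples: usize) -> f32 {
    processing_ms / chunk_budget_ms(samples)
}

/// True if a chunk is processed no slower than it arrives.
pub fn keeps_up(processing_ms: f32, samples: usize) -> bool {
    load(processing_ms, samples) <= 1.0
}
```

For example, `keeps_up(10.8, 256)` holds for the STM32H523 measurement, while the 18.5ms and 27ms figures do not fit. In practice you would want the worst-case time, not the average, to stay under the budget, with some headroom for audio capture and whatever reacts to the beats.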
Any of:
- GNU General Public License v3
- European Union Public Licence v. 1.2