test(separation): characterize stem-separation quality against known sources #557

Open
seonghobae wants to merge 2 commits into develop from test/stem-separation-quality-baseline

@seonghobae

Collaborator

What

Adds two characterization tests (tests/test_separation_quality.py) that measure how well the local stem separator recovers a known source from a mixture, using ground-truth signals we control.

Why

The stem separator is a frequency-band FFT heuristic, not neural source separation, yet nothing measured its actual separation quality. The existing test_separation.py covers only role-keyword mapping, pure-tone band routing, and error handling; no test checked that a known source is actually recovered from a mixture.
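
For readers unfamiliar with the approach, a frequency-band FFT heuristic can be sketched roughly as below. This is an illustration only, not the project's separator: the band edges and the names BANDS and band_split are assumptions made up for this example.

```python
import numpy as np

# Hypothetical band edges in Hz (assumed for illustration; the real
# separator's bands are not shown in this PR).
BANDS = {
    "bass": (20.0, 250.0),
    "vocals": (250.0, 4000.0),
    "other": (4000.0, 22050.0),
}


def band_split(mix: np.ndarray, sample_rate: int) -> dict[str, np.ndarray]:
    """Split a mono signal into stems by zeroing FFT bins outside each band."""
    spectrum = np.fft.rfft(mix)
    freqs = np.fft.rfftfreq(mix.size, d=1.0 / sample_rate)
    stems = {}
    for name, (low, high) in BANDS.items():
        # Keep only the bins whose centre frequency falls inside this band.
        mask = (freqs >= low) & (freqs < high)
        stems[name] = np.fft.irfft(spectrum * mask, n=mix.size)
    return stems
```

Because the mask looks only at frequency, any harmonic of a bass note that lies above the bass band is routed to another stem, regardless of which instrument produced it.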

The tests (measured facts, not a quality bar)

  • test_recovered_bass_is_not_high_fidelity_isolation: the SI-SDR of the recovered bass stem against the true bass source stays below a clean-isolation bound (~9 dB measured; a neural model would exceed ~20 dB on a signal this trivial).
  • test_bass_source_energy_leaks_across_stems: a lone harmonic bass source leaks ~11% of its energy into other stems, which shows that the separator splits by frequency band, not by source.

These pin current behaviour as a regression guard. If a real separation model (e.g. demucs) is introduced, SI-SDR will rise past these bounds and the assertions should be re-baselined.
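
For reference, the two quantities the tests rely on can be computed roughly as follows. This is a minimal sketch using the standard SI-SDR definition (project the estimate onto the reference, then compare target and residual energy); the helper names si_sdr and leaked_energy_share are hypothetical and need not match what tests/test_separation_quality.py actually uses.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    # Remove DC so the projection is not biased by a constant offset.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference that best explains the estimate.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    residual = estimate - target
    # Small epsilon (an assumption of this sketch) avoids division by zero
    # when the estimate is a perfect scaled copy of the reference.
    eps = 1e-12
    return float(10.0 * np.log10(np.dot(target, target) / (np.dot(residual, residual) + eps)))


def leaked_energy_share(stems: dict[str, np.ndarray], owner: str) -> float:
    """Fraction of the total stem energy that lands outside the `owner` stem."""
    energy = {name: float(np.dot(stem, stem)) for name, stem in stems.items()}
    total = sum(energy.values())
    return (total - energy[owner]) / total
```

A characterization test can then assert, for example, that si_sdr(recovered_bass, true_bass) stays under the chosen bound and that leaked_energy_share(stems, "bass") stays above a minimum, with both thresholds taken from measurement.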

Verification

uv run pytest tests --cov=src/bandscope_analysis --cov-fail-under=100 — 435 passed, 100% coverage. ruff check + ruff format --check clean.

🤖 Generated with Claude Code

https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C

seonghobae and others added 2 commits July 5, 2026 15:44
…sources

The local stem separator is a frequency-band FFT heuristic, not neural
source separation — but nothing measured how well it recovers a known
source from a mixture (existing tests only check role keyword mapping,
band routing of pure tones, and error handling).

Add two characterization tests over a controlled ground-truth mix
(harmonic-rich bass + vocal-band tone):
- recovered bass stem SI-SDR stays below a clean-isolation bar (~9 dB
  measured; a neural model would exceed ~20 dB on a signal this simple)
- a lone bass source leaks a meaningful energy share (~11%) into other
  stems, proving it splits by frequency band, not by source

These pin current behaviour and act as a regression guard; the bounds
should be re-baselined if a real separation model is introduced.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
On overlapping instruments (bass/keys/voice sharing bands) plus broadband
drums, the band-split heuristic scores a NEGATIVE mean SI-SDR — for most
stems the output is further from the true source than the mixture itself.
Real neural separators are positive here (Demucs ~+9 dB, Open-Unmix ~+5 dB
on MUSDB18). This pins that the current feature is not source separation.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C
seonghobae added a commit that referenced this pull request Jul 5, 2026
The separator is coarse FFT band-splitting (audio_separator.py: 'coarse
canonical frequency and percussion bands'), not source separation —
measured SI-SDR ~ -39 dB on a realistic mix vs ~+9 dB for demucs
(characterization tests in PR #557). 'rough stem previews' matches the
README's own hedged voice ('likely harmony', 'visible confidence') and
the existing scope disclaimer, without overclaiming DAW-grade separation.

Co-Authored-By: Claude Opus 4.8 <[email protected]>
Claude-Session: https://claude.ai/code/session_01RjGVapDZ3k7V7zKYk16P4C