Dup names #4950
Conversation
|
Great, it can't find my commit (to be fair, I am also confused about where the commit lives). |
|
This is failing now because it needs some fixes I put in #4945 |
|
OK, here's the problem: The check for duplicate entries in the FASTA is in the index load code. If you run the test more than once locally without deleting the index, the FASTA is already indexed and the index load code detects the duplication. If you run the test in a clean repo (like on the Mac CI), the index gets built and we don't actually go through the index load code, so the duplicate check never happens. |
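To make the control flow concrete, here is a toy model of the behavior described above. It is a sketch, not vg's actual code: `build_index`, `load_index`, `open_fasta`, and the `disk` dict standing in for the on-disk index are all hypothetical names. The only point is that the duplicate check sits on the load path, so the first open of a FASTA (which builds the index) never runs it.

```python
def record_names(fasta_text):
    """Return the name of every FASTA record, in file order."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def build_index(fasta_text):
    # Build path (hypothetical): records every header, with no duplicate check.
    return record_names(fasta_text)


def load_index(index):
    # Load path (hypothetical): the only place the duplicate check runs here.
    if len(set(index)) != len(index):
        raise ValueError("duplicate sequence names in FASTA index")
    return index


def open_fasta(fasta_text, disk):
    # `disk` stands in for the filesystem; "fai" is the saved index, if any.
    if "fai" in disk:
        return load_index(disk["fai"])
    disk["fai"] = build_index(fasta_text)
    return disk["fai"]
```

With a clean `disk` (the CI case) the first `open_fasta` call succeeds even on a FASTA with duplicate names; calling it again with the same `disk` (the local rerun case) raises.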
|
This now uses vgteam/vcflib#29 |
|
If someone uses samtools to index a FASTA with duplicate records (samtools warns and drops the duplicates from the index), then vg won't be able to detect the duplicates when we open the file, because we'll just query the index. That might be good enough. |
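For anyone who does want to catch that case outside vg, a minimal sketch is to compare the header lines in the FASTA against the first column of the `.fai` file (the tab-separated sequence name). The function names below are hypothetical, and this scans the whole FASTA, which is exactly the work that querying the index avoids.

```python
from collections import Counter


def fasta_header_names(fasta_lines):
    """Yield each record name: header text after '>' up to the first whitespace."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def fai_names(fai_lines):
    """Return the first (name) column of each non-empty .fai line."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def duplicates_hidden_by_index(fasta_lines, fai_lines):
    """Names that occur more often in the FASTA than in its index."""
    in_fasta = Counter(fasta_header_names(fasta_lines))
    in_index = Counter(fai_names(fai_lines))
    # Counter returns 0 for names that are absent from the index.
    return sorted(name for name, n in in_fasta.items() if n > in_index[name])
```

For example, a FASTA with two `chr1` records and an index listing `chr1` once would yield `["chr1"]`.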
Changelog Entry
To be copied to the draft changelog by merger:
Description
Resolves #547 in a way. Now, instead of non-obvious problems when a sequence name is duplicated, we give a big, beautiful, obvious error. The old behavior was that the last of the duplicately named sequences was used. The new behavior is that we tell you that there are duplicate names. Test cases are also added.
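The behavior change can be illustrated with a small sketch. This is not vg's implementation; `parse_fasta`, `load_last_wins`, and `load_strict` are hypothetical names, and the error text is made up for the example.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs in file order."""
    name, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def load_last_wins(text):
    # Old behavior as described: a later record silently replaces an earlier one.
    return dict(parse_fasta(text))


def load_strict(text):
    # New behavior as described: a repeated name is an error.
    seqs = {}
    for name, seq in parse_fasta(text):
        if name in seqs:
            raise ValueError(f"duplicate sequence name in FASTA: {name}")
        seqs[name] = seq
    return seqs
```

On a FASTA where `x` appears twice, `load_last_wins` keeps only the second `x` sequence, while `load_strict` raises.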