Understanding sav benefits.

I'm catching up on what's been going on in the world of alternative VCF representations, and I'm trying to understand what the benefits of savvy are vs. BCF. I've run into a few questions.

1. It seems like the big difference is the addition of a sparse vector type. The random VCF files I've tried *savving* haven't shown any appreciable size improvement after running `sav import` on them, though, so I was wondering if you had some examples of files that benefit from using savvy. I suspect I'm either using files that don't particularly benefit from the sparse encoding, or I've misconfigured my import.

2. I don't understand how PBWT is used by sav files and what benefit it gives. Does it only apply to genotype fields? I tried looking in the code, but I couldn't find where it actually computes the PBWT. It seems like it's just tagging fields as being PBWT sorted? Is it passing through something processed upstream and just acting as a marker for it? How is this intended to be used? I'm not really a C++ programmer, so I may have just missed something obvious.

3. From what I can tell, sav doesn't directly address the problem of encoding gVCF files efficiently (although they could probably benefit from the sparse vector type when encoding sparse PLs). Is that outside the mandate of the sav format?
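
To make the sparsity question in item 1 concrete, here is a minimal Python sketch of the general idea of a sparse vector: store only the non-zero entries as (offset, value) pairs. It is an illustration under my own assumptions, not savvy's actual on-disk encoding, and the helper names `to_sparse`, `to_dense`, and `stored_values` are hypothetical. It ignores integer widths and any general-purpose compression applied afterwards.

```python
from typing import List, Tuple

Sparse = Tuple[int, List[Tuple[int, int]]]


def to_sparse(dense: List[int]) -> Sparse:
    """Return (length, [(offset, value), ...]) keeping only non-zero entries."""
    return len(dense), [(i, v) for i, v in enumerate(dense) if v != 0]


def to_dense(length: int, entries: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the dense vector from its sparse form."""
    dense = [0] * length
    for offset, value in entries:
        dense[offset] = value
    return dense


def stored_values(dense: List[int]) -> Tuple[int, int]:
    """Count the integers each form stores: (dense count, sparse count).

    The sparse count is 1 (the length) plus 2 per non-zero entry.
    """
    _, entries = to_sparse(dense)
    return len(dense), 1 + 2 * len(entries)


# Hypothetical rare variant: 2 alt alleles among 1000 haplotypes.
rare = [0] * 1000
rare[17] = 1
rare[802] = 1

# Hypothetical common variant: every other haplotype carries the alt allele.
common = [i % 2 for i in range(1000)]

print(stored_values(rare))    # (1000, 5)
print(stored_values(common))  # (1000, 1001)
```

In this toy model the sparse form only wins when most entries are zero, which is the kind of file (many samples, mostly rare variants) I'd guess benefits most. Files dominated by common variants or dense FORMAT fields would see little or no gain.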

Thank you. Let me know if there's a better forum for asking general non-code questions about savvy.

Understanding sav benefits. #11
