Understanding sav benefits. #11

@lbergelson

I'm trying to catch up on what's been going on in the world of alternative VCF representations, and I'm trying to understand what the benefits of savvy are vs. BCF. I've run into a few questions.

  1. It seems like the big difference is the addition of a sparse vector type. The random VCF files I've tried importing haven't shown any appreciable size improvement from running sav import, though, so I was wondering if you had some examples of files that benefit from using savvy. I suspect I'm either using files that don't particularly benefit from the sparsity reduction, or I've misconfigured my import.

  2. I don't understand how PBWT is used by sav files and what benefit that gives. Does it only apply to genotype fields? I tried looking in the code, but I couldn't find where it actually computes PBWT. It seems like it's just tagging fields as being PBWT sorted? Is this passing through something processed upstream and just acting as a marker for it? How is this intended to be used? I'm not really a C++ programmer so I may have just missed something obvious.

  3. From what I can tell, sav doesn't directly address the problem of encoding gVCF files efficiently (although they could probably benefit from the sparse vector type when encoding sparse PLs). Is that outside the mandate of the sav format?
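
To make question 1 concrete, here is a rough sketch of how I picture a sparse vector helping. This is only my mental model, not savvy's actual on-disk layout: `to_sparse`, `from_sparse`, and `stored_values` are hypothetical helpers, and I'm assuming the sparse form stores an (offset, value) pair for each non-zero entry.

```python
def to_sparse(dense):
    """Return (length, [(offset, value), ...]) keeping only non-zero entries."""
    return len(dense), [(i, v) for i, v in enumerate(dense) if v != 0]


def from_sparse(length, entries):
    """Rebuild the dense vector, filling every unlisted offset with zero."""
    dense = [0] * length
    for i, v in entries:
        dense[i] = v
    return dense


def stored_values(dense):
    """Numbers stored by each form: dense keeps every value, sparse keeps offset+value pairs."""
    _, entries = to_sparse(dense)
    return len(dense), 2 * len(entries)


# A rare variant: almost every sample carries the reference allele (0).
rare = [0] * 1000
rare[17] = 1
rare[503] = 1

# A common variant: half the samples carry the alternate allele.
common = [i % 2 for i in range(1000)]

print(stored_values(rare))
print(stored_values(common))
```

If that picture is roughly right, the sparse form only wins when most entries are zero, which would explain why files dominated by common variants wouldn't shrink much.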

Thank you. Let me know if there's a better forum for asking general non-code questions about savvy.
