Skip to content
This repository was archived by the owner on Oct 29, 2023. It is now read-only.
This repository was archived by the owner on Oct 29, 2023. It is now read-only.

add support for STRICT intra-shard boundary #201

@deflaux

Description

@deflaux

When the genomic region we want to examine is divided into shards, we use a STRICT shard boundary to remove duplicate data that would occur at the end of the current shard and also at the beginning of the next shard.

  • This works fine when we are working over an entire chromosome.
  • BUT when we want to shard a subset of a chromosome, we are filtering out the records at the beginning of the very first shard even though they would not be duplicated in any other shards.
    • Some times we do want to make use of those records that overlap the beginning of the shard boundary.
    • We need a way to use OVERLAPS for the first shard and STRICT for all subsequent shards.

Confirm this functionality with a JoinNonVariantSegmentsWithVariants integration test that operates over a small genomic region specified by both normal sharding and SitesToShards.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions