-
Notifications
You must be signed in to change notification settings - Fork 20
feat: add support for over-writing genotypes for males on x non par #1030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -24,5 +27,6 @@ class FeatureFlag: | |||
EXPECT_TDR_METRICS: bool = EXPECT_TDR_METRICS | |||
EXPECT_WES_FILTERS: bool = EXPECT_WES_FILTERS | |||
INCLUDE_PIPELINE_VERSION_IN_PREFIX: bool = INCLUDE_PIPELINE_VERSION_IN_PREFIX | |||
OVERWRITE_SV_MALE_NON_PAR_CALLS: bool = OVERWRITE_SV_MALE_NON_PAR_CALLS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of this logic is it will only apply for internal SV callsets. From notes:
cohort specific of the GREGoR project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why we wouldn't apply this to all SVs in seqr, presumably we are doing this because of desired seqr search behavior. I don't think its rreasonable for us to write search queries that are able to handle branching logic for determining what a genotype for a sample is based on whether its internal or external data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👁️ 👁️
v03_pipeline/lib/misc/sv.py
Outdated
mt = mt.annotate_entries( | ||
GT=hl.if_else( | ||
( | ||
male_sample_ids.contains(mt.s) | ||
& non_par_interval.overlaps( | ||
hl.interval( | ||
mt.start_locus, | ||
mt.end_locus, | ||
), | ||
) | ||
), | ||
hl.Call([1, 1], phased=False), | ||
mt.GT, | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would this handle SVs overlapping the start of a non-PAR regions, and do we want to be handling them this way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and what do the GTs look like elsewise? are they left haploid ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe overlaps
handles cases where the end of the SV is > the start of the non-Par region.
I guess the broader question is what is that code ur implementing in the first place |
* First pass * add comment * ruff
tdr updates for sample qc
…es into benb/x_males_non_par
) | ||
& mt.GT.is_het() | ||
), | ||
hl.Call([1], phased=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tweaked this to 1) only change het diploid calls and 2) replace with a haploid call.
@@ -121,6 +122,10 @@ def create_table(self) -> hl.MatrixTable: | |||
if self.dataset_type.has_multi_allelic_variants: | |||
# NB: throws SeqrValidationError | |||
mt = split_multi_hts(mt, self.skip_validation) | |||
|
|||
if self.dataset_type.overwrite_male_non_par_calls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no longer feature flagged and will run for all SV
s.
…s into benb/x_males_non_par
GT=hl.if_else( | ||
( | ||
male_sample_ids.contains(mt.s) | ||
& non_par_interval.overlaps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct - if a single SV call overlaps both non-par and par regions I think we want to keep the diploid call because its real for the par region. I think we only want to change it if the SV is 100% overlapped by non-par region
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm, I think the current behavior is to "obtain those calls overlapping non-PAR regions and update the GT to 1/1 in males."
I think your logic to keep as diploid makes sense though, if we're ok with the behavior change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not like super confident in my knowledge of this data/biology, so it might be better to maintain parity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth asking lynn + alba on this. I think we can get a fast answer.
Implementation of: