feat: add support for over-writing genotypes for males on x non par #1030
base: main
@@ -24,5 +27,6 @@ class FeatureFlag:
    EXPECT_TDR_METRICS: bool = EXPECT_TDR_METRICS
    EXPECT_WES_FILTERS: bool = EXPECT_WES_FILTERS
    INCLUDE_PIPELINE_VERSION_IN_PREFIX: bool = INCLUDE_PIPELINE_VERSION_IN_PREFIX
    OVERWRITE_SV_MALE_NON_PAR_CALLS: bool = OVERWRITE_SV_MALE_NON_PAR_CALLS
My understanding of this logic is it will only apply for internal SV callsets. From notes:
cohort specific of the GREGoR project
I don't know why we wouldn't apply this to all SVs in seqr; presumably we are doing this because of desired seqr search behavior. I don't think it's reasonable for us to write search queries that handle branching logic for determining a sample's genotype based on whether it's internal or external data.
from v03_pipeline.lib.model import ReferenceGenome, Sex


def overwrite_male_non_par_calls(
@matren395 tag as resident actual biologist of this code.
Not clear on the context, but male X non-PAR would be the regions of ChrX that are (biologically) haploid. Reasonably confident these are relevant for a few diseases. It's here where all that cool sex-linked-trait stuff happens.
👁️ 👁️
    non_par_interval = hl.interval(
        par_intervals[0].end,
        par_intervals[1].start,
    )
yep! PARs are at the start and the end of ChrX
I think there's more than one PAR interval though?
I think there are two PAR intervals at the ends of the chromosome, but only one non-PAR region: [end of PAR1, start of PAR2].
Yep, there are two intervals, one at the start and one at the end of ChrX. If you take the space between the end of the first and the start of the second (which I believe is what this is doing), you get the non-PAR portion of ChrX.
all of my knowledge is from https://en.wikipedia.org/wiki/Pseudoautosomal_region
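To make the "space between the two PARs" idea concrete, here is a minimal plain-Python sketch (not Hail, and not the PR's code). The `Interval` type and `non_par_between` helper are hypothetical, and the coordinates are the commonly cited GRCh38 chrX PAR boundaries written as half-open intervals; treat them as assumptions and check them against the reference genome actually in use.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) on one contig (hypothetical helper type)."""

    contig: str
    start: int
    end: int


# Assumed GRCh38 chrX PAR coordinates, expressed as half-open intervals.
PAR1 = Interval("chrX", 10_001, 2_781_480)
PAR2 = Interval("chrX", 155_701_383, 156_030_896)


def non_par_between(par_intervals: list[Interval]) -> Interval:
    """Return the span from the end of the first PAR to the start of the second."""
    # Sort by start so the result does not depend on the input order.
    first, second = sorted(par_intervals, key=lambda i: i.start)
    if first.contig != second.contig:
        raise ValueError("PAR intervals must be on the same contig")
    return Interval(first.contig, first.end, second.start)


non_par = non_par_between([PAR2, PAR1])
print(non_par)
```

The diff indexes `par_intervals[0]` and `par_intervals[1]` directly, which relies on the list holding exactly the two chrX PARs in positional order; the sketch sorts first only to make that assumption explicit.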
    mt = mt.annotate_entries(
        GT=hl.if_else(
            (
                male_sample_ids.contains(mt.s)
                & non_par_interval.overlaps(
                    hl.interval(
                        mt.start_locus,
                        mt.end_locus,
                    ),
                )
            ),
            hl.Call([1, 1], phased=False),
            mt.GT,
        ),
    )
How would this handle SVs overlapping the start of the non-PAR region, and do we want to be handling them this way?
And what do the GTs look like otherwise? Are they left haploid?
I believe `overlaps` handles cases where the end of the SV is > the start of the non-PAR region.
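As a rough illustration of the boundary question, the sketch below models the condition from the diff in plain Python rather than Hail. The `overlaps` and `overwrite_gt` functions are hypothetical stand-ins, genotypes are modelled as simple tuples, intervals are assumed to be half-open, and the non-PAR bounds are assumed GRCh38 chrX values.

```python
# Assumed GRCh38 chrX non-PAR bounds as a half-open interval [start, end).
NON_PAR_START, NON_PAR_END = 2_781_480, 155_701_383

HOM_ALT = (1, 1)  # stand-in for an unphased homozygous-alternate call


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def overwrite_gt(gt: tuple, is_male: bool, sv_start: int, sv_end: int) -> tuple:
    """Mirror the excerpted condition: male sample AND SV overlaps the non-PAR span."""
    if is_male and overlaps(sv_start, sv_end, NON_PAR_START, NON_PAR_END):
        return HOM_ALT
    return gt


# An SV that starts in PAR1 and ends inside non-PAR counts as overlapping.
straddling = overwrite_gt((0, 1), True, 2_700_000, 2_900_000)
# An SV contained entirely in PAR1 is left alone.
par_only = overwrite_gt((0, 1), True, 100_000, 200_000)
# The same straddling SV in a non-male sample is left alone.
female = overwrite_gt((0, 1), False, 2_700_000, 2_900_000)
print(straddling, par_only, female)
```

Under these assumptions, any SV with even partial overlap is treated the same as one fully inside the non-PAR span. The excerpted condition also does not inspect the existing call, so this sketch overwrites every overlapping male entry, including a `(0, 0)` call; whether the full function guards against that elsewhere is not visible in the excerpt.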
I guess the broader question is: what is the code you're implementing in the first place?
Implementation of: