feat: add support for over-writing genotypes for males on x non par #1030

Open
wants to merge 9 commits into
base: main

bpblanken
Collaborator

@bpblanken bpblanken commented Feb 5, 2025

Implementation of:

[Image: Screenshot 2025-02-05 at 5 28 50 PM]

@bpblanken bpblanken changed the title from "Add support for over-writing genotypes for males on x non par" to "feat: add support for over-writing genotypes for males on x non par" Feb 5, 2025
@bpblanken bpblanken marked this pull request as ready for review February 6, 2025 19:45
@bpblanken bpblanken requested a review from a team as a code owner February 6, 2025 19:45
@@ -24,5 +27,6 @@ class FeatureFlag:
    EXPECT_TDR_METRICS: bool = EXPECT_TDR_METRICS
    EXPECT_WES_FILTERS: bool = EXPECT_WES_FILTERS
    INCLUDE_PIPELINE_VERSION_IN_PREFIX: bool = INCLUDE_PIPELINE_VERSION_IN_PREFIX
    OVERWRITE_SV_MALE_NON_PAR_CALLS: bool = OVERWRITE_SV_MALE_NON_PAR_CALLS
Collaborator Author

@bpblanken bpblanken Feb 6, 2025

My understanding of this logic is it will only apply for internal SV callsets. From notes:

 cohort specific of the GREGoR project

Collaborator

I don't know why we wouldn't apply this to all SVs in seqr; presumably we are doing this because of desired seqr search behavior. I don't think it's reasonable for us to write search queries that have to handle branching logic for determining what a sample's genotype is based on whether it's internal or external data.
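
One way to picture the point above is to normalize the genotype once at load time, so that search code has a single path regardless of where the callset came from. The following is a minimal sketch in plain Python, not pipeline or seqr code; `load_genotype`, `is_hom_alt`, and the `overwrite_enabled` parameter are hypothetical names used only for illustration.

```python
def load_genotype(gt, is_male, in_non_par, overwrite_enabled=True):
    """Normalize a call once at load time (hypothetical helper).

    When the overwrite is enabled, a male sample's call in the X non-PAR
    region is stored as hom-alt, so nothing downstream needs to know
    whether the callset was internal or external.
    """
    if overwrite_enabled and is_male and in_non_par:
        return (1, 1)
    return gt


def is_hom_alt(stored_gt):
    """Search-side check with no branching on callset origin."""
    return stored_gt == (1, 1)


# Same stored representation regardless of callset origin.
print(is_hom_alt(load_genotype((0, 1), is_male=True, in_non_par=True)))   # True
print(is_hom_alt(load_genotype((0, 1), is_male=False, in_non_par=True)))  # False
```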

from v03_pipeline.lib.model import ReferenceGenome, Sex


def overwrite_male_non_par_calls(
Collaborator Author

@matren395 tagging you as the resident actual biologist for this code.

Contributor

Not clear on the context, but male X non-PAR would be the regions on ChrX that are (biologically) haploid. Reasonably confident these are relevant for a few diseases. It's here where all that cool sex-linked-trait stuff happens.

Contributor

@matren395 matren395 left a comment

👁️ 👁️

Comment on lines +25 to +28
    non_par_interval = hl.interval(
        par_intervals[0].end,
        par_intervals[1].start,
    )
Contributor

yep! PARs are at the start and the end of ChrX

Collaborator

I think there's more than one PAR interval though?

Collaborator Author

I think there are two PAR intervals at the ends of the chromosome, but only one non-PAR region: [end of PAR1, start of PAR2].

Contributor

Yep, there are two intervals: one at the start and one at the end of ChrX. If you take the space between the end of the first and the start of the second (which I believe is what this is doing), you get the non-PAR portion of ChrX.

all of my knowledge is from https://en.wikipedia.org/wiki/Pseudoautosomal_region
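
For reference, the "space between the two PARs" idea can be sketched in plain Python without Hail. The `Interval` type and `non_par_interval` function below are hypothetical stand-ins for illustration, intervals are assumed to be half-open, and the GRCh38 coordinates are commonly cited values that should be treated as illustrative rather than authoritative.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) on a single contig."""
    contig: str
    start: int
    end: int


def non_par_interval(par_intervals: list[Interval]) -> Interval:
    """Return the region between the two PARs of one contig.

    Expects exactly two intervals; unpacking raises ValueError otherwise.
    """
    first, second = sorted(par_intervals, key=lambda i: i.start)
    if first.contig != second.contig:
        raise ValueError("PAR intervals must be on the same contig")
    return Interval(first.contig, first.end, second.start)


# Illustrative GRCh38 chrX PAR coordinates (assumed, half-open).
GRCH38_X_PARS = [
    Interval("chrX", 10_001, 2_781_480),
    Interval("chrX", 155_701_383, 156_030_896),
]

non_par = non_par_interval(GRCH38_X_PARS)
print(non_par)
```

Sorting by start position means the result does not depend on the order in which the two PARs are listed.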

Comment on lines +36 to +50
    mt = mt.annotate_entries(
        GT=hl.if_else(
            (
                male_sample_ids.contains(mt.s)
                & non_par_interval.overlaps(
                    hl.interval(
                        mt.start_locus,
                        mt.end_locus,
                    ),
                )
            ),
            hl.Call([1, 1], phased=False),
            mt.GT,
        ),
    )
Contributor

How would this handle SVs overlapping the start of a non-PAR region, and do we want to be handling them this way?

Contributor

And what do the GTs look like otherwise? Are they left haploid?

Collaborator Author

I believe `overlaps` handles cases where the end of the SV is > the start of the non-PAR region, i.e. an SV that starts in PAR1 and extends into non-PAR would still match.
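
To make the boundary case concrete, here is a minimal sketch in plain Python rather than Hail. It assumes half-open [start, end) intervals and illustrative GRCh38 coordinates; `overlaps` and `overwrite_gt` are hypothetical helpers written for this example, not functions from the pipeline.

```python
# Illustrative non-PAR span on GRCh38 chrX, half-open (assumed values).
NON_PAR = (2_781_480, 155_701_383)


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def overwrite_gt(gt, is_male, sv):
    """Mirror the rule in the diff: male + SV overlapping non-PAR -> hom-alt."""
    if is_male and overlaps(sv, NON_PAR):
        return (1, 1)
    return gt


straddling_sv = (2_700_000, 2_900_000)  # starts in PAR1, ends in non-PAR
par_only_sv = (100_000, 200_000)        # entirely inside PAR1

print(overwrite_gt((0, 1), True, straddling_sv))   # (1, 1)
print(overwrite_gt((0, 1), False, straddling_sv))  # (0, 1)
print(overwrite_gt((0, 1), True, par_only_sv))     # (0, 1)
print(overwrite_gt((0, 0), True, straddling_sv))   # (1, 1)
```

Like the diff, this sketch does not inspect the existing call before overwriting, which is why the last line also returns `(1, 1)`.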

@matren395
Contributor

I guess the broader question is: what is the code you're implementing in the first place?

3 participants