Perform variant calling at non-target positions #145

Open
standage opened this issue Sep 1, 2022 · 2 comments
Labels
enhancement New feature or request

standage commented Sep 1, 2022

At the moment, the core haplotype calling algorithm considers only a collection of explicitly designated SNPs. But there is often rare/cryptic variation at non-target sites within the locus. This thread is a placeholder and a reminder to come back later and implement features for calling SNPs (or maybe even small indels) at all sites in the reference.

Some folks argue that variant calls, or perhaps even the entire locus sequence, are the MH alleles of the future. I’m sympathetic on a philosophical level, but there are practical obstacles to actualizing that glorious future. What I propose here would be an incremental step in that direction: it keeps complete backwards and forwards compatibility for markers whose SNP definitions may change over time, while also providing data to begin experimenting with comprehensive variant call sets at each locus.

For now, we’ll probably want to store variant calls separately from the MH allele tallies, along with the marker reference sequence.

@standage standage added the enhancement New feature or request label Sep 1, 2022
@standage
Member Author

Experimented a bit today, and the following seems to be a reasonable starting point.

  • per-sample read alignment to marker target sequences with bwa mem
  • per-sample variant discovery with GATK HaplotypeCaller (gVCF output)
  • gVCF aggregation with GATK CombineGVCFs
  • joint variant calling with GATK GenotypeGVCFs
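
For concreteness, here is a rough sketch of how those four steps might be chained. It only builds the command lines as argument lists and does not run anything. Everything specific in it is an assumption rather than something settled in this thread: the function name `build_pipeline`, the output layout, paired-end reads, the `PL:ILLUMINA` read group, and the intermediate `samtools sort`/`samtools index` steps that HaplotypeCaller needs in order to read the alignments. The GATK aggregation and joint-calling tools are invoked under their actual names, CombineGVCFs and GenotypeGVCFs. The reference is assumed to be already indexed for bwa, samtools, and GATK.

```python
from pathlib import Path


def build_pipeline(ref, samples, outdir="calls"):
    """Build the commands for the proposed workflow, in execution order.

    ref     -- path to the marker reference FASTA (assumed already indexed)
    samples -- dict mapping sample name to a (reads_1, reads_2) tuple
    outdir  -- hypothetical output directory; nothing is created or run here
    """
    out = Path(outdir)
    commands = []
    gvcfs = []
    for name, (reads1, reads2) in sorted(samples.items()):
        sam = str(out / f"{name}.sam")
        bam = str(out / f"{name}.bam")
        gvcf = str(out / f"{name}.g.vcf.gz")
        # bwa expects literal "\t" separators in the read group string.
        read_group = f"@RG\\tID:{name}\\tSM:{name}\\tPL:ILLUMINA"
        # Step 1: per-sample alignment to the marker target sequences.
        commands.append(["bwa", "mem", "-R", read_group, "-o", sam, ref, reads1, reads2])
        # GATK needs a coordinate-sorted, indexed BAM (assumed glue step).
        commands.append(["samtools", "sort", "-o", bam, sam])
        commands.append(["samtools", "index", bam])
        # Step 2: per-sample variant discovery, emitting a gVCF.
        commands.append(
            ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
        )
        gvcfs.append(gvcf)
    # Step 3: aggregate the per-sample gVCFs.
    combined = str(out / "cohort.g.vcf.gz")
    combine = ["gatk", "CombineGVCFs", "-R", ref, "-O", combined]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    commands.append(combine)
    # Step 4: joint variant calling across all samples.
    final_vcf = str(out / "denovo.vcf.gz")
    commands.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined, "-O", final_vcf])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`, or printed with `shlex.join(cmd)` to review the commands before running them.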

This would create a set of "de novo variants", which could be combined with user-supplied "reference variants" to specify the final "marker definitions" to be used by mhpl8r type for haplotype calling.
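
One possible reading of "combined" here is a per-marker union of variant positions. The sketch below assumes that marker definitions can be represented as a mapping from marker name to a collection of SNP offsets; that representation, the function name `merge_marker_definitions`, and the marker names in the example are all hypothetical and not the actual mhpl8r format.

```python
def merge_marker_definitions(reference, denovo):
    """Combine reference and de novo variants into final marker definitions.

    Both arguments map a marker name to an iterable of SNP offsets
    (an assumed representation). The result maps each marker to the
    sorted union of offsets from both sources.
    """
    merged = {}
    for source in (reference, denovo):
        for marker, offsets in source.items():
            merged.setdefault(marker, set()).update(offsets)
    return {marker: sorted(offsets) for marker, offsets in merged.items()}


# Hypothetical markers and offsets, for illustration only.
reference_variants = {"mh01": [10, 45], "mh02": [7]}
denovo_variants = {"mh01": [45, 88]}
final_definitions = merge_marker_definitions(reference_variants, denovo_variants)
```

A plain union keeps every user-supplied target site, so haplotypes called against the merged definitions can still be reduced to the original reference-only definitions by dropping the de novo offsets.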

@standage
Member Author

Should investigate variant filtering.
