Variant Filtering for input #57
Hi Jen, I think there are two main parts to your question: one caused by a misconception and one philosophical.
First, the misconception: My understanding of the
Second, the philosophical question, which I will simplify to, "Why didn't you use these other filter criteria XYZ?" In our experience, no single set of filters will work perfectly in all scenarios. Additionally, users (such as yourself) each have different preferences for how filters should be applied, usually based on the problem they're trying to solve (which, in some instances, can be very niche). We selected filters that:
If you find that the HiPhase defaults are not working well enough for your use case, then I would encourage you to do a parameter sweep on the filters you wish to apply. Happy to answer any follow-ups,
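For readers who want to try such a sweep, here is a minimal sketch of one way to pre-filter a VCF before phasing. It is not part of HiPhase: the function names `filter_vcf_lines` and `sweep_gq` and the demo records are hypothetical, and the code assumes an uncompressed, single-sample VCF with a `GQ` entry in the FORMAT column. It only counts how many records survive each threshold; evaluating the resulting phasing is left to the metrics you care about.

```python
from typing import Dict, Iterable, Iterator, List


def filter_vcf_lines(lines: Iterable[str], min_gq: float,
                     require_pass: bool = True) -> Iterator[str]:
    """Yield header lines unchanged, plus records that pass the filters.

    Assumes a single-sample VCF: the sample is column 10 (index 9).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no GQ to inspect
        # Illustrative choice: only the literal "PASS" counts as passing.
        if require_pass and fields[6] != "PASS":
            continue
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        try:
            gq = float(sample.get("GQ", "."))
        except ValueError:
            continue  # missing or non-numeric GQ
        if gq >= min_gq:
            yield line


def sweep_gq(lines: Iterable[str], thresholds: List[float],
             require_pass: bool = True) -> Dict[float, int]:
    """Count the records kept at each GQ threshold."""
    cached = list(lines)
    return {
        t: sum(1 for rec in filter_vcf_lines(cached, t, require_pass)
               if not rec.startswith("#"))
        for t in thresholds
    }


# Hypothetical demo records: a confident PASS call, a low-GQ PASS call,
# and a high-GQ RefCall record.
DEMO = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n",
    "chr1\t100\t.\tA\tG\t30\tPASS\t.\tGT:GQ\t0/1:40\n",
    "chr1\t200\t.\tC\tT\t3\tPASS\t.\tGT:GQ\t0/1:5\n",
    "chr1\t300\t.\tG\tA\t0\tRefCall\t.\tGT:GQ\t0/0:50\n",
]

if __name__ == "__main__":
    print(sweep_gq(DEMO, [0, 20, 45]))
    print(sweep_gq(DEMO, [0, 20, 45], require_pass=False))
```

Each filtered VCF could then be written out and phased separately, so that the runs can be compared on whatever metric matters for your use case.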
Thank you for the prompt response! Could you share some of the criteria you use to evaluate whether HiPhase is working well? Thank you
I would recommend reading through the HiPhase paper to understand the metrics we use to measure accuracy and performance. I can answer any specific follow-ups, but that's the best place to start. The WhatsHap paper will also be useful if you want to go back a bit more. EDIT: I forgot we also have a tiny primer in our performance file on GitHub.
Matt
Closing due to inactivity; feel free to re-open if there are follow-up questions.
Thank you for providing such an amazing tool!
I am currently working with HiFi reads and using DeepVariant to call germline variants. From one of your previous issues, I learned that HiPhase does not consider the "FILTER" column in the VCF file but instead uses the GQ value to filter variants.
However, I have a question regarding the use of RefCall records in phasing. If RefCall variants are included in the phasing process, doesn't that mean HiPhase is incorporating non-PASS germline variants for phasing?
From my understanding, shouldn't HiPhase prioritize variants that are confidently classified as germline variants and have a reasonably high GQ value? May I know the reasons for considering only the GQ value? For instance, some RefCall variants have very high GQ values, whereas some PASS variants have low GQ values, such as 5.
Wouldn't this variability potentially introduce confusion or reduce the accuracy of phasing in HiPhase?
Thank you for your time and support!
Sincerely
Jen