13kb false duplication at chr10_MATERNAL:43,216,000-43,245,000 #662

Open
1 task done
nhansen opened this issue Jan 16, 2024 · 5 comments

Labels:
- large_error: This issue describes a poor quality or misassembled region in the assembly
- recall_consensus: This issue is a candidate for re-calling consensus from spanning reads to create a patch
- v1.0: This is an issue/error in the hg002v1.0 assembly

Comments

@nhansen
Collaborator

nhansen commented Jan 16, 2024

Have you confirmed that this issue hasn't already been reported?

  • I have confirmed in the UCSC browser hub that this is a new issue (required)

Issue location in assembly (use format chromosome:start-end, e.g., chr13_MATERNAL:3740148-9625296)

chr10_MATERNAL:43,216,000-43,245,000

Description of the issue

Sniffles calls on v0.9 picked up a large deletion in this region, and the long-read data bear it out. Here's an IGV screenshot:

[Image: IGV screenshot]
nhansen added the v1.0 and large_error labels on Jan 16, 2024
@jzook
Collaborator

jzook commented Jan 21, 2024

I also came across this when looking at DV calls in both HiFi and ONT, and it looks like it is annotated in the browser as a segdup between HSats. Some HG2 and HG4 HiFi reads align across it, so it should be possible to correct it based on HiFi sequence as well as ONT.

@jzook
Collaborator

jzook commented Apr 9, 2024

One thing I just noticed when looking at NateD's stratifications is that the 11kb region chr10_MATERNAL 43232301 43243056 is a pure C homopolymer! This made chr10 a huge outlier for long C homopolymers :)

@nhansen
Collaborator Author

nhansen commented Apr 9, 2024

Wow! Tagging @skoren so he can possibly figure out where all those C's came from. Just to be picky, though, it's not pure C's for the whole 11kb: there are actually non-C bases at about seven spots, giving eight very long mononucleotide runs!
[Image]
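A minimal sketch of how one might count those runs and the interruptions between them, assuming the region's sequence has already been extracted as a plain string (for example from the assembly FASTA). The function names are hypothetical and not part of any project tooling:

```python
import re


def mononucleotide_runs(seq: str, base: str = "C") -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of maximal runs of `base`.

    The sequence is upper-cased first so soft-masked (lowercase) bases still count.
    """
    pattern = re.compile(re.escape(base) + "+")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def interruptions(seq: str, base: str = "C") -> int:
    """Count the non-`base` stretches that separate consecutive runs of `base`."""
    runs = mononucleotide_runs(seq, base)
    return max(len(runs) - 1, 0)


# Toy example (not the real chr10 sequence): two interruptions, three runs.
toy = "CCCCACCCCGTCCC"
print(mononucleotide_runs(toy))  # [(0, 4), (5, 9), (11, 14)]
print(interruptions(toy))        # 2
```

For a region that starts and ends with C, seven interrupting spots correspond to eight runs, which matches the count described above.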

@nhansen
Collaborator Author

nhansen commented Apr 9, 2024

As you point out, Justin, it should be fairly easy to re-call consensus for this stretch to create a patch.

@jzook
Collaborator

jzook commented Apr 9, 2024

Ah, I forgot that we merged nearby perfect homopolymers to get this region, so it makes sense that there could be small (<10bp) interruptions between 21+bp perfect homopolymers.
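The merging step can be sketched roughly as follows. This is not the actual stratification pipeline: the thresholds come from the comment (runs of 21bp or more, gaps under 10bp), while requiring both runs to share the same base and skipping non-ACGT characters are assumptions made for illustration. Function names are hypothetical.

```python
import re

MIN_RUN = 21  # minimum length of a "perfect" homopolymer, per the comment
MAX_GAP = 10  # interruptions must be shorter than this, per the comment


def perfect_homopolymers(seq, min_len=MIN_RUN):
    """Yield (start, end, base) for maximal single-base runs of at least min_len bp.

    Coordinates are 0-based, half-open. Non-ACGT characters (e.g. N) never form runs.
    """
    for m in re.finditer(r"([ACGT])\1*", seq.upper()):
        if m.end() - m.start() >= min_len:
            yield m.start(), m.end(), m.group(1)


def merge_homopolymers(seq, min_len=MIN_RUN, max_gap=MAX_GAP):
    """Merge same-base runs whose separating gap is shorter than max_gap bp.

    Assumption: only runs of the same base are merged.
    """
    merged = []
    for start, end, base in perfect_homopolymers(seq, min_len):
        prev = merged[-1] if merged else None
        if prev and prev[2] == base and start - prev[1] < max_gap:
            merged[-1] = (prev[0], end, base)  # extend the previous merged interval
        else:
            merged.append((start, end, base))
    return merged


# Toy example: a 2bp "AT" interruption is bridged, a 15bp G run is not.
toy = "C" * 25 + "AT" + "C" * 30 + "G" * 15 + "C" * 22
print(merge_homopolymers(toy))  # [(0, 57, 'C'), (72, 94, 'C')]
```

A merged interval produced this way can contain short non-C bases, which is consistent with the handful of interruptions seen in the 11kb chr10 region.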

nhansen added the recall_consensus label on Apr 9, 2024