-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make bed file for reads processed by primrose #11
Comments
Thanks for describing your use case. Would you already have a BAM file with MM/ML tags for your unmapped reads? If so, it sounds like you're looking for more convenient access to the methylation data than the MM/ML tags themselves -- is this accurate? |
Yes this is accurate. Is that an easy way to translate MM/ML tags to the bed format? |
The information you want is embedded in the MM/ML tags. It's not trivial to convert it into the bed format you describe, but this would give you the most control over probability thresholds for the meth/unmath states you'd like to include. I'm not sure if or when we would be able to help out with such a converter but its helpful to capture the use case here as a first step. |
Hi @oushujun, It is possible to write code to dump the values per read into bed format, but the size of the output file will balloon very quickly because we are looking at sites per read rather than sites across the reference. The output you suggested contains a binary |
Hello @dportik, Thinking about uncertainties in a genome assembly, including assembly errors, heterozygosity, unassembled highly repetitive regions, etc, going back to the read level will be helpful to detect methylation status of these regions. |
Sorry, I accidentally closed the issue. Using binary states for methylations is a simplification. If possible you may also use a continuous probability score. We may not able to apply the model-based pileup to improve the call accuracy on read level, but it's still way "better than nothing." The file could be very large for all reads, that's why if the script can take in a list of reads, it will alleviate this issue and make the analysis more specific. |
@oushujun my tool can do this (and it is pretty fast) if you are okay working outside of the pacbio toolset. See You can work on subsets of reads by filtering the input bam file if it has an index. e.g.:
|
@mrvollger Thank you! It looks very powerful! |
Hello,
I am trying to find out the methylation status of haplotype-specific variants. These variants may not be assembled in the haploid genome (no mapping) and thus I need to go back to the reads to measure relative richness. Can you help to develop some codes to convert the CpG status of each read into a bed file, so that calling methylation is entirely based on --min-passes and independent of genome assembly (no mapping needed)?
Making a bed file for all reads could be slow and big, alternatively, can it take a subset of reads (i.e., unmapped or unreliability mapped) and generate bed-formatted output?
The bed file could be like:
Thank you!
Shujun
The text was updated successfully, but these errors were encountered: