Make bed file for reads processed by primrose #11

oushujun · 2022-05-20T13:58:41Z

Hello,

I am trying to find out the methylation status of haplotype-specific variants. These variants may not be assembled in the haploid genome (no mapping), so I need to go back to the reads to measure relative richness. Could you help develop some code to convert the CpG status of each read into a bed file, so that methylation calling is based entirely on --min-passes and is independent of the genome assembly (no mapping needed)?

Making a bed file for all reads could be slow and produce a large file. Alternatively, could it take a subset of reads (i.e., unmapped or unreliably mapped) and generate bed-formatted output?

The bed file could be like:

#read_id start end status pass
m54333U_210725_040653/7/ccs 189 190 methylated 4
m54333U_210725_040899/7/ccs 198 199 unmethylated 7
...

Thank you!
Shujun

ctsa · 2022-05-27T19:38:09Z

Thanks for describing your use case. Would you already have a BAM file with MM/ML tags for your unmapped reads? If so, it sounds like you're looking for more convenient access to the methylation data than the MM/ML tags themselves -- is this accurate?

oushujun · 2022-05-27T19:46:17Z

Yes, this is accurate. Is there an easy way to translate MM/ML tags to the bed format?

ctsa · 2022-05-27T19:59:27Z

The information you want is embedded in the MM/ML tags. It's not trivial to convert it into the bed format you describe, but this would give you the most control over probability thresholds for the meth/unmeth states you'd like to include. I'm not sure if or when we would be able to help out with such a converter, but it's helpful to capture the use case here as a first step.
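For illustration only, here is a minimal Python sketch of what such a conversion might look like. It is not a PacBio tool. It assumes the read is in its original (unmapped, forward) orientation, that only a single-code `C+m` entry is of interest, and that the MM/ML values have already been pulled out of the BAM record. The function names, the `ambiguous` status, and the 0.8/0.2 thresholds are hypothetical choices. The last column reports the probability rather than the pass count from the example above.

```python
from typing import Iterator, List, Tuple


def mm_ml_to_sites(seq: str, mm: str, ml: List[int],
                   code: str = "C+m") -> Iterator[Tuple[int, float]]:
    """Yield (0-based read position, probability) for one MM entry.

    Assumes seq is in the orientation the MM tag refers to.
    """
    ml_offset = 0  # ML values for all MM entries are concatenated in MM order
    for entry in mm.split(";"):
        if not entry:
            continue
        fields = entry.split(",")
        header = fields[0].rstrip("?.")  # drop the optional '?' or '.' flag
        skips = [int(x) for x in fields[1:] if x]
        if header != code:
            ml_offset += len(skips)  # skip ML values of other modifications
            continue
        base_positions = [i for i, b in enumerate(seq) if b == code[0]]
        idx = -1
        for k, skip in enumerate(skips):
            idx += skip + 1  # skip = number of unreported bases of this type
            # ML value N covers [N/256, (N+1)/256); report the bin midpoint
            prob = (ml[ml_offset + k] + 0.5) / 256
            yield base_positions[idx], prob
        return


def to_bed_rows(read_id: str, seq: str, mm: str, ml: List[int],
                meth_min: float = 0.8,
                unmeth_max: float = 0.2) -> Iterator[str]:
    """Format per-read calls as BED-like rows (thresholds are assumptions)."""
    for pos, prob in mm_ml_to_sites(seq, mm, ml):
        if prob >= meth_min:
            status = "methylated"
        elif prob <= unmeth_max:
            status = "unmethylated"
        else:
            status = "ambiguous"
        yield f"{read_id}\t{pos}\t{pos + 1}\t{status}\t{prob:.3f}"
```

Keeping the probability in the output leaves the choice of threshold to downstream analysis instead of fixing it at conversion time.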

dportik · 2022-06-03T22:31:07Z

Hi @oushujun,
Can you describe in more detail what this information will be used for?

It is possible to write code to dump the values per read into bed format, but the size of the output file will balloon very quickly because we are looking at sites per read rather than sites across the reference.

The output you suggested contains a binary methylated or unmethylated state, but this would be a simplification of a continuous probability score. We perform a correction in the model-based pileup to improve on the single molecule accuracy. Taking values from individual reads is less desirable. If you could describe how you intend to use these individual values, that might help us think of a better way to implement a feature like this.

oushujun · 2022-06-04T04:30:36Z

Hello @dportik,

Given the uncertainties in a genome assembly, including assembly errors, heterozygosity, unassembled highly repetitive regions, etc., going back to the read level will help detect the methylation status of these regions.

oushujun · 2022-06-04T04:36:31Z

Sorry, I accidentally closed the issue.

Using binary states for methylation is a simplification. If possible, you could also report a continuous probability score. We may not be able to apply the model-based pileup to improve call accuracy at the read level, but it's still way "better than nothing."

The file could be very large for all reads. That's why, if the script could take in a list of reads, it would alleviate this issue and make the analysis more specific.
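As a rough sketch of the read-list idea, the Python below filters SAM text lines (for example, the output of `samtools view`) down to a set of read IDs and pulls out the sequence plus the MM/ML tags for each. The function name `select_reads` and the handling of `Mm`/`Ml` spellings are assumptions made for illustration, not part of primrose.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def select_reads(sam_lines: Iterable[str],
                 wanted: Set[str]) -> Iterator[Tuple[str, str, str, List[int]]]:
    """Yield (read_id, sequence, MM string, ML values) for wanted reads only."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[0] not in wanted:
            continue
        mm, ml = None, None
        for tag in fields[11:]:
            name, _type, value = tag.split(":", 2)
            # Assumption: also accept the Mm/Ml spelling in case a file uses it
            if name in ("MM", "Mm"):
                mm = value
            elif name in ("ML", "Ml"):
                # Text form is "C,250,10,..."; drop the array subtype letter
                ml = [int(x) for x in value.split(",")[1:]]
        if mm is not None and ml is not None:
            yield fields[0], fields[9], mm, ml
```

One way to use it would be to pipe `samtools view -f 4 input.bam` (unmapped reads only) into a script that passes `sys.stdin` as `sam_lines`, so the per-read output stays limited to the reads of interest.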

mrvollger · 2023-05-13T00:19:55Z

@oushujun my tool can do this (and it is pretty fast) if you are okay working outside of the PacBio toolset. See `ft extract` at https://github.com/fiberseq/fibertools-rs

You can work on subsets of reads by filtering the input bam file if it has an index. e.g.:

samtools view -@ 8 -u input.bam chr20:1-10000000 | ft extract -r --cpg - | bgzip -@ 8 > chr20.cpg.bed12.gz

oushujun · 2023-07-12T03:53:53Z

@mrvollger Thank you! It looks very powerful!

ctsa added the enhancement New feature or request label May 27, 2022

oushujun mentioned this issue Jun 2, 2022

Program stalled/zombified #15

Closed

oushujun closed this as completed Jun 4, 2022

oushujun reopened this Jun 4, 2022
