Is CpG status predicted at insertion variants? #40

Open
elcortegano opened this issue Mar 31, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@elcortegano

Following a recent issue opened on the IGV site (igvteam/igv#1303), we wonder whether aligned_bam_to_cpg_scores.py would call 5mC-methylated CpG sites at insertions (i.e. at sites that are not in the reference).

I ran primrose on my CCS reads, and 5mC sites were then called from a pbmm2-aligned BAM with a command similar to:

python3 aligned_bam_to_cpg_scores.py \
    -b sample.5mc.pbmm2.bam \
    -f reference.fa \
    -o founders \
    -p model \
    -c 5 \
    -d pileup_calling_model/ \
    -t 16 
@ctsa
Member

ctsa commented Mar 31, 2023

Thanks @elcortegano, I just commented on the IGV ticket. In summary, HiFi reads always contain methylation information in insertions if primrose 5mCpG calling has been applied to them (this process occurs on unaligned reads). The pileup tools do not provide an output for insertions today. In the default denovo mode for aligned_bam_to_cpg_tools, we could in theory provide a summary for insertions, but there are a few practical barriers, the first being how to communicate these results in a standardized file format. Neither bed nor bigwig output would support this. If you have an output format and corresponding viewer in mind, we could look into supporting it.
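
To illustrate the point that read-level calls exist inside insertions even though the pileup does not report them, here is a minimal sketch in plain Python. It is not how pb-CpG-tools or primrose work internally. It assumes a forward-strand alignment, a single `C+m` entry in the MM tag, and that the MM/ML values follow the SAM optional-tags convention (MM gives counts of skipped C bases, ML gives 0-255 scaled probabilities). All function names are hypothetical.

```python
import re

def insertion_intervals(cigar):
    """Return half-open read-coordinate intervals covered by CIGAR 'I' operations."""
    intervals = []
    read_pos = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "I":
            intervals.append((read_pos, read_pos + length))
        if op in "MIS=X":  # operations that consume read bases
            read_pos += length
    return intervals

def cpg_mod_calls(seq, mm, ml):
    """Decode a single-entry 'C+m' MM tag into (read_pos, probability) pairs.

    Assumption: the read is in its original orientation (forward-strand alignment).
    """
    fields = mm.rstrip(";").split(",")
    skips = [int(x) for x in fields[1:]]  # fields[0] is the 'C+m' header
    c_positions = [i for i, base in enumerate(seq) if base == "C"]
    calls = []
    idx = -1
    for skip, q in zip(skips, ml):
        idx += skip + 1  # skip 'skip' unmodified-or-uncalled C bases
        # ML value q covers [q/256, (q+1)/256); use the midpoint as an estimate
        calls.append((c_positions[idx], (q + 0.5) / 256))
    return calls

def calls_in_insertions(seq, cigar, mm, ml):
    """Keep only the modification calls that fall inside inserted sequence."""
    ins = insertion_intervals(cigar)
    return [(pos, prob) for pos, prob in cpg_mod_calls(seq, mm, ml)
            if any(start <= pos < end for start, end in ins)]

# Toy read: the CpG at read position 6 lies inside the 4 bp insertion.
hits = calls_in_insertions("AACGTTCGAACGTT", "5M4I5M", "C+m,0,0,0;", [10, 240, 128])
print(hits)
```

For reverse-strand alignments the MM tag still refers to the originally sequenced orientation, so positions would need to be mapped through the reverse complement before comparing them with CIGAR intervals; that step is left out here.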

@elcortegano
Author

Thanks for all the clarifications, @ctsa, especially for the input on the 5mCpG calls in the BAMs.

Regarding the issue here with the pileup methods, I understand the difficulties in adding insertion data. The only standard format I can think of for that data would look like a VCF, with the different modification probabilities in the FORMAT field (?).

I think that could be useful. I'll admit that for my personal use right now, IGV visualization of 5mC sites would be enough. However, relying on IGV visualization is not scalable, so I think other users might benefit from a feature like this in the future.
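
As a rough illustration of the VCF idea, the sketch below builds one tab-separated, VCF-style record for an insertion and carries per-CpG modification probabilities in the sample column. This is purely hypothetical: the `MP` FORMAT key and the `CPGOFF` INFO key are invented here for illustration, are not part of the VCF specification, and are not produced by any existing tool.

```python
def cpg_offsets(ins_seq):
    """0-based offsets of each CpG dinucleotide within the inserted sequence."""
    return [i for i in range(len(ins_seq) - 1) if ins_seq[i:i + 2] == "CG"]

def insertion_vcf_line(chrom, pos, ref_base, ins_seq, probs):
    """Format a VCF-style line for an insertion with per-CpG probabilities.

    Hypothetical layout: INFO/CPGOFF lists CpG offsets in the insertion,
    FORMAT/MP lists one modification probability per CpG, in the same order.
    """
    offsets = cpg_offsets(ins_seq)
    if len(offsets) != len(probs):
        raise ValueError("expected one probability per CpG in the insertion")
    info = "CPGOFF=" + ",".join(map(str, offsets)) if offsets else "."
    sample = ",".join(f"{p:.2f}" for p in probs) if probs else "."
    columns = [
        chrom, str(pos), ".",          # CHROM, POS, ID
        ref_base, ref_base + ins_seq,  # REF, ALT (anchor base + inserted bases)
        ".", ".",                      # QUAL, FILTER
        info, "MP", sample,            # INFO, FORMAT, sample column
    ]
    return "\t".join(columns)

line = insertion_vcf_line("chr1", 1000, "T", "ACGTTCGA", [0.94, 0.12])
print(line)
```

A real implementation would also need matching `##INFO` and `##FORMAT` header lines and a decision on how to aggregate probabilities across the reads that support the insertion.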

@ctsa
Member

ctsa commented Mar 31, 2023

Okay, that sounds good. I'll be in touch with Jim to see if we can help with the IGV visualization at the read level. For the pileup, we'll look out for a way to do this; perhaps VCF or another format will make this doable in the future.

@ctsa ctsa added the enhancement New feature or request label Apr 5, 2023
@leon945945

Same requirement as @elcortegano. HiFi data is beneficial for insertion identification, and the 5mC methylation status may be closely related to these insertions, so support for 5mC calls within insertion sequences is desired.
