How to determine the quantity of various kinds of random mutations in a genomic simulation? #15
Hi there, this is a tricky question that depends on lots of things.
Best,
First and foremost, I greatly appreciate your reply. My research goal is to simulate a cancer genome derived from the human reference genome by individually simulating the paternal and maternal genomes, subsequently simulating reads based on these genomes, and finally combining the reads from both parental genomes into a single final fastq file. I am wondering whether it is viable to use simuG for this purpose, considering that simuG includes features for simulating a variety of mutation types such as SNPs, INDELs, CNVs, INVERSIONS (INV), and TRANSLOCATIONS (TRA), all of which are integral aspects of simulating cancer genomes. Thank you once again for your help.
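The last step of the workflow described above (combining reads from the two parental genomes into one fastq file) could be sketched as follows. This is a minimal sketch, not part of simuG: the function names `open_text` and `merge_haplotype_fastqs` and the haplotype tags are hypothetical, and it assumes plain four-line FASTQ records produced by whatever read simulator is used. Read names are prefixed with a haplotype tag so that reads from the two genomes cannot collide.

```python
import gzip


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode ("r" or "w")."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def merge_haplotype_fastqs(inputs, output):
    """Concatenate FASTQ files into one, tagging each read name.

    inputs: dict mapping a haplotype tag (hypothetical, e.g. "pat") to a FASTQ path.
    Returns the number of reads written.
    """
    written = 0
    with open_text(output, "w") as out:
        for tag, path in inputs.items():
            with open_text(path, "r") as fh:
                while True:
                    header = fh.readline()
                    if not header:
                        break  # end of this input file
                    if not header.strip():
                        continue  # tolerate stray blank lines
                    seq = fh.readline().rstrip("\n")
                    plus = fh.readline().rstrip("\n")
                    qual = fh.readline().rstrip("\n")
                    if not header.startswith("@") or not plus.startswith("+"):
                        raise ValueError(f"malformed FASTQ record in {path}")
                    if len(seq) != len(qual):
                        raise ValueError(f"sequence/quality length mismatch in {path}")
                    # Keep any description after the first space unchanged.
                    name, sep, rest = header.rstrip("\n")[1:].partition(" ")
                    out.write(f"@{tag}_{name}{sep}{rest}\n{seq}\n{plus}\n{qual}\n")
                    written += 1
    return written
```

A call might look like `merge_haplotype_fastqs({"pat": "paternal.fq", "mat": "maternal.fq"}, "tumor.fq")`, where the file names are placeholders. For paired-end data, the R1 and R2 files would each need to be merged with the inputs in the same order so that mates stay in sync.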
Hi @DayTimeMouse, I see. Yes, simuG can definitely do what you need. Just find a cancer genome sequencing paper for the specific cancer types that you want to cover and plug in its estimated per-sample counts of SNVs, CNVs, and SVs. We have a paper on NKTCL that reports these numbers coming out soon. If you are interested, I will post a link to the paper when it comes out in early to mid April. One thing to keep in mind is that there can be significant intratumor genomic heterogeneity across different cancer cells, and what we usually sequence for real cancer genomes is a bulk of such heterogeneous cancer cell populations. I don't know the bigger scientific question of the study in which this simulation analysis is involved, but you might want to introduce, or at least discuss, the noise generated by such intercellular genomic heterogeneity when you compare your simulated data with real cancer sequencing data. Best,
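As a rough illustration of "plugging in" per-sample counts, the sketch below assembles simuG command lines from a table of counts. Several things here are assumptions to check against `perl simuG.pl -h`: the option names (`-refseq`, `-snp_count`, `-indel_count`, `-cnv_count`, `-inversion_count`, `-translocation_count`, `-prefix`) are written as recalled from the simuG documentation, the helper `build_simug_command` is hypothetical, and the counts are placeholders rather than values from any paper. The sketch emits one command per variant type, since it does not assume that all types can be combined in a single run.

```python
import shlex

# Variant types mapped to the simuG count option assumed for each
# (verify the exact option names with `perl simuG.pl -h`).
COUNT_OPTIONS = {
    "snp": "-snp_count",
    "indel": "-indel_count",
    "cnv": "-cnv_count",
    "inversion": "-inversion_count",
    "translocation": "-translocation_count",
}


def build_simug_command(refseq, prefix, variant_type, count, script="simuG.pl"):
    """Return an argument list for one simuG run introducing one variant type."""
    if variant_type not in COUNT_OPTIONS:
        raise ValueError(f"unknown variant type: {variant_type}")
    if count <= 0:
        raise ValueError("count must be a positive integer")
    return [
        "perl", script,
        "-refseq", refseq,
        COUNT_OPTIONS[variant_type], str(count),
        "-prefix", prefix,
    ]


# Placeholder counts only; replace them with per-sample estimates
# reported for the cancer type of interest.
placeholder_counts = {"snp": 5000, "indel": 500, "cnv": 20}

for variant_type, count in placeholder_counts.items():
    cmd = build_simug_command(
        "reference.fa", f"tumor.{variant_type}", variant_type, count
    )
    print(shlex.join(cmd))
```

Each printed line can be inspected before it is run. How the output genome of one run should be fed into the next is left open here, because the output file naming is not stated in this thread.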
Yes, I am very interested in your related work and look forward to you posting the link. I will follow it. Finally, I wish you all the best!
Hi @DayTimeMouse, here is the link to our paper, which reports estimated per-sample variant numbers for SNVs, CNVs, and SVs: Best,
Hi @yjx1217, thank you so much. I learned a lot from this paper. Warm regards.
Hi @yjx1217, I used pbsv (https://github.com/PacificBiosciences/pbsv) to call SVs, and it reported DUPLICATION calls. I would like to ask how to set up duplication variants in simuG. Is there any difference between the CNV and DUPLICATION setups? The description of DUPLICATION is below:
Hi @DayTimeMouse, thanks for the email. You can consider CNV as a consequence of segmental/tandem insertion + deletion + duplication + contraction. So you can use simuG to introduce CNVs in general, which will include some cases of duplication. If you only want to simulate duplications, you can still use simuG's CNV simulation function with specialized settings for the following parameters:
Best,
Hi,
I am quite puzzled about how to appropriately set the number of different types of mutations when simulating a human cancer genome.
Could you provide me with some guidance on this matter?
Thanks a lot!