Calibrate DNA built profiles #19
Hi, to calibrate PSSM profiles built from DNA sequences, we recommend shuffling your partial bacterial DNA sequences using a method such as "windows 20" shuffling, and then using this shuffled database for the calibration.
Okay, thanks @smoretti. By the way, I took the time to read the article by Pagni and Jongeneel, but it is still not clear to me which steps or tools to use to shuffle DNA sequences with a method like "windows 20". Could you please help me with the process for creating such a shuffled database, or point me to useful resources? I really need it. Thanks in advance.
In the distribution, a script (src/Perl/scramble_fasta.pl) is provided to do this. The "windows 20" method should be run with
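For readers unfamiliar with the idea, window shuffling can be sketched in Python as follows. This is only an illustration of the general technique, under the assumption that "windows 20" means permuting residues within consecutive, non-overlapping windows of 20 positions. It is not the scramble_fasta.pl implementation and says nothing about that script's command line; the function names `window_shuffle` and `shuffle_fasta` are hypothetical.

```python
import random


def window_shuffle(seq, window=20, rng=None):
    """Shuffle residues inside consecutive, non-overlapping windows.

    Local composition is preserved window by window, while the order
    of residues inside each window is randomized.
    """
    if rng is None:
        rng = random.Random()
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)  # in-place permutation of this window only
        pieces.append("".join(chunk))
    return "".join(pieces)


def shuffle_fasta(text, window=20, seed=None):
    """Return FASTA text in which every sequence has been window-shuffled.

    Headers are kept unchanged; each sequence is written on one line.
    """
    rng = random.Random(seed)
    out = []
    seq_parts = []

    def flush():
        # Emit the sequence collected so far, if any.
        if seq_parts:
            out.append(window_shuffle("".join(seq_parts), window, rng))
            seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n"
```

Calling `shuffle_fasta(open("genes.fasta").read(), window=20, seed=1)` (with a hypothetical input file name) would give a shuffled database with the same headers, lengths, and per-window base composition as the input.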
Many thanks for your help! I'm trying it now.
Thank you again, @smoretti, for the help and for pointing me to the Perl script. I'll explore the other files in the pftools2 package further.
Hello @smoretti, I have run
This is the first time I have seen a negative SCORE, and I'm trying to work out what I'm doing wrong. Thanks in advance for the help.
Negative SCOREs are possible, mainly when global (not local) profiles are used. Your case is trickier. Could you retry with shorter sequences (and profiles)?
I would like to, but I would lose important gene information. I have already used partial gene sequences that are shorter than the full gene length. Is there no other way? Or could the storage size used for DNA profiles perhaps be increased?
Sorry, I missed your message. In fact, by default profiles should be stored in 16 bits. If you rebuild pftools3 with this option If that does not solve it, you can try shorter profiles by splitting them and building overlapping profiles.
Thanks for your response. While waiting for it, I started splitting the sequences to build shorter, overlapping profiles, but I have not got far yet. I will definitely try both options and see which one leads to meaningful results. I'll let you know.
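The splitting suggestion can be sketched as below. This is a minimal illustration, not part of pftools: it cuts one long sequence into fixed-size segments that overlap their neighbours, so that a profile could then be built from each segment. The function name `split_overlapping` and the `size` and `overlap` values are assumptions chosen for the example.

```python
def split_overlapping(seq, size, overlap):
    """Cut seq into segments of at most `size` residues.

    Consecutive segments share `overlap` residues. Returns a list of
    (start_offset, segment) tuples, with 0-based offsets into seq.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    segments = []
    start = 0
    while True:
        end = min(start + size, len(seq))
        segments.append((start, seq[start:end]))
        if end == len(seq):
            break  # the last segment reaches the end of the sequence
        start += step
    return segments


# Example: 10 residues, segments of 4 with an overlap of 2
for offset, segment in split_overlapping("ABCDEFGHIJ", size=4, overlap=2):
    print(offset, segment)
```

Keeping the start offsets makes it possible to map a match found with a segment profile back to coordinates in the full-length sequence.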
Hello,
I am trying to build PSSM profiles for DNA sequences. The profile construction ran smoothly. Now I need to calibrate the profile with a database and I really cannot find a way to do that. Can you please show me a way or a database to use?
P.S. The profile was built with partial bacterial DNA sequences from NCBI.
Thanks in advance.