Calibrate DNA built profiles #19

Ebedthan · 2021-03-23T12:41:22Z

Hello,

I am building PSSM profiles from DNA sequences. Profile construction ran smoothly, but I now need to calibrate the profiles against a database and cannot find a way to do that. Could you show me a method or a database to use?

P.S. The profile was built with partial bacterial DNA sequences from NCBI.

Thanks in advance.

smoretti · 2021-03-24T16:06:02Z

Hi

To calibrate PSSM profiles built from DNA sequences, we recommend shuffling your partial bacterial DNA sequences with a method such as "windows 20" shuffling and then using this shuffled database for the calibration.

Ebedthan · 2021-03-24T16:25:46Z

Okay, thanks @smoretti. By the way, I have taken the time to read the article by Pagni and Jongeneel, but I still don't know which steps or tools to use to shuffle DNA sequences with a method like "windows 20". Could you help me with the process of creating such a shuffled database, or point me to useful resources? I really need it. Thanks in advance.

smoretti · 2021-03-24T16:33:33Z

The distribution provides a script for this (src/Perl/scramble_fasta.pl).
It supports several types of shuffling.
For more information, run
perl scramble_fasta.pl -h

The "windows 20" method should be run with
perl scramble_fasta.pl -m window -P 20 a_file_with_all_your_partial_bacterial_DNA_sequences_in_fasta_format
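For readers who want to see what window shuffling does to a sequence, here is a minimal Python sketch. It assumes that "window 20" means cutting each sequence into consecutive, non-overlapping 20-residue windows and permuting the residues independently inside each window; this is an assumption, and scramble_fasta.pl may differ in details (window placement, handling of the last partial window), so use the bundled script for real calibration. The function names and the header suffix are hypothetical.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple


def window_shuffle(seq: str, window: int = 20, rng: Optional[random.Random] = None) -> str:
    """Permute residues independently within consecutive, non-overlapping windows.

    Each window keeps its exact base composition, so local composition
    (e.g. GC content) survives while the residue order is destroyed.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    rng = rng or random.Random()
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)  # in-place permutation of this window only
        pieces.append("".join(chunk))
    return "".join(pieces)


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq_parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def shuffle_fasta(in_path: str, out_path: str, window: int = 20, seed: Optional[int] = None) -> None:
    """Write a window-shuffled copy of every record in in_path to out_path."""
    rng = random.Random(seed)
    with open(in_path) as fin, open(out_path, "w") as fout:
        for header, seq in parse_fasta(fin):
            shuffled = window_shuffle(seq, window, rng)
            fout.write(f">{header} window{window}-shuffled\n")
            for i in range(0, len(shuffled), 60):  # wrap sequence lines at 60 columns
                fout.write(shuffled[i:i + 60] + "\n")
```

For example, shuffle_fasta("bacterial_dna.fa", "mywindow20.seq", window=20, seed=1) would produce a shuffled database. Shuffling within short windows, rather than over the whole sequence, keeps local compositional bias in the decoy set, which makes the calibration more conservative for DNA with uneven base composition.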

Ebedthan · 2021-03-24T16:48:53Z

Many thanks for your help! I'm trying it now.

Ebedthan · 2021-03-24T16:54:15Z

Thank you again, @smoretti, for the help and for pointing me to the Perl script. I'll explore the other files in the pftools2 package as well.

Ebedthan · 2021-03-25T13:37:47Z

Hello @smoretti,

I ran perl scramble_fasta.pl -m window -P 20 bacterial_dna.fa > mywindow20.seq and obtained the database for the profile calibration. However, the calibrated profiles report SCORE=-2147483648 for both cut-off values, and running pfscanV3 or pfsearchV3 gives the following error:

Error: Inconsistent alignment found in alignment 3 - no list produced.
       Alignement should be from 1431 to 1!
Thread 0 : Internal error xalip reported no possible alignment for sequence 0(0) (nali=-1)!

This is the first time I have seen a negative SCORE, and I'm trying to understand what I'm doing wrong.

Thanks in advance for the help.

smoretti · 2021-03-26T08:14:18Z

Negative SCOREs are possible, mainly when global (rather than local) profiles are used.

Your case is trickier.
Such extremely large SCORE values look like a memory issue:
to optimize speed and memory use, matches in pftoolsv3 are stored as 32-bit values in memory. When very large profiles are used, this storage is exceeded.

Could you retry with shorter sequences (and profiles)?
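To make the storage-size explanation above concrete, the reported SCORE=-2147483648 is exactly the smallest signed 32-bit integer, -2^31. The Python sketch below only illustrates fixed-width integer arithmetic; it is not pftools code. Whether pftools3 wraps, saturates, or uses that value as a "no valid score" sentinel is not established in this thread. The position count (taken from the 1431 in the error message) and the per-position score of 30 are hypothetical numbers chosen for illustration.

```python
# Signed integer ranges for the two storage widths mentioned in this thread.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def wrap_signed(value: int, bits: int) -> int:
    """Reduce an arbitrary Python int to a two's-complement signed integer of `bits` bits."""
    half = 1 << (bits - 1)
    return ((value + half) % (1 << bits)) - half


def fits(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed `bits`-bit integer."""
    half = 1 << (bits - 1)
    return -half <= value <= half - 1


# Hypothetical numbers: 1431 profile positions (the figure in the error message)
# and an assumed average contribution of 30 score units per matched position.
positions = 1431
per_position = 30
raw_score = positions * per_position  # 42930

print(INT32_MIN)                                # -2147483648, the value reported as SCORE
print(fits(raw_score, 16), fits(raw_score, 32))  # False True
print(wrap_signed(raw_score, 16))               # -22606 if the value silently wraps in 16 bits
```

The point is that a score accumulated over a long profile can leave the representable range of a narrow integer type, after which the stored value is meaningless; shorter profiles or a wider storage type both keep it in range.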

Ebedthan · 2021-04-02T12:05:07Z

I would like to, but I would lose important gene information: I have already used partial gene sequences shorter than the full gene. Is there another way? Or could the memory storage be increased for DNA profiles?

smoretti · 2021-04-21T12:55:25Z

Sorry, I missed your message.

In fact, profiles are stored in 16 bits by default. If you rebuild pftools3 with the option
cmake -DUSE_32BIT_INTEGER=ON
profiles will be stored in 32 bits. This may solve your issue.

If it does not, you can try shorter profiles by splitting them into overlapping segments and building one profile per segment.
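One way to plan the overlapping profiles suggested above is to cut the alignment into fixed-length column windows that share an overlap, and then build one profile per window. This is a hypothetical Python sketch; the segment length of 600 and overlap of 100 are arbitrary, not pftools recommendations. The overlap should be at least as long as the matches you want to detect, so that a match spanning a cut point is still fully contained in one segment.

```python
from typing import List, Sequence, Tuple


def overlapping_segments(length: int, segment: int, overlap: int) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) windows of at most `segment` columns covering 1..length,
    with consecutive windows sharing `overlap` columns."""
    if length < 1:
        raise ValueError("length must be positive")
    if not 0 <= overlap < segment:
        raise ValueError("need 0 <= overlap < segment")
    step = segment - overlap
    windows = []
    start = 1
    while True:
        end = min(start + segment - 1, length)
        windows.append((start, end))
        if end == length:
            return windows
        start += step


def slice_alignment(rows: Sequence[str], start: int, end: int) -> List[str]:
    """Extract columns start..end (1-based, inclusive) from every aligned row."""
    return [row[start - 1:end] for row in rows]


for start, end in overlapping_segments(1431, 600, 100):
    print(start, end)
# 1 600
# 501 1100
# 1001 1431
```

Each sliced block could then be written out as its own alignment and passed through the same profile-building and calibration steps as before.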

Ebedthan · 2021-04-21T13:03:49Z

Thanks for your response. While waiting, I started splitting the sequences to build shorter, overlapping profiles, though I haven't gotten far yet. I'll definitely try both options and see which one leads to meaningful results. I'll let you know.

Ebedthan closed this as completed Mar 24, 2021

smoretti reopened this Mar 26, 2021
