fastadigest

fastadigest digests the sequences in a given FASTA file with an enzyme you specify.

To digest a protein FASTA file with trypsin (trypsin is the default enzyme, but it is specified explicitly here):

python fastadigest.py --enzyme trypsin --file /home/chris/ref/refseq62.faa --out /home/chris/ref/refseq62_trypsin.fasta

This outputs something like:

>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:1
YNMSNADYEILEATIVK
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:2
FSGAFYFATTVITTIGYGHSTPMTDAGK
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:3
VFCMLYALAGIPLGLIMFQSIGER
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:4
MNTFAAK
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:5
FLTMNTEDER
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:6
DEQEAILAAQGLVR
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:7
VGDPTADDDFGR
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:8
LPLSDNVSLASCSCYQLPDEK
>gi|17536613|ref|NP_494333.1| Protein SUP-9 [Caenorhabditis elegans] Pep:9
HTEPHGGPPTFSGMTTRPK
>gi|17535787|ref|NP_495499.1| Protein SRD-59 [Caenorhabditis elegans] Pep:1
SPATLDGLK
>gi|17535787|ref|NP_495499.1| Protein SRD-59 [Caenorhabditis elegans] Pep:2
IFLYNTSCVQIALITFAFLSQHR
>gi|17535787|ref|NP_495499.1| Protein SRD-59 [Caenorhabditis elegans] Pep:3
...

Here Pep:# is the number of the tryptic peptide obtained from that digested protein.
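As a rough illustration of what a tryptic digest does, here is a minimal sketch. It is not the code in fastadigest.py. It assumes the common rule that trypsin cleaves after K or R unless the next residue is P, which is consistent with the sample output above (for example, HTEPHGGPPTFSGMTTRPK keeps its internal RP). The function names `tryptic_peptides` and `format_peptides` are hypothetical, and any length filtering or missed-cleavage handling the real tool may do is left out.

```python
import re


def tryptic_peptides(protein):
    """Split a protein after K or R, except when the next residue is P.

    Assumed rule for illustration; the real tool may differ.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    # Splitting on zero-width matches requires Python 3.7 or later.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def format_peptides(header, protein):
    """Return FASTA-style entries numbered Pep:1, Pep:2, ... per protein."""
    return [
        f">{header} Pep:{number}\n{peptide}"
        for number, peptide in enumerate(tryptic_peptides(protein), start=1)
    ]
```

Calling `tryptic_peptides` on a string made by joining three peptides from the sample output, `"MNTFAAKFLTMNTEDERDEQEAILAAQGLVR"`, splits it back into `MNTFAAK`, `FLTMNTEDER` and `DEQEAILAAQGLVR`.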

To digest a whole-genome nucleotide FASTA file:

python fastadigest.py --enzyme trypsin --file /home/chris/ref/human/hg19.fa --out /home/chris/ref/hg19_trypsin.fasta --type nt --frame 6 --genome

This would output:

>chr1 F:+1 Start:10147 End:10182
PLTLTLTLTLT
>chr1 F:+1 Start:10231 End:10263
PLTLTLNPKP
>chr1 F:+1 Start:10288 End:10329
PQPQPQPQPQPQP
>chr1 F:+1 Start:10330 End:10359
PLTLTLTLP
>chr1 F:+1 Start:10390 End:10446
PLTPNPNPNPNPNPNPNP
>chr1 F:+1 Start:10474 End:10512
YPQPARPPGSDLR

Here Start and End are the coordinates of the peptide, which ends either at a stop codon or at a tryptic cleavage site.
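For nucleotide input, the sequence has to be translated before it can be digested. The sketch below shows a generic six-frame translation with the standard genetic code; it is an illustration, not fastadigest's implementation. The frame labels `+1` to `+3` follow the `F:+1` style seen in the output above, the `-1` to `-3` labels for the reverse strand are an assumption, and genomic coordinate tracking is omitted because the exact Start/End convention is not spelled out here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame labels (+1..+3, -1..-3) to translated sequences; '*' marks a stop."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Each translated frame can then be cut at stop codons (`*`) and at tryptic cleavage sites to obtain peptides like those listed above.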

To digest a RefSeq nucleotide FASTA file:

python fastadigest.py --enzyme trypsin --file /home/chris/ref/human/refseq62.fa --out /home/chris/ref/refseq62_nt_trypsin.fasta --type nt --frame 3

This would output:

>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:1
AEAAAGASGR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:2
HHDSCEDAALPGR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:3
ALHPARPAAR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:4
GGRPADGGCPAVPAEGLWR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:5
HLQQVEQSR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:6
SQVQAIGEK
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:7
VSLAQAK
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:8
LQEYGSIFTGAQDPGLQR
>NR_024540 gene=WASH7P F:+1 Orf:1 Pep:9
HRPLDER
>NR_024540 gene=WASH7P F:+1 Orf:2 Pep:1
YVFLDPLAGAVTK
>NR_024540 gene=WASH7P F:+1 Orf:2 Pep:2
THVMLGAETEEK
>NR_024540 gene=WASH7P F:+1 Orf:2 Pep:3
LFDAPLSISK
>NR_024540 gene=WASH7P F:+1 Orf:2 Pep:4

Here Orf:# is the number of the open reading frame within the given frame, and Pep:# is the number of the tryptic peptide within that ORF.
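The Orf/Pep numbering can be sketched as follows. This is a hedged illustration only: it assumes an ORF is any non-empty stretch between stop codons in a translated frame (the sample peptides above do not all start with M, which fits that reading), that empty stretches are not counted, and that trypsin cleaves after K or R unless followed by P. The function name `orf_peptides` is hypothetical.

```python
import re


def orf_peptides(translated_frame):
    """Yield (orf_number, peptide_number, peptide) from a translated frame.

    Assumptions for illustration: '*' marks a stop codon, an ORF is a
    non-empty stop-to-stop segment, and peptide numbers restart in each ORF.
    """
    orf_number = 0
    for segment in translated_frame.split("*"):
        if not segment:
            continue  # consecutive stops: nothing to digest, not counted
        orf_number += 1
        # Cleave after K or R unless the next residue is P.
        peptides = [p for p in re.split(r"(?<=[KR])(?!P)", segment) if p]
        for peptide_number, peptide in enumerate(peptides, start=1):
            yield orf_number, peptide_number, peptide
```

For example, a frame translated as `"AEAAAGASGRHHDSCEDAALPGR*YVFLDPLAGAVTKTHVMLGAETEEK"` (a made-up string built from peptides in the sample output) gives two peptides in Orf:1 and two in Orf:2.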
