By searching the UniProtKB database with keywords such as "neuropeptide" and filtering out entries that lack a precursor flag or a signal peptide annotation, we collected 1194 complete, reviewed neuropeptide precursors. We selected 31 precursors that were added to UniProt after 2014 as the independent test dataset. To guarantee a fair comparison on this independent test dataset, collected sequences sharing more than 40% identity with any test precursor (computed with CD-HIT) were removed. After these steps, the training dataset contained 717 precursors, which were split into training and validation subsets at a ratio of 4:1. All training and test data are freely available at https://github.com/isyslab-hust/DeepNeuropePred.
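The redundancy-removal and splitting steps can be reproduced with standard tools. The sketch below illustrates them under stated assumptions: the FASTA file names are hypothetical, and the simple random 4:1 split stands in for the authors' exact partitioning; cd-hit-2d is the CD-HIT program that compares one dataset against another.

```python
# Minimal sketch of redundancy removal against the test set and a 4:1 split.
# File names and the random split are illustrative assumptions.
import random
import subprocess

# Keep only candidate training sequences that share <40% identity with any
# test precursor; cd-hit-2d retains sequences in -i2 not similar to -i.
# (-n 2 is the word size CD-HIT requires for a 0.4 identity threshold.)
subprocess.run([
    "cd-hit-2d",
    "-i", "test_precursors.fasta",        # hypothetical test-set FASTA
    "-i2", "candidate_precursors.fasta",  # hypothetical candidate pool
    "-o", "train_pool.fasta",
    "-c", "0.4", "-n", "2",
], check=True)

def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file."""
    records, header, seq = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

precursors = read_fasta("train_pool.fasta")
random.seed(0)
random.shuffle(precursors)

# 4:1 split of the remaining precursors into training and validation sets.
n_train = int(len(precursors) * 0.8)
train_set, valid_set = precursors[:n_train], precursors[n_train:]
print(f"training: {len(train_set)}, validation: {len(valid_set)}")
```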
The model architecture consists of four parts: a pre-trained protein language model, convolutional layers, an average pooling layer, and position-wise fully connected layers. The pre-trained language model (ESM-12) provides a global feature representation of the precursor, because its input is the full-length precursor sequence rather than a window around each cleavage site. Convolutional layers with two kernel sizes (1 and 3) extract local features of each 18-amino-acid window at two different scales, and the average pooling layer yields a global representation of each window. Finally, position-wise fully connected layers map the embeddings of cleavage and non-cleavage sites into the classification space.
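The classification head downstream of the language model can be sketched in PyTorch as follows. The layer widths, the embedding dimension (which depends on the ESM variant), and the exact activation choices are illustrative assumptions rather than the published configuration; the sketch only reflects the structure described above (two convolution scales, average pooling over an 18-residue window, and position-wise fully connected layers).

```python
# Sketch of a cleavage-site classification head over per-residue ESM embeddings.
# Dimensions and activations are assumptions for illustration.
import torch
import torch.nn as nn

class CleavageSiteHead(nn.Module):
    """For each candidate cleavage site, an 18-residue window of embeddings
    (window_size x embed_dim) is passed through two parallel 1-D convolutions
    (kernel sizes 1 and 3), average-pooled over the window, concatenated, and
    fed to position-wise fully connected layers that output class logits."""

    def __init__(self, embed_dim=768, hidden_dim=64):
        super().__init__()
        # Two convolution branches over the window dimension (kernel sizes 1 and 3).
        self.conv1 = nn.Conv1d(embed_dim, hidden_dim, kernel_size=1)
        self.conv3 = nn.Conv1d(embed_dim, hidden_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(1)  # average pooling over the window
        # Position-wise fully connected layers mapping to the two classes.
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, window_embeddings):
        # window_embeddings: (batch, window_size=18, embed_dim)
        x = window_embeddings.transpose(1, 2)          # (batch, embed_dim, window)
        local1 = self.pool(torch.relu(self.conv1(x)))  # (batch, hidden_dim, 1)
        local3 = self.pool(torch.relu(self.conv3(x)))  # (batch, hidden_dim, 1)
        feats = torch.cat([local1, local3], dim=1).squeeze(-1)
        return self.fc(feats)                          # (batch, 2) logits

# Example: score a batch of 4 candidate-site windows of ESM embeddings.
head = CleavageSiteHead()
logits = head(torch.randn(4, 18, 768))
print(logits.shape)  # torch.Size([4, 2])
```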