A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.

Garg, Aarti and Raghava, G.P.S. (2008) A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search. In silico biology, 8 (2). pp. 129-40. ISSN 1386-6338

raghavasilico1.mht - Published Version
Available under License Creative Commons Attribution.

Official URL: http://www.bioinfo.de/isb/2008/08/0012/

Abstract

Most prediction methods for secretory proteins require the presence of a correct N-terminal end of the preprotein for correct classification. As large-scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect prediction. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively), using two machine-learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using five-fold cross-validation. First, ANN-based modules were developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition; they achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid composition and dipeptide composition achieved accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules with the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than any individual module. Using the hybrid module, we also achieved a high sensitivity of 60.4% with only 5% false positive predictions. A web server, SRTpred, has been developed based on this study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins; it is available from http://www.imtech.res.in/raghava/srtpred/.
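
The amino acid and dipeptide composition features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the percentage scaling and the skipping of non-standard residues are assumptions not stated in the abstract. It only shows how a whole sequence of any length maps to fixed-length vectors (20 and 400 values) that a classifier such as an SVM or ANN could take as input.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Position of every ordered residue pair (AA, AC, ..., YY) in the 400-value vector.
DIPEPTIDE_INDEX = {
    a + b: k for k, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def amino_acid_composition(seq):
    """Return 20 percentages, one per standard residue (assumed scaling)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]  # assumption: skip X, B, Z, ...
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 percentages, one per ordered pair of adjacent residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p in DIPEPTIDE_INDEX]  # assumption: skip pairs with non-standard residues
    vector = [0.0] * len(DIPEPTIDE_INDEX)
    if not valid:
        return vector
    for p in valid:
        vector[DIPEPTIDE_INDEX[p]] += 100.0 / len(valid)
    return vector


if __name__ == "__main__":
    example = "MKVLAAGIVALLLA"  # made-up sequence, for illustration only
    print(len(amino_acid_composition(example)), len(dipeptide_composition(example)))
```

Unlike amino acid composition, the dipeptide vector retains local residue order, which is the "order" information referred to in the title.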

Item Type: Article
Additional Information: OPEN ACCESS
Uncontrolled Keywords: classical pathway, non-classical pathway, secretory proteins, prediction, SRTpred, redundancy, dataset size, ANN, SVM, BLAST, PSI-BLAST, N-terminal sequence
Subjects: Q Science > QH Natural history > QH301 Biology
Depositing User: Dr. K.P.S.Sengar
Date Deposited: 08 Dec 2011 19:37
Last Modified: 08 Dec 2011 19:37
URI: http://crdd.osdd.net/open/id/eprint/578
