To discriminate peptides with antiviral activity from inactive ones, we settled on four types of features, as these yielded appreciable accuracy when used with machine learning techniques:
1. Motif search using MEME/MAST (Bailey and Elkan 1994; Bailey, Boden et al. 2009)
2. Amino acid composition
3. Sequence alignment using BLAST
4. Physico-chemical parameters, including secondary structure, charge, size, hydrophobicity and amphiphilic character
The values of the physico-chemical properties were retrieved from the AAindex database (Kawashima and Kanehisa 2000).
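As an illustration of the amino acid composition feature, the following minimal sketch computes the percentage of each of the 20 standard residues in a peptide. The function name, the handling of non-standard residues and the example sequence are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the percentage of each standard amino acid in a peptide."""
    counts = Counter(sequence.upper())
    # Assumption: only standard residues count towards the normalising length
    length = sum(counts[aa] for aa in AMINO_ACIDS)
    if length == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / length for aa in AMINO_ACIDS]


# Hypothetical peptide used only to show the call
composition = amino_acid_composition("GIGKFLHSAK")
```

The result is a 20-dimensional vector whose entries sum to 100, which can serve as one input pattern for a classifier.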
Algorithm
A Support Vector Machine (SVM) was implemented using the freely downloadable software package SVMlight (Joachims 1999, http://svmlight.joachims.org/). SVMlight is an implementation of Vapnik's Support Vector Machine (Vapnik 1995) for the problem of pattern recognition. The software lets the user set a number of parameters and choose among several built-in kernel functions, including a radial basis function (RBF) kernel and a polynomial kernel.
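SVMlight reads training examples as text lines of the form `<target> <index>:<value> ...`, with feature indices starting at 1 and listed in increasing order. The sketch below shows one way a feature vector could be written in that format; the helper name and the four-decimal formatting are assumptions for illustration, not part of the original pipeline.

```python
def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...'."""
    # Feature indices start at 1 and appear in increasing order;
    # zero-valued features are omitted because the format is sparse.
    pairs = [
        f"{index}:{value:.4f}"
        for index, value in enumerate(features, start=1)
        if value != 0
    ]
    return " ".join([f"{label:+d}"] + pairs)


# +1 for an antiviral peptide, -1 for a non-antiviral one (illustrative values)
line = to_svmlight_line(1, [20.0, 0.0, 10.0])
```

A file of such lines could then be passed to `svm_learn`, for example with `-t 2 -g <gamma>` for the RBF kernel or `-t 1 -d <degree>` for the polynomial kernel. The kernel and parameter values actually used are not stated here, so these are placeholders.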
Evaluation Modules
The performance of the modules constructed in this study was evaluated using 5-fold cross-validation. The relevant dataset was partitioned randomly into five equally sized sets. Training and testing were carried out five times, each time using one distinct set for testing and the remaining four sets for training. The performance of the methods was computed using the following formulas:
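The partitioning step can be sketched as follows. This is a minimal illustration that assumes examples are referred to by index and shuffled with a fixed seed; the function name and the seed are hypothetical, and the study's own partitioning procedure may differ in detail.

```python
import random


def five_fold_splits(n_examples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) once for each fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into folds whose sizes differ by at most one
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Ten examples give five folds with two test examples each
splits = list(five_fold_splits(10))
```

Each example appears in exactly one test set, so every peptide is predicted once by a model that did not see it during training.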
Sensitivity (Sn) = [TP / (TP+FN)]*100
Specificity (Sp) = [TN / (TN+FP)]*100
Accuracy (Ac) = [(TP+TN) / (TP+FP+TN+FN)]*100
TP and TN are the numbers of correctly predicted antiviral and non-antiviral peptides, respectively.
FP is the number of non-antiviral peptides wrongly predicted as antiviral, and FN is the number of antiviral peptides wrongly predicted as non-antiviral.
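The three measures can be computed directly from the four counts, with the numerator of accuracy taken as TP+TN. The sketch below uses made-up counts purely to show the arithmetic; they are not results from this study.

```python
def evaluate(tp, tn, fp, fn):
    """Return sensitivity, specificity and accuracy as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Illustrative counts only: 50 antiviral and 50 non-antiviral test peptides
sn, sp, ac = evaluate(tp=40, tn=45, fp=5, fn=10)
print(sn, sp, ac)  # 80.0 90.0 85.0
```

Under 5-fold cross-validation, one option is to sum the counts over the five test sets before applying the formulas; another is to average the per-fold percentages. Which of the two was used is not stated here.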
Bailey, T. L., M. Boden, et al. (2009). "MEME SUITE: tools for motif discovery and searching." Nucleic Acids Res 37(Web Server issue): W202-208.
Kawashima, S. and M. Kanehisa (2000). "AAindex: amino acid index database." Nucleic Acids Res 28(1): 374.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.
Joachims, T. (1999). "Transductive inference for text classification using support vector machines." International Conference on Machine Learning (ICML).
SVMlight for Linux was downloaded from http://download.joachims.org/svm_light/current/svm_light.tar.gz