Help about ChloroPred
Support vector machine (SVM) is a machine learning method based on the statistical learning theory presented by V.N. Vapnik. It has been successfully applied to numerous classification and pattern recognition problems, such as text categorization, image recognition and bioinformatics. Training an SVM yields a globally optimal solution to the classification problem, whereas neural networks trained with gradient-based algorithms may converge only to a local optimum. SVM_light is a freely downloadable package written by Thorsten Joachims, available from http://ais.gmd.de/~thorsten/svm_light/. SVM_light is used here to predict chloroplast proteins. The SVM modules were developed on the basis of amino acid composition and dipeptide composition.
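The two feature types named above can be sketched as follows. This is a minimal illustration, not the ChloroPred code: the function names are hypothetical, compositions are expressed as fractions (the page does not say whether fractions or percentages were used), and non-standard residues are simply ignored. The last helper writes one example in the sparse `label index:value` input format that SVM_light reads.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return 20 fractions, one per standard amino acid (assumed definition)."""
    seq = sequence.upper()
    total = sum(seq.count(aa) for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [seq.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 fractions, one per ordered pair of standard amino acids."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no standard dipeptides")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for p in valid:
        counts[p] += 1
    return [counts[a + b] / len(valid) for a, b in product(AMINO_ACIDS, repeat=2)]


def svm_light_line(label, features):
    """Format one example as 'label index:value ...' (1-based, zeros skipped)."""
    parts = [f"{i}:{v:.6f}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + parts)
```

A 20-dimensional vector from `amino_acid_composition` or a 400-dimensional vector from `dipeptide_composition` would be passed to `svm_light_line` with a label of 1 or -1 to build a training file.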
Evaluation of Performance:-
Prediction quality was examined with the 5-fold cross-validation technique. In this technique the dataset is partitioned randomly into 5 equal-sized subsets. Training and testing are carried out five times, each time using one subset for testing and the other four for training. Prediction results are commonly summarized by the numbers of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). For the prediction system, sensitivity, specificity, total prediction accuracy and the Matthews correlation coefficient (MCC) were calculated with the following equations.
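The partitioning step can be sketched as below. The function name, the fixed random seed and the index-based interface are assumptions made for illustration; the page does not describe how the random split was actually produced.

```python
import random


def five_fold_splits(n_examples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # random partition, reproducible via seed
    # Deal the shuffled indices into n_folds subsets of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Each example appears in exactly one test subset, so every sequence is predicted once by a model that did not see it during training.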
Sensitivity = TP / (TP+FN),
Specificity = TN / (TN+FP),
Accuracy = (TP+TN) / (TP+TN+FP+FN) and
MCC = ((TP * TN) - (FP * FN)) / √[(TP+FN) * (TP+FP) * (TN+FP) * (TN+FN)].
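As a worked illustration, the four measures can be computed from the confusion counts as follows. The function name is hypothetical, the sketch assumes both classes are present in the test set, and returning an MCC of 0 when the denominator is zero is a common convention rather than something stated on this page.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when the denominator vanishes.
    mcc = ((tp * tn) - (fp * fn)) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, with the made-up counts TP = 40, TN = 45, FP = 5 and FN = 10, sensitivity is 40/50, specificity is 45/50 and accuracy is 85/100.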
weka-SMO
We use John C. Platt's sequential minimal optimization (SMO) algorithm, as implemented in Weka, for training a support vector classifier using polynomial or RBF kernels. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default. Platt's comparative testing against other algorithms has shown that SMO is often much faster and has better scaling properties. SMO is a carefully organized algorithm with excellent computational efficiency. However, its way of computing and using a single threshold value can make it inefficient in some situations [Keerthi SS et al, 2001].
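The two kernel families and the attribute normalization mentioned above can be illustrated with a short sketch. It is not the Weka code: the function names, the default `degree` and `gamma` values, and the choice of min-max rescaling to [0, 1] are all assumptions made for illustration.

```python
import math


def polynomial_kernel(x, y, degree=2):
    """Polynomial kernel (x . y) ** degree; degree=2 is an arbitrary example."""
    return sum(a * b for a, b in zip(x, y)) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma=0.5 is an arbitrary example."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def min_max_normalize(rows):
    """Rescale each attribute (column) to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Normalizing first keeps attributes with large numeric ranges from dominating the dot products and distances that the kernels compute.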