Help about ChloroPred
Sequence Name:
The user can give a name to the sequence submitted for prediction.
E-mail Address: Depending on the prediction option chosen and the number of queries in the queue, the time required to serve a query varies from 10 seconds to more than 5 minutes. Instead of waiting until the prediction is complete, the user can give the server an e-mail address. When the prediction is complete, the user receives an e-mail notification. The e-mail ID should contain only letters, digits, or '_'.
Sequence: The sequence whose localization is to be predicted can be pasted into the text box beside this option. A sequence file in FASTA format can also be uploaded. By default the server accepts only the single-letter amino acid code. The server ignores all non-standard characters, such as , * % ! @ $ - _.
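The input handling described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name `parse_single_fasta` is hypothetical, and the sketch assumes that lowercase letters are accepted and that any character outside the 20 standard amino acid letters is simply dropped.

```python
# The 20 standard amino acids in single-letter code.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_single_fasta(text):
    """Return (name, cleaned_sequence) for the first record in `text`.

    `text` may be a FASTA record or a bare sequence (name is then None).
    Assumption: characters outside the 20 standard letters are ignored.
    """
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None or chunks:
                break  # a second record starts here; keep only the first
            name = line[1:].strip()
        else:
            chunks.append(line)
    raw = "".join(chunks).upper()
    cleaned = "".join(ch for ch in raw if ch in STANDARD_AA)
    return name, cleaned
```

For example, `parse_single_fasta(">ACCD\nMEKS*WF-NL\n")` yields the name `ACCD` and a sequence with `*` and `-` removed.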
Example sequence in FASTA format:
>ACCD
MEKSWFNLMFSKGELEYRGELSKAMDSFAPIEKTTISKDRFIYDMDKNFYGWGERSSYYNNVDLLVNSKDIRNFISDDTFFVRDSNKNSY
SIYFDIEKKKFEINNDLSDLEIFFYSYCSSSYLNNRSKGDNDLHYDPYIKDTKYNCNNHINSCIDSYFRSHICINSHFLSDSNNSNESYIYN
FICSESGSGKIRESKNDKIRTNSNRNNLMSSKDFDITKNYNQLWIQCDNCYGLKYKKVEMNVCEECGHYLKMTSSERIELSIDPGSWNGMDED
MVSADPIKFHSREEPYKKRIASAQKKTGLTDAIQTGTGQLNGIPVALGVMDFQFMGGSMGSVVGDKITRLIEYATNQCLPLILVCSSGGARMQ
EGSLSLMQMAKISSVLCDYQSSKKLFYISILTSPTTGGVTASLGMLGDIIIAEPYAYIAFAGKRVIEQTLKKAVPEGSQAAESLLRKGLLDA
IVPRNPLKGVVSELFQLHAFFPLNKNEIK
Prediction options: The following prediction methods are available:
Simple amino acid composition based SVM model: In this method, the percentage composition of all 20 amino acids is calculated and used to derive a weight for each amino acid. The weight is obtained from the difference between the composition of the same amino acid in chloroplast and non-chloroplast data. To determine the localization of an unknown protein, its amino acid composition is calculated and each value is multiplied by the corresponding weight.
Dipeptide composition based SVM model: In this approach, the percentage composition of all 400 dipeptides is calculated and used to derive a weight for each dipeptide. The weight is obtained from the difference between the composition of the same dipeptide in chloroplast and non-chloroplast data. To determine the localization of an unknown protein, its dipeptide composition is calculated and each value is multiplied by the corresponding weight.
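The composition and weighting steps described for the two models above can be sketched as follows. This illustrates only the composition features and the difference-based weights as worded here; it does not train an SVM and is not the server's code. All function names are hypothetical, the direction of the subtraction (chloroplast minus non-chloroplast) is an assumption, and sequences are assumed to contain only the 20 standard letters.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Percentage of each of the 20 amino acids in `seq`."""
    n = len(seq)
    return {aa: (100.0 * seq.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percentage of each of the 400 dipeptides among overlapping pairs."""
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return {dp: (100.0 * c / total if total > 0 else 0.0)
            for dp, c in counts.items()}


def mean_composition(seqs, comp_fn):
    """Average composition over a non-empty list of sequences."""
    comps = [comp_fn(s) for s in seqs]
    return {k: sum(c[k] for c in comps) / len(comps) for k in comps[0]}


def difference_weights(chloro_seqs, other_seqs, comp_fn):
    """Weight per feature: chloroplast mean minus non-chloroplast mean
    (the sign convention is an assumption)."""
    pos = mean_composition(chloro_seqs, comp_fn)
    neg = mean_composition(other_seqs, comp_fn)
    return {k: pos[k] - neg[k] for k in pos}


def weighted_score(seq, weights, comp_fn):
    """Sum of composition values multiplied by their weights."""
    comp = comp_fn(seq)
    return sum(comp[k] * weights[k] for k in weights)
```

Under the assumed sign convention, a higher score means the query composition is closer to the chloroplast set; any decision threshold would have to be chosen separately.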
Dipeptide composition based weka-SMO prediction model: SMO (sequential minimal optimization) is an algorithm for training support vector machines, available as a classifier in the Weka machine-learning toolkit. This option uses dipeptide composition as the input vector because it gave the highest accuracy, so this approach should be the most reliable prediction method.
Hybrid approach: This method integrates two approaches, Pfam domain information and SVM based prediction. An HMM based Pfam search is run to find which domain(s) are present in the input sequence. If a domain is found, its class is looked up in our domain catalogue. The catalogue classifies Pfam domains into three classes: (i) those found exclusively in chloroplast proteins, (ii) those found only in other locations (not chloroplast), and (iii) those found in both chloroplast and non-chloroplast proteins. If the input sequence contains even one class (i) or class (ii) domain, it is assigned directly to that class. If only class (iii) domains or no domains are found, the SVM is used for prediction. Because the Pfam domain search runs before the SVM prediction and takes time, this method accepts only one sequence at a time.
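The decision rule of the hybrid approach can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the names `DomainClass` and `hybrid_predict`, the domain identifiers, and the catalogue contents are hypothetical; the Pfam search itself is not shown and its result is passed in as a list of domain identifiers. The text does not say what happens when a sequence contains both a class (i) and a class (ii) domain, so the sketch falls back to the SVM in that case, and it treats domains missing from the catalogue as uninformative.

```python
from enum import Enum


class DomainClass(Enum):
    CHLOROPLAST_ONLY = "i"        # found exclusively in chloroplast proteins
    NON_CHLOROPLAST_ONLY = "ii"   # found only in non-chloroplast proteins
    BOTH = "iii"                  # found in both groups


def hybrid_predict(domains, catalogue, svm_predict):
    """Return (label, source) for one sequence.

    domains     -- domain identifiers found by the domain search
    catalogue   -- dict mapping domain identifier to DomainClass
    svm_predict -- zero-argument callable returning the SVM label
    """
    classes = {catalogue[d] for d in domains if d in catalogue}
    has_chloro = DomainClass.CHLOROPLAST_ONLY in classes
    has_other = DomainClass.NON_CHLOROPLAST_ONLY in classes
    if has_chloro and not has_other:
        return "chloroplast", "domain"
    if has_other and not has_chloro:
        return "non-chloroplast", "domain"
    # Only class (iii) domains, no domains, or conflicting domains
    # (the conflict handling is an assumption): use the SVM.
    return svm_predict(), "svm"


# Hypothetical catalogue and usage.
example_catalogue = {
    "DOM_A": DomainClass.CHLOROPLAST_ONLY,
    "DOM_B": DomainClass.NON_CHLOROPLAST_ONLY,
    "DOM_C": DomainClass.BOTH,
}
label, source = hybrid_predict(["DOM_C", "DOM_A"], example_catalogue,
                               lambda: "non-chloroplast")
```

Passing the SVM as a callable means it is only run when the domain lookup is inconclusive, which mirrors the order of steps described above.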