Dataset:
In the first step of data collection, we searched the PDB for entries containing NAG as a heteroatom and extracted 647 PDB IDs of proteins in contact with NAG. In the next step, these PDB IDs were submitted to Ligand Protein Contact (LPC; Sobolev V. 1998), which yielded a total of 1502 chains that interact with NAG, together with their corresponding interacting residues. We then removed redundant chains sharing more than 25% similarity using BLASTCLUST, which left 120 interacting chains with a total of 1029 NAG-interacting residues; all remaining residues were treated as non-interacting.

Binary Patterns:
Each amino acid was represented as a binary string of length 21, in which 20 positions are "0" and a single position, unique to that amino acid, is set to "1". For example, the amino acid alanine (A) can be represented as follows:
A = 100000000000000000000

Evolutionary information (PSSM):
Evolutionary information was obtained from the position-specific scoring matrix (PSSM) generated during a PSI-BLAST search against the non-redundant (nr) database of protein sequences. The evolutionary information for each residue is encapsulated in a vector of 21 dimensions, so the PSSM of a protein with N residues has size 21 × N; 20 dimensions correspond to the standard amino acids and 1 to a dummy amino acid. We normalized each value to the range 0-1.

Amino Acid Composition:
The information in a pattern can also be encapsulated in a vector of 20 dimensions using the amino acid composition of the pattern. The amino acid composition is the fraction of each amino acid type within a pattern (window). The fractions of all 20 natural amino acids were calculated using the following formula:
Amino acid composition (%) = (Total number of amino acid (i) / Total number of amino acids in the pattern) × 100
where (i) is any amino acid.

Support Vector Machines:
The SVM was implemented using the freely downloadable software package SVM_light, written by Joachims (Joachims 1999).
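The binary-pattern and amino acid composition encodings described in the feature sections above can be illustrated with a minimal Python sketch. It is not the authors' code. The alphabet order, the use of "X" as the 21st (dummy) symbol, and the choice to leave dummy residues out of the composition denominator are all assumptions made for illustration, and the function names are hypothetical.

```python
# Assumed alphabet order; the source does not specify one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: "X" stands in for the dummy (21st) symbol.
ALPHABET = AMINO_ACIDS + "X"


def binary_pattern(residue):
    """Return the 21-character binary string for a single residue."""
    index = ALPHABET.find(residue)
    if index == -1:
        # Unknown symbols are mapped to the dummy position (assumption).
        index = ALPHABET.index("X")
    return "".join("1" if i == index else "0" for i in range(len(ALPHABET)))


def encode_window(window):
    """Concatenate per-residue binary patterns into one flat 0/1 list."""
    return [int(bit) for residue in window for bit in binary_pattern(residue)]


def composition(window):
    """Percentage of each of the 20 standard amino acids in a window.

    Dummy or unknown symbols are excluded from the denominator (assumption).
    """
    counted = [r for r in window if r in AMINO_ACIDS]
    total = len(counted)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counted.count(aa) / total for aa in AMINO_ACIDS]
```

With this alphabet order, `binary_pattern("A")` reproduces the string shown for alanine, a window of w residues gives 21 × w binary features, and `composition` always returns 20 values regardless of window length.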
SVM_light enables the user to define a number of parameters and to select from a choice of built-in kernel functions, including a radial basis function (RBF) and a polynomial kernel.

Evaluation module:
The performance of the modules constructed in this study was evaluated using a 5-fold cross-validation technique. In 5-fold cross-validation, the relevant dataset was partitioned randomly into five equally sized sets. Training and testing were carried out five times, each time using one distinct set for testing and the remaining four sets for training. The performance of the methods was computed using the following formulas:
Sensitivity = TP / (TP + FN) × 100
Specificity = TN / (TN + FP) × 100
Accuracy = (TP + TN) / (TP + FP + TN + FN) × 100
where TP and TN are correctly predicted NAG-interacting and non-interacting residues, respectively, and FP and FN are residues wrongly predicted as interacting and non-interacting, respectively.
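The evaluation procedure can be illustrated with a short Python sketch: a random partition into five folds, followed by sensitivity, specificity and accuracy computed from the confusion counts. It is a sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, label 1 is assumed to mark a NAG-interacting residue and 0 a non-interacting one, and all three measures are reported as percentages.

```python
import random


def five_fold_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) for a random 5-fold partition."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Every fifth shuffled index goes to the same fold, so fold sizes
    # differ by at most one item.
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def evaluate(y_true, y_pred):
    """Return sensitivity, specificity and accuracy as percentages.

    Assumption: 1 = interacting residue, 0 = non-interacting residue.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "sensitivity": 100.0 * tp / (tp + fn) if tp + fn else 0.0,
        "specificity": 100.0 * tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": 100.0 * (tp + tn) / total if total else 0.0,
    }
```

In use, a classifier would be trained on each `train` index list and its predictions on the matching `test` list passed to `evaluate`; the five results can then be averaged.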