PolyApred: Polyadenylation signal prediction server from human DNA sequence

PolyApred is a support vector machine (SVM) based method for predicting polyadenylation signals (PAS) in human DNA sequences. The method uses a mixed pattern as its input feature: the nucleotide frequencies of the 100 nt sequence upstream of the PAS combined with those of the 100 nt sequence downstream of it. The maximum Matthews correlation coefficient (MCC) achieved using mononucleotide, dinucleotide, trinucleotide and tetranucleotide frequencies was 0.51, 0.62, 0.67 and 0.68, respectively. To capture region-specific base elements, we split each 100 nt sequence into two regions and combined the features of each region to develop the SVM model. Finally, we developed a hybrid method that combines the dinucleotide, pseudo-dinucleotide and tetranucleotide frequencies of each region and achieved a maximum MCC of 0.72. Using these region-specific base features, the SVM model predicts PAS with higher accuracy than other methods.
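
The region-specific frequency features described above can be sketched roughly as follows. This is an illustration only, not the PolyApred implementation: the function names, the equal-halves split, the normalization by window count and the feature ordering are all assumptions, and the pseudo-dinucleotide component is left out because its definition is not given here.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized frequencies of all 4**k k-mers over A, C, G, T,
    returned in lexicographic k-mer order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other codes
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def flank_features(upstream, downstream, ks=(2, 4), n_regions=2):
    """Hypothetical feature vector: for each flank, split it into
    n_regions consecutive regions (assumed equal-sized) and concatenate
    the k-mer frequencies of every region for every k in ks."""
    features = []
    for flank in (upstream, downstream):
        size = len(flank) // n_regions
        for r in range(n_regions):
            start = r * size
            # the last region absorbs any remainder
            end = (r + 1) * size if r < n_regions - 1 else len(flank)
            region = flank[start:end]
            for k in ks:
                features.extend(kmer_frequencies(region, k))
    return features
```

With the default settings, two 100 nt flanks give 2 flanks x 2 regions x (16 + 256) = 1088 values, which could then be passed to any SVM implementation.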
The main aim of this server is to help users distinguish real polyadenylation signals (NNUANA) from pseudo-PAS, which constitute ~93% of PAS in human. The input sequence must be in FASTA format and at least 206 nucleotides long. The server may take longer to return predictions for sequences as long as 1000 nt.
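
The input requirements can be checked locally before submitting a sequence. The sketch below is a hypothetical pre-check, not the server's own code. It assumes that the 206 nt minimum corresponds to 100 nt upstream + a 6 nt signal + 100 nt downstream, and that NNUANA is matched on DNA as NNTANA; the names `read_fasta` and `candidate_sites` are made up for this example.

```python
import re

FLANK = 100                         # flank length given in the description
SIGNAL_LEN = 6                      # length of the NNUANA hexamer
MIN_LEN = 2 * FLANK + SIGNAL_LEN    # 206, assumed origin of the minimum

# Lookahead so that overlapping candidate hexamers are all reported.
CANDIDATE = re.compile(r"(?=([ACGT]{2}TA[ACGT]A))")


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, mapping U to T."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is None:
            raise ValueError("FASTA input must start with a '>' header line")
        else:
            records[header].append(line.upper().replace("U", "T"))
    return {h: "".join(parts) for h, parts in records.items()}


def candidate_sites(seq):
    """Return (position, hexamer, upstream, downstream) for every
    NNTANA match that has a full 100 nt flank on both sides."""
    if len(seq) < MIN_LEN:
        raise ValueError(f"sequence must be at least {MIN_LEN} nt long")
    sites = []
    for m in CANDIDATE.finditer(seq):
        pos = m.start()
        end = pos + SIGNAL_LEN
        if pos >= FLANK and end + FLANK <= len(seq):
            sites.append((pos, m.group(1),
                          seq[pos - FLANK:pos], seq[end:end + FLANK]))
    return sites
```

Each returned tuple holds the two flanks from which frequency features would be computed; deciding whether a candidate is a real PAS or a pseudo-PAS is left to the trained model.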