The study used datasets taken from miRBase version 11.0. We retrieved the distinct, experimentally validated miRNA* sequences and their corresponding miRNA sequences. This yielded 359 miRNA precursors from 21 different organisms, including viruses. From these precursors we considered only the mature miRNA and its corresponding miRNA*, i.e. 359 pairs of sequences. These pairs were divided into two groups.
(A) Training dataset: This consists of 329 miRNA sequences and their corresponding miRNA* sequences. Models were investigated and developed using features of this dataset.
(B) Independent test dataset: The model developed on the training dataset was tested on an independent dataset to evaluate its performance. This dataset contains 30 miRNA sequences and their corresponding miRNA* sequences. These sequences have no homology with each other or with the training dataset.
This study describes a method for predicting the functional strand of miRNAs using a support vector machine (SVM). All models were trained and tested on 329 miRNA and 329 miRNA* sequences using five-fold cross-validation. First, models were developed using mono-, di-, and tri-nucleotide composition, achieving highest accuracies of 58.8%, 63.8%, and 59.5%, respectively. Second, models were developed using split nucleotide composition, achieving accuracies of 55.3%, 64.1%, and 60.1% for mono-, di-, and tri-nucleotide composition, respectively. Models developed using the binary pattern achieved a highest accuracy of 70.82%. Integrating the secondary structure feature with the binary pattern improved the accuracy to 71.88%. Finally, a hybrid model combining secondary structure, binary pattern, and G/C content achieved an accuracy of 79.94% at a sensitivity of 78% and a specificity of 81.7%. On the independent dataset, this model achieved an accuracy of 80%, which demonstrates the efficacy of our algorithm.
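As an illustration only, the sequence-derived features named above could be computed along the following lines. This is a minimal Python sketch, not the authors' implementation: the function names, the one-hot reading of "binary pattern", the zero-padding to a fixed length, and the percentage scaling are all assumptions. The split nucleotide composition and the secondary structure feature are omitted because the text does not define them.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_composition(seq, k):
    """Percentage frequency of every k-mer over ACGU (4**k values).

    k=1, 2, 3 correspond to mono-, di-, and tri-nucleotide composition.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # windows with non-ACGU symbols are skipped
            counts[window] += 1
    return [100.0 * counts[m] / total if total > 0 else 0.0 for m in kmers]


def binary_pattern(seq, length=None):
    """One-hot encoding, 4 bits per position (assumed meaning of 'binary pattern').

    `length` is a hypothetical parameter: shorter sequences are padded with
    all-zero positions and longer ones are truncated.
    """
    seq = seq.upper().replace("T", "U")
    length = len(seq) if length is None else length
    vector = []
    for i in range(length):
        base = seq[i] if i < len(seq) else None
        vector.extend(1 if base == b else 0 for b in ALPHABET)
    return vector


def gc_content(seq):
    """Percentage of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def evaluate(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

Under these assumptions, a hybrid feature vector for one sequence would be the concatenation `binary_pattern(seq, length) + [gc_content(seq)]` plus whatever encoding is chosen for secondary structure, and `evaluate` gives the three reported measures when miRNA is treated as the positive class and miRNA* as the negative class.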