MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine.

Thakur, Anamika and Rajput, Akanksha and Kumar, Manoj (2016) MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine. Molecular BioSystems, 12 (8). pp. 2572-86. ISSN 1742-2051

Official URL: http://pubs.rsc.org/en/Content/ArticleLanding/2016...

Abstract

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. We have therefore developed "MSLVP", a two-tier prediction algorithm for predicting multiple SCLs of viral proteins. For this study, comprehensive data sets of viral proteins with experimentally validated SCL annotations were collected from UniProt. A non-redundant data set (90% sequence identity) of 3480 viral proteins belonging to single (2715), double (391) and multiple (374) sites was employed. Additionally, a data set of 1687 viral proteins at 30% sequence identity was categorised into single (1366), double (167) and multiple (154) sites. The single, double and multiple locations further comprised eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, other remaining locations and combinations thereof. Support vector machine (SVM) based models were developed using sequence features such as amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We employed both "one-versus-one" and "one-versus-other" strategies for multiclass classification; "one-versus-one" performed better than "one-versus-other" during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthews correlation coefficient (MCC) and a receiver operating characteristic (ROC) score of 99.99%, 1.00 and 1.00 for single locations; 100.00%, 1.00 and 1.00 for double locations; and 99.90%, 1.00 and 1.00 for multiple locations. Similar results were achieved for the 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent data set. The MSLVP web server () can predict single (8 categories, including single- and multi-pass membrane), double (4) and multiple (6) subcellular locations.
This should help in the functional annotation of viral proteins and in identifying potential drug targets.
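The sequence encodings named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the MSLVP implementation. It assumes that amino acid composition (AAC) is the percentage of each of the 20 standard residues, that dipeptide composition (DPC) is the percentage of each of the 400 adjacent residue pairs, and that a "hybrid" is their simple concatenation. The helper names (`aac`, `dpc`, `hybrid`, `ovo_classifiers`, `mcc`) are hypothetical. Physicochemical-property features and SVM training are omitted because the abstract does not say which properties or which SVM package were used.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids (one-letter codes); other residues are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues, used as the DPC feature order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq):
    """Assumed AAC: percentage of each of the 20 residues (20 features)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(a) / n for a in AMINO_ACIDS]


def dpc(seq):
    """Assumed DPC: percentage of each adjacent residue pair (400 features)."""
    s = seq.upper()
    # Overlapping pairs; any pair containing a non-standard residue is dropped.
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    n = len(pairs)
    if n == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [100.0 * counts[d] / n for d in DIPEPTIDES]


def hybrid(seq):
    """Hypothetical hybrid vector: AAC followed by DPC (420 features)."""
    return aac(seq) + dpc(seq)


def ovo_classifiers(k):
    """Binary classifiers needed for k classes under one-versus-one: k(k-1)/2."""
    return k * (k - 1) // 2


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one binary (class-vs-rest) split."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For example, `len(hybrid("MKTAYIAKQR"))` is 420, and `ovo_classifiers(8)` is 28. So the eight single-location categories would need 28 pairwise SVMs under "one-versus-one", compared with 8 under "one-versus-other". The `mcc` function uses the standard binary formula. The abstract does not state how per-class values were combined for the multiclass MCC it reports.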

Item Type: Article
Additional Information: Copyright of this article belongs to RSC.
Subjects: Q Science > QR Microbiology
Depositing User: Dr. K.P.S.Sengar
Date Deposited: 04 Oct 2016 06:20
Last Modified: 04 Oct 2016 06:20
URI: http://crdd.osdd.net/open/id/eprint/1918
