Computational Resources for Proteome Annotation and Proteomics


The proteome is the entire complement of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the set of proteins expressed at a given time point under specific conditions. A cellular proteome is the set of proteins found in a particular cell type under a particular set of environmental conditions, such as exposure to a hormone. The proteome is not limited to the sequences of the proteins present: it is larger than the genome, in the sense that there are more distinct protein forms than genes, especially in eukaryotes. This is due to alternative splicing of genes and to post-translational modifications such as glycosylation and phosphorylation. Understanding the proteome requires knowledge of the structure of its proteins and of the functional interactions between them. Computational resources available for proteome annotation and proteomics are listed below:

Servers integrated at CRDD

Server Description
AC2DGel This is a web server for the analysis and comparison of two-dimensional electrophoresis (2-DE) gel images. It helps annotate the proteins in a virtual 2-D gel image on the basis of the known molecular weight and pH scales of the markers.
ESLpred This is an SVM-based method for predicting the subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST-generated profiles. Using this server, users may infer the function of their protein from its location in the cell (Bhasin, M. and Raghava, G. P. S. (2004) Nucleic Acids Res. 32(Web Server issue):W414-9).
NRpred This is an SVM-based tool for the classification of nuclear receptors on the basis of amino acid composition or dipeptide composition. The overall prediction accuracies of the amino acid composition and dipeptide composition based methods are 82.6% and 97.2%, respectively (Bhasin, M. and Raghava, G. P. S. (2004) Journal of Biological Chemistry 279(22):23262-6).
GPCRpred This is a server for predicting G-protein-coupled receptors (GPCRs) and classifying them into families and subfamilies. This server can play a vital role in drug design, as GPCRs are commonly used as drug targets (Bhasin, M. and Raghava, G. P. S. (2004) Nucleic Acids Res. 32(Web Server issue):W383-9).
GPCRSclass This is a dipeptide composition based method for predicting amine-type G-protein-coupled receptors. The type of amine receptor is predicted from the dipeptide composition of proteins using SVM (Bhasin M, Raghava GP. (2005) 33(Web Server issue):W143-7).
Comp2DGel Comparison, management and access of 2D gel electrophoresis data.
DNASIZE This web server computes the length of DNA or protein fragments from their electrophoretic mobility using a graphical method (Raghava, G. P. S. (2001) Biotech Software and Internet Report, 2:198).
HSLpred This server predicts the subcellular localization of human proteins. It is based on various types of residue composition of proteins, using the SVM technique (Garg A, Bhasin M, Raghava GP. J Biol Chem. (2005) 280(15):14427-32).
PSLpred A method for predicting the subcellular localization of proteins belonging to prokaryotic genomes. Pathogens play an important role in our lives (Bhasin M, Garg A, Raghava GP. Bioinformatics. (2005) 21(10):2522-4).
MANGO Prediction of manually annotated proteins in Gene Ontology (GO). This server is based on the nearest neighbor method (NNM).
Btxpred The aim of the BTXpred server is to predict bacterial toxins and their functions from the primary amino acid sequence.
Mitpred This server predicts mitochondrial proteins.
SRTpred This server classifies protein sequences as secretory or non-secretory proteins.
Hemopred This server allows users to predict hemoglobin proteins.
VGIchan The aim of this server is to predict voltage-gated ion channels and classify them into sodium, potassium, calcium and chloride ion channels from primary amino acid sequences.
SGpred This server allows users to identify and visualize genes that have different expression levels in normal and disease conditions.
LGEpred This server allows users to analyze expression data (microarray data); it calculates the correlation coefficient between amino acid residues and gene expression level.
NTXpred The aim of this server is to predict neurotoxins and their sources and probable functions from primary amino acid sequences.
VICMpred This server aids in the broad functional classification of bacterial proteins into virulence factors, information molecules, cellular process and metabolism molecules (Saha, S. and Raghava, G. P. S. (2006) Genomics Proteomics & Bioinformatics (In Press)).
Algpred This server predicts allergens from amino acid sequences using the presence of IgE epitopes, MEME/MAST motifs, a BLAST search of allergen representative peptides, and an SVM-based method (Saha, S. and Raghava, G. P. S. (2006) Nucleic Acids Research (In Press)).
RBpred This server predicts rice leaf blast severity (%) based on weather parameters, using the regression mode of SVM.
RSL-pred This server predicts the subcellular localization of rice proteins, e.g., chloroplast, cytoplasmic, mitochondrial and nuclear proteins.
AntiBP This is a QM, SVM and ANN based server that predicts whether a peptide sequence is an antibacterial peptide or not. It also identifies antibacterial peptides in a protein sequence.
COpid This server finds proteins that are similar in amino acid composition to other proteins present in the database. It can be used to compare and calculate amino acid/dipeptide composition, and can build a distance matrix for phylogenetic analysis. It can also be used to generate patterns for SNNS, SVM and Timble.
siRNApred This server predicts siRNA using a composition-based SVM.
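
Several of the servers above (for example ESLpred, NRpred, GPCRSclass and COpid) are described as working from amino acid or dipeptide composition. As an illustration only, the Python sketch below shows one common way to compute these two feature vectors as percentages. The function names, the alphabetical residue ordering and the decision to ignore non-standard residues are assumptions made for this example; they are not taken from the servers' own implementations.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element list: percentage of each standard residue in seq."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [100.0 * counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element list: percentage of each adjacent residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    return [100.0 * counts[a + b] / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]
```

A 20-value or 400-value vector of this kind is the sort of fixed-length input that an SVM or nearest-neighbor classifier can be trained on, whatever the length of the protein.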

Web Servers/Databases/Mirror Sites


Web Servers


1. Subcellular location Prediction Servers
Server Description Standalone Available
NetNES Leucine-rich nuclear export signals (NES) in eukaryotic proteins (http://www.cbs.dtu.dk/services/NetNES). YES
PSORT Prediction of protein subcellular localization. YES
SecretomeP Non-classical and leaderless secretion of proteins. YES
TargetP Prediction of subcellular location. YES
TatP Twin-arginine signal peptides. NO
DAS Prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method. NO
HMMTOP Prediction of transmembrane helices and topology of proteins. YES
PredictProtein Prediction of transmembrane helix location and topology. NO
TMAP Transmembrane detection based on multiple sequence alignment. NO
SOSUI Prediction of transmembrane regions. NO
TMHMM Prediction of transmembrane helices in proteins. YES
TMpred Prediction of transmembrane regions and protein orientation. NO
TopPred Topology prediction of membrane proteins. NO
PSLDoc Uses document classification techniques and incorporates a probabilistic latent semantic analysis with a support vector machine model, for prediction on prokaryotes and eukaryotes. NO
PSL101 Hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structure homology approach. NO
SLP-Local Predicts localizations for chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm for Gram-negative organisms. NO
CELLO Uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins. NO
PA-SUB This specialized server, available at the PENCE Proteome Analyst site, can classify Gram-negative, Gram-positive, fungal, plant and animal proteins into many localization sites. NO
LOCtree A eukaryotic and prokaryotic localization prediction tool. NO
subLoc Uses Support Vector Machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular sites, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular sites. NO
EpiLoc A text-based system for predicting animal, plant and fungal protein subcellular locations. NO
ProLoc-GO Utilizes Gene Ontology terms for sequence-based prediction of subcellular localization. NO
AAIndexLoc Predicts protein subcellular localization by using amino acid composition and physicochemical properties. NO
SCLFA Predicts localizations by feature vectors based on amino acid composition (frequency) and sequence alignment. Subcellular locations predicted include chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins. NO
SherLoc Integrates several sequence and text-based features and provides predictions for plant, animal, and fungal proteins. NO
SLPS Subcellular Localization Predicting System; predicts localization using a Nearest Neighbor Algorithm (NNA) and incorporating a protein functional domain profile. NO
BaCelLo Predictor for five classes of eukaryotic subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast), based on different SVMs organized in a decision tree. NO
Protein Prowler A multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. It classifies eukaryotic targeting signals as secretory, mitochondrion, chloroplast or other. NO
pTARGET Uses amino acid composition and localization-specific Pfam domains to assign a eukaryotic protein to one of nine localization sites. NO
Golgi predictor Predicts Golgi Type II membrane proteins and can discriminate between proteins destined for the Golgi apparatus or other post-Golgi locations. NO
LOCSVMPSI A eukaryotic localization prediction method that incorporates evolutionary information into its predictions. The method uses PSI-BLAST and support vector machine to generate predictions for up to 12 localization sites. NO
PSLT A Bayesian network-based method that predicts human protein localization based on motif/domain co-occurrence. NO
ESLPred Uses Support Vector Machine and PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion, cytoplasm, or extracellular space. NO
Nuc-PLoc A web server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM. NO
NUCLEO Predicts possible nuclear localization by taking dually localized proteins into consideration. It uses an SVM-based approach with a custom kernel that employs a composite spectrum (or multiple k-mer) encoding conjoined with a bit vector indicating the presence or absence of a range of sequence motifs known to be important for nuclear proteins. NO
NucPred Predicts possible nuclear localization by using a genetic programming-based algorithm. NO
ProLoc Predicts subnuclear localizations using an evolutionary SVM based classifier with automatic selection from a large set of physicochemical composition (PCC) features. NO
Subnuclear Compartments Prediction System Predicts subnuclear localization by combining an SVM-based system for sequence analysis with a nearest-neighbor classifier using a similarity measure derived from the GO annotation terms for the protein sequences. NO
NetNES Predicts nuclear export signals using neural network and HMMs. NO
PredictNLS Uses nuclear localization signal motifs to predict whether a protein might be localized to the nucleus. YES
ChloroP Prediction of chloroplast transit peptides. YES
LipoP Prediction of lipoproteins and signal peptides in Gram-negative bacteria. YES
MITOPROT Prediction of mitochondrial targeting sequences. YES
PATS Prediction of apicoplast targeted sequences. NO
Plasmit Prediction of mitochondrial transit peptides in Plasmodium falciparum. NO
Predotar Prediction of mitochondrial and plastid targeting sequences. NO
PTS1 Prediction of peroxisomal targeting signal 1 containing proteins. NO
SignalP Prediction of signal peptide cleavage sites. YES
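
The transmembrane predictors listed above rely on alignments, hidden Markov models or other trained models, but the underlying idea of scanning a sequence for hydrophobic stretches can be sketched with a sliding-window hydropathy average. The example below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, a commonly quoted heuristic. It is a simplified stand-in for illustration, not the algorithm of TMHMM, HMMTOP, TMpred or any other server in the table, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (standard residues only)."""
    scores = [KYTE_DOOLITTLE[r] for r in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) spans, 0-based and end-exclusive, covered by
    windows whose mean hydropathy is at or above the threshold."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous span: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

For instance, a toy sequence of 20 aspartates, 20 leucines and 20 aspartates yields a single candidate span around the leucine stretch, while a purely hydrophilic sequence yields none.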


2. Servers calculating physicochemical properties of amino acids

Server Description Standalone Available
AACompIdent Identify a protein by its amino acid composition. NO
AACompSim Compare the amino acid composition of a UniProtKB/Swiss-Prot entry with all other entries. NO
TagIdent Identify proteins with isoelectric point (pI), molecular weight (Mw) and sequence tag, or generate a list of proteins close to a given pI and Mw. NO
MultiIdent Identify proteins with isoelectric point (pI), molecular weight (Mw), amino acid composition, sequence tag and peptide mass fingerprinting data. NO
ProtParam Physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, isoelectric point, extinction coefficient, etc.). NO
Compute pI/Mw Compute the theoretical isoelectric point (pI) and molecular weight (Mw) from a UniProt Knowledgebase entry or for a user sequence. NO
IsotopIdent Predicts the theoretical isotopic distribution of a peptide, protein, polynucleotide or chemical compound. NO
Aldente Identify proteins with peptide mass fingerprinting data. A new, fast and powerful tool that takes advantage of Hough transformation for spectra recalibration and outlier exclusion. NO
Mascot Peptide mass fingerprint from Matrix Science Ltd., London. NO
PepMAPPER Peptide mass fingerprinting tool from UMIST, UK. NO
ProteinProspector UCSF tools for peptide masses data (MS-Fit, MS-Pattern, MS-Digest, etc.). NO
ProFound Search known protein sequences with peptide mass information from Rockefeller and NY Universities. NO
Phenyx Protein and peptide identification/characterization from MS/MS data from GeneBio, Switzerland. NO
OMSSA MS/MS peptide spectra identification by searching libraries of known protein sequences. NO
PepFrag Search known protein sequences with peptide fragment mass information from Rockefeller and NY Universities. NO
MALDIPepQuant Quantify MALDI peptides (SILAC) from Phenyx output. NO
pIcarver Visualize theoretical distributions of peptide pI on a given pH range and generate fractions with similar peptide frequencies. NO
GlycanMass Calculate the mass of an oligosaccharide structure. NO
GlycoMod Predict possible oligosaccharide structures that occur on proteins from their experimentally determined masses (can be used for free or derivatized oligosaccharides and for glycopeptides). NO
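
Peptide mass fingerprinting tools such as those listed above compare measured peptide masses with masses computed from database sequences. A minimal sketch of the theoretical side might look like the following, assuming simplified trypsin rules (cleave after K or R unless the next residue is P, no missed cleavages), unmodified residues and standard monoisotopic residue masses. The function names and the default tolerance are illustrative and do not correspond to any server's interface.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(seq):
    """Split seq after K or R, except when the next residue is P."""
    seq = seq.upper()
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[r] for r in peptide.upper()) + WATER


def match_peaks(observed, seq, tol=0.05):
    """Pair each observed neutral mass with tryptic peptides of seq within tol Da."""
    theoretical = [(p, monoisotopic_mass(p)) for p in tryptic_digest(seq)]
    return [(obs, pep) for obs in observed
            for pep, mass in theoretical if abs(obs - mass) <= tol]
```

Real search engines additionally handle missed cleavages, modifications, charge states and statistical scoring, none of which is attempted here.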

3. Servers Predicting Post-translational Modifications

Server Description Standalone Available
peptideMass Calculate masses of peptides and their post-translational modifications for a UniProtKB/Swiss-Prot or UniProtKB/TrEMBL entry or for a user sequence. NO
FindMod Predict potential protein post-translational modifications and potential single amino acid substitutions in peptides. Experimentally measured peptide masses are compared with the theoretical peptides calculated from a specified Swiss-Prot entry or from a user-entered sequence, and mass differences are used to better characterize the protein of interest. NO
FindPept Identify peptides that result from unspecific cleavage of proteins from their experimental masses, taking into account artefactual chemical modifications, post-translational modifications (PTM) and protease autolytic cleavage. NO
Popitam Identification and characterization tool for peptides with unexpected modifications (e.g. post-translational modifications or mutations) by tandem mass spectrometry. NO
DictyOGlyc Prediction of GlcNAc O-glycosylation sites in Dictyostelium. NO
NetCGlyc C-mannosylation sites in mammalian proteins. NO
NetOGlyc Prediction of O-GalNAc (mucin type) glycosylation sites in mammalian proteins. YES
NetGlycate Glycation of epsilon amino groups of lysines in mammalian proteins. YES
NetNGlyc Prediction of N-glycosylation sites in human proteins. YES
OGPET Prediction of O-GalNAc (mucin-type) glycosylation sites in eukaryotic (non-protozoan) proteins. YES
YinOYang O-beta-GlcNAc attachment sites in eukaryotic protein sequences. YES
big-PI Predictor GPI Modification Site Prediction. NO
GPI-SOM Identification of GPI-anchor signals by a Kohonen Self Organizing Map. YES
Myristoylator Prediction of N-terminal myristoylation by neural networks. NO
NMT Prediction of N-terminal N-myristoylation. NO
CSS-Palm Palmitoylation site prediction with CSS. YES
PrePS Prenylation Prediction Suite. NO
NetAcet Prediction of N-acetyltransferase A (NatA) substrates (in yeast and mammalian proteins). YES
NetPhos Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins. YES
NetPhosK Kinase specific phosphorylation sites in eukaryotic proteins. NO
NetPhosYeast Serine and threonine phosphorylation sites in yeast proteins. NO
Sulfinator Prediction of tyrosine sulfation sites. NO
SulfoSite Prediction of tyrosine sulfation sites. NO
SUMOplot Prediction of SUMO protein attachment sites. NO
TermiNator Prediction of N-terminal modification. NO
NetPicoRNA Prediction of protease cleavage sites in picornaviral proteins. NO
NetCorona Coronavirus 3C-like proteinase cleavage sites in proteins. NO
ProP Arginine and lysine propeptide cleavage sites in eukaryotic protein sequences. YES
PeptideCutter Predicts potential protease and cleavage sites and sites cleaved by chemicals in a given protein sequence. NO
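
Many modification-site predictors start from a sequence motif and then apply a trained model to decide which motif occurrences are likely to be modified. As a rough illustration of the first step only, the sketch below scans for the N-glycosylation sequon N-X-S/T, where X is any residue except proline. This is a plain pattern search and not the method of NetNGlyc or any other server in the table; the function name is hypothetical, and a matching sequon does not mean the site is actually glycosylated.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glyc_sequons(seq):
    """Return 1-based positions of the Asn in every N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```

The returned positions could then be passed to a dedicated predictor, or compared with experimentally annotated sites, to judge which candidates are plausible.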

Databases


1. Proteomics (2D and MALDI) Databases

Server Description Standalone Available
SWISS-2DPAGE Contains data on proteins identified on various 2-D PAGE and SDS-PAGE reference maps. NO
WORLD-2DPAGE A dynamic portal to simultaneously query worldwide gel-based proteomics databases. NO
DOSAC-COBS 2D-PAGE 2D-PAGE server to query 'DOSAC-COBS 2D Page'. NO
Plasmo2Dbase Plasmodium falciparum 2-DE database at Indian Institute of Science, Bangalore, India. NO
Cornea-2DPAGE Human cornea, Department of Molecular Biology, Faculty of Science, Aarhus University, Denmark. NO
REPRODUCTION-2DPAGE 2D-PAGE database (Human ovary, Mouse testis). NO
ANU-2DPAGE 2-DE database (Rice anther and Medicago truncatula) of the Australian National University, Canberra, Australia. NO
OGP-WWW Oxford GlycoProteomics database (Human platelet). NO
PHCI-2DPAGE Parasite host cell interaction 2D-PAGE database. NO
RAT HEART-2DPAGE 2-DE database of rat heart. NO
SIENA-2DPAGE 2D-PAGE database (Chlamydia trachomatis, Caenorhabditis elegans, Human breast ductal carcinoma and histologically normal tissue, Human amniotic fluid). NO

2. Subcellular Location Databases

Server Description Standalone Available
eSLDB Collects the annotations of subcellular localizations of eukaryotic proteomes based on experimental results, homology, and computational predictions. NO
PSORTdb A two-component searchable and browsable database. ePSORTdb contains bacterial proteins of experimentally verified localization used in training and testing of PSORTb. cPSORTdb contains predictions of localization for bacterial genomes. NO
SUBA An Arabidopsis subcellular localization database with annotations based on experimental results, literature references, Swiss-Prot annotations, and computational predictions. NO
FTFLP Database Contains a collection of Arabidopsis protein localizations verified using fluorescent tagging of full-length proteins. NO
SPdb A signal peptide database containing a repository of experimentally verified and predicted signal peptides. NO
NESbase A database with a collection of nuclear export signals. NO
LOCATE A database that houses data describing the membrane organization and subcellular localization of human and mouse proteins. NO
PDBTM A database of transmembrane proteins with known 3D structures. NO
PA-GOSUB A database collecting the localization predictions made by the Proteome Analyst tool. NO
Organelle DB A database of eukaryotic proteins found at various organelles and subcellular structures. NO
AMPDB A database of known and predicted mitochondrial proteins in the plant species Arabidopsis thaliana. NO
MITOMAP A database of information related to the human mitochondrial genome. NO
DBSubLoc A dataset of proteins with annotated subcellular localizations according to SWISS-PROT and PIR. NO
LOCtarget A database of LOCtree predictions for structural genomics targets. NO
LOC3d A database of predicted localizations for eukaryotic proteins with 3D structures. NO
LOCkey Contains predicted localizations for the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot keywords. NO
LOChom A database of predicted localizations based on homology to experimentally annotated proteins. NO
SignalP The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used to train SignalP, and also used to train PSORTb's signal peptide prediction module. NO
SignalPeptides The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used in an independent evaluation of several signal peptide prediction methods, and used to test PSORTb's signal peptide prediction module. NO

3. Post-translational Modification Databases

Server Description Standalone Available
PRENbase Database of Prenylated Proteins. NO