Computational Resources for Proteome Annotation and Proteomics

The proteome is the entire complement of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the set of proteins expressed at a given time point under specific conditions. A cellular proteome is the set of proteins found in a particular cell type under a particular set of environmental conditions, such as exposure to a hormone. The proteome is not limited to the sequences of the proteins present: post-translational modifications such as glycosylation and phosphorylation, together with alternative splicing of genes in eukaryotes, generate many distinct protein forms. As a result, the proteome is larger than the genome, especially in eukaryotes. Understanding the proteome requires knowledge of the structures of its proteins and of the functional interactions between them. Computational resources available for proteome annotation and proteomics are listed below:

Servers integrated at CRDD

Server	Description
AC2DGel	This is a web server for the analysis and comparison of two-dimensional electrophoresis (2-DE) gel images. It helps in annotating the proteins of a virtual 2-D gel image on the basis of the known molecular weight and pH scales of the markers.
ESLpred	This is an SVM-based method for predicting the subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST-generated profiles. Using this server, users may infer the function of their protein from its location in the cell. (Bhasin, M. and Raghava, G. P. S. (2004) Nucleic Acids Res. 32(Web Server issue):W414-9).
NRpred	This is an SVM-based tool for the classification of nuclear receptors on the basis of amino acid composition or dipeptide composition. The overall prediction accuracies of the amino acid composition and dipeptide composition based methods are 82.6% and 97.2%, respectively (Bhasin, M. and Raghava, G. P. S. (2004) Journal of Biological Chemistry 279(22):23262-6).
GPCRpred	This is a server for predicting G-protein-coupled receptors and for classifying them into families and sub-families. This server can play a vital role in drug design, as GPCRs are commonly used as drug targets (Bhasin, M. and Raghava, G. P. S. (2004) Nucleic Acids Res. 32(Web Server issue):W383-9).
GPCRSclass	This is a dipeptide composition based method for predicting amine-type G-protein-coupled receptors. The amine type is predicted from the dipeptide composition of proteins using SVM. (Bhasin M, Raghava GP. (2005) Nucleic Acids Res. 33(Web Server issue):W143-7).
Comp2DGel	Comparison, management and access of 2D gel electrophoresis.
DNASIZE	This web server allows users to compute the length of DNA or protein fragments from their electrophoretic mobility using a graphical method (Raghava, G. P. S. (2001) Biotech Software and Internet Report, 2:198).
HSLpred	This server predicts the subcellular localization of human proteins. It is based on various types of residue composition of proteins and uses the SVM technique. (Garg A, Bhasin M, Raghava GP. J Biol Chem. (2005) 280(15):14427-32)
PSLpred	A method for predicting the subcellular localization of proteins from prokaryotic genomes. Pathogens play an important role in our lives. (Bhasin M, Garg A, Raghava GP. Bioinformatics. (2005) 21(10):2522-4)
MANGO	Prediction of manually annotated proteins in Gene Ontology (GO). This server is based on the nearest neighbor method (NNM).
Btxpred	The aim of the BTXpred server is to predict bacterial toxins and their functions from primary amino acid sequences.
Mitpred	This server predicts mitochondrial proteins.
SRTpred	This server classifies protein sequences as secretory or non-secretory.
Hemopred	It allows users to predict hemoglobin proteins.
VGIchan	The aim of this server is to predict voltage-gated ion channels and classify them into sodium, potassium, calcium and chloride ion channels from primary amino acid sequences.
SGpred	This server allows users to identify and visualize genes that have different expression levels in normal and disease conditions.
LGEpred	This server allows users to analyse expression data (microarray data) by calculating the correlation coefficient between amino acid residues and gene expression level.
NTXpred	The aim of this server is to predict neurotoxins and their source and probable functions from primary amino acid sequences.
VICMpred	This server aids in the broad functional classification of bacterial proteins into virulence factors, information molecules, cellular process and metabolism molecules. (Saha, S. and Raghava, G. P. S. (2006) Genomics Proteomics & Bioinformatics, in press)
Algpred	This server predicts allergens from amino acid sequences using the presence of IgE epitopes, MEME/MAST motifs, a BLAST search against allergen representative peptides and an SVM-based method. (Saha, S. and Raghava, G. P. S. (2006) Nucleic Acids Research, in press)
RBpred	This server predicts rice leaf blast severity (%) based on weather parameters and uses the regression mode of SVM.
RSL-pred	This server predicts the subcellular localization of rice proteins, e.g. chloroplast, cytoplasmic, mitochondrial and nuclear proteins.
AntiBP	This is a QM, SVM and ANN based server that predicts whether a peptide sequence is an antibacterial peptide or not. It also identifies antibacterial peptides in a protein sequence.
COpid	This server finds proteins whose amino acid composition is similar to that of other proteins in the database. It can be used to compare and calculate amino acid/dipeptide composition and to build a distance matrix for phylogenetic analysis. It can also generate input patterns for SNNS, SVM and TiMBL.
siRNApred	This server predicts siRNA using a composition-based SVM.
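
Several of the servers above (for example NRpred, GPCRSclass and COpid) describe their input features as amino acid composition or dipeptide composition. As a minimal sketch of what such feature vectors might look like, the Python below computes both as percentages over the 20 standard residues. The function names are hypothetical, and the handling of non-standard residues (they are simply skipped) is an assumption; the servers' actual preprocessing is not described in this list.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue (20 values)."""
    # Assumption: non-standard characters (X, B, Z, gaps...) are ignored.
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    total = len(residues)
    return {
        aa: (100.0 * residues.count(aa) / total if total else 0.0)
        for aa in AMINO_ACIDS
    }


def dipeptide_composition(sequence):
    """Return the percentage of each adjacent residue pair (400 values)."""
    # Assumption: non-standard characters are dropped before pairing, so
    # residues on either side of a dropped character become neighbours.
    cleaned = "".join(r for r in sequence.upper() if r in AMINO_ACIDS)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(cleaned) - 1):
        counts[cleaned[i:i + 2]] += 1
    total = max(len(cleaned) - 1, 0)
    return {
        dp: (100.0 * n / total if total else 0.0)
        for dp, n in counts.items()
    }
```

The 20-value vector captures only residue frequencies, while the 400-value vector also retains some local order information, which may be one reason the dipeptide-based NRpred model is reported above as more accurate.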

Web Servers/Databases/Mirror Sites

Web Servers

1. Subcellular location Prediction Servers

Server	Description	Standalone Available
NetNES	Leucine-rich nuclear export signals (NES) in eukaryotic proteins (http://www.cbs.dtu.dk/services/NetNES).	YES
PSORT	Prediction of protein subcellular localization.	YES
SecretomeP	Non-classical and leaderless secretion of protein.	YES
TargetP	Prediction of subcellular location.	YES
TatP	Twin-arginine signal peptides.	NO
DAS	Prediction of transmembrane regions in prokaryotes using the Dense Alignment Surface method.	NO
HMMTOP	Prediction of transmembrane helices and topology of proteins.	YES
PredictProtein	Prediction of transmembrane helix location and topology.	NO
TMAP	Transmembrane detection based on multiple sequence alignment.	NO
SOSUI	Prediction of transmembrane regions.	NO
TMHMM	Prediction of transmembrane helices in proteins.	YES
TMpred	Prediction of transmembrane regions and protein orientation.	NO
TopPred	Topology prediction of membrane proteins.	NO
PSLDoc	Uses document classification techniques and incorporates a probabilistic latent semantic analysis with a support vector machine model, for prediction on prokaryotes and eukaryotes.	NO
PSL101	Hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structure homology approach.	NO
SLP-Local	Predicts localizations for chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm for Gram-negative organisms.	NO
CELLO	Uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins.	NO
PA-SUB	This specialized server available at the PENCE Proteome Analyst site is able to classify Gram-negative, Gram-positive, fungi, plant and animal proteins to many localization sites.	NO
LOCtree	LOCtree is a eukaryotic and prokaryotic localization prediction tool.	NO
subLoc	Uses Support Vector Machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular sites, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular sites.	NO
EpiLoc	A text-based system for predicting animal, plant and fungal protein subcellular locations.	NO
ProLoc-GO	Utilizes Gene Ontology terms for sequence-based prediction of subcellular localization.	NO
AAIndexLoc	Predicts protein subcellular localization by using amino acid composition and physicochemical properties.	NO
SCLFA	Predicts localizations by feature vectors based on amino acid composition (frequency) and sequence alignment. Subcellular locations predicted include chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins.	NO
SherLoc	Integrates several sequence and text-based features and provides predictions for plant, animal, and fungal proteins.	NO
SLPS	Subcellular Localization Predicting System, predicts localization using a Nearest Neighbor Algorithm (NNA) and incorporating a protein functional domain profile.	NO
BaCelLo	Predictor for five classes of eukaryotic subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast) and it is based on different SVMs organized in a decision tree.	NO
Protein Prowler	A multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. It classifies eukaryotic targeting signals as secretory, mitochondrion, chloroplast or other.	NO
pTARGET	Uses amino acid composition and localization-specific Pfam domains to assign a eukaryotic protein to one of nine localization sites.	NO
Golgi predictor	Predicts Golgi Type II membrane proteins and can discriminate between proteins destined for the Golgi apparatus or other post-Golgi locations.	NO
LOCSVMPSI	A eukaryotic localization prediction method that incorporates evolutionary information into its predictions. The method uses PSI-BLAST and support vector machine to generate predictions for up to 12 localization sites.	NO
PSLT	A Bayesian network-based method that predicts human protein localization based on motif/domain co-occurrence.	NO
ESLPred	Uses Support Vector Machine and PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion, cytoplasm, or extracellular space.	NO
Nuc-PLoc	A web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM.	NO
NUCLEO	Predicts possible nuclear localization while taking dually localized proteins into consideration. It uses an SVM-based approach with a custom kernel that employs a composite spectrum (or multiple k-mer) encoding combined with a bit vector indicating the presence or absence of a range of sequence motifs known to be important for nuclear proteins.	NO
NucPred	Predicts possible nuclear localization by using a genetic programming-based algorithm.	NO
ProLoc	Predicts subnuclear localizations using an evolutionary SVM based classifier with automatic selection from a large set of physicochemical composition (PCC) features.	NO
Subnuclear Compartments Prediction System	Predicts subnuclear localization by combining an SVM-based system for sequence analysis with a nearest-neighbor classifier using a similarity measure derived from the GO annotation terms for the protein sequences.	NO
NetNES	Predicts nuclear export signals using neural network and HMMs.	NO
PredictNLS	Uses nuclear localization signal motifs to predict whether a protein might be localized to the nucleus.	YES
ChloroP	Prediction of chloroplast transit peptides.	YES
LipoP	Prediction of lipoproteins and signal peptides in Gram negative bacteria.	YES
MITOPROT	Prediction of mitochondrial targeting sequences.	YES
PATS	Prediction of apicoplast targeted sequences.	NO
Plasmit	Prediction of mitochondrial transit peptides in Plasmodium falciparum.	NO
Predotar	Prediction of mitochondrial and plastid targeting sequences.	NO
PTS1	Prediction of peroxisomal targeting signal 1 containing proteins.	NO
SignalP	Prediction of signal peptide cleavage sites.	YES

2. Servers calculating physicochemical properties of amino acids

Server	Description	Standalone Available
AACompIdent	Identify a protein by its amino acid composition.	NO
AACompSim	Compare the amino acid composition of a UniProtKB/Swiss-Prot entry with all other entries.	NO
TagIdent	Identify proteins with isoelectric point (pI), molecular weight (Mw) and sequence tag, or generate a list of proteins close to a given pI and Mw.	NO
MultiIdent	Identify proteins with isoelectric point (pI), molecular weight (Mw), amino acid composition, sequence tag and peptide mass fingerprinting data.	NO
ProtParam	Physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, isoelectric point, extinction coefficient, etc.).	NO
Compute pI/Mw	Compute the theoretical isoelectric point (pI) and molecular weight (Mw) from a UniProt Knowledgebase entry or for a user sequence.	NO
IsotopIdent	Predicts the theoretical isotopic distribution of a peptide, protein, polynucleotide or chemical compound.	NO
Aldente	Identify proteins with peptide mass fingerprinting data. A new, fast and powerful tool that takes advantage of Hough transformation for spectra recalibration and outlier exclusion.	NO
Mascot	Peptide mass fingerprint from Matrix Science Ltd., London.	NO
PepMAPPER	Peptide mass fingerprinting tool from UMIST, UK.	NO
ProteinProspector	UCSF tools for peptide masses data (MS-Fit, MS-Pattern, MS-Digest, etc.).	NO
ProFound	Search known protein sequences with peptide mass information from Rockefeller and NY Universities.	NO
Phenyx	Protein and peptide identification/characterization from MS/MS data from GeneBio, Switzerland.	NO
OMSSA	MS/MS peptide spectra identification by searching libraries of known protein sequences.	NO
PepFrag	Search known protein sequences with peptide fragment mass information from Rockefeller and NY Universities.	NO
MALDIPepQuant	Quantify MALDI peptides (SILAC) from Phenyx output.	NO
pIcarver	Visualize theoretical distributions of peptide pI on a given pH range and generate fractions with similar peptide frequencies.	NO
GlycanMass	Calculate the mass of an oligosaccharide structure.	NO
GlycoMod	Predict possible oligosaccharide structures that occur on proteins from their experimentally determined masses (can be used for free or derivatized oligosaccharides and for glycopeptides).	NO
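
Tools such as ProtParam and Compute pI/Mw, listed above, report the theoretical molecular weight (Mw) of a protein sequence. A minimal sketch of that calculation is shown below: it sums average residue masses and adds one water molecule for the free termini. The function name is hypothetical, the mass values are approximate average residue masses, and this is not the code used by those servers; modifications, isotopes and non-standard residues are not handled.

```python
# Approximate average residue masses in daltons (amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# Average mass of one water molecule, added once for the N- and C-termini.
WATER_MASS = 18.01528


def molecular_weight(sequence):
    """Return the theoretical average Mw (Da) of an unmodified sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    total = WATER_MASS
    for residue in seq:
        if residue not in AVERAGE_RESIDUE_MASS:
            # Assumption: unknown residues are rejected rather than guessed.
            raise ValueError(f"unsupported residue: {residue!r}")
        total += AVERAGE_RESIDUE_MASS[residue]
    return total
```

For example, `molecular_weight("G")` gives the mass of free glycine (about 75.07 Da), since the single residue mass plus one water restores the intact amino acid.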

3. Servers Predicting Post-translational Modifications

Server	Description	Standalone Available
peptideMass	Calculate masses of peptides and their post-translational modifications for a UniProtKB/Swiss-Prot or UniProtKB/TrEMBL entry or for a user sequence.	NO
FindMod	Predict potential protein post-translational modifications and potential single amino acid substitutions in peptides. Experimentally measured peptide masses are compared with the theoretical peptides calculated from a specified Swiss-Prot entry or from a user-entered sequence, and mass differences are used to better characterize the protein of interest.	NO
FindPept	Identify peptides that result from unspecific cleavage of proteins from their experimental masses, taking into account artefactual chemical modifications, post-translational modifications (PTM) and protease autolytic cleavage.	NO
Popitam	Identification and characterization tool for peptides with unexpected modifications (e.g. post-translational modifications or mutations) by tandem mass spectrometry.	NO
DictyOGlyc	Prediction of GlcNAc O-glycosylation sites in Dictyostelium.	NO
NetCGlyc	C-mannosylation sites in mammalian proteins.	NO
NetOGlyc	Prediction of O-GalNAc (mucin type) glycosylation sites in mammalian proteins.	YES
NetGlycate	Glycation of epsilon amino groups of lysines in mammalian proteins.	YES
NetNGlyc	Prediction of N-glycosylation sites in human proteins.	YES
OGPET	Prediction of O-GalNAc (mucin-type) glycosylation sites in eukaryotic (non-protozoan) proteins.	YES
YinOYang	O-beta-GlcNAc attachment sites in eukaryotic protein sequences.	YES
big-PI Predictor	GPI Modification Site Prediction.	NO
GPI-SOM	Identification of GPI-anchor signals by a Kohonen Self Organizing Map.	YES
Myristoylator	Prediction of N-terminal myristoylation by neural networks.	NO
NMT	Prediction of N-terminal N-myristoylation.	NO
CSS-Palm	Palmitoylation site prediction with CSS.	YES
PrePS	Prenylation Prediction Suite.	NO
NetAcet	Prediction of N-acetyltransferase A (NatA) substrates (in yeast and mammalian proteins).	YES
NetPhos	Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins.	YES
NetPhosK	Kinase specific phosphorylation sites in eukaryotic proteins.	NO
NetPhosYeast	Serine and threonine phosphorylation sites in yeast proteins.	NO
Sulfinator	Prediction of tyrosine sulfation sites.	NO
SulfoSite	Prediction of tyrosine sulfation sites.	NO
SUMOplot	Prediction of SUMO protein attachment sites.	NO
TermiNator	Prediction of N-terminal modification.	NO
NetPicoRNA	Prediction of protease cleavage sites in picornaviral proteins.	NO
NetCorona	Coronavirus 3C-like proteinase cleavage sites in proteins.	NO
ProP	Arginine and lysine propeptide cleavage sites in eukaryotic protein sequences.	YES
PeptideCutter	Predicts potential protease cleavage sites and sites cleaved by chemicals in a given protein sequence.	NO
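
Cleavage-site tools such as PeptideCutter, and the peptide mass tools listed earlier, rely on digesting a sequence in silico. As a rough illustration, the sketch below applies the commonly cited simplified trypsin rule: cleave on the C-terminal side of K or R unless the next residue is P. The function name is hypothetical and the rule is an assumption made for illustration; it is not the model used by PeptideCutter, and missed cleavages are not considered.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule."""
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        # Residue following the current one ("" at the end of the sequence).
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        # Cleave after K or R, except when the next residue is proline.
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Keep the C-terminal peptide if the sequence did not end on a cleavage.
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides
```

Calling `tryptic_digest("MKWVTFISLLRPAK")` returns `["MK", "WVTFISLLRPAK"]`: the bond after the first K is cut, while the R followed by P is left intact.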

Databases

1. Proteomics (2D and MALDI) Databases

Database	Description	Standalone Available
SWISS-2DPAGE	Contains data on proteins identified on various 2-D PAGE and SDS-PAGE reference maps.	NO
WORLD-2DPAGE	A Dynamic Portal to query simultaneously World-Wide Gel-based Proteomics Databases.	NO
DOSAC-COBS 2D-PAGE	2D-PAGE server to query the DOSAC-COBS 2D-PAGE database.	NO
Plasmo2Dbase	Plasmodium falciparum 2-DE database at Indian Institute of Science, Bangalore, India.	NO
Cornea-2DPAGE	Human cornea, Department of Molecular Biology, Faculty of Science, Aarhus University, Denmark.	NO
REPRODUCTION-2DPAGE	2D-PAGE database (Human ovary, Mouse testis).	NO
ANU-2DPAGE	2-DE database (Rice anther and Medicago truncatula) of the Australian National University, Canberra, Australia.	NO
OGP-WWW	Oxford GlycoProteomics database (Human platelet).	NO
PHCI-2DPAGE	Parasite host cell interaction 2D-PAGE database.	NO
RAT HEART-2DPAGE	2-DE database of rat heart.	NO
SIENA-2DPAGE	2D-PAGE database (Chlamydia trachomatis, Caenorhabditis elegans, Human breast ductal carcinoma and histologically normal tissue, Human amniotic fluid).	NO

2. Subcellular Location Databases

Database	Description	Standalone Available
eSLDB	Collects the annotations of subcellular localizations of eukaryotic proteomes based on experimental results, homology, and computational predictions.	NO
PSORTdb	A two-component searchable and browsable database. ePSORTdb contains bacterial proteins of experimentally verified localization used in training and testing of PSORTb. cPSORTdb contains predictions of localization for bacterial genomes.	NO
SUBA	An Arabidopsis subcellular localization database with annotations based on experimental results, literature references, Swiss-Prot annotations, and computational predictions.	NO
FTFLP Database	Contains a collection of Arabidopsis protein localizations verified using fluorescent tagging of full-length proteins.	NO
SPdb	A signal peptide database containing a repository of experimentally verified and predicted signal peptides.	NO
NESbase	A database with a collection of nuclear export signals.	NO
LOCATE	A database that houses data describing the membrane organization and subcellular localization of human and mouse proteins.	NO
PDBTM	A database of transmembrane proteins with known 3D structures.	NO
PA-GOSUB	A database collecting the localization predictions made by the Proteome Analyst tool.	NO
Organelle DB	A database of eukaryotic proteins found at various organelles and subcellular structures.	NO
AMPDB	A database of known and predicted mitochondrial proteins in the plant species Arabidopsis thaliana.	NO
MITOMAP	A database of information related to the human mitochondrial genome.	NO
DBSubLoc	A dataset of proteins with annotated subcellular localizations according to SWISS-PROT and PIR.	NO
LOCtarget	A database of LOCtree predictions for structural genomics targets.	NO
LOC3d	A database of predicted localizations for eukaryotic proteins with 3D structures.	NO
LOCkey	Contains predicted localizations for the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot keywords.	NO
LOChom	A database of predicted localizations based on homology to experimentally annotated proteins.	NO
SignalP	The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used to train SignalP, and also used to train PSORTb's signal peptide prediction module.	NO
SignalPeptides	The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used in an independent evaluation of several signal peptide prediction methods, and used to test PSORTb's signal peptide prediction module.	NO

3. Post-translational Modification Databases

Database	Description	Standalone Available
PRENbase	Database of Prenylated Proteins.	NO