Sequencing techniques are advancing rapidly, and the number of sequenced genomes is growing exponentially as a result. One of the major challenges in contemporary science is annotating the available sequence data. Annotation identifies the coding regions in a genome and their physical locations. It also provides the number and spatial distribution of repeat regions, along with evolutionary information about whole genomes.
Several computational tools have been developed to cut down the time and expense of experimental annotation. Computational resources at CRDD are classified into the following categories:
A web server for locating probable
protein-coding regions in a nucleotide sequence using a Fourier transform approach (Issac,
B., Singh, H., Kaur, H. and Raghava, G.P.S. (2002) Bioinformatics 18:196).
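The idea behind Fourier-based coding-region detection can be illustrated with a minimal sketch. Protein-coding DNA tends to show a periodicity of three nucleotides, which appears as a peak at frequency 1/3 in the Fourier spectrum of the base indicator sequences. The function name `period3_power` is hypothetical, and this is a generic illustration of the period-3 signal, not the algorithm used by the server described above.

```python
import cmath

def period3_power(seq):
    """Total spectral power at frequency 1/3, summed over the four
    base indicator sequences (a generic period-3 measure)."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence for this base
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total

# A sequence with perfect period 3 scores high; a period-4 sequence does not.
print(period3_power("ATG" * 20))
print(period3_power("ACGT" * 15))
```

In practice such a measure would be computed in a sliding window along the sequence, with windows of high period-3 power flagged as candidate coding regions.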
This server predicts genes (protein-coding
regions) in eukaryotic genomes, including introns and exons, using similarity-aided
(double) and consensus ab initio methods (Issac B, Raghava GP. (2004) Genome Res.
14(9):1756-66).
A genome-wide BLAST server. It allows
users to search their sequence against sequenced genomes and annotated
proteomes. It integrates various tools for the analysis of BLAST search results.
Spectral Repeat Finder (SRF) is a
program to find repeats through an analysis of the power spectrum of a given
DNA sequence. By repeat we mean the repeated occurrence of a segment of N
nucleotides within a DNA sequence. SRF is an ab initio technique as no prior
assumptions need to be made regarding either the repeat length, its fidelity,
or whether the repeats are in tandem or not (Sharma D, Issac B, Raghava GP, Ramaswamy R. (2004) Bioinformatics.
20(9):1405-12)
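A minimal sketch of the power-spectrum idea follows: compute the discrete Fourier transform of the four base indicator sequences, sum their powers, and read the repeat length off the strongest peak. The function names are hypothetical, the O(N^2) transform is chosen for clarity rather than speed, and this is not the SRF implementation, which the text above does not describe in detail.

```python
import cmath

def power_spectrum(seq):
    """Return (k, power) pairs for k = 1 .. N/2, where power is summed
    over the indicator sequences of A, C, G and T."""
    seq = seq.upper()
    n = len(seq)
    spectrum = []
    for k in range(1, n // 2 + 1):
        total = 0.0
        for base in "ACGT":
            coeff = sum(cmath.exp(-2j * cmath.pi * k * pos / n)
                        for pos, ch in enumerate(seq) if ch == base)
            total += abs(coeff) ** 2
        spectrum.append((k, total))
    return spectrum

def dominant_period(seq):
    """Repeat length N/k corresponding to the strongest spectral peak."""
    k_best, _ = max(power_spectrum(seq), key=lambda kp: kp[1])
    return len(seq) / k_best

print(dominant_period("ACGGT" * 12))  # a 5-nucleotide tandem repeat
```

No repeat length is supplied in advance; the period is inferred from the spectrum alone, which is the sense in which such an approach is ab initio.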
Genome-wise sequence similarity search
using FASTA. It allows users to search their sequence against sequenced genomes
and their product proteomes. It integrates various tools for the
analysis of FASTA search results (Issac, B. and Raghava, G.P.S. (2002)
Biotechniques 33:548-56).
MyPattern Finder is a program for detection of a 'motif'
in DNA sequence by using an exact search method (Option A (1.0))
or an alignment technique (Option B
(1.0)).
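An exact motif search of the kind mentioned above can be sketched in a few lines. The function name `find_motif` is hypothetical and the code only illustrates exact matching with overlapping hits; it does not reproduce either option of MyPattern Finder.

```python
def find_motif(sequence, motif):
    """Return the 0-based start positions of every exact occurrence of
    motif in sequence, including overlapping occurrences."""
    sequence, motif = sequence.upper(), motif.upper()
    if not motif:
        return []
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one position so overlapping matches are not missed
        start = sequence.find(motif, start + 1)
    return hits

print(find_motif("ATATAT", "ATA"))
```

An alignment-based search would instead tolerate mismatches or gaps, at the cost of needing a scoring scheme and a threshold.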
Meta-servers, web-servers and mirroring of
web-servers and databases
A. Krogh: In Proc. of
Fifth Int. Conf. on Intelligent Systems for Molecular Biology, ed. Gaasterland, T. et al., Menlo Park, CA: AAAI Press, 1997,
pp. 179-186.
Poonam Singhal,
B. Jayaram, Surjit B.
Dixit and David L. Beveridge. Prokaryotic Gene
Finding based on Physicochemical Characteristics of Codons
Calculated from Molecular Dynamics Simulations. Biophysical
Journal, 2008, Volume 94, Issue 11, 4173-4183.
GlimmerHMM
is based on a Generalized Hidden Markov Model (GHMM). The gene
finder conforms to the overall mathematical framework of a GHMM and additionally
incorporates splice site models adapted from the GeneSplicer program
and a decision tree adapted from GlimmerM. It also
uses interpolated Markov models for the coding and noncoding models. Currently, GlimmerHMM's
GHMM structure includes introns of each phase, intergenic regions, and four types of exons
(initial, internal, final, and single).
GeneZilla
is based on the Generalized Hidden Markov Model (GHMM). It evolved out of the
ab initio eukaryotic gene finder TIGRscan, which was developed at The Institute for
Genomic Research.
GeneZilla (formerly "TIGRscan") is briefly described in:
Majoros W, et al. (2004) Bioinformatics 20, 2878-2879.
The novel decoding
algorithm used by GeneZilla is described in:
Twinscan: Mammals,
Caenorhabditis (worm), Dicot
plants, and Cryptococci. N-SCAN: human and
Drosophila
TWINSCAN extends
the probability model of GENSCAN, allowing it to exploit homology between two
related genomes. Separate probability models are used for conservation in exons, introns, splice sites,
and UTRs, reflecting the differences among their patterns of evolutionary
conservation.
N-SCAN (a.k.a. TWINSCAN
3.0) models the phylogenetic relationships between
the aligned genome sequences, context-dependent substitution rates, and
insertions and deletions. N-SCAN has been used to generate predictions
for the entire human genome and the genome of the fruit fly Drosophila melanogaster.
TWINSCAN: Korf I, Flicek P, et al.
Bioinformatics. 2001;17 Suppl 1:S140-8.
N-SCAN: Gross and Brent. J Comput
Biol. 2006 Mar;13(2):379-93.
Manatee is a web-based
gene evaluation and genome annotation tool that can view, modify, and store
annotation for prokaryotic and eukaryotic genomes. The Manatee interface
allows biologists to quickly identify genes and make high-quality functional
assignments using a multitude of genome analysis tools. These tools include,
but are not limited to, GO classifications, BER and BLAST search data, paralogous families, and annotation suggestions generated
from automated analysis.
(Coding Region
Identification Tool Invoking Comparative Analysis)
Prokaryotic
CRITICA combines
traditional approaches to the problem with a novel comparative analysis. If,
in a nucleotide alignment, a pair of ORFs can be found in which the
conceptually translated products are more conserved than would be expected from
the amount of conservation at the nucleotide level, this is evolutionary
evidence that the DNA sequences are protein coding. Regions found by this
method are used to generate traditional dicodon
frequencies for further analysis and to predict probable
protein-coding regions.
Badger and Olsen. Molecular
Biology and Evolution, 16(4):512-524. 1999.
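Dicodon frequencies, as mentioned in the CRITICA description, are the relative frequencies of in-frame pairs of adjacent codons (hexamers). A minimal sketch of how they might be tabulated from a single open reading frame is shown below; the function name `dicodon_frequencies` is hypothetical and this is not CRITICA's own code.

```python
from collections import Counter

def dicodon_frequencies(orf):
    """Relative frequency of each in-frame dicodon (two adjacent codons)
    in an ORF that starts in frame at position 0."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    # Step by one codon so consecutive dicodons overlap by three bases
    counts = Counter(orf[i:i + 6] for i in range(0, usable - 5, 3))
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}

print(dicodon_frequencies("ATGAAATTT"))
```

Frequencies collected from regions already supported by comparative evidence can then serve as a scoring table for other candidate ORFs in the same genome.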
Sgp2
predicts genes by comparing anonymous genomic sequences from two different
species. It combines tblastx, a sequence similarity search
program, with geneid, an "ab
initio" gene prediction program.
EuGène
exploits probabilistic models such as Markov models to discriminate coding
from non-coding sequences and to discriminate effective splice sites from
false splice sites (using various mathematical models).
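How a Markov model can separate coding from non-coding sequence can be shown with a toy log-odds score. The sketch below trains two first-order Markov chains on made-up example sequences and scores a query by the log ratio of its likelihood under each. The function names, the training strings, and the choice of a first-order model with pseudocounts are all assumptions made for illustration; EuGène's actual models are not specified in the text above.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}

def log_odds(seq, coding, noncoding):
    """Positive scores favour the coding model, negative the non-coding one."""
    return sum(math.log(coding[a][b] / noncoding[a][b])
               for a, b in zip(seq, seq[1:]))

# Toy training data (invented for this example, not real genomic sequence)
coding_model = train_markov(["ATGGCCGCGGCC" * 3])
noncoding_model = train_markov(["ATATTTAATATA" * 3])

print(log_odds("GCCGCG", coding_model, noncoding_model) > 0)
print(log_odds("ATATTA", coding_model, noncoding_model) < 0)
```

The same likelihood-ratio idea applies to splice sites, with one model trained on true sites and one on decoys.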