FTGPRED: Gene Identification using Fourier Transformation

Performance of Gene Predictors on 11 different Genomes

Annotations from the GenBank files were used as the reference for evaluation. They were obtained as *.ptt files from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/"organism"/"gi number".ptt. The following measures were calculated:
- True Positives (TP): the number of actual ORFs/genes correctly predicted by a program. A prediction is correct if it matches at least one end (5' or 3') of the ORF/gene, or both ends.
- False Negatives (FN): the number of actual ORFs/genes missed by a program.
- False Positives (FP): the number of predicted ORFs/genes that are wrong, i.e. do not correspond to an actual ORF/gene.
- Sensitivity: Sen = TP/(TP + FN).
- Specificity: Spe = TP/(TP + FP) (Besemer et al., 2001; Aggarwal and Ramaswamy, 2002).
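A minimal Python sketch of these definitions, assuming the usual .ptt layout (three header lines, then tab-separated rows whose first two fields are a `left..right` Location and a `+`/`-` Strand). The names `Orf`, `parse_ptt` and `sen_spe` are hypothetical. Ends are compared as exact coordinates, and TP + FN and TP + FP are taken to be the numbers of annotated and predicted ORFs respectively.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Orf(NamedTuple):
    left: int    # lower coordinate of the "left..right" Location field
    right: int   # higher coordinate
    strand: str  # "+" or "-"

    def start_end(self) -> Tuple[int, str]:
        # The start codon is at the left coordinate on "+", the right on "-".
        return (self.left if self.strand == "+" else self.right, self.strand)

    def stop_end(self) -> Tuple[int, str]:
        return (self.right if self.strand == "+" else self.left, self.strand)


def parse_ptt(lines: List[str]) -> List[Orf]:
    """Assumed layout: 3 header lines, then tab-separated rows whose first
    two fields are Location ("left..right") and Strand."""
    orfs = []
    for row in lines[3:]:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 2 or ".." not in fields[0]:
            continue  # skip blank or malformed rows
        left, right = (int(x) for x in fields[0].split(".."))
        orfs.append(Orf(left, right, fields[1].strip()))
    return orfs


def sen_spe(actual: List[Orf], predicted: List[Orf]) -> Dict[str, Tuple[float, float]]:
    """(Sen, Spe) per matching criterion, with TP + FN = len(actual)
    and TP + FP = len(predicted)."""
    if not actual or not predicted:
        raise ValueError("need at least one annotated and one predicted ORF")
    criteria: Dict[str, Callable[[Orf], object]] = {
        "start": Orf.start_end,
        "stop": Orf.stop_end,
        "both": lambda o: o,  # whole (left, right, strand) must match
    }
    result = {}
    for name, key in criteria.items():
        predicted_keys = {key(o) for o in predicted}
        tp = sum(1 for o in actual if key(o) in predicted_keys)
        result[name] = (tp / len(actual), tp / len(predicted))
    return result
```

Typical use is `sen_spe(parse_ptt(open(path).readlines()), predictions)`, giving one (Sen, Spe) pair for each of the START CODON, STOP CODON and BOTH ENDS columns used in the tables.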


PERFORMANCE ON GENBANK-ANNOTATED ORFs/GENES:
Performance of the programs on GenBank-annotated ORFs/genes. The WWW-based servers of the programs were used for all predictions except Glimmer, which was installed locally on a Sun Solaris system. All accuracy measures were calculated as described above. For the FFT programs, only predicted ORFs longer than 300 bp were included in the evaluation. For the FTGPred programs, the default spectral-power threshold used to discriminate protein-coding from non-coding ORFs was derived from a previous evaluation on Vibrio cholerae genomic DNA (Issac et al., 2002). A prediction is considered true only if the position of its start codon, its stop codon, or both ends is exactly the same as in the GenBank annotation.
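The text does not state the exact spectral statistic FTG uses or the value of its threshold, so the following is only an illustrative sketch of the period-3 spectral measure associated with GeneScan, swapped in as an assumption. It evaluates the single Fourier coefficient at frequency 1/3 directly instead of running a full FFT (for an ORF, whose length N is a multiple of 3, this is the DFT coefficient at k = N/3) and normalises by the mean spectral power, which Parseval's theorem reduces to the number of A/C/G/T positions. `period3_snr` and `passes_filters` are hypothetical names, and `threshold` has no default because the real value is not given here.

```python
import cmath


def period3_snr(seq: str) -> float:
    """Period-3 signal-to-noise ratio of a DNA sequence (GeneScan-style sketch).

    For each base b, u_b[n] = 1 where seq[n] == b and 0 otherwise. The power
    at frequency 1/3 is S = sum_b |U_b|^2, with U_b = sum_n u_b[n] * exp(-2*pi*i*n/3).
    By Parseval's theorem the mean of sum_b |U_b(k)|^2 over all N DFT
    frequencies equals the number of A/C/G/T positions, used as background.
    """
    seq = seq.upper()
    w = cmath.exp(-2j * cmath.pi / 3)
    phases = (1 + 0j, w, w * w)  # exp(-2*pi*i*n/3) depends only on n mod 3
    coeff = {b: 0j for b in "ACGT"}
    for n, base in enumerate(seq):
        if base in coeff:  # ambiguous bases such as N are ignored
            coeff[base] += phases[n % 3]
    power = sum(abs(c) ** 2 for c in coeff.values())
    background = sum(1 for base in seq if base in coeff)
    return power / background if background else 0.0


def passes_filters(orf_seq: str, threshold: float, min_len: int = 300) -> bool:
    """Hypothetical helper: keep ORFs longer than min_len bp whose period-3
    ratio reaches the spectral-power threshold."""
    return len(orf_seq) > min_len and period3_snr(orf_seq) >= threshold
```

A strictly periodic codon run such as `"ATG" * 100` gives a ratio of 100, whereas `"ACGT" * 75`, which has period 4, gives a ratio of essentially 0; the length test implements the "longer than 300 bp" cut-off applied to the FFT programs.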

Aeropyrum pernix:

------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.294 |  0.321 |  0.828|  0.905 |  0.293|  0.321|    1841       |     1684      |
EasyGene v 1.0          |  0.309 |  0.341 |  0.841|  0.929 |  0.308|  0.340|    1841       |     1667      |
Glimmer                 |  0.561 |  0.565 |  0.854|  0.860 |  0.561|  0.565|    1841       |     1827      |
GeneMark.hmm            |  0.301 |  0.326 |  0.835|  0.904 |  0.300|  0.325|    1841       |     1700      |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.397 |  0.365 |  0.810|  0.746 |  0.397|  0.365|    1841       |     1998      |
Genescan                |  0.395 |  0.365 |  0.809|  0.747 |  0.395|  0.365|    1841       |     1992      |
Lengthen-Shuffle        |  0.369 |  0.363 |  0.767|  0.755 |  0.369|  0.363|    1841       |     1871      |
------------------------------------------------------------------------------------------------------------
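If TP + FN equals the ACTUAL ORFs column and TP + FP equals the PREDICTED ORFs column, each Sen/Spe pair in these tables can be turned back into counts. The sketch below (hypothetical helper name, assuming the ratios are rounded to three decimals) finds the integer TP values consistent with both ratios, using the Glimmer stop-codon figures for A. pernix from the table above.

```python
def implied_true_positives(sen, spe, actual, predicted, tol=0.0005):
    """Return the integer TP values consistent with both published ratios.

    Sen = TP/actual and Spe = TP/predicted are reported to three decimals,
    so each ratio is only known to within +/- tol.
    """
    return [
        tp
        for tp in range(min(actual, predicted) + 1)
        if abs(tp / actual - sen) <= tol and abs(tp / predicted - spe) <= tol
    ]


tp = implied_true_positives(sen=0.854, spe=0.860, actual=1841, predicted=1827)
print(tp)  # [1572]  -> FN = 1841 - 1572 = 269, FP = 1827 - 1572 = 255
```

That a single TP value satisfies both ratios supports reading the two count columns as TP + FN and TP + FP.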

Bacillus anthracis AMES:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.705 |  0.682 |  0.949|  0.919 |  0.704|  0.682|    5311       |     5483      |
EasyGene v 1.0          |  0.827 |  0.826 |  0.935|  0.933 |  0.827|  0.826|    5311       |     5320      |
Glimmer                 |  0.616 |  0.589 |  0.954|  0.913 |  0.616|  0.588|    5311       |     5555      |
GeneMark.hmm**          |   -    |   -    |   -   |   -    |   -   |   -   |               |               |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.555 |  0.581 |  0.799|  0.836 |  0.555|  0.581|    5311       |     5073      |
Genescan                |  0.555 |  0.585 |  0.799|  0.844 |  0.554|  0.585|    5311       |     5031      |
Lengthen-Shuffle        |  0.507 |  0.593 |  0.727|  0.849 |  0.507|  0.592|    5311       |     4545      |
------------------------------------------------------------------------------------------------------------

** Results for B. anthracis using GeneMark.hmm were not obtained due to an unexplained server input error.


Borrelia burgdorferi:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.717 |  0.716 |  0.957|  0.955 |  0.716|  0.717|    851        |     852       |
EasyGene v 1.0          |  0.718 |  0.752 |  0.948|  0.993 |  0.718|  0.752|    851        |     813       |
Glimmer                 |  0.933 |  0.917 |  0.967|  0.950 |  0.933|  0.917|    851        |     866       |
GeneMark.hmm            |  0.729 |  0.743 |  0.955|  0.974 |  0.729|  0.743|    851        |     835       |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.626 |  0.670 |  0.859|  0.919 |  0.626|  0.670|    851        |     795       |
Genescan                |  0.627 |  0.673 |  0.860|  0.922 |  0.627|  0.673|    851        |     794       |
Lengthen-Shuffle        |  0.592 |  0.677 |  0.823|  0.940 |  0.592|  0.677|    851        |     745       |
------------------------------------------------------------------------------------------------------------

Campylobacter jejuni:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.908 |  0.878 |  0.990|  0.957 |  0.906|  0.876|    1634       |    1690       |
EasyGene v 1.0          |  0.891 |  0.892 |  0.974|  0.975 |  0.890|  0.891|    1634       |    1633       |
Glimmer                 |  0.851 |  0.805 |  0.990|  0.937 |  0.850|  0.804|    1634       |    1727       |
GeneMark.hmm            |  0.912 |  0.879 |  0.989|  0.952 |  0.911|  0.877|    1634       |    1697       |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.761 |  0.803 |  0.885|  0.934 |  0.760|  0.802|    1634       |    1548       |
Genescan                |  0.761 |  0.806 |  0.886|  0.938 |  0.761|  0.805|    1634       |    1544       |
Lengthen-Shuffle        |  0.736 |  0.814 |  0.851|  0.942 |  0.735|  0.814|    1634       |    1476       |
------------------------------------------------------------------------------------------------------------

Chlamydophila pneumoniae AR39:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.739 |  0.771 |  0.916|  0.956 |  0.738|  0.770|    1112       |    1066       |
EasyGene v 1.0          |  0.692 |  0.766 |  0.871|  0.965 |  0.691|  0.765|    1112       |    1004       |
Glimmer                 |  0.790 |  0.760 |  0.930|  0.894 |  0.787|  0.757|    1112       |    1156       |
GeneMark.hmm            |  0.743 |  0.769 |  0.911|  0.943 |  0.742|  0.768|    1112       |    1074       |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.641 |  0.697 |  0.826|  0.898 |  0.641|  0.697|    1112       |    1023       |
Genescan                |  0.643 |  0.701 |  0.832|  0.907 |  0.643|  0.701|    1112       |    1020       |
Lengthen-Shuffle        |  0.580 |  0.718 |  0.739|  0.915 |  0.580|  0.718|    1112       |     898       |
------------------------------------------------------------------------------------------------------------

Escherichia coli O157:H7:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.659 |  0.728 |  0.868|  0.960 |  0.658|  0.727|    5361       |     4850      |
EasyGene v 1.0***       |  0.360 |  0.781 |  0.455|  0.986 |  0.360|  0.780|    5361       |     2474      |
EasyGene v 1.0(- strand)|  0.721 |  0.780 |  0.911|  0.986 |  0.720|  0.780|    2678       |     2473      |
Glimmer                 |  0.675 |  0.620 |  0.928|  0.852 |  0.674|  0.619|    5361       |     5836      |
GeneMark.hmm**          |   -    |   -    |   -   |   -    |   -   |   -   |               |               |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.646 |  0.491 |  0.846|  0.643 |  0.645|  0.491|    5361       |     7051      |
Genescan                |  0.642 |  0.494 |  0.841|  0.647 |  0.642|  0.494|    5361       |     6966      |
Lengthen-Shuffle        |  0.591 |  0.509 |  0.775|  0.667 |  0.591|  0.509|    5361       |     6229      |
------------------------------------------------------------------------------------------------------------

*** The positive-strand predictions for E. coli using EasyGene were not obtained due to an unexplained server error.
** Results for E. coli using GeneMark.hmm were not obtained due to an unexplained server input error.
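The two EasyGene rows for E. coli appear to differ only in the denominator. Because the positive-strand predictions are missing, roughly the same true positives are divided by all 5361 annotated ORFs in the first row but only by the 2678 minus-strand ORFs in the second, which roughly halves Sen, while Spe is unchanged (2474 vs 2473 predictions). A short arithmetic check using the sensitivities from the table:

```python
# EasyGene v 1.0 on E. coli O157:H7, sensitivities copied from the table above.
ALL_ORFS, MINUS_ORFS = 5361, 2678
sen = {
    # criterion: (Sen over all annotated ORFs, Sen over minus-strand ORFs only)
    "start": (0.360, 0.721),
    "stop": (0.455, 0.911),
    "both": (0.360, 0.720),
}

# TP = Sen * (TP + FN); if only the denominator differs, both estimates agree.
implied_tp = {c: (a * ALL_ORFS, m * MINUS_ORFS) for c, (a, m) in sen.items()}
for criterion, (tp_all, tp_minus) in implied_tp.items():
    print(f"{criterion:5s} TP ~ {tp_all:.0f} (all ORFs) vs {tp_minus:.0f} (minus strand)")
```

Both estimates agree to within about two genes (about 1930 start-codon and 2440 stop-codon matches), which is within the rounding of three-decimal ratios.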


Haemophilus influenzae RD:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.860 |  0.809 |  0.988|  0.929 |  0.857|  0.806|    1657       |     1762      |
EasyGene v 1.0          |  0.871 |  0.834 |  0.982|  0.939 |  0.870|  0.832|    1657       |     1702      |
Glimmer                 |  0.766 |  0.679 |  0.989|  0.876 |  0.763|  0.676|    1657       |     1870      |
GeneMark.hmm            |  0.862 |  0.788 |  0.987|  0.902 |  0.858|  0.785|    1657       |     1812      |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.798 |  0.790 |  0.904|  0.895 |  0.797|  0.789|    1657       |     1673      |
Genescan                |  0.801 |  0.795 |  0.907|  0.900 |  0.800|  0.793|    1657       |     1670      |
Lengthen-Shuffle        |  0.737 |  0.805 |  0.832|  0.908 |  0.736|  0.804|    1657       |     1518      |
------------------------------------------------------------------------------------------------------------

Helicobacter pylori 26695:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.801 |  0.794 |  0.959|  0.950 |  0.799|  0.792|    1576       |     1590      |
EasyGene v 1.0          |  0.780 |  0.804 |  0.945|  0.974 |  0.779|  0.802|    1576       |     1529      |
Glimmer                 |  0.750 |  0.655 |  0.966|  0.844 |  0.748|  0.653|    1576       |     1805      |
GeneMark.hmm            |  0.780 |  0.760 |  0.957|  0.933 |  0.779|  0.758|    1576       |     1618      |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.672 |  0.678 |  0.862|  0.869 |  0.670|  0.676|    1576       |     1562      |
Genescan                |  0.673 |  0.683 |  0.862|  0.874 |  0.671|  0.681|    1576       |     1554      |
Lengthen-Shuffle        |  0.587 |  0.699 |  0.746|  0.888 |  0.585|  0.697|    1576       |     1323      |
------------------------------------------------------------------------------------------------------------

Helicobacter pylori J99:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.869 |  0.853 |  0.978|  0.960 |  0.863|  0.847|    1491       |     1518      |
EasyGene v 1.0          |  0.869 |  0.873 |  0.974|  0.978 |  0.863|  0.867|    1491       |     1484      |
Glimmer                 |  0.751 |  0.645 |  0.985|  0.845 |  0.745|  0.640|    1491       |     1737      |
GeneMark.hmm            |  0.859 |  0.815 |  0.982|  0.932 |  0.853|  0.810|    1491       |     1571      |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.686 |  0.660 |  0.889|  0.855 |  0.680|  0.655|    1491       |     1549      |
Genescan                |  0.687 |  0.664 |  0.892|  0.861 |  0.682|  0.659|    1491       |     1544      |
Lengthen-Shuffle        |  0.600 |  0.676 |  0.772|  0.870 |  0.595|  0.670|    1491       |     1323      |
------------------------------------------------------------------------------------------------------------

Mycobacterium tuberculosis H37Rv:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.598 |  0.591 |  0.956|  0.944 |  0.597|  0.590|     3927      |     3976      |
EasyGene v 1.0          |  0.679 |  0.714 |  0.929|  0.977 |  0.676|  0.711|     3927      |     3735      |
Glimmer                 |  0.569 |  0.481 |  0.966|  0.816 |  0.565|  0.477|     3927      |     4649      |
GeneMark.hmm            |  0.573 |  0.569 |  0.952|  0.944 |  0.572|  0.567|     3927      |     3959      |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.460 |  0.197 |  0.881|  0.378 |  0.458|  0.196|     3927      |     9156      |
Genescan                |  0.459 |  0.198 |  0.879|  0.380 |  0.457|  0.198|     3927      |     9078      |
Lengthen-Shuffle        |  0.445 |  0.209 |  0.853|  0.401 |  0.444|  0.209|     3927      |     8349      |
------------------------------------------------------------------------------------------------------------

Mycoplasma genitalium:
------------------------------------------------------------------------------------------------------------
   PROGRAMS             |   START CODON   |   STOP CODON   |   BOTH ENDS   |               |               |
                        |-----------------|----------------|---------------| ACTUAL ORFs   |PREDICTED ORFs |
                        |  Sen   |  Spe   |  Sen  |  Spe   |  Sen  |  Spe  |               |               |
------------------------------------------------------------------------------------------------------------
GeneMarkS               |  0.665 |  0.345 |  0.661|  0.343 |  0.246|  0.127|     484       |     934       |
EasyGene v 1.0          |  0.837 |  0.805 |  0.975|  0.938 |  0.835|  0.803|     484       |     503       |
Glimmer                 |  0.705 |  0.296 |  0.696|  0.293 |  0.250|  0.105|     484       |    1152       |
GeneMark.hmm            |  0.045 |  0.139 |  0.091|  0.278 |  0.025|  0.076|     484       |     158       |
------------------------------------------------------------------------------------------------------------
FTG                     |  0.479 |  0.494 |  0.413|  0.426 |  0.211|  0.217|     484       |     470       |
Genescan                |  0.473 |  0.490 |  0.415|  0.430 |  0.213|  0.221|     484       |     467       |
Lengthen-Shuffle        |  0.419 |  0.499 |  0.353|  0.420 |  0.184|  0.219|     484       |     407       |
------------------------------------------------------------------------------------------------------------


PERFORMANCE ON CONFIRMED ORFs/GENES:

Performance of the programs on confirmed ORFs/genes, with hypothetical and putative ORFs/genes excluded from the calculation. Almost all newly sequenced genomes are annotated using gene prediction programs, and public databases distribute this annotation. Accuracy measured against such annotation is therefore questionable, because the program that was used to annotate a genome will tend to give the best performance on it. To compute the performance of the gene predictors accurately, only experimentally confirmed genes are used; accordingly, ORFs annotated as hypothetical or putative in the GenBank file are removed in this study. True Positives are predictions in which at least one of the 5' or 3' ends is correct. False Negatives are genes/ORFs for which neither end is correctly predicted. A different criterion is used to define FN here, so that overlapping predictions in the same frame are included as true predictions. The WWW-based servers of the programs were used for all predictions except Glimmer, which was installed locally on a Sun Solaris system.
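A minimal sketch of the confirmed-gene counts, assuming unconfirmed genes can be recognised by the words "hypothetical" or "putative" in their product description (for example the .ptt Product column). `Gene`, `is_confirmed`, `ends` and `confirmed_tp_fn` are hypothetical names. The sketch implements only the "at least one exactly matching end" rule; it does not model the same-frame overlap rule described above.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    left: int
    right: int
    strand: str        # "+" or "-"
    product: str = ""  # product description, e.g. from the .ptt Product column


def is_confirmed(gene: Gene) -> bool:
    # Assumption: unconfirmed genes are recognised by these words in the product text.
    text = gene.product.lower()
    return "hypothetical" not in text and "putative" not in text


def ends(g: Gene) -> Tuple[Tuple[int, str], Tuple[int, str]]:
    # (start end, stop end); on the "-" strand the start is the right coordinate.
    if g.strand == "+":
        return (g.left, g.strand), (g.right, g.strand)
    return (g.right, g.strand), (g.left, g.strand)


def confirmed_tp_fn(annotation: List[Gene], predictions: List[Gene]) -> Tuple[int, int]:
    """TP: confirmed genes with at least one exactly matching end.
    FN: confirmed genes for which neither end is predicted."""
    pred_starts = {ends(p)[0] for p in predictions}
    pred_stops = {ends(p)[1] for p in predictions}
    confirmed = [g for g in annotation if is_confirmed(g)]
    tp = sum(1 for g in confirmed
             if ends(g)[0] in pred_starts or ends(g)[1] in pred_stops)
    return tp, len(confirmed) - tp
```

By construction TP + FN equals the number of confirmed genes, which is the TOTAL GENES column in the tables below.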

GeneMarkS:
------------------------------------------------------------------------------
ORGANISM                |   TOTAL GENES  |  TRUE POSITIVES  | FALSE NEGATIVES|
------------------------------------------------------------------------------
A. pernix               |     655        |      643         |      12        |
B. anthracis            |    3441        |     3413         |      28        |
B. burgdorferi          |     749        |      713         |      36        |
C. pneumoniae AR39      |     572        |      568         |       4        |
C. jejuni               |    1308        |     1301         |       7        |
E. coli                 |    3461        |     3233         |     228        |
H. influenzae Rd        |    1233        |     1229         |       4        |
H. pylori 26695         |     896        |      888         |       8        |
H. pylori J99           |     874        |      868         |       6        |
M. tuberculosis H37Rv   |    1482        |      1455         |      27        |
M. genitalium           |     318        |      273         |      45        |
------------------------------------------------------------------------------
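Since TP + FN equals TOTAL GENES, sensitivity on confirmed genes follows directly from these tables; specificity cannot be derived because false positives are not reported here. A short sketch using the GeneMarkS figures above (the dictionary and function names are illustrative):

```python
# GeneMarkS on confirmed genes: (TRUE POSITIVES, FALSE NEGATIVES) from the table above.
GENEMARKS_CONFIRMED = {
    "A. pernix": (643, 12),
    "B. anthracis": (3413, 28),
    "B. burgdorferi": (713, 36),
    "C. pneumoniae AR39": (568, 4),
    "C. jejuni": (1301, 7),
    "E. coli": (3233, 228),
    "H. influenzae Rd": (1229, 4),
    "H. pylori 26695": (888, 8),
    "H. pylori J99": (868, 6),
    "M. tuberculosis H37Rv": (1455, 27),
    "M. genitalium": (273, 45),
}


def sensitivity(tp: int, fn: int) -> float:
    """Sen = TP / (TP + FN); TP + FN is the TOTAL GENES column."""
    return tp / (tp + fn)


for organism, (tp, fn) in GENEMARKS_CONFIRMED.items():
    print(f"{organism:24s} Sen = {sensitivity(tp, fn):.3f}")
```

For example, this gives 0.982 for A. pernix, 0.934 for E. coli and 0.858 for M. genitalium.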

GeneMark.hmm:
------------------------------------------------------------------------------
ORGANISM                |   TOTAL GENES  |  TRUE POSITIVES  | FALSE NEGATIVES|
------------------------------------------------------------------------------
A. pernix               |     655        |       648        |       7        |
B. anthracis**          |     ---        |       ---        |     ---        |
B. burgdorferi          |     749        |       713        |      36        |
C. pneumoniae AR39      |     572        |       565        |       7        |
C. jejuni               |    1308        |      1299        |       9        |
E. coli**               |     ---        |       ---        |     ---        |
H. influenzae Rd        |    1233        |      1229        |       4        |
H. pylori 26695         |     896        |       889        |       7        |
H. pylori J99           |     874        |       868        |       6        |
M. tuberculosis H37Rv   |    1482        |      1454        |      28        |
M. genitalium           |     318        |        45        |     273        |
------------------------------------------------------------------------------

** Results for E. coli and B. anthracis were not obtained due to an unexplained server input error. Both genomes are larger than 5 Mbp.


EasyGene 1.0:
------------------------------------------------------------------------------
ORGANISM                |   TOTAL GENES  |  TRUE POSITIVES  | FALSE NEGATIVES|
------------------------------------------------------------------------------
A. pernix               |     655        |      645         |      10        |
B. anthracis            |    3441        |     3404         |      37        |
B. burgdorferi          |     749        |      706         |      43        |
C. pneumoniae AR39      |     572        |      563         |       9        |
C. jejuni               |    1308        |     1284         |      24        |
E. coli***              |    3461        |     1713         |    1748        |
H. influenzae Rd        |    1233        |     1229         |       4        |
H. pylori 26695         |     896        |      887         |       9        |
H. pylori J99           |     874        |      868         |       6        |
M. tuberculosis H37Rv   |    1482        |     1447         |      35        |
M. genitalium           |     318        |      312         |       6        |
------------------------------------------------------------------------------

*** The positive-strand predictions for E. coli using EasyGene were not obtained due to an unexplained server error.


Glimmer:

------------------------------------------------------------------------------
ORGANISM                |   TOTAL GENES  |  TRUE POSITIVES  | FALSE NEGATIVES|
------------------------------------------------------------------------------
A. pernix               |     655        |      651         |       4        |
B. anthracis            |    3441        |     3405         |      36        |
B. burgdorferi          |     749        |      722         |      27        |
C. pneumoniae AR39      |     572        |      563         |       9        |
C. jejuni               |    1308        |     1299         |       9        |
E. coli                 |    3461        |     3352         |     109        |
H. influenzae Rd        |    1233        |     1231         |       2        |
H. pylori 26695         |     896        |      886         |      10        |
H. pylori J99           |     874        |      866         |       8        |
M. tuberculosis H37Rv   |    1482        |     1463         |      19        |
M. genitalium           |     318        |      273         |      45        |
------------------------------------------------------------------------------


Detection of genes/ORFs missed by different programs using Fast Fourier Transformation algorithms.