IMTECH
INSTITUTE OF MICROBIAL TECHNOLOGY
BIOINFORMATICS CENTER

GENEBENCH: EVALUATION OF GENE FINDERS AND DATASET CREATION SERVER





Introduction:

The performance of different gene finders was evaluated using the accuracy measures defined below. The measures follow the definitions in Burset and Guigo, 1996 and Rogic et al., 2001.



  • NUCLEOTIDE LEVEL ACCURACY:

    1. True Positives (TP):

      The number of nucleotide bases that are actually coding and are predicted as coding.

    2. True Negatives (TN):

      The number of nucleotide bases that are actually non-coding and are predicted as non-coding.

    3. False Positives (FP):

      The number of nucleotide bases that are actually non-coding and are predicted as coding.

    4. False Negatives (FN):

      The number of nucleotide bases that are actually coding and are predicted as non-coding.

    5. Sensitivity (Sn):

      $Sn = \frac{TP}{TP + FN}$, the proportion of actual coding nucleotides that are predicted as coding.

    6. Specificity (Sp):

      $Sp = \frac{TP}{TP + FP}$, the proportion of predicted coding nucleotides that are actually coding.

    7. Correlation Coefficient (CC):

      $CC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN)(TN + FP)(TP + FP)(TN + FN)}}$

    8. Simple Matching Coefficient (SMC):

      $SMC = \frac{TP + TN}{TP + TN + FP + FN}$

    9. Average Conditional Probability (ACP):

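      $ACP = \frac{1}{4}\left[\frac{TP}{TP + FN} + \frac{TP}{TP + FP} + \frac{TN}{TN + FP} + \frac{TN}{TN + FN}\right]$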

    10. Approximate Correlation (AC):

      $AC = (ACP - 0.5) \times 2$
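
The nucleotide-level measures can all be derived from the four counts TP, TN, FP and FN. The following is a minimal Python sketch using the formulas of Burset and Guigo (1996), not the server's own code. The function names and the True/False coding masks are hypothetical, and averaging only the defined terms of ACP when a denominator is zero is an assumption of this sketch.

```python
import math


def nucleotide_counts(actual, predicted):
    """Count TP, TN, FP, FN from two equal-length coding masks (truthy = coding)."""
    if len(actual) != len(predicted):
        raise ValueError("masks must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1          # coding, predicted coding
        elif not a and not p:
            tn += 1          # non-coding, predicted non-coding
        elif p:
            fp += 1          # non-coding, predicted coding
        else:
            fn += 1          # coding, predicted non-coding
    return tp, tn, fp, fn


def _ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def nucleotide_accuracy(tp, tn, fp, fn):
    """Return Sn, Sp, CC, SMC, ACP and AC as a dict (None where undefined)."""
    sn = _ratio(tp, tp + fn)
    sp = _ratio(tp, tp + fp)
    smc = _ratio(tp + tn, tp + tn + fp + fn)
    cc = _ratio(tp * tn - fn * fp,
                math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)))
    # Assumption: ACP is averaged over the conditional probabilities that are defined.
    terms = [sn, sp, _ratio(tn, tn + fp), _ratio(tn, tn + fn)]
    defined = [t for t in terms if t is not None]
    acp = sum(defined) / len(defined) if defined else None
    ac = (acp - 0.5) * 2 if acp is not None else None
    return {"Sn": sn, "Sp": sp, "CC": cc, "SMC": smc, "ACP": acp, "AC": ac}
```

For example, `nucleotide_accuracy(*nucleotide_counts(actual_mask, predicted_mask))` returns all six measures for one sequence, where both masks are per-base lists marking coding positions.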



  • EXON LEVEL ACCURACY:

    1. Sensitivity (ESn):

      $ESn = \frac{\text{number of correct exons}}{\text{number of actual exons}}$

    2. Specificity (ESp):

      $ESp = \frac{\text{number of correct exons}}{\text{number of predicted exons}}$

    3. Average (EAvg):

      $EAvg = \frac{ESn + ESp}{2}$

    4. Correct Exons (CR):

      Proportion of predicted exons with both ends correctly predicted.

    5. Partially Correct Exons (PC):

      Proportion of predicted exons for which only one end (either the 5' or the 3' end) is correct.

    6. Overlapping Exons (OL):

      Proportion of predicted exons with neither end correctly predicted that nevertheless overlap an actual exon.

    7. Missed Exons (ME):

      Proportion of actual exons which do not overlap any of the predicted exons.

    8. Wrong Exons (WE):

      Proportion of predicted exons which do not overlap any of the actual exons.
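
The exon-level categories can be illustrated by classifying each predicted exon against the set of actual exons. The following is a minimal Python sketch, not the server's implementation. It assumes exons are given as hypothetical `(start, end)` tuples with inclusive coordinates on a single strand, and it treats a predicted exon that shares any one boundary with an actual exon as partially correct.

```python
def overlaps(a, b):
    """True if two inclusive intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def _ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def exon_accuracy(actual, predicted):
    """Return the exon-level measures for lists of (start, end) exons."""
    actual_set, predicted_set = set(actual), set(predicted)
    counts = {"CR": 0, "PC": 0, "OL": 0, "WE": 0}
    for exon in predicted_set:
        start, end = exon
        if exon in actual_set:
            counts["CR"] += 1   # both ends correct
        elif any(start == s or end == e for s, e in actual_set):
            counts["PC"] += 1   # one end correct (assumed rule, see lead-in)
        elif any(overlaps(exon, a) for a in actual_set):
            counts["OL"] += 1   # no end correct, but overlaps an actual exon
        else:
            counts["WE"] += 1   # overlaps no actual exon
    missed = sum(1 for a in actual_set
                 if not any(overlaps(a, p) for p in predicted_set))

    n_actual, n_predicted = len(actual_set), len(predicted_set)
    esn = _ratio(counts["CR"], n_actual)
    esp = _ratio(counts["CR"], n_predicted)
    eavg = (esn + esp) / 2 if esn is not None and esp is not None else None
    result = {"ESn": esn, "ESp": esp, "EAvg": eavg,
              "ME": _ratio(missed, n_actual)}
    # CR, PC, OL and WE are reported as proportions of predicted exons.
    for key, value in counts.items():
        result[key] = _ratio(value, n_predicted)
    return result
```

With these definitions CR and ESp share the same numerator and denominator, so they coincide; they are kept as separate keys only to mirror the list above.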



  • PROTEIN LEVEL ACCURACY:

    1. Sensitivity (Psen):

      Ratio of the number of predicted protein-coding genes whose entire translated protein product exactly matches the actual protein product of the corresponding gene to the number of actual protein-coding genes.

    2. Specificity (Pspe):

      Ratio of the number of protein-coding genes correctly predicted by the program to the total number of predicted protein-coding genes.
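
The two protein-level ratios can be sketched in Python as follows. This is an illustrative example only: it assumes each predicted gene has already been paired with its actual gene through a shared, hypothetical gene identifier, and that protein sequences are compared as exact strings.

```python
def protein_accuracy(actual, predicted):
    """Return (Psen, Pspe) for dicts mapping gene id -> protein sequence.

    A predicted gene counts as correct only if its whole protein sequence
    is identical to the actual protein of the gene with the same id.
    """
    correct = sum(1 for gene, seq in predicted.items()
                  if actual.get(gene) == seq)
    psen = correct / len(actual) if actual else None
    pspe = correct / len(predicted) if predicted else None
    return psen, pspe
```

For instance, with three actual genes and two predictions of which one is an exact match, the function returns Psen = 1/3 and Pspe = 1/2.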

New Accuracy Measures:

Two recently introduced accuracy measures, q8 and q9, are also used here to evaluate the predictions. For details, see Zhang and Zhang, 2002.



  • q8:

    q8
  • q9:

    q9