The predictive performance of any method can be evaluated using threshold-independent or threshold-dependent parameters. Both types of parameters have their own advantages and drawbacks. Below are the descriptions and mathematical equations used to calculate these parameters.
A. The Threshold-dependent parameters
In a binary prediction model (e.g. presence/absence), such as a two-group discriminant analysis, there are two possible prediction errors: false positives (FP) and false negatives (FN). The performance of a binary prediction model is normally summarized in a confusion or error matrix that cross-tabulates the observed and predicted +/- patterns.
 | Actual + | Actual - |
Predicted + | a | b |
Predicted - | c | d |
The upper-left cell (a) contains the number of peptides that have been correctly predicted as binders (TP), while the lower-right cell (d) contains the number of correctly predicted non-binders (TN). The other two cells contain the number of peptides for which prediction and reality disagree: the number of binders predicted as non-binders (FN, cell c) and the number of non-binders predicted as binders (FP, cell b).
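As an illustration, the sketch below tallies these four counts from paired lists of observed and predicted labels. It is a minimal example in Python; the labels ("b" for binder, "n" for non-binder) and the sample data are invented for illustration.

```python
# Minimal sketch: tally the confusion-matrix cells a (TP), b (FP), c (FN), d (TN)
# from observed and predicted labels; labels and example data are hypothetical.
def confusion_matrix(actual, predicted, positive="b"):
    a = b = c = d = 0
    for y, p in zip(actual, predicted):
        if p == positive and y == positive:
            a += 1          # true positive
        elif p == positive and y != positive:
            b += 1          # false positive
        elif p != positive and y == positive:
            c += 1          # false negative
        else:
            d += 1          # true negative
    return a, b, c, d

actual    = ["b", "b", "n", "n", "b", "n"]
predicted = ["b", "n", "n", "b", "b", "n"]
print(confusion_matrix(actual, predicted))   # (2, 1, 1, 2)
```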
Several measures have been proposed to capture the information in a 2x2 table in a single scalar. The most widely used measures are Sensitivity (Sn), Specificity (Sp), Positive Probability Value (PPV) and Negative Probability Value (NPV).
These are defined as:
Sensitivity: Also known as the coverage of binders, sensitivity is the percentage of binders that are correctly predicted as binders. A higher sensitivity means that almost all of the potential binders will be included in the predicted results. However, at the same time some of the non-binders will also be predicted as binders; therefore, the coverage is increased at the cost of PPV.
Specificity: The specificity is the percentage of non-binders that are correctly predicted as non-binders. It is the counterpart of sensitivity: sensitivity is for binders, specificity for non-binders.
Positive Probability Value (PPV):
It is the probability that a predicted binder will actually be a binder. In other words, the PPV gives the confidence in the predicted results. A higher PPV means that there is a high chance that a predicted binder will actually be a binder. At the same time, at a higher PPV we may lose some potential binders and the sensitivity (coverage) may be lower.
Negative Probability Value (NPV):
It is the probability that a predicted non-binder will actually be a non-binder. It is analogous to PPV, but for non-binders.
Parameter | Brief description | Formulae |
Sensitivity (Sn) | The proportion of correctly predicted binders | (a/(a + c))*100 |
Specificity (Sp) | The proportion of correctly predicted non-binders | (d/(b + d))*100 |
Positive Probability Value (PPV) | The probability that a predicted binder will actually be a binder | (a/(a + b))*100 |
Negative Probability Value (NPV) | The probability that a predicted non-binder will actually be a non-binder | (d/(c + d))*100 |
Note that all these parameters are conditional probabilities. For example, if x denotes the actual state of a given peptide (b for binder and n for non-binder) and F(x) is the predicted state for that peptide, then Sn = P(F(x) = b | x = b) and Sp = P(F(x) = n | x = n). Therefore, we can have a high sensitivity with a low specificity (for instance, when every peptide is predicted as a binder) or vice versa. Hence, neither of these parameters alone constitutes a good measure of global performance.
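The four measures can be computed directly from the cells of the confusion matrix. The sketch below follows the formulae in the table above; the example counts are arbitrary.

```python
# Minimal sketch of Sn, Sp, PPV and NPV (as percentages) from the
# confusion-matrix cells a (TP), b (FP), c (FN), d (TN); counts are arbitrary.
def threshold_dependent_measures(a, b, c, d):
    sn  = 100.0 * a / (a + c)   # Sensitivity: coverage of binders
    sp  = 100.0 * d / (b + d)   # Specificity: coverage of non-binders
    ppv = 100.0 * a / (a + b)   # Positive Probability Value
    npv = 100.0 * d / (c + d)   # Negative Probability Value
    return sn, sp, ppv, npv

print(threshold_dependent_measures(40, 10, 20, 30))
# approximately (66.7, 75.0, 80.0, 60.0)
```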
Accuracy:
The term accuracy was defined to provide a single measure of performance. It is defined as the proportion of correctly predicted peptides, and it provides a single-valued approximation of the confusion matrix. Accuracy is a good measure of performance and is widely used in evaluations of prediction methods. However, it shows some bias when the data set is unbalanced, that is, when it contains unequal numbers of binders and non-binders. The value of accuracy will be higher for thresholds favoring the correct prediction of binders (if there are more binders than non-binders) or of non-binders (if there are more non-binders than binders).
Dfactor:
Because accuracy is calculated from the numbers of binders and non-binders, it is biased on unbalanced sets. The Dfactor is the sum of percent sensitivity and percent specificity. Because the Dfactor uses percentages rather than raw counts, it is relatively less affected by unbalanced data.
Correlation Coefficient (CC):
It is also known as the Matthews Correlation Coefficient (MCC). The CC is used more often in gene prediction than in epitope prediction. Although the CC has received different names, the formula given in the table below is simply the Pearson product-moment correlation coefficient in the particular case of two binary variables. The CC depends not only on sensitivity and specificity, but also on PPV and NPV. However, it has the undesirable property that it is not defined when either the prediction or the reality does not contain both binders and non-binders.
Parameter | Brief description | Formulae |
Accuracy | The proportion of correctly predicted peptides (both binders and non-binders) | ((a + d)/(a + b + c + d))*100 |
Dfactor | The sum of percent sensitivity and percent specificity | ((a/(a + c)) + (d/(b + d)))*100 |
Correlation Coefficient (CC) | The Pearson correlation between the observed and predicted binary classifications | (a*d - b*c)/sqrt((a + b)(a + c)(b + d)(c + d)) |
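A minimal sketch of these three summary measures, following the formulae in the table above, is given below; it returns None for the CC when one of the marginal totals is zero, reflecting the undefined case mentioned earlier. The example counts are arbitrary.

```python
import math

# Minimal sketch of Accuracy, Dfactor and the Correlation Coefficient (CC)
# from the confusion-matrix cells a (TP), b (FP), c (FN), d (TN).
def accuracy(a, b, c, d):
    return 100.0 * (a + d) / (a + b + c + d)

def dfactor(a, b, c, d):
    return 100.0 * (a / (a + c) + d / (b + d))

def correlation_coefficient(a, b, c, d):
    denom = math.sqrt((a + b) * (a + c) * (b + d) * (c + d))
    if denom == 0:           # prediction or reality lacks one of the classes
        return None
    return (a * d - b * c) / denom

a, b, c, d = 40, 10, 20, 30   # arbitrary example counts
print(accuracy(a, b, c, d))                  # 70.0
print(dfactor(a, b, c, d))                   # ~141.7
print(correlation_coefficient(a, b, c, d))   # ~0.41
```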
All of the measures described in this section depend on the values assigned
to a, b, c & d in the confusion matrix. These values are obtained by the
application of a threshold criterion to a continuous variable generated by
the classifier. Typically, the classifier generates a variable that has
values within the range 0 - 1 to which a 0.5 threshold is applied. Thus, a
continuous, or at least ordinal, variable is dichotomized. If the threshold
criterion is altered, the values in the confusion matrix will change.
Often, the raw scores are available so it is relatively easy to examine the
effect of changing the threshold. Even with techniques such as decision
trees, which appear to use dichotomous variables, the software will have
dichotomized a continuous variable.
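For illustration, the sketch below shows this dichotomization step on a handful of invented scores: everything at or above the threshold is called a binder.

```python
# Minimal sketch of dichotomizing a continuous classifier score; the
# 0.5 default threshold and the example scores are purely illustrative.
def dichotomize(scores, threshold=0.5):
    return ["b" if s >= threshold else "n" for s in scores]

scores = [0.91, 0.62, 0.48, 0.30, 0.75, 0.12]
print(dichotomize(scores))        # ['b', 'b', 'n', 'n', 'b', 'n']
print(dichotomize(scores, 0.7))   # ['b', 'n', 'n', 'n', 'b', 'n']
```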
There are a number of reasons why the threshold value may need to be examined. For example, unequal group sizes (prevalence) can influence the scores for many classifier methods. This is particularly true for logistic regression, which produces scores biased towards the larger group (Hosmer & Lemeshow 1989). Similarly, if we have decided that FN errors are more serious than FP errors, the threshold can be adjusted to decrease the FN rate at the expense of an increased FP error rate.
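The effect of such an adjustment can be seen by counting FP and FN errors over a small sweep of thresholds; the scores and labels below are invented for illustration.

```python
# Minimal sketch of how moving the threshold trades FN errors against FP
# errors; the scores and actual labels are hypothetical.
def errors_at_threshold(scores, actual, threshold):
    fp = sum(1 for s, y in zip(scores, actual) if s >= threshold and y == "n")
    fn = sum(1 for s, y in zip(scores, actual) if s <  threshold and y == "b")
    return fp, fn

scores = [0.91, 0.62, 0.48, 0.30, 0.75, 0.12]
actual = ["b", "n", "b", "n", "b", "n"]
for t in (0.3, 0.5, 0.7):
    print(t, errors_at_threshold(scores, actual, t))
# 0.3 -> (2, 0), 0.5 -> (1, 1), 0.7 -> (0, 1):
# a lower threshold gives fewer FN but more FP
```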
There is an alternative solution to threshold adjustments. This method
makes use of all of the information contained within the original
continuous variable and calculates threshold-independent measures.
B. The Threshold-independent parameters
One problem with the threshold-dependent measures is their failure to use all of the information provided by a classifier. Although dichotomous classifications are convenient for decision making, they can introduce distortions. The medical literature has recognized these problems and other measures have been introduced. In particular, the use of threshold-independent Receiver Operating Characteristic (ROC) plots has received considerable attention. (ROC plots are now included in SPSS v9.0.)
A ROC plot is obtained by plotting all sensitivity values (true positive fraction) on the y axis against their equivalent (1 - specificity) values (false positive fraction) on the x axis, for all available thresholds, as in the example shown below.
The area under the ROC curve (AUC) is usually taken to be an important index because it provides a single measure of overall accuracy that is not dependent upon a particular threshold. The value of the AUC lies between 0.5 and 1.0. If the value is 0.5, the scores for the two groups do not differ, while a value of 1.0 indicates no overlap in the distributions of the group scores. Typically, values of the AUC will not reach these limits. An AUC of 0.8 means that 80% of the time a random selection from the positive group will have a score greater than a random selection from the negative group.
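The probabilistic interpretation in the last sentence gives a simple way to compute the AUC without drawing the plot: compare every binder score with every non-binder score and count the fraction of pairs ranked correctly, counting ties as one half. The sketch below does exactly that; the scores are invented.

```python
# Minimal sketch of the AUC via its probabilistic interpretation: the chance
# that a randomly chosen binder scores higher than a randomly chosen
# non-binder (ties count 0.5); the example scores are hypothetical.
def auc(binder_scores, non_binder_scores):
    wins = 0.0
    for p in binder_scores:
        for n in non_binder_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(non_binder_scores))

binders     = [0.91, 0.75, 0.62, 0.48]
non_binders = [0.70, 0.55, 0.30, 0.12]
print(auc(binders, non_binders))   # 0.8125
```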
[Example ROC plot: sensitivity (true positive fraction) plotted against 1 - specificity (false positive fraction)]