MHC MOLECULES AND PREDICTION ALGORITHMS

MHC (major histocompatibility complex) molecules are cell surface glycoproteins that take an active part in host immune reactions. Their human counterparts are known as HLA (Human Leukocyte Antigens). MHC molecules were discovered through studies of transplant rejection, and their functions were unraveled by experiments in inbred and congenic mouse strains.
MHC genes are grouped into three classes. Class I and Class II gene products are directly involved in immune reactions, whereas Class III gene products play an indirect role.
Class I genes encode the principal subunit (the heavy chain) of MHC class I glycoproteins, called H-2K, H-2D and H-2L in mice and HLA-A, HLA-B and HLA-C in humans. These proteins are present on virtually all nucleated cells. Class I molecules consist of a heavy (alpha) chain of about 43 kDa non-covalently associated with a smaller polypeptide of about 12 kDa called beta-2 microglobulin. The largest part of the heavy chain is organized into three globular domains (alpha-1, alpha-2 and alpha-3) that protrude from the cell membrane, and a hydrophobic segment of the alpha chain anchors the molecule in the membrane. X-ray crystallography has provided a major leap forward in our understanding of MHC function. Both beta-2 microglobulin and the alpha-3 domain resemble the classical immunoglobulin fold. The alpha-1 and alpha-2 domains, however, form a strikingly unusual structure: two extended alpha helices lying above a floor of eight beta strands arranged in a beta-pleated sheet, which together form the peptide-binding groove. Class I molecules elicit an intense CD8+ T cell response and play a major role in graft rejection and in the clearance of infected cells.
Class II genes encode cell surface glycoproteins whose overall structure is very similar to that of MHC Class I molecules. These molecules are expressed mainly on antigen-presenting cells (APCs) such as macrophages, dendritic cells and B cells. Together with bound antigenic fragments, Class II proteins form the epitopes recognized by CD4+ T-helper cells. Because helper T cells are needed for most immune responses, MHC Class II proteins are critically involved in the response to nearly all antigens.
Class III genes encode three components of the complement cascade (C2, C4 and factor B, Bf) and two cytotoxic proteins (TNF and lymphotoxin). These proteins contribute, directly or indirectly, to diverse immune reactions.
Crystallographic and binding studies revealed that peptide ligands bound to Class I and Class II molecules adopt similar conformations. Class I molecules interact with the N- and C-termini of the bound peptide, leaving a bulge in the middle of the peptide. These terminal interactions, together with the closed peptide-binding groove, restrict the length of the bound peptide to 8-10 amino acids. The peptide-binding groove of Class II molecules, in contrast, is open at both ends and its interactions with the peptide are more diffuse, so a more variable length is allowed (generally 10-28 amino acids). The involvement of MHC Class II in the response to almost all antigens, together with the variable length of the bound peptides, makes Class II molecules a particularly interesting subject of study.
MHC molecules have been well characterized in terms of their role in immune reactions. They bind some of the peptide fragments generated by proteolytic cleavage of an antigen, and the resulting peptide-MHC complexes act as red flags that prompt antigen-specific T cells to mount an immune response against the parent antigen. A small fragment of an antigen can therefore induce an immune response against the whole antigen, a principle exploited in the design of subunit and synthetic peptide vaccines. The question that remained unanswered in this context was: "How can we identify the regions that bind to MHC and evoke a T cell response?"
The traditional approach is to scan the whole antigen sequence by synthesizing overlapping peptide fragments and assaying each one for immune reactivity. Although this approach is direct and reliable, it is time-consuming and expensive. A better alternative is to reduce the number of peptides that must be synthesized and tested, and this is where prediction methods come into play.
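The cost of the experimental scan follows from simple arithmetic. For a protein of L residues scanned with peptides of k residues, each starting s residues after the previous one, the number of peptides is floor((L - k) / s) + 1 (for L >= k). For example, a hypothetical 500-residue protein scanned with 15-mers offset by 5 residues (overlapping by 10) needs 98 peptides. The minimal Python sketch below generates the windows and counts them; the function names and parameters are illustrative, and the sketch ignores any C-terminal residues left uncovered when (L - k) is not a multiple of s.

```python
def overlapping_peptides(sequence: str, length: int, step: int) -> list[str]:
    """Return windows of `length` residues, starting every `step` residues."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


def peptide_count(protein_length: int, length: int, step: int) -> int:
    """Number of windows: floor((L - k) / s) + 1, or 0 if L < k."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if protein_length < length:
        return 0
    return (protein_length - length) // step + 1


if __name__ == "__main__":
    print(peptide_count(500, 15, 5))                 # 98
    print(overlapping_peptides("MKTAYIAKQR", 5, 3))  # ['MKTAY', 'AYIAK']
```

Every one of these peptides must be synthesized and assayed, which is why a predictor that discards most windows before synthesis saves so much effort.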
Two observations still complicate the development of an efficient prediction method: the same MHC molecule can bind a wide range of peptides, and MHC genes are highly polymorphic. We are therefore faced with many MHC alleles, each of which can bind a wide range of peptides. Researchers approached these problems by asking a simple question: "Is there a set of amino acids responsible for specific binding to MHC molecules?" The answer to this question gave rise to MHC binding peptide prediction methods.
Broadly, MHC binding peptide prediction methods fall into three main groups: a) motif-based methods, b) statistical/mathematical expression-based methods and c) structure-based methods.
Binding motifs describe position-specific patterns of recurring amino acids (anchor residues) that favor HLA-peptide binding. Prediction methods based on binding motifs are mostly all-or-none algorithms with high rates of false predictions, both false positives and false negatives.
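An all-or-none motif scan can be sketched as follows. The motif is a deliberately simplified version of the well-known HLA-A*02:01 anchor pattern (leucine or methionine at position 2, valine or leucine at position 9 of a 9-mer); real motifs tolerate additional residues and include secondary anchors, so treat `ANCHORS`, `matches_motif` and `scan_motif` as illustrative assumptions rather than a published rule set.

```python
# Simplified HLA-A*02:01-like anchor motif (illustrative only):
# 1-based position -> set of allowed amino acids.
ANCHORS = {2: set("LM"), 9: set("VL")}


def matches_motif(peptide: str, anchors=ANCHORS) -> bool:
    """All-or-none test: True only if every anchor position matches."""
    return all(
        pos <= len(peptide) and peptide[pos - 1] in allowed
        for pos, allowed in anchors.items()
    )


def scan_motif(sequence: str, length: int = 9,
               anchors=ANCHORS) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for every window that fits the motif."""
    hits = []
    for i in range(len(sequence) - length + 1):
        peptide = sequence[i:i + length]
        if matches_motif(peptide, anchors):
            hits.append((i + 1, peptide))
    return hits


if __name__ == "__main__":
    print(scan_motif("AASLYNTVATLAA"))  # [(3, 'SLYNTVATL')]
    print(matches_motif("GILGFVFTL"))   # False: I at P2 is not allowed here
```

Because the test is strict, the well-characterized HLA-A2-restricted influenza matrix epitope GILGFVFTL, which carries isoleucine at position 2, is rejected: a concrete instance of the false negatives that all-or-none rules produce.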
Statistical/mathematical expression-based methods include quantitative matrix and neural network methods. Quantitative matrices provide a linear model that is easy to implement, and their predictive accuracy is reasonably good. Neural networks, on the other hand, are more complex, nonlinear, self-learning systems. Their predictive accuracy is higher, but they require large amounts of training data; because such data are scarce for many MHC alleles, quantitative matrix methods remain well suited to MHC binding peptide prediction.
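The linear quantitative matrix model can be sketched in a few lines of Python. This sketch assumes a position-specific log-odds matrix derived from aligned, equal-length binders with a pseudocount and a uniform background of 1/20 per amino acid; that is one common way to build such a matrix and is an assumption here, since many published matrices are derived from measured binding data instead. The names `build_matrix` and `score` and the toy peptides are hypothetical. A peptide's score is simply the sum of its per-position values, which is what makes the model linear.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(binders, pseudocount=1.0):
    """Position-specific log-odds matrix from aligned binders of equal length.

    Assumes a uniform background frequency of 1/20 per amino acid and that
    binders contain only the 20 standard residues.
    """
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all binders must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score(peptide, matrix):
    """Linear (additive) model: sum of the per-position scores."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match matrix length")
    return sum(column[aa] for column, aa in zip(matrix, peptide))


if __name__ == "__main__":
    # Toy aligned 9-mers, invented for illustration only.
    binders = ["ALAAAAAAV", "LLAAAAAAV", "MLAAAAAAL"]
    matrix = build_matrix(binders)
    for pep in ["ALAAAAAAV", "AGAAAAAAG"]:
        print(pep, round(score(pep, matrix), 2))
```

A peptide would then be predicted to bind if its score exceeds a chosen cutoff; the cutoff trades sensitivity against specificity and would need to be calibrated on known binders and non-binders.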
Structure-based methods are logically very sound but computationally demanding. They calculate the binding energy of the peptide-MHC complex, and peptides that form energetically favorable complexes are predicted as binders. These methods are still at an early stage of development.
None of the approaches above deals effectively with MHC polymorphism: for each allele, a separate matrix has to be generated or a separate set of rules applied. Recently, Sturniolo et al. (1999) offered a solution based on virtual matrices, which holds promise for better MHC binding peptide prediction methods.
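The virtual matrix idea of Sturniolo et al. (developed for HLA class II) rests on the observation that the amino acid preferences of each binding pocket depend largely on the polymorphic residues lining that pocket. A minimal sketch of the idea: a shared library of pocket profiles is combined according to which pocket variant each allele carries, so a new allele needs only its pocket description rather than a new experimental matrix. All pocket names, variants, allele labels and scores below are hypothetical; published virtual matrices are experimentally derived and cover more pockets.

```python
# Hypothetical shared library: (pocket, variant) -> per-amino-acid scores.
POCKET_PROFILES = {
    ("P1", "v1"): {"F": 1.0, "Y": 1.0, "W": 1.0, "L": 0.5},
    ("P1", "v2"): {"V": 1.0, "I": 0.8, "L": 0.8},
    ("P4", "v1"): {"L": 0.9, "M": 0.7},
    ("P4", "v2"): {"D": 0.6, "E": 0.6},
}

# Hypothetical alleles, each described only by its pocket variants.
ALLELE_POCKETS = {
    "ALLELE_A": {"P1": "v1", "P4": "v2"},
    "ALLELE_B": {"P1": "v2", "P4": "v1"},
}

# 0-based position of each pocket within a 9-residue binding core.
POCKET_POSITION = {"P1": 0, "P4": 3}


def virtual_matrix(allele: str) -> dict[int, dict[str, float]]:
    """Assemble an allele's matrix from the shared pocket profiles."""
    return {
        POCKET_POSITION[pocket]: POCKET_PROFILES[(pocket, variant)]
        for pocket, variant in ALLELE_POCKETS[allele].items()
    }


def score_core(core: str, matrix: dict[int, dict[str, float]]) -> float:
    """Additive score of a 9-residue core; unlisted residues score 0."""
    if len(core) != 9:
        raise ValueError("binding core must be 9 residues")
    return sum(profile.get(core[pos], 0.0) for pos, profile in matrix.items())


if __name__ == "__main__":
    for allele in ALLELE_POCKETS:
        print(allele, score_core("FAALAAAAA", virtual_matrix(allele)))
    # ALLELE_A 1.0
    # ALLELE_B 0.9
```

The same core scores differently for the two alleles only because they carry different pocket variants; the profiles themselves are reused unchanged.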