Phylogenetic Prediction
From Indipedia: India's Wikipedia at OSDD
Introduction
One of the important areas of sequence analysis is the phylogenetic analysis of nucleic acid and protein sequences. It helps to analyse the changes that have occurred during the evolution of different organisms, to study the evolution of a family of sequences, and to follow the changes occurring in a rapidly changing species, such as a virus. On the basis of this analysis, the most closely related sequences can be identified, since they occupy neighbouring branches on a tree. When a gene family is found in an organism or group of organisms, phylogenetic relationships among the genes can help to predict which ones might have an equivalent function. These functional predictions can then be validated by genetic experiments. Moreover, analysis of the types of changes within a population can reveal whether or not a particular gene is under selection. Thus, phylogenetics is useful for finding the evolutionary ties between organisms, for understanding the relationship between an ancestral sequence and its descendants, and for estimating the time of divergence between organisms that share a common ancestor.
Relationship of Phylogenetic analysis to Sequence alignment
- When nucleotide or protein sequences from two different organisms are similar, they are likely to have been derived from a common ancestral sequence. A sequence alignment reveals the positions at which the sequences have diverged from, or remained conserved relative to, that ancestral sequence. When the alignment makes us reasonably certain that two sequences share an evolutionary relationship, the sequences are said to be homologous.
- Progressive multiple alignment of a group of sequences first aligns the most similar pair and then adds progressively more distant sequences. The alignment is thus influenced most strongly by the most similar sequences and may not correctly represent the evolutionary history of the sequences.
- It is easier to trace the evolution of sequences that are strongly similar to each other. If the sequences are distantly related, their alignment requires gaps, which represent insertions, deletions or rearrangements during evolution. Gaps are treated differently by different phylogenetic programs; in fact, some programs ignore gaps altogether, while others use a scoring system to handle them. Very often a similarity score with penalties for gaps is used. These scores are then converted to distance scores that are suitable for phylogenetic analysis.
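The conversion from an alignment to distance scores can be illustrated with a minimal sketch. It assumes the simplest gap treatment mentioned above (columns containing a gap are ignored) and uses the Jukes-Cantor correction, which is one common choice and not necessarily the one a given program applies. The function names `p_distance`, `jukes_cantor` and `distance_matrix`, and the example sequences, are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are simply ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned):
    """Map every ordered pair of sequence names to its corrected distance."""
    names = list(aligned)
    return {
        (x, y): jukes_cantor(p_distance(aligned[x], aligned[y]))
        for x in names
        for y in names
    }


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "seq1": "ACGTACGT-A",
    "seq2": "ACGTACGA-A",
    "seq3": "ACCTA-GATA",
}
for (x, y), d in sorted(distance_matrix(aligned).items()):
    if x < y:
        print(x, y, round(d, 4))
```

The corrected distance is always at least as large as the raw proportion of differences, because it tries to account for multiple substitutions at the same site.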
Points to consider in Phylogenetic analysis
The following points need to be kept in mind in order to make a reliable phylogenetic analysis:
- Genome complexity needs to be considered during phylogenetic analysis.
- The evolutionary history of one gene may not coincide with the evolutionary history of another.
- Molecules that carry a great deal of evolutionary history in their interspecies sequence variation, such as mtDNA and rRNA, should be used.
- It is assumed that the same sequence positions evolve at the same rate in different genomes, but this assumption becomes problematic when distantly related sequences are analysed.
- Gene duplication events that create tandem copies of a gene in a genome should also be considered during phylogenetic analysis.
Methods for Phylogenetic Prediction
There are three main methods for phylogenetic prediction:
Maximum Parsimony
Distance Method
Maximum Likelihood
A flowchart describing which method to use under what situation is shown in figure 1.
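To give a flavour of the first of these methods, the sketch below counts the minimum number of state changes a given tree requires, using Fitch's small-parsimony algorithm. It is an illustrative sketch under stated assumptions, not the procedure of any particular program: the tree is a hypothetical nested tuple of leaf names, a gap character is counted as one more state, and the names `fitch_site` and `parsimony_score` are invented for this example. A maximum parsimony search would compare such scores across candidate trees and prefer the lowest.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its character in this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-column Fitch costs over an alignment (dict of name -> sequence)."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical four-taxon alignment and two candidate trees.
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment))  # 2
print(parsimony_score(tree_2, alignment))  # 4
```

Here the first tree groups the identical sequences together and needs fewer changes, so it would be preferred under the parsimony criterion.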
Phylogenetic analysis programs
PHYLIP and PAUP are the two most widely used programs for phylogenetic analysis. PHYLIP stands for "Phylogeny Inference Package" and PAUP for "Phylogenetic Analysis Using Parsimony". Both programs offer the same methods for phylogenetic analysis, viz. parsimony, distance and maximum likelihood methods.
- The details of the programs used by the PHYLIP package are:
1. Parsimony methods
- Nucleotide sequences
- DNAPARS: treats gaps as a fifth nucleotide state
- DNAPENNY: finds all most parsimonious phylogenies using a branch-and-bound search
- DNACOMP: finds the tree that is supported by the largest number of sites (compatibility method); it is used when the rate of evolution varies among sites
- DNAMOVE: performs parsimony and compatibility analysis interactively
- Protein sequences
- PROTPARS
2. Distance Methods
- DNADIST: calculates a distance matrix for nucleic acid sequences.
- PROTDIST: calculates a distance matrix for protein sequences.
These distance matrices are then used as input to the following distance analysis programs:
- FITCH: estimates a phylogenetic tree using the Fitch-Margoliash algorithm without assuming a molecular clock.
- KITSCH: estimates a phylogenetic tree using the Fitch-Margoliash algorithm assuming a molecular clock.
- NEIGHBOR: estimates a phylogeny using the neighbour-joining or UPGMA method.
3. Maximum Likelihood programs
- DNAML: estimates a phylogeny using the maximum likelihood (ML) method.
- DNAMLK: same as DNAML, but assuming a molecular clock.
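
As an illustration of how a distance matrix can be turned into a tree, the following is a minimal sketch of UPGMA clustering, one of the two methods named for NEIGHBOR above. It is not PHYLIP's implementation: the function name `upgma`, the input format (a dict keyed by `frozenset` pairs of names) and the output format (nested tuples of `(left, right, height)`) are assumptions made for this example. UPGMA assumes a molecular clock, so each merged node is placed at half the distance between the two clusters it joins.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names -- list of taxon labels
    dist  -- dict mapping frozenset({a, b}) to the distance between a and b
    Returns a nested tuple (left, right, height); leaves are plain labels.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves it holds
    d = {frozenset(pair): dist[frozenset(pair)] for pair in combinations(names, 2)}
    while len(sizes) > 1:
        closest = min(d, key=d.get)        # pair of clusters with smallest distance
        a, b = sorted(closest, key=str)    # fixed order so the output is reproducible
        merged = (a, b, d[closest] / 2.0)  # node height is half the pair distance
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average of the distances from the two merged clusters.
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((merged, other))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        del d[closest]
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical distances between three taxa.
names = ["A", "B", "C"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
print(upgma(names, dist))  # (('A', 'B', 1.0), 'C', 3.0)
```

Neighbour joining differs in that it does not assume equal rates along all lineages, so it is the usual choice when a molecular clock cannot be assumed.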

