What is Molecular Phylogenetics
A set of techniques that enable the evolutionary relationships between DNA sequences to be inferred by making comparisons between those sequences.
Molecular phylogenetics predates DNA sequencing by several decades.
It is derived from the traditional method for classifying organisms according to their similarities and differences.
Linnaeus, in the 18th century, placed all known organisms into a logical classification.
The tree of life
Why is molecular phylogenetics more important than other types of phylogenetic information?
When molecular data are used, a single experiment can provide information on many different characters.
Molecular character states are unambiguous: A, C, G and T are easily recognizable and one cannot be confused with another.
Molecular data are easily converted to numerical form and hence are amenable to mathematical and statistical analysis.
Immunological data
Obtained by Nuttall (1904).
These involve measurements of the amount of cross-reactivity seen when an antibody specific for a protein from one organism is mixed with the same protein from a different organism.
Protein electrophoresis
Used to compare the electrophoretic properties, and hence degree of similarity, of proteins from different organisms.
This technique has proved useful for comparing closely related species and variations between members of a single species
DNA-DNA hybridization
Data are obtained by hybridizing DNA samples from the two organisms being compared.
The DNA samples are denatured and mixed together so that hybrid molecules form.
The stability of these hybrid molecules depends on the degree of similarity between the nucleotide sequences of the two DNAs, and is measured by determining the melting temperature, a stable hybrid having a higher melting temperature than a less stable one.
DNA yields more phylogenetic information than protein. The two DNA sequences differ at three positions but the amino acid sequences differ at only one position.
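The point about DNA carrying more information than protein can be illustrated with a small sketch. The two sequences below are invented for this example (they are not the ones from the original figure), and the codon table is deliberately partial, covering only the codons that appear in them.

```python
# Partial codon table (standard genetic code), limited to the codons used below.
CODON_TABLE = {
    "GGA": "G", "GGT": "G",   # glycine
    "GCC": "A", "GCT": "A",   # alanine
    "ATA": "I", "ATG": "M",   # isoleucine, methionine
    "TTA": "L",               # leucine
}


def translate(dna):
    """Translate a DNA string codon by codon using the partial table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences, invented for illustration only.
dna_1 = "GGAGCCATATTA"
dna_2 = "GGTGCTATGTTA"

dna_diffs = count_differences(dna_1, dna_2)
protein_diffs = count_differences(translate(dna_1), translate(dna_2))
print(dna_diffs, protein_diffs)  # prints "3 1"
```

Two of the three nucleotide changes are synonymous, so they are invisible at the protein level; only the ATA to ATG change alters an amino acid.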
A phylogenetic tree is a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes.
The nodes represent the taxonomic units and the branches define the relationships among the units in terms of descent and ancestry.
The branching pattern of a tree is called the topology.
The branch length usually represents the number of changes that have occurred in that branch.
The taxonomic units represented by the nodes can be species, populations, individuals or genes.
Phylogenetic trees can be either rooted or unrooted.
In a rooted tree there exists a particular node, called the root, from which a unique path leads to any other node.
An unrooted tree illustrates the relatedness of the leaf nodes without making any assumptions about ancestry.
Most common approaches:
Comparison of homologous sequences for genes using sequence alignment techniques to identify similarity.
DNA barcoding, wherein the species of an individual organism is identified using small sections of mitochondrial DNA.
RECONSTRUCTION OF DNA-BASED PHYLOGENETIC TREES
Talha Bin Rahat
Key features
• The external nodes represent the genes being compared.
• The internal nodes represent the ancestral genes.
• The length of the branches indicates the degree of difference between the genes represented by the nodes.
• Unrooted trees represent only the relationships between the genes, not the series of evolutionary events.
• Rooted trees show the evolutionary relationships as well, and require at least one outgroup.
An outgroup is a homologous gene that is related to all the genes under study, but less closely than those genes are related to each other.
It is needed to obtain the correct evolutionary pathway of the genes and to identify the root.
The tree we obtain after analysis is known as an inferred tree. It may be the same as the true tree, but not necessarily.
As an example, an arbitrary gene is analyzed in human, chimpanzee, gorilla and orangutan, with baboon as the outgroup. Baboon is chosen as the outgroup because we know from fossil analysis that it diverged from the ancestors of the other four primates before those four diverged from one another.
Gene tree and species tree
A gene tree is assumed to be a more accurate and less ambiguous representation of the species tree than that obtainable by morphological comparisons.
This assumption is often correct, but the two trees are not the same, because their internal nodes are not precisely equivalent.
An internal node in a gene tree represents the divergence of an ancestral gene into two genes with different DNA sequences. This occurs by mutation.
An internal node in a species tree represents a speciation event. This occurs when the population of the ancestral species splits into two groups that are unable to interbreed, for example because they are geographically isolated.
When a molecular clock is used and the nodes are ancient, the difference between the time of the mutation and the time of the speciation is negligible, but for recently diverged species it is not.
The branching order can differ between the two trees, e.g. when a speciation event is quickly followed by another speciation.
Tree Reconstruction
Four steps
1. Align the sequences
2. Reconstruct the tree
3. Assess the accuracy
4. Date the events
Alignment
The sequences must be homologous. If they are not, we will still get a tree, but it will have no real evolutionary meaning.
For homologous sequences the main problem is insertions and deletions, known as indels.
If indels are not placed correctly in the multiple alignment, the analysis will not be correct.
Alignment techniques
The dot matrix is used for pairs of sequences being aligned. The diagonal represents the correct alignment. A point mutation appears as a break in the diagonal, and an indel as a shift to another diagonal. (figure at right)
The similarity approach aligns the sequences by maximizing the number of matching nucleotides.
The distance method aligns them by minimizing the number of mismatches.
Computer software is now used for alignment.
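The dot matrix idea can be sketched in a few lines. This is only an illustration: the function names and the short sequences are invented here, and real dot-plot programs usually compare windows of several nucleotides to suppress chance matches.

```python
def dot_matrix(seq_a, seq_b):
    """Return one string per position of seq_a: '*' where it matches seq_b, '.' elsewhere."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def diagonal_matches(seq_a, seq_b, offset):
    """Count matches along the diagonal where position i of seq_a meets i + offset of seq_b."""
    return sum(
        1
        for i, a in enumerate(seq_a)
        if 0 <= i + offset < len(seq_b) and a == seq_b[i + offset]
    )


# Hypothetical pair differing by one point mutation (position 4: T versus A).
point_rows = dot_matrix("ACGTAC", "ACGAAC")
for row in point_rows:
    print(row)

# Hypothetical pair where the second sequence has lost the T at position 4.
full, deleted = "ACGTACGT", "ACGACGT"
print(diagonal_matches(full, deleted, 0))   # matches before the deletion
print(diagonal_matches(full, deleted, -1))  # matches after the deletion
```

In the first pair the main diagonal is broken at a single cell. In the second pair the run of matches leaves the main diagonal and continues on the neighbouring one, which is how an indel shows up.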
Tree Reconstruction
The data obtained are first converted to numerical form, which is then processed mathematically.
Distance matrix, the simplest approach, is a table showing the evolutionary distances between all pairs of sequences in the dataset.
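As a rough sketch, a distance matrix can be built from an alignment by counting the proportion of differing sites for every pair (the uncorrected "p-distance"). The sequences and function names below are invented for illustration, and real analyses usually apply a substitution model to correct for multiple changes at the same site.

```python
def p_distance(a, b):
    """Proportion of compared sites that differ; columns with a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment):
    """Return a dict of dicts holding the p-distance for every pair of sequences."""
    names = list(alignment)
    return {
        n1: {n2: p_distance(alignment[n1], alignment[n2]) for n2 in names}
        for n1 in names
    }


# Hypothetical aligned sequences, invented for illustration only.
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTC",
    "seq3": "ACGAACGTTC",
}

toy_matrix = distance_matrix(toy_alignment)
for name, row in toy_matrix.items():
    print(name, [round(row[other], 2) for other in toy_alignment])
```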
Neighbor-joining technique for tree reconstruction
It uses the distance matrix for mathematical analysis.
1. Start with a star-shaped tree that has only one internal node.
2. A pair of sequences is removed from the star and attached to a second internal node, which is connected by a branch to the center of the star.
3. The total branch length of this new 'tree' is calculated.
4. The sequences are then returned to their original positions.
5. The total branch length is calculated for all the possible pairs by the same method.
6. The pair of sequences that gives the tree with the shortest total branch length will be neighbors in the final tree.
7. They are combined into a single unit, creating a new star with one branch fewer than the original one.
8. The whole process is repeated so that a second pair of neighboring sequences is identified, and so on.
9. The result is a complete reconstructed tree.
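The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the widely used Q-criterion to pick each pair of neighbors, instead of literally detaching every pair from the star and summing branch lengths. The function name, the `node0`-style labels for internal nodes and the four-sequence distance table are all invented for this example.

```python
def neighbor_joining(names, dist):
    """Return the edges (node, node, branch length) of an unrooted neighbor-joining tree.

    names: list of sequence labels; dist: dict of dicts of pairwise distances.
    """
    d = {a: dict(dist[a]) for a in names}   # working copy of the distance table
    active = list(names)                     # tips still attached to the star
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        totals = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Choose the pair with the smallest Q value (assumed selection criterion).
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = f"node{next_id}"
        next_id += 1
        # Branch lengths from the two neighbors to their new internal node.
        len_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the combined unit to every remaining tip.
        d[new] = {}
        for c in active:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [new]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical distances, consistent with a tree in which A and B are neighbors.
pairs = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
         ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
example_dist = {name: {} for name in "ABCD"}
for (x, y), value in pairs.items():
    example_dist[x][y] = example_dist[y][x] = value

nj_edges = neighbor_joining(list("ABCD"), example_dist)
for edge in nj_edges:
    print(edge)
```

With these made-up distances the sketch groups A with B and C with D, joined by a single internal branch.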
Maximum parsimony for tree reconstruction.
It is based on a simple assumption that evolution follows the shortest possible route and that the correct phylogenetic tree is therefore the one that requires the minimum number of nucleotide changes to produce the observed differences between the sequences.
However, large datasets are difficult to handle. The number of possible trees is so large that, for just 50 sequences, it is impossible to examine every possible unrooted tree even with the fastest computers.
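Both halves of this argument can be sketched in code. The first function counts unrooted bifurcating trees using the standard formula (2n - 5)!!. The second scores one alignment column on one candidate tree with Fitch's algorithm, used here as an assumed way of counting the minimum number of changes. The nested-tuple tree encoding and the example column are invented for illustration.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees for n_taxa sequences: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three sequences")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # product of the odd numbers 3 .. 2n - 5
        count *= k
    return count


def fitch_changes(tree, states):
    """Return (candidate ancestral states, minimum changes) for one alignment column.

    tree: a leaf name, or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its nucleotide in this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_changes(tree[0], states)
    right_set, right_cost = fitch_changes(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


print(count_unrooted_trees(4), count_unrooted_trees(10))
print(len(str(count_unrooted_trees(50))), "digits for 50 sequences")

# Hypothetical alignment column: a and b carry A, c and d carry G.
column = {"a": "A", "b": "A", "c": "G", "d": "G"}
_, changes_ab_cd = fitch_changes((("a", "b"), ("c", "d")), column)
_, changes_ac_bd = fitch_changes((("a", "c"), ("b", "d")), column)
print(changes_ab_cd, changes_ac_bd)
```

For this column the tree that groups a with b needs one change and the alternative needs two, so parsimony prefers the first. Repeating that comparison over every column and every possible tree is what becomes infeasible as the number of sequences grows.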
Assessment of the tree
Bootstrap analysis is commonly used. It builds new alignments by sampling columns at random from the real alignment, and a tree is reconstructed from each new alignment.
In practice, 1000 new alignments are created so 1000 replicate trees are reconstructed.
Each internal node in the original tree is then given a value: the number of times that the branching pattern seen at that node was reproduced in the replicate trees.
If the bootstrap value is greater than 700/1000 then we can assign a reasonable degree of confidence to the topology at that particular internal node.
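A minimal sketch of the bootstrap procedure follows. The column resampling is the part that matters. The tree builder is a deliberately simple stand-in that works only for exactly four sequences (it picks the pairing with the smallest summed within-pair mismatches), and it is an assumption made here to keep the example short. The sequence names and data are invented.

```python
import random


def resample_columns(alignment, rng):
    """Build a pseudo-alignment by drawing columns at random, with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def mismatches(a, b):
    """Count differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def four_taxon_topology(alignment):
    """Toy tree builder (assumption): choose the best of the three four-taxon pairings."""
    a, b, c, d = sorted(alignment)
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(pairing):
        (p, q), (r, s) = pairing
        return mismatches(alignment[p], alignment[q]) + mismatches(alignment[r], alignment[s])

    return min(pairings, key=cost)


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Return the original topology and how many replicate trees reproduce it."""
    rng = random.Random(seed)
    original = four_taxon_topology(alignment)
    hits = sum(
        four_taxon_topology(resample_columns(alignment, rng)) == original
        for _ in range(replicates)
    )
    return original, hits


# Hypothetical alignment with an unambiguous signal: s1 = s2 and s3 = s4.
toy = {"s1": "AAAAAAAA", "s2": "AAAAAAAA", "s3": "CCCCCCCC", "s4": "CCCCCCCC"}
topology, support = bootstrap_support(toy, replicates=200)
print(topology, f"{support}/200")
```

With real data the signal is noisier, so some replicates recover a different grouping and the support value falls below the number of replicates.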
Assigning Dates to the nodes
We make use of the molecular clock hypothesis, which states that nucleotide substitutions (or amino acid substitutions if protein sequences are being compared) occur at a constant rate.
The degree of difference between two homologous sequences is therefore related to the time elapsed since their common ancestor. However, we need to calibrate the clock.
Calibration is usually achieved by reference to the fossil record.
The calibration differs between organisms, and even between genes in the same organism, because:
Non-synonymous mutations occur at a slower rate than synonymous mutations.
The mitochondrial genome lacks many of the DNA repair systems, so its clock runs faster than that of the nuclear genome.
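The arithmetic behind calibration and dating can be sketched as follows. The sketch assumes a strict clock and that the distance between two sequences accumulates along both lineages, so distance = 2 × rate × time. The function names and all the numbers are invented for illustration and are not measurements.

```python
def calibrate_rate(distance, divergence_time):
    """Substitution rate per site per unit time per lineage, from a dated split."""
    return distance / (2 * divergence_time)


def estimate_divergence_time(distance, rate):
    """Time since the common ancestor of two sequences under a strict clock."""
    return distance / (2 * rate)


# Made-up calibration: two sequences differ at 6% of sites, and fossils
# place their split 30 million years ago.
rate = calibrate_rate(distance=0.06, divergence_time=30.0)

# Made-up query: another pair of sequences from the same gene differs at 1% of sites.
age = estimate_divergence_time(distance=0.01, rate=rate)
print(rate, age)  # rate per site per million years, age in million years
```

Because the rate depends on the gene and the genome, a rate calibrated on one locus should not be reused for another without checking.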
THE APPLICATIONS OF MOLECULAR PHYLOGENETICS
Examples of the use of phylogenetic trees
1. DNA phylogenetics has clarified the evolutionary relationships between humans and other primates
Darwin was the first biologist to speculate on the evolutionary relationships between humans and other primates.
He proposed that humans are closely related to the chimpanzee, gorilla and orangutan.
This was controversial because biologists favoured an anthropocentric view of the human place in the animal world.
Studies of fossils concluded (prior to 1960) that chimpanzees and gorillas are our closest relatives but that the relationship was distant, the split having occurred some 15 million years ago.
Immunological studies in the 1960s confirmed that humans, chimpanzees and gorillas do indeed form a single clade.
They also indicated that the relationship is much closer, a molecular clock placing the split only 5 million years ago.
• The debate was 'won' by the molecular biologists, so it is now accepted that the split occurred about 5 million years ago.
(A) Comparisons of the mitochondrial genomes of the three species by restriction mapping and DNA sequencing suggested that the chimpanzee and gorilla are more closely related to each other than either is to humans.
(B) DNA-DNA hybridization data supported a closer relationship between humans and chimpanzees.
Reason for conflicting results
There is a close similarity between the DNA sequences of the three species, the differences being less than 3% for even the most divergent regions of the genomes. This makes it difficult to establish relationships unambiguously.
(C) Comparison of genes (sequences of variable loci such as pseudogenes and non-coding sequences)
The chimpanzee is the closest relative to humans, with our lineages diverging 4.6–5.0 million years ago.
The gorilla is a slightly more distant cousin, its lineage having diverged from the human-chimp one between 0.3 and 2.8 million years earlier.
2. The origins of AIDS
AIDS is caused by human immunodeficiency virus 1 (HIV-1), a retrovirus that infects cells involved in the immune response.
Similar immunodeficiency viruses are present in primates such as the chimpanzee, sooty mangabey, mandrill and various monkeys.
These simian immunodeficiency viruses (SIVs) are not pathogenic in their normal hosts.
But if one had been transferred to humans, then within this new species the virus might have acquired new properties, such as the ability to cause disease and to spread rapidly.
Retrovirus genomes accumulate mutations relatively quickly because reverse transcriptase, the enzyme that copies the viral RNA genome into DNA, lacks an efficient proofreading activity.
The phylogenetic tree reconstructed from HIV and SIV genome sequences
RNA from the different viruses was converted into DNA and amplified to obtain enough nucleotide sequence for comparison.
The closest relative to HIV-1 among primates is the SIV of chimpanzees.
SIV from the sooty mangabey clusters in the tree with the second human immunodeficiency virus, HIV-2.
The ZR59 sequence represents one of the earliest versions of HIV-1.
MOLECULAR PHYLOGENETICS AS A TOOL IN THE STUDY OF HUMAN PREHISTORY
Molecular phylogenetics can be used in intraspecific studies: the study of the evolutionary history of members of the same species.
Molecular phylogenetics is being used to deduce the origins of modern humans and the geographic patterns of their recent migrations in the Old and New Worlds.
Intraspecific studies require highly variable genetic loci:
In molecular phylogenetics applications, the genes chosen for analysis must display variability in the organisms being studied.
If there is no variability then there is no phylogenetic information.
However, in intraspecific studies the organisms being compared are all members of the same species and so share a great deal of genetic similarity, even if the species has split into populations.
Hence the DNA sequences used in phylogenetic analysis must be the most variable ones that are available.
There are three main possibilities in humans:
Multiallelic genes, such as members of the HLA family, which exist in many different sequence forms;
Microsatellites, which evolve not through mutation but by replication slippage. Cells do not appear to have any repair mechanism for replication slippage, so new microsatellite alleles are generated relatively frequently.
Mitochondrial DNA which accumulates nucleotide substitutions relatively rapidly because mitochondria lack many of the repair systems that slow down the molecular clock in the human nucleus. The mitochondrial DNA variants present in a single species are called haplotypes.
The fact that different alleles or haplotypes of these loci coexist in the population as a whole is critical to their application in molecular phylogenetics.
The loci are therefore polymorphic, and information regarding the relationships between different individuals can be obtained by comparing the combinations of alleles and/or haplotypes that those individuals possess.
The origins of modern humans
It seems reasonably certain that the origin of humans lies in Africa because it is here that all of the oldest pre-human fossils have been found.
The paleontological evidence reveals that hominids first moved outside of Africa over 1 million years ago and became geographically dispersed, eventually spreading to all parts of the Old World.
The events that followed the dispersal of Homo erectus are controversial.
1. The multiregional hypothesis states that Homo erectus left Africa over 1 million years ago and then evolved into modern humans in different parts of the Old World.
2. The Out of Africa hypothesis states that the populations of Homo erectus in the Old World were displaced by new populations of modern humans that followed them out of Africa.
A phylogenetic tree reconstructed from mitochondrial RFLP data obtained from 147 humans representing populations from all parts of the World confirmed that the ancestors of modern humans lived in Africa.
Applying the mitochondrial molecular clock to the tree showed that the ancestral mitochondrial DNA, the one from which all modern mitochondrial DNAs are descended, existed between 140 000 and 290 000 years ago.
The tree showed that this mitochondrial genome was located in Africa, so the person who possessed it must have been African.
This discovery gave rise to the Out of Africa Hypothesis.
As a challenge to the new hypothesis, the RFLP data were re-examined by other molecular phylogeneticists, and several quite different trees could be reconstructed from them, some of which did not have a root in Africa.
Countering this, more detailed mitochondrial DNA sequence datasets were obtained, most of which are compatible with a relatively recent African origin and so support the Out of Africa hypothesis rather than multiregional evolution.
Further complementing this, studies of the Y chromosome suggest that the person possessing the ancestral Y chromosome also lived in Africa some 200 000 years ago.
However, complications have arisen from studies of nuclear genes other than those on the Y chromosome. For example, β-globin sequences give a much earlier date, 800 000 years ago, for the common ancestor, and studies of an X chromosome gene, PDHA1, place the ancestral sequence at 1 900 000 years ago.
• Molecular anthropologists are currently debating the significance of these results.
The patterns of more recent migrations into Europe are also controversial
The question centers on the process by which agriculture spread into Europe.
The simplest explanation, the Wave of Advance Model, suggests that farmers migrated from one part of Europe to another, taking with them their animals and crops and displacing the indigenous pre-agricultural communities present in Europe at that time.
Large-scale phylogenetic analysis of the allele frequencies for 95 nuclear genes in populations from across Europe was carried out using principal component analysis, a statistical procedure that identifies patterns in a dataset corresponding to the uneven geographic distribution of alleles. Such uneven distributions may indicate past population migrations.
The most striking pattern within the European dataset is a gradation of allele frequencies across Europe, implying that a migration of people occurred from the Middle East to northeast Europe and hence supporting the Wave of Advance Model.
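How a principal component can expose a gradient in allele frequencies is sketched below. This is an illustration under stated assumptions: the allele-frequency table is invented, the first component is found by simple power iteration on the covariance matrix instead of a full statistical package, and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, per-row scores) for the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    # Power iteration, started from the first axis (assumes that axis has some loading).
    vec = [1.0] + [0.0] * (m - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("start vector carries no variance; choose another start")
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return vec, scores


# Made-up allele frequencies at three loci for four populations,
# listed in geographic order along a hypothetical transect.
frequencies = [
    [0.10, 0.80, 0.50],
    [0.20, 0.70, 0.50],
    [0.30, 0.60, 0.50],
    [0.40, 0.50, 0.50],
]

loadings, scores = first_principal_component(frequencies)
print([round(x, 3) for x in loadings])
print([round(s, 3) for s in scores])
```

In this toy table the scores rise steadily along the transect, which is the kind of gradation the study interpreted as the trace of a migration. The third locus has the same frequency everywhere and so contributes nothing to the component.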
A second study was carried out using mitochondrial DNA haplotypes in 821 individuals from various populations across Europe.
It failed to confirm the gradation of allele frequencies detected in the nuclear DNA dataset, and instead suggested that European populations have remained relatively static over the last 20 000 years.
A refinement of this work led to the discovery that eleven mitochondrial DNA haplotypes predominate in the modern European population, each with a different time of origin, thought to indicate the date at which the haplotype entered Europe.
The youngest haplotypes, J and T1, which at 9000 years in age could correspond to the origins of agriculture, are possessed by just 8.3% of the modern European population, suggesting that the spread of farming into Europe was not the huge wave of advance indicated by the principal component study.
Instead, it is now thought that farming was brought into Europe by a smaller group of ‘pioneers’ who interbred with the existing pre-farming communities rather than displacing them.
CRITICISM RAISED: The data provided no indication of when the inferred migration took place.
THANK YOU