![Page 1: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/1.jpg)
Phylogeny – data mining by biologists
• Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences
![Page 2: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/2.jpg)
Understanding our relationships
![Page 3: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/3.jpg)
Trees are like mobiles
![Page 4: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/4.jpg)
The language of trees
![Page 5: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/5.jpg)
Changes can occur
![Page 6: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/6.jpg)
The why and what of natural selection
• Variation exists at the DNA level: alleles
• This variation is inexhaustible (something important to remember when looking at new genome sequences)
• These differences are subjected to selection:
– Changes in protein structure are typically unfavorable and, as a result, selected against
– However, some changes in structure/function are selected for: sickle cell anemia/malaria
![Page 7: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/7.jpg)
Neutral Theory of Evolution - Kimura
• The third position of a codon, or a nucleotide in a non-coding, non-regulatory region, is expected to be invisible to natural selection
• Compare Fugu with humans: the most conserved sequences are the genes
– http://www.sciencemag.org/cgi/content/full/297/5585/1301
• Synonymous substitutions and substitutions in pseudogenes (define) are thought to reflect the actual mutation rate operating within a genome (no selection)
• Is this accurate?
![Page 8: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/8.jpg)
Genetic drift
• Random genetic drift is a stochastic process (by definition).
• One aspect of genetic drift is the random nature of transmitting alleles from one generation to the next given that only a fraction of all possible zygotes become mature adults.
• Begin with equal frequencies of C and T at a given position; if the next generation shows 60/40 in favor of C, then C has a greater chance of making it into the following generation
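The C/T example above can be sketched as a small simulation. This is a minimal illustration of pure drift under assumed Wright-Fisher-style sampling; the function name `wright_fisher`, the population size, and the seed are all illustrative choices, not part of the slides.

```python
import random

def wright_fisher(freq_c, pop_size, generations, seed=0):
    """Track the frequency of allele C under pure drift (no selection).

    Assumption: each allele copy in the next generation is drawn
    independently from the current generation's allele pool.
    """
    rng = random.Random(seed)
    history = [freq_c]
    for _generation in range(generations):
        # Count how many of the pop_size sampled copies are C.
        count_c = sum(1 for _copy in range(pop_size) if rng.random() < freq_c)
        freq_c = count_c / pop_size
        history.append(freq_c)
    return history

# Start at 50/50 and watch the frequency wander by chance alone.
trajectory = wright_fisher(freq_c=0.5, pop_size=20, generations=10)
print(trajectory)
```

Once the frequency has shifted by chance, the shifted value becomes the starting point for the next round of sampling, which is why small populations tend to drift toward fixation or loss of an allele.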
![Page 9: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/9.jpg)
Neutralist vs. Selectionist
![Page 10: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/10.jpg)
Where do substitutions occur?
• Non-coding regions exhibit a substitution rate 2X greater than coding regions
• Coding regions are more “functionally constrained”
• The higher the degeneracy of a codon position, the higher the observed substitution rate
• A thought: Coding sequences – sequence constraint; Non-coding sequence – structure constraint???
![Page 11: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/11.jpg)
Natural variants
• Site-directed mutagenesis studies of a single gene will give way to comparative genomic studies derived from the abundance of sequence data
• As a result, it is important to understand molecular evolution and models describing this process
![Page 12: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/12.jpg)
The relationship between time and substitutions is non-linear
![Page 13: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/13.jpg)
Observing differences in nucleotides
• The simplest measure of distance between two sequences is to count the number of sites where the two sequences differ and divide by the number of sites compared; this proportion is called the p-distance
• Because not all sites are equally likely to change, the same site may undergo repeated substitutions
• As time goes by, the number of observed differences between two sequences becomes a less and less accurate estimator of the actual number of substitutions that have occurred
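A minimal sketch of the p-distance, assuming two pre-aligned sequences of equal length with no gaps (the function name `p_distance` is illustrative):

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

# Two 20-base sequences that differ at a single site (position 3: C vs A).
print(p_distance("ATCGGCCACTTTCGCGATCA", "ATAGGCCACTTTCGCGATCA"))  # 0.05
```

Note that a site which changed twice, or changed and then reverted, still counts as at most one difference, which is why this measure underestimates the true number of substitutions for divergent sequences.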
![Page 14: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/14.jpg)
So what is phylogenetics good for?
Phylogenetics has direct applications to:
• Conservation: test wood, ivory, meat products for poaching
• Agriculture: analyze specific differences between cultivars
• Forensics: DNA fingerprinting
• Medicine: determine specific biochemical function of cancer-causing genes
![Page 15: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/15.jpg)
Phylogenetic concepts: Interpreting a Phylogeny
[Figure: tree relating Sequence A, Sequence B, Sequence C, Sequence D and Sequence E, drawn against a time axis]
Which sequence is most closely related to B?
A, because B diverged from A more recently than from any other sequence.
Physical position in tree is not meaningful! Only tree structure matters.
![Page 16: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/16.jpg)
Rooted vs. unrooted
• Root – ancestor of all taxa considered
• Unrooted – relationship without consideration of ancestry
• Often specify root with outgroup
– Outgroup – distantly related species (e.g. mammals and an archaeal species)
![Page 17: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/17.jpg)
Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: rooted and unrooted trees relating taxa A, B, C and D, with a time axis, the root, and an additional taxon X marked]
![Page 18: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/18.jpg)
How Many Trees?
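| # sequences | # pairwise distances | Unrooted: # trees | Unrooted: # branches/tree | Rooted: # trees | Rooted: # branches/tree |
|---|---|---|---|---|---|
| 3 | 3 | 1 | 3 | 3 | 4 |
| 4 | 6 | 3 | 5 | 15 | 6 |
| 5 | 10 | 15 | 7 | 105 | 8 |
| 6 | 15 | 105 | 9 | 945 | 10 |
| 10 | 45 | 2,027,025 | 17 | 34,459,425 | 18 |
| 30 | 435 | 8.69 × 10^36 | 57 | 4.95 × 10^38 | 58 |
| N | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}(N-2)!}$ | $2N-2$ |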
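The general formulas for N sequences can be checked directly. The sketch below evaluates them for fully bifurcating trees with N of at least 3; the function names are illustrative.

```python
from math import factorial

def n_pairwise(n):
    """Number of pairwise distances among n sequences."""
    return n * (n - 1) // 2

def n_unrooted_trees(n):
    """Number of distinct unrooted bifurcating trees for n >= 3 sequences."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def n_rooted_trees(n):
    """Number of distinct rooted bifurcating trees for n >= 2 sequences."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (3, 4, 5, 6, 10):
    print(n, n_pairwise(n), n_unrooted_trees(n), n_rooted_trees(n))
```

The rapid growth in these counts is the reason exhaustive search over all topologies becomes impractical beyond a modest number of sequences.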
![Page 19: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/19.jpg)
Tree Types
Evolutionary trees measure time.
[Figure: rooted tree of sharks, seahorses, frogs, owls, crocodiles, armadillos and bats; scale bar: 50 million years]
Phylograms measure change.
[Figure: rooted phylogram of sharks, seahorses, frogs, owls, crocodiles, armadillos and bats; scale bar: 5% change]
![Page 20: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/20.jpg)
Tree Properties
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree with tips X and Y and branches a, b, c, d, e]
a = b + c + d + e
Additivity: Distance between any two tips equals the total branch length between them.
[Figure: rooted tree with tips X and Y and branches a, b, c, d, e]
XY = a + b + c + d + e
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.
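Both properties can be tested on a distance matrix without drawing the tree. The sketch below uses the standard three-point condition (in every triple, the two largest distances are equal) for ultrametricity and the four-point condition (of the three pairings of any four taxa, the two largest sums are equal) for additivity. The helper names and the example distances are illustrative.

```python
from itertools import combinations

def symmetric(pairs):
    """Build a nested dict d[x][y] from {(x, y): distance} entries."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d

def is_ultrametric(d, taxa, tol=1e-9):
    """Three-point condition: the two largest distances in each triple match."""
    for x, y, z in combinations(taxa, 3):
        dists = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(dists[2] - dists[1]) > tol:
            return False
    return True

def is_additive(d, taxa, tol=1e-9):
    """Four-point condition: the two largest pair sums in each quartet match."""
    for w, x, y, z in combinations(taxa, 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

taxa = ["A", "B", "C", "D"]
# Clock-like distances: every tip is the same distance from the root.
clock = symmetric({("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
                   ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 3})
# Unequal branch lengths: still tree-like, but not clock-like.
phylo = symmetric({("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
                   ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7})
print(is_ultrametric(clock, taxa), is_additive(clock, taxa))  # True True
print(is_ultrametric(phylo, taxa), is_additive(phylo, taxa))  # False True
```

Every ultrametric matrix is also additive, but not the reverse, which matches the idea that a clock-like tree is a special case of a tree with arbitrary branch lengths.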
![Page 21: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/21.jpg)
Tree building
• Get protein/RNA/DNA sequences
• Construct multiple sequence alignment
• Compute pairwise distances (if necessary)
• Build tree – topology and distances
• Estimate reliability
• Visualize
![Page 22: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/22.jpg)
Tree summary
![Page 23: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/23.jpg)
Various models have been generated to more accurately estimate distance and evolution
• All use the following framework:
Probability matrix
p_AC is the probability that a site starting with an A has a C at the end of time interval t, etc.
Base composition of sequence: f_A = frequency of A, etc.
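The framework described above can be written out as follows. This is a generic sketch of the notation only, since the slide's own matrix did not survive extraction: rows are the starting base, columns the base at the end of interval t, and each row sums to 1.

```latex
P(t) =
\begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix},
\qquad
\sum_{j \in \{A,C,G,T\}} p_{ij} = 1,
\qquad
f = (f_A,\ f_C,\ f_G,\ f_T),\quad f_A + f_C + f_G + f_T = 1
```

Individual models differ in how many distinct values the off-diagonal entries are allowed to take and in whether the base frequencies are assumed equal.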
![Page 24: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/24.jpg)
Phylogenetic Methods
Many different procedures exist. Three of the most popular:
• Neighbor-joining: minimizes distance between nearest neighbors
• Maximum parsimony: minimizes total evolutionary change
• Maximum likelihood: maximizes likelihood of observed data
![Page 25: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/25.jpg)
Comparison of Methods
| Neighbor-joining | Maximum parsimony | Maximum likelihood |
|---|---|---|
| Uses only pairwise distances | Uses only shared derived characters | Uses all data |
| Minimizes distance between nearest neighbors | Minimizes total distance | Maximizes tree likelihood given specific parameter values |
| Very fast | Slow | Very slow |
| Easily trapped in local optima | Assumptions fail when evolution is rapid | Highly dependent on assumed evolution model |
| Good for generating tentative tree, or choosing among multiple trees | Best option when tractable (<30 taxa, homoplasy rare) | Good for very small data sets and for testing trees built using other methods |
![Page 26: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/26.jpg)
Which procedure should we use?
Neighbor-joining? Maximum parsimony? Maximum likelihood?
All that we can!
• Each method has its own strengths
• Use multiple methods for cross-validation
• In some cases, none of the three gives the correct phylogeny!
![Page 27: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/27.jpg)
Jukes-Cantor Model
• Distance between any two sequences is given by: $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
• p is the proportion of nucleotides that are different in the two sequences
• All substitutions are equally probable
– Each off-diagonal position in the probability matrix = α; each diagonal position = 1 − 3α
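A minimal sketch of the Jukes-Cantor correction, taking the observed proportion of differing sites p as input (the function name `jukes_cantor` is illustrative). The formula is undefined once p reaches 3/4, the level expected for unrelated random sequences, so the sketch rejects such input.

```python
from math import log

def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if p == 0:
        return 0.0
    return -0.75 * log(1 - 4 * p / 3)

for p in (0.05, 0.25, 0.50):
    # The corrected distance always exceeds p, and the gap widens as p grows.
    print(p, round(jukes_cantor(p), 4))
```

For small p the correction is slight, but it grows quickly as sequences diverge, reflecting the increasing share of sites that have been hit more than once.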
![Page 28: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/28.jpg)
Kimura’s two parameter model
• $d = \frac{1}{2}\ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4}\ln\left[\frac{1}{1 - 2Q}\right]$
• P and Q are the proportions of sites that differ between the two sequences due to transitions and transversions, respectively.
• Accounts for transition bias in sequences (transversions are rarer)
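A minimal sketch of the two-parameter distance, assuming aligned, gap-free, uppercase DNA sequences containing only A, C, G and T (the function name `kimura_2p` is illustrative). A transition is a change within the purines (A, G) or within the pyrimidines (C, T); any other change is a transversion.

```python
from math import log

PURINES = {"A", "G"}

def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Same chemical class (both purines or both pyrimidines): transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(seq1)
    Q = transversions / len(seq1)
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for this correction")
    return 0.5 * log(1 / (1 - 2 * P - Q)) + 0.25 * log(1 / (1 - 2 * Q))

# One transition (C -> T) in 20 sites.
print(kimura_2p("A" * 19 + "C", "A" * 19 + "T"))
```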
![Page 29: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/29.jpg)
Distances in Amino acid sequences
• Account for synonymous and non-synonymous changes in respective codons
• Pathways to double mutations
![Page 30: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/30.jpg)
Dealing with multiple substitutions
• Unweighted method – pathways are equally likely
• Weighted – favor synonymous changes
• Degeneracy classifications
– Nondegenerate (0) – First two positions of TTT (Phe)
– Two-fold degenerate (2) – Third position of TTT (Phe)
– Four-fold degenerate (4) – Third position of GTT (Val)
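The degeneracy classes above can be derived from the standard genetic code by asking how many of the four nucleotides at a position leave the amino acid unchanged. The sketch below assumes the standard code and DNA codons; the names `CODON_TABLE` and `degeneracy` are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def degeneracy(codon, position):
    """Degeneracy class of a codon position (0-based index 0, 1 or 2).

    Returns 0 if every change alters the amino acid, otherwise the number
    of nucleotides at that position that encode the same amino acid.
    """
    amino_acid = CODON_TABLE[codon]
    same = sum(
        1 for base in BASES
        if CODON_TABLE[codon[:position] + base + codon[position + 1:]] == amino_acid
    )
    return 0 if same == 1 else same

print([degeneracy("TTT", i) for i in range(3)])  # [0, 0, 2]
print(degeneracy("GTT", 2))                      # 4
```

A few positions, such as the third position of ATT (Ile), come out as three-fold under this count; some classification schemes fold those into the two-fold class.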
![Page 31: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/31.jpg)
Evolutionary models
![Page 32: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/32.jpg)
Implementing models and building trees
![Page 33: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/33.jpg)
Comparing models
![Page 34: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/34.jpg)
Trees are hypotheses about evolutionary history
So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.
![Page 35: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/35.jpg)
Testing the reliability of trees
• Interior branch test or Bootstrap analysis
• Bootstrap analysis – resample the data (subsequences, or sequence deletion or replacement); re-draw the trees; how many times do you get the same branching? Bootstrap values of 70% (95% by a stricter standard) or greater are normally considered reliable
![Page 36: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/36.jpg)
Tree Testing: Split Decomposition
Split decomposition is one method for testing a tree.
[Figure: the unrooted topologies for four taxa: (A, B | C, D), (A, D | B, C) and (A, C | B, D)]
Under this procedure, we choose exactly four taxa (A, B, C, D) and examine the topologies of all possible unrooted trees. How many such trees are there?
Only one of these topologies is right. How can we quantitatively assess the support for each tree?
![Page 37: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/37.jpg)
Tree Testing: Split Decomposition
The correct tree should be approximately additive; the others usually will not. For each tree, we calculate split indices that estimate the length of the internal branch:
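[Figure: the split index of a topology is computed from sums of pairwise distances, divided by 2; it equals the internal branch length if that topology is the right phylogeny]

| Split indices | Internal branch | Interpretation |
|---|---|---|
| Large | Long | Topology strongly supported |
| Small | Short | Topology weakly supported |
| Negative | Biologically impossible | Topology probably wrong |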
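One way to compute such an index is sketched below. The exact formula on the original slide is not recoverable, so this form is an assumption: it takes the average of the two "cross" distance sums, subtracts the "within-pair" sum, and halves the result. On a perfectly additive tree this equals the internal branch length for the correct topology and is negative for the other two. The function name and the example distances are illustrative.

```python
def split_index(d, left, right):
    """Estimated internal branch length for the topology (left | right).

    d is a nested dict of pairwise distances; left and right are the two
    pairs of taxa separated by the internal branch.
    Assumed form: ((cross1 + cross2) / 2 - within) / 2.
    """
    (a, b), (c, e) = left, right
    within = d[a][b] + d[c][e]   # distances inside each pair
    cross1 = d[a][c] + d[b][e]   # first pairing across the split
    cross2 = d[a][e] + d[b][c]   # second pairing across the split
    return ((cross1 + cross2) / 2 - within) / 2

# Additive distances from a tree (A, B | C, D) with internal branch length 1
# and tip branches A=1, B=2, C=3, D=4.
d = {
    "A": {"B": 3, "C": 5, "D": 6},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 5, "B": 6, "D": 7},
    "D": {"A": 6, "B": 7, "C": 7},
}
print(split_index(d, ("A", "B"), ("C", "D")))  # 1.0
print(split_index(d, ("A", "C"), ("B", "D")))  # -0.5
print(split_index(d, ("A", "D"), ("B", "C")))  # -0.5
```

With real data the distances are only approximately additive, so the index for the best topology is an estimate and its size relative to the alternatives indicates how strongly that topology is supported.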
![Page 38: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/38.jpg)
Tree Testing: Bootstrapping
Used to assess the support for individual branches
• Randomly resample characters, with replacement
• Repeat many times (1000 or more)
• How often does a specific branch appear?
[Figure: tree of rat, human, turtle, fruit fly, oak and duckweed with bootstrap values 100, 98 and 73 on its branches]
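The resampling step can be sketched as follows. The alignment is assumed to be a dict of equal-length sequences; `build_tree` and `has_branch` are hypothetical callables that the caller must supply (any tree-building method, and a test for whether the branch of interest appears in the resulting tree). They are placeholders, not functions from a real library.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (characters) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence,
    # so each pseudo-replicate column is a real column of the alignment.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, build_tree, has_branch, replicates=1000, seed=0):
    """Percentage of replicate trees in which the branch of interest appears."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(replicates)
        if has_branch(build_tree(bootstrap_alignment(alignment, rng)))
    )
    return 100 * hits / replicates

alignment = {"rat": "ATCGGCCACT", "human": "ATAGGCCACT", "turtle": "ATAGGGCAGT"}
replicate = bootstrap_alignment(alignment, random.Random(1))
print(replicate)
```

Each replicate has the same number of taxa and sites as the original alignment; only the mix of columns changes, which is what lets the procedure probe how much the tree depends on particular characters.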
![Page 39: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/39.jpg)
Rates of nucleotide substitutions between human and mouse or rat
• Synonymous rate = 2-10 substitutions per site per 10^9 years in coding regions
• Nonsynonymous rate = 0-3 substitutions per site per 10^9 years in coding regions (more variable among genes)
• Synonymous rate exceeds nonsynonymous rate
![Page 40: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/40.jpg)
Molecular Clocks
• Do homologous proteins evolve at the same substitution rate?
• Estimate relative rates using an outgroup
• But, what about effects of generation time, metabolic specialization, etc?
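Estimating relative rates with an outgroup can be sketched as below. Given two ingroup sequences A and B and an outgroup C, and assuming the pairwise distances are approximately additive, the amount of change on the branch leading to each ingroup sequence since their common ancestor follows from the three distances. The function name and the example distances are illustrative.

```python
def relative_rate(d_ab, d_ac, d_bc):
    """Branch lengths to A and B since their common ancestor, using outgroup C.

    Returns (branch_a, branch_b, difference). Under a perfect clock the two
    branches are equal, so the difference d_ac - d_bc is zero.
    """
    branch_a = (d_ab + d_ac - d_bc) / 2
    branch_b = (d_ab + d_bc - d_ac) / 2
    return branch_a, branch_b, d_ac - d_bc

branch_a, branch_b, difference = relative_rate(d_ab=0.3, d_ac=0.5, d_bc=0.4)
print(branch_a, branch_b, difference)
```

In this made-up example the lineage leading to A has accumulated about twice as much change as the lineage leading to B, which would count as evidence against a strict clock for that pair.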
![Page 41: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/41.jpg)
Darwin’s theory reinterpreted homology as common ancestry.
Ancestral sequence:
ATCGGCCACTTTCGCGATCA
Lineage 1 (successive substitutions):
ATAGGCCACTTTCGCGATCA
ATAGGCCACTTTCGCGATTA
ATAGGGCAGTTTCGCGATTA
ATAGGGCAGTTTTGCGATTA
ATAGGGCAGTTTCGCGATTA
ATAGGGCAGTCTCGCGATTA
Lineage 2 (successive substitutions):
ATCGGCCACTTTCGCGATCG
ATCGGCCACTTTCGTGATCG
ATCGGCCACGTTCGTGATCG
ATCGGCCACGTTCGCGATCG
ATCGGCCACCTTCGCGATCG
ACCGGCCACCTTCGCGATCG
Homologous sequences:
ACCGGCCACCTTCGCGATCG
ATAGGGCAGTCTCGCGATTA
![Page 42: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/42.jpg)
Orthologs arise by speciation
Sequence in ancestral organism: ATCGGCCACTTTCGCGATCA
Speciation event
Orthologous sequences:
Modern species A: ATAGGGCAGTCTCGCGATTA
Modern species B: ACCGGCCACCTTCGCGATCG
Orthologs are “evolutionary counterparts” – Koonin (2001)
![Page 43: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/43.jpg)
Paralogs arise by duplications
Sequence in ancestral organism: ATCGGCCACTTTCGCGATCA
Duplication event
Paralogous sequences:
Modern duplicate A: ATAGGGCAGTCTCGCGATTA
Modern duplicate B: ACCGGCCACCTTCGCGATCG
![Page 44: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/44.jpg)
Hardison PNAS 2001 98 :1327-1329
We have different types of hemoglobins
The major adult hemoglobin is composed of 2 α chains and 2 β chains. The major fetal hemoglobin is composed of 2 α chains and 2 γ chains.
![Page 45: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/45.jpg)
“There may thus exist a Molecular Evolutionary Clock” – Zuckerkandl & Pauling (1965)
A model of sequence divergence can be used to extract the duplication dates of the different hemoglobin chains
[Figure: tree from a primordial hemoglobin; a duplication event followed by speciation events leads to two human chains and two cow chains]
Note: This model explains why the distance between a human chain and the corresponding cow chain is shorter than the distance between the two human chains.
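Under a strict clock, two sequences that diverged T years ago and evolve at rate r per site per year are separated by a distance of about 2rT, because change accumulates along both lineages. The sketch below calibrates r from a divergence of known age and then dates another event. All numbers are hypothetical and chosen only to show the arithmetic; the function names are illustrative.

```python
def clock_rate(distance, divergence_time):
    """Substitution rate per site per unit time, from a dated divergence."""
    return distance / (2 * divergence_time)

def divergence_time(distance, rate):
    """Time since divergence implied by a distance and a clock rate."""
    return distance / (2 * rate)

# Hypothetical calibration: orthologs at distance 0.2 whose species
# split 100 million years ago.
rate = clock_rate(distance=0.2, divergence_time=100)
# Hypothetical paralogs at distance 0.9: when did the duplication occur?
print(divergence_time(distance=0.9, rate=rate))  # about 450 (million years)
```

The estimate is only as good as the clock assumption and the distance correction, which is why the choice of substitution model and calibration point matters.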
![Page 46: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/46.jpg)
PBS Evolution Library (http://www.pbs.org/wgbh/evolution/library/)
Different clocks keep different times
Between horse and man
![Page 47: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/47.jpg)
The clock varies for different regions of the protein
For example, locations on the exterior of the protein may change at a different rate than those on the interior.
![Page 48: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/48.jpg)
Ayala, F. Bioessays 1999 Jan;21(1):71-5
No universal clocks found!
Two terrible clocks
![Page 49: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/49.jpg)
Ayala, F. Bioessays 1999 Jan;21(1):71-5
The common estimate is 1,100 My
![Page 50: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/50.jpg)
What causes deviations from the clock?
1. Generation time: Shorter generation time will accelerate the clock because it shortens the time to fix new mutations.
2. Mutation rate: Species-characteristic differences in polymerases or other biological properties that affect the fidelity of DNA replication, and hence the incidence of mutations.
3. Gene function: Changes in the function of a protein as evolutionary time proceeds. This might particularly be expected in the case of gene duplication.
4. Natural selection: Organisms are continually adapting to the physical and biotic environments, which change endlessly in patterns that are unpredictable and differently significant to different species.
Ayala, F. Bioessays 1999 Jan;21(1):71-5
![Page 51: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/51.jpg)
HIV Example 1: Florida dentist case
• 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• HIV evolves so fast that transmission patterns can be reconstructed from viral sequence (molecular forensics).
• Compared viral sequence from the dentist, three of his HIV+ patients, and two HIV+ local controls.
![Page 52: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/52.jpg)
Florida dentist case
![Page 53: Phylogeny – data mining by biologists Molecular phylogenetics is using clustering techniques to discern relationships between different biological sequences.](https://reader035.fdocuments.net/reader035/viewer/2022062322/56649eb55503460f94bbdf69/html5/thumbnails/53.jpg)
So what do the results mean?
• 2 of 3 patients closer to dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?