Date posted: 21-Dec-2015
Estimating recombination rates using three-site likelihoods
Jeff Wall
Program in Molecular and Computational Biology, USC
DNA sequence variation
Patterns of DNA sequence variation are affected by
• mutation
• recombination
• population structure
• changes in population size
• natural selection
• genetic drift
Standard double strand break model of recombination
Gene conversion
Crossover (with gene conversion)
Approximated as
Gene conversion
Crossover
Ignore patchworks. [Example diagram not preserved]
Slide courtesy of M. Przeworski
Gene conversion
• Most population genetic models ignore gene conversion. However, gene conversion has a strong effect on the levels of linkage disequilibrium between closely linked sites.
• Crossing over: recombinants are produced at a rate proportional to the genetic distance between the sites.
• Gene conversion: recombinants are produced at a rate that is roughly independent of the distance between the sites.
Effect of gene conversion on patterns of linkage disequilibrium (LD)
Gene conversion leads to a steeper decay of LD at short distances.
[Figure: average r² (y-axis, 0 to 0.4) versus physical distance between markers in bp (x-axis, 0 to 20,000), with one curve for no gene conversion and one for gene conversion]
Figure courtesy of M. Przeworski
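A rough way to see why extra recombination lowers LD: under the standard neutral model, a classical approximation (Ohta and Kimura 1971) gives the expected squared correlation between two sites as (10 + ρ) / (22 + 13ρ + ρ²), where ρ is the population recombination rate between them. The sketch below simply evaluates that expression. Using it as a stand-in for average r² is an assumption made here for illustration; it is not how the figure was produced.

```python
def expected_r2(rho):
    """Approximate expected r^2 between two sites (Ohta and Kimura 1971).

    rho is the population recombination rate between the two sites.
    This is a large-sample, neutral-equilibrium approximation.
    """
    return (10.0 + rho) / (22.0 + 13.0 * rho + rho ** 2)


# Any process that adds to the recombination rate between two nearby
# sites (such as gene conversion) moves them further down this curve.
for rho in (0.0, 1.0, 5.0, 20.0):
    print(f"rho = {rho:5.1f}   approx. E[r^2] = {expected_r2(rho):.4f}")
```

Because the function decreases as ρ grows, any additional exchange between closely spaced sites pulls their expected LD down faster than crossing over alone would.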
Implications of high levels of gene conversion
• For detecting natural selection (Andolfatto and Nordborg 1998; Berry and Barbadilla 2000)
• For linkage disequilibrium-based association studies
[Figure: panels A, B and C, each showing three markers labeled 1, 2 and 3]
Parameters
ρ = 4 N_e r_co, where N_e is the effective population size and r_co is the crossover rate per bp per generation
f = r_gc / r_co, where r_gc is the rate of gene conversion initiation per bp per generation
t = mean gene conversion tract length. We assume that gene conversion tract lengths follow a geometric distribution.
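To connect these parameters to the rate of exchange between two sites, here is a minimal sketch. It assumes (these assumptions are not stated on the slide) that tracts initiate uniformly along the sequence, extend in one direction from the initiation point, and separate two sites whenever they cover exactly one of them. Under those assumptions the gene conversion contribution for sites d bp apart is 2 f ρ t (1 − (1 − 1/t)^d). The function names are illustrative only; the parameter values are those of the simulations described later in the talk.

```python
def crossover_rate(d, rho):
    """Population crossover rate between two sites d bp apart (linear in d)."""
    return rho * d


def gene_conversion_rate(d, rho, f, t):
    """Population rate at which gene conversion separates two sites d bp apart.

    Assumptions (illustrative): initiation at rate f * rho per bp, tracts
    extend in one direction, tract length is geometric with mean t.
    """
    q = 1.0 - 1.0 / t  # probability that a tract extends by one more bp
    return 2.0 * f * rho * t * (1.0 - q ** d)


def effective_rate(d, rho, f, t):
    """Total population recombination rate between two sites d bp apart."""
    return crossover_rate(d, rho) + gene_conversion_rate(d, rho, f, t)


rho, f, t = 0.001, 4.0, 125.0
for d in (10, 100, 1000, 5000):
    co = crossover_rate(d, rho)
    gc = gene_conversion_rate(d, rho, f, t)
    print(f"d = {d:5d} bp   crossover = {co:.3f}   gene conversion = {gc:.3f}")
```

For d much smaller than t the gene conversion term is close to 2fρd, so it dominates at short range; for d much larger than t it levels off at 2fρt, which is the sense in which it is roughly independent of distance.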
General Approach
Ideally we would calculate the probability of the data as a function of the recombination parameters.
However, full likelihood methods (e.g., Fearnhead & Donnelly 2001) are too computationally intensive.
The composite likelihood approach calculates likelihoods for small subsets of the data, then multiplies these likelihoods over many subsets.
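In log space, multiplying likelihoods over subsets becomes a sum of subset log-likelihoods. The sketch below shows only that bookkeeping for subsets of k sites. The data are made up, and `toy_subset_loglik` is a hypothetical placeholder rather than a coalescent likelihood; a real method would compute or look up the probability of each subset's haplotype configuration given the recombination parameters and the distances between the sites.

```python
from collections import Counter
from itertools import combinations


def subset_configuration(haplotypes, sites):
    """Count the distinct haplotypes observed at the chosen site indices."""
    return Counter("".join(h[s] for s in sites) for h in haplotypes)


def composite_loglik(haplotypes, positions, k, subset_loglik, params):
    """Sum subset log-likelihoods over every k-site subset of the data."""
    total = 0.0
    for sites in combinations(range(len(positions)), k):
        config = subset_configuration(haplotypes, sites)
        coords = [positions[s] for s in sites]
        total += subset_loglik(config, coords, params)
    return total


def toy_subset_loglik(config, coords, params):
    """Hypothetical stand-in so the example runs; NOT a real likelihood."""
    return -float(len(config))


haplotypes = ["acg", "atg", "gtc", "gcc"]      # made-up data: 4 haplotypes, 3 sites
positions = [10, 250, 900]                     # made-up coordinates in bp
params = {"rho": 0.001, "f": 4.0, "t": 125.0}  # passed through to the subset model

print(composite_loglik(haplotypes, positions, 2, toy_subset_loglik, params))
print(composite_loglik(haplotypes, positions, 3, toy_subset_loglik, params))
```

Passing the subset model in as a function keeps the summation identical whether the subsets are pairs or triplets of sites.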
Composite likelihood (Frisse et al. 2001)
Sequence 1 a c c g a t g c g t a a g c t
Sequence 2 g t a g a t g c g t c a g c t
Sequence 3 g t a g t c g t g t c g g c c
Sequence 4 a c a g t c g t g t c g g t t
Sequence 5 a c a g t c g t g t a g g t t
Sequence 6 a c c g a c g c c c a a g c t
Sequence 7 a c c g a t g c c c a a g c t
Sequence 8 a c c g a t g c c c a a g c c
Sequence 9 a c c t a t g c g t a a g c t
Sequence 10 a c c g a t a c g t c g g t t
Sequence 11 a c a g a c g c g t c g c c t
Sequence 12 g t a g a t g c c c a a g c t
Composite likelihood (Wall 2004)
Sequence 1 a c c g a t g c g t a a g c t
Sequence 2 g t a g a t g c g t c a g c t
Sequence 3 g t a g t c g t g t c g g c c
Sequence 4 a c a g t c g t g t c g g t t
Sequence 5 a c a g t c g t g t a g g t t
Sequence 6 a c c g a c g c c c a a g c t
Sequence 7 a c c g a t g c c c a a g c t
Sequence 8 a c c g a t g c c c a a g c c
Sequence 9 a c c t a t g c g t a a g c t
Sequence 10 a c c g a t a c g t c g g t t
Sequence 11 a c a g a c g c g t c g c c t
Sequence 12 g t a g a t g c c c a a g c t
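As a concrete illustration of the two kinds of subsets, the sketch below takes the 12 sequences shown above, treats every variable column as a segregating site (an assumption; the transcript does not preserve which sites each slide highlighted), and tabulates the haplotype configuration for one pair and one triplet of sites. The helper names are illustrative.

```python
from collections import Counter
from itertools import combinations

# The 12 sequences from the slide, with spaces removed.
ALIGNMENT = [
    "accgatgcgtaagct", "gtagatgcgtcagct", "gtagtcgtgtcggcc",
    "acagtcgtgtcggtt", "acagtcgtgtaggtt", "accgacgcccaagct",
    "accgatgcccaagct", "accgatgcccaagcc", "acctatgcgtaagct",
    "accgatacgtcggtt", "acagacgcgtcgcct", "gtagatgcccaagct",
]


def segregating_columns(seqs):
    """Indices of columns where more than one base is observed."""
    return [j for j in range(len(seqs[0])) if len({s[j] for s in seqs}) > 1]


def configuration(seqs, cols):
    """Counts of each haplotype observed at the chosen columns."""
    return Counter("".join(s[j] for j in cols) for s in seqs)


seg = segregating_columns(ALIGNMENT)
pairs = list(combinations(seg, 2))
triplets = list(combinations(seg, 3))

print("segregating sites:", len(seg))
print("pairs:", len(pairs), " triplets:", len(triplets))
print("first pair:", dict(configuration(ALIGNMENT, pairs[0])))
print("first triplet:", dict(configuration(ALIGNMENT, triplets[0])))
```

A pairwise method would evaluate a two-site likelihood for each pair configuration, and a three-site method would do the same for each triplet configuration, before combining them as a composite likelihood.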
Simulations
We ran simulations of 5 kb loci with n = 50, θ = ρ = 0.001 / bp, f = 4 and t = 125 bp.
We analyzed each locus individually as well as groups of 5, 20 and 100 loci (assuming each locus is evolutionarily independent). For each group, we estimated f over a grid of values using the methods of Frisse et al. (2001) and Wall (2004).
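Because the loci are treated as independent, a group's log-likelihood at each grid value is the sum of the per-locus log-likelihoods, and the estimate is the grid value with the largest sum. The sketch below illustrates that step only. The grid is read off the x-axis of the histograms in this talk (taking that as the grid actually used is an assumption), and the per-locus log-likelihood values are invented for illustration.

```python
F_GRID = [0, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11.2, 16]


def grid_estimate(loglik_by_locus, grid):
    """Return (best grid value, summed log-likelihood at each grid value).

    loglik_by_locus[i][j] is the composite log-likelihood of locus i
    evaluated at grid[j].
    """
    totals = [sum(column) for column in zip(*loglik_by_locus)]
    best = max(range(len(grid)), key=totals.__getitem__)
    return grid[best], totals


# Invented log-likelihood curves for two hypothetical loci.
locus_a = [-12.0, -10.5, -10.0, -9.6, -9.8, -10.1, -11.0, -12.5, -14.0, -16.0]
locus_b = [-15.0, -13.0, -12.2, -11.5, -10.6, -10.0, -9.7, -9.5, -9.9, -11.0]

print("locus A alone:", grid_estimate([locus_a], F_GRID)[0])
print("locus B alone:", grid_estimate([locus_b], F_GRID)[0])
print("both loci:    ", grid_estimate([locus_a, locus_b], F_GRID)[0])
```

In this toy case the single-locus estimates disagree with each other, while the pooled curve peaks between them, which is the motivation for analyzing groups of loci.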
Distribution of estimates of f (1 locus)
[Figure: histogram of the estimated value of f (x-axis bins: 0, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11.2, 16) versus frequency (y-axis, 0 to 0.25) for the triplet method and the pair method]
Distribution of estimates of f (5 loci)
[Figure: histogram of the estimated value of f (x-axis bins: 0, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11.2, 16) versus frequency (y-axis, 0 to 0.3) for the triplet method and the pair method]
Distribution of estimates of f (20 loci)
[Figure: histogram of the estimated value of f (x-axis bins: 0, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11.2, 16) versus frequency (y-axis, 0 to 0.45) for the triplet method and the pair method]
Distribution of estimates of f (100 loci)
[Figure: histogram of the estimated value of f (x-axis bins: 0, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11.2, 16) versus frequency (y-axis, 0 to 0.8) for the triplet method and the pair method]
Estimating ρ and f jointly
[Figure: probability (y-axis, 0 to 1) versus number of loci (x-axis, log scale, 1 to 1000) for the triplet method and the pair method]
Conclusions
• For estimating gene conversion rates, the triplet composite likelihood method is slightly more accurate than the pairwise composite likelihood method.
• Neither method is very accurate on an absolute scale.
Further directions
• Modify method to handle unphased data, missing data, ascertainment bias, etc.
• Variation in recombination rates
• Confounding factors:
  – Multiple hits
  – Sequencing errors
  – Population history
  – Natural selection