N-gene Coalescent Problems
• Probability density of the 1st success occurring after waiting time t, given a time constant (rate) of success, a, which plays the role of the per-trial success probability p
04/18/23 Comp 790– Continuous-Time Coalescence 1
$$\mathrm{Exp}(a,t) = a e^{-at}$$
$$E(\mathrm{Exp}(a,t)) = \frac{1}{a}$$
$$\mathrm{Var}(\mathrm{Exp}(a,t)) = \frac{1}{a^2}$$
Review N-genes
• The probability that k genes have distinct lineages (k different parents in the previous generation) is:
• Manipulating a little:
• Where, for large N, terms of order 1/N^2 are negligible
$$\frac{2N-1}{2N}\cdot\frac{2N-2}{2N}\cdots\frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right)$$
The 1st gene can choose its parent freely, but each of the next k-1 must choose from the remaining genes without a child
$$\prod_{i=1}^{k-1}\left(1-\frac{i}{2N}\right) \approx 1-\sum_{i=1}^{k-1}\frac{i}{2N}+O\left(\frac{1}{N^2}\right) = 1-\binom{k}{2}\frac{1}{2N}+O\left(\frac{1}{N^2}\right)$$
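The last equality rests on the arithmetic series for the first k-1 integers. Written out as a worked step (standard algebra, not taken from the slide):

```latex
\sum_{i=1}^{k-1}\frac{i}{2N}
  = \frac{1}{2N}\sum_{i=1}^{k-1} i
  = \frac{1}{2N}\cdot\frac{k(k-1)}{2}
  = \binom{k}{2}\frac{1}{2N}
```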
Approx N-gene Coalescence
• Approximate probability that k genes have different parents:
• The probability that two or more have a common parent:
• Distinct lineages for j-1 generations, followed by a coalescence in generation j, leads to a geometric distribution, with success probability p:
$$1-\binom{k}{2}\frac{1}{2N}$$
$$1-\left(1-\binom{k}{2}\frac{1}{2N}\right) = \binom{k}{2}\frac{1}{2N}$$
$$P(N=j) \approx \left(1-\binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$
$$p = \binom{k}{2}\frac{1}{2N}$$
Recall that the 2-gene case had a similar form, but with 1 in place of the binomial coefficient. Here the binomial coefficient accounts for all k-choose-2 possible pairs, which are treated independently.
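As an illustration of the geometric form, the following minimal Python sketch evaluates the approximate distribution and checks that it sums to 1 with mean 1/p. The function names and the choice k = 3, 2N = 10 are assumptions made for the example only.

```python
from math import comb

def coalescence_prob(k, two_n):
    """Approximate per-generation coalescence probability p = C(k,2)/(2N)."""
    return comb(k, 2) / two_n

def geometric_pmf(j, p):
    """P(first coalescence happens exactly j generations back)."""
    return (1 - p) ** (j - 1) * p

# Illustrative values (assumed): k = 3 genes, 2N = 10 gene copies.
p = coalescence_prob(3, 10)
generations = range(1, 200)
pmf = [geometric_pmf(j, p) for j in generations]
mean = sum(j * q for j, q in zip(generations, pmf))
print(p, sum(pmf), mean)
```

The truncation at 199 generations is arbitrary; for this p the remaining tail is negligible.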
Impact of Approximation
• The approximation is not “proper” (not a valid probability) for all values of k < 2N: it becomes negative once k is large enough
• Consider the smallest such k for the following values of N
$$1-\binom{k}{2}\frac{1}{2N} = 1-\frac{k(k-1)}{4N} < 0 \quad \text{for} \quad k > \frac{\sqrt{16N+1}+1}{2}$$
N    10    100    1000    10000    100000    1000000
k     7     21      64      201       633       2001
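The table entries can be reproduced from the inequality k(k-1) > 4N. A small sketch, with `smallest_negative_k` as a hypothetical helper name:

```python
from math import isqrt

def smallest_negative_k(n):
    """Smallest integer k with 1 - k(k-1)/(4N) < 0, i.e. k(k-1) > 4N."""
    # Start just below the real-valued root (1 + sqrt(16N + 1)) / 2.
    k = (1 + isqrt(16 * n + 1)) // 2
    while k * (k - 1) <= 4 * n:
        k += 1
    return k

table = {n: smallest_negative_k(n) for n in (10, 100, 1000, 10000, 100000, 1000000)}
print(table)
```

Integer arithmetic is used throughout so that the boundary case is not affected by floating-point rounding.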
Fix N and Vary k
• Comparing the actual to the approximation
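A comparison of this kind can be tabulated directly. The sketch below assumes N = 100 (2N = 200 gene copies) and a handful of k values chosen for illustration; neither choice comes from the slide.

```python
from math import comb

def exact_distinct(k, two_n):
    """Exact probability that k genes have k distinct parents."""
    prob = 1.0
    for i in range(1, k):
        prob *= 1 - i / two_n
    return prob

def approx_distinct(k, two_n):
    """First-order approximation 1 - C(k,2)/(2N)."""
    return 1 - comb(k, 2) / two_n

two_n = 200  # assumed: N = 100
rows = [(k, exact_distinct(k, two_n), approx_distinct(k, two_n))
        for k in (2, 5, 10, 15, 20, 21, 25)]
for k, exact, approx in rows:
    print(f"k={k:2d}  exact={exact:.4f}  approx={approx:+.4f}")
```

For small k the two columns agree closely; the approximation drifts below the exact value as k grows and turns negative at k = 21, matching the table entry for N = 100.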
Concrete Example
• In a population of 2N = 10, the probability that 3 genes have a single common ancestor in the previous generation is:
• The probability that all 3 have different ancestors is:
• The remaining probability is that the 3 genes have two parents in the previous generation
$$\frac{1}{10}\cdot\frac{1}{10} = \frac{1}{100}$$
The 1st gene can choose its parent freely, while the next 2 must choose the same one
$$\frac{10}{10}\cdot\frac{9}{10}\cdot\frac{8}{10} = \frac{72}{100}$$
The 1st gene can choose its parent from the 10, while the next 2 must choose from the remaining unchosen parents
$$1-\frac{1}{100}-\frac{72}{100} = \frac{27}{100}$$
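These three probabilities can be checked by brute force. The sketch below enumerates every way 3 genes can pick parents among 10 and counts the number of distinct parents; `parent_count_distribution` is a hypothetical helper, and exhaustive enumeration is only practical for small k and 2N.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def parent_count_distribution(k, two_n):
    """Exact distribution of the number of distinct parents of k genes."""
    counts = Counter(len(set(choice))
                     for choice in product(range(two_n), repeat=k))
    total = two_n ** k
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}

dist = parent_count_distribution(3, 10)
for parents, prob in dist.items():
    print(parents, prob)
```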
Example Continued
• The probability that 2 or more genes have a common parent in the previous generation is:
• By our approximation, the probability that two or more genes share a common parent is:
• This leads to an expected waiting time until the first common ancestor (an MRCA estimate), in generations, of:
$$\frac{27}{100}+\frac{1}{100} = \frac{28}{100}$$
The probability that exactly 2 share a common parent plus the probability that all 3 have a common parent
$$\binom{3}{2}\frac{1}{10} = \frac{3}{10}$$
$$\text{error} = \frac{3}{10}-\frac{28}{100} = \frac{2}{100}$$
Error in approximation for k=3, 2N=10
$$\frac{1}{p} = \frac{1}{\binom{3}{2}\frac{1}{10}} = \frac{10}{3} \approx 3.33$$
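The arithmetic of this example can be reproduced with exact fractions. A minimal sketch, with variable and function names chosen for illustration:

```python
from fractions import Fraction
from math import comb

def exact_shared(k, two_n):
    """Exact probability that at least two of k genes share a parent."""
    distinct = Fraction(1)
    for i in range(1, k):
        distinct *= Fraction(two_n - i, two_n)
    return 1 - distinct

k, two_n = 3, 10
exact = exact_shared(k, two_n)            # 28/100
approx = Fraction(comb(k, 2), two_n)      # 3/10
error = approx - exact                    # 2/100
expected_wait = 1 / approx                # 10/3 generations
print(exact, approx, error, float(expected_wait))
```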
For Large N and Small k
• For 2N > 100, the agreement improves, so long as k << 2N
• The advantage of the approximation is that it fits the “form” of a geometric distribution, and thus can be generalized to a continuous-time model
Continuous-time Coalescent
• In the Wright-Fisher model, time is measured in discrete units: generations.
• A continuous time approximation is conceptually more useful, and via the given approximation, computationally simple
• Moreover, a continuous model can be constructed that is independent of the population size (2N), so long as our sample size, k, is much smaller (one of those rare cases where a small sample size simplifies matters)
• The only time we will need to consider population size (2N) is when we want to convert from time back into generations.
Continuous-time Derivation
• As before, let $t = j/(2N)$, where j is now time measured in generations
• It follows that j = 2Nt translates continuous time, t, back into generations j. In practice floor(2Nt) is used to assign a discrete generation number.
• The waiting time, $T_k^c$, for k genes to have k – 1 or fewer ancestors is exponentially distributed, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$, derived from t = j/2N, M = 2N and $p = \binom{k}{2}/(2N)$
• Giving:
$$P\left(T_k^c \le t\right) = 1-e^{-\binom{k}{2}t}$$
The probability that k genes will have k-1 or fewer ancestors by time t, that is, that the first coalescence occurs at some time less than or equal to t
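A short sketch of the distribution function and the conversion back to generations follows. The function names are illustrative, and the population size used in the conversion (2N = 20000) is an assumed value.

```python
from math import comb, exp, floor

def coalescence_cdf(k, t):
    """P(T_k <= t) = 1 - exp(-C(k,2) * t), with t in units of 2N generations."""
    return 1 - exp(-comb(k, 2) * t)

def to_generations(t, two_n):
    """Convert continuous time t back to a discrete generation, floor(2N * t)."""
    return floor(two_n * t)

for k in (3, 4, 5, 6):
    print(k, round(coalescence_cdf(k, 0.5), 4))
print(to_generations(0.5, 20000))
```

Larger k gives a larger rate C(k,2), so the first coalescence tends to happen sooner.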
Visualization
• Plots of $P\left(T_k^c \le t\right)$ for k = 3, 4, 5, 6
[Figure: curves of $P\left(T_k^c \le t\right)$ against t, one each for k=3, k=4, k=5, k=6]
Continuous Coalescent Time Scale
• In the continuous-time model, the time constant is a measure of ancestral population size, with the original at time 0, ½ the original at time 0.5, and ¼ at time 1.0
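[Figure: genealogy of 6 genes (labeled 1 to 6) with a time axis t marked at 0.0, 0.5, 1.0, 1.3 and a second axis, labeled "Population size", marked at 0, N, 2N, 2.6N]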
A Coalescent Model
• The continuous coalescent lends itself to generative models
• The following algorithm constructs a plausible genealogy for n genes
• This model runs backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture
1. Start with k = n genes
2. Simulate the waiting time, $T_k^c$, to the next event, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs
4. Merge i and j into one gene and decrease the sample size by one, k ← k - 1
5. Repeat from step 2 while k > 1
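The five steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the event-list representation, the node labelling, and the function name `simulate_coalescent` are all assumptions.

```python
import random
from math import comb

def simulate_coalescent(n, rng=None):
    """Return a list of (time, (child_a, child_b), parent) merge events."""
    rng = rng or random.Random()
    lineages = list(range(n))   # step 1: k = n active lineages, labelled 0..n-1
    next_label = n              # labels for newly created ancestors
    t = 0.0
    events = []
    while len(lineages) > 1:    # step 5: repeat while k > 1
        k = len(lineages)
        t += rng.expovariate(comb(k, 2))   # step 2: waiting time ~ Exp(C(k,2))
        i, j = rng.sample(range(k), 2)     # step 3: uniform random pair
        a, b = lineages[i], lineages[j]
        # step 4: merge the pair into one ancestor, so k decreases by one
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(next_label)
        events.append((t, (a, b), next_label))
        next_label += 1
    return events

for time, pair, parent in simulate_coalescent(6, random.Random(1)):
    print(f"t={time:.3f}: {pair} -> {parent}")
```

Times are in coalescent units; multiplying by 2N converts them to generations. The time of the last event is the height of the simulated tree.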
Properties of a Coalescent Tree
• The height, H_n, of the tree is the sum of the time epochs, T_j, during which there are j = n, n-1, n-2, …, 2 ancestors.
• The distribution of Hn amounts to a convolution of the exponential variables whose result is:
• Where
• With
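$$P\left(H_n \le t\right) = \sum_{k=1}^{n} e^{-\binom{k}{2}t}\,(-1)^{k-1}(2k-1)\frac{F(k)}{G(k)}$$
$$F(k) = n(n-1)(n-2)\cdots(n-k+1), \qquad G(k) = n(n+1)(n+2)\cdots(n+k-1)$$
$$E(H_n) = \sum_{j=2}^{n} E(T_j) = 2\sum_{j=2}^{n}\frac{1}{j(j-1)} = 2\left(1-\frac{1}{n}\right)$$
$$\mathrm{Var}(H_n) = \sum_{j=2}^{n}\mathrm{Var}(T_j) = 4\sum_{j=2}^{n}\frac{1}{j^2(j-1)^2}$$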
As n → ∞, E(H_n) → 2, and, if n = 2, E(H_2) = 1.
Thus, the expected waiting time for n genes to find their common ancestor is less than twice the expected time for 2!
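These tree-height formulas can be evaluated numerically. The sketch below is a direct transcription of the sums, with F(k) and G(k) built up as running products; the function names are illustrative.

```python
from math import comb, exp

def expected_height(n):
    """E(H_n) = sum over j = 2..n of 2 / (j (j - 1)) = 2 (1 - 1/n)."""
    return sum(2 / (j * (j - 1)) for j in range(2, n + 1))

def variance_height(n):
    """Var(H_n) = sum over j = 2..n of 4 / (j^2 (j - 1)^2)."""
    return sum(4 / (j ** 2 * (j - 1) ** 2) for j in range(2, n + 1))

def height_cdf(n, t):
    """P(H_n <= t) as the alternating sum over k = 1..n."""
    total = 0.0
    f = g = 1.0
    for k in range(1, n + 1):
        f *= n - k + 1   # F(k) = n (n-1) ... (n-k+1)
        g *= n + k - 1   # G(k) = n (n+1) ... (n+k-1)
        total += exp(-comb(k, 2) * t) * (-1) ** (k - 1) * (2 * k - 1) * f / g
    return total

for n in (2, 5, 10, 100):
    print(n, expected_height(n), variance_height(n), height_cdf(n, 1.0))
```

For n = 2 the distribution function reduces to 1 - e^{-t}, the exponential case with rate 1, which gives a quick sanity check on the general formula.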