Linkage Disequilibrium Granovsky Ilana and Berliner Yaniv Computational Genetics 19.06.03.
Linkage Disequilibrium
Granovsky Ilana and Berliner Yaniv
Computational Genetics
19.06.03
What is Linkage Disequilibrium?
• When the occurrence of pairs of specific alleles at different loci on the same haplotype is not independent, the deviation from independence is termed linkage disequilibrium.
• In general, linkage disequilibrium is seen as an association between one specific allele at one locus and another specific allele at a second locus.
Linkage Disequilibrium Coefficient Definitions
                                                Marker 2
Marker 1                         Allele 1 (probability = p2)   Allele 2 (probability = 1-p2)
Allele 1 (probability = p1)      X1: p1*p2+D11                 X2: p1*(1-p2)-D11
Allele 2 (probability = 1-p1)    X3: (1-p1)*p2-D11             X4: (1-p1)*(1-p2)+D11
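• Xi: number of observations in cell i (X1+X2+X3+X4 = n)
• D11: coefficient of gametic linkage disequilibrium between allele 1 at locus 1 and allele 1 at locus 2
D11 = E[X1]E[X4] - E[X2]E[X3], with the expectations taken for a single observation (n = 1), so that E[Xi] is the probability of cell i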
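As a check on the definitions in the table, substituting the four cell probabilities into P1*P4 - P2*P3 (where Pi is the probability of cell i) returns D11. The shorthand q1 = 1 - p1, q2 = 1 - p2 and D = D11 is introduced here for brevity and does not appear in the slides.

```latex
\begin{align*}
P_1 P_4 - P_2 P_3
  &= (p_1 p_2 + D)(q_1 q_2 + D) - (p_1 q_2 - D)(q_1 p_2 - D) \\
  &= p_1 p_2 q_1 q_2 + D\,(p_1 p_2 + q_1 q_2) + D^2
     - p_1 q_2 q_1 p_2 + D\,(p_1 q_2 + q_1 p_2) - D^2 \\
  &= D\,(p_1 p_2 + q_1 q_2 + p_1 q_2 + q_1 p_2) \\
  &= D\,(p_1 + q_1)(p_2 + q_2) = D .
\end{align*}
```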
Population-based sampling and the EH program
• We wish to test the absence of disequilibrium between allele A at locus 1 and allele B at locus 2 (DAB=0).
• The sample of individuals consists of genotype data, so the haplotypes cannot be fully distinguished in every individual.
Table of all possible two-locus genotypes
              Locus 1
Locus 2   AA    Aa    aa
BB        k1    k2    k3
Bb        k4    k5    k6
bb        k7    k8    k9
In cell 5 (the double heterozygote) the phase is ambiguous: it can be either AB/ab or Ab/aB.
Analysis of likelihood
• We maximize the log likelihood of the observed data, summed here over the nine genotype cells:
$$\ln[L(\text{data})] = \sum_{i=1}^{9} k_i \ln(p_i)$$
where k_i is the count in cell i and p_i is the probability of cell i.
• For cell 1: p1 = [P(A B)]^2
• For cell 4: p4 = 2P(A B)P(A b)
• For cell 5: p5 = P(A B/a b) + P(A b/a B) = 2P(A B)P(a b) + 2P(A b)P(a B)
Table of probabilities in each cell
              Locus 1
Locus 2   AA               Aa                              aa
BB        p(A B)^2         2p(A B)p(a B)                   p(a B)^2
Bb        2p(A B)p(A b)    2p(A B)p(a b) + 2p(A b)p(a B)   2p(a B)p(a b)
bb        p(A b)^2         2p(A b)p(a b)                   p(a b)^2
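The cell probabilities and the log likelihood can be sketched in a few lines of Python. This is a minimal illustration, not the EH program's code: the function names are hypothetical, and the counts are the ones from the Example slide of this presentation. The no-association case is evaluated by setting each haplotype frequency to the product of the gene-counting allele frequencies.

```python
import math

def cell_probabilities(p_AB, p_Ab, p_aB, p_ab):
    """Probabilities p1..p9 of the nine genotype cells (rows BB, Bb, bb; columns AA, Aa, aa)."""
    return [
        p_AB ** 2,        2 * p_AB * p_aB,                    p_aB ** 2,
        2 * p_AB * p_Ab,  2 * p_AB * p_ab + 2 * p_Ab * p_aB,  2 * p_aB * p_ab,
        p_Ab ** 2,        2 * p_Ab * p_ab,                    p_ab ** 2,
    ]

def log_likelihood(counts, haplotype_freqs):
    """ln L = sum over cells of k_i * ln(p_i); empty cells contribute nothing."""
    probs = cell_probabilities(*haplotype_freqs)
    return sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)

# Genotype counts k1..k9 from the Example slide.
counts = [10, 10, 3, 15, 50, 13, 5, 13, 10]
k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
n_chrom = 2 * sum(counts)

# Allele frequencies by gene counting (a homozygote carries two copies).
p_A = (2 * (k1 + k4 + k7) + (k2 + k5 + k8)) / n_chrom
p_B = (2 * (k1 + k2 + k3) + (k4 + k5 + k6)) / n_chrom

# Under no association (DAB = 0) haplotype frequencies are products of allele frequencies.
h0_freqs = (p_A * p_B, p_A * (1 - p_B), (1 - p_A) * p_B, (1 - p_A) * (1 - p_B))
lnL_h0 = log_likelihood(counts, h0_freqs)
print(round(lnL_h0, 2))
```

The printed value should come out close to the Ln(L) that the presentation reports for the no-association hypothesis (-252.68).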
Analysis of likelihood
• We maximize the likelihood above over the possible haplotype frequencies (p(A), p(B) and DAB).
• This likelihood is then compared with the maximum likelihood obtained when DAB is set equal to 0 (absence of linkage disequilibrium).
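One common way to carry out this maximization for two biallelic loci is an EM iteration, in which the double heterozygotes (cell 5) are split between the two phases in proportion to the current haplotype frequency estimates. The sketch below is an assumption about how such a maximization can be done, not a description of EH's internals; the function name, the starting values and the fixed iteration count are illustrative choices. The counts are those of the Example slide.

```python
def em_haplotype_frequencies(counts, n_iter=200):
    """Estimate (p_AB, p_Ab, p_aB, p_ab) from genotype counts k1..k9 by EM."""
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
    n_chrom = 2 * sum(counts)

    # Haplotype counts contributed by the eight phase-unambiguous cells.
    base_AB = 2 * k1 + k2 + k4
    base_aB = 2 * k3 + k2 + k6
    base_Ab = 2 * k7 + k4 + k8
    base_ab = 2 * k9 + k6 + k8

    p_AB = p_Ab = p_aB = p_ab = 0.25  # arbitrary starting point
    for _ in range(n_iter):
        # E-step: expected share of cell 5 that has phase AB/ab rather than Ab/aB.
        cis = p_AB * p_ab
        trans = p_Ab * p_aB
        frac_cis = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: recount haplotypes, each double heterozygote adding one of each kind.
        p_AB = (base_AB + k5 * frac_cis) / n_chrom
        p_ab = (base_ab + k5 * frac_cis) / n_chrom
        p_Ab = (base_Ab + k5 * (1 - frac_cis)) / n_chrom
        p_aB = (base_aB + k5 * (1 - frac_cis)) / n_chrom
    return p_AB, p_Ab, p_aB, p_ab

counts = [10, 10, 3, 15, 50, 13, 5, 13, 10]
p_AB, p_Ab, p_aB, p_ab = em_haplotype_frequencies(counts)
D_AB = p_AB - (p_AB + p_Ab) * (p_AB + p_aB)
print(round(p_AB, 3), round(p_Ab, 3), round(p_aB, 3), round(p_ab, 3), round(D_AB, 3))
```

For the example counts the estimates should land near the "with association" haplotype frequencies reported in the EH output (about 0.33, 0.19, 0.15, 0.33).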
Example
Genotype counts:
              Locus 1
Locus 2   AA       Aa       aa
BB        K1=10    K2=10    K3=3
Bb        K4=15    K5=50    K6=13
bb        K7=5     K8=13    K9=10

Haplotype counts (k5 censored):
      A     a
B     45    29
b     38    46

Haplotype frequencies (k5 censored):
      A       a
B     0.28    0.18
b     0.24    0.29

* When k5 is censored, all the haplotypes can be uniquely determined.
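Example cont.
• P(A) = 0.28 + 0.24 = 0.525
• P(B) = 0.28 + 0.18 = 0.468
• DAB = p(A B) - p(A)p(B) = 0.28 - 0.525*0.468 = 0.0387
(The haplotype frequencies are shown rounded to two decimals; the sums and DAB are computed from the unrounded values.)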
* Biased example due to the elimination of the 50 observations in k5.
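The censored calculation can be reproduced by direct counting: every genotype other than the double heterozygote determines its two haplotypes uniquely. The following is a small sketch using the example counts; the function name and dictionary keys are hypothetical.

```python
def censored_haplotype_counts(k1, k2, k3, k4, k5, k6, k7, k8, k9):
    """Count haplotypes from the eight unambiguous cells; k5 is deliberately ignored."""
    return {
        "AB": 2 * k1 + k2 + k4,
        "aB": 2 * k3 + k2 + k6,
        "Ab": 2 * k7 + k4 + k8,
        "ab": 2 * k9 + k6 + k8,
    }

hap = censored_haplotype_counts(10, 10, 3, 15, 50, 13, 5, 13, 10)
total = sum(hap.values())
freq = {h: n / total for h, n in hap.items()}

p_A = freq["AB"] + freq["Ab"]
p_B = freq["AB"] + freq["aB"]
D_AB = freq["AB"] - p_A * p_B
print(hap, round(p_A, 3), round(p_B, 3), round(D_AB, 3))
```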
EH program input file format
• EH = estimated haplotype.
- Input file: EH.dat
Line 1: Number of alleles at each of the two loci
Line 2: k1 k4 k7
Line 3: k2 k5 k8
Line 4: k3 k6 k9
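A short sketch of how the example counts could be laid out in this format. It follows only the four-line layout listed above; the helper name is hypothetical, and details such as the separator (a single space is assumed here) should be checked against the EH documentation before use.

```python
def eh_input_text(counts, n_alleles=(2, 2)):
    """Build the text of an EH.dat-style file for two biallelic loci.

    counts is k1..k9 in the order of the genotype table (rows BB, Bb, bb).
    Assumption: values on a line are separated by single spaces.
    """
    k1, k2, k3, k4, k5, k6, k7, k8, k9 = counts
    lines = [
        f"{n_alleles[0]} {n_alleles[1]}",  # Line 1: number of alleles at each locus
        f"{k1} {k4} {k7}",                 # Line 2: the AA column of the table
        f"{k2} {k5} {k8}",                 # Line 3: the Aa column
        f"{k3} {k6} {k9}",                 # Line 4: the aa column
    ]
    return "\n".join(lines) + "\n"

text = eh_input_text([10, 10, 3, 15, 50, 13, 5, 13, 10])
print(text, end="")
```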
EH program output file
• Output: estimates of gene frequencies (including k5)
Locus   Allele 1   Allele 2
1       0.515      0.484
2       0.480      0.519
# of typed individuals: 129
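These gene frequencies do not depend on phase, so they can be checked by simple gene counting over all nine cells, k5 included. A minimal sketch with the example counts (variable names are illustrative):

```python
# Genotype counts k1..k9 (rows BB, Bb, bb; columns AA, Aa, aa).
k1, k2, k3, k4, k5, k6, k7, k8, k9 = 10, 10, 3, 15, 50, 13, 5, 13, 10

n_individuals = k1 + k2 + k3 + k4 + k5 + k6 + k7 + k8 + k9
n_chrom = 2 * n_individuals

# Each homozygote carries two copies of the allele, each heterozygote one.
p_A = (2 * (k1 + k4 + k7) + (k2 + k5 + k8)) / n_chrom
p_B = (2 * (k1 + k2 + k3) + (k4 + k5 + k6)) / n_chrom

print(n_individuals, round(p_A, 3), round(p_B, 3))
```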
EH program output file
Allele at   Allele at   Haplotype frequency
locus 1     locus 2     Independent   w/ association
1           1           0.248         0.328
1           2           0.268         0.188
2           1           0.232         0.153
2           2           0.252         0.332
Chi square test
                                   df   Ln(L)     Chi-square
H0: No association                 2    -252.68   0.00
H1: Allelic association allowed    3    -248.23   8.89
• The difference between the two chi-square values is 8.89.
• The P-value associated with this chi-square (with 1 df) is 0.002873.
• It is clear that k5 contributes significant information.
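The test statistic is twice the difference of the two log likelihoods, and the single degree of freedom is the difference between the two df values (3 - 2). For 1 df the chi-square tail probability has the closed form erfc(sqrt(x/2)), so the P-value can be checked with the standard library alone. The function name below is hypothetical.

```python
import math

def lrt_p_value_1df(lnL_h0, lnL_h1):
    """Likelihood-ratio statistic and its chi-square P-value for 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnL_h1 - lnL_h0))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = lrt_p_value_1df(-252.68, -248.23)
print(round(stat, 2), round(p_value, 4))
```

Because the log likelihoods are given to two decimals, the statistic and P-value computed this way can differ slightly in the last digits from the values EH prints.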
Summary
Haplotype frequencies
              Without k5                   With k5
Haplotype     Independent   Associated     Independent   Associated
A B           0.246         0.284          0.247         0.327
A b           0.279         0.24           0.267         0.187
a B           0.222         0.183          0.232         0.152
a b           0.252         0.291          0.251         0.331
p(A)          0.525                        0.515
p(B)          0.468                        0.48
DAB           0.038                        0.079
Multiallelic genotype information in EH program
Line 1: Number of alleles at each locus
Subsequent lines:
                              Locus 2
Locus 1   1/1   1/2   2/2   1/3   2/3   3/3
1/1       a1    b1    c1    d1    e1    f1
1/2       a2    b2    c2    d2    e2    f2
2/2       a3    b3    c3    d3    e3    f3
1/3       a4    b4    c4    d4    e4    f4
2/3       a5    b5    c5    d5    e5    f5
3/3       a6    b6    c6    d6    e6    f6
Multilocus genotype data
                        Locus 3
Locus 1   Locus 2   1/1   1/2   2/2
1/1       1/1       a1    b1    c1
          1/2       a2    b2    c2
          2/2       a3    b3    c3
1/2       1/1       a4    b4    c4
          1/2       a5    b5    c5
          2/2       a6    b6    c6
2/2       1/1       a7    b7    c7
          1/2       a8    b8    c8
          2/2       a9    b9    c9
Ex. 23
• Full data solution file:
• Censored data solution file.
Censored data
1/1 haplotype data
                                    Locus 2
Locus 1   1/1   1/2   1/3   1/4   2/2   2/3   2/4   3/3   3/4   4/4
1/1       10    5     6     4     1     2     3     1     2     0
1/2       6     3     3     3     1     2     1     1     2     1
2/2       12    9     8     11    3     2     5     1     0     3
1/3       1     2     2     1     1     1     1     0     4     2
2/3       0     2     2     8     2     2     9     3     6     8
3/3       8     6     4     10    3     3     8     5     9     13
Haplotypes from censored genotype data
Haplotype counts:
                     Allele at locus 2
Allele at locus 1    1       2       3       4
1                    42      14      13      12
2                    58      25      16      31
3                    37      26      29      63

Haplotype frequencies:
                     Allele at locus 2
Allele at locus 1    1       2       3       4
1                    0.11    0.038   0.035   0.032
2                    0.158   0.068   0.044   0.085
3                    0.10    0.07    0.079   0.172
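The censored haplotype counts can be reproduced by skipping every double heterozygote and, for each remaining genotype pair, pairing the alleles of the two loci. The sketch below uses the genotype table of this exercise; the function and variable names are hypothetical.

```python
# Genotypes in table order: rows are locus 1, columns are locus 2.
LOCUS1_GENOTYPES = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]
LOCUS2_GENOTYPES = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 2),
                    (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
GENOTYPE_COUNTS = [
    [10, 5, 6, 4, 1, 2, 3, 1, 2, 0],
    [6, 3, 3, 3, 1, 2, 1, 1, 2, 1],
    [12, 9, 8, 11, 3, 2, 5, 1, 0, 3],
    [1, 2, 2, 1, 1, 1, 1, 0, 4, 2],
    [0, 2, 2, 8, 2, 2, 9, 3, 6, 8],
    [8, 6, 4, 10, 3, 3, 8, 5, 9, 13],
]

def censored_haplotype_counts(genos1, genos2, counts):
    """Count (locus 1 allele, locus 2 allele) haplotypes, censoring double heterozygotes."""
    hap = {}
    for g1, row in zip(genos1, counts):
        for g2, n in zip(genos2, row):
            if g1[0] != g1[1] and g2[0] != g2[1]:
                continue  # heterozygous at both loci: phase unknown, so censored
            # At least one locus is homozygous, so pairing the alleles in order
            # gives the two haplotypes unambiguously.
            for a1, a2 in zip(g1, g2):
                hap[(a1, a2)] = hap.get((a1, a2), 0) + n
    return hap

hap = censored_haplotype_counts(LOCUS1_GENOTYPES, LOCUS2_GENOTYPES, GENOTYPE_COUNTS)
total = sum(hap.values())
for a1 in (1, 2, 3):
    print(a1, [hap[(a1, a2)] for a2 in (1, 2, 3, 4)],
          [round(hap[(a1, a2)] / total, 3) for a2 in (1, 2, 3, 4)])
```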