QTL ANALYSIS FOR YIELD AND OTHER AGRONOMIC TRAITS IN BARLEY.
Basic QTL Analysis
Is there an association between marker genotype and quantitative trait phenotype?
- Classify progeny by marker genotype
- Compare phenotypic means between classes (t-test or ANOVA)
- Significant difference = marker linked to a QTL
- Difference between means = estimate of QTL effect
g = (µ1 - µ2)/2
g = genotypic effect
µ1 = trait mean for genotypic class AA
µ2 = trait mean for genotypic class aa
[Figure: trait value (y) plotted against genotypic class (x), with classes aa and AA; βo is the intercept]
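The classify-and-compare procedure above can be sketched in a few lines of Python. This is a minimal illustration on hypothetical toy data; the function name `genotypic_effect` and the class labels are assumptions, not part of the original analysis.

```python
from statistics import mean

def genotypic_effect(genotypes, phenotypes, high="AA", low="aa"):
    """Group phenotypes by marker genotype and return (mu1, mu2, g).

    g = (mu1 - mu2) / 2, where mu1 and mu2 are the trait means of the
    two genotypic classes (hypothetical helper, for illustration only).
    """
    groups = {}
    for geno, y in zip(genotypes, phenotypes):
        groups.setdefault(geno, []).append(y)
    mu1 = mean(groups[high])
    mu2 = mean(groups[low])
    return mu1, mu2, (mu1 - mu2) / 2

# Hypothetical toy data, not taken from the source
genos = ["AA", "aa", "AA", "aa", "AA", "aa"]
traits = [5.0, 3.0, 6.0, 2.0, 4.0, 4.0]
mu1, mu2, g = genotypic_effect(genos, traits)
print(mu1, mu2, g)
```

Here the AA class averages 5.0 and the aa class 3.0, so the estimated genotypic effect is g = 1.0.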
Notations for single-QTL models in backcross and F2 populations

Model                  Genotype   Value
Backcross (Qq x QQ)    QQ         µ1
                       Qq         µ2
  Genetic effect g = 0.5(µ1 - µ2)
DH (qq x QQ)           QQ         µ1
                       qq         µ2
  Genetic effect g = 0.5(µ1 - µ2)
F2 (Qq x Qq)           QQ         µ1
                       Qq         µ2
                       qq         µ3
  Additive effect a = 0.5(µ1 - µ3)
  Dominance effect d = 0.5(2µ2 - µ1 - µ3)
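The F2 formulas translate directly into code. A minimal sketch, assuming the three class means are already estimated; `f2_effects` is a hypothetical name and the means are made-up numbers.

```python
def f2_effects(mu1, mu2, mu3):
    """Additive (a) and dominance (d) effects from F2 class means.

    mu1, mu2, mu3 = trait means of QTL genotypes QQ, Qq, qq.
    """
    a = 0.5 * (mu1 - mu3)             # half the difference between homozygotes
    d = 0.5 * (2 * mu2 - mu1 - mu3)   # deviation of Qq from the midpoint
    return a, d

# Hypothetical class means
a, d = f2_effects(10.0, 9.0, 6.0)
print(a, d)
```

With these means the homozygote midpoint is 8.0, so a = 2.0 and the heterozygote sits d = 1.0 above the midpoint.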
Single-marker analysis
• How it works
  – Finds associations between marker genotype and trait value
• When to use
  – Order of markers unknown or incomplete maps
  – Quick scan
  – Find best possible QTLs
  – Identify missing or incorrectly formatted data
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined
[Figure: marker A and putative QTL Q separated by recombination fraction r]
r = recombination fraction
$y_j = \mu + f(A_j) + \varepsilon_j$
yj = trait value for the jth individual in the population
μ = population mean
f(Aj) = function of the marker genotype of the jth individual
εj = residual associated with the jth individual
Single-marker analysis in backcross progeny
• Parents: AAQQ x aaqq
• Backcross: aaqq x AaQq or AaQq x AAQQ

BC progeny (x aaqq)   BC progeny (x AAQQ)   Expected frequency
AaQq                  AAQQ                  0.5(1 - r)
Aaqq                  AAQq                  0.5r
aaQq                  AaQQ                  0.5r
aaqq                  AaQq                  0.5(1 - r)

r is the recombination frequency between A and Q.
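The expected frequencies in the table follow from the F1 gametes: the F1 AaQq carries the parental haplotypes AQ and aq, so parental gametes appear with total frequency 1 - r and recombinant gametes with total frequency r. A minimal sketch for the aaqq backcross; the function name is hypothetical.

```python
def bc_progeny_frequencies(r):
    """Expected progeny frequencies of the backcross aaqq x AaQq.

    The F1 (from AAQQ x aaqq) has haplotypes AQ/aq, so gametes AQ and aq
    each occur at 0.5(1 - r) and recombinants Aq and aQ each at 0.5r.
    The aaqq parent contributes only aq gametes.
    """
    return {
        "AaQq": 0.5 * (1 - r),  # from parental gamete AQ
        "Aaqq": 0.5 * r,        # from recombinant gamete Aq
        "aaQq": 0.5 * r,        # from recombinant gamete aQ
        "aaqq": 0.5 * (1 - r),  # from parental gamete aq
    }

freqs = bc_progeny_frequencies(0.1)
print(freqs)
```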
Expected QTL genotypic frequencies conditional on marker genotypes (backcross AaQq x AAQQ)

Joint frequencies
Marker genotype   Observed count   Marginal frequency   QQ          Qq
AA                n1               0.5                  0.5(1-r)    0.5r
Aa                n2               0.5                  0.5r        0.5(1-r)

Conditional frequencies
Marker genotype   Observed count   Marginal frequency   QQ     Qq     Expected trait value
AA                n1               0.5                  1-r    r      (1-r)µ1 + rµ2
Aa                n2               0.5                  r      1-r    rµ1 + (1-r)µ2
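The expected trait value of each marker class is the conditional-frequency-weighted average of the QTL genotype means. A minimal sketch, assuming µ1 and µ2 are the means of QTL genotypes QQ and Qq as in the table; the helper name and numbers are hypothetical.

```python
def expected_marker_class_means(mu1, mu2, r):
    """Expected trait value of marker classes AA and Aa (backcross to AAQQ).

    mu1 = mean of QTL genotype QQ, mu2 = mean of QTL genotype Qq.
    """
    # Conditional frequencies P(QTL genotype | marker genotype)
    cond = {
        "AA": {"QQ": 1 - r, "Qq": r},
        "Aa": {"QQ": r, "Qq": 1 - r},
    }
    qtl_means = {"QQ": mu1, "Qq": mu2}
    return {
        marker: sum(p * qtl_means[q] for q, p in probs.items())
        for marker, probs in cond.items()
    }

means = expected_marker_class_means(mu1=12.0, mu2=8.0, r=0.2)
print(means)
```

The difference between the two marker classes is (1 - 2r)(µ1 - µ2), here 0.6 x 4.0 = 2.4 instead of the full 4.0, which is why single-marker analysis underestimates QTL effects when the marker is not tightly linked.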
Single-marker analysis
- Simple t-test
- Analysis of variance
- Linear regression
- Likelihood
[Figure: marker A and putative QTL Q separated by recombination fraction r]
Simple t-test using backcross progeny
Yj(i)k = μ + Mi + g(M)j(i) + ei(j)k
$t_M = \dfrac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{s_M^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$
H0: µAa - µaa = 0, which holds when (a + d) = 0 or r = 0.5
t-distribution with df = N – 2
If tM is significant, then a QTL is declared to be near the marker
Yj(i)k = trait value for individual j with marker genotype i in replication k
μ = population mean
Mi = effect of marker genotype i
g(M)j(i) = genotypic effect that cannot be explained by the marker genotype
ei(j)k = error term
µAa = trait mean for genotypic class Aa
µaa = trait mean for genotypic class aa
s²M = pooled variance within the two classes
n1, n2 = number of individuals in classes Aa and aa (N = n1 + n2)
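Alternative form, using the separate within-class variances s²1 (class Aa) and s²2 (class aa):
$t_M = \dfrac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$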
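The pooled-variance t statistic can be computed with the standard library alone. A minimal sketch on hypothetical data; `pooled_t_test` is an illustrative name, and the p-value lookup against the t-distribution with N - 2 df is left out because the standard library has no t CDF.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(y_Aa, y_aa):
    """Return (t_M, df) for the two marker classes using a pooled variance."""
    n1, n2 = len(y_Aa), len(y_aa)
    # Pooled within-class variance s_M^2, weighted by each class's df
    s2 = ((n1 - 1) * variance(y_Aa) + (n2 - 1) * variance(y_aa)) / (n1 + n2 - 2)
    t = (mean(y_Aa) - mean(y_aa)) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical backcross phenotypes for marker classes Aa and aa
t, df = pooled_t_test([5.0, 6.0, 4.0], [3.0, 2.0, 4.0])
print(t, df)
```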
Analysis of variance using backcross progeny
H0: µAa - µaa = 0, which holds when (a + d) = 0 or r = 0.5
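Source             df          MS (mean square)   Expected MS
Total genetic      N - 1       MSG                $\sigma_e^2 + b\sigma_G^2$
  Marker           1           MSM                $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2] + bc(1-2r)^2 a^2$
  G(Marker)        N - 2       MSG(M)             $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2]$
Residual           N(b - 1)    MSE                $\sigma_e^2$

$F = \dfrac{MSM}{MSG(M)}$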
F-distribution with 1 and N – 2 df
If F is significant, then a QTL is declared to be near the marker
F = t² when the numerator has 1 df
N = number of individuals in the population
b = number of replications
r = recombination fraction
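The relation F = t² can be checked numerically. A minimal sketch, assuming a single replication (b = 1), so that the G(Marker) and residual mean squares cannot be separated and the within-marker-class mean square serves as the denominator; `marker_anova_f` is a hypothetical name and the data are the same made-up values as in a two-class example.

```python
from statistics import mean

def marker_anova_f(y_Aa, y_aa):
    """F = MS(marker) / MS(within marker classes), one replication (b = 1)."""
    all_y = y_Aa + y_aa
    grand = mean(all_y)
    n = len(all_y)
    # Between-class sum of squares (marker), 1 df
    ss_marker = sum(len(g) * (mean(g) - grand) ** 2 for g in (y_Aa, y_aa))
    # Within-class sum of squares, N - 2 df
    ss_within = sum((y - mean(g)) ** 2 for g in (y_Aa, y_aa) for y in g)
    ms_marker = ss_marker / 1
    ms_within = ss_within / (n - 2)
    return ms_marker / ms_within

F = marker_anova_f([5.0, 6.0, 4.0], [3.0, 2.0, 4.0])
print(F)
```

For these data the pooled t statistic is √6, and F comes out as 6, matching F = t².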
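Analysis of variance using SAS (a simple example)

data a;
  /* Marker genotypes are character values, so Marker1 and Marker2 need $ */
  input Individuals Trait1 Marker1 $ Marker2 $;
  cards;
1 1.57 A B
2 1.35 B A
3 10.7 B B
;
/* … remaining individuals … */
run;

proc glm data=a;
  class Marker1 Marker2;
  model Trait1 = Marker1 Marker2;
  lsmeans Marker1 Marker2;
run;
quit;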
[Figure: trait value (y) plotted against dummy-coded genotypic class (x), with classes aa and Aa; βo is the intercept]
Linear regression using backcross progeny
$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$
H0: µAa - µaa = 0, which holds when (a + d) = 0 or r = 0.5
Dummy variables:
aa = -1
Aa = 1
yj= trait value for the jth individual
xj= dummy variable
βo= intercept for the regression
β1= slope for the regression
εj = random error
Expectations:
E(βo) = 0.5 (µAa + µaa) = Mean for the trait
E(β1) = 0.5 (1 - 2r) (µAa - µaa) = (1 - 2r) g = 0.5 (a + d) (1 - 2r)
R²: percentage of the phenotypic variance explained by the marker (used as an estimate of the variance explained by the QTL)
Linear regression using backcross progeny (cont.)
Interpretation of results depends on the coding of the dummy variables.
[Figure, two panels: trait value (y) against genotypic class (x), classes aa and Aa]
Left panel: y = 3 + x + e; µ = 3, µAa = 4, µaa = 2, g = 0.5(µAa - µaa) = 1
Right panel: y = 3 - x + e; µ = 3, µAa = 2, µaa = 4, g = 0.5(µAa - µaa) = -1
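The effect of dummy coding can be seen by fitting the same data twice with opposite codings. A minimal least-squares sketch on hypothetical data; `fit_dummy_regression` is an illustrative name. Because the model has one parameter per class, the fit passes through both class means, so β0 = 0.5(µAa + µaa) and β1 = ±0.5(µAa - µaa).

```python
from statistics import mean

def fit_dummy_regression(genotypes, y, coding):
    """Ordinary least squares for y = b0 + b1 * x + e, x = dummy-coded marker."""
    x = [coding[g] for g in genotypes]
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: class Aa averages 4, class aa averages 2
genos = ["Aa", "Aa", "Aa", "aa", "aa", "aa"]
y = [4.0, 5.0, 3.0, 2.0, 1.0, 3.0]
b0, b1 = fit_dummy_regression(genos, y, {"aa": -1, "Aa": 1})
b0_flip, b1_flip = fit_dummy_regression(genos, y, {"aa": 1, "Aa": -1})
print(b0, b1, b0_flip, b1_flip)
```

Both fits give an intercept of 3, but the slope is +1 under aa = -1, Aa = 1 and -1 under the reversed coding, so the sign of β1 only identifies the favourable allele once the coding is known.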
A likelihood approach using backcross progeny
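Joint distribution function:
$L(\mu_1, \mu_2, \sigma^2, r) = \prod_{i=1}^{N} \sum_{j=1}^{2} p(Q_j \mid M_i) \, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right]$
where p(Qj | Mi) is the conditional frequency of QTL genotype j given the marker genotype of individual i (1 - r or r).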
A likelihood approach using backcross progeny (cont.)
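$\ln L(\mu_1, \mu_2, \sigma^2, r) = \sum_{i=1}^{N} \ln \left\{ \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right] \right\} - \frac{N}{2}\ln(2\pi\sigma^2)$

Under µ1 = µ2 = µ:
$\ln L(\mu, \sigma^2) = -\sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2} - \frac{N}{2}\ln(2\pi\sigma^2)$

Under r = 0.5:
$\ln L(\mu_1, \mu_2, \sigma^2, r = 0.5) = \sum_{i=1}^{N} \ln \left\{ 0.5 \left( \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right] + \exp\left[ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right] \right) \right\} - \frac{N}{2}\ln(2\pi\sigma^2)$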
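The full log-likelihood can be evaluated directly as a two-component normal mixture per individual. A minimal sketch, assuming marker classes AA and Aa with the conditional frequencies p(QQ | AA) = 1 - r and p(QQ | Aa) = r from the conditional-frequency table; `bc_log_likelihood` is a hypothetical name. It only evaluates ln L for given parameters; maximizing it (for example by EM) is not shown.

```python
from math import exp, log, pi

def bc_log_likelihood(y, markers, mu1, mu2, sigma2, r):
    """ln L(mu1, mu2, sigma2, r) for backcross progeny with marker classes AA/Aa.

    Each individual's density is a mixture of two normals (QTL genotypes
    QQ and Qq) weighted by p(Q_j | M_i). Very extreme y values can make the
    mixture underflow to 0; this sketch does not guard against that.
    """
    # (p(QQ | marker), p(Qq | marker))
    cond = {"AA": (1 - r, r), "Aa": (r, 1 - r)}
    n = len(y)
    total = 0.0
    for yi, m in zip(y, markers):
        p1, p2 = cond[m]
        mix = (p1 * exp(-(yi - mu1) ** 2 / (2 * sigma2))
               + p2 * exp(-(yi - mu2) ** 2 / (2 * sigma2)))
        total += log(mix)
    # The 1/sqrt(2*pi*sigma2) factors summed over N individuals
    return total - 0.5 * n * log(2 * pi * sigma2)

# Hypothetical data
y = [1.0, 2.0, 3.0]
markers = ["AA", "Aa", "AA"]
ll = bc_log_likelihood(y, markers, mu1=1.0, mu2=3.0, sigma2=2.0, r=0.1)
print(ll)
```

Setting µ1 = µ2 collapses each mixture to a single normal, and setting r = 0.5 gives equal weights, reproducing the two restricted log-likelihoods above.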
A likelihood approach using backcross progeny (cont.)
H0: µAa - µaa = 0, which holds when (a + d) = 0 or r = 0.5
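$G = -2\ln\dfrac{L(r = 0.5)}{L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r})} = 2\left[\ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(r = 0.5)\right]$
G is distributed asymptotically as a chi-square variable with one degree of freedom.

$G = 2\left[\ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(\hat{\mu}_{Aa} = \hat{\mu}_{aa})\right]$
The t-test is approximately equivalent to the likelihood ratio test using this formula.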
G-statistics and LOD score
G-statistics
Likelihood ratio test statistic (LR): compares the likelihood of the data under the alternative hypothesis with that under the null hypothesis (Weller, 1986)
LOD score
LOD: logarithm of the odds ratio, i.e., the base-10 logarithm of the likelihood ratio
LR = 2 ln(10) LOD = 4.605 LOD
LOD = 0.217 LR
10^LOD is interpreted as an odds ratio (probability of observing the data under linkage / probability of observing the same data under no linkage)
No theoretical distribution is needed to interpret a LOD score
Key value: LOD ≥ 3 (H1 is 1000 times more likely than H0, no linkage), approx. p = 0.001
p = probability of a type I error
Type I error: false positive (declaring a QTL when there is no QTL)
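The LR and LOD conversions, and a likelihood ratio for the simplest case, can be sketched as follows. The `marker_lr` helper assumes the marker classes are treated as fixed groups under a normal model with maximum-likelihood variance estimates (equivalent to placing the QTL at the marker, r = 0), in which case G = N ln(RSS0 / RSS1); all function names are hypothetical.

```python
from math import log

def lod_to_lr(lod):
    """LR = 2 ln(10) LOD, approximately 4.605 LOD."""
    return 2 * log(10) * lod

def lr_to_lod(lr):
    """LOD = LR / (2 ln 10), approximately 0.217 LR."""
    return lr / (2 * log(10))

def marker_lr(rss_null, rss_full, n):
    """G = N ln(RSS0 / RSS1) for a normal model with marker classes as groups.

    rss_null = residual sum of squares around the overall mean (H0),
    rss_full = residual sum of squares around the class means (H1).
    """
    return n * log(rss_null / rss_full)

print(lod_to_lr(3.0))  # LR corresponding to the key value LOD = 3
# Hypothetical data: classes Aa = [5, 6, 4] and aa = [3, 2, 4]
G = marker_lr(rss_null=10.0, rss_full=4.0, n=6)
print(G, lr_to_lod(G))
```

In this model G = N ln(1 + t²/(N - 2)), which is why the t-test and the likelihood ratio test lead to nearly the same conclusions.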
Single-marker analysis summary
• Identify marker-trait associations
• Identify missing or incorrectly formatted data
• Genetic map is not required
• Divide the population into subpopulations based on the allelic segregation of individual loci (one marker at a time)
• Get trait means for each subpopulation (genotypic class)
• Determine whether the subpopulation trait means are significantly different
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined