Regression-based linkage analysis
description
Transcript of Regression-based linkage analysis
Regression-based linkage Regression-based linkage analysisanalysis
Shaun Purcell, Pak Sham.
• Penrose (1938)• quantitative trait locus linkage for sib pair data
• Simple regression-based method• squared pair trait difference
• proportion of alleles shared identical by descent
(HE-SD)(X – Y)2 = 2(1 – r) – 2Q( – 0.5) + ^
Haseman-Elston regressionHaseman-Elston regression
IBD
(X - Y)2
= -2Q
210
Expected sibpair allele Expected sibpair allele sharingsharing
+++ + + +---- -- Sib 2
Sib 1
Squared differences (SD)Squared differences (SD)
+++ + + +---- -- Sib 2
Sib 1
Sums versus differencesSums versus differences
• Wright (1997), Drigalenko (1998)
• phenotypic difference discards sib-pair QTL linkage information
• squared pair trait sum provides extra information for linkage
• independent of information from HE-SD
(HE-SS)(X + Y)2 = 2(1 + r) + 2Q( – 0.5) + ^
Squared sums (SS)Squared sums (SS)
+++ + + +---- -- Sib 2
Sib 1
SD and SSSD and SS
+++ + + +---- -- Sib 2
Sib 1
• New dependent variable to increase power• mean corrected cross-product (HE-CP)
XY 2241 )()( YXYX
• other extensions• > 2 sibs in a sibship multiple trait loci and epistasis
• multivariate multiple markers
• binary traits other relative classes
SD + SS ( = CP)SD + SS ( = CP)
+++ + + +---- -- Sib 2
Sib 1
Xu et al Xu et al
• With residual sibling correlation• HE-CP in power, HE-SD in power
2
ˆˆˆ SDSS
• HE-CP
SDSS ww ˆ)1(ˆˆ • Propose a weighting scheme
Variance of SDVariance of SD
Variance of SSVariance of SS
Low sibling correlationLow sibling correlation
Increased sibling correlationIncreased sibling correlation
• Clarify the relative efficiencies of existing HE methods
• Demonstrate equivalence between a new HE method and variance components methods
• Show application to the selection and analysis of extreme, selected samples
Haseman-Elston regressionsHaseman-Elston regressions
HE-SD(X – Y)2 = 2(1 – r) – 2Q( – 0.5) +
HE-SS(X + Y)2 = 2(1 + r) + 2Q( – 0.5) +
HE-CPXY = r + Q( – 0.5) +
NCPs for H-E regressionsNCPs for H-E regressions
(X – Y)2
(X + Y)2
XY
Dependent
8(1 – r )2
8(1 + r )2
1 + r2
Variance ofDependent
2
2
)1(2
)ˆ(
r
VarQ
2
2
)1(2
)ˆ(
r
VarQ
2
2
1
)ˆ(
r
VarQ
NCP persibpair
Weighted H-EWeighted H-E
22
22
)1(1
)1(1
)1(1
)1(1 ˆˆ
ˆ
rr
SDrSSrQQ
Q
• Squared-sums and squared-differences • orthogonal components in the population
• Optimal weighting• inverse of their variances
Weighted H-EWeighted H-E
22
22
)1(
)1()ˆ(
r
rVarQNCP
• A function of• square of QTL variance
• marker informativeness
• complete information = 0.0125
• sibling correlation
• Equivalent to variance components• to second-order approximation
• Rijsdijk et al (2000)
Combining into one Combining into one regressionregression
• New dependent variable :• a linear combination of
• squared-sum
• and squared-difference
• weighted by the population sibling correlation:
)5.0ˆ()1(
)1(4
1
2
)1(
)(
)1(
)(22
2
22
2
2
2
Qr
r
r
r
r
YX
r
YX
)5.0ˆ()1(
)1(4
1
2
)1(
)(
)1(
)(22
2
22
2
2
2
Qr
r
r
r
r
YX
r
YX21
4
r
r
HE-COMHE-COM
+++ +---- -- Sib 2
Sib 1
SimulationSimulation
• Single QTL simulated• accounts for 10% of trait variance
• 2 equifrequent alleles; additive gene action
• assume complete IBD information at QTL
• Residual variance• shared and nonshared components
• residual sibling correlation : 0 to 0.5
• 10,000 sibling pairs • 100 replicates
• 1000 under the null
Unselected samplesUnselected samples
0
2
4
6
8
10
0 0.1 0.2 0.3 0.4 0.5
Residual sib correlation
Ave
rag
e ch
i-sq
uar
ed
HE-SD
HE-SSHE-CP
HE-COMVC
Sample selectionSample selection
• A sib-pairs’ squared mean-corrected DV is proportional to its expected NCP
• Equivalent to variance-components based selection scheme• Purcell et al, (2000)
2
22
2
2
2
1
2
)1(
)(
)1(
)()|(
r
r
r
YX
r
YXtraitNCPE 21
4
r
r
Sample selectionSample selection
-4 -3 -2 -1 0 1 2 3 4Sib 1 trait -4
-3-2
-10
12
34
Sib 2 trait
00.20.40.60.8
11.21.41.6
Sibship NCP
• 500 (5%) most informative pairs selected
r = 0.05 r = 0.60
Analysis of selected samplesAnalysis of selected samples
Selected samples : HSelected samples : H00
0
0.5
1
1.5
2
0 0.1 0.2 0.3 0.4 0.5
Residual sib correlation
Av
era
ge
ch
i-s
qu
are
d
VC
VC-C
HE-COM
Selected samples : HSelected samples : HAA
0
2
4
6
8
10
0 0.1 0.2 0.3 0.4 0.5
Residual sib correlation
Av
era
ge
ch
i-s
qu
are
d
VC-C
HE-COM
• Variance-based weighting scheme • SD and SS weighted in proportion to the inverse of
their variances
• Implemented as an iterative estimation procedure• loses simple regression-based framework
• Product of pair values corrected for the family mean• for sibs 1 and 2 from the j th family,
• Adjustment for high shared residual variance
• For pairs, reduces to HE-SD
))(( 21 jjjj XX
ConclusionsConclusions
• Advantages• Efficient
• Robust
• Easy to implement
• Future directions• Weight by marker informativeness
• Extension to general pedigrees
The EndThe End