Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny,...

Page 1

Regression-Based Linkage Analysis of General Pedigrees

Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis

Page 2

This Session

• Quantitative Trait Linkage Analysis
  • Variance Components
  • Haseman-Elston
• An improved regression-based method
  • General pedigrees
  • Non-normal data
• Example application
  • PEDSTATS
  • MERLIN-REGRESS

Page 3

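• Simple regression-based method
  • squared pair trait difference
  • proportion of alleles shared identical by descent (IBD)

(HE-SD)  $(X - Y)^2 = 2(1 - r) - 2Q(\hat{\pi} - 0.5) + \varepsilon$

where X and Y are the standardized trait values of the two sibs, r is the sib correlation, Q is the QTL variance and $\hat{\pi}$ is the estimated proportion of alleles shared IBD.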
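A few lines of Python can illustrate the HE-SD regression. This is a minimal sketch and not the authors' implementation. It assumes standardized traits and one estimated IBD proportion $\hat{\pi}$ per sib pair. It fits ordinary least squares of $(X - Y)^2$ on $\hat{\pi}$ and converts the slope, which equals $-2Q$, into an estimate of Q. The function names and toy values are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


def he_sd_estimate(x, y, pi_hat):
    """HE-SD: E[(X - Y)^2 | pi] = 2(1 - r) - 2Q(pi - 0.5), so Q = -slope / 2."""
    squared_diffs = [(a - b) ** 2 for a, b in zip(x, y)]
    return -ols_slope(pi_hat, squared_diffs) / 2


# Constructed (not simulated) pairs whose squared differences lie exactly on
# the HE-SD line for r = 0.3 and Q = 0.2; Y is fixed at 0 for simplicity.
r, q = 0.3, 0.2
pi_hat = [0.0, 0.5, 1.0, 0.5]
x = [(2 * (1 - r) - 2 * q * (p - 0.5)) ** 0.5 for p in pi_hat]
y = [0.0] * len(pi_hat)
print(round(he_sd_estimate(x, y, pi_hat), 6))  # 0.2
```

Real data are noisy, so the slope is estimated with error. The intercept, $2(1 - r)$, carries no linkage information.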

Page 4

Haseman-Elston regression

[Figure: $(X - Y)^2$ plotted against IBD sharing (0, 1, 2); regression slope $= -2Q$]

Page 5

Sums versus differences

• Wright (1997), Drigalenko (1998)

• Using only the phenotypic difference discards sib-pair QTL linkage information

• The squared pair trait sum provides extra information for linkage

• This information is independent of the information from HE-SD

(HE-SS)  $(X + Y)^2 = 2(1 + r) + 2Q(\hat{\pi} - 0.5) + \varepsilon$
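The HE-SS regression can be sketched in the same way. It assumes standardized traits and one $\hat{\pi}$ per sib pair. Regressing the squared sum $(X + Y)^2$ on $\hat{\pi}$ gives a slope of $+2Q$, while the squared difference gives $-2Q$. The two regressions therefore estimate the same Q from different functions of the data. The pairs below are constructed rather than simulated, and the function names are hypothetical.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


def he_sd_estimate(x, y, pi_hat):
    """HE-SD: slope of (X - Y)^2 on pi_hat is -2Q."""
    return -ols_slope(pi_hat, [(a - b) ** 2 for a, b in zip(x, y)]) / 2


def he_ss_estimate(x, y, pi_hat):
    """HE-SS: E[(X + Y)^2 | pi] = 2(1 + r) + 2Q(pi - 0.5), so Q = +slope / 2."""
    return ols_slope(pi_hat, [(a + b) ** 2 for a, b in zip(x, y)]) / 2


# Pick the difference d and the sum s so that d^2 and s^2 lie exactly on
# their expected lines (r = 0.3, Q = 0.2), then recover X and Y from them.
r, q = 0.3, 0.2
pi_hat = [0.0, 0.25, 0.5, 0.75, 1.0]
x, y = [], []
for p in pi_hat:
    d = math.sqrt(2 * (1 - r) - 2 * q * (p - 0.5))
    s = math.sqrt(2 * (1 + r) + 2 * q * (p - 0.5))
    x.append((s + d) / 2)
    y.append((s - d) / 2)

print(round(he_sd_estimate(x, y, pi_hat), 6), round(he_ss_estimate(x, y, pi_hat), 6))  # 0.2 0.2
```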

Page 6

• New dependent variable to increase power
  • mean-corrected cross-product (HE-CP)

$XY = \tfrac{1}{4}\left[(X + Y)^2 - (X - Y)^2\right]$

• But this was found to be less powerful than the original HE regression when the sib correlation is high
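The mean-corrected cross-product can be checked numerically. For standardized, mean-corrected traits, the HE-SD and HE-SS expectations imply $E[XY \mid \pi] = r + Q(\pi - 0.5)$, so the slope of XY on $\hat{\pi}$ estimates Q directly. This is a minimal sketch on constructed pairs whose squared sums and differences lie exactly on their expected lines. The function names and values are hypothetical.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


def he_cp_estimate(x, y, pi_hat):
    """HE-CP: E[XY | pi] = r + Q(pi - 0.5), so the slope of XY on pi_hat is Q."""
    return ols_slope(pi_hat, [a * b for a, b in zip(x, y)])


# The cross-product is a fixed combination of the squared sum and difference.
a, b = 1.3, -0.4
assert math.isclose(a * b, ((a + b) ** 2 - (a - b) ** 2) / 4)

# Constructed pairs for r = 0.3 and Q = 0.2.
r, q = 0.3, 0.2
pi_hat = [0.0, 0.25, 0.5, 0.75, 1.0]
x, y = [], []
for p in pi_hat:
    d = math.sqrt(2 * (1 - r) - 2 * q * (p - 0.5))  # target X - Y
    s = math.sqrt(2 * (1 + r) + 2 * q * (p - 0.5))  # target X + Y
    x.append((s + d) / 2)
    y.append((s - d) / 2)
print(round(he_cp_estimate(x, y, pi_hat), 6))  # 0.2
```

The dependent variable is a fixed combination of the squared sum and difference. As the slide notes, this combination is not always the most powerful one.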

Page 7

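Variance Components Analysis

$\mathrm{Var}(X_i) = \sigma^2_{marker} + \sigma^2_a + \sigma^2_e$

$\mathrm{Cov}(X_i, X_j) = \hat{\pi}_{ij}\,\sigma^2_{marker} + 2\phi_{ij}\,\sigma^2_a$

Where,

$\hat{\pi}_{ij}$ is the estimated proportion of alleles shared IBD by individuals i and j

$\phi_{ij}$ is the kinship coefficient for the two individuals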
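Here is a minimal sketch of how a family covariance matrix Ω can be built in a variance components model. It assumes the standard additive decomposition: $\sigma^2_{marker} + \sigma^2_a + \sigma^2_e$ on the diagonal and $\hat{\pi}_{ij}\sigma^2_{marker} + 2\phi_{ij}\sigma^2_a$ off the diagonal. Here $\sigma^2_{marker}$ is read as the QTL variance linked to the marker, $\sigma^2_a$ as the residual additive polygenic variance and $\sigma^2_e$ as the environmental variance. That reading of the components is an assumption. The function name and the numbers are hypothetical.

```python
def vc_covariance(pi_hat, phi, var_marker, var_a, var_e):
    """Family covariance matrix under the (assumed) variance components model.

    pi_hat[i][j]: estimated proportion of alleles shared IBD at the marker
    phi[i][j]:    kinship coefficient of individuals i and j
    var_marker, var_a, var_e: marker-linked QTL, residual additive (polygenic)
    and environmental variance components.
    """
    n = len(pi_hat)
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                omega[i][j] = var_marker + var_a + var_e
            else:
                omega[i][j] = pi_hat[i][j] * var_marker + 2 * phi[i][j] * var_a
    return omega


# A sib pair sharing one allele IBD at the marker (pi_hat = 0.5, phi = 0.25).
pi_hat = [[1.0, 0.5], [0.5, 1.0]]
phi = [[0.5, 0.25], [0.25, 0.5]]
omega = vc_covariance(pi_hat, phi, var_marker=0.2, var_a=0.3, var_e=0.5)
print(omega)
```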

Page 8

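Likelihood function

$L_i = \sum_{j=0,1,2} Z_{ij}\,(2\pi)^{-n/2}\,|\Omega_{IBDj}|^{-1/2} \exp\left[-\tfrac{1}{2}(y_i - \mu)'\,\Omega_{IBDj}^{-1}\,(y_i - \mu)\right]$

$Z_{ij} = P(IBD = j \mid \text{marker data})$: IBD sharing probabilities

$L_i^* = (2\pi)^{-n/2}\,|\Omega_i^*|^{-1/2} \exp\left[-\tfrac{1}{2}(y_i - \mu)'\,\Omega_i^{*-1}\,(y_i - \mu)\right]$

$\Omega_i^* = \sum_{j} Z_{ij}\,\Omega_{IBDj}$, where * denotes "Expected"

Here n is the number of individuals in family i.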
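For a sib pair (n = 2), the two likelihoods can be sketched as follows. The mixture likelihood sums the bivariate normal density over IBD = 0, 1, 2, weighted by $Z_j = P(IBD = j \mid \text{marker data})$. The "expected" version uses a single covariance matrix $\Omega^* = \sum_j Z_j \Omega_{IBDj}$. The sib-pair covariance assumes the additive model $\sigma^2_{marker} + \sigma^2_a + \sigma^2_e$ for variances and $\pi\sigma^2_{marker} + 2\phi\sigma^2_a$ (with $\phi = 0.25$) for the covariance. All names and values are hypothetical.

```python
import math


def mvn2_density(y, mu, omega):
    """Bivariate normal density: (2*pi)^(-1) |Omega|^(-1/2) exp(-r' Omega^-1 r / 2)."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    res = [y[0] - mu[0], y[1] - mu[1]]
    quad = sum(res[i] * inv[i][j] * res[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def sib_omega(pi, var_marker, var_a, var_e):
    """Sib-pair covariance for IBD proportion pi (kinship coefficient 0.25)."""
    var = var_marker + var_a + var_e
    cov = pi * var_marker + 2 * 0.25 * var_a
    return [[var, cov], [cov, var]]


def mixture_likelihood(y, mu, z, comps):
    """L = sum_j Z_j N(y; mu, Omega_IBDj) over IBD = 0, 1, 2 (pi = j / 2)."""
    return sum(z[j] * mvn2_density(y, mu, sib_omega(j / 2, *comps)) for j in range(3))


def expected_likelihood(y, mu, z, comps):
    """L* = N(y; mu, Omega*), with Omega* = sum_j Z_j Omega_IBDj."""
    omegas = [sib_omega(j / 2, *comps) for j in range(3)]
    omega_star = [[sum(z[j] * omegas[j][r][c] for j in range(3)) for c in range(2)]
                  for r in range(2)]
    return mvn2_density(y, mu, omega_star)


comps = (0.2, 0.3, 0.5)   # var_marker, var_a, var_e (illustrative values)
y, mu = (1.2, 0.8), (0.0, 0.0)
z = (0.1, 0.6, 0.3)       # P(IBD = j | marker data), j = 0, 1, 2
print(mixture_likelihood(y, mu, z, comps), expected_likelihood(y, mu, z, comps))
```

The two versions agree when the marker is fully informative, because one $Z_j$ is then 1. They differ when IBD sharing is uncertain.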

Page 9

Linkage

Page 10

No Linkage

Page 11

The Problem

• Maximum likelihood variance components linkage analysis
  • Powerful (Fulker & Cherny 1996), but
  • Not robust in selected samples or for non-normal traits
  • Conditioning on trait values (Sham et al 2000) improves robustness but is computationally challenging

• Haseman-Elston regression
  • More robust, but
  • Less powerful
  • Applicable only to sib pairs

Page 12

Aim

• To develop a regression-based method that
  • has the same power as maximum likelihood variance components for sib-pair data
  • will generalise to general pedigrees

Page 13

Extension to General Pedigrees

• Multivariate Regression Model

• Weighted Least Squares Estimation

• Weight matrix based on IBD information

Page 14

Switching Variables

• To obtain unbiased estimates in selected samples

• Dependent variables = IBD

• Independent variables = Trait

Page 15

Dependent Variables

• Estimated IBD sharing of all pairs of relatives

• Example:

$\hat{\Pi} = (\hat{\pi}_{12}, \hat{\pi}_{13}, \hat{\pi}_{14}, \hat{\pi}_{23}, \hat{\pi}_{24}, \hat{\pi}_{34})'$
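Here is a minimal sketch of how the dependent-variable vector $\hat{\Pi}$ can be assembled from a matrix of estimated IBD proportions. The upper-triangular entries are stacked in pair order 12, 13, 14, 23, 24, 34. The matrix values are hypothetical. The code uses 0-based indices internally and 1-based labels for display.

```python
from itertools import combinations


def ibd_vector(pi_hat):
    """Stack pi_hat[i][j] for all pairs i < j, in the order 12, 13, 14, 23, 24, 34."""
    n = len(pi_hat)
    return [pi_hat[i][j] for i, j in combinations(range(n), 2)]


def pair_labels(n):
    """1-based pair labels in the same order as ibd_vector."""
    return [f"{i + 1}{j + 1}" for i, j in combinations(range(n), 2)]


# Hypothetical estimated IBD matrix for a family of four (diagonal = self).
pi_hat = [
    [1.0, 0.5, 0.25, 0.25],
    [0.5, 1.0, 0.5, 0.5],
    [0.25, 0.5, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
]
print(pair_labels(4))      # ['12', '13', '14', '23', '24', '34']
print(ibd_vector(pi_hat))  # [0.5, 0.25, 0.25, 0.5, 0.5, 1.0]
```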

Page 16

Independent Variables

• Squares and cross-products (equivalent to the non-redundant squared sums and differences)

• Example

$Y = (x_1x_2, x_1x_3, x_1x_4, x_2x_3, x_2x_4, x_3x_4, x_1^2, x_2^2, x_3^2, x_4^2)'$
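The independent-variable vector Y can be built the same way. This minimal sketch lists cross-products first and squares second, matching the ordering used for $\hat{\Pi}$. It also checks that these products carry the same information as squared sums and differences. The trait values are hypothetical.

```python
from itertools import combinations


def trait_vector(x):
    """Cross-products x_i x_j (i < j) followed by squares x_i^2."""
    n = len(x)
    cross = [x[i] * x[j] for i, j in combinations(range(n), 2)]
    squares = [v * v for v in x]
    return cross + squares


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical standardized trait values
print(trait_vector(x))    # [2.0, 3.0, 4.0, 6.0, 8.0, 12.0, 1.0, 4.0, 9.0, 16.0]

# A cross-product and a pair of squares map one-to-one onto the squared sum
# and squared difference of the same pair.
a, b = x[0], x[1]
assert a * b == ((a + b) ** 2 - (a - b) ** 2) / 4
assert a * a + b * b == ((a + b) ** 2 + (a - b) ** 2) / 2
```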

Page 17

Covariance Matrices

Dependent: $\Sigma_{\hat{\Pi}}$

Obtained from prior (p) and posterior (q) IBD distribution given marker genotypes

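$\mathrm{Cov}(\hat{\pi}_{ij}, \hat{\pi}_{kl}) = \left[\sum_I p_I\,\pi_{ij|I}\,\pi_{kl|I} - \tilde{\pi}_{ij}\tilde{\pi}_{kl}\right] - \left[\sum_I q_I\,\pi_{ij|I}\,\pi_{kl|I} - \hat{\pi}_{ij}\hat{\pi}_{kl}\right]$

where I indexes the possible IBD configurations, $\pi_{ij|I}$ is the IBD proportion of pair ij under configuration I, and $\tilde{\pi}$ and $\hat{\pi}$ are the prior and posterior mean IBD proportions.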
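This is a minimal sketch of the covariance of the estimated IBD proportions, computed as the prior covariance (weights $p_I$) minus the posterior covariance (weights $q_I$) over IBD configurations I. Writing the formula in this "prior minus posterior" form is an assumption based on the slide's description. The example is a single sib pair: a fully informative marker gives the full prior variance of 1/8, and an uninformative marker gives 0. Names are hypothetical.

```python
def ibd_covariance(configs, prior, posterior, a, b):
    """Cov(pi_hat_a, pi_hat_b) as prior covariance minus posterior covariance.

    configs[I][a]: IBD proportion of relative pair a under configuration I
    prior[I], posterior[I]: p_I and q_I (q_I given the marker genotypes)
    """
    def cov(weights):
        mean_a = sum(w * c[a] for w, c in zip(weights, configs))
        mean_b = sum(w * c[b] for w, c in zip(weights, configs))
        mean_ab = sum(w * c[a] * c[b] for w, c in zip(weights, configs))
        return mean_ab - mean_a * mean_b
    return cov(prior) - cov(posterior)


# A single sib pair: configurations IBD = 0, 1, 2, i.e. pi = 0, 0.5, 1.
configs = [(0.0,), (0.5,), (1.0,)]
prior = [0.25, 0.5, 0.25]
print(ibd_covariance(configs, prior, [0.0, 0.0, 1.0], 0, 0))  # 0.125
print(ibd_covariance(configs, prior, prior, 0, 0))            # 0.0
```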

Page 18

Covariance Matrices

Independent: $\Sigma_Y$

Obtained from properties of the multivariate normal distribution, under a specified mean, variance and correlations

Assuming the trait has mean zero and variance one, calculating this matrix requires the correlations between the different relative pairs to be known.

$E(X_iX_jX_kX_l) = r_{ij}r_{kl} + r_{ik}r_{jl} + r_{il}r_{jk}$
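The fourth-moment identity $E(X_iX_jX_kX_l) = r_{ij}r_{kl} + r_{ik}r_{jl} + r_{il}r_{jk}$ gives every element of $\Sigma_Y$ as $\mathrm{Cov}(X_iX_j, X_kX_l) = E(X_iX_jX_kX_l) - r_{ij}r_{kl}$. This minimal sketch applies it to zero-mean, unit-variance normal traits. It orders Y as cross-products first, then squares. The function names are hypothetical.

```python
from itertools import combinations


def y_pairs(n):
    """Index pairs in the order of Y: cross-products (i < j), then squares (i, i)."""
    return list(combinations(range(n), 2)) + [(i, i) for i in range(n)]


def fourth_moment(r, i, j, k, l):
    """E(X_i X_j X_k X_l) for zero-mean, unit-variance multivariate normal traits."""
    return r[i][j] * r[k][l] + r[i][k] * r[j][l] + r[i][l] * r[j][k]


def sigma_y(r):
    """Covariance matrix of Y: E(X_i X_j X_k X_l) - E(X_i X_j) E(X_k X_l)."""
    idx = y_pairs(len(r))
    return [[fourth_moment(r, i, j, k, l) - r[i][j] * r[k][l] for (k, l) in idx]
            for (i, j) in idx]


# Sib pair with correlation 0.3; Y = (x1 x2, x1^2, x2^2).
r = [[1.0, 0.3], [0.3, 1.0]]
for row in sigma_y(r):
    print([round(v, 4) for v in row])
```

As a check, a single standard normal trait gives $\mathrm{Var}(X^2) = 3 - 1 = 2$.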

Page 19

Estimation

For a family, the regression model is

$\hat{\Pi}_C = Q\,\Sigma_{\hat{\Pi}} H \Sigma_Y^{-1} Y_C + \varepsilon$

Estimate Q by weighted least squares, and obtain its sampling variance, family by family

Combine the estimates across families, weighted by the inverse of their variances, to give an overall estimate and its sampling variance
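The two estimation steps can be sketched generically. The per-family model is treated as a scalar regression $\hat{\Pi}_C = Q\,D + \varepsilon$ with design vector $D = \Sigma_{\hat{\Pi}} H \Sigma_Y^{-1} Y_C$. Taking $\mathrm{Cov}(\varepsilon) = \Sigma_{\hat{\Pi}}$ is an assumption for illustration. Each family gives $\hat{Q}$ and its variance by weighted least squares. Families are then combined by inverse-variance weighting. The design vectors and responses below are hypothetical numbers, not values from the slides.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def family_wls(design, response, cov):
    """WLS for response = Q * design + error, Cov(error) = cov.

    Returns (Q_hat, sampling variance of Q_hat) for one family.
    """
    w = solve(cov, design)                      # cov^-1 design
    info = sum(d * wi for d, wi in zip(design, w))
    q_hat = sum(y * wi for y, wi in zip(response, w)) / info
    return q_hat, 1.0 / info


def combine_families(results):
    """Inverse-variance weighted average of per-family (Q_hat, variance) pairs."""
    total = sum(1.0 / var for _, var in results)
    q_hat = sum(q / var for q, var in results) / total
    return q_hat, 1.0 / total


fam1 = family_wls([1.0, 2.0], [0.3, 0.6], [[2.0, 0.5], [0.5, 1.0]])
fam2 = family_wls([0.5, 0.5, 1.0], [0.1, 0.2, 0.2],
                  [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
q_hat, var = combine_families([fam1, fam2])
print(q_hat, var, q_hat ** 2 / var)
```

The last value printed, $\hat{Q}^2 / \mathrm{Var}(\hat{Q})$, is a Wald-type 1-df chi-square. It is shown as one plausible test statistic rather than as the method's documented one. The per-family information $1/\mathrm{Var}(\hat{Q})$ is one natural way to gauge how informative a family is.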

Page 20

Average chi-squared statistics: fully informative marker NOT linked to 20% QTL

[Figure: average chi-square (y-axis) by sibship size (x-axis); N = 1000 individuals, heritability = 0.5, 10,000 simulations]

Page 21

Average chi-squared statistics: fully informative marker linked to 20% QTL

[Figure: average chi-square (y-axis) by sibship size (x-axis); N = 1000 individuals, heritability = 0.5, 2000 simulations]

Page 22

Average chi-squared statistics: poorly informative marker NOT linked to 20% QTL

[Figure: average chi-square (y-axis) by sibship size (x-axis); N = 1000 individuals, heritability = 0.5, 10,000 simulations]

Page 23

Average chi-squared statistics: poorly informative marker linked to 20% QTL

[Figure: average chi-square (y-axis) by sibship size (x-axis); N = 1000 individuals, heritability = 0.5, 2000 simulations]

Page 24

Average chi-squares: selected sib pairs, NOT linked to 20% QTL

[Figure: average chi-square (y-axis) by selection scheme (Ran, ASP, DSP, Inf) for Full and Poor marker information; 10% of 5,000 sib pairs selected, 20,000 simulations]

Page 25

Average chi-squares: selected sib pairs, linkage to 20% QTL

[Figure: average chi-square (y-axis) by selection scheme (Ran, ASP, DSP, Inf) for Full and Poor marker information; 10% of 5,000 sib pairs selected, 2,000 simulations]

Page 26

Mis-specification of the mean, 2000 random sib quads, 20% QTL

[Figure; legend includes "Not linked, full"]

Page 27

Mis-specification of the covariance, 2000 random sib quads, 20% QTL

[Figure; legend includes "Not linked, full"]

Page 28

Mis-specification of the variance, 2000 random sib quads, 20% QTL

[Figure; legend includes "Not linked, full"]

Page 29

Cousin pedigree

Page 30

Average chi-squares for 200 cousin pedigrees, 20% QTL

              Poor marker information    Full marker information
              REG        VC              REG        VC
Not linked    0.49       0.48            0.53       0.50
Linked        4.94       4.43            13.21      12.56

(REG = regression method; VC = maximum likelihood variance components)

Page 31

Conclusion

• The regression approach
  • can be extended to general pedigrees
  • is slightly more powerful than maximum likelihood variance components in large sibships
  • can handle imperfect IBD information
  • is easily applicable to selected samples
  • provides an unbiased estimate of the QTL variance
  • provides a simple measure of family informativeness
  • is robust to minor deviations from normality

• But
  • assumes knowledge of the mean, variance and covariances of the trait distribution in the population

Page 32

Example Application: Angiotensin Converting Enzyme

• British population

• Circulating ACE levels
  • Normalized separately for males / females

• 10 di-allelic polymorphisms
  • 26 kb
  • Common
  • In strong linkage disequilibrium

• Keavney et al, HMG, 1998

Page 33

Check The Data

• The input data are in three files:
  • keavney.dat
  • keavney.ped
  • keavney.map

• These are text files, so you can peek at their contents using more or Notepad

• A better way is to use pedstats …

Page 34

Pedstats

• Checks the contents of the pedigree and data files
  • pedstats -d keavney.dat -p keavney.ped

• Useful options:
  • --pairStatistics: information about relative pairs
  • --pdf: produce a graphical summary
  • --hardyWeinberg: check markers for HWE
  • --minGenos 1: focus on genotyped individuals

• What did you learn about the sample?

Page 35

Regression Analysis

• MERLIN-REGRESS

• Requires pedigree (.ped), data (.dat) and map (.map) files as input

• Key parameters:
  • --mean, --variance: used to standardize the trait
  • --heritability: used to predict the correlation between relatives

• Heritability for ACE levels is about 0.60
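Here is a minimal sketch of what the --mean, --variance and --heritability parameters are used for. The trait is standardized with a specified mean and variance. A correlation is predicted for each relative pair from the heritability. The additive polygenic model $r_{ij} = 2\phi_{ij}h^2$ is an assumption here, not taken from MERLIN-REGRESS documentation. The trait values, mean and variance are illustrative, not the ACE data. Only the heritability of 0.60 comes from the slide.

```python
import math


def standardize(values, mean, variance):
    """Standardize trait values with a specified population mean and variance."""
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


def predicted_correlation(kinship, heritability):
    """Correlation between relatives under an assumed purely additive model."""
    return 2 * kinship * heritability


kinship = {"parent-offspring": 0.25, "full sibs": 0.25, "half sibs": 0.125,
           "first cousins": 0.0625}
for relation, phi in kinship.items():
    print(relation, round(predicted_correlation(phi, 0.60), 4))

print(standardize([12.0, 8.0], mean=10.0, variance=4.0))  # [1.0, -1.0]
```

Under this assumed model, a heritability of 0.60 implies a predicted correlation of about 0.3 for sib pairs.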

Page 36

MERLIN-REGRESS

• Identify informative families
  • --rankFamilies

• Customizing models for each trait
  • -t models.tbl
  • TRAIT, MEAN, VARIANCE, HERITABILITY in each row

• Convenient options for unselected samples:
  • --randomSample
  • --useCovariates
  • --inverseNormal
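The slide lists --inverseNormal without details. A rank-based inverse normal transformation is one common way to map a skewed trait onto a normal scale. This sketch assumes average ranks for ties and the offset (rank - 0.5) / n. Both are illustrative choices and not necessarily what MERLIN-REGRESS does. The function name and data are hypothetical.

```python
from statistics import NormalDist


def inverse_normal(values):
    """Rank-based inverse normal transform, using average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block of tied values.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((rank - 0.5) / n) for rank in ranks]


print([round(z, 3) for z in inverse_normal([3.1, 0.2, 5.7, 0.2, 9.9])])
```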

Page 37

The End