Accelerated Permutation Inference for Pedigree Heritability … · 2015-06-12 · Accelerated...


Accelerated Permutation Inference for Pedigree Heritability Inference with Family-Based Neuroimaging Data

Habib Ganjgahi¹, Anderson Winkler², David C. Glahn³, John Blangero⁴, Peter Kochunov⁵, and Thomas Nichols¹
¹University of Warwick, ²University of Oxford, ³Yale University, ⁴University of Texas Rio Grande Valley, ⁵University of Maryland

Introduction

A prerequisite of any genetic analysis is establishing the heritability of the trait of interest. In neuroimaging genetics, the large number of voxel-wise measurements presents a challenge both in computational intensity and in the need to control the false positive risk over multiple tests. Existing tools leave a gap: standard neuroimaging software cannot estimate heritability, while standard quantitative genetics tools cannot provide essential neuroimaging inferences, such as FWE-corrected voxel- or cluster-wise P-values. Moreover, available heritability tools rely on P-values that can be inaccurate because they are based on asymptotic inference. Hence, there is a need for alternative, computationally efficient inference procedures that make fewer assumptions. Permutation methods can provide exact control of false positive rates as well as FWE-corrected voxel- and cluster-wise inferences. Blangero (2013) introduced a method to accelerate maximum likelihood heritability estimation. However, this advance eliminates neither iterative optimization and its possible convergence problems, nor the reliance on asymptotic P-values. In the present work (Ganjgahi 2015), we extend Blangero (2013): we derive approximate, non-iterative estimates and test statistics based on the first iteration of Newton's method, develop the corresponding Wald and score tests, and obtain P-values from a permutation test.

Methods

Polygenic model:

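$$Y = X\beta + g + \varepsilon, \qquad \mathrm{Var}(Y) = \Sigma = \sigma^2_A (2\Phi) + \sigma^2_E I \qquad (1)$$

• $g$ and $\varepsilon$: additive genetic and environmental random effects, with variance components $\sigma^2_A$ and $\sigma^2_E$, respectively.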

• $\Phi$: $N \times N$ kinship matrix; $2\Phi_{ij}$ is the genetic relatedness of subjects $i$ and $j$.

Eigen-simplified polygenic model: applying an orthogonal transformation based on the eigenvectors of the kinship matrix to Eq. (1) gives

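$$S'Y = S'X\beta + S'g + S'\varepsilon \;\Rightarrow\; Y^* = X^*\beta + \varepsilon^*, \qquad (2)$$

• $Y^* = S'Y$ and $X^* = S'X$: transformed trait and covariates, respectively; $\varepsilon^* = S'g + S'\varepsilon$.
• $S$: from the eigendecomposition $2\Phi = S D_g S'$, where the $N \times N$ matrix $S$ and the diagonal matrix $D_g$ hold the eigenvectors and eigenvalues, respectively.
• $\mathrm{Var}(\varepsilon^*) = \Sigma^* = \sigma^2_A D_g + \sigma^2_E I$: the transformed data variance, a diagonal matrix.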

• The ML estimates of $\beta$, $\sigma^2_A$ and $\sigma^2_E$ are the same in the original and transformed models: $S$ is orthogonal, so the transformation is one-to-one and leaves the likelihood unchanged (the invariance property of ML).

Parameter Estimation
• A non-iterative heritability estimator can be obtained from a one-step optimization of the likelihood; this is equivalent to a WLS regression of the squared residuals on the kinship matrix eigenvalues.
• Amemiya (1977) showed that such an estimator is consistent and asymptotically normal.
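The eigen-simplification that underlies these estimators can be checked numerically. The sketch below assumes a hypothetical pedigree of independent full-sibling pairs (kinship 0.5 on the diagonal, 0.25 between siblings) and illustrative variance components. It rotates the data by the eigenvectors of $2\Phi$ (which are also the eigenvectors of $\Phi$) and confirms that $S'\Sigma S$ is diagonal with entries $\sigma^2_A\lambda_{gi} + \sigma^2_E$. The helper names `sib_pair_kinship` and `eigen_simplify` are hypothetical and not part of SOLAR or any published tool.

```python
import numpy as np

def sib_pair_kinship(n_pairs):
    """Hypothetical pedigree: independent full-sibling pairs.
    Kinship is 0.5 on the diagonal and 0.25 between siblings."""
    block = np.array([[0.5, 0.25], [0.25, 0.5]])
    return np.kron(np.eye(n_pairs), block)

def eigen_simplify(Y, X, Phi):
    """Rotate trait and covariates by the eigenvectors S of 2*Phi."""
    lam, S = np.linalg.eigh(2.0 * Phi)          # 2*Phi = S diag(lam) S'
    return S.T @ Y, S.T @ X, lam, S

rng = np.random.default_rng(0)
n_pairs = 50
N = 2 * n_pairs
Phi = sib_pair_kinship(n_pairs)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
sigma2_A, sigma2_E = 0.6, 0.4                   # illustrative values (h2 = 0.6)
Sigma = sigma2_A * 2.0 * Phi + sigma2_E * np.eye(N)   # Var(Y), Eq. (1)
Y = X @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(N), Sigma)

Y_star, X_star, lam, S = eigen_simplify(Y, X, Phi)
Sigma_star = S.T @ Sigma @ S                    # Var(Y*) = S' Sigma S
off_diag = Sigma_star - np.diag(np.diag(Sigma_star))
print("max |off-diagonal|:", np.abs(off_diag).max())
print("diagonal equals sigma2_A*lam + sigma2_E:",
      np.allclose(np.diag(Sigma_star), sigma2_A * lam + sigma2_E))
```

Once the data are rotated, every later step works with the vector `lam` of eigenvalues instead of the full $N \times N$ covariance, which is what makes voxel-wise fitting cheap.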

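$$\hat\beta_{OLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*, \qquad \hat\varepsilon^*_{OLS} = Y^* - X^*\hat\beta_{OLS}, \qquad \hat\theta_{OLS} = \max\{0,\ (U'U)^{-1}U'f^*_{OLS}\},$$

$$\hat\beta_{WLS} = \big(X^{*\prime}(\hat\Sigma^*_{OLS})^{-1}X^*\big)^{-1}X^{*\prime}(\hat\Sigma^*_{OLS})^{-1}Y^*, \qquad \hat\theta_{WLS} = \max\{0,\ (U'(\hat\Sigma^{*2}_{OLS})^{-1}U)^{-1}U'(\hat\Sigma^{*2}_{OLS})^{-1}f^*_{OLS}\},$$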
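A minimal NumPy sketch of the one-step estimators, assuming the data have already been eigen-transformed and that every diagonal entry of $\hat\Sigma^*_{OLS}$ is strictly positive. Both auxiliary fits regress the squared OLS residuals $f^*_{OLS}$ on $U = [1, \lambda_g]$; the WLS fit uses weights $1/\hat\sigma^{*4}_i$, the diagonal of $(\hat\Sigma^{*2}_{OLS})^{-1}$. The function names are hypothetical, and the demo data are simulated directly in the transformed space.

```python
import numpy as np

def aux_fit(f, lam, w=None):
    """(Weighted) least-squares regression of f on U = [1, lam].
    Returns the unclipped (sigma2_A, sigma2_E) coefficients."""
    U = np.column_stack([np.ones_like(lam), lam])
    w = np.ones_like(lam) if w is None else w
    UtW = U.T * w                                   # U'W with W = diag(w)
    c_E, c_A = np.linalg.solve(UtW @ U, UtW @ f)
    return c_A, c_E

def one_step_wls(Y_star, X_star, lam):
    """One-step (non-iterative) estimates in the eigen-transformed model.
    Assumes every entry of diag(Sigma*_OLS) is strictly positive."""
    beta_ols, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)
    f_ols = (Y_star - X_star @ beta_ols) ** 2                 # f*_OLS
    s2A_ols, s2E_ols = np.maximum(0.0, aux_fit(f_ols, lam))   # theta_OLS
    sig_ols = s2A_ols * lam + s2E_ols                         # diag(Sigma*_OLS)
    Xw = X_star / sig_ols[:, None]                            # Sigma*_OLS^{-1} X*
    beta_wls = np.linalg.solve(X_star.T @ Xw, Xw.T @ Y_star)
    s2A_wls, s2E_wls = np.maximum(0.0, aux_fit(f_ols, lam, 1.0 / sig_ols ** 2))
    h2_wls = s2A_wls / (s2A_wls + s2E_wls)
    return beta_wls, (s2A_wls, s2E_wls), h2_wls

# Demo on data simulated directly in the transformed space (illustrative values).
rng = np.random.default_rng(1)
lam = np.tile([0.5, 1.5], 200)          # e.g. eigenvalues of 2*Phi for sib pairs
X_star = np.column_stack([np.ones(400), rng.normal(size=400)])
noise_sd = np.sqrt(0.6 * lam + 0.4)     # sigma2_A = 0.6, sigma2_E = 0.4
Y_star = X_star @ np.array([1.0, 0.5]) + rng.normal(size=400) * noise_sd
beta_wls, theta_wls, h2_wls = one_step_wls(Y_star, X_star, lam)
print("h2_WLS:", round(float(h2_wls), 3))
```

Because $U$ has only two columns, each auxiliary fit is a $2 \times 2$ linear solve, so no iteration or convergence check is needed.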

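• $\beta$ and $\theta = (\sigma^2_A, \sigma^2_E)$ are the fixed-effect and variance-component parameters of the transformed model; $\max\{0, \cdot\}$ is applied elementwise.
• $f^*_{OLS}$: squared OLS residuals, i.e. the elementwise square of $\hat\varepsilon^*_{OLS}$.
• $U = [1, \lambda_g]$: $N \times 2$ auxiliary model design matrix, where $1$ is a vector of ones and $\lambda_g = \{\lambda_{gi}\}$ is the vector of kinship matrix eigenvalues (the diagonal of $D_g$).
• $\hat\Sigma^*_{OLS} = \hat\sigma^2_{A,OLS} D_g + \hat\sigma^2_{E,OLS} I$ is formed from $\hat\theta_{OLS} = (\hat\sigma^2_{A,OLS}, \hat\sigma^2_{E,OLS})$.
• $\hat h^2_{WLS} = \hat\sigma^2_{A,WLS} / (\hat\sigma^2_{A,WLS} + \hat\sigma^2_{E,WLS})$.

Test Statistics:
• The Wald test:
  - $T_{W,ML}$: based on the fully converged ML estimator.
  - $T_{W,WLS}$: based on the one-step WLS estimator (corresponds to a generalized sum of squares):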

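$$T_{W,WLS} = \tfrac{1}{2}\,(\hat\sigma^2_{A,WLS})^2\,(\hat\Sigma^{*-1}_{OLS}\lambda_g)'\Big(I - \hat\Sigma^{*-1}_{OLS}1\big((\hat\Sigma^{*-1}_{OLS}1)'(\hat\Sigma^{*-1}_{OLS}1)\big)^{-1}1'\hat\Sigma^{*-1}_{OLS}\Big)\hat\Sigma^{*-1}_{OLS}\lambda_g.$$

• Likelihood Ratio Test (LRT):
  - $T_{L,ML}$: based on the fully converged ML estimator.
  - $T_{L,WLS}$: based on the WLS estimator.
• The score test takes the form of the explained sum of squares of the auxiliary model: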

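$$T_S = \frac{1}{2}\left(\frac{\hat\sigma^2_{A,OLS}}{\hat\sigma^2_{OLS}}\right)^2 \lambda_g'\left(I - \frac{1 1'}{N}\right)\lambda_g,$$

where $\hat\sigma^2_{OLS}$ is the residual variance estimated under the null model.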
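Because $\hat\Sigma^*_{OLS}$ is diagonal, both statistics reduce to a few vector operations. The sketch below rests on stated assumptions: $\hat\sigma^2_{OLS}$ is taken as the ML residual variance (mean squared OLS residual), and $\hat\sigma^2_{A,OLS}$ in $T_S$ is the clipped element of $\hat\theta_{OLS}$, so $T_S = 0$ when the auxiliary slope is negative. Function names are hypothetical. As a sanity check, $T_{W,WLS}$ equals $(\hat\sigma^2_{A,WLS})^2$ divided by the $\sigma^2_A$ element of the inverse Fisher information $\tfrac12 U'\hat\Sigma^{*-2}_{OLS}U$, and, for a positive slope, $T_S$ equals half the explained sum of squares from regressing $f^*_{OLS}/\hat\sigma^2_{OLS}$ on $U$.

```python
import numpy as np

def wald_wls(s2A_wls, theta_ols, lam):
    """T_W,WLS; theta_ols = (sigma2_A,OLS, sigma2_E,OLS) defines Sigma*_OLS."""
    s2A_ols, s2E_ols = theta_ols
    w = 1.0 / (s2A_ols * lam + s2E_ols)       # diagonal of Sigma*_OLS^{-1}
    a = w * lam                                # Sigma*^{-1} lambda_g
    b = w                                      # Sigma*^{-1} 1
    # a'(I - b (b'b)^{-1} b')a: squared norm of a after projecting out b
    quad = a @ a - (a @ b) ** 2 / (b @ b)
    return 0.5 * s2A_wls ** 2 * quad

def score_stat(resid_ols, lam):
    """T_S from null-model (OLS) residuals in the transformed space."""
    f = resid_ols ** 2                         # f*_OLS
    s2 = f.mean()                              # assumed: ML null residual variance
    lam_c = lam - lam.mean()                   # (I - 11'/N) lambda_g
    ss = lam_c @ lam_c                         # lambda_g'(I - 11'/N) lambda_g
    s2A_ols = max(0.0, (lam_c @ f) / ss)       # clipped OLS slope, as in theta_OLS
    return 0.5 * (s2A_ols / s2) ** 2 * ss

rng = np.random.default_rng(3)
lam = np.tile([0.5, 1.5], 50)
resid = rng.normal(size=100)                   # stand-in for OLS residuals
print("T_S:", score_stat(resid, lam))
print("T_W,WLS:", wald_wls(0.5, (0.4, 0.6), lam))
```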

Hypothesis Testing
• Parametric test: asymptotic 50:50 mixture of $\chi^2$ distributions with 1 and 0 degrees of freedom.
• Permutation test, two variants:
  - P1: permute the kinship structure, refitting the model with $\Phi^* = P\Phi P'$.
  - P2: $\tilde Y^* = X^*\hat\beta_{OLS} + P\hat\varepsilon^*_{OLS}$,
  where $P$ is a permutation matrix, $\hat\beta_{OLS}$ is the fixed-effect parameter estimate and $\hat\varepsilon^*_{OLS}$ are the residuals under the null hypothesis.
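A sketch of the parametric P-value and of the P2 scheme applied to an image, assuming the transformed traits of all voxels are stacked as the columns of an $N \times V$ array. P2 relies on the fact that under the null ($\sigma^2_A = 0$) $\Sigma^* = \sigma^2_E I$, so the transformed errors are uncorrelated with equal variance and can be treated as exchangeable. Applying the same permutation to every voxel preserves spatial dependence, and recording the maximum $T_S$ over voxels gives FWE-corrected P-values. This standard max-statistic approach is an assumption here rather than a detail stated on the poster; the function names and defaults (e.g. `n_perm=500`) are hypothetical.

```python
import math
import numpy as np

def mixture_pvalue(t):
    """P-value for a 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if t <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(t / 2.0))   # 0.5 * P(chi2_1 > t)

def score_stats(Y_star, X_star, lam):
    """Voxel-wise T_S for an (N, V) array of transformed traits."""
    beta, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)
    f = (Y_star - X_star @ beta) ** 2            # f*_OLS, one column per voxel
    lam_c = lam - lam.mean()
    ss = lam_c @ lam_c
    s2A_ols = np.maximum(0.0, (lam_c @ f) / ss)  # clipped slope per voxel
    t = 0.5 * (s2A_ols / f.mean(axis=0)) ** 2 * ss
    return t, beta

def p2_fwe(Y_star, X_star, lam, n_perm=500, seed=0):
    """P2 permutation with max-statistic FWE correction over voxels."""
    rng = np.random.default_rng(seed)
    t_obs, beta = score_stats(Y_star, X_star, lam)
    fitted = X_star @ beta                       # X* beta_OLS
    resid = Y_star - fitted                      # null residuals
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(Y_star.shape[0])  # same P for every voxel
        t_perm, _ = score_stats(fitted + resid[perm], X_star, lam)
        max_null[i] = t_perm.max()
    # The observed data count as one permutation, so p >= 1 / (n_perm + 1).
    exceed = (max_null[None, :] >= t_obs[:, None]).sum(axis=1)
    return t_obs, (1 + exceed) / (n_perm + 1)

rng = np.random.default_rng(2)
N, V = 200, 30
lam = np.tile([0.5, 1.5], N // 2)
X_star = np.column_stack([np.ones(N), rng.normal(size=N)])
Y_star = rng.normal(size=(N, V))                 # null data: h2 = 0 at every voxel
t_obs, p_fwe = p2_fwe(Y_star, X_star, lam, n_perm=200)
print("uncorrected parametric P (voxel 0):", mixture_pvalue(t_obs[0]))
print("smallest FWE-corrected P:", p_fwe.min())
```

Each permutation only needs one least-squares refit in the transformed space, so the cost per permutation is the same as fitting the observed data once.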

Results
• WLS and ML heritability estimators are comparable for large samples.
• The parametric inference is conservative.
• False positive rates are almost exact with P2 permutation, while P1 is conservative.

[Figure 1a panels: mean, SD, bias and MSE of $h^2$, ML vs WLS, at sample sizes nS = 138, 626, 858, 1497, for true $h^2$ = 0, 0.2, 0.4, 0.6, 0.8. Figure 1b panels: rejection rates (%) of $T_S$, $T_{L,ML}$, $T_{L,WLS}$, $T_{W,ML}$ and $T_{W,WLS}$ under P1, P2 and parametric inference, for $h^2$ = 0, 0.2, 0.4, 0.6. Figure 1c: PP plot of uncorrected P-values (observed vs theoretical -log(P)) for permutation and parametric $T_S$ and $T_{W,WLS}$.]

Figure 1a: $h^2$ estimation accuracy. Figure 1b: $h^2$ test false positive rate (top left) and power. Figure 1c: uncorrected P-value performance.

Image-wise Simulations

[Figure 2 panels: (a) PP plot for the maximum statistic (observed vs theoretical -log(P)) for $T_S$ and $T_{W,WLS}$; (b) cluster-wise inference FWE rates of $T_S$ and $T_{W,WLS}$ at cluster-forming thresholds P = 0.05, 0.01, 0.005, 0.001; (c) PP plots for the maximum cluster size of $T_S$ and of $T_{W,WLS}$ at the same thresholds.]

Figure 2: voxel-wise FWE P-values (2a), cluster-wise FWE rates (2b) and cluster-wise FWE P-values (2c) for different cluster-forming thresholds. Rates are nominal except for the higher cluster-forming thresholds of the Wald test (Fig. 2b,c).

Real Data Analysis
• ML and WLS heritability estimators have the same distribution (Fig. 3b).
• The WLS estimator shows a slight but consistent trend towards underestimation relative to ML (Fig. 3a).
• The score test is slightly less sensitive than the other tests (Fig. 3c).

[Figure 3, panels 3a-3f. Recoverable panel content: histograms of non-zero $h^2$ for ML and for WLS, each reporting 89.1% of $h^2 > 0$.]

• Real data results: voxel-wise heritability estimates for ML (top) and WLS (bottom) (Fig. 3d).
• Voxel-wise 5% FWE significant heritability for the LRT (top), WLS Wald (middle) and score (bottom) tests (Fig. 3e).
• Cluster-wise 5% FWE inference results with an uncorrected P=0.01% cluster-forming threshold: LRT (top row), WLS Wald (middle) and score test (bottom) (Fig. 3f).

Conclusion

In this work we presented a fast and powerful method to estimate heritability that enables standard spatial inference for genetic analysis tools optimized for imaging research, such as SOLAR/SOLAR-Eclipse (Kochunov 2013).

References
T. Amemiya (1977). A note on a heteroscedastic model. Journal of Econometrics, 6(3):365-370.
J. Blangero et al. (2013). A kernel of truth: statistical advances in polygenic variance component models for complex human pedigrees. Volume 81. Academic Press.
H. Ganjgahi et al. (2015). Fast and powerful heritability inference for family-based neuroimaging studies. NeuroImage, in press.
P. Kochunov, T. E. Nichols (2013). SOLAR-Eclipse computational tools for imaging genetic and mega-genetic analysis. 19th Annual Meeting of the Organization for Human Brain Mapping.

Electronic copy of this poster: http://warwick.ac.uk/tenichols/ohbm