A GLMM-based Collapsing Method for Rare CNV Analysis

Jung-Ying Tzeng, Bioinformatics Research Center & Department of Statistics, NC State University. Joint work with Jin Szatkiewicz and Patrick Sullivan @ UNC-CH. ENAR, March 18, 2014.


Page 1: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

A GLMM-based Collapsing Method for Rare CNV Analysis


Jung-Ying Tzeng

Bioinformatics Research Center & Department of Statistics

NC State University

Joint work with Jin Szatkiewicz and Patrick Sullivan @ UNC-CH

ENAR March 18, 2014

Page 2: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Copy Number Variants (CNVs)

• CNVs: changes in the number of DNA copies compared to the reference

• Although SNPs outnumber CNVs, their relative contributions to genomic variation (as measured in nucleotides) are similar (Malhotra and Sebat 2012)

[Figure: a deletion and a duplication of a DNA segment relative to the reference sequence; CNV size ranges from 1 bp to Mb]

(Source: Ferreira and Purcell 2009)

                 SNPs             CNVs
Estimate         ~1 in 1000 bp    > 1000 CNVs
Base pairs       ~4 Mb            ~4 Mb
% genome         ~0.1%            ~0.1%
Mutation rate    10^-8            10^-4 to 10^-6

Page 3: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Copy Number Variants (CNVs)

• CNVs can affect disease risk

Ex. CNVs play an important role in the etiology of multiple psychiatric disorders, e.g., developmental delay, autism, schizophrenia


Page 4: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Malhotra and Sebat 2012

Page 5: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Malhotra and Sebat 2012

Page 6: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Collapsing Analysis for rare CNVs

• Collapsing analysis serves as a key approach to evaluate the collective effect of rare CNVs (Sullivan et al. 2012; Collins and Sullivan 2013; Malhotra & Sebat 2012)

• CNVs are typically collapsed across the genome
  – Ex. a greater genome-wide burden of rare CNVs in SCZ cases than in controls (Walsh et al. 2008 Science; International Schizophrenia Consortium 2008 Nature; Kirov et al. 2009 Hum. Mol. Genet; Buizer-Voskamp et al. 2011 Biol. Psychiatry)

  or within genes
  – Ex. the burden of rare CNVs in NRXN1 was significantly greater in SCZ cases than in controls (Szatkiewicz et al. submitted)

Page 7: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Developments in SNP Collapsing Analysis

• Depending on how genotype information is modeled, SNP collapsing methods can be roughly classified into

1. Fixed effects approaches

2. Random effects approaches

Page 8: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

SNP Collapsing Analysis

1. Fixed effects approaches

• Focus on testing mean level of genetic effects

• Regress traits on weighted genotype sum of all loci:

  $g(E[Y_i]) = \beta^* (w_1 G_{1i} + \cdots + w_M G_{Mi})$, where $G_{mi} \in \{0,1,2\}$, i.e., locus $m$ has effect $\beta_m = w_m \beta^*$

• Test for association: $H_0: \beta^* = 0$

• Optimal if the effects of different loci are additive, have similar size and same direction
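The fixed effects idea on this slide (regress the trait on one weighted genotype sum) can be illustrated with a minimal sketch. It assumes a continuous trait with an identity link and ordinary least squares; the function names `weighted_burden` and `ols_slope`, the weights, and the toy data are hypothetical and not taken from the talk.

```python
def weighted_burden(genotypes, weights):
    """Collapse each subject's genotypes (0/1/2 per locus) into one weighted sum."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept), i.e. the single collapsed effect."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data (hypothetical): 4 subjects, 3 loci.
G = [[0, 1, 0], [2, 0, 1], [0, 0, 0], [1, 1, 0]]
w = [1.0, 0.5, 2.0]
burden = weighted_burden(G, w)
y = [2.0 * b for b in burden]      # trait generated with a common effect of 2
beta_star = ols_slope(burden, y)   # recovers the common effect
```

Because all loci share one coefficient, effects in opposite directions cancel in the sum, which is why this approach is described as optimal only when effects have the same direction.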

Page 9: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

SNP Collapsing Analysis

2. Random effects approaches

• Focus on testing variance level of genetic effects

• Basic idea: $g(E[Y_i]) = \beta_1 G_{1i} + \cdots + \beta_M G_{Mi}$, where $G_{mi} \in \{0,1,2\}$

  Assume $\beta_m \sim N(0, \tau)$

• In general: $g(E[Y_i]) = g_i$, where $[g_1, \ldots, g_n]^T \sim N(0, \tau S)$ and $S = \{s_{ij}\}$, with $s_{ij}$ the genetic similarity between subjects $i$ and $j$

• Test for association: $H_0: \tau = 0$

(Schaid 2010; Pan 2011; Wu et al. 2011; Tzeng et al. 2009, 2011)

Page 10: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

SNP Collapsing Approaches

2. Random effects approaches

• Methods differ by the choices of weights and genetic similarity, e.g.,

  – Global test (Goeman et al. 2004): no weights
  – C-alpha method (Neale et al. 2011): weight = I{MAF < cut}
  – Kernel Machine Regression (Wu et al. 2010, 2011): similarity = IBS at locus $m$ between subjects $i$ and $j$; weight = $(1-\mathrm{MAF})^{24}$
  – Similarity Regression (Tzeng et al. 2009, 2011, 2014): similarity = IBS at locus $m$ between subjects $i$ and $j$, with a locus-specific weight

• Optimal if genetic effects are interactive / non-linear among loci or vary across loci

Page 11: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Challenges in CNV Collapsing Analysis
--- Cautions about applying SNP collapsing methods

1. Copy number (dosage) is not binary
– Deletion (0,1), normal copy (2) and duplication (3,4+)
– SNP collapsing methods assume a binary event (i.e., mutant allele vs. not) and only keep track of the number of “events”

2. CNV polymorphisms are multi-faceted
– CNVs can vary in dosage, length and details of gene intersections
– Each of these “features” affects CNVs’ impact on disease risk
– SNP collapsing methods target only one feature (i.e., mutation burden)

Page 12: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Challenges in CNV Collapsing Analysis
--- Cautions about applying SNP collapsing methods

3. Etiological heterogeneity is often observed in CNVs
– Different dosage may have different effects
  Ex. 22q11.2 deletion is a risk factor for SCZ (Bassett et al. 2005; Murphy et al. 1999) whereas 22q11.2 duplication is a protective factor (Rees et al. 2014)
  Ex. In gene VIPR2, triplication has higher risk than duplication for SCZ (Vacic et al. 2011)
– Collapsing with random effects methods has greater potential than with fixed effects methods for CNV analyses (for between-locus heterogeneity)
– Caution is still needed for within-locus heterogeneity

Page 13: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Current CNV Collapsing Methods
(All are fixed effects methods)

1. PLINK Burden Tests (International Schizophrenia Consortium 2008; Kirov et al. 2009)

– Dichotomize CNV genotypes based on the event of interest, e.g.,
  • CNV (dosage ≠ 2) vs. no CNV (dosage = 2)
  • Del (<2) vs. No Del
  • Dup (>2) vs. No Dup
  • Genes intersected (GI) by CNVs vs. no GI
– Compare the event rates between cases and controls
– Drawbacks:
  • Need to dichotomize data based on the event of interest
  • Do not address the issue of etiological heterogeneity
  • Only evaluate marginal effects of a CNV feature, which is subject to spurious association (Raychaudhuri et al. 2010)

Page 14: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

            Cases                        Controls
Scenario    CNV rate   Mean size (kb)    CNV rate   Mean size (kb)
S0          0.25       100               0.25       100
S1          0.25       100               0.05       100
S2          0.25       60                0.25       100
S3          0.25       100               0.05       60
S4          0.25       60                0.05       100

Under no GI effect (Raychaudhuri et al. 2010)

Page 15: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Current CNV Collapsing Methods

2. PLINK Enrichment Tests (Raychaudhuri et al. 2010)

[Model equation with terms labeled: total # of CNVs; mean CNV size (kb); # of genes intersected (GI) by a CNV]

– Pros: assess conditional effect of CNV features and avoid spurious association

Page 16: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

            Cases                        Controls
Scenario    CNV rate   Mean size (kb)    CNV rate   Mean size (kb)
S0          0.25       100               0.25       100
S1          0.25       100               0.05       100
S2          0.25       60                0.25       100
S3          0.25       100               0.05       60
S4          0.25       60                0.05       100

Under no GI effect (Raychaudhuri et al. 2010)

Page 17: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Current CNV Collapsing Methods

2. PLINK Enrichment Tests (Raychaudhuri et al. 2010)

[Model equation with terms labeled: total # of CNVs; mean CNV size (kb); # of genes intersected (GI) by a CNV]

– Pros: assess conditional effect of CNV features and avoid spurious association
– Cons:
  • Need to dichotomize data based on the event of interest
  • Do not address the issue of etiological heterogeneity

Page 18: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Proposed CNV Collapsing Method


Page 19: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Plan

• Use random effects model approaches
  – To account for between-locus and within-locus etiological heterogeneity
• Model multiple features of CNVs
  – To assess the conditional effect of a CNV feature
• Accommodate the multinomial nature of dosage
  – To avoid dichotomizing data

Page 20: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

1. Input Data Format

(0) Start with a PLINK format CNV file

(1) Define CNV region (CNVR):
– Clusters of CNV segments with ≥1bp overlap
– Retain region-specific effect when collapsing

[Figure: CNV segments (rows 1, 2, 3, 4, 5, …) along the genome, clustered into CNVR1 and CNVR2]
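Step (1) can be sketched as an interval merge: segments on the same chromosome that share at least 1 bp fall into the same CNVR. This is a minimal illustration, assuming segments are given as `(chrom, start, end)` tuples with inclusive base-pair coordinates; the function name `define_cnvrs` is hypothetical, and reading the PLINK CNV file itself is not shown.

```python
from collections import defaultdict


def define_cnvrs(segments):
    """Merge CNV segments that overlap by >= 1 bp into CNV regions (CNVRs).

    segments: iterable of (chrom, start, end), coordinates inclusive.
    Returns a list of (chrom, start, end) regions, sorted by chromosome and start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in segments:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                # Shares at least one base pair with the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                if cur_start is not None:
                    regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


# Toy segments (hypothetical): the first two overlap, the third starts 1 bp after they end.
segments = [("1", 100, 200), ("1", 150, 300), ("1", 301, 400), ("2", 50, 60)]
cnvrs = define_cnvrs(segments)
```

Adjacent but non-overlapping segments (300 and 301 above) stay in separate regions, matching the "≥1bp overlap" rule.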

Page 21: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

1. Input Data Format

(2) Create design matrix for each CNV feature: dosage, length, and gene intersection

– Dosage (DS): subject-by-CNVR matrix with entries $\in \{0,1,2,3,4\}$

– Length (Len): subject-by-CNVR matrix with entries = length in kb

Page 22: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

1. Input Data Format

(2) Create design matrix for each CNV feature: dosage, length, and gene intersection

– Gene intersection (GI): subject-by-gene matrix with entries $\in \{0, 1, 2\}$, where 0 = no GI, 1 = GI by a Del, 2 = GI by a Dup
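The feature matrices in step (2) can be sketched as follows. This is an illustration under stated assumptions: CNV calls arrive as `(subject, cnvr, copies, length_kb)` tuples with 0-based indices, a subject without a CNV in a region gets dosage 2 (normal copy) and length 0, and the names `build_design` and `gi_code` are hypothetical.

```python
def build_design(calls, n_subjects, n_cnvrs):
    """Build subject-by-CNVR dosage (DS) and length (Len) matrices from CNV calls."""
    ds = [[2] * n_cnvrs for _ in range(n_subjects)]        # 2 = normal copy number
    length = [[0.0] * n_cnvrs for _ in range(n_subjects)]  # 0 kb when no CNV is present
    for subj, cnvr, copies, length_kb in calls:
        ds[subj][cnvr] = copies
        length[subj][cnvr] = length_kb
    return ds, length


def gi_code(copies):
    """Gene-intersection code for a gene hit by a CNV with the given copy number.

    0 = no GI (normal copy), 1 = GI by a deletion, 2 = GI by a duplication.
    """
    if copies < 2:
        return 1
    if copies > 2:
        return 2
    return 0


# Toy calls (hypothetical): subject 0 carries a deletion in CNVR 0,
# subject 1 carries a duplication in CNVR 1.
calls = [(0, 0, 1, 120.0), (1, 1, 3, 45.5)]
ds, length = build_design(calls, n_subjects=2, n_cnvrs=2)
```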

Page 23: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

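2. Model

• For subject $i$, let $Y_i$ be the continuous or binary trait, $X_i$ be a covariate vector including the intercept, and $Z_{k,i}$ be the design vector of CNV feature $k$

• Model: $g(\mu_i) = X_i^T \gamma + \sum_k f_k(Z_{k,i})$

• Assume $Y_i$ follows an exponential family distribution with density $\exp\{[Y_i \theta_i - b(\theta_i)]/a(\phi) + c(Y_i, \phi)\}$, where $\mu_i = E[Y_i] = b'(\theta_i)$ and $\mathrm{Var}(Y_i) = a(\phi)\, b''(\theta_i)$

• $f_k(\cdot)$ models the effect of CNV feature $k$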

Page 24: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

2. Model

• Examples of $f_k(\cdot)$
  – Ex. Linear regression: $f_k(Z_{k,i}) = Z_{k,i}^T \beta_k$
  – Ex. Random effect: $f_k(Z_{k,i}) = h_{k,i}$ and $[h_{k,1}, \ldots, h_{k,n}]^T \sim N(0, \tau_k S_k)$
  – Ex. In Raychaudhuri et al. (2010),
    • total # of CNVs of subject $i$
    • # of GI by CNVs for subject $i$

• Propose to model the covariates and background CNV features using fixed effects and model the CNV feature of interest using random effects

Page 25: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

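Example: Assessing Dosage Effect

• GLMM: $g(\mu_i) = X_i^T \gamma + \beta_{Len} Z_{Len,i} + h_{DS,i}$, with $[h_{DS,1}, \ldots, h_{DS,n}]^T \sim N(0, \tau S_{DS})$,

  where $S_{DS}$ is an $n \times n$ matrix whose $(i,j)$ entry is the dosage similarity between subjects $i$ and $j$

  – $Z_{Len,i}$: mean CNV length in kb
  – Dosage effect can be evaluated by testing $H_0: \tau = 0$
  – Test statistic: follows a weighted $\chi^2$ distribution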
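Variance-component tests of this kind are commonly based on a quadratic form in the null-model residuals, and the sketch below shows that form. It is an assumption that the slide's statistic has this shape: the example uses an identity link and an intercept-only null model, and the name `quadratic_form_stat` is hypothetical. Computing the p-value from the weighted chi-square distribution (which needs the eigenvalues of a projected similarity matrix) is not shown.

```python
def quadratic_form_stat(y, S):
    """Q = r^T S r, where r are residuals from an intercept-only null model."""
    n = len(y)
    mean_y = sum(y) / n
    r = [yi - mean_y for yi in y]
    return sum(r[i] * S[i][j] * r[j] for i in range(n) for j in range(n))


# Toy example (hypothetical): 3 subjects, identity similarity matrix.
y = [1.0, 2.0, 3.0]
S = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
Q = quadratic_form_stat(y, S)
```

Q is large when subjects with similar CNV features also have similar residual trait values, which is the signal a non-zero variance component would produce.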

Page 26: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Remark 1: Connection with Other Random Effects Methods

• The GLMM has a direct connection with kernel machine regression (Kwee et al. 2008; Wu et al. 2010) and gene-trait similarity regression (Tzeng et al. 2009; 2011)

• Under the kernel machine framework, the GLMM is equivalent to setting $h_i = \sum_{j=1}^{n} \alpha_j K(Z_i, Z_j)$, with $\alpha_j$ being the unknown parameters (the dual representation)

• Under the similarity regression framework, $\tau$ is the regression coefficient of genetic similarity that is quantified by the similarity metric $s_{ij}$

Page 27: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Remark 2: Quantifying Similarity b/w Subjects $i$ and $j$

• Use the $d$-th order polynomial function as the similarity (kernel) function, where $w_m$ is the pre-specified weight for locus $m$ based on, e.g., MAF

• Cannot directly use the raw dosage in the kernel function (both deletions and duplications deviate from the “normal reference”)

• Solution: factorize dosage, which retains dosage-specific effect when collapsing
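One way to read "factorize dosage" is to treat each non-reference copy number as its own factor level, so that a deletion and a duplication at the same locus never count as similar. The sketch below illustrates that reading only; the exact coding used in the talk is not recoverable from the transcript, and the names `factorize`, `factor_similarity`, and `raw_similarity` are hypothetical.

```python
# Non-reference dosage levels; copy number 2 is the normal reference.
LEVELS = (0, 1, 3, 4)


def factorize(dosage_row):
    """One 0/1 indicator per (locus, non-reference level); reference dosage maps to all zeros."""
    return [1 if d == lvl else 0 for d in dosage_row for lvl in LEVELS]


def factor_similarity(row_i, row_j, weights):
    """Weighted count of loci where both subjects carry the same non-reference dosage."""
    zi, zj = factorize(row_i), factorize(row_j)
    k = len(LEVELS)
    return sum(weights[idx // k] * a * b for idx, (a, b) in enumerate(zip(zi, zj)))


def raw_similarity(row_i, row_j, weights):
    """Weighted cross-product of raw dosages, shown only for contrast."""
    return sum(w * a * b for w, a, b in zip(weights, row_i, row_j))


# Toy dosage rows (hypothetical) over 3 loci.
a = [1, 2, 3]
b = [1, 2, 1]
c = [3, 2, 3]
w = [1.0, 1.0, 1.0]
```

Here `b` carries only deletions and `c` only duplications, so their factor-coded similarity is 0, while the raw-dosage cross-product is positive and would wrongly treat them as alike.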

Page 28: A GLMM-based  Collapsing  Method for Rare  CNV Analysis


Simulation Studies

Page 29: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Simulation Scheme

• Obtain CNV data from TwinGene Project (Heijmans 2005; Silventoinen et al. 2006)
– Cross-sectional sampling design
– 2000 unrelated samples (rarest CNV = )
– 1757 CNVRs
– 688 genes (69 genes intersected by CNVs)
– Sample with replacement to form an individual’s CNV profile
– Determine the trait value based on CNV features of interest
– Simulate 2000 individuals (1000 cases and 1000 controls)
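The resampling steps can be sketched as below. This is a rough illustration, not the authors' simulation code: it assumes whole CNV profiles are drawn with replacement, that disease status follows a logistic model in a user-supplied risk score, and that sampling continues until the case and control quotas are filled. The function name, the intercept, and the toy profiles are hypothetical.

```python
import math
import random


def simulate_case_control(profiles, risk_score, n_cases, n_controls,
                          intercept=-1.0, seed=0):
    """Resample CNV profiles with replacement and assign case/control status."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        profile = rng.choice(profiles)  # sampling with replacement
        # Logistic model (an assumption): P(case) from intercept + risk score.
        p_case = 1.0 / (1.0 + math.exp(-(intercept + risk_score(profile))))
        if rng.random() < p_case:
            if len(cases) < n_cases:
                cases.append(profile)
        elif len(controls) < n_controls:
            controls.append(profile)
    return cases, controls


# Toy dosage profiles (hypothetical) over 2 CNVRs; risk rises with each deletion.
profiles = [[2, 2], [1, 2], [2, 3], [1, 1]]
score = lambda prof: 0.8 * sum(1 for d in prof if d < 2)
cases, controls = simulate_case_control(profiles, score, n_cases=5, n_controls=5)
```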

Page 30: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Simulation Scheme

Scheme A. Different dosage effects of Dup and Del

A1. Between-locus heterogeneity

– Randomly select 300 Dup-only CNVRs and 300 Del-only CNVRs as causal loci

A2. Within-locus heterogeneity

– Select the 38 CNVRs with both Dup and Del as causal

Scheme B. Different gene-intersection effect of Dup and Del (i.e., heterogeneous effect of genes intersected by Dup and by Del)

B1. Across-gene heterogeneity

– Randomly select 26 genes with Dup intersection only and 26 genes with Del intersection only as causal

B2. Within-gene heterogeneity

– Select the 8 genes with both Dup and Del intersection as causal


Page 31: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Type I Error for (A) Dosage Analysis

• Compare the proposed GLMM methods with

plink.all = PLINK CNV rates

plink.dup = PLINK Duplication rates

plink.del = PLINK Deletion rates

• Type I error rates:

Model                          GLMM     plink.all   plink.dup   plink.del
Between-Locus Heterogeneity    0.035    0.046       0.057       0.041
Within-Locus Heterogeneity     0.041    0.051       0.057       0.043

Page 32: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Simulation Scheme

Scheme A. Different dosage effects of Dup and Del

A1. Between locus heterogeneity

– Randomly select 300 Dup-only CNVRs and 300 Del-only CNVRs as causal

A2. Within locus heterogeneity

– Select the 38 CNVRs with both Dup and Del as causal

Scheme B. Different gene-intersection effect of Dup and Del

B1. Across-gene heterogeneity

– Randomly select 26 genes with Dup intersection only and 26 genes with Del intersection only as causal

B2. Within-gene heterogeneity

– Select the 8 genes with both Dup and Del intersection as causal


Page 33: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Type I Error for Gene Intersection (GI) Analysis

• Compare the proposed GLMM methods with

PLINK Enrichment test (Raychaudhuri et al. 2010)

• Type I error rates:

Model                          GLMM     plink.enrichment
Between-Locus Heterogeneity    0.041    0.044
Within-Locus Heterogeneity     0.043    0.051

Page 34: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Power Analysis for (A) Dosage Effects


Page 35: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

A1. (Dosage effect) Between-Locus Heterogeneity

[Figure: power (vs. plink 2-sided) in four panels: (i) 50% Dup causal (Del causal) are harmful and 50% are protective; (ii) all Dup causal are harmful, all Del causal are protective; (iii) all Dup causal are protective, all Del causal are harmful; (iv) no heterogeneity]

Page 36: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

A2. (Dosage effect) Within-Locus Heterogeneity

[Figure: power (vs. plink 2-sided)]

Page 37: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Power Analysis for (B) GI Effects

Page 38: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

B1. (GI effect) Between-Gene Heterogeneity

[Figure: power (vs. plink.enrichment) in four panels: (i) 50% Dup causal (Del causal) are harmful and 50% are protective; (ii) all Dup causal are harmful, all Del causal are protective; (iii) all Dup causal are protective, all Del causal are harmful; (iv) no heterogeneity]

Page 39: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

B2. (GI effect) Within-Gene Heterogeneity

[Figure: power (vs. plink.enrichment)]

Page 40: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Summary

For CNV collapsing analysis:

• Developments in SNP collapsing can be applied in CNV collapsing with modification to account for the nature of CNVs, e.g., defining “locus” using CNVR or gene, calculating similarity based on factorized dosage / GI details, and adjusting for background CNV features

• Random effect modeling has more potential to address etiological heterogeneity
  – For DS, the random effects model is robust across different scenarios
  – For GI, GLMM is more powerful than plink.enrichment
    • Note that GLMM has the same model as plink.enrichment except that the GI effect is modeled using a random effect with factorized coding

• Current work: a fixed-effect imputation method to speed up the EM computation (for estimating the variance components) when using random effects on all CNV features

Page 41: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Testing with    Type I error rate at nominal level 0.05        Power at nominal level 0.05
                Fixed-effect Imputation   EM Algorithm         Fixed-effect Imputation   EM Algorithm
                0.051 (4.9 min)           0.048 (32.6 min)     0.282 (2.2 min)           0.287 (18.7 min)
                0.052 (5.8 min)           0.049 (18.2 min)     0.278 (2.4 min)           0.282 (12.0 min)

Page 42: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Thank you

Page 43: A GLMM-based  Collapsing  Method for Rare  CNV Analysis
Page 44: A GLMM-based  Collapsing  Method for Rare  CNV Analysis
Page 45: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Kirov et al 2009

Page 46: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

            Cases                        Controls
Scenario    CNV rate   Mean size (kb)    CNV rate   Mean size (kb)
S0          0.25       100               0.25       100
S1          0.25       100               0.05       100
S2          0.25       60                0.25       100
S3          0.25       100               0.05       60
S4          0.25       60                0.05       100

Under no GI effect (Raychaudhuri et al. 2010)

Page 47: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

Multi-faceted Nature of CNVs

Kirov et al 2009


Page 48: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

A2. (Dosage effect) Within-Locus Heterogeneity

[Figure: power (vs. plink 2-sided)]

Page 49: A GLMM-based  Collapsing  Method for Rare  CNV Analysis

A1. (Dosage effect) Between-Locus Heterogeneity

[Figure: power (vs. plink 1-sided) in four panels: (i) 50% Dup causal (Del causal) are harmful and 50% are protective; (ii) all Dup causal are harmful, all Del causal are protective; (iii) all Dup causal are protective, all Del causal are harmful; (iv) no heterogeneity]