
Associating Genomic Variations with Phenotypes

Model comparison, rare variants, and analysis pipeline

Qunyuan Zhang

Division of Statistical Genomics & Genome Institute

Washington University School of Medicine


Data & Question

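Relationship between X and Y?

$$
\begin{array}{c|c|cccc}
i & Y & X_1 & X_2 & \cdots & X_m \\ \hline
1 & y_1 & x_{11} & x_{12} & \cdots & x_{1m} \\
2 & y_2 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
n & y_n & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{array}
$$

Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …

Phenotypes (Y): quantitative, categorical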


Linkage & Association

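Association: (Y, X)

Linkage: (Y, Q), where Q is unobservable

$$
\begin{array}{c|c|ccc}
i & Y & X_1 & Q & X_2 \\ \hline
1 & y_1 & x_{11} & q_1 & x_{12} \\
2 & y_2 & x_{21} & q_2 & x_{22} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
n & y_n & x_{n1} & q_n & x_{n2}
\end{array}
$$

Y: phenotype. X: genotypes. Q: putative QTL.

Marker map: X1 --r1-- Q --r2-- X2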


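A Fixed-effect Mixture Model For Linkage

Commonly used in plant genetics

Cross design: P1 × P2 → F1 → F2

Marker map: SNP A --r1-- Q --r2-- SNP B

$$f(y_i) = \sum_{j=1}^{3} P(Q_j \mid X_i, r)\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y_i-\mu_j)^2}{2\sigma^2}\right]$$

$$L(Y) = \prod_{i=1}^{n} f(y_i)$$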
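The mixture likelihood on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the toy phenotypes, the conditional QTL genotype probabilities, and the use of a single common residual standard deviation are all assumptions, not part of the original slides.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def mixture_log_likelihood(ys, qtl_probs, mus, sigma):
    """log L(Y) = sum_i log( sum_j P(Q_j | X_i, r) * N(y_i; mu_j, sigma^2) )."""
    total = 0.0
    for y, probs in zip(ys, qtl_probs):
        f_y = sum(p * normal_pdf(y, mu, sigma) for p, mu in zip(probs, mus))
        total += math.log(f_y)
    return total

# Hypothetical toy data: 3 F2 individuals, 3 possible QTL genotypes.
ys = [10.2, 8.1, 6.3]
# Row i holds P(Q_j | X_i, r), assumed to be precomputed from the flanking markers.
qtl_probs = [[0.90, 0.09, 0.01],
             [0.05, 0.90, 0.05],
             [0.01, 0.09, 0.90]]
mus = [10.0, 8.0, 6.0]  # assumed genotype means
sigma = 1.0             # assumed common residual standard deviation

loglik = mixture_log_likelihood(ys, qtl_probs, mus, sigma)
print(round(loglik, 4))
```

In practice the means and variance would be estimated (for example by an EM algorithm) at each putative QTL position rather than fixed as they are here.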


A Variance-component Model For Linkage

Commonly used in human genetics

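Marker map: SNP A --r1-- Q --r2-- SNP B

$$L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}} \exp\left[-\frac{1}{2}(Y-\mu)^{T} V^{-1} (Y-\mu)\right]$$

$$V = Cov(Y) = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2$$

where $\Delta_Q$ is the QTL IBD matrix, $\Delta_g$ is the background IBD matrix, and $I$ is the diagonal unit matrix.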
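As a rough sketch of how this likelihood could be evaluated, the following pure-Python code builds V from the three components and computes the multivariate normal log-likelihood through a Cholesky factorization. The helper names, the sib-pair IBD values, and the variance values are hypothetical and chosen only for illustration.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvn_log_likelihood(y, mu, v):
    """log of (2 pi)^(-n/2) |V|^(-1/2) exp(-0.5 (y-mu)^T V^-1 (y-mu))."""
    n = len(y)
    low = cholesky(v)
    resid = [yi - mu for yi in y]
    # Forward substitution: solve low * z = resid, so z^T z is the quadratic form.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((resid[i] - s) / low[i][i])
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

def build_v(delta_q, delta_g, s2_q, s2_g, s2_e):
    """V = Delta_Q * s2_q + Delta_g * s2_g + I * s2_e."""
    n = len(delta_q)
    return [[delta_q[i][j] * s2_q + delta_g[i][j] * s2_g + (s2_e if i == j else 0.0)
             for j in range(n)] for i in range(n)]

# Hypothetical sib pair: background sharing 0.5, full IBD sharing at the QTL.
delta_g = [[1.0, 0.5], [0.5, 1.0]]
delta_q = [[1.0, 1.0], [1.0, 1.0]]
v = build_v(delta_q, delta_g, s2_q=0.3, s2_g=0.4, s2_e=0.3)
loglik = mvn_log_likelihood([1.2, 0.8], mu=1.0, v=v)
print(round(loglik, 4))
```

A linkage test would compare the maximized likelihood with and without the QTL variance component; the maximization step is omitted here.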


Variance-component Model = Random-effect Linear Model

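$$V = \Delta_Q\sigma_Q^2 + \Delta_g\sigma_g^2 + I\sigma_e^2$$

$$Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e$$

Random effects:

$$\gamma_Q \sim MVN(0, \Delta_Q\sigma_Q^2), \qquad \gamma_g \sim MVN(0, \Delta_g\sigma_g^2), \qquad e \sim N(0, \sigma_e^2)$$

$$L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}} \exp\left[-\frac{1}{2}(Y-\mu)^{T} V^{-1} (Y-\mu)\right]$$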


From Linkage to Association

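Linkage model, with QTL effect(s) $Z_Q\gamma_Q$:

$$Y = \mu + Z_Q\gamma_Q + Z_g\gamma_g + e$$

Family-based association model, with marker effect(s) $X\beta$ as fixed effect(s):

$$Y = \mu + X\beta + Z_g\gamma_g + e$$

$$V = \Delta_g\sigma_g^2 + I\sigma_e^2$$

$$L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}} \exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T} V^{-1} (Y-\mu-X\beta)\right]$$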


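A Simple Association Model For Unrelated Subjects

$$Y = \mu + X\beta + e$$

$$V = I\sigma_e^2$$

$$L(Y) = \frac{1}{(2\pi)^{n/2}\,|V|^{1/2}} \exp\left[-\frac{1}{2}(Y-\mu-X\beta)^{T} V^{-1} (Y-\mu-X\beta)\right] = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_e}\exp\left[-\frac{(y_i-\mu-X_i\beta)^2}{2\sigma_e^2}\right]$$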
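For unrelated subjects and a single marker, maximizing this likelihood over μ and β is ordinary least squares. The sketch below fits the model and returns a t statistic for H0: β = 0. The function name and the toy genotype/phenotype values are made up for illustration; a p-value would come from a t distribution with n − 2 degrees of freedom, which is not computed here.

```python
import math

def simple_association(y, x):
    """Least-squares fit of y = mu + x*beta + e; returns (mu, beta, t_stat)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    mu = y_bar - beta * x_bar
    rss = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_e = rss / (n - 2)            # residual variance estimate
    se_beta = math.sqrt(sigma2_e / sxx)  # standard error of beta
    return mu, beta, beta / se_beta

# Hypothetical data: genotype coded as minor-allele count (0, 1, 2).
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 1.9, 2.1, 3.1, 2.9]
mu, beta, t_stat = simple_association(y, x)
print(round(mu, 4), round(beta, 4), round(t_stat, 2))
```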


Covariate(s): Adjusting For Confounder(s)

$$Y = \mu + X\beta + X_C\beta_C + e$$

Observed confounders: age, sex, etc.

Hidden confounders: population structure

Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry


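Modeling Hidden Genetic Correlation Between Subjects

$$Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e$$

$$V = \Delta_g\sigma_g^2 + I\sigma_e^2$$

where $X\beta$ are the marker fixed effect(s), $X_C\beta_C$ the covariate fixed effect(s), and $Z_g\gamma_g$ the genetic background random effects.

Source of $\Delta_g$:
- Family data, pedigree => IBD matrix
- Population data, hidden, marker data => IBS matrix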
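For population data, an IBS matrix can be computed directly from marker genotypes. A minimal sketch, assuming genotypes coded as allele counts (0, 1, 2) and using the simple "average proportion of alleles shared" definition of IBS; the function name and the toy genotypes are hypothetical, and other IBS or kinship estimators exist.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity: mean over markers of (2 - |g_a - g_b|) / 2."""
    n = len(genotypes)
    m = len(genotypes[0])
    ibs = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            shared = sum(2 - abs(ga - gb)
                         for ga, gb in zip(genotypes[a], genotypes[b]))
            ibs[a][b] = shared / (2.0 * m)
    return ibs

# Hypothetical genotypes: 3 subjects x 4 markers, coded as allele counts.
genotypes = [[0, 1, 2, 0],
             [0, 1, 1, 0],
             [2, 0, 0, 1]]
delta = ibs_matrix(genotypes)
for row in delta:
    print(row)
```

The resulting matrix would take the place of the IBD matrix in V when no pedigree is available.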


Modeling Rare Variants

$$Y = \mu + X\beta + X_C\beta_C + Z_g\gamma_g + e$$

Common variants, tested individually:

$$Y = \mu + X_1\beta_1 + \dots$$

H0: β1 = 0. One p-value per variant.

Rare variants, tested as an entire group (burden test), usually by gene:

$$Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots$$

H0: β1 = β2 = … = βk = 0. One p-value per group of variants.

- Can be combined with variable selection, using loose criteria
- β can be treated as random effects (variance components test) and can be weighted by prior information


Collapsing Model

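Collapsing multiple variables into one:

$$Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots \quad\Longrightarrow\quad Y = \mu + X\beta + \dots$$

subject   X1   X2   X3   X
1         0    0    0    0
2         0    1    1    1
3         1    0    0    1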
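The collapsing step can be sketched as follows, using the three subjects from the slide's table. The function name is hypothetical; the rule assumed here is that the collapsed variable is 1 when a subject carries at least one variant in the group and 0 otherwise.

```python
def collapse(genotype_rows):
    """Return 1 for a subject carrying any variant in the group, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotype_rows]

# Rows are subjects 1..3, columns are X1, X2, X3 (values from the slide's table).
rows = [[0, 0, 0],
        [0, 1, 1],
        [1, 0, 0]]
collapsed = collapse(rows)
print(collapsed)
```

The collapsed variable X then enters the regression as a single predictor, giving one test per group.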


Weighted Sum Model

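$$Y = \mu + X_1\beta_1 + X_2\beta_2 + \dots + X_k\beta_k + \dots$$

$$Y = \mu + \Big(\sum_{j=1}^{k} w_j X_j\Big)\beta + \dots$$

Weighted sum score $S = \sum_{j=1}^{k} w_j X_j$:

subject   X1 (w1=0.2)   X2 (w2=0.5)   X3 (w3=0.3)   S
1         0             0             0             0.0
2         0             1             1             0.8
3         1             0             0             0.2

$$Y = \mu + S\beta + \dots$$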
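A short sketch of the weighted sum score, using the weights (0.2, 0.5, 0.3) and the three subjects shown on the slide. The function name is an assumption for illustration.

```python
def weighted_sum_scores(genotype_rows, weights):
    """S_i = sum_j w_j * X_ij for each subject i."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotype_rows]

weights = [0.2, 0.5, 0.3]   # w1, w2, w3 from the slide
rows = [[0, 0, 0],          # subject 1
        [0, 1, 1],          # subject 2
        [1, 0, 0]]          # subject 3
scores = weighted_sum_scores(rows, weights)
print([round(s, 3) for s in scores])
```

With all weights equal to 1 the score reduces to a simple count of variant alleles per subject.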


Weighting Variants

- Based on allele frequency: continuous or binary (0, 1) weight, variable threshold;
- Based on functional annotation/prediction;
- Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
- Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from the data, which requires a permutation test;
- Any combination …

Grouping Variants

- By gene
- By transcript
- By exon
- By gene set / pathway
- By protein domain
- …
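Allele-frequency-based weights can be illustrated with a small sketch. Both weighting functions below are assumptions chosen for illustration: a binary weight that keeps only variants under a frequency threshold, and a continuous weight proportional to 1/sqrt(p(1−p)) that up-weights rarer variants. The function names, default threshold, and toy data are hypothetical.

```python
import math

def minor_allele_frequency(column):
    """MAF from allele counts (0, 1, 2) of one variant across subjects."""
    p = sum(column) / (2.0 * len(column))
    return min(p, 1.0 - p)

def binary_weight(maf, threshold=0.01):
    """1 if the variant is rarer than the threshold, else 0."""
    return 1.0 if maf < threshold else 0.0

def continuous_weight(maf):
    """Illustrative continuous weight 1/sqrt(p(1-p)); 0 for monomorphic sites."""
    if maf <= 0.0:
        return 0.0
    return 1.0 / math.sqrt(maf * (1.0 - maf))

# Hypothetical variant: one heterozygote among 10 subjects.
column = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
maf = minor_allele_frequency(column)
print(maf, binary_weight(maf, threshold=0.1), round(continuous_weight(maf), 3))
```

A variable-threshold approach would repeat the binary weighting over several thresholds and correct for the multiple choices, typically by permutation.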


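Modeling More Data Types

Generalized Linear (Mixed) Model

$$g(Y) = \mu + X\beta + \dots + e$$

where $g$ is the link function.

For binary Y, logistic model:

$$g(Y) = \mathrm{logit}(Y) = \log\frac{P(Y=1)}{P(Y=0)}$$

$$P(Y=1) = \frac{\exp(\mu + X\beta + \dots + e)}{\exp(\mu + X\beta + \dots + e) + 1}$$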
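The logit link and its inverse are easy to check numerically. In this sketch the linear predictor is reduced to μ + Xβ with made-up values for μ and β; the function names are hypothetical.

```python
import math

def logit(p):
    """log( p / (1 - p) ), the log odds of P(Y=1) = p."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """P(Y=1) = exp(eta) / (exp(eta) + 1), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (e + 1.0)

mu, beta = -1.0, 0.8  # hypothetical intercept and marker effect
probs = [inverse_logit(mu + beta * x) for x in (0, 1, 2)]
print([round(p, 4) for p in probs])
```

Here exp(β) is the odds ratio per additional copy of the allele, which is how a logistic marker effect is usually reported.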


Longitudinal Data (quantitative)

Fixed effect, time as covariate

Repeated measures, random effect, correlation within subjects

[Figure; x-axis: Time]


Longitudinal Data (binary)

Linear model, time as covariate

Survival analysis, CoxPH model etc.

[Figure; x-axis: Time]


Tools

SAS Procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST

R Functions/Packages: lm(), glm(), gee, nlme, kinship2/coxme, lme4, survival

Other Programs: SOLAR, MMAP, EMMA, EMMAX, SKAT


Pipeline

Input (data + options)
  => Job generating/submitting module (LSF bsub)
  => Job number controlling module
  => job 1, job 2, …, job N
       Options.jobi => self-programmed modules (SAS, R, …)
       Options.jobi => external program modules (MMAP, SKAT, …)
  => Result 1, Result 2, …, Result N
  => Job status monitoring module (all done?)
       No: wait …
       Yes: Result summarizing module
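The control flow of the pipeline can be mimicked locally as a rough sketch. This is not the actual pipeline: it replaces LSF `bsub` submission with Python threads, and `run_job`, `run_pipeline`, and the job options are hypothetical stand-ins. The thread pool's `max_workers` plays the role of the job number controlling module, and waiting on the futures plays the role of the status monitoring loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(job_id, options):
    """Hypothetical stand-in for one analysis job (e.g. a SAS, R, or MMAP run)."""
    return {"job": job_id, "phenotype": options["phenotype"], "status": "done"}

def run_pipeline(job_options, max_jobs):
    """Run jobs with at most max_jobs active at once, then summarize results."""
    results = []
    # max_workers caps the number of concurrently running jobs.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(run_job, i, opts)
                   for i, opts in enumerate(job_options, start=1)]
        # as_completed yields each job as it finishes (the "all done?" wait).
        for fut in as_completed(futures):
            results.append(fut.result())
    results.sort(key=lambda r: r["job"])
    return {
        "n_jobs": len(results),
        "all_done": all(r["status"] == "done" for r in results),
        "results": results,
    }

jobs = [{"phenotype": "bmi"}, {"phenotype": "obes"}, {"phenotype": "hd"}]
summary = run_pipeline(jobs, max_jobs=2)
print(summary["n_jobs"], summary["all_done"])
```

On a cluster, the submit step would instead hand each job to the scheduler and the monitoring step would poll the scheduler for job status.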


gwas.sh options.gwa

gwas.sh:

    #!/bin/sh
    OPFILE=$1
    ...
    …

options.gwa:

    [DATA]
    database=SAS
    genotype_dir=/dsg1/gwas/fhsgeno
    genotype_file=
    phenotype_file=fhs100
    markerinfo_file=mapall
    marker_selection=MAF>0.01
    pedigree_file=pediall
    subjectID=subject
    pedgreeID=famid
    markername=snp
    …
    [ANALYSIS]
    phenolist_file=
    pheno_list=bmi/qt
    covariates=
    program=SASGLM
    analysis=mixed
    [OUTPUT]
    output_dir=/dsguser/qunyuan/fhs/bmi
    output_file=
    output_replace=no
    [RUN]
    clusterjobname=bmimixed
    memsize=1000M
    maxjobn=300
    …

    Pheno   type   covar     program   analysis   run
    Bmi     qt     age,sex   SASGLM    mixed      YES
    Obes    ql     NA        SASGLM    gee        YES
    HD      ql     age       SASGLM    gee        NO
    Age     …
    Sex     …
    …

    Program   language   location                 Maintainer
    SASGLM    SAS        /dsg1/code/sas/glm.sas   Q.Zhang
    GSTAT     R          /dsg1/code/R/gstat.R     Q.Zhang
    MMAP      C          /dsg1/code/sas/mmap.sh   J. Czajkowski
    …
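The options file has an INI-like layout of `[SECTION]` headers and `key=value` lines. The slides do not show how `gwas.sh` reads it, so the following is only a sketch of one way such a file could be parsed, using Python's standard `configparser` on an excerpt of the keys shown above. The `load_options` helper is hypothetical.

```python
import configparser

# Excerpt of the options shown on the slide (elided entries are left out).
OPTIONS_TEXT = """
[DATA]
database=SAS
phenotype_file=fhs100
marker_selection=MAF>0.01
subjectID=subject

[ANALYSIS]
pheno_list=bmi/qt
program=SASGLM
analysis=mixed

[RUN]
memsize=1000M
maxjobn=300
"""

def load_options(text):
    """Parse INI-style text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. subjectID
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

options = load_options(OPTIONS_TEXT)
max_jobs = int(options["RUN"]["maxjobn"])
print(options["ANALYSIS"]["program"], options["ANALYSIS"]["analysis"], max_jobs)
```

Keeping the original key case matters here because keys such as `subjectID` would otherwise be lower-cased by the parser.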


Thanks!