Post on 02-Feb-2022
QRank: Quantile Regression for eQTL Analysis
Gen Li
Department of Biostatistics, Columbia University
March 27, 2017
Gen Li (Columbia Biostatistics) QRank March 27, 2017 1 / 11
Motivation
Expression quantitative trait loci (eQTLs) analysis can elucidategenetic regulatory pathways of complex diseases
Most eQTL studies focus on identifying mean effects
Linear regression models are commonly used
However, genetic variants may affect the entire distribution of geneexpressions
This distributional heterogeneity is understudied
Gen Li (Columbia Biostatistics) QRank March 27, 2017 2 / 11
Motivation
Expression quantitative trait loci (eQTLs) analysis can elucidategenetic regulatory pathways of complex diseases
Most eQTL studies focus on identifying mean effects
Linear regression models are commonly used
However, genetic variants may affect the entire distribution of geneexpressions
This distributional heterogeneity is understudied
Gen Li (Columbia Biostatistics) QRank March 27, 2017 2 / 11
Examples of Distributional Heterogeneity
0.0 0.2 0.4 0.6 0.8 1.0
-2-1
01
2
Gene: ENSG00000196932.7 & SNP: 10_63342861_C_T_b37
τ
Qua
ntile
nor
mal
ized
gen
e ex
pres
sion
SNP=0SNP=1SNP=2
0.0 0.2 0.4 0.6 0.8 1.0
-2-1
01
2
Gene: ENSG00000119943.6 & SNP: 10_100122640_C_G_b37
τ
Qua
ntile
nor
mal
ized
gen
e ex
pres
sion
SNP=0SNP=1SNP=2
P-values:LR: 6.11e-17CLS: 0.593QRBT: 6.25e-14
0.0 0.2 0.4 0.6 0.8 1.0
-2-1
01
2
Gene: ENSG00000259917.1 & SNP: 15_34743556_G_A_b37
τ
Qua
ntile
nor
mal
ized
gen
e ex
pres
sion
SNP=0SNP=1SNP=2
P-values:LR: 6.74e-08CLS: 0.0155QRBT: 4.46e-11
P-values:LR: 6.74e-08CLS: 0.0155QRBT: 4.46e-11
0.0 0.2 0.4 0.6 0.8 1.0
-2-1
01
2
Gene: ENSG00000206503.7 & SNP: 6_29880050_T_C_b37
τ
Qua
ntile
nor
mal
ized
gen
e ex
pres
sion
SNP=0SNP=1SNP=2
P-values:LR: 0.00198CLS: 2.42e-05QRBT: 6.43e-11
Gen Li (Columbia Biostatistics) QRank March 27, 2017 3 / 11
Quantile Regression for eQTL Discovery
In a particular tissue with n samples, for i ∈ {1,2, ...,n},Yi,k is the expression level of the k th gene in the i th sample
xi,j is the j th SNP within ±1MB of the TSS of the gene in the i thsample
z i is a covariate vector for the i th sample
For each gene-SNP pair k and j , we assume the quantiles of Yi,kfollow the model:
QYi,k (τ |z i , xi,j) = z iαjk ,τ + xi,jβjk ,τ ,
Goal: identify the k and j pairs for βjk ,τ 6= 0 for any given τ ∈ (0,1)
Gen Li (Columbia Biostatistics) QRank March 27, 2017 4 / 11
QRank Test
Define the rank-score function for a fixed quantile τ as
Sn,τ = n−1/2n∑
i=1
ρτ{yi,k − z iα̂jk ,τ}x∗i,j
ρτ{u} = τ − I(u < 0) is an asymmetric sign function
α̂jk ,τ is the estimated coefficient vector under the null H0 : βjk ,τ = 0
x∗i,j is the genotype data adjusted by the covariates
Sn,τ is close to zero if and only if βjk ,τ = 0
Gen Li (Columbia Biostatistics) QRank March 27, 2017 5 / 11
QRank Test
Test statistic at a fixed quantile τ :
Tn,τ = S2n,τ/Vn −→ χ2
1, as n→∞
where V−1n is the variance of Sn,τ such that
Vn = n−1τ(1− τ)x∗Tj x∗
j
Composite test statistic (τ1, · · · , τ`):
Tn,` = STn,`Σ
−1n,`Sn,` −→ χ2
` , as n→∞
where Sn,` = (Sn,τ1 , · · · ,Sn,τ`) and Σn,` has an explicit expression
Gen Li (Columbia Biostatistics) QRank March 27, 2017 6 / 11
Numerical Results
Compare QRank (with 5 quantile levels) with LR and CLS1
Type I error and power analysis in simulation
GTEx v6 data analysis (in 4 tissues)
Gene-level discoveries, tissue specificity, GWAS enrichment
1Brown et al., Elife (2014)Gen Li (Columbia Biostatistics) QRank March 27, 2017 7 / 11
eQTL Discoveries (Unique Genes, FDR=0.05)
Gen Li (Columbia Biostatistics) QRank March 27, 2017 8 / 11
Tissue Specificity
eQTL sharing coefficient πij = P(eQTL in tissue i| eQTL in tissue j)
Gen Li (Columbia Biostatistics) QRank March 27, 2017 9 / 11
Acknowledgement
Collaborators:Xiaoyu SongZhenwei ZhouXianling WangIuliana Ionita-LazaYing Wei
Funding:NSF (DMS-120923)NIH (R01HG008980)Calderone Junior Faculty AwardMSPH, Columbia University
Song et al. “QRank: A novel quantile regression tool for eQTLdiscovery." Bioinformatics, 2017+(http://biorxiv.org/content/early/2016/08/17/070052)R package available at https://github.com/cran/QRank
Gen Li (Columbia Biostatistics) QRank March 27, 2017 11 / 11