Post on 24-Feb-2021
MTAG: MULTI-TRAIT ANALYSIS OF GWAS
Aysu Okbay
VU Amsterdam
MOTIVATION
• For polygenic traits, GWAS requires large $N$
• Improving prediction requires even larger $N$
• In many cases, GWAS of other (genetically) correlated traits are available
• GOAL: Boost power by pooling GWAS results from multiple related traits
MULTI-TRAIT ANALYSIS OF GWAS
• Joint analyses can boost statistical power, but often impractical in GWAS:
– Some require individual-level data.
– Some can be applied to summary statistics, but only if there is zero sample overlap
– Computationally burdensome.
• MTAG effectively addresses these challenges.
MTAG THEORETICAL FRAMEWORK
There are $T$ traits. Let $\beta_j$ be the vector of marginal effects for SNP $j$. From GWAS, we estimate
$\hat{\beta}_j = \beta_j + e_j, \quad e_j \sim N(0, \Sigma_j)$
where $\Sigma_j$ is the variance-covariance matrix of estimation error.
Assume the $\beta_j$ are random effects with some correlation between traits, identically distributed across $j$:
$E[\beta_j] = 0, \quad \mathrm{Var}[\beta_j] = \Omega$
NON-GENETIC VARIATION IN $\hat{\beta}_j$
• Non-genetic variation includes sampling variation and bias
• When are the off-diagonal elements non-zero?
• $\Sigma_j$ can be estimated with the intercepts of LD score regression
GENETIC VARIATION IN $\hat{\beta}_j$
• Related to heritability and genetic correlation
– Heritability: diagonal elements of $\Omega$
– Genetic correlation: off-diagonal elements of $\Omega$
• Key assumption of MTAG: $\Omega$ is homogeneous across SNPs!
• Using method of moments: $\hat{\Omega} = \mathrm{Var}(\hat{\beta}_j) - \hat{\Sigma}$
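The method-of-moments step can be illustrated with a small simulation. The sketch below is only a toy under stated assumptions: two traits, a single $\Sigma$ shared by all SNPs, and made-up values for $\Omega$ and $\Sigma$ (`OMEGA`, `SIGMA`); the function names are hypothetical and are not part of the MTAG software.

```python
import random


def simulate_gwas_estimates(n_snps, omega, sigma, seed=0):
    """Draw beta_hat_j = beta_j + e_j for two traits.

    omega and sigma are 2x2 covariance matrices given as nested lists.
    """
    rng = random.Random(seed)

    def draw(cov):
        # Cholesky factor of a 2x2 covariance matrix applied to two N(0, 1) draws.
        l11 = cov[0][0] ** 0.5
        l21 = cov[0][1] / l11
        l22 = (cov[1][1] - l21 ** 2) ** 0.5
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        return (l11 * z1, l21 * z1 + l22 * z2)

    estimates = []
    for _ in range(n_snps):
        beta = draw(omega)   # true effects, Var = Omega
        error = draw(sigma)  # estimation error, Var = Sigma
        estimates.append((beta[0] + error[0], beta[1] + error[1]))
    return estimates


def omega_hat(estimates, sigma):
    """Method of moments: Omega_hat = Var(beta_hat) - Sigma_hat.

    Uses E[beta_hat] = 0, so the variance is the mean outer product.
    """
    n = len(estimates)
    var = [[sum(x[r] * x[c] for x in estimates) / n for c in range(2)]
           for r in range(2)]
    return [[var[r][c] - sigma[r][c] for c in range(2)] for r in range(2)]


# Toy values chosen for illustration only.
OMEGA = [[1.0, 0.6], [0.6, 1.0]]
SIGMA = [[0.5, 0.1], [0.1, 0.8]]
est = omega_hat(simulate_gwas_estimates(50_000, OMEGA, SIGMA), SIGMA)
print(est)  # should be close to OMEGA
```

In practice $\hat{\Sigma}$ is itself estimated (from LD score regression intercepts), whereas this toy plugs in the true $\Sigma$.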
MTAG ESTIMATING EQUATIONS
• MTAG is a generalized method of moments (GMM) estimator
• Imagine we regressed the GWAS estimates for trait 𝑠𝑠 onto the true marginal effect size for trait 𝑡𝑡 (and a constant)
• The first-order condition of the OLS minimization is
$E\left[\hat{\beta}_{j,s} - \frac{\omega_{t,s}}{\omega_{t,t}} \beta_{j,t}\right] = 0$
where $\omega_{t,s}$ is the $(t,s)$-th element of $\Omega$.
• $T$ such moment conditions, 1 parameter → GMM
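Solving these moment conditions efficiently gives a weighted combination of the GWAS estimates. The sketch below writes out the estimator as it is stated in the MTAG paper (Turley et al., 2018) for the two-trait case, using an explicit 2x2 inverse so that it needs no external libraries. The function name and the example numbers are hypothetical; this is not the code used by mtag.py.

```python
def mtag_two_traits(beta_hat, omega, sigma, t=0):
    """MTAG estimate and standard error for trait t (0 or 1) at one SNP.

    beta_hat: the two GWAS estimates for this SNP.
    omega, sigma: symmetric 2x2 matrices (nested lists) for Omega and Sigma_j.
    """
    # w = omega_t / omega_tt, where omega_t is column t of Omega.
    w = [omega[0][t] / omega[t][t], omega[1][t] / omega[t][t]]
    # A = Omega - omega_t omega_t' / omega_tt + Sigma_j
    a = [[omega[r][c] - omega[r][t] * omega[c][t] / omega[t][t] + sigma[r][c]
          for c in range(2)] for r in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    # Row vector w' A^{-1}
    wa = [w[0] * inv[0][c] + w[1] * inv[1][c] for c in range(2)]
    denom = wa[0] * w[0] + wa[1] * w[1]
    estimate = (wa[0] * beta_hat[0] + wa[1] * beta_hat[1]) / denom
    return estimate, (1.0 / denom) ** 0.5


# Illustrative numbers only: genetic correlation 0.8, some sample overlap.
omega = [[2e-5, 1.6e-5], [1.6e-5, 2e-5]]
sigma = [[1e-4, 2e-5], [2e-5, 4e-4]]
print(mtag_two_traits((0.02, 0.04), omega, sigma, t=0))
```

Two sanity checks follow directly from the formula: with zero genetic covariance and a diagonal $\Sigma_j$, the function returns the original GWAS estimate for trait $t$; with perfect genetic correlation, equal heritabilities, and a diagonal $\Sigma_j$, it returns the inverse-variance-weighted average.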
MTAG OVERVIEW
• Builds on LD score regression framework.
• Assigns each coefficient estimate a weight that depends (in intuitive ways) on two sources of correlation between GWAS estimates: genetic correlation and correlation in estimation error.
• Allows for all sources of estimation error (not just sampling variation)
• Outputs a set of association test statistics that are trait-specific.
• Estimation is not computationally burdensome.
SPECIAL CASES
• No sample overlap
– Off-diagonal elements of $\Sigma_j$ are zero
– Assumes uncorrelated biases
• Perfect genetic correlation
– $\omega_{ts} = \sqrt{\omega_{tt}\,\omega_{ss}}$ for all $t, s$
• Equal heritabilities
– $\omega_{tt} = \omega_{ss}$ for all $t, s$
All of these special cases together: standard meta-analysis with LD score intercept correction
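A short derivation shows why the three special cases together give a standard inverse-variance-weighted meta-analysis. It starts from the MTAG estimator as stated in the MTAG paper (Turley et al., 2018), where $\omega_t$ denotes the $t$-th column of $\Omega$. The symbols $A_j$ and $\sigma_{j,s}^2$ (the $s$-th diagonal element of $\Sigma_j$) are notation introduced here for convenience.

```latex
% MTAG estimator for trait t at SNP j
\hat{\beta}_{\mathrm{MTAG},j,t}
  = \frac{\frac{\omega_t'}{\omega_{tt}} A_j^{-1} \hat{\beta}_j}
         {\frac{\omega_t'}{\omega_{tt}} A_j^{-1} \frac{\omega_t}{\omega_{tt}}},
\qquad
A_j = \Omega - \frac{\omega_t \omega_t'}{\omega_{tt}} + \Sigma_j

% Perfect (positive) genetic correlation and equal heritabilities:
\Omega = \omega \mathbf{1}\mathbf{1}'
\;\Rightarrow\;
\frac{\omega_t}{\omega_{tt}} = \mathbf{1}, \quad
\Omega - \frac{\omega_t \omega_t'}{\omega_{tt}} = 0, \quad
A_j = \Sigma_j

% No sample overlap and uncorrelated biases:
\Sigma_j = \mathrm{diag}\left(\sigma_{j,1}^2, \dots, \sigma_{j,T}^2\right)

% Hence MTAG reduces to the inverse-variance-weighted average
\hat{\beta}_{\mathrm{MTAG},j,t}
  = \frac{\sum_{s=1}^{T} \hat{\beta}_{j,s} / \sigma_{j,s}^2}
         {\sum_{s=1}^{T} 1 / \sigma_{j,s}^2}
```

Because the diagonal of $\Sigma_j$ is scaled by the LD score regression intercepts, the weights already carry the intercept correction.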
MTAG APPLICATION
Note. Okbay et al. (2016), Nat Genet, 48, 624-633.
COHORTS IN MTAG APPLICATION
SNP-based comparisons

                            DEP                NEUR               SWB
                            GWAS     MTAG      GWAS     MTAG      GWAS     MTAG
Lead SNPs (P < 5×10⁻⁸)      32       74        9        66        13       60
Mean χ²                     1.44     1.60      1.28     1.56      1.31     1.57
Neff                        354,862            168,105            388,538
HOW MUCH WOULD WE HAVE HAD TO BOOST N IN EACH UNIVARIATE GWAS TO MATCH OBSERVED MTAG GAINS?
GWAS-EQUIVALENT SAMPLE SIZE FOR MTAG
• DEP: 37% increase (N = 354K to 479K)
• NEUR: 96% increase (N = 168K to 330K)
• SWB: 85% increase (N = 388K to 718K)
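The GWAS-equivalent sample size can be approximated from the mean χ² statistics. The sketch below assumes the definition used in the MTAG paper, $N_{\text{equiv}} = N \cdot (\bar{\chi}^2_{\text{MTAG}} - 1) / (\bar{\chi}^2_{\text{GWAS}} - 1)$, and plugs in the rounded values from the comparison table. Because those inputs are rounded to two decimals, the results only approximate the figures listed above. The function name is hypothetical.

```python
def gwas_equivalent_n(n_gwas, mean_chi2_gwas, mean_chi2_mtag):
    """Sample size a univariate GWAS would need to match MTAG's mean chi-square."""
    return n_gwas * (mean_chi2_mtag - 1.0) / (mean_chi2_gwas - 1.0)


# (N, mean chi2 GWAS, mean chi2 MTAG) taken from the comparison table.
TRAITS = {
    "DEP": (354_862, 1.44, 1.60),
    "NEUR": (168_105, 1.28, 1.56),
    "SWB": (388_538, 1.31, 1.57),
}

for name, (n, chi2_gwas, chi2_mtag) in TRAITS.items():
    n_equiv = gwas_equivalent_n(n, chi2_gwas, chi2_mtag)
    increase = 100 * (n_equiv / n - 1)
    print(f"{name}: N = {n:,} to about {n_equiv:,.0f} ({increase:.0f}% increase)")
```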
ARE THESE GAINS “REAL”?
PREDICTION ACCURACY IN HRS
WHAT’S THE BAD NEWS?
• Model misspecification → potentially substantial bias
• Most problematic are “bad SNPs”
– SNPs that are null for the primary trait, but non-null for some secondary trait
– Inflated type-I error rate, false discovery rate
• Prediction of other traits and biological annotation may be biased.
MAXIMUM FALSE DISCOVERY RATE
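• Based on a multivariate spike-and-slab distribution
– Each SNP may be associated with all, some, or no traits
– Potentially leads to “bad SNPs,” which are associated with secondary traits but not the primary trait
$$
\beta_j \sim
\begin{cases}
N\left(0, \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\right) & \text{with probability } \pi_{FF} \\
N\left(0, \begin{pmatrix} \omega_{11} & 0 \\ 0 & 0 \end{pmatrix}\right) & \text{with probability } \pi_{TF} \\
N\left(0, \begin{pmatrix} 0 & 0 \\ 0 & \omega_{22} \end{pmatrix}\right) & \text{with probability } \pi_{FT} \\
N\left(0, \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{pmatrix}\right) & \text{with probability } \pi_{TT}
\end{cases}
$$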
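To make the four mixture components concrete, the sketch below draws two-trait effect sizes from such a spike-and-slab distribution and counts the share of “bad SNPs” when trait 1 is the primary trait (the FT component). It only samples from the mixture and does not perform the maxFDR calculation itself. The function name, mixture probabilities, and $\Omega$ values are all made up for illustration.

```python
import random


def draw_effects(n_snps, probs, omega, seed=0):
    """Draw (state, beta_j) pairs from a two-trait spike-and-slab mixture.

    probs: dict with keys "FF", "TF", "FT", "TT" that sum to 1. The first
    letter says whether trait 1 is non-null, the second whether trait 2 is.
    omega: 2x2 covariance matrix of the non-null effects (nested lists).
    """
    rng = random.Random(seed)
    states = list(probs)
    weights = [probs[s] for s in states]
    # Cholesky factor of omega, used for the correlated TT component.
    l11 = omega[0][0] ** 0.5
    l21 = omega[0][1] / l11
    l22 = (omega[1][1] - l21 ** 2) ** 0.5
    draws = []
    for _ in range(n_snps):
        state = rng.choices(states, weights=weights)[0]
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        if state == "FF":
            beta = (0.0, 0.0)
        elif state == "TF":
            beta = (l11 * z1, 0.0)
        elif state == "FT":
            beta = (0.0, omega[1][1] ** 0.5 * z2)
        else:  # "TT"
            beta = (l11 * z1, l21 * z1 + l22 * z2)
        draws.append((state, beta))
    return draws


# Hypothetical mixture: 10% of SNPs are null for trait 1 but not for trait 2.
PROBS = {"FF": 0.7, "TF": 0.1, "FT": 0.1, "TT": 0.1}
draws = draw_effects(20_000, PROBS, [[1.0, 0.8], [0.8, 1.0]])
share_bad = sum(state == "FT" for state, _ in draws) / len(draws)
print(f"Share of bad SNPs for trait 1: {share_bad:.3f}")
```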
MAXIMUM FALSE DISCOVERY RATE
Maximize FDR over all feasible spike-and-slab distributions
RECOMMENDATIONS
• Replication!
• Choose settings with a low risk of a high false discovery rate (FDR)
– Genetic correlation between traits is high (>0.7) AND
– Mean $\chi^2$-statistic of primary trait is high (>1.7) OR higher than that of secondary traits
• Possible to run into problems even when the above is satisfied
– Perform maxFDR calculations
SOFTWARE
Code is publicly available at:
https://github.com/omeed-maghzian/mtag
PRACTICAL
INTRO
• Begin by looking at the MTAG options by typing
mtag -h
• Copy the files into your working directory
mkdir MTAG_practical
cd MTAG_practical
cp /faculty/aysu/MTAG/SWB_Full.txt .
cp /faculty/aysu/MTAG/Neuroticism_Full.txt .
cp -r /faculty/aysu/MTAG/eur_w_ld_chr/ .
INPUT FILE FORMAT
• These are GWAS results on neuroticism and subjective well-being from Okbay et al. (2016), restricted to HapMap3 SNPs.
• Have a look at the data
head SWB_Full.txt
INPUT FILE FORMAT
• The following columns are necessary for MTAG to run (order not important):
– snpid (--snp_name)
– a1/a2 (--a1_name / --a2_name)
– freq (--eaf_name)
– z (--z_name)
– n (--n_name)
• The other columns are not directly used by mtag.py but are part of the munging procedure implemented via ldsc.
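Before running MTAG it can help to confirm that a summary-statistics file has the columns listed above. The sketch below checks a whitespace-delimited header line. The column names are the ones passed to the mtag flags in this practical (MarkerName, A1, A2, EAF, N, Beta, SE); the z column name `Z` and the example header are assumptions, as is treating Beta and SE together as a substitute for z when --use_beta_se is given. The helper is hypothetical and is not part of mtag.py.

```python
def missing_columns(header_line,
                    required=("MarkerName", "A1", "A2", "EAF", "N"),
                    z_col="Z",
                    beta_se_cols=("Beta", "SE")):
    """Return the required column names absent from a header line."""
    cols = set(header_line.split())
    missing = [c for c in required if c not in cols]
    # Either a z column or both beta and se columns must be present.
    if z_col not in cols and not all(c in cols for c in beta_se_cols):
        missing.append(f"{z_col} (or {' and '.join(beta_se_cols)})")
    return missing


# Assumed header, for illustration; in practice read the first line of the file,
# e.g. with open("SWB_Full.txt") as fh: header = fh.readline()
header = "MarkerName CHR POS A1 A2 EAF Beta SE Pval N"
print(missing_columns(header))
```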
RUNNING MTAG WITH THE DEFAULTS
Using mtag with the default options implements the following steps:
1. Read in the input GWAS summary statistics and filter the SNPs by MAF ≥ 0.01 and sample size N ≥ (2/3) * 90th percentile.
2. Merge the filtered GWAS summary statistics results together, taking the intersection of available SNPs.
3. Estimate the residual covariance matrix (Σ) via LD Score regression.
4. Estimate the genetic covariance matrix (Ω).
5. Perform MTAG and output results.
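The filtering rule in step 1 can be sketched in a few lines. This is an illustration of the rule as described above, not mtag.py's own code: the record layout, the function names, and the linear-interpolation percentile are assumptions, and MTAG may compute the percentile slightly differently.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def default_filter(snps, maf_min=0.01, n_frac=2 / 3, n_pct=90):
    """Keep SNPs with MAF >= maf_min and N >= n_frac * (n_pct-th percentile of N)."""
    n_min = n_frac * percentile([s["n"] for s in snps], n_pct)
    return [s for s in snps
            if min(s["eaf"], 1 - s["eaf"]) >= maf_min and s["n"] >= n_min]


# Made-up records for illustration.
snps = [
    {"snpid": "rs1", "eaf": 0.300, "n": 100_000},
    {"snpid": "rs2", "eaf": 0.005, "n": 100_000},  # dropped: MAF below 0.01
    {"snpid": "rs3", "eaf": 0.500, "n": 50_000},   # dropped: N too small
    {"snpid": "rs4", "eaf": 0.995, "n": 100_000},  # dropped: MAF below 0.01
    {"snpid": "rs5", "eaf": 0.200, "n": 90_000},
]
print([s["snpid"] for s in default_filter(snps)])
```

Note that the allele frequency is folded to a minor allele frequency before the threshold is applied, so a SNP with EAF 0.995 is treated the same as one with EAF 0.005.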
RUNNING MTAG WITH THE DEFAULTS
mtag --sumstats Neuroticism_Full.txt,SWB_Full.txt \
  --snp_name MarkerName \
  --chr_name CHR \
  --bpos_name POS \
  --a1_name A1 \
  --a2_name A2 \
  --eaf_name EAF \
  --use_beta_se \
  --beta_name Beta \
  --se_name SE \
  --p_name Pval \
  --n_name N \
  --ld_ref_panel ./eur_w_ld_chr/ \
  --out ./NEUR_SWB \
  --stream_stdout
MTAG OUTPUT
Running mtag should have produced five files in your current directory:
• “NEUR_SWB.log” timestamps the different steps taken by mtag.py.
• “NEUR_SWB_sigma_hat.txt” stores the estimated residual covariance matrix.
• “NEUR_SWB_omega_hat.txt” stores the estimated genetic covariance matrix.
• “NEUR_SWB_trait_1.txt” and “NEUR_SWB_trait_2.txt” are tab-delimited results files corresponding to the MTAG-adjusted effect sizes and standard errors for the neuroticism and subjective well-being summary statistics, respectively.
Note that the files are numbered in the order they were presented in the list provided to the --sumstats flag.
LOG FILE
Apart from providing time stamps for estimating the different matrices needed for MTAG, the log file also displays the calculated values of Omega and Sigma along with a summary output of the results.
RESULTS FILES
NEUR_SWB_trait_1.txt provides the MTAG-adjusted effect sizes for the neuroticism GWAS:
• The first eight columns are copied from the corresponding input file.
• mtag_beta and mtag_se provide the unstandardized weights and standard errors calculated by mtag, yielding the corresponding z-scores mtag_z and p-values mtag_pval.
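The relationship between the four MTAG columns can be checked by hand. The sketch below recomputes a z-score and a two-sided normal p-value from an effect size and its standard error, using only the standard library. The function name and the example values are hypothetical, and the two-sided normal test is an assumption about how mtag_pval is derived.

```python
import math


def z_and_p(mtag_beta, mtag_se):
    """Return the z-score and two-sided normal p-value for one SNP."""
    z = mtag_beta / mtag_se
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up values for illustration.
z, p = z_and_p(0.0196, 0.01)
print(f"z = {z:.2f}, p = {p:.3g}")
print("genome-wide significant" if p < 5e-8 else "not genome-wide significant")
```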
SPECIAL CASES
• --no_overlap : Assumes no overlap between any of the cohorts in any pair of GWAS studies fed into mtag
• --perfect_gencov : Assumes the T summary statistics used in MTAG are GWAS estimates for traits that are perfectly correlated with one another
• --equal_h2 : Requires --perfect_gencov. Assumes all summary statistics files used in MTAG have the same heritability