The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Graph Regularized Dual Lasso for Robust eQTL Mapping
Wei Cheng¹, Xiang Zhang², Zhishan Guo¹, Yu Shi³, Wei Wang⁴
¹University of North Carolina at Chapel Hill, ²Case Western Reserve University,
³University of Science and Technology of China, ⁴University of California, Los Angeles
Speaker: Wei Cheng
The 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB’14)
eQTL (Expression Quantitative Trait Loci)
• Goal: identify genomic locations where genotype significantly affects gene expression.
Statistical Test
• Partition individuals into groups according to their genotype at a SNP.
• Perform a statistical test (t-test or ANOVA) comparing gene expression across the groups.
• Repeat for each SNP.
[Figure: SNP matrix (X) and gene expression matrix (Z) across individuals; gene expression levels of individuals grouped by their genotype (0 or 1) at SNP1.]
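The single-SNP scan above can be sketched in a few lines of Python. This is a minimal illustration, assuming 0/1 genotypes and using Welch's t statistic as the test; the function names `welch_t` and `single_snp_scan` and the toy data are hypothetical, and p-values (which need the t distribution) are left out to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic (each group needs at least 2 values)."""
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(se2)


def single_snp_scan(snps, expression):
    """Return one t statistic per SNP for a single gene.

    snps: one row per SNP, one 0/1 genotype per individual (hypothetical encoding).
    expression: the gene's expression level, one value per individual.
    """
    stats = []
    for genotypes in snps:
        # Partition individuals by their genotype at this SNP.
        g0 = [z for g, z in zip(genotypes, expression) if g == 0]
        g1 = [z for g, z in zip(genotypes, expression) if g == 1]
        stats.append(welch_t(g1, g0))
    return stats


# Made-up toy data: SNP 1 separates low and high expression, SNP 2 does not.
snps = [[0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
expression = [4.0, 5.0, 4.5, 11.0, 12.0, 11.5]
print(single_snp_scan(snps, expression))
```

Because one test is run per SNP (and per gene), the resulting p-values would also need a multiple-testing correction.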
Lasso-based feature selection
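X: the SNP matrix (each row is one SNP)
Z: the gene expression matrix (each row is the expression profile of one gene)
Objective:
$$\min_{W}\ \frac{1}{2}\|Z - WX\|_F^2 + \eta\|W\|_1$$
where $\eta > 0$ is the $\ell_1$ (sparsity) regularization parameter.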
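As a point of reference, the Lasso objective $\frac{1}{2}\|Z - WX\|_F^2 + \eta\|W\|_1$ can be minimized with proximal gradient descent (ISTA): a gradient step on the squared loss followed by soft-thresholding. This is a generic solver sketched for illustration, not necessarily the optimizer used in the paper; the step size $1/\|X\|_F^2$ is a conservative assumption (it bounds the Lipschitz constant of the gradient), and all function names are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_ista(Z, X, eta, n_iter=500):
    """Minimize 0.5 * ||Z - W X||_F^2 + eta * ||W||_1 over W with ISTA.

    Z: genes x individuals, X: SNPs x individuals, W: genes x SNPs.
    """
    n_genes, n_snps = len(Z), len(X)
    W = [[0.0] * n_snps for _ in range(n_genes)]
    Xt = [list(col) for col in zip(*X)]
    # 1 / ||X||_F^2 <= 1 / lambda_max(X X^T), so this step size is safe.
    step = 1.0 / sum(x * x for row in X for x in row)
    for _ in range(n_iter):
        WX = matmul(W, X)
        residual = [[p - z for p, z in zip(pr, zr)] for pr, zr in zip(WX, Z)]
        grad = matmul(residual, Xt)  # gradient of the squared loss: (WX - Z) X^T
        W = [[soft_threshold(w - step * g, step * eta) for w, g in zip(wr, gr)]
             for wr, gr in zip(W, grad)]
    return W
```

For example, with `X = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]` and `Z = [[2.0, 0.0, 2.0, 0.0]]` (expression driven only by SNP 1), `lasso_ista(Z, X, eta=0.01)` keeps SNP 1 with a slightly shrunken coefficient and sets the other two to zero.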
Incorporating prior knowledge
• SNPs (and genes) are usually not independent.
• The interplay among SNPs and among genes can be represented as networks and used as prior knowledge.
Prior knowledge: genetic interaction networks, PPI networks, gene co-expression networks, etc.
• Existing methods that exploit such structure include group lasso, multi-task methods, SIOL, and MTLasso2G.
Limitations of current methods
• A clustering step is usually needed to obtain the grouping information.
• They do not account for the incompleteness of the prior knowledge or the noise in it.
E.g., PPI networks may contain many false interactions and miss true ones.
• Other prior knowledge, such as location and gene pathway information, is not considered.
Motivation
• Examples of prior knowledge: a genetic interaction network S among SNPs, and gene-gene interactions G represented by a PPI network (or a gene co-expression network). W is the matrix of regression coefficients to be learned.
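GD-Lasso: Graph-regularized Dual Lasso
• Objective:
$$\min_{W,\,L,\,S\ge 0,\,G\ge 0}\ \frac{1}{2}\|Z - WX - L\|_F^2 + \eta\|W\|_1 + \lambda\|L\|_* + \alpha\,\mathrm{tr}\big(W(D_S - S)W^T\big) + \beta\,\mathrm{tr}\big(W^T(D_G - G)W\big) + \gamma\|S - S_0\|_F^2 + \rho\|G - G_0\|_F^2$$
• $\frac{1}{2}\|Z - WX - L\|_F^2 + \eta\|W\|_1 + \lambda\|L\|_*$: the Lasso objective accounting for confounding factors ($L$); $\|L\|_*$ is the nuclear norm, which keeps $L$ low-rank.
• $\alpha\,\mathrm{tr}\big(W(D_S - S)W^T\big) + \beta\,\mathrm{tr}\big(W^T(D_G - G)W\big)$: the graph regularizers, where $D_S$ and $D_G$ are the diagonal degree matrices of $S$ and $G$.
• $\gamma\|S - S_0\|_F^2 + \rho\|G - G_0\|_F^2$: the fitting constraints for the prior knowledge, which keep the learned networks $S$ and $G$ close to the prior networks $S_0$ and $G_0$.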
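The trace form of the graph regularizers can be checked numerically. For a symmetric network $S$ with diagonal degree matrix $D_S$, the identity $\sum_{i,j} S_{i,j}\|w_{*i} - w_{*j}\|^2 = 2\,\mathrm{tr}\big(W(D_S - S)W^T\big)$ shows that the SNP-side regularizer penalizes differences between the coefficient vectors (columns of $W$) of connected SNPs; applying the same functions to $W^T$ and $G$ gives the gene-side term $\mathrm{tr}\big(W^T(D_G - G)W\big)$. The function names and toy values below are hypothetical.

```python
def laplacian(S):
    """Graph Laplacian D_S - S of a symmetric weighted adjacency matrix."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]


def trace_form(W, lap):
    """tr(W lap W^T), computed as sum over rows w of W of w lap w^T."""
    k = len(lap)
    return sum(w[i] * lap[i][j] * w[j] for w in W for i in range(k) for j in range(k))


def pairwise_form(W, S):
    """sum_{i,j} S_ij * ||w_{*i} - w_{*j}||^2 over the columns of W."""
    cols = list(zip(*W))
    k = len(S)
    return sum(S[i][j] * sum((a - b) ** 2 for a, b in zip(cols[i], cols[j]))
               for i in range(k) for j in range(k))


W = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]  # 2 genes x 3 SNPs (toy values)
S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]    # SNP 2 interacts with SNPs 1 and 3
print(pairwise_form(W, S), 2 * trace_form(W, laplacian(S)))
```

Since the factor 2 can be absorbed into $\alpha$ and $\beta$, the trace regularizers correspond to a sum of pairwise squared Euclidean distances weighted by the network edges.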
GGD-Lasso: Generalized Graph-regularized Dual Lasso
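• Further incorporates location and pathway information.
• Objective:
$$\min_{W,\,L,\,S\ge 0,\,G\ge 0}\ \frac{1}{2}\|Z - WX - L\|_F^2 + \eta\|W\|_1 + \lambda\|L\|_* + \alpha\sum_{i,j} D(w_{*i}, w_{*j})\,S_{i,j} + \beta\sum_{i,j} D(w_{i*}, w_{j*})\,G_{i,j} + \gamma\|S - S_0\|_F^2 + \rho\|G - G_0\|_F^2$$
D(·, ·) is a nonnegative distance measure; $w_{*i}$ is the $i$-th column of $W$ (the coefficients of SNP $i$) and $w_{i*}$ is the $i$-th row (the coefficients of gene $i$).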
GGD-Lasso: Optimization
• Execute the following two steps iteratively until the termination condition is met:
1) update W while fixing S and G;
2) update S and G according to W, while decreasing $\sum_{i,j} D(w_{*i}, w_{*j})\,S_{i,j}$ and $\sum_{i,j} D(w_{i*}, w_{j*})\,G_{i,j}$.
We can maintain a fixed number of edges in S and G. E.g., to update G, we can swap an edge (i', j') with a non-edge (i, j) when $D(w_{i*}, w_{j*}) < D(w_{i'*}, w_{j'*})$.
• Further integrate location and pathway information.
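Step 2 can be illustrated with a greedy version of the edge-swapping rule: repeatedly drop the current edge with the largest distance and add the non-edge with the smallest distance, as long as this lowers $\sum_{i,j} D(w_{i*}, w_{j*})\,G_{i,j}$. This is a sketch under stated assumptions, not the paper's exact update: it assumes a binary, undirected network, uses squared Euclidean distance as a stand-in for D, and ignores the fitting term $\rho\|G - G_0\|_F^2$; the optional `max_swaps` cap is a hypothetical, crude way of keeping G close to $G_0$. Passing the columns of W instead of its rows gives the corresponding update for S.

```python
import itertools


def sq_euclidean(a, b):
    """Squared Euclidean distance, used here as a stand-in for D(., .)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def swap_edges(edges, vectors, distance=sq_euclidean, max_swaps=None):
    """Greedy edge swapping that keeps the number of edges fixed.

    edges: set of (i, j) pairs with i < j (binary, undirected network).
    vectors: vectors[i] is w_{i*} (rows of W) for G, or w_{*i} (columns) for S.
    """
    edges = set(edges)
    dist = {(i, j): distance(vectors[i], vectors[j])
            for i, j in itertools.combinations(range(len(vectors)), 2)}
    swaps = 0
    while max_swaps is None or swaps < max_swaps:
        non_edges = [p for p in dist if p not in edges]
        if not edges or not non_edges:
            break
        worst = max(edges, key=dist.__getitem__)      # edge with the largest distance
        best = min(non_edges, key=dist.__getitem__)   # non-edge with the smallest distance
        if dist[best] >= dist[worst]:
            break  # no single swap can decrease the weighted sum
        edges.remove(worst)
        edges.add(best)
        swaps += 1
    return edges
```

Each swap strictly lowers the sum, so the loop terminates; without a cap it ends with the edges at the k smallest distances (up to ties), where k is the fixed edge count.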
Experimental Study: simulation
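• 10 gene expression profiles are generated by
$$Z_{j*} = W_{j*}X + \Xi_{j*} + E_{j*},$$
where $E_{j*} \sim \mathcal{N}(0, \sigma^2 I)$, $\Xi_{j*} \sim \mathcal{N}(0, \Sigma)$, $\Sigma = MM^T$, and $M_{ij} \sim \mathcal{N}(0, 1)$.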
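The simulation model $Z_{j*} = W_{j*}X + \Xi_{j*} + E_{j*}$ can be reproduced in a few lines. The sketch below draws $\Xi_{j*}$ as $Mu$ with $u \sim \mathcal{N}(0, I)$, which has covariance $MM^T = \Sigma$ and avoids a Cholesky factorization. The 0/1 genotype matrix, the sparsity pattern and distribution of the true $W$, the numbers of SNPs, individuals, and hidden factors, and $\sigma$ are all assumptions for illustration, not the settings used in the paper.

```python
import random


def simulate(n_genes=10, n_snps=50, n_individuals=100, n_factors=5,
             sigma=1.0, n_causal=3, seed=0):
    """Draw (X, W, Z) with Z_j = W_j X + Xi_j + E_j for each gene j."""
    rng = random.Random(seed)
    # Assumed genotype matrix X (SNPs x individuals) with 0/1 entries.
    X = [[rng.randint(0, 1) for _ in range(n_individuals)] for _ in range(n_snps)]
    # Assumed sparse ground truth W: n_causal nonzero N(0, 1) entries per gene.
    W = [[0.0] * n_snps for _ in range(n_genes)]
    for row in W:
        for k in rng.sample(range(n_snps), n_causal):
            row[k] = rng.gauss(0.0, 1.0)
    # M (individuals x hidden factors) with M_ij ~ N(0, 1), so Sigma = M M^T.
    M = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(n_individuals)]
    Z = []
    for j in range(n_genes):
        u = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        xi = [sum(m * v for m, v in zip(M[n], u)) for n in range(n_individuals)]  # Xi_j ~ N(0, M M^T)
        signal = [sum(W[j][k] * X[k][n] for k in range(n_snps)) for n in range(n_individuals)]
        # E_j ~ N(0, sigma^2 I): independent noise per individual.
        Z.append([s + x + rng.gauss(0.0, sigma) for s, x in zip(signal, xi)])
    return X, W, Z
```

Fixing `seed` makes runs reproducible, which helps when comparing methods on the same simulated data.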
Experimental Study: simulation
The ROC curve. The black solid line denotes what random guessing would have achieved.
Experimental Study: simulation
AUCs of Lasso, LORS, G-Lasso and GD-Lasso. In each panel, we vary the percentage of noise in the prior networks $S_0$ and $G_0$.
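ROC curves and AUCs of this kind can be computed by treating every true SNP-gene association (a nonzero entry of the ground-truth W) as a positive and scoring each SNP-gene pair by the magnitude of its estimated coefficient; that scoring choice is an assumption here. The sketch below is a plain implementation of the standard ROC/AUC computation; random guessing gives the diagonal ROC curve and an AUC of 0.5.

```python
def roc_curve(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores are added together, giving one point per distinct threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp, fp = (tp + 1, fp) if pairs[i][1] else (tp, fp + 1)
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise `auc` equals the area under the points returned by `roc_curve` (trapezoidal rule), which is a convenient consistency check.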
Experimental Study: Yeast
• Yeast eQTL dataset:
112 yeast segregants generated from a cross of two inbred strains, BY and RM;
SNP markers with more than 10% missing values (NAs) are removed (the remaining missing genotypes are imputed), markers with identical genotypes are merged, and genes with missing values are dropped;
this yields 1,017 SNP markers and 4,474 expression profiles.
• Prior networks: genetic interaction network (S) and PPI network (G).
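The preprocessing steps above can be sketched as follows. The imputation method is not specified on the slide, so filling each missing genotype with the SNP's majority genotype is an assumption, as is keeping the first marker of each group of identical markers; `None` marks a missing value and `preprocess` is a hypothetical name.

```python
from collections import Counter


def preprocess(snps, genes, max_na=0.1):
    """Filter, impute, and merge SNP markers; drop genes with missing values.

    snps: one genotype row per SNP marker (None = missing).
    genes: one expression row per gene (None = missing).
    """
    kept = []
    for row in snps:
        na_rate = sum(1 for g in row if g is None) / len(row)
        if na_rate > max_na:
            continue  # too many missing genotypes: remove the marker
        observed = [g for g in row if g is not None]
        # Assumption: impute with the most common observed genotype.
        majority = Counter(observed).most_common(1)[0][0]
        kept.append(tuple(majority if g is None else g for g in row))
    # Merge markers with identical genotypes, keeping the first occurrence.
    merged = list(dict.fromkeys(kept))
    clean_genes = [row for row in genes if all(v is not None for v in row)]
    return merged, clean_genes
```

Merging after imputation means two markers that differ only at a missing position can collapse into one, which matches the order of steps listed on the slide.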
Experimental Study: Yeast
• cis-enrichment analysis: (1) a one-tailed Mann-Whitney test on each SNP for the cis hypotheses; (2) a paired Wilcoxon signed-rank test on the p-values obtained from (1).
• trans-enrichment analysis: a similar strategy, in which genes regulated by transcription factors (TFs) are used as trans-acting signals.
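Step (1) of the cis-enrichment analysis can be sketched as follows: for one SNP, compare the association strengths $|W_{g,k}|$ of its cis genes against those of all other genes with a one-tailed Mann-Whitney U test. This assumes the cis gene set of each SNP is given (for example, genes within a fixed genomic window, a hypothetical choice), uses the normal approximation without tie or continuity correction, and leaves out step (2), the paired Wilcoxon signed-rank test across methods. Function names are hypothetical.

```python
import math


def mann_whitney_greater(x, y):
    """One-tailed Mann-Whitney U test of H1: values in x tend to exceed those in y.

    Uses the normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p_value


def cis_p_value(W, snp, cis_genes):
    """p-value that the SNP's cis genes have larger |W[g][snp]| than the other genes."""
    cis_set = set(cis_genes)
    scores = [abs(row[snp]) for row in W]
    cis = [s for g, s in enumerate(scores) if g in cis_set]
    other = [s for g, s in enumerate(scores) if g not in cis_set]
    return mann_whitney_greater(cis, other)[1]
```

A small p-value for many SNPs indicates that a method tends to assign stronger coefficients to cis genes; the per-SNP p-values of two methods could then be compared pairwise, as in step (2).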
Experimental Study: Yeast
Pairwise comparison of different models using cis-enrichment and trans-enrichment analysis
Experimental Study: Yeast
Summary of the top-15 hotspots detected by GGD-Lasso. Hotspot (12) in bold cannot be detected by G-Lasso. Hotspot (6) in italic cannot be detected by SIOL. Hotspot (3) in teletype cannot be detected by LORS.
Experimental Study: Yeast
Hotspots detected by different methods
Conclusion
• In this paper, we propose novel and robust graph-regularized regression models that take into account the prior networks of SNPs and genes simultaneously.
• Exploiting the duality between the learned coefficients and the incomplete prior networks yields a more robust model.
• We also generalize our model to integrate other types of information, such as location and gene pathway information.