Post on 28-Mar-2015
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Gene Set Enrichment Analysis
Chao-Jen Wong
Fred Hutchinson Cancer Research Center
January 28, 2010
1 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
1 Hypergeometric Testing
2 Simple GSEA using Z-score and Permutation
3 GSEA using Linear Models
2 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Gene set enrichment analysis
Unlike per-gene analysis ...
Search for categories where the constituent genes show changes inexpression level over the experimental conditions.
Use predefined gene set such as KEGG pathways, GO classifications,chromosome bands, and protein complexes.
No need to make a cutoff between genes that are differentiallyexpressed and those that are not.
Provided in the GESABase, Category , GOstats and topGO.
3 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Outline
1 Hypergeometric Testing
2 Simple GSEA using Z-score and Permutation
3 GSEA using Linear Models
4 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Hypergeometric testing
Basic concept: Suppose there are N balls in an urn, n are white andm are black. Drawing k balls out of the urn without replacement,how many black balls do we expect to get? What is the probabilityof getting x black balls?
Hypergeometric testing for under- and over-representation of GOterms.
Inputs
1 Gene universe, N.
2 GO categories (categorize genes by GO terms).
3 A list of interesting genes, I , (differentially expressed genes
identified by limma or just simply t-test by rowttests).
5 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Hypergeometric testing
Interesting (Black) Not (White)In GO term n11 n12 K
Not in GO term n21 n22 N − KI N − I N
Suppose there are j interesting genes in the GO term (n11 = j),compute
1 Probability of seeing j or more black balls in K draws.
2 Expected number of black balls seeing in K draws.
6 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Define gene universe (a vector of Entrez Gene IDs).
Select a list of interesting genes (a vector of Entrez Gene ID).
Code: gene selection via t-test
> library(genefilter)> library(day2)> library(hgu95av2.db)> data(ALLfilt_bcrneg)> ttests <- rowttests(ALLfilt_bcrneg, "mol.biol")> ## select interesting genes> smPV <- ttests[ttests$p.value < 0.005, ]> selectedEntrezIds <- unlist(mget(rownames(smPV),+ hgu95av2ENTREZID))> entrezUniverse=unlist(mget(featureNames(ALLfilt_bcrneg),+ hgu95av2ENTREZID))
7 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Define gene universe (a vector of Entrez Gene IDs).
Select a list of interesting genes (a vector of Entrez Gene ID).
Code: gene selection via t-test
> library(genefilter)> library(day2)> library(hgu95av2.db)> data(ALLfilt_bcrneg)> ttests <- rowttests(ALLfilt_bcrneg, "mol.biol")> ## select interesting genes> smPV <- ttests[ttests$p.value < 0.005, ]> selectedEntrezIds <- unlist(mget(rownames(smPV),+ hgu95av2ENTREZID))> entrezUniverse=unlist(mget(featureNames(ALLfilt_bcrneg),+ hgu95av2ENTREZID))
7 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Hypergeometric testing
Create a GOHyperGParams object.
Code: GOHyperGParams
> library(GOstats)> hgCutoff <- 0.001> GOparams <- new("GOHyperGParams",+ geneIds=selectedEntrezIds,+ universeGeneIds=entrezUniverse,+ annotation="hgu95av2.db",+ ontology="BP",+ pvalueCutoff=0.001,+ conditional=TRUE,+ testDirection="over")
8 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Outputs and summary.
Code: hyperGTest
> hgOver <- hyperGTest(GOparams)> class(hgOver)> summary(hgOver)
Exercise: generate report using htmlReort.
> showMethods("htmlReport")> htmlReport(hgOver, file="hgResult.html")> browseURL("hgResult.hrml")
9 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Outputs and summary.
Code: hyperGTest
> hgOver <- hyperGTest(GOparams)> class(hgOver)> summary(hgOver)
Exercise: generate report using htmlReort.
> showMethods("htmlReport")> htmlReport(hgOver, file="hgResult.html")> browseURL("hgResult.hrml")
9 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Lab activity
1 Chapter 14: read and do the exercises in Section 14.3 and 14.4.
2 Use the topGenes dataset (load the data using data(topGenes))and find a subset of genes whose adj.P.Val are less than 0.01.
3 Repeat the conditional Hypergeometric testing to find under- andover-represented biological processes.
4 Generate html reports.
10 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Outline
1 Hypergeometric Testing
2 Simple GSEA using Z-score and Permutation
3 GSEA using Linear Models
11 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Simple GSEA
Consider two group comparison
Start with data quality assessment.
Compute per-gene t-statistics: tk for each gene k.
Null hypothesis: no difference in mean expression
Ho : ZK = 0
ZK =1�|K |
�
k∈K
tk ∼ N (0, 1),
where K denotes the gene sets, and |K | the number of genes in thegene set.
Alternative approach: use permutation test to assess which genesets have an unusually large absolute value of zK .
12 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
ALLfill bcrneg> library(ALL)> library(hgu95av2.db)> data(ALL)> bcell <- grep("^B", as.character(ALL$BT))> types <- c("NEG", "BCR/ABL")> moltyp <- which(as.character(ALL$mol.biol) %in% types)> # subsetting> ALL_bcrneg <- ALL[, intersect(bcell, moltyp)]> ALL_bcrneg$BT <- factor(ALL_bcrneg$BT)> ALL_bcrneg$mol.biol <- factor(ALL_bcrneg$mol.biol)> # nonspecific filter: remove genes that does not> ## show much variation across samples> library(genefilter)> filt_bcrneg <- nsFilter(ALL_bcrneg,+ var.cutoff=0.5)> ALLfilt_bcrneg <- filt_bcrneg$eset
13 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Data representation: create an incidence matrix Am where aij = 1 ifgene j is in gene set i and aij = 0 otherwise.
> library(KEGG.db)> library(GSEABase)> gsc <- GeneSetCollection(ALLfilt_bcrneg,+ setType=KEGGCollection())> Am <- incidence(gsc)
ExpressionSet object retains only those features that are in theincidence matrix Am.
> nsF <- ALLfilt_bcrneg[colnames(Am), ]
14 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Exercise
1 How many gene sets and how many genes are represented by theincidence matrix Am?
2 How many gene sets have fewer than ten genes in them?
3 What is the largest number of gene sets in which a gene can befound?
4 What is the name of this gene set? (use KEGGPATHID2NAME)
Code> dim(nsF)> dim(Am)> nGene <- rowSums(Am)> rownames(Am)[nGene < 10]> sort(nGene, decreasing=TRUE)[1]> KEGGPATHID2NAME[["05200"]]
15 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Exercise
1 How many gene sets and how many genes are represented by theincidence matrix Am?
2 How many gene sets have fewer than ten genes in them?
3 What is the largest number of gene sets in which a gene can befound?
4 What is the name of this gene set? (use KEGGPATHID2NAME)
Code> dim(nsF)> dim(Am)> nGene <- rowSums(Am)> rownames(Am)[nGene < 10]> sort(nGene, decreasing=TRUE)[1]> KEGGPATHID2NAME[["05200"]]
15 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Compute the per-gene test statistics using the rowttests function.
> rtt <- rowttests(nsF, "mol.biol")> names(rtt)
[1] "statistic" "dm" "p.value"
> rttStats <- rtt$statistic
Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.
> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]
Compute zk for each pathway: zK = 1√|K |
�k∈K tk .
> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)
16 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Compute the per-gene test statistics using the rowttests function.
> rtt <- rowttests(nsF, "mol.biol")> names(rtt)
[1] "statistic" "dm" "p.value"
> rttStats <- rtt$statistic
Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.
> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]
Compute zk for each pathway: zK = 1√|K |
�k∈K tk .
> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)
16 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Compute the per-gene test statistics using the rowttests function.
> rtt <- rowttests(nsF, "mol.biol")> names(rtt)
[1] "statistic" "dm" "p.value"
> rttStats <- rtt$statistic
Reduce the incidence matrix by removing all gene sets that havefewer than ten genes in them.
> selectedRows <- (rowSums(Am) > 10)> Am2 <- Am[selectedRows, ]
Compute zk for each pathway: zK = 1√|K |
�k∈K tk .
> tA <- as.vector(Am2 %*% rttStats)> tAadj <- tA /sqrt(rowSums(Am2))> names(tAadj) <- rownames(Am2)
16 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Exercise
1 Which pathways have remarkably low (< 5) and high aggregatestatistics (> 5)?
2 What is the name the pathway that has the lowest zk score?
3 Use KEGG2heatmap to plot a heatmap for the genes in thispathway.
Code> smPW <- tAadj[tAadj < -5]> mget(names(smPW),KEGGPATHID2NAME)> lgPW <- tAadj[tAadj > 5]> mget(names(lgPW), KEGGPATHID2NAME)
17 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Using KEGG
Exercise
1 Which pathways have remarkably low (< 5) and high aggregatestatistics (> 5)?
2 What is the name the pathway that has the lowest zk score?
3 Use KEGG2heatmap to plot a heatmap for the genes in thispathway.
Code> smPW <- tAadj[tAadj < -5]> mget(names(smPW),KEGGPATHID2NAME)> lgPW <- tAadj[tAadj > 5]> mget(names(lgPW), KEGGPATHID2NAME)
17 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
KEGG2heatmap
> KEGG2heatmap("03010", nsF, "hgu95av2")
11005
28036
65005
64001
08012
28006
28042
28001
26003
28031
22013
12026
22010
28005
24022
68003
28021
27003
12012
43004
09017
24018
22009
28007
24001
24008
27004
03002
28024
04010
33005
24011
04016
62001
57001
64002
20002
30001
16009
49006
24010
36002
25006
28019
01010
28037
08001
06002
12019
43007
68001
28035
24017
04008
28023
48001
09008
15005
25003
26001
37013
14016
28043
28044
62002
62003
12007
31011
08011
28047
04007
22011
84004
01005
43001
08024
12006
15001
43012
31948_at33661_at41214_at34316_at32394_s_at1151_at39856_at39916_r_at41765_at31511_at34570_at31907_at33485_at36786_at33668_at31546_at31955_at32330_at34085_at31956_f_at33619_at32440_at36333_at35125_at31509_at32315_at33674_at31505_at34592_at32437_at36358_at33117_r_at1653_at
18 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Permutation testing
Assess the significant gene sets with respect to a referencedistribution build by a number of permutations.
gseattperm: permute the sample labels.
Return p-value w.r.t. to a reference distribution:Lower: proportion of permutation t-statistics that were smaller than
the observed t-statisticsUpper: proportion of permutation t-statistics that were larger than
the observed t-statistics
Code: using gseattperm> library(Category)> set.seed(123)> pvals <- gseattperm(nsF, nsF$mol.biol, Am2, 1000)> pvalCut <- 0.05> lowC <- rownames(pvals)[pvals[, 1] <= pvalCut]> unlist(getPathNames(lowC), use.names=FALSE)
[1] "Glycerophospholipid metabolism"
[2] "Ribosome"
[3] "RNA polymerase"
[4] "DNA replication"
[5] "Mismatch repair"
[6] "Homologous recombination"
19 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Outline
1 Hypergeometric Testing
2 Simple GSEA using Z-score and Permutation
3 GSEA using Linear Models
20 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Chromosome bands
Use the mapping of genes to chromosome bands.
To answer whether there are anomalies in the pattern of geneexpression that related to chromosome bands.
Use GSEA linear models.
Human chromosome 12
p13.33
p13.32
p13.31
p13.2
p13.1
p12.3
p12.2
p12.1
p11.23
p11.22
p11.21
p11.1
q11
q12
q13.11
q13.12
q13.13
q13.2
q13.3
q14.1
q14.2
q14.3
q15
q21.1
q21.2
q21.31
q21.32
q21.33 q22
q23.1
q23.2
q23.3
q24.11
q24.12
q24.13
q24.21
q24.22
q24.23
q24.31
q24.32
q24.33
Figure: Ideogram for human chromosome 12. The shaded bands togetherrepresent 12q21. Notice that the chromosome bands are hierarchicallynested, and they almost form a partition. (D. Sarker et. al. 2007)
Reference
”Using Categories defined by Chromosome Bands”by D. Sarker et. al.
21 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Consider the comparison of BCR/ABL and NEG groups.
Use ALL_bcrneg object.
Use nsFilter to remove probes with no Entrez Gene ID and nomapping to a chromosome band. Ensure that each Entrez Gene IDmaps to exactly one probeset which has the highest IQR. Alsoremove probes with lack of variation (var < 0.5).
Code: nonspecific filtering
> ALLfilt <- nsFilter(ALL_bcrneg, require.entez=TRUE,+ remove.dupEntrez=TRUE,+ require.CytoBand=TRUE,+ var.func=IQR,+ var.cutoff=0.5)$eset
22 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Consider the comparison of BCR/ABL and NEG groups.
Use ALL_bcrneg object.
Use nsFilter to remove probes with no Entrez Gene ID and nomapping to a chromosome band. Ensure that each Entrez Gene IDmaps to exactly one probeset which has the highest IQR. Alsoremove probes with lack of variation (var < 0.5).
Code: nonspecific filtering
> ALLfilt <- nsFilter(ALL_bcrneg, require.entez=TRUE,+ remove.dupEntrez=TRUE,+ require.CytoBand=TRUE,+ var.func=IQR,+ var.cutoff=0.5)$eset
22 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Compute per-gene t-statistics using limma.
Code: moderate t-statistics> library(limma)> design <- model.matrix(~0 + ALLfilt$mol.biol)> colnames(design) <- c("BCR/ABL", "NEG")> contr <- c(1, -1)> fit1 <- lmFit(ALLfilt, design)> fit2 <- contrasts.fit(fit1, contr)> fit3 <- eBayes(fit2)> tlimma <- topTable(fit3, number=nrow(fit3),+ adjust.method="none")> ## annotation> entrezUniverse <- unlist(mget(tlimma$ID,+ hgu95av2ENTREZID))> tstats <- tlimma$t> names(tstats) <- entrezUniverse
23 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Data preparation
Compute per-gene t-statistics using limma.
Code: moderate t-statistics> library(limma)> design <- model.matrix(~0 + ALLfilt$mol.biol)> colnames(design) <- c("BCR/ABL", "NEG")> contr <- c(1, -1)> fit1 <- lmFit(ALLfilt, design)> fit2 <- contrasts.fit(fit1, contr)> fit3 <- eBayes(fit2)> tlimma <- topTable(fit3, number=nrow(fit3),+ adjust.method="none")> ## annotation> entrezUniverse <- unlist(mget(tlimma$ID,+ hgu95av2ENTREZID))> tstats <- tlimma$t> names(tstats) <- entrezUniverse
23 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Linear models
Per−gene t−statistics
12q21.1
12q21
12q2
12q
All
−5 0 5
!! !
!!!
!!! !!
!! !
!!! ! !! !
! !! !!! !
! !! !! !!!!! !! ! !
! ! ! !!!
!!!!! !
!!! !!! !
!!!! !
!!!!!
!! !! !! !!!
! !!!
!!!
!! !!!
!!
!! !!!!! !
!!! ! ! !!
! !!
!!
! !!
!!! ! !! !! ! !!! !!
!!
! !!! !
!!!! !
! !! !!!! ! !
! !!!! !! !
!!!!!! !! ! !!
!!! ! !!
! !!! !
!! !!! !!
!!
! !!!! ! !! !
! !! !!!!! !
!!! !!!! ! !! ! !
!! ! !!!
!!!
!! !!! ! !!! !!! !
! !! !!!!
!!!
!!! !
!!! !!
! ! !!!! ! !! !! !! !
! !!! !!
!! !
!!!
!!
!!!!!!
! !! !
!!
!!!!
!!!
! !!!
!! !
!! ! !
!!
! ! !!!
!! !
!! !!
!!!!!!! !
!!
!!! !! !!! ! !
!! !! !! !!! ! !!!
!!!
!! !
! !!!! !!! !
! !!!!! ! !!! !
!!!
!! ! !! !! !!
!! ! ! !! !
! !!
! !!
!!! !!! !
! !!! !! !! !!! ! !!! !!!!
!!!
!!
!!!! ! !! !!
! !!! !!
!!!!
! !! !!
!!!! ! !!! !
!! !!! !! !
! ! !! !! !!!!! !
!!!
!!!
!!!
!!
!!! !!
!!!! !! !
! !!!!
!! !! ! !! !
!! !! !!!
! ! !! !!! !
!! !!! !! !! !!!! !!!! !!
! ! ! !!!
!! !!!!! !! ! !! ! !!
!! !!! !!!!
!! ! !! !!
!! ! ! !!!
!!!
!!!!! !!!! !!!
! ! !!! !!!! !!! !! !
! !!!
!!!!
! !! !
!!!
!!! !! ! !! !
!! !!! !
!!
!! !!!!!!
!!
!!!!!! !!! !
!! ! ! ! !
!!
! !!
!!!!! !!
!!!!!
!!!
!!!
!!!! !!
!! !!
!!!
!!
! !! !!! ! !! !!
! !! ! !
!!!
! !! !!!! !!! !! !
!! !!
!! !!
!!
! !!!! !! ! !! !!! !! !
! !!! ! !!! !!
!!
! !! !!!
!!
!!! ! !!!!
!!
!!!! !!
! !!!!!!
! !! !! !! ! !!!! ! !
! !!! !! !! ! !!!!!! !!
! !! ! !!!!
!! !!!!! !!
!!!
!!
!!! !
!!!!
!! ! !!
!! !!!! !
! !! !!
! ! !! !!
!!! !! ! ! !!! !! !! !
!!!
!! ! !!!
! !!!! !! !!!
!!
! !!!
!! !! ! !!! !!!
!!
!!!
!!
!!!
!! !! ! !!! !!
!!! ! !! !
!!
!! !!
!!! !!
!! !!!!
!! !!!! ! !!!!
! !!
! ! !! ! !!! !!! !! !! !!! !!!!!
!!! !! ! ! !
!!!
!!
!! ! ! !!!!!!
!!!
! !!
!!! ! !! ! !!! !! !
! !! !! ! !!
! !!
!!
!! !! !! !!!! !
!!
!!!
!!! !
!!!
!! !!! !
!!! ! !
!!!! ! !
!!
!!!!
!!!!! !!! ! !
!! !!!! ! ! !!!! ! !!!
! !! !! !! !! ! ! !!! !
!!! !! !!!! !! !!
!! !! !!!! !! !
!!
!!
! !!!
!!
!! !! !
!! !!
! ! !! !!
! !!
! !!! ! !
!!! ! !!!!! !
!! !!!
!! ! !!
!!! ! !! !
!!
!!!! !
! !!!!
!!!
! ! !!!
!! !!
!! ! !!! ! ! !
!!! !!! ! !! !!
!!! !!!
!! !!!!
!! ! !! !! !!!
!! !! ! !
!!! !
! !!!
!! !!! !
! !!!! !
!! !
!!! ! !! !
! !!! ! !
!!!!! ! !!!
! !!
!!
!!!!
!! !!!!! !!!! !
!! !
!! !!
!!!!!! !!!!
!!
! !!!! !! !
!!!
! !!!!!
!!! !! !
!!! ! ! !! !
!!!!! !! !! ! !
!!!
! !!! !! !!
!! !!!!!
!!
!!
!!!! !
!!!! !
! !!!!!
!! ! !!!
! !! !!! !! !!
!!
!! !!
!! !! ! ! !
!!! !
! !! ! !!!!! !
!! !!!!!!!
!!!
!!!
!!
! !!!!!
! !!!
!!
!! !
! !!!!
! !! !!! !!
!!!! ! !! !
!!! !!! !
!! !!
!! !! !
!!!!!! !
!!
!!!! ! !
!! !!
! ! ! !! ! !! !!! !! ! !! !! !! !!
!! !!
!! !!!
! !!!!!!
! ! !! !! ! !!!
!!! !!! !!!! !!
!!! !! ! !!! !
!!!
! !!
!! !!! ! !! !!! ! !! !
!!!! !!
!!!! !! !!!!
! !!!!
!!!
!!! !! !
!!!!
!!!!
!!! !!! !
!!! ! !!!! !
!! !!
!!! !! !!! !!
! !! !!!
!!!
!! !!! !!!
! ! !!!!!! !!!
!!!! !! ! !! ! !!
!!!!!!
!!!! ! ! !
!!
!!!
!!
!!
! !!! ! ! !
! !! !!! !! ! !
!!!
!!
! !! !!
!!!!!! ! !!
!!! !!
! ! !!!!!! !! !
! !! !! !! !!!
!! !!
!!! !! !!!
!!!! ! !!!
!! !!! !!! !
!!
!!!! !! !
! !!!!
!! !
!!! !!! !! !! !!
! !!!
!! !!!! !!! !!
!! !
!!
!!! ! !! !
!!! !
!!
! ! !!
!! !
!!
!!
!!!! !!! !
!! ! !
!! ! !!! !!
! !!! ! !! !!!!
! !!! !!! !! !!!
! !!
!!! !
!! !!!!! !!!!! ! !!
!!!! !
!!! !!!! !!! ! !
!!!! !!!!
!! !! !! !!!
!!
!!! !!!!
!!! !!
! !!! !!! !!
!!!
!! ! !
!!
!! ! !! !!!!
!!
! !!
!!!
! !! !! !!! ! !
!! !! !!!
! ! !!!
!! !!!
! !!!! !
!! !!!
!!! !
!!!!
!!!! ! !! !
!!
!! ! !!!
!!! !! ! !!
!! !!
! !!!!
!! !! !
!!!! !!
!!
!!! !!!!
!! !
! ! ! !!!! !! !
!!! !!! !! ! !
! !! !! !!! !! !!! !
!! !!
!!!! !! !!
!!!!!!! !
! !!!
!!! !!
!!!
! !! !!
! !!
!!! !!!!
!!
! !!! !
! !! !! !!!! !! !! !
!!
!! !!!! ! ! ! !!! !! !
! ! !!! !!
! !!
!!
!!
! !! ! !! !! !
!!!
!!!
!!! !!! !! ! !!!! !! !! !!!!! !!
!! !!!! !
!! !! !!!
!! !!!! !!!! ! !!! !!
!!! !!!!
!! !
! !!
!!! !!! !! !!! !
!!
! !!!! ! !
! ! ! ! !!
!! !! !
!!! !
!! !!!! ! !
!!
! !!!! !!
! !!! ! !! !!!!
!!!
! !!
! !!!
!!!
! !! !! ! !!! !!
!! !! !
!!! !! ! !! !! !!!!!! !! ! !
!! !! !!
!! ! !! !!
!! !!
!! !! !
! !!! !
!! ! !!!
!! !!
!! !! !! !
!!!
! ! !! !! !
!!!! !! !!! !
!!! !!!!
!!! !!! !!! !! !!! !!!!! !! !
!!!
!!
!! !! !! !
! !! !! !! !! !
!!
!!! !! !!!! !! !!
!! !!!! !!
! !!
!! ! !!! !! !
! !!!!!!
! !! !!
! !!! ! !
!!!
!!
! ! !!! ! !
!!
!!!! !!!!! !!!
! ! !! !! !
!!! !!! !
!! !! !! !! !!
!! !! !
!!! !! !!
!!
! !! !
!!! !!
! !!!! !! !! ! !
! !!!
!!
!! ! !
!! !!! ! !
!!!
!! !!! !
!!!
!!! !!!
! !! !
!!!!!!!! !! !! !!
!!
!! !! !! !!!
!!! ! !!!! !
!! ! !! ! ! !!! !! !! !!
! !!
! !! !!!!!
!!!! !!
! !!
!!!!!!
!!! !! !! !
!! !! !
! ! !! !!
!! ! !!!
!!!!
! ! ! !! ! !!!!
!! !! ! !! ! !! !
!!
! !!
!!!! !! ! !!
! ! !! !
!!
!! !!
! !! ! !!! !
!!!! !!
!! !! !
! !!! !
!!
!!! !
!! ! !!
!!! !!! !!
!!!! ! !
!!!! ! !!
!!
!!
!!!
!!! !
!!!
!!!! !!! ! !! !
!!
!!
!!!
!!
!!
! !! !!! !! !!!! ! !!
!!! ! !! !
!!! !!! ! !
!!
!! ! !! !!!!! !! ! !!! ! !
!!!
!! !! !!
!!!
!!! !
!!! !! !!
! ! !!! !! !!!
!!!! ! !!!
!! !
!!! !
!!!! !
! !!!!!
! !!!!! ! !
!!!!!
!!!!
!! !! !
! ! ! !!! !!
!! !! ! !!! !!
!! !!!
! !! ! !! !!!
! !!! !!
! !! !!
!!
!!
!!
! !!!! !
! !! !! !!
!! !!
! !! !!!
! !! ! !! !! !! !!
!!!! !! ! !! !
!!! ! !! !!
! !! ! !!
!!!
!!
!!! ! !
!!
!!!
!!
! !!
!!!!!
!!
!!! !
! !!!
!!!!
!!
!! !!
!!
!!
! !!
!!!!!
!!! !
! !! !!! ! !!! ! !!!! ! !
! !! !! !
!!!! !!!! ! ! !!! ! !!! ! !! !!!!!
!!! !! ! !!! ! !!! !! !!! !! !! !!! !
!!!
! ! !! !!!! !! !!
!!
!!!
!!
!!!! !
!!
!! !!! !!!! !
!! ! !! !! !!!
! !!!
!! !
! !! !!!! !!
! !!
!!!
! ! !! !!!
! !!!! !!
!! !!!! !! !
!! !! !!!!! !!
!!!!
! !!! !! !! !!
!! !!! !
! !!!!
! !! !! !
!!!
! !! !!! !!!!
!!! !
!!
!!! !
! !!!
!!!!!
!! !!! !
! !! !!!!
! !!! !!!! !!
! !! !!! !!
!! ! !!! !!
!!! ! !!!
!! !
!!!
!! !!!!! !!
!!! !
!! !!!!
! !! !!!! ! !!
!!! !!!!
!!!! !
!!!!! !!
!!! !!
!!!! !!! !! ! !
! !! !!! !!! !!
!!
!!! !! !
!!! ! !! !!! !
! ! !! !! !!! !!
!!
! !!!! ! ! ! !!! ! !!!
!!!!!!!!!!!
! !! !!!
!!!! !
!! !
!!
!!!!!! !! !
!!! !!!
!! !! !!! ! !!! ! !!
!!!! ! !! !! !
!! !! ! ! !!! !!!! !
! !!
!!
!!
! !!!!
! !!! !
!! !!
!!! !
!!! !
!! !!!
!!!! !!
! ! !!!!!
!!
! !!!!! ! !!! !!!!
! !!! !! !! !! !!! ! ! !
!! !
! ! !! ! !!!
!!
!!!
!! !! !
!! !! !
! !!! !!! !!! ! !
! !! ! !
!!! !
!!!!
!!
!!!
!! !!!! !! !
! !! !!! !
! ! !!!!! !
! !!! !!!! !
! ! !!! !!
! !! ! !! !!!
!!
!! !! !!! !!
!!!
! !!!
!!
! !!!!
!! !! !! !! !!
!!
!!!!
!!!!
!! !!!! !!
! !! !!! !!
!!! !!!!! !
!! !
!!!!!!! !!! ! !!
! !!! ! !! !
!!! !
! !!!! !
!! !! !! !! !! !! !
!!
! !!! ! !!!
!! !!! !!!
! !!! !
!!
! !!! ! !!
!! ! !! !!!!! !!! ! !
! !! !!! !!!
!! !!!! !! !! ! ! !
! ! !!!! !! ! !! !
!! !!!
! ! !!!! !!
!!
!! !! !!!!! !
!! !! !
!!! ! !!
!!!! !!
!! !! ! !!! !!
!!! !!
!! !!!!
!! ! !!!
!!!
!!
! ! !!! !!! ! !!!! !!
!! !!! !! !
! !! !! !! !! !
! !!! !
!!! ! ! !!
!!
! !!! !! !!!!!
!!! !!
! ! !!! !!! ! ! !! !
!!!
!! !!! !
!! !!
! !!!! ! !! !!!
!! ! !!
!!! ! !!
! !!! ! !!!!! !!
!! !
! !!! !! !! !!! !
!! !! !!!
!!! !!
!!! !! !
!!
!! ! !!
!! !! !! !! !! !!!
!!!
! ! !!
!! !! !!!!
! !!!
! ! !!
! !!!!!
!! ! ! !!!
!!!!! !!! !
!!
! !!!!
!!
! ! !! ! !!!!!
!! !!
!!!! !! !!!! !! !
Fitting linear model with per-gene t-statistics: for each category j ,
yi = β0 + β1aij + εi ,
where aij = 1 if gene i is associated with category j , and 0otherwise. The index i may range over from universal genes to asubset of genes.
β1 ∼ N (0, 1)
Null hypothesisH0 : β1 = 0
24 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Linear models
Create a ChrMapLinearMParams object.
Code: instance of class ChrMapLinearMParams
> library(Category)> params <- new("ChrMapLinearMParams",+ conditional=FALSE,+ testDirection="up",+ universeGeneIds=entrezUniverse,+ geneStats=tstats,+ annotation="hgu95av2",+ pvalueCutoff=0.01,+ minSize=4L)
25 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Calling the linearMTest function
linearMTest: compute the p-values for detecting up- ordown-regulation of predefined gene sets.
Code: linearMTest
> lman <- linearMTest(params)
> lman
> summary(lman)
26 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Exercise
1 Get familiar with the structure of ChrMapLinearMParams class?ChrMapLinearMParams or help(“ChrMapLinearMParams-class”)
2 Perform conditional GSEA linear models to find interestingchromosome bands that are up-regulated.
3 Summarize the result of the conditional test using summary.
Code: conditional test> slotNames(params)> paramsCond <- params> paramsCond@conditional <- TRUE> lmanCond <- linearMTest(paramsCond)> summary(lmanCond)
27 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Exercise
1 Get familiar with the structure of ChrMapLinearMParams class?ChrMapLinearMParams or help(“ChrMapLinearMParams-class”)
2 Perform conditional GSEA linear models to find interestingchromosome bands that are up-regulated.
3 Summarize the result of the conditional test using summary.
Code: conditional test> slotNames(params)> paramsCond <- params> paramsCond@conditional <- TRUE> lmanCond <- linearMTest(paramsCond)> summary(lmanCond)
27 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Summary
1 Basic idea behind GSEA.
2 Simple GSEA: t-tests and permutation.
3 Using KEGG categories.
4 Linear models and chromosome band categories.
5 Hypergeometric testings on GO BP terms.
28 / 29
Gene Set Enrichment Analysis
Outline Hypergeometric Testing Simple GSEA using Z-score and Permutation GSEA using Linear Models
Reference
Assaf P. Oron et. al., Gene set enrichment analysis using linearmodels and diagnostics, Bioinformatics, vol. 24 no. 22, pp.2566-2591, 2008.
Florian Hahne et. al., Bioconductor Case Studies, chapter 13-14,Springer, 2008.
Deepayan Sarker et. al., Using Categories defined chromosomebands, Bioconductor Category package vignette.
D. Sarker et.al., Modeling gene expression data via chromosomebands, Bioinformatics, 2007.
29 / 29
Gene Set Enrichment Analysis