Statistics for genomics
Mayo-Illinois Computational Genomics Course
June 11, 2019
Dave Zhao
Department of Statistics
University of Illinois at Urbana-Champaign
Preparation
• install.packages(c("Seurat", "glmnet", "ranger", "caret"))
• Download sample GSM2818521 of GSE109158 from https://urlzs.com/7UNr6
Objective
We will illustrate how to identify the appropriate statistical method for a genomics analysis
Statistical toolbox
Statistical method
Classifying statistical tools
Data structure:
• No dependent variables
• Continuous outcome
• Censored outcomes
• Etc.
Statistical task:
• Visualize
• Identify latent factors
• Cluster observations
• Select features
• Etc.
Examples from basic statistics
Rosner (2015)
Distinctive features of genomic data can require specialized statistical tools to handle:
1. Increased complexity
2. Increased uncertainty
3. Increased scale
Wu, Chen et al. (2011)
• New technology
• Many variables
• Few samples
• High heterogeneity
We will discuss some statistical tools that are frequently used in genomics.
We will also discuss a general framework for organizing new tools.
Choosing the right tool for a given biological question requires creativity and experience
Statistical toolbox
Statistical method
There are other frameworks that are organized by biological question rather than statistical tool
Example analysis using R
Use R packages to perform data analysis
To illustrate these tools, we will analyze single-cell RNA-seq data in R
Anatomy of a basic R command
pandey = read.table("GSM2818521_larva_counts_matrix.txt")
• Case-sensitive
• Function(): performs pre-programmed calculations given inputs and options
• Variable: stores values and function outputs; its name cannot contain whitespace and must start with a letter (or a dot not followed by a digit)
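The anatomy above can be tried without the course data. The following is a minimal sketch, assuming a small made-up tab-delimited matrix; the variable names `toy_text` and `toy` and the counts are illustrative and are not taken from GSM2818521.

```r
# Hypothetical toy input standing in for a counts file
# (genes in rows, cells in columns; values are made up)
toy_text = "cell1\tcell2\nSYN3\t0\t0\nPTPRO\t1\t0\nEPS8\t0\t2"

# read.table() is the function; toy is the variable that stores its output.
# Because the header line has one fewer field than the data lines,
# the first column is used as the row names.
toy = read.table(text = toy_text, header = TRUE, sep = "\t")

dim(toy)        # number of genes (rows) and cells (columns)
toy["PTPRO", ]  # look up one gene by its row name

# R is case-sensitive: toy and Toy are different names
exists("toy")
```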
Basic preprocessing of single-cell RNA-seq data using Seurat
library(Seurat)
# Keep genes detected in at least 3 cells and cells with at least 200 detected genes
s_obj = CreateSeuratObject(counts = pandey,
                           min.cells = 3, min.features = 200)
s_obj = NormalizeData(s_obj)         # log-normalize the counts within each cell
s_obj = FindVariableFeatures(s_obj)  # select highly variable genes
s_obj = ScaleData(s_obj)             # center and scale each gene
Statistical methods for genomics
Where statistics appears in a standard genomic analysis workflow
1. Experimental design
2. Quality control
3. Preprocessing
4. Normalization and batch correction
5. Analysis
6. Biological interpretation
Not covered today
Uses statistics but is highly dependent on technology
The focus of today’s discussion
Classifying statistical tasks
• Description
  • Tables
  • Plots
• Inference
  • Supervised: testing, estimation, prediction
  • Unsupervised: latent factors, latent groups
Classifying data structures
• Can vary widely, and classification is difficult
• Important factors in genomics:
1. Data type
2. Number of samples relative to number of variables
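The second factor matters because classical tools can break down when there are more variables than samples. The following is a minimal sketch, using simulated data (an assumption made for illustration), of ordinary least squares with 20 variables and only 10 samples.

```r
set.seed(1)

# Simulated data: fewer samples (n) than variables (p)
n = 10
p = 20
x = matrix(rnorm(n * p), n, p)
y = rnorm(n)

# Ordinary least squares cannot estimate all coefficients when p > n
fit = lm(y ~ x)

# With an intercept and 20 variables there are 21 coefficients,
# but only 10 can be estimated from 10 samples; the rest are NA
sum(is.na(coef(fit)))
```

This is the kind of data structure for which penalized methods such as the lasso, discussed later, are designed.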
PCA
> dim(pandey)
[1] 24105 4365
> pandey[1:3, 1:2]
      larvalR2_AAACCTGAGACAGAGA.1 larvalR2_AAACCTGAGACTTTCG.1
SYN3                            0                           0
PTPRO                           1                           0
EPS8                            0                           0
Research question: Can the gene expression information be summarized in fewer features?
PCA
Statistical task: calculate a (usually small) set of latent factors that captures most of the information in the dataset
Data structure: can be applied to all data structures
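To make the idea of latent factors concrete, the following is a minimal sketch using base R's `prcomp()` on simulated data. The data set (100 observations of 20 features driven by 2 hidden factors) is an assumption made for illustration and is unrelated to the zebrafish data.

```r
set.seed(1)

# Simulated data: 20 observed features generated from 2 latent factors plus noise
n = 100
p = 20
factors = matrix(rnorm(n * 2), n, 2)
loadings = matrix(rnorm(2 * p), 2, p)
x = factors %*% loadings + matrix(rnorm(n * p, sd = 0.1), n, p)

# prcomp() centers the columns by default; scale. = TRUE also standardizes them
pca = prcomp(x, scale. = TRUE)

# Proportion of the total variance captured by each principal component
var_explained = pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_explained)[1:5], 3)

# Scores of each observation on the first two components
head(pca$x[, 1:2])
```

Because the simulated features depend on only two factors, the first two components should capture nearly all of the variance; real expression data rarely separate this cleanly.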
PCA
ISL Chapter 6
PCA using Seurat
s_obj = RunPCA(s_obj)
Graph clustering
Research question: How many cell types exist in the larval zebrafish habenula?
Graph clustering
Statistical task: construct latent groups into which the observations fall
Data structure: relatively small number of features and complicated cluster structure
Comparison to other clustering methods
Distance
• Hierarchical clustering
• K-means clustering
vs.
Relationship
• Spectral clustering
• Graph clustering
http://webpages.uncc.edu/jfan/itcs4122.html
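The two distance-based methods listed above are available in base R. The following is a minimal sketch on simulated data; the two-group data set is an assumption made for illustration, and spectral and graph clustering are not shown because they need additional packages.

```r
set.seed(1)

# Simulated data: two well-separated groups of 50 points in 2 dimensions
x = rbind(matrix(rnorm(100, mean = 0), 50, 2),
          matrix(rnorm(100, mean = 8), 50, 2))
truth = rep(1:2, each = 50)

# K-means: assigns each point to the nearest of k cluster centers
km = kmeans(x, centers = 2, nstart = 10)

# Hierarchical clustering: merges points bottom-up using pairwise distances,
# then cuts the tree into k groups
hc = hclust(dist(x), method = "average")
hc_groups = cutree(hc, k = 2)

# Cross-tabulate the recovered clusters against the simulated groups
table(truth, km$cluster)
table(truth, hc_groups)
```

Both methods work well here because the groups are compact and far apart; the slide's point is that relationship-based methods can be preferable when the cluster shapes are more complicated.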
Graph clustering using Seurat
# Build a nearest-neighbor graph of the cells using the first 10 principal components
s_obj = FindNeighbors(s_obj, dims = 1:10)
# Cluster that graph; the dimensions were already chosen in FindNeighbors,
# so FindClusters does not need a dims argument
s_obj = FindClusters(s_obj, resolution = 0.1)
t-SNE plot
Research question: How to visualize the different cell types?
t-SNE plot
Statistical task: visualize observations in low dimensions
Data structure: can be applied to all data structures
t-SNE plot
https://en.wikipedia.org/wiki/Spring_system
t-SNE plot
s_obj = RunTSNE(s_obj)
DimPlot(s_obj, reduction = "tsne", label = TRUE)
Wilcoxon test
Research question: Does the expression of the gene PDYN differ between clusters 5 and 6?
Wilcoxon test
Statistical task: test whether there is an association between two variables
Data structure: one variable is dichotomous
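The same test is available in base R as `wilcox.test()`. The following is a minimal sketch on simulated counts; the Poisson expression values and the cluster labels "a" and "b" are assumptions made for illustration.

```r
set.seed(1)

# Simulated expression of one gene in two clusters of 200 cells each
expr_a = rpois(200, lambda = 1)
expr_b = rpois(200, lambda = 3)

# The dichotomous variable is cluster membership; the other variable is expression
expr = c(expr_a, expr_b)
cluster = factor(rep(c("a", "b"), each = 200))

# Wilcoxon rank-sum test of whether expression differs between the two clusters
res = wilcox.test(expr ~ cluster)
res$p.value
```

The test compares ranks rather than raw values, so it does not assume that expression is normally distributed.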
Wilcoxon test using Seurat
# FindMarkers uses the Wilcoxon rank-sum test by default (test.use = "wilcox")
markers = FindMarkers(s_obj, ident.1 = 5,
                      ident.2 = 6)
# If PDYN was removed by FindMarkers' default expression filters, this row is all NA
markers["PDYN",]
FDR control
Research question: Which genes differ between clusters 5 and 6?
FDR control
Statistical task: identify which of many hypothesis tests are truly significant
Data structure: p-values are available and statistically independent
FDR control
• FDR = False Discovery Rate: $\mathrm{FDR} = E\left[\dfrac{\#\ \text{false discoveries}}{\text{total}\ \#\ \text{of discoveries}}\right]$
• Statistical methods reject the largest number of hypothesis tests while maintaining FDR ≤ 𝛼, for some preset 𝛼
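One widely used procedure of this kind is the Benjamini-Hochberg (BH) adjustment, available in base R through `p.adjust()`. The following is a minimal sketch on simulated p-values; the mix of 900 null and 100 non-null tests is an assumption made for illustration.

```r
set.seed(1)

# Simulated p-values: 900 true null hypotheses (uniform p-values)
# and 100 real effects (p-values concentrated near zero)
p_null = runif(900)
p_alt = rbeta(100, shape1 = 0.1, shape2 = 10)
p = c(p_null, p_alt)

# Benjamini-Hochberg: rejecting tests whose adjusted p-value is <= alpha
# targets FDR <= alpha under independence
alpha = 0.05
p_bh = p.adjust(p, method = "BH")
sum(p_bh <= alpha)

# Bonferroni controls the stricter family-wise error rate,
# so it never rejects more tests than BH
p_bonf = p.adjust(p, method = "bonferroni")
sum(p_bonf <= alpha)
```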
FDR control using Seurat
markers = FindMarkers(s_obj, ident.1 = 5,
                      ident.2 = 6)
head(markers)
# p_val_adj from FindMarkers is Bonferroni-adjusted, which is more conservative than FDR control
sum(markers$p_val_adj <= 0.05)
# Benjamini-Hochberg FDR adjustment of the raw p-values
# (computed only over the genes that FindMarkers tested)
sum(p.adjust(markers$p_val, method = "BH") <= 0.05)
Random forest classification
Research question: Given the principal components of the RNA-seq expression values of all genes from a new cell, how can we determine the cell’s type?
Random forest classification
Statistical task: learn prediction rule between dependent and independent variables
Data structure: one categorical dependent variable, relatively few independent variables, complicated relationship
Random forest classification
ISL Chapter 8
Independent variables
Prediction
Random forest classification using caret and ranger
library(caret)
# Hold out the first two cells as "new" cells and train on the rest
pcs = Embeddings(s_obj, reduction = "pca")[-(1:2), 1:10]
class = as.factor(Idents(s_obj))[-(1:2)]
dataset = data.frame(class, pcs)
# Fit a random forest (ranger), tuned by 3-fold cross-validation
rf_fit = train(class ~ .,
               data = dataset,
               method = "ranger",
               trControl = trainControl(method = "cv", number = 3))
# Predict the held-out cells from their first 10 PCs and compare with their cluster labels
new_pcs = Embeddings(s_obj, reduction = "pca")[1:2, 1:10]
predict(rf_fit, new_pcs)
Idents(s_obj)[1:2]
Lasso
Research question: If we can only measure the expression of 10 genes in a new cell, which should we measure in order to most accurately predict the cell’s type?
Lasso
Statistical task: identify a small set of independent variables that are useful for predicting dependent variables
Data structure: simple (linear) prediction rule, dependent variables are associated with only a few (unknown) independent variables
Lasso
Estimates coefficients in the regression model
$Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p$
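For reference, the lasso estimates these coefficients by minimizing a penalized least-squares criterion. The form below follows the parameterization that glmnet documents for its Gaussian family; the sample index $i$, the sample size $n$, and the tuning parameter $\lambda$ are notation introduced here and do not appear on the slide.

```latex
(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)
  = \arg\min_{\beta_0, \beta_1, \ldots, \beta_p}
    \frac{1}{2n} \sum_{i=1}^{n}
      \Bigl( Y_i - \beta_0 - \sum_{j=1}^{p} X_{ij}\beta_j \Bigr)^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of $\lambda$ set more coefficients exactly to zero, which is why the lasso can be used to select a small set of independent variables.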
Lasso using glmnet
library(glmnet)
# Training data: raw counts (cells x genes) and the first 10 PCs, without the two held-out cells
counts = t(GetAssayData(s_obj, slot = "counts"))[-(1:2),]
pcs = Embeddings(s_obj, reduction = "pca")[-(1:2), 1:10]
# Multi-response lasso: predict all 10 PCs from the gene counts
lasso = cv.glmnet(counts, pcs, family = "mgaussian", nfolds = 3)
# Smallest penalty on the fitted path that keeps at most 10 genes
lambda = min(lasso$lambda[lasso$nzero <= 10])
coefs = coef(lasso, s = lambda)
# Names of the selected genes (the list may also include "(Intercept)")
rownames(coefs$PC_1)[which(coefs$PC_1 != 0)]
# Predict the PCs of the held-out cells from the selected genes, then classify the cells
new_counts = t(GetAssayData(s_obj, slot = "counts"))[1:2,]
new_pcs = predict(lasso, newx = new_counts, s = lambda)[,, 1]
predict(rf_fit, new_pcs)
Idents(s_obj)[1:2]
Conclusions
Statistical toolbox
• Description
  • Tables
  • Plots
• Inference
  • Supervised: testing, estimation, prediction
  • Unsupervised: latent factors, latent groups
Statistical tools
1. PCA
2. Graph clustering
3. t-SNE plot
4. Wilcoxon test
5. FDR control
6. Random forest classification
7. Lasso
Can be applied to single-cell RNA-seq and beyond
To learn more
• Take systematic courses in basic statistics, statistical learning, and R/Python
• Study recently published papers in your field of interest that use your technology of interest
• Consult tutorials, workshops, lab mates, and Google
Thank you