Introduction to Analysis of Genomic Data Using R
Lab 2: Introduction to R
Dr. Yen-Yi Ho
Jan 17, 2018
R computing & graphics package
- R is a powerful, free statistical computing and graphics package.
- Popular with many researchers due to contributed packages: R functions to do specialized, advanced, and often complex statistical analysis.
- R can also do many important, routine calculations and analyses, and provide the common graphical displays used in this course.
- Installed in several of the computing labs across campus, e.g. Sloan 108 & 109, Gambrell 003.
- You can download and install it from CRAN: http://cran.r-project.org/
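As a rough illustration of the "routine calculations, analysis, and graphical displays" mentioned above, here is a minimal sketch in base R. The two vectors are made-up numbers chosen only for illustration (they are not course data), and the variable names are arbitrary.

```r
# Hypothetical expression values for two groups (made-up numbers)
control   <- c(5.1, 4.8, 5.6, 5.0, 4.9)
treatment <- c(6.2, 5.9, 6.5, 6.1, 6.4)

# Routine summaries
mean_control   <- mean(control)
mean_treatment <- mean(treatment)
sd_control     <- sd(control)

# Two-sample t-test (Welch's unequal-variance version is R's default)
tt <- t.test(treatment, control)
print(tt$p.value)

# A common graphical display: side-by-side boxplots
boxplot(list(control = control, treatment = treatment),
        ylab = "Expression (arbitrary units)")
```

Everything used here (`mean`, `sd`, `t.test`, `boxplot`) ships with a standard R installation, so no extra packages are needed.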
R: Pros and Cons
Pros
+ Free
+ Available for all major platforms
+ Powerful graphics
+ Comprehensive
+ Easy interface with other languages (such as C, Fortran)
+ Well-designed programming language (object-oriented)
+ Unlimited extensibility
+ Widely used by statisticians
+ Increasingly used for genomic analyses (Bioconductor)
Cons
- No dedicated support
- Complex syntax
- Not point-and-click
- No warranty
- Relatively slow
Bioconductor: a collection of R packages for genomic data analysis
- Started by Robert Gentleman to house R packages for genomic data analysis.
Bioconductor installation
- Use the biocLite.R script.
- Installing a specific package from Bioconductor:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("limma")
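Note for current R versions: the biocLite.R script shown above dates from early 2018. Later Bioconductor releases (3.8 onward, which need R 3.5 or newer) replaced it with the BiocManager package from CRAN. A minimal sketch of the equivalent installation, assuming a working internet connection and a recent R:

```r
# Install BiocManager from CRAN only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install a specific Bioconductor package (limma, as in the slide above)
BiocManager::install("limma")

# Load the package to confirm the installation worked
library(limma)
```

Which of the two routes applies depends on the R version installed on your machine; `R.version.string` prints it.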
Online resources: genome browser and public data repositories
- UCSC Genome Browser: hosts genomic annotation data for many species.
Public high-throughput data repositories
- GEO: Gene Expression Omnibus.
  - Maintained by NCBI.
  - Hosts array- and sequencing-based data.
- ArrayExpress: European counterpart of GEO.
  - Better curated than GEO but has less data.
- SRA: Sequence Read Archive.
  - Designed for hosting large-scale high-throughput sequencing data (high-speed file transfer).
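GEO records can also be pulled directly into R. The sketch below assumes the Bioconductor package GEOquery is installed and uses its `getGEO()` function. The wrapper name `fetch_geo` and the accession placeholder are hypothetical, introduced here only for illustration.

```r
# fetch_geo() is a hypothetical helper name, not part of GEOquery.
# It downloads a GEO record into a directory (a temporary one by default).
fetch_geo <- function(accession, destdir = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("Please install the Bioconductor package 'GEOquery' first.")
  }
  GEOquery::getGEO(accession, destdir = destdir)
}

# Example call, left commented out because it downloads data.
# "GSExxxxx" is a placeholder: replace it with a real GEO series accession.
# gse <- fetch_geo("GSExxxxx")
```

Wrapping the call in a function keeps the download from running until you supply an accession yourself.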
Other public data resources
- TCGA (The Cancer Genome Atlas)
  - Hosts data generated by TCGA, a large consortium studying cancer genomics.
  - Huge collection of cancer-related data: different types of genomic, genetic, and clinical data for many different types of cancer.
- ICGC (International Cancer Genome Consortium): similar to TCGA but has a larger collection of studies.
- ENCODE (the ENCyclopedia Of DNA Elements) data coordination center
  - Hosts data generated by ENCODE, a large consortium studying functional elements of the human genome.
  - Rich collection of genomic and epigenomic data.
- Many others ...
Next Lab: R Topics Outline
- Getting started
- R as a calculator
- Vectors
- Matrices, arrays, factors, lists, data frames
- Importing/exporting data
- R graphics
- Random number generation
- Writing R functions
- for loops
- rep, seq, which, match
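For a quick taste of several topics in this outline, here is a small preview sketch in base R. It is not the lab's actual material, and all values and names (such as the gene labels "g1" and "g2") are made up for illustration.

```r
# R as a calculator
x <- (3 + 4) * 2                   # 14

# Vectors, seq and rep
v <- seq(2, 10, by = 2)            # 2 4 6 8 10
r <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"

# which: positions where a condition holds
big <- which(v > 5)                # 3 4 5

# match: position of each element of the first argument in the second
pos <- match(c(8, 3), v)           # 4 NA (3 is not in v)

# A small function and a for loop
square <- function(z) z^2
total <- 0
for (i in v) {
  total <- total + square(i)
}
# total is 4 + 16 + 36 + 64 + 100 = 220

# Random number generation (seed fixed so the draws are repeatable)
set.seed(1)
draws <- rnorm(5)

# A matrix and a data frame
m  <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns
df <- data.frame(gene = c("g1", "g2"), expr = c(1.5, 2.5))
```

Typing any variable name at the R prompt (for example `v` or `df`) prints its contents, which is a convenient way to explore each line.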
To-do list after this class
- Review the slides.
- Read the Wikipedia entries for DNA, gene, genome, DNA microarray, and DNA sequencing.
- Install R and Bioconductor on your computer.
- Start learning R by reading Applied Statistics for Bioinformatics Using R: https://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf