RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.
![Page 1: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/1.jpg)
RNA-seq workshop COUNTING & HTSEQ
Erin Osborne Nishimura
![Page 2: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/2.jpg)
Today’s simple analysis pipeline
.fastq file → trimmomatic/bbduk.sh → _trim.fastq file → TOPHAT2 → .bam/.sam file
.bam/.sam file → HTseq → counts.txt file → DESeq2/R → Differentially abundant genes
.bam/.sam file → bedtools genomecov → .bg file → bedGraphToBigWig → .bw file → IGV/UCSC → Pretty browser shots
![Page 3: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/3.jpg)
Quantification with htseq
![Page 4: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/4.jpg)
Quantification with htseq
![Page 5: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/5.jpg)
The problem
![Page 6: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/6.jpg)
Counting reads
• What will we count?
– Genes?
– Exons?
– Isoforms?
• What are some of the issues we need to account for when counting reads?
– Paralogs?
– Overlap?
– Isoforms?
– Errors?
• How to count?
– Raw counts
– RPKM -- Reads per kilobase per million mapped reads
– FPKM -- Fragments per kilobase per million mapped reads
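To make the RPKM definition concrete, here is a minimal Python sketch of the arithmetic. The function name and the example numbers are made up for illustration; FPKM uses the same formula with fragments (read pairs) counted instead of individual reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000                 # gene length in kb
    millions_mapped = total_mapped_reads / 1_000_000   # library size in millions
    return read_count / (kilobases * millions_mapped)


# Hypothetical numbers: 500 reads on a 2,000 bp gene,
# in a library with 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)  # 25.0
```

Raw counts, not RPKM or FPKM values, are what DESeq2 expects as input, so this normalization is mainly useful for browsing and comparing genes within a sample.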
![Page 7: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/7.jpg)
htseq-count
• Manual:
– http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
• Paper
– http://bioinformatics.oxfordjournals.org/content/31/2/166
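htseq-count writes a two-column, tab-delimited table (feature ID, count) followed by a few summary rows. The sketch below shows one way to read that table in Python. It assumes a release that prefixes the summary rows with a double underscore (for example `__no_feature`, `__ambiguous`); the function name and the example rows are made up for illustration.

```python
import io


def read_htseq_counts(handle):
    """Split htseq-count output into per-gene counts and summary counters."""
    gene_counts = {}
    summary = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        if name.startswith("__"):
            # Summary rows such as __no_feature or __ambiguous
            summary[name] = int(value)
        else:
            gene_counts[name] = int(value)
    return gene_counts, summary


# Made-up example standing in for a real counts.txt file
example_file = io.StringIO(
    "geneA\t12\n"
    "geneB\t0\n"
    "__no_feature\t3\n"
    "__ambiguous\t1\n"
)
gene_counts, summary = read_htseq_counts(example_file)
print(gene_counts)  # {'geneA': 12, 'geneB': 0}
print(summary)      # {'__no_feature': 3, '__ambiguous': 1}
```

For a real file, pass `open("counts.txt")` instead of the `io.StringIO` stand-in.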
![Page 8: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/8.jpg)
The problem
![Page 9: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/9.jpg)
The three htseq-count modes
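The three modes (union, intersection-strict, intersection-nonempty) differ in how they combine the sets of features overlapping each position of a read. The Python sketch below is a simplified illustration of that logic, not HTSeq's own code: it takes one set of feature names per read position, and the function name and gene names are hypothetical.

```python
def assign_read(position_sets, mode="union"):
    """Assign a read to a feature, given the set of features at each read position.

    Simplified illustration of the htseq-count overlap modes.
    """
    if mode == "union":
        # Every feature touched anywhere along the read
        candidates = set().union(*position_sets)
    elif mode == "intersection-strict":
        # Features that cover every position of the read
        candidates = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection-nonempty":
        # Features that cover every position that has at least one feature
        nonempty = [s for s in position_sets if s]
        candidates = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not candidates:
        return "__no_feature"
    if len(candidates) == 1:
        return next(iter(candidates))
    return "__ambiguous"


# A read that hangs off the end of gene_A (last position has no feature)
overhanging = [{"gene_A"}, {"gene_A"}, set()]
# A read inside gene_A that partly overlaps gene_B as well
overlapping = [{"gene_A"}, {"gene_A", "gene_B"}]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(overhanging, mode), assign_read(overlapping, mode))
```

In this sketch the overhanging read is counted for gene_A under union and intersection-nonempty but not under intersection-strict, while the overlapping read is ambiguous only under union.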
![Page 10: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/10.jpg)
Switch to hands-on tutorial
• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md
![Page 11: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/11.jpg)
Assessing differential abundance
![Page 12: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/12.jpg)
Assessing pairwise differential abundance, relatively simple
Anders and Huber, 2010
![Page 13: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/13.jpg)
Identifying genes with shared patterns across multiple samples, complex
![Page 14: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/14.jpg)
For today…
Anders and Huber, 2010
![Page 15: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/15.jpg)
Many publications report performance comparisons of the different packages
• Seyednasrollah et al., 2013
– http://bib.oxfordjournals.org/content/16/1/59.full.pdf+html
• Soneson et al., 2013
– http://www.biomedcentral.com/1471-2105/14/91
• Rapaport et al., 2013
– http://www.genomebiology.com/2013/14/9/r95
![Page 16: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/16.jpg)
Why is this hard? Why is this different from other types of data?
• Your question
• The data
– Discreteness
– Small numbers of replicates
– Large dynamic range
– Outliers
– Data is overdispersed
• Variance does not scale linearly with mean
• Breaks the assumptions of some inference tests
Anders and Huber, 2010
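To illustrate what overdispersion means for count data, the Python sketch below compares the variance expected under a Poisson model (variance equals the mean) with the negative binomial form used by DESeq2, variance = mean + dispersion × mean². The function names and the example mean and dispersion values are made up for illustration.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial variance: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Hypothetical gene with a mean count of 100 and a dispersion of 0.25
mean_count = 100
dispersion = 0.25
print(poisson_variance(mean_count))                        # 100
print(negative_binomial_variance(mean_count, dispersion))  # 2600.0
```

The extra quadratic term is why a test that assumes Poisson noise would call far too many genes significant on data like this.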
![Page 17: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/17.jpg)
Why DESeq?
• Original paper
– http://www.genomebiology.com/content/11/10/R106
• DESeq2 paper
– http://www.genomebiology.com/2014/15/12/550
• Bioconductor
– http://bioconductor.org/packages/release/bioc/html/DESeq2.html
• Vignette
– https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
![Page 18: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/18.jpg)
A final word about the fate of your data
• You will need to submit your raw and processed files to a repository PRIOR to submitting your paper for publication.
• Keep track of what you did!
– Module versions
– Conversion & transformation steps
– Settings/Options
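One lightweight way to keep track is to write the tools, versions, and options for each step into a small machine-readable file stored next to the results. The Python sketch below is only one possible layout: the function name, the field names, and the option values are hypothetical, and the version strings are placeholders to be filled in from the modules you actually loaded.

```python
import json
import platform


def make_run_record(steps):
    """Bundle (tool, version, options) tuples into a JSON-serializable record."""
    return {
        "python_version": platform.python_version(),
        "steps": [
            {"tool": tool, "version": version, "options": options}
            for tool, version, options in steps
        ],
    }


# Placeholder entries: replace the version strings and options with your own.
steps = [
    ("tophat2", "FILL_IN_VERSION", {"notes": "alignment settings go here"}),
    ("htseq-count", "FILL_IN_VERSION", {"mode": "union"}),
]
record = make_run_record(steps)
print(json.dumps(record, indent=2))
```

Writing the same text to a file (for example with `json.dump`) gives a record that can be submitted alongside the processed data.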
![Page 19: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/19.jpg)
Switch to hands-on tutorial
• https://github.com/erinosb/HTSF_workshop/blob/master/02_RNAseq_count.md
![Page 20: RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.](https://reader030.fdocuments.net/reader030/viewer/2022020417/56649f1e5503460f94c35baf/html5/thumbnails/20.jpg)
Key Quality Control Metrics
• Gene coverage
– CEAS
• Over-amplification
– FASTQC
• Complexity
– TOPHAT output
• Reproducibility