Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq...
![Page 1: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/1.jpg)
Schedule change
• Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq)
• Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy)
• Day 3: AM – Introduction to Exome Sequencing and Variant Discovery
• Day 3: PM - Exome sequence analysis practical (Galaxy)
Galaxy server going down for maintenance on Thursday
![Page 2: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/2.jpg)
Quick Recap
• NGS data production becoming commonplace
• Many applications -> research intent determines technology platform choice
• High volume data BUT error prone
• FASTQ is accepted format standard
• Must assess quality scores before proceeding
• ‘Bad’ data can be rescued
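As a rough illustration of what "assessing quality scores" means in practice, the sketch below decodes a FASTQ quality string into Phred scores and error probabilities. It assumes the Phred+33 (Sanger / recent Illumina) encoding; the record and the function names are made up for this example.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, ASSUMING Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probabilities(quality_string):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return [10 ** (-q / 10) for q in phred33_scores(quality_string)]


def mean_quality(quality_string):
    """Average Phred score of a read, a common first-pass QC summary."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores)


# Hypothetical 4-line FASTQ record: header, sequence, separator, qualities.
record = ["@read1", "ACGTN", "+", "IIII#"]
scores = phred33_scores(record[3])
probs = error_probabilities(record[3])
print(scores)                      # 'I' decodes to Q40, '#' to Q2
print(mean_quality(record[3]))
```

A read whose score drops sharply towards the 3' end, as in this toy record, is the kind of 'bad' data that trimming can often rescue.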
![Page 3: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/3.jpg)
Introduction to RNAseq
![Page 4: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/4.jpg)
The Central Dogma of Molecular Biology
Reverse Transcription
![Page 5: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/5.jpg)
RNAseq Protocols
• cDNA, not RNA sequencing
• Types of libraries available:
  – Total RNA sequencing (not advised)
  – polyA+ RNA sequencing
  – Small RNA sequencing (specific size range targeted)
![Page 6: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/6.jpg)
cDNA Synthesis
![Page 7: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/7.jpg)
Genome-scale Applications
• Transcriptome analysis
• Identifying new transcribed regions
• Expression profiling
• Resequencing to find genetic polymorphisms:
  – SNPs, micro-indels
  – CNVs
  – Question: Why even bother with exome sequencing then?
![Page 8: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/8.jpg)
Sequencing details
• Standard sequencing
  – polyA/total RNA
  – Size selection
  – Primers and adapters
  – Single- and paired-end sequencing
• Strand-specific sequencing
  – Still immature tech
  – Sequencing only + or – strand
  – Mostly paired-end
![Page 9: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/9.jpg)
What about microarrays??!!!
• Assumes we know all transcribed regions and that spliceforms are not important
• Cannot find anything novel
• BUT may be the best choice depending on QUESTION
![Page 10: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/10.jpg)
Arrays vs RNAseq (1)
• Correlation of fold change between arrays and RNAseq is similar to correlation between array platforms (0.73)
• Technical replicates almost identical
• Extra analysis: prediction of alternative splicing, SNPs
• Low- and high-expressed genes do not match
![Page 11: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/11.jpg)
RNA-Seq promises/pitfalls
• Can reveal in a single assay:
  – new genes
  – splice variants
  – quantify genome-wide gene expression
• BUT
  – Data is voluminous and complex
  – Need scalable, fast and mathematically principled analysis software and LOTS of computing resources
![Page 12: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/12.jpg)
Experimental considerations
• Comparative conditions must make biological sense
• Biological replicates are always better than technical ones
• Aim for at least 3 replicates per condition
• ISOLATE the target mRNA species you are after
• NOT looking for new transcripts can bias expression estimates
![Page 13: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/13.jpg)
Analysis strategies
• De novo assembly of transcripts:
  + re-constructs actual spliced transcripts
  + does not require genome sequence
  + easier to work with post-transcriptional modifications
  - requires huge computational resources (RAM)
  - low sensitivity: hard to capture low abundance transcripts
• Alignment to the genome => Transcript assembly
  + computationally feasible
  + high sensitivity
  + easier to annotate using genomic annotations
  - need to take special care of splice junctions
![Page 14: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/14.jpg)
Basic analysis flowchart
[Flowchart. Box labels: Illumina reads; Remove artifacts (AAA..., ...N...); Clip adapters (small RNA); Pre-filter: low complexity, synthetic; Count and discard; "Collapse" identical reads; Align to the genome; Re-align with different number of mismatches etc.; Assemble: contigs (exons) + connectivity; Filter out low confidence contigs (singletons); Annotate. Branch labels: mapped, un-mapped.]
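Two of the pre-alignment boxes in the flowchart (removing artifact reads and collapsing identical reads) can be sketched in a few lines. This is only an illustration: the 0.9 homopolymer threshold, the function names and the toy reads are assumptions, not values from the slides.

```python
from collections import Counter


def is_artifact(read, max_single_base_fraction=0.9):
    """Flag reads containing N or dominated by one base (e.g. AAAA...).

    The 0.9 threshold is a hypothetical choice for this sketch.
    """
    if not read or "N" in read:
        return True
    most_common_count = Counter(read).most_common(1)[0][1]
    return most_common_count / len(read) >= max_single_base_fraction


def collapse_identical(reads):
    """Drop artifact reads, then count how often each sequence occurs."""
    return Counter(read for read in reads if not is_artifact(read))


# Toy reads: one duplicate pair, one polyA artifact, one read with an N.
reads = ["ACGTACGT", "ACGTACGT", "AAAAAAAA", "ACGNACGT", "TTGCATGC"]
collapsed = collapse_identical(reads)
print(collapsed)
```

Keeping the count alongside each unique sequence means the aligner only has to place each sequence once while the abundance information is preserved.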
![Page 15: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/15.jpg)
Software
• Short read aligners
  – Stampy, BWA, Novoalign, Bowtie, TopHat
• Data preprocessing
  – Fastx toolkit
  – samtools
• Expression studies
  – Cufflinks package
  – R packages (DESeq, edgeR, more…)
• Alternative splicing
  – Cufflinks
  – Augustus
![Page 16: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/16.jpg)
The ‘Tuxedo’ protocol
• TopHat + Cufflinks
• TopHat aligns reads to genome and discovers splice sites
• Cufflinks predicts transcripts present in dataset
• Cuffdiff identifies differential expression
Very widely adopted suite
![Page 17: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/17.jpg)
![Page 18: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/18.jpg)
‘Tuxedo’ protocol limitations
• Uses short-read data - Illumina OR SOLiD
• Requires a sequenced genome
• No GUI
• Versions implemented in GALAXY are old(ish)
![Page 19: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/19.jpg)
Read alignment with TopHat
![Page 20: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/20.jpg)
Splice junctions
• In humans, terminal exons are ~1 kb long, and since mRNAs are ~2 kb, about half of the reads should originate in initial and internal exons
• Initial and internal exons are ~200 bp long => for 75-mer reads, ~20% of reads are expected to cross splice junctions
[Figure: a read (R) crossing a splice junction, drawn against the RNA and the genome; exon length labelled Lexon]
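One way to reproduce the ~20% figure is the back-of-envelope calculation below. It is an interpretation of the slide, not the lecturer's stated derivation: it assumes a read starts at a uniformly random position in an exon and crosses the downstream junction when it starts within the last read-length bases of that exon.

```python
def fraction_crossing_junction(read_len, exon_len, fraction_in_short_exons):
    """Approximate share of all reads that span a splice junction.

    Assumption: a read starting in the last `read_len` bases of an exon
    runs past the exon end, so the per-exon chance is read_len / exon_len.
    """
    per_exon = min(1.0, read_len / exon_len)
    return fraction_in_short_exons * per_exon


# Numbers from the slide: 75-mer reads, ~200 bp initial/internal exons,
# and about half of all reads originating in those exons.
estimate = fraction_crossing_junction(read_len=75, exon_len=200,
                                      fraction_in_short_exons=0.5)
print(f"{estimate:.1%}")
```

With these inputs the estimate is 0.5 × 75/200 ≈ 0.19, in line with the slide's ~20%. Longer reads push the fraction higher, which is why spliced alignment matters more as read length grows.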
![Page 21: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/21.jpg)
Splice junctions strategies
• Create a splice junctions database joining together donors and acceptors
• Typically, use known (annotated) splice junctions or known splice sites
• TopHat: uses putative exons from mapped reads, database is made of canonical splice sites around putative exons
![Page 22: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/22.jpg)
Read alignment with TopHat (2)
• Uses BOWTIE aligner to align reads to genome
• BOWTIE cannot deal with large gaps (introns)
• Tophat segments reads that remain unaligned
• Smaller segments mostly end up aligning
![Page 23: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/23.jpg)
Read alignment with TopHat (3)
• When there is a large gap between segments of same read -> probable INTRON
• Tophat uses this to build an index of probable splice sites
• Allows accurate measurement of spliceform expression
• Possibility of detecting gene fusion events
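The segment idea can be shown on a toy example. This is not TopHat's actual algorithm or code: it uses exact string matching instead of Bowtie, a made-up genome, an 8-base segment length, and a read whose junction conveniently falls on a segment boundary. All names and values are hypothetical.

```python
def split_into_segments(read, segment_len):
    """Cut a read into consecutive fixed-length segments."""
    return [read[i:i + segment_len] for i in range(0, len(read), segment_len)]


def locate_segments(read, genome, segment_len):
    """Exact-match each segment; position is -1 when a segment is not found."""
    return [(seg, genome.find(seg))
            for seg in split_into_segments(read, segment_len)]


def infer_introns(hits, min_intron_len):
    """A large genomic gap between neighbouring segments suggests an intron."""
    introns = []
    for (seg_a, pos_a), (_, pos_b) in zip(hits, hits[1:]):
        if pos_a < 0 or pos_b < 0:
            continue  # one of the two segments did not align
        gap_start = pos_a + len(seg_a)
        if pos_b - gap_start >= min_intron_len:
            introns.append((gap_start, pos_b))
    return introns


# Toy genome: two exons separated by an intron with canonical GT...AG ends.
exon1, exon2 = "TTGACCGATCGA", "CATGGCTTAACG"
intron = "GTAAGT" + "C" * 10 + "AG"
genome = "AAAA" + exon1 + intron + exon2 + "TTTT"

# A spliced read: end of exon 1 joined directly to the start of exon 2.
read = exon1[-8:] + exon2[:8]
whole_read_hit = genome.find(read)          # the full read has no contiguous hit
hits = locate_segments(read, genome, segment_len=8)
introns = infer_introns(hits, min_intron_len=10)
print(whole_read_hit, hits, introns)
```

The full read fails to align contiguously, but both halves align, and the gap between them recovers the intron coordinates. In real data the junction usually falls inside a segment, which is the harder case the aligner has to resolve.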
![Page 24: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/24.jpg)
Cufflinks package
• http://cufflinks.cbcb.umd.edu/
• Cufflinks:
  – Expression values calculation
  – Transcripts de novo assembly
• Cuffcompare:
  – Transcripts comparison (de novo/genome annotation)
• Cuffdiff:
  – Differential expression analysis
![Page 25: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/25.jpg)
Cufflinks: Transcript assembly
• Assembles individual transcripts based on aligned reads
• Infers likely spliceforms of each gene
• Builds ‘transfrags’
• Reports the smallest number of spliceforms that can explain the data
• NOTE: assembly errors do occur -> sequencing depth helps
![Page 26: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/26.jpg)
Cufflinks: Transcript assembly (2)
• Quantifies expression level of each transfrag
• Filters out those likely to be premature terminations, non-mature mRNAs, etc
![Page 27: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/27.jpg)
Cuffmerge
• Merges transfrags into transcripts where appropriate
• Also performs a reference based assembly of transcripts using known transcripts
• Produces single annotation file which aids downstream analysis
![Page 28: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/28.jpg)
Cuffdiff: Differential expression
• Calculates expression level in two or more samples
• Expression level relates to read abundance
• Because of bias sources, cuffdiff tries to model the variance in its significance calculation
What else is important?
![Page 29: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/29.jpg)
FPKM (RPKM): Expression Values
C = the number of reads mapped onto the gene's exons
N = total number of reads in the experiment
L = the summed length of the gene's exons in base pairs
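With C, N and L as defined above, the standard RPKM definition (reads per kilobase of exon per million mapped reads) is RPKM = 10^9 × C / (N × L). FPKM is the same quantity counted in fragments rather than reads, which matters for paired-end data. A minimal sketch follows; the input numbers are hypothetical.

```python
def rpkm(reads_on_exons, total_reads, exon_length_bp):
    """RPKM = 1e9 * C / (N * L).

    The 1e9 factor combines 'per kilobase' (1e3) and 'per million reads' (1e6).
    """
    return 1e9 * reads_on_exons / (total_reads * exon_length_bp)


# Hypothetical gene: 500 reads on 2,000 bp of exons in a 20-million-read run.
value = rpkm(reads_on_exons=500, total_reads=20_000_000, exon_length_bp=2_000)
print(value)
```

Dividing by L lets a long and a short gene be compared within one sample, and dividing by N lets the same gene be compared between runs of different depth.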
![Page 30: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/30.jpg)
Cufflinks (Expression analysis)
gene_id          bundle_id  chr   left    right   FPKM     FPKM_conf_lo  FPKM_conf_hi  status
ENSG00000236743  31390      chr1  459655  461954  0        0             0             OK
ENSG00000248149  31391      chr1  465693  688071  787.12   731.009       843.232       OK
ENSG00000236679  31391      chr1  470906  471368  0        0             0             OK
ENSG00000231709  31391      chr1  521368  523833  0        0             0             OK
ENSG00000235146  31391      chr1  523008  530148  0        0             0             OK
ENSG00000239664  31391      chr1  529832  532878  0        0             0             OK
ENSG00000230021  31391      chr1  536815  659930  2.53932  0             5.72637       OK
ENSG00000229376  31391      chr1  657464  660287  0        0             0             OK
ENSG00000223659  31391      chr1  562756  564390  0        0             0             OK
ENSG00000225972  31391      chr1  564441  564813  96.9279  77.2375       116.618       OK
ENSG00000243329  31391      chr1  564878  564950  0        0             0             OK
ENSG00000240155  31391      chr1  564951  565019  0        0             0             OK
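Output like the Cufflinks listing above is plain whitespace-separated text, so it is easy to filter with a short script. The sketch below uses four of the rows shown and keeps genes whose lower FPKM confidence bound is above zero; that filter is just one possible choice for illustration, not a rule from the slides.

```python
# A few rows copied from the Cufflinks output shown above.
TABLE = """gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000236743 31390 chr1 459655 461954 0 0 0 OK
ENSG00000248149 31391 chr1 465693 688071 787.12 731.009 843.232 OK
ENSG00000230021 31391 chr1 536815 659930 2.53932 0 5.72637 OK
ENSG00000225972 31391 chr1 564441 564813 96.9279 77.2375 116.618 OK
"""


def parse_table(text):
    """Turn whitespace-separated rows into dictionaries keyed by the header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def confidently_expressed(rows):
    """Genes with status OK whose lower confidence bound excludes zero."""
    return [row["gene_id"] for row in rows
            if row["status"] == "OK" and float(row["FPKM_conf_lo"]) > 0]


rows = parse_table(TABLE)
expressed = confidently_expressed(rows)
print(expressed)
```

Note that ENSG00000230021 has a non-zero FPKM (2.53932) but a confidence interval that reaches 0, so this filter leaves it out.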
![Page 31: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/31.jpg)
Cuffdiff (differential expression)
• Pairwise or time series comparison
• Normal distribution of read counts
• Fisher’s test

test_id          gene    locus                     sample_1  sample_2  status  value_1  value_2  ln(fold_change)  test_stat  p_value   significant
ENSG00000000003  TSPAN6  chrX:99883666-99894988    q1        q2        NOTEST  0        0        0                0          1         no
ENSG00000000005  TNMD    chrX:99839798-99854882    q1        q2        NOTEST  0        0        0                0          1         no
ENSG00000000419  DPM1    chr20:49551403-49575092   q1        q2        NOTEST  15.0775  23.8627  0.459116         -1.39556   0.162848  no
ENSG00000000457  SCYL3   chr1:169631244-169863408  q1        q2        OK      32.5626  16.5208  -0.678541        15.8186    0         yes
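The ln(fold_change) column in the Cuffdiff output above appears to be the natural log of value_2 / value_1. The sketch below recomputes it for the two rows with non-zero expression as a sanity check; the function name is made up for this example.

```python
import math


def ln_fold_change(value_1, value_2):
    """Natural-log fold change of sample 2 relative to sample 1."""
    return math.log(value_2 / value_1)


# (value_1, value_2, reported ln(fold_change)) taken from the table above.
reported = {
    "DPM1": (15.0775, 23.8627, 0.459116),
    "SCYL3": (32.5626, 16.5208, -0.678541),
}

recomputed = {gene: ln_fold_change(v1, v2)
              for gene, (v1, v2, _) in reported.items()}
for gene, (_, _, table_value) in reported.items():
    print(gene, round(recomputed[gene], 4), table_value)
```

A positive value means higher expression in sample_2 (q2), a negative value means lower. SCYL3 drops to roughly half its q1 level, which matches ln(0.5) ≈ -0.69.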
![Page 32: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/32.jpg)
Visualization: Genome Viewers
• Web based:
  – UCSC Genome Browser (http://genome.ucsc.edu/)
• Standalone:
  – Integrative Genomics Viewer, IGV (http://www.broadinstitute.org/software/igv/)
![Page 33: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/33.jpg)
![Page 34: Schedule change Day 2: AM - Introduction to RNA-Seq (and a touch of miRNA-Seq) Day 2: PM - RNA-Seq practical (Tophat + Cuffdiff pipeline on Galaxy) Day.](https://reader036.fdocuments.net/reader036/viewer/2022062423/56649ed05503460f94bdea40/html5/thumbnails/34.jpg)
RNAseq hands-on practical (Galaxy)
• Data QC and trimming
• Aligning reads to reference genome
• Running CUFFLINKS and looking at some transcripts using the UCSC genome browser
• Finding differentially expressed genes with CUFFDIFF