BioNetwork Biological Modeling and Analysis Microarray and Visualization.
![Page 1: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/1.jpg)
Microarray analysis workflow (diagram)
• Biological question: differentially expressed genes, sample class prediction, etc.
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, Testing, Clustering, Discrimination
• Biological verification and interpretation
Related lectures: Churchill, March 15; Bult, Lecture 5; Bult, Lecture 6; Hibbs, Lectures 10 and 11; Blake, Lectures 16 and 17
![Page 2: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/2.jpg)
Project Steps
• Find and download array data
• Normalize array data
• Analyze data
  – i.e., generate gene lists
    • Differentially expressed genes, genes in clusters, etc.
• Interpret gene lists
  – Use the annotations of genes in your lists
    • Gene Ontology terms are available for many organisms, but not all
![Page 3: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/3.jpg)
Getting The Data
• Search GEO (Gene Expression Omnibus) or another repository for a data set of interest.
• Download the data files
  – e.g., Affy .CEL files, Affy .CDF files, etc.
• Upload the files to your home directory
![Page 4: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/4.jpg)
Normalize the Data
• A script to RMA normalize the Ackerman array data (available from my home directory) was sent to all of you on 2/23/2012
![Page 5: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/5.jpg)
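library(affy)
library(makecdfenv)
# Build a CDF environment from the array's CDF file
Array.CDF = make.cdf.env("MoGene-1_0-st-v1.cdf")
# Read all .CEL files found in the working directory
CELData = ReadAffy()
# Point the AffyBatch at the CDF environment created above
CELData@cdfName = "Array.CDF"
# RMA: background correction, normalization, and probeset summarization
rma.CELData = rma(CELData)
rma.expr = exprs(rma.CELData)
rma.expr.df = data.frame(ProbeID=row.names(rma.expr), rma.expr)
# Write a tab-delimited table without row names or quotes
write.table(rma.expr.df, "rma.expr.dat", sep="\t", row.names=FALSE, quote=FALSE)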
![Page 6: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/6.jpg)
• What is a library?
• What does the ReadAffy() function do? What are possible arguments for the ReadAffy() function?
• What class of R object is rma.CELData?
• What class of R object is rma.expr?
• What class of R object is rma.expr.df?
![Page 7: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/7.jpg)
• slotNames(CELData)
• phenoData(CELData)
![Page 8: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/8.jpg)
This is what rma.expr.df looks like in Excel…
![Page 9: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/9.jpg)
Plotting raw probe-level intensities across the Ackerman arrays (not normalized)
jpeg("boxplot.jpeg")
boxplot(CELData, names=CELData$sample, col="blue")
dev.off()
![Page 10: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/10.jpg)
Plotting summarized probeset intensities across the Ackerman arrays (normalized)
mydata = rma.expr.df
jpeg("normal_boxplot.jpg")
# Drop the first column (ProbeID) so only the array columns are plotted
boxplot(mydata[-1], main="Normalized Intensities", xlab="Array", ylab="Intensities", col="blue")
dev.off()
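To give a feel for why the boxplots line up after normalization, here is a minimal base R sketch of quantile normalization, which is one of the steps RMA performs. It is not the affy implementation: the function name quantile_normalize and the toy matrix are made up for illustration, and ties are broken arbitrarily.

```r
# Toy data (hypothetical): rows are probes, columns are arrays
x <- matrix(c(5, 2, 3, 4,
              4, 1, 6, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), c("A", "B", "C")))

# Illustrative helper, not part of affy
quantile_normalize <- function(x) {
  # Rank of each value within its own array
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort each array, then average across arrays at each rank
  sorted <- apply(unname(x), 2, sort)
  target <- rowMeans(sorted)
  # Replace each value by the average value for its rank
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

qn <- quantile_normalize(x)
print(qn)
```

After this step every array contains exactly the same set of values, only in a different probe order, so boxplots of the columns of qn are identical.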
![Page 11: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/11.jpg)
Next time
• Posted articles from Gary Churchill.
  – If you only read one article, read Churchill 2004
  – See also Gary’s web site:
    • http://churchill.jax.org/software/rmaanova.shtml
  – Look at Sample Data and Tutorial
• After that lecture we will begin analysis of microarray data
  – MAANOVA
![Page 12: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/12.jpg)
![Page 13: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/13.jpg)
[Chart: sequencing throughput (gigabases, 0 to 70,000) and cost per Kb ($0.00 to $140.00) by year, 1990 to 2009]
Lucinda Fulton, The Genome Center at Washington University
![Page 14: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/14.jpg)
Sequencing Technologies
http://www.geospiza.com/finchtalk/uploaded_images/plates-and-slides-718301.png
![Page 15: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/15.jpg)
Sequence “Space”
• Roche 454 – Flow space
  – Measures the pyrophosphate released when a nucleotide is added to a growing DNA chain
  – Flow space describes sequence in terms of these base incorporations
  – http://www.youtube.com/watch?v=bFNjxKHP8Jc
• AB SOLiD – Color space
  – Sequencing by DNA ligation via synthetic DNA molecules that contain two nested known bases with a fluorescent dye
  – Each base sequenced twice
  – http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
• Illumina/Solexa – Base space
  – Single base extensions of fluorescent-labeled nucleotides with protected 3' OH groups
  – Sequencing via cycles of base addition/detection followed by deprotection of the 3' OH
  – http://www.youtube.com/watch?v=77r5p8IBwJk&feature=related
• GenomeTV – Next Generation Sequencing (lecture)
  – http://www.youtube.com/watch?v=g0vGrNjpyA8&feature=related
http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space_23.html
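As a small illustration of what "color space" means, the base R sketch below converts a base-space sequence into two-base color codes. It assumes the commonly described encoding in which A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes; to_color_space is a hypothetical helper, and the leading primer base that real SOLiD reads carry is left out.

```r
# Assumed 2-bit codes for the four bases
base_code <- c(A = 0L, C = 1L, G = 2L, T = 3L)

# Illustrative helper: encode a base-space string as color-space digits
to_color_space <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  codes <- base_code[bases]
  # Each color describes a pair of adjacent bases
  colors <- bitwXor(codes[-length(codes)], codes[-1])
  paste(colors, collapse = "")
}

colors <- to_color_space("ATGGCA")
print(colors)
```

Because every color depends on two neighboring bases, each base contributes to two colors, which is the sense in which each base is sequenced twice.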
![Page 16: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/16.jpg)
“Standard” File formats
Sequence containers: FASTA, FASTQ, BAM/SAM
Alignments: BAM/SAM, MAF
Annotation: BED, GFF/GTF/GFF3, WIG
Variation: VCF, GVF
![Page 17: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/17.jpg)
Tools
Alignments: BLAST (not for NGS), BWA, Bowtie, Maq, …
Transcriptomics: TopHat, Cufflinks, …
Variant calling: ssahaSNP, Mosaic, …
Counting (ChIP-Seq, etc.): FindPeaks, PeakSeq
![Page 18: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/18.jpg)
FASTQ: Data Format
• FASTQ
  – Text based
  – Encodes sequence calls and quality scores with ASCII characters
  – Stores minimal information about the sequence read
  – 4 lines per sequence
    • Line 1: begins with @, followed by the sequence identifier and an optional description
    • Line 2: the sequence
    • Line 3: begins with “+”, optionally followed by the sequence identifier and description
    • Line 4: encoding of quality scores for the sequence in line 2
• References/Documentation
  – http://maq.sourceforge.net/fastq.shtml
  – Cock et al. (2009). Nuc Acids Res 38:1767-1771.
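The four-line layout can be sketched in base R as follows. This is only an illustration: the record is made up, parse_fastq is a hypothetical helper, and the quality string is assumed to use the Sanger encoding (ASCII code minus 33). For real data a dedicated package would be the better choice.

```r
# One made-up FASTQ record (4 lines)
fastq_lines <- c("@read1 example description",
                 "GATTACA",
                 "+",
                 "IIIIHH#")

# Illustrative helper: split lines into records of four
parse_fastq <- function(lines) {
  stopifnot(length(lines) %% 4 == 0)
  starts <- seq_len(length(lines) / 4) * 4 - 3
  lapply(starts, function(i) {
    list(id       = sub("^@", "", lines[i]),      # line 1 without the @
         sequence = lines[i + 1],                 # line 2
         quality  = utf8ToInt(lines[i + 3]) - 33L) # line 4, Sanger offset assumed
  })
}

records <- parse_fastq(fastq_lines)
print(records[[1]]$quality)
```

Each quality value lines up with one base of the sequence, so the two always have the same length.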
![Page 19: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/19.jpg)
FASTQ Example
FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
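For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina 1.3+ files store PHRED quality scores ranging from 0 to 62 (ASCII offset 64), whereas Sanger FASTQ stores PHRED quality scores ranging from 0 to 93 (ASCII offset 33).
Older Solexa quality scores use a different scale and have to be converted to PHRED quality scores.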
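The two conversions can be sketched in base R. The Solexa-to-PHRED relation used here is the one given in Cock et al.: Q_PHRED = 10 * log10(10^(Q_Solexa / 10) + 1). The function names are made up for illustration, and the second helper assumes the input string really is Illumina 1.3+ (offset 64).

```r
# Illustrative helper: convert Solexa-scaled scores to PHRED scores
solexa_to_phred <- function(q_solexa) {
  10 * log10(10^(q_solexa / 10) + 1)
}

# Illustrative helper: re-encode an offset-64 PHRED quality string
# as a Sanger (offset-33) quality string by shifting each ASCII code by 31
illumina64_to_sanger <- function(qual) {
  intToUtf8(utf8ToInt(qual) - 31L)
}

print(round(solexa_to_phred(c(-5, 0, 10, 40)), 2))
sanger <- illumina64_to_sanger("hhB")
print(sanger)
```

The two scales differ mostly at the low-quality end and agree closely for high scores, which is why the conversion matters most for poor-quality bases.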
![Page 20: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/20.jpg)
SAM (Sequence Alignment/Map)
• It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format
  – SAM is the output of aligners that map reads to a reference genome
  – Tab delimited, with a header section and an alignment section
    • Header lines begin with @ (the header section is optional)
    • Alignment section has 11 mandatory fields
  – BAM is the binary format of SAM
http://samtools.sourceforge.net/
![Page 21: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/21.jpg)
http://samtools.sourceforge.net/SAM1.pdf
Mandatory Alignment Fields
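As a rough illustration of the layout, the base R sketch below skips header lines and splits one alignment line into the 11 mandatory fields. The field names follow the SAM v1 specification; the header and alignment lines are made-up examples, and parse_sam_line is a hypothetical helper, not part of samtools.

```r
# Names of the 11 mandatory fields, in column order (SAM v1 naming)
sam_fields <- c("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
                "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# Made-up example: one header line and one alignment line
sam_lines <- c("@HD\tVN:1.0",
               "read1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIHH#")

# Illustrative helper: split one alignment line into named fields
parse_sam_line <- function(line) {
  parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 11)
  rec <- as.list(parts[1:11])
  names(rec) <- sam_fields
  # These mandatory fields are integers
  for (f in c("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")) {
    rec[[f]] <- as.integer(rec[[f]])
  }
  # Anything after column 11 is an optional TAG:TYPE:VALUE field
  rec$optional <- parts[-(1:11)]
  rec
}

# Header lines start with @ and are skipped here
alignments <- sam_lines[!startsWith(sam_lines, "@")]
rec <- parse_sam_line(alignments[1])
# FLAG bit 0x10 (16) marks a read aligned to the reverse strand
is_reverse <- bitwAnd(rec$FLAG, 16L) != 0
print(rec[c("QNAME", "RNAME", "POS", "CIGAR")])
```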
![Page 22: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/22.jpg)
http://samtools.sourceforge.net/SAM1.pdf
Alignment Examples
Alignments in SAM format
![Page 23: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/23.jpg)
Valid BED files
chr1 86114265 86116346 nsv433165
chr2 1841774 1846089 nsv433166
chr16 2950446 2955264 nsv433167
chr17 14350387 14351933 nsv433168
chr17 32831694 32832761 nsv433169
chr17 32831694 32832761 nsv433170
chr18 61880550 61881930 nsv433171

chr1 16759829 16778548 chr1:21667704 270866 -
chr1 16763194 16784844 chr1:146691804 407277 +
chr1 16763194 16784844 chr1:144004664 408925 -
chr1 16763194 16779513 chr1:142857141 291416 -
chr1 16763194 16779513 chr1:143522082 293473 -
chr1 16763194 16778548 chr1:146844175 284555 -
chr1 16763194 16778548 chr1:147006260 284948 -
chr1 16763411 16784844 chr1:144747517 405362 +
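BED rows like these can be read into a data frame with base R. The sketch below uses the first three rows of the four-column example; the column names follow the usual BED naming (chrom, chromStart, chromEnd, name). Because BED start coordinates are 0-based and the end is exclusive, the feature length is simply chromEnd minus chromStart.

```r
# First three rows of the four-column example, as text
bed_lines <- c("chr1 86114265 86116346 nsv433165",
               "chr2 1841774 1846089 nsv433166",
               "chr16 2950446 2955264 nsv433167")

# read.table splits on any whitespace by default, so tabs work as well
bed <- read.table(text = bed_lines,
                  col.names = c("chrom", "chromStart", "chromEnd", "name"),
                  stringsAsFactors = FALSE)

# 0-based, half-open intervals: length is end minus start
bed$length <- bed$chromEnd - bed$chromStart
print(bed)
```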
![Page 24: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/24.jpg)
Galaxy
http://main.g2.bx.psu.edu/
See Tutorial 1
Build and share data and analysis workflows
No programming experience required
Strong and growing development and user community
![Page 25: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/25.jpg)
Galaxy interface panels: Tools, History, Dialog/Parameter Selection
![Page 26: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/26.jpg)
Tutorial Web Site
http://www.ncbi.nlm.nih.gov/staff/church/GenomeAnalysis/index.shtml
Tutorial 5
![Page 27: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/27.jpg)
RNA Seq Workflow
• Convert data to FASTQ
• Upload files to Galaxy
• Quality Control
  – Throw out low quality sequence reads, etc.
• Map reads to a reference genome
  – Many algorithms available
  – Trade off between speed and sensitivity
• Data summarization
  – Associating alignments with genome annotations
  – Counts
• Data Visualization
• Statistical Analysis
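The data summarization step can be pictured with a toy base R sketch: given mapped read positions and gene intervals, count the reads that fall in each gene. All values and names here are invented for illustration, reads are reduced to a single position, and real pipelines use dedicated tools for this step.

```r
# Hypothetical gene annotations (1-based, inclusive coordinates assumed)
genes <- data.frame(gene  = c("geneA", "geneB"),
                    chrom = c("chr1", "chr1"),
                    start = c(100, 500),
                    end   = c(200, 800),
                    stringsAsFactors = FALSE)

# Hypothetical mapped reads, reduced to chromosome and start position
reads <- data.frame(chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"),
                    pos   = c(150, 160, 550, 900, 150),
                    stringsAsFactors = FALSE)

# Count the reads whose position lies inside each gene interval
counts <- sapply(seq_len(nrow(genes)), function(i) {
  sum(reads$chrom == genes$chrom[i] &
      reads$pos >= genes$start[i] &
      reads$pos <= genes$end[i])
})
names(counts) <- genes$gene
print(counts)
```

The resulting table of counts per gene is the input to the visualization and statistical analysis steps.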
![Page 28: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/28.jpg)
Typical RNA_Seq Project Work Flow
Tissue Sample → Total RNA → mRNA → cDNA → Sequencing → FASTQ file → QC → TopHat → Cufflinks → Gene/Transcript/Exon Expression → Visualization, Statistical Analysis
JAX Computational Sciences Service
![Page 29: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/29.jpg)
TopHat
Trapnell et al. (2009). Bioinformatics 25:1105-1111.
http://tophat.cbcb.umd.edu/
Figure from: Trapnell et al. (2010). Nature Biotechnology 28:511-515.
TopHat is better suited to aligning RNA Seq data than general-purpose aligners (Maq, BWA) because it takes splicing into account during the alignment process, so reads that span exon-exon junctions can still be aligned.
![Page 30: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/30.jpg)
Trapnell et al. (2009). Bioinformatics 25:1105-1111.
TopHat is built on the Bowtie alignment algorithm.
![Page 31: Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.](https://reader036.fdocuments.net/reader036/viewer/2022062421/56649cae5503460f94971feb/html5/thumbnails/31.jpg)
Cufflinks
Trapnell et al. (2010). Nature Biotechnology 28:511-515.
http://cufflinks.cbcb.umd.edu/
• Assembles transcripts,
• Estimates their abundances, and
• Tests for differential expression and regulation in RNA-Seq samples