Technion 28/03/05
Bioconductor overview
Mirror site @ http://inn.weizmann.ac.il/bioconductor/
Outline
Introduction
Bioconductor by examples?
How to use it?
History: development environment
Transparency
Reproducibility
Modular development
History: Features of R
Prototyping – quick and dirty programming
Packaging – help, data and code are provided in one package
Object-oriented programming (OOP)
WWW connectivity – connecting to external resources without leaving the R environment
Statistical and mathematical capability
Visualization and graphics support
Support for concurrent computation (parallel machines)
Bioconductor site
Bioconductor site: screenshots
Bioconductor packages
Package category | Description | Number of packages
General tools | Infrastructure tools for developers | 10
Graphs | R API for other graphical tools | 4
Graphics & User Interface | Tcl GUI to make use more convenient | 7
Database Interaction | R API for databases | 3
Annotation | Tools for dynamic annotation | 5
Analysis | Data analysis and QC of microarray data | 14
Genomics & proteomics | Tools for genomics arrays | 6
Pre-processing | Normalization and summarization of Affymetrix and spotted data | 11
Ontologies | Gene classification | 3
Total | | 63
Bioconductor: Metadata
CDF packages
Probe packages
Annotation packages
Bioconductor: experimental data
ALL: Acute leukemia Affymetrix data
colonCA: Alon et al. (1999) colon cancer data
ecoliLeucine: Experimental data with Affymetrix E. coli chips
Estrogen: 2x2 factorial design exercise for the Bioconductor short course (1.5.0)
fibroEset: exprSet for Karaman et al. (2003) fibroblasts data
golubEsets: exprSets for Golub leukemia data
Iyer517: Time course data for fibroblasts exposed to serum
SpikeIn: Affymetrix's Spike-In Experiment Data
yeastCC: Spellman et al. (1998) yeast cell cycle microarray data (0.5.1)
Bioconductor vignettes
A set of tutorials in PDF format that include chunks of code.
Each vignette shows how to use a package.
Package | Count of Package | Analysis
affy | 5 | Affymetrix
altcdfenvs | 3 | Affymetrix
affyPLM | 2 | Affymetrix
affycomp | 1 | Affymetrix
affydata | 1 | Affymetrix
affylmGUI | 1 | Affymetrix
gcrma | 1 | Affymetrix
simpleaffy | 1 | Affymetrix
makecdfenv | 1 | Affymetrix
limmaGUI | 2 | Spotted array
arrayMagic | 1 | Spotted array
annotate | 7 | Annotation
GOstats | 3 | Annotation
goCluster | 1 | Annotation
annaffy | 1 | Annotation
Bioconductor by examples?
QC
Pre-processing for spotted and Affymetrix arrays
Data analysis
Annotation – chromosome mapping and database mapping
GO statistics
Typical Experiments
RNA extraction
RNA labeling
One-color experiment: condition, control
Two-color experiment: condition, control
DNA Array Technologies
(A) Affymetrix
(B) “Spotting”
Differences: Analysis flow
Quantification
Normalization
Filtering noise out
Data analysis
Pre-processing
Two-color/spotted experiment
One-color/Affy experiment
Common: Expression Data Matrix (exprSet object)
Gene expression matrix
Columns: samples (the experiment set); each column comes from one array
Rows: genes; each row represents one gene
Cells: gene expression levels
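The layout of the expression matrix can be sketched in base R. This is only an illustration with invented gene names, sample names and values; it uses a plain matrix rather than an exprSet object.

```r
# Toy expression matrix: rows are genes, columns are samples (arrays).
# Gene names, sample names and values are made up for illustration.
set.seed(1)
expr <- matrix(
  round(rnorm(12, mean = 8, sd = 1), 2),
  nrow = 4, ncol = 3,
  dimnames = list(
    paste0("gene", 1:4),
    c("control_1", "control_2", "treated_1")
  )
)

gene_profile <- expr["gene2", ]       # one row: one gene across all samples
array_values <- expr[, "treated_1"]   # one column: all genes on one array
dim(expr)                             # number of genes, number of samples
```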
Affymetrix QC: Affymetrix Chip
Affymetrix QC: Probe set design
Gene sequence (5´ to 3´)
Multiple oligo probes of 25-mer
Perfect Match
Mismatch
Affymetrix QC: RNA labeling
[Figure: RNA with poly-A tail (AAAA) paired with TTTT; steps shown: RNA -> cDNA, ds cDNA (5’ and 3’ ends marked), cDNA -> cRNA]
Affymetrix QC: RNA degradation
library(affy)  # provides AffyRNAdeg and its summary and plot helpers
deg <- AffyRNAdeg(affybatch.example)
summaryAffyRNAdeg(deg)
plotAffyRNAdeg(deg)
Statistic | 20A | 20B | 10A
Slope | 0.0767 | 0.063 | 0.0842
p-value | 0.136 | 0.212 | 0.091
Affymetrix QC: Probe level view
Probe level density plot (replicate1, replicate2): shows that the difference between two replicates is higher than the difference between signal and background
Probe level scatter plot of replicated chips
Affymetrix Preprocessing: Probe set design
Gene sequence (5´ to 3´)
Multiple oligo probes of 25-mer
Perfect Match
Mismatch
Affymetrix Preprocessing: Probe set variability
Probe variability within a probe set is 2- to 5-fold larger than probe variability among arrays
Affymetrix Preprocessing: Many arrays assess probe affinity
Data matrix: ~500,000 probes (12,000 – 22,000 genes) by samples
One probe set: Probe set intensity = RNA concentration + Probe affinity
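The additive relation between probe intensity, RNA concentration and probe affinity can be illustrated with base R's medpolish(). This is a minimal sketch on noise-free simulated data; the affinity and concentration values are invented, and the model is assumed to hold on the log scale.

```r
# Simulated log2 intensities for one probe set: 5 probes (rows) x 4 arrays (columns).
# The affinity and concentration values are invented for illustration.
probe_affinity    <- c(-1.0, -0.5, 0.0, 0.5, 1.0)   # one effect per probe
rna_concentration <- c(7, 8, 9, 10)                 # one effect per array

# Each cell is probe affinity + RNA concentration (additive model).
log_intensity <- outer(probe_affinity, rna_concentration, "+")
dimnames(log_intensity) <- list(paste0("probe", 1:5), paste0("array", 1:4))

# Median polish decomposes the matrix into overall + row + column effects.
fit <- medpolish(log_intensity, trace.iter = FALSE)

# Estimated expression of the probe set on each array (log2 scale).
probe_set_summary <- fit$overall + fit$col
```

Because many arrays share the same probes, the row effects (fit$row) estimate probe affinity separately from the per-array signal.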
Affymetrix Preprocessing: Overall Precision MAS5, dChip and RMA
Reducing the noise of low signals
Affymetrix Preprocessing: Algorithm flow
Cells read: order of the step; technique
Step | MAS5 | dChip | RMA
Linearization (log-transformation) | 5 | 4 | 1
Normalization | 4; global | 1; invariantset | 2; quantile
Background summary | 1; calculating CT<PM | No | 3; estimating the left tail
Background subtraction | 2; dividing | 2; simple PM-MM or PM-only | 5
Signal summary | 3; average of log(PM-CT) | 3; model based, outlier detection | 4; medianpolish
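The quantile normalization step listed for RMA can be sketched in base R. This is a simplified illustration, not the code used by the affy package: ties are broken by order of appearance, and the intensities are invented.

```r
# Quantile normalization sketch: force every array (column) to share the
# same empirical distribution. Ties are broken by order of appearance.
quantile_normalize <- function(x) {
  ranks      <- apply(x, 2, rank, ties.method = "first")
  sorted     <- apply(x, 2, sort)
  target     <- rowMeans(sorted)             # mean of the k-th smallest values
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Toy intensities: 4 probes (rows) on 3 arrays (columns); values are invented.
raw <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))
norm <- quantile_normalize(raw)
```

After normalization every column holds the same set of values, only in a different probe order.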
Affymetrix Preprocessing: Bioconductor flexibility
Package: affy
Method: expresso
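A hedged sketch of how expresso lets each preprocessing step be chosen separately. The wrapper run_expresso and the object name abatch are hypothetical; the settings shown are one RMA-style combination, and the call only runs if the affy package is installed and an AffyBatch has already been read in.

```r
# One choice per expresso() step; this combination mimics an RMA-style pipeline.
rma_like <- list(
  bgcorrect.method = "rma",
  normalize.method = "quantiles",
  pmcorrect.method = "pmonly",
  summary.method   = "medianpolish"
)

# Hypothetical wrapper: `abatch` stands for an AffyBatch you already have,
# for example one created with ReadAffy().
run_expresso <- function(abatch, settings = rma_like) {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("the Bioconductor package 'affy' is not installed")
  }
  # The AffyBatch is passed positionally as expresso()'s first argument.
  do.call(affy::expresso, c(list(abatch), settings))
}
```

Swapping a single entry, for example a different normalize.method, changes one step while leaving the others untouched.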
Spotted preprocessing: Per-Tip Normalization
Array
Sub-array
Spotted preprocessing: Per-Tip Normalization
Y. H. Yang, et al., Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30(4), 2002.
Spotted preprocessing: Per-Tip Normalization (pre-normalization)
Spatial graph: each colored square represents a spot; each black square represents a set of spots printed by a given tip.
Green: genes that were down-regulated (in the 10th percentile)
Red: genes that were up-regulated (in the 10th percentile)
Spotted preprocessing: Per-Tip Normalization (post-normalization)
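The idea of per-tip normalization can be sketched in base R: within each print-tip group, fit a loess curve of the log ratio M on the average log intensity A and subtract it. The data below are simulated, the tip biases are invented, and the function name normalize_per_tip is hypothetical; this is not the implementation used by any Bioconductor package.

```r
set.seed(42)
n_tips <- 4
spots_per_tip <- 200
tip <- rep(seq_len(n_tips), each = spots_per_tip)

# Simulated average log intensity (A) and log ratio (M) with a tip-specific bias.
A <- runif(n_tips * spots_per_tip, min = 6, max = 14)
tip_bias <- c(-0.6, -0.2, 0.3, 0.7)[tip] + 0.05 * (A - 10)
M <- tip_bias + rnorm(length(A), sd = 0.2)

# Fit a loess curve of M on A inside each tip group and keep the residuals.
normalize_per_tip <- function(M, A, tip, span = 0.4) {
  M_norm <- numeric(length(M))
  for (g in unique(tip)) {
    idx <- tip == g
    d   <- data.frame(M = M[idx], A = A[idx])
    fit <- loess(M ~ A, data = d, span = span)
    M_norm[idx] <- residuals(fit)
  }
  M_norm
}

M_norm <- normalize_per_tip(M, A, tip)
medians_before <- tapply(M, tip, median)        # differ from tip to tip
medians_after  <- tapply(M_norm, tip, median)   # close to zero for every tip
```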
Spotted preprocessing: Experimental design
Common reference: A, B and C, each against Ref
Loop design: A, B and C
Pooled reference: A, B and C, each against the ABC pool
Behavior: detecting small expression differences
– Before that paper, only a small number of genes in the brain had been shown to have expression changes related to behavior.
– With the right experimental design and ANOVA, this study could detect small differences in gene expression in honey bee brains and thereby relate many genes to behavior.
– As a result, the behavior of honey bee workers could be predicted from their brain gene expression.
C. W. Whitfield, A. M. Cziko, G. E. Robinson, Science 302, 296-9 (2003).
Data Analysis: Applications of microarrays
Evolution
– Most of the gene expression differences between chimpanzees and humans have been detected in the brain.
Development
– Associating apoptosis-related gene expression with metamorphosis stages in a variety of Drosophila tissues
Behavior
– The behavior of honey bee workers can be predicted from their brain gene expression.
Tissue
– Molecular signatures specific to subtypes of cancer tissue
Regulation
– Finding novel regulatory motifs by coupling motif search with co-expression
Functional annotation
– Annotating unknown genes based on co-expression (guilt by association)
- Question about samples
- Question about genes
Biological questions
Sample classification
– What is the set of genes that differentiates between two or more groups of treatments? (Supervised methods)
– What is the set of conditions that display the same expression profile in a given gene repertoire? (Unsupervised methods)
Gene classification
– What is the set of genes that have the same expression pattern over a set of treatments? (Unsupervised methods)
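The supervised question (which genes differentiate two groups) can be sketched in base R with a per-gene t-test and a multiple-testing adjustment. The data are simulated and the effect size is invented; this is one simple approach, not the specific method of any Bioconductor package.

```r
set.seed(7)
n_genes <- 200
groups  <- factor(rep(c("control", "treated"), each = 6))

# Simulated log2 expression: genes in rows, 12 samples in columns.
expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Assumption for the example: the first 10 genes are up-regulated in "treated".
expr[1:10, groups == "treated"] <- expr[1:10, groups == "treated"] + 4

# One Welch t-test per gene, then Benjamini-Hochberg adjustment.
p_values <- apply(expr, 1, function(y) t.test(y ~ groups)$p.value)
adjusted <- p.adjust(p_values, method = "BH")
selected <- names(adjusted)[adjusted < 0.05]
```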
Data Analysis: Methods
Supervised
– Discriminant analysis
– ANOVA
– Linear models
– Empirical Bayes
Unsupervised
– Partition methods: SOM (Self-Organizing Maps), K-means
– Hierarchical clustering (bootstrapping)
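Two of the unsupervised methods listed, K-means and hierarchical clustering, can be sketched in base R on simulated gene profiles. The three profiles and the noise level are invented, and the bootstrapping step mentioned on the slide is not shown.

```r
set.seed(3)

# Three invented expression patterns over 6 conditions, 10 genes each.
profiles <- rbind(up   = c(0, 1, 2, 3, 4, 5),
                  down = c(5, 4, 3, 2, 1, 0),
                  flat = rep(2.5, 6))
expr <- profiles[rep(1:3, each = 10), ] + rnorm(30 * 6, sd = 0.3)
rownames(expr) <- paste0("gene", 1:30)

# Partition method: K-means with several random starts.
km <- kmeans(expr, centers = 3, nstart = 20)

# Hierarchical clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(expr), method = "average")
hc_clusters <- cutree(hc, k = 3)

# Cross-tabulate the two results to see how well they agree.
agreement <- table(kmeans = km$cluster, hclust = hc_clusters)
```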
Data Analysis: visualization
Using R
Bioconductor
Preparing packages that connect R to other visualization programs
Annotation: Packages
Source databases: GenBank, LocusLink, KEGG, GO, GEO, SwissProt, UniGene
Database format (HTML/XML)
Parser: AnnBuilder (developers)
Package
annotate (users)
HTML table
Annotation: Functional enrichment (GOstats; geneplotter)
Biological Process | Number of genes in a cluster (10) | Number of genes in an array (100)
Wound healing | 2 | 20
Signaling and Angiogenesis | 1 | 10
Immediate and early response | 2 | 20
Cell cycle | 4 | 40
Cholesterol biosynthesis | 1 | 10
Enrichment to cell cycle?
Category | Cluster | Array
Cell Cycle | 4 | 40
Others | 6 | 60
[Plot: Y-axis: GC-RMA GA (log scale, 0.01 to 100); X-axis: Time (0, 1, 3, 7, 14, 30), continuous; colored by Time 0; error bars: between-sample std. error; gene list: Selected (6)]
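The cell cycle question can be checked with a hypergeometric test in base R, using the counts from the slide (4 of 10 cluster genes versus 40 of 100 array genes). The sketch assumes the cluster genes are a subset of the genes on the array; the variable names are illustrative.

```r
cluster_size          <- 10
array_size            <- 100
cell_cycle_in_cluster <- 4
cell_cycle_on_array   <- 40

# Proportion of cell cycle genes in the cluster and on the whole array.
observed <- cell_cycle_in_cluster / cluster_size
expected <- cell_cycle_on_array / array_size

# P(X >= 4) when 10 genes are drawn at random from the 100 on the array.
p_enrich <- phyper(cell_cycle_in_cluster - 1,
                   m = cell_cycle_on_array,
                   n = array_size - cell_cycle_on_array,
                   k = cluster_size,
                   lower.tail = FALSE)
```

Here the observed and expected proportions are equal, so the test gives no evidence of enrichment for cell cycle genes in this cluster.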
Who is it for?
Those who know their way around microarray analysis
You have to know the basics of the R language
Requires only the R environment, thanks to R automation
First time: Installation
Installation by two commands
– source("http://www.bioconductor.org/getBioC.R")
– getBioC()
Argument | Packages
all | All
affy | affy, affycomp, affydata, affyPLM, annaffy, gcrma, makecdfenv, matchprobes, plus "exprs".
cdna | marray, vsn, plus "exprs".
default | gets all packages from "affy", "cdna" and "exprs"
exprs | Biobase, annotate, edd, genefilter, geneplotter, globaltest, ROC, RMAGEML, multtest, limma, pamr, siggenes and vsn.
graph | graph, Rgraphviz and RBGL
prog | externalVector, graph, hexbin and Ruuid.
widgets | tkWidgets, widgetTools and DynDoc.
database | AnnBuilder, SAGElyzer, Rdbi, and RdbiPgSQL.
design | daMA and factDesign.
annotation | annotate, AnnBuilder, humanLLMappings, KEGG, GO, SNPtools, makecdfenv, and ontoTools.
analyses | Biobase, ctc, daMA, edd, factDesign, genefilter, geneplotter, globaltest, gpls, limma, RMAGEML, multtest, pamr, ROC, siggenes, and splicegear.
arrayCGH | DNAcopy and aCGH.
proteomics | gpls, PROcess and apComplex.
First time: Searchable mailing list
First time: browsing vignette
library(tkWidgets)
vExplorer()
Summary
Bioconductor is a project, not a piece of software, so you have to know your way around microarray analysis.
It includes almost every type of analysis and many unique methods; however, to use it one has to know R.
Bioconductor supports the use of its packages by supplying many tutorials and methods that let you stay in the R environment.