Technion 28/03/05

Bioconductor

overview

Mirror site @ http://inn.weizmann.ac.il/bioconductor/

Outline

Introduction
Bioconductor by examples?
How to use it?

History: development environment

Transparency
Reproducibility
Modular development

History: Features of R

Prototyping – quick and dirty programming
Packaging – help, data and code are provided in one package
Object-oriented programming (OOP)
WWW connectivity – connecting to external resources without leaving the R environment
Statistical and mathematical capability
Visualization and graphics support
Support for concurrent computation (parallel machines)

Bioconductor site

Bioconductor site: screenshots

Bioconductor packages

Package category            Description                                                       Number of packages
General tools               Infrastructure tools for developers                               10
Graphs                      R API for other graphical tools                                   4
Graphics & User Interface   TCL GUI to make the use more convenient                           7
Database Interaction        R API for databases                                               3
Annotation                  Tools for dynamic annotation                                      5
Analysis                    Data analysis and QC of microarray data                           14
Genomics & proteomics       Tools for genomics arrays                                         6
Pre-processing              Normalization and summarization of Affymetrix and spotted data    11
Ontologies                  Gene classification                                               3
Total                                                                                         63

Bioconductor: experimental data

ALL            Acute leukemia Affymetrix data
colonCA        Alon et al. (1999) colon cancer data
ecoliLeucine   Experimental data with Affymetrix E. coli chips
estrogen       2x2 factorial design exercise for the Bioconductor short course (version 1.5.0)
fibroEset      exprSet for Karaman et al. (2003) fibroblasts data
golubEsets     exprSets for Golub leukemia data
Iyer517        Time course data for fibroblasts exposed to serum
SpikeIn        Affymetrix's Spike-In Experiment Data
yeastCC        Spellman et al. (1998) yeast cell cycle microarray data (version 0.5.1)

Bioconductor vignettes

A set of tutorials in PDF format that include chunks of code.
Each vignette shows how to use a package.

Package      Count   Analysis
affy         5       Affymetrix
altcdfenvs   3       Affymetrix
affyPLM      2       Affymetrix
affycomp     1       Affymetrix
affydata     1       Affymetrix
affylmGUI    1       Affymetrix
gcrma        1       Affymetrix
simpleaffy   1       Affymetrix
makecdfenv   1       Affymetrix
limmaGUI     2       Spotted array
arrayMagic   1       Spotted array
annotate     7       Annotation
GOstats      3       Annotation
goCluster    1       Annotation
annaffy      1       Annotation

Bioconductor by examples?

QC
Pre-processing for spotted and Affymetrix arrays
Data analysis
Annotation – chromosome mapping and database mapping
GO statistics

Typical Experiments

[Figure: RNA extraction followed by RNA labeling, shown for a one-color experiment and a two-color experiment, each with a condition sample and a control sample.]

DNA Array Technologies

(A) Affymetrix
(B) “Spotting”

Differences: Analysis flow

[Figure: analysis flow for the two-color/spotted experiment and the one-color/Affy experiment. Steps: quantification, normalization, filtering noise out, data analysis, with a pre-processing stage marked.]

Common: Expression Data Matrix (exprSet object)

Gene expression matrix
Columns: samples (the experiment set); each column comes from one array
Rows: genes; each row represents one gene
Cells: gene expression levels
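The layout described on this slide can be sketched in base R. This is only an illustration: the gene and sample names and the simulated values are made up, and a plain matrix stands in for the exprSet object, which wraps such a matrix in Biobase.

```r
# Toy expression matrix: rows are genes, columns are arrays (samples).
# All names and values are invented for illustration.
set.seed(4)
genes <- paste0("gene", 1:5)
samples <- c("control1", "control2", "condition1", "condition2")

expression <- matrix(
  round(rnorm(length(genes) * length(samples), mean = 8, sd = 1), 2),
  nrow = length(genes),
  dimnames = list(genes, samples)
)

one_gene  <- expression["gene2", ]       # a row: one gene across all arrays
one_array <- expression[, "condition1"]  # a column: all genes on one array
print(expression)
```

Indexing by row gives the profile of one gene over the experiment set, and indexing by column gives everything measured on one array.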

Affymetrix QC: Affymetrix Chip

Affymetrix QC: Probe set design

[Figure: a gene sequence drawn from 5´ to 3´, covered by multiple 25-mer oligo probes, with Perfect Match and Mismatch probes.]

Affymetrix QC: RNA labeling

[Figure: RNA labeling. Poly-A (AAAA) RNA is converted to cDNA (RNA -> cDNA, TTTT), then to ds cDNA (5’ to 3’), and finally cDNA -> cRNA.]

Affymetrix QC: RNA degradation

library(affy)
data(affybatch.example)               # example AffyBatch with arrays 20A, 20B and 10A
deg <- AffyRNAdeg(affybatch.example)  # estimate RNA degradation per array
summaryAffyRNAdeg(deg)                # slope and p-value for each array
plotAffyRNAdeg(deg)                   # plot the degradation curves

          20A      20B      10A
Slope     0.0767   0.063    0.0842
p-value   0.136    0.212    0.091

Affymetrix QC: Probe level view

[Figure: probe-level scatter plot of replicated chips (replicate1 versus replicate2).]

The probe-level density plot shows that the difference between two replicates is higher than the difference between signal and background.

Affymetrix Preprocessing: Probe set design

[Figure: a gene sequence drawn from 5´ to 3´, covered by multiple 25-mer oligo probes, with Perfect Match and Mismatch probes.]

Affymetrix Preprocessing: Probe set variability

Probe variability within a probe set is 2- to 5-fold larger than the variability of a probe among arrays

Affymetrix Preprocessing: Many arrays assess probe affinity

[Figure: intensity matrix of probes by samples, with ~500,000 probes covering 12,000 – 22,000 genes; one probe set is highlighted.]

Probe set Intensity = RNA concentration + Probe affinity
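The additive model on this slide can be illustrated with a small simulation. The sketch below assumes log-scale intensities for a single probe set and fits the model with `medpolish()` from base R, a Tukey median polish; the number of probes and arrays, the noise level and all variable names are assumptions made for the example.

```r
# Simulate one probe set: log intensity = probe affinity + RNA concentration + noise.
set.seed(1)
n_probes <- 11
n_arrays <- 6
affinity      <- rnorm(n_probes, sd = 1)            # one effect per probe
concentration <- seq(6, 11, length.out = n_arrays)  # one effect per array
noise <- matrix(rnorm(n_probes * n_arrays, sd = 0.1), n_probes, n_arrays)
log_intensity <- outer(affinity, concentration, "+") + noise

# Median polish splits the matrix into overall + row (probe) + column (array) effects.
fit <- medpolish(log_intensity, trace.iter = FALSE)

# Array-level expression estimate for this probe set.
expression_estimate <- fit$overall + fit$col
print(round(expression_estimate, 2))
```

Because every array shares the same probes, using many arrays together lets the probe effect be separated from the array effect, which a single array cannot do.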

Affymetrix Preprocessing: Overall Precision MAS5, dChip and RMA

Reducing the noise of low signals

Affymetrix Preprocessing: Algorithm flow

Step                                 MAS5                       dChip                                RMA
Linearization (log-transformation)   5                          4                                    1
Normalization                        4; global                  1; invariant set                     2; quantile
Background summary                   1; calculating CT<PM       No                                   3; estimating the left tail
Background subtraction               2; Dividing                2; simple PM-MM or PM-only           5
Signal summary                       3; average of log(PM-CT)   3; Model based, outliers detection   4; median polish
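The quantile normalization listed for RMA can be sketched in a few lines of base R. This is a simplified illustration on a toy matrix, not the affy implementation; the function name `quantile_normalize` and the example values are assumptions.

```r
# Quantile normalization: force every array (column) to share the same
# distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks     <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted    <- apply(x, 2, sort)                         # sort each array
  reference <- rowMeans(sorted)                          # mean quantile across arrays
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), c("a1", "a2", "a3")))
normalized <- quantile_normalize(toy)
print(normalized)
```

After normalization each column contains the same set of values, only in a different order, so differences in overall intensity between arrays are removed.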

Affymetrix Preprocessing: Bioconductor flexibility

Package: affy
Method: expresso
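As a hedged illustration of this flexibility, the wrapper below calls `expresso()` with one method per pre-processing step; the combination shown mirrors the RMA steps. The wrapper name `rma_like_expresso` is hypothetical, and the function is only defined here, not run, because it needs the affy package and an AffyBatch object.

```r
# Hypothetical wrapper: each argument selects the method for one step,
# so steps from different algorithms can be mixed.
rma_like_expresso <- function(abatch) {
  if (!requireNamespace("affy", quietly = TRUE)) {
    stop("the Bioconductor package 'affy' is required")
  }
  affy::expresso(
    abatch,
    bgcorrect.method = "rma",          # background correction
    normalize.method = "quantiles",    # normalization
    pmcorrect.method = "pmonly",       # PM correction (ignore MM probes)
    summary.method   = "medianpolish"  # signal summary
  )
}
```

Changing a single argument, for example the summary method, gives a different pipeline without rewriting the other steps.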

Spotted preprocessing: Per-Tip Normalization

[Figure: an array and one of its sub-arrays.]

Spotted preprocessing: Per-Tip Normalization

Y. H. Yang, et al., Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30(4), 2002.
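The idea of per-tip normalization can be sketched with base R: within each group of spots printed by one tip, fit a loess curve of the log-ratio M against the average log-intensity A and subtract it. The data are simulated, and the function name `normalize_per_tip`, the bias values and the loess span are assumptions for this example rather than the settings of the cited paper.

```r
# Simulated two-color data: each print tip adds its own intensity-dependent bias.
set.seed(2)
n_tips <- 4
spots_per_tip <- 200
tip <- rep(seq_len(n_tips), each = spots_per_tip)
A <- runif(n_tips * spots_per_tip, min = 6, max = 14)  # average log intensity
tip_bias <- c(-0.5, 0.2, 0.6, -0.1)[tip] + 0.05 * (A - 10) * tip
M <- tip_bias + rnorm(length(A), sd = 0.2)             # log ratio

# Fit one loess curve per tip and keep what the curve does not explain.
normalize_per_tip <- function(M, A, tip, span = 0.7) {
  M_norm <- numeric(length(M))
  for (g in unique(tip)) {
    idx <- which(tip == g)
    d <- data.frame(m = M[idx], a = A[idx])
    fit <- loess(m ~ a, data = d, span = span)
    M_norm[idx] <- d$m - fitted(fit)
  }
  M_norm
}

M_norm <- normalize_per_tip(M, A, tip)
print(round(tapply(M, tip, mean), 2))       # per-tip means before
print(round(tapply(M_norm, tip, mean), 2))  # per-tip means after
```

Before normalization the mean log-ratio differs from tip to tip; afterwards it is close to zero in every tip group.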

Spotted preprocessing: Per-Tip Normalization (pre-normalization)

Spatial graph – each colored square represents a spot; each black square represents a set of spots printed by a given tip.
Green marks a gene that was down-regulated, in the 10th percentile.
Red marks a gene that was up-regulated, in the 10th percentile.

Spotted preprocessing: Per-Tip Normalization (post-normalization)

Spotted preprocessing: Experimental design

[Figure: three designs for samples A, B and C. Common reference: A, B and C are each compared with Ref. Loop design: A, B and C are compared with each other in a loop. Pooled reference: A, B and C are each compared with a pool of A, B and C.]

Behavior: detecting small expression differences

– Before this paper, only a small number of genes in the brain had been shown to have expression changes related to behavior.

– By combining the right experimental design with ANOVA, the study could detect small differences in gene expression in the honey bee brain and therefore relate many genes to behavior.

– With this, they could predict the behavior of honey bee workers from their brain gene expression.

C. W. Whitfield, A. M. Cziko, G. E. Robinson, Science 302, 296-9 (2003).
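How replication plus ANOVA can expose a small expression difference can be illustrated for a single gene. The numbers are simulated and are not taken from the paper; the group labels, the effect size of 0.3 log units and the 20 replicates per group are assumptions.

```r
# One gene, two behavioral groups, a small shift in mean log expression.
set.seed(5)
n_per_group <- 20
behavior <- factor(rep(c("nurse", "forager"), each = n_per_group))
log_expr <- c(rnorm(n_per_group, mean = 8.0, sd = 0.15),
              rnorm(n_per_group, mean = 8.3, sd = 0.15))

# One-way ANOVA: is the between-group difference large relative to the noise?
fit <- aov(log_expr ~ behavior)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(summary(fit))
```

With enough replicates per group, a shift much smaller than a two-fold change is detected clearly.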

Data Analysis: Applications of microarrays

Evolution
– Most of the gene expression differences between chimpanzees and humans have been detected in the brain.
Development
– Associating apoptosis-related gene expression with metamorphosis stages in a variety of Drosophila tissues
Behavior
– Can predict the behavior of honey bee workers from their brain gene expression.
Tissue
– Molecular signature specific to a subtype of cancer tissue
Regulation
– Finding novel regulatory motifs by coupling motif search with co-expression
Functional annotation
– Annotating unknown genes based on co-expression (guilt by association)

Legend: question about samples; question about genes

Biological questions

Sample classification
– What is the set of genes that differentiates between two or more groups of treatments? (Supervised methods)
– What is the set of conditions that display the same expression profile in a given gene repertoire? (Unsupervised methods)

Gene classification
– What is the set of genes that have the same expression pattern over a set of treatments? (Unsupervised methods)

Data Analysis: Methods

Supervised
– Discriminant analysis
– ANOVA
– Linear models
– Empirical Bayesian

Unsupervised
– Partition methods
    SOM (Self-Organizing Maps)
    K-means
– Hierarchical clustering (bootstrapping)
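Two of the unsupervised methods listed above are available in base R. The sketch below clusters simulated genes by expression pattern with K-means and with hierarchical clustering; the two patterns, the noise level and the choice of two clusters are assumptions for the example.

```r
# 20 simulated genes over 4 treatments: 10 rising and 10 falling.
set.seed(3)
pattern_up   <- c(0, 1, 2, 3)
pattern_down <- c(3, 2, 1, 0)
expr <- rbind(
  t(replicate(10, pattern_up + rnorm(4, sd = 0.2))),
  t(replicate(10, pattern_down + rnorm(4, sd = 0.2)))
)
rownames(expr) <- paste0("gene", 1:20)

# Partition method: K-means with two clusters and several random starts.
km <- kmeans(expr, centers = 2, nstart = 10)

# Hierarchical clustering on Euclidean distances, cut into two groups.
hc <- hclust(dist(expr), method = "average")
hc_groups <- cutree(hc, k = 2)

print(table(kmeans = km$cluster, hclust = hc_groups))
```

On this well-separated toy data both methods recover the same two groups; on real data the results depend on the distance measure and the number of clusters chosen.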

Data Analysis: visualization

Using R

Bioconductor

Preparing packages that connect R to other visualization programs

Annotation: Packages

[Figure: annotation flow. Databases (GenBank, LocusLink, KEGG, GO, GEO, SwissProt, UniGene) in database format (HTML/XML) go through a parser, AnnBuilder (developers), into a package; annotate (users) produces an HTML table.]

Annotation: Functionality enrichment (GoStat; geneplotter)

Biological Process             Number of genes in a cluster (10)   Number of genes in an array (100)
Wound healing                  2                                   20
Signaling and Angiogenesis     1                                   10
Immediate and early response   2                                   20
Cell cycle                     4                                   40
Cholesterol biosynthesis       1                                   10

Enrichment to cell cycle?

             Cluster   Array
Cell Cycle   4         40
Others       6         60

[Figure: two plots of expression against time (time points 0, 1, 3, 7, 14, 30; log-scale y-axis from 0.01 to 100). Y-axis: GC-RMA GA, time continuous. Colored by: Time 0. Error bars: between-sample std. error. Gene list: Selected (6).]
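The question on this slide can be answered with a hypergeometric test, here with `phyper()` from base R and the counts from the table. Treating the cluster as a sample of 10 genes drawn from the 100 genes on the array is an assumption of this sketch, and the variable names are illustrative; this is not the GOstats interface.

```r
# Counts from the slide.
array_total        <- 100  # genes on the array
array_cell_cycle   <- 40   # of which annotated to cell cycle
cluster_total      <- 10   # genes in the cluster
cluster_cell_cycle <- 4    # of which annotated to cell cycle

# Number of cell cycle genes expected in the cluster by chance.
expected <- cluster_total * array_cell_cycle / array_total

# P(at least 4 cell cycle genes among 10 genes drawn from the array).
p_value <- phyper(cluster_cell_cycle - 1,
                  array_cell_cycle,
                  array_total - array_cell_cycle,
                  cluster_total,
                  lower.tail = FALSE)
print(c(expected = expected, p_value = p_value))
```

The cluster holds exactly the number of cell cycle genes expected by chance (4 of 10, matching 40 of 100 on the array), so the test gives no evidence of enrichment even though cell cycle is the largest category in the cluster.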

Who is it for?

You need to know your way around microarray analysis
You have to know the basics of the R language
It requires only the R environment, thanks to R automation

First time: Installation

Installation by two commands

– source("http://www.bioconductor.org/getBioC.R")

– getBioC()

Argument     Packages
all          All
affy         affy, affycomp, affydata, affyPLM, annaffy, gcrma, makecdfenv, matchprobes, plus "exprs".
cdna         marray, vsn, plus "exprs".
default      gets all packages from "affy", "cdna" and "exprs"
exprs        Biobase, annotate, edd, genefilter, geneplotter, globaltest, ROC, MAGEML, multtest, limma, pamr, siggenes and vsn.
graph        graph, Rgraphviz and RBGL
prog         externalVector, graph, hexbin and Ruuid.
widgets      tkWidgets, widgetTools and DynDoc.
database     AnnBuilder, SAGElyzer, Rdbi, and RdbiPgSQL.
design       daMA and factDesign.
annotation   annotate, AnnBuilder, humanLLMappings, KEGG, GO, SNPtools, makecdfenv, and ontoTools.
analyses     Biobase, ctc, daMA, edd, factDesign, genefilter, geneplotter, globaltest, gpls, limma, RMAGEML, multtest, pamr, ROC, siggenes, and splicegear.
arrayCGH     DNAcopy and aCGH.
proteomics   gpls, PROcess and apComplex.

First time: Searchable mailing list

First time: browsing vignettes

library(tkWidgets)
vExplorer()

Summary

Bioconductor is a project, not a single piece of software; therefore you have to know your way around microarray analysis.
It includes almost any type of analysis and many unique methods; however, to use it one has to know R.
Bioconductor supports the use of its packages by supplying many tutorials and methods that let you stay in the R environment.