High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of...

34
High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH [email protected] *With special thanks to Chiara Mazzanti, Ph.D.

Transcript of High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of...

Page 1: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

High throughput, multiplex analysis of gene expression*

David Goldman, Laboratory of Neurogenetics, NIH

[email protected]

*With special thanks to Chiara Mazzanti, Ph.D.

Page 2: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Revolutions in the analysis of gene expression

Protein functionProtein concentrationPost-translational modificationTranscriptional controlGenetic variationGenomics

Page 3: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Multiplex gene analysis: Why?

Complex diseases involve expression modulation40,000 genes, perhaps moreWe are not so cleverThe ultimate goal is to reconstruct the biochemical and functional networks of the cell. One can not understand a network by studying one gene.

Page 4: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Methods for multiplex expression analysis

Two dimensional protein electrophoresis [2DE]Differential displayRepresentational difference analysis [RDA]Serial analysis of gene expression [SAGE]RNA array methods

Page 5: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Array Systems

cDNA

Probes: amplified spotted cDNAshttp://cmgm.standford.eduhttp://www.nhgri.nih.gov

Oligonucleotide chips

Probes: Oligos synthesized in situ or conventionally & followed by immobilizationOligos as long as 80bphttp://www.affymetrix.comhttp://www.rii.com (Agilent)

Page 6: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Array preparationMicroarray Genechip

Available genes: 10k – NCI [$50] 20k - Affymetrix

Page 7: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Sample preparation (direct labeling)

Tissue

Total RNA

Cy3 or Cy5 Labelled cDNA

Cy3 Cy5

Hybridization mixing

Hybridization mixing

First strand cDNAsynthesis

Page 8: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov
Page 9: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

10kArrays

Hs-UniGem2-v2p9-060701Blocks: (18x18) x32

Features/Spots: 9984235um spacing

Page 10: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Post-hybridizationmicroarray

analysis

ScannerGenePix 4000A

GenePix Prosoftware

Page 11: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Image scanning

Cy3/Cy5 signal normalization

Image cleaning

GAL (gene array list) loading

Image analysis

Typical software (GenPix Pro)

Page 12: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Case Cy3Control Cy5

Case Cy5Control Cy3

Reciprocal labelling

Page 13: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

cDNA Microarray

Oligo-microchip

Case Cy5Control Cy3Ratio >2, < 0.5

Array vs Chip

Ratio Cy5/Cy3

Ratio Array1/Array2

Page 14: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

http://linus.nci.nih.gov/cgi-bin/brb/download.cgi

Page 15: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov
Page 16: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

LOCAL APPROACH GLOBAL APPROACH

Schulze et al, Nature Cell Biology, August 2001

Page 17: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Array Analysis Methods• Gene Discovery

– Outlier detection – simple and group logic retrieval tools; single and multiple array viewers

– Scatter plots

• Pattern Discovery– Clustering – Hierarchical, K-means– Principal Components Analysis; Multidimensional

Scaling

• Pattern Prediction

Page 18: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Expression profile

Expressi o n si gn a tu re

Sample1 Sample 2…….Sample N

Gene 1

Gene 2....

Gene N

Expression signatures and profiles

Page 19: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov
Page 20: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

What is Clustering?

Clustering algorithms are discovery algorithms that find structure in data.

Discovery algorithms develop their own sorting parameters without human intervention.

Page 21: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Hierarchical Clustering

Page 22: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Hierarchical clustering

Mock phylogenetic dendrograms

Page 23: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Dendrogram Construction for Hierarchical Clustering

• Merge nearest neighbors• Merge more distant datapoints

Page 24: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

K-means clustering

Genes are divided into k user-defined, equally-sized groups.

Gene group centroids and individual genes are then reassigned.

The process is reiterated until group compositions stabilize.

Page 25: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

ClusteringClusters are also developed from random data:

• Are clustered genes related in function?

• Do clusters group related samples/tissues/diseases/treatments?

Page 26: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

1 2 34

56

7 89

1011

12 1314

15 1617

1819 20

2122

23 2425

26 2728 29

30 31

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Clustering of Melanoma Tumors Using Average Linkage

1 2 3

4

5

6

7 8

910

1112 13

1415 16

17 1819 20

21

2223 24

25

26 27

28 29

30 31

0.2

0.4

0.6

0.8

1.0

Clustering of Melanoma Tumors Using Complete Linkage

1 2 3

4

5

6

7

8

910

1112 13

14

15 16

17 1819 20

21

2223 24

25

26

27

28 29

30

31

0.2

0.3

0.4

0.5

0.6

Clustering of Melanoma Tumors Using Single Linkage

(Bittner et al., Nature, 2000)

Clustering algorithms: Differing results

Page 27: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

mAdb BioInformatics Project: Current• Links to external data sources

• dbEST

• GeneCards

• LocusLink

• MGD

• GenBank

• Stanford GO (Genome Ontology)

• PubMed

• UniGene

•Automatic updates of external data sources

Page 28: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov
Page 29: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov
Page 30: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Hierarchical Clustering (Alizadeh et al., Nature, Feb. 2000)

Page 31: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Van’t Veer et al,Nature, Jan. 2002

Page 32: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Issues for Study Design

Appropriateness of modelExposure, genotype, timingTissue and cellsControls: Sieving the genes

The issue of completeness# genes expressed in tissue% modulated at chosen level

Page 33: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Pitfalls in multiplex analysis of gene expression

Nonspecific changes in gene expressionQuantitatively small changes can be criticalmRNA modulation is often not the mechanism

Allosteric regulation of enzymesReceptor desensitization

Page 34: High throughput, multiplex analysis of gene expression* · High throughput, multiplex analysis of gene expression* David Goldman, Laboratory of Neurogenetics, NIH dgneuro@box-d.nih.gov

Conclusions

Multiplex analysis of gene expression has great potential for identifying intermediate processes in neural diseasesRealization of the potential of these inherently industrial methods requires more incisive experimental designsThe downstream biology is inherently nonindustrialAppropriate controls can narrow the field of candidate genesThese methods should lead to the identification of new pathways in disease and an understanding of sharing of pathways between diseases