Persistent Systems Pvt. Ltd. http://www.persistent.co.in
Gene Expression Analysis Using Microarrays
Dr Mushtaq Ahmed
Technology Incubation Division
Persistent Systems Private Ltd
Pune
Topics
1. Introduction
2. Data Storage and Exchange Standards
3. Analysis (Clustering)
4. Conclusion and References
1. Introduction
• Structure Activity Relationship
• Structural vs. Functional Genomics
• Principles of a Microarray Experiment
• Applications
Structure Activity Relationship
• GENES (finite)
• FUNCTIONS (infinite)
• PROTEINS
• EXPERIMENTAL SETUP
– Functional Genomics, or Confirmation Work
– Structural Genomics, or Prediction Work
Source: Yale Bioinformatics
Principles of a Microarray Experiment: Hybridization
1. Environment → Functions → Proteins → mRNA → cDNA
2. Different incubations of cells result in up- or down-regulation of different sets of genes.
3. A microarray provides a medium for matching known and unknown DNA samples based on base-pairing rules, and for automating the identification of the unknowns.
4. The set of expressed genes (at the mRNA stage) is isolated and identified by hybridization on a microarray chip.
HTS Using Hybridization
• Target: cDNA (variables to be detected)
• Probe: oligos/cDNA (gene templates)
• Target + Probe: Hybridization (Samples, Microarray Chip)
• Analysis of outcome: Pathways, Functional Annotation, Targets/Leads, Disease Classification, Physiological states
Timeline for drug discovery
• Discovery (5 yrs): 5000 (gene expression study)
• Pre-Clinical (1 yr): 50
• Clinical (6 yrs): 5
• Review (2 yrs): 1
• Marketed
2. Data Storage and Exchange Standards
• Raw and Processed Data
• Conceptual View of Database
• Example of ArrayExpress
• Issues
• Standardization for Exchange
Raw data – images
• Red (Cy5) dot – overexpressed or up-regulated
• Green (Cy3) dot – underexpressed or down-regulated
• Yellow dot– equally expressed
• Intensity - “absolute” level
• red/green - ratio of expression
  – 2: 2x overexpressed
  – 0.5: 2x underexpressed
• log2( red/green ) - “log ratio”
  – 1: 2x overexpressed
  – -1: 2x underexpressed
cDNA plotted microarray
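As a small illustration of the ratio and log-ratio conventions above, the following Python sketch converts one spot's red (Cy5) and green (Cy3) intensities into both values. The function name and the example intensities are assumptions made for illustration; they do not come from the original slides.

```python
import math

def expression_ratios(red, green):
    """Return (ratio, log2 ratio) for one spot's Cy5 (red) and Cy3 (green) intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    ratio = red / green
    return ratio, math.log2(ratio)

# Hypothetical intensities, chosen only to reproduce the cases listed on the slide.
print(expression_ratios(2000.0, 1000.0))  # (2.0, 1.0): 2x overexpressed
print(expression_ratios(500.0, 1000.0))   # (0.5, -1.0): 2x underexpressed
print(expression_ratios(1000.0, 1000.0))  # (1.0, 0.0): equally expressed
```

The log ratio is symmetric around zero, which is why a 2x increase and a 2x decrease map to +1 and -1 rather than to 2 and 0.5.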
Microarray Expression Value Representation
Expression value types:
• primary images, composite images (e.g., green/red ratios)
• primary spots, composite spots
• primary measurements, derived values
Source: MGED
Gene expression database – a conceptual view
• Gene expression matrix: genes by samples, each cell a gene expression level
• Gene annotations
• Sample annotations
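The conceptual view can be sketched as a small Python data structure: a matrix of expression levels plus separate annotation tables for genes and samples. The class name, field names, the rows-are-genes orientation and all example values are assumptions for illustration only; they are not taken from any real database schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExpressionMatrix:
    genes: list      # row labels
    samples: list    # column labels
    levels: list     # levels[i][j] = expression of genes[i] in samples[j]
    gene_annotations: dict = field(default_factory=dict)
    sample_annotations: dict = field(default_factory=dict)

    def level(self, gene, sample):
        """Expression level of one gene in one sample (one matrix cell)."""
        return self.levels[self.genes.index(gene)][self.samples.index(sample)]

    def profile(self, gene):
        """Expression of one gene across all samples (one matrix row)."""
        return dict(zip(self.samples, self.levels[self.genes.index(gene)]))

# Hypothetical log-ratio values and annotations.
matrix = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["control", "treated"],
    levels=[[0.0, 1.0], [0.2, -1.0]],
    gene_annotations={"geneA": "hypothetical kinase"},
    sample_annotations={"treated": "hypothetical compound, 2 h"},
)
print(matrix.level("geneB", "treated"))
print(matrix.profile("geneA"))
```

Keeping the annotations outside the numeric matrix mirrors the slide's separation of expression levels from gene and sample annotations.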
DAG Representation of Biomaterials
• Nodes: Sample source; a new state of sample source; Primary sample 1, Primary sample 2; Derived sample 1, Derived sample 2; Extract 1, Extract 2; Labeled extract 1, Labeled extract 2; Hybridization
• Edges (processing steps): treatment, extraction, labeling
Source: MGED
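A directed acyclic graph of biomaterials can be sketched in Python as a mapping from each material to the materials it was derived from. The edges below are an assumed reading of the flattened slide (two parallel chains from one sample source meeting at a hybridization), not a reproduction of the MGED figure, and the `provenance` helper is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of materials it was derived from (its parents).
# Edge layout is an assumption; comments name the processing step on each edge.
derived_from = {
    "primary sample 1": {"sample source"},
    "primary sample 2": {"sample source"},
    "derived sample 1": {"primary sample 1"},   # treatment
    "derived sample 2": {"primary sample 2"},   # treatment
    "extract 1": {"derived sample 1"},          # extraction
    "extract 2": {"derived sample 2"},          # extraction
    "labeled extract 1": {"extract 1"},         # labeling
    "labeled extract 2": {"extract 2"},         # labeling
    "hybridization": {"labeled extract 1", "labeled extract 2"},
}

def provenance(material):
    """Return every upstream material that `material` depends on."""
    seen = set()
    stack = [material]
    while stack:
        for parent in derived_from.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# A valid processing order exists only if the graph has no cycles;
# TopologicalSorter raises graphlib.CycleError otherwise.
order = list(TopologicalSorter(derived_from).static_order())
print(order[0], "->", order[-1])
print(sorted(provenance("extract 1")))
```

Storing parent links rather than child links makes the common query, "where did this labeled extract come from?", a simple upward walk.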
ArrayExpress (MGED) Design
• ArrayExpress: Experiment, Sample, Array, Hybridization, ExpressionValue
• External links:
  – Reference (e.g., publication, web resource)
  – Ontology (e.g., organism taxonomy)
  – Database (e.g., gene in SWISS-PROT)
Source: MGED
ArrayExpress (MGED) Architecture
• data submission & curation database
• data warehouse
• application server
• Web server
• image server?
• Curation pipeline
• MAML data
Source: MGED
Issues in Storage
• Size of Data
  – Experiments
    • 100 000 genes, 320 cell types
    • 2000 compounds, 3 time points, 2 concentrations, 2 replicates
  – Data
    • 8 x 10^11 data-points
    • 1 x 10^15 bytes = 1 petabyte of data
• Others
  – Raw data are images
  – Lack of standard measurement units for gene expression
  – Lack of standards for sample annotation
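The data-point estimate above is the product of the experimental dimensions, which a few lines of Python can confirm. The variable names are chosen for illustration, and the bytes-per-point figure is only what the slide's petabyte total would imply if taken at face value (plausible given that the raw data are images).

```python
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = genes * cell_types * compounds * time_points * concentrations * replicates
print(f"{data_points:.2e}")  # 7.68e+11, which the slide rounds to 8 x 10^11

petabyte = 10**15  # bytes
bytes_per_point = petabyte / data_points
print(round(bytes_per_point))  # about 1302 bytes of storage per data point
```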
Standardization
• MIAME (Minimum Information About a Microarray Experiment)
  – Experimental design, Array design
  – Samples, Hybridisations
  – Measurements, Controls
• OMG-LSR-DTF
  – Object Management Group, Life Sciences Research Domain Task Force: Gene Expression RFP
  – Submitters: EBI (MAML), Rosetta (GEML), NetGenics
• Proposed MAGE-ML (MAML + GEML)
  – Annotations + data; data stored as a set of external 2D matrices
  – Data format independent of any particular scanner or image analysis software
  – Samples and treatments can be represented as Directed Acyclic Graphs
  – Concept of composite images and composite spots
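To make the MIAME checklist concrete, the sketch below checks a submission record for the six sections listed above. The dictionary keys, the function name and the example record are all hypothetical; they are not the field names of MIAME, MAML, GEML or MAGE-ML.

```python
# The six MIAME sections from the slide, written as hypothetical dictionary keys.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "controls",
)

def missing_miame_sections(record):
    """Return the checklist sections that are absent or empty in `record`."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]

# Hypothetical, incomplete submission.
submission = {
    "experimental_design": "time course, 3 time points",
    "array_design": "two-colour cDNA array",
    "samples": ["control", "treated"],
    "hybridisations": [],
    "measurements": "log2 ratios",
}
print(missing_miame_sections(submission))  # ['hybridisations', 'controls']
```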
3. Data Analysis (Clustering)
• Normalization
• Hierarchical Clustering
• Divisive Clustering
• Other Methods
• Visual Tools
Normalization
• Assumptions
  – Average expression ratio = 1
  – The amount of mRNA from both samples is the same
• Total Intensity
  – Calculate a factor to rescale the intensities of all the genes so that total Cy3 = total Cy5
• Regression Techniques
  – Adjust the intensities so that the slope of the scatter plot of Cy3 vs Cy5 = 1
• Using ratio statistics
  – Based on the expression of ‘housekeeping genes’, a probability density for the ratio is developed, which is used for normalization
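The total-intensity and regression approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Cy5 channel is the one rescaled, the regression line is forced through the origin, and the intensities are made up. Real pipelines typically work on background-corrected data and may use more robust or intensity-dependent fits.

```python
def total_intensity_factor(cy3, cy5):
    """Factor that rescales Cy5 so that total Cy5 equals total Cy3."""
    return sum(cy3) / sum(cy5)

def regression_slope(cy3, cy5):
    """Least-squares slope through the origin of Cy5 plotted against Cy3."""
    return sum(g * r for g, r in zip(cy3, cy5)) / sum(g * g for g in cy3)

# Hypothetical intensities: every Cy5 spot is 1.5x brighter than its Cy3 partner.
cy3 = [100.0, 200.0, 400.0, 800.0]
cy5 = [150.0, 300.0, 600.0, 1200.0]

# Total intensity normalization.
factor = total_intensity_factor(cy3, cy5)
cy5_total = [r * factor for r in cy5]

# Regression normalization: divide by the fitted slope so the new slope is 1.
slope = regression_slope(cy3, cy5)
cy5_regression = [r / slope for r in cy5]

print(sum(cy3), sum(cy5_total))
print(regression_slope(cy3, cy5_regression))
```

In this toy case both methods remove the same constant dye bias; they differ on real data, where the bias is rarely a single constant.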
Clustering
• Hierarchical
  – Single, Complete and Average Linkage
• Divisive
  – K-means
  – Self Organizing Maps (SOM)
• Others
  – Principal Component Analysis (PCA)
  – Supervised Methods
Hierarchical clustering
• Distance metrics or Similarity Measures
  – Euclidean, Pearson, distance of slopes, etc.
• Cost functions
  – Single Linkage
    • Min distance between any two members (one from each of the two clusters)
  – Complete Linkage
    • Max distance between any two members (one from each of the two clusters)
  – Average Linkage
    • UPGMA
    • WPGMA
    • Within Groups
  – Ward’s Method
    • Choose the join that produces the smallest possible increase in the sum of squared errors
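The distance measures and linkage rules above can be written down directly. The sketch below is illustrative only: the function names and example clusters are assumptions, the "average" option is the unweighted mean over all cross-cluster pairs (UPGMA-style), and the Pearson distance is taken as one minus the correlation, which is one common convention among several.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)

def linkage_distance(cluster_1, cluster_2, method="average", dist=euclidean):
    """Distance between two clusters under single, complete or average linkage."""
    pairs = [dist(a, b) for a, b in product(cluster_1, cluster_2)]
    if method == "single":
        return min(pairs)
    if method == "complete":
        return max(pairs)
    if method == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical two-point clusters.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(cluster_a, cluster_b, method))
```

An agglomerative algorithm would repeatedly merge the pair of clusters with the smallest such distance until one cluster remains.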
Divisive clustering
• K-means
  – ‘k’ random (or specified) points are used to create clusters; the average vectors of the clusters are then used iteratively
  – Knowledge of the probable number of clusters (k) is needed
  – Used in combination with PCA and hierarchical clustering
• Self Organizing Maps
  – User-defined geometric configurations as partitions
  – Random vectors are generated for each partition and TRAINED till convergence (ANN based)
• Visualization Methods
  – Help in cluster visualization
    • Scatter plot, web plot, histogram
  – May help in clustering itself
    • E.g., SuperGrouper utility of maxdView
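The K-means procedure described on this slide can be sketched in plain Python as follows. It is a minimal illustration, not a production implementation: the function signature, the fixed random seed and the example points are assumptions, and real analyses usually rerun the algorithm from several starting configurations.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, clusters) after iterating assignment and averaging."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # 'k' random points as initial centers
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the average vector of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional expression profiles forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The need to choose `k` in advance is visible in the signature, which is the limitation the slide points out.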
Other Clustering Methods
• PCA (Principal Component Analysis)
  – Closely related to, and often computed via, SVD (Singular Value Decomposition)
  – Reduces the dimensionality of the gene expression space
  – Finds the best view that helps separate the data into groups
• Supervised Methods
  – SVM (Support Vector Machine)
  – Previous knowledge of which genes are expected to cluster is used for training
  – Binary classifier uses a ‘feature space’ and a ‘kernel function’ to define an optimal ‘hyperplane’
  – Also used for classification of samples: ‘expression fingerprinting’ for disease classification
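To show what "reducing dimensionality" means for PCA, the sketch below finds the first principal axis of a small data set and projects the data onto it. Note the swapped-in technique: it uses power iteration on the covariance matrix rather than an SVD, purely to stay within the Python standard library. The function name, the starting vector and the data are assumptions for illustration, and the iteration can fail if the starting vector happens to be orthogonal to the leading axis.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading principal axis of `rows` (observations x features) and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary, non-symmetric starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One-dimensional coordinates of each observation along the leading axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical measurements lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
axis, scores = first_principal_component(rows)
print(axis)
print(scores)
```

Here two features collapse to a single coordinate per observation without losing any variance, because the toy data are perfectly one-dimensional.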
4. Conclusion and References
• Microarrays make HTS with hybridization possible
• No single standard unit for measuring expression levels
• Handling and interpretation not yet exact
• Assumption: elements in a cluster must share some commonality
• Classification depends on the methods used for clustering and normalization, and on the distance function
• No “correct” way of classification; “biological understanding” is the ultimate guide
• Provides an extension to existing knowledge (e.g., classifying a novel gene into a known pathway)
Software
• Databases
– Public repositories:
• GEO (NCBI), GeneX (NCGR), ArrayExpress (EBI)
– In-house databases
• Stanford, MIT, University of Pennsylvania
– Organism specific databases
• Mouse Genome Informatics Database
– Proprietary databases
• Gene Logic, NCI, Synergy (NetGenics), Genomics Knowledge Platform (Incyte)
• Analysis Tools
– Public Domain
• maxdView (University of Manchester)
• CyberT, RCluster interfaces of GeneX
– Proprietary
• Spotfire, Xpression NTI (InforMax Inc.)
References
• Microarray Gene Expression Database Group: http://www.mged.org
• National Center for Genome Resources: http://genex.ncgr.org
• University of Manchester, Bioinformatics Group: http://bioinf.man.ac.uk/microarray/resources.html
• Nature Reviews Genetics: http://www.nature.com/nrg/