EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon...
Schedule
• Data, preprocessing, grouping (10:15-11:00)
• Hands-on part I (11:00-11:20)
• Group analysis (11:20-11:45)
• Hands-on part II (11:45-12:05)
• Coffee break (12:05-12:20)
• Spike (12:20-12:50)
• Hands-on part III (12:50-13:10)
• Lunch break (13:10-14:10)
• Amadeus (14:10-14:40)
• Hands-on part IV (14:40-15:00)
• EXPANDER – an integrative package for analysis of gene expression data
• Built-in support for 15 organisms: human, mouse, rat, chicken, fly, zebrafish, C. elegans, yeast (S. cerevisiae and S. pombe), Arabidopsis, tomato, Listeria, E. coli, Aspergillus* and rice.
• Demonstration: oligonucleotide array data containing expression profiles measured at several time points after serum stimulation of a human cell line.
What can it do?
• Low-level analysis:
  – Missing data estimation (KNN or manual)
  – Data adjustments (merge conditions, divide by base, take log)
  – Normalization
  – Probe and condition filtering
• High-level analysis:
  – Detecting patterns/groups in the data (supervised clustering, differential expression, clustering, biclustering, network-based grouping)
  – Ascribing biological meaning to patterns (searching for enrichment within groups)
[Figure: EXPANDER overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]
EXPANDER – Data
• Input data:
  – Expression matrix (probes in rows, conditions in columns)
    – One-channel data (e.g., Affymetrix)
    – Dual-channel data, given as log R/G (e.g., cDNA microarrays)
  – ‘.cel’ files
  – ID conversion file: maps probes to genes
  – Gene groups data: defines gene groups
EXPANDER – Data (II)
• Data definitions:
  – Defining condition subsets
  – Data type & scale (log)
• Data adjustments:
  – Missing value estimation (KNN or arbitrary)
  – Reorder conditions
  – Merging conditions
  – Merging probes by gene IDs
  – Divide by base
  – Log data (base 2)
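The "divide by base" and "log data (base 2)" adjustments can be sketched as follows. This is a minimal illustration and not EXPANDER's own implementation: the function name `log2_ratio_to_base` and the use of `None` for missing or non-positive values are assumptions made for the example.

```python
import math

def log2_ratio_to_base(profile, base_index=0):
    """Divide each value of one probe's profile by the base condition, then take log2.

    Missing (None) or non-positive values cannot be log-transformed and are
    returned as None (an assumption of this sketch).
    """
    base = profile[base_index]
    if base is None or base <= 0:
        raise ValueError("the base condition must be a positive value")
    adjusted = []
    for value in profile:
        if value is None or value <= 0:
            adjusted.append(None)
        else:
            adjusted.append(math.log2(value / base))
    return adjusted

# Hypothetical intensities of one probe over four time points
print(log2_ratio_to_base([100.0, 200.0, 400.0, 50.0]))  # [0.0, 1.0, 2.0, -1.0]
```

After this adjustment the base condition is 0 and every other value reads as a log2 fold change relative to it.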
EXPANDER – Preprocessing
• Normalization: removal of systematic biases from the analyzed chips
  – Implemented methods: quantile, lowess
  – Visualization: box plots, scatter plots (simple, M vs. A)
• Filtering: focus downstream analysis on the set of “responding genes”
  – Fold change
  – Variation
  – Statistical tests: t-test, SAM (Significance Analysis of Microarrays)
  – It is possible to define “VIP genes”.
• Standardization: mean = 0, STD = 1 (visualization)
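Quantile normalization and standardization, as named above, can be sketched in a few lines. This is an illustrative version under stated assumptions, not EXPANDER's code: the function names are hypothetical, ties are broken by probe order, and the population standard deviation is used because the slide does not say which estimator applies.

```python
import statistics

def quantile_normalize(chips):
    """Give every chip the same value distribution.

    chips: list of equal-length lists, one list of probe values per chip.
    Each value is replaced by the mean, across chips, of the values that
    share its rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(sc[r] for sc in sorted_chips) / len(chips) for r in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized

def standardize(profile):
    """Rescale one probe's profile to mean 0 and standard deviation 1."""
    mean = statistics.fmean(profile)
    std = statistics.pstdev(profile)
    if std == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(v - mean) / std for v in profile]

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(standardize([1.0, 2.0, 3.0]))
```

Standardized profiles differ only in shape, not in magnitude, which is why the slide associates this step with visualization.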
Cluster Analysis
• Partition the responding genes into distinct groups, each with a particular expression pattern
  – co-expression → co-function
  – co-expression → co-regulation
• Partition the genes to achieve:
  – High homogeneity within clusters
  – High separation between clusters
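One way to make the two goals above concrete is to compare the average similarity of gene pairs inside the same cluster with that of pairs in different clusters. The sketch below uses Pearson correlation as the similarity measure. The function names and the choice of measure are assumptions for illustration, and EXPANDER's own scores may be defined differently.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def within_and_between_similarity(profiles, labels):
    """Return (average within-cluster similarity, average between-cluster similarity).

    A good partition has a high first value (homogeneity) and a low second
    value (separation). Requires at least one pair of each kind.
    """
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical toy data: two rising profiles and two falling profiles
profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
labels = [0, 0, 1, 1]
print(within_and_between_similarity(profiles, labels))
```

For this toy partition the within-cluster average is close to 1 and the between-cluster average is close to -1, so the clusters are both homogeneous and well separated.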
Cluster Analysis (II)
• Implemented algorithms: CLICK, K-means, SOM, hierarchical
• Visualization: mean expression patterns, heat maps, chromosomal positions, network sub-graph, PCA, clustered matrix
Biclustering
• Clustering seeks a global partition according to similarity across ALL conditions, which becomes too restrictive on large datasets.
• Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions.
• A novel algorithmic approach is needed: biclustering.
• Bicluster (= module): a subset of genes with similar behavior in a subset of conditions
• Computationally challenging: many combinations of condition subsets have to be considered
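The difference between similarity across all conditions and similarity across a subset can be shown with two toy profiles. This sketch only illustrates the bicluster idea; it is not SAMBA or any other biclustering algorithm, and the data and helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def restrict(profile, condition_subset):
    """Keep only the values measured in the chosen conditions."""
    return [profile[c] for c in condition_subset]

# Two genes that rise together in conditions 0-2 and diverge in conditions 3-4
gene_a = [1.0, 2.0, 3.0, 9.0, 1.0]
gene_b = [2.0, 4.0, 6.0, 1.0, 8.0]
subset = [0, 1, 2]

r_all = pearson(gene_a, gene_b)
r_subset = pearson(restrict(gene_a, subset), restrict(gene_b, subset))
print(f"all conditions: {r_all:.2f}, subset {subset}: {r_subset:.2f}")
```

Across all five conditions the two genes are negatively correlated, so a global clustering would keep them apart, while on the three-condition subset they behave identically. A biclustering algorithm has to find such gene and condition subsets without being told where to look, which is the source of the computational difficulty.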
Biclustering II
• SAMBA = Statistical Algorithmic Method for Bicluster Analysis
  * A. Tanay, R. Sharan, R. Shamir, RECOMB 02
• Drawbacks/limitations:
  – Useful only for more than 20 conditions
  – Parameters
  – How to assess the quality of biclusters
Biclustering Visualization
Network-based grouping
• Goal: to identify modules using gene expression data and interaction networks.
• Input: GE data + interactions file (.sif).
• MATISSE (Module Analysis via Topology of Interactions and Similarity SEts).
  I. Ulitsky and R. Shamir. BMC Systems Biology (2007)
Motivation
• Detect functional modules: groups of interacting proteins / co-expressed genes
• Integrative analysis can identify weaker signals
• Identifies a group of genes as well as the connections between them
Front vs. back nodes
• Only variant genes (front nodes) have meaningful similarity values.
• These can be linked by non-regulated genes (back nodes).
• Back nodes correspond to:
  – Post-translational regulation
  – Partially regulated pathways
  – Unmeasured transcripts
Network-based clustering visualization
• Similar to clustering visualization (gene list, mean patterns, heat maps, etc.)
• Interactions map
Supervised Grouping
• Differential expression: t-test, SAM (Significance Analysis of Microarrays)
• Similarity group (correlation to one probe/gene)
• Rule based grouping (define a pattern)
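Differential expression between two condition subsets can be sketched with a per-probe t statistic. The slide names a t-test without saying which variant; the example below uses Welch's unequal-variance form as an assumption and stops at the statistic, without computing a p-value. The function names and the toy matrix are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of at least two values each."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

def rank_probes(matrix, cols_a, cols_b):
    """Order probe IDs by |t| between two condition subsets, largest first."""
    scores = {
        probe: welch_t([row[c] for c in cols_a], [row[c] for c in cols_b])
        for probe, row in matrix.items()
    }
    return sorted(scores, key=lambda probe: abs(scores[probe]), reverse=True)

# Hypothetical probes measured in six conditions (columns 0-2 vs. columns 3-5)
matrix = {
    "probe_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "probe_2": [3.0, 4.0, 5.0, 3.0, 5.0, 4.0],
}
print(rank_probes(matrix, [0, 1, 2], [3, 4, 5]))  # ['probe_1', 'probe_2']
```

Here `probe_1` shifts between the two subsets and ranks first, while `probe_2` has the same mean in both and scores 0.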
Hands-on part I (1-3)
Ascribe functional meaning to gene groups
• Gene Ontology (GO): a database of functional annotations. We extract files for all supported organisms.
• TANGO: applies statistical tests that seek over-represented GO functional categories in the groups.
Functional Enrichment - Visualization
Inferring regulatory mechanisms from gene expression data
• Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements
• Computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes
• PRIMA – PRomoter Integration in Microarray Analysis
  * Elkon et al., Genome Research (2003)
PRIMA – general description
• Input:
  – Target set (e.g., co-expressed genes)
  – Background set (e.g., all genes on the chip)
• Analysis: identify transcription factors whose binding site signatures are enriched in the ‘Target set’ with respect to the ‘Background set’.
• TF binding site models: TRANSFAC DB
• Default promoter region: from -1000 bp to 200 bp relative to the TSS
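The default promoter region can be turned into genomic coordinates once the strand of the gene is known. The sketch below assumes that "-1000 bp to 200 bp" means 1000 bp upstream and 200 bp downstream of the TSS in the direction of transcription. The function name and the coordinate convention are illustrative and not taken from PRIMA.

```python
def promoter_window(tss, strand, upstream=1000, downstream=200):
    """Return (start, end) genomic coordinates of the promoter region.

    Upstream and downstream are relative to the direction of transcription,
    so the window is mirrored for genes on the minus strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")

print(promoter_window(5000, "+"))  # (4000, 5200)
print(promoter_window(5000, "-"))  # (4800, 6000)
```

A real pipeline would also clip the window at the chromosome ends, which this sketch omits.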
Promoter Analysis - Visualization
[Figure: EXPANDER overview diagram with tool names. Components: Input data; Normalization/Filtering; Grouping (Clustering/Biclustering/Network-based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); Location enrichment; miRNA targets enrichment (FAME); Visualization utilities; Links to public annotation databases]
miRNA Enrichment Analysis
• Goal: to predict microRNA (miRNA) regulation by detecting miRNAs whose binding sites are over- or under-represented in the 3' UTRs of gene groups.
• FAME = Functional Assignment of MiRNAs via Enrichment
FAME
[Figure labels: miRNA targets; Genes sharing a common function; Significance of overlap]
• Rather than using a simple hypergeometric test, FAME also considers:
  – The uneven distribution of 3’ UTR lengths
  – Confidence values assigned to individual miRNA target sites (context scores)
Implementation in Expander
• Usage very similar to that of TANGO and PRIMA
• Currently uses TargetScan5 predictions
• Main parameters:
  – Number of random iterations (random graphs created)
  – Enrichment direction: over- or under-representation
  – Multiple testing correction
  – The use of the context score weights is optional
Location analysis
• Goal: Detect genes that are located in the same area and are co-expressed.
• Search for over-represented chromosomal areas within gene groups.
• Statistical test.
• Redundancy filter
• Ignoring known gene clusters
Location analysis visualization
• Enrichment analysis visualization
• Positions view with color assignments
KEGG pathway analysis
• Searches for KEGG pathways that are over-represented in gene groups (i.e., in a target set with respect to a background set).
• Uses a hypergeometric test.
• Multiple testing correction (Bonferroni)
• Enrichment results visualization (same as other group analysis results).
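The hypergeometric test with Bonferroni correction described above can be sketched as follows. This is a minimal illustration, not EXPANDER's implementation: the function names, gene IDs and pathway names are hypothetical, and the correction simply multiplies each p-value by the number of pathways tested.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favorable / comb(N, n)

def enrichment(target, background, pathways):
    """Bonferroni-corrected over-representation p-value for each pathway."""
    background = set(background)
    target = set(target) & background
    corrected = {}
    for name, members in pathways.items():
        annotated = set(members) & background
        overlap = len(target & annotated)
        p = hypergeom_upper_tail(len(background), len(annotated), len(target), overlap)
        corrected[name] = min(1.0, p * len(pathways))
    return corrected

# Hypothetical example: 10 background genes, 2 pathways, 3 target genes
background = [f"g{i}" for i in range(1, 11)]
pathways = {"pathway_A": ["g1", "g2", "g3", "g4"], "pathway_B": ["g5", "g6"]}
print(enrichment(["g1", "g2", "g9"], background, pathways))
```

With 2 of the 3 target genes in `pathway_A`, the uncorrected p-value is 40/120 = 1/3, which becomes 2/3 after correcting for two tested pathways. `pathway_B` has no overlap with the target set, so its corrected p-value is 1.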
Custom enrichment analysis
• Loads an annotation file supplied by the user (provides genes with custom annotations).
• Searches for annotations (features) that are over-represented in gene groups (i.e., in a target set with respect to a background set).
• Uses a hypergeometric test.
• Multiple testing correction (Bonferroni)
• Enrichment results visualization (same as other group analysis results).
Hands-on part II (3-11)
SPIKE…
Gene Groups File