EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon...

Schedule
Data, preprocessing, grouping (10:15-11:00)
Hands-on part I (11:00-11:20)
Group analysis (11:20-11:45)
Hands-on part II (11:45-12:05)
Coffee Break (12:05-12:20)
Spike (12:20-12:50)
Hands-on part III (12:50-13:10)
Lunch Break (13:10-14:10)
Amadeus (14:10-14:40)
Hands-on part IV (14:40-15:00)


Page 1: EXPression ANalyzer and DisplayER Adi Maron-Katz Igor Ulitsky Chaim Linhart Amos Tanay Rani Elkon Seagull Shavit Dorit Sagir Eyal David Roded Sharan Israel.



• EXPANDER – an integrative package for analysis of gene expression data

• Built-in support for 15 organisms: human, mouse, rat, chicken, fly, zebrafish, C. elegans, yeast (S. cerevisiae and S. pombe), Arabidopsis, tomato, Listeria, E. coli, Aspergillus* and rice.

• Demonstration: oligonucleotide array data containing expression profiles measured at several time points after serum stimulation of a human cell line.


What can it do?

Low-level analysis:
• Missing data estimation (KNN or manual)
• Data adjustments (merge conditions, divide by base, take log)
• Normalization
• Probe & condition filtering

High-level analysis:
• Detecting patterns/groups in the data (supervised clustering, differential expression, clustering, biclustering, network-based grouping)
• Ascribing biological meaning to patterns (searching for enrichment within groups)


[EXPANDER overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]


EXPANDER – Data

Input data:
• Expression matrix (probe-row; condition-column)
– One-channel data (e.g., Affymetrix)
– Dual-channel data, in which data is log R/G (e.g., cDNA microarrays)
– ‘.cel’ files
• ID conversion file: maps probes to genes
• Gene groups data: defines gene groups


EXPANDER – Data (II)

Data definitions:
– Defining condition subsets
– Data type & scale (log)

Data adjustments:
– Missing value estimation (KNN or arbitrary)
– Reorder conditions
– Merging conditions
– Merging probes by gene IDs
– Divide by base
– Log data (base 2)
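
A minimal Python sketch of three of the adjustments listed above (divide by base, log base 2, merging probes by gene IDs). The probe names, gene IDs and values are hypothetical, and averaging the probes of a gene is an assumption made for illustration; it is not necessarily how EXPANDER merges them.

```python
import math
from collections import defaultdict

def divide_by_base(row, base_index=0):
    """Express every condition relative to a chosen base condition."""
    base = row[base_index]
    return [value / base for value in row]

def log2_transform(row):
    """Take log base 2 of every (positive) value in the row."""
    return [math.log2(value) for value in row]

def merge_probes_by_gene(matrix, probe_to_gene):
    """Average the rows of all probes that map to the same gene ID (assumed rule)."""
    grouped = defaultdict(list)
    for probe, row in matrix.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene ID are dropped
            grouped[gene].append(row)
    return {
        gene: [sum(column) / len(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }

# Hypothetical one-channel intensities: probe -> values over three time points
matrix = {
    "probe_1": [100.0, 200.0, 400.0],
    "probe_2": [50.0, 100.0, 100.0],
    "probe_3": [80.0, 40.0, 20.0],
}
# Hypothetical ID conversion table: probe -> gene
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

log_ratios = {p: log2_transform(divide_by_base(r)) for p, r in matrix.items()}
merged = merge_probes_by_gene(log_ratios, probe_to_gene)
print(merged)
```

Dividing by the first time point before taking the log turns each profile into log fold changes relative to the base condition, so the base column becomes 0 for every probe.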


EXPANDER – Preprocessing

Normalization: removal of systematic biases from the analyzed chips
– Implemented methods: quantile, lowess
– Visualization: box plots, scatter plots (simple, M vs. A)

Filtering: focus downstream analysis on the set of “responding genes”
– Fold-change
– Variation
– Statistical tests: t-test, SAM (Significance Analysis of Microarrays)
– It is possible to define “VIP genes”.

Standardization: mean = 0, STD = 1 (visualization)
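
As an illustration of two of the steps named above, here is a small Python sketch of quantile normalization across chips and of standardizing a profile to mean 0 and STD 1. It is a textbook version written under simplifying assumptions (no special handling of ties, population standard deviation, non-constant profiles); the values are hypothetical and this is not EXPANDER's own code.

```python
import statistics

def quantile_normalize(chips):
    """chips: list of equal-length lists, one list of probe values per chip."""
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean of the values that share the same rank across all chips
    rank_means = [
        sum(chip[rank] for chip in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        new_chip = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            new_chip[probe_index] = rank_means[rank]
        normalized.append(new_chip)
    return normalized

def standardize(profile):
    """Shift and scale one expression profile to mean 0 and STD 1."""
    mean = statistics.fmean(profile)
    std = statistics.pstdev(profile)  # assumes the profile is not constant
    return [(value - mean) / std for value in profile]

# Two hypothetical chips measuring the same three probes
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
z_profile = standardize([1.0, 2.0, 3.0])
print(normalized)
print(z_profile)
```

After quantile normalization every chip has the same set of values, only assigned to different probes, which is what removes chip-wide intensity biases.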


[EXPANDER overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]


Cluster Analysis

• Partition the responding genes into distinct groups, each with a particular expression pattern
– co-expression → co-function
– co-expression → co-regulation

• Partition the genes to achieve:
– High homogeneity within clusters
– High separation between clusters
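
One way to make the two criteria above concrete is sketched below in Python: homogeneity is taken as the average Pearson correlation between genes in the same cluster, and separation is judged from the average correlation between genes in different clusters (the lower, the better separated). These definitions, the gene names and the profiles are assumptions chosen for illustration; other similarity measures would work the same way.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def homogeneity_and_separation(profiles, labels):
    """Return (mean within-cluster correlation, mean between-cluster correlation)."""
    within, between = [], []
    for gene_1, gene_2 in combinations(profiles, 2):
        r = pearson(profiles[gene_1], profiles[gene_2])
        if labels[gene_1] == labels[gene_2]:
            within.append(r)
        else:
            between.append(r)
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical profiles: two rising genes and two falling genes
profiles = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [3.0, 2.0, 1.0],
    "g4": [6.0, 4.0, 2.0],
}
labels = {"g1": "up", "g2": "up", "g3": "down", "g4": "down"}

homogeneity, between_similarity = homogeneity_and_separation(profiles, labels)
print(homogeneity, between_similarity)  # close to 1 and close to -1 here
```

A good partition shows high within-cluster similarity together with low between-cluster similarity; comparing the two numbers across different clustering solutions is a simple sanity check.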


Cluster Analysis (II)

Implemented algorithms: CLICK, K-means, SOM, Hierarchical

Visualization: mean expression patterns, heat maps, chromosomal positions, network sub-graph, PCA, clustered matrix


Biclustering

Clustering seeks a global partition according to similarity across ALL conditions, which becomes too restrictive on large datasets.

Relevant knowledge can be revealed by identifying genes with a common pattern across a subset of the conditions.

A novel algorithmic approach is needed: biclustering.


Biclustering II

• Bicluster (= module): subset of genes with similar behavior in a subset of conditions

• Computationally challenging: has to consider many combinations of sub-conditions

SAMBA = Statistical Algorithmic Method for Bicluster Analysis

* A. Tanay, R. Sharan, R. Shamir, RECOMB 02


Drawbacks/ limitations:

• Useful only with more than 20 conditions

• Parameters

• How to assess the quality of biclusters


Biclustering Visualization


Network based grouping

• Goal: to identify modules using gene expression data and interaction networks.

• Input: gene expression (GE) data + interactions file (.sif).

• MATISSE (Module Analysis via Topology of Interactions and Similarity SEts).

I. Ulitsky and R. Shamir. BMC Systems Biology (2007)
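
For readers unfamiliar with interaction files, the Python sketch below reads a network in the simple interaction format (SIF) as used by Cytoscape, where each line is `source interaction_type target [target ...]` and a line holding a single name is an isolated node. Whether EXPANDER expects exactly this variant is an assumption; the gene names and the `pp` interaction type are hypothetical.

```python
from collections import defaultdict

def parse_sif(lines):
    """Build an undirected adjacency map from SIF-style lines."""
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        neighbors[source]  # touching the key registers isolated nodes too
        for target in fields[2:]:  # fields[1] is the interaction type
            neighbors[source].add(target)
            neighbors[target].add(source)
    return dict(neighbors)

# Hypothetical interactions file content
sif_text = """\
GENE_A pp GENE_B
GENE_B pp GENE_C GENE_D
GENE_E
"""

network = parse_sif(sif_text.splitlines())
for gene in sorted(network):
    print(gene, sorted(network[gene]))
```

The interaction type is ignored here because the sketch only needs the topology; a fuller reader would keep it as an edge attribute.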


Motivation

Detect functional modules: groups of interacting proteins, co-expressed genes

Integrative analysis - can identify weaker signals

Identifies a group of genes as well as the connections between them


Front vs Back nodes

Only variant genes (front nodes) have meaningful similarity values

These can be linked by non-regulated genes (back nodes).

Back nodes correspond to:
– Post-translational regulation
– Partially regulated pathways
– Unmeasured transcripts


Network based clustering visualization

Similar to clustering visualization (gene list, mean patterns, heat maps, etc.).

Interactions map


Supervised Grouping

• Differential expression: t-test, SAM (Significance Analysis of Microarrays)

• Similarity group (correlation to one probe/gene)

• Rule based grouping (define a pattern)
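
The differential expression option above can be illustrated with a short Python sketch that scores each probe with a two-sample t statistic and ranks probes by its absolute value. The unequal-variance (Welch) form is used here as an assumption, since the slides do not say which t-test variant is applied; the probe names and values are hypothetical, and turning the statistic into a p-value (or running SAM) is left out.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Two-sample t statistic that does not assume equal variances."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical log expression values: probe -> (condition A replicates, condition B replicates)
expression = {
    "probe_1": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "probe_2": ([2.0, 2.1, 1.9], [2.0, 1.9, 2.1]),
}

scores = {probe: welch_t(a, b) for probe, (a, b) in expression.items()}
ranked = sorted(scores, key=lambda probe: abs(scores[probe]), reverse=True)
print(ranked)
```

Probes at the top of the ranking form the candidate "differentially expressed" group; in practice a significance cutoff with multiple testing correction would decide where the group ends.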


Hands-on part I (1-3)


[EXPANDER overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]


Ascribe functional meaning to gene groups

Gene Ontology (GO): a database of functional annotations. We extract files for all supported organisms.

TANGO: applies statistical tests that seek over-represented GO functional categories in the groups.


Functional Enrichment - Visualization


[EXPANDER overview diagram. Components: Input data; Preprocessing; Grouping; Functional enrichment; Promoter signals; Location enrichment; miRNA targets enrichment; KEGG pathway enrichment; Visualization utilities; Links to public annotation databases]


Inferring regulatory mechanisms from gene expression data

Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements

Computational identification of cis-regulatory elements that are over-represented in promoters of the co-expressed genes

PRIMA - PRomoter Integration in Microarray Analysis

* Elkon et al., Genome Research (2003)


PRIMA – general description

Input:
– Target set (e.g., co-expressed genes)
– Background set (e.g., all genes on the chip)

Analysis:
– Identify transcription factors whose binding site signatures are enriched in the ‘Target set’ with respect to the ‘Background set’.
– TF binding site models: TRANSFAC DB
– Default: from -1000 bp to 200 bp relative to the TSS
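
The general idea of counting binding site hits in target versus background promoters can be sketched in Python as follows. This is a generic position weight matrix scan written for illustration, not PRIMA's actual scoring or statistics: the count matrix, the toy promoter sequences, the pseudocount, the uniform base background and the score threshold are all assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif with consensus TGAC
# (illustrative only; not taken from TRANSFAC).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]

def log_odds_matrix(counts, pseudocount=1.0, base_frequency=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / base_frequency)
            for base in BASES
        })
    return matrix

def best_score(sequence, matrix):
    """Highest motif score over all windows of the sequence (forward strand only)."""
    width = len(matrix)
    return max(
        sum(matrix[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def genes_with_hit(promoters, matrix, threshold):
    """Genes whose promoter contains at least one window scoring above the threshold."""
    return {g for g, seq in promoters.items() if best_score(seq, matrix) >= threshold}

# Toy promoter sequences; real ones would span -1000 bp to 200 bp around the TSS
background = {
    "gene_1": "AATGACGG",
    "gene_2": "CCTGACTT",
    "gene_3": "AAAAAAAA",
    "gene_4": "GGGGCCCC",
}
target = {"gene_1", "gene_2"}

pwm = log_odds_matrix(COUNTS)
hits = genes_with_hit(background, pwm, threshold=4.0)
target_hits = len(hits & target)
background_hits = len(hits)
print(target_hits, "of", len(target), "target promoters;",
      background_hits, "of", len(background), "background promoters")
```

The two hit counts, together with the set sizes, are the inputs an enrichment test would need to decide whether the motif is over-represented in the target set.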


Promoter Analysis - Visualization


[EXPANDER overview diagram. Components: Input data; Normalization/Filtering; Grouping (Clustering / Biclustering / Network-based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); Location enrichment; miRNA targets enrichment (FAME); Visualization utilities; Links to public annotation databases]


miRNA Enrichment Analysis

• Goal: to predict microRNA (miRNA) regulation by detecting miRNAs whose binding sites are over- or under-represented in the 3' UTRs of gene groups.

• FAME = Functional Assignment of MiRNAs via Enrichment


FAME

[Figure: significance of the overlap between miRNA targets and genes sharing a common function]

• Rather than using a simple hypergeometric test, FAME also considers:
– The uneven distribution of 3’ UTR lengths
– Confidence values assigned to individual miRNA target sites (context scores)


Implementation in Expander

• Usage very similar to that of TANGO and PRIMA

• Currently uses TargetScan5 predictions

• Main parameters:
– Number of random iterations (random graphs created)
– Enrichment direction: over- or under-representation
– Multiple testing correction
– The use of the context score weights is optional


[EXPANDER overview diagram. Components: Input data; Normalization/Filtering; Grouping (Clustering / Biclustering / Network-based clustering); Functional enrichment (TANGO); Promoter signals (PRIMA); Location enrichment; miRNA targets enrichment (FAME); Visualization utilities; Links to public annotation databases]


Location analysis

• Goal: Detect genes that are located in the same area and are co-expressed.

• Search for over-represented chromosomal areas within gene groups.

• Statistical test.

• Redundancy filter

• Ignoring known gene clusters


Location analysis visualization

• Enrichment analysis visualization

• Positions view with color assignments


KEGG pathway analysis

• Searches for KEGG pathways that are over-represented in gene groups (i.e., in a target set with respect to a background set).

• Uses a hypergeometric test.

• Multiple testing correction (Bonferroni)

• Enrichment results visualization (same as other group analysis results).
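
The hypergeometric test with Bonferroni correction described above can be written out in a few lines of Python. The gene counts and the number of pathways tested are hypothetical; the function is the standard upper-tail hypergeometric probability, shown as a sketch rather than as EXPANDER's implementation.

```python
import math

def hypergeometric_tail(background_size, pathway_size, target_size, overlap):
    """P(X >= overlap) when target_size genes are drawn from the background
    without replacement and pathway_size background genes are in the pathway."""
    total = math.comb(background_size, target_size)
    upper = min(pathway_size, target_size)
    favorable = sum(
        math.comb(pathway_size, i)
        * math.comb(background_size - pathway_size, target_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical numbers: 20 background genes, 5 of them in the pathway,
# a target set (gene group) of 4 genes, 3 of which are in the pathway.
p_value = hypergeometric_tail(background_size=20, pathway_size=5,
                              target_size=4, overlap=3)
corrected = bonferroni(p_value, n_tests=10)  # assuming 10 pathways were tested
print(p_value, corrected)
```

The same two functions apply unchanged to any annotation that splits the background into "has the feature" and "does not", which is why the enrichment results can share one visualization.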


Custom enrichment analysis

• Loads an annotation file supplied by the user (provides genes with custom annotations).

• Searches for annotations (features) that are over-represented in gene groups (i.e., in a target set with respect to a background set).

• Uses a hypergeometric test.

• Multiple testing correction (Bonferroni)

• Enrichment results visualization (same as other group analysis results).


Hands-on part II (3-11)


SPIKE…


Gene Groups File