Integration of diverse large-scale datasets

Post on 15-Jul-2015


Integration of diverse large-scale datasets

Lars Juhl Jensen

promoter analysis

Jensen et al., Bioinformatics, 2000

DNA structure

genome visualization

Pedersen et al., Journal of Molecular Biology, 2000

microarray normalization

Workman et al., Genome Biology, 2002

protein function prediction

STRING

integrate diverse evidence

functional interactions

Bork et al., Current Opinion in Structural Biology, 2005

179 proteomes

evolution

statistics

(the original sin)

prokaryotes

genomic context methods

gene fusion

gene neighborhood

phylogenetic profiles
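
As an illustration of the phylogenetic profile idea: genes that are present and absent in the same genomes are candidates for a functional link. The sketch below is a minimal example, not the method used in STRING; the gene names, the six-genome profiles, and the choice of Jaccard similarity are all assumptions made for illustration.

```python
from itertools import combinations


def profile_similarity(a, b):
    """Jaccard similarity between two presence/absence profiles (lists of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical profiles: one entry per genome, 1 = gene present, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 1, 1, 0, 1, 0],
}

# Score every gene pair; high scores suggest a possible functional association.
scores = {
    (g1, g2): profile_similarity(profiles[g1], profiles[g2])
    for g1, g2 in combinations(sorted(profiles), 2)
}
```

Here geneA and geneB share an identical profile and score 1.0, while geneA and geneC co-occur in only one of six genomes.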

[Figure labels: Cell, Cellulosomes, Cellulose]

eukaryotes

integrate diverse datasets

Jensen et al., Drug Discovery Today: Targets, 2004

curated knowledge

MIPS (Munich Information Center for Protein Sequences)

KEGG (Kyoto Encyclopedia of Genes and Genomes)

STKE (Signal Transduction Knowledge Environment)

Reactome

literature mining

MEDLINE

SGD (Saccharomyces Genome Database)

The Interactive Fly

OMIM (Online Mendelian Inheritance in Man)

co-mentioning

NLP (Natural Language Processing)

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

primary experimental data

microarray expression data

GEO (Gene Expression Omnibus)

physical protein interactions

BIND (Biomolecular Interaction Network Database)

MINT (Molecular Interactions Database)

GRID (General Repository for Interaction Datasets)

DIP (Database of Interacting Proteins)

HPRD (Human Protein Reference Database)

problems

many sources

(different gene identifiers)

many types of evidence

questionable quality

not directly comparable

spread over many species

huge synonym lists

calculate raw quality scores

calibrate vs. gold standard

KEGG (Kyoto Encyclopedia of Genes and Genomes)

von Mering et al., Nucleic Acids Research, 2005
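
The calibration step can be pictured as follows: raw scores from one evidence type are binned, and each bin is assigned the fraction of its protein pairs that the gold standard confirms (for example, both proteins in the same KEGG pathway). This is a minimal sketch under assumptions: the fixed bin edges, the toy scores, and the function names `fit_calibration` and `calibrated_score` are hypothetical, and a real pipeline would more likely fit a smooth curve than use a lookup table.

```python
from bisect import bisect_right


def fit_calibration(raw_scores, is_true, edges):
    """Per raw-score bin, the fraction of pairs confirmed by the gold standard."""
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, true in zip(raw_scores, is_true):
        b = bisect_right(edges, score)  # index of the bin this score falls in
        totals[b] += 1
        hits[b] += int(true)
    # Empty bins get None because no precision can be estimated for them.
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrated_score(score, edges, table):
    """Map a raw score to the gold-standard precision of its bin."""
    return table[bisect_right(edges, score)]


# Hypothetical raw scores and gold-standard labels for eight protein pairs.
raw = [0.1, 0.2, 0.4, 0.45, 0.7, 0.8, 0.9, 0.95]
gold = [False, False, False, True, True, False, True, True]
edges = [0.33, 0.66]  # assumed bin boundaries

table = fit_calibration(raw, gold, edges)
```

Once every evidence type is mapped onto this common scale, scores from different sources become directly comparable.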

transfer based on orthology

combine all evidence

Bork et al., Current Opinion in Structural Biology, 2005
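
One simple way to combine calibrated scores from several evidence types is to treat them as independent probabilities and ask how likely it is that at least one of them is right. The slides do not spell out the formula, so the sketch below is an assumption: it uses the plain independence rule 1 − ∏(1 − s_i), and the function name `combine` and the example scores are hypothetical.

```python
from math import prod


def combine(scores):
    """Combine per-evidence confidence scores, assuming they are independent.

    Each score is read as the probability that the evidence type is correct;
    the result is the probability that at least one of them is correct.
    """
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical calibrated scores for one protein pair from three evidence types.
combined = combine([0.6, 0.3, 0.2])
```

With these inputs the combined score is 1 − 0.4 × 0.7 × 0.8 = 0.776, higher than any single line of evidence.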

cell cycle

qualitative modeling

Chen et al., Molecular Biology of the Cell, 2004

synchronized cell culture

microarray time series

periodically expressed genes

S. cerevisiae

Cho et al.

Spellman et al.

numerous analysis methods

Cho et al.

Spellman et al.

Zhao et al.

Johansson et al.

Luan and Li

Lu et al.

Ahdesmäki et al.

Willbrand et al.

no benchmarking

de Lichtenberg et al., Bioinformatics, 2005

reproducibility

de Lichtenberg et al., Bioinformatics, 2005

regulation vs. periodicity

de Lichtenberg et al., Bioinformatics, 2005

list of 600 periodic genes

S. pombe

several expression studies

reproducibility

Marguerat et al., Yeast, 2006

name inconsistencies

Marguerat et al., Yeast, 2006

different analysis methods

no benchmarking

Marguerat et al., Yeast, 2006

too many genes suggested

Marguerat et al., Yeast, 2006

averaging better than voting

Marguerat et al., Yeast, 2006
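
To make the contrast between the two ways of merging gene lists concrete: voting keeps a gene if enough studies place it above a cutoff, while averaging ranks genes by their mean position across studies. The sketch below is illustrative only; the gene names, ranks, and cutoffs are hypothetical and do not reproduce the analysis in Marguerat et al.

```python
def vote(rankings, top_n, min_votes):
    """Genes placed within the top_n of at least min_votes studies."""
    genes = rankings[0].keys()
    return {g for g in genes if sum(r[g] <= top_n for r in rankings) >= min_votes}


def average_rank(rankings, top_n):
    """The top_n genes ordered by mean rank across all studies."""
    genes = rankings[0].keys()
    mean = {g: sum(r[g] for r in rankings) / len(rankings) for g in genes}
    return sorted(mean, key=mean.get)[:top_n]


# Hypothetical periodicity ranks (1 = most periodic) from three studies.
studies = [
    {"a": 1, "x": 2, "y": 3, "p": 4, "q": 5},
    {"a": 1, "x": 2, "y": 3, "p": 4, "q": 5},
    {"y": 1, "a": 2, "p": 3, "q": 4, "x": 5},
]

voted = vote(studies, top_n=2, min_votes=2)
averaged = average_rank(studies, top_n=2)
```

In this toy case voting selects a and x, whereas averaging selects a and y: voting ignores that y sits just below the cutoff in two studies and that x ranks last in the third.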

S. cerevisiae

list of 600 periodic genes

protein interaction data

von Mering et al., Nucleic Acids Research, 2005

de Lichtenberg et al., Science, 2005

dynamic proteins

static proteins

de Lichtenberg et al., Science, 2005

reproduces what is known

de Lichtenberg et al., Science, 2005

many detailed predictions

de Lichtenberg et al., Science, 2005

global trends

dynamic proteins

de Lichtenberg et al., Science, 2005

static proteins

de Lichtenberg et al., Science, 2005

just-in-time assembly

de Lichtenberg et al., Science, 2005

coordinated regulation

periodically expressed genes

Cdc28p substrates

PEST degradation signals

the human interactome

yeast two-hybrid

[Venn diagram: Stelzl et al., Rual et al., small-scale studies. Region counts in extraction order: 1936, 13, 4, 4, 1385, 65, 18465]

[Venn diagram: Stelzl et al., Rual et al., small-scale studies. Region counts in extraction order: 32, 0, 3, 4, 18, 4, 23]

[Figure: Stelzl et al., Rual et al., small-scale studies. Values in extraction order: 62, 8, 39; 852, 17, 473, 432, 69, 260]

3.5% and 21% sensitivity

in a couple of years

the human interactome

100% = 1/5?

the yeast interactome

five years ago

yeast two-hybrid

[Venn diagram: Uetz et al., Ito et al., small-scale studies. Region counts in extraction order: 1150, 117, 117, 72, 4053, 118, 4469]

[Venn diagram: Uetz et al., Ito et al., small-scale studies. Region counts in extraction order: 162, 53, 34, 72, 180, 29, 338]

[Figure: Uetz et al., Ito et al., small-scale studies. Values in extraction order: 511, 189, 616; 439, 178, 759, 897, 190, 1347]

19% and 12% sensitivity

the challenge

how to get from here …

[Venn diagram: Stelzl et al., Rual et al., small-scale studies. Region counts in extraction order: 1936, 13, 4, 4, 1385, 65, 18465]

… to there …

de Lichtenberg et al., Science, 2005

Acknowledgments

• The STRING team (EMBL)
  – Christian von Mering
  – Berend Snel
  – Martijn Huynen
  – Sean Hooper
  – Mathilde Foglierini
  – Julien Lagarde
  – Peer Bork

• Literature mining project (EML Research)
  – Jasmin Saric
  – Rossitza Ouzounova
  – Isabel Rojas

• Cell cycle studies (CBS)
  – Ulrik de Lichtenberg
  – Thomas Skøt Jensen
  – Søren Brunak

• S. pombe cell cycle (Sanger)
  – Samuel Marguerat
  – Jürg Bähler

• Inspiration for presentation
  – Lawrence Lessig
  – Dick Clarence Hardt
  – Anders Gorm Pedersen

Thank you!