Integration of heterogeneous data
Lars Juhl Jensen
what went wrong?
a good question
signaling networks
Oda & Kitano, Molecular Systems Biology, 2006
long way to go
mass spectrometry
Linding, Jensen, Ostheimer et al., Cell, 2007
phosphorylation sites
in vivo
kinases are unknown
peptide assays
Miller, Jensen et al., Science Signaling, 2008
sequence specificity
kinase-specific
in vitro
no context
what a kinase could do
not what it actually does
computational methods
sequence specificity
Miller, Jensen et al., Science Signaling, 2008
kinase-specific
no context
what a kinase could do
not what it actually does
in vitro
in vivo
context
co-activators
scaffolders
expression
association networks
Linding, Jensen, Ostheimer et al., Cell, 2007
a good idea
Linding, Jensen, Ostheimer et al., Cell, 2007
Part I: sequence motifs
curated motifs
PROSITE
ELM
HPRD
regular expressions
[ST]P.[KR]
no score
Miller, Jensen et al., Science Signaling, 2008
insufficient
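Why a curated regular expression gives "no score" can be seen in a minimal Python sketch. The pattern [ST]P.[KR] is the one from the slide; the peptide sequence and the helper name find_motif_hits are made up for illustration.

```python
import re

# Pattern from the slide: Ser/Thr, then Pro, then any residue, then Lys/Arg.
# Wrapping it in a lookahead lets overlapping sites be reported as well.
MOTIF = re.compile(r"(?=([ST]P.[KR]))")

def find_motif_hits(sequence):
    """Return (1-based position, matched text) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence)]

# Hypothetical peptide, not a real protein sequence.
hits = find_motif_hits("MASPEKLTPARGSPQD")
for position, site in hits:
    print(position, site)
```

Each site either matches or it does not. Nothing in the output ranks a strong site above a weak one, and the near-miss SPQD at position 13 is simply dropped.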
machine learning
NetPhosK
PredPhospho
PHOSITE
GPS
KinasePhos
PPSP
GANNPhos
PhoScan
no regular updates
NetPhorest
Miller, Jensen et al., Science Signaling, 2008
data sources
Phospho.ELM
Diella et al., Nucleic Acids Res., 2008
Scansite
Obenauer et al., Nucleic Acids Res., 2003
Miller, Jensen et al., Science Signaling, 2008
common basis
Miller, Jensen et al., Science Signaling, 2008
automated pipeline
compilation of datasets
classification vs. prediction
Miller, Jensen et al., Science Signaling, 2008
homology reduction
Miller, Jensen et al., Science Signaling, 2008
training and evaluation
cross-validation
Miller, Jensen et al., Science Signaling, 2008
classifier selection
Miller, Jensen et al., Science Signaling, 2008
motif atlas
179 kinases
93 SH2 domains
8 PTB domains
BRCT domains
WW domains
14-3-3 proteins
phosphatases
model organisms
S. cerevisiae
D. melanogaster
C. elegans
biological insights
docking domains
Miller, Jensen et al., Science Signaling, 2008
disease-related kinases
Miller, Jensen et al., Science Signaling, 2008
predictive power
ROC curves
Miller, Jensen et al., Science Signaling, 2008
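The area under a ROC curve can be computed directly as the fraction of positive/negative pairs that the classifier ranks correctly. The sketch below assumes that summary is what is wanted; the scores and labels are invented, and the function name auc is not taken from NetPhorest.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Invented classifier scores; 1 marks a known site, 0 a negative example.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
example_labels = [1, 1, 0, 1, 0]
print(round(auc(example_scores, example_labels), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes classifiers for different kinases comparable on one scale.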
comparison
Miller, Jensen et al., Science Signaling, 2008
conclusions
data collection
automation
benchmarking
homology reduction!
Part IIassociation networks
STRING
Jensen, Kuhn et al., Nucleic Acids Research, 2009
functional associations
data integration
common basis
630 genomes
model organism databases
Ensembl
RefSeq
genomic context methods
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
primary experimental data
protein interactions
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
BIND: Biomolecular Interaction Network Database
BioGRID: General Repository for Interaction Datasets
DIP: Database of Interacting Proteins
IntAct
MINT: Molecular Interactions Database
HPRD: Human Protein Reference Database
PDB: Protein Data Bank
inferred associations
gene coexpression
GEO: Gene Expression Omnibus
expression compendia
curated knowledge
complexes
MIPS: Munich Information Center for Protein Sequences
Gene Ontology
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
KEGG: Kyoto Encyclopedia of Genes and Genomes
MetaCyc
Reactome
PID: NCI-Nature Pathway Interaction Database
literature mining
MEDLINE
SGD: Saccharomyces Genome Database
The Interactive Fly
OMIM: Online Mendelian Inheritance in Man
co-mentioning
statistical methods
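Co-mentioning can be sketched as counting how often two names appear in the same text and comparing that with what chance would give. The slides do not say which statistic is used, so the log-ratio below is an assumption, and the four "documents" are invented lists of names rather than real MEDLINE abstracts.

```python
import math
from collections import Counter
from itertools import combinations

def comention_scores(documents):
    """Score each name pair by log2(observed / expected co-mentions).

    `documents` is a list of name lists, one per abstract. The expected
    count assumes the two names occur independently of each other.
    """
    n_docs = len(documents)
    single = Counter()
    pair = Counter()
    for names in documents:
        unique = sorted(set(names))  # count each name once per document
        single.update(unique)
        pair.update(combinations(unique, 2))
    scores = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = math.log2(observed / expected)
    return scores

# Invented corpus: each inner list holds the names found in one abstract.
docs = [
    ["CYC1", "HAP1"],
    ["CYC1", "HAP1", "GAL4"],
    ["GAL4"],
    ["CYC7", "HAP1"],
]
scores = comention_scores(docs)
for names, score in sorted(scores.items()):
    print(names, round(score, 3))
```

Pairs mentioned together more often than chance get a positive score. With a corpus this small the numbers mean little, which is why such raw scores need benchmarking before use.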
NLP: Natural Language Processing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
easy in theory …
… but not in practice
different formats
parsers
different identifiers
thesaurus
redundant sources
bookkeeping
variable quality
raw quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
benchmarking
von Mering et al., Nucleic Acids Research, 2005
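One way to turn raw quality scores into comparable numbers is to rank the predicted links, cut the ranking into bins, and measure in each bin how many links a gold standard confirms. The sketch below assumes exactly that; the bin size, the data, and the name calibrate_by_bins are hypothetical and not taken from STRING.

```python
def calibrate_by_bins(pairs, bin_size):
    """Map raw scores to the fraction of benchmark-confirmed links.

    `pairs` holds (raw_score, confirmed) tuples, where `confirmed` says
    whether the link is present in the gold standard. Returns one
    (mean raw score, fraction confirmed) tuple per bin, best bin first.
    """
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    table = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_raw = sum(score for score, _ in chunk) / len(chunk)
        fraction = sum(1 for _, ok in chunk if ok) / len(chunk)
        table.append((mean_raw, fraction))
    return table

# Invented links: raw score from one evidence type, plus benchmark status.
links = [
    (9.1, True), (8.7, True), (7.9, True), (7.2, False),
    (5.0, True), (4.4, False), (3.1, False), (2.0, False),
]
calibration = calibrate_by_bins(links, bin_size=4)
for mean_raw, fraction in calibration:
    print(round(mean_raw, 3), fraction)
```

Once every evidence type is expressed as "fraction confirmed by the same benchmark", scores from very different sources share a common basis.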
spread over 630 genomes
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
von Mering et al., Nucleic Acids Research, 2005
protein mode
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
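The slide does not state how the evidence is combined. A common choice, assumed here, is to treat each calibrated score as an independent probability and ask how likely it is that at least one evidence type is right. The function name and the example scores are hypothetical.

```python
def combine_scores(channel_scores):
    """Combine per-evidence probabilities as 1 - prod(1 - s).

    Assumes the evidence types are independent and that each score
    is already a probability between 0 and 1.
    """
    remaining_doubt = 1.0
    for score in channel_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt

# Invented scores for one protein pair from three evidence types.
combined = combine_scores([0.5, 0.6, 0.3])
print(round(combined, 2))
```

Under this rule several weak but independent lines of evidence can add up to a confident link, while no single score can ever lower the total.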
visualize
Frishman et al., Modern Genome Annotation, 2009
STITCH
metabolite–enzyme links
pathway databases
Letunic & Bork, Trends in Biochemical Sciences, 2008
drug–target links
DrugBank
PDSP Ki
MATADOR
Campillos & Kuhn et al., Science, 2008
chemical–chemical links
shared targets
fingerprint similarity
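Fingerprint similarity is often measured with the Tanimoto coefficient, the shared features divided by all features present in either compound. The slide does not name the measure, so Tanimoto is an assumption here, and the fingerprints are invented bit positions rather than real chemical fingerprints.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is given as the collection of bit positions that
    are switched on. Returns 0.0 when both fingerprints are empty.
    """
    bits_a, bits_b = set(fingerprint_a), set(fingerprint_b)
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)

# Invented fingerprints for two hypothetical compounds.
similarity = tanimoto({1, 4, 7, 9}, {1, 4, 8})
print(similarity)
```

A threshold on this value then decides which chemical pairs count as structurally similar enough to be linked.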
chemical–protein network
conclusions
more data is better
quality scores
benchmarking
cross-species integration
Part III: putting it all together
Linding, Jensen, Ostheimer et al., Cell, 2007
NetworKIN
benchmarking
Linding, Jensen, Ostheimer et al., Cell, 2007
2.5-fold better accuracy
context is crucial
localization
Linding, Jensen, Ostheimer et al., Cell, 2007
DNA damage response
Linding, Jensen, Ostheimer et al., Cell, 2007
small-scale validation
ATM phosphorylates Rad50
Linding, Jensen, Ostheimer et al., Cell, 2007
Cdk1 phosphorylates 53BP1
Linding, Jensen, Ostheimer et al., Cell, 2007
high-throughput validation
multiple reaction monitoring
Linding, Jensen, Ostheimer et al., Cell, 2007
systematic validation
kinase inhibitor matrix
Fedorov et al., PNAS, 2007
design optimal experiments
integration with literature
Reflect
conclusions
complementary data
visualization
a good question
Acknowledgments
NetworKIN.info: Rune Linding, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Rob Russell, Peer Bork, Michael Yaffe, Tony Pawson
STITCH.embl.de: Michael Kuhn, Christian von Mering, Monica Campillos, Peer Bork
NetPhorest.info: Martin Lee Miller, Francesca Diella, Claus Jørgensen, Michele Tinti, Lei Li, Marilyn Hsiung, Sirlester A. Parker, Jennifer Bordeaux, Thomas Sicheritz-Pontén, Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E. Turk, Michael B. Yaffe, Søren Brunak
STRING.embl.de: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Reflect.ws: Sean O’Donoghue, Evangelos Pafilis, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider