From ELMs to function: interaction networks and feature spaces
Lars Juhl Jensen
EMBL Heidelberg
Function unknown for 40% of human proteins
1AOZ (129 aa) vs. 1PLC (99 aa)
scoring matrix: BLOSUM50, gap penalties: -12/-2
15.5% identity; Global alignment score: -23

1AOZ   1-60     SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH
1PLC   1-42     ---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD

1AOZ   61-120   WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI
1PLC   43-97    EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT

1AOZ   121-129  VDPPQGKKE
1PLC   98-99    VN-------
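The percent identity quoted with an alignment like this one can be recomputed from the two aligned rows. The sketch below is a minimal illustration, not the alignment program's own code: the function name `percent_identity` is hypothetical, and dividing by the full alignment length (gap columns included) is an assumption. Tools differ on the denominator, so the result need not match the figure reported above.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of alignment columns where both rows carry the same residue.

    Assumption: the denominator is the number of alignment columns,
    including columns that contain a gap ('-').
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("empty alignment")
    # A column counts as identical only if neither row has a gap there.
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)


# Toy rows, not the 1AOZ/1PLC alignment.
print(percent_identity("AC-GT", "ACCGA"))
```

To apply it to a multi-block alignment, concatenate the blocks of each sequence into one row first.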
Structural similarity can be deceiving: Two structures from the Cupredoxin superfamily
[Figure: the two structures, one labeled "Enzyme" and the other "Non-enzyme"]
ProtFun: Prediction of protein function from post-translational modifications
# Functional category              1AOZ    1PLC
Amino_acid_biosynthesis            0.126   0.070
Biosynthesis_of_cofactors          0.100   0.075
Cell_envelope                      0.429   0.032
Cellular_processes                 0.057   0.059
Central_intermediary_metabolism    0.063   0.041
Energy_metabolism                  0.126   0.268
Fatty_acid_metabolism              0.027   0.072
Purines_and_pyrimidines            0.439   0.088
Regulatory_functions               0.102   0.019
Replication_and_transcription      0.052   0.089
Translation                        0.079   0.150
Transport_and_binding              0.032   0.052

# Enzyme/nonenzyme                 1AOZ    1PLC
Enzyme                             0.773   0.310
Nonenzyme                          0.227   0.690

# Enzyme class                     1AOZ    1PLC
Oxidoreductase (EC 1.-.-.-)        0.077   0.077
Transferase (EC 2.-.-.-)           0.260   0.099
Hydrolase (EC 3.-.-.-)             0.114   0.071
Lyase (EC 4.-.-.-)                 0.025   0.020
Isomerase (EC 5.-.-.-)             0.010   0.068
Ligase (EC 6.-.-.-)                0.017   0.017
Protein features determine function
Feature-function correlations
• Transmembrane helices predictive of
– Receptors
– Transporters
– Ion channels
• Subcellular localization
– Receptors
– Transcription (regulation)
• S/T-phosphorylation
– Transcription regulation
ELMer hunting Bugs: “Heeeey, there's something awfly scwewy going on awound here”
• The idea: compare the GO annotation of ELMs with the GO terms of ELM-containing proteins
– Color shows the correlation between a GO term and ELM matches
– Black dots denote annotated GO terms
• Lack of correlations need not be a problem
• But how come ...
– LIG_Dynein_DLC8_1 is not annotated as intracellular protein transport?
– LIG_TRP is not stress response?
– LIG_WRPW_1 and 2 are not involved in cell differentiation and development?
– MOD_ASX_betaOH_EGF is not cell differentiation (and perhaps development)?
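The slide does not say which correlation measure lies behind the colors. As one plausible reading, the sketch below computes a phi coefficient (the Pearson correlation of two binary variables) between "protein matches the ELM" and "protein is annotated with the GO term". The function name `phi_coefficient` and the toy protein identifiers are hypothetical.

```python
from math import sqrt


def phi_coefficient(with_elm: set, with_go: set, all_proteins: set) -> float:
    """Phi coefficient between ELM match and GO annotation over a proteome.

    Assumption: each protein is scored as match/no match and
    annotated/not annotated; the original analysis may have used
    a different measure.
    """
    elm = with_elm & all_proteins
    go = with_go & all_proteins
    n11 = len(elm & go)                   # ELM match and GO term
    n10 = len(elm - go)                   # ELM match only
    n01 = len(go - elm)                   # GO term only
    n00 = len(all_proteins - elm - go)    # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one variable is constant, correlation undefined
    return (n11 * n00 - n10 * n01) / denom


proteome = {"p1", "p2", "p3", "p4"}
print(phi_coefficient({"p1", "p2"}, {"p1", "p2"}, proteome))
```

A GO term that is annotated for an ELM but shows a phi near zero against the ELM's matches is the kind of mismatch the questions above point at.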
And now for something completely different: Protein association networks
Genomic Neighborhood
Species Co-occurrence
Gene Fusions
Database Imports
Exp. Interaction Data
Co-expression
Literature co-occurrence
Integrating physical interaction screens
• All screens are not equal
– Complex purification vs. Y2H
– Quality varies greatly
• All interactions within a screen are not equal
– Quality measure for each type
– Benchmarking against KEGG
• Combination of evidence from multiple screens
• Cross-species transfer of interaction evidence
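The slide does not give the formula used to combine evidence. A common choice, once each screen's raw quality score has been benchmarked into a probability, is to treat the screens as independent and compute the probability that at least one of them is right. The sketch below assumes exactly that; `combine_scores` is a hypothetical name.

```python
def combine_scores(scores):
    """Combine per-screen probabilities assuming independent evidence.

    combined = 1 - product(1 - s_i)
    Assumption: each s_i is already a calibrated probability in [0, 1].
    """
    remaining_doubt = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining_doubt *= 1.0 - s
    return 1.0 - remaining_doubt


# Two moderately reliable screens reporting the same interaction.
print(combine_scores([0.5, 0.5]))
```

The independence assumption is the weak point: two screens that share a systematic bias will be over-counted.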
Mining microarray expression databases
Re-normalize arrays by modern method to remove biases
Build expression matrix
Combine similar arrays by PCA
Construct predictor by Gaussian kernel density estimation
Calibrate against KEGG maps
Transfer associations across species
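The predictor and calibration steps can be illustrated with a small sketch. It assumes that each gene pair is summarized by one number (for example, an expression correlation), that pairs sharing a KEGG map serve as positives and other pairs as negatives, and that the two score distributions are smoothed with Gaussian kernels of a fixed bandwidth. The function names, the bandwidth, and the toy scores are all hypothetical; the slide does not specify these details.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels on samples."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def calibrated_probability(x, positives, negatives, bandwidth=0.1):
    """P(pair shares a pathway | score x), via Bayes' rule on the two KDEs."""
    f_pos = gaussian_kde(positives, bandwidth)(x)
    f_neg = gaussian_kde(negatives, bandwidth)(x)
    prior = len(positives) / (len(positives) + len(negatives))
    numerator = prior * f_pos
    denominator = numerator + (1.0 - prior) * f_neg
    # Far from all training points both densities underflow to zero;
    # fall back to the prior in that case.
    return numerator / denominator if denominator > 0 else prior


same_map = [0.80, 0.85, 0.90]     # toy scores of pairs on the same KEGG map
different = [-0.10, 0.00, 0.10]   # toy scores of pairs on different maps
print(calibrated_probability(0.85, same_map, different))
```

In practice the prior would reflect the real fraction of pathway-sharing pairs, which is far smaller than in this balanced toy set.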
Co-mentioning in the scientific literature
Associate abstracts with species
Identify gene names in title/abstract
Count (co-)occurrences of genes
Test significance of associations
Calibrate against KEGG maps
Transfer associations across species
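The counting and significance steps can be sketched as follows. The slide does not name the statistical test; a one-sided hypergeometric test is used here as a stand-in, and gene-name recognition is assumed to have already turned each abstract into a set of gene identifiers. The function names and the toy abstracts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_mentions(abstracts):
    """Count abstracts mentioning each gene and each unordered gene pair."""
    single, pair = Counter(), Counter()
    for genes in abstracts:
        genes = set(genes)
        single.update(genes)
        pair.update(frozenset(p) for p in combinations(sorted(genes), 2))
    return single, pair


def cooccurrence_p_value(n_abstracts, n_a, n_b, n_both):
    """P(at least n_both shared abstracts) if the two genes were independent.

    Hypergeometric upper tail: n_b abstracts drawn from n_abstracts,
    of which n_a mention gene A.
    """
    total = comb(n_abstracts, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_abstracts - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    return tail / total


abstracts = [{"geneA", "geneB"}, {"geneA"}, {"geneA", "geneB", "geneC"}]
single, pair = count_mentions(abstracts)
p = cooccurrence_p_value(
    len(abstracts), single["geneA"], single["geneB"],
    pair[frozenset({"geneA", "geneB"})],
)
print(p)
```

With many gene pairs tested, the resulting p-values would need correction for multiple testing before any threshold is applied.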
Extracting transient interactions through data integration
Mining for ELM mediated interactions
• ELM pattern matching against D. melanogaster SP-proteome using species and domain filters
• Assignment of SMART domains
• Find pairs of proteins having a SMART domain and the corresponding ligand ELM
• Overlay with Y2H protein interaction set by Curagen
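The pairing and overlay steps above can be sketched in a few lines. Everything named here is a placeholder: `TOY_LIG`, `TOY_DOMAIN`, the regular expression, and the sequences are invented for illustration and are not real ELM or SMART entries, and the species and domain filters are left out.

```python
import re


def elm_smart_pairs(sequences, domains, elm_patterns, elm_to_domain):
    """Pair each domain-carrying protein with each protein matching its ligand ELM.

    sequences:     protein id -> amino acid sequence
    domains:       protein id -> set of domain names assigned to it
    elm_patterns:  ELM name -> regular expression
    elm_to_domain: ELM name -> domain that binds the motif
    """
    carriers = {
        elm: {pid for pid, seq in sequences.items() if re.search(pattern, seq)}
        for elm, pattern in elm_patterns.items()
    }
    pairs = set()
    for elm, domain in elm_to_domain.items():
        for ligand in carriers.get(elm, ()):
            for receptor, assigned in domains.items():
                # Assumption: self-pairs are not of interest.
                if domain in assigned and receptor != ligand:
                    pairs.add((receptor, ligand, elm))
    return pairs


def overlay_with_interactions(pairs, interactions):
    """Keep only predicted pairs that also appear in an interaction set."""
    observed = {frozenset(i) for i in interactions}
    return {p for p in pairs if frozenset(p[:2]) in observed}


sequences = {"P1": "MKTAWRPW", "P2": "MSSGLLK", "P3": "MAAWRPWKK"}
domains = {"P2": {"TOY_DOMAIN"}}
patterns = {"TOY_LIG": r"WRPW$"}          # toy C-terminal motif
pairs = elm_smart_pairs(sequences, domains, patterns, {"TOY_LIG": "TOY_DOMAIN"})
print(overlay_with_interactions(pairs, [("P1", "P2")]))
```

Pairs that survive the overlay have two independent lines of support: a domain/motif match and an experimentally observed interaction.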
Summary: Have ELMs – want function
• There is a huge potential in using ELMs in addition to domains for function prediction
• Conservation of protein features (such as ELMs) in orthologs underlines their importance for protein function
• Integration of ELMs with other evidence types can be used to extract likely (transient) ELM mediated interactions
• Work still remains to be done:
– The false positive rate is still too high for predictive purposes
– Better ELM models are needed
– Better filters are needed
• Wild and crazy ideas
– Overlay SMART/ELM pairs with STRING predicted associations
– Functional associations from ELM/SMART vectors
Acknowledgments
• You – the ELM team
• DisEMBL team
– Rune Linding
– Francesca Diella
– Peer Bork
– Toby Gibson
– Rob Russell
• The STRING team
– Peer Bork
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Daniel Jaeggi
– Steffen Schmidt
• ArrayProspector
– Julien Lagarde
• NetView
– Sean Hooper
• The ProtFun team
– Søren Brunak
– Ramneek Gupta
– Can Kesmir
– Kristoffer Rapacki
– Hans-Henrik Stærfeldt
– Henrik Nielsen
– Nikolaj Blom
– Claus A.F. Andersen
– Anders Krogh
– Steen Knudsen
– Chris Workman
• The EUCLID team
– Alfonso Valencia
– Damien Devos
– Javier Tamames
Thank you!