Predicting interactions between genes based on genome sequence comparisons
The “genomic context” component of STRING
Bioinformatics seminar series, 15-11-2005
Berend Snel
Today
• Announcement: the seminar of Jakob de Vlieg on 22 November is canceled. Please consult the website of the seminar series (www.cmbi.ru.nl/edu/seminars) for the new date.
• Seminar (today); please ask questions!
• Handing out article and questions: “Identification of a bacterial regulatory system for ribonucleotide reductases by phylogenetic profiling.” Read the article and hand in the answers to the questions by Monday November 28th.
Contents
• Predicting functional interactions between proteins; what & why
• Genomic context methods
– General
– Gene fusion
– Gene order
– Presence / absence of genes across genomes
• Integration and benchmarking of predictions
• Biochemistry by other means: BolA
• In addition to genomic context: functional genomics data
Complete genomes, now what?
• Post-genomic era = we have the parts list (complete genomes)
• To understand the cell we need to know the functions of the genes
A bacterial genome
     gene            408..1748
                     /gene="dnaA"
                     /locus_tag="BCE33L0001"
                     /old_locus_tag="BCZK0001"
     CDS             408..1748
                     /gene="dnaA"
                     /locus_tag="BCE33L0001"
                     /old_locus_tag="BCZK0001"
                     /inference="non-experimental evidence, no additional details recorded"
                     /codon_start=1
                     /transl_table=11
                     /product="chromosomal replication initiator protein"
                     /protein_id="AAU20227.1"
                     /db_xref="GI:51978677"
                     /translation="MENISDLWNSALKELEKKVSKPSYETWLKSTTAHNLKKDVLTIT
                     APNEFARDWLESHYSELISETLYDLTGAKLAIRFIIPQSQAEEEIDLPPAKPNAAQDD
                     SNHLPQSMLNPKYTFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHL
                     MHAIGHYVIEHNPNAKVVYLSSEKFTNEFINSIRDNKAVDFRNKYRNVDVLLIDDIQF
                     LAGKEQTQEEFFHTFNALHEESKQIVISSDRPPKEIPTLEDRLRSRFEWGLITDITPP
                     DLETRIAILRKKAKAEGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLINKDIN
                     ADLAAEALKDIIPNSKPKIISIYDIQKAVGDVYQVKLEDFKAKKRTKSVAFPRQIAMY
                     LSRELTDSSLPKIGEEFGGRDHTTVIHAHEKISKLLKTDTQLQKQVEEINDILK"
     gene            1927..3066
                     /gene="dnaN"
                     /locus_tag="BCE33L0002"
                     /old_locus_tag="BCZK0002"
     CDS             1927..3066
                     /gene="dnaN"
                     /locus_tag="BCE33L0002"
                     /old_locus_tag="BCZK0002"
                     /EC_number="2.7.7.7"
                     /inference="non-experimental evidence, no additional details recorded"
                     /codon_start=1
                     /transl_table=11
                     /product="DNA polymerase III, beta subunit"
                     /protein_id="AAU20226.1"
                     /db_xref="GI:51978676"
                     /translation="MRFTIQKDYLVRSVQDVMKAVSSRTTIPILTGIKVVATEEGVTL
                     TGSDADISIESFIPVEEDGKEIVEVKQSGSIVLQAKYFSEIVKKLPKETVEISVENHL
                     MTKITSGKSEFNLNGLDSAEYPLLPQIEEHHVFKIPTDLLKHMIRQTVFAVSTSETRP
                     ILTGVNWKVYNSELTCIATDSHRLALRKAKIEGIADEFQANVVIPGKSLNELSKILDE
                     SEEMVDIVITEYQVLFRTKHLLFFSRLLEGNYPDTTRLIPAESKTDIFVNTKEFLQAI
                     DRASLLARDGRNNVVKLSTLEQAMLEISSNSPEIGKVVEEVQCEKVDGEELKISFSAK
                     YMMDALKALDSTEIKISFTGAMRPFLIRTVNDESIIQLILPVRTY"
For most genes in any genome we need function prediction
- E. coli, the most intensively studied organism: only 1924 genes (~43%) have been (partially) experimentally characterized.
What is function?
Various levels of description:
Sequence similarity/homology has the largest relevance for “Molecular Function”. This aspect of protein function is best conserved. Molecular function can often be predicted from similarities between protein sequences (BLAST), or structures.
Predicting protein function
Homology: BLAST and / or SMART/PFAM/CDD
gi|22209068|Mayven [Homo sapiens] 1159
gi|21410410|Klhl2 protein [Mus musculus] 1145
. . .
. . .
gi|55725960|hypothetical protein [Pongo pygmaeus] 887
gi|6644176|Klhl3 [Homo sapiens] 885
gi|19354513|Klhl3 protein [Mus musculus] 765
gi|12644384| Ring canal kelch protein [Drosophila melanogaster] 676
“Beyond” homology and molecular function
Homology-based function prediction works very well, yet:
• A large fraction of genes are poorly described (no homologs, uncharacterized homologs; this holds for ~60% of the human genes)
• There are other aspects of function: functional associations, e.g. the target of a protein kinase or a transcriptional regulator, i.e. to understand the cell we need to know the interactions of the genes
Thus: predicting associations
There are many types of functional associations (AKA functional interactions, interactions, functional links, functional relations) in molecular biology
• Transcription regulation
• Signalling pathways
• Protein complexes
• Metabolic pathways
• Cellular process
Types of functional associations
• Metabolic pathways: filling gaps
• Transcription regulation
• Signalling pathways
• Cellular process (“DNA repair”, “Apoptosis”)
• Protein complexes
So how can knowledge of the functional associations help?
• If we did not know anything about the function of the protein, we can now say in which process it is involved
• If we already knew something about the function, we might now know much more about it (e.g. if we knew it was a hydrolase, we might now know in which metabolic pathway it is active)
• If the gene was already well characterized, we might understand its role better (e.g. new targets for a kinase)
How can we now predict / detect functional associations?
• Functional genomics / high-throughput experiments
• GENOMIC CONTEXT
Functionally associated proteins leave evolutionary traces of their relation in genomes
We can thus detect evolutionary traces of a functional association by comparing genomes
• Use the genome sequences themselves (through comparative genome analysis) for interaction prediction: genomic context methods
Genomic context is a tool to predict functional associations between genes
[Figure: “Fraction same KEGG map” (y-axis, 0 to 1) against “Score” (x-axis, 0 to 1); series: Fusion, Gene Order, Co-occurrence]
• Genomic context methods have been shown to be reliable indicators for functional interaction
• Genomic context is also known as in silico interaction prediction, or genomic associations
Three different genomic context methods in STRING
• Gene fusion, Rosetta stone method
• Conserved gene order between divergent genomes
• Co-occurrence of genes across genomes, phylogenetic profiles
Gene fusion
• i.e. the orthologs of two genes in another organism are fused into one polypeptide
• A very reliable indicator for functional interaction; partly because it is a relatively infrequent evolutionary event: 3470 distinct fusions when surveying 179 genomes
Fusion
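The fusion idea can be sketched in a few lines of Python. This is only an illustration, not the STRING procedure: it assumes we already have alignment hits of query proteins against the proteins of another genome, and it flags two queries that match largely non-overlapping regions of the same target protein. The `Hit` record, the `max_overlap` cutoff and all gene names are hypothetical, and the sketch does not check orthology, which the slide requires.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Alignment of a query protein to a region of a target protein (hypothetical record)."""
    query: str
    target: str
    start: int  # first aligned residue on the target (1-based, inclusive)
    end: int    # last aligned residue on the target (inclusive)


def overlap(a: Hit, b: Hit) -> int:
    """Number of target residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def fusion_candidates(hits: list[Hit], max_overlap: int = 10) -> set[tuple[str, str, str]]:
    """Return (query_a, query_b, target) triples in which two different queries
    align to largely non-overlapping regions of the same target protein."""
    found = set()
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            same_target = a.target == b.target
            if same_target and a.query != b.query and overlap(a, b) <= max_overlap:
                query_a, query_b = sorted((a.query, b.query))
                found.add((query_a, query_b, a.target))
    return found


# Made-up example: geneA and geneB cover separate halves of fusedAB,
# while geneC overlaps both and is therefore not reported.
hits = [
    Hit("geneA", "fusedAB", 1, 200),
    Hit("geneB", "fusedAB", 205, 430),
    Hit("geneC", "fusedAB", 150, 300),
]
candidates = fusion_candidates(hits)
print(candidates)
```

A real survey would also need to rule out promiscuous domains, which match many unrelated proteins and would otherwise produce spurious candidates.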
Gene order evolves rapidly
But …
Differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association
“Operons”
Comparison to pathways: conservation implies a functional association
[Figure: number of COGs (left y-axis, log scale, 1 to 10000) and average metabolic distance (right y-axis, 0 to 6) against co-occurrences in operons (x-axis, 0 to 30)]
Conserved gene order
• i.e. genes that are present over ‘sufficiently large’ evolutionary distances in the same gene cluster
• Contributes by far the most predictions
NB1: predicting operons is not trivial; in fact conserved gene order or functional association is a major clue
NB2: using ‘only’ operons without requiring conservation results in much less reliable function prediction
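A minimal sketch of the conservation requirement, under simplifying assumptions: each genome is given as an ordered list of (orthologous group, strand) pairs, only directly adjacent genes on the same strand count as neighbours, and every genome is treated as equally distant from the others. The talk instead asks for ‘sufficiently large’ evolutionary distances, which this toy version does not model. The function names, the `min_genomes` cutoff and the group labels are hypothetical.

```python
from collections import Counter


def adjacent_pairs(genome):
    """Unordered pairs of different orthologous groups that are direct
    neighbours on the same strand. `genome` is a list of (group, strand)."""
    pairs = set()
    for (group_1, strand_1), (group_2, strand_2) in zip(genome, genome[1:]):
        if strand_1 == strand_2 and group_1 != group_2:
            pairs.add(frozenset((group_1, group_2)))
    return pairs


def conserved_neighbours(genomes, min_genomes=2):
    """Map each neighbouring pair to the number of genomes in which it occurs,
    keeping only pairs seen in at least `min_genomes` genomes."""
    counts = Counter()
    for genome in genomes.values():
        counts.update(adjacent_pairs(genome))
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n >= min_genomes}


# Made-up genomes: grpA and grpB are same-strand neighbours in sp1 and sp2.
genomes = {
    "sp1": [("grpA", "+"), ("grpB", "+"), ("grpC", "-")],
    "sp2": [("grpD", "+"), ("grpB", "-"), ("grpA", "-")],
    "sp3": [("grpA", "+"), ("grpC", "+"), ("grpB", "+")],
}
conserved = conserved_neighbours(genomes)
print(conserved)
```

Dropping the `min_genomes` requirement corresponds to using ‘only’ operons in a single genome, which NB2 warns is much less reliable.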
Conserved gene order: an example from metabolism of propionyl-CoA
[Figure labels: “query”, “target”]
Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase
Presence / absence of genes
Gene content co-evolution. (The easy case, few genomes.)
Genomes share genes for phenotypes they have in common
Differences between gene content reflect differences in phenotypic potentialities
Presence / absence of genes
L. innocua (non-pathogenic) L. monocytogenes (pathogenic)
Genes involved in pathogenicity
Generalization: phylogenetic profiles / co-occurrence
         species 1  species 2  species 3  species 4  species 5  ...
Gene 1:      1          0          1          1          0       1
Gene 2:      1          1          0          0          1       0
Gene 3:      0          1          0          0          1       0
....
Co-occurrence of genes across genomes
• i.e. two genes have the same presence / absence pattern over multiple genomes: they have ‘co-evolved’
• AKA phylogenetic profiles
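As a small illustration, the three example profiles from the presence / absence matrix in this talk (Gene 1 to Gene 3, six genomes) can be compared pairwise. The sketch uses the Hamming distance, i.e. the number of genomes in which two genes differ in presence; this is one simple choice of similarity measure and is an assumption here, not necessarily the score used in STRING.

```python
from itertools import combinations

# Presence (1) / absence (0) per genome, copied from the example matrix in the talk.
profiles = {
    "Gene 1": (1, 0, 1, 1, 0, 1),
    "Gene 2": (1, 1, 0, 0, 1, 0),
    "Gene 3": (0, 1, 0, 0, 1, 0),
}


def hamming(a, b):
    """Number of genomes in which the two genes differ in presence / absence."""
    return sum(x != y for x, y in zip(a, b))


# Distance for every pair of genes; a small distance suggests co-occurrence.
distances = {
    (gene_a, gene_b): hamming(profiles[gene_a], profiles[gene_b])
    for gene_a, gene_b in combinations(profiles, 2)
}
most_similar = min(distances, key=distances.get)

for pair, distance in distances.items():
    print(pair, distance)
print("most similar pair:", most_similar)
```

With these profiles Gene 2 and Gene 3 differ in a single genome, so they are the best co-occurrence candidates, while Gene 1 and Gene 3 differ in all six. A plain distance ignores how related the genomes are: closely related species contribute near-duplicate evidence.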
Predicting function of a disease gene protein with unknown function, frataxin, using co-occurrence of genes across genomes
• Friedreich’s ataxia
• No (homolog with) known function
[Figure: presence / absence of genes across genomes. Species labels: A.aeolicus, Synechocystis, B.subtilis, M.genitalium, M.tuberculosis, D.radiodurans, R.prowazekii, C.crescentus, M.loti, N.meningitidis, X.fastidiosa, P.aeruginosa, Buchnera, V.cholerae, H.influenzae, P.multocida, E.coli, A.pernix, M.jannaschii, A.thaliana, S.cerevisiae, C.jejuni, C.albicans, S.pombe, H.sapiens, C.elegans, H. pylori, D.melan. Gene labels: cyaY Yfh1; hscB Jac1; hscA; ssq1; Nfu1; iscA Isa1-2; fdx Yah1; Arh1; RnaM; IscR; Hyp; iscS Nfs1; iscU Isu1-2; Atm1]
Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly
The opposite of co-occurrence: anti-correlation / complementary patterns: predicting analogous enzymes
Genes with complementary phylogenetic profiles tend to have a similar biochemical function.
Complementary patterns in thiamin biosynthesis predict analogous enzymes
Morett E, Korbel JO, Rajan E, Saab-Rincon G, Olvera L, Olvera M, Schmidt S, Snel B, Bork P. Nature Biotech 2003
Prediction of analogous enzymes is confirmed
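Complementary patterns can be checked with simple set operations. The sketch below is an assumption-laden toy, not the published analysis: each gene is represented by the set of genomes that contain it, and a pair counts as complementary when almost every genome has exactly one of the two. The function name, the returned fields and the genome labels are hypothetical.

```python
def complementarity(genomes_a, genomes_b, all_genomes):
    """Summarise how two genes partition a set of genomes.

    genomes_a / genomes_b: sets of genomes containing gene A / gene B.
    all_genomes: the full set of genomes considered (must not be empty).
    """
    only_a = genomes_a - genomes_b
    only_b = genomes_b - genomes_a
    both = genomes_a & genomes_b
    neither = all_genomes - (genomes_a | genomes_b)
    # Fraction of genomes with exactly one of the two genes.
    score = (len(only_a) + len(only_b)) / len(all_genomes)
    return {"only_a": only_a, "only_b": only_b, "both": both,
            "neither": neither, "score": score}


# Made-up example with five genomes.
all_genomes = {"g1", "g2", "g3", "g4", "g5"}
gene_a = {"g1", "g2", "g3"}
gene_b = {"g3", "g4", "g5"}

result = complementarity(gene_a, gene_b, all_genomes)
print(result["score"])    # fraction of genomes with exactly one gene
print(result["both"])     # genomes that break the complementary pattern
```

A high score alone is weak evidence. It becomes meaningful when the genomes considered all carry the rest of the pathway, so that some gene must be filling the step in question.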
Benchmark and integration: KEGG maps
[Figure: “Fraction same KEGG map” (y-axis, 0 to 1) against “Score” (x-axis, 0 to 1); series: Fusion, Gene Order, Co-occurrence]
Integrating genomic context scores into one single score
• Compare each individual method against an independent benchmark (KEGG), and find “equivalency”
• Multiply the chances that two proteins are not interacting and subtract from 1; naive Bayesian, i.e. assuming independence
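The combination rule in the second bullet can be written out directly. In this sketch each per-method score is assumed to be already calibrated against the benchmark, so that it can be read as the probability that the pair is functionally associated; the function name and the example scores are hypothetical.

```python
from math import prod


def integrate(scores):
    """Combine per-method probabilities under the independence assumption:
    1 minus the product of the chances that the pair is NOT interacting."""
    return 1.0 - prod(1.0 - s for s in scores)


# Made-up calibrated scores for one gene pair:
# fusion 0.6, gene order 0.5, co-occurrence 0.2.
combined = integrate([0.6, 0.5, 0.2])
print(round(combined, 2))  # 1 - 0.4 * 0.5 * 0.8 = 0.84
```

The combined score is never lower than the best single score, and weak evidence from several methods adds up. If the methods are in fact correlated, the independence assumption overstates the combined confidence.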
BenchmarkBenchmarkBenchmarkBenchmark
[Figure: coverage (number of predicted links between orthologous groups; y-axis ticks at 10, 100, 1000, 10000, 100000) plotted against accuracy (fraction of confirmed predictions, i.e. same KEGG map; x-axis, 0.5 to 1.0) for Fusion (norm.), Fusion (abs.), Gene Order (norm.), Gene Order (abs.), Cooccurrence and Integrated]
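One point on such a coverage/accuracy curve can be computed as in the sketch below. The function name, the data layout and the toy pairs are all hypothetical, and the sketch assumes for simplicity that every predicted pair can be checked against the benchmark (in practice only pairs with KEGG annotation can be judged).

```python
def coverage_and_accuracy(predictions, same_map_pairs, threshold):
    """Return (coverage, accuracy) for all predictions scoring >= threshold.

    predictions    : dict mapping frozenset({group_a, group_b}) -> score
    same_map_pairs : set of frozensets that share a KEGG map (the benchmark)
    coverage       : number of predicted links
    accuracy       : fraction of those links confirmed by the benchmark,
                     or None when nothing passes the threshold
    """
    selected = [pair for pair, score in predictions.items() if score >= threshold]
    coverage = len(selected)
    if coverage == 0:
        return 0, None
    confirmed = sum(1 for pair in selected if pair in same_map_pairs)
    return coverage, confirmed / coverage


# Toy data, invented for illustration only.
predictions = {
    frozenset({"OG_A", "OG_B"}): 0.9,
    frozenset({"OG_A", "OG_C"}): 0.7,
    frozenset({"OG_B", "OG_D"}): 0.4,
}
same_map_pairs = {frozenset({"OG_A", "OG_B"})}

for threshold in (0.3, 0.6, 0.8):
    print(threshold, coverage_and_accuracy(predictions, same_map_pairs, threshold))
```

Raising the threshold trades coverage for accuracy, which is what the curves for the individual methods and the integrated score show.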
Performance of genomic context compared to high-throughput interaction data
[Figure: coverage (fraction of reference set covered by data) plotted against accuracy (fraction of data confirmed by reference set), showing raw data, filtered data and parameter choices for purified complexes (TAP), purified complexes (HMS-PCI), yeast two-hybrid, mRNA co-expression, genomic context, synthetic lethality and combined evidence (two methods, three methods)]
An interaction of BolA with a mono-thiol glutaredoxin? (STRING)
BolA
BolA and Grx occur as neighbors in a number of genomes
[Figure labels: BolA, Grx]
BolA and Grx have an (almost) identical phylogenetic distribution
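How close two phylogenetic distributions are can be quantified by comparing presence/absence profiles across genomes. The sketch below is one simple way to do that, not necessarily the measure used in STRING; the function name is hypothetical and the two profiles are invented toy data, not the real BolA and Grx distributions.

```python
def profile_agreement(profile_a, profile_b):
    """Compare two presence/absence profiles over the same list of genomes.

    Each profile is a sequence of 0/1 values (1 = gene present in that genome).
    Returns (number of genomes where the profiles differ, Jaccard similarity).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    mismatches = sum(1 for a, b in pairs if a != b)
    both = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    jaccard = both / either if either else 0.0
    return mismatches, jaccard


# Invented profiles over six hypothetical genomes.
gene_x = [1, 1, 0, 1, 0, 1]
gene_y = [1, 1, 0, 1, 0, 0]
print(profile_agreement(gene_x, gene_y))
```

A small number of mismatches relative to the number of genomes corresponds to the "(almost) identical" distribution described on the slide.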
BolA and Grx have been shown to interact in Y2H in S. cerevisiae and D. melanogaster, and in Flag tag in S. cerevisiae
[Figure: BolA phylogeny]
BolA does have (predicted) interactions with cell-division / cell-wall proteins. Those appear secondary to the link with GrX
STRING has obtained a higher resolution in function prediction than phenotypic analyses
[Figure labels: cell division / cell wall; (oxidative) stress]
BolA is homologous to the peroxide reductase OsmC, suggesting a similar function
OsmC uses thiol groups of two evolutionarily conserved cysteines to reduce substrates
Problem: The BolA family does not have conserved cysteines.
… It would have to obtain its reducing equivalents from elsewhere …
[Figure: BolA family alignment]
BolA is (homologous to) a reductase
BolA interacts with GrX?
GrX provides BolA with reducing equivalents !? (or “scaffolding?”)
Prediction of interaction partner and molecular function complement each other
Genomic context: biochemistry by other means
Despite the high performance of genomic context methods, as a tool for function prediction it is not a button-press method
It is more like biochemistry by other means.
Often quite a lot of manual input and expert knowledge from the researcher is needed to distill associations into a concrete function prediction
Small-scale bioinformatics?
Contents
• Predicting functional interactions between proteins
• Genomic context methods
– General
– Fusion
– Gene order
– Co-occurrence across genomes
• Integration and benchmarking of predictions
• Interaction networks
• In addition to genomic context: functional genomics data
STRING currently in addition includes:
• Functional association data from large scale / high-throughput biochemical experiments (functional genomics data)
– protein complex purification
– yeast-2-hybrid
– ChIP-on-chip
– micro-array gene expression
• “known” functional relations, so called “legacy data”, as present in PubMed abstracts and databases like MIPS or KEGG.
• Handing out article and questions: “Identification of a bacterial regulatory system for ribonucleotide reductases by phylogenetic profiling.” Read the article and hand in the answers to the questions by Monday November 28th.