![Page 1: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/1.jpg)
Interactive tools for improved prediction of the effect of non-synonymous single nucleotide mutations on the protein
Gilad Wainreb
Department of Biochemistry and Molecular Biology
Under the supervision of Prof. Nir Ben-Tal
![Page 2: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/2.jpg)
Mutations and diseases
• The human population contains approximately 10 million single nucleotide polymorphism (SNP) sites1.
• A large portion of the known genetic diseases is associated with non-synonymous SNPs (nsSNP)2.
1 Sachidanandam et al. (2001) Nature, 409, 928–933.
2 Stenson, P.D. et al. (2008) J. Med. Genet., 45, 124–126.
![Page 3: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/3.jpg)
Cell culture: constitutive phosphorylation
nsSNPs analysis
Bercovich, D., Ganmore, I., Scott, L.M., Wainreb, G., Birger, Y., Elimelech, A., Shochat, C., Cazzaniga, G., Biondi, A., Basso, G. et al. (2008) Mutations of JAK2 in acute lymphoblastic leukaemias associated with Down's syndrome. Lancet, 372, 1484-1492.
Does it alter the protein’s function?
Phenotypic expression
How does the mutation affect the protein’s function?
R683G, R683S, R683K in the Janus kinase (JAK2)
Myeloproliferative disorders
![Page 4: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/4.jpg)
1Ashkenazy et al. (2010) NAR, web-server issue.
JAK2 R683G, R683S, R683K SNPs analysis
A homology model of the JAK2 pseudo kinase domain colored according to the ConSurf1 color scheme
![Page 5: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/5.jpg)
Mutagenesis studies: Deleterious
nsSNPs analysis
Bercovich, D., Ganmore, I., Scott, L.M., Wainreb, G., Birger, Y., Elimelech, A., Shochat, C., Cazzaniga, G., Biondi, A., Basso, G. et al. (2008) Mutations of JAK2 in acute lymphoblastic leukaemias associated with Down's syndrome. Lancet, 372, 1484-1492.
Does it alter the protein’s function?
Phenotypic expression
Why does the mutation affect the protein’s function?
R683G, R683S, R683K in the Janus kinase (JAK2)
Myeloproliferative disorders
Hindering the binding site
![Page 6: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/6.jpg)
Distribution of the Effects of Missense SNPs on Protein Molecular Function
Wang and Moult (2001) Human Mutation 17:263-270.
![Page 7: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/7.jpg)
Non-synonymous SNPs analysis
Deleterious/neutral
Disease
Protein Mutant stAbilitY Analyzer
Changes in protein stability
![Page 8: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/8.jpg)
Protein Mutant stAbilitY Analyzer
Wainreb et al. (2011) Bioinformatics 27, 3286
![Page 9: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/9.jpg)
Change in protein stability (ΔΔG)
What is protein stability?
• Protein stability is defined as the difference in Gibbs free energy (ΔG) between the unfolded and folded states of the protein.
• ΔΔG = ΔG_mutant − ΔG_WT
Why is protein stability important?
• Stability of the native conformation is important for proper function.
• 83% of the deleterious SNPs involve changes in protein stability¹.
1 Wang and Moult (2001) Human Mutation 17:263-270.
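The ΔΔG definition above can be written as a one-line helper. This is a minimal sketch; the function name and the two free-energy values are hypothetical and only illustrate the subtraction.

```python
def ddg(dg_mutant: float, dg_wt: float) -> float:
    """ΔΔG = ΔG_mutant − ΔG_WT, both in kcal/mol."""
    return dg_mutant - dg_wt


# Hypothetical ΔG values (kcal/mol), for illustration only.
print(ddg(dg_mutant=-6.5, dg_wt=-8.0))
```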
![Page 10: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/10.jpg)
Computational prediction methods
Atom-based / Amino acid-based
• Physical effective potentials¹,²: molecular dynamics
• Statistical effective potentials³,⁴: observed frequencies potentials
• Empirical effective potentials: machine learning (mostly)
• Evolutionary-, physicochemical- or sequence-based features
Tools: FoldX⁵, Prethermut⁶, PoPMuSiC-2.0⁷, I-Mutant2.0⁸
Energy terms: Van der Waals interactions, torsion angle, electrostatic terms

| # | Known ΔΔG | Entropic cost (EC) | … | Electrostatic (E) | Torsion angle (TA) | Van der Waals (VDW) |
|---|---|---|---|---|---|---|
| 1 | 1.02 | 0.2 | … | 0.4 | 30 | 0.8 |
| 2 | -0.3 | 0.4 | … | 0.9 | -0.7 | 1 |
| 3 | 0.8 | -0.9 | … | 0.2 | 0.5 | 0.6 |

ΔΔG = VDW + TA + E + … + EC
ΔΔG = w1*VDW + w2*TA + w3*E + … + wi*EC

1 Prevost et al. (1991) PNAS, 88, 10880-10884.
2 Seeliger et al. (2010) Biophysical Journal, 98, 2309-2316.
3 Bahar and Jernigan (1997) JMB, 266, 195-214.
4 Samudrala and Moult (1998) JMB, 275, 895-916.
5 Guerois et al. (2002) JMB, 320, 369-387.
6 Tian et al. (2010) BMC Bioinformatics, 11, 370.
7 Dehouck et al. (2009) Bioinformatics, 25, 2537-2543.
8 Capriotti et al. (2005) NAR, 33, W306-310.
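The two formulas differ only in the weights. The sketch below evaluates both forms for the first example row of the slide, assuming the column mapping VDW = 0.8, TA = 30, E = 0.4, EC = 0.2. The weights are hypothetical; in an empirical potential they would be fitted to the known ΔΔG values.

```python
def empirical_ddg(terms, weights=None):
    """Weighted sum of energy terms; with no weights, every term counts as 1."""
    if weights is None:
        weights = {name: 1.0 for name in terms}
    return sum(weights[name] * value for name, value in terms.items())


# Energy terms of the first example row (column mapping is an assumption).
row1 = {"VDW": 0.8, "TA": 30.0, "E": 0.4, "EC": 0.2}
# Hypothetical weights, for illustration only.
w = {"VDW": 0.5, "TA": 0.01, "E": 1.0, "EC": 1.0}

print(empirical_ddg(row1))      # unweighted form
print(empirical_ddg(row1, w))   # weighted (empirical) form
```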
![Page 11: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/11.jpg)
Preliminary study (goal)
• The ΔΔG values of mutations occurring at the same position tend to cluster.
• Prior knowledge of the ΔΔG values of other mutations at the query position might help in predicting the query’s ΔΔG.
Protein Mutant stAbilitY Analyzer
![Page 12: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/12.jpg)
Machine learning
• Learn from experience
• Correlate between description <--> outcome
![Page 13: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/13.jpg)
The prediction scheme
Dataset: mutations with a measured ΔΔG
• PoPMuSiC-DB¹: 2646 mutations in 137 proteins
• Potapov-DB²: 2155 mutations in 79 proteins
Features: describe the substitutions
Structural-based
• Solvent accessibility.
• Prethermut.
• PoPMuSiC-2.0.
Sequence-based
• The wild-type and mutant AAs.
• Physicochemical deviation.
Machine learning: find relations between the features and the observed ΔΔG
• Random Forests³ and collaborative-filtering-based prediction
1 Dehouck et al. (2009) Bioinformatics, 25, 2537-2543.
2 Potapov et al. (2009) Protein Eng Des Sel, 22, 553-560.
3 Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32.
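As an illustration of the sequence-based features, the sketch below turns a substitution string such as A54V into the wild-type residue, the mutant residue and one physicochemical deviation. The choice of the Kyte-Doolittle hydropathy scale and the function name are assumptions made for this example; the slides do not say which physicochemical properties are used.

```python
import re

# Kyte-Doolittle hydropathy scale (an assumed choice of property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def describe(substitution: str) -> dict:
    """Parse e.g. 'A54V' into simple sequence-based features."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if m is None or m.group(1) not in HYDROPATHY or m.group(3) not in HYDROPATHY:
        raise ValueError(f"not a valid substitution: {substitution!r}")
    wt, position, mutant = m.group(1), int(m.group(2)), m.group(3)
    return {
        "wt": wt,
        "position": position,
        "mutant": mutant,
        # Physicochemical deviation: mutant value minus wild-type value.
        "hydropathy_deviation": HYDROPATHY[mutant] - HYDROPATHY[wt],
    }


print(describe("A54V"))
```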
![Page 14: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/14.jpg)
No known ΔΔG records at the query position (traditional scheme)
A. Calculate the query’s features
B. Predict the ΔΔG using the features and a pre-calculated Random Forests model
C. Prediction results
![Page 15: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/15.jpg)
Machine learning algorithm: Random Forests¹
Decision tree (× 700)
• Substitutions are split according to descriptors (e.g. Buried?, WT = P, WT = A)
• Each leaf gives a predicted value (e.g. ΔΔG = 0.01)
1 Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32.
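How a forest turns many decision trees into one prediction can be sketched as follows. The two hand-written trees and their leaf values are hypothetical; a real Random Forests model grows each tree from a bootstrap sample of the training data, and this sketch only shows the prediction step (walk each tree, then average).

```python
from statistics import mean

# A tree is either a float leaf (predicted ΔΔG) or a tuple
# (descriptor, branch_if_true, branch_if_false).


def predict_tree(tree, query):
    """Follow yes/no descriptor tests until a leaf is reached."""
    while not isinstance(tree, float):
        descriptor, if_true, if_false = tree
        tree = if_true if query[descriptor] else if_false
    return tree


def predict_forest(forest, query):
    """The forest prediction is the mean of the individual tree predictions."""
    return mean(predict_tree(tree, query) for tree in forest)


# Two hypothetical trees (the slide's forest has 700).
toy_forest = [
    ("buried", ("wt_is_P", 1.5, 0.9), 0.01),
    ("wt_is_A", 0.3, ("buried", 1.2, 0.1)),
]
query = {"buried": True, "wt_is_P": False, "wt_is_A": False}
print(predict_forest(toy_forest, query))
```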
![Page 16: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/16.jpg)
No known ΔΔG records at the query position (traditional scheme)
A. Calculate the query’s descriptors
B. Predict the ΔΔG using the descriptors and a pre-calculated Random Forests model
C. Prediction results
![Page 17: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/17.jpg)
Query mutation with other known mutations at the query position
Collaborative filtering and content-based algorithm
Input: ΔΔG values of the “known” mutations
1. Calculate descriptors for the mutations with known ΔΔG and for the query mutation
2. Add the “known” mutations to the training set
3. Rebuild the Random Forests model
4. Predict the query’s ΔΔG using the new model
Output: ΔΔG prediction
![Page 18: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/18.jpg)
Collaborative-filtering
![Page 19: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/19.jpg)
Bellkor¹ collaborative-filtering algorithm
First we represent the mutation data as a sparse matrix (rows: possible mutation outcomes; columns: protein positions):

| Mutation \ Position | 1WQ549 | 1LZ12 | 4LYZ91 | 1EY041 | 5DFR91 | 1EY071 | 1SAK342 | 1HMS64 |
|---|---|---|---|---|---|---|---|---|
| A | -0.41 | 1.51 | NA | NA | NA | 0.55 | 0.38 | NA |
| … | … | … | … | … | … | … | … | … |
| W | -2.43 | NA | NA | NA | 1.34 | NA | NA | NA |
| Y | NA | NA | 3.07 | NA | NA | NA | NA | NA |

The same representation for movie ratings (rows: users; columns: movies):

| User \ Movie | Star Wars | E.T. | Jaws | Top Gun | Con Air | Pulp Fiction | Troy | Wall-E |
|---|---|---|---|---|---|---|---|---|
| Jane | 4 | 1 | NA | NA | NA | 5 | 3 | NA |
| … | … | … | … | … | … | … | … | … |
| Tom | 4 | NA | 1 | NA | NA | NA | NA | NA |
| John | 2 | NA | NA | NA | 3 | NA | NA | NA |

The purpose of the algorithm is to predict the missing elements in the matrix by creating a model according to the available data.
1 Koren, Y. (2008) Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), pp. 426-434.
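One simple way to hold such a sparse matrix is a dictionary keyed by (mutation, position), storing only the known ΔΔG values. This is a sketch: the variable and function names are hypothetical, and only the three rows shown on the slide are included.

```python
MUTATIONS = ["A", "W", "Y"]
POSITIONS = ["1WQ549", "1LZ12", "4LYZ91", "1EY041",
             "5DFR91", "1EY071", "1SAK342", "1HMS64"]

# Only known ΔΔG values are stored; every other cell is "NA".
known = {
    ("A", "1WQ549"): -0.41, ("A", "1LZ12"): 1.51,
    ("A", "1EY071"): 0.55, ("A", "1SAK342"): 0.38,
    ("W", "1WQ549"): -2.43, ("W", "5DFR91"): 1.34,
    ("Y", "4LYZ91"): 3.07,
}


def missing_entries(known, mutations, positions):
    """Cells the collaborative-filtering model would have to predict."""
    return [(m, p) for m in mutations for p in positions if (m, p) not in known]


def known_at_position(known, position):
    """ΔΔG values already measured at one protein position."""
    return {m: v for (m, p), v in known.items() if p == position}


print(len(missing_entries(known, MUTATIONS, POSITIONS)))
print(known_at_position(known, "1WQ549"))
```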
![Page 20: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/20.jpg)
Bellkor collaborative-filtering model
The Bellkor algorithm takes into account only the ΔΔG table:
• The baseline estimators model
• The neighborhood model
• The latent factor model
Baseline estimators model
• Different positions and amino acids have different ΔΔG tendencies.
• Predict the baseline estimators for each amino acid and position.
R_ui = b_ui + …
b_ui = b_i + b_u

| Mutation \ Position | 1WQ549 | 1LZ12 | 4LYZ91 | 5DFR91 | 1EY071 | 1SAK342 | 1HMS64 |
|---|---|---|---|---|---|---|---|
| A | -0.41 | 1.51 | NA | NA | 0.55 | 0.38 | NA |
| … | … | … | … | … | … | … | … |
| W | -2.43 | NA | NA | 1.34 | NA | NA | NA |
| Y | NA | NA | 3.07 | NA | NA | 2.3 | NA |

b_u: one baseline estimator per row (amino acid); b_i: one baseline estimator per column (position).
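A rough feel for the baseline estimators can be had with plain averaging. The sketch below is an assumption, not the fitting procedure of the slides (which use stochastic gradient descent): b_i is taken as the mean known ΔΔG in a column, and b_u as the mean residual left in a row. The function names are hypothetical; the values are those of the table on this slide.

```python
from collections import defaultdict
from statistics import mean

known = {
    ("A", "1WQ549"): -0.41, ("A", "1LZ12"): 1.51,
    ("A", "1EY071"): 0.55, ("A", "1SAK342"): 0.38,
    ("W", "1WQ549"): -2.43, ("W", "5DFR91"): 1.34,
    ("Y", "4LYZ91"): 3.07, ("Y", "1SAK342"): 2.3,
}


def fit_baselines(known):
    """Estimate b_i (per position) and b_u (per amino acid) by averaging."""
    by_position = defaultdict(list)
    for (_, pos), value in known.items():
        by_position[pos].append(value)
    b_i = {pos: mean(values) for pos, values in by_position.items()}

    by_amino_acid = defaultdict(list)
    for (aa, pos), value in known.items():
        by_amino_acid[aa].append(value - b_i[pos])  # residual after b_i
    b_u = {aa: mean(values) for aa, values in by_amino_acid.items()}
    return b_i, b_u


def baseline(b_i, b_u, aa, pos):
    """b_ui = b_i + b_u; unseen rows or columns contribute 0."""
    return b_i.get(pos, 0.0) + b_u.get(aa, 0.0)


b_i, b_u = fit_baselines(known)
print(baseline(b_i, b_u, "Y", "1WQ549"))  # a missing cell of the table
```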
![Page 21: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/21.jpg)
Bellkor collaborative-filtering model
Baseline estimators model
R_ui = b_ui + …
Up to now, no “biology” has been introduced into our model, only ΔΔG data.
Content-based model
Use a linear regression with a subset of the features to describe the mutation: X_ui F
1) Pro-Maya (RF)
2) Prethermut
3) PoPMuSiC-2.0
4) Solvent accessibility
Pro-Maya algorithm
For example: R_ui = b_ui + F1*solvent accessibility + F2*Prethermut + …
![Page 22: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/22.jpg)
Collaborative-Filtering
Training:
• A matrix representation of the known ΔΔGs
• Generate a model to relate the ΔΔG of mutations (stochastic gradient descent)
• Calibrated model
Prediction:
• Query mutation (at a position that is present in the training set)
• Predicted ΔΔG
![Page 23: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/23.jpg)
Stochastic gradient descent
We want to find the best model, i.e. the best set of b_i, b_u and F for which |R_ui − r_ui| < ε
R_ui = b_ui + X_ui F
1. Create random values for b_i, b_u and F.
2. Iterate over all the known values of the table.
3. For each value, measure the error between the model’s prediction (R_ui) and the known value (r_ui).
4. Move down the slope and go to 2 until |R_ui − r_ui| < ε.
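The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Pro-Maya implementation: there is no regularization, the learning rate and ε are arbitrary, the single feature value per cell is hypothetical (think of it as a solvent accessibility), and only four known cells from the ΔΔG table are used.

```python
import random


def predict(b_u, b_i, F, u, i, x):
    """R_ui = b_u + b_i + X_ui · F."""
    return b_u[u] + b_i[i] + sum(f * xv for f, xv in zip(F, x))


def sgd_fit(known, features, n_features, lr=0.05, eps=1e-3,
            max_epochs=10000, seed=0):
    rng = random.Random(seed)
    # Step 1: random starting values for b_u, b_i and F.
    b_u = {u: rng.uniform(-0.1, 0.1) for u, _ in known}
    b_i = {i: rng.uniform(-0.1, 0.1) for _, i in known}
    F = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    for _ in range(max_epochs):
        worst = 0.0
        # Step 2: iterate over all the known values of the table.
        for (u, i), r in known.items():
            x = features[(u, i)]
            # Step 3: error between the known value and the prediction.
            err = r - predict(b_u, b_i, F, u, i, x)
            worst = max(worst, abs(err))
            # Step 4: move every parameter down the slope of the squared error.
            b_u[u] += lr * err
            b_i[i] += lr * err
            for k, xv in enumerate(x):
                F[k] += lr * err * xv
        if worst < eps:  # stop once |R_ui - r_ui| < eps for every known cell
            break
    return b_u, b_i, F


known = {("A", "1WQ549"): -0.41, ("A", "1LZ12"): 1.51,
         ("W", "1WQ549"): -2.43, ("W", "5DFR91"): 1.34}
# Hypothetical feature vectors, one value per known cell.
features = {("A", "1WQ549"): [0.2], ("A", "1LZ12"): [0.8],
            ("W", "1WQ549"): [0.1], ("W", "5DFR91"): [0.5]}

b_u, b_i, F = sgd_fit(known, features, n_features=1)
for (u, i), r in known.items():
    print(u, i, r, round(predict(b_u, b_i, F, u, i, features[(u, i)]), 2))
```

With so few known cells the model can reproduce the training values almost exactly; on real data a regularization term is normally added so that the fitted values generalize to the missing cells.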
![Page 25: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/25.jpg)
Results
![Page 26: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/26.jpg)
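How do we test our prediction performance?
Cross-validation
• All substitutions are divided into subsets.
• In each round, one subset is the test set and the remaining substitutions are the learning set.
• The test-set predictions of all rounds are combined into results on all substitutions.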
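The splitting scheme can be sketched as a small generator. The number of folds, the shuffling and the function name are assumptions; the slide does not state how many subsets were used.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (learning_set, test_set) pairs; each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test_set in enumerate(folds):
        learning_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield learning_set, test_set


substitutions = ["A54V", "H89Y", "N30A", "K90F", "R683G", "R683S"]
for learning_set, test_set in k_fold_splits(substitutions, k=3):
    print("test:", test_set, "learning:", learning_set)
```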
![Page 27: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/27.jpg)
Cross-validation results
Performance of current methods on the Potapov-DB

| Method | Pearson correlation coefficient |
|---|---|
| Rosetta | 0.26 |
| Hunter | 0.45 |
| I-Mutant2.0 | 0.54 |
| FoldX | 0.55 |
| CC/PBSA | 0.56 |
| EGAD | 0.59 |
| PoPMuSiC-2.0 | 0.62 |
| Combining method | 0.64 |
| Prethermut | 0.72 |
| Pro-Maya | 0.77 |
![Page 28: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/28.jpg)
Cross-validation results
PCC – Pearson correlation coefficient

| Query position | Number of mutations | Performance measure | Pro-Maya: Random Forests | Pro-Maya: Collaborative filtering | Prethermut |
|---|---|---|---|---|---|
| No known ΔΔGs at the query position | 910 | PCC | 0.65±0.02 | | 0.61±0.02 |
| | | RMSE (kcal/mol) | 1.09 | | 1.14 |
| One or more known ΔΔGs at the query position | 1735 | PCC | 0.79±0.01 | 0.82±0.01 | 0.75±0.01 |
| | | RMSE (kcal/mol) | 0.92 | 0.86 | 0.99 |
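Both performance measures are easy to compute from paired lists of predicted and observed ΔΔG values. The sketch below uses the standard definitions; the example numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root mean square error, in the units of the input (kcal/mol here)."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical predicted and observed ΔΔG values (kcal/mol).
predicted = [0.5, 1.8, -0.2, 2.9]
observed = [0.3, 1.5, 0.1, 3.4]
print(round(pearson(predicted, observed), 2), round(rmse(predicted, observed), 2))
```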
![Page 29: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/29.jpg)
Validation set results
Performance of current methods on the validation set

| Method | Pearson correlation coefficient |
|---|---|
| I-Mutant2.0 | 0.29 |
| Eris | 0.35 |
| CUPSAT | 0.37 |
| FoldX | 0.4 |
| Automute | 0.46 |
| Dmutant | 0.48 |
| PoPMuSiC-2.0 | 0.69 |
| Prethermut | 0.72 |
| Pro-Maya | 0.79 |
![Page 30: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/30.jpg)
Pro-Maya also performs well in its sequence-based prediction scheme
[Figure: Pearson correlation coefficient of Pro-Maya, Pro-Maya sequence only, and Prethermut]
![Page 31: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/31.jpg)
How does the number of mutations with known ΔΔG values in the query position affect the prediction accuracy?
• The prediction accuracy improves as the number of known records at the query position increases.
• One or two known ΔΔGs are sufficient to improve the prediction accuracy significantly (alanine scanning).
[Figure: Pearson correlation coefficient versus the number of known ΔΔG values per query position; categories: 0 (910), 1 (524), 2 (327), >2 (690)]
![Page 32: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/32.jpg)
Conclusions
• One or two known records at the query position improve the prediction.
• The improvement is independent of the amino acid identity of the known records and of the sequence identity of the query protein to the training set.
• Availability: bental.tau.ac.il/ProMaya
![Page 33: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/33.jpg)
Non-synonymous SNPs analysis
Deleterious/neutral
Disease
Protein Mutant stAbilitY Analyzer
Changes in protein stability
![Page 34: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/34.jpg)
Prediction of deleterious SNPs
Wainreb G, Ashkenazy H, Bromberg Y, Starovolsky-Shitrit A, Haliloglu T, Ruppin E, Avraham KB, Rost B, Ben-Tal N. MuD: an interactive web server for the prediction of non-neutral substitutions using protein structural data. Nucleic Acids Res. 2010 Jul 1;38 Suppl:W523-8.
![Page 35: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/35.jpg)
Previous work
Sequence-based:
1) Agreement with the profiles of AA residues in the alignment (SIFT)1.
2) Physicochemical-based features (MAPP)2.
3) Sequence-based prediction of structural features (SNAP etc.)(NN)3,4.
Sequence- and structure-based:
1) Observed solvent accessibility (Tree classifier)5.
2) Distance to the ligand6.
3) Micro-environment description (SVM)7.
4) SWISS-PROT annotations (PolyPhen)8.
1 Ng et al. (2001) Gen. Res., 11.
2 Stone et al. (2005) Gen. Res., 15.
3 Bromberg et al. (2007) NAR, 35.
4 Ferrer-Costa et al. (2004) Proteins, 57.
5 Saunders et al. (2002) JMB, 322.
6 Chasman et al. (2001) JMB, 307.
7 Capriotti et al. (2005) Bioinfo, 21.
8 Ramensky et al. (2002) NAR, 30.
![Page 36: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/36.jpg)
Prediction of deleterious SNPs
A harder problem: Deleterious/neutral
• Ligand binding site
• Catalytic site
• Protein-protein interface site
• Stop codon
• Changes in protein stability
Disease
Missing data
Dirty datasets
![Page 37: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/37.jpg)
Prediction of deleterious mutations
Dataset

| Substitution number | 1 | 2 | … | 3000 | 3001 |
|---|---|---|---|---|---|
| Substitution | A54V | H89Y | … | N30A | K90F |
| Observed ΔΔG | 1 | -1 | … | 1 | 1 |

Features: describe the substitutions
Machine learning: find a pattern that distinguishes deleterious versus neutral substitutions
Deleterious/Neutral
![Page 38: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/38.jpg)
Dealing with the problem of noisy data
Known mutations in proteins with a solved crystal structure
Experimentally validated
Evolutionary model substitutions³
Protein Mutant Database²
Swiss-Prot¹ variants
1 Bairoch et al. NAR, 2005, 33.
2 Kawabata et al. NAR, 1999, 27.
3 Bromberg et al. NAR, 2007, 35.
![Page 40: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/40.jpg)
Dealing with the problem of missing data
Structural and sequence based descriptors
Sequence identity with nearest homolog bearing the substitution
Swiss-Prot annotations
Apo- Holo
Substitution matrix distance
SIFT analysis Secondary structure assignment
Neighborhood composition in homologs
Evolutionary conservation (ConSurf1)
Physicochemical deviation
Cα B-factor
Solvent accessibility
Number of sequences in the alignment
Distance from the ligand and binding site conservation
Describes the importance of the query position
SNAP’s prediction
1Ashkenazy et al. (2010) NAR, web-server issue.
![Page 41: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/41.jpg)
Prediction of deleterious mutations
Dataset

| Substitution number | 1 | 2 | … | 3000 | 3001 |
|---|---|---|---|---|---|
| Substitution | A54V | H89Y | … | N30A | K90F |
| Observed ΔΔG | 1 | -1 | … | 1 | 1 |

Features: describe the substitutions
Machine learning: Random Forests
Deleterious/Neutral
![Page 42: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/42.jpg)
Results
![Page 43: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/43.jpg)
Cross-validation results
[Figure: Cross-validation results of MuD and current methods (PolyPhen, SIFT, SNAP, MuD); y-axis: Matthews correlation coefficient]
[Figure: Cross-validation results of MuD and current methods (MuD, SNAP, SIFT, PolyPhen) analyzed according to the SCOP class of the query protein; y-axis: Matthews correlation coefficient; x-axis: SCOP class]
SCOP classes:
• All alpha proteins (18%)
• All beta proteins (16%)
• Alpha and beta proteins (a/b) (27%)
• Alpha and beta proteins (a+b) (22%)
• Multi-domain proteins (alpha and beta) (7%)
• Membrane and cell surface proteins and peptides (4%)
• Small proteins (3%)
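The Matthews correlation coefficient used on this slide is computed from the four cells of a deleterious/neutral confusion matrix. The sketch below uses the standard formula; the counts are hypothetical, and returning 0 for an empty row or column is a common convention chosen here, not something the slides specify.

```python
from math import sqrt


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: deleterious = positive class, neutral = negative class.
print(round(matthews_cc(tp=80, tn=70, fp=30, fn=20), 2))
```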
![Page 44: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/44.jpg)
Importance of interactivity
Test set

| Method | Lac repressor | T4 lysozyme | HIV protease |
|---|---|---|---|
| PolyPhen | 0.54 | 0.36 | NA |
| SIFT | 0.40 | 0.34 | 0.51 |
| SNAP | 0.41 | 0.37 | 0.30 |
| MuD | 0.61 | 0.27 | 0.45 |
| MuD SA | 0.64 | 0.45 | 0.54 |

Predicted oligomerization state; predicted naturally occurring ligands
Known oligomerization state; known naturally occurring ligands
![Page 45: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/45.jpg)
PDB structure template selection
PDB ID or user uploaded coordinate file
Protein Sequence
Accept query mutations Accept query mutations
Oligomerization state selection and removal of irrelevant chains
Filtering of biologically irrelevant ligands
Selection of interesting residues
Results page
![Page 46: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/46.jpg)
Conclusions
• Development of an interactive tool that can incorporate user-reported data into the prediction and thereby improve it.
• Biological data is increasing rapidly, allowing us to further improve the accuracy (annotations, protein structures).
![Page 47: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/47.jpg)
Thanks
• Prof. Nir Ben-Tal
• Dr. Guy Nimrod
• Haim Ashkenazy
Protein Mutant stAbilitY Analyzer
• Prof. Lior Wolf
• Dr. Yves Dehouck

• Dr. Yana Bromberg
• Alina Starovolsky-Shitrit
• Prof. Turkan Haliloglu
• Prof. Eytan Ruppin
• Prof. Karen B. Avraham
• Prof. Burkhard Rost
To all my friends in the lab:
• Dr. Yanay Ofran
• Maya Schushan
• Yana Gofman
• Daphna Meroz
• Inbar Fish
• Noam Chen
• Ofir Goldenberg
• Dr. Gal Almogy
• Ori Kalid
• Dr. Meytal Landau
• Dr. Sarel Fleishman
• Uri Ron
• Adva Suez
• Yariv Barkan
• Matan Kalman
To my wife Adi for putting up with all the Saturdays and nights...
Funding:
• Eurohear project.
• DIP program.
Just a reminder that not all mutations are bad
![Page 48: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/48.jpg)
Additional experimental data
![Page 49: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/49.jpg)
![Page 50: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/50.jpg)
Random Forests cross-validation results
![Page 51: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/51.jpg)
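How does the amino acid type of mutations with known ΔΔG values in the query position affect the prediction accuracy?
Known ΔΔG values at position 1WQ549 (query: D):

| Mutation | A | C | D | E | F | G | H | I | … | Y |
|---|---|---|---|---|---|---|---|---|---|---|
| ΔΔG | NA | NA | ? | 0.02 | NA | 1.2 | NA | -0.4 | … | NA |

| Amino acid pair | Miyata physicochemical distance |
|---|---|
| A, G | 0.9 |
| A, E | 3.98 |

Accuracy versus the minimal Miyata physicochemical distance:

| RMSE | Minimal Miyata physicochemical distance |
|---|---|
| 0.3 | 0.9 |
| 0.5 | 3.98 |
| … | … |
| 0.02 | 2.37 |

0.14
Miyata et al. (1979) Journal of Molecular Evolution, 12, 219-236.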
![Page 52: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/52.jpg)
![Page 53: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/53.jpg)
How well Pro-Maya performs on query mutations in proteins that are not homologous to any of the proteins in the training set
Column headers: PM*, ΔΔG_RF, CFCB, ΔΔG_RF, CFCB; LOO type: LOO unseen / LOO all
The whole dataset
• PCC: 0.71±0.01, 0.74±0.01, 0.75±0.01, 0.77±0.01
• RMSE (kcal/mol): 1.04, 0.98, 0.96, 0.94
SRPM
• PCC: 0.60±0.02, 0.64±0.02
• RMSE (kcal/mol): 1.15, 1.10
MRPM
• PCC: 0.76±0.01, 0.79±0.01, 0.83±0.01, 0.82±0.01
• RMSE (kcal/mol): 0.98, 0.91, 0.84, 0.84
![Page 54: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/54.jpg)
Random Forests cross-validation results on the SRPM and MRPM subsets of PoPMuSiC-DB and Potapov-DB
SRPM - Single-Replacement Position Mutation
MRPM - Multi-Replacement Position Mutation
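A minimal sketch of how the two subsets could be separated, assuming SRPM covers positions with exactly one measured replacement and MRPM covers positions with more than one, which is how the definitions above read. The record fields and the position identifiers are hypothetical; the ΔΔG values are toy numbers taken from the example tables in these slides.

```python
from collections import defaultdict


def split_srpm_mrpm(mutations):
    """Split mutation records into (SRPM, MRPM) lists by replacements per position."""
    by_position = defaultdict(list)
    for record in mutations:
        by_position[(record["protein"], record["position"])].append(record)

    srpm, mrpm = [], []
    for group in by_position.values():
        distinct_mutants = {record["mutant"] for record in group}
        target = srpm if len(distinct_mutants) == 1 else mrpm
        target.extend(group)
    return srpm, mrpm


# Toy records (field names are assumptions, not the datasets' real schema).
mutations = [
    {"protein": "1WQ5", "position": 49, "mutant": "E", "ddg": 0.02},
    {"protein": "1WQ5", "position": 49, "mutant": "G", "ddg": 1.2},
    {"protein": "1WQ5", "position": 49, "mutant": "I", "ddg": -0.4},
    {"protein": "1LZ1", "position": 2, "mutant": "A", "ddg": 0.4},
]
srpm, mrpm = split_srpm_mrpm(mutations)
print(len(srpm), len(mrpm))
```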
![Page 55: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/55.jpg)
validation set results
![Page 56: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/56.jpg)
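| Subset | Dataset | Performance measure | Pro-Maya sequence-based | Pro-Maya structure-based | Prethermut |
|---|---|---|---|---|---|
| All the dataset | Potapov-DB | PCC | 0.68±0.02 | 0.77±0.01 | 0.72±0.01 |
| All the dataset | Potapov-DB | RMSE (kcal/mol) | 1.27 | 1.09 | 1.20 |
| All the dataset | PoPMuSiC-DB | PCC | 0.69±0.01 | 0.77±0.01 | 0.71±0.01 |
| All the dataset | PoPMuSiC-DB | RMSE (kcal/mol) | 1.06 | 0.94 | 1.05 |
| SRPM | Potapov-DB | PCC | 0.44±0.03 | 0.59±0.04 | 0.57±0.03 |
| SRPM | Potapov-DB | RMSE (kcal/mol) | 1.45 | 1.28 | 1.30 |
| SRPM | PoPMuSiC-DB | PCC | 0.55±0.02 | 0.64±0.02 | 0.61±0.02 |
| SRPM | PoPMuSiC-DB | RMSE (kcal/mol) | 1.20 | 1.11 | 1.14 |
| MRPM | Potapov-DB | PCC | 0.77±0.01 | 0.83±0.01 | 0.77±0.01 |
| MRPM | Potapov-DB | RMSE (kcal/mol) | 1.15 | 0.98 | 1.14 |
| MRPM | PoPMuSiC-DB | PCC | 0.76±0.01 | 0.82±0.01 | 0.75±0.01 |
| MRPM | PoPMuSiC-DB | RMSE (kcal/mol) | 0.97 | 0.85 | 0.99 |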
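For reference, the two performance measures can be computed as in the sketch below, assuming PCC denotes the Pearson correlation coefficient and RMSE the root-mean-square error between predicted and experimental ΔΔG values. The function names and the predicted values are hypothetical and unrelated to the results reported in the table.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs (kcal/mol here)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pcc(predicted, observed):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Toy example: made-up predictions against three experimental-style values.
observed = [0.02, 1.2, -0.4]
predicted = [0.1, 1.0, -0.2]
print(round(pcc(predicted, observed), 3), round(rmse(predicted, observed), 3))
```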
Sequence-based Pro-Maya
![Page 57: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/57.jpg)
Relative importance of Pro-Maya's features
[Figure: bar chart. Axis: mean decrease in mean square error, from 0 to 1. Features shown: Mutant and WT AA; Isoelectric point deviation; Number of sequences; Protein flexibility; Torsion angle potentials; Molecular weight deviation; SIDCH; Hydrophobicity deviation; Solvent accessibility; PoPMuSiC-2.0; Prethermut.]
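The slide does not say how the importance values were obtained. One common way to get a "mean decrease in mean square error" style score is permutation importance: shuffle one feature at a time and record how much the model's MSE grows. The sketch below illustrates that idea only, under that assumption; the function names and the toy model are hypothetical and not Pro-Maya's code.

```python
import random


def mse(model, rows, targets):
    """Mean square error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = mse(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            total += mse(model, shuffled, targets) - base
        importances.append(total / n_repeats)
    return importances


# Toy model that depends only on the first feature.
rows = [[float(i), float((i * 3) % 8)] for i in range(8)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]
importances = permutation_importance(model, rows, targets)
print(importances)
```

A feature the model ignores gets a score of zero, while shuffling an informative feature raises the error. The raw scores here are not rescaled to the 0 to 1 range used in the chart.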
![Page 58: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/58.jpg)
Additional explanatory slides
![Page 59: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/59.jpg)
More models
![Page 60: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/60.jpg)
Bellkor collaborative-filtering model
The latent factor model
The Bellkor algorithm takes into account only the ΔΔG table:
Baseline estimators model
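$R_{ui} = b_{ui} + \dots$

| MU \ Position | 1WQ549 | 1LZ12 | 4LYZ91 | 5DFR91 | 1EY071 | 1SAK342 | 1HMS64 |
|---|---|---|---|---|---|---|---|
| A | -0.41 | 0.4 | NA | NA | 0.55 | 0.38 | NA |
| ... | ... | ... | ... | ... | ... | ... | ... |
| W | -2.43 | NA | NA | -1.34 | NA | NA | NA |
| Y | 4.5 | NA | 3.07 | 2.1 | NA | 2.3 | NA |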
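The slides do not spell out the baseline term b_ui. A minimal sketch, assuming the usual collaborative-filtering form b_ui = μ + b_u + b_i (overall mean plus an offset per mutant amino acid u and per position i), fitted here with simple averages. The function names are hypothetical, and only the A, W and Y rows of the example table are used.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """ratings maps (mutant_aa, position) -> ddG; NA entries are simply absent."""
    mu = sum(ratings.values()) / len(ratings)

    by_position = defaultdict(list)
    for (u, i), r in ratings.items():
        by_position[i].append(r - mu)
    b_i = {i: sum(v) / len(v) for i, v in by_position.items()}

    by_mutant = defaultdict(list)
    for (u, i), r in ratings.items():
        by_mutant[u].append(r - mu - b_i[i])
    b_u = {u: sum(v) / len(v) for u, v in by_mutant.items()}
    return mu, b_u, b_i


def baseline(mu, b_u, b_i, u, i):
    """b_ui = mu + b_u + b_i, with zero offsets for unseen rows or columns."""
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)


ratings = {
    ("A", "1WQ549"): -0.41, ("A", "1LZ12"): 0.4,
    ("A", "1EY071"): 0.55, ("A", "1SAK342"): 0.38,
    ("W", "1WQ549"): -2.43, ("W", "5DFR91"): -1.34,
    ("Y", "1WQ549"): 4.5, ("Y", "4LYZ91"): 3.07,
    ("Y", "5DFR91"): 2.1, ("Y", "1SAK342"): 2.3,
}
mu, b_u, b_i = fit_baseline(ratings)
print(round(baseline(mu, b_u, b_i, "W", "1SAK342"), 3))
```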
The neighborhood model
Find k positions with similar ΔΔG values
Learn the pairwise weights $w_{ij}$, $c_{ij}$ for the neighboring positions

$$R_{ui} = b_{ui} + |R^k(u;i)|^{-0.5} \sum_{j \in R^k(u;i)} \left[ (r_{uj} - b_{uj})\, w_{ij} + c_{ij} \right]$$
Relate the ΔΔG values of neighboring positions i and j
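A minimal sketch of the neighborhood prediction, assuming the standard form: the baseline b_ui plus the sum over the k neighboring positions j of (r_uj - b_uj) w_ij + c_ij, scaled by |R^k(u;i)|^-0.5. The similarity function, the weights and the toy values are hypothetical placeholders; in practice the weights would be learned.

```python
def predict_neighborhood(u, i, ratings, baseline, similarity, w, c, k):
    """Predict ddG of mutation u at position i from its k most similar positions."""
    # Candidate neighbors: other positions where mutation u has a known ddG.
    rated = [j for (uu, j) in ratings if uu == u and j != i]
    neighbors = sorted(rated, key=lambda j: similarity(i, j), reverse=True)[:k]
    if not neighbors:
        return baseline(u, i)
    total = sum(
        (ratings[(u, j)] - baseline(u, j)) * w.get((i, j), 0.0) + c.get((i, j), 0.0)
        for j in neighbors
    )
    return baseline(u, i) + total / len(neighbors) ** 0.5


# Toy setup: constant baseline and hand-picked weights.
ratings = {("A", "p1"): 2.0, ("A", "p2"): 0.0}
flat_baseline = lambda u, i: 1.0
similarity = lambda i, j: 1.0 if j == "p1" else 0.0
w = {("q", "p1"): 0.5, ("q", "p2"): 0.5}
c = {}

one_neighbor = predict_neighborhood("A", "q", ratings, flat_baseline, similarity, w, c, k=1)
two_neighbors = predict_neighborhood("A", "q", ratings, flat_baseline, similarity, w, c, k=2)
print(one_neighbor, two_neighbors)
```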
![Page 61: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/61.jpg)
Bellkor collaborative-filtering model
The Bellkor algorithm takes into account only the ΔΔG table:
Baseline estimators model
$R_{ui} = b_{ui} + \dots$
The neighborhood model
Matrix decomposition
[Figure: (MU × Position) ≈ (MU × f) × (f × Position)]
The latent factor model
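| MU \ Position | 1WQ549 | 1LZ12 | 4LYZ91 | 1EY071 | 1SAK342 |
|---|---|---|---|---|---|
| A | -0.41 | 0.4 | NA | 0.55 | 0.38 |
| ... | ... | ... | ... | ... | ... |
| W | -2.43 | NA | NA | NA | NA |
| Y | NA | NA | 3.07 | NA | 2.3 |

Highlighted entry: -2.43 (W at 1WQ549)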
Break down the ΔΔG matrix into two matrices, $p_u^T$ and $q_i$

$$R_{ui} = b_{ui} + p_u^T q_i + |R^k(u;i)|^{-0.5} \sum_{j \in R^k(u;i)} \left[ (r_{uj} - b_{uj})\, w_{ij} + c_{ij} \right]$$
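A minimal sketch of the latent factor term only, assuming the factors are fitted by stochastic gradient descent on the known entries and that the baseline is reduced to the overall mean for brevity. The hyperparameters and function names are hypothetical; the toy entries come from the example table on this slide.

```python
import random


def fit_latent_factors(ratings, f=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit p_u (per mutant) and q_i (per position) so that mu + p_u . q_i ~ ddG."""
    rng = random.Random(seed)
    mu = sum(ratings.values()) / len(ratings)
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(f)] for u, _ in ratings}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(f)] for _, i in ratings}
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - (mu + sum(a * b for a, b in zip(p[u], q[i])))
            for k in range(f):
                pu, qi = p[u][k], q[i][k]
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return mu, p, q


def predict(mu, p, q, u, i):
    """mu + p_u . q_i, falling back to mu for unseen mutants or positions."""
    if u not in p or i not in q:
        return mu
    return mu + sum(a * b for a, b in zip(p[u], q[i]))


def train_rmse(ratings, mu, p, q):
    errors = [(r - predict(mu, p, q, u, i)) ** 2 for (u, i), r in ratings.items()]
    return (sum(errors) / len(errors)) ** 0.5


ratings = {
    ("A", "1WQ549"): -0.41, ("A", "1LZ12"): 0.4,
    ("A", "1EY071"): 0.55, ("A", "1SAK342"): 0.38,
    ("W", "1WQ549"): -2.43,
    ("Y", "4LYZ91"): 3.07, ("Y", "1SAK342"): 2.3,
}
before = train_rmse(ratings, *fit_latent_factors(ratings, epochs=0))
mu, p, q = fit_latent_factors(ratings)
after = train_rmse(ratings, mu, p, q)
print(round(before, 3), round(after, 3))
```

Once fitted, `predict(mu, p, q, "Y", "1WQ549")` fills in an NA cell of the table. With this few known entries the result only illustrates the mechanics.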
![Page 62: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/62.jpg)
Latent factors
Latent factor model example
[Figure: the ΔΔG matrix (rows: mutant amino acids MU, A ... W, Y; columns: positions 1, 2, ..., m) is reduced to a matrix with the same rows and f columns (1 ... f), where f = number of latent factors.]
Clustering of similar positions into a representative position while taking into account only the ΔΔG values
![Page 63: Interactive tools for improved prediction of the effect of non- synonymous single nucleotide mutations on the protein Gilad Wainreb Department of Biochemistry.](https://reader035.fdocuments.net/reader035/viewer/2022062802/56649ee75503460f94bf7c6c/html5/thumbnails/63.jpg)