Protein Structure Prediction
description
Transcript of Protein Structure Prediction
![Page 1: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/1.jpg)
Protein Structure Prediction● Why ?● Type of protein structure
predictions– Sec Str. Pred
– Homology Modelling
– Fold Recognition
– Ab Initio
● Secondary structure prediction– Why
– History
– Performance
– Usefullness
![Page 2: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/2.jpg)
Why do we need structure prediction?
● 3D structure give clues to function: – active sites, binding sites, conformational changes... – structure and function conserved more than sequence – 3D structure determination is difficult, slow and
expensive – Intellectual challenge, Nobel prizes etc... – Engineering new proteins
![Page 3: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/3.jpg)
The Use of Structure
![Page 4: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/4.jpg)
The Use of Structure
![Page 5: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/5.jpg)
The Use of Structure
![Page 6: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/6.jpg)
It's not that simple...
● Amino acid sequence contains all the information for 3D structure (experiments of Anfinsen, 1970's)
● But, there are thousands of atoms, rotatable bonds, solvent and other molecules to deal with...
● Levinthal's paradox
![Page 7: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/7.jpg)
Structure predictionSummary of the four main approaches to structure prediction. Note
that there are overlaps between nearly all categories.
Method Knowledge Approac h Difficulty Usefulness
Compar a tive modelling (Homolog y modelling)
Protei ns of known stru cture
Ident ify relate d s tructure with sequenc e me thod s , cop y 3D coord s and modi fy where necessa ry
Relativel y eas y Very, if sequence identity drug design
Fold recognitio n
Protei ns of known stru cture
Sam e a s abov e , but use mor e sophistic a ted me thod s to find related structure
Mediu m Limited due to poor models
Seconda ry stru cture predictio n
Sequence -stru cture stat istics
Forg e t 3D arra ngeme nt a nd predic t wher e the helice s /s trand s are
Mediu m Can improve alignments, fold recognition, ab initio
ab initio tertia ry stru cture predictio n
Energ y functions , stat istics
Simulat e folding, or gen e rate lots of s tructure s an d try to pick the corre ct one
Ver y har d Not really
![Page 8: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/8.jpg)
CASP Critical Assessment of Techniques for Protein Structure Prediction
● Why do we have CASP ?
– People cheat! ● people work hard to make prediction programs work for their
favourite proteins, but...
– benchmarking may be polluted by ``information leakage'' ● Difficult to compare methods fairly
● software and data issues
● different measures, standards
● What we want is fully blind trials of prediction methods by a third party, i.e. CASP
![Page 9: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/9.jpg)
CASP
![Page 10: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/10.jpg)
Secondary structure predictions
● Ignore 3D, it's too hard!
– Usually concentrate on helix, strand and ``coil''. ● Pattern recognition, but which patterns? ● some amino acids have preferences for helix or strand; due to
geometry and hydrogen bonding ● spatial (along sequence) patterns, alternating hydrophobics (helical
wheel) ● conservation (down alignment) in different members of protein
family; insertions and deletions ● Three main generations/stages in SSP method development since
1970's.
![Page 11: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/11.jpg)
What is ``known secondary structure''?
● Of critical importance in training/assessment of SSP methods
● Can be defined: ● visually by structural biologist ● by geometric and chemical criteria (, angles,
distances between atoms, hydrogen bonds...) by programs like DSSP and STRIDE
![Page 12: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/12.jpg)
Secondary structures -Helix
![Page 13: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/13.jpg)
Secondary Structure - Sheet
![Page 14: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/14.jpg)
Secondary structure - turns
![Page 15: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/15.jpg)
Physics of secondary structures
● Two main opposing forces– sidechain conformational entropy – mainchain hydrogen bonding.
● This predicts:– Helix propensity Ala>Leu>Ile>Val
● Other factors– Polarity (low helical propensity of Ser, Thr, Asp and
Asn)
![Page 16: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/16.jpg)
Secondary Structure Predictions
Some highlights in performance
– 1974 Chou and Fasman 50% – 1978 Garnier 62% – 1993 PhD 72% – 2000 PsiPred 76%
![Page 17: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/17.jpg)
Secondary structure
prediction 1st generation
methods
● Chou and Fassman1) Assign all residues the appropriate set of parameters.
2) Scan through the peptide and identify helical regions
3) Repeat this procedure to locate all of the helical regions in the sequence.
4) Scan through the peptide and identify sheet regions.
5) Solve conflicts between helical and sheet assignments
6) Identify turns● Claims of around 70-80% - actual accuracy about 50-60%
Helix Strand
Strong former E A L M V I
Former H M Q W V F C Y F Q L T W
Weak former K I A
Indifferent D T S R C R G D
Breaker N Y K S H N P
Strong breaker
P G E
![Page 18: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/18.jpg)
GOR III Garnier, Osguthorpe, Robson, 1990
● Secondary structure depends on aminoacids propensities– As in Chou Fassman
● Also influences by neighboring residues– Helix capping– Turns etc
● How to include distant information.● Performance approximately 67%
![Page 19: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/19.jpg)
GOR III Garnier, Osguthorpe, Robson, 1990
The helix propensity tables thus have 20x17 entries.
Assign the state with the highest propensity
![Page 20: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/20.jpg)
Status of predictions in 1990
● Too short secondary structure segments ● About 65% accuracy ● Worse for Beta-strands● Example:
![Page 21: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/21.jpg)
Secondary structure prediction 2nd generation methods
● sequence-to-structure relationship modelled using more complex statistics, e.g. artificial neural networks (NNs) or hidden Markov models (HMMs)
● evolutionary information included (profiles) ● prediction accuracy >70% (PhD, Rost 1993)
![Page 22: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/22.jpg)
PhD (Rost & Sander, 1994)
![Page 23: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/23.jpg)
PhD-Input
![Page 24: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/24.jpg)
PhD-architecture
![Page 25: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/25.jpg)
PhD-predictions
● Secondary structure ``prediction'' by homology
● If sequence of unknown secondary structure has a homologue of known structure, it is more accurate to make an alignment and copy the known secondary structure over to the unknown sequence, than to do ``ab initio'' secondary structure prediction.
![Page 26: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/26.jpg)
PhD summary
● First methods with >70% Q3● Correct length distributions● Much better beta strand predictions● Good correlation between score and accuracy● Better predictions for larger multiple sequence
alignments
![Page 27: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/27.jpg)
3rd generation methods
● enhanced evolutionary sequence information (PSI-BLAST profiles) and larger sequence databases takes Q3 to > 75%
● PHD and PSIPRED are the best known methods
![Page 28: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/28.jpg)
PSIPRED
● Similar to PhD● Psiblast to detect more remote homologs● only two layers● SVM or NN gives similar performance
![Page 29: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/29.jpg)
Current Status of Secondary Structure predictions
● Best Methods
– PsiPred
– Sam-T02
– Prof ● About 75%-76% accuracy
● Improvement mainly due to:
– Larger Databases
– PSI-BLAST
![Page 30: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/30.jpg)
Other secondary structure prediction methods
● turn prediction ● transmembrane helix prediction ● coiled coil ● Dissorder predictions● contact prediction, disulphides
![Page 31: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/31.jpg)
What use is it?
● No 3D means no clues to detailed function, so... ● Accurate secondary structure predictions help
sequence analysis: finding homologues, aligning homologues, identifying domain boundaries.
● Can help true 3D prediction
![Page 32: Protein Structure Prediction](https://reader030.fdocuments.net/reader030/viewer/2022012919/568152ce550346895dc0e9ab/html5/thumbnails/32.jpg)
Future improvements to SSP
● Long range information– Baker
● Folding pathway and/or 3D-information