Structure Prediction
![Page 1: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/1.jpg)
Structure Prediction
![Page 2: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/2.jpg)
Tertiary protein structure: protein folding
Three main approaches:
[1] Experimental determination (X-ray crystallography, NMR)
[2] Comparative modeling (based on homology)
[3] Ab initio (de novo) prediction (Dr. Ingo Ruczinski at JHSPH)
![Page 3: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/3.jpg)
Experimental approaches to protein structure
[1] X-ray crystallography
-- Used to determine 80% of structures
-- Requires high protein concentration
-- Requires crystals
-- Able to trace amino acid side chains
-- Earliest structure solved was myoglobin
[2] NMR
-- Magnetic field applied to proteins in solution
-- Largest structures: 350 amino acids (40 kD)
-- Does not require crystallization
![Page 4: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/4.jpg)
Steps in obtaining a protein structure
Target selection
Obtain, characterize protein
Determine, refine, model the structure
Deposit in database
![Page 5: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/5.jpg)
X-ray crystallography
http://en.wikipedia.org/wiki/X-ray_diffraction
Sperm Whale Myoglobin
![Page 6: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/6.jpg)
![Page 7: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/7.jpg)
![Page 8: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/8.jpg)
PDB
• April 08, 2008: 50,000 proteins, 25 new experimentally determined structures each day
[Figure: new PDB structures, split into new folds and old folds]
![Page 9: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/9.jpg)
Example 1wey
![Page 10: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/10.jpg)
Ab initio protein prediction
• Starts with an attempt to derive secondary structure from the amino acid sequence
– Predicting the likelihood that a subsequence will fold into an alpha-helix, beta-sheet, or coil, using physicochemical parameters or hidden Markov models (HMMs) and artificial neural networks (ANNs)
– Able to accurately predict 3/4 of all local structures
![Page 11: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/11.jpg)
Structure Characteristics
![Page 12: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/12.jpg)
Beta Sheets
![Page 13: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/13.jpg)
Ab Initio Prediction
![Page 14: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/14.jpg)
Secondary structure prediction
Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α helices, β sheets, and turns.
Proline: occurs at turns, but not in α helices.
GOR (Garnier, Osguthorpe, Robson): related algorithm
Modern algorithms: use multiple sequence alignments and achieve a higher success rate (about 70-75%)
Page 279-280
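The frequency idea behind Chou and Fasman's approach can be sketched as a propensity: how often a residue appears in a given state relative to how often it appears overall. The snippet below is a minimal illustration only; the two sequences and their labels are made-up toy data, and the function names `propensities` and `window_score` are hypothetical, not part of any published implementation or of the original parameter table.

```python
from collections import Counter

def propensities(sequences, labels, state):
    """Propensity of each residue for `state`: P(residue | state) / P(residue)."""
    in_state = Counter()
    overall = Counter()
    for seq, lab in zip(sequences, labels):
        for aa, s in zip(seq, lab):
            overall[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    n_all = sum(overall.values())
    if n_state == 0:
        # The state never occurs in the data, so no residue favors it.
        return {aa: 0.0 for aa in overall}
    return {aa: (in_state[aa] / n_state) / (overall[aa] / n_all)
            for aa in overall}

def window_score(segment, table):
    """Average propensity over a segment; unseen residues count as neutral (1.0)."""
    return sum(table.get(aa, 1.0) for aa in segment) / len(segment)

# Toy data (assumption): H = helix, E = sheet, C = coil/turn.
toy_sequences = ["AEELLKPG", "VIVPGAEL"]
toy_labels    = ["HHHHHHCC", "EEECCHHH"]

helix = propensities(toy_sequences, toy_labels, "H")
print(helix["A"], helix["P"])        # alanine favors helix here, proline does not
print(window_score("AEL", helix))    # helix-like window
print(window_score("PG", helix))     # turn-like window
```

A value above 1 means the residue is over-represented in that state in the training data; a real table would be estimated from many solved structures rather than two short strings.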
![Page 15: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/15.jpg)
Table
![Page 16: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/16.jpg)
![Page 17: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/17.jpg)
![Page 18: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/18.jpg)
![Page 19: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/19.jpg)
Frequency Domain
![Page 20: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/20.jpg)
Neural Networks
![Page 21: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/21.jpg)
Training the Network
• Use PDB entries with validated secondary structures
• Measures of accuracy
– Q3 score: percentage of residues in the protein whose state is correctly predicted (training on it favors predicting the most abundant structure)
– You get 50% if you just predict everything to be a coil
– Most methods get around 60% with this metric
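As a small illustration of the Q3 score described above, the sketch below counts the residues whose predicted state (H, E, or C) matches the observed one. The `observed` string is a made-up example chosen so that half of its residues are coil; it is not taken from any PDB entry.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical observed structure: 7 of its 14 residues are coil.
observed = "CCHHHHCCEEECCC"
all_coil = "C" * len(observed)

print(q3(all_coil, observed))   # baseline: predict coil everywhere
print(q3(observed, observed))   # perfect prediction
```

The all-coil baseline shows why Q3 on its own can flatter a predictor that has learned nothing beyond the most abundant state.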
![Page 22: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/22.jpg)
Correlation Coefficient
• How correlated are the predictions for coils, helices, and beta-sheets with the real structures?
• This ignores what we really want to get to
– If the real structure has 3 coils, do we predict 3 coils?
• Segment overlap score (Sov) gives credit for how protein-like the predicted structure is, but it is correlated with Q3
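The slide does not say which correlation coefficient is meant; the sketch below assumes the Matthews correlation coefficient computed separately for each state, which is a common choice for this kind of evaluation. It also includes a simple segment counter for the "3 coils" question. This is not an implementation of Sov, and the function names are hypothetical.

```python
import math
from itertools import groupby

def state_correlation(predicted, observed, state):
    """Matthews correlation for one state (e.g. 'H') treated as a yes/no label."""
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def count_segments(labels, state):
    """Number of contiguous runs of `state` in a label string."""
    return sum(1 for key, _ in groupby(labels) if key == state)

observed = "CCHHHHCC"  # hypothetical example
print(state_correlation(observed, observed, "H"))     # perfect agreement
print(state_correlation("HHCCCCHH", observed, "H"))   # fully inverted
print(state_correlation("CCCCCCCC", observed, "H"))   # never predicts helix
print(count_segments("CCHHHHCCEEECCC", "C"))          # coil segments
```

Comparing `count_segments` on the predicted and observed strings gives a rough check on whether the prediction has the right number of elements, which per-residue scores do not capture.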
![Page 23: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/23.jpg)
![Page 24: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/24.jpg)
Fold recognition (structural profiles)
• Attempts to find the best fit of a raw polypeptide sequence onto a library of known protein folds
• A prediction of the secondary structure of the unknown is made and compared with the secondary structure of each member of the library of folds
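The comparison step above can be sketched as ranking library folds by how similar their secondary-structure strings are to the predicted string for the query. The slide does not specify a similarity measure, so this example swaps in Python's `difflib.SequenceMatcher` ratio as a simple stand-in; the fold names and strings in `toy_library` are invented for illustration.

```python
from difflib import SequenceMatcher

def rank_folds(predicted_ss, fold_library):
    """Return (fold_name, similarity) pairs, most similar first."""
    scores = {
        name: SequenceMatcher(None, predicted_ss, ss, autojunk=False).ratio()
        for name, ss in fold_library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical library: secondary-structure strings for three folds.
toy_library = {
    "all_alpha":  "CHHHHHHCCHHHHHHC",
    "all_beta":   "CEEEECCEEEECCEEC",
    "alpha_beta": "CEEEECCHHHHHHCCE",
}
query_ss = "CHHHHHCCCHHHHHHC"  # predicted secondary structure of the unknown

for name, score in rank_folds(query_ss, toy_library):
    print(f"{name}: {score:.2f}")
```

A real fold-recognition method would use a proper alignment with structure-aware scoring; the string similarity here only conveys the ranking idea.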
![Page 25: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/25.jpg)
Threading
• Takes the fold recognition process a step further:
– Empirical-energy functions for residue pair interactions are used to mount the unknown onto the putative backbone in the best possible manner
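A minimal sketch of the threading idea, under strong simplifying assumptions: the template fold is reduced to a list of contacting position pairs, the query is placed without gaps at every possible offset, and the placement with the lowest total pair energy wins. The `pair_energy` function is a toy stand-in (hydrophobic pairs score -1, everything else 0), not a real empirical potential, and all names here are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed grouping, for illustration only

def pair_energy(a, b):
    """Toy stand-in for an empirical residue-pair potential."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def thread(query, template_length, contacts):
    """Return (best_energy, best_offset) over all ungapped placements.

    `contacts` lists pairs of template positions that touch in the fold.
    """
    if len(query) > template_length:
        raise ValueError("query is longer than the template")
    best = None
    for offset in range(template_length - len(query) + 1):
        energy = 0.0
        for i, j in contacts:
            qi, qj = i - offset, j - offset
            # Only score contacts where both positions are occupied.
            if 0 <= qi < len(query) and 0 <= qj < len(query):
                energy += pair_energy(query[qi], query[qj])
        if best is None or energy < best[0]:
            best = (energy, offset)
    return best

# Template of length 6 with one contact between positions 2 and 4.
print(thread("GLKVG", 6, [(2, 4)]))
```

Real threading programs also allow gaps and combine pair terms with other scores; repeating this search over every fold in a library and comparing the best energies gives the ranking step.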
![Page 26: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/26.jpg)
Fold recognition by threading
[Figure: a query sequence is scored against each fold in a library (Fold 1, Fold 2, Fold 3, ..., Fold N), yielding compatibility scores]
![Page 27: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/27.jpg)
CASP
• http://www.predictioncenter.org/casp8/index.cgi
![Page 28: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/28.jpg)
SCOP
• SCOP: Structural Classification of Proteins.
• http://scop.mrc-lmb.cam.ac.uk/scop/
![Page 29: Structure Prediction](https://reader036.fdocuments.net/reader036/viewer/2022070423/5681672c550346895ddbcdff/html5/thumbnails/29.jpg)
CATH
• CATH: Protein Structure Classification
• Class (C), Architecture (A), Topology (T) and Homologous superfamily (H)