CSCE555 Bioinformatics
Lecture 18: Protein Tertiary Structure Prediction
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina, Department of Computer Science and Engineering
2008 www.cse.sc.edu

Outline
Experimental limitations of protein structure determination
Tertiary structure prediction
◦ Ab initio
◦ Homology modeling
◦ Threading

Experimental Protein Structure Determination
High-resolution structure determination
◦ X-ray crystallography (best cases <1 Å)
◦ Nuclear magnetic resonance (NMR) (~1-2.5 Å)
Lower-resolution structure determination
◦ Cryo-EM (electron microscopy) (~10-15 Å)
Theoretical models?
◦ Highly variable, but a few are equivalent to X-ray quality!

Tertiary Structure Prediction
The fold or tertiary structure prediction problem can be formulated as a search for the minimum-energy conformation
◦ Search space is defined by the phi/psi angles of the backbone and the side-chain rotamers
◦ Search space is enormous even for small proteins!
◦ Number of local minima increases exponentially with the number of residues
Computationally it is an exceedingly difficult problem!

Levinthal Paradox of Protein Folding: How does nature search?
We assume that there are three conformations for each amino acid (e.g., α-helix, β-sheet, and random coil). If a protein is made up of 100 amino acid residues, the total number of conformations is
$3^{100} = 515377520732011331036461129765621272702107522001 \approx 5 \times 10^{47}$.
If 100 psec ($10^{-10}$ sec) were required to convert from one conformation to another, a random search of all conformations would require
$5 \times 10^{47} \times 10^{-10}$ sec $\approx 1.6 \times 10^{30}$ years.
However, protein folding takes place on the order of msec to sec. Therefore, proteins fold not via a random search but via a more sophisticated search process.
We want to watch the folding process of a protein using molecular simulation techniques.
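The Levinthal estimate can be reproduced in a few lines. This is a minimal sketch using the slide's assumptions (3 conformations per residue, 100 residues, 100 psec per conformational change); the function and parameter names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_estimate(n_residues=100, states_per_residue=3, seconds_per_step=1e-10):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues  # exact integer arithmetic
    total_seconds = conformations * seconds_per_step
    return conformations, total_seconds / SECONDS_PER_YEAR


conformations, years = levinthal_estimate()
print(conformations)
print(f"{years:.1e} years")
```

Changing `n_residues` shows how quickly the exhaustive search becomes hopeless: the count grows exponentially with chain length.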

Steps in Protein Folding
1. "Collapse": driving force is burial of hydrophobic aa's (fast, msecs)
2. Molten globule: helices & sheets form, but "loose" (slow, secs)
3. "Final" native folded state: compaction, some 2° structures rearranged
Native state? Assumed to be the lowest free energy; may be an ensemble of structures

Protein Folding Funnel
[Figure: folding funnel energy landscape, labeled with local minima, the global minimum, and the native structure]

Protein Structure Prediction
Ab initio
◦ Use just first principles: energy, geometry, and kinematics
Homology
◦ Find the best match to a database of sequences with known 3D structure
Combinations
Threading
Meta-servers and other methods
Knowledge-based approaches

Ab Initio Prediction
Basic idea
◦ Anfinsen's theory: the native structure of a protein corresponds to the state with the lowest free energy of the protein-solvent system.
General procedure
◦ Develop a potential/energy function
  - Evaluate the energy of a protein conformation
  - Select the native structure
◦ Conformational search algorithm
  - Produce new conformations
  - Search the potential energy surface and locate the global minimum (native conformation)
Provides both folding pathway & folded structure
Can only be applied to very small proteins

Potential Functions for PSP (protein structure prediction)
Potential function
◦ Physics-based energy function
  - Empirical all-atom force fields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS
  - Parameterization: quantum mechanical calculations, experimental data
  - Simplified potential: UNRES (united residue)
◦ Solvation energy
  - Implicit solvation model: Generalized Born (GB) model, surface-area-based model
  - Explicit solvation model: TIP3P (computationally expensive)

General Form of All-atom Force Fields
$$
V_{\text{total}} = \sum_{\text{bonds}} K_b (r - r_0)^2
+ \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2
+ \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi) \right]
+ \sum_{\text{H-bonds}} \left[ \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right]
+ \sum_{\text{van der Waals pairs } i,j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
+ \sum_{\text{electrostatic pairs } i,j} \frac{q_i q_j}{\varepsilon r_{ij}}
$$
Terms, in order: bond stretching ($r$), angle bending ($\theta$), dihedral ($\phi$), H-bonding, van der Waals, electrostatic.
The non-bonded pair terms (van der Waals and electrostatic) are the most time-demanding part.
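To make the cost of the van der Waals and electrostatic pair terms concrete, here is a minimal sketch that sums a 12-6 term and a Coulomb term over all unique atom pairs. It is an illustration, not a real force field: a single `a` and `b` are used for every pair (real force fields use per-pair parameters), the units are arbitrary, and all names are hypothetical.

```python
import math


def nonbonded_energy(coords, charges, a=1.0, b=2.0, dielectric=1.0):
    """Return (van der Waals energy, electrostatic energy) summed over all pairs.

    coords  -- list of (x, y, z) tuples, one per atom
    charges -- list of partial charges, one per atom
    a, b    -- illustrative 12-6 coefficients shared by every pair (assumption)
    """
    e_vdw = 0.0
    e_elec = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each unique pair once: n*(n-1)/2 pairs
            r = math.dist(coords[i], coords[j])
            e_vdw += a / r**12 - b / r**6
            e_elec += charges[i] * charges[j] / (dielectric * r)
    return e_vdw, e_elec


# Two oppositely charged atoms one distance unit apart.
e_vdw, e_elec = nonbonded_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, -1.0])
print(e_vdw, e_elec)
```

The double loop visits n(n-1)/2 pairs, which is why these terms dominate the running time for large molecules, while the bonded terms grow only linearly with the number of atoms.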

Search Potential Energy Surface
We are interested in the minimum points on the potential energy surface (PES)
Conformational search techniques
◦ Energy minimization
◦ Monte Carlo
◦ Molecular dynamics
◦ Others: genetic algorithm, simulated annealing

Energy Minimization
Methods
◦ First-order minimization: steepest descent, conjugate gradient
◦ Second-derivative methods: Newton-Raphson
◦ Quasi-Newton methods: L-BFGS
Limitation: minimization converges only to a nearby local minimum
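The local-minimum problem is easy to see on a toy surface. The sketch below runs plain steepest descent on a one-dimensional double-well function chosen only for illustration (it is not a protein energy function); the step size and tolerance are arbitrary assumptions.

```python
def energy(x):
    """Toy double-well potential with a global and a local minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy potential."""
    return 4 * x**3 - 6 * x + 1


def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Follow the negative gradient until it (nearly) vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


from_right = steepest_descent(2.0)   # ends in the shallower well
from_left = steepest_descent(-2.0)   # ends in the deeper well
print(from_right, energy(from_right))
print(from_left, energy(from_left))
```

The two starting points end in different wells, and the run started at 2.0 never finds the deeper one. This is why minimization is usually combined with a search method that can cross barriers.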

Monte Carlo
In molecular simulations, "Monte Carlo" is an importance sampling technique.
1. Make a random move to produce a new conformation
2. Calculate the energy change $\Delta E$ for the new conformation
3. Accept or reject the move based on the Metropolis criterion
$$P = \exp\left(-\frac{\Delta E}{kT}\right) \quad \text{(Boltzmann factor)}$$
If $\Delta E < 0$, then $P > 1$: accept the new conformation.
Otherwise: accept if $P >$ rand(0,1), else reject.
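The three steps and the acceptance rule can be sketched as follows. This is a minimal illustration on a one-dimensional toy potential, not a protein simulation; the potential, the move size, the value of kT, and all function names are assumptions made for the example.

```python
import math
import random


def energy(x):
    """Toy double-well potential standing in for a conformational energy."""
    return x**4 - 3 * x**2 + x


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept downhill, sometimes accept uphill."""
    if delta_e < 0:
        return True
    return math.exp(-delta_e / kT) > rng.random()


def monte_carlo(x0, kT=1.0, n_steps=10_000, max_move=0.5, seed=0):
    """Return the lowest-energy point (x, energy) visited during the walk."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_move, max_move)  # 1. random move
        e_new = energy(x_new)                         # 2. energy change
        if metropolis_accept(e_new - e, kT, rng):     # 3. accept or reject
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo(2.0)
print(best_x, best_e)
```

Because uphill moves are accepted with probability exp(-ΔE/kT), the walker can cross energy barriers that trap a pure minimizer. Lowering kT makes the search greedier, and raising it makes the search more exploratory.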

Ab initio Prediction: CASP results

Comparative Modeling (Knowledge-based approach)
Provides folded structure only
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target

Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures, choose the one with the closest sequence match to the target as the template (steps 1 & 2 can be combined by using PDB-BLAST)
3. Build the model by placing residues in the corresponding positions of the homologous structure & refine by "tweaking"
Homology modeling works "well"
• Computationally? Not very expensive
• Accuracy? Higher sequence identity gives a better model
Requires ~30% sequence identity with a sequence whose structure is known
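Template selection by sequence identity can be sketched as below. This assumes the target has already been aligned to each candidate (for example by a BLAST-style search), and it defines identity as matches divided by aligned, non-gap columns, which is one of several conventions in use. The template names and sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def choose_template(candidates, min_identity=30.0):
    """Return (identity, name) of the best template at or above the threshold, or None."""
    scored = [(percent_identity(t, s), name) for name, (t, s) in candidates.items()]
    usable = [item for item in scored if item[0] >= min_identity]
    return max(usable) if usable else None


# Hypothetical pairwise alignments: name -> (aligned target, aligned template)
candidates = {
    "tmplA": ("ALKKGFHFDT", "ALRKGFHYDT"),
    "tmplB": ("ALKKGFHFDT", "GSRRLYQWEN"),
}
best = choose_template(candidates)
print(best)
```

When no candidate reaches the ~30% threshold, `choose_template` returns `None`, which is the situation where threading or ab initio methods become the fallback.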

Homology-based Prediction
Raw model
Loop modeling
Side chain placement
Refinement

Threading - Fold Recognition
Identify the "best" fit between target sequence & template structure
Threading works "sometimes"
• Computationally? Can be expensive or cheap, depending on the energy function & whether threading is "all atom" or "backbone only"
• Accuracy? In theory, should not depend on sequence identity (should depend on quality of template library & "luck")
Usually, higher sequence identity to a protein of known structure gives a better model

Threading Algorithm for PSP
Database of 3D structures and sequences
◦ Protein Data Bank (or non-redundant subset)
Query sequence
◦ Sequence < 25% identity to known structures
Alignment protocol
◦ Dynamic programming
Evaluation protocol
◦ Distance-based potential or secondary structure
Ranking protocol

Threading
Basic premise:
The number of unique structural folds in nature is fairly small (probably 2000-3000)
Statistics from the Protein Data Bank (~40,000 structures)
◦ Until very recently, 90% of new structures submitted to PDB had similar structural folds in PDB
Thus, chances for a protein to have a native-like structural fold in PDB are quite good
◦ Note: Proteins with similar structural folds could be either homologs or analogs

Steps in Threading
1. Align target sequence with template structures (fold library) from the Protein Data Bank (PDB)
2. Calculate energy score to evaluate goodness of fit between target sequence & template structure
3. Rank models based on energy scores
[Figure: target sequence (ALKKGF…HFDTSE) threaded onto structure templates]
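The three threading steps can be sketched end to end with a toy fold library. Everything here is a simplifying assumption made for illustration: each template is reduced to a string of per-position environments ("B" buried, "E" exposed), the fit energy is a made-up hydrophobic/buried rule, and alignment is a basic global dynamic program with a linear gap penalty. Real threading programs use much richer potentials.

```python
HYDROPHOBIC = set("AVLIMFWC")  # illustrative classification, not a standard scale


def fit_energy(residue, environment):
    """Hypothetical energy: -1 if the residue suits the environment, else +1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if buried == hydrophobic else 1.0


def thread(sequence, environments, gap_penalty=1.5):
    """Steps 1-2: minimum alignment energy of a sequence onto one template."""
    n, m = len(sequence), len(environments)
    # dp[i][j] = best energy aligning the first i residues to the first j positions
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + fit_energy(sequence[i - 1], environments[j - 1]),
                dp[i - 1][j] + gap_penalty,   # residue left unaligned
                dp[i][j - 1] + gap_penalty,   # template position skipped
            )
    return dp[n][m]


def rank_templates(sequence, fold_library):
    """Step 3: sort templates from lowest (best) to highest energy."""
    return sorted((thread(sequence, envs), name) for name, envs in fold_library.items())


fold_library = {"fold_A": "BBEEBE", "fold_B": "EEBBEB"}  # hypothetical templates
ranking = rank_templates("LVKDAE", fold_library)
print(ranking)
```

Dynamic programming works here because the toy energy depends only on single positions. Once pairwise contact terms are added, the optimal alignment can no longer be found this simply, which is one reason threading can become computationally expensive.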

Threading Issues
Structure database: must be complete; no decent model if no good template in library!
Sequence-structure alignment algorithm:
◦ Bad alignment → bad score!
Energy function (scoring scheme):
◦ must distinguish the correct sequence-fold alignment from incorrect sequence-fold alignments
◦ must distinguish the "correct" fold from close decoys
Prediction reliability assessment: how to determine whether the predicted structure is correct (or even close)?
Find the "correct" sequence-structure alignment of a target sequence with its native-like fold in PDB

Threading: Template database
Build a database of structural templates (e.g., ASTRAL domain library derived from the PDB)
Supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

Threading: Energy function
Two main methods (and combinations of these)
◦ Structural profile (environmental): based on physico-chemical properties of aa's
◦ Contact potential (statistical): based on contact statistics from PDB, e.g., Miyazawa & Jernigan (ISU)

Protein Threading: Typical energy function
How well does a specific residue fit its structural environment? (singleton term, Es)
What is the "probability" that two specific residues are in contact? (pairwise term, Ep)
Alignment gap penalty? (Eg)
Total energy: Ep + Es + Eg
Goal: Find a sequence-structure alignment that minimizes the energy function
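Evaluating the total energy for one fixed alignment can be sketched as follows. The numerical values are invented for illustration: real singleton and contact potentials are derived from statistics over known structures, and the representation used here (one character per template position, "-" for an unoccupied position) is an assumption of this example.

```python
HYDROPHOBIC = set("AVLIMFWC")  # illustrative classification only


def singleton_energy(residue, environment):
    """Es contribution: how well a residue fits a buried ('B') or exposed ('E') site."""
    return -1.0 if (environment == "B") == (residue in HYDROPHOBIC) else 1.0


def pair_energy(res_a, res_b):
    """Ep contribution for two residues in contact (hypothetical values)."""
    n_hydrophobic = (res_a in HYDROPHOBIC) + (res_b in HYDROPHOBIC)
    return {2: -1.0, 1: 0.5, 0: 0.0}[n_hydrophobic]


def total_energy(alignment, environments, contacts, gap_penalty=1.5):
    """Return (Ep, Es, Eg, total) for a residue-per-position alignment string."""
    e_s = sum(
        singleton_energy(res, env)
        for res, env in zip(alignment, environments)
        if res != "-"
    )
    e_p = sum(
        pair_energy(alignment[i], alignment[j])
        for i, j in contacts
        if alignment[i] != "-" and alignment[j] != "-"
    )
    e_g = gap_penalty * alignment.count("-")
    return e_p, e_s, e_g, e_p + e_s + e_g


# Template positions 0..5, with template contacts given as index pairs.
e_p, e_s, e_g, total = total_energy("LVK-AE", "BBEEBE", [(0, 4), (1, 4), (2, 3)])
print(e_p, e_s, e_g, total)
```

A threading program would search over alignments to minimize `total`. This function only scores a single candidate, which is the part that the choice of energy function controls.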

CAFASP
GOAL
The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, CAFASP aims to answer the question of how well servers do without any intervention of experts, i.e., how well ANY user using only automated methods can predict protein structure. CAFASP assesses the performance of methods without the user intervention allowed in CASP.

Performance Evaluation in CAFASP3

| Servers (54 in total) | Sum MaxSub score | # correct (30 FR targets) |
|---|---|---|
| 3ds5 robetta | 5.17-5.25 | 15-17 |
| pmod 3ds3 pmode3 | 4.21-4.36 | 13-14 |
| RAPTOR | 3.98 | 13 |
| shgu | 3.93 | 13 |
| 3dsn | 3.64-3.90 | 12-13 |
| pcons3 | 3.75 | 12 |
| fugu3 orf_c | 3.38-3.67 | 11-12 |
| … | … | … |
| pdbblast | 0.00 | 0 |

(http://ww.cs.bgu.ac.il/~dfischer/CAFASP3, released in December, 2002.)
Servers with name in italic are meta servers
MaxSub score ranges from 0 to 1 per target; with 30 targets, the maximum total score is therefore 30

One structure where RAPTOR did best
Red: true structure
Blue: correct part of prediction
Green: wrong part of prediction
• Target size: 144
• Superimposable size within 5 Å: 118
• RMSD: 1.9 Å

Some more results by other programs

Summary of current state of the art

Automated Web-Based Homology Modeling
SWISS-MODEL: http://www.expasy.org/swissmod/SWISS-MODEL.html
WHAT IF: http://www.cmbi.kun.nl/swift/servers/
The CPHmodels Server: http://www.cbs.dtu.dk/services/CPHmodels/
3D-JIGSAW: http://www.bmm.icnet.uk/~3djigsaw/
SDSC1: http://cl.sdsc.edu/hm.html
EsyPred3D: http://www.fundp.ac.be/urbm/bioinfo/esypred/

Comparative Modeling Servers & Programs
COMPOSER: http://www.tripos.com/sciTech/inSilicoDisc/bioInformatics/matchmaker.html
MODELLER: http://salilab.org/modeler
InsightII: http://www.msi.com/
SYBYL: http://www.tripos.com/