Lecture 5 Protein Modeling - University of...
Transcript of Lecture 5 Protein Modeling - University of...
-
Lecture 5Protein Modeling
June 4, 2008
Protein Structure Prediction
-
3D Protein Structure
ALA
C
LEU
C
PRO
C
VAL
C
ARG
C
? ? ?
backbone
sidechain
-
The Protein Folding Problem
we know that the function of a protein is determined in large part by its 3D shape (fold, conformation)
can we predict the 3D shape of a protein given only its amino-acid sequence?
-
Motivation Want to identify the function of genes we find,
and what different mutations/alleles do One gene = one protein (sort of)
Function of protein = function of gene
Function can be determined in many ways Gene expression, knockouts, etc But these take time, and are prone to mistakes
Goal: If we can structure every protein, learning their functions isnt too far away
-
Thornton et al 2000 (Nature)
-
ai.stanford.edu/~serafim/CS262_2006/Slides
-
Protein Architecture proteins are polymers consisting of amino acids linked by
peptide bonds each amino acid consists of
a central carbon atom an amino group a carboxyl group a side chain
differences in side chains distinguish different amino acids
2NHCOOH
-
3D Protein Structure
backbonebackbonesidechainbackbonesidechainC-alpha
-
Peptide Bondsaminogroup
carboxylgroup
sidechain
carbon (common reference point for coordinates of a structure)
-
Amino Acid Side Chains side chains vary in: shape, size, charge, polarity
-
Levels of Description protein structure is often described at four different scales
primary structure secondary structure tertiary structure quaternary structure
-
Levels of Description
-
Secondary Structure secondary structure refers to certain common repeating
structures it is a local description of structure two common secondary structures
helices strands/sheets
a third category, called coil or loop, refers to everything else
-
Helices
carbon
hydrogenbond
individualamino acid
-
Sheets
-
Ribbon Diagram Showing Secondary Structures
-
Levels of Description
-
What Determines Conformation? in general, the amino-acid sequence of a protein determines
the 3D shape of a protein [Anfinsen et al., 1950s] but some exceptions
all proteins can be denatured some proteins are inherently disordered (i.e. lack a regular
structure) some proteins get folding help from chaperones there are various mechanisms through which the
conformation of a protein can be changed in vivo post-translational modifications such as
phosphorylation prions etc.
-
What Determines Conformation?
what physical properties of the protein determine its fold? rigidity of the protein backbone interactions among amino acids, including
electrostatic interactions van der Waals forces volume constraints hydrogen, disulfide bonds
interactions of amino acids with water
-
Determining Protein Structures protein structures can be determined
experimentally (in many cases) by x-ray crystallography nuclear magnetic resonance (NMR)
-
Myoglobin
From www.inst.bnl.gov/GasDetectorLab/x-rays/SRI94.htm
-
Myoglobin
S.E.V. Phillips. "Structure and refinement of oxymyoglobin at 1.6 resolution.", J. Mol. Biol. 1980, 142, 531.
-
X-ray Crystallography
proteincrystal collection
plate
x-raybeam
diffractionpattern
electrondensity map
(3D picture)
-
Electron Density Map Interpretation
GIVEN: 3D Electron Density Map
-
Electron Density Map Interpretation
FIND: All-atom Protein Model
-
NMR
Nuclear Magnetic Resonance Spectroscopy Cannot handle large proteins like X-ray Exploits the chemical environment to return
distances between atoms Can use knowledge of restraints to identify
positions of atoms that produce peaks
-
Protein structure determination in solution by NMR spectroscopy Wuthrich K. J Biol Chem. 1990 December 25;265(36):22059-62
-
Experimental Methods
Very expensive and time-consuming Computational methods can help with time
Many proteins still cannot be done in this manner
-
More motivation
there is a large sequence-structure gap300K protein sequences in SwissProt
database50K protein structures in PDB database
key question: can we predict structures by computational means instead?
-
Approaches to Protein Structure Prediction
prediction in 1D secondary structure solvent accessibility (which residues are exposed to
water, which are buried) transmembrane helices (which residues span
membranes) prediction in 2D
inter-residue/strand contacts prediction in 3D
homology modeling fold recognition (e.g. via threading) ab initio prediction (e.g. via molecular dynamics)
-
Prediction in 1D, 2D and 3D
Figure from B. Rost, Protein Structure in 1D, 2D, and 3D, The Encyclopaedia of Computational Chemistry, 1998
predicted secondary structure and solvent accessibility
known secondary structure (E = beta strand) and solvent accessibility
-
2D Prediction Approaches use secondary structure predictions
to predict short-range contacts (e.g. hydrogen bonds in helices)
use secondary structure predictions to predict strand alignments
use correlated mutations to predict contacts
-
Prediction in 3D homology modeling
given: a query sequence Q, a database of protein structuresdo:
find protein P has high sequence similarity to Q return Ps structure as an approximation to Qs structure
fold recognition (threading)given: a query sequence Q, a database of known foldsdo:
find fold F such that Q can be aligned with F in a highly compatible manner
return F as an approximation to Qs structure
-
Prediction in 3D fragment assembly(Rosetta)
given: a query sequence Q, a database of structure fragmentsdo:
find a set of fragments that Q can be aligned with in a highly compatible manner
return the combined fragments as an approximation
molecular dynamicsgiven: a query sequence Qdo:
use laws of physics to to simulate folding of Q
-
Homology Modeling
0% 100%30%pairwise sequence identity
homologsprobablyunrelated
remotehomologs
20%
most pairs of proteins with similar structure are remote homologs (< 25% sequence identity)
homology modeling usually doesnt work for remote homologs ; most pairs of proteins with < 25% sequence identity are unrelated
-
Homology-based Prediction
Raw model
Loop modeling
Side chain placement
Refinement
-
The SCOP DatabaseStructural Classification Of Proteins
FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
SUPERFAMILY: proteins whose families have some sequence and function/structure similarity suggesting a common evolutionary origin
COMMON FOLD: superfamilies that have same secondary structures in same arrangement, probably resulting by physics and chemistry
ai.stanford.edu/~serafim/CS262_2006/Slides
-
Examples of Fold Classesai.stanford.edu/~serafim/CS262_2006/Slides
-
Threading
-
Ab initio Prediction ROSETTA
1. PSI-BLAST homology search
Discard sequences with >25% homology
2. PHD
For each 3-long and each 9-long sequence fragment, get 25 structure fragments that match well
3 M k Ch i M C l
?? ?
ai.stanford.edu/~serafim/CS262_2006/Slides
-
ai.stanford.edu/~serafim/CS262_2006/Slides
-
Ab initio Prediction CASP results
ai.stanford.edu/~serafim/CS262_2006/Slides
-
Summary of current state of the art
ai.stanford.edu/~serafim/CS262_2006/Slides
-
Open Ended
Ab Initio is the goal, far from it Sidechain prediction Contact Map prediction Search space reduction Parallelization (GPUs) Surface Accessibility
-
Other areas
Protein-Protein Interaction Drug Design Protein Engineering Ligand Docking/Inhibition Function Prediction
Lecture 5Protein ModelingJune 4, 20083D Protein StructureThe Protein Folding ProblemMotivationSlide Number 5Slide Number 6Protein Architecture3D Protein StructurePeptide BondsAmino Acid Side ChainsLevels of DescriptionLevels of DescriptionSecondary Structurea Helicesb SheetsRibbon Diagram Showing Secondary StructuresLevels of DescriptionWhat Determines Conformation?What Determines Conformation?Determining Protein StructuresSlide Number 21Slide Number 22X-ray CrystallographyElectron Density Map InterpretationElectron Density Map InterpretationNMRSlide Number 27Experimental MethodsMore motivationApproaches to Protein Structure PredictionPrediction in 1D, 2D and 3D2D Prediction ApproachesPrediction in 3DPrediction in 3DSlide Number 35Homology ModelingHomology-based PredictionThe SCOP DatabaseExamples of Fold ClassesThreadingAb initio Prediction ROSETTA Slide Number 42Ab initio Prediction CASP resultsSummary of current state of the artOpen EndedOther areas