Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher...
Date posted: 22-Dec-2015
![Page 1: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/1.jpg)
Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs
Christopher Bystroff & David Baker
Paper presented by: Tal Blum
![Page 2: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/2.jpg)
The Approach
• Learn a set of clusters, or structure segments, that can be identified from a short local sequence
• Combine a set of local structural predictions into one whole structure
![Page 3: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/3.jpg)
Methods - Database
• Database of 471 protein sequence families
• By Sander & Schneider 1994
• Each family contains one sequence of known structure
• No more than 25% sequence identity between any 2 alignments
• Well determined structures
• Non-membrane proteins
![Page 4: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/4.jpg)
Clustering of Sequence Segments
• Each position in the database is described by a weighted amino acid frequency (Vingron & Argos 1989)
• Similarity between a sequence and a cluster is defined by “Cross-Entropy”:
• Segments of given length (3-15) were clustered via the K-means algorithm
• Unsupervised
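The cross-entropy formula on this slide did not survive extraction. As a rough illustration only, the sketch below uses the textbook cross-entropy between a segment profile and a cluster profile; the function name, the dict-based profile layout, and the pseudocount are assumptions, and the paper's actual similarity measure may differ.

```python
import math

def cross_entropy(segment, cluster, pseudo=1e-6):
    """Textbook cross-entropy (in nats) between two profiles.

    Each profile is a list with one dict per position, mapping an
    amino acid letter to its frequency (hypothetical layout).
    Lower values mean the segment fits the cluster better.
    """
    if len(segment) != len(cluster):
        raise ValueError("profiles must have the same length")
    total = 0.0
    for p, q in zip(segment, cluster):
        for aa, p_aa in p.items():
            # The pseudocount avoids log(0) for residues unseen in the cluster.
            total -= p_aa * math.log(q.get(aa, 0.0) + pseudo)
    return total
```

A K-means step would assign each segment to the cluster with the lowest value and then recompute the cluster profiles.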
![Page 5: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/5.jpg)
Assessing Structure within a Cluster and Choice of Paradigm
• Structural similarity between 2 peptide structure segments is measured by dme (distance matrix error) and mda (maximum deviation in backbone torsion angles)
• s1(i→j) is the distance between α-carbon atoms i and j in segment s1
• The paradigm for a cluster was chosen from the top 20 segments as the one with the smallest sum of mda/dme values with the others
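The two structural measures can be sketched as follows. This is a minimal sketch under stated assumptions: dme is taken as the root-mean-square difference over all α-carbon pair distances, and mda as the largest φ or ψ difference with wrap-around at 360°. The paper may restrict which atom pairs enter the sum; the function names are hypothetical.

```python
import math

def ca_distances(coords):
    """All pairwise distances between alpha-carbon coordinates (x, y, z)."""
    n = len(coords)
    return {(i, j): math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)}

def dme(coords1, coords2):
    """RMS difference between the two intra-segment distance matrices."""
    d1, d2 = ca_distances(coords1), ca_distances(coords2)
    return math.sqrt(sum((d1[k] - d2[k]) ** 2 for k in d1) / len(d1))

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def mda(torsions1, torsions2):
    """Largest phi or psi deviation between two lists of (phi, psi) pairs."""
    return max(max(angle_diff(p1, p2), angle_diff(s1, s2))
               for (p1, s1), (p2, s2) in zip(torsions1, torsions2))
```

Choosing the paradigm then amounts to picking, among the top 20 segments, the one whose summed distance to the other 19 is smallest.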
![Page 6: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/6.jpg)
True/False Boundaries in Structure Space
• Used for the refinement procedure
• Find Natural Boundaries
• Compute Histograms of dme & mda vs the paradigm over all segments in the cluster
• The boundary was set to the point where the histogram first dropped to ½ of its maximum
• If the boundary reached 130° or 1.3 Å, the cluster was rejected
• The average boundaries are 81° and 0.89 Å
• 82 clusters were constructed (I-site library)
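The half-maximum rule can be sketched as below. The bin width, the choice to scan outward from the peak bin, and returning the left edge of the first bin at or below half the peak are all assumptions made for illustration; the slides do not specify them.

```python
def half_max_boundary(values, bin_width, limit):
    """Find where a histogram of deviations first drops to half its peak.

    values: dme or mda deviations from the paradigm, one per segment.
    limit: rejection threshold (e.g. 130 for mda in degrees).
    Returns the boundary, or None if no drop is found below the limit,
    in which case the cluster would be rejected.
    """
    n_bins = round(limit / bin_width)
    counts = [0] * n_bins
    for v in values:
        b = int(v / bin_width)
        if 0 <= b < n_bins:
            counts[b] += 1
    peak = max(counts)
    if peak == 0:
        return None
    peak_bin = counts.index(peak)
    # Scan to the right of the peak for the first bin at or below half maximum.
    for b in range(peak_bin + 1, n_bins):
        if counts[b] <= peak / 2:
            return b * bin_width
    return None
```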
![Page 7: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/7.jpg)
dme vs. mda for the 9-residue serine β-hairpin
![Page 8: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/8.jpg)
Iterative Refinement of Clusters
• For each cluster with good boundaries
• Clustering increases P(cluster|sequence)
• In order to increase P(structure|cluster)
• 2 residues are also observed on each side of each sequence
• All segments that are not within the natural boundaries of the paradigm are removed
• The frequency profile of the cluster is calculated
• The database is searched using the new profile and the 400 highest-scoring sequences form the new cluster
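One pass of the refinement loop might look like the sketch below. The segment layout (a dict with a "profile" entry), the unweighted profile average, and the caller-supplied `is_within_boundary` and `score` functions are hypothetical stand-ins for the paper's structural filter and profile scoring.

```python
def average_profile(segments):
    """Unweighted mean of per-position amino acid frequencies."""
    length = len(segments[0]["profile"])
    profile = []
    for pos in range(length):
        totals = {}
        for seg in segments:
            for aa, freq in seg["profile"][pos].items():
                totals[aa] = totals.get(aa, 0.0) + freq
        profile.append({aa: t / len(segments) for aa, t in totals.items()})
    return profile

def refine_once(cluster, database, is_within_boundary, score, top_n=400):
    """Drop structural outliers, rebuild the profile, re-search the database."""
    kept = [seg for seg in cluster if is_within_boundary(seg)]
    if not kept:
        return []
    profile = average_profile(kept)
    # Higher score means a better match between segment and cluster profile.
    ranked = sorted(database, key=lambda seg: score(seg["profile"], profile),
                    reverse=True)
    return ranked[:top_n]
```

Calling `refine_once` repeatedly, feeding each result back in as the new cluster, gives the iterative procedure.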
![Page 9: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/9.jpg)
Cross-Validation and confidence
• A 10-fold cross-validation was performed
• If the 10 paradigms were not structurally the same, or if the 10 runs did not converge to the same profile, the cluster was rejected
• If the cluster was not rejected, a confidence curve was computed as a function of Dpq, the sequence-to-cluster similarity
• This makes it possible to compare different profile lengths, and incorporates P(clust|seq) and P(struct|clust)
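A confidence curve of this kind can be approximated by binning similarity scores and measuring, per bin, the fraction of segments whose structure really matches the paradigm. The sketch below assumes fixed bin edges and a list of (score, is_true_positive) pairs collected from the cross-validation runs; both are illustrative choices, not details from the slides.

```python
def confidence_curve(scored, edges):
    """Fraction of structurally correct matches per score bin.

    scored: list of (similarity_score, is_true_positive) pairs.
    edges: increasing bin edges; bin k covers [edges[k], edges[k+1]).
    Returns (lower_edge, upper_edge, fraction_correct) for non-empty bins.
    """
    curve = []
    for lo, hi in zip(edges, edges[1:]):
        hits = [ok for score, ok in scored if lo <= score < hi]
        if hits:
            curve.append((lo, hi, sum(hits) / len(hits)))
    return curve
```

Because the curve maps a raw score to a probability of being correct, scores from clusters with different profile lengths become comparable.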
![Page 10: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/10.jpg)
Confidence for Similarity
![Page 11: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/11.jpg)
Clustering – What do we want?
• Direction: Sequence -> Structure
• We want clusters of sequences that are as well separated as possible, so that a given test sequence can be assigned to 1 cluster
• Each cluster should have 1 or a few possible structures. Those structures will be used to predict the test protein structure
• P(struct|seq) = Σ_clust P(struct|clust,seq) * P(clust|seq) = Σ_clust P(struct|clust) * P(clust|seq), assuming the structure depends on the sequence only through the cluster
![Page 12: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/12.jpg)
Iterative Peak Removal
• Similar Sequences can map to different structures in some cases
• When this happens, the predominant pattern occludes the second one
• To find those clusters, the refinement was performed using a subset of the data that excludes the other class members
• This helped identify two distinct α-C-cap extensions which were very similar in sequence
![Page 13: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/13.jpg)
Cluster Weights
• The prediction accuracy is improved by weighting the confidence curves
• Iterative update was used
• Where F⁺_C is the number of false positives of cluster C and F⁻_C is the number of false negatives
![Page 14: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/14.jpg)
Prediction Protocol
• Given a sequence to predict:
1. Submit the sequence to PHD (Rost 94) to obtain a set of multiple aligned sequences and hence a profile
2. Each segment of the profile is scored against each of the 82 clusters to produce weighted confidences
3. Confidences are sorted
4. The first segment assigns φ & ψ from its paradigm
5. For all the subsequent segments in the sorted list the prediction is used if it doesn’t conflict with previously assigned φ & ψ
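Steps 3 to 5 describe a greedy assignment, which can be sketched as follows. The slides do not define "conflict"; here it is assumed to mean that an overlapping, already-assigned position differs by more than `max_diff` degrees in φ or ψ. That threshold and the tuple layout of a prediction are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def assign_torsions(predictions, seq_len, max_diff=60.0):
    """Greedy (phi, psi) assignment in order of decreasing confidence.

    predictions: list of (confidence, start, torsions), where torsions is
    the paradigm's list of (phi, psi) pairs for the matched segment.
    Returns one (phi, psi) per residue, or None where nothing was assigned.
    """
    assigned = [None] * seq_len
    for _, start, torsions in sorted(predictions, key=lambda p: p[0],
                                     reverse=True):
        span = range(start, start + len(torsions))
        if span.stop > seq_len:
            continue
        conflict = any(
            assigned[i] is not None
            and (angle_diff(assigned[i][0], t[0]) > max_diff
                 or angle_diff(assigned[i][1], t[1]) > max_diff)
            for i, t in zip(span, torsions))
        if conflict:
            continue
        for i, t in zip(span, torsions):
            if assigned[i] is None:
                assigned[i] = t
    return assigned
```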
![Page 15: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/15.jpg)
Results
• Reported on the training set and on an independent set of 55 protein families
• Local evaluation is measured by agreement over an 8-residue window
• An 8-residue segment prediction is considered correct if none of the φ & ψ differences is larger than 120° or if the rmsd between the correct and predicted structure is less than 1.4 Å
• An error is counted per position iff all 8 overlapping segments are incorrect
• mda is stricter than the commonly used Q3 score
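The per-position error rule can be made concrete with a short sketch. It assumes the correctness of every window has already been decided, and it treats positions near the chain ends, which are covered by fewer than 8 windows, by looking only at the windows that exist. The function names are hypothetical.

```python
def window_is_correct(max_angle_diff, rmsd, angle_cut=120.0, rmsd_cut=1.4):
    """A window counts as correct if either criterion is met."""
    return max_angle_diff <= angle_cut or rmsd < rmsd_cut

def position_errors(window_correct, window=8):
    """Flag a position as an error iff every window covering it is incorrect.

    window_correct[k] is True when the window starting at residue k
    was predicted correctly.
    """
    n_windows = len(window_correct)
    seq_len = n_windows + window - 1
    errors = []
    for pos in range(seq_len):
        first = max(0, pos - window + 1)
        last = min(pos, n_windows - 1)
        errors.append(not any(window_correct[first:last + 1]))
    return errors
```

For example, with a window of 3 and window results `[False, False, True, False]`, only residues 0, 1 and 5 are counted as errors, because residues 2 to 4 are all covered by the one correct window.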
![Page 16: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/16.jpg)
Results
• Training Set
– 471 sequences -> 122,510 residues
– 95% of the 471 had 1 match with confidence ≥ 0.8
– 40% of the residues had confidence ≥ 0.6 and were 71% (mda) correct
![Page 17: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/17.jpg)
Results
![Page 18: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/18.jpg)
Combinations of I-sites and conventional Secondary Structure Predictions
• With the PHD program
• Requires translation into Sec Structure or from SS into torsion angles
• Every program performed better in its own domain
• 64% Q3 because of under-predicting loops and over-predicting strands
• I-site was much better in loops and specific angles of turns
• Can complement PHD
![Page 19: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/19.jpg)
Comparison of I-Site & PHD
![Page 20: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/20.jpg)
I-site library
• The 82 clusters represent 13 structural motifs
![Page 21: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/21.jpg)
Summary of the I-site library
![Page 22: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/22.jpg)
![Page 23: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/23.jpg)
Conclusions
• Method is fast – requires only profile comparisons
• There is a measure of “confidence” in the prediction
• The authors do not report accuracy over the whole protein
• They believe that the strong local sequence-structure relationships (those that occur more than 30 times) are present in the I-site library
![Page 24: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/24.jpg)
Discussion
• NMR studies of isolated peptides of fewer than 30 residues show that the peptides do not have a well-defined structure. The I-site motifs are the exceptions
• It might be that the motifs are the areas that adopt structure independently of the rest of the protein
• An extension might be context specific motifs
![Page 25: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/25.jpg)
![Page 26: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/26.jpg)
![Page 27: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/27.jpg)
2 Approaches for global scoring functions
• Derived from the protein Database
– Large # of parameters
– Complicated
• Potentials
– Based on Chemical Intuitions
– Simpler
– Clearer insights into sequence/structure relations
• They chose the Database approach
– Because of the dangers of crafting a measure for a specific protein family rather than for the whole DB
![Page 28: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/28.jpg)
Scoring Functions
$$P(\text{Structure} \mid \text{Sequence}) = \frac{P(\text{Structure})\, P(\text{Sequence} \mid \text{Structure})}{P(\text{Sequence})} \propto P(\text{Structure})\, P(\text{Sequence} \mid \text{Structure})$$
since P(Sequence) is independent of Structure
• P(Seq|Str) is used when computing sequence profiles for motifs
• P(Structure) is hardest to estimate and contains most of the non-local interactions.
• For ab-initio, P(Structure) captures the features that distinguish folded structures from random chain (local) configurations.
![Page 29: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/29.jpg)
Radius of gyration² Scoring Function
• Measures the spread of the chain around the center of the fold (the radius of gyration is the root-mean-square distance of the atoms from the centroid)
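For reference, the squared radius of gyration is the mean squared distance of the atoms from their centroid. The sketch below assumes unweighted coordinates (for example one α-carbon per residue); mass weighting and the way the value enters the scoring function are not specified on the slide.

```python
def radius_of_gyration_sq(coords):
    """Mean squared distance of (x, y, z) points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return sum(sum((c[k] - center[k]) ** 2 for k in range(3))
               for c in coords) / n
```

A compact fold gives a small value and an extended chain gives a large one, so the term favors compact candidate structures.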
![Page 30: Prediction of Local Structure in Proteins Using a Library of Sequence-Structure Motifs Christopher Bystroff & David Baker Paper presented by: Tal Blum.](https://reader035.fdocuments.net/reader035/viewer/2022062715/56649d7f5503460f94a6205e/html5/thumbnails/30.jpg)
Radius of gyration² Scoring Function
• Advantages
– Does not depend on the alpha-beta decomposition: since the generated structures are made from segments of real proteins, their alpha-beta decomposition is much like that of real proteins
• Disadvantages
– Structures with paired beta strands are no more probable than those with unpaired beta strands