Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic...
Date posted: 15-Jan-2016
![Page 1: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/1.jpg)
Protein Folding Initiation Site Motifs
Chris Bystroff
Dept of Biology
Rensselaer Polytechnic Institute, Troy, NY
![Page 2: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/2.jpg)
ATCTGTATCGTATCGTATTTCTGGHACCCCCTGATGTAAAAGAGAGTTCTATATTACTACAACCACGATCGGATTTATTTTGGTCTADCAGCTCAGGATCATCACAGGATTCAAATCCTATCATCAGGAGGGGGGTCGTGGTTGCGCATTAGCAAAGTTGCAGTCAGTCGTCATGCAGCGACCACATACACACTGCATGCGCGTCTTCAVATCCCACAGTCAGTAGTAGTCACAGACCTCCAGTCAGTCGAGTACGACGTCAGTACGTCAGTCAGCCAGTCAGTCAGTCATACGTACGTCATGCATACGTAGCTAGCAGACGCAGCATTACGTCGCGATCGATCGATCGGCATAGCAGCATCCCAGTCAGTCATATGCATAGTCGATCGACGTCAGTCATGAGATCGTACGAAATACGTAGCTGATCGACGTCAGTCAGACTGATCGATCGGATTCAGTCACGATGCATGCTAGCAAAGTCAGCGCATGCTAGCTACGTAGTCAGTACTGCATGCAGTACGTACGTAGACGTCAGTCAGTCAGTCATGATGCTAGCTAGCTACGTCACAGTCAGTCATGACTGACTGACTGACTGCAGTCAGTCATCGATACGTAGCTAGCTACGTCAGTCATGCAGTCAGTCATTGATGATCGATCGTACATGCAGATGCCGTAGGCTAGCTAGCTAGCACTACGATGCATGCTAGCTAGCTACGACCAGTACCATGATGACTGCATGATCATACTGCCCAAAAAACGACTTAATCGTATCGTATTTCTGGHACCCCCTGATGTAAAAGAGAGTTCTATATTACTACAACCACGATCGGATTTATTTTGGTCTADCAGCTCAGGATCATCACAGGATTCAAATCCTATCATCAGGAGGGGGGTCGTGGTTGCGCATTAGCAAAGTTGCAGTCAGTCGTCATGCAGCGACCACATACACACTGCATGCGCGTCTTCAVATCCCACAGTCAGTAGTAGTCACAGACCTCCAGTCAGTCGAGTACGACGTCAGTACGTCAGTCAGCCAGTCAGTCAGTCATACGTACGTCATGCATACGTAGCTAGCAGACGCAGCATTACGTCGCGATCGATCGATCGGCATAGCAGCATCCCAGTCAGTCATATGCATAGTCGATCGACGTCAGTCATGAGATCGTACGAAATACGTAGCTGATCGACGTCAGTCAGACTGATCGATCGGATTCAGTCACGATGCATGCTAGCAAAGTCAGCGCATGCTAGCTACGTAGTCAGTACTGCATGCAGTACGTACGTAGACGTCAGTCAGTCAGTCATGATGCTAGCTAGCTACGTCACAGTCAGTCATGACTGACTGACTGACTGCAGTCAGTCATCGATACGTAGCTAGCTACGTCAGTCATGCAGTCAGTCATTGATGATCGATCGTACATGCAGATGCCGTAGGCTAGCTAGCTAGCACTACGATGCATGCTAGCTAGCTACGACCAGTACCATGATGACTGCATGATCATACTGCCCAAAAAACGACTTAATCGTATCGTATTTCTGGHACCCCCTGATGTAAAAGAGAGTTCTATATTACTACAACCACGATCGGATTTATTTTGGTCTADCAGCTCAGGATCATCACAGGATTCAAATCCTATCATCAGGAGGGGGGTCGTGGTTGCGCATTAGCAAAGTTGCAGTCAGTCGTCATGCAGCGACCACATACACACTGCATGCGCGTCTTCAVATCCCACAGTCAGTAGTAGTCACAGACCTCCAGTCAGTCGAGTACGACGTCAGTACGTCAGTCAGCCAGGAGGGGGGTCGTGGTTGCGCATTAGCAAAGTTGCAGTCAGTCGTCATGCAGCGACCACATACACACTGCATGCGCGTCTTCAVATCCCACAGTCAGTAGTAGTCACAGACCTCCAGTCAGTCGAGTACGACGTCAGTACGTCAGTCAGCCAGTCAGTCAGTCATACGTACGTCATGCATACGTAGCTAGCAGACGCAGCATTACG
TCGCGATCGATCGATCGGCATAGCAGCATCCCAGTCAGTCATATGCATAGTCGATCGACGTCAGTCATGAGATCGTACGAAATACGTAGCTGATCGACGTCAGTCAGACTGATCGATCGGATTCAGTCACGATGCATGCTAGCAAAGTCAGCGCATGCTAGCTACGTAGTCAGTACTGCATGCAGTACGTACGTAGACGTCAGTCAGTCAGTCATGATGCTAGCTAGCTACGTCACAGTCAGTCATGACTGACTGACTGACTGCAGTCAGTCATCGATACGTAGCTAGCTACGTCAGTCATGCAGTCAGTCATTGATCATGATCATACTGCCCAAAAAACGACTTA
Bioinformatics = sequence analysis
Biological sequences come in two types: DNA and protein
DNA has a four-letter alphabet
Protein has a 20-letter alphabet
Sequences are an abstraction. As such, they are treated abstractly...
Sequence alignment
Phylogenetic trees
Gene finding
Data mining
![Page 3: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/3.jpg)
"A free-standing reality"
ATGCATCAGGACTAGCTATCAGAATC
Any DNA sequence REPRESENTS a physical object, and some DNA sequences translate to protein sequences, which also REPRESENT physical objects.
behind the abstraction...
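As a small illustration of the "DNA translates to protein" step, here is a minimal Python sketch that translates the DNA string shown on this slide using the standard genetic code. Reading from the first base and stopping at the first stop codon are assumptions made for the example; the function name `translate` is hypothetical.

```python
# Standard genetic code packed in TCAG order (a common compact encoding).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base (assumed frame) up to the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate("ATGCATCAGGACTAGCTATCAGAATC")
print(peptide)  # MHQD
```

The four-letter DNA alphabet maps onto the 20-letter protein alphabet three bases at a time; the physical objects behind both strings are what the rest of the talk is about.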
![Page 4: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/4.jpg)
Sequence = Structure
Structure = Function
Function = Life
__________________
Sequence = Life
![Page 5: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/5.jpg)
The protein folding problem
Unfolded Folded
This happens spontaneously (in water).
Sequence = Structure
![Page 6: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/6.jpg)
The problem with the protein folding problem.
Number of amino acid residues in a typical protein: 100
Approximate number of degrees of freedom per residue: 3
Estimated total number of conformations (= 3^100): 10^45
Time required to fold if all conformations are sampled at the rate of 1 per 10^-15 s: 10^20 y
Time since the Big Bang: ~13 x 10^9 y
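The arithmetic behind this estimate can be checked in a few lines. The sketch below takes the slide's assumptions literally (100 residues, 3 states each, one conformation sampled per femtosecond); the constant names are made up for the example. Taken literally, the numbers come out somewhat larger than the rounded figures quoted above, which only strengthens the point.

```python
RESIDUES = 100
STATES_PER_RESIDUE = 3
SAMPLING_TIME_S = 1e-15               # one conformation per femtosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_Y = 13e9

# Exhaustive search: every residue independently takes one of 3 states.
conformations = STATES_PER_RESIDUE ** RESIDUES
search_time_y = conformations * SAMPLING_TIME_S / SECONDS_PER_YEAR

print(f"{conformations:.1e} conformations")
print(f"{search_time_y:.1e} years to sample them all")
print(f"{search_time_y / AGE_OF_UNIVERSE_Y:.1e} times the age of the universe")
```

Whatever the exact exponent, a random search cannot finish in any biologically relevant time, so folding cannot be an exhaustive search.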
![Page 7: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/7.jpg)
pathways
![Page 8: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/8.jpg)
folding pathways must exist
The protein is unfolded...
...something happens first...
...then something else happens.
![Page 9: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/9.jpg)
Early events eliminate alternative pathways
![Page 10: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/10.jpg)
What happens first?
Helix/coil transition: 10-100 ns
Beta-hairpin: 0.1-1.0 µs
Transient intermediates: < 1 ms
Equilibrium: 0.001-1.0 s
![Page 11: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/11.jpg)
Local structure usually isn't stable
Helices and turns form quickly but just as quickly fall apart.
Most short peptides (<20aa) do not show structural stability in NMR studies.
Exceptions: A few short peptides have been shown to be conformationally stable (for example, Met-enkephalin = YGGFM).
![Page 12: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/12.jpg)
Interesting parallels between bioinformatics and semantics
| language | proteins |
| --- | --- |
| letters | amino acids |
| words | motifs |
| phrases | modules |
| sentences | whole proteins |
| meaning | structure |
| literature | genome |
| grammar | folding?? |
![Page 13: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/13.jpg)
[Slide background: the same DNA sequence text shown on slide 2]
Does anyone know the words?
What if we use the enormous database of protein sequences to find recurrent short patterns?
Those short patterns would be the words.
But, are they "meaningful words"?
(Does the sequence correlate with the local structure?)
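To make "find recurrent short patterns" concrete, here is a minimal sketch that counts short words (k-mers) across a set of sequences and keeps those seen in more than one sequence. The toy database, the word length `k`, and the function name `recurrent_words` are assumptions for illustration; the actual I-sites work clusters sequence profiles rather than exact strings.

```python
from collections import Counter


def recurrent_words(sequences, k=3, min_count=2):
    """Return k-mers that occur in at least `min_count` different sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each k-mer once per sequence so a repetitive protein cannot dominate.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical toy "database" of three short sequences.
toy_db = ["MKVLAAGIV", "TKVLAAGSP", "GGKVLQQ"]
words = recurrent_words(toy_db)
print(words)
```

Recurrence alone does not make a word meaningful; the open question on this slide is whether such words also share a local structure.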
![Page 14: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/14.jpg)
Maybe protein folding pathways can be found in protein sequence "grammar"
1. Letters
2. Words
3. Phrases
4. Sentences
![Page 15: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/15.jpg)
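Amino acids can be grouped

|   | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 4 | 0 | -2 | -1 | -2 | 0 | -2 | -1 | -1 | -1 | -1 | -2 | -1 | -1 | -1 | 1 | 0 | 0 | -3 | -2 |
| C |  | 9 | -3 | -4 | -2 | -3 | -3 | -1 | -3 | -1 | -1 | -3 | -3 | -3 | -3 | -1 | -1 | -1 | -2 | -2 |
| D |  |  | 6 | 2 | -3 | -1 | -1 | -3 | -1 | -4 | -3 | 1 | -1 | 0 | -2 | 0 | -1 | -3 | -4 | -3 |
| E |  |  |  | 5 | -3 | -2 | 0 | -3 | 1 | -3 | -2 | 0 | -1 | 2 | 0 | 0 | -1 | -2 | -3 | -2 |
| F |  |  |  |  | 6 | -3 | -1 | 0 | -3 | 0 | 0 | -3 | -4 | -3 | -3 | -2 | -2 | -1 | 1 | 3 |
| G |  |  |  |  |  | 6 | -2 | -4 | -2 | -4 | -3 | 0 | -2 | -2 | -2 | 0 | -2 | -3 | -2 | -3 |
| H |  |  |  |  |  |  | 8 | -3 | -1 | -3 | -2 | 1 | -2 | 0 | 0 | -1 | -2 | -3 | -2 | 2 |
| I |  |  |  |  |  |  |  | 4 | -3 | 2 | 1 | -3 | -3 | -3 | -3 | -2 | -1 | 3 | -3 | -1 |
| K |  |  |  |  |  |  |  |  | 5 | -2 | -1 | 0 | -1 | 1 | 2 | 0 | -1 | -2 | -3 | -2 |
| L |  |  |  |  |  |  |  |  |  | 4 | 2 | -3 | -3 | -2 | -2 | -2 | -1 | 1 | -2 | -1 |
| M |  |  |  |  |  |  |  |  |  |  | 5 | -2 | -2 | 0 | -1 | -1 | -1 | 1 | -1 | -1 |
| N |  |  |  |  |  |  |  |  |  |  |  | 6 | -2 | 0 | 0 | 1 | 0 | -3 | -4 | -2 |
| P |  |  |  |  |  |  |  |  |  |  |  |  | 7 | -1 | -2 | -1 | -1 | -2 | -4 | -3 |
| Q |  |  |  |  |  |  |  |  |  |  |  |  |  | 5 | 1 | 0 | -1 | -2 | -2 | -1 |
| R |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 5 | -1 | -1 | -3 | -3 | -2 |
| S |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 4 | 1 | -2 | -3 | -2 |
| T |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 5 | 0 | -2 | -2 |
| V |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 4 | -3 | -1 |
| W |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 11 | 2 |
| Y |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 7 |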
![Page 16: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/16.jpg)
Sequence alignments show evolutionary diversity
![Page 17: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/17.jpg)
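Sequence alignment
VIVAANRSA
VIVSAARTA
VIASAVRTA
VIVDAGRSA
VIASGVRTA
VIVAAKRTA
VIVSAVRTP
VIVSAARTA
VIVSAVRTP
VIVDAGRTA
VIVDAGRTA
VIVSGARTP
VIVDFGRTP
VIVSATRTP
VIVSATRTP
VIVGALRTP
VIVSATRTP
VIVSATRTP
VIASAARTA
VIVDAIRTP
VIVAAYRTA
VIVSAARTP
VIVDAIRTP
VIVSAVRTA
VIVAAHRTA

Sequence profile
$P_{ij} = \dfrac{\sum_{k \in \mathrm{seqs}} w_k \, \delta(s_{kj}, aa_i)}{\sum_{k \in \mathrm{seqs}} w_k}$
Here $s_{kj}$ is the residue at position $j$ of sequence $k$, $w_k$ is the weight of sequence $k$, and $\delta$ is 1 when the two symbols match and 0 otherwise.

Sequence profiles are condensed sequence alignments
Red = high prob ratio (>3)
Green = background prob ratio (~1)
Blue = low prob ratio (<1/3)
(Gribskov)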
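A profile entry is the weighted fraction of aligned sequences that carry amino acid i at column j. The sketch below computes that for the first five sequences of the alignment on this slide. Uniform sequence weights are an assumption (the slide does not say how the weights are chosen), and `sequence_profile` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_profile(alignment, weights=None):
    """Return one {amino acid: probability} dict per alignment column."""
    if weights is None:
        weights = [1.0] * len(alignment)  # assumption: all sequences weigh the same
    total = sum(weights)
    profile = []
    for j in range(len(alignment[0])):
        column = dict.fromkeys(AMINO_ACIDS, 0.0)
        for seq, w in zip(alignment, weights):
            column[seq[j]] += w  # add this sequence's weight to its residue
        profile.append({aa: p / total for aa, p in column.items()})
    return profile


alignment = ["VIVAANRSA", "VIVSAARTA", "VIASAVRTA", "VIVDAGRSA", "VIASGVRTA"]
profile = sequence_profile(alignment)
print(round(profile[2]["V"], 2), round(profile[2]["A"], 2))  # 0.6 0.4
```

The color key on the slide is in terms of probability ratios; dividing each entry by a background amino acid frequency would give those ratios, but no background table is given here, so that step is left out.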
![Page 18: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/18.jpg)
Clustering profiles
Each dot represents a different 1-residue profile.
"distance" between two points = $\sum_{l=1}^{1} \sum_{i=1}^{20} \lvert P_{ijl} - P_{ikl} \rvert$
"K-means" clustering
Resulting clusters:
K Q R
A S T
A C S
W Y F
A P G
D E N
I L V M
H Y
did it!
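The clustering step can be sketched as follows: each 1-residue profile is a vector of probabilities, the distance between two profiles is the sum of absolute differences, and k-means alternates between assigning points to the nearest center and averaging each cluster. This is a minimal sketch, not the implementation used for the slide; the deterministic start from the first k points, the toy 3-letter profiles, and the function names are all assumptions.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means using the city-block distance for the assignment step."""
    centers = [list(p) for p in points[:k]]  # assumption: start from the first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: l1_distance(p, centers[c])) for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical toy profiles over a 3-letter alphabet.
points = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.8, 0.2, 0.0], [0.0, 0.9, 0.1]]
assignment, centers = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1]
```

With real profiles the vectors have 20 components, one per amino acid, and amino acids that are interchangeable in evolution end up weighted together in the same cluster centers.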
![Page 19: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/19.jpg)
Protein sequence grammar
1. Letters: amino acid profiles
2. Words
3. Phrases
4. Sentences
![Page 21: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/21.jpg)
Clustering profile segments, length L
Each dot represents a different short profile (~120,000 segments).
"distance" from j to k = $\sum_{l=1}^{L} \sum_{i=1}^{20} \lvert P_{ijl} - P_{ikl} \rvert$
~800 clusters for each L, L = 3 to 15
[Figure: profile of one segment, positions 26-32, with amino acids (AA) grouped in rows as C, FLIV, WYM, AQNT, SHRK, EDPG]
![Page 22: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/22.jpg)
Learning the structure of each sequence cluster
the database
Search the database for the 400 nearest neighbors
remove all cluster members that do not conform with the paradigm
profile of cluster
cluster of nearest neighbors
After convergence, a cross-validation test is done.
![Page 23: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/23.jpg)
I-sites library of sequence structure motifs
1000's of sequence clusters
supervised learning
Cross-validation
262 motifs
Number of different motifs after removing register variants: 31
![Page 24: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/24.jpg)
Example of a motif
Sequences that match sequence profile....
...tend to have the same structure...
...and this is it.
![Page 25: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/25.jpg)
Clustering finds previously known sequence-structure motifs
amphipathic α-helix
amphipathic β-strand
α-helix N-cap
p•nppn• nS••En•p •n•n
![Page 26: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/26.jpg)
Many new motifs are found
Diverging type-2 turn
Serine hairpin
Type-I hairpin
Frayed helix
Proline helix C-cap
Alpha-alpha corner
Glycine helix N-cap
![Page 27: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/27.jpg)
![Page 28: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/28.jpg)
Why are there motifs in proteins?
Ancient conserved regions?
Selection for stability?
Folding initiation sites?
![Page 29: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/29.jpg)
Structural features seem to drive clustering.
1. glycine at strained angles
2. conserved sidechain contacts
3. negative design against alternative structures (helix)
![Page 30: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/30.jpg)
I-sites sequence patterns are distinct

| # | Motif | Number of clusters | Pattern sites / 100 positions (overall) | Pattern sites / 100 positions (confid. > 0.60) | Average boundaries of mda (°) | Average boundaries of dme | Average boundaries of rmsd (len) | Conserved non-polar residues |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Amphipathic α-helix | 13 | 3.1 | 0.9 | 56 | 0.71 | 0.78 (15) | 1-4-8, 1-5-8 |
| 2 | Non-polar α-helix | 6 | 0.9 | 0.12 | 54 | 0.58 | 0.40 (11) | 1-4-8, 1-5-8 |
| 3 | Schellman cap Type 1 | 6 | 0.09 | 0.07 | 81 | 1.01 | 1.02 (15) | 1-6-9-11 |
| 4 | Schellman cap Type 2 | 10 | 0.3 | 0.14 | 76 | 0.94 | 0.94 (15) | 1-6-8-9 |
| 5 | Proline α-helix C cap | 10 | 1.8 | 0.6 | 92 | 1.07 | 0.89 (13) | 1-2-5-8 |
| 6 | Frayed helix | 2 | 1.2 | 0.13 | 75 | 0.96 | 0.69 (15) | 1-5-9-13 |
| 7 | Helix N capping box | 10 | 1.1 | 0.6 | 99 | 0.95 | 0.65 (15) | 1-6-9-13 |
| 8 | Amphipathic β-strand | 8 | 6.8 | 2.1 | 89 | 0.87 | 0.87 (6) | 1-3, 1-3-5 |
| 9 | Hydrophobic β-strand | 5 | 2.3 | 0.3 | 101 | 0.91 | 0.91 (7) | 1-2-3 |
| 10 | β-bulge | 2 | 0.5 | 0.15 | 100 | 0.97 | 0.78 (7) | 1-4-6 |
| 11 | Serine β-hairpin | 4 | 1.3 | 0.3 | 94 | 0.76 | 0.81 (9) | 1-8 |
| 12 | Type-I hairpin | 2 | 0.07 | 0.04 | 80 | 0.94 | 1.23 (13) | 1-7-8 |
| 13 | Diverging Type-II turn | 4 | 0.3 | 0.14 | 87 | 1.04 | 1.00 (9) | 1-7-9 |

(Bystroff & Baker, J. Mol. Biol, 1998)
![Page 31: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/31.jpg)
A hypothesis:
I-sites sequence motifs are folding initiation sites.
• The I-sites sequence patterns are mutually exclusive.
• Each I-sites motif is found in a variety of contexts.
• Local structure forms fast.
• Early-folding units 'initiate' folding.
One reason this hypothesis may be wrong:
Database statistics may reflect bias in the data.
![Page 32: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/32.jpg)
Alpha helices may fold by packing interactions.
Dots show positions of alpha-carbons relative to the amphipathic helix motif. The hydrophobic side is up.
maybe not...
![Page 33: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/33.jpg)
How do we test this hypothesis?
• See if I-sites peptides fold in isolation from the rest of the protein.
... by NMR.
... by simulation.
![Page 34: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/34.jpg)
NMR structure of a 7-residue I-sites motif in isolation
[Figure: panels (a)-(d) showing sequence profiles over positions 26-32 and 1-7, with amino acids (AA) grouped in rows as C, FLIV, WYM, AQNT, SHRK, EDPG; color scale from ≥1.0 to ≤-1 in steps of 0.2]
(Yi et al, J. Mol. Biol, 1998)
diverging turn
![Page 35: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/35.jpg)
Partial literature search of peptide NMR structures
| I-sites motif | Authors | Date |
| --- | --- | --- |
| glycine helix cap | Viguera | 1995 |
| serine hairpin | Blanco | 1994 |
| Type-I hairpin | de Alba | 1996 |
| diverging turn | Sieber | 1996 |
![Page 36: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/36.jpg)
Molecular dynamics... is a cheap substitute for an NMR spectrometer.
What is MD?
• A simulation of the dynamic behavior of the molecule in water, using "first principles."
Advantages?
• You can observe the system directly.
Disadvantages?
• It's not a real system, just an approximation.
![Page 37: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/37.jpg)
Helical peptide simulations
Sequences:
AAALDRMR AALEALLR AANRSHMP AARYKFIE ADFKAAVA AFDGETEI AKELVVVY
AKGVETAD ARFTKRLG ATLEEKLN CNGGHWIA DAVTRYWP DEAIDAYI DELTRHIR
DYVRSKIA EDLVERLK EELKQALR EEMVSKLK EKLLESLE EKPFGTSY EQIKAAVK
FHMYFMLR FSVMNDAS FYSSYVYL GQLMALKQ HNLIEAFE IEHTLNEK IQNGDWTF
KAAIAQLR KKYRPETD KNPDNVVG KPMGPLLV KQAHPDLK KQDKHYGY KSYLRSLR
LDLHQTYL NAVWAAIK NETHSGRK NFLEVGEY NPVKESRH PAIISAAE PLQHHNLL
PRDANTSH QDDARKLM QGIIDKLD QKMKTYFN QTLAQLSV RDFEERMN RIILDRHR
RLLLKAYR RPIARMLS RVLGRDLF SCDVKFPI TEVMKRLV TLNEKRIL YASLRSLV
YESHVGCR

• AMBER (parm94) force field.
• Randomly chosen natural sequences.
• Initially extended.
• 800-900 waters added.
• Ions added (Na, Cl).
• 7-30 ns at 340 K.
![Page 38: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/38.jpg)
The MD scheme
• Select random peptides and predict how much helix they will have, using the I-sites motif pattern.
• Run LONG simulations.
• Test to see whether they have reached equilibrium.
• If they have, find out how much of the time the peptide spent in a helical state. (by cluster analysis)
• Does the fraction helix correlate with the prediction?
![Page 39: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/39.jpg)
Cluster analysis of trajectories
1) Define a node for every step in the trajectory, keeping the backbone angles (θ).
2) For each node, draw an edge to every other node for which max(Δθ) < 60°.
3) The node with the most edges defines the first cluster. Remove it and all its neighbors. Then the node with the most edges is the second cluster. Etc.
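The three steps above can be sketched as a greedy neighbor-count clustering. In this minimal Python version each trajectory frame is a tuple of backbone angles in degrees. Two details are assumptions not stated on the slide: angle differences are taken on a circle (so 179° and -179° are 2° apart), and edge counts are recomputed among the frames that remain after each cluster is removed. The toy frames and function names are hypothetical.

```python
def angular_diff(a, b):
    """Smallest difference between two angles in degrees (periodic in 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def max_deviation(frame_a, frame_b):
    """Largest per-angle difference between two frames."""
    return max(angular_diff(a, b) for a, b in zip(frame_a, frame_b))


def cluster_trajectory(frames, cutoff=60.0):
    """Return a list of (center frame index, sorted member indices)."""
    n = len(frames)
    # Step 2: an edge joins frames whose angles all differ by less than the cutoff.
    neighbors = {
        i: {j for j in range(n)
            if j != i and max_deviation(frames[i], frames[j]) < cutoff}
        for i in range(n)
    }
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Step 3: the frame with the most remaining neighbors defines the next cluster.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i] & remaining))
        members = {center} | (neighbors[center] & remaining)
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Hypothetical frames with two angles each: three helix-like, two strand-like, one other.
frames = [(-60, -45), (-65, -40), (-58, -50), (-120, 130), (-125, 135), (60, 45)]
clusters = cluster_trajectory(frames)
print(clusters)  # [(0, [0, 1, 2]), (3, [3, 4]), (5, [5])]
```

The fraction of frames that fall in a cluster is then the fraction of simulated time spent in that conformation.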
![Page 40: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/40.jpg)
Clusters in conformational space
Our criteria for good clustering: no two clusters look alike, and no cluster looks like two.
RPIARMLS
![Page 41: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/41.jpg)
This is what a trajectory looks like if it has reached equilibrium
[Plot: cluster number vs. time (ns)]
Both halves of the trajectory have about the same distribution.
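One way to turn the "both halves look the same" criterion into a check is to compare the cluster populations of the first and second half of the trajectory. The sketch below uses the total variation distance between the two distributions; the choice of that measure, the tolerance of 0.2, and the function names are assumptions for illustration, not the test used in the study.

```python
from collections import Counter


def cluster_distribution(labels):
    """Fraction of frames spent in each cluster."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}


def halves_agree(labels, tolerance=0.2):
    """True if the two halves of the trajectory populate clusters similarly."""
    mid = len(labels) // 2
    first = cluster_distribution(labels[:mid])
    second = cluster_distribution(labels[mid:])
    # Total variation distance: 0 for identical distributions, 1 for disjoint ones.
    tv = 0.5 * sum(
        abs(first.get(c, 0.0) - second.get(c, 0.0)) for c in set(first) | set(second)
    )
    return tv <= tolerance


# Hypothetical cluster labels per frame.
equilibrated = [1, 2, 1, 3, 2, 1, 1, 2, 3, 1, 2, 1]
drifting = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
print(halves_agree(equilibrated), halves_agree(drifting))  # True False
```

A trajectory that keeps discovering new clusters in its second half fails the check, which matches the picture of a run that has not yet equilibrated.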
![Page 42: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/42.jpg)
This is what it looks like if it has not.
[Plot: cluster number vs. time (ns)]
![Page 43: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/43.jpg)
NAIIQELE movie
A rough energy landscape.
![Page 44: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/44.jpg)
There is a correlation between I-sites sequence score and the simulations
r = 0.48 (all peptides)
r = 0.61 (trajectories > 20 ns long)
![Page 45: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/45.jpg)
Sampling of sequence space
72 peptides were simulated. Is this a representative sample of the space of amphipathic helix sequences?
[Figure panels: I-sites motif; 72 peptides, weighted by %helix; 72 peptides, unweighted]
![Page 46: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/46.jpg)
What does this mean?
The MD experiment separates the local effects from the non-local effects on helix formation.
In the simulation, there are only local interactions.
So the propensity for amphipathic sequences to form helix is mostly intrinsic.
![Page 47: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/47.jpg)
Outliers
• Simulation too short.
We see only meta-stable states.
• I-sites scoring method is missing something.
Using additive probabilities ignores statistical dependence between different positions.
• Part-helix was not counted as helix in this study.
Helix caps are competing motifs.
(+-) and (-+) look just like (++) and (--)
![Page 48: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/48.jpg)
QVFMRIME (a helix in 1dldA)
Predicted to be helix with confidence = 0.86
Zero helix found in 17ns trajectory. What does it fold into?
an outlier
![Page 49: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/49.jpg)
Protein sequence grammar
1. Letters: Amino acid profiles
2. Words: I-sites motifs
3. Phrases:
4. Sentences
![Page 50: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/50.jpg)
Protein sequence grammar
1. Letters: Amino acid profiles
2. Words: I-sites motifs
3. Phrases: a hidden Markov model
4. Sentences
![Page 51: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/51.jpg)
Motif “grammar”?
Arrangement of I-sites motifs in proteins is highly non-random
helix
helix cap
beta strand
beta turn
The dependencies can be modeled as a Markov chain
![Page 52: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/52.jpg)
How to make a Markov chain
Sequence data: The dog bit the mailman. The mailman kicked the dog back.
Markov model states: the, mailman, dog, bit, kicked, back
Stochastic output: The dog back. The mailman kicked the mailman kicked the dog bit the dog bit the dog bit the mailman kicked the dog. ...
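The dog-and-mailman example can be reproduced in a few lines: record which word follows which in the training sentences, then walk the resulting chain at random. This is a minimal sketch; treating the period as its own token, starting from "the", and the function names `build_chain` and `generate` are assumptions.

```python
import random


def build_chain(text):
    """Map each token to the list of tokens observed right after it."""
    tokens = text.lower().replace(".", " .").split()
    chain = {}
    for current, following in zip(tokens, tokens[1:]):
        chain.setdefault(current, []).append(following)
    return chain


def generate(chain, start="the", max_words=12, seed=0):
    """Random walk over the chain; repeats in a successor list act as weights."""
    rng = random.Random(seed)
    word, output = start, [start]
    while len(output) < max_words and word in chain:
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


chain = build_chain("The dog bit the mailman. The mailman kicked the dog back.")
print(chain["the"])   # ['dog', 'mailman', 'mailman', 'dog']
print(generate(chain))
```

Every adjacent pair of words in the output occurred in the training data, yet whole sentences can be nonsense, because a first-order chain remembers only the previous word.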
![Page 53: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/53.jpg)
A "hidden" Markov model
What's "hidden" about it?
An HMM is a Markov chain where the meaning of the Markov state is probabilistic.
![Page 54: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/54.jpg)
How to make a hidden Markov chain
Sequence alignment data: The dog bit the mailman. The mailman kicked the dog back. The dog attacked the postman. The postman hit the dog.
Hidden Markov model states, with emission probabilities:
the
mailman 0.5, postman 0.5
dog
bit 0.3, attacked 0.7
kicked 0.6, hit 0.4
back
Stochastic output: The dog back. The mailman kicked the postman kicked the dog bit the dog bit the dog attacked the mailman kicked the dog. ...
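In the hidden version, the walk is over states and each state emits a word drawn from its own distribution. The sketch below uses the emission probabilities given on this slide. The state names ("carrier", "bite", "strike", "stop") and all transition probabilities are assumptions, since the slide does not list them.

```python
import random

# Emission probabilities as given on the slide; state names are hypothetical.
EMISSIONS = {
    "the": {"the": 1.0},
    "carrier": {"mailman": 0.5, "postman": 0.5},
    "dog": {"dog": 1.0},
    "bite": {"bit": 0.3, "attacked": 0.7},
    "strike": {"kicked": 0.6, "hit": 0.4},
    "back": {"back": 1.0},
    "stop": {".": 1.0},
}
# Transition probabilities are assumed for illustration only.
TRANSITIONS = {
    "the": {"dog": 0.5, "carrier": 0.5},
    "dog": {"bite": 0.5, "back": 0.25, "stop": 0.25},
    "carrier": {"strike": 0.5, "stop": 0.5},
    "bite": {"the": 1.0},
    "strike": {"the": 1.0},
    "back": {"stop": 1.0},
    "stop": {"the": 1.0},
}


def draw(distribution, rng):
    """Pick one key of a {key: probability} dict at random."""
    keys = list(distribution)
    return rng.choices(keys, weights=[distribution[k] for k in keys])[0]


def sample(n_words, seed=0):
    """Return the hidden state path and the words it emitted."""
    rng = random.Random(seed)
    state, states, words = "the", [], []
    for _ in range(n_words):
        states.append(state)
        words.append(draw(EMISSIONS[state], rng))
        state = draw(TRANSITIONS[state], rng)
    return states, words


states, words = sample(12)
print(" ".join(words))
```

An observer sees only the words; the state path that produced them is what is "hidden", and recovering it is the inference problem an HMM solves.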
![Page 55: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/55.jpg)
One Markov state from HMMSTR
Transitions: a_hi (from previous letter(s)); a_ij, a_ik (to next letter(s))
One state emits one letter of each type (b, r, d, c). This is the probabilistic meaning of the state.
Amino acid symbols (sequence profile):
b_i = {ACDEF...}
Structure symbols:
r_i = {HGEBdblLex} (regions)
d_i = {HST}
c_i = {mnhd...}
![Page 56: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/56.jpg)
Constructing a HMM by aligning motifs
![Page 57: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/57.jpg)
Merging many motifs into one HMM
Related motifs, branched model.
[Figure: a helix followed by alternative Type-1 and Type-2 C cap motifs, merged into one branched model]
![Page 58: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/58.jpg)
HMMSTR
Hidden Markov Model for local protein STRucture
282 nodes
317 transitions
Unified model for 31 distinct sequence-structure motifs
(Bystroff & Baker, J. Mol. Biol., 2000)
![Page 59: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/59.jpg)
Variations on a motif theme are modeled as parallel paths
Multiple state-pathways for the helix N-cap motif
![Page 60: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/60.jpg)
Common sub-graphs represent common sub-structures
These peptide segments have the same state sequence (except shaded residues)
![Page 61: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/61.jpg)
How an HMM works
$$P(Q \mid S) = \pi_{q_1}(S_1) \prod_{i=2}^{N} a_{q_{i-1} q_i}\, b_{q_i}(S_i)$$
π = initiation probability, a = transition probability, b = emission probability
We have S (the sequence). We want Q (the 1D structure), and P (how well S fits Q)
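One standard way to recover a single best Q from S is the Viterbi algorithm, sketched below. The slides do not say which decoding method was used, so this is an illustration of the general technique rather than the HMMSTR procedure. The two-state model (`H`, `L`), its two-letter alphabet, and all probabilities are invented for the example, and the initiation term is taken as π times the emission of the first symbol, which is the usual HMM convention and an assumption here.

```python
import math


def path_probability(pi, a, b, states, seq):
    """Product of initiation, transition, and emission probabilities along one path."""
    p = pi[states[0]] * b[states[0]][seq[0]]
    for prev, cur, sym in zip(states, states[1:], seq[1:]):
        p *= a[prev][cur] * b[cur][sym]
    return p


def viterbi(pi, a, b, seq):
    """Return the most probable state path Q for sequence S, and its probability.

    Works in log space; assumes every probability in the model is non-zero.
    """
    states = list(pi)
    # score[q] = best log-probability of any path that ends in state q so far
    score = {q: math.log(pi[q]) + math.log(b[q][seq[0]]) for q in states}
    back = []  # one dict of back-pointers per position after the first
    for sym in seq[1:]:
        new_score, pointers = {}, {}
        for q in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(a[p][q]))
            new_score[q] = (score[best_prev] + math.log(a[best_prev][q])
                            + math.log(b[q][sym]))
            pointers[q] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda q: score[q])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])


# Hypothetical toy model: H = helix-like state, L = loop-like state.
pi = {"H": 0.5, "L": 0.5}
a = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.3, "L": 0.7}}
b = {"H": {"A": 0.7, "G": 0.3}, "L": {"A": 0.2, "G": 0.8}}

if __name__ == "__main__":
    q, p = viterbi(pi, a, b, "AAGG")
    print("Q =", "".join(q), " P =", p)
```

Here `path_probability` evaluates the product for any given path, while `viterbi` searches over all paths without enumerating them.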
![Page 62: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/62.jpg)
3-state secondary structure prediction
74.9% correct
74.6% correct
![Page 63: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/63.jpg)
Predicting super-secondary context
Results are for the independent test set.
![Page 64: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/64.jpg)
Fully-automated tertiary structure prediction
(1) Find homologues in the database (Psi-Blast)
(2) Predict local structure (HMMSTR)
(3) Assemble fragments (ROSETTA, D.Baker)
sequence
structure
Protocol used for CAFASP2 experiment (2000)
![Page 65: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/65.jpg)
Rosetta ab initio
Scoring function: Bayesian classification of pairwise secondary structure contact types.
Search function: Monte Carlo fragment insertion. A move consists of selecting a fragment at random from a set of local structure predictions. Coordinates are re-generated after swapping in the new fragment.
(Simons et al, PNAS, 1997)
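The search move described above can be sketched as follows. This is a toy illustration under stated assumptions, not the Rosetta implementation: the conformation is reduced to a list of (phi, psi) torsion angles, `toy_score` is a made-up stand-in for the Bayesian scoring function, the fragment list is invented, and the Metropolis acceptance rule is assumed because the slide says only "Monte Carlo".

```python
import math
import random


def toy_score(torsions):
    """Placeholder energy (hypothetical): distance of each (phi, psi) from a helical pair.

    Stands in for the real scoring function, which is not reproduced here.
    """
    return sum(((phi + 60) / 180) ** 2 + ((psi + 45) / 180) ** 2
               for phi, psi in torsions)


def fragment_insertion(torsions, fragments, n_moves, temperature, rng):
    """Monte Carlo search by fragment insertion.

    fragments: list of (start_index, [(phi, psi), ...]) local structure predictions.
    Each fragment must fit inside the chain (start + length <= chain length).
    """
    current = list(torsions)
    current_score = toy_score(current)
    for _ in range(n_moves):
        # A move: pick one fragment at random and swap in its torsion angles.
        start, frag = rng.choice(fragments)
        trial = list(current)
        trial[start:start + len(frag)] = frag
        # A real implementation would regenerate 3D coordinates from the torsions here.
        trial_score = toy_score(trial)
        delta = trial_score - current_score
        # Metropolis criterion (assumed): always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
    return current, current_score


if __name__ == "__main__":
    extended = [(-120.0, 120.0)] * 9
    helical = [(-60.0, -45.0)] * 3
    library = [(0, helical), (3, helical), (6, helical), (3, [(-120.0, 120.0)] * 3)]
    final, score = fragment_insertion(extended, library, 200, 0.05, random.Random(0))
    print(score)
```

Restricting moves to predicted fragments keeps every trial conformation locally plausible, so the search explores only combinations of local structures.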
![Page 66: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/66.jpg)
CASP3 prediction results for Target 56: DNA helicase
Predicted structure of 66-residue fragment (23-88)
True structure of same fragment
![Page 67: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/67.jpg)
CAFASP Prediction results for Target 122: 1GEQ Tryptophan Synthase
Predicted 97-residue fragment
True structure of same fragment
![Page 68: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/68.jpg)
Protein sequence grammar
1. Alphabet: amino acid profile
2. Words: I-sites motifs
3. Phrases: HMMSTR pathways
4. Sentences: contact maps
the next step...
![Page 69: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/69.jpg)
In progress: Data mining of contact maps
HMMSTR predictions
Protein sequences + contact maps
Association-rule mining (M. Zaki)
Rules for tertiary contacts
![Page 70: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/70.jpg)
Predicting tertiary contacts
Contact predictions for 2igd
Overall: 20% coverage with 20% accuracy
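The two figures can be computed from a predicted and a true contact set. The slide does not define the terms, so the sketch below assumes the usual definitions: coverage is the fraction of true contacts that were predicted, and accuracy is the fraction of predicted contacts that are true. The residue pairs in the example are invented, not data for 2igd.

```python
def coverage_and_accuracy(predicted, true):
    """Compare two sets of residue-residue contacts given as (i, j) pairs.

    Pairs are treated as unordered, so (3, 10) and (10, 3) are the same contact.
    """
    predicted = {tuple(sorted(p)) for p in predicted}
    true = {tuple(sorted(t)) for t in true}
    correct = predicted & true
    coverage = len(correct) / len(true) if true else 0.0
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical contact lists, for illustration only.
    true_contacts = [(1, 20), (2, 19), (3, 18), (5, 40), (6, 39)]
    predicted_contacts = [(20, 1), (4, 30), (7, 25), (8, 50), (9, 33)]
    print(coverage_and_accuracy(predicted_contacts, true_contacts))
```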
Can the 2D map be translated to 3D?
![Page 71: Protein Folding Initiation Site Motifs Chris Bystroff Dept of Biology Rensselaer Polytechnic Institute, Troy, NY.](https://reader033.fdocuments.net/reader033/viewer/2022051316/56649d625503460f94a449c4/html5/thumbnails/71.jpg)
I-sites/HMMSTR collaborators
David Baker, U. Washington
Karen Han, UCSF
Vestienn Thorsson, U. Washington
Qian Yi, U. Washington
Edward Thayer, Zymogenetics
Shekhar Garde, RPI
Mohammed Zaki, RPI
Susan Baxter, Wadsworth (->Novartis)
Chip Lawrence, Wadsworth/RPI
Bobbie Jo Webb, Wadsworth
Kim Simons, U. Washington (->Harvard)
Bystroff Lab
Yu Shao
Xin Yuan
Jerry Huang
isites.bio.rpi.edu