Brian Kidd
November 23, 2010
Computational Biology Tools
Lecture 15:
Protein Structure Prediction/Analysis
*Slides from David Bernick and Carol Rohl
Questions/Concerns from Last Time
Overview
1. Structure alignments
• methods and applications
2. Protein structure prediction
• methods and applications
3. Case study
4. 3D structure visualization
Why Examine Protein Structures?
Structure more conserved than sequence
• Similar folds often share similar function
• Remote similarities may only be detectable at structure level
Interpret experimental data
• Locate sites of interesting mutations
• Locate splice sites
Design experiments
• In silico mutagenesis
Structure Analysis
Identify interesting sites on a protein
Measure geometry (distances, angles, ...)
Examine surface properties (shape, charge)
Compare two structures
• Homologs
• Mutants
• With and without ligand (or binding partner)
Comparing Protein Structures
Defined alignment
• Mutant v. wildtype, model v. experimental, i.e. two different conformations
• Unique solution exists – we know the true alignment
Derived alignment
• Query is an unknown protein
• Known parent (assumed homolog)
• Calculate an “optimal” alignment computationally
• Infer annotation from parent to query
What do we want from an alignment?
Optimal alignment
• Important parts of protein should associate (align) with each other
  • Catalytic residues and their positions
  • Important structures (hinges, binding sites, etc.)
  • Protein interface residues and their positions
Evolutionary history
• Natural selection only selects for successful function
• Sequences (and alignments) are assumed to be sequential
What do we want from an alignment?
Sequence alignments can be improved when we have structural information
No unique solution (more residues or closer match?)
Structural alignment implies a sequence alignment
Tools and Databases
NCBI Structure (VAST and MMDB)
• http://www.ncbi.nlm.nih.gov/Structure/
• Molecular Modeling Database
• Experimentally derived structures from PDB (not theoretical)
FSSP (DALI)
• http://www2.embl-ebi.ac.uk/dali/fssp/
• http://ekhidna.biocenter.helsinki.fi/dali
• Families of structurally similar proteins
• Maintains database of protein neighbors organized by PDB code
• Fully automated using the DALI algorithm (Holm & Sander)
• No internal node annotations
• Structural similarity search using DALI
CE
• http://cl.sdsc.edu/
• Combinatorial extension
• Maintains database of protein neighbors by PDB code
Tools and Databases
Structure classification by domain
Classification based on secondary structure
SCOP – Structural Classification of Proteins
• http://scop.berkeley.edu/
• Class-fold-superfamily-family
• Manual assembly by inspection (last release June 2009)
CATH – Class-Architecture-Topology-Homologous Superfamily
• http://www.biochem.ucl.ac.uk/bsm/cath/
• Manual classification at Architecture level
• Automated topology classification using SSAP (Orengo & Taylor)
• Last release July 2009
CEMC – Multiple Structure Alignment
• http://bioinformatics.albany.edu/~cemc/
How Structure Alignments Work
Methods
• Structal – Gerstein group at Yale
• DALI – Holm group at Helsinki
• VAST – NCBI resource
Structure similarity measures
• RMSD – similarity metric
• P-values – significance measure
Iterative Dynamic Programming
Algorithm
1. Make initial guess for the superposition
2. Calculate all pairwise Cα–Cα distances and generate scoring matrix
3. Find optimal alignment according to this scoring matrix by dynamic programming
4. Re-superimpose structures using this alignment
5. Repeat steps 2–4 until converged
No guarantee of an optimal solution; the final result depends on the initial alignment selected
Structal: Subbiah et al., Curr. Biol. 3:141 (1993)
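The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Structal implementation: the superposition uses the standard Kabsch (SVD) method, the score `M / (1 + (d/d0)^2)` and the values of `M`, `d0`, and `gap` are assumed for illustration, and the initial guess simply pairs residue i with residue i. All function names are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t that best superimpose P onto Q (rows are points)."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Qc - R @ Pc

def dp_align(S, gap):
    """Global dynamic programming over score matrix S with a linear gap penalty."""
    n, m = S.shape
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = -gap * np.arange(n + 1)
    F[0, :] = -gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i, j] = max(F[i - 1, j - 1] + S[i - 1, j - 1],
                          F[i - 1, j] - gap,
                          F[i, j - 1] - gap)
    # Trace back from the corner to recover the aligned residue pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i, j] == F[i - 1, j - 1] + S[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i, j] == F[i - 1, j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def iterative_alignment(A, B, d0=2.24, M=20.0, gap=10.0, max_iter=50):
    """Iterate superposition and DP until the residue pairing stops changing."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    # Step 1: initial guess (assumed here: residue i pairs with residue i).
    pairs = [(i, i) for i in range(min(len(A), len(B)))]
    for _ in range(max_iter):
        ia, ib = zip(*pairs)
        R, t = kabsch(A[list(ia)], B[list(ib)])           # steps 1 and 4
        A_fit = A @ R.T + t
        D = np.linalg.norm(A_fit[:, None, :] - B[None, :, :], axis=2)  # step 2
        S = M / (1.0 + (D / d0) ** 2)
        new_pairs = dp_align(S, gap)                      # step 3
        if new_pairs == pairs or len(new_pairs) < 3:      # step 5: converged
            break
        pairs = new_pairs
    return pairs, A_fit
```

Because the loop only refines the starting superposition, a poor initial pairing can converge to a different (worse) alignment, which is the caveat stated on the slide.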
Structural Alignment
Many methods other than dynamic programming are used
Most methods use some sort of heuristics to speed things up and make a good initial guess
Sheba – sequence alignment
Mammoth – local structural alignment
VAST – aligns secondary structure element vectors
DALI – distance matrix alignment
Distance Matrix Alignment
Matrix of all pairwise distances
Characteristic patterns:
• Main diagonal runs correspond to helices (i.e. local contacts)
• Hairpins – start on main diagonal, run perpendicular
• Parallel pairs run parallel to main diagonal
• Others are long range contacts
Converts 3D alignment problem into a 2D problem
Find best subset of rows and columns such that the distance matrices of two proteins are optimally similar
[Figure: distance matrix, myoglobin]
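As a small illustration of the distance matrix idea, the sketch below builds the Cα–Cα distance matrix and a thresholded contact map for an idealized α-helix. The helix geometry (radius 2.3 Å, rise 1.5 Å and 100° per residue) is a textbook approximation, the 8 Å contact cutoff is an assumed choice, and the function names are hypothetical.

```python
import numpy as np

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Approximate C-alpha coordinates of an ideal alpha-helix (assumed geometry)."""
    k = np.arange(n)
    phi = np.deg2rad(turn_deg) * k
    return np.column_stack([radius * np.cos(phi), radius * np.sin(phi), rise * k])

def distance_matrix(ca):
    """All pairwise distances between rows of an (n, 3) coordinate array."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=2)

def contact_map(ca, cutoff=8.0):
    """Boolean matrix: True where two residues are closer than the cutoff."""
    return distance_matrix(ca) < cutoff

if __name__ == "__main__":
    helix = ideal_helix(12)
    # A helix shows up as a band of contacts hugging the main diagonal.
    for row in contact_map(helix):
        print("".join("#" if c else "." for c in row))
```

With these assumed parameters, residues up to four positions apart fall inside the cutoff, so the printed map is a narrow diagonal band, matching the "main diagonal runs" pattern listed above.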
Contact Map Comparison
[Figure: contact maps of myoglobin and Protein G, with α-helix, β-hairpin, and parallel (//) strand patterns labeled]
Similarity Measure
RMSD – root mean square deviation: $\mathrm{RMSD} = \sqrt{\left\langle \lVert X_i^A - X_i^B \rVert^2 \right\rangle}$
1. Superimpose optimally
2. Pair up residues
3. Calculate RMSD
• Sensitive to outliers
• Depends on number of pairs compared
• A better measure is the significance of this RMSD for similar sized matches
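A short sketch of step 3 shows why RMSD is sensitive to outliers and to the number of pairs. It assumes the two coordinate sets are already superimposed and paired row by row (steps 1 and 2); the function name and the toy coordinates are illustrative only.

```python
import numpy as np

def rmsd(xa, xb):
    """Root mean square deviation between paired, already superimposed coordinates."""
    xa, xb = np.asarray(xa, float), np.asarray(xb, float)
    if xa.shape != xb.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return float(np.sqrt(np.mean(np.sum((xa - xb) ** 2, axis=1))))

def one_outlier_example(n_pairs, displacement=10.0):
    """n_pairs identical points, except one pair that is displaced along x."""
    xa = np.zeros((n_pairs, 3))
    xb = np.zeros((n_pairs, 3))
    xb[0, 0] = displacement
    return rmsd(xa, xb)

if __name__ == "__main__":
    print(one_outlier_example(10))    # one 10 Å outlier among 10 pairs
    print(one_outlier_example(100))   # the same outlier among 100 pairs
```

A single pair displaced by 10 Å gives an RMSD of about 3.16 Å over 10 pairs but only 1.0 Å over 100 pairs, even though 9 (or 99) pairs match perfectly. This is why an RMSD value is hard to interpret without the number of aligned residues.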
z-scores and p-values
z-score: number of standard deviations above/below the mean
• ± 1 sd ~ 68%
• ± 2 sd ~ 95%
p-value: probability of obtaining ≥ this score under the null model (normally distributed data, i.e. “by chance”)
If we have a histogram, we can just count, or integrate a function fitted to the histogram
[Figure: histogram of scores for random matches]
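The two routes to a p-value mentioned above (counting in the histogram, or integrating a fitted function) can be sketched with the standard library alone. The function names are hypothetical, and the normal tail is only appropriate if the random-match scores really are close to normally distributed.

```python
import math
import statistics

def z_score(score, random_scores):
    """Standard deviations between `score` and the mean of the random-match scores."""
    mu = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (score - mu) / sd

def p_value_normal(z):
    """P(Z >= z) for a standard normal null model (integrating the fitted curve)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_empirical(score, random_scores):
    """Fraction of random-match scores at least as large as `score` (just counting)."""
    return sum(s >= score for s in random_scores) / len(random_scores)

def fraction_within(k_sd):
    """Fraction of a normal distribution within +/- k_sd standard deviations."""
    return 1.0 - 2.0 * p_value_normal(k_sd)

if __name__ == "__main__":
    print(round(fraction_within(1), 3), round(fraction_within(2), 3))
```

`fraction_within(1)` and `fraction_within(2)` reproduce the roughly 68% and 95% coverage figures quoted for one and two standard deviations.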
Meaning of Structural Alignments
[Figure: Mammoth structural alignment of 1ubq and 4fxc]
Two proteins are clearly structurally similar
Mammoth identifies similar substructures, but the alignment is not entirely “correct”
• Opportunistically matched residues
• Misses some analogous elements
Why Predict Protein Structures?
Test models and theories about structural biochemistry
Identify drug targets for medicine
Experimental structure determination is still slow, and not all structures are easily solved
Explore states that are difficult to examine experimentally
Challenges for Structure Prediction
Search space is astronomical – need an efficient sampling algorithm
Actual proteins tend to be in energy minima – need a scoring system for discriminating between models
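A back-of-the-envelope calculation gives a feel for "astronomical". The numbers below are assumptions chosen for illustration (3 conformational states per residue, a 100-residue protein, 10^13 conformations sampled per second); they are not from the lecture.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformations(n_residues, states_per_residue=3):
    """Size of the search space if every residue independently picks one state."""
    return states_per_residue ** n_residues

def exhaustive_search_years(n_residues, states_per_residue=3, rate_per_second=1e13):
    """Years needed to enumerate every conformation at the assumed sampling rate."""
    seconds = conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"{conformations(100):.2e} conformations")
    print(f"{exhaustive_search_years(100):.2e} years to enumerate them all")
```

Under these assumptions the enumeration would take far longer than the age of the universe, so prediction methods have to sample selectively and rely on a scoring function to recognize good models.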
CASP
Critical Assessment of Structure Prediction
Community effort to improve predictions
Forced scientists to start learning what actually works in prediction
Types of Predictions
Comparative Model
Ab initio or de novo
Baker Method
Stage I. Fragment assembly
*Slides from Rhiju Das
Baker Method
Stage II. All-atom refinement
*Slides from Rhiju Das
Example
[Figure: native structure and model]
2.0 Å over 61 residues
CASP7 target T0316 (domain 3)
*Slides from Rhiju Das
Case Study
PAZ domain of Pf_Ago
Ji-Joon Song [PMID: 15284453] asserts on p. 1435 that the following are functionally equivalent:
Pf_Ago 1u04    hAgo1 1si2/1si3
Y212           Y309 (Y90)
Y216           Y314 (Y95)
H217           H269 (H49)
Y190           Y277 (Y57)
What do you think?
Case Study Continued
1u04 – PAZ domain, chain A 152-275
1si2 – PAZ domain, chain A 4-128
http://www.pdb.org
• Get the above structures
http://www.ebi.ac.uk/DaliLite/
• Align 1U04 with 1SI2:A (h_Ago)
What is the z-score? Is it significant?
What is the RMSD? Is this a reasonable alignment?
How many residues aligned?
PyMol In Class
Load both molecules (aligned) into PyMol
Action-preset-pretty for both molecules
For 1u04, delete everything you don’t need
• select-rename object 1u04
• chain B; select-remove atoms
• chain A and resi 1-151; select-remove atoms
• chain A and resi 276-770; select-remove atoms
• color red
Load 1si2, color it yellow
chain B is a small RNA; show spheres, chain B; color blue
select 1u04 and resi 212; show as sticks
repeat for 190, 216, 217
select 1si2 and resi 309; show as sticks
repeat for 269, 277, 314
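The menu steps above can also be written as a PyMOL command script. The sketch below only generates the script text in plain Python so it can be inspected without PyMOL installed; the file names `1u04_aligned.pdb` and `1si2_aligned.pdb` are hypothetical placeholders for the superimposed coordinates saved from the alignment step, and the "pretty" preset is left out.

```python
def build_pymol_script(pf_file="1u04_aligned.pdb", h_file="1si2_aligned.pdb"):
    """Return PyMOL commands mirroring the in-class steps (file names are assumed)."""
    pf_sites = "212+190+216+217"   # Pf_Ago residues from the case study
    h_sites = "309+269+277+314"    # hAgo1 residues from the case study
    return [
        f"load {pf_file}, 1u04",
        # Keep only the PAZ domain of 1u04 (chain A, residues 152-275).
        "remove 1u04 and chain B",
        "remove 1u04 and chain A and resi 1-151",
        "remove 1u04 and chain A and resi 276-770",
        "color red, 1u04",
        f"load {h_file}, 1si2",
        "color yellow, 1si2",
        # Chain B of 1si2 is the small RNA.
        "show spheres, 1si2 and chain B",
        "color blue, 1si2 and chain B",
        f"show sticks, 1u04 and resi {pf_sites}",
        f"show sticks, 1si2 and resi {h_sites}",
    ]

if __name__ == "__main__":
    # Save the output as a .pml file and run it from within PyMOL.
    print("\n".join(build_pymol_script()))
```

Writing the steps as a script makes the comparison repeatable, which helps when checking the same residue pairs against a second alignment.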
In Class Summary
So, who’s correct?
Is J.J. Song correct?
Is Dali?
Is Vast?
Essentials at this Point
Accessing literature and sequence information from various databases (NCBI and UCSC)
BLAST (all variants)
Pairwise sequence analysis tools and algorithms
Single sequence analysis tools – DNA: EMBOSS, ORFs, restriction enzymes, and primers
Protein databases and analysis tools
PSI and PHI BLASTs
Multiple sequence alignments
Phylogeny
RNA structure (basics and analytical tools)
Protein structure (basics and analytical tools)
This is everything!
For Next Time
Reading
Problem set
Review
Finish up PS #3 (due Tuesday, November 23)
Start working on PS #4 (due Friday, December 3)
http://www.soe.ucsc.edu/classes/bme110/Fall10/calendar.html