Protein Structure and Function Prediction. Predicting 3D Structure –Comparative modeling...
Date posted: 21-Dec-2015
Protein Structure and Function Prediction
Predicting 3D Structure
– Comparative modeling (homology)
– Fold recognition (threading)
An outstanding, difficult problem
Comparative Modeling
Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database
Similar sequence suggests similar structure
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding known structure(s) related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
Comparative Modeling
• Accuracy of the comparative model is related to the sequence identity on which it is based
– >50% sequence identity = high accuracy
– 30%–50% sequence identity = 90% modeled
– <30% sequence identity = low accuracy (many errors)
• Similarity is particularly high in the core
– Alpha helices and beta sheets are preserved
– Even near-identical sequences vary in loops
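A minimal sketch of how the identity bands above could be applied to a target-template alignment. The function names and the example alignment are illustrative, not part of the slides, and identity is computed over aligned non-gap positions, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over positions where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def expected_accuracy(identity: float) -> str:
    """Map a percent identity onto the three bands quoted on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "90% modeled"
    return "low accuracy (many errors)"


# Made-up alignment of a target against a template ('-' marks a gap).
target = "MTYGFRIPLN"
template = "MTYGW-IPLQ"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity: {expected_accuracy(identity)}")
```

The bands are rules of thumb. Loop regions can still be wrong at high identity, as the last bullet notes.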
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html
Protein Folds
• A combination of secondary structural units
– Forms basic level of classification
• Each protein family belongs to a fold
– Estimated 1000–3000 different folds
– Fold is shared among close and distant family members
• Different sequences can share similar folds
Protein Folds: sequential and spatial arrangement of secondary structures
[Figure: example folds, hemoglobin and TIM]
Fold classification (SCOP):
• Class: all alpha, all beta, alpha/beta, alpha+beta
• Fold
• Superfamily
• Family
Basic steps in Fold Recognition:
Compare the sequence against a library of all known protein folds (finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find which folding template the sequence fits best
Find ways to evaluate sequence-structure fit
Find best fold for a protein sequence: Fold recognition (threading)
MAHFPGFGQSLLFGYPVYVFGD...
[Figure: the query sequence is compared with each potential fold 1) ... 56) ... n), giving scores -10 ... -123 ... 20.5]
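A toy sketch of the idea of scoring one sequence against a library of folds and keeping the lowest score. Everything here is an assumption made for illustration: each fold is reduced to a string of buried (B) or exposed (E) positions, the energy is a simple hydrophobic-fit rule, and the placement is ungapped. Real threading programs use gapped alignments and statistically derived potentials.

```python
# Hypothetical residue classes and fold templates, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def position_energy(residue: str, environment: str) -> float:
    """-1 when the residue suits its environment, +1 when it does not."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return -1.0 if buried == hydrophobic else 1.0


def thread_score(sequence: str, template: str) -> float:
    """Lowest ungapped energy of the sequence placed anywhere on the template."""
    if len(sequence) > len(template):
        return float("inf")
    best = float("inf")
    for offset in range(len(template) - len(sequence) + 1):
        energy = sum(
            position_energy(res, template[offset + i])
            for i, res in enumerate(sequence)
        )
        best = min(best, energy)
    return best


def recognize_fold(sequence: str, library: dict) -> tuple:
    """Score the sequence against every fold and return (best fold, all scores)."""
    scores = {name: thread_score(sequence, tmpl) for name, tmpl in library.items()}
    return min(scores, key=scores.get), scores


library = {"fold_1": "BBBEEE", "fold_2": "EEEBBB", "fold_3": "BEBEBEBE"}
best, scores = recognize_fold("LIVKDE", library)
print(best, scores)
```

As in the figure, a lower (more negative) score is treated as the better sequence-structure fit.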
Programs for fold recognition
• TOPITS (Rost 1995)
• GenTHREADER (Jones 1999)
• SAM-T02 (UCSC HMM)
• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
Ab Initio Modeling
• Compute molecular structure from laws of physics and chemistry alone
– Ideal solution (theoretically)
• Simulate process of protein folding
– Apply minimum energy considerations
• Practically nearly impossible
– Exceptionally complex calculations
– Biophysics understanding incomplete
Ab Initio Methods
• Rosetta (Baker lab, Seattle)
• Undertaker (Karplus, UCSC)
Predicting Protein Function
PART 2
Inferring protein function :
• Based on the existence of known protein domains
• Based on homology
Protein Domains
• Domains can be considered as building blocks of proteins.
• Some domains can be found in many proteins with different functions, while others are only found in proteins with a certain function.
• The presence of a particular domain can be indicative of the function of the protein.
DNA-binding domain: zinc finger
A protein domain can be defined by:
• A motif
• A profile (PSSM)
• A hidden Markov model
MOTIF
Rxx(F,Y,W)(R,K)SAQ
Profile Scoring
PROSITE
• PROSITE is a database of protein domains that can be searched by either regular expression patterns or sequence profiles.
Zinc_Finger_C2H2 Cx{2,4}Cx3(L,I,V,M,F,Y,W,C)x8Hx{3,5}H
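A hedged sketch of how a pattern written in the shorthand used on these slides could be turned into an ordinary regular expression. The converter `slide_pattern_to_regex` is hypothetical and handles only the constructs that appear here (`x`, `x3`, `x{2,4}`, and comma-separated alternatives in parentheses). It is not a parser for the official PROSITE pattern syntax, which is written differently.

```python
import re

# One token of the slide shorthand: x{m,n} | xN | x | (A,B,C) | single residue
TOKEN = re.compile(r"x\{(\d+),(\d+)\}|x(\d+)|(x)|\(([A-Z,]+)\)|([A-Z])")


def slide_pattern_to_regex(pattern: str) -> str:
    """Translate the slide shorthand into a Python regular expression string."""
    parts = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected character {pattern[pos]!r} at position {pos}")
        low, high, count, any_one, choices, literal = m.groups()
        if low is not None:          # x{2,4}: between 2 and 4 arbitrary residues
            parts.append(".{%s,%s}" % (low, high))
        elif count is not None:      # x3: exactly 3 arbitrary residues
            parts.append(".{%s}" % count)
        elif any_one is not None:    # x: one arbitrary residue
            parts.append(".")
        elif choices is not None:    # (F,Y,W): any one of the listed residues
            parts.append("[" + choices.replace(",", "") + "]")
        else:                        # a fixed residue
            parts.append(literal)
        pos = m.end()
    return "".join(parts)


zinc_finger = slide_pattern_to_regex("Cx{2,4}Cx3(L,I,V,M,F,Y,W,C)x8Hx{3,5}H")
motif = slide_pattern_to_regex("Rxx(F,Y,W)(R,K)SAQ")
print(zinc_finger)
print(motif)
```

A sequence can then be scanned with `re.search(zinc_finger, sequence)`. A regular expression match is all-or-nothing, which is one reason profiles and HMMs are used for more divergent domains.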
Profile HMM (Hidden Markov Model)
An HMM is a probabilistic model of the MSA consisting of a number of interconnected states: match (M), insert (I) and delete (D)
MSA columns 16–19:
16 17 18 19
D R T R
D R T S
S - - S
S P T R
D R T R
D P T S
D - - S
D - - S
D - - S
D - - R
States (figure): D16–D19, M16–M19, I16–I19
Match-state emissions: M16: D 0.8, S 0.2; M17: P 0.4, R 0.6; M18: T 1.0; M19: R 0.4, S 0.6
Transitions in the figure are labeled 100% or 50%
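The emission values in the figure can be reproduced from the ten aligned rows by simple counting. The sketch below assumes plain maximum-likelihood estimates with no pseudocounts or sequence weighting, which real profile HMM software would normally add. The helper name `column_profile` is illustrative.

```python
from collections import Counter

# The ten rows of alignment columns 16-19 shown on the slide ('-' is a gap).
MSA = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
       "DPTS", "D--S", "D--S", "D--S", "D--R"]


def column_profile(msa: list, col: int) -> tuple:
    """Return (match-state emission probabilities, fraction of gapped rows)."""
    residues = [row[col] for row in msa if row[col] != "-"]
    counts = Counter(residues)
    emissions = {aa: n / len(residues) for aa, n in counts.items()}
    gap_fraction = 1 - len(residues) / len(msa)
    return emissions, gap_fraction


for offset, column_number in enumerate(range(16, 20)):
    emissions, gap_fraction = column_profile(MSA, offset)
    print(column_number, emissions, f"gaps: {gap_fraction:.0%}")
```

Half of the rows are gapped at columns 17 and 18, which appears to be what the 50% transition labels in the figure represent: an even split between entering the match state and the delete state.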
Pfam
• The Pfam database is based on two distinct classes of alignments
– Seed alignments, which are deemed to be accurate and are used to produce Pfam-A
– Alignments derived by automatic clustering of SwissProt, which are less reliable and give rise to Pfam-B
• Database that contains a large collection of multiple sequence alignments and profile hidden Markov models (HMMs)
• High-quality seed alignments are used to build HMMs to which sequences are aligned
InterPro
Was built from protein classification databases, such as:
• PROSITE
• ProDom
• SMART
• Pfam
• PRINTS
Uses UniProt (Swiss-Prot and TrEMBL)
Databases and tools for protein families and domains
• InterPro - Integrated Resource of Protein Domains and Functional Sites
• PROSITE - A database of protein families and domains
• BLOCKS - BLOCKS db
• Pfam - Protein families db (HMM derived)
• PRINTS - Protein motif fingerprint db
• ProDom - Protein domain db (automatically generated)
• PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins
• SBASE - SBASE domain db
• SMART - Simple Modular Architecture Research Tool
• TIGRFAMs - TIGR protein families db
Inferring protein function based on sequence homology
Clusters of Orthologous Groups of proteins (COGs)
Classification of conserved genes according to their homologous relationships. (Koonin et al., NAR)
Homologs - Proteins with a common evolutionary origin
Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events.
Orthologs - Proteins from different species that evolved by vertical descent (speciation).
Clusters of Orthologous Groups of proteins (COGs)
Each COG consists of individual orthologous proteins or orthologous sets of paralogs from at least three lineages.
Orthologs typically have the same function, allowing transfer of functional information from one member to an entire COG.
COGs - Clusters of Orthologous Groups
* All-against-all sequence comparison of the proteins encoded in completed genomes (paralogs/orthologs)
* For a given protein “a” in genome A, if there are several similar proteins in genome B, the most similar one is selected
* If, when using protein “b” as a query, protein “a” in genome A is selected as the best hit, “a” and “b” can be included in a COG
* Proteins in a COG are more similar to other proteins in the COG than to any other protein in the compared genomes
* A COG is defined when it includes at least three homologous proteins from three distant genomes
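The steps above can be sketched as a reciprocal-best-hit search followed by a check for three mutually consistent proteins from three genomes. This is a simplified illustration, not the published COG procedure: the protein names, genome labels and similarity scores are invented, and the sketch stops at single triangles.

```python
from itertools import combinations

# Hypothetical proteins, their genomes, and pairwise similarity scores.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
similarity = {
    frozenset(("a1", "b1")): 90.0,
    frozenset(("a2", "b1")): 60.0,
    frozenset(("a1", "c1")): 85.0,
    frozenset(("a2", "c1")): 50.0,
    frozenset(("b1", "c1")): 80.0,
}


def best_hit(protein, target_genome, genome_of, similarity):
    """Most similar protein in target_genome, or None if nothing is similar."""
    scored = [
        (similarity.get(frozenset((protein, other)), 0.0), other)
        for other, genome in genome_of.items()
        if genome == target_genome
    ]
    scored = [item for item in scored if item[0] > 0]
    return max(scored)[1] if scored else None


def reciprocal_best_hits(genome_of, similarity):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    pairs = set()
    genomes = set(genome_of.values())
    for protein, home in genome_of.items():
        for target in genomes - {home}:
            hit = best_hit(protein, target, genome_of, similarity)
            if hit is not None and best_hit(hit, home, genome_of, similarity) == protein:
                pairs.add(frozenset((protein, hit)))
    return pairs


def seed_cogs(genome_of, similarity):
    """Triangles of reciprocal best hits spanning three different genomes."""
    pairs = reciprocal_best_hits(genome_of, similarity)
    cogs = []
    for trio in combinations(sorted(genome_of), 3):
        three_genomes = len({genome_of[p] for p in trio}) == 3
        if three_genomes and all(frozenset(e) in pairs for e in combinations(trio, 2)):
            cogs.append(set(trio))
    return cogs


print(seed_cogs(genome_of, similarity))
```

In this made-up data, `a2` is left out because its best hits in genomes B and C both prefer `a1`. It behaves like a paralog that does not satisfy the reciprocal condition.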