Computational Analysis of Proteins
Dr. K. Sivakumar
Department of Chemistry
SCSVMV [email protected]
Chemistry – Our Life, Our Future
National Workshop
on
Modern Techniques in Analytical Chemistry
www.kanchiuniv.ac.in/DrKSivakumar_chemistry.html
AMINO ACIDS: THE BUILDING BLOCKS OF PROTEINS
Triple & single letter codes of amino acids
General structure of an amino acid
Amino acid       Three-letter code   Single-letter code
Alanine          Ala                 A
Cysteine         Cys                 C
Aspartic acid    Asp                 D
Glutamic acid    Glu                 E
Phenylalanine    Phe                 F
Glycine          Gly                 G
Histidine        His                 H
Isoleucine       Ile                 I
Lysine           Lys                 K
Leucine          Leu                 L
Methionine       Met                 M
Asparagine       Asn                 N
Proline          Pro                 P
Glutamine        Gln                 Q
Arginine         Arg                 R
Serine           Ser                 S
Threonine        Thr                 T
Valine           Val                 V
Tryptophan       Trp                 W
Tyrosine         Tyr                 Y
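The code table can also be held as a small lookup, which is handy when converting between the two notations. The following is a minimal sketch; the function names are illustrative and not part of the original slides.

```python
# Three-letter to single-letter codes, copied from the amino acid table.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
# Reverse lookup built from the same table.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_single_letter(residues):
    """Convert a list of three-letter codes into a single-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert a single-letter sequence into a list of three-letter codes."""
    return [ONE_TO_THREE[c] for c in sequence.upper()]


print(to_single_letter(["Met", "Ala", "Leu", "Ser", "Phe"]))  # MALSF
print(to_three_letter("MALSF"))
```

An unknown code raises a KeyError, which is usually what is wanted when checking hand-typed sequences.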
PROTEIN SEQUENCING (Order of amino acids in proteins)
Protein sequence: MALSFTVGQLIFLFWTMRITEASPD
M = Methionine, A = Alanine, L = Leucine, S = Serine, F = Phenylalanine
Protein sequencer
• Protein sequencing: determining the order of amino acids in a protein
• Methods: mass spectrometry, Edman degradation, …
• The amino acids in a protein determine its properties
• Proteins are sequenced by microbiologists and biotechnologists for various purposes.
www.writersujatha.com
Refer to "GENOME" by Sujatha for simple explanations of the sequencing process
Various levels of protein structure
Methane (CH4): primary structure, secondary structure, tertiary structure
Protein: primary structure, secondary structure, tertiary structure
In CH4, C stands for carbon, a single atom.
In a protein sequence, M stands for methionine, a group of atoms.
• Protein sequences are continuously submitted by sequencing centers and updated in protein databases.
• To date, more than 10 lakh (1 million) protein sequences have been determined and made publicly available through protein databases. For example,
Protein Sequence Databases: No. of Sequences
524,420
1,365,912
13,593,921
Sequence growth in Protein sequence databases:
Ref: SwissProt, Feb 2011; GenomeNet, Feb 2011
70,947 (till 1 February 2011)
Protein Sequence Databases: No. of Sequences
524,420 (~5 lakh)
1,365,912 (>10 lakh)
13,593,921 (~1 crore)
The only Protein Structure Database (PDB): No. of Structures
70,947
Ref: K. Sivakumar, Advanced BioTech, V (9), 20-27 (2007)
PDB contains 70,947 structures determined by X-ray, NMR & electron microscopy
X-ray: ~60,500
NMR: ~8,700
EM: ~350
Most sequenced proteins lack a documented physico-chemical and STRUCTURAL characterization, because experimental methods (X-ray, NMR, EM) are:
• Trial-and-error based
• Time consuming
• Expensive
Computational methods:
• Minimize the number of experimental trials.
• Reduce the cost of experimental investigation.
• Help experimental analysis to be more focused.
Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Theoretical and Computational Chemistry, 6 (1), 127-140 (2007).
Need for computational analysis
• More than 10 lakh (1 million) sequences are available in public databases
• Sequences are highly valuable resources, because a huge amount of structural, functional & evolutionary information is locked up in them
• By contrast, the number of unique protein structures is very small; this represents a huge information deficit
• So, we need to construct 3D models by COMPUTATIONAL METHODS
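The size of this deficit can be estimated from the counts quoted in this presentation. The sketch below is only a rough illustration: it compares raw entry counts, assumes the February 2011 figures given on the slides, and ignores the fact that several PDB entries can belong to the same protein.

```python
# Counts quoted in this presentation (February 2011).
sequence_counts = [524_420, 1_365_912, 13_593_921]  # three sequence databases
pdb_structures = 70_947                             # entries in PDB
# Approximate breakdown by experimental method, as quoted on the slides.
structures_by_method = {"X-ray": 60_500, "NMR": 8_700, "EM": 350}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Compare the structure count with the largest sequence database.
largest = max(sequence_counts)
coverage = percent(pdb_structures, largest)
print(f"Structures per 100 sequences: {coverage:.2f}")

for method, count in structures_by_method.items():
    print(f"{method}: {percent(count, pdb_structures):.1f}% of PDB entries")
```

On these figures there is roughly one experimental structure for every two hundred sequences, which is the gap that computational modelling tries to narrow.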
3D Structure can be modelled by…
• Homology Modeling
• Threading
• Ab initio
Ref: K. Sivakumar, Advanced BioTech, IV (11), 18-23 (2006)
Repeated with other suitable templates
Homology Modeling – Principle
Predicting Protein Structure: Comparative Modeling (formerly, homology modeling)
Target sequence (structure unknown, "??"):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Template sequence (structure known):
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
The target and template share a similar sequence, so they are homologous: use the template structure as template & model (PDB codes on the slide: 8lyz, 1alc)
What is Homology Modeling?
• Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)
• If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.
• In general, at least about 30% sequence identity is required for generating useful models.
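Sequence identity is normally measured on an alignment. The sketch below assumes the two sequences are already aligned (equal length, with "-" for gaps) and uses one common convention, identical positions divided by the positions where neither sequence has a gap; other conventions exist. The fragments are the first eight residues of the two sequences shown on the principle slide, treated here as aligned position by position, so the value says nothing about the full-length identity.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# First eight residues of the two sequences on the principle slide.
fragment_a = "KQFTKCEL"
fragment_b = "KVFGRCEL"
identity = percent_identity(fragment_a, fragment_b)
print(f"{identity:.1f}% identity over the fragment")  # 62.5% identity over the fragment
```

In practice the alignment itself comes from a tool such as BLAST, which also reports the identity directly.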
Homology Modeling
Get protein sequence from sequence database
http://expasy.org/sprot/
Click to get protein details
Click to get protein sequence
Protein sequence in FASTA format
Save it in Notepad for further use
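A saved FASTA file is plain text: a header line starting with ">" followed by one or more lines of sequence. The sketch below reads such text into (header, sequence) pairs. The header in the example is hypothetical; the sequence is the 25-residue example from the protein sequencing slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical header; sequence taken from the protein sequencing slide.
example = """>example_protein hypothetical header for illustration
MALSFTVGQLIFLF
WTMRITEASPD
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq), seq)
```

The joined sequence, without the header line, is what the modelling servers expect when they ask for the sequence only.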
Using the Protein BLAST server to find similar STRUCTURES
http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins
• Paste the sequence in FASTA format
• Choose PDB as the database
• Click to search for similar structures in PDB
Graphical summary of Blastp suite: Blast search of O70456 vs PDB
List of similar structures (Blastp suite)
Detailed summary of Blastp suite
Method 1: EsyPred3D server - submit the sequence and PDB ID
• Paste the sequence only
• Type the PDB ID
• Click to submit
Receive the built structure by email
Download the attached *.pdb file and save it
Open and visualize the *.pdb file in RasMol
Method 2: SWISS-MODEL server
• Click for modeling
• Submit the sequence only, in FASTA format (without a PDB ID)
• The similarity search (BlastP) will be done by the SWISS-MODEL server
• Paste the sequence and click to submit
Receive the built structure by email
Follow the links in the email and click to download the 3D structure
Open and visualize the *.pdb file in RasMol
Structure retrieval from Protein 3D Structure Database – PDB
• PDB ID: click for protein details
• 491 sequences in SwissProt for "Keratin"
• Click to download the structure
• Save the file and note its location
Open and visualize the *.pdb file in RasMol
Structure of 3EUU
Sequence data:
MNRVDLSLFIPDSLTAETGDLKIKTYKVVLIARAASIFGVKRIVIYHDDADGEARFIRDILTYMDTPQYLRRKVFPIMRELKHVGILPPLRTPHHPTG
Structural data (in Notepad):
Atom No.  Atom  Amino acid (AA)  AA No.  x        y       z
1         N     PRO              98      8.824    17.273  88.787
2         CA    PRO              98      8.452    18.679  89.088
3         CD    PRO              98      9.692    16.763  89.899
4         CB    PRO              98      8.73     18.889  90.578
5         CG    PRO              98      9.172    17.521  91.124
6         C     PRO              98      9.482    19.367  88.271
7         O     PRO              98      10.515   19.739  88.825
8         N     ARG              99      9.263    19.522  86.956
9         CA    ARG              99      10.346   20.102  86.231
10        CB    ARG              99      11.564   19.174  86.276
11        CG    ARG              99      12.054   19.012  87.718
12        CD    ARG              99      10.944   18.606  88.698
...
6213      N     GLY              1078    -299.78  40.023  17.009
6214      CA    GLY              1078    -285.59  39.377  19.813
6215      C     GLY              1078    -267.82  38.403  22.744
6216      O     GLY              1078    -267.78  38.205  24.03
6217      N     ILE              1079    -255.59  37.727  24.695
6218      CA    ILE              1079    -241.59  37.013  27.144
6219      CB    ILE              1079    -241.06  35.864  27.728
(x, y, z = coordinates)
Structural data (in RasMol)
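Because the structural data are just atom names and x, y, z coordinates, simple geometry can be computed from them directly. The sketch below parses a few rows in the simplified, whitespace-separated layout of the table in this section and measures the distance between the alpha carbons (CA) of PRO 98 and ARG 99. Note that this is an assumption about layout: a real *.pdb file uses fixed-width ATOM records with more fields, so this parser is illustrative only.

```python
import math

# Rows copied from the structural data table (simplified layout, not raw PDB).
ROWS = """
1 N PRO 98 8.824 17.273 88.787
2 CA PRO 98 8.452 18.679 89.088
8 N ARG 99 9.263 19.522 86.956
9 CA ARG 99 10.346 20.102 86.231
"""


def parse_rows(text):
    """Parse 'atom_no atom residue residue_no x y z' rows into dictionaries."""
    atoms = []
    for line in text.strip().splitlines():
        serial, name, res_name, res_no, x, y, z = line.split()
        atoms.append({
            "serial": int(serial),
            "name": name,
            "res_name": res_name,
            "res_no": int(res_no),
            "xyz": (float(x), float(y), float(z)),
        })
    return atoms


def find_atom(atoms, res_no, name):
    """Return the first atom with the given residue number and atom name."""
    for atom in atoms:
        if atom["res_no"] == res_no and atom["name"] == name:
            return atom
    raise KeyError((res_no, name))


atoms = parse_rows(ROWS)
ca_98 = find_atom(atoms, 98, "CA")
ca_99 = find_atom(atoms, 99, "CA")
ca_distance = math.dist(ca_98["xyz"], ca_99["xyz"])  # in the file's units (angstrom)
print(f"CA(98)-CA(99) distance: {ca_distance:.2f}")
```

The result, about 3.7, is close to the usual spacing of roughly 3.8 angstrom between consecutive alpha carbons, which is a quick sanity check on the coordinates.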
Built model validation by ProQ server
Click for uploading structure
Click & upload the structure
Submit after uploading
Built model validation by ProQ server result
Built model validation by Ramachandran Plot
Click & upload the structure
Submit after uploading
Built model validation by Ramachandran Plot: RESULTS
G.N.Ramachandran
Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Chemical Sciences, 119 (5), 571-579 (2007)
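A Ramachandran plot shows the backbone torsion angles phi and psi of each residue; psi of residue i is the torsion defined by the atoms N(i), CA(i), C(i) and N(i+1). The sketch below computes a torsion angle from four points with a standard vector formula and applies it to the PRO 98 and ARG 99 coordinates listed in the structural data table earlier in this presentation. It is an illustration of the geometry, not the method used by any particular validation server.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# psi of PRO 98: N(98), CA(98), C(98), N(99), coordinates from the table.
n_98 = (8.824, 17.273, 88.787)
ca_98 = (8.452, 18.679, 89.088)
c_98 = (9.482, 19.367, 88.271)
n_99 = (9.263, 19.522, 86.956)
psi_98 = dihedral(n_98, ca_98, c_98, n_99)
print(f"psi(PRO 98) = {psi_98:.1f} degrees")
```

Repeating the calculation for phi and psi of every residue gives the points that a Ramachandran plot places in favoured, allowed or outlier regions.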
3D structure modeling and validation
Disulphide bridges in 3D structure of Q01758
• Backbone of Q01758 (rainbow smelt fish)
• 10 cysteines - ball and stick
• 10 sulphur atoms in cysteines and 5 SS bonds (dotted lines)
Disulphide bridges in 3D structure of P05140
• Ribbon model of P05140 (sea raven)
• 10 cysteines - ball and stick
• 10 sulphur atoms in cysteines and 5 SS bonds (dotted lines)
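Disulphide bridges can be located in a model by looking for pairs of cysteine sulphur (SG) atoms that lie within bonding distance; an S-S bond is about 2.05 angstrom long. The sketch below is a minimal illustration under stated assumptions: the coordinates and residue numbers are hypothetical, not taken from Q01758 or P05140, and the 2.5 angstrom cutoff is an assumed tolerance.

```python
import math
from itertools import combinations


def find_disulphides(sg_atoms, cutoff=2.5):
    """Return residue-number pairs whose SG atoms lie within cutoff angstrom.

    sg_atoms maps a cysteine residue number to its SG (x, y, z) coordinates.
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            bonds.append((res_a, res_b))
    return bonds


# Hypothetical SG coordinates for four cysteines (illustration only).
sg_atoms = {
    5: (0.0, 0.0, 0.0),
    20: (2.05, 0.0, 0.0),
    33: (10.0, 0.0, 0.0),
    47: (10.0, 2.0, 0.0),
}
bonds = find_disulphides(sg_atoms)
print(bonds)  # [(5, 20), (33, 47)]
```

With ten cysteines all paired, as in the two models shown, such a search would be expected to return five pairs.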
Secondary structure prediction from modeled 3D structure
Q01758
P05140
Legend: β-strand, α-helices, coil
Finding cavities in the built model using CASTp server
• Click for calculation
• Click, upload & submit the structure
Finding cavities in the built model using CASTp server: RESULTS
For literature
EXERCISE
• Download the sequence file for any one of the following proteins from SwissProt / Protein Information Resource / Protein Research Foundation: antifreeze protein, vascular endothelial growth factor, keratin
• Generate at least 3 homology models using the EsyPred3D server or the SWISS-MODEL server (i.e., using different PDB structures)
• Visualize the structure using the RasMol tool
• Compare and evaluate the modelled 3D structures using the RamPage, ProQ and Combinatorial Extension servers.
Report table columns:
Target sequence code | Template (PDB) codes | RamPage: percentage of residues in favoured region | ProQ: LG score | ProQ: MaxSub
EXERCISE (continued)
• Generate the report as an MS Word file and submit it to [email protected]
• Repeat the exercise for other protein sequences of your choice
Thank you all!