Posted on 07-Apr-2018
8/6/2019 Structural Bioinfo
STRUCTURAL BIOINFORMATICS (Toward a High-Resolution Understanding of Biology)
Objectives of the Lecture
Structural Bioinformatics
What is 3D Structure Prediction?
Significance of 3D Structure Prediction
Central Dogma
Fundamentals of Protein Structure
Protein Data Bank (PDB)
To be aware of a number of structure prediction methods:
  Homology Modeling
  Fold Recognition/Threading
  Ab initio Protein Folding Approaches
Applications of Structural Bioinformatics:
  Analog-Based Design
  Structure-Based Design
Structural Bioinformatics
Structural bioinformatics is a subfield of bioinformatics concerned with the use of biological structures (proteins, DNA, RNA, ligands, and complexes thereof) to further our understanding of biological systems.
What is protein structure prediction?
A prediction of the (relative) spatial position of each
atom in the tertiary structure generated from
knowledge only of the primary structure
(sequence).
Significance of Protein Structure Prediction
In evolutionarily related proteins, structure is much better preserved than sequence.
3D protein structure offers much more information than just the amino acid sequence.
By comparison with known structures, we can infer probable biological functions of new proteins.
By mapping residue conservation onto the structure, we can infer active sites and possibly the molecular function.
We can also identify regions involved in protein-protein interactions.
We can reconstruct (at least partially) the structure of protein complexes identified by other experimental methods.
We can build homology models.
The central dogma
DNA ---------> RNA ---------> Protein
{A, C, G, T}   {A, C, G, U}   {A, C, D, ..., Y} (20 amino acids)
DNA bases: Adenine, Cytosine, Guanine, Thymine. In RNA, Thymine (T) is replaced by Uracil (U).
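The two arrows of the central dogma can be sketched in a few lines of Python. This is a minimal illustration, assuming the DNA given is the coding (sense) strand, so transcription reduces to replacing T with U, and using the standard genetic code; the function names are hypothetical, not from any library.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA coding strand {A,C,G,T} -> mRNA {A,C,G,U}: T is replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein (one-letter codes), reading codons from the first base
    and stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```

Real genes also involve the template strand, introns and reading-frame detection, all of which this sketch ignores.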
Fundamentals of Protein Structure
Terminology
Primary Structure: the sequence of amino acid residues in a protein.
--MESSTHEDRKVLDL
Amino acids and the peptide bond
Cβ: first side-chain carbon (except for glycine, which has none).
Cα atoms
Secondary Structure
A first-level description of 3D structure. The peptide backbone of a protein has areas of partial positive charge and partial negative charge.
These areas can interact with one another to form hydrogen bonds.
The result of these hydrogen bonds is two types of structures:
alpha helices
beta pleated sheets
Secondary Structure I: The α-Helix
Secondary Structure II: The β-Strand
Several β-strands assemble into a β-sheet (a tertiary structural element)
(About 3.4)
Antiparallel β-Sheets
Parallel β-Sheets
Mixed β-Sheets
Tertiary Structure: The Global Three-Dimensional Structure
Secondary structure elements pack together to form a structural core.
Tertiary structure results from the folding of alpha helices and beta pleated sheets.
Factors influencing tertiary structure include:
Hydrophobic/hydrophilic interactions
Hydrogen bonding
Disulfide linkages
Folding by chaperone proteins
Tertiary Structure: Different Representations
(Richardson-style) Ribbon Diagrams are traces of the protein backbone emphasizing the 3-D arrangement of α-helices and β-strands.
This arrangement is called the protein fold or the protein folding topology.
Tertiary Structure: Different Representations
This is a representation of the molecular surface (van der Waals surface) of a hemagglutinin domain with bound sialic acid.
This is much more like what other molecules see when they encounter a protein!
Supersecondary Structures: Between Secondary and Tertiary Structure
For example: β-α-β motif (above); β-hairpin (left)
Quaternary Structure
Association of Multiple Polypeptide Chains
Quaternary structure results from the interaction of independent polypeptide chains.
Factors influencing quaternary structure include:
Hydrophobic/hydrophilic interactions
Hydrogen bonding
The shape and charge distribution on associating polypeptides
Side Chain Properties
Hydrophobic amino acids tend to stay in the interior of a protein.
Hydrophilic ones tend to stay on the exterior of a protein.
Oppositely charged amino acids can form salt bridges.
Polar amino acids can participate in hydrogen bonding.
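The hydrophobic/hydrophilic distinction can be made quantitative with a hydropathy scale. The slide does not name one; the sketch below assumes the Kyte-Doolittle scale and averages it over a sliding window (window size 5 is an arbitrary choice here). Stretches with high positive averages are candidates for buried (or membrane-spanning) segments. `hydropathy_profile` is a hypothetical helper name.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 5) -> list[float]:
    """Average hydropathy of every full window of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Usage on the primary-structure fragment shown earlier in the lecture.
profile = hydropathy_profile("MESSTHEDRKVLDL")
start = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic 5-residue window starts at residue {start + 1}")
```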
Domain, Motif, Fold
Domain: a discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function. Most proteins have multiple domains.
Fold: the overall shape of a domain. There are only a few thousand possible folds.
Super-secondary structure, motif: frequently occurring structural patterns among multiple proteins, which do not necessarily have similar folds.
Determination of Protein Structures
X-ray Crystallography
NMR (Nuclear Magnetic Resonance)
EM (Electron Microscopy)
Protein Data Bank (PDB)
A repository for 3-D biological macromolecular structures.
Established in 1971 at Brookhaven National Laboratory (7 structures).
It includes proteins, nucleic acids and viruses.
Structures are obtained by X-ray crystallography (80%) or NMR spectroscopy (16%).
Submitted by biologists and biochemists from around the world.
Other sites: MSD (EBI): msd.ebi.ac.uk; MMDB (NCBI): www.ncbi.nlm.nih.gov/Structure/
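Entries downloaded from the PDB in the classic PDB text format store each atom as a fixed-column ATOM record. A minimal sketch, assuming that format (not mmCIF), which extracts Cα coordinates; it ignores alternate locations, insertion codes and HETATM records, so for real work a library such as Biopython's `Bio.PDB` is the safer choice. The helper names and the demo fragment are hypothetical.

```python
import math


def read_ca_atoms(pdb_lines):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each C-alpha.

    Uses the fixed columns of PDB-format ATOM records (1-based): atom name 13-16,
    residue name 18-20, chain 22, residue number 23-26, x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), line[17:20], xyz))
    return atoms


def ca_distances(atoms):
    """Distances between consecutive C-alpha atoms (about 3.8 A in a real chain)."""
    return [math.dist(a[3], b[3]) for a, b in zip(atoms, atoms[1:])]


# Hypothetical fragment written with the same fixed-width layout as ATOM records.
FMT = "ATOM  {:5d} {:^4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
demo = [
    FMT.format(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    FMT.format(2, "CA", "MET", "A", 1, 1.458, 0.0, 0.0),
    FMT.format(3, "CA", "GLY", "A", 2, 1.458, 3.8, 0.0),
]
ca_atoms = read_ca_atoms(demo)
for atom in ca_atoms:
    print(atom)
print(ca_distances(ca_atoms))
```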
Growth of the Protein Data Bank (PDB): The Motivation
The number of unique folds in nature is fairly small (possibly a few thousand).
90% of new structures submitted to the PDB in the past three years have folds similar to ones already in the PDB.
[Figure legend: New fold, Old fold]
Protein Structure Prediction Methods
Comparative Modeling Method:
Homology Modeling Method
Threading Method
Ab initio folding Method
Protein structure prediction flowchart
[Flowchart: Sequence → Database Searching → Homolog of experimentally determined structure? YES → Homology Modeling → Structure; NO → Protein Threading, or the Ab initio method → Structure]
Homology Modeling
Predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES).
If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed.
In general, at least 30% sequence identity is required to generate useful models.
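The 30% rule of thumb can be expressed as a small decision helper. This sketch assumes one common convention for percent identity (identical residues divided by aligned columns where neither sequence has a gap); other conventions divide by the shorter sequence length, so numbers differ between tools. Both function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def suggest_method(identity: float, threshold: float = 30.0) -> str:
    """Rule of thumb from the slide: about 30% identity or more -> homology modeling."""
    if identity >= threshold:
        return "homology modeling"
    return "threading or ab initio"


identity = percent_identity("ACDEFGHIKL", "ACDEYAHIRL")
print(identity, suggest_method(identity))  # 70.0 homology modeling
```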
7 Steps In Homology Modeling
Step 1: Identify Homologues in PDB
[Figure: the query sequence is compared against the sequences stored in the PDB]
Step 1: Identify Homologues in PDB
[Figure: searching the query sequence against the PDB returns matching entries, Hit #1 and Hit #2]
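In practice this search is done with tools such as BLAST or FASTA. As a heavily simplified stand-in, the sketch below ranks database entries by how many short words (k-mers) they share with the query, which is only the seeding idea behind BLAST-like tools, with no scoring matrix, extension or statistics. The database entries and identifiers are made up for illustration.

```python
def kmers(sequence: str, k: int = 3) -> set[str]:
    """All distinct length-k substrings ("words") of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_hits(query: str, database: dict[str, str], k: int = 3, min_shared: int = 1):
    """Rank database entries by the number of k-mers they share with the query."""
    query_words = kmers(query, k)
    hits = []
    for name, seq in database.items():
        shared = len(query_words & kmers(seq, k))
        if shared >= min_shared:
            hits.append((name, shared))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical mini "PDB" of sequences keyed by made-up identifiers.
database = {
    "template1": "ACDEYAHIRLMNPERSTVAG",
    "template2": "WWWWPPPPGGGGQQQQ",
    "template3": "ASDEYAHLRILDPQRSTVAY",
}
print(rank_hits("ACDEFGHIKLMNPQRST", database))
```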
Step 2: Align Sequences
Identity scoring matrix for GENETICS (columns) vs GENESIS (rows): 10 for identical residues, 0 otherwise
    G   E   N   E   T   I   C   S
G  10   0   0   0   0   0   0   0
E   0  10   0  10   0   0   0   0
N   0   0  10   0   0   0   0   0
E   0  10   0  10   0   0   0   0
S   0   0   0   0   0   0   0  10
I   0   0   0   0   0  10   0   0
S   0   0   0   0   0   0   0  10
[Figure: Dynamic Programming summation matrix for the same pair; the top-left (maximum) score is 60]
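The dynamic-programming idea behind the matrix above can be sketched as a Needleman-Wunsch global alignment. To mirror the slide, identical residues score 10 and mismatches 0; the slide shows no gap penalty, so the sketch defaults `gap` to 0 as an assumption. Real homology-modeling alignments use substitution matrices (e.g. BLOSUM62) and negative gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 10, mismatch: int = 0, gap: int = 0):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def s(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i, j),  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,          # gap in b
                          F[i][j - 1] + gap)          # gap in a

    # Traceback from the bottom-right cell recovers one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GENETICS", "GENESIS")
print(score)  # 60: six identical residues x 10
print(top)
print(bottom)
```

When several alignments tie, the traceback order (diagonal, then up, then left) decides which one is returned.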
Alignment
Key step in homology modeling.
Global (Needleman-Wunsch) alignment is absolutely required.
Small errors in the alignment can lead to big errors in the structural model.
Multiple alignments are usually better than pairwise alignments.
The alignment is prepared by superimposing all template structures.
Two zones of sequence alignment
Step 3: Find SCRs
Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
SCR#1 SCR#2
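A simplified way to locate candidate SCRs is to look for runs of alignment columns with no gap in any sequence. The sketch below assumes that simplification (it ignores the conservation criterion) and an arbitrary minimum run length of 5 columns. On the three aligned sequences above it finds two ungapped blocks, columns 1-17 and 33-40, which plausibly correspond to SCR#1 and SCR#2; the gapped columns in between are the candidate variable regions.

```python
def ungapped_blocks(alignment: list[str], min_len: int = 5) -> list[tuple[int, int]]:
    """Half-open column ranges [start, end) with no gap in any sequence,
    keeping only runs of at least `min_len` columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    blocks, start = [], None
    for col in range(length):
        gap_free = all(seq[col] != "-" for seq in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    if start is not None and length - start >= min_len:
        blocks.append((start, length))
    return blocks


alignment = [
    "ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG",  # Query
    "ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA",  # Hit #1
    "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA",  # Hit #2
]
for start, end in ungapped_blocks(alignment):
    print(f"columns {start + 1}-{end}: {alignment[0][start:end]}")
```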
Structurally Conserved Regions (SCRs)
Correspond to the most stable structures or regions (usually in the interior) of the protein.
Correspond to sequence regions with the lowest level of gapping and the highest level of sequence conservation.
Usually correspond to secondary structures.
Step 4: Find SVRs (Structurally Variable Regions)
Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
        HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB
SVR (loop)
Step 5: Side Chain Modeling
Rotamers (preferred side-chain conformations) are placed and positioned on the model backbone via a superposition algorithm.
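The slide does not name the superposition algorithm. A common choice for optimal rigid-body superposition is the Kabsch algorithm; the sketch below assumes NumPy and would, for example, superimpose a rotamer's backbone atoms (N, Cα, C) onto the model backbone before copying its side-chain atoms. The function name and the demo coordinates are hypothetical.

```python
import numpy as np


def kabsch_superimpose(mobile, target):
    """Optimally superimpose `mobile` onto `target` (both N x 3, same atom order).

    Returns (fitted_coords, rotation_matrix, rmsd) using the Kabsch algorithm.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_center, q_center = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - p_center, Q - q_center            # remove translation
    H = P0.T @ Q0                                   # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = P0 @ R.T + q_center
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return fitted, R, rmsd


# Made-up 4-atom fragment, rotated 90 degrees about z and shifted, then fitted back.
target = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
mobile = target @ rot_z.T + np.array([5.0, -1.0, 2.0])
fitted, R, rmsd = kabsch_superimpose(mobile, target)
print(f"RMSD after superposition: {rmsd:.6f}")
```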
Step 6: Model Optimization
An efficient way of polishing and shining your protein model
Removes atomic overlaps and unnatural strains in the structure
Stabilizes or reinforces strong hydrogen bonds and breaks weak ones
Brings the protein to a nearby (local) energy minimum in about 1-2 minutes of CPU time
Several freeware options to choose from:
XPLOR (Axel Brunger, Yale)
GROMACS (Groningen, The Netherlands)
AMBER (Peter Kollman, UCSF)
CHARMM (Martin Karplus, Harvard)
TINKER (Jay Ponder, Wash U)
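The packages above minimize full force fields over thousands of atoms. As a toy illustration of the principle only, the sketch below removes a steric overlap between two atoms by steepest descent on a single Lennard-Jones term, in reduced units (epsilon = sigma = 1). The step size, move cap and tolerance are arbitrary assumptions; the atoms start too close (r = 0.9 sigma) and are pushed to the analytic minimum at 2^(1/6) sigma, where the energy is -epsilon.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at distance r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r, step=1e-3, max_move=0.1, tol=1e-8, max_iter=100_000):
    """Repeatedly move r downhill (against dE/dr) until the gradient is ~0."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r += max(-max_move, min(max_move, -step * g))  # cap very large moves
    return r


r0 = 0.9  # the two atoms start closer than sigma: a steric overlap
r_min = steepest_descent(r0)
print(f"r: {r0} -> {r_min:.4f} (analytic minimum {2 ** (1 / 6):.4f})")
print(f"E: {lj_energy(r0):.3f} -> {lj_energy(r_min):.3f}")
```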
Step 7: Model Validation
PROCHECK: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
PROSA II: http://lore.came.sbg.ac.at/People/mo/Prosa/prosa.html
VADAR: http://www.pence.ualberta.ca/ftp/vadar/
DSSP: http://www.embl-heidelberg.de/dssp/
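Stereochemical checks such as those in PROCHECK look, among other things, at where each residue's backbone (φ, ψ) pair falls on a Ramachandran plot. Both angles are dihedral angles defined by four consecutive backbone atoms. The sketch below computes such a dihedral from coordinates in plain Python; it is an illustration of the geometry, not PROCHECK's code, and the function name is hypothetical.

```python
import math


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (range -180 to 180) defined by four points, e.g.
    C(i-1), N(i), CA(i), C(i) for phi, or N(i), CA(i), C(i), N(i+1) for psi."""
    def sub(u, v):
        return [a - b for a, b in zip(u, v)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = sub(b0, [dot(b0, b1) * c for c in b1])
    w = sub(b2, [dot(b2, b1) * c for c in b1])
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the outer atoms on the same side (cis, about 0 degrees).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```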
Homology Modeling on the Web
http://www.expasy.ch/swissmod/SWISS-MODEL.html
http://www.cmbi.kun.nl:1100/WIWWWI/
http://cl.sdsc.edu/hm.html
Raw Sequence
Predicted structure
Use templates to build the structure of the homologous sequence
MQQPMNYPCP QIFWVDSSAT SSWAPPGSVF PCPSCGPRGP DQRRPPPPPPPVSPLPPPSQPLPLPPLTPL KKKDHNTNLW LPVVFFMVLV ALVGMGLGMY QLFHLQKELA
ELREFTNQSLKVSSFEKQIA NPSTPSEKKE PRSVAHLTGN PHSRSIPLEW EDTYGTALISGVKYKKGGLVINETGLYFVY SKVYFRGQSC NNQPLNHKVY MRNSKYPEDL VLMEEKRLNYCTTGQIWAHSSYLGAVFNLT SADHLYVNIS QLSLINFEES KTFFGLYKL
Use of SwissPDB Viewer to build the structure of the following sequence
1TNRADOGB
After magic fit
Activate the raw sequence
The Preliminary Result
Protein Threading
Makes structure predictions by identifying a good sequence-structure fit.
Protein threading can predict only the backbone structure of a protein (side chains have to be predicted using other methods).
[Figure: predicted vs. actual structure]
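The "sequence-structure fit" idea can be illustrated with a deliberately tiny toy. Here each template is described only by a per-position environment string ('B' buried, 'E' exposed), a hydrophobic residue scores +1 when buried and a polar one +1 when exposed (otherwise -1), and the sequence is slid along each template without gaps. Real threading methods use statistical potentials, richer environment classes and gapped alignments; every name, score and template below is a hypothetical assumption. Note that the result is only a placement of residues on the template backbone, consistent with threading predicting the backbone and not the side chains.

```python
HYDROPHOBIC = set("AVILMFWC")


def environment_score(residue: str, environment: str) -> int:
    """+1 if the residue suits its environment ('B' buried, 'E' exposed), else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def thread(sequence: str, template_env: str):
    """Best ungapped placement of `sequence` on one template; returns (score, offset)."""
    if len(sequence) > len(template_env):
        raise ValueError("sequence is longer than the template")
    best = None
    for offset in range(len(template_env) - len(sequence) + 1):
        score = sum(environment_score(aa, env)
                    for aa, env in zip(sequence, template_env[offset:]))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def best_template(sequence: str, templates: dict[str, str]) -> str:
    """Name of the template whose environment profile fits the sequence best."""
    return max(templates, key=lambda name: thread(sequence, templates[name])[0])


# Hypothetical templates described only by burial ('B') / exposure ('E').
templates = {"fold_A": "EBBBEE", "fold_B": "EEEBBB"}
print(best_template("LIVKE", templates))  # fold_A
```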
Ab Initio 3D Structure Prediction
Aims to predict tertiary structure from basic physicochemical properties.
It is used when Homology Modeling and Threading have failed (no homologies are evident).
Does not rely on any detection of similarity to a sequence of known structure.
As yet very unreliable for practical predictions.
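A classic teaching toy for physics-based folding is the 2D HP lattice model (Dill): residues are only hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a square lattice, and the energy is -1 for every pair of H residues that touch on the lattice without being chain neighbours. The sketch below finds a lowest-energy conformation by exhaustive enumeration; real ab initio methods use far more detailed energy functions and sampling, and the exponential cost of this search hints at why the problem is hard. Function names are hypothetical.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(seq: str, coords: list[tuple[int, int]]) -> int:
    """-1 for every H-H pair that are lattice neighbours but not chain neighbours."""
    index = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fold_exhaustively(seq: str):
    """Enumerate all self-avoiding walks; return (energy, coords) of a minimum.
    Cost grows exponentially with length, so use only short sequences."""
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    best = [1, None]  # any real conformation has energy <= 0

    def extend(coords, occupied):
        if len(coords) == len(seq):
            energy = hp_energy(seq, coords)
            if energy < best[0]:
                best[0], best[1] = energy, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    start = [(0, 0), (1, 0)]  # fix the first bond to skip rotated copies
    extend(start, set(start))
    return best[0], best[1]


energy, conformation = fold_exhaustively("HPHPPHHPHH")
print(energy, conformation)
```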
Analog-Based Design
The analog-based approach mainly uses pharmacophoric maps and Quantitative Structure-Activity Relationships (QSAR) to identify or modify a lead in the absence of a known 3D structure of the receptor.
Structure-Based Design
The structure-based approach starts with the structure of the receptor site, such as the active site of a protein.
Docking falls into this category of design.
Quantitative Structure-Activity Relationship (QSAR)
QSAR is a series of mathematical models built to predict the biological and physicochemical behavior of molecules from their chemical structures.
It reduces the need to measure the activity of hundreds of similar compounds individually, which would take large amounts of resources.
The underlying premise of QSAR is that biological activity is correlated with physicochemical parameters.
BA = f(biological + chemical + physical)
Biological activity (BA) can be any measured quantity, such as IC50 or ED50.
QSAR Table
Structure   Bioproperty   Structural properties
Comp. 1     Bio1          P1   P2   P3   P4
Comp. 2     Bio2          "    "    "    "
Comp. 3     Bio3          "    "    "    "
Comp. 4     Bio4          "    "    "    "
BA = k1·P1 + k2·P2 + k3·P3 + ...
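Finding the coefficients k1, k2, ... in BA = k1·P1 + k2·P2 + ... is a multiple linear regression problem. A minimal least-squares sketch with NumPy; `fit_qsar` and `predict_qsar` are hypothetical helper names, and the descriptor values are made up, with activities generated from k = (2.0, -1.0, 0.5) so the fit can be checked. QSAR studies often model a logarithmic activity such as pIC50 rather than IC50 itself.

```python
import numpy as np


def fit_qsar(descriptors, activities, intercept=False):
    """Least-squares fit of BA = k1*P1 + k2*P2 + ... (+ k0 if intercept=True).

    descriptors: n_compounds x n_properties; activities: length n_compounds.
    Returns the coefficient vector (k0 first when intercept=True).
    """
    X = np.asarray(descriptors, dtype=float)
    y = np.asarray(activities, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients


def predict_qsar(coefficients, descriptors, intercept=False):
    """Predicted activities for new descriptor rows."""
    X = np.asarray(descriptors, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(X)), X])
    return X @ coefficients


# Made-up descriptors P1..P3 for five hypothetical compounds.
P = [[1.0, 0.5, 2.0], [2.0, 1.0, 0.0], [0.5, 3.0, 1.0], [1.5, 2.0, 3.0], [3.0, 0.5, 1.5]]
BA = [2.0 * p1 - 1.0 * p2 + 0.5 * p3 for p1, p2, p3 in P]
k = fit_qsar(P, BA)
print(np.round(k, 3))  # recovers approximately [2, -1, 0.5] for this noise-free data
```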
EXTERNAL VALIDATION OF QSAR MODELS
Entire dataset → split into a Training set and a Test set
Training set → Model development (q²)
Test set → Prediction of the test set (R²)
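The validation scheme can be sketched as follows, assuming NumPy, a linear model with an intercept, q² computed by leave-one-out cross-validation on the training set (q² = 1 - PRESS / SS_total) and R² computed on the held-out test set relative to the test-set mean. Some papers define the external R² with the training-set mean instead, so values are not always comparable. The data, the 15/5 split and all function names are hypothetical.

```python
import numpy as np


def _fit(X, y):
    """Least-squares coefficients for y ~ X, with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def _predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def q2_loo(X_train, y_train):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total on the training set."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    press = 0.0
    for i in range(len(y_train)):
        keep = np.arange(len(y_train)) != i          # leave compound i out
        coef = _fit(X_train[keep], y_train[keep])
        press += float((y_train[i] - _predict(coef, X_train[i:i + 1])[0]) ** 2)
    ss_total = float(((y_train - y_train.mean()) ** 2).sum())
    return 1.0 - press / ss_total


def r2_external(X_train, y_train, X_test, y_test):
    """Fit on the training set only, then R^2 of predictions on the test set."""
    coef = _fit(np.asarray(X_train, dtype=float), np.asarray(y_train, dtype=float))
    y_test = np.asarray(y_test, dtype=float)
    residual = float(((y_test - _predict(coef, np.asarray(X_test, dtype=float))) ** 2).sum())
    ss_total = float(((y_test - y_test.mean()) ** 2).sum())
    return 1.0 - residual / ss_total


# Synthetic dataset: 20 hypothetical compounds, 2 descriptors, small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=20)
order = rng.permutation(20)
train, test = order[:15], order[15:]
print(f"q2 (LOO, training set): {q2_loo(X[train], y[train]):.3f}")
print(f"R2 (external test set): {r2_external(X[train], y[train], X[test], y[test]):.3f}")
```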
Thank You