CAP 5510 Lecture 3 Protein Structures · suchen/cap5510fall/ChenLecture... · 2006-10-23


8/19/2005 Su-Shing Chen, CISE 1

CAP 5510 Lecture 3 Protein Structures

Su-Shing Chen, Bioinformatics, CISE


Protein Conformation


Protein Conformational Structures

• Shaped by hydrophobicity (lack of affinity for water), hydrogen bonding, handedness, and the tension between hierarchy and interactions (electrostatic and van der Waals), a protein structure is a complex geometric pattern of the polypeptide, its side chains, and the solvent environment.

• The protein in solvent adopts the conformation of minimum free energy. Molecular dynamics on the potential energy, which includes some nonlinear force terms, gives the conformational structure.


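Geometric Features

[Figure: peptide backbone around residues n-1 and n, with atoms labeled N (nitrogen), H (hydrogen), Cα (α carbon), and C=O (carbonyl), annotated with the bond distance d, the bond angle Φ, and the torsion angle Ψ.]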


Basic structure of an amino acid

[Figure: a central α carbon (Cα) bonded to an amino group (H2N), a carboxyl group (COOH), a hydrogen atom (H), and a side chain (R).]


Secondary Structures

• Alpha helix: repeated curvature (bond) and torsion angles φ, ψ, with a repeating pattern of hydrogen bonding between the CO of residue n and the NH of residue n+4.

• Beta sheet: repeating patterns of hydrogen bonding between distant parts of the backbone.

• Random coil


Secondary Structures


Structure Databases

• 3-D biomolecular structures of protein amino acid sequences.

• 3-D structures are determined by X-ray crystallography and nuclear magnetic resonance (NMR).

• Protein folding is a grand challenge problem: a protein's primary sequence determines its 3-D structure (Anfinsen et al., 1961).


How to Form 3-D Structures

• Starting from the NH2 terminus, we identify each amino acid side chain by comparing the atomic structure of each residue with the chemical structures of the 20 amino acids.

• Each atom has an (x, y, z) coordinate; together the atoms form a ball-and-stick structure.

• A chemical graph of chemical data is associated with the ball-and-stick model.
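One way to picture the step from coordinates to a chemical graph is to connect any two atoms that lie within a bonding distance of each other. The sketch below is only an illustration: the function name, the 1.9 angstrom cutoff, and the coordinates of the four-atom fragment are all assumptions, not values from the lecture.

```python
import math
from itertools import combinations

def build_chemical_graph(atoms, cutoff=1.9):
    """Return an adjacency dict {atom index: set of bonded atom indices}.

    atoms  -- list of (name, x, y, z) tuples, coordinates in angstroms
    cutoff -- assumed maximum distance for a covalent bond (hypothetical value)
    """
    graph = {i: set() for i in range(len(atoms))}
    for i, j in combinations(range(len(atoms)), 2):
        if math.dist(atoms[i][1:], atoms[j][1:]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph

# Hypothetical backbone fragment N-CA-C-O with made-up planar coordinates.
atoms = [
    ("N", 0.00, 0.00, 0.00),
    ("CA", 1.46, 0.00, 0.00),
    ("C", 2.00, 1.42, 0.00),
    ("O", 1.30, 2.43, 0.00),
]
graph = build_chemical_graph(atoms)
```

The atoms are the "balls" and the graph edges are the "sticks"; chemical data such as element type or residue name could be attached to each node.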


Atoms, Bonds and Energy

• The bond length: average length of a stable X-X bond is about ? angstroms.

• The bond (curvature) angle Φ = κ

• The torsion angle Ψ = τ

• Potential energy:

$$E = \frac{1}{2}\sum c_d (d - d_0)^2 + \frac{1}{2}\sum c_\kappa (\kappa - \kappa_0)^2 + \frac{1}{2}\sum c_\tau \bigl(1 + \cos(n\tau - \delta)\bigr) + \sum \left(\frac{A}{r^{12}} - \frac{B}{r^{6}} + \frac{q_1 q_2}{D r}\right)$$
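As a rough illustration of how the four terms of the potential energy combine, the following sketch evaluates them for lists of bonds, angles, torsions, and non-bonded atom pairs. The function name, the tuple layouts, and the default D = 1.0 are assumptions made for the example; real force fields supply their own parameter tables.

```python
import math

def potential_energy(bonds, angles, torsions, pairs, D=1.0):
    """Sum the bond, angle, torsion, and non-bonded terms of the slide's formula.

    bonds    -- iterable of (c_d, d, d0)
    angles   -- iterable of (c_kappa, kappa, kappa0), angles in radians
    torsions -- iterable of (c_tau, n, tau, delta), angles in radians
    pairs    -- iterable of (A, B, q1, q2, r) for non-bonded atom pairs
    D        -- dielectric constant (assumed default)
    """
    e_bond = 0.5 * sum(c * (d - d0) ** 2 for c, d, d0 in bonds)
    e_angle = 0.5 * sum(c * (k - k0) ** 2 for c, k, k0 in angles)
    e_torsion = 0.5 * sum(c * (1 + math.cos(n * t - delta))
                          for c, n, t, delta in torsions)
    e_nonbonded = sum(A / r ** 12 - B / r ** 6 + q1 * q2 / (D * r)
                      for A, B, q1, q2, r in pairs)
    return e_bond + e_angle + e_torsion + e_nonbonded
```

For example, a single bond stretched 0.5 beyond its rest length with c_d = 2.0 contributes 0.5 * 2.0 * 0.25 = 0.25 to the total.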


RMSE (Root Mean Square Error)

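• Similarity measure of 3-D structures.

• $X = \{(x_1, y_1, z_1), \ldots, (x_n, y_n, z_n)\}$

• $X' = \{(x'_1, y'_1, z'_1), \ldots, (x'_n, y'_n, z'_n)\}$

• In Cartesian coordinates:

$$R(X, X') = \sqrt{\sum_{i=1}^{n} \left[ (x_i - x'_i)^2 + (y_i - y'_i)^2 + (z_i - z'_i)^2 \right]}$$

• In geometric features (bond distance, curvature, torsion):

$$R(X, X') = \sqrt{\sum_{i=1}^{n} \left[ (d_i - d'_i)^2 + (\kappa_i - \kappa'_i)^2 + (\tau_i - \tau'_i)^2 \right]}$$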
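A minimal sketch of the measure in code follows. Two points are assumptions rather than part of the slide: the sum is divided by n before taking the square root, which is the conventional root-mean-square normalization that the slide's formula leaves out, and the two structures are taken to be already superimposed with points in matching order.

```python
import math

def rmse(X, Y):
    """Root mean square error between two equal-length lists of 3-tuples.

    Works for Cartesian (x, y, z) triples or for (d, kappa, tau) triples.
    Assumes the points are in corresponding order and, for Cartesian
    coordinates, that the structures are already superimposed.
    """
    if len(X) != len(Y) or not X:
        raise ValueError("structures must be non-empty and the same length")
    total = sum((a - b) ** 2 for p, q in zip(X, Y) for a, b in zip(p, q))
    return math.sqrt(total / len(X))

X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
Y = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
error = rmse(X, Y)  # every point is displaced by exactly 1 along z
```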


Inverse Protein Folding - Threading

• Find amino acid sequences that fold into a known 3-D structure.

• Sequence similarities > 30%.

• Profile method: compatible environments are described by the area of a buried residue that is inaccessible to solvent, the side chains' polar O and N atoms, and the local secondary structure.


Protein Superfamilies & Domain Superfolds

• Many protein structures are similar.

• Protein domains with more than 30% sequence similarity adopt the same fold.

• Some proteins with statistically insignificant sequence similarity have similar folds.

• Dayhoff: families have > 50% similarity; superfamilies have > 30-40% similarity.


Geometric Features of Proteins

• S. Chen, Characterizing and learning of protein conformations, 1993.

• A set of points P(i) on the backbone.

• A right-handed orthonormal basis $\{T_i, N_i, B_i\}$. $T_i$ is the (tangent) vector from P(i) to P(i+1), normalized to unit length. The binormal vector is $B_i = \frac{T_{i-1} \times T_i}{|T_{i-1} \times T_i|}$, normal to the plane through P(i-1), P(i), P(i+1). The normal is $N_i = B_i \times T_i$.

• The curvature $\kappa_i$ is the angle between $T_{i-1}$ and $T_i$. The torsion $\tau_i$ is the angle between $B_i$ and $B_{i+1}$.
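The frame construction above can be sketched in a few lines of plain Python. This is an illustrative reading, not the implementation from the cited 1993 paper: the helper names are invented, the angles returned are unsigned, and the code assumes no three consecutive backbone points are collinear (otherwise the binormal is undefined).

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def angle(a, b):
    # Clamp to [-1, 1] to guard acos against rounding error.
    return math.acos(max(-1.0, min(1.0, dot(unit(a), unit(b)))))

def tangents(points):
    """Unit tangent T_i along P(i) -> P(i+1)."""
    return [unit(sub(points[i + 1], points[i])) for i in range(len(points) - 1)]

def frenet_frames(points):
    """Return {i: (T_i, N_i, B_i)} for every interior point i."""
    T = tangents(points)
    frames = {}
    for i in range(1, len(points) - 1):
        B = unit(cross(T[i - 1], T[i]))   # normal to plane P(i-1), P(i), P(i+1)
        N = cross(B, T[i])
        frames[i] = (T[i], N, B)
    return frames

def curvature_and_torsion(points):
    """Curvature: angle between T_{i-1} and T_i. Torsion: angle between B_i and B_{i+1}."""
    T = tangents(points)
    frames = frenet_frames(points)
    curvature = {i: angle(T[i - 1], T[i]) for i in frames}
    torsion = {i: angle(frames[i][2], frames[i + 1][2])
               for i in frames if i + 1 in frames}
    return curvature, torsion

# Made-up points tracing three mutually perpendicular unit steps.
points = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
curvature, torsion = curvature_and_torsion(points)
```

In practice the points P(i) would typically be the α-carbon positions of consecutive residues, so each residue contributes one curvature and one torsion value.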


[Figure: backbone points P(i), P(i+1), and P(i+2), with the tangent vectors Ti and Ti+1, the binormal Bi+1, and the normal Ni+1.]


Motivations to Study Protein Structures

• Proteins are interesting to look at!

• Gene-sequencing projects are accumulating gene data and protein sequences at a rapid rate. However, information about their structure is available for only a small fraction.

• Understanding them might help reduce this gap.


Secondary Structures Prediction

• Protein structure prediction is one of the most significant tasks tackled in computational structural biology. Its aim is to determine the three-dimensional structure of proteins from their amino acid sequences. In more formal terms, this is the prediction of protein tertiary structure from primary structure.

• Protein structure is a valuable resource in drug design, and its prediction is a highly active field of research.

• The output of experimentally determined protein structures, typically obtained by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy, lags far behind the output of protein sequences.


Chou-Fasman

• Based on frequencies of residues in alpha helices, beta sheets, and turns.

• Accuracy: 50-60%


Chou-Fasman


Chou-Fasman

• Assign Pij values

1. Assign all of the residues the appropriate set of parameters

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
P(E)     147   75   55  147   83   37  130  105   93   75  147   75
P(turn)  114  143  152  114   66   74   59   60   95  143  114  156


Chou-Fasman

2. Scan peptide for α-helix regions

• Identify regions where 4 out of 6 residues have P(H) > 100: an "alpha-helix nucleus"

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57


Chou-Fasman

3. Extend α-helix nucleus

• Extend the helix in both directions until a set of four residues has an average P(H) < 100.

• Repeat steps 1-3 for the entire peptide

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
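The nucleation and extension steps for helices can be sketched as follows, using the peptide and P(H) values exactly as listed on the slide. The extension rule is one possible reading of the slide, stated here as an assumption: a neighbouring residue is added as long as the four-residue window ending at it (at the leading edge of the growing helix) still averages at least 100. The function names are invented for the example.

```python
from statistics import mean

SEQ = "TSPTAELMRSTG"                     # peptide from the slide
P_H = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]

def extend(scores, lo, hi, threshold=100, span=4):
    """Grow the inclusive region [lo, hi] in both directions.

    Assumed rule: stop when the `span`-residue window at the leading edge
    has an average score below `threshold`.
    """
    while hi + 1 < len(scores) and mean(scores[hi + 2 - span:hi + 2]) >= threshold:
        hi += 1
    while lo - 1 >= 0 and mean(scores[lo - 1:lo - 1 + span]) >= threshold:
        lo -= 1
    return lo, hi

def helix_regions(p_h, window=6, min_hits=4, threshold=100):
    """Find nuclei (min_hits of `window` residues above threshold) and extend them."""
    found = set()
    for start in range(len(p_h) - window + 1):
        hits = sum(p > threshold for p in p_h[start:start + window])
        if hits >= min_hits:
            found.add(extend(p_h, start, start + window - 1, threshold))
    return sorted(found)

regions = helix_regions(P_H)
helices = [SEQ[lo:hi + 1] for lo, hi in regions]
```

On this peptide the three overlapping nuclei all extend to the same region, residues 3 through 10 (PTAELMRS), under the assumed extension rule.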


Chou-Fasman

• Scan peptide for β-sheet regions

• Identify regions where 3 out of 5 residues have P(E) > 100: a "β-sheet nucleus"

• Extend the β-sheet until 4 contiguous residues have an average P(E) < 100

• If the region's average P(E) > 105 and the average P(E) > the average P(H), then predict "β-sheet"

Residue    T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69  142  151  121  145   98   77   69   57
P(E)     147   75   55  147   83   37  130  105   93   75  147   75
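The β-sheet rules can be sketched the same way, again with the values from the slide. As an assumption for this example, each nucleus is extended and tested on its own (overlapping candidates are not merged first), and extension stops when the four-residue window at the leading edge averages below 100. The function names are invented.

```python
from statistics import mean

SEQ = "TSPTAELMRSTG"                     # peptide from the slide
P_H = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]
P_E = [147, 75, 55, 147, 83, 37, 130, 105, 93, 75, 147, 75]

def extend(scores, lo, hi, threshold=100, span=4):
    """Grow the inclusive region [lo, hi] until the leading-edge window averages < threshold."""
    while hi + 1 < len(scores) and mean(scores[hi + 2 - span:hi + 2]) >= threshold:
        hi += 1
    while lo - 1 >= 0 and mean(scores[lo - 1:lo - 1 + span]) >= threshold:
        lo -= 1
    return lo, hi

def sheet_regions(p_e, p_h, window=5, min_hits=3, threshold=100, min_avg=105):
    """Nucleate on 3 of 5 residues with P(E) > 100, extend, then apply the acceptance test."""
    accepted = set()
    for start in range(len(p_e) - window + 1):
        hits = sum(p > threshold for p in p_e[start:start + window])
        if hits < min_hits:
            continue
        lo, hi = extend(p_e, start, start + window - 1, threshold)
        avg_e = mean(p_e[lo:hi + 1])
        avg_h = mean(p_h[lo:hi + 1])
        if avg_e > min_avg and avg_e > avg_h:
            accepted.add((lo, hi))
    return sorted(accepted)

sheets = sheet_regions(P_E, P_H)
strands = [SEQ[lo:hi + 1] for lo, hi in sheets]
```

Here the nucleus at residues 4 to 8 is rejected because its average P(E) of 100.4 does not exceed 105, while the nucleus at residues 7 to 11 (LMRST) passes with an average P(E) of 110 against an average P(H) of 102.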


Chou-Fasman

• To identify a bend at residue number j, calculate the value p(t) = f(j) f(j+1) f(j+2) f(j+3),

• where the f(j+1) value for residue j+1 is used, the f(j+2) value for residue j+2 is used, and the f(j+3) value for residue j+3 is used. If (1) p(t) > 0.000075; (2) the average value of P(turn) in the tetrapeptide is > 1.00 (that is, > 100 on the scale of the parameter table); and (3) the averages for the tetrapeptide obey the inequality P(α-helix) < P(turn) > P(β-sheet), then a beta-turn is predicted at that location.
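The three turn conditions translate directly into a small predicate. The slide does not list the bend frequencies f, so the sketch takes them as an input; treating them as four position-specific tables (one per position in the tetrapeptide) follows the usual Chou-Fasman formulation and is an assumption here, as are the function and parameter names. The P values are expected on the 1.00 scale.

```python
def predicts_turn(j, f, p_turn, p_helix, p_sheet, cutoff=0.000075):
    """Apply the three beta-turn conditions to the tetrapeptide starting at index j.

    f        -- f[k][i]: bend frequency of residue i when it sits at turn position k (0..3)
    p_turn, p_helix, p_sheet -- per-residue propensities on the 1.00 scale
    """
    idx = range(j, j + 4)
    p_t = 1.0
    for k, i in enumerate(idx):
        p_t *= f[k][i]

    def avg(values):
        return sum(values[i] for i in idx) / 4

    return (p_t > cutoff
            and avg(p_turn) > 1.00
            and avg(p_helix) < avg(p_turn) > avg(p_sheet))

# Hypothetical numbers for a four-residue peptide, for illustration only.
f_high = [[0.1] * 4 for _ in range(4)]    # p(t) = 1e-4, above the cutoff
f_low = [[0.05] * 4 for _ in range(4)]    # p(t) = 6.25e-6, below the cutoff
p_turn, p_helix, p_sheet = [1.2] * 4, [0.8] * 4, [0.9] * 4
turn_high = predicts_turn(0, f_high, p_turn, p_helix, p_sheet)
turn_low = predicts_turn(0, f_low, p_turn, p_helix, p_sheet)
```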


Comparative (Homology) Modeling

• Homology modeling is based on the reasonable assumption that two homologous proteins will share very similar structures. Given the amino acid sequence of an unknown structure and the solved structure of a homologous protein, each amino acid in the solved structure is mutated, computationally, into the corresponding amino acid from the unknown structure.


Comparative (homology) Modeling


Homology Modeling

• In homology modeling the overall fold of a protein is known. The goal is to predict the detailed conformation of a protein given a homologous protein.

• Comparative ("homology") modeling approximates the 3D structure of a target protein for which only the sequence is available, provided an empirical 3D "template" structure is available with >30% sequence identity.

• Suppose you want to know the 3D structure of a target protein that has not been solved empirically by X-ray crystallography or NMR. You have only the sequence. If an empirically determined 3D structure is available for a sufficiently similar protein (50% or better sequence identity would be good), you can use software that arranges the backbone of your sequence identically to this template. This is called "comparative modeling" or "homology modeling". It is, at best, moderately accurate for the positions of alpha carbons in the 3D structure, in regions where the sequence identity is high. It is inaccurate for the details of side-chain positions, and for inserted loops with no matching sequence in the solved structure.
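Whether a template clears the identity threshold can be checked with a short helper once the target and template sequences have been aligned. This is a sketch under stated assumptions: the inputs are two rows of an existing alignment with "-" marking gaps, identity is counted over columns where neither row has a gap (other definitions of the denominator exist), and the function names are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def usable_template(aligned_target, aligned_template, min_identity=30.0):
    """True if the template exceeds the >30% identity rule of thumb from the slide."""
    return percent_identity(aligned_target, aligned_template) > min_identity
```

For instance, `percent_identity("ACDE", "ACDF")` gives 75.0, so that hypothetical template would pass both the 30% and the 50% marks mentioned above.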


SWISS-PDB Viewer


Protein Threading

• Protein threading scans the amino acid sequence of an unknown structure against a database of solved structures. In each case, a scoring function is used to assess the compatibility of the sequence with the structure, thus yielding possible three-dimensional models.

• It is possible for two proteins to have less than 25% pairwise sequence identity and yet have similar structures.

• In these cases remote homology modelling is required.


Protein Threading

• The algorithm starts with the target protein sequence aligned with SWISS-PROT protein sequences. The resulting multiple sequence alignment is converted into a 1D structural profile, so the amino acid sequence has now been translated into a 1D string of structure symbols.

• The idea now is to find a 3D fold that is similar to our structure.

• Finally, the predicted and observed 1D structure profiles are optimally aligned by a dynamic programming algorithm.

• The best hit of the alignment procedure is recorded and a 3D model is built from there.
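The alignment step can be illustrated with a basic global alignment (Needleman-Wunsch style) over strings of structure symbols. This is a simplified stand-in, not the scoring scheme of any particular threading program: the H/E/C alphabet, the match, mismatch, and gap scores, and the two-entry fold library are all hypothetical.

```python
def align_score(predicted, observed, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two 1D structure strings (dynamic programming)."""
    prev = [j * gap for j in range(len(observed) + 1)]
    for i in range(1, len(predicted) + 1):
        cur = [i * gap]
        for j in range(1, len(observed) + 1):
            s = match if predicted[i - 1] == observed[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s,   # align the two symbols
                           prev[j] + gap,     # gap in the observed string
                           cur[j - 1] + gap)) # gap in the predicted string
        prev = cur
    return prev[-1]

def best_fold(predicted, library):
    """Name of the library fold whose observed 1D string aligns best."""
    return max(library, key=lambda name: align_score(predicted, library[name]))

# Hypothetical fold library: H = helix, E = strand, C = coil.
library = {"fold_a": "CHHHHHHC", "fold_b": "CEEEECEEEEC"}
hit = best_fold("CHHHHHC", library)
```

The best-scoring fold would then serve as the template from which the 3D model is built.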


Protein Threading


Protein Threading


Ab Initio Folding

• Researchers have pursued the problem of predicting three-dimensional protein structure from the amino acid sequence alone.

• Ab initio folding is based on the global optimization of a potential energy function and in general does not use knowledge of experimentally determined protein structures. Present ab initio folding methods require intense and exhaustive computing time, which increases as a function of the length of the protein. This limitation is due in part to the assumption that the initial condition for the ab initio folding protein is the linear sequence of residues comprising the protein as encoded by the gene. It is also due to optimizing based on all-atom potential energy functions and the use of suboptimal global optimization techniques.


Prediction of transmembrane proteins

• Transmembrane proteins: the polypeptide chain actually traverses the lipid bilayer.


Why are they important

• Membrane proteins are important for several processes and functions in all biological systems
    • Receptors for neurotransmitters or hormones
    • Form ion channels
    • Serve as the respiratory chain

• Nearly 30% of known proteins are membrane bound


Why Is Prediction Of Transmembrane Regions Important?

• Bad news
    • Even though X-ray crystallography is becoming more popular, transmembrane proteins are very difficult to crystallize

• Good news
    • It is commonly accepted that topology prediction of transmembrane proteins is easier and yields higher accuracy than the prediction of the secondary structure of globular proteins


Properties Of A Membrane Protein

• Traverses the lipid bilayer once or several times

• Generally possesses sequences of hydrophobic residues

• α-helical transmembrane structure
    • Typically 17 to 25 residues in length


Brief Transmembrane Prediction History

• The cell membrane is a lipid, nonpolar layer
    • First attempts used this information to label sequences of nonpolar residues as potential transmembrane regions

• Accuracy was increased by considering the charge distribution between the segments inside the cell and those outside the cell
    • The environment inside the cell is different from that outside the cell

• Prediction using neural nets

• Prediction using HMMs (Hidden Markov Models)
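The earliest approach in this history, flagging stretches of nonpolar residues, is commonly implemented as a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 average cutoff; that specific scale and those parameter values are swapped in for illustration and are not taken from the lecture. The function name is invented.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def tm_candidates(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        avg = sum(KD[aa] for aa in seq[start:start + window]) / window
        if avg >= threshold:
            hits.append(start)
    return hits

# Made-up sequence: a 19-residue leucine stretch flanked by charged residues.
hits = tm_candidates("DDDDD" + "L" * 19 + "KKKKK")
```

Runs of consecutive hits mark a candidate membrane-spanning segment; the later methods in the list (charge distribution, neural nets, HMMs) refine this kind of signal.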


Polar and nonpolar amino acids


Neural Networks

• The network attempts to determine the next state given the current state and input. This approach is recursive because the state calculated is used in the next step as the previous state for the network.

• Neural networks were chosen as the empirical learning system on which to build for a couple of reasons. One basic reason is that networks provide a very general mechanism for representing concepts. A neural network, given the proper number of hidden units and hidden layers, can learn almost any type of concept. A second reason is that neural networks generally deal very well with noisy and incorrect data.

• Among the limitations of neural networks, one basic problem is how to select the topology of the network.


Example


Neural Network for Protein Structure Prediction


Hidden Markov Model

• Widely used in bioinformatics
    • Sequence alignment, generating profiles for protein families, and database searching

• Can be tailored to particular problems
    • Any known structural knowledge can be incorporated into the model's architecture in order to obtain a more accurate prediction

• A set of states, rules for changing states, and probabilities of state transitions


HMM Architecture


Parameters Of The Model

• Fixed-length sequences

• Helix length: min 17 and max 25 residues

• Tail length: min 1 and max 15 residues

• Train HMM


HMM

• By defining states for transmembrane helix residues and other states for residues in loops and residues on either side of the membrane, and connecting them in a cycle, we can produce a model whose architecture closely resembles the biological system we are modelling.

• If the model parameters are tuned to capture the biological reality, the highest-probability path of a protein sequence through the states should predict the true topology.
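Finding the highest-probability path through such a cyclic model is what the Viterbi algorithm does. The toy model below is a sketch only: it has four states (inside loop, helix heading out, outside loop, helix heading in), emits just two residue classes (h for hydrophobic, p for polar), and every probability in it is hypothetical rather than trained. It also ignores the minimum and maximum helix lengths a real model would enforce.

```python
import math

# Cycle: in -> M_out -> out -> M_in -> in (all probabilities hypothetical).
STATES = ["in", "M_out", "out", "M_in"]
START = {"in": 0.6, "out": 0.4}
TRANS = {
    "in": {"in": 0.9, "M_out": 0.1},
    "M_out": {"M_out": 0.9, "out": 0.1},
    "out": {"out": 0.9, "M_in": 0.1},
    "M_in": {"M_in": 0.9, "in": 0.1},
}
EMIT = {
    "in": {"h": 0.2, "p": 0.8},
    "out": {"h": 0.2, "p": 0.8},
    "M_out": {"h": 0.9, "p": 0.1},
    "M_in": {"h": 0.9, "p": 0.1},
}
LABEL = {"in": "i", "out": "o", "M_out": "M", "M_in": "M"}

def log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs):
    """Most probable state path for a non-empty string of 'h'/'p' symbols."""
    score = {s: log(START.get(s, 0)) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for sym in obs[1:]:
        new, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0)))
            new[s] = score[best] + log(TRANS[best].get(s, 0)) + log(EMIT[s][sym])
            ptr[s] = best
        score = new
        back.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):      # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]

def topology(obs):
    """Collapse the state path to i (inside), M (membrane), o (outside) labels."""
    return "".join(LABEL[s] for s in viterbi(obs))

predicted = topology("ppphhhhhhppp")
```

With these made-up parameters the polar flanks are assigned to the two sides of the membrane and the hydrophobic run to the helix state, which is the kind of topology readout described above.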


HMM Results


Problem Studied in Earlier Classes


No structures for Cellulose Synthase