Proteins Secondary Structure Predictions


Page 1: Proteins  Secondary Structure Predictions

Proteins Secondary Structure Predictions

Structural Bioinformatics

Page 2: Proteins  Secondary Structure Predictions

Structure Prediction Motivation

• Better understand protein function

• Broaden homology
  – Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)

• Explain disease
  – Explain the effect of mutations
  – Design drugs

Page 3: Proteins  Secondary Structure Predictions

Myoglobin – the first high resolution protein structure

“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”

Solved in 1958 by John Kendrew and colleagues at Cambridge University.

Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.

Page 4: Proteins  Secondary Structure Predictions

Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).

However, we can predict the secondary structure with relatively high precision.

MERFGYTRAANCEAP….

Page 5: Proteins  Secondary Structure Predictions

What do we mean by Secondary Structure?

Secondary structure elements are the building blocks of the protein structure.

Page 6: Proteins  Secondary Structure Predictions

What do we mean by Secondary Structure?

Secondary structure is usually divided into three categories:

• Alpha helix

• Beta strand (sheet)

• Anything else – turn/loop

Page 7: Proteins  Secondary Structure Predictions

Alpha Helix: Pauling (1951)

• A consecutive stretch of 5-40 amino acids (average 10).

• A right-handed spiral conformation.

• 3.6 amino acids per turn.

• Stabilized by H-bonds.

[Figure: alpha helix; one turn = 3.6 residues, 5.4 Å]

Page 8: Proteins  Secondary Structure Predictions

Beta Strand: Pauling and Corey (1951)

• Different polypeptide chains (or distant segments of the same chain) run alongside each other and are linked together by hydrogen bonds.

• Each section is called a β-strand and consists of 5-10 amino acids.

Page 9: Proteins  Secondary Structure Predictions

Beta Sheet

The strands lie adjacent to each other, forming a beta-sheet.

[Figure: antiparallel and parallel beta sheets; distance labels 3.47 Å, 4.6 Å, 3.25 Å, 4.6 Å]

Page 10: Proteins  Secondary Structure Predictions

Loops

• Connect the secondary structure elements.

• Have various lengths and shapes.

• Are located at the surface of the folded protein and therefore may play an important role in biological recognition processes.

Page 11: Proteins  Secondary Structure Predictions

Three-dimensional Tertiary Structure

Describes the packing of alpha-helices, beta-sheets and random coils with respect to each other on the level of one whole polypeptide chain.

Page 12: Proteins  Secondary Structure Predictions

[Figure: secondary and tertiary structure of RBP and globin]

Page 13: Proteins  Secondary Structure Predictions

How do the (secondary and tertiary) structures relate to the primary protein sequence?

Page 14: Proteins  Secondary Structure Predictions

• Early experiments showed that the sequence of a protein is sufficient to determine its structure (Anfinsen).

• Protein structure is more conserved than protein sequence and more closely related to function.

[Figure: sequence and structure]

Page 15: Proteins  Secondary Structure Predictions

How (CAN) Different Amino Acid Sequences Determine Similar Protein Structure?

Lesk and Chothia 1980

Page 16: Proteins  Secondary Structure Predictions

The Globin Family

Page 17: Proteins  Secondary Structure Predictions

Different sequences can result in similar structures

1ecd 2hhd

Page 18: Proteins  Secondary Structure Predictions

We can learn about the important features which determine structure and function by comparing the sequences and structures.

Page 19: Proteins  Secondary Structure Predictions

The Globin Family

Page 20: Proteins  Secondary Structure Predictions

Why is proline 36 conserved throughout the globin family?

Page 21: Proteins  Secondary Structure Predictions

Where are the gaps?

The gaps in the pairwise alignment are mapped to the loop regions

Page 22: Proteins  Secondary Structure Predictions

How are remote homologs related in terms of their structure?

retinol-binding protein (RBP)

odorant-binding protein

apolipoprotein D

β-lactoglobulin

Page 23: Proteins  Secondary Structure Predictions

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
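The Identities and Gaps figures in the header can be recomputed from the aligned rows. The following is a minimal sketch, assuming the Query and Sbjct rows from the three blocks on this slide are simply concatenated; the function name `alignment_stats` is made up for illustration. Positives are not recomputed, because that needs the substitution matrix used by PSI-BLAST.

```python
# Aligned rows copied from the three alignment blocks on this slide.
query = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
sbjct = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(row_a, row_b):
    """Count identical columns and gap columns in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    length = len(row_a)
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    # A column is a gap column when either row has a gap character.
    gaps = sum(a == "-" or b == "-" for a, b in zip(row_a, row_b))
    return identities, gaps, length


identities, gaps, length = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({100 * identities / length:.0f}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")
```

Roughly a quarter of the aligned positions are identical, which is the point of the slide: the two proteins are remote homologs at the sequence level.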

Page 24: Proteins  Secondary Structure Predictions

The Retinol Binding Protein and β-lactoglobulin

Page 25: Proteins  Secondary Structure Predictions

Structure Prediction: Motivation

• Hundreds of thousands of gene sequences translated to proteins (GenBank, SwissProt, PIR)

• Only about 50,000 solved protein structures

• Experimental methods are time consuming and not always possible

• Goal: Predict protein structure based on sequence information

Page 26: Proteins  Secondary Structure Predictions

Prediction Approaches

• Two stage

1. Primary (sequence) to secondary structure

2. Secondary to tertiary

• One stage

- Primary to tertiary structure

Page 27: Proteins  Secondary Structure Predictions

According to the most simplified model:

• In a first step, the secondary structure is predicted based on the sequence.

• The secondary structure elements are then arranged to produce the tertiary structure, i.e. the structure of a protein chain.

• For molecules which are composed of different subunits, the protein chains are arranged to form the quaternary structure.

Page 28: Proteins  Secondary Structure Predictions

Secondary Structure Prediction

• Given a primary sequence

ADSGHYRFASGFTYKKMNCTEAA

what secondary structure will it adopt?

Page 29: Proteins  Secondary Structure Predictions

Secondary Structure Prediction Methods

• Chou-Fasman / GOR Method
  – Based on amino acid frequencies

• Machine learning methods
  – PHDsec and PSIpred

• HMM (Hidden Markov Model)

Page 30: Proteins  Secondary Structure Predictions

Chou and Fasman (1974)

Name             P(a)   P(b)   P(turn)
Alanine          142     83     66
Arginine          98     93     95
Aspartic Acid    101     54    146
Asparagine        67     89    156
Cysteine          70    119    119
Glutamic Acid    151     37     74
Glutamine        111    110     98
Glycine           57     75    156
Histidine        100     87     95
Isoleucine       108    160     47
Leucine          121    130     59
Lysine           114     74    101
Methionine       145    105     60
Phenylalanine    113    138     60
Proline           57     55    152
Serine            77     75    143
Threonine         83    119     96
Tryptophan       108    137     96
Tyrosine          69    147    114
Valine           106    170     50

The propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or a beta sheet; it acts as a breaker)

Success rate of 50%
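The table can be read as a lookup: average the propensities over a short segment and see which state scores highest. The following is a rough sketch of that idea only, not the full Chou-Fasman procedure, which applies further nucleation and extension rules that this slide does not list. The names `PROPENSITY`, `mean_propensities` and `favoured_state`, and the test segments, are made up for illustration.

```python
# Propensities from the table above, keyed by one-letter code:
# (P(a), P(b), P(turn)) for each residue.
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")


def mean_propensities(segment):
    """Average P(a), P(b) and P(turn) over the residues of a segment."""
    if not segment:
        raise ValueError("segment must not be empty")
    rows = [PROPENSITY[res] for res in segment]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def favoured_state(segment):
    """Name the state with the highest mean propensity (a simplification)."""
    means = mean_propensities(segment)
    return STATES[means.index(max(means))]


for segment in ("EAML", "VIYV", "GPNG"):
    print(segment, mean_propensities(segment), favoured_state(segment))
```

Glutamate, alanine, methionine and leucine are the strongest helix formers in the table, valine and isoleucine the strongest strand formers, and glycine, proline and asparagine the strongest turn formers, so the three toy segments land in three different states.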

Page 31: Proteins  Secondary Structure Predictions

Secondary Structure Method Improvements

‘Sliding window’ approach

• Most alpha helices are ~12 residues long

• Most beta strands are ~6 residues long

• Look at all windows of size 6/12

• Calculate a score for each window. If the score is above a threshold, predict that the window is an alpha helix/beta sheet

TGTAGPOLKCHIQWMLPLKK
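The sliding-window idea can be sketched as follows. Several choices here are assumptions, not taken from the slides: the score is the mean Chou-Fasman propensity of the window, the threshold is 100, and residues missing from the table (such as the O in the example sequence) get a neutral 100. The function names are made up for illustration.

```python
# Helix and strand propensities, P(a) and P(b), from the Chou-Fasman table.
HELIX = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
         "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
         "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
STRAND = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def window_scores(sequence, propensity, size, default=100):
    """Return (start, mean propensity) for every window of the given size."""
    scores = []
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        # Unknown residues fall back to the neutral default (an assumption).
        mean = sum(propensity.get(res, default) for res in window) / size
        scores.append((start, mean))
    return scores


def predict(sequence, propensity, size, threshold, symbol):
    """Mark every residue covered by a window scoring above the threshold."""
    labels = ["-"] * len(sequence)
    for start, mean in window_scores(sequence, propensity, size):
        if mean > threshold:
            labels[start:start + size] = [symbol] * size
    return "".join(labels)


seq = "TGTAGPOLKCHIQWMLPLKK"
print(seq)
print(predict(seq, HELIX, size=12, threshold=100, symbol="H"))   # helix track
print(predict(seq, STRAND, size=6, threshold=100, symbol="B"))   # strand track
```

The two tracks are printed separately; a real predictor would also need a rule for residues that both tracks claim.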

Page 32: Proteins  Secondary Structure Predictions

Improvements since the 1980s

• Adding information from conservation in MSA

• Smarter algorithms (e.g. Machine learning, HMM).

Success rate -> 75%-80%
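The slides do not define how success is measured. A common choice, assumed here, is the percentage of residues whose predicted state (helix, strand or loop) matches the observed one, often called Q3. In the sketch below the observed string is the state string used later in the HMM slides, while the predicted string is invented for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


observed = "HHHHHHHLLLLBBBBB"   # H = helix, L = loop, B = beta strand
predicted = "HHHHHLLLLLLBBBLL"  # hypothetical prediction, for illustration only
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")
```

Here 12 of the 16 residues agree, which would count as a 75% success rate under this definition.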

Page 33: Proteins  Secondary Structure Predictions

Machine learning approach for predicting Secondary Structure (PHD, PSIpred)

Step 1: Generating a multiple sequence alignment

[Figure: the Query is searched against SwissProt, giving an alignment of the Query with several Subject sequences]

Page 34: Proteins  Secondary Structure Predictions

Step 2: Additional sequences are added using a profile. We end up with an MSA that represents the protein family.

[Figure: the Query and the seed alignment (Query plus Subject sequences) are extended into the MSA]
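A profile can be pictured as a table of residue frequencies for each alignment column. The following is a minimal sketch under that assumption; the toy alignment is invented for illustration, and real tools such as PSI-BLAST also apply sequence weighting and pseudocounts, which are left out here.

```python
from collections import Counter


def column_profile(msa):
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        # A column made only of gaps yields an empty dict.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


# Hypothetical aligned rows; "-" marks a gap.
toy_msa = ["MKVLA", "MRVLA", "MKIL-", "MKVMA"]
for position, freqs in enumerate(column_profile(toy_msa), start=1):
    print(position, freqs)
```

Columns where one residue dominates are the conserved positions of the family; variable columns spread their frequency over several residues.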

Page 35: Proteins  Secondary Structure Predictions

Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.

[Figure: Query, seed alignment and MSA feed into the machine learning approach, together with known structures]

Page 36: Proteins  Secondary Structure Predictions

HMM approach for predicting Secondary Structure (SAM)

• HMM enables us to calculate the probability of assigning a sequence to a secondary structure

TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB

p = ?

Page 37: Proteins  Secondary Structure Predictions

Table built according to a large database of known secondary structures

[Figure: table of HMM probabilities, with these annotations:]

• The probability of beginning with an α-helix

• The probability of an α-helix residue being followed by another α-helix residue

• The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15

• The probability of observing Alanine as part of a β-sheet

Page 38: Proteins  Secondary Structure Predictions

• The above table enables us to calculate the probability of assigning a secondary structure to a protein

• Example

TGQ
HHH

p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 = 0.000020995
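The product in the example can be reproduced with a few lines of code. This is a sketch under one assumption about how the six factors are meant to be read, in order: start in helix (0.45), emit T (0.041), stay in helix (0.8), emit G (0.028), stay in helix (0.8), emit Q (0.0635). The dictionary and function names are made up for illustration.

```python
# Probabilities read off the worked example; the assignment of each factor to a
# start, transition or emission term is an assumption based on their order.
START = {"H": 0.45}                      # beginning with an alpha helix
TRANSITION = {("H", "H"): 0.8}           # helix followed by helix
EMISSION = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def path_probability(sequence, states):
    """Joint probability of a residue sequence and a state path under the HMM."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equal in length")
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


print(path_probability("TGQ", "HHH"))
```

Because each extra residue multiplies in two more factors below 1, real implementations usually sum log-probabilities instead, which avoids numerical underflow on full-length proteins.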

Page 39: Proteins  Secondary Structure Predictions

Secondary structure prediction

• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al., 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN