Protein structure prediction.. Protein folds. Fold definition: two folds are similar if they have a...


Page 1: Protein structure prediction.. Protein folds. Fold definition: two folds are similar if they have a similar arrangement of SSEs (architecture) and connectivity.

Protein structure prediction.

Page 2

Protein folds.

• Fold definition: two folds are similar if they have a similar arrangement of SSEs (architecture) and connectivity (topology). Sometimes a few SSEs may be missing.

• Fold classification: structural similarity between folds is searched using structure-structure comparison algorithms.

Page 3

Protein structure prediction flowchart

[Flowchart, from D. W. Mount. Steps: Protein sequence; Database similarity search; Protein family analysis; Three-dimensional comparative modeling; Structural analysis; Predicted three-dimensional structural model; Three-dimensional structural analysis in laboratory. Yes/No decision points: Does sequence align with a protein of known structure? Relationship to known structure? Is there a predicted structure?]

Page 4

Protein structure prediction.

Prediction of three-dimensional structure of a protein from its sequence. Different approaches:

- Homology modeling (query protein has a very close homolog in the structure database).

- Fold recognition (query protein can be mapped to template protein with the existing fold).

- Ab initio prediction (query protein has a new fold).

Page 5

Homology modeling.

Aims to produce protein models with accuracy close to experimental and is used for:

- Protein structure prediction

- Drug design

- Prediction of functionally important sites (active or binding sites)

Page 6

Steps of homology modeling.

1. Template recognition & initial alignment.

2. Backbone generation.

3. Loop modeling.

4. Side-chain modeling.

5. Model optimization.

Page 7

1. Template recognition.

Recognition of similarity between the target and template.

Target – protein with unknown structure.

Template – protein with known structure.

Main difficulty – deciding which template to pick when multiple template structures are available.

Template structure can be found by searching for structures in PDB using pairwise sequence alignment methods.
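A minimal sketch of how candidate templates might be ranked once pairwise alignments are available. The function names, the template IDs and the choice of percent identity as the only criterion are assumptions made for illustration; a real search tool such as BLAST reports its own statistics.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the columns where both rows have a residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Columns containing a gap ('-') in either row are not counted.
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def pick_template(alignments: dict) -> str:
    """Return the template ID whose alignment has the highest percent identity."""
    return max(alignments, key=lambda tid: percent_identity(*alignments[tid]))


# Hypothetical alignments: template ID -> (target row, template row).
example = {
    "tmplA": ("AHYATPTTT", "AH---TPSS"),
    "tmplB": ("AHYATPTTT", "AHYATPTSS"),
}
best = pick_template(example)
```

In practice alignment length matters as well as identity, so a ranking like this is only a first filter.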

Page 8

Two zones of protein structure prediction.

[Figure: plot with axes "Alignment length" (ticks 50, 100, 150, 200) and "Sequence identity" (ticks 50, 100), divided into a homology modeling zone and a fold recognition zone.]

Page 9

2. Backbone generation.

Once the alignment between target and template is ready, copy the backbone coordinates of the template residues that are aligned.

If two aligned residues are the same, copy their side chain coordinates as well.
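The copying rule can be sketched as follows. The data layout (one dict of atom name to coordinates per template residue), the function name and the example coordinates are assumptions for illustration, not a real modeling package's API.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def build_initial_model(aligned_target, aligned_template, template_atoms):
    """Copy template coordinates onto aligned target positions.

    template_atoms: list with one dict per template residue,
    mapping atom name -> (x, y, z).
    Returns one dict per target residue; residues facing a gap get an
    empty dict and are left for loop modeling.
    """
    model = []
    t_idx = 0  # index of the next template residue
    for tgt, tpl in zip(aligned_target, aligned_template):
        if tpl == "-":        # target residue with no template partner
            model.append({})
            continue
        atoms = template_atoms[t_idx]
        t_idx += 1
        if tgt == "-":        # template residue with no target partner
            continue
        if tgt == tpl:        # identical residues: copy side chain as well
            model.append(dict(atoms))
        else:                 # different residues: backbone only
            model.append({name: xyz for name, xyz in atoms.items()
                          if name in BACKBONE_ATOMS})
    return model


# Hypothetical two-residue template (coordinates are made up).
template_atoms = [
    {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.5, 1.0, 0.0),
     "O": (2.3, 2.2, 0.0), "CB": (1.9, -1.4, 0.0)},
    {"N": (3.6, 0.6, 0.0), "CA": (4.7, 1.5, 0.0), "C": (6.0, 0.8, 0.0),
     "O": (6.1, -0.4, 0.0), "CB": (4.8, 2.4, 1.2)},
]
model = build_initial_model("AHY", "A-F", template_atoms)
```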

Page 10

3. Insertions and deletions.

[Alignment figure, with labels "insertion" and "deletion":]

AHYATPTTT
AH---TPSS

Insertions and deletions occur mostly between secondary structures, in the loop regions. Loop conformations are difficult to predict.

Approaches to loop modeling:

- Knowledge-based: search the PDB for loops with known structures.

- Energy-based: an energy function is used to evaluate the quality of a loop; energy minimization or Monte Carlo.
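As a small illustration, the segments that need loop modeling can be located by scanning each row of the alignment for runs of gap characters. The helper name is hypothetical, and the two rows are the ones shown on the slide.

```python
def gap_segments(aligned_seq: str):
    """Return (start, end) column ranges, end exclusive, of gap runs in one row."""
    segments, start = [], None
    for i, ch in enumerate(aligned_seq):
        if ch == "-" and start is None:
            start = i                      # a gap run opens here
        elif ch != "-" and start is not None:
            segments.append((start, i))    # the gap run closed before column i
            start = None
    if start is not None:                  # gap run reaching the end of the row
        segments.append((start, len(aligned_seq)))
    return segments


row1 = "AHYATPTTT"
row2 = "AH---TPSS"
gaps_row1 = gap_segments(row1)
gaps_row2 = gap_segments(row2)
```

A gap run in one row means the other row has residues there with no partner to copy coordinates from, so that stretch has to be built by one of the loop-modeling approaches above.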

Page 11

4. Side chain modeling.

Side chain conformations – rotamers. In similar proteins, side chains have similar conformations.

If % identity is high, side chain conformations can be copied from template to target.

If % identity is not very high, side chains are modeled using libraries of rotamers, and the different rotamers are scored with energy functions.

Problem: side chain conformations depend on the backbone conformation, which is predicted, not real.

[Figure: three candidate rotamers with energies E1, E2, E3.]

E = min(E1, E2, E3)
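The selection rule E = min(E1, E2, E3) amounts to picking the lowest-energy rotamer. A minimal sketch, with hypothetical rotamer names and made-up energies standing in for the output of a real energy function:

```python
def best_rotamer(rotamer_energies: dict) -> tuple:
    """Return (rotamer_name, energy) for the lowest-energy rotamer."""
    name = min(rotamer_energies, key=rotamer_energies.get)
    return name, rotamer_energies[name]


# Hypothetical energies (arbitrary units) for three rotamers of one side chain.
energies = {"rotamer1": 2.4, "rotamer2": -1.1, "rotamer3": 0.3}
choice = best_rotamer(energies)
```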

Page 12

5. Model optimization.

Energy optimization of entire structure.

Since the conformation of the backbone depends on the conformations of the side chains and vice versa, an iterative approach is used: predict rotamers, shift the backbone, and repeat.

Page 13
Page 14

Classwork: Homology modeling.

- Go to NCBI Entrez, search for gi461699

- Run a BLAST search against PDB

- Repeat the same for gi60494508

- Compare the results

Page 15

Fold recognition.

Unsolved problem: direct prediction of protein structure from physico-chemical principles.

Solved problem: recognizing which of the known folds is similar to the fold of an unknown protein.

Fold recognition is based on the following observations/assumptions:

- The overall number of different protein folds is limited (1000-3000 folds).

- The native protein structure is in its ground state (minimum energy).

Page 16

Fold recognition.

Goal: to find protein with known structure which best matches a given sequence.

Since similarity between target and the closest template is not high, pairwise sequence alignment methods fail.

Solution: threading – sequence-structure alignment method.

Page 17

Threading – method for structure prediction.

Sequence-structure alignment: the target sequence is compared to all structural templates from the database.

Requires:

- Alignment method (dynamic programming, Monte Carlo, …)

- Scoring function, which yields a relative score for each alternative alignment

Page 18

[Figure: the target sequence is aligned to each of the structural templates, giving Score1, Score2 and Score3.]

Protein structure prediction: target sequence is compared to structures using sequence-structure alignment

Concept of threading: D. Jones et al, 1993

Page 19

[Figure: the target sequence is aligned to each of the structural templates, giving Score1, Score2 and Score3. Score3 > Score2 > Score1; the best-scoring template gives the structural model of the target.]

Protein structure prediction: target sequence is compared to structures using sequence-structure alignment

Page 20

Scoring function for threading.

• Contact-based scoring function depends on amino acid types of two residues and distance between them.

• Sequence-sequence alignment scoring function does not depend on the distance between two residues.

• If distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.
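The contact definition can be sketched as below. Which atom represents each residue (for example C-alpha or C-beta) is not stated on the slide, so the sketch assumes one representative point per residue; the function name and the coordinates are illustrative.

```python
import math


def contact_pairs(coords, cutoff=8.0, min_separation=2):
    """Return pairs (i, j), i < j, of non-adjacent residues closer than cutoff.

    coords: list of (x, y, z) in Angstroms, one representative point per residue.
    min_separation=2 excludes sequence neighbours (|i - j| = 1).
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical coordinates in Angstroms (0-based residue indices).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
contacts = contact_pairs(coords)
```

The resulting list of pairs is the same information as a contact matrix, stored sparsely.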

Page 21

Scoring function for threading.

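$$S = \sum_{i,j}^{N} w(a_i, a_j); \qquad S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$$

[Figure: template with residues Ala, Ile, Tyr and Trp in contact.]

“w” is calculated from the frequency of amino acid contacts in PDB; a_i – amino acid type of the target sequence aligned with position “i” of the template; N – number of contacts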

Page 22

Classwork: calculate the score for the target sequence “ATPIIGGLPY” aligned to the template structure defined by the contact matrix.

1 2 3 4 5 6 7 8 9 10

1 * * *

2

3 *

4 *

5 * *

6 *

7 *

8 *

9

10 * *

       A     T     P     Y     I     G     L

A   -0.2  -0.1   0    -0.1   0.5  -0.2   0.2

T          0.3  -0.1  -0.2  -0.3   0.1   0

P               -0.2  -0.4  -0.1   0.1  -0.2

Y                     -0.4  -0.2  -0.1  -0.2

I                            0.3   0.2   0.4

G                                  0.4   0.2

L                                        0.3
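A sketch of the score calculation, under two stated assumptions: the weight table is read as the upper triangle of a symmetric matrix (each row starting at its own diagonal entry), and the contact list below is hypothetical, since it is not the one encoded in the classwork contact matrix. The names ORDER, ROWS, W and threading_score are illustrative.

```python
# Contact weights w, read as the upper triangle of a symmetric matrix
# (assumption: row X lists w(X, X), then w(X, next residue type), ...).
ORDER = "ATPYIGL"
ROWS = [
    [-0.2, -0.1, 0.0, -0.1, 0.5, -0.2, 0.2],  # A
    [0.3, -0.1, -0.2, -0.3, 0.1, 0.0],        # T
    [-0.2, -0.4, -0.1, 0.1, -0.2],            # P
    [-0.4, -0.2, -0.1, -0.2],                 # Y
    [0.3, 0.2, 0.4],                          # I
    [0.4, 0.2],                               # G
    [0.3],                                    # L
]
W = {}
for r, row in enumerate(ROWS):
    for k, value in enumerate(row):
        a, b = ORDER[r], ORDER[r + k]
        W[(a, b)] = W[(b, a)] = value  # store both orders: w is symmetric


def threading_score(sequence, contacts):
    """S = sum of w(a_i, a_j) over template contacts (i, j), 1-based positions."""
    return sum(W[(sequence[i - 1], sequence[j - 1])] for i, j in contacts)


# Hypothetical contacts, NOT those of the classwork matrix.
demo_contacts = [(1, 4), (2, 10), (3, 7)]
score = threading_score("ATPIIGGLPY", demo_contacts)
```

To solve the classwork, replace demo_contacts with the pairs marked in the contact matrix, counting each symmetric pair once.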

Page 23

Optimize the sum of residue-residue contact potentials by a Monte Carlo alignment algorithm.

Page 24

Evaluation of quality of structural model

• Correct bond length and bond angles

• Correct placement of functionally important sites

• Prediction of global topology, not partial alignment (minimum number of gaps)

>> 3.8 Angstroms
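One simple geometric check, assuming the 3.8 Å label refers to the usual distance between consecutive C-alpha atoms: flag places in a model where that distance is far from 3.8 Å. The tolerance of 0.5 Å, the function name and the coordinates are arbitrary choices for this sketch.

```python
import math


def chain_breaks(ca_coords, expected=3.8, tolerance=0.5):
    """Return indices i where the distance between consecutive C-alpha atoms
    i and i+1 deviates from the expected value by more than the tolerance."""
    return [i for i in range(len(ca_coords) - 1)
            if abs(math.dist(ca_coords[i], ca_coords[i + 1]) - expected) > tolerance]


# Hypothetical C-alpha trace with a 10 Angstrom jump between residues 1 and 2.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (13.8, 0.0, 0.0), (17.6, 0.0, 0.0)]
breaks = chain_breaks(trace)
```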

Page 25

Success and limitations of structure prediction

Success:

• Accuracy scores almost doubled from CASP1 to CASP6, possibly because of the growth in database size

• Models of small targets are very accurate

Limitations:

• Models of large and remotely related proteins are not very accurate

• Domain boundaries are difficult to define

• Models often do not provide details for functional annotation

Adapted from Kryshtafovych et al 2005

Page 26

GenThreader (http://bioinf.cs.ucl.ac.uk/psipred)

1. Predicts secondary structures for target sequence.

2. Makes sequence profiles (PSSMs) for each template sequence.

3. Uses threading scoring function to find the best matching profile.

Page 27

Protein-protein interactions.

Page 28

Common properties of protein-protein interactions.

• The majority of protein complexes have a buried surface area of ~1600±400 Å² (“standard size” patch).

• Complexes of “standard size” do not involve large conformational changes while large complexes do.

• Protein recognition site consists of a completely buried core and a partially accessible rim.

• Trp and Tyr are abundant in the core, whereas Ser, Thr, Lys and Glu are particularly disfavored.

[Figure: interface between the top molecule and the bottom molecule, showing the core and the rim.]

Page 29

Different types of protein-protein interactions.

• Permanent and transient.

• External are between different chains; internal are within the same chain.

• Homo- and hetero-oligomers depending on the similarity between interacting subunits.

• Interface type can be predicted from amino acid composition (Ofran and Rost 2003).

Page 30

Experimental methods

Page 31

Verification of experimental protein-protein interactions.

• Protein localization method.

• Expression profile reliability method.

• Paralogous verification method.

Page 32

Protein localization method.

Sprinzak, Sattath, Margalit, J Mol Biol, 2003

A – A3: Y2H; B: physical methods; C: genetics; E: immunological

True positives:

- Proteins which are localized in the same cellular compartment

- Proteins with a common cellular role

Page 33

Deane, C. M. (2002) Mol. Cell. Proteomics 1: 349-356

Expression profile reliability method.

Page 34

Deane, C. M. (2002) Mol. Cell. Proteomics 1: 349-356

Paralogous verification method.

The PVM method is based on the observation that if two proteins interact, their paralogs are likely to interact as well. It calculates the number of interactions between two families of paralogous proteins.
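The counting step can be sketched as follows. The protein names, the families and the function name are hypothetical, and the sketch only counts known interactions between the two families; it does not reproduce the scoring or thresholds of the published method.

```python
def paralog_interaction_count(family_a, family_b, known_interactions):
    """Count known interactions with one partner in each paralog family."""
    # Interactions are unordered, so store each one as a frozenset.
    known = {frozenset(pair) for pair in known_interactions}
    candidate_pairs = {frozenset((a, b))
                       for a in family_a for b in family_b if a != b}
    return len(candidate_pairs & known)


# Hypothetical paralog families and interaction list.
family_a = {"A1", "A2"}
family_b = {"B1", "B2", "B3"}
known = [("A1", "B1"), ("B2", "A2"), ("A1", "C9")]
support = paralog_interaction_count(family_a, family_b, known)
```

A higher count means more paralog pairs support the interaction under test.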

Page 35

Interaction databases

• Experiment (E)

• Structure detail (S)

• Predicted
– Physical (P)
– Functional (F)

• Curated (C)

• Homology modeling (H)

• *IMEx consortium

Page 36

Protein interaction databases

• Protein-protein interaction databases

• Domain-domain interaction databases

Page 37

DIP database

• Documents protein-protein interactions from experiment
– Y2H, protein microarrays, TAP/MS, PDB

• 55,733 interactions between 19,053 proteins from 110 organisms.

Organism      # proteins   # interactions

Fruit fly     7052         20,988

H. pylori     710          1425

Human         916          1407

E. coli       1831         7408

C. elegans    2638         4030

Yeast         4921         18,225

Others        985          401

Page 38

DIP database

Duan et al., Mol Cell Proteomics, 2002

• Assess quality
– Via proteins: PVM, EPR
– Via domains: DPV

• Search by BLAST or identifiers / text

Page 39

BIND database

• Records experimental interaction data

• 83,517 protein-protein interactions

• 204,468 total interactions

• Includes small molecules, nucleic acids (NAs), complexes

Alfarano et al., Nucleic Acids Res, 2005

Page 40

Classwork.

• Go to DIP webpage (http://dip.doe-mbi.ucla.edu)

• Retrieve all interactions for cytochrome C, tubulin, RNA-polymerase from yeast

• How many of them are confirmed by several experimental methods?

Page 41

Protein interaction databases

• Protein-protein interaction databases

• Domain-domain interaction databases

Page 42

InterDom database

• Predicts domain interactions (~30000) from PPIs

• Data sources:
– Domain fusions
– PPI from DIP
– Protein complexes
– Literature

• Scores interactions

Ng et al., Nucleic Acids Res, 2003

Page 43

Pibase database

• Records domain interactions from PDB and PQS

• Domains defined with SCOP and CATH

• Domains, within a chain or between chains, that lie within 6 Å of each other are considered interacting domains

• From interacting domain pairs, create a list of interfaces with buried solvent accessible area > 300 Å²

Page 44

Classwork.

• Go to Pibase website http://salilab.org/pibase

• Select largest structural complexes, 1k73, 1i6h

• Compare two complexes in terms of the number of interacting domains, #interactions per node

Page 45

NCBI CBM database

• To retrieve interactions:
– Record interactions
– Use VAST structural alignments to compare binding surfaces
– Study recurring domain-domain interactions

• CBM – database of interacting structural domains exhibiting Conserved Binding Modes

Shoemaker et al., Protein Sci, 2006.

Page 46

Definition of CBM

• Interacting domain pair: two domains with at least 5 residue-residue contacts between them (contact – distance of less than 8 Å)

• Structure-structure alignments are made between all proteins corresponding to a given pair of interacting domains

• Interfaces are clustered by similarity: those with >50% equivalently aligned positions are clustered together

• Clusters with more than 2 entries define a conserved binding mode.
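The first and last criteria can be sketched as below. The sketch assumes one representative point per residue, uses made-up coordinates and cluster labels, and leaves out the structure-structure alignment and clustering steps, which need a tool such as VAST.

```python
import math


def count_contacts(domain_a, domain_b, cutoff=8.0):
    """Number of residue pairs, one residue from each domain, closer than cutoff."""
    return sum(math.dist(p, q) < cutoff for p in domain_a for q in domain_b)


def is_interacting_pair(domain_a, domain_b, min_contacts=5, cutoff=8.0):
    """Apply the 'at least 5 contacts within 8 Angstroms' criterion."""
    return count_contacts(domain_a, domain_b, cutoff) >= min_contacts


def conserved_binding_modes(clusters):
    """Keep only interface clusters with more than 2 entries."""
    return [cluster for cluster in clusters if len(cluster) > 2]


# Hypothetical residue positions (Angstroms) for two domains.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
domain_b = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
n_contacts = count_contacts(domain_a, domain_b)
interacting = is_interacting_pair(domain_a, domain_b)

# Hypothetical clusters of interface labels.
modes = conserved_binding_modes([["if1", "if2", "if3"], ["if4", "if5"]])
```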

Page 47

Number of interacting pairs and binding modes

• 833 conserved interaction types

• 1,798 total domain interaction types

• Up to 24 CBMs per interaction type

• Classify complicated domain pairs by CBMs

• Globin example:
– 630 pairs
– 2 CBMs account for majority

CBM   Structures   Species

1     154          Jawed vertebrates

2     112          Jawed vertebrates

3     17           Clam, earthworm

4     4            Lamprey

5     4            V. stercoraria

6     2            Rice, soybeans

7     2            Human

8     2            Lamprey

Page 48

Classwork.

• Retrieve structures 1GY3, 1E9H, 1OL2

• Examine all interactions within and between chains/domains.

• How many CBMs do you find?