Structural alignment [email protected]. Protein structure Every protein is defined by a unique...

Page 1: Structural alignment marian@xray.bmc.uu.se. Protein structure Every protein is defined by a unique sequence (primary structure) that folds into a unique.

Structural alignment

marian@xray.bmc.uu.se

Protein structure

Every protein is defined by a unique sequence (primary structure) that folds into a unique shape (tertiary or three-dimensional structure).

However, proteins with similar sequences adopt very similar structures.

[Figure: structures of cyclophilin from B. malayi and cyclophilin A from H. sapiens]

Why structural alignment?

We have sequence alignment (Clustal…):

KTHLCV
KSHA-V

It gives us an idea of the correspondence between the amino acids of two (or more) proteins.

That enables us to infer information about the function and evolution of the protein.

If the sequences are similar enough!

What is the twilight zone?

Sequence alignment unambiguously distinguishes between protein pairs of similar and non-similar structure only when the pairwise sequence identity is high.

High sequence identity roughly means over 40 %.

The signal gets blurred in the twilight zone of 20-35 % sequence identity.
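As a rough illustration of these thresholds, the sketch below computes the percent identity of two already aligned sequences and labels the result. It assumes identity is counted only over columns where neither sequence has a gap (other conventions divide by the full alignment length), and the function names and the "unclassified" label are hypothetical. The six-residue toy alignment from the earlier slide is used only to exercise the code; it is far too short to be meaningful.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gapped columns are ignored. Dividing by the full
    alignment length is an equally common convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def identity_zone(identity: float) -> str:
    """Label an identity value using the rough thresholds from the slide."""
    if identity > 40.0:
        return "high"
    if 20.0 <= identity <= 35.0:
        return "twilight"
    # The slide does not name the ranges 35-40 % or below 20 %.
    return "unclassified"


pid = percent_identity("KTHLCV", "KSHA-V")
print(f"{pid:.1f} % identity -> {identity_zone(pid)}")
```

Three of the five ungapped columns match (K, H and V), so the toy pair scores 60 % under this convention.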

More of the twilight zone

More than 90 % of sequence pairs with a sequence identity lower than 25 % have different structures.

The significance of a sequence alignment is length dependent.

The longer the alignment, the lower the identity required for it to be called significant. Nevertheless, the threshold converges to 25 % for alignments longer than 80 amino acids.

The 'more similar than identical' rule can reduce the number of false positives.

Using intermediate sequences to find links between more distant families can also reduce the number of false positives.

How far can the sequence identity drop?

Average sequence identity of random alignments - 5.6 %

Average sequence identity of remote homologues - 8.5 %

How does it work?

From http://www.biochem.unizh.ch/antibody/Introduction/Institutsseminar97/source/slide2.htm

Numbers

Given an average protein length of 300 amino acids, there are 20^300 possible sequences of that length, more than the number of atoms in the universe.

In reality, just a few hundred thousand sequences are known.

It is believed that the number of basic protein folds is between 1500 and 5000.
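The size of sequence space quoted above follows from 20 residue types at each of 300 positions, i.e. 20 to the power 300. The short check below makes the magnitude concrete. The figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate that is assumed here, not taken from the slides.

```python
import math

RESIDUE_TYPES = 20            # standard amino acids
PROTEIN_LENGTH = 300          # average length used on the slide
ATOMS_IN_UNIVERSE_LOG10 = 80  # assumed order-of-magnitude estimate

# Python integers are arbitrary precision, so this is exact.
n_sequences = RESIDUE_TYPES ** PROTEIN_LENGTH
digits = len(str(n_sequences))
log10_sequences = PROTEIN_LENGTH * math.log10(RESIDUE_TYPES)

print(f"20^300 has {digits} decimal digits")
print(f"log10(20^300) is about {log10_sequences:.1f}")
print("exceeds the atom estimate:", log10_sequences > ATOMS_IN_UNIVERSE_LOG10)
```

Against a few hundred thousand known sequences and a few thousand folds, this shows how sparsely nature samples the space of possible sequences.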

Structural alignment because:

Structures are better conserved than sequences

Structural alignment can imply a functional similarity that is not detectable from a sequence alignment.

Might help to improve sequence alignment when structures are available (phylogenetic studies, homology modeling).

Will improve sequence alignment methods (use of structural alignments’ substitution matrices, gap penalties).

Will improve sequence prediction methods

Sequence versus structural alignment

  1   2   3   4   5   6   7   8   9  10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

Material

Is it difficult to make a structural alignment?

Yes, it is.

Structural alignment is an NP-hard (non-deterministic polynomial-time hard) problem.

In other words, no known algorithm solves it exactly in a tractable amount of time.

Even if one did, the result would be correct from a technical point of view, but not necessarily from a biological point of view.

General solution

Use a heuristic approach:

1. Represent the proteins A and B in some coordinate independent space

2. Compare A and B

3. Optimize the alignment between A and B (e.g. minimize R.M.S.d.)

4. Measure the statistical significance of the alignment against some random set of structure comparisons
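Steps 3 and 4 can be sketched in a few lines. The code below is a minimal illustration, not the method of any particular server. It assumes the residue equivalences are already known and the two coordinate sets are already superposed, so it only evaluates the r.m.s.d. rather than minimising it (a real method would also optimise the rotation and translation). Significance is expressed as a Z-score against a background of scores from unrelated comparisons. All function names and all numbers are invented for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between equivalent, superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def z_score(score: float, background: Sequence[float]) -> float:
    """Standard deviations between a score and the background mean."""
    spread = pstdev(background)
    if spread == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean(background)) / spread


# Toy C-alpha traces: B is A shifted by 1 A along z (invented data).
trace_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
observed = rmsd(trace_a, trace_b)

# Invented background of r.m.s.d. values from unrelated structure pairs.
random_rmsds = [4.0, 5.0, 6.0]
z = z_score(observed, random_rmsds)

print(f"r.m.s.d. = {observed:.2f}, Z = {z:.2f}")
```

Because a lower r.m.s.d. means a better fit, a strongly negative Z-score indicates that the observed alignment is better than the random comparisons.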

“..in some coordinate independent space…”

Make the problem easier by:

- comparing only distance matrices of atoms

- comparing secondary structure elements (SSE)

- comparing cartoons

- comparing vectors of SSE

- combination of mentioned methods

- ….
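The first option in the list, distance matrices, shows what "coordinate independent" means in practice: the matrix of pairwise atom distances does not change when a structure is rotated or translated, so two structures can be compared without superposing them first. The sketch below checks this on four invented points. The helper names are hypothetical, and the rotation is restricted to the z axis to keep the example short.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """All pairwise distances between the given atoms."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def rotate_z_and_shift(coords: Sequence[Point], angle: float,
                       shift: Point) -> List[Point]:
    """Rotate about the z axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0],
             s * x + c * y + shift[1],
             z + shift[2]) for x, y, z in coords]


def max_abs_difference(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Four invented atom positions standing in for C-alpha atoms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
moved = rotate_z_and_shift(coords, math.radians(40.0), (10.0, -5.0, 2.0))

diff = max_abs_difference(distance_matrix(coords), distance_matrix(moved))
print("distance matrix unchanged by the rigid move:", diff < 1e-9)
```

The price of this convenience is that a distance matrix cannot distinguish a structure from its mirror image.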

None of the methods is guaranteed to find the closest structure, and two methods can disagree at every amino acid position.

Nevertheless, they can still provide valuable insight into the history of the protein and give hints concerning its function.

Methods for fold comparison

Server   Location                                                    Method
CE       http://cl.sdsc.edu                                          Extension of optimal path [1]
DALI     http://www2.ebi.ac.uk/dali                                  Distance-matrix alignment [2]
DEJAVU   http://portray.bmc.uu.se/cgi-bin/dennis/dejavu.pl           SSE alignment with Cα atom optimisation [3]
LOCK     http://gene.stanford.edu/LOCK/                              Absolute orientation of corresponding points [4]
MATRAS   http://bongo.lab.nig.ac.jp/~takawaba/Matras.html            Markov transition model of evolution [5]
PRIDE    http://hydra.icgeb.trieste.it/pride/                        Cα-Cα atom distances [6]
SSM      http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html              Graph matching algorithm
TOP      http://bioinfo1.mbfys.lu.se/TOP                             SSE alignment [7]
TOPS     http://tops.ebi.ac.uk/tops/compare1.html                    TOPS-diagram alignment [8]
TOPSCAN  http://www.rubic.rdg.ac.uk/~andrew/bioinf.org/topscan       Secondary topology-string alignment [9]
VAST     http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html  Vector alignment [10]

Protein structure classification

If you want to know which structures are similar to a known structure, these systems might help:

A) Manual - SCOP

B) Semi-automatic - CATH

C) Automatic - FSSP

CATH

C (class) - secondary structure composition

A (architecture) - overall shape, secondary structure elements orientation

T (topology) - overall shape, secondary structure elements orientation + connectivity

H (homologous superfamily) -

- sequence identity >= 35 %, 60 % of larger structure equivalent to smaller

- SSAP score >= 80.0 and sequence identity >= 20 %, 60 % of larger structure equivalent to smaller

- SSAP score >= 80.0, 60 % of larger structure equivalent to smaller, and domains which have related functions
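The H-level rule can be read as three alternative criteria that share one coverage requirement. The sketch below encodes that reading as a single predicate. Treating the criteria as alternatives (logical OR) is an assumption about how the flattened slide text should be parsed. The function and parameter names are hypothetical, and nothing here reproduces CATH's actual software.

```python
def same_homologous_superfamily(seq_identity: float,
                                ssap_score: float,
                                overlap: float,
                                related_function: bool) -> bool:
    """Apply the slide's H-level criteria to one pair of domains.

    seq_identity     -- pairwise sequence identity in percent
    ssap_score       -- SSAP structural similarity score
    overlap          -- fraction (0-1) of the larger structure that is
                        equivalent to the smaller one
    related_function -- whether the domains have related functions
    """
    # Every criterion requires 60 % of the larger structure to be covered.
    if overlap < 0.60:
        return False
    if seq_identity >= 35.0:
        return True
    if ssap_score >= 80.0 and seq_identity >= 20.0:
        return True
    if ssap_score >= 80.0 and related_function:
        return True
    return False


# A pair deep in the twilight zone still qualifies when the structural
# score is high and the functions are related (invented values).
print(same_homologous_superfamily(12.0, 85.0, 0.75, related_function=True))
```

The third branch is the one that matters for remote homologues: it lets structural similarity plus functional evidence stand in for the missing sequence signal.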

S (sequence families) - clustering based on the sequence identity level

Summary

Structural alignment can help with protein annotations even when the sequence similarity is not significant.

The sequence identity of two proteins with similar structures can be lower than 10 %; the number of folds is limited.

Recent progress in protein structure determination increases the usefulness of structural alignment.

Structural alignment is a difficult problem that is solved by heuristic methods.

These methods simplify the problem by moving from 3D space to 2D space, sacrificing the optimal result for speed.

Summary II

Different methods can provide completely different alignments.

In our results, CE, DALI, MATRAS and VAST were the best servers for finding structural relatives.

A few structural classification systems have been developed (CATH, FSSP, SCOP); they provide a hierarchical classification of protein structures and make it possible to infer functional and evolutionary relationships between proteins.