Multiple Motifs

Multiple Motifs, Charles Yan, Spring 2006


Transcript of Multiple Motifs

Page 1: Multiple Motifs

Multiple Motifs

Charles Yan, Spring 2006

Page 2: Multiple Motifs

Multiple Motifs

Page 3: Multiple Motifs

From Single Motif to Multiple Motifs

A single motif is not sufficient to discriminate a protein family.

Multiple motifs have stronger discriminating power.

Page 4: Multiple Motifs

Multiple Motifs

Protein function prediction using multiple motifs

Each protein family is characterized by a set of motifs (instead of a single one).

If a protein contains a set of motifs, it probably belongs to the family that the set of motifs corresponds to.
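The rule above can be sketched in a few lines of Python. This is only an illustration: the family names, the regular-expression motifs, and the helper names `motifs_matched` and `predict_family` are invented for the example and are not taken from any real database. A family is reported only when every motif in its set is found; the order of the motifs along the sequence is ignored here.

```python
import re

# Hypothetical motif sets: the family names and patterns are invented
# for illustration and are not real fingerprints.
FAMILY_MOTIFS = {
    "family_A": [r"G.GK[ST]", r"DE[AV]H", r"W..F"],
    "family_B": [r"C..C", r"H...H"],
}


def motifs_matched(sequence, patterns):
    """Count how many of the patterns occur at least once in the sequence."""
    return sum(1 for pattern in patterns if re.search(pattern, sequence))


def predict_family(sequence, family_motifs=FAMILY_MOTIFS):
    """Return the families whose complete motif set is found in the sequence."""
    return [
        name
        for name, patterns in family_motifs.items()
        if motifs_matched(sequence, patterns) == len(patterns)
    ]


seq = "MKTGAGKSLLDEAHRRWLLFPQ"
print(predict_family(seq))
```

A sequence that contains only one of the three family_A motifs, such as "MKTGAGKSLL", is not assigned to any family, which is the extra discriminating power that a motif set provides.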

Page 5: Multiple Motifs

PRINTS

PRINTS (http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/) is a database of protein fingerprints.

A fingerprint is a group of conserved motifs used to characterize a protein family.

ftp.bioinf.man.ac.uk/pub/prints

PRINTS is now maintained at the University of Manchester.

PRINTS version 38.0 (16 June 2005): 1900 fingerprints, encoding 11,435 single motifs.

Page 6: Multiple Motifs

PRINTS

Each fingerprint has been defined and iteratively refined using a SWISS-PROT/TrEMBL composite database.

Two types of fingerprint are represented in the database: simple or composite, depending on their complexity. Simple fingerprints are essentially single motifs, while composite fingerprints encode multiple motifs. The bulk of the database entries are of the latter type because discrimination power is greater for multi-component searches.

Usually the motifs do not overlap but are separated along a sequence, though they may be contiguous in 3D space.

Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can. Their full diagnostic potency derives from the mutual context provided by motif neighbors.

Page 7: Multiple Motifs

PRINTS

A motif is a conserved element corresponding to a region whose function or structure is known. It is likely to be predictive of any subsequent occurrence of such a structural/functional region in any other protein sequence.

A motif is represented as a conserved alignment of multiple sequences.

A fingerprint is a set of motifs used to predict the occurrence of similar motifs, either in an individual sequence or in a database.
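To make "a motif is a conserved alignment" concrete, the sketch below turns a small ungapped alignment into per-column residue counts and slides it along a sequence. The alignment, the function names, and the unweighted count-sum score are assumptions chosen for illustration; they are not claimed to be the scoring scheme used by PRINTS.

```python
from collections import Counter

# Hypothetical aligned motif: four equal-length segments, invented for illustration.
MOTIF_ALIGNMENT = ["GAGKS", "GSGKT", "GAGKT", "GTGKS"]


def frequency_matrix(alignment):
    """Per-column residue counts for an ungapped alignment."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned segments must have the same length")
    return [Counter(row[i] for row in alignment) for i in range(width)]


def best_window(sequence, matrix):
    """Slide the matrix along the sequence and return (score, start) of the
    best-scoring window (0-based start; the first best window wins ties).

    The score is the sum of the column counts of the observed residues,
    a simple unweighted scheme assumed for this sketch.
    """
    width = len(matrix)
    best_score, best_start = -1, -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[residue] for column, residue in zip(matrix, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


matrix = frequency_matrix(MOTIF_ALIGNMENT)
print(best_window("MKTGAGKSLL", matrix))
```

Fully conserved columns (here G, G and K) contribute the most to the score, so a window that preserves them stands out from the background.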

Page 8: Multiple Motifs

PRINTS

Page 9: Multiple Motifs

PRINTS

The starting point is a multiple sequence alignment of a small number of sequences.

Once a motif, or set of motifs, has been identified, the conserved regions are excised in the form of local alignments.

The motifs are used to scan the database.

Only those sequences that match all motifs are regarded as true matches.

The additional sequence data from the new true set is then used to generate another set of aligned motifs, and the database is searched again.

The process is repeated until it converges.
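The iterative procedure on this slide can be sketched as a loop. Everything specific here is an assumption made for illustration: a motif is held as a set of equal-length segments, a sequence "matches" a motif when some window reaches a fractional identity threshold against any segment, and convergence means the true set stops changing. The toy database and the names `identity`, `best_hit`, and `refine` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit(sequence, segments):
    """Best (identity, window) of a sequence against any segment of one motif."""
    width = len(next(iter(segments)))
    best = (0.0, None)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = max(identity(window, segment) for segment in segments)
        if score > best[0]:
            best = (score, window)
    return best


def refine(fingerprint, database, threshold=0.75, max_rounds=20):
    """Scan, keep sequences matching ALL motifs, rebuild the motifs, repeat.

    fingerprint: list of sets of segments; database: dict of name -> sequence.
    """
    true_set = set()
    for _ in range(max_rounds):
        hits = {}
        for name, sequence in database.items():
            results = [best_hit(sequence, motif) for motif in fingerprint]
            if all(score >= threshold for score, _ in results):
                hits[name] = [window for _, window in results]
        if set(hits) == true_set:
            break  # converged: no change in the true set
        true_set = set(hits)
        # Regenerate each motif from the matching windows of the new true set.
        fingerprint = [
            {windows[i] for windows in hits.values()}
            for i in range(len(fingerprint))
        ]
    return true_set, fingerprint


seed = [{"GAGKS"}, {"DEAH"}]
toy_db = {
    "p1": "MKGAGKSLLDEAHQ",
    "p2": "MMGAGKTAADEVHW",
    "p3": "QQGSGKTAADEVHW",  # only found after the motifs have been refined
    "p4": "MKGAGKSLLPPPPQ",  # matches the first motif only, so it is rejected
}
members, refined = refine(seed, toy_db)
print(sorted(members))
```

In this toy run, p3 is missed by the seed motifs but is picked up in the second round, once the segments contributed by p2 have been added to the motifs.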

Page 10: Multiple Motifs

PRINTS

Page 11: Multiple Motifs

PRINTS

Page 12: Multiple Motifs

PRINTS

a) General field

Page 13: Multiple Motifs

PRINTS

b) Summary field

A good fingerprint should exhibit a clear discrimination cut-off, i.e. it shows all true positives matching all n motifs, perhaps some noise, and few or no matches at intermediate positions of the summary table.
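One way to picture the summary table is as a count of how many sequences match exactly k of the n motifs. The sketch below builds such a table and applies a simple cut-off check. The function names, the `noise_level` and `tolerance` parameters, and the example counts are assumptions for illustration, not the criteria PRINTS itself applies.

```python
from collections import Counter


def summary_table(motifs_matched, n):
    """Map k -> number of sequences matching exactly k of the n motifs."""
    counts = Counter(motifs_matched.values())
    return {k: counts.get(k, 0) for k in range(n, 0, -1)}


def has_clear_cutoff(table, n, noise_level=2, tolerance=0):
    """True if some sequences match all n motifs and at most `tolerance`
    sequences fall strictly between the noise level and n.

    noise_level and tolerance are hypothetical knobs chosen for this sketch.
    """
    intermediate = sum(count for k, count in table.items() if noise_level < k < n)
    return table.get(n, 0) > 0 and intermediate <= tolerance


# Hypothetical scan result: sequence name -> number of motifs matched (n = 5).
good = {"p1": 5, "p2": 5, "p3": 5, "p4": 2, "p5": 1}
table = summary_table(good, 5)
print(table, has_clear_cutoff(table, 5))
```

Here three sequences match all five motifs, two sit at the noise end, and nothing falls in between, which is the pattern the slide describes as a clear cut-off.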

Page 14: Multiple Motifs

PRINTS

Motif name

Iteration number

PCODE: the protein identification codes of the initial sequences

ST: the location of the motifs within those sequences

INT: the interval between adjacent motifs. For the first motif, this is simply the distance from the beginning of the sequence to the start of the motif.
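The relation between ST and INT can be sketched as follows. The convention used here is an assumption: start positions are 1-based, and an interval counts the residues lying between the end of one motif and the start of the next (or before the first motif). The slide does not state the exact offset convention, so the numbers in a real PRINTS entry may differ by one. The function name and the example positions are hypothetical.

```python
def intervals(starts, lengths):
    """Compute the interval before each motif in one sequence.

    starts:  1-based start position (ST) of each motif, in sequence order.
    lengths: width of each motif.
    Assumed convention: an interval is the number of residues between the
    end of the previous motif and the start of this one; for the first
    motif it is the number of residues preceding it. A negative value
    would indicate overlapping motifs.
    """
    if len(starts) != len(lengths):
        raise ValueError("starts and lengths must have the same number of items")
    result = []
    previous_end = 0  # 0 means "before the first residue"
    for start, length in zip(starts, lengths):
        result.append(start - previous_end - 1)
        previous_end = start + length - 1
    return result


# Hypothetical example: two 13-residue motifs starting at positions 60 and 101.
print(intervals([60, 101], [13, 13]))
```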

Page 15: Multiple Motifs

PRINTS

Page 16: Multiple Motifs

PRINTS

FPScan

Submit a protein sequence to find the closest matching PRINTS fingerprint(s).

Page 17: Multiple Motifs

PRINTS

Page 18: Multiple Motifs

PRINTS

Page 19: Multiple Motifs

PRINTS

Page 20: Multiple Motifs

PRINTS

Page 21: Multiple Motifs

PRINTS

GRAPHScan

A graphical view of the result of a scan of a fingerprint against a sequence. Matching motifs are highlighted if they score above the threshold % identity.
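The idea of highlighting motif matches above a percent-identity threshold can be mimicked with a plain-text track. This is only a sketch of the concept, not GRAPHScan's actual method or output: the motif segments, the default threshold of 60, and the function names are invented for illustration.

```python
def identity_profile(sequence, motif_segments):
    """Percent identity of the best-matching motif segment at each window start."""
    width = len(motif_segments[0])
    profile = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        best = max(
            sum(a == b for a, b in zip(window, segment))
            for segment in motif_segments
        )
        profile.append(100.0 * best / width)
    return profile


def highlight(sequence, motif_segments, threshold=60.0):
    """Return a marker line with '#' under residues covered by a window
    scoring above the threshold, and '.' elsewhere."""
    width = len(motif_segments[0])
    marks = ["."] * len(sequence)
    for start, percent in enumerate(identity_profile(sequence, motif_segments)):
        if percent > threshold:
            for i in range(start, start + width):
                marks[i] = "#"
    return "".join(marks)


sequence = "MKGAGKSLLDEAHQ"
print(sequence)
print(highlight(sequence, ["DEAH", "DEVH"]))
```

Running one such track per motif of a fingerprint, stacked under the sequence, gives a rough text equivalent of the graphical view described on the slide.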

Page 22: Multiple Motifs

PRINTS

Page 23: Multiple Motifs

PRINTS

Page 24: Multiple Motifs

PRINTS

MULScan

This facility allows multiple sequences to be scanned against the database. Results are returned via email.

Page 25: Multiple Motifs

Related Projects

InterPro - Integrated Resource of Protein Domains and Functional Sites
BLOCKS - BLOCKS db
Pfam - Protein families db (HMM derived) [Mirror at St. Louis (USA)]
PRINTS - Protein Motif fingerprint db
ProDom - Protein domain db (Automatically generated)
PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins
SBASE - SBASE domain db
SMART - Simple Modular Architecture Research Tool
TIGRFAMs - TIGR protein families db