Date posted: 21-Dec-2015

EBI is an Outstation of the European Molecular Biology Laboratory.

InterPro Database Protein

Functional Analysis

Jennifer McDowall, Ph.D., Senior InterPro Curator

http://www.ebi.ac.uk/interpro

EBI Sequence Databases

• EMBL (GenBank, DDBJ): nucleotide sequence, e.g. CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCG
• UniProtKB/TrEMBL: protein sequence, translated from the nucleotide sequence (>7M)
• UniProtKB/Swiss-Prot: manual annotation (>400,000)
• InterPro: protein signatures for protein annotation; groups of related proteins (same family or share domains)

UniProtKB: signature matches

InterPro ~80% protein coverage

• UniProt/Swiss-Prot proteins: InterPro matches ~370,000 of ~400,000
• UniProt/TrEMBL proteins: InterPro matches >5.3M of >7M
• UniMES metagenomic proteins: >6M (available 2009)

What are protein signatures?

• A signature describes the pattern of a set of conserved residues in a group of proteins
• Signatures are built from a multiple sequence alignment
• A signature can define a protein family or a protein feature (domain or conserved site)

What value are signatures?

• More sensitive homology searches: find more distant homologues than BLAST
• Classification of proteins: associate proteins that share function, domains, sequence or structure
• Annotation of protein sequences: define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
• Transfer additional (automatic) annotation: associate TrEMBL proteins with well-annotated Swiss-Prot proteins and transfer annotation

Signature methods

• Pattern

• Fingerprint

• Sequence clustering

• HMM

• SAM

Patterns

Pattern/motif in sequence, written as a regular expression

Can define important sites:

• Enzyme catalytic site
• Prosthetic group attachment
• Metal ion binding site
• Cysteines for disulphide bonds
• Protein or molecule binding

EXAMPLE: PS00262 Insulin family signature

B chain xxxxxxCxxxxxxxxxxxxCxxxxxxxxx
A chain xxxxxCCxxxCxxxxxxxxCx

Sequence: MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

Matched region: CCTSICSLYQLENYC

Regular expression: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C

Patterns – understanding a regular expression

C - C - {P} - x(2) - C - [STDNEKPI] - x(3) - [LIVMFS] - x(3) - C

• A single letter (e.g. C) is a strictly conserved site; only one amino acid is accepted at this position
• Curly brackets denote amino acids that cannot occur at a single position
• x denotes that any amino acid can occur at a single position
• x(2) means that any amino acid can occur at the next two positions
• Square brackets denote the range of amino acids that can occur at a single position
• There are dashes between each position
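The pattern syntax above maps closely onto ordinary regular expressions. The following is a minimal sketch, not the PROSITE scanning software itself: the function name prosite_to_regex is hypothetical, and it handles only the elements described on this slide (single residues, x, square and curly brackets, and repeat counts in parentheses). It translates the PS00262 pattern and searches the insulin sequence from the example.

```python
import re

# One pattern element: a residue set, an excluded set, or a single letter,
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT_RE = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT_RE.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        if residues in ("x", "X"):
            core = "."                            # any amino acid
        elif residues.startswith("{"):
            core = "[^" + residues[1:-1] + "]"    # any amino acid except these
        else:
            core = residues                       # one letter or a [..] set
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGG"
    "PGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

regex = prosite_to_regex("C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C")
hit = re.search(regex, INSULIN)
print(regex)
print(hit.group() if hit else "no match")
```

The match found this way is the same A chain region that the slides highlight in the insulin sequence.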

Patterns: building a pattern signature

1) Sequence alignment
2) Define pattern (e.g. insulin family motif)
3) Extract pattern sequences: xxxxxxxxxxxxxxxxxxxxxxxx
4) Build regular expression: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
5) Pattern signature: PS00000

Fingerprints

Several motifs characterise family

Different combinations of motifs describe subfamilies

Identify small conserved regions in divergent proteins

EXAMPLE: PR00107 Phosphocarrier HPr signature

PTHP_ENTFA: MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLKSIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE

3-motif fingerprint:

1) GIHARPATLLVQTASKF (His phosphorylation site)
2) KGKSVNLKSIMGVMSL (Ser phosphorylation site)
3) LGVGQGSDVTITVDGADE (conserved site)

Fingerprints: building a fingerprint signature

1) Sequence alignment
2) Define motifs (His phosphorylation site, Ser phosphorylation site, conserved site)
3) Extract motif sequences: xxxxxxxxxxxxxxxxxxxxxxxx (one set per motif)
4) Fingerprint signature (PR00000): motifs 1, 2 and 3 in the correct order with the correct spacing
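The "correct order" requirement can be illustrated with the three PR00107 motifs and the PTHP_ENTFA sequence from the example. This is a deliberately simplified sketch: it looks for exact motif strings, whereas a real fingerprint method scores motifs derived from an alignment and tolerates variation. The function name find_fingerprint is hypothetical.

```python
def find_fingerprint(sequence, motifs):
    """Return the start of each motif if all occur in the given order, else None."""
    starts = []
    search_from = 0
    for motif in motifs:
        start = sequence.find(motif, search_from)
        if start == -1:
            return None                # motif missing or out of order
        starts.append(start)
        search_from = start + 1        # the next motif must start further along
    return starts

HPR = (
    "MEKKEFHIVAETGIHARPATLLVQTASKFNSDINLEYKGKSVNLK"
    "SIMGVMSLGVGQGSDVTITVDGADEAEGMAAIVETLQKEGLAE"
)
MOTIFS = ["GIHARPATLLVQTASKF", "KGKSVNLKSIMGVMSL", "LGVGQGSDVTITVDGADE"]

starts = find_fingerprint(HPR, MOTIFS)
if starts is not None:
    # Spacing between consecutive motif starts; a scanner could compare
    # these with the spacing seen in the family alignment.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    print("motif starts (0-based):", starts, "gaps:", gaps)
```

Only motif start positions are required to increase, so neighbouring motifs may abut or share a residue, as motifs 2 and 3 do in this sequence.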

Sequence clustering

Automatic clustering of homologous domains

• Known domain families: recruit homologous domains (PSI-BLAST)
• Automatic clustering (MKDOM2)
• Align domain families (ProDomAlign)

** Rarely covers entire domain (conserved core)

** Signature size can change with release

Hidden Markov Models (HMM)

Can characterise protein over entire length

Models conserved and divergent regions (position-specific scoring)

Models insertions and deletions

Outperform in sensitivity and specificity

More flexible (can use partial alignments)

A sequence alignment (Sequence 1 to Sequence 7) can be turned into:

• Profile: scoring matrix (residue frequency at each position in alignment)
• Hidden Markov Model (HMM): Bayesian statistics, probability scoring
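The idea of a scoring matrix built from the residue frequency at each alignment position can be sketched as follows. This is a simple profile, not a full HMM: it has no insert or delete states. The toy alignment, the uniform background frequency and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        # Count residues in this column; gap characters are ignored.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[res] for column, res in zip(profile, sequence))

# Hypothetical toy alignment of four equally long sequences.
toy_alignment = ["GIHARP", "GIHAKP", "GLHARP", "GIHSRP"]
profile = build_profile(toy_alignment)

print(round(score(profile, "GIHARP"), 2))   # close to the consensus
print(round(score(profile, "WWWWWW"), 2))   # unrelated residues
```

A sequence close to the alignment consensus scores higher than an unrelated one, which is the position-specific behaviour the slide refers to.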

Hidden Markov Models (HMM): states built from the sequence alignment

• M = match state: M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
• I = insert state: I1 I2 I3 I4 I5 I6 I7 I8 I9
• D = delete state: D2 D3 D4 D5 D6 D7 D8 D9
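What the three kinds of state do to a sequence can be shown with a small sketch. It assumes the usual reading of a profile HMM: a match state emits a residue aligned to a model position, an insert state emits an extra residue between positions, and a delete state skips a model position without emitting anything. The function name align_to_model and the example path are hypothetical; finding the best path (e.g. with the Viterbi algorithm) is not shown.

```python
def align_to_model(sequence, path):
    """Render a sequence along a state path of 'M', 'I' and 'D' states.

    Match residues are shown in upper case, inserted residues in lower case
    and deleted model positions as '-'.
    """
    out = []
    pos = 0
    for state in path:
        if state == "M":            # residue aligned to a model position
            out.append(sequence[pos].upper())
            pos += 1
        elif state == "I":          # extra residue between model positions
            out.append(sequence[pos].lower())
            pos += 1
        elif state == "D":          # model position with no residue
            out.append("-")
        else:
            raise ValueError(f"unknown state: {state!r}")
    if pos != len(sequence):
        raise ValueError("state path does not consume the whole sequence")
    return "".join(out)

print(align_to_model("ACDEF", "MMIMDM"))
```

In the example path, the third residue is an insertion and the fourth model position is deleted, so five residues are laid out against five model positions.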

Hidden Markov Models (HMM)

HMM databases:

• PIR SUPERFAMILY, PANTHER, TIGRFAM: families conserved in sequence
• PFAM, SMART: domains conserved in sequence
• SUPERFAMILY, GENE3D: domains conserved in structure

SAM Profile HMMs

Homologous structural superfamilies

Start with single seed sequence

Proteins in superfamily may have low sequence identity

Few proteins in family have PDB structures

Create one model for every protein in superfamily, then combine results

SAM Profile models

T99 script:

• Single seed sequence (e.g. GIHARPATLLVQTASKF)
• WU-BLASTP search: close homologues and low identity matches
• Initial model
• New larger alignment
• Final model

Signature Methods

• Pattern
• Fingerprint
• Sequence clustering
• HMM
• SAM

Between them, these methods:

• Describe protein features: active sites, binding sites…
• Describe families and sibling subfamilies
• Predict conserved domains
• Provide functional classification of families, functional domain annotation and structural domain annotation

Comprehensive annotation

InterPro removes redundancy

• Domain annotation: SWIB/MDM2 domain, RanBP2-type zinc finger, RING-type zinc finger
• Annotate features: conserved site within zinc finger
• Family classification: Mdm2/Mdm4 family (parent), Mdm4 subfamily (child)

Domain Boundaries

Gene3D (and SSF) determines domain structural boundaries

Pfam trims domains to regions of good sequence conservation

ProDom displays shortest conserved sequence

Fragmented Signatures

1) Signature method, e.g. PRINTS: discrete motifs
2) Duplicated domains, e.g. SSF: duplication consisting of 2 domains with same fold
3) Repeated elements, e.g. Kringle, WD40
4) Non-contiguous domains: structural domains can consist of non-contiguous sequence

Complementary Annotation

Example 1:

• Sequence-based signature (Pfam) shows that the domain is made up of repeating sequence elements (beta-propeller repeat)
• Structure-based signature (SSF) shows boundaries of structural domain (7-blade beta-propeller)

Example 2:

• PFAM shows domain is composed of two types of repeated sequence motifs
• SUPERFAMILY shows the potential domain boundaries

Example 3:

• GENE3D shows that these domains share homologous structure
• PFAM/SMART show 2 domains from distinct sequence families

Searching InterPro: InterProScan sequence search

Search tools at http://www.ebi.ac.uk/interpro/ include:

• Text Search
• InterProScan (sequence search)

InterPro Text Search

Text search box: search using text, protein ID, InterPro ID or GO term

Search results: direct links to entry

InterProScan Search

• Paste in sequence (protein/nucleotide)
• Member database search engines
• Use ftp site to run multiple sequences simultaneously

InterProScan Search Results

• Single InterPro entry
• Direct links to entry
• Direct links to signature databases

EXERCISE 1

Exploring InterPro entries

InterPro Entry

Groups similar signatures together

Adds extensive annotation

Linked to other databases

Structural information and viewers

Links related signatures

Grouping Signatures Together

Numbers in brackets are protein hits.

1) PFAM (100) and PROSITE (100): same positions, same protein hits. Both signatures are grouped in one entry (IPR000001).
2) PFAM and PROSITE, (100) and (50): same positions, different protein hits. Separate entries (IPR000001, IPR000002).
3) PFAM (100) and PROSITE (100): different positions, same protein hits. Separate entries (IPR000001, IPR000002).
4) PFAM (100) and PROSITE (100): different positions. Separate entries (IPR000001, IPR000002).

Extensive Annotation

Annotation Fields in InterPro

• Name and short name (short names appear in UniProt entries)
• List of signatures (links to member databases)
• Entry type (family, domain, site)
• Relationships (links related signatures)
• GO mapping (large scale classification)
• Abstract
• Taxonomy (search/download using taxonomy)
• Examples
• Publications

Entry types:

• Family: full-length signatures grouping related proteins
• Domain: biological units with defined boundaries
• Repeat: signature repeated as a series of short motifs
• Site: protein feature described by a Prosite pattern
• Region: any signature that doesn’t fit the above

Links to Other Databases

Additional annotation from databases:

• Blocks (family alignments)
• IntEnz (enzymes)
• Prosite documents
• COME (bioinorganic motifs)
• CAZy (carbohydrate-active enzymes)
• IUPHAR (GPCR receptors)
• CluSTr (protein clusters)
• Pandit (phylogenetic trees of PFAMs)
• Merops (peptidases & inhibitors)

Links to Structural Databases

• SCOP (structural classification of proteins): links to structural classification
• CATH (structural classification of proteins): links to structural classification
• PDB (Protein Data Bank, database of structures): list of proteins with structural data

Links to Interaction Databases

• IntAct (database of protein-protein interactions): lists proteins in entry known to be involved in protein-protein interactions

EXERCISE 2

Exploring InterPro relationships

InterPro Relationships

• Parent/Child: hierarchical subdivision into more closely related groups
• Contains/Found in: domain/subdomain composition
• Overlapping: remaining relationships

Link related signatures - relationships

1) Parent - Child (subgroup of more closely related proteins)

• Parent: PFAM Protein kinase, 100 proteins (IPR000001)
• Child: SMART Serine kinase, 75 proteins (IPR000002)
• Child: PROSITE Tyrosine kinase, 25 proteins (IPR000003)

* The Serine kinase and Tyrosine kinase children have no proteins in common

Relationships – evolutionary context

InterPro relationship, signature and criteria for signature:

• Grandparent: GENE3D (structural family)
• Parents: PFAM (sequence families)
• Children: TIGRFAM (functional families)

Unique to InterPro

Example hierarchy:

IPR011009 Protein kinase-like
IPR000403 PI 3/4 kinase
IPR000719 Protein kinase
IPR001245 Tyr kinase
IPR017442 Ser/Thr kinase-rel
IPR015772 TNK1 kin
IPR015783 ATMRK kin
IPR002575 APH kinase
IPR004147 ABC-1
IPR004166 EF2 kinase
IPR015275 Actin-fragmin kin
IPR015897 CHK kinase
IPR002290 Ser/Thr kin
IPR015515 GCN2
IPR015771 Hrmn Rcpt
IPR015768 Activin Rcpt
IPR015769 TGFb2 Rcpt
IPR015770 BMPRII
IPR015785 MAPK3 kin
IPR015787 IL1 kin
IPR008350 ERK3 MAPK
IPR015732 PSKH kin
IPR015733 Ca-dep kin4
IPR015734 Ca-dep kin1
IPR015739 Leu zip kin
IPR015740 Plant kin
IPR015747 MAPKKK4
IPR015748 MAPKKK3
IPR015749 MAPKKK1
IPR015750 Pak kin
IPR015730 Myosin kin
IPR008351 JNK kin
IPR018934 RIO-like kin
IPR000687 RIO kin
IPR002573 Choline kinase
IPR008349 ERK1 kin
IPR006748 Hydroxyurea kin
IPR009212 MethylTR kin
IPR014093 Thiamine kin
IPR009330 Lipopoly syn
IPR004119 DUF
IPR012877 Put kinase

Parent/child – evolutionary context

• Different entries are not redundant
• Entries range from superfamily classification down to the most specific subfamily classification

Link related signatures - relationships

2) Contains – Found in (describes domain composition)

• PFAM Receptor family: contains the SMART and PROSITE domains
• SMART N-terminal domain: found in the Pfam receptor family
• PROSITE C-terminal domain: found in the Pfam receptor family

Coverage: the containing signature must cover the entire (>90%) sequence of the contained signature

Link related signatures - relationships

3) Overlapping: all remaining relationships (e.g. a PROSITE signature and a SMART signature that overlap)
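The three relationship types can be summarised in a small decision sketch. Everything here is an illustrative assumption rather than InterPro's actual curation procedure: the Signature class, the use of match positions on a single example protein, and the rule that parent and child must cover the same region are all hypothetical. Only the ">90%" coverage criterion and the idea that a child matches a subgroup of its parent's proteins come from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    name: str
    proteins: frozenset   # identifiers of the proteins the signature matches
    start: int            # match start on one example protein (1-based)
    end: int              # match end on the same protein (inclusive)

def shared_length(a, b):
    """Number of positions covered by both signatures."""
    return max(min(a.end, b.end) - max(a.start, b.start) + 1, 0)

def coverage(outer, inner):
    """Fraction of inner's match that outer covers."""
    return shared_length(outer, inner) / (inner.end - inner.start + 1)

def relationship(a, b, min_coverage=0.9):
    """Describe how signature b relates to signature a."""
    a_covers_b = coverage(a, b) > min_coverage
    b_covers_a = coverage(b, a) > min_coverage
    if a_covers_b and b_covers_a and b.proteins < a.proteins:
        return "child"          # same region, subgroup of a's proteins
    if a_covers_b and not b_covers_a:
        return "found in"       # b is a domain within the larger signature a
    if shared_length(a, b) > 0:
        return "overlapping"    # all remaining relationships
    return "none"

kinase = Signature("PFAM Protein kinase", frozenset(range(100)), 10, 300)
ser_kinase = Signature("SMART Serine kinase", frozenset(range(75)), 12, 298)
receptor = Signature("PFAM Receptor family", frozenset(range(40)), 1, 500)
n_domain = Signature("SMART N-terminal domain", frozenset(range(40)), 20, 120)
site = Signature("PROSITE signature", frozenset(range(10)), 100, 200)
other = Signature("SMART signature", frozenset(range(10)), 150, 260)

print(relationship(kinase, ser_kinase))   # child
print(relationship(receptor, n_domain))   # found in
print(relationship(site, other))          # overlapping
```

Real relationships are assigned across all matched proteins and reviewed by curators, so a single pair of coordinates is only a stand-in for that evidence.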

EXERCISE 3

Exploring InterPro taxonomy

InterPro taxonomy

Select species-specific protein sets

EXERCISE 4

Exploring protein structure in InterPro

Structural information

• Structures: PDB
• Classification: CATH, SCOP
• Homology models: Swiss-Model, ModBase

CATH and SCOP divide PDB structures into domains

Swiss-Model and ModBase predict structure for regions not covered by PDB

Note that one domain is discontiguous

Sequence-Structure Display

• Signatures predictive of protein annotation
• Structural data for specific proteins
• AstexViewer® for structure

Structure Viewer

• Navigate between structure and sequence
• Manipulate structures

EXERCISE 5

Exploring splice variants in InterPro

Other Features – splice variants

EXERCISE 6

Exploring InterPro Domain Architecture

Other Features – domain architecture

• Select data set of these proteins
• Each ‘balloon’ represents a linked InterPro domain

EXERCISE 7

Protein Sequence Coverage

InterPro signatures cover:

95% of UniProt/Swiss-Prot proteins

79% of UniProt/TrEMBL proteins

>5 million matches in InterPro

~17,000 InterPro entries

>57,500 signature methods

http://www.ebi.ac.uk

Acknowledgements

InterPro Team:

Team leader: Sarah Hunter

David Lonsdale

Louise Daugherty

Jennifer McDowall

Craig McAnulla

David Binns

Ujjwal Das

Anthony Quinn

John Maslen

Manjula Thimma

Phil Jones

InterPro Consortium: