Date posted: 21-Dec-2015

EBI is an Outstation of the European Molecular Biology Laboratory.

UniProt

Jennifer McDowall, Ph.D., Senior InterPro Curator

Protein Sequence Database:

http://www.ebi.uniprot.org

What do protein scientists require?

High quality protein sequence

Non-redundant data with maximal coverage, including splice isoforms, disease variants and PTMs.

Protein identification

Protein annotation

Stable identifiers and consistent nomenclature

Detailed information: protein function, biological processes, molecular interactions, and pathways

Sequence archiving essential

Sequence quality in UniProt

Protein existence level (Human):

• Evidence at protein level: 59%
• Evidence at transcript level: 37.5%
• Inferred from homology: 1%
• Predicted: 0.5%
• Uncertain: 2%

UniProt Consortium

3 Components of UniProt

UniProtKB (Knowledgebase): protein sequence repository

• Swiss-Prot: non-redundant, manually annotated
• TrEMBL: redundant, automatically annotated

UniParc (Protein Archive): history of all sequences

UniRef (Reference Clusters): combines sequences to speed up searching

• UniRef100, UniRef90, UniRef50

EMBL/GenBank/DDBJ, Ensembl, PDB, RefSeq, Patent data, Model organism databases
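The archive idea (every distinct sequence stored once, with a record of where it was seen) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how UniParc is actually implemented: the class name `SequenceArchive`, the SHA-256 key, and the `hypothetical-N` identifiers are all invented for the example.

```python
import hashlib
from collections import defaultdict


class SequenceArchive:
    """Toy non-redundant archive: each distinct sequence is stored once."""

    def __init__(self):
        self._sequences = {}              # checksum -> sequence
        self._sources = defaultdict(set)  # checksum -> {(database, identifier)}

    def add(self, sequence, database, identifier):
        """Store a sequence (if new) and record which source reported it."""
        seq = sequence.strip().upper()
        key = hashlib.sha256(seq.encode("ascii")).hexdigest()
        self._sequences.setdefault(key, seq)
        self._sources[key].add((database, identifier))
        return key

    def sources(self, key):
        """All (database, identifier) pairs that reported this sequence."""
        return sorted(self._sources[key])

    def __len__(self):
        return len(self._sequences)


# Hypothetical submissions: two sources report the same sequence.
archive = SequenceArchive()
k1 = archive.add("MKTAYIAKQR", "EMBL", "hypothetical-1")
k2 = archive.add("mktayiakqr", "RefSeq", "hypothetical-2")
k3 = archive.add("MKTAYIAKQRQ", "PDB", "hypothetical-3")
```

The first two submissions share one archive record with two source references; the third differs by one residue and gets its own record.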

UniProtKB pipeline

EMBL (nucleotide sequencing)

CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCGCGCTGTGATAGCG
CTGATCGTGATGCGTATGCAGGTCGT

→ translate sequence → TrEMBL → annotation → Swiss-Prot

UniProtKB: EBI, SIB, PIR

>7M >4K
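The "translate sequence" step of the pipeline can be illustrated with a short Python sketch. It assumes the standard genetic code and a single forward reading frame. The real pipeline translates annotated coding regions, which this sketch does not attempt to locate. The function name `translate` and its `frame` parameter are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward reading frame.

    '*' marks a stop codon; 'X' marks a codon with an unrecognised base.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# The nucleotide string shown on the slide, translated in frame 0.
slide_dna = (
    "CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCGCGCTGTGATAGCG"
    "CTGATCGTGATGCGTATGCAGGTCGT"
)
slide_protein = translate(slide_dna)
```

Calling `translate(dna, frame=1)` or `frame=2` gives the other two forward frames. Choosing the biologically correct frame is a separate problem.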

Searching UniProt:

Simple text search

Searching UniProt

http://www.uniprot.org/

Search tools include:

• Text Search

• BLAST sequence search

• Additional search engines through EBI (e.g. MPSearch and FASTA)

Searching UniProt – Simple Search

• Text-based searching

• Logical operators ‘&’ (and), ‘|’ (or)
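How ‘&’ and ‘|’ combine search terms can be shown with a small local sketch. It is not the UniProt search engine: it assumes ‘|’ means "or", that ‘&’ binds more tightly than ‘|’, and that matching is a case-insensitive substring test. The accessions `X00001` to `X00003` are hypothetical.

```python
def matches(query, text):
    """Evaluate a query of terms joined by '&' (and) and '|' (or).

    Assumption: '&' binds tighter than '|', so "a & b | c" means
    "(a and b) or c". Matching is case-insensitive substring search.
    """
    haystack = text.lower()
    for alternative in query.split("|"):
        terms = [t.strip().lower() for t in alternative.split("&")]
        terms = [t for t in terms if t]
        if terms and all(t in haystack for t in terms):
            return True
    return False


def search(query, entries):
    """Return accessions whose description satisfies the query."""
    return [acc for acc, description in entries.items()
            if matches(query, description)]


# Hypothetical mini-database: accession -> free-text description.
entries = {
    "X00001": "Insulin precursor, Homo sapiens",
    "X00002": "Insulin receptor, Mus musculus",
    "X00003": "Glucagon, Homo sapiens",
}

human_insulin = search("insulin & homo", entries)
either = search("glucagon | receptor", entries)
```

Here `human_insulin` contains only the first entry, while `either` contains the second and third.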

Searching UniProt – Search Results

Each result is linked to the UniProt entry

EXERCISE 1

Exploring a Swiss-Prot entry:

General information

• Sequence
• Sequence features
• Ontologies
• References
• Nomenclature
• Splice variants
• Annotations

UniProt/Swiss-Prot Annotation

• Remove redundancy: merge TrEMBL (1 gene product, 1 entry)
• Sequence variation: identify conflicts & alternative splicing
• Modifications: posttranslational, e.g. carbohydrates
• Annotate sequence: map domains and sites onto sequence
• General annotation: descriptive comments, e.g. function
• Binary interactions: linked to protein-protein interaction data
• Cross references: extensive integration with other databases
• Bibliography: cited references
• Taxonomy: description of biological source
• Structure: describes both secondary and quaternary
• Disease association: map sequence deficiencies causing disease
• Similarity: to protein families and domains

UniProtKB/Swiss-Prot Annotation

Customise layout

• Collapse section
• Hold down cursor to drag-and-drop sections

Entry Information

Swiss-Prot removes redundancy

Entry Information: sequence correction, versioning and archiving

Merged A8K2S6 with Q00987

Able to compare versions directly

For example: erroneous gene model predictions, frameshifts, read-throughs, premature stop codons, erroneous initiator Met…
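Merging entries while keeping identifiers stable can be sketched as a lookup from every accession ever issued to the accession that currently holds the entry. This is a minimal illustration, not UniProt's own mechanism. The class `AccessionIndex` and its methods are invented names, and the example assumes that Q00987 is the surviving accession of the merge mentioned above.

```python
class AccessionIndex:
    """Toy index mapping any accession to its surviving primary accession."""

    def __init__(self):
        self._primary_of = {}

    def register(self, accession):
        """Add a new accession; initially it is its own primary."""
        self._primary_of.setdefault(accession, accession)

    def resolve(self, accession):
        """Return the primary accession that now holds this entry."""
        return self._primary_of[accession]

    def merge(self, retired, survivor):
        """Fold `retired` into `survivor`; `retired` stays resolvable."""
        self.register(retired)
        self.register(survivor)
        old_primary = self.resolve(retired)
        new_primary = self.resolve(survivor)
        for accession, primary in list(self._primary_of.items()):
            if primary == old_primary:
                self._primary_of[accession] = new_primary

    def secondaries(self, primary):
        """Retired accessions that now point at `primary`."""
        return sorted(acc for acc, p in self._primary_of.items()
                      if p == primary and acc != primary)


index = AccessionIndex()
# Assumption for illustration: Q00987 survives, A8K2S6 becomes secondary.
index.merge("A8K2S6", "Q00987")
```

A lookup by the retired accession still succeeds, so references in older literature keep resolving to the merged entry.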

Names and Origin

Some literature search engines pull synonyms from UniProt

EXERCISE 2

Exploring a Swiss-Prot entry:

Sequence annotation

Sequence

Sequence variation – conflicts

Sequence variation – splicing

Annotate Sequence
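Splice isoforms are described as differences relative to one canonical sequence. A minimal sketch of rebuilding an isoform from such differences follows. The tuple format `(start, end, replacement)`, the function name `apply_variations`, and the toy sequence are assumptions for illustration. Positions are taken to be 1-based and inclusive, and the edited regions must not overlap.

```python
def apply_variations(canonical, variations):
    """Build an isoform sequence from the canonical one.

    variations: list of (start, end, replacement) tuples, 1-based and
    inclusive. An empty replacement means the region is missing in the
    isoform. Regions are assumed not to overlap.
    """
    seq = canonical
    # Apply right-to-left so earlier coordinates stay valid after each edit.
    for start, end, replacement in sorted(variations, reverse=True):
        if not (1 <= start <= end <= len(canonical)):
            raise ValueError(f"region {start}-{end} is outside the sequence")
        seq = seq[:start - 1] + replacement + seq[end:]
    return seq


canonical = "MKTAYIAKQR"  # toy sequence, not a real protein
isoform_skipping = apply_variations(canonical, [(3, 5, "")])
isoform_replaced = apply_variations(canonical, [(1, 2, "MA"), (9, 10, "LL")])
```

Storing only the differences keeps a single entry per gene product while still allowing every isoform sequence to be reconstructed.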

EXERCISE 3

Exploring a Swiss-Prot entry:

Structural annotation

Structure – secondary

Structure – tertiary

Provides information on ordered and disordered regions of protein

Structure – quaternary

EXERCISE 4

Exploring a Swiss-Prot entry:

General annotation

General Annotation

• Controlled vocabularies used where possible
• References provided
• Literature-derived annotation
• Additional annotation from Gene Ontology

Modifications

Disease Association

• Mendelian Inheritance in Man provides information on genetic disease associations
• Pharmacogenomics database

Similarity

Binary Interactions

• Interacting protein
• Experimental information in IntAct
• Database of Interacting Proteins
• Pathway databases
• Data imported from other sources

Cross References

A central portal to a wealth of external resources…

• Source references included
• Classification and domain annotation provided by InterPro
• A wealth of external links

EXERCISE 5

Searching UniProt:

BLAST search

Searching UniProt – BLAST Search

Searching UniProt – BLAST Results

• Alignment with query sequence
• Aligns checked sequences
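When reading an alignment with the query sequence, a common summary figure is percent identity. The sketch below computes it for a pair of already-aligned strings. It does not perform the BLAST search or the alignment itself. It uses one common convention (identical columns divided by alignment length), and the function names and toy sequences are illustrative assumptions.

```python
def percent_identity(aligned_query, aligned_hit, gap="-"):
    """Percentage of alignment columns where both residues are identical."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(q, h) for q, h in zip(aligned_query, aligned_hit)
               if not (q == gap and h == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for q, h in columns if q == h and q != gap)
    return 100.0 * identical / len(columns)


def match_line(aligned_query, aligned_hit, gap="-"):
    """Midline between two aligned rows: the residue where they agree, else a space."""
    return "".join(q if q == h and q != gap else " "
                   for q, h in zip(aligned_query, aligned_hit))


query = "MKT-AY"   # toy aligned query, '-' is a gap
hit = "MKTSAY"     # toy aligned database hit
identity = percent_identity(query, hit)
midline = match_line(query, hit)
```

In this toy pair five of six columns agree, and the midline shows a space at the gapped column.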

EXERCISE 6

Exploring a UniProt/TrEMBL entry

UniProt/TrEMBL entry

• Nucleotide data
• Redundancy
• Automatic clean-up

Automatic annotation

• InterPro family classification
• Cross-references
• Keywords (common annotation with Swiss-Prot)
• Transfers common annotation to related family members in TrEMBL

Entries:

• Swiss-Prot = 400K
• TrEMBL = 6M

Transferred annotation
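The idea of transferring common annotation to related family members can be sketched as follows. This is a simplified illustration, not the actual TrEMBL annotation rules. It assumes that a keyword is transferred only when every manually annotated member of a family carries it. The function names, family identifiers, and accessions are hypothetical.

```python
def common_keywords(reviewed_members):
    """Keywords shared by every manually annotated member of one family."""
    keyword_sets = [set(keywords) for keywords in reviewed_members.values()]
    return set.intersection(*keyword_sets) if keyword_sets else set()


def transfer_annotation(family_of, reviewed_keywords, unreviewed):
    """Propose keywords for unreviewed entries from their family.

    family_of:         accession -> family identifier
    reviewed_keywords: accession -> keywords of manually annotated entries
    unreviewed:        accessions that should receive transferred keywords
    """
    by_family = {}
    for accession, keywords in reviewed_keywords.items():
        by_family.setdefault(family_of[accession], {})[accession] = keywords
    shared = {family: common_keywords(members)
              for family, members in by_family.items()}
    return {accession: set(shared.get(family_of.get(accession), set()))
            for accession in unreviewed}


# Hypothetical data: S* are manually annotated, T* are automatic entries.
family_of = {"S1": "FAM_A", "S2": "FAM_A", "T1": "FAM_A", "T2": "FAM_B"}
reviewed_keywords = {
    "S1": {"Kinase", "ATP-binding", "Membrane"},
    "S2": {"Kinase", "ATP-binding"},
}
proposed = transfer_annotation(family_of, reviewed_keywords, ["T1", "T2"])
```

`T1` receives the two keywords common to both annotated family members. `T2` receives nothing, because its family has no manually annotated member to learn from.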

EXERCISE 7

http://www.ebi.ac.uk

Acknowledgements

Rolf Apweiler

Amos Bairoch

Cathy Wu

+100 annotators