Posted: 21-Dec-2015
EBI is an Outstation of the European Molecular Biology Laboratory.
UniProt
Jennifer McDowall, Ph.D.
Senior InterPro Curator
Protein Sequence Database:
http://www.ebi.uniprot.org
What do protein scientists require?
High quality protein sequence
Non-redundant data with maximal coverage, including splice isoforms, disease variants and PTMs.
Protein identification
Protein annotation
Stable identifiers and consistent nomenclature
Detailed information: protein function, biological processes, molecular interactions, and pathways
Sequence archiving essential
Sequence quality in UniProt
Protein existence level (human):
Evidence at protein level: 59%
Evidence at transcript level: 37.5%
Inferred from homology: 1%
Predicted: 0.5%
Uncertain: 2%
3 Components of UniProt
UniProtKB: Knowledgebase
UniRef: Reference Clusters
UniParc: Protein Archive
Protein sequence repository
Swiss-Prot: non-redundant, manually annotated
TrEMBL: redundant, automatically annotated
History of all sequences
Combines sequences (speeds up searching)
UniRef100, UniRef90, UniRef50
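The idea behind the UniRef levels can be sketched in a few lines of Python. This is a toy illustration only: the `identity` measure compares positions directly without any alignment, the sequences are invented, and the function names are hypothetical. Real clustering relies on alignment-based tools.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative, members)
    # Longest sequences are visited first so that they become representatives.
    for seq in sorted(seqs, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Invented example sequences, not real UniProt entries.
example = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQA", "MKTAYGGGGG", "WWWWWWWWWW"]

for name, threshold in [("100%", 1.0), ("90%", 0.9), ("50%", 0.5)]:
    print(name, len(greedy_cluster(example, threshold)), "clusters")
```

Lower thresholds give fewer, larger clusters, which is why searching a clustered set is faster than searching every sequence.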
EMBL/GenBank/DDBJ, Ensembl, PDB, RefSeq, Patent data, Model organism databases
UniProtKB pipeline
Nucleotide sequencing -> EMBL (nucleotide sequences) -> translate sequence -> TrEMBL -> annotation -> Swiss-Prot
UniProtKB (TrEMBL plus Swiss-Prot) is produced by EBI, SIB and PIR.
Figure labels: >7M, >4K
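The "translate sequence" step of the pipeline can be illustrated with a minimal Python sketch. It assumes the standard genetic code, reading frame 1 and a forward-strand coding sequence; the `translate` function and the short test sequences are illustrative and are not the pipeline's actual code.

```python
# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X marks ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```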
Searching UniProt
http://www.uniprot.org/
Search tools include:
• Text Search
• BLAST sequence search
• Additional search engines through EBI (e.g. MPSearch and FASTA)
Searching UniProt – Simple Search
• Text-based searching
• Logical operators ‘&’ (and), ‘|’ (or)
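A text search with logical operators can also be built programmatically. The sketch below only assembles a query URL and sends nothing. It rests on several assumptions that are not in the slides: the endpoint `https://rest.uniprot.org/uniprotkb/search`, the `query`, `format` and `size` parameters, and the use of the keywords AND/OR in place of the ‘&’ and ‘|’ symbols shown above. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the present-day UniProt REST endpoint, which postdates these slides.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def combine(terms, operator="AND"):
    """Join search terms with a logical operator, parenthesising each term."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f"({term})" for term in terms)


def build_search_url(query, fmt="tsv", size=10):
    """Return a URL-encoded search URL; no network request is made."""
    return SEARCH_URL + "?" + urlencode({"query": query, "format": fmt, "size": size})


query = combine(["mdm2", "human"])
print(query)
print(build_search_url(query))
```

Keeping URL construction separate from the request makes the query easy to inspect before anything is sent.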
Sequence
Sequence features
Ontologies
References
Nomenclature
Splice variants
Annotations
UniProt/Swiss-Prot Annotation
Remove redundancy: Merge TrEMBL entries (1 gene product, 1 entry)
Sequence variation: Identify conflicts & alternative splicing
Modifications: Posttranslational, e.g. carbohydrates
Annotate sequence: Map domains and sites onto sequence
General annotation: Descriptive comments, e.g. function
Binary interactions: Linked to protein-protein interaction data
Cross references: Extensive integration with other databases
Bibliography: Cited references
Taxonomy: Description of biological source
Structure: Describes both secondary and quaternary structure
Disease association: Map sequence deficiencies causing disease
Similarity: To protein families and domains
Entry Information
Sequence correction, versioning and archiving
Merged A8K2S6 with Q00987
Able to compare versions directly
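Merging and version comparison can be pictured with a short Python sketch. The lookup table assumes Q00987 is the surviving accession after the merge shown above. The two sequence versions are invented fragments, and the function names are hypothetical.

```python
import difflib

# Assumption: A8K2S6 was folded into Q00987, which remains the primary accession.
SECONDARY_TO_PRIMARY = {"A8K2S6": "Q00987"}


def resolve(accession: str) -> str:
    """Map a merged (secondary) accession to its primary accession."""
    return SECONDARY_TO_PRIMARY.get(accession, accession)


def diff_versions(old: str, new: str):
    """List (operation, old_fragment, new_fragment) for each changed region."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(resolve("A8K2S6"))
# Invented sequence fragments standing in for two versions of one entry.
print(diff_versions("MKTAYIAK", "MKTAYLAK"))
```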
Entry Information
Sequence correction, versioning and archiving
For example: erroneous gene model predictions, frameshifts, read-throughs, premature stop codons, erroneous initiator Met…
Names and Origin
Some literature search engines pull synonyms from UniProt
Structure - tertiary
Provides information on ordered and disordered regions of the protein
General Annotation
Controlled vocabularies used where possible
References provide literature-derived annotation
Disease Association
Mendelian Inheritance in Man provides information on genetic disease associations
Pharmacogenomics database
Binary Interactions
Interacting protein
Experimental information in IntAct
Data imported from other sources
Binary Interactions
Database of Interacting Proteins
Pathway databases
Data imported from other sources
UniProt/TrEMBL entry
Redundancy
Nucleotide data
Automatic clean-up
Automatic annotation: InterPro family classification, cross-references, keywords (common annotation with Swiss-Prot)
Transfers common annotation to related family members in TrEMBL
Entries: Swiss-Prot = 400K; TrEMBL = 6M
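The transfer of common annotation to family members can be sketched as follows. This is a simplified illustration under stated assumptions: only keywords are transferred, a keyword counts as "common" when every reviewed member of a family carries it, and the accessions, family identifiers and keywords are placeholders, not real records. It is not the production annotation system.

```python
def common_keywords_by_family(reviewed):
    """For each family, keep only the keywords shared by all reviewed entries."""
    shared = {}
    for entry in reviewed:
        family = entry["family"]
        keywords = set(entry["keywords"])
        shared[family] = keywords if family not in shared else shared[family] & keywords
    return shared


def transfer_annotation(unreviewed, shared):
    """Add the family's shared keywords to each unreviewed entry of that family."""
    annotated = []
    for entry in unreviewed:
        inherited = shared.get(entry["family"], set())
        merged = sorted(set(entry["keywords"]) | inherited)
        annotated.append({**entry, "keywords": merged})
    return annotated


# Placeholder records: "FAM_A" and "FAM_B" stand in for InterPro family identifiers.
swiss_prot = [
    {"acc": "SP1", "family": "FAM_A", "keywords": ["Kinase", "ATP-binding", "Nucleus"]},
    {"acc": "SP2", "family": "FAM_A", "keywords": ["Kinase", "ATP-binding", "Membrane"]},
]
trembl = [
    {"acc": "TR1", "family": "FAM_A", "keywords": []},
    {"acc": "TR2", "family": "FAM_B", "keywords": ["Hypothetical protein"]},
]

shared = common_keywords_by_family(swiss_prot)
for entry in transfer_annotation(trembl, shared):
    print(entry["acc"], entry["keywords"])
```

Entries in families without reviewed members are left unchanged, so nothing is asserted without a manually annotated source.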