Bioinformatics Resources - Swissprot (2016/05/13)

Post on 14-Jul-2020


BioinfRes SoSe 16

Bioinformatics Resources - Swissprot

Lecture & Exercises: Prof. B. Rost, Dr. L. Richter, J. Reeb

Institut für Informatik I12


Putative Schedule

Apr. 22nd   Intro, General Overview (1. sh.)
Apr. 29th   Sequence Databases (2. sh.)
May 6th     No lecture
May 13th    Sequence Databases (3. sh.)
May 20th    Structure Databases (4. sh.)*
May 27th    SQL (5. sh.)
Jun 3rd     SQL (6. sh.)
Jun 10th    No-SQL (7. sh.)
Jun 17th    No-SQL (8. sh.)*
Jun 24th    JavaScript / UI (9. sh.)
Jul 1st     Web Services (10. sh.)
Jul 8th     Bioinformatics Suites / Forums
Jul 15th    Wrap Up, Q&A
Jul 28th    Exam, 10:30-12:00, MW1050

* These exercises can earn you a bonus


XML Infusion (in 10 sec)

●  compilation from http://www.w3schools.com/xml/default.asp
●  XML is a software- and hardware-independent tool to store and to transport data
●  XML stands for eXtensible Markup Language
●  designed to store and transport data
●  designed to be self-descriptive
●  W3C recommendation
●  it does NOT DO anything


About Tags

●  XML tags are not predefined like HTML tags
●  everybody can (and has to) invent their own tags
●  new tags can be added at any time
●  the author has to define the content and structure of the document
●  everything is plain text


Document Structure

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  ....
</bookstore>

taken from http://www.w3schools.com/xml/xml_usedfor.asp
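A minimal sketch of reading such a document with Python's standard-library `xml.etree.ElementTree`. The inline string reproduces the two books above (the elided `....` is left out because it is not valid XML); the function name `list_books` is our own.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

def list_books(xml_text):
    """Return (category, title, price) tuples for every <book> child of the root."""
    root = ET.fromstring(xml_text)          # the root element is <bookstore>
    books = []
    for book in root.findall("book"):       # direct children named "book"
        books.append((book.get("category"),           # attribute value
                      book.findtext("title"),         # text content of a child
                      float(book.findtext("price"))))
    return books

for category, title, price in list_books(BOOKSTORE):
    print(f"{category:10} {title:20} {price:6.2f}")
```

The tree mirrors the nesting: `bookstore` is the parent, each `book` a child, and `title`, `author`, `year`, `price` are siblings inside a book.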


Syntax Rules

●  elements are defined using tags: <tagName> ... </tagName> or <tagName/>
●  elements can be nested (contain other elements: parent and child nodes, sibling nodes)
●  elements can have text content
●  each document must contain ONE root element that is the parent of all other elements


Syntax Refined

●  the prolog line <?xml ...?> is optional
●  tags must be closed (or self-closed)
●  tags are case sensitive
●  tags must be properly nested:
   <a><b>....</a></b>   Wrong!
   <a><b>....</b></a>   Right!
●  tags may have attributes
●  attribute values must always be quoted
●  some special characters cannot be used directly
●  -> they are coded by entity references:
   &lt;     <   less than
   &gt;     >   greater than
   &amp;    &   ampersand
   &apos;   '   apostrophe
   &quot;   "   quotation mark
●  comments: <!-- .... -->
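Python's standard library provides `xml.sax.saxutils.escape` and `unescape` for the entity references listed above. The sketch below wraps them; the wrapper names `encode_text` and `decode_text` are our own.

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; the extra mapping adds the two quote
# characters so the result is also safe inside attribute values.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def encode_text(raw):
    return escape(raw, QUOTE_ENTITIES)

def decode_text(coded):
    # unescape() reverses &lt;, &gt; and &amp; plus the mapping passed in
    return unescape(coded, {ent: ch for ch, ent in QUOTE_ENTITIES.items()})

raw = 'a < b & "c" > \'d\''
coded = encode_text(raw)
print(coded)  # a &lt; b &amp; &quot;c&quot; &gt; &apos;d&apos;
```

The ampersand is replaced first, so the `&` characters introduced by the other entities are not escaped a second time.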


Tag Names

●  case sensitive
●  must start with a letter or underscore
●  must not start with the letters xml in any case (XML, Xml, ...)
●  can contain: letters, digits, hyphens, underscores and periods
●  cannot contain spaces
●  apply common sense and a consistent style
●  avoid: minus (-), period (.), colon (:) and non-English characters, for compatibility reasons
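The naming rules can be turned into a small checker. This is a hypothetical, simplified ASCII-only reading of the slide: the XML specification's Name production is more permissive (it also admits colons and many non-English letters), so treat this as a style check rather than a conformance test.

```python
import re

# Simplified, ASCII-only version of the rules above (hypothetical helper).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_tag_name(name):
    """Start with a letter or underscore; then letters, digits, '_', '.', '-'."""
    if not _NAME.fullmatch(name):      # also rejects spaces and colons
        return False
    if name[:3].lower() == "xml":      # reserved prefix, in any case
        return False
    return True

def follows_style_advice(name):
    # Stricter check for the "avoid" list; colons are already rejected above.
    return is_valid_tag_name(name) and not re.search(r"[-.]", name)

for candidate in ["book", "_id2", "2nd", "first name", "XmlData", "first.name"]:
    print(candidate, is_valid_tag_name(candidate), follows_style_advice(candidate))
```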


XML Element

●  everything from the start tag to the end tag, tags included
●  can contain:
   - text
   - attributes
   - other elements
   - a mix of all of these
●  are extensible


XML Attributes

●  values must be quoted: single or double quotes
●  the quote character not used as delimiter can be used inside the value
●  there is no fixed rule for choosing between attribute and element, but:
   - attributes cannot contain multiple values
   - attributes cannot contain tree structures
   - attributes are not easily expandable
●  useful to store metadata, like an element id, etc.
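The quoting rule and the multi-value limitation can be seen with `xml.etree.ElementTree`. The element and attribute names below (`book`, `note`, `entry`, `ids`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Single quotes around the value let double quotes appear inside it.
book = ET.fromstring("<book id='b1' note='called \"the classic\"'/>")
print(book.get("id"), book.get("note"))  # b1 called "the classic"

# An attribute holds one flat string: a list must be encoded by some
# convention (here: space-separated), while a child element can simply repeat.
entry = ET.fromstring(
    '<entry ids="P12345 Q1AAA9">'
    "<id>P12345</id><id>Q1AAA9</id>"
    "</entry>"
)
from_attribute = entry.get("ids").split()               # needs manual splitting
from_elements = [e.text for e in entry.findall("id")]   # structure is explicit
```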


A Glimpse of Namespaces

●  help to prevent tag name collisions between different authors/applications/domains
●  implemented by the introduction of prefixes
●  defined as an attribute: xmlns:prefix="URI"
●  usage: <prefix:tagName>
●  the URI only needs to be unique; it is never looked up
●  used to integrate other specifications, e.g. XSLT
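A minimal sketch of how a namespace-aware parser treats prefixes, using `xml.etree.ElementTree`. The two URIs are invented for the example; note that the query uses different prefixes than the document, because only the URI identifies the namespace.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define <title>; prefixes keep them apart.
DOC = """<catalog xmlns:bk="http://example.org/books"
                  xmlns:mv="http://example.org/movies">
  <bk:title>Everyday Italian</bk:title>
  <mv:title>Harry Potter</mv:title>
</catalog>"""

root = ET.fromstring(DOC)
# ElementTree replaces each prefix by its URI: "{URI}localname".
print([child.tag for child in root])

# Queries use a prefix -> URI map of our own choosing.
ns = {"book": "http://example.org/books", "movie": "http://example.org/movies"}
book_title = root.find("book:title", ns).text
movie_title = root.find("movie:title", ns).text
```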


Levels of Correctness

●  well-formed: a document obeys the syntax rules:
   - one root element
   - closing tags
   - case sensitive
   - properly nested
   - attribute values quoted
●  valid: in addition to being well-formed, the document also conforms to a document type definition (format specification)
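Well-formedness can be tested by simply trying to parse a document. The sketch below (helper name `is_well_formed` is our own) uses `xml.etree.ElementTree`, which checks syntax only and does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the parser accepts the text; this checks syntax only,
    it does NOT validate against a DTD or schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = {
    "<a><b>text</b></a>": True,     # properly nested
    "<a><b>text</a></b>": False,    # improperly nested
    "<Note>hi</note>": False,       # tag names are case sensitive
    "<a x=1/>": False,              # attribute value not quoted
    "<a/><b/>": False,              # two root elements
    "<a><br></a>": False,           # tag never closed
}
for text, expected in samples.items():
    assert is_well_formed(text) == expected, text
```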


Document Type Definitions

●  two ways to specify a document structure:
   - DTD: Document Type Definition
   - XML Schema: XML-based alternative to DTD


Example

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! &copyright;</body>
</note>


Example

<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copyright "Copyright by ..">
]>


XML DTD

●  referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">
●  !DOCTYPE defines the root element
●  !ELEMENT defines the structure of the elements
●  #PCDATA means parsable text data (parsed character data)
●  !ENTITY defines special characters or strings
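Python's standard library cannot validate a document against a DTD, but its parser does expand entities declared in an internal DTD subset. The sketch below parses the note example with its DTD inlined, then applies a hand-written check (the hypothetical `matches_note_model`) that mirrors only the single declaration `<!ELEMENT note (to,from,heading,body)>`; it is an illustration, not a DTD validator.

```python
import xml.etree.ElementTree as ET

# Note with an internal DTD subset, so &copyright; is defined in the document.
NOTE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copyright "Copyright by ..">
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! &copyright;</body>
</note>"""

root = ET.fromstring(NOTE)
print(root.findtext("body"))  # the parser has replaced &copyright; by its value

def matches_note_model(note):
    """Hand-written check for (to,from,heading,body): exactly these children,
    in this order, each holding text but no child elements (#PCDATA)."""
    children = list(note)
    return (note.tag == "note"
            and [c.tag for c in children] == ["to", "from", "heading", "body"]
            and all(len(c) == 0 for c in children))

print(matches_note_model(root))
```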


XML Schema

●  alternative to DTD

<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

●  support of data types and namespaces
●  written in XML and extensible


Names and Other Complications

Amos Bairoch

taken from http://web.expasy.org/images/people/Amos_Bairoch.jpg

Ioannis Xenarios

taken from http://www.isb-sib.ch/people/Ioannis.Xenarios


History

1986   A. Bairoch created Swiss-Prot at the University of Geneva; since 1988 in collaboration with EMBL/EBI
1993   launch of ExPASy, together with Ron Appel
1998   foundation of SIB (Swiss Institute of Bioinformatics)
2002   foundation of the UniProt consortium by EBI, SIB and PIR


UniProt Components:

●  UniProtKB:
   - UniProtKB/Swiss-Prot
   - UniProtKB/TrEMBL
●  UniParc: pure sequence archive, no annotations
●  UniRef: consists of three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50), built with the CD-HIT algorithm
●  UniMes: data from metagenomic and environmental samples, not in UniProtKB
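To illustrate the idea behind CD-HIT style clustering, here is a greedy incremental sketch: sequences are processed from longest to shortest, and each one joins the first cluster representative it is similar enough to, otherwise it founds a new cluster. This is a simplification under loud assumptions: the similarity score is `difflib.SequenceMatcher.ratio()`, a stand-in chosen for brevity, whereas CD-HIT uses short-word filtering and alignment-based sequence identity; the helper names are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in score (0..1): difflib's matching-character ratio.
    # CD-HIT itself computes alignment-based sequence identity.
    return SequenceMatcher(None, a, b).ratio()

def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering in the spirit of CD-HIT (simplified).

    seqs maps IDs to sequences; returns representative ID -> member IDs.
    """
    clusters = {}
    # Longest sequences first; ties keep their input order (stable sort).
    for sid, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        for rep in clusters:
            if similarity(seqs[rep], seq) >= threshold:
                clusters[rep].append(sid)     # joins an existing cluster
                break
        else:
            clusters[sid] = [sid]             # becomes a new representative
    return clusters

toy = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQ", "c": "GGGGSSSSPP"}  # made-up sequences
print(greedy_cluster(toy, 0.9))
```

A stricter threshold produces more, smaller clusters, which is the intuition behind the 100 / 90 / 50 levels in the UniRef names.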


ExPASy

●  http://www.expasy.org
●  Expert Protein Analysis System (1993)
●  now: SIB ExPASy Bioinformatics Resources Portal
●  Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res, 40(W1):W597-W603, 2012.


ExPASy Categories

●  Proteomics
●  Genomics
●  Structural Bioinformatics
●  Systems biology
●  Phylogeny/evolution
●  Population genetics
●  Transcriptomics
●  Biophysics
●  Imaging
●  Drug Design


Resource Description

1.  Resource name and description
2.  Maintaining SIB group
3.  Scientific category
4.  Keywords: a controlled vocabulary is used to tag the resource
5.  URL for the web interface and for the download, if available
6.  Software type: website, command line interface, GUI, etc.
7.  Status: green checkbox if currently available


UniProt/SwissProt Statistics

●  Release 2016_05, May 11th
●  taken from http://web.expasy.org/docs/relnotes/relstat.html
●  551.193 sequence entries (548.454 in 2015_05) / 196.822.649 amino acids (195.409.447 in 2015_05)


UniProt/SwissProt Statistics

●  Growth over one year: 2016_05 vs. 2015_05 (2015_05 values in parentheses)

Protein existence (PE)             Entries              %
1. Evidence at protein level       92.536 (85.419)      16.8 (15.6)
2. Evidence at transcript level    57.757 (61.814)      10.5 (11.3)
3. Inferred from homology          387.589 (387.733)    70.3 (70.7)
4. Predicted                       11.358 (11.526)      2.1 (2.1)
5. Uncertain                       1.953 (1.962)        0.4 (0.4)
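The percentage column follows directly from the entry counts. A short check in Python, with the counts copied from the table above (the helper name `shares` is our own), reproduces both the totals of the previous slide and every rounded percentage.

```python
# Protein existence counts from the table: (release 2016_05, release 2015_05).
PE_COUNTS = {
    "1. Evidence at protein level":    (92_536, 85_419),
    "2. Evidence at transcript level": (57_757, 61_814),
    "3. Inferred from homology":       (387_589, 387_733),
    "4. Predicted":                    (11_358, 11_526),
    "5. Uncertain":                    (1_953, 1_962),
}

def shares(column):
    """Total entries and percentage per PE level (one decimal) for one release."""
    total = sum(counts[column] for counts in PE_COUNTS.values())
    return total, {level: round(100 * counts[column] / total, 1)
                   for level, counts in PE_COUNTS.items()}

total_2016, pct_2016 = shares(0)
total_2015, pct_2015 = shares(1)
print(total_2016, total_2015, total_2016 - total_2015)  # 551193 548454 2739
```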


Development

taken from http://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5


More Numbers (rel. 2015_5)

●  Represented species: 13.209
●  Top 20 species: 116.206 sequences, i.e. 21.3% of the total number of sequences

Entries per species    No. of species
1                      5.495
2                      1.899
3                      1.023
4                      657
5                      487
6                      399
7                      289
8                      228
9                      214
10                     122
11-20                  711
21-50                  426
51-100                 213
>100                   1.046


Species Representation (rel. 2015_5)

Top   Frequency   Species
1     20.198      Homo sapiens (Human)
2     16.711      Mus musculus (Mouse)
3     13.888      Arabidopsis thaliana (Mouse-ear cress)
4     7.921       Rattus norvegicus (Rat)
5     6.718       Saccharomyces cerevisiae (Baker's yeast)
6     5.993       Bos taurus (Bovine)
7     5.103       Schizosaccharomyces pombe (Fission yeast)
8     4.433       Escherichia coli K12
9     4.185       Bacillus subtilis
10    4.131       Dictyostelium discoideum (Slime mold)
...   ...         ...


Representation of the Divisions (rel. 2015_5)

●  Archaea: 19340 (4%)
●  Bacteria: 332110 (61%)
●  Eukaryota: 180411 (33%)
●  Viruses: 16593 (3%)


Distribution of Eukaryota (rel. 2015_5)

●  Human: 20199 (11%)
●  Other Mammalia: 46146 (26%)
●  Other Vertebrata: 17823 (10%)
●  Viridiplantae: 36480 (20%)
●  Fungi: 31527 (17%)
●  Insecta: 8781 (5%)
●  Nematoda: 4417 (2%)
●  Other: 15038 (8%)


Length Distribution (rel. 2015_5)

[Figure: length distribution of the sequence entries; only the axis scale (0 to 70000) survived extraction]


Amino Acid Composition (rel. 2015_5)

figure taken from http://web.expasy.org/docs/relnotes/relstat.html; colors: gray = aliphatic, red = acidic, green = small hydroxy, blue = basic, black = aromatic, white = amide, yellow = sulfur


SwissProt Annotation Process

●  defined in http://www.uniprot.org/docs/sop_manual_curation.pdf
●  explained in http://www.uniprot.org/help/manual_curation


Annotation Phases

1.  Sequence curation
2.  Sequence analysis
3.  Literature curation
4.  Family-based curation
5.  Evidence attribution
6.  Quality assurance, integration and update


Sequence Curation

●  more than 95% of the sequences are translated coding sequences (CDS) from INSDC
●  other sources: PDB, direct protein sequencing, projects not submitting to INSDC
●  sequences are selected according to curation priorities (http://www.uniprot.org/program/)
●  results in the "canonical sequence" for a gene/species pair


Steps toward the canonical sequence

●  Entry selection
●  Run BLAST similarity searches to identify additional sequences for the same gene
●  Identify homologs by reciprocal BLAST and phylogeny-based resources
●  Lock the selected entries against other curators to prevent duplicated work


Steps toward the canonical sequence

●  Prepare sequence alignments with T-Coffee, Muscle, ClustalW
●  Merge into the canonical sequence, chosen as the one that is:
   -  the most prevalent
   -  the most similar to orthologous sequences found in other species
   -  the one whose length and amino acid composition allow the clearest description
   -  by default: the longest
●  Record conflicts and variations


Sequence Analysis

●  Several analysis programs are applied to the sequences to predict:
   -  topological features
   -  post-translational modifications
   -  domains
●  all results are manually checked and then included in or excluded from the annotation


Topological Analysis

Tool       Prediction
SignalP    Presence and location of signal peptides
TargetP    Presence and location of transit peptides
Predotar   Mitochondrial, plastid or ER targeting sequences
ESKW       Transmembrane domains
MEMSAT     Transmembrane domains
TMHMM      Transmembrane domains
Phobius    Discriminates transmembrane and signal regions


Post-translational Modification Analysis

Tool            Prediction
GPI-predictor   GPI lipid anchor sites
NetNGlyc        N-glycosylation sites
NetOGlyc        O-glycosylation sites
NMT Predictor   N-terminal myristoylation sites
Sulfinator      Tyrosine sulfation sites


Domain Analysis

Tool       Prediction
ps_scan    internal PROSITE profile, pattern and rule scanning
InterPro   retrieves non-PROSITE motif matches using the InterPro database or InterProScan
Coils      coiled-coil regions
polyAA     internal program which identifies homopolymeric stretches of amino acids
REPEAT     identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40


Automatically selected results are returned in a graphical interface which allows visualisation of the predictions (Figure 1). Selected features are shown in green and unselected features are shown in red. The selected/unselected state of a feature can be toggled by clicking on it.

Figure 1. UniProtKB sequence analysis results displayed in graphical interface

All predictions are manually reviewed and relevant results are selected for inclusion in the entry. The sequence analysis platform then transforms the selected features into UniProtKB annotation by applying a set of automatic annotation rules (Figure 2).

taken from http://www.uniprot.org/docs/sop_manual_curation.pdf


Literature Curation

●  Identification of relevant scientific literature from:
   -  literature and text mining resources (PubMed, EuropePMC, iHOP, TextPresso)
   -  additions from other sources made by the curator
●  Information is extracted from the full text:
   -  general annotations (not position-specific)
   -  position-specific annotations


General Annotations

●  http://www.uniprot.org/help/general_annotation
●  position-independent
●  contain mostly general biological information like: function, catalytic activity, cofactor, enzyme regulation, subunit structure, pathway, ...


Sequence Annotations

●  position-dependent
●  http://www.uniprot.org/help/sequence_annotation
●  regions or sites of interest like post-translational modifications, binding sites, active sites, etc.
●  contain several subsections: molecule processing, regions, sites, amino acid modifications, natural variants, experimental info, secondary structure


Family-based Curation

●  Evaluation and curation of homologs as described above
●  Standardization of the annotation of homologs
●  Propagation of annotation across the homologs to ensure consistency


Evidence Attribution

●  Every annotation is attributed to its original source
●  Every annotation can be traced back and evaluated
●  To distinguish kinds of evidence, 7 codes from the Evidence Code Ontology (ECO) are used for manually curated entries
●  http://www.uniprot.org/help/evidences
●  Additional GO term annotation


done through the use of a subset of evidence codes from the Evidence Code Ontology (ECO) (24). There are seven ECO evidence codes used in manually curated entries as shown in Table 2.

Table 2. Evidence Code Ontology (ECO) codes used during the UniProt manual curation process

ECO code      Term name
              Usage

ECO:0000269   experimental evidence used in manual assertion
              Information for which there is published experimental evidence
ECO:0000303   non-traceable author statement used in manual assertion
              Information based on author statements in scientific articles for which there is no experimental support
ECO:0000250   sequence similarity evidence used in manual assertion
              Information which has been propagated from a related experimentally characterised protein
ECO:0000312   imported information used in manual assertion
              Information which has been imported from another database and manually verified
ECO:0000305   curator inference used in manual assertion
              Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article
ECO:0000255   match to sequence model evidence used in manual assertion
              Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified
ECO:0000244   combinatorial evidence used in manual assertion
              Information which is manually curated based on a combination of experimental and computational evidence

Full details of the evidences used in UniProtKB are available at http://www.uniprot.org/manual/evidences.

4.11 GO annotation
Gene Ontology (GO) terms are assigned based on experimental data from the literature. Relevant terms are identified using the QuickGO (25) browser and are assigned to entries using the Protein2GO curation tool. This tool has been developed within the UniProt group and is used both by UniProt and by other members of the GO Consortium. GO terms are also propagated to homologous proteins where appropriate. The procedure is described in more detail at http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.

4.12 Quality control and integration
All finished entries are run through a series of automated checks which verify a large number of biological rules such as the positions and relevance of amino acids cited in the entry. Any reported errors are corrected. Once an entry has passed the automated checks, it undergoes manual review by a senior curator to ensure that all relevant sequences have been merged, that all relevant literature has been added, that the annotation has been added correctly, and that all relevant sequence analysis results have been included. Once an entry has passed the automated and manual quality control checks, it is integrated into the database.

4.13 Unlock finished entries
Integrated entries are unlocked so that they are available for further curation.

taken from http://www.uniprot.org/docs/sop_manual_curation.pdf


Quality Control and Integration

●  Finished entries run through a series of rule-based checks, concerning especially positions and regions
●  All errors are corrected
●  The entry is manually reviewed by a senior curator
●  Finally, it is integrated into the database
●  The finished entries are unlocked for further curation


Demonstration

●  http://www.uniprot.org/uniprot/P62756#section_features


The Swiss-Prot Flat File

●  http://web.expasy.org/docs/userman.html
●  An entry is composed of different line types
●  Line types have their own format
●  Follows the EMBL Nucleotide Sequence Database format as closely as possible
●  2 sections:
   -  core data (sequence data, citation info, taxonomy)
   -  annotations (function, modifications, domains, secondary and quaternary structure, disease associations, conflicts, etc.)


The following table lists the available two-letter line codes. Each code is followed by three blanks.

Line code   Content                     Occurrence in an entry
ID          Identification              Once; starts the entry
AC          Accession number(s)         Once or more
DT          Date                        Three times
DE          Description                 Once or more
GN          Gene name(s)                Optional
OS          Organism species            Once or more
OG          Organelle                   Optional
OC          Organism classification     Once or more
OX          Taxonomy cross-reference    Once
OH          Organism host               Optional

--continued--


Line code   Content                        Occurrence in an entry
RN          Reference number               Once or more
RP          Reference position             Once or more
RC          Reference comment(s)           Optional
RX          Reference cross-reference(s)   Optional
RG          Reference group                Once or more (optional if RA line)
RA          Reference authors              Once or more (optional if RG line)
RT          Reference title                Optional
RL          Reference location             Once or more
CC          Comments or notes              Optional
DR          Database cross-references      Optional
PE          Protein existence              Once
KW          Keywords                       Optional
FT          Feature table data             Once or more in Swiss-Prot, optional in TrEMBL
SQ          Sequence header                Once
(blanks)    Sequence data                  Once or more
//          Termination line               Once; ends the entry
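Because every line starts with a fixed two-character code followed by three blanks, an entry can be split into its line types with plain string slicing. The sketch below assumes exactly that column layout (code in columns 1-2, content from column 6, sequence lines starting with blanks, `//` ending the entry). The toy entry is made up and heavily shortened; it is not a real UniProtKB record, and `parse_entry` is a hypothetical helper, not a UniProt tool.

```python
from collections import defaultdict

def parse_entry(lines):
    """Group the lines of one flat-file entry by their two-letter line code."""
    entry = defaultdict(list)
    for line in lines:
        if line.startswith("//"):
            break                                   # termination line
        code = line[:2]
        content = line[5:].rstrip()                 # skip code + three blanks
        if code == "  ":                            # sequence data lines
            entry["sequence"].append(content.replace(" ", ""))
        else:
            entry[code].append(content)
    entry["sequence"] = ["".join(entry["sequence"])]
    return dict(entry)

# Made-up, shortened toy entry for illustration only.
SAMPLE = """\
ID   TOY1_HUMAN              Reviewed;          20 AA.
AC   Q00000;
DE   RecName: Full=Hypothetical toy protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
"""

entry = parse_entry(SAMPLE.splitlines())
print(sorted(entry))
print(entry["sequence"][0])
```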


Fields in More Detail

●  ID line: ID   EntryName Status; SequenceLength.
●  Entry name: up to 11 uppercase alphanumeric characters, of the form X_Y
   -  X is a mnemonic code of at most 5 alphanumeric characters
   -  Y is a mnemonic species identification code of at most 5 alphanumeric characters
●  ID   CYC_BOVIN               Reviewed;         104 AA.
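A regular expression can take the ID line apart. This sketch follows the X_Y rule stated above and assumes that the status is either `Reviewed` or `Unreviewed` (the values used in UniProtKB flat files); the helper name `parse_id_line` is our own.

```python
import re

# ID   EntryName  Status;  SequenceLength AA.
# Entry name X_Y: X and Y are each at most 5 uppercase alphanumeric characters.
ID_LINE = re.compile(
    r"ID\s+(?P<name>[A-Z0-9]{1,5}_[A-Z0-9]{1,5})\s+"
    r"(?P<status>Reviewed|Unreviewed);\s+(?P<length>\d+) AA\."
)

def parse_id_line(line):
    m = ID_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not an ID line: {line!r}")
    mnemonic, species = m["name"].split("_")
    return mnemonic, species, m["status"], int(m["length"])

print(parse_id_line("ID   CYC_BOVIN               Reviewed;         104 AA."))
# ('CYC', 'BOVIN', 'Reviewed', 104)
```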


●  AC line: AC   AC_number_1;[ AC_number_2;]...[ AC_number_N;]
●  Accession number: 6 or 10 characters, one character class per position:
   -  6 characters:   [A-N,R-Z][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9]
   -  6 characters:   [O,P,Q][0-9][A-Z,0-9][A-Z,0-9][A-Z,0-9][0-9]
   -  10 characters:  [A-N,R-Z][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9][A-Z][A-Z,0-9][A-Z,0-9][0-9]
●  RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
●  Examples: P12345, Q1AAA9, A0A022YWF9
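The regular expression above can be used directly in Python. The sketch wraps it in a non-capturing group so that `fullmatch()` applies to both alternatives as a whole; `is_accession` and `parse_ac_lines` are hypothetical helper names. By UniProt convention the first accession on the AC lines is the primary one.

```python
import re

# Pattern from the slide, wrapped in (?:...) and with the inner group made
# non-capturing; the matched language is unchanged.
ACCESSION = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

def is_accession(text):
    return ACCESSION.fullmatch(text) is not None

def parse_ac_lines(ac_contents):
    """Collect accessions from the content of one or more AC lines
    (e.g. 'P12345; Q1AAA9;'); the first one is the primary accession."""
    accs = [a.strip() for c in ac_contents for a in c.split(";") if a.strip()]
    bad = [a for a in accs if not is_accession(a)]
    if bad:
        raise ValueError(f"malformed accession(s): {bad}")
    return accs

for acc in ["P12345", "Q1AAA9", "A0A022YWF9", "OA2345", "A0A022YWF"]:
    print(acc, is_accession(acc))
```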


●  DT line: date in the format DD-MMM-YYYY
●  always one of the biweekly release dates
●  always three lines:
   -  date of integration
   -  date of sequence version, sequence version X
   -  date of entry version, entry version X
●  Example:
   DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
   DT   15-OCT-2000, sequence version 2.
   DT   15-DEC-2004, entry version 5.
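The DD-MMM-YYYY dates map onto Python's `datetime.strptime` format `%d-%b-%Y`. This sketch parses the three example lines; it assumes English month abbreviations, since `%b` depends on the current locale (the default C locale is English). `parse_dt_lines` is a hypothetical helper.

```python
import re
from datetime import datetime

DT_LINE = re.compile(r"DT\s+(\d{2}-[A-Z]{3}-\d{4}), (.+)\.")

def parse_dt_lines(lines):
    """Return (date, remark) pairs for the DT lines of an entry."""
    result = []
    for line in lines:
        m = DT_LINE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"not a DT line: {line!r}")
        # %b matches month abbreviations case-insensitively ("FEB" works).
        date = datetime.strptime(m.group(1), "%d-%b-%Y").date()
        result.append((date, m.group(2)))
    return result

dates = parse_dt_lines([
    "DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.",
    "DT   15-OCT-2000, sequence version 2.",
    "DT   15-DEC-2004, entry version 5.",
])
for date, remark in dates:
    print(date.isoformat(), remark)
```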


●  DE lines:
   -  three categories and additional subcategories
   -  contain a recommended name
   -  subcategories: full name, short name, EC number
   -  alternative names: e.g. as an allergen or in biotechnology, ...

DE   RecName: Full=Annexin A5;
DE            Short=Annexin-5;
DE   AltName: Full=Annexin V;
DE   AltName: Full=Lipocortin V;
DE   AltName: Full=Endonexin II;
DE   AltName: Full=Calphobindin I;
DE   AltName: Full=CBP-I;
DE   AltName: Full=Placental anticoagulant protein I;
DE            Short=PAP-I;
DE   AltName: Full=PP4;
DE   AltName: Full=Thromboplastin inhibitor;
DE   AltName: Full=Vascular anticoagulant-alpha;
DE            Short=VAC-alpha;
DE   AltName: Full=Anchorin CII;

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: Full=Filgrastim;
DE   AltName: Full=Lenograstim;
DE   Flags: Precursor;


●  OS line: originating organism, e.g.
   OS   Homo sapiens (Human).
   OS   Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis
   OS   virus-RSA).
●  OC lines: contain the taxonomic classification of the source organism according to http://www.ncbi.nlm.nih.gov/Taxonomy/
●  OC   Node[; Node...].
●  OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
   OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
   OC   Homo.
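Since the OC lines are one semicolon-separated list wrapped over several lines and terminated by a period, they can be joined back into a single lineage. A minimal sketch (hypothetical helper `parse_lineage`) using the human example above:

```python
def parse_lineage(oc_lines):
    """Join continued OC lines into one list of taxonomy nodes."""
    text = " ".join(line[5:].strip() for line in oc_lines)   # drop "OC   "
    return [node.strip() for node in text.rstrip(".").split(";") if node.strip()]

lineage = parse_lineage([
    "OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
    "OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;",
    "OC   Homo.",
])
print(lineage[0], "->", lineage[-1], len(lineage))  # Eukaryota -> Homo 13
print("Mammalia" in lineage)                        # True
```

Keeping the nodes as a list preserves their order from the root of the taxonomy down to the genus, so membership tests such as "is this a mammal?" become simple list lookups.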


RN, RP, RC, RX, RG, RA, RT, RL

●  can occur multiple times
●  the order within the block is fixed
●  e.g.:
   RN   [1]
   RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION
   RP   WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL
   RP   STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221.
   RC   STRAIN=Bristol N2;
   RX   PubMed=11134024; DOI=10.1074/jbc.M008990200;
   RA   Zhang L., Wu S.-L., Rubin C.S.;
   RT   "A novel adapter protein employs a phosphotyrosine binding domain and
   RT   exceptionally basic N-terminal domains to capture and localize an
   RT   atypical protein kinase C: characterization of Caenorhabditis elegans
   RT   C kinase adapter 1, a protein that avidly binds protein kinase C3.";
   RL   J. Biol. Chem. 276:10463-10475(2001).


CC lines

●  free text
●  contain most of the annotated information
●  CC   -!- TOPIC: First line of a comment block;
   CC       second and subsequent lines of a comment block.
●  structured by predefined topics like: Allergen, Alternative Products, ..., Cofactor, ..., Disease, ..., Domain, ..., Function, Interaction, ...


CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative initiation; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=P51636-1; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=P51636-2; Sequence=VSP_018696;
CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO:0000250}; Peripheral
CC       membrane protein {ECO:0000250}. Secreted {ECO:0000250}. Note=The
CC       last 22 C-terminal amino acids may participate in cell membrane
CC       attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.
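The `-!-` marker and the `{ECO:...}` tags make CC blocks easy to process programmatically. This sketch splits CC lines into (topic, text) pairs and resolves evidence tags with the seven manual-assertion codes of Table 2; it assumes the layout shown above, and the helper names are our own.

```python
import re

# The seven ECO codes used for manual assertions (Table 2).
ECO_TERMS = {
    "ECO:0000269": "experimental evidence used in manual assertion",
    "ECO:0000303": "non-traceable author statement used in manual assertion",
    "ECO:0000250": "sequence similarity evidence used in manual assertion",
    "ECO:0000312": "imported information used in manual assertion",
    "ECO:0000305": "curator inference used in manual assertion",
    "ECO:0000255": "match to sequence model evidence used in manual assertion",
    "ECO:0000244": "combinatorial evidence used in manual assertion",
}
EVIDENCE = re.compile(r"\{(ECO:\d{7})\}")

def comment_blocks(cc_lines):
    """Split CC lines into (TOPIC, text) pairs; a block starts with '-!-'."""
    blocks = []
    for line in cc_lines:
        body = line[5:].strip()                        # drop "CC   "
        if body.startswith("-!- "):
            topic, _, text = body[4:].partition(":")   # topic ends at first ':'
            blocks.append((topic.strip(), text.strip()))
        elif blocks:                                   # continuation line
            topic, text = blocks[-1]
            blocks[-1] = (topic, f"{text} {body}".strip())
    return blocks

def evidence_of(text):
    """List (code, term) pairs for every {ECO:...} tag in a comment."""
    return [(code, ECO_TERMS.get(code, "unknown")) for code in EVIDENCE.findall(text)]

cc = [
    "CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO:0000250}; Peripheral",
    "CC       membrane protein {ECO:0000250}. Secreted {ECO:0000250}. Note=The",
    "CC       last 22 C-terminal amino acids may participate in cell membrane",
    "CC       attachment.",
    "CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.",
]
for topic, text in comment_blocks(cc):
    print(topic, evidence_of(text))
```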


Cross References

●  too many to enumerate
●  extensive references to nucleotide databases, e.g.:
   in EMBL:
   FT   CDS             302..2674
   FT                   /protein_id="CAA03857.1"
   FT                   /db_xref="SWISS-PROT:P26345"
   FT                   /gene="recA"
   FT                   /product="RecA protein"
   in Swiss-Prot:
   DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA.
   DR   EMBL; X56491; CAA39846.1; ALT_FRAME; mRNA.
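DR lines share one simple shape: a database name followed by semicolon-separated fields and a final period. The meaning of each field depends on the database, so this minimal sketch (hypothetical helpers `parse_dr_line` and `cross_references`) only splits and groups them, using the two EMBL lines above.

```python
def parse_dr_line(line):
    """Split 'DR   DB; field; field; ....' into (database, [fields])."""
    body = line[5:].strip().rstrip(".")            # drop "DR   " and final period
    fields = [f.strip() for f in body.split(";")]
    return fields[0], fields[1:]

def cross_references(dr_lines):
    """Group all DR lines of an entry by database name."""
    refs = {}
    for line in dr_lines:
        db, fields = parse_dr_line(line)
        refs.setdefault(db, []).append(fields)
    return refs

refs = cross_references([
    "DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA.",
    "DR   EMBL; X56491; CAA39846.1; ALT_FRAME; mRNA.",
])
print(refs["EMBL"])
```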


Keywords / Feature Table

●  KW   Keyword[; Keyword...].
●  helps to search and index the database
●  no limit on the number of keywords:
   KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;
   KW   Apoptosis; Cell adhesion; Coated pits; Copper;
   KW   Direct protein sequencing; Disease mutation; Endocytosis;
   KW   Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding;
   KW   Notch signaling pathway; Phosphorylation; Polymorphism;
   KW   Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal;
   KW   Transmembrane; Zinc.
●  Feature table like GenBank/EMBL/DDBJ


Programmatic Access

●  http://www.uniprot.org/help/programmatic_access (remember this link!)
●  several use cases documented, but not as an API
●  best way: use the web interface to construct/refine your query first, before you try to automate the process


Retrieving an Individual Entry

●  uses simple URLs which can be bookmarked
●  for individual entries: http://www.uniprot.org/uniprot/P12345
●  the default result is a web page
●  alternative formats: txt, xml, rdf, fasta, gff
●  specified via a suffix appended to the accession (e.g. P12345.txt)
●  structured formats like xml or rdf can include referenced entries
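The URL pattern above is easy to generate. This sketch only builds URLs and leaves fetching to the caller; the base URL is the one given on these slides (2016). UniProt has since reorganized its web services, so treat the endpoint as an assumption to check against the current programmatic-access documentation. `entry_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE = "http://www.uniprot.org/uniprot/"   # URL scheme as given on the slide (2016)
FORMATS = {"txt", "xml", "rdf", "fasta", "gff"}

def entry_url(accession, fmt=None):
    """URL of one entry; without fmt the default web page is addressed."""
    if fmt is None:
        return BASE + quote(accession)
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE}{quote(accession)}.{fmt}"   # format chosen by the suffix

print(entry_url("P12345"))
print(entry_url("P12345", "txt"))

# Fetching is left to the caller, e.g.:
#   from urllib.request import urlopen
#   text = urlopen(entry_url("P12345", "txt")).read().decode()
```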


Using the ID mapping service

●  http://www.uniprot.org/help/programmatic_access#batch_retrieval_perl_example
●  uses the HTTP POST method
●  converts between the IDs of different databases
●  you have to know the specific abbreviation of the respective databases


Retrieving Entries via Queries

●  uses the HTTP GET method, i.e. the query string is part of the URL
●  the structure might be quite complex
●  use the browser to configure the query string
●  more settings are available via the query builder: http://www.uniprot.org/help/advanced_search
●  the URL length might be limited to 1000 characters
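A GET query is just a URL with an encoded query string, so `urllib.parse.urlencode` can build it and the length limit can be checked up front. This is a sketch under assumptions: the endpoint is the 2016 one from these slides, and the parameter names `query` and `format` are assumed from the legacy web interface; as recommended above, copy the exact names from a URL built in the browser.

```python
from urllib.parse import urlencode

BASE = "http://www.uniprot.org/uniprot/"   # endpoint as used on these slides
MAX_URL_LENGTH = 1000                       # limit mentioned above

def query_url(query, fmt="tab", **extra):
    """Encode a search as a GET query string ('query'/'format' are assumed names)."""
    params = {"query": query, "format": fmt, **extra}
    url = BASE + "?" + urlencode(params)    # spaces -> '+', ':' -> '%3A'
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"URL has {len(url)} characters; shorten the query")
    return url

print(query_url("organism:9606 AND reviewed:yes"))
```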


Examples

●  http://www.uniprot.org/uniprot/P12345.txt
●  http://www.uniprot.org/uniprot/P12345.xml
●  http://www.uniprot.org/uniref/UniRef90_P04259.xml
●  http://www.uniprot.org/uniref/UniRef90_P04259.rdf
●  http://www.uniprot.org/uniref/UniRef90_P04259.fasta
●  http://www.uniprot.org/uniref/UniRef90_P04259.tab