EBI web resources II: Ensembl and InterPro Yanbin Yin Fall 2014 1
EBI web resources III: Web-based tools in Europe (EBI, ExPASy, EMBOSS, DTU)
Yanbin Yin
1
Homework assignment 4
1. Download http://cys.bios.niu.edu/yyin/teach/PBB/purdue.cellwall.list.lignin.fa to your computer.
2. Select a C3H protein and an F5H protein from the above file and calculate the sequence identity between them using the Water server at EBI.
3. Perform a multiple sequence alignment using MAFFT with all FASTA sequences in the file.
4. Build a phylogeny with the alignment using the "A la Carte" mode at http://www.phylogeny.fr/
5. Build another phylogeny starting from the unaligned sequences using the "one-click" mode at http://www.phylogeny.fr/; if you encounter any error reports, try to figure out why and how to solve them (hint: skip the Gblocks step).
Write a report (in Word or PPT) that includes all the operations, screenshots and the final phylogenies from steps 4 and 5.
Due on 10/11 (send by email)
2
Office hour: Tue, Thu and Fri 2-4pm, MO325A, or email: [email protected]
Outline
• Hands-on exercises!
3
Pairwise alignment (including database search) tools
4
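[Figure: pairwise alignment and database search tools placed along two axes, faster to slower and fewer to more matches: BLAST, FASTA, SSEARCH, BLAT, BWA, Bowtie, PSI-BLAST, PSI-Search, HMMER3, RPS-BLAST]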
5
http://www.ebi.ac.uk/
Go to the bottom of the page
Click "by names (A-Z)"
6
This is a very long list of tools. Scroll down to find FASTA,
or press Ctrl+F and type fasta
We are going to try the FASTA tool
Click on FASTA [nucleotide]
7
Click Genomes
We're going to search the Arabidopsis genome
8
Click on this little arrow and choose Arabidopsis
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa, copy the first seq (CesA) and paste it here
Change here to protein
9
Tfastx: allows frameshifts between codons
Tfasty: also allows frameshifts within codons
Both tolerate sequence errors
Good for finding pseudogenes
http://www.ebi.ac.uk/Tools/sss/fasta/help/index-genomes.html
10
Should be finished very quickly
Raw output (plain text)
Graphical presentation of the output
Show EMBL format of the subject (hit)
Show alignment
11
In the alignment, look for: / frameshift, \ frameshift, * stop codon
We are at the raw output view
To align the query protein to the subject genomic DNA, the reading frame has to move 1 or 2 bases ahead (a 1-base or 2-base insertion)
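To see why the reading frame has to move, here is a minimal Python sketch (an illustration, not how FASTA implements tfastx/tfasty) that translates a short made-up coding sequence with the standard genetic code and then inserts one extra base. Everything downstream of the insertion is mistranslated unless translation skips that base. The sequence and the helper name translate are hypothetical.

```python
# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


# Hypothetical coding sequence, and a copy with one extra base (T) after codon 2.
original = "ATGGCTGCAAAACTGGGT"
mutated = original[:6] + "T" + original[6:]

print(translate(original))                              # MAAKLG
print(translate(mutated))                               # MACKTG (frame disrupted)
print(translate(mutated[:6]) + translate(mutated[7:]))  # MAAKLG (inserted base skipped)
```

Skipping one base after the insertion is the "move 1 base ahead" that the / and \ symbols mark in the FASTA alignment.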
12
BLAST gives a shorter alignment because its alignment breaks where it sees a frameshift
13
Go back to the tool A-Z page: http://www.ebi.ac.uk/services/all
Press Ctrl+F and type ssearch
SSEARCH is a program in the FASTA package implementing the Smith-Waterman algorithm
SSEARCH can only do protein-protein or nucleotide-nucleotide searches
Slower, but the most sensitive, because it computes optimal local alignments instead of using heuristics
14
Go back to the tool A-Z page: http://www.ebi.ac.uk/services/all
Press Ctrl+F and type emboss
EMBOSS: European Molecular Biology Open Software Suite
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice, P., Longden, I. and Bleasby, A. Trends in Genetics 16(6), pp. 276-277
EMBOSS contains hundreds of computer programs for sequence analysis
15
Needleman-Wunsch algorithm
Smith-Waterman algorithm
Equivalent to the bl2seq command of the BLAST package
Let's try needle first
16
Global vs local alignment:
• In a local alignment, you try to match your query with a substring (a portion) of your subject (reference)
• In a global alignment, you perform an end-to-end alignment with the subject
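A minimal Python sketch of the difference, using a toy scoring scheme (match +1, mismatch -1, linear gap -2). These values are assumptions for illustration; needle and water use substitution matrices such as BLOSUM62 with affine gap penalties. The same dynamic-programming recurrence gives a Needleman-Wunsch (global) score when the first row and column accumulate gap penalties, and a Smith-Waterman (local) score when every cell is floored at zero and the best cell anywhere is reported. The function name alignment_score is hypothetical.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized, so the first row/column accumulate gap costs.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a negative-scoring prefix is discarded by restarting at zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global uses the bottom-right cell (end to end); local uses the best cell anywhere.
    return best if local else score[rows - 1][cols - 1]


query, subject = "ACGT", "TTTTACGTTTTT"
print("global:", alignment_score(query, subject))              # global: -12
print("local: ", alignment_score(query, subject, local=True))  # local:  4
```

A short query embedded in a long subject scores well locally but is penalized globally for the eight unmatched subject positions, which is why water is the better choice for finding the best-matching region.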
17
CslA: 539 aa, CesA: 1089 aa
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa
Copy & paste CesA
Copy & paste CslA
18
Legend: - gap, . negative score, : positive score, | identical
This is different from how BLAST shows the alignment
Not a database search, so no E-value is reported
This is needle output
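needle and water summarize each alignment with Length, Identity, Similarity and Gaps lines. A minimal Python sketch of the identity and gap counts from two aligned strings, assuming (as in the EMBOSS reports) that percentages are taken over the full alignment length including gap columns. Similarity is left out because it depends on the substitution matrix. The function name and the example strings are hypothetical.

```python
def identity_stats(aln_a: str, aln_b: str) -> dict:
    """Count identical and gapped columns in a pairwise alignment ('-' = gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    length = len(aln_a)
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(aln_a, aln_b) if "-" in (x, y))
    return {
        "length": length,
        "identity": identical,
        "gaps": gaps,
        "identity_pct": 100.0 * identical / length if length else 0.0,
        "gaps_pct": 100.0 * gaps / length if length else 0.0,
    }


# Identity 4/6 (about 66.7%), gaps 1/6
print(identity_stats("MKV-LA", "MRVQLA"))
```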
19
This is water output
The best way to find the optimally aligned regions and calculate the similarity between two sequences
20
Select here to try blast2seq
21
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa
Copy & paste CesA
Copy & paste CslA
Choose blastp
22
Fragmented alignments
This is blast2seq output
Multiple sequence alignment tools
Foundation for many further analyses: phylogeny, evolution, motifs, protein families, etc.
23
24
http://www.ebi.ac.uk/Tools/msa/
The MSA page shows nine tools; we're going to try Clustal Omega, MAFFT and MUSCLE
25
26
You can always check the help page
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here
Then submit
This is the Clustal Omega page
27
This is called the Clustal format of an MSA
Amino acids are colored based on chemical properties, e.g., acidic amino acids in blue
Check the evolutionary relatedness
Get a text-format summary of the results
28
Text format describing the relatedness; it can be visualized graphically as a tree
The matrix tells how similar each pair of seqs is
Both can be copy-pasted into Notepad and saved as plain text files
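Once the Clustal-format alignment is saved as plain text, a few lines of Python can read it back into one aligned string per sequence (gaps kept). This is a simplified parser written for illustration: it assumes sequence names contain no spaces and that conservation lines start with whitespace, as in Clustal Omega output. Biopython's Bio.AlignIO (format name "clustal") handles the format more robustly. The function name and the example alignment are hypothetical.

```python
def parse_clustal(text: str) -> dict[str, str]:
    """Join the blocks of a CLUSTAL-format alignment into one string per sequence."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # blank line or header
        if line[0].isspace():
            continue  # conservation line made of '*', ':' and '.'
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # an optional residue count may follow
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


EXAMPLE = """CLUSTAL O(1.2.4) multiple sequence alignment


seqA      MKV-LA 5
seqB      MRVQLA 6
          *:* **

seqA      GG 7
seqB      G- 7
          *
"""

for name, aligned in parse_clustal(EXAMPLE).items():
    print(name, aligned)  # seqA MKV-LAGG, then seqB MRVQLAG-
```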
29
30
This is the MAFFT page
You can always check the help page
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here
Change the output format here to ClustalW
31
IDs were truncated
Force the first M residue to be aligned
32
This is the MUSCLE page
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here
Change the output format here to ClustalW
33
34
Molecular Systems Biology 7:539, 2011
Accuracy and speed: MAFFT > Clustal Omega > MUSCLE >> ClustalW
http://mafft.cbrc.jp/alignment/software/about.html
So which MSA tool should I use?
35
http://www.ebi.ac.uk/Tools/msa/
Visualize alignment
36
You can always check the help page
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln and copy-paste the MSA built above
This is the MView page
MView reformats the results of a sequence database search (BLAST, FASTA, etc.) or a multiple alignment (MSF, PIR, CLUSTAL, etc.), adding optional HTML mark-up to control colouring and web page layout. MView is not a multiple alignment program, nor is it a general purpose alignment editor.
37
Consensus letters are explained at http://bio-mview.sourceforge.net/manual/manual.html#ref-output-formats
38
Another MSA visualization tool: ESPript http://espript.ibcp.fr/ESPript/ESPript/
Click here to start
39
Copy the alignment at http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln, paste it into a txt file, save it on your desktop, and upload it here
After the file is uploaded
40
A new window pops up; view it in PDF
41
We just tried the very basic function. This web server has many more useful functions, such as displaying secondary structures along with the MSA. To learn more: http://espript.ibcp.fr/ESPript/ESPript/esp_tutorial.php
42
ExPASy: Expert Protein Analysis System at SIB
A collection of external/internal tools
43
http://expasy.org/
Click on Genomics, then Sequence alignment
This website collects and classifies web links to hundreds of bioinformatics tools
44
This page lists tools for sequence alignment
We're going to try this one
45
http://weblogo.berkeley.edu/logo.cgi
Upload the file that we downloaded from http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln
Toggle this to show the logo in multiple lines
46
Click to increase
47
You can also copy-paste a segment of the alignment into WebLogo; there is no need to use the entire alignment
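The height of each stack in a sequence logo reflects how conserved that alignment column is, in bits. A minimal Python sketch of the basic calculation, information = log2(alphabet size) minus the Shannon entropy of the column. It ignores gaps and omits the small-sample correction that WebLogo can apply, so its numbers are illustrative and will not match WebLogo exactly. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment: list[str], alphabet_size: int = 20) -> list[float]:
    """Per-column stack heights for equal-length aligned sequences."""
    return [column_information("".join(col), alphabet_size) for col in zip(*alignment)]


# Toy DNA alignment; use alphabet_size=20 for protein alignments such as cesa-pr.fa.aln.
toy = ["ACGT", "ACGA", "ACTT", "ACGC"]
print([round(h, 2) for h in logo_heights(toy, alphabet_size=4)])  # [2.0, 2.0, 1.19, 0.5]
```

A fully conserved DNA column reaches the 2-bit maximum; for proteins the maximum is log2(20), about 4.32 bits.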
48
Paste the copied segment here
49
With an MSA you can build a phylogeny to describe the relatedness of seqs
Seqs → MSA → Phylogeny → Graph
We are going to try this website
50
http://phylogeny.lirmm.fr/phylo_cgi/index.cgi
Three modes of phylogeny reconstruction
Try the one-click mode
51
The one-click mode uses these tools
Give this job a name
http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa
Gblocks is a program that automatically removes poorly aligned positions from the alignment
52
53
EMBOSS: European Molecular Biology Open Software Suite
54
http://emboss.sourceforge.net/
EMBOSS contains hundreds of computer programs written in the C language for sequence analysis
The best way to use it is to install it on a Linux computer
Here we're going to try some public web servers that have the EMBOSS package installed
55
Many other servers are not accessible, but this one is
56
This is called EMBOSS Explorer, a web interface for running EMBOSS programs
350+ programs are organized into different groups
We will try a few programs in this package
57
The most basic one: translate a nucleotide seq into an amino acid seq (related to finding open reading frames)
Find the program transeq in the nucleic translation group
58
Copy and paste the seq in http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa
It's an assembled transcript from EST data of some algal species
We do not know whether it really encodes a protein and, if so, where the ORF is
Remember that mRNA contains untranslated regions (UTRs)
Choose all six frames
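transeq's six-frame option translates the three forward frames and the three frames of the reverse complement. A minimal Python sketch of the same idea with the standard genetic code. It numbers reverse frames by their starting offset in the reverse complement, which may not match transeq's default labeling of frames -1 to -3, so compare the translations rather than the labels. The helper names and the example sequence are hypothetical.

```python
# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons from `offset`; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """+1..+3 start at offsets 0..2 of the input; -1..-3 at offsets 0..2 of its reverse complement."""
    rc = reverse_complement(dna)
    frames = {f"+{k + 1}": translate(dna, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


for frame, protein in six_frames("ATGGCTGCAAAACTGGGT").items():
    print(frame, protein)
```

To try it on nt-example.fa, pass the sequence without its ">" header line; a frame with a long stretch free of '*' is the likely coding frame.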
59
This is likely the right frame
60
Is this the correct result? You can take the nt seq and run BLAST at NCBI
Put the seq ID here because it is already in GenBank
Choose Swiss-Prot because it is smaller and of high quality
61
Click Formatting options
Choose plain text view
Click Reformat
62
This is the alignment of our query with the best hit; the frame is +2, the same as the transeq result
63
64
This is the longest ORF
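One simple way to find the longest ORF programmatically: translate all six frames and, in each stretch between stop codons, keep the part from the first Met onward. A minimal Python sketch under that ATG-to-stop definition; it also accepts an ORF that runs off the end of the sequence, which is plausible for a partial EST assembly. Dedicated ORF finders offer more control (e.g., minimum length, alternative start codons). The helper names and the example sequence are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, offset: int = 0) -> str:
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    rc = dna.upper().translate(COMPLEMENT)[::-1]
    frames = {f"+{k + 1}": translate(dna, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


def longest_orf(dna: str) -> tuple[str, str]:
    """Return (frame, protein) for the longest Met-initiated stretch without a stop codon."""
    best_frame, best_orf = "", ""
    for frame, protein in six_frames(dna).items():
        for segment in protein.split("*"):  # segments between stop codons
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best_orf):
                best_frame, best_orf = frame, segment[start:]
    return best_frame, best_orf


print(longest_orf("CCATGAAATTTTAAGG"))  # ('+3', 'MKF')
```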
65
ATGCGCTA (original)
TACGCGAT (complement)
TAGCGCAT (reverse complement)
http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa
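The complement and reverse-complement steps above need only a translation table and a string reversal. A minimal Python sketch; it handles lowercase and N but not other IUPAC ambiguity codes, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(dna: str) -> str:
    """Base-by-base complement (A<->T, C<->G); N stays N."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement, then reverse, giving the opposite strand read 5' to 3'."""
    return complement(dna)[::-1]


print(complement("ATGCGCTA"))          # TACGCGAT
print(reverse_complement("ATGCGCTA"))  # TAGCGCAT
```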
66
67
ATGCGCTA
TGCGCT (region 2-7)
http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa
Region 57-400
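Region coordinates here are 1-based and inclusive on both ends, so region 2-7 of ATGCGCTA is TGCGCT. A minimal Python sketch of the conversion to a Python slice; accepting comma-separated regions and concatenating the pieces is an assumption made for illustration, and the function name is hypothetical.

```python
def extract_regions(seq: str, regions: str) -> str:
    """Extract 1-based, inclusive regions such as "57-400" or "2-7,10-12" and join them."""
    pieces = []
    for region in regions.split(","):
        start_text, end_text = region.strip().split("-")
        start, end = int(start_text), int(end_text)
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {region.strip()!r} is outside 1..{len(seq)}")
        # 1-based inclusive [start, end] corresponds to the Python slice [start - 1:end].
        pieces.append(seq[start - 1:end])
    return "".join(pieces)


print(extract_regions("ATGCGCTA", "2-7"))      # TGCGCT
print(extract_regions("ATGCGCTA", "1-2,7-8"))  # ATTA
```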
68
69
Calculate GC content
ATGCGCTA: GC% = 50%
Change to yes to get a plot
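GC content is (G + C) / length. A minimal Python sketch that also computes it in sliding windows, the kind of profile a GC plot draws along a sequence; the window and step sizes are arbitrary choices here, and the function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int = 4, step: int = 1) -> list[tuple[int, float]]:
    """(1-based window start, GC%) for each full window along the sequence."""
    return [
        (start + 1, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_percent("ATGCGCTA"))     # 50.0
print(gc_windows("ATGCGCTA", 4))  # [(1, 50.0), (2, 75.0), (3, 100.0), (4, 75.0), (5, 50.0)]
```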
70
71
ATGCGCTA
16 possible dinucleotides, 64 possible trinucleotides, 256 possible tetranucleotides
Default is dinucleotides
72
Expected frequency if all 16 dinucleotides occur equally often: 1/16
Application: scan a genome to look for regions with abnormal composition
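With overlapping words, a sequence of length n contains n - k + 1 words of length k, and under equal base composition each of the 4^k possible words is expected with frequency 1/4^k (1/16 for dinucleotides). A minimal Python sketch that reports the observed count, observed frequency and observed/expected ratio for every word; it assumes equal base frequencies for the expectation, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq: str, k: int = 2) -> list[tuple[str, int, float, float]]:
    """Return (word, count, observed frequency, observed/expected) for all 4**k DNA words.

    Expected frequency assumes equal base composition: 1 / 4**k.
    Words containing non-ACGT characters count toward the total but are not listed.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    expected = 1 / 4 ** k
    rows = []
    for word in ("".join(p) for p in product("ACGT", repeat=k)):
        observed = counts[word] / total if total else 0.0
        rows.append((word, counts[word], observed, observed / expected))
    return rows


# ATGCGCTA has 7 overlapping dinucleotides; GC occurs twice.
for word, count, freq, ratio in kmer_composition("ATGCGCTA"):
    if count:
        print(f"{word}  count={count}  freq={freq:.3f}  obs/exp={ratio:.2f}")
```

Words with obs/exp far from 1 in a genome window are the "abnormal composition" signals mentioned above.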
73
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste the 1st seq
74
75
Popular tools developed at the Technical University of Denmark
76
Google: cbs dtu
77
http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa
78
This lists all the ATGs in the seq; each is scored to indicate its likelihood of being a start codon
79
80
Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste the 1st seq
81
82
Next class: ClustalX and MEGA
83