Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf ·...
Transcript of Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf ·...
![Page 1: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/1.jpg)
Install and run external command line softwares
YanbinYin
1
![Page 2: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/2.jpg)
Homework#8Createafolderunderyourhomecalledhw8
Changedirectorytohw8
DownloadEscherichia_coli_K_12_substr__MG1655_uid57779faa filefromNCBIandcopyitasmg1655.faa
DownloadEscherichia_coli_O157_H7_uid57781allfaa files fromNCBIandconcatenatethemaso157h7.faa
Createunix one-linerstocounthowmanyproteinsarethereinthetwonewlymadefaa files
Createunix one-linerstodothefollowing:
- BLASTPsearchofmg1655.faavs.o157h7.faaandfindouthowmanymg1655.faaproteinshavehitsino157h7.faa(useE-value<1e-5tofilterhits)
- BLASTPsearchofo157h7.faavs.mg1655.faaandfindouthowmanyo157h7.faaproteinshavehitsinmg1655.faa(useE-value<1e-5tofilterhits)
2
Writeareport(inwordorppt)toincludealltheoperationsandscreenshots.
DueonNov21(sendbyemail,ifthereare2+files,puttheminazipfile;includeyourlastnameinthefilename)
![Page 3: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/3.jpg)
Programs/toolsweoftenuse
• BLAST• FASTA• HMMER• EMBOSS• lftp• bioperl• R
• Clustalw• MAFFT• MUSCLE• SRAtoolkit• weblogo• PhyML• FastTree• RaxML• USEARCH• ...
http://gacrc.uga.edu/3
![Page 4: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/4.jpg)
4
OnWINDOWS,youdownloadsomesoftwares usuallyinexeformat.Youdoubleclickonsomeiconandwindowsinstallerwilldotherestforyou.Usuallyyouwillbeasked:whereyouwanttohavethisprograminstalledto(C:\WINDOWS\Programs\)
OnLinux,it’sverydifferent.Youdownloadthesoftwarepackage,usuallyincompressedformat(.gz or.zip).Youhavetotypeinaseriesofcommandstoinstallit,oftenwriteittoasystemfolder(e.g./usr/bin),whichyoudon’thaveaccessunlessyouaretherootuser
![Page 5: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/5.jpg)
Linux-basedprogramtypes
• SourcecodesinC,C++,Java,Fortranetc.– Needtobecompiledbeforeexecutethecommand
• Precompiledexecutables orbinarycodes• Sourcecodesinscriptinglanguages(perl,python,Retc.)– Canexecutedirectly
Manyprogramsneeddependencies(ifordertoinstallprogramA,youneedinstallBfirst…)
5
![Page 6: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/6.jpg)
Onyourownubuntu machine…• Youaretheroot(administrator)andusingthesudo
commandyoucaninstallanythingyouwantintothesystemdirectory(/usr/bin/,/bin/,/lib/etc.)– apt-get(AdvancedPackagingTool)candomanyinstallationsfor
youfromsourceorbinarycodeshttps://help.ubuntu.com/12.04/serverguide/apt-get.html
• OnSer,youarenottherootandyoucanonlyinstallthingsunderyourhomeusingthe“hard”way– Download->unpack->install->editPATHenvironmentalvariable– Makesureyoucreatefoldersforeachtools,e.g.
yourhome/tools/fasta
6
NotonSer
![Page 7: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/7.jpg)
InstallBLASTonyourownmachine[testifinstalled]sudo apt-get install blast2[testifinstalled]blastall[whereitinstalled]which blastall[ifyouwanttouninstall]sudo apt-get remove blast2[testifit’sgone]blastall
[newversionofblast]sudo apt-get install ncbi-blast+
7
NotonSer
![Page 8: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/8.jpg)
Useapt-gettoinstall
lftp,emboss,hmmer,bioperl,clustaw,muscle,Rsudo apt-get install xxx
[totestifinstalled,typeinthecommand]
8
![Page 9: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/9.jpg)
OnyourownMAC
http://www.macobserver.com/tmo/article/install_the_command_line_c_compilers_in_os_x_lionInstallXcode,thenCcompiler,thenyoucaninstallMacporthttp://www.macports.org/install.phpWithmacport,youcaninstallwget:sudo port install wgetlftp: sudo port install lftphmmer:sudo port install hmmeremboss:sudo port install embossR:sudo port install Rblast:http://www.blaststation.com/freestuff/en/howtoNCBIBlastMac.html
9
http://www.digimantra.com/howto/apple-aptget-command-mac/http://superuser.com/questions/173088/apt-get-on-mac-os-x
![Page 10: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/10.jpg)
10
Installprogramsusingthehardway(yourareNOTtheroot)
Usuallywhenabioinformaticstoolisreleased,thetoolpackagealsoincludesareadme,install ormanualfileinadditiontotheprogramitself.
Inmostcases,therearemorethanonefilesintheprogramfolder.
Sometimesyouwillseefilesinmultipledifferentlanguages.
Thereadmeorinstallfilewilltellyouhowtoinstallthetool.
Basicsteps:downloadthepackage->unpack(untar,unzip)->readthereadme->install
![Page 11: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/11.jpg)
InstallBLAST executable programBLAST source is written C and C program needs to be compiled to get executable programBut here we will download executables of BLAST
lftp ftp.ncbi.nih.gov:/blast/executables/LATEST
get ncbi-blast-2.7.1+-x64-linux.tar.gz
bye
Now you returned to Serls -l
mkdir tools
cd tools
mkdir blast
mv ../ncbi-blast-2.5.0+-x64-linux.tar.gz blast
cd blast
Unpack the tar balltar -zxf ncbi-blast-2.7.1+-x64-linux.tar.gz
ls -l
cd ncbi-blast-2.7.1+/bin
ls -l
./blastp -h
pwd 11
![Page 12: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/12.jpg)
12
Youareatyourhomenano .bashrc
Addthefollowinginthebeginningofthefileexport PATH="$HOME/tools/blast/ncbi-blast-2.7.1+/bin:$PATH"
Nowexitfromnano anddothefollowing
. .bashrcblastp -h
Now return to your homecdblastp -hwhich blastpYou will be told that the blast version is BLAST 2.2.25+, and blastp in your current path is /usr/bin/blastp. This is an older version I installed two years ago, NOT the one you just installed. You have to give the full path to run the blast version in your home
/home/yyin/tools/blast/ncbi-blast-2.7.1+/bin/blastp
What if you don’t want to type this long path? You can add it to a hidden file in your home called .bashrc where Shell auto execute every time you log in: Shell will be notified that when you call blastp, go to that folder to find it.
![Page 13: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/13.jpg)
EnvironmentvariableAnenvironmentvariableisanamedobjectthatcontainsdatausedbyoneormoreapplications.Thevalueofanenvironmentalvariablecanforexamplebethelocationofallexecutablefilesinthefilesystem,thedefaulteditorthatshouldbeused,orthesystemlocalesettings.UsersnewtoLinuxmayoftenfindthiswayofmanagingsettingsabitunmanageable.However,environmentvariablesprovidesasimplewaytoshareconfigurationsettingsbetweenmultipleapplicationsandprocessesinLinux.
env tolistallbuilt-inenvironmentvariable
PATH isaveryimportantenvironmentvariable.Thissetsthepaththattheshellwouldbelookingatwhenithastoexecuteanyprogram.Itwouldsearchinallthedirectoriesthataredefinedinthevariable.Rememberthatentriesareseparatedbya':'.Youcanaddanynumberofdirectoriestothislist.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
13
![Page 14: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/14.jpg)
Run BLAST in command line mode
14
http://www.ncbi.nlm.nih.gov/books/NBK52640/
![Page 15: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/15.jpg)
15
WhyyoudoneedtorunBLASTincommandlineterminal?
- NCBI’sserverdoesnothaveadatabasethatyouwanttosearch
- MillionsofusersareusingNCBIBLASTservertoo
- Yourquerysethasmorethanonesequenceorevenagenome
- TheBLASToutputcouldbeprocessedandusedasinputforotherLinuxsoftwares
![Page 16: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/16.jpg)
16
Fordoingblastsearch
1. Determinewhatisyourqueryanddatabase,andwhatisthecommand
2. Formatyourdatabase
3. Runtheblastcommand(whatE-valuecutoff,whatoutputformat)
4. Viewtheresult
5. Parsetheresult(hitIDs,hitsequences,alignment,etc.)
TherearetwoversionsofBLAST-blast2(legacyblast)-blast+(thisiswhatyouinstalledinyourhome/tools/)
https://www.biostars.org/p/9480/http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
![Page 17: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/17.jpg)
BLAST2(legacyblast)blastall - | less-p # specify blastp, blastn, blastx, tblastn, tblastx
More commands in blast package
formatdb (format database)megablast (faster version of blastn)rpsblast (protein seq vs. CDD PSSMs)impala (PSSM vs protein seq)bl2seq (two sequence blast)blastclust (given a fasta seq file, cluster them based on sequence similarity)blastpgp (psi-blast, iterative distant homolog search)
17
QueryProtein
DNA
DatabaseProteinDNA
http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/ch4.pdf
![Page 18: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/18.jpg)
blastall options-pprogramname-ddatabasefilename(textfasta sequencefile)-i queryfilename-ee-valuecutoff(showhitslessthanthecutoff)-moutputformat-ooutputfilename(youcanalsouse>)-Ffilterlow-complexityregionsinquery-vnumberofone-linedescriptiontobeshown-bnumberofalignmenttobeshown-anumberofprocesserstobeused
18
![Page 19: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/19.jpg)
Newversionblast+,e.g.blastp
blastp -help | less
-query query filename-db databasefilename-out outputfilename-evalue e-valuecutoff-outfmt outputformat-num_descriptions
-num_alignments
19
blastpblastntblastnblastxtblastx
Atyourhomels -l tools/ncbi-blast-2.5.0+/bin/22executablefiles(commands)
http://www.ncbi.nlm.nih.gov/books/NBK1763/pdf/CmdLineAppsManual.pdf
![Page 20: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/20.jpg)
20
Exercise1:findCesA/Csl homologoussequenceincharophytic greenalgalgenome
Klebsormidium flaccidum genomewassequenced,analyzedandpublishedinhttp://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052687/.WewanttofindouthowmanyCesA/Csl genesthisgenomehas.
NotethattherawDNAreadsareavailableinGenBank,buttheassembledgenomeandpredictedgenemodelsarenot
GotoMethodsoftheabovepaperandfindthelinktothewebsitethatprovidesthegenomeandannotationdata
Wget thepredictedproteinsequencefilewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_kfl_initial_genesets_v1.0_AA.fasta
Makeindexfilesforthedatabasemakeblastdb -in 131203_kfl_initial_genesets_v1.0_AA.fasta -parse_seqids -dbtype 'prot’
Checkhowmanyproteinshavebeenpredictedless 131203_kfl_initial_genesets_v1.0_AA.fasta | grep '>' | wc
![Page 21: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/21.jpg)
21
Wget ourquerysetcp /home/yyin/cesa-pr.fa .
Dotheblastp searchblastp -query cesa-pr.fa -db 131203_kfl_initial_genesets_v1.0_AA.fasta -out cesa-pr.fa.out -outfmt 11
Checkouttheoutputfileless cesa-pr.fa.out
Changetheoutputformattoatab-delimitedfileblast_formatter -archive cesa-pr.fa.out -outfmt 6 | less
FiltertheresultusingE-valuecutoffblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | less
blast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | less
![Page 22: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/22.jpg)
22
Howdoyouextractthesequencesoftheblasthits?
DATABASEQuerySeqs HitIDs
HitSeqs
Furtheranalyses:MultiplealignmentPhylogeny,etc.
Linuxcmds
blastdbcmd
blast
blast_formatter
Linuxcmds
![Page 23: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/23.jpg)
23
SavethehitIDsblast_formatter -archive cesa-pr.fa.out -outfmt 6 | awk '$11<1e-10' | cut -f2 | sort -u > cesa-pr.fa.out.id
Retrievethefasta sequencesofthehitsblastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | less
blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//' | less
blastdbcmd -db 131203_kfl_initial_genesets_v1.0_AA.fasta -entry_batch cesa-pr.fa.out.id | sed 's/>lcl|/>/' | sed 's/ .*//’ > cesa-pr.fa.out.id.fa
Wget theannotationfilefromtheJapanwebsitewget -q http://www.plantmorphogenesis.bio.titech.ac.jp/~algae_genome_project/klebsormidium/kf_download/131203_gene_list.txt
Checkwhatourgenesareannotatedtobegrep -f cesa-pr.fa.out.id 131203_gene_list.txt
![Page 24: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/24.jpg)
24
Ifaprogram(e.g.BLAST)runssolongonaremoteLinuxmachinethatitwon’tfinishbeforeyouleaveforhome…
Orifyousomehowwanttorestartyourlaptop/desktopwhereyouhaveaPuttysessionisrunning(Windows)orashellterminalisrunning(Ubuntu)…
Inanycase,youhavetoclosetheterminalsession(orhaveitbeautomaticallyterminatedbytheserver).Ifthishappens,yourprogramwillbeterminatedwithout finishing.Ifyouexpectyourprogramwillrunforaverylongtime,e.g.longerthan10hours,youmayput“nohup”beforeyourcommand;thisensuresthatevenifyouclosetheterminal,theprogramwillstillruninthebackgrounduntilitisfinishedandyoucanloginagainthenextdaytochecktheoutput.Forexample:
nohup blastp -query yeast.aa -db yeast.aa -out yeast.aa.ava.out-outfmt 6 &
Youwillgetanadditionalfilenohup.out intheworkingfolderandthisfilewillbeemptyifnothingwronghappened.
![Page 25: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/25.jpg)
25
Youareatyourhomecd toolslftp emboss.open-bio.org
Insideembosslftp sitecd pub/EMBOSSget emboss-latest.tar.gz
Byetoexitfromlftpbye
NowyouarebacktoyourhomeonSertar zxf emboss-latest.tar.gz
./configure –prefix=$HOME/tools/emboss
make
make install
Ifyouaretheroot,onecommandcandoalltheaboveforyousudo apt-get install emboss
Testifembosshasbeeninstalled:
water
Exercise2:emboss
![Page 26: Install and run external command line softwaresbcb.unl.edu/yyin/teach/PBB/linux-cmd4.pdf · Homework #8 Create a folder under your home called hw8 Change directory to hw8 Download](https://reader033.fdocuments.net/reader033/viewer/2022053011/5f0e6f7f7e708231d43f3bb1/html5/thumbnails/26.jpg)
Changesequenceformatseqret –help
seqret -sequence /home/yyin/Unix_and_Perl_course/Data/GenBank/E.coli.genbank -outseqE.coli.genbank.fa -sformat genbank -osformat fasta
Calculatesequencelengthinfoseq –help
infoseq -sequence /home/yyin/cesa-pr.fa -name –length -only
Morecommandexamples:needle –helpwater –helpfuzznuc –helppepstats -helppepinfo –help
26
plotorf -helptranseq -helpprettyseq –help