BMS 353 Bioinformatics for Biomedical Science · opendsi.cc/bioinformatics/assets/Lecture_Wk7.pdf · Part...

BMS353: BMS 353 Bioinformatics for Biomedical Science. Module coordinator: Dr Marta Milo

Page 1: BMS 353 Bioinformatics for Biomedical Science · opendsi.cc/bioinformatics/assets/Lecture_Wk7.pdf · Part B – A notebook with the implementation of allocated projects that will count for

BMS353

BMS353 Bioinformatics for Biomedical Science

Module coordinator: Dr Marta Milo

Page 2:

Today's Outline

Part A: Presentation of the module
Break – question answering
Part B: Introduction

Page 3:

Part A
Presentation of the module

Page 4:

What is it all about?

This module will describe fundamental concepts and technologies underlying computational biology and bioinformatics.

Computational biology is the development and application of data-driven mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data. Bioinformatics combines computer science, statistics, mathematics, and engineering to analyse and interpret biological data. (Adapted from Wikipedia)

Page 5:

What is a Bioinformatician?

What I do in my research:

•  Next-generation sequencing data analysis
•  Noise deconvolution
•  Modelling uncertainty
•  Integration of data
•  Modelling observed data for predictions

What am I going to teach you?

Some of that STUFF

Page 6:

What are the learning outcomes of this module?

This module aims to:

1.  provide an understanding of the fundamental concepts and technologies underlying computational biology and bioinformatics

2.  equip biology students with basic knowledge of mathematical concepts and with methods of bioinformatics and computational biology

3.  use a multidisciplinary approach integrated with programming tools and statistical concepts underpinning advanced data analysis and methods that are suitable for high-throughput data analysis

4.  provide new transferable skills

Page 7:

How will you be learning?

•  Lectures on theoretical concepts
•  Online resources from open-source software
•  Writing simple scripts for data analysis during practical classes
•  Self-marking and formative feedback
•  Group discussion and forum through the module website
•  Small research project on real data
•  Banging your head on the computer..
•  Giving yourself time to adapt to this new way of thinking…

Page 8:

What will you gain from BMS353?

•  Training in data analysis and basic programming skills, with the aim of being aware of the effects of experimental design on data analysis
•  A good understanding of technologies and methods for bioinformatics and of the use of workflows and pipelines for data analysis
•  New qualifications that will increase your employability
•  Deeper insight into the principles of conducting a research data analysis project
•  A new set of transferable skills, like programming and awareness of cloud computing and data sharing
•  Learning a new terminology and new interdisciplinary skills

Page 9:

Module Outline

The teaching consists of two hours of lectures and two of lab classes each week.

The lectures will be followed by practical classes.

In the lab classes we will use coding to turn theory into practice.

Lab classes are split into two groups to reduce class numbers.

Coding requires practice: the more the better.

Page 10:

Module Outline (cont.)

•  Coursework is essential for learning the coding skills – do it.
•  Hit the deadlines for the self-assessment to monitor your progress and highlight problems you might have.
•  Make sure you participate actively in the interactive sessions in the class and in the labs.
•  Use the resources on the module website and
•  Read the notebook carefully and follow the instructions
•  Avoid catch-up! Remember BMS353 is different from biology teaching and can be overwhelming if left all to the end.
•  Please note: if you email a question that can be answered by reading the module handbook or the instructions on MOLE or the module website, you will not receive an answer.

Page 11:

BMS353 website

Page 12:

The tools we will use

Jupyter notebook (originally IPython notebook): combines computer programs (code), text, data and results into one interactive document.

R: a popular programming language in areas such as bioinformatics, statistics and data analysis.

We will use a cloud computing environment called CoCalc (formerly SageMathCloud).

We will use our brain to create new knowledge.

Some mathematical concepts.

Page 13:

BMS353 assessment

The exam for this module will be split into two parts:

Part A – A Multiple Choice Question test lasting 1 hour and 30 minutes, which will count for 30% of the final grade.

Part B – A notebook with the implementation of allocated projects, which will count for 70% of the final grade. The project will bring together all the tools experienced in the practical labs, implemented on a set of real data. It will be developed in groups of three students, but the notebooks will have to be handed in individually.

The lab practical notebooks handed in every week during the module will provide formative feedback that can be used for the final project.
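The weighting above can be sketched as a short calculation. This is only an illustration: it assumes both parts are marked as percentages on the same 0 to 100 scale, which the slide does not state, and the function name and example marks are hypothetical. Python is used here for illustration, although the module itself works with R kernels.

```python
# Weights taken from the slide: Part A counts 30%, Part B counts 70%.
WEIGHT_PART_A = 0.30
WEIGHT_PART_B = 0.70


def final_grade(part_a_percent, part_b_percent):
    """Combine the two marks, both assumed to be percentages (0-100)."""
    return WEIGHT_PART_A * part_a_percent + WEIGHT_PART_B * part_b_percent


# Hypothetical student: 60% in the MCQ test, 70% in the project notebook.
example_grade = final_grade(60, 70)
print(round(example_grade, 1))
```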

Page 14:

MCQ assessment

Each question will have 4 possible responses: A, B, C or D. ONLY ONE RESPONSE IS CORRECT IN EACH CASE. Each question is worth one mark: a correct answer will count as 1, an incorrect answer will count as -0.5, and unanswered questions will count as 0.

1.  What is the main subject of BMS353:
A. Phycology  B. Statistics  C. Computational biology  D. Computer Science

2.  What level of students is BMS353 aimed at:
A. Level 3 - BMS  B. Postgraduate  C. Master's students  D. Computer Science students

3.  There will be no mathematics in BMS353:
A. TRUE  B. FALSE  C. TRUE only on odd days  D. TRUE only on even days

Student 1: 1C, 2B, 3A, mark = 0
Student 2: 1C, 2-, 3-, mark = 1
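The negative-marking rule can be checked with a few lines of code. The sketch below is not an official marking script: the function name is hypothetical, and the answer key (1C, 2A, 3B) is an assumption inferred from the two worked examples rather than something the slide states outright.

```python
# Marking scheme from the slide: +1 correct, -0.5 incorrect, 0 unanswered.
CORRECT_MARK = 1.0
INCORRECT_MARK = -0.5
UNANSWERED_MARK = 0.0

# Assumed answer key, inferred from the two student examples on the slide.
ANSWER_KEY = {1: "C", 2: "A", 3: "B"}


def mcq_mark(responses, key=ANSWER_KEY):
    """Return the total mark; use None for a question left unanswered."""
    total = 0.0
    for question, correct in key.items():
        given = responses.get(question)
        if given is None:
            total += UNANSWERED_MARK
        elif given == correct:
            total += CORRECT_MARK
        else:
            total += INCORRECT_MARK
    return total


student_1 = mcq_mark({1: "C", 2: "B", 3: "A"})    # 1 - 0.5 - 0.5
student_2 = mcq_mark({1: "C", 2: None, 3: None})  # 1 + 0 + 0
print(student_1, student_2)
```

Under this scheme, guessing can cost marks: Student 1 answered every question and finished below Student 2, who left two blank.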

Page 15:

Part B
Introduction

Page 16:

Cloud computing

Cloud computing, or simply "the cloud", also known as on-demand computing, is a model for enabling on-demand access to a shared pool of configurable resources.

Sam Johnston – from Wikipedia

The cloud metaphor: the network elements representing the services are invisible to the user, as if obscured by a cloud.

Advantages
•  Cost efficient
•  Large storage space
•  Backup and recovery
•  Easy access
•  Quick to gain functionality
•  Incentivises collaboration and data sharing

Disadvantages
•  Technical issues
•  Security in the cloud
•  Prone to attack

Page 17:

Cloud computing: an example

A very effective use of cloud resources, and of their commercial exploitation, is given by Amazon.

They used cloud computing to create the concept of elastic computing with the Elastic Compute Cloud (EC2). It is a key part of Amazon Web Services (AWS) and is composed of scalable elastic compute units (ECU), which were introduced as an abstraction of computer resources. A user can create, launch, and terminate server instances as needed. It is based on paying by the hour for active servers, which is why it is called "elastic". Its global reach allows users to control the geographical location of instances (server usage), optimising latency and redundancy.

•  First to allow companies to rent scalable computing resources
•  Their retail e-commerce site is entirely based on cloud computing

Page 18:

Big Data and Data Sharing

Big data is a very generic term for data sets that are so large or complex that traditional data processing applications are inadequate for mining them.

[Figure: Visualization of daily Wikipedia edits created by IBM. At multiple terabytes in size, the text and images of Wikipedia are an example of big data.]

•  High volume
•  High velocity
•  High variety
•  Highly variable
•  High variation in quality
•  High complexity

Page 19:

Big Data and Data Sharing (cont.)

There are many challenges when dealing with big data. Some of them are:

•  Data analysis
•  Data curation
•  Search engines
•  Data sharing
•  Data storage and transfer
•  Data visualization
•  Information privacy

However, big data has high predictive power, and its accuracy may lead to more confident decision making.

In biology: "With the advent of high-throughput genomics, life scientists are starting to grapple with massive data sets, encountering big data challenges" (Technology Feature, Nature 2013)

Analysing the large amount of genomic data with local infrastructure is impossible. The data is therefore moved to the cloud for analysis and storage. Data sharing is becoming crucial for biological data.

Page 20:

CoCalc
www.cocalc.com

We are using the cloud to learn in BMS353. The resources on the cloud are used as a teaching tool.

Page 21:

Jupyter Notebooks on CoCalc Cloud

We will use Jupyter Notebooks and their kernels on CoCalc for all our practical classes. A Jupyter Notebook kernel is a "computational engine" that executes the code written in the Notebook document. In this module (BMS353) we will use R kernels to implement our data analysis in the notebooks.

There will be a folder and storage space allocated to our project: BMS353. You will access your assignments and data using CoCalc with a web browser. Everything will be stored in the CoCalc folder allocated to you. The cloud will back up and secure our work, as well as giving us computational time for the data analysis.

All the lab practicals and the final project will be marked and assessed from notebooks saved in the CoCalc folders.

Page 22:

Basic programming terminology

Programming language = a language formally designed to communicate instructions to a machine, i.e. a computer, to control its behavior or to express a mathematical construct in numerical form (to carry out operations, more or less complex).

Algorithm = a procedure or formula for solving a problem.

Kernel = a computational engine that is activated by a specific language (e.g. R, Python, C).

Script = a list of instructions that represent the commands needed to carry out a task. It has a logical structure and a defined structure for data input.

Implementation = the process of putting into effect the list of instructions specified in the script. This is done by using numerical values as input. The implementation process will produce a final set of values.

Debug = the process of identifying and removing errors from scripts.

Object = a virtual container of values stored in the workspace. It is used to implement the instructions and to store values during the implementation and as the final set.

Programming function = a procedure or routine that encapsulates a "task". Many instructions are combined in one "word" (the name of the programming function), which will implement that "task" on a set of specified inputs.

Read and Write = the process of uploading data into the workspace and of downloading data from the workspace into a local or remote archive (folder).
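Several of these terms can be seen together in one short script. The sketch below is only an illustration under stated assumptions: it is written in Python rather than the R used in the module, the data values are made up, and in-memory text buffers stand in for real files so that it runs anywhere.

```python
import csv
import io


def mean(values):
    """A programming function: one name that encapsulates the task 'average'."""
    return sum(values) / len(values)


# "Read": load data into the workspace. A string stands in for a file here.
raw_text = "sample,expression\ns1,2.0\ns2,4.0\ns3,9.0\n"
rows = list(csv.DictReader(io.StringIO(raw_text)))

# An object: a named container of values held in the workspace.
expression = [float(row["expression"]) for row in rows]

# Implementation: running the instructions on numerical input
# produces a final set of values.
average = mean(expression)

# "Write": send results out of the workspace (a buffer, not a real folder).
out = io.StringIO()
csv.writer(out).writerow(["mean_expression", average])
print(out.getvalue().strip())
```

Debugging would start here if, for example, one of the expression values were not a number: `float()` would raise an error naming the offending value.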

Page 23:

Basic mathematics

Page 24:

Basic mathematics notation

Single values and vectors

x and y are values from the real numbers: $x, y \in \mathbb{R}$

[Figure: a point A ≡ (x, y, z) drawn against the axes X, Y and Z]

In general: $\mathbf{x} \equiv (x_1, x_2, \ldots, x_N)$, with $x_i \in \mathbb{R}$ and $i = 1, \ldots, N$

The values $x_i$ are called variables, since they can take a range of values. The parameters are fixed values that we indicate in mathematical notation with Greek letters:

$\alpha, \beta, \mu, \sigma, \lambda, \ldots$
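In code, a vector is usually stored as an ordered collection of numbers. The following is a minimal sketch in Python (the module itself uses R); the values of `x` and `alpha` are hypothetical and chosen only to show the difference between variables and a parameter.

```python
# A vector x = (x1, x2, ..., xN) stored as a list of real numbers.
x = [0.5, 1.5, 4.0]  # hypothetical values
N = len(x)

# Mathematical indices run from 1 to N; Python indices run from 0 to N-1.
for i in range(1, N + 1):
    print(f"x{i} = {x[i - 1]}")

# A parameter is held fixed while the variables change.
alpha = 2.0  # hypothetical parameter value
scaled = [alpha * xi for xi in x]
print(scaled)
```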

Page 25:

Basic mathematics notation (cont.)

Matrices are tables of values or letters that are organised in rows and columns. In common use they have only two dimensions; in more advanced use they can have three. Vectors are special cases of matrices: they have N columns and only one row.

A = [3x4]

Operations with matrices

Sum and difference: the matrices must have the same dimensions.

Multiplication: the number of columns of the first matrix needs to be the same as the number of rows of the second matrix. Multiplication is done so that each element of the product is the sum of the products of a row of the first matrix with a column of the second: $c_{ij} = \sum_{k} a_{ik} b_{kj}$
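The dimension rules for matrix operations can be made concrete with a small sketch. It uses plain Python lists of rows rather than a numerical library, the helper names `shape`, `mat_add` and `mat_mul` are hypothetical, and the example matrices are made up.

```python
def shape(m):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return len(m), len(m[0])


def mat_add(a, b):
    """Element-wise sum; both matrices must have the same dimensions."""
    if shape(a) != shape(b):
        raise ValueError("sum needs matrices with the same dimensions")
    rows, cols = shape(a)
    return [[a[i][j] + b[i][j] for j in range(cols)] for i in range(rows)]


def mat_mul(a, b):
    """Row-by-column product: c[i][j] is the sum over k of a[i][k] * b[k][j]."""
    rows_a, cols_a = shape(a)
    rows_b, cols_b = shape(b)
    if cols_a != rows_b:
        raise ValueError("columns of the first must equal rows of the second")
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


A = [[1, 2, 3], [4, 5, 6]]    # 2 rows x 3 columns
B = [[1, 0], [0, 1], [2, 2]]  # 3 rows x 2 columns
C = mat_mul(A, B)             # 2 rows x 2 columns
print(C)
print(mat_add(A, A))
```

Note that `mat_mul(A, A)` raises an error, because A has 3 columns but only 2 rows.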

Page 26:

Basic mathematics notation (cont.)

A way of writing a notation for large sums or multiplications is to use the Greek symbols $\Sigma$ for summing and $\Pi$ for multiplying.

For summing N values we will use the following notation: $\sum_{i=1}^{N} x_i$

For multiplying N values we will use the following notation: $\prod_{i=1}^{N} x_i$
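Both notations correspond to a simple loop over the N values. A minimal Python sketch with made-up values follows; `sum` and `math.prod` are standard Python (the latter from version 3.8), and the explicit loop is included only to show what the symbols mean.

```python
import math

x = [2, 3, 4]  # hypothetical values x1, x2, x3, so N = 3

# Built-in equivalents of the two symbols.
total = sum(x)           # x1 + x2 + x3
product = math.prod(x)   # x1 * x2 * x3

# The same results written as an explicit loop over i = 1, ..., N.
running_sum = 0
running_product = 1
for xi in x:
    running_sum += xi
    running_product *= xi

print(total, product)
```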

Page 27:

Basic mathematics notation (cont.)

A function is a relation from a set of inputs to a set of possible outputs, where each input is related to exactly one output.

$f(x) = x / 2$, where $x$ is the input (variable) and $f(x)$ is the output.

$f(x) = 4x + 4$

When there is one input we call it a one-dimensional function.

$f(x, y) = 2x + 2y$

When the input is more than one variable we call it a multi-dimensional function. With two variables we call it a bi-dimensional function.

We can also have functions conditional on a parameter. In this case we call them conditional functions.

$f(x, y \mid \alpha) = 2x + 2y\alpha$

where $\alpha$ takes its value from the set of even numbers between 0 and 10.
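These mathematical functions translate directly into programming functions. The sketch below is in Python, with hypothetical function names. Two points are assumptions rather than facts from the slide: the last term of the conditional function is read as the product of 2, y and α, and the set of allowed α values is taken to include both 0 and 10.

```python
# Assumed reading of "even numbers between 0 and 10" (endpoints included).
ALLOWED_ALPHA = {0, 2, 4, 6, 8, 10}


def f_half(x):
    """One-dimensional function f(x) = x / 2."""
    return x / 2


def f_linear(x):
    """One-dimensional function f(x) = 4x + 4."""
    return 4 * x + 4


def f_two(x, y):
    """Bi-dimensional function f(x, y) = 2x + 2y."""
    return 2 * x + 2 * y


def f_conditional(x, y, alpha):
    """Function of x and y conditional on the parameter alpha."""
    if alpha not in ALLOWED_ALPHA:
        raise ValueError("alpha must be an even number between 0 and 10")
    return 2 * x + 2 * y * alpha


print(f_half(6), f_linear(1), f_two(1, 2), f_conditional(1, 2, 2))
```

Each call returns exactly one output for a given input, which is what makes these functions in the mathematical sense.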

Page 28:

Summary

•  What BMS353 is about and what you can expect to learn and gain after taking BMS353
•  How to gain information about the module and where to find links to additional reading material, lecture content and practical classes (Website)
•  How to interact for discussion and problem-solving
•  How you will get assessed
•  Tools we will be using in BMS353
•  Cloud computing and Big Data
•  Jupyter Notebooks and CoCalc Cloud
•  Basic programming terminology
•  Refreshed some basic mathematical notions and notations.