Enabling Reproducible NGS Analysis Through Automated...
![Page 1: Enabling Reproducible NGS Analysis Through Automated JupyterPipelinescompbio.ucsd.edu/wp-content/uploads/2016/10/20170206... · 2017-07-19 · Reproducible Research •Repeatability](https://reader031.fdocuments.net/reader031/viewer/2022041022/5ed27e54773cd410be4fde9c/html5/thumbnails/1.jpg)
Enabling Reproducible NGS Analysis Through Automated Jupyter Pipelines
Amanda Birmingham
Senior Bioinformatics Engineer
Center for Computational Biology & Bioinformatics, UCSD
Reproducible Research
• Repeatability & reproducibility are key to the scientific method
◦ In 1663, only Robert Boyle and Christiaan Huygens could produce a vacuum, and their findings didn't agree
• Informatics should be at the forefront of reproducible research
◦ Doing the same thing over and over is what computers do best!
◦ But it has taken a long time for methods reports for computational work to become as good as those for wet lab work
◦ Ex: Proc Natl Acad Sci USA. 1986 Jun;83(11):3746-50
◦ Progress:
§ "Alignments were run"
§ "Alignments were run with BLAST"
§ "Alignments were run with BLASTN version 2.2.6 against human"
§ "Alignments were run with NCBI BLASTN v.2.2.9 using the command blastn -W 7 -q -1 -F F against the NCBI RefSeq release 80 human transcriptome"
• Parity with wet-lab methods shouldn't be the end of the road!
What Is Jupyter?
• What is Jupyter?
◦ "Open source, interactive data science and scientific computing across over 40 programming languages"
§ Grew out of the IPython project, which started in 2001 when Dr. Fernando Perez was procrastinating on his Physics PhD :)
◦ A "literate computing" environment, "weaving of a narrative directly into a live computation, interleaving text with code and results to construct a complete piece" -- Fernando Perez
• Computing platform is named "jupyter" because early languages were Julia, Python, and R
◦ Community-maintained kernels for other languages: Bash, C, C++, C#, Fortran, Go, Haskell, Javascript, Lisp, Mathematica, Matlab, Perl, PHP, Powershell, Ruby, SAS, Scala, Scheme, and many more
• Most well-known for a web-based "notebook" system
◦ Allows writing & running of code from browser environment
◦ Can mix in HTML, links, images, interactive controls, extensions
Jupyter logo courtesy of http://jupyter.org/
What Is Jupyter, Really?
Jupyter Notebooks: Friend or Foe?
• Are notebooks the key to reproducibility?
◦ Data Carpentry offers an entire workshop on "Reproducible Research using Jupyter Notebooks"
• Easy to save, modify, and extend
◦ Great for rerunning or tweaking previous data analyses
• CCBB delivers analyses as notebooks
◦ Report becomes more than a record; it is itself a tool!
• Notebooks' greatest strength is interactivity
◦ Between input and output
◦ Between (e.g.) Python and R
◦ Between narrative and code
◦ Between material and reader
(Inter-)Actively Dangerous
• Interactivity can also be a huge danger to reproducibility
• Humans are inconsistent
◦ We make unpredictable mistakes ("I NEEVR MAKE TYPOS!")
◦ Thus, "interactive" = "bad" for repetitive tasks
§ Like primary NGS analysis pipelines
• Jupyter Notebooks can be inconsistent, too
◦ Changing code/variables in a notebook does NOT rerun cells that depend on that change
◦ In fact, doesn't even clear old outputs!
◦ Thus, "interactive" = "bad" for important records
§ Like experimental records (i.e., methods)
• Do we have to give up other advantages of Jupyter Notebooks when building pipelines and recording methods?
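To make the stale-state hazard concrete, here is a minimal sketch that imitates a notebook kernel with a plain Python dictionary. It is not how Jupyter is implemented; the `run_cell` helper and the quality-threshold variables are hypothetical names chosen for illustration. Editing and re-running only the first cell leaves the downstream result untouched, while a top-to-bottom run in a fresh namespace gives the consistent answer.

```python
def run_cell(source, namespace):
    """Execute one cell's source in the shared 'kernel' namespace."""
    exec(source, namespace)


# Two dependent cells, as they might appear in a notebook (hypothetical).
cells = [
    "min_quality = 30",
    "kept = [q for q in (12, 28, 35, 40) if q >= min_quality]",
]

# First pass: run every cell in order, like "Run All".
kernel = {}
for source in cells:
    run_cell(source, kernel)

# Interactive edit: change the first cell and re-run ONLY that cell.
cells[0] = "min_quality = 20"
run_cell(cells[0], kernel)

# The dependent cell was never re-run, so its result is stale.
stale_kept = kernel["kept"]  # still [35, 40]

# Scripted alternative: fresh namespace, every cell, top to bottom.
fresh = {}
for source in cells:
    run_cell(source, fresh)
fresh_kept = fresh["kept"]  # [28, 35, 40]

print("stale:", stale_kept)
print("fresh:", fresh_kept)
```

The scripted run is the behavior a pipeline needs: the same cells, in the same order, with no leftover state from earlier edits.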
Scripting Jupyter Notebooks
• No! We can have our cake and eat it, too :)
• Jupyter ships with the nbconvert package, which can read, write, and execute notebooks from Python
• An extension, nbparameterise (note British spelling), allows injection of new variable values
• nbformat and nbconvert (also built-in) can output notebooks and static HTML, respectively
• With these three pieces, we can script pipelines built from Jupyter Notebooks
◦ Notebooks give readability and reusability
◦ Script prevents human errors and speeds execution
◦ HTML output of notebooks provides read-only record of methods
• Entire approach takes less than one page of code
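The scripting approach outlined on this slide might look roughly like the following sketch. It assumes `nbformat` and `nbconvert` are installed along with a `python3` kernel. The `inject_parameters` helper is a simplified stand-in for what nbparameterise does (it overwrites the first code cell with new assignments) and is not that package's API; `run_notebook`, the output file naming, and any parameter names are likewise hypothetical.

```python
import copy
from pathlib import Path


def inject_parameters(notebook, params):
    """Return a copy of `notebook` whose first code cell assigns `params`.

    Simplified stand-in for nbparameterise: it works on the dict-like
    structure of a version-4 notebook and rewrites the first code cell.
    """
    new_nb = copy.deepcopy(notebook)
    for cell in new_nb["cells"]:
        if cell["cell_type"] == "code":
            cell["source"] = "\n".join(
                f"{name} = {value!r}" for name, value in params.items()
            )
            # Drop any outputs left over from an earlier interactive run.
            cell["outputs"] = []
            cell["execution_count"] = None
            return new_nb
    raise ValueError("notebook has no code cell to parameterise")


def run_notebook(template_path, params, output_dir):
    """Inject params, execute all cells in order, save .ipynb and .html."""
    # Imported lazily so inject_parameters can be used without Jupyter.
    import nbformat
    from nbconvert import HTMLExporter
    from nbconvert.preprocessors import ExecutePreprocessor

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(template_path).stem

    nb = nbformat.read(str(template_path), as_version=4)
    nb = inject_parameters(nb, params)

    # Execute every cell top to bottom in a fresh kernel.
    executor = ExecutePreprocessor(timeout=600, kernel_name="python3")
    executor.preprocess(nb, {"metadata": {"path": str(output_dir)}})

    # Keep the executed notebook (reusable) and static HTML (read-only record).
    nbformat.write(nb, str(output_dir / f"{stem}_run.ipynb"))
    html, _resources = HTMLExporter().from_notebook_node(nb)
    (output_dir / f"{stem}_run.html").write_text(html, encoding="utf-8")
    return nb
```

A driver script could then call `run_notebook` once per pipeline stage, for example `run_notebook("trim.ipynb", {"g_fastq": "sample1.fastq"}, "runs/sample1")`, where the notebook name and parameter are placeholders.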
Notebooks in the Wild
• A sample NGS pipeline using Jupyter Notebooks
◦ Goal: identify gene pairs with synergistic survival effects (positive or negative)
◦ Experimental system: Dual-gene knock-outs in human cell lines using CRISPR
◦ Read-out: number of instances of each CRISPR guide in final population, assessed by NGS
[Pipeline diagram, stage labels: Scaffold Trimming, Pair Filtration, Pair Counting, Count Visualization, Count Combination]
Jupyter logo courtesy of http://jupyter.org/
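As a toy illustration of the read-out (counting how often each guide pair appears), the sketch below tallies pairs with `collections.Counter`. It is not the CCBB pipeline code: the function name, the in-memory list of read pairs, and the guide names are invented for the example, and real input would come from trimmed and filtered sequencing reads.

```python
from collections import Counter


def count_guide_pairs(read_pairs, known_guides):
    """Count occurrences of each (guide_a, guide_b) pair.

    Pairs containing a guide that is not in `known_guides` are skipped,
    loosely mirroring a filtration step before counting.
    """
    counts = Counter()
    for guide_a, guide_b in read_pairs:
        if guide_a in known_guides and guide_b in known_guides:
            counts[(guide_a, guide_b)] += 1
    return counts


# Hypothetical guide library and observed read pairs.
library = {"geneA_g1", "geneB_g1", "geneC_g1"}
observed = [
    ("geneA_g1", "geneB_g1"),
    ("geneA_g1", "geneB_g1"),
    ("geneA_g1", "unknown"),   # dropped: second guide not in library
    ("geneC_g1", "geneB_g1"),
]

pair_counts = count_guide_pairs(observed, library)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```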
Conclusions
• Jupyter Notebooks are a fantastic tool for data analysis, but:
• Their twin goals of interactivity and reproducibility are often at odds
• Notebooks can be scripted to reduce error potential
◦ And notebook-based pipelines self-document nicely!
• CCBB has implemented a sample Jupyter-based pipeline for NGS data from dual CRISPR screens
◦ Pipeline is part of work with Drs. Prashant Mali & Trey Ideker, now in press at Nature Methods
◦ Code is available in the "CRISPR" section of CCBB's jupyter-genomics repository on GitHub
§ https://github.com/ucsd-ccbb/jupyter-genomics
• CCBB's Data Science Blog gives a further intro to notebook scripting
◦ http://ccbb.bio/outreach/data-science-blog/
• Reproducible data analysis is hard work, but worth the effort!
http://ccbb.bio
Acknowledgments
• Fernando Perez & the Jupyter Project!
• Dual CRISPR Team
◦ Mali lab
◦ Ideker lab
• CCBB Team
◦ Katie Fisch (Director)
◦ Roman Sasik
◦ Guorong Xu
◦ Brin Rosenthal
• Our funders
◦ UC San Diego Health Sciences
◦ CTRI Center for Accelerating Drug Development (CADD), Grant UL1TR001442