Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon...
![Page 1: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/1.jpg)
Doing bioinformatics better
Mitchell Stanton-Cook
Beatson Microbial Genomics Group
@mscook #ABiC14
![Page 2: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/2.jpg)
About Me
• HR = Systems Administrator/Software Engineer
+15 +10
2003-2006, 2011-
Bio|Dev|Op
![Page 3: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/3.jpg)
The Beatson Group
• Microbial Genomics – no wet lab!
• We analyse 10s to 1000s of isolate genomes
• Bacterial evolution
• Bacterial pathogenesis
• Genomic epidemiology
• Software development for Next-Gen Sequencing data
Mitchell Sullivan
Nabil Alikhan
Marisa Emerson
![Page 4: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/4.jpg)
• Term DevOps first appeared about 5 years ago
Were bioinformaticians early DevOps?
Dev+Ops
Dev = Builds stuff
Ops = Gets & keeps stuff running
![Page 5: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/5.jpg)
• Will focus on 5 ‘ings:
– Versioning,
– Pinning,
– Fixing,
– Revisioning &
– Virtualising
• Encourage & Empower
• Assuming:
– Majority here are writing, or have written, their own software/algorithms/analysis scripts/pipelines
• Nothing about making data reusable/reproducible
Outline
![Page 6: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/6.jpg)
• Version 0.99 is when you write the paper?
• Version 0.99 is in case it does not work as expected?
0.99
The observed version distribution in bioinformatics software
![Page 7: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/7.jpg)
• Use semantic versioning (http://semver.org)
– start at 0.1.0
• +1 MAJOR = incompatible changes,
• +1 MINOR = new functionality in a backwards-compatible manner,
• +1 PATCH = backwards-compatible bug fixes
• Tools
– https://github.com/peritus/bumpversion
Follow a formal versioning convention
X.Y.Z (X = major, Y = minor, Z = patch)
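The bump rules on this slide can be sketched in a few lines of Python. This is only an illustration of the MAJOR/MINOR/PATCH convention, not how bumpversion is implemented; the function name `bump` is hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with the given part incremented (hypothetical helper).

    Assumes a plain X.Y.Z string with no pre-release or build suffix.
    """
    major, minor, patch = (int(field) for field in version.split("."))
    if part == "major":
        # Incompatible change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible new functionality: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump("0.1.0", "patch"))
    print(bump("0.1.1", "minor"))
    print(bump("0.2.0", "major"))
```

Starting from 0.1.0, as the slide suggests, a bug fix gives 0.1.1, a new feature 0.2.0, and the first incompatible change 1.0.0.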
![Page 9: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/9.jpg)
Semantic versioning allows others to make quick and informed decisions
Implications of change are clearly identified
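As a rough sketch of the "quick and informed decision" a user can make, the snippet below classifies the change between two X.Y.Z versions. The helper names are hypothetical, and the result is only meaningful for projects that actually follow semantic versioning; the SemVer specification also treats 0.y.z releases as initial development where anything may change.

```python
def parse_version(version: str) -> tuple:
    """Split a plain X.Y.Z string into a tuple of three integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    return tuple(int(part) for part in parts)


def change_type(old: str, new: str) -> str:
    """Name the most significant part that differs between two versions."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts <= old_parts:
        raise ValueError("new version must be greater than old version")
    if new_parts[0] != old_parts[0]:
        return "major"  # incompatible changes: read the release notes first
    if new_parts[1] != old_parts[1]:
        return "minor"  # new, backwards-compatible functionality
    return "patch"      # backwards-compatible bug fixes only


if __name__ == "__main__":
    print(change_type("0.3.8", "0.3.9"))
```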
![Page 10: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/10.jpg)
• Want the end user running the same versions of 3rd party software/libraries that you built with!
Pinning of dependencies provides predictability
$ cat Readme.txt
<SNIP>
Installation
------------
My awesome bioinformatics package requires that this software is installed:
* numpy
* scipy
* matplotlib
* biopython
* ghalton
...
![Page 12: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/12.jpg)
• Examples of python requirements.txt file:
pip install -r requirements.txt
$ cat requirements.txt
numpy==1.8.1
scipy==0.14.0
matplotlib==0.99.1
biopython==1.64
ghalton==0.6
$ cat requirements.txt
numpy
scipy
matplotlib
biopython
ghalton
Pinning of dependencies provides predictability
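The two requirements.txt files above differ only in whether each package carries an exact `==` pin. A minimal sketch of a check for that, assuming one requirement per line and treating anything without `==` as unpinned (the function name `unpinned` is hypothetical, and other pip specifiers such as `>=` are simply reported as unpinned):

```python
def unpinned(requirements_text: str) -> list:
    """Return the requirement lines that lack an exact '==' version pin."""
    missing = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if line and "==" not in line:
            missing.append(line)
    return missing


PINNED = """\
numpy==1.8.1
scipy==0.14.0
matplotlib==0.99.1
biopython==1.64
ghalton==0.6
"""

UNPINNED = """\
numpy
scipy
matplotlib
biopython
ghalton
"""

if __name__ == "__main__":
    print(unpinned(PINNED))
    print(unpinned(UNPINNED))
```

A check like this could run before a release so that an unpinned file never reaches end users.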
![Page 13: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/13.jpg)
$ cat requirements.txt
numpy==1.8.1
scipy==0.14.0
matplotlib==0.99.1
biopython==1.64
ghalton==0.6
$ cat requirements.txt
numpy==1.9.0
scipy==0.14.0
matplotlib==1.4.0
Biopython==1.64
Ghalton==0.6
Pinning of dependencies provides predictability
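To make the difference between the two pinned files above explicit, the sketch below parses each file and reports the packages whose pins disagree. Package names are compared case-insensitively, which is a simplification of pip's name normalisation; the helper names are hypothetical.

```python
def parse_pins(requirements_text: str) -> dict:
    """Map lower-cased package name to its pinned version string."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"unpinned requirement: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(first: dict, second: dict) -> dict:
    """Return {name: (version_in_first, version_in_second)} where they differ."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }


BUILT_WITH = parse_pins(
    "numpy==1.8.1\nscipy==0.14.0\nmatplotlib==0.99.1\nbiopython==1.64\nghalton==0.6\n"
)
INSTALLED_LATER = parse_pins(
    "numpy==1.9.0\nscipy==0.14.0\nmatplotlib==1.4.0\nBiopython==1.64\nGhalton==0.6\n"
)

if __name__ == "__main__":
    for name, (old, new) in diff_pins(BUILT_WITH, INSTALLED_LATER).items():
        print(f"{name}: {old} -> {new}")
```

With the slide's values only numpy and matplotlib differ; the capitalised Biopython and Ghalton entries name the same packages at the same versions.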
![Page 15: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/15.jpg)
• You want your software, and other people's, to be predictable and deterministic
Pinning of dependencies provides predictability
![Page 16: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/16.jpg)
Death (slow or sudden) by:
“Fixing your environment”
$ sudo pip install mypackage
![Page 17: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/17.jpg)
• System wide python install in: /usr/bin/python
• Also python 2.x and 3.x
“Fixing your environment” to manage 3rd party libraries
virtualenv
Project1: ~/.venvs/Project1/bin/python
Project2: ~/.venvs/Project2/bin/python
Project3: ~/.venvs/Project3/bin/python
ProjectN: ~/.venvs/ProjectN/bin/python
Example per-project package sets:
numpy==1.8.1, scipy==0.14.0, matplotlib==1.3.1, biopython==1.64, ghalton==0.6
biopython==1.54
khmer==1.0, biopython==1.64
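The one-interpreter-per-project layout can be sketched with Python's standard-library `venv` module, used here as a stand-in for the third-party virtualenv tool named on the slide. The function name and project names are illustrative, the environments are created in a temporary directory rather than `~/.venvs`, and pip is left out to keep the example fast.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_envs(root, projects):
    """Create one isolated environment per project; return each interpreter path."""
    # Windows places the interpreter under Scripts/, POSIX systems under bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    interpreters = {}
    for name in projects:
        env_dir = Path(root) / name
        venv.create(env_dir, with_pip=False)
        interpreters[name] = env_dir / bin_dir / exe_name
    return interpreters


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name, python in create_project_envs(root, ["Project1", "Project2"]).items():
            print(name, python)
```

Each project then installs its own pinned requirements with its own interpreter, so biopython==1.54 in one project cannot clash with biopython==1.64 in another.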
![Page 18: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/18.jpg)
Use a revisioning system
• Revision control* is a system that records changes to a file or set of files over time so that you can recall specific versions later.
*Revision control is also known as version control. I’ll stick with revision control to avoid confusion with semantic versioning of your software/libraries/analysis scripts/pipelines
![Page 19: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/19.jpg)
Why use revision control
Have you ever…
Nope Nope Nope
![Page 21: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/21.jpg)
• Initially:
– git init, git add, git commit, git push, git pull, git tag
Choose a revision control tool and learn it (and the tools that enhance it)
Working directory → Staging area → Repository
git add (working directory → staging area)
git commit (staging area → repository)
git push
git pull
git tag v0.3.5
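The commands on this slide combine naturally into a small release routine. The sketch below only builds the command list, so it can be inspected without touching a repository; `release_commands` and `run_release` are hypothetical helpers, and the `v` tag prefix follows the `git tag v0.3.5` example above.

```python
import subprocess


def release_commands(version, paths, message):
    """Return the git commands, as argument lists, for a tagged release."""
    return [
        ["git", "add", *paths],            # working directory -> staging area
        ["git", "commit", "-m", message],  # staging area -> repository
        ["git", "tag", f"v{version}"],     # label the commit with the version
        ["git", "push"],                   # publish the commit
        ["git", "push", "--tags"],         # publish the tag
    ]


def run_release(version, paths, message):
    """Execute the commands in order, stopping at the first failure."""
    for command in release_commands(version, paths, message):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in release_commands("0.3.5", ["analysis.py"], "Release 0.3.5"):
        print(" ".join(command))
```

Tagging each release ties the semantic version number to an exact commit in the repository.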
![Page 22: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/22.jpg)
Use GitHub (or BitBucket)
https://education.github.com
https://education.github.com/pack
![Page 23: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/23.jpg)
Research. Shared.
https://guides.github.com/activities/citable-code/
http://www.zenodo.org
![Page 25: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/25.jpg)
Project2 v0.3.9
~/.venvs/Project2/bin/python
numpy==1.8.1, scipy==0.14.0, matplotlib==1.3.1, biopython==1.64, ghalton==0.6
Project2 v0.3.8
~/.venvs/Project2/bin/python
numpy==1.8.1, scipy==0.14.0, matplotlib==1.3.1, biopython==1.64, ghalton==0.6
Software or Library or Analysis Script / Pipeline
Operating system
Type and version of OS?
Version of gcc?
Version of libpng?
Version of Python/Perl/Ruby?
Virtualising
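Some of the questions on this slide can at least be recorded alongside an analysis, even before moving to a virtual machine. A minimal sketch using only the Python standard library; it covers the operating system and the Python interpreter, not gcc or libpng, and the function name is hypothetical.

```python
import json
import platform
import sys


def describe_environment():
    """Collect basic OS and interpreter details as a JSON-serialisable dict."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    # Writing this next to the results records what the analysis ran on.
    json.dump(describe_environment(), sys.stdout, indent=2)
    print()
```

A record like this documents the environment but does not reproduce it, which is the gap that virtualising is meant to close.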
![Page 26: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/26.jpg)
Virtualisation technologies
git server, monitoring server, managing server
• Use virtual machines (VMs) to pin specific OS/software versions
• Distribute the VMs
Traditional Virtualised
![Page 27: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/27.jpg)
• Vagrant (http://www.vagrantup.com)
– Easy to configure, reproducible and portable work environments
– Benefits:
• Vagrant (+Ansible) will automatically set up everything required for your software/libraries/analysis/pipelines
Vagrant is useful for making environments
+ Vagrantfile + virtualisation software + base image (+ Ansible)
![Page 28: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/28.jpg)
• Containers
– Lightweight
• Share resources
– Versionable/diffable
– Easily distributable
Virtualisation – forget about VMs and move to containers?
![Page 29: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/29.jpg)
Conclusions
Nope Nope Nope
• A practising bioinformatician has roles not too dissimilar to those of a DevOp
– Multi-disciplinary
• Often the ones who implement, deploy and maintain software/libraries/analysis scripts/pipelines.
– Need to understand and use tools from the Dev community
• SemVer, dependency pinning, “fixed environment” development, git, GitHub
– Need to understand and use tools from the Ops community as good ways to distribute tools/pipelines in a controlled manner
• Virtualisation (Vagrant and Docker)
• IT automation/orchestration (Ansible)
![Page 30: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/30.jpg)
Acknowledgements
Dr Nouri Ben Zakour
Dr Scott Beatson
http://beatsonlab.com
![Page 31: Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics better by Mitchell Jon Stanton-Cook of The University of Queensland](https://reader038.fdocuments.net/reader038/viewer/2022103017/557d5e1ed8b42ae1438b4dbe/html5/thumbnails/31.jpg)