How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 -...
![Page 1: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/1.jpg)
How to write bioinformatics software that people will use & cite
A/Prof Torsten Seemann
@torstenseemann
Bioinfosummer 2016 - Adelaide AU, Fri 2 Dec
![Page 2: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/2.jpg)
Who am I?
![Page 3: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/3.jpg)
Doherty Applied Microbial Genomics
Microbial genomics and bioinformatics
![Page 4: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/4.jpg)
Public health and clinical microbiology
![Page 5: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/5.jpg)
Before bioinformatics
● Undergraduate
○ Science / Engineering - Computer Science + Electrical Engineering
● Honours
○ Computer Science - Digital image compression
● PhD
○ Computer Science - Digital image processing
● Never studied any biology
![Page 6: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/6.jpg)
An opportunity
![Page 7: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/7.jpg)
First “fully Aussie” bacterial genome
● Leptospira hardjobovis str. L550
● 2 chromosomes
● 4 Mbp
● $1M project
● Sanger sequencing
● Led by Dieter Bulach
![Page 8: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/8.jpg)
First Illumina instrument in Australia
● Dept Microbiology, Monash University, 2008
● 36 bp single end reads
● 2 weeks to run
● 2 lanes for 1.6 Mbp genome
![Page 9: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/9.jpg)
Things have improved a bit since then
Q32
36 bp
Q20
![Page 10: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/10.jpg)
Why am I here?
![Page 11: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/11.jpg)
Bioinformatics software and me
Installed >1000 packages manually
Packaged >100 tools for Brew
Written and maintain >10 packages
![Page 12: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/12.jpg)
How to get a bioinformatics headache
1. See tweet about new published tool
2. Read abstract - sounds awesome!
3. Fail to find link to source code - eventually Google it
4. Attempt to compile and install it
5. Google for 30 min for fixes
6. Finally get it built
7. Run it on tiny data set
8. Get a vague error
9. Delete and never revisit it again
![Page 13: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/13.jpg)
![Page 14: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/14.jpg)
![Page 15: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/15.jpg)
Should I stay for this talk?
YES - It will help you write good tools
YES - It will help you identify bad tools
![Page 16: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/16.jpg)
Should you write a tool?
![Page 17: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/17.jpg)
Should you write a new tool?
● NO
○ It already exists
○ You are unable to maintain it
○ You won’t really use it
● YES
○ YOU need the tool
○ YOU will use the tool
○ YOU want others to use the tool
○ Desire to give back to the community
![Page 18: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/18.jpg)
Eating my own dog food
![Page 19: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/19.jpg)
Lessons from the Prokka experience
● Nearly all feedback is positive
● People all over the world are grateful
● Warm fuzzy feeling inside
● Increase your public profile
● But maintenance burden and guilt
![Page 20: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/20.jpg)
Discoverability
![Page 21: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/21.jpg)
Choosing a home base
University page
Personal home page
![Page 22: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/22.jpg)
Naming
● Try to be unique
○ Google to check for conflicts
○ Consider how internationals will pronounce it
○ Be creative!
● Avoid dodgy acronyms
○ Try not to win a JABBA Award
○ “Just Another Bogus Bioinformatics Acronym”
![Page 23: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/23.jpg)
Don’t be this person
![Page 24: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/24.jpg)
First impressions count
● Keep It Simple, Stupid
● First page of documentation
○ What does it do?
○ How do I install it?
○ How do I run it?
● Try to keep it in one place
○ Otherwise it becomes inconsistent or gets missed
![Page 25: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/25.jpg)
Usability
![Page 26: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/26.jpg)
A lesson from history
![Page 27: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/27.jpg)
Print something useful if no parameters
% biotool
Please use --help for instructions
![Page 28: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/28.jpg)
Always have a --help flag
% biotool -h
% biotool --help
Usage: biotool [options] seq.fa
  --help       Show this help
  --version    Print version and exit
  --top N      Keep top N sequences
![Page 29: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/29.jpg)
Always have a --version flag
% biotool -v
% biotool -V
% biotool --version
biotool 1.3
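The three habits above (say something useful with no parameters, support --help, support --version) can be sketched with Python's argparse. This is only an illustration: `biotool`, the `--top` default of 10 and the placeholder "Would keep ..." message are assumptions made for the example, not a real tool.

```python
import argparse
import sys

VERSION = "1.3"  # same version string as shown on the slide


def build_parser():
    # argparse adds -h/--help automatically
    parser = argparse.ArgumentParser(
        prog="biotool",
        description="Keep the top N sequences from a FASTA file.",
    )
    parser.add_argument("-V", "--version", action="version",
                        version=f"%(prog)s {VERSION}")
    # The default of 10 is an assumption made for this sketch
    parser.add_argument("--top", type=int, default=10, metavar="N",
                        help="Keep top N sequences (default: %(default)s)")
    parser.add_argument("fasta", metavar="seq.fa", help="Input FASTA file")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # No parameters: print a short hint rather than a bare failure
        parser.print_usage(sys.stderr)
        print("Please use --help for instructions", file=sys.stderr)
        return 2
    args = parser.parse_args(argv)
    # Placeholder for the real work
    print(f"Would keep top {args.top} sequences from {args.fasta}")
    return 0


if __name__ == "__main__":
    # Demo call with explicit arguments; a real script would use sys.exit(main())
    main(["--top", "5", "seq.fa"])
```

Returning the status from `main()` instead of exiting inside it keeps the function easy to call from tests.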
![Page 30: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/30.jpg)
Always raise an error when things go wrong
% biotool seq.fa
ERROR: cannot open file ‘seq.fa’
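One way to get this behaviour in Python is to catch the I/O error, print a specific message to stderr and stop with a non-zero status. The function name `read_text_or_die` is made up for this sketch.

```python
import sys


def read_text_or_die(path):
    """Return the contents of `path`, or exit with a clear error message."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as err:
        # Say which file failed and why, on stderr, then exit non-zero
        print(f"ERROR: cannot open file '{path}': {err.strerror}",
              file=sys.stderr)
        sys.exit(1)
```

The message names the file and the reason, so the user does not have to guess what went wrong.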
![Page 31: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/31.jpg)
Check that dependencies are installed
% biotool seq.fa
Checking BLAST... ok
Checking SAMtools... NOT FOUND!
Please install ‘samtools’ and add it to your PATH.
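A dependency check like this can be sketched with `shutil.which`, which looks up an executable on the PATH. The executable names `blastn` and `samtools` are assumptions for the example; substitute whatever your tool actually calls.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_dependencies(tools=("blastn", "samtools")):
    """Report each dependency on stderr; return True if all were found."""
    missing = missing_tools(tools)
    for tool in tools:
        status = "NOT FOUND!" if tool in missing else "ok"
        print(f"Checking {tool}... {status}", file=sys.stderr)
    for tool in missing:
        print(f"Please install '{tool}' and add it to your PATH.",
              file=sys.stderr)
    return not missing
```

Running the check once at start-up means the user hears about a missing program immediately, not halfway through a long job.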
![Page 32: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/32.jpg)
Always let users control output filenames
% biotool seq.fa
Processing ‘seq.fa’
Wrote result to ‘filt.seq.fa.out’
# ARGH!
% biotool --out seq.filt.fa
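A minimal sketch of an `--out` option in Python follows. Defaulting to stdout when no file is given is a design assumption of this example, not something the slide prescribes, and `write_result` is a hypothetical helper.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="biotool")
    # "-" meaning stdout is a common convention, assumed here
    parser.add_argument("--out", default="-", metavar="FILE",
                        help="Output file (default: stdout)")
    parser.add_argument("fasta", metavar="seq.fa", help="Input FASTA file")
    return parser


def write_result(lines, out_path="-"):
    """Write `lines` to `out_path`, or to stdout when the path is '-'."""
    if out_path == "-":
        sys.stdout.writelines(lines)
        return
    with open(out_path, "w") as handle:
        handle.writelines(lines)
```

The user chooses the name, so the tool never has to invent one like ‘filt.seq.fa.out’.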
![Page 33: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/33.jpg)
KISS - run with minimum parameters
% biotool seq.fa
ERROR: missing -x parameter
% biotool -x 3 seq.fa
ERROR: missing -y parameter
% biotool -x 3 -y 7 seq.fa
ERROR: need -n name
# ARGH!
![Page 34: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/34.jpg)
Standards
![Page 35: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/35.jpg)
![Page 36: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/36.jpg)
Command line interface
Use the standard getopt interface
Short options ( -h ) and long options ( --help )
● C #include <getopt.h>
● C++ boost::program_options
● Python import argparse
● Perl use Getopt::Long
● R library(argparse)
![Page 37: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/37.jpg)
Unix exit codes
● A small non-negative integer (0 to 255)
● Loose standards
○ 0 = success
○ 1 = general failure
○ 2 = error with command line
○ 3..127 = user defined specific failures
● Result is in the shell $? variable
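A tool can follow these conventions by returning a named status from its main routine. In this Python sketch the specific code 3 for "input file not found" is a hypothetical tool-specific choice, and counting FASTA headers stands in for real work.

```python
import sys

EXIT_OK = 0        # success
EXIT_FAILURE = 1   # general failure
EXIT_USAGE = 2     # error with command line
EXIT_NO_INPUT = 3  # hypothetical tool-specific code: input file not found


def run(argv):
    """Count FASTA records in the file named by argv[0]; return an exit code."""
    if len(argv) != 1:
        print("Usage: biotool seq.fa", file=sys.stderr)
        return EXIT_USAGE
    try:
        with open(argv[0]) as handle:
            count = sum(1 for line in handle if line.startswith(">"))
    except FileNotFoundError:
        print(f"ERROR: cannot open file '{argv[0]}'", file=sys.stderr)
        return EXIT_NO_INPUT
    except OSError:
        return EXIT_FAILURE
    print(count)
    return EXIT_OK
```

A real script would finish with `sys.exit(run(sys.argv[1:]))` so the status reaches the shell.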
![Page 38: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/38.jpg)
Accessing exit codes in the shell
% ls /tmp/fake
ls: cannot access /tmp/fake
% echo $?
1
% ls /proc/cpuinfo
/proc/cpuinfo
% echo $?
0
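The same status is visible to any program that launches your tool, which is how pipelines and workflow managers decide whether a step failed. A small Python sketch, using the Python interpreter itself as a stand-in for a real tool:

```python
import subprocess
import sys


def exit_status(command):
    """Run `command` (a list of arguments) and return its exit code."""
    return subprocess.run(command).returncode


# Two child processes that exit with known codes
ok = exit_status([sys.executable, "-c", "raise SystemExit(0)"])
bad = exit_status([sys.executable, "-c", "raise SystemExit(3)"])
print(ok, bad)  # prints: 0 3
```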
![Page 39: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/39.jpg)
Using stdin, stderr and stdout
● Stdin (0): command < input
● Stdout (1): command > output
● Stderr (2): command 2> errors
● All: command < input > output 2> errors
● Allows piping!
sort input | command1 1> output 2> errors
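A pipe-friendly tool reads from stdin, writes results to stdout and keeps log messages on stderr. The sketch below assumes a trivial task (upper-casing FASTA sequence lines) chosen only for illustration; `upper_fasta` is a hypothetical name.

```python
import io
import sys


def upper_fasta(in_handle, out_handle, log_handle):
    """Copy FASTA records, upper-casing sequence lines; return record count."""
    count = 0
    for line in in_handle:
        if line.startswith(">"):
            count += 1
            out_handle.write(line)          # header lines pass through as-is
        else:
            out_handle.write(line.upper())  # sequence lines are upper-cased
    # Progress and statistics go to the log stream, never into the data
    print(f"Processed {count} sequences", file=log_handle)
    return count


def main():
    # In a pipeline: data in on stdin, data out on stdout, messages on stderr
    upper_fasta(sys.stdin, sys.stdout, sys.stderr)


# Demo with in-memory streams in place of the real ones
demo_out, demo_log = io.StringIO(), io.StringIO()
upper_fasta(io.StringIO(">s1\nacgt\n"), demo_out, demo_log)
```

Because messages never mix with data, the output of this tool can be piped straight into the next one.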
![Page 40: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/40.jpg)
This makes your tool useful in streaming
% zcat seq.fastq.gz |
cutadapt -a adapters.fa |
qualtrim -Q 20 |
bwa mem -t 8 ref.fa |
samtools sort --threads 4 > seq.bam
![Page 41: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/41.jpg)
Use standards compliant files *
● Feature coordinates
○ BED, GFF
● Columnar data (put headings!)
○ TSV
○ CSV
● Structured data
○ JSON
○ YAML
* XML excepted
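For columnar and structured output, Python's standard library already covers TSV and JSON. The records and column names below are made up for the example.

```python
import csv
import io
import json

# Hypothetical per-contig results
rows = [
    {"seqid": "contig1", "length": 1500, "gc": 0.41},
    {"seqid": "contig2", "length": 320, "gc": 0.38},
]


def to_tsv(records, fields):
    """Render records as tab-separated text with a heading line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()  # put headings!
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Render the same records as JSON for structured consumers."""
    return json.dumps(records, indent=2)


print(to_tsv(rows, ["seqid", "length", "gc"]), end="")
```

Using the csv module rather than joining strings by hand takes care of quoting if a value ever contains a tab or a newline.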
![Page 42: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/42.jpg)
Installation
![Page 43: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/43.jpg)
Keeping your audience
“Each equation in a book will halve your audience”
“Each difficulty encountered in installation will halve your number of users”
![Page 44: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/44.jpg)
Traditional systems level packaging
● Debian / DEB
apt-get install blast
dpkg -i blast-2.2.5-amd64.deb
● Redhat / RPM
yum install blast
rpm -i blast-2.2.5-x86_64.rpm
● Various others
![Page 45: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/45.jpg)
Cross platform solutions: Linux, Mac, Windows
● Brew
brew install blast
● Conda
conda install blast
● Others
○ GUIX, ...
○ Docker, AMI images
![Page 46: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/46.jpg)
Language specific repositories
● Python - PyPI
pip install ariba
● Perl - CPAN
cpanm Bio::Roary
● R - CRAN
install.packages("edgeseq3")
![Page 47: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/47.jpg)
Marketing
![Page 48: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/48.jpg)
Publish it
● Preprint archive
○ PeerJ, bioRxiv
● Method focussed journal
○ Bioinformatics, BMC Bioinformatics
● Software focussed journal
○ Journal of Open Source Software
![Page 49: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/49.jpg)
Plug it
● Twitter
○ Ask someone popular you know to retweet it
● Blog
○ Start a general blog and slot
● Conferences
○ Tell people about it
![Page 50: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/50.jpg)
Support your users
● Reply to emails
● Monitor your “Issues” web site
● Monitor Biostars and SEQanswers
● Have a mailing list
● Update your documentation
● Fix bugs
![Page 51: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/51.jpg)
Conclusions
![Page 52: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/52.jpg)
Take home messages
● Make it as painless as possible to install
● Keep documentation clear and simple
● Get people to use it before you publish
● People are not judging your coding skills
● But they will curse you if you waste their time
● Most users are grateful - leads to free beer
● A good tool is worth much more than a paper
![Page 53: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/53.jpg)
Acknowledgments
● Gary Glonek
● David Adelson
● Bernard Pope - VLSCI
● Dieter Bulach - VLSCI
● Anna Syme - VLSCI
● David Powell - Monash University
● Anders Goncalves da Silva - University of Melbourne
![Page 54: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/54.jpg)
References
1. https://gigascience.biomedcentral.com/articles/10.1186/2047-217X-2-15
2. http://berniepope.id.au/scientific_software_etiquette.html
3. http://thegenomefactory.blogspot.com.au/
![Page 55: How to write bioinformatics software people will use and cite - t.seemann - fri 2 dec - bis 2016 - adelaide, au](https://reader031.fdocuments.net/reader031/viewer/2022020203/58701e6e1a28ab7f428b75fb/html5/thumbnails/55.jpg)
The end.