EBI is an Outstation of the European Molecular Biology Laboratory.
EMBOSS
European Molecular Biology
Open Software Suite
Peter Rice
BOSC: EMBOSS 200912.04.232
A quick introduction
• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 200+ applications
• 100+ third party applications in 15 associated packages
• Project started 1996 at Sanger and HGMP
• Now based at EBI
• Release 6.1.0 15th July 2009
• Funded by UK-BBSRC and EMBL-EBI
A near death experience
• April 2004: The UK Medical Research Council decided to close the UK Human Genome Mapping Project Resource Centre (now the Rosalind Franklin Centre for Genomics Research)
• That was where all the EMBOSS developers worked
• We announced the potential end of EMBOSS development to our user community
• HGMP closed in July 2005
• The developers moved to EBI, with interim funding to April 2006
• Funding was secured in May 2006 (BBSRC)
• … and again in May 2009 (BBSRC)
• As far as we are aware, all our academic and industry users continued running EMBOSS … with no risk
• That is a huge advantage of open source licensing
Who do we serve?
• Expert software developers
  • Bioinformaticians
  • Computer scientists
• Expert users
  • Biology research community
  • Industry
• Scientific users
  • Biology research community
  • Industry
EMBOSS World Wide
We have users on every continent, and a picture to prove it. This is British Antarctica. We are promised another photo from the frozen North.
The first EMBOSS course was in Beijing, April 1999.
The wEMBOSS interface is from Canada, Argentina and Belgium
EMBOSS command line interface
• EMBOSS applications run from the command line
• This is not the only interface
  • There are over 100 interfaces and packaged systems available
• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface
EMBOSS command line example
% antigenic
Input protein sequence(s): uniprot:actb1_fugru
Minimum length of antigenic region [6]:
Output report [actb1_fugru.antigenic]:
% antigenic uniprot:actb1_fugru -auto
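The two invocations above (interactive prompts versus `-auto`) suggest how a script might drive the same program. The following is a minimal Python sketch, not part of the talk: it assumes the qualifier names `-sequence`, `-minlen` and `-outfile` mirror the parameter names in the ACD file for antigenic, and the helper names `build_antigenic_command` and `run_if_installed` are hypothetical.

```python
import shutil
import subprocess


def build_antigenic_command(sequence, minlen=6, outfile=None):
    """Build an argv list for a non-interactive antigenic run.

    Qualifier names are assumed to match the ACD parameter names
    (sequence, minlen, outfile); -auto suppresses the prompts.
    """
    argv = ["antigenic", "-sequence", sequence, "-minlen", str(minlen)]
    if outfile is not None:
        argv += ["-outfile", outfile]
    argv.append("-auto")
    return argv


def run_if_installed(argv):
    """Run the command only if the program is on PATH; otherwise return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=True)


command = build_antigenic_command("uniprot:actb1_fugru")
print(" ".join(command))
```

Building the argument list separately from running it keeps the sketch usable on machines where EMBOSS is not installed; `run_if_installed(command)` would execute it where it is.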
EMBOSS ACD File
application: antigenic [ documentation: "Finds antigenic sites in proteins" groups: "Protein:Motifs"]
section: input [ information: "Input section" type: "page"]
seqall: sequence [ parameter: "Y" type: "PureProtein" ]
endsection: input
section: required [ information: "Required section" type: "page"]
integer: minlen [ standard: "Y" minimum: "1" maximum: "50" default: "6" information: "Minimum length of antigenic region" ]
endsection: required
section: output [ information: "Output section" type: "page"]
report: outfile [ parameter: "Y" rformat: "motif" multiple: "Y" taglist: "int:pos=Max_score_pos" ]
endsection: output
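Because the ACD file is meant to serve as a template for other interfaces, an interface builder needs to read its definitions. The sketch below is an illustration only, not the EMBOSS ACD parser: it uses two regular expressions to pull `type: name [ attribute: "value" ... ]` blocks out of a shortened copy of the antigenic ACD text, and it assumes attribute values contain no square brackets. The names `parse_acd`, `BLOCK` and `ATTRIBUTE` are hypothetical.

```python
import re

# Shortened copy of the antigenic ACD definitions shown on the slide.
ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
section: required [
  information: "Required section"
  type: "page"
]
integer: minlen [
  standard: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length of antigenic region"
]
endsection: required
'''

# kind: name [ ... ]   (assumes no "]" inside attribute values)
BLOCK = re.compile(r'(\w+):\s*(\w+)\s*\[([^\]]*)\]')
# attribute: "value"
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return one dict per bracketed ACD block, in file order."""
    entries = []
    for kind, name, body in BLOCK.findall(text):
        entries.append({
            "kind": kind,
            "name": name,
            "attributes": dict(ATTRIBUTE.findall(body)),
        })
    return entries


entries = parse_acd(ACD_TEXT)
minlen = next(e for e in entries if e["name"] == "minlen")
print(minlen["attributes"]["information"], "[" + minlen["attributes"]["default"] + "]")
```

The printed prompt text and default correspond to the "Minimum length of antigenic region [6]" prompt in the command line example, which is how one definition can feed both the command line and a generated form.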
EMBOSS makes things easy
• ACD files define sequence input
  • Sequence type for DNA/protein, possible ambiguity codes, gaps
• Sequences in files
  • 40+ formats supported - auto detection
• Sequence databases
  • Remote servers
    • SRS, Entrez, MRS
    • User-specified URL
  • Locally indexed - using the original data files
  • Local script utilities
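The sequence inputs listed above reach an application as a single address string, such as `uniprot:actb1_fugru` in the earlier command line example. As a rough illustration, the Python sketch below splits a simplified address of the form `[format::]source[:entry]`. This is an assumed subset of the EMBOSS addressing syntax; the function name `split_usa` and the file name `myseqs.fasta` are hypothetical, and deciding whether `source` is a database or a file is left to the caller.

```python
def split_usa(usa):
    """Split a simplified sequence address into format, source and entry.

    Assumed shape: [format::]source[:entry]. Missing parts come back as None.
    """
    fmt = None
    if "::" in usa:
        # An explicit format prefix overrides auto detection.
        fmt, usa = usa.split("::", 1)
    source, _, entry = usa.partition(":")
    return {"format": fmt, "source": source, "entry": entry or None}


print(split_usa("uniprot:actb1_fugru"))
print(split_usa("fasta::myseqs.fasta"))
```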
EMBOSS Web Interface
http://emboss.ch.embnet.org/wEMBOSS/
EMBOSS SoapLab Service
MyGrid/EMBRACE projects: for use by Taverna Workflows
EMBOSS User Survey
EMBOSS Update
• Release 6.1.0 as usual on 15th July 2009
• New EMBL and UniProt formats
  • With full set of cross-references
• FASTQ short read formats
• Jemboss GUI included as standard
• Further profiling for enhanced efficiency
• 2000+ QA tests (more needed)
• Updated Phylip 3.68 … and file format variants
• Services for EMBRACE/SoapLab2
• DAS testing
Example Dasty screen:
Example Ensembl screen:
EMBOSS Future plans
• Three open source books: users, developers, admin
  • Cambridge University Press
  • Original text can be freely reused
• New areas of interest
  • Metadata and ontologies (EDAM, taxonomy, GO, SO, …)
  • (all) public data resources
  • Coordinate systems (Ensembl, gene/protein input/results)
  • Project-based working
  • Next-generation sequence data – used by ordinary biologists
  • 100+ new applications
• Database index updates
• Scientific advisory board
• Developer courses: anywhere, any time
Peter Rice, Alan Bleasby
Jon Ison, Mahmut Uludag
The EMBOSS Team
Mon 12:15 Technology Track
Mon 17:45 Poster U43
Wed 13:00 Birds of a Feather
Acknowledgements
• EBI: Peter Rice, Alan Bleasby, Jon Ison, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam
• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
• LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux...
• IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Accelrys, Cambridge University Press
• Open-Bio Foundation, Sourceforge
• ... And the British Antarctic Survey
http://emboss.sourceforge.net
http://emboss.open-bio.org/wiki