Ontologies and Continuous Integration
Continuous Integration of Open Biological Ontology Libraries
Chris Mungall, Lawrence Berkeley National Laboratory
Outline
• What Continuous Integration is and why we need it for ontologies
• A build tool for ontologies: Oort
• Example workflows: GO and HPO
• Lessons learned
Reuse and modularization of ontologies
• Re-use, don’t re-invent
  – OBO Foundry
• Modularize
  – Ontologies should not be monolithic standalone entities
  – Apply Rector normalization pattern
• Building block approach
  – Analogous to software engineering
http://obofoundry.org
Rector A. Modularisation of domain ontologies implemented in Description Logics and related formalisms including OWL. Proceedings of the 2nd International Conference on Knowledge Capture (2003)
Examples of ontology re-use
• GO is re-using the CHEBI classification of chemical entities
  – Using GONG* methodology
  – Automated classification
• The Human Phenotype (HP) ontology is re-using FMA classification of anatomical structures
• GFF3 format re-uses SO for genome feature types and validation
*Wroe, C. J., Stevens, R., Goble, C. A., & Ashburner, M. (2003). A methodology to migrate the gene ontology to a description logic environment using DAML+OIL. Pac Symp Biocomput, 624-35.
Reuse is not problem-free
• Modules which are tested in one context may not work in another
  – Example: Therac-25 radiation therapy machine fatal errors
  – Causes of failure were complex
    • Software tested and used on previous models was re-used
  – Most software engineers are personally familiar with less lethal examples
• Lesson:
  – Not an excuse to re-implement de novo
  – Integration testing is vital
  – This applies to ontologies too
    • Inter-ontology integration
    • Integration between ontologies and software systems
Integration testing in software engineering
• Agile, test-driven model
  – Automated continuous integration (CI) testing
  – Immediate feedback
• Traditional waterfall model
  – Integration testing at the end
  – Deferral = pain
http://martinfowler.com/articles/continuousIntegration.html
Example CI Server Architecture
Jenkins-CI
• A popular, extensible, open-source continuous integration server
• Easy to set up and administer
• Multiple plugins
• Large, helpful user base
• Powerful, clean web-based dashboard
• Integrates with most version control systems (VCSs)
http://jenkins-ci.org/
What’s this got to do with ontologies?
| Software engineering | Ontology engineering |
|---|---|
| Source code (.java, .pm) | Ontology (.owl, .obo) |
| Version control system | Version control system |
| Builds/releases | Builds/releases |
| IDE (Eclipse, NetBeans, …) | ODE (Protégé, OBO-Edit) |
| Bugs | ‘True path’ violations, inconsistencies |
| JUnit/xUnit tests | OWL logical axioms; structural constraints; terminology checks |
| Build tool (Ant, Maven) | ??? |
| Integration tests | ??? |
| Integration server | Integration server |
Oort: A build tool for ontologies
• What does it do?
  – Runs ‘ontology unit tests’ and creates releases
  – Logical tests:
    • No unsatisfiable classes
    • No inferred equivalencies between named classes
  – Other tests:
    • ≤ 1 textual definition per class
    • ≤ 1 RDFS label per class
• How does it work?
  – Built on top of the OWL API
    • Most OWL reasoners are available
  – GUI
    • For end users
  – Command line
    • For use in a CI server
http://code.google.com/p/owltools/wiki/OortIntro
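The ‘other tests’ are simple cardinality constraints on annotations. Below is a minimal sketch in plain Python, not Oort’s actual implementation (which is built on the Java OWL API). It assumes annotations have already been extracted into hypothetical `(class_id, property, value)` triples; the function name `cardinality_violations` and the `EX:` class IDs are made up for illustration.

```python
from collections import Counter

# Annotation properties checked. OBO ontologies conventionally use IAO:0000115
# for textual definitions and rdfs:label for the primary name.
LABEL = "rdfs:label"
DEFINITION = "IAO:0000115"


def cardinality_violations(triples, max_per_class=1):
    """Return sorted (class_id, property, count) tuples for every class with
    more than `max_per_class` labels or textual definitions.

    `triples` is a hypothetical flat list of annotation assertions:
    (class_id, annotation_property, value).
    """
    counts = Counter(
        (cls, prop) for cls, prop, _ in triples if prop in (LABEL, DEFINITION)
    )
    return sorted(
        (cls, prop, n) for (cls, prop), n in counts.items() if n > max_per_class
    )


if __name__ == "__main__":
    # Made-up class IDs and text, for illustration only.
    example = [
        ("EX:0001", LABEL, "carotenoid biosynthesis"),
        ("EX:0001", DEFINITION, "The formation of carotenoids."),
        ("EX:0002", LABEL, "xanthophyll biosynthesis"),
        ("EX:0002", LABEL, "xanthophyll formation"),  # second label: violation
    ]
    for cls, prop, n in cardinality_violations(example):
        print(f"FAIL {cls}: {n} values for {prop} (at most 1 allowed)")
```

Annotation properties other than the two listed are ignored, so a class may carry any number of comments or synonyms without failing the check.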
Example basic workflow
• Client:
  – Make local modifications using OBO-Edit
  – Commit changes to SVN
  – (Optionally) check the dashboard in a web browser
• build-go job:
  – Load main ontology
  – Import external disjointness axioms
  – Launch HermiT
  – Write reasoner report
  – Fail if unsatisfiable classes are found
  – Run additional Perl checks (ensure external xrefs resolve, etc.)
• Server:
  – Jenkins polls SVN
  – An external commit triggers Jenkins to launch the build-go job (using Oort)
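The build-go steps can be sketched as a script whose exit status decides the Jenkins build result (Jenkins marks a shell build step as failed when it exits non-zero). This is a toy under stated assumptions: the ontology is reduced to a hypothetical dictionary of asserted SubClassOf edges, the ‘reasoner’ is a crude structural check standing in for HermiT, and the Perl xref checks are omitted. All function names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def ancestors(cls: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive-transitive closure over asserted SubClassOf edges."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def unsatisfiable_classes(
    parents: Dict[str, Set[str]], disjoint_pairs: Iterable[Tuple[str, str]]
) -> List[str]:
    """Classes that inherit from both members of some disjoint pair.

    A crude structural stand-in for a DL reasoner such as HermiT, which
    detects many more kinds of unsatisfiability.
    """
    pairs = list(disjoint_pairs)
    classes = set(parents) | {p for ps in parents.values() for p in ps}
    bad = []
    for c in sorted(classes):
        anc = ancestors(c, parents)
        if any(a in anc and b in anc for a, b in pairs):
            bad.append(c)
    return bad


def build_job(main_parents, external_disjoint_pairs) -> int:
    """Hypothetical build-go job body: check the main ontology together with
    imported disjointness axioms, print a report, return an exit status."""
    bad = unsatisfiable_classes(main_parents, external_disjoint_pairs)
    # The report goes to stdout, which Jenkins keeps as the console log.
    print(f"Unsatisfiable classes: {len(bad)}")
    for c in bad:
        print(f"  {c}")
    return 1 if bad else 0  # non-zero => Jenkins marks the build as failed


if __name__ == "__main__":
    # Toy ontology: D is asserted under both B and C, which are declared disjoint.
    status = build_job({"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}, [("B", "C")])
    print("exit status:", status)
```

In a real job the script would end with `sys.exit(status)` so that the non-zero status actually fails the build; it is printed here only to keep the sketch side-effect free.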
Example basic workflow
• FAIL:
  – Jenkins sends an email alert to the mailing list
  – GO editor debugs, fixes, then recommits
• SUCCESS:
  – Write reasoner report
  – If the previous build failed, Jenkins sends a ‘service resumed’ email
  – Downstream jobs are triggered (e.g. bigger integrated builds, deployment)
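The FAIL/SUCCESS branches amount to a small decision over the current and previous build results. In Jenkins itself this behaviour is configured (email notification settings, downstream job triggers) rather than coded; the sketch below only makes the logic explicit, and the action strings are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Result(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


def post_build_actions(current: Result, previous: Optional[Result]) -> List[str]:
    """Actions taken after a build, following the FAIL/SUCCESS branches."""
    if current is Result.FAILURE:
        # Editors are alerted; their fix and recommit will trigger a new build.
        return ["email mailing list: build failed"]
    actions = ["write reasoner report"]
    if previous is Result.FAILURE:
        # Only a recovery (fail -> success) produces the 'service resumed' mail.
        actions.append("email mailing list: service resumed")
    actions.append("trigger downstream jobs")
    return actions


if __name__ == "__main__":
    history = [Result.SUCCESS, Result.FAILURE, Result.SUCCESS]
    previous = None
    for result in history:
        print(result.value, "->", post_build_actions(result, previous))
        previous = result
```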
OBO Jenkins dashboard
http://build.berkeleybop.org/
• Red ball = FAIL
• In progress: Cell Ontology (CL) build
• ‘Outlook’ (Jenkins weather icon summarizing recent build health)
Why we need this for GO
• GO is gradually moving towards leveraging external ontologies and automated reasoning
  – E.g. new metabolism terms come in via TermGenie
    • User simply selects a CHEBI class
  – Automated graph placement (ELK)
‘carotenoid biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some carotenoid
‘xanthophyll biosynthesis’ EquivalentTo biosynthesis and ‘has output’ some xanthophyll
http://go.termgenie.org
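For definitions sharing the pattern `genus and 'has output' some X`, placement follows from the CHEBI hierarchy: if xanthophyll is a subclass of carotenoid, then ‘xanthophyll biosynthesis’ is a subclass of ‘carotenoid biosynthesis’. ELK derives this (and much more) by full EL reasoning; the sketch below is a deliberately narrow structural version, assuming identical genus terms and a hypothetical CHEBI fragment supplied as a dictionary.

```python
from typing import Dict, List, Set, Tuple


def closure(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive-transitive closure over is_a edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def place(defs: Dict[str, Tuple[str, str]],
          chebi_parents: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Infer (sub, super) pairs between classes defined as
    `genus and 'has output' some filler`: same genus, and the super's filler
    is the sub's filler or one of its CHEBI ancestors."""
    inferred = []
    for sub, (genus1, filler1) in defs.items():
        filler_ancestors = closure(filler1, chebi_parents)
        for sup, (genus2, filler2) in defs.items():
            if sub != sup and genus1 == genus2 and filler2 in filler_ancestors:
                inferred.append((sub, sup))
    return sorted(inferred)


if __name__ == "__main__":
    # Assumed CHEBI fragment: xanthophyll is_a carotenoid.
    chebi = {"xanthophyll": {"carotenoid"}}
    go_defs = {
        "carotenoid biosynthesis": ("biosynthesis", "carotenoid"),
        "xanthophyll biosynthesis": ("biosynthesis", "xanthophyll"),
    }
    for sub, sup in place(go_defs, chebi):
        print(f"'{sub}' SubClassOf '{sup}'")
```

If two named classes share the same genus and filler, the sketch infers each as a subclass of the other, i.e. an equivalence, which is exactly what Oort’s ‘no inferred equivalencies between named classes’ test is meant to catch.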
Why we need this for GO
• Automated quality control using reasoning
  – Taxon constraints
  – Useful for detecting false function predictions
‘carotenoid biosynthesis’ DisjointWith ‘in taxon’ some Metazoa
‘in taxon’ some Metazoa DisjointWith ‘in taxon’ some Viridiplantae
Deegan, J., Dimmer, E., & Mungall, C. J. (2010). Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development. BMC bioinformatics, 11(1), 530. BioMed Central Ltd. doi:10.1186/1471-2105-11-530
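A taxon constraint turns an inconsistent annotation into an unsatisfiable class. The sketch below is a simplified, non-DL approximation of that check, assuming a hypothetical list of `(gene, GO class, organism)` annotations, a drastically shortened taxonomy, and a `never_in_taxon` map derived from DisjointWith axioms like the first one above; gene names are invented. Disjointness between sibling taxa (the second axiom) is implicit here because the toy taxonomy is a tree.

```python
from typing import Dict, Iterable, List, Set, Tuple


def closure(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive-transitive closure over is_a edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def taxon_violations(
    annotations: Iterable[Tuple[str, str, str]],
    go_parents: Dict[str, Set[str]],
    never_in_taxon: Dict[str, Set[str]],
    taxon_parents: Dict[str, Set[str]],
) -> List[Tuple[str, str, str, str, str]]:
    """Return (gene, GO class, organism, constrained GO class, taxon) for every
    annotation whose GO class, or an is_a ancestor of it, is declared disjoint
    with 'in taxon' some T where T is in the organism's lineage."""
    found = []
    for gene, go_class, organism in annotations:
        lineage = closure(organism, taxon_parents)
        for go_anc in closure(go_class, go_parents):
            for taxon in never_in_taxon.get(go_anc, ()):
                if taxon in lineage:
                    found.append((gene, go_class, organism, go_anc, taxon))
    return sorted(found)


if __name__ == "__main__":
    # Toy data: invented genes and a drastically shortened taxonomy.
    never_in = {"carotenoid biosynthesis": {"Metazoa"}}
    go_is_a = {"xanthophyll biosynthesis": {"carotenoid biosynthesis"}}
    taxonomy = {
        "Homo sapiens": {"Mammalia"},
        "Mammalia": {"Metazoa"},
        "Arabidopsis thaliana": {"Viridiplantae"},
    }
    annots = [
        ("geneX", "xanthophyll biosynthesis", "Homo sapiens"),
        ("geneY", "carotenoid biosynthesis", "Arabidopsis thaliana"),
    ]
    for v in taxon_violations(annots, go_is_a, never_in, taxonomy):
        print("Inconsistent annotation:", v)
```

Note that the human annotation is caught even though it is to the more specific class: the constraint is inherited down the GO graph, just as the organism inherits its lineage up the taxonomy.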
Errors propagate in an integrated environment
Inference: Ada SubClassOf owl:Nothing
Server-side integration tests are vital
• The problem may not be apparent in a developer’s local environment
  – It manifests when GO is integrated with gene associations
• With CI, errors can be fixed at source
Staged builds
• Fowler principle: ‘Keep the build fast’
• Staged builds
  – Balance the needs of bug finding and speed
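A staged build runs cheap checks first and only pays for the slow integration stages when the cheap ones pass. A minimal sketch of that control flow, with hypothetical stage names; in Jenkins, stages are more commonly separate jobs chained through downstream triggers, as in the workflow slides.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_staged(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order (cheapest first) and stop at the first failure.

    Returns the names of the stages that passed and the name of the failing
    stage, or None if every stage passed.
    """
    passed: List[str] = []
    for name, check in stages:
        if not check():
            return passed, name  # later, slower stages are never started
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    # Hypothetical stage names and outcomes, for illustration only.
    pipeline = [
        ("parse ontology", lambda: True),
        ("structural checks", lambda: True),
        ("reasoner on GO alone", lambda: False),
        ("integrated build with gene associations", lambda: True),
    ]
    ok, failed = run_staged(pipeline)
    print("passed:", ok)
    print("failed at:", failed)
```

The design choice is the trade-off the slide names: fast feedback on the common, cheap failures, while the expensive integration stage still runs on every build that gets that far.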
User experience
• Previous environment:
  – Daily cron job, monolithic Perl scripts
• Informal survey results:
  – Gene Ontology developers love Jenkins
• Popular features:
  – Transparency of build process
  – Direct feedback
  – User-friendliness
  – ‘Build lights’
• Particularly useful for OBO/OWL hybrid workflows
Human Phenotype Ontology is deployed using CI
• HPO: ~10k classes
• Logical definitions have dependencies on:
  – FMA; PATO; Uberon; GO; CL
• Annotations
  – Link OMIM disorders to HPO classes
• Validation
  – Oort and GULO
• Uses Hudson rather than Jenkins
Koehler S et al. (2011) Improving ontologies by automatic reasoning and evaluation of logical definitions. BMC Bioinformatics 12(1)
CI best practice: use a VCS
• Ontologies are source code
  – Always use a version control system to manage your source code
    • Sorry, this is non-negotiable
• CI server integration with VCSs is a great feature
  – Polling
  – Commit metadata coupled with builds
• Downside of VCSs:
  – OWL syntaxes are almost always preferable to OBO format, except:
    • They suck with VCSs (spurious diffs)
    • We’re working on a solution
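Spurious diffs typically arise when a tool re-saves an essentially unchanged ontology with its axioms in a different order. One common mitigation, sketched below and not necessarily the solution the slide refers to, is to serialize axioms in a deterministic sorted order. The axiom strings are illustrative, loosely modelled on OWL functional syntax, and the function names are hypothetical.

```python
import difflib
from typing import Iterable, List


def canonical(axioms: Iterable[str]) -> List[str]:
    """One axiom per line, de-duplicated, in a stable sorted order."""
    return sorted({a.strip() for a in axioms if a.strip()})


def changed_lines(old: List[str], new: List[str]) -> List[str]:
    """Added/removed lines of a unified diff, excluding the ---/+++ headers."""
    return [
        line
        for line in difflib.unified_diff(old, new, lineterm="")
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    # Illustrative axioms: the second save reorders them and adds one new axiom.
    v1 = ["SubClassOf(:B :A)", "SubClassOf(:C :A)"]
    v2 = ["SubClassOf(:C :A)", "SubClassOf(:B :A)", "SubClassOf(:D :A)"]
    print("raw diff:      ", changed_lines(v1, v2))
    print("canonical diff:", changed_lines(canonical(v1), canonical(v2)))
```

With the canonical ordering, the commit diff shows only the one genuinely new axiom, which is what a reviewer reading the VCS history wants to see.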
Future Enhancements
• Migrate OBO-Edit verification checks to the OWL API
• Phase out Perl and OBO-Format validation scripts and move to the OWL API plus OPPL2 for scripting
• Extend the GO validation pipeline to include term enrichment gold-standard sets
  – E.g. after an ontology change, does the p-value of angiogenesis change in the glioblastoma gene set?
    • (Example stolen from Erik Clarke’s talk)
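One way to turn the angiogenesis/glioblastoma example into an automated regression check is to recompute a term-enrichment p-value before and after each ontology change and flag large shifts. The sketch assumes a one-sided hypergeometric test (a common choice for term enrichment, not necessarily the one GO would use), hypothetical gene counts, and an arbitrary one-order-of-magnitude threshold.

```python
from math import comb, log10


def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background population
    K: background genes annotated to the term (after true-path propagation)
    n: genes in the study set (e.g. a glioblastoma gene set)
    k: study-set genes annotated to the term
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)  # exact integer arithmetic until this division


def shifted(p_before: float, p_after: float, max_log10_shift: float = 1.0) -> bool:
    """True if the p-value moved by more than `max_log10_shift` orders of magnitude."""
    return abs(log10(p_after) - log10(p_before)) > max_log10_shift


if __name__ == "__main__":
    # Hypothetical counts for the angiogenesis term before and after an ontology edit.
    before = enrichment_p(N=1000, K=40, n=50, k=6)
    after = enrichment_p(N=1000, K=55, n=50, k=9)
    print(f"p before={before:.3g}, after={after:.3g}, flag={shifted(before, after)}")
```

An ontology change can move both K and k without any change to the annotation files, because annotations propagate up the graph (‘true path’); that is why a gold-standard term’s p-value is a useful end-to-end signal.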
Availability
• Oort:
  – http://code.google.com/p/owltools/wiki/OortIntro
• OBO build server:
  – http://build.berkeleybop.org
  – You can request to have your ontology and custom build pipeline added: [email protected]
• Easy to clone our config and set up your own server
Conclusions
• What works for software can work for ontologies
  – Ontology engineering should become more like software engineering
• Ontology re-use can be hard
  – A CI server is vital for staying integrated
• Simple = good
  – Admin: Jenkins is easy to set up and maintain
  – Users: +1
• Successful for GO, HPO
  – Now being extended to other ontologies
  – May be a vital component in OBO Foundry infrastructure
• CI will be integral as information systems evolve to depend more on ontologies
Acknowledgments
• Tanya Berardini, Rebecca Foulger, David Hill, Jane Lomax, Paola Roncaglia, Midori Harris, Ramona Walls, Laurel Cooper (beta testers)
• Heiko Dietze (Oort)
• Sebastian Bauer (HPO)
• Seth Carbon, Amelia Ireland (Jenkins wrangling)
• GO PIs
• Jenkins