BioOntologies SIG, ISMB/ECCB 2007
Daniel Schober, EMBL-EBI
Naming conventions for ontology engineering
Daniel Schober, PhD
The European Bioinformatics Institute (EBI)
NET Project – Postdoctoral Ontologist
www.ebi.ac.uk/net-project
Collaborative Efforts – Scenario
• Metabolomics Standards Initiative (MSI)
  – Describe metabolomics laboratory workflows
    • Minimal requirements, augmenting exchange formats
  – Ontology working group under OBI…
• Ontology for Biomedical Investigations (OBI)
  – Larger collaborative, multi-domain effort
    • Brings together various ‘omics’ and biomedical communities
  – Describe general laboratory workflows
    • Experimental design, protocols, data analysis, etc.
  – Developed under the OBO Foundry…
• Open Biomedical Ontologies (OBO) Foundry
  – Provides best practices for ontology engineering
  – Creates a complete suite of orthogonal and interoperable ontologies
    • Over 60 ontologies and ~10 core Foundry ontologies
Collaborative Efforts – Challenges
• Create networked, orthogonal ontologies
  – Integrating the MSI ontology with OBI
  – Integrating OBI with BFO and other OBO Foundry ontologies, e.g.
    • PATO (qualities), ChEBI (chemicals), …
• Integrate modular developments
  – Parallel branch development
  – OWL import, referencing
• Improve communication among developers
  – Database developers and biologists
  – Semantic web and text mining communities
-> We need common naming conventions
  – To harmonize the appearance and design of modules
Common Naming Conventions – Why?
• Representational artefacts are built according to different
  – engineering methodologies
    • MethOntology, Tove, Enterprise, …
  – engineering tools
    • Protégé, OBO-Edit, OntoEdit, …
  – representation languages and semantics
    • OBO, OWL and CLIPS-Frames, …
  – engineering ‘schools’ and philosophies
    • GO, semantic web, AI (Protégé Frames), …
    • Manchester, Saarbruecken, Stanford, Trento, Karlsruhe, …
    • Realists, Conceptualists, …
• The naming conventions applied are as diverse as these backgrounds!
  – Diverse ad hoc ways to name what is represented
• Separator: space vs. underscore vs. nil
• Case: UpperCamelCase vs. underscore
• Namespace prefix
• Acronyms
• Synonyms
• Administrative helper classes
• Compound name
• Singular vs. plural, xref
• Instance convention
• ID convention: uppercase prefix, underscore, number vs. lowercase prefix, colon, string, or no name, just an ID string
• Omissions
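The separator, case and ID-convention dimensions listed above can be made concrete with a small classifier. This is a minimal Python sketch, not part of the original proposal: the function names and the regular expression chosen for each ID convention are assumptions that simply mirror the wording of the list.

```python
import re

# Hypothetical patterns for the three ID conventions named above;
# the exact character classes are assumptions, not a published specification
ID_PATTERNS = {
    "prefix_underscore_number": re.compile(r"[A-Z]+_[0-9]+"),   # uppercase prefix, underscore, number
    "prefix_colon_string": re.compile(r"[a-z]+:[A-Za-z0-9_]+"),  # lowercase prefix, colon, string
    "bare_string": re.compile(r"[A-Za-z][A-Za-z0-9]*"),          # no prefix, just an ID string
}


def classify_id(identifier: str) -> str:
    """Return the first ID convention that the whole identifier matches."""
    for convention, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(identifier):
            return convention
    return "unknown"


def classify_separator(label: str) -> str:
    """Guess which word separator a label uses: space, underscore or nil (CamelCase)."""
    if " " in label:
        return "space"
    if "_" in label:
        return "underscore"
    # No explicit separator: a lowercase-to-uppercase transition suggests CamelCase
    if re.search(r"[a-z][A-Z]", label):
        return "nil (CamelCase)"
    return "single word"


if __name__ == "__main__":
    print(classify_id("OBI_334"))                # prefix_underscore_number
    print(classify_id("IndependentContinuant"))  # bare_string
    print(classify_separator("CapNMRProbe"))     # nil (CamelCase)
```

As written, an OBO-style ID such as ‘GO:0030583’ (uppercase prefix, colon, number) matches none of the three patterns and is reported as unknown, a small reminder of how many variants are in use. The CamelCase heuristic would also flag a single mixed-case token such as ‘pH’.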
Existing Naming Conventions – Status
• Semantic Web Best Practices and Deployment Working Group website
  – Format specific: OWL
  – Limited visibility: information dispersed and embedded in many documents
• BioPAX manual
  – Limited visibility: naming conventions dealt with only implicitly in the general documentation
  – Implementation specific: naming conventions discussed at the implementation level (Protégé/OWL)
  – Limited coverage: IDs addressed marginally (page 53, Technical Notes RDF:ID); no conventions on relations
• GO developers' style guide
  – Format specific: mainly OBO; has its own definition of namespace, which differs from the one in OWL/semantic web
  – Limited visibility: naming conventions dispersed across websites, e.g. GO namespace, term names and identifiers are explained in different documents
Existing Naming Conventions – Status
• ISO standards
  – Information overflow: about 40 documents containing closely related guidelines
  – Limited access: commercial
• ANSI/NISO Z39.19-2005
  – Semantics specific: controlled vocabulary, e.g. about terms, not classes
  – Limited coverage: term ID handling and versioning not addressed
• “Law and order: Assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy (FMA)”, S. Zhang, O. Bodenreider, Computers in Biology and Medicine 36 (2006)
  – Scientific domain dependent: anatomy
  – Hardly visible: paper access
-> Acceptance and visibility are ‘limited’ to specific target communities
-> We need universally applicable conventions
Our Goals
• Overcome diversity and fragmentation
  – Collect existing naming conventions
    • Make them accessible via a repository
  – Review and compare
    • Create a single common document
  – Distil universally valid aspects for OWL and OBO
  – Ensure visibility for target domains
  – Move towards a common resource for the OBO Foundry groups
• Provide best practice guidelines
  – Provide robust names for ontology classes
  – Not a ‘knowledge representation language’ for names, as e.g. HUGO provides for gene symbols (awgTg(GBtslenv)832Pkw)
• Engage in discussion with other groups
  – A two-phase approach …
Towards Common Naming Conventions
• Phase 1: Straw man document
  – “Working towards naming conventions for use in controlled vocabulary and ontology engineering”
    • See Bio-Ontologies SIG Proceedings, pp. 29-32
  – Created for the MSI Ontology WG, targeting the larger OBI group
  – Implementation and format independent
• Phase 2: Survey of OBO Foundry groups
  – Questionnaire (work in progress)
    • Ontology and engineering process
    • Current practice in naming entities
    • Envisioned benefits of common conventions
    • In-depth questions on particular conventions
  – Results to be posted on the OBO Foundry wiki
Naming Convention Straw Man - Examples
• Explicit and concise names
  – Avoid omissions and ellipses
    • Plant Ontology (PO) used 'cell' for 'plant cell'
  – Avoid negative names like ‘non-separation device’
  – Avoid ambiguous words
    • 30 meanings of ‘set’, e.g. plurality in ‘protocol set’ or action in ‘parameter set’
  – Brand name convention: use [company name + brand name + superclass]
    • ‘US 2’ becomes ‘Bruker US 2 NMR magnet’
  -> To ensure a shared understanding of the intended meaning
• Typographical issues
  – Use lowercase, as in natural language
    • Most flexible, e.g. ‘pH’, ‘DNA_hybridisation’ (no problems at acronym borders)
  – Avoid punctuation and sub/superscripts
  – Resolve special characters consistently, e.g. ‘α’ -> ‘alpha’
  -> To ensure readability and reduce diversity in appearance
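Several of these straw man rules are mechanical enough to check or apply automatically. The following is a minimal Python sketch under stated assumptions: the lookup tables SPECIAL_CHARACTERS and PRESERVED_TOKENS and the negation heuristic are hypothetical illustrations, not part of the proposed conventions or of any existing tool.

```python
import re

# Hypothetical lookup tables: a real project would curate its own lists
SPECIAL_CHARACTERS = {"α": "alpha", "β": "beta", "γ": "gamma"}
PRESERVED_TOKENS = {"pH", "DNA", "NMR"}  # tokens whose case carries meaning


def normalize_label(label: str) -> str:
    """Spell out special characters, drop punctuation, lowercase, join with underscores."""
    for char, spelled_out in SPECIAL_CHARACTERS.items():
        label = label.replace(char, spelled_out)
    # Every run of characters outside A-Z, a-z and 0-9 (punctuation, spaces,
    # unmapped symbols) is treated as a word border and dropped
    words = [word for word in re.split(r"[^A-Za-z0-9]+", label) if word]
    # Lowercase as in natural language, keeping preserved tokens verbatim
    words = [word if word in PRESERVED_TOKENS else word.lower() for word in words]
    return "_".join(words)


def brand_name(company: str, brand: str, superclass: str) -> str:
    """Compose a name following [company name + brand name + superclass]."""
    return f"{company} {brand} {superclass}"


def is_negative_name(label: str) -> bool:
    """Heuristically flag names that start with, or contain, a 'non'/'not' word."""
    return re.search(r"(^|[-_ ])(non|not)[-_ ]", label.lower()) is not None


if __name__ == "__main__":
    print(normalize_label("DNA Hybridisation"))       # DNA_hybridisation
    print(normalize_label("α-helix"))                  # alpha_helix
    print(brand_name("Bruker", "US 2", "NMR magnet"))  # Bruker US 2 NMR magnet
    print(is_negative_name("non-separation device"))   # True
```

Keeping the preserved tokens in an explicit list is a design choice: blanket lowercasing would turn ‘pH’ into ‘ph’, whereas the list keeps such tokens intact while everything else follows natural-language case.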
Naming Convention Straw Man - Examples
• Lexical issues
  – Reuse words and avoid synonyms within compound names
    • ‘x_part_of_process’, ‘y_part_of_process’ and ‘z_part_of_process’ instead of ‘x_component_of_process’, ‘y_portion_of_process’ and ‘z_part_of_process’
    -> To decrease the learning and search burden on the user side and to ease text mining by reducing string variability
  – Use an underscore or space separator (instead of CamelCase)
    • Prevents distortions like ‘CapNMRProbe’ and ‘pHValue’, yet allows brand names like ‘SampleJet’
    -> To ease text mining and readability (demarcated word borders)
  – Use the singular nominal word form
    • Avoid inconsistencies like ‘biphenyl’ (CHEBI:17097) under an IUPAC-required ‘biphenyls’ (CHEBI:22888)
    -> To harmonize appearance, avoid redundancy, and ease ontology cross-referencing and import
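The CamelCase distortion and the synonym rule can both be demonstrated in a few lines. This Python sketch is illustrative only: the naive splitting rule and the hypothetical PREFERRED_WORD table are assumptions, not tools used by the authors.

```python
import re

# Hypothetical synonym table mapping variant words onto one preferred word
PREFERRED_WORD = {"component": "part", "portion": "part"}


def split_camel_case(name: str) -> list[str]:
    """Naively split wherever a lowercase letter or digit precedes an uppercase letter."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).split()


def harmonize(name: str) -> str:
    """Replace synonyms inside an underscore-separated compound name by the preferred word."""
    return "_".join(PREFERRED_WORD.get(word, word) for word in name.split("_"))


if __name__ == "__main__":
    print(split_camel_case("CapNMRProbe"))      # ['Cap', 'NMRProbe']
    print(split_camel_case("pHValue"))          # ['p', 'HValue']
    print("pH_value".split("_"))                # ['pH', 'value']
    print(harmonize("x_component_of_process"))  # x_part_of_process
```

A naive splitter cannot recover the intended word borders in ‘CapNMRProbe’ or ‘pHValue’, whereas the underscore-separated ‘pH_value’ splits unambiguously on ‘_’, which is the text-mining argument made above.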
Common Naming Convention – Open Issues
• Syntactic issues
  – Qualifier order: put the qualifier term before the part being qualified?
    • ‘NMR_instrument’ in place of ‘instrument_for_NMR’
  – ‘Helper’ strings in class names: establish general ones?
    • E.g. the ‘sensu’ postfix in GO to indicate species specificity: ‘fruiting body development (sensu Bacteria)’ (GO:0030583)
• Semantic issues
  – Administrative ‘helper’ classes: how should these metadata bins be named?
    • unclassified (OBI_200067), ChEBI_objects (OBI_336), toBeDiscussed, _collected_relations
  – Identifiers and namespaces: are conventions useful?
    • OBI uses [group prefix + underscore + unique number], e.g. OBI_334
    • BFO uses [meaningful string], e.g. IndependentContinuant
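Both syntactic questions can be explored mechanically. The following Python sketch, under stated assumptions, splits off a GO-style ‘sensu’ postfix and rewrites a hypothetical ‘<head>_for_<qualifier>’ pattern into qualifier-first order. Since qualifier order is listed above as an open issue, the rewrite rule is only one possible choice, not a settled convention.

```python
import re
from typing import Optional, Tuple

# Matches names ending in a GO-style '(sensu <taxon>)' postfix
SENSU_POSTFIX = re.compile(r"^(?P<base>.+?)\s*\(sensu (?P<taxon>[^)]+)\)$")


def parse_sensu(name: str) -> Tuple[str, Optional[str]]:
    """Split a name into its base and the taxon given in a 'sensu' postfix, if any."""
    match = SENSU_POSTFIX.match(name)
    if match is None:
        return name, None
    return match.group("base"), match.group("taxon")


def qualifier_first(name: str) -> str:
    """Rewrite the hypothetical pattern '<head>_for_<qualifier>' as '<qualifier>_<head>'."""
    head, separator, qualifier = name.partition("_for_")
    if not separator:
        return name  # pattern absent: leave the name unchanged
    return f"{qualifier}_{head}"


if __name__ == "__main__":
    print(parse_sensu("fruiting body development (sensu Bacteria)"))
    # ('fruiting body development', 'Bacteria')
    print(qualifier_first("instrument_for_NMR"))  # NMR_instrument
```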
Common Naming Convention - Benefits
• Communication has improved …
  – in geographically distributed, collaborative efforts
  – between developers from different domains and backgrounds
• The appearance of what we represent has been normalized
  – Not just a matter of aesthetics
  – Manoeuvring within the hierarchy became faster
… we further envision …
• Facilitated access to ontologies through meta-tools
  – Reducing the diversity that ontology libraries and tools have to cope with, e.g. OLS, BioPortal, PROMPT and text mining tools
• Facilitating ontology integration and cross-referencing
  – Comparison, alignment (OWL import) and mapping
• Serving as a guideline for new communities
Acknowledgements and Resources
• Authors and those contributing to the discussion
  – Susanna-Assunta Sansone, Philippe Rocca-Serra, Suzi Lewis, Waclaw Kusnierczyk, Barry Smith, Chris Mungall, Jane Lomax, Robert Stevens, Frank Gibson, Luisa Montecchi-Palazzi, Dietrich Rebholz
• Members of the MSI, PSI and OBI groups and the OBO Foundry coordinators
  – http://msi-ontology.sf.net
  – http://psidev.sf.net
  – http://obi.sf.net
  – http://obofoundry.org
• Further information
  – “Working towards naming conventions for use in controlled vocabulary and ontology engineering”, Bio-Ontologies SIG Proceedings, pp. 29-32
• Funding sources (supporting my work)
  – UK BBSRC e-Science BB/D524283/1 and BB/E025080/1
  – Semantic Mining NoE (visits to IFOMIS and Manchester)