
Page 1: Naming conventions for ontology engineering

BioOntologies SIG, ISMB/ECCB 2007
Daniel Schober, PhD (EMBL-EBI)
The European Bioinformatics Institute (EBI)
NET Project – Postdoctoral Ontologist
www.ebi.ac.uk/net-project

Page 2: Collaborative Efforts – Scenario

• Metabolomics Standards Initiative (MSI)
  – Describe metabolomics laboratory workflows
    • Minimal requirements, augmenting exchange formats
  – Ontology working group under OBI…
• Ontology for Biomedical Investigations (OBI)
  – Larger collaborative, multi-domain effort
    • Brings together various ‘omics’ and biomedical communities
  – Describe general laboratory workflow
    • Experimental design, protocols, data analysis etc.
  – Developed under OBO Foundry…
• Open Biomedical Ontologies (OBO) Foundry
  – Provides best practices for ontology engineering
  – Creates a complete suite of orthogonal and interoperable ontologies
    • Over 60 ontologies and ~10 core Foundry ontologies

Page 3: Collaborative Efforts – Challenges

• Create networked orthogonal ontologies
  – Integrating MSI ontology with OBI
  – Integrating OBI with BFO and other OBO Foundry ontologies, e.g.
    • PATO (qualities), ChEBI (chemicals), …
• Integrate modular developments
  – Parallel branch development
  – OWL import, referencing
• Improve the communication among developers
  – Database developers and biologists
  – Semantic web and text miners

-> We need common naming conventions
  – To harmonize the appearance and design of modules

Page 4: Common Naming Conventions – Why?

• Representational artefacts are built according to different
  – Engineering methodologies
    • MethOntology, TOVE, Enterprise, …
  – Engineering tools
    • Protégé, OBO-Edit, OntoEdit, …
  – Representation languages and semantics
    • OBO, OWL and CLIPS-Frames, …
  – Engineering ‘schools’ and philosophies
    • GO, semantic web, AI (Protégé Frames), …
    • Manchester, Saarbruecken, Stanford, Trento, Karlsruhe, …
    • Realists, Conceptualists, …
• The naming conventions applied are as diverse as these backgrounds!
  – Diverse ad hoc ways to name what is represented

Page 5: Naming aspects

• Separator: space vs. underscore vs. nil
• Case: UpperCamelCase vs. underscore
• Namespace prefix
• Acronyms
• Synonyms
• Administrative helper classes
• Compound names
• Singular vs. plural, xref
• Instance convention
• ID convention: uppercase prefix, underscore, number vs. lowercase prefix, colon, string, or no name, just an ID string
• Omissions
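To make the first two aspects (separator and case) concrete, the minimal Python sketch below renders one compound name in each style. `render_styles` is a hypothetical helper written for illustration, not part of any ontology tool. It shows that UpperCamelCase rewrites the acronym ‘pH’ as ‘PH’, and that the nil separator leaves no recoverable word border, whereas the underscore form splits back into the original words.

```python
def render_styles(words):
    """Hypothetical helper: render one compound name in several naming styles."""
    return {
        "space": " ".join(words),
        "underscore": "_".join(words),
        "nil": "".join(words),  # word borders are lost entirely
        # Capitalising each word's first letter alters acronyms such as 'pH'.
        "UpperCamelCase": "".join(w[:1].upper() + w[1:] for w in words),
    }


styles = render_styles(["pH", "value"])
for style, name in styles.items():
    print(f"{style:15} {name}")
```

Only the space and underscore forms can be split back into the original words without guessing.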

Page 6: Existing Naming Conventions – Status

• Semantic Web Best Practices and Deployment group website
  – Format specific: OWL
  – Limited visibility: information dispersed across and embedded in many documents
• BioPAX manual
  – Limited visibility: naming conventions only implicitly dealt with in the general documentation
  – Implementation specific: naming conventions discussed at implementation level (Protégé/OWL)
  – Limited coverage: IDs addressed only marginally (page 53, Technical Notes RDF:ID), no conventions on relations
• GO developers’ style guide
  – Format specific: mainly OBO; has its own definition of namespace, which differs from the one in OWL/semantic web
  – Limited visibility: naming conventions dispersed throughout websites, e.g. GO namespace, term names and identifiers are explained in different documents

Page 7: Existing Naming Conventions – Status

• ISO standards
  – Information overflow: about 40 documents that contain closely related guidelines
  – Limited access: commercial
• ANSI/NISO Z39.19-2005
  – Semantics specific: controlled vocabulary, e.g. about terms, not classes
  – Limited coverage: no term ID handling or versioning addressed
• “Law and order: assessing and enforcing compliance with ontological modeling principles in the Foundational Model of Anatomy (FMA)”, S. Zhang, O. Bodenreider, Computers in Biology and Medicine 36 (2006)
  – Scientific domain dependent: anatomy
  – Hardly visible: paper access

Acceptance and visibility are ‘limited’ to specific target communities: we need universally applicable conventions

Page 8: Our Goals

• Overcome diversity and fragmentation
  – Collect existing naming conventions
    • Make them accessible via a repository
  – Review and compare
    • Create a single common document
  – Distil universally valid aspects for OWL and OBO
  – Ensure visibility for target domains
  – Move towards a common resource for the OBO Foundry groups
• Provide best practice guidelines
  – Provide robust names for ontology classes
  – Not a ‘knowledge representation language’ for names, as e.g. HUGO provides for gene symbols (awgTg(GBtslenv)832Pkw)
• Engage in discussion with other groups
  – A two-phase approach …

Page 9: Towards Common Naming Conventions

• Phase 1: Straw man document
  – “Working towards naming conventions for use in controlled vocabulary and ontology engineering”
    • See Bio-Ontologies SIG Proceedings, p. 29-32
  – Created for the MSI Ontology WG, targeting the larger OBI group
  – Implementation and format independent
• Phase 2: Survey of OBO Foundry groups
  – Questionnaire (work in progress)
    • Ontology and engineering process
    • Current practice in naming entities
    • Envisioned benefits of common conventions
    • In-depth questions on particular conventions
  – Results to be posted on the OBO Foundry wiki

Page 10: Naming Convention Straw Man - Examples

• Explicit and concise names
  – Avoid omissions and ellipses
    • Plant Ontology (PO) used ‘cell’ for ‘plant cell’
  – Avoid negative names like ‘non-separation device’
  – Avoid ambiguous words
    • 30 meanings of ‘set’, e.g. plurality in ‘protocol set’ or action in ‘parameter set’
  – Brand name convention: use [company name + brand name + superclass]
    • ‘US 2’ becomes ‘Bruker US 2 NMR magnet’
  -> To ensure shared understanding of intended meaning
• Typographical issues
  – Use lowercase as in natural language
    • Most flexible, e.g. ‘pH’, ‘DNA_hybridisation’ (no acronym border problems)
  – Avoid punctuation, sub/superscripts
  – Resolve special characters consistently, e.g. α -> alpha
  -> To ensure readability and reduce diversity in appearance
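Only some of these rules are mechanical enough to check automatically. The sketch below assumes Python 3 and the standard library only, and covers three of them: spelling out Greek letters, composing brand names as [company name + brand name + superclass], and flagging negative names or punctuation. All helper names are hypothetical; omissions and ambiguous words such as ‘set’ still need human review.

```python
import string
import unicodedata

# Hypothetical lint helpers for a few straw-man conventions (illustrative only).
GREEK_PREFIXES = ("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")
ALLOWED_PUNCTUATION = {"_"}  # underscore is a word separator, not punctuation


def resolve_special_characters(label: str) -> str:
    """Spell out plain Greek letters, e.g. 'α_helix' -> 'alpha_helix'."""
    out = []
    for ch in label:
        name = unicodedata.name(ch, "")
        for prefix in GREEK_PREFIXES:
            rest = name[len(prefix):]
            # Only single-word letter names; skips 'FINAL SIGMA', accented forms, etc.
            if name.startswith(prefix) and " " not in rest:
                out.append(rest.lower())
                break
        else:
            out.append(ch)
    return "".join(out)


def compose_brand_name(company: str, brand: str, superclass: str) -> str:
    """Brand name convention: [company name + brand name + superclass]."""
    return " ".join([company, brand, superclass])


def lint_label(label: str) -> list:
    """Return the (mechanically detectable) problems found in a class label."""
    problems = []
    words = label.replace("_", " ").split()
    if any(w.lower().startswith("non-") or w.lower() == "non" for w in words):
        problems.append("negative name")
    if any(ch in string.punctuation and ch not in ALLOWED_PUNCTUATION for ch in label):
        problems.append("punctuation")
    if resolve_special_characters(label) != label:
        problems.append("unresolved special character")
    return problems


print(compose_brand_name("Bruker", "US 2", "NMR magnet"))
print(lint_label("non-separation device"))
print(lint_label("DNA_hybridisation"))
```

Note that the hyphen in ‘non-separation device’ is reported as punctuation as well as a negative name, which matches the rule to avoid punctuation in names.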

Page 11: Naming Convention Straw Man - Examples

• Lexical issues
  – Reuse words and avoid synonyms within compound names
    • ‘x_part_of_process’, ‘y_part_of_process’ and ‘z_part_of_process’ instead of ‘x_component_of_process’, ‘y_portion_of_process’, ‘z_part_of_process’
    -> To decrease the learning and search burden on the user side, and to ease text mining by reducing string variability
  – Use underscore or space separators (instead of CamelCase)
    • Prevents distortions like ‘CapNMRProbe’ and ‘pHValue’, yet allows brand names like ‘SampleJet’
    -> To ease text mining and readability (demarcated word borders)
  – Use the singular nominal word form
    • Avoid inconsistencies like ‘biphenyl’ (CHEBI:17097) under an IUPAC-required ‘biphenyls’ (CHEBI:22888)
    -> To harmonize appearance, to avoid redundancy, and to ease ontology cross-referencing and import
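The separator argument can be illustrated by splitting names back into words. In this hypothetical Python sketch, a common regex-based CamelCase splitter breaks ‘pHValue’ into ‘p’, ‘H’, ‘Value’ and cannot tell the brand name ‘SampleJet’ from a two-word compound, whereas splitting on underscores or spaces is exact. `PREFERRED_WORD` is an assumed synonym map showing how ‘component’ and ‘portion’ could be normalized to ‘part’.

```python
import re

# A common CamelCase-splitting idiom: acronym runs before a capitalised word,
# capitalised or lowercase words, remaining uppercase runs, digits.
CAMEL_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

# Hypothetical synonym map: variant word -> preferred word.
PREFERRED_WORD = {"component": "part", "portion": "part"}


def split_camel_case(name: str) -> list:
    """Guess word borders from capitalisation alone."""
    return CAMEL_WORD.findall(name)


def split_separated(name: str) -> list:
    """Word borders are explicit when underscores or spaces are used."""
    return re.split(r"[_ ]+", name)


def normalize_words(name: str) -> str:
    """Replace synonym variants so that compound names reuse the same word."""
    return "_".join(PREFERRED_WORD.get(w, w) for w in split_separated(name))


for name in ["pHValue", "SampleJet"]:
    print(name, "->", split_camel_case(name))
print("pH_value ->", split_separated("pH_value"))
print(normalize_words("y_portion_of_process"))
```

Singular vs. plural is deliberately left out: reliable singularization needs a lexicon, not a pattern.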

Page 12: Common Naming Convention – Open Issues

• Syntactic issues
  – Qualifier order: put the qualifier term before the part being qualified?
    • ‘NMR_instrument’ in place of ‘instrument_for_NMR’
  – ‘Helper’ strings in class names: establish general ones?
    • E.g. the ‘sensu’ postfix in GO to indicate species specificity: ‘fruiting body development (sensu Bacteria)’ (GO:0030583)
• Semantic issues
  – Administrative ‘helper’ classes: how to name these metadata bins?
    • unclassified (OBI_200067), ChEBI_objects (OBI_336), toBeDiscussed, _collected_relations
  – Identifiers and namespace: are conventions useful?
    • OBI uses [group prefix + underscore + unique number], e.g. OBI_334
    • BFO uses [meaningful string], e.g. IndependentContinuant
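The identifier styles and helper strings discussed here are regular enough to detect with patterns. The Python sketch below is a hypothetical classifier, not an OBO Foundry or OWL rule: the style names and regular expressions are assumptions based only on the examples on the slides (OBI_334, GO:0030583, IndependentContinuant). It also separates a GO-style ‘sensu’ postfix from the base name and applies the proposed, still open, qualifier-first ordering to names of the form ‘X_for_Y’.

```python
import re

# Hypothetical identifier styles; names and patterns are illustrative assumptions.
ID_STYLES = [
    ("prefix_underscore_number", re.compile(r"[A-Z]+_\d+")),    # e.g. OBI_334
    ("prefix_colon_number", re.compile(r"[A-Z]+:\d+")),         # e.g. GO:0030583
    ("prefix_colon_string", re.compile(r"[a-z]+:[A-Za-z_]+")),  # lowercase prefix, colon, string
    ("meaningful_string", re.compile(r"[A-Za-z]+")),            # e.g. IndependentContinuant
]

SENSU = re.compile(r"(?P<base>.+?)\s*\(sensu (?P<taxon>[^)]+)\)")


def id_style(identifier: str) -> str:
    """Return the first style whose pattern matches the whole identifier."""
    for style, pattern in ID_STYLES:
        if pattern.fullmatch(identifier):
            return style
    return "other"


def parse_sensu(label: str) -> tuple:
    """Split 'name (sensu Taxon)' into (name, Taxon); taxon is None if absent."""
    match = SENSU.fullmatch(label)
    if match is None:
        return label, None
    return match.group("base"), match.group("taxon")


def qualifier_first(name: str) -> str:
    """Rewrite 'head_for_qualifier' as 'qualifier_head'; other names unchanged."""
    head, sep, qualifier = name.partition("_for_")
    return f"{qualifier}_{head}" if sep else name


for identifier in ["OBI_334", "GO:0030583", "IndependentContinuant", "ChEBI_objects"]:
    print(identifier, "->", id_style(identifier))
print(parse_sensu("fruiting body development (sensu Bacteria)"))
print(qualifier_first("instrument_for_NMR"))
```

A helper-class name such as ‘ChEBI_objects’ falls through to ‘other’, which is one way such metadata bins could be surfaced for review.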

Page 13: Common Naming Convention - Benefits

• Communication has improved …
  – In geographically distributed, collaborative efforts
  – Between developers from different domains and backgrounds
• Appearance of what we represent has been normalized
  – Not just a matter of aesthetics
  – Manoeuvring within the hierarchy became faster

… we further envision …
• Facilitated access to ontologies through meta-tools
  – Reducing the diversity that ontology libraries and tools have to cope with, e.g. OLS, BioPortal, PROMPT and text mining tools
• Facilitating ontology integration and cross-referencing
  – Comparison, alignment (OWL import) and mapping
• Serving as a guideline for new communities

Page 14: Acknowledgements and Resources

• Authors and those contributing to the discussion
  – Susanna-Assunta Sansone, Philippe Rocca-Serra, Suzi Lewis, Waclaw Kusnierczyk, Barry Smith, Chris Mungall, Jane Lomax, Robert Stevens, Frank Gibson, Luisa Montecchi-Palazzi, Dietrich Rebholz
• Members of MSI, PSI, OBI groups and OBO Foundry coordinators
  – http://msi-ontology.sf.net
  – http://psidev.sf.net
  – http://obi.sf.net
  – http://obofoundry.org
• Further info
  – “Working towards naming conventions for use in controlled vocabulary and ontology engineering”, Bio-Ontologies SIG Proceedings, p. 29-32
• Funding sources (supporting my work)
  – UK BBSRC e-Science BB/D524283/1 and BB/E025080/1
  – Semantic Mining NoE (visits to IFOMIS and Manchester)