Developing an application ontology for biomedical resource annotation and retrieval: challenges and lessons learned
C. Torniai, M. Brush, N. Vasilevsky, E. Segerdell, M. Wilson, T. Johnson, K. Corday, C. Shaffer and M. Haendel
Outline
• eagle-i project
  – Aims
  – Ontology role
• eagle-i ontology
  – Requirements
  – Implementation
• Implementation choices
  – Challenges
c o n s o r t i u m
eagle-i
• NIH-funded pilot project working to make scientific resources more visible via a federated network of nine institutional repositories
• Index invisible resources
  – reagents, protocols, techniques, instruments, expertise, organisms, software, training, human studies, biological specimens, etc.
• Ontology-driven approach to research resource annotation and discovery
• Facilitate development of shared semantic entities that can be referenced in publications, databases, experiments, etc.
Ontology development drivers
1) Represent collected resource information
2) Use the set of ontologies to control the user interface (UI) and logic of the data collection and search applications
3) Build a set of ontologies that are reusable and interoperable with other ontologies and existing efforts for representing biomedical entities
Ontology role in eagle-i architecture
[Figure: architecture diagram. Labeled components: eagle-i ontologies; federated network of repositories (RDF); external sources (NIF, PubMed, Entrez Gene); search application; data collection application; resource information collection.]
Implementation
Ontology/Method: Scope/Purpose
• Basic Formal Ontology (BFO): upper ontology
• Information Artifact Ontology (IAO): ontology metadata
• Relation Ontology (RO): common properties
• Minimum Information to Reference an External Ontology Term (MIREOT): reuse classes and properties from external ontologies
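As a rough illustration of the MIREOT entry above: the guideline asks for only a small amount of information per imported term, namely the source ontology, the term's own URI, and the superclass it is placed under in the importing ontology. The sketch below is not the eagle-i tooling. The `MireotTerm` class, the `to_triples` helper, the `imported_from` predicate name, and the placeholder parent class are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MireotTerm:
    """Minimum information kept for one externally sourced term (hypothetical structure)."""
    source_ontology: str    # URI of the ontology the term comes from
    term_uri: str           # URI of the term itself, reused unchanged
    target_superclass: str  # where the term is placed in the importing ontology
    label: str              # human-readable label copied from the source

    def to_triples(self):
        """Return (subject, predicate, object) triples for the import stub."""
        return [
            (self.term_uri, "rdfs:subClassOf", self.target_superclass),
            (self.term_uri, "rdfs:label", self.label),
            # Illustrative predicate name, not a real property URI.
            (self.term_uri, "imported_from", self.source_ontology),
        ]


organization = MireotTerm(
    source_ontology="http://purl.obolibrary.org/obo/obi.owl",
    term_uri="http://purl.obolibrary.org/obo/OBI_0000245",
    # Placeholder parent, not taken from the slides.
    target_superclass="http://example.org/placeholder/ParentClass",
    label="organization",
)

for triple in organization.to_triples():
    print(triple)
```

Keeping the term URI unchanged is what lets data annotated with the imported term stay interoperable with the source ontology.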
Ontology layers
Goal: to decouple research resources representation from information used for application appearance and behavior
• Application-specific module
  – Classes, annotation properties and individuals required to drive the UIs
• eagle-i core ontology
  – Classes and properties used to represent information about biomedical research resources
• MIREOT files
  – Externally sourced classes and properties
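The decoupling goal of the three layers can be sketched with plain sets of triples. This is a toy model under stated assumptions: the layer contents, the `build_view` helper, and the `"true"` annotation value are invented for illustration, and real layers would be OWL files rather than Python sets.

```python
# Each layer is a set of (subject, predicate, object) triples; contents are illustrative.
MIREOT_LAYER = {
    ("OBI_0000245", "rdfs:label", "organization"),
}
CORE_LAYER = {
    ("ERO_0000004", "rdfs:label", "instrument"),
}
APPLICATION_LAYER = {
    ("OBI_0000245", "eagle-i preferred label", "Organization"),
    ("ERO_0000004", "primary resource type", "true"),  # value format is an assumption
}


def build_view(*layers):
    """Union the given layers into one set of triples."""
    merged = set()
    for layer in layers:
        merged |= layer
    return merged


# View that could be shared with other ontology users: no UI-specific annotations.
domain_view = build_view(MIREOT_LAYER, CORE_LAYER)
# View loaded by the data collection and search applications.
application_view = build_view(MIREOT_LAYER, CORE_LAYER, APPLICATION_LAYER)

# Everything the applications add on top of the domain view.
ui_only = application_view - domain_view
print(sorted(ui_only))
```

Because the UI annotations live only in the application layer, dropping that layer leaves the domain view unchanged.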
eagle-i core and MIREOTed sources
eagle-i core ontology: 1283 classes, 56 object properties, and 61 data properties.
External ontologies (purpose/subsets, number of classes):
• Ontology for Biomedical Investigations (OBI): research material entities, processes, devices, roles (509 classes)
• NCBI Taxonomy: organism taxa (192 classes)
• VIVO ontology: people, organization, publications (20 classes)
• Ontology of Clinical Research (OCRe): human study designs and facets (19 classes)
• Biomedical Resource Ontology (BRO): instruments (13 classes)
Application-specific module
Contains properties and classes required to drive the UIs of the data collection and search applications
– UI Annotation Definition file: definition of UI annotation properties and sets of values for these properties
– UI Annotations file: holds annotations made on eagle-i core and MIREOTed classes and properties
Examples of annotation values and use
• Primary Resource Type
  – Description: denotes classes for which instances are collected
  – Example: ‘instrument’, ‘biospecimen’, ‘protocol’
• Data Model Exclude
  – Description: denotes classes or properties that are not included in the model used for the data tool or the search tool UIs
  – Example: BFO classes such as ‘continuant’ or ‘occurrent’, or RO relations such as ‘precedes’
• Embedded Class
  – Description: denotes a class for which instances can only be created in the context of an embedding class
  – Example: ‘antibody immunogen’ created within ‘antibody’, ‘construct insert’ created within ‘plasmid’
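A minimal sketch of how an application might read the three annotation values above. The dictionaries and the helper functions `visible_classes`, `primary_types`, and `can_create` are hypothetical, and the actual eagle-i applications may interpret the annotations differently.

```python
# Hypothetical UI annotations keyed by class label; the values are the labels from the examples.
UI_ANNOTATIONS = {
    "continuant": {"Data Model Exclude"},
    "occurrent": {"Data Model Exclude"},
    "instrument": {"Primary Resource Type"},
    "biospecimen": {"Primary Resource Type"},
    "protocol": {"Primary Resource Type"},
    "antibody": set(),
    "antibody immunogen": {"Embedded Class"},
}
# Embedded class -> the class it must be created within.
EMBEDDED_IN = {"antibody immunogen": "antibody"}


def visible_classes(annotations):
    """Classes that remain in the UI model after exclusions."""
    return sorted(c for c, tags in annotations.items() if "Data Model Exclude" not in tags)


def primary_types(annotations):
    """Classes offered as top-level resource types."""
    return sorted(c for c, tags in annotations.items() if "Primary Resource Type" in tags)


def can_create(cls, context=None):
    """Check whether an instance of cls may be created, optionally inside a context class."""
    tags = UI_ANNOTATIONS.get(cls, set())
    if "Data Model Exclude" in tags:
        return False
    if "Embedded Class" in tags:
        # Embedded classes need their embedding class as the creation context.
        return context is not None and context == EMBEDDED_IN.get(cls)
    return True


print(primary_types(UI_ANNOTATIONS))
print(can_create("antibody immunogen"))
print(can_create("antibody immunogen", context="antibody"))
```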
Additional application-specific properties
• eagle-i domain constraint
  – Description: used to specify the domain of an imported property. Each annotation will contain the URI of one class
  – Example: value set to “OBI_0000245” (‘organization’) for RO property ‘location_of’
  – Property type: Data Property
• eagle-i range constraint
  – Description: used to specify the range of an imported property. Each annotation will contain the URI of one class
  – Example: value set to “ERO_0000004” (‘instrument’) for RO property ‘located_in’
  – Property type: Data Property
• eagle-i preferred label
  – Description: defines the value of the preferred label to display in the data collection tool and search UIs
  – Example: capitalized ‘Organization’ for OBI_0000245 (‘organization’)
  – Property type: Annotation Property
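The properties above can be read as simple lookups at UI build time. The following sketch assumes dictionary-based storage and invented helper names (`display_label`, `allowed_domain`, `allowed_range`); only the IDs and example values come from the slide.

```python
# Annotations on imported properties, using the example values from the slide.
PROPERTY_ANNOTATIONS = {
    "location_of": {"eagle-i domain constraint": "OBI_0000245"},
    "located_in": {"eagle-i range constraint": "ERO_0000004"},
}
# Annotations on classes; rdfs:label is the label shipped with the source ontology.
CLASS_ANNOTATIONS = {
    "OBI_0000245": {"rdfs:label": "organization", "eagle-i preferred label": "Organization"},
    "ERO_0000004": {"rdfs:label": "instrument"},
}


def display_label(class_id):
    """Use the preferred label when one is set, else fall back to the ontology label."""
    annotations = CLASS_ANNOTATIONS[class_id]
    return annotations.get("eagle-i preferred label", annotations["rdfs:label"])


def allowed_domain(prop):
    """Class ID the application restricts the property's domain to, or None if unrestricted."""
    return PROPERTY_ANNOTATIONS.get(prop, {}).get("eagle-i domain constraint")


def allowed_range(prop):
    """Class ID the application restricts the property's range to, or None if unrestricted."""
    return PROPERTY_ANNOTATIONS.get(prop, {}).get("eagle-i range constraint")


print(display_label("OBI_0000245"))
print(display_label("ERO_0000004"))
print(allowed_domain("location_of"), allowed_range("located_in"))
```

The fallback in `display_label` is a design choice for the example: classes without an application-specific label still render with their source-ontology label.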
Data Collection Application
[Screenshot of the data collection application, with callouts:]
• Classes annotated with ‘primary resource type’
• Construct insert is an example of a resource annotated as an ‘embedded class’
• ‘eagle-i preferred definition’ is used for tooltips
• ‘eagle-i preferred label’ is used for the display name
• Property annotated as ‘primary property’
• Technique is annotated as ‘referenced taxonomy’
Challenges and benefits
• Reuse of existing ontologies
• Ontology layers
• Application-specific module
• Community coordination and alignment
• Best practices and tools
Reuse of existing ontologies
• BFO and the Relation Ontology (RO)
• OBO Foundry orthogonality principle
• Advantages
  – Integration with other ontologies
  – Eases the design process
  – Data integration and publication (Linked Open Data)
• Challenges
  – Need to exclude some classes (continuant, occurrent) from UI visualization after the inferred module has been computed
  – Domain and range in RO not specified, or not specific enough for an application
  – Not all relevant ontologies are built using BFO and RO
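One way to picture the first challenge is to hide the upper-level classes and re-attach their children to the nearest ancestor that is still visible. The hierarchy below is a simplified, made-up fragment, and `visible_parent` and `prune` are hypothetical helpers, not the eagle-i implementation.

```python
# child -> parent links of a tiny, simplified inferred hierarchy (illustrative only).
PARENT = {
    "continuant": None,
    "material entity": "continuant",
    "instrument": "material entity",
    "occurrent": None,
    "planned process": "occurrent",
}
# Classes hidden from the UI.
EXCLUDED = {"continuant", "occurrent"}


def visible_parent(cls, parent=PARENT, excluded=EXCLUDED):
    """Walk up until a non-excluded ancestor is found; None means cls becomes a UI root."""
    current = parent.get(cls)
    while current is not None and current in excluded:
        current = parent.get(current)
    return current


def prune(parent=PARENT, excluded=EXCLUDED):
    """Build the hierarchy shown in the UI, skipping excluded classes."""
    return {cls: visible_parent(cls, parent, excluded) for cls in parent if cls not in excluded}


ui_tree = prune()
print(ui_tree)
```

Pruning after inference means the reasoner still sees the full upper ontology, while users only see the domain classes.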
Ontology layers
• Advantages
  – Effective means to drive an application UI while maintaining interoperability with external ontologies and data sources
  – Facilitates concurrent development
• Challenges
  – Keeping the annotations current with the core module
  – Risk of excessive proliferation of annotation properties as a quick way to reduce application development complexity
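The challenge of keeping annotations current with the core module could be caught with a simple consistency check. This is a sketch under assumptions: `find_stale_annotations` is a hypothetical helper, and `ERO_9999999` is a made-up ID standing in for a term that was removed from the core.

```python
def find_stale_annotations(known_terms, ui_annotations):
    """Return annotated term IDs that are no longer defined in the core or MIREOT files."""
    return sorted(term for term in ui_annotations if term not in known_terms)


# Terms currently defined in the core ontology and MIREOT files (illustrative subset).
known_terms = {"ERO_0000004", "OBI_0000245"}

# Annotations held in the application-specific module.
ui_annotations = {
    "ERO_0000004": ["primary resource type"],
    "OBI_0000245": ["eagle-i preferred label"],
    "ERO_9999999": ["embedded class"],  # made-up ID for a term dropped from the core
}

stale = find_stale_annotations(known_terms, ui_annotations)
print(stale)
```

Running a check like this whenever the core module changes would flag annotations that point at terms that no longer exist.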
Application-specific module
Requirements for bridging the gap between an application and domain-specific ontologies
– Application-specific labels and definitions
– Exclusion of sets of classes and properties from the model used by the application
– Restriction of domain and range for some imported properties
– Definition of display order of object and data properties at class level
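The last requirement, display order at class level, might look like the following. The rank values, the property names `manufacturer` and `model number`, and the `ordered_properties` helper are all hypothetical; the slides do not say how eagle-i encodes the order.

```python
# (class, property) -> display rank; hypothetical annotation values.
DISPLAY_ORDER = {
    ("instrument", "located_in"): 2,
    ("instrument", "manufacturer"): 1,
}


def ordered_properties(cls, properties, order=DISPLAY_ORDER):
    """Sort properties by their class-level rank; unranked ones follow alphabetically."""
    return sorted(properties, key=lambda prop: (order.get((cls, prop), float("inf")), prop))


print(ordered_properties("instrument", ["model number", "located_in", "manufacturer"]))
```

Keying the rank on the (class, property) pair lets the same property appear at different positions on different forms.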
Community coordination
• Commitment to collaboration with similar efforts aimed at resource modeling
  – Aligned high-level models with NIF, RDS, VIVO
  – Service, instrument (device) implemented in OBI and reused by NIF and eagle-i
  – Coordinated representation of reagents, biospecimens, and genotype information (in progress)
• Challenges
  – Process is time-consuming and requires extra implementation effort
    • Implement and import back from reference ontologies
  – Application ontologies have peculiar requirements
    • Example: service hierarchy in eagle-i is based on type of process rather than input and output of the process (OBI)
Best practices and tools
• Reusing/referencing existing ontologies
  – OntoFox, OWL module extractor, NCBO extractor service
• Have tools integrated in ontology editors (Protégé)
  – Effective methods for managing and syncing MIREOTed terms
• Have several “community views” or ‘slims’ that could be directly imported with different levels of complexity
Conclusion
• Developing an ontology-driven application has been an important benchmark for usage of biomedical ontologies
• We have designed a layered set of ontologies, consisting of a broadly applicable core ontology and an application-specific module
  – Requirements and principles to inform a general design pattern
• Future steps
  – Refining, documenting and sharing requirements and lessons learned
  – Engaging in efforts addressing the issues we have experienced
Thank you
eagle-i core module: http://code.google.com/p/eagle-i/
eagle-i search: http://eagle-i.net
Carlo [email protected]
Acknowledgments: Ted Bashor, Rob Frost, Larry Stone and Daniela Bourges
Project funded through NIH/NCRR ARRA award #U24RR029825