A Tool for Collaborative Construction of Large Biological Ontologies

Post on 18-May-2015


Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory

A Tool for Collaborative Construction of Large Biological Ontologies

Jie Bao (a), Zhiliang Hu (b), Doina Caragea (a), James Reecy (b), Vasant G. Honavar (a)

(a) Artificial Intelligence Research Laboratory, Department of Computer Science; Center for Computational Intelligence, Learning, and Discovery

(b) Department of Animal Science

Iowa State University, Ames, IA 50011, USA

Email: {baojie, zhu, dcaragea, jreecy, honavar}@iastate.edu


Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor


Large Biological Ontologies

• Gramineae Taxonomy

• Plant Ontology

• Gene Ontology

• MGED Ontology (microarray)


Non-collaborative Ontology Building

• Download Ontology

• Local Editing (Protégé, OBO-Edit)

• Upload Ontology

(single curator)


Collaboration In Need

Example: Gene Ontology Consortium


Collaboration In Need (2)

Example 2: an animal trait ontology that involves multiple research groups across the world

• Swine

• Cattle

• Chicken

• Horse

Each group works on an ontology module for a particular species (according to the group's best expertise)


Challenges

• Knowledge Integration

• Concurrency Management

• Consistency Maintenance

• Privilege Management

• History Maintenance

• Scalability


Solutions

1. Pipeline (very limited collaboration)

• Divide the ontology building process into sequential phases

• Each phase is assigned to a particular contributor

2. CVS (collaboration with high cost)

• Treat an ontology as a monolithic file/document

• Use collaborative tools like CVS to build the ontology

3. Modular Ontology (our approach)

• Build the ontology with fine-grained modules

• Different contributors can concurrently edit different modules


Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor


CVS-based Ontology Building

Get GO CVS Account

Get Source Forge Account

Set Up CVS Access

Submit Change Request

Track the Request

User submits change suggestion (in natural language)

Get Source Forge Account

Take a Change Request

Curator

Download Whole GO Flat File

Local Editing

Make Local Log File

Save GO Flat File

Version Control

Commit Whole New Ontology to CVS


Unprincipled Authorization and Organization

• No principled mechanism to enforce curator privilege assignments.

• No clear organizational division of the whole ontology into smaller manageable units.


Risk of Inconsistency

• No principled way to avoid unintended couplings and over-writing.

• The validity and consistency of the ontology depend heavily on curator discipline and good community communication (e.g., via email lists).


Lack of Partial Editing/Reuse

• A curator has to
– download the entire ontology before editing,
– and submit the entire modified ontology after editing.

• A user cannot download and reuse only a selected subset of the ontology.


Expensive History Maintenance

• Even a minor edit of the ontology causes the ontology file to be replicated in its entirety

• Tracing the change history of a term requires processing the entire ontology file for comparisons


Limited Participation

• Since all editing has global effect, it is difficult to
– grant differently scoped privileges to different types of users (e.g., core curators versus normal curators)
– accept/deny/modify/revert local changes made by other curators

• The curator community has to be limited to a small number of trusted curators.


Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor


Basic Strategy

• Localize the interactions among different parts of a large ontology.

• Build an ontology with fine-grained organizational structure.

• Allow group collaboration on different ontology modules.


Package-based Ontologies

• The whole ontology consists of a set of packages

• Each package represents a fragment of the whole ontology

• Each term has a "home package"

[Figure: Animal Trait Ontology divided into packages General, Cattle, Pig, and Chicken; term labels shown: Egg (Chicken), Reproduction (General)]
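The three bullets above can be sketched as a small data structure. This is only an illustration, not the COB Editor's actual implementation: the class names `Package` and `Ontology`, the `home` mapping, and the choice of "Egg" in Chicken and "Reproduction" in General (read off the figure labels) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """A fragment of the whole ontology (hypothetical structure)."""
    name: str
    terms: set[str] = field(default_factory=set)


class Ontology:
    """The whole ontology, modeled as a set of named packages."""

    def __init__(self) -> None:
        self.packages: dict[str, Package] = {}
        self.home: dict[str, str] = {}  # term -> name of its home package

    def add_package(self, name: str) -> Package:
        # Create the package on first use, otherwise return the existing one.
        return self.packages.setdefault(name, Package(name))

    def add_term(self, term: str, package: str) -> None:
        # Each term has exactly one home package.
        if term in self.home:
            raise ValueError(f"{term!r} already lives in {self.home[term]!r}")
        self.add_package(package).terms.add(term)
        self.home[term] = package


# Assumed example data, loosely following the slide's figure labels.
ato = Ontology()
ato.add_term("Egg", "Chicken")
ato.add_term("Reproduction", "General")
```

Keeping a single `home` entry per term is one simple way to make the "home package" rule checkable when a term is added.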


Package Nesting

• A nested package is a part of another package

• Could be used to represent the organizational structure of an ontology
– Arrange knowledge
– Enforce hierarchical management of knowledge

[Figure: Animal Trait Ontology with nested packages General, Pig, and Pig Health]
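A minimal sketch of package nesting, assuming each package keeps a reference to the package that contains it. The `Package` class, its `ancestors` and `path` helpers, and the nesting order used here (Pig Health inside Pig inside General) are illustrative assumptions based on the slide labels, not the tool's real data model.

```python
class Package:
    """A package that may be nested inside another package (hypothetical)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Yield the enclosing packages, innermost first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def path(self):
        """Return the nesting path from the outermost package down to this one."""
        names = [self.name] + [p.name for p in self.ancestors()]
        return "/".join(reversed(names))


# Assumed nesting order for illustration only.
general = Package("General")
pig = Package("Pig", parent=general)
pig_health = Package("Pig Health", parent=pig)
```

Walking `ancestors()` is what lets a hierarchy of this kind express both the arrangement of knowledge and who manages which level.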


Division of Labor

• A package can be assigned to curators with the best knowledge of the relevant sub-domain.
– e.g. Pig Health, Pig Reproduction

• The package hierarchy helps to manage interactions among experts with different degrees of expertise.
– e.g. Pig, Pig Health


Partial Reuse

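[Figure: Two ways for a Pork ontology to reuse the Animal Trait Ontology. Centralized: the Animal Trait Ontology (General, Cattle, Pig, Chicken) is a single unit. Package-based: the Animal Trait Ontology is divided into the packages General, Pig, Cattle, and Chicken, and the Pork ontology reuses them through semantic importing. Legend: knowledge incorporated in the Pork ontology; knowledge not present in the Pork ontology.]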
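One way to picture partial reuse is as a reachability computation over import links: only the packages a module transitively imports need to be loaded. The sketch below assumes a plain dictionary of import edges; the function name `packages_to_load` and the specific edges (Pork imports Pig, Pig imports General, and so on) are hypothetical and not taken from the slides.

```python
def packages_to_load(start, imports):
    """Return the start package plus everything it transitively imports."""
    needed, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            # Follow the import links of this package, if it has any.
            stack.extend(imports.get(name, ()))
    return needed


# Assumed import links for illustration only.
imports = {
    "Pork": ["Pig"],
    "Pig": ["General"],
    "Cattle": ["General"],
    "Chicken": ["General"],
}

loaded = packages_to_load("Pork", imports)
```

Under these assumed links, the Cattle and Chicken packages are never touched when working on the Pork ontology, which is the point of the package-based layout.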


Scalability

• Reduction in communication overhead and computation time
– Parsing
– Transferring
– Consistency checking

• Reduction in memory requirements
– Ontology can be partially loaded into memory

• Reduction in history tracking cost
– Effect of changes is localized


Broadened Participation

• The success of open-community collaboration is demonstrated by DMOZ and Wikipedia

• Package-based ontology management can
– Control the scope of an editing action
– Minimize the risk of vandalism

• Better tradeoff between broader participation and ontology quality
– There are different levels of curators, e.g. ontology admins, pig experts, pig health experts.
– An editing action can be approved or denied by a curator with higher privileges
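The review rule in the last bullet could be sketched as follows, under the assumption that "higher privileges" means the reviewer's package strictly contains the editor's package in the hierarchy. The tables `PARENT` and `SCOPE`, the curator names, and the functions `covers` and `may_review` are all hypothetical.

```python
# Assumed package hierarchy: child package -> enclosing package.
PARENT = {
    "Pig Health": "Pig",
    "Pig": "Animal Trait Ontology",
    "Animal Trait Ontology": None,
}

# Assumed assignment of curators to the package they are responsible for.
SCOPE = {
    "ontology_admin": "Animal Trait Ontology",
    "pig_expert": "Pig",
    "pig_health_expert": "Pig Health",
}


def covers(outer, inner):
    """True if `inner` is `outer` itself or is nested somewhere inside it."""
    node = inner
    while node is not None:
        if node == outer:
            return True
        node = PARENT[node]
    return False


def may_review(reviewer, editor):
    """A reviewer may approve or deny edits only from curators strictly below."""
    r, e = SCOPE[reviewer], SCOPE[editor]
    return r != e and covers(r, e)
```

With this rule an edit by a pig health expert can be approved or denied by a pig expert or an ontology admin, but not the other way round.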


Outline

• Collaborative Ontology Building (COB) Desiderata

• Limitations of CVS-based Collaboration

• COB based on Modular Ontologies

• The COB Editor

The COB Editor

[Figure: COB Editor showing the Pig Package, Cattle Package, and Chicken Package]

[BIDM06 Paper] a.k.a [8]


Collaborative Ontology Building

Ontology modularity facilitates collaborative building

• Each package can be independently developed

• Different curators can concurrently edit different packages of the ontology

• Ontology can be only partially loaded

• Unwanted interactions are minimized by limiting term and axiom visibility

• Module access privileges can be controlled by the package hierarchy


Work with COB Editor

Download

• http://www.animalgenome.org/bioinfo/projects/ATO/

• http://sourceforge.net/projects/cob (source code)

Curator workflow:

• Get Ontology Account

• Check out a package

• Create new package or Lock Package

• Edit the Package

• Commit the Package

• (Auto) Server Change Log
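The workflow above can be sketched as a tiny in-memory server, assuming one exclusive lock per package and a change log that the server appends to on every commit. The `PackageServer` class and its methods are hypothetical; they do not reflect the COB Editor's real database schema or API.

```python
class PackageServer:
    """In-memory stand-in for a shared ontology server (hypothetical)."""

    def __init__(self):
        self.contents = {}    # package name -> set of terms
        self.locks = {}       # package name -> curator holding the lock
        self.change_log = []  # (curator, package, added terms, removed terms)

    def create(self, package):
        if package in self.contents:
            raise ValueError(f"package {package!r} already exists")
        self.contents[package] = set()

    def lock(self, curator, package):
        """Try to take the package lock; return True if the curator holds it."""
        holder = self.locks.setdefault(package, curator)
        return holder == curator

    def checkout(self, package):
        """Return a working copy, so local edits do not touch the server."""
        return set(self.contents[package])

    def commit(self, curator, package, terms):
        if self.locks.get(package) != curator:
            raise PermissionError(f"{curator!r} does not hold the lock")
        old, new = self.contents[package], set(terms)
        # The server records the change automatically on commit.
        self.change_log.append(
            (curator, package, sorted(new - old), sorted(old - new))
        )
        self.contents[package] = new
        del self.locks[package]  # release the lock after a successful commit


server = PackageServer()
server.create("Pig")
alice_got_lock = server.lock("alice", "Pig")
bob_got_lock = server.lock("bob", "Pig")  # refused while alice holds the lock
working = server.checkout("Pig")
working.add("Litter size")                # assumed example term
server.commit("alice", "Pig", working)
```

Because only the locked package is checked out and committed, other curators remain free to work on other packages at the same time.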


More Features

• Support import/export from/to OWL and OBO format
– can be used for Gene Ontology and others

• Ontology shared on a database server

• Allows multi-relational hierarchies
– e.g. both is-a and part-of

• Visibility of a term can be controlled by scope limitation modifiers
– e.g. public, private, protected
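A rough sketch of how scope limitation modifiers might be evaluated. The semantics assumed here follow the usual programming-language reading (public: visible everywhere; protected: visible in the home package and the packages nested in it; private: visible in the home package only); the slides do not spell these rules out, and the names `Scope`, `PARENT`, `is_nested_in`, and `visible` are hypothetical.

```python
from enum import Enum


class Scope(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


# Assumed package hierarchy: child package -> enclosing package.
PARENT = {"Pig Health": "Pig", "Pig": None, "Chicken": None}


def is_nested_in(package, ancestor):
    """True if `package` lies strictly inside `ancestor`."""
    node = PARENT.get(package)
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False


def visible(scope, home, viewer):
    """Decide whether a term with `scope` and home package `home`
    can be seen from the package `viewer` (assumed semantics)."""
    if scope is Scope.PUBLIC or viewer == home:
        return True
    if scope is Scope.PROTECTED:
        return is_nested_in(viewer, home)
    return False  # private terms are visible only in their home package
```

Limiting visibility in this way is what keeps an edit inside one package from creating unintended couplings with terms elsewhere.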


Conclusions

• Modular ontologies can improve collaborative ontology building in many respects

• Package-based ontologies offer an importing-based ontology language

• The COB Editor provides the necessary tool to collaboratively build well-structured, large-scale biomedical ontologies


Future Work

• Support of inference and consistency checking

• Accommodation and modularization of existing ontologies, e.g. GO, EC, SCOP

• Support of ontology mapping and ontology integration

• Support of more expressive ontologies, e.g. UMLS, SNOMED


Thanks!