Page 1: DYI Ontology Development Mark A. Musen Professor of Medicine and Computer Science Stanford University.

DYI Ontology Development

Mark A. Musen

Professor of Medicine and Computer Science

Stanford University


Page 2

Supreme genus: SUBSTANCE

Differentiae: material / immaterial

Subordinate genera: BODY / SPIRIT

Differentiae: animate / inanimate

Subordinate genera: LIVING / MINERAL

Differentiae: sensitive / insensitive

Proximate genera: ANIMAL / PLANT

Differentiae: rational / irrational

Species: HUMAN / BEAST

Individuals: Socrates, Plato, Aristotle, …

Porphyry’s depiction of Aristotle’s Categories
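Read as a data structure, Porphyry's tree is a chain of classes in which each class is defined by its proximate genus plus a differentia, and every individual inherits all the genera above its species. The minimal Python sketch below assumes that reading: it stores each category with its parent genus and differentia, then derives a classical genus-plus-differentia definition and infers every category Socrates belongs to. The names `TREE`, `INSTANCE_OF`, `ancestors`, `definition`, and `categories_of` are hypothetical and chosen only for illustration.

```python
# Hypothetical encoding of Porphyry's tree:
# category -> (parent genus, differentia separating it from its sibling).
TREE = {
    "BODY": ("SUBSTANCE", "material"),
    "SPIRIT": ("SUBSTANCE", "immaterial"),
    "LIVING": ("BODY", "animate"),
    "MINERAL": ("BODY", "inanimate"),
    "ANIMAL": ("LIVING", "sensitive"),
    "PLANT": ("LIVING", "insensitive"),
    "HUMAN": ("ANIMAL", "rational"),
    "BEAST": ("ANIMAL", "irrational"),
}

# Individuals are instances of a species.
INSTANCE_OF = {"Socrates": "HUMAN", "Plato": "HUMAN", "Aristotle": "HUMAN"}


def ancestors(category: str) -> list[str]:
    """Return every genus above `category`, nearest first."""
    chain = []
    while category in TREE:
        category, _ = TREE[category]
        chain.append(category)
    return chain


def definition(category: str) -> str:
    """Classical definition: specific differentia plus proximate genus."""
    genus, differentia = TREE[category]
    return f"{differentia} {genus}"


def categories_of(individual: str) -> list[str]:
    """Infer every category an individual belongs to by walking up the tree."""
    species = INSTANCE_OF[individual]
    return [species] + ancestors(species)


print(definition("HUMAN"))        # rational ANIMAL
print(categories_of("Socrates"))  # ['HUMAN', 'ANIMAL', 'LIVING', 'BODY', 'SUBSTANCE']
```

The upward walk is a very small instance of letting a computer infer a relationship that nobody stated explicitly: nothing in the data says Socrates is a SUBSTANCE, yet the program concludes it.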

Page 3


Page 4

Creating Ontologies in Machine-Processable Form

• Provides a mechanism for developers to codify salient distinctions about the world or some application area

• Provides a structure for knowledge bases that can enable

– Information retrieval

– Information integration

– Automated translation

– Decision support

Page 5

The New Philosophers

• Categorizing “what exists” in machine-understandable form

• Providing a structure that enables

– Developers to locate and update relevant descriptions

– Computers to infer relationships and properties

• Creating new abstractions to facilitate the creation of this structure

Page 6

There is a misconception …

• That people building ontologies are all well versed in metaphysics, computer science, knowledge representation, and the content domain

• That ontologies in the real world are “clean” and well defined

• That most people who are creating ontologies understand all the ramifications of what they are doing!

Page 7

Lots of ontology builders are not very good philosophers

• Nearly always, ontologies are created to address pressing professional needs

• The people who have the most insight into professional knowledge may have little appreciation for metaphysics, principles of knowledge representation, or computational logic

• There simply aren’t enough good philosophers to go around

Page 8

The pressing need to standardize the names of human genes

Page 9

But the human genome is only part of the problem …

• Biologists maintain huge databases of gene sequences and gene expression for a wide range of “model organisms” (e.g., mouse, rat, yeast, fruit fly, roundworm, slime mold)

• Database entries are annotated with information such as the name of a gene, the function of the gene, and so on

• How do you ensure uniformity in the nature of these annotations?

Page 10

Gene Ontology Consortium

• Founded in 1998 as a collaboration among scientists responsible for developing different databases of genomic data for model organisms (fruit fly, yeast, mouse)

• Now, essentially all developers of model-organism databases participate

• Goal: To produce a dynamic, controlled vocabulary that can be applied to all organism databases even as knowledge of gene and protein roles in cells is accumulating and changing

Page 11

GO = Three Ontologies

• Molecular Function – elemental activity or task

– example: DNA binding

• Cellular Component – location or complex

– example: cell nucleus

• Biological Process – goal or objective within cell

– example: secretion
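One way to enforce the uniformity of annotations asked about earlier is to accept only terms drawn from a shared controlled vocabulary partitioned into GO's three namespaces. The Python sketch below assumes a toy vocabulary containing just the three example terms from this slide; real GO terms also carry stable identifiers, definitions, and relationships, all omitted here. The function `check_annotation` and the gene name `geneX` are hypothetical.

```python
# Toy controlled vocabulary keyed by GO's three namespaces.
# Only the slide's example terms are included (an assumption for illustration).
VOCABULARY = {
    "molecular_function": {"DNA binding"},
    "cellular_component": {"cell nucleus"},
    "biological_process": {"secretion"},
}


def check_annotation(namespace: str, term: str) -> bool:
    """Accept an annotation only if the term exists in the named namespace."""
    return term in VOCABULARY.get(namespace, set())


# Hypothetical annotations for a hypothetical gene "geneX".
annotations = [
    ("molecular_function", "DNA binding"),
    ("cellular_component", "nucleus of the cell"),  # free-text variant, rejected
    ("biological_process", "secretion"),
]

for namespace, term in annotations:
    status = "ok" if check_annotation(namespace, term) else "not in vocabulary"
    print(f"geneX  {namespace:<20} {term:<22} {status}")
```

Checking the namespace as well as the term means a valid term filed under the wrong ontology (for example, "DNA binding" offered as a biological process) is also rejected.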

Page 12

Page 13

GO has been wildly successful!!

• Dozens of biologists around the world contribute to GO on a regular basis

• The ontology is updated every 30 minutes!

• It’s now impossible to work in most areas of computational biology without making use of GO terms

Page 14

But GO has real problems …

• Ontologies are represented in an idiosyncratic format that is not compatible with standard knowledge-representation systems

• The format is based on directed acyclic graphs of concepts, without the general ability to specify machine-interpretable properties of concepts or definitions of concepts

• Because of the informal knowledge-representation system, lots of errors have crept into GO

– Terms that are duplicated in different places

– Terms with no superclasses

– Uncertain relationships between terms
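Because GO's format is essentially a directed acyclic graph of terms, some of these errors can at least be detected mechanically. The Python sketch below assumes a hypothetical in-memory representation in which each term identifier maps to a label and a list of parent identifiers; the identifiers `T:1` to `T:5`, the `ROOTS` set, and the function names are invented for illustration and do not reflect GO's actual file format or tooling.

```python
from collections import defaultdict

# Hypothetical fragment: term id -> (label, list of parent term ids).
TERMS = {
    "T:1": ("biological_process", []),       # designated root
    "T:2": ("secretion", ["T:1"]),
    "T:3": ("protein secretion", ["T:2"]),
    "T:4": ("secretion", ["T:1"]),           # same label as T:2: a duplicate
    "T:5": ("hormone secretion", []),        # no superclass: an orphan
}
ROOTS = {"T:1"}


def duplicate_labels(terms):
    """Group term ids that share an identical label."""
    by_label = defaultdict(list)
    for term_id, (label, _) in terms.items():
        by_label[label].append(term_id)
    return {label: ids for label, ids in by_label.items() if len(ids) > 1}


def orphans(terms, roots):
    """Terms other than the designated roots that have no parent."""
    return sorted(t for t, (_, parents) in terms.items() if not parents and t not in roots)


def has_cycle(terms):
    """Depth-first search for a back edge; a valid DAG has none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in terms}

    def visit(t):
        color[t] = GREY
        for p in terms[t][1]:
            if color.get(p) == GREY:  # back edge to a term still on the stack
                return True
            if color.get(p) == WHITE and visit(p):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in terms)


print(duplicate_labels(TERMS))  # {'secretion': ['T:2', 'T:4']}
print(orphans(TERMS, ROOTS))    # ['T:5']
print(has_cycle(TERMS))         # False
```

A structural check like this can flag duplicates, orphans, and cycles, but it cannot resolve the third problem listed above, uncertain relationships between terms; that requires the kind of explicit, machine-interpretable definitions the GO format lacks.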

Page 15

Tension in the GO Community

• Biologists around the world with pressing needs to integrate research databases work together to add terms to GO nearly continuously

– Using an impoverished, nonstandard knowledge-representation system

– Using no standards to assure uniform modeling conventions from one part of GO to another

• Computer scientists bemoan all this ad-hoc-ery and condemn GO as a hack that will become increasingly unusable and unmaintainable

Page 16

The Capulets and Montagues

A plague on both your houses?

Professor Carole Goble

University of Manchester, UK

Warning: This talk contains sweeping generalisations

A wonderful keynote talk from the recent meeting on Standards and Ontologies for Functional Genomics

Page 17

Prologue

Two households, both alike in dignity,
In fair genomics, where we lay our scene,
(One, comforted by its logic’s rigour,
Claims ontology for the realm of pure,
The other, with blessed scientist’s vigour,
Acts hastily on models that endure),
From ancient grudge break to new mutiny,
When “being” drives a fly-man to blaspheme.
From forth the fatal loins of these two foes
Researchers to unlock the book of life;
Whole misadventured piteous overthrows
Can with their work bury their clans’ strife.
The fruitful passage of their GO-mark'd love,
And the continuance of their studies sage,
Which, united, yield ontologies undreamed-of,
Is now the hours' traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend.

Based on an idea by Shakespeare

Carole Goble

Page 18

The Montagues

Computer Science, Knowledge engineering, AI

Logic and Languages

Theory

Top down, well-behaved neatness

Generic and lots of toys

Methodologies & patterns

Tools and standards

Technology push

Academic pursuit

One, comforted by its logic’s rigour,

Claims ontology for the realm of pure

Carole Goble

Page 19

The Capulets

Life Scientists

Practice

Bottom up, real-world

Specific and many of them

Methodologies, community practice

Tools and standards

Application pull

Practical pursuit – build ‘n’ use it

The other, with blessed scientist’s vigour,

Acts hastily on models that endure

Carole Goble

Page 20

The Philosophers

Philosophers

Theory

Truth

Generic – the one true ontology?

Methodologies, patterns & foundational ontologies

Not really into tools

No push or pull

Academic pursuit

One, comforted by its logic’s rigour,

Claims ontology for the realm of pure

Carole Goble

Page 21

The Princes of Genomics

Rebellious subjects, enemies to peace,
Profaners of this neighbour-stained steel,--
Will they not hear? What, ho! you men, you beasts,
That quench the fire of your pernicious rage
With purple fountains issuing from your veins,
On pain of torture, from those bloody hands
Throw your mistemper'd weapons to the ground,
And hear the sentence of your moved prince.
Three civil brawls, bred of an airy word,
By thee, old Capulet, and Montague,
Have thrice disturb'd the quiet of our streets,
And made genomics's ancient citizens
Cast by their grave beseeming ornaments,
To wield old partisans, in hands as old,
Canker'd with peace, to part your canker'd hate:

Carole Goble

Page 22

A tragedy?

As in Romeo and Juliet, the threats are political and sociological

Page 23

Creating ontologies has become a widespread cottage industry

• Professional Societies

– MGED: Microarray Gene Expression Data Society

– HUPO: Human Proteome Organization

• Government

– NCI Thesaurus

– NIST: Process Specification Language

• Open Biological Ontologies

– GO

– Three dozen (and growing) other ontologies

– Mostly in DAG-Edit, some in Protégé format

Page 24

Moving from cottage industry to the industrial age

• Government and professional societies must set expectations regarding the need for appropriate standards

• Government and professional societies must invest in educational programs to teach Montagues to identify with Capulets, and vice versa

• Demonstration projects must communicate to the potential developers of future ontologies the strengths and weaknesses of the guidelines, tools, and languages that facilitated the development work

Page 25

A thousand flowers are blooming from every corner of the landscape

• Ontologies are being developed by interested groups from every sector of academia, industry, and government

• Many of these ontologies have been proven to be extraordinarily useful to wide communities

• Many of these same ontologies have been shown to be structurally flawed and of uncertain semantics

• We are finally at the stage where we have tools and representation languages that can lift us out of the grass roots to create durable and maintainable ontologies with rich semantic content