1http://ontologist.com
The OBO Foundry: A Gold Standard Approach to Ontology Evaluation
Barry Smith
http://ontology.buffalo.edu/smith
2http://ontologist.com
Two types of ontology
natural-science ontologies capture terminology-level knowledge underlying the best current science
contrasted with administrative ontologies (e.g. billing ontologies, bloodbank ontologies, lab workflow ontologies) prepared for specific, local purposes
3http://ontologist.com
scientific ontologies have special features
Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence
scientific ontologies are realism-based
4http://ontologist.com
For scientific ontologies
reusability is crucial
compatibility with neighboring scientific ontologies
it is generalizations that are important
= universals, types, kinds
5http://ontologist.com
An ontology is a representation of universals
We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories
experiments relate to what is particular; science describes what is general
6http://ontologist.com
what is the difference between an ontology and a scientific theory?
an ontology is also a terminological standardization
WHAT DOES THIS MEAN?
7http://ontologist.com
1st aspect: additivity
cell = def. plant cell, consisting of protoplast and cell wall; ... [Plant Ontology]
what happens when the users of the Plant Ontology need to consider bacterial pathogens in plants?
8http://ontologist.com
2nd aspect: calibration with reality
gold standard kilogram
the same universal is defined by reference either to some artifact or to some universal physical constant (for realists there is no problem here)
9http://ontologist.com
VIM: the International Vocabulary of Metrology
(i) repeated measurements always give rise to some variation in values, (ii) one can never be sure (fallibilism) that one has got the true value, Hence: (iii) there are no true values.
To keep happy those who dismiss the notion of the true value, the international community is agreeing to a set of terms which intentionally allow two possible interpretations
once again: bad philosophy leads to bad standards
Compare: http://ontology.buffalo.edu/medo/Wuesteria.pdf
10http://ontologist.com
from: The NIST Reference on Constants, Units and Uncertainty
The creation of the decimal Metric System at the time of the French Revolution and the subsequent deposition of two platinum standards representing the meter and the kilogram, on 22 June 1799, in the Archives de la République in Paris can be seen as the first step in the development of the present International System of Units.
11http://ontologist.com
from: The NIST Reference on Constants, Units and Uncertainty
In the 1860s Maxwell and Thomson ‘formulated the requirement for a coherent system of units with base units and derived units. In 1874 the British Association for the Advancement of Science introduced the CGS system, a three-dimensional coherent unit system based on the three mechanical units centimeter, gram and second, using prefixes ranging from micro to mega to express decimal submultiples and multiples. The following development of physics as an experimental science was largely based on this system.’
12http://ontologist.com
13http://ontologist.com
Base and Derived Units
Units based on undefined SI dimensions: meter, second, kilogram, ampere, candela, kelvin, mole.
Units based on defined SI dimensions: volume, area, velocity, acceleration, newton, joule, pascal, coulomb, farad, henry, hertz, lumen, lux, ohm, etc.
Dimensions can be multiplied and divided (meters/second).
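The rule that dimensions can be multiplied and divided can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dimension` class, the `base` helper and the exponent-tuple representation are invented for the example and are not part of the SI or of any units ontology.

```python
from dataclasses import dataclass

# Order of the seven SI base dimensions used in this sketch (an assumption of the example).
BASE = ("length", "mass", "time", "current", "temperature", "amount", "luminous_intensity")


@dataclass(frozen=True)
class Dimension:
    """A dimension represented as a tuple of integer exponents over BASE."""
    exponents: tuple

    def __mul__(self, other):
        # Multiplying dimensions adds their exponents.
        return Dimension(tuple(a + b for a, b in zip(self.exponents, other.exponents)))

    def __truediv__(self, other):
        # Dividing dimensions subtracts their exponents.
        return Dimension(tuple(a - b for a, b in zip(self.exponents, other.exponents)))


def base(name):
    """Return the base dimension that has exponent 1 for `name` and 0 elsewhere."""
    return Dimension(tuple(1 if b == name else 0 for b in BASE))


LENGTH, MASS, TIME = base("length"), base("mass"), base("time")

velocity = LENGTH / TIME            # meters per second
acceleration = velocity / TIME      # meters per second squared
force = MASS * acceleration         # dimension of the newton
energy = force * LENGTH             # dimension of the joule
```

Derived dimensions built by different routes compare equal when their exponents agree, which is one way of reading the claim that the system is coherent.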
14http://ontologist.com
The SI System of Units
is a qualitative ontology: it captures qualitative dimensions of reality to which quantities can be applied (it captures measurable dimensions of reality)
there is a degree of conventionality in the choice of basic vs. derived units, and in the standard [e.g. the Paris meter] that is used to define the unit in each dimension
15http://ontologist.com
but the dimensions themselves exist independently of our conventions
so that an ontology of these dimensions is a true representation of an independently existing reality
16http://ontologist.com
Quantities are Universals
Ingvar Johansson:
Many different things can simultaneously have a mass of 5kg (length of 4m, etc.).
Determinate quantities are universals, which means that they have many instances
17http://ontologist.com
Units Ontology
developed in conjunction with PATO, the Phenotypic qualities ontology
obo.sourceforge.net/cgi-bin/detail.cgi?quality
18
fiat subtypes of qualities
spatial quality
length weight temperature
is_a
1mm 1cm 1g 1kg…
quality
19
Representation of measurements
spatial quality
length weight temperature
is_a
mm
cm
kg
g
quality unit
measurement_of
20http://ontologist.com
Ingvar Johansson:
(a) no object can possibly at one and the same time take two values of the same quantity dimension
(b) in case of additive quantities, only quantities of the same dimension can be added together to give rise to a sum: no material object can have two masses, and masses can only be added to other masses
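Johansson's two constraints can be read as checks that a data model enforces. The sketch below is a hedged illustration only: `Quantity` and `MaterialObject` are hypothetical names, dimensions are plain strings, and values are assumed to be expressed in a single unit per dimension.

```python
class Quantity:
    """A value together with its dimension, e.g. Quantity(5.0, "mass")."""

    def __init__(self, value, dimension):
        self.value = value
        self.dimension = dimension

    def __add__(self, other):
        # (b) only quantities of the same dimension can be added together
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {self.dimension} to {other.dimension}")
        return Quantity(self.value + other.value, self.dimension)


class MaterialObject:
    """Holds at most one value per quantity dimension at a given time."""

    def __init__(self):
        self._quantities = {}

    def assign(self, quantity):
        # (a) no object can take two values of the same dimension at once
        if quantity.dimension in self._quantities:
            raise ValueError(f"object already has a {quantity.dimension}")
        self._quantities[quantity.dimension] = quantity


total = Quantity(5.0, "mass") + Quantity(2.0, "mass")
```

Adding a mass to a length raises `TypeError`, and assigning a second mass to the same object raises `ValueError`.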
21http://ontologist.com
Controlled vocabulary
Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules.
These symbols are the same in every language of the world, even though the names of the units themselves vary in spelling according to national conventions.
22http://ontologist.com
The SI system of units gives you:
a gold standard controlled vocabulary for the expression of scientific results, which makes these results comparable and integratable
– my hypotheses can be checked against your data
– my measuring equipment can be calibrated against your measuring equipment (because each can be calibrated against the same gold standard)
the SI system of units can serve as a gold standard because it is a true reflection of an independent reality
23http://ontologist.com
a system of units is a legend for measurement data
heart rate, cadence, speed, torque, power
24http://ontologist.com
compare: legends for maps
25http://ontologist.com
Creating a system of units
is not easy; it has to match the way the measurable dimensions are interconnected in reality
it may need to be revised in light of new discoveries about how reality is structured
26http://ontologist.com
after Maxwell and Thomson
the subsequent development of physics as an experimental science was largely based on their system of standardized units.
27http://ontologist.com
analogous achievements also in chemistry
IUPAC
InChI
and in molecular biology,
for proteins, enzymes, genes, etc.
IUBMB
HUGO Gene Nomenclature Committee,
etc.
28http://ontologist.com
Periodic Table
29http://ontologist.com
the goal of realist ontology
to generalize this achievement
– specifically in biology
– and in medicine (where forces are at work which tend to thwart standardization of vocabulary)
to move from standardizations of nouns to standardizations of sentences
gene expression data
realist ontologies are legends for data
31http://ontologist.com
need for semantic annotation of data
where in the body?
what kind of disease process?
in what kind of cell?
32http://ontologist.com
33http://ontologist.com
the Gene Ontology is already a de facto standard
34http://ontologist.com
natural language labels organized in a graph-theoretic structure, designed to make the data
cognitively accessible to human beings
algorithmically accessible to machines
linked up to other data resources because the same labels have been used
35http://ontologist.com
compare: legends for cartoons (for diagrams in scientific texts)
36http://ontologist.com
x_i = vector of measurements of gene i
k = the state of the gene (as “on” or “off”)
θ_i = set of parameters of the Gaussian model
......
ontologies are legends for mathematical equations
37http://ontologist.com
or chemistry diagrams
Prasanna, et al. Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds
PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)
38http://ontologist.com
annotation using common ontologies yields integration of databases
MouseEcotope GlyProt
DiabetInGene
GluChem
Holliday junction helicase complex
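How shared annotation yields integration can be illustrated with a toy join. Everything in this sketch is invented for the example: the two record lists, the gene and compound names, and the placeholder term identifiers `EX:0001` to `EX:0003`, which are not real ontology IDs.

```python
# Two toy databases whose records are annotated with terms from one common ontology.
mouse_db = [
    {"gene": "geneA", "term": "EX:0001"},
    {"gene": "geneB", "term": "EX:0002"},
]
chem_db = [
    {"compound": "compoundX", "term": "EX:0001"},
    {"compound": "compoundY", "term": "EX:0003"},
]


def integrate(db1, db2, key="term"):
    """Join records from two databases that carry the same ontology term."""
    index = {}
    for rec in db2:
        index.setdefault(rec[key], []).append(rec)
    # Merge each record of db1 with every record of db2 that shares its term.
    return [{**a, **b} for a in db1 for b in index.get(a[key], [])]


linked = integrate(mouse_db, chem_db)
```

The join needs no mapping step because both sources used the same label; records annotated with different terms are simply left unlinked.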
39http://ontologist.com
What is mapping (1)
“Given two ontologies A and B, mapping one ontology with another means that for each concept (node) in ontology A, we try to find a corresponding concept (node), which has the same or similar semantics, in ontology B and vice versa.”
M. Ehrig and Y. Sure, Ontology mapping - an integrated approach. In Proceedings of the First European Semantic Web Symposium, ESWS 2004, volume 3053 of Lecture Notes in Computer Science, pages 76–91, Heraklion, Greece, May 2004. Springer Verlag.
40http://ontologist.com
What is mapping (2)
“the task of relating the vocabulary of two ontologies in such a way that the mathematical structure of ontological signatures and their intended interpretations, as specified by the ontological axioms, are respected”.
[ontological signature = a hierarchy of concept symbols together with a set of relation symbols whose arguments are defined over the concepts of the concept hierarchy]
Y. Kalfoglou and M. Schorlemmer, Ontology mapping: the state of the art. Knowl. Eng. Rev., 18(1): 2003.
41http://ontologist.com
What is mapping (3)
“a formal expression that states the semantic relation between two entities belonging to different ontologies”,
“Simple examples are: concept c1 in ontology O1 is equivalent to concept c2 in ontology O2; concept c1 in ontology O1 is similar to concept c2 in ontology O2; individual i1 in ontology O1 is the same as individual i2 in ontology O2”
P. Bouquet et al. KnowledgeWeb deliverable D2.2.1. Specification of a common framework for characterizing alignment.
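The simplest of these definitions, finding for each concept in A a concept in B with the same or similar semantics, is often approximated by comparing labels. The sketch below does only that, using string similarity from Python's standard `difflib`; it is not the method of any of the cited papers, it ignores the structural and axiomatic requirements of definition (2), and the concept lists and the 0.8 threshold are assumptions of the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity of two labels in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_ontologies(concepts_a, concepts_b, threshold=0.8):
    """For each concept in A, return the best-matching concept in B, or None below the threshold."""
    mapping = {}
    for a in concepts_a:
        best = max(concepts_b, key=lambda b: similarity(a, b))
        mapping[a] = best if similarity(a, best) >= threshold else None
    return mapping


ontology_a = ["Osteoblast", "Lung", "Herpes virus"]
ontology_b = ["osteoblast", "lobe of lung", "herpesvirus"]
mapping = map_ontologies(ontology_a, ontology_b)
```

Here `Lung` is left unmapped because its closest label, `lobe of lung`, falls below the threshold; a purely lexical matcher has no way to know that the two stand in a part_of relation rather than being equivalent.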
42http://ontologist.com
One way to support ontology matching (and evaluation)
have experts manually prepare for each given matching problem a gold standard to which matching efforts could be compared.
– M. Ehrig and J. Euzenat, Relaxed Precision and Recall for Ontology Matching, in: Proc. K-Cap 2005 workshop on Integrating ontology, Banff (CA), p. 25-32, 2005.
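Comparing a matching effort with a manually prepared gold standard is usually done with precision and recall. The sketch below computes the classical measures over sets of concept pairs; it does not implement the relaxed variants of the cited paper, and both alignments are invented for illustration.

```python
def precision_recall(found, reference):
    """Compare a proposed alignment with a gold-standard alignment.

    Both arguments are sets of (concept_in_A, concept_in_B) pairs.
    """
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Gold standard prepared by experts (hypothetical pairs).
gold = {("Osteoblast", "osteoblast"), ("Herpes virus", "herpesvirus"), ("Lung", "lung")}
# Output of some matching effort (hypothetical pairs).
proposed = {("Osteoblast", "osteoblast"), ("Lung", "lobe of lung")}

p, r = precision_recall(proposed, gold)
```

One of the two proposed pairs is in the gold standard, and one of the three gold pairs was found, so precision is one half and recall one third.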
43http://ontologist.com
Gold standard methodology for ontology evaluation
is very expensive
who are the experts?
sometimes cannot be done for political reasons
• UMLS metathesaurus
even a gold standard can contain errors
44http://ontologist.com
Solution: The OBO Foundry
1. some large pieces already exist (especially Gene Ontology, Foundational Model of Anatomy)
2. processes of unification and reform already in place
3. all participants aiming for additivity
4. procedures for constant update in light of scientific advance
http://obofoundry.org
45http://ontologist.com
science basis of the GO: trained experts curating peer-reviewed literature
RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
Contrast: data-mining based approaches to ontology construction
The GO methodology of annotations
46http://ontologist.com
Systematic annotation of references to gene products in literature
• leads to improvements and extensions of the ontology
• leads to better annotations
• leads to a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself
47http://ontologist.com
Five bangs for your GO buck
science base
cross-species database integration
cross-granularity database integration
through links to the entities in biological reality
semantic searchability links people to software
48http://ontologist.com
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net NCBO BioPortal
First step (2003)
49http://ontologist.com
50http://ontologist.com
Second step (2004)
reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
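The cell type entry on this slide is a set of tag-value pairs, which makes it easy to read by machine. The following is a minimal sketch of such a reader, assuming one `tag: value` pair per line; it is not a full parser for the OBO file format, and `parse_stanza` is a hypothetical helper.

```python
STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''


def parse_stanza(text):
    """Parse `tag: value` lines into a dict mapping each tag to a list of values."""
    term = {}
    for line in text.splitlines():
        # Split at the first ": " only, so identifiers such as CL:0000062 stay intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip().strip('"'))
    return term


term = parse_stanza(STANZA)
# Identifiers of the cell types from which this cell type develops.
precursors = [v.split()[1] for v in term["relationship"] if v.startswith("develops_from")]
```

With the `develops_from` links available in this form, a definition for a GO term such as osteoblast differentiation can be assembled from the cell type entry instead of being written independently.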
51http://ontologist.com
The OBO Foundry http://obofoundry.org/
Third step (2006)
52http://ontologist.com
A prospective standard
designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping)
established March 2006
12 initial candidate OBO ontologies – focused primarily on basic science domains
several being constructed ab initio
by influential consortia who have the authority to impose their use on large parts of the relevant communities.
53http://ontologist.com
undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
54http://ontologist.com
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
55http://ontologist.com
GOALS
to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new ontologies by each clinical research group
REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will itself, to this degree, become more widely accessible and usable
56http://ontologist.com
GOALS
to serve as BENCHMARK FOR IMPROVEMENTS: once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage
57http://ontologist.com
Gold standard
Two aspects:
1. an expression of practice carried out perfectly (for example, the optimal therapy for a given medical problem)
2. based on complete acceptance or consensus: everyone qualified to render a judgement would agree to what the gold standard is.
Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics
58http://ontologist.com
Gold standards
are worth approximating. That is, “tarnished” or “fuzzy” standards are better than no standards at all. ... studies comparing the performance of information resources against imperfect standards, so long as the degree of imperfection has been estimated, represent a stronger approach than those that bypass the issue of a standard altogether.
Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics
59http://ontologist.com
Gold standards
can also be partial: to serve ontology matching and evaluation it is enough to have ontologies comprehending even selected aspects of biomedical reality, provided the assertions contained in these ontologies are universally true
in non-closed worlds, gold standards will always be partial
in complex disciplines gold standards will always be evolving
60http://ontologist.com
the constraint of universality
OBO Foundry ontologies accept only those relations between their terms which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
Compare:
electrons have a negative electric charge
electrons have a negative electric charge of 1.6 × 10^-19 coulomb
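One way to read "obtains universally" for a relation such as part_of is that every instance of the first type stands in that relation to some instance of the second. The sketch below tests this reading against a toy set of instances; the instance records, their identifiers and the `holds_universally` helper are all invented for the example, and a finite sample can refute a universal claim but never establish it.

```python
def holds_universally(instances, source_type, relation, target_type):
    """True if every instance of source_type stands in `relation` to some instance of target_type.

    Returns True vacuously when there are no instances of source_type.
    """
    sources = [i for i in instances if source_type in i["types"]]
    targets = {i["id"] for i in instances if target_type in i["types"]}
    return all(any(t in targets for t in s.get(relation, [])) for s in sources)


# Hypothetical instance data.
instances = [
    {"id": "human1", "types": {"human"}},
    {"id": "mouse1", "types": {"mouse"}},
    {"id": "lung1", "types": {"lung"}, "part_of": ["human1"]},
    {"id": "lung2", "types": {"lung"}, "part_of": ["mouse1"]},
    {"id": "lobe1", "types": {"lobe of lung"}, "part_of": ["lung1"]},
    {"id": "lobe2", "types": {"lobe of lung"}, "part_of": ["lung2"]},
]

lobe_in_lung = holds_universally(instances, "lobe of lung", "part_of", "lung")
lung_in_human = holds_universally(instances, "lung", "part_of", "human")
```

On this data `lobe of lung part_of lung` passes, while `lung part_of human` fails because one lung belongs to a mouse; only the first kind of assertion would be admitted under the constraint.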
61http://ontologist.com
Principle of Low Hanging Fruit
Ontologies should include even absolutely trivial assertions (assertions you know to be universally true)
herpes virus is_a virus
Computers need to be led by the hand
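Why a trivial assertion matters to a machine can be shown with a search over annotated records. This is a sketch under stated assumptions: the record names and annotations are invented, each term has at most one is_a parent, and the hierarchy is assumed to be acyclic.

```python
IS_A = {
    "herpes virus": "virus",  # the trivial assertion from the slide
}
# Hypothetical data records, each annotated with one ontology term.
ANNOTATIONS = {"record1": "herpes virus", "record2": "osteoblast"}


def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a links."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found


def search(query, annotations, is_a):
    """Records annotated with `query` or with any subtype of it."""
    return sorted(
        r for r, t in annotations.items()
        if t == query or query in ancestors(t, is_a)
    )


with_assertion = search("virus", ANNOTATIONS, IS_A)
without_assertion = search("virus", ANNOTATIONS, {})
```

With the assertion in place, a query for `virus` returns the record annotated `herpes virus`; without it the same query returns nothing, although any human reader would have made the connection.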
62http://ontologist.com
if the standard is to work
it has to simulate the achievements of the SI system of units
• simple
• controlled vocabulary
• wide acceptance
• uncontroversial
• allows cross-disciplinary, cross-experimenter calibration
• my data can confirm or disconfirm your hypothesis