Building a Suite of Biomedical Ontologies
Barry Smith
1
Problems with UMLS-style approaches
• let a million ontologies bloom, each one close to the terminological habits of its authors
• in concordance with the “not invented here” syndrome
• then map these ontologies, and use these mappings to integrate your different pots of data
2
Mappings are hard
• They create an N² problem, and are fragile and expensive to maintain
• They need new authorities to maintain them (one for each pair of mapped ontologies), yielding a new risk of forking – who will police the mappings?
The goal should be to minimize the need for mappings, by avoiding redundancy in the first place – one ontology for each domain
Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
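To make the N² point concrete, here is a minimal sketch that counts the mappings needed when every pair of n overlapping ontologies is mapped once, i.e. n(n-1)/2. The function name is illustrative and not part of any tool mentioned in these slides.

```python
def pairwise_mappings(n: int) -> int:
    """Number of mappings needed if every pair of n ontologies is mapped once."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Growth is quadratic: each new ontology adds a mapping to every existing one.
for n in (3, 10, 100):
    print(n, "ontologies ->", pairwise_mappings(n), "mappings to maintain")
```

With disjoint modules that are simply used together, no such pairwise mappings are needed, which is the sense of "integration = addition".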
3
How to do it right?
• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails
• where the number of ontologies needing to be used together is small – integration = addition
• where these ontologies are stable
• by creating a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested
4
Modularity
modularity ensures
• annotations can be additive
• division of labor amongst domain experts
• high value of training in any given module
• lessons learned in one module can benefit work on other modules
• incentivization of those responsible for individual modules
5
Reasons why GO has been successful
It is a system for prospective standardization built with a coherent top level but with content contributed and monitored by domain specialists
Based on community consensus
Updated every night
Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value
Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though still proceeding with caution)
6
GO has learned the lessons of successful cooperation
• Clear documentation
• The terms chosen are already familiar
• Fully open source (allows thorough testing in manifold combinations with other ontologies)
• Subjected to considerable third-party critique
• Tracker for user input and help desk with rapid turnaround
7
GO has been amazingly successful in overcoming the data balkanization problem
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
8
How to create a disease ontology?
• One option: a flat list
• One option: template approach
– Cancer
– Infectious Disease
– Diabetes
– Autoimmune Disease
• To make this work: think very hard about what a disease is
9
Aristotelian definitions
• To define a term ‘A’ in an ontology, identify the parent term ‘B’ and start your definition:
• An A is a B which … Cs ….
A = species
B = genus
C = differentia
10
• Cancer disease is a disease which …
• Genetic disease is a disease which …
• Infectious disease is a disease which …
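The genus-differentia template can be sketched as a small data structure. This is only an illustration: the class and field names are assumptions, and the differentia is left as "…" exactly as on the slide rather than filled in.

```python
from dataclasses import dataclass


def article(word: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if word and word[0].lower() in "aeiou" else "a"


@dataclass(frozen=True)
class Term:
    name: str         # A, the species being defined
    genus: str        # B, the parent term
    differentia: str  # C, what marks out the As among the Bs

    def definition(self) -> str:
        """Render the definition in the form 'An A is a B which C'."""
        return (f"{article(self.name).capitalize()} {self.name} is "
                f"{article(self.genus)} {self.genus} which {self.differentia}")


# The differentia is elided on the slide, so it stays elided here.
print(Term("infectious disease", "disease", "…").definition())
```

Forcing every term through this shape means each definition names its parent, so the is_a hierarchy and the definitions cannot drift apart.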
11
Information Artifact Ontology (IAO)
Ontology for Biomedical Investigations (OBI)
Ontology for General Medical Science (OGMS)
Basic Formal Ontology (BFO)
12
OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Ontology for General Medical Science (OGMS)
domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)
13
Ontology for General Medical Science
http://code.google.com/p/ogms/
(OBO) http://purl.obolibrary.org/obo/ogms.obo
(OWL) http://purl.obolibrary.org/obo/ogms.owl
14
OGMS-based initiatives
Vital Signs Ontology (VSO)
EHR / Demographics Ontology
Infectious Disease Ontology (IDO)
Psychology Ontology (PSY)
Emotion Ontology (PSY-EM)
…
Genetic Disease Ontology
Cancer Ontology
15
BFO: the very top
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (Process, Event)
16
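GRANULARITY by RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT)
ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)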
17
BFO & GO
continuant
  independent continuant: cellular component
  dependent continuant: molecular function
occurrent: biological processes
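The placement of the three GO branches under BFO can be written down as a lookup table. The sketch below assumes only the categories named on the slide; the dictionary and function names are illustrative.

```python
from typing import Optional

# Parent links among the BFO categories named on the slide.
BFO_PARENT: dict[str, Optional[str]] = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
}

# Placement of the three GO branches, as drawn on the slide.
GO_TO_BFO = {
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}


def bfo_path(go_branch: str) -> list[str]:
    """Return the BFO categories above a GO branch, nearest first."""
    path = []
    node: Optional[str] = GO_TO_BFO[go_branch]
    while node is not None:
        path.append(node)
        node = BFO_PARENT[node]
    return path


print(bfo_path("molecular function"))
print(bfo_path("biological process"))
```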
18
Basic Formal Ontology
types
Continuant
  Independent Continuant (thing)
  Dependent Continuant (quality)
Occurrent (process, event)
instances
.... ..... .......
19
Experience with BFO in building ontologies provides
• a community of skilled ontology developers and users (user group has 120 members)
• associated logical tools
• documentation for different types of users
• a methodology for building conformant ontologies by starting with BFO and populating downwards
20
Example: The Cell Ontology
How to build an ontology
import BFO into an ontology editor such as Protégé
work with domain experts to create an initial mid-level classification
find ~50 most commonly used terms corresponding to types in reality
arrange these terms into an informal is_a hierarchy according to this universality principle:
A is_a B: every instance of A is an instance of B
fill in missing terms to give a complete hierarchy
(leave it to domain experts to populate the lower levels of the hierarchy)
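The universality principle behind is_a can be checked mechanically against known instances. The following is a minimal sketch with hypothetical toy data (the type names and instance identifiers are invented for illustration); passing the check over recorded instances is necessary for an is_a link, but it does not by itself establish that the link holds for all instances.

```python
def is_a(instances_of: dict[str, set[str]], a: str, b: str) -> bool:
    """True if every recorded instance of type a is also an instance of type b."""
    return instances_of[a] <= instances_of[b]


# Hypothetical toy data: type name -> identifiers of its known instances.
instances_of = {
    "cell": {"c1", "c2", "c3"},
    "neuron": {"c1"},
    "blood cell": {"c2", "c3"},
}

print(is_a(instances_of, "neuron", "cell"))   # candidate link holds on this data
print(is_a(instances_of, "cell", "neuron"))   # the converse does not
```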
22
Basic Formal Ontology
continuant
  independent continuant: organism
  dependent continuant
occurrent
23
Continuants
• continue to exist through time, preserving their identity while undergoing different sorts of changes
• independent continuants – objects, things, ...
• dependent continuants – qualities, attributes, shapes, potentialities ...
24
Occurrents
• processes, events, happenings
– your life
– this process of accelerated cell division
25
Qualities
temperature
blood pressure
mass
...
are continuants
they exist through time while undergoing changes
26
Qualities
temperature / blood pressure / mass ...
are dimensions of variation within the structure of the entity
a quality is something which can change while its bearer remains one and the same
27
A Chart representing how John’s temperature
changes
28
John’s temperature, the temperature he has throughout his entire life, cycles through different determinate temperatures from one time to the next
John’s temperature, in thus changing, exerts an influence on other dimensions of variation in the physiology of the organism through time
30
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
      temperature
occurrent
31
Blinding Flash of the Obvious
types: independent continuant > organism; dependent continuant > quality > temperature
instances: John; John’s temperature
32
Blinding Flash of the Obvious
types: organism; temperature
instances: John; John’s temperature
John’s temperature inheres_in John
34
types: temperature (37ºC, 37.1ºC, 37.5ºC, 37.2ºC, 37.3ºC, 37.4ºC)
instances: John’s temperature
John’s temperature instantiates one of these types at each of t1, t2, t3, t4, t5, t6
35
types: human (embryo, fetus, neonate, infant, child, adult)
instances: John
John instantiates one of these types at each of t1, t2, t3, t4, t5, t6
36
whole plant
continuants
occurrents
37
zygote
pro-embryo
globular embryo
bilateral embryo ...
mature whole plant
fertilization
first cell division
becomes reproductively able
child transformation_of fetus
38
Temperature subtypes
Development-stage subtypes
are threshold divisions (hence we do not have sharp boundaries, and we have a certain degree of choice, e.g. in how many subtypes to distinguish, though not in their ordering)
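Threshold divisions can be illustrated by cutting a continuous scale at chosen points. In the sketch below the cut-off values and subtype labels are hypothetical (the slides give none); the number of cut-offs is a free choice, while their ordering is fixed by the scale itself.

```python
from bisect import bisect_right

# Hypothetical cut-offs in degrees Celsius; the slides give no threshold values.
THRESHOLDS = [35.0, 37.5]
# One label per interval, so there is always one more label than thresholds.
LABELS = ["low temperature", "normal temperature", "elevated temperature"]


def subtype(temp_c: float) -> str:
    """Assign a determinate temperature to the subtype whose interval contains it."""
    return LABELS[bisect_right(THRESHOLDS, temp_c)]


for t in (34.9, 37.0, 37.5):
    print(t, "->", subtype(t))
```

Adding a further threshold would split one subtype into two without disturbing the order of the others.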
39
types: independent continuant > organism; dependent continuant > quality > temperature
instances: John; John’s temperature
40
types: independent continuant > organism; dependent continuant > quality > temperature; occurrent > process > course of temperature changes
instances: John; John’s temperature; John’s temperature history
41
types: independent continuant > organism; dependent continuant > quality > temperature; occurrent > process > life of an organism
instances: John; John’s temperature; John’s life
42
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
    disposition
occurrent
43
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
    function
    role
    disposition
occurrent
44
disposition
- of a glass vase, to shatter if dropped
- of a human, to eat
- of a banana, to ripen
- of John, to lose hair
45
disposition
if it ceases to exist, then its bearer and/or its immediate surrounding environment is physically changed
its realization occurs when its bearer is in some special physical circumstances
its realization is what it is in virtue of the bearer’s physical make-up
46
function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP
not optional; reflection of physical makeup of bearer; subtype of disposition
47
types: independent continuant > eye; dependent continuant > function > to see; occurrent > process > process of seeing
instances: John’s eye; function of John’s eye: to see; John seeing
48
OGMS
Ontology for General Medical Science
http://code.google.com/p/ogms
49
Physical Disorder
50
Physical Disorder
– independent continuant
  fiat object part
A causally linked combination of physical components of the extended organism that is clinically abnormal.
51
Clinically abnormal
– (1) not part of the life plan for an organism of the relevant type (unlike aging or pregnancy),
– (2) causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
– (3) such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
52
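The three-part test for "clinically abnormal" reads naturally as a conjunction. The sketch below is one possible encoding, assuming a numeric risk score and an arbitrary threshold; both are hypothetical, since the slides say only that some threshold level exists.

```python
from dataclasses import dataclass

# Hypothetical threshold; the definition only says "a certain threshold level".
RISK_THRESHOLD = 0.1


@dataclass
class BodilyFeature:
    name: str
    part_of_life_plan: bool        # criterion (1), negated below
    linked_to_elevated_risk: bool  # criterion (2)
    elevated_risk: float           # criterion (3), hypothetical numeric score


def clinically_abnormal(f: BodilyFeature, threshold: float = RISK_THRESHOLD) -> bool:
    """All three criteria must hold together."""
    return (not f.part_of_life_plan
            and f.linked_to_elevated_risk
            and f.elevated_risk > threshold)


# Illustrative inputs only; the scores are invented.
examples = [
    BodilyFeature("feature within the life plan", True, True, 0.5),
    BodilyFeature("low-risk deviation", False, True, 0.01),
    BodilyFeature("high-risk deviation", False, True, 0.5),
]
for f in examples:
    print(f.name, "->", clinically_abnormal(f))
```

Criterion (3) is what keeps minor deviations from counting as disorders: a feature can satisfy (1) and (2) and still fall below the threshold.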
Big Picture
53
Pathological Process =def. A bodily process that is a manifestation of a disorder and is clinically abnormal.
Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
54
Cirrhosis - environmental exposure
• Etiological process - phenobarbital-induced hepatic cell death
– produces
• Disorder - necrotic liver
– bears
• Disposition (disease) - cirrhosis
– realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, enlarged spleen
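The chain on this slide alternates entities and relations, which makes it easy to hold as data. Here is a minimal sketch that stores the cirrhosis example as a list and derives (subject, relation, object) triples from it; the list layout and the function name are assumptions for illustration, not an OGMS serialization.

```python
# (kind of entity, example from the cirrhosis slide, relation to the next entity)
CHAIN = [
    ("Etiological process", "phenobarbital-induced hepatic cell death", "produces"),
    ("Disorder", "necrotic liver", "bears"),
    ("Disposition (disease)", "cirrhosis", "realized_in"),
    ("Pathological process", "abnormal tissue repair with cell proliferation and fibrosis", "produces"),
    ("Abnormal bodily features", "", "recognized_as"),
    ("Symptoms and signs", "fatigue, anorexia; jaundice, enlarged spleen", None),
]


def triples(chain):
    """Link each entity kind to the next one through its relation."""
    return [(chain[i][0], chain[i][2], chain[i + 1][0])
            for i in range(len(chain) - 1)]


for subject, relation, obj in triples(CHAIN):
    print(f"{subject} --{relation}--> {obj}")
```

The same six-step skeleton can be refilled with the entries from the other disease examples.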
55
Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
Predisposition to Disease
=def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing some disease.
56
HNPCC - genetic pre-disposition
• Etiological process - inheritance of a mutant mismatch repair gene
– produces
• Disorder - chromosome 3 with abnormal hMLH1
– bears
• Disposition (disease) - Lynch syndrome
– realized_in
• Pathological process - abnormal repair of DNA mismatches
– produces
• Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
– bears
• Disposition (disease) - non-polyposis colon cancer
– realized_in
• Symptoms (including pain)
57
Huntington’s Disease – genetic disease
• Etiological process - inheritance of >39 CAG repeats in the HTT gene
– produces
• Disorder - chromosome 4 with abnormal mHTT
– bears
• Disposition (disease) - Huntington’s disease
– realized_in
• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - anxiety, depression
• Signs - difficulties in speaking and swallowing
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out Huntington’s suggests
Laboratory tests produces
Test results - molecular detection of the HTT gene with >39 CAG repeats used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
58
Cirrhosis - environmental exposure
• Etiological process - phenobarbital-induced hepatic cell death
– produces
• Disorder - necrotic liver
– bears
• Disposition (disease) - cirrhosis
– realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, splenomegaly
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out cirrhosis suggests
Laboratory tests produces
Test results - elevated liver enzymes in serum used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
59
Systemic arterial hypertension
• Etiological process – abnormal reabsorption of NaCl by the kidney
– produces
• Disorder – abnormally large scattered molecular aggregate of salt in the blood
– bears
• Disposition (disease) - hypertension
– realized_in
• Pathological process – exertion of abnormal pressure against arterial wall
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - headaches, dizziness
• Signs – elevated blood pressure
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out hypertension suggests
Laboratory tests produces
Test results - used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension
60
Type 2 Diabetes Mellitus
• Etiological process –
– produces
• Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
– bears
• Disposition (disease) – diabetes mellitus
– realized_in
• Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
– produces
• Abnormal bodily features
– recognized_as
• Symptoms – polydipsia, polyuria, polyphagia, blurred vision
• Signs – elevated blood glucose and hemoglobin A1c
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - rule out diabetes mellitus suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c produces
Test results - used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus
61
Type 1 hypersensitivity to penicillin
• Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
– produces
• Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
– bears
• Disposition (disease) – type I hypersensitivity
– realized_in
• Pathological process – type I hypersensitivity reaction
– produces
• Abnormal bodily features
– recognized_as
• Symptoms – pruritus, shortness of breath
• Signs – rash, urticaria, anaphylaxis
Symptoms & Signs used_in
Interpretive process produces
Hypothesis - suggests
Laboratory tests – produces
Test results – occasionally, skin testing used_in
Interpretive process produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin
62
63
Disease vs. Disease course
Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
Disease course =def. – The aggregate of processes in which a disease disposition is realized.
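The difference between a disease and its course can be sketched with two kinds of object: a disposition that persists in its bearer, and the processes in which it is realized. The class names, fields and example times below are hypothetical and serve only to illustrate the two definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    description: str
    start: float  # hypothetical time stamps, arbitrary units
    end: float


@dataclass
class Disease:
    """A disposition borne by an organism; it exists even while unrealized."""
    name: str
    bearer: str
    realizations: list[Process] = field(default_factory=list)

    def course(self) -> list[Process]:
        """The disease course: the aggregate of realizing processes, in time order."""
        return sorted(self.realizations, key=lambda p: p.start)


flu = Disease("flu", bearer="John")
print(len(flu.course()))  # the disposition exists with an empty course so far

flu.realizations.append(Process("fever", start=2.0, end=4.0))
flu.realizations.append(Process("acute inflammation", start=1.0, end=5.0))
print([p.description for p in flu.course()])
```

Keeping the two apart lets one and the same disease have a course that is silent at one time and symptomatic at another.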
64
types: coronary heart disease (disease associated with asymptomatic (‘silent’) infarction; disease associated with early lesions and small fibrous plaques; stable angina; disease associated with surface disruption of plaque; unstable angina)
instances: John’s coronary heart disease
John’s coronary heart disease instantiates one of these types at each of t1, t2, t3, t4, t5 (time)
65
types: independent continuant > disorder; dependent continuant > disposition > disease; occurrent > process > course of disease
instances: John’s disordered heart; John’s coronary heart disease; course of John’s disease
66
Examples of ontology terms
Independent Continuant – OGMS: Disorder; IDO: Infectious disorder
Dependent Continuant – OGMS: Disease, Predisposition to disease; IDO: Infectious disease, Protective resistance
Occurrent – OGMS: Disease course; IDO: Infectious disease course
IDO (Infectious Disease Ontology) Core
Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors
Provides common terminology resources and tested common guidelines for a vast array of different disease communities
68
Infectious Disease Ontology Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
• Duke University, University at Buffalo – HIV
69
Influenza - infectious
• Etiological process - infection of airway epithelial cells with influenza virus
– produces
• Disorder - viable cells with influenza virus
– bears
• Disposition (disease) - flu
– realized_in
• Pathological process - acute inflammation
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - weakness, dizziness
• Signs - fever
70
Influenza – disease course
• Etiological process - infection of airway epithelial cells with influenza virus
– produces
• Disorder - viable cells with influenza virus
– bears
• Disposition (disease) - flu
– realized_in
• Pathological process - acute inflammation
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - weakness, dizziness
• Signs - fever
71
The disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
Big Picture
72