Who am I? Director, US National Center for Ontological Research – leader on ontology projects for...
Posted 19-Dec-2015
Who am I?
• Director, US National Center for Ontological Research – leader on ontology projects for US Defense Dept.
• Key Scientist, US National Center for Biomedical Ontology
• Consultant to German Federal Health Ministry on cross-border transmission of emergency health information
• Consultant to EU epSOS (European patients Smart Open Services) project
• Member of ARGOS consortium on EU-US health information standardization
• Co-Principal Investigator:
  ◦ Protein Ontology
  ◦ Infectious Disease Ontology
• Scientific Advisor:
  ◦ Gene Ontology (world’s most successful ontology)
  ◦ Cleveland Clinic Semantic Database in Cardiothoracic Surgery
  ◦ Ontology of Human Expertise, Resource Repository Project of NIH National Center for Research Resources, collaboration with Know-Soft
Barry Smith
Large-scale health IT projects and their problems
Relational databases and their problems
A brief history of the Semantic Web
HTML demonstrated the power of the Web to allow sharing of information
can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information, based on XML, RDF and above all OWL (Web Ontology Language)?
can we use RDF and OWL to break down silos, and create useful integration of online data and information?
RDF triple stores and their problems
people tried, but the more successful they were, the more they failed
OWL breaks down data silos via controlled vocabularies for the formulation of data dictionaries
Unfortunately, the very success of this approach led to the creation of multiple new silos, because multiple ontologies are being created in ad hoc ways
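To make the silo point concrete, here is a minimal sketch, assuming RDF data is modelled simply as a Python set of (subject, predicate, object) triples with no RDF library involved. All `ex:` and `other:` identifiers and the helper names `objects` and `condition_labels` are invented for illustration; only `rdfs:label` is a real RDF Schema term. Merging two graphs is (ignoring blank nodes) a set union, but a query can only join across sources where they happen to use the same identifier.

```python
# Two hypothetical data "silos", each a set of (subject, predicate, object) triples.
# The ex: and other: identifiers are invented; rdfs:label is the RDF Schema label property.
LAB = {
    ("ex:patient42", "ex:hasResult", "ex:result7"),
    ("ex:result7", "ex:indicates", "ex:Influenza"),
}
CLINIC_SHARED_IDS = {("ex:Influenza", "rdfs:label", "influenza")}
CLINIC_OWN_IDS = {("other:Flu", "rdfs:label", "influenza")}


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def condition_labels(graph, patient):
    """Follow patient -> result -> indicated condition -> label."""
    labels = set()
    for result in objects(graph, patient, "ex:hasResult"):
        for condition in objects(graph, result, "ex:indicates"):
            labels |= objects(graph, condition, "rdfs:label")
    return labels


# Merging graphs is (ignoring blank nodes) just the set union of their triples.
print(condition_labels(LAB | CLINIC_SHARED_IDS, "ex:patient42"))  # {'influenza'}
print(condition_labels(LAB | CLINIC_OWN_IDS, "ex:patient42"))     # set()
```

The second merge succeeds syntactically but answers nothing: the clinic's own identifier for flu is, in effect, a new silo.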
Two factors
Tim Berners-Lee mentality (modelled on the success of HTML):
◦ let a million ‘lite ontologies’ bloom, and somehow intelligence will be created
◦ ‘links’ can mean anything
Shrink-wrapped software mentality:
◦ you will not get paid for reusing old and good ontologies
“Linked Open Data”
Ontology success stories, and some reasons for failure
A fragment of the Linked Open Data in the biomedical domain
What you get with ‘mappings’
All in Human Phenotype Ontology (= all phenotypes: excess hair loss, splayed feet ...)
mapped to
all organisms in NCBI organism classification
allose in ChEBI chemistry ontology
Acute Lymphoblastic Leukemia (A.L.L.) in National Cancer Institute Thesaurus
What you get with ‘mappings’
all phenotypes (excess hair loss, duck feet)
all organisms
allose (a form of sugar)
Acute Lymphoblastic Leukemia (A.L.L.)
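The slide does not say how these mappings were produced. As a purely hypothetical illustration, a lexical matcher that normalizes labels and accepts prefix matches reproduces exactly this kind of nonsense. `normalize` and `naive_mappings` are invented names, and real mapping tools are more elaborate; the target labels come from the slide, plus one control term.

```python
import re


def normalize(label):
    """Lowercase and drop everything except letters, digits and spaces,
    so that 'A.L.L.' becomes 'all'."""
    return re.sub(r"[^a-z0-9 ]", "", label.lower()).strip()


def naive_mappings(source_label, targets):
    """Hypothetical lexical matcher: map a term to every target term whose
    normalized label starts with the source term's normalized label."""
    key = normalize(source_label)
    return [(ontology, label) for ontology, label in targets
            if normalize(label).startswith(key)]


# Target labels taken from the slide, plus one control term that should not match.
TARGETS = [
    ("NCBI organism classification", "all organisms"),
    ("ChEBI", "allose"),
    ("NCI Thesaurus", "A.L.L."),
    ("ChEBI", "glucose"),
]

for ontology, label in naive_mappings("All", TARGETS):
    print(f"HPO 'All' -> {label} ({ontology})")
```

Every match is "correct" by the string rule and wrong in meaning, which is why such mappings need constant manual repair.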
Mappings are hard
They are fragile, and expensive to maintain
The goal should be to minimize the need for mappings
Invest resources in ontology modules which work well together
Why should you care?
you need to create systems for data mining and text processing which will yield useful digitally coded output
if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted
How to do it right?
how to create an incremental, evolutionary process, where what is good survives, and what is bad fails
create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested
Uses of ‘ontology’ in PubMed abstracts
By far the most successful: GO (Gene Ontology)
GO provides a controlled system of terms for use in annotating (describing, tagging) data
multi-species, multi-disciplinary, open source
contributing to the cumulativity of scientific results obtained by distinct research communities
compare use of kilograms, meters, seconds … in formulating experimental results
Hierarchical view representing relations between represented types
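One reason a shared hierarchy pays off: an annotation to a specific term also counts as an annotation to all of that term's is_a ancestors, so data from different species and databases can be queried at whatever level of generality is needed. A minimal sketch, assuming a tiny hypothetical hierarchy (the labels echo GO terms but this is not an extract of the real GO graph) and made-up gene products `geneA` and `geneB`:

```python
# Simplified, hypothetical fragment of an is_a hierarchy (child -> parents).
IS_A = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["biological_process"],
    "biological_process": [],
}

# Hypothetical annotations from two different species' databases.
ANNOTATIONS = [
    ("geneA", "mouse", "apoptotic process"),
    ("geneB", "human", "programmed cell death"),
]


def ancestors(term):
    """The term itself plus every term reachable by following is_a upward."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def annotated_to(term):
    """Gene products annotated to `term` or to any is_a descendant of it."""
    return sorted(gene for gene, _species, annotated in ANNOTATIONS
                  if term in ancestors(annotated))


print(annotated_to("cell death"))         # ['geneA', 'geneB']
print(annotated_to("apoptotic process"))  # ['geneA']
```

Because both databases point at the same terms, the mouse and human results come back together without any mapping step.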
US $100 mill. invested in literature and data curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
GO is amazingly successful in overcoming the balkanization problem
but it covers only generic biological entities of three sorts:
◦ cellular components
◦ molecular functions
◦ biological processes
and it does not provide representations of diseases, symptoms, …
Original OBO Foundry ontologies (Gene Ontology in yellow), arranged by granularity and by relation to time (continuant, independent or dependent; occurrent):
Organ and organism
◦ independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
◦ dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
◦ occurrent: Biological Process (GO)
Cell and cellular component
◦ independent continuant: Cell (CL); Cellular Component (FMA, GO)
◦ dependent continuant: Cellular Function (GO)
Molecule
◦ independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
◦ dependent continuant: Molecular Function (GO)
◦ occurrent: Molecular Process (GO)
The OBO Foundry: a step-by-step, evidence-based approach to expand the GO
Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology
and agree in advance to collaborate with developers of ontologies in adjacent domains.
http://obofoundry.org
OBO Foundry Principles
Common governance (coordinating editors)
Common training
Common architecture to overcome Tim Berners-Lee-ism:
• simple shared top-level ontology
• shared Relation Ontology: www.obofoundry.org/ro
Pistoia Alliance
Open standards for data and technology interfaces in the life science research industry
consortium of major pharmaceutical companies working to address the data silo problems created by the multiplicity of proprietary terminologies
declare terminology ‘pre-competitive’
require shared use of OBO Foundry ontologies in presentation of information
http://pistoiaalliance.org/
OBO Foundry (example ontologies)
GO Gene Ontology
CL Cell Ontology
SO Sequence Ontology
ChEBI Chemical Ontology
PATO Phenotype (Quality) Ontology
FMA Foundational Model of Anatomy Ontology
ChEBI Chemical Entities of Biological Interest
PRO Protein Ontology
Plant Ontology
Environment Ontology
Ontology for Biomedical Investigations
RNA Ontology
Example Ontologies
Human Phenotype Ontology (HPO) for genetic diseases, codifying the OMIM (Online Mendelian Inheritance in Man) database
Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies. American Journal of Human Genetics, Vol. 85
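The cited paper's title refers to semantic similarity searches. One widely used family of measures scores two terms by the information content (IC) of their most informative common ancestor, in the style of Resnik. The sketch below illustrates that idea on a toy hierarchy with invented disease annotations D1 to D3; it is not the HPO, and not necessarily the exact measure the paper uses.

```python
import math

# Toy, hypothetical phenotype hierarchy (child -> parents); not real HPO content.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the skin": ["Phenotypic abnormality"],
    "Abnormality of hair": ["Abnormality of the skin"],
    "Alopecia": ["Abnormality of hair"],
    "Hypertrichosis": ["Abnormality of hair"],
    "Abnormality of the foot": ["Phenotypic abnormality"],
}

# Invented disease annotations, used only to estimate term frequencies.
DISEASES = {
    "D1": {"Alopecia"},
    "D2": {"Hypertrichosis"},
    "D3": {"Abnormality of the foot"},
}


def ancestors(term):
    """The term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(term):
    """IC = log(1 / p), where p is the fraction of diseases annotated to the
    term or to one of its descendants (assumes at least one such disease)."""
    hits = sum(1 for terms in DISEASES.values()
               if any(term in ancestors(t) for t in terms))
    return math.log(len(DISEASES) / hits)


def resnik(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(c) for c in common)


print(round(resnik("Alopecia", "Hypertrichosis"), 3))   # 0.405
print(resnik("Alopecia", "Abnormality of the foot"))    # 0.0
```

Two hair phenotypes share an informative ancestor and score higher than a hair and a foot phenotype, whose only shared ancestor is the uninformative root.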
Infectious Disease Ontology (IDO)
general template with extensions:
◦ HIV Ontology
◦ Influenza Ontology (InfluenzO)
◦ Malaria Ontology (IDO-MAL)
◦ Staph. aureus Ontology ...
How OBO Foundry can help
The problem:
◦ General: data silos
◦ Particular: continuity of care
with thanks to http://dbmotion.com
the problem of continuity of care: patients move around
synchronic and diachronic problems of semantic interoperability
(across space and across time)
The Data Model That Nearly Killed Me, by Joe Bugajski
http://tiny.cc/S1HWo
“If data cannot be made reliably available across silos in a single EHR, then this data cannot be made reliably available to a huge, heterogeneous collection of networked systems.”
Epic, etc.
will provide a way to capture and represent some of what is needed in a form that is usable
by computers (somewhat)
by you yourself
but not by other clinics, hospitals and researchers ...
how can we link EHR 1 to EHR 2 in a reliable, trustworthy, useful way, which both systems can understand?
[Diagram: EHR 1 and EHR 2]
the ideal solution: WHO International Classification of Diseases
[Diagram: EHR 1 linked to EHR 2 via ICD]
ICD
PRO:
• De facto US billing standard
• Multilanguage
CON:
• De facto US billing standard (corrupts data)
• No definitions of terms, and so difficult to judge accuracy of hierarchy and of coding
• Inconsistent hierarchies
• Hard to reason with results
• Hence few secondary uses
the ideal solution: a single universal clinical vocabulary
[Diagram: EHR 1 linked to EHR 2 via SNOMED CT]
SNOMED CT: Systematized Nomenclature of Medicine – Clinical Terms
PRO:
• International standard (sort of)
• Centerpiece of UK national program
• Huge resources
• Free for member countries
• Multi-language (including Spanish)
SNOMED CT
CON:
• Huge (but redundant ... and gappy)
• Still in need of work:
  ◦ Lacks a coherent representation of the medical domain
  ◦ No consistent interpretation of relations
  ◦ Many erroneous relation assertions
  ◦ Many idiosyncratic relations
  ◦ Mixes ontology with epistemology
  ◦ Contains numerous compound terms (e.g., test for X) without the constituent terms (here: X), even where the latter are of obvious salience
Coding with SNOMED-CT is unreliable and inconsistent
Multi-stage committee process for adding terms that follows intuitive rules and not formal principles
Does there exist a strategy for evolutionary improvement?
SNOMED CT
above all: SNOMED CT cannot solve the problem of continuity of care because it has too much redundancy
SNOMED redundancy (examples)
SNOMED: Abscess (disorder)
SNOMED: Abscess (morphologic abnormality)
SNOMED: Solitary leiomyoma (clinical finding)
SNOMED: Leiomyoma, no ICD-O subtype (body structure)
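SNOMED CT fully specified names end in a semantic tag in parentheses, such as (disorder) or (body structure). A crude but illustrative check for the kind of redundancy shown above is to strip the tag and look for the same term filed under more than one tag. A minimal sketch using only the four names on this slide; the function names are hypothetical and no SNOMED CT release files are read.

```python
import re
from collections import defaultdict

# "term (semantic tag)" at the end of a fully specified name.
FSN = re.compile(r"^(?P<term>.*?)\s*\((?P<tag>[^()]*)\)$")


def split_fsn(name):
    """Split 'Abscess (disorder)' into ('Abscess', 'disorder')."""
    match = FSN.match(name)
    if match is None:
        return name, None
    return match.group("term"), match.group("tag")


def same_term_different_tags(names):
    """Group names by term (case-insensitive) and keep terms that are
    filed under more than one semantic tag."""
    tags_by_term = defaultdict(set)
    for name in names:
        term, tag = split_fsn(name)
        if tag is not None:
            tags_by_term[term.lower()].add(tag)
    return {term: sorted(tags) for term, tags in tags_by_term.items()
            if len(tags) > 1}


EXAMPLES = [
    "Abscess (disorder)",
    "Abscess (morphologic abnormality)",
    "Solitary leiomyoma (clinical finding)",
    "Leiomyoma, no ICD-O subtype (body structure)",
]
print(same_term_different_tags(EXAMPLES))
# {'abscess': ['disorder', 'morphologic abnormality']}
```

The leiomyoma pair slips through because its wording differs, so a receiving system cannot rely on string heuristics to recognize that two codes describe the same patient condition.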
link EHR 1 to EHR 2 through a messaging standard (cf. air traffic control English)
[Diagram: EHR 1 linked to EHR 2 via the HL7 Messaging Standard]
http://hl7-watch.blogspot.com/ (HL7 critical blog)
HL7 will in any case provide only the messaging format – it will still need content from SNOMED CT or elsewhere
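To see why a messaging format alone is not enough, consider an HL7 version 2 style OBX segment, where fields are separated by `|` and components by `^`. The simplified sketch below ignores escape sequences, repetitions and MSH encoding characters; the codes `X1`, `HYPO-123` and `A7` are made up, and `SCT` is assumed here as the coding-system identifier for SNOMED CT. The parser recovers the structure equally well whichever code system fills the field; what the code means is not settled by the message format.

```python
def parse_segment(segment):
    """Split an HL7 v2-style segment into its ID and fields; each field is
    split into components on '^'. Simplified: no escapes, repetitions or
    MSH special cases."""
    fields = segment.split("|")
    return fields[0], [field.split("^") for field in fields[1:]]


def coded_value(segment):
    """Return OBX-5 (observation value) of a coded OBX segment as
    code, text and coding system."""
    segment_id, fields = parse_segment(segment)
    if segment_id != "OBX" or len(fields) < 5:
        raise ValueError("expected an OBX segment with at least 5 fields")
    code, text, system = (fields[4] + ["", "", ""])[:3]  # fields[4] is OBX-5
    return {"code": code, "text": text, "system": system}


snomed_coded = coded_value("OBX|1|CE|X1^Problem^LOCAL||HYPO-123^Abscess^SCT")
locally_coded = coded_value("OBX|1|CE|X1^Problem^LOCAL||A7^Abscess^LOCAL")
print(snomed_coded)   # {'code': 'HYPO-123', 'text': 'Abscess', 'system': 'SCT'}
print(locally_coded)  # {'code': 'A7', 'text': 'Abscess', 'system': 'LOCAL'}
```

Both messages are equally well formed; only shared content (a common terminology) would let EHR 2 know whether the two codes denote the same thing.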
link EHR 1 to EHR 2 through a snapshot of the patient’s condition which both systems can understand
[Diagram: EHR 1 linked to EHR 2 via a snapshot of the patient’s condition]
but how to formulate this snapshot? US: Continuity of Care Document (CCD)
merger of Continuity of Care Record (CCR) (XML-format message types) with HL7 Clinical Document Architecture (CDA)
CCD is able to solve the problem at best on a case-by-case basis; XML still provides only an algorithmically inaccessible blob; HL7 problems remain
CCD is hard to use, the needed mappings are hard to build, and there is no clear strategy to ensure general validity
in any case CDA/CCD will require content provided through (something like) SNOMED CT codes
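What a program can reliably use from such an XML snapshot is only what is coded, and codes mean something only relative to an external terminology identified by an OID; the narrative stays an uninterpreted string. A sketch using the standard library's `xml.etree.ElementTree` on a heavily simplified, CDA-like fragment (not a valid CCD; the code `HYPO-1` is invented, and the OID 2.16.840.1.113883.6.96 is used on the assumption that it identifies SNOMED CT):

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # CDA documents use the HL7 v3 namespace

# Heavily simplified, CDA-like fragment; NOT a valid CCD.
DOC = """<section xmlns="urn:hl7-org:v3">
  <text>Patient reports a painful swelling on the forearm.</text>
  <entry>
    <observation>
      <value code="HYPO-1" codeSystem="2.16.840.1.113883.6.96" displayName="Abscess"/>
    </observation>
  </entry>
</section>"""


def coded_values(xml_text):
    """(codeSystem, code, displayName) for every coded value element."""
    root = ET.fromstring(xml_text)
    return [(v.get("codeSystem"), v.get("code"), v.get("displayName"))
            for v in root.iterfind(".//hl7:value", NS)]


def narrative(xml_text):
    """The human-readable section text: to a program, just an opaque string."""
    root = ET.fromstring(xml_text)
    return root.findtext("hl7:text", default="", namespaces=NS)


print(coded_values(DOC))  # [('2.16.840.1.113883.6.96', 'HYPO-1', 'Abscess')]
print(narrative(DOC))
```

The coded triple is machine-usable only if the receiver shares the terminology behind the OID, which is the dependence on SNOMED CT (or something like it) that the slide points to.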
An example of OBO Foundry ontology content
Question: What is a disease?
SNOMED: Disease is_a Clinical Finding is_a SNOMED CT Concept
SNOMED Glossary 2010
Concept: An ambiguous term. Depending on the context, it may
refer to:
A clinical idea to which a unique ConceptID has been assigned.
The ConceptID itself
The real-world referent(s) of the ConceptID
Definitions of ‘disease’
A state of ill-health
A state or process of a person’s body or mind that tends to cause ill health in the bearer
Disease is a state of a person which issues in abnormal behavior
Failing to do what one ordinarily does because of obstruction or opposition
OGMS
Ontology for General Medical Science
http://code.google.com/p/ogms
Basic Formal Ontology (BFO)
PharmaOntology (W3C HCLS SIG)
MediCognos / Microsoft HealthVault
Major Histocompatibility Complex (MHC) Ontology (NIAID)
Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO)
ChemAxiom – Ontology for Chemistry
http://www.ifomis.org/bfo
Users of BFO
Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7)
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO)
ChemAxiom – Ontology for Chemistry
IDO Infectious Disease Ontology (NIAID)
National Cancer Institute Biomedical Grid Terminology (BiomedGT)
US Army Biometrics Ontology
US Army Command and Control Ontology
Dependence
[Diagram: types and their instances. Type organism, instance John; type temperature (a quality), instance John’s temperature; type life of an organism (a process), instance John’s life]
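The dependence relation can be sketched as a data model in which a quality instance cannot be constructed without a bearer. A minimal sketch, assuming Python dataclasses as the modelling device; the field names `inheres_in` and `has_participant` echo Relation Ontology relations, but the classes are illustrative and not a BFO implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Universal:
    """A type, such as 'organism' or 'temperature'."""
    label: str


@dataclass(frozen=True)
class Organism:
    """An independent continuant: it needs no bearer."""
    label: str
    instance_of: Universal


@dataclass(frozen=True)
class Quality:
    """A dependent continuant: the bearer is a required field."""
    label: str
    instance_of: Universal
    inheres_in: Organism


@dataclass(frozen=True)
class Process:
    """An occurrent, recorded with the organism that participates in it."""
    label: str
    instance_of: Universal
    has_participant: Organism


ORGANISM = Universal("organism")
TEMPERATURE = Universal("temperature")
LIFE = Universal("life of an organism")

john = Organism("John", ORGANISM)
johns_temperature = Quality("John's temperature", TEMPERATURE, inheres_in=john)
johns_life = Process("John's life", LIFE, has_participant=john)
```

Leaving out `inheres_in` raises a `TypeError`, the code-level analogue of "no temperature without something that has it".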
Disposition
- of a glass vase, to shatter if dropped
- of a human, to eat
- of a banana, to ripen
- of John, to lose hair
Physical Disorder
an independent continuant (part of the extended organism)
A causally linked combination of physical components that is clinically abnormal.
Clinically abnormal
◦(1) not part of the life plan for an organism of the relevant type (unlike aging or pregnancy),
◦(2) causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
◦(3) such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
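The three-part definition can be read as a predicate. A minimal sketch, assuming each condition can be summarized as a flag or a number: condition (2) is simplified to "added risk greater than zero", the slide leaves the threshold open so it is a parameter here, and the risk figures are illustrative placeholders, not clinical estimates.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    name: str
    part_of_life_plan: bool  # e.g. aging or pregnancy: fails condition (1)
    risk_increase: float     # hypothetical added risk of pain, illness, death or dysfunction


def clinically_abnormal(feature, threshold):
    """Conditions (1)-(3) from the slide, with `threshold` supplied by the caller."""
    not_life_plan = not feature.part_of_life_plan        # (1)
    elevated_risk = feature.risk_increase > 0            # (2), simplified
    above_threshold = feature.risk_increase > threshold  # (3)
    return not_life_plan and elevated_risk and above_threshold


# Illustrative numbers only, not clinical estimates.
pregnancy = BodilyFeature("pregnancy", part_of_life_plan=True, risk_increase=0.02)
baldness = BodilyFeature("baldness", part_of_life_plan=False, risk_increase=0.001)
necrotic_liver = BodilyFeature("necrotic liver", part_of_life_plan=False, risk_increase=0.5)

for f in (pregnancy, baldness, necrotic_liver):
    print(f.name, clinically_abnormal(f, threshold=0.01))
```

Pregnancy fails condition (1) and baldness fails condition (3), which is the contrast the footnote invites.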
Pathological Process
=def. A bodily process that is a manifestation of a disorder and is clinically abnormal.
Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
◦ produces Disorder - necrotic liver
◦ bears Disposition (disease) - cirrhosis
◦ realized_in Pathological process - abnormal tissue repair with cell
proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
◦ produces Abnormal bodily features
◦ recognized_as Symptoms - fatigue, anorexia Signs - jaundice, enlarged spleen
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
◦ produces Disorder - viable cells with influenza virus
◦ bears Disposition (disease) - flu
◦ realized_in Pathological process - acute inflammation
◦ produces Abnormal bodily features
◦ recognized_as Symptoms - weakness, dizziness Signs - fever
Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
Predisposition to Disease
=def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing some disease.
Huntington’s Disease – genetic (sure-fire)
Etiological process - inheritance of >39 CAG repeats in the HTT gene
◦ produces Disorder - chromosome 4 with abnormal mHTT
◦ bears Disposition (disease) - Huntington’s disease
◦ realized_in Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
◦ produces Abnormal bodily features
◦ recognized_as Symptoms - anxiety, depression Signs - difficulties in speaking and swallowing
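The worked examples share one chain of relations: produces, bears, realized_in, produces, recognized_as. A sketch that encodes one course of disease as (subject, relation, object) triples and follows them forward; the helper names `chain` and `downstream` and the node label "abnormal bodily features of flu" are assumptions made for illustration, and only the influenza content comes from the slides.

```python
def chain(etiology, disorder, disease, pathology, symptoms, signs):
    """Build (subject, relation, object) triples for one course of disease."""
    features = "abnormal bodily features of " + disease
    triples = [
        (etiology, "produces", disorder),
        (disorder, "bears", disease),
        (disease, "realized_in", pathology),
        (pathology, "produces", features),
    ]
    triples += [(features, "recognized_as", s) for s in symptoms + signs]
    return triples


def downstream(triples, start):
    """Everything reachable from `start` by following relations forward."""
    reached, frontier = [], [start]
    while frontier:
        node = frontier.pop(0)
        for subject, _relation, obj in triples:
            if subject == node and obj not in reached:
                reached.append(obj)
                frontier.append(obj)
    return reached


flu = chain(
    "infection of airway epithelial cells with influenza virus",
    "viable cells with influenza virus",
    "flu",
    "acute inflammation",
    symptoms=["weakness", "dizziness"],
    signs=["fever"],
)
print(downstream(flu, "flu"))
# ['acute inflammation', 'abnormal bodily features of flu', 'weakness', 'dizziness', 'fever']
```

The same two functions accept the cirrhosis or Huntington’s chains unchanged, which is the point of giving every disease the same relational skeleton.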
The problem of continuity of care
Patients move around
The opportunity of continuity of care
Patients move around
EHR – a new approach
Epic, Allscripts, Eclipsys ...
SNOMED CT
ICD
OpenEHR / CEN 13606
perhaps it doesn’t matter which one you choose – the key is to exploit the fact that patients move around