Part I: Biomedical Ontologies: A Critical Survey (Barry Smith)

Post on 19-Dec-2015


1

Part I: Biomedical Ontologies: A Critical Survey

Barry Smith

http://ontology.buffalo.edu/smith

2

I: Biomedical Ontologies: A Critical Survey

Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotations. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field.

II. The Ontology of Biomedical Reality (terminology)

Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations, for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts.

III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development

The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future.

3

Problem: how to reason with data deriving from different sources, each of which uses its own system of classification?

4

Solution:

Ontology !

5

Examples of current needs for ontologies in biomedicine

to enforce semantic consistency within a database

to enable data retrieval, sharing and re-use

to enable data integration (bridging across data at multiple granularities)

to allow querying

6

General trend

on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.

7

Old approach

gather terminologies in libraries

Unified Medical Language System

National Library of Medicine

8

SNOMED

DEMONS

U M L S

9

New Approach

MusicBeanz

10

http://www.w3.org/

11

Semantic Web deposits

Pet Profile Ontology

Review Vocabulary

Band Description Vocabulary

Musical Baton Vocabulary

MusicBrainz Metadata Vocabulary

Kissology

12

http://www.w3.org/

Beer Ontology

all instances of hops that have ever existed are necessarily ingredients of beer.

13

Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission

Many of these terminologies remain as torsos, gather dust, poison the wells, ...

14

OWL’s syntactic regimentation is not enough to ensure high-quality ontologies

– the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration

15

from Ontological Engineering

location =def. a spatial point identified by a name (p. 12)

arrivalPlace =def. a journey ends at a location (p. 13)

facet = def. ternary relation that holds between a frame, a slot, and the facet (p. 51)

an example of function is Pays, which obtains the price of a room after applying a discount (p. 13)

16

from Handbook of Ontology

On ‘achieving consistency from multiple sources’: if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94)

owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78)

17

SNOMED

DEMONS

U M L S

The Good, the Bad, and the UGLY

18

A methodology for quality-assurance of ontologies

tested thus far in the biomedical domain on:

FMA

GO + other OBO Ontologies

FuGO

SNOMED

UMLS Semantic Network

NCI Thesaurus

ICF (International Classification of Functioning, Disability and Health)

ISO Terminology Standards

HL7-RIM

19

The Good

Foundational Model of Anatomy (FMA)

Pro

clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule

Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning

Con

Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)

20

it’s better manually

[Figure: fragment of the FMA hierarchy around the pleura, drawn with is_a and part_of links. Nodes shown: Anatomical Structure, Organ, Organ Part, Organ Component, Organ Subdivision, Tissue, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Anatomical Space, Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Cavity, Interlobar recess]

22

The Foundational Model of Anatomy

Follows formal rules for ‘Aristotelian’ definitions

When A is_a B, the definition of ‘A’ takes the form:

an A =def. a B which ...

a human being =def. an animal which is rational

23

FMA Example

Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus

Plasma membrane =def. a cell part that surrounds the cytoplasm

24

The FMA regimentation

Each definition reflects the position in the hierarchy to which a defined term belongs.

The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.

The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
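A minimal sketch of how Aristotelian definitions of this kind could be held in a computer representation, so that each term inherits what is said of the terms above it. The dictionary layout and function names are illustrative assumptions, not the FMA's actual format; the definition of Cell is taken from the FMA example, while "nucleated cell" and its differentia are invented purely to give the chain a third level.

```python
# Each term records its genus (its is_a parent) and its differentia
# (what marks it out within that genus). Layout is an assumption.
TERMS = {
    # Treated as a primitive root here (no definition supplied).
    "anatomical structure": {"genus": None, "differentia": None},
    # From the FMA example on the slide.
    "cell": {
        "genus": "anatomical structure",
        "differentia": ("consists of cytoplasm surrounded by a plasma "
                        "membrane with or without a cell nucleus"),
    },
    # Hypothetical term, invented for illustration only.
    "nucleated cell": {"genus": "cell", "differentia": "has a cell nucleus"},
}


def article(phrase):
    """Pick 'a' or 'an' for a noun phrase."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def definition(term):
    """Render 'an A =def. a B which ...' (right-hand side only)."""
    genus = TERMS[term]["genus"]
    if genus is None:
        return None  # primitive term: nothing to render
    return f"{article(genus)} {genus} which {TERMS[term]['differentia']}"


def genus_chain(term):
    """All terms above `term` in the is_a hierarchy, nearest first."""
    chain = []
    current = TERMS[term]["genus"]
    while current is not None:
        chain.append(current)
        current = TERMS[current]["genus"]
    return chain


def inherited_differentiae(term):
    """Differentiae of `term` and of every term above it, nearest first."""
    collected = []
    current = term
    while current is not None:
        differentia = TERMS[current]["differentia"]
        if differentia is not None:
            collected.append(differentia)
        current = TERMS[current]["genus"]
    return collected


print(f"a cell =def. {definition('cell')}")
print(genus_chain("nucleated cell"))
print(inherited_differentiae("nucleated cell"))
```

Because every definition names its genus, walking the genus links is enough to recover both the hierarchy and the accumulated content of each definition.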

25

Principle

Use Aristotelian definitions

An A is a B which C’s.

26

Intermediate

GALEN

Pro

Allows formal representation of clinical information

Allows multiple views of relevant detail as needed

Uses powerful Description Logic (DL)-based formal structure

Makes definitions easy to formulate

Con

Remains only partially developed

Contains errors: Vomitus contains carrot – which DLs did not prevent

27

Principle

An ontology should not remain a torso

28

Principle

An ontology should have a properly personed help desk

29

Principle

An ontology should have procedures for up-dating in light of scientific advance

30

Intermediate

The Gene Ontology

Con

Poor formal architecture

Full of errors

menopause part_of death

Poor support for automatic reasoning and error-checking

Poor treatment of definitions

Not trans-granular

No relation to time or instances

31

The Gene Ontology

Pro

Open Source

Cross-Species

... has recognized the need for reform, including explicit representation of granular levels

32

Old GO Definitions

hemolysis =def. the causes of hemolysis

GO now adopting structured definitions which contain both genus and differentiae

Species =def Genus + Differentiae

neuron cell differentiation =def. differentiation by which a cell acquires features of a neuron

Ontology alignment

One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

Cell Types in GO                      Cell Types in the Cell Ontology
cone cell fate commitment             retinal_cone_cell
keratinocyte differentiation          keratinocyte
adipocyte differentiation             fat_cell
dendritic cell activation             dendritic_cell
lymphocyte proliferation              lymphocyte
T-cell homeostasis                    T_lymphocyte
garland cell differentiation          garland_cell
heterocyst cell differentiation       heterocyst

Alignment of the two ontologies will permit the generation of consistent and complete definitions

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
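The "GO + Cell type = New Definition" step can be sketched as follows. This is only an illustration under stated assumptions: the tiny parser handles just the tag-value lines of the stanza shown on the slide (it is not a full OBO parser), the sentence template is hypothetical, and the precursor cell names are passed in by hand from the slide's wording rather than looked up from the develops_from identifiers.

```python
import re

# The Cell Ontology stanza as shown on the slide.
STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict of lists (tags may repeat)."""
    record = {}
    for line in text.strip().splitlines():
        tag, _, value = line.partition(": ")
        record.setdefault(tag, []).append(value)
    return record


def genus_phrase(record):
    """First sentence of the quoted definition, lower-cased at the start."""
    quoted = re.match(r'"(.*)"', record["def"][0]).group(1)
    sentence = quoted.split(". ")[0].rstrip(".")
    return sentence[0].lower() + sentence[1:]


def article(phrase):
    """Pick 'a' or 'an' for a noun phrase."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def differentiation_definition(record, precursors):
    """Compose a process definition from a cell-type record.

    `precursors` are cell-type names supplied by the caller (hypothetical
    parameter; the slide does not say how they are obtained).
    """
    name = record["name"][0]
    sources = " or ".join(f"{article(p)} {p}" for p in precursors)
    return (f"{name.capitalize()} differentiation: Processes whereby "
            f"{sources} acquires the specialized features of "
            f"{article(name)} {name}, {genus_phrase(record)}.")


record = parse_stanza(STANZA)
print(differentiation_definition(
    record, ["osteoprogenitor cell", "cranial neural crest cell"]))
```

The point of the design is that the cell-type wording lives in one place (the Cell Ontology) and is reused, not retyped, whenever a GO process term mentions that cell type.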

36

Other Ontologies to be aligned with GO

Chemical ontologies: 3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies: metanephros development

37

Principle

Exploit existing ontologies when formulating definitions

38

The Bad

Reactome

Pro

Rich catalogue of biological processes

Con

Incoherent treatment of categories: ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event.

39

Principle

An ontology should be in agreement with the truths of basic science (e.g. that molecules are physical entities)

40

The Ugly

Disease Ontology / ICD-10

Other problems with special functions

Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals)

Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use

Other general medical examination for administrative purposes

Assault by other specified means

41

The Ugly

Disease Ontology / ICD-10

Other accidental submersion or drowning in water transport accident injuring other specified person

Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank

Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew

42

The Ugly

Disease Ontology / ICD-10

Normal pregnancy

Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered

Railway accident involving collision with rolling stock and injuring pedal cyclist

Injury due to war operations by lasers

Nontraffic accident involving motor-driven snow vehicle injuring pedestrian 

43

The Ugly

Disease Ontology / ICD-10

Donors of other specified organ or tissue

Fitting and adjustment of wheelchair

Hot (boiling) tap water

Training in use of lead dog for the blind

Person consulting on behalf of another person

44

Principle

An ontology should have a clearly specified domain (captured by its root node)

45

“Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention”

Olivier Bodenreider

Topographic regions: General terms

Physical anatomical entity

Anatomical spatial entity

Anatomical surface

Body regions

Topographic regions

46

Principle

Avoid cycles
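A check for this principle can be sketched as a depth-first search over the is_a links. The edges below are an assumption made for illustration: they simply chain the terms in the order the preceding slide lists them and close the loop, so the real UMLS links may run differently.

```python
def find_cycle(parents):
    """Return one is_a cycle as a list of terms (first == last), or None.

    `parents` maps each term to the list of terms it is_a.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in parents.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GREY:
                # Reached a term that is already on the current path.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Terms in the order given on the slide (edge direction is assumed).
TERMS = [
    "Topographic regions: General terms",
    "Physical anatomical entity",
    "Anatomical spatial entity",
    "Anatomical surface",
    "Body regions",
    "Topographic regions",
]
# Assumption: each term is_a the term listed before it ...
IS_A = {TERMS[i + 1]: [TERMS[i]] for i in range(len(TERMS) - 1)}
# ... and the first term is_a the last, which closes the loop.
IS_A[TERMS[0]] = [TERMS[-1]]

cycle = find_cycle(IS_A)
print(" is_a ".join(cycle) if cycle else "no cycle")
```

Running such a check before each release is cheap, and a hierarchy that passes it can at least be traversed by a reasoner without looping.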

47

MeSH

National Socialism is_a Political Systems

National Socialism is_a Anthropology ...

48

Principle

Use singular nouns

49

MeSH

National Socialism is_a MeSH Descriptor

50

Plant Ontology

cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ...

51

Principle

For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings

(Don’t use ‘cell’ when you mean ‘plant cell’)

52

ICNP: International Classification of Nursing Procedures

water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.

53

MORE UGLY

National Cancer Institute Thesaurus (NCIT)

54

The NCIT reflects a recognition of the need for high quality shared ontologies and terminologies, the use of which by clinical researchers in large communities can ensure re-usability of data collected by different research groups

55

NCIT

“a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research”

“exhibits ontology-like properties in its construction and use”.

56

Goals

to make use of current terminology “best practices” to relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes, including the support of automatic reasoning;

to speed the introduction of new concepts and new relationships in response to the emerging needs of basic researchers, clinical trials, information services and other users.

57

Formal Definitions

of 37,261 nodes, 33,720 were stipulated to be primitive in the DL sense

Thus only a small portion of the NCIT ontology (the remaining 3,541 nodes, roughly 9.5%) can be used for purposes of automatic classification and error-checking by using OWL.

58

Principle

Supply definitions wherever possible

(both human-understandable natural language definitions, and equivalent formal definitions)

59

Verbal Definitions

About half the NCIT terms are assigned verbal definitions

Unfortunately some are assigned more than one

60

Disease Progression

Definition1: Cancer that continues to grow or spread.

Definition2: Increase in the size of a tumor or spread of cancer in the body.

Definition3: The worsening of a disease over time. This concept is most often used for chronic and incurable diseases where the stage of the disease is an important determinant of therapy and prognosis.

61

Principle

Each term should have at most one definition*

*which may have both natural-language and formal versions

62

To make matters worse Disease Progression has as subclass:

Cancer Progression

Definition:

The worsening of a cancer over time. This concept is most often used for incurable cancers where the stage of the cancer is an important determinant of therapy and prognosis.

63

Cancer

a process (of getting better or worse)

an object (which can grow and spread)

64

Principle

Distinguish continuant entities (molecule, cell, tumor, organism) from occurrent entities (processes of growth, change, ...)

65

Two kinds of entities

occurrents (processes, events, happenings)

cell division, ovulation, death

continuants (objects, qualities, ...)

cell, ovum, organism, temperature of organism, ...

66

NCIT confuses definitions with descriptions

Tuberculosis

Definition: A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use.

68

A better definition

Tuberculosis

Definition:

A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis.

IS THIS CORRECT? (An infection is not a disease)

69

the use-mention confusion

Conceptual Entities =Def.

An organizational header for concepts representing mostly abstract entities.

Confuses use and mention (swimming is healthy and has eight letters)

70

Principle

Don’t confuse an entity with the name of an entity

71

Duratec, Lactobutyrin, Stilbene Aldehyde

are classified by the NCIT as Unclassified Drugs and Chemicals

72

Problematic synonyms

Anatomic Structure, System, or Substance ~ Anatomic Structures and Systems

Does ‘anatomic’ apply only to structure or also to system and substance?

Biological Function ~ Biological Process

some biological processes are the exercises of biological functions

others (e.g. pathological processes, side effects) not

Genetic Abnormality ~ Molecular Abnormality (with subtype: Molecular Genetic Abnormality) (definitions not supplied)

73

Three disjoint classes of plants

Vascular Plant

Non-vascular Plant

Other Plant

74

Three kinds of cells

Abnormal Cell is a top-level class (thus not subsumed by Cell)

Normal Cell is a subclass of Microanatomy.

Cell is a subclass of Other Anatomic Concept (so that cells themselves are concepts)

75

NCIT as now constituted will block automatic reasoning

Neither Normal Cells nor Abnormal Cells are Cells within the context of the NCIT
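To see why this blocks reasoning, consider a minimal sketch of a subsumption check over the three is_a links just described. Only the links stated on the slides are encoded; the parents of Microanatomy and of Other Anatomic Concept are not given there and are left out, and the function names are illustrative.

```python
# is_a links as described on the slides (nothing else is assumed).
IS_A = {
    "Abnormal Cell": [],                    # top-level class
    "Normal Cell": ["Microanatomy"],
    "Cell": ["Other Anatomic Concept"],
}


def ancestors(term, is_a):
    """All classes reachable from `term` by following is_a upwards."""
    seen = set()
    todo = list(is_a.get(term, ()))
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(is_a.get(parent, ()))
    return seen


def subsumed_by(term, candidate, is_a):
    """True if `term` is `candidate` or a (possibly indirect) subclass."""
    return term == candidate or candidate in ancestors(term, is_a)


for kind in ("Normal Cell", "Abnormal Cell"):
    print(kind, "is a Cell?", subsumed_by(kind, "Cell", IS_A))
```

A query for everything that is a Cell therefore returns neither normal nor abnormal cells, and any rule stated for Cell silently fails to apply to them.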

76

Some consolations

NCIT is open source

NCIT has broad coverage

NCIT has some formal structure (OWL-DL)

NCIT is much, much better than (for example) the HL7-RIM

NCIT has realized the errors of its ways

77

What might have been

http://www.cbd-net.com/index.php/search/show/938464

= “Review of NCI Thesaurus and Development of Plan to Achieve OBO Compliance”

78

Fragment of Pre-NCIT Hierarchy

Murine Tissue Type
    Body Fluids and Substances (MMHCC)
    Cardiovascular System (MMHCC)
        Blood Vessel (MMHCC)
        Heart (MMHCC)
    Digestive System (MMHCC)

Welcome to the Pre-NCIT: http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do

79

More UGLY

80

MeSH

MeSH Descriptors
    Index Medicus Descriptor
        Anthropology, Education, Sociology and Social Phenomena (MeSH Category)
            Social Sciences
                Political Systems
                    National Socialism

National Socialism is_a Political Systems

National Socialism is_a Anthropology ...

81

MeSH

National Socialism is_a MeSH Descriptors

The Bodenreider Defence:

MeSH is not an ontology

82

BIRNLex

83

BIRNLex

The eye =def. The eyeball and its constituent parts, e.g. retina

mouse =def. common name for the species mus musculus

84

BIRNLex

85

BIRNLex

86

Principle

Avoid circular definitions

(The term defined should not appear in its own definition)

87

The UMLS Semantic Network

88

More Ugly

UMLS Semantic Network

Pros

Broad coverage; no multiple inheritance

Cons

Incoherent use of ‘conceptual entities’

(e.g. the digestive system as a conceptual part of the organism)

Full of errors

89

UMLS Semantic Network

Edges in the graph represent merely “possible significant (= some-some) relations”:

Bacterium causes Experimental Model of Disease

Experimental Model of Disease affects Fungus

Experimental Model of Disease is_a Pathologic Function

90

UMLS Semantic Network

Unclear what the nodes of the graph are:

Drug Delivery Device contains Clinical Drug

Drug Delivery Device narrower_in_meaning_than Manufactured Object

The use-mention confusion: “Swimming is healthy and has 8 letters”

92

a pudding of ‘concepts’

93

location_of

Fungus location_of Vitamin

Tissue location_of Mental or Behavioral Dysfunction

94

Fungus location_of Vitamin

Every instance of vitamin is located in some fungus?

Some instances of vitamin are located in some fungi?

Some instances of fungi have instances of vitamin located in them?

Every instance of vitamin is located in every instance of fungus?
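The four readings can be told apart only at the level of instances. The sketch below spells them out as checks over a toy set of instances; the instance names and the single located-in fact are invented for illustration. Note that the second and third readings above are logically the same some-some claim, so one function covers both.

```python
def all_some(located_in, xs, ys):
    """Every x is located in some y."""
    return all(any((x, y) in located_in for y in ys) for x in xs)


def some_some(located_in, xs, ys):
    """Some x is located in some y (equivalently: some y contains some x)."""
    return any((x, y) in located_in for x in xs for y in ys)


def all_all(located_in, xs, ys):
    """Every x is located in every y."""
    return all((x, y) in located_in for x in xs for y in ys)


# Toy instance-level data (hypothetical): two portions of vitamin,
# two fungi, and one recorded location fact.
vitamins = {"vitamin_1", "vitamin_2"}
fungi = {"fungus_1", "fungus_2"}
located_in = {("vitamin_1", "fungus_1")}

print("all-some :", all_some(located_in, vitamins, fungi))
print("some-some:", some_some(located_in, vitamins, fungi))
print("all-all  :", all_all(located_in, vitamins, fungi))
```

On this data only the some-some reading holds, and that is the weakest of the three: it licenses almost no inference about any particular vitamin or fungus.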

95

what are the nodes in this graph?

96

98

NCIT inherits this ontological and terminological incoherence from source vocabularies in UMLS

Conceptual Entities =def

An organizational header for concepts representing mostly abstract entities.

Includes as subtypes:

action, change, color, death, event, fluid, injection, temperature

99

The UMLS

Unified Medical Language System

Metathesaurus

Semantic Network (SN)

100

BIRNLex and UMLS-SN

Rest =SN Daily or Recreational Activity

Principal Investigator =SN Professional or Occupational Group

Left handedness =SN Organism Attribute

Ambidextrous =SN Finding

Brain Imaging =SN Diagnostic Procedure

Brain Mapping =SN Diagnostic Procedure & Research Activity

Healthy Adult =SN Finding

101

To build a high quality shared ontology requires hard work and staying power

You cannot cheat by borrowing from UMLS

UMLS (= the UMLS Metathesaurus) is not an ontology

102

is_a (sensu UMLS)

A is_a B =def

‘A’ is narrower in meaning than ‘B’

grows out of the heritage of dictionaries, which reflect meanings, not biological reality

103

Concepts, Concept Names, and their Identifiers in the UMLS

The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies.

104

The desperate search for ‘mappings’

A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).

105

The desperate search for ‘mappings’

This is not an exact science. ... Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.

106

These strange mappings between names as they appear in different source vocabularies created for widely different purposes can still be very useful

but the source vocabularies themselves are of variable quality (not all mappings are created equal)

and the sorts of search which the UMLS supports reflect an already outmoded technology

107

is_a (sensu UMLS)

congenital absent nipple is_a nipple

surgical procedure not carried out because of patient’s decision is_a surgical procedure

cancer documentation is_a cancer

disease prevention is_a disease

living subject is_a information object representing an animal or complex organism

individual allele is_a act of observation

limb is_a tissue

108

is_a (sensu UMLS)

both testes is_a testis

plant leaves is_a plant

smoking is_a individual behavior

walking is_a social behavior

109

Advantages of the methodology of shared coherently defined ontologies

once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste.

110

is_a (sensu UMLS)

A is_a B =def

‘A’ is narrower in meaning than ‘B’

grows out of the heritage of dictionaries

(which ignore the basic distinction between universals and instances)

111

The really ugly

112

113

HL7 Marketing

HL7 V3 claims to be:

“The foundation of healthcare interoperability”

“The data standard for biomedical informatics”

from blood banks to Electronic Health Records to clinical genomics

114

HL7 Incredibly Successful

adopted by Oracle as basis for its Electronic Health Record technology; supported by IBM, GE, Sun ...

embraced as US federal standard

central part of $35 billion program to integrate all UK hospital information systems

115

Problem V3 of HL7 is designed to address

in HL7 V2 the realization of the messaging task allows ad hoc interpretations of the standard by each sending or receiving institution.

Result: vendor products never properly interoperable, and always require mapping software.

116

The solution to this problem (V3) is the HL7 RIM

or Reference Information Model

= a world standard for exchange of information between clinical information systems

117

The V3 solution

Remove optionality by having the RIM serve as a master model of all health information, from blood banks to Electronic Health Records to clinical genomics

118

The hype

“HL7 V3 is the standard of choice for countries and their initiatives to create national EHR and EHR data exchange standards as it provides a level of semantic interoperability unavailable with previous versions and other standards. Significant V3 national implementations exist in many countries, e.g. in the UK (e.g. the English NHS), the Netherlands, Canada, Mexico, Germany and Croatia.”

119

The reality (I asked them)

“None of the implementations have a national scope” (e.g. Stockholm City Council)

The paradigm Dutch national HL7 V3 EHR implementation uses HL7 technology exclusively for exchanging data (i.e. messaging). The EHR architectures themselves are HL7-free.

120

The Oracle Healthcare Transaction Base (HTB)

Oracle itself refers (April 2006) to three implementations of HTB described as being 'live for EHR projects':

1) Byrraju Foundation (BSRF) in India (Live)

2) Stockholm County (planned to go live by May 2006)

3) Louisiana (planned to go live by May 2006)

121

Regarding the Byrraju case, I am told that there is no V3 application running in India today and that the Byrraju Foundation is presently not using any telemedicine application that utilizes HL7.

As to the Stockholm case, the HTB was purchased and deployed in late 2004. An attempt to port a pilot system was made during the spring of 2005. This attempt was abandoned, as I understand from my Swedish colleagues, partly because of poor performance (the new application performed significantly less well than the system it was designed to replace, even though it was being run on considerably more expensive hardware), and partly because of a lack of fault tolerance, which made it inadequate as a mechanism for integrating legacy systems marked by a high degree of variation in data quality. During the spring of 2006, it seems, an attempt will be made to construct a new pilot application, this time with the more modest goal of handling referrals.

122

The hype

The RIM is “credible, clear, comprehensive, concise, and consistent”

It is “universally applicable” and “extremely stable”

123

The reality

• HL7 V3 documentation is 542,458 KB, divided into 7,573 files

• It remains subject to frequent revisions

• It is very difficult to understand

124

The reality

The decision to adopt the RIM was made already in 1996, yet the promised benefits of interoperability still, after 10 years, remain elusive.

HL7 has bet the farm on the RIM – technology has advanced in these 10 years

125

RIM NORMATIVE CONTENT

126

to design a message, choose from here

127

Too many combinations

as the traffic on HL7’s own vocabulary mailing list reveals, there is no adequate mechanism for ensuring that the vast number of combinations of coded terms within actual messages can be controlled in such a way that messages will be understood in the same way by designers, senders and receivers.

128

129

These pre-defined attributes

code, class_code, mood_code,

status_code, etc.

yield a combinatorial explosion:

class_code (61 values) x mood_code (13 values) x code (estimate 200) x status_code (10 codes) = 1.58 million combinations.

Adding in the other codes this becomes 810 billion.
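The first figure is a plain product of the value counts quoted above, as this small check shows. The count of 200 for `code` is the slide's own estimate, and the "other codes" behind the 810 billion figure are not itemised on the slide, so they are not reproduced here.

```python
from math import prod

# Number of permitted values per attribute, as quoted on the slide.
VALUE_COUNTS = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,        # the slide's estimate
    "status_code": 10,
}

# Every attribute can vary independently, so the counts multiply.
combinations = prod(VALUE_COUNTS.values())
print(f"{combinations:,} combinations")
```

This gives 1,586,000, which the slide quotes as 1.58 million.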

130

Why does the RIM embody so many combinations?

To ensure in advance that everything can be said in conformity to the standard

131

The RIM methodology

defines a set of ‘normative’ classes (Act, Role, and so on), with which are associated a rich stock of attributes from which one must make a selection when applying the RIM to each new domain (pharmacy, clinical genomics ...).

Compare: attempting to create manufacturing software by drawing from a store containing pre-established parts (so that the store would need to have the bits needed for making every conceivable manufacturable thing, be it a lawnmower, a refrigerator, a hunting bow, and so on).

132

The RIM methodology

are there examples where a methodology of this sort has been made to work? Does the RIM yield a coherent basis for constructing well-designed software artifacts for functions like the EHR or computerized decision support?

133

This methodology does not impede the formation of local dialects

Different teams produce different message designs for the very same topic.

In the UK, the £ 35 bn. NHS National Program “Connecting for Health” has applied the RIM rigorously, using all the normative elements, and it discovered that it needed to create dialects of its own to make the V3-based system work for its purposes (it still does not work)

134

The RIM documentation

• is subject to multiple and systematic internal inconsistencies and unclarities

• is marked by sloppy and unexplained use of terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’, ‘ActClass’, ‘Act-instance’, ‘Act-object’

• and uncertain cross-referencing to other HL7 documents

• no publicly available teaching materials (no HL7 for Dummies)

135

from HL7 email forum (do not circulate)

“I am ... frightened when I contemplate the number of potential V3ers who ... simply are turned away by the difficulty of accessing the product.

  “Some of them attend V3 tutorials which explain V3 as the hugely complex process of creating a message and are turned off. [They] simply do not have the stamina, patience, endurance, time, or brain-cells to understand enough for them to feel comfortable contributing to debates / listserves, etc., so they remain silent.”

136

Problems of scope

Only two main classes in the RIM

Act = roughly: intentional action

Entity = persons, places, organizations, material

How can the RIM deal transparently with information about, say, disease processes, drug interactions, wounds, accidents, bodily organs, documents?

137

Diseases in the RIM

... are not Acts

... are not Entities

... are not Roles, Participations ...

So what are they?

At best: a case of pneumonia is identified as the Act of Observation of a case of pneumonia

Note: RIM’s treatment of SNOMED codes

138

HL7 Clinical Document Architecture

defines a document as an Act

HL7’s Clinical Genomics Standard Specifications

defines an individual allele as an Act of Observation

139

Why the centrality of ‘Act’?

because of HL7’s roots in US hospital messaging – and thus in US hospital billing:

intentional actions are what can be billed

140

Mayo RIM discussion of the meaning of ‘Act’ as “intentional action”

Is a snake bite or bee sting an "intentional action"?

Is a knife stabbing an intentional action?

Is a car accident an intentional action?

When a child swallows the contents of a bottle of poison is that an intentional action?

141

The RIM has no coherent criteria for deciding

For this reason, too, dialects are formed – and the RIM does not do its job. One health information system might conceive snakebites and gunshots as Procedures. Another might classify them with diseases, and so treat them as Observations.

If basic categories cannot be agreed upon for common phenomena like snakebites, then the RIM is in serious trouble.

142

Are definitions like this a good basis for achieving semantic interoperability in the biomedical domain?:

LivingSubject Definition: A subtype of Entity representing an organism or complex animal, alive or not.

143

Person (from HL7 Glossary)

Definition: A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents

144

The Problem of Circularity

A Person =def. A person with documents

‘An A is an A which is B’

– useless in practical terms, since neither we nor the machine can use it to find out what ‘A’ means

– incorporates a vicious infinite regress

– has the effect of making it impossible to refer to A’s which are not Bs, for example to undocumented persons
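The circularity problem can be illustrated with a very rough lexical check. This is only a sketch: the function name `is_circular` is hypothetical, it tests only whether the defined term reappears as a whole word in its own definition, and it would miss circularity that runs through a chain of several definitions.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the defined term occurs as a whole word in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The slide's paraphrase of the glossary definition: 'A Person =def. A person with documents'
circular = is_circular("person", "A person with documents")

# A definition that does not reuse the defined term
not_circular = is_circular("person", "A single human being")
```

A check of this kind catches only the ‘An A is an A which is B’ pattern in its most literal form; it says nothing about whether the definition is otherwise adequate.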

145

Katrina

146

Katrina

147

What is the RIM about?

blood pressure measurement = an information item

blood pressure = something in reality which exists independently of any recording of information, and which the measurement measures

Q: Is the RIM about information, or about the reality to which such information relates? A: There is no difference between the two

148

RIM Philosophy

“The truth about the real world is constructed through a combination and arbitration of attributed statements ...

“As such, there is no distinction between an activity and its documentation.”

149

The RIM as an Information Model

‘a static (UML) model of health and health care information’

The scope of the RIM’s class hierarchy consists in packets of information:

the information content of invoices, statements of observations, lab reports, …

150

A good, general constraint on a theory of meaning

For each linguistic expression ‘E’

‘E’ means E

‘snow’ means snow

‘pneumonia’ means pneumonia

151

From the perspective of the RIM on the Information Model conception

‘medication’ does not mean: medication

rather it means:

the record of medication in an information system

‘stopping a medication’ does not mean: stopping a medication

rather it means: change of state in the record of a Substance Administration Act from Active to Aborted

152

The RIM’s Entity class

persons, places, organizations, material

153

States of Entity

• active: The state representing the fact that the Entity is currently active.

• nullified: The state representing the termination of an Entity instance that was created in error.

• inactive: The state representing the fact that an entity can no longer be an active participant in events.

• normal: The “typical” state. Excludes “nullified”, which represents the termination state of an Entity instance that was created in error.

154

Persons are Entities

What do ‘active’ and ‘nullified’ mean as applied to Person?

Is there a special kind of death-through-nullification in the case of those instances of Person who were created in error?
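One way to see why the question arises is to keep the person and the record of the person as separate types, so that states such as ‘nullified’ attach only to the record. The following is a minimal sketch under that assumption; the names `Person`, `PersonRecord`, `RecordState` and `nullify` are hypothetical and are not taken from the RIM.

```python
from dataclasses import dataclass
from enum import Enum


class RecordState(Enum):
    # State names borrowed from the slide; here they apply to records only.
    ACTIVE = "active"
    INACTIVE = "inactive"
    NULLIFIED = "nullified"


@dataclass
class Person:
    """A human being in reality; has no record-keeping state."""
    name: str
    alive: bool = True


@dataclass
class PersonRecord:
    """An information item about a person; only records can be nullified."""
    about: Person
    state: RecordState = RecordState.ACTIVE

    def nullify(self) -> None:
        # Withdraws a record that was created in error; the person is untouched.
        self.state = RecordState.NULLIFIED


mary = Person("Mary Jones")
duplicate = PersonRecord(about=mary)
duplicate.nullify()
```

On this reading, nullifying the duplicate record changes nothing about Mary Jones herself, and no ‘death-through-nullification’ is needed.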

155

HL7 Glossary

Definition of Animal: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.

An Animal is not an animal. Rather (an) Animal represents an animal: it is an information item which represents a certain highly specific kind of animal-of-interest, namely an animal that is of interest to the Personnel Management domain.

156

Double Standards

The RIM is a confusion of two separate artifacts:

1. an “information model”, relating to names of persons, records of observations, social security numbers, etc.

2. a reference ontology, relating to persons, observations, documents, acts, etc.

157

The examples provided to illustrate the RIM’s classes

are almost always in conformity with the Reference Ontology Conception of the RIM

They involve the familiar kinds of things and processes in reality (medication, patients, devices, paper documents, surgery, diet, supply of bedding) with which healthcare messages are concerned.

158

HL7 Glossary:

Instances of Person include: John Smith, RN, Mary Jones, MD, etc.

not: information about John Smith ...

159

Some of the RIM’s definitions are in conformity with the

Information Model Conception

160

Definition of Act:

A record of something that is being done, has been done, can be done, or is intended or requested to be done

An Act is the record of an Act

“There is no difference between an activity and its documentation”

HL7’s backbone ‘Act’ class

161

Acts are records: but the examples of Act given by the RIM are as follows:

“The kinds of acts that are common in health care are (1) a clinical observation, (2) an assessment of health condition (such as problems and diagnoses), (3) healthcare goals, (4) treatment services (such as medication, surgery, physical and psychological therapy), ...”

162

The class Procedure (a subclass of Act)

Definition of Procedure: An Act whose immediate and primary outcome (post-condition) is the alteration of the physical condition of the subject

Examples:

chiropractic treatment, acupuncture, straightening rivers, draining swamps.

163

What is an information model?

Is it a model of entities in reality (an ontology)?

Or of information about entities in reality (an ontology)?

The RIM is an incoherent mixture of the two

Does this matter?

164

What’s gone wrong? 

People of good will are making mistakes because of insufficient concern for clarity and consistency

Even large ontologies are built in the spirit of the amateur hobbyist

Money is wasted on megasystems that cannot be used

165

Lessons for Semantic Interoperability

Clear and easily accessible documentation – based on an intuitive ontology (understandable to all classes of users)

Business model should be such that those responsible for creating documentation do not have an incentive for it to be unclear

Centralized control of documentation, to ensure consistency (too much democracy is a bad thing)

166

Lessons for Standards for Semantic Interoperability

Create standards on the basis of thorough pilot testing

(Avoid systems like the RIM, which is imposed from the top down, on a wing and a prayer)

167

What should take the place of the RIM?

1. A Reference Ontology of the types of biomedical entity such as thing, process, person, disease, infection, molecule, procedure, etc.

2. A Reference Ontology of the types of biomedical information entity such as message, document, record, image, diagnosis, interpretation, etc.

The first provides a high-level framework in terms of which the lower-level types captured in vocabularies like SNOMED CT could be coherently organized.

The second helps to specify how information can be combined into meaningful units and used for further processing.
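The division of labour between the two reference ontologies can be sketched in a few lines. This is an illustration only, not a proposal for how such ontologies would actually be encoded: the class names `DiseaseCase` and `Diagnosis`, the `about` field, the helper `diagnoses_of` and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiseaseCase:
    """Reality side: a particular case of a disease in a particular patient."""
    disease_type: str
    patient: str


@dataclass(frozen=True)
class Diagnosis:
    """Information side: a statement that is about a disease case."""
    about: DiseaseCase
    author: str


def diagnoses_of(case: DiseaseCase, diagnoses: List[Diagnosis]) -> List[Diagnosis]:
    """Collect the information entities that are about the given case."""
    return [d for d in diagnoses if d.about == case]


diagnosed = DiseaseCase("pneumonia", "patient-1")
undiagnosed = DiseaseCase("pneumonia", "patient-2")
records = [Diagnosis(about=diagnosed, author="clinician-A")]
```

Because the disease case and the diagnosis are distinct kinds of thing, a case of pneumonia can exist with no diagnosis recorded for it, which is what identifying the case with an Act of Observation rules out.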