A.I. in health informatics - Tufts University
lecture 5: standards & ontologies
kevin small & byron wallace
* Slides reuse material from Kleinsorge, Willis, and Emrick; 2007.


today

•  standards –  facilitates interoperability if well designed –  stifles creativity if poorly designed

•  ontologies – standardization of knowledge

•  UMLS

standards

•  set of rules and definitions regarding completion of a process

•  permits disassociated entities to cooperate

•  important for medicine

standards

•  required when excessive diversity impedes effectiveness

•  standards can impede innovation

•  effective and timely standards can also focus innovation

health care standards

•  portability of patient records –  accuracy –  security, privacy

•  uniformity in billing

•  standardization of clinical information

•  electronic communication standards

origination of standards

•  ad hoc – mutual agreement of participating entities

•  de facto –  incidentally generated by dominant entity

•  government mandate –  HCFA UB92 insurance-claim form

•  consensus –  open process by interested parties

developing standards

•  identification stage –  need and technological maturity

•  conceptualization stage –  purpose, scope, format, etc.

•  discussion stage –  identification of critical issues, timelines

•  early implementation

•  conformance and certification

the fruit of standards

…and

•  ANSI, CEN, ISO, ASTM

•  Health Care Informatics Standards Board (HISB)
   – electronic health care records
   – data exchange
   – health care codes and terminology
   – communication with devices and instrumentation
   – knowledge and model representation
   – privacy, confidentiality, security

•  HIMSS, CPRI, IHE, NQF, WEDI

why do we care?

•  in many (most) ways, we (I) don’t

•  standardized information is easier to reason with (less ambiguity) – ontologies

•  research science should have a role in determining standards

ontologies

“An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretation of terms”

- Uschold et al., 1998

ontologies

[Figure: Burgun et al., 2001]

UMLS semantic network

[Figure: portion of the UMLS semantic network; Kleinsorge et al., 2007]

Semantic types shown: Organism; Anatomical Structure; Embryonic Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Fully Formed Anatomical Structure; Body System; Body Part, Organ or Organ Component; Tissue; Cell; Cell Component; Gene or Genome; Body Space or Junction; Body Location or Region; Body Substance; Organism Attribute; Finding; Laboratory or Test Result; Sign or Symptom; Biologic Function; Physiologic Function; Pathologic Function; Injury or Poisoning

Relationships shown: part of; conceptual part of; property of; process of; contains; produces; evaluation of; adjacent to; location of; disrupts; co-occurs with

ontologies

•  frames knowledge within a domain –  structured representation – universal encoding

•  reduces ambiguity

gene
   – “the coding region of DNA”
   – “DNA fragment that can be transcribed and translated into a protein”
   – “DNA region of biological interest with a name and that carries a genetic trait or phenotype”

ontologies

•  scientific knowledge desires precision – decomposes into entities and relationships –  logical reasoning

•  allows storing information in databases
   – requires hand encoding or information extraction from natural language
   – efficient storage of volumes of knowledge
   – human/machine interface

ontological components

•  concepts – entities within a domain

•  relations –  interactions between concepts

•  instances – named entities

•  axioms – constraints between (named) entities

conceptualization

concrete
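
As a rough illustration of the four components listed on this slide, the following Python sketch encodes a tiny ontology. The `Ontology` class, its field names, the example entities, and the single axiom are all hypothetical, chosen only for illustration; they are not taken from any ontology toolkit or from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)     # entities within a domain
    relations: set = field(default_factory=set)    # (subject, relation, object) triples
    instances: dict = field(default_factory=dict)  # named entity -> concept
    axioms: list = field(default_factory=list)     # callables: True when satisfied

    def check_axioms(self):
        """Return True only if every axiom holds for this ontology."""
        return all(axiom(self) for axiom in self.axioms)


onto = Ontology()
onto.concepts |= {"Protein", "Enzyme"}
onto.relations.add(("Enzyme", "is_a", "Protein"))
onto.instances["trypsin"] = "Enzyme"

# Illustrative axiom: every instance must belong to a declared concept.
onto.axioms.append(
    lambda o: all(concept in o.concepts for concept in o.instances.values())
)

print(onto.check_axioms())  # True
```

Adding an instance whose concept has not been declared makes `check_axioms()` return `False`, which is the sense in which axioms constrain the named entities.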

concepts

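•  primitive concepts – necessary conditions only (e.g., a globular protein has a hydrophobic core)

•  defined concepts – necessary and sufficient conditions (e.g., a eukaryotic cell is a cell containing a nucleus)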

relations

•  taxonomic – specialization (“is a kind of”) – partitive (“is a component of”)

•  associative – nominative –  locative – causative – many others…
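
One practical consequence of taxonomic ("is a kind of") relations is that they can be followed transitively. The sketch below is a minimal illustration, assuming a hypothetical child-to-parents dictionary named `IS_A`; the specific links are invented for the example and are not asserted by any source vocabulary.

```python
def ancestors(concept, is_a):
    """Return every concept reachable from `concept` by following is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical specialization links: child -> list of parents.
IS_A = {
    "Addison's Disease": ["Adrenal Gland Diseases"],
    "Adrenal Gland Diseases": ["Endocrine Diseases"],
    "Endocrine Diseases": ["Diseases"],
}

print(sorted(ancestors("Addison's Disease", IS_A)))
# ['Adrenal Gland Diseases', 'Diseases', 'Endocrine Diseases']
```

Associative relations (locative, causative, and so on) generally do not license this kind of transitive inference, so they would be stored and queried separately.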

ontology use

•  domain-oriented – encodes domain knowledge

•  task-oriented – encodes methods of completing tasks

•  generic – open frameworks (e.g., Cyc)

ontology benefits

•  community reference

•  schema specification

•  ontology-based search

•  input to NLP systems

ontology life-cycle

[Figure: ontology life-cycle; Stevens et al., 2000]

knowledge representation

•  natural-language vocabularies – hand-crafted tree inheritance structures

•  frame-based systems – similar to object-based modeling

•  description logics – primacy of relationship inference

evaluating ontologies

•  expressivity – can the domain be encoded?

•  rigor – satisfiable and consistent

•  semantics – captures intended meaning?

Unified Medical Language System (UMLS®)

•  more than just an ontology

•  ostensibly rich knowledge source for AI research

•  http://umls.nlm.nih.gov

“The UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.

You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries and language translators.”

NLM strategy

•  designating U.S. standards – starting point and maintenance

•  coordinate development of standards into interlocking set – broaden participation – promote usage

standards

•  CHI (clinical): LOINC
   – e.g., lab test results, problems, diagnoses, history, physical
   – electronic exchange of clinical health information in U.S. Government systems

•  HIPAA (administrative): CPT
   – e.g., health insurance claims, billing, ordering
   – HIPAA Administrative Simplification provisions
   – designated DHHS national standards for electronic healthcare transactions

•  PHIN (public health): ICD-9-CM
   – e.g., disease surveillance, immunization rates, environmental monitoring
   – CDC designated standards for public health reporting

UMLS objectives

•  intellectual middleware

•  for developers (not end users)

•  knowledge sources to ameliorate:
   – disparities in language and format (e.g., atrial fibrillation, auricular fibrillation, af)
   – disparities in granularity and perspective (e.g., contusions, hematoma, bruise)

knowledge sources

3 knowledge sources (used separately or together)

•  Metathesaurus – 1 million+ biomedical concepts from over 100 sources

•  Semantic Network – 135 broad categories and 54 relationships between categories

•  SPECIALIST Lexicon & Tools – lexical information and programs for language processing

Metathesaurus

•  very large •  multipurpose •  multi-lingual

•  information regarding – biomedical/health concepts – names and associated codes – relationships amongst concepts

source vocabularies

•  derived from clinical, research, administrative, public health, etc.

•  valid values – thesauri: MeSH, CRISP, NCI – statistical classifications: ICD-9-CM – billing codes: CPT, ABC codes – clinical coding: SNOMED CT

source vocabularies

•  intended to coordinate, not derive a single vocabulary
   – Diagnosis/signs and symptoms: ICD9CM, ICD10, ICD10CM, ICD10AM, ICD-O, ICPC, ICF, SNOMED CT, Read Codes, MedDRA, MEDCIN, DSM
   – Procedures: CPT, CDT, HCPCS, OCPS, SNOMED CT, ICD9CM, ICD10-PCS
   – Nursing: NANDA, NIC, NOC, OMS, HHC
   – Diagnostic tests: LOINC, UltraSTAR
   – Drugs: VANDF, NDC, RXNORM, NDDF
   – Medical devices: SPN, UMD
   – Genomics: GO, HUGO, NCBI Taxonomy

term clusters

•  concepts contain synonymous terms

•  preferred term is indicated
   – unique identifier (CUI) is assigned

C0001403   Addison’s disease

term                                       source          term type   source ID
Addison’s disease                          Metathesaurus   PN
Addison’s disease                          SNOMED CT       PT          363732003
Addison’s Disease                          MedlinePlus     PT          T1233
Addison Disease                            MeSH            PT          D000224
Bronzed disease                            SNOMED Intl     SY          DB-70620
Primary Adrenal Insufficiency              MeSH            EN          D000224
Primary hypoadrenalism syndrome, Addison   MedDRA          LT          10036696
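
The Addison’s disease cluster on this slide can be held in a small lookup structure. The following is a minimal sketch, not the Metathesaurus file format: the `Atom` tuple, the `CLUSTER` dictionary, and both helper functions are hypothetical names, and treating the `PN` row as the preferred name is an assumption based on the slide.

```python
from collections import namedtuple

Atom = namedtuple("Atom", "term source term_type source_id")

# Rows transcribed from the slide; source_id is None where the slide shows none.
CLUSTER = {
    "C0001403": [
        Atom("Addison's disease", "Metathesaurus", "PN", None),
        Atom("Addison's disease", "SNOMED CT", "PT", "363732003"),
        Atom("Addison's Disease", "MedlinePlus", "PT", "T1233"),
        Atom("Addison Disease", "MeSH", "PT", "D000224"),
        Atom("Bronzed disease", "SNOMED Intl", "SY", "DB-70620"),
        Atom("Primary Adrenal Insufficiency", "MeSH", "EN", "D000224"),
        Atom("Primary hypoadrenalism syndrome, Addison", "MedDRA", "LT", "10036696"),
    ]
}


def lookup_cui(term, clusters):
    """Return the CUIs whose cluster contains `term` (case-insensitive)."""
    wanted = term.casefold()
    return [
        cui
        for cui, atoms in clusters.items()
        if any(atom.term.casefold() == wanted for atom in atoms)
    ]


def preferred_name(cui, clusters):
    """Return the term whose type is PN, assumed here to mark the preferred name."""
    return next(atom.term for atom in clusters[cui] if atom.term_type == "PN")


print(lookup_cui("bronzed disease", CLUSTER))   # ['C0001403']
print(preferred_name("C0001403", CLUSTER))      # Addison's disease
```

Whatever surface form a source uses, the lookup resolves to the same identifier, which is what allows records coded in different vocabularies to be compared.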

concept organization

Concept C0001621

[…]

   Term L0001621
      S0011231 Adrenal Gland Disease
         A0020266 MeSH
         A7568579 NCI Thesaurus
      S0000441 Disease of adrenal gland
         A0001264 SNOMED 1982
         A6917004 SNOMED Clinical Terms
      S0481705 Diseases of Adrenal Gland
         A0014499 SNOMED 1982
      S0220090 Diseases, adrenal gland
         A0049924 MeSH

   Term L0181041
      S0632950 Disorder of adrenal gland
         A0688820 Read Codes
         A4778687 SNOMED Clinical Terms
      S0354509 Adrenal Gland Disorders
         A6996540 MedlinePlus
         A7576253 NCI Thesaurus
         A7561794 Psychological Index Terms

   Term L1279026
      S1520972 Nebennierenkrankheiten
         A7500884

concepts

•  ~1.5M CUI – synonymous sets

•  ~5.5M LUI – normalized names

•  ~6.1M SUI – concept strings

•  ~7.4M AUI – source-specific

(2007 numbers)

C0018681
   L0018681
      S0046854
         A0066000 Headache (MeSH)
         A0065992 Headache (ICD-10)
      S0046855
         A0066007 Headaches (MedDRA)
         A12003304 Headaches (OMIM)
   L0380797
      S0475647
         A0540936 Cephalodynia (MeSH)
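
The four identifier levels nest: a concept (CUI) groups normalized names (LUI), each of which groups strings (SUI), each of which groups source-specific atoms (AUI). The sketch below encodes the headache example from this slide as nested dictionaries. The assignment of each SUI to a LUI is read off the slide figure and should be treated as an assumption, and `count_identifiers` is a hypothetical helper.

```python
# CUI -> LUI -> SUI -> list of (AUI, string, source)
CONCEPT = {
    "C0018681": {
        "L0018681": {
            "S0046854": [
                ("A0066000", "Headache", "MeSH"),
                ("A0065992", "Headache", "ICD-10"),
            ],
            "S0046855": [
                ("A0066007", "Headaches", "MedDRA"),
                ("A12003304", "Headaches", "OMIM"),
            ],
        },
        "L0380797": {
            "S0475647": [("A0540936", "Cephalodynia", "MeSH")],
        },
    }
}


def count_identifiers(concepts):
    """Count CUIs, LUIs, SUIs and AUIs in a nested concept structure."""
    counts = {"CUI": 0, "LUI": 0, "SUI": 0, "AUI": 0}
    for terms in concepts.values():
        counts["CUI"] += 1
        for strings in terms.values():
            counts["LUI"] += 1
            for atoms in strings.values():
                counts["SUI"] += 1
                counts["AUI"] += len(atoms)
    return counts


print(count_identifiers(CONCEPT))
# {'CUI': 1, 'LUI': 2, 'SUI': 3, 'AUI': 5}
```

The counts grow at each level, which mirrors the ordering of the totals quoted on the slide (fewest CUIs, most AUIs).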

concept categories

•  high-level – semantic types

•  NLM derived – source independent

[Figure: the concepts Diseases, Endocrine Diseases, Adrenal Gland Diseases, Adrenal Gland Hypofunction, and Addison’s Disease, shown with the semantic type Disease or Syndrome]

concept relationships

•  symbolic relationships – ~8M pairs of concepts

•  co-occurrence relationships – ~6M pairs of concepts

•  mapping relationships – ~150k mappings

symbolic relationships

•  Hierarchical
   – Parent / Child (PAR/CHD)
   – Broader / Narrower than (RB/RN)

•  Derived from hierarchies
   – Siblings (children of parents) (SIB)

•  Associative
   – Other (RO)

•  Various flavors of near-synonymy
   – Similar (RL)
   – Source asserted synonymy (SY)
   – Possible synonymy (RQ)
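
Sibling relationships are described on this slide as derived from hierarchies, that is, as children sharing a parent. A minimal sketch of that derivation follows. The `REL_LABELS` table restates the abbreviations from the slide; the `PARENTS` data and the `derive_siblings` function are hypothetical and purely illustrative.

```python
from itertools import combinations

# Relationship abbreviations as listed on the slide.
REL_LABELS = {
    "PAR": "parent",
    "CHD": "child",
    "RB": "broader than",
    "RN": "narrower than",
    "SIB": "sibling",
    "RO": "other",
    "RL": "similar",
    "SY": "source asserted synonymy",
    "RQ": "possible synonymy",
}


def derive_siblings(child_to_parents):
    """Derive SIB pairs: two distinct concepts that share at least one parent."""
    parent_to_children = {}
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children.setdefault(parent, set()).add(child)
    siblings = set()
    for children in parent_to_children.values():
        # Sort so each unordered pair is emitted once, in a stable order.
        for first, second in combinations(sorted(children), 2):
            siblings.add((first, second))
    return siblings


# Hypothetical PAR links (child -> parents), for illustration only.
PARENTS = {
    "Addison's Disease": ["Adrenal Gland Diseases"],
    "Cushing Syndrome": ["Adrenal Gland Diseases"],
    "Adrenal Gland Diseases": ["Endocrine Diseases"],
}

print(derive_siblings(PARENTS))
# {("Addison's Disease", 'Cushing Syndrome')}
```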

[Figure: Metathesaurus concepts linked to semantic types in the Semantic Network]

Semantic types shown: Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group

Concepts shown: Heart; Esophagus; Left Phrenic Nerve; Heart Valves; Fetal Heart; Mediastinum; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors

Numeric labels in the figure: 237, 49, 5, 16, 13, 22

semantic network

•  135 semantic types – broad categories (drug, virus, etc.)

•  54 semantic relationships –  links categories (is-a, causes, treats)

•  types + relationships – broad categorization of biomedicine

UMLS semantic network

[Figure: portion of the UMLS semantic network, repeated from the earlier “UMLS semantic network” slide; Kleinsorge et al., 2007]

SPECIALIST lexicon

•  Over 330k entries – syntax – morphology – orthography

•  natural language interface

orthography

•  Spelling variants
   – oe/e: oesophagus - esophagus
   – ae/e: anaemia - anemia
   – ise/ize: cauterise - cauterize
   – genitive mark: Addison's disease, Addison disease, Addisons disease
   – British-American variants: criticise - criticize, centre - center, foetus - fetus

normalization

Hodgkin’s diseases, NOS
   – remove genitive: Hodgkin diseases, NOS
   – remove stop words: Hodgkin diseases,
   – lowercase: hodgkin diseases,
   – strip punctuation: hodgkin diseases
   – uninflect: hodgkin disease
   – sort words: disease hodgkin
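
The six normalization steps on this slide can be sketched as a small pipeline. This is a toy illustration, not the SPECIALIST lexical tools themselves: the `STOP_WORDS` list is an assumed stand-in, and `uninflect` is a deliberately crude plural-stripping heuristic where the real tools consult the lexicon.

```python
import re

# Assumed, abbreviated stop-word list; "nos" covers the slide's "NOS" example.
STOP_WORDS = {"nos", "of", "and", "with", "by", "for", "in", "to", "the"}


def uninflect(word):
    """Crude stand-in for uninflection: strip a plural 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(term):
    # 1. Remove genitive: "Hodgkin's" -> "Hodgkin" (straight or curly apostrophe).
    term = re.sub(r"['’]s\b", "", term)
    # 2. Remove stop words (compared case-insensitively, ignoring edge punctuation).
    words = [w for w in term.split() if w.strip(",.;:").lower() not in STOP_WORDS]
    # 3. Lowercase.
    words = [w.lower() for w in words]
    # 4. Strip punctuation and drop anything left empty.
    words = [re.sub(r"[^\w\s]", "", w) for w in words]
    words = [w for w in words if w]
    # 5. Uninflect each word.
    words = [uninflect(w) for w in words]
    # 6. Sort words so that word order no longer matters.
    return " ".join(sorted(words))


print(normalize("Hodgkin’s diseases, NOS"))  # disease hodgkin
print(normalize("Hodgkin's disease"))        # disease hodgkin
```

Both inputs collapse to the same normalized string, which is how lexical variants end up sharing one normalized name.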

SPECIALIST lexicon

{base=Kaposi's sarcoma
spelling_variant=Kaposi sarcoma
entry=E0003576
    cat=noun
    variants=uncount
    variants=reg
    variants=
}

{base=chronic
entry=E0016869
    cat=adj
    variants=inv
    position=attrib(1)
    position=pred
}

{base=aspirate
entry=E0010803
    cat=verb
    variants=reg
    tran=np
    nominalization=aspiration|noun|E0010804
}

{base=in
entry=E0033870
    cat=prep
}
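
Lexicon records of this shape are straightforward to read programmatically. The parser below is a minimal sketch that assumes each record is wrapped in braces with one `key=value` field per line, as in the examples on this slide; `parse_records` and `RECORD_TEXT` are hypothetical names, and the actual lexicon distribution may differ in detail.

```python
import re
from collections import defaultdict

# Two records copied from the slide, laid out one field per line (an assumption).
RECORD_TEXT = """
{base=aspirate
entry=E0010803
    cat=verb
    variants=reg
    tran=np
    nominalization=aspiration|noun|E0010804
}
{base=in
entry=E0033870
    cat=prep
}
"""


def parse_records(text):
    """Parse brace-delimited records into dicts mapping each key to a list of values."""
    records = []
    for block in re.findall(r"\{(.*?)\}", text, flags=re.DOTALL):
        record = defaultdict(list)
        for line in block.splitlines():
            line = line.strip()
            if "=" in line:
                # Split on the first '=' only, so values may contain '='.
                key, value = line.split("=", 1)
                record[key].append(value)
        records.append(dict(record))
    return records


records = parse_records(RECORD_TEXT)
print(records[0]["base"], records[0]["cat"])  # ['aspirate'] ['verb']
print(records[1]["cat"])                      # ['prep']
```

Values are kept as lists because a key such as `variants` or `position` can appear more than once in a single record.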

subdomain integration

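[Figure: UMLS at the center, integrating subdomains through their vocabularies]

•  Biomedical literature – MeSH

•  Genome annotations – GO

•  Model organisms – NCBI Taxonomy

•  Genetic knowledge bases – OMIM

•  Clinical repositories – SNOMED CT

•  Anatomy – FMA

•  Other subdomains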

why do we care?

•  actionable intelligence requires knowledge about the universe – medicine very knowledge intensive – very rich information source

•  machine learning requires appropriate representation – models uncertainty