Post on 25-Aug-2020

CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION

Cédric PRUSKI

Drift-a-LOD@EKAW 2016, November 20th, Bologna, Italy

MOTIVATION

[Figure: two KOS, KS and KT, and their data; the concepts "malignancy" and "Malignant neoplasm" are linked by a mapping (=); further labels: "?" and "inaccessible".]

Outdated mappings and annotations may trigger undesirable results in biomedical systems

Keeping mappings and annotations valid is crucial

Large size and complexity prevent fully manual maintenance

•  What is the impact of concept drift (or ontology evolution) on ontology mappings and semantic annotations?
    •  Quantitative
    •  Qualitative

•  How can we formally characterize concept drift?
    •  Basic changes (Addition/Deletion of concepts)
    •  Complex changes (Split, merge, move of concepts)

•  Can we reuse information that characterizes concept drift to adapt ontology mappings and semantic annotations?
    •  Prevention of re-alignment / re-annotation of whole datasets
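The basic changes named in the second question (addition and deletion of concepts) can be illustrated as a set difference between two ontology versions. This is a minimal sketch, assuming each version is reduced to a dictionary from concept identifier to label. The function name `basic_changes` and the sample identifiers are hypothetical, and complex changes (split, merge, move) are not covered.

```python
def basic_changes(old, new):
    """Return added, deleted and relabeled concept ids between two versions.

    `old` and `new` map concept id -> label (an assumed, simplified model).
    """
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    relabeled = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "deleted": deleted, "relabeled": relabeled}


# Illustrative data only, not taken from a real KOS release
v1 = {"C1": "malignancy", "C2": "enterolith"}
v2 = {"C1": "malignant neoplasm", "C3": "fecal impaction"}
changes = basic_changes(v1, v2)
```

Only identifiers are compared, so a concept that is deleted and re-created under a new identifier shows up as one deletion plus one addition.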

PROBLEMATIC

①  Concept drift for mapping adaptation
    a.  DynaMO research project
    b.  Change patterns

②  Concept drift for semantic annotation maintenance
    a.  ELISA research project
    b.  Background knowledge

③  Discussion
    a.  Concept drift for LOD

AGENDA

THE CASE OF MAPPING ADAPTATION

“Adaptation of existing mappings according to modifications affecting KOS elements at evolution time”

Definition and Problematic

ONTOLOGY MAPPING ADAPTATION

MV1=(s, t, r) MV2=(s’, t, r’)

Hypothesis: There is a correlation between the way KOS’ elements evolve and the way mappings are adapted
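The triple notation above can be made concrete with a small data structure. This is a minimal sketch, assuming that s, t and r stand for the source concept, the target concept and the semantic relation of a mapping. The class `Mapping`, its field names and the helper `adapt` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    source: str    # s: concept of the source KOS (assumed reading)
    target: str    # t: concept of the target KOS (assumed reading)
    relation: str  # r: semantic relation, e.g. "≡" or "≤"


def adapt(mapping, new_source, new_relation):
    """Build (s', t, r') from (s, t, r); the target is left untouched."""
    return replace(mapping, source=new_source, relation=new_relation)


m_v1 = Mapping(source="s", target="t", relation="≡")
m_v2 = adapt(m_v1, new_source="s'", new_relation="≤")
```

The mapping is immutable, so the version before evolution stays available for comparison with the adapted one.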

UNDERSTANDING MAPPING EVOLUTION

•  Identify potential interdependencies between changes affecting KOS entities and the mapping evolution

•  Empirically examine official and real-world mappings over time
    •  Evolution of SNOMED CT and ICD9CM as a case study

~400 000 mappings analyzed

[Figure: timeline of SNOMED CT releases (Jan/10, Jul/10, Jan/11, Jul/11) and ICD9CM releases (2009, 2010), with the mapping sets MST 1 (Jan/10), MST 2 (Jul/10), MST 3 (Jan/11) and MST 4 (Jul/11).]

How does concept drift impact mappings?

How to identify these attributes?

KEY FINDINGS

[Figure: mappings between ICD9CM and SNOMED CT, before and after evolution. ICD9CM codes shown: 560.39 (callout "This concept changed") and 560.32; a second callout reads "This concept was added". SNOMED CT concepts shown: 168000, 29162007, 40515007, 44635007 and 197063004, connected by is-a relations. Mapping relations shown: ≡ and ≤. Labels shown: Enterolith (disorder); Typhlolithiasis (disorder); Concretion of intestine (disorder); Impaction of intestine; Fecal impaction; Fecal impaction of colon. A "similarity" link and a time axis of observed modifications point to the attributes Concretion of intestine, Enterolith and Fecal impaction.]

Mapping adaptation based on the evolution of relevant concept attributes

Lexical change patterns

CHARACTERIZATION OF CHANGES

•  Total Copy (TC)

•  Total Transfer (TT)

•  Partial Copy (PC)

•  Partial Transfer (PT)

CONTEXT = SUP ∪ SUB ∪ SIB

[Figure: concepts cs0 and cs1 on a time axis (time j, time j+1), with attributes a1, a2, …, an, super-concept attributes asup1, asup2, …, asupn, sub-concept attributes asub1, asub2, …, asubn and sibling attributes asib1, asib2. Example attribute values shown: "unspecified mental behavioral problem", "specified behavioral problem", "bronzed diabetes", "inflammatory bowel diseases", "inflammatory bowel diseases 1".]
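The four lexical change patterns can be sketched as a small classifier over attribute values. The definitions used here are an assumed reading, not taken from the slide: "Total" means the full value reappears in a context concept at time j+1, "Partial" means only some of its words do, "Copy" means the value also stays in the original concept, and "Transfer" means it does not. The function names and sample values are hypothetical.

```python
def tokens(value):
    """Lower-cased word set of an attribute value."""
    return set(value.lower().split())


def lexical_pattern(value, concept_after, context_after):
    """Classify how `value` (attribute at time j) moved into the context at j+1.

    Returns "TC", "TT", "PC", "PT" or None (assumed definitions, see lead-in).
    """
    kept = value in concept_after
    if value in context_after:
        return "TC" if kept else "TT"
    shared = any(tokens(value) & tokens(other) for other in context_after)
    if shared:
        return "PC" if kept else "PT"
    return None


# CONTEXT = SUP ∪ SUB ∪ SIB, here as sets of attribute values at time j+1
sup, sub, sib = {"behavioral problem"}, set(), {"kappa light chain disease"}
context = sup | sub | sib

moved = lexical_pattern("kappa light chain disease", {"kappa chain disease"}, context)
```

Exact string and word overlap are crude stand-ins for a real lexical similarity measure, and any overlap at all counts as "partial" in this sketch.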

Semantic change patterns

CHARACTERIZATION OF CHANGES

•  Equivalent (EQV)

•  Partial Match (PTM)

•  More Specific (MSP)

•  Less Specific (LSP)

CONTEXT = SUP ∪ SUB ∪ SIB

[Figure: concepts cs0 and cs1 on a time axis (time j, time j+1), with attributes a1, a2, …, an, super-concept attributes asup1, asup2, …, asupn, sub-concept attributes asub1, …, asubn and sibling attributes asib1, asib2. Example label pairs shown: "Diabetes type 1" / "Diabetes type I"; "Focal atelectasis" / "Helical atelectasis"; "familial chylomicronemia" / "familial hyperchylomicronemia"; "Kappa chain disease" / "Kappa light chain disease".]
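The four semantic change patterns can be illustrated with a word-set comparison of two labels. This is a deliberately simple stand-in, not the method behind the slide. It assumes that extra words make a label more specific and that dropped words make it less specific. The names `normalise` and `semantic_pattern` and the small Roman-numeral table are hypothetical.

```python
ROMAN = {"i": "1", "ii": "2", "iii": "3"}  # tiny, assumed normalisation table


def normalise(label):
    """Lower-case the label, split it into words and map Roman numerals."""
    return {ROMAN.get(word, word) for word in label.lower().split()}


def semantic_pattern(before, after):
    """Relate the label at time j+1 (`after`) to the label at time j (`before`)."""
    old, new = normalise(before), normalise(after)
    if old == new:
        return "EQV"
    if old < new:   # extra words narrow the meaning (assumption)
        return "MSP"
    if new < old:   # dropped words broaden the meaning (assumption)
        return "LSP"
    if old & new:   # some, but not all, words are shared
        return "PTM"
    return None


pattern = semantic_pattern("Diabetes type 1", "Diabetes type I")
```

Labels with no words in common, such as "Cancer" and "Malignant neoplasm", return `None` here, which is the situation that calls for external background knowledge.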

Heuristics

LINKING CP AND MAINTENANCE ACTIONS

∃! Lexical CP (Total Transfer), Semantic CP unchanged: MoveM(mst, ccand1)

CONTEXT = SUP ∪ SUB ∪ SIB

[Figure: KOS KS and KOS KT; concepts cs0 (attributes a1, …, ak), cs1, ccand1 and ct; attribute sets as1, as2, as3, …, asn and asib1, asib2, …, asibn; labels "semType", "Affected by KOS changes" and "relevant attributes". Example values: "Kappa light chain disease", "Kappa chain disease". Time axis: time j, time j+1.]
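The heuristic on this slide (exactly one Total Transfer lexical change pattern, with the semantic change pattern unchanged, leading to MoveM(mst, ccand1)) can be written as a small rule. This is a sketch under stated assumptions: "Semantic CP unchanged" is modelled as a boolean flag, and the function name, argument names and returned tuple are hypothetical.

```python
def propose_action(mapping_id, candidate_patterns, semantic_unchanged):
    """Return a maintenance action tuple, or None when the heuristic does not fire.

    `candidate_patterns` maps each candidate concept to the lexical change
    pattern ("TC", "TT", "PC", "PT") detected for the relevant attribute.
    """
    transfers = [c for c, p in candidate_patterns.items() if p == "TT"]
    # ∃! : exactly one candidate received the attribute by Total Transfer
    if len(transfers) == 1 and semantic_unchanged:
        return ("MoveM", mapping_id, transfers[0])
    return None


action = propose_action("mst", {"ccand1": "TT", "ccand2": "PC"}, True)
```

Returning `None` leaves the decision to other heuristics or to a human expert when the candidate is not unique.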

•  Concept drift has a huge impact on ontology mappings, but some concept changes do not affect mappings

•  Drift of attribute values governs the mapping adaptation process

•  In most cases concept drift results in local changes
    •  Changes in super concepts, sub concepts and siblings

•  Considering ontology versions alone is not enough to characterize concept drift
    •  External background knowledge is needed to better determine the semantic relationship between versions of a concept
    •  Cf. semantic annotation adaptation

Lessons learned

CONCEPT DRIFT FOR MAPPING ADAPTATION

THE CASE OF SEMANTIC ANNOTATIONS ADAPTATION

www.elisa-project.lu

Problem

SEMANTIC ANNOTATIONS ADAPTATION

Impact of concept drift on semantic annotations

METHODOLOGY

RESULTS


•  A concept may have labels before and after evolution that are disjoint from the syntactic or lexical point of view
    •  Ex: Cancer → Malignant neoplasm

•  Lexical and semantic change patterns cannot be applied in such cases

•  External knowledge sources are required to characterize the evolution of concepts in such situations

•  We propose a method exploiting BioPortal to overcome this limitation
    •  Ontologies
    •  Mappings

•  The method is able to find the semantic relationship between two versions of the same concept
    •  Equivalent, less specific, more specific, unrelated, partially matched

Use of external knowledge source

CONCEPT DRIFT FOR ANNOTATIONS

Example

USE OF EXTERNAL KNOWLEDGE SOURCE

①  Search in ontologies (direct method)
    •  “Pituitary dwarfism” (MeSH): SNOMED CT, ICD9CM, MEDDRA, NCIT, DOID, RCD, HP, DERMLEX, NATPRO, CRISP, SOPHARM, BDO, SNMI
    •  “Pituitary dwarfism II” (MeSH): OMIM, NDFRT
    •  No common ontologies

②  Use mappings (indirect method)
    •  15 mappings available (OMIM ontology)
    •  “Pituitary dwarfism II” (OMIM) Mapped_to “Laron-type isolated somatotropin defect” (SNOMED CT)

③  SNOMED CT is the common ontology
    •  “Laron-type isolated somatotropin defect” and “Pituitary dwarfism” have the same super concept (“short stature disorder”): they are siblings
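The direct and indirect steps of this example can be traced with plain in-memory data. This is a sketch only: it does not call BioPortal, the ontology list is shortened, and all function names and data structures are hypothetical. The labels and ontology names are copied from the example.

```python
def common_ontologies(found_a, found_b):
    """Direct method: ontologies in which both labels were found."""
    return set(found_a) & set(found_b)


def bridge_via_mappings(found_a, mappings_b):
    """Indirect method: follow mappings of label B into an ontology that holds A.

    `mappings_b` is a list of (target_label, target_ontology) pairs.
    """
    return [(label, onto) for label, onto in mappings_b if onto in found_a]


def relation_from_parents(parents, label_a, label_b):
    """Call two concepts siblings when they share a super concept."""
    shared = parents.get(label_a, set()) & parents.get(label_b, set())
    return "siblings" if shared else "unknown"


found_a = {"SNOMED CT", "ICD9CM", "MEDDRA"}   # "Pituitary dwarfism" (shortened list)
found_b = {"OMIM", "NDFRT"}                   # "Pituitary dwarfism II"
mappings_b = [("Laron-type isolated somatotropin defect", "SNOMED CT")]
parents = {
    "Pituitary dwarfism": {"short stature disorder"},
    "Laron-type isolated somatotropin defect": {"short stature disorder"},
}

direct = common_ontologies(found_a, found_b)        # empty: direct method fails
bridges = bridge_via_mappings(found_a, mappings_b)  # mapping reaches SNOMED CT
relation = relation_from_parents(parents, "Pituitary dwarfism", bridges[0][0])
```

Only the sibling case is covered, so a fuller version would also walk the hierarchy to detect more specific and less specific relations.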

•  Ontology regions do not evolve in the same way
    •  Unstable regions → handle with care
    •  Interesting for predicting concept drift

•  Concept drift has a different impact on annotation tools
    •  GATE
    •  NCBO Annotator

•  Background knowledge gives promising results for characterizing concept drift
    •  BioPortal ontologies
    •  RDF datasets, Web data under investigation

•  Will machine learning help in understanding concept drift?
    •  Identification of relevant features
    •  What ML techniques to use?

Lessons learned (so far …)

CONCEPT DRIFT IN ANNOTATION ADAPTATION

•  Linked Open Data requires vocabularies for semantic interoperability purposes

•  LOD for characterizing concept drift
    •  Quality of LOD is problematic
    •  Some datasets rely on outdated vocabularies

•  Concept drift impacting LOD:
    •  FOAF, DC not as dynamic as domain ontologies
    •  No control over the datasets using controlled vocabularies

→ How to propagate changes observed in the vocabulary to RDF datasets?

Concept drift for LOD

DISCUSSION

•  Silvio Cardoso
•  Dr. Marcos Da Silveira
•  Dr. Duy Dinh
•  Dr. Julio Dos Reis
•  Dr. Anika Gross
•  Pr. Erhard Rahm
•  Pr. Chantal Reynaud-Delaître
•  And all the others …

COLLABORATORS
