An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011

20 October 2011 SWWS-OTM2011 Copyright 2011 CHRONIOUS -- IMATI-CNR An ontology driven module for accessing chronic pathology literature Joint work with Stephan Kiefer, Jochen Rauch, Marco Attene, Franca Giannini, Simone Marini, Luc Schneider, Carlos Mesquita, Xin Xing Riccardo Albertoni, Institute for Applied Mathematics and Information Technology C.N.R., Genova, Italy


Page 1: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


An ontology driven module for accessing chronic pathology literature

Joint work with Stephan Kiefer, Jochen Rauch, Marco Attene, Franca Giannini, Simone Marini, Luc Schneider, Carlos Mesquita, Xin Xing

Riccardo Albertoni, Institute for Applied Mathematics and Information Technology, C.N.R., Genova, Italy

Page 2: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Motivation, problem area

Chronic diseases are the leading causes of death and disability for a large number of people in most industrialized nations.

Chronic diseases have a deep impact on the costs borne by today’s society:
1. Costs of medical care in relation to diagnosis and treatment of disease
2. Loss of human resources caused by morbidity or premature death
3. Intangible costs, which capture the psychological dimensions of illness, including pain and anxiety

New technologies for acquiring and analyzing vital signals are emerging

Page 3: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Motivation, problem area

New ways of monitoring and treating chronic patients are becoming possible

No established guidelines are available yet

Knowledge in the domain is rapidly evolving

Tools are needed for indexing and retrieving well-focused documentation according to continuously evolving knowledge

Page 4: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Context: the Project

An Open, Ubiquitous and Adaptive Chronic Disease Management Platform for Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD)

FP7-ICT-2007-1-216461 CHRONIOUS, February 2008 – January 2012 (48 months) http://www.chronious.eu/

Page 5: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Context: The CHRONIOUS project

Page 6: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Literature search module: design requirements

Using explicit medical terminology (e.g., controlled vocabularies)
• Specific to the considered pathologies, but also allowing different levels of granularity in search (searching by coarse- and fine-grained concepts)

Terminology as modular and extendable as possible
• CKD and COPD are a kind of test-bed for CHRONIOUS, but other chronic diseases should eventually be pluggable
• Knowledge in these domains evolves, and so do the related terminologies: we should support keeping terminologies up to date

Offering multilingual capabilities
• Search must be possible in different languages, at least when well-established translations of terminologies are available

Page 7: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


The developed system

[Architecture diagram. Components shown: Document search (conceptual search, metadata search, free text search); Document upload (automatic upload tool, manual upload tool); Document processing (format transformation, NLP); Indexer; Concept Associator; Knowledge cache; CKD and COPD ontology (OWL/RDF); MeSH thesaurus (SKOS/RDF); mapping API; Ontology enrichment tool.]

Page 8: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Terminology to index scientific literature

Medical Subject Headings (MeSH) is a well-known controlled vocabulary used for indexing articles from MEDLINE/PubMed
• It is not specialized enough to cover the COPD and CKD domains in depth

Ontologies have been defined to cover COPD and CKD in greater depth (OWL, by IFOMIS)

However, MeSH is still required in CHRONIOUS
• The search is not always made at the same level of granularity: keyword search often moves back and forth between coarse and very disease-specialized concepts
• Multilingual support: some “certified” translations are available, for example in Italian, Portuguese and Spanish
• Terminological de facto standard: clinicians expect MeSH to be included

How to combine ontologies and MeSH in CHRONIOUS?

COPD, CKD Ontologies

Middle Layer Ontology for Clinical Care (MLOCC)

Open Biological and Biomedical Ontology (OBO) Foundry: Basic Formal Ontology (BFO) + Relation Ontology (RO) + Foundational Model of Anatomy Ontology (FMA)
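
The coarse-to-fine search mentioned on this slide can be pictured as walking a concept hierarchy from a general term down to its more specialized ones. The following is only a sketch under stated assumptions: the `NARROWER` table and the `expand` function are hypothetical, and the concept labels are illustrative rather than taken from MeSH or from the CHRONIOUS ontologies.

```python
from collections import deque

# Hypothetical toy hierarchy: concept -> directly narrower concepts.
# Labels are illustrative only, not copied from MeSH or the project ontologies.
NARROWER = {
    "Lung Diseases": ["Lung Diseases, Obstructive"],
    "Lung Diseases, Obstructive": ["COPD"],
    "COPD": ["COPD exacerbation", "Emphysema"],
}


def expand(concept, max_depth=None):
    """Return the concept followed by its narrower concepts, breadth-first.

    max_depth limits how many levels below the starting concept are visited;
    None means the whole subtree is returned.
    """
    seen = [concept]
    queue = deque([(concept, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append((child, depth + 1))
    return seen
```

Calling `expand("Lung Diseases", max_depth=1)` keeps the query coarse, while `expand("COPD")` descends to the disease-specific concepts.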

Page 9: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


The adopted approach

RDF URI as a kind of lingua franca

MeSH (provided by the US National Library of Medicine) has been encoded in SKOS/RDF (W3C)

Italian, Portuguese and Spanish translations of MeSH (provided by national authorities) have been encoded in SKOS/RDF
• We kept the RDF IDs consistent with the original MeSH descriptor identifiers

A semi-automatic mapping between MeSH in SKOS and the developed ontologies
• A script compares MeSH terms with the lexical representations of concepts from the OWL ontologies
• The suggested mappings are validated in a two-stage process by ontology engineers and clinicians
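
The slides do not show the comparison script itself, so the following is only a minimal sketch of how such a lexical comparison might suggest candidate mappings. It assumes a simple normalisation (lower-casing, dropping punctuation, ignoring word order); the functions `normalise` and `suggest_mappings`, the `example.org` URIs and the descriptor identifiers are all hypothetical.

```python
import re


def normalise(label):
    """Lower-case, drop punctuation and sort the words so word order is ignored."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(sorted(words))


def suggest_mappings(mesh_terms, ontology_labels):
    """Pair MeSH descriptors and ontology classes whose labels normalise identically.

    Both arguments map a URI to a list of labels. The result is a list of
    (descriptor_uri, class_uri) candidates, still to be validated by people.
    """
    index = {}
    for class_uri, labels in ontology_labels.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(class_uri)

    suggestions = []
    for descriptor_uri, terms in mesh_terms.items():
        for term in terms:
            for class_uri in sorted(index.get(normalise(term), ())):
                candidate = (descriptor_uri, class_uri)
                if candidate not in suggestions:
                    suggestions.append(candidate)
    return suggestions


# Hypothetical sample data: URIs and identifiers are placeholders.
mesh_terms = {
    "http://example.org/mesh/D0000001": ["Pulmonary Disease, Chronic Obstructive", "COPD"],
    "http://example.org/mesh/D0000002": ["Spirometry"],
}
ontology_labels = {
    "http://example.org/copd#ChronicObstructivePulmonaryDisease": [
        "chronic obstructive pulmonary disease"
    ],
    "http://example.org/copd#InhalerDevice": ["inhaler device"],
}
candidates = suggest_mappings(mesh_terms, ontology_labels)
```

Because the inverted MeSH form "Pulmonary Disease, Chronic Obstructive" and the ontology label normalise to the same string, the pair is proposed as a candidate; the two-stage validation by ontology engineers and clinicians then decides whether to keep it.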

Page 10: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Natural Language Processing

Based on the General Architecture for Text Engineering (GATE) framework
• Open-source Java suite, originally developed at the University of Sheffield beginning in 1995
• Used worldwide by a wide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages

Default processes applied to extract the headwords of a text:
• Sentence splitter, Tokeniser, Part-of-speech tagger, Morphological analyser

Modules included for the Ontology Enrichment Tool and the Indexer
• OntoRoot Gazetteer: a GATE plug-in that produces ontology-aware annotations for extracted terms
• Shallow Parser: identifies word groups such as “chronic diseases” and “lung function”
• RegEx-Pattern Matcher: matches the lemma of a token against word patterns defined as regular expressions
• Thesaurus Matcher: matches the lemma of a token against a domain thesaurus; a JAPE resource has been developed to access MeSH and the mapped ontology concepts through the MeSH Mapping API
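
To make the last two modules more concrete, here is a rough sketch of lemma-based thesaurus and regular-expression matching. It is not GATE or JAPE code and does not use the MeSH Mapping API; the token list, the thesaurus entries, the pattern and the function names are all hypothetical, and the lemmas are assumed to come from an earlier morphological analysis step.

```python
import re

# Hypothetical (token, lemma) pairs, as a morphological analyser might produce them.
TOKENS = [
    ("Inhalers", "inhaler"),
    ("improve", "improve"),
    ("lung", "lung"),
    ("function", "function"),
    ("post-bronchodilator", "post-bronchodilator"),
]

# Hypothetical thesaurus: sequence of lemmas -> concept identifier.
THESAURUS = {("inhaler",): "ex:Inhaler", ("lung", "function"): "ex:LungFunction"}

# Hypothetical word patterns defined as regular expressions.
PATTERNS = {"prefixed-term": re.compile(r"^(pre|post)-\w+$")}


def match_thesaurus(tokens, thesaurus):
    """Greedy longest match of lemma sequences against the thesaurus.

    Returns (start, end, concept) triples with token offsets, end exclusive.
    """
    longest = max((len(key) for key in thesaurus), default=0)
    lemmas = [lemma for _, lemma in tokens]
    found, i = [], 0
    while i < len(lemmas):
        for size in range(min(longest, len(lemmas) - i), 0, -1):
            key = tuple(lemmas[i:i + size])
            if key in thesaurus:
                found.append((i, i + size, thesaurus[key]))
                i += size
                break
        else:
            # No entry starts at this token; move on to the next one.
            i += 1
    return found


def match_patterns(tokens, patterns):
    """Return (token index, pattern name) for every lemma matching a pattern."""
    matches = []
    for i, (_, lemma) in enumerate(tokens):
        for name, pattern in patterns.items():
            if pattern.match(lemma):
                matches.append((i, name))
    return matches
```

Matching on lemmas rather than surface forms is what lets the plural "Inhalers" hit the thesaurus entry for "inhaler" in this toy input.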

Page 11: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Ontology enrichment

Semi-automatic process: candidate concepts are rated according to
• Corpus relevance: determined by the candidate's average Term Frequency-Inverse Document Frequency (TF-IDF) value with respect to the whole document corpus
• Concept co-occurrences: the average distance in the text between the candidate concept and a concept within the corpus is calculated as a benchmark
• Domain relevance: matching against a common dictionary (WordNet), a domain thesaurus (MeSH) and regular expression patterns
• Subclass-of relations: extraction of vertical relations from linguistic patterns or dictionary hypernyms

Candidate concepts are marked as “new”, “to validate”, “postponed”, “accepted” or “rejected” by ontology engineers and clinicians
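
The slides do not give the exact formulas behind the first two ratings, so the sketch below fixes one plausible reading of each. It assumes TF is the relative frequency of the term in a document, IDF is the natural logarithm of N over the document frequency, and the co-occurrence distance is measured in tokens to the nearest occurrence of a known concept. The function names and these choices are assumptions, not the project's definitions.

```python
import math


def average_tfidf(term, corpus):
    """Mean TF-IDF of `term` over all documents in `corpus`.

    `corpus` is a list of documents, each a list of lemmas.
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if n_docs == 0 or doc_freq == 0:
        return 0.0
    idf = math.log(n_docs / doc_freq)
    total_tf = sum(doc.count(term) / len(doc) for doc in corpus if doc)
    return idf * total_tf / n_docs


def average_distance(candidate, known, doc):
    """Mean token distance from each `candidate` occurrence to the nearest `known`.

    Returns None when either term does not occur in the document.
    """
    cand_pos = [i for i, word in enumerate(doc) if word == candidate]
    known_pos = [i for i, word in enumerate(doc) if word == known]
    if not cand_pos or not known_pos:
        return None
    return sum(min(abs(c - k) for k in known_pos) for c in cand_pos) / len(cand_pos)


# Hypothetical four-document corpus of lemmas.
corpus = [
    ["spirometry", "measure", "lung", "function"],
    ["spirometry", "spirometry", "test", "result"],
    ["kidney", "disease", "stage", "three"],
    ["lung", "disease", "care", "plan"],
]
score = average_tfidf("spirometry", corpus)
```

Under this variant a term that occurs in every document gets a score of zero, which fits the idea that corpus relevance should favour terms that discriminate between documents.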

Page 12: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Ontology Enrichment Tool – Relevance details

Page 13: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Search Interface

Page 14: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Search results: (black box testing) initial comparison with PubMed

[Charts labelled “Inhaler Device” and “PostBronchodilator Spirometry”. Horizontal axis: number of considered/retrieved documents. Vertical axis: F-measure.]

Glass box testing has also been performed to ensure that the ontologies represent the right concepts
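
For readers unfamiliar with the vertical axis, the F-measure at a given number of retrieved documents can be computed as sketched below. The slides do not say which F-measure variant was used, so the balanced F1 (harmonic mean of precision and recall) is an assumption here, and the function name and sample data are hypothetical.

```python
def f_measure_at(ranked, relevant, k):
    """Balanced F-measure (F1) over the first k documents of a ranked result list.

    `ranked` is the ordered list of retrieved document ids,
    `relevant` is the set of ids judged relevant for the query.
    """
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking and relevance judgements.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
curve = [f_measure_at(ranked, relevant, k) for k in range(1, len(ranked) + 1)]
```

Evaluating the function for increasing `k` yields one curve of the kind plotted against the number of considered documents.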

Page 15: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Conclusions

The CHRONIOUS search system can become a specialized indexing and search system for the hospital:

It can manage internal hospital documents to be indexed into the database

It can cover other medical domains and languages using MeSH

It is already specialized in COPD and CKD by using specific ontologies

It provides tools for ontology maintenance and is thus well suited to domains characterized by rapidly evolving knowledge

Page 16: An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011


Critical and open issues for future work

User notifications
• about changes in Enrichment Tool data (e.g. if new documents with extracted candidate concepts are available)
• supporting the collaboration among clinicians and ontology engineers

Re-indexing of documents
• what happens when there is a new ontology version?
• some incremental indexing should be provided