A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library

Wesley W. Chu
Computer Science Dept, UCLA
wwc@cs.ucla.edu


NIH Program Project Grant: a 5-year, $10M joint interdisciplinary project between Medical School and CS faculty
• Project 1: teleradiology infrastructure
• Project 2: neuroradiology workstation
• Project 3: multimedia information architecture
• Project 4: natural language processing for medical reports
• Project 5: medical digital library


Project 5 Personnel

Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou

Consultants: Hooshang Kangaloo, M.D.; Denise Aberle, M.D.

Project leader: Wesley W. Chu


Data in a Medical Digital Library
• Structured data (patient lab data, demographic data, …): CoBase
• Images (X-rays, MRI, CT scans): KMeD
• Free text: patient reports, teaching files, literature, news articles


System Overview (diagram)
Sources: patient reports, medical literature, teaching materials, news articles
Repository: Medical Digital Library (MDL)
Inputs: ad-hoc query; patient report for content correlation
Output: query results


A Sample Patient Report

…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
  - LUNG CANCER, SMALL CELL, STAGE II.


Scenario Specific Retrieval

Patient report: …Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) …FINAL DIAGNOSIS: LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): LUNG CANCER, SMALL CELL, STAGE II.

• Treatment-related articles: how to treat the disease?
• Diagnosis-related articles: how to diagnose the disease?


Challenge I: Indexing
Extracting domain-specific key concepts from the free text for indexing
• Free text: "Lung cancer, small cell, stage II"
• Concept term in the knowledge source: "stage II small cell lung cancer"

Conventional methods use NLP:
• Not scalable
• Cannot adapt to the various word permutations
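The word-order problem can be illustrated with a minimal Python sketch. It assumes simple lower-casing and splitting on non-alphanumeric characters (not necessarily the normalization used by UMLS or by the authors): an exact string comparison misses the concept, while comparing unordered word sets finds it.

```python
import re


def word_set(text: str) -> frozenset:
    """Lower-case the text, split it on non-alphanumeric characters, and return the set of words."""
    return frozenset(w for w in re.split(r"[^a-z0-9]+", text.lower()) if w)


free_text = "Lung cancer, small cell, stage II"
concept_term = "stage II small cell lung cancer"

# Order-sensitive comparison: fails because the words appear in a different order.
exact_match = free_text.lower() == concept_term.lower()

# Order-insensitive comparison: both contain the same words, so the concept is found.
bag_match = word_set(free_text) == word_set(concept_term)

print(exact_match, bag_match)  # False True
```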


Challenge II: Terms used in the query are too general
Expand the general terms in the query into the specific terms used in the documents.

Original query: lung cancer, diagnosis options ?
Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …
Expanded query: lung cancer, chest x-ray, bronchography, … √
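As a rough illustration of why expansion helps, the Python sketch below counts which query terms literally occur in the example document. The `matched_terms` helper and its case-insensitive substring test are hypothetical stand-ins for a real retrieval model, used only to show the overlap before and after expansion.

```python
def matched_terms(query_terms: list[str], document: str) -> list[str]:
    """Return the query terms that occur verbatim (case-insensitive) in the document."""
    doc = document.lower()
    return [t for t in query_terms if t.lower() in doc]


document = ("... the effectiveness of chest x-ray and bronchography "
            "on patients with lung cancer ...")
original = ["lung cancer", "diagnosis options"]
expanded = ["lung cancer", "chest x-ray", "bronchography"]

print(matched_terms(original, document))  # ['lung cancer']
print(matched_terms(expanded, document))  # ['lung cancer', 'chest x-ray', 'bronchography']
```

The general term "diagnosis options" never appears in the document, whereas every specific term in the expanded query does.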


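Challenge III: Mismatch between the terms used in the query and in the documents

Example query: … lung cancer, …
• Document 1: … lung carcinoma … ?
• Document 2: … lung neoplasm … ?
• Document 3: … anti-cancer drug combinations … ?

None of these documents contains the query term "lung cancer", yet each may be relevant to it.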


Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents


IndexFinder: Extracting domain-specific key concepts

Technique:
• Permute words from the text to generate concept candidates.
• Use the knowledge base to select the valid candidates.

Problem:
• Valid candidates may be irrelevant to domain-specific indexing.
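A minimal Python sketch of the candidate-generation idea, under loudly stated assumptions: the knowledge base is a hypothetical toy dictionary keyed by unordered word sets (a real system would use UMLS), and candidates come from brute-force enumeration of word combinations. Because the knowledge base is keyed by unordered word sets, each combination stands for all of its permutations. Brute force is exponential in sentence length, whereas the slides report linear time for IndexFinder, so the authors' method must differ. The output also shows the stated problem: "cell" and "stage" are valid concepts but irrelevant for indexing.

```python
from itertools import combinations

# Hypothetical toy knowledge base: unordered word set -> concept name (illustrative only).
KB = {
    frozenset({"lung", "cancer"}): "lung cancer",
    frozenset({"small", "cell", "lung", "cancer"}): "small cell lung cancer",
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}): "stage II small cell lung cancer",
    frozenset({"cell"}): "cell",
    frozenset({"stage"}): "stage",
}


def find_candidates(words: list[str], kb: dict, max_len: int = 6) -> set[str]:
    """Enumerate word combinations (order ignored, so every permutation is covered)
    and keep those that name a concept in the knowledge base."""
    distinct = sorted(set(words))
    found = set()
    for n in range(1, min(max_len, len(distinct)) + 1):
        for combo in combinations(distinct, n):
            concept = kb.get(frozenset(combo))
            if concept is not None:
                found.add(concept)
    return found


words = "lung cancer small cell stage ii".split()
print(sorted(find_candidates(words, KB)))
```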


Eliminating irrelevant concepts

• Syntactic filter: limit the permutation of words to within a sentence.
• Semantic filter:
  - Use the semantic type (e.g., body part, disease, treatment, diagnosis) to filter out irrelevant concepts.
  - Use the ISA relationship to filter out general concepts and keep the specific ones.
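The two filters can be sketched in Python as follows. The semantic types, ISA links, and function names are hypothetical toy data, not UMLS content. `split_sentences` shows the syntactic filter: candidate generation would run once per sentence, so words from different sentences are never combined. `semantic_filter` keeps concepts of the wanted types and then drops any concept that is an ISA ancestor of another kept one.

```python
import re

# Hypothetical toy data; real semantic types and ISA links would come from UMLS.
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "small cell lung cancer": "Disease or Syndrome",
    "cell": "Cell",
    "lung": "Body Part",
}
ISA = {"small cell lung cancer": "lung cancer"}  # child -> parent


def split_sentences(text: str) -> list[str]:
    """Syntactic filter: split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ancestors(concept: str) -> set[str]:
    """Follow ISA links upward and collect every more general concept."""
    result = set()
    while concept in ISA:
        concept = ISA[concept]
        result.add(concept)
    return result


def semantic_filter(concepts: set[str], wanted_types: set[str]) -> set[str]:
    # Keep only concepts whose semantic type is relevant to the domain.
    kept = {c for c in concepts if SEMANTIC_TYPE.get(c) in wanted_types}
    # Remove concepts that are ancestors of other kept concepts (keep the specific ones).
    general = set().union(*(ancestors(c) for c in kept))
    return kept - general


candidates = {"lung cancer", "small cell lung cancer", "cell", "lung"}
print(semantic_filter(candidates, {"Disease or Syndrome", "Body Part"}))
```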


IndexFinder Performance
• Two orders of magnitude faster than conventional approaches
• No NLP
• The knowledge base (UMLS) and index files reside in main memory
• Time complexity is linear in the number of distinct words in the text

Preliminary Evaluation
• IndexFinder generates 4% more concepts than conventional approaches (using a single noun phrase)
• All concepts are relevant


Query Expansion (QE)
Queries of the following form benefit from expansion:

<key concept> + <general supporting concept(s)>, e.g., lung cancer + diagnosis options

which expansion turns into

<key concept> + <specific supporting concept(s)>, e.g., lung cancer + chest x-ray, bronchography


Traditional QE
• Appends all terms that statistically co-occur with the key terms in the query
• Not semantically focused

Original query: lung cancer, diagnosis options
Expanded query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate

The added terms concern treatment and outcome, although the query asks about diagnosis.
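A minimal Python sketch of co-occurrence expansion over a hypothetical toy corpus (each document reduced to a set of index terms). It appends the most frequent co-occurring terms without looking at what the query asks for, so a diagnosis query picks up treatment terms. The corpus, the `cooccurrence_expand` name, and `k` are illustrative assumptions, not the traditional systems' actual weighting.

```python
from collections import Counter

# Hypothetical toy corpus: each document reduced to its set of index terms.
CORPUS = [
    {"lung cancer", "chemotherapy", "radiotherapy"},
    {"lung cancer", "chemotherapy", "radiotherapy"},
    {"lung cancer", "chemotherapy", "survival rate"},
    {"lung cancer", "chest x-ray"},
    {"breast cancer", "chemotherapy"},
]


def cooccurrence_expand(query: list[str], corpus: list[set[str]], k: int = 2) -> list[str]:
    """Append the k terms that most often co-occur with any query term."""
    counts = Counter()
    for doc in corpus:
        if any(term in doc for term in query):
            counts.update(t for t in doc if t not in query)
    return query + [t for t, _ in counts.most_common(k)]


print(cooccurrence_expand(["lung cancer", "diagnosis options"], CORPUS))
# ['lung cancer', 'diagnosis options', 'chemotherapy', 'radiotherapy']
```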


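Knowledge-based QE (diagram)
Knowledge source: UMLS (by the NLM), consisting of the Semantic Network and the Metathesaurus.
• Semantic Network: semantic types such as Disease or Syndrome, Diagnostic Procedure, Sign or Symptom, Pharmacologic Substance, Body Parts, and Injury or Poisoning, linked by relationships such as "diagnoses".
• Metathesaurus: concepts; each semantic type covers a class of concepts that belong to it.

Example: the key concept lung cancer (Disease or Syndrome) is linked through the "diagnoses" relationship to Diagnostic Procedure, which supplies specific supporting concepts such as chest x-ray.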
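A minimal Python sketch of the idea, assuming a hypothetical toy fragment of a UMLS-like knowledge source. It also assumes that candidate concepts come from terms co-occurring with the key concept, which the slide does not state. The scenario selects a relationship, the semantic network gives the types linked to the key concept's type by that relationship, and only candidates of those types are kept. `SCENARIO_RELATION` and `kb_expand` are illustrative names.

```python
# Hypothetical toy fragments of a UMLS-like knowledge source (illustrative only).
SEMANTIC_NETWORK = {  # (source semantic type, relationship, target semantic type)
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}
CONCEPT_TYPE = {  # concept -> semantic type
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "cisplatin": "Pharmacologic Substance",
    "survival rate": "Quantitative Concept",
}
# Hypothetical mapping from a general supporting concept to a relationship.
SCENARIO_RELATION = {"diagnosis": "diagnoses", "treatment": "treats"}


def kb_expand(key: str, scenario: str, candidates: list[str]) -> list[str]:
    """Keep only candidates whose semantic type is linked to the key concept's
    type by the relationship that matches the scenario."""
    relation = SCENARIO_RELATION[scenario]
    key_type = CONCEPT_TYPE[key]
    wanted_types = {src for src, rel, dst in SEMANTIC_NETWORK
                    if rel == relation and dst == key_type}
    return [key] + [c for c in candidates if CONCEPT_TYPE.get(c) in wanted_types]


candidates = ["chemotherapy", "chest x-ray", "survival rate", "bronchography", "cisplatin"]
print(kb_expand("lung cancer", "diagnosis", candidates))  # ['lung cancer', 'chest x-ray', 'bronchography']
print(kb_expand("lung cancer", "treatment", candidates))  # ['lung cancer', 'chemotherapy', 'cisplatin']
```

Unlike the co-occurrence approach, the same candidate pool yields different, scenario-focused expansions.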


Phrase-based Vector Space Model (VSM)

Query: … lung cancer, …

Matching through the knowledge source alone:
• Document 1: … lung carcinoma … √ (lung cancer = lung carcinoma in the knowledge source)
• Document 2: … lung neoplasm … ? (related only through a parent_of link)
• Document 3: … anti-cancer drug combinations … ? (the link is missing from the knowledge source)
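The three outcomes above can be reproduced with a small Python sketch over a hypothetical toy knowledge source. The concept labels and the `concept_match` helper are illustrative assumptions, not UMLS identifiers. A synonym maps to the same concept, a parent concept is only "related", and a term unknown to the knowledge source is missed outright.

```python
# Hypothetical toy knowledge source: surface term -> concept label, plus parent_of links.
TERM_TO_CONCEPT = {
    "lung cancer": "LUNG_CANCER",
    "lung carcinoma": "LUNG_CANCER",   # synonym: maps to the same concept
    "lung neoplasm": "LUNG_NEOPLASM",
}
PARENT_OF = {"LUNG_NEOPLASM": {"LUNG_CANCER"}}  # parent concept -> child concepts


def concept_match(query_term: str, doc_terms: list[str]) -> str:
    """Classify a document by how its terms relate to the query term's concept."""
    query_concept = TERM_TO_CONCEPT.get(query_term)
    for term in doc_terms:
        concept = TERM_TO_CONCEPT.get(term)
        if concept is None:
            continue  # term unknown to the knowledge source
        if concept == query_concept:
            return "match"
        if query_concept in PARENT_OF.get(concept, set()):
            return "related via parent_of"
    return "missed"


docs = {
    "Document 1": ["lung carcinoma"],
    "Document 2": ["lung neoplasm"],
    "Document 3": ["anti-cancer drug combinations"],
}
for name, terms in docs.items():
    print(name, concept_match("lung cancer", terms))
```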


Phrase-based VSM Examples
Each phrase is represented by its concept (CUI) and its word stems.

Query: "lung cancer …"
Phrases: [(C0242379); "lung" "cancer"] …

Document: "anti-cancer drug combinations …"
Phrases: [(C0003393); "anti" "cancer" "drug" "combin"] …
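A simplified Python sketch, not the authors' weighting scheme: each phrase is flattened into features (its CUI plus its word stems, all with weight 1), and query and document are compared by cosine similarity. Using the two phrases above, concept IDs alone give a score of 0, while the shared stem "cancer" gives Document 3 a nonzero score. The `features` and `cosine` helpers are hypothetical names.

```python
import math
from collections import Counter

Phrase = tuple[str, list[str]]  # (concept ID, word stems)


def features(phrases: list[Phrase], use_stems: bool = True) -> Counter:
    """Flatten phrases into a bag of features: one per CUI, plus one per stem."""
    bag = Counter()
    for cui, stems in phrases:
        bag["cui:" + cui] += 1
        if use_stems:
            bag.update("stem:" + s for s in stems)
    return bag


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


query = [("C0242379", ["lung", "cancer"])]
doc3 = [("C0003393", ["anti", "cancer", "drug", "combin"])]

print(round(cosine(features(query, False), features(doc3, False)), 3))  # 0.0
print(round(cosine(features(query), features(doc3)), 3))                # 0.258
```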

Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
[Figure: precision-recall plot; x-axis: recall (0 to 1); y-axis: average precision over 105 queries (0 to 1); legend entry: Stems. Annotations: 16% (100 queries) vs. 5% (50 queries).]


Application: Query Answering via Templates
Sample templates: "<disease>, treatment" and "<disease>, diagnosis"

Example (diagram): the template "<disease>, treatment" is filled with lung cancer, giving the query "lung cancer, treatment"; IndexFinder extracts its concepts; Query Expansion turns it into "lung cancer, radiotherapy, chemotherapy, cisplatin"; the Phrase-based VSM then retrieves the relevant documents.
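The template pipeline can be sketched end to end in Python with hypothetical stand-ins for each component: a lookup table replaces knowledge-based query expansion, and ranking by the number of matched query terms replaces the phrase-based VSM. `EXPANSIONS`, `fill_template`, `expand`, and `rank` are illustrative names, and the sample documents are made up.

```python
# Hypothetical stand-in for knowledge-based QE: (key concept, scenario) -> specific concepts.
EXPANSIONS = {
    ("lung cancer", "treatment"): ["radiotherapy", "chemotherapy", "cisplatin"],
    ("lung cancer", "diagnosis"): ["chest x-ray", "bronchography"],
}


def fill_template(template: str, disease: str) -> tuple[str, str]:
    """Fill '<disease>, <scenario>' and return (key concept, scenario)."""
    key, scenario = (part.strip() for part in template.replace("<disease>", disease).split(","))
    return key, scenario


def expand(key: str, scenario: str) -> list[str]:
    return [key] + EXPANSIONS.get((key, scenario), [])


def rank(query_terms: list[str], docs: list[str]) -> list[str]:
    """Rank documents by the number of query terms they contain (crude VSM stand-in)."""
    scored = [(sum(t in d.lower() for t in query_terms), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0]


docs = [
    "Cisplatin-based chemotherapy for lung cancer.",
    "Chest x-ray screening for lung cancer.",
    "Knee injuries in athletes.",
]
key, scenario = fill_template("<disease>, treatment", "lung cancer")
print(rank(expand(key, scenario), docs))
```

For the treatment template, the chemotherapy article ranks first and the screening article second, while the unrelated article is dropped.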


Applications (cont'd): Scenario-specific content correlation (diagram)
Patient Report → IndexFinder → Query Templates (Scenario Selection, e.g., treatment, diagnosis, etc.) → Query Expansion → Phrase-based VSM → relevant documents
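The first steps of content correlation can be sketched in Python: find the key disease in a patient report, then fill one template per selected scenario. The disease list and `key_disease` matcher are hypothetical simplifications of IndexFinder. Words are pooled across the whole report rather than per sentence, and only a fixed, specific-first list of diseases is checked.

```python
import re

# Hypothetical disease concepts, listed from most to least specific.
DISEASES = ["small cell lung cancer", "lung cancer"]
# Templates from the slides, keyed by scenario.
TEMPLATES = {"treatment": "<disease>, treatment", "diagnosis": "<disease>, diagnosis"}


def key_disease(report: str):
    """Return the most specific known disease whose words all occur in the report.
    Simplification: words are pooled across the whole report, not per sentence."""
    words = set(re.findall(r"[a-z]+", report.lower()))
    for disease in DISEASES:
        if set(disease.split()) <= words:
            return disease
    return None


def scenario_queries(report: str, scenarios: list[str]) -> list[str]:
    """Build one template query per selected scenario for the report's key disease."""
    disease = key_disease(report)
    if disease is None:
        return []
    return [TEMPLATES[s].replace("<disease>", disease) for s in scenarios]


report = ("FINAL DIAGNOSIS: LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): "
          "LUNG CANCER, SMALL CELL, STAGE II.")
print(scenario_queries(report, ["treatment", "diagnosis"]))
```

Each resulting query would then go through query expansion and the phrase-based VSM, as in the diagram.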


Conclusion

• A knowledge-based (UMLS) approach provides scenario-specific retrieval of medical free text.

• IndexFinder: uses word permutation together with syntactic and semantic filtering to extract domain-specific key concepts from free text for indexing.

• Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents.

• Phrase-based indexing: represents documents as phrases (a concept and its word stems) to improve retrieval effectiveness.


Acknowledgement

This research is supported in part by NIC/NIH Grant #4442511-33780.


Demo: http://fargo.cs.ucla.edu/umls/search.aspx

Test Texts

• Technically successful left lower lobe nodule biopsy.

• Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus.

• CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule.

• Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted.

• There may be a tiny left apical air collection in the pleural space lateral to the apical bulla.

• Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.