
Using Patient Data to Retrieve Health Knowledge

James J. Cimino, Mark Meyer, Nam-Ju Lee, Suzanne Bakken

Columbia University
AMIA Fall Symposium

October 25, 2005

Automated Retrieval with Clinical Data

[Diagram: seven-step infobutton cycle, illustrated with the example term MRSA]

1. Understand information needs
2. Get information from EMR
3. Resource selection
4. Resource terminology
5. Automated translation
6. Querying
7. Presentation

What’s Hardest about Infobuttons?

• It’s not knowing the questions

• It’s not integrating clinical info systems

• It’s not linking to resources

• It’s translating source data to target terms
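The translation step singled out above can be sketched as a table lookup from a local source term to a target resource's vocabulary. This is a minimal illustration only; the table entries are examples taken from the slides, not the actual CPMC mappings, and a real implementation would sit behind a terminology server.

```python
# A minimal sketch of the translation step: looking up a local source term
# in a table that maps it to a target resource's vocabulary (MeSH here).
# The entries are illustrative examples from the slides, not real mappings.
LOCAL_TO_MESH = {
    "AMIKACIN, PEAK LEVEL": "Amikacin",        # lab test -> ingredient concept
    "UD AMIKACIN 1 GM VIAL": "Amikacin",       # medication order -> ingredient
    "ESCHERECHIA COLI": "Escherichia coli",    # local spelling variant
}

def translate(source_term):
    """Return the target (MeSH) term for a local source term, or None."""
    return LOCAL_TO_MESH.get(source_term.strip().upper())

print(translate("ud amikacin 1 gm vial"))  # Amikacin
```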


Types of Source Terminologies

• Uncoded (narrative):
  – Radiology reports (?)
    "…infiltrate is seen in the left upper lobe."
• Coded:
  – Lab tests (6,133): AMIKACIN, PEAK LEVEL
  – Sensitivity tests (476): AMI 6 MCG/ML
  – Microbiology results (2,173): ESCHERECHIA COLI
  – Medications (15,311): UD AMIKACIN 1 GM VIAL

Types of Target Terminologies

• Narrative search: PubMed, RxList, UpToDate, Micromedex, Lab Tests Online, OneLook, National Guideline Clearinghouse
• Coded resource: Lexicomp, CPMC Lab Manual
• Coded search: PubMed
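The distinction between a narrative and a coded search can be made concrete with PubMed, which appears in both categories. Below is a sketch using NCBI's public E-utilities esearch endpoint; the `[MeSH Terms]` field tag is real PubMed syntax, while the wrapper function itself is our own illustration.

```python
# A sketch of narrative vs. coded (MeSH-restricted) PubMed searches via
# NCBI's E-utilities esearch endpoint. The [MeSH Terms] field tag is real
# PubMed query syntax; the helper function is illustrative.
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, mesh=False):
    # A narrative search sends the raw string; a coded search restricts
    # the translated term to the MeSH Terms index.
    query = f"{term}[MeSH Terms]" if mesh else term
    return f"{EUTILS}?{urlencode({'db': 'pubmed', 'term': query})}"

print(pubmed_search_url("Escherichia coli", mesh=True))
```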


Term Samples

• 100 terms from radiology reports using MedLEE

• 100 Medication ingredients

• 100 Lab test analytes

• 100 Microbiology results

• 94 Sensitivity test reagents

The Experiments

• Identify sources of patient data

• Get random sample of terms for each source

• Translate terms if needed (multiple methods)

• Perform automated retrieval with terms

Searches Performed

• Radiology terms (un-coded) – narrative resources: PubMed, NGC, OneLook, UpToDate
• Medications (coded) – narrative resources: RxList, Micromedex; concept resource: Lexicomp
• Lab tests (coded) – narrative resources: Lab Tests Online, PubMed; concept resource: CPMC Lab Manual; concept search: PubMed
• Sensitivity tests (coded) – narrative resources: RxList, Micromedex
• Microbiology results (coded) – narrative resources: UpToDate, PubMed; concept search: PubMed

Mapping Methods

• Microbiology results to MeSH: semi-automated
• Lab test analytes to MeSH: automated, using the UMLS
• Medications to Lexicomp: natural language processing
• Lab tests to CPMC Lab Manual: manual matching
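The automated, UMLS-based mapping above might begin with normalized string matching, sketched below as a toy example. Real UMLS normalization (the lvg "norm" tool) also handles inflection and synonymy, so this understates what the authors' method can match; the MeSH names listed are illustrative, not a real extract.

```python
# A toy sketch of the string matching an automated UMLS-based mapping
# could start from: normalize terms on both sides, then look for exact
# matches. Real UMLS normalization (lvg "norm") is far more capable.
import re

def norm(term):
    """Lowercase, strip punctuation, and sort words to ignore word order."""
    words = re.sub(r"[^a-z0-9 ]", " ", term.lower()).split()
    return " ".join(sorted(words))

# Illustrative analyte names, not a real MeSH extract.
mesh_index = {norm(t): t for t in ["Amikacin", "Bilirubin", "Creatinine"]}

def map_analyte_to_mesh(analyte):
    return mesh_index.get(norm(analyte))

print(map_analyte_to_mesh("CREATININE"))  # Creatinine
```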

Results: Multiple Documents

• 100 findings and diagnoses from 20 radiology reports:
  – 100 PubMed: 100% (92,440)
  – 100 UpToDate: 82% (28.6)
  – 100 NGC: 95% (119)
  – 100 OneLook: 81% (25.8)
• 100 microbiology result terms:
  – 100 UpToDate: 94% (1.4)
  – 100 PubMed: 100% (3,328)
  – 100 PubMed (using MeSH translation): 100% (18,036)
• 100 lab test terms (using analyte names):
  – 100 Lab Tests Online: 73% (133)
  – 100 PubMed: 99% (84,633)
  – 100 PubMed (using MeSH translation): 100% (90,656)

Retrieval success is represented as percent of terms that successfully retrieved any results; numbers in parentheses indicate average numbers of results (citations, documents, topics, definitions, etc., depending on the target resource) for those searches that retrieved at least one result.
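The two numbers in these result tables can be sketched as a small metric function: the percent of terms that retrieved anything, and the mean result count among only the successful searches. The sample counts below are made up for illustration.

```python
# A sketch of the two numbers reported in the result tables: percent of
# terms returning any result, and mean result count among the successes.
# The sample result counts are invented, not the study's data.
def retrieval_success(result_counts):
    hits = [c for c in result_counts if c > 0]
    percent = 100.0 * len(hits) / len(result_counts)
    mean_hits = sum(hits) / len(hits) if hits else 0.0
    return percent, mean_hits

print(retrieval_success([0, 3, 5, 0, 12]))  # (60.0, 6.666666666666667)
```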

Uncoded versus Coded Searches

• 1,028/2,173 (47.3%) of microbiology test terms mapped to MeSH
• 940/1,041 (90.3%) of lab analytes mapped to LOINC
• 485/940 (51.6%) of LOINC analytes mapped to MeSH

Result Type   Number   Ratio
Identical         33    1.00
Slight Diff        7    1.44
Large Diff        60   29.92

Result Type   Number   Ratio
Identical         72    1.00
Slight Diff       16    1.05
Large Diff        12    3.28

Results: Single Document

• 100 medication terms:
  – 100 Lexicomp (using document identifiers): 96% (1)
• 100 laboratory test terms:
  – 100 Lab Manual (using document identifiers): 94% (1)

Retrieval success is represented as percent of terms that successfully retrieved any results; numbers in parentheses indicate average numbers of results (citations, documents, topics, definitions, etc., depending on the target resource) for those searches that retrieved at least one result.

Results: Page of Links

• 100 medication terms (using ingredient names):
  – 100 RxList: 95% [.88/.04]
  – 100 Micromedex: 100% [.89/.06]
• 94 sensitivity test terms (using antibiotic names):
  – 94 RxList: 85% [.79/.06]
  – 94 Micromedex: 97% [.96/.01]

Results for Rx List and Micromedex are difficult to quantify, because they provided heterogeneous lists of links; rather than provide link counts, we assessed the true positive and false negative rates, shown in brackets.

Micromedex versus RXList

Of 194 terms searched in both resources:

• 158 found by both
• 22 found by Micromedex but missed by RxList
• 5 found by RxList but missed by Micromedex
• 9 missed by both

(RxList total: 163; Micromedex total: 180)
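The four disjoint regions of this comparison can be cross-checked arithmetically: they should reproduce the per-resource and overall totals on the slide.

```python
# A quick cross-check of the RxList / Micromedex overlap counts: the four
# disjoint regions should reproduce the per-resource and overall totals.
found_by_both, micromedex_only, rxlist_only, missed_by_both = 158, 22, 5, 9

total = found_by_both + micromedex_only + rxlist_only + missed_by_both
rxlist_found = found_by_both + rxlist_only
micromedex_found = found_by_both + micromedex_only

print(total, rxlist_found, micromedex_found)  # 194 163 180
```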

See For Yourself!

www.dbmi.columbia.edu/cimino/2005amia-data.html

Discussion

• 7 sources, 894 terms, 11 resources, 1,592 searches
• Automated retrieval is technically possible
  – Found something 73-100% of the time
  – 12/16 experiments “succeeded” 94-100%
• Translation often unsuccessful
• Automated indexing works
• Usefulness of translation to MeSH is marginal
• Good quality when retrieving pages of links (Micromedex and RxList)
• Good quality with concept-indexed resources
• Recall/precision of document retrievals unknown
  – Need to define the question
  – Additional evaluation needed

Next Steps

• Creation of terminology management and indexing suite

• Formal analysis of qualities of answers

Acknowledgments

This work is supported in part by NLM grants R01LM07593 and R01LM07659 and NLM Training Grants LM07079-11 and P20NR007799.