Natural Language Processing: A Healthcare-Oriented Tutorial Philip Resnik, Ph.D. March 22, 2011.

Page 2

What is natural language processing?

• Natural language = naturally occurring human language
  – Excludes Java, C++, Klingon, …

• Natural language processing
  – The branch of information science that deals with natural language information (syn: NLP, human language technology) [WordNet 1.7]
  – a range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of particular tasks or applications [E. Liddy, ASIS bulletin, April 1998]

Page 3

NLP is everywhere…

Page 4

Overview

• Personal background
• Historical perspective and high-level overview
• Elements of NLP for computer-assisted coding
• Evaluating how well a system is doing
• Confidence estimation and “auto-coding”
• Understanding rule-based and statistical methods
• The importance of natural language input
• Where NLP is headed
• Some key take-aways

Page 5

Personal background

Page 7

Hiya, Hal. Can you believe Tennessee beat the Giants on Sunday?!

Page 8

• SHRDLU (Winograd)
  – Person: PICK UP A BIG RED BLOCK.
  – Computer: OK. (does it)
  …
  – Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
  – Computer: BY “IT”, I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
  – Computer: OK. (does it)
  – Person: WHAT DOES THE BOX CONTAIN?
  – Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
  – Person: WHAT IS THE PYRAMID SUPPORTED BY?
  – Computer: THE BOX.

Page 9

block n.
1. a. A solid piece of a hard substance, … having one or more flat sides. b. Such a piece used as a construction member or as a support. c. Such a piece upon which chopping or cutting is done: a butcher's block. d. Such a piece upon which persons are beheaded. e. One of a set of small wooden or plastic pieces, such as a cube, bar, or cylinder, used as a building toy. f. Printing A large amount of text. g. Sports A starting block.
2. A stand from which articles are displayed and sold at an auction: Many priceless antiques went on the block.
3. A mold or form on which an item is shaped or displayed: a hat block.
4. A substance, such as wood or stone, that has been prepared for engraving.
5. a. A pulley or a system of pulleys set in a casing. b. An engine block.
6. A bloc.
7. A set of like items, such as shares of stock, sold or handled as a unit.
8. A group of four or more unseparated postage stamps forming a rectangle.
9. Canadian A group of townships in an unsurveyed area.
10. a. A usually rectangular section of a city or town bounded on each side by consecutive streets. b. A segment of a street bounded by consecutive cross streets and including its buildings and inhabitants.
11. A large building divided into separate units, such as apartments.
12. A length of railroad track controlled by signals.
13. The act of obstructing.
14. Something that obstructs; an obstacle.
15. a. Sports An act of bodily obstruction, as of a player or ball. b. Football Legal interference with an opposing player to clear the path of the ball carrier.
16. Medicine Interruption, especially obstruction, of a normal physiological function: nerve block.
17. Psychology A sudden cessation of speech or a thought process without an immediate observable cause, sometimes considered a consequence of repression. Also called mental block.
18. Slang The human head: threatened to knock my block off.
19. A blockhead.

Page 10

Artificial examples

MIT

Stanford

Edinburgh

Page 11

LUNAR (Woods, 1973)

What is the average concentration of iron in ilmenite?

Give me references on sector zoning.

What is the average weight of all your samples?

Page 12

Why didn’t it work?

• Ambiguity and the lack of world knowledge
  – Iraqi Head Seeks Arms
• Lack of scalability / need for hand-crafting
  – How many rules does it take to “get it right”, and how do they get written and kept up to date?
• Lack of context awareness
  – The rule works in one context, but what about all the other contexts you haven’t considered?
• Brittleness in the face of natural variability
  – What happens when you get unexpected input?
• Lack of confidence assessment
  – How does the system “know” if it’s wrong?

Page 13

Speech recognition – a different approach

Observed unstructured input: … k uh k q ah dh ow z t ow m ey dx z d ux d …

Correct structure: … you should cook those tomatoes, dude …

(Embarrassingly concocted example and unrelated waveform.)

Page 14

Automatically learned pattern matching (this network recognizes a variety of valid pronunciations for “tomato”)

Page 15

Effective methods
• Despite a lack of world knowledge
• Without labor-intensive hand crafting
• Accepting of a wide variety of human variation

Page 16

NLP researchers

Speech researchers

Page 17

¡Viva la revolución! (Long live the revolution!)

• 1990s
  – DARPA forces speech recognition researchers and NLP researchers to get together.
  – “Statistical revolution” in NLP ensues.

Page 18

Example: learning how to find the structure in unstructured text

Observed unstructured text:
P. K. Agarwal and M. Sharir. Algorithmic techniques for geometric optimization. In Computer Science Today: Recent Trends and Developments, volume 1000 of Lecture Notes Comput. Sci., pages 234--253. Springer-Verlag, 1995.

Correct structure:
author = "Pankaj K. Agarwal and Micha Sharir", title = "Algorithmic Techniques for Geometric Optimization", booktitle = "Computer Science Today", pages = "234-253", year = "1995"

Source: Geng (2002)

Page 19

Source: Geng (2002)

Automatically learned model of patterns using examples of correct structure (This network recognizes a variety of valid formats for bibliography entries)

Page 20

It’s not all statistics, is it?

• Rule-based “preprocessing”
  – Dividing the text into basic units (“tokens”)
• Categories
  – Booktitle, Journal, Volume, Year…
• Hard constraints and rules
  – Years must have 4 digits and start with ‘19’ or ‘20’
• Structure of the statistical model
  – Bibliography entries are “beads on a string”
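The rule-based pieces above can be sketched in a few lines of Python. This is a minimal illustration, not Geng’s (2002) system: the tokenizer regex and the names `tokenize` and `could_be_year` are assumptions; only the year constraint comes from the slide. Note how the constraint rules out “1000” (the volume number) as a Year before any statistics are applied.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Hard constraint from the slide: four digits, starting with "19" or "20".
YEAR_RE = re.compile(r"(?:19|20)\d\d")


def tokenize(text):
    """Rule-based preprocessing: divide raw text into tokens."""
    return TOKEN_RE.findall(text)


def could_be_year(token):
    """Return True only for tokens that are allowed to carry the Year label."""
    return YEAR_RE.fullmatch(token) is not None


entry = ("P. K. Agarwal and M. Sharir. Algorithmic techniques for geometric "
         "optimization. In Computer Science Today: Recent Trends and "
         "Developments, volume 1000 of Lecture Notes Comput. Sci., "
         "pages 234--253. Springer-Verlag, 1995.")

tokens = tokenize(entry)
year_candidates = [t for t in tokens if could_be_year(t)]
print(year_candidates)  # ['1995']
```

A statistical labeler would then only consider the Year label for tokens that pass `could_be_year`, leaving every other decision to the learned model.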

Page 21

Why do statistical methods work well?

• Discovering the patterns in the data
  – Especially from lots of correctly answered cases
• Creating “soft” constraints
  – (e.g., Year probably signals the end of a bibliography entry, but it doesn’t have to)
• Graceful handling of variability, ill-formedness
  – System recognizes unforeseen input as less likely, rather than treating it as unprocessable.
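To make “soft” constraints concrete, here is a minimal sketch assuming a label-transition model estimated from a few hypothetical training sequences, with add-one (Laplace) smoothing swapped in as the smoothing method. The labels, data, and `transition_prob` are illustrative only, not taken from any particular system. A transition never seen in training (Year followed by Pages) comes out less likely, not impossible.

```python
from collections import Counter

# Hypothetical training data: field-label sequences for bibliography entries.
TRAIN = [
    ["author", "title", "booktitle", "pages", "year"],
    ["author", "title", "journal", "volume", "pages", "year"],
    ["author", "title", "booktitle", "year"],
]
LABELS = ["author", "title", "booktitle", "journal", "volume", "pages", "year", "END"]

pair_counts = Counter()
prev_counts = Counter()
for seq in TRAIN:
    for prev, nxt in zip(seq, seq[1:] + ["END"]):
        pair_counts[(prev, nxt)] += 1
        prev_counts[prev] += 1


def transition_prob(prev, nxt, alpha=1.0):
    """Add-alpha smoothed estimate of P(next label | previous label).

    Unseen transitions get a small, nonzero probability: a "soft" constraint.
    """
    return (pair_counts[(prev, nxt)] + alpha) / (prev_counts[prev] + alpha * len(LABELS))


print(round(transition_prob("year", "END"), 3))    # 0.364 (seen 3 times)
print(round(transition_prob("year", "pages"), 3))  # 0.091 (never seen, still possible)
```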

Page 22

The statistical revolution in NLP

[Figure: percentage of “statistical” papers at the Annual Meeting of the Association for Computational Linguistics, 1985–2005; y-axis 0%–100%.]

Sources: graph adapted from Church, K. (2003) “Speech and Language Processing: Where have we been and where are we going,” Eurospeech, Geneva, Switzerland. Green circle data have been added from figures in Cardie and Mooney (1999).

Page 23

Driving progress: evaluation

• Up to the mid-1980s, an NLP system was evaluated by watching a demonstration.
• Over the last 20 years, NLP systems have been subjected to more rigorous evaluation and measurement.
• This has been a primary driver of progress in the field.

Page 24

Typical evaluation in NLP

Method      Precision   Recall   F-measure
Baseline    34.5        42.5     38.1
Method 2    59.4        59.4     59.4 (+55.9%)
Method 3    68.0        66.6     67.3 (+76.6%)

• Baseline: obvious, simple technique or previous state of the art
• Precision and Recall: multiple relevant measures, often capturing a tradeoff
• F-measure: single “figure of merit” for cross-system comparison
• Method 2, Method 3: alternative methods or systems being evaluated
• Percentages in parentheses: each method’s relative F-measure improvement over the baseline
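The F-measure column and the parenthesized gains can be recomputed from precision and recall. This small check assumes the standard F = 2PR/(P+R) and computes gains from F-measures rounded to one decimal place, which reproduces the table’s percentages; `f_measure` and `relative_gain` are illustrative names.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)


def relative_gain(score, baseline):
    """Percentage improvement of score over baseline."""
    return 100 * (score - baseline) / baseline


ROWS = {
    "Baseline": (34.5, 42.5),
    "Method 2": (59.4, 59.4),
    "Method 3": (68.0, 66.6),
}

# Round F to one decimal, as in the table, before computing gains.
f_scores = {name: round(f_measure(p, r), 1) for name, (p, r) in ROWS.items()}
for name, f in f_scores.items():
    gain = relative_gain(f, f_scores["Baseline"])
    print(f"{name}: F = {f:.1f} ({gain:+.1f}%)")
# Baseline: F = 38.1 (+0.0%)
# Method 2: F = 59.4 (+55.9%)
# Method 3: F = 67.3 (+76.6%)
```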

Page 25

Source: Church, K. (2003) “Speech and Language Processing: Where have we been and where are we going,” Eurospeech, Geneva, Switzerland.

Page 26

Page 27

What is NLP today?
• Applications and tasks, not “understanding”
• Finding the structure in unstructured text
• Learning to make good predictions, often from lots of examples that include the correct answer
• Combining knowledge sources with data-driven techniques


Page 29

CAC Landscape

[Diagram components: Traditional Coding; NLP Engine; Routing; Coder Review; Billing]

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside…

Page 30

Relevant steps from an NLP perspective

• Context: demographics, codeset, payer specifics…
• Identifying document regions
• Identifying information units
• Combining information units
• Creating an internal representation
• Mapping to/prediction of codes
• Coding logic

Note: I am not describing any specific system! Examples are constructed for this presentation.

Page 31

Identifying document regions

Mrs. Zoe is a 57-year-old female who has been having chest pains which she describes as a sharp pain, located substernally occurring at night when she tries to lie on her right side. She has not had any exertional type chest discomfort and the discomfort in her chest will last as long as she is lying in that position. …

Mrs. Zoe exercises daily, walking one and a half to three miles and also uses some weights. Her mother is age 81 and has a history of angina and congestive heart failure along with atrial fibrillation.

Weight is 150 pounds, stable. No history of thyroid dysfunction. No renal dysfunction. No gastrointestinal symptoms. No asthma, wheezing, or lung problem. Is having menopausal symptoms. No claudication. Neurologic is negative.

[Region labels on the slide: History of present illness; Past medical history; Family history; Review of systems]

Page 32

Some other kinds of regions

• Negated
  – No evidence of pneumonia
• Equivocal or modal
  – … could represent atelectasis…
  – … likely fracture…
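A minimal NegEx-style sketch of labeling these regions, assuming short hypothetical trigger lists seeded with the slide’s examples. `classify_region` is an illustrative name, and real systems use much larger curated trigger lists and limit how far a trigger’s scope extends.

```python
import re

# Hypothetical trigger lists, seeded with the slide's examples.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]
EQUIVOCAL_TRIGGERS = ["could represent", "likely", "possible", "probably"]


def classify_region(sentence, finding):
    """Label a finding in a sentence as 'negated', 'equivocal', or 'affirmed'.

    Simplification: a trigger counts if it appears anywhere before the finding
    in the same sentence.
    """
    text = sentence.lower()
    pos = text.find(finding.lower())
    if pos < 0:
        raise ValueError(f"{finding!r} not found in sentence")
    before = text[:pos]

    def has_trigger(triggers):
        return any(re.search(rf"\b{re.escape(t)}\b", before) for t in triggers)

    if has_trigger(NEGATION_TRIGGERS):
        return "negated"
    if has_trigger(EQUIVOCAL_TRIGGERS):
        return "equivocal"
    return "affirmed"


print(classify_region("No evidence of pneumonia.", "pneumonia"))            # negated
print(classify_region("This could represent atelectasis.", "atelectasis"))  # equivocal
print(classify_region("There is a likely fracture.", "fracture"))           # equivocal
print(classify_region("Right lower lobe pneumonia.", "pneumonia"))          # affirmed
```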

Page 33

Sentence breaking

Mrs. Zoe is a 57-year-old female who has been having chest pains which she describes as a sharp pain, located substernally occurring at night when she tries to lie on her right side.
She has not had any exertional type chest discomfort and the discomfort in her chest will last as long as she is lying in that position.
…
Mrs. Zoe exercises daily, walking one and a half to three miles and also uses some weights.
Her mother is age 81 and has a history of angina and congestive heart failure along with atrial fibrillation.
Weight is 150 pounds, stable.
No history of thyroid dysfunction.
No renal dysfunction.
No gastrointestinal symptoms.
No asthma, wheezing, or lung problem.
Is having menopausal symptoms.
No claudication.
Neurologic is negative.
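Sentence breaking is harder than splitting on periods: “Mrs.” ends with a period but does not end a sentence. A minimal rule-based sketch, assuming a tiny hypothetical abbreviation list (`ABBREVIATIONS` and `split_sentences` are illustrative names):

```python
import re

# Hypothetical abbreviation list; a real system needs a much longer one.
ABBREVIATIONS = {"mrs.", "mr.", "dr.", "ms."}


def split_sentences(text):
    """Split after '.', '?' or '!' followed by whitespace, unless the period
    ends a known abbreviation such as 'Mrs.'."""
    sentences, start = [], 0
    for match in re.finditer(r"[.?!]\s+", text):
        last_word = text[start:match.start() + 1].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Mrs." does not end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


note = ("Mrs. Zoe exercises daily, walking one and a half to three miles. "
        "No renal dysfunction. Neurologic is negative.")
for s in split_sentences(note):
    print(s)
```

Statistical sentence breakers learn the same kind of decision from annotated examples instead of a fixed list.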

Page 34

Morphological analysis

Mrs. Zoe is a 57-year-old female who has been having chest pains which she describes as a sharp pain, located substernally occurring at night when she tries to lie on her right side.

• pains = pain + PLURAL
  – In this context, pains is the same as pain.
  – Sometimes singular vs. plural matters, e.g., cyst is different from cysts.
• substernally = sub (below) + sternal (sternum) + ly (related to)
• English morphology happens to be pretty simple. It’s not as simple for other languages.
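A minimal suffix-stripping sketch of the “pains = pain + PLURAL” analysis, assuming a hypothetical two-word lexicon and plural suffixes only (derivations like sub + sternal + ly would need more rules). The analyzer keeps the PLURAL feature rather than discarding it, so later steps can decide whether it matters, as it does for cyst vs. cysts.

```python
# Hypothetical mini-lexicon of known stems; a real system would use a full
# medical lexicon or a morphological analyzer.
LEXICON = {"pain", "cyst"}

# Inflectional suffixes only, tried longest first.
SUFFIXES = [("es", "PLURAL"), ("s", "PLURAL")]


def analyze(word):
    """Return (stem, features), e.g. 'pains' -> ('pain', ['PLURAL'])."""
    w = word.lower()
    if w in LEXICON:
        return w, []
    for suffix, feature in SUFFIXES:
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and stem in LEXICON:
            return stem, [feature]
    return w, []  # unknown word: leave unanalyzed


print(analyze("pains"))  # ('pain', ['PLURAL'])
print(analyze("cysts"))  # ('cyst', ['PLURAL'])
```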

Page 35

Approaches to identifying/combining information units

Page 36

Creating internal representations of evidence

Symptom: pain
Degree: sharp
Loc: chest
LocMod: substernal
Source: HPI

Chest pains which she describes as a sharp pain...
Sharp pain in her chest…
Sharp chest pain…
Chest pain which feels sharp…
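The point of an internal representation is that different wordings collapse to the same structure. A minimal sketch, assuming a Python dataclass whose fields mirror the slots above, filled by a deliberately naive keyword matcher (`Evidence` and `extract` are hypothetical names; real extractors rely on parsing and terminologies, not substring tests):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence, with slots mirroring the slide."""
    symptom: str
    degree: Optional[str] = None
    loc: Optional[str] = None
    loc_mod: Optional[str] = None
    source: Optional[str] = None


def extract(phrase: str, source: str) -> Optional[Evidence]:
    """Toy keyword-based slot filler (hypothetical)."""
    text = phrase.lower()
    if "pain" not in text:
        return None
    return Evidence(
        symptom="pain",
        degree="sharp" if "sharp" in text else None,
        loc="chest" if "chest" in text else None,
        loc_mod="substernal" if "substernal" in text else None,
        source=source,
    )


paraphrases = [
    "Chest pains which she describes as a sharp pain",
    "Sharp pain in her chest",
    "Sharp chest pain",
    "Chest pain which feels sharp",
]
frames = {extract(p, "HPI") for p in paraphrases}
print(frames)  # a single distinct frame for all four wordings
```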

Page 37

An aside: words, terms, and meanings

• Multi-word expressions are logical units
  – Myocardial infarction
• Synonymy: many expressions, one meaning
  – Myocardial infarction, MI, heart attack
• Ambiguity: one expression, many meanings
  – Neck, head, depression
• These issues can be addressed using knowledge-based methods (e.g., terminologies) and/or statistical methods.
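A minimal knowledge-based sketch of all three issues, assuming a tiny hypothetical terminology with invented concept labels: longest-match lookup treats “myocardial infarction” as one unit, synonyms map to the same concept, and an ambiguous term returns several candidates that still need disambiguation.

```python
import re

# Hypothetical mini-terminology mapping surface forms to concept labels.
# The concept labels are invented; a real system would use a curated terminology.
TERMINOLOGY = {
    "myocardial infarction": ["MYOCARDIAL_INFARCTION"],
    "mi": ["MYOCARDIAL_INFARCTION"],
    "heart attack": ["MYOCARDIAL_INFARCTION"],
    "depression": ["MOOD_DISORDER", "ANATOMICAL_DEPRESSION"],  # ambiguous
}
MAX_TERM_WORDS = max(len(term.split()) for term in TERMINOLOGY)


def lookup_terms(text):
    """Greedy longest-match lookup, so multi-word expressions match as units."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in TERMINOLOGY:
                found.append((phrase, TERMINOLOGY[phrase]))
                i += n
                break
        else:  # no term starts at position i
            i += 1
    return found


print(lookup_terms("Prior MI (heart attack), no recurrence."))  # synonyms: same concept
print(lookup_terms("Signs of depression"))                      # ambiguous: two candidates
```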

Page 38

Page 39

Copyright © 2007 by the American Health Information Management Association. All rights reserved.

Another aside: ontologies
• Ontologies are (typically hierarchical) specifications of concepts and relationships between concepts, which encode knowledge about a domain.
• “Coding” can also map from language to concepts in an ontology.
• Ontologies support limited forms of reasoning and inference.
• This is not “understanding” in any usual sense of the term.

[Figure: small ontology fragment with the concepts plasmacytoma, cancer, disease, prednisone, steroid, and drug, linked by a “treats” relation.]
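A minimal sketch of the “limited forms of reasoning” an ontology supports, assuming a hypothetical reconstruction of the figure: is-a links prednisone → steroid → drug and plasmacytoma → cancer → disease, plus one “treats” link from prednisone to plasmacytoma. The exact edges of the original figure are not recoverable from the text.

```python
# Hypothetical reconstruction of the slide's small ontology.
IS_A = {
    "prednisone": "steroid",
    "steroid": "drug",
    "plasmacytoma": "cancer",
    "cancer": "disease",
}
TREATS = {("prednisone", "plasmacytoma")}


def is_a(concept, ancestor):
    """Follow is-a links upward; a concept counts as a kind of itself."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def some_treats(drug_class, disease_class):
    """Limited inference: does any known X treat any known Y, where X is a
    kind of drug_class and Y is a kind of disease_class?"""
    return any(is_a(x, drug_class) and is_a(y, disease_class) for x, y in TREATS)


print(is_a("prednisone", "drug"))        # True
print(is_a("plasmacytoma", "drug"))      # False
print(some_treats("steroid", "cancer"))  # True
```

Nothing here “understands” prednisone or cancer; the program only follows stored links, which is exactly the slide’s caveat.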

Page 40

Back from our asides: Mapping to/predicting codes

• Rule-based matching
  – Match representation → assign code
• Statistical prediction
  – Statistical prediction of code based on aggregated data

Symptom: pain
Degree: sharp
Loc: chest
LocMod: substernal
Source: HPI

Page 41

Statistical prediction

• One expert’s judgment: 719.07
• Will another expert agree? 824.8
• Lots of evidence → reliable conclusions

[Figure: many expert judgments pooled together; 824.8 appears most often, alongside 719.07 and 824.9.]
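A minimal sketch of prediction from aggregated judgments, assuming (hypothetically) that ten coders’ codes for similar evidence were pooled. The counts are invented for illustration, not read off the slide, and `predict_code` is an illustrative name. A single judgment yields “100%”, but on a sample of one there is no way to tell whether another expert would agree.

```python
from collections import Counter


def predict_code(judgments):
    """Most frequent code and its relative frequency, a simple estimate of
    Pr(code | evidence) from aggregated judgments."""
    counts = Counter(judgments)
    code, n = counts.most_common(1)[0]
    return code, n / len(judgments)


# Hypothetical tallies of codes assigned to similar evidence by many coders.
pooled = ["824.8"] * 7 + ["719.07"] * 2 + ["824.9"]

print(predict_code(["719.07"]))  # ('719.07', 1.0): one judgment only
print(predict_code(pooled))      # ('824.8', 0.7)
```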

Page 42

Machine learning: an example

Tom Mitchell (1997), Machine Learning, McGraw-Hill

Cf. C. Sims et al., Predicting cesarean delivery with decision tree models, Annual Meeting of the Society for Maternal-Fetal Medicine No11 (31/01/2000) 2000, vol. 183, no 5, pp. 1049-1231 (14 ref.), pp. 1198-1206

If

normal fetal presentation and no previous C-section and first pregnancy and no fetal distress and birth weight > 3349g

Then

Predict C-section with likelihood of 22%

Page 43

Machine learning: an example

If

normal fetal presentation and previous C-section

Then

Predict C-section with likelihood of 39%

(regardless of first pregnancy, fetal distress, birth weight)
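The two learned rules from these slides can be written directly as code. This is a minimal sketch: the function and parameter names are illustrative, and because the rest of the learned decision tree is not shown, every other case returns `None`.

```python
from typing import Optional


def csection_likelihood(normal_presentation: bool, previous_csection: bool,
                        first_pregnancy: bool, fetal_distress: bool,
                        birth_weight_g: float) -> Optional[float]:
    """Return the likelihood given by the two rules on these slides,
    or None for cases those two rules do not cover."""
    if normal_presentation and previous_csection:
        # Second rule: applies regardless of the remaining attributes.
        return 0.39
    if (normal_presentation and not previous_csection and first_pregnancy
            and not fetal_distress and birth_weight_g > 3349):
        return 0.22
    return None  # other branches of the learned tree are not shown here


print(csection_likelihood(True, False, True, False, 3500))  # 0.22
print(csection_likelihood(True, True, False, True, 2800))   # 0.39
```

Unlike a hand-written rule, these thresholds (3349 g) and likelihoods were learned from data.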

Page 44

Machine learning: another example

[Figure: labeled examples plotted on two axes, Evidence 1 and Evidence 2, with an unlabeled point (“??”) to be classified.]

(Now do this in 5,000 dimensions…)
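One simple way to learn a boundary between two classes of evidence is the perceptron, swapped in here as the simplest linear classifier; the slide does not say which method it depicts. The data points are hypothetical. The same update rule works unchanged with 5,000-dimensional vectors.

```python
# Hypothetical 2-D toy data: (evidence_1, evidence_2) -> +1 or -1.
DATA = [((2, 3), 1), ((3, 3), 1), ((3, 2), 1),
        ((0, 0), -1), ((1, 0), -1), ((0, 1), -1)]


def train_perceptron(data, max_epochs=100):
    """Learn a line w.x + b = 0 separating the classes (perceptron rule)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified or on the line
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


w, b = train_perceptron(DATA)
print(w, b)                   # [1.0, 1.0] -3.0
print(predict(w, b, (4, 4)))  # 1
```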

Page 45

Coding logic

• General logic
  – Pertinent vs. incidental findings
  – Choice of primary code
  – Code combination
• Client- or payer-specific logic

Codes (along with the evidence that produced them)


Page 47

How well are we doing?

Recall: did you assign all the codes you should have?

Precision: did you assign any extra codes you shouldn’t have?

[Figure: recall and precision compared across three approaches: hand-crafted rules, matching terms in terminologies, machine learning.]

Page 48

How NLP people evaluate NLP systems

• Evaluation by demonstration
• Evaluation by inspection of examples
• Evaluation by unscripted demonstration
• Evaluation on data using a figure of merit
• Evaluation on test data using an automatic metric
• Evaluation on common test data
• Evaluation on common, unseen test data

Analysis and insights driving improvement

Page 49

CAC Evaluation Landscape

[Diagram components: Transcription; NLP Engine; Routing; Coder Review; Billing; Traditional Coding; QA; Audit (traditional evaluation methods)]

Page 50

CAC Evaluation Landscape

• What auditing (post hoc evaluation) can’t provide:
  – Replicability: ability for you to implement my system and verify that we get exactly the same result
  – Comparability: ability to evaluate two different systems fairly against each other
  – Tracking: ability to evaluate one system against itself at two points in time
  – Automation: ability to perform rapid “devtest” evaluations
  – Fidelity: avoiding the “benefit of the doubt” effect, which inflates inter-coder agreement estimates

The NLP community has converged on a standard approach to these problems…

Page 51

Gold Standards in NLP Technology Evaluation

“Gold standard”: annotated test set
• Create a representative set of test items
  – Make sure it’s not used for development!
• Have multiple annotators independently provide their correct answers
• Adjudicate inter-annotator disagreements
  – Consensus by discussion, voting, …
• Define the “upper bound” on performance as pre-adjudication inter-annotator agreement.
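A minimal sketch of two of these steps, assuming three hypothetical annotators labeling five items with placeholder labels: mean pairwise agreement before adjudication (the slide’s upper bound) and adjudication by majority vote, with `None` marking items that need discussion. Simple percent agreement is used here; chance-corrected measures such as Cohen’s kappa are a common alternative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels from three annotators on the same five test items.
ANNOTATIONS = {
    "A": ["p", "q", "r", "s", "q"],
    "B": ["p", "q", "s", "s", "q"],
    "C": ["p", "t", "r", "s", "q"],
}


def agreement(a, b):
    """Fraction of items on which two annotators gave the same answer."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_agreement(annotations):
    """Pre-adjudication agreement, averaged over all annotator pairs."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)


def adjudicate_by_vote(annotations):
    """Majority vote per item; None marks items that need discussion."""
    gold = []
    for answers in zip(*annotations.values()):
        label, votes = Counter(answers).most_common(1)[0]
        gold.append(label if votes > len(answers) / 2 else None)
    return gold


print(round(mean_pairwise_agreement(ANNOTATIONS), 3))  # 0.733
print(adjudicate_by_vote(ANNOTATIONS))                 # ['p', 'q', 'r', 's', 'q']
```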

Page 52

Gold ≠ Perfect

• NLP gold standards accommodate legitimate gray areas
  – Translation: 阿富汗地震灾民开始重建家园 has several different English translations:
    earthquake victims in afghanistan start to rebuild homeland
    earthquake victims start reconstruction in afghanistan
    afghans begin restoring home after quake
    afghan earth quake victims begin to rebuild their homes
  – Sanchez went to the bank for a loan (“bank”: location or organization?)
• NLP gold standard evaluations define human upper bounds:
  “[We] estimate an upper bound on performance by estimating the ability for human judges to agree with one another” (Gale, Church, and Yarowsky, 1992)

Page 53

Intrinsic and Extrinsic NLP Evaluations

[Diagram components: Transcription; NLP Engine; Routing; Coder Review; Billing; Traditional Coding; QA]

• Intrinsic evaluation: language technology evaluation standards
  – Gold standard
  – Upper bounds: inter-annotator agreement, intra-annotator agreement
• Extrinsic evaluation: Audit and QA (traditional evaluation methods)

Page 54

Intrinsic and Extrinsic NLP Evaluations: Example

…trauma…

• Intrinsic: How accurately do we resolve ambiguous terms?
  – C0043251: Injuries and Wounds: Wounds and Injuries: trauma: traumatic disorders: Traumatic injury
  – C0597316: Shock; psychological shock
• Extrinsic: How well do we facilitate searches for clinical information?

Page 55

Measures of Effectiveness

• Good measures of effectiveness should
  – Capture some aspect of what the user wants (pertinent, valid, meaningful)
  – Have predictive value for other situations (different test data, different coders)
  – Be easily replicated by others
  – Be expressed as a single number, which allows two systems to be easily compared

Page 56

Some Principles for Good Language Technology Evaluation

• Pertinent evaluation metrics
• Replicable evaluation metrics
• Reporting all relevant experimental parameters
• Establishing upper bounds on performance
• Establishing lower bounds on performance
• Testing statistical significance
• Never allowing developers to see the test data

Page 57

Evaluation using recall and precision

• Evaluation metrics
  – Precision
  – Recall
  – F = 2PR/(P+R)

Evaluee’s output: Precision P = 3/5, Recall R = 3/6
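The slide’s numbers can be reproduced with sets of codes. This sketch assumes hypothetical placeholder codes chosen to match the slide’s counts: the evaluee outputs 5 codes, the answer key has 6, and 3 overlap.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall, and F = 2PR/(P+R) for sets of assigned codes."""
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical codes: 6 correct answers, 5 system outputs, 3 in common.
gold = {"a", "b", "c", "d", "e", "f"}
predicted = {"a", "b", "c", "x", "y"}

p, r, f = precision_recall_f(predicted, gold)
print(p, r, round(f, 3))  # 0.6 0.5 0.545
```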

Page 58

Recall/Precision tradeoffs

[Figure: precision (y-axis, 0–1) plotted against recall (x-axis, 0–1) for system1, system2, and system3.]

Page 59

Auto-coding/Accuracy tradeoffs

• It is meaningless to report recall without reporting precision (or vice versa).
  – Any system can easily get high recall by sacrificing precision.
• For the same reason, it is meaningless to report direct-to-bill volumes without also reporting how accurate the codes are.
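A quick illustration of why recall alone is meaningless, assuming a hypothetical 100-code codeset and 3 correct codes: a “system” that assigns every code gets perfect recall with terrible precision, while a cautious system shows the opposite tradeoff.

```python
def precision_recall(predicted, gold):
    """Precision and recall for a set of assigned codes against the correct set."""
    correct = len(predicted & gold)
    return correct / len(predicted), correct / len(gold)


CODESET = {f"code{i}" for i in range(100)}  # hypothetical codeset of 100 codes
gold = {"code1", "code2", "code3"}          # hypothetical correct codes

print(precision_recall(CODESET, gold))             # (0.03, 1.0): assign everything
print(precision_recall({"code1", "code2"}, gold))  # (1.0, 0.6666666666666666)
```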

Page 60

Another metric: coder change rates

• To measure NLP engine accuracy
  – Compare the NLP engine output to the final codes assigned during human QA/reviewing.
  – Assume any changed code was incorrect.
  – Assume any unchanged code was correct.
  – Accuracy is simply #correct / (#correct + #incorrect)
  – “Change rate” = 1 − accuracy
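The change-rate computation above, sketched under its stated assumptions. The engine and post-review codes are hypothetical placeholders, and the lists are assumed to be aligned one slot per code; a real comparison would also have to handle codes that reviewers add or delete.

```python
def change_rate_accuracy(engine_codes, final_codes):
    """Accuracy and change rate from coder review, assuming unchanged codes
    are correct and changed codes are incorrect."""
    correct = sum(1 for e, f in zip(engine_codes, final_codes) if e == f)
    incorrect = len(engine_codes) - correct
    accuracy = correct / (correct + incorrect)
    change_rate = incorrect / (correct + incorrect)  # equals 1 - accuracy
    return accuracy, change_rate


# Hypothetical engine output and post-review codes for five code slots.
engine = ["A", "B", "C", "D", "E"]
final = ["A", "B", "X", "D", "E"]
print(change_rate_accuracy(engine, final))  # (0.8, 0.2)
```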

Page 61

Limitations of coder change rates

• One limitation we would expect
  – “Benefit of the doubt” effects lead to inflated estimates of coding accuracy. (Morris et al. 2000, Nossal et al. 2006)
  – This is a problem with all post hoc methods, including formal auditing.
• Another limitation we would not expect
  – It turns out that coders sometimes change correct auto-generated codes to incorrect codes.
  – This is true for both CPT and ICD codes.
  – This is true even for good coders.
  – (See Stoner et al. 2006.)


Page 63

Confidence: A Paradox

The more reliable a system is, the more we trust it.

But the more we trust it, the more important it is that the system itself alert us when we shouldn’t trust it!

Page 64

An automatic translation system’s output (from Arabic):

U.S. is a terrorist state and says Syria.

Seems pretty good, right?

The real translation: U.S. says Syria is a terrorist state. !!!

Page 65

Principled confidence measures

• The more accurate the automated technology gets, the more people trust it.

• Therefore, the more important it is for the system to assess accurately for itself whether or not its decisions require human review.

• Computer assisted coding (CAC) systems need a principled basis for evaluating their own correctness at run time, in order to avoid representing sub-par coding results as trustworthy.

Page 66

Some Possibilities

• Auditor = Coder: using internally-driven engine confidence
• Rules of thumb: “When the engine assigns these codes, they’re generally right…”
• Table driven: allow only valid CPT/ICD combinations
• Confidence assessment (uses CPT/ICD only): Pr(Correct | CPT, ICD)
• Situated confidence assessment (uses context): Pr(Correct | CPT, ICD, evidence, steps)
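A minimal sketch of the “confidence assessment” option, estimating Pr(Correct | CPT, ICD) from past review outcomes and routing on a threshold. The review history, the placeholder pair `CPT_X`/`ICD_Y`, the 0.8 threshold, and the use of add-one smoothing are all assumptions for illustration; only the 70553/784.2 pair appears elsewhere in this tutorial.

```python
from collections import defaultdict

# Hypothetical review history: (CPT, ICD, engine output judged correct?)
HISTORY = ([("70553", "784.2", True)] * 9 + [("70553", "784.2", False)]
           + [("CPT_X", "ICD_Y", True), ("CPT_X", "ICD_Y", False)])


def estimate_confidence(history, alpha=1.0):
    """Return a function estimating Pr(Correct | CPT, ICD) from past reviews.

    Add-alpha smoothing keeps rarely seen pairs near 0.5 instead of
    trusting them on one or two examples.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for cpt, icd, ok in history:
        total[(cpt, icd)] += 1
        correct[(cpt, icd)] += int(ok)

    def confidence(cpt, icd):
        key = (cpt, icd)
        return (correct.get(key, 0) + alpha) / (total.get(key, 0) + 2 * alpha)

    return confidence


def route(confidence, cpt, icd, threshold=0.8):
    """Send sufficiently confident output to billing; the rest to a coder."""
    return "billing" if confidence(cpt, icd) >= threshold else "coder review"


conf = estimate_confidence(HISTORY)
print(round(conf("70553", "784.2"), 3), route(conf, "70553", "784.2"))  # 0.833 billing
print(conf("CPT_X", "ICD_Y"), route(conf, "CPT_X", "ICD_Y"))            # 0.5 coder review
```

A situated version would condition on more than the code pair, for example by adding features of the evidence and processing steps to the lookup key or to a learned model.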

Page 67

Engine Coding versus Confidence Assessment

[Diagram: metadata evidence and language evidence feed the NLP Engine (“Coder”); Routing (“Auditor”) sends sufficiently confident results on to Billing.]

• “Coder”: Looking at all the evidence in the chart, which codes are the best choice?
• “Auditor”: Looking at chosen codes and how the evidence led to them, how confident are we that those codes are correct?

Page 68

An Analogy: Mail Routing

E-mail header:
Subject: B!G OPP@RTUN1TY
To: [email protected], [email protected], alice@foo,com, [email protected], ……

E-mail body:
W A R N I N G !
IF YOU RECEIVE AN E-MAIL WITH SUBJECT "A VIRTUAL CARD FOR YOU" DO NOT OPEN !!!
IT CONTAINS A VERY, VERY DANGEROUS VIRUS.
IT WAS CLASSIFIED YESTERDAY BY MICROSOFT AND MCAFEE AS THE MOST DESTRUCTIVE VIRUS OF ALL TIMES. VIRUS DESTROY HARD DISK. WITHOUT POSSIBILITY TO REPAIR. PLEASE SEND THIS MESSAGE TO EVERYBODY YOU KNOW !

[Diagram: evidence → Mail Category Labeling → Routing; sufficiently confident decisions go straight through.]

• Labeling: Looking at all the evidence in the e-mail message, should we code this mail as “good” or as “spam”?
• Routing: Looking at the choice that was made (“good” or “spam”) and at the evidence leading to that choice, how confident are we that the choice is correct?

If your mail’s going straight here, your routing had better be trustworthy!


Page 70

Understanding rule-based and statistical methods

• The terms “rule based” and “statistical” show up frequently when people are trying to assess NLP-based CAC solutions.
• How these language technology approaches are characterized seems to confuse a lot of people.
• The goal of this part: to clarify what these terms mean, so that potential users of the technology understand how they relate to each other and know what questions to ask.

Page 71

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoid sinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis.

history - head plasmacytoma

0 - skull mass
0 - head {abnormality}
0 - head plasmocytoma

CPT: 70553
ICD: 784.2

[Figure labels: type, modifier, bodypart, diag/problem]

[Processing stages: delimit, identify, normalize, extract, predict, apply_logic, assess]

Page 72

Let’s start with an exercise

• We’re building a system to recognize spoken medical terms.

• Let’s handle two words:
– infarction
– infection

• What does the system need to “know”?

Page 73

if the system detects

in FARK shun (or en for in, or shin for shun)

then it should recognize the term ‘infarction’

Did you remember FACK?

If not, there’s a doctor in Hahvid Yahd who’s upset with you.

FACK???

Page 74

RULE

if the system detects

PATTERN (antecedent): (in or en) (FARK or FACK) (shun or shin)

then

ACTION/CONCLUSION (consequent): it should recognize the term ‘infarction’

Page 75

if the system detects (in or en) (FECK) (shun or shin)

then it should recognize the term ‘infection’
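The rules on the two preceding slides can be written down almost verbatim as pattern/conclusion pairs. The sketch below is a minimal illustration, assuming the recognizer has already turned the audio into a sequence of syllable-like chunks such as `["in", "FARK", "shun"]`; the chunk spellings, `RULES`, and `recognize` are hypothetical and not taken from any real speech system.

```python
# Hypothetical rule table: each rule is (PATTERN, CONCLUSION).
# A PATTERN is a sequence of slots; each slot lists the chunks it accepts.
RULES = [
    ((("in", "en"), ("FARK", "FACK"), ("shun", "shin")), "infarction"),
    ((("in", "en"), ("FECK",), ("shun", "shin")), "infection"),
]


def matches(pattern, chunks):
    """True if every slot in the pattern accepts the chunk at the same position."""
    return len(pattern) == len(chunks) and all(
        chunk in slot for slot, chunk in zip(pattern, chunks)
    )


def recognize(chunks):
    """Fire every rule whose antecedent matches; return the consequents."""
    return [term for pattern, term in RULES if matches(pattern, chunks)]


print(recognize(["en", "FACK", "shin"]))  # ['infarction']
print(recognize(["in", "FECK", "shun"]))  # ['infection']
print(recognize(["in", "FEK", "shun"]))   # []  no rule anticipated this variant
```

The last call shows the brittleness listed under rule-based methods: a pronunciation that no one wrote a rule for produces no answer at all.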

Page 76

in FECK (FARK?) shun

What do we do now?

Use context… a severe ___________

… a previous myocardial ___________

Did you remember “related to”?

___________ related to…
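Using context can also be expressed as hand-written rules. A minimal sketch, assuming the acoustic step could not tell FECK from FARK and therefore passes both candidate terms along; the rule table and `disambiguate` are hypothetical, and the `severe` rule is an illustrative assumption rather than a clinical fact.

```python
# Both terms remain possible after the acoustic step (FECK vs. FARK).
CANDIDATES = ["infection", "infarction"]

# Hypothetical hand-written context rules: previous word -> term to choose.
CONTEXT_RULES = {
    "myocardial": "infarction",
    "severe": "infection",
}


def disambiguate(previous_word, candidates):
    """Pick one candidate using a context rule, or return None if no rule fires."""
    choice = CONTEXT_RULES.get(previous_word.lower())
    if choice in candidates:
        return choice  # all-or-nothing: the rule carries no notion of confidence
    return None


print(disambiguate("myocardial", CANDIDATES))  # infarction
print(disambiguate("severe", CANDIDATES))      # infection
print(disambiguate("acute", CANDIDATES))       # None
```

Each rule makes an all-or-nothing decision, and the right-context cue “___ related to” would need yet another rule that inspects the following words, which is how rule sets keep growing.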

Page 77

Rule-based methods

• Encode expert knowledge

• Are generally human-readable

• Historically have had trouble with the variety and variability of real-world language use

• Make all-or-nothing decisions, rather than encoding gradations or confidence (in a principled way)

• Are challenging to support

Page 78

Rule-based NLP

Eight documents??!! (Eight documents?!)

Hirschman and Sager (1976)

Page 79

1970–1983: “natural language understanding”

1983–1993: “the return of empiricism… probabilistic models throughout speech and language processing”

1994–1999: “the field comes together”

2000–2008: “the rise of machine learning”

Jurafsky and Martin (2009), Speech and Language Processing

Page 80

Machine learning: a very simple example

… a severe ___________

… a previous myocardial ___________

___________ related to…

[Figure: for each context, sample observations of the word filling the blank (infection or infarction), and the probabilities estimated from them]

.90 infection, .10 infarction

.0001 infection, .9999 infarction

.05 infection, .95 infarction
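A minimal sketch of the same idea done statistically: estimate P(term | context) by counting how often each term filled the blank after a given context word, then pick the most probable candidate and keep its probability as a confidence. The observation counts are invented for illustration and are not the data behind the figures on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical training observations: (context word, term that filled the blank).
observations = (
    [("severe", "infection")] * 18
    + [("severe", "infarction")] * 2
    + [("myocardial", "infarction")] * 40
)


def train(pairs):
    """Estimate P(term | context) by relative frequency (maximum likelihood)."""
    counts = defaultdict(Counter)
    for context, term in pairs:
        counts[context][term] += 1
    model = {}
    for context, term_counts in counts.items():
        total = sum(term_counts.values())
        model[context] = {term: n / total for term, n in term_counts.items()}
    return model


def decide(model, context, candidates):
    """Return the most probable candidate and its probability (the confidence)."""
    dist = model.get(context, {})
    best = max(candidates, key=lambda term: dist.get(term, 0.0))
    return best, dist.get(best, 0.0)


model = train(observations)
print(model["severe"])  # {'infection': 0.9, 'infarction': 0.1}
print(decide(model, "severe", ["infection", "infarction"]))      # ('infection', 0.9)
print(decide(model, "myocardial", ["infection", "infarction"]))  # ('infarction', 1.0)
```

Plain relative frequencies give probability 0 to anything never observed (here, infection after myocardial); practical systems usually smooth the estimates so that unseen events keep a small probability.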

Page 81

Machine learning: another example

http://www.nytimes.com/2010/03/09/technology/09translate.html (page A-1, March 9, 2010)

Page 82

Statistical methods

• Learn automatically from observations of data
• Can have the same kinds of structure as manually written rules
• Require representative data to learn from
• Provide confidence measures
• Can be tuned to balance recall/precision
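To make “tuned to balance recall/precision” concrete: if each proposed code comes with a confidence, the cutoff above which codes are accepted trades precision against recall. A minimal sketch with hypothetical scores and gold labels; `precision_recall` is an illustrative helper, and treating an empty acceptance set as precision 1.0 is a convention chosen here.

```python
# Hypothetical scored candidates: (system confidence, is the code actually correct?)
scored = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.60, True), (0.55, False), (0.40, True), (0.20, False),
]


def precision_recall(scored, threshold):
    """Precision and recall if only candidates at or above threshold are accepted."""
    accepted = [gold for score, gold in scored if score >= threshold]
    true_positives = sum(accepted)
    total_relevant = sum(gold for _, gold in scored)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_relevant if total_relevant else 1.0
    return precision, recall


for t in (0.9, 0.7, 0.5, 0.3):
    p, r = precision_recall(scored, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.9  precision=1.00  recall=0.40
# threshold=0.7  precision=0.75  recall=0.60
# threshold=0.5  precision=0.67  recall=0.80
# threshold=0.3  precision=0.71  recall=1.00
```

A high cutoff accepts few codes but rarely wrong ones; lowering it recovers more of the correct codes at the price of more errors.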

Page 83

A view of the state of the art in NLP

Rule-based methods

Rule-based methods

Statistical NLP

Machine learning

Rule-based methods informed by large-scale data analysis

Data

Page 84

A very recent example: Watson

• “A massively parallel probabilistic evidence-based architecture”
– Exploits manually constructed knowledge: Liquid IS-A Fluid (1, WordNet)
– Learns from large volumes of naturally occurring text: Fluid IS-A Liquid (0.7)

• Employs confidence estimation pervasively
– Relates its internal confidence to the cost of incorrect answers

• Data driven and evaluation driven: evolved via continuous objective evaluation and an agile, omnivorous approach to development.
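Relating confidence to the cost of a wrong answer can be illustrated with a generic expected-value rule: answer only when confidence times the gain exceeds (1 − confidence) times the cost. This is a textbook decision rule offered as a sketch, not a description of Watson’s actual answering strategy; the gain and cost figures are hypothetical.

```python
def should_answer(confidence, gain_if_right, cost_if_wrong):
    """Answer only if the expected payoff beats staying silent (payoff 0)."""
    expected = confidence * gain_if_right - (1 - confidence) * cost_if_wrong
    return expected > 0


def break_even(gain_if_right, cost_if_wrong):
    """Confidence at which answering and abstaining have equal expected payoff."""
    return cost_if_wrong / (gain_if_right + cost_if_wrong)


# Quiz-show style: a wrong answer loses as much as a right one wins.
print(break_even(1000, 1000))      # 0.5
# Hypothetical auto-coding setting: a wrong code is 9 times as costly as a right one is valuable.
print(break_even(1, 9))            # 0.9
print(should_answer(0.85, 1, 9))   # False, so route the case to a human coder instead
```

The same confidence can justify answering in one setting and abstaining in another; only the relative cost of errors changes.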

Page 85

A very recent example: Watson

“As our results dramatically improved, we observed that system-level advances allowing rapid integration and evaluation of new ideas and new components against end-to-end metrics were essential to our progress.”(Ferrucci et al., 2010)

Page 86

Watson’s overarching principles

Massive parallelism: Exploit massive parallelism in the consideration of multiple interpretations and hypotheses.

Many experts: Facilitate the integration, application, and contextual evaluation of a wide range of loosely coupled probabilistic question and content analytics.

Pervasive confidence estimation: No component commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations. An underlying confidence-processing substrate learns how to stack and combine the scores.

Integrate shallow and deep knowledge: Balance the use of strict semantics and shallow semantics, leveraging many loosely formed ontologies.

(Ferrucci et al., 2010)
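Learning “how to stack and combine the scores” can be illustrated with logistic regression, one common way to learn weights that merge several component scores into a single confidence. This is a minimal sketch under that assumption, not Watson’s actual combination method; the component scores and labels are invented.

```python
import math

# Hypothetical training data: each candidate answer has scores from three
# independent analytics components, plus whether the answer proved correct.
TRAINING = [
    ([0.9, 0.8, 0.7], 1),
    ([0.8, 0.9, 0.6], 1),
    ([0.7, 0.6, 0.9], 1),
    ([0.2, 0.3, 0.4], 0),
    ([0.3, 0.1, 0.2], 0),
    ([0.4, 0.2, 0.1], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combine(weights, bias, scores):
    """Merge component scores into one confidence in (0, 1)."""
    return sigmoid(bias + sum(w * s for w, s in zip(weights, scores)))


def train(data, epochs=2000, learning_rate=0.5):
    """Learn combination weights by batch gradient descent on log loss."""
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for scores, label in data:
            error = combine(weights, bias, scores) - label  # d(log loss)/d(logit)
            for i, s in enumerate(scores):
                grad_w[i] += error * s
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = train(TRAINING)
print(round(combine(weights, bias, [0.85, 0.8, 0.75]), 3))  # all three components score it highly
print(round(combine(weights, bias, [0.2, 0.2, 0.9]), 3))    # only one component scores it highly
```

No single component decides; the learned weights determine how much each component’s score counts toward the final confidence.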

Page 87

Questions to think about when looking at statistical NLP systems

• What is the quality of the expert knowledge that has gone into the system? (Who are the experts?)

• Where in the system are machine learning techniques employed? (And again, who are the experts?)

• How much data has the system learned from?
• How much variety was there in the data?
• Where and how does the system employ confidence estimation?

Page 88

Overview

• Personal background
• Historical perspective and high level overview
• Elements of NLP for computer assisted coding
• Confidence estimation and “auto-coding”
• Evaluating how well a system is doing
• Understanding rule-based and statistical methods
• The importance of natural language input
• Where NLP is headed
• Some key take-aways

Page 89

Beyond coding for billing…

• Electronic capture and presentation of patient, demographic and clinical information

• Outcomes analysis
• Clinical decision support
• Pharmacovigilance
• Biosurveillance
• Knowledge discovery

Page 90

Transcription

Payers

NLP Engine

Routing

Coder review

Clinical Information Landscape

Clinicians

Researchers

Policy makers

Patients

Traditional coding

Unrestricted physician language

Codes

Page 91

Data, Information, Knowledge, Wisdom

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Original Unrestricted Unprocessed

Page 92

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

Adds structure and categories to create units

Data, Information, Knowledge, Wisdom

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

NAME        DOSE     FRQ
prednisone  100 mg   QD
melphalan   9 mg     QD
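Turning a medication phrase into NAME, DOSE and FRQ fields is a small instance of adding structure and categories to create units. A minimal regex-based sketch; the pattern and `extract_medications` are hypothetical and cover only the simple phrasings that appear on these slides, far less variation than real medication extraction must handle.

```python
import re

# Hypothetical pattern for simple "drug dose frequency" mentions.
MEDICATION = re.compile(
    r"(?P<name>[A-Za-z]+)\s+"
    r"(?P<dose>\d+(?:\.\d+)?\s*mg(?:/m2)?)"
    r"\s*(?P<frq>QD|BID|TID|per day|/day)?",
    re.IGNORECASE,
)


def extract_medications(text):
    """Return one {NAME, DOSE, FRQ} record per medication mention found."""
    records = []
    for m in MEDICATION.finditer(text):
        records.append({
            "NAME": m.group("name").lower(),
            "DOSE": m.group("dose"),
            "FRQ": m.group("frq"),  # None when no frequency was stated
        })
    return records


for record in extract_medications("Prednisone 100 mg QD. Melphalan 9 mg/m2 per day."):
    print(record)
# {'NAME': 'prednisone', 'DOSE': '100 mg', 'FRQ': 'QD'}
# {'NAME': 'melphalan', 'DOSE': '9 mg/m2', 'FRQ': 'per day'}
```

The frequency is kept as written; mapping “per day” or “/day” onto a code such as QD would be a separate normalization step with its own clinical assumptions.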

Page 93

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Identifies relationships between units

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Data, Information, Knowledge, Wisdom
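The labels on this slide (plasmacytoma, cancer, disease, prednisone, steroid, drug, treats) suggest a small concept graph that relates extracted units. A minimal sketch, assuming IS-A links prednisone → steroid → drug and plasmacytoma → cancer → disease plus one hypothetical “treats” link; these edges are an illustrative reading of the figure, not a clinical knowledge base.

```python
# Hypothetical reading of the concept graph; edges are illustrative assumptions.
IS_A = {
    "prednisone": "steroid",
    "steroid": "drug",
    "plasmacytoma": "cancer",
    "cancer": "disease",
}
TREATS = {("prednisone", "plasmacytoma")}


def ancestors(concept):
    """Follow IS-A links upward (assumes the hierarchy has no cycles)."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    return concept == category or category in ancestors(concept)


# Units extracted from the report: one problem and one medication.
record = ["plasmacytoma", "prednisone"]
drugs = [c for c in record if is_a(c, "drug")]
diseases = [c for c in record if is_a(c, "disease")]
links = [(d, x) for d in drugs for x in diseases if (d, x) in TREATS]

print(ancestors("prednisone"))  # ['steroid', 'drug']
print(links)                    # [('prednisone', 'plasmacytoma')]
```

Once units carry categories, relationships between them (here, a drug that treats a recorded disease) can be found by querying the graph rather than rereading the text.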

Page 94

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Ability to make good choices

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Individual clinical expertise

Best external evidence

Patient values and expectations

Data, Information, Knowledge, Wisdom

Page 95

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

Information

• The process of knowledge discovery is a natural cycle

• At every iteration, information emerges from data by structuring and categorizing the data according to what we know now

• As we improve our knowledge, those structures and categories change

Data

Knowledge

Knowledge

Data

Information

Page 96

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

Information

• The process of knowledge discovery is a natural cycle

• At every iteration, information emerges from data by structuring and categorizing the data according to what we know now

• As we improve our knowledge, those structures and categories change

Data

Knowledge

Knowledge

Data

Information

Page 97

Transcription

Clinicians

Researchers

Policy makers

Patients

Unrestricted physician language

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Data

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

Information

Knowledge

Knowledge

Data

What happens if physicians enter structured information directly, instead of the original data?

How we transform data into information depends on our current state of knowledge.

Page 98

Transcription

Clinicians

Researchers

Policy makers

Patients

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

Hypotheses:

Cryptic vascular malformation

Chronic lacunar infarct

Information

Knowledge

Knowledge

The full clinical narrative never comes into existence.

Potentially relevant information is lost forever.

The knowledge discovery cycle is broken.

ORDER_EXAM: MRI Hd wo&w ORDER_IND: head - h/o plasmacytoma^ MR head, without and with IV gadolinium. Comparison is made with previous outside MR head examinations 5/3/04 and 11/16/04. On the earliest outside examination, there was a mass in the right central skull base, extending infratemporal fossa, sphenoidsinus, and foramen ovale. This subsequently was demonstrated to represent a plasmacytoma. This mass is markedly reduced in size on the subsequent outside MR. Our examination continues to show abnormal signal and peripheral enhancement, in the right central skull base, and involving right clivus, right sphenoid, and base of right pterygoid. This is probably stable when compared with 11/16/04, but is considerably smaller than 5/3/04. The infratemporal soft tissue component of the lesion has resolved. No new or progressing bone lesion. Incidental note is made of a small amount of hemosiderin deposition within the cortex of the left parietal operculum without abnormal enhancement. This could represent cryptic vascular malformation, or chronic lacunar infarct. Mild cerebral leukoaraiosis. …

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

NAME        DOSE     FRQ
prednisone  100 mg   QD
melphalan   9 mg     QD

Page 99

Mr. John Doe was seen in our office today in follow up of his paroxysmal atrial fibrillation. . . . He recently called our office in February stating he was back in atrial fibrillation which was documented on electrocardiogram. I elected to increase his Betapace to 160 mg twice a day and he did convert back to normal sinus rhythm. We had recommended Coumadin to him at that time but he did not start any Coumadin. He has done well since with no recurrence of arrhythmia and he is acutely aware of when he goes into the fibrillation. . . .He seems to be doing well on the increased dose of Betapace 160 mg twice a day. I told him he should take a daily baby aspirin and also that if he has recurrent episodes of fibrillation, he needs to let us know because I think he would need to be on Coumadin anticoagulation and may need an adjustment in his antiarrhythmic regimen.

If the full clinical narrative never comes into existence…

There is clear evidence that this patient’s self-reports are trustworthy and relevant. In your thinking on his clinical care, you should make sure to pay attention to them.

Here’s the reasoning connected to my recommendation of Coumadin, the status of that recommendation, and the circumstances under which I think the recommendation should be revisited.

Page 100

…the knowledge discovery cycle is broken.

NEW YORK (Reuters Health), Jun 19 - Unlike adults with intracranial aneurysms, most intracranial arterial aneurysms in pediatric patients are idiopathic and are associated with no known risk factors for vascular disease, investigators reported at the American Society of Neuroradiology's annual meeting in Chicago.

"Our study suggests that -- unlike the adult disease -- childhood aneurysms may be driven by unique predisposing factors that we have not yet identified. It could have much less to do with underlying conditions commonly thought to contribute to their development," presenter Dr. Todd Abruzzo told Reuters Health. …

Dr. Abruzzo, an interventional neuroradiologist at the University of Cincinnati in Ohio, and associates conducted a review of records from three tertiary referral hospitals between 1993 and 2006. …

If the full clinical narrative never comes into existence…

Page 101

• What if the true risk factors we look for are not part of any EHR’s structured nomenclature?

• What if the relevant factors are expressed in current clinical narratives using non-clinical terminology?

• If physicians enter nomenclature directly, instead of the full narrative, how will we ever know what information we have lost?

• Without the original data, we can never reanalyze physicians’ observations in the light of new knowledge and new categories.

The knowledge discovery cycle is broken

Page 102

• There are good reasons to standardize the representations in health information systems (interoperability, data mining, etc.)

but

• The data-information-knowledge-wisdom analysis forces us to ask what will be lost if clinicians’ input language is standardized.

A dilemma

Page 103

• Recognize that standardized representations are different from standardized input.

• Allow clinicians to express the clinical narrative in all its richness and nuance, through natural dictation.

• Transform their natural language into representations that permit standardization and interoperability.

NLP as a solution

Page 104

NLP Engine

Clinicians

Researchers

Policy makers

Patients

[Figure: concept graph linking plasmacytoma, cancer, disease, prednisone, steroid, and drug, with a “treats” relation]

[Figure: several stacked, truncated copies of the MRI report narrative]

Transcription

Clinical history:

History of plasmacytoma (head)

Mass in right central skull base

Mass subsequently reduced in size

Findings:

Abnormality in right central skull base…

Mass smaller in size

Mild cerebral leukoaraiosis

Current medications:

Melphalan 9 mg/m2 per day

Prednisone 100 mg/day

Physicians focus on the care of the patient and communicate unimpeded, full, narrative clinical data.

Informed by the best current knowledge and data, language technology transforms clinical language into standardized, interoperable, available information.

Both health information technology and medical communities of practice inform, and are informed by, evolving medical knowledge.

Page 105

Overview

• Personal background
• Historical perspective and high level overview
• Elements of NLP for computer assisted coding
• Confidence estimation and “auto-coding”
• Evaluating how well a system is doing
• Understanding rule-based and statistical methods
• The importance of natural language input
• Where NLP is headed
• Some key take-aways

Page 106

Using text to predict the real world

O’Connor, B.; Balasubramanyan, R.; Routledge, B. R.; Smith, N. A. 2010. From tweets to polls: linking text sentiment to public opinion time series. Proc. ICWSM pp. 122-129.

Topics and agendas

05_03_02.txt.0002 BEGALA Good evening. Welcome to CROSSFIRE, coming to you live from the George Washington University in beautiful downtown Washington, D.C. Tonight in the CROSSFIRE, the case of the Reverend Paul Shanley, the Roman Catholic priest facing child rape charges in Massachusetts. Should his superiors be held responsible? Also, Matt Drudge, founder of the Internet "Drudge Report." Is he a right-wing muckraker, an Internet gossip or a legitimate journalist? We'll ask Drudge himself when we get him in the CROSSFIRE. First, flying the not-so-friendly skies, would you feel safer if pilots were armed? One outspoken congressional critic is against having guns in the cockpit. We're going to introduce her now. Please welcome, Eleanor Holmes Norton, the Democratic delegate from the District of Columbia. Ms. Norton, thank you. Welcome back.

05_03_02.txt.0003 CARLSON Now, Ms. Norton, the majority, the vast majority of commercial airline pilots are strongly in favor of carrying guns in the cockpit on commercial airliners. You're against it. What do you as a delegate know about operating a commercial airliner that the majority of commercial airline pilots don't know?

05_03_02.txt.0004 DELEGATE Well, I know what Transportation Secretary Norm Mineta tells me, and I know what Homeland Security Adviser Tom Ridge tells me, and they are against it. And I think the reason they are against it is you don't want the guy who's flying one of these big busters up there also with a gun in his hand trying to protect his plane. You want air marshals to do that. You want flight attendants to understand how to protect the cockpit. And you want the redundancies that we have built in, redundancy after redundancy, working for you. We are panicking the American people. They say, oh my God, I thought they had the hearings, I thought they did that. Here come the pilots saying, oh no, they haven't. We've got to have guns.

Can word frequencies tell you about the topics in a set of documents?

Looking at just word counts often gives you a mish-mash.
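The slide's question can be tried directly: count words across all three turns and look at the most frequent ones. A minimal Python sketch; the tokenizer regex and the short stopword list are assumptions, and real data would need a much longer stopword list.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "you", "i",
             "they", "he", "it", "we", "that", "what", "are", "with", "for"}


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)
```

Run on the concatenation of the three turns, this produces a single ranking in which words from different threads of the conversation (armed pilots, the Shanley case, the Drudge segment) sit side by side, with nothing indicating which words belong together.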

Bayesian topic models* discover the distinct topics interwoven in documents.

*Wikipedia: Topic Model; Blei et al. 2003
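Blei et al. (2003) introduced latent Dirichlet allocation (LDA), the best-known Bayesian topic model. The slide does not say how the model behind these examples was fit, so the sketch below uses one common inference method, collapsed Gibbs sampling (Griffiths and Steyvers 2004), rather than the variational method of the original paper. It is a teaching-scale Python implementation; the function name `lda_gibbs`, the hyperparameter defaults, and the iteration count are assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to docs (lists of tokens) by collapsed Gibbs sampling.

    Returns (theta, phi): theta[d][k] is the weight of topic k in document d,
    and phi[k][w] is the probability of word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # times word w is assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k
    z = []                                # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Take the token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (n_kw[k][index[w]] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return theta, phi
```

Each turn of the transcript, tokenized and stripped of stopwords, would be one document; the highest-probability words in each `phi[k]` give a readable summary of topic k.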

Topics and agendas

Any part of the conversation can be viewed as a mixture of topics.

[Figure: topic weights for the passage below (values shown: .03, .44, .00, .11)]

Well, I know what Transportation Secretary Norm Mineta tells me, and I know what Homeland Security Adviser Tom Ridge tells me, and they are against it. And I think the reason they are against it is you don't want the guy who's flying one of these big busters up there also with a gun in his hand trying to protect his plane. You want air marshals to do that. You want flight attendants to understand how to protect the cockpit. And you want the redundancies that we have built in, redundancy after redundancy, working for you. We are panicking the American people. They say, oh my God, I thought they had the hearings, I thought they did that. Here come the pilots saying, oh no, they haven't. We've got to have guns.
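In LDA-style models (Blei et al. 2003), "a mixture of topics" has a precise meaning: each passage d has its own topic weights θ_{d,k}, each topic k is a probability distribution φ_{k,w} over words, and the probability of a word in the passage mixes the topics according to those weights. The numbers shown for this passage are presumably weights of this kind for a few of the topics.

$$
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,w},
\qquad \theta_{d,k} \ge 0, \quad \sum_{k=1}^{K} \theta_{d,k} = 1
$$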

Thoughts on where NLP is headed

• Watson’s key lessons
  – Bringing together data-driven methods with knowledge
  – Pursuing paths in parallel and combining evidence
  – Constant data-driven assessment/evaluation
  – Pervasive confidence estimation

• Supervised learning methods
  – Using human choices/behavior as basis for prediction

• Semi-supervised learning methods
  – Taking additional advantage of raw data

• Unsupervised methods
  – Discovering structure in text and what it reveals about the real world
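Two of the items above, combining evidence from parallel paths and pervasive confidence estimation, fit naturally with supervised learning from human choices. The toy Python sketch below is not Watson's architecture. It assumes each analysis path emits a numeric evidence score, combines the scores with a logistic function, learns the weights from past human accept/reject decisions, and routes low-confidence outputs to a person. All names (`confidence`, `fit`, `route`, the evidence names, the 0.9 threshold) are hypothetical.

```python
import math


def confidence(scores, weights, bias):
    """Combine per-path evidence scores into a probability-like confidence."""
    z = bias + sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def fit(examples, names, lr=0.1, epochs=500):
    """Logistic regression trained by stochastic gradient descent.

    examples: list of (scores, label) pairs, where label is 1 if a human
    accepted the system's output and 0 if the human changed it.
    """
    weights = {name: 0.0 for name in names}
    bias = 0.0
    for _ in range(epochs):
        for scores, label in examples:
            error = confidence(scores, weights, bias) - label
            bias -= lr * error
            for name in names:
                weights[name] -= lr * error * scores[name]
    return weights, bias


def route(conf, threshold=0.9):
    """Accept automatically only when confidence clears the threshold."""
    return "auto-accept" if conf >= threshold else "human review"
```

In a coding workflow, for instance, `route(confidence(...))` would decide whether a suggested code goes straight through or to a coder; raising the threshold lowers the automation rate but also the risk of accepting errors.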

Some references (general)

• Philip Resnik and Jimmy Lin, “Evaluation of NLP Systems,” in Alex Clark, Chris Fox, and Shalom Lappin (eds.), Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell, June 2010.

• Philip Resnik, “Word Sense Ambiguity,” in William J. Frawley (ed.), International Encyclopedia of Linguistics, 2nd edition, Oxford University Press, Oxford, England, 2003.

Some references (healthcare)

• Philip Resnik, Michael Niv, Michael Nossal, Andrew Kapit, and Richard Toren, “Communication of Clinically Relevant Information in Electronic Health Records: A Comparison between Structured Data and Unrestricted Physician Language,” Perspectives in Health Information Management: Computer Assisted Coding Conference Proceedings, AHIMA, Fall 2008.

• Philip Resnik, Michael Niv, Michael Nossal, Gregory Schnitzer, Jean Stoner, Andrew Kapit, and Richard Toren, “Using Intrinsic and Extrinsic Metrics to Evaluate Accuracy and Facilitation in Computer Assisted Coding,” Perspectives in Health Information Management: Computer Assisted Coding Conference Proceedings, AHIMA, Fall 2006.

• Yuankai Jiang, Michael Nossal, and Philip Resnik, “How Does the System Know It's Right? Automated Confidence Assessment for Compliant Coding,” Perspectives in Health Information Management: Computer Assisted Coding Conference Proceedings, AHIMA, Fall 2006.

• Michael Nossal, Philip Resnik, and Jean Stoner, “Assessing Coder Change Rates as an Evaluation Metric,” Perspectives in Health Information Management: Computer Assisted Coding Conference Proceedings, AHIMA, Fall 2006.