Automating Discovery from Biomedical Texts
Marti Hearst & Barbara Rosario, UC Berkeley
Agyinc Visit, August 16, 2000
The LINDI Project: Linking Information for New Discoveries
Two Main Thrusts:
– UIs for building and reusing hypothesis-seeking strategies
– Statistical language analysis techniques for extracting propositions
Scenario: Explore Functions of a Gene
Objective
– Determine the functions of a newly sequenced Gene X.
Known facts
– Gene X co-expresses (is activated in the same cell) with Genes A, B, and C
– The relationships of Genes A, B, and C with certain types of diseases (from the medical literature)
Question
– What types of diseases is Gene X related to?
Gene Co-expression: Role in the genetic pathway
[Figure: candidate pathway diagrams placing the unknown genes (g?, h?) relative to PSA, Kallikrein (Kall.), and PAP]
Other possibilities as well
Make use of the literature
Look up what is known about the other genes.
Different articles in different collections
Look for commonalities
– Similar topics indicated by Subject Descriptors
– Similar words in titles and abstracts: adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...
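The commonality search above can be sketched as a word-overlap check across the genes' article titles. This is a minimal sketch in Python, assuming hypothetical toy titles (not actual literature results) and a small hand-picked stopword list. `terms` and `shared_terms` are illustrative names, not part of LINDI.

```python
import re

# Hypothetical toy titles for the three known genes; not real search results.
TITLES_BY_GENE = {
    "PSA": ["Prostate specific antigen as a tumor marker",
            "PSA antibodies in prostatic adenocarcinoma"],
    "Kallikrein": ["Human glandular kallikrein in prostate tissue",
                   "Kallikrein expression as a tumor marker"],
    "PAP": ["Prostatic acid phosphatase and prostate neoplasm",
            "Serum PAP as a tumor marker"],
}
STOPWORDS = {"a", "an", "and", "as", "in", "of", "the"}


def terms(text):
    """Lowercase the text and return its non-stopword alphabetic tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def shared_terms(titles_by_gene):
    """Return the words that appear in at least one title for every gene."""
    per_gene = [set().union(*(terms(t) for t in titles))
                for titles in titles_by_gene.values()]
    return set.intersection(*per_gene)


print(sorted(shared_terms(TITLES_BY_GENE)))  # ['marker', 'prostate', 'tumor']
```

Exact word matching misses near-variants such as "prostate" vs. "prostatic". Matching on the controlled Subject Descriptors, or adding stemming, would likely catch more of these.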
Developing Strategies
Different strategies seem to be needed for different situations.
– First: see what is known about Kallikrein.
– 7341 documents. Too many.
– AND the result with the “disease” category
» If the result is non-empty, this might be an interesting gene
– Now get 803 documents
[Figure: Explore Functions of New Gene X. Medical literature and Gene-A keywords, linked by the operations Query, Projection, and Mapping. Slide adapted from K. Patel.]
Developing Strategies
Different strategies seem to be needed for different situations.
– First: see what is known about Kallikrein.
– 7341 documents. Too many.
– AND the result with the “disease” category
» If the result is non-empty, this might be an interesting gene
– Now get 803 documents
– AND the result with PSA
» Get 11 documents. Better!
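The narrowing strategy on this slide is a sequence of Boolean ANDs that stops when a result comes back empty. This sketch assumes a hypothetical in-memory inverted index. Its toy counts do not reproduce the 7341 / 803 / 11 figures from the slide.

```python
# Hypothetical inverted index (term -> ids of documents that contain it).
INDEX = {
    "kallikrein": {1, 2, 3, 4, 5, 6},
    "disease": {2, 3, 5, 6, 9},
    "psa": {3, 6, 7},
}


def narrow(index, terms):
    """AND the terms together one at a time, recording the result size after each step."""
    result = None
    sizes = []
    for term in terms:
        docs = index.get(term, set())
        result = set(docs) if result is None else result & docs
        sizes.append((term, len(result)))
        if not result:  # an empty result means this line of inquiry is a dead end
            break
    return (result if result is not None else set()), sizes


docs, sizes = narrow(INDEX, ["kallikrein", "disease", "psa"])
print(sizes)         # [('kallikrein', 6), ('disease', 4), ('psa', 2)]
print(sorted(docs))  # [3, 6]
```

Recording the size after each step keeps the "too many / better!" judgment visible. The user can decide which term to add next instead of the system choosing for them.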
[Figure: Explore Functions of New Gene X. Keywords for Gene-A, Gene-B, and Gene-C from the medical literature, combined by the operations Query, Projection, and Keywords Intersection.]
Developing Strategies
Look for commonalities among these documents
– Manual scan through ~100 category labels
– Would have been better if
» the labels were automatically organized
» intersections of “important” categories were scanned for first
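One way to organize the category labels automatically, instead of scanning about 100 of them by hand, is to rank each label by how many documents in the result set carry it. This sketch uses hypothetical data. Ranking by document frequency is an assumption of this example, not a method the slides describe.

```python
from collections import Counter

# Hypothetical result set: document id -> category labels (MeSH-style headings).
RESULTS = {
    "d1": ["Prostatic Neoplasms", "Tumor Markers, Biological", "Humans"],
    "d2": ["Prostatic Neoplasms", "Kallikreins", "Humans"],
    "d3": ["Prostatic Neoplasms", "Tumor Markers, Biological"],
}


def rank_categories(results, top_n=3):
    """Rank labels by how many documents carry them, so shared ones surface first."""
    counts = Counter(
        label
        for labels in results.values()
        for label in dict.fromkeys(labels)  # count each label once per document
    )
    return counts.most_common(top_n)


print(rank_categories(RESULTS, top_n=1))  # [('Prostatic Neoplasms', 3)]
```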
[Figure: Explore Functions of New Gene X. Keywords for Gene-A, Gene-B, and Gene-C from the medical literature, combined by the operations Query, Projection, Keywords Intersection, Mapping, and Slicing. Slide adapted from K. Patel.]
Try a new tack
The researcher uses knowledge of the field to realize that these are related to prostate cancer and diagnostic tests
New tack: intersect the search on all three known genes
– Hope they all talk about diagnostics and prostate cancer
– Fortunately, 7 documents returned
– Bingo! A relation to regulation of this cancer
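Intersecting the search on all three known genes amounts to a conjunctive query over per-gene postings. The surviving documents can then be checked against the hoped-for topics. This sketch uses a hypothetical postings table and topic assignments. `search_all` and `topic_coverage` are illustrative names.

```python
from functools import reduce

# Hypothetical postings (gene -> document ids) and per-document topics.
POSTINGS = {"psa": {1, 2, 3, 4}, "kallikrein": {2, 3, 4, 5}, "pap": {3, 4, 6}}
TOPICS = {3: {"prostate cancer", "diagnostics"}, 4: {"prostate cancer"}}


def search_all(postings, genes):
    """Conjunctive query: documents that mention every one of the genes."""
    sets = [postings.get(g, set()) for g in genes]
    return reduce(set.intersection, sets) if sets else set()


def topic_coverage(topics, docs, wanted):
    """For each hoped-for topic, count the returned documents that discuss it."""
    return {t: sum(1 for d in docs if t in topics.get(d, set())) for t in wanted}


hits = search_all(POSTINGS, ["psa", "kallikrein", "pap"])
print(sorted(hits))  # [3, 4]
print(topic_coverage(TOPICS, hits, ["diagnostics", "prostate cancer"]))
# {'diagnostics': 1, 'prostate cancer': 2}
```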
[Figure: Explore Functions of New Gene X. Keywords for Gene-A, Gene-B, and Gene-C, combined by the operations Query, Projection, Keywords Intersection, Mapping, and Slicing, point to a Possible Function for Gene-X. Slide adapted from K. Patel.]
Formulate a Hypothesis
Hypothesis: the mystery gene has to do with regulation of expression of genes leading to prostate cancer
New tack: do some lab tests
– See if the mystery gene is similar in molecular structure to the others
– If so, it might do some of the same things they do
Strategies again
In hindsight, combining all three genes was a good strategy.
– Store this for later
It might not have worked
– Need a suite of strategies
– Build them up via experience and a good UI
The System
Doing the same query with slightly different values each time is time-consuming and tedious
The same goes for cutting and pasting results
– IR systems don’t support varying queries like this very well.
– Each situation is a bit different
Some automatic processing is needed in the background to eliminate or suggest hypotheses
The User Interface
A general search interface should support
– History
– Context
– Comparison
– Operators: Intersection, Union, Slicing
– Operator Reuse
– Visualization (where appropriate)
We have an initial implementation; it needs lots of work
Architecture of LINDI UI
Three layers: Data Layer, Annotation Layer, User Interface Layer
Data Layer
Purpose
– Hide different formats of text collections
Components
– Data: abstractions representing records of a text collection
– Operations: performed on the data
Data
– A set of records
– Each record is a set of typed tuples
Operations
– Union, intersection, projection, mapping
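The data model on this slide, records as sets of typed tuples with union, intersection, projection, and mapping, can be sketched with Python frozensets. The encoding is an assumption: a field is a `(type, value)` pair. Projection emits one single-field record per value, so intersecting two projected sets intersects their values, as in the "Intersection over Projected Attribute" step.

```python
from typing import Callable, FrozenSet, Tuple

# Assumed encoding: a field is a (type, value) pair, e.g. ("mesh", "Prostatic Neoplasms").
Field = Tuple[str, str]
Record = FrozenSet[Field]
DataSet = FrozenSet[Record]


def union(a: DataSet, b: DataSet) -> DataSet:
    return a | b


def intersection(a: DataSet, b: DataSet) -> DataSet:
    return a & b


def projection(data: DataSet, field_type: str) -> DataSet:
    """Keep only fields of one type; each value becomes its own one-field record."""
    return frozenset(
        frozenset({f}) for record in data for f in record if f[0] == field_type
    )


def mapping(data: DataSet, fn: Callable[[Record], Record]) -> DataSet:
    """Apply a record-to-record function to every record."""
    return frozenset(fn(record) for record in data)


r1 = frozenset({("title", "PSA in prostate cancer"),
                ("mesh", "Prostatic Neoplasms"), ("mesh", "Kallikreins")})
r2 = frozenset({("title", "PAP assay"),
                ("mesh", "Prostatic Neoplasms"), ("mesh", "Acid Phosphatase")})
gene_a, gene_b = frozenset({r1}), frozenset({r2})

shared = intersection(projection(gene_a, "mesh"), projection(gene_b, "mesh"))
print(shared)  # frozenset({frozenset({('mesh', 'Prostatic Neoplasms')})})
```

Using immutable frozensets lets records nest inside sets. Every operation also returns a new data set rather than modifying its input, which suits a layer above that records history.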
Annotation Layer
Purpose
– Associate data sets with the operations that produced them (history)
– History is a first-class object
Advantages
– Streamline a sequence of operations
– Reuse operations
– Parameterize operations
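Treating history as a first-class object means recording each operation together with its parameters. The sequence can then be replayed (reuse) with some parameters swapped (parameterization). This is a minimal sketch. The `History` class and the two toy operations are hypothetical, not the annotation layer's actual interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[..., Any], Dict[str, Any]]


@dataclass
class History:
    """Records each operation and its parameters so the sequence can be replayed."""
    steps: List[Step] = field(default_factory=list)

    def apply(self, name: str, op: Callable[..., Any], data: Any, **params: Any) -> Any:
        self.steps.append((name, op, params))
        return op(data, **params)

    def replay(self, data: Any, **overrides: Any) -> Any:
        """Rerun every step on new input; an override replaces any parameter of that name."""
        for _name, op, params in self.steps:
            data = op(data, **{k: overrides.get(k, v) for k, v in params.items()})
        return data


def keep_term(docs, term):
    """Toy operation: keep documents containing the term."""
    return [d for d in docs if term in d]


def take(docs, n):
    """Toy operation: keep the first n documents."""
    return docs[:n]


docs = ["psa prostate", "pap prostate", "psa serum"]
h = History()
step1 = h.apply("filter", keep_term, docs, term="psa")
step2 = h.apply("take", take, step1, n=1)
print(step2)                       # ['psa prostate']
print(h.replay(docs, term="pap"))  # ['pap prostate']
```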
User Interface
Direct manipulation of information objects and access operations
– Query
– Intersection
– Union
– Mapping
– Slicing
Record and reuse of past operations
Parameterization of operations
Streamlining of operations
Initial Palette
Query Structure Determined by Collection Type
Query Operation Results
Projection Operation and Subsequent Results
Parameterized Query: Repeat operations with different values
[Figure: the same operation repeated for three genes, labeled GA, GB, and GC]
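A parameterized query fixes the query template and varies one value, here the gene. This sketch uses a hypothetical index in which `GA`, `GB`, and `GC` stand in for the three genes labeled on the slide. The `"disease"` category filter mirrors the earlier strategy but is an assumption of this example.

```python
# Hypothetical index; GA, GB, GC stand in for the three gene names on the slide.
INDEX = {
    "GA": {1, 2, 3},
    "GB": {2, 3, 4},
    "GC": {3, 5},
    "disease": {2, 3, 5},
}


def gene_query(index, gene, category="disease"):
    """One query template: documents about the gene that also fall in the category."""
    return index.get(gene, set()) & index.get(category, set())


def repeat_query(index, genes, **params):
    """Run the same parameterized query once per gene, keeping results side by side."""
    return {g: gene_query(index, g, **params) for g in genes}


results = repeat_query(INDEX, ["GA", "GB", "GC"])
print({g: sorted(d) for g, d in results.items()})
# {'GA': [2, 3], 'GB': [2, 3], 'GC': [3, 5]}
```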
Intersection over Projected Attribute
Example Interaction with UI Prototype
1. Query on gene names
2. Project out only MeSH headings
3. Intersect the results
4. Map to create a ranking
5. Slice out the top-ranked
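The five-step interaction above can be sketched end to end. The document collection is hypothetical. Two choices are assumptions rather than details from the slides: the mapping step scores each shared heading by its total document count across genes, and ties are broken alphabetically.

```python
from collections import Counter

# Hypothetical collection: doc id -> (genes mentioned, MeSH headings).
DOCS = {
    1: ({"GA"}, ["Prostatic Neoplasms", "Tumor Markers, Biological"]),
    2: ({"GA", "GB"}, ["Prostatic Neoplasms", "Antibodies"]),
    3: ({"GB"}, ["Prostatic Neoplasms", "Tumor Markers, Biological"]),
    4: ({"GC"}, ["Prostatic Neoplasms", "Tumor Markers, Biological", "Humans"]),
    5: ({"GC"}, ["Humans"]),
}


def query(gene):
    """Step 1: ids of documents that mention the gene."""
    return [d for d, (genes, _) in DOCS.items() if gene in genes]


def project_mesh(doc_ids):
    """Step 2: keep only MeSH headings, counting documents per heading."""
    return Counter(h for d in doc_ids for h in DOCS[d][1])


def pipeline(genes, top_k=1):
    projected = [project_mesh(query(g)) for g in genes]
    shared = set.intersection(*(set(p) for p in projected))      # step 3: intersect
    scores = {h: sum(p[h] for p in projected) for h in shared}  # step 4: map to a score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]                                        # step 5: slice


print(pipeline(["GA", "GB", "GC"], top_k=2))
# [('Prostatic Neoplasms', 5), ('Tumor Markers, Biological', 3)]
```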
Future Work on UI
As currently designed
– Better labeling
– Better layout
» Intuitive
» Scalable
– Connection to a real backend
– User testing
» Does direct manipulation work?
» What operator sequences help?
» How can parameterization be improved?
More advanced
– Support for strategies
– Incorporation of NLP
Language Analysis Component
Goals:
– Extract propositions from text
– Make inferences
Language Analysis Component
Why extract propositions from text?
– Text is how knowledge at the propositional level is communicated
– Text is continually being created and updated by the outside world
Example: Statistical Semantic Grammar
To detect causal relationships between medical concepts
– Title: Magnesium deficiency implicated in increased stress levels.
– Interpretation: <nutrient><reduction> related-to <increase><symptom>
– Inference:
» Increase(stress, decrease(mg))
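To make the target representation concrete, the sketch below tags words with the semantic classes from the slide. When the tags match the causal pattern, it builds the inference. It is deliberately not statistical. The hand-built lexicon and fixed pattern are hypothetical stand-ins for what a statistical semantic grammar would learn, and they show the brittleness of hand-built semantic grammars.

```python
import re

# Hypothetical hand-built lexicon: word -> (semantic class, canonical argument or None).
LEXICON = {
    "magnesium": ("nutrient", "mg"),
    "deficiency": ("reduction", None),
    "implicated": ("related-to", None),
    "increased": ("increase", None),
    "stress": ("symptom", "stress"),
}
CAUSAL_PATTERN = ["nutrient", "reduction", "related-to", "increase", "symptom"]


def interpret(title):
    """Tag each word the lexicon knows; unknown words ('in', 'levels') are skipped."""
    return [LEXICON[w] for w in re.findall(r"[a-z]+", title.lower()) if w in LEXICON]


def infer(tags):
    """Build the proposition if the tag sequence matches the causal pattern exactly."""
    if [cls for cls, _ in tags] != CAUSAL_PATTERN:
        return None
    nutrient, symptom = tags[0][1], tags[4][1]
    return f"Increase({symptom}, decrease({nutrient}))"


tags = interpret("Magnesium deficiency implicated in increased stress levels.")
print([cls for cls, _ in tags])
# ['nutrient', 'reduction', 'related-to', 'increase', 'symptom']
print(infer(tags))  # Increase(stress, decrease(mg))
```

A title phrased with a word outside the lexicon, such as "low" instead of "deficiency", yields no inference at all.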
Statistical Semantic Grammars
Empirical NLP has made great strides
– But mainly applied to syntactic structure
Semantic grammars are powerful, but
– Brittle
– Time-consuming to construct
Idea:
– Use what we now know about statistical NLP to build up a probabilistic grammar
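A probabilistic grammar attaches a probability to each rule. One standard way to estimate these from annotated examples is relative frequency, P(A → β) = count(A → β) / count(A). A derivation's probability is then the product of its rule probabilities. This sketch runs over hypothetical rule observations. The nonterminals S, CAUSE, and EFFECT are illustrative and not taken from the project.

```python
from collections import Counter
from math import prod

# Hypothetical rule observations, e.g. read off hand-annotated titles.
OBSERVED = [
    ("S", ("CAUSE", "related-to", "EFFECT")),
    ("S", ("CAUSE", "related-to", "EFFECT")),
    ("CAUSE", ("nutrient", "reduction")),
    ("CAUSE", ("nutrient", "reduction")),
    ("CAUSE", ("drug",)),
    ("EFFECT", ("increase", "symptom")),
    ("EFFECT", ("increase", "symptom")),
]


def estimate(observed):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    rule_counts = Counter(observed)
    lhs_counts = Counter(lhs for lhs, _ in observed)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}


def derivation_prob(probs, derivation):
    """Probability of a derivation: the product of its rule probabilities (0 if unseen)."""
    return prod(probs.get(rule, 0.0) for rule in derivation)


probs = estimate(OBSERVED)
p = derivation_prob(probs, [
    ("S", ("CAUSE", "related-to", "EFFECT")),
    ("CAUSE", ("nutrient", "reduction")),
    ("EFFECT", ("increase", "symptom")),
])
print(round(p, 3))  # 0.667
```

Unseen rules get probability 0 here. Smoothing the estimates is a common way to soften that.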
LINDI: Target Components
1. Special UI for retrieving appropriate docs
2. Language analysis on docs to detect causal relationships between concepts
3. Probabilistic representation of concepts and relationships
4. UI + User: Hypothesis creation