quadri: bumps in the road from language to data filequadri: bumps in the road from language to data...
-
Upload
vuongkhanh -
Category
Documents
-
view
215 -
download
0
Transcript of quadri: bumps in the road from language to data filequadri: bumps in the road from language to data...
![Page 1: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/1.jpg)
quadri: bumps in the road from language to data
presented by richard waldinger
joint work with cleo condoravdi danny bobrow, kyle richardson, and amar das
9 march 2012
![Page 2: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/2.jpg)
why do we need logic? Want to distinguish between A patient does not have a regimen
with AZT. and A patient has a regimen. The regimen
does not have AZT.
Go waldinger quadri 2
![Page 3: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/3.jpg)
axiomatic subject domain theory defines concepts in queries. describes constructs in database. introduces the background knowledge
that bridges the gap between them.
waldinger quadri 3
![Page 4: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/4.jpg)
SNARK: theorem proving full first order logic: resolution equality reasoning: paramodulation,
rewriting. ontology reasoning: sorted logic. temporal reasoning: allen temporal interval
calculus, date and time arithmetic. answer extraction. procedural attachment….
created by Mark Stickel at SRI
waldinger quadri 4
![Page 5: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/5.jpg)
procedural attachment symbols in domain theory linked to
procedures: data base look-up other computations
when symbol appears in search, corresponding procedure is invoked.
results of computation introduced into proof search.
virtual extension of theory
waldinger quadri 5
![Page 6: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/6.jpg)
derived objects entity allowed in query. defined in domain theory. not represented explicitly in the data
base. duration (finish-time - start-time) “treatment change episode” (tce).
waldinger quadri 6
![Page 7: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/7.jpg)
playback time Show me patients on AZT. there exists a patient14 such that there exists a regimen15 such that there exists a azt13 such that patient14 is a patient and patient14 has regimen15, regimen15 has azt13 and azt13 is azt
waldinger quadri 7
![Page 8: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/8.jpg)
donkey anaphora a patient has a regimen with azt. exists(?pa+ent,?regimen)pa+ent‐has‐regimen(?pa+ent,?regimen)&
regimen‐has‐drug(?regimen,azt)
the regimen is of at least 24 weeks. dura+on(?regimen)≥weeks(24)
note “the regimen” is outside of the scope of the quantifier for ?regimen.
treated by squeezing the new condition inside the scope of the quantifier.
waldinger quadri 8
![Page 9: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/9.jpg)
donkey anaphora a patient has a regimen with azt. exists(?pa+ent,?regimen)pa+ent‐has‐regimen(?pa+ent,?regimen)&
regimen‐has‐drug(?regimen,azt)
the regimen is of at least 24 weeks. dura+on(?regimen)≥weeks(24)
note “the regimen” is outside of the scope of the quantifier for ?regimen.
treated by squeezing the new condition inside the scope of the quantifier.
waldinger quadri 9
![Page 10: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/10.jpg)
cardinality quantifiers
the regimen has a least 2 drugs. exists(≥2?drug)
regimen‐has‐drug(?regimen,?drug)
translated into exists(?drug1)regimen‐has‐drug(?regimen,?drug1)&
exists(≥1?drug)
regimen‐has‐drug(?regimen,?drug)&
?drug≠?drug1
or card(drugs‐ofregimen(?regimen)≥2
waldinger quadri 10
![Page 11: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/11.jpg)
bridge anaphora find a patient with a tce.
(failing regimen)(salvage regimen) The patient has a high viral load 24
weeks before the baseline. what is the “baseline”?
waldinger quadri 11
![Page 12: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/12.jpg)
evaluation SweetInfo: provides graphical
answers to queries…. evaluation replicates a discovery from
the literature. adding a box to the HIV database
treatment change episode page.
waldinger quadri 12
![Page 13: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/13.jpg)
SweetInfo Display
waldinger quadri 13
What patients had a high viral load after 24 weeks on a regimen with RTV?
![Page 14: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/14.jpg)
metaquadri replace hiv theory with arbitrary
theory. introduce vocabulary. pass sort structure back into parser
to remove ambiguities. allow new axioms to be introduced as
declarative English sentences.
waldinger quadri 14
![Page 15: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/15.jpg)
waldinger quadri 15
![Page 16: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/16.jpg)
what’s the problem? provide access to novice users–
physicians and researchers. a single query can require access to
multiple databases. answers may need to be deduced or
computed. database languages (e.g. sql) require
specialized expertise.
waldinger quadri 16
![Page 17: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/17.jpg)
how is this different from google, watson, siri, etc.?
understanding of question. precise answers to questions. understanding of subject domain. focused subject domain.
. waldinger quadri 17
![Page 18: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/18.jpg)
our approach ask questions in english. translate into a logical form. reason in a theory of the subject
domain (HIV treatment). allow the reasoner to access
appropriate databases.
waldinger quadri 18
![Page 19: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/19.jpg)
the quadri team natural language—parc.
cleo condoravdi (now stanford csli) dan bobrow kyle richardson (now university of stuttgart)
reasoning—sri richard waldinger tomer altman
database and hiv expertise—stanford amar das robert shafer soo-yon rhee
funding: NIH National Library of Medicine
waldinger quadri 19
![Page 20: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/20.jpg)
hiv ontology patients regimens drugs viral loads mutations (genetic tests)
stanford hiv database shafer, rhee
waldinger quadri 20
![Page 21: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/21.jpg)
example What patients on azt exhibited a high
viral load? parc’s xle translates into logical form
(a theorem). exists(?patient)[patient-has-regimen… sri’s snark proves theorem and
extracts answer from proof. patient-id(605) …. stanford’s hiv-db (and others) provides
data. waldinger quadri 21
![Page 22: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/22.jpg)
axiomatic hiv theory defines concepts in query language. describes capabilities of data
sources. provides background knowledge to link
them together. sorted axiomatic theory. independent of any one data source.
includes ontology.
waldinger quadri 22
![Page 23: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/23.jpg)
sample axiom high(viral-load, ?measurement) ⇔ log(?measurement) ≥ 4
i.e, a viral load measurement is high if and only if its log is greater than or equal to 4.
waldinger quadri 23
![Page 24: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/24.jpg)
challenges in use of natural language
language of query different from language of data source. qualitative vs. quantitative approximate vs. precise
english is highly ambiguous. query may be expressed as a
sequence of questions.
waldinger quadri 24
![Page 25: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/25.jpg)
mapping english to symbols patients on azt ⇒ patient-has-regimen(?patient, ?regimen) & regimen-has-drug(?regimen, azt)
domain dependent. ?regimen implicit.
waldinger quadri 25
![Page 26: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/26.jpg)
ambiguity patients had a regimen with azt.
azt modifies regimen (correct) or azt modifies had (wrong). I had a martini with an olive vs. I had a martini with Olivia. (A martini can have an olive but cannot have
Olivia.)
waldinger quadri 26
![Page 27: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/27.jpg)
approaches to ambiguity use ontology to discard syntactically
plausible but semantically meaningless readings.
e.g., azt is a drug a regimen can have azt. azt cannot have a regimen
waldinger quadri 27
![Page 28: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/28.jpg)
domain knowledge reduces ambiguity Find patients who had a high viral load after
24 weeks on a regimen with azt.
62 readings without subject domain knowledge.
1 reading with subject domain knowledge.
waldinger quadri 28
![Page 29: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/29.jpg)
logical form Find patients who had a high viral load after more
than 24 weeks on a regimen with azt. ex(?pat, ?reg) patient-has-regimen(?pat, ?reg) & regimen-has-drug(?reg, azt) & ex(?viral-test, ?time-point) patient-has-test(?pat, ?viral-test) & test-has-time(?viral-test, ?time-point) & test-has-result(?viral-test, ?test-result) & submeasurement(viral-load, ?test-result, high) & ex(?time-interval) duration(?time-interval) ≥ 24*weeks & start-time(?time-interval) = start-time(?regimen) & finish-time(?time-interval) = ?time-point.
waldinger quadri 29
![Page 30: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/30.jpg)
playback logical form(s) translated back into
unambiguous (if clunky) English. user may select among alternatives. user may rephrase query if necessary.
waldinger quadri 30
![Page 31: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/31.jpg)
playback example
english: Find patients who have no regimens with azt.
playback: there exists a patient1 such that for all regimen2's, patient1 is a patient and it is not so that patient1 has regimen2 and regimen2 has azt
waldinger quadri 31
![Page 32: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/32.jpg)
theorem proving: SNARK automatic first-order logic. includes ontology reasoning. answers to queries extracted from
proof. special procedures for temporal
reasoning. procedural attachment.
waldinger quadri 32
![Page 33: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/33.jpg)
procedural attachment symbol in theory linked to
access of a table in data source. other procedures
when the symbol occurs in the proof search, the procedure is invoked.
result of the procedure is introduced into the proof.
axiomatic theory virtually extended. e.g. patient-has-regimen(patient17, ?regimen)
waldinger quadri 33
![Page 34: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/34.jpg)
procedural attachments to multiple data sources
patient-has-regimen, patient-has-test the stanford hiv drug resistance data
base. other american and european sources
planned.
waldinger quadri 34
![Page 35: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/35.jpg)
display answers multiple proofs: multiple answers. user may request visual display of
answers. SweetInfo project (Stanford)
provides visual display of HIV data. evaluation of Quadri anticipated using
SweetInfo test questions.
waldinger quadri 35
![Page 36: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/36.jpg)
SweetInfo Display
waldinger quadri 36
What patients had a high viral load after 24 weeks on a regimen with RTV?
![Page 37: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/37.jpg)
explanation axioms and procedural attachments
from the proof are used to construct an English paragraph that explains and justifies the answer and provides the provenance of the data invoked.
waldinger quadri 37
![Page 38: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/38.jpg)
Sample Explanation A patient has a high viral load if the log of the viral
load is at least 4. The duration of a time interval is the difference between its finish point and its start point. ….
Patient 378 was on a regimen that started 29 August 1993. Patient 378 had a viral load of log 5 on 30 November 1999. …
English transcriptions of axioms results of procedural attachments.
waldinger quadri 38
![Page 39: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/39.jpg)
complex queries: multiple questions
Find patients who exhibited m184v. The patients were on azt. The patients had a high viral load
after more than 24 weeks on the regimen.
waldinger quadri 39
![Page 40: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/40.jpg)
testing allow access to quadri via the
stanford hiv database–the quadri box!
waldinger quadri 40
![Page 41: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/41.jpg)
future work expressing new knowledge in English. adaptation to new subject domains
breast cancer providing health care information to
patients.
waldinger quadri 41
![Page 42: quadri: bumps in the road from language to data filequadri: bumps in the road from language to data presented by richard waldinger joint work with cleo condoravdi danny bobrow, kyle](https://reader031.fdocuments.net/reader031/viewer/2022031209/5bc7a75209d3f22f268b7d03/html5/thumbnails/42.jpg)
new knowledge: high viral load English: A viral load is high if and only
if the log of the viral load is greater than or equal to 4.
Logic: high(viral-load, ?measurement) ⇔ log(?measurement) ≥ 4
waldinger quadri 42