©2007 H5
STIR: Simultaneous Achievement of
high Precision and high Recall through
Socio-Technical Information Retrieval
Robert S. Bauer, Teresa Jade (www.H5technologies.com)
& Mitchell P. Marcus (www.cis.upenn.edu/~mitch/)
June 7, 2007
The e-Discovery IDEAL: High P (Precision) with High R (Recall)
• Find every relevant document & only those docs that are relevant
• Desired: P = 0.8 (or better) @ R = 0.8 (or better)
• Acceptable: P = 2/3 (or better) @ R = 2/3 (or better)
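A minimal sketch of how P and R are computed and graded against the Desired and Acceptable targets above, assuming relevance judgments are available as a set of document IDs. The function names and integer IDs are hypothetical illustrations, not part of STIR or any H5 tooling.

```python
# Targets from the slide: Desired means P and R both >= 0.8,
# Acceptable means P and R both >= 2/3.
DESIRED = 0.8
ACCEPTABLE = 2 / 3


def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Hypothetical helper: (precision, recall) of a retrieved set vs. a relevant set."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def grade(precision: float, recall: float) -> str:
    """Label a (P, R) result using the slide's Desired / Acceptable targets."""
    if precision >= DESIRED and recall >= DESIRED:
        return "desired"
    if precision >= ACCEPTABLE and recall >= ACCEPTABLE:
        return "acceptable"
    return "below target"


# Illustrative IDs: 4 of 5 retrieved docs are relevant, 4 of 5 relevant docs are found.
p, r = precision_recall({1, 2, 3, 4, 5}, {1, 2, 3, 4, 6})
print(p, r, grade(p, r))
```

Both targets require P and R together, which is why a result with very high precision but low recall still grades as "below target".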
The e-Discovery REALITY
• High P & Low R = RISK (important docs not retrieved)
• Low P & High R = COST (many more documents must be reviewed)
Text REtrieval Conference (TREC)
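The RISK/COST trade-off can be made concrete with simple arithmetic. The sketch below assumes a hypothetical collection with a known number of relevant documents and converts a (P, R) operating point into how many relevant documents are missed and how many non-relevant documents end up in review. The numbers are made up for illustration and are not TREC or STIR data.

```python
def review_burden(n_relevant: int, precision: float, recall: float) -> dict[str, float]:
    """Translate a (P, R) operating point into expected document counts.

    Hypothetical helper: n_relevant is an assumed count of relevant
    documents in the collection; precision must be greater than 0.
    """
    found = recall * n_relevant        # relevant docs retrieved
    retrieved = found / precision      # total docs retrieved, all of which get reviewed
    return {
        "retrieved": retrieved,
        "non_relevant_reviewed": retrieved - found,  # the COST side
        "relevant_missed": n_relevant - found,       # the RISK side
    }


# Two illustrative operating points for an assumed 10,000 relevant documents.
high_p_low_r = review_burden(10_000, precision=0.9, recall=0.3)
low_p_high_r = review_burden(10_000, precision=0.2, recall=0.9)
```

With these made-up numbers, the high-P/low-R point misses about 7,000 relevant documents while reviewing only about 333 non-relevant ones; the low-P/high-R point misses about 1,000 but requires reviewing about 36,000 non-relevant documents.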
Agenda
• Results
  – TREC ad hoc (= typical)
  – Queries typifying Communities of Practice (CoPs)
• e-Discovery Approaches
  – 5 Dimensions
  – Linguistics of CoPs
• Research Issues
  – TREC
  – AI
  – Linguists
  – Lawyers
Typical Results – ad hoc queries
(from Chapter 3, “Retrieval System Evaluation” by Chris Buckley and Ellen M. Voorhees, in TREC: Experiment and Evaluation in Information Retrieval, Voorhees & Harman, eds., MIT Press, 2005, p. 62, Fig. 3.1)
• 22 Topics
• Average
• Desired is rare
• Acceptable < 10%
Accuracy Metrics
Most accurate TREC results for 20 of 22 topics in one test case, compared with STIR topical avg in 4 cases (I-IV) encompassing 42 topics
[Chart: accuracy for TREC topics 1-20 and STIR cases LR, KR, MV, Z (I-IV), with Ideal, TREC avg and Acceptable reference levels]
Accuracy metric: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
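A minimal sketch of the F1 accuracy metric named on this slide: the harmonic mean of precision and recall. The zero-division convention is an assumption added for completeness.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention: F1 is 0 when both P and R are 0
    return 2 * precision * recall / (precision + recall)


print(f1(0.8, 0.8), f1(2 / 3, 2 / 3), f1(0.9, 0.3))
```

Because F1 is a harmonic mean, P = R = x gives F1 = x, so the Desired point scores 0.8 and the Acceptable point 2/3, while a lopsided result such as P = 0.9, R = 0.3 scores only about 0.45.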
STIR compared with TREC IR
Topical P & R results for one TREC and 4 STIR cases
[Chart: Precision vs. Recall (0%-100%); series: TREC, Lone Ranger, Knight Rider, Miami Vice; clusters labeled STIR and TREC]
• Average P & R for each case
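The deck does not state how the per-case averages were formed. The sketch below assumes a macro-average (the mean of per-topic P and R values, so every topic counts equally); the case labels and numbers in the example are hypothetical.

```python
from statistics import mean


def case_averages(
    results: dict[str, list[tuple[float, float]]],
) -> dict[str, tuple[float, float]]:
    """Macro-average per-topic (precision, recall) pairs within each case.

    `results` maps a case label to one (P, R) pair per topic; each topic
    contributes equally regardless of how many documents it involves.
    """
    return {
        case: (mean(p for p, _ in pairs), mean(r for _, r in pairs))
        for case, pairs in results.items()
    }


# Hypothetical per-topic results for two made-up cases.
averages = case_averages({
    "case A": [(0.9, 0.7), (0.7, 0.9)],
    "case B": [(0.5, 0.5)],
})
```

A micro-average (pooling document counts across topics before dividing) would weight large topics more heavily and can give noticeably different points on the chart.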
Recall Improvement
Sampled Corpus Tests for 12 Topics in case I during STIR Training
[Chart: Precision vs. Recall (0%-100%) for Categories 1-12]
• STIR training provides substantial Recall improvement with acceptable Precision reduction
Retrieval acceptable to lowest limit of statistical uncertainty
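The deck does not say how the statistical uncertainty of the sampled tests was computed. One common approach, assumed here, is to review a random sample of the corpus, count how many of the relevant documents in that sample the system retrieved, and put a Wilson score confidence interval on the resulting recall estimate; a result is then treated as acceptable only if the interval's lower bound clears the target. The helper names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def recall_meets_target(retrieved_in_sample: int, relevant_in_sample: int,
                        target: float) -> bool:
    """Hypothetical check: does the lower confidence bound on recall reach the target?

    relevant_in_sample is the number of relevant documents found in a random
    sample of the corpus; retrieved_in_sample is how many of those the
    system retrieved.
    """
    low, _ = wilson_interval(retrieved_in_sample, relevant_in_sample)
    return low >= target
```

For example, if 180 of 200 relevant documents in a sample were retrieved, the point estimate is 0.9 and the 95% lower bound is roughly 0.85, which clears a 0.8 target but not a 0.9 one. Larger samples narrow the interval and raise the lower bound toward the point estimate.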
Dimensions of e-Discovery
[Diagram: Subject Matter, Legal Case, Linguistics, Documents, Community]
Dimensions of e-Discovery: Document Review
[Diagram: Legal Case, Documents]
Example Systems:
• Manual (human) review conducted by attorneys
• Basic keyword searches targeted to legal issues
• Supervised learning with relevance feedback
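The deck does not name a specific relevance-feedback algorithm. The classic Rocchio update is sketched here as one well-known example, not as the method used by any system discussed in the talk: the query vector is moved toward documents a reviewer judged relevant and away from those judged non-relevant. Documents are assumed to be sparse term-weight maps, and the alpha/beta/gamma values are commonly cited textbook settings, also an assumption.

```python
from collections import Counter


def rocchio(query: Counter, relevant: list[Counter], non_relevant: list[Counter],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Counter:
    """Rocchio relevance feedback on sparse term-weight vectors."""
    updated = Counter()
    for term, w in query.items():
        updated[term] += alpha * w
    for doc in relevant:            # pull toward the centroid of relevant docs
        for term, w in doc.items():
            updated[term] += beta * w / len(relevant)
    for doc in non_relevant:        # push away from the centroid of non-relevant docs
        for term, w in doc.items():
            updated[term] -= gamma * w / len(non_relevant)
    # Negative weights are conventionally clipped to zero.
    return Counter({t: w for t, w in updated.items() if w > 0})


def score(query: Counter, doc: Counter) -> float:
    """Dot-product score used to re-rank documents with the updated query."""
    return sum(w * doc.get(t, 0.0) for t, w in query.items())


# Hypothetical terms: one judged-relevant and one judged-non-relevant document.
new_query = rocchio(Counter({"merger": 1.0}),
                    [Counter({"merger": 1.0, "tender": 1.0})],
                    [Counter({"merger": 1.0, "holiday": 1.0})])
```

After one round, "tender" enters the query from the relevant example, while "holiday" is dropped because its weight goes negative.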
Dimensions of e-Discovery: Expert Search
[Diagram: Subject Matter, Legal Case, Documents]
Example Systems:
• Subject matter experts review results under legal team direction
• Domain-specific lexicons used
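One simple way a domain-specific lexicon can feed a search is query expansion: each concept is replaced by the variant terms a subject-matter community actually uses, and documents are matched on any variant. The sketch below assumes that usage; the lexicon entries and helper names are hypothetical and stand in for whatever lexicons the experts supply.

```python
import re

# Hypothetical domain lexicon: each concept maps to variant terms a community
# might use for it. Entries are illustrative only.
LEXICON = {
    "price fixing": ["price fixing", "price-fixing", "rate alignment"],
    "shredding": ["shred", "shredding", "purge files"],
}


def expand(concepts: list[str], lexicon: dict[str, list[str]]) -> list[str]:
    """Replace each concept with its lexicon variants (or itself if unlisted)."""
    terms: list[str] = []
    for concept in concepts:
        terms.extend(lexicon.get(concept, [concept]))
    return terms


def keyword_hits(text: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in text as whole words, case-insensitively."""
    return [t for t in terms
            if re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)]


terms = expand(["shredding"], LEXICON)
hits = keyword_hits("Please PURGE FILES before the audit.", terms)
```

A plain keyword search for "shredding" would miss this text; the expanded term list catches it through the community's own phrase.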
Dimensions of e-Discovery: Model Meaning
[Diagram: Subject Matter, Legal Case, Linguistics, Documents]
Example Systems:
• Supervised learning with
  – relevance feedback
  – semantic analysis
• Semantic search
Dimensions of e-Discovery: Model Communities
[Diagram: Subject Matter, Legal Case, Linguistics, Documents, Community]
Example System:
• Socio-Technical-IR
Dimensions of e-Discovery: Socio-Technical-IR
[Diagram: Linguistics, Community]
• Non-computational Linguistic Disciplines
  – Pragmatics
  – Sociolinguistics
  – Ethnomethodology
  – Discourse Analysis
• A community of practice is
  – a diverse group of people
  – engaged in real work
  – over a significant period of time
  – developing their own tools, language, and processes
  – during which they build things, solve problems, learn and invent
  – evolving a practice that is highly skilled and highly creative
Research Issues
• TREC
  – Nature of the relatively rare high-P-with-high-R queries
  – Measuring both recall and precision effectively
• AI
  – Knowledge-Based (Expert) Systems that codify linguistic expertise
  – Characterize practice communities of subject matter experts
  – Investigate combination systems applied to different types of topics
• Linguists
  – Identify and characterize different types of topics and map them to system types
  – Language patterns in communities as well as subject matter fields
  – Defining categories in concrete terms
• Lawyers
  – Defining categories in concrete terms
  – Integration of technology and processes
Back-Up
STIR Analysis: CoPs’ Enunciatory language
[Diagram: Relevant Document Text; State of Affairs; Object; Process; Action; Fact; Event]