Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

37
Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries Danyun Xu, Gong Cheng , Yuzhong Qu Nanjing University, China Presented at ESWC 2014, Crete, Greece

description

Presented at ESWC2014, Crete, Greece.

Transcript of Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Page 1: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Facilitating Human Intervention in

Coreference Resolution with Comparative Entity Summaries

Danyun Xu, Gong Cheng, Yuzhong QuNanjing University, China

Presented at ESWC 2014, Crete, Greece

Page 2: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Coreference resolution

TimBL

givenName: “Tim”

surname: “Berners-Lee”altName: “Tim BL”

type: Scientist

gender: “male”

isDirectorOf: W3C

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Wendy

fullName: “Wendy Hall”

type: ComputerScientist

type: RoyalSocietyFellow

sex: “Female”

birthplace: London

founded: WSRI

Page 3: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Methods with humans in the loop(or, coordinating “ings”)• Active learning• Crowdsourcing• Pay-as-you-go

Page 4: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Methods with humans in the loop(or, coordinating “ings”)• Active learning• Crowdsourcing• Pay-as-you-go

Candidate coreferent entities…TimBL ------Wendy

TimBL ------TBLChrisB ------Bizer…

Select & Present

Verify

Page 5: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Methods with humans in the loop(or, coordinating “ings”)• Active learning• Crowdsourcing• Pay-as-you-go

Candidate coreferent entities…TimBL ------Wendy

TimBL ------TBLChrisB ------Bizer…

Select & Present

Verify

Existing focus

Page 6: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Methods with humans in the loop(or, coordinating “ings”)• Active learning• Crowdsourcing• Pay-as-you-go

Candidate coreferent entities…TimBL ------Wendy

TimBL ------TBLChrisB ------Bizer…

Select & Present

Verify

Our focus

Page 7: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Present entire entity descriptions?

Page 8: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Present a compact comparative summary!

givenName: “Tim”

surname: “Berners-Lee”

isDirectorOf: W3C

name: “Tim Berners-Lee”invented: WWW

Page 9: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Present a compact comparative summary!

Which property-value (PV) pairs are more helpful?

Page 10: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Four aspects of a good comparative summary

1. Reflecting commonality2. Reflecting difference3. Providing information on identity4. Providing diverse information

Page 11: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality

• Common PV pairs = comparable properties + similar values

TimBL

givenName: “Tim”

surname: “Berners-Lee”altName: “Tim BL”

type: Scientist

gender: “male”

isDirectorOf: W3C

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Page 12: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality

• Common PV pairs = comparable properties + similar values• More helpful properties =

more like an Inverse Functional Property (IFP)

TimBL

givenName: “Tim”

surname: “Berners-Lee”altName: “Tim BL”

type: Scientist

gender: “male”

isDirectorOf: W3C

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Page 13: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality (details)

• Comparability between properties• Learned from known coreferent entities

• String similarityComparable properties = Properties having similar values

Page 14: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality (details)

• Comparability between properties• Learned from known coreferent entities

• String similarity

• Similarity between values• String similarity

Comparable properties = Properties having similar values

Page 15: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality (details)

• Comparability between properties• Learned from known coreferent entities

• String similarity

• Similarity between values• String similarity

• Likeness to an IFP• Estimated based on the data set

𝐿𝑖𝑘𝑒𝑛𝑒𝑠𝑠=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑖𝑠𝑡𝑖𝑛𝑐𝑡𝑣𝑎𝑙𝑢𝑒𝑠

𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑎𝑙𝑙 𝑣𝑎𝑙𝑢𝑒𝑠

Comparable properties = Properties having similar values

Page 16: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

1. Commonality (weakness)

• Only reflecting commonality can be misleading.

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Wendy

fullName: “Wendy Hall”

type: ComputerScientist

type: RoyalSocietyFellow

sex: “Female”

birthplace: London

founded: WSRI

Page 17: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

2. Difference

• Different PV pairs = comparable properties + dissimilar values

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Wendy

fullName: “Wendy Hall”

type: ComputerScientist

type: RoyalSocietyFellow

sex: “Female”

birthplace: London

founded: WSRI

Page 18: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

2. Difference

• Different PV pairs = comparable properties + dissimilar values• More helpful properties =

more like a Functional Property (FP)

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Wendy

fullName: “Wendy Hall”

type: ComputerScientist

type: RoyalSocietyFellow

sex: “Female”

birthplace: London

founded: WSRI

Page 19: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

2. Difference (details)

• Comparability between properties• Learned from known coreferent entities• String similarity

• Dissimilarity between values• String similarity

• Likeness to a FP• Estimated based on the data set

𝐿𝑖𝑘𝑒𝑛𝑒𝑠𝑠=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑖𝑠𝑡𝑖𝑛𝑐𝑡 𝑠𝑢𝑏𝑗𝑒𝑐𝑡𝑠

𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑎𝑙𝑙 𝑠𝑢𝑏𝑗𝑒𝑐𝑡𝑠

Page 20: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

3. Information on identity

TimBL

givenName: “Tim”

surname: “Berners-Lee”altName: “Tim BL”

type: Scientist

gender: “male”

isDirectorOf: W3C

TBL

name: “Tim Berners-Lee”type: ComputerScientist

type: RoyalSocietyFellow

sex: “Male”

invented: WWW

founded: WSRI

Page 21: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

3. Information on identity (details)

• Information on identity• Estimated based on the data set

𝑖𝑛𝑓𝑜𝑟𝑚𝑎𝑡𝑖𝑜𝑛=1−log (𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑒𝑛𝑡𝑖𝑡𝑖𝑒𝑠h𝑎𝑣𝑖𝑛𝑔 h𝑡 𝑖𝑠 𝑃𝑉 𝑝𝑎𝑖𝑟 )

log (𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑎𝑙𝑙𝑒𝑛𝑡𝑖𝑡𝑖𝑒𝑠 )

Page 22: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

4. Diversity of information

• Overlapping PV pairs = similar properties or similar values

TimBL

givenName: “Tim”

surname: “Berners-Lee”altName: “Tim BL”

type: Scientist

gender: “male”

isDirectorOf: W3C

Overlapping

Page 23: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

To find an optimal summary(or, to find the most helpful PV pairs)

• Maximize• Commonality• Difference• Information on identity• Diversity of information

• Subject to• A length limit

Page 24: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

To find an optimal summary(or, to find the most helpful PV pairs)

• Maximize• Commonality• Difference• Information on identity• Diversity of information

• Subject to• A length limit

• Formulated as a binary quadratic knapsack problem• Solved by GRASP-based local search

Page 25: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Evaluation method

• 4 approaches to be blindly tested• 20 subjects (university students)• 24 random tasks for each subject• 4 approaches * (3 positive cases + 3 negative cases)• Sorted in random order

givenName: “Tim”

surname: “Berners-Lee”

isDirectorOf: W3C

name: “Tim Berners-Lee”

invented: WWW

Entity summary Subject

Coreferent Non-coreferent Not sure

Present

Verify

Page 26: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Data sets and tasks

• Data setsPlaces

Films

Page 27: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Data sets and tasks

• Data sets

• Tasks

http://dbpedia.org/resource/Paris,_Texas

http://dbpedia.org/resource/Paris

http://sws.geonames.org/4717560/

http://sws.geonames.org/2988507/

sameAs(positive case)

sameAs(positive case)

Places

Films

Page 28: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Data sets and tasks

• Data sets

• Tasks

Paris

http://dbpedia.org/resource/Paris,_Texas

http://dbpedia.org/resource/Paris

http://sws.geonames.org/4717560/

http://sws.geonames.org/2988507/

disambiguates

sameAs(positive case)

sameAs(positive case)

(negative cases)

Places

Films

Page 29: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Approaches

Approach Description

NOSUMM Present entire entity descriptions

GENERIC • Information on identity [3]• Diversity of information

COMPSUMM • Commonality• Difference• Information on identity• Diversity of information

COMPSUMM-C • Commonality• Difference• Information on identity• Diversity of information

[3] Gong Cheng et al. RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization (ISWC 2011)

Page 30: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Results (1)

• Accuracy of verification• COMPSUMM ≈ NOSUMM

> COMPSUMM-C> GENERIC

Page 31: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Results (2)

• Efficiency of verification• COMPSUMM > NOSUMM (2.7—2.9 times faster)

Page 32: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Take-home messages

• Provide entity summaries for verifying coreference.• improves efficiency (2.7—2.9 times faster)• without notably affecting accuracy

• Provide comparative (but not just generic) summaries.• Show both commonality and difference.

Page 33: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Future work

• Present = Summarize + Visualize

Candidate coreferent entities…TimBL ------Wendy

TimBL ------TBLChrisB ------Bizer…

Select & Present

Verify

Our focus

Page 34: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Thanks for your attention

Page 35: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Results (3)

• Erroneous decisions• COMPSUMM-C > COMPSUMM (mostly in negative

cases)

Page 36: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Performance testing

• Offline computation• Comparability between properties (the learning part)• Likeness to an IFP/FP• Information on identity

Page 37: Facilitating Human Intervention in Coreference Resolution with Comparative Entity Summaries

Performance testing

• Offline computation• Comparability between properties (the learning part)• Likeness to an IFP/FP• Information on identity

• Online computation• Similarity between properties/values• Optimization

• Results• Places (DBpedia and GeoNames): 24ms per case• Films (DBpedia and LinkedMDB): 35ms per case