Discriminative Dialog Analysis Using a Massive Collection of BBS comments
Eiji ARAMAKI (University of Tokyo), Takeshi ABEKAWA (University of Tokyo), Yohei MURAKAMI (NICT), Akiyo NADAMOTO (NICT)
Bulletin Board Systems
What sort of text? Why BBS?
← In Japan, |BBS| >> |News Wire|, |Wikipedia|
[Example BBS thread: ID / name / comment, with Reply and Not Reply annotations]
• "please tell me why my nano sometimes stops even battery still remains."
• "How about iriver N12? extremely light and small."
• "It is because battery display approaches approx. Even battery runs out, display sometimes shows it is still left."
• "iriver N series has stopped producing."
• "What is the most light or small mp3 player? iPod Shuffle is the best way to do?"
BUT: NLP suffers from gaps between corresponding comments
[In the same example thread, Reply arrows skip over intervening comments]
Example: "N12" is a "small and light" "MP3 player", but now "has stopped producing"
How Often Do Such Gaps Occur? (gap length / distance vs. frequency)
• No gap (distance = 1) accounts for only 50% of replies
• Usually distance = 2~5
• → Gaps are a common phenomenon

【 QUESTION 】 Despite gaps, how does a human being capture REPLY-TO relations?
Linguistics has already given several answers
• One of the answers: Relevance theory [Sperber1986]
  Linguist: "Human communication is based on relevance."
  Computer Scientist: "Not enough! How do we calculate relevance?"
【 This study's GOAL 】 To formalize relevance
Outline
• Background
• Method
  – Task setting / Our Approach
  – How to formalize two types of relevance
• Experiment
• Related Works
• Conclusion
Our Task-setting
• Natural task-setting = to which comment does comment i reply? (candidates: i-1, i-2, i-3, …)
  → a complex task
• INSTEAD: a discriminative task
  – Input: two comments in the same BBS (P & Q)
  – Output: True (Q is a reply to P) / False
  → suitable for machine learning (such as SVM)
Our Approach/Assumption
• Two types of relevance are available:
  (1) Contents Relevance — roughly speaking, sentence similarity
      e.g., "What is the most light or small mp3 player?" ↔ "How about iriver N12? extremely light and small."
  (2) Discourse Relevance — the discourse function of comments
Outline
• Background
• Method
  – Task setting / Our Approach
  – How to formalize two types of relevance
    • (1) Contents Relevance
    • (2) Discourse Relevance
• Experiment
• Related Works
• Conclusion
Two Contents Relevance Measures
• (1) Word Overlap Ratio
  Example: P = "What is the most light or small mp3 player?"
           Q = "How about iriver N12? extremely light and small."
  → overlap ratio = 4/12 = 0.33
  BUT: the simple word overlap ratio cannot capture that "mp3 player" ≈ "iriver N12"!!
• (2) WebPMI-based Sentence Similarity
  WebPMI [Bollegala2007] is the mutual information of two words in web pages:
    WebPMI(p, q) = log [ (H(p ∩ q)/N) / ((H(p)/N) · (H(q)/N)) ]
  where H(p ∩ q) = # of web pages that contain both "N12" and "MP3",
        H(p) = # of web pages that contain "N12",
        H(q) = # of web pages that contain "MP3",
        N = the total # of pages.
  For each word in P, find the word in Q with the highest WebPMI, and sum up these values.
  (See the sketch below.)
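To make the two contents-relevance measures concrete, here is a minimal Python sketch. It assumes the page counts H(·) are already available (e.g., from a search-engine API); the function names, the hit-count dictionary layout, and the zero-count handling are illustrative assumptions, not the authors' implementation.

```python
import math

def word_overlap_ratio(p_words, q_words):
    # (1) Word overlap ratio: shared words over the total words of the pair
    # (one plausible normalization; the slide's exact denominator may differ).
    shared = set(p_words) & set(q_words)
    return 2 * len(shared) / (len(p_words) + len(q_words))

def web_pmi(p, q, hits, n_pages):
    # (2) WebPMI(p, q) = log[(H(p&q)/N) / ((H(p)/N) * (H(q)/N))],
    # where `hits` maps a word or a (word, word) pair to its web page count.
    h_pq, h_p, h_q = hits.get((p, q), 0), hits.get(p, 0), hits.get(q, 0)
    if min(h_pq, h_p, h_q) == 0:
        return 0.0  # treat undefined PMI as "no association"
    return math.log((h_pq / n_pages) / ((h_p / n_pages) * (h_q / n_pages)))

def webpmi_similarity(p_words, q_words, hits, n_pages):
    # For each word in P, take the Q word with the highest WebPMI and sum the values.
    return sum(max(web_pmi(wp, wq, hits, n_pages) for wq in q_words)
               for wp in p_words)
```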
Outline
• Background
• Method
  – Task setting
  – How to formalize two types of relevance
    • (1) Contents Relevance
    • (2) Discourse Relevance
• Experiment
• Related Works
• Conclusion
Discourse Relevance (CPMI; Corresponding PMI ← newly proposed)
• ALSO: a PMI-based measure
• BUT: it counts co-occurring phrases in P and Q
  Example: P = "please tell me why my nano sometimes stops …" / Q = "It is because battery display approaches …"
    CPMI(p, q) = log [ (H(p ∩ q)/N) / ((H(p)/N) · (H(q)/N)) ]
  where H(p ∩ q) = # of P-Q pairs that contain "please tell me why" in P and "It is because" in Q,
        H(p) = # of P that contain "please tell me why",
        H(q) = # of Q that contain "It is because",
        N = the total # of P-Q pairs.
  (See the sketch below.)
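A comparable sketch for CPMI, counted over the collected P-Q comment pairs rather than web pages. The three count tables are an assumed layout, and returning negative infinity for unseen combinations is just one way to handle zero counts.

```python
import math

def cpmi(phrase_p, phrase_q, pair_count, p_count, q_count, n_pairs):
    # CPMI(p, q) = log[(H(p&q)/N) / ((H(p)/N) * (H(q)/N))], counted over P-Q pairs:
    #   pair_count[(p, q)] : # of P-Q pairs with phrase p in P and phrase q in Q
    #   p_count[p]         : # of P comments containing phrase p
    #   q_count[q]         : # of Q comments containing phrase q
    #   n_pairs            : total # of collected P-Q pairs
    h_pq = pair_count.get((phrase_p, phrase_q), 0)
    h_p, h_q = p_count.get(phrase_p, 0), q_count.get(phrase_q, 0)
    if min(h_pq, h_p, h_q) == 0:
        return float("-inf")  # no evidence that the two phrases correspond
    return math.log((h_pq / n_pairs) / ((h_p / n_pairs) * (h_q / n_pairs)))
```

The slide's example would then be scored as, e.g., cpmi("please tell me why", "It is because", pair_count, p_count, q_count, n_pairs).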
• Sometimes (5.1%), we can easily identify the response target by using lexical clues (a NAME or COMMENT-ID)
  Example: comment 102 begins with "100>", so "100> nice to meet you.." replies to comment 100, "It's my first comment! Nice to meet you."
  [Pie chart: Known 5.1% / Unknown: the rest]
• Of course, 5.1% is a low ratio
• OUR SOLUTION: we rely on the data scale (17,300,000 comments) → enough pairs for the PMI calculation
→ Building a collection of P & Q pairs by using lexical patterns (see the sketch below)
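A rough sketch of the pattern-based pair collection. It assumes each comment is an (id, text) record and that an anchor such as "100>" at the head of a comment marks an explicit reply to comment ID 100; the regex and record format are illustrative, not the authors' exact lexical patterns.

```python
import re

# An anchor like "100>" at the head of a comment is read as an explicit reply marker.
ANCHOR = re.compile(r"^\s*(\d+)\s*>")

def collect_reply_pairs(comments):
    # comments: list of (comment_id, text) within one thread.
    # Returns (P, Q) text pairs where Q explicitly replies to P via a comment-ID
    # anchor, i.e. the ~5.1% "known" cases used to build the pair collection.
    by_id = dict(comments)
    pairs = []
    for _cid, text in comments:
        m = ANCHOR.match(text)
        if m and int(m.group(1)) in by_id:
            pairs.append((by_id[int(m.group(1))], text))  # (P, Q)
    return pairs
```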
Outline
• Background
• Method
• Experiment
• Related Works
• Conclusion
Experiment 1
• TEST-SET: 140 comment pairs (140 P-Q pairs)
  – Half are positive pairs (extracted by the lexical patterns); the other half are random pairs
• TASK: output whether Q is a reply to P or not
• METHODS:
  – Human-A, B, C
  – OVERLAP: only the word overlap ratio (IF ratio > Th THEN TRUE ELSE FALSE)
  – WEBPMI: only Contents Relevance (IF PMI > Th THEN TRUE ELSE FALSE)
  – CPMI: only Discourse Relevance (IF PMI > Th THEN TRUE ELSE FALSE)
  – SVM: features = VALUE (WEBPMI & CPMI) + LEXICON (words ∈ P, Q)
  (A sketch of these classifiers follows below.)
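A sketch of how the threshold baselines and the SVM could be wired up, assuming scikit-learn and precomputed relevance scores; the feature encoding is a guess at "VALUE + LEXICON", not the paper's exact configuration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def threshold_classify(score, th):
    # OVERLAP / WEBPMI / CPMI baselines: TRUE iff a single relevance score exceeds Th.
    return score > th

def to_features(p_words, q_words, webpmi_score, cpmi_score):
    # SVM features: VALUE (WebPMI & CPMI scores) + LEXICON (words in P and in Q).
    feats = {"webpmi": webpmi_score, "cpmi": cpmi_score}
    feats.update({f"P:{w}": 1.0 for w in p_words})
    feats.update({f"Q:{w}": 1.0 for w in q_words})
    return feats

def train_reply_classifier(feature_dicts, labels):
    # Fit a linear SVM on (feature dict, True/False) training examples.
    vec = DictVectorizer()
    clf = SVC(kernel="linear")
    clf.fit(vec.fit_transform(feature_dicts), labels)
    return vec, clf
```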
Result Summary

Method    Accuracy  Precision  Recall  Fβ=1
Human-A   79.2      83.3       75.3    79.1
Human-B   75.7      78.2       73.9    76.0
Human-C   70.7      71.6       72.6    72.1
OVERLAP   61.4      58.7       87.6    70.3
WEBPMI    61.4      72.0       42.4    53.4
CPMI      65.7      66.2       69.8    67.9
SVM       63.8      64.4       79.4    72.1
• Humans (70-79% accuracy) outperform all automatic methods
• OVERLAP ≒ WEBPMI < SVM < CPMI (accuracy) → discourse relevance is strong
• SVM below CPMI → is the feature design not suitable?
Kappa Matrix
• Agreement between methods

          Human-B  Human-C  OVERLAP  WEBPMI  CPMI
Human-A   0.56     0.49     0.08     0.20    0.28
Human-B            0.47     0.09     0.21    0.25
Human-C                     0.15     0.05    0.25
OVERLAP                              0.21    0.13
WEBPMI                                       0.16

(Legend: "Moderate" = high agreement, "Slight" = low agreement)

• Human outputs are similar to each other
• WEBPMI & CPMI show low agreement → they succeed on different examples
→ This supports our assumption that relevance decomposes into two types: (1) contents & (2) discourse
(A kappa computation sketch follows below.)
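For reference, a small sketch of pairwise agreement between two methods' True/False outputs on the same test pairs, using Cohen's kappa (a standard choice; the slides do not state which kappa variant was used).

```python
def cohen_kappa(a, b):
    # Cohen's kappa between two lists of boolean judgments on the same items.
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n             # rate of TRUE judgments per rater
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        return 1.0  # degenerate case: agreement is certain by chance
    return (observed - expected) / (1 - expected)
```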
Several examples of phrase pairs that have high CPMI values

CPMI   P                   Q
8.43   I'd like to go      Wait for you
8.37   Where is it …       It is in/at
7.62   Please tell me…     I think it is …
7.47   How about…          as soon as possible
7.38   You can …           I try …
7.12   I think …           Thank you
6.93   …, isn't it ?       Maybe
6.80   Thank you           Your welcome
6.72   I …                 I…too

• Some pairs correspond to ANSWER and THANKING acts
• Others capture an event sequence (P says "go" & Q says "wait")
• These are outside the reach of sentence similarity, motivating discourse clues
Outline
• Background
• Method
• Experiment
• Related Works (if enough time is left)
• Conclusion
Related Works (1/2) in Linguistics
• 4 conversational maxims [Grice1975]
• Relevance theory [Sperber1986]
  → But how can the maxims/relevance be calculated? We've formalized it!
• Adjacency Pair [Schegloff&Sacks1973]: a sequence of two utterances (such as "offer-acceptance")
  → In BBSs, adjacency pairs are not adjacent; this motivates our task
Related Works (2/2) in NLP
• Previous (dialog and discourse) studies
  – Such as DAMSL [Core&Allen1997], RST-DT [Carlson2002], Discourse GraphBank [Wolf2005]
  – Based on carefully annotated corpora with rich sets of labels/relations
• This study
  – Only one relation (the REPLY-TO relation)
  – BUT: requires no human annotation → large scale → enables the calculation of statistical values (PMI)
Outline
• Background
• Method
• Experiment
• Related Works
• Conclusion
Conclusion
• (1) NEW TASK
  – To detect the REPLY-TO relation between comments
• (2) Formalization of relevance
  – To solve the task, we formalize two types of relevance: CONTENTS & DISCOURSE relevance
• (3) Automatic corpus building
  – To calculate DISCOURSE relevance, we also propose a pattern-based corpus construction method
FINALLY: We believe this study will boost larger-scale dialog studies (using the WEB)