Data-Driven Response Generation
Alan Ritter (Ohio State University)
1950s ~ 2010 Dialog systems mostly rule-based
Chatbots:
- Rule-Based: Eliza (Weizenbaum 1966)
- Information Retrieval (Isbell et al. 2000)
Goal-Directed Dialogue Systems:
- ATIS Dataset (Hemphill, 1990)
  - 774 flight reservation conversations
  - Manually annotated
1990s ~ 2010s Data-Driven Machine Translation
millions of bilingual documents on the web
Findings of WMT 2010 (Callison-Burch et al.)
The Mathematics of Statistical Machine Translation: Parameter Estimation (Brown et al.)
July 2011 Data-Driven Dialogue
500 million conversations per month on Twitter alone (vs. 30m for French-English translation)
Alan Ritter, Colin Cherry, Bill Dolan (EMNLP 2011) “Data-Driven Response Generation in Social Media”
NLP on Noisy User-Generated Text:
Unsupervised Dialogue Acts (Ritter, Cherry, Dolan, NAACL 2010)
Named Entity Recognition (Ritter et al. EMNLP 2011)
Open-Domain Event Extraction (Ritter et al. KDD 2012)
Minimally-Supervised Event Extraction (Ritter et al. WWW 2015)
MT → Dialogue
… and they lived happily ever after.
But, unlike MT, a message and its response are not semantically equivalent.
Input: Who wants to come over for dinner tomorrow?
Output (one phrase at a time): {Yum ! I} {want to} {be there} {tomorrow !}
2015 ~ present Neural MT-based Conversation Models
• I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Networks. AAAI 2016.
• J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, and J. Weston. Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems. ICLR 2016.
• A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell, J.-Y. Nie, J. Gao, and B. Dolan. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses. NAACL 2015.
• L. Shang, Z. Lu, and H. Li. Neural Responding Machine for Short Text Conversation. ACL 2015.
• O. Vinyals and Q. V. Le. A Neural Conversational Model. ICML Deep Learning Workshop 2015.
• J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan. A Diversity-Promoting Objective Function for Neural Conversation Models. NAACL 2016.
But, maximum likelihood estimate responses can be safe and boring
$$\arg\max_{r_1,\ldots,r_l} P(r_1,\ldots,r_l \mid m_1,\ldots,m_k)$$
($m_1,\ldots,m_k$: Input Message; $r_1,\ldots,r_l$: Response)
Some replies work for almost any input:
“I don’t know”
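To see why the arg max over P(r | m) favors generic replies, here is a minimal sketch in Python. It assumes a hypothetical toy corpus (`TOY_PAIRS`) and a count-based lookup table over whole responses, standing in for a real sequence model; none of these names come from the talk. Specific replies split their probability mass across many variants, while "i don 't know" collects mass under every message.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (message, response) pairs, invented for illustration.
TOY_PAIRS = [
    ("how old are you ?", "i 'm 16 ."),
    ("how old are you ?", "i 'm 25 ."),
    ("how old are you ?", "i don 't know"),
    ("how old are you ?", "i don 't know"),
    ("where are you from ?", "ohio"),
    ("where are you from ?", "seattle"),
    ("where are you from ?", "i don 't know"),
    ("where are you from ?", "i don 't know"),
]


def fit_conditional(pairs):
    """Count how often each whole response follows each message."""
    counts = defaultdict(Counter)
    for message, response in pairs:
        counts[message][response] += 1
    return counts


def response_prob(counts, message, response):
    """Relative-frequency estimate of P(response | message)."""
    seen = counts.get(message)
    if not seen:
        return 0.0
    return seen[response] / sum(seen.values())


def mle_response(counts, message):
    """Return arg max_r P(r | message) for a message seen in training."""
    seen = counts[message]
    return max(seen, key=seen.get)


COUNTS = fit_conditional(TOY_PAIRS)
```

With this corpus, `mle_response(COUNTS, m)` returns "i don 't know" for both messages, even though half of the observed replies to each message are informative.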
2016 Neural Dialogue with Deep Reinforcement Learning
J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, D. Jurafsky (EMNLP 2016) “Deep Reinforcement Learning for Dialogue Generation”
Jiwei Li (PhD Stanford 2017)
Problem: Short-sighted conversation decisions.
How old are you ?
i 'm 16 .
16 ?
i don 't know what you 're talking about
you don 't know what you 're saying
i don 't know what you 're talking about
you don 't know what you 're saying
Bad Action
Outcome does not emerge until a few turns later
Can Reinforcement Learning Handle This?
Notations: State
How old are you ?
[Figure: the encoder reads “how old are you” (Encoding)]
Notations: Action
How old are you ?
i 'm 16 .
[Figure: encoder-decoder; Encoding reads “how old are you”, Decoding emits “I’m 16 . EOS”; the decoder input row reads “EOS I’m fine .”]
Simulation
[Figure: A message from training set → Encode → Decode → r1 → Encode → Decode → r2 → …]
Compute Accumulated Reward R(S1,S2,…,Sn)
[Figure: Input Message → Encode → Decode (Turn 1, S1) → Encode → Decode (Turn 2, S2) → … → Encode → Decode (Turn N, Sn)]
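The simulate-then-score idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the policy is a hypothetical lookup table (`TOY_POLICY`) that replays the dialogue from the earlier slide rather than an encoder-decoder, the per-turn reward is invented, and R(S1, …, Sn) is taken to be a discounted sum, which the slides do not specify.

```python
DULL_RESPONSE = "i don 't know what you 're talking about"

# Hypothetical deterministic policy: a lookup table replaying the slide dialogue.
TOY_POLICY = {
    "How old are you ?": "i 'm 16 .",
    "i 'm 16 .": "16 ?",
    "16 ?": DULL_RESPONSE,
    DULL_RESPONSE: "you don 't know what you 're saying",
    "you don 't know what you 're saying": DULL_RESPONSE,
}


def simulate(input_message, n_turns, policy=TOY_POLICY):
    """Roll the policy forward: each generated turn becomes the next input."""
    turns = []
    message = input_message
    for _ in range(n_turns):
        response = policy.get(message, DULL_RESPONSE)
        turns.append(response)
        message = response
    return turns


def turn_reward(response, earlier_turns):
    """Hypothetical reward: +1 per turn, -1 if dull, -1 if it repeats an earlier turn."""
    reward = 1.0
    if response == DULL_RESPONSE:
        reward -= 1.0
    if response in earlier_turns:
        reward -= 1.0
    return reward


def per_turn_rewards(turns):
    """Score every simulated turn against the turns generated before it."""
    return [turn_reward(s, turns[:t]) for t, s in enumerate(turns)]


def accumulated_reward(turns, gamma=1.0):
    """R(S1, ..., Sn), assumed here to be a discounted sum of per-turn rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(per_turn_rewards(turns)))
```

Calling `per_turn_rewards(simulate("How old are you ?", 6))` shows the penalties only from the third simulated turn on, which is why a reward computed over a single turn would not flag the problem.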
Policy Gradient Methods:
REINFORCE Algorithm (Williams, 1992)
What we want to learn
[Figure: Input Message → Encode → Decode (Turn 1, S1) → Encode → Decode (Turn 2, S2) → … → Encode → Decode (Turn N, Sn)]
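As a rough illustration of the REINFORCE update, the sketch below trains a tiny softmax policy over three canned responses. This is a bandit-style simplification and not the model from the paper: the response set, the reward function `toy_reward`, the running-average baseline, and all hyperparameters are assumptions made for the example.

```python
import math
import random

# Hypothetical action set: three canned responses instead of a seq2seq decoder.
RESPONSES = ["i don 't know", "i 'm 16 .", "why do you ask ?"]
INITIAL_LOGITS = [2.0, 0.0, 0.0]  # starts biased toward the dull reply


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(response):
    """Hypothetical reward signal: the dull reply is penalized."""
    return -1.0 if response == "i don 't know" else 1.0


def reinforce(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on the toy policy."""
    rng = random.Random(seed)
    logits = list(INITIAL_LOGITS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action (a response) from the current policy.
        action = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        reward = toy_reward(RESPONSES[action])
        advantage = reward - baseline
        # For a softmax policy: d log pi(action) / d logit_i = 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)
```

Comparing `softmax(INITIAL_LOGITS)` with `reinforce()` shows probability mass moving away from the dull reply. In the dialogue setting the policy would be the encoder-decoder and the reward would be computed over simulated turns rather than a single response.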
Q: How to Specify a Reward Signal?
A: Turing Test
Adversarial Learning (Goodfellow et al., 2014)
J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, D. Jurafsky (EMNLP 2017) “Adversarial Learning for Neural Dialogue Generation”
Adversarial Learning for Neural Dialogue
[Figure: the Response Generator generates a response; a human response is sampled from Real-world conversations; the Discriminator decides: Real or Fake?]
(Alternate Between Training Generator and Discriminator)
REINFORCE Algorithm (Williams, 1992)
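The alternating generator/discriminator loop can be sketched on a toy problem. The code below is a simplification under stated assumptions, not the system from the paper: the generator is a softmax over three canned responses, the discriminator keeps one logit per response, `HUMAN_PROBS` is an invented distribution of human replies, and the generator is updated with REINFORCE using the discriminator's score as its reward.

```python
import math
import random

RESPONSES = ["i don 't know", "i 'm 16 .", "why do you ask ?"]
HUMAN_PROBS = [0.1, 0.45, 0.45]       # hypothetical distribution of human replies
INITIAL_GEN_LOGITS = [2.0, 0.0, 0.0]  # generator starts biased toward the dull reply


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adversarial_training(rounds=3000, lr_d=0.1, lr_g=0.1, seed=0):
    """Alternate one discriminator step and one generator step per round."""
    rng = random.Random(seed)
    indices = range(len(RESPONSES))
    gen_logits = list(INITIAL_GEN_LOGITS)
    # D(r) = sigmoid(disc_logits[r]) is the estimated probability that r is human.
    disc_logits = [0.0] * len(RESPONSES)
    baseline = 0.5
    for _ in range(rounds):
        gen_probs = softmax(gen_logits)
        # Discriminator step: one human sample (label 1), one generated sample (label 0).
        human = rng.choices(indices, weights=HUMAN_PROBS)[0]
        fake = rng.choices(indices, weights=gen_probs)[0]
        disc_logits[human] += lr_d * (1.0 - sigmoid(disc_logits[human]))
        disc_logits[fake] -= lr_d * sigmoid(disc_logits[fake])
        # Generator step: REINFORCE with the discriminator's score as reward.
        action = rng.choices(indices, weights=gen_probs)[0]
        reward = sigmoid(disc_logits[action])
        advantage = reward - baseline
        for i in indices:
            grad_log_prob = (1.0 if i == action else 0.0) - gen_probs[i]
            gen_logits[i] += lr_g * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(gen_logits), [sigmoid(d) for d in disc_logits]
```

`adversarial_training()` returns the final generator distribution and the discriminator's per-response scores. The generator only ever sees the discriminator's "real or fake" score, so no hand-written reward is needed in this sketch.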
Adversarial Learning Improves Response Generation
Machine Evaluator: Adversarial Success (How often can you fool a machine)
- Adversarial Learning: 8.0%
- Standard Seq2Seq model: 4.9%
Human Evaluator (vs a vanilla generation model):
- Adversarial Win: 62%
- Adversarial Lose: 18%
- Tie: 20%
Future: Integrating dynamic knowledge graphs
Extract Entities, Relations and Events
[Figure: knowledge graph with entities Barack Obama, Michelle Obama, David Ige, Hawaii, Honolulu, United States, Princeton and relation labels Born in, President, Mayor, Spouse, Alma Mater, Capitol]
A. Konovalov, B. Strauss, A. Ritter and B. O'Connor (WWW 2017) “Learning to Extract Events from Knowledge Base Revisions”
Takeaways
MT → Dialogue
Open-Domain Dialogue
Learning from Delayed-Reward
Adversarial Learning for Dialogue
Thank You!