Dialogue State Induction Using Neural Latent Variable Models
Qingkai Min¹,², Libo Qin³, Zhiyang Teng¹,², Xiao Liu⁴ and Yue Zhang¹,²
¹School of Engineering, Westlake University; ²Institute of Advanced Technology, Westlake Institute for Advanced Study; ³Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology; ⁴School of Computer Science and Technology, Beijing Institute of Technology
Contents
Motivation
Method
Experiments
Conclusion
Motivation
CHAPTER 1 What is a task-oriented dialogue system?
A task-oriented dialogue system assists the user in solving a task. It can follow a typical modular architecture or an end-to-end architecture.
Pic from: Gao J, Galley M, Li L. Neural approaches to conversational AI. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018: 1371-1374.
CHAPTER 1 What is dialogue state tracking (DST)?
The dialogue state represents what the user is looking for at the current turn of the conversation.
User: I want an expensive restaurant that serves Turkish food.
System: Anatolia serves Turkish food.
User: What is the area?
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)
CHAPTER 1 Current DST scenarios
Traditional DST: the utterance first passes through an NLU component and then a separate DST component (utterance → NLU → DST → dialogue state), which propagates NLU uncertainty into the tracker.
End-to-end DST: a single DST component maps the utterance directly to the dialogue state (utterance → DST → dialogue state).
CHAPTER 1 End-to-end DST
User: I want an expensive restaurant that serves Turkish food.
Manual labeling: inform(price=expensive, food=Turkish)
System: Anatolia serves Turkish food.
User: What is the area?
Manual labeling: inform(price=expensive, food=Turkish); request(area)
Ontology (optional):
price: [cheap, expensive, moderate, ...]
food: [Turkish, Italian, Polish, ...]
area: [south, north, center, ...]
The DST model is trained on such labeled turns to output:
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)
CHAPTER 1 Limitations of end-to-end DST
End-to-end DST paradigm: corpus → manual labeling → train DST model → test.
Limitations of end-to-end DST:
• Costly and slow: 8438 dialogues annotated by 1249 workers in the MultiWOZ 2.0 dataset
• Error-prone: annotation errors affect around 40% of MultiWOZ 2.0 [Eric et al., 2019] and over 30% of MultiWOZ 2.1 [Zhang et al., 2019] (now updated to MultiWOZ 2.2)
[Eric et al., 2019] Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, and Dilek Hakkani-Tur. MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines. arXiv, 2019.
[Zhang et al., 2019] Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, and Caiming Xiong. Find or classify? Dual strategy for slot-value predictions on multi-domain dialog state tracking. arXiv, 2019.
CHAPTER 1 What is the problem?
✓ Successful in narrow domains with large annotated datasets.
But such models are limited to the domains they are trained on and do not afford generalization to new domains.
CHAPTER 1 Dialogue State Induction (DSI)
Definition:
What is given? A set of customer service records without annotation.
What is the target? Automatically discover the information that the user is looking for at each turn.
User: I want an expensive restaurant that serves Turkish food.
System: Anatolia serves Turkish food.
User: What is the area?
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)
CHAPTER 1 Dialogue State Induction vs DST
DST (supervised):
User: I want an expensive restaurant that serves Turkish food.
Manual labeling: inform(price=expensive, food=Turkish)
System: Anatolia serves Turkish food.
User: What is the area?
Manual labeling: inform(price=expensive, food=Turkish); request(area)
Ontology (optional): price: [cheap, expensive, moderate, ...]; food: [Turkish, Italian, Polish, ...]; area: [south, north, center, ...]
DSI (unsupervised): the same dialogue, but with no manual labeling and no ontology.
Both target the same output:
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)
CHAPTER 1 DSI vs zero-shot DST
Zero-shot DST: supports unseen domains (services).
Motivation: different domains (services) share similar schemas, i.e., slots along with their natural language descriptions.
Example from the SGD dataset [Rastogi et al., 2019]:
Service — service_name: "Flights"; description: "Search for cheap flights across multiple providers"
Slots — name: "origin"; description: "City of origin for the flight"
Slots — name: "destination"; description: "City of destination for the flight"
Service — service_name: "Trains"; description: "Service to find train journeys between cities"
Slots — name: "from"; description: "Starting city for train journey"
Slots — name: "to"; description: "Ending city for the train journey"
Zero-shot DST limitations:
• Highly qualified (consistent) human annotation: transfer relies on consistency between the slot descriptions of the seen and unseen services.
• Transfer to a distant domain (service) fails ✗: e.g., a "Financial" service ("Search for financial products across multiple providers") with slots "deposit_rate" ("the standard for calculating deposit interest") and "loan_rate" ("the interest rate that banks charge to borrowers") shares little with "Trains".
DSI features:
• Relieves the human annotation burden
• Data-driven: automatically discovers slots and values
[Rastogi et al., 2019] Rastogi A, Zang X, Sunkara S, et al. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. arXiv preprint arXiv:1909.05855, 2019.
Method
CHAPTER 2 How do we solve DSI?
Two steps, for the utterance: I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.
• Candidate (value) extraction (POS tagging, NER, coreference): train, Chicago, Dallas, Wednesday
• Slot assignment with two neural latent variable models (DSI-base and DSI-GM): inform{train-departure=Chicago, train-destination=Dallas, train-leave at=Wednesday}; train=None
A sketch of the first step follows.
CHAPTER 2 What is a VAE?
[Figure: AutoEncoder vs. Variational AutoEncoder]
Pics from https://www.jianshu.com/p/ffd493e10751
CHAPTER 2 Model: generative DSI-base
Encoder, for the utterance: I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.
• One-hot vector oh over the vocabulary of all candidates (e.g., 0 1 0 1 … 0 0 1 1) → LT+SoftPlus+Dropout → encoded oh
• Contextualized embeddings of the candidates (train, Chicago, Dallas, Wednesday) from pre-trained ELMo → AvgPooling+LT+SoftPlus+Dropout → encoded ce
• [encoded oh; encoded ce] → LT+BatchNorm (one head each) → posterior µ and posterior log σ²
(LT = linear transformation.) A sketch of this encoder is shown below.
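A minimal PyTorch sketch of the encoder just described. Layer sizes, the dropout rate, and the exact way the two encodings are combined (here: simple concatenation) are assumptions, not the authors' exact implementation.

```python
# Sketch of the DSI-base encoder: two branches (one-hot, contextualized
# embeddings), each LT+SoftPlus+Dropout, then two LT+BatchNorm heads
# producing posterior mu and posterior log sigma^2. Sizes are assumptions.
import torch
import torch.nn as nn

class DSIBaseEncoder(nn.Module):
    def __init__(self, vocab_len, ce_dim, hidden, latent):
        super().__init__()
        self.oh_enc = nn.Sequential(nn.Linear(vocab_len, hidden),
                                    nn.Softplus(), nn.Dropout(0.2))
        self.ce_enc = nn.Sequential(nn.Linear(ce_dim, hidden),
                                    nn.Softplus(), nn.Dropout(0.2))
        self.mu = nn.Sequential(nn.Linear(2 * hidden, latent), nn.BatchNorm1d(latent))
        self.logvar = nn.Sequential(nn.Linear(2 * hidden, latent), nn.BatchNorm1d(latent))

    def forward(self, oh, ce):
        # oh: [B, vocab_len]; ce: [B, n_candidates, ce_dim]
        h = torch.cat([self.oh_enc(oh), self.ce_enc(ce.mean(dim=1))], dim=-1)
        return self.mu(h), self.logvar(h)   # posterior mu, posterior log sigma^2
```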
CHAPTER 2 Model: generative DSI-base
Sampling (reparameterization trick): z = µ + σ ∗ ε with ε ∼ N(0, I), giving the latent state vector z.
Decoder:
• z → LT+Softmax → latent slot vector s (dimension: slot_num)
• s → LT+BatchNorm → reconstructed oh over the vocabulary of all candidates (e.g., 0.01 0.2 0.01 0.08 … 0.03 0.02 0.3 0.4)
• s → LT+BatchNorm → recon_µ (contextualized feature dimension)
• s → LT+BatchNorm → recon_log σ² (contextualized feature dimension)
recon_µ and recon_log σ² are the reconstructed parameters used to generate ce. A sketch of the sampling step and decoder follows.
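A matching sketch of the sampling step and decoder, under the same assumptions as the encoder sketch above.

```python
# Sketch of DSI-base sampling and decoding. z = mu + sigma * eps with
# eps ~ N(0, I) is the reparameterization trick; the three LT+BatchNorm
# heads mirror the slides. Layer sizes are assumptions.
import torch
import torch.nn as nn

class DSIBaseDecoder(nn.Module):
    def __init__(self, latent, slot_num, vocab_len, ce_dim):
        super().__init__()
        self.to_slot = nn.Linear(latent, slot_num)   # LT (+ Softmax in forward)
        self.to_oh = nn.Sequential(nn.Linear(slot_num, vocab_len), nn.BatchNorm1d(vocab_len))
        self.to_mu = nn.Sequential(nn.Linear(slot_num, ce_dim), nn.BatchNorm1d(ce_dim))
        self.to_logvar = nn.Sequential(nn.Linear(slot_num, ce_dim), nn.BatchNorm1d(ce_dim))

    def forward(self, mu, logvar):
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        s = torch.softmax(self.to_slot(z), dim=-1)               # latent slot vector
        recon_oh = torch.softmax(self.to_oh(s), dim=-1)          # distribution over candidates
        return s, recon_oh, self.to_mu(s), self.to_logvar(s)     # + Gaussian params for ce
```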
CHAPTER 2 Loss
Loss = reconstruction term + regularization term.
Reconstruction term:
• Cross-entropy loss between the reconstructed oh (0.01 0.2 0.01 0.08 … 0.03 0.02 0.3 0.4) and the input oh (0 1 0 1 … 0 0 1 1).
• A multivariate Gaussian distribution parameterized by recon_µ and recon_log σ² (over the contextualized feature dimension) scores the candidates' contextualized embeddings: the sum of -dist.log_prob(ce[c]) for c ∈ {train, Chicago, Dallas, Wednesday}.
Regularization term: KL(N(µ, σ²) ‖ N(0, I)), the KL divergence between the posterior and the prior.
Pic from https://www.datalearner.com/blog/1051485590815771
A sketch of the full loss follows.
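A sketch of the loss under the assumptions above; the equal weighting of the terms is itself an assumption.

```python
# Sketch of the DSI-base training loss: multi-label cross-entropy on the
# candidate one-hots, Gaussian log-likelihood of the contextualized
# embeddings, and a closed-form KL against the standard normal prior.
import torch
from torch.distributions import Normal

def dsi_base_loss(oh, ce_list, recon_oh, recon_mu, recon_logvar, mu, logvar):
    # cross-entropy: -sum over observed candidates of log recon_oh
    recon_term = -(oh * torch.log(recon_oh + 1e-10)).sum(dim=-1)
    # Gaussian reconstruction of each candidate's contextualized embedding
    dist = Normal(recon_mu, torch.exp(0.5 * recon_logvar))
    recon_term = recon_term + sum(-dist.log_prob(ce).sum(dim=-1) for ce in ce_list)
    # KL(N(mu, sigma^2) || N(0, I)) in closed form
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    return (recon_term + kl).mean()
```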
CHAPTER 2 DSI-base inference
Utterance: I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.
Encoder → posterior µ, posterior log σ² → Sampling → latent state vector z → LT+Softmax → latent slot vector s → slot distribution p(s) [slot_num, 1].
For each candidate in {train, Chicago, Dallas, Wednesday}, the model then scores the candidate against every slot, as follows.
CHAPTER 2 What does the model learn?
In the decoder branch s → LT+BatchNorm → reconstructed oh, the parameter matrix W has shape [slot_num, vocab_len]. Multiplying the slot vector s (e.g., 0.1, 0.7, 0.01, 0.08, …) by W yields the reconstructed oh (0.01 0.2 0.01 0.8 … 0.03 0.02 0.3 0.4), which is trained against the input oh (0 1 0 1 … 0 0 1 1) with the cross-entropy loss.
Each row of W therefore defines one of slot_num different slot-candidate distributions: p(c|s_i), i = 1, 2, …, slot_num.
CHAPTER 2 DSI-base inference (continued)
For each candidate (e.g., Chicago), take its one-hot vector oh (0 0 0 1 … 0 0 0 0) and read p(oh|s) [slot_num, 1] from the [slot_num, vocab_len] slot-candidate matrix, together with the candidate's contextualized embedding ce.
CHAPTER 2 What does the model learn?
In the decoder branches s → LT+BatchNorm → recon_µ and s → LT+BatchNorm → recon_log σ², the parameter matrices have shape [slot_num, feature_dim]. Multiplying the slot vector s by them yields recon_µ and recon_log σ²; row-wise, the matrices store µ_1, µ_2, …, µ_{S-1}, µ_S and σ_1, σ_2, …, σ_{S-1}, σ_S.
The decoder thus learns slot_num different multivariate Gaussian distributions (µ_i, σ_i), i = 1, 2, …, S.
CHAPTER 2 Multivariate Gaussian probability density
Given µ and σ over the contextualized feature dimension, each candidate's embedding is scored by its log-probability under the corresponding multivariate Gaussian: -dist.log_prob(ce[c]) for c ∈ {train, Chicago, Dallas, Wednesday}.
Pic from https://www.datalearner.com/blog/1051485590815771
CHAPTER 2 DSI-base inference (continued)
For each candidate (e.g., Chicago), combine three quantities, each of shape [slot_num, 1]:
• p(s): the slot distribution from Encoder+Sampling → z → LT+Softmax → s
• p(oh|s): from the [slot_num, vocab_len] slot-candidate matrix and the candidate's one-hot oh (0 0 0 1 … 0 0 0 0)
• p(ce|s_i): from the slot_num Gaussians (µ_i, σ_i), i = 1, 2, …, slot_num, evaluated at the candidate's contextualized embedding ce
Their product gives p(s|Chicago) [slot_num, 1], and the candidate is assigned to the highest-scoring slot. A sketch of this scoring follows.
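A sketch of the per-candidate scoring; W_oh, mus, and sigmas stand for the decoder parameters interpreted per slot as described above, and all names are illustrative.

```python
# Sketch of slot assignment at inference: combine p(s), p(oh|s), and
# p(ce|s) and take the argmax slot for each candidate.
import torch
from torch.distributions import Normal

def assign_slot(p_s, W_oh, mus, sigmas, cand_idx, cand_ce):
    # p_s: [slot_num]; W_oh: [slot_num, vocab_len] (row i ~ p(c|s_i))
    # mus, sigmas: [slot_num, feature_dim]; cand_ce: [feature_dim]
    p_oh_given_s = torch.softmax(W_oh, dim=-1)[:, cand_idx]             # [slot_num]
    p_ce_given_s = Normal(mus, sigmas).log_prob(cand_ce).sum(-1).exp()  # [slot_num]
    p_s_given_c = p_s * p_oh_given_s * p_ce_given_s                     # unnormalized
    return int(p_s_given_c.argmax())                                    # best slot index
```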
CHAPTER 2 Post-processing: slot mapping
How do we map induced slot indexes to reference labels? Majority vote.
Predictions: Chicago → slot 0, Houston → slot 0, Washington → slot 0, Boston → slot 0, Magdalene College → slot 0.
References: Chicago, Houston, Washington, Boston → train-departure (4 times); Magdalene College → attraction-name (1 time).
Majority vote yields the mapping {0: train-departure}, so every candidate predicted as slot 0, including Magdalene College, is labeled train-departure. A sketch of this vote follows.
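A minimal sketch of the majority-vote mapping; variable names are illustrative.

```python
# Map each induced slot index to the reference label it co-occurs with
# most often (majority vote).
from collections import Counter, defaultdict

def map_slots(predictions, references):
    # predictions: {candidate: slot_index}; references: {candidate: label}
    votes = defaultdict(Counter)
    for cand, idx in predictions.items():
        votes[idx][references[cand]] += 1
    return {idx: counter.most_common(1)[0][0] for idx, counter in votes.items()}

preds = {"Chicago": 0, "Houston": 0, "Washington": 0,
         "Boston": 0, "Magdalene College": 0}
refs = {"Chicago": "train-departure", "Houston": "train-departure",
        "Washington": "train-departure", "Boston": "train-departure",
        "Magdalene College": "attraction-name"}
print(map_slots(preds, refs))   # {0: 'train-departure'} by a 4-to-1 vote
```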
CHAPTER 2 What is the problem with DSI-base?
Multiple domains have similar slots: train (departure, destination, day) vs. taxi (leave at, arrive by).
They also appear in similar contexts, e.g., "I need to arrive by 19:45 and departs from Cambridge."
CHAPTER 2 How does DSI-GM work?
A single Gaussian prior (µ, σ): all slots (train-arrive by, train-leave at, train-day, train-destination, train-departure, taxi-arrive by, taxi-leave at, taxi-destination, taxi-departure) share one latent space Z, so similar slots from different domains get entangled.
A Mixture-of-Gaussians prior (µ_1, …, µ_K; σ_1, …, σ_K; mixture weights π): each component covers one region of Z, so the slots of different domains separate into clusters.
CHAPTER 2 How does DSI-GM work?
Utterance: I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.
DSI-base: Encoder+Sampling → z → Prediction: Chicago: taxi-departure (wrong domain).
DSI-GM: Encoder+Sampling → z → the Mixture-of-Gaussians first identifies the domain (train), then the slot is predicted within that domain: Chicago: departure, i.e., Chicago: train-departure. A sketch of the domain step follows.
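A sketch of the domain-identification step implied by the Mixture-of-Gaussians prior; the names are illustrative, and this is not the authors' exact inference procedure.

```python
# Sketch of picking the domain for a latent z under a Mixture-of-Gaussians
# prior with K components (one per domain cluster) and weights pi.
import torch
from torch.distributions import Normal

def pick_domain(z, pi, mus, sigmas):
    # z: [latent]; pi: [K]; mus, sigmas: [K, latent]
    # log pi_k + log N(z; mu_k, sigma_k) for each component k
    log_resp = torch.log(pi) + Normal(mus, sigmas).log_prob(z).sum(-1)
    return int(log_resp.argmax())   # most responsible component = predicted domain

# DSI-GM then restricts slot assignment to the slots of the predicted
# domain, turning "Chicago: departure" into "Chicago: train-departure".
```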
Experiments
CHAPTER 3 DSI results
CHAPTER 3 DSI-Based Response Generation
[Chen et al., 2019] Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, and William Yang Wang. Semantically conditioned dialog response generation via hierarchical disentangled self-attention. In ACL, 2019.
CHAPTER 3 Analysis
Domain-level comparison of the latent representation z.
Conclusion
CHAPTER 4 Conclusion
• Dialogue state induction: a novel task to automatically identify dialogue states
• DSI-base/DSI-GM: two neural generative models with latent variables
• Challenging and promising: the unsupervised setting is very practical
• IJCAI review: this problem is important and interesting, and the area should attract more attention; the work has great potential to motivate follow-up research.
Contact: [email protected]
Paper: https://www.ijcai.org/Proceedings/2020/0532.pdf
GitHub: https://github.com/taolusi/dialogue-state-induction