Text Generation with Exemplar-based Adaptive Decodinghapeng/slides/peng2019... · 2019. 6. 8. ·...
Transcript of Text Generation with Exemplar-based Adaptive Decodinghapeng/slides/peng2019... · 2019. 6. 8. ·...
Text Generation with Exemplar-based Adaptive Decoding
Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das
@NAACLJune 4, 2019
❖ Background and Overview❖ Adaptive Decoding ❖ Experiments
Outline
Conditioned Text Generation
Source x
yTarget
A Portuguese train derailed in Oporto on Wednesday, killing three people.
Portuguese train derailed, killing three.
Conditioned Text Generation
Source x
yTarget
A Portuguese train derailed in Oporto on Wednesday, killing three people.
Portuguese train derailed, killing three.
Attention + Copy
· · ·
↵
Encoder
· · ·Decoder
Enc(x)
Dec⇣y | Enc(x)
⌘
Guu et al., 2017; Cao et al., 2018
Exemplar-informed GenerationSource x
A Portuguese train derailed in Oporto on Wednesday, killing three people Two die in a Britain train collision
Exemplar zRetrieveTraining target
Three die in a Portuguese train derailmentGoal y
Guu et al., 2017; Cao et al., 2018
Exemplar-informed GenerationSource x
A Portuguese train derailed in Oporto on Wednesday, killing three people Two die in a Britain train collision
Exemplar zRetrieveTraining target
Three die in a Portuguese train derailmentGoal y
Motivation • Better performance Cao et al., 2018; Zhang et al., 2018• Diversity and interpretability Guu et al., 2017; Wiseman et al., 2018
What to say How to say it
···
Guu et al., 2017; Cao et al., 2018
Source xA Portuguese train derailed in Oporto on Wednesday, killing three people Two die in a Britain train collision
Exemplar zRetrieveTraining target
Training pairs
Exemplar-informed Generation
(x3,y3)
(x1,y1)(x2,y2)
(x4,y4)
Three die in a Portuguese train derailmentGoal y
What to say How to say it
Guu et al., 2017; Cao et al., 2018
Exemplar-informed GenerationSource x
A Portuguese train derailed in Oporto on Wednesday, killing three people Two die in a Britain train collision
Exemplar zRetrieveTraining target
Similar
(x3,y3)
(x1,y1)(x2,y2)
(x4,y4)
Three die in a Portuguese train derailmentGoal y
Training pairs
What to say How to say it
Guu et al., 2017; Cao et al., 2018
Exemplar-informed GenerationSource x
A Portuguese train derailed in Oporto on Wednesday, killing three people Two die in a Britain train collision
Exemplar zRetrieveTraining target
(x3,y3)
(x1,y1)(x2,y2)
(x4,y4)
Three die in a Portuguese train derailmentGoal y
Similar
Training pairs
What to say How to say it
A Portuguese train derailed in Oporto on Wednesday, killing three people. Two die in a Britain train collision.
Source x Exemplar z[ ; ]
Status Quo
Guu et al., 2017; Cao et al., 2018
Three die in a Portuguese train derailmentGoal y
· · ·Enc�[x; z]
�
A Portuguese train derailed in Oporto on Wednesday, killing three people. Two die in a Britain train collision.
Status Quo
Guu et al., 2017; Cao et al., 2018
· · ·
↵ Attention + copy
· · ·
Enc�[x; z]
�
Dec⇣y | Enc
�[x; z]
�⌘
Three die in a Portuguese train derailmentGoal y
Source x Exemplar z[ ; ]
· · ·
↵ Attention + copy
· · ·
Enc�[x; z]
�
Dec⇣y | Enc
�[x; z]
�⌘
Status Quo
A Portuguese train derailed in Oporto on Wednesday, killing three people. Two die in a Britain train collision.
Guu et al., 2017; Cao et al., 2018
What to say
How to say it
Source x Exemplar z[ ; ]
Three die in a Portuguese train derailmentGoal y
· · ·
↵ Attention + copy
· · ·
Enc�[x; z]
�
Dec⇣y | Enc
�[x; z]
�⌘
Status Quo
A Portuguese train derailed in Oporto on Wednesday, killing three people. Two die in a Britain train collision.
Guu et al., 2017; Cao et al., 2018
Britain train derailed, killing two. Output
Source x Exemplar z[ ; ]
OverviewMotivation • Encoder what to say• Decoder how to say it
Method: Adaptive Decoding • Exemplar-specific decoder
Source x Exemplar zP�y | ,
�
AdaDecz
OverviewMotivation • Encoder what to say• Decoder how to say it
Method: Adaptive Decoding • Exemplar-specific decoder• Drop-in replacement in
seq2seq
Source x Exemplar zP�y | ,
�
AdaDecz
· · ·↵
· · ·
x
OverviewMotivation • Encoder what to say• Decoder how to say it
Method: Adaptive Decoding • Exemplar-specific decoder• Drop-in replacement in
seq2seq
Experiments • Summarization • Data2text generation
Source x Exemplar zP�y | ,
�
AdaDecz
· · ·↵
· · ·
x
❖ Background and Overview❖ Adaptive Decoding ❖ Experiments
Outline
Adaptive Decoder
Goal • Customized decoder for each
exemplar.
AdaDecz
Adaptive Decoder
Goal • Customized decoder for each
exemplar.
Key Points • Exemplar-informed
interpolation of backbones.
n, , , · · ·
o
Interpolation z
AdaDecz
Adaptive Decoder
Goal • Customized decoder for each
exemplar.
Key Points • Exemplar-informed
interpolation of backbones.• Low-rank constraints by
construction.
n, , , · · ·
o
Interpolation z
AdaDecz
Adaptive Decoder
c1W1 c2W2 c3
⇣,
⌘ ⇣,
⌘ ⇣,
⌘
W3
Adaptive Decoder
c1W1 c2W2 c3
⇣,
⌘ ⇣,
⌘ ⇣,
⌘
W3
AdaDecz
+= �2 �3�1 +
W = �1W1 + �2W2 + �3W3
AdaDecz
Adaptive Decoder
c1
⇣,
⌘
W1 c2
⇣,
⌘
W2 c3
⇣,
⌘
W3
Exemplar
+= �2 �3�1 +
W = �1W1 + �2W2 + �3W3
z
Adaptive Decoder
c1
⇣,
⌘
W1 c2
⇣,
⌘
W2 c3
⇣,
⌘
W3
2
4�1
�2
�3
3
5 =
2
4p>c1p>c2p>c3
3
5
p = RNN(z)
W = �1W1 + �2W2 + �3W3
Adaptive Decoder
c1
⇣,
⌘
W1 c2
⇣,
⌘
W2 c3
⇣,
⌘
W3
2
4�1
�2
�3
3
5 =
2
4p>c1p>c2p>c3
3
5
p = RNN(z)
W = �1W1 + �2W2 + �3W3
Low-rank Constraints
W = �1W1 + �2W2 + �3W3
= +�1 �2 + �3
Low-rank Constraints
W = �1W1 + �2W2 + �3W3
= +�1 �2 + �3
Too many params!
Low-rank Constraints
= + +�1 �2 �3
W = �1u1v1> + �2u2v2
> + �3u3v3>
W = �1W1 + �2W2 + �3W3
Low-rank Constraints
= + +�1 �2 �3
| {z }Rank = 1
| {z }Rank 3
W = �1u1v1> + �2u2v2
> + �3u3v3>
W = �1W1 + �2W2 + �3W3
Low-rank Constraints
W = �1u1v1> + �2u2v2
> + �3u3v3> + · · ·
+| {z }Rank = 1
| {z }Rank d | {z }
m = d
· · ·= + +�1 �2 �3 +
W = �1W1 + �2W2 + �3W3
Low-rank Constraints
W = �1u1v1> + �2u2v2
> + �3u3v3> + · · ·
+| {z }Rank = 1
| {z }Rank d | {z }
m = d
· · ·= + +�1 �2 �3 +
W = �1W1 + �2W2 + �3W3
O(d3) ! O(d2)
WalkthroughSource x Exemplar zRetrieve
Training target
WalkthroughSource x Exemplar zRetrieve
Training target
2
4�1
�2
�3
3
5 =
2
4p>c1p>c2p>c3
3
5
p = RNN(z)
WalkthroughSource x Exemplar zRetrieve
Training target
2
4�1
�2
�3
3
5 =
2
4p>c1p>c2p>c3
3
5
p = RNN(z)
+
�1
�2
�3
WalkthroughSource x Exemplar zRetrieve
Training target
2
4�1
�2
�3
3
5 =
2
4p>c1p>c2p>c3
3
5
p = RNN(z)
· · ·
↵
· · ·AdaDecz
Enc
+
�1
�2
�3
❖ Background and Overview❖ Adaptive Decoding ❖ Experiments
Outline
Experiments: Summarization
Datasets: • Gigaword. Rush et al., 2015• New York Times (NYT). Durrett et al., 2016
Implementation: • TF-IDF + cosine similarity for exemplar retrieval.• LSTM encoder/decoder.• Comparable implementation and tuning.
Experiments: Summarization
Datasets: • Gigaword. Rush et al., 2015• New York Times (NYT). Durrett et al., 2016
Implementation: • TF-IDF + cosine similarity for exemplar retrieval.• LSTM encoder/decoder.• Comparable implementation and tuning.
Seq2seq AttExp, Cao AdaDec, this work Cao, FullEnc. & Att. ExemplarAdaptive DecodingRerank
Rouge scores on Gigaword test set
Roug
e
19.018.517.116.6
34.534.733.232.4
37.037.336.035.0
ROUGE-1 ROUGE-L ROUGE-2
Cao et al., 2018
Seq2seq AttExp, Cao AdaDec, this work Cao, FullEnc. & Att. ExemplarAdaptive DecodingRerank
Rouge scores on Gigaword test set
Roug
e
19.018.517.116.6
34.534.733.232.4
37.037.336.035.0
ROUGE-1 ROUGE-L ROUGE-2
Cao et al., 2018
Seq2seq AttExp, Cao AdaDec, this work Cao, FullEnc. & Att. ExemplarAdaptive DecodingRerank
Rouge scores on Gigaword test set
Roug
e
19.018.517.116.6
34.534.733.232.4
37.037.336.035.0
ROUGE-1 ROUGE-L ROUGE-2
Cao et al., 2018
Paulus et al., 2018
Seq2seq Paulus AttExp AdaDecEnc. & Att. ExemplarAdaptive Decoding
Rouge scores on NYT test set
Roug
e
26.425.726.025.1
43.242.542.941.9
ROUGE-1 ROUGE-2
Experiments: Data2textDataset: • WikiBio. Lebret et al., 2016
Implementation: • TF-IDF + cosine similarity for retrieval.• LSTM encoder/decoder. See et al., 2017• Comparable implementation and tuning.
Input
Jacques-Louis David (30 August 1748 – 29 December 1825) was a French
painter in the Neoclassical style.
Output
Experiments: Data2textDataset: • WikiBio. Lebret et al., 2016
Implementation: • TF-IDF + cosine similarity for exemplar retrieval.• LSTM encoder/decoder. See et al., 2017• Comparable implementation and tuning.
Seq2seq Wiseman AttExp AdaDec, this workEnc. & Att. ExemplarAdaptive Decoding
WikiBio test performance
Roug
e/BL
EU43.6
43.1
34.8
42.5
40.640.0
38.639.3
ROUGE-4 BLEU
Wiseman et al., 2018
Conclusion
Problem
P(y | x, z)
Conclusion
Problem Method
AdaDecz⇣y | Enc(x)
⌘
P(y | x, z)
Conclusion
Problem Method Results
AdaDecz⇣y | Enc(x)
⌘
P(y | x, z)
Exemplars Outputs
Portuguese train derailed, killing three.
Two die in a Britain train collision Three killed in a Portuguese train derailment
Two people were killed in a Britain train collision
Three people were killed in a Portuguese train derailment
A train collision in Canada killed two people
A Portugueses train derails in northern Mexico, killed three
Thank You!shorturl.at/kzAGY
A Portuguese train derailed in Oporto on Wednesday, killing three people
Source
?