TTIC 31210: Advanced Natural Language Processing
Kevin Gimpel, Spring 2017
Lecture 5: Sequence-to-Sequence Modeling and Sentence Embeddings
Recurrent Neural Networks
(figure; label: “hidden vector”)
“Output” Recurrent Neural Networks
(figure; labels: “hidden vector”, “output symbol”)
• y is a symbol, not a vector
• O is the “output” vocabulary
• we have a new parameter vector emb(y) for each output symbol in O
• emb(y) = x?
• probability distribution over output symbols? (a minimal step sketch follows)
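The slide's own equations are not reproduced in this transcript, so here is a minimal numpy sketch of one step of such an output RNN. The tanh recurrence and all weight names are illustrative assumptions; emb(y_{t-1}) plays the role of the input x, and a softmax over scores computed from the hidden vector gives the probability distribution over output symbols.

```python
import numpy as np

# Illustrative sketch of one "output RNN" step; all names and shapes are assumptions.
V, d = 10000, 128                          # |O| = output vocabulary size, hidden dim
rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(V, d))   # emb(y): one parameter vector per symbol in O
W_h = rng.normal(scale=0.1, size=(d, d))   # recurrence on the previous hidden vector
W_x = rng.normal(scale=0.1, size=(d, d))   # transform of emb(y_{t-1}), i.e. the "x" input
W_o = rng.normal(scale=0.1, size=(V, d))   # maps the hidden vector to vocabulary scores
b = np.zeros(d)

def step(h_prev, y_prev):
    """One recurrence: new hidden vector + distribution over output symbols."""
    h = np.tanh(W_h @ h_prev + W_x @ emb[y_prev] + b)
    scores = W_o @ h
    p = np.exp(scores - scores.max())
    return h, p / p.sum()                  # softmax over the output vocabulary O

h1, p1 = step(np.zeros(d), y_prev=17)      # 17 = arbitrary previous output symbol id
```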
Example: Language Modeling
… if the car …
Language Modeling: Training
… if the car …
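The training figures are not reproduced in this transcript; what they depict is the standard setup: at each position, the RNN, conditioned on the gold prefix (“teacher forcing”), is trained with a cross-entropy loss to predict the next word. A minimal PyTorch sketch, with all names and sizes as illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative RNN language-model training step (teacher forcing).
class RNNLM(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, time)
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                     # (batch, time, vocab) scores

model = RNNLM(vocab_size=10000)
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 10000, (8, 20))      # stand-in for "... if the car ..."
logits = model(tokens[:, :-1])                 # condition on the gold prefix
loss = loss_fn(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss.backward()                                # backprop through the unrolled sequence
```

Summing the per-position losses and backpropagating through the whole unrolled sequence is exactly backpropagation through time.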
Language Modeling: Decoding
• we’ll use the term “decoding” to mean roughly “test-time inference”
• for language modeling, decoding means “output the highest-probability sentence”
Language Modeling: Decoding (step by step)
<s> → “The”
<s> The → “one”
<s> The one → …
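A greedy decoder matching the steps above, reusing the RNNLM sketch from the training slide (the <s>/</s> ids are assumptions). Note that taking the argmax word at each step is only an approximation: exactly finding the single highest-probability sentence is intractable, which is why greedy (or beam) search is used in practice.

```python
import torch

# Illustrative greedy decoding with the RNNLM sketched above; bos/eos ids assumed.
@torch.no_grad()
def greedy_decode(model, bos_id=1, eos_id=2, max_len=50):
    seq = [bos_id]                              # start from <s>
    for _ in range(max_len):
        logits = model(torch.tensor([seq]))     # re-score the whole prefix (simple, O(n^2))
        next_id = int(logits[0, -1].argmax())   # pick the highest-probability next word
        seq.append(next_id)
        if next_id == eos_id:                   # stop at end-of-sentence
            break
    return seq

ids = greedy_decode(model)                      # token ids; map back through the vocabulary
```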
Concern
• there’s a mismatch between training and test! (what is it?)
• answer: during training the model conditions on the gold previous words, but at decoding time it conditions on its own predictions, so early errors can compound (“exposure bias”)
Sequence-to-Sequence Modeling
• data: <input sequence, output sequence> pairs
• use one neural network to encode the input sequence as a vector
• use an output RNN to generate the output sequence (conditioned on the input sequence vector)
• more generally called “encoder-decoder” architectures (a minimal sketch follows)
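A minimal PyTorch sketch of the encoder-decoder idea (illustrative names and sizes; real systems add attention, multiple layers, etc.): encode the input sequence into a vector (here, the final LSTM state) and condition the output RNN on it.

```python
import torch
import torch.nn as nn

# Illustrative encoder-decoder: encode input as a vector, decode conditioned on it.
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # encode the input sequence as a vector (the final encoder state)
        _, state = self.encoder(self.src_emb(src))
        # generate the output sequence conditioned on that vector
        h, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(h)                      # (batch, time, tgt_vocab) scores
```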
(screenshots of encoder-decoder papers: EMNLP 2013, EMNLP 2014, NIPS 2014)
Unsupervised Sentence Models
• how should we evaluate sentence models? we consider two ways here:
  – sentence similarity:
    • two sentences with similar meanings should have high cosine similarities
    • metric: correlation between similarities & human judgments
  – sentence classification:
    • train a linear classifier using the fixed sentence representation as the input features
    • metric: average accuracy over 6 tasks
Evaluation 1: Semantic Textual Similarity (STS)

example pairs with gold similarity scores:
• “Other ways are needed.” / “We must find other ways.” → 4.4
• “I absolutely do believe there was an iceberg in those waters.” / “I don't believe there was any iceberg at all anywhere near the Titanic.” → 1.2

Results reported on datasets from 6 domains.
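The protocol behind these numbers, sketched with toy stand-in data: embed both sentences of each pair, take the cosine similarity, and correlate the model similarities with the human scores (Pearson is shown here; the exact correlation statistic varies across papers).

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative STS evaluation; random vectors stand in for real sentence embeddings.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(100, 300))     # embeddings of the first sentence in each pair
emb_b = rng.normal(size=(100, 300))     # embeddings of the second sentence
gold = rng.uniform(0, 5, size=100)      # human similarity judgments (e.g. 0-5 scale)

model_sims = [cosine(a, b) for a, b in zip(emb_a, emb_b)]
r, _ = pearsonr(model_sims, gold)       # the reported metric: correlation with humans
print(f"correlation with human judgments: {r:.3f}")
```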
Paragraph Vectors
• represent a sentence (or paragraph) by predicting its own words or context words
Le & Mikolov (2014)
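For concreteness, a small usage sketch with gensim's Doc2Vec implementation of paragraph vectors (toy corpus; the hyperparameter values are placeholders):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = ["if the car breaks down we walk",
          "we must find other ways",
          "the cat sat on the steps"]
docs = [TaggedDocument(words=s.split(), tags=[i]) for i, s in enumerate(corpus)]

# PV-DM style: the paragraph vector helps predict words in its own context windows.
model = Doc2Vec(docs, vector_size=100, window=5, min_count=1, epochs=50)
vec = model.infer_vector("the car broke down".split())   # embed an unseen sentence
```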
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
Hill, Cho, Korhonen (2016)
Sentence Autoencoders
• encode sentence as a vector, then decode it
• minimize reconstruction error (using squared error or cross entropy) of the original words in the sentence (sketched below)
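Reusing the Seq2Seq sketch from above: a sentence autoencoder is just the case where the decoder's target is the input sentence itself. A minimal training step with the cross-entropy version of the reconstruction loss (the squared-error variant would instead compare reconstructed continuous outputs against the original word vectors):

```python
import torch
import torch.nn as nn

# Illustrative autoencoder training step, reusing the Seq2Seq class sketched earlier.
model = Seq2Seq(src_vocab=10000, tgt_vocab=10000)
loss_fn = nn.CrossEntropyLoss()
sent = torch.randint(0, 10000, (8, 15))        # a batch of (toy) sentences
logits = model(src=sent, tgt_in=sent[:, :-1])  # teacher-forced reconstruction
loss = loss_fn(logits.reshape(-1, 10000), sent[:, 1:].reshape(-1))
loss.backward()                                # cross-entropy reconstruction error
```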
Recursive Neural Net Autoencoders
Socher, Huang, Pennington, Ng, Manning (2011)
• composition based on syntactic parse (see the sketch below)
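A tiny sketch of the composition and reconstruction at a single binary parse node (Socher et al. apply this at every node of the syntactic parse and sum the losses; the weight shapes and the squared-error form here are illustrative assumptions):

```python
import numpy as np

# Illustrative recursive-autoencoder step for one binary parse node.
rng = np.random.default_rng(0)
d = 50
W_c = rng.normal(scale=0.1, size=(d, 2 * d))   # composition: children -> parent
W_r = rng.normal(scale=0.1, size=(2 * d, d))   # reconstruction: parent -> children

def compose(c1, c2):
    p = np.tanh(W_c @ np.concatenate([c1, c2]))             # parent vector
    c1_hat, c2_hat = np.split(W_r @ p, 2)                   # reconstruct the children
    loss = np.sum((c1 - c1_hat) ** 2 + (c2 - c2_hat) ** 2)  # squared reconstruction error
    return p, loss

left, right = rng.normal(size=d), rng.normal(size=d)        # child embeddings
parent, loss = compose(left, right)
```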
LSTM Autoencoders
Li, Luong, Jurafsky (2015)
• Encode sentence, decode sentence
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
Hill, Cho, Korhonen (2016)
LSTM Denoising Autoencoders
Hill, Cho, Korhonen (2016)
• Encode “corrupted” sentence, decode sentence
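One plausible corruption function in the spirit of Hill et al., who delete words and swap adjacent word pairs with some probability; the probability values below are placeholders, not their settings. The autoencoder is then trained to decode the original sentence from the corrupted input.

```python
import random

# Illustrative sentence corruption: random word deletion + adjacent-pair swaps.
def corrupt(words, p_drop=0.1, p_swap=0.1):
    kept = [w for w in words if random.random() > p_drop]   # drop each word w.p. p_drop
    i = 0
    while i < len(kept) - 1:
        if random.random() < p_swap:                        # swap an adjacent bigram
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
            i += 2                                          # keep swaps non-overlapping
        else:
            i += 1
    return kept

print(corrupt("i could see the cat on the steps".split()))
```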
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
Hill, Cho, Korhonen (2016)
Other Training Criteria for Sentence Embeddings
• encode a sentence, decode other things from it:
  – decode its translation
  – decode neighboring sentences
  – predict words in the sentence and predict words in neighboring sentences
Neural Machine Translation
• encode source sentence, decode translation
Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014)
Sutskever, Vinyals, Le (2014)
Encoder as a Sentence Embedding Model?
(figures from Sutskever, Vinyals, Le (2014) and Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014))
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
Hill, Cho, Korhonen (2016)
Skip-Thoughts
• encode a sentence using an RNN
• decode the two neighboring sentences
• use different RNNs for the previous and next sentences
• also pass the center sentence vector on each decoding step (see the sketch below)

example context: … I got back home | I could see the cat on the steps | This was strange …
Kiros, Zhu, Salakhutdinov, Zemel, Torralba, Urtasun, Fidler (2015)
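A minimal sketch of that architecture. The published model conditions the decoder GRUs on the sentence vector through their gates; concatenating the center-sentence vector to every decoder input, as below, is a simple stand-in for "passing the vector on each decoding step" (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative skip-thought-style model: one encoder, two separate decoders.
class SkipThought(nn.Module):
    def __init__(self, vocab, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.dec_prev = nn.GRU(2 * dim, dim, batch_first=True)  # previous sentence
        self.dec_next = nn.GRU(2 * dim, dim, batch_first=True)  # next sentence
        self.out = nn.Linear(dim, vocab)

    def forward(self, center, prev_in, next_in):
        _, z = self.encoder(self.emb(center))              # z: (1, batch, dim)
        def decode(dec, tgt_in):
            # append the center-sentence vector to every decoder input step
            zc = z.transpose(0, 1).expand(-1, tgt_in.size(1), -1)
            h, _ = dec(torch.cat([self.emb(tgt_in), zc], dim=-1))
            return self.out(h)
        return decode(self.dec_prev, prev_in), decode(self.dec_next, next_in)

center = torch.randint(0, 10000, (4, 10))
prev_in = torch.randint(0, 10000, (4, 8))
next_in = torch.randint(0, 10000, (4, 9))
prev_logits, next_logits = SkipThought(10000)(center, prev_in, next_in)
```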
Skip-Thoughts: nearest-neighbor examples
query sentence: im sure youll have a glamorous evening , she said , giving an exaggerated wink .
nearest neighbor: im really glad you came to the party tonight , he said , turning to her .
query sentence: if he had a weapon , he could maybe take out their last imp , and then beat up errol and vanessa .
nearest neighbor: if he could ram them from behind , send them sailing over the far side of the levee , he had a chance of stopping them .
Kiros, Zhu, Salakhutdinov, Zemel, Torralba, Urtasun, Fidler (2015)
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
| Skip-Thought | 31 | 85.3 |

Hill, Cho, Korhonen (2016); Wieting, Bansal, Gimpel, Livescu (2016)
FastSent
• encode the center sentence using a sum of its word embeddings
• decode by predicting the words in the neighboring sentences (see the sketch below)
• different embedding spaces for “input” and “output” words

example context: … I got back home | I could see the cat on the steps | This was strange …
(figure: the center sentence “I could see the cat on the steps” predicts neighbor words such as “home”, “was”, “strange”)

Hill, Cho, Korhonen (2016)
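A minimal FastSent-style sketch (illustrative names and loss pairing): sum the input embeddings of the center sentence, then score every vocabulary word against a separate output table and apply a cross-entropy loss for each word of the neighboring sentences.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative FastSent: sentence vector = sum of input embeddings;
# neighbor words scored against a separate "output" embedding table.
class FastSent(nn.Module):
    def __init__(self, vocab, dim=100):
        super().__init__()
        self.emb_in = nn.Embedding(vocab, dim)    # "input" word space
        self.emb_out = nn.Embedding(vocab, dim)   # separate "output" word space

    def forward(self, center, context_words):
        s = self.emb_in(center).sum(dim=1)        # sum = sentence embedding
        scores = s @ self.emb_out.weight.T        # score every vocabulary word
        # cross-entropy against each word of the neighboring sentences
        scores = scores.unsqueeze(1).expand(-1, context_words.size(1), -1)
        return F.cross_entropy(scores.reshape(-1, scores.size(-1)),
                               context_words.reshape(-1))

model = FastSent(vocab=10000)
center = torch.randint(0, 10000, (8, 12))         # center sentence word ids
context = torch.randint(0, 10000, (8, 20))        # words of prev + next sentences
loss = model(center, context)
loss.backward()
```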
FastSent + Autoencoder
• encode the center sentence using a sum of its word embeddings
• decode by predicting the words in the current and neighboring sentences

example context: … I got back home | I could see the cat on the steps | This was strange …
(figure: as above, but the center sentence also predicts its own words, e.g. “cat”)

Hill, Cho, Korhonen (2016)
Evaluation

| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
| Skip-Thought | 31 | 85.3 |
| FastSent | 64 | 79.3 |
| FastSent + Autoencoder | 62 | 79.7 |

Hill, Cho, Korhonen (2016); Wieting, Bansal, Gimpel, Livescu (2016)
| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
| Skip-Thought | 31 | 85.3 |
| FastSent | 64 | 79.3 |
| FastSent + Autoencoder | 62 | 79.7 |
| C-PHRASE | 67 | 81.7 |

Hill, Cho, Korhonen (2016); Wieting, Bansal, Gimpel, Livescu (2016)
| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
| Skip-Thought | 31 | 85.3 |
| FastSent | 64 | 79.3 |
| FastSent + Autoencoder | 62 | 79.7 |
| C-PHRASE | 67 | 81.7 |
| Avg. pretrained word embeddings | 65 | 80.6 |

Hill, Cho, Korhonen (2016); Wieting, Bansal, Gimpel, Livescu (2016)
Paraphrase Database (PPDB) (Ganitkevitch, Van Durme, and Callison-Burch, 2013)
credit: Chris Callison-Burch
Training Data: phrase pairs from PPDB

| Phrase | Paraphrase |
|---|---|
| good | great |
| be given the opportunity to | have the possibility of |
| i can hardly hear you . | you 're breaking up . |
| and the establishment | as well as the development |
| laying the foundations | pave the way |
| making every effort to | do its utmost |
| … | … |

tens of millions more!
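Wieting et al. (2016) train word-averaging embeddings on such pairs with a margin loss against sampled negative examples; a hedged sketch (the margin value and the negative-sampling scheme below are placeholders, not their exact settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative margin-based training of word-averaging embeddings on phrase pairs.
emb = nn.Embedding(10000, 300)

def avg(phrase):                 # phrase: (batch, length) word ids
    return emb(phrase).mean(dim=1)

def margin_loss(x1, x2, neg, delta=0.4):
    pos = F.cosine_similarity(avg(x1), avg(x2))        # paraphrase pair
    neg_sim = F.cosine_similarity(avg(x1), avg(neg))   # sampled non-paraphrase
    return F.relu(delta - pos + neg_sim).mean()        # hinge on the similarity gap

x1 = torch.randint(0, 10000, (32, 5))    # e.g. "laying the foundations"
x2 = torch.randint(0, 10000, (32, 5))    # e.g. "pave the way"
neg = torch.randint(0, 10000, (32, 5))   # random negative phrase
loss = margin_loss(x1, x2, neg)
loss.backward()
```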
| Sentence Embedding Model | STS | Classification |
|---|---|---|
| Paragraph Vector | 44 | 70.4 |
| LSTM Autoencoder | 43 | 75.9 |
| LSTM Denoising Autoencoder | 38 | 78.9 |
| Neural MT Encoder | 42 | 76.9 |
| Skip-Thought | 31 | 85.3 |
| FastSent | 64 | 79.3 |
| FastSent + Autoencoder | 62 | 79.7 |
| C-PHRASE | 67 | 81.7 |
| Avg. pretrained word embeddings | 65 | 80.6 |
| Ours (avg. trained on PPDB) | 71 | N/A |

Hill, Cho, Korhonen (2016); Wieting, Bansal, Gimpel, Livescu (2016)