An Introduction to Neural Machine Translation
ATA59 Carola F. Berger
Disclaimer 1 – This Presentation
Carola F. Berger, An Introduction to NMT, ATA59 2
Disclaimer 2
After this presentation, you will hopefully understand how neural MT works, but you will not understand what it does.
~100 million parameters
Outline
Brief recap: previous MT approaches
How do neural networks work?
How do words get into and out of a neural network?
MT Approaches
Rule-based: basically just grammar + dictionary.
Statistical MT: chop sentences up into n-grams (sequences of n words) or phrases. Training of the engine: calculate frequencies = probabilities in the source and target language. Translation after training: chop source sentences into n-grams or phrases and apply the previously calculated probabilities. A 1-gram SMT is essentially a dictionary. Pros: output is deterministic; no words missing. Cons: context!
Neural MT – this presentation
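The SMT recipe above can be made concrete with a toy sketch. The phrase pairs, counts, and probabilities below are invented for illustration, not taken from any real corpus; a 1-gram version of this reduces to a dictionary lookup.

```python
from collections import Counter

# Hypothetical aligned phrase pairs from a (made-up) parallel corpus.
aligned_pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("ist klein", "is small"),
]

counts = Counter(aligned_pairs)                     # frequency of each phrase pair
totals = Counter(src for src, _ in aligned_pairs)   # frequency of each source phrase

def p(target, source):
    """Relative-frequency estimate of P(target | source)."""
    return counts[(source, target)] / totals[source]

def translate(source_phrase):
    """Pick the most probable target phrase seen during training."""
    candidates = [tgt for (src, tgt) in counts if src == source_phrase]
    return max(candidates, key=lambda tgt: p(tgt, source_phrase))

print(round(p("the house", "das haus"), 3))  # 0.667 (= 2/3)
print(translate("das haus"))                 # the house
```

This also shows the "Cons: context!" point: the model only ever sees isolated phrases, so nothing outside the phrase boundary can influence the choice.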
Statistical MT
From G. M. de Buy Wenninger, K. Sima’an, PBML No. 101, April 2014, p. 43
How Do Neural Networks Work?
A (not so) brief recap of last year’s presentation at ATA58. See also the handouts as a PDF in the app or on my website (see references).
Biological Neuron
Bruce Blaus, https://commons.wikimedia.org/wiki/File:Blausen_0657_MultipolarNeuron.png
Artificial Neuron
Artificial Neuron - Perceptron
The perceptron adds up its weighted inputs (the "blob"). If blob > 3 => output 1, else output 0.
Example: inputs (1, 1) yield output 0; inputs (1, 2) yield output 1.
Artificial Neuron - Perceptron

| x1 | x2 | output |
|----|----|--------|
| 1  | 1  | 0      |
| 1  | 2  | 1      |
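The perceptron on this slide can be written out directly. The weights (1 and 2) are an assumption: one choice that reproduces the truth table above, since the slides do not state them explicitly.

```python
def perceptron(x1, x2, w1=1, w2=2, threshold=3):
    """Step-activation perceptron: fire iff the weighted sum exceeds the threshold.
    Weights (1, 2) are assumed; they are one setting consistent with the table."""
    blob = w1 * x1 + w2 * x2      # the weighted sum ("blob")
    return 1 if blob > threshold else 0

print(perceptron(1, 1))  # 0  (blob = 3, not above the threshold)
print(perceptron(1, 2))  # 1  (blob = 5)
```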
Artificial Neuron
Neural Network
Human brain:
Hagmann P, Cammoun L, Gigandet X, Meuli R, Honey CJ, Wedeen VJ, Sporns O (2008) Mapping the structural core of human cerebral cortex. PLoS Biology Vol. 6, No. 7, e159.
Artificial Neural Network
Adapted from: Cburnett, https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg
What is AI?
Adapted from: S. Jurvetson, https://www.flickr.com/photos/jurvetson/31409423572/
Artificial Neural Network
Training: feed in training data, then adapt the weights (the "arrows") according to the difference between the desired output and the actual output, e.g. by backpropagation.
Adapted from: Cburnett, https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg
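The training step just described can be sketched for the smallest possible "network": a single linear neuron with one weight. The data, learning rate, and number of epochs are illustrative assumptions; a real NMT system applies the same idea, via backpropagation, to on the order of 100 million weights.

```python
# Hypothetical training pairs where the desired relation is y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0      # initial weight (one "arrow")
lr = 0.05    # learning rate, chosen small enough to converge

for epoch in range(200):
    for x, y_true in data:
        y_pred = w * x             # forward pass: actual output
        error = y_pred - y_true    # difference to desired output
        w -= lr * error * x        # gradient step (backprop for a single weight)

print(round(w, 3))  # 2.0: the weight has learned the pattern in the data
```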
Neural Net Example – Digit Recognition
Sample input data
Neural Net Example – Digit Recognition
Sample input (20x20 pixels)
Neural Net Example – Digit Recognition
Weights
Neural Net Example – Digit Recognition
Weights to hidden units – "feature" extraction
Neural Net Example – Digit Recognition
Weights to hidden units
Neural Net Example – Digit Recognition
Hidden units to output
Neural Net Example – Digit Recognition
[Diagram: input → internal convolution → hidden units → output. Recognized digit: 2.]
Neural Net Example – Digit Recognition
What happens with unknowns? Klingon 6 [jav]
[Diagram: the unknown Klingon digit is fed through the same pipeline (input → internal convolution → hidden units → output), and the network still outputs a digit: 2.]
Artificial Neural Network
Feed-forward neural net
Adapted from: Cburnett, https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg
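A feed-forward pass like the one in the diagram fits in a few lines. The layer sizes and weight values below are arbitrary illustrative choices, not taken from the slides; the point is only the mechanics: each unit computes a weighted sum and squashes it through an activation function.

```python
import math

def sigmoid(z):
    """Smooth activation squashing any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """One feed-forward pass: inputs -> hidden layer -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

x = [1.0, 0.5]                          # two input values
w_hidden = [[0.4, -0.2], [0.3, 0.8]]    # one row of weights per hidden unit
w_out = [1.0, -1.0]                     # weights from hidden units to output

y = forward(x, w_hidden, w_out)
print(0.0 < y < 1.0)  # True: the sigmoid output always lies between 0 and 1
```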
How Do Words Get Into and Out of the Network?
• Challenges for NMT:
Input and output lengths are not fixed, and word order differs between source and target languages => use recurrent neural networks with attention, or convolutional networks.
Context => document (or at least paragraph) level, not sentence level.
Training metrics.
How Do Words Get Into and Out of the Network?
Recurrent neural net
Adapted from: Cburnett, https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg
How Do Words Get Into and Out of the Network?
Attention mechanism
Source: K. Xu et al., https://arxiv.org/abs/1502.03044
How Do Words Get Into and Out of the Network?
Attention mechanism
Source: D. Bahdanau et al., ICLR conference proceedings, https://arxiv.org/pdf/1409.0473.pdf
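The attention mechanism can be illustrated numerically: for each target position, score every encoder state, turn the scores into weights with a softmax, and take the weighted average of the encoder states as the context vector. The encoder states and alignment scores below are made-up values, not outputs of a trained model.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per source word
scores = [2.0, 0.5, 1.0]        # alignment scores for one target word (illustrative)

weights = softmax(scores)
context = [sum(w * s[i] for w, s in zip(weights, encoder_states))
           for i in range(len(encoder_states[0]))]

print(round(sum(weights), 6))  # 1.0: attention weights always sum to one
```

The highest-scoring source word dominates the context vector, which is how the decoder "looks back" at the relevant part of the source sentence.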
How Do Words Get Into and Out of the Network?
Source Text → tokenization (optional) → Encoder (RNN with attention) → Hidden State → Decoder (RNN with attention) → Target Text
How Do Words Get Into and Out of the Network?
https://projector.tensorflow.org
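The embedding projector linked above visualizes the key idea: words enter the network as vectors ("embeddings"), and similar words end up close together. A toy sketch with invented 3-dimensional vectors (real embeddings have hundreds of learned dimensions):

```python
import math

# Toy embeddings; the values are made up purely for illustration.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v)

sim_kq = cosine(embeddings["king"], embeddings["queen"])
sim_ka = cosine(embeddings["king"], embeddings["apple"])
print(sim_kq > sim_ka)  # True: "king" sits closer to "queen" than to "apple"
```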
How Do Words Get Into and Out of the Network?
Recall: SMT
From G. M. de Buy Wenninger, K. Sima’an, PBML No. 101, April 2014, p. 43
Neural Nets - Recap
✓ Training = extraction of "features" (= patterns) from training data
✓ The more hidden layers and hidden units, the more parameters (possible overfitting!)
✓ Beware: garbage in -> worse garbage out!
✓ ANNs work well for pattern recognition after training, including "context"
✓ Completely unpredictable when confronted with new, hitherto unknown data
Training Data
Source: P. Koehn, https://arxiv.org/abs/1709.07809
Unpredictability
References & Further Reading
Slides at: https://www.CFBtranslations.com
Handouts in the app and also at: https://www.CFBtranslations.com
A. Ng, Machine Learning, Coursera: https://www.coursera.org/learn/machine-learning
Google’s TensorFlow: https://www.tensorflow.org/
Madly Ambiguous, a game that illustrates how NMT deals with context: http://madlyambiguous.osu.edu