Open Source Toolkit for Speech to Text...

10
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association Institute for Antrophomatics www.kit.edu Open Source Toolkit for Speech to Text Translation Thomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc- Quan Pham, Sebastian Stüker, Alex Waibel

Transcript of Open Source Toolkit for Speech to Text...

Page 1: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Antrophomatics

www.kit.edu

Open Source Toolkit for Speech to Text TranslationThomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-

Quan Pham, Sebastian Stüker, Alex Waibel

Page 2: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics2 16.08.18 Jan Niehues - S2T Translation

Motivation

• Speech translation interesting challenge• Neural models• End-to-End models

• Provide a baseline• Cascade of several models

• Easy to extend• Develop models for part

• Easy to use• Download pretrained models

Page 3: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics3 16.08.18 Jan Niehues - S2T Translation

Cascade Spoken Language Translation

• Serial combination of severalmodels

• ASR• Audio → Text

• Segmentation• Add case information• Add punctuation information

• Machine translation• Source language →

target language

Page 4: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics4 16.08.18 Jan Niehues - S2T Translation

CTC-based ASR

• Input:• 40 dimensional Mel-filterbank features

• Output:• Byte-pair units (300 or 10000)

• Model:• 4-layer Bi-LSTM• Softmax layer

• Trained using CTC loss function

Page 5: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics5 16.08.18 Jan Niehues - S2T Translation

Encoder-Decoder Based ASR

• XNMT-based implementation• Input:

• 40 dimensional Mel-filterbank features

• Encoder:• 4-layer bidirectional pyramidal

encoder• Decoder:

• One-layer bidirectional decoder

Page 6: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics6 16.08.18 Jan Niehues - S2T Translation

Segmentation and Punctuation

• Monolingual machine translation system• Add punctuation and case

• Example:• Input:

• i felt wor@@ se why i wro@@ te a who@@ le book

• Output:• U L L. U? U L L L L• I felt worse. Why? I wrote a whole book

• Preprocessing:• Randomly split training data and

remove punctuation information

• OpenNMT-based model

Page 7: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics7 16.08.18 Jan Niehues - S2T Translation

Machine Translation

• OpenNMT-based model• RNN-based Encoder and Decoder

• Preprocessing:• Tokenizer• Byte-pair encoding

• Mid-size model:• Pre-training on all data• Adaptation to in-domain data using

continue training

Page 8: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics8 16.08.18 Jan Niehues - S2T Translation

Data

• Scripts to download and preprocessdefault data

• Audio:• TED LIUM corpus

• Text:• Small model:

• WIT corpus• Midsize model:

• EPPS corpus• WIT corpus

Page 9: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics9 16.08.18 Jan Niehues - S2T Translation

Results

• Evaluation tool to calculate 4 metrics provided• BLEU, TER, CharacTER, BEER• Automatic re-segmentation

Model dev2010 tst2010 tst2013 tst2014

Attention 13.42 13.57 12.04 11.88

CTC 300 12.33 11.88 12.47 11.49

CTC 10K 13.04 13.44 13.41 12.58

Rover 13.98 14.08 13.73 13.23

Page 10: Open Source Toolkit for Speech to Text Translationufal.mff.cuni.cz/mtm18/files/pbml-mtm-presentation-01-SLT-KIT.pdf · Quan Pham, Sebastian Stüker, Alex Waibel . 2 16.08.18 Jan Niehues

Institute for Anthropomatics10 16.08.18 Jan Niehues - S2T Translation

Conclusion

• Combination of several toolkits to build full speech translation toolkit

• Easy usage:• Dockerized

• Applications• Apply pre-trained models• Train models using provided data (IWSLT)• Train models on own data

• Link:• https://github.com/isl-mt/SLT.KIT