Natural Language Processing - WordPress.com

Natural Language Processing Budditha Hettige Department of Computer Engineering

Page 1: Natural Language Processing - WordPress.com

Natural Language Processing

Budditha Hettige

Department of Computer Engineering

Page 2: Natural Language Processing - WordPress.com

Machine Translation

Page 3: Natural Language Processing - WordPress.com

Overview

• What is Machine Translation?

• History

• Approaches

• Existing Machine translation systems

NLP-MT 2020 3

Page 4: Natural Language Processing - WordPress.com

Machine Translation

• Computer software that translates text or speech from one natural language to another

• A subfield of Artificial Intelligence (AI) within Computer Science

• A way of converting “the meaning” of one language into another through a software program

• Machine translation offers a potential solution to the language barrier

Page 5: Natural Language Processing - WordPress.com

Pipeline of MT

Page 6: Natural Language Processing - WordPress.com

Machine Translation Pyramid

Page 7: Natural Language Processing - WordPress.com

History

Page 8: Natural Language Processing - WordPress.com

Machine Translation

History

Page 9: Natural Language Processing - WordPress.com

History

• In 1948, a dictionary look-up system was developed at Birkbeck College, London

• In 1948, Booth and Richens introduced a dictionary look-up procedure to handle machine translation

• The first machine translation conference was held at MIT in 1952

• A word-for-word machine translation system from Russian into English was introduced by Perry at MIT in 1952

• In 1958, the first practical MT system (Russian into English) was implemented by IBM for the US Air Force under the direction of Gilbert King

Page 10: Natural Language Processing - WordPress.com

History

• After 1970, SYSTRAN implemented a new Russian-English MT system

• In 1980, computer-aided translation was the most successful approach to MT, especially for Japanese-English

• After 1980, the corpus-based machine translation approach was introduced

• Google deployed its neural machine translation system (GNMT) in 2016, building on earlier research into neural MT

Page 11: Natural Language Processing - WordPress.com

Timeline

Page 12: Natural Language Processing - WordPress.com

Approaches

Page 13: Natural Language Processing - WordPress.com

Approaches to MT

Page 14: Natural Language Processing - WordPress.com

Interlingua Approach

• Uses a language-independent meaning representation to translate from the source language to the target language

• Easier to add a new language

• Building the meaning representation of the source language is difficult

• The more complex the source language, the more difficult generation becomes

• Requires all levels of language analysis

– Morphological

– Syntactic

– Semantic

– Pragmatic
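
The point about adding a new language can be made concrete by counting modules. As a rough sketch, assume a direct or transfer design needs one dedicated module per translation direction, while an interlingua design needs one analyser and one generator per language. For n languages the totals are then:

```latex
\[
M_{\text{direct}}(n) = n(n-1), \qquad M_{\text{interlingua}}(n) = 2n
\]
```

For n = 5 this gives 20 modules versus 10. Adding a sixth language requires 10 new direction-specific modules in the first design, but only 2 (one analyser and one generator) in the interlingua design.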

Page 15: Natural Language Processing - WordPress.com

Flow of the interlingua MT

Page 16: Natural Language Processing - WordPress.com

Interlingua Systems

• UNITRAN (translates among English, Spanish, and German)

• ICENT - A Chinese-English MT system

• English to Arabic machine translation

• English-Hindi interlingua-based machine translation system

Page 17: Natural Language Processing - WordPress.com

Human-Assisted

• Uses human interaction at the pre-editing, post-editing and/or intermediate editing stages

• Uses human support for semantic handling in machine translation

• Cooperation between humans and machines is more successful than other approaches

• Systems

– Anusaaraka

– ManTra

– MaTra

Page 18: Natural Language Processing - WordPress.com

Human-Assisted

• Considered a semi-automated machine translation approach

• Very popular for low-resource languages

• Human interaction at the “pre-editing”, “post-editing” and/or “intermediate editing” stages

• CAT tools

– OmegaT

– AnglaBharti

Page 19: Natural Language Processing - WordPress.com

Dictionary-based MT

• One of the early approaches to machine translation

• Systems focus on the word level

• Systems should be capable of handling morphology

• Based on word-by-word (word-level) translation

• The approach is more accurate for closely related languages

• The performance of dictionary-based translation can be enhanced by adding a source-language morphological analyser and a target-language morphological generator
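
The effect of adding morphological analysis and generation can be illustrated with a minimal Python sketch. The tiny English-Spanish lexicon, the plural-only suffix rule and all function names are assumptions made for illustration; they are not taken from any real system.

```python
# Toy English -> Spanish lexicon of base forms (hypothetical, illustration only).
LEXICON = {"cat": "gato", "book": "libro", "and": "y"}


def analyse(word):
    """Source-language morphological analysis: split off a plural '-s'."""
    if word not in LEXICON and word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], "PL"
    return word, "SG"


def generate(lemma, number):
    """Target-language morphological generation: add a plural '-s'."""
    return lemma + "s" if number == "PL" else lemma


def translate_plain(sentence):
    """Word-by-word lookup on surface forms; unknown words pass through."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())


def translate_with_morphology(sentence):
    """Lookup on lemmas, then re-inflect the target word."""
    out = []
    for word in sentence.lower().split():
        lemma, number = analyse(word)
        if lemma in LEXICON:
            out.append(generate(LEXICON[lemma], number))
        else:
            out.append(word)  # out-of-vocabulary word is copied unchanged
    return " ".join(out)


print(translate_plain("cats and books"))            # cats y books
print(translate_with_morphology("cats and books"))  # gatos y libros
```

Without analysis, the inflected forms "cats" and "books" are missing from the dictionary and pass through untranslated. With analysis and generation, the dictionary only needs base forms.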

Page 20: Natural Language Processing - WordPress.com

Dictionary based MT

Page 21: Natural Language Processing - WordPress.com

Rule-based MT

• Classical approach to MT

• Based on linguistic information about the source and target languages

• Uses a set of language-specific rules to provide grammatically correct translations

• An RBMT system contains

– Source language morphological analyzer

– Source language parser

– Source to target translator

– Target language composer

– Target language morphological generator

– Lexicon dictionaries
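
How these components fit together can be sketched in Python as a chain of small functions. This is a toy illustration under stated assumptions: Spanish is chosen arbitrarily as the target language, the lexicons and the single reordering rule are hypothetical, and the "parser" only chunks the sentence into noun phrases and verbs.

```python
# Hypothetical toy lexicons; a real RBMT system uses far larger dictionaries.
SOURCE_LEXICON = {  # surface form -> (lemma, part of speech, features)
    "the": ("the", "DET", ""), "a": ("a", "DET", ""),
    "good": ("good", "ADJ", ""), "new": ("new", "ADJ", ""),
    "boy": ("boy", "NOUN", "SG"), "book": ("book", "NOUN", "SG"),
    "reads": ("read", "VERB", "3SG"),
}
BILINGUAL_LEXICON = {"the": "el", "a": "un", "good": "bueno", "new": "nuevo",
                     "boy": "chico", "book": "libro", "read": "leer"}
TARGET_VERB_FORMS = {("leer", "3SG"): "lee"}


def analyse(sentence):
    """Morphological analysis: map each word to (lemma, POS, features)."""
    return [SOURCE_LEXICON[w] for w in sentence.lower().rstrip(".").split()]


def parse(tokens):
    """Shallow parsing: group tokens into noun phrases (NP) and verbs (V)."""
    chunks, current = [], []
    for token in tokens:
        if token[1] == "VERB":
            if current:
                chunks.append(("NP", current))
                current = []
            chunks.append(("V", [token]))
        else:
            current.append(token)
    if current:
        chunks.append(("NP", current))
    return chunks


def transfer(chunks):
    """Source-to-target translation: swap each lemma for its target lemma."""
    return [(label, [(BILINGUAL_LEXICON[lemma], pos, feats)
                     for lemma, pos, feats in toks])
            for label, toks in chunks]


def compose(chunks):
    """Target composition: in this toy grammar, adjectives follow the noun."""
    order = {"DET": 0, "NOUN": 1, "ADJ": 2}
    out = []
    for label, toks in chunks:
        if label == "NP":
            toks = sorted(toks, key=lambda t: order[t[1]])
        out.extend(toks)
    return out


def generate(tokens):
    """Morphological generation: inflect verbs, keep other words as lemmas."""
    return " ".join(TARGET_VERB_FORMS.get((lemma, feats), lemma)
                    for lemma, _pos, feats in tokens)


def translate(sentence):
    return generate(compose(transfer(parse(analyse(sentence)))))


print(translate("The good boy reads a new book."))
# el chico bueno lee un libro nuevo
```

Each function stands in for one component of the list above, and the three dictionaries play the role of the lexicon dictionaries. A real system would also handle agreement, ambiguity and unknown words, which this sketch ignores.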

Page 22: Natural Language Processing - WordPress.com

Architecture of the RBMT

Page 23: Natural Language Processing - WordPress.com

Systems

• Apertium

• Toshiba

• BEES (English to Sinhala)

Page 24: Natural Language Processing - WordPress.com

Statistical Approach

• Most studied MT approach

• Generates translations using statistical models estimated from bilingual text resources

• Systems

– Moses

– Babel Fish

– Bing Translator

– Google Translator
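
One way to see how translations can be learned from bilingual text is to estimate word translation probabilities from a small parallel corpus. The sketch below uses a simplified IBM Model 1 with expectation-maximisation; this is one classic statistical technique and is not claimed to be what the systems listed above use. The four-sentence English-Spanish corpus and the function names are hypothetical, and the usual NULL source word is omitted for brevity.

```python
from collections import defaultdict

# Tiny hypothetical English-Spanish parallel corpus.
CORPUS = [
    ("the house", "la casa"),
    ("a house", "una casa"),
    ("the book", "el libro"),
    ("a book", "un libro"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate t[s][w] = P(target word w | source word s) with EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for src, tgt in pairs:
            for w in tgt:
                # E-step: share each target word among the source words.
                norm = sum(t[s][w] for s in src)
                for s in src:
                    share = t[s][w] / norm
                    count[s][w] += share
                    total[s] += share
        # M-step: re-estimate probabilities from the expected counts.
        t = defaultdict(lambda: defaultdict(float))
        for s in count:
            for w in count[s]:
                t[s][w] = count[s][w] / total[s]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    return max(t[source_word], key=t[source_word].get)


model = train_ibm_model1(CORPUS)
print(best_translation(model, "book"))   # libro
print(best_translation(model, "house"))  # casa
```

No dictionary is supplied: "book" is linked to "libro" only because the two words co-occur more consistently than any other pairing in the corpus.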

Page 25: Natural Language Processing - WordPress.com

Activity on Statistical MT

Page 26: Natural Language Processing - WordPress.com

Neural Machine Translation

• A successful approach to machine translation

• Uses machine learning concepts

• Language models

– recurrent neural language model

– feed-forward neural language model

– long short-term memory models

– deep models

– neural translation models

Page 27: Natural Language Processing - WordPress.com

Neural Machine Translation

• Google’s Neural Machine Translation

• TensorFlow’s Neural Machine Translation

• Sequence-to-sequence model

• Encoder-decoder architecture
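
The data flow of a sequence-to-sequence encoder-decoder can be sketched in plain Python. This is only an illustration of the architecture, not the implementation used by Google or TensorFlow: the vocabularies are hypothetical, a simple tanh recurrent cell stands in for LSTM layers, and the weights are random and untrained, so the emitted tokens are arbitrary.

```python
import math
import random

# Hypothetical toy vocabularies; the weights below are random and untrained.
SRC_VOCAB = ["the", "boy", "reads", "a", "book"]
OUT_VOCAB = ["</s>", "el", "chico", "lee", "un", "libro"]
HIDDEN = 4
rng = random.Random(0)


def vector(size):
    return [rng.uniform(-1, 1) for _ in range(size)]


def matrix(rows, cols):
    return [vector(cols) for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(w, u, x, h):
    """One recurrent step: h' = tanh(W x + U h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w, x), matvec(u, h))]


SRC_EMB = {tok: vector(HIDDEN) for tok in SRC_VOCAB}
TGT_EMB = {tok: vector(HIDDEN) for tok in ["<s>"] + OUT_VOCAB}
ENC_W, ENC_U = matrix(HIDDEN, HIDDEN), matrix(HIDDEN, HIDDEN)
DEC_W, DEC_U = matrix(HIDDEN, HIDDEN), matrix(HIDDEN, HIDDEN)
OUT_PROJ = matrix(len(OUT_VOCAB), HIDDEN)


def encode(tokens):
    """Encoder: fold the source sentence into one fixed-size context vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(ENC_W, ENC_U, SRC_EMB[tok], h)
    return h


def decode(context, max_len=6):
    """Decoder: greedily emit target tokens until </s> or max_len."""
    h, prev, output = context, "<s>", []
    for _ in range(max_len):
        h = rnn_step(DEC_W, DEC_U, TGT_EMB[prev], h)
        scores = matvec(OUT_PROJ, h)
        prev = OUT_VOCAB[scores.index(max(scores))]
        if prev == "</s>":
            break
        output.append(prev)
    return output


context = encode("the boy reads a book".split())
print(decode(context))  # arbitrary tokens, since the model is untrained
```

Training would adjust the embeddings and weight matrices so that the decoder's highest-scoring token at each step matches the reference translation.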

Page 28: Natural Language Processing - WordPress.com

Issues in Machine Translation

• Word and Sentence Segmentation

• Word Conjugation

• Tense Detection

• Multi-word Expression

• Out of Vocabulary

• Translating Idiomatic Phrases

Page 29: Natural Language Processing - WordPress.com

Existing Machine Translation Systems

Page 30: Natural Language Processing - WordPress.com

Anusaaraka System

• Makes text in one Indian language accessible in another Indian language

• Uses the Paninian Grammar model for its language analysis

• Developed to translate Punjabi, Bengali, Telugu, Kannada and Marathi into Hindi

• English-Hindi Anusaaraka translates English text into Hindi

• URL: http://anusaaraka.iiit.ac.in

Page 31: Natural Language Processing - WordPress.com

Apertium

• Rule-based machine translation system

• The Apertium engine follows a shallow-transfer approach

• Consists of eight pipelined modules

– A de-formatter

– A morphological analyzer

– A part-of-speech tagger

– A lexical transfer module

– A structural transfer module

– A morphological generator

– A post-generator

– A re-formatter

Page 32: Natural Language Processing - WordPress.com

Yahoo Babel Fish

• Uses a statistical approach

Page 33: Natural Language Processing - WordPress.com

Google Translator

• Statistical approach and Neural Machine Translation

Page 34: Natural Language Processing - WordPress.com

SYSTRAN

• Statistical

Page 35: Natural Language Processing - WordPress.com

Bing Translator

Page 36: Natural Language Processing - WordPress.com

Questions

• Answer the following questions

a) Briefly describe the machine translation pipeline.

b) Briefly describe the rule-based approach to machine translation.

c) Explain how dictionary-based machine translation can be improved through source-language morphological analysis.

d) Considering the rule-based machine translation approach, briefly explain the source-language understanding steps for the following English sentence.

"The good boy reads a new book."
