Week 9: resources for globalisation
- Finish spell checkers
- Machine Translation (MT)
- The ‘decoding’ paradigm
- Ambiguity
- Translation models
- Interlingua and First Order Predicate Calculus
- Human involvement
- Historical note
Spelling dictionaries: implementing a spelling identification and correction algorithm
STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled
STAGE 2: generate a list of candidates
- apply any single transformation to the typo string
- filter the list by checking it against a dictionary
STAGE 3: assign probability values to each candidate in the list
STAGE 4: select the best candidate
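STAGES 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming that a "single transformation" means one deletion, transposition of adjacent letters, substitution or insertion (a common choice, not necessarily the exact set used in the lecture). The word list `LEGAL_WORDS` and all function names are hypothetical stand-ins for a real spelling dictionary and checker.

```python
import string

# Hypothetical toy dictionary standing in for a real list of legal strings.
LEGAL_WORDS = {"the", "cat", "cart", "act", "at", "car", "coat", "sat"}
ALPHABET = string.ascii_lowercase


def find_misspellings(tokens, dictionary=LEGAL_WORDS):
    """STAGE 1: mark as misspelled every token with no matching dictionary entry."""
    return [tok for tok in tokens if tok.lower() not in dictionary]


def single_transformations(typo):
    """Every string one deletion, transposition, substitution or insertion away from typo."""
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1}
    substitutions = {left + ch + right[1:]
                     for left, right in splits if right for ch in ALPHABET}
    insertions = {left + ch + right for left, right in splits for ch in ALPHABET}
    return deletions | transpositions | substitutions | insertions


def candidates(typo, dictionary=LEGAL_WORDS):
    """STAGE 2: generate single transformations, then filter against the dictionary."""
    return sorted(w for w in single_transformations(typo.lower()) if w in dictionary)


if __name__ == "__main__":
    for typo in find_misspellings("the cta sat".split()):
        print(typo, "->", candidates(typo))  # -> cta -> ['cat']
```

Generating every single transformation and only then filtering keeps STAGE 2 simple; the cost is that most generated strings are non-words that the dictionary check immediately discards.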
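Spelling dictionaries: STAGE 3
Prior probability: given all the words in English, is this candidate more likely than that candidate to be what the typist meant?
P(c) = count(c)/N, where count(c) is the number of occurrences of candidate c in a corpus and N is the number of words in that corpus
Likelihood: given the possible errors, or transformations, how likely is it that a particular error has operated on candidate c to produce the typo t?
P(t|c), calculated using a corpus of errors, or transformations
Bayesian rule: take the product of the prior probability and the likelihood, P(c) × P(t|c)
By Bayes' rule, P(c|t) = P(t|c) × P(c) / P(t); since P(t) is the same for every candidate, ranking candidates by P(c) × P(t|c) gives the same order as ranking them by P(c|t).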
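A minimal sketch of STAGES 3 and 4 under the noisy channel model. All numbers in `CORPUS_COUNTS`, `N` and `ERROR_MODEL` are invented for illustration, not taken from a real corpus; a real system would estimate P(c) from a large text corpus and P(t|c) from a corpus of typing errors. The function names are hypothetical.

```python
# Hypothetical toy numbers, invented for illustration (not from a real corpus).
N = 1_000_000  # number of word tokens in the toy corpus
CORPUS_COUNTS = {"the": 5000, "ten": 200, "tea": 100}

# Hypothetical error model P(t|c): how likely it is that intending c produces typo t.
ERROR_MODEL = {
    ("teh", "the"): 0.001,  # transposition of 'h' and 'e'
    ("teh", "ten"): 0.004,  # 'h' typed instead of 'n'
    ("teh", "tea"): 0.001,  # 'h' typed instead of 'a'
}


def prior(c):
    """Prior probability P(c) = count(c) / N."""
    return CORPUS_COUNTS.get(c, 0) / N


def likelihood(t, c):
    """Likelihood P(t|c) from the toy error model; unseen pairs get 0."""
    return ERROR_MODEL.get((t, c), 0.0)


def score(t, c):
    """STAGE 3: P(c) x P(t|c), which ranks candidates in the same order as P(c|t)."""
    return prior(c) * likelihood(t, c)


def best_candidate(t, cands):
    """STAGE 4: select the highest-scoring candidate."""
    return max(cands, key=lambda c: score(t, c))


if __name__ == "__main__":
    cands = ["the", "ten", "tea"]
    for c in cands:
        print(f"{c}: P(c)={prior(c):.6f} P(t|c)={likelihood('teh', c)} score={score('teh', c):.2e}")
    print("best:", best_candidate("teh", cands))  # -> best: the
```

In this toy setting 'ten' has the highest likelihood, but the much larger prior for 'the' outweighs it; that trade-off between "what words are common" and "what errors are common" is what the product of prior and likelihood captures.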
Spelling dictionaries: non-word errors
Implementing a spelling identification and correction algorithm:
STAGE 1: identify misspelled words
STAGE 2: generate a list of candidates
STAGE 3a: rank candidates by probability
STAGE 3b: select the best candidate
Implementation: noisy channel model, Bayesian rule
Resources for Globalisation: Machine translation
The ‘decoding’ paradigm assumes a one-to-one relation between source symbol and target symbol. In practice the relation can also be:
- one-to-many (homonymy)
- one-to-many (hypernym → hyponyms)
- many-to-one (hyponyms → hypernym)
Machine translation
The ‘decoding’ paradigm:
- one-to-many (homonymy): bank → Ufer ‘riverbank’, Bank ‘financial institution’ (German)
- one-to-many (hypernym → hyponyms): brother → otooto ‘younger brother’, oniisan ‘older brother’ (Japanese); blue → синий ‘dark blue’, голубой ‘light blue’ (Russian)
- many-to-one (hyponyms → hypernym): hill, mountain → Berg (German); learn, teach → leren (Dutch)
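A small Python sketch of why naive word-for-word ‘decoding’ breaks down once the lexicon is not one-to-one. The toy English-German lexicon reuses the bank and hill/mountain examples; the dictionary and function names are hypothetical, and a real lexicon would be far larger.

```python
from math import prod

# Hypothetical toy English -> German lexicon built from the slide's examples.
EN_DE = {
    "bank": ["Ufer", "Bank"],  # one-to-many (homonymy): riverbank vs. financial bank
    "hill": ["Berg"],          # many-to-one: both English words map to Berg
    "mountain": ["Berg"],
}


def decode(word):
    """Naive 'decoding': return every target symbol listed for a source symbol."""
    return EN_DE.get(word, [])


def count_outputs(words):
    """Number of possible word-for-word outputs: the per-word choices multiply."""
    return prod(len(decode(w)) for w in words)


def reverse(target):
    """All source words that decode to target; many-to-one entries collapse here."""
    return sorted(src for src, targets in EN_DE.items() if target in targets)


if __name__ == "__main__":
    print(decode("bank"))                           # -> ['Ufer', 'Bank']
    print(count_outputs(["bank", "bank", "hill"]))  # -> 4
    print(reverse("Berg"))                          # -> ['hill', 'mountain']
```

The one-to-many case leaves the decoder with choices it cannot resolve from the lookup alone, and the many-to-one case loses a distinction when translating back; both are symptoms of the one-to-one assumption failing.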
Machine translation and globalisation
Ambiguity: ‘I made her duck’ (e.g. ‘I cooked a duck for her’ or ‘I caused her to lower her head’)
“The possibility of interpreting an expression in two or more distinct ways” (Collins English Dictionary)
Machine translation: Ambiguity
How challenging a translation is depends on the level of ambiguity that arises.
This in turn depends on how close the source and target languages are with respect to:
- vocabulary: homonyms
- grammar: structural ambiguity
- conceptual structure: specificity ambiguity, lexical gaps
Machine translation
Pragmatic approach:
- aim for a rough, ‘gist’ translation, used for multilingual information retrieval
- involve human translators in the process: computer-aided translation
Machine translation
Translation models: Transfer model
English: ‘the dog bit my friend’
Hindi: kutte-ne mere dost-ko kata
Word-by-word gloss: dog my friend bit (the verb comes last: subject-object-verb order)
Machine translation
Translation models: Transfer model
Alter the grammatical structure of the source language so that it adheres to the grammatical structure of the target language, using transformation rules:
- analysis process (source)
- transfer process (the ‘bridge’)
- generation process (target)
Problem: each source-target pair needs its own unique set of transformation rules.
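A toy sketch of the three transfer-model stages for the English-Hindi example ‘the dog bit my friend’. It assumes analysis only has to recognise the fixed pattern "the SUBJECT VERB POSSESSIVE OBJECT"; the transfer rule (move the verb to the end, mark the subject with -ne and the object with -ko, attaching each postposition to the noun phrase it marks) and the mini lexicon are simplifications written for this illustration, ignoring agreement, tense and everything else a real English-Hindi rule set would need.

```python
# Hypothetical mini English -> Hindi (romanised) lexicon for this one example.
LEXICON = {"dog": "kutte", "my": "mere", "friend": "dost", "bit": "kata"}


def analyse(sentence):
    """Analysis (source): parse 'the N1 V POSS N2' into a subject-verb-object structure."""
    # Assumes exactly the pattern: the <subject> <verb> <possessive> <object>
    _, subj, verb, poss, obj = sentence.lower().split()
    return {"subj": [subj], "verb": verb, "obj": [poss, obj]}


def transfer(svo):
    """Transfer ('bridge'): English SVO order -> Hindi SOV order, adding postpositions."""
    return [
        (svo["subj"], "ne"),    # ergative marker on the subject
        (svo["obj"], "ko"),     # object marker
        ([svo["verb"]], None),  # verb moves to the end, no marker
    ]


def generate(structure):
    """Generation (target): look up each word and hyphenate postpositions onto the phrase."""
    out = []
    for words, marker in structure:
        phrase = [LEXICON[w] for w in words]
        if marker:
            phrase[-1] += "-" + marker
        out.extend(phrase)
    return " ".join(out)


if __name__ == "__main__":
    print(generate(transfer(analyse("the dog bit my friend"))))  # -> kutte-ne mere dost-ko kata
```

Note that `transfer` is specific to this one source-target pair: an English-Japanese system would need its own rules, which is exactly the scaling problem stated above.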
Machine translation
Translation models: Interlingua model
- extract the meaning from the source string
- give it a language-independent representation, i.e. an interlingua
- the translation process takes the interlingua as its input
- multiple translation processes take the same input to produce output in multiple target languages
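The last point is what makes the interlingua model attractive compared with the transfer model, where each source-target pair needs its own rule set. A back-of-the-envelope count, assuming n languages with translation wanted in every direction (this arithmetic is an added illustration, not a figure from the slides):

```latex
$$
\begin{aligned}
\text{transfer model:}\quad & n(n-1) \text{ source-target rule sets} \\
\text{interlingua model:}\quad & n \text{ analysers} + n \text{ generators} = 2n \text{ modules} \\
n = 10:\quad & 10 \times 9 = 90 \text{ rule sets vs. } 2 \times 10 = 20 \text{ modules}
\end{aligned}
$$
```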
Machine translation
Translation models: what is the interlingua?
For words, some sort of semantic analysis, e.g.
Interlingua:  (GO, BY-FOOT)   (GO, BY-TRANSPORT)
Russian:      идти            ехать
English:      go              go
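A sketch of the word-level interlingua idea: analysis maps a source word to a language-independent concept such as (GO, BY-FOOT), and each target-language generator maps concepts into its own words. The concept tuples follow the slide's notation; the dictionaries `GENERATORS` and `ANALYSE_RU` and the function `translate` are hypothetical.

```python
# Interlingua concepts as (ACTION, MANNER) tuples, following the slide's notation.
GO_BY_FOOT = ("GO", "BY-FOOT")
GO_BY_TRANSPORT = ("GO", "BY-TRANSPORT")

# Hypothetical generation lexicons: one per target language, keyed by concept.
GENERATORS = {
    "ru": {GO_BY_FOOT: "идти", GO_BY_TRANSPORT: "ехать"},
    "en": {GO_BY_FOOT: "go", GO_BY_TRANSPORT: "go"},
}

# Hypothetical analysis lexicon for Russian: source word -> interlingua concept.
ANALYSE_RU = {"идти": GO_BY_FOOT, "ехать": GO_BY_TRANSPORT}


def translate(word, analyser, target):
    """Analyse the word into the interlingua, then generate it in the target language."""
    concept = analyser[word]
    return GENERATORS[target][concept]


if __name__ == "__main__":
    for ru_word in ANALYSE_RU:
        print(ru_word, "->", translate(ru_word, ANALYSE_RU, "en"))
```

Adding a new target language only means adding one more generator keyed by the same concepts. Going from English to Russian is harder: English go would first have to be resolved into one of the two concepts before the Russian generator could choose between идти and ехать.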
Machine translation and globalisation
Translation models: what is the interlingua?
For sentences, a logical language, e.g. First Order Predicate Calculus
Meaning representation
Goals:
1. The semantic representation must give a one-to-one mapping to non-linguistic knowledge of the world.
2. The representation must be expressive, i.e. able to handle different types of data.
Meaning representation: First Order Predicate Calculus
- computationally tractable
- objects (terms), properties of objects, relations amongst objects
- predicate-argument structure
- large composite representations
- logical connectives
Meaning representation: First Order Predicate Calculus
Objects: each object is referred to uniquely by a term, which can be
- a constant, e.g. SurreyUniversity
- a function, e.g. LocationOf(SurreyUniversity)
- a variable
Meaning representation: First Order Predicate Calculus
Relations amongst objects
Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M)
Educates(SurreyUniversity, Citizens): a two-place predicate
Meaning representation: First Order Predicate Calculus
Relations amongst objects
Predicates can also specify the category of an object:
University(SurreyUniversity): a one-place predicate
Meaning representation: First Order Predicate Calculus
Properties/parts of objects are expressed with functions, e.g.
LocationOf(SurreyUniversity)
Meaning representation: First Order Predicate Calculus
Composite representations through predicates and functions:
Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
Meaning representation: First Order Predicate Calculus
Logical connectives combine basic representations to form larger, more complex representations, e.g. the ∧ operator = ‘and’, the ¬ operator = ‘not’:
Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
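A minimal Python sketch of the FOPC pieces above (constants, functions, one- and two-place predicates, ∧ and ¬) evaluated against a tiny model of the world. Every fact in the model, including the hypothetical location labels `LocA` and `LocB`, is invented for the illustration, and representing predicates as sets of argument tuples and functions as dictionaries is just one possible design. Variables and quantifiers are left out.

```python
# Hypothetical toy model of the world; every fact here is invented for the example.
# Functions map an object to another object (unary only in this sketch).
FUNCTIONS = {
    "LocationOf": {"SurreyUniversity": "LocA", "Cathedral": "LocB"},
}

# Each predicate is the set of argument tuples for which it holds.
PREDICATES = {
    "University": {("SurreyUniversity",)},           # one-place predicate
    "Educates": {("SurreyUniversity", "Citizens")},  # two-place predicate
    "Remunerates": {("SurreyUniversity", "Staff")},
    "Near": {("LocA", "LocB")},
}


def term(t):
    """A term is a constant (str) or a function application (fname, argument_term)."""
    if isinstance(t, str):
        return t
    fname, arg = t
    return FUNCTIONS[fname][term(arg)]


def holds(formula):
    """Truth of ('not', f), ('and', f, g) or an atomic (Predicate, term, ...) formula."""
    op = formula[0]
    if op == "not":
        return not holds(formula[1])
    if op == "and":
        return holds(formula[1]) and holds(formula[2])
    pred, *args = formula
    return tuple(term(a) for a in args) in PREDICATES[pred]


if __name__ == "__main__":
    print(holds(("University", "SurreyUniversity")))  # -> True
    print(holds(("Near", ("LocationOf", "SurreyUniversity"), ("LocationOf", "Cathedral"))))  # -> True
    print(holds(("and",
                 ("Educates", "SurreyUniversity", "Citizens"),
                 ("not", ("Remunerates", "SurreyUniversity", "Staff")))))  # -> False
```

Because terms are evaluated recursively, a function application can appear anywhere a constant can, which is what lets composite representations such as Near(LocationOf(...), LocationOf(...)) be built from smaller pieces.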
Machine translation and globalisation: change of priorities
1954: IBM and Georgetown University, first MT demonstration; goal: ‘perfect’ translation
1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of that goal
Post-ALPAC goal: rough translation, involving a human element
Current situation: online translation, e.g. Babel Fish, a descendant of SYSTRAN, whose goal was rough translation
Journal of Machine Translation
Next week
Globalisation as an industry: SDL and the SDLX-TRADOS globalisation application