Source Language Adaptation for Resource-Poor Machine Translation Pidong Wang, National University of...
Source Language Adaptation for Resource-Poor Machine Translation
Pidong Wang, National University of Singapore
Preslav Nakov, QCRI, Qatar Foundation
Hwee Tou Ng, National University of Singapore
Introduction
EMNLP-CoNLL 2012, July 12, 2012, Jeju, Korea
Source Language Adaptation for Resource-Poor Machine Translation (Wang, Nakov, & Ng)
Overview
Statistical Machine Translation (SMT) systems need large sentence-aligned bilingual corpora (bi-texts).
Problem: Such training bi-texts do not exist for most languages.
Idea: Adapt a bi-text for a related resource-rich language.
Idea & Motivation
Idea: reuse bi-texts from related resource-rich languages to improve resource-poor SMT.
Related languages have:
- overlapping vocabulary (cognates), e.g., casa (‘house’) in Spanish and Portuguese
- similar word order
- similar syntax
Resource-rich vs. Resource-poor Languages
Related EU – non-EU languages:
- Swedish – Norwegian
- Bulgarian – Macedonian
Related EU languages:
- Spanish – Catalan
- Czech – Slovak
- Irish – Scottish Gaelic
- Standard German – Swiss German
Related languages outside Europe:
- MSA – Dialectal Arabic (e.g., Egyptian, Gulf, Levantine, Iraqi)
- Hindi – Urdu
- Turkish – Azerbaijani
- Russian – Ukrainian
- Malay – Indonesian
We will explore these pairs.
Our main focus:
Improving Indonesian-English SMT
using Malay-English
Malay vs. Indonesian
Malay: Semua manusia dilahirkan bebas dan samarata dari segi kemuliaan dan hak-hak. Mereka mempunyai pemikiran dan perasaan hati dan hendaklah bertindak di antara satu sama lain dengan semangat persaudaraan.
Indonesian: Semua orang dilahirkan merdeka dan mempunyai martabat dan hak-hak yang sama. Mereka dikaruniai akal dan hati nurani dan hendaknya bergaul satu sama lain dalam semangat persaudaraan.
~50% exact word overlap
(from Article 1 of the Universal Declaration of Human Rights)
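An overlap figure of this kind can be approximated with a few lines of Python. This is only a minimal sketch: the slide does not say how the overlap was measured, so the tokenization (lowercased, whitespace-split, surrounding punctuation stripped) and the definition used here (share of distinct words in the second text that also occur in the first) are assumptions, and the function names are hypothetical.

```python
import string

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation.

    Hyphens inside a word (e.g. "hak-hak") are kept.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def exact_word_overlap(source, target):
    """Fraction of distinct target words that also occur in source (assumed definition)."""
    source_types = set(tokenize(source))
    target_types = set(tokenize(target))
    if not target_types:
        return 0.0
    return len(source_types & target_types) / len(target_types)

MALAY = ("Semua manusia dilahirkan bebas dan samarata dari segi kemuliaan dan hak-hak. "
         "Mereka mempunyai pemikiran dan perasaan hati dan hendaklah bertindak di antara "
         "satu sama lain dengan semangat persaudaraan.")
INDONESIAN = ("Semua orang dilahirkan merdeka dan mempunyai martabat dan hak-hak yang sama. "
              "Mereka dikaruniai akal dan hati nurani dan hendaknya bergaul satu sama lain "
              "dalam semangat persaudaraan.")

print(f"overlap: {exact_word_overlap(MALAY, INDONESIAN):.0%}")
```

A token-level or symmetric definition would give a different number, so the printed value need not match the figure on the slide.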
Malay Can Look “More Indonesian”…
Malay: Semua manusia dilahirkan bebas dan samarata dari segi kemuliaan dan hak-hak. Mereka mempunyai pemikiran dan perasaan hati dan hendaklah bertindak di antara satu sama lain dengan semangat persaudaraan.
Post-edited Malay to look “Indonesian” (by an Indonesian speaker).
Indonesian: Semua manusia dilahirkan bebas dan mempunyai martabat dan hak-hak yang sama. Mereka mempunyai pemikiran dan perasaan dan hendaklah bergaul satu sama lain dalam semangat persaudaraan.
~75% exact word overlap
(from Article 1 of the Universal Declaration of Human Rights)
We attempt to do this automatically: adapt Malay to look Indonesian. Then, use it to improve SMT…
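Method at a Glance
Setting: Indonesian-English bi-text (resource-poor) and Malay-English bi-text (resource-rich).
Step 1: Adaptation. Adapt the Malay side of the Malay-English bi-text to obtain an “Indonesian”-English bi-text.
Step 2: Combination. Combine the Indonesian-English and “Indonesian”-English bi-texts.
Note that we have no Malay-Indonesian bi-text!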
Step 1: Adapting Malay-English to “Indonesian”-English
Word-Level Bi-text Adaptation: Overview
Given a Malay-English sentence pair:
1. Adapt the Malay sentence to “Indonesian”, using:
• Word-level paraphrases
• Phrase-level paraphrases
• Cross-lingual morphology
2. Pair the adapted “Indonesian” sentence with the English side of the Malay-English sentence pair.
Thus, we generate a new “Indonesian”-English sentence pair.
Word-Level Bi-text Adaptation: Overview
Malay: KDNK Malaysia dijangka cecah 8 peratus pada tahun 2010.
Decode using a large Indonesian LM.
Word-Level Bi-text Adaptation: Overview
English: Malaysia’s GDP is expected to reach 8 per cent in 2010.
Pair each adapted “Indonesian” sentence with the English counterpart.
Thus, we generate a new “Indonesian”-English bi-text.
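The adapt-then-pair step can be illustrated with a toy sketch. It is not the authors' decoder: here every path through a small word-level confusion network is enumerated exhaustively and scored by the paraphrase probabilities plus a bigram language model score, and the n best outputs are each paired with the unchanged English side. The function names, the paraphrase options and their probabilities, and the `bigram_logprob` callback are all hypothetical.

```python
import heapq
import itertools
import math

def n_best_adaptations(malay_words, options, bigram_logprob, n=3):
    """Return the n highest-scoring "Indonesian" sentences for one Malay sentence.

    options: dict mapping a Malay word to a list of (indonesian_word, probability);
             words without an entry are passed through unchanged.
    bigram_logprob: function (previous_word, word) -> log-probability under an LM.
    """
    slots = [options.get(word, [(word, 1.0)]) for word in malay_words]
    scored = []
    for path in itertools.product(*slots):  # exhaustive: fine for toy inputs only
        words = [word for word, _ in path]
        score = sum(math.log(prob) for _, prob in path)
        score += sum(bigram_logprob(prev, cur)
                     for prev, cur in zip(["<s>"] + words, words))
        scored.append((score, " ".join(words)))
    return [sentence for _, sentence in heapq.nlargest(n, scored)]

def adapt_pair(malay_sentence, english_sentence, options, bigram_logprob, n=3):
    """Pair each of the n-best adapted sentences with the original English side."""
    adapted = n_best_adaptations(malay_sentence.split(), options, bigram_logprob, n)
    return [(sentence, english_sentence) for sentence in adapted]

# Illustrative paraphrase options (made-up probabilities) and a flat stand-in LM.
toy_options = {
    "KDNK": [("PDB", 0.6), ("KDNK", 0.4)],
    "peratus": [("persen", 0.7), ("peratus", 0.3)],
}
flat_lm = lambda prev, cur: 0.0

for indonesian, english in adapt_pair(
        "KDNK Malaysia dijangka cecah 8 peratus pada tahun 2010.",
        "Malaysia's GDP is expected to reach 8 per cent in 2010.",
        toy_options, flat_lm):
    print(indonesian, "|||", english)
```

Exhaustive enumeration grows exponentially with sentence length, so a real system would use a proper decoder with beam search and a trained Indonesian language model in place of `flat_lm`.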
Word-Level Adaptation: Extracting Paraphrases
Indonesian translations for Malay: pivoting over English.
[Figure: in the ML-EN bi-text, the Malay sentence ML1 ML2 ML3 ML4 ML5 is word-aligned to the English sentence EN1 EN2 EN3 EN4; in the IN-EN bi-text, the English sentence EN11 EN3 EN12 is word-aligned to the Indonesian sentence IN1 IN2 IN3 IN4. Panel label: Weights.]
Note: we have no Malay-Indonesian bi-text, so we pivot.
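The pivoting idea can be sketched as follows. The weighting formula from the slide is not preserved in this transcript, so the sketch assumes one common formulation, p(in | ml) = Σ_en p(en | ml) · p(in | en), with both word translation tables estimated from the respective word alignments. The function name and the toy probabilities are hypothetical.

```python
from collections import defaultdict

def pivot_paraphrases(p_en_given_ml, p_in_given_en):
    """Induce p(in | ml) by summing over shared English pivot words.

    p_en_given_ml: dict ml_word -> {en_word: probability}  (from the ML-EN bi-text)
    p_in_given_en: dict en_word -> {in_word: probability}  (from the IN-EN bi-text)
    """
    p_in_given_ml = defaultdict(lambda: defaultdict(float))
    for ml_word, en_dist in p_en_given_ml.items():
        for en_word, p_en in en_dist.items():
            # English words never seen in the IN-EN bi-text contribute nothing.
            for in_word, p_in in p_in_given_en.get(en_word, {}).items():
                p_in_given_ml[ml_word][in_word] += p_en * p_in
    return {ml: dict(dist) for ml, dist in p_in_given_ml.items()}

# Toy tables with made-up probabilities.
ml_to_en = {"peratus": {"percent": 0.5, "per": 0.5}}
en_to_in = {"percent": {"persen": 1.0}, "per": {"per": 0.5, "persen": 0.5}}
print(pivot_paraphrases(ml_to_en, en_to_in))
```

Because the English vocabulary of the small IN-EN bi-text limits which pivots exist, the induced distribution for a Malay word may sum to less than one.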
Word-Level Adaptation: Issue 1
The IN-EN bi-text is small, thus:
- Unreliable IN-EN word alignments → bad ML-IN paraphrases
Solution: improve the IN-EN alignments using the ML-EN bi-text:
- concatenate: IN-EN × k + ML-EN, where k ≈ |ML-EN| / |IN-EN|
- run word alignment
- get the alignments for one copy of IN-EN only
Works because of cognates between Malay and Indonesian.
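The balanced concatenation can be sketched in a few lines. The sketch assumes that |·| counts source-side tokens and that the word aligner returns one alignment per sentence pair in corpus order; neither detail is stated on the slide, and the function names are hypothetical.

```python
def balanced_concatenation(in_en, ml_en):
    """Return (k, corpus): k copies of IN-EN followed by ML-EN.

    in_en, ml_en: lists of (source_sentence, english_sentence) pairs.
    k is the rounded ratio of source-side token counts (assumed size measure).
    """
    if not in_en:
        raise ValueError("IN-EN bi-text must not be empty")
    in_tokens = sum(len(src.split()) for src, _ in in_en)
    ml_tokens = sum(len(src.split()) for src, _ in ml_en)
    k = max(1, round(ml_tokens / in_tokens))
    return k, in_en * k + ml_en

def first_copy_alignments(alignments, in_en_size):
    """Keep only the alignments that belong to the first copy of IN-EN."""
    return alignments[:in_en_size]

# Toy bi-texts: 2 Indonesian tokens vs. 6 Malay tokens, so k = 3.
toy_in_en = [("satu dua", "one two")]
toy_ml_en = [("satu dua", "one two"), ("tiga empat", "three four"), ("lima enam", "five six")]
k, corpus = balanced_concatenation(toy_in_en, toy_ml_en)
print(k, len(corpus))
```

With the token counts reported on the Data slide (8.6M for ML2EN and 0.9M for IN2EN-train), this ratio would give k of roughly 10, although the slides do not state the value actually used.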
Word-Level Adaptation: Issue 2
The IN-EN bi-text is small, thus:
- Small IN vocabulary for the ML-IN paraphrases
Solution: add cross-lingual morphological variants:
- Given ML word: seperminuman
- Find ML lemma: minum
- Propose all known IN words sharing the same lemma: diminum, diminumkan, diminumnya, makan-minum, makananminuman, meminum, meminumkan, meminumnya, meminum-minuman, minum, minum-minum, minum-minuman, minuman, minumanku, minumannya, peminum, peminumnya, perminum, terminum
Note: The IN variants are from a larger monolingual IN text.
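The lookup described above can be sketched as an index from lemma to Indonesian word forms. The slides do not say which lemmatizer is used, so `lemmatize` is a hypothetical callback (here a toy dictionary lookup), and applying the same lemmatizer to both Malay and Indonesian words is an assumption.

```python
from collections import defaultdict

def build_lemma_index(indonesian_vocab, lemmatize):
    """Map each lemma to the set of Indonesian word forms that share it."""
    index = defaultdict(set)
    for word in indonesian_vocab:
        lemma = lemmatize(word)
        if lemma is not None:
            index[lemma].add(word)
    return index

def morphological_variants(malay_word, lemmatize, lemma_index):
    """Propose all known Indonesian words sharing the Malay word's lemma."""
    lemma = lemmatize(malay_word)
    if lemma is None:
        return []
    return sorted(lemma_index.get(lemma, ()))

# Toy lemmatizer: a lookup table standing in for a real morphological analyzer.
TOY_LEMMAS = {
    "seperminuman": "minum", "minum": "minum", "minuman": "minum",
    "meminum": "minum", "peminum": "minum", "makanan": "makan",
}
toy_lemmatize = TOY_LEMMAS.get

# In practice the vocabulary would come from the large monolingual Indonesian text.
toy_index = build_lemma_index(["minuman", "meminum", "peminum", "makanan"], toy_lemmatize)
print(morphological_variants("seperminuman", toy_lemmatize, toy_index))
```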
Word-Level Adaptation: Issue 3
Word-level pivoting:
- Ignores context, and relies on LM
- Cannot drop/insert/merge/split/reorder words
Solution: phrase-level pivoting:
- Build ML-EN and EN-IN phrase tables
- Induce ML-IN phrase table (pivoting over EN)
- Adapt the ML side of ML-EN to get “IN”-EN bi-text, using Indonesian LM and n-best “IN” as before
- Also, use cross-lingual morphological variants
Phrase-level pivoting:
- Models context better: not only Indonesian LM, but also phrases.
- Allows many word operations, e.g., insertion, deletion.
Step 2: Combining IN-EN + “IN”-EN
Combining IN-EN and “IN”-EN bi-texts
- Simple concatenation: IN-EN + “IN”-EN
- Balanced concatenation: IN-EN × k + “IN”-EN
- Sophisticated phrase table combination (Nakov and Ng, EMNLP 2009; Nakov and Ng, JAIR 2012):
  - Improved word alignments for IN-EN
  - Phrase table combination with extra features
Reference: Preslav Nakov and Hwee Tou Ng. Improved Statistical Machine Translation for Resource-Poor Languages Using Related Resource-Rich Languages. EMNLP 2009.
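One way to picture "phrase table combination with extra features" is a merge that records where each phrase pair came from. The sketch below is an illustration only: the exact feature definitions are given in the cited papers and are not reproduced on the slide, so the three provenance indicators, their values (1.0 and 0.5), and the rule of preferring scores from the original table are assumptions, and the function name is hypothetical.

```python
def combine_phrase_tables(original, adapted, present=1.0, absent=0.5):
    """Merge two phrase tables and append three provenance features.

    Each table maps (source_phrase, target_phrase) -> list of feature scores.
    When a pair occurs in both tables, the scores from `original` are kept.
    Appended features: only-in-original, only-in-adapted, in-both.
    """
    combined = {}
    for pair in original.keys() | adapted.keys():
        in_original = pair in original
        in_adapted = pair in adapted
        scores = original[pair] if in_original else adapted[pair]
        provenance = [
            present if in_original and not in_adapted else absent,
            present if in_adapted and not in_original else absent,
            present if in_original and in_adapted else absent,
        ]
        combined[pair] = list(scores) + provenance
    return combined

# Toy tables with a single made-up score per phrase pair.
in_en_table = {("satu", "one"): [0.5]}
adapted_table = {("satu", "one"): [0.1], ("dua", "two"): [0.2]}
for pair, features in sorted(combine_phrase_tables(in_en_table, adapted_table).items()):
    print(pair, features)
```

The extra features let the tuning step learn how much to trust phrase pairs from the adapted bi-text relative to those from the genuine Indonesian-English data.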
Experiments & Evaluation
Data (sizes in tokens)
Translation data (for IN-EN):
- IN2EN-train: 0.9M
- IN2EN-dev: 37K
- IN2EN-test: 37K
- EN-monoling.: 5M
Adaptation data (for ML-EN → “IN”-EN):
- ML2EN: 8.6M
- IN-monoling.: 20M
Isolated Experiments: Training on “IN”-EN only
[Figure: BLEU scores]
System combination using MEMT (Heafield and Lavie, 2010)
Combined Experiments: Training on IN-EN + “IN”-EN
[Figure: BLEU scores]
Experiments: Improvements
[Figure: BLEU scores]
Application to Other Languages & Domains
Improve Macedonian-English SMT by adapting Bulgarian-English bi-text:
- Adapt BG-EN (11.5M words) to “MK”-EN (1.2M words)
- OPUS movie subtitles
[Figure: BLEU scores]
Conclusion
Conclusion & Future Work
Adapt bi-texts for related resource-rich languages, using:
- confusion networks
- word-level & phrase-level paraphrasing
- cross-lingual morphological analysis
Achieved:
- +6.7 BLEU over ML2EN
- +2.6 BLEU over IN2EN
- +1.5-3.0 BLEU over comb(IN2EN, ML2EN)
Future work:
- add split/merge as word operations
- better integrate word-level and phrase-level methods
- apply our methods to other languages & NLP problems
Thank you!
Supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.
Further Analysis
Paraphrasing Non-Indonesian Malay Words Only
So, we do need to paraphrase all words.
Human Judgments
Is the adapted sentence better Indonesian than the original Malay sentence?
100 random sentences
Morphology yields worse top-3 adaptations but better phrase tables, due to coverage.
Reverse Adaptation
Idea: adapt dev/test Indonesian input to “Malay”, then translate with a Malay-English system.
Input to SMT:
- “Malay” lattice
- 1-best “Malay” sentence from the lattice
Adapting dev/test is worse than adapting the training bi-text. So, we need both n-best and LM.
Related Work
Related Work (1)
Machine translation between related languages, e.g.:
- Cantonese–Mandarin (Zhang, 1998)
- Czech–Slovak (Hajic & al., 2000)
- Turkish–Crimean Tatar (Altintas & Cicekli, 2002)
- Irish–Scottish Gaelic (Scannell, 2006)
- Bulgarian–Macedonian (Nakov & Tiedemann, 2012)
We do not translate (no training data), we “adapt”.
Related Work (2)
Adapting dialects to standard language (e.g., Arabic) (Bakr & al., 2008; Sawaf, 2010; Salloum & Habash, 2011)
- manual rules
Normalizing Tweets and SMS (Aw & al., 2006; Han & Baldwin, 2011)
- informal text: spelling, abbreviations, slang
- same language
Related Work (3)
Adapt Brazilian to European Portuguese (Marujo & al., 2011)
- rule-based, language-dependent
- tiny improvements for SMT
Reuse bi-texts between related languages (Nakov & Ng, 2009)
- no language adaptation (just transliteration)