
Computational Model of Grammar for English to Sinhala Machine Translation

By

Budditha Hettige

Department of Statistics and Computer Science, University of Sri Jayewardenepura, Sri Lanka

&

Asoka S. Karunananda

Faculty of Information Technology, University of Moratuwa, Sri Lanka

Overview

• Introduction

• Machine Translation

• Sinhala Language

• Computational Model of Grammar for Sinhala Language

• Design & Implementation

• Evaluation

• Conclusion & further work

Introduction

• Machine Translation

– Computer software that translates text or speech from one natural language to another

• Machine Translation offers a potential solution to the language barrier

• Many countries use Machine Translation to address their language barriers

– India

– Japan etc.

Existing Approaches

• Human-assisted

• Rule-based

• Statistical

• Example-based

• Knowledge-based

• Hybrid

• Agent-based

NLP @ Sri Lanka

• UCSC

– Optical Character Recognizer

– Sinhala Corpus

– MT etc.

• Other NLP Systems

– Several undergraduate research projects

• BEES

– Rule-based machine translation system built on the concept of “Varanageema” (conjugation)

Sinhala Language

• The Sinhala language has its own writing system, which is an offspring of the Brahmi script

• The Sinhala alphabet consists of 61 letters: 18 vowels, 41 consonants and 2 semi-consonants

• Parts of speech

– Noun

– Verb

– Indeclinable particles (නිපාත, උපසර්ග)

Sinhala Noun Morphology

• In Sinhala grammar, the noun category covers nouns, pronouns and adjectives

• Is inflected for

– Gender (Lingaya)

– Number (Wachana)

– Person (Purusha)

– Case (Vibhakthi)

– Definiteness

Word conjugation (නාම වරණැගිල්ල)

• More than 27 noun forms can be generated by inflecting a single root word

• More than a hundred rules are used to conjugate a noun from a given base form (Prakurthi)

• 15 conjugation patterns (Gana) have been identified for generating Sinhala nouns

– Eath Ganaya (ඇත් ගණය)

– Wasu Ganaya (වසු ගණය)

– Tara Ganaya (තාර ගණය)

– etc.

Eath Ganaya (ඇත් ගණය)

            නියත ඒකවචන (definite singular)    අනියත උක්ත (indefinite, subject)
Base form   Add    Remove   Example           Add     Remove   Example
ඇත්         ◌ා     ◌්       ඇතා               තෙක්    ත්       ඇතෙක්
කොක්        ◌ා     ◌්       කොකා              කෙක්    ක්       කොකෙක්
ගොන්        ◌ා     ◌්       ගොනා              නෙක්    න්       ගොනෙක්
…කම්        ◌ා     ◌්       …කමා              මෙක්    ම්       …කමෙක්
…ල්         ◌ා     ◌්       …ලා               ලෙක්    ල්       …ලෙක්
මිනිස්      ◌ා     ◌්       මිනිසා            සෙක්    ස්       මිනිසෙක්

Noun Conjugation

• A single noun has 28 word forms

Sinhala Verb Morphology

• A Sinhala base verb has more than 18 inflected forms, covering tense, number and person

Person   Number     Present   Past      Future
First    Singular   බලමි      බැලීමි    බලන්නෙමි
First    Plural     බලමු      බැලීමු    බලන්නෙමු
Second   Singular   බලහි      බැලීහි    බලන්නෙහි
Second   Plural     බලහු      බැලීහු    බලන්නෙහු
Third    Singular   බලයි      බැලී      බලන්නේය
Third    Plural     බලති      බැලූ      බලන්නෝය
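The present and future columns of the verb table follow a regular base-plus-suffix pattern, which can be sketched in Python as below. This is only an illustration, not the authors' SWI-Prolog implementation. The names `VERB_SUFFIXES` and `conjugate_verb` are hypothetical, and the suffix strings are the standard literary Sinhala endings, assumed here to match the slide. The past tense is left out because it also changes the stem vowel.

```python
# Hypothetical suffix table: (tense, person, number) -> ending appended to the base verb.
# The endings are standard literary Sinhala forms, assumed to match the slide.
VERB_SUFFIXES = {
    ("present", "first", "singular"): "මි",
    ("present", "first", "plural"): "මු",
    ("present", "second", "singular"): "හි",
    ("present", "second", "plural"): "හු",
    ("present", "third", "singular"): "යි",
    ("present", "third", "plural"): "ති",
    ("future", "first", "singular"): "න්නෙමි",
    ("future", "first", "plural"): "න්නෙමු",
    ("future", "second", "singular"): "න්නෙහි",
    ("future", "second", "plural"): "න්නෙහු",
    ("future", "third", "singular"): "න්නේය",
    ("future", "third", "plural"): "න්නෝය",
}


def conjugate_verb(base, tense, person, number):
    """Append the ending for (tense, person, number) to the base verb.

    Raises KeyError for combinations that are not in the table (e.g. past tense).
    """
    return base + VERB_SUFFIXES[(tense, person, number)]


if __name__ == "__main__":
    # Print every form the table can generate for the base verb "බල" (look).
    for tense, person, number in VERB_SUFFIXES:
        print(tense, person, number, conjugate_verb("බල", tense, person, number))
```

Storing one base verb plus twelve endings, instead of twelve full words, is the dictionary saving that the Varanageema concept relies on.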

Verb Conjugation

• A single verb has more than 42 word forms

Concept of Varanageema (Conjugation)

• Words in a language can be generated from a limited set of base words

• A limited set of rules generates word forms from base words

• Conjugation applies to both nouns and verbs

• Using base words and those rules, we can reduce the need to store a large number of words in dictionaries

• Varanageema in Sinhala not only creates derived words but also handles the following concepts in English

– Person

– Number

– Determiners

– Prepositions

– Tense

Computational Model for Sinhala Morphology

• Nama Gana (noun classes) and Kriya Gana (verb classes) specify how each noun and verb is derived from its base form

• Implemented 85 grammar rules for Sinhala nouns

• Implemented 18 rules for Kriya Gana

KAPUTU GANAYA (කපුටු ගණය)

Base form: කපුටු

Form                                     Add     Remove   Example
නියත ඒකවචන (definite singular)          ◌ා      ◌ු       කපුටා
අනියත උක්ත (indefinite, subject)         ටෙක්    ටු       කපුටෙක්
අනියත අනුක්ත (indefinite, non-subject)   ටෙකු    ටු       කපුටෙකු
බහුවචන උක්ත (plural, subject)            ටෝ      ටු       කපුටෝ
බහුවචන අනුක්ත (plural, non-subject)      න්      ◌ු       කපුටන්
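The add/remove columns of the Kaputu Ganaya table describe a suffix rewrite: strip one ending from the base form and append another. A minimal Python sketch of that idea follows. It is not the authors' SWI-Prolog code. The names `KAPUTU_GANAYA`, `apply_rule` and `conjugate_noun` are hypothetical, and the Sinhala strings are reconstructed from the slide, so they should be read as assumptions.

```python
# Hypothetical encoding of the Kaputu Ganaya table: form -> (ending to add, ending to remove).
# The Sinhala strings are reconstructed from the slide and are assumptions.
KAPUTU_GANAYA = {
    "definite singular": ("ා", "ු"),
    "indefinite subject": ("ටෙක්", "ටු"),
    "indefinite non-subject": ("ටෙකු", "ටු"),
    "plural subject": ("ටෝ", "ටු"),
    "plural non-subject": ("න්", "ු"),
}


def apply_rule(base, add, remove):
    """Strip `remove` from the end of `base`, then append `add`."""
    if not base.endswith(remove):
        raise ValueError(f"{base!r} does not end with {remove!r}")
    return base[: len(base) - len(remove)] + add


def conjugate_noun(base, rules):
    """Generate every form listed in `rules` for one base word."""
    return {form: apply_rule(base, add, remove) for form, (add, remove) in rules.items()}


if __name__ == "__main__":
    for form, word in conjugate_noun("කපුටු", KAPUTU_GANAYA).items():
        print(f"{form}: {word}")
```

Each rule is plain data, so another Gana could be supported by adding a second dictionary rather than new code. The `ValueError` guards against applying a pattern to a base word that does not belong to it.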

Finite State Automata for Sinhala “Kaputu Gana”

Syntax

• 36 syntax rules are implemented to generate grammatically correct Sinhala sentences

• A Context-Free Grammar (CFG) is a formal method of describing the syntax of languages

Sinhala Language Syntax

• Eight components

– Attributive adjunct of Subject (උක්ත විශේෂණය)

– Subject (උක්තය)

– Attributive adjunct of Object (කර්ම විශේෂණය)

– Object (කර්මය)

– Attributive adjunct of Predicate (ආඛ්‍යාත විශේෂණය)

– Attributive adjunct of the complement of predicate (ආඛ්‍යාත පූරණ විශේෂණය)

– Complement of predicate (ආඛ්‍යාත පූරණය)

– Predicate (ආඛ්‍යාතය)

Context-Free Grammar for Sinhala language

“දක්ෂ ගුරුවරයා තම ශිෂ්‍යයා ඉක්මනින් දැනුමැති විශාරදයෙකු කෙළේය”

SubP = Subject Phrase
VebP = Verb Phrase
Sub = Subject
Obj = Object
ObjP = Object Phrase
AdjSub = Attributive adjunct of Subject
AdjObj = Attributive adjunct of Object
Pre = Predicate
AdjPre = Attributive adjunct of Predicate
AdjCmp = Attributive adjunct of Complement
CmpPre = Complement of predicate
CmpPreP = Complement of predicate phrase

S -> SubP VebP
SubP -> Sub
SubP -> AdjSub Sub
VebP -> ObjP PreP
VebP -> PreP
ObjP -> Obj
ObjP -> AdjObj Obj
PreP -> AdjPre CmpPreP
PreP -> CmpPreP
CmpPreP -> Pre
CmpPreP -> Pre CmpPre
CmpPre -> Cmp
CmpPre -> AdjCmp Cmp
Sub -> Noun
AdjSub -> Noun
Obj -> Noun
AdjObj -> Noun
AdjPre -> Adv
Cmp -> Noun
AdjCmp -> Noun
Pre -> Verb
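To show how the production rules above can be applied, here is a small Python recognizer that checks whether a sequence of part-of-speech tags can be derived from `S`. It is a sketch under assumptions, not the SWI-Prolog composer used in the system. The rules are copied as listed on the slide, with the non-terminal spelled `CmpPreP` as in the legend. The names `GRAMMAR`, `ends` and `accepts` are hypothetical.

```python
# Production rules copied from the slide: non-terminal -> list of alternative right-hand sides.
# Symbols that are not keys (Noun, Verb, Adv) are treated as part-of-speech tags.
GRAMMAR = {
    "S": [["SubP", "VebP"]],
    "SubP": [["Sub"], ["AdjSub", "Sub"]],
    "VebP": [["ObjP", "PreP"], ["PreP"]],
    "ObjP": [["Obj"], ["AdjObj", "Obj"]],
    "PreP": [["AdjPre", "CmpPreP"], ["CmpPreP"]],
    "CmpPreP": [["Pre"], ["Pre", "CmpPre"]],
    "CmpPre": [["Cmp"], ["AdjCmp", "Cmp"]],
    "Sub": [["Noun"]],
    "AdjSub": [["Noun"]],
    "Obj": [["Noun"]],
    "AdjObj": [["Noun"]],
    "AdjPre": [["Adv"]],
    "Cmp": [["Noun"]],
    "AdjCmp": [["Noun"]],
    "Pre": [["Verb"]],
}


def ends(symbol, tags, start):
    """Return every position at which `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the tag at this position
        return {start + 1} if start < len(tags) and tags[start] == symbol else set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:  # thread the possible end positions through the right-hand side
            positions = {e for p in positions for e in ends(part, tags, p)}
        result |= positions
    return result


def accepts(tags):
    """True if the whole tag sequence can be derived from S."""
    return len(tags) in ends("S", tags, 0)


if __name__ == "__main__":
    print(accepts(["Noun", "Verb"]))
    print(accepts(["Noun", "Noun", "Noun", "Noun", "Adv", "Verb"]))
    print(accepts(["Verb", "Noun"]))
```

The grammar has no left recursion, so this simple backtracking search terminates. Tracking sets of end positions handles the ambiguity that arises because several non-terminals expand to `Noun`.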

Design

Sinhala Sentence Composer

• Composes grammatically correct Sinhala sentences

• Uses a Context-Free Grammar

• Implemented in SWI-Prolog

Sinhala Morphological Generator

• Works on the concept of Varanageema

• Implemented in SWI-Prolog

Implementation

• Sinhala Word conjugator

• BEES

Evaluation

• The morphological generator achieves 96% accuracy

• The Sinhala morphological generator handles 85 grammar rules for Sinhala nouns and 36 grammar rules for Sinhala verbs

• Experimental results show 89% accuracy for the overall system

Limitations

• The translation system works correctly on simple sentences

• The system does not successfully handle multi-word expressions, idioms and compound sentences

• Lexical resources are limited

Conclusion

• A computational model of grammar for the Sinhala language has been developed by considering the morphology and syntax of Sinhala

• Finite State Transducers (FST) and Context-Free Grammar (CFG) have been used to describe the computational grammar for Sinhala

• The grammar has been tested through the English to Sinhala Machine Translation System

• The concept of Varanageema (conjugation) is used as the theoretical basis of the translation

Further work

• The grammar can be used to develop various Sinhala language-based computer applications, such as spelling and grammar checkers, word generators, etc.

• Handling compound sentences and extending the parser to handle more grammatical structures

• Use of Agent technology to improve various aspects of BEES, including semantic handling and autonomous updating of lexical resources

• Use the standard evaluation metric (BLEU) to evaluate MT

Demonstration

• Sinhala Word Conjugator

• BEES


References

1. B. Hettige, A. S. Karunananda, “Varanageema: A Theoretical basics for English to Sinhala”, accepted for presentation at the 7th Annual Sessions of Sri Lanka Association for Artificial Intelligence (SLAAI), Kelaniya, 2010.

2. B. Hettige, A. S. Karunananda, “An Evaluation methodology for English to Sinhala machine translation”, accepted for presentation at the 6th International Conference on Information and Automation for Sustainability (ICIAfS 2010), IEEE, 2010.

3. B. Hettige, A. S. Karunananda, “Context-based approach to semantics handling in English to Sinhala Machine Translation”, poster presentation at the 26th National IT Conference (NITC), Colombo, Sri Lanka, 2009.

4. B. Hettige, A. S. Karunananda, “Developing Lexicon Databases for English to Sinhala Machine Translation”, Proceedings of the Second International Conference on Industrial and Information Systems (ICIIS2007), Colombo, IEEE, 2007.

5. B. Hettige, A. S. Karunananda, “A Morphological analyzer to enable English to Sinhala Machine Translation”, Proceedings of the 2nd International Conference on Information and Automation (ICIA2006), Colombo, IEEE, 2006, pp. 21-26.

6. B. Hettige, A. S. Karunananda, “A Parser for Sinhala Language - First Step Towards English to Sinhala Machine Translation”, to appear in the Proceedings of the International Conference on Industrial and Information Systems (ICIIS), Colombo, IEEE, 2006.

7. B. Hettige, Bilingual Expert for English to Sinhala, Available: http://dscs.sjp.ac.lk/~budditha/bees.htm.

Thank you!