  • Computational Model of Grammar for English to Sinhala Machine Translation

    By

    Budditha Hettige

    Department of Statistics and Computer Science,

    University of Sri Jayewardenepura, Sri Lanka

    &

    Asoka S. Karunananda

    Faculty of Information Technology,

    University of Moratuwa, Sri Lanka


  • Overview

    • Introduction

    • Machine Translation

    • Sinhala Language

    • Computational Model of Grammar for Sinhala Language

    • Design & Implementation

    • Evaluation

    • Conclusion & further work

  • Introduction

    • Machine Translation

    – Computer software that translates text or speech from one natural language to another

    • Machine translation offers a potential solution to the language barrier

    • Many countries use machine translation to overcome their language barriers

    – India

    – Japan, etc.

  • Existing Approaches

    • Human-assisted

    • Rule-based

    • Statistical

    • Example-based

    • Knowledge-based

    • Hybrid

    • Agent-based


  • NLP @ Sri Lanka

    • UCSC

    – Optical Character Recognizer

    – Sinhala Corpus

    – MT, etc.

    • Other NLP Systems

    – Several undergraduate research projects

    • BEES

    – Rule-based machine translation system built on the concept of “Varanageema” (Conjugation)


  • Sinhala Language

    • The Sinhala language has its own writing system, which is an offspring of the Brahmi script

    • The Sinhala alphabet consists of 61 letters: 18 vowels, 41 consonants and 2 semi-consonants

    • Parts of speech

    – Noun

    – Verb

    – Indeclinable particles (නිපාත, උපසර්ග)

  • Sinhala Noun Morphology

    • The Sinhala noun is a word class that covers the noun, the pronoun and the adjective

    • Is inflected for

    – Gender (Lingaya)

    – Number (Wachana)

    – Person (Purusha)

    – Case (Vibhakthi)

    – Definiteness

  • Word conjugation (නාම වරණැගිල්ල)

    • More than 27 noun forms can be generated by inflecting a single root word

    • More than a hundred rules conjugate a noun from a given base form (Prakurthi)

    • 15 conjugation patterns (Gana) have been identified for generating Sinhala nouns

    – Eath Ganaya (ඇත් ගණය)

    – Wasu Ganaya (වසු ගණය)

    – Tara Ganaya (තාර ගණය)

    – etc.

  • Eath Ganaya (ඇත් ගණය)

    (a = add, r = remove)

    ප්‍රත්‍යය | නියත ඒක (definite singular) | | | අනියත උක්ත (indefinite, subject) | |
    Example | a | r | Example | a | r | Example
    ඇත් | ◌ා | ◌් | ඇතා | තෙක් | ත් | ඇතෙක්
    කොක් | ◌ා | ◌් | කොකා | කෙක් | ක් | කොකෙක්
    ගොන් | ◌ා | ◌් | ගොනා | නෙක් | න් | ගොනෙක්
    �කම් | ◌ා | ◌් | �කමා | මෙක් | ම් | �කමෙක්
    #$ල් | ◌ා | ◌් | #$ලා | ලෙක් | ල් | #$ලෙක්
    මිනිස් | ◌ා | ◌් | මිනිසා | සෙක් | ස් | මිනිසෙක්

  • Noun Conjugation

    • A single noun has 28 word forms

  • Sinhala Verb Morphology

    • A Sinhala base verb has more than 18 inflected forms, covering tense, number and person

    Person | Number | Present | Past | Future
    First | Singular | බලමි | බැලීමි | බලන්නෙමි
    First | Plural | බලමු | බැලීමු | බලන්නෙමු
    Second | Singular | බලහි | බැලීහි | බලන්නෙහි
    Second | Plural | බලහු | බැලීහු | බලන්නෙහු
    Third | Singular | බලයි | බැලී | බලන්නේය
    Third | Plural | බලති | බැලූ | බලන්නෝය
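    The present-tense column of this table follows a regular stem-plus-suffix pattern, which can be sketched in Python. This is a minimal illustration, not the BEES implementation (the slides state that the generator is written in SWI-Prolog). The names `PRESENT_SUFFIXES` and `present_forms` are hypothetical, the suffix spellings are assumed from standard literary Sinhala, and only the present tense is covered because the past and future columns also change the stem.

```python
# Present-tense suffixes keyed by (person, number).
# Spellings are an assumption based on standard literary Sinhala,
# not taken from the BEES source code.
PRESENT_SUFFIXES = {
    ("First", "Singular"): "මි",
    ("First", "Plural"): "මු",
    ("Second", "Singular"): "හි",
    ("Second", "Plural"): "හු",
    ("Third", "Singular"): "යි",
    ("Third", "Plural"): "ති",
}


def present_forms(stem):
    """Return every present-tense form of a verb stem, keyed by (person, number)."""
    return {key: stem + suffix for key, suffix in PRESENT_SUFFIXES.items()}


if __name__ == "__main__":
    # Print the six present-tense forms of the stem used in the table.
    for (person, number), form in present_forms("බල").items():
        print(person, number, form)
```

    Keeping the suffixes in one table means a new verb only needs its stem stored, which is the storage saving that conjugation-based generation aims for.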


  • Verb Conjugation

    • A single verb has more than 42 word forms

  • Concept of Varanageema (Conjugation)

    • Words in a language can be generated from a limited set of base words

    • A limited set of rules generates word forms from those base words

    • Conjugation applies to both nouns and verbs

    • Using base words and those rules, we can reduce the need to store a large number of words in dictionaries

    • Varanageema in Sinhala not only creates derived words but also handles the following English concepts

    – Person

    – Number

    – Determiners

    – Prepositions

    – Tense

  • Computational Model for Sinhala Morphology

    • Nama Gana and Kriya Gana specify how each noun and verb is derived from its base form

    • Implemented 85 grammar rules for Sinhala nouns

    • Implemented 18 rules for Kriya Gana

  • KAPUTU GANAYA (කපුටු ගණය)

    Kaputu Ganaya: කපුටු ගණය

    Base Form: කපුටු

    Form | Add | Remove | Example
    නියත ඒකවචන (definite singular) | ◌ා | ◌ු | කපුටා
    අනියත උක්ත (indefinite, subject) | ටෙක් | ටු | කපුටෙක්
    අනියත අනුක්ත (indefinite, non-subject) | ටෙකු | ටු | කපුටෙකු
    බහුවචන උක්ත (plural, subject) | ටෝ | ටු | කපුටෝ
    බහුවචන අනුක්ත (plural, non-subject) | න් | ◌ු | කපුටන්
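    The Add and Remove columns amount to suffix-rewriting rules on the base form, which can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides say the actual generator is implemented in SWI-Prolog, the names `KAPUTU_RULES`, `apply_rule` and `conjugate` are hypothetical, and the suffix spellings follow the table as best it can be read.

```python
# Each rule: (form label, suffix to remove from the base, suffix to add).
# Labels and spellings are assumptions read off the Kaputu Ganaya table.
KAPUTU_RULES = [
    ("definite singular", "ු", "ා"),
    ("indefinite subject", "ටු", "ටෙක්"),
    ("indefinite non-subject", "ටු", "ටෙකු"),
    ("plural subject", "ටු", "ටෝ"),
    ("plural non-subject", "ු", "න්"),
]


def apply_rule(base, remove, add):
    """Strip `remove` from the end of `base`, then append `add`."""
    if not base.endswith(remove):
        raise ValueError(f"{base!r} does not end with {remove!r}")
    return base[: len(base) - len(remove)] + add


def conjugate(base, rules=KAPUTU_RULES):
    """Generate every listed form of `base`, keyed by form label."""
    return {label: apply_rule(base, remove, add) for label, remove, add in rules}


if __name__ == "__main__":
    # Print the five forms generated from the base word of this pattern.
    for label, form in conjugate("කපුටු").items():
        print(f"{label}: {form}")
```

    Because the rules are plain data, the other conjugation patterns could in principle be added as further rule lists without changing the two functions.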


  • Finite State Automata for Sinhala “Kaputu Gana”

  • Syntax

    • 36 syntax rules are implemented to generate grammatically correct Sinhala sentences

    • A Context-Free Grammar (CFG) is a formal method of describing the syntax of languages

  • Sinhala Language Syntax

    • Eight components

    – Attributive adjunct of Subject (උක්ත විශේෂණය)

    – Subject (උක්තය)

    – Attributive adjunct of Object (කර්ම විශේෂණය)

    – Object (කර්මය)

    – Attributive adjunct of Predicate (ආඛ්‍යාත විශේෂණය)

    – Attributive adjunct of the complement of predicate (ආඛ්‍යාත

  • Context-Free Grammar for Sinhala language

    “දක්ෂ ගුරුවරයා තම ශිෂ්‍යයා ඉක්මනින් දැනුමැති විශාරදයකු කෙළේය”
    (“The skilled teacher quickly made his student a knowledgeable scholar.”)

    SubP = Subject Phrase
    VebP = Verb Phrase
    Sub = Subject
    Obj = Object
    ObjP = Object Phrase
    AdjSub = Attributive adjunct of Subject
    AdjObj = Attributive adjunct of Object
    Pre = Predicate
    AdjPre = Attributive adjunct of Predicate
    AdjCmp = Attributive adjunct of Complement
    CmpPre = Complement of predicate
    CmpPreP = Complement of predicate phrase

    S -> SubP VebP
    SubP -> Sub
    SubP -> AdjSub Sub
    VebP -> ObjP PreP
    VebP -> PreP
    ObjP -> Obj
    ObjP -> AdjObj Obj
    PreP -> AdjPre CmpPreP
    PreP -> CmpPreP
    CmpPreP -> Pre
    CmpPreP -> Pre CmpPre
    CmpPre -> Cmp
    CmpPre -> AdjCmp Cmp
    Sub -> Noun
    AdjSub -> Noun
    Obj -> Noun
    AdjObj -> Noun
    AdjPre -> Adv
    Cmp -> Noun
    AdjCmp -> Noun
    Pre -> Verb
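    To show how such rules can be checked mechanically, here is a minimal Python sketch of a recognizer over part-of-speech tag sequences. It is not the BEES parser (the slides state that the sentence composer is implemented in SWI-Prolog). The names `GRAMMAR`, `TERMINALS`, `derives` and `accepts` are hypothetical, the rules are transcribed as listed with the spelling `CmpPreP` used throughout, and `Noun`, `Adv` and `Verb` are treated as terminal tags. As listed, the rule `CmpPreP -> Pre CmpPre` places the predicate before its complement, so the sketch accepts tag sequences in that order.

```python
from functools import lru_cache

# Production rules transcribed from the slide; every rule is unary or binary.
GRAMMAR = {
    "S": [("SubP", "VebP")],
    "SubP": [("Sub",), ("AdjSub", "Sub")],
    "VebP": [("ObjP", "PreP"), ("PreP",)],
    "ObjP": [("Obj",), ("AdjObj", "Obj")],
    "PreP": [("AdjPre", "CmpPreP"), ("CmpPreP",)],
    "CmpPreP": [("Pre",), ("Pre", "CmpPre")],
    "CmpPre": [("Cmp",), ("AdjCmp", "Cmp")],
    "Sub": [("Noun",)],
    "AdjSub": [("Noun",)],
    "Obj": [("Noun",)],
    "AdjObj": [("Noun",)],
    "AdjPre": [("Adv",)],
    "Cmp": [("Noun",)],
    "AdjCmp": [("Noun",)],
    "Pre": [("Verb",)],
}
TERMINALS = {"Noun", "Adv", "Verb"}


@lru_cache(maxsize=None)
def derives(symbol, tags):
    """Return True if `symbol` can derive exactly the tuple of tags."""
    if symbol in TERMINALS:
        return tags == (symbol,)
    for rhs in GRAMMAR[symbol]:
        if len(rhs) == 1:
            # Unary rule: the single child must cover the whole span.
            if derives(rhs[0], tags):
                return True
        else:
            # Binary rule: try every split point of the span.
            left, right = rhs
            for i in range(1, len(tags)):
                if derives(left, tags[:i]) and derives(right, tags[i:]):
                    return True
    return False


def accepts(tags):
    """Return True if the tag sequence is a sentence of the grammar."""
    return derives("S", tuple(tags))


if __name__ == "__main__":
    print(accepts(["Noun", "Verb"]))
    print(accepts(["Verb", "Noun"]))
```

    Memoizing `derives` keeps the search from re-checking the same span repeatedly. The grammar has no left recursion or empty productions, so the recursion always terminates.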



  • Design

    Sinhala Sentence Composer

    • Composes grammatically correct Sinhala sentences

    • Uses a Context-Free Grammar

    • Implemented in SWI-Prolog

    Sinhala Morphological Generator

    • Works on the concept of Varanageema

    • Implemented in SWI-Prolog

  • Implementation

    • Sinhala Word Conjugator

    • BEES

  • Evaluation

    • The morphological generator achieves 96% accuracy

    • The Sinhala morphological generator handles 85 grammar rules for Sinhala nouns and 36 grammar rules for Sinhala verbs

    • Experimental results show 89% accuracy for the overall system

  • Limitations

    • The translation system works correctly on simple sentences

    • The system does not successfully handle multi-word expressions, idioms and compound sentences

    • Lexical resources are limited

  • Conclusion

    • A computational model of grammar for the Sinhala language has been developed by considering the morphology and the syntax of the language

    • Finite State Transducers (FST) and Context-Free Grammar (CFG) have been used to describe the computational grammar for Sinhala

    • The grammar has been tested through the English to Sinhala Machine Translation System

    • The concept of Varanageema (conjugation) is used as the theoretical basis of the translation

  • Further work

    • The grammar can be used to develop various Sinhala-language computer applications, such as spelling and grammar checkers and word generators

    • Handling compound sentences and extending the parser to handle more grammatical structures

    • Use of agent technology to improve various aspects of BEES, including semantic handling and autonomous updating of lexical resources

    • Use the standard evaluation metric (BLEU) to evaluate MT

  • Demonstration

    • Sinhala Word Conjugator

    • BEES



  • Thank you!