Computational Model of Grammar for English to
Sinhala Machine Translation
By
Budditha Hettige
Department of Statistics and Computer Science,
University of Sri Jayewardenepura, Sri Lanka
&
Asoka S. Karunananda
Faculty of Information Technology,
University of Moratuwa, Sri Lanka
-
Overview
• Introduction
• Machine Translation
• Sinhala Language
• Computational Model of Grammar for Sinhala
Language
• Design & Implementation
• Evaluation
• Conclusion & further work
-
Introduction
• Machine Translation
– Computer software that translates text or speech from one natural language to another
• Machine Translation offers a potential solution to the language barrier
• Many countries use Machine Translation to address their language barriers
– India
– Japan etc.
-
Existing Approaches
• Human-assisted
• Rule-based
• Statistical
• Example-based
• Knowledge-based
• Hybrid
• Agent-based
-
NLP @ Sri Lanka
• UCSC
– Optical Character Recognizer
– Sinhala Corpus
– MT etc.
• Other NLP Systems
– Several undergraduate research projects
• BEES (Bilingual Expert for English to Sinhala)
– Rule-based machine translation system built on the concept of “Varanageema” (conjugation)
-
Sinhala Language
• The Sinhala language has its own writing system, which descends from the Brahmi script
• The Sinhala alphabet consists of 61 letters: 18 vowels, 41 consonants and 2 semi-consonants
• Parts of speech
– Noun
– Verb
– Indeclinable particles (නිපාත, උපසර්ග)
-
Sinhala Noun Morphology
• The Sinhala noun is a word class that covers nouns, pronouns and adjectives
• Nouns are inflected for
– Gender (Lingaya)
– Number (Wachana)
– Person (Purusha)
– Case (Vibhakthi)
– Definiteness
-
Word conjugation (නාම වරණැගිල්ල)
• More than 27 noun forms can be generated by inflecting a single root word
• More than a hundred rules conjugate a noun from a given base form (Prakurthi)
• 15 conjugation patterns (Gana) have been identified for generating Sinhala nouns
– Eath Ganaya (ඇත් ගණය)
– Wasu Ganaya (වසු ගණය)
– Tara Ganaya (තාර ගණය)
– etc.
-
Eath Ganaya (ඇත් ගණය)
Base form   Definite singular (නියත ඒකවචන)   Indefinite subject (අනියත උක්ත)
Example     Add   Remove   Example           Add    Remove   Example
ඇත්         ◌ා    ◌්       ඇතා               තෙක්   ත්       ඇතෙක්
කොක්        ◌ා    ◌්       කොකා              කෙක්   ක්       කොකෙක්
ගොන්        ◌ා    ◌්       ගොනා              නෙක්   න්       ගොනෙක්
�කම්        ◌ා    ◌්       �කමා              මෙක්   ම්       �කමෙක්
#$ල්        ◌ා    ◌්       #$ලා              ලෙක්   ල්       #$ලෙක්
මිනිස්       ◌ා    ◌්       මිනිසා             සෙක්   ස්       මිනිසෙක්
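The Add and Remove columns can be read as suffix-rewrite rules: strip one suffix from the base form, then append another. A minimal Python sketch of that reading follows. The rule table, the names `EATH_GANA_RULES` and `conjugate`, and the Unicode spellings are assumptions made for illustration; this is not the authors' SWI-Prolog implementation. The sketch also simplifies the indefinite rule to "remove ◌්, add ෙක්", which gives the same result as the per-word pairs in the table (for example remove ත්, add තෙක්).

```python
# Sinhala combining signs, written as escapes so they stay visible in source.
AL_LAKUNA = "\u0dca"   # ◌් removes the inherent vowel of a consonant
AELA_PILLA = "\u0dcf"  # ◌ා long "a" vowel sign
KOMBUVA = "\u0dd9"     # ෙ "e" vowel sign
KA = "\u0d9a"          # ක

# Hypothetical rule table: form name -> (suffix to remove, suffix to add).
EATH_GANA_RULES = {
    "definite_singular": (AL_LAKUNA, AELA_PILLA),
    "indefinite_subject": (AL_LAKUNA, KOMBUVA + KA + AL_LAKUNA),
}


def conjugate(base: str, form: str, rules=EATH_GANA_RULES) -> str:
    """Apply one remove/add rule to a base form and return the word form."""
    remove, add = rules[form]
    if not base.endswith(remove):
        raise ValueError(f"base form does not end with the suffix to remove: {base!r}")
    return base[: len(base) - len(remove)] + add


EATH = "\u0d87\u0dad\u0dca"  # ඇත් (base form, "elephant")
print(conjugate(EATH, "definite_singular"))   # ඇතා
print(conjugate(EATH, "indefinite_subject"))  # ඇතෙක්
```

Keeping the rules in a table rather than in code means a new Gana only needs a new table, which matches the idea of storing base words plus rules instead of every word form.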
-
Noun Conjugation
• A single noun has 28 word forms
-
Sinhala Verb Morphology
• A Sinhala base verb has more than 18 inflected forms, covering tense, number and person
Person   Number     Present   Past     Future
First    Singular   බලමි      බැලීමි    බලන්නෙමි
First    Plural     බලමු      බැලීමු    බලන්නෙමු
Second   Singular   බලහි      බැලීහි    බලන්නෙහි
Second   Plural     බලහු      බැලීහු    බලන්නෙහු
Third    Singular   බලයි      බැලී      බලන්නේය
Third    Plural     බලති      බැලූ      බලන්නෝය
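The present-tense column can be read as the stem බල plus a person/number ending. The Python sketch below shows that reading. The ending table, the function name `present_form`, and the Unicode spellings (recovered from the glyphs in the table) are assumptions for illustration only; the past and future columns are left out because they are not a plain stem-plus-ending pattern.

```python
# Hypothetical present-tense endings, keyed by (person, number).
PRESENT_ENDINGS = {
    ("first", "singular"): "\u0db8\u0dd2",   # මි
    ("first", "plural"): "\u0db8\u0dd4",     # මු
    ("second", "singular"): "\u0dc4\u0dd2",  # හි
    ("second", "plural"): "\u0dc4\u0dd4",    # හු
    ("third", "singular"): "\u0dba\u0dd2",   # යි
    ("third", "plural"): "\u0dad\u0dd2",     # ති
}


def present_form(stem: str, person: str, number: str) -> str:
    """Attach the present-tense ending for the given person and number."""
    try:
        ending = PRESENT_ENDINGS[(person, number)]
    except KeyError:
        raise ValueError(f"unknown person/number: {person}/{number}") from None
    return stem + ending


STEM_BALA = "\u0db6\u0dbd"  # බල, the verb stem used in the table
for person, number in PRESENT_ENDINGS:
    print(person, number, present_form(STEM_BALA, person, number))
```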
-
Verb Conjugation
• A single verb has more than 42 word forms
-
Concept of Varanageema (Conjugation)
• The words of a language can be generated from a limited set of base words
• A limited set of rules generates word forms from those base words
• Conjugation applies to both nouns and verbs
• With base words and rules, there is less need to store large numbers of words in dictionaries
• Varanageema in Sinhala not only creates derived words but also handles the following English concepts
– Person
– Number
– Determiners
– Prepositions
– Tense
-
Computational Model for Sinhala Morphology
• Nama Gana and Kriya Gana define how each noun and verb is derived from its base form
• Implemented 85 grammar rules for Sinhala nouns
• Implemented 18 rules for Kriya Gana
-
Kaputu Ganaya (කපුටු ගණය)
Base form: කපුටු
Form                               Add    Remove   Example
Definite singular (නියත ඒකවචන)     ◌ා     ◌ු       කපුටා
Indefinite subject (අනියත උක්ත)    ටෙක්   ටු       කපුටෙක්
Indefinite object (අනියත අනුක්ත)   ටෙකු   ටු       කපුටෙකු
Plural subject (බහුවචන උක්ත)       ටෝ     ටු       කපුටෝ
Plural object (බහුවචන අනුක්ත)      න්     ◌ු       කපුටන්
-
Finite State Automata for Sinhala “Kaputu Gana”
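One way to picture such an automaton is as a small finite-state transducer that copies the stem කපුට, consumes the final ◌ු of the base form without emitting it, and then emits an ending chosen by a form tag. The Python sketch below makes assumptions throughout: the state numbering, the tag names such as `+DefSg`, and the function names are hypothetical, and it is not the authors' implementation. Where the Kaputu Ganaya rules remove ටු and add ට-initial endings, the sketch keeps ට in the stem and adds only the rest, which yields the same word forms.

```python
STEM = ["\u0d9a", "\u0db4", "\u0dd4", "\u0da7"]  # ක ප ◌ු ට
FINAL_U = "\u0dd4"  # ◌ු, the base-form vowel sign that every rule drops

# Hypothetical form tags -> ending emitted after the stem.
ENDINGS = {
    "+DefSg": "\u0dcf",                  # ◌ා  -> කපුටා
    "+IndefSubj": "\u0dd9\u0d9a\u0dca",  # ෙක් -> කපුටෙක්
    "+IndefObj": "\u0dd9\u0d9a\u0dd4",   # ෙකු -> කපුටෙකු
    "+PlSubj": "\u0ddd",                 # ෝ   -> කපුටෝ
    "+PlObj": "\u0db1\u0dca",            # න්  -> කපුටන්
}


def build_transducer():
    """Return (delta, final_state); delta maps (state, input) -> (state, output)."""
    delta = {}
    state = 0
    for ch in STEM:                      # copy each stem symbol unchanged
        delta[(state, ch)] = (state + 1, ch)
        state += 1
    delta[(state, FINAL_U)] = (state + 1, "")  # read the final vowel, emit nothing
    state += 1
    final_state = state + 1
    for tag, ending in ENDINGS.items():  # one arc per form tag
        delta[(state, tag)] = (final_state, ending)
    return delta, final_state


def transduce(symbols, delta, final_state):
    """Run the transducer over the input symbols and return the output string."""
    state, output = 0, []
    for symbol in symbols:
        if (state, symbol) not in delta:
            raise ValueError(f"no transition from state {state} on {symbol!r}")
        state, emitted = delta[(state, symbol)]
        output.append(emitted)
    if state != final_state:
        raise ValueError("input ended in a non-final state")
    return "".join(output)


BASE = "\u0d9a\u0db4\u0dd4\u0da7\u0dd4"  # කපුටු
delta, final_state = build_transducer()
for tag in ENDINGS:
    print(tag, transduce(list(BASE) + [tag], delta, final_state))
```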
-
Syntax
• 36 syntax rules are implemented to generate grammatically correct Sinhala sentences
• A Context-Free Grammar (CFG) is a formal method for describing the syntax of a language
-
Sinhala Language Syntax
• Eight components
– Attributive adjunct of subject (උක්ත විශේෂණය)
– Subject (උක්තය)
– Attributive adjunct of object (කර්ම විශේෂණය)
– Object (කර්මය)
– Attributive adjunct of predicate (ආඛ්‍යාත විශේෂණය)
– Attributive adjunct of the complement of predicate (ආඛ්‍යාත
-
Context-Free Grammar for Sinhala language
“දක්ෂ ගුරුවරයා තම ශිෂ්‍යයා ඉක්මනින් දැනුමැති විශාරදයකු කළේය”
(“The skilled teacher quickly made his student a knowledgeable expert.”)
SubP = Subject Phrase
VebP = Verb Phrase
Sub = Subject
Obj = Object
ObjP = Object Phrase
PreP = Predicate Phrase
AdjSub = Attributive adjunct of Subject
AdjObj = Attributive adjunct of Object
Pre = Predicate
AdjPre = Attributive adjunct of Predicate
Cmp = Complement
AdjCmp = Attributive adjunct of Complement
CmpPre = Complement of predicate
CmpPreP = Complement of predicate phrase
S -> SubP VebP
SubP -> Sub
SubP -> AdjSub Sub
VebP -> ObjP PreP
VebP -> PreP
ObjP -> Obj
ObjP -> AdjObj Obj
PreP -> AdjPre CmpPreP
PreP -> CmpPreP
CmpPreP -> Pre
CmpPreP -> Pre CmpPre
CmpPre -> Cmp
CmpPre -> AdjCmp Cmp
Sub -> Noun
AdjSub -> Noun
Obj -> Noun
AdjObj -> Noun
AdjPre -> Adv
Cmp -> Noun
AdjCmp -> Noun
Pre -> Verb
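To make the rules concrete, the sketch below encodes them as a Python dictionary and checks whether a sequence of part-of-speech tags can be derived from S. It is an illustration under assumptions, not the SWI-Prolog composer described in this deck: the names `GRAMMAR`, `end_positions` and `accepts` are hypothetical, the nonterminal spelled CmpPrep in the rules is written `CmpPreP` to match the legend, and every rule is encoded exactly in the order given above.

```python
# Each nonterminal maps to its alternative right-hand sides.
# Noun, Adv and Verb are terminals (part-of-speech tags).
GRAMMAR = {
    "S": [["SubP", "VebP"]],
    "SubP": [["Sub"], ["AdjSub", "Sub"]],
    "VebP": [["ObjP", "PreP"], ["PreP"]],
    "ObjP": [["Obj"], ["AdjObj", "Obj"]],
    "PreP": [["AdjPre", "CmpPreP"], ["CmpPreP"]],
    "CmpPreP": [["Pre"], ["Pre", "CmpPre"]],
    "CmpPre": [["Cmp"], ["AdjCmp", "Cmp"]],
    "Sub": [["Noun"]],
    "AdjSub": [["Noun"]],
    "Obj": [["Noun"]],
    "AdjObj": [["Noun"]],
    "AdjPre": [["Adv"]],
    "Cmp": [["Noun"]],
    "AdjCmp": [["Noun"]],
    "Pre": [["Verb"]],
}


def end_positions(symbol, tags, start):
    """Return every index where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the tag at `start`
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    results = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:  # thread the possible positions through each part
            positions = {e for p in positions for e in end_positions(part, tags, p)}
        results |= positions
    return results


def accepts(tags):
    """True if the whole tag sequence can be derived from S."""
    return len(tags) in end_positions("S", tags, 0)


print(accepts(["Noun", "Verb"]))                                  # True
print(accepts(["Noun", "Noun", "Noun", "Noun", "Adv", "Verb"]))   # True
print(accepts(["Verb"]))                                          # False
```

The grammar has no left-recursive rule, so this plain recursive search terminates without memoization; a larger grammar would call for a chart parser instead.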
-
Design
Sinhala Sentence Composer
• Composes grammatically correct Sinhala sentences
• Uses a Context-Free Grammar
• Implemented in SWI-Prolog
Sinhala Morphological Generator
• Works through the concept of Varanageema
• Implemented in SWI-Prolog
-
Implementation
• Sinhala Word Conjugator
• BEES
-
Evaluation
• The morphological generator achieves 96% accuracy
• The Sinhala morphological generator handles 85 grammar rules for Sinhala nouns and 36 grammar rules for Sinhala verbs
• Experimental results show 89% accuracy for the overall system
-
Limitations
• The translation system handles simple sentences correctly
• The system does not yet handle multi-word expressions, idioms or compound sentences
• Lexical resources are limited
-
Conclusion
• A computational model of grammar for the Sinhala language has been developed, covering both its morphology and its syntax
• Finite State Transducers (FST) and Context-Free Grammar (CFG) have been used to describe the computational grammar for Sinhala
• The grammar has been tested through the English to Sinhala Machine Translation System
• The concept of Varanageema (conjugation) is used as the theoretical basis of the translation
-
Further work
• The grammar can be used to develop various Sinhala-language computer applications, such as spelling and grammar checkers and word generators
• Handling compound sentences and extending the parser to handle more grammatical structures
• Use of agent technology to improve various aspects of BEES, including semantic handling and autonomous updating of lexical resources
• Use the standard evaluation metric (BLEU) to evaluate the MT system
-
Demonstration
• Sinhala Word Conjugator
• BEES
-
Thank you!