Dravidian WordNet

S. Arulmozi, Dravidian University, 29 April 2013


Page 1: Dravidian WordNet

S. Arulmozi

Dravidian University

29 April 2013

Page 2: Tamil Thesaurus

• Preliminary work on lexical semantics.
• Monumental work on the Tamil Thesaurus.
• Ontological classification of Tamil vocabulary.
• Rajendran, S. (2001). tamizhc coRkaLanjciyam [Tamil Thesaurus] (in Tamil). Tamil University Publication.

Page 3: Domains in the Tamil Thesaurus

• Tamil vocabulary is classified into four major domains:
  • Entities
  • Abstracts
  • Events
  • Relationals

Page 4: Lexical Hierarchy of the Domain 'Construction'

parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'
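The 'Construction' hierarchy is a chain of is-a links from the most specific class up to the most general one. A minimal Python sketch, assuming each category simply stores its immediate parent (the `PARENT` and `GLOSS` tables and the `path_to_root` helper are hypothetical, not part of the thesaurus), shows how the whole chain can be recovered from any node:

```python
# Hypothetical parent table for the 'Construction' hierarchy:
# each Tamil category maps to its immediate, more general parent.
PARENT = {
    "kaTTappaTTavai": "uruvaakkiya maRRum patananjceyta poruTkaL",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "uyirillaatavai",
    "uyirillaatavai": "aHRinaippeyarkaL",
    "aHRinaippeyarkaL": "parumaippeyarkaL",
}

# English glosses as given on the slide.
GLOSS = {
    "parumaippeyarkaL": "concrete nouns",
    "aHRinaippeyarkaL": "irrational nouns",
    "uyirillaatavai": "non-living beings",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "manufactured and processed items",
    "kaTTappaTTavai": "constructed",
}


def path_to_root(category: str) -> list[str]:
    """Follow parent links from `category` until a node with no parent is reached."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


if __name__ == "__main__":
    # Prints the five levels from 'constructed' up to 'concrete nouns'.
    for node in path_to_root("kaTTappaTTavai"):
        print(f"{node} '{GLOSS[node]}'")
```

Storing only the parent keeps each entry small; the full chain is derived on demand rather than duplicated on every node.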

Page 5: Nouns

• Synonymy: viiTu 'house' – illam 'house'
• Hypernymy-Hyponymy: paLLi 'school' – kalviccaalai 'educational institution'
• Hyponymy-Hypernymy: kalluuri 'college' – aracukkalluuri 'government college'
• Holonymy-Meronymy: ndaaRkaali 'chair' – kaal 'leg'
• Meronymy-Holonymy: cakkaram 'wheel' – vaNTi 'cart'
• Related verb: paTittal 'reading' – paTi 'read'
• Coordinate terms: kooyil 'temple' – macuuti 'mosque'
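Several noun relations come in inverse pairs (hypernymy/hyponymy, holonymy/meronymy), so entering one direction implies the other. A minimal sketch, assuming a simple in-memory store rather than the project's actual data model (`RelationStore`, `INVERSE` and the relation names are hypothetical), records the inverse link automatically. In Princeton WordNet synonyms share a synset instead of being linked; the sketch treats synonymy as a link only for simplicity.

```python
# Hypothetical inverse table; symmetric relations map to themselves.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
    "synonym": "synonym",
    "coordinate": "coordinate",
}


class RelationStore:
    """Toy store of typed, directed links between Tamil nouns."""

    def __init__(self) -> None:
        # (word, relation) -> set of words reached by that relation
        self._links: dict[tuple[str, str], set[str]] = {}

    def add(self, source: str, relation: str, target: str) -> None:
        """Record source -relation-> target and the inverse link back."""
        self._links.setdefault((source, relation), set()).add(target)
        self._links.setdefault((target, INVERSE[relation]), set()).add(source)

    def related(self, word: str, relation: str) -> set[str]:
        """Words linked to `word` by `relation` (empty set if none)."""
        return set(self._links.get((word, relation), set()))


if __name__ == "__main__":
    store = RelationStore()
    store.add("paLLi", "hypernym", "kalviccaalai")  # school -> educational institution
    store.add("ndaaRkaali", "meronym", "kaal")      # chair has part leg
    store.add("cakkaram", "holonym", "vaNTi")       # wheel is part of cart
    store.add("kooyil", "coordinate", "macuuti")    # temple, mosque
    print(store.related("kalviccaalai", "hyponym"))  # {'paLLi'}
    print(store.related("vaNTi", "meronym"))         # {'cakkaram'}
```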

Page 6: Verbs

• Synonymy: paTi 'read' – payilu 'read'
• Hypernymy: cuvai 'taste' – uNar 'feel, perceive'
• Troponymy: keeL 'ask' – kenjcu 'plead'
• Nominal: paruku 'drink' – parukutal 'drinking'
• Related noun: kaNTupiTi 'discover' – kaNTupiTippu 'discovery'
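The noun and verb slides list different relation inventories: meronymy applies to nouns, troponymy to verbs, and some links (related verb, nominal, related noun) cross parts of speech. A minimal sketch of checking that split when a relation is entered, assuming a hypothetical `ALLOWED` table built only from the relation names on those two slides:

```python
# Hypothetical table of relation types permitted between parts of speech,
# keyed by (source POS, target POS); built from the noun and verb slides.
ALLOWED = {
    ("noun", "noun"): {"synonymy", "hypernymy", "hyponymy",
                       "holonymy", "meronymy", "coordinate"},
    ("verb", "verb"): {"synonymy", "hypernymy", "troponymy"},
    ("noun", "verb"): {"related_verb"},             # paTittal -> paTi
    ("verb", "noun"): {"nominal", "related_noun"},  # paruku -> parukutal
}


def check_relation(src_pos: str, relation: str, tgt_pos: str) -> bool:
    """Return True if `relation` may link a `src_pos` word to a `tgt_pos` word."""
    return relation in ALLOWED.get((src_pos, tgt_pos), set())


if __name__ == "__main__":
    print(check_relation("verb", "troponymy", "verb"))  # True  (keeL -> kenjcu)
    print(check_relation("noun", "troponymy", "noun"))  # False (not a noun relation)
    print(check_relation("verb", "nominal", "noun"))    # True  (paruku -> parukutal)
```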

Page 7: Tamil WordNet

• Objective: to build a Tamil WordNet that supports machine translation.
• Resources: Tamil Thesaurus, technical glossaries (Tamil University publications), Princeton English WordNet.
• Funding agency: Tamil Software Development Fund, Tamil Virtual University (4 lakh).
• Time frame: 18 months.

Page 8: Details

• Software used:
  • Front end: Java
  • Back end: MySQL database
• Project deliverables:
  • 50k root words
  • Relationships coded
  • Stand-alone and web-based interfaces
  • Embedded morphological analyser
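The slides name a Java front end over a MySQL back end but do not give the table layout. A minimal sketch of one plausible relational layout, assuming synsets, word-to-synset membership and typed synset-to-synset relations live in separate tables (all table and column names are hypothetical), uses Python's built-in `sqlite3` in place of MySQL so it runs without a server:

```python
import sqlite3

# Hypothetical schema: the project's real MySQL tables are not described in the slides.
SCHEMA = """
CREATE TABLE synset (
    id    INTEGER PRIMARY KEY,
    pos   TEXT NOT NULL,      -- noun, verb, adjective or adverb
    gloss TEXT
);
CREATE TABLE word (
    lemma     TEXT NOT NULL,
    synset_id INTEGER NOT NULL REFERENCES synset(id),
    PRIMARY KEY (lemma, synset_id)
);
CREATE TABLE relation (
    source_id INTEGER NOT NULL REFERENCES synset(id),
    type      TEXT NOT NULL,  -- e.g. hypernym, meronym
    target_id INTEGER NOT NULL REFERENCES synset(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding a few entries from the noun slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO synset (id, pos, gloss) VALUES (?, ?, ?)",
        [(1, "noun", "house"), (2, "noun", "school"),
         (3, "noun", "educational institution")],
    )
    conn.executemany(
        "INSERT INTO word (lemma, synset_id) VALUES (?, ?)",
        [("viiTu", 1), ("illam", 1), ("paLLi", 2), ("kalviccaalai", 3)],
    )
    conn.execute("INSERT INTO relation VALUES (2, 'hypernym', 3)")
    return conn


def synonyms(conn: sqlite3.Connection, lemma: str) -> list[str]:
    """Other lemmas that share a synset with `lemma`."""
    rows = conn.execute(
        """SELECT DISTINCT w2.lemma
           FROM word AS w1 JOIN word AS w2 ON w1.synset_id = w2.synset_id
           WHERE w1.lemma = ? AND w2.lemma <> w1.lemma
           ORDER BY w2.lemma""",
        (lemma,),
    )
    return [row[0] for row in rows]


def hypernyms(conn: sqlite3.Connection, lemma: str) -> list[str]:
    """Lemmas in the synsets that `lemma`'s synsets point to via 'hypernym'."""
    rows = conn.execute(
        """SELECT DISTINCT w2.lemma
           FROM word AS w1
           JOIN relation AS r ON r.source_id = w1.synset_id AND r.type = 'hypernym'
           JOIN word AS w2 ON w2.synset_id = r.target_id
           WHERE w1.lemma = ?
           ORDER BY w2.lemma""",
        (lemma,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo()
    print(synonyms(db, "viiTu"))   # ['illam']
    print(hypernyms(db, "paLLi"))  # ['kalviccaalai']
```

Keeping relations between synsets rather than between words means a relation such as hypernymy is stated once per concept and is shared by every synonym in the synset.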

Page 9: Statistics

• Total words: 50497
• Unique senses: 41013
• Nouns: 46710
• Verbs: 2881
• Adjectives: 416
• Adverbs: 490
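As a consistency check, the four part-of-speech counts add up exactly to the 50497 total, with nouns making up about 92.5% of the entries. A short sketch of that arithmetic, using only the figures on the statistics slide (the names `POS_COUNTS` and `summarize` are illustrative):

```python
# Counts as reported on the statistics slide.
POS_COUNTS = {"Nouns": 46710, "Verbs": 2881, "Adjectives": 416, "Adverbs": 490}
TOTAL_WORDS = 50497
UNIQUE_SENSES = 41013


def summarize() -> dict[str, float]:
    """Return each part of speech's share of the total, as a percentage."""
    # The per-POS counts should add up to the reported total.
    assert sum(POS_COUNTS.values()) == TOTAL_WORDS
    return {pos: 100 * n / TOTAL_WORDS for pos, n in POS_COUNTS.items()}


if __name__ == "__main__":
    for pos, share in summarize().items():
        print(f"{pos}: {share:.1f}%")
    # Ratio of total words to unique senses.
    print(f"Words per unique sense: {TOTAL_WORDS / UNIQUE_SENSES:.2f}")
```

Running it prints Nouns 92.5%, Verbs 5.7%, Adjectives 0.8%, Adverbs 1.0%, and a ratio of 1.23 words per unique sense.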

Page 10: Statistics (chart)

[Bar chart: total words and unique senses (tokens) for nouns, verbs, adjectives and adverbs. Total words: 50497; unique senses: 41013.]

Project Completed (2004)

http://www.nrcfosshelpline.in/code/wiki/TamilWordnet


Page 12: Stand-alone Version of Tamil WordNet (Snapshot)

Page 13: Stand-alone Version of Tamil WordNet (Snapshot)

Page 14: Web Version of Tamil WordNet (Snapshot)

Page 15: Web Version of Tamil WordNet (Snapshot)

Page 16: First Effort on Dravidian Languages

• National Workshop on WordNet for Dravidian Languages
  • 2–3 June 2003
  • Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University
  • Hands-on experience with a specified domain: construction
  • Report available on the Global WordNet Association website

Page 17: MHRD Project

Creation of Machine Translation Tools and Resources for English to Dravidian Languages: Pilot Study

• Aim: to develop a machine translation (MT) system and the linguistic resources it needs for English to the Dravidian languages (Tamil, Malayalam, Telugu and Kannada).
• This would facilitate the creation of rich educational content in Indian languages.
• All tools and the translation system are to be based on machine learning methods, so that computer science graduates and other non-linguists can immediately take part in the national mission on literacy by contributing additional language translation tools.

Page 18: Modules

• Module 1: Machine Translation
  • Aims at developing teaching material for the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining and machine learning.
  • This will ensure the critical mass of manpower needed to sustain the translation effort for the national mission on education.
• Module 2: Training
  • Aims at training 500 faculty members, selected from across the country, in machine translation methods based on machine learning.
• Module 3: Dravidian WordNet
  • Aims at developing the Dravidian WordNet required for translation.

Page 19: Total Budget

• IIT Bombay: 15 lakh
• Amrita University: 40 lakh
• Tamil University: 15 lakh
• University of Hyderabad: 15 lakh
• Dravidian University: 15 lakh
• Time frame: 12 months (30 March 2009 – 29 March 2010)
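The slide lists the per-institution allocations but not their sum. A short sketch of the arithmetic, assuming the amounts are in lakh of rupees (1 lakh = 100,000; the currency is not stated on the slide), gives a total of 100 lakh, i.e. 1 crore:

```python
# Allocations from the budget slide, assumed to be in lakh of rupees.
BUDGET_LAKH = {
    "IIT Bombay": 15,
    "Amrita University": 40,
    "Tamil University": 15,
    "University of Hyderabad": 15,
    "Dravidian University": 15,
}

LAKH = 100_000  # rupees per lakh

total_lakh = sum(BUDGET_LAKH.values())
# Prints: Total: 100 lakh = 10,000,000 rupees
print(f"Total: {total_lakh} lakh = {total_lakh * LAKH:,} rupees")
```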

Page 20: Work Done

• Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
• Funding agency: Ministry of HRD
• Duration: 18 months (July 2009 – Dec 2010)
• Deliverable: 13k synsets
• 7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
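Linking a synset to IndoWordNet lets the same concept be looked up across languages. A minimal sketch, assuming as a simplification that linked synsets in each language share a common concept ID (the `WORDNETS` table, the ID 1001 and the helper names are hypothetical), shows a cross-lingual lookup; synsets not yet linked have no shared ID and so return nothing:

```python
# Hypothetical concept-ID tables; IDs and entries are illustrative only.
WORDNETS = {
    "tamil":     {1001: ["viiTu", "illam"]},  # 'house'
    "telugu":    {1001: ["illu"]},
    "kannada":   {1001: ["mane"]},
    "malayalam": {1001: ["viiTu"]},
}


def concepts_for(lang: str, lemma: str) -> list[int]:
    """Concept IDs whose synset in `lang` contains `lemma`."""
    return [cid for cid, lemmas in WORDNETS[lang].items() if lemma in lemmas]


def cross_lingual(src: str, lemma: str, tgt: str) -> list[str]:
    """Lemmas in `tgt` that share a linked concept with `lemma` in `src`."""
    out: list[str] = []
    for cid in concepts_for(src, lemma):
        out.extend(WORDNETS[tgt].get(cid, []))
    return out


if __name__ == "__main__":
    print(cross_lingual("tamil", "illam", "kannada"))  # ['mane']
    print(cross_lingual("tamil", "paLLi", "telugu"))   # [] (no linked concept)
```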

Page 21: Statistics on Dravidian WordNet

Page 22: Publications

• S. Rajendran. 'Tamil WordNet'. Proceedings of the Fifth Global WordNet Conference, IIT Bombay, 31 Jan – 4 Feb 2010.
• S. Rajendran, S. Gopakumar and V. Dhanalakshmi. 'Building a WordNet for Dravidian Languages'. Proceedings of the Fifth Global WordNet Conference, IIT Bombay, 31 Jan – 4 Feb 2010.
• S. Arulmozi. 'Representation of Kinship in WordNet'. Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23–27 June 2010.
• S. Arulmozi and Panchanan Mohanty. 'Polysemy in Tamil and other Indian Languages'. Proceedings of the Fifth Global WordNet Conference, IIT Bombay, 31 Jan – 4 Feb 2010.
• S. Arulmozi. 'Telugu WordNet'. Proceedings of the Fifth Global WordNet Conference, IIT Bombay, 31 Jan – 4 Feb 2010.

Page 23: First IndoWordNet Workshop

• Amrita University, 11–14 June 2009
• The need to develop linked WordNets for the different languages of India was stressed.
• Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be addressed.
• Participating groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
• Proposal on Indradhanush

Page 24: Dravidian WordNet

• Present project
• Funded by DIT

Page 25: Links

• Tamil WordNet (open source): http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
• VerbNet (English): http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
• Princeton English WordNet: http://wordnet.princeton.edu/
• Global WordNet Association: http://www.globalwordnet.org/
• WordNets in the World: http://www.globalwordnet.org/gwa/wordnet_table.htm
• WordNet Bibliography: http://lit.csci.unt.edu/~wordnet/
• IndoWordNet: http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php

Page 26: Thank you!