NLP Related Activities in Thailand

Virach Sornlertlamvanich

Information Research and Development Division

National Electronics and Computer Technology Center

Thailand

27 August 2002, AFNLP/COLING2002, Taipei, Taiwan



SNLP-O-COCOSDA 2002, Hua-Hin, Thailand

May 9-11, 2002


✔ About 100 participants

✔ Invited talks:

• 'Information Retrieval and Modeling of Tonal Features of Speech', Prof. H. Fujisaki, U. of Tokyo

• 'Speech Synthesis for Tonal Languages', Prof. Fangxin Chen, IBM China Lab.

• 'Natural Language Understanding and Action Control', Prof. Takenobu Tokunaga, Tokyo Institute of Technology (TIT)

• 'Cross-Language Projection of Linguistic Knowledge', Prof. David Yarowsky, Johns Hopkins U.


✔ Presentations:

• 57 oral presentations
- 28 regular papers
- 17 short papers
- 8 COCOSDA papers
- 4 student papers

✔ Submission countries:

• 23 from Thailand
• 14 from Japan
• 5 from China
• 3 each from Korea and India
• 2 from Taiwan
• 1 each from Malaysia, Indonesia and Guam
• 4 student papers from Thailand


✔ Types of papers:

• 11 papers in IR/IE
• 7 papers in pattern recognition
• 6 papers in NLP application
• 4 papers in language resources
• 3 papers in morphology
• 2 papers in syntax
• 13 papers in speech processing
• 8 papers in O-COCOSDA
• 4 papers in student session


✔ SNLP

• 1993 Chulalongkorn U. and NECTEC
• 1995 Kasetsart U. and NECTEC
• 1997 AIT and NECTEC
• 2000 King Mongkut's U. of Technology (Thonburi) and NECTEC
• 2002 Sirindhorn International Institute of Technology and NECTEC


Survey on Research and Development of Machine Translation in Asian Countries

Merlin Beach Resort, Phuket, Thailand

May 13-14, 2002


Participation

Participants: 11 countries

India, Indonesia, Japan, Korea, Lao PDR, Malaysia, Myanmar, Philippines, Singapore, Thailand, Vietnam

and 1 region: Hong Kong

Total participants: 50 (overseas 24, local 26)
- R&D group (39)
- Supporting, policy and planning group (11)


Objectives:

1) To update the technology status of machine translation in Asia.

2) To exchange research and development experience in the field.

3) To establish collaboration on developing cross-language web navigation in Asia.

4) To establish activities for technology transfer from experienced countries to inexperienced ones.

5) To develop human resources in the field.


Activities

- Keynote speech: 'AAMT Activities and General Trends of MT' by Dr. Hitoshi Iida

- 17 papers by participants
* MT Research Techniques: spoken language translation, semantic annotation, research on particular cases
* MT Research Status in the Countries
* Digitalization Research and Infrastructure Status
* Problems in R&D

- Roundtable Discussion

* Collaboration within Asia
* Standardization for languages within this region
* Financial support problem
* Possibility of joining the existing working bodies


MT status in Asia

Basic components: standardization (character code and locale)
Lao PDR, Myanmar, Philippines, Vietnam

Working on regional MT:
- Philippines and India (official languages and dialects)
- Malaysia, Indonesia and Brunei (sharing resources for Malay-English MT development)

Experience in English and mother-language MT:
Hong Kong, Indonesia, Japan, Korea, Malaysia, Singapore, Thailand


Expected Collaboration in Asia

Establish Asian chapters for Intra-Asian Collaboration.

Construct Help-desk operation for standardization.

Establish a Working group or Liaison Secretariat for Language resources.

Establish a Working group or Liaison Secretariat for MT.

Standardize across Asian languages

Share language resources

Exchange information for research & application


What should we do?

Establish Asian chapters for Intra-Asian Collaboration

In ISO: language resources, codes, documents

Set up a BBS and a mailing list

Provide tutorial programs and application projects

Promote the contribution from each country


What should we do?

Construct Help-desk operation for standardization

Standardized documents/activities (ISO, MPEG-7)

Code standardization (Unicode, etc.)


What should we do?

Establish a Working group or Liaison Secretariat for Language resources

Coding description

Basic descriptors and mechanisms for language resources

Representation schemes

Multilingual text representation

Lexical databases

Workflow of Language Resource Management


What should we do?

Establish a Working group or Liaison Secretariat for MT

Machine Translation (possibly under AAMT)

Language Processing (general/infrastructure)

Verification for standardization

Copyright

Cooperation on operating personnel and funding


Three Levels of Collaboration

Standardization: font, character code, I/O method, print, locale

Language Resources and Processing Tools: dictionary, corpus, generic NLP tools

Cross-Language R&D and Services: machine translation, search engine, information retrieval/extraction

Resource Sharing

Cross-Language Technology

Open Source Software


Cross-Language Technology

[Diagram: language processing modules for English, Chinese, Japanese, French, Korean, Myanmar, Vietnamese, Indonesian, Thai and further languages, each paired with an e-Content Dictionary and connected to one another by MT links.]


Applications over the Cross-Language Technology

[Diagram: applications (visualization/presentation, extraction, retrieval, summarization, MT, mining) and e-services layered over the cross-language technology, i.e. the per-language processing modules for English, Chinese, Japanese, French, Korean, Myanmar, Vietnamese, Indonesian, Thai and further languages, with their e-Content Dictionaries and MT links.]