AFNLP Board Meeting 2009 Indonesia Country Report
Hammam Riza, Agency for the Assessment and Application of Technology (BPPT)
Indonesia Language Technology Research Community (ILTRC)
Activities in Language Technologies (2008-2009)
PAN L10n project: UI and BPPT are developing a POS tagger and machine translation using a 1M-word corpus (Penn Treebank and an Indonesian news corpus)
Indonesian-English statistical MT: Pharaoh/Moses decoder (using a 10-million-word parallel corpus) for a bidirectional multi-domain translation system (BPPT), ALR Workshop 2009
A-STAR Consortium: speech-to-speech translation system for 9 Asian languages (in cooperation with NICT, ETRI, CASIA, CDAC, IOIT, I2R and BPPT)
Promotion of local computing capacity for local languages (endangered languages): Ministry of Communication and Information, National Language Center
Dictionary system: Indonesian-German and Indonesian-French, launched by Institute of Technology Bandung
Information Retrieval and Extraction at University of Indonesia
Distribution of Bahasa Indonesia
Understand Bahasa Indonesia (%)
1980 60
1990 67
2000 74
Use Bahasa Indonesia (millions)
1980 17.5
1990 27.1
2000 37.5
Distribution of 13 largest local languages
Language Speakers (millions)
Javanese 75.2
Sundanese 27.0
Malay 20.0
Madurese 13.7
Minangkabau 6.5
Batak 5.2
Buginese 4.0
Balinese 3.8
Acehnese 3.0
Sasak 2.1
Makassarese 1.6
Lampung 1.5
Rejang 1.0
713 other languages 45.4
Number of local languages: 742 (Ethnologue)
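The distribution above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides: it copies the speaker figures from the table (language names given in English) and computes the total and each language's share of it. The function names `total_speakers` and `shares` are illustrative assumptions.

```python
# Speaker counts in millions, copied from the table above.
# The last entry is the slide's aggregate row for the 713 remaining languages.
speakers_millions = {
    "Javanese": 75.2,
    "Sundanese": 27.0,
    "Malay": 20.0,
    "Madurese": 13.7,
    "Minangkabau": 6.5,
    "Batak": 5.2,
    "Buginese": 4.0,
    "Balinese": 3.8,
    "Acehnese": 3.0,
    "Sasak": 2.1,
    "Makassarese": 1.6,
    "Lampung": 1.5,
    "Rejang": 1.0,
    "Other (713 languages)": 45.4,
}


def total_speakers(table):
    """Sum of all rows, rounded to one decimal like the source figures."""
    return round(sum(table.values()), 1)


def shares(table):
    """Each row's percentage of the table total, rounded to one decimal."""
    total = sum(table.values())
    return {name: round(100 * count / total, 1) for name, count in table.items()}


total = total_speakers(speakers_millions)
pct = shares(speakers_millions)

# Print rows from largest to smallest, then the total.
for name, count in sorted(speakers_millions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24}{count:>6.1f}  {pct[name]:>5.1f}%")
print(f"{'Total':<24}{total:>6.1f}")
```

The listed figures sum to 210.0 million speakers, of which the 13 named languages account for 164.6 million.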
Ratio of Population/Local Languages
Activities in Text to Speech (TTS)
• TTS Indonesia: A-STAR project, HMM-based (NICT-ATR), toward an Asian networked speech-to-speech translation system
• Intensive research and development of an “automatic prosody pattern extractor” using an artificial neural network (ITB)
• Text-to-speech system for the Indonesian language (ITB/UI)
• TTS for 5 local languages (Javanese, Sundanese, Balinese, Minang and Makassar), started by the National Language Center
Spoken Language Communication Group
Kyoto, Japan
Activities in Speech Recognition
• BPPT collaboration with Telkom on speech recognition and summarization resulted in Indonesia Linux Voice Command (ILVC), which was further developed into LiSan and Perisalah (a transcription system)
• Speech corpora and a speech recognition system for Bahasa Indonesia at the University of Indonesia (M. Adriani et al.) and Institut Teknologi Bandung (A. Arman et al.)
LiSan (Linux Voice Command), developed by O. Riandi and the BPPT team
Voice command as a man-machine interface for accessibility is governed by Law No. 4 of 1997 on Handicapped Persons and by Government Regulation (PP) No. 43 on the use of technology for improving the social benefit of handicapped persons.
Use of Bahasa Indonesia as “language of conduct”
Indonesia Transcription System
June 2009: System used in Cabinet Meeting, endorsed by the State Secretary and the Office of the President
Thank You – Terima Kasih